Full Code of pingcap/ossinsight-lite for AI

Repository: pingcap/ossinsight-lite
Branch: main
Commit: 86e9015a8709
Files: 62
Total size: 310.0 KB

Directory structure:
gitextract_rmebsf2k/

├── .cursor/
│   └── rules/
│       └── styleguide.mdc
├── .github/
│   └── workflows/
│       ├── docs.yml
│       └── sitemap-sync.yml
├── .gitignore
├── .vscode/
│   └── settings.json
├── DEVELOPMENT.md
├── LICENSE
├── Makefile
├── README.md
├── demos.yml
├── mkdocs.yml
├── pyproject.toml
├── scripts/
│   ├── check_dependencies.py
│   ├── generate_demos.py
│   └── sync_from_tidb_docs.py
└── src/
    ├── ai/
    │   ├── concepts/
    │   │   └── vector-search.md
    │   ├── examples/
    │   │   ├── auto-embedding-with-pytidb.md
    │   │   ├── basic-with-pytidb.md
    │   │   ├── fulltext-search-with-pytidb.md
    │   │   ├── hybrid-search-with-pytidb.md
    │   │   ├── image-search-with-pytidb.md
    │   │   ├── index.md
    │   │   ├── memory-with-pytidb.md
    │   │   ├── rag-with-pytidb.md
    │   │   ├── text2sql-with-pytidb.md
    │   │   └── vector-search-with-pytidb.md
    │   ├── guides/
    │   │   ├── auto-embedding.md
    │   │   ├── connect.md
    │   │   ├── filtering.md
    │   │   ├── fulltext-search.md
    │   │   ├── hybrid-search.md
    │   │   ├── image-search.md
    │   │   ├── joins.md
    │   │   ├── raw-queries.md
    │   │   ├── reranking.md
    │   │   ├── tables.md
    │   │   ├── transaction.md
    │   │   └── vector-search.md
    │   ├── index.md
    │   ├── integrations/
    │   │   ├── embedding-cohere.md
    │   │   ├── embedding-gemini.md
    │   │   ├── embedding-huggingface.md
    │   │   ├── embedding-jinaai.md
    │   │   ├── embedding-nvidia-nim.md
    │   │   ├── embedding-openai-compatible.md
    │   │   ├── embedding-openai.md
    │   │   ├── embedding-overview.md
    │   │   ├── embedding-tidb-cloud-hosted.md
    │   │   ├── langchain.md
    │   │   ├── llamaindex.md
    │   │   ├── tidb-mcp-claude-code.md
    │   │   ├── tidb-mcp-claude-desktop.md
    │   │   ├── tidb-mcp-cursor.md
    │   │   ├── tidb-mcp-server.md
    │   │   ├── tidb-mcp-vscode.md
    │   │   └── tidb-mcp-windsurf.md
    │   ├── javascripts/
    │   │   └── mathjax.js
    │   └── quickstart.md
    ├── robots.txt
    ├── styles/
    │   └── extra.css
    └── templates/
        ├── demo_gallery_template.j2
        └── demo_page_template.j2

================================================
FILE CONTENTS
================================================

================================================
FILE: .cursor/rules/styleguide.mdc
================================================
---
description: 
globs: 
alwaysApply: true
---
# Documentation Review Style Guide

## Behavior instruction

You are a **senior technical writer** reviewing TiDB documentation pull requests. Always provide ready-to-commit doc suggestions so that the PR author can commit them directly.

## Note

- When you finish the review, add comments directly to the PR instead of requesting changes, so that you do not block the pull request from being merged.
- If the PR author is ti-chi-bot, you only correct English grammar, spelling, and punctuation mistakes, if any.

## Review aspects

- Clarity, simplicity, completeness, and readability
- Logical flow and sentence structure
- Technical accuracy and terminology consistency

## General writing principles

- Correct English grammar, spelling, and punctuation mistakes, if any.
- Make sure the documentation is easy to understand for TiDB users.
- Write in **second person** ("you") when addressing users.
- Prefer **present tense** unless describing historical behavior.
- Avoid unnecessary words and repetition.
- Use **consistent terminology**. For example:

    - ❌ Do not mix "database" and "instance"
    - ✅ Use "replicate" instead of "synchronize" when referring to replicating data from one TiDB cluster to another.

## Structure and format

- Use sentence case for headings (e.g., `## Configure the cluster`).
- Use ordered lists (`1.`, `2.`) for steps.
- Code snippets, command names, options, and paths should be in backticks (`` ` ``).

## Markdown style

- Add a blank line before and after headings and lists.
- Use proper heading hierarchy (no jumping from `##` to `####`).
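
The heading hierarchy rule above can be checked mechanically. The following is a minimal sketch, not part of this repository: the function name `find_heading_jumps` is hypothetical, it only handles ATX headings (`#` through `######`), and it treats any line starting with a code fence as a simple on/off toggle.

```python
import re

# Matches ATX headings such as "## Configure the cluster".
HEADING_RE = re.compile(r"^(#{1,6})\s+\S")
# Matches the start of a fenced code block (simplified: fence type is not tracked).
FENCE_RE = re.compile(r"^\s*(```|~~~)")


def find_heading_jumps(markdown: str) -> list[tuple[int, int, int]]:
    """Return (line_number, previous_level, level) for each skipped heading level."""
    problems = []
    previous_level = 0
    in_fence = False
    for line_number, line in enumerate(markdown.splitlines(), start=1):
        if FENCE_RE.match(line):
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # Lines inside code fences are never headings.
        match = HEADING_RE.match(line)
        if not match:
            continue
        level = len(match.group(1))
        # Going deeper by more than one level (for example ## to ####) is a jump.
        if previous_level and level > previous_level + 1:
            problems.append((line_number, previous_level, level))
        previous_level = level
    return problems


sample = "# Title\n\n## Configure the cluster\n\n#### Deep detail\n"
print(find_heading_jumps(sample))  # [(5, 2, 4)]
```

Moving back up the hierarchy (for example from `###` to `##`) is not reported, because only skipped levels on the way down break the rule.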

## Common issues to flag

- Passive voice overuse

    _"The cluster is started by TiUP"_ → _"TiUP starts the cluster"_

- Inconsistent use of technical terms

    _"TiDB Cloud Starter clusters" vs. "TiDB Cloud clusters"_ – pick one.

- Unclear step instructions

    _"Do it like before"_ → _"Repeat step 3 using the updated config file"_

- Grammar and spelling issues

    _"recieve"_ → _"receive"_, _"an TiDB instance"_ → _"a TiDB instance"_

## Special notes

- Follow any existing terminology in our glossary (`/glossary.md` if available).
- When in doubt, favor clarity over cleverness.
- If something might confuse a new user, suggest a reword.

## Purposes of this style guide

This guide helps Gemini Code Assist provide actionable, high-quality suggestions for improving technical documentation, especially for PRs related to user guides, how-to articles, and product reference material.

================================================
FILE: .github/workflows/docs.yml
================================================
name: Docs Publish

on:
  push:
    branches:
      - main

permissions:
  contents: write

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure Git Credentials
        run: |
          git config user.name github-actions[bot]
          git config user.email 41898282+github-actions[bot]@users.noreply.github.com
      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          enable-cache: true
      - name: Set up Python
        run: uv python install
      # %V is the ISO 8601 week number, so the cache key changes once a week.
      - run: echo "cache_id=$(date --utc '+%V')" >> "$GITHUB_ENV"
      - uses: actions/cache@v4
        with:
          key: mkdocs-material-${{ env.cache_id }}
          path: .cache
          restore-keys: |
            mkdocs-material-
      - name: Install dependencies
        run: uv sync
      - name: Generate demo pages and gallery
        run: make generate-demos
      - name: Deploy docs
        run: uv run mkdocs gh-deploy --force


================================================
FILE: .github/workflows/sitemap-sync.yml
================================================
name: Sync Sitemap

on:
  push:
    branches:
      - gh-pages
    paths:
      - 'sitemap.xml'

permissions:
  contents: write

jobs:
  sync-sitemap:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          ref: gh-pages
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Configure Git Credentials
        run: |
          git config user.name github-actions[bot]
          git config user.email 41898282+github-actions[bot]@users.noreply.github.com

      - name: Check if sitemap.xml exists in root
        id: check-sitemap
        run: |
          if [ -f "sitemap.xml" ]; then
            echo "exists=true" >> $GITHUB_OUTPUT
          else
            echo "exists=false" >> $GITHUB_OUTPUT
          fi

      - name: Create ai directory if it doesn't exist
        if: steps.check-sitemap.outputs.exists == 'true'
        run: mkdir -p ai

      - name: Copy sitemap.xml to ai/sitemap.xml
        if: steps.check-sitemap.outputs.exists == 'true'
        run: cp sitemap.xml ai/sitemap.xml

      - name: Commit and push changes
        if: steps.check-sitemap.outputs.exists == 'true'
        run: |
          git add ai/sitemap.xml
          if git diff --staged --quiet; then
            echo "No changes to commit"
          else
            git commit -m "Sync sitemap.xml to ai/sitemap.xml"
            git push origin gh-pages
          fi 

================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
.python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# UV
#   Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#uv.lock

# poetry
#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

# Ruff stuff:
.ruff_cache/

# PyPI configuration file
.pypirc


================================================
FILE: .vscode/settings.json
================================================
{
    "cSpell.words": [
        "FULLTEXT",
        "Pydantic",
        "getenv",
        "jina",
        "jinaai",
        "Rerank",
        "reranker",
        "reranking",
        "tablename",
        "multimodal"
    ]
}

================================================
FILE: DEVELOPMENT.md
================================================
# Development Guide

This guide covers the development setup and maintenance of the TiDB for AI documentation site.

## Quick Start

1. **Install uv** (a fast dependency manager):

   ```bash
   curl -LsSf https://astral.sh/uv/install.sh | sh
   ```

2. **Install dependencies**:

   ```bash
   make install
   ```

3. **Start development server**:

   ```bash
   make serve
   ```

## Project Structure

```
├── demos.yml                    # Demo Gallery configuration
├── scripts/generate_demos.py    # Demo Gallery generation script  
├── src/
│   ├── templates/               # Jinja2 templates
│   └── ai/examples/             # Generated demo pages
└── Makefile                     # Build commands
``` 

## Helpful Commands

```bash
# Dependencies
make check          # Check dependencies and setup
make install        # Install/update dependencies

# Development workflow
make serve          # Start development server
make build          # Build documentation site
make clean          # Clean build artifacts

# Demo management
make generate-demos         # Generate demo pages and gallery from demos.yml config
make generate-demo-pages    # Only generate demo pages
make generate-demo-gallery  # Only generate demo gallery

# Other
make help           # Show all available commands
```

## Maintain the Demo Gallery

The Demo Gallery showcases AI demos of TiDB and is configured via [demos.yml](demos.yml).

To regenerate the demo gallery from the configuration, run:

```bash
make generate-demos
```

### How to add a new demo

Follow these steps to add a new demo:

1. Add an entry to the `demos` array in `demos.yml` with a unique `id`, a title, a description, and display properties.

   For example:

   ```yaml
   demos:
     - id: "basic"
       title: "Basic Usage"
       description: "Learn fundamental PyTiDB operations"
       icon: "⚙️"
       background: "linear-gradient(135deg, #10b981, var(--brand-color))"
       link: "basic-with-pytidb/"
       doc_link: "https://github.com/pingcap/pytidb/tree/main/examples/basic/README.md"
       cover_image: null
   ```

2. Add the demo `id` to the `demos` array of the appropriate category.

   For example:

   ```yaml
   categories:
     - id: "featured"
       title: "Search"
       demos: ["image-search"]
   ```

3. Run `make generate-demos` to regenerate the demo pages and gallery.
4. Commit the changes.
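
Before regenerating, it can help to confirm that every `id` is unique and that each category only references demos that exist. The following is a minimal sketch under stated assumptions: it is not taken from `scripts/generate_demos.py`, the function names are hypothetical, and it assumes `demos.yml` has already been loaded into plain Python dictionaries and lists by a YAML parser.

```python
from collections import Counter


def find_duplicate_demo_ids(config: dict) -> list[str]:
    """Return demo ids that appear more than once in the `demos` array."""
    counts = Counter(demo["id"] for demo in config.get("demos", []))
    return sorted(demo_id for demo_id, count in counts.items() if count > 1)


def find_unknown_demo_ids(config: dict) -> dict[str, list[str]]:
    """Map each category id to the demo ids it lists that `demos` does not define."""
    known_ids = {demo["id"] for demo in config.get("demos", [])}
    unknown = {}
    for category in config.get("categories", []):
        missing = [d for d in category.get("demos", []) if d not in known_ids]
        if missing:
            unknown[category["id"]] = missing
    return unknown


# Already-parsed stand-in for demos.yml, reusing the values from the two examples above.
config = {
    "demos": [{"id": "basic", "title": "Basic Usage"}],
    "categories": [
        {"id": "featured", "title": "Search", "demos": ["image-search"]},
    ],
}

print(find_duplicate_demo_ids(config))  # []
print(find_unknown_demo_ids(config))    # {'featured': ['image-search']}
```

With only the two example snippets combined, the check reports `image-search` as unknown, because that demo is defined elsewhere in the real configuration and not in this stand-in.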


================================================
FILE: LICENSE
================================================
Attribution-ShareAlike 3.0 Unported

=======================================================================

CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
LEGAL SERVICES. DISTRIBUTION OF THIS LICENSE DOES NOT CREATE AN
ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
REGARDING THE INFORMATION PROVIDED, AND DISCLAIMS LIABILITY FOR DAMAGES
RESULTING FROM ITS USE.

License

THE WORK (AS DEFINED BELOW) IS PROVIDED UNDER THE TERMS OF THIS
CREATIVE COMMONS PUBLIC LICENSE ("CCPL" OR "LICENSE"). THE WORK IS
PROTECTED BY COPYRIGHT AND/OR OTHER APPLICABLE LAW. ANY USE OF THE WORK
OTHER THAN AS AUTHORIZED UNDER THIS LICENSE OR COPYRIGHT LAW IS
PROHIBITED.

BY EXERCISING ANY RIGHTS TO THE WORK PROVIDED HERE, YOU ACCEPT AND
AGREE TO BE BOUND BY THE TERMS OF THIS LICENSE. TO THE EXTENT THIS
LICENSE MAY BE CONSIDERED TO BE A CONTRACT, THE LICENSOR GRANTS YOU THE
RIGHTS CONTAINED HERE IN CONSIDERATION OF YOUR ACCEPTANCE OF SUCH TERMS
AND CONDITIONS.

1. Definitions

   a. "Adaptation" means a work based upon the Work, or upon the Work
   and other pre-existing works, such as a translation, adaptation,
   derivative work, arrangement of music or other alterations of a
   literary or artistic work, or phonogram or performance and includes
   cinematographic adaptations or any other form in which the Work may
   be recast, transformed, or adapted including in any form
   recognizably derived from the original, except that a work that
   constitutes a Collection will not be considered an Adaptation for
   the purpose of this License. For the avoidance of doubt, where the
   Work is a musical work, performance or phonogram, the
   synchronization of the Work in timed-relation with a moving image
   ("synching") will be considered an Adaptation for the purpose of
   this License.

   b. "Collection" means a collection of literary or artistic works,
   such as encyclopedias and anthologies, or performances, phonograms
   or broadcasts, or other works or subject matter other than works
   listed in Section 1(f) below, which, by reason of the selection and
   arrangement of their contents, constitute intellectual creations,
   in which the Work is included in its entirety in unmodified form
   along with one or more other contributions, each constituting
   separate and independent works in themselves, which together are
   assembled into a collective whole. A work that constitutes a
   Collection will not be considered an Adaptation (as defined below)
   for the purposes of this License.

   c. "Creative Commons Compatible License" means a license that is
   listed at https://creativecommons.org/compatiblelicenses that has
   been approved by Creative Commons as being essentially equivalent
   to this License, including, at a minimum, because that license: (i)
   contains terms that have the same purpose, meaning and effect as
   the License Elements of this License; and, (ii) explicitly permits
   the relicensing of adaptations of works made available under that
   license under this License or a Creative Commons jurisdiction
   license with the same License Elements as this License.

   d. "Distribute" means to make available to the public the original
   and copies of the Work or Adaptation, as appropriate, through sale
   or other transfer of ownership.

   e. "License Elements" means the following high-level license
   attributes as selected by Licensor and indicated in the title of
   this License: Attribution, ShareAlike.

   f. "Licensor" means the individual, individuals, entity or entities
   that offer(s) the Work under the terms of this License.

   g. "Original Author" means, in the case of a literary or artistic
   work, the individual, individuals, entity or entities who created
   the Work or if no individual or entity can be identified, the
   publisher; and in addition (i) in the case of a performance the
   actors, singers, musicians, dancers, and other persons who act,
   sing, deliver, declaim, play in, interpret or otherwise perform
   literary or artistic works or expressions of folklore; (ii) in the
   case of a phonogram the producer being the person or legal entity
   who first fixes the sounds of a performance or other sounds; and,
   (iii) in the case of broadcasts, the organization that transmits
   the broadcast.

   h. "Work" means the literary and/or artistic work offered under the
   terms of this License including without limitation any production
   in the literary, scientific and artistic domain, whatever may be
   the mode or form of its expression including digital form, such as
   a book, pamphlet and other writing; a lecture, address, sermon or
   other work of the same nature; a dramatic or dramatico-musical
   work; a choreographic work or entertainment in dumb show; a musical
   composition with or without words; a cinematographic work to which
   are assimilated works expressed by a process analogous to
   cinematography; a work of drawing, painting, architecture,
   sculpture, engraving or lithography; a photographic work to which
   are assimilated works expressed by a process analogous to
   photography; a work of applied art; an illustration, map, plan,
   sketch or three-dimensional work relative to geography, topography,
   architecture or science; a performance; a broadcast; a phonogram; a
   compilation of data to the extent it is protected as a
   copyrightable work; or a work performed by a variety or circus
   performer to the extent it is not otherwise considered a literary
   or artistic work.

   i. "You" means an individual or entity exercising rights under this
   License who has not previously violated the terms of this License
   with respect to the Work, or who has received express permission
   from the Licensor to exercise rights under this License despite a
   previous violation.

   j. "Publicly Perform" means to perform public recitations of the
   Work and to communicate to the public those public recitations, by
   any means or process, including by wire or wireless means or public
   digital performances; to make available to the public Works in such
   a way that members of the public may access these Works from a
   place and at a place individually chosen by them; to perform the
   Work to the public by any means or process and the communication to
   the public of the performances of the Work, including by public
   digital performance; to broadcast and rebroadcast the Work by any
   means including signs, sounds or images.

   k. "Reproduce" means to make copies of the Work by any means
   including without limitation by sound or visual recordings and the
   right of fixation and reproducing fixations of the Work, including
   storage of a protected performance or phonogram in digital form or
   other electronic medium.

2. Fair Dealing Rights. Nothing in this License is intended to reduce,
limit, or restrict any uses free from copyright or rights arising from
limitations or exceptions that are provided for in connection with the
copyright protection under copyright law or other applicable laws.

3. License Grant. Subject to the terms and conditions of this License,
Licensor hereby grants You a worldwide, royalty-free, non-exclusive,
perpetual (for the duration of the applicable copyright) license to
exercise the rights in the Work as stated below:

   a. to Reproduce the Work, to incorporate the Work into one or more
   Collections, and to Reproduce the Work as incorporated in the
   Collections;

   b. to create and Reproduce Adaptations provided that any such
   Adaptation, including any translation in any medium, takes
   reasonable steps to clearly label, demarcate or otherwise identify
   that changes were made to the original Work. For example, a
   translation could be marked "The original work was translated from
   English to Spanish," or a modification could indicate "The original
   work has been modified.";

   c. to Distribute and Publicly Perform the Work including as
   incorporated in Collections; and,

   d. to Distribute and Publicly Perform Adaptations.

   e. For the avoidance of doubt:

      i. Non-waivable Compulsory License Schemes. In those
      jurisdictions in which the right to collect royalties through
      any statutory or compulsory licensing scheme cannot be waived,
      the Licensor reserves the exclusive right to collect such
      royalties for any exercise by You of the rights granted under
      this License;

      ii. Waivable Compulsory License Schemes. In those jurisdictions
      in which the right to collect royalties through any statutory or
      compulsory licensing scheme can be waived, the Licensor waives
      the exclusive right to collect such royalties for any exercise
      by You of the rights granted under this License; and,

      iii. Voluntary License Schemes. The Licensor waives the right to
      collect royalties, whether individually or, in the event that
      the Licensor is a member of a collecting society that
      administers voluntary licensing schemes, via that society, from
      any exercise by You of the rights granted under this License.

   The above rights may be exercised in all media and formats whether
   now known or hereafter devised. The above rights include the right
   to make such modifications as are technically necessary to exercise
   the rights in other media and formats. Subject to Section 8(f), all
   rights not expressly granted by Licensor are hereby reserved.

4. Restrictions. The license granted in Section 3 above is expressly
made subject to and limited by the following restrictions:

   a. You may Distribute or Publicly Perform the Work only under the
   terms of this License. You must include a copy of, or the Uniform
   Resource Identifier (URI) for, this License with every copy of the
   Work You Distribute or Publicly Perform. You may not offer or
   impose any terms on the Work that restrict the terms of this
   License or the ability of the recipient of the Work to exercise the
   rights granted to that recipient under the terms of the License.
   You may not sublicense the Work. You must keep intact all notices
   that refer to this License and to the disclaimer of warranties with
   every copy of the Work You Distribute or Publicly Perform. When You
   Distribute or Publicly Perform the Work, You may not impose any
   effective technological measures on the Work that restrict the
   ability of a recipient of the Work from You to exercise the rights
   granted to that recipient under the terms of the License. This
   Section 4(a) applies to the Work as incorporated in a Collection,
   but this does not require the Collection apart from the Work itself
   to be made subject to the terms of this License. If You create a
   Collection, upon notice from any Licensor You must, to the extent
   practicable, remove from the Collection any credit as required by
   Section 4(c), as requested. If You create an Adaptation, upon
   notice from any Licensor You must, to the extent practicable,
   remove from the Adaptation any credit as required by Section 4(c),
   as requested.

   b. You may Distribute or Publicly Perform an Adaptation only under
   the terms of: (i) this License; (ii) a later version of this
   License with the same License Elements as this License; (iii) a
   Creative Commons jurisdiction license (either this or a later
   license version) that contains the same License Elements as this
   License (e.g., Attribution-ShareAlike 3.0 US)); (iv) a Creative
   Commons Compatible License. If you license the Adaptation under one
   of the licenses mentioned in (iv), you must comply with the terms
   of that license. If you license the Adaptation under the terms of
   any of the licenses mentioned in (i), (ii) or (iii) (the
   "Applicable License"), you must comply with the terms of the
   Applicable License generally and the following provisions: (I) You
   must include a copy of, or the URI for, the Applicable License with
   every copy of each Adaptation You Distribute or Publicly Perform;
   (II) You may not offer or impose any terms on the Adaptation that
   restrict the terms of the Applicable License or the ability of the
   recipient of the Adaptation to exercise the rights granted to that
   recipient under the terms of the Applicable License; (III) You must
   keep intact all notices that refer to the Applicable License and to
   the disclaimer of warranties with every copy of the Work as
   included in the Adaptation You Distribute or Publicly Perform; (IV)
   when You Distribute or Publicly Perform the Adaptation, You may not
   impose any effective technological measures on the Adaptation that
   restrict the ability of a recipient of the Adaptation from You to
   exercise the rights granted to that recipient under the terms of
   the Applicable License. This Section 4(b) applies to the Adaptation
   as incorporated in a Collection, but this does not require the
   Collection apart from the Adaptation itself to be made subject to
   the terms of the Applicable License.

   c. If You Distribute, or Publicly Perform the Work or any
   Adaptations or Collections, You must, unless a request has been
   made pursuant to Section 4(a), keep intact all copyright notices
   for the Work and provide, reasonable to the medium or means You are
   utilizing: (i) the name of the Original Author (or pseudonym, if
   applicable) if supplied, and/or if the Original Author and/or
   Licensor designate another party or parties (e.g., a sponsor
   institute, publishing entity, journal) for attribution
   ("Attribution Parties") in Licensor's copyright notice, terms of
   service or by other reasonable means, the name of such party or
   parties; (ii) the title of the Work if supplied; (iii) to the
   extent reasonably practicable, the URI, if any, that Licensor
   specifies to be associated with the Work, unless such URI does not
   refer to the copyright notice or licensing information for the
   Work; and (iv) , consistent with Ssection 3(b), in the case of an
   Adaptation, a credit identifying the use of the Work in the
   Adaptation (e.g., "French translation of the Work by Original
   Author," or "Screenplay based on original Work by Original
   Author"). The credit required by this Section 4(c) may be
   implemented in any reasonable manner; provided, however, that in
   the case of a Adaptation or Collection, at a minimum such credit
   will appear, if a credit for all contributing authors of the
   Adaptation or Collection appears, then as part of these credits and
   in a manner at least as prominent as the credits for the other
   contributing authors. For the avoidance of doubt, You may only use
   the credit required by this Section for the purpose of attribution
   in the manner set out above and, by exercising Your rights under
   this License, You may not implicitly or explicitly assert or imply
   any connection with, sponsorship or endorsement by the Original
   Author, Licensor and/or Attribution Parties, as appropriate, of You
   or Your use of the Work, without the separate, express prior
   written permission of the Original Author, Licensor and/or
   Attribution Parties.

   d. Except as otherwise agreed in writing by the Licensor or as may
   be otherwise permitted by applicable law, if You Reproduce,
   Distribute or Publicly Perform the Work either by itself or as part
   of any Adaptations or Collections, You must not distort, mutilate,
   modify or take other derogatory action in relation to the Work
   which would be prejudicial to the Original Author's honor or
   reputation. Licensor agrees that in those jurisdictions (e.g.
   Japan), in which any exercise of the right granted in Section 3(b)
   of this License (the right to make Adaptations) would be deemed to
   be a distortion, mutilation, modification or other derogatory
   action prejudicial to the Original Author's honor and reputation,
   the Licensor will waive or not assert, as appropriate, this
   Section, to the fullest extent permitted by the applicable national
   law, to enable You to reasonably exercise Your right under Section
   3(b) of this License (right to make Adaptations) but not otherwise.

5. Representations, Warranties and Disclaimer

   UNLESS OTHERWISE MUTUALLY AGREED TO BY THE PARTIES IN WRITING,
   LICENSOR OFFERS THE WORK AS-IS AND MAKES NO REPRESENTATIONS OR
   WARRANTIES OF ANY KIND CONCERNING THE WORK, EXPRESS, IMPLIED,
   STATUTORY OR OTHERWISE, INCLUDING, WITHOUT LIMITATION, WARRANTIES
   OF TITLE, MERCHANTIBILITY, FITNESS FOR A PARTICULAR PURPOSE,
   NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS,
   ACCURACY, OR THE PRESENCE OF ABSENCE OF ERRORS, WHETHER OR NOT
   DISCOVERABLE. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OF
   IMPLIED WARRANTIES, SO SUCH EXCLUSION MAY NOT APPLY TO YOU.

6. Limitation on Liability.

   EXCEPT TO THE EXTENT REQUIRED BY APPLICABLE LAW, IN NO EVENT WILL
   LICENSOR BE LIABLE TO YOU ON ANY LEGAL THEORY FOR ANY SPECIAL,
   INCIDENTAL, CONSEQUENTIAL, PUNITIVE OR EXEMPLARY DAMAGES ARISING
   OUT OF THIS LICENSE OR THE USE OF THE WORK, EVEN IF LICENSOR HAS
   BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

7. Termination

   a. This License and the rights granted hereunder will terminate
   automatically upon any breach by You of the terms of this License.
   Individuals or entities who have received Adaptations or
   Collections from You under this License, however, will not have
   their licenses terminated provided such individuals or entities
   remain in full compliance with those licenses. Sections 1, 2, 5, 6,
   7, and 8 will survive any termination of this License.

   b. Subject to the above terms and conditions, the license granted
   here is perpetual (for the duration of the applicable copyright in
   the Work). Notwithstanding the above, Licensor reserves the right
   to release the Work under different license terms or to stop
   distributing the Work at any time; provided, however that any such
   election will not serve to withdraw this License (or any other
   license that has been, or is required to be, granted under the
   terms of this License), and this License will continue in full
   force and effect unless terminated as stated above.

8. Miscellaneous

   a. Each time You Distribute or Publicly Perform the Work or a
   Collection, the Licensor offers to the recipient a license to the
   Work on the same terms and conditions as the license granted to You
   under this License.

   b. Each time You Distribute or Publicly Perform an Adaptation,
   Licensor offers to the recipient a license to the original Work on
   the same terms and conditions as the license granted to You under
   this License.

   c. If any provision of this License is invalid or unenforceable
   under applicable law, it shall not affect the validity or
   enforceability of the remainder of the terms of this License, and
   without further action by the parties to this agreement, such
   provision shall be reformed to the minimum extent necessary to make
   such provision valid and enforceable.

   d. No term or provision of this License shall be deemed waived and
   no breach consented to unless such waiver or consent shall be in
   writing and signed by the party to be charged with such waiver or
   consent.

   e. This License constitutes the entire agreement between the
   parties with respect to the Work licensed here. There are no
   understandings, agreements or representations with respect to the
   Work not specified here. Licensor shall not be bound by any
   additional provisions that may appear in any communication from
   You. This License may not be modified without the mutual written
   agreement of the Licensor and You.

   f. The rights granted under, and the subject matter referenced, in
   this License were drafted utilizing the terminology of the Berne
   Convention for the Protection of Literary and Artistic Works (as
   amended on September 28, 1979), the Rome Convention of 1961, the
   WIPO Copyright Treaty of 1996, the WIPO Performances and Phonograms
   Treaty of 1996 and the Universal Copyright Convention (as revised
   on July 24, 1971). These rights and subject matter take effect in
   the relevant jurisdiction in which the License terms are sought to
   be enforced according to the corresponding provisions of the
   implementation of those treaty provisions in the applicable
   national law. If the standard suite of rights granted under
   applicable copyright law includes additional rights not granted
   under this License, such additional rights are deemed to be
   included in the License; this License is not intended to restrict
   the license of any rights under applicable law.

=======================================================================

Creative Commons Notice

Creative Commons is not a party to this License, and makes no warranty
whatsoever in connection with the Work. Creative Commons will not be
liable to You or any party on any legal theory for any damages
whatsoever, including without limitation any general, special,
incidental or consequential damages arising in connection to this
license. Notwithstanding the foregoing two (2) sentences, if Creative
Commons has expressly identified itself as the Licensor hereunder, it
shall have all rights and obligations of Licensor.

Except for the limited purpose of indicating to the public that the
Work is licensed under the CCPL, Creative Commons does not authorize
the use by either party of the trademark "Creative Commons" or any
related trademark or logo of Creative Commons without the prior written
consent of Creative Commons. Any permitted use will be in compliance
with Creative Commons' then-current trademark usage guidelines, as may
be published on its website or otherwise made available upon request
from time to time. For the avoidance of doubt, this trademark
restriction does not form part of the License.

Creative Commons may be contacted at https://creativecommons.org/.

================================================
FILE: Makefile
================================================
# TiDB for AI Documentation Site

.PHONY: help install generate-demos generate-demo-pages generate-demo-gallery serve build clean check

help:
	@echo "TiDB for AI Documentation Site"
	@echo "============================="
	@echo "Available commands:"
	@echo "  install         - Install dependencies using UV"
	@echo "  generate-demos  - Generate demo pages and gallery from demos.yml config"
	@echo "  generate-demo-pages - Only generate demo pages."
	@echo "  generate-demo-gallery - Only generate demo gallery."
	@echo "  serve           - Start the development server"
	@echo "  build           - Build the documentation site"
	@echo "  clean           - Clean build artifacts"
	@echo "  check           - Check dependencies and project setup"
	@echo ""
	@echo "Demo gallery configuration:"
	@echo "  Edit demos.yml in the project root to manage gallery content"
	@echo ""
	@echo "Prerequisites:"
	@echo "  UV package manager - curl -LsSf https://astral.sh/uv/install.sh | sh"

install:
	uv pip install -e .

generate-demos:
	python scripts/generate_demos.py

generate-demo-pages:
	python scripts/generate_demos.py --skip-gallery

generate-demo-gallery:
	python scripts/generate_demos.py --skip-demos

serve:
	mkdocs serve

build:
	mkdocs build

clean:
	rm -rf site/
	rm -rf .mkdocs_cache/

check:
	python scripts/check_dependencies.py 

================================================
FILE: README.md
================================================
# TiDB for AI

TiDB is an open-source, distributed SQL database designed for modern AI applications, offering seamless scalability, real-time analytics, and unified storage for vectors, documents, knowledge graphs, operational data and more.

🚀 Follow our [Quickstart Guide](https://pingcap.github.io/ai/quickstart/) to begin building your first AI application with TiDB.

## PyTiDB

TiDB provides a Python SDK and a series of integrations with popular AI frameworks to help developers build AI applications efficiently.

To install the TiDB Python SDK, run the following command:

```bash
pip install pytidb
```

Examples:

- Getting Started: [Basic Usage](https://pingcap.github.io/ai/examples/basic-with-pytidb/), [Auto Embedding](https://pingcap.github.io/ai/examples/auto-embedding-with-pytidb/)
- Search & Retrieval: [Vector Search](https://pingcap.github.io/ai/examples/vector-search-with-pytidb/), [Fulltext Search](https://pingcap.github.io/ai/examples/fulltext-search-with-pytidb/), [Hybrid Search](https://pingcap.github.io/ai/examples/hybrid-search-with-pytidb/), [Image Search](https://pingcap.github.io/ai/examples/image-search-with-pytidb/)
- AI Applications: [RAG](https://pingcap.github.io/ai/examples/rag-with-pytidb/), [Text2SQL](https://pingcap.github.io/ai/examples/text2sql-with-pytidb/), [Memory](https://pingcap.github.io/ai/examples/memory-with-pytidb/)


Integrations:

- AI Frameworks: [LlamaIndex](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-llamaindex/), [LangChain](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-langchain/)
- ORM Libraries: [SQLAlchemy](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-sqlalchemy/), [Django-ORM](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-django-orm/), [Peewee](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-peewee/)
- AI Services: [Bedrock](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-amazon-bedrock/)
- Embedding Models/Services: [JinaAI](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-jinaai-embedding/)

## Contribute

We welcome contributions to improve the TiDB for AI documentation! For development setup, maintenance scripts, and detailed contribution guidelines, please see [DEVELOPMENT.md](DEVELOPMENT.md).

## FAQ

### How can I get support?

- [Join our Discord](https://discord.com/invite/vYU9h56kAX) (recommended)
- [Ask questions in our forum](https://ask.pingcap.com/)
- [Send support tickets](https://tidb.support.pingcap.com/)


================================================
FILE: demos.yml
================================================
# Demo Gallery Configuration
title: "Demo Gallery"
description: |
  Explore hands-on demos showcasing how TiDB empowers AI applications.<br>
  Get started quickly with <a href='https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=demo_gallery' target='_blank' rel='noopener noreferrer' class='gallery-cta-link'><b>TiDB Cloud Starter</b></a> to build your own AI-powered solutions.

meta:
  description: "Explore hands-on demos showcasing how TiDB empowers AI applications. Get started quickly with TiDB Cloud Starter to build your own AI-powered solutions."

categories:
  - id: "featured"
    title: "⭐ Featured"
    demos: ["image-search", "rag", "memory"]
    
  - id: "getting-started"
    title: "🚀 Getting Started"
    demos: ["basic", "auto-embedding"]
    
  - id: "search"
    title: "🔍 Search & Retrieval"
    demos: ["vector-search", "fulltext-search", "hybrid-search", "image-search"]
    
  - id: "ai-apps"
    title: "🤖 AI Applications"
    demos: ["rag", "memory", "text2sql"]

demos:

  - id: "image-search"
    title: "Image Search"
    description: "Build an image search application using multimodal embeddings for both text-to-image and image-to-image search."
    category: "search"
    icon: null
    background: null
    link: "image-search-with-pytidb/"
    doc_link: "https://github.com/pingcap/pytidb/tree/main/examples/image_search/README.md"
    cover_image: "https://github.com/user-attachments/assets/7ba9733a-4d1f-4094-8edb-58731ebd08e9"
    
  - id: "rag"
    title: "RAG"
    description: "Build a RAG application that combines document retrieval with language generation."
    category: "ai-apps"
    icon: null
    background: null
    link: "rag-with-pytidb/"
    doc_link: "https://github.com/pingcap/pytidb/tree/main/examples/rag/README.md"
    cover_image: "https://github.com/user-attachments/assets/dfd85672-65ce-4a46-8dd2-9f77d826363e"
    
  - id: "basic"
    title: "Basic Usage"
    description: "Learn fundamental PyTiDB operations including database connection, table creation, and data manipulation."
    category: "getting-started"
    icon: "⚙️"
    background: "linear-gradient(135deg, #10b981, var(--brand-color))"
    link: "basic-with-pytidb/"
    doc_link: "https://github.com/pingcap/pytidb/tree/main/examples/basic/README.md"
    cover_image: null
    
  - id: "auto-embedding"
    title: "Auto Embedding"
    description: "Automatically generate embeddings for your text data using built-in embedding models."
    category: "getting-started"
    icon: "🤖"
    background: "radial-gradient(circle at center, #8b5cf6 0%, var(--brand-color) 100%)"
    link: "auto-embedding-with-pytidb/"
    doc_link: "https://github.com/pingcap/pytidb/tree/main/examples/auto_embedding/README.md"
    cover_image: null
    
  - id: "vector-search"
    title: "Vector Search"
    description: "Implement semantic search using vector embeddings to find similar content."
    category: "search"
    icon: null
    background: null
    link: "vector-search-with-pytidb/"
    doc_link: "https://github.com/pingcap/pytidb/tree/main/examples/vector_search/README.md"
    cover_image: "https://github.com/user-attachments/assets/6d7783a5-ce9c-4dcc-8b95-49d5f0ca735a"
    
  - id: "fulltext-search"
    title: "Fulltext Search"
    description: "Perform traditional text search using MySQL fulltext search capabilities."
    category: "search"
    icon: null
    background: null
    link: "fulltext-search-with-pytidb/"
    doc_link: "https://github.com/pingcap/pytidb/tree/main/examples/fulltext_search/README.md"
    cover_image: "https://github.com/user-attachments/assets/c81ddad4-f996-4b1f-85c0-5cbb55bc2a3a"
    
  - id: "hybrid-search"
    title: "Hybrid Search"
    description: "Combine vector search and fulltext search for more comprehensive results."
    category: "search"
    icon: null
    background: null
    link: "hybrid-search-with-pytidb/"
    doc_link: "https://github.com/pingcap/pytidb/tree/main/examples/hybrid_search/README.md"
    cover_image: "https://github.com/user-attachments/assets/6e1c639d-2160-44c8-86b4-958913b9eca5"
    
  - id: "memory"
    title: "Memory"
    description: "Implement conversation memory for chatbots and conversational AI applications."
    category: "ai-apps"
    icon: null
    background: null
    link: "memory-with-pytidb/"
    doc_link: "https://github.com/pingcap/pytidb/tree/main/examples/memory/README.md"
    cover_image: "https://github.com/user-attachments/assets/74dee96b-ea20-49dc-ad27-679faa5bf9b8"
    
  - id: "text2sql"
    title: "Text2SQL"
    description: "Convert natural language queries into SQL statements using AI models."
    category: "ai-apps"
    icon: "💬"
    background: "linear-gradient(135deg, #06b6d4, var(--brand-color))"
    link: "text2sql-with-pytidb/"
    doc_link: "https://github.com/pingcap/pytidb/tree/main/examples/text2sql/README.md"
    cover_image: null

# CTA section configuration
cta:
  title: "Ready to build your AI application?"
  description: "Start your AI journey with TiDB Cloud Starter. Follow our quickstart guide to build your first AI-powered application in minutes, or explore specific examples for your use case."
  buttons:
    - text: "Try TiDB Cloud Starter"
      url: "https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme"
      type: "primary"
      external: true
    - text: "View Quickstart Guide"
      url: "/ai/quickstart/"
      type: "secondary"
      external: false 


================================================
FILE: mkdocs.yml
================================================
site_name: TiDB for AI
site_url: https://pingcap.github.io
repo_name: pingcap/tidb
repo_url: https://github.com/pingcap/tidb
edit_uri: https://github.com/pingcap/pingcap.github.io/tree/main/src
docs_dir: src
watch:
  - src

theme:
  name: material
  logo: assets/logo.svg
  favicon: assets/favicon.ico
  color_mode: dark
  palette:
    # Palette toggle for dark mode
    - scheme: tidb-dark
      primary: custom
      toggle:
        icon: material/weather-night
        name: Switch to light mode
    # Palette toggle for light mode
    - scheme: tidb-light
      primary: custom
      toggle:
        icon: material/weather-sunny
        name: Switch to dark mode
  features:
    - content.code.copy
    - content.tabs.link
    - content.action.edit
    - content.tooltips
    - toc.follow
    - navigation.top
    - navigation.tabs
    - navigation.tabs.sticky
    - navigation.footer
    - navigation.tracking
    - navigation.instant
    - content.footnote.tooltips
  icon:
    repo: fontawesome/brands/github
    annotation: material/arrow-right-circle

plugins:
  - search
  # - autorefs
  # - mkdocs-jupyter
  - redirects:
      redirect_maps:
        "index.md": "ai/index.md"
        "ai/integrations/mcp.md": "ai/integrations/tidb-mcp-server.md"

markdown_extensions:
  - admonition
  - footnotes
  - pymdownx.critic
  - pymdownx.caret
  - pymdownx.keys
  - pymdownx.mark
  - pymdownx.tilde
  - pymdownx.details
  - pymdownx.highlight:
      anchor_linenums: true
      line_spans: __span
      pygments_lang_class: true
  - pymdownx.inlinehilite
  - pymdownx.snippets:
      base_path: ..
      dedent_subsections: true
  - pymdownx.superfences
  - pymdownx.tabbed:
      alternate_style: true
  - md_in_html
  - abbr
  - attr_list
  - pymdownx.emoji:
      emoji_index: !!python/name:material.extensions.emoji.twemoji
      emoji_generator: !!python/name:material.extensions.emoji.to_svg
  - markdown.extensions.toc:
      baselevel: 1
      permalink: ""
  - pymdownx.arithmatex:
      generic: true

nav:
  - Home:
    - TiDB for AI: ai/index.md
    - 🚀 Quickstart: ai/quickstart.md
    - 📖 Concepts:
      - Vector Search: ai/concepts/vector-search.md
    - 📚 Guides:
      - Connect to database: ai/guides/connect.md
      - Working with tables: ai/guides/tables.md
      - Vector Search: ai/guides/vector-search.md
      - Fulltext Search: ai/guides/fulltext-search.md
      - Hybrid Search: ai/guides/hybrid-search.md
      - Image Search: ai/guides/image-search.md
      - Auto Embedding: ai/guides/auto-embedding.md
      - Reranking: ai/guides/reranking.md
      - Filtering: ai/guides/filtering.md
    - 💡 Demos:
      - Gallery: ai/examples/index.md
      - Basic Usage: ai/examples/basic-with-pytidb.md
      - Auto Embedding: ai/examples/auto-embedding-with-pytidb.md
      - Vector Search: ai/examples/vector-search-with-pytidb.md
      - Fulltext Search: ai/examples/fulltext-search-with-pytidb.md
      - Hybrid Search: ai/examples/hybrid-search-with-pytidb.md
      - Image Search: ai/examples/image-search-with-pytidb.md
      - RAG: ai/examples/rag-with-pytidb.md
      - Memory: ai/examples/memory-with-pytidb.md
      - Text2SQL: ai/examples/text2sql-with-pytidb.md
    - 🔌 Integrations:
      - MCP Integration:
        - TiDB MCP Server: ai/integrations/tidb-mcp-server.md
      - IDE & Tool Integration:
        - Cursor: ai/integrations/tidb-mcp-cursor.md
        - Claude Desktop: ai/integrations/tidb-mcp-claude-desktop.md
        - Claude Code: ai/integrations/tidb-mcp-claude-code.md
        - Visual Studio Code: ai/integrations/tidb-mcp-vscode.md
        - Windsurf: ai/integrations/tidb-mcp-windsurf.md
      - AI Frameworks:
        - LlamaIndex: ai/integrations/llamaindex.md
        - LangChain: ai/integrations/langchain.md
      - Embeddings:
        - Overview: ai/integrations/embedding-overview.md
        - TiDB Cloud Hosted: ai/integrations/embedding-tidb-cloud-hosted.md
        - OpenAI: ai/integrations/embedding-openai.md
        - OpenAI Compatible: ai/integrations/embedding-openai-compatible.md
        - Cohere: ai/integrations/embedding-cohere.md
        - Jina AI: ai/integrations/embedding-jinaai.md
        - Google Gemini: ai/integrations/embedding-gemini.md
        - Hugging Face: ai/integrations/embedding-huggingface.md
        - NVIDIA NIM: ai/integrations/embedding-nvidia-nim.md

  - Concepts:
    - Vector Search: ai/concepts/vector-search.md
  - Guides:
    - Connect to database: ai/guides/connect.md
    - Working with tables: ai/guides/tables.md
    - Vector Search: ai/guides/vector-search.md
    - Fulltext Search: ai/guides/fulltext-search.md
    - Hybrid Search: ai/guides/hybrid-search.md
    - Image Search: ai/guides/image-search.md
    - Auto Embedding: ai/guides/auto-embedding.md
    - Reranking: ai/guides/reranking.md
    - Filtering: ai/guides/filtering.md
    - Transaction: ai/guides/transaction.md
    - Raw Queries: ai/guides/raw-queries.md
    - Multiple Table Joins: ai/guides/joins.md
  - Demos:
    - Gallery: ai/examples/index.md
    - Basic Usage: ai/examples/basic-with-pytidb.md
    - Auto Embedding: ai/examples/auto-embedding-with-pytidb.md
    - Vector Search: ai/examples/vector-search-with-pytidb.md
    - Fulltext Search: ai/examples/fulltext-search-with-pytidb.md
    - Hybrid Search: ai/examples/hybrid-search-with-pytidb.md
    - Image Search: ai/examples/image-search-with-pytidb.md
    - RAG: ai/examples/rag-with-pytidb.md
    - Memory: ai/examples/memory-with-pytidb.md
    - Text2SQL: ai/examples/text2sql-with-pytidb.md
  - Integrations:
    - MCP Integration:
      - TiDB MCP Server: ai/integrations/tidb-mcp-server.md
    - IDE & Tool Integration:
      - Cursor: ai/integrations/tidb-mcp-cursor.md
      - Claude Desktop: ai/integrations/tidb-mcp-claude-desktop.md
      - Claude Code: ai/integrations/tidb-mcp-claude-code.md
      - Visual Studio Code: ai/integrations/tidb-mcp-vscode.md
      - Windsurf: ai/integrations/tidb-mcp-windsurf.md
    - AI Frameworks:
      - LlamaIndex: ai/integrations/llamaindex.md
      - LangChain: ai/integrations/langchain.md
    - Embeddings:
      - Overview: ai/integrations/embedding-overview.md
      - TiDB Cloud Hosted: ai/integrations/embedding-tidb-cloud-hosted.md
      - OpenAI: ai/integrations/embedding-openai.md
      - OpenAI Compatible: ai/integrations/embedding-openai-compatible.md
      - Cohere: ai/integrations/embedding-cohere.md
      - Jina AI: ai/integrations/embedding-jinaai.md
      - Google Gemini: ai/integrations/embedding-gemini.md
      - Hugging Face: ai/integrations/embedding-huggingface.md
      - NVIDIA NIM: ai/integrations/embedding-nvidia-nim.md

extra:
  social:
    - icon: fontawesome/brands/github
      link: https://github.com/pingcap/pytidb
    - icon: fontawesome/brands/x-twitter
      link: https://twitter.com/TiDB_Developer
    - icon: fontawesome/brands/linkedin
      link: https://www.linkedin.com/company/pingcap
  analytics:
    provider: google
    property: G-W2XGDSWFBH

extra_css:
  - styles/extra.css

extra_javascript:
  - https://unpkg.com/mathjax@3/es5/tex-mml-chtml.js


================================================
FILE: pyproject.toml
================================================
# TiDB for AI Documentation
# This project uses UV for fast dependency management
# Install UV: curl -LsSf https://astral.sh/uv/install.sh | sh
# Install deps: make install (which runs: uv pip install -e .)

[project]
name = "docs"
version = "0.1.0"
description = "The documentation for TiDB and its AI capabilities"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "mkdocs-material>=9.6.12",
    "mkdocs-jupyter>=0.25.1",
    "mkdocstrings[python]>=0.29.1",
    "mkdocs>=1.6.1",
    "mkdocs-redirects>=1.2.2",
    "requests>=2.31.0",
    "PyYAML>=6.0",
    "Jinja2>=3.1.0",
    "click>=8.0.0",
]


================================================
FILE: scripts/check_dependencies.py
================================================
#!/usr/bin/env python3
"""Simple dependency check for TiDB for AI documentation."""

import sys
import subprocess

def main():
    print("🚀 TiDB for AI Documentation - Quick Check")
    
    # Check UV
    try:
        result = subprocess.run(["uv", "--version"], capture_output=True, text=True)
        if result.returncode == 0:
            print("✅ UV is available")
        else:
            print("❌ UV not working")
            return 1
    except FileNotFoundError:
        print("❌ UV not found - Install: curl -LsSf https://astral.sh/uv/install.sh | sh")
        return 1
    
    # Check basic imports
    try:
        import mkdocs, requests, yaml
        print("✅ Dependencies are installed")
    except ImportError as e:
        print(f"❌ Missing dependency: {e}")
        print("💡 Run: make install")
        return 1
    
    print("🎉 Ready to go! Run 'make serve' to start.")
    return 0

if __name__ == "__main__":
    sys.exit(main()) 

================================================
FILE: scripts/generate_demos.py
================================================
#!/usr/bin/env python3
"""
Generate demo gallery and individual demo documentation pages.
This script reads configuration from demos.yml and generates:
1. Individual demo pages using src/templates/demo_page_template.j2
2. Gallery index page using src/templates/demo_gallery_template.j2
"""

import sys
import requests
import re
import yaml
from pathlib import Path
from jinja2 import Environment, FileSystemLoader
import click

# Configuration
LOCAL_EXAMPLES_DIR = Path("src/ai/examples")
CONFIG_FILE = Path("demos.yml")
DEMO_TEMPLATE_FILE = Path("src/templates/demo_page_template.j2")
GALLERY_TEMPLATE_FILE = Path("src/templates/demo_gallery_template.j2")
OUTPUT_FILE = LOCAL_EXAMPLES_DIR / "index.md"


def load_config():
    """Load configuration from YAML file."""
    if not CONFIG_FILE.exists():
        raise click.ClickException(f"Configuration file {CONFIG_FILE} not found.")
    
    try:
        with open(CONFIG_FILE, 'r', encoding='utf-8') as f:
            config = yaml.safe_load(f)
        return config
    except yaml.YAMLError as e:
        raise click.ClickException(f"Error parsing {CONFIG_FILE}: {e}")


def fetch_readme_from_doc_link(doc_link):
    """Fetch README.md content from the provided doc_link."""
    raw_url = doc_link
    # Directory links need README.md appended before the URL is converted
    if not raw_url.endswith("/README.md"):
        raw_url = raw_url.rstrip('/') + '/README.md'
    # Convert GitHub tree URL to raw content URL
    if "github.com" in raw_url and "/tree/" in raw_url:
        raw_url = raw_url.replace("github.com", "raw.githubusercontent.com").replace("/tree/", "/")
    
    try:
        response = requests.get(raw_url, timeout=30)
        response.raise_for_status()
        return response.text
    except requests.RequestException as e:
        click.echo(f"Error fetching README from {raw_url}: {e}", err=True)
        return None


def extract_repo_info_from_doc_link(doc_link):
    """Extract repository and path information from doc_link."""
    if "github.com" not in doc_link:
        return None, None, None, None
        
    # Parse URL to extract owner, repo, and path
    parts = doc_link.replace("https://github.com/", "").split("/")
    if len(parts) < 2:
        return None, None, None, None
        
    owner, repo = parts[0], parts[1]
    
    # Extract path after /tree/branch/
    if "/tree/" in doc_link:
        try:
            tree_index = parts.index("tree")
            if len(parts) > tree_index + 2:  # owner/repo/tree/branch/path...
                branch = parts[tree_index + 1]
                path_parts = parts[tree_index + 2:]
                # Remove README.md if present
                if path_parts and path_parts[-1] == "README.md":
                    path_parts = path_parts[:-1]
                path = "/".join(path_parts)
                return owner, repo, branch, path
        except ValueError:
            pass
    
    return None, None, None, None


def process_readme_content(content, demo_config):
    """Process README content to adapt it for the documentation site."""
    if not content:
        return None
    
    # Extract repository info
    owner, repo, branch, example_path = extract_repo_info_from_doc_link(demo_config['doc_link'])
    
    if not all([owner, repo, branch, example_path]):
        click.echo(f"Warning: Could not extract repo info from {demo_config['doc_link']}", err=True)
        return content
    
    base_repo_url = f"https://github.com/{owner}/{repo}"
    base_raw_url = f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}"
    
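    # Fix relative image links first so they resolve to raw file URLs.
    # (The generic link pattern below also matches image syntax, so running
    # it first would turn image paths into absolute /tree/ page URLs.)
    processed_content = re.sub(
        r'!\[([^\]]*)\]\((?!https?://)(.*?)\)',
        f'![\\1]({base_raw_url}/{example_path}/\\2)',
        content
    )
    
    # Fix the remaining relative links to point to the original repository
    processed_content = re.sub(
        r'\]\((?!https?://)(.*?)\)',
        f']({base_repo_url}/tree/{branch}/{example_path}/\\1)',
        processed_content
    )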
    
    return processed_content


def create_demo_page(demo_config, content):
    """Create a markdown file for a demo using the demo template."""
    if not DEMO_TEMPLATE_FILE.exists():
        raise click.ClickException(f"Demo template file {DEMO_TEMPLATE_FILE} not found.")
    
    # Create the local examples directory if it doesn't exist
    LOCAL_EXAMPLES_DIR.mkdir(parents=True, exist_ok=True)
    
    try:
        # Set up Jinja2 environment
        env = Environment(loader=FileSystemLoader("src/templates"))
        template = env.get_template('demo_page_template.j2')
        
        # Render the template
        rendered_content = template.render(
            demo=demo_config,
            content=content
        )
        
        # Create the markdown file
        filename = f"{demo_config['id'].replace('_', '-')}-with-pytidb.md"
        filepath = LOCAL_EXAMPLES_DIR / filename
        
        with open(filepath, 'w', encoding='utf-8') as f:
            f.write(rendered_content)
        
        return filepath
    except Exception as e:
        raise click.ClickException(f"Error creating demo page for {demo_config['id']}: {e}")


def generate_gallery_page(config):
    """Generate the gallery page using Jinja2 template."""
    if not GALLERY_TEMPLATE_FILE.exists():
        raise click.ClickException(f"Gallery template file {GALLERY_TEMPLATE_FILE} not found.")
    
    try:
        # Set up Jinja2 environment
        env = Environment(loader=FileSystemLoader("src/templates"))
        template = env.get_template('demo_gallery_template.j2')
        
        # Render the template
        rendered_content = template.render(
            config=config,
            categories=config.get('categories', []),
            demos=config.get('demos', [])
        )
        
        # Write the rendered content to the output file
        with open(OUTPUT_FILE, 'w', encoding='utf-8') as f:
            f.write(rendered_content)
        
        return True
    except Exception as e:
        raise click.ClickException(f"Error generating gallery page: {e}")


def sync_demo_docs(config, fetch_from_remote=True):
    """Sync demo documentation files based on demos.yml configuration."""
    if not fetch_from_remote:
        return []
    
    created_files = []
    demos_config = config.get('demos', [])
    
    with click.progressbar(demos_config, label='Processing demos') as demos:
        for demo_config in demos:
            demo_id = demo_config['id']
            doc_link = demo_config.get('doc_link')
            
            if not doc_link:
                click.echo(f"Warning: No doc_link found for demo '{demo_id}', skipping...", err=True)
                continue
            
            # Fetch README content
            readme_content = fetch_readme_from_doc_link(doc_link)
            
            if readme_content:
                # Process content
                processed_content = process_readme_content(readme_content, demo_config)
                
                if processed_content:
                    # Create demo page
                    filepath = create_demo_page(demo_config, processed_content)
                    if filepath:
                        created_files.append((demo_id, filepath))
                else:
                    click.echo(f"Failed to process content for {demo_id}", err=True)
            else:
                click.echo(f"Failed to fetch README for {demo_id}", err=True)
    
    return created_files


@click.command()
@click.option('--skip-demos', is_flag=True, 
              help='Skip generating individual demo pages from remote repositories')
@click.option('--skip-gallery', is_flag=True,
              help='Skip generating the demo gallery index page')
@click.option('--verbose', '-v', is_flag=True, help='Enable verbose output')
def main(skip_demos, skip_gallery, verbose):
    """Generate demo gallery and individual demo documentation pages.
    
    This script reads configuration from demos.yml and generates:
    1. Individual demo pages using src/templates/demo_page_template.j2
    2. Gallery index page using src/templates/demo_gallery_template.j2
    
    By default, both demo pages and gallery are generated.
    """
    if verbose:
        click.echo("Running in verbose mode...")
    
    # Load configuration
    config = load_config()
    
    created_files = []
    
    # Generate demo pages (unless skipped)
    if not skip_demos:
        if verbose:
            click.echo("Generating demo pages from remote repositories...")
        created_files = sync_demo_docs(config, fetch_from_remote=True)
        
        if created_files:
            click.echo(f"\n✅ Generated {len(created_files)} demo pages:")
            for demo_id, filepath in created_files:
                click.echo(f"   • {demo_id} → {filepath}")
        elif verbose:
            click.echo("No demo pages were generated.")
    else:
        if verbose:
            click.echo("Skipping demo page generation...")
    
    # Generate gallery page (unless skipped)
    if not skip_gallery:
        if verbose:
            click.echo("Generating gallery page from template...")
        
        if generate_gallery_page(config):
            click.echo(f"✅ Gallery page generated: {OUTPUT_FILE}")
        else:
            raise click.ClickException("Failed to generate gallery page.")
    else:
        if verbose:
            click.echo("Skipping gallery page generation...")
    
    # Check if nothing was generated
    if skip_demos and skip_gallery:
        click.echo("⚠️  Both demos and gallery generation were skipped. Nothing to do.")
    else:
        click.echo("\n🎉 Done! You can now run 'mkdocs serve' to view the documentation site.")


if __name__ == "__main__":
    main() 

================================================
FILE: scripts/sync_from_tidb_docs.py
================================================
import requests
import re
import sys


def remove_custom_content_blocks(content):
    """Remove <CustomContent ...>...</CustomContent> blocks from the content."""
    return re.sub(r"<CustomContent[\s\S]*?</CustomContent>", "", content)


def collapse_extra_blank_lines(content):
    """Collapse runs of 3 or more newlines into a single blank line."""
    return re.sub(r"\n{3,}", "\n\n", content)


def convert_note_blocks(content):
    """Convert '> **Note:**' blocks to '!!! note' syntax with indented content."""

    def note_repl(m):
        note_body = re.sub(r"^> ?", "", m.group(2), flags=re.MULTILINE).strip()
        indented = "\n".join(
            "    " + line if line.strip() else "" for line in note_body.splitlines()
        )
        return "!!! note\n\n" + indented + "\n\n"

    return re.sub(r"> \*\*Note:\*\*\n((?:> *\n)*)(> .*(?:\n|$)+)", note_repl, content)


def convert_warning_blocks(content):
    """Convert '> **Warning:**' blocks to '!!! warning' syntax with indented content."""

    def warning_repl(m):
        warning_body = re.sub(r"^> ?", "", m.group(2), flags=re.MULTILINE).strip()
        indented = "\n".join(
            "    " + line if line.strip() else "" for line in warning_body.splitlines()
        )
        return "!!! warning\n\n" + indented + "\n\n"

    return re.sub(r"> \*\*Warning:\*\*\n((?:> *\n)*)(> .*(?:\n|$)+)", warning_repl, content)


def convert_tip_blocks(content):
    """Convert '> **Tip**' blocks to '!!! tip' syntax with indented content."""

    def tip_repl(m):
        tip_body = re.sub(r"^> ?", "", m.group(2), flags=re.MULTILINE).strip()
        indented = "\n".join(
            "    " + line if line.strip() else "" for line in tip_body.splitlines()
        )
        return "!!! tip\n\n" + indented + "\n\n"

    return re.sub(r"> \*\*Tip:?\*\*\n((?:> *\n)*)(> .*(?:\n|$)+)", tip_repl, content)


def remove_see_also_section(content):
    """Remove the '## See also' section and everything after it."""
    return re.sub(r"## See also[\s\S]*$", "", content)


def replace_image_paths(content):
    """Replace image paths to point to the local assets directory."""
    return content.replace(
        "/media/vector-search/embedding-search.png", "../../assets/embedding-search.png"
    )


def replace_relative_doc_links(content):
    """Replace relative doc links with full tidbcloud doc links, remove .md suffix and 'vector-search/' in path."""

    def link_repl(m):
        path = m.group(1)
        # Remove leading /, ./ or ../
        path = re.sub(r"^/|^\./|^\.\./", "", path)
        path = path.replace("vector-search/", "")  # Remove 'vector-search/' directory
        return f"(https://docs.pingcap.com/tidbcloud/{path})"

    return re.sub(r"\(((?:/|\./|\.\./)[^)]+?)\.md\)", link_repl, content)


def remove_overview_from_title(content):
    """Remove 'Overview' from the main title if present."""
    return re.sub(
        r"^(# .*)Overview(.*)$",
        lambda m: m.group(1).rstrip() + m.group(2) + "\n",
        content,
        flags=re.MULTILINE,
    )


def remove_front_matter(content):
    """Remove YAML front matter if present."""
    return re.sub(r"^---[\s\S]*?---\n", "", content)


def remove_simpletab_blocks(content):
    """Remove <SimpleTab> and <div label=...> ... </div> blocks, flattening their content."""
    content = re.sub(r"<SimpleTab>|</SimpleTab>", "", content)
    content = re.sub(r'<div label="[^"]*">', "", content)
    content = re.sub(r"</div>", "", content)
    return content


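def process_overview():
    """Sync the vector search overview page from the pingcap/docs repository."""
    url = "https://raw.githubusercontent.com/pingcap/docs/refs/heads/master/vector-search/vector-search-overview.md"
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # Fail loudly instead of saving an error page
    content = response.text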
    content = remove_custom_content_blocks(content)
    content = collapse_extra_blank_lines(content)
    content = convert_note_blocks(content)
    content = remove_see_also_section(content)
    content = replace_image_paths(content)
    content = replace_relative_doc_links(content)
    content = remove_overview_from_title(content)
    save_to_file(content, "./src/ai/concepts/vector-search.md")


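def process_llamaindex():
    """Sync the LlamaIndex integration page from the pingcap/docs repository."""
    url = "https://raw.githubusercontent.com/pingcap/docs/master/vector-search/vector-search-integrate-with-llamaindex.md"
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # Fail loudly instead of saving an error page
    content = response.text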
    content = remove_front_matter(content)
    content = remove_custom_content_blocks(content)
    content = remove_simpletab_blocks(content)
    content = collapse_extra_blank_lines(content)
    content = convert_warning_blocks(content)
    content = convert_note_blocks(content)
    content = convert_tip_blocks(content)
    content = remove_see_also_section(content)
    content = replace_image_paths(content)
    content = replace_relative_doc_links(content)
    save_to_file(content, "./src/ai/integrations/llamaindex.md")


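def process_langchain():
    """Sync the LangChain integration page from the pingcap/docs repository."""
    url = "https://raw.githubusercontent.com/pingcap/docs/master/vector-search/vector-search-integrate-with-langchain.md"
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # Fail loudly instead of saving an error page
    content = response.text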
    content = remove_front_matter(content)
    content = remove_custom_content_blocks(content)
    content = remove_simpletab_blocks(content)
    content = collapse_extra_blank_lines(content)
    content = convert_warning_blocks(content)
    content = convert_note_blocks(content)
    content = convert_tip_blocks(content)
    content = remove_see_also_section(content)
    content = replace_image_paths(content)
    content = replace_relative_doc_links(content)
    save_to_file(content, "./src/ai/integrations/langchain.md")


def save_to_file(content, filename):
    """Save the processed content to a file."""
    with open(filename, "w", encoding="utf-8") as f:
        f.write(content)


def main():
    if len(sys.argv) > 1:
        arg = sys.argv[1].lower()
        if arg in ("--llamaindex", "llamaindex"):
            process_llamaindex()
            print("llamaindex doc synced.")
            return
        elif arg in ("--langchain", "langchain"):
            process_langchain()
            print("langchain doc synced.")
            return
        elif arg in ("--overview", "overview"):
            process_overview()
            print("overview doc synced.")
            return
        else:
            print(f"Unknown argument: {arg}")
            print("Usage: python sync_from_tidb_docs.py [--llamaindex|--langchain|--overview]")
            return

    process_overview()
    print("overview doc synced (default).")


if __name__ == "__main__":
    main()


================================================
FILE: src/ai/concepts/vector-search.md
================================================
---
title: Vector Search Overview
summary: Learn about Vector Search in TiDB. This feature provides an advanced search solution for performing semantic similarity searches across various data types, including documents, images, audio, and video.
---

# Vector Search


Vector search offers a powerful solution for semantic similarity searches across diverse data types, such as documents, images, audio, and video. It allows developers to leverage their MySQL expertise to build scalable applications enriched with generative AI capabilities, simplifying the integration of advanced search functionality.

!!! note

    The vector search feature is only available for TiDB Self-Managed clusters and [TiDB Cloud Starter](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-cloud-starter) clusters.

## Concepts

Vector search is a search method that prioritizes the meaning of your data to deliver relevant results.

Unlike traditional full-text search, which relies on exact keyword matching and word frequency, vector search converts various data types (such as text, images, or audio) into high-dimensional vectors and queries based on the similarity between these vectors. This search method captures the semantic meaning and contextual information of the data, leading to a more precise understanding of user intent.

Even when the search terms do not exactly match the content in the database, vector search can still provide results that align with the user's intent by analyzing the semantics of the data.

For example, a full-text search for "a swimming animal" only returns results containing these exact keywords. In contrast, vector search can return results for other swimming animals, such as fish or ducks, even if these results do not contain the exact keywords.
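
To make the contrast concrete, the sketch below compares naive keyword matching with a similarity ranking over hand-written three-dimensional vectors. The vectors are invented for illustration (a real embedding model produces far more dimensions), and `keyword_search` and `vector_search` are hypothetical helpers, not TiDB APIs.

```python
import math

# Toy "embeddings": the three numbers are invented for illustration and
# loosely mean (lives in water, is an animal, is a vehicle).
DOCS = {
    "Fish breathe through gills.": [0.9, 0.9, 0.0],
    "Ducks paddle across the pond.": [0.7, 0.9, 0.0],
    "A sedan has four doors.": [0.0, 0.0, 1.0],
}
QUERY_TEXT = "a swimming animal"
QUERY_VEC = [0.8, 1.0, 0.0]  # invented vector standing in for the query text


def keyword_search(query, docs):
    """Return documents that contain every word of the query verbatim."""
    words = query.lower().split()
    return [d for d in docs if all(w in d.lower().split() for w in words)]


def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar in direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def vector_search(query_vec, docs, k=2):
    """Return the k documents whose vectors are closest to the query vector."""
    ranked = sorted(docs, key=lambda d: cosine_distance(query_vec, docs[d]))
    return ranked[:k]


keyword_hits = keyword_search(QUERY_TEXT, DOCS)
vector_hits = vector_search(QUERY_VEC, DOCS)
print(keyword_hits)
print(vector_hits)
```

With these toy values, keyword matching finds nothing because no document contains all three query words, while the vector ranking surfaces the fish and duck documents ahead of the unrelated one.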

### Vector embedding

A vector embedding, also known as an embedding, is a sequence of numbers that represents real-world objects in a high-dimensional space. It captures the meaning and context of unstructured data, such as documents, images, audio, and videos.

Vector embeddings are essential in machine learning and serve as the foundation for semantic similarity searches.

TiDB introduces [Vector data types](https://docs.pingcap.com/tidbcloud/vector-search-data-types) and [Vector search index](https://docs.pingcap.com/tidbcloud/vector-search-index) designed to optimize the storage and retrieval of vector embeddings, enhancing their use in AI applications. You can store vector embeddings in TiDB and perform vector search queries to find the most relevant data using these data types.

### Embedding model

Embedding models are algorithms that transform data into [vector embeddings](#vector-embedding).

Choosing an appropriate embedding model is crucial for ensuring the accuracy and relevance of semantic search results. For unstructured text data, you can find top-performing text embedding models on the [Massive Text Embedding Benchmark (MTEB) Leaderboard](https://huggingface.co/spaces/mteb/leaderboard).

To learn how to generate vector embeddings for your specific data types, refer to integration tutorials or examples of embedding models.

## How vector search works

After converting raw data into vector embeddings and storing them in TiDB, your application can execute vector search queries to find the data most semantically or contextually relevant to a user's query.

TiDB vector search identifies the top-k nearest neighbor (KNN) vectors by using a [distance function](https://docs.pingcap.com/tidbcloud/vector-search-functions-and-operators) to calculate the distance between the given vector and vectors stored in the database. The vectors closest to the given vector in the query represent the most similar data in meaning.

![The Schematic TiDB Vector Search](../../assets/embedding-search.png)
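
The top-k lookup described above can be sketched as a brute-force scan in plain Python. This is only an illustration of the idea under stated assumptions: `top_k`, `l2_distance`, and `cosine_distance` are hypothetical helpers, the stored vectors are made up, and a database would normally rely on an index rather than scoring every row.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance: sensitive to both direction and magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity: sensitive to direction only."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query, rows, k, distance):
    """Exact KNN: score every stored vector, keep the k smallest distances."""
    scored = ((distance(query, vec), row_id) for row_id, vec in rows.items())
    return [row_id for _, row_id in heapq.nsmallest(k, scored)]


# Hypothetical stored vectors keyed by row id.
rows = {
    1: [1.0, 0.0],
    2: [10.0, 1.0],  # almost the same direction as the query, but far away
    3: [0.0, 1.0],
    4: [1.0, 1.0],
}
query = [2.0, 0.0]

nearest_l2 = top_k(query, rows, k=2, distance=l2_distance)
nearest_cos = top_k(query, rows, k=2, distance=cosine_distance)
print(nearest_l2, nearest_cos)
```

The two calls return different neighbors for the same data, which is why the choice of distance function should match how the embedding model was trained.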

As a relational database with integrated vector search capabilities, TiDB enables you to store data and their corresponding vector representations (that is, vector embeddings) together in one database. You can choose any of the following ways for storage:

- Store data and their corresponding vector representations in different columns of the same table.
- Store data and their corresponding vector representations in different tables. In this case, you need to use `JOIN` queries to combine the tables when retrieving data.
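
The two layouts can be compared side by side with a small runnable sketch. It uses Python's built-in `sqlite3` module purely as a stand-in so that it runs without a cluster; SQLite has no vector type, so the embedding is stored as JSON text here, and all table and column names are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Layout 1: text and embedding side by side in one table.
conn.execute(
    "CREATE TABLE docs_inline (id INTEGER PRIMARY KEY, content TEXT, embedding TEXT)"
)
conn.execute(
    "INSERT INTO docs_inline VALUES (?, ?, ?)",
    (1, "TiDB supports vector search.", json.dumps([0.1, 0.2, 0.3])),
)

# Layout 2: text in one table, embeddings in another, linked by doc_id.
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute(
    "CREATE TABLE doc_embeddings (doc_id INTEGER REFERENCES docs(id), embedding TEXT)"
)
conn.execute("INSERT INTO docs VALUES (?, ?)", (1, "TiDB supports vector search."))
conn.execute(
    "INSERT INTO doc_embeddings VALUES (?, ?)", (1, json.dumps([0.1, 0.2, 0.3]))
)

inline_row = conn.execute("SELECT content, embedding FROM docs_inline").fetchone()
joined_row = conn.execute(
    "SELECT d.content, e.embedding FROM docs AS d "
    "JOIN doc_embeddings AS e ON e.doc_id = d.id"
).fetchone()

# Both layouts yield the same (content, embedding) pair.
print(joined_row[0], json.loads(joined_row[1]))
```

The single-table layout keeps reads simple, while the two-table layout lets the embeddings be regenerated or re-indexed without touching the source rows.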

## Use cases

### Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is an architecture designed to optimize the output of Large Language Models (LLMs). By using vector search, RAG applications can store vector embeddings in the database and retrieve relevant documents as additional context when the LLM generates responses, thereby improving the quality and relevance of the answers.
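
As a rough illustration of the "retrieved documents as additional context" step, the sketch below assembles a prompt from a question and a list of already-retrieved text chunks. `build_rag_prompt` and the prompt wording are assumptions made for this example; no database or LLM is called.

```python
def build_rag_prompt(question, retrieved_chunks, max_chunks=3):
    """Assemble an LLM prompt from a question and retrieved text chunks.

    `retrieved_chunks` is assumed to be ordered from most to least relevant,
    as a vector search would return it.
    """
    context = "\n".join(
        f"[{i}] {chunk}"
        for i, chunk in enumerate(retrieved_chunks[:max_chunks], start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Stand-in for the result of a vector search.
chunks = [
    "TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads.",
    "PyTiDB is a Python library for developers to connect to TiDB.",
]
prompt = build_rag_prompt("What workloads does TiDB support?", chunks)
print(prompt)
```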

### Semantic search

Semantic search is a search technology that returns results based on the meaning of a query, rather than simply matching keywords. It interprets the meaning across different languages and various types of data (such as text, images, and audio) using embeddings. Vector search algorithms then use these embeddings to find the most relevant data that satisfies the user's query.

### Recommendation engine

A recommendation engine is a system that proactively suggests content, products, or services that are relevant and personalized to users. It accomplishes this by creating embeddings that represent user behavior and preferences. These embeddings help the system identify similar items that other users have interacted with or shown interest in. This increases the likelihood that the recommendations will be both relevant and appealing to the user.
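
One simple content-based variant of this idea is to average the embeddings of the items a user liked and rank the remaining items by similarity to that average. The sketch below assumes made-up two-dimensional item embeddings and hypothetical helper names; production systems typically combine many more signals.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def recommend(item_vectors, liked_ids, k=1):
    """Rank items the user has not interacted with by similarity to their profile."""
    profile = mean_vector([item_vectors[i] for i in liked_ids])
    candidates = [i for i in item_vectors if i not in liked_ids]
    candidates.sort(
        key=lambda i: cosine_similarity(profile, item_vectors[i]), reverse=True
    )
    return candidates[:k]


# Invented item embeddings with the axes (sci-fi, cooking).
items = {
    "space-opera": [0.9, 0.1],
    "robot-novel": [0.8, 0.2],
    "alien-thriller": [0.95, 0.05],
    "pasta-cookbook": [0.1, 0.9],
}
suggestions = recommend(items, liked_ids={"space-opera", "robot-novel"})
print(suggestions)
```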



================================================
FILE: src/ai/examples/auto-embedding-with-pytidb.md
================================================
---
title: Auto Embedding
description: "Automatically generate embeddings for your text data using built-in embedding models."
source_repo: "https://github.com/pingcap/pytidb/tree/main/examples/auto_embedding"
---

# Auto Embedding Demo

This example showcases how to use the auto embedding feature with PyTiDB Client.

* Connect to TiDB with PyTiDB Client
* Define a table with a VectorField configured for automatic embedding
* Insert plain text data; embeddings are populated automatically in the background
* Run vector searches with natural language queries; embedding happens transparently

## Prerequisites

- **Python 3.10+**
- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)

## How to run

**Step 1**: Clone the repository

```bash
git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/auto_embedding/
```

**Step 2**: Install the required packages

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt
```

**Step 3**: Set up the environment to connect to the database

Go to the [TiDB Cloud console](https://tidbcloud.com/clusters) to get the connection parameters, then set the environment variables like this:

```bash
cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=test

# Uses the TiDB Cloud free embedding model by default, which does not require an API key
EMBEDDING_PROVIDER=tidbcloud_free
EOF
```
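
For reference, the variables written to `.env` above can be turned into connection settings with a few lines of standard-library Python. This is only a sketch: the demo's own loading code is not shown here, `parse_env` and `connection_params` are hypothetical helpers, and the host, username, and password values are placeholders.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def connection_params(env):
    """Map TIDB_* variables to a plain dict of connection settings."""
    return {
        "host": env["TIDB_HOST"],
        "port": int(env.get("TIDB_PORT", "4000")),
        "username": env["TIDB_USERNAME"],
        "password": env.get("TIDB_PASSWORD", ""),
        "database": env.get("TIDB_DATABASE", "test"),
    }


# Placeholder values in the same shape as the .env file above.
sample = """\
TIDB_HOST=gateway01.example.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME=abc123.root
TIDB_PASSWORD=secret
TIDB_DATABASE=test

# comment lines are ignored
EMBEDDING_PROVIDER=tidbcloud_free
"""
params = connection_params(parse_env(sample))
print(params["host"], params["port"])
```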

**Step 4**: Run the demo

```bash
python main.py
```

**Expected output:**

```plain
=== Define embedding function ===
Embedding function (model id: tidbcloud_free/amazon/titan-embed-text-v2) defined

=== Define table schema ===
Table created

=== Truncate table ===
Table truncated

=== Insert sample data ===
Inserted 3 chunks

=== Perform vector search ===
id: 1, text: TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads., distance: 0.30373281240458805
id: 2, text: PyTiDB is a Python library for developers to connect to TiDB., distance: 0.422506501973434
id: 3, text: LlamaIndex is a Python library for building AI-powered applications., distance: 0.5267239638442787
```


---

## Related Resources

- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/auto_embedding)
- **Category**: Getting-Started

- **Description**: Automatically generate embeddings for your text data using built-in embedding models.


[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary } 

================================================
FILE: src/ai/examples/basic-with-pytidb.md
================================================
---
title: Basic Usage
description: "Learn fundamental PyTiDB operations including database connection, table creation, and data manipulation."
source_repo: "https://github.com/pingcap/pytidb/tree/main/examples/basic"
---

# Basic CRUD Demo

This example demonstrates basic CRUD (Create, Read, Update, Delete) operations with PyTiDB.

* Use PyTiDB Client to connect to TiDB
* Create a table with text, vector, and JSON columns
* Perform basic CRUD operations on data

## Prerequisites

- **Python 3.10+**
- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)

## How to run

**Step 1**: Clone the repository locally

```bash
git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/basic/
```

**Step 2**: Install the required packages

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt
```

**Step 3**: Set up the environment to connect to the database

Go to the [TiDB Cloud console](https://tidbcloud.com/clusters) and get the connection parameters, then set the environment variables like this:

```bash
cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=test
EOF
```

**Step 4**: Run the demo

```bash
python main.py
```

*Expected output:*

```plain
=== CREATE TABLE ===
Table created

=== TRUNCATE TABLE ===
Table truncated

=== CREATE ===
Created 3 items

=== READ ===
ID: 1, Content: TiDB is a distributed SQL database, Metadata: {'category': 'database'}
ID: 2, Content: GPT-4 is a large language model, Metadata: {'category': 'llm'}
ID: 3, Content: LlamaIndex is a Python library for building AI-powered applications, Metadata: {'category': 'rag'}

=== UPDATE ===
Updated item #1
After update - ID: 1, Content: TiDB Cloud Starter is a fully-managed, auto-scaling cloud database service, Metadata: {'category': 'dbass'}

=== DELETE ===
Deleted item #2

=== FINAL STATE ===
ID: 1, Content: TiDB Cloud Starter is a fully-managed, auto-scaling cloud database service, Metadata: {'category': 'dbass'}
ID: 3, Content: LlamaIndex is a Python library for building AI-powered applications, Metadata: {'category': 'rag'}

=== COUNT ROWS ===
Number of rows: 2

=== DROP TABLE ===
Table dropped

Basic CRUD operations completed!
```

---

## Related Resources

- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/basic)
- **Category**: Getting-Started

- **Description**: Learn fundamental PyTiDB operations including database connection, table creation, and data manipulation.


[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary } 

================================================
FILE: src/ai/examples/fulltext-search-with-pytidb.md
================================================
---
title: Fulltext Search
description: "Perform traditional text search using MySQL fulltext search capabilities."
source_repo: "https://github.com/pingcap/pytidb/tree/main/examples/fulltext_search"
---

# Fulltext Search Example

This example demonstrates how to build an e-commerce product search application using TiDB's full-text search feature with multilingual support. Users can search for products by keywords in their preferred language.

<p align="center">
  <img width="700" alt="E-commerce product search with full-text search" src="https://github.com/user-attachments/assets/c81ddad4-f996-4b1f-85c0-5cbb55bc2a3a" />
  <p align="center"><i>E-commerce product search with full-text search</i></p>
</p>

## Prerequisites

- **Python 3.10+**
- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)

## How to run

**Step 1**: Clone the repository locally

```bash
git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/fulltext_search/
```

**Step 2**: Install the required packages and set up the environment

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt
```

**Step 3**: Set up the environment to connect to the database

Go to the [TiDB Cloud console](https://tidbcloud.com/), create a new cluster if you don't have one, and then get the connection parameters from the connection dialog.

```bash
cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=pytidb_fulltext_demo
EOF
```

**Step 4**: Run the Streamlit app

```bash
streamlit run app.py
```

**Step 5**: Open the browser and visit `http://localhost:8501`


---

## Related Resources

- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/fulltext_search)
- **Category**: Search

- **Description**: Perform traditional text search using MySQL fulltext search capabilities.


[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary } 

================================================
FILE: src/ai/examples/hybrid-search-with-pytidb.md
================================================
---
title: Hybrid Search
description: "Combine vector search and fulltext search for more comprehensive results."
source_repo: "https://github.com/pingcap/pytidb/tree/main/examples/hybrid_search"
---

# Hybrid Search Demo

In this demo, we will show you how to use hybrid search to combine vector search and full-text search on a set of documents.

<p align="center">
    <img src="https://github.com/user-attachments/assets/6e1c639d-2160-44c8-86b4-958913b9eca5" alt="TiDB Hybrid Search Demo" width="700"/>
    <p align="center"><i>TiDB Hybrid Search Demo</i></p>
</p>

## Prerequisites

* Python 3.10+
* TiDB database instance (👉 [Create a free TiDB Serverless Cluster](https://tidbcloud.com/free-trial))
* OpenAI API key (Go to [OpenAI](https://platform.openai.com/api-keys) to get the API key)

> **Note**
> 
> Currently, full-text search is only available for the following product option and region:
>
> - TiDB Cloud Starter: Frankfurt (eu-central-1), Singapore (ap-southeast-1)

## How to run

**Step 1**: Clone the repository

```bash
git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/hybrid_search/
```

**Step 2**: Install the required packages and set up the environment

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt
```

**Step 3**: Set up the environment to connect to the database

If you are using TiDB Cloud, you can find the connection parameters in the [TiDB Cloud console](https://tidbcloud.com/).

```bash
cat > .env <<EOF
TIDB_HOST=localhost
TIDB_PORT=4000
TIDB_USERNAME=root
TIDB_PASSWORD=
TIDB_DATABASE=pytidb_hybrid_demo
OPENAI_API_KEY=<your-openai-api-key>
EOF
```

**Step 4**: Run the demo

**Option 1**: Run the Streamlit app

If you want to check the demo with a web UI, you can run the following command:

```bash
streamlit run app.py
```

Open the browser and visit `http://localhost:8501`

**Option 2**: Run the demo script

If you want to check the demo with a script, you can run the following command:

```bash
python example.py
```

Expected output:

```plain
=== CONNECT TO TIDB ===
Connected to TiDB.

=== CREATE TABLE ===
Table created.

=== INSERT SAMPLE DATA ===
Inserted 3 rows.

=== PERFORM HYBRID SEARCH ===
Search results:
[
    {
        "_distance": 0.4740166257687124,
        "_match_score": 1.6804268,
        "_score": 0.03278688524590164,
        "id": 60013,
        "text": "TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads."
    },
    {
        "_distance": 0.6428459116216618,
        "_match_score": 0.78427225,
        "_score": 0.03200204813108039,
        "id": 60015,
        "text": "LlamaIndex is a Python library for building AI-powered applications."
    },
    {
        "_distance": 0.641581407158715,
        "_match_score": null,
        "_score": 0.016129032258064516,
        "id": 60014,
        "text": "PyTiDB is a Python library for developers to connect to TiDB."
    }
]
```
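
The `_score` values in this output are numerically consistent with reciprocal rank fusion (RRF) using a constant of 60, where each result list contributes `1 / (60 + rank)` for every document it contains. That reading is inferred from the numbers above rather than from the demo's source, so treat the following sketch as an assumption; `rrf_scores` is a hypothetical helper, not a PyTiDB function.

```python
def rrf_scores(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over every list an id appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


# Rankings derived from the output above.
vector_ranking = [60013, 60014, 60015]  # ascending _distance
fulltext_ranking = [60013, 60015]       # descending _match_score; 60014 has no match

scores = rrf_scores([vector_ranking, fulltext_ranking])
for doc_id, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(doc_id, score)
```

Under this assumption, a document ranked first by both searches scores `2/61`, and a document that appears in only one list receives a single term, which matches the lower score of the row whose `_match_score` is `null`.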



---

## Related Resources

- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/hybrid_search)
- **Category**: Search

- **Description**: Combine vector search and fulltext search for more comprehensive results.


[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary } 

================================================
FILE: src/ai/examples/image-search-with-pytidb.md
================================================
---
title: Image Search
description: "Build an image search application using multimodal embeddings for both text-to-image and image-to-image search."
source_repo: "https://github.com/pingcap/pytidb/tree/main/examples/image_search"
---

# Pet Image Search Demo

This example showcases how to build a powerful image search application by combining TiDB's vector search capabilities with multimodal embedding models.

With just a few lines of code, you can create an intelligent search system that understands both text and images.

- 🔍 **Text-to-Image Search**: Find the perfect pet photos by describing what you're looking for in natural language, such as "fluffy orange cat"
- 🖼️ **Image-to-Image Search**: Upload a photo and instantly discover visually similar pets based on breed, color, pose and more

<p align="center">
  <img width="700" alt="PyTiDB Image Search Demo" src="https://github.com/user-attachments/assets/7ba9733a-4d1f-4094-8edb-58731ebd08e9" />
  <p align="center"><i>Pet image search via multimodal embeddings</i></p>
</p>


## Prerequisites

- **Python 3.10+**
- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)
- **Jina AI API Key**: Get your free API key at [jina.ai Embeddings ↗️](https://jina.ai/embeddings/)

## How to run

**Step 1**: Clone the repository locally

```bash
git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/image_search/
```

**Step 2**: Install the required packages

```bash
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r reqs.txt
```

**Step 3**: Set up environment variables

Go to the [TiDB Cloud console](https://tidbcloud.com/clusters) and get the connection parameters, then set the environment variables like this:

```bash
cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=test

JINA_AI_API_KEY={your-jina-ai-api-key}
EOF
```

**Step 4**: Download and extract the dataset

In this demo, we will use the [Oxford Pets dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/) to load pet images into the database for search.

*For Linux/macOS:*

```bash
# Download the dataset
curl -L -o oxford_pets.tar.gz "https://thor.robots.ox.ac.uk/~vgg/data/pets/images.tar.gz"

# Extract the dataset
mkdir -p oxford_pets
tar -xzf oxford_pets.tar.gz -C oxford_pets
```

**Step 5**: Run the app

```bash
streamlit run app.py
```

Open `http://localhost:8501` in your browser.

**Step 6**: Load data

In the sample app, you can click the **Load Sample Data** button to load some sample data into the database.

Or if you want to load all the data in the Oxford Pets dataset, click the **Load All Data** button.

**Step 7**: Search

1. Select the **Search type** in the sidebar
2. Input a text description of the pet you're looking for, or upload a photo of a dog or cat
3. Click the **Search** button


---

## Related Resources

- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/image_search)
- **Category**: Search

- **Description**: Build an image search application using multimodal embeddings for both text-to-image and image-to-image search.


[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary } 

================================================
FILE: src/ai/examples/index.md
================================================
---
title: Demo Gallery
description: Explore hands-on demos showcasing how TiDB empowers AI applications. Get started quickly with TiDB Cloud Starter to build your own AI-powered solutions.
hide:
  - navigation
  - toc
  - pageTitle
  - editButton 
---

<style>

/* CSS Variables */
:root {
  --brand-color: #de243d;
  --brand-hover: #b71e34;
  --border-radius-sm: 0.5rem;
  --border-radius-md: 0.75rem;
  --border-radius-lg: 1rem;
  --spacing-sm: 1rem;
  --spacing-md: 2rem;
  --spacing-lg: 3rem;
  --transition-fast: 0.2s;
  --transition-normal: 0.3s;
  --dark-overlay: rgba(255, 255, 255, 0.08);
  --dark-border: rgba(255, 255, 255, 0.1);
  --dark-bg-subtle: rgba(255, 255, 255, 0.05);
}

/* Smooth scrolling for the entire page */
html {
  scroll-behavior: smooth;
}

/* Gallery Container */
.gallery-container {
  max-width: 1280px;
  margin: 0 auto;
  padding: var(--spacing-md) var(--spacing-sm);
}

/* Header */
.gallery-header {
  text-align: center;
  margin-bottom: var(--spacing-lg);
}

.gallery-title {
  font-size: 72px !important;
  font-weight: 800 !important;
  margin-bottom: 8px !important;
  line-height: 1 !important;
  color: var(--md-default-fg-color) !important;
}

.gallery-description {
  font-size: 22px !important;
  color: var(--md-default-fg-color--light) !important;
  padding: 0 120px;
  margin-bottom: 5rem !important;
}

/* Gallery CTA link styles */
.gallery-cta-link {
  position: relative;
  text-decoration: none;
  transition: all 0.3s ease-in-out;
}

.gallery-cta-link:hover {
  border-bottom: 3px solid var(--brand-color);
}


/* Layout */
.gallery-layout {
  display: flex;
  gap: var(--spacing-sm) !important;
}

/* Sidebar */
.gallery-sidebar {
  width: 8rem;
  flex-shrink: 0;
}

.sidebar-nav {
  position: sticky;
  top: 140px;
}

.sidebar-title {
  font-size: 14px !important;
  font-weight: 400 !important;
  color: var(--md-default-fg-color--light) !important;
  text-transform: uppercase;
  letter-spacing: 0.05em;
  margin: 0 0 0.5rem 0 !important;
}

.sidebar-links {
  display: flex;
  flex-direction: column;
  gap: 2px;
  margin-left: -12px;
}

.sidebar-link {
  display: block;
  padding: 8px 12px;
  border-radius: var(--border-radius-sm);
  font-size: 14px;
  font-weight: 400 !important;
  color: var(--md-default-fg-color--light) !important;
  text-decoration: none !important;
  transition: all var(--transition-fast) ease;
  text-align: left;
  cursor: pointer;
}

.sidebar-link:hover {
  background-color: var(--md-default-fg-color--lightest) !important;
  color: var(--md-default-fg-color) !important;
  font-weight: 500 !important;
  transform: translateX(2px);
}

.sidebar-link:focus-visible {
  outline: 2px solid var(--brand-color);
  outline-offset: 2px;
}

/* Content */
.gallery-content {
  flex: 1;
  padding: 0 var(--spacing-lg);
}

.gallery-section {
  margin-bottom: var(--spacing-lg);
  scroll-margin-top: 120px;
}

.section-title {
  font-size: 24px !important;
  font-weight: 700 !important;
  color: var(--md-default-fg-color) !important;
  margin: 0 0 1.5rem 0 !important;
}

.cards-grid {
  display: grid;
  grid-template-columns: repeat(3, 1fr);
  gap: var(--spacing-sm) !important;
}

/* Cards */
.gallery-card {
  display: block;
  background: var(--md-default-bg-color);
  border: 1px solid var(--md-default-fg-color--lightest);
  border-radius: var(--border-radius-md);
  overflow: hidden;
  transition: all var(--transition-normal) ease;
  text-decoration: none !important;
  color: inherit;
}

.gallery-card:hover {
  transform: translateY(-2px);
  box-shadow: var(--md-shadow-z2);
  text-decoration: none !important;
  outline: 2px solid var(--brand-color);
  outline-offset: 2px;
}


.gallery-card:hover .card-title {
  color: var(--brand-color) !important;
}

.card-image {
  height: 8rem;
  position: relative;
  overflow: hidden;
  background-color: var(--md-default-fg-color--lightest);
  border-bottom: 1px solid var(--md-default-fg-color--lightest);
}

.card-image img {
  width: 100%;
  height: 100%;
  object-fit: cover;
}

.card-gradient {
  display: flex;
  align-items: center;
  justify-content: center;
  font-size: 2.5rem;
}

.card-badge {
  position: absolute;
  top: 0.5rem;
  left: 0.5rem;
  background: rgba(0, 0, 0, 0.7);
  color: white;
  padding: 0.2rem 0.4rem;
  border-radius: 0.2rem;
  font-size: 0.5rem !important;
  font-weight: 500 !important;
}

.card-content {
  padding: 12px;
}

.card-title {
  font-size: 0.75rem !important;
  font-weight: 600 !important;
  line-height: 1.5 !important;
  margin: 0 !important;
  color: var(--md-default-fg-color) !important;
}

.card-description {
  color: var(--md-default-fg-color--light) !important;
  font-size: 0.65rem !important;
  line-height: 1.5;
  display: -webkit-box;
  margin: 0;
  -webkit-line-clamp: 3;
  -webkit-box-orient: vertical;
  overflow: hidden;
}

/* CTA */
.gallery-cta {
  background: linear-gradient(135deg, rgba(222, 36, 61, 0.08) 0%, rgba(99, 102, 241, 0.08) 100%);
  border: 1px solid rgba(222, 36, 61, 0.1);
  border-radius: var(--border-radius-lg);
  padding: var(--spacing-md);
  text-align: center;
  margin-top: var(--spacing-lg);
}

.cta-title {
  font-size: 1.5rem;
  font-weight: 700;
  color: var(--md-default-fg-color);
  margin-bottom: var(--spacing-sm);
}

.cta-description {
  color: var(--md-default-fg-color--light);
  margin: 0 auto var(--spacing-md);
  max-width: 42rem;
}

.cta-buttons {
  display: flex;
  justify-content: center;
  gap: var(--spacing-sm);
  flex-wrap: wrap;
}

/* Button shared styles */
.btn-primary,
.btn-secondary {
  padding: 0.75rem 1.5rem;
  border-radius: var(--border-radius-sm);
  font-weight: 500;
  text-decoration: none !important;
  transition: all var(--transition-fast);
}

.btn-primary {
  background-color: var(--brand-color);
  color: white !important;
}

.btn-primary:hover {
  background-color: var(--brand-hover);
  color: white !important;
}

.btn-secondary {
  border: 1px solid var(--md-default-fg-color--lighter);
  background-color: var(--md-default-bg-color);
  color: var(--md-default-fg-color) !important;
}

.btn-secondary:hover {
  background-color: var(--md-default-fg-color--lightest);
  color: var(--md-default-fg-color) !important;
}

/* Dark mode styles */
[data-md-color-scheme="tidb-dark"] .sidebar-link:hover {
  background-color: var(--dark-overlay) !important;
}

[data-md-color-scheme="tidb-dark"] .gallery-card {
  border-color: var(--dark-border);
}

[data-md-color-scheme="tidb-dark"] .card-image {
  background-color: var(--dark-bg-subtle);
  border-bottom-color: var(--dark-border);
}

[data-md-color-scheme="tidb-dark"] .gallery-cta {
  background: linear-gradient(135deg, rgba(222, 36, 61, 0.12) 0%, rgba(99, 102, 241, 0.12) 100%);
  border-color: rgba(222, 36, 61, 0.2);
}

[data-md-color-scheme="tidb-dark"] .btn-secondary:hover {
  background-color: rgba(255, 255, 255, 0.1);
}

/* Responsive adjustments */
@media (max-width: 768px) {
  .gallery-layout { flex-direction: column; }
  .gallery-sidebar { width: 100%; }
  .sidebar-nav { position: static; }
  .sidebar-links { flex-direction: row; gap: 0.5rem; flex-wrap: wrap; }
  .sidebar-link { padding: 12px 16px !important; min-height: 44px; display: flex; align-items: center; }
  .gallery-content { padding: 0; }
  .gallery-description { padding: 0 20px; }
  .cards-grid { grid-template-columns: 1fr; }
  .gallery-title { font-size: 48px !important; }
}

@media (max-width: 1024px) and (min-width: 769px) {
  .cards-grid { grid-template-columns: repeat(2, 1fr); }
}

</style>



<div class="gallery-container">
  <div class="gallery-header">
    <h1 class="gallery-title">Demo Gallery</h1>
    <p class="gallery-description">
      Explore hands-on demos showcasing how TiDB empowers AI applications.<br>
Get started quickly with <a href='https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=demo_gallery' target='_blank' rel='noopener noreferrer' class='gallery-cta-link'><b>TiDB Cloud Starter</b></a> to build your own AI-powered solutions.

    </p>
  </div>

  <div class="gallery-layout">
    <div class="gallery-sidebar">
      <div class="sidebar-nav">
        <h6 class="sidebar-title">Categories</h6>
                          <nav class="sidebar-links">
            <a href="#featured" class="sidebar-link">Featured</a>
            <a href="#getting-started" class="sidebar-link">Getting Started</a>
            <a href="#search" class="sidebar-link">Search & Retrieval</a>
            <a href="#ai-apps" class="sidebar-link">AI Applications</a>
        </nav>
      </div>
    </div>
              <div class="gallery-content">
        <section id="featured" class="gallery-section">
        <h2 class="section-title">⭐ Featured</h2>
        <div class="cards-grid">
          <a href="image-search-with-pytidb/" class="gallery-card">
            <div class="card-image">
              <img src="https://github.com/user-attachments/assets/7ba9733a-4d1f-4094-8edb-58731ebd08e9" alt="Image Search Demo">
            </div>
            <div class="card-content">
              <h3 class="card-title">Image Search</h3>
              <p class="card-description">
                Build an image search application using multimodal embeddings for both text-to-image and image-to-image search.
              </p>
            </div>
          </a>
          <a href="rag-with-pytidb/" class="gallery-card">
            <div class="card-image">
              <img src="https://github.com/user-attachments/assets/dfd85672-65ce-4a46-8dd2-9f77d826363e" alt="RAG Demo">
            </div>
            <div class="card-content">
              <h3 class="card-title">RAG</h3>
              <p class="card-description">
                Build a RAG application that combines document retrieval with language generation.
              </p>
            </div>
          </a>
          <a href="memory-with-pytidb/" class="gallery-card">
            <div class="card-image">
              <img src="https://github.com/user-attachments/assets/74dee96b-ea20-49dc-ad27-679faa5bf9b8" alt="Memory Demo">
            </div>
            <div class="card-content">
              <h3 class="card-title">Memory</h3>
              <p class="card-description">
                Implement conversation memory for chatbots and conversational AI applications.
              </p>
            </div>
          </a>
        </div>
      </section>
        <section id="getting-started" class="gallery-section">
        <h2 class="section-title">🚀 Getting Started</h2>
        <div class="cards-grid">
          <a href="basic-with-pytidb/" class="gallery-card">
            <div class="card-image card-gradient" style="background: linear-gradient(135deg, #10b981, var(--brand-color));">
              <div>⚙️</div>
            </div>
            <div class="card-content">
              <h3 class="card-title">Basic Usage</h3>
              <p class="card-description">
                Learn fundamental PyTiDB operations including database connection, table creation, and data manipulation.
              </p>
            </div>
          </a>
          <a href="auto-embedding-with-pytidb/" class="gallery-card">
            <div class="card-image card-gradient" style="background: radial-gradient(circle at center, #8b5cf6 0%, var(--brand-color) 100%);">
              <div>🤖</div>
            </div>
            <div class="card-content">
              <h3 class="card-title">Auto Embedding</h3>
              <p class="card-description">
                Automatically generate embeddings for your text data using built-in embedding models.
              </p>
            </div>
          </a>
        </div>
      </section>
        <section id="search" class="gallery-section">
        <h2 class="section-title">🔍 Search & Retrieval</h2>
        <div class="cards-grid">
          <a href="vector-search-with-pytidb/" class="gallery-card">
            <div class="card-image">
              <img src="https://github.com/user-attachments/assets/6d7783a5-ce9c-4dcc-8b95-49d5f0ca735a" alt="Vector Search Demo">
            </div>
            <div class="card-content">
              <h3 class="card-title">Vector Search</h3>
              <p class="card-description">
                Implement semantic search using vector embeddings to find similar content.
              </p>
            </div>
          </a>
          <a href="fulltext-search-with-pytidb/" class="gallery-card">
            <div class="card-image">
              <img src="https://github.com/user-attachments/assets/c81ddad4-f996-4b1f-85c0-5cbb55bc2a3a" alt="Fulltext Search Demo">
            </div>
            <div class="card-content">
              <h3 class="card-title">Fulltext Search</h3>
              <p class="card-description">
                Perform traditional text search using MySQL fulltext search capabilities.
              </p>
            </div>
          </a>
          <a href="hybrid-search-with-pytidb/" class="gallery-card">
            <div class="card-image">
              <img src="https://github.com/user-attachments/assets/6e1c639d-2160-44c8-86b4-958913b9eca5" alt="Hybrid Search Demo">
            </div>
            <div class="card-content">
              <h3 class="card-title">Hybrid Search</h3>
              <p class="card-description">
                Combine vector search and fulltext search for more comprehensive results.
              </p>
            </div>
          </a>
          <a href="image-search-with-pytidb/" class="gallery-card">
            <div class="card-image">
              <img src="https://github.com/user-attachments/assets/7ba9733a-4d1f-4094-8edb-58731ebd08e9" alt="Image Search Demo">
            </div>
            <div class="card-content">
              <h3 class="card-title">Image Search</h3>
              <p class="card-description">
                Build an image search application using multimodal embeddings for both text-to-image and image-to-image search.
              </p>
            </div>
          </a>
        </div>
      </section>
        <section id="ai-apps" class="gallery-section">
        <h2 class="section-title">🤖 AI Applications</h2>
        <div class="cards-grid">
          <a href="rag-with-pytidb/" class="gallery-card">
            <div class="card-image">
              <img src="https://github.com/user-attachments/assets/dfd85672-65ce-4a46-8dd2-9f77d826363e" alt="RAG Demo">
            </div>
            <div class="card-content">
              <h3 class="card-title">RAG</h3>
              <p class="card-description">
                Build a RAG application that combines document retrieval with language generation.
              </p>
            </div>
          </a>
          <a href="memory-with-pytidb/" class="gallery-card">
            <div class="card-image">
              <img src="https://github.com/user-attachments/assets/74dee96b-ea20-49dc-ad27-679faa5bf9b8" alt="Memory Demo">
            </div>
            <div class="card-content">
              <h3 class="card-title">Memory</h3>
              <p class="card-description">
                Implement conversation memory for chatbots and conversational AI applications.
              </p>
            </div>
          </a>
          <a href="text2sql-with-pytidb/" class="gallery-card">
            <div class="card-image card-gradient" style="background: linear-gradient(135deg, #06b6d4, var(--brand-color));">
              <div>💬</div>
            </div>
            <div class="card-content">
              <h3 class="card-title">Text2SQL</h3>
              <p class="card-description">
                Convert natural language queries into SQL statements using AI models.
              </p>
            </div>
          </a>
        </div>
      </section>
      <div class="gallery-cta">
        <h3 class="cta-title">Ready to build your AI application?</h3>
        <p class="cta-description">
          Start your AI journey with TiDB Cloud Starter. Follow our quickstart guide to build your first AI-powered application in minutes, or explore specific examples for your use case.
        </p>
        <div class="cta-buttons">
          <a href="https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme" target="_blank" rel="noopener noreferrer" class="btn-primary">Try TiDB Cloud Starter</a>
          <a href="/ai/quickstart/" class="btn-secondary">View Quickstart Guide</a>
        </div>
      </div>
    </div>
  </div>
</div> 

================================================
FILE: src/ai/examples/memory-with-pytidb.md
================================================
---
title: Memory
description: "Implement conversation memory for chatbots and conversational AI applications."
source_repo: "https://github.com/pingcap/pytidb/tree/main/examples/memory"
---

# AI Agent Memory Demo

This example showcases how to build an intelligent AI agent with persistent memory powered by TiDB's vector search capabilities.

With just a few lines of code, you can create a conversational AI that remembers past interactions and builds context over time.

- 🧠 **Persistent Memory**: The AI remembers conversations across sessions and user interactions
- 💬 **Interactive Chat**: Both web interface and command-line options for flexible interaction
- 👤 **Multi-User Support**: Different users can have separate memory contexts
- 🔍 **Real-Time Memory Viewing**: Visual display of all stored memories in the web interface

<p align="center">
    <img src="https://github.com/user-attachments/assets/b57ae0fb-9075-43a9-8690-edaa69ca9f40" alt="AI Agent with memory powered by TiDB" width="700"/>
    <p align="center"><i>AI Agent with memory powered by TiDB</i></p>
</p>

## Prerequisites

- **Python 3.10+**
- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)
- **OpenAI API Key**: Get your API key at [OpenAI Platform ↗️](https://platform.openai.com/api-keys)

## How to run

**Step 1**: Clone the repository to your local machine

```bash
git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/memory/
```

**Step 2**: Install the required packages

```bash
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r reqs.txt
```

**Step 3**: Set up environment variables

Go to the [TiDB Cloud console](https://tidbcloud.com/clusters) to get the connection parameters, then set the environment variables as follows:

```bash
cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=test

OPENAI_API_KEY={your-openai-api-key}
EOF
```
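
Before launching the app, it can help to confirm that `.env` defines every variable listed above. The following is a minimal sketch using only the standard library; the helper names `parse_env` and `missing_keys` are hypothetical and are not part of the example's code.

```python
# Keys taken from the .env template above.
REQUIRED_KEYS = (
    "TIDB_HOST",
    "TIDB_PORT",
    "TIDB_USERNAME",
    "TIDB_PASSWORD",
    "TIDB_DATABASE",
    "OPENAI_API_KEY",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

To use it, read the file contents (for example with `pathlib.Path(".env").read_text()`), pass them to `parse_env`, and check that `missing_keys` returns an empty list.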

**Step 4**: Run the application

Choose one of the following options:

**Option 1**: Launch Web Application:

```bash
streamlit run app.py
```

Visit `http://localhost:8501` in your browser and follow the instructions in the [Interact with memory in Web Application](https://github.com/pingcap/pytidb/tree/main/examples/memory/#interact-with-memory-in-web-application) section to start interacting with the memory-enabled AI assistant.

**Option 2**: Run Command Line Application:

```bash
python main.py
```

Follow the instructions in the [Interact with memory in Command Line Application](https://github.com/pingcap/pytidb/tree/main/examples/memory/#interact-with-memory-in-command-line-application) section to start interacting with the memory-enabled AI assistant.

## Interact with memory in Web Application

In the web application, you can chat with the AI assistant. The user interface includes:

- **Sidebar**: User settings and chat list.
- **Main chat area**: Chat interface with the AI assistant.
- **Memory viewer**: Real-time memory viewer showing stored facts.

Follow these steps to see how the memory works:

1. Introduce yourself in the default chat session. For example, "Hello, I am John. I work as a software engineer and love guitar."
2. You can see the information you provided in the memory viewer.
3. Click **New chat** in the sidebar to start a new chat session.
4. Ask "Who am I?" in the new chat session. The AI will recall your information from previous conversations.


## Interact with memory in Command Line Application

In the command line application, you can interact with the AI assistant and introduce yourself.

**Example conversation:**

```plain
Chat with AI (type 'exit' to quit)
You: Hello, I am Mini256.
AI: Hello, Mini256! How can I assist you today?
You: I am working at PingCAP.
AI: That's great to hear, Mini256! PingCAP is known for its work on distributed databases, particularly TiDB. How's your experience been working there?
You: I am developing pytidb (A Python SDK for TiDB) which helps developers easy to connect to TiDB.
AI: That sounds like a great project, Mini256! Developing a Python SDK for TiDB can make it much easier for developers to integrate with TiDB and interact with it using Python. If you need any advice on best practices, libraries to use, or specific features to implement, feel free to ask!
You: exit
Goodbye!
```

After the first conversation, the AI assistant will remember the information you provided and use it to answer future questions.

Now, you can start a new chat session and ask the AI assistant "Who am I?".

**Example conversation in another chat session:**

```plain
Chat with AI (type 'exit' to quit)
You: Who am I?
AI: You are Mini256, and you work at PingCAP, where you are developing pytidb, a Python SDK for TiDB to assist developers in easily connecting to TiDB.
You: exit
Goodbye!
```

As you can see, the AI assistant remembers you across sessions!
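
The cross-session recall shown above can be pictured as a per-user store of facts that is searched by similarity. The sketch below is a toy illustration only: `MemoryStore` and `embed` are hypothetical names, the bag-of-words "embedding" stands in for a real embedding model, and the actual example persists memories in TiDB rather than in process memory.

```python
import math
from collections import Counter, defaultdict


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    """Hypothetical in-memory store that keeps facts separately per user."""

    def __init__(self) -> None:
        self._facts = defaultdict(list)  # user_id -> [(fact, vector), ...]

    def remember(self, user_id: str, fact: str) -> None:
        self._facts[user_id].append((fact, embed(fact)))

    def recall(self, user_id: str, query: str, limit: int = 3) -> list[str]:
        """Return up to `limit` facts for this user, most similar first."""
        query_vec = embed(query)
        ranked = sorted(
            self._facts.get(user_id, []),
            key=lambda item: cosine_similarity(query_vec, item[1]),
            reverse=True,
        )
        return [fact for fact, _ in ranked[:limit]]
```

Because facts are keyed by user, a new chat session for the same user can recall earlier facts, while a different user starts with an empty memory.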



---

## Related Resources

- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/memory)
- **Category**: AI Applications

- **Description**: Implement conversation memory for chatbots and conversational AI applications.


[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary } 

================================================
FILE: src/ai/examples/rag-with-pytidb.md
================================================
---
title: RAG
description: "Build a RAG application that combines document retrieval with language generation."
source_repo: "https://github.com/pingcap/pytidb/tree/main/examples/rag"
---

# RAG Example

This example demonstrates how to use PyTiDB to build a minimal RAG application.

* Use Ollama to serve a local embedding model and a local LLM
* Use Streamlit to build a Web UI for the RAG application
* Use PyTiDB to build a minimal RAG application

<p align="center">
  <img src="https://github.com/user-attachments/assets/dfd85672-65ce-4a46-8dd2-9f77d826363e" alt="RAG application built with PyTiDB" width="600" />
  <p align="center"><i>RAG application built with PyTiDB</i></p>
</p>
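
In a RAG application, the retrieved document chunks are combined with the user's question before being sent to the LLM. The following is a minimal sketch of that prompt-assembly step, assuming the chunks have already been retrieved; `build_rag_prompt` and the prompt wording are hypothetical and are not taken from the example's source code.

```python
def build_rag_prompt(question: str, chunks: list[str], max_chunks: int = 3) -> str:
    """Combine retrieved chunks and the user question into a single prompt."""
    # Number each chunk so the model (and the reader) can tell them apart.
    context = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks[:max_chunks], start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Capping the number of chunks keeps the prompt within the model's context window; the right cap depends on the model and chunk size.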

## Prerequisites

- **Python 3.10+**
- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)
- **Ollama**: You can install it from [Ollama ↗️](https://ollama.com/download)

## How to run

**Step 1**: Prepare the inference API

Pull the embedding model and the LLM with the Ollama CLI:

```bash
ollama pull mxbai-embed-large
ollama pull gemma3:4b
ollama run gemma3:4b
```

Test the `/embed` and `/generate` endpoints to make sure they are running:

```bash
curl http://localhost:11434/api/embed -d '{
  "model": "mxbai-embed-large",
  "input": "Llamas are members of the camelid family"
}'
```

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:4b",
  "prompt": "Hello, Who are you?"
}'
```
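
The same two checks can be prepared from Python with the standard library. This sketch only builds the requests and does not send them; `build_request` is a hypothetical helper, and the `"stream": False` option is an assumption added here so that `/api/generate` returns a single JSON object instead of a stream.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434"


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for a local Ollama endpoint."""
    return urllib.request.Request(
        url=f"{OLLAMA_BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


embed_request = build_request(
    "/api/embed",
    {"model": "mxbai-embed-large", "input": "Llamas are members of the camelid family"},
)
generate_request = build_request(
    "/api/generate",
    {"model": "gemma3:4b", "prompt": "Hello, Who are you?", "stream": False},
)
```

With Ollama running, pass either request to `urllib.request.urlopen()` and decode the response body as JSON.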

**Step 2**: Clone the repository to your local machine

```bash
git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/rag/
```

**Step 3**: Install the required packages and set up the environment

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt
```

**Step 4**: Set up the environment to connect to the database

Go to the [TiDB Cloud console](https://tidbcloud.com/clusters) to get the connection parameters, then set the environment variables as follows:

```bash
cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=test
EOF
```

**Step 5**: Run the Streamlit app

```bash
streamlit run main.py
```

**Step 6**: Open the browser and visit `http://localhost:8501`

## Troubleshooting

### `502 Bad Gateway` Error

Try disabling your global proxy settings.


---

## Related Resources

- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/rag)
- **Category**: AI Applications

- **Description**: Build a RAG application that combines document retrieval with language generation.


[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary } 

================================================
FILE: src/ai/examples/text2sql-with-pytidb.md
================================================
---
title: Text2SQL
description: "Convert natural language queries into SQL statements using AI models."
source_repo: "https://github.com/pingcap/pytidb/tree/main/examples/text2sql"
---

# Text2SQL Demo

This demo showcases an AI-powered interface that converts natural language questions into SQL queries and executes them against TiDB. Built with PyTiDB, OpenAI GPT, and Streamlit, it provides a seamless way to interact with your database using plain English.

## Prerequisites

- **Python 3.10+**
- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)
- **OpenAI API Key**: Get your API key at [OpenAI Platform ↗️](https://platform.openai.com/api-keys)


## How to run

**Step 1**: Clone the repository

```bash
git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/text2sql/
```

**Step 2**: Install the required packages

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt
```

**Step 3**: Run the Streamlit app

```bash
streamlit run app.py
```

**Step 4**: Configure the app in your browser

Open your browser and visit `http://localhost:8501`, then:

* Enter your OpenAI API key in the left sidebar.
* Enter your TiDB connection string in the left sidebar, using a SQLAlchemy-style URL such as `mysql+pymysql://root@localhost:4000/test`.
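
A connection string in this format can be sanity-checked before you paste it into the sidebar. The following is a minimal sketch using only the standard library; `describe_connection_string` is a hypothetical helper, not part of the demo, and the fallback to port 4000 is an assumption based on the default port used throughout these examples.

```python
from urllib.parse import urlsplit


def describe_connection_string(url: str) -> dict:
    """Split a mysql+pymysql URL into its parts, raising ValueError if malformed."""
    parts = urlsplit(url)
    if parts.scheme != "mysql+pymysql":
        raise ValueError(f"expected scheme 'mysql+pymysql', got {parts.scheme!r}")
    database = parts.path.lstrip("/")
    if not parts.hostname or not database:
        raise ValueError("connection string must include a host and a database name")
    return {
        "username": parts.username,
        "host": parts.hostname,
        "port": parts.port or 4000,  # assumed default when no port is given
        "database": database,
    }
```

For the sample URL above, the helper reports user `root`, host `localhost`, port `4000`, and database `test`.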

---

## Related Resources

- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/text2sql)
- **Category**: AI Applications

- **Description**: Convert natural language queries into SQL statements using AI models.


[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary } 

================================================
FILE: src/ai/examples/vector-search-with-pytidb.md
================================================
---
title: Vector Search
description: "Implement semantic search using vector embeddings to find similar content."
source_repo: "https://github.com/pingcap/pytidb/tree/main/examples/vector_search"
---

# Vector Search Example

This example demonstrates how to build a semantic search application using TiDB and local embedding models. It leverages vector search to find similar items based on meaning, not just keywords. The app uses Streamlit for the web UI and Ollama for local embedding generation.

<p align="center">
  <img width="700" alt="Semantic search with vector embeddings" src="https://github.com/user-attachments/assets/6d7783a5-ce9c-4dcc-8b95-49d5f0ca735a" />
  <p align="center"><i>Semantic search with vector embeddings</i></p>
</p>

## Prerequisites

- **Python 3.10+**
- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)
- **Ollama**: You can install it from [Ollama ↗️](https://ollama.com/download)

## How to run

**Step 1**: Start the embedding service with Ollama

Pull the embedding model:

```bash
ollama pull mxbai-embed-large
```

Test the embedding service to make sure it is running:

```bash
curl http://localhost:11434/api/embed -d '{
  "model": "mxbai-embed-large",
  "input": "Llamas are members of the camelid family"
}'
```
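
If you want to check the reply programmatically, the sketch below extracts the first vector from the response body. It assumes the JSON response carries an `embeddings` field holding a list of vectors, which is how recent Ollama versions describe `/api/embed`; verify this against your installed version. `first_embedding` is a hypothetical helper.

```python
import json


def first_embedding(response_body: str) -> list[float]:
    """Return the first embedding vector from an /api/embed JSON response."""
    data = json.loads(response_body)
    embeddings = data.get("embeddings")  # assumed response field name
    if not embeddings or not embeddings[0]:
        raise ValueError("no embedding returned; is the model pulled and Ollama running?")
    return [float(x) for x in embeddings[0]]
```

The length of the returned list is the embedding dimension that the vector column in TiDB needs to match.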

**Step 2**: Clone the repository to your local machine

```bash
git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/vector_search/
```

**Step 3**: Install the required packages and set up the environment

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt
```

**Step 4**: Set up the environment to connect to TiDB

Go to the [TiDB Cloud console](https://tidbcloud.com/clusters) to get the connection parameters, then set the environment variables as follows:

```bash
cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=pytidb_vector_search
EOF
```

**Step 5**: Run the Streamlit app

```bash
streamlit run app.py
```

**Step 6**: Open your browser and visit `http://localhost:8501`


---

## Related Resources

- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/vector_search)
- **Category**: Search

- **Description**: Implement semantic search using vector embeddings to find similar content.


[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary } 

================================================
FILE: src/ai/guides/auto-embedding.md
================================================
# Auto Embedding

Auto embedding is a feature that allows you to automatically generate vector embeddings for text data.

!!! tip

    For the complete example code, see the [auto embedding example](https://github.com/pingcap/pytidb/blob/main/examples/auto_embedding).

## Basic Usage

This example uses an embedding model hosted by TiDB Cloud. For other providers, see the [Supported Providers](../integrations/embedding-overview.md#supported-providers) list.

### Step 1. Define an embedding function

=== "Python"

    Define an embedding function to generate vector embeddings for text data.

    ```python
    from pytidb.embeddings import EmbeddingFunction

    embed_func = EmbeddingFunction(
        model_name="tidbcloud_free/amazon/titan-embed-text-v2",
    )
    ```

### Step 2. Create a table and a vector field

=== "Python"

    Use `embed_func.VectorField()` to create a vector field on the table.

    To enable auto embedding, you need to set `source_field` to the field that you want to embed.

    ```python hl_lines="7"
    from pytidb.schema import TableModel, Field
    from pytidb.datatype import TEXT

    class Chunk(TableModel):
        id: int = Field(primary_key=True)
        text: str = Field(sa_type=TEXT)
        text_vec: list[float] = embed_func.VectorField(source_field="text")

    table = client.create_table(schema=Chunk, if_exists="overwrite")
    ```

    You don't need to specify the `dimensions` parameter because it is determined automatically by the embedding model. You can still set `dimensions` explicitly to override the default.

### Step 3. Insert some sample data

=== "Python"

    Insert some sample data into the table.

    ```python
    table.bulk_insert([
        Chunk(text="TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads."),
        Chunk(text="PyTiDB is a Python library for developers to connect to TiDB."),
        Chunk(text="LlamaIndex is a Python library for building AI-powered applications."),
    ])
    ```

    When inserting data, the `text_vec` field will be automatically populated with the vector embeddings generated based on the `text` field.

### Step 4. Perform a vector search

=== "Python"

    You can pass the query text directly to the `search()` method. The query text is embedded first, and the resulting vector is used for the vector search.

    ```python
    table.search("HTAP database").limit(3).to_list()
    ```
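
Conceptually, auto embedding means the table embeds the source field on insert and embeds the query text on search. The sketch below illustrates that flow with a toy in-memory table; it is not how PyTiDB is implemented. `AutoEmbeddingTable`, `toy_embed`, and the fixed vocabulary are hypothetical stand-ins for a real table and embedding model.

```python
import math

# Tiny fixed vocabulary so the toy embedding is deterministic.
VOCAB = ("database", "python", "library", "htap", "ai")


def toy_embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: counts vocabulary words."""
    words = [word.strip(".,") for word in text.lower().split()]
    return [float(words.count(term)) for term in VOCAB]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance (1 - cosine similarity); 1.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


class AutoEmbeddingTable:
    """Hypothetical table that fills `vector_field` from `source_field` on insert."""

    def __init__(self, embed, source_field: str, vector_field: str) -> None:
        self.embed = embed
        self.source_field = source_field
        self.vector_field = vector_field
        self.rows: list[dict] = []

    def insert(self, row: dict) -> dict:
        stored = dict(row)
        # The caller never supplies the vector; it is derived from the source text.
        stored[self.vector_field] = self.embed(stored[self.source_field])
        self.rows.append(stored)
        return stored

    def search(self, query_text: str, limit: int = 3) -> list[dict]:
        # The query text goes through the same embedding function as the rows.
        query_vec = self.embed(query_text)
        ranked = sorted(
            self.rows,
            key=lambda row: cosine_distance(query_vec, row[self.vector_field]),
        )
        return ranked[:limit]
```

Using the same embedding function for rows and queries is what makes the distances comparable.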


================================================
FILE: src/ai/guides/connect.md
================================================
# Connect to database

This guide shows how to connect to a TiDB database using the TiDB client.

## Install the dependencies

[pytidb](https://github.com/pingcap/pytidb) is a Python client built on [SQLAlchemy](https://sqlalchemy.org/). It provides a set of high-level APIs that help developers store and search vector embeddings without writing raw SQL.

To install the Python client, run the following command:

```bash
pip install pytidb
```

## Connect with connection parameters

Choose the steps based on your deployment type:

=== "TiDB Cloud Starter"

    You can create a starter cluster in [TiDB Cloud](https://tidbcloud.com/free-trial/), and then get the connection parameters from the web console.

    1. Navigate to the [Clusters page](https://tidbcloud.com/clusters), and then click the name of your target cluster to go to its overview page.
    2. Click **Connect** in the upper-right corner. A connection dialog is displayed, with connection parameters listed.
    3. Copy the connection parameters to your code or environment variables.

    Example code:

    ```python title="main.py"
    from pytidb import TiDBClient

    db = TiDBClient.connect(
        host="{gateway-region}.prod.aws.tidbcloud.com",
        port=4000,
        username="{prefix}.root",
        password="{password}",
        database="test",
    )
    ```

    !!! tip

        For TiDB Cloud Starter, [TLS connection to the database](https://docs.pingcap.com/tidbcloud/secure-connections-to-starter-clusters/) is required when using Public Endpoint. TiDB Client will **automatically** enable TLS connection for starter clusters.

=== "TiDB Self-Managed"

    You can follow [Quick Start with TiDB Self-Managed](https://docs.pingcap.com/tidb/stable/quick-start-with-tidb/#deploy-a-local-test-cluster) to deploy a TiDB cluster for testing.

    Example code:

    ```python title="main.py"
    from pytidb import TiDBClient

    db = TiDBClient.connect(
        host="{tidb_server_host}",
        port=4000,
        username="root",
        password="{password}",
        database="test",
    )
    ```

    !!! tip
    
        If you are using `tiup playground` to deploy a TiDB cluster for testing, the default host is `127.0.0.1` and the default password is empty.

Once connected, you can use the `db` object to operate tables, query data, and more.
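
If you keep the connection parameters in environment variables, a small helper can turn them into keyword arguments for `TiDBClient.connect()`. The following is a minimal sketch; the `TIDB_*` variable names and the `connection_kwargs` helper are assumptions chosen for illustration, and the defaults (port `4000`, empty password, database `test`) mirror the values shown in this guide.

```python
import os
from collections.abc import Mapping


def connection_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Map TIDB_* environment variables to connect() keyword arguments."""
    return {
        "host": env["TIDB_HOST"],  # required; raises KeyError if missing
        "port": int(env.get("TIDB_PORT", "4000")),
        "username": env["TIDB_USERNAME"],  # required
        "password": env.get("TIDB_PASSWORD", ""),
        "database": env.get("TIDB_DATABASE", "test"),
    }
```

You could then connect with `db = TiDBClient.connect(**connection_kwargs())`.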

## Connect with connection string

If you prefer to use a connection string (database URL), you can follow the format based on your deployment type:

=== "TiDB Cloud Starter"

    You can create a starter cluster in [TiDB Cloud](https://tidbcloud.com/free-trial/), and then get the connection parameters from the web console.

    1. Navigate to the [Clusters page](https://tidbcloud.com/clusters), and then click the name of your target cluster to go to its overview page.
    2. Click **Connect** in the upper-right corner. A connection dialog is displayed with the connection parameters listed.
    3. Copy the connection parameters and construct the connection string using the format below.

    ```python title="main.py"
    from pytidb import TiDBClient

    db = TiDBClient.connect(
        database_url="mysql+pymysql://{USERNAME}:{PASSWORD}@{HOST}:{PORT}/{DATABASE}?ssl_verify_cert=true&ssl_verify_identity=true",
    )
    ```

    !!! note
    
        For TiDB Cloud Starter, [TLS connection to the database](https://docs.pingcap.com/tidbcloud/secure-connections-to-starter-clusters/) is required when using Public Endpoint, so you need to set `ssl_verify_cert=true&ssl_verify_identity=true` in the connection string.

=== "TiDB Self-Managed"

    You can follow the format below to construct the connection string:

    ```python title="main.py"
    from pytidb import TiDBClient

    db = TiDBClient.connect(
        database_url="mysql+pymysql://{USERNAME}:{PASSWORD}@{HOST}:{PORT}/{DATABASE}",
    )
    ```

    !!! tip

        If you are using `tiup playground` to deploy a TiDB cluster for testing, the connection string is: 
        
        ```
        mysql+pymysql://root:@127.0.0.1:4000/test
        ```
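
If the username or password contains characters that are reserved in URLs (such as `@`, `:`, or `/`), they need to be percent-encoded before being placed in the connection string. The following is a minimal sketch using only the standard library; `build_database_url` is a hypothetical helper for illustration and is not part of PyTiDB.

```python
from urllib.parse import quote_plus


def build_database_url(username, password, host, port=4000, database="test", tls=False):
    """Assemble a mysql+pymysql URL, percent-encoding the credentials."""
    url = (
        f"mysql+pymysql://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )
    if tls:
        # Same query parameters as shown for TiDB Cloud Starter above.
        url += "?ssl_verify_cert=true&ssl_verify_identity=true"
    return url


# The `tiup playground` defaults: user `root`, empty password, local host.
playground_url = build_database_url("root", "", "127.0.0.1")
print(playground_url)
```

The resulting string can then be passed as `database_url` to `TiDBClient.connect()`.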

## Connect with SQLAlchemy DB engine

If your application already has an existing SQLAlchemy database engine, you can reuse the engine through the `db_engine` parameter:

```python title="main.py"
from pytidb import TiDBClient

db = TiDBClient(db_engine=db_engine)
```

## Next Steps

After connecting to your TiDB database, you can explore the following guides to learn how to work with your data:

- [Working with Tables](./tables.md): Learn how to define and manage tables in TiDB.
- [Vector Search](./vector-search.md): Perform semantic search using vector embeddings.
- [Fulltext Search](./fulltext-search.md): Retrieve documents using keyword-based search.
- [Hybrid Search](./hybrid-search.md): Combine vector and full-text search for more relevant results.


================================================
FILE: src/ai/guides/filtering.md
================================================
# Filtering

As a relational database, TiDB supports a rich set of [SQL operators](https://docs.pingcap.com/tidbcloud/operators/) and allows flexible combinations of filtering conditions that enable you to query your data precisely.

## Overview

You can filter on both scalar fields and JSON fields. Filtering on JSON fields is often used for [metadata filtering](./vector-search.md#metadata-filtering) in vector search.

=== "Python"

For PyTiDB, you can apply filtering by passing a **filters** parameter to the `table.query()`, `table.delete()`, `table.update()`, and `table.search()` methods.

The **filters** parameter supports two formats: [Dictionary Filters](#dictionary-filters) and [SQL String Filters](#sql-string-filters).

## Dictionary Filters

=== "Python"

PyTiDB allows you to define filter conditions using a Python dictionary with operators as the **filters** parameter.

The dictionary structure of **filters** is as follows:

```python
{
    "<key>": {
        "<operator>": <value>
    },
    ...
}
```

- `<key>`: The key can be a column name, a JSON path expression to access a JSON field (see [Metadata filtering](./vector-search.md#metadata-filtering)), or a [logical operator](#logical-operators).
- `<operator>`: The operator can be a [compare operator](#compare-operators) or an [inclusion operator](#inclusion-operators).
- `<value>`: The value can be a scalar or an array, depending on the operator.

**Example: Filter records where `created_at` is greater than 2024-01-01**

```python
table.query({
    # The `created_at` is a scalar field with DATETIME type
    "created_at": {
        "$gt": "2024-01-01"
    }
})
```

**Example: Filter records where `meta.category` is in the array ["tech", "science"]**

```python
results = (
    table.search("some query", search_type="vector")
        .filter({
            # The `meta` is a JSON field, and its value is a JSON object like {"category": "tech"}
            "meta.category": {
                "$in": ["tech", "science"]
            }
        })
        .limit(10)
        .to_list()
)
```

### Compare operators

You can use the following compare operators to filter records:

| Operator | Description                       |
|----------|-----------------------------------|
| `$eq`    | Equal to value                    |
| `$ne`    | Not equal to value                |
| `$gt`    | Greater than value                |
| `$gte`   | Greater than or equal to value    |
| `$lt`    | Less than value                   |
| `$lte`   | Less than or equal to value       |

**Example: Filter records where `user_id` is equal to 1**

```python
{
    "user_id": {
        "$eq": 1
    }
}
```

You can omit the `$eq` operator. The following query is equivalent to the above:

```python
{
    "user_id": 1
}
```

### Inclusion operators

You can use the following inclusion operators to filter records:

| Operator | Description                          |
|----------|--------------------------------------|
| `$in`    | In array (string, int, or float)     |
| `$nin`   | Not in array (string, int, or float) |

**Example: Filter records where `category` is in the array ["tech", "science"]**

```python
{
    "category": {
        "$in": ["tech", "science"]
    }
}
```

### Logical operators

You can use the logical operators `$and` and `$or` to combine multiple filters.

| Operator | Description                                            |
|----------|--------------------------------------------------------|
| `$and`   | Returns results that match **all** filters in the list |
| `$or`    | Returns results that match **any** filter in the list  |

**Syntax for using `$and` or `$or`:**

```python
{
    "$and|$or": [
        {
            "field_name": {
                "<operator>": <value>
            }
        },
        {
            "field_name": {
                "<operator>": <value>
            }
        },
        ...
    ]
}
```

**Example: Use `$and` to combine multiple filters**

```python
{
    "$and": [
        {
            "created_at": {
                "$gt": "2024-01-01"
            }
        },
        {
            "meta.category": {
                "$in": ["tech", "science"]
            }
        }
    ]
}
```
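
To make the semantics of the operators concrete, the sketch below translates a filter dictionary into a parameterized SQL condition. It is an illustration only and is not PyTiDB's implementation: the function names, the `%s` placeholder style, the restriction to one-level JSON paths, and the mapping of `meta.category` to `meta->>'$.category'` are all assumptions made for this example.

```python
import re

COMPARE_OPS = {"$eq": "=", "$ne": "!=", "$gt": ">", "$gte": ">=", "$lt": "<", "$lte": "<="}
INCLUSION_OPS = {"$in": "IN", "$nin": "NOT IN"}
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def column_expr(key):
    """Map "col" to col, and "col.field" to the JSON path col->>'$.field'."""
    column, _, json_field = key.partition(".")
    if not IDENTIFIER.match(column) or (json_field and not IDENTIFIER.match(json_field)):
        raise ValueError(f"unsupported key: {key!r}")
    return f"{column}->>'$.{json_field}'" if json_field else column


def to_where(filters, params=None):
    """Return (sql, params) for a filter dictionary."""
    params = [] if params is None else params
    clauses = []
    for key, condition in filters.items():
        if key in ("$and", "$or"):
            joiner = " AND " if key == "$and" else " OR "
            parts = [to_where(sub, params)[0] for sub in condition]
            clauses.append("(" + joiner.join(parts) + ")")
            continue
        if not isinstance(condition, dict):
            condition = {"$eq": condition}  # a bare value is shorthand for $eq
        for op, value in condition.items():
            if op in COMPARE_OPS:
                clauses.append(f"{column_expr(key)} {COMPARE_OPS[op]} %s")
                params.append(value)
            elif op in INCLUSION_OPS and value:
                placeholders = ", ".join(["%s"] * len(value))
                clauses.append(f"{column_expr(key)} {INCLUSION_OPS[op]} ({placeholders})")
                params.extend(value)
            else:
                raise ValueError(f"unsupported operator or empty array: {op!r}")
    # Several keys in one dictionary are combined with AND.
    return " AND ".join(clauses), params


sql, params = to_where({
    "$and": [
        {"created_at": {"$gt": "2024-01-01"}},
        {"meta.category": {"$in": ["tech", "science"]}},
    ]
})
print(sql)
print(params)
```

Values are collected as bound parameters rather than interpolated into the string, which is the same idea as the parameterized SQL described in the [Raw Queries](./raw-queries.md) guide.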

## SQL String Filters

=== "Python"

You can also use a SQL string as the `filters` parameter. The string should be a valid SQL `WHERE` clause (without the `WHERE` keyword) using TiDB's SQL syntax.

**Example: Filter records where `created_at` is greater than 2024-01-01**

```python
results = table.query(
    filters="created_at > '2024-01-01'",
    limit=10
).to_list()
```

**Example: Filter records where the JSON field `meta.category` equals 'tech'**

```python
results = table.query(
    filters="meta->>'$.category' = 'tech'",
    limit=10
).to_list()
```

You can combine multiple conditions using `AND`, `OR`, and parentheses, and use any [SQL operators](https://docs.pingcap.com/tidbcloud/operators/) supported by TiDB.

!!! warning

    When using SQL string filters with dynamic user input, always validate the input to prevent [SQL injection](https://en.wikipedia.org/wiki/SQL_injection) vulnerabilities.
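
One way to validate input is to parse it into a typed value and build the filter from the parsed value rather than from the raw text. The sketch below does this for a date; `created_after_filter` is a hypothetical helper, not a PyTiDB function.

```python
from datetime import date


def created_after_filter(user_input):
    """Build a `created_at > '...'` filter from untrusted text.

    date.fromisoformat() raises ValueError for anything that is not an
    ISO 8601 date, so quotes or SQL keywords never reach the filter string.
    """
    parsed = date.fromisoformat(user_input.strip())
    return f"created_at > '{parsed.isoformat()}'"


safe_filter = created_after_filter("2024-01-01")
print(safe_filter)
```

Because the returned string is formatted from the parsed `date` object, it can only ever contain a date literal. Where the value is free-form text, a dictionary filter avoids string building altogether.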


================================================
FILE: src/ai/guides/fulltext-search.md
================================================
# Full-text Search

**Full-text search** enables you to find documents or data by matching keywords or phrases within the entire text content. It is widely used in search engines, document management, e-commerce, and any scenario where users need to search large volumes of unstructured or semi-structured text.

TiDB provides full-text search capabilities for **massive datasets** with high performance and built-in **multilingual support**.

!!! note

    Full-text search is currently at an early stage with limited availability. It is only available for **TiDB Cloud Starter** in the following regions:
    
    - **Frankfurt (eu-central-1)**
    - **Singapore (ap-southeast-1)**
  
    If you have feedback or need help, feel free to reach out to us on [Discord](https://discord.gg/zcqexutz2R).

!!! tip

    For a complete example of full-text search, see the [E-commerce product search demo](../examples/fulltext-search-with-pytidb.md).

## Basic Usage

### Step 1. Create Table and Full-text Index

=== "Python"

    You can use `FullTextField` to define a text field with full-text search enabled. The `fts_parser` parameter in Python corresponds to the `WITH PARSER` clause in SQL.

    For example, the following code creates a table with a full-text index on the `title` column:

    ```python hl_lines="6"
    from pytidb.schema import TableModel, Field, FullTextField

    class Item(TableModel):
        __tablename__ = "items"
        id: int = Field(primary_key=True)
        title: str = FullTextField(fts_parser="MULTILINGUAL")

    table = client.create_table(schema=Item, if_exists="overwrite")
    ```

    The `fts_parser` parameter specifies the parser for the full-text index. Supported values:
    
    - `STANDARD`: Fast, works for English content, splits words by spaces and punctuation.
    - `MULTILINGUAL` (default): Supports multiple languages, including English, Chinese, Japanese, and Korean.

=== "SQL"

    Create a table with a full-text index:

    ```sql hl_lines="4"
    CREATE TABLE items(
        id INT PRIMARY KEY,
        title TEXT,
        FULLTEXT INDEX (title) WITH PARSER MULTILINGUAL
    );
    ```

    You can also add a full-text index to an existing table with a separate statement:

    ```sql
    CREATE TABLE items(
        id INT PRIMARY KEY,
        title TEXT
    );

    ALTER TABLE items ADD FULLTEXT INDEX (title)
    WITH PARSER MULTILINGUAL ADD_COLUMNAR_REPLICA_ON_DEMAND;
    ```

    The following parsers are supported in the `WITH PARSER <PARSER_NAME>` clause:

    - `STANDARD`: Fast, works for English content, splits words by spaces and punctuation.
    - `MULTILINGUAL`: Supports multiple languages, including English, Chinese, Japanese, and Korean.

### Step 2. Insert Sample Data

For demonstration purposes, the following sample data covers English, Japanese, and Chinese text.

=== "Python"

    You can use the `bulk_insert` method to insert sample data into the table.

    ```python
    table.bulk_insert([
        Item(id=1, title="Bluetooth Earphones, HiFi sound, 48h battery, Fast charge, Low latency"),
        Item(id=2, title="Bluetooth 5.3 Headphones, Noise Cancelling, Immersive sound, Comfortable"),
        Item(id=3, title="IPX7 Waterproof Earbuds, Sport ready, Touch control, High-quality music"),
        Item(id=4, title="Sports Earbuds, Secure fit, Sweatproof, Long battery, Workout support"),
        Item(id=5, title="Wired Headphones, Studio-grade, HD sound, Comfortable, Pro music experience"),
        Item(id=6, title="Bluetoothイヤホン HiFi音質 48hバッテリー 急速充電 低遅延"),
        Item(id=7, title="Bluetooth5.3ヘッドホン ノイズキャンセリング 没入サウンド 快適装着"),
        Item(id=8, title="IPX7防水イヤホン スポーツ対応 タッチ操作 高音質音楽"),
        Item(id=9, title="スポーツイヤホン 安定装着 防汗 長持ちバッテリー ワークアウト対応"),
        Item(id=10, title="有線ヘッドホン スタジオ級 HDサウンド 快適装着 プロ音楽体験"),
        Item(id=11, title="无线蓝牙耳机 HiFi音质 48小时超长续航 快速充电 低延迟"),
        Item(id=12, title="蓝牙5.3降噪头戴式耳机 杜比全景声 沉浸音效 舒适佩戴 畅享静谧音乐时光"),
        Item(id=13, title="IPX7防水真无线耳机 运动无忧 智能触控 随时畅听高品质音乐"),
        Item(id=14, title="运动专用耳机 稳固佩戴 防汗设计 超长续航 低延迟音频 高清通话"),
        Item(id=15, title="录音室级有线耳机 高清音质 舒适佩戴 可拆卸线材 多设备兼容 降噪麦克风"),
    ])
    ```

=== "SQL"

    You can use the `INSERT INTO` statement to insert the sample data into the table.

    ```sql
    INSERT INTO items (id, title) VALUES
        (1, 'Bluetooth Earphones, HiFi sound, 48h battery, Fast charge, Low latency'),
        (2, 'Bluetooth 5.3 Headphones, Noise Cancelling, Immersive sound, Comfortable'),
        (3, 'IPX7 Waterproof Earbuds, Sport ready, Touch control, High-quality music'),
        (4, 'Sports Earbuds, Secure fit, Sweatproof, Long battery, Workout support'),
        (5, 'Wired Headphones, Studio-grade, HD sound, Comfortable, Pro music experience'),
        (6, 'Bluetoothイヤホン HiFi音質 48hバッテリー 急速充電 低遅延'),
        (7, 'Bluetooth5.3ヘッドホン ノイズキャンセリング 没入サウンド 快適装着'),
        (8, 'IPX7防水イヤホン スポーツ対応 タッチ操作 高音質音楽'),
        (9, 'スポーツイヤホン 安定装着 防汗 長持ちバッテリー ワークアウト対応'),
        (10, '有線ヘッドホン スタジオ級 HDサウンド 快適装着 プロ音楽体験'),
        (11, '无线蓝牙耳机 HiFi音质 48小时超长续航 快速充电 低延迟'),
        (12, '蓝牙5.3降噪头戴式耳机 杜比全景声 沉浸音效 舒适佩戴 畅享静谧音乐时光'),
        (13, 'IPX7防水真无线耳机 运动无忧 智能触控 随时畅听高品质音乐'),
        (14, '运动专用耳机 稳固佩戴 防汗设计 超长续航 低延迟音频 高清通话'),
        (15, '录音室级有线耳机 高清音质 舒适佩戴 可拆卸线材 多设备兼容 降噪麦克风');
    ```

### Step 3. Perform a Full-text Search

=== "Python"

    To perform a full-text search with pytidb, use the `search` method and set the `search_type` parameter to `"fulltext"`.

    **Example: Search for the 3 most relevant documents**

    ```python
    import json

    results = table.search("Bluetooth Headphones", search_type="fulltext").limit(3).to_list()
    print(json.dumps(results, indent=2, ensure_ascii=False))
    ```

    ```python title="Execution result"
    [
        {
            "id": 2,
            "title": "Bluetooth 5.3 Headphones, Noise Cancelling, Immersive sound, Comfortable",
            "_match_score": 3.7390857,
            "_score": 3.7390857
        },
        {
            "id": 5,
            "title": "Wired Headphones, Studio-grade, HD sound, Comfortable, Pro music experience",
            "_match_score": 1.9798478,
            "_score": 1.9798478
        },
        {
            "id": 1,
            "title": "Bluetooth Earphones, HiFi sound, 48h battery, Fast charge, Low latency",
            "_match_score": 1.620981,
            "_score": 1.620981
        }
    ]
    ```

    The results are sorted by relevance, with the most relevant documents listed first.

    The `_match_score` (or `_score`) field indicates the relevance score of each document, calculated using the [BM25](https://en.wikipedia.org/wiki/Okapi_BM25) algorithm—a widely used ranking function in information retrieval.

    **Example: Search for the 3 most relevant documents in another language**

    ```python
    results = table.search("蓝牙耳机", search_type="fulltext").limit(3).to_list()
    print(json.dumps(results, indent=2, ensure_ascii=False))
    ```

    ```python title="Execution result"
    [
        {
            "id": 11,
            "title": "无线蓝牙耳机 HiFi音质 48小时超长续航 快速充电 低延迟",
            "_match_score": 3.000002,
            "_score": 3.000002
        },
        {
            "id": 12,
            "title": "蓝牙5.3降噪头戴式耳机 杜比全景声 沉浸音效 舒适佩戴 畅享静谧音乐时光",
            "_match_score": 2.5719738,
            "_score": 2.5719738
        },
        {
            "id": 14,
            "title": "运动专用耳机 稳固佩戴 防汗设计 超长续航 低延迟音频 高清通话",
            "_match_score": 1.1418362,
            "_score": 1.1418362
        }
    ]
    ```

=== "SQL"

    To perform a full-text search, use the `fts_match_word()` function.

    **Example: Search for the 3 most relevant documents**

    ```sql
    SELECT *, fts_match_word("Bluetooth Headphones", title) AS score
    FROM items
    WHERE fts_match_word("Bluetooth Headphones", title)
    ORDER BY score DESC
    LIMIT 3;
    ```

    ```plain title="Execution result"
    +----+-----------------------------------------------------------------------------+-----------+
    | id | title                                                                       | score     |
    +----+-----------------------------------------------------------------------------+-----------+
    |  2 | Bluetooth 5.3 Headphones, Noise Cancelling, Immersive sound, Comfortable    | 3.7390857 |
    |  5 | Wired Headphones, Studio-grade, HD sound, Comfortable, Pro music experience | 1.9798478 |
    |  1 | Bluetooth Earphones, HiFi sound, 48h battery, Fast charge, Low latency      |  1.620981 |
    +----+-----------------------------------------------------------------------------+-----------+
    ```

    The results are ordered by relevance, with the most relevant documents first.

    **Example: Search for the 3 most relevant documents in another language**

    ```sql
    SELECT *, fts_match_word("蓝牙耳机", title) AS score
    FROM items
    WHERE fts_match_word("蓝牙耳机", title)
    ORDER BY score DESC
    LIMIT 3;
    ```

    ```plain title="Execution result"
    +----+------------------------------------------------------------------+-----------+
    | id | title                                                            | score     |
    +----+------------------------------------------------------------------+-----------+
    | 11 | 无线蓝牙耳机 HiFi音质 48小时超长续航 快速充电 低延迟                    |  3.000002 |
    | 12 | 蓝牙5.3降噪头戴式耳机 杜比全景声 沉浸音效 舒适佩戴 畅享静谧音乐时光        | 2.5719738 |
    | 14 | 运动专用耳机 稳固佩戴 防汗设计 超长续航 低延迟音频 高清通话               | 1.1418362 |
    +----+------------------------------------------------------------------+-----------+
    ```
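
The relevance scores in the results above come from BM25. To give a feel for how such a score behaves, here is a minimal, self-contained sketch of the Okapi BM25 formula. The whitespace tokenization, the `k1` and `b` values (conventional defaults), and the toy documents are assumptions for illustration; TiDB's parsers and parameters may differ, so the numbers will not match the scores shown above.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Document frequency: the number of documents containing each query term.
    doc_freq = {t: sum(1 for doc in documents if t in doc) for t in set(query_terms)}
    scores = []
    for doc in documents:
        term_counts = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_counts[term]
            if tf == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates with k1 and is normalized by document length with b.
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
        scores.append(score)
    return scores


docs = [
    "bluetooth earphones hifi sound".split(),
    "bluetooth headphones noise cancelling".split(),
    "wired headphones studio grade".split(),
    "sports earbuds secure fit".split(),
]
scores = bm25_scores(["bluetooth", "headphones"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking, scores)
```

The document that contains both query terms ranks first, documents with one matching term follow, and a document with no matching term scores zero, which mirrors the ordering pattern in the example results.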

## See Also

In Retrieval-Augmented Generation (RAG) scenarios, it is often beneficial to utilize both full-text search and vector search for optimal results.

- Learn how to combine these approaches in the [hybrid search guide](./hybrid-search.md).
- For more on vector search, see the [vector search guide](../concepts/vector-search.md).

================================================
FILE: src/ai/guides/hybrid-search.md
================================================
# Hybrid Search

Hybrid search is a technique that combines multiple search algorithms to deliver more accurate and relevant results.

TiDB supports both semantic search (also known as vector search) and keyword-based search (full-text search). By leveraging the strengths of both approaches, you can achieve superior search results through hybrid search.

<p align="center">
    <img src="https://docs-download.pingcap.com/media/images/docs/vector-search/hybrid-search-overview.svg" alt="hybrid search overview" width="800"/>
</p>

!!! tip

    For a complete example of hybrid search, refer to the [hybrid-search example](../examples/hybrid-search-with-pytidb.md).


## Basic Usage

### Step 1. Define an Embedding Function

Define an embedding function to generate vector representations of text data.

```python
import os

from pytidb.embeddings import EmbeddingFunction

embed_fn = EmbeddingFunction(
    model_name="openai/text-embedding-3-small",
    api_key=os.getenv("OPENAI_API_KEY"),
)
```

### Step 2. Create a Table with Vector and Full-Text Indexes

=== "Python"

    After you have [connected to your TiDB database](./connect.md) using `TiDBClient` and obtained the `client` instance, you can create a table with both a `FullTextField` and a `VectorField` to store the text data and its vector embedding.

    Example:

    ```python
    from pytidb.schema import TableModel, Field, FullTextField

    class Chunk(TableModel):
        __tablename__ = "chunks_for_hybrid_search"
        id: int = Field(primary_key=True)
        text: str = FullTextField()
        text_vec: list[float] = embed_fn.VectorField(source_field="text")

    table = client.create_table(schema=Chunk, if_exists="overwrite")
    ```

    In this example, PyTiDB will automatically create a full-text index on the `text` column and a vector index on the `text_vec` column.

### Step 3. Insert Sample Data

=== "Python"

    Use the `bulk_insert()` method to insert sample data into the table.

    ```python
    table.truncate()
    table.bulk_insert([
        Chunk(
            text="TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads.",
        ),
        Chunk(
            text="PyTiDB is a Python library for developers to connect to TiDB.",
        ),
        Chunk(
            text="LlamaIndex is a Python library for building AI-powered applications.",
        ),
    ])
    ```

    The `text_vec` field is automatically populated with the vector embedding of the text data via the [Auto Embedding](../guides/auto-embedding.md) feature.

### Step 4. Perform Hybrid Search

To enable hybrid search, set the `search_type` parameter to `"hybrid"` when calling the `search()` method.

```python
import json

results = (
    table.search(
        "AI database", search_type="hybrid"
    )
    .limit(3)
    .to_list()
)

for item in results:
    item.pop("text_vec")
print(json.dumps(results, indent=4, sort_keys=True))
```

The search results contain three special fields:

- `_distance`: The distance between the query vector and the vector data in the table, as returned by the vector search.
- `_match_score`: The match score between the query and the text field, as returned by the full-text search.
- `_score`: The final score of the search result, calculated by the fusion algorithm.

```json title="Output"
[
    {
        "_distance": 0.4740166257687124,
        "_match_score": 1.6804268,
        "_score": 0.03278688524590164,
        "id": 60013,
        "text": "TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads."
    },
    {
        "_distance": 0.6428459116216618,
        "_match_score": 0.78427225,
        "_score": 0.03200204813108039,
        "id": 60015,
        "text": "LlamaIndex is a Python library for building AI-powered applications."
    },
    {
        "_distance": 0.641581407158715,
        "_match_score": null,
        "_score": 0.016129032258064516,
        "id": 60014,
        "text": "PyTiDB is a Python library for developers to connect to TiDB."
    }
]
```


## Fusion Methods

Fusion methods combine results from vector (semantic) and full-text (keyword) searches into a single, unified ranking. This ensures that the final results leverage both semantic relevance and keyword matching.

PyTiDB supports two fusion methods:

- `rrf`: Reciprocal Rank Fusion (default)
- `weighted`: Weighted Score Fusion

You can select the fusion method that best fits your use case to optimize hybrid search results.

### Reciprocal Rank Fusion (RRF)

Reciprocal Rank Fusion (RRF) is an algorithm that evaluates search results by leveraging the rank of documents in multiple result sets.

For more details, see the [RRF paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf).

=== "Python"

    Enable reciprocal rank fusion by specifying the `method` parameter as `"rrf"` in the `.fusion()` method.

    ```python
    results = (
        table.search(
            "AI database", search_type="hybrid"
        )
        .fusion(method="rrf")
        .limit(3)
        .to_list()
    )
    ```

    Parameters:

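    - `k`: A constant (default: 60) added to each document's rank before taking the reciprocal. It keeps the denominator away from zero and controls how strongly top-ranked documents dominate: a larger `k` flattens the differences between ranks.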
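
The RRF formula itself is short: each document receives `1 / (k + rank)` from every result list it appears in, and the contributions are summed. The sketch below is a plain-Python illustration rather than PyTiDB's code; `rrf_fuse` is a hypothetical name, and counting ranks from 1 is an assumption that happens to agree with the `_score` values in the Step 4 output.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Rankings implied by the Step 4 output: vector search orders rows by
# ascending _distance, full-text search by descending _match_score
# (row 60014 has no full-text match, so it is absent from that list).
vector_ranking = [60013, 60014, 60015]
fulltext_ranking = [60013, 60015]

fused = rrf_fuse([vector_ranking, fulltext_ranking])
for doc_id, score in fused:
    print(doc_id, round(score, 6))
```

Row 60013 is first in both lists and gets `1/61 + 1/61`, row 60015 gets `1/63 + 1/62`, and row 60014 only gets `1/62` from the vector list, so it ranks last even though its vector distance is slightly smaller than that of row 60015.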

### Weighted Score Fusion

Weighted Score Fusion combines vector search and full-text search scores using a weighted sum:

```python
final_score = vs_weight * vector_score + fts_weight * fulltext_score
```

=== "Python"

    Enable weighted score fusion by specifying the `method` parameter as `"weighted"` in the `.fusion()` method.

    For example, to give more weight to vector search, set the `vs_weight` parameter to 0.7 and the `fts_weight` parameter to 0.3:

    ```python
    results = (
        table.search(
            "AI database", search_type="hybrid"
        )
        .fusion(method="weighted", vs_weight=0.7, fts_weight=0.3)
        .limit(3)
        .to_list()
    )
    ```

    Parameters:

    - `vs_weight`: The weight of the vector search score.
    - `fts_weight`: The weight of the full-text search score.
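
As a worked example of the weighted sum, the sketch below ranks two documents under two different weightings. The scores are made up and assumed to be already normalized to the range 0 to 1 with higher meaning more relevant; how PyTiDB derives and normalizes the two scores is not covered here, and the helper names are hypothetical.

```python
def weighted_score(vector_score, fulltext_score, vs_weight, fts_weight):
    """Weighted sum of two relevance scores (higher means more relevant)."""
    return vs_weight * vector_score + fts_weight * fulltext_score


# Hypothetical, already-normalized scores for two documents.
candidates = {
    "doc_a": {"vector": 0.9, "fulltext": 0.2},  # semantically close, few keyword hits
    "doc_b": {"vector": 0.5, "fulltext": 1.0},  # strong keyword match
}


def rank(vs_weight, fts_weight):
    """Return document names ordered by descending fused score."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(
            candidates[name]["vector"], candidates[name]["fulltext"], vs_weight, fts_weight
        ),
        reverse=True,
    )


vector_heavy = rank(0.7, 0.3)   # doc_a: 0.69, doc_b: 0.65
keyword_heavy = rank(0.3, 0.7)  # doc_a: 0.41, doc_b: 0.85
print(vector_heavy, keyword_heavy)
```

With the vector-heavy weights `doc_a` comes first; swapping the weights puts `doc_b` first, which is the trade-off the two parameters control.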


## Rerank Method

Hybrid search also supports reranking with dedicated reranker models.

=== "Python"

    Use the `rerank()` method to specify a reranker that sorts search results by relevance between the query and the documents.

    **Example: Using JinaAI Reranker to rerank the hybrid search results**

    ```python
    from pytidb.rerankers import Reranker

    reranker = Reranker(
        # Use the `jina-reranker-m0` model
        model_name="jina_ai/jina-reranker-m0",
        api_key="{your-jinaai-api-key}"
    )

    results = (
        table.search(
            "AI database", search_type="hybrid"
        )
        .fusion(method="rrf", k=60)
        .rerank(reranker, "text")
        .limit(3)
        .to_list()
    )
    ```

    For other reranker models, see the [Reranking](../guides/reranking.md) guide.


================================================
FILE: src/ai/guides/image-search.md
================================================
# Image search

**Image search** helps you find similar images by comparing their visual content, not just text or metadata. This feature is useful for e-commerce, content moderation, digital asset management, and any scenario where you need to search for or deduplicate images based on appearance.

TiDB enables image search using **vector search**. With automatic embedding, you can generate image embeddings from image URLs, PIL images, or keyword text using a multimodal embedding model. TiDB then efficiently searches for similar vectors at scale.

!!! tip

    For a complete example of image search, see the [Pet image search demo](../examples/image-search-with-pytidb.md).

## Basic usage

### Step 1. Define an embedding function

To generate image embeddings, you need an embedding model that supports image input.

For demonstration, you can use Jina AI's multimodal embedding model to generate image embeddings.

Go to [Jina AI](https://jina.ai/embeddings) to create an API key, then initialize the embedding function as follows:

```python hl_lines="7"
from pytidb.embeddings import EmbeddingFunction

image_embed = EmbeddingFunction(
    # Or another provider/model that supports multimodal input
    model_name="jina_ai/jina-embeddings-v4",
    api_key="{your-jina-api-key}",
    multimodal=True,
)
```

### Step 2. Create a table and vector field

Use `VectorField()` to define a vector field for storing image embeddings. Set the `source_field` parameter to specify the field that stores image URLs.

```python
from pytidb.schema import TableModel, Field

class ImageItem(TableModel):
    __tablename__ = "image_items"
    id: int = Field(primary_key=True)
    image_uri: str = Field()
    image_vec: list[float] = image_embed.VectorField(
        source_field="image_uri"
    )

table = client.create_table(schema=ImageItem, if_exists="overwrite")
```

### Step 3. Insert image data

When you insert data, the `image_vec` field is automatically populated with the embedding generated from the `image_uri`.

```python
table.bulk_insert([
    ImageItem(image_uri="https://example.com/image1.jpg"),
    ImageItem(image_uri="https://example.com/image2.jpg"),
    ImageItem(image_uri="https://example.com/image3.jpg"),
])
```

### Step 4. Perform image search

Image search is a type of vector search. Automatic embedding lets you input an image URL, PIL image, or keyword text directly. All these inputs are converted to vector embeddings for similarity matching.

#### Option 1: Search by image URL

Search for similar images by providing an image URL:

```python
results = table.search("https://example.com/query.jpg").limit(3).to_list()
```

The client converts the input image URL into a vector. TiDB then finds and returns the most similar images by comparing their vectors.

#### Option 2: Search by PIL image

You can also search for similar images by providing a PIL image object, for example one loaded from a local file:

```python
from PIL import Image

image = Image.open("/path/to/query.jpg")

results = table.search(image).limit(3).to_list()
```

The client converts the PIL image object into a Base64 string before sending it to the embedding model.
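
Base64 encoding turns binary image data into plain ASCII text that can travel inside a JSON request. The sketch below shows the encoding step with the standard library only. It works on raw bytes rather than a PIL object, and the exact payload format the client sends to the embedding model is not shown here, so treat `to_base64` as a hypothetical helper.

```python
import base64


def to_base64(image_bytes):
    """Encode raw image bytes as an ASCII Base64 string."""
    return base64.b64encode(image_bytes).decode("ascii")


# The first eight bytes of every PNG file (the PNG signature).
png_signature = b"\x89PNG\r\n\x1a\n"
encoded = to_base64(png_signature)
print(encoded)

# Decoding restores the original bytes exactly.
assert base64.b64decode(encoded) == png_signature
```

In practice the bytes would come from an image file, for example by reading it in binary mode.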

#### Option 3: Search by keyword text

You can also search for similar images by providing keyword text. 

For example, if you are working on a pet image dataset, you can search for similar images by keywords like "orange tabby cat" or "golden retriever puppy".

```python
results = table.search("orange tabby cat").limit(3).to_list()
```

The multimodal embedding model converts the keyword text into a vector embedding that captures its semantic meaning. A vector search then finds the images whose embeddings are most similar to the keyword embedding.
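
All three options end in the same operation: comparing a query vector with the stored image vectors and returning the closest ones. The sketch below illustrates that step in plain Python with cosine distance. The three-dimensional vectors and file names are made up, and the choice of cosine distance is an assumption for this example; TiDB performs the search inside the database using its vector index.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vec, items, k=3):
    """Return the k (uri, distance) pairs closest to query_vec."""
    scored = [(uri, cosine_distance(query_vec, vec)) for uri, vec in items]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy 3-dimensional "embeddings"; real multimodal embeddings are much larger.
items = [
    ("cat.jpg", [0.9, 0.1, 0.0]),
    ("dog.jpg", [0.1, 0.9, 0.0]),
    ("car.jpg", [0.0, 0.1, 0.9]),
]
nearest = top_k([1.0, 0.0, 0.0], items, k=2)
print(nearest)
```

The query vector points almost the same way as the `cat.jpg` vector, so that item is returned first, followed by `dog.jpg`.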

## See also

- [Automatic embedding guide](./auto-embedding.md)
- [Vector search guide](../concepts/vector-search.md)
- [Pet image search demo](../examples/image-search-with-pytidb.md)


================================================
FILE: src/ai/guides/joins.md
================================================
# Multiple Table Joins

As a relational database, TiDB allows you to store diverse data in tables with different structures (for example: `chunks`, `documents`, `users`, `chats`) in a single database. Moreover, you can use joins to combine data from multiple tables to perform complex queries.

## Basic Usage


### Step 1. Create tables and insert sample data

=== "Python"

    Assuming you have already [connected to the TiDB database](./connect.md) via `TiDBClient`:

    Create a `documents` table and insert some sample data:

    ```python
    from pytidb import Session
    from pytidb.schema import TableModel, Field
    from pytidb.sql import select

    class Document(TableModel):
        __tablename__ = "documents"
        id: int = Field(primary_key=True)
        title: str = Field(max_length=255)

    client.create_table(schema=Document, if_exists="overwrite")
    client.table("documents").truncate()
    client.table("documents").bulk_insert([
        Document(id=1, title="The Power of Positive Thinking"),
        Document(id=2, title="The Happiness Advantage"),
        Document(id=3, title="The Art of Happiness"),
    ])
    ```

    Create a `chunks` table and insert some sample data:

    ```python
    class Chunk(TableModel):
        __tablename__ = "chunks"
        id: int = Field(primary_key=True)
        text: str = Field(max_length=255)
        document_id: int = Field(foreign_key="documents.id")

    client.create_table(schema=Chunk, if_exists="overwrite")
    client.table("chunks").truncate()
    client.table("chunks").bulk_insert([
        Chunk(id=1, text="Positive thinking can change your life", document_id=1),
        Chunk(id=2, text="Happiness leads to success", document_id=2),
        Chunk(id=3, text="Finding joy in everyday moments", document_id=3),
    ])
    ```

=== "SQL"

    Create a `documents` table and insert some sample data:

    ```sql
    CREATE TABLE documents (
        id INT PRIMARY KEY,
        title VARCHAR(255) NOT NULL
    );

    INSERT INTO documents (id, title) VALUES 
        (1, 'The Power of Positive Thinking'),
        (2, 'The Happiness Advantage'),
        (3, 'The Art of Happiness');
    ```

    Create a `chunks` table and insert some sample data:

    ```sql
    CREATE TABLE chunks (
        id INT PRIMARY KEY,
        text VARCHAR(255) NOT NULL,
        document_id INT NOT NULL,
        FOREIGN KEY (document_id) REFERENCES documents(id)
    );

    INSERT INTO chunks (id, text, document_id) VALUES 
        (1, 'Positive thinking can change your life', 1),
        (2, 'Happiness leads to success', 2),
        (3, 'Finding joy in everyday moments', 3);
    ```


### Step 2. Perform a join query

=== "Python"

    ```python
    with Session(client.db_engine) as db_session:
        query = (
            select(Chunk)
            .join(Document, Chunk.document_id == Document.id)
            .where(Document.title == "The Power of Positive Thinking")
        )
        chunks = db_session.exec(query).all()

    [(c.id, c.text, c.document_id) for c in chunks]
    ```

=== "SQL"

    Perform a join query to combine data from the `chunks` and `documents` tables:

    ```sql
    SELECT c.id, c.text, c.document_id
    FROM chunks c
    JOIN documents d ON c.document_id = d.id
    WHERE d.title = 'The Power of Positive Thinking';
    ```
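
To check what the join returns without a running TiDB cluster, the same statements can be tried against an in-memory SQLite database from the Python standard library. SQLite is only a stand-in here and is an assumption of this sketch; the column types are adapted slightly, and TiDB-specific behavior is not reproduced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, title VARCHAR(255) NOT NULL);
    CREATE TABLE chunks (
        id INTEGER PRIMARY KEY,
        text VARCHAR(255) NOT NULL,
        document_id INTEGER NOT NULL,
        FOREIGN KEY (document_id) REFERENCES documents(id)
    );
    INSERT INTO documents (id, title) VALUES
        (1, 'The Power of Positive Thinking'),
        (2, 'The Happiness Advantage'),
        (3, 'The Art of Happiness');
    INSERT INTO chunks (id, text, document_id) VALUES
        (1, 'Positive thinking can change your life', 1),
        (2, 'Happiness leads to success', 2),
        (3, 'Finding joy in everyday moments', 3);
""")

# Same join as above: keep only chunks whose parent document has the given title.
rows = conn.execute("""
    SELECT c.id, c.text, c.document_id
    FROM chunks c
    JOIN documents d ON c.document_id = d.id
    WHERE d.title = 'The Power of Positive Thinking'
""").fetchall()
print(rows)
```

Only the chunk that belongs to document 1 is returned, because the `WHERE` clause filters on the joined `documents` row.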


================================================
FILE: src/ai/guides/raw-queries.md
================================================
# Raw Queries

## Operate data with raw SQL

You can use the `client.execute()` method to execute `INSERT`, `UPDATE`, `DELETE`, and other data manipulation SQL statements.

```python
client.execute("INSERT INTO chunks(text, user_id) VALUES ('sample text', 5)")
```

### SQL injection prevention

Both the `execute` and `query` methods support **parameterized SQL**, which helps you avoid [SQL injection](https://en.wikipedia.org/wiki/SQL_injection) when building dynamic SQL statements.

```python
client.execute(
    "INSERT INTO chunks(text, user_id) VALUES (:text, :user_id)",
    {
        "text": "sample text",
        "user_id": 6,
    },
)
```
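
To see what parameter binding does without connecting to TiDB, the following minimal sketch uses Python's built-in `sqlite3` module as a stand-in. This is an assumption made purely for illustration: `sqlite3` is not PyTiDB, although it accepts the same `:name` placeholder style. The hostile string ends up stored as ordinary data instead of being interpreted as SQL.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, user_id INTEGER)")

# A value that would break out of the string literal if it were
# concatenated directly into the SQL text.
hostile_text = "x', 0); DROP TABLE chunks; --"

# Named placeholders: the values are bound separately from the SQL text,
# so the hostile string is treated as data, not as SQL.
conn.execute(
    "INSERT INTO chunks(text, user_id) VALUES (:text, :user_id)",
    {"text": hostile_text, "user_id": 6},
)
conn.commit()

stored = conn.execute("SELECT text, user_id FROM chunks").fetchall()
print(stored)
```

The table still exists after the insert, and the single stored row contains the hostile string verbatim.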

## Query data with raw SQL

You can use the `client.query()` method to execute `SELECT`, `SHOW`, and other query SQL statements.

### Output query result

The `client.query()` method will return a `SQLQueryResult` instance with some helper methods:

- `to_pydantic()`
- `to_list()`
- `to_pandas()`
- `to_rows()`
- `scalar()`


#### As Pydantic model

The `to_pydantic()` method will return a list of Pydantic models.

```python
client.query("SELECT id, text, user_id FROM chunks").to_pydantic()
```

#### As SQLAlchemy result rows

The `to_rows()` method returns a list of tuples, where each tuple represents one row of data.

```python
client.query("SHOW TABLES;").to_rows()
```

#### As list of dict

The `to_list()` method converts the query result into a list of dictionaries.

```python
client.query(
    "SELECT id, text, user_id FROM chunks WHERE user_id = :user_id",
    {
        "user_id": 3
    }
).to_list()
```

#### As pandas DataFrame

The `to_pandas()` method converts the query result to a `pandas.DataFrame`, which is displayed in a human-friendly style in notebooks:

```python
client.query("SELECT id, text, user_id FROM chunks").to_pandas()
```

#### As scalar value

The `scalar()` method will return the first column of the first row of the result set.

```python
client.query("SELECT COUNT(*) FROM chunks;").scalar()
```
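
The result shapes described in this section can be pictured with plain Python data. The sketch below is only an analogy built on the standard-library `sqlite3` module; it does not use `SQLQueryResult`, and the variable names are illustrative.

```python
import sqlite3

# Stand-in database with two sample rows (not a TiDB connection).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO chunks (id, text, user_id) VALUES (?, ?, ?)",
    [(1, "first chunk", 3), (2, "second chunk", 5)],
)

cursor = conn.execute("SELECT id, text, user_id FROM chunks ORDER BY id")
columns = [description[0] for description in cursor.description]

# Shape comparable to rows: a list of tuples, one tuple per row.
rows = cursor.fetchall()

# Shape comparable to a list of dicts: column names become keys.
records = [dict(zip(columns, row)) for row in rows]

# Shape comparable to a scalar: first column of the first row.
count = conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
```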

================================================
FILE: src/ai/guides/reranking.md
================================================
# Reranking

Reranking is a technique used to improve the relevance and accuracy of search results by re-evaluating and reordering them using a dedicated rerank model.

The search process works in two stages:

1. **Initial Retrieval**: Vector search identifies the top `k` most similar documents from the collection
2. **Reranking**: A reranking model evaluates these `k` documents based on the relevance between the query and the documents and reorders them to produce the final top `n` results (where `n` ≤ `k`)

This two-stage retrieval approach significantly improves both document relevance and accuracy.
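
The two stages can be sketched in plain Python. This is a toy model under stated assumptions: an exhaustive sort stands in for vector search, and `toy_relevance` is a hypothetical word-overlap score standing in for a real rerank model. It is not how PyTiDB or any provider implements reranking.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def toy_relevance(query, text):
    """Hypothetical rerank score: fraction of query words found in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)


def search_with_rerank(query, query_vec, docs, k, n):
    # Stage 1: initial retrieval of the k nearest documents by vector distance.
    candidates = sorted(docs, key=lambda d: cosine_distance(query_vec, d["vec"]))[:k]
    # Stage 2: reorder only those k candidates by query-document relevance.
    reranked = sorted(candidates, key=lambda d: toy_relevance(query, d["text"]), reverse=True)
    return reranked[:n]


# Made-up vectors: the "dogs" document is nearest to the query vector,
# but the "fish" document is the better textual match.
docs = [
    {"text": "dogs are loyal animals", "vec": [1, 2, 3]},
    {"text": "fish swim in the sea", "vec": [1, 2, 4]},
    {"text": "trees grow slowly", "vec": [1, 0, 0]},
]

top = search_with_rerank("where do fish swim", [1, 2, 3], docs, k=2, n=1)
print([d["text"] for d in top])
```

Here the first stage ranks the "dogs" document ahead of the "fish" document, and the second stage reverses that order before the final top `n` is taken.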

## Basic Usage

=== "Python"

    PyTiDB provides the `Reranker` class that allows you to use reranker models from multiple third-party providers.

    1. Create a reranker instance

        ```python
        from pytidb.rerankers import Reranker

        reranker = Reranker(model_name="{provider}/{model_name}")
        ```

    2. Apply the reranker via the `.rerank()` method

        ```python
        table.search("{query}").rerank(reranker, "{field_to_rerank}").limit(3)
        ```

## Supported Providers

Here are some examples of using reranker models from third-party providers.

### Jina AI

To use a reranker provided by Jina AI, go to their [website](https://jina.ai/reranker) to create an API key.

For example:

```python
jinaai = Reranker(
    # Using the `jina-reranker-m0` model
    model_name="jina_ai/jina-reranker-m0",
    api_key="{your-jinaai-api-key}"
)
```


================================================
FILE: src/ai/guides/tables.md
================================================
# Working with tables

TiDB uses tables to organize and store collections of related data. It provides flexible schema definition capabilities, allowing you to structure your tables according to your specific requirements.

A table can contain multiple columns with different data types to store various kinds of data. Supported data types include text, numbers, vectors, binary data (`BLOB`), JSON, and more.

!!! tip

    For a complete working example, see the [basic example](https://github.com/pingcap/pytidb/tree/main/examples/basic) in our repository.

## Create a table

### Using TableModel

PyTiDB provides a `TableModel` class that represents the schema of a table. This class is compatible with the [Pydantic Model](https://docs.pydantic.dev/latest/concepts/models/) and allows you to define the table structure in a declarative way.

In the following example, you create a table named `items` with these columns:

- `id`: a primary key column with an integer type
- `content`: a text type column
- `embedding`: a vector type column with 3 dimensions
- `meta`: a JSON type column

=== "Python"

    After you [connect to the database](./connect.md) using PyTiDB and obtain a `client` instance, you can create a table with the `create_table` method.

    ```python hl_lines="12"
    from pytidb.schema import TableModel, Field, VectorField
    from pytidb.datatype import TEXT, JSON

    class Item(TableModel):
        __tablename__ = "items"

        id: int = Field(primary_key=True)
        content: str = Field(sa_type=TEXT)
        embedding: list[float] = VectorField(dimensions=3)
        meta: dict = Field(sa_type=JSON, default_factory=dict)

    table = client.create_table(schema=Item, if_exists="overwrite")
    ```

    The `create_table` method accepts these parameters:

    - `schema`: The `TableModel` class that defines your table structure.
    - `if_exists`: The creation mode of the table.
        - `raise` (default): Creates the table if it does not exist; raises an error if it already exists.
        - `skip`: Creates the table if it does not exist; does nothing if it already exists.
        - `overwrite`: Drops the existing table and creates a new one. This is useful for **testing and development**, but not recommended for production environments.
  
    Once the table is created, you can use the `table` object to insert, update, delete, and query data.

=== "SQL"

    Use the `CREATE TABLE` statement to create a table.

    ```sql
    CREATE TABLE items (
        id INT PRIMARY KEY,
        content TEXT,
        embedding VECTOR(3),
        meta JSON
    );
    ```

## Add data to a table

### With TableModel

You can use a `TableModel` instance to represent a record and insert it into the table.

To insert a single record:

=== "Python"

    Use the `table.insert()` method to insert a single record into the table.

    ```python
    table.insert(
        Item(
            id=1,
            content="TiDB is a distributed SQL database",
            embedding=[0.1, 0.2, 0.3],
            meta={"category": "database"},
        )
    )
    ```

=== "SQL"

    Use the `INSERT INTO` statement to insert a single record into the table.

    ```sql
    INSERT INTO items(id, content, embedding, meta)
    VALUES (1, 'TiDB is a distributed SQL database', '[0.1, 0.2, 0.3]', '{"category": "database"}');
    ```

To insert multiple records:

=== "Python"

    Use the `table.bulk_insert()` method to insert multiple records into the table.

    ```python
    table.bulk_insert([
        Item(
            id=2,
            content="GPT-4 is a large language model",
            embedding=[0.4, 0.5, 0.6],
            meta={"category": "llm"},
        ),
        Item(
            id=3,
            content="LlamaIndex is a Python library for building AI-powered applications",
            embedding=[0.7, 0.8, 0.9],
            meta={"category": "rag"},
        ),
    ])
    ```

=== "SQL"

    Use the `INSERT INTO` statement to insert multiple records into the table.

    ```sql
    INSERT INTO items(id, content, embedding, meta)
    VALUES
        (2, 'GPT-4 is a large language model', '[0.4, 0.5, 0.6]', '{"category": "llm"}'),
        (3, 'LlamaIndex is a Python library for building AI-powered applications', '[0.7, 0.8, 0.9]', '{"category": "rag"}');
    ```

### With Dict

You can also use a `dict` to represent a record and insert it into the table. This approach is more flexible and doesn't require you to use a `TableModel` to insert data.

To insert a single record:

=== "Python"

    Use the `table.insert()` method with a dictionary to insert a single record into the table.

    ```python
    table.insert({
        "id": 1,
        "content": "TiDB is a distributed SQL database",
        "embedding": [0.1, 0.2, 0.3],
        "meta": {"category": "database"},
    })
    ```

=== "SQL"

    Use the `INSERT INTO` statement to insert a single record into the table.

    ```sql
    INSERT INTO items(id, content, embedding, meta)
    VALUES (1, 'TiDB is a distributed SQL database', '[0.1, 0.2, 0.3]', '{"category": "database"}');
    ```

## Save data to a table

The `save` method provides a convenient way to insert or update a single record. If a record with the specified primary key does not exist, it creates a new record. If the record already exists, it overwrites the entire record.
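
The difference between overwriting a record and changing part of it can be modeled with a plain dictionary. In this sketch, `store`, `save`, and `update` are hypothetical helpers that only mimic the semantics described here; they are not PyTiDB functions and say nothing about how PyTiDB is implemented.

```python
# Hypothetical in-memory store keyed by primary key.
store = {
    1: {
        "id": 1,
        "content": "TiDB is a distributed SQL database",
        "meta": {"category": "database"},
    },
}


def save(record):
    """Insert the record, or replace the whole stored record with the same id."""
    store[record["id"]] = dict(record)
    return store[record["id"]]


def update(record_id, values):
    """Change only the given fields of an existing record."""
    store[record_id].update(values)
    return store[record_id]


# Partial update: `meta` is left untouched.
update(1, {"content": "Updated content"})
after_update = dict(store[1])

# Save with an existing id: the previous record is replaced as a whole,
# so in this model the old `meta` value does not survive.
save({"id": 1, "content": "Replaced content"})
after_save = dict(store[1])

# Save with a new id: a new record is created.
save({"id": 4, "content": "Vector databases enable AI applications"})
```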

!!! note

    If a record ID already exists, the `table.save()` method overwrites the entire record. To change only part of a record, use `table.update()`.

=== "Python"

    Use the `table.save()` method to save a single record to the table.

    **Example: Save a new record**

    ```python
    saved_record = table.save(
        Item(
            id=4,
            content="Vector databases enable AI applications",
            embedding=[1.0, 1.1, 1.2],
            meta={"category": "vector-db"},
        )
    )
    ```

    **Example: Save an existing record (overwrites the entire record)**

    ```python
    # This overwrites the entire record with id=1
    updated_record = table.save(
        Item(
            id=1,  # Existing ID
            content="Updated content for TiDB",
            embedding=[0.2, 0.3, 0.4],
            meta={"category": "updated"},
        )
    )
    ```

=== "SQL"

    Use the `INSERT ... ON DUPLICATE KEY UPDATE` statement to save a record.

    **Example: Save a new record or update if it exists**

    ```sql
    INSERT INTO items(id, content, embedding, meta)
    VALUES (4, 'Vector databases enable AI applications', '[1.0, 1.1, 1.2]', '{"category": "vector-db"}')
    ON DUPLICATE KEY UPDATE
        content = VALUES(content),
        embedding = VALUES(embedding),
        meta = VALUES(meta);
    ```

## Query data from a table

To fetch records from a table:

=== "Python"

    Use the `table.query()` method to fetch the records from the table.

    **Example: Fetch the first 10 records**

    ```python
    result = table.query(limit=10).to_list()
    ```

=== "SQL"

    Use the `SELECT` statement to fetch the records from the table.

    **Example: Fetch the first 10 records**

    ```sql
    SELECT * FROM items LIMIT 10;
    ```

To fetch records based on query conditions:

=== "Python"

    Pass the `filters` parameter to the `table.query()` method.

    ```python
    result = table.query(
        filters={"meta.category": "database"},
        limit=10
    ).to_list()
    ```

=== "SQL"

    Use the `WHERE` clause to filter records.

    **Example: Fetch the 10 records with category "database"**

    ```sql
    SELECT * FROM items WHERE meta->>'$.category' = 'database' LIMIT 10;
    ```

For a complete list of supported filter operations and examples, refer to the [filtering](./filtering.md) guide.

## Update data in a table

=== "Python"

    Use the `table.update()` method to update records with [filters](./filtering.md).

    **Example: Update the record whose `id` equals 1**

    ```python
    table.update(
        values={
            "content": "TiDB Cloud Starter is a fully managed, auto-scaling cloud database service",
            "embedding": [0.1, 0.2, 0.4],
            "meta": {"category": "dbaas"},
        },
        filters={
            "id": 1
        },
    )
    ```

=== "SQL"

    Use the `UPDATE` statement to update records with [filters](./filtering.md).

    **Example: Update the record whose `id` equals 1**

    ```sql
    UPDATE items
    SET
        content = 'TiDB Cloud Starter is a fully managed, auto-scaling cloud database service',
        embedding = '[0.1, 0.2, 0.4]',
        meta = '{"category": "dbaas"}'
    WHERE
        id = 1;
    ```

## Delete from a table

=== "Python"

    Use the `table.delete()` method to delete records with [filters](./filtering.md).

    **Example: Delete the record where `id` equals 2**

    ```python
    table.delete(
        filters={
            "id": 2
        }
    )
    ```

=== "SQL"

    Use the `DELETE` statement to delete records with [filters](./filtering.md).

    **Example: Delete the record where `id` equals 2**

    ```sql
    DELETE FROM items WHERE id = 2;
    ```

## Truncate a table

=== "Python"

    To remove all data from the table but keep the table structure, use the `table.truncate()` method.

    ```python
    table.truncate()
    ```

    To check that the table is truncated, verify that it contains 0 rows.

    ```python
    table.rows()
    ```

=== "SQL"

    To remove all data from the table but keep the table structure, use the `TRUNCATE TABLE` statement.

    ```sql
    TRUNCATE TABLE items;
    ```

    To check that the table is truncated, verify that it contains 0 rows.

    ```sql
    SELECT COUNT(*) FROM items;
    ```

## Drop a table

=== "Python"

    To permanently remove a table from the database, use the `client.drop_table()` method.

    ```python
    client.drop_table("items")
    ```

    To check that the table is removed from the database:

    ```python
    client.table_names()
    ```

=== "SQL"

    To permanently remove a table from the database, use the `DROP TABLE` statement.

    ```sql
    DROP TABLE items;
    ```

    To check that the table is removed from the database:

    ```sql
    SHOW TABLES;
    ```


================================================
FILE: src/ai/guides/transaction.md
================================================
# Transaction

TiDB supports ACID transactions, which ensure data consistency and reliability.

## Basic Usage

=== "Python"

    ```python
    with client.session() as session:
        initial_total_balance = session.query("SELECT SUM(balance) FROM players").scalar()

        # Transfer 10 coins from player 1 to player 2
        session.execute("UPDATE players SET balance = balance - 10 WHERE id = 1")
        session.execute("UPDATE players SET balance = balance + 10 WHERE id = 2")

        session.commit()
        # or session.rollback()

        final_total_balance = session.query("SELECT SUM(balance) FROM players").scalar()
        assert final_total_balance == initial_total_balance
    ```
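
The rollback path can be tried out without a TiDB cluster. The sketch below is an assumption-laden stand-in: it uses the standard-library `sqlite3` module rather than `client.session()`, and the `transfer` helper is hypothetical. It shows a failed transfer being rolled back so that neither balance changes.

```python
import sqlite3

# In-memory SQLite database standing in for a real cluster.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO players (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` coins from `src` to `dst`, or change nothing at all."""
    try:
        conn.execute("UPDATE players SET balance = balance - ? WHERE id = ?", (amount, src))
        (remaining,) = conn.execute(
            "SELECT balance FROM players WHERE id = ?", (src,)
        ).fetchone()
        if remaining < 0:
            raise ValueError("insufficient balance")
        conn.execute("UPDATE players SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.commit()
        return True
    except ValueError:
        # Undo the debit that was already applied inside this transaction.
        conn.rollback()
        return False


ok = transfer(conn, 1, 2, 10)        # succeeds and is committed
failed = transfer(conn, 1, 2, 1000)  # overdraws player 1, so it is rolled back

balances = conn.execute("SELECT id, balance FROM players ORDER BY id").fetchall()
total = conn.execute("SELECT SUM(balance) FROM players").fetchone()[0]
```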

## See also

- [TiDB Develop Guide - Transaction](https://docs.pingcap.com/tidbcloud/dev-guide-transaction-overview/)
- [TiDB Docs - SQL Reference - Transactions](https://docs.pingcap.com/tidbcloud/transaction-overview/)

================================================
FILE: src/ai/guides/vector-search.md
================================================
# Vector Search

Vector search uses semantic similarity to help you find the most relevant records, even if your query does not explicitly include all the keywords.

!!! tip

    For a complete example of vector search, see the [vector-search example](../examples/vector-search-with-pytidb.md).


## Basic Usage

This section shows you how to use vector search in your application in minimal steps. Before you start, you need to [connect to the database](./connect.md).

### Step 1. Create a table with a vector field

=== "Python"

    You can use `client.create_table()` to create a table and use `VectorField` to define a vector field.

    In this example, we create a table named `documents` with four columns:

    - `id`: The primary key of the table.
    - `text`: The text content of the document.
    - `text_vec`: The vector embedding of the text content.
    - `meta`: The metadata of the document, which is a JSON object.

    ```python hl_lines="9"
    from pytidb.schema import TableModel, Field, VectorField
    from pytidb.datatype import TEXT, JSON

    class Document(TableModel):
        __tablename__ = "documents"

        id: int = Field(primary_key=True)
        text: str = Field(sa_type=TEXT)
        text_vec: list[float] = VectorField(dimensions=3)
        meta: dict = Field(sa_type=JSON, default_factory=dict)

    table = client.create_table(schema=Document, if_exists="overwrite")
    ```

    The `VectorField` class accepts the following parameters:

    - `dimensions`: The number of dimensions of the vector. Once specified, only vectors with this exact dimension can be stored in this field.
    - `index`: Whether to create a [vector index](https://docs.pingcap.com/tidbcloud/vector-search-index/) for the vector field. Defaults to `True`.
    - `distance_metric`: The distance metric to use for the vector index. Supported values:
        - `DistanceMetric.COSINE` (default): Cosine distance metric, suitable for measuring text similarity
        - `DistanceMetric.L2`: L2 distance metric, suitable for capturing overall difference

=== "SQL"

    You can use the `CREATE TABLE` statement to create a table and use the `VECTOR` type to define a vector column.

    ```sql hl_lines="4 6"
    CREATE TABLE documents (
        id INT PRIMARY KEY,
        text TEXT,
        text_vec VECTOR(3),
        meta JSON,
        VECTOR INDEX `vec_idx_text_vec`((VEC_COSINE_DISTANCE(`text_vec`)))
    );
    ```

    In this example:

    - The `text_vec` column is defined as a `VECTOR` type with 3 dimensions, which means that every vector stored in this column must have exactly 3 dimensions.
    - A vector index is created using the `VEC_COSINE_DISTANCE` function to optimize vector search performance.

    TiDB supports two distance functions for vector indexes:
    
    - `VEC_COSINE_DISTANCE`: Calculates the cosine distance between two vectors
    - `VEC_L2_DISTANCE`: Calculates L2 distance (Euclidean distance) between two vectors


### Step 2. Insert vector data into the table

For demonstration purposes, insert some text and their corresponding vector embeddings into the table. In this example, we use simple 3-dimensional vectors.

We insert three documents:

- `dog` with the vector embedding `[1, 2, 1]`
- `fish` with the vector embedding `[1, 2, 4]`
- `tree` with the vector embedding `[1, 0, 0]`

=== "Python"

    ```python
    table.bulk_insert([
        Document(text="dog", text_vec=[1,2,1], meta={"category": "animal"}),
        Document(text="fish", text_vec=[1,2,4], meta={"category": "animal"}),
        Document(text="tree", text_vec=[1,0,0], meta={"category": "plant"}),
    ])
    ```

=== "SQL"

    ```sql
    INSERT INTO documents (id, text, text_vec, meta)
    VALUES
        (1, 'dog', '[1,2,1]', '{"category": "animal"}'),
        (2, 'fish', '[1,2,4]', '{"category": "animal"}'),
        (3, 'tree', '[1,0,0]', '{"category": "plant"}');
    ```

!!! tip

    In real-world applications, vector embeddings are usually generated by an [embedding model](../concepts/vector-search.md#embedding-model).
    
    For convenience, PyTiDB provides an auto embedding feature that automatically generates vector embeddings for your text fields when you insert, update, or search, with no manual processing needed.

    For details, see the [Auto Embedding](./auto-embedding.md) guide.

### Step 3. Perform vector search

Vector search uses vector distance metrics to measure the similarity and relevance between vectors. The closer the distance, the more relevant the record. To find the most relevant documents in the table, you need to specify a query vector. 

In this example, we assume the query is `A swimming animal` and its vector embedding is `[1, 2, 3]`.
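
To see how the ranking for this query comes about, the cosine distance between the query vector and each sample vector can be computed by hand. The following is a minimal plain-Python sketch; the `cosine_distance` helper is written here for illustration and is not part of PyTiDB.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query_vec = [1, 2, 3]
documents = {"dog": [1, 2, 1], "fish": [1, 2, 4], "tree": [1, 0, 0]}

# Distance from the query vector to every stored vector.
distances = {text: cosine_distance(vec, query_vec) for text, vec in documents.items()}

# Smaller distance means more relevant, so sort in ascending order.
ranking = sorted(distances, key=distances.get)

for text in ranking:
    print(f"{text}: {distances[text]:.4f}")
# fish: 0.0085
# dog: 0.1271
# tree: 0.7327
```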

=== "Python"

    You can use the `table.search()` method to perform vector search, which uses `search_mode="vector"` by default.

    ```python
    table.search([1, 2, 3]).limit(3).to_list()
    ```

    ```python title="Execution result"
    [
        {"id": 2, "text": "fish", "text_vec": [1,2,4], "_distance": 0.00853986601633272},
        {"id": 1, "text": "dog", "text_vec": [1,2,1], "_distance": 0.12712843905603044},
        {"id": 3, "text": "tree", "text_vec": [1,0,0], "_distance": 0.7327387580875756},
    ]
    ```

    The result shows that the most relevant document is `fish` with a distance of `0.00853986601633272`.

=== "SQL"

    You can use the `ORDER BY <distance_function>(<column_name>, <query_vector>) LIMIT <n>` clause in the `SELECT` statement to get the n nearest neighbors of the query vector.

    In this example, we use the `vec_cosine_distance` function to calculate the cosine distance between the vectors stored in the `text_vec` column and the provided query vector `[1, 2, 3]`.

    ```sql
    SELECT id, text, vec_cosine_distance(text_vec, '[1,2,3]') AS distance
    FROM documents
    ORDER BY distance
    LIMIT 3;
    ```

    ```plain title="Execution result"
    +----+----------+---------------------+
    | id | text     | distance            |
    +----+----------+---------------------+
    |  2 | fish     | 0.00853986601633272 |
    |  1 | dog      | 0.12712843905603044 |
    |  3 | tree     |  0.7327387580875756 |
    +----+----------+---------------------+
    3 rows in set (0.15 sec)
    ```

    The result shows that the most relevant document is `fish` with a distance of `0.00853986601633272`.


## Distance metrics

Distance metrics are a measure of the similarity between a pair of vectors. Currently, TiDB supports the following distance metrics:

=== "Python"

    The `table.search()` API supports the following distance metrics:

    | Metric Name              | Description                                                    | Best For |
    |--------------------------|----------------------------------------------------------------|----------|
    | `DistanceMetric.COSINE`  | Calculates the cosine distance between two vectors (default). Measures the angle between vectors. | Text embeddings, semantic search |
    | `DistanceMetric.L2`      | Calculates the L2 distance (Euclidean distance) between two vectors. Measures the straight-line distance. | Image features |

    To change the distance metric used for vector search, use the `.distance_metric()` method.

    **Example: Use the L2 distance metric**

    ```python
    from pytidb.schema import DistanceMetric

    results = (
        table.search([1, 2, 3])
            .distance_metric(DistanceMetric.L2)
            .limit(10)
            .to_list()
    )
    ```

=== "SQL"

    In SQL, you can use the following built-in functions to calculate vector distances directly in your queries:

    | Function Name                                                                                                                        | Description                                                    |
    |-------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------|
    | [`VEC_L2_DISTANCE`](https://docs.pingcap.com/tidbcloud/vector-search-functions-and-operators/#vec_l2_distance)                       | Calculates L2 distance (Euclidean distance) between two vectors |
    | [`VEC_COSINE_DISTANCE`](https://docs.pingcap.com/tidbcloud/vector-search-functions-and-operators/#vec_cosine_distance)               | Calculates the cosine distance between two vectors              |
    | [`VEC_NEGATIVE_INNER_PRODUCT`](https://docs.pingcap.com/tidbcloud/vector-search-functions-and-operators/#vec_negative_inner_product) | Calculates the negative of the inner product between two vectors|
    | [`VEC_L1_DISTANCE`](https://docs.pingcap.com/tidbcloud/vector-search-functions-and-operators/#vec_l1_distance)                       | Calculates L1 distance (Manhattan distance) between two vectors |
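
For intuition, the four distance calculations can be written out in a few lines of plain Python. The helper names below are hypothetical and merely mirror the SQL function names; this is a sketch of the underlying formulas, not TiDB's implementation.

```python
import math


def l2_distance(a, b):
    """Euclidean (straight-line) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 minus the cosine of the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def negative_inner_product(a, b):
    """Negated dot product, so that a smaller value means more similar."""
    return -sum(x * y for x, y in zip(a, b))


def l1_distance(a, b):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


fish_vec = [1, 2, 4]
query_vec = [1, 2, 3]

print(l2_distance(fish_vec, query_vec))             # 1.0
print(negative_inner_product(fish_vec, query_vec))  # -17
print(l1_distance(fish_vec, query_vec))             # 1
print(round(cosine_distance(fish_vec, query_vec), 4))  # 0.0085
```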


## Distance threshold

The `table.search()` API allows you to set a distance threshold to control the similarity of the returned results. By specifying this threshold, you can exclude less similar vectors and return only those that meet your relevance criteria.

=== "Python"

    Use the `.distance_threshold()` method to set a maximum distance for the search results. Only records with a distance less than the threshold are returned.

    **Example: Only return documents with a distance less than 0.5**

    ```python
    results = table.search([1, 2, 3]).distance_threshold(0.5).limit(10).to_list()
    ```

=== "SQL"

    In SQL, use the `HAVING` clause with a distance function to filter results by distance:

    **Example: Only return documents with a distance less than 0.1**

    ```sql
    SELECT id, text, vec_cosine_distance(text_vec, '[1,2,3]') AS distance
    FROM documents
    HAVING distance < 0.1
    ORDER BY distance
    LIMIT 10;
    ```

## Distance range

The `table.search()` API also supports specifying a distance range to further refine the results.

=== "Python"

    Use the `.distance_range()` method to set both minimum and maximum distance values. Only records with a distance within this range are returned.

    **Example: Only return documents with a distance between 0.01 and 0.05**

    ```python
    results = table.search([1, 2, 3]).distance_range(0.01, 0.05).limit(10).to_list()
    ```

=== "SQL"

    To specify a distance range in SQL, use `BETWEEN` or other comparison operators in the `HAVING` clause:

    **Example: Only return documents with a distance between 0.01 and 0.05**

    ```sql
    SELECT id, text, vec_l2_distance(text_vec, '[1,2,3]') AS distance
    FROM documents
    HAVING distance BETWEEN 0.01 AND 0.05
    ORDER BY distance
    LIMIT 10;
    ```

## Metadata filtering

As a relational database, TiDB supports a rich set of [SQL operators](https://docs.pingcap.com/tidbcloud/operators/) and allows flexible combinations of filtering conditions.

For vector search in TiDB, you can apply metadata filtering on scalar fields (e.g., integers, strings) or JSON fields.

Typically, vector search combined with metadata filtering operates in two modes:

- **Post-filtering**: In a two-stage retrieval process, TiDB first performs vector search to retrieve the top-k candidate results from the entire vector space, then applies the filter to this candidate set. The vector search stage typically leverages a vector index for efficiency.
- **Pre-filtering**: The filter is applied before vector search. If the filter is highly selective and the filtered field is indexed with a scalar index, this approach can significantly reduce the search space and improve performance.
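
The difference between the two modes can be illustrated with a toy model in plain Python. The sketch assumes an exhaustive sort in place of a vector index, and the function names are hypothetical; it is meant only to show why post-filtering can return fewer results than pre-filtering when the candidate set is small.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


DOCS = [
    {"text": "dog", "vec": [1, 2, 1], "category": "animal"},
    {"text": "fish", "vec": [1, 2, 4], "category": "animal"},
    {"text": "tree", "vec": [1, 0, 0], "category": "plant"},
]


def post_filter_search(docs, query_vec, category, num_candidates, limit):
    # Stage 1: take the nearest candidates from the whole collection.
    candidates = sorted(docs, key=lambda d: cosine_distance(d["vec"], query_vec))
    candidates = candidates[:num_candidates]
    # Stage 2: apply the filter to the candidate set only.
    matches = [d for d in candidates if d["category"] == category]
    return [d["text"] for d in matches[:limit]]


def pre_filter_search(docs, query_vec, category, limit):
    # Apply the filter first, then rank only the matching records.
    matches = [d for d in docs if d["category"] == category]
    ranked = sorted(matches, key=lambda d: cosine_distance(d["vec"], query_vec))
    return [d["text"] for d in ranked[:limit]]


query_vec = [1, 2, 3]

# With 2 candidates, both are animals, so the "plant" filter leaves nothing.
post_small = post_filter_search(DOCS, query_vec, "plant", num_candidates=2, limit=1)
# With 3 candidates, the plant record is included before filtering.
post_large = post_filter_search(DOCS, query_vec, "plant", num_candidates=3, limit=1)
# Pre-filtering always considers every matching record.
pre = pre_filter_search(DOCS, query_vec, "plant", limit=1)

print(post_small, post_large, pre)
```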

### Post-filtering

=== "Python"

    Use the `.filter()` method with a filter dictionary to apply filtering to vector search.

    By default, the `table.search()` API uses post-filtering mode to maximize search performance with the vector index.

    **Example: Vector search with post-filtering**

    ```python
    results = (
        table.search([1, 2, 3])
            # The `meta` is a JSON field, and its value is a JSON object
            # like {"category": "animal"}
            .filter({"meta.category": "animal"})
            .num_candidate(50)
            .limit(10)
            .to_list()
    )
    ```

    !!! tip
        When using a vector index, if the final `limit` is very small, the accuracy of the results may decrease. You can use the `.num_candidate()` method to control how many candidates to retrieve from the vector index during the vector search phase, without changing the `limit` parameter.

        A higher `num_candidate` value generally improves recall but may reduce query performance. Adjust this value based on your dataset and accuracy requirements.

=== "SQL"

    Currently, vector indexes are only effective in strict ANN (Approximate Nearest Neighbor) queries, such as:

    ```sql
    SELECT * FROM <table> ORDER BY <distance_func>(<column>, <query_vector>) LIMIT <n>
    ```

    In other words, you cannot use a `WHERE` clause together with a vector index in the same query.

    If you need to combine vector search with additional filtering conditions, you can use the post-filtering pattern. In this approach, the ANN query will be divided into two parts:
    
    - The inner query performs the vector search using the vector index.
    - The outer query applies the `WHERE` condition to filter the results.

    ```sql hl_lines="8"
    SELECT *
    FROM (
        SELECT id, text, meta, vec_cosine_distance(text_vec, '[1,2,3]') AS distance
        FROM documents
        ORDER BY distance
        LIMIT 50
    ) candidates
    WHERE meta->>'$.category' = 'animal'
    ORDER BY distance
    LIMIT 10;
    ```

    !!! tip

        The post-filtering pattern may return fewer results than expected. For example, the inner query may retrieve the top 50 most similar records, but few or none of them match the `WHERE` condition.

        To mitigate this, you can increase the `LIMIT` value (e.g., 50) in the **inner query** to fetch more candidates, improving the chances of returning enough valid results after filtering.

    For supported SQL operators, see [Operators](https://docs.pingcap.com/tidbcloud/operators/) in the TiDB Cloud documentation.

### Pre-filtering

=== "Python"

    To enable pre-filtering, set the `prefilter` parameter to `True` in the `.filter()` method.

    **Example: Vector search with pre-filtering**

    ```python
    results = (
        table.search([1, 2, 3])
            .filter({"meta.category": "animal"}, prefilter=True)
            .limit(10)
            .to_list()
    )
    ```

    For supported filter operators, see [Filtering](./filtering.md).

=== "SQL"

    In SQL, use the `->>` operator or `JSON_EXTRACT` to access JSON fields in the `WHERE` clause:

    ```sql
    SELECT id, text, meta, vec_cosine_distance(text_vec, '[1,2,3]') AS distance
    FROM documents
    WHERE meta->>'$.category' = 'animal'
    ORDER BY distance
    LIMIT 10;
    ```

    For supported SQL operators, see [Operators](https://docs.pingcap.com/tidbcloud/operators/) in the TiDB Cloud documentation.

## Multiple vector fields

TiDB supports defining multiple vector columns in a single table, allowing you to store and search different types of vector embeddings. 

For example, you can store both text embeddings and image embeddings in the same table, making it convenient to manage multi-modal data.

=== "Python"

    You can define multiple vector fields in the schema and perform vector search on the specified vector field by using the `.vector_column()` method.

    **Example: Specify the vector field to search on**

    ```python hl_lines="6 8 17"
    # Create a table with multiple vector fields
    class RichTextDocument(TableModel):
        __tablename__ = "rich_text_documents"
        id: int = Field(primary_key=True)
        text: str = Field(sa_type=TEXT)
        text_vec: list[float] = VectorField(dimensions=3)
        image_url: str
        image_vec: list[float] = VectorField(dimensions=3)

    table = client.create_table(schema=RichTextDocument, if_exists="overwrite")

    # Insert sample data ...

    # Search using image vector field
    results = (
        table.search([1, 2, 3])
            .vector_column("image_vec")
            .distance_metric(DistanceMetric.COSINE)
            .limit(10)
            .to_list()
    )
    ```

=== "SQL"

    You can create multiple vector columns in a table and search them using suitable distance functions:

    ```sql
    -- Create a table with multiple vector fields
    CREATE TABLE rich_text_documents (
        id BIGINT PRIMARY KEY,
        text TEXT,
        text_vec VECTOR(3),
        image_url VARCHAR(255),
        image_vec VECTOR(3)
    );

    -- Insert sample data ...

    -- Search using the image vector
    SELECT id, image_url, vec_l2_distance(image_vec, '[4,5,6]') AS image_distance
    FROM rich_text_documents
    ORDER BY image_distance
    LIMIT 10;
    ```


## Output search results

=== "Python"

    The `table.search()` API lets you convert search results into several common data processing formats:

    ### As SQLAlchemy result rows

    To work with raw SQLAlchemy result rows, use:

    ```python
    table.search([1, 2, 3]).limit(10).to_rows()
    ```

    ### As a list of Python dictionaries

    For easier manipulation in Python, convert the results to a list of dictionaries:

    ```python
    table.search([1, 2, 3]).limit(10).to_list()
    ```

    ### As a pandas DataFrame

    To display results in a user-friendly table—especially useful in Jupyter notebooks—convert them to a pandas DataFrame:

    ```python
    table.search([1, 2, 3]).limit(10).to_pandas()
    ```

    ### As a list of Pydantic model instances

    The `TableModel` class can also be used as a Pydantic model to represent data entities. To work with results as Pydantic model instances, use:

    ```python
    table.search([1, 2, 3]).limit(10).to_pydantic()
    ```
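
As a rough mental model of how these output shapes relate, the sketch below converts one result set between row tuples, dictionaries, and typed objects using only the standard library. It does not reproduce pytidb's implementation. The `Hit` dataclass, the column names, and the sample rows are hypothetical.

```python
from dataclasses import asdict, dataclass

# Hypothetical result set: column names plus row tuples.
columns = ("id", "text", "distance")
raw_rows = [(1, "hello", 0.0), (2, "world", 0.5)]

# Row tuples -> list of dictionaries keyed by column name.
as_dicts = [dict(zip(columns, row)) for row in raw_rows]


@dataclass
class Hit:
    """Typed view of one search result (stand-in for a model instance)."""
    id: int
    text: str
    distance: float


# Dictionaries -> typed objects with attribute access.
as_models = [Hit(**item) for item in as_dicts]
```

Dictionaries are convenient for serialization, while typed objects give attribute access and editor support.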

================================================
FILE: src/ai/index.md
================================================
# TiDB for AI

TiDB is an open-source, distributed SQL database designed for modern AI applications, offering seamless scalability, real-time analytics, and unified storage for vectors, documents, knowledge graphs, operational data and more.

=== "Python"

    TiDB provides a Python SDK and a series of integrations with popular AI frameworks to help developers build AI applications efficiently.

    To install the TiDB Python SDK, run the following command:

    ```bash
    pip install pytidb
    ```

    Integrations:

    - AI Frameworks: [LlamaIndex](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-llamaindex/), [LangChain](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-langchain/)
    - ORM Libraries: [SQLAlchemy](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-sqlalchemy/), [Django-ORM](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-django-orm/), [Peewee](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-peewee/)
    - AI Services: [Bedrock](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-amazon-bedrock/)
    - Embedding Models/Services: [JinaAI](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-jinaai-embedding/)

## Next Steps

- 📖 Explore [Concepts](./concepts/vector-search.md) to understand the fundamentals of building modern AI applications
- 🚀 Follow our [Quickstart Guide](./quickstart.md) to begin building your first AI application with TiDB


================================================
FILE: src/ai/integrations/embedding-cohere.md
================================================
---
title: "Integrate TiDB Vector Search with Cohere Embeddings API"
description: "Learn how to integrate TiDB Vector Search with Cohere Embeddings API to store embeddings and perform semantic search."
keywords: "TiDB, Cohere, Vector search, text embeddings, multilingual embeddings"
---

# Integrate TiDB Vector Search with Cohere Embeddings API

This tutorial demonstrates how to use [Cohere](https://cohere.com/embed) to generate text embeddings, store them in TiDB vector storage, and perform semantic search.

!!! info

    Currently, [Server-Side Auto Embedding](../guides/auto-embedding.md) is only available on [TiDB Cloud Starter](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme) clusters in the following AWS regions:

    - `Frankfurt (eu-central-1)`
    - `Oregon (us-west-2)`
    - `N. Virginia (us-east-1)`

## Cohere Embeddings

Cohere offers multilingual embedding models for search, RAG, and classification. The latest `embed-v4.0` model supports text, images, and mixed content. You can use the Cohere Embeddings API with TiDB through the AI SDK or native SQL functions for automatic embedding generation.

### Supported Models

| Model Name                       | Dimensions | Max Input Tokens | Description |
|----------------------------------|------------|------------------|-------------|
| `cohere/embed-v4.0`             | 256, 512, 1024, 1536 (default) | 128k | Latest multimodal model supporting text, images, and mixed content (PDFs) |
| `cohere/embed-english-v3.0`     | 1024       | 512              | High-performance English embedding model optimized for search and classification |
| `cohere/embed-multilingual-v3.0`| 1024       | 512              | Multilingual model supporting 100+ languages |
| `cohere/embed-english-light-v3.0` | 384     | 512              | Lightweight English model for faster processing with similar performance |
| `cohere/embed-multilingual-light-v3.0` | 384 | 512          | Lightweight multilingual model for faster processing with similar performance |

For a complete list of supported models and detailed specifications, see the [Cohere Embeddings Documentation](https://docs.cohere.com/docs/cohere-embed).
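
Because the vector column dimension must match the model output, it can help to check the dimension before creating a table. The following is a small sketch of such a check. The dimension values are copied from the table above, while the `resolve_dimensions` helper is hypothetical and not part of pytidb or the Cohere SDK.

```python
# Dimensions copied from the "Supported Models" table; the first entry of
# each tuple is not necessarily the default, so defaults are listed separately.
COHERE_DIMENSIONS = {
    "cohere/embed-v4.0": (256, 512, 1024, 1536),
    "cohere/embed-english-v3.0": (1024,),
    "cohere/embed-multilingual-v3.0": (1024,),
    "cohere/embed-english-light-v3.0": (384,),
    "cohere/embed-multilingual-light-v3.0": (384,),
}
DEFAULT_DIMENSIONS = {"cohere/embed-v4.0": 1536}


def resolve_dimensions(model_name, requested=None):
    """Return a valid vector dimension for `model_name`, or raise ValueError."""
    if model_name not in COHERE_DIMENSIONS:
        raise ValueError(f"unknown model: {model_name}")
    allowed = COHERE_DIMENSIONS[model_name]
    if requested is None:
        # Fall back to the documented default, or the only supported size.
        return DEFAULT_DIMENSIONS.get(model_name, allowed[0])
    if requested not in allowed:
        raise ValueError(f"{model_name} supports {allowed}, not {requested}")
    return requested
```

For example, `resolve_dimensions("cohere/embed-v4.0")` yields the default size used for the `VECTOR(1536)` column later in this tutorial.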

## Usage example

This example demonstrates creating a vector table, inserting documents, and performing similarity search using Cohere embedding models.

### Step 1: Connect to the database

=== "Python"

    ```python
    from pytidb import TiDBClient

    tidb_client = TiDBClient.connect(
        host="{gateway-region}.prod.aws.tidbcloud.com",
        port=4000,
        username="{prefix}.root",
        password="{password}",
        database="{database}",
        ensure_db=True,
    )
    ```

=== "SQL"

    ```bash
    mysql -h {gateway-region}.prod.aws.tidbcloud.com \
        -P 4000 \
        -u {prefix}.root \
        -p{password} \
        -D {database}
    ```
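
To keep credentials out of source code, the connection parameters can be read from environment variables. The sketch below builds the same keyword arguments used in the Python example above. The `TIDB_*` variable names and the `connection_params` helper are assumptions made for illustration, not a pytidb convention.

```python
import os

REQUIRED_VARS = ("TIDB_HOST", "TIDB_USERNAME", "TIDB_PASSWORD", "TIDB_DATABASE")


def connection_params(env=None):
    """Build keyword arguments for TiDBClient.connect from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "host": env["TIDB_HOST"],
        "port": int(env.get("TIDB_PORT", "4000")),  # 4000 as in the example above
        "username": env["TIDB_USERNAME"],
        "password": env["TIDB_PASSWORD"],
        "database": env["TIDB_DATABASE"],
        "ensure_db": True,
    }


# Usage (requires pytidb and a reachable cluster):
# tidb_client = TiDBClient.connect(**connection_params())
```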

### Step 2: Configure the API key

Create your API key from the [Cohere Dashboard](https://dashboard.cohere.com/api-keys) and bring your own key (BYOK) to use the embedding service.

=== "Python"

    Configure the API key for the Cohere embedding provider using the TiDB Client:

    ```python
    tidb_client.configure_embedding_provider(
        provider="cohere",
        api_key="{your-cohere-api-key}",
    )
    ```

=== "SQL"

    Set the API key for the Cohere embedding provider using SQL:

    ```sql
    SET @@GLOBAL.TIDB_EXP_EMBED_COHERE_API_KEY = "{your-cohere-api-key}";
    ```

### Step 3: Create a vector table

Create a table with a vector field that uses the `cohere/embed-v4.0` model to generate 1536-dimensional vectors (default dimension):

=== "Python"

    ```python
    from pytidb.schema import TableModel, Field
    from pytidb.embeddings import EmbeddingFunction
    from pytidb.datatype import TEXT

    class Document(TableModel):
        __tablename__ = "sample_documents"
        id: int = Field(primary_key=True)
        content: str = Field(sa_type=TEXT)
        embedding: list[float] = EmbeddingFunction(
            model_name="cohere/embed-v4.0"
        ).VectorField(source_field="content")

    table = tidb_client.create_table(schema=Document, if_exists="overwrite")
    ```

=== "SQL"

    ```sql
    CREATE TABLE sample_documents (
        `id`        INT PRIMARY KEY,
        `content`   TEXT,
        `embedding` VECTOR(1536) GENERATED ALWAYS AS (EMBED_TEXT(
            "cohere/embed-v4.0",
            `content`
        )) STORED
    );
    ```

### Step 4: Insert data into the table

=== "Python"

    Use the `table.insert()` or `table.bulk_insert()` API to add data:

    ```python
    documents = [
        Document(id=1, content="Python: High-level programming language for data science and web development."),
        Document(id=2, content="Python snake: Non-venomous constrictor found in tropical regions."),
        Document(id=3, content="Python framework: Django and Flask are popular web frameworks."),
        Document(id=4, content="Python libraries: NumPy and Pandas for data analysis."),
        Document(id=5, content="Python ecosystem: Rich collection of packages and tools."),
    ]
    table.bulk_insert(documents)
    ```

=== "SQL"

    Insert data using the `INSERT INTO` statement:

    ```sql
    INSERT INTO sample_documents (id, content)
    VALUES
        (1, "Python: High-level programming language for data science and web development."),
        (2, "Python snake: Non-venomous constrictor found in tropical regions."),
        (3, "Python framework: Django and Flask are popular web frameworks."),
        (4, "Python libraries: NumPy and Pandas for data analysis."),
        (5, "Python ecosystem: Rich collection of packages and tools.");
    ```

### Step 5: Search for similar documents

=== "Python"

    Use the `table.search()` API to perform vector search:

    ```python
    results = table.search("How to learn Python programming?") \
        .limit(2) \
        .to_list()
    print(results)
    ```

=== "SQL"

    Use the `VEC_EMBED_COSINE_DISTANCE` function to perform vector search based on the cosine distance metric:

    ```sql
    SELECT
        `id`,
        `content`,
        VEC_EMBED_COSINE_DISTANCE(embedding, "How to learn Python programming?") AS _distance
    FROM sample_documents
    ORDER BY _distance ASC
    LIMIT 2;
    ```
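
The query above embeds the search text and then orders rows by cosine distance. As a hedged illustration of the ranking step only, the sketch below computes cosine distance (one minus cosine similarity) and keeps the two nearest items. The toy two-dimensional vectors stand in for real embeddings, and no embedding model is called.

```python
import heapq
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical stored embeddings keyed by document id.
docs = {
    1: [1.0, 0.0],
    2: [0.0, 1.0],
    3: [0.7, 0.7],
    4: [-1.0, 0.0],
}
query = [2.0, 0.0]  # same direction as document 1, different length

# Equivalent in spirit to ORDER BY _distance ASC LIMIT 2.
top2 = heapq.nsmallest(
    2, ((cosine_distance(vec, query), doc_id) for doc_id, vec in docs.items())
)
```

Cosine distance ignores vector length, which is why the query matches document 1 exactly even though its magnitude differs.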


================================================
FILE: src/ai/integrations/embedding-gemini.md
================================================
---
title: "Integrate TiDB Vector Search with Google Gemini Embeddings API"
description: "Learn how to integrate TiDB Vector Search with Google Gemini Embeddings API to store embeddings and perform semantic search."
keywords: "TiDB, Google Gemini, Vector search, text embeddings, multimodal embeddings"
---

# Integrate TiDB Vector Search with Google Gemini Embeddings API

This tutorial demonstrates how to use [Google Gemini](https://ai.google.dev/models/gemini) to generate embeddings for text and image data, store them in TiDB vector storage, and perform semantic search.

!!! info

    Currently, [Server-Side Auto Embedding](../guides/auto-embedding.md) is only available on [TiDB Cloud Starter](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme) clusters in the following AWS regions:

    - `Frankfurt (eu-central-1)`
    - `Oregon (us-west-2)`
    - `N. Virginia (us-east-1)`

## Google Gemini Embeddings

The Gemini API provides text embedding models that generate embeddings for words, phrases, sentences, and code. These embeddings enable advanced natural language processing (NLP) tasks such as semantic search, classification, and clustering. By using context-aware embeddings, you can achieve more accurate results than with traditional keyword-based methods.

### Supported Models

| Model Name | Dimensions (recommended) | Max Input Tokens | Description |
|------------|--------------------------|------------|-------------|
| `gemini-embedding-001` | 128–3072 (768, 1536, 3072) | 2048 | Text and code embeddings |

For a complete list of supported models an


SYMBOL INDEX (25 symbols across 3 files)

FILE: scripts/check_dependencies.py
  function main (line 7) | def main():

FILE: scripts/generate_demos.py
  function load_config (line 25) | def load_config():
  function fetch_readme_from_doc_link (line 38) | def fetch_readme_from_doc_link(doc_link):
  function extract_repo_info_from_doc_link (line 60) | def extract_repo_info_from_doc_link(doc_link):
  function process_readme_content (line 90) | def process_readme_content(content, demo_config):
  function create_demo_page (line 122) | def create_demo_page(demo_config, content):
  function generate_gallery_page (line 153) | def generate_gallery_page(config):
  function sync_demo_docs (line 179) | def sync_demo_docs(config, fetch_from_remote=True):
  function main (line 222) | def main(skip_demos, skip_gallery, verbose):

FILE: scripts/sync_from_tidb_docs.py
  function remove_custom_content_blocks (line 6) | def remove_custom_content_blocks(content):
  function collapse_extra_blank_lines (line 11) | def collapse_extra_blank_lines(content):
  function convert_note_blocks (line 16) | def convert_note_blocks(content):
  function convert_warning_blocks (line 29) | def convert_warning_blocks(content):
  function convert_tip_blocks (line 42) | def convert_tip_blocks(content):
  function remove_see_also_section (line 55) | def remove_see_also_section(content):
  function replace_image_paths (line 60) | def replace_image_paths(content):
  function replace_relative_doc_links (line 67) | def replace_relative_doc_links(content):
  function remove_overview_from_title (line 80) | def remove_overview_from_title(content):
  function remove_front_matter (line 90) | def remove_front_matter(content):
  function remove_simpletab_blocks (line 95) | def remove_simpletab_blocks(content):
  function process_overview (line 103) | def process_overview():
  function process_llamaindex (line 117) | def process_llamaindex():
  function process_langchain (line 134) | def process_langchain():
  function save_to_file (line 151) | def save_to_file(content, filename):
  function main (line 157) | def main():


Condensed preview — 62 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (335K chars).

[
  {
    "path": ".cursor/rules/styleguide.mdc",
    "chars": 2552,
    "preview": "---\ndescription: \nglobs: \nalwaysApply: true\n---\n# Documentation Review Style Guide\n\n## Behavior instruction\n\nYou are act"
  },
  {
    "path": ".github/workflows/docs.yml",
    "chars": 981,
    "preview": "name: Docs Publish\n\non:\n  push:\n    branches:\n      - main\n\npermissions:\n  contents: write\n\njobs:\n  deploy:\n    runs-on:"
  },
  {
    "path": ".github/workflows/sitemap-sync.yml",
    "chars": 1436,
    "preview": "name: Sync Sitemap\n\non:\n  push:\n    branches:\n      - gh-pages\n    paths:\n      - 'sitemap.xml'\n\npermissions:\n  contents"
  },
  {
    "path": ".gitignore",
    "chars": 3440,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
  },
  {
    "path": ".vscode/settings.json",
    "chars": 224,
    "preview": "{\n    \"cSpell.words\": [\n        \"FULLTEXT\",\n        \"Pydantic\",\n        \"getenv\",\n        \"jina\",\n        \"jinaai\",\n    "
  },
  {
    "path": "DEVELOPMENT.md",
    "chars": 2315,
    "preview": "# Development Guide\n\nThis guide covers the development setup and maintenance of the TiDB for AI documentation site.\n\n## "
  },
  {
    "path": "LICENSE",
    "chars": 22200,
    "preview": "Attribution-ShareAlike 3.0 Unported\n\n=======================================================================\n\nCREATIVE C"
  },
  {
    "path": "Makefile",
    "chars": 1340,
    "preview": "# TiDB for AI Documentation Site\n\n.PHONY: help install generate-demos generate-demo-pages generate-demo-gallery serve bu"
  },
  {
    "path": "README.md",
    "chars": 2536,
    "preview": "# TiDB for AI\n\nTiDB is an open-source, distributed SQL database designed for modern AI applications, offering seamless s"
  },
  {
    "path": "demos.yml",
    "chars": 5491,
    "preview": "# Demo Gallery Configuration\ntitle: \"Demo Gallery\"\ndescription: |\n  Explore hands-on demos showcasing how TiDB empowers "
  },
  {
    "path": "mkdocs.yml",
    "chars": 7120,
    "preview": "site_name: TiDB for AI\nsite_url: https://pingcap.github.io\nrepo_name: pingcap/tidb\nrepo_url: https://github.com/pingcap/"
  },
  {
    "path": "pyproject.toml",
    "chars": 617,
    "preview": "# TiDB for AI Documentation\n# This project uses UV for fast dependency management\n# Install UV: curl -LsSf https://astra"
  },
  {
    "path": "scripts/check_dependencies.py",
    "chars": 954,
    "preview": "#!/usr/bin/env python3\n\"\"\"Simple dependency check for TiDB for AI documentation.\"\"\"\n\nimport sys\nimport subprocess\n\ndef m"
  },
  {
    "path": "scripts/generate_demos.py",
    "chars": 9934,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nGenerate demo gallery and individual demo documentation pages.\nThis script reads configuratio"
  },
  {
    "path": "scripts/sync_from_tidb_docs.py",
    "chars": 6505,
    "preview": "import requests\nimport re\nimport sys\n\n\ndef remove_custom_content_blocks(content):\n    \"\"\"Remove <CustomContent ...>...</"
  },
  {
    "path": "src/ai/concepts/vector-search.md",
    "chars": 5628,
    "preview": "---\ntitle: Vector Search Overview\nsummary: Learn about Vector Search in TiDB. This feature provides an advanced search s"
  },
  {
    "path": "src/ai/examples/auto-embedding-with-pytidb.md",
    "chars": 2625,
    "preview": "---\ntitle: Auto Embedding\ndescription: \"Automatically generate embeddings for your text data using built-in embedding mo"
  },
  {
    "path": "src/ai/examples/basic-with-pytidb.md",
    "chars": 2733,
    "preview": "---\ntitle: Basic Usage\ndescription: \"Learn fundamental PyTiDB operations including database connection, table creation, "
  },
  {
    "path": "src/ai/examples/fulltext-search-with-pytidb.md",
    "chars": 2095,
    "preview": "---\ntitle: Fulltext Search\ndescription: \"Perform traditional text search using MySQL fulltext search capabilities.\"\nsour"
  },
  {
    "path": "src/ai/examples/hybrid-search-with-pytidb.md",
    "chars": 3238,
    "preview": "---\ntitle: Hybrid Search\ndescription: \"Combine vector search and fulltext search for more comprehensive results.\"\nsource"
  },
  {
    "path": "src/ai/examples/image-search-with-pytidb.md",
    "chars": 3376,
    "preview": "---\ntitle: Image Search\ndescription: \"Build an image search application using multimodal embeddings for both text-to-ima"
  },
  {
    "path": "src/ai/examples/index.md",
    "chars": 16460,
    "preview": "---\ntitle: Demo Gallery\ndescription: Explore hands-on demos showcasing how TiDB empowers AI applications. Get started qu"
  },
  {
    "path": "src/ai/examples/memory-with-pytidb.md",
    "chars": 5393,
    "preview": "---\ntitle: Memory\ndescription: \"Implement conversation memory for chatbots and conversational AI applications.\"\nsource_r"
  },
  {
    "path": "src/ai/examples/rag-with-pytidb.md",
    "chars": 2708,
    "preview": "---\ntitle: RAG\ndescription: \"Build a RAG application that combines document retrieval with language generation.\"\nsource_"
  },
  {
    "path": "src/ai/examples/text2sql-with-pytidb.md",
    "chars": 1664,
    "preview": "---\ntitle: Text2SQL\ndescription: \"Convert natural language queries into SQL statements using AI models.\"\nsource_repo: \"h"
  },
  {
    "path": "src/ai/examples/vector-search-with-pytidb.md",
    "chars": 2511,
    "preview": "---\ntitle: Vector Search\ndescription: \"Implement semantic search using vector embeddings to find similar content.\"\nsourc"
  },
  {
    "path": "src/ai/guides/auto-embedding.md",
    "chars": 2483,
    "preview": "# Auto Embedding\n\nAuto embedding is a feature that allows you to automatically generate vector embeddings for text data."
  },
  {
    "path": "src/ai/guides/connect.md",
    "chars": 4869,
    "preview": "# Connect to database\n\nIn this guide, we will introduce how to connect to a TiDB database using the TiDB client.\n\n## Ins"
  },
  {
    "path": "src/ai/guides/filtering.md",
    "chars": 5157,
    "preview": "# Filtering\n\nAs a relational database, TiDB supports a rich set of [SQL operators](https://docs.pingcap.com/tidbcloud/op"
  },
  {
    "path": "src/ai/guides/fulltext-search.md",
    "chars": 9899,
    "preview": "# Full-text Search\n\n**Full-text search** enables you to find documents or data by matching keywords or phrases within th"
  },
  {
    "path": "src/ai/guides/hybrid-search.md",
    "chars": 6816,
    "preview": "# Hybrid Search\n\nHybrid search is a technique that combines multiple search algorithms to deliver more accurate and rele"
  },
  {
    "path": "src/ai/guides/image-search.md",
    "chars": 3916,
    "preview": "# Image search\n\n**Image search** helps you find similar images by comparing their visual content, not just text or metad"
  },
  {
    "path": "src/ai/guides/joins.md",
    "chars": 3339,
    "preview": "# Multiple Table Joins\n\nAs a relational database, TiDB allows you to store diverse data in tables with different structu"
  },
  {
    "path": "src/ai/guides/raw-queries.md",
    "chars": 2004,
    "preview": "# Raw Queries\n\n## Operate data with raw SQL\n\nYou can use `client.execute()` method to execute `INSERT`, `UPDATE`, `DELET"
  },
  {
    "path": "src/ai/guides/reranking.md",
    "chars": 1484,
    "preview": "# Reranking\n\nReranking is a technique used to improve the relevance and accuracy of search results by re-evaluating and "
  },
  {
    "path": "src/ai/guides/tables.md",
    "chars": 10209,
    "preview": "# Working with tables\n\nTiDB uses tables to organize and store collections of related data. It provides flexible schema d"
  },
  {
    "path": "src/ai/guides/transaction.md",
    "chars": 924,
    "preview": "# Transaction\n\nTiDB supports ACID transactions, which ensure data consistency and reliability.\n\n## Basic Usage\n\n=== \"Pyt"
  },
  {
    "path": "src/ai/guides/vector-search.md",
    "chars": 17891,
    "preview": "# Vector Search\n\nVector search uses semantic similarity to help you find the most relevant records, even if your query d"
  },
  {
    "path": "src/ai/index.md",
    "chars": 1484,
    "preview": "# TiDB for AI\n\nTiDB is an open-source, distributed SQL database designed for modern AI applications, offering seamless s"
  },
  {
    "path": "src/ai/integrations/embedding-cohere.md",
    "chars": 6330,
    "preview": "---\ntitle: \"Integrate TiDB Vector Search with Cohere Embeddings API\"\ndescription: \"Learn how to integrate TiDB Vector Se"
  },
  {
    "path": "src/ai/integrations/embedding-gemini.md",
    "chars": 7060,
    "preview": "---\ntitle: \"Integrate TiDB Vector Search with Google Gemini Embeddings API\"\ndescription: \"Learn how to integrate TiDB Ve"
  },
  {
    "path": "src/ai/integrations/embedding-huggingface.md",
    "chars": 7007,
    "preview": "---\ntitle: \"Integrate TiDB Vector Search with Hugging Face Embeddings\"\ndescription: \"Learn how to integrate TiDB Vector "
  },
  {
    "path": "src/ai/integrations/embedding-jinaai.md",
    "chars": 5710,
    "preview": "---\ntitle: \"Integrate TiDB Vector Search with Jina AI Embeddings API\"\ndescription: \"Learn how to integrate TiDB Vector S"
  },
  {
    "path": "src/ai/integrations/embedding-nvidia-nim.md",
    "chars": 6336,
    "preview": "---\ntitle: \"Integrate TiDB Vector Search with NVIDIA NIM Embeddings\"\ndescription: \"Learn how to integrate TiDB Vector Se"
  },
  {
    "path": "src/ai/integrations/embedding-openai-compatible.md",
    "chars": 4703,
    "preview": "---\ntitle: \"Integrate TiDB Vector Search with OpenAI-Compatible Embedding API\"\ndescription: \"Learn how to integrate TiDB"
  },
  {
    "path": "src/ai/integrations/embedding-openai.md",
    "chars": 5383,
    "preview": "---\ntitle: \"Integrate TiDB Vector Search with OpenAI Embeddings API\"\ndescription: \"Learn how to integrate TiDB Vector Se"
  },
  {
    "path": "src/ai/integrations/embedding-overview.md",
    "chars": 2624,
    "preview": "# Embeddings Integration\n\n## Overview\n\nTiDB provides a unified interface for integrating with various embedding provider"
  },
  {
    "path": "src/ai/integrations/embedding-tidb-cloud-hosted.md",
    "chars": 5259,
    "preview": "---\ntitle: \"Integrate TiDB Vector Search with TiDB Cloud Hosted Embedding Models\"\ndescription: \"Learn how to integrate T"
  },
  {
    "path": "src/ai/integrations/langchain.md",
    "chars": 25231,
    "preview": "\n# Integrate Vector Search with LangChain\n\nThis tutorial demonstrates how to integrate the [vector search](https://docs."
  },
  {
    "path": "src/ai/integrations/llamaindex.md",
    "chars": 10688,
    "preview": "\n# Integrate Vector Search with LlamaIndex\n\nThis tutorial demonstrates how to integrate the [vector search](https://docs"
  },
  {
    "path": "src/ai/integrations/tidb-mcp-claude-code.md",
    "chars": 2926,
    "preview": "---\ntitle: Get started with Claude Code and TiDB MCP Server\ndescription: This guide shows you how to configure the TiDB "
  },
  {
    "path": "src/ai/integrations/tidb-mcp-claude-desktop.md",
    "chars": 2001,
    "preview": "---\ntitle: Get started with Claude Desktop and TiDB MCP Server\ndescription: This guide shows you how to configure the Ti"
  },
  {
    "path": "src/ai/integrations/tidb-mcp-cursor.md",
    "chars": 3405,
    "preview": "---\ntitle: Get started with Cursor and TiDB MCP Server\ndescription: This guide shows you how to configure the TiDB MCP S"
  },
  {
    "path": "src/ai/integrations/tidb-mcp-server.md",
    "chars": 5680,
    "preview": "---\ntitle: TiDB MCP Server\ndescription: Manage your TiDB databases using natural language instructions with the TiDB MCP"
  },
  {
    "path": "src/ai/integrations/tidb-mcp-vscode.md",
    "chars": 2457,
    "preview": "---\ntitle: Get started with Visual Studio Code and TiDB MCP Server\ndescription: This guide shows you how to configure th"
  },
  {
    "path": "src/ai/integrations/tidb-mcp-windsurf.md",
    "chars": 2550,
    "preview": "---\ntitle: Get started with Windsurf and TiDB MCP Server\ndescription: This guide shows you how to configure the TiDB MCP"
  },
  {
    "path": "src/ai/javascripts/mathjax.js",
    "chars": 426,
    "preview": "window.MathJax = {\n    tex: {\n        inlineMath: [[\"\\\\(\", \"\\\\)\"]],\n        displayMath: [[\"\\\\[\", \"\\\\]\"]],\n        proce"
  },
  {
    "path": "src/ai/quickstart.md",
    "chars": 6865,
    "preview": "---\ntitle: Quickstart\ndescription: Get started with TiDB using Python SDK.\n---\n\n# Quickstart\n\nIn this guide, you will le"
  },
  {
    "path": "src/robots.txt",
    "chars": 72,
    "preview": "User-agent: *\nAllow: /\n\nSitemap:  https://pingcap.github.io/sitemap.xml\n"
  },
  {
    "path": "src/styles/extra.css",
    "chars": 7703,
    "preview": "/* Dark mode */\n[data-md-color-scheme=\"tidb-dark\"] {\n\n    --md-default-fg-color:               hsla(0, 0%, 95%, 1);\n    "
  },
  {
    "path": "src/templates/demo_gallery_template.j2",
    "chars": 9981,
    "preview": "---\ntitle: {{ config.title }}\ndescription: {{ config.meta.description }}\nhide:\n  - navigation\n  - toc\n  - pageTitle\n  - "
  },
  {
    "path": "src/templates/demo_page_template.j2",
    "chars": 465,
    "preview": "---\ntitle: {{ demo.title }}\ndescription: \"{{ demo.description }}\"\nsource_repo: \"{{ demo.doc_link | replace('/README.md',"
  }
]
