Repository: pingcap/ossinsight-lite
Branch: main
Commit: 86e9015a8709
Files: 62
Total size: 310.0 KB
Directory structure:
gitextract_rmebsf2k/
├── .cursor/
│   └── rules/
│       └── styleguide.mdc
├── .github/
│   └── workflows/
│       ├── docs.yml
│       └── sitemap-sync.yml
├── .gitignore
├── .vscode/
│   └── settings.json
├── DEVELOPMENT.md
├── LICENSE
├── Makefile
├── README.md
├── demos.yml
├── mkdocs.yml
├── pyproject.toml
├── scripts/
│   ├── check_dependencies.py
│   ├── generate_demos.py
│   └── sync_from_tidb_docs.py
└── src/
    ├── ai/
    │   ├── concepts/
    │   │   └── vector-search.md
    │   ├── examples/
    │   │   ├── auto-embedding-with-pytidb.md
    │   │   ├── basic-with-pytidb.md
    │   │   ├── fulltext-search-with-pytidb.md
    │   │   ├── hybrid-search-with-pytidb.md
    │   │   ├── image-search-with-pytidb.md
    │   │   ├── index.md
    │   │   ├── memory-with-pytidb.md
    │   │   ├── rag-with-pytidb.md
    │   │   ├── text2sql-with-pytidb.md
    │   │   └── vector-search-with-pytidb.md
    │   ├── guides/
    │   │   ├── auto-embedding.md
    │   │   ├── connect.md
    │   │   ├── filtering.md
    │   │   ├── fulltext-search.md
    │   │   ├── hybrid-search.md
    │   │   ├── image-search.md
    │   │   ├── joins.md
    │   │   ├── raw-queries.md
    │   │   ├── reranking.md
    │   │   ├── tables.md
    │   │   ├── transaction.md
    │   │   └── vector-search.md
    │   ├── index.md
    │   ├── integrations/
    │   │   ├── embedding-cohere.md
    │   │   ├── embedding-gemini.md
    │   │   ├── embedding-huggingface.md
    │   │   ├── embedding-jinaai.md
    │   │   ├── embedding-nvidia-nim.md
    │   │   ├── embedding-openai-compatible.md
    │   │   ├── embedding-openai.md
    │   │   ├── embedding-overview.md
    │   │   ├── embedding-tidb-cloud-hosted.md
    │   │   ├── langchain.md
    │   │   ├── llamaindex.md
    │   │   ├── tidb-mcp-claude-code.md
    │   │   ├── tidb-mcp-claude-desktop.md
    │   │   ├── tidb-mcp-cursor.md
    │   │   ├── tidb-mcp-server.md
    │   │   ├── tidb-mcp-vscode.md
    │   │   └── tidb-mcp-windsurf.md
    │   ├── javascripts/
    │   │   └── mathjax.js
    │   └── quickstart.md
    ├── robots.txt
    ├── styles/
    │   └── extra.css
    └── templates/
        ├── demo_gallery_template.j2
        └── demo_page_template.j2
================================================
FILE CONTENTS
================================================
================================================
FILE: .cursor/rules/styleguide.mdc
================================================
---
description:
globs:
alwaysApply: true
---
# Documentation Review Style Guide
## Behavior instruction
You are acting as a **senior technical writer** who is reviewing TiDB documentation pull requests and you always provide ready-to-commit doc suggestions so the PR author can commit them directly.
## Note
- When you finish the review, you directly add comments to the PR instead of requesting changes to avoid blocking the pull request from being merged.
- If the PR author is ti-chi-bot, you only correct English grammar, spelling, and punctuation mistakes, if any.
## Review aspects
- Clarity, simplicity, completeness, and readability
- Logical flow and sentence structure
- Technical accuracy and terminology consistency
## General writing principles
- Correct English grammar, spelling, and punctuation mistakes, if any.
- Make sure the documentation is easy to understand for TiDB users.
- Write in **second person** ("you") when addressing users.
- Prefer **present tense** unless describing historical behavior.
- Avoid unnecessary words and repetition.
- Use **consistent terminology**. For example:
    - ❌ Do not mix "database" and "instance"
    - ✅ Use "replicate" instead of "synchronize" when referring to replicating data from one TiDB cluster to another.
## Structure and format
- Use sentence case for headings (e.g., `## Configure the cluster`).
- Use ordered lists (`1.`, `2.`) for steps.
- Code snippets, command names, options, and paths should be in backticks (`` ` ``).
## Markdown style
- Add a blank line before and after headings and lists.
- Use proper heading hierarchy (no jumping from `##` to `####`).
## Common issues to flag
- Passive voice overuse
_"The cluster is started by TiUP"_ → _"TiUP starts the cluster"_
- Inconsistent use of technical terms
_"TiDB Cloud Starter clusters" vs. "TiDB Cloud clusters"_ – pick one.
- Unclear step instructions
_"Do it like before"_ → _"Repeat step 3 using the updated config file"_
- Grammar and spelling issues
_"recieve"_ → _"receive"_, _"an TiDB instance"_ → _"a TiDB instance"_
## Special notes
- Follow any existing terminology in our glossary (`/glossary.md` if available).
- When in doubt, favor clarity over cleverness.
- If something might confuse a new user, suggest a reword.
## Purposes of this style guide
This guide helps Gemini Code Assist provide actionable, high-quality suggestions for improving technical documentation, especially for PRs related to user guides, how-to articles, and product reference material.
================================================
FILE: .github/workflows/docs.yml
================================================
name: Docs Publish
on:
  push:
    branches:
      - main
permissions:
  contents: write
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure Git Credentials
        run: |
          git config user.name github-actions[bot]
          git config user.email 41898282+github-actions[bot]@users.noreply.github.com
      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          enable-cache: true
      - name: Set up Python
        run: uv python install
      - run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV
      - uses: actions/cache@v4
        with:
          key: mkdocs-material-${{ env.cache_id }}
          path: .cache
          restore-keys: |
            mkdocs-material-
      - name: Install dependencies
        run: uv sync
      - name: Generate demo pages and gallery
        run: make generate-demos
      - name: Deploy docs
        run: uv run mkdocs gh-deploy --force
================================================
FILE: .github/workflows/sitemap-sync.yml
================================================
name: Sync Sitemap
on:
  push:
    branches:
      - gh-pages
    paths:
      - 'sitemap.xml'
permissions:
  contents: write
jobs:
  sync-sitemap:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          ref: gh-pages
          token: ${{ secrets.GITHUB_TOKEN }}
      - name: Configure Git Credentials
        run: |
          git config user.name github-actions[bot]
          git config user.email 41898282+github-actions[bot]@users.noreply.github.com
      - name: Check if sitemap.xml exists in root
        id: check-sitemap
        run: |
          if [ -f "sitemap.xml" ]; then
            echo "exists=true" >> $GITHUB_OUTPUT
          else
            echo "exists=false" >> $GITHUB_OUTPUT
          fi
      - name: Create ai directory if it doesn't exist
        if: steps.check-sitemap.outputs.exists == 'true'
        run: mkdir -p ai
      - name: Copy sitemap.xml to ai/sitemap.xml
        if: steps.check-sitemap.outputs.exists == 'true'
        run: cp sitemap.xml ai/sitemap.xml
      - name: Commit and push changes
        if: steps.check-sitemap.outputs.exists == 'true'
        run: |
          git add ai/sitemap.xml
          if git diff --staged --quiet; then
            echo "No changes to commit"
          else
            git commit -m "Sync sitemap.xml to ai/sitemap.xml"
            git push origin gh-pages
          fi
================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# UV
# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
#uv.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/
# Ruff stuff:
.ruff_cache/
# PyPI configuration file
.pypirc
================================================
FILE: .vscode/settings.json
================================================
{
  "cSpell.words": [
    "FULLTEXT",
    "Pydantic",
    "getenv",
    "jina",
    "jinaai",
    "Rerank",
    "reranker",
    "reranking",
    "tablename",
    "multimodal"
  ]
}
================================================
FILE: DEVELOPMENT.md
================================================
# Development Guide
This guide covers the development setup and maintenance of the TiDB for AI documentation site.
## Quick Start
1. **Install uv** (a fast Python package and dependency manager):
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
2. **Install dependencies**:
```bash
make install
```
3. **Start development server**:
```bash
make serve
```
## Project Structure
```
├── demos.yml # Demo Gallery configuration
├── scripts/generate_demos.py # Demo Gallery generation script
├── src/
│ ├── templates/ # Jinja2 templates
│ └── ai/examples/ # Generated demo pages
└── Makefile # Build commands
```
## Helpful Commands
```bash
# Dependencies
make check # Check dependencies and setup
make install # Install/update dependencies
# Development workflow
make serve # Start development server
make build # Build documentation site
make clean # Clean build artifacts
# Demo management
make generate-demos # Generate demo pages and gallery from demos.yml config
make generate-demo-pages # Generate only the demo pages
make generate-demo-gallery # Generate only the demo gallery
# Other
make help # Show all available commands
```
## Maintain the Demo Gallery
The Demo Gallery showcases AI demos of TiDB and is configured via [demos.yml](demos.yml).
To regenerate the demo gallery from configuration, run:
```bash
make generate-demos
```
### How to add a new demo
To add a new demo, follow these steps:
1. Add an entry to the `demos` array in `demos.yml` with a unique `id`, title, description, and display properties.
For example:
```yaml
demos:
  - id: "basic"
    title: "Basic Usage"
    description: "Learn fundamental PyTiDB operations"
    icon: "⚙️"
    background: "linear-gradient(135deg, #10b981, var(--brand-color))"
    link: "basic-with-pytidb/"
    doc_link: "https://github.com/pingcap/pytidb/tree/main/examples/basic/README.md"
    cover_image: null
```
2. Add the demo `id` to the appropriate category's `demos` array.
For example:
```yaml
categories:
  - id: "featured"
    title: "Search"
    demos: ["image-search"]
```
3. Run `make generate-demos` to regenerate the demo pages and gallery.
4. Commit the changes.
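
Before you regenerate, it can help to confirm that every `id` is unique and that each category only references demos that exist. The following is a minimal sketch of such a check, not part of this repository: the function name `find_config_problems` is hypothetical, and whether `scripts/generate_demos.py` performs similar validation is not shown here. The sketch works on a plain dictionary; in practice you could load `demos.yml` into one with PyYAML's `yaml.safe_load`.

```python
from typing import Any


def find_config_problems(config: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a demos.yml-style mapping.

    Hypothetical helper for illustration only; it assumes the `demos` and
    `categories` layout shown in the examples above.
    """
    problems: list[str] = []
    seen_ids: set[str] = set()

    # Every demo needs an `id`, and ids must be unique.
    for demo in config.get("demos", []):
        demo_id = demo.get("id")
        if not demo_id:
            problems.append("demo entry without an id")
        elif demo_id in seen_ids:
            problems.append(f"duplicate demo id: {demo_id}")
        else:
            seen_ids.add(demo_id)

    # Every id listed in a category must refer to a defined demo.
    for category in config.get("categories", []):
        for demo_id in category.get("demos", []):
            if demo_id not in seen_ids:
                problems.append(
                    f"category {category.get('id')!r} references unknown demo {demo_id!r}"
                )
    return problems


# The two YAML examples above, combined and written as plain Python data.
example = {
    "demos": [{"id": "basic", "title": "Basic Usage"}],
    "categories": [{"id": "featured", "title": "Search", "demos": ["image-search"]}],
}

for problem in find_config_problems(example):
    print(problem)
```

Run against the two excerpts above, this sketch reports one problem, because `image-search` is referenced by the `featured` category but only the `basic` demo is defined in the excerpt. In the full `demos.yml`, each referenced demo needs its own entry in the `demos` array.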
================================================
FILE: LICENSE
================================================
Attribution-ShareAlike 3.0 Unported
=======================================================================
CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
LEGAL SERVICES. DISTRIBUTION OF THIS LICENSE DOES NOT CREATE AN
ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
REGARDING THE INFORMATION PROVIDED, AND DISCLAIMS LIABILITY FOR DAMAGES
RESULTING FROM ITS USE.
License
THE WORK (AS DEFINED BELOW) IS PROVIDED UNDER THE TERMS OF THIS
CREATIVE COMMONS PUBLIC LICENSE ("CCPL" OR "LICENSE"). THE WORK IS
PROTECTED BY COPYRIGHT AND/OR OTHER APPLICABLE LAW. ANY USE OF THE WORK
OTHER THAN AS AUTHORIZED UNDER THIS LICENSE OR COPYRIGHT LAW IS
PROHIBITED.
BY EXERCISING ANY RIGHTS TO THE WORK PROVIDED HERE, YOU ACCEPT AND
AGREE TO BE BOUND BY THE TERMS OF THIS LICENSE. TO THE EXTENT THIS
LICENSE MAY BE CONSIDERED TO BE A CONTRACT, THE LICENSOR GRANTS YOU THE
RIGHTS CONTAINED HERE IN CONSIDERATION OF YOUR ACCEPTANCE OF SUCH TERMS
AND CONDITIONS.
1. Definitions
a. "Adaptation" means a work based upon the Work, or upon the Work
and other pre-existing works, such as a translation, adaptation,
derivative work, arrangement of music or other alterations of a
literary or artistic work, or phonogram or performance and includes
cinematographic adaptations or any other form in which the Work may
be recast, transformed, or adapted including in any form
recognizably derived from the original, except that a work that
constitutes a Collection will not be considered an Adaptation for
the purpose of this License. For the avoidance of doubt, where the
Work is a musical work, performance or phonogram, the
synchronization of the Work in timed-relation with a moving image
("synching") will be considered an Adaptation for the purpose of
this License.
b. "Collection" means a collection of literary or artistic works,
such as encyclopedias and anthologies, or performances, phonograms
or broadcasts, or other works or subject matter other than works
listed in Section 1(f) below, which, by reason of the selection and
arrangement of their contents, constitute intellectual creations,
in which the Work is included in its entirety in unmodified form
along with one or more other contributions, each constituting
separate and independent works in themselves, which together are
assembled into a collective whole. A work that constitutes a
Collection will not be considered an Adaptation (as defined below)
for the purposes of this License.
c. "Creative Commons Compatible License" means a license that is
listed at https://creativecommons.org/compatiblelicenses that has
been approved by Creative Commons as being essentially equivalent
to this License, including, at a minimum, because that license: (i)
contains terms that have the same purpose, meaning and effect as
the License Elements of this License; and, (ii) explicitly permits
the relicensing of adaptations of works made available under that
license under this License or a Creative Commons jurisdiction
license with the same License Elements as this License.
d. "Distribute" means to make available to the public the original
and copies of the Work or Adaptation, as appropriate, through sale
or other transfer of ownership.
e. "License Elements" means the following high-level license
attributes as selected by Licensor and indicated in the title of
this License: Attribution, ShareAlike.
f. "Licensor" means the individual, individuals, entity or entities
that offer(s) the Work under the terms of this License.
g. "Original Author" means, in the case of a literary or artistic
work, the individual, individuals, entity or entities who created
the Work or if no individual or entity can be identified, the
publisher; and in addition (i) in the case of a performance the
actors, singers, musicians, dancers, and other persons who act,
sing, deliver, declaim, play in, interpret or otherwise perform
literary or artistic works or expressions of folklore; (ii) in the
case of a phonogram the producer being the person or legal entity
who first fixes the sounds of a performance or other sounds; and,
(iii) in the case of broadcasts, the organization that transmits
the broadcast.
h. "Work" means the literary and/or artistic work offered under the
terms of this License including without limitation any production
in the literary, scientific and artistic domain, whatever may be
the mode or form of its expression including digital form, such as
a book, pamphlet and other writing; a lecture, address, sermon or
other work of the same nature; a dramatic or dramatico-musical
work; a choreographic work or entertainment in dumb show; a musical
composition with or without words; a cinematographic work to which
are assimilated works expressed by a process analogous to
cinematography; a work of drawing, painting, architecture,
sculpture, engraving or lithography; a photographic work to which
are assimilated works expressed by a process analogous to
photography; a work of applied art; an illustration, map, plan,
sketch or three-dimensional work relative to geography, topography,
architecture or science; a performance; a broadcast; a phonogram; a
compilation of data to the extent it is protected as a
copyrightable work; or a work performed by a variety or circus
performer to the extent it is not otherwise considered a literary
or artistic work.
i. "You" means an individual or entity exercising rights under this
License who has not previously violated the terms of this License
with respect to the Work, or who has received express permission
from the Licensor to exercise rights under this License despite a
previous violation.
j. "Publicly Perform" means to perform public recitations of the
Work and to communicate to the public those public recitations, by
any means or process, including by wire or wireless means or public
digital performances; to make available to the public Works in such
a way that members of the public may access these Works from a
place and at a place individually chosen by them; to perform the
Work to the public by any means or process and the communication to
the public of the performances of the Work, including by public
digital performance; to broadcast and rebroadcast the Work by any
means including signs, sounds or images.
k. "Reproduce" means to make copies of the Work by any means
including without limitation by sound or visual recordings and the
right of fixation and reproducing fixations of the Work, including
storage of a protected performance or phonogram in digital form or
other electronic medium.
2. Fair Dealing Rights. Nothing in this License is intended to reduce,
limit, or restrict any uses free from copyright or rights arising from
limitations or exceptions that are provided for in connection with the
copyright protection under copyright law or other applicable laws.
3. License Grant. Subject to the terms and conditions of this License,
Licensor hereby grants You a worldwide, royalty-free, non-exclusive,
perpetual (for the duration of the applicable copyright) license to
exercise the rights in the Work as stated below:
a. to Reproduce the Work, to incorporate the Work into one or more
Collections, and to Reproduce the Work as incorporated in the
Collections;
b. to create and Reproduce Adaptations provided that any such
Adaptation, including any translation in any medium, takes
reasonable steps to clearly label, demarcate or otherwise identify
that changes were made to the original Work. For example, a
translation could be marked "The original work was translated from
English to Spanish," or a modification could indicate "The original
work has been modified.";
c. to Distribute and Publicly Perform the Work including as
incorporated in Collections; and,
d. to Distribute and Publicly Perform Adaptations.
e. For the avoidance of doubt:
i. Non-waivable Compulsory License Schemes. In those
jurisdictions in which the right to collect royalties through
any statutory or compulsory licensing scheme cannot be waived,
the Licensor reserves the exclusive right to collect such
royalties for any exercise by You of the rights granted under
this License;
ii. Waivable Compulsory License Schemes. In those jurisdictions
in which the right to collect royalties through any statutory or
compulsory licensing scheme can be waived, the Licensor waives
the exclusive right to collect such royalties for any exercise
by You of the rights granted under this License; and,
iii. Voluntary License Schemes. The Licensor waives the right to
collect royalties, whether individually or, in the event that
the Licensor is a member of a collecting society that
administers voluntary licensing schemes, via that society, from
any exercise by You of the rights granted under this License.
The above rights may be exercised in all media and formats whether
now known or hereafter devised. The above rights include the right
to make such modifications as are technically necessary to exercise
the rights in other media and formats. Subject to Section 8(f), all
rights not expressly granted by Licensor are hereby reserved.
4. Restrictions. The license granted in Section 3 above is expressly
made subject to and limited by the following restrictions:
a. You may Distribute or Publicly Perform the Work only under the
terms of this License. You must include a copy of, or the Uniform
Resource Identifier (URI) for, this License with every copy of the
Work You Distribute or Publicly Perform. You may not offer or
impose any terms on the Work that restrict the terms of this
License or the ability of the recipient of the Work to exercise the
rights granted to that recipient under the terms of the License.
You may not sublicense the Work. You must keep intact all notices
that refer to this License and to the disclaimer of warranties with
every copy of the Work You Distribute or Publicly Perform. When You
Distribute or Publicly Perform the Work, You may not impose any
effective technological measures on the Work that restrict the
ability of a recipient of the Work from You to exercise the rights
granted to that recipient under the terms of the License. This
Section 4(a) applies to the Work as incorporated in a Collection,
but this does not require the Collection apart from the Work itself
to be made subject to the terms of this License. If You create a
Collection, upon notice from any Licensor You must, to the extent
practicable, remove from the Collection any credit as required by
Section 4(c), as requested. If You create an Adaptation, upon
notice from any Licensor You must, to the extent practicable,
remove from the Adaptation any credit as required by Section 4(c),
as requested.
b. You may Distribute or Publicly Perform an Adaptation only under
the terms of: (i) this License; (ii) a later version of this
License with the same License Elements as this License; (iii) a
Creative Commons jurisdiction license (either this or a later
license version) that contains the same License Elements as this
License (e.g., Attribution-ShareAlike 3.0 US)); (iv) a Creative
Commons Compatible License. If you license the Adaptation under one
of the licenses mentioned in (iv), you must comply with the terms
of that license. If you license the Adaptation under the terms of
any of the licenses mentioned in (i), (ii) or (iii) (the
"Applicable License"), you must comply with the terms of the
Applicable License generally and the following provisions: (I) You
must include a copy of, or the URI for, the Applicable License with
every copy of each Adaptation You Distribute or Publicly Perform;
(II) You may not offer or impose any terms on the Adaptation that
restrict the terms of the Applicable License or the ability of the
recipient of the Adaptation to exercise the rights granted to that
recipient under the terms of the Applicable License; (III) You must
keep intact all notices that refer to the Applicable License and to
the disclaimer of warranties with every copy of the Work as
included in the Adaptation You Distribute or Publicly Perform; (IV)
when You Distribute or Publicly Perform the Adaptation, You may not
impose any effective technological measures on the Adaptation that
restrict the ability of a recipient of the Adaptation from You to
exercise the rights granted to that recipient under the terms of
the Applicable License. This Section 4(b) applies to the Adaptation
as incorporated in a Collection, but this does not require the
Collection apart from the Adaptation itself to be made subject to
the terms of the Applicable License.
c. If You Distribute, or Publicly Perform the Work or any
Adaptations or Collections, You must, unless a request has been
made pursuant to Section 4(a), keep intact all copyright notices
for the Work and provide, reasonable to the medium or means You are
utilizing: (i) the name of the Original Author (or pseudonym, if
applicable) if supplied, and/or if the Original Author and/or
Licensor designate another party or parties (e.g., a sponsor
institute, publishing entity, journal) for attribution
("Attribution Parties") in Licensor's copyright notice, terms of
service or by other reasonable means, the name of such party or
parties; (ii) the title of the Work if supplied; (iii) to the
extent reasonably practicable, the URI, if any, that Licensor
specifies to be associated with the Work, unless such URI does not
refer to the copyright notice or licensing information for the
Work; and (iv) , consistent with Ssection 3(b), in the case of an
Adaptation, a credit identifying the use of the Work in the
Adaptation (e.g., "French translation of the Work by Original
Author," or "Screenplay based on original Work by Original
Author"). The credit required by this Section 4(c) may be
implemented in any reasonable manner; provided, however, that in
the case of a Adaptation or Collection, at a minimum such credit
will appear, if a credit for all contributing authors of the
Adaptation or Collection appears, then as part of these credits and
in a manner at least as prominent as the credits for the other
contributing authors. For the avoidance of doubt, You may only use
the credit required by this Section for the purpose of attribution
in the manner set out above and, by exercising Your rights under
this License, You may not implicitly or explicitly assert or imply
any connection with, sponsorship or endorsement by the Original
Author, Licensor and/or Attribution Parties, as appropriate, of You
or Your use of the Work, without the separate, express prior
written permission of the Original Author, Licensor and/or
Attribution Parties.
d. Except as otherwise agreed in writing by the Licensor or as may
be otherwise permitted by applicable law, if You Reproduce,
Distribute or Publicly Perform the Work either by itself or as part
of any Adaptations or Collections, You must not distort, mutilate,
modify or take other derogatory action in relation to the Work
which would be prejudicial to the Original Author's honor or
reputation. Licensor agrees that in those jurisdictions (e.g.
Japan), in which any exercise of the right granted in Section 3(b)
of this License (the right to make Adaptations) would be deemed to
be a distortion, mutilation, modification or other derogatory
action prejudicial to the Original Author's honor and reputation,
the Licensor will waive or not assert, as appropriate, this
Section, to the fullest extent permitted by the applicable national
law, to enable You to reasonably exercise Your right under Section
3(b) of this License (right to make Adaptations) but not otherwise.
5. Representations, Warranties and Disclaimer
UNLESS OTHERWISE MUTUALLY AGREED TO BY THE PARTIES IN WRITING,
LICENSOR OFFERS THE WORK AS-IS AND MAKES NO REPRESENTATIONS OR
WARRANTIES OF ANY KIND CONCERNING THE WORK, EXPRESS, IMPLIED,
STATUTORY OR OTHERWISE, INCLUDING, WITHOUT LIMITATION, WARRANTIES
OF TITLE, MERCHANTIBILITY, FITNESS FOR A PARTICULAR PURPOSE,
NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS,
ACCURACY, OR THE PRESENCE OF ABSENCE OF ERRORS, WHETHER OR NOT
DISCOVERABLE. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OF
IMPLIED WARRANTIES, SO SUCH EXCLUSION MAY NOT APPLY TO YOU.
6. Limitation on Liability.
EXCEPT TO THE EXTENT REQUIRED BY APPLICABLE LAW, IN NO EVENT WILL
LICENSOR BE LIABLE TO YOU ON ANY LEGAL THEORY FOR ANY SPECIAL,
INCIDENTAL, CONSEQUENTIAL, PUNITIVE OR EXEMPLARY DAMAGES ARISING
OUT OF THIS LICENSE OR THE USE OF THE WORK, EVEN IF LICENSOR HAS
BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
7. Termination
a. This License and the rights granted hereunder will terminate
automatically upon any breach by You of the terms of this License.
Individuals or entities who have received Adaptations or
Collections from You under this License, however, will not have
their licenses terminated provided such individuals or entities
remain in full compliance with those licenses. Sections 1, 2, 5, 6,
7, and 8 will survive any termination of this License.
b. Subject to the above terms and conditions, the license granted
here is perpetual (for the duration of the applicable copyright in
the Work). Notwithstanding the above, Licensor reserves the right
to release the Work under different license terms or to stop
distributing the Work at any time; provided, however that any such
election will not serve to withdraw this License (or any other
license that has been, or is required to be, granted under the
terms of this License), and this License will continue in full
force and effect unless terminated as stated above.
8. Miscellaneous
a. Each time You Distribute or Publicly Perform the Work or a
Collection, the Licensor offers to the recipient a license to the
Work on the same terms and conditions as the license granted to You
under this License.
b. Each time You Distribute or Publicly Perform an Adaptation,
Licensor offers to the recipient a license to the original Work on
the same terms and conditions as the license granted to You under
this License.
c. If any provision of this License is invalid or unenforceable
under applicable law, it shall not affect the validity or
enforceability of the remainder of the terms of this License, and
without further action by the parties to this agreement, such
provision shall be reformed to the minimum extent necessary to make
such provision valid and enforceable.
d. No term or provision of this License shall be deemed waived and
no breach consented to unless such waiver or consent shall be in
writing and signed by the party to be charged with such waiver or
consent.
e. This License constitutes the entire agreement between the
parties with respect to the Work licensed here. There are no
understandings, agreements or representations with respect to the
Work not specified here. Licensor shall not be bound by any
additional provisions that may appear in any communication from
You. This License may not be modified without the mutual written
agreement of the Licensor and You.
f. The rights granted under, and the subject matter referenced, in
this License were drafted utilizing the terminology of the Berne
Convention for the Protection of Literary and Artistic Works (as
amended on September 28, 1979), the Rome Convention of 1961, the
WIPO Copyright Treaty of 1996, the WIPO Performances and Phonograms
Treaty of 1996 and the Universal Copyright Convention (as revised
on July 24, 1971). These rights and subject matter take effect in
the relevant jurisdiction in which the License terms are sought to
be enforced according to the corresponding provisions of the
implementation of those treaty provisions in the applicable
national law. If the standard suite of rights granted under
applicable copyright law includes additional rights not granted
under this License, such additional rights are deemed to be
included in the License; this License is not intended to restrict
the license of any rights under applicable law.
=======================================================================
Creative Commons Notice
Creative Commons is not a party to this License, and makes no warranty
whatsoever in connection with the Work. Creative Commons will not be
liable to You or any party on any legal theory for any damages
whatsoever, including without limitation any general, special,
incidental or consequential damages arising in connection to this
license. Notwithstanding the foregoing two (2) sentences, if Creative
Commons has expressly identified itself as the Licensor hereunder, it
shall have all rights and obligations of Licensor.
Except for the limited purpose of indicating to the public that the
Work is licensed under the CCPL, Creative Commons does not authorize
the use by either party of the trademark "Creative Commons" or any
related trademark or logo of Creative Commons without the prior written
consent of Creative Commons. Any permitted use will be in compliance
with Creative Commons' then-current trademark usage guidelines, as may
be published on its website or otherwise made available upon request
from time to time. For the avoidance of doubt, this trademark
restriction does not form part of the License.
Creative Commons may be contacted at https://creativecommons.org/.
================================================
FILE: Makefile
================================================
# TiDB for AI Documentation Site

.PHONY: help install generate-demos generate-demo-pages generate-demo-gallery serve build clean check

help:
	@echo "TiDB for AI Documentation Site"
	@echo "============================="
	@echo "Available commands:"
	@echo " install - Install dependencies using UV"
	@echo " generate-demos - Generate demo pages and gallery from demos.yml config"
	@echo " generate-demo-pages - Only generate demo pages."
	@echo " generate-demo-gallery - Only generate demo gallery."
	@echo " serve - Start the development server"
	@echo " build - Build the documentation site"
	@echo " clean - Clean build artifacts"
	@echo " check - Check dependencies and project setup"
	@echo ""
	@echo "Demo gallery configuration:"
	@echo " Edit demos.yml in the project root to manage gallery content"
	@echo ""
	@echo "Prerequisites:"
	@echo " UV package manager - curl -LsSf https://astral.sh/uv/install.sh | sh"

install:
	uv pip install -e .

generate-demos:
	python scripts/generate_demos.py

generate-demo-pages:
	python scripts/generate_demos.py --skip-gallery

generate-demo-gallery:
	python scripts/generate_demos.py --skip-demos

serve:
	mkdocs serve

build:
	mkdocs build

clean:
	rm -rf site/
	rm -rf .mkdocs_cache/

check:
	python scripts/check_dependencies.py
================================================
FILE: README.md
================================================
# TiDB for AI
TiDB is an open-source, distributed SQL database designed for modern AI applications, offering seamless scalability, real-time analytics, and unified storage for vectors, documents, knowledge graphs, operational data and more.
🚀 Follow our [Quickstart Guide](https://pingcap.github.io/ai/quickstart/) to begin building your first AI application with TiDB.
## PyTiDB
TiDB provides a Python SDK and a series of integrations with popular AI frameworks to help developers build AI applications efficiently.
To install the TiDB Python SDK, run the following command:
```bash
pip install pytidb
```
Examples:
- Getting Started: [Basic Usage](https://pingcap.github.io/ai/examples/basic-with-pytidb/), [Auto Embedding](https://pingcap.github.io/ai/examples/auto-embedding-with-pytidb/)
- Search & Retrieval: [Vector Search](https://pingcap.github.io/ai/examples/vector-search-with-pytidb/), [Fulltext Search](https://pingcap.github.io/ai/examples/fulltext-search-with-pytidb/), [Hybrid Search](https://pingcap.github.io/ai/examples/hybrid-search-with-pytidb/), [Image Search](https://pingcap.github.io/ai/examples/image-search-with-pytidb/)
- AI Applications: [RAG](https://pingcap.github.io/ai/examples/rag-with-pytidb/), [Text2SQL](https://pingcap.github.io/ai/examples/text2sql-with-pytidb/), [Memory](https://pingcap.github.io/ai/examples/memory-with-pytidb/)
Integrations:
- AI Frameworks: [LlamaIndex](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-llamaindex/), [LangChain](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-langchain/)
- ORM Libraries: [SQLAlchemy](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-sqlalchemy/), [Django-ORM](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-django-orm/), [Peewee](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-peewee/)
- AI Services: [Bedrock](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-amazon-bedrock/)
- Embedding Models/Services: [JinaAI](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-jinaai-embedding/)
## Contribute
We welcome contributions to improve the TiDB for AI documentation! For development setup, maintenance scripts, and detailed contribution guidelines, please see [DEVELOPMENT.md](DEVELOPMENT.md).
## FAQ
### How can I get support?
- [Join our Discord](https://discord.com/invite/vYU9h56kAX) (recommended)
- [Ask questions in our forum](https://ask.pingcap.com/)
- [Send support tickets](https://tidb.support.pingcap.com/)
================================================
FILE: demos.yml
================================================
# Demo Gallery Configuration
title: "Demo Gallery"
description: |
  Explore hands-on demos showcasing how TiDB empowers AI applications.<br>
  Get started quickly with <a href='https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=demo_gallery' target='_blank' rel='noopener noreferrer' class='gallery-cta-link'><b>TiDB Cloud Starter</b></a> to build your own AI-powered solutions.
meta:
  description: "Explore hands-on demos showcasing how TiDB empowers AI applications. Get started quickly with TiDB Cloud Starter to build your own AI-powered solutions."
categories:
  - id: "featured"
    title: "⭐ Featured"
    demos: ["image-search", "rag", "memory"]
  - id: "getting-started"
    title: "🚀 Getting Started"
    demos: ["basic", "auto-embedding"]
  - id: "search"
    title: "🔍 Search & Retrieval"
    demos: ["vector-search", "fulltext-search", "hybrid-search", "image-search"]
  - id: "ai-apps"
    title: "🤖 AI Applications"
    demos: ["rag", "memory", "text2sql"]
demos:
  - id: "image-search"
    title: "Image Search"
    description: "Build an image search application using multimodal embeddings for both text-to-image and image-to-image search."
    category: "search"
    icon: null
    background: null
    link: "image-search-with-pytidb/"
    doc_link: "https://github.com/pingcap/pytidb/tree/main/examples/image_search/README.md"
    cover_image: "https://github.com/user-attachments/assets/7ba9733a-4d1f-4094-8edb-58731ebd08e9"
  - id: "rag"
    title: "RAG"
    description: "Build a RAG application that combines document retrieval with language generation."
    category: "ai-apps"
    icon: null
    background: null
    link: "rag-with-pytidb/"
    doc_link: "https://github.com/pingcap/pytidb/tree/main/examples/rag/README.md"
    cover_image: "https://github.com/user-attachments/assets/dfd85672-65ce-4a46-8dd2-9f77d826363e"
  - id: "basic"
    title: "Basic Usage"
    description: "Learn fundamental PyTiDB operations including database connection, table creation, and data manipulation."
    category: "getting-started"
    icon: "⚙️"
    background: "linear-gradient(135deg, #10b981, var(--brand-color))"
    link: "basic-with-pytidb/"
    doc_link: "https://github.com/pingcap/pytidb/tree/main/examples/basic/README.md"
    cover_image: null
  - id: "auto-embedding"
    title: "Auto Embedding"
    description: "Automatically generate embeddings for your text data using built-in embedding models."
    category: "getting-started"
    icon: "🤖"
    background: "radial-gradient(circle at center, #8b5cf6 0%, var(--brand-color) 100%)"
    link: "auto-embedding-with-pytidb/"
    doc_link: "https://github.com/pingcap/pytidb/tree/main/examples/auto_embedding/README.md"
    cover_image: null
  - id: "vector-search"
    title: "Vector Search"
    description: "Implement semantic search using vector embeddings to find similar content."
    category: "search"
    icon: null
    background: null
    link: "vector-search-with-pytidb/"
    doc_link: "https://github.com/pingcap/pytidb/tree/main/examples/vector_search/README.md"
    cover_image: "https://github.com/user-attachments/assets/6d7783a5-ce9c-4dcc-8b95-49d5f0ca735a"
  - id: "fulltext-search"
    title: "Fulltext Search"
    description: "Perform traditional text search using MySQL fulltext search capabilities."
    category: "search"
    icon: null
    background: null
    link: "fulltext-search-with-pytidb/"
    doc_link: "https://github.com/pingcap/pytidb/tree/main/examples/fulltext_search/README.md"
    cover_image: "https://github.com/user-attachments/assets/c81ddad4-f996-4b1f-85c0-5cbb55bc2a3a"
  - id: "hybrid-search"
    title: "Hybrid Search"
    description: "Combine vector search and fulltext search for more comprehensive results."
    category: "search"
    icon: null
    background: null
    link: "hybrid-search-with-pytidb/"
    doc_link: "https://github.com/pingcap/pytidb/tree/main/examples/hybrid_search/README.md"
    cover_image: "https://github.com/user-attachments/assets/6e1c639d-2160-44c8-86b4-958913b9eca5"
  - id: "memory"
    title: "Memory"
    description: "Implement conversation memory for chatbots and conversational AI applications."
    category: "ai-apps"
    icon: null
    background: null
    link: "memory-with-pytidb/"
    doc_link: "https://github.com/pingcap/pytidb/tree/main/examples/memory/README.md"
    cover_image: "https://github.com/user-attachments/assets/74dee96b-ea20-49dc-ad27-679faa5bf9b8"
  - id: "text2sql"
    title: "Text2SQL"
    description: "Convert natural language queries into SQL statements using AI models."
    category: "ai-apps"
    icon: "💬"
    background: "linear-gradient(135deg, #06b6d4, var(--brand-color))"
    link: "text2sql-with-pytidb/"
    doc_link: "https://github.com/pingcap/pytidb/tree/main/examples/text2sql/README.md"
    cover_image: null
# CTA section configuration
cta:
  title: "Ready to build your AI application?"
  description: "Start your AI journey with TiDB Cloud Starter. Follow our quickstart guide to build your first AI-powered application in minutes, or explore specific examples for your use case."
  buttons:
    - text: "Try TiDB Cloud Starter"
      url: "https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme"
      type: "primary"
      external: true
    - text: "View Quickstart Guide"
      url: "/ai/quickstart/"
      type: "secondary"
      external: false
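
Each category in the gallery configuration refers to demos by `id`, so a typo in a category list would silently point at a demo that does not exist. The following is a minimal sketch of a consistency check over the parsed configuration. The function name `find_unknown_demo_ids` and the trimmed-down sample data are illustrative assumptions and are not part of the repository; in practice the dictionary would come from something like `yaml.safe_load`.

```python
def find_unknown_demo_ids(config: dict) -> dict[str, list[str]]:
    """Return, per category id, the demo ids that are not defined under `demos`."""
    known = {demo["id"] for demo in config.get("demos", [])}
    problems: dict[str, list[str]] = {}
    for category in config.get("categories", []):
        missing = [demo_id for demo_id in category.get("demos", []) if demo_id not in known]
        if missing:
            problems[category["id"]] = missing
    return problems


# Hypothetical, shortened stand-in for the parsed demos.yml.
# "memory" is deliberately left out of `demos` to trigger the check.
sample = {
    "categories": [
        {"id": "featured", "demos": ["image-search", "rag", "memory"]},
        {"id": "getting-started", "demos": ["basic", "auto-embedding"]},
    ],
    "demos": [
        {"id": "image-search"},
        {"id": "rag"},
        {"id": "basic"},
        {"id": "auto-embedding"},
    ],
}

unknown = find_unknown_demo_ids(sample)
print(unknown)
```

An empty result means every category entry resolves to a defined demo.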
================================================
FILE: mkdocs.yml
================================================
site_name: TiDB for AI
site_url: https://pingcap.github.io
repo_name: pingcap/tidb
repo_url: https://github.com/pingcap/tidb
edit_uri: https://github.com/pingcap/pingcap.github.io/tree/main/src
docs_dir: src
watch:
  - src
theme:
  name: material
  logo: assets/logo.svg
  favicon: assets/favicon.ico
  color_mode: dark
  palette:
    # Palette toggle for dark mode
    - scheme: tidb-dark
      primary: custom
      toggle:
        icon: material/weather-night
        name: Switch to light mode
    # Palette toggle for light mode
    - scheme: tidb-light
      primary: custom
      toggle:
        icon: material/weather-sunny
        name: Switch to dark mode
  features:
    - content.code.copy
    - content.tabs.link
    - content.action.edit
    - content.tooltips
    - toc.follow
    - navigation.top
    - navigation.tabs
    - navigation.tabs.sticky
    - navigation.footer
    - navigation.tracking
    - navigation.instant
    - content.footnote.tooltips
  icon:
    repo: fontawesome/brands/github
    annotation: material/arrow-right-circle
plugins:
  - search
  # - autorefs
  # - mkdocs-jupyter
  - redirects:
      redirect_maps:
        "index.md": "ai/index.md"
        "ai/integrations/mcp.md": "ai/integrations/tidb-mcp-server.md"
markdown_extensions:
  - admonition
  - footnotes
  - pymdownx.critic
  - pymdownx.caret
  - pymdownx.keys
  - pymdownx.mark
  - pymdownx.tilde
  - pymdownx.details
  - pymdownx.highlight:
      anchor_linenums: true
      line_spans: __span
      pygments_lang_class: true
  - pymdownx.inlinehilite
  - pymdownx.snippets:
      base_path: ..
      dedent_subsections: true
  - pymdownx.superfences
  - pymdownx.tabbed:
      alternate_style: true
  - md_in_html
  - abbr
  - attr_list
  - pymdownx.snippets
  - pymdownx.emoji:
      emoji_index: !!python/name:material.extensions.emoji.twemoji
      emoji_generator: !!python/name:material.extensions.emoji.to_svg
  - markdown.extensions.toc:
      baselevel: 1
      permalink: ""
  - pymdownx.arithmatex:
      generic: true
nav:
- Home:
- TiDB for AI: ai/index.md
- 🚀 Quickstart: ai/quickstart.md
- 📖 Concepts:
- Vector Search: ai/concepts/vector-search.md
- 📚 Guides:
- Connect to database: ai/guides/connect.md
- Working with tables: ai/guides/tables.md
- Vector Search: ai/guides/vector-search.md
- Fulltext Search: ai/guides/fulltext-search.md
- Hybrid Search: ai/guides/hybrid-search.md
- Image Search: ai/guides/image-search.md
- Auto Embedding: ai/guides/auto-embedding.md
- Reranking: ai/guides/reranking.md
- Filtering: ai/guides/filtering.md
- 💡 Demos:
- Gallery: ai/examples/index.md
- Basic Usage: ai/examples/basic-with-pytidb.md
- Auto Embedding: ai/examples/auto-embedding-with-pytidb.md
- Vector Search: ai/examples/vector-search-with-pytidb.md
- Fulltext Search: ai/examples/fulltext-search-with-pytidb.md
- Hybrid Search: ai/examples/hybrid-search-with-pytidb.md
- Image Search: ai/examples/image-search-with-pytidb.md
- RAG: ai/examples/rag-with-pytidb.md
- Memory: ai/examples/memory-with-pytidb.md
- Text2SQL: ai/examples/text2sql-with-pytidb.md
- 🔌 Integrations:
- MCP Integration:
- TiDB MCP Server: ai/integrations/tidb-mcp-server.md
- IDE & Tool Integration:
- Cursor: ai/integrations/tidb-mcp-cursor.md
- Claude Desktop: ai/integrations/tidb-mcp-claude-desktop.md
- Claude Code: ai/integrations/tidb-mcp-claude-code.md
- Visual Studio Code: ai/integrations/tidb-mcp-vscode.md
- Windsurf: ai/integrations/tidb-mcp-windsurf.md
- AI Frameworks:
- LlamaIndex: ai/integrations/llamaindex.md
- LangChain: ai/integrations/langchain.md
- Embeddings:
- Overview: ai/integrations/embedding-overview.md
- TiDB Cloud Hosted: ai/integrations/embedding-tidb-cloud-hosted.md
- OpenAI: ai/integrations/embedding-openai.md
- OpenAI Compatible: ai/integrations/embedding-openai-compatible.md
- Cohere: ai/integrations/embedding-cohere.md
- Jina AI: ai/integrations/embedding-jinaai.md
- Google Gemini: ai/integrations/embedding-gemini.md
- Hugging Face: ai/integrations/embedding-huggingface.md
- NVIDIA NIM: ai/integrations/embedding-nvidia-nim.md
- Concepts:
- Vector Search: ai/concepts/vector-search.md
- Guides:
- Connect to database: ai/guides/connect.md
- Working with tables: ai/guides/tables.md
- Vector Search: ai/guides/vector-search.md
- Fulltext Search: ai/guides/fulltext-search.md
- Hybrid Search: ai/guides/hybrid-search.md
- Image Search: ai/guides/image-search.md
- Auto Embedding: ai/guides/auto-embedding.md
- Reranking: ai/guides/reranking.md
- Filtering: ai/guides/filtering.md
- Transaction: ai/guides/transaction.md
- Raw Queries: ai/guides/raw-queries.md
- Multiple Table Joins: ai/guides/joins.md
- Demos:
- Gallery: ai/examples/index.md
- Basic Usage: ai/examples/basic-with-pytidb.md
- Auto Embedding: ai/examples/auto-embedding-with-pytidb.md
- Vector Search: ai/examples/vector-search-with-pytidb.md
- Fulltext Search: ai/examples/fulltext-search-with-pytidb.md
- Hybrid Search: ai/examples/hybrid-search-with-pytidb.md
- Image Search: ai/examples/image-search-with-pytidb.md
- RAG: ai/examples/rag-with-pytidb.md
- Memory: ai/examples/memory-with-pytidb.md
- Text2SQL: ai/examples/text2sql-with-pytidb.md
- Integrations:
- MCP Integration:
- TiDB MCP Server: ai/integrations/tidb-mcp-server.md
- IDE & Tool Integration:
- Cursor: ai/integrations/tidb-mcp-cursor.md
- Claude Desktop: ai/integrations/tidb-mcp-claude-desktop.md
- Claude Code: ai/integrations/tidb-mcp-claude-code.md
- Visual Studio Code: ai/integrations/tidb-mcp-vscode.md
- Windsurf: ai/integrations/tidb-mcp-windsurf.md
- AI Frameworks:
- LlamaIndex: ai/integrations/llamaindex.md
- LangChain: ai/integrations/langchain.md
- Embeddings:
- Overview: ai/integrations/embedding-overview.md
- TiDB Cloud Hosted: ai/integrations/embedding-tidb-cloud-hosted.md
- OpenAI: ai/integrations/embedding-openai.md
- OpenAI Compatible: ai/integrations/embedding-openai-compatible.md
- Cohere: ai/integrations/embedding-cohere.md
- Jina AI: ai/integrations/embedding-jinaai.md
- Google Gemini: ai/integrations/embedding-gemini.md
- Hugging Face: ai/integrations/embedding-huggingface.md
- NVIDIA NIM: ai/integrations/embedding-nvidia-nim.md
extra:
  social:
    - icon: fontawesome/brands/github
      link: https://github.com/pingcap/pytidb
    - icon: fontawesome/brands/x-twitter
      link: https://twitter.com/TiDB_Developer
    - icon: fontawesome/brands/linkedin
      link: https://www.linkedin.com/company/pingcap
  analytics:
    provider: google
    property: G-W2XGDSWFBH
extra_css:
  - styles/extra.css
extra_javascript:
  - https://unpkg.com/mathjax@3/es5/tex-mml-chtml.js
================================================
FILE: pyproject.toml
================================================
# TiDB for AI Documentation
# This project uses UV for fast dependency management
# Install UV: curl -LsSf https://astral.sh/uv/install.sh | sh
# Install deps: make install (which runs: uv pip install -e .)
[project]
name = "docs"
version = "0.1.0"
description = "The documentation for TiDB and its AI capabilities"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
"mkdocs-material>=9.6.12",
"mkdocs-jupyter>=0.25.1",
"mkdocstrings[python]>=0.29.1",
"mkdocs>=1.6.1",
"mkdocs-redirects>=1.2.2",
"requests>=2.31.0",
"PyYAML>=6.0",
"Jinja2>=3.1.0",
"click>=8.0.0",
]
================================================
FILE: scripts/check_dependencies.py
================================================
#!/usr/bin/env python3
"""Simple dependency check for TiDB for AI documentation."""
import sys
import subprocess


def main():
    print("🚀 TiDB for AI Documentation - Quick Check")
    # Check UV
    try:
        result = subprocess.run(["uv", "--version"], capture_output=True, text=True)
        if result.returncode == 0:
            print("✅ UV is available")
        else:
            print("❌ UV not working")
            return 1
    except FileNotFoundError:
        print("❌ UV not found - Install: curl -LsSf https://astral.sh/uv/install.sh | sh")
        return 1
    # Check basic imports
    try:
        import mkdocs, requests, yaml
        print("✅ Dependencies are installed")
    except ImportError as e:
        print(f"❌ Missing dependency: {e}")
        print("💡 Run: make install")
        return 1
    print("🎉 Ready to go! Run 'make serve' to start.")
    return 0


if __name__ == "__main__":
    sys.exit(main())
================================================
FILE: scripts/generate_demos.py
================================================
#!/usr/bin/env python3
"""
Generate demo gallery and individual demo documentation pages.
This script reads configuration from demos.yml and generates:
1. Individual demo pages using demo_page_template.j2
2. Gallery index page using demo_gallery_template.j2
"""
import sys
import requests
import re
import yaml
from pathlib import Path
from jinja2 import Environment, FileSystemLoader
import click

# Configuration
LOCAL_EXAMPLES_DIR = Path("src/ai/examples")
CONFIG_FILE = Path("demos.yml")
DEMO_TEMPLATE_FILE = Path("src/templates/demo_page_template.j2")
GALLERY_TEMPLATE_FILE = Path("src/templates/demo_gallery_template.j2")
OUTPUT_FILE = LOCAL_EXAMPLES_DIR / "index.md"


def load_config():
    """Load configuration from YAML file."""
    if not CONFIG_FILE.exists():
        raise click.ClickException(f"Configuration file {CONFIG_FILE} not found.")
    try:
        with open(CONFIG_FILE, 'r', encoding='utf-8') as f:
            config = yaml.safe_load(f)
        return config
    except yaml.YAMLError as e:
        raise click.ClickException(f"Error parsing {CONFIG_FILE}: {e}")
def fetch_readme_from_doc_link(doc_link):
    """Fetch README.md content from the provided doc_link."""
    # Convert GitHub tree URL to raw content URL
    if "github.com" in doc_link and "/tree/" in doc_link:
        raw_url = doc_link.replace("github.com", "raw.githubusercontent.com").replace("/tree/", "/")
    elif doc_link.endswith("/README.md"):
        raw_url = doc_link
    else:
        # Assume it's a directory link, append README.md
        raw_url = doc_link.rstrip('/') + '/README.md'
        if "github.com" in raw_url and "/tree/" in raw_url:
            raw_url = raw_url.replace("github.com", "raw.githubusercontent.com").replace("/tree/", "/")
    try:
        response = requests.get(raw_url, timeout=30)
        response.raise_for_status()
        return response.text
    except requests.RequestException as e:
        click.echo(f"Error fetching README from {raw_url}: {e}", err=True)
        return None


def extract_repo_info_from_doc_link(doc_link):
    """Extract repository and path information from doc_link."""
    if "github.com" not in doc_link:
        return None, None, None, None
    # Parse URL to extract owner, repo, and path
    parts = doc_link.replace("https://github.com/", "").split("/")
    if len(parts) < 2:
        return None, None, None, None
    owner, repo = parts[0], parts[1]
    # Extract path after /tree/branch/
    if "/tree/" in doc_link:
        try:
            tree_index = parts.index("tree")
            if len(parts) > tree_index + 2:  # owner/repo/tree/branch/path...
                branch = parts[tree_index + 1]
                path_parts = parts[tree_index + 2:]
                # Remove README.md if present
                if path_parts and path_parts[-1] == "README.md":
                    path_parts = path_parts[:-1]
                path = "/".join(path_parts)
                return owner, repo, branch, path
        except ValueError:
            pass
    return None, None, None, None
def process_readme_content(content, demo_config):
    """Process README content to adapt it for the documentation site."""
    if not content:
        return None
    # Extract repository info
    owner, repo, branch, example_path = extract_repo_info_from_doc_link(demo_config['doc_link'])
    if not all([owner, repo, branch, example_path]):
        click.echo(f"Warning: Could not extract repo info from {demo_config['doc_link']}", err=True)
        return content
    base_repo_url = f"https://github.com/{owner}/{repo}"
    base_raw_url = f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}"
    # Fix relative image links first; otherwise the generic link rule below
    # would rewrite them to tree URLs, which serve HTML pages rather than images.
    # NOTE: the replacement target is an assumption (raw content under base_raw_url).
    processed_content = re.sub(
        r'!\[([^\]]*)\]\((?!https?://)(.*?)\)',
        f'![\\1]({base_raw_url}/{example_path}/\\2)',
        content
    )
    # Fix the remaining relative links to point to the original repository
    processed_content = re.sub(
        r'\]\((?!https?://)(.*?)\)',
        f']({base_repo_url}/tree/{branch}/{example_path}/\\1)',
        processed_content
    )
    return processed_content
def create_demo_page(demo_config, content):
    """Create a markdown file for a demo using the demo template."""
    if not DEMO_TEMPLATE_FILE.exists():
        raise click.ClickException(f"Demo template file {DEMO_TEMPLATE_FILE} not found.")
    # Create the local examples directory if it doesn't exist
    LOCAL_EXAMPLES_DIR.mkdir(parents=True, exist_ok=True)
    try:
        # Set up Jinja2 environment
        env = Environment(loader=FileSystemLoader("src/templates"))
        template = env.get_template('demo_page_template.j2')
        # Render the template
        rendered_content = template.render(
            demo=demo_config,
            content=content
        )
        # Create the markdown file
        filename = f"{demo_config['id'].replace('_', '-')}-with-pytidb.md"
        filepath = LOCAL_EXAMPLES_DIR / filename
        with open(filepath, 'w', encoding='utf-8') as f:
            f.write(rendered_content)
        return filepath
    except Exception as e:
        raise click.ClickException(f"Error creating demo page for {demo_config['id']}: {e}")


def generate_gallery_page(config):
    """Generate the gallery page using Jinja2 template."""
    if not GALLERY_TEMPLATE_FILE.exists():
        raise click.ClickException(f"Gallery template file {GALLERY_TEMPLATE_FILE} not found.")
    try:
        # Set up Jinja2 environment
        env = Environment(loader=FileSystemLoader("src/templates"))
        template = env.get_template('demo_gallery_template.j2')
        # Render the template
        rendered_content = template.render(
            config=config,
            categories=config.get('categories', {}),
            demos=config.get('demos', {})
        )
        # Write the rendered content to the output file
        with open(OUTPUT_FILE, 'w', encoding='utf-8') as f:
            f.write(rendered_content)
        return True
    except Exception as e:
        raise click.ClickException(f"Error generating gallery page: {e}")
def sync_demo_docs(config, fetch_from_remote=True):
    """Sync demo documentation files based on demos.yml configuration."""
    if not fetch_from_remote:
        return []
    created_files = []
    demos_config = config.get('demos', [])
    with click.progressbar(demos_config, label='Processing demos') as demos:
        for demo_config in demos:
            demo_id = demo_config['id']
            doc_link = demo_config.get('doc_link')
            if not doc_link:
                click.echo(f"Warning: No doc_link found for demo '{demo_id}', skipping...", err=True)
                continue
            # Fetch README content
            readme_content = fetch_readme_from_doc_link(doc_link)
            if readme_content:
                # Process content
                processed_content = process_readme_content(readme_content, demo_config)
                if processed_content:
                    # Create demo page
                    filepath = create_demo_page(demo_config, processed_content)
                    if filepath:
                        created_files.append((demo_id, filepath))
                else:
                    click.echo(f"Failed to process content for {demo_id}", err=True)
            else:
                click.echo(f"Failed to fetch README for {demo_id}", err=True)
    return created_files
@click.command()
@click.option('--skip-demos', is_flag=True,
              help='Skip generating individual demo pages from remote repositories')
@click.option('--skip-gallery', is_flag=True,
              help='Skip generating the demo gallery index page')
@click.option('--verbose', '-v', is_flag=True, help='Enable verbose output')
def main(skip_demos, skip_gallery, verbose):
    """Generate demo gallery and individual demo documentation pages.

    This script reads configuration from demos.yml and generates:
    1. Individual demo pages using demo_page_template.j2
    2. Gallery index page using demo_gallery_template.j2

    By default, both demo pages and gallery are generated.
    """
    if verbose:
        click.echo("Running in verbose mode...")
    # Load configuration
    try:
        config = load_config()
    except click.ClickException:
        raise
    created_files = []
    # Generate demo pages (unless skipped)
    if not skip_demos:
        if verbose:
            click.echo("Generating demo pages from remote repositories...")
        created_files = sync_demo_docs(config, fetch_from_remote=True)
        if created_files:
            click.echo(f"\n✅ Generated {len(created_files)} demo pages:")
            for demo_id, filepath in created_files:
                click.echo(f" • {demo_id} → {filepath}")
        elif verbose:
            click.echo("No demo pages were generated.")
    else:
        if verbose:
            click.echo("Skipping demo page generation...")
    # Generate gallery page (unless skipped)
    if not skip_gallery:
        if verbose:
            click.echo("Generating gallery page from template...")
        if generate_gallery_page(config):
            click.echo(f"✅ Gallery page generated: {OUTPUT_FILE}")
        else:
            raise click.ClickException("Failed to generate gallery page.")
    else:
        if verbose:
            click.echo("Skipping gallery page generation...")
    # Check if nothing was generated
    if skip_demos and skip_gallery:
        click.echo("⚠️ Both demos and gallery generation were skipped. Nothing to do.")
    else:
        click.echo("\n🎉 Done! You can now run 'mkdocs serve' to view the documentation site.")


if __name__ == "__main__":
    main()
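
The URL handling in `fetch_readme_from_doc_link` and the relative-link rewriting in `process_readme_content` can be tried in isolation, without any network access. The following is a minimal sketch rather than the script's actual code path: `to_raw_url` and `absolutize_links` are hypothetical helper names, and `setup.md` is a made-up relative link used only for illustration.

```python
import re


def to_raw_url(doc_link: str) -> str:
    """Map a GitHub 'tree' URL to its raw.githubusercontent.com equivalent."""
    if "github.com" in doc_link and "/tree/" in doc_link:
        return doc_link.replace("github.com", "raw.githubusercontent.com").replace("/tree/", "/")
    return doc_link


def absolutize_links(markdown: str, base_url: str) -> str:
    """Prefix relative Markdown link targets with base_url.

    Links that already start with http:// or https:// are left untouched.
    """
    return re.sub(r"\]\((?!https?://)(.*?)\)", rf"]({base_url}/\1)", markdown)


doc_link = "https://github.com/pingcap/pytidb/tree/main/examples/rag/README.md"
raw = to_raw_url(doc_link)

text = "See [setup](setup.md) and [TiDB](https://tidbcloud.com/)."
fixed = absolutize_links(text, "https://github.com/pingcap/pytidb/tree/main/examples/rag")

print(raw)
print(fixed)
```

Note that the negative lookahead only recognizes `http://` and `https://` as absolute, so anchors such as `#section` or `mailto:` targets would also be prefixed; a stricter pattern may be worth considering if the upstream READMEs use them.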
================================================
FILE: scripts/sync_from_tidb_docs.py
================================================
import requests
import re
import sys


def remove_custom_content_blocks(content):
    """Remove <CustomContent ...>...</CustomContent> blocks from the content."""
    return re.sub(r"<CustomContent[\s\S]*?</CustomContent>", "", content)


def collapse_extra_blank_lines(content):
    """Collapse 3 or more blank lines to 2 blank lines."""
    return re.sub(r"\n{3,}", "\n\n", content)
def convert_note_blocks(content):
    """Convert '> **Note:**' blocks to '!!! note' syntax with indented content."""
    def note_repl(m):
        note_body = re.sub(r"^> ?", "", m.group(2), flags=re.MULTILINE).strip()
        # Admonition bodies must be indented by four spaces
        indented = "\n".join(
            "    " + line if line.strip() else "" for line in note_body.splitlines()
        )
        return "!!! note\n\n" + indented + "\n\n"
    return re.sub(r"> \*\*Note:\*\*\n((?:> *\n)*)(> .*(?:\n|$)+)", note_repl, content)


def convert_warning_blocks(content):
    """Convert '> **Warning:**' blocks to '!!! warning' syntax with indented content."""
    def warning_repl(m):
        warning_body = re.sub(r"^> ?", "", m.group(2), flags=re.MULTILINE).strip()
        indented = "\n".join(
            "    " + line if line.strip() else "" for line in warning_body.splitlines()
        )
        return "!!! warning\n\n" + indented + "\n\n"
    return re.sub(r"> \*\*Warning:\*\*\n((?:> *\n)*)(> .*(?:\n|$)+)", warning_repl, content)


def convert_tip_blocks(content):
    """Convert '> **Tip**' blocks to '!!! tip' syntax with indented content."""
    def tip_repl(m):
        tip_body = re.sub(r"^> ?", "", m.group(2), flags=re.MULTILINE).strip()
        indented = "\n".join(
            "    " + line if line.strip() else "" for line in tip_body.splitlines()
        )
        return "!!! tip\n\n" + indented + "\n\n"
    return re.sub(r"> \*\*Tip:?\*\*\n((?:> *\n)*)(> .*(?:\n|$)+)", tip_repl, content)
def remove_see_also_section(content):
    """Remove the '## See also' section and everything after it."""
    return re.sub(r"## See also[\s\S]*$", "", content)


def replace_image_paths(content):
    """Replace image paths to point to the local assets directory."""
    return content.replace(
        "/media/vector-search/embedding-search.png", "../../assets/embedding-search.png"
    )


def replace_relative_doc_links(content):
    """Replace relative doc links with full tidbcloud doc links, remove .md suffix and 'vector-search/' in path."""
    def link_repl(m):
        path = m.group(1)
        # Remove leading /, ./ or ../
        path = re.sub(r"^/|^\./|^\.\./", "", path)
        path = path.replace("vector-search/", "")  # Remove 'vector-search/' directory
        return f"(https://docs.pingcap.com/tidbcloud/{path})"
    return re.sub(r"\(((?:/|\./|\.\./)[^)]+?)\.md\)", link_repl, content)


def remove_overview_from_title(content):
    """Remove 'Overview' from the main title if present."""
    return re.sub(
        r"^(# .*)Overview(.*)$",
        lambda m: m.group(1).rstrip() + m.group(2) + "\n",
        content,
        flags=re.MULTILINE,
    )


def remove_front_matter(content):
    """Remove YAML front matter if present."""
    return re.sub(r"^---[\s\S]*?---\n", "", content)


def remove_simpletab_blocks(content):
    """Remove <SimpleTab> and <div label=...> ... </div> blocks, flattening their content."""
    content = re.sub(r"<SimpleTab>|</SimpleTab>", "", content)
    content = re.sub(r'<div label="[^"]*">', "", content)
    content = re.sub(r"</div>", "", content)
    return content
def process_overview():
url = "https://raw.githubusercontent.com/pingcap/docs/refs/heads/master/vector-search/vector-search-overview.md"
response = requests.get(url)
content = response.text
content = remove_custom_content_blocks(content)
content = collapse_extra_blank_lines(content)
content = convert_note_blocks(content)
content = remove_see_also_section(content)
content = replace_image_paths(content)
content = replace_relative_doc_links(content)
content = remove_overview_from_title(content)
save_to_file(content, "./src/ai/concepts/vector-search.md")
def process_llamaindex():
url = "https://raw.githubusercontent.com/pingcap/docs/master/vector-search/vector-search-integrate-with-llamaindex.md"
response = requests.get(url)
content = response.text
content = remove_front_matter(content)
content = remove_custom_content_blocks(content)
content = remove_simpletab_blocks(content)
content = collapse_extra_blank_lines(content)
content = convert_warning_blocks(content)
content = convert_note_blocks(content)
content = convert_tip_blocks(content)
content = remove_see_also_section(content)
content = replace_image_paths(content)
content = replace_relative_doc_links(content)
save_to_file(content, "./src/ai/integrations/llamaindex.md")


def process_langchain():
    url = "https://raw.githubusercontent.com/pingcap/docs/master/vector-search/vector-search-integrate-with-langchain.md"
    response = requests.get(url, timeout=30)
    # Fail fast on HTTP errors so an error page is never written over the doc.
    response.raise_for_status()
    content = response.text
    content = remove_front_matter(content)
    content = remove_custom_content_blocks(content)
    content = remove_simpletab_blocks(content)
    content = collapse_extra_blank_lines(content)
    content = convert_warning_blocks(content)
    content = convert_note_blocks(content)
    content = convert_tip_blocks(content)
    content = remove_see_also_section(content)
    content = replace_image_paths(content)
    content = replace_relative_doc_links(content)
    save_to_file(content, "./src/ai/integrations/langchain.md")
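

def save_to_file(content, filename):
    """Save the processed content to a file."""
    # Write UTF-8 explicitly so non-ASCII characters do not depend on the platform default.
    with open(filename, "w", encoding="utf-8") as f:
        f.write(content)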


def main():
    if len(sys.argv) > 1:
        arg = sys.argv[1].lower()
        if arg in ("--llamaindex", "llamaindex"):
            process_llamaindex()
            print("llamaindex doc synced.")
            return
        elif arg in ("--langchain", "langchain"):
            process_langchain()
            print("langchain doc synced.")
            return
        elif arg in ("--overview", "overview"):
            process_overview()
            print("overview doc synced.")
            return
        else:
            print(f"Unknown argument: {arg}")
            print("Usage: python sync_from_tidb_docs.py [--llamaindex|--langchain|--overview]")
            return
    process_overview()
    print("overview doc synced (default).")


if __name__ == "__main__":
    main()
================================================
FILE: src/ai/concepts/vector-search.md
================================================
---
title: Vector Search Overview
summary: Learn about Vector Search in TiDB. This feature provides an advanced search solution for performing semantic similarity searches across various data types, including documents, images, audio, and video.
---
# Vector Search
Vector search offers a powerful solution for semantic similarity searches across diverse data types, such as documents, images, audio, and video. It allows developers to leverage their MySQL expertise to build scalable applications enriched with generative AI capabilities, simplifying the integration of advanced search functionality.
!!! note
    The vector search feature is only available for TiDB Self-Managed clusters and [TiDB Cloud Starter](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-cloud-starter) clusters.
## Concepts
Vector search is a search method that prioritizes the meaning of your data to deliver relevant results.
Unlike traditional full-text search, which relies on exact keyword matching and word frequency, vector search converts various data types (such as text, images, or audio) into high-dimensional vectors and queries based on the similarity between these vectors. This search method captures the semantic meaning and contextual information of the data, leading to a more precise understanding of user intent.
Even when the search terms do not exactly match the content in the database, vector search can still provide results that align with the user's intent by analyzing the semantics of the data.
For example, a full-text search for "a swimming animal" only returns results containing these exact keywords. In contrast, vector search can return results for other swimming animals, such as fish or ducks, even if these results do not contain the exact keywords.
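The contrast can be sketched with a toy example. The three-dimensional vectors below are hand-made stand-ins for real embeddings (their axes and values are invented for illustration), and the keyword matcher is a deliberately strict simplification of full-text search.

```python
import math

# Hand-made vectors standing in for real embeddings.
# Axes: (lives in water, is an animal, is a vehicle). Values are invented.
documents = {
    "Fish breathe through gills.": [0.9, 0.9, 0.0],
    "Ducks paddle across the pond.": [0.7, 0.9, 0.0],
    "A submarine dives deep.": [0.9, 0.0, 0.9],
    "Eagles nest on cliffs.": [0.0, 0.9, 0.0],
}
query_text = "a swimming animal"
query_vector = [0.8, 0.9, 0.0]  # assumed embedding of the query


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Keyword matching: a document matches only if it contains every query word.
query_words = set(query_text.lower().split())
keyword_hits = [
    text
    for text in documents
    if query_words <= set(text.lower().replace(".", "").split())
]

# Vector search: rank documents by similarity to the query vector.
ranked = sorted(
    documents,
    key=lambda text: cosine_similarity(query_vector, documents[text]),
    reverse=True,
)

print("keyword hits:", keyword_hits)
print("top vector results:", ranked[:2])
```

Here `keyword_hits` is empty because no document contains all three query words, while the two highest-ranked vector results are the fish and duck sentences.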
### Vector embedding
A vector embedding, also known as an embedding, is a sequence of numbers that represents real-world objects in a high-dimensional space. It captures the meaning and context of unstructured data, such as documents, images, audio, and videos.
Vector embeddings are essential in machine learning and serve as the foundation for semantic similarity searches.
TiDB introduces [Vector data types](https://docs.pingcap.com/tidbcloud/vector-search-data-types) and [Vector search index](https://docs.pingcap.com/tidbcloud/vector-search-index) designed to optimize the storage and retrieval of vector embeddings, enhancing their use in AI applications. You can store vector embeddings in TiDB and perform vector search queries to find the most relevant data using these data types.
### Embedding model
Embedding models are algorithms that transform data into [vector embeddings](#vector-embedding).
Choosing an appropriate embedding model is crucial for ensuring the accuracy and relevance of semantic search results. For unstructured text data, you can find top-performing text embedding models on the [Massive Text Embedding Benchmark (MTEB) Leaderboard](https://huggingface.co/spaces/mteb/leaderboard).
To learn how to generate vector embeddings for your specific data types, refer to integration tutorials or examples of embedding models.
## How vector search works
After converting raw data into vector embeddings and storing them in TiDB, your application can execute vector search queries to find the data most semantically or contextually relevant to a user's query.
TiDB vector search identifies the top-k nearest neighbor (KNN) vectors by using a [distance function](https://docs.pingcap.com/tidbcloud/vector-search-functions-and-operators) to calculate the distance between the given vector and vectors stored in the database. The vectors closest to the given vector in the query represent the most similar data in meaning.
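As a minimal sketch of the top-k idea, the following plain-Python code scans every stored vector, computes a distance to the query, and keeps the `k` smallest. The function names and sample vectors are hypothetical, and this brute-force scan only illustrates the concept; it is not how TiDB executes the query.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 minus cosine similarity; smaller means more similar in direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, rows, k, distance=cosine_distance):
    """Return the k (distance, row_id) pairs closest to query, nearest first."""
    scored = ((distance(query, vector), row_id) for row_id, vector in rows.items())
    return heapq.nsmallest(k, scored)


# Hypothetical stored vectors keyed by row id.
stored = {
    1: [1.0, 0.0],
    2: [0.8, 0.6],
    3: [0.0, 1.0],
    4: [-1.0, 0.0],
}

nearest = top_k([1.0, 0.1], stored, k=2)
print([row_id for _, row_id in nearest])
```

Passing `distance=l2_distance` switches the metric without changing the rest of the search.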

As a relational database with integrated vector search capabilities, TiDB enables you to store data and their corresponding vector representations (that is, vector embeddings) together in one database. You can choose any of the following ways for storage:
- Store data and their corresponding vector representations in different columns of the same table.
- Store data and their corresponding vector representations in different tables. With this approach, you need to use `JOIN` queries to combine the tables when retrieving data.
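The two layouts can be sketched as follows. To keep the example runnable without a TiDB cluster, it uses Python's built-in `sqlite3` module as a stand-in and stores each embedding as JSON text; in TiDB the embedding column would instead use a vector data type. Table and column names are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
embedding = json.dumps([0.1, 0.2, 0.3])  # JSON text stands in for a vector column

# Layout 1: text and embedding in different columns of the same table.
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, embedding TEXT)")
conn.execute(
    "INSERT INTO chunks VALUES (?, ?, ?)",
    (1, "TiDB is a distributed database.", embedding),
)

# Layout 2: text and embedding in different tables, linked by document id.
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, text TEXT)")
conn.execute("CREATE TABLE document_vectors (doc_id INTEGER, embedding TEXT)")
conn.execute("INSERT INTO documents VALUES (?, ?)", (1, "TiDB is a distributed database."))
conn.execute("INSERT INTO document_vectors VALUES (?, ?)", (1, embedding))

# Layout 1 reads both values from one table.
same_table = conn.execute("SELECT text, embedding FROM chunks").fetchall()

# Layout 2 needs a JOIN to bring text and embedding back together.
joined = conn.execute(
    "SELECT d.text, v.embedding FROM documents AS d "
    "JOIN document_vectors AS v ON v.doc_id = d.id"
).fetchall()

print(same_table == joined)
```

Both queries return the same rows; the second layout trades a `JOIN` at read time for keeping the vectors out of the main table.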
## Use cases
### Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) is an architecture designed to optimize the output of Large Language Models (LLMs). By using vector search, RAG applications can store vector embeddings in the database and retrieve relevant documents as additional context when the LLM generates responses, thereby improving the quality and relevance of the answers.
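A minimal sketch of the retrieval step, assuming hand-made two-dimensional embeddings and an in-memory list in place of a database table. All names and values are hypothetical, and the final call to an LLM is left out; the code only shows how retrieved chunks become extra context in the prompt.

```python
import math


def cosine_distance(a, b):
    """1 minus cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical stored chunks with invented embeddings.
knowledge_base = [
    ("TiDB supports vector search.", [0.9, 0.1]),
    ("Streamlit builds web UIs in Python.", [0.1, 0.9]),
    ("Vector embeddings capture meaning.", [0.8, 0.3]),
]


def retrieve(query_vector, k=2):
    """Return the text of the k chunks nearest to the query vector."""
    ranked = sorted(knowledge_base, key=lambda item: cosine_distance(query_vector, item[1]))
    return [text for text, _ in ranked[:k]]


def build_prompt(question, query_vector):
    """Prepend the retrieved chunks to the question as context for the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in retrieve(query_vector))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


prompt = build_prompt("Does TiDB support vector search?", [1.0, 0.2])
print(prompt)
```

In a real application, the query vector would come from the same embedding model used for the stored chunks, and the prompt would be sent to the LLM.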
### Semantic search
Semantic search is a search technology that returns results based on the meaning of a query, rather than simply matching keywords. It interprets the meaning across different languages and various types of data (such as text, images, and audio) using embeddings. Vector search algorithms then use these embeddings to find the most relevant data that satisfies the user's query.
### Recommendation engine
A recommendation engine is a system that proactively suggests content, products, or services that are relevant and personalized to users. It accomplishes this by creating embeddings that represent user behavior and preferences. These embeddings help the system identify similar items that other users have interacted with or shown interest in. This increases the likelihood that the recommendations will be both relevant and appealing to the user.
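One simple way to sketch this is to average the embeddings of the items a user has interacted with and recommend the unseen items closest to that average. This is a simplified, content-based variant that does not model other users' behavior, and the item names and vectors are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical item embeddings. Axes: (science fiction, cooking).
items = {
    "space opera novel": [0.9, 0.1],
    "robot short stories": [0.8, 0.2],
    "alien invasion film": [0.95, 0.05],
    "pasta cookbook": [0.1, 0.9],
    "baking guide": [0.2, 0.8],
}


def user_embedding(history):
    """Average the embeddings of the items the user interacted with."""
    vectors = [items[name] for name in history]
    return [sum(axis) / len(vectors) for axis in zip(*vectors)]


def recommend(history, k=1):
    """Return the k unseen items most similar to the user's averaged embedding."""
    profile = user_embedding(history)
    candidates = [name for name in items if name not in history]
    candidates.sort(key=lambda name: cosine_similarity(profile, items[name]), reverse=True)
    return candidates[:k]


history = ["space opera novel", "robot short stories"]
print(recommend(history))
```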
================================================
FILE: src/ai/examples/auto-embedding-with-pytidb.md
================================================
---
title: Auto Embedding
description: "Automatically generate embeddings for your text data using built-in embedding models."
source_repo: "https://github.com/pingcap/pytidb/tree/main/examples/auto_embedding"
---
# Auto Embedding Demo
This example showcases how to use the auto embedding feature with PyTiDB Client.
* Connect to TiDB with PyTiDB Client
* Define a table with a VectorField configured for automatic embedding
* Insert plain text data; embeddings are populated automatically in the background
* Run vector searches with natural language queries; embedding happens transparently
## Prerequisites
- **Python 3.10+**
- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)
## How to run
**Step 1**: Clone the repository
```bash
git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/auto_embedding/
```
**Step 2**: Install the required packages
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt
```
**Step 3**: Set up the environment to connect to the database
Go to the [TiDB Cloud console](https://tidbcloud.com/clusters) to get the connection parameters, then set up the environment variables like this:
```bash
cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=test
# Uses the TiDB Cloud Free embedding model by default, which does not require an API key
EMBEDDING_PROVIDER=tidbcloud_free
EOF
```
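For orientation, here is a standard-library sketch of how `KEY=VALUE` lines like these can be turned into connection settings. The `parse_env` helper, the sample values, and the `connection_args` keys are hypothetical; the demo's `main.py` may load the file differently.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Placeholder values for illustration only; use your own cluster's parameters.
sample = """\
TIDB_HOST=gateway01.example.com
TIDB_PORT=4000
TIDB_USERNAME=prefix.root
TIDB_PASSWORD=secret
TIDB_DATABASE=test
# comment lines are ignored
EMBEDDING_PROVIDER=tidbcloud_free
"""

env = parse_env(sample)

# Assumed shape of the settings a client would need; the port is converted to int.
connection_args = {
    "host": env["TIDB_HOST"],
    "port": int(env["TIDB_PORT"]),
    "username": env["TIDB_USERNAME"],
    "password": env["TIDB_PASSWORD"],
    "database": env["TIDB_DATABASE"],
}
print(connection_args["host"], connection_args["port"])
```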
**Step 4**: Run the demo
```bash
python main.py
```
**Expected output:**
```plain
=== Define embedding function ===
Embedding function (model id: tidbcloud_free/amazon/titan-embed-text-v2) defined
=== Define table schema ===
Table created
=== Truncate table ===
Table truncated
=== Insert sample data ===
Inserted 3 chunks
=== Perform vector search ===
id: 1, text: TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads., distance: 0.30373281240458805
id: 2, text: PyTiDB is a Python library for developers to connect to TiDB., distance: 0.422506501973434
id: 3, text: LlamaIndex is a Python library for building AI-powered applications., distance: 0.5267239638442787
```
---
## Related Resources
- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/auto_embedding)
- **Category**: Getting-Started
- **Description**: Automatically generate embeddings for your text data using built-in embedding models.
[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary }
================================================
FILE: src/ai/examples/basic-with-pytidb.md
================================================
---
title: Basic Usage
description: "Learn fundamental PyTiDB operations including database connection, table creation, and data manipulation."
source_repo: "https://github.com/pingcap/pytidb/tree/main/examples/basic"
---
# Basic CRUD Demo
This example demonstrates basic CRUD (Create, Read, Update, Delete) operations with PyTiDB.
* Use PyTiDB Client to connect to TiDB
* Create a table with text, vector, and JSON columns
* Perform basic CRUD operations on data
## Prerequisites
- **Python 3.10+**
- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)
## How to run
**Step 1**: Clone the repository
```bash
git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/basic/
```
**Step 2**: Install the required packages
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt
```
**Step 3**: Set up the environment to connect to the database
Go to the [TiDB Cloud console](https://tidbcloud.com/clusters) and get the connection parameters, then set up the environment variables like this:
```bash
cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=test
EOF
```
**Step 4**: Run the demo
```bash
python main.py
```
**Expected output:**
```plain
=== CREATE TABLE ===
Table created
=== TRUNCATE TABLE ===
Table truncated
=== CREATE ===
Created 3 items
=== READ ===
ID: 1, Content: TiDB is a distributed SQL database, Metadata: {'category': 'database'}
ID: 2, Content: GPT-4 is a large language model, Metadata: {'category': 'llm'}
ID: 3, Content: LlamaIndex is a Python library for building AI-powered applications, Metadata: {'category': 'rag'}
=== UPDATE ===
Updated item #1
After update - ID: 1, Content: TiDB Cloud Starter is a fully-managed, auto-scaling cloud database service, Metadata: {'category': 'dbass'}
=== DELETE ===
Deleted item #2
=== FINAL STATE ===
ID: 1, Content: TiDB Cloud Starter is a fully-managed, auto-scaling cloud database service, Metadata: {'category': 'dbass'}
ID: 3, Content: LlamaIndex is a Python library for building AI-powered applications, Metadata: {'category': 'rag'}
=== COUNT ROWS ===
Number of rows: 2
=== DROP TABLE ===
Table dropped
Basic CRUD operations completed!
```
---
## Related Resources
- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/basic)
- **Category**: Getting-Started
- **Description**: Learn fundamental PyTiDB operations including database connection, table creation, and data manipulation.
[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary }
================================================
FILE: src/ai/examples/fulltext-search-with-pytidb.md
================================================
---
title: Fulltext Search
description: "Perform traditional text search using MySQL fulltext search capabilities."
source_repo: "https://github.com/pingcap/pytidb/tree/main/examples/fulltext_search"
---
# Fulltext Search Example
This example demonstrates how to build an e-commerce product search application using TiDB's full-text search feature with multilingual support. Users can search for products by keywords in their preferred language.
<p align="center">
<img width="700" alt="E-commerce product search with full-text search" src="https://github.com/user-attachments/assets/c81ddad4-f996-4b1f-85c0-5cbb55bc2a3a" />
<p align="center"><i>E-commerce product search with full-text search</i></p>
</p>
## Prerequisites
- **Python 3.10+**
- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)
## How to run
**Step 1**: Clone the repository
```bash
git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/fulltext_search/
```
**Step 2**: Install the required packages and set up the environment
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt
```
**Step 3**: Set up the environment to connect to the database
Go to the [TiDB Cloud console](https://tidbcloud.com/), create a new cluster if you don't have one, and then get the connection parameters from the connection dialog.
```bash
cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=pytidb_fulltext_demo
EOF
```
**Step 4**: Run the Streamlit app
```bash
streamlit run app.py
```
**Step 5**: Open your browser and visit `http://localhost:8501`
---
## Related Resources
- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/fulltext_search)
- **Category**: Search
- **Description**: Perform traditional text search using MySQL fulltext search capabilities.
[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary }
================================================
FILE: src/ai/examples/hybrid-search-with-pytidb.md
================================================
---
title: Hybrid Search
description: "Combine vector search and fulltext search for more comprehensive results."
source_repo: "https://github.com/pingcap/pytidb/tree/main/examples/hybrid_search"
---
# Hybrid Search Demo
In this demo, we will show you how to use hybrid search to combine vector search and full-text search on a set of documents.
<p align="center">
<img src="https://github.com/user-attachments/assets/6e1c639d-2160-44c8-86b4-958913b9eca5" alt="TiDB Hybrid Search Demo" width="700"/>
<p align="center"><i>TiDB Hybrid Search Demo</i></p>
</p>
## Prerequisites
* Python 3.10+
* TiDB database instance (👉 [Create a free TiDB Serverless Cluster](https://tidbcloud.com/free-trial))
* OpenAI API key (Go to [OpenAI](https://platform.openai.com/api-keys) to get the API key)
> **Note**
>
> Currently, full-text search is only available for the following product option and regions:
>
> - TiDB Cloud Starter: Frankfurt (eu-central-1), Singapore (ap-southeast-1)
## How to run
**Step 1**: Clone the repository
```bash
git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/hybrid_search/
```
**Step 2**: Install the required packages and set up the environment
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt
```
**Step 3**: Set up the environment to connect to the database
If you are using TiDB Cloud, you can find the connection parameters in the [TiDB Cloud console](https://tidbcloud.com/).
```bash
cat > .env <<EOF
TIDB_HOST=localhost
TIDB_PORT=4000
TIDB_USERNAME=root
TIDB_PASSWORD=
TIDB_DATABASE=pytidb_hybrid_demo
OPENAI_API_KEY=<your-openai-api-key>
EOF
```
**Step 4**: Run the demo
**Option 1**: Run the Streamlit app
If you want to check the demo with a web UI, you can run the following command:
```bash
streamlit run app.py
```
Open the browser and visit `http://localhost:8501`
**Option 2**: Run the demo script
If you want to check the demo with a script, you can run the following command:
```bash
python example.py
```
**Expected output:**
```plain
=== CONNECT TO TIDB ===
Connected to TiDB.
=== CREATE TABLE ===
Table created.
=== INSERT SAMPLE DATA ===
Inserted 3 rows.
=== PERFORM HYBRID SEARCH ===
Search results:
[
{
"_distance": 0.4740166257687124,
"_match_score": 1.6804268,
"_score": 0.03278688524590164,
"id": 60013,
"text": "TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads."
},
{
"_distance": 0.6428459116216618,
"_match_score": 0.78427225,
"_score": 0.03200204813108039,
"id": 60015,
"text": "LlamaIndex is a Python library for building AI-powered applications."
},
{
"_distance": 0.641581407158715,
"_match_score": null,
"_score": 0.016129032258064516,
"id": 60014,
"text": "PyTiDB is a Python library for developers to connect to TiDB."
}
]
```
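The `_score` values in this output are consistent with reciprocal rank fusion (RRF) using a constant of 60. That reading is an inference from the numbers, not something the output states. Under that assumption, each row's score is the sum of `1 / (60 + rank)` over the result lists it appears in, as the sketch below reproduces.

```python
# Ranks read off the sample output: vector rank by ascending _distance,
# full-text rank by descending _match_score (a null score means no full-text match).
vector_rank = {60013: 1, 60014: 2, 60015: 3}
fulltext_rank = {60013: 1, 60015: 2}

K = 60  # assumed smoothing constant; inferred from the scores, not stated in the output


def rrf_score(row_id):
    """Sum 1 / (K + rank) over every result list that contains the row."""
    score = 0.0
    for ranks in (vector_rank, fulltext_rank):
        if row_id in ranks:
            score += 1.0 / (K + ranks[row_id])
    return score


scores = {row_id: rrf_score(row_id) for row_id in vector_rank}
for row_id, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(row_id, score)
```

Row 60014 has no full-text match, so only its vector rank contributes. This explains why it scores lowest even though its `_distance` is slightly smaller than that of row 60015.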
---
## Related Resources
- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/hybrid_search)
- **Category**: Search
- **Description**: Combine vector search and fulltext search for more comprehensive results.
[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary }
================================================
FILE: src/ai/examples/image-search-with-pytidb.md
================================================
---
title: Image Search
description: "Build an image search application using multimodal embeddings for both text-to-image and image-to-image search."
source_repo: "https://github.com/pingcap/pytidb/tree/main/examples/image_search"
---
# Pet Image Search Demo
This example showcases how to build a powerful image search application by combining TiDB's vector search capabilities with multimodal embedding models.
With just a few lines of code, you can create an intelligent search system that understands both text and images.
- 🔍 **Text-to-Image Search**: Find the perfect pet photos by describing what you're looking for in natural language, such as "fluffy orange cat"
- 🖼️ **Image-to-Image Search**: Upload a photo and instantly discover visually similar pets based on breed, color, pose and more
<p align="center">
<img width="700" alt="PyTiDB Image Search Demo" src="https://github.com/user-attachments/assets/7ba9733a-4d1f-4094-8edb-58731ebd08e9" />
<p align="center"><i>Pet image search via multimodal embeddings</i></p>
</p>
## Prerequisites
- **Python 3.10+**
- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)
- **Jina AI API Key**: Get your free API key at [jina.ai Embeddings ↗️](https://jina.ai/embeddings/)
## How to run
**Step 1**: Clone the repository
```bash
git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/image_search/
```
**Step 2**: Install the required packages
```bash
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r reqs.txt
```
**Step 3**: Set up environment variables
Go to the [TiDB Cloud console](https://tidbcloud.com/clusters) and get the connection parameters, then set up the environment variables like this:
```bash
cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=test
JINA_AI_API_KEY={your-jina-ai-api-key}
EOF
```
**Step 4**: Download and extract the dataset
In this demo, we will use the [Oxford Pets dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/) to load pet images into the database for search.
*For Linux/macOS:*
```bash
# Download the dataset
curl -L -o oxford_pets.tar.gz "https://thor.robots.ox.ac.uk/~vgg/data/pets/images.tar.gz"
# Extract the dataset
mkdir -p oxford_pets
tar -xzf oxford_pets.tar.gz -C oxford_pets
```
**Step 5**: Run the app
```bash
streamlit run app.py
```
Open `http://localhost:8501` in your browser.
**Step 6**: Load data
In the sample app, you can click the **Load Sample Data** button to load some sample data into the database.
Or if you want to load all the data in the Oxford Pets dataset, click the **Load All Data** button.
**Step 7**: Search
1. Select the **Search type** in the sidebar
2. Input a text description of the pet you're looking for, or upload a photo of a dog or cat
3. Click the **Search** button
---
## Related Resources
- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/image_search)
- **Category**: Search
- **Description**: Build an image search application using multimodal embeddings for both text-to-image and image-to-image search.
[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary }
================================================
FILE: src/ai/examples/index.md
================================================
---
title: Demo Gallery
description: Explore hands-on demos showcasing how TiDB empowers AI applications. Get started quickly with TiDB Cloud Starter to build your own AI-powered solutions.
hide:
- navigation
- toc
- pageTitle
- editButton
---
<style>
/* CSS Variables */
:root {
--brand-color: #de243d;
--brand-hover: #b71e34;
--border-radius-sm: 0.5rem;
--border-radius-md: 0.75rem;
--border-radius-lg: 1rem;
--spacing-sm: 1rem;
--spacing-md: 2rem;
--spacing-lg: 3rem;
--transition-fast: 0.2s;
--transition-normal: 0.3s;
--dark-overlay: rgba(255, 255, 255, 0.08);
--dark-border: rgba(255, 255, 255, 0.1);
--dark-bg-subtle: rgba(255, 255, 255, 0.05);
}
/* Smooth scrolling for the entire page */
html {
scroll-behavior: smooth;
}
/* Gallery Container */
.gallery-container {
max-width: 1280px;
margin: 0 auto;
padding: var(--spacing-md) var(--spacing-sm);
}
/* Header */
.gallery-header {
text-align: center;
margin-bottom: var(--spacing-lg);
}
.gallery-title {
font-size: 72px !important;
font-weight: 800 !important;
margin-bottom: 8px !important;
line-height: 1 !important;
color: var(--md-default-fg-color) !important;
}
.gallery-description {
font-size: 22px !important;
color: var(--md-default-fg-color--light) !important;
padding: 0 120px;
margin-bottom: 5rem !important;
}
/* Gallery CTA link styles */
.gallery-cta-link {
position: relative;
text-decoration: none;
transition: all 0.3s ease-in-out;
}
.gallery-cta-link:hover {
border-bottom: 3px solid var(--brand-color);
}
/* Layout */
.gallery-layout {
display: flex;
gap: var(--spacing-sm) !important;
}
/* Sidebar */
.gallery-sidebar {
width: 8rem;
flex-shrink: 0;
}
.sidebar-nav {
position: sticky;
top: 140px;
}
.sidebar-title {
font-size: 14px !important;
font-weight: 400 !important;
color: var(--md-default-fg-color--light) !important;
text-transform: uppercase;
letter-spacing: 0.05em;
margin: 0 0 0.5rem 0 !important;
}
.sidebar-links {
display: flex;
flex-direction: column;
gap: 2px;
margin-left: -12px;
}
.sidebar-link {
display: block;
padding: 8px 12px;
border-radius: var(--border-radius-sm);
font-size: 14px;
font-weight: 400 !important;
color: var(--md-default-fg-color--light) !important;
text-decoration: none !important;
transition: all var(--transition-fast) ease;
text-align: left;
cursor: pointer;
}
.sidebar-link:hover {
background-color: var(--md-default-fg-color--lightest) !important;
color: var(--md-default-fg-color) !important;
font-weight: 500 !important;
transform: translateX(2px);
}
.sidebar-link:focus-visible {
outline: 2px solid var(--brand-color);
outline-offset: 2px;
}
/* Content */
.gallery-content {
flex: 1;
padding: 0 var(--spacing-lg);
}
.gallery-section {
margin-bottom: var(--spacing-lg);
scroll-margin-top: 120px;
}
.section-title {
font-size: 24px !important;
font-weight: 700 !important;
color: var(--md-default-fg-color) !important;
margin: 0 0 1.5rem 0 !important;
}
.cards-grid {
display: grid;
grid-template-columns: repeat(3, 1fr);
gap: var(--spacing-sm) !important;
}
/* Cards */
.gallery-card {
display: block;
background: var(--md-default-bg-color);
border: 1px solid var(--md-default-fg-color--lightest);
border-radius: var(--border-radius-md);
overflow: hidden;
transition: all var(--transition-normal) ease;
text-decoration: none !important;
color: inherit;
}
.gallery-card:hover {
transform: translateY(-2px);
box-shadow: var(--md-shadow-z2);
text-decoration: none !important;
outline: 2px solid var(--brand-color);
outline-offset: 2px;
}
.gallery-card:hover .card-title {
color: var(--brand-color) !important;
}
.card-image {
height: 8rem;
position: relative;
overflow: hidden;
background-color: var(--md-default-fg-color--lightest);
border-bottom: 1px solid var(--md-default-fg-color--lightest);
}
.card-image img {
width: 100%;
height: 100%;
object-fit: cover;
}
.card-gradient {
display: flex;
align-items: center;
justify-content: center;
font-size: 2.5rem;
}
.card-badge {
position: absolute;
top: 0.5rem;
left: 0.5rem;
background: rgba(0, 0, 0, 0.7);
color: white;
padding: 0.2rem 0.4rem;
border-radius: 0.2rem;
font-size: 0.5rem !important;
font-weight: 500 !important;
}
.card-content {
padding: 12px;
}
.card-title {
font-size: 0.75rem !important;
font-weight: 600 !important;
line-height: 1.5 !important;
margin: 0 !important;
color: var(--md-default-fg-color) !important;
}
.card-description {
color: var(--md-default-fg-color--light) !important;
font-size: 0.65rem !important;
line-height: 1.5;
display: -webkit-box;
margin: 0;
-webkit-line-clamp: 3;
-webkit-box-orient: vertical;
overflow: hidden;
}
/* CTA */
.gallery-cta {
background: linear-gradient(135deg, rgba(222, 36, 61, 0.08) 0%, rgba(99, 102, 241, 0.08) 100%);
border: 1px solid rgba(222, 36, 61, 0.1);
border-radius: var(--border-radius-lg);
padding: var(--spacing-md);
text-align: center;
margin-top: var(--spacing-lg);
}
.cta-title {
font-size: 1.5rem;
font-weight: 700;
color: var(--md-default-fg-color);
margin-bottom: var(--spacing-sm);
}
.cta-description {
color: var(--md-default-fg-color--light);
margin: 0 auto var(--spacing-md);
max-width: 42rem;
}
.cta-buttons {
display: flex;
justify-content: center;
gap: var(--spacing-sm);
flex-wrap: wrap;
}
/* Button shared styles */
.btn-primary,
.btn-secondary {
padding: 0.75rem 1.5rem;
border-radius: var(--border-radius-sm);
font-weight: 500;
text-decoration: none !important;
transition: all var(--transition-fast);
}
.btn-primary {
background-color: var(--brand-color);
color: white !important;
}
.btn-primary:hover {
background-color: var(--brand-hover);
color: white !important;
}
.btn-secondary {
border: 1px solid var(--md-default-fg-color--lighter);
background-color: var(--md-default-bg-color);
color: var(--md-default-fg-color) !important;
}
.btn-secondary:hover {
background-color: var(--md-default-fg-color--lightest);
color: var(--md-default-fg-color) !important;
}
/* Dark mode styles */
[data-md-color-scheme="tidb-dark"] .sidebar-link:hover {
background-color: var(--dark-overlay) !important;
}
[data-md-color-scheme="tidb-dark"] .gallery-card {
border-color: var(--dark-border);
}
[data-md-color-scheme="tidb-dark"] .card-image {
background-color: var(--dark-bg-subtle);
border-bottom-color: var(--dark-border);
}
[data-md-color-scheme="tidb-dark"] .gallery-cta {
background: linear-gradient(135deg, rgba(222, 36, 61, 0.12) 0%, rgba(99, 102, 241, 0.12) 100%);
border-color: rgba(222, 36, 61, 0.2);
}
[data-md-color-scheme="tidb-dark"] .btn-secondary:hover {
background-color: rgba(255, 255, 255, 0.1);
}
/* Responsive adjustments */
@media (max-width: 768px) {
.gallery-layout { flex-direction: column; }
.gallery-sidebar { width: 100%; }
.sidebar-nav { position: static; }
.sidebar-links { flex-direction: row; gap: 0.5rem; flex-wrap: wrap; }
.sidebar-link { padding: 12px 16px !important; min-height: 44px; display: flex; align-items: center; }
.gallery-content { padding: 0; }
.gallery-description { padding: 0 20px; }
.cards-grid { grid-template-columns: 1fr; }
.gallery-title { font-size: 48px !important; }
}
@media (max-width: 1024px) and (min-width: 769px) {
.cards-grid { grid-template-columns: repeat(2, 1fr); }
}
</style>
<div class="gallery-container">
<div class="gallery-header">
<h1 class="gallery-title">Demo Gallery</h1>
<p class="gallery-description">
Explore hands-on demos showcasing how TiDB empowers AI applications.<br>
Get started quickly with <a href='https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=demo_gallery' target='_blank' rel='noopener noreferrer' class='gallery-cta-link'><b>TiDB Cloud Starter</b></a> to build your own AI-powered solutions.
</p>
</div>
<div class="gallery-layout">
<div class="gallery-sidebar">
<div class="sidebar-nav">
<h6 class="sidebar-title">Categories</h6>
<nav class="sidebar-links">
<a href="#featured" class="sidebar-link">Featured</a>
<a href="#getting-started" class="sidebar-link">Getting Started</a>
<a href="#search" class="sidebar-link">Search & Retrieval</a>
<a href="#ai-apps" class="sidebar-link">AI Applications</a>
</nav>
</div>
</div>
<div class="gallery-content">
<section id="featured" class="gallery-section">
<h2 class="section-title">⭐ Featured</h2>
<div class="cards-grid">
<a href="image-search-with-pytidb/" class="gallery-card">
<div class="card-image">
<img src="https://github.com/user-attachments/assets/7ba9733a-4d1f-4094-8edb-58731ebd08e9" alt="Image Search Demo">
</div>
<div class="card-content">
<h3 class="card-title">Image Search</h3>
<p class="card-description">
Build an image search application using multimodal embeddings for both text-to-image and image-to-image search.
</p>
</div>
</a>
<a href="rag-with-pytidb/" class="gallery-card">
<div class="card-image">
<img src="https://github.com/user-attachments/assets/dfd85672-65ce-4a46-8dd2-9f77d826363e" alt="RAG Demo">
</div>
<div class="card-content">
<h3 class="card-title">RAG</h3>
<p class="card-description">
Build a RAG application that combines document retrieval with language generation.
</p>
</div>
</a>
<a href="memory-with-pytidb/" class="gallery-card">
<div class="card-image">
<img src="https://github.com/user-attachments/assets/74dee96b-ea20-49dc-ad27-679faa5bf9b8" alt="Memory Demo">
</div>
<div class="card-content">
<h3 class="card-title">Memory</h3>
<p class="card-description">
Implement conversation memory for chatbots and conversational AI applications.
</p>
</div>
</a>
</div>
</section>
<section id="getting-started" class="gallery-section">
<h2 class="section-title">🚀 Getting Started</h2>
<div class="cards-grid">
<a href="basic-with-pytidb/" class="gallery-card">
<div class="card-image card-gradient" style="background: linear-gradient(135deg, #10b981, var(--brand-color));">
<div>⚙️</div>
</div>
<div class="card-content">
<h3 class="card-title">Basic Usage</h3>
<p class="card-description">
Learn fundamental PyTiDB operations including database connection, table creation, and data manipulation.
</p>
</div>
</a>
<a href="auto-embedding-with-pytidb/" class="gallery-card">
<div class="card-image card-gradient" style="background: radial-gradient(circle at center, #8b5cf6 0%, var(--brand-color) 100%);">
<div>🤖</div>
</div>
<div class="card-content">
<h3 class="card-title">Auto Embedding</h3>
<p class="card-description">
Automatically generate embeddings for your text data using built-in embedding models.
</p>
</div>
</a>
</div>
</section>
<section id="search" class="gallery-section">
<h2 class="section-title">🔍 Search & Retrieval</h2>
<div class="cards-grid">
<a href="vector-search-with-pytidb/" class="gallery-card">
<div class="card-image">
<img src="https://github.com/user-attachments/assets/6d7783a5-ce9c-4dcc-8b95-49d5f0ca735a" alt="Vector Search Demo">
</div>
<div class="card-content">
<h3 class="card-title">Vector Search</h3>
<p class="card-description">
Implement semantic search using vector embeddings to find similar content.
</p>
</div>
</a>
<a href="fulltext-search-with-pytidb/" class="gallery-card">
<div class="card-image">
<img src="https://github.com/user-attachments/assets/c81ddad4-f996-4b1f-85c0-5cbb55bc2a3a" alt="Fulltext Search Demo">
</div>
<div class="card-content">
<h3 class="card-title">Fulltext Search</h3>
<p class="card-description">
Perform traditional text search using MySQL fulltext search capabilities.
</p>
</div>
</a>
<a href="hybrid-search-with-pytidb/" class="gallery-card">
<div class="card-image">
<img src="https://github.com/user-attachments/assets/6e1c639d-2160-44c8-86b4-958913b9eca5" alt="Hybrid Search Demo">
</div>
<div class="card-content">
<h3 class="card-title">Hybrid Search</h3>
<p class="card-description">
Combine vector search and fulltext search for more comprehensive results.
</p>
</div>
</a>
<a href="image-search-with-pytidb/" class="gallery-card">
<div class="card-image">
<img src="https://github.com/user-attachments/assets/7ba9733a-4d1f-4094-8edb-58731ebd08e9" alt="Image Search Demo">
</div>
<div class="card-content">
<h3 class="card-title">Image Search</h3>
<p class="card-description">
Build an image search application using multimodal embeddings for both text-to-image and image-to-image search.
</p>
</div>
</a>
</div>
</section>
<section id="ai-apps" class="gallery-section">
<h2 class="section-title">🤖 AI Applications</h2>
<div class="cards-grid">
<a href="rag-with-pytidb/" class="gallery-card">
<div class="card-image">
<img src="https://github.com/user-attachments/assets/dfd85672-65ce-4a46-8dd2-9f77d826363e" alt="RAG Demo">
</div>
<div class="card-content">
<h3 class="card-title">RAG</h3>
<p class="card-description">
Build a RAG application that combines document retrieval with language generation.
</p>
</div>
</a>
<a href="memory-with-pytidb/" class="gallery-card">
<div class="card-image">
<img src="https://github.com/user-attachments/assets/74dee96b-ea20-49dc-ad27-679faa5bf9b8" alt="Memory Demo">
</div>
<div class="card-content">
<h3 class="card-title">Memory</h3>
<p class="card-description">
Implement conversation memory for chatbots and conversational AI applications.
</p>
</div>
</a>
<a href="text2sql-with-pytidb/" class="gallery-card">
<div class="card-image card-gradient" style="background: linear-gradient(135deg, #06b6d4, var(--brand-color));">
<div>💬</div>
</div>
<div class="card-content">
<h3 class="card-title">Text2SQL</h3>
<p class="card-description">
Convert natural language queries into SQL statements using AI models.
</p>
</div>
</a>
</div>
</section>
<div class="gallery-cta">
<h3 class="cta-title">Ready to build your AI application?</h3>
<p class="cta-description">
Start your AI journey with TiDB Cloud Starter. Follow our quickstart guide to build your first AI-powered application in minutes, or explore specific examples for your use case.
</p>
<div class="cta-buttons">
<a href="https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme" target="_blank" rel="noopener noreferrer" class="btn-primary">Try TiDB Cloud Starter</a>
<a href="/ai/quickstart/" class="btn-secondary">View Quickstart Guide</a>
</div>
</div>
</div>
</div>
</div>
================================================
FILE: src/ai/examples/memory-with-pytidb.md
================================================
---
title: Memory
description: "Implement conversation memory for chatbots and conversational AI applications."
source_repo: "https://github.com/pingcap/pytidb/tree/main/examples/memory"
---
# AI Agent Memory Demo
This example showcases how to build an intelligent AI agent with persistent memory powered by TiDB's vector search capabilities.
With just a few lines of code, you can create a conversational AI that remembers past interactions and builds context over time.
- 🧠 **Persistent Memory**: The AI remembers conversations across sessions and user interactions
- 💬 **Interactive Chat**: Both web interface and command-line options for flexible interaction
- 👤 **Multi-User Support**: Different users can have separate memory contexts
- 🔍 **Real-Time Memory Viewing**: Visual display of all stored memories in the web interface
<p align="center">
<img src="https://github.com/user-attachments/assets/b57ae0fb-9075-43a9-8690-edaa69ca9f40" alt="AI Agent with memory powered by TiDB" width="700"/>
<p align="center"><i>AI Agent with memory powered by TiDB</i></p>
</p>
## Prerequisites
- **Python 3.10+**
- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)
- **OpenAI API Key**: Get your API key at [OpenAI Platform ↗️](https://platform.openai.com/api-keys)
## How to run
**Step 1**: Clone the repository locally
```bash
git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/memory/
```
**Step 2**: Install the required packages
```bash
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r reqs.txt
```
**Step 3**: Set up environment variables
Go to the [TiDB Cloud console](https://tidbcloud.com/clusters) to get the connection parameters, then set the environment variables as follows:
```bash
cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=test
OPENAI_API_KEY={your-openai-api-key}
EOF
```
**Step 4**: Run the application
Choose one of the following options:
**Option 1**: Launch Web Application:
```bash
streamlit run app.py
```
Visit `http://localhost:8501` in your browser and follow the instructions in the [Interact with memory in Web Application](https://github.com/pingcap/pytidb/tree/main/examples/memory/#interact-with-memory-in-web-application) section to start interacting with the memory-enabled AI assistant.
**Option 2**: Run Command Line Application:
```bash
python main.py
```
Follow the instructions in the [Interact with memory in Command Line Application](https://github.com/pingcap/pytidb/tree/main/examples/memory/#interact-with-memory-in-command-line-application) section to start interacting with the memory-enabled AI assistant.
## Interact with memory in Web Application
In the web application, you can interact with the AI assistant. The user interface includes:
- **Sidebar**: User settings and chat list.
- **Main chat area**: Chat interface with the AI assistant.
- **Memory viewer**: Real-time memory viewer showing stored facts.
Follow these steps to see how the memory works:
1. Introduce yourself in the default chat session. For example, "Hello, I am John. I work as a software engineer and love guitar."
2. You can see the information you provided in the memory viewer.
3. Click **New chat** in the sidebar to start a new chat session.
4. Ask "Who am I?" in the new chat session. The AI will recall your information from previous conversations.
## Interact with memory in Command Line Application
In the command line application, you can interact with the AI assistant and introduce yourself.
**Example conversation:**
```plain
Chat with AI (type 'exit' to quit)
You: Hello, I am Mini256.
AI: Hello, Mini256! How can I assist you today?
You: I am working at PingCAP.
AI: That's great to hear, Mini256! PingCAP is known for its work on distributed databases, particularly TiDB. How's your experience been working there?
You: I am developing pytidb (A Python SDK for TiDB) which helps developers easy to connect to TiDB.
AI: That sounds like a great project, Mini256! Developing a Python SDK for TiDB can make it much easier for developers to integrate with TiDB and interact with it using Python. If you need any advice on best practices, libraries to use, or specific features to implement, feel free to ask!
You: exit
Goodbye!
```
After the first conversation, the AI assistant will remember the information you provided and use it to answer future questions.
Now, you can start a new chat session and ask the AI assistant "Who am I?".
**Example conversation in another chat session:**
```plain
Chat with AI (type 'exit' to quit)
You: Who am I?
AI: You are Mini256, and you work at PingCAP, where you are developing pytidb, a Python SDK for TiDB to assist developers in easily connecting to TiDB.
You: exit
Goodbye!
```
As you can see, the AI assistant remembers you across sessions!
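The idea behind cross-session, per-user memory can be sketched with a plain dictionary. This is only a toy illustration under stated assumptions: the `ToyMemory` class and its methods are hypothetical, and the real demo stores memories in TiDB and retrieves them with vector search rather than returning every stored fact.

```python
from collections import defaultdict


class ToyMemory:
    """Keeps facts per user so they outlive a single chat session."""

    def __init__(self):
        # user_id -> list of remembered facts (a stand-in for a TiDB table)
        self._facts = defaultdict(list)

    def remember(self, user_id: str, fact: str) -> None:
        self._facts[user_id].append(fact)

    def recall(self, user_id: str) -> list[str]:
        # Return a copy so callers cannot mutate the store by accident.
        return list(self._facts[user_id])


memory = ToyMemory()

# Session 1: the user introduces themselves.
memory.remember("mini256", "Name is Mini256")
memory.remember("mini256", "Works at PingCAP")

# Session 2: a new chat for the same user still sees the stored facts,
# while a different user starts with an empty memory.
print(memory.recall("mini256"))
print(memory.recall("someone-else"))
```

Because the store is keyed by user, starting a new chat does not lose earlier facts, and different users never see each other's memories.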
---
## Related Resources
- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/memory)
- **Category**: Ai-Apps
- **Description**: Implement conversation memory for chatbots and conversational AI applications.
[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary }
================================================
FILE: src/ai/examples/rag-with-pytidb.md
================================================
---
title: RAG
description: "Build a RAG application that combines document retrieval with language generation."
source_repo: "https://github.com/pingcap/pytidb/tree/main/examples/rag"
---
# RAG Example
This example demonstrates how to use PyTiDB to build a minimal RAG application.
* Use Ollama to deploy a local embedding model and LLM
* Use Streamlit to build a Web UI for the RAG application
* Use PyTiDB to build a minimal RAG application
<p align="center">
<img src="https://github.com/user-attachments/assets/dfd85672-65ce-4a46-8dd2-9f77d826363e" alt="RAG application built with PyTiDB" width="600" />
<p align="center"><i>RAG application built with PyTiDB</i></p>
</p>
## Prerequisites
- **Python 3.10+**
- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)
- **Ollama**: You can install it from [Ollama ↗️](https://ollama.com/download)
## How to run
**Step 1**: Prepare the inference API
Pull the embedding model and the LLM with the Ollama CLI:
```bash
ollama pull mxbai-embed-large
ollama pull gemma3:4b
ollama run gemma3:4b
```
Test the `/api/embed` and `/api/generate` endpoints to make sure they are running:
```bash
curl http://localhost:11434/api/embed -d '{
"model": "mxbai-embed-large",
"input": "Llamas are members of the camelid family"
}'
```
```bash
curl http://localhost:11434/api/generate -d '{
"model": "gemma3:4b",
"prompt": "Hello, Who are you?"
}'
```
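The same two checks can be prepared from Python with only the standard library. The sketch below builds the requests without sending them; the helper name `build_request` is hypothetical, and the address and payload fields are copied from the curl commands above.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address used in the curl commands


def build_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for an Ollama endpoint."""
    return urllib.request.Request(
        url=f"{OLLAMA_URL}/api/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


embed_req = build_request(
    "embed",
    {"model": "mxbai-embed-large", "input": "Llamas are members of the camelid family"},
)
generate_req = build_request(
    "generate",
    {"model": "gemma3:4b", "prompt": "Hello, Who are you?"},
)

# To actually send a request, the Ollama server must be running:
# with urllib.request.urlopen(embed_req) as resp:
#     print(json.load(resp))
```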
**Step 2**: Clone the repository locally
```bash
git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/rag/
```
**Step 3**: Install the required packages and set up the environment
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt
```
**Step 4**: Set up environment to connect to database
Go to the [TiDB Cloud console](https://tidbcloud.com/clusters) to get the connection parameters, then set the environment variables as follows:
```bash
cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=test
EOF
```
**Step 5**: Run the Streamlit app
```bash
streamlit run main.py
```
**Step 6**: Open the browser and visit `http://localhost:8501`
## Troubleshooting
### `502 Bad Gateway` Error
Try disabling your global proxy settings.
---
## Related Resources
- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/rag)
- **Category**: Ai-Apps
- **Description**: Build a RAG application that combines document retrieval with language generation.
[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary }
================================================
FILE: src/ai/examples/text2sql-with-pytidb.md
================================================
---
title: Text2SQL
description: "Convert natural language queries into SQL statements using AI models."
source_repo: "https://github.com/pingcap/pytidb/tree/main/examples/text2sql"
---
# Text2SQL Demo
This demo showcases an AI-powered interface that converts natural language questions into SQL queries and executes them against TiDB. Built with PyTiDB, OpenAI GPT, and Streamlit, it provides a seamless way to interact with your database using plain English.
## Prerequisites
- **Python 3.10+**
- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)
- **OpenAI API Key**: Get your API key at [OpenAI Platform ↗️](https://platform.openai.com/api-keys)
## How to run
**Step 1**: Clone the repository
```bash
git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/text2sql/
```
**Step 2**: Install the required packages
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt
```
**Step 3**: Run the Streamlit app
```bash
streamlit run app.py
```
**Step 4**: Open the app and enter your credentials
Open your browser and visit `http://localhost:8501`, then:
* Enter your OpenAI API key in the left sidebar
* Enter the TiDB Cloud connection string in the left sidebar, in the format `mysql+pymysql://root@localhost:4000/test`
---
## Related Resources
- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/text2sql)
- **Category**: Ai-Apps
- **Description**: Convert natural language queries into SQL statements using AI models.
[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary }
================================================
FILE: src/ai/examples/vector-search-with-pytidb.md
================================================
---
title: Vector Search
description: "Implement semantic search using vector embeddings to find similar content."
source_repo: "https://github.com/pingcap/pytidb/tree/main/examples/vector_search"
---
# Vector Search Example
This example demonstrates how to build a semantic search application using TiDB and local embedding models. It leverages vector search to find similar items based on meaning, not just keywords. The app uses Streamlit for the web UI and Ollama for local embedding generation.
<p align="center">
<img width="700" alt="Semantic search with vector embeddings" src="https://github.com/user-attachments/assets/6d7783a5-ce9c-4dcc-8b95-49d5f0ca735a" />
<p align="center"><i>Semantic search with vector embeddings</i></p>
</p>
## Prerequisites
- **Python 3.10+**
- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)
- **Ollama**: You can install it from [Ollama ↗️](https://ollama.com/download)
## How to run
**Step 1**: Start the embedding service with Ollama
Pull the embedding model:
```bash
ollama pull mxbai-embed-large
```
Test the embedding service to make sure it is running:
```bash
curl http://localhost:11434/api/embed -d '{
"model": "mxbai-embed-large",
"input": "Llamas are members of the camelid family"
}'
```
**Step 2**: Clone the repository locally
```bash
git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/vector_search/
```
**Step 3**: Install the required packages and set up the environment
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt
```
**Step 4**: Set up environment to connect to TiDB
Go to the [TiDB Cloud console](https://tidbcloud.com/clusters) to get the connection parameters, then set the environment variables as follows:
```bash
cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=pytidb_vector_search
EOF
```
**Step 5**: Run the Streamlit app
```bash
streamlit run app.py
```
**Step 6**: Open your browser and visit `http://localhost:8501`
---
## Related Resources
- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/vector_search)
- **Category**: Search
- **Description**: Implement semantic search using vector embeddings to find similar content.
[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary }
================================================
FILE: src/ai/guides/auto-embedding.md
================================================
# Auto Embedding
Auto embedding is a feature that allows you to automatically generate vector embeddings for text data.
!!! tip
To check the complete example code, please refer to the [auto embedding example](https://github.com/pingcap/pytidb/blob/main/examples/auto_embedding).
## Basic Usage
This example uses TiDB Cloud hosted embedding models for demonstration. For other providers, see the [Supported Providers](../integrations/embedding-overview.md#supported-providers) list.
### Step 1. Define an embedding function
=== "Python"
Define an embedding function to generate vector embeddings for text data.
```python
from pytidb.embeddings import EmbeddingFunction

embed_func = EmbeddingFunction(
    model_name="tidbcloud_free/amazon/titan-embed-text-v2",
)
```
### Step 2. Create a table and a vector field
=== "Python"
Use `embed_func.VectorField()` to create a vector field on the table.
To enable auto embedding, you need to set `source_field` to the field that you want to embed.
```python hl_lines="7"
from pytidb.schema import TableModel, Field
from pytidb.datatype import TEXT

class Chunk(TableModel):
    id: int = Field(primary_key=True)
    text: str = Field(sa_type=TEXT)
    text_vec: list[float] = embed_func.VectorField(source_field="text")

table = client.create_table(schema=Chunk, if_exists="overwrite")
```
You don't need to specify the `dimensions` parameter; it is determined automatically by the embedding model.
However, you can set `dimensions` explicitly to override the default dimension.
### Step 3. Insert some sample data
=== "Python"
Insert some sample data into the table.
```python
table.bulk_insert([
    Chunk(text="TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads."),
    Chunk(text="PyTiDB is a Python library for developers to connect to TiDB."),
    Chunk(text="LlamaIndex is a Python library for building AI-powered applications."),
])
```
When inserting data, the `text_vec` field will be automatically populated with the vector embeddings generated based on the `text` field.
### Step 4. Perform a vector search
=== "Python"
You can pass the query text directly to the `search()` method. The query text is embedded and then used for vector search.
```python
table.search("HTAP database").limit(3).to_list()
```
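Conceptually, the search embeds the query and ranks rows by how close their vectors are to the query vector. The sketch below illustrates that ranking step with cosine distance, which is one common metric; the 3-dimensional vectors are made up for illustration, and the distance metric configured for your table may differ.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings"; real models produce far more dimensions.
chunks = {
    "TiDB is a distributed database": [0.9, 0.1, 0.0],
    "PyTiDB is a Python library": [0.2, 0.9, 0.1],
    "LlamaIndex builds AI applications": [0.1, 0.3, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend this is the embedding of "HTAP database"

# Sort the chunk texts from nearest to farthest from the query vector.
ranked = sorted(chunks, key=lambda text: cosine_distance(chunks[text], query_vec))
print(ranked[0])
```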
================================================
FILE: src/ai/guides/connect.md
================================================
# Connect to database
This guide shows how to connect to a TiDB database using the TiDB client.
## Install the dependencies
[pytidb](https://github.com/pingcap/pytidb) is a Python client built on [SQLAlchemy](https://sqlalchemy.org/). It provides a set of high-level APIs that help developers store and search vector embeddings without writing raw SQL.
To install the Python client, run the following command:
```bash
pip install pytidb
```
## Connect with connection parameters
Choose the steps based on your deployment type:
=== "TiDB Cloud Starter"
You can create a starter cluster in [TiDB Cloud](https://tidbcloud.com/free-trial/), and then get the connection parameters from the web console.
1. Navigate to the [Clusters page](https://tidbcloud.com/clusters), and then click the name of your target cluster to go to its overview page.
2. Click **Connect** in the upper-right corner. A connection dialog is displayed, with connection parameters listed.
3. Copy the connection parameters to your code or environment variables.
Example code:
```python title="main.py"
from pytidb import TiDBClient

db = TiDBClient.connect(
    host="{gateway-region}.prod.aws.tidbcloud.com",
    port=4000,
    username="{prefix}.root",
    password="{password}",
    database="test",
)
```
!!! tip
For TiDB Cloud Starter, [TLS connection to the database](https://docs.pingcap.com/tidbcloud/secure-connections-to-starter-clusters/) is required when using Public Endpoint. TiDB Client will **automatically** enable TLS connection for starter clusters.
=== "TiDB Self-Managed"
You can follow [Quick Start with TiDB Self-Managed](https://docs.pingcap.com/tidb/stable/quick-start-with-tidb/#deploy-a-local-test-cluster) to deploy a TiDB cluster for testing.
Example code:
```python title="main.py"
from pytidb import TiDBClient

db = TiDBClient.connect(
    host="{tidb_server_host}",
    port=4000,
    username="root",
    password="{password}",
    database="test",
)
```
!!! tip
If you are using `tiup playground` to deploy a TiDB cluster for testing, the default host is `127.0.0.1` and the default password is empty.
Once connected, you can use the `db` object to operate tables, query data, and more.
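If you keep the connection parameters in environment variables, a small helper can turn them into keyword arguments for `connect()`. This is a minimal sketch: the helper name is hypothetical, the `TIDB_*` variable names follow the `.env` files used by the PyTiDB examples, and the fallback values mirror the `tiup playground` defaults mentioned above.

```python
import os


def connection_params_from_env(env=os.environ) -> dict:
    """Collect connect() keyword arguments from TIDB_* environment variables."""
    return {
        "host": env.get("TIDB_HOST", "127.0.0.1"),
        "port": int(env.get("TIDB_PORT", "4000")),
        "username": env.get("TIDB_USERNAME", "root"),
        "password": env.get("TIDB_PASSWORD", ""),
        "database": env.get("TIDB_DATABASE", "test"),
    }


# A plain dict stands in for os.environ here so the example is reproducible.
params = connection_params_from_env(
    {"TIDB_HOST": "example.com", "TIDB_USERNAME": "abc.root", "TIDB_PASSWORD": "secret"}
)
print(params)
# The result can then be passed on as: TiDBClient.connect(**params)
```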
## Connect with connection string
If you prefer to use a connection string (database URL), use the format that matches your deployment type:
=== "TiDB Cloud Starter"
You can create a starter cluster in [TiDB Cloud](https://tidbcloud.com/free-trial/), and then get the connection parameters from the web console.
1. Navigate to the [Clusters page](https://tidbcloud.com/clusters), and then click the name of your target cluster to go to its overview page.
2. Click **Connect** in the upper-right corner. A connection dialog is displayed with the connection parameters listed.
3. Copy the connection parameters and construct the connection string using the format below.
```python title="main.py"
from pytidb import TiDBClient
db = TiDBClient.connect(
database_url="mysql+pymysql://{USERNAME}:{PASSWORD}@{HOST}:{PORT}/{DATABASE}?ssl_verify_cert=true&ssl_verify_identity=true",
)
```
!!! note
For TiDB Cloud Starter, [TLS connection to the database](https://docs.pingcap.com/tidbcloud/secure-connections-to-starter-clusters/) is required when using Public Endpoint, so you need to set `ssl_verify_cert=true&ssl_verify_identity=true` in the connection string.
=== "TiDB Self-Managed"
You can follow the format below to construct the connection string:
```python title="main.py"
from pytidb import TiDBClient
db = TiDBClient.connect(
database_url="mysql+pymysql://{USERNAME}:{PASSWORD}@{HOST}:{PORT}/{DATABASE}",
)
```
!!! tip
If you are using `tiup playground` to deploy a TiDB cluster for testing, the connection string is:
```
mysql+pymysql://root:@127.0.0.1:4000/test
```
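When the username or password contains characters such as `@` or `/`, they need to be percent-encoded before being placed in the URL. The helper below is a hypothetical sketch of assembling the connection string from its parts; the TLS query parameters are the ones shown in the TiDB Cloud Starter example, and the host in the second call is a placeholder.

```python
from urllib.parse import quote_plus


def build_database_url(username, password, host, port, database, use_tls=False):
    """Assemble a mysql+pymysql URL, percent-encoding the credentials."""
    url = (
        f"mysql+pymysql://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )
    if use_tls:
        # Query parameters taken from the TiDB Cloud Starter example above.
        url += "?ssl_verify_cert=true&ssl_verify_identity=true"
    return url


print(build_database_url("root", "", "127.0.0.1", 4000, "test"))
print(build_database_url("abc.root", "p@ss/word", "example.com", 4000, "test", use_tls=True))
```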
## Connect with SQLAlchemy DB engine
If your application already has an existing SQLAlchemy database engine, you can reuse the engine through the `db_engine` parameter:
```python title="main.py"
from pytidb import TiDBClient

# `db_engine` is the SQLAlchemy engine your application has already created.
db = TiDBClient(db_engine=db_engine)
```
## Next Steps
After connecting to your TiDB database, you can explore the following guides to learn how to work with your data:
- [Working with Tables](./tables.md): Learn how to define and manage tables in TiDB.
- [Vector Search](./vector-search.md): Perform semantic search using vector embeddings.
- [Fulltext Search](./fulltext-search.md): Retrieve documents using keyword-based search.
- [Hybrid Search](./hybrid-search.md): Combine vector and full-text search for more relevant results.
================================================
FILE: src/ai/guides/filtering.md
================================================
# Filtering
As a relational database, TiDB supports a rich set of [SQL operators](https://docs.pingcap.com/tidbcloud/operators/) and allows flexible combinations of filtering conditions that enable you to query your data precisely.
## Overview
You can filter on both scalar fields and JSON fields. Filtering on JSON fields is often used for [metadata filtering](./vector-search.md#metadata-filtering) in vector search.
=== "Python"
For PyTiDB, you can apply filtering by passing a **filters** parameter to the `table.query()`, `table.delete()`, `table.update()`, and `table.search()` methods.
The **filters** parameter supports two formats: [Dictionary Filters](#dictionary-filters) and [SQL String Filters](#sql-string-filters).
## Dictionary Filters
=== "Python"
PyTiDB allows you to define filter conditions using a Python dictionary with operators as the **filters** parameter.
The dictionary structure of **filters** is as follows:
```python
{
    "<key>": {
        "<operator>": <value>
    },
    ...
}
```
- `<key>`: The key can be a column name, a JSON path expression to access a JSON field (see [Metadata filtering](./vector-search.md#metadata-filtering)), or a [logical operator](#logical-operators).
- `<operator>`: The operator can be a [compare operator](#compare-operators) or an [inclusion operator](#inclusion-operators).
- `<value>`: The value can be a scalar or an array, depending on the operator.
**Example: Filter records where `created_at` is greater than 2024-01-01**
```python
table.query({
    # The `created_at` is a scalar field with DATETIME type
    "created_at": {
        "$gt": "2024-01-01"
    }
})
```
**Example: Filter records where `meta.category` is in the array ["tech", "science"]**
```python
results = (
    table.search("some query", search_type="vector")
    .filter({
        # The `meta` is a JSON field, and its value is a JSON object like {"category": "tech"}
        "meta.category": {
            "$in": ["tech", "science"]
        }
    })
    .limit(10)
    .to_list()
)
```
### Compare operators
You can use the following compare operators to filter records:
| Operator | Description |
|----------|-----------------------------------|
| `$eq` | Equal to value |
| `$ne` | Not equal to value |
| `$gt` | Greater than value |
| `$gte` | Greater than or equal to value |
| `$lt` | Less than value |
| `$lte` | Less than or equal to value |
**Example: filter records where `user_id` is equal to 1**
```python
{
"user_id": {
"$eq": 1
}
}
```
You can omit the `$eq` operator. The following query is equivalent to the above:
```python
{
"user_id": 1
}
```
### Inclusion operators
You can use the following inclusion operators to filter records:
| Operator | Description |
|----------|-----------------------------------|
| `$in` | In array (string, int, or float) |
| `$nin` | Not in array (string, int, or float) |
**Example: Filter records where `category` is in the array ["tech", "science"]**
```python
{
"category": {
"$in": ["tech", "science"]
}
}
```
### Logical operators
You can use the logical operators `$and` and `$or` to combine multiple filters.
| Operator | Description |
|----------|-----------------------------------------------------|
| `$and` | Returns results that match **all** filters in the list |
| `$or` | Returns results that match **any** filter in the list |
**Syntax for using `$and` or `$or`:**
```python
{
    "$and|$or": [
        {
            "field_name": {
                <operator>: <value>
            }
        },
        {
            "field_name": {
                <operator>: <value>
            }
        },
        ...
    ]
}
```
**Example: using `$and` to combine multiple filters:**
```python
{
    "$and": [
        {
            "created_at": {
                "$gt": "2024-01-01"
            }
        },
        {
            "meta.category": {
                "$in": ["tech", "science"]
            }
        }
    ]
}
```
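To make the semantics of the operators concrete, the sketch below evaluates a filter dictionary against an in-memory Python record. It is only an illustration of how the operators combine, not how PyTiDB implements filtering (PyTiDB translates filters into SQL); the `matches` and `get_path` helpers are hypothetical.

```python
from typing import Any

COMPARATORS = {
    "$eq": lambda a, b: a == b,
    "$ne": lambda a, b: a != b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
    "$in": lambda a, b: a in b,
    "$nin": lambda a, b: a not in b,
}


def get_path(record: dict, key: str) -> Any:
    """Resolve a dotted key such as "meta.category" against nested dicts."""
    value: Any = record
    for part in key.split("."):
        value = value[part]
    return value


def matches(record: dict, filters: dict) -> bool:
    """Return True if the record satisfies every entry in the filter dict."""
    for key, condition in filters.items():
        if key == "$and":
            ok = all(matches(record, f) for f in condition)
        elif key == "$or":
            ok = any(matches(record, f) for f in condition)
        else:
            # A bare value is shorthand for {"$eq": value}.
            if not isinstance(condition, dict):
                condition = {"$eq": condition}
            actual = get_path(record, key)
            ok = all(COMPARATORS[op](actual, expected) for op, expected in condition.items())
        if not ok:
            return False
    return True


record = {"user_id": 1, "created_at": "2024-03-01", "meta": {"category": "tech"}}
filters = {
    "$and": [
        {"created_at": {"$gt": "2024-01-01"}},
        {"meta.category": {"$in": ["tech", "science"]}},
    ]
}
print(matches(record, filters))
```

The dates are compared as ISO-formatted strings here, which happens to order correctly; in TiDB the comparison is performed on the column's actual type.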
## SQL String Filters
=== "Python"
You can also use a SQL string as the `filters` parameter. The string should be a valid SQL `WHERE` clause (without the `WHERE` keyword) using TiDB's SQL syntax.
**Example: Filter records where `created_at` is greater than 2024-01-01**
```python
results = table.query(
filters="created_at > '2024-01-01'",
limit=10
).to_list()
```
**Example: Filter records where the JSON field `meta.category` equals 'tech'**
```python
results = table.query(
filters="meta->>'$.category' = 'tech'",
limit=10
).to_list()
```
You can combine multiple conditions using `AND`, `OR`, and parentheses, and use any [SQL operators](https://docs.pingcap.com/tidbcloud/operators/) supported by TiDB.
!!! warning
When using SQL string filters with dynamic user input, always validate the input to prevent [SQL injection](https://en.wikipedia.org/wiki/SQL_injection) vulnerabilities.
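One way to validate input is to accept only values that match a strict allowlist pattern before placing them in the filter string. The sketch below is a minimal example under that assumption; the `category_filter` helper and the allowed character set are hypothetical and should be adapted to your data. Where possible, dictionary filters avoid building SQL strings by hand altogether.

```python
import re

# Accept only simple values: letters, digits, underscore, and hyphen.
SAFE_VALUE = re.compile(r"[A-Za-z0-9_-]+")


def category_filter(category: str) -> str:
    """Build a WHERE-clause fragment after validating untrusted input."""
    if not SAFE_VALUE.fullmatch(category):
        raise ValueError(f"unsupported category value: {category!r}")
    return f"meta->>'$.category' = '{category}'"


print(category_filter("tech"))

try:
    category_filter("tech' OR '1'='1")
except ValueError as exc:
    print("rejected:", exc)
```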
================================================
FILE: src/ai/guides/fulltext-search.md
================================================
# Full-text Search
**Full-text search** enables you to find documents or data by matching keywords or phrases within the entire text content. It is widely used in search engines, document management, e-commerce, and any scenario where users need to search large volumes of unstructured or semi-structured text.
TiDB provides full-text search capabilities for **massive datasets** with high performance and built-in **multilingual support**.
!!! note
Full-text search is currently in the early stages with limited accessibility. It is only available for **TiDB Cloud Starter** in the following regions:
- **Frankfurt (eu-central-1)**
- **Singapore (ap-southeast-1)**
If you have feedback or need help, feel free to reach out to us on [Discord](https://discord.gg/zcqexutz2R).
!!! tip
For a complete example of full-text search, see the [E-commerce product search demo](../examples/fulltext-search-with-pytidb.md).
## Basic Usage
### Step 1. Create Table and Full-text Index
=== "Python"
You can use `FullTextField` to define a text field with full-text search enabled. The `fts_parser` parameter in Python corresponds to the `WITH PARSER` clause in SQL.
For example, the following code creates a table with a full-text index on the `title` column:
```python hl_lines="6"
from pytidb.schema import TableModel, Field, FullTextField

class Item(TableModel):
    __tablename__ = "items"
    id: int = Field(primary_key=True)
    title: str = FullTextField(fts_parser="MULTILINGUAL")

table = client.create_table(schema=Item, if_exists="overwrite")
```
The `fts_parser` parameter specifies the parser for the full-text index. Supported values:
- `STANDARD`: Fast, works for English content, splits words by spaces and punctuation.
- `MULTILINGUAL` (default): Supports multiple languages, including English, Chinese, Japanese, and Korean.
=== "SQL"
Create a table with a full-text index:
```sql hl_lines="4"
CREATE TABLE items(
id INT PRIMARY KEY,
title TEXT,
FULLTEXT INDEX (title) WITH PARSER MULTILINGUAL
);
```
You can also add a full-text index to an existing table with a separate statement:
```sql
CREATE TABLE items(
id INT PRIMARY KEY,
title TEXT
);
ALTER TABLE items ADD FULLTEXT INDEX (title)
WITH PARSER MULTILINGUAL ADD_COLUMNAR_REPLICA_ON_DEMAND;
```
The following parsers are supported in the `WITH PARSER <PARSER_NAME>` clause:
- `STANDARD`: Fast, works for English content, splits words by spaces and punctuation.
- `MULTILINGUAL`: Supports multiple languages, including English, Chinese, Japanese, and Korean.
### Step 2. Insert Sample Data
For demonstration purposes, the following sample data covers English, Japanese, and Chinese text.
=== "Python"
You can use the `bulk_insert` method to insert sample data into the table.
```python
table.bulk_insert([
Item(id=1, title="Bluetooth Earphones, HiFi sound, 48h battery, Fast charge, Low latency"),
Item(id=2, title="Bluetooth 5.3 Headphones, Noise Cancelling, Immersive sound, Comfortable"),
Item(id=3, title="IPX7 Waterproof Earbuds, Sport ready, Touch control, High-quality music"),
Item(id=4, title="Sports Earbuds, Secure fit, Sweatproof, Long battery, Workout support"),
Item(id=5, title="Wired Headphones, Studio-grade, HD sound, Comfortable, Pro music experience"),
Item(id=6, title="Bluetoothイヤホン HiFi音質 48hバッテリー 急速充電 低遅延"),
Item(id=7, title="Bluetooth5.3ヘッドホン ノイズキャンセリング 没入サウンド 快適装着"),
Item(id=8, title="IPX7防水イヤホン スポーツ対応 タッチ操作 高音質音楽"),
Item(id=9, title="スポーツイヤホン 安定装着 防汗 長持ちバッテリー ワークアウト対応"),
Item(id=10, title="有線ヘッドホン スタジオ級 HDサウンド 快適装着 プロ音楽体験"),
Item(id=11, title="无线蓝牙耳机 HiFi音质 48小时超长续航 快速充电 低延迟"),
Item(id=12, title="蓝牙5.3降噪头戴式耳机 杜比全景声 沉浸音效 舒适佩戴 畅享静谧音乐时光"),
Item(id=13, title="IPX7防水真无线耳机 运动无忧 智能触控 随时畅听高品质音乐"),
Item(id=14, title="运动专用耳机 稳固佩戴 防汗设计 超长续航 低延迟音频 高清通话"),
Item(id=15, title="录音室级有线耳机 高清音质 舒适佩戴 可拆卸线材 多设备兼容 降噪麦克风"),
])
```
=== "SQL"
You can use the `INSERT INTO` statement to insert the sample data into the table.
```sql
INSERT INTO items (id, title) VALUES
(1, 'Bluetooth Earphones, HiFi sound, 48h battery, Fast charge, Low latency'),
(2, 'Bluetooth 5.3 Headphones, Noise Cancelling, Immersive sound, Comfortable'),
(3, 'IPX7 Waterproof Earbuds, Sport ready, Touch control, High-quality music'),
(4, 'Sports Earbuds, Secure fit, Sweatproof, Long battery, Workout support'),
(5, 'Wired Headphones, Studio-grade, HD sound, Comfortable, Pro music experience'),
(6, 'Bluetoothイヤホン HiFi音質 48hバッテリー 急速充電 低遅延'),
(7, 'Bluetooth5.3ヘッドホン ノイズキャンセリング 没入サウンド 快適装着'),
(8, 'IPX7防水イヤホン スポーツ対応 タッチ操作 高音質音楽'),
(9, 'スポーツイヤホン 安定装着 防汗 長持ちバッテリー ワークアウト対応'),
(10, '有線ヘッドホン スタジオ級 HDサウンド 快適装着 プロ音楽体験'),
(11, '无线蓝牙耳机 HiFi音质 48小时超长续航 快速充电 低延迟'),
(12, '蓝牙5.3降噪头戴式耳机 杜比全景声 沉浸音效 舒适佩戴 畅享静谧音乐时光'),
(13, 'IPX7防水真无线耳机 运动无忧 智能触控 随时畅听高品质音乐'),
(14, '运动专用耳机 稳固佩戴 防汗设计 超长续航 低延迟音频 高清通话'),
(15, '录音室级有线耳机 高清音质 舒适佩戴 可拆卸线材 多设备兼容 降噪麦克风');
```
### Step 3. Perform a Full-text Search
=== "Python"
To perform a full-text search with PyTiDB, use the `search` method and set the `search_type` parameter to `"fulltext"`.
**Example: Search for the 3 most relevant documents**
```python
import json

results = table.search("Bluetooth Headphones", search_type="fulltext").limit(3).to_list()
print(json.dumps(results, indent=2, ensure_ascii=False))
```
```python title="Execution result"
[
{
"id": 2,
"title": "Bluetooth 5.3 Headphones, Noise Cancelling, Immersive sound, Comfortable",
"_match_score": 3.7390857,
"_score": 3.7390857
},
{
"id": 5,
"title": "Wired Headphones, Studio-grade, HD sound, Comfortable, Pro music experience",
"_match_score": 1.9798478,
"_score": 1.9798478
},
{
"id": 1,
"title": "Bluetooth Earphones, HiFi sound, 48h battery, Fast charge, Low latency",
"_match_score": 1.620981,
"_score": 1.620981
}
]
```
The results are sorted by relevance, with the most relevant documents listed first.
The `_match_score` (or `_score`) field indicates the relevance score of each document, calculated using the [BM25](https://en.wikipedia.org/wiki/Okapi_BM25) algorithm—a widely used ranking function in information retrieval.
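For intuition about how BM25 ranks documents, here is a minimal textbook-style sketch. The parameters `k1=1.2` and `b=0.75` and the smoothed IDF formula are common defaults chosen for illustration. TiDB's exact parameters and tokenization are not stated here, so this sketch does not reproduce the scores shown above. `bm25_scores` is a hypothetical helper.

```python
import math
from collections import Counter


def bm25_scores(
    query: list[str], docs: list[list[str]], k1: float = 1.2, b: float = 0.75
) -> list[float]:
    """Score each tokenized document against the query terms with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["bluetooth", "headphones", "noise", "cancelling"],
    ["wired", "headphones", "studio"],
    ["sports", "earbuds", "sweatproof"],
]
scores = bm25_scores(["bluetooth", "headphones"], docs)
print(scores)
```

The first document matches both query terms and scores highest, the second matches only one term, and the third matches none and scores zero.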
**Example: Search for the 3 most relevant documents in another language**
```python
results = table.search("蓝牙耳机", search_type="fulltext").limit(3).to_list()
print(json.dumps(results, indent=2, ensure_ascii=False))
```
```python title="Execution result"
[
{
"id": 11,
"title": "无线蓝牙耳机 HiFi音质 48小时超长续航 快速充电 低延迟",
"_match_score": 3.000002,
"_score": 3.000002
},
{
"id": 12,
"title": "蓝牙5.3降噪头戴式耳机 杜比全景声 沉浸音效 舒适佩戴 畅享静谧音乐时光",
"_match_score": 2.5719738,
"_score": 2.5719738
},
{
"id": 14,
"title": "运动专用耳机 稳固佩戴 防汗设计 超长续航 低延迟音频 高清通话",
"_match_score": 1.1418362,
"_score": 1.1418362
}
]
```
=== "SQL"
To perform a full-text search, use the `fts_match_word()` function.
**Example: Search for the 3 most relevant documents**
```sql
SELECT *, fts_match_word("Bluetooth Headphones", title) AS score
FROM items
WHERE fts_match_word("Bluetooth Headphones", title)
ORDER BY score DESC
LIMIT 3;
```
```plain title="Execution result"
+----+-----------------------------------------------------------------------------+-----------+
| id | title | score |
+----+-----------------------------------------------------------------------------+-----------+
| 2 | Bluetooth 5.3 Headphones, Noise Cancelling, Immersive sound, Comfortable | 3.7390857 |
| 5 | Wired Headphones, Studio-grade, HD sound, Comfortable, Pro music experience | 1.9798478 |
| 1 | Bluetooth Earphones, HiFi sound, 48h battery, Fast charge, Low latency | 1.620981 |
+----+-----------------------------------------------------------------------------+-----------+
```
The results are ordered by relevance, with the most relevant documents first.
**Example: Search for the 3 most relevant documents in another language**
```sql
SELECT *, fts_match_word("蓝牙耳机", title) AS score
FROM items
WHERE fts_match_word("蓝牙耳机", title)
ORDER BY score DESC
LIMIT 3;
```
```plain title="Execution result"
+----+------------------------------------------------------------------+-----------+
| id | title | score |
+----+------------------------------------------------------------------+-----------+
| 11 | 无线蓝牙耳机 HiFi音质 48小时超长续航 快速充电 低延迟 | 3.000002 |
| 12 | 蓝牙5.3降噪头戴式耳机 杜比全景声 沉浸音效 舒适佩戴 畅享静谧音乐时光 | 2.5719738 |
| 14 | 运动专用耳机 稳固佩戴 防汗设计 超长续航 低延迟音频 高清通话 | 1.1418362 |
+----+------------------------------------------------------------------+-----------+
```
## See Also
In Retrieval-Augmented Generation (RAG) scenarios, it is often beneficial to utilize both full-text search and vector search for optimal results.
- Learn how to combine these approaches in the [hybrid search guide](./hybrid-search.md).
- For more on vector search, see the [vector search guide](../concepts/vector-search.md).
================================================
FILE: src/ai/guides/hybrid-search.md
================================================
# Hybrid Search
Hybrid search is a technique that combines multiple search algorithms to deliver more accurate and relevant results.
TiDB supports both semantic search (also known as vector search) and keyword-based search (full-text search). By leveraging the strengths of both approaches, you can achieve superior search results through hybrid search.
<p align="center">
<img src="https://docs-download.pingcap.com/media/images/docs/vector-search/hybrid-search-overview.svg" alt="hybrid search overview" width="800"/>
</p>
!!! tip
For a complete example of hybrid search, refer to the [hybrid-search example](../examples/hybrid-search-with-pytidb.md).
## Basic Usage
### Step 1. Define an Embedding Function
Define an embedding function to generate vector representations of text data.
```python
import os

from pytidb.embeddings import EmbeddingFunction

embed_fn = EmbeddingFunction(
    model_name="openai/text-embedding-3-small",
    api_key=os.getenv("OPENAI_API_KEY"),
)
```
### Step 2. Create a Table with Vector and Full-Text Indexes
=== "Python"
After you have [connected to your TiDB database](./connect.md) using `TiDBClient` and obtained the `client` instance, you can create a table with a `FullTextField` for the text data and a `VectorField` for its vector embedding.
Example:
```python
from pytidb.schema import TableModel, Field, FullTextField

class Chunk(TableModel):
    __tablename__ = "chunks_for_hybrid_search"
    id: int = Field(primary_key=True)
    text: str = FullTextField()
    text_vec: list[float] = embed_fn.VectorField(source_field="text")

table = client.create_table(schema=Chunk, if_exists="overwrite")
```
In this example, PyTiDB will automatically create a full-text index on the `text` column and a vector index on the `text_vec` column.
### Step 3. Insert Sample Data
=== "Python"
Use the `bulk_insert()` method to insert sample data into the table.
```python
table.truncate()
table.bulk_insert([
Chunk(
text="TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads.",
),
Chunk(
text="PyTiDB is a Python library for developers to connect to TiDB.",
),
Chunk(
text="LlamaIndex is a Python library for building AI-powered applications.",
),
])
```
The `text_vec` field is automatically populated with the vector embedding of the text data via the [Auto Embedding](../guides/auto-embedding.md) feature.
### Step 4. Perform Hybrid Search
To enable hybrid search, set the `search_type` parameter to `hybrid` when calling the `search()` method.
```python
import json

results = (
    table.search(
        "AI database", search_type="hybrid"
    )
    .limit(3)
    .to_list()
)

# Drop the embedding vectors to keep the printed output readable
for item in results:
    item.pop("text_vec")

print(json.dumps(results, indent=4, sort_keys=True))
```
The search results contain three special fields:
- `_distance`: The distance between the query vector and the vector data in the table, as returned by the vector search.
- `_match_score`: The match score between the query and the text field, as returned by the full-text search.
- `_score`: The final score of the search result, calculated by the fusion algorithm.
```json title="Output"
[
{
"_distance": 0.4740166257687124,
"_match_score": 1.6804268,
"_score": 0.03278688524590164,
"id": 60013,
"text": "TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads."
},
{
"_distance": 0.6428459116216618,
"_match_score": 0.78427225,
"_score": 0.03200204813108039,
"id": 60015,
"text": "LlamaIndex is a Python library for building AI-powered applications."
},
{
"_distance": 0.641581407158715,
"_match_score": null,
"_score": 0.016129032258064516,
"id": 60014,
"text": "PyTiDB is a Python library for developers to connect to TiDB."
}
]
```
## Fusion Methods
Fusion methods combine results from vector (semantic) and full-text (keyword) searches into a single, unified ranking. This ensures that the final results leverage both semantic relevance and keyword matching.
PyTiDB supports two fusion methods:
- `rrf`: Reciprocal Rank Fusion (default)
- `weighted`: Weighted Score Fusion
You can select the fusion method that best fits your use case to optimize hybrid search results.
### Reciprocal Rank Fusion (RRF)
Reciprocal Rank Fusion (RRF) is an algorithm that evaluates search results by leveraging the rank of documents in multiple result sets.
For more details, see the [RRF paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf).
=== "Python"
Enable reciprocal rank fusion by specifying the `method` parameter as `"rrf"` in the `.fusion()` method.
```python
results = (
table.search(
"AI database", search_type="hybrid"
)
.fusion(method="rrf")
.limit(3)
.to_list()
)
```
Parameters:
- `k`: A constant (default: 60) to prevent division by zero and control the impact of high-ranked documents.
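The RRF formula from the paper gives each document the score `sum(1 / (k + rank))` over every result list in which it appears. The sketch below is a plain-Python illustration of that formula, assuming ranks start at 1. `rrf_fuse` is a hypothetical helper and not part of PyTiDB. The two rankings are read off the `_distance` and `_match_score` values in the sample output of Step 4.

```python
def rrf_fuse(rankings: list[list[int]], k: int = 60) -> dict[int, float]:
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


# Ordered by ascending `_distance` (vector search)
vector_ranking = [60013, 60014, 60015]
# Ordered by descending `_match_score` (full-text search); 60014 did not match
fulltext_ranking = [60013, 60015]

fused = rrf_fuse([vector_ranking, fulltext_ranking], k=60)
for doc_id, score in sorted(fused.items(), key=lambda kv: kv[1], reverse=True):
    print(doc_id, round(score, 5))
# 60013 0.03279  (1/61 + 1/61)
# 60015 0.032    (1/63 + 1/62)
# 60014 0.01613  (1/62)
```

These values agree with the `_score` fields in the Step 4 output, which suggests that a document missing from one result list simply contributes nothing for that list. A larger `k` flattens the difference between adjacent ranks.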
### Weighted Score Fusion
Weighted Score Fusion combines vector search and full-text search scores using a weighted sum:
```python
final_score = vs_weight * vector_score + fts_weight * fulltext_score
```
=== "Python"
Enable weighted score fusion by specifying the `method` parameter as `"weighted"` in the `.fusion()` method.
For example, to give more weight to vector search, set the `vs_weight` parameter to 0.7 and the `fts_weight` parameter to 0.3:
```python
results = (
table.search(
"AI database", search_type="hybrid"
)
.fusion(method="weighted", vs_weight=0.7, fts_weight=0.3)
.limit(3)
.to_list()
)
```
Parameters:
- `vs_weight`: The weight of the vector search score.
- `fts_weight`: The weight of the full-text search score.
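The weighted sum only makes sense when both scores are on a comparable scale. The sketch below assumes min-max normalization of each score set and a score of 0 for documents missing from one result list. Both are assumptions made for illustration, since PyTiDB's actual normalization is not described here. `min_max` and `weighted_fuse` are hypothetical helpers.

```python
def min_max(scores: dict[int, float]) -> dict[int, float]:
    """Rescale scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_fuse(
    vector_scores: dict[int, float],
    fulltext_scores: dict[int, float],
    vs_weight: float,
    fts_weight: float,
) -> dict[int, float]:
    """Combine normalized scores: vs_weight * vector + fts_weight * fulltext."""
    vs = min_max(vector_scores)
    fts = min_max(fulltext_scores)
    return {
        doc_id: vs_weight * vs.get(doc_id, 0.0) + fts_weight * fts.get(doc_id, 0.0)
        for doc_id in set(vs) | set(fts)
    }


# Illustrative similarity scores (higher is better), keyed by document ID
vector_scores = {1: 0.9, 2: 0.5, 3: 0.1}
fulltext_scores = {1: 2.0, 3: 4.0}

fused = weighted_fuse(vector_scores, fulltext_scores, vs_weight=0.7, fts_weight=0.3)
print(sorted(fused, key=fused.get, reverse=True))  # [1, 2, 3]
```

With `vs_weight=0.7`, document 1 ranks first because it is the closest vector match, even though document 3 has the best full-text score.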
## Rerank Method
Hybrid search also supports reranking with dedicated reranker models.
=== "Python"
Use the `rerank()` method to specify a reranker that sorts search results by relevance between the query and the documents.
**Example: Using JinaAI Reranker to rerank the hybrid search results**
```python
from pytidb.rerankers import Reranker

reranker = Reranker(
    # Use the `jina-reranker-m0` model
    model_name="jina_ai/jina-reranker-m0",
    api_key="{your-jinaai-api-key}"
)
results = (
table.search(
"AI database", search_type="hybrid"
)
.fusion(method="rrf", k=60)
.rerank(reranker, "text")
.limit(3)
.to_list()
)
```
For other reranker models, see the [Reranking](../guides/reranking.md) guide.
================================================
FILE: src/ai/guides/image-search.md
================================================
# Image search
**Image search** helps you find similar images by comparing their visual content, not just text or metadata. This feature is useful for e-commerce, content moderation, digital asset management, and any scenario where you need to search for or deduplicate images based on appearance.
TiDB enables image search using **vector search**. With automatic embedding, you can generate image embeddings from image URLs, PIL images, or keyword text using a multimodal embedding model. TiDB then efficiently searches for similar vectors at scale.
!!! tip
For a complete example of image search, see the [Pet image search demo](../examples/image-search-with-pytidb.md).
## Basic usage
### Step 1. Define an embedding function
To generate image embeddings, you need an embedding model that supports image input.
For demonstration, you can use Jina AI's multimodal embedding model to generate image embeddings.
Go to [Jina AI](https://jina.ai/embeddings) to create an API key, then initialize the embedding function as follows:
```python hl_lines="7"
from pytidb.embeddings import EmbeddingFunction
image_embed = EmbeddingFunction(
# Or another provider/model that supports multimodal input
    model_name="jina_ai/jina-embeddings-v4",
api_key="{your-jina-api-key}",
multimodal=True,
)
```
### Step 2. Create a table and vector field
Use `VectorField()` to define a vector field for storing image embeddings. Set the `source_field` parameter to specify the field that stores image URLs.
```python
from pytidb.schema import TableModel, Field
class ImageItem(TableModel):
    __tablename__ = "image_items"
    id: int = Field(primary_key=True)
    image_uri: str = Field()
    image_vec: list[float] = image_embed.VectorField(
        source_field="image_uri"
    )

table = client.create_table(schema=ImageItem, if_exists="overwrite")
```
### Step 3. Insert image data
When you insert data, the `image_vec` field is automatically populated with the embedding generated from the `image_uri`.
```python
table.bulk_insert([
ImageItem(image_uri="https://example.com/image1.jpg"),
ImageItem(image_uri="https://example.com/image2.jpg"),
ImageItem(image_uri="https://example.com/image3.jpg"),
])
```
### Step 4. Perform image search
Image search is a type of vector search. Automatic embedding lets you input an image URL, PIL image, or keyword text directly. All these inputs are converted to vector embeddings for similarity matching.
#### Option 1: Search by image URL
Search for similar images by providing an image URL:
```python
results = table.search("https://example.com/query.jpg").limit(3).to_list()
```
The client converts the input image URL into a vector. TiDB then finds and returns the most similar images by comparing their vectors.
#### Option 2: Search by PIL image
You can also search for similar images by providing a PIL image object:
```python
from PIL import Image
image = Image.open("/path/to/query.jpg")
results = table.search(image).limit(3).to_list()
```
The client converts the PIL image object into a Base64 string before sending it to the embedding model.
#### Option 3: Search by keyword text
You can also search for similar images by providing keyword text.
For example, if you are working on a pet image dataset, you can search for similar images by keywords like "orange tabby cat" or "golden retriever puppy".
```python
results = table.search("orange tabby cat").limit(3).to_list()
```
The multimodal embedding model converts the keyword text into a vector embedding that captures its semantic meaning. A vector search then finds the images whose embeddings are most similar to the keyword embedding.
## See also
- [Automatic embedding guide](./auto-embedding.md)
- [Vector search guide](../concepts/vector-search.md)
- [Pet image search demo](../examples/image-search-with-pytidb.md)
================================================
FILE: src/ai/guides/joins.md
================================================
# Multiple Table Joins
As a relational database, TiDB allows you to store diverse data in tables with different structures (for example: `chunks`, `documents`, `users`, `chats`) in a single database. Moreover, you can use joins to combine data from multiple tables to perform complex queries.
## Basic Usage
### Step 1. Create tables and insert sample data
=== "Python"
Assuming you have already [connected to the TiDB database](./connect.md) via TiDBClient:
Create a `documents` table and insert some sample data:
```python
from pytidb import Session
from pytidb.schema import TableModel, Field
from pytidb.sql import select
class Document(TableModel):
    __tablename__ = "documents"
    id: int = Field(primary_key=True)
    title: str = Field(max_length=255)

client.create_table(schema=Document, if_exists="overwrite")
client.table("documents").truncate()
client.table("documents").bulk_insert([
Document(id=1, title="The Power of Positive Thinking"),
Document(id=2, title="The Happiness Advantage"),
Document(id=3, title="The Art of Happiness"),
])
```
Create a `chunks` table and insert some sample data:
```python
class Chunk(TableModel):
    __tablename__ = "chunks"
    id: int = Field(primary_key=True)
    text: str = Field(max_length=255)
    document_id: int = Field(foreign_key="documents.id")

client.create_table(schema=Chunk, if_exists="overwrite")
client.table("chunks").truncate()
client.table("chunks").bulk_insert([
Chunk(id=1, text="Positive thinking can change your life", document_id=1),
Chunk(id=2, text="Happiness leads to success", document_id=2),
Chunk(id=3, text="Finding joy in everyday moments", document_id=3),
])
```
=== "SQL"
Create a `documents` table and insert some sample data:
```sql
CREATE TABLE documents (
id INT PRIMARY KEY,
title VARCHAR(255) NOT NULL
);
INSERT INTO documents (id, title) VALUES
(1, 'The Power of Positive Thinking'),
(2, 'The Happiness Advantage'),
(3, 'The Art of Happiness');
```
Create a `chunks` table and insert some sample data:
```sql
CREATE TABLE chunks (
id INT PRIMARY KEY,
text VARCHAR(255) NOT NULL,
document_id INT NOT NULL,
FOREIGN KEY (document_id) REFERENCES documents(id)
);
INSERT INTO chunks (id, text, document_id) VALUES
(1, 'Positive thinking can change your life', 1),
(2, 'Happiness leads to success', 2),
(3, 'Finding joy in everyday moments', 3);
```
### Step 2. Perform a join query
=== "Python"
```python
with Session(client.db_engine) as db_session:
    query = (
        select(Chunk)
        .join(Document, Chunk.document_id == Document.id)
        .where(Document.title == "The Power of Positive Thinking")
    )
    chunks = db_session.exec(query).all()

[(c.id, c.text, c.document_id) for c in chunks]
```
=== "SQL"
Perform a join query to combine data from the `chunks` and `documents` tables:
```sql
SELECT c.id, c.text, c.document_id
FROM chunks c
JOIN documents d ON c.document_id = d.id
WHERE d.title = 'The Power of Positive Thinking';
```
================================================
FILE: src/ai/guides/raw-queries.md
================================================
# Raw Queries
## Operate data with raw SQL
You can use the `client.execute()` method to execute `INSERT`, `UPDATE`, `DELETE`, and other data manipulation SQL statements.
```python
client.execute("INSERT INTO chunks(text, user_id) VALUES ('sample text', 5)")
```
### SQL injection prevention
Both the `execute` and `query` methods support **parameterized SQL**, which helps you avoid [SQL injection](https://en.wikipedia.org/wiki/SQL_injection) when building dynamic SQL statements.
```python
client.execute(
"INSERT INTO chunks(text, user_id) VALUES (:text, :user_id)",
{
"text": "sample text",
"user_id": 6,
},
)
```
## Query data with raw SQL
You can use the `client.query()` method to execute `SELECT`, `SHOW`, and other query SQL statements.
### Output query result
The `client.query()` method will return a `SQLQueryResult` instance with some helper methods:
- `to_pydantic()`
- `to_list()`
- `to_pandas()`
- `to_rows()`
- `scalar()`
#### As Pydantic model
The `to_pydantic()` method will return a list of Pydantic models.
```python
client.query("SELECT id, text, user_id FROM chunks").to_pydantic()
```
#### As SQLAlchemy result rows
The `to_rows()` method returns a list of tuples, where each tuple represents one row of data.
```python
client.query("SHOW TABLES;").to_rows()
```
#### As list of dict
The `to_list()` method will convert the query result into a list of dict.
```python
client.query(
"SELECT id, text, user_id FROM chunks WHERE user_id = :user_id",
{
"user_id": 3
}
).to_list()
```
#### As pandas DataFrame
The `to_pandas()` method converts the query result into a `pandas.DataFrame`, which is rendered as a readable table in a notebook:
```python
client.query("SELECT id, text, user_id FROM chunks").to_pandas()
```
#### As scalar value
The `scalar()` method will return the first column of the first row of the result set.
```python
client.query("SELECT COUNT(*) FROM chunks;").scalar()
```
================================================
FILE: src/ai/guides/reranking.md
================================================
# Reranking
Reranking is a technique used to improve the relevance and accuracy of search results by re-evaluating and reordering them using a dedicated rerank model.
The search process works in two stages:
1. **Initial Retrieval**: Vector search identifies the top `k` most similar documents from the collection
2. **Reranking**: A reranking model evaluates these `k` documents based on the relevance between the query and the documents and reorders them to produce the final top `n` results (where `n` ≤ `k`)
This two-stage retrieval approach significantly improves both document relevance and accuracy.
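The two stages can be sketched in plain Python. In this toy version, a cheap word-overlap count stands in for vector search and Jaccard similarity stands in for the reranking model. Both scorers and the `retrieve_then_rerank` helper are hypothetical and chosen only to make the control flow concrete.

```python
from typing import Callable

Scorer = Callable[[str, str], float]


def words(text: str) -> set[str]:
    return set(text.lower().split())


def overlap(query: str, doc: str) -> float:
    """Cheap first-stage score: number of shared words."""
    return float(len(words(query) & words(doc)))


def jaccard(query: str, doc: str) -> float:
    """Stand-in for a rerank model: shared words divided by all words."""
    q, d = words(query), words(doc)
    return len(q & d) / len(q | d)


def retrieve_then_rerank(
    query: str, docs: list[str], retrieve: Scorer, rerank: Scorer, k: int, n: int
) -> list[str]:
    # Stage 1: keep the top k candidates by the cheap score
    candidates = sorted(docs, key=lambda d: retrieve(query, d), reverse=True)[:k]
    # Stage 2: reorder only those k candidates and keep the top n
    return sorted(candidates, key=lambda d: rerank(query, d), reverse=True)[:n]


docs = [
    "a long note that mentions distributed systems sql syntax and database backups among other topics",
    "tidb is a distributed sql database",
    "sql database tuning guide",
    "llamaindex is a python library",
]
top = retrieve_then_rerank("distributed sql database", docs, overlap, jaccard, k=3, n=2)
print(top)
```

The long, unfocused note survives the first stage because it contains all three query words, but the second stage demotes it, so the final results are the two shorter, more focused documents.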
## Basic Usage
=== "Python"
PyTiDB provides the `Reranker` class that allows you to use reranker models from multiple third-party providers.
1. Create a reranker instance
```python
from pytidb.rerankers import Reranker
reranker = Reranker(model_name="{provider}/{model_name}")
```
2. Apply reranker via `.rerank()` method
```python
table.search("{query}").rerank(reranker, "{field_to_rerank}").limit(3)
```
## Supported Providers
Here are some examples of using reranker models from third-party providers.
### Jina AI
To use a reranker provided by Jina AI, go to their [website](https://jina.ai/reranker) to create an API key.
For example:
```python
jinaai = Reranker(
# Using the `jina-reranker-m0` model
model_name="jina_ai/jina-reranker-m0",
api_key="{your-jinaai-api-key}"
)
```
================================================
FILE: src/ai/guides/tables.md
================================================
# Working with tables
TiDB uses tables to organize and store collections of related data. It provides flexible schema definition capabilities, allowing you to structure your tables according to your specific requirements.
A table can contain multiple columns with different data types to store various kinds of data. Supported data types include text, numbers, vectors, binary data (`BLOB`), JSON, and more.
!!! tip
For a complete working example, see the [basic example](https://github.com/pingcap/pytidb/tree/main/examples/basic) in our repository.
## Create a table
### Using TableModel
PyTiDB provides a `TableModel` class that represents the schema of a table. This class is compatible with the [Pydantic Model](https://docs.pydantic.dev/latest/concepts/models/) and allows you to define the table structure in a declarative way.
In the following example, you create a table named `items` with these columns:
- `id`: a primary key column with an integer type
- `content`: a text type column
- `embedding`: a vector type column with 3 dimensions
- `meta`: a JSON type column
=== "Python"
After you [connect to the database](./connect.md) using PyTiDB and obtain a `client` instance, you can create a table with the `create_table` method.
```python hl_lines="12"
from pytidb.schema import TableModel, Field, VectorField
from pytidb.datatype import TEXT, JSON


class Item(TableModel):
    __tablename__ = "items"
    id: int = Field(primary_key=True)
    content: str = Field(sa_type=TEXT)
    embedding: list[float] = VectorField(dimensions=3)
    meta: dict = Field(sa_type=JSON, default_factory=dict)

table = client.create_table(schema=Item, if_exists="overwrite")
```
The `create_table` method accepts these parameters:
- `schema`: The `TableModel` class that defines your table structure.
- `if_exists`: The creation mode of the table.
- `raise` (default): Creates the table if it does not exist; raises an error if it already exists.
- `skip`: Creates the table if it does not exist; does nothing if it already exists.
- `overwrite`: Drops the existing table and creates a new one. This is useful for **testing and development**, but not recommended for production environments.
Once the table is created, you can use the `table` object to insert, update, delete, and query data.
=== "SQL"
Use the `CREATE TABLE` statement to create a table.
```sql
CREATE TABLE items (
id INT PRIMARY KEY,
content TEXT,
embedding VECTOR(3),
meta JSON
);
```
## Add data to a table
### With TableModel
You can use a `TableModel` instance to represent a record and insert it into the table.
To insert a single record:
=== "Python"
Use the `table.insert()` method to insert a single record into the table.
```python
table.insert(
Item(
id=1,
content="TiDB is a distributed SQL database",
embedding=[0.1, 0.2, 0.3],
meta={"category": "database"},
)
)
```
=== "SQL"
Use the `INSERT INTO` statement to insert a single record into the table.
```sql
INSERT INTO items(id, content, embedding, meta)
VALUES (1, 'TiDB is a distributed SQL database', '[0.1, 0.2, 0.3]', '{"category": "database"}');
```
To insert multiple records:
=== "Python"
Use the `table.bulk_insert()` method to insert multiple records into the table.
```python
table.bulk_insert([
Item(
id=2,
content="GPT-4 is a large language model",
embedding=[0.4, 0.5, 0.6],
meta={"category": "llm"},
),
Item(
id=3,
content="LlamaIndex is a Python library for building AI-powered applications",
embedding=[0.7, 0.8, 0.9],
meta={"category": "rag"},
),
])
```
=== "SQL"
Use the `INSERT INTO` statement to insert multiple records into the table.
```sql
INSERT INTO items(id, content, embedding, meta)
VALUES
(2, 'GPT-4 is a large language model', '[0.4, 0.5, 0.6]', '{"category": "llm"}'),
(3, 'LlamaIndex is a Python library for building AI-powered applications', '[0.7, 0.8, 0.9]', '{"category": "rag"}');
```
### With Dict
You can also use a `dict` to represent a record and insert it into the table. This approach is more flexible because you don't need a `TableModel` instance to insert data.
To insert a single record:
=== "Python"
Use the `table.insert()` method with a dictionary to insert a single record into the table.
```python
table.insert({
"id": 1,
"content": "TiDB is a distributed SQL database",
"embedding": [0.1, 0.2, 0.3],
"meta": {"category": "database"},
})
```
=== "SQL"
Use the `INSERT INTO` statement to insert a single record into the table.
```sql
INSERT INTO items(id, content, embedding, meta)
VALUES (1, 'TiDB is a distributed SQL database', '[0.1, 0.2, 0.3]', '{"category": "database"}');
```
## Save data to a table
The `save` method provides a convenient way to insert or update a single record. If a record with the specified primary key does not exist, it creates a new record. If the record already exists, it overwrites the entire record.
!!! note
If a record with the same ID already exists, the `table.save()` method overwrites the entire record. To change only part of a record, use `table.update()`.
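The difference between overwriting a record and updating part of it can be modeled with a plain dictionary. This is a conceptual sketch only: `save` and `update` below are hypothetical stand-ins operating on an in-memory `store`, not PyTiDB's implementation. In a real table, omitted columns would presumably fall back to their defaults rather than disappear.

```python
import copy


def save(store: dict, record: dict) -> None:
    """Insert the record, or replace the whole stored record with the same id."""
    store[record["id"]] = copy.deepcopy(record)


def update(store: dict, record_id: int, values: dict) -> None:
    """Change only the given fields of an existing record."""
    store[record_id].update(copy.deepcopy(values))


store: dict = {}
save(store, {"id": 1, "content": "TiDB", "meta": {"category": "database"}})

# update() touches only `content`; `meta` is kept
update(store, 1, {"content": "TiDB is a distributed SQL database"})
kept_meta = store[1]["meta"]

# save() replaces the whole record; fields that are not supplied are lost
save(store, {"id": 1, "content": "Updated content for TiDB"})
has_meta = "meta" in store[1]

print(kept_meta)  # {'category': 'database'}
print(has_meta)   # False
```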
=== "Python"
Use the `table.save()` method to save a single record to the table.
**Example: Save a new record**
```python
saved_record = table.save(
Item(
id=4,
content="Vector databases enable AI applications",
embedding=[1.0, 1.1, 1.2],
meta={"category": "vector-db"},
)
)
```
**Example: Save an existing record (overwrites the entire record)**
```python
# This overwrites the entire record with id=1
updated_record = table.save(
Item(
id=1, # Existing ID
content="Updated content for TiDB",
embedding=[0.2, 0.3, 0.4],
meta={"category": "updated"},
)
)
```
=== "SQL"
Use the `INSERT ... ON DUPLICATE KEY UPDATE` statement to save a record.
**Example: Save a new record or update if it exists**
```sql
INSERT INTO items(id, content, embedding, meta)
VALUES (4, 'Vector databases enable AI applications', '[1.0, 1.1, 1.2]', '{"category": "vector-db"}')
ON DUPLICATE KEY UPDATE
content = VALUES(content),
embedding = VALUES(embedding),
meta = VALUES(meta);
```
## Query data from a table
To fetch records from a table:
=== "Python"
Use the `table.query()` method to fetch the records from the table.
**Example: Fetch the first 10 records**
```python
result = table.query(limit=10).to_list()
```
=== "SQL"
Use the `SELECT` statement to fetch the records from the table.
**Example: Fetch the first 10 records**
```sql
SELECT * FROM items LIMIT 10;
```
To fetch records based on query conditions:
=== "Python"
Pass the `filters` parameter to the `table.query()` method.
```python
result = table.query(
filters={"meta.category": "database"},
limit=10
).to_list()
```
=== "SQL"
Use the `WHERE` clause to filter records.
**Example: Fetch the 10 records with category "database"**
```sql
SELECT * FROM items WHERE meta->>'$.category' = 'database' LIMIT 10;
```
For a complete list of supported filter operations and examples, refer to the [filtering](./filtering.md) guide.
## Update data in a table
=== "Python"
Use the `table.update()` method to update records with [filters](./filtering.md).
**Example: Update the record whose `id` equals 1**
```python
table.update(
    values={
        "content": "TiDB Cloud Starter is a fully managed, auto-scaling cloud database service",
        "embedding": [0.1, 0.2, 0.4],
        "meta": {"category": "dbaas"},
    },
    filters={
        "id": 1
    },
)
```
=== "SQL"
Use the `UPDATE` statement to update records with [filters](./filtering.md).
**Example: Update the record whose `id` equals 1**
```sql
UPDATE items
SET
content = 'TiDB Cloud Starter is a fully managed, auto-scaling cloud database service',
embedding = '[0.1, 0.2, 0.4]',
meta = '{"category": "dbaas"}'
WHERE
id = 1;
```
## Delete from a table
=== "Python"
Use the `table.delete()` method to delete records with [filters](./filtering.md).
**Example: Delete the record where `id` equals 2**
```python
table.delete(
filters={
"id": 2
}
)
```
=== "SQL"
Use the `DELETE` statement to delete records with [filters](./filtering.md).
**Example: Delete the record where `id` equals 2**
```sql
DELETE FROM items WHERE id = 2;
```
## Truncate a table
=== "Python"
To remove all data from the table but keep the table structure, use the `table.truncate()` method.
```python
table.truncate()
```
To check that the table is truncated, verify that it contains 0 rows.
```python
table.rows()
```
=== "SQL"
To remove all data from the table but keep the table structure, use the `TRUNCATE TABLE` statement.
```sql
TRUNCATE TABLE items;
```
To check that the table is truncated, verify that it contains 0 rows.
```sql
SELECT COUNT(*) FROM items;
```
## Drop a table
=== "Python"
To permanently remove a table from the database, use the `client.drop_table()` method.
```python
client.drop_table("items")
```
To check that the table is removed from the database:
```python
client.table_names()
```
=== "SQL"
To permanently remove a table from the database, use the `DROP TABLE` statement.
```sql
DROP TABLE items;
```
To check that the table is removed from the database:
```sql
SHOW TABLES;
```
================================================
FILE: src/ai/guides/transaction.md
================================================
# Transaction
TiDB supports ACID transactions, which ensure data consistency and reliability.
## Basic Usage
=== "Python"
```python
with client.session() as session:
    initial_total_balance = session.query("SELECT SUM(balance) FROM players").scalar()

    # Transfer 10 coins from player 1 to player 2
    session.execute("UPDATE players SET balance = balance - 10 WHERE id = 1")
    session.execute("UPDATE players SET balance = balance + 10 WHERE id = 2")

    session.commit()
    # or session.rollback()

    final_total_balance = session.query("SELECT SUM(balance) FROM players").scalar()
    assert final_total_balance == initial_total_balance
```
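To show what a rollback protects against, the sketch below simulates a failure between the two updates of a transfer. It uses Python's built-in `sqlite3` module as a stand-in so that it runs without a TiDB cluster. The table layout mirrors the `players` example above, and everything else is an assumption made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO players VALUES (1, 100), (2, 50)")
conn.commit()


def total_balance(connection: sqlite3.Connection) -> int:
    return connection.execute("SELECT SUM(balance) FROM players").fetchone()[0]


before = total_balance(conn)
try:
    # First half of the transfer succeeds ...
    conn.execute("UPDATE players SET balance = balance - 10 WHERE id = 1")
    # ... but something fails before player 2 is credited
    raise RuntimeError("simulated failure before the second update")
except RuntimeError:
    # Undo the partial transfer so that no coins are lost
    conn.rollback()

balances = conn.execute("SELECT balance FROM players ORDER BY id").fetchall()
print(balances)                        # [(100,), (50,)]
print(total_balance(conn) == before)   # True
```

Without the rollback, player 1 would have lost 10 coins that player 2 never received. Atomicity guarantees that either both updates are applied or neither is.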
## See also
- [TiDB Develop Guide - Transaction](https://docs.pingcap.com/tidbcloud/dev-guide-transaction-overview/)
- [TiDB Docs - SQL Reference - Transactions](https://docs.pingcap.com/tidbcloud/transaction-overview/)
================================================
FILE: src/ai/guides/vector-search.md
================================================
# Vector Search
Vector search uses semantic similarity to help you find the most relevant records, even if your query does not explicitly include all the keywords.
!!! tip
For a complete example of vector search, see the [vector-search example](../examples/vector-search-with-pytidb.md).
## Basic Usage
This section shows you how to use vector search in your application in minimal steps. Before you start, you need to [connect to the database](./connect.md).
### Step 1. Create a table with a vector field
=== "Python"
You can use `client.create_table()` to create a table and use `VectorField` to define a vector field.
In this example, we create a table named `documents` with four columns:
- `id`: The primary key of the table.
- `text`: The text content of the document.
- `text_vec`: The vector embedding of the text content.
- `meta`: The metadata of the document, which is a JSON object.
```python hl_lines="9"
from pytidb.schema import TableModel, Field, VectorField
from pytidb.datatype import TEXT, JSON

class Document(TableModel):
    __tablename__ = "documents"

    id: int = Field(primary_key=True)
    text: str = Field(sa_type=TEXT)
    text_vec: list[float] = VectorField(dimensions=3)
    meta: dict = Field(sa_type=JSON, default_factory=dict)

table = client.create_table(schema=Document, if_exists="overwrite")
```
The `VectorField` class accepts the following parameters:
- `dimensions`: The number of dimensions of the vector. Once specified, only vectors with this exact dimension can be stored in this field.
- `index`: Whether to create a [vector index](https://docs.pingcap.com/tidbcloud/vector-search-index/) for the vector field. Defaults to `True`.
- `distance_metric`: The distance metric to use for the vector index. Supported values:
- `DistanceMetric.COSINE` (default): Cosine distance metric, suitable for measuring text similarity
- `DistanceMetric.L2`: L2 distance metric, suitable for capturing overall difference
=== "SQL"
You can use the `CREATE TABLE` statement to create a table and use the `VECTOR` type to define a vector column.
```sql hl_lines="4 6"
CREATE TABLE documents (
    id INT PRIMARY KEY,
    text TEXT,
    text_vec VECTOR(3),
    meta JSON,
    VECTOR INDEX `vec_idx_text_vec`((VEC_COSINE_DISTANCE(`text_vec`)))
);
```
In this example:
- The `text_vec` column is defined as a `VECTOR` type with 3 dimensions, which means that every vector stored in this column must have exactly 3 dimensions.
- A vector index is created using the `VEC_COSINE_DISTANCE` function to optimize vector search performance.
TiDB supports two distance functions for vector indexes:
- `VEC_COSINE_DISTANCE`: Calculates the cosine distance between two vectors
- `VEC_L2_DISTANCE`: Calculates L2 distance (Euclidean distance) between two vectors
### Step 2. Insert vector data into the table
For demonstration purposes, insert some text and their corresponding vector embeddings into the table. In this example, we use simple 3-dimensional vectors.
We insert three documents:
- `dog` with the vector embedding `[1, 2, 1]`
- `fish` with the vector embedding `[1, 2, 4]`
- `tree` with the vector embedding `[1, 0, 0]`
=== "Python"
```python
table.bulk_insert([
Document(text="dog", text_vec=[1,2,1], meta={"category": "animal"}),
Document(text="fish", text_vec=[1,2,4], meta={"category": "animal"}),
Document(text="tree", text_vec=[1,0,0], meta={"category": "plant"}),
])
```
=== "SQL"
```sql
INSERT INTO documents (id, text, text_vec, meta)
VALUES
(1, 'dog', '[1,2,1]', '{"category": "animal"}'),
(2, 'fish', '[1,2,4]', '{"category": "animal"}'),
(3, 'tree', '[1,0,0]', '{"category": "plant"}');
```
!!! tip
In real-world applications, vector embeddings are usually generated by an [embedding model](../concepts/vector-search.md#embedding-model).
For convenience, pytidb provides an auto embedding feature that can automatically generate vector embeddings for your text fields when you insert, update, or search—no manual processing needed.
For details, see the [Auto Embedding](./auto-embedding.md) guide.
### Step 3. Perform vector search
Vector search uses vector distance metrics to measure the similarity and relevance between vectors. The closer the distance, the more relevant the record. To find the most relevant documents in the table, you need to specify a query vector.
In this example, we assume the query is `A swimming animal` and its vector embedding is `[1, 2, 3]`.
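To make the ranking concrete, here is a plain-Python sketch that computes the cosine distance by hand for the three sample documents. It assumes the conventional definition (1 minus cosine similarity) and a brute-force scan. It does not show how TiDB evaluates the query internally, and the `cosine_distance` helper is a hypothetical name.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Sample vectors from Step 2 and the query vector for "A swimming animal".
documents = {"dog": [1, 2, 1], "fish": [1, 2, 4], "tree": [1, 0, 0]}
query = [1, 2, 3]

# Brute-force ranking: smaller distance means more relevant.
ranked = sorted(
    ((text, cosine_distance(vec, query)) for text, vec in documents.items()),
    key=lambda item: item[1],
)
for text, distance in ranked:
    print(f"{text}: {distance:.6f}")
```

The order is `fish`, `dog`, `tree`, which is the order you should expect from the search in this step.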
=== "Python"
You can use the `table.search()` method to perform vector search, which uses `search_mode="vector"` by default.
```python
table.search([1, 2, 3]).limit(3).to_list()
```
```python title="Execution result"
[
{"id": 2, "text": "fish", "text_vec": [1,2,4], "_distance": 0.00853986601633272},
{"id": 1, "text": "dog", "text_vec": [1,2,1], "_distance": 0.12712843905603044},
{"id": 3, "text": "tree", "text_vec": [1,0,0], "_distance": 0.7327387580875756},
]
```
The result shows that the most relevant document is `fish` with a distance of `0.00853986601633272`.
=== "SQL"
You can use the `ORDER BY <distance_function>(<column_name>, <query_vector>) LIMIT <n>` clause in the `SELECT` statement to get the n nearest neighbors of the query vector.
In this example, we use the `vec_cosine_distance` function to calculate the cosine distance between the vectors stored in the `text_vec` column and the provided query vector `[1, 2, 3]`.
```sql
SELECT id, text, vec_cosine_distance(text_vec, '[1,2,3]') AS distance
FROM documents
ORDER BY distance
LIMIT 3;
```
```plain title="Execution result"
+----+----------+---------------------+
| id | text | distance |
+----+----------+---------------------+
| 2 | fish | 0.00853986601633272 |
| 1 | dog | 0.12712843905603044 |
| 3 | tree | 0.7327387580875756 |
+----+----------+---------------------+
3 rows in set (0.15 sec)
```
The result shows that the most relevant document is `fish` with a distance of `0.00853986601633272`.
## Distance metrics
Distance metrics are a measure of the similarity between a pair of vectors. Currently, TiDB supports the following distance metrics:
=== "Python"
The `table.search()` API supports the following distance metrics:
| Metric Name | Description | Best For |
|--------------------------|----------------------------------------------------------------|----------|
| `DistanceMetric.COSINE` | Calculates the cosine distance between two vectors (default). Measures the angle between vectors. | Text embeddings, semantic search |
| `DistanceMetric.L2` | Calculates the L2 distance (Euclidean distance) between two vectors. Measures the straight-line distance. | Image features |
To change the distance metric used for vector search, use the `.distance_metric()` method.
**Example: Use the L2 distance metric**
```python
from pytidb.schema import DistanceMetric
results = (
table.search([1, 2, 3])
.distance_metric(DistanceMetric.L2)
.limit(10)
.to_list()
)
```
=== "SQL"
In SQL, you can use the following built-in functions to calculate vector distances directly in your queries:
| Function Name | Description |
|-------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------|
| [`VEC_L2_DISTANCE`](https://docs.pingcap.com/tidbcloud/vector-search-functions-and-operators/#vec_l2_distance) | Calculates L2 distance (Euclidean distance) between two vectors |
| [`VEC_COSINE_DISTANCE`](https://docs.pingcap.com/tidbcloud/vector-search-functions-and-operators/#vec_cosine_distance) | Calculates the cosine distance between two vectors |
| [`VEC_NEGATIVE_INNER_PRODUCT`](https://docs.pingcap.com/tidbcloud/vector-search-functions-and-operators/#vec_negative_inner_product) | Calculates the negative of the inner product between two vectors|
| [`VEC_L1_DISTANCE`](https://docs.pingcap.com/tidbcloud/vector-search-functions-and-operators/#vec_l1_distance) | Calculates L1 distance (Manhattan distance) between two vectors |
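As a rough guide to what the other three functions measure, the following plain-Python sketch applies their textbook definitions to two of the sample vectors. The helper names are hypothetical and the code only mirrors the math, not TiDB's implementation.

```python
import math


def l2_distance(a, b):
    """Euclidean (straight-line) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1_distance(a, b):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def negative_inner_product(a, b):
    """Negated dot product, so that a smaller value still means 'closer'."""
    return -sum(x * y for x, y in zip(a, b))


query = [1, 2, 3]
fish = [1, 2, 4]
tree = [1, 0, 0]

fish_scores = (l2_distance(fish, query), l1_distance(fish, query), negative_inner_product(fish, query))
tree_scores = (l2_distance(tree, query), l1_distance(tree, query), negative_inner_product(tree, query))
print(fish_scores)
print(tree_scores)
```

For every one of these metrics `fish` scores lower than `tree`, so sorting in ascending order keeps the nearest record first regardless of which function you choose.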
## Distance threshold
The `table.search()` API allows you to set a distance threshold to control the similarity of the returned results. By specifying this threshold, you can exclude less similar vectors and return only those that meet your relevance criteria.
=== "Python"
Use the `.distance_threshold()` method to set a maximum distance for the search results. Only records with a distance less than the threshold are returned.
**Example: Only return documents with a distance less than 0.5**
```python
results = table.search([1, 2, 3]).distance_threshold(0.5).limit(10).to_list()
```
=== "SQL"
In SQL, use the `HAVING` clause with a distance function to filter results by distance:
**Example: Only return documents with a distance less than 0.1**
```sql
SELECT id, text, vec_cosine_distance(text_vec, '[1,2,3]') AS distance
FROM documents
HAVING distance < 0.1
ORDER BY distance
LIMIT 10;
```
## Distance range
The `table.search()` API also supports specifying a distance range to further refine the results.
=== "Python"
Use the `.distance_range()` method to set both minimum and maximum distance values. Only records with a distance within this range are returned.
**Example: Only return documents with a distance between 0.01 and 0.05**
```python
results = table.search([1, 2, 3]).distance_range(0.01, 0.05).limit(10).to_list()
```
=== "SQL"
To specify a distance range in SQL, use `BETWEEN` or other comparison operators in the `HAVING` clause:
**Example: Only return documents with a distance between 0.01 and 0.05**
```sql
SELECT id, text, vec_cosine_distance(text_vec, '[1,2,3]') AS distance
FROM documents
HAVING distance BETWEEN 0.01 AND 0.05
ORDER BY distance
LIMIT 10;
```
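The effect of a distance threshold and a distance range can be checked by hand against the sample data. The sketch below reuses the cosine distances from the Step 3 result and filters them in plain Python. The helper names are hypothetical, and treating the range bounds as inclusive (like SQL `BETWEEN`) is an assumption about `.distance_range()`, not something this guide states.

```python
# Cosine distances to the query [1, 2, 3], copied from the Step 3 result.
distances = {
    "fish": 0.00853986601633272,
    "dog": 0.12712843905603044,
    "tree": 0.7327387580875756,
}


def by_distance(names):
    """Sort record names from nearest to farthest."""
    return sorted(names, key=distances.get)


def within_threshold(max_distance):
    """Keep records strictly closer than max_distance."""
    return by_distance(n for n, d in distances.items() if d < max_distance)


def within_range(lower, upper):
    """Keep records with lower <= distance <= upper (assumed inclusive)."""
    return by_distance(n for n, d in distances.items() if lower <= d <= upper)


print(within_threshold(0.5))
print(within_threshold(0.1))
print(within_range(0.01, 0.05))
print(within_range(0.1, 0.8))
```

With this sample data, the 0.01 to 0.05 range matches nothing: `fish` is closer than 0.01 and `dog` is farther than 0.05. Choose bounds that fit the distance distribution of your own embeddings.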
## Metadata filtering
As a relational database, TiDB supports a rich set of [SQL operators](https://docs.pingcap.com/tidbcloud/operators/) and allows flexible combinations of filtering conditions.
For vector search in TiDB, you can apply metadata filtering on scalar fields (e.g., integers, strings) or JSON fields.
Typically, vector search combined with metadata filtering operates in two modes:
- **Post-filtering**: In a two-stage retrieval process, TiDB first performs vector search to retrieve the top-k candidate results from the entire vector space, then applies the filter to this candidate set. The vector search stage typically leverages a vector index for efficiency.
- **Pre-filtering**: The filter is applied before vector search. If the filter is highly selective and the filtered field is indexed with a scalar index, this approach can significantly reduce the search space and improve performance.
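The difference between the two modes can be sketched in plain Python with the three sample documents. This is a brute-force illustration under stated assumptions: no vector index is involved, the distances are copied from the Step 3 result, and `nearest`, `post_filter`, and `pre_filter` are hypothetical helpers rather than pytidb functions.

```python
# Sample rows with their cosine distance to the query [1, 2, 3] (from Step 3).
rows = [
    {"text": "dog", "category": "animal", "distance": 0.12712843905603044},
    {"text": "fish", "category": "animal", "distance": 0.00853986601633272},
    {"text": "tree", "category": "plant", "distance": 0.7327387580875756},
]


def nearest(candidates, k):
    """Return the k rows with the smallest distance."""
    return sorted(candidates, key=lambda row: row["distance"])[:k]


def post_filter(category, num_candidate, limit):
    """Vector search first (top num_candidate), then apply the filter."""
    candidates = nearest(rows, num_candidate)
    matching = [row["text"] for row in candidates if row["category"] == category]
    return matching[:limit]


def pre_filter(category, limit):
    """Apply the filter first, then search only the matching rows."""
    matching = [row for row in rows if row["category"] == category]
    return [row["text"] for row in nearest(matching, limit)]


post_small = post_filter("plant", num_candidate=2, limit=1)
post_large = post_filter("plant", num_candidate=3, limit=1)
pre = pre_filter("plant", limit=1)
print(post_small, post_large, pre)
```

With only two candidates, post-filtering finds no plant because `tree` never makes it into the candidate set. Raising the candidate count or filtering first recovers it.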
### Post-filtering
=== "Python"
Use the `.filter()` method with a filter dictionary to apply filtering to vector search.
By default, the `table.search()` API uses post-filtering mode to maximize search performance with the vector index.
**Example: Vector search with post-filtering**
```python
results = (
table.search([1, 2, 3])
# The `meta` is a JSON field, and its value is a JSON object
# like {"category": "animal"}
.filter({"meta.category": "animal"})
.num_candidate(50)
.limit(10)
.to_list()
)
```
!!! tip
When using a vector index, if the final `limit` is very small, the accuracy of the results may decrease. You can use the `.num_candidate()` method to control how many candidates to retrieve from the vector index during the vector search phase, without changing the `limit` parameter.
A higher `num_candidate` value generally improves recall but may reduce query performance. Adjust this value based on your dataset and accuracy requirements.
=== "SQL"
Currently, vector indexes are only effective in strict ANN (Approximate Nearest Neighbor) queries, such as:
```sql
SELECT * FROM <table> ORDER BY <distance_func>(<column>, <query_vector>) LIMIT <n>
```
In other words, you cannot use a `WHERE` clause together with a vector index in the same query.
If you need to combine vector search with additional filtering conditions, you can use the post-filtering pattern. In this approach, the ANN query will be divided into two parts:
- The inner query performs the vector search using the vector index.
- The outer query applies the `WHERE` condition to filter the results.
```sql hl_lines="8"
SELECT *
FROM (
SELECT id, text, meta, vec_cosine_distance(text_vec, '[1,2,3]') AS distance
FROM documents
ORDER BY distance
LIMIT 50
) candidates
WHERE meta->>'$.category' = 'animal'
ORDER BY distance
LIMIT 10;
```
!!! tip
The post-filtering pattern may return fewer results than expected, or none at all. For example, the inner query may retrieve the top 50 most similar records, but none of them match the `WHERE` condition.
To mitigate this, you can increase the `LIMIT` value in the **inner query** (50 in this example) to fetch more candidates, improving the chances of returning enough valid results after filtering.
For supported SQL operators, see [Operators](https://docs.pingcap.com/tidbcloud/operators/) in the TiDB Cloud documentation.
### Pre-filtering
=== "Python"
To enable pre-filtering, set the `prefilter` parameter to `True` in the `.filter()` method.
**Example: Vector search with pre-filtering**
```python
results = (
table.search([1, 2, 3])
.filter({"meta.category": "animal"}, prefilter=True)
.limit(10)
.to_list()
)
```
For supported filter operators, see [Filtering](./filtering.md).
=== "SQL"
In SQL, use the `->>` operator or `JSON_EXTRACT` to access JSON fields in the `WHERE` clause:
```sql
SELECT id, text, meta, vec_cosine_distance(text_vec, '[1,2,3]') AS distance
FROM documents
WHERE meta->>'$.category' = 'animal'
ORDER BY distance
LIMIT 10;
```
For supported SQL operators, see [Operators](https://docs.pingcap.com/tidbcloud/operators/) in the TiDB Cloud documentation.
## Multiple vector fields
TiDB supports defining multiple vector columns in a single table, allowing you to store and search different types of vector embeddings.
For example, you can store both text embeddings and image embeddings in the same table, making it convenient to manage multi-modal data.
=== "Python"
You can define multiple vector fields in the schema and perform vector search on the specified vector field by using the `.vector_column()` method.
**Example: Specify the vector field to search on**
```python hl_lines="6 8 17"
# Create a table with multiple vector fields
class RichTextDocument(TableModel):
    __tablename__ = "rich_text_documents"
    id: int = Field(primary_key=True)
    text: str = Field(sa_type=TEXT)
    text_vec: list[float] = VectorField(dimensions=3)
    image_url: str
    image_vec: list[float] = VectorField(dimensions=3)

table = client.create_table(schema=RichTextDocument, if_exists="overwrite")

# Insert sample data ...

# Search using image vector field
results = (
    table.search([1, 2, 3])
    .vector_column("image_vec")
    .distance_metric(DistanceMetric.COSINE)
    .limit(10)
    .to_list()
)
```
=== "SQL"
You can create multiple vector columns in a table and search them using suitable distance functions:
```sql
-- Create a table with multiple vector fields
CREATE TABLE rich_text_documents (
id BIGINT PRIMARY KEY,
text TEXT,
text_vec VECTOR(3),
image_url VARCHAR(255),
image_vec VECTOR(3)
);
-- Insert sample data ...
-- Search using the image vector
SELECT id, image_url, vec_l2_distance(image_vec, '[4,5,6]') AS image_distance
FROM rich_text_documents
ORDER BY image_distance
LIMIT 10;
```
## Output search results
=== "Python"
The `table.search()` API lets you convert search results into several common data processing formats:
### As SQLAlchemy result rows
To work with raw SQLAlchemy result rows, use:
```python
table.search([1, 2, 3]).limit(10).to_rows()
```
### As a list of Python dictionaries
For easier manipulation in Python, convert the results to a list of dictionaries:
```python
table.search([1, 2, 3]).limit(10).to_list()
```
### As a pandas DataFrame
To display results in a user-friendly table—especially useful in Jupyter notebooks—convert them to a pandas DataFrame:
```python
table.search([1, 2, 3]).limit(10).to_pandas()
```
### As a list of Pydantic model instances
The `TableModel` class can also be used as a Pydantic model to represent data entities. To work with results as Pydantic model instances, use:
```python
table.search([1, 2, 3]).limit(10).to_pydantic()
```
================================================
FILE: src/ai/index.md
================================================
# TiDB for AI
TiDB is an open-source, distributed SQL database designed for modern AI applications, offering seamless scalability, real-time analytics, and unified storage for vectors, documents, knowledge graphs, operational data and more.
=== "Python"
TiDB provides a Python SDK and a series of integrations with popular AI frameworks to help developers build AI applications efficiently.
To install the TiDB Python SDK, run the following command:
```bash
pip install pytidb
```
Integrations:
- AI Frameworks: [LlamaIndex](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-llamaindex/), [LangChain](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-langchain/)
- ORM Libraries: [SQLAlchemy](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-sqlalchemy/), [Django-ORM](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-django-orm/), [Peewee](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-peewee/)
- AI Services: [Bedrock](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-amazon-bedrock/)
- Embedding Models/Services: [JinaAI](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-jinaai-embedding/)
## Next Steps
- 📖 Explore [Concepts](./concepts/vector-search.md) to understand the fundamentals of building modern AI applications
- 🚀 Follow our [Quickstart Guide](./quickstart.md) to begin building your first AI application with TiDB
================================================
FILE: src/ai/integrations/embedding-cohere.md
================================================
---
title: "Integrate TiDB Vector Search with Cohere Embeddings API"
description: "Learn how to integrate TiDB Vector Search with Cohere Embeddings API to store embeddings and perform semantic search."
keywords: "TiDB, Cohere, Vector search, text embeddings, multilingual embeddings"
---
# Integrate TiDB Vector Search with Cohere Embeddings API
This tutorial demonstrates how to use [Cohere](https://cohere.com/embed) to generate text embeddings, store them in TiDB vector storage, and perform semantic search.
!!! info
Currently, [Server-Side Auto Embedding](../guides/auto-embedding.md) is only available on [TiDB Cloud Starter](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme) clusters in the following AWS regions:
- `Frankfurt (eu-central-1)`
- `Oregon (us-west-2)`
- `N. Virginia (us-east-1)`
## Cohere Embeddings
Cohere offers multilingual embedding models for search, RAG, and classification. The latest `embed-v4.0` model supports text, images, and mixed content. You can use the Cohere Embeddings API with TiDB through the AI SDK or native SQL functions for automatic embedding generation.
### Supported Models
| Model Name | Dimensions | Max Input Tokens | Description |
|----------------------------------|------------|------------------|-------------|
| `cohere/embed-v4.0` | 256, 512, 1024, 1536 (default) | 128k | Latest multimodal model supporting text, images, and mixed content (PDFs) |
| `cohere/embed-english-v3.0` | 1024 | 512 | High-performance English embedding model optimized for search and classification |
| `cohere/embed-multilingual-v3.0`| 1024 | 512 | Multilingual model supporting 100+ languages |
| `cohere/embed-english-light-v3.0` | 384 | 512 | Lightweight English model for faster processing with similar performance |
| `cohere/embed-multilingual-light-v3.0` | 384 | 512 | Lightweight multilingual model for faster processing with similar performance |
For a complete list of supported models and detailed specifications, see the [Cohere Embeddings Documentation](https://docs.cohere.com/docs/cohere-embed).
## Usage example
This example demonstrates creating a vector table, inserting documents, and performing similarity search using Cohere embedding models.
### Step 1: Connect to the database
=== "Python"
```python
from pytidb import TiDBClient
tidb_client = TiDBClient.connect(
host="{gateway-region}.prod.aws.tidbcloud.com",
port=4000,
username="{prefix}.root",
password="{password}",
database="{database}",
ensure_db=True,
)
```
=== "SQL"
```bash
mysql -h {gateway-region}.prod.aws.tidbcloud.com \
-P 4000 \
-u {prefix}.root \
-p{password} \
-D {database}
```
### Step 2: Configure the API key
Create your API key from the [Cohere Dashboard](https://dashboard.cohere.com/api-keys) and bring your own key (BYOK) to use the embedding service.
=== "Python"
Configure the API key for the Cohere embedding provider using the TiDB Client:
```python
tidb_client.configure_embedding_provider(
provider="cohere",
api_key="{your-cohere-api-key}",
)
```
=== "SQL"
Set the API key for the Cohere embedding provider using SQL:
```sql
SET @@GLOBAL.TIDB_EXP_EMBED_COHERE_API_KEY = "{your-cohere-api-key}";
```
### Step 3: Create a vector table
Create a table with a vector field that uses the `cohere/embed-v4.0` model to generate 1536-dimensional vectors (default dimension):
=== "Python"
```python
from pytidb.schema import TableModel, Field
from pytidb.embeddings import EmbeddingFunction
from pytidb.datatype import TEXT

class Document(TableModel):
    __tablename__ = "sample_documents"
    id: int = Field(primary_key=True)
    content: str = Field(sa_type=TEXT)
    embedding: list[float] = EmbeddingFunction(
        model_name="cohere/embed-v4.0"
    ).VectorField(source_field="content")

table = tidb_client.create_table(schema=Document, if_exists="overwrite")
```
=== "SQL"
```sql
CREATE TABLE sample_documents (
`id` INT PRIMARY KEY,
`content` TEXT,
`embedding` VECTOR(1536) GENERATED ALWAYS AS (EMBED_TEXT(
"cohere/embed-v4.0",
`content`
)) STORED
);
```
### Step 4: Insert data into the table
=== "Python"
Use the `table.insert()` or `table.bulk_insert()` API to add data:
```python
documents = [
Document(id=1, content="Python: High-level programming language for data science and web development."),
Document(id=2, content="Python snake: Non-venomous constrictor found in tropical regions."),
Document(id=3, content="Python framework: Django and Flask are popular web frameworks."),
Document(id=4, content="Python libraries: NumPy and Pandas for data analysis."),
Document(id=5, content="Python ecosystem: Rich collection of packages and tools."),
]
table.bulk_insert(documents)
```
=== "SQL"
Insert data using the `INSERT INTO` statement:
```sql
INSERT INTO sample_documents (id, content)
VALUES
(1, "Python: High-level programming language for data science and web development."),
(2, "Python snake: Non-venomous constrictor found in tropical regions."),
(3, "Python framework: Django and Flask are popular web frameworks."),
(4, "Python libraries: NumPy and Pandas for data analysis."),
(5, "Python ecosystem: Rich collection of packages and tools.");
```
### Step 5: Search for similar documents
=== "Python"
Use the `table.search()` API to perform vector search:
```python
results = table.search("How to learn Python programming?") \
.limit(2) \
.to_list()
print(results)
```
=== "SQL"
Use the `VEC_EMBED_COSINE_DISTANCE` function to perform vector search based on the cosine distance metric:
```sql
SELECT
`id`,
`content`,
VEC_EMBED_COSINE_DISTANCE(embedding, "How to learn Python programming?") AS _distance
FROM sample_documents
ORDER BY _distance ASC
LIMIT 2;
```
================================================
FILE: src/ai/integrations/embedding-gemini.md
================================================
---
title: "Integrate TiDB Vector Search with Google Gemini Embeddings API"
description: "Learn how to integrate TiDB Vector Search with Google Gemini Embeddings API to store embeddings and perform semantic search."
keywords: "TiDB, Google Gemini, Vector search, text embeddings, multimodal embeddings"
---
# Integrate TiDB Vector Search with Google Gemini Embeddings API
This tutorial demonstrates how to use [Google Gemini](https://ai.google.dev/models/gemini) to generate embeddings for text and image data, store them in TiDB vector storage, and perform semantic search.
!!! info
Currently, [Server-Side Auto Embedding](../guides/auto-embedding.md) is only available on [TiDB Cloud Starter](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme) clusters in the following AWS regions:
- `Frankfurt (eu-central-1)`
- `Oregon (us-west-2)`
- `N. Virginia (us-east-1)`
## Google Gemini Embeddings
The Gemini API provides text embedding models that generate embeddings for words, phrases, sentences, and code. These embeddings enable advanced natural language processing (NLP) tasks such as semantic search, classification, and clustering. By using context-aware embeddings, you can achieve more accurate results than with traditional keyword-based methods.
### Supported Models
| Model Name | Dimensions (recommended) | Max Input Tokens | Description |
|------------|--------------------------|------------|-------------|
| `gemini-embedding-001` | 128–3072 (768, 1536, 3072) | 2048 | Text and code embeddings |
For a complete list of supported models an
SYMBOL INDEX (25 symbols across 3 files)

FILE: scripts/check_dependencies.py
  function main (line 7) | def main():

FILE: scripts/generate_demos.py
  function load_config (line 25) | def load_config():
  function fetch_readme_from_doc_link (line 38) | def fetch_readme_from_doc_link(doc_link):
  function extract_repo_info_from_doc_link (line 60) | def extract_repo_info_from_doc_link(doc_link):
  function process_readme_content (line 90) | def process_readme_content(content, demo_config):
  function create_demo_page (line 122) | def create_demo_page(demo_config, content):
  function generate_gallery_page (line 153) | def generate_gallery_page(config):
  function sync_demo_docs (line 179) | def sync_demo_docs(config, fetch_from_remote=True):
  function main (line 222) | def main(skip_demos, skip_gallery, verbose):

FILE: scripts/sync_from_tidb_docs.py
  function remove_custom_content_blocks (line 6) | def remove_custom_content_blocks(content):
  function collapse_extra_blank_lines (line 11) | def collapse_extra_blank_lines(content):
  function convert_note_blocks (line 16) | def convert_note_blocks(content):
  function convert_warning_blocks (line 29) | def convert_warning_blocks(content):
  function convert_tip_blocks (line 42) | def convert_tip_blocks(content):
  function remove_see_also_section (line 55) | def remove_see_also_section(content):
  function replace_image_paths (line 60) | def replace_image_paths(content):
  function replace_relative_doc_links (line 67) | def replace_relative_doc_links(content):
  function remove_overview_from_title (line 80) | def remove_overview_from_title(content):
  function remove_front_matter (line 90) | def remove_front_matter(content):
  function remove_simpletab_blocks (line 95) | def remove_simpletab_blocks(content):
  function process_overview (line 103) | def process_overview():
  function process_llamaindex (line 117) | def process_llamaindex():
  function process_langchain (line 134) | def process_langchain():
  function save_to_file (line 151) | def save_to_file(content, filename):
  function main (line 157) | def main():
Condensed preview: 62 files, each showing path, character count, and a content snippet.
[
{
"path": ".cursor/rules/styleguide.mdc",
"chars": 2552,
"preview": "---\ndescription: \nglobs: \nalwaysApply: true\n---\n# Documentation Review Style Guide\n\n## Behavior instruction\n\nYou are act"
},
{
"path": ".github/workflows/docs.yml",
"chars": 981,
"preview": "name: Docs Publish\n\non:\n push:\n branches:\n - main\n\npermissions:\n contents: write\n\njobs:\n deploy:\n runs-on:"
},
{
"path": ".github/workflows/sitemap-sync.yml",
"chars": 1436,
"preview": "name: Sync Sitemap\n\non:\n push:\n branches:\n - gh-pages\n paths:\n - 'sitemap.xml'\n\npermissions:\n contents"
},
{
"path": ".gitignore",
"chars": 3440,
"preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
},
{
"path": ".vscode/settings.json",
"chars": 224,
"preview": "{\n \"cSpell.words\": [\n \"FULLTEXT\",\n \"Pydantic\",\n \"getenv\",\n \"jina\",\n \"jinaai\",\n "
},
{
"path": "DEVELOPMENT.md",
"chars": 2315,
"preview": "# Development Guide\n\nThis guide covers the development setup and maintenance of the TiDB for AI documentation site.\n\n## "
},
{
"path": "LICENSE",
"chars": 22200,
"preview": "Attribution-ShareAlike 3.0 Unported\n\n=======================================================================\n\nCREATIVE C"
},
{
"path": "Makefile",
"chars": 1340,
"preview": "# TiDB for AI Documentation Site\n\n.PHONY: help install generate-demos generate-demo-pages generate-demo-gallery serve bu"
},
{
"path": "README.md",
"chars": 2536,
"preview": "# TiDB for AI\n\nTiDB is an open-source, distributed SQL database designed for modern AI applications, offering seamless s"
},
{
"path": "demos.yml",
"chars": 5491,
"preview": "# Demo Gallery Configuration\ntitle: \"Demo Gallery\"\ndescription: |\n Explore hands-on demos showcasing how TiDB empowers "
},
{
"path": "mkdocs.yml",
"chars": 7120,
"preview": "site_name: TiDB for AI\nsite_url: https://pingcap.github.io\nrepo_name: pingcap/tidb\nrepo_url: https://github.com/pingcap/"
},
{
"path": "pyproject.toml",
"chars": 617,
"preview": "# TiDB for AI Documentation\n# This project uses UV for fast dependency management\n# Install UV: curl -LsSf https://astra"
},
{
"path": "scripts/check_dependencies.py",
"chars": 954,
"preview": "#!/usr/bin/env python3\n\"\"\"Simple dependency check for TiDB for AI documentation.\"\"\"\n\nimport sys\nimport subprocess\n\ndef m"
},
{
"path": "scripts/generate_demos.py",
"chars": 9934,
"preview": "#!/usr/bin/env python3\n\"\"\"\nGenerate demo gallery and individual demo documentation pages.\nThis script reads configuratio"
},
{
"path": "scripts/sync_from_tidb_docs.py",
"chars": 6505,
"preview": "import requests\nimport re\nimport sys\n\n\ndef remove_custom_content_blocks(content):\n \"\"\"Remove <CustomContent ...>...</"
},
{
"path": "src/ai/concepts/vector-search.md",
"chars": 5628,
"preview": "---\ntitle: Vector Search Overview\nsummary: Learn about Vector Search in TiDB. This feature provides an advanced search s"
},
{
"path": "src/ai/examples/auto-embedding-with-pytidb.md",
"chars": 2625,
"preview": "---\ntitle: Auto Embedding\ndescription: \"Automatically generate embeddings for your text data using built-in embedding mo"
},
{
"path": "src/ai/examples/basic-with-pytidb.md",
"chars": 2733,
"preview": "---\ntitle: Basic Usage\ndescription: \"Learn fundamental PyTiDB operations including database connection, table creation, "
},
{
"path": "src/ai/examples/fulltext-search-with-pytidb.md",
"chars": 2095,
"preview": "---\ntitle: Fulltext Search\ndescription: \"Perform traditional text search using MySQL fulltext search capabilities.\"\nsour"
},
{
"path": "src/ai/examples/hybrid-search-with-pytidb.md",
"chars": 3238,
"preview": "---\ntitle: Hybrid Search\ndescription: \"Combine vector search and fulltext search for more comprehensive results.\"\nsource"
},
{
"path": "src/ai/examples/image-search-with-pytidb.md",
"chars": 3376,
"preview": "---\ntitle: Image Search\ndescription: \"Build an image search application using multimodal embeddings for both text-to-ima"
},
{
"path": "src/ai/examples/index.md",
"chars": 16460,
"preview": "---\ntitle: Demo Gallery\ndescription: Explore hands-on demos showcasing how TiDB empowers AI applications. Get started qu"
},
{
"path": "src/ai/examples/memory-with-pytidb.md",
"chars": 5393,
"preview": "---\ntitle: Memory\ndescription: \"Implement conversation memory for chatbots and conversational AI applications.\"\nsource_r"
},
{
"path": "src/ai/examples/rag-with-pytidb.md",
"chars": 2708,
"preview": "---\ntitle: RAG\ndescription: \"Build a RAG application that combines document retrieval with language generation.\"\nsource_"
},
{
"path": "src/ai/examples/text2sql-with-pytidb.md",
"chars": 1664,
"preview": "---\ntitle: Text2SQL\ndescription: \"Convert natural language queries into SQL statements using AI models.\"\nsource_repo: \"h"
},
{
"path": "src/ai/examples/vector-search-with-pytidb.md",
"chars": 2511,
"preview": "---\ntitle: Vector Search\ndescription: \"Implement semantic search using vector embeddings to find similar content.\"\nsourc"
},
{
"path": "src/ai/guides/auto-embedding.md",
"chars": 2483,
"preview": "# Auto Embedding\n\nAuto embedding is a feature that allows you to automatically generate vector embeddings for text data."
},
{
"path": "src/ai/guides/connect.md",
"chars": 4869,
"preview": "# Connect to database\n\nIn this guide, we will introduce how to connect to a TiDB database using the TiDB client.\n\n## Ins"
},
{
"path": "src/ai/guides/filtering.md",
"chars": 5157,
"preview": "# Filtering\n\nAs a relational database, TiDB supports a rich set of [SQL operators](https://docs.pingcap.com/tidbcloud/op"
},
{
"path": "src/ai/guides/fulltext-search.md",
"chars": 9899,
"preview": "# Full-text Search\n\n**Full-text search** enables you to find documents or data by matching keywords or phrases within th"
},
{
"path": "src/ai/guides/hybrid-search.md",
"chars": 6816,
"preview": "# Hybrid Search\n\nHybrid search is a technique that combines multiple search algorithms to deliver more accurate and rele"
},
{
"path": "src/ai/guides/image-search.md",
"chars": 3916,
"preview": "# Image search\n\n**Image search** helps you find similar images by comparing their visual content, not just text or metad"
},
{
"path": "src/ai/guides/joins.md",
"chars": 3339,
"preview": "# Multiple Table Joins\n\nAs a relational database, TiDB allows you to store diverse data in tables with different structu"
},
{
"path": "src/ai/guides/raw-queries.md",
"chars": 2004,
"preview": "# Raw Queries\n\n## Operate data with raw SQL\n\nYou can use `client.execute()` method to execute `INSERT`, `UPDATE`, `DELET"
},
{
"path": "src/ai/guides/reranking.md",
"chars": 1484,
"preview": "# Reranking\n\nReranking is a technique used to improve the relevance and accuracy of search results by re-evaluating and "
},
{
"path": "src/ai/guides/tables.md",
"chars": 10209,
"preview": "# Working with tables\n\nTiDB uses tables to organize and store collections of related data. It provides flexible schema d"
},
{
"path": "src/ai/guides/transaction.md",
"chars": 924,
"preview": "# Transaction\n\nTiDB supports ACID transactions, which ensure data consistency and reliability.\n\n## Basic Usage\n\n=== \"Pyt"
},
{
"path": "src/ai/guides/vector-search.md",
"chars": 17891,
"preview": "# Vector Search\n\nVector search uses semantic similarity to help you find the most relevant records, even if your query d"
},
{
"path": "src/ai/index.md",
"chars": 1484,
"preview": "# TiDB for AI\n\nTiDB is an open-source, distributed SQL database designed for modern AI applications, offering seamless s"
},
{
"path": "src/ai/integrations/embedding-cohere.md",
"chars": 6330,
"preview": "---\ntitle: \"Integrate TiDB Vector Search with Cohere Embeddings API\"\ndescription: \"Learn how to integrate TiDB Vector Se"
},
{
"path": "src/ai/integrations/embedding-gemini.md",
"chars": 7060,
"preview": "---\ntitle: \"Integrate TiDB Vector Search with Google Gemini Embeddings API\"\ndescription: \"Learn how to integrate TiDB Ve"
},
{
"path": "src/ai/integrations/embedding-huggingface.md",
"chars": 7007,
"preview": "---\ntitle: \"Integrate TiDB Vector Search with Hugging Face Embeddings\"\ndescription: \"Learn how to integrate TiDB Vector "
},
{
"path": "src/ai/integrations/embedding-jinaai.md",
"chars": 5710,
"preview": "---\ntitle: \"Integrate TiDB Vector Search with Jina AI Embeddings API\"\ndescription: \"Learn how to integrate TiDB Vector S"
},
{
"path": "src/ai/integrations/embedding-nvidia-nim.md",
"chars": 6336,
"preview": "---\ntitle: \"Integrate TiDB Vector Search with NVIDIA NIM Embeddings\"\ndescription: \"Learn how to integrate TiDB Vector Se"
},
{
"path": "src/ai/integrations/embedding-openai-compatible.md",
"chars": 4703,
"preview": "---\ntitle: \"Integrate TiDB Vector Search with OpenAI-Compatible Embedding API\"\ndescription: \"Learn how to integrate TiDB"
},
{
"path": "src/ai/integrations/embedding-openai.md",
"chars": 5383,
"preview": "---\ntitle: \"Integrate TiDB Vector Search with OpenAI Embeddings API\"\ndescription: \"Learn how to integrate TiDB Vector Se"
},
{
"path": "src/ai/integrations/embedding-overview.md",
"chars": 2624,
"preview": "# Embeddings Integration\n\n## Overview\n\nTiDB provides a unified interface for integrating with various embedding provider"
},
{
"path": "src/ai/integrations/embedding-tidb-cloud-hosted.md",
"chars": 5259,
"preview": "---\ntitle: \"Integrate TiDB Vector Search with TiDB Cloud Hosted Embedding Models\"\ndescription: \"Learn how to integrate T"
},
{
"path": "src/ai/integrations/langchain.md",
"chars": 25231,
"preview": "\n# Integrate Vector Search with LangChain\n\nThis tutorial demonstrates how to integrate the [vector search](https://docs."
},
{
"path": "src/ai/integrations/llamaindex.md",
"chars": 10688,
"preview": "\n# Integrate Vector Search with LlamaIndex\n\nThis tutorial demonstrates how to integrate the [vector search](https://docs"
},
{
"path": "src/ai/integrations/tidb-mcp-claude-code.md",
"chars": 2926,
"preview": "---\ntitle: Get started with Claude Code and TiDB MCP Server\ndescription: This guide shows you how to configure the TiDB "
},
{
"path": "src/ai/integrations/tidb-mcp-claude-desktop.md",
"chars": 2001,
"preview": "---\ntitle: Get started with Claude Desktop and TiDB MCP Server\ndescription: This guide shows you how to configure the Ti"
},
{
"path": "src/ai/integrations/tidb-mcp-cursor.md",
"chars": 3405,
"preview": "---\ntitle: Get started with Cursor and TiDB MCP Server\ndescription: This guide shows you how to configure the TiDB MCP S"
},
{
"path": "src/ai/integrations/tidb-mcp-server.md",
"chars": 5680,
"preview": "---\ntitle: TiDB MCP Server\ndescription: Manage your TiDB databases using natural language instructions with the TiDB MCP"
},
{
"path": "src/ai/integrations/tidb-mcp-vscode.md",
"chars": 2457,
"preview": "---\ntitle: Get started with Visual Studio Code and TiDB MCP Server\ndescription: This guide shows you how to configure th"
},
{
"path": "src/ai/integrations/tidb-mcp-windsurf.md",
"chars": 2550,
"preview": "---\ntitle: Get started with Windsurf and TiDB MCP Server\ndescription: This guide shows you how to configure the TiDB MCP"
},
{
"path": "src/ai/javascripts/mathjax.js",
"chars": 426,
"preview": "window.MathJax = {\n tex: {\n inlineMath: [[\"\\\\(\", \"\\\\)\"]],\n displayMath: [[\"\\\\[\", \"\\\\]\"]],\n proce"
},
{
"path": "src/ai/quickstart.md",
"chars": 6865,
"preview": "---\ntitle: Quickstart\ndescription: Get started with TiDB using Python SDK.\n---\n\n# Quickstart\n\nIn this guide, you will le"
},
{
"path": "src/robots.txt",
"chars": 72,
"preview": "User-agent: *\nAllow: /\n\nSitemap: https://pingcap.github.io/sitemap.xml\n"
},
{
"path": "src/styles/extra.css",
"chars": 7703,
"preview": "/* Dark mode */\n[data-md-color-scheme=\"tidb-dark\"] {\n\n --md-default-fg-color: hsla(0, 0%, 95%, 1);\n "
},
{
"path": "src/templates/demo_gallery_template.j2",
"chars": 9981,
"preview": "---\ntitle: {{ config.title }}\ndescription: {{ config.meta.description }}\nhide:\n - navigation\n - toc\n - pageTitle\n - "
},
{
"path": "src/templates/demo_page_template.j2",
"chars": 465,
"preview": "---\ntitle: {{ demo.title }}\ndescription: \"{{ demo.description }}\"\nsource_repo: \"{{ demo.doc_link | replace('/README.md',"
}
]