[
  {
    "path": ".cursor/rules/styleguide.mdc",
    "content": "---\ndescription: \nglobs: \nalwaysApply: true\n---\n# Documentation review style guide\n\n## Behavior instruction\n\nYou are a **senior technical writer** who reviews TiDB documentation pull requests. Always provide ready-to-commit doc suggestions so that the PR author can commit them directly.\n\n## Note\n\n- When you finish the review, add comments directly to the PR instead of requesting changes, so that you do not block the pull request from being merged.\n- If the PR author is ti-chi-bot, only correct English grammar, spelling, and punctuation mistakes, if any.\n\n## Review aspects\n\n- Clarity, simplicity, completeness, and readability\n- Logical flow and sentence structure\n- Technical accuracy and terminology consistency\n\n## General writing principles\n\n- Correct English grammar, spelling, and punctuation mistakes, if any.\n- Make sure that TiDB users can easily understand the documentation.\n- Write in **second person** (\"you\") when addressing users.\n- Prefer **present tense** unless describing historical behavior.\n- Avoid unnecessary words and repetition.\n- Use **consistent terminology**. For example:\n\n    - ❌ Do not mix \"database\" and \"instance\".\n    - ✅ Use \"replicate\" instead of \"synchronize\" when referring to replicating data from one TiDB cluster to another.\n\n## Structure and format\n\n- Use sentence case for headings (e.g., `## Configure the cluster`).\n- Use ordered lists (`1.`, `2.`) for steps.\n- Put code snippets, command names, options, and paths in backticks (`` ` ``).\n\n## Markdown style\n\n- Add a blank line before and after headings and lists.\n- Use proper heading hierarchy (no jumping from `##` to `####`).\n\n## Common issues to flag\n\n- Passive voice overuse\n\n    _\"The cluster is started by TiUP\"_ → _\"TiUP starts the cluster\"_\n\n- Inconsistent use of technical terms\n\n    _\"TiDB Cloud Starter clusters\" vs. \"TiDB Cloud clusters\"_ – pick one.\n\n- Unclear step instructions\n\n    _\"Do it like before\"_ → _\"Repeat step 3 using the updated config file\"_\n\n- Grammar and spelling issues\n\n    _\"recieve\"_ → _\"receive\"_, _\"an TiDB instance\"_ → _\"a TiDB instance\"_\n\n## Special notes\n\n- Follow any existing terminology in our glossary (`/glossary.md` if available).\n- When in doubt, favor clarity over cleverness.\n- If something might confuse a new user, suggest a rewording.\n\n## Purpose of this style guide\n\nThis guide helps Gemini Code Assist provide actionable, high-quality suggestions for improving technical documentation, especially for PRs related to user guides, how-to articles, and product reference material."
  },
  {
    "path": ".github/workflows/docs.yml",
    "content": "name: Docs Publish\n\non:\n  push:\n    branches:\n      - main\n\npermissions:\n  contents: write\n\njobs:\n  deploy:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v4\n      - name: Configure Git Credentials\n        run: |\n          git config user.name github-actions[bot]\n          git config user.email 41898282+github-actions[bot]@users.noreply.github.com\n      - name: Install uv\n        uses: astral-sh/setup-uv@v5\n        with:\n          enable-cache: true\n      - name: Set up Python\n        run: uv python install\n      - run: echo \"cache_id=$(date --utc '+%V')\" >> $GITHUB_ENV\n      - uses: actions/cache@v4\n        with:\n          key: mkdocs-material-${{ env.cache_id }}\n          path: .cache\n          restore-keys: |\n            mkdocs-material-\n      - name: Install dependencies\n        run: uv sync\n      - name: Generate demo pages and gallery\n        run: make generate-demos\n      - name: Deploy docs\n        run: uv run mkdocs gh-deploy --force\n"
  },
  {
    "path": ".github/workflows/sitemap-sync.yml",
    "content": "name: Sync Sitemap\n\non:\n  push:\n    branches:\n      - gh-pages\n    paths:\n      - 'sitemap.xml'\n\npermissions:\n  contents: write\n\njobs:\n  sync-sitemap:\n    runs-on: ubuntu-latest\n    steps:\n      - name: Checkout repository\n        uses: actions/checkout@v4\n        with:\n          ref: gh-pages\n          token: ${{ secrets.GITHUB_TOKEN }}\n\n      - name: Configure Git Credentials\n        run: |\n          git config user.name github-actions[bot]\n          git config user.email 41898282+github-actions[bot]@users.noreply.github.com\n\n      - name: Check if sitemap.xml exists in root\n        id: check-sitemap\n        run: |\n          if [ -f \"sitemap.xml\" ]; then\n            echo \"exists=true\" >> $GITHUB_OUTPUT\n          else\n            echo \"exists=false\" >> $GITHUB_OUTPUT\n          fi\n\n      - name: Create ai directory if it doesn't exist\n        if: steps.check-sitemap.outputs.exists == 'true'\n        run: mkdir -p ai\n\n      - name: Copy sitemap.xml to ai/sitemap.xml\n        if: steps.check-sitemap.outputs.exists == 'true'\n        run: cp sitemap.xml ai/sitemap.xml\n\n      - name: Commit and push changes\n        if: steps.check-sitemap.outputs.exists == 'true'\n        run: |\n          git add ai/sitemap.xml\n          if git diff --staged --quiet; then\n            echo \"No changes to commit\"\n          else\n            git commit -m \"Sync sitemap.xml to ai/sitemap.xml\"\n            git push origin gh-pages\n          fi\n"
  },
  {
    "path": ".gitignore",
    "content": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packaging\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\nsdist/\nvar/\nwheels/\nshare/python-wheels/\n*.egg-info/\n.installed.cfg\n*.egg\nMANIFEST\n\n# PyInstaller\n#  Usually these files are written by a python script from a template\n#  before PyInstaller builds the exe, so as to inject date/other infos into it.\n*.manifest\n*.spec\n\n# Installer logs\npip-log.txt\npip-delete-this-directory.txt\n\n# Unit test / coverage reports\nhtmlcov/\n.tox/\n.nox/\n.coverage\n.coverage.*\n.cache\nnosetests.xml\ncoverage.xml\n*.cover\n*.py,cover\n.hypothesis/\n.pytest_cache/\ncover/\n\n# Translations\n*.mo\n*.pot\n\n# Django stuff:\n*.log\nlocal_settings.py\ndb.sqlite3\ndb.sqlite3-journal\n\n# Flask stuff:\ninstance/\n.webassets-cache\n\n# Scrapy stuff:\n.scrapy\n\n# Sphinx documentation\ndocs/_build/\n\n# PyBuilder\n.pybuilder/\ntarget/\n\n# Jupyter Notebook\n.ipynb_checkpoints\n\n# IPython\nprofile_default/\nipython_config.py\n\n# pyenv\n#   For a library or package, you might want to ignore these files since the code is\n#   intended to run in multiple environments; otherwise, check them in:\n.python-version\n\n# pipenv\n#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.\n#   However, in case of collaboration, if having platform-specific dependencies or dependencies\n#   having no cross-platform support, pipenv may install dependencies that don't work, or not\n#   install all needed dependencies.\n#Pipfile.lock\n\n# UV\n#   Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.\n#   This is especially recommended for binary packages to ensure reproducibility, and is more\n#   commonly ignored for libraries.\n#uv.lock\n\n# poetry\n#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in 
version control.\n#   This is especially recommended for binary packages to ensure reproducibility, and is more\n#   commonly ignored for libraries.\n#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control\n#poetry.lock\n\n# pdm\n#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.\n#pdm.lock\n#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it\n#   in version control.\n#   https://pdm.fming.dev/latest/usage/project/#working-with-version-control\n.pdm.toml\n.pdm-python\n.pdm-build/\n\n# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm\n__pypackages__/\n\n# Celery stuff\ncelerybeat-schedule\ncelerybeat.pid\n\n# SageMath parsed files\n*.sage.py\n\n# Environments\n.env\n.venv\nenv/\nvenv/\nENV/\nenv.bak/\nvenv.bak/\n\n# Spyder project settings\n.spyderproject\n.spyproject\n\n# Rope project settings\n.ropeproject\n\n# mkdocs documentation\n/site\n\n# mypy\n.mypy_cache/\n.dmypy.json\ndmypy.json\n\n# Pyre type checker\n.pyre/\n\n# pytype static type analyzer\n.pytype/\n\n# Cython debug symbols\ncython_debug/\n\n# PyCharm\n#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can\n#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore\n#  and can be added to the global gitignore or merged into this file.  For a more nuclear\n#  option (not recommended) you can uncomment the following to ignore the entire idea folder.\n.idea/\n\n# Ruff stuff:\n.ruff_cache/\n\n# PyPI configuration file\n.pypirc\n"
  },
  {
    "path": ".vscode/settings.json",
    "content": "{\n    \"cSpell.words\": [\n        \"FULLTEXT\",\n        \"Pydantic\",\n        \"getenv\",\n        \"jina\",\n        \"jinaai\",\n        \"Rerank\",\n        \"reranker\",\n        \"reranking\",\n        \"tablename\",\n        \"multimodal\"\n    ]\n}"
  },
  {
    "path": "DEVELOPMENT.md",
    "content": "# Development guide\n\nThis guide covers the development setup and maintenance of the TiDB for AI documentation site.\n\n## Quick start\n\n1. **Install uv** (a fast Python package and project manager):\n\n   ```bash\n   curl -LsSf https://astral.sh/uv/install.sh | sh\n   ```\n\n2. **Install dependencies**:\n\n   ```bash\n   make install\n   ```\n\n3. **Start the development server**:\n\n   ```bash\n   make serve\n   ```\n\n## Project structure\n\n```\n├── demos.yml                    # Demo Gallery configuration\n├── scripts/generate_demos.py    # Demo Gallery generation script\n├── src/\n│   ├── templates/               # Jinja2 templates\n│   └── ai/examples/             # Generated demo pages\n└── Makefile                     # Build commands\n```\n\n## Helpful commands\n\n```bash\n# Dependencies\nmake check          # Check dependencies and setup\nmake install        # Install or update dependencies\n\n# Development workflow\nmake serve          # Start the development server\nmake build          # Build the documentation site\nmake clean          # Clean build artifacts\n\n# Demo management\nmake generate-demos         # Generate demo pages and the gallery from the demos.yml configuration\nmake generate-demo-pages    # Generate only the demo pages\nmake generate-demo-gallery  # Generate only the demo gallery\n\n# Other\nmake help           # Show all available commands\n```\n\n## Maintain the Demo Gallery\n\nThe Demo Gallery showcases AI demos built with TiDB. You configure it in [demos.yml](demos.yml).\n\nTo regenerate the Demo Gallery from the configuration, run:\n\n```bash\nmake generate-demos\n```\n\n### How to add a new demo\n\nTo add a new demo, follow these steps:\n\n1. Add an entry to the `demos` array in `demos.yml` with a unique `id`, a `title`, a `description`, and display properties such as `icon`, `background`, and `cover_image`.\n\n   For example:\n\n   ```yaml\n   demos:\n     - id: \"basic\"\n       title: \"Basic Usage\"\n       description: \"Learn fundamental PyTiDB operations\"\n       icon: \"⚙️\"\n       background: \"linear-gradient(135deg, #10b981, var(--brand-color))\"\n       link: \"basic-with-pytidb/\"\n       doc_link: \"https://github.com/pingcap/pytidb/tree/main/examples/basic/README.md\"\n       cover_image: null\n   ```\n\n2. Add the demo `id` to the `demos` array of the appropriate category.\n\n   For example:\n\n   ```yaml\n   categories:\n     - id: \"featured\"\n       title: \"Search\"\n       demos: [\"image-search\"]\n   ```\n\n3. Run `make generate-demos` to regenerate the demo pages and gallery.\n4. Commit your changes.\n"
  },
  {
    "path": "LICENSE",
    "content": "Attribution-ShareAlike 3.0 Unported\n\n=======================================================================\n\nCREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE\nLEGAL SERVICES. DISTRIBUTION OF THIS LICENSE DOES NOT CREATE AN\nATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS\nINFORMATION ON AN \"AS-IS\" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES\nREGARDING THE INFORMATION PROVIDED, AND DISCLAIMS LIABILITY FOR DAMAGES\nRESULTING FROM ITS USE.\n\nLicense\n\nTHE WORK (AS DEFINED BELOW) IS PROVIDED UNDER THE TERMS OF THIS\nCREATIVE COMMONS PUBLIC LICENSE (\"CCPL\" OR \"LICENSE\"). THE WORK IS\nPROTECTED BY COPYRIGHT AND/OR OTHER APPLICABLE LAW. ANY USE OF THE WORK\nOTHER THAN AS AUTHORIZED UNDER THIS LICENSE OR COPYRIGHT LAW IS\nPROHIBITED.\n\nBY EXERCISING ANY RIGHTS TO THE WORK PROVIDED HERE, YOU ACCEPT AND\nAGREE TO BE BOUND BY THE TERMS OF THIS LICENSE. TO THE EXTENT THIS\nLICENSE MAY BE CONSIDERED TO BE A CONTRACT, THE LICENSOR GRANTS YOU THE\nRIGHTS CONTAINED HERE IN CONSIDERATION OF YOUR ACCEPTANCE OF SUCH TERMS\nAND CONDITIONS.\n\n1. Definitions\n\n   a. \"Adaptation\" means a work based upon the Work, or upon the Work\n   and other pre-existing works, such as a translation, adaptation,\n   derivative work, arrangement of music or other alterations of a\n   literary or artistic work, or phonogram or performance and includes\n   cinematographic adaptations or any other form in which the Work may\n   be recast, transformed, or adapted including in any form\n   recognizably derived from the original, except that a work that\n   constitutes a Collection will not be considered an Adaptation for\n   the purpose of this License. For the avoidance of doubt, where the\n   Work is a musical work, performance or phonogram, the\n   synchronization of the Work in timed-relation with a moving image\n   (\"synching\") will be considered an Adaptation for the purpose of\n   this License.\n\n   b. 
\"Collection\" means a collection of literary or artistic works,\n   such as encyclopedias and anthologies, or performances, phonograms\n   or broadcasts, or other works or subject matter other than works\n   listed in Section 1(f) below, which, by reason of the selection and\n   arrangement of their contents, constitute intellectual creations,\n   in which the Work is included in its entirety in unmodified form\n   along with one or more other contributions, each constituting\n   separate and independent works in themselves, which together are\n   assembled into a collective whole. A work that constitutes a\n   Collection will not be considered an Adaptation (as defined below)\n   for the purposes of this License.\n\n   c. \"Creative Commons Compatible License\" means a license that is\n   listed at https://creativecommons.org/compatiblelicenses that has\n   been approved by Creative Commons as being essentially equivalent\n   to this License, including, at a minimum, because that license: (i)\n   contains terms that have the same purpose, meaning and effect as\n   the License Elements of this License; and, (ii) explicitly permits\n   the relicensing of adaptations of works made available under that\n   license under this License or a Creative Commons jurisdiction\n   license with the same License Elements as this License.\n\n   d. \"Distribute\" means to make available to the public the original\n   and copies of the Work or Adaptation, as appropriate, through sale\n   or other transfer of ownership.\n\n   e. \"License Elements\" means the following high-level license\n   attributes as selected by Licensor and indicated in the title of\n   this License: Attribution, ShareAlike.\n\n   f. \"Licensor\" means the individual, individuals, entity or entities\n   that offer(s) the Work under the terms of this License.\n\n   g. 
\"Original Author\" means, in the case of a literary or artistic\n   work, the individual, individuals, entity or entities who created\n   the Work or if no individual or entity can be identified, the\n   publisher; and in addition (i) in the case of a performance the\n   actors, singers, musicians, dancers, and other persons who act,\n   sing, deliver, declaim, play in, interpret or otherwise perform\n   literary or artistic works or expressions of folklore; (ii) in the\n   case of a phonogram the producer being the person or legal entity\n   who first fixes the sounds of a performance or other sounds; and,\n   (iii) in the case of broadcasts, the organization that transmits\n   the broadcast.\n\n   h. \"Work\" means the literary and/or artistic work offered under the\n   terms of this License including without limitation any production\n   in the literary, scientific and artistic domain, whatever may be\n   the mode or form of its expression including digital form, such as\n   a book, pamphlet and other writing; a lecture, address, sermon or\n   other work of the same nature; a dramatic or dramatico-musical\n   work; a choreographic work or entertainment in dumb show; a musical\n   composition with or without words; a cinematographic work to which\n   are assimilated works expressed by a process analogous to\n   cinematography; a work of drawing, painting, architecture,\n   sculpture, engraving or lithography; a photographic work to which\n   are assimilated works expressed by a process analogous to\n   photography; a work of applied art; an illustration, map, plan,\n   sketch or three-dimensional work relative to geography, topography,\n   architecture or science; a performance; a broadcast; a phonogram; a\n   compilation of data to the extent it is protected as a\n   copyrightable work; or a work performed by a variety or circus\n   performer to the extent it is not otherwise considered a literary\n   or artistic work.\n\n   i. 
\"You\" means an individual or entity exercising rights under this\n   License who has not previously violated the terms of this License\n   with respect to the Work, or who has received express permission\n   from the Licensor to exercise rights under this License despite a\n   previous violation.\n\n   j. \"Publicly Perform\" means to perform public recitations of the\n   Work and to communicate to the public those public recitations, by\n   any means or process, including by wire or wireless means or public\n   digital performances; to make available to the public Works in such\n   a way that members of the public may access these Works from a\n   place and at a place individually chosen by them; to perform the\n   Work to the public by any means or process and the communication to\n   the public of the performances of the Work, including by public\n   digital performance; to broadcast and rebroadcast the Work by any\n   means including signs, sounds or images.\n   k. \"Reproduce\" means to make copies of the Work by any means\n   including without limitation by sound or visual recordings and the\n   right of fixation and reproducing fixations of the Work, including\n   storage of a protected performance or phonogram in digital form or\n   other electronic medium.\n\n2. Fair Dealing Rights. Nothing in this License is intended to reduce,\nlimit, or restrict any uses free from copyright or rights arising from\nlimitations or exceptions that are provided for in connection with the\ncopyright protection under copyright law or other applicable laws.\n\n3. License Grant. Subject to the terms and conditions of this License,\nLicensor hereby grants You a worldwide, royalty-free, non-exclusive,\nperpetual (for the duration of the applicable copyright) license to\nexercise the rights in the Work as stated below:\n\n   a. 
to Reproduce the Work, to incorporate the Work into one or more\n   Collections, and to Reproduce the Work as incorporated in the\n   Collections;\n\n   b. to create and Reproduce Adaptations provided that any such\n   Adaptation, including any translation in any medium, takes\n   reasonable steps to clearly label, demarcate or otherwise identify\n   that changes were made to the original Work. For example, a\n   translation could be marked \"The original work was translated from\n   English to Spanish,\" or a modification could indicate \"The original\n   work has been modified.\";\n\n   c. to Distribute and Publicly Perform the Work including as\n   incorporated in Collections; and,\n\n   d. to Distribute and Publicly Perform Adaptations.\n\n   e. For the avoidance of doubt:\n\n      i. Non-waivable Compulsory License Schemes. In those\n      jurisdictions in which the right to collect royalties through\n      any statutory or compulsory licensing scheme cannot be waived,\n      the Licensor reserves the exclusive right to collect such\n      royalties for any exercise by You of the rights granted under\n      this License;\n\n      ii. Waivable Compulsory License Schemes. In those jurisdictions\n      in which the right to collect royalties through any statutory or\n      compulsory licensing scheme can be waived, the Licensor waives\n      the exclusive right to collect such royalties for any exercise\n      by You of the rights granted under this License; and,\n\n      iii. Voluntary License Schemes. The Licensor waives the right to\n      collect royalties, whether individually or, in the event that\n      the Licensor is a member of a collecting society that\n      administers voluntary licensing schemes, via that society, from\n      any exercise by You of the rights granted under this License.\n\n   The above rights may be exercised in all media and formats whether\n   now known or hereafter devised. 
The above rights include the right\n   to make such modifications as are technically necessary to exercise\n   the rights in other media and formats. Subject to Section 8(f), all\n   rights not expressly granted by Licensor are hereby reserved.\n\n4. Restrictions. The license granted in Section 3 above is expressly\nmade subject to and limited by the following restrictions:\n\n   a. You may Distribute or Publicly Perform the Work only under the\n   terms of this License. You must include a copy of, or the Uniform\n   Resource Identifier (URI) for, this License with every copy of the\n   Work You Distribute or Publicly Perform. You may not offer or\n   impose any terms on the Work that restrict the terms of this\n   License or the ability of the recipient of the Work to exercise the\n   rights granted to that recipient under the terms of the License.\n   You may not sublicense the Work. You must keep intact all notices\n   that refer to this License and to the disclaimer of warranties with\n   every copy of the Work You Distribute or Publicly Perform. When You\n   Distribute or Publicly Perform the Work, You may not impose any\n   effective technological measures on the Work that restrict the\n   ability of a recipient of the Work from You to exercise the rights\n   granted to that recipient under the terms of the License. This\n   Section 4(a) applies to the Work as incorporated in a Collection,\n   but this does not require the Collection apart from the Work itself\n   to be made subject to the terms of this License. If You create a\n   Collection, upon notice from any Licensor You must, to the extent\n   practicable, remove from the Collection any credit as required by\n   Section 4(c), as requested. If You create an Adaptation, upon\n   notice from any Licensor You must, to the extent practicable,\n   remove from the Adaptation any credit as required by Section 4(c),\n   as requested.\n\n   b. 
You may Distribute or Publicly Perform an Adaptation only under\n   the terms of: (i) this License; (ii) a later version of this\n   License with the same License Elements as this License; (iii) a\n   Creative Commons jurisdiction license (either this or a later\n   license version) that contains the same License Elements as this\n   License (e.g., Attribution-ShareAlike 3.0 US)); (iv) a Creative\n   Commons Compatible License. If you license the Adaptation under one\n   of the licenses mentioned in (iv), you must comply with the terms\n   of that license. If you license the Adaptation under the terms of\n   any of the licenses mentioned in (i), (ii) or (iii) (the\n   \"Applicable License\"), you must comply with the terms of the\n   Applicable License generally and the following provisions: (I) You\n   must include a copy of, or the URI for, the Applicable License with\n   every copy of each Adaptation You Distribute or Publicly Perform;\n   (II) You may not offer or impose any terms on the Adaptation that\n   restrict the terms of the Applicable License or the ability of the\n   recipient of the Adaptation to exercise the rights granted to that\n   recipient under the terms of the Applicable License; (III) You must\n   keep intact all notices that refer to the Applicable License and to\n   the disclaimer of warranties with every copy of the Work as\n   included in the Adaptation You Distribute or Publicly Perform; (IV)\n   when You Distribute or Publicly Perform the Adaptation, You may not\n   impose any effective technological measures on the Adaptation that\n   restrict the ability of a recipient of the Adaptation from You to\n   exercise the rights granted to that recipient under the terms of\n   the Applicable License. This Section 4(b) applies to the Adaptation\n   as incorporated in a Collection, but this does not require the\n   Collection apart from the Adaptation itself to be made subject to\n   the terms of the Applicable License.\n\n   c. 
If You Distribute, or Publicly Perform the Work or any\n   Adaptations or Collections, You must, unless a request has been\n   made pursuant to Section 4(a), keep intact all copyright notices\n   for the Work and provide, reasonable to the medium or means You are\n   utilizing: (i) the name of the Original Author (or pseudonym, if\n   applicable) if supplied, and/or if the Original Author and/or\n   Licensor designate another party or parties (e.g., a sponsor\n   institute, publishing entity, journal) for attribution\n   (\"Attribution Parties\") in Licensor's copyright notice, terms of\n   service or by other reasonable means, the name of such party or\n   parties; (ii) the title of the Work if supplied; (iii) to the\n   extent reasonably practicable, the URI, if any, that Licensor\n   specifies to be associated with the Work, unless such URI does not\n   refer to the copyright notice or licensing information for the\n   Work; and (iv) , consistent with Ssection 3(b), in the case of an\n   Adaptation, a credit identifying the use of the Work in the\n   Adaptation (e.g., \"French translation of the Work by Original\n   Author,\" or \"Screenplay based on original Work by Original\n   Author\"). The credit required by this Section 4(c) may be\n   implemented in any reasonable manner; provided, however, that in\n   the case of a Adaptation or Collection, at a minimum such credit\n   will appear, if a credit for all contributing authors of the\n   Adaptation or Collection appears, then as part of these credits and\n   in a manner at least as prominent as the credits for the other\n   contributing authors. 
For the avoidance of doubt, You may only use\n   the credit required by this Section for the purpose of attribution\n   in the manner set out above and, by exercising Your rights under\n   this License, You may not implicitly or explicitly assert or imply\n   any connection with, sponsorship or endorsement by the Original\n   Author, Licensor and/or Attribution Parties, as appropriate, of You\n   or Your use of the Work, without the separate, express prior\n   written permission of the Original Author, Licensor and/or\n   Attribution Parties.\n\n   d. Except as otherwise agreed in writing by the Licensor or as may\n   be otherwise permitted by applicable law, if You Reproduce,\n   Distribute or Publicly Perform the Work either by itself or as part\n   of any Adaptations or Collections, You must not distort, mutilate,\n   modify or take other derogatory action in relation to the Work\n   which would be prejudicial to the Original Author's honor or\n   reputation. Licensor agrees that in those jurisdictions (e.g.\n   Japan), in which any exercise of the right granted in Section 3(b)\n   of this License (the right to make Adaptations) would be deemed to\n   be a distortion, mutilation, modification or other derogatory\n   action prejudicial to the Original Author's honor and reputation,\n   the Licensor will waive or not assert, as appropriate, this\n   Section, to the fullest extent permitted by the applicable national\n   law, to enable You to reasonably exercise Your right under Section\n   3(b) of this License (right to make Adaptations) but not otherwise.\n\n5. 
Representations, Warranties and Disclaimer\n\n   UNLESS OTHERWISE MUTUALLY AGREED TO BY THE PARTIES IN WRITING,\n   LICENSOR OFFERS THE WORK AS-IS AND MAKES NO REPRESENTATIONS OR\n   WARRANTIES OF ANY KIND CONCERNING THE WORK, EXPRESS, IMPLIED,\n   STATUTORY OR OTHERWISE, INCLUDING, WITHOUT LIMITATION, WARRANTIES\n   OF TITLE, MERCHANTIBILITY, FITNESS FOR A PARTICULAR PURPOSE,\n   NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS,\n   ACCURACY, OR THE PRESENCE OF ABSENCE OF ERRORS, WHETHER OR NOT\n   DISCOVERABLE. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OF\n   IMPLIED WARRANTIES, SO SUCH EXCLUSION MAY NOT APPLY TO YOU.\n\n6. Limitation on Liability.\n\n   EXCEPT TO THE EXTENT REQUIRED BY APPLICABLE LAW, IN NO EVENT WILL\n   LICENSOR BE LIABLE TO YOU ON ANY LEGAL THEORY FOR ANY SPECIAL,\n   INCIDENTAL, CONSEQUENTIAL, PUNITIVE OR EXEMPLARY DAMAGES ARISING\n   OUT OF THIS LICENSE OR THE USE OF THE WORK, EVEN IF LICENSOR HAS\n   BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.\n\n7. Termination\n\n   a. This License and the rights granted hereunder will terminate\n   automatically upon any breach by You of the terms of this License.\n   Individuals or entities who have received Adaptations or\n   Collections from You under this License, however, will not have\n   their licenses terminated provided such individuals or entities\n   remain in full compliance with those licenses. Sections 1, 2, 5, 6,\n   7, and 8 will survive any termination of this License.\n\n   b. Subject to the above terms and conditions, the license granted\n   here is perpetual (for the duration of the applicable copyright in\n   the Work). 
Notwithstanding the above, Licensor reserves the right\n   to release the Work under different license terms or to stop\n   distributing the Work at any time; provided, however that any such\n   election will not serve to withdraw this License (or any other\n   license that has been, or is required to be, granted under the\n   terms of this License), and this License will continue in full\n   force and effect unless terminated as stated above.\n\n8. Miscellaneous\n\n   a. Each time You Distribute or Publicly Perform the Work or a\n   Collection, the Licensor offers to the recipient a license to the\n   Work on the same terms and conditions as the license granted to You\n   under this License.\n\n   b. Each time You Distribute or Publicly Perform an Adaptation,\n   Licensor offers to the recipient a license to the original Work on\n   the same terms and conditions as the license granted to You under\n   this License.\n\n   c. If any provision of this License is invalid or unenforceable\n   under applicable law, it shall not affect the validity or\n   enforceability of the remainder of the terms of this License, and\n   without further action by the parties to this agreement, such\n   provision shall be reformed to the minimum extent necessary to make\n   such provision valid and enforceable.\n\n   d. No term or provision of this License shall be deemed waived and\n   no breach consented to unless such waiver or consent shall be in\n   writing and signed by the party to be charged with such waiver or\n   consent.\n\n   e. This License constitutes the entire agreement between the\n   parties with respect to the Work licensed here. There are no\n   understandings, agreements or representations with respect to the\n   Work not specified here. Licensor shall not be bound by any\n   additional provisions that may appear in any communication from\n   You. This License may not be modified without the mutual written\n   agreement of the Licensor and You.\n\n   f. 
The rights granted under, and the subject matter referenced, in\n   this License were drafted utilizing the terminology of the Berne\n   Convention for the Protection of Literary and Artistic Works (as\n   amended on September 28, 1979), the Rome Convention of 1961, the\n   WIPO Copyright Treaty of 1996, the WIPO Performances and Phonograms\n   Treaty of 1996 and the Universal Copyright Convention (as revised\n   on July 24, 1971). These rights and subject matter take effect in\n   the relevant jurisdiction in which the License terms are sought to\n   be enforced according to the corresponding provisions of the\n   implementation of those treaty provisions in the applicable\n   national law. If the standard suite of rights granted under\n   applicable copyright law includes additional rights not granted\n   under this License, such additional rights are deemed to be\n   included in the License; this License is not intended to restrict\n   the license of any rights under applicable law.\n\n=======================================================================\n\nCreative Commons Notice\n\nCreative Commons is not a party to this License, and makes no warranty\nwhatsoever in connection with the Work. Creative Commons will not be\nliable to You or any party on any legal theory for any damages\nwhatsoever, including without limitation any general, special,\nincidental or consequential damages arising in connection to this\nlicense. Notwithstanding the foregoing two (2) sentences, if Creative\nCommons has expressly identified itself as the Licensor hereunder, it\nshall have all rights and obligations of Licensor.\n\nExcept for the limited purpose of indicating to the public that the\nWork is licensed under the CCPL, Creative Commons does not authorize\nthe use by either party of the trademark \"Creative Commons\" or any\nrelated trademark or logo of Creative Commons without the prior written\nconsent of Creative Commons. 
Any permitted use will be in compliance\nwith Creative Commons' then-current trademark usage guidelines, as may\nbe published on its website or otherwise made available upon request\nfrom time to time. For the avoidance of doubt, this trademark\nrestriction does not form part of the License.\n\nCreative Commons may be contacted at https://creativecommons.org/."
  },
  {
    "path": "Makefile",
    "content": "# TiDB for AI Documentation Site\n\n.PHONY: help install generate-demos generate-demo-pages generate-demo-gallery serve build clean check\n\nhelp:\n\t@echo \"TiDB for AI Documentation Site\"\n\t@echo \"=============================\"\n\t@echo \"Available commands:\"\n\t@echo \"  install         - Install dependencies using UV\"\n\t@echo \"  generate-demos  - Generate demo pages and gallery from demos.yml config\"\n\t@echo \"  generate-demo-pages - Only generate demo pages.\"\n\t@echo \"  generate-demo-gallery - Only generate demo gallery.\"\n\t@echo \"  serve           - Start the development server\"\n\t@echo \"  build           - Build the documentation site\"\n\t@echo \"  clean           - Clean build artifacts\"\n\t@echo \"  check           - Check dependencies and project setup\"\n\t@echo \"\"\n\t@echo \"Demo gallery configuration:\"\n\t@echo \"  Edit demos.yml in the project root to manage gallery content\"\n\t@echo \"\"\n\t@echo \"Prerequisites:\"\n\t@echo \"  UV package manager - curl -LsSf https://astral.sh/uv/install.sh | sh\"\n\ninstall:\n\tuv pip install -e .\n\ngenerate-demos:\n\tpython scripts/generate_demos.py\n\ngenerate-demo-pages:\n\tpython scripts/generate_demos.py --skip-gallery\n\ngenerate-demo-gallery:\n\tpython scripts/generate_demos.py --skip-demos\n\nserve:\n\tmkdocs serve\n\nbuild:\n\tmkdocs build\n\nclean:\n\trm -rf site/\n\trm -rf .mkdocs_cache/\n\ncheck:\n\tpython scripts/check_dependencies.py "
  },
  {
    "path": "README.md",
    "content": "# TiDB for AI\n\nTiDB is an open-source, distributed SQL database designed for modern AI applications, offering seamless scalability, real-time analytics, and unified storage for vectors, documents, knowledge graphs, operational data, and more.\n\n🚀 Follow our [Quickstart Guide](https://pingcap.github.io/ai/quickstart/) to begin building your first AI application with TiDB.\n\n## PyTiDB\n\nTiDB provides a Python SDK and a series of integrations with popular AI frameworks to help developers build AI applications efficiently.\n\nTo install the TiDB Python SDK, run the following command:\n\n```bash\npip install pytidb\n```\n\nExamples:\n\n- Getting Started: [Basic Usage](https://pingcap.github.io/ai/examples/basic-with-pytidb/), [Auto Embedding](https://pingcap.github.io/ai/examples/auto-embedding-with-pytidb/)\n- Search & Retrieval: [Vector Search](https://pingcap.github.io/ai/examples/vector-search-with-pytidb/), [Fulltext Search](https://pingcap.github.io/ai/examples/fulltext-search-with-pytidb/), [Hybrid Search](https://pingcap.github.io/ai/examples/hybrid-search-with-pytidb/), [Image Search](https://pingcap.github.io/ai/examples/image-search-with-pytidb/)\n- AI Applications: [RAG](https://pingcap.github.io/ai/examples/rag-with-pytidb/), [Text2SQL](https://pingcap.github.io/ai/examples/text2sql-with-pytidb/), [Memory](https://pingcap.github.io/ai/examples/memory-with-pytidb/)\n\nIntegrations:\n\n- AI Frameworks: [LlamaIndex](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-llamaindex/), [LangChain](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-langchain/)\n- ORM Libraries: [SQLAlchemy](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-sqlalchemy/), [Django-ORM](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-django-orm/), [Peewee](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-peewee/)\n- AI Services: 
[Bedrock](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-amazon-bedrock/)\n- Embedding Models/Services: [Jina AI](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-jinaai-embedding/)\n\n## Contribute\n\nWe welcome contributions to improve the TiDB for AI documentation! For development setup, maintenance scripts, and detailed contribution guidelines, see [DEVELOPMENT.md](DEVELOPMENT.md).\n\n## FAQ\n\n### How can I get support?\n\n- [Join our Discord](https://discord.com/invite/vYU9h56kAX) (recommended)\n- [Ask questions in our forum](https://ask.pingcap.com/)\n- [Send support tickets](https://tidb.support.pingcap.com/)\n"
  },
  {
    "path": "demos.yml",
    "content": "# Demo Gallery Configuration\ntitle: \"Demo Gallery\"\ndescription: |\n  Explore hands-on demos showcasing how TiDB empowers AI applications.<br>\n  Get started quickly with <a href='https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=demo_gallery' target='_blank' rel='noopener noreferrer' class='gallery-cta-link'><b>TiDB Cloud Starter</b></a> to build your own AI-powered solutions.\n\nmeta:\n  description: \"Explore hands-on demos showcasing how TiDB empowers AI applications. Get started quickly with TiDB Cloud Starter to build your own AI-powered solutions.\"\n\ncategories:\n  - id: \"featured\"\n    title: \"⭐ Featured\"\n    demos: [\"image-search\", \"rag\", \"memory\"]\n    \n  - id: \"getting-started\"\n    title: \"🚀 Getting Started\"\n    demos: [\"basic\", \"auto-embedding\"]\n    \n  - id: \"search\"\n    title: \"🔍 Search & Retrieval\"\n    demos: [\"vector-search\", \"fulltext-search\", \"hybrid-search\", \"image-search\"]\n    \n  - id: \"ai-apps\"\n    title: \"🤖 AI Applications\"\n    demos: [\"rag\", \"memory\", \"text2sql\"]\n\ndemos:\n\n  - id: \"image-search\"\n    title: \"Image Search\"\n    description: \"Build an image search application using multimodal embeddings for both text-to-image and image-to-image search.\"\n    category: \"search\"\n    icon: null\n    background: null\n    link: \"image-search-with-pytidb/\"\n    doc_link: \"https://github.com/pingcap/pytidb/tree/main/examples/image_search/README.md\"\n    cover_image: \"https://github.com/user-attachments/assets/7ba9733a-4d1f-4094-8edb-58731ebd08e9\"\n    \n  - id: \"rag\"\n    title: \"RAG\"\n    description: \"Build a RAG application that combines document retrieval with language generation.\"\n    category: \"ai-apps\"\n    icon: null\n    background: null\n    link: \"rag-with-pytidb/\"\n    doc_link: \"https://github.com/pingcap/pytidb/tree/main/examples/rag/README.md\"\n    cover_image: 
\"https://github.com/user-attachments/assets/dfd85672-65ce-4a46-8dd2-9f77d826363e\"\n    \n  - id: \"basic\"\n    title: \"Basic Usage\"\n    description: \"Learn fundamental PyTiDB operations including database connection, table creation, and data manipulation.\"\n    category: \"getting-started\"\n    icon: \"⚙️\"\n    background: \"linear-gradient(135deg, #10b981, var(--brand-color))\"\n    link: \"basic-with-pytidb/\"\n    doc_link: \"https://github.com/pingcap/pytidb/tree/main/examples/basic/README.md\"\n    cover_image: null\n    \n  - id: \"auto-embedding\"\n    title: \"Auto Embedding\"\n    description: \"Automatically generate embeddings for your text data using built-in embedding models.\"\n    category: \"getting-started\"\n    icon: \"🤖\"\n    background: \"radial-gradient(circle at center, #8b5cf6 0%, var(--brand-color) 100%)\"\n    link: \"auto-embedding-with-pytidb/\"\n    doc_link: \"https://github.com/pingcap/pytidb/tree/main/examples/auto_embedding/README.md\"\n    cover_image: null\n    \n  - id: \"vector-search\"\n    title: \"Vector Search\"\n    description: \"Implement semantic search using vector embeddings to find similar content.\"\n    category: \"search\"\n    icon: null\n    background: null\n    link: \"vector-search-with-pytidb/\"\n    doc_link: \"https://github.com/pingcap/pytidb/tree/main/examples/vector_search/README.md\"\n    cover_image: \"https://github.com/user-attachments/assets/6d7783a5-ce9c-4dcc-8b95-49d5f0ca735a\"\n    \n  - id: \"fulltext-search\"\n    title: \"Fulltext Search\"\n    description: \"Perform traditional text search using MySQL fulltext search capabilities.\"\n    category: \"search\"\n    icon: null\n    background: null\n    link: \"fulltext-search-with-pytidb/\"\n    doc_link: \"https://github.com/pingcap/pytidb/tree/main/examples/fulltext_search/README.md\"\n    cover_image: \"https://github.com/user-attachments/assets/c81ddad4-f996-4b1f-85c0-5cbb55bc2a3a\"\n    \n  - id: \"hybrid-search\"\n    title: 
\"Hybrid Search\"\n    description: \"Combine vector search and fulltext search for more comprehensive results.\"\n    category: \"search\"\n    icon: null\n    background: null\n    link: \"hybrid-search-with-pytidb/\"\n    doc_link: \"https://github.com/pingcap/pytidb/tree/main/examples/hybrid_search/README.md\"\n    cover_image: \"https://github.com/user-attachments/assets/6e1c639d-2160-44c8-86b4-958913b9eca5\"\n    \n  - id: \"memory\"\n    title: \"Memory\"\n    description: \"Implement conversation memory for chatbots and conversational AI applications.\"\n    category: \"ai-apps\"\n    icon: null\n    background: null\n    link: \"memory-with-pytidb/\"\n    doc_link: \"https://github.com/pingcap/pytidb/tree/main/examples/memory/README.md\"\n    cover_image: \"https://github.com/user-attachments/assets/74dee96b-ea20-49dc-ad27-679faa5bf9b8\"\n    \n  - id: \"text2sql\"\n    title: \"Text2SQL\"\n    description: \"Convert natural language queries into SQL statements using AI models.\"\n    category: \"ai-apps\"\n    icon: \"💬\"\n    background: \"linear-gradient(135deg, #06b6d4, var(--brand-color))\"\n    link: \"text2sql-with-pytidb/\"\n    doc_link: \"https://github.com/pingcap/pytidb/tree/main/examples/text2sql/README.md\"\n    cover_image: null\n\n# CTA section configuration\ncta:\n  title: \"Ready to build your AI application?\"\n  description: \"Start your AI journey with TiDB Cloud Starter. Follow our quickstart guide to build your first AI-powered application in minutes, or explore specific examples for your use case.\"\n  buttons:\n    - text: \"Try TiDB Cloud Starter\"\n      url: \"https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme\"\n      type: \"primary\"\n      external: true\n    - text: \"View Quickstart Guide\"\n      url: \"/ai/quickstart/\"\n      type: \"secondary\"\n      external: false \n"
  },
  {
    "path": "mkdocs.yml",
    "content": "site_name: TiDB for AI\nsite_url: https://pingcap.github.io\nrepo_name: pingcap/tidb\nrepo_url: https://github.com/pingcap/tidb\nedit_uri: https://github.com/pingcap/pingcap.github.io/tree/main/src\ndocs_dir: src\nwatch:\n  - src\n\ntheme:\n  name: material\n  logo: assets/logo.svg\n  favicon: assets/favicon.ico\n  color_mode: dark\n  palette:\n    # Palette toggle for dark mode\n    - scheme: tidb-dark\n      primary: custom\n      toggle:\n        icon: material/weather-night\n        name: Switch to light mode\n    # Palette toggle for light mode\n    - scheme: tidb-light\n      primary: custom\n      toggle:\n        icon: material/weather-sunny\n        name: Switch to dark mode\n  features:\n    - content.code.copy\n    - content.tabs.link\n    - content.action.edit\n    - content.tooltips\n    - toc.follow\n    - navigation.top\n    - navigation.tabs\n    - navigation.tabs.sticky\n    - navigation.footer\n    - navigation.tracking\n    - navigation.instant\n    - content.footnote.tooltips\n  icon:\n    repo: fontawesome/brands/github\n    annotation: material/arrow-right-circle\n\nplugins:\n  - search\n  # - autorefs\n  # - mkdocs-jupyter\n  - redirects:\n      redirect_maps:\n        \"index.md\": \"ai/index.md\"\n        \"ai/integrations/mcp.md\": \"ai/integrations/tidb-mcp-server.md\"\n\nmarkdown_extensions:\n  - admonition\n  - footnotes\n  - pymdownx.critic\n  - pymdownx.caret\n  - pymdownx.keys\n  - pymdownx.mark\n  - pymdownx.tilde\n  - pymdownx.details\n  - pymdownx.highlight:\n      anchor_linenums: true\n      line_spans: __span\n      pygments_lang_class: true\n  - pymdownx.inlinehilite\n  - pymdownx.snippets:\n      base_path: ..\n      dedent_subsections: true\n  - pymdownx.superfences\n  - pymdownx.tabbed:\n      alternate_style: true\n  - md_in_html\n  - abbr\n  - attr_list\n  - pymdownx.snippets\n  - pymdownx.emoji:\n      emoji_index: !!python/name:material.extensions.emoji.twemoji\n      emoji_generator: 
!!python/name:material.extensions.emoji.to_svg\n  - markdown.extensions.toc:\n      baselevel: 1\n      permalink: \"\"\n  - pymdownx.arithmatex:\n      generic: true\n\nnav:\n  - Home:\n    - TiDB for AI: ai/index.md\n    - 🚀 Quickstart: ai/quickstart.md\n    - 📖 Concepts:\n      - Vector Search: ai/concepts/vector-search.md\n    - 📚 Guides:\n      - Connect to database: ai/guides/connect.md\n      - Working with tables: ai/guides/tables.md\n      - Vector Search: ai/guides/vector-search.md\n      - Fulltext Search: ai/guides/fulltext-search.md\n      - Hybrid Search: ai/guides/hybrid-search.md\n      - Image Search: ai/guides/image-search.md\n      - Auto Embedding: ai/guides/auto-embedding.md\n      - Reranking: ai/guides/reranking.md\n      - Filtering: ai/guides/filtering.md\n    - 💡 Demos:\n      - Gallery: ai/examples/index.md\n      - Basic Usage: ai/examples/basic-with-pytidb.md\n      - Auto Embedding: ai/examples/auto-embedding-with-pytidb.md\n      - Vector Search: ai/examples/vector-search-with-pytidb.md\n      - Fulltext Search: ai/examples/fulltext-search-with-pytidb.md\n      - Hybrid Search: ai/examples/hybrid-search-with-pytidb.md\n      - Image Search: ai/examples/image-search-with-pytidb.md\n      - RAG: ai/examples/rag-with-pytidb.md\n      - Memory: ai/examples/memory-with-pytidb.md\n      - Text2SQL: ai/examples/text2sql-with-pytidb.md\n    - 🔌 Integrations:\n      - MCP Integration:\n        - TiDB MCP Server: ai/integrations/tidb-mcp-server.md\n      - IDE & Tool Integration:\n        - Cursor: ai/integrations/tidb-mcp-cursor.md\n        - Claude Desktop: ai/integrations/tidb-mcp-claude-desktop.md\n        - Claude Code: ai/integrations/tidb-mcp-claude-code.md\n        - Visual Studio Code: ai/integrations/tidb-mcp-vscode.md\n        - Windsurf: ai/integrations/tidb-mcp-windsurf.md\n      - AI Frameworks:\n        - LlamaIndex: ai/integrations/llamaindex.md\n        - LangChain: ai/integrations/langchain.md\n      - Embeddings:\n        - 
Overview: ai/integrations/embedding-overview.md\n        - TiDB Cloud Hosted: ai/integrations/embedding-tidb-cloud-hosted.md\n        - OpenAI: ai/integrations/embedding-openai.md\n        - OpenAI Compatible: ai/integrations/embedding-openai-compatible.md\n        - Cohere: ai/integrations/embedding-cohere.md\n        - Jina AI: ai/integrations/embedding-jinaai.md\n        - Google Gemini: ai/integrations/embedding-gemini.md\n        - Hugging Face: ai/integrations/embedding-huggingface.md\n        - NVIDIA NIM: ai/integrations/embedding-nvidia-nim.md\n\n  - Concepts:\n    - Vector Search: ai/concepts/vector-search.md\n  - Guides:\n    - Connect to database: ai/guides/connect.md\n    - Working with tables: ai/guides/tables.md\n    - Vector Search: ai/guides/vector-search.md\n    - Fulltext Search: ai/guides/fulltext-search.md\n    - Hybrid Search: ai/guides/hybrid-search.md\n    - Image Search: ai/guides/image-search.md\n    - Auto Embedding: ai/guides/auto-embedding.md\n    - Reranking: ai/guides/reranking.md\n    - Filtering: ai/guides/filtering.md\n    - Transaction: ai/guides/transaction.md\n    - Raw Queries: ai/guides/raw-queries.md\n    - Multiple Table Joins: ai/guides/joins.md\n  - Demos:\n    - Gallery: ai/examples/index.md\n    - Basic Usage: ai/examples/basic-with-pytidb.md\n    - Auto Embedding: ai/examples/auto-embedding-with-pytidb.md\n    - Vector Search: ai/examples/vector-search-with-pytidb.md\n    - Fulltext Search: ai/examples/fulltext-search-with-pytidb.md\n    - Hybrid Search: ai/examples/hybrid-search-with-pytidb.md\n    - Image Search: ai/examples/image-search-with-pytidb.md\n    - RAG: ai/examples/rag-with-pytidb.md\n    - Memory: ai/examples/memory-with-pytidb.md\n    - Text2SQL: ai/examples/text2sql-with-pytidb.md\n  - Integrations:\n    - MCP Integration:\n      - TiDB MCP Server: ai/integrations/tidb-mcp-server.md\n    - IDE & Tool Integration:\n      - Cursor: ai/integrations/tidb-mcp-cursor.md\n      - Claude Desktop: 
ai/integrations/tidb-mcp-claude-desktop.md\n      - Claude Code: ai/integrations/tidb-mcp-claude-code.md\n      - Visual Studio Code: ai/integrations/tidb-mcp-vscode.md\n      - Windsurf: ai/integrations/tidb-mcp-windsurf.md\n    - AI Frameworks:\n      - LlamaIndex: ai/integrations/llamaindex.md\n      - LangChain: ai/integrations/langchain.md\n    - Embeddings:\n      - Overview: ai/integrations/embedding-overview.md\n      - TiDB Cloud Hosted: ai/integrations/embedding-tidb-cloud-hosted.md\n      - OpenAI: ai/integrations/embedding-openai.md\n      - OpenAI Compatible: ai/integrations/embedding-openai-compatible.md\n      - Cohere: ai/integrations/embedding-cohere.md\n      - Jina AI: ai/integrations/embedding-jinaai.md\n      - Google Gemini: ai/integrations/embedding-gemini.md\n      - Hugging Face: ai/integrations/embedding-huggingface.md\n      - NVIDIA NIM: ai/integrations/embedding-nvidia-nim.md\n\nextra:\n  social:\n    - icon: fontawesome/brands/github\n      link: https://github.com/pingcap/pytidb\n    - icon: fontawesome/brands/x-twitter\n      link: https://twitter.com/TiDB_Developer\n    - icon: fontawesome/brands/linkedin\n      link: https://www.linkedin.com/company/pingcap\n  analytics:\n    provider: google\n    property: G-W2XGDSWFBH\n\nextra_css:\n  - styles/extra.css\n\nextra_javascript:\n  - https://unpkg.com/mathjax@3/es5/tex-mml-chtml.js\n"
  },
  {
    "path": "pyproject.toml",
    "content": "# TiDB for AI Documentation\n# This project uses UV for fast dependency management\n# Install UV: curl -LsSf https://astral.sh/uv/install.sh | sh\n# Install deps: make install (which runs: uv pip install -e .)\n\n[project]\nname = \"docs\"\nversion = \"0.1.0\"\ndescription = \"The documentation for TiDB and its AI capabilities\"\nreadme = \"README.md\"\nrequires-python = \">=3.12\"\ndependencies = [\n    \"mkdocs-material>=9.6.12\",\n    \"mkdocs-jupyter>=0.25.1\",\n    \"mkdocstrings[python]>=0.29.1\",\n    \"mkdocs>=1.6.1\",\n    \"mkdocs-redirects>=1.2.2\",\n    \"requests>=2.31.0\",\n    \"PyYAML>=6.0\",\n    \"Jinja2>=3.1.0\",\n    \"click>=8.0.0\",\n]\n"
  },
  {
    "path": "scripts/check_dependencies.py",
    "content": "#!/usr/bin/env python3\n\"\"\"Simple dependency check for TiDB for AI documentation.\"\"\"\n\nimport sys\nimport subprocess\n\ndef main():\n    print(\"🚀 TiDB for AI Documentation - Quick Check\")\n    \n    # Check UV\n    try:\n        result = subprocess.run([\"uv\", \"--version\"], capture_output=True, text=True)\n        if result.returncode == 0:\n            print(\"✅ UV is available\")\n        else:\n            print(\"❌ UV not working\")\n            return 1\n    except FileNotFoundError:\n        print(\"❌ UV not found - Install: curl -LsSf https://astral.sh/uv/install.sh | sh\")\n        return 1\n    \n    # Check basic imports\n    try:\n        import mkdocs, requests, yaml\n        print(\"✅ Dependencies are installed\")\n    except ImportError as e:\n        print(f\"❌ Missing dependency: {e}\")\n        print(\"💡 Run: make install\")\n        return 1\n    \n    print(\"🎉 Ready to go! Run 'make serve' to start.\")\n    return 0\n\nif __name__ == \"__main__\":\n    sys.exit(main()) "
  },
  {
    "path": "scripts/generate_demos.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nGenerate demo gallery and individual demo documentation pages.\nThis script reads configuration from demos.yml and generates:\n1. Individual demo pages using demo_page_template.j2\n2. Gallery index page using demo_gallery_template.j2\n\"\"\"\n\nimport sys\nimport requests\nimport re\nimport yaml\nfrom pathlib import Path\nfrom jinja2 import Environment, FileSystemLoader\nimport click\n\n# Configuration\nLOCAL_EXAMPLES_DIR = Path(\"src/ai/examples\")\nCONFIG_FILE = Path(\"demos.yml\")\nDEMO_TEMPLATE_FILE = Path(\"src/templates/demo_page_template.j2\")\nGALLERY_TEMPLATE_FILE = Path(\"src/templates/demo_gallery_template.j2\")\nOUTPUT_FILE = LOCAL_EXAMPLES_DIR / \"index.md\"\n\n\ndef load_config():\n    \"\"\"Load configuration from YAML file.\"\"\"\n    if not CONFIG_FILE.exists():\n        raise click.ClickException(f\"Configuration file {CONFIG_FILE} not found.\")\n    \n    try:\n        with open(CONFIG_FILE, 'r', encoding='utf-8') as f:\n            config = yaml.safe_load(f)\n        return config\n    except yaml.YAMLError as e:\n        raise click.ClickException(f\"Error parsing {CONFIG_FILE}: {e}\")\n\n\ndef fetch_readme_from_doc_link(doc_link):\n    \"\"\"Fetch README.md content from the provided doc_link.\"\"\"\n    # Convert GitHub tree URL to raw content URL\n    if \"github.com\" in doc_link and \"/tree/\" in doc_link:\n        raw_url = doc_link.replace(\"github.com\", \"raw.githubusercontent.com\").replace(\"/tree/\", \"/\")\n    elif doc_link.endswith(\"/README.md\"):\n        raw_url = doc_link\n    else:\n        # Assume it's a directory link, append README.md\n        raw_url = doc_link.rstrip('/') + '/README.md'\n        if \"github.com\" in raw_url and \"/tree/\" in raw_url:\n            raw_url = raw_url.replace(\"github.com\", \"raw.githubusercontent.com\").replace(\"/tree/\", \"/\")\n    \n    try:\n        response = requests.get(raw_url, timeout=30)\n        response.raise_for_status()\n        return 
response.text\n    except requests.RequestException as e:\n        click.echo(f\"Error fetching README from {raw_url}: {e}\", err=True)\n        return None\n\n\ndef extract_repo_info_from_doc_link(doc_link):\n    \"\"\"Extract repository and path information from doc_link.\"\"\"\n    if \"github.com\" not in doc_link:\n        return None, None, None, None\n        \n    # Parse URL to extract owner, repo, and path\n    parts = doc_link.replace(\"https://github.com/\", \"\").split(\"/\")\n    if len(parts) < 2:\n        return None, None, None, None\n        \n    owner, repo = parts[0], parts[1]\n    \n    # Extract path after /tree/branch/\n    if \"/tree/\" in doc_link:\n        try:\n            tree_index = parts.index(\"tree\")\n            if len(parts) > tree_index + 2:  # owner/repo/tree/branch/path...\n                branch = parts[tree_index + 1]\n                path_parts = parts[tree_index + 2:]\n                # Remove README.md if present\n                if path_parts and path_parts[-1] == \"README.md\":\n                    path_parts = path_parts[:-1]\n                path = \"/\".join(path_parts)\n                return owner, repo, branch, path\n        except ValueError:\n            pass\n    \n    return None, None, None, None\n\n\ndef process_readme_content(content, demo_config):\n    \"\"\"Process README content to adapt it for the documentation site.\"\"\"\n    if not content:\n        return None\n    \n    # Extract repository info\n    owner, repo, branch, example_path = extract_repo_info_from_doc_link(demo_config['doc_link'])\n    \n    if not all([owner, repo, branch, example_path]):\n        click.echo(f\"Warning: Could not extract repo info from {demo_config['doc_link']}\", err=True)\n        return content\n    \n    base_repo_url = f\"https://github.com/{owner}/{repo}\"\n    base_raw_url = f\"https://raw.githubusercontent.com/{owner}/{repo}/{branch}\"\n    \n    # Fix relative image links first so they point to raw file content\n    
processed_content = re.sub(\n        r'!\\[([^\\]]*)\\]\\((?!https?://)(.*?)\\)',\n        f'![\\\\1]({base_raw_url}/{example_path}/\\\\2)',\n        content\n    )\n    \n    # Fix the remaining relative links to point to the original repository\n    processed_content = re.sub(\n        r'\\]\\((?!https?://)(.*?)\\)',\n        f']({base_repo_url}/tree/{branch}/{example_path}/\\\\1)',\n        processed_content\n    )\n    \n    return processed_content\n\n\ndef create_demo_page(demo_config, content):\n    \"\"\"Create a markdown file for a demo using the demo template.\"\"\"\n    if not DEMO_TEMPLATE_FILE.exists():\n        raise click.ClickException(f\"Demo template file {DEMO_TEMPLATE_FILE} not found.\")\n    \n    # Create the local examples directory if it doesn't exist\n    LOCAL_EXAMPLES_DIR.mkdir(parents=True, exist_ok=True)\n    \n    try:\n        # Set up Jinja2 environment\n        env = Environment(loader=FileSystemLoader(\"src/templates\"))\n        template = env.get_template('demo_page_template.j2')\n        \n        # Render the template\n        rendered_content = template.render(\n            demo=demo_config,\n            content=content\n        )\n        \n        # Create the markdown file\n        filename = f\"{demo_config['id'].replace('_', '-')}-with-pytidb.md\"\n        filepath = LOCAL_EXAMPLES_DIR / filename\n        \n        with open(filepath, 'w', encoding='utf-8') as f:\n            f.write(rendered_content)\n        \n        return filepath\n    except Exception as e:\n        raise click.ClickException(f\"Error creating demo page for {demo_config['id']}: {e}\")\n\n\ndef generate_gallery_page(config):\n    \"\"\"Generate the gallery page using Jinja2 template.\"\"\"\n    if not GALLERY_TEMPLATE_FILE.exists():\n        raise click.ClickException(f\"Gallery template file {GALLERY_TEMPLATE_FILE} not found.\")\n    \n    try:\n        # Set up Jinja2 environment\n        env = Environment(loader=FileSystemLoader(\"src/templates\"))\n        template = 
env.get_template('demo_gallery_template.j2')\n        \n        # Render the template\n        rendered_content = template.render(\n            config=config,\n            categories=config.get('categories', {}),\n            demos=config.get('demos', {})\n        )\n        \n        # Write the rendered content to the output file\n        with open(OUTPUT_FILE, 'w', encoding='utf-8') as f:\n            f.write(rendered_content)\n        \n        return True\n    except Exception as e:\n        raise click.ClickException(f\"Error generating gallery page: {e}\")\n\n\ndef sync_demo_docs(config, fetch_from_remote=True):\n    \"\"\"Sync demo documentation files based on demos.yml configuration.\"\"\"\n    if not fetch_from_remote:\n        return []\n    \n    created_files = []\n    demos_config = config.get('demos', [])\n    \n    with click.progressbar(demos_config, label='Processing demos') as demos:\n        for demo_config in demos:\n            demo_id = demo_config['id']\n            doc_link = demo_config.get('doc_link')\n            \n            if not doc_link:\n                click.echo(f\"Warning: No doc_link found for demo '{demo_id}', skipping...\", err=True)\n                continue\n            \n            # Fetch README content\n            readme_content = fetch_readme_from_doc_link(doc_link)\n            \n            if readme_content:\n                # Process content\n                processed_content = process_readme_content(readme_content, demo_config)\n                \n                if processed_content:\n                    # Create demo page\n                    filepath = create_demo_page(demo_config, processed_content)\n                    if filepath:\n                        created_files.append((demo_id, filepath))\n                else:\n                    click.echo(f\"Failed to process content for {demo_id}\", err=True)\n            else:\n                click.echo(f\"Failed to fetch README for {demo_id}\", err=True)\n   
 \n    return created_files\n\n\n@click.command()\n@click.option('--skip-demos', is_flag=True, \n              help='Skip generating individual demo pages from remote repositories')\n@click.option('--skip-gallery', is_flag=True,\n              help='Skip generating the demo gallery index page')\n@click.option('--verbose', '-v', is_flag=True, help='Enable verbose output')\ndef main(skip_demos, skip_gallery, verbose):\n    \"\"\"Generate demo gallery and individual demo documentation pages.\n    \n    This script reads configuration from demos.yml and generates:\n    1. Individual demo pages using demo_page_template.j2\n    2. Gallery index page using demo_gallery_template.j2\n    \n    By default, both demo pages and gallery are generated.\n    \"\"\"\n    if verbose:\n        click.echo(\"Running in verbose mode...\")\n    \n    # Load configuration\n    try:\n        config = load_config()\n    except click.ClickException:\n        raise\n    \n    created_files = []\n    \n    # Generate demo pages (unless skipped)\n    if not skip_demos:\n        if verbose:\n            click.echo(\"Generating demo pages from remote repositories...\")\n        created_files = sync_demo_docs(config, fetch_from_remote=True)\n        \n        if created_files:\n            click.echo(f\"\\n✅ Generated {len(created_files)} demo pages:\")\n            for demo_id, filepath in created_files:\n                click.echo(f\"   • {demo_id} → {filepath}\")\n        elif verbose:\n            click.echo(\"No demo pages were generated.\")\n    else:\n        if verbose:\n            click.echo(\"Skipping demo page generation...\")\n    \n    # Generate gallery page (unless skipped)\n    if not skip_gallery:\n        if verbose:\n            click.echo(\"Generating gallery page from template...\")\n        \n        if generate_gallery_page(config):\n            click.echo(f\"✅ Gallery page generated: {OUTPUT_FILE}\")\n        else:\n            raise click.ClickException(\"Failed to generate 
gallery page.\")\n    else:\n        if verbose:\n            click.echo(\"Skipping gallery page generation...\")\n    \n    # Check if nothing was generated\n    if skip_demos and skip_gallery:\n        click.echo(\"⚠️  Both demos and gallery generation were skipped. Nothing to do.\")\n    else:\n        click.echo(\"\\n🎉 Done! You can now run 'mkdocs serve' to view the documentation site.\")\n\n\nif __name__ == \"__main__\":\n    main() "
  },
  {
    "path": "scripts/sync_from_tidb_docs.py",
    "content": "import requests\nimport re\nimport sys\n\n\ndef remove_custom_content_blocks(content):\n    \"\"\"Remove <CustomContent ...>...</CustomContent> blocks from the content.\"\"\"\n    return re.sub(r\"<CustomContent[\\s\\S]*?</CustomContent>\", \"\", content)\n\n\ndef collapse_extra_blank_lines(content):\n    \"\"\"Collapse 3 or more blank lines to 2 blank lines.\"\"\"\n    return re.sub(r\"\\n{3,}\", \"\\n\\n\", content)\n\n\ndef convert_note_blocks(content):\n    \"\"\"Convert '> **Note:**' blocks to '!!! note' syntax with indented content.\"\"\"\n\n    def note_repl(m):\n        note_body = re.sub(r\"^> ?\", \"\", m.group(2), flags=re.MULTILINE).strip()\n        indented = \"\\n\".join(\n            \"    \" + line if line.strip() else \"\" for line in note_body.splitlines()\n        )\n        return \"!!! note\\n\\n\" + indented + \"\\n\\n\"\n\n    return re.sub(r\"> \\*\\*Note:\\*\\*\\n((?:> *\\n)*)(> .*(?:\\n|$)+)\", note_repl, content)\n\n\ndef convert_warning_blocks(content):\n    \"\"\"Convert '> **Warning:**' blocks to '!!! warning' syntax with indented content.\"\"\"\n\n    def warning_repl(m):\n        warning_body = re.sub(r\"^> ?\", \"\", m.group(2), flags=re.MULTILINE).strip()\n        indented = \"\\n\".join(\n            \"    \" + line if line.strip() else \"\" for line in warning_body.splitlines()\n        )\n        return \"!!! warning\\n\\n\" + indented + \"\\n\\n\"\n\n    return re.sub(r\"> \\*\\*Warning:\\*\\*\\n((?:> *\\n)*)(> .*(?:\\n|$)+)\", warning_repl, content)\n\n\ndef convert_tip_blocks(content):\n    \"\"\"Convert '> **Tip**' blocks to '!!! tip' syntax with indented content.\"\"\"\n\n    def tip_repl(m):\n        tip_body = re.sub(r\"^> ?\", \"\", m.group(2), flags=re.MULTILINE).strip()\n        indented = \"\\n\".join(\n            \"    \" + line if line.strip() else \"\" for line in tip_body.splitlines()\n        )\n        return \"!!! 
tip\\n\\n\" + indented + \"\\n\\n\"\n\n    return re.sub(r\"> \\*\\*Tip:?\\*\\*\\n((?:> *\\n)*)(> .*(?:\\n|$)+)\", tip_repl, content)\n\n\ndef remove_see_also_section(content):\n    \"\"\"Remove the '## See also' section and everything after it.\"\"\"\n    return re.sub(r\"## See also[\\s\\S]*$\", \"\", content)\n\n\ndef replace_image_paths(content):\n    \"\"\"Replace image paths to point to the local assets directory.\"\"\"\n    return content.replace(\n        \"/media/vector-search/embedding-search.png\", \"../../assets/embedding-search.png\"\n    )\n\n\ndef replace_relative_doc_links(content):\n    \"\"\"Replace relative doc links with full tidbcloud doc links, remove .md suffix and 'vector-search/' in path.\"\"\"\n\n    def link_repl(m):\n        path = m.group(1)\n        # Remove leading /, ./ or ../\n        path = re.sub(r\"^/|^\\./|^\\.\\./\", \"\", path)\n        path = path.replace(\"vector-search/\", \"\")  # Remove 'vector-search/' directory\n        return f\"(https://docs.pingcap.com/tidbcloud/{path})\"\n\n    return re.sub(r\"\\(((?:/|\\./|\\.\\./)[^)]+?)\\.md\\)\", link_repl, content)\n\n\ndef remove_overview_from_title(content):\n    \"\"\"Remove 'Overview' from the main title if present.\"\"\"\n    return re.sub(\n        r\"^(# .*)Overview(.*)$\",\n        lambda m: m.group(1).rstrip() + m.group(2) + \"\\n\",\n        content,\n        flags=re.MULTILINE,\n    )\n\n\ndef remove_front_matter(content):\n    \"\"\"Remove YAML front matter if present.\"\"\"\n    return re.sub(r\"^---[\\s\\S]*?---\\n\", \"\", content)\n\n\ndef remove_simpletab_blocks(content):\n    \"\"\"Remove <SimpleTab> and <div label=...> ... 
</div> blocks, flattening their content.\"\"\"\n    content = re.sub(r\"<SimpleTab>|</SimpleTab>\", \"\", content)\n    content = re.sub(r'<div label=\"[^\"]*\">', \"\", content)\n    content = re.sub(r\"</div>\", \"\", content)\n    return content\n\n\ndef process_overview():\n    url = \"https://raw.githubusercontent.com/pingcap/docs/refs/heads/master/vector-search/vector-search-overview.md\"\n    response = requests.get(url, timeout=30)\n    response.raise_for_status()  # Fail loudly instead of saving an HTTP error page\n    content = response.text\n    content = remove_custom_content_blocks(content)\n    content = collapse_extra_blank_lines(content)\n    content = convert_note_blocks(content)\n    content = remove_see_also_section(content)\n    content = replace_image_paths(content)\n    content = replace_relative_doc_links(content)\n    content = remove_overview_from_title(content)\n    save_to_file(content, \"./src/ai/concepts/vector-search.md\")\n\n\ndef process_llamaindex():\n    url = \"https://raw.githubusercontent.com/pingcap/docs/master/vector-search/vector-search-integrate-with-llamaindex.md\"\n    response = requests.get(url, timeout=30)\n    response.raise_for_status()\n    content = response.text\n    content = remove_front_matter(content)\n    content = remove_custom_content_blocks(content)\n    content = remove_simpletab_blocks(content)\n    content = collapse_extra_blank_lines(content)\n    content = convert_warning_blocks(content)\n    content = convert_note_blocks(content)\n    content = convert_tip_blocks(content)\n    content = remove_see_also_section(content)\n    content = replace_image_paths(content)\n    content = replace_relative_doc_links(content)\n    save_to_file(content, \"./src/ai/integrations/llamaindex.md\")\n\n\ndef process_langchain():\n    url = \"https://raw.githubusercontent.com/pingcap/docs/master/vector-search/vector-search-integrate-with-langchain.md\"\n    response = requests.get(url, timeout=30)\n    response.raise_for_status()\n    content = 
response.text\n    content = remove_front_matter(content)\n    content = remove_custom_content_blocks(content)\n    content = remove_simpletab_blocks(content)\n    content = collapse_extra_blank_lines(content)\n    content = convert_warning_blocks(content)\n    content = convert_note_blocks(content)\n    content = convert_tip_blocks(content)\n    content = remove_see_also_section(content)\n    content = replace_image_paths(content)\n    content = replace_relative_doc_links(content)\n    save_to_file(content, \"./src/ai/integrations/langchain.md\")\n\n\ndef save_to_file(content, filename):\n    \"\"\"Save the processed content to a file.\"\"\"\n    with open(filename, \"w\", encoding=\"utf-8\") as f:\n        f.write(content)\n\n\ndef main():\n    if len(sys.argv) > 1:\n        arg = sys.argv[1].lower()\n        if arg in (\"--llamaindex\", \"llamaindex\"):\n            process_llamaindex()\n            print(\"llamaindex doc synced.\")\n            return\n        elif arg in (\"--langchain\", \"langchain\"):\n            process_langchain()\n            print(\"langchain doc synced.\")\n            return\n        elif arg in (\"--overview\", \"overview\"):\n            process_overview()\n            print(\"overview doc synced.\")\n            return\n        else:\n            print(f\"Unknown argument: {arg}\")\n            print(\"Usage: python sync_from_tidb_docs.py [--llamaindex|--langchain|--overview]\")\n            sys.exit(1)\n\n    process_overview()\n    print(\"overview doc synced (default).\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "src/ai/concepts/vector-search.md",
    "content": "---\ntitle: Vector Search Overview\nsummary: Learn about Vector Search in TiDB. This feature provides an advanced search solution for performing semantic similarity searches across various data types, including documents, images, audio, and video.\n---\n\n# Vector Search\n\n\nVector search offers a powerful solution for semantic similarity searches across diverse data types, such as documents, images, audio, and video. It allows developers to leverage their MySQL expertise to build scalable applications enriched with generative AI capabilities, simplifying the integration of advanced search functionality.\n\n!!! note\n\n    The vector search feature is only available for TiDB Self-Managed clusters and [TiDB Cloud Starter](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-cloud-starter) clusters.\n\n## Concepts\n\nVector search is a search method that prioritizes the meaning of your data to deliver relevant results.\n\nUnlike traditional full-text search, which relies on exact keyword matching and word frequency, vector search converts various data types (such as text, images, or audio) into high-dimensional vectors and queries based on the similarity between these vectors. This search method captures the semantic meaning and contextual information of the data, leading to a more precise understanding of user intent.\n\nEven when the search terms do not exactly match the content in the database, vector search can still provide results that align with the user's intent by analyzing the semantics of the data.\n\nFor example, a full-text search for \"a swimming animal\" only returns results containing these exact keywords. In contrast, vector search can return results for other swimming animals, such as fish or ducks, even if these results do not contain the exact keywords.\n\n### Vector embedding\n\nA vector embedding, also known as an embedding, is a sequence of numbers that represents real-world objects in a high-dimensional space. 
It captures the meaning and context of unstructured data, such as documents, images, audio, and videos.\n\nVector embeddings are essential in machine learning and serve as the foundation for semantic similarity searches.\n\nTiDB introduces [Vector data types](https://docs.pingcap.com/tidbcloud/vector-search-data-types) and [Vector search index](https://docs.pingcap.com/tidbcloud/vector-search-index) designed to optimize the storage and retrieval of vector embeddings, enhancing their use in AI applications. You can store vector embeddings in TiDB and perform vector search queries to find the most relevant data using these data types.\n\n### Embedding model\n\nEmbedding models are algorithms that transform data into [vector embeddings](#vector-embedding).\n\nChoosing an appropriate embedding model is crucial for ensuring the accuracy and relevance of semantic search results. For unstructured text data, you can find top-performing text embedding models on the [Massive Text Embedding Benchmark (MTEB) Leaderboard](https://huggingface.co/spaces/mteb/leaderboard).\n\nTo learn how to generate vector embeddings for your specific data types, refer to integration tutorials or examples of embedding models.\n\n## How vector search works\n\nAfter converting raw data into vector embeddings and storing them in TiDB, your application can execute vector search queries to find the data most semantically or contextually relevant to a user's query.\n\nTiDB vector search identifies the top-k nearest neighbor (KNN) vectors by using a [distance function](https://docs.pingcap.com/tidbcloud/vector-search-functions-and-operators) to calculate the distance between the given vector and vectors stored in the database. 
The vectors closest to the given vector in the query represent the most similar data in meaning.\n\n![The Schematic TiDB Vector Search](../../assets/embedding-search.png)\n\nAs a relational database with integrated vector search capabilities, TiDB enables you to store data and their corresponding vector representations (that is, vector embeddings) together in one database. You can choose any of the following ways for storage:\n\n- Store data and their corresponding vector representations in different columns of the same table.\n- Store data and their corresponding vector representation in different tables. In this way, you need to use `JOIN` queries to combine the tables when retrieving data.\n\n## Use cases\n\n### Retrieval-Augmented Generation (RAG)\n\nRetrieval-Augmented Generation (RAG) is an architecture designed to optimize the output of Large Language Models (LLMs). By using vector search, RAG applications can store vector embeddings in the database and retrieve relevant documents as additional context when the LLM generates responses, thereby improving the quality and relevance of the answers.\n\n### Semantic search\n\nSemantic search is a search technology that returns results based on the meaning of a query, rather than simply matching keywords. It interprets the meaning across different languages and various types of data (such as text, images, and audio) using embeddings. Vector search algorithms then use these embeddings to find the most relevant data that satisfies the user's query.\n\n### Recommendation engine\n\nA recommendation engine is a system that proactively suggests content, products, or services that are relevant and personalized to users. It accomplishes this by creating embeddings that represent user behavior and preferences. These embeddings help the system identify similar items that other users have interacted with or shown interest in. 
This increases the likelihood that the recommendations will be both relevant and appealing to the user.\n\n"
  },
  {
    "path": "src/ai/examples/auto-embedding-with-pytidb.md",
    "content": "---\ntitle: Auto Embedding\ndescription: \"Automatically generate embeddings for your text data using built-in embedding models.\"\nsource_repo: \"https://github.com/pingcap/pytidb/tree/main/examples/auto_embedding\"\n---\n\n# Auto Embedding Demo\n\nThis example shows how to use the auto embedding feature with PyTiDB Client:\n\n* Connect to TiDB with PyTiDB Client\n* Define a table with a `VectorField` configured for automatic embedding\n* Insert plain text data; the embeddings are populated automatically in the background\n* Run vector searches with natural language queries; the queries are embedded transparently\n\n## Prerequisites\n\n- **Python 3.10+**\n- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)\n\n## How to run\n\n**Step 1**: Clone the repository\n\n```bash\ngit clone https://github.com/pingcap/pytidb.git\ncd pytidb/examples/auto_embedding/\n```\n\n**Step 2**: Install the required packages\n\n```bash\npython -m venv .venv\nsource .venv/bin/activate\npip install -r reqs.txt\n```\n\n**Step 3**: Set up the environment to connect to the database\n\nGo to the [TiDB Cloud console](https://tidbcloud.com/clusters) to get the connection parameters, and then set the environment variables as follows:\n\n```bash\ncat > .env <<EOF\nTIDB_HOST={gateway-region}.prod.aws.tidbcloud.com\nTIDB_PORT=4000\nTIDB_USERNAME={prefix}.root\nTIDB_PASSWORD={password}\nTIDB_DATABASE=test\n\n# Use the TiDB Cloud Free embedding model by default, which does not require an API key\nEMBEDDING_PROVIDER=tidbcloud_free\nEOF\n```\n\n**Step 4**: Run the demo\n\n```bash\npython main.py\n```\n\n**Expected output:**\n\n```plain\n=== Define embedding function ===\nEmbedding function (model id: tidbcloud_free/amazon/titan-embed-text-v2) defined\n\n=== Define table schema ===\nTable created\n\n=== Truncate table ===\nTable truncated\n\n=== Insert sample data ===\nInserted 3 chunks\n\n=== 
Perform vector search ===\nid: 1, text: TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads., distance: 0.30373281240458805\nid: 2, text: PyTiDB is a Python library for developers to connect to TiDB., distance: 0.422506501973434\nid: 3, text: LlamaIndex is a Python library for building AI-powered applications., distance: 0.5267239638442787\n```\n\n\n---\n\n## Related Resources\n\n- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/auto_embedding)\n- **Category**: Getting-Started\n\n- **Description**: Automatically generate embeddings for your text data using built-in embedding models.\n\n\n[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary } "
  },
  {
    "path": "src/ai/examples/basic-with-pytidb.md",
    "content": "---\ntitle: Basic Usage\ndescription: \"Learn fundamental PyTiDB operations including database connection, table creation, and data manipulation.\"\nsource_repo: \"https://github.com/pingcap/pytidb/tree/main/examples/basic\"\n---\n\n# Basic CRUD Demo\n\nThis example demonstrates basic CRUD (Create, Read, Update, Delete) operations with PyTiDB.\n\n* Use PyTiDB Client to connect to TiDB\n* Create a table with text, vector, and JSON columns\n* Perform basic CRUD operations on data\n\n## Prerequisites\n\n- **Python 3.10+**\n- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)\n\n## How to run\n\n**Step 1**: Clone the repository\n\n```bash\ngit clone https://github.com/pingcap/pytidb.git\ncd pytidb/examples/basic/\n```\n\n**Step 2**: Install the required packages\n\n```bash\npython -m venv .venv\nsource .venv/bin/activate\npip install -r reqs.txt\n```\n\n**Step 3**: Set up the environment to connect to the database\n\nGo to the [TiDB Cloud console](https://tidbcloud.com/clusters) to get the connection parameters, and then set the environment variables as follows:\n\n```bash\ncat > .env <<EOF\nTIDB_HOST={gateway-region}.prod.aws.tidbcloud.com\nTIDB_PORT=4000\nTIDB_USERNAME={prefix}.root\nTIDB_PASSWORD={password}\nTIDB_DATABASE=test\nEOF\n```\n\n**Step 4**: Run the demo\n\n```bash\npython main.py\n```\n\n**Expected output:**\n\n```plain\n=== CREATE TABLE ===\nTable created\n\n=== TRUNCATE TABLE ===\nTable truncated\n\n=== CREATE ===\nCreated 3 items\n\n=== READ ===\nID: 1, Content: TiDB is a distributed SQL database, Metadata: {'category': 'database'}\nID: 2, Content: GPT-4 is a large language model, Metadata: {'category': 'llm'}\nID: 3, Content: LlamaIndex is a Python library for building AI-powered applications, Metadata: {'category': 'rag'}\n\n=== UPDATE ===\nUpdated item #1\nAfter update - ID: 1, Content: TiDB Cloud Starter is a fully-managed, auto-scaling 
cloud database service, Metadata: {'category': 'dbass'}\n\n=== DELETE ===\nDeleted item #2\n\n=== FINAL STATE ===\nID: 1, Content: TiDB Cloud Starter is a fully-managed, auto-scaling cloud database service, Metadata: {'category': 'dbass'}\nID: 3, Content: LlamaIndex is a Python library for building AI-powered applications, Metadata: {'category': 'rag'}\n\n=== COUNT ROWS ===\nNumber of rows: 2\n\n=== DROP TABLE ===\nTable dropped\n\nBasic CRUD operations completed!\n```\n\n---\n\n## Related Resources\n\n- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/basic)\n- **Category**: Getting-Started\n\n- **Description**: Learn fundamental PyTiDB operations including database connection, table creation, and data manipulation.\n\n\n[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary } "
  },
  {
    "path": "src/ai/examples/fulltext-search-with-pytidb.md",
    "content": "---\ntitle: Fulltext Search\ndescription: \"Perform traditional text search using MySQL fulltext search capabilities.\"\nsource_repo: \"https://github.com/pingcap/pytidb/tree/main/examples/fulltext_search\"\n---\n\n# Fulltext Search Example\n\nThis example demonstrates how to build an e-commerce product search application using TiDB's full-text search feature with multilingual support. Users can search for products by keywords in their preferred language.\n\n<p align=\"center\">\n  <img width=\"700\" alt=\"E-commerce product search with full-text search\" src=\"https://github.com/user-attachments/assets/c81ddad4-f996-4b1f-85c0-5cbb55bc2a3a\" />\n  <p align=\"center\"><i>E-commerce product search with full-text search</i></p>\n</p>\n\n## Prerequisites\n\n- **Python 3.10+**\n- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)\n\n## How to run\n\n**Step 1**: Clone the repository\n\n```bash\ngit clone https://github.com/pingcap/pytidb.git\ncd pytidb/examples/fulltext_search/\n```\n\n**Step 2**: Install the required packages\n\n```bash\npython -m venv .venv\nsource .venv/bin/activate\npip install -r reqs.txt\n```\n\n**Step 3**: Set up the environment to connect to the database\n\nGo to the [TiDB Cloud console](https://tidbcloud.com/), create a new cluster if you don't have one, and get the connection parameters from the connection dialog. Then set the environment variables as follows:\n\n```bash\ncat > .env <<EOF\nTIDB_HOST={gateway-region}.prod.aws.tidbcloud.com\nTIDB_PORT=4000\nTIDB_USERNAME={prefix}.root\nTIDB_PASSWORD={password}\nTIDB_DATABASE=pytidb_fulltext_demo\nEOF\n```\n\n**Step 4**: Run the Streamlit app\n\n```bash\nstreamlit run app.py\n```\n\n**Step 5**: Open your browser and visit `http://localhost:8501`.\n\n\n---\n\n## Related Resources\n\n- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/fulltext_search)\n- **Category**: Search\n\n- **Description**: 
Perform traditional text search using MySQL fulltext search capabilities.\n\n\n[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary } "
  },
  {
    "path": "src/ai/examples/hybrid-search-with-pytidb.md",
    "content": "---\ntitle: Hybrid Search\ndescription: \"Combine vector search and fulltext search for more comprehensive results.\"\nsource_repo: \"https://github.com/pingcap/pytidb/tree/main/examples/hybrid_search\"\n---\n\n# Hybrid Search Demo\n\nThis demo shows how to use hybrid search to combine vector search and full-text search on a set of documents.\n\n<p align=\"center\">\n    <img src=\"https://github.com/user-attachments/assets/6e1c639d-2160-44c8-86b4-958913b9eca5\" alt=\"TiDB Hybrid Search Demo\" width=\"700\"/>\n    <p align=\"center\"><i>TiDB Hybrid Search Demo</i></p>\n</p>\n\n## Prerequisites\n\n* Python 3.10+\n* A TiDB database instance (👉 [Create a free TiDB Cloud Starter cluster](https://tidbcloud.com/free-trial))\n* An OpenAI API key (get one from [OpenAI](https://platform.openai.com/api-keys))\n\n> **Note**\n> \n> Currently, full-text search is only available for the following product option and regions:\n>\n> - TiDB Cloud Starter: Frankfurt (eu-central-1), Singapore (ap-southeast-1)\n\n## How to run\n\n**Step 1**: Clone the repository\n\n```bash\ngit clone https://github.com/pingcap/pytidb.git\ncd pytidb/examples/hybrid_search/\n```\n\n**Step 2**: Install the required packages\n\n```bash\npython -m venv .venv\nsource .venv/bin/activate\npip install -r reqs.txt\n```\n\n**Step 3**: Set up the environment to connect to the database\n\nIf you are using TiDB Cloud, you can find the connection parameters in the [TiDB Cloud console](https://tidbcloud.com/).\n\n```bash\ncat > .env <<EOF\nTIDB_HOST=localhost\nTIDB_PORT=4000\nTIDB_USERNAME=root\nTIDB_PASSWORD=\nTIDB_DATABASE=pytidb_hybrid_demo\nOPENAI_API_KEY=<your-openai-api-key>\nEOF\n```\n\n**Step 4**: Run the demo\n\n**Option 1**: Run the Streamlit app\n\nTo try the demo in a web UI, run the following command:\n\n```bash\nstreamlit run app.py\n```\n\nThen open your browser and visit `http://localhost:8501`.\n\n**Option 2**: Run the demo script\n\nTo run the demo as a script, run the following command:\n\n```bash\npython 
example.py\n```\n\nExpected output:\n\n```plain\n=== CONNECT TO TIDB ===\nConnected to TiDB.\n\n=== CREATE TABLE ===\nTable created.\n\n=== INSERT SAMPLE DATA ===\nInserted 3 rows.\n\n=== PERFORM HYBRID SEARCH ===\nSearch results:\n[\n    {\n        \"_distance\": 0.4740166257687124,\n        \"_match_score\": 1.6804268,\n        \"_score\": 0.03278688524590164,\n        \"id\": 60013,\n        \"text\": \"TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads.\"\n    },\n    {\n        \"_distance\": 0.6428459116216618,\n        \"_match_score\": 0.78427225,\n        \"_score\": 0.03200204813108039,\n        \"id\": 60015,\n        \"text\": \"LlamaIndex is a Python library for building AI-powered applications.\"\n    },\n    {\n        \"_distance\": 0.641581407158715,\n        \"_match_score\": null,\n        \"_score\": 0.016129032258064516,\n        \"id\": 60014,\n        \"text\": \"PyTiDB is a Python library for developers to connect to TiDB.\"\n    }\n]\n```\n\n\n\n---\n\n## Related Resources\n\n- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/hybrid_search)\n- **Category**: Search\n\n- **Description**: Combine vector search and fulltext search for more comprehensive results.\n\n\n[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary } "
  },
  {
    "path": "src/ai/examples/image-search-with-pytidb.md",
    "content": "---\ntitle: Image Search\ndescription: \"Build an image search application using multimodal embeddings for both text-to-image and image-to-image search.\"\nsource_repo: \"https://github.com/pingcap/pytidb/tree/main/examples/image_search\"\n---\n\n# Pet Image Search Demo\n\nThis example showcases how to build a powerful image search application by combining TiDB's vector search capabilities with multimodal embedding models.\n\nWith just a few lines of code, you can create an intelligent search system that understands both text and images.\n\n- 🔍 **Text-to-Image Search**: Find the perfect pet photos by describing what you're looking for in natural language, such as \"fluffy orange cat\"\n- 🖼️ **Image-to-Image Search**: Upload a photo and instantly discover visually similar pets based on breed, color, pose, and more\n\n<p align=\"center\">\n  <img width=\"700\" alt=\"PyTiDB Image Search Demo\" src=\"https://github.com/user-attachments/assets/7ba9733a-4d1f-4094-8edb-58731ebd08e9\" />\n  <p align=\"center\"><i>Pet image search via multimodal embeddings</i></p>\n</p>\n\n\n## Prerequisites\n\n- **Python 3.10+**\n- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)\n- **Jina AI API Key**: Get your free API key at [jina.ai Embeddings ↗️](https://jina.ai/embeddings/)\n\n## How to run\n\n**Step 1**: Clone the repository\n\n```bash\ngit clone https://github.com/pingcap/pytidb.git\ncd pytidb/examples/image_search/\n```\n\n**Step 2**: Install the required packages\n\n```bash\npython -m venv .venv\nsource .venv/bin/activate  # Windows: .venv\\Scripts\\activate\npip install -r reqs.txt\n```\n\n**Step 3**: Set up environment variables\n\nGo to the [TiDB Cloud console](https://tidbcloud.com/clusters) to get the connection parameters, and then set the environment variables as follows:\n\n```bash\ncat > .env 
<<EOF\nTIDB_HOST={gateway-region}.prod.aws.tidbcloud.com\nTIDB_PORT=4000\nTIDB_USERNAME={prefix}.root\nTIDB_PASSWORD={password}\nTIDB_DATABASE=test\n\nJINA_AI_API_KEY={your-jina-ai-api-key}\nEOF\n```\n\n**Step 4**: Download and extract the dataset\n\nThis demo loads pet images from the [Oxford Pets dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/) into the database for search.\n\n*For Linux/macOS:*\n\n```bash\n# Download the dataset\ncurl -L -o oxford_pets.tar.gz \"https://thor.robots.ox.ac.uk/~vgg/data/pets/images.tar.gz\"\n\n# Extract the dataset\nmkdir -p oxford_pets\ntar -xzf oxford_pets.tar.gz -C oxford_pets\n```\n\n**Step 5**: Run the app\n\n```bash\nstreamlit run app.py\n```\n\nOpen `http://localhost:8501` in your browser.\n\n**Step 6**: Load data\n\nIn the sample app, click the **Load Sample Data** button to load some sample data into the database.\n\nTo load all the data in the Oxford Pets dataset instead, click the **Load All Data** button.\n\n**Step 7**: Search\n\n1. Select the **Search type** in the sidebar.\n2. Enter a text description of the pet you're looking for, or upload a photo of a dog or cat.\n3. Click the **Search** button.\n\n\n---\n\n## Related Resources\n\n- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/image_search)\n- **Category**: Search\n\n- **Description**: Build an image search application using multimodal embeddings for both text-to-image and image-to-image search.\n\n\n[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary } "
  },
  {
    "path": "src/ai/examples/index.md",
    "content": "---\ntitle: Demo Gallery\ndescription: Explore hands-on demos showcasing how TiDB empowers AI applications. Get started quickly with TiDB Cloud Starter to build your own AI-powered solutions.\nhide:\n  - navigation\n  - toc\n  - pageTitle\n  - editButton \n---\n\n<style>\n\n/* CSS Variables */\n:root {\n  --brand-color: #de243d;\n  --brand-hover: #b71e34;\n  --border-radius-sm: 0.5rem;\n  --border-radius-md: 0.75rem;\n  --border-radius-lg: 1rem;\n  --spacing-sm: 1rem;\n  --spacing-md: 2rem;\n  --spacing-lg: 3rem;\n  --transition-fast: 0.2s;\n  --transition-normal: 0.3s;\n  --dark-overlay: rgba(255, 255, 255, 0.08);\n  --dark-border: rgba(255, 255, 255, 0.1);\n  --dark-bg-subtle: rgba(255, 255, 255, 0.05);\n}\n\n/* Smooth scrolling for the entire page */\nhtml {\n  scroll-behavior: smooth;\n}\n\n/* Gallery Container */\n.gallery-container {\n  max-width: 1280px;\n  margin: 0 auto;\n  padding: var(--spacing-md) var(--spacing-sm);\n}\n\n/* Header */\n.gallery-header {\n  text-align: center;\n  margin-bottom: var(--spacing-lg);\n}\n\n.gallery-title {\n  font-size: 72px !important;\n  font-weight: 800 !important;\n  margin-bottom: 8px !important;\n  line-height: 1 !important;\n  color: var(--md-default-fg-color) !important;\n}\n\n.gallery-description {\n  font-size: 22px !important;\n  color: var(--md-default-fg-color--light) !important;\n  padding: 0 120px;\n  margin-bottom: 5rem !important;\n}\n\n/* Gallery CTA link styles */\n.gallery-cta-link {\n    position: relative;\n    text-decoration: none;\n    transition: all 0.3s ease-in-out;\n}\n\n.gallery-cta-link:hover {\n    border-bottom: 3px solid var(--brand-color);\n}\n\n\n/* Layout */\n.gallery-layout {\n  display: flex;\n  gap: var(--spacing-sm) !important;\n}\n\n/* Sidebar */\n.gallery-sidebar {\n  width: 8rem;\n  flex-shrink: 0;\n}\n\n.sidebar-nav {\n  position: sticky;\n  top: 140px;\n}\n\n.sidebar-title {\n  font-size: 14px !important;\n  font-weight: 400 !important;\n  color: 
var(--md-default-fg-color--light) !important;\n  text-transform: uppercase;\n  letter-spacing: 0.05em;\n  margin: 0 0 0.5rem 0 !important;\n}\n\n.sidebar-links {\n  display: flex;\n  flex-direction: column;\n  gap: 2px;\n  margin-left: -12px;\n}\n\n.sidebar-link {\n  display: block;\n  padding: 8px 12px;\n  border-radius: var(--border-radius-sm);\n  font-size: 14px;\n  font-weight: 400 !important;\n  color: var(--md-default-fg-color--light) !important;\n  text-decoration: none !important;\n  transition: all var(--transition-fast) ease;\n  text-align: left;\n  cursor: pointer;\n}\n\n.sidebar-link:hover {\n  background-color: var(--md-default-fg-color--lightest) !important;\n  color: var(--md-default-fg-color) !important;\n  font-weight: 500 !important;\n  transform: translateX(2px);\n}\n\n.sidebar-link:focus-visible {\n  outline: 2px solid var(--brand-color);\n  outline-offset: 2px;\n}\n\n/* Content */\n.gallery-content {\n  flex: 1;\n  padding: 0 var(--spacing-lg);\n}\n\n.gallery-section {\n  margin-bottom: var(--spacing-lg);\n  scroll-margin-top: 120px;\n}\n\n.section-title {\n  font-size: 24px !important;\n  font-weight: 700 !important;\n  color: var(--md-default-fg-color) !important;\n  margin: 0 0 1.5rem 0 !important;\n}\n\n.cards-grid {\n  display: grid;\n  grid-template-columns: repeat(3, 1fr);\n  gap: var(--spacing-sm) !important;\n}\n\n/* Cards */\n.gallery-card {\n  display: block;\n  background: var(--md-default-bg-color);\n  border: 1px solid var(--md-default-fg-color--lightest);\n  border-radius: var(--border-radius-md);\n  overflow: hidden;\n  transition: all var(--transition-normal) ease;\n  text-decoration: none !important;\n  color: inherit;\n}\n\n.gallery-card:hover {\n  transform: translateY(-2px);\n  box-shadow: var(--md-shadow-z2);\n  text-decoration: none !important;\n  outline: 2px solid var(--brand-color);\n  outline-offset: 2px;\n}\n\n\n.gallery-card:hover .card-title {\n  color: var(--brand-color) !important;\n}\n\n.card-image {\n  height: 
8rem;\n  position: relative;\n  overflow: hidden;\n  background-color: var(--md-default-fg-color--lightest);\n  border-bottom: 1px solid var(--md-default-fg-color--lightest);\n}\n\n.card-image img {\n  width: 100%;\n  height: 100%;\n  object-fit: cover;\n}\n\n.card-gradient {\n  display: flex;\n  align-items: center;\n  justify-content: center;\n  font-size: 2.5rem;\n}\n\n.card-badge {\n  position: absolute;\n  top: 0.5rem;\n  left: 0.5rem;\n  background: rgba(0, 0, 0, 0.7);\n  color: white;\n  padding: 0.2rem 0.4rem;\n  border-radius: 0.2rem;\n  font-size: 0.5rem !important;\n  font-weight: 500 !important;\n}\n\n.card-content {\n  padding: 12px;\n}\n\n.card-title {\n  font-size: 0.75rem !important;\n  font-weight: 600 !important;\n  line-height: 1.5 !important;\n  margin: 0 !important;\n  color: var(--md-default-fg-color) !important;\n}\n\n.card-description {\n  color: var(--md-default-fg-color--light) !important;\n  font-size: 0.65rem !important;\n  line-height: 1.5;\n  display: -webkit-box;\n  margin: 0;\n  -webkit-line-clamp: 3;\n  -webkit-box-orient: vertical;\n  overflow: hidden;\n}\n\n/* CTA */\n.gallery-cta {\n  background: linear-gradient(135deg, rgba(222, 36, 61, 0.08) 0%, rgba(99, 102, 241, 0.08) 100%);\n  border: 1px solid rgba(222, 36, 61, 0.1);\n  border-radius: var(--border-radius-lg);\n  padding: var(--spacing-md);\n  text-align: center;\n  margin-top: var(--spacing-lg);\n}\n\n.cta-title {\n  font-size: 1.5rem;\n  font-weight: 700;\n  color: var(--md-default-fg-color);\n  margin-bottom: var(--spacing-sm);\n}\n\n.cta-description {\n  color: var(--md-default-fg-color--light);\n  margin: 0 auto var(--spacing-md);\n  max-width: 42rem;\n}\n\n.cta-buttons {\n  display: flex;\n  justify-content: center;\n  gap: var(--spacing-sm);\n  flex-wrap: wrap;\n}\n\n/* Button shared styles */\n.btn-primary,\n.btn-secondary {\n  padding: 0.75rem 1.5rem;\n  border-radius: var(--border-radius-sm);\n  font-weight: 500;\n  text-decoration: none !important;\n  transition: 
all var(--transition-fast);\n}\n\n.btn-primary {\n  background-color: var(--brand-color);\n  color: white !important;\n}\n\n.btn-primary:hover {\n  background-color: var(--brand-hover);\n  color: white !important;\n}\n\n.btn-secondary {\n  border: 1px solid var(--md-default-fg-color--lighter);\n  background-color: var(--md-default-bg-color);\n  color: var(--md-default-fg-color) !important;\n}\n\n.btn-secondary:hover {\n  background-color: var(--md-default-fg-color--lightest);\n  color: var(--md-default-fg-color) !important;\n}\n\n/* Dark mode styles */\n[data-md-color-scheme=\"tidb-dark\"] .sidebar-link:hover {\n  background-color: var(--dark-overlay) !important;\n}\n\n[data-md-color-scheme=\"tidb-dark\"] .gallery-card {\n  border-color: var(--dark-border);\n}\n\n[data-md-color-scheme=\"tidb-dark\"] .card-image {\n  background-color: var(--dark-bg-subtle);\n  border-bottom-color: var(--dark-border);\n}\n\n[data-md-color-scheme=\"tidb-dark\"] .gallery-cta {\n  background: linear-gradient(135deg, rgba(222, 36, 61, 0.12) 0%, rgba(99, 102, 241, 0.12) 100%);\n  border-color: rgba(222, 36, 61, 0.2);\n}\n\n[data-md-color-scheme=\"tidb-dark\"] .btn-secondary:hover {\n  background-color: rgba(255, 255, 255, 0.1);\n}\n\n/* Responsive adjustments */\n@media (max-width: 768px) {\n  .gallery-layout { flex-direction: column; }\n  .gallery-sidebar { width: 100%; }\n  .sidebar-nav { position: static; }\n  .sidebar-links { flex-direction: row; gap: 0.5rem; flex-wrap: wrap; }\n  .sidebar-link { padding: 12px 16px !important; min-height: 44px; display: flex; align-items: center; }\n  .gallery-content { padding: 0; }\n  .gallery-description { padding: 0 20px; }\n  .cards-grid { grid-template-columns: 1fr; }\n  .gallery-title { font-size: 48px !important; }\n}\n\n@media (max-width: 1024px) and (min-width: 769px) {\n  .cards-grid { grid-template-columns: repeat(2, 1fr); }\n}\n\n</style>\n\n\n\n<div class=\"gallery-container\">\n  <div class=\"gallery-header\">\n    <h1 
class=\"gallery-title\">Demo Gallery</h1>\n    <p class=\"gallery-description\">\n      Explore hands-on demos showcasing how TiDB empowers AI applications.<br>\nGet started quickly with <a href='https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=demo_gallery' target='_blank' rel='noopener noreferrer' class='gallery-cta-link'><b>TiDB Cloud Starter</b></a> to build your own AI-powered solutions.\n\n    </p>\n  </div>\n\n  <div class=\"gallery-layout\">\n    <div class=\"gallery-sidebar\">\n      <div class=\"sidebar-nav\">\n        <h6 class=\"sidebar-title\">Categories</h6>\n                          <nav class=\"sidebar-links\">\n            <a href=\"#featured\" class=\"sidebar-link\">Featured</a>\n            <a href=\"#getting-started\" class=\"sidebar-link\">Getting Started</a>\n            <a href=\"#search\" class=\"sidebar-link\">Search & Retrieval</a>\n            <a href=\"#ai-apps\" class=\"sidebar-link\">AI Applications</a>\n        </nav>\n      </div>\n    </div>\n              <div class=\"gallery-content\">\n        <section id=\"featured\" class=\"gallery-section\">\n        <h2 class=\"section-title\">⭐ Featured</h2>\n        <div class=\"cards-grid\">\n          <a href=\"image-search-with-pytidb/\" class=\"gallery-card\">\n            <div class=\"card-image\">\n              <img src=\"https://github.com/user-attachments/assets/7ba9733a-4d1f-4094-8edb-58731ebd08e9\" alt=\"Image Search Demo\">\n            </div>\n            <div class=\"card-content\">\n              <h3 class=\"card-title\">Image Search</h3>\n              <p class=\"card-description\">\n                Build an image search application using multimodal embeddings for both text-to-image and image-to-image search.\n              </p>\n            </div>\n          </a>\n          <a href=\"rag-with-pytidb/\" class=\"gallery-card\">\n            <div class=\"card-image\">\n              <img 
src=\"https://github.com/user-attachments/assets/dfd85672-65ce-4a46-8dd2-9f77d826363e\" alt=\"RAG Demo\">\n            </div>\n            <div class=\"card-content\">\n              <h3 class=\"card-title\">RAG</h3>\n              <p class=\"card-description\">\n                Build a RAG application that combines document retrieval with language generation.\n              </p>\n            </div>\n          </a>\n          <a href=\"memory-with-pytidb/\" class=\"gallery-card\">\n            <div class=\"card-image\">\n              <img src=\"https://github.com/user-attachments/assets/74dee96b-ea20-49dc-ad27-679faa5bf9b8\" alt=\"Memory Demo\">\n            </div>\n            <div class=\"card-content\">\n              <h3 class=\"card-title\">Memory</h3>\n              <p class=\"card-description\">\n                Implement conversation memory for chatbots and conversational AI applications.\n              </p>\n            </div>\n          </a>\n        </div>\n      </section>\n        <section id=\"getting-started\" class=\"gallery-section\">\n        <h2 class=\"section-title\">🚀 Getting Started</h2>\n        <div class=\"cards-grid\">\n          <a href=\"basic-with-pytidb/\" class=\"gallery-card\">\n            <div class=\"card-image card-gradient\" style=\"background: linear-gradient(135deg, #10b981, var(--brand-color));\">\n              <div>⚙️</div>\n            </div>\n            <div class=\"card-content\">\n              <h3 class=\"card-title\">Basic Usage</h3>\n              <p class=\"card-description\">\n                Learn fundamental PyTiDB operations including database connection, table creation, and data manipulation.\n              </p>\n            </div>\n          </a>\n          <a href=\"auto-embedding-with-pytidb/\" class=\"gallery-card\">\n            <div class=\"card-image card-gradient\" style=\"background: radial-gradient(circle at center, #8b5cf6 0%, var(--brand-color) 100%);\">\n              <div>🤖</div>\n            
</div>\n            <div class=\"card-content\">\n              <h3 class=\"card-title\">Auto Embedding</h3>\n              <p class=\"card-description\">\n                Automatically generate embeddings for your text data using built-in embedding models.\n              </p>\n            </div>\n          </a>\n        </div>\n      </section>\n        <section id=\"search\" class=\"gallery-section\">\n        <h2 class=\"section-title\">🔍 Search & Retrieval</h2>\n        <div class=\"cards-grid\">\n          <a href=\"vector-search-with-pytidb/\" class=\"gallery-card\">\n            <div class=\"card-image\">\n              <img src=\"https://github.com/user-attachments/assets/6d7783a5-ce9c-4dcc-8b95-49d5f0ca735a\" alt=\"Vector Search Demo\">\n            </div>\n            <div class=\"card-content\">\n              <h3 class=\"card-title\">Vector Search</h3>\n              <p class=\"card-description\">\n                Implement semantic search using vector embeddings to find similar content.\n              </p>\n            </div>\n          </a>\n          <a href=\"fulltext-search-with-pytidb/\" class=\"gallery-card\">\n            <div class=\"card-image\">\n              <img src=\"https://github.com/user-attachments/assets/c81ddad4-f996-4b1f-85c0-5cbb55bc2a3a\" alt=\"Fulltext Search Demo\">\n            </div>\n            <div class=\"card-content\">\n              <h3 class=\"card-title\">Fulltext Search</h3>\n              <p class=\"card-description\">\n                Perform traditional text search using MySQL fulltext search capabilities.\n              </p>\n            </div>\n          </a>\n          <a href=\"hybrid-search-with-pytidb/\" class=\"gallery-card\">\n            <div class=\"card-image\">\n              <img src=\"https://github.com/user-attachments/assets/6e1c639d-2160-44c8-86b4-958913b9eca5\" alt=\"Hybrid Search Demo\">\n            </div>\n            <div class=\"card-content\">\n              <h3 class=\"card-title\">Hybrid 
Search</h3>\n              <p class=\"card-description\">\n                Combine vector search and fulltext search for more comprehensive results.\n              </p>\n            </div>\n          </a>\n          <a href=\"image-search-with-pytidb/\" class=\"gallery-card\">\n            <div class=\"card-image\">\n              <img src=\"https://github.com/user-attachments/assets/7ba9733a-4d1f-4094-8edb-58731ebd08e9\" alt=\"Image Search Demo\">\n            </div>\n            <div class=\"card-content\">\n              <h3 class=\"card-title\">Image Search</h3>\n              <p class=\"card-description\">\n                Build an image search application using multimodal embeddings for both text-to-image and image-to-image search.\n              </p>\n            </div>\n          </a>\n        </div>\n      </section>\n        <section id=\"ai-apps\" class=\"gallery-section\">\n        <h2 class=\"section-title\">🤖 AI Applications</h2>\n        <div class=\"cards-grid\">\n          <a href=\"rag-with-pytidb/\" class=\"gallery-card\">\n            <div class=\"card-image\">\n              <img src=\"https://github.com/user-attachments/assets/dfd85672-65ce-4a46-8dd2-9f77d826363e\" alt=\"RAG Demo\">\n            </div>\n            <div class=\"card-content\">\n              <h3 class=\"card-title\">RAG</h3>\n              <p class=\"card-description\">\n                Build a RAG application that combines document retrieval with language generation.\n              </p>\n            </div>\n          </a>\n          <a href=\"memory-with-pytidb/\" class=\"gallery-card\">\n            <div class=\"card-image\">\n              <img src=\"https://github.com/user-attachments/assets/74dee96b-ea20-49dc-ad27-679faa5bf9b8\" alt=\"Memory Demo\">\n            </div>\n            <div class=\"card-content\">\n              <h3 class=\"card-title\">Memory</h3>\n              <p class=\"card-description\">\n                Implement conversation memory for chatbots and 
conversational AI applications.\n              </p>\n            </div>\n          </a>\n          <a href=\"text2sql-with-pytidb/\" class=\"gallery-card\">\n            <div class=\"card-image card-gradient\" style=\"background: linear-gradient(135deg, #06b6d4, var(--brand-color));\">\n              <div>💬</div>\n            </div>\n            <div class=\"card-content\">\n              <h3 class=\"card-title\">Text2SQL</h3>\n              <p class=\"card-description\">\n                Convert natural language queries into SQL statements using AI models.\n              </p>\n            </div>\n          </a>\n        </div>\n      </section>\n      <div class=\"gallery-cta\">\n        <h3 class=\"cta-title\">Ready to build your AI application?</h3>\n        <p class=\"cta-description\">\n          Start your AI journey with TiDB Cloud Starter. Follow our quickstart guide to build your first AI-powered application in minutes, or explore specific examples for your use case.\n        </p>\n        <div class=\"cta-buttons\">\n          <a href=\"https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"btn-primary\">Try TiDB Cloud Starter</a>\n          <a href=\"/ai/quickstart/\" class=\"btn-secondary\">View Quickstart Guide</a>\n        </div>\n      </div>\n    </div>\n  </div>\n</div> "
  },
  {
    "path": "src/ai/examples/memory-with-pytidb.md",
    "content": "---\ntitle: Memory\ndescription: \"Implement conversation memory for chatbots and conversational AI applications.\"\nsource_repo: \"https://github.com/pingcap/pytidb/tree/main/examples/memory\"\n---\n\n# AI Agent Memory Demo\n\nThis example showcases how to build an intelligent AI agent with persistent memory powered by TiDB's vector search capabilities.\n\nWith just a few lines of code, you can create a conversational AI that remembers past interactions and builds context over time.\n\n- 🧠 **Persistent Memory**: The AI remembers conversations across sessions and user interactions\n- 💬 **Interactive Chat**: Both web interface and command-line options for flexible interaction\n- 👤 **Multi-User Support**: Different users can have separate memory contexts\n- 🔍 **Real-Time Memory Viewing**: Visual display of all stored memories in the web interface\n\n<p align=\"center\">\n    <img src=\"https://github.com/user-attachments/assets/b57ae0fb-9075-43a9-8690-edaa69ca9f40\" alt=\"AI Agent with memory powered by TiDB\" width=\"700\"/>\n    <p align=\"center\"><i>AI Agent with memory powered by TiDB</i></p>\n</p>\n\n## Prerequisites\n\n- **Python 3.10+**\n- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)\n- **OpenAI API Key**: Get your API key at [OpenAI Platform ↗️](https://platform.openai.com/api-keys)\n\n## How to run\n\n**Step 1**: Clone the repository locally\n\n```bash\ngit clone https://github.com/pingcap/pytidb.git\ncd pytidb/examples/memory/\n```\n\n**Step 2**: Install the required packages\n\n```bash\npython -m venv .venv\nsource .venv/bin/activate  # Windows: .venv\\Scripts\\activate\npip install -r reqs.txt\n```\n\n**Step 3**: Set up environment variables\n\nGo to the [TiDB Cloud console](https://tidbcloud.com/clusters), get the connection parameters, and then set the environment variables as follows:\n\n```bash\ncat > .env 
<<EOF\nTIDB_HOST={gateway-region}.prod.aws.tidbcloud.com\nTIDB_PORT=4000\nTIDB_USERNAME={prefix}.root\nTIDB_PASSWORD={password}\nTIDB_DATABASE=test\n\nOPENAI_API_KEY={your-openai-api-key}\nEOF\n```\n\n**Step 4**: Run the application\n\nChoose one of the following options:\n\n**Option 1**: Launch the web application:\n\n```bash\nstreamlit run app.py\n```\n\nVisit `http://localhost:8501` in your browser and follow the instructions in the [Interact with memory in Web Application](https://github.com/pingcap/pytidb/tree/main/examples/memory/#interact-with-memory-in-web-application) section to start interacting with the memory-enabled AI assistant.\n\n**Option 2**: Run the command-line application:\n\n```bash\npython main.py\n```\n\nFollow the instructions in the [Interact with memory in Command Line Application](https://github.com/pingcap/pytidb/tree/main/examples/memory/#interact-with-memory-in-command-line-application) section to start interacting with the memory-enabled AI assistant.\n\n## Interact with memory in Web Application\n\nIn the web application, you can chat with the AI assistant. The user interface includes:\n\n- **Sidebar**: User settings and chat list.\n- **Main chat area**: Chat interface with the AI assistant.\n- **Memory viewer**: Real-time memory viewer showing stored facts.\n\nTo see how memory works, follow these steps:\n\n1. Introduce yourself in the default chat session. For example, \"Hello, I am John. I work as a software engineer and love guitar.\"\n2. Check the memory viewer, which now shows the information you provided.\n3. Click **New chat** in the sidebar to start a new chat session.\n4. Ask \"Who am I?\" in the new chat session. 
The AI recalls your information from previous conversations.\n\n## Interact with memory in Command Line Application\n\nIn the command-line application, you can chat with the AI assistant and introduce yourself.\n\n**Example conversation:**\n\n```plain\nChat with AI (type 'exit' to quit)\nYou: Hello, I am Mini256.\nAI: Hello, Mini256! How can I assist you today?\nYou: I am working at PingCAP.\nAI: That's great to hear, Mini256! PingCAP is known for its work on distributed databases, particularly TiDB. How's your experience been working there?\nYou: I am developing pytidb (A Python SDK for TiDB) which helps developers easy to connect to TiDB.\nAI: That sounds like a great project, Mini256! Developing a Python SDK for TiDB can make it much easier for developers to integrate with TiDB and interact with it using Python. If you need any advice on best practices, libraries to use, or specific features to implement, feel free to ask!\nYou: exit\nGoodbye!\n```\n\nAfter the first conversation, the AI assistant remembers the information you provided and uses it to answer future questions.\n\nNow, start a new chat session and ask the AI assistant \"Who am I?\".\n\n**Example conversation in another chat session:**\n\n```plain\nChat with AI (type 'exit' to quit)\nYou: Who am I?\nAI: You are Mini256, and you work at PingCAP, where you are developing pytidb, a Python SDK for TiDB to assist developers in easily connecting to TiDB.\nYou: exit\nGoodbye!\n```\n\nAs you can see, the AI assistant remembers you across sessions!\n\n---\n\n## Related Resources\n\n- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/memory)\n- **Category**: Ai-Apps\n\n- **Description**: Implement conversation memory for chatbots and conversational AI applications.\n\n\n[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary } "
  },
  {
    "path": "src/ai/examples/rag-with-pytidb.md",
    "content": "---\ntitle: RAG\ndescription: \"Build a RAG application that combines document retrieval with language generation.\"\nsource_repo: \"https://github.com/pingcap/pytidb/tree/main/examples/rag\"\n---\n\n# RAG Example\n\nThis example demonstrates how to use PyTiDB to build a minimal RAG application. It uses:\n\n* Ollama to deploy a local embedding model and LLM\n* Streamlit to build a web UI for the RAG application\n* PyTiDB to store documents in TiDB and retrieve the relevant ones\n\n<p align=\"center\">\n  <img src=\"https://github.com/user-attachments/assets/dfd85672-65ce-4a46-8dd2-9f77d826363e\" alt=\"RAG application built with PyTiDB\" width=\"600\" />\n  <p align=\"center\"><i>RAG application built with PyTiDB</i></p>\n</p>\n\n## Prerequisites\n\n- **Python 3.10+**\n- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)\n- **Ollama**: Install it from [Ollama ↗️](https://ollama.com/download)\n\n## How to run\n\n**Step 1**: Prepare the inference API\n\nPull the embedding model and the LLM using the Ollama CLI:\n\n```bash\nollama pull mxbai-embed-large\nollama pull gemma3:4b\nollama run gemma3:4b\n```\n\nTest the `/embed` and `/generate` endpoints to make sure they are running:\n\n```bash\ncurl http://localhost:11434/api/embed -d '{\n  \"model\": \"mxbai-embed-large\",\n  \"input\": \"Llamas are members of the camelid family\"\n}'\n```\n\n```bash\ncurl http://localhost:11434/api/generate -d '{\n  \"model\": \"gemma3:4b\",\n  \"prompt\": \"Hello, who are you?\"\n}'\n```\n\n**Step 2**: Clone the repository locally\n\n```bash\ngit clone https://github.com/pingcap/pytidb.git\ncd pytidb/examples/rag/\n```\n\n**Step 3**: Install the required packages and set up the environment\n\n```bash\npython -m venv .venv\nsource .venv/bin/activate\npip install -r reqs.txt\n```\n\n**Step 4**: Set up the environment variables to connect to the database\n\nGo to the [TiDB Cloud 
console](https://tidbcloud.com/clusters), get the connection parameters, and then set the environment variables as follows:\n\n```bash\ncat > .env <<EOF\nTIDB_HOST={gateway-region}.prod.aws.tidbcloud.com\nTIDB_PORT=4000\nTIDB_USERNAME={prefix}.root\nTIDB_PASSWORD={password}\nTIDB_DATABASE=test\nEOF\n```\n\n**Step 5**: Run the Streamlit app\n\n```bash\nstreamlit run main.py\n```\n\n**Step 6**: Open your browser and visit `http://localhost:8501`\n\n## Troubleshooting\n\n### `502 Bad Gateway` error\n\nIf you get a `502 Bad Gateway` error, try disabling your global proxy settings.\n\n\n---\n\n## Related Resources\n\n- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/rag)\n- **Category**: Ai-Apps\n\n- **Description**: Build a RAG application that combines document retrieval with language generation.\n\n\n[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary } "
  },
  {
    "path": "src/ai/examples/text2sql-with-pytidb.md",
    "content": "---\ntitle: Text2SQL\ndescription: \"Convert natural language queries into SQL statements using AI models.\"\nsource_repo: \"https://github.com/pingcap/pytidb/tree/main/examples/text2sql\"\n---\n\n# Text2SQL Demo\n\nThis demo showcases an AI-powered interface that converts natural language questions into SQL queries and executes them against TiDB. Built with PyTiDB, OpenAI GPT, and Streamlit, it provides a seamless way to interact with your database using plain English.\n\n## Prerequisites\n\n- **Python 3.10+**\n- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)\n- **OpenAI API Key**: Get your API key at [OpenAI Platform ↗️](https://platform.openai.com/api-keys)\n\n## How to run\n\n**Step 1**: Clone the repository locally\n\n```bash\ngit clone https://github.com/pingcap/pytidb.git\ncd pytidb/examples/text2sql/\n```\n\n**Step 2**: Install the required packages\n\n```bash\npython -m venv .venv\nsource .venv/bin/activate\npip install -r reqs.txt\n```\n\n**Step 3**: Run the Streamlit app\n\n```bash\nstreamlit run app.py\n```\n\n**Step 4**: Configure the app in your browser\n\nOpen your browser, visit `http://localhost:8501`, and then enter the following in the left sidebar:\n\n* Your OpenAI API key.\n* Your TiDB connection string in the `mysql+pymysql://{USERNAME}:{PASSWORD}@{HOST}:{PORT}/{DATABASE}` format, for example, `mysql+pymysql://root@localhost:4000/test` for a local cluster. For a TiDB Cloud Starter cluster, TLS is required, so append `?ssl_verify_cert=true&ssl_verify_identity=true` to the connection string.\n\n---\n\n## Related Resources\n\n- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/text2sql)\n- **Category**: Ai-Apps\n\n- **Description**: Convert natural language queries into SQL statements using AI models.\n\n\n[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary } "
  },
  {
    "path": "src/ai/examples/vector-search-with-pytidb.md",
    "content": "---\ntitle: Vector Search\ndescription: \"Implement semantic search using vector embeddings to find similar content.\"\nsource_repo: \"https://github.com/pingcap/pytidb/tree/main/examples/vector_search\"\n---\n\n# Vector Search Example\n\nThis example demonstrates how to build a semantic search application using TiDB and local embedding models. It uses vector search to find similar items based on meaning rather than keywords alone. The app uses Streamlit for the web UI and Ollama for local embedding generation.\n\n<p align=\"center\">\n  <img width=\"700\" alt=\"Semantic search with vector embeddings\" src=\"https://github.com/user-attachments/assets/6d7783a5-ce9c-4dcc-8b95-49d5f0ca735a\" />\n  <p align=\"center\"><i>Semantic search with vector embeddings</i></p>\n</p>\n\n## Prerequisites\n\n- **Python 3.10+**\n- **A TiDB Cloud Starter cluster**: Create a free cluster here: [tidbcloud.com ↗️](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme)\n- **Ollama**: Install it from [Ollama ↗️](https://ollama.com/download)\n\n## How to run\n\n**Step 1**: Start the embedding service with Ollama\n\nPull the embedding model:\n\n```bash\nollama pull mxbai-embed-large\n```\n\nTest the embedding service to make sure it is running:\n\n```bash\ncurl http://localhost:11434/api/embed -d '{\n  \"model\": \"mxbai-embed-large\",\n  \"input\": \"Llamas are members of the camelid family\"\n}'\n```\n\n**Step 2**: Clone the repository locally\n\n```bash\ngit clone https://github.com/pingcap/pytidb.git\ncd pytidb/examples/vector_search/\n```\n\n**Step 3**: Install the required packages and set up the environment\n\n```bash\npython -m venv .venv\nsource .venv/bin/activate\npip install -r reqs.txt\n```\n\n**Step 4**: Set up the environment variables to connect to TiDB\n\nGo to the [TiDB Cloud console](https://tidbcloud.com/clusters), get the connection parameters, and then set the environment variables as follows:\n\n```bash\ncat > .env 
<<EOF\nTIDB_HOST={gateway-region}.prod.aws.tidbcloud.com\nTIDB_PORT=4000\nTIDB_USERNAME={prefix}.root\nTIDB_PASSWORD={password}\nTIDB_DATABASE=pytidb_vector_search\nEOF\n```\n\n**Step 5**: Run the Streamlit app\n\n```bash\nstreamlit run app.py\n```\n\n**Step 6**: Open your browser and visit `http://localhost:8501`\n\n\n---\n\n## Related Resources\n\n- **Source Code**: [View on GitHub](https://github.com/pingcap/pytidb/tree/main/examples/vector_search)\n- **Category**: Search\n\n- **Description**: Implement semantic search using vector embeddings to find similar content.\n\n\n[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary } "
  },
  {
    "path": "src/ai/guides/auto-embedding.md",
    "content": "# Auto Embedding\n\nAuto embedding is a feature that automatically generates vector embeddings for your text data.\n\n!!! tip\n\n    For the complete example code, see the [auto embedding example](https://github.com/pingcap/pytidb/blob/main/examples/auto_embedding).\n\n## Basic usage\n\nThis example uses embedding models hosted on TiDB Cloud. For other providers, see the [Supported Providers](../integrations/embedding-overview.md#supported-providers) list.\n\n### Step 1. Define an embedding function\n\n=== \"Python\"\n\n    Define an embedding function to generate vector embeddings for text data.\n\n    ```python\n    from pytidb.embeddings import EmbeddingFunction\n\n    embed_func = EmbeddingFunction(\n        model_name=\"tidbcloud_free/amazon/titan-embed-text-v2\",\n    )\n    ```\n\n### Step 2. Create a table and a vector field\n\n=== \"Python\"\n\n    Use `embed_func.VectorField()` to create a vector field on the table.\n\n    To enable auto embedding, set `source_field` to the field that you want to embed.\n\n    ```python hl_lines=\"7\"\n    from pytidb.schema import TableModel, Field\n    from pytidb.datatype import TEXT\n\n    class Chunk(TableModel):\n        id: int = Field(primary_key=True)\n        text: str = Field(sa_type=TEXT)\n        text_vec: list[float] = embed_func.VectorField(source_field=\"text\")\n\n    # Assumes `client` is a connected TiDBClient instance.\n    table = client.create_table(schema=Chunk, if_exists=\"overwrite\")\n    ```\n\n    You don't need to specify the `dimensions` parameter because the embedding model determines it automatically.\n\n    However, you can set the `dimensions` parameter to override the default dimension.\n\n### Step 3. 
Insert some sample data\n\n=== \"Python\"\n\n    Insert some sample data into the table.\n\n    ```python\n    table.bulk_insert([\n        Chunk(text=\"TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads.\"),\n        Chunk(text=\"PyTiDB is a Python library for developers to connect to TiDB.\"),\n        Chunk(text=\"LlamaIndex is a Python library for building AI-powered applications.\"),\n    ])\n    ```\n\n    When you insert data, the `text_vec` field is automatically populated with vector embeddings generated from the `text` field.\n\n### Step 4. Perform a vector search\n\n=== \"Python\"\n\n    You can pass the query text directly to the `search()` method. The query text is embedded and then used for vector search.\n\n    ```python\n    table.search(\"HTAP database\").limit(3).to_list()\n    ```\n"
  },
  {
    "path": "src/ai/guides/connect.md",
    "content": "# Connect to database\n\nThis guide describes how to connect to a TiDB database using the TiDB client.\n\n## Install the dependencies\n\n[pytidb](https://github.com/pingcap/pytidb) is a Python client built on [SQLAlchemy](https://sqlalchemy.org/). It provides a series of high-level APIs that help you store and search vector embeddings without writing raw SQL.\n\nTo install the Python client, run the following command:\n\n```bash\npip install pytidb\n```\n\n## Connect with connection parameters\n\nChoose the steps based on your deployment type:\n\n=== \"TiDB Cloud Starter\"\n\n    You can create a TiDB Cloud Starter cluster in [TiDB Cloud](https://tidbcloud.com/free-trial/) and then get the connection parameters from the web console.\n\n    1. Navigate to the [Clusters page](https://tidbcloud.com/clusters), and then click the name of your target cluster to go to its overview page.\n    2. Click **Connect** in the upper-right corner. A connection dialog is displayed, listing the connection parameters.\n    3. Copy the connection parameters to your code or environment variables.\n\n    Example code:\n\n    ```python title=\"main.py\"\n    from pytidb import TiDBClient\n\n    db = TiDBClient.connect(\n        host=\"{gateway-region}.prod.aws.tidbcloud.com\",\n        port=4000,\n        username=\"{prefix}.root\",\n        password=\"{password}\",\n        database=\"test\",\n    )\n    ```\n\n    !!! tip\n\n        For TiDB Cloud Starter, a [TLS connection to the database](https://docs.pingcap.com/tidbcloud/secure-connections-to-starter-clusters/) is required when you use a public endpoint. 
TiDB Client **automatically** enables TLS connections for TiDB Cloud Starter clusters.\n\n=== \"TiDB Self-Managed\"\n\n    You can follow [Quick Start with TiDB Self-Managed](https://docs.pingcap.com/tidb/stable/quick-start-with-tidb/#deploy-a-local-test-cluster) to deploy a TiDB cluster for testing.\n\n    Example code:\n\n    ```python title=\"main.py\"\n    from pytidb import TiDBClient\n\n    db = TiDBClient.connect(\n        host=\"{tidb_server_host}\",\n        port=4000,\n        username=\"root\",\n        password=\"{password}\",\n        database=\"test\",\n    )\n    ```\n\n    !!! tip\n\n        If you use `tiup playground` to deploy a TiDB cluster for testing, the default host is `127.0.0.1` and the default password is empty.\n\nOnce connected, you can use the `db` object to manage tables, query data, and more.\n\n## Connect with connection string\n\nIf you prefer to use a connection string (database URL), use the format for your deployment type:\n\n=== \"TiDB Cloud Starter\"\n\n    You can create a TiDB Cloud Starter cluster in [TiDB Cloud](https://tidbcloud.com/free-trial/) and then get the connection parameters from the web console.\n\n    1. Navigate to the [Clusters page](https://tidbcloud.com/clusters), and then click the name of your target cluster to go to its overview page.\n    2. Click **Connect** in the upper-right corner. A connection dialog is displayed, listing the connection parameters.\n    3. Copy the connection parameters and construct the connection string in the following format:\n\n    ```python title=\"main.py\"\n    from pytidb import TiDBClient\n\n    db = TiDBClient.connect(\n        database_url=\"mysql+pymysql://{USERNAME}:{PASSWORD}@{HOST}:{PORT}/{DATABASE}?ssl_verify_cert=true&ssl_verify_identity=true\",\n    )\n    ```\n\n    !!! 
note\n\n        For TiDB Cloud Starter, a [TLS connection to the database](https://docs.pingcap.com/tidbcloud/secure-connections-to-starter-clusters/) is required when you use a public endpoint, so you must set `ssl_verify_cert=true&ssl_verify_identity=true` in the connection string.\n\n=== \"TiDB Self-Managed\"\n\n    Construct the connection string in the following format:\n\n    ```python title=\"main.py\"\n    from pytidb import TiDBClient\n\n    db = TiDBClient.connect(\n        database_url=\"mysql+pymysql://{USERNAME}:{PASSWORD}@{HOST}:{PORT}/{DATABASE}\",\n    )\n    ```\n\n    !!! tip\n\n        If you use `tiup playground` to deploy a TiDB cluster for testing, the connection string is:\n\n        ```text\n        mysql+pymysql://root:@127.0.0.1:4000/test\n        ```\n\n## Connect with SQLAlchemy DB engine\n\nIf your application already has a SQLAlchemy database engine, you can reuse it through the `db_engine` parameter:\n\n```python title=\"main.py\"\nfrom pytidb import TiDBClient\nfrom sqlalchemy import create_engine\n\n# Your existing engine; this one points to a local tiup playground cluster.\ndb_engine = create_engine(\"mysql+pymysql://root:@127.0.0.1:4000/test\")\ndb = TiDBClient(db_engine=db_engine)\n```\n\n## Next steps\n\nAfter connecting to your TiDB database, you can explore the following guides to learn how to work with your data:\n\n- [Working with Tables](./tables.md): Learn how to define and manage tables in TiDB.\n- [Vector Search](./vector-search.md): Perform semantic search using vector embeddings.\n- [Fulltext Search](./fulltext-search.md): Retrieve documents using keyword-based search.\n- [Hybrid Search](./hybrid-search.md): Combine vector and full-text search for more relevant results.\n"
  },
  {
    "path": "src/ai/guides/filtering.md",
    "content": "# Filtering\n\nAs a relational database, TiDB supports a rich set of [SQL operators](https://docs.pingcap.com/tidbcloud/operators/) and allows flexible combinations of filtering conditions that enable you to query your data precisely.\n\n## Overview\n\nYou can not only apply filtering on scalar fields but also on JSON fields. Filtering on JSON fields is often used for [metadata filtering](./vector-search.md#metadata-filtering) in vector search.\n\n=== \"Python\"\n\nFor PyTiDB, you can apply filtering by passing a **filters** parameter to the `table.query()`, `table.delete()`, `table.update()`, and `table.search()` methods.\n\nThe **filters** parameter supports two formats: [Dictionary Filters](#dictionary-filters) and [SQL String Filters](#sql-string-filters).\n\n## Dictionary Filters\n\n=== \"Python\"\n\nPyTiDB allows you to define filter conditions using a Python dictionary with operators as the **filters** parameter.\n\nThe dictionary structure of **filters** is as follows:\n\n```python\n{\n    \"<key>\": {\n        \"<operator>\": <value>\n    },\n    ...\n}\n```\n\n- `<key>`: The key can be a column name, a JSON path expression to access a JSON field (see [Metadata filtering](./vector-search.md#metadata-filtering)), or a [logical operator](#logical-operators).\n- `<operator>`: The operator can be a [compare operator](#compare-operators) or an [inclusion operator](#inclusion-operators).\n- `<value>`: The value can be a scalar value, an array, it depends on the operator.\n\n**Example: Filter records where `created_at` is greater than 2024-01-01**\n\n```python\ntable.query({\n    # The `created_at` is a scalar field with DATETIME type\n    \"created_at\": {\n        \"$gt\": \"2024-01-01\"\n    }\n})\n```\n\n**Example: Filter records where `meta.category` is in the array [\"tech\", \"science\"]**\n\n```python\nresults = (\n    table.search(\"some query\", search_type=\"vector\")\n        .filter({\n            # The `meta` is a JSON field, and its 
value is a JSON object like {\"category\": \"tech\"}\n            \"meta.category\": {\n                \"$in\": [\"tech\", \"science\"]\n            }\n        })\n        .limit(10)\n        .to_list()\n)\n```\n\n### Compare operators\n\nYou can use the following compare operators to filter records:\n\n| Operator | Description                       |\n|----------|-----------------------------------|\n| `$eq`    | Equal to value                    |\n| `$ne`    | Not equal to value                |\n| `$gt`    | Greater than value                |\n| `$gte`   | Greater than or equal to value    |\n| `$lt`    | Less than value                   |\n| `$lte`   | Less than or equal to value       |\n\n**Example: Filter records where `user_id` is equal to 1**\n\n```python\n{\n    \"user_id\": {\n        \"$eq\": 1\n    }\n}\n```\n\nYou can omit the `$eq` operator. The following filter is equivalent to the previous one:\n\n```python\n{\n    \"user_id\": 1\n}\n```\n\n### Inclusion operators\n\nYou can use the following inclusion operators to filter records:\n\n| Operator | Description                       |\n|----------|-----------------------------------|\n| `$in`    | In array (string, int, or float)  |\n| `$nin`   | Not in array (string, int, or float) |\n\n**Example: Filter records where `category` is in the array [\"tech\", \"science\"]**\n\n```python\n{\n    \"category\": {\n        \"$in\": [\"tech\", \"science\"]\n    }\n}\n```\n\n### Logical operators\n\nYou can use the logical operators `$and` and `$or` to combine multiple filters.\n\n| Operator | Description                                         |\n|----------|-----------------------------------------------------|\n| `$and`   | Returns results that match **all** filters in the list |\n| `$or`    | Returns results that match **any** filter in the list |\n\n**Syntax for using `$and` or `$or`:**\n\n```python\n{\n    \"$and|$or\": [\n        {\n            \"field_name\": {\n                <operator>: <value>\n           
 }\n        },\n        {\n            \"field_name\": {\n                <operator>: <value>\n            }\n        }\n        ...\n    ]\n}\n```\n\n**Example: Use `$and` to combine multiple filters**\n\n```python\n{\n    \"$and\": [\n        {\n            \"created_at\": {\n                \"$gt\": \"2024-01-01\"\n            }\n        },\n        {\n            \"meta.category\": {\n                \"$in\": [\"tech\", \"science\"]\n            }\n        }\n    ]\n}\n```\n\n## SQL String Filters\n\n=== \"Python\"\n\nYou can also use a SQL string as the `filters` parameter. The string must be a valid SQL `WHERE` clause (without the `WHERE` keyword) that uses TiDB's SQL syntax.\n\n**Example: Filter records where `created_at` is greater than 2024-01-01**\n\n```python\nresults = table.query(\n    filters=\"created_at > '2024-01-01'\",\n    limit=10\n).to_list()\n```\n\n**Example: Filter records where the JSON field `meta.category` equals 'tech'**\n\n```python\nresults = table.query(\n    filters=\"meta->>'$.category' = 'tech'\",\n    limit=10\n).to_list()\n```\n\nYou can combine multiple conditions using `AND`, `OR`, and parentheses, and use any [SQL operators](https://docs.pingcap.com/tidbcloud/operators/) supported by TiDB.\n\n!!! warning\n\n    When you use SQL string filters with dynamic user input, always validate the input to prevent [SQL injection](https://en.wikipedia.org/wiki/SQL_injection) vulnerabilities.\n"
  },
  {
    "path": "src/ai/guides/fulltext-search.md",
    "content": "# Full-text Search\n\n**Full-text search** enables you to find documents or data by matching keywords or phrases within the entire text content. It is widely used in search engines, document management, e-commerce, and any scenario where users need to search large volumes of unstructured or semi-structured text.\n\nTiDB provides full-text search capabilities for **massive datasets** with high performance and built-in **multilingual support**.\n\n!!! note\n\n    Full-text search is currently in the early stages with limited accessibility. It is only available for **TiDB Cloud Starter** in the following regions:\n    \n    - **Frankfurt (eu-central-1)**\n    - **Singapore (ap-southeast-1)**\n  \n    If you have feedback or need help, feel free to reach out to us on [Discord](https://discord.gg/zcqexutz2R).\n\n!!! tip\n\n    For a complete example of full-text search, see the [E-commerce product search demo](../examples/fulltext-search-with-pytidb.md).\n\n## Basic Usage\n\n### Step 1. Create Table and Full-text Index\n\n=== \"Python\"\n\n    You can use `FullTextField` to define a text field with full-text search enabled. The `fts_parser` parameter in Python corresponds to the `WITH PARSER` clause in SQL.\n\n    For example, the following code creates a table with a full-text index on the `title` column:\n\n    ```python hl_lines=\"6\"\n    from pytidb.schema import TableModel, Field, FullTextField\n\n    class Item(TableModel):\n        __tablename__ = \"items\"\n        id: int = Field(primary_key=True)\n        title: str = FullTextField(fts_parser=\"MULTILINGUAL\")\n\n    table = client.create_table(schema=Item, if_exists=\"overwrite\")\n    ```\n\n    The `fts_parser` parameter specifies the parser for the full-text index. 
Supported values:\n    \n    - `STANDARD`: Fast and suited to English content. Splits words by spaces and punctuation.\n    - `MULTILINGUAL` (default): Supports multiple languages, including English, Chinese, Japanese, and Korean.\n\n=== \"SQL\"\n\n    Create a table with a full-text index:\n\n    ```sql hl_lines=\"4\"\n    CREATE TABLE items(\n        id INT PRIMARY KEY,\n        title TEXT,\n        FULLTEXT INDEX (title) WITH PARSER MULTILINGUAL\n    );\n    ```\n\n    You can also add a full-text index to an existing table with a separate statement:\n\n    ```sql\n    CREATE TABLE items(\n        id INT PRIMARY KEY,\n        title TEXT\n    );\n\n    ALTER TABLE items ADD FULLTEXT INDEX (title)\n    WITH PARSER MULTILINGUAL ADD_COLUMNAR_REPLICA_ON_DEMAND;\n    ```\n\n    The `WITH PARSER <PARSER_NAME>` clause supports the following parsers:\n\n    - `STANDARD`: Fast and suited to English content. Splits words by spaces and punctuation.\n    - `MULTILINGUAL`: Supports multiple languages, including English, Chinese, Japanese, and Korean.\n\n### Step 2. 
Insert Sample Data\n\nFor demonstration purposes, the following sample data covers English, Japanese, and Chinese text.\n\n=== \"Python\"\n\n    You can use the `bulk_insert` method to insert sample data into the table.\n\n    ```python\n    table.bulk_insert([\n        Item(id=1, title=\"Bluetooth Earphones, HiFi sound, 48h battery, Fast charge, Low latency\"),\n        Item(id=2, title=\"Bluetooth 5.3 Headphones, Noise Cancelling, Immersive sound, Comfortable\"),\n        Item(id=3, title=\"IPX7 Waterproof Earbuds, Sport ready, Touch control, High-quality music\"),\n        Item(id=4, title=\"Sports Earbuds, Secure fit, Sweatproof, Long battery, Workout support\"),\n        Item(id=5, title=\"Wired Headphones, Studio-grade, HD sound, Comfortable, Pro music experience\"),\n        Item(id=6, title=\"Bluetoothイヤホン HiFi音質 48hバッテリー 急速充電 低遅延\"),\n        Item(id=7, title=\"Bluetooth5.3ヘッドホン ノイズキャンセリング 没入サウンド 快適装着\"),\n        Item(id=8, title=\"IPX7防水イヤホン スポーツ対応 タッチ操作 高音質音楽\"),\n        Item(id=9, title=\"スポーツイヤホン 安定装着 防汗 長持ちバッテリー ワークアウト対応\"),\n        Item(id=10, title=\"有線ヘッドホン スタジオ級 HDサウンド 快適装着 プロ音楽体験\"),\n        Item(id=11, title=\"无线蓝牙耳机 HiFi音质 48小时超长续航 快速充电 低延迟\"),\n        Item(id=12, title=\"蓝牙5.3降噪头戴式耳机 杜比全景声 沉浸音效 舒适佩戴 畅享静谧音乐时光\"),\n        Item(id=13, title=\"IPX7防水真无线耳机 运动无忧 智能触控 随时畅听高品质音乐\"),\n        Item(id=14, title=\"运动专用耳机 稳固佩戴 防汗设计 超长续航 低延迟音频 高清通话\"),\n        Item(id=15, title=\"录音室级有线耳机 高清音质 舒适佩戴 可拆卸线材 多设备兼容 降噪麦克风\"),\n    ])\n    ```\n\n=== \"SQL\"\n\n    You can use the `INSERT INTO` statement to insert the sample data into the table.\n\n    ```sql\n    INSERT INTO items (id, title) VALUES\n        (1, 'Bluetooth Earphones, HiFi sound, 48h battery, Fast charge, Low latency'),\n        (2, 'Bluetooth 5.3 Headphones, Noise Cancelling, Immersive sound, Comfortable'),\n        (3, 'IPX7 Waterproof Earbuds, Sport ready, Touch control, High-quality music'),\n        (4, 'Sports Earbuds, Secure fit, Sweatproof, Long battery, Workout support'),\n        
(5, 'Wired Headphones, Studio-grade, HD sound, Comfortable, Pro music experience'),\n        (6, 'Bluetoothイヤホン HiFi音質 48hバッテリー 急速充電 低遅延'),\n        (7, 'Bluetooth5.3ヘッドホン ノイズキャンセリング 没入サウンド 快適装着'),\n        (8, 'IPX7防水イヤホン スポーツ対応 タッチ操作 高音質音楽'),\n        (9, 'スポーツイヤホン 安定装着 防汗 長持ちバッテリー ワークアウト対応'),\n        (10, '有線ヘッドホン スタジオ級 HDサウンド 快適装着 プロ音楽体験'),\n        (11, '无线蓝牙耳机 HiFi音质 48小时超长续航 快速充电 低延迟'),\n        (12, '蓝牙5.3降噪头戴式耳机 杜比全景声 沉浸音效 舒适佩戴 畅享静谧音乐时光'),\n        (13, 'IPX7防水真无线耳机 运动无忧 智能触控 随时畅听高品质音乐'),\n        (14, '运动专用耳机 稳固佩戴 防汗设计 超长续航 低延迟音频 高清通话'),\n        (15, '录音室级有线耳机 高清音质 舒适佩戴 可拆卸线材 多设备兼容 降噪麦克风');\n    ```\n\n### Step 3. Perform a Full-text Search\n\n=== \"Python\"\n\n    To perform a full-text search with PyTiDB, use the `search()` method and set the `search_type` parameter to `\"fulltext\"`.\n\n    **Example: Search for the 3 most relevant documents**\n\n    ```python\n    import json\n\n    results = table.search(\"Bluetooth Headphones\", search_type=\"fulltext\").limit(3).to_list()\n    print(json.dumps(results, indent=2, ensure_ascii=False))\n    ```\n\n    ```json title=\"Execution result\"\n    [\n        {\n            \"id\": 2,\n            \"title\": \"Bluetooth 5.3 Headphones, Noise Cancelling, Immersive sound, Comfortable\",\n            \"_match_score\": 3.7390857,\n            \"_score\": 3.7390857\n        },\n        {\n            \"id\": 5,\n            \"title\": \"Wired Headphones, Studio-grade, HD sound, Comfortable, Pro music experience\",\n            \"_match_score\": 1.9798478,\n            \"_score\": 1.9798478\n        },\n        {\n            \"id\": 1,\n            \"title\": \"Bluetooth Earphones, HiFi sound, 48h battery, Fast charge, Low latency\",\n            \"_match_score\": 1.620981,\n            \"_score\": 1.620981\n        }\n    ]\n    ```\n\n    The results are sorted by relevance, with the most relevant documents listed first.\n\n    The `_match_score` (or `_score`) field indicates the relevance score of each 
document, calculated 
using the [BM25](https://en.wikipedia.org/wiki/Okapi_BM25) algorithm, a widely used ranking function in information retrieval.\n\n    **Example: Search for the 3 most relevant documents in another language**\n\n    ```python\n    results = table.search(\"蓝牙耳机\", search_type=\"fulltext\").limit(3).to_list()\n    print(json.dumps(results, indent=2, ensure_ascii=False))\n    ```\n\n    ```json title=\"Execution result\"\n    [\n        {\n            \"id\": 11,\n            \"title\": \"无线蓝牙耳机 HiFi音质 48小时超长续航 快速充电 低延迟\",\n            \"_match_score\": 3.000002,\n            \"_score\": 3.000002\n        },\n        {\n            \"id\": 12,\n            \"title\": \"蓝牙5.3降噪头戴式耳机 杜比全景声 沉浸音效 舒适佩戴 畅享静谧音乐时光\",\n            \"_match_score\": 2.5719738,\n            \"_score\": 2.5719738\n        },\n        {\n            \"id\": 14,\n            \"title\": \"运动专用耳机 稳固佩戴 防汗设计 超长续航 低延迟音频 高清通话\",\n            \"_match_score\": 1.1418362,\n            \"_score\": 1.1418362\n        }\n    ]\n    ```\n\n=== \"SQL\"\n\n    To perform a full-text search, use the `fts_match_word()` function.\n\n    **Example: Search for the 3 most relevant documents**\n\n    ```sql\n    SELECT *, fts_match_word(\"Bluetooth Headphones\", title) AS score\n    FROM items\n    WHERE fts_match_word(\"Bluetooth Headphones\", title)\n    ORDER BY score DESC\n    LIMIT 3;\n    ```\n\n    ```plain title=\"Execution result\"\n    +----+-----------------------------------------------------------------------------+-----------+\n    | id | title                                                                       | score     |\n    +----+-----------------------------------------------------------------------------+-----------+\n    |  2 | Bluetooth 5.3 Headphones, Noise Cancelling, Immersive sound, Comfortable    | 3.7390857 |\n    |  5 | Wired Headphones, Studio-grade, HD sound, Comfortable, Pro music experience | 1.9798478 |\n    |  1 | Bluetooth Earphones, HiFi sound, 48h battery, Fast charge, Low 
latency      |  1.620981 |\n    +----+-----------------------------------------------------------------------------+-----------+\n    ```\n\n    The results are ordered by relevance, with the most relevant documents first.\n\n    **Example: Search for the 3 most relevant documents in another language**\n\n    ```sql\n    SELECT *, fts_match_word(\"蓝牙耳机\", title) AS score\n    FROM items\n    WHERE fts_match_word(\"蓝牙耳机\", title)\n    ORDER BY score DESC\n    LIMIT 3;\n    ```\n\n    ```plain title=\"Execution result\"\n    +----+------------------------------------------------------------------+-----------+\n    | id | title                                                            | score     |\n    +----+------------------------------------------------------------------+-----------+\n    | 11 | 无线蓝牙耳机 HiFi音质 48小时超长续航 快速充电 低延迟                    |  3.000002 |\n    | 12 | 蓝牙5.3降噪头戴式耳机 杜比全景声 沉浸音效 舒适佩戴 畅享静谧音乐时光        | 2.5719738 |\n    | 14 | 运动专用耳机 稳固佩戴 防汗设计 超长续航 低延迟音频 高清通话               | 1.1418362 |\n    +----+------------------------------------------------------------------+-----------+\n    ```\n\n## See Also\n\nIn Retrieval-Augmented Generation (RAG) scenarios, combining full-text search with vector search often produces better results.\n\n- Learn how to combine these approaches in the [hybrid search guide](./hybrid-search.md).\n- For more on vector search, see the [vector search guide](../concepts/vector-search.md)."
  },
  {
    "path": "src/ai/guides/hybrid-search.md",
    "content": "# Hybrid Search\n\nHybrid search is a technique that combines multiple search algorithms to deliver more accurate and relevant results.\n\nTiDB supports both semantic search (also known as vector search) and keyword-based search (full-text search). By leveraging the strengths of both approaches, you can achieve superior search results through hybrid search.\n\n<p align=\"center\">\n    <img src=\"https://docs-download.pingcap.com/media/images/docs/vector-search/hybrid-search-overview.svg\" alt=\"hybrid search overview\" width=\"800\"/>\n</p>\n\n!!! tip\n\n    For a complete example of hybrid search, refer to the [hybrid-search example](../examples/hybrid-search-with-pytidb.md).\n\n\n## Basic Usage\n\n### Step 1. Define an Embedding Function\n\nDefine an embedding function to generate vector representations of text data.\n\n```python\nfrom pytidb.embeddings import EmbeddingFunction\n\nembed_fn = EmbeddingFunction(\n    model_name=\"openai/text-embedding-3-small\",\n    api_key=os.getenv(\"OPENAI_API_KEY\"),\n)\n```\n\n### Step 2. Create a Table with Vector and Full-Text Indexes\n\n=== \"Python\"\n\n    After you have [connected to your TiDB database](./connect.md) using `TiDBClient` and get the `client` instance:\n\n    You can now create a table with both a `FullTextField` and a `VectorField` to store the text data and its vector embedding.\n\n    Example:\n\n    ```python\n    from pytidb.schema import TableModel, Field, FullTextField\n\n    class Chunk(TableModel):\n        __tablename__ = \"chunks_for_hybrid_search\"\n        id: int = Field(primary_key=True)\n        text: str = FullTextField()\n        text_vec: list[float] = embed_fn.VectorField(source_field=\"text\")\n\n    table = client.create_table(schema=Chunk, if_exists=\"overwrite\")\n    ```\n\n    In this example, PyTiDB will automatically create a full-text index on the `text` column and a vector index on the `text_vec` column.\n\n### Step 3. 
Insert Sample Data\n\n=== \"Python\"\n\n    Use the `bulk_insert()` method to insert sample data into the table.\n\n    ```python\n    table.truncate()\n    table.bulk_insert([\n        Chunk(\n            text=\"TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads.\",\n        ),\n        Chunk(\n            text=\"PyTiDB is a Python library for developers to connect to TiDB.\",\n        ),\n        Chunk(\n            text=\"LlamaIndex is a Python library for building AI-powered applications.\",\n        ),\n    ])\n    ```\n\n    PyTiDB automatically populates the `text_vec` field with the vector embedding of the text data through the [Auto Embedding](../guides/auto-embedding.md) feature.\n\n### Step 4. Perform Hybrid Search\n\nTo perform a hybrid search, set the `search_type` parameter to `\"hybrid\"` when you call the `search()` method.\n\n```python\nimport json\n\nresults = (\n    table.search(\n        \"AI database\", search_type=\"hybrid\"\n    )\n    .limit(3)\n    .to_list()\n)\n\nfor item in results:\n    item.pop(\"text_vec\")\nprint(json.dumps(results, indent=4, sort_keys=True))\n```\n\nThe search results contain three special fields:\n\n- `_distance`: The distance between the query vector and the vector data in the table, as returned by the vector search.\n- `_match_score`: The match score between the query and the text field, as returned by the full-text search.\n- `_score`: The final score of the search result, calculated by the fusion algorithm.\n\n```json title=\"Output\"\n[\n    {\n        \"_distance\": 0.4740166257687124,\n        \"_match_score\": 1.6804268,\n        \"_score\": 0.03278688524590164,\n        \"id\": 60013,\n        \"text\": \"TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads.\"\n    },\n    {\n        \"_distance\": 0.6428459116216618,\n        \"_match_score\": 0.78427225,\n        \"_score\": 0.03200204813108039,\n        \"id\": 60015,\n        \"text\": 
\"LlamaIndex is a Python library for 
building AI-powered applications.\"\n    },\n    {\n        \"_distance\": 0.641581407158715,\n        \"_match_score\": null,\n        \"_score\": 0.016129032258064516,\n        \"id\": 60014,\n        \"text\": \"PyTiDB is a Python library for developers to connect to TiDB.\"\n    }\n]\n```\n\n## Fusion Methods\n\nFusion methods combine results from vector (semantic) and full-text (keyword) searches into a single, unified ranking. As a result, the final ranking reflects both semantic relevance and keyword matches.\n\nPyTiDB supports two fusion methods:\n\n- `rrf`: Reciprocal Rank Fusion (default)\n- `weighted`: Weighted Score Fusion\n\nChoose the fusion method that best fits your use case.\n\n### Reciprocal Rank Fusion (RRF)\n\nReciprocal Rank Fusion (RRF) is an algorithm that scores each document by its rank in each result set rather than by its raw score. The RRF score of a document is the sum of `1 / (k + rank)` over the result sets that contain it.\n\nFor more details, see the [RRF paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf).\n\n=== \"Python\"\n\n    Enable reciprocal rank fusion by specifying the `method` parameter as `\"rrf\"` in the `.fusion()` method.\n\n    ```python\n    results = (\n        table.search(\n            \"AI database\", search_type=\"hybrid\"\n        )\n        .fusion(method=\"rrf\")\n        .limit(3)\n        .to_list()\n    )\n    ```\n\n    Parameters:\n\n    - `k`: A smoothing constant (default: 60) added to each rank in the `1 / (k + rank)` term. It controls how strongly the top-ranked documents dominate the fused score.\n\n### Weighted Score Fusion\n\nWeighted Score Fusion combines vector search and full-text search scores using a weighted sum:\n\n```python\nfinal_score = vs_weight * vector_score + fts_weight * fulltext_score\n```\n\n=== \"Python\"\n\n    Enable weighted score fusion by specifying the `method` parameter as `\"weighted\"` in the `.fusion()` method.\n\n    For example, to give more weight to vector search, set the 
`vs_weight` parameter to 0.7 and the `fts_weight` parameter to 0.3:\n\n    
```python\n    results = (\n        table.search(\n            \"AI database\", search_type=\"hybrid\"\n        )\n        .fusion(method=\"weighted\", vs_weight=0.7, fts_weight=0.3)\n        .limit(3)\n        .to_list()\n    )\n    ```\n\n    Parameters:\n\n    - `vs_weight`: The weight of the vector search score.\n    - `fts_weight`: The weight of the full-text search score.\n\n## Rerank Method\n\nHybrid search also supports reranking with dedicated reranker models.\n\n=== \"Python\"\n\n    Use the `rerank()` method to specify a reranker that reorders the search results by the relevance between the query and each document.\n\n    **Example: Use the Jina AI reranker to rerank the hybrid search results**\n\n    ```python\n    from pytidb.rerankers import Reranker\n\n    reranker = Reranker(\n        # Use the `jina-reranker-m0` model\n        model_name=\"jina_ai/jina-reranker-m0\",\n        api_key=\"{your-jinaai-api-key}\"\n    )\n\n    results = (\n        table.search(\n            \"AI database\", search_type=\"hybrid\"\n        )\n        .fusion(method=\"rrf\", k=60)\n        .rerank(reranker, \"text\")\n        .limit(3)\n        .to_list()\n    )\n    ```\n\n    For other reranker models, see the [Reranking](../guides/reranking.md) guide.\n"
  },
  {
    "path": "src/ai/guides/image-search.md",
    "content": "# Image search\n\n**Image search** helps you find similar images by comparing their visual content, not just text or metadata. This feature is useful for e-commerce, content moderation, digital asset management, and any scenario where you need to search for or deduplicate images based on appearance.\n\nTiDB enables image search using **vector search**. With automatic embedding, you can generate image embeddings from image URLs, PIL images, or keyword text using a multimodal embedding model. TiDB then efficiently searches for similar vectors at scale.\n\n!!! tip\n\n    For a complete example of image search, see the [Pet image search demo](../examples/image-search-with-pytidb.md).\n\n## Basic usage\n\n### Step 1. Define an embedding function\n\nTo generate image embeddings, you need an embedding model that supports image input.\n\nFor demonstration, you can use Jina AI's multimodal embedding model to generate image embeddings.\n\nGo to [Jina AI](https://jina.ai/embeddings) to create an API key, then initialize the embedding function as follows:\n\n```python hl_lines=\"7\"\nfrom pytidb.embeddings import EmbeddingFunction\n\nimage_embed = EmbeddingFunction(\n    # Or another provider/model that supports multimodal input\n    model_name=\"jina_ai/jina-embedding-v4\",\n    api_key=\"{your-jina-api-key}\",\n    multimodal=True,\n)\n```\n\n### Step 2. Create a table and vector field\n\nUse `VectorField()` to define a vector field for storing image embeddings. Set the `source_field` parameter to specify the field that stores image URLs.\n\n```python\nfrom pytidb.schema import TableModel, Field\n\nclass ImageItem(TableModel):\n    __tablename__ = \"image_items\"\n    id: int = Field(primary_key=True)\n    image_uri: str = Field()\n    image_vec: list[float] = image_embed.VectorField(\n        source_field=\"image_uri\"\n    )\n\ntable = client.create_table(schema=ImageItem, if_exists=\"overwrite\")\n```\n\n### Step 3. 
Insert image data\n\nWhen you insert data, PyTiDB automatically populates the `image_vec` field with the embedding generated from the `image_uri` value.\n\n```python\ntable.bulk_insert([\n    ImageItem(image_uri=\"https://example.com/image1.jpg\"),\n    ImageItem(image_uri=\"https://example.com/image2.jpg\"),\n    ImageItem(image_uri=\"https://example.com/image3.jpg\"),\n])\n```\n\n### Step 4. Perform image search\n\nImage search is a type of vector search. Automatic embedding lets you input an image URL, PIL image, or keyword text directly. The embedding model converts each of these inputs into a vector embedding for similarity matching.\n\n#### Option 1: Search by image URL\n\nSearch for similar images by providing an image URL:\n\n```python\nresults = table.search(\"https://example.com/query.jpg\").limit(3).to_list()\n```\n\nThe client converts the input image URL into a vector. TiDB then finds and returns the most similar images by comparing their vectors.\n\n#### Option 2: Search by PIL image\n\nYou can also search for similar images by providing a PIL image object, for example, one loaded from a local file:\n\n```python\nfrom PIL import Image\n\nimage = Image.open(\"/path/to/query.jpg\")\n\nresults = table.search(image).limit(3).to_list()\n```\n\nThe client converts the PIL image object into a Base64 string before sending it to the embedding model.\n\n#### Option 3: Search by keyword text\n\nYou can also search for similar images by providing keyword text. 
\n\nFor example, if you are working on a pet image dataset, you can search for similar images with keywords such as \"orange tabby cat\" or \"golden retriever puppy\".\n\n```python\nresults = table.search(\"orange tabby cat\").limit(3).to_list()\n```\n\nThe multimodal embedding model converts the keyword text into a vector embedding that captures its semantic meaning. TiDB then performs a vector search to find the images whose embeddings are most similar to the keyword embedding.\n\n## See also\n\n- [Automatic embedding guide](./auto-embedding.md)\n- [Vector search guide](../concepts/vector-search.md)\n- [Pet image search demo](../examples/image-search-with-pytidb.md)\n"
  },
  {
    "path": "src/ai/guides/joins.md",
    "content": "# Multiple Table Joins\n\nAs a relational database, TiDB allows you to store diverse data in tables with different structures (for example: `chunks`, `documents`, `users`, `chats`) in a single database. Moreover, you can use joins to combine data from multiple tables to perform complex queries.\n\n## Basic Usage\n\n\n### Step 1. Create tables and insert sample data\n\n=== \"Python\"\n\n    Assuming you have already [connected to the TiDB database](./connect.md) via TiDBClient:\n\n    Create a `documents` table and insert some sample data:\n\n    ```python\n    from pytidb import Session\n    from pytidb.schema import TableModel, Field\n    from pytidb.sql import select\n\n    class Document(TableModel):\n        __tablename__ = \"documents\"\n        id: int = Field(primary_key=True)\n        title: str = Field(max_length=255)\n\n    client.create_table(schema=Document, if_exists=\"overwrite\")\n    client.table(\"documents\").truncate()\n    client.table(\"documents\").bulk_insert([\n        Document(id=1, title=\"The Power of Positive Thinking\"),\n        Document(id=2, title=\"The Happiness Advantage\"),\n        Document(id=3, title=\"The Art of Happiness\"),\n    ])\n    ```\n\n    Create a `chunks` table and insert some sample data:\n\n    ```python\n    class Chunk(TableModel):\n        __tablename__ = \"chunks\"\n        id: int = Field(primary_key=True)\n        text: str = Field(max_length=255)\n        document_id: int = Field(foreign_key=\"documents.id\")\n\n    client.create_table(schema=Chunk, if_exists=\"overwrite\")\n    client.table(\"chunks\").truncate()\n    client.table(\"chunks\").bulk_insert([\n        Chunk(id=1, text=\"Positive thinking can change your life\", document_id=1),\n        Chunk(id=2, text=\"Happiness leads to success\", document_id=2),\n        Chunk(id=3, text=\"Finding joy in everyday moments\", document_id=3),\n    ])\n    ```\n\n=== \"SQL\"\n\n    Create a `documents` table and insert some sample data:\n\n   
 ```sql\n    CREATE TABLE documents (\n        id INT PRIMARY KEY,\n        title VARCHAR(255) NOT NULL\n    );\n\n    INSERT INTO documents (id, title) VALUES\n        (1, 'The Power of Positive Thinking'),\n        (2, 'The Happiness Advantage'),\n        (3, 'The Art of Happiness');\n    ```\n\n    Create a `chunks` table and insert some sample data:\n\n    ```sql\n    CREATE TABLE chunks (\n        id INT PRIMARY KEY,\n        text VARCHAR(255) NOT NULL,\n        document_id INT NOT NULL,\n        FOREIGN KEY (document_id) REFERENCES documents(id)\n    );\n\n    INSERT INTO chunks (id, text, document_id) VALUES\n        (1, 'Positive thinking can change your life', 1),\n        (2, 'Happiness leads to success', 2),\n        (3, 'Finding joy in everyday moments', 3);\n    ```\n\n### Step 2. Perform a join query\n\n=== \"Python\"\n\n    Perform a join query to combine data from the `chunks` and `documents` tables:\n\n    ```python\n    with Session(client.db_engine) as db_session:\n        query = (\n            select(Chunk)\n            .join(Document, Chunk.document_id == Document.id)\n            .where(Document.title == \"The Power of Positive Thinking\")\n        )\n        chunks = db_session.exec(query).all()\n\n    [(c.id, c.text, c.document_id) for c in chunks]\n    ```\n\n=== \"SQL\"\n\n    Perform a join query to combine data from the `chunks` and `documents` tables:\n\n    ```sql\n    SELECT c.id, c.text, c.document_id\n    FROM chunks c\n    JOIN documents d ON c.document_id = d.id\n    WHERE d.title = 'The Power of Positive Thinking';\n    ```\n"
  },
  {
    "path": "src/ai/guides/raw-queries.md",
    "content": "# Raw queries\n\n## Operate data with raw SQL\n\nYou can use the `client.execute()` method to execute `INSERT`, `UPDATE`, `DELETE`, and other data manipulation SQL statements.\n\n```python\nclient.execute(\"INSERT INTO chunks(text, user_id) VALUES ('sample text', 5)\")\n```\n\n### SQL injection prevention\n\nBoth the `execute()` and `query()` methods support **parameterized SQL**, which helps you avoid [SQL injection](https://en.wikipedia.org/wiki/SQL_injection) when you build dynamic SQL statements. Use named placeholders such as `:text` in the SQL statement and pass the values in a dictionary.\n\n```python\nclient.execute(\n    \"INSERT INTO chunks(text, user_id) VALUES (:text, :user_id)\",\n    {\n        \"text\": \"sample text\",\n        \"user_id\": 6,\n    },\n)\n```\n\n## Query data with raw SQL\n\nYou can use the `client.query()` method to execute `SELECT`, `SHOW`, and other query SQL statements.\n\n### Output query result\n\nThe `client.query()` method returns a `SQLQueryResult` instance that provides the following helper methods:\n\n- `to_pydantic()`\n- `to_list()`\n- `to_pandas()`\n- `to_rows()`\n- `scalar()`\n\n#### As Pydantic models\n\nThe `to_pydantic()` method returns a list of Pydantic models.\n\n```python\nclient.query(\"SELECT id, text, user_id FROM chunks\").to_pydantic()\n```\n\n#### As SQLAlchemy result rows\n\nThe `to_rows()` method returns a list of tuples, where each tuple represents one row of data.\n\n```python\nclient.query(\"SHOW TABLES;\").to_rows()\n```\n\n#### As a list of dictionaries\n\nThe `to_list()` method converts the query result into a list of dictionaries.\n\n```python\nclient.query(\n    \"SELECT id, text, user_id FROM chunks WHERE user_id = :user_id\",\n    {\n        \"user_id\": 3\n    }\n).to_list()\n```\n\n#### As a pandas DataFrame\n\nThe `to_pandas()` method converts the query result to a `pandas.DataFrame`, which is displayed in a human-friendly format in notebooks.\n\n```python\nclient.query(\"SELECT id, text, user_id FROM 
chunks\").to_pandas()\n```\n\n#### As a scalar value\n\nThe `scalar()` method returns the first column of the first row of the result set.\n\n```python\nclient.query(\"SELECT COUNT(*) FROM chunks;\").scalar()\n```"
  },
  {
    "path": "src/ai/guides/reranking.md",
    "content": "# Reranking\n\nReranking is a technique that improves the relevance and accuracy of search results by re-evaluating and reordering them with a dedicated reranking model.\n\nThe search process works in two stages:\n\n1. **Initial retrieval**: Vector search identifies the top `k` most similar documents in the collection.\n2. **Reranking**: A reranking model scores these `k` documents based on the relevance between the query and each document, and then reorders them to produce the final top `n` results (where `n` ≤ `k`).\n\nThis two-stage retrieval approach can significantly improve the relevance and accuracy of the returned documents.\n\n## Basic usage\n\n=== \"Python\"\n\n    PyTiDB provides the `Reranker` class, which lets you use reranker models from multiple third-party providers.\n\n    1. Create a reranker instance:\n\n        ```python\n        from pytidb.rerankers import Reranker\n\n        reranker = Reranker(model_name=\"{provider}/{model_name}\")\n        ```\n\n    2. Apply the reranker to the search results using the `.rerank()` method:\n\n        ```python\n        table.search(\"{query}\").rerank(reranker, \"{field_to_rerank}\").limit(3).to_list()\n        ```\n\n## Supported providers\n\nThe following examples show how to use reranker models from third-party providers.\n\n### Jina AI\n\nTo use the rerankers provided by Jina AI, go to the [Jina AI website](https://jina.ai/reranker) to create an API key.\n\nFor example:\n\n```python\njinaai = Reranker(\n    # Use the `jina-reranker-m0` model\n    model_name=\"jina_ai/jina-reranker-m0\",\n    api_key=\"{your-jinaai-api-key}\"\n)\n```\n"
  },
  {
    "path": "src/ai/guides/tables.md",
    "content": "# Working with tables\n\nTiDB uses tables to organize and store collections of related data. It provides flexible schema definition capabilities, allowing you to structure your tables according to your specific requirements.\n\nA table can contain multiple columns with different data types to store various kinds of data. Supported data types include text, numbers, vectors, binary data (`BLOB`), JSON, and more.\n\n!!! tip\n\n    For a complete working example, see the [basic example](https://github.com/pingcap/pytidb/tree/main/examples/basic) in the PyTiDB repository.\n\n## Create a table\n\n### Using TableModel\n\nPyTiDB provides a `TableModel` class that represents the schema of a table. This class is compatible with [Pydantic models](https://docs.pydantic.dev/latest/concepts/models/) and lets you define the table structure in a declarative way.\n\nIn the following example, you create a table named `items` with these columns:\n\n- `id`: a primary key column with an integer type\n- `content`: a text type column\n- `embedding`: a vector type column with 3 dimensions\n- `meta`: a JSON type column\n\n=== \"Python\"\n\n    After you [connect to the database](./connect.md) using PyTiDB and obtain a `client` instance, you can create a table with the `client.create_table()` method.\n\n    ```python hl_lines=\"12\"\n    from pytidb.schema import TableModel, Field, VectorField\n    from pytidb.datatype import TEXT, JSON\n\n    class Item(TableModel):\n        __tablename__ = \"items\"\n\n        id: int = Field(primary_key=True)\n        content: str = Field(sa_type=TEXT)\n        embedding: list[float] = VectorField(dimensions=3)\n        meta: dict = Field(sa_type=JSON, default_factory=dict)\n\n    table = client.create_table(schema=Item, if_exists=\"overwrite\")\n    ```\n\n    The `create_table` method accepts these parameters:\n\n    - `schema`: The `TableModel` class that defines your table structure.\n    - `if_exists`: The creation mode of the table.\n        
- `raise` (default): Creates the table if it does not exist; raises an error if it already exists.\n        - `skip`: Creates the table if it does not exist; does nothing if it already exists.\n        - `overwrite`: Drops the existing table and creates a new one. This is useful for **testing and development**, but not recommended for production environments.\n  \n    Once the table is created, you can use the `table` object to insert, update, delete, and query data.\n\n=== \"SQL\"\n\n    Use the `CREATE TABLE` statement to create a table.\n\n    ```sql\n    CREATE TABLE items (\n        id INT PRIMARY KEY,\n        content TEXT,\n        embedding VECTOR(3),\n        meta JSON\n    );\n    ```\n\n## Add data to a table\n\n### With TableModel\n\nYou can use a `TableModel` instance to represent a record and insert it into the table.\n\nTo insert a single record:\n\n=== \"Python\"\n\n    Use the `table.insert()` method to insert a single record into the table.\n\n    ```python\n    table.insert(\n        Item(\n            id=1,\n            content=\"TiDB is a distributed SQL database\",\n            embedding=[0.1, 0.2, 0.3],\n            meta={\"category\": \"database\"},\n        )\n    )\n    ```\n\n=== \"SQL\"\n\n    Use the `INSERT INTO` statement to insert a single record into the table.\n\n    ```sql\n    INSERT INTO items(id, content, embedding, meta)\n    VALUES (1, 'TiDB is a distributed SQL database', '[0.1, 0.2, 0.3]', '{\"category\": \"database\"}');\n    ```\n\nTo insert multiple records:\n\n=== \"Python\"\n\n    Use the `table.bulk_insert()` method to insert multiple records into the table.\n\n    ```python\n    table.bulk_insert([\n        Item(\n            id=2,\n            content=\"GPT-4 is a large language model\",\n            embedding=[0.4, 0.5, 0.6],\n            meta={\"category\": \"llm\"},\n        ),\n        Item(\n            id=3,\n            content=\"LlamaIndex is a Python library for building AI-powered applications\",\n         
   embedding=[0.7, 0.8, 0.9],\n            meta={\"category\": \"rag\"},\n        ),\n    ])\n    ```\n\n=== \"SQL\"\n\n    Use the `INSERT INTO` statement to insert multiple records into the table.\n\n    ```sql\n    INSERT INTO items(id, content, embedding, meta)\n    VALUES\n        (2, 'GPT-4 is a large language model', '[0.4, 0.5, 0.6]', '{\"category\": \"llm\"}'),\n        (3, 'LlamaIndex is a Python library for building AI-powered applications', '[0.7, 0.8, 0.9]', '{\"category\": \"rag\"}');\n    ```\n\n### With dict\n\nYou can also use Python dictionaries (`dict`) to represent records and insert them into the table. This approach is more flexible because it does not require a `TableModel` instance to insert data.\n\nTo insert a single record:\n\n=== \"Python\"\n\n    Use the `table.insert()` method with a dictionary to insert a single record into the table.\n\n    ```python\n    table.insert({\n        \"id\": 1,\n        \"content\": \"TiDB is a distributed SQL database\",\n        \"embedding\": [0.1, 0.2, 0.3],\n        \"meta\": {\"category\": \"database\"},\n    })\n    ```\n\n=== \"SQL\"\n\n    Use the `INSERT INTO` statement to insert a single record into the table.\n\n    ```sql\n    INSERT INTO items(id, content, embedding, meta)\n    VALUES (1, 'TiDB is a distributed SQL database', '[0.1, 0.2, 0.3]', '{\"category\": \"database\"}');\n    ```\n\n## Save data to a table\n\nThe `table.save()` method provides a convenient way to insert or update a single record. If a record with the specified primary key does not exist, it creates a new record. If the record already exists, it overwrites the entire record.\n\n!!! note\n\n    If a record ID already exists, the `table.save()` method overwrites the entire record. 
To change only part of a record, use `table.update()`.\n\n=== \"Python\"\n\n    Use the `table.save()` method to save a single record to the table.\n\n    **Example: Save a new record**\n\n    ```python\n    saved_record = table.save(\n        Item(\n            id=4,\n            content=\"Vector databases enable AI applications\",\n            embedding=[1.0, 1.1, 1.2],\n            meta={\"category\": \"vector-db\"},\n        )\n    )\n    ```\n\n    **Example: Save an existing record (overwrites the entire record)**\n\n    ```python\n    # This overwrites the entire record with id=1\n    updated_record = table.save(\n        Item(\n            id=1,  # Existing ID\n            content=\"Updated content for TiDB\",\n            embedding=[0.2, 0.3, 0.4],\n            meta={\"category\": \"updated\"},\n        )\n    )\n    ```\n\n=== \"SQL\"\n\n    Use the `INSERT ... ON DUPLICATE KEY UPDATE` statement to save a record.\n\n    **Example: Save a new record or update if it exists**\n\n    ```sql\n    INSERT INTO items(id, content, embedding, meta)\n    VALUES (4, 'Vector databases enable AI applications', '[1.0, 1.1, 1.2]', '{\"category\": \"vector-db\"}')\n    ON DUPLICATE KEY UPDATE\n        content = VALUES(content),\n        embedding = VALUES(embedding),\n        meta = VALUES(meta);\n    ```\n\n## Query data from a table\n\nTo fetch records from a table:\n\n=== \"Python\"\n\n    Use the `table.query()` method to fetch the records from the table.\n\n    **Example: Fetch the first 10 records**\n\n    ```python\n    result = table.query(limit=10).to_list()\n    ```\n\n=== \"SQL\"\n\n    Use the `SELECT` statement to fetch the records from the table.\n\n    **Example: Fetch the first 10 records**\n\n    ```sql\n    SELECT * FROM items LIMIT 10;\n    ```\n\nTo fetch records based on query conditions:\n\n=== \"Python\"\n\n    Pass the `filters` parameter to the `table.query()` method.\n\n    ```python\n    result = table.query(\n        filters={\"meta.category\": 
\"database\"},\n        limit=10\n    ).to_list()\n    ```\n\n=== \"SQL\"\n\n    Use the `WHERE` clause to filter records.\n\n    **Example: Fetch up to 10 records whose category is \"database\"**\n\n    ```sql\n    SELECT * FROM items WHERE meta->>'$.category' = 'database' LIMIT 10;\n    ```\n\nFor a complete list of supported filter operations and examples, see the [Filtering](./filtering.md) guide.\n\n## Update data in a table\n\n=== \"Python\"\n\n    Use the `table.update()` method to update records with [filters](./filtering.md).\n\n    **Example: Update the record whose `id` equals 1**\n\n    ```python\n    table.update(\n        values={\n            \"content\": \"TiDB Cloud Starter is a fully managed, auto-scaling cloud database service\",\n            \"embedding\": [0.1, 0.2, 0.4],\n            \"meta\": {\"category\": \"dbaas\"},\n        },\n        filters={\n            \"id\": 1\n        },\n    )\n    ```\n\n=== \"SQL\"\n\n    Use the `UPDATE` statement to update records with [filters](./filtering.md).\n\n    **Example: Update the record whose `id` equals 1**\n\n    ```sql\n    UPDATE items\n    SET\n        content = 'TiDB Cloud Starter is a fully managed, auto-scaling cloud database service',\n        embedding = '[0.1, 0.2, 0.4]',\n        meta = '{\"category\": \"dbaas\"}'\n    WHERE\n        id = 1;\n    ```\n\n## Delete data from a table\n\n=== \"Python\"\n\n    Use the `table.delete()` method to delete records with [filters](./filtering.md).\n\n    **Example: Delete the record whose `id` equals 2**\n\n    ```python\n    table.delete(\n        filters={\n            \"id\": 2\n        }\n    )\n    ```\n\n=== \"SQL\"\n\n    Use the `DELETE` statement to delete records with [filters](./filtering.md).\n\n    **Example: Delete the record whose `id` equals 2**\n\n    ```sql\n    DELETE FROM items WHERE id = 2;\n    ```\n\n## Truncate a table\n\n=== \"Python\"\n\n    To remove all data from the table but keep the table structure, use the 
`table.truncate()` method.\n\n    ```python\n    table.truncate()\n    ```\n\n    To check that the table is truncated, verify that it contains 0 rows.\n\n    ```python\n    table.rows()\n    ```\n\n=== \"SQL\"\n\n    To remove all data from the table but keep the table structure, use the `TRUNCATE TABLE` statement.\n\n    ```sql\n    TRUNCATE TABLE items;\n    ```\n\n    To check that the table is truncated, verify that it contains 0 rows.\n\n    ```sql\n    SELECT COUNT(*) FROM items;\n    ```\n\n## Drop a table\n\n=== \"Python\"\n\n    To permanently remove a table from the database, use the `client.drop_table()` method.\n\n    ```python\n    client.drop_table(\"items\")\n    ```\n\n    To check that the table is removed from the database:\n\n    ```python\n    client.table_names()\n    ```\n\n=== \"SQL\"\n\n    To permanently remove a table from the database, use the `DROP TABLE` statement.\n\n    ```sql\n    DROP TABLE items;\n    ```\n\n    To check that the table is removed from the database:\n\n    ```sql\n    SHOW TABLES;\n    ```\n"
  },
  {
    "path": "src/ai/guides/transaction.md",
    "content": "# Transactions\n\nTiDB supports ACID transactions, which ensure data consistency and reliability.\n\n## Basic usage\n\n=== \"Python\"\n\n    Use `client.session()` to run multiple statements in a single transaction. Call `session.commit()` to commit the changes, or `session.rollback()` to discard them.\n\n    ```python\n    with client.session() as session:\n        initial_total_balance = session.query(\"SELECT SUM(balance) FROM players\").scalar()\n\n        # Transfer 10 coins from player 1 to player 2\n        session.execute(\"UPDATE players SET balance = balance - 10 WHERE id = 1\")\n        session.execute(\"UPDATE players SET balance = balance + 10 WHERE id = 2\")\n\n        # Commit the transaction, or call session.rollback() to discard the changes\n        session.commit()\n\n        final_total_balance = session.query(\"SELECT SUM(balance) FROM players\").scalar()\n        assert final_total_balance == initial_total_balance\n    ```\n\n## See also\n\n- [TiDB Developer Guide - Transactions](https://docs.pingcap.com/tidbcloud/dev-guide-transaction-overview/)\n- [TiDB Docs - SQL Reference - Transactions](https://docs.pingcap.com/tidbcloud/transaction-overview/)"
  },
  {
    "path": "src/ai/guides/vector-search.md",
    "content": "# Vector search\n\nVector search uses semantic similarity to help you find the most relevant records, even if your query does not explicitly include all the keywords.\n\n!!! tip\n\n    For a complete example of vector search, see the [vector-search example](../examples/vector-search-with-pytidb.md).\n\n## Basic usage\n\nThis section shows you how to use vector search in your application in a few steps. Before you start, you need to [connect to the database](./connect.md).\n\n### Step 1. Create a table with a vector field\n\n=== \"Python\"\n\n    You can use `client.create_table()` to create a table and use `VectorField` to define a vector field.\n\n    In this example, you create a table named `documents` with four columns:\n\n    - `id`: The primary key of the table.\n    - `text`: The text content of the document.\n    - `text_vec`: The vector embedding of the text content.\n    - `meta`: The metadata of the document, which is a JSON object.\n\n    ```python hl_lines=\"9\"\n    from pytidb.schema import TableModel, Field, VectorField\n    from pytidb.datatype import TEXT, JSON\n\n    class Document(TableModel):\n        __tablename__ = \"documents\"\n\n        id: int = Field(primary_key=True)\n        text: str = Field(sa_type=TEXT)\n        text_vec: list[float] = VectorField(dimensions=3)\n        meta: dict = Field(sa_type=JSON, default_factory=dict)\n\n    table = client.create_table(schema=Document, if_exists=\"overwrite\")\n    ```\n\n    The `VectorField` class accepts the following parameters:\n\n    - `dimensions`: The number of dimensions of the vector. Only vectors with exactly this number of dimensions can be stored in this field.\n    - `index`: Whether to create a [vector index](https://docs.pingcap.com/tidbcloud/vector-search-index/) for the vector field. Defaults to `True`.\n    - `distance_metric`: The distance metric to use for the vector index. 
Supported values:\n        - `DistanceMetric.COSINE` (default): Cosine distance metric, suitable for measuring text similarity.\n        - `DistanceMetric.L2`: L2 distance metric, suitable for capturing overall differences.\n\n=== \"SQL\"\n\n    You can use the `CREATE TABLE` statement to create a table and use the `VECTOR` type to define a vector column.\n\n    ```sql hl_lines=\"4 6\"\n    CREATE TABLE documents (\n        id INT PRIMARY KEY,\n        text TEXT,\n        text_vec VECTOR(3),\n        meta JSON,\n        VECTOR INDEX `vec_idx_text_vec`((VEC_COSINE_DISTANCE(`text_vec`)))\n    );\n    ```\n\n    In this example:\n\n    - The `text_vec` column is defined as `VECTOR(3)`, so every vector stored in this column must have exactly 3 dimensions.\n    - A vector index is created with the `VEC_COSINE_DISTANCE` function to optimize vector search performance.\n    - The `meta` column stores the metadata of each document as a JSON object.\n\n    TiDB supports two distance functions for vector indexes:\n\n    - `VEC_COSINE_DISTANCE`: Calculates the cosine distance between two vectors.\n    - `VEC_L2_DISTANCE`: Calculates the L2 distance (Euclidean distance) between two vectors.\n\n### Step 2. Insert vector data into the table\n\nFor demonstration purposes, insert some text records and their corresponding vector embeddings into the table. 
In this example, you use simple 3-dimensional vectors.\n\nInsert the following three documents:\n\n- `dog` with the vector embedding `[1, 2, 1]`\n- `fish` with the vector embedding `[1, 2, 4]`\n- `tree` with the vector embedding `[1, 0, 0]`\n\n=== \"Python\"\n\n    ```python\n    table.bulk_insert([\n        Document(text=\"dog\", text_vec=[1,2,1], meta={\"category\": \"animal\"}),\n        Document(text=\"fish\", text_vec=[1,2,4], meta={\"category\": \"animal\"}),\n        Document(text=\"tree\", text_vec=[1,0,0], meta={\"category\": \"plant\"}),\n    ])\n    ```\n\n=== \"SQL\"\n\n    ```sql\n    INSERT INTO documents (id, text, text_vec, meta)\n    VALUES\n        (1, 'dog', '[1,2,1]', '{\"category\": \"animal\"}'),\n        (2, 'fish', '[1,2,4]', '{\"category\": \"animal\"}'),\n        (3, 'tree', '[1,0,0]', '{\"category\": \"plant\"}');\n    ```\n\n!!! tip\n\n    In real-world applications, vector embeddings are usually generated by an [embedding model](../concepts/vector-search.md#embedding-model).\n\n    For convenience, PyTiDB provides an auto embedding feature that automatically generates vector embeddings for your text fields when you insert, update, or search data, so you do not need to process them manually.\n\n    For details, see the [Auto Embedding](./auto-embedding.md) guide.\n\n### Step 3. Perform vector search\n\nVector search uses vector distance metrics to measure the similarity and relevance between vectors. The smaller the distance, the more relevant the record. To find the most relevant documents in the table, you need to specify a query vector. 
\n\nIn this example, assume that the query is `A swimming animal` and its vector embedding is `[1, 2, 3]`.\n\n=== \"Python\"\n\n    You can use the `table.search()` method to perform vector search, which uses `search_mode=\"vector\"` by default.\n\n    ```python\n    table.search([1, 2, 3]).limit(3).to_list()\n    ```\n\n    ```python title=\"Execution result\"\n    [\n        {\"id\": 2, \"text\": \"fish\", \"text_vec\": [1,2,4], \"_distance\": 0.00853986601633272},\n        {\"id\": 1, \"text\": \"dog\", \"text_vec\": [1,2,1], \"_distance\": 0.12712843905603044},\n        {\"id\": 3, \"text\": \"tree\", \"text_vec\": [1,0,0], \"_distance\": 0.7327387580875756},\n    ]\n    ```\n\n    The result shows that the most relevant document is `fish`, with a distance of `0.00853986601633272`.\n\n=== \"SQL\"\n\n    You can use the `ORDER BY <distance_function>(<column_name>, <query_vector>) LIMIT <n>` clause in the `SELECT` statement to get the `n` nearest neighbors of the query vector.\n\n    In this example, you use the `vec_cosine_distance` function to calculate the cosine distance between the vectors stored in the `text_vec` column and the query vector `[1, 2, 3]`.\n\n    ```sql\n    SELECT id, text, vec_cosine_distance(text_vec, '[1,2,3]') AS distance\n    FROM documents\n    ORDER BY distance\n    LIMIT 3;\n    ```\n\n    ```plain title=\"Execution result\"\n    +----+----------+---------------------+\n    | id | text     | distance            |\n    +----+----------+---------------------+\n    |  2 | fish     | 0.00853986601633272 |\n    |  1 | dog      | 0.12712843905603044 |\n    |  3 | tree     |  0.7327387580875756 |\n    +----+----------+---------------------+\n    3 rows in set (0.15 sec)\n    ```\n\n    The result shows that the most relevant document is `fish`, with a distance of `0.00853986601633272`.\n\n## Distance metrics\n\nDistance metrics measure the similarity between a pair of vectors. 
Currently, TiDB supports the following distance metrics:\n\n=== \"Python\"\n\n    The `table.search()` API supports the following distance metrics:\n\n    | Metric Name              | Description                                                    | Best For |\n    |--------------------------|----------------------------------------------------------------|----------|\n    | `DistanceMetric.COSINE`  | Calculates the cosine distance between two vectors (default). Measures the angle between vectors. | Text embeddings, semantic search |\n    | `DistanceMetric.L2`      | Calculates the L2 distance (Euclidean distance) between two vectors. Measures the straight-line distance. | Image features |\n\n    To change the distance metric used for vector search, use the `.distance_metric()` method.\n\n    **Example: Use the L2 distance metric**\n\n    ```python\n    from pytidb.schema import DistanceMetric\n\n    results = (\n        table.search([1, 2, 3])\n            .distance_metric(DistanceMetric.L2)\n            .limit(10)\n            .to_list()\n    )\n    ```\n\n=== \"SQL\"\n\n    In SQL, you can use the following built-in functions to calculate vector distances directly in your queries:\n\n    | Function Name                                                                                                                        | Description                                                    |\n    |-------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------|\n    | [`VEC_L2_DISTANCE`](https://docs.pingcap.com/tidbcloud/vector-search-functions-and-operators/#vec_l2_distance)                       | Calculates L2 distance (Euclidean distance) between two vectors |\n    | [`VEC_COSINE_DISTANCE`](https://docs.pingcap.com/tidbcloud/vector-search-functions-and-operators/#vec_cosine_distance)               | Calculates the cosine distance 
between two vectors              |\n    | [`VEC_NEGATIVE_INNER_PRODUCT`](https://docs.pingcap.com/tidbcloud/vector-search-functions-and-operators/#vec_negative_inner_product) | Calculates the negative of the inner product between two vectors |\n    | [`VEC_L1_DISTANCE`](https://docs.pingcap.com/tidbcloud/vector-search-functions-and-operators/#vec_l1_distance)                       | Calculates L1 distance (Manhattan distance) between two vectors |\n\n## Distance threshold\n\nYou can set a distance threshold to control the similarity of the returned results. With this threshold, you can exclude less similar vectors and return only those that meet your relevance criteria.\n\n=== \"Python\"\n\n    Use the `.distance_threshold()` method to set a maximum distance for the search results. Only records with a distance less than the threshold are returned.\n\n    **Example: Only return documents with a distance less than 0.5**\n\n    ```python\n    results = table.search([1, 2, 3]).distance_threshold(0.5).limit(10).to_list()\n    ```\n\n=== \"SQL\"\n\n    In SQL, use the `HAVING` clause with a distance function to filter results by distance:\n\n    **Example: Only return documents with a distance less than 0.5**\n\n    ```sql\n    SELECT id, text, vec_cosine_distance(text_vec, '[1,2,3]') AS distance\n    FROM documents\n    HAVING distance < 0.5\n    ORDER BY distance\n    LIMIT 10;\n    ```\n\n## Distance range\n\nYou can also specify a distance range to further refine the results.\n\n=== \"Python\"\n\n    Use the `.distance_range()` method to set both minimum and maximum distance values. 
Only records with a distance within this range are returned.\n\n    **Example: Only return documents with a distance between 0.01 and 0.05**\n\n    ```python\n    results = table.search([1, 2, 3]).distance_range(0.01, 0.05).limit(10).to_list()\n    ```\n\n=== \"SQL\"\n\n    To specify a distance range in SQL, use `BETWEEN` or other comparison operators in the `HAVING` clause:\n\n    **Example: Only return documents with a distance between 0.01 and 0.05**\n\n    ```sql\n    SELECT id, text, vec_cosine_distance(text_vec, '[1,2,3]') AS distance\n    FROM documents\n    HAVING distance BETWEEN 0.01 AND 0.05\n    ORDER BY distance\n    LIMIT 10;\n    ```\n\n## Metadata filtering\n\nAs a relational database, TiDB supports a rich set of [SQL operators](https://docs.pingcap.com/tidbcloud/operators/) and allows flexible combinations of filtering conditions.\n\nFor vector search in TiDB, you can apply metadata filtering on scalar fields (for example, integers and strings) or JSON fields.\n\nTypically, vector search combined with metadata filtering operates in one of two modes:\n\n- **Post-filtering**: In a two-stage retrieval process, TiDB first performs vector search to retrieve the top-k candidate results from the entire vector space, and then applies the filter to this candidate set. The vector search stage typically uses a vector index for efficiency.\n- **Pre-filtering**: TiDB applies the filter before vector search. 
If the filter is highly selective and the filtered field is indexed with a scalar index, this approach can significantly reduce the search space and improve performance.\n\n### Post-filtering\n\n=== \"Python\"\n\n    Use the `.filter()` method with a filter dictionary to apply filtering to vector search.\n\n    By default, the `table.search()` API uses post-filtering mode to maximize search performance with the vector index.\n\n    **Example: Vector search with post-filtering**\n\n    ```python\n    results = (\n        table.search([1, 2, 3])\n            # `meta` is a JSON field whose value is a JSON object,\n            # such as {\"category\": \"animal\"}\n            .filter({\"meta.category\": \"animal\"})\n            .num_candidate(50)\n            .limit(10)\n            .to_list()\n    )\n    ```\n\n    !!! tip\n\n        When you use a vector index, the accuracy of the results might decrease if the final `limit` is very small. You can use the `.num_candidate()` method to control how many candidates to retrieve from the vector index during the vector search phase, without changing the `limit` parameter.\n\n        A higher `num_candidate` value generally improves recall but might reduce query performance. Adjust this value based on your dataset and accuracy requirements.\n\n=== \"SQL\"\n\n    Currently, vector indexes are only effective in strict approximate nearest neighbor (ANN) queries, such as:\n\n    ```sql\n    SELECT * FROM <table> ORDER BY <distance_func>(<column>, <query_vector>) LIMIT <n>\n    ```\n\n    In other words, you cannot use a `WHERE` clause together with a vector index in the same query.\n\n    If you need to combine vector search with additional filtering conditions, you can use the post-filtering pattern. 
In this approach, the ANN query is divided into two parts:\n\n    - The inner query performs the vector search using the vector index.\n    - The outer query applies the `WHERE` condition to filter the results.\n\n    ```sql hl_lines=\"8\"\n    SELECT *\n    FROM (\n        SELECT id, text, meta, vec_cosine_distance(text_vec, '[1,2,3]') AS distance\n        FROM documents\n        ORDER BY distance\n        LIMIT 50\n    ) candidates\n    WHERE meta->>'$.category' = 'animal'\n    ORDER BY distance\n    LIMIT 10;\n    ```\n\n    !!! tip\n\n        The post-filtering pattern might return fewer results than expected. For example, the inner query might retrieve the 50 most similar records, but only a few or none of them match the `WHERE` condition.\n\n        To mitigate this, you can increase the `LIMIT` value (50 in this example) of the **inner query** to fetch more candidates, which improves the chances of returning enough valid results after filtering.\n\n    For supported SQL operators, see [Operators](https://docs.pingcap.com/tidbcloud/operators/) in the TiDB Cloud documentation.\n\n### Pre-filtering\n\n=== \"Python\"\n\n    To enable pre-filtering, set the `prefilter` parameter to `True` in the `.filter()` method.\n\n    **Example: Vector search with pre-filtering**\n\n    ```python\n    results = (\n        table.search([1, 2, 3])\n            .filter({\"meta.category\": \"animal\"}, prefilter=True)\n            .limit(10)\n            .to_list()\n    )\n    ```\n\n    For supported filter operators, see [Filtering](./filtering.md).\n\n=== \"SQL\"\n\n    In SQL, use the `->>` operator or `JSON_EXTRACT` to access JSON fields in the `WHERE` clause:\n\n    ```sql\n    SELECT id, text, meta, vec_cosine_distance(text_vec, '[1,2,3]') AS distance\n    FROM documents\n    WHERE meta->>'$.category' = 'animal'\n    ORDER BY distance\n    LIMIT 10;\n    ```\n\n    For supported SQL operators, see [Operators](https://docs.pingcap.com/tidbcloud/operators/) in the TiDB Cloud documentation.\n\n## Multiple 
vector fields\n\nTiDB supports defining multiple vector columns in a single table, allowing you to store and search different types of vector embeddings.\n\nFor example, you can store both text embeddings and image embeddings in the same table, making it convenient to manage multimodal data.\n\n=== \"Python\"\n\n    You can define multiple vector fields in the schema and use the `.vector_column()` method to specify which vector field to search on.\n\n    **Example: Specify the vector field to search on**\n\n    ```python hl_lines=\"6 8 17\"\n    # Create a table with multiple vector fields\n    class RichTextDocument(TableModel):\n        __tablename__ = \"rich_text_documents\"\n        id: int = Field(primary_key=True)\n        text: str = Field(sa_type=TEXT)\n        text_vec: list[float] = VectorField(dimensions=3)\n        image_url: str\n        image_vec: list[float] = VectorField(dimensions=3)\n\n    table = client.create_table(schema=RichTextDocument, if_exists=\"overwrite\")\n\n    # Insert sample data ...\n\n    # Search using image vector field\n    results = (\n        table.search([1, 2, 3])\n            .vector_column(\"image_vec\")\n            .distance_metric(DistanceMetric.COSINE)\n            .limit(10)\n            .to_list()\n    )\n    ```\n\n=== \"SQL\"\n\n    You can create multiple vector columns in a table and search them using suitable distance functions:\n\n    ```sql\n    -- Create a table with multiple vector fields\n    CREATE TABLE rich_text_documents (\n        id BIGINT PRIMARY KEY,\n        text TEXT,\n        text_vec VECTOR(3),\n        image_url VARCHAR(255),\n        image_vec VECTOR(3)\n    );\n\n    -- Insert sample data ...\n\n    -- Search using the image vector\n    SELECT id, image_url, vec_l2_distance(image_vec, '[4,5,6]') AS image_distance\n    FROM rich_text_documents\n    ORDER BY image_distance\n    LIMIT 10;\n    ```\n\n## Output search results\n\n=== \"Python\"\n\n    The `table.search()` API lets you 
convert search results into several common data processing formats:\n\n    ### As SQLAlchemy result rows\n\n    To work with raw SQLAlchemy result rows, use:\n\n    ```python\n    table.search([1, 2, 3]).limit(10).to_rows()\n    ```\n\n    ### As a list of Python dictionaries\n\n    For easier manipulation in Python, convert the results to a list of dictionaries:\n\n    ```python\n    table.search([1, 2, 3]).limit(10).to_list()\n    ```\n\n    ### As a pandas DataFrame\n\n    To display results in a user-friendly table, which is especially useful in Jupyter notebooks, convert them to a pandas DataFrame:\n\n    ```python\n    table.search([1, 2, 3]).limit(10).to_pandas()\n    ```\n\n    ### As a list of Pydantic model instances\n\n    You can also use the `TableModel` class as a Pydantic model to represent data entities. To work with results as Pydantic model instances, use:\n\n    ```python\n    table.search([1, 2, 3]).limit(10).to_pydantic()\n    ```"
  },
  {
    "path": "src/ai/index.md",
    "content": "# TiDB for AI\n\nTiDB is an open-source, distributed SQL database designed for modern AI applications. It offers seamless scalability, real-time analytics, and unified storage for vectors, documents, knowledge graphs, operational data, and more.\n\n=== \"Python\"\n\n    TiDB provides a Python SDK and a set of integrations with popular AI frameworks to help you build AI applications efficiently.\n\n    To install the TiDB Python SDK, run the following command:\n\n    ```bash\n    pip install pytidb\n    ```\n\n    Integrations:\n\n    - AI frameworks: [LlamaIndex](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-llamaindex/), [LangChain](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-langchain/)\n    - ORM libraries: [SQLAlchemy](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-sqlalchemy/), [Django ORM](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-django-orm/), [Peewee](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-peewee/)\n    - AI services: [Amazon Bedrock](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-amazon-bedrock/)\n    - Embedding models and services: [Jina AI](https://docs.pingcap.com/tidbcloud/vector-search-integrate-with-jinaai-embedding/)\n\n## Next steps\n\n- 📖 Explore [Concepts](./concepts/vector-search.md) to understand the fundamentals of building modern AI applications.\n- 🚀 Follow the [Quickstart Guide](./quickstart.md) to build your first AI application with TiDB.\n"
  },
  {
    "path": "src/ai/integrations/embedding-cohere.md",
    "content": "---\ntitle: \"Integrate TiDB Vector Search with Cohere Embeddings API\"\ndescription: \"Learn how to integrate TiDB Vector Search with Cohere Embeddings API to store embeddings and perform semantic search.\"\nkeywords: \"TiDB, Cohere, Vector search, text embeddings, multilingual embeddings\"\n---\n\n# Integrate TiDB Vector Search with Cohere Embeddings API\n\nThis tutorial demonstrates how to use [Cohere](https://cohere.com/embed) to generate text embeddings, store them in TiDB vector storage, and perform semantic search.\n\n!!! info\n\n    Currently, [Server-Side Auto Embedding](../guides/auto-embedding.md) is only available on [TiDB Cloud Starter](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme) clusters in the following AWS regions:\n\n    - `Frankfurt (eu-central-1)`\n    - `Oregon (us-west-2)`\n    - `N. Virginia (us-east-1)`\n\n## Cohere Embeddings\n\nCohere offers multilingual embedding models for search, RAG, and classification. The latest `embed-v4.0` model supports text, images, and mixed content. 
You can use the Cohere Embeddings API with TiDB through the AI SDK or native SQL functions for automatic embedding generation.\n\n### Supported models\n\n| Model Name | Dimensions | Max Input Tokens | Description |\n|------------|------------|------------------|-------------|\n| `cohere/embed-v4.0` | 256, 512, 1024, 1536 (default) | 128k | Latest multimodal model supporting text, images, and mixed content (PDFs) |\n| `cohere/embed-english-v3.0` | 1024 | 512 | High-performance English embedding model optimized for search and classification |\n| `cohere/embed-multilingual-v3.0` | 1024 | 512 | Multilingual model supporting 100+ languages |\n| `cohere/embed-english-light-v3.0` | 384 | 512 | Lightweight English model for faster processing with similar performance |\n| `cohere/embed-multilingual-light-v3.0` | 384 | 512 | Lightweight multilingual model for faster processing with similar performance |\n\nFor a complete list of supported models and detailed specifications, see the [Cohere Embeddings Documentation](https://docs.cohere.com/docs/cohere-embed).\n\n## Usage example\n\nThis example demonstrates creating a vector table, inserting documents, and performing similarity search using Cohere embedding models.\n\n### Step 1: Connect to the database\n\n=== \"Python\"\n\n    ```python\n    from pytidb import TiDBClient\n\n    tidb_client = TiDBClient.connect(\n        host=\"{gateway-region}.prod.aws.tidbcloud.com\",\n        port=4000,\n        username=\"{prefix}.root\",\n        password=\"{password}\",\n        database=\"{database}\",\n        ensure_db=True,\n    )\n    ```\n\n=== \"SQL\"\n\n    ```bash\n    mysql -h {gateway-region}.prod.aws.tidbcloud.com \\\n        -P 4000 \\\n        -u {prefix}.root \\\n        -p{password} \\\n        -D {database}\n    ```\n\n### Step 2: Configure the API key\n\nCreate your API key from the [Cohere 
Dashboard](https://dashboard.cohere.com/api-keys) and bring your own key (BYOK) to use the embedding service.\n\n=== \"Python\"\n\n    Configure the API key for the Cohere embedding provider using the TiDB Client:\n\n    ```python\n    tidb_client.configure_embedding_provider(\n        provider=\"cohere\",\n        api_key=\"{your-cohere-api-key}\",\n    )\n    ```\n\n=== \"SQL\"\n\n    Set the API key for the Cohere embedding provider using SQL:\n\n    ```sql\n    SET @@GLOBAL.TIDB_EXP_EMBED_COHERE_API_KEY = \"{your-cohere-api-key}\";\n    ```\n\n### Step 3: Create a vector table\n\nCreate a table with a vector field that uses the `cohere/embed-v4.0` model to generate 1536-dimensional vectors (default dimension):\n\n=== \"Python\"\n\n    ```python\n    from pytidb.schema import TableModel, Field\n    from pytidb.embeddings import EmbeddingFunction\n    from pytidb.datatype import TEXT\n\n    class Document(TableModel):\n        __tablename__ = \"sample_documents\"\n        id: int = Field(primary_key=True)\n        content: str = Field(sa_type=TEXT)\n        embedding: list[float] = EmbeddingFunction(\n            model_name=\"cohere/embed-v4.0\"\n        ).VectorField(source_field=\"content\")\n\n    table = tidb_client.create_table(schema=Document, if_exists=\"overwrite\")\n    ```\n\n=== \"SQL\"\n\n    ```sql\n    CREATE TABLE sample_documents (\n        `id`        INT PRIMARY KEY,\n        `content`   TEXT,\n        `embedding` VECTOR(1536) GENERATED ALWAYS AS (EMBED_TEXT(\n            \"cohere/embed-v4.0\",\n            `content`\n        )) STORED\n    );\n    ```\n\n### Step 4: Insert data into the table\n\n=== \"Python\"\n\n    Use the `table.insert()` or `table.bulk_insert()` API to add data:\n\n    ```python\n    documents = [\n        Document(id=1, content=\"Python: High-level programming language for data science and web development.\"),\n        Document(id=2, content=\"Python snake: Non-venomous constrictor found in tropical regions.\"),\n        
Document(id=3, content=\"Python framework: Django and Flask are popular web frameworks.\"),\n        Document(id=4, content=\"Python libraries: NumPy and Pandas for data analysis.\"),\n        Document(id=5, content=\"Python ecosystem: Rich collection of packages and tools.\"),\n    ]\n    table.bulk_insert(documents)\n    ```\n\n=== \"SQL\"\n\n    Insert data using the `INSERT INTO` statement:\n\n    ```sql\n    INSERT INTO sample_documents (id, content)\n    VALUES\n        (1, \"Python: High-level programming language for data science and web development.\"),\n        (2, \"Python snake: Non-venomous constrictor found in tropical regions.\"),\n        (3, \"Python framework: Django and Flask are popular web frameworks.\"),\n        (4, \"Python libraries: NumPy and Pandas for data analysis.\"),\n        (5, \"Python ecosystem: Rich collection of packages and tools.\");\n    ```\n\n### Step 5: Search for similar documents\n\n=== \"Python\"\n\n    Use the `table.search()` API to perform vector search:\n\n    ```python\n    results = table.search(\"How to learn Python programming?\") \\\n        .limit(2) \\\n        .to_list()\n    print(results)\n    ```\n\n=== \"SQL\"\n\n    Use the `VEC_EMBED_COSINE_DISTANCE` function to perform vector search based on the cosine distance metric:\n\n    ```sql\n    SELECT\n        `id`,\n        `content`,\n        VEC_EMBED_COSINE_DISTANCE(embedding, \"How to learn Python programming?\") AS _distance\n    FROM sample_documents\n    ORDER BY _distance ASC\n    LIMIT 2;\n    ```\n"
  },
  {
    "path": "src/ai/integrations/embedding-gemini.md",
    "content": "---\ntitle: \"Integrate TiDB Vector Search with Google Gemini Embeddings API\"\ndescription: \"Learn how to integrate TiDB Vector Search with Google Gemini Embeddings API to store embeddings and perform semantic search.\"\nkeywords: \"TiDB, Google Gemini, Vector search, text embeddings, multimodal embeddings\"\n---\n\n# Integrate TiDB Vector Search with Google Gemini Embeddings API\n\nThis tutorial demonstrates how to use [Google Gemini](https://ai.google.dev/models/gemini) to generate text embeddings, store them in TiDB vector storage, and perform semantic search.\n\n!!! info\n\n    Currently, [Server-Side Auto Embedding](../guides/auto-embedding.md) is only available on [TiDB Cloud Starter](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme) clusters in the following AWS regions:\n\n    - `Frankfurt (eu-central-1)`\n    - `Oregon (us-west-2)`\n    - `N. Virginia (us-east-1)`\n\n## Google Gemini Embeddings\n\nThe Gemini API provides text embedding models that generate embeddings for words, phrases, sentences, and code. These embeddings enable advanced natural language processing (NLP) tasks such as semantic search, classification, and clustering. 
By using context-aware embeddings, you can achieve more accurate results than with traditional keyword-based methods.\n\n### Supported models\n\n| Model Name | Dimensions | Max Input Tokens | Description |\n|------------|------------|------------------|-------------|\n| `gemini-embedding-001` | 128–3072 (recommended: 768, 1536, 3072) | 2048 | Text and code embeddings |\n\nFor a complete list of supported models and detailed specifications, see the [Google Gemini Embeddings Documentation](https://ai.google.dev/gemini-api/docs/embeddings).\n\n## Usage example\n\nThis example demonstrates creating a vector table, inserting documents, and performing similarity search using Google Gemini embedding models.\n\n### Step 1: Connect to the database\n\n=== \"Python\"\n\n    ```python\n    from pytidb import TiDBClient\n\n    tidb_client = TiDBClient.connect(\n        host=\"{gateway-region}.prod.aws.tidbcloud.com\",\n        port=4000,\n        username=\"{prefix}.root\",\n        password=\"{password}\",\n        database=\"{database}\",\n        ensure_db=True,\n    )\n    ```\n\n=== \"SQL\"\n\n    ```bash\n    mysql -h {gateway-region}.prod.aws.tidbcloud.com \\\n        -P 4000 \\\n        -u {prefix}.root \\\n        -p{password} \\\n        -D {database}\n    ```\n\n### Step 2: Configure the API key\n\nCreate your API key in [Google AI Studio](https://makersuite.google.com/app/apikey) and bring your own key (BYOK) to use the embedding service.\n\n=== \"Python\"\n\n    Configure the API key for the Google Gemini embedding provider using the TiDB Client:\n\n    ```python\n    tidb_client.configure_embedding_provider(\n        provider=\"google_gemini\",\n        api_key=\"{your-google-api-key}\",\n    )\n    ```\n\n=== \"SQL\"\n\n    Set the API key for the Google Gemini embedding provider using SQL:\n\n    ```sql\n    SET @@GLOBAL.TIDB_EXP_EMBED_GEMINI_API_KEY = \"{your-google-api-key}\";\n    ```\n\n### Step 3: Create a vector table\n\nCreate a table with 
a vector field that uses the `gemini-embedding-001` model to generate 3072-dimensional vectors (default):\n\n=== \"Python\"\n\n    ```python\n    from pytidb.schema import TableModel, Field\n    from pytidb.embeddings import EmbeddingFunction\n    from pytidb.datatype import TEXT\n\n    class Document(TableModel):\n        __tablename__ = \"sample_documents\"\n        id: int = Field(primary_key=True)\n        content: str = Field(sa_type=TEXT)\n        embedding: list[float] = EmbeddingFunction(\n            model_name=\"gemini-embedding-001\"\n        ).VectorField(source_field=\"content\")\n\n    table = tidb_client.create_table(schema=Document, if_exists=\"overwrite\")\n    ```\n\n=== \"SQL\"\n\n    ```sql\n    CREATE TABLE sample_documents (\n        `id`        INT PRIMARY KEY,\n        `content`   TEXT,\n        `embedding` VECTOR(3072) GENERATED ALWAYS AS (EMBED_TEXT(\n            \"gemini-embedding-001\",\n            `content`\n        )) STORED\n    );\n    ```\n\n### Step 4: Insert data into the table\n\n=== \"Python\"\n\n    Use the `table.insert()` or `table.bulk_insert()` API to add data:\n\n    ```python\n    documents = [\n        Document(id=1, content=\"Java: Object-oriented language for cross-platform development.\"),\n        Document(id=2, content=\"Java coffee: Bold Indonesian beans with low acidity.\"),\n        Document(id=3, content=\"Java island: Densely populated, home to Jakarta.\"),\n        Document(id=4, content=\"Java's syntax is used in Android apps.\"),\n        Document(id=5, content=\"Dark roast Java beans enhance espresso blends.\"),\n    ]\n    table.bulk_insert(documents)\n    ```\n\n=== \"SQL\"\n\n    Insert data using the `INSERT INTO` statement:\n\n    ```sql\n    INSERT INTO sample_documents (id, content)\n    VALUES\n        (1, \"Java: Object-oriented language for cross-platform development.\"),\n        (2, \"Java coffee: Bold Indonesian beans with low acidity.\"),\n        (3, \"Java island: Densely populated, home to 
Jakarta.\"),\n        (4, \"Java's syntax is used in Android apps.\"),\n        (5, \"Dark roast Java beans enhance espresso blends.\");\n    ```\n\n### Step 5: Search for similar documents\n\n=== \"Python\"\n\n    Use the `table.search()` API to perform vector search:\n\n    ```python\n    results = table.search(\"How to start learning Java programming?\") \\\n        .limit(2) \\\n        .to_list()\n    print(results)\n    ```\n\n=== \"SQL\"\n\n    Use the `VEC_EMBED_COSINE_DISTANCE` function to perform vector search based on the cosine distance metric:\n\n    ```sql\n    SELECT\n        `id`,\n        `content`,\n        VEC_EMBED_COSINE_DISTANCE(embedding, \"How to start learning Java programming?\") AS _distance\n    FROM sample_documents\n    ORDER BY _distance ASC\n    LIMIT 2;\n    ```\n\n## Custom embedding dimensions\n\nThe `gemini-embedding-001` model supports flexible vector dimensions through Matryoshka Representation Learning (MRL). You can specify the output dimensions when you define the embedding function:\n\n=== \"Python\"\n\n    ```python\n    # For 1536 dimensions\n    embedding: list[float] = EmbeddingFunction(\n        model_name=\"gemini-embedding-001\",\n        dimensions=1536\n    ).VectorField(source_field=\"content\")\n\n    # For 768 dimensions\n    embedding: list[float] = EmbeddingFunction(\n        model_name=\"gemini-embedding-001\",\n        dimensions=768\n    ).VectorField(source_field=\"content\")\n    ```\n\n=== \"SQL\"\n\n    ```sql\n    -- For 1536 dimensions\n    `embedding` VECTOR(1536) GENERATED ALWAYS AS (EMBED_TEXT(\n        \"gemini-embedding-001\",\n        `content`,\n        '{\"embedding_config\": {\"output_dimensionality\": 1536}}'\n    )) STORED\n\n    -- For 768 dimensions\n    `embedding` VECTOR(768) GENERATED ALWAYS AS (EMBED_TEXT(\n        \"gemini-embedding-001\",\n        `content`,\n        '{\"embedding_config\": {\"output_dimensionality\": 768}}'\n    )) STORED\n    ```\n\nChoose the appropriate dimensions 
based on your performance requirements and storage constraints. Higher dimensions generally provide better accuracy but require more storage and computational resources.\n"
  },
  {
    "path": "src/ai/integrations/embedding-huggingface.md",
    "content": "---\ntitle: \"Integrate TiDB Vector Search with Hugging Face Embeddings\"\ndescription: \"Learn how to integrate TiDB Vector Search with Hugging Face models to store embeddings and perform semantic search.\"\nkeywords: \"TiDB, Hugging Face, Vector search, text embeddings, transformers\"\n---\n\n# Integrate TiDB Vector Search with Hugging Face Embeddings\n\nThis tutorial demonstrates how to use [Hugging Face](https://huggingface.co/) models to generate text embeddings, store them in TiDB vector storage, and perform semantic search.\n\n!!! info\n\n    Currently, [Server-Side Auto Embedding](../guides/auto-embedding.md) is only available on [TiDB Cloud Starter](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme) clusters in the following AWS regions:\n\n    - `Frankfurt (eu-central-1)`\n    - `Oregon (us-west-2)`\n    - `N. Virginia (us-east-1)`\n\n## Hugging Face Embeddings\n\nHugging Face provides access to a vast collection of pre-trained embedding models through the Hugging Face Hub. You can integrate these models with TiDB using the AI SDK, which enables automatic embedding generation from various transformer-based models.\n\n### Supported models\n\nThe Hugging Face Hub hosts a wide range of embedding models. 
Here are some popular examples:\n\n| Model Name | Dimensions | Max Input Tokens | Description |\n|------------|------------|------------------|-------------|\n| `sentence-transformers/all-MiniLM-L6-v2` | 384 | 256 | Fast, lightweight model for general-purpose embeddings |\n| `sentence-transformers/all-mpnet-base-v2` | 768 | 384 | High-quality embeddings with good performance |\n| `sentence-transformers/all-MiniLM-L12-v2` | 384 | 256 | Balances speed and quality |\n| `BAAI/bge-small-en-v1.5` | 384 | 512 | English model optimized for semantic search |\n| `BAAI/bge-base-en-v1.5` | 768 | 512 | Higher-quality English embeddings |\n| `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` | 384 | 256 | Multilingual model for semantic similarity across languages |\n| `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` | 768 | 384 | High-quality multilingual model based on the MPNet architecture |\n| `bert-base-uncased` | 768 | 512 | Google's BERT base model with 12 layers and 12 attention heads |\n| `distilbert-base-uncased` | 768 | 512 | Distilled BERT model with about 40% fewer parameters and 60% faster inference |\n\nFor a complete list of supported models and detailed specifications, see the [Hugging Face Model Hub](https://huggingface.co/models?pipeline_tag=sentence-similarity).\n\n## Usage example\n\nThis example demonstrates creating a vector table, inserting documents, and performing similarity search using Hugging Face embedding models.\n\n### Step 1: Connect to the database\n\n=== \"Python\"\n\n    ```python\n    from pytidb import TiDBClient\n\n    tidb_client = TiDBClient.connect(\n        host=\"{gateway-region}.prod.aws.tidbcloud.com\",\n        port=4000,\n        username=\"{prefix}.root\",\n        password=\"{password}\",\n        database=\"{database}\",\n        ensure_db=True,\n    )\n    ```\n\n=== \"SQL\"\n\n    ```bash\n    mysql -h {gateway-region}.prod.aws.tidbcloud.com \\\n        -P 4000 \\\n        -u 
{prefix}.root \\\n        -p{password} \\\n        -D {database}\n    ```\n\n### Step 2: Configure the API key\n\nIf you're using a private model or need higher rate limits, you can configure your Hugging Face API token. You can create your token from the [Hugging Face Token Settings](https://huggingface.co/settings/tokens) page:\n\n=== \"Python\"\n\n    Configure the API token for Hugging Face models using the TiDB Client:\n\n    ```python\n    tidb_client.configure_embedding_provider(\n        provider=\"huggingface\",\n        api_key=\"{your-huggingface-token}\",\n    )\n    ```\n\n=== \"SQL\"\n\n    Set the API token for Hugging Face models using SQL:\n\n    ```sql\n    SET @@GLOBAL.TIDB_EXP_EMBED_HUGGINGFACE_API_KEY = \"{your-huggingface-token}\";\n    ```\n\n### Step 3: Create a vector table\n\nCreate a table with a vector field that uses a Hugging Face model to generate embeddings:\n\n=== \"Python\"\n\n    ```python\n    from pytidb.schema import TableModel, Field\n    from pytidb.embeddings import EmbeddingFunction\n    from pytidb.datatype import TEXT\n\n    class Document(TableModel):\n        __tablename__ = \"sample_documents\"\n        id: int = Field(primary_key=True)\n        content: str = Field(sa_type=TEXT)\n        embedding: list[float] = EmbeddingFunction(\n            model_name=\"huggingface/sentence-transformers/all-MiniLM-L6-v2\"\n        ).VectorField(source_field=\"content\")\n\n    table = tidb_client.create_table(schema=Document, if_exists=\"overwrite\")\n    ```\n\n=== \"SQL\"\n\n    ```sql\n    CREATE TABLE sample_documents (\n        `id`        INT PRIMARY KEY,\n        `content`   TEXT,\n        `embedding` VECTOR(384) GENERATED ALWAYS AS (EMBED_TEXT(\n            \"huggingface/sentence-transformers/all-MiniLM-L6-v2\",\n            `content`\n        )) STORED\n    );\n    ```\n\n!!! tip\n\n    The vector dimensions depend on the model you choose. 
For example, `huggingface/sentence-transformers/all-MiniLM-L6-v2` produces 384-dimensional vectors, while `huggingface/sentence-transformers/all-mpnet-base-v2` produces 768-dimensional vectors.\n\n### Step 4: Insert data into the table\n\n=== \"Python\"\n\n    Use the `table.insert()` or `table.bulk_insert()` API to add data:\n\n    ```python\n    documents = [\n        Document(id=1, content=\"Machine learning algorithms can identify patterns in data.\"),\n        Document(id=2, content=\"Deep learning uses neural networks with multiple layers.\"),\n        Document(id=3, content=\"Natural language processing helps computers understand text.\"),\n        Document(id=4, content=\"Computer vision enables machines to interpret images.\"),\n        Document(id=5, content=\"Reinforcement learning learns through trial and error.\"),\n    ]\n    table.bulk_insert(documents)\n    ```\n\n=== \"SQL\"\n\n    Insert data using the `INSERT INTO` statement:\n\n    ```sql\n    INSERT INTO sample_documents (id, content)\n    VALUES\n        (1, \"Machine learning algorithms can identify patterns in data.\"),\n        (2, \"Deep learning uses neural networks with multiple layers.\"),\n        (3, \"Natural language processing helps computers understand text.\"),\n        (4, \"Computer vision enables machines to interpret images.\"),\n        (5, \"Reinforcement learning learns through trial and error.\");\n    ```\n\n### Step 5: Search for similar documents\n\n=== \"Python\"\n\n    Use the `table.search()` API to perform vector search:\n\n    ```python\n    results = table.search(\"How do neural networks work?\") \\\n        .limit(3) \\\n        .to_list()\n\n    # to_list() returns a list of dictionaries, so access fields by key\n    for doc in results:\n        print(f\"ID: {doc['id']}, Content: {doc['content']}\")\n    ```\n\n=== \"SQL\"\n\n    Use the `VEC_EMBED_COSINE_DISTANCE` function to perform vector search based on the cosine distance metric:\n\n    ```sql\n    SELECT\n        `id`,\n        `content`,\n        VEC_EMBED_COSINE_DISTANCE(embedding, \"How do 
neural networks work?\") AS _distance\n    FROM sample_documents\n    ORDER BY _distance ASC\n    LIMIT 3;\n    ```\n"
  },
  {
    "path": "src/ai/integrations/embedding-jinaai.md",
    "content": "---\ntitle: \"Integrate TiDB Vector Search with Jina AI Embeddings API\"\ndescription: \"Learn how to integrate TiDB Vector Search with Jina AI Embeddings API to store embeddings and perform semantic search.\"\nkeywords: \"TiDB, Jina AI, Vector search, text embeddings, multimodal embeddings\"\n---\n\n# Integrate TiDB Vector Search with Jina AI Embeddings API\n\nThis tutorial demonstrates how to use [Jina AI](https://jina.ai/embeddings/) to generate embeddings for text and image data, store them in TiDB vector storage, and perform semantic search.\n\n!!! info\n\n    Currently, [Server-Side Auto Embedding](../guides/auto-embedding.md) is only available on [TiDB Cloud Starter](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme) clusters in the following AWS regions:\n\n    - `Frankfurt (eu-central-1)`\n    - `Oregon (us-west-2)`\n    - `N. Virginia (us-east-1)`\n\n## Jina AI Embeddings\n\nJina AI provides high-performance, multimodal, and multilingual long-context embeddings for search, RAG, and agent applications.\n\n### Supported models\n\n| Model Name | Dimensions | Max Input Tokens | Description |\n|------------|------------|------------------|-------------|\n| `jina_ai/jina-embeddings-v4` | 2048 | 32768 | Multimodal, multilingual text and image embeddings |\n| `jina_ai/jina-clip-v2` | 1024 | 8192 | Multilingual, multimodal embeddings for text and images |\n| `jina_ai/jina-embeddings-v3` | 1024 | 8192 | Multilingual text and code embeddings |\n\nFor a complete list of supported models and detailed specifications, see the [Jina AI Embeddings Documentation](https://jina.ai/embeddings/).\n\n## Usage example\n\nThis example demonstrates creating a vector table, inserting documents, and performing similarity search using Jina AI embedding models.\n\n### Step 1: Connect to the database\n\n=== \"Python\"\n\n    
```python\n    from pytidb import TiDBClient\n\n    tidb_client = TiDBClient.connect(\n        host=\"{gateway-region}.prod.aws.tidbcloud.com\",\n        port=4000,\n        username=\"{prefix}.root\",\n        password=\"{password}\",\n        database=\"{database}\",\n        ensure_db=True,\n    )\n    ```\n\n=== \"SQL\"\n\n    ```bash\n    mysql -h {gateway-region}.prod.aws.tidbcloud.com \\\n        -P 4000 \\\n        -u {prefix}.root \\\n        -p{password} \\\n        -D {database}\n    ```\n\n### Step 2: Configure the API key\n\nCreate your API key from the [Jina AI Platform](https://jina.ai/embeddings/) and bring your own key (BYOK) to use the embedding service.\n\n=== \"Python\"\n\n    Configure the API key for the Jina AI embedding provider using the TiDB Client:\n\n    ```python\n    tidb_client.configure_embedding_provider(\n        provider=\"jina_ai\",\n        api_key=\"{your-jina-api-key}\",\n    )\n    ```\n\n=== \"SQL\"\n\n    Set the API key for the Jina AI embedding provider using SQL:\n\n    ```sql\n    SET @@GLOBAL.TIDB_EXP_EMBED_JINA_AI_API_KEY = \"{your-jina-api-key}\";\n    ```\n\n### Step 3: Create a vector table\n\nCreate a table with a vector field that uses the `jina_ai/jina-embeddings-v4` model to generate 2048-dimensional vectors:\n\n=== \"Python\"\n\n    ```python\n    from pytidb.schema import TableModel, Field\n    from pytidb.embeddings import EmbeddingFunction\n    from pytidb.datatype import TEXT\n\n    class Document(TableModel):\n        __tablename__ = \"sample_documents\"\n        id: int = Field(primary_key=True)\n        content: str = Field(sa_type=TEXT)\n        embedding: list[float] = EmbeddingFunction(\n            model_name=\"jina_ai/jina-embeddings-v4\"\n        ).VectorField(source_field=\"content\")\n\n    table = tidb_client.create_table(schema=Document, if_exists=\"overwrite\")\n    ```\n\n=== \"SQL\"\n\n    ```sql\n    CREATE TABLE sample_documents (\n        `id`        INT PRIMARY KEY,\n        `content`   
TEXT,\n        `embedding` VECTOR(2048) GENERATED ALWAYS AS (EMBED_TEXT(\n            \"jina_ai/jina-embeddings-v4\",\n            `content`\n        )) STORED\n    );\n    ```\n\n### Step 4: Insert data into the table\n\n=== \"Python\"\n\n    Use the `table.insert()` or `table.bulk_insert()` API to add data:\n\n    ```python\n    documents = [\n        Document(id=1, content=\"Java: Object-oriented language for cross-platform development.\"),\n        Document(id=2, content=\"Java coffee: Bold Indonesian beans with low acidity.\"),\n        Document(id=3, content=\"Java island: Densely populated, home to Jakarta.\"),\n        Document(id=4, content=\"Java's syntax is used in Android apps.\"),\n        Document(id=5, content=\"Dark roast Java beans enhance espresso blends.\"),\n    ]\n    table.bulk_insert(documents)\n    ```\n\n=== \"SQL\"\n\n    Insert data using the `INSERT INTO` statement:\n\n    ```sql\n    INSERT INTO sample_documents (id, content)\n    VALUES\n        (1, \"Java: Object-oriented language for cross-platform development.\"),\n        (2, \"Java coffee: Bold Indonesian beans with low acidity.\"),\n        (3, \"Java island: Densely populated, home to Jakarta.\"),\n        (4, \"Java's syntax is used in Android apps.\"),\n        (5, \"Dark roast Java beans enhance espresso blends.\");\n    ```\n\n### Step 5: Search for similar documents\n\n=== \"Python\"\n\n    Use the `table.search()` API to perform vector search:\n\n    ```python\n    results = table.search(\"How to start learning Java programming?\") \\\n        .limit(2) \\\n        .to_list()\n    print(results)\n    ```\n\n=== \"SQL\"\n\n    Use the `VEC_EMBED_COSINE_DISTANCE` function to perform vector search based on the cosine distance metric:\n\n    ```sql\n    SELECT\n        `id`,\n        `content`,\n        VEC_EMBED_COSINE_DISTANCE(embedding, \"How to start learning Java programming?\") AS _distance\n    FROM sample_documents\n    ORDER BY _distance ASC\n    LIMIT 2;\n    ```\n"
  },
  {
    "path": "src/ai/integrations/embedding-nvidia-nim.md",
    "content": "---\ntitle: \"Integrate TiDB Vector Search with NVIDIA NIM Embeddings\"\ndescription: \"Learn how to integrate TiDB Vector Search with NVIDIA NIM models to store embeddings and perform semantic search.\"\nkeywords: \"TiDB, NVIDIA NIM, Vector search, text embeddings, AI models\"\n---\n\n# Integrate TiDB Vector Search with NVIDIA NIM Embeddings\n\nThis tutorial demonstrates how to use [NVIDIA NIM](https://developer.nvidia.com/nim) models to generate text embeddings, store them in TiDB vector storage, and perform semantic search.\n\n!!! info\n\n    Currently, [Server-Side Auto Embedding](../guides/auto-embedding.md) is only available on [TiDB Cloud Starter](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme) clusters in the following AWS regions:\n\n    - `Frankfurt (eu-central-1)`\n    - `Oregon (us-west-2)`\n    - `N. Virginia (us-east-1)`\n\n## NVIDIA NIM Embeddings\n\nNVIDIA NIM™ (NVIDIA Inference Microservices) provides containers to self-host GPU-accelerated inference microservices for pretrained and customized AI models across clouds, data centers, and RTX™ AI PCs and workstations. NIM microservices expose industry-standard APIs for simple integration into AI applications, development frameworks, and workflows.\n\nYou can integrate NVIDIA NIM embedding models with TiDB using the AI SDK, which enables automatic embedding generation from various transformer-based models.\n\n### Supported models\n\nNVIDIA NIM supports a range of embedding models optimized for different use cases. 
Here are some popular examples:\n\n| Model Name | Dimensions | Max Input Tokens | Description |\n|------------|------------|------------------|-------------|\n| `nvidia/nv-embed-v1` | 4096 | 32k | High-quality general-purpose embeddings based on Mistral-7B-v0.1 with Latent-Attention pooling |\n| `nvidia/llama-3_2-nemoretriever-300m-embed-v1` | 2048 | 8192 | Multilingual embeddings using Llama 3.2 architecture, supporting 20+ languages and long-context reasoning |\n\nFor a complete list of supported models and detailed specifications, see the [NVIDIA Build Platform](https://build.nvidia.com/search?q=Embedding).\n\n## Usage example\n\nThis example demonstrates creating a vector table, inserting documents, and performing similarity search using NVIDIA NIM embedding models.\n\n### Step 1: Connect to the database\n\n=== \"Python\"\n\n    ```python\n    from pytidb import TiDBClient\n\n    tidb_client = TiDBClient.connect(\n        host=\"{gateway-region}.prod.aws.tidbcloud.com\",\n        port=4000,\n        username=\"{prefix}.root\",\n        password=\"{password}\",\n        database=\"{database}\",\n        ensure_db=True,\n    )\n    ```\n\n=== \"SQL\"\n\n    ```bash\n    mysql -h {gateway-region}.prod.aws.tidbcloud.com \\\n        -P 4000 \\\n        -u {prefix}.root \\\n        -p{password} \\\n        -D {database}\n    ```\n\n### Step 2: Configure the API key\n\nIf you're using NVIDIA NIM models that require authentication, you can configure your API key. 
You can get free access to NIM API endpoints through the [NVIDIA Developer Program](https://developer.nvidia.com/nim) or create your API key from the [NVIDIA Build Platform](https://build.nvidia.com/settings/api-keys):\n\n=== \"Python\"\n\n    Configure the API key for NVIDIA NIM models using the TiDB Client:\n\n    ```python\n    tidb_client.configure_embedding_provider(\n        provider=\"nvidia_nim\",\n        api_key=\"{your-nvidia-api-key}\",\n    )\n    ```\n\n=== \"SQL\"\n\n    Set the API key for NVIDIA NIM models using SQL:\n\n    ```sql\n    SET @@GLOBAL.TIDB_EXP_EMBED_NVIDIA_NIM_API_KEY = \"{your-nvidia-api-key}\";\n    ```\n\n### Step 3: Create a vector table\n\nCreate a table with a vector field that uses an NVIDIA NIM model to generate embeddings:\n\n=== \"Python\"\n\n    ```python\n    from pytidb.schema import TableModel, Field\n    from pytidb.embeddings import EmbeddingFunction\n    from pytidb.datatype import TEXT\n\n    class Document(TableModel):\n        __tablename__ = \"sample_documents\"\n        id: int = Field(primary_key=True)\n        content: str = Field(sa_type=TEXT)\n        embedding: list[float] = EmbeddingFunction(\n            model_name=\"nvidia/nv-embed-v1\"\n        ).VectorField(source_field=\"content\")\n\n    table = tidb_client.create_table(schema=Document, if_exists=\"overwrite\")\n    ```\n\n=== \"SQL\"\n\n    ```sql\n    CREATE TABLE sample_documents (\n        `id`        INT PRIMARY KEY,\n        `content`   TEXT,\n        `embedding` VECTOR(4096) GENERATED ALWAYS AS (EMBED_TEXT(\n            \"nvidia/nv-embed-v1\",\n            `content`\n        )) STORED\n    );\n    ```\n\n### Step 4: Insert data into the table\n\n=== \"Python\"\n\n    Use the `table.insert()` or `table.bulk_insert()` API to add data:\n\n    ```python\n    documents = [\n        Document(id=1, content=\"Machine learning algorithms can identify patterns in data.\"),\n        Document(id=2, content=\"Deep learning uses neural networks with multiple 
layers.\"),\n        Document(id=3, content=\"Natural language processing helps computers understand text.\"),\n        Document(id=4, content=\"Computer vision enables machines to interpret images.\"),\n        Document(id=5, content=\"Reinforcement learning learns through trial and error.\"),\n    ]\n    table.bulk_insert(documents)\n    ```\n\n=== \"SQL\"\n\n    Insert data using the `INSERT INTO` statement:\n\n    ```sql\n    INSERT INTO sample_documents (id, content)\n    VALUES\n        (1, \"Machine learning algorithms can identify patterns in data.\"),\n        (2, \"Deep learning uses neural networks with multiple layers.\"),\n        (3, \"Natural language processing helps computers understand text.\"),\n        (4, \"Computer vision enables machines to interpret images.\"),\n        (5, \"Reinforcement learning learns through trial and error.\");\n    ```\n\n### Step 5: Search for similar documents\n\n=== \"Python\"\n\n    Use the `table.search()` API to perform vector search:\n\n    ```python\n    results = table.search(\"How do neural networks work?\") \\\n        .limit(3) \\\n        .to_list()\n    \n    for doc in results:\n        print(f\"ID: {doc.id}, Content: {doc.content}\")\n    ```\n\n=== \"SQL\"\n\n    Use the `VEC_EMBED_COSINE_DISTANCE` function to perform vector search with cosine distance:\n\n    ```sql\n    SELECT\n        `id`,\n        `content`,\n        VEC_EMBED_COSINE_DISTANCE(embedding, \"How do neural networks work?\") AS _distance\n    FROM sample_documents\n    ORDER BY _distance ASC\n    LIMIT 3;\n    ```\n"
  },
  {
    "path": "src/ai/integrations/embedding-openai-compatible.md",
    "content": "---\ntitle: \"Integrate TiDB Vector Search with OpenAI-Compatible Embedding API\"\ndescription: \"Learn how to integrate TiDB Vector Search with OpenAI-Compatible Embedding API to store embeddings and perform semantic search.\"\nkeywords: \"TiDB, OpenAI-like Embeddings, OpenAI Compatible API, Vector search, text embeddings\"\n---\n\n# Integrate TiDB Vector Search with OpenAI-Compatible Embedding API\n\nThis tutorial demonstrates how to use OpenAI-compatible embedding services to generate text embeddings, store them in TiDB vector storage, and perform semantic search.\n\n!!! info\n\n    Currently, TiDB does not support native SQL functions for OpenAI-compatible embedding services, but you can integrate them using the AI SDK.\n\n## OpenAI-compatible embedding services\n\nBecause the OpenAI Embedding API is widely used, many AI service providers and model-serving tools offer APIs that are compatible with it, such as:\n\n- [Ollama](https://ollama.com/)\n- [vLLM](https://vllm.ai/)\n\nThe TiDB AI SDK provides an `EmbeddingFunction` class that you can use to integrate with OpenAI-compatible embedding services.\n\n## Usage example\n\nThis example demonstrates creating a vector table, inserting documents, and performing similarity search using OpenAI-compatible embedding models.\n\n### Step 1: Connect to the database\n\n=== \"Python\"\n\n    ```python\n    from pytidb import TiDBClient\n\n    tidb_client = TiDBClient.connect(\n        host=\"{gateway-region}.prod.aws.tidbcloud.com\",\n        port=4000,\n        username=\"{prefix}.root\",\n        password=\"{password}\",\n        database=\"{database}\",\n        ensure_db=True,\n    )\n    ```\n\n### Step 2: Define the embedding function\n\n=== \"Python\"\n\n    To integrate with OpenAI-compatible embedding services, initialize the `EmbeddingFunction` class and add the `openai/` prefix to the `model_name` value.\n\n    ```python\n    from pytidb.embeddings import EmbeddingFunction\n\n    openai_like_embed = 
EmbeddingFunction(\n        model_name=\"openai/{model_name}\",\n        api_base=\"{your-api-base}\",\n        api_key=\"{your-api-key}\",\n    )\n    ```\n\n    The parameters are:\n\n    - `model_name`: Specifies the model to use. Use the format `openai/{model_name}`.\n    - `api_base`: The base URL of your OpenAI-compatible embedding API service.\n    - `api_key`: The API key used to authenticate with the embedding API service. If your service does not require authentication, you can omit it, as in the following examples.\n\n**Example: Using Ollama and the `nomic-embed-text` model**\n\n=== \"Python\"\n\n    ```python\n    openai_like_embed = EmbeddingFunction(\n        model_name=\"openai/nomic-embed-text\",\n        api_base=\"http://localhost:11434/v1\",\n    )\n    ```\n\n**Example: Using vLLM and the `intfloat/e5-mistral-7b-instruct` model**\n\n=== \"Python\"\n\n    ```python\n    openai_like_embed = EmbeddingFunction(\n        model_name=\"openai/intfloat/e5-mistral-7b-instruct\",\n        api_base=\"http://localhost:8000/v1\",\n    )\n    ```\n\n### Step 3: Create a vector table\n\nCreate a table with a vector field that uses the `nomic-embed-text` model served by Ollama to generate embeddings:\n\n=== \"Python\"\n\n    ```python\n    from pytidb.schema import TableModel, Field\n    from pytidb.embeddings import EmbeddingFunction\n    from pytidb.datatype import TEXT\n\n    openai_like_embed = EmbeddingFunction(\n        model_name=\"openai/nomic-embed-text\",\n        api_base=\"{your-api-base}\",\n    )\n\n    class Document(TableModel):\n        __tablename__ = \"sample_documents\"\n        id: int = Field(primary_key=True)\n        content: str = Field(sa_type=TEXT)\n        embedding: list[float] = openai_like_embed.VectorField(source_field=\"content\")\n\n    table = tidb_client.create_table(schema=Document, if_exists=\"overwrite\")\n    ```\n\n### Step 4: Insert data into the table\n\n=== \"Python\"\n\n    Use the `table.insert()` or `table.bulk_insert()` API to add data:\n\n    ```python\n    documents = [\n        Document(id=1, content=\"Java: Object-oriented language for cross-platform development.\"),\n        Document(id=2, 
content=\"Java coffee: Bold Indonesian beans with low acidity.\"),\n        Document(id=3, content=\"Java island: Densely populated, home to Jakarta.\"),\n        Document(id=4, content=\"Java's syntax is used in Android apps.\"),\n        Document(id=5, content=\"Dark roast Java beans enhance espresso blends.\"),\n    ]\n    table.bulk_insert(documents)\n    ```\n\nWith [Auto Embedding](../guides/auto-embedding.md) enabled, the AI SDK automatically generates vector field values when you insert data.\n\n### Step 5: Search for similar documents\n\n=== \"Python\"\n\n    Use the `table.search()` API to perform vector search:\n\n    ```python\n    results = table.search(\"How to start learning Java programming?\") \\\n        .limit(2) \\\n        .to_list()\n    print(results)\n    ```\n\nWith [Auto Embedding](../guides/auto-embedding.md) enabled, the AI SDK automatically generates vector embeddings for the query text during vector search.\n"
  },
  {
    "path": "src/ai/integrations/embedding-openai.md",
    "content": "---\ntitle: \"Integrate TiDB Vector Search with OpenAI Embeddings API\"\ndescription: \"Learn how to integrate TiDB Vector Search with OpenAI Embeddings API to store embeddings and perform semantic search.\"\nkeywords: \"TiDB, OpenAI Embeddings, Vector search, text embeddings\"\n---\n\n# Integrate TiDB Vector Search with OpenAI Embeddings API\n\nThis tutorial demonstrates how to use [OpenAI](https://openai.com/api/) to generate text embeddings, store them in TiDB vector storage, and perform semantic search.\n\n!!! info\n\n    Currently, [Server-Side Auto Embedding](../guides/auto-embedding.md) is only available on TiDB Cloud Starter clusters in the following AWS regions:\n\n    - `Frankfurt (eu-central-1)`\n    - `Oregon (us-west-2)`\n    - `N. Virginia (us-east-1)`\n\n## OpenAI Embeddings\n\nOpenAI offers cost-effective, high-performance embedding models. You can integrate the OpenAI Embeddings API with TiDB using the AI SDK or native SQL functions for automatic embedding generation.\n\n### Supported Models\n\n| Model Name                        | Dimensions | Max Input Tokens |\n|-----------------------------------|------------|------------------|\n| `openai/text-embedding-3-small`   | 1536       | 8191             |\n| `openai/text-embedding-3-large`   | 3072       | 8191             |\n\nFor a complete list of supported models, see the [OpenAI Embedding API Reference](https://platform.openai.com/docs/api-reference/embeddings).\n\n## Usage example\n\nThis example demonstrates creating a vector table, inserting documents, and performing similarity search using OpenAI embedding models.\n\n### Step 1: Connect to the database\n\n=== \"Python\"\n\n    ```python\n    from pytidb import TiDBClient\n\n    tidb_client = TiDBClient.connect(\n        host=\"{gateway-region}.prod.aws.tidbcloud.com\",\n        port=4000,\n        username=\"{prefix}.root\",\n        password=\"{password}\",\n        database=\"{database}\",\n        ensure_db=True,\n    )\n    ```\n\n=== \"SQL\"\n\n    ```bash\n    mysql 
-h {gateway-region}.prod.aws.tidbcloud.com \\\n        -P 4000 \\\n        -u {prefix}.root \\\n        -p{password} \\\n        -D {database}\n    ```\n\n### Step 2: Configure the API key\n\nCreate your own API key from the [OpenAI API Platform](https://platform.openai.com/api-keys) and bring your own key (BYOK) to use the embedding service.\n\n=== \"Python\"\n\n    Configure the API key for the OpenAI embedding provider using the TiDB Client:\n\n    ```python\n    tidb_client.configure_embedding_provider(\n        provider=\"openai\",\n        api_key=\"{your-openai-api-key}\",\n    )\n    ```\n\n=== \"SQL\"\n\n    Set the API key for the OpenAI embedding provider using SQL:\n\n    ```sql\n    SET @@GLOBAL.TIDB_EXP_EMBED_OPENAI_API_KEY = \"{your-openai-api-key}\";\n    ```\n\n### Step 3: Create a vector table\n\nCreate a table with a vector field that uses the `openai/text-embedding-3-small` model to generate 1536-dimensional vectors:\n\n=== \"Python\"\n\n\n    ```python\n    from pytidb.schema import TableModel, Field\n    from pytidb.embeddings import EmbeddingFunction\n    from pytidb.datatype import TEXT\n\n    class Document(TableModel):\n        __tablename__ = \"sample_documents\"\n        id: int = Field(primary_key=True)\n        content: str = Field(sa_type=TEXT)\n        embedding: list[float] = EmbeddingFunction(\n            model_name=\"openai/text-embedding-3-small\"\n        ).VectorField(source_field=\"content\")\n\n    table = tidb_client.create_table(schema=Document, if_exists=\"overwrite\")\n    ```\n\n=== \"SQL\"\n\n    ```sql\n    CREATE TABLE sample_documents (\n        `id`        INT PRIMARY KEY,\n        `content`   TEXT,\n        `embedding` VECTOR(1536) GENERATED ALWAYS AS (EMBED_TEXT(\n            \"openai/text-embedding-3-small\",\n            `content`\n        )) STORED\n    );\n    ```\n\n### Step 4: Insert data into the table\n\n=== \"Python\"\n\n    Use the `table.insert()` or `table.bulk_insert()` API to add data:\n\n    
```python\n    documents = [\n        Document(id=1, content=\"Java: Object-oriented language for cross-platform development.\"),\n        Document(id=2, content=\"Java coffee: Bold Indonesian beans with low acidity.\"),\n        Document(id=3, content=\"Java island: Densely populated, home to Jakarta.\"),\n        Document(id=4, content=\"Java's syntax is used in Android apps.\"),\n        Document(id=5, content=\"Dark roast Java beans enhance espresso blends.\"),\n    ]\n    table.bulk_insert(documents)\n    ```\n\n=== \"SQL\"\n\n    Insert data using the `INSERT INTO` statement:\n\n    ```sql\n    INSERT INTO sample_documents (id, content)\n    VALUES\n        (1, \"Java: Object-oriented language for cross-platform development.\"),\n        (2, \"Java coffee: Bold Indonesian beans with low acidity.\"),\n        (3, \"Java island: Densely populated, home to Jakarta.\"),\n        (4, \"Java's syntax is used in Android apps.\"),\n        (5, \"Dark roast Java beans enhance espresso blends.\");\n    ```\n\n### Step 5: Search for similar documents\n\n=== \"Python\"\n\n    Use the `table.search()` API to perform vector search:\n\n    ```python\n    results = table.search(\"How to start learning Java programming?\") \\\n        .limit(2) \\\n        .to_list()\n    print(results)\n    ```\n\n=== \"SQL\"\n\n    Use the `VEC_EMBED_COSINE_DISTANCE` function to perform vector search with cosine distance:\n\n    ```sql\n    SELECT\n        `id`,\n        `content`,\n        VEC_EMBED_COSINE_DISTANCE(embedding, \"How to start learning Java programming?\") AS _distance\n    FROM sample_documents\n    ORDER BY _distance ASC\n    LIMIT 2;\n    ```\n"
  },
  {
    "path": "src/ai/integrations/embedding-overview.md",
    "content": "# Embeddings Integration\n\n## Overview\n\nTiDB provides a unified interface for integrating with various embedding providers and models:\n\n- **Programmatic use**: Use the `EmbeddingFunction` class from the AI SDK to create embedding functions for specific providers or models.\n- **SQL use**: Use the `EMBED_TEXT` function to generate embeddings directly from text data.\n\n## Embedding Function\n\n=== \"Python\"\n\n    Use the `EmbeddingFunction` class to work with different embedding providers and models.\n\n    ```python\n    from pytidb.embeddings import EmbeddingFunction\n\n    embed_func = EmbeddingFunction(\n        model_name=\"<provider_name>/<model_name>\",\n    )\n    ```\n\n    **Parameters:**\n\n    - `model_name` *(required)*:\n      Specifies the embedding model to use, in the format `{provider_name}/{model_name}`.\n\n    - `dimensions` *(optional)*:\n      The dimensionality of the output vector embeddings. If not provided and the model has no default dimension, a test string is embedded during initialization to detect the dimension automatically.\n\n    - `api_key` *(optional)*:\n      The API key for accessing the embedding service. If not set explicitly, the key is read from the provider's default environment variable.\n\n    - `api_base` *(optional)*:\n      The base URL of the embedding API service.\n\n    - `use_server` *(optional)*:\n      Whether to use TiDB Cloud's hosted embedding service. Defaults to `True` for TiDB Cloud Starter.\n\n    - `multimodal` *(optional)*:\n      Whether to use a multimodal embedding model. 
When enabled, `use_server` is automatically set to `False`, and the embedding service is called client-side.\n\n=== \"SQL\"\n\n    ```sql\n    SELECT EMBED_TEXT('{model_id}', '{text}', '{extra_params}');\n    ```\n\n    **Parameters:**\n\n    - `model_id` *(required)*:\n      The ID of the embedding model, in the format `{provider_name}/{model_name}`, for example, `tidbcloud_free/amazon/titan-embed-text-v2`.\n\n    - `text` *(required)*:\n      The text to generate embeddings from.\n\n    - `extra_params` *(optional)*:\n      Additional parameters sent to the embedding API. For supported parameters, see the embedding provider's documentation.\n\n## Supported Providers\n\nThe following embedding providers are supported. Click a provider to learn how to integrate it and enable automatic embedding for your data.\n\n- [TiDB Cloud Hosted](embedding-tidb-cloud-hosted.md)\n- [OpenAI](embedding-openai.md)\n- [OpenAI Compatible](embedding-openai-compatible.md)\n- [Cohere](embedding-cohere.md)\n- [Jina AI](embedding-jinaai.md)\n- [Google Gemini](embedding-gemini.md)\n- [Hugging Face](embedding-huggingface.md)\n- [NVIDIA NIM](embedding-nvidia-nim.md)\n"
  },
  {
    "path": "src/ai/integrations/embedding-tidb-cloud-hosted.md",
    "content": "---\ntitle: \"Integrate TiDB Vector Search with TiDB Cloud Hosted Embedding Models\"\ndescription: \"Learn how to integrate TiDB Vector Search with TiDB Cloud Hosted Embedding Models to store embeddings and perform semantic search.\"\nkeywords: \"TiDB, TiDB Cloud, Vector search, text embeddings\"\n---\n\n# Integrate TiDB Vector Search with TiDB Cloud Hosted Embedding Models\n\nThis tutorial demonstrates how to use TiDB Cloud hosted embedding models to generate embeddings for text data, store them in TiDB vector storage, and perform semantic search.\n\n!!! info\n\n    Currently, [Server-Side Auto Embedding](../guides/auto-embedding.md) is only available on [TiDB Cloud Starter](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme) clusters in the following AWS regions:\n\n    - `Frankfurt (eu-central-1)`\n    - `Oregon (us-west-2)`\n    - `N. Virginia (us-east-1)`\n\n## TiDB Cloud Hosted Embeddings\n\nTiDB Cloud provides hosted embedding models for generating text embeddings without requiring external API keys.\n\n### Supported Models\n\nTiDB Cloud currently supports the following hosted embedding models:\n\n| Model Name                                    | Dimensions | Max Input Tokens | Features |\n|-----------------------------------------------|------------|------------------|----------|\n| `tidbcloud_free/amazon/titan-embed-text-v2`  | 1024       | 8192             | Text, Multilingual |\n| `tidbcloud_free/cohere/embed-english-v3`     | 1024       | 512              | Text, English-optimized |\n| `tidbcloud_free/cohere/embed-multilingual-v3`| 1024       | 512              | Text, Multilingual |\n\n!!! 
info\n\n    Models with the `tidbcloud_free` prefix are provided by TiDB Cloud free of charge.\n\n## Usage example\n\nThis example demonstrates creating a vector table, inserting documents, and performing similarity search using TiDB Cloud hosted embedding models.\n\n### Step 1: Connect to the database\n\n=== \"Python\"\n\n    ```python\n    from pytidb import TiDBClient\n\n    tidb_client = TiDBClient.connect(\n        host=\"{gateway-region}.prod.aws.tidbcloud.com\",\n        port=4000,\n        username=\"{prefix}.root\",\n        password=\"{password}\",\n        database=\"{database}\",\n        ensure_db=True,\n    )\n    ```\n\n=== \"SQL\"\n\n    ```bash\n    mysql -h {gateway-region}.prod.aws.tidbcloud.com \\\n        -P 4000 \\\n        -u {prefix}.root \\\n        -p{password} \\\n        -D {database}\n    ```\n\n### Step 2: Create a vector table\n\nCreate a table with a vector field that uses the `tidbcloud_free/amazon/titan-embed-text-v2` model to generate 1024-dimensional vectors:\n\n=== \"Python\"\n\n    ```python\n    from pytidb.schema import TableModel, Field\n    from pytidb.embeddings import EmbeddingFunction\n    from pytidb.datatype import TEXT\n\n    class Document(TableModel):\n        __tablename__ = \"sample_documents\"\n        id: int = Field(primary_key=True)\n        content: str = Field(sa_type=TEXT)\n        embedding: list[float] = EmbeddingFunction(\n            model_name=\"tidbcloud_free/amazon/titan-embed-text-v2\"\n        ).VectorField(source_field=\"content\")\n\n    table = tidb_client.create_table(schema=Document, if_exists=\"overwrite\")\n    ```\n\n=== \"SQL\"\n\n    ```sql\n    CREATE TABLE sample_documents (\n        `id`        INT PRIMARY KEY,\n        `content`   TEXT,\n        `embedding` VECTOR(1024) GENERATED ALWAYS AS (EMBED_TEXT(\n            \"tidbcloud_free/amazon/titan-embed-text-v2\",\n            `content`\n        )) STORED\n    );\n    ```\n\n!!! 
info\n\n    You don't need to configure an API key for models with the `tidbcloud_free` prefix.\n\n### Step 3: Insert data into the table\n\n=== \"Python\"\n\n    Use the `table.insert()` or `table.bulk_insert()` API to add data:\n\n    ```python\n    documents = [\n        Document(id=1, content=\"Java: Object-oriented language for cross-platform development.\"),\n        Document(id=2, content=\"Java coffee: Bold Indonesian beans with low acidity.\"),\n        Document(id=3, content=\"Java island: Densely populated, home to Jakarta.\"),\n        Document(id=4, content=\"Java's syntax is used in Android apps.\"),\n        Document(id=5, content=\"Dark roast Java beans enhance espresso blends.\"),\n    ]\n    table.bulk_insert(documents)\n    ```\n\n=== \"SQL\"\n\n    Insert data using the `INSERT INTO` statement:\n\n    ```sql\n    INSERT INTO sample_documents (id, content)\n    VALUES\n        (1, \"Java: Object-oriented language for cross-platform development.\"),\n        (2, \"Java coffee: Bold Indonesian beans with low acidity.\"),\n        (3, \"Java island: Densely populated, home to Jakarta.\"),\n        (4, \"Java's syntax is used in Android apps.\"),\n        (5, \"Dark roast Java beans enhance espresso blends.\");\n    ```\n\n### Step 4: Search for similar documents\n\n=== \"Python\"\n\n    Use the `table.search()` API to perform vector search:\n\n    ```python\n    results = table.search(\"How to start learning Java programming?\") \\\n        .limit(2) \\\n        .to_list()\n    print(results)\n    ```\n\n=== \"SQL\"\n\n    Use the `VEC_EMBED_COSINE_DISTANCE` function to perform vector search with cosine distance:\n\n    ```sql\n    SELECT\n        `id`,\n        `content`,\n        VEC_EMBED_COSINE_DISTANCE(embedding, \"How to start learning Java programming?\") AS _distance\n    FROM sample_documents\n    ORDER BY _distance ASC\n    LIMIT 2;\n    ```\n"
  },
  {
    "path": "src/ai/integrations/langchain.md",
    "content": "\n# Integrate Vector Search with LangChain\n\nThis tutorial demonstrates how to integrate the [vector search](https://docs.pingcap.com/tidbcloud/vector-search-overview) feature of TiDB with [LangChain](https://python.langchain.com/).\n\n!!! note\n\n    The vector search feature is only available for TiDB Self-Managed clusters and [TiDB Cloud Starter](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-cloud-starter) clusters.\n\n!!! tip\n\n    You can view the complete [sample code](https://github.com/langchain-ai/langchain/blob/master/docs/docs/integrations/vectorstores/tidb_vector.ipynb) on Jupyter Notebook, or run the sample code directly in the [Colab](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/integrations/vectorstores/tidb_vector.ipynb) online environment.\n\n## Prerequisites\n\nTo complete this tutorial, you need:\n\n- [Python 3.8 or higher](https://www.python.org/downloads/) installed.\n- [Jupyter Notebook](https://jupyter.org/install) installed.\n- [Git](https://git-scm.com/downloads) installed.\n- A TiDB cluster.\n\n## Get started\n\nThis section provides step-by-step instructions for integrating TiDB Vector Search with LangChain to perform semantic searches.\n\n### Step 1. Create a new Jupyter Notebook file\n\nIn your preferred directory, create a new Jupyter Notebook file named `integrate_with_langchain.ipynb`:\n\n```shell\ntouch integrate_with_langchain.ipynb\n```\n\n### Step 2. 
Install required dependencies\n\nOpen the `integrate_with_langchain.ipynb` file in Jupyter Notebook, and then run the following commands in a notebook cell to install the required packages:\n\n```shell\n!pip install langchain langchain-community\n!pip install langchain-openai\n!pip install pymysql\n!pip install tidb-vector\n```\n\nThen, add the following code to import the required packages:\n\n```python\nfrom langchain_community.document_loaders import TextLoader\nfrom langchain_community.vectorstores import TiDBVectorStore\nfrom langchain_openai import OpenAIEmbeddings\nfrom langchain_text_splitters import CharacterTextSplitter\n```\n\n### Step 3. Set up your environment\n\nConfigure the environment variables based on your TiDB deployment option.\n\nFor a TiDB Cloud Starter cluster, take the following steps to obtain the cluster connection string and configure the environment variables:\n\n1. Navigate to the [**Clusters**](https://tidbcloud.com/console/clusters) page, and then click the name of your target cluster to go to its overview page.\n\n2. Click **Connect** in the upper-right corner. A connection dialog is displayed.\n\n3. Ensure the configurations in the connection dialog match your operating environment.\n\n    - **Connection Type** is set to `Public`.\n    - **Branch** is set to `main`.\n    - **Connect With** is set to `SQLAlchemy`.\n    - **Operating System** matches your environment.\n\n4. Click the **PyMySQL** tab and copy the connection string.\n\n    > **Tip:**\n    >\n    > If you have not set a password yet, click **Generate Password** to generate a random password.\n\n5. Configure the environment variables.\n\n    This document uses [OpenAI](https://platform.openai.com/docs/introduction) as the embedding model provider. 
In this step, you need to provide the connection string obtained from the previous step and your [OpenAI API key](https://platform.openai.com/docs/quickstart/step-2-set-up-your-api-key).\n\n    To configure the environment variables, run the following code. When prompted, enter your connection string and OpenAI API key:\n\n    ```python\n    # Use getpass to securely prompt for environment variables in your terminal.\n    import getpass\n    import os\n\n    # Copy your connection string from the TiDB Cloud console.\n    # Connection string format: \"mysql+pymysql://<USER>:<PASSWORD>@<HOST>:4000/<DB>?ssl_ca=/etc/ssl/cert.pem&ssl_verify_cert=true&ssl_verify_identity=true\"\n    tidb_connection_string = getpass.getpass(\"TiDB Connection String:\")\n    os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")\n    ```\n\nFor a TiDB Self-Managed cluster, configure the environment variables as follows. This document uses [OpenAI](https://platform.openai.com/docs/introduction) as the embedding model provider, so you need to provide your cluster connection string and your [OpenAI API key](https://platform.openai.com/docs/quickstart/step-2-set-up-your-api-key).\n\nTo configure the environment variables, run the following code. 
When prompted, enter your connection string and OpenAI API key:\n\n```python\n# Use getpass to securely prompt for environment variables in your terminal.\nimport getpass\nimport os\n\n# Connection string format: \"mysql+pymysql://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE_NAME>\"\ntidb_connection_string = getpass.getpass(\"TiDB Connection String:\")\nos.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")\n```\n\nThe cluster connection string uses the following format:\n\n```dotenv\nTIDB_DATABASE_URL=\"mysql+pymysql://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE_NAME>\"\n# For example: TIDB_DATABASE_URL=\"mysql+pymysql://root@127.0.0.1:4000/test\"\n```\n\nReplace the connection parameter values with those of your TiDB cluster. If you are running TiDB on your local machine, `<HOST>` is `127.0.0.1` by default. The initial `<PASSWORD>` is empty, so if you are starting the cluster for the first time, you can omit this field.\n\nThe parameters are as follows:\n\n- `<USERNAME>`: The username to connect to the TiDB cluster.\n- `<PASSWORD>`: The password to connect to the TiDB cluster.\n- `<HOST>`: The host of the TiDB cluster.\n- `<PORT>`: The port of the TiDB cluster.\n- `<DATABASE_NAME>`: The name of the database you want to connect to.\n\n### Step 4. 
Load the sample document\n\n#### Step 4.1 Download the sample document\n\nIn a notebook cell, run the following commands to create a `data/how_to/` directory in your project directory and download the sample document [`state_of_the_union.txt`](https://github.com/langchain-ai/langchain/blob/master/docs/docs/how_to/state_of_the_union.txt) from the [langchain-ai/langchain](https://github.com/langchain-ai/langchain) GitHub repository:\n\n```shell\n!mkdir -p 'data/how_to/'\n!wget 'https://raw.githubusercontent.com/langchain-ai/langchain/master/docs/docs/how_to/state_of_the_union.txt' -O 'data/how_to/state_of_the_union.txt'\n```\n\n#### Step 4.2 Load and split the document\n\nLoad the sample document from `data/how_to/state_of_the_union.txt` and split it into chunks of approximately 1,000 characters each using a `CharacterTextSplitter`.\n\n```python\nloader = TextLoader(\"data/how_to/state_of_the_union.txt\")\ndocuments = loader.load()\ntext_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\ndocs = text_splitter.split_documents(documents)\n```\n\n### Step 5. Embed and store document vectors\n\nThe TiDB vector store supports both cosine distance (`cosine`) and Euclidean distance (`l2`) for measuring similarity between vectors. The default strategy is cosine distance.\n\nThe following code embeds the document chunks and stores them in a table named `embedded_documents` in TiDB, which is optimized for vector search.\n\n```python\nembeddings = OpenAIEmbeddings()\nvector_store = TiDBVectorStore.from_documents(\n    documents=docs,\n    embedding=embeddings,\n    table_name=\"embedded_documents\",\n    connection_string=tidb_connection_string,\n    distance_strategy=\"cosine\",  # default, another option is \"l2\"\n)\n```\n\nAfter the code runs successfully, you can view and access the `embedded_documents` table in your TiDB database.\n\n### Step 6. 
Perform a vector search\n\nThis step demonstrates how to query \"What did the president say about Ketanji Brown Jackson\" from the document `state_of_the_union.txt`.\n\n```python\nquery = \"What did the president say about Ketanji Brown Jackson\"\n```\n\n#### Option 1: Use `similarity_search_with_score()`\n\nThe `similarity_search_with_score()` method calculates the vector space distance between the documents and the query. This distance serves as a similarity score, determined by the chosen `distance_strategy`. The method returns the top `k` documents with the lowest scores. A lower score indicates a higher similarity between a document and your query.\n\n```python\ndocs_with_score = vector_store.similarity_search_with_score(query, k=3)\nfor doc, score in docs_with_score:\n   print(\"-\" * 80)\n   print(\"Score: \", score)\n   print(doc.page_content)\n   print(\"-\" * 80)\n```\n\n<details>\n   <summary><b>Expected output</b></summary>\n\n```plain\n--------------------------------------------------------------------------------\nScore:  0.18472413652518527\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.\n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.\n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.\n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. 
One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n--------------------------------------------------------------------------------\n--------------------------------------------------------------------------------\nScore:  0.21757513022785557\nA former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\n\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.\n\nWe can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.\n\nWe’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.\n\nWe’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.\n\nWe’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.\n--------------------------------------------------------------------------------\n--------------------------------------------------------------------------------\nScore:  0.22676987253721725\nAnd for our LGBTQ+ Americans, let’s finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong.\n\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential.\n\nWhile it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. 
From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice.\n\nAnd soon, we’ll strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things.\n\nSo tonight I’m offering a Unity Agenda for the Nation. Four big things we can do together.\n\nFirst, beat the opioid epidemic.\n--------------------------------------------------------------------------------\n```\n\n</details>\n\n#### Option 2: Use `similarity_search_with_relevance_scores()`\n\nThe `similarity_search_with_relevance_scores()` method returns the top `k` documents with the highest relevance scores. A higher score indicates greater similarity between a document and your query. With the cosine distance strategy, each relevance score equals 1 minus the corresponding distance score returned in Option 1.\n\n```python\ndocs_with_relevance_score = vector_store.similarity_search_with_relevance_scores(query, k=2)\nfor doc, score in docs_with_relevance_score:\n    print(\"-\" * 80)\n    print(\"Score: \", score)\n    print(doc.page_content)\n    print(\"-\" * 80)\n```\n\n<details>\n   <summary><b>Expected output</b></summary>\n\n```plain\n--------------------------------------------------------------------------------\nScore:  0.8152758634748147\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.\n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.\n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.\n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. 
One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n--------------------------------------------------------------------------------\n--------------------------------------------------------------------------------\nScore:  0.7824248697721444\nA former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\n\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.\n\nWe can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.\n\nWe’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.\n\nWe’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.\n\nWe’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.\n--------------------------------------------------------------------------------\n```\n\n</details>\n\n### Use as a retriever\n\nIn LangChain, a [retriever](https://python.langchain.com/v0.2/docs/concepts/#retrievers) is an interface that returns documents in response to an unstructured query. A retriever is more general than a vector store because it only needs to return documents, not store them. 
The following code demonstrates how to use the TiDB vector store as a retriever. With `search_type=\"similarity_score_threshold\"`, the retriever returns at most `k` documents whose relevance score is at least `score_threshold`.\n\n```python\nretriever = vector_store.as_retriever(\n    search_type=\"similarity_score_threshold\",\n    search_kwargs={\"k\": 3, \"score_threshold\": 0.8},\n)\ndocs_retrieved = retriever.invoke(query)\nfor doc in docs_retrieved:\n    print(\"-\" * 80)\n    print(doc.page_content)\n    print(\"-\" * 80)\n```\n\nIn this example, only one document meets the `0.8` threshold. The expected output is as follows:\n\n```plain\n--------------------------------------------------------------------------------\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.\n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.\n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.\n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n--------------------------------------------------------------------------------\n```\n\n### Remove the vector store\n\nTo remove an existing TiDB vector store, use the `drop_vectorstore()` method:\n\n```python\nvector_store.drop_vectorstore()\n```\n\n## Search with metadata filters\n\nTo refine your searches, you can use metadata filters to retrieve specific nearest-neighbor results that match the applied filters.\n\n### Supported metadata types\n\nYou can pair each document in the TiDB vector store with metadata, structured as key-value pairs in a JSON object. 
Keys are always strings, while values can be any of the following types:\n\n- String\n- Number: integer or floating point\n- Boolean: `true` or `false`\n\nFor example, the following is a valid metadata payload:\n\n```json\n{\n  \"page\": 12,\n  \"book_title\": \"Siddhartha\"\n}\n```\n\n### Metadata filter syntax\n\nAvailable filters include the following:\n\n- `$or`: Selects vectors that match any one of the specified conditions.\n- `$and`: Selects vectors that match all the specified conditions.\n- `$eq`: Equal to the specified value.\n- `$ne`: Not equal to the specified value.\n- `$gt`: Greater than the specified value.\n- `$gte`: Greater than or equal to the specified value.\n- `$lt`: Less than the specified value.\n- `$lte`: Less than or equal to the specified value.\n- `$in`: In the specified array of values.\n- `$nin`: Not in the specified array of values.\n\nIf the metadata of a document is as follows:\n\n```json\n{\n  \"page\": 12,\n  \"book_title\": \"Siddhartha\"\n}\n```\n\nEach of the following metadata filters matches this document:\n\n```json\n{ \"page\": 12 }\n```\n\n```json\n{ \"page\": { \"$eq\": 12 } }\n```\n\n```json\n{\n  \"page\": {\n    \"$in\": [11, 12, 13]\n  }\n}\n```\n\n```json\n{ \"page\": { \"$nin\": [13] } }\n```\n\n```json\n{ \"page\": { \"$lt\": 13 } }\n```\n\n```json\n{\n  \"$or\": [{ \"page\": 11 }, { \"page\": 12 }],\n  \"$and\": [{ \"page\": 12 }, { \"book_title\": \"Siddhartha\" }]\n}\n```\n\nIn a metadata filter, TiDB treats each key-value pair as a separate filter clause and combines these clauses using the `AND` logical operator.\n\n### Example\n\nThe following example adds two documents to `TiDBVectorStore` and adds a `title` field to each document as the metadata:\n\n```python\nvector_store.add_texts(\n    texts=[\n        \"TiDB Vector offers advanced, high-speed vector processing capabilities, enhancing AI workflows with efficient data handling and analytics support.\",\n        \"TiDB Vector, starting as low as $10 per month for basic usage\",\n  
  ],\n    metadatas=[\n        {\"title\": \"TiDB Vector functionality\"},\n        {\"title\": \"TiDB Vector Pricing\"},\n    ],\n)\n```\n\nThe expected output is as follows:\n\n```plain\n[UUID('c782cb02-8eec-45be-a31f-fdb78914f0a7'),\n UUID('08dcd2ba-9f16-4f29-a9b7-18141f8edae3')]\n```\n\nPerform a similarity search with metadata filters:\n\n```python\ndocs_with_score = vector_store.similarity_search_with_score(\n    \"Introduction to TiDB Vector\", filter={\"title\": \"TiDB Vector functionality\"}, k=4\n)\nfor doc, score in docs_with_score:\n    print(\"-\" * 80)\n    print(\"Score: \", score)\n    print(doc.page_content)\n    print(\"-\" * 80)\n```\n\nThe expected output is as follows:\n\n```plain\n--------------------------------------------------------------------------------\nScore:  0.12761409169211535\nTiDB Vector offers advanced, high-speed vector processing capabilities, enhancing AI workflows with efficient data handling and analytics support.\n--------------------------------------------------------------------------------\n```\n\n## Advanced usage example: travel agent\n\nThis section demonstrates how a travel agent can integrate vector search with LangChain. The goal is to create personalized travel reports for clients, helping them find airports with specific amenities, such as clean lounges and vegetarian options.\n\nThe process involves two main steps:\n\n1. Perform a semantic search across airport reviews to identify airport codes that match the desired amenities.\n2. 
Execute a SQL query to merge these codes with route information, highlighting airlines and destinations that match the user's preferences.\n\n### Prepare data\n\nFirst, create a table to store airport route data, insert sample routes, and then add airport reviews to the vector store with each airport code stored as metadata:\n\n```python\n# Create a table to store flight plan data.\nvector_store.tidb_vector_client.execute(\n    \"\"\"CREATE TABLE airplan_routes (\n        id INT AUTO_INCREMENT PRIMARY KEY,\n        airport_code VARCHAR(10),\n        airline_code VARCHAR(10),\n        destination_code VARCHAR(10),\n        route_details TEXT,\n        duration TIME,\n        frequency INT,\n        airplane_type VARCHAR(50),\n        price DECIMAL(10, 2),\n        layover TEXT\n    );\"\"\"\n)\n\n# Insert sample route data into the airplan_routes table.\nvector_store.tidb_vector_client.execute(\n    \"\"\"INSERT INTO airplan_routes (\n        airport_code,\n        airline_code,\n        destination_code,\n        route_details,\n        duration,\n        frequency,\n        airplane_type,\n        price,\n        layover\n    ) VALUES\n    ('JFK', 'DL', 'LAX', 'Non-stop from JFK to LAX.', '06:00:00', 5, 'Boeing 777', 299.99, 'None'),\n    ('LAX', 'AA', 'ORD', 'Direct LAX to ORD route.', '04:00:00', 3, 'Airbus A320', 149.99, 'None'),\n    ('EFGH', 'UA', 'SEA', 'Daily flights from SFO to SEA.', '02:30:00', 7, 'Boeing 737', 129.99, 'None');\n    \"\"\"\n)\n\n# Add airport reviews to the vector table, using the airport code as metadata.\nvector_store.add_texts(\n    texts=[\n        \"Clean lounges and excellent vegetarian dining options. 
Highly recommended.\",\n        \"Comfortable seating in lounge areas and diverse food selections, including vegetarian.\",\n        \"Small airport with basic facilities.\",\n    ],\n    metadatas=[\n        {\"airport_code\": \"JFK\"},\n        {\"airport_code\": \"LAX\"},\n        {\"airport_code\": \"EFGH\"},\n    ],\n)\n```\n\nThe expected output is as follows:\n\n```plain\n[UUID('6dab390f-acd9-4c7d-b252-616606fbc89b'),\n UUID('9e811801-0e6b-4893-8886-60f4fb67ce69'),\n UUID('f426747c-0f7b-4c62-97ed-3eeb7c8dd76e')]\n```\n\n### Perform a semantic search\n\nThe following code searches for airports with clean facilities and vegetarian options:\n\n```python\nretriever = vector_store.as_retriever(\n    search_type=\"similarity_score_threshold\",\n    search_kwargs={\"k\": 3, \"score_threshold\": 0.85},\n)\nsemantic_query = \"Could you recommend a US airport with clean lounges and good vegetarian dining options?\"\nreviews = retriever.invoke(semantic_query)\nfor r in reviews:\n    print(\"-\" * 80)\n    print(r.page_content)\n    print(r.metadata)\n    print(\"-\" * 80)\n```\n\nThe expected output is as follows:\n\n```plain\n--------------------------------------------------------------------------------\nClean lounges and excellent vegetarian dining options. 
Highly recommended.\n{'airport_code': 'JFK'}\n--------------------------------------------------------------------------------\n--------------------------------------------------------------------------------\nComfortable seating in lounge areas and diverse food selections, including vegetarian.\n{'airport_code': 'LAX'}\n--------------------------------------------------------------------------------\n```\n\n### Retrieve detailed airport information\n\nExtract airport codes from the search results and query the database for detailed route information:\n\n```python\n# Extract airport codes from the metadata of the retrieved reviews.\nairport_codes = [review.metadata[\"airport_code\"] for review in reviews]\n\n# Query the route details for the matched airports.\nsearch_query = \"SELECT * FROM airplan_routes WHERE airport_code IN :codes\"\nparams = {\"codes\": tuple(airport_codes)}\n\nairport_details = vector_store.tidb_vector_client.execute(search_query, params)\nairport_details.get(\"result\")\n```\n\nThe expected output is as follows:\n\n```plain\n[(1, 'JFK', 'DL', 'LAX', 'Non-stop from JFK to LAX.', datetime.timedelta(seconds=21600), 5, 'Boeing 777', Decimal('299.99'), 'None'),\n (2, 'LAX', 'AA', 'ORD', 'Direct LAX to ORD route.', datetime.timedelta(seconds=14400), 3, 'Airbus A320', Decimal('149.99'), 'None')]\n```\n\n### Streamline the process\n\nAlternatively, you can streamline the entire process with a single SQL query that joins the route table with the vector table and sorts the results by cosine distance to the query embedding:\n\n```python\n# The vector table created in Step 5.\nTABLE_NAME = \"embedded_documents\"\n\nsearch_query = f\"\"\"\n    SELECT\n        VEC_COSINE_DISTANCE(se.embedding, :query_vector) AS distance,\n        ar.*,\n        se.document AS airport_review\n    FROM\n        airplan_routes ar\n    JOIN\n        {TABLE_NAME} se ON ar.airport_code = JSON_UNQUOTE(JSON_EXTRACT(se.meta, '$.airport_code'))\n    ORDER BY distance ASC\n    LIMIT 5;\n\"\"\"\nquery_vector = embeddings.embed_query(semantic_query)\nparams = {\"query_vector\": str(query_vector)}\nairport_details = vector_store.tidb_vector_client.execute(search_query, 
params)\nairport_details.get(\"result\")\n```\n\nThe expected output is as follows:\n\n```plain\n[(0.1219207353407008, 1, 'JFK', 'DL', 'LAX', 'Non-stop from JFK to LAX.', datetime.timedelta(seconds=21600), 5, 'Boeing 777', Decimal('299.99'), 'None', 'Clean lounges and excellent vegetarian dining options. Highly recommended.'),\n (0.14613754359804654, 2, 'LAX', 'AA', 'ORD', 'Direct LAX to ORD route.', datetime.timedelta(seconds=14400), 3, 'Airbus A320', Decimal('149.99'), 'None', 'Comfortable seating in lounge areas and diverse food selections, including vegetarian.'),\n (0.19840519342700513, 3, 'EFGH', 'UA', 'SEA', 'Daily flights from SFO to SEA.', datetime.timedelta(seconds=9000), 7, 'Boeing 737', Decimal('129.99'), 'None', 'Small airport with basic facilities.')]\n```\n\n### Clean up data\n\nFinally, clean up the resources by dropping the created table:\n\n```python\nvector_store.tidb_vector_client.execute(\"DROP TABLE airplan_routes\")\n```\n\nThe expected output is as follows:\n\n```plain\n{'success': True, 'result': 0, 'error': None}\n```\n\n"
  },
  {
    "path": "src/ai/integrations/llamaindex.md",
    "content": "\n# Integrate Vector Search with LlamaIndex\n\nThis tutorial demonstrates how to integrate the [vector search](https://docs.pingcap.com/tidbcloud/vector-search-overview) feature of TiDB with [LlamaIndex](https://www.llamaindex.ai).\n\n!!! note\n\n    The vector search feature is only available for TiDB Self-Managed clusters and [TiDB Cloud Starter](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-cloud-starter) clusters.\n\n!!! tip\n\n    You can view the complete [sample code](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/vector_stores/TiDBVector.ipynb) on Jupyter Notebook, or run the sample code directly in the [Colab](https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/vector_stores/TiDBVector.ipynb) online environment.\n\n## Prerequisites\n\nTo complete this tutorial, you need:\n\n- [Python 3.8 or higher](https://www.python.org/downloads/) installed.\n- [Jupyter Notebook](https://jupyter.org/install) installed.\n- [Git](https://git-scm.com/downloads) installed.\n- A TiDB cluster.\n\n## Get started\n\nThis section provides step-by-step instructions for integrating TiDB Vector Search with LlamaIndex to perform semantic searches.\n\n### Step 1. Create a new Jupyter Notebook file\n\nIn the root directory, create a new Jupyter Notebook file named `integrate_with_llamaindex.ipynb`:\n\n```shell\ntouch integrate_with_llamaindex.ipynb\n```\n\n### Step 2. 
Install required dependencies\n\nIn your project directory, run the following commands to install the required packages:\n\n```shell\npip install llama-index-vector-stores-tidbvector\npip install llama-index\n```\n\nOpen the `integrate_with_llamaindex.ipynb` file in Jupyter Notebook and add the following code to import the required packages:\n\n```python\nimport textwrap\n\nfrom llama_index.core import SimpleDirectoryReader, StorageContext\nfrom llama_index.core import VectorStoreIndex\nfrom llama_index.vector_stores.tidbvector import TiDBVectorStore\n```\n\n### Step 3. Configure environment variables\n\nConfigure the environment variables depending on the TiDB deployment option you've selected.\n\nFor a TiDB Cloud Starter cluster, take the following steps to obtain the cluster connection string and configure environment variables:\n\n1. Navigate to the [**Clusters**](https://tidbcloud.com/console/clusters) page, and then click the name of your target cluster to go to its overview page.\n\n2. Click **Connect** in the upper-right corner. A connection dialog is displayed.\n\n3. Ensure the configurations in the connection dialog match your operating environment.\n\n    - **Connection Type** is set to `Public`.\n    - **Branch** is set to `main`.\n    - **Connect With** is set to `SQLAlchemy`.\n    - **Operating System** matches your environment.\n\n4. Click the **PyMySQL** tab and copy the connection string.\n\n    > **Tip:**\n    >\n    > If you have not set a password yet, click **Generate Password** to generate a random password.\n\n5. Configure environment variables.\n\n    This document uses [OpenAI](https://platform.openai.com/docs/introduction) as the embedding model provider. In this step, you need to provide the connection string obtained from the previous step and your [OpenAI API key](https://platform.openai.com/docs/quickstart/step-2-set-up-your-api-key).\n\n    To configure the environment variables, run the following code. 
The code prompts you to enter your connection string and OpenAI API key:\n\n    ```python\n    # Use getpass to securely prompt for environment variables in your terminal.\n    import getpass\n    import os\n\n    # Copy your connection string from the TiDB Cloud console.\n    # Connection string format: \"mysql+pymysql://<USER>:<PASSWORD>@<HOST>:4000/<DB>?ssl_ca=/etc/ssl/cert.pem&ssl_verify_cert=true&ssl_verify_identity=true\"\n    tidb_connection_string = getpass.getpass(\"TiDB Connection String:\")\n    os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")\n    ```\n\nFor a TiDB Self-Managed cluster, you need to provide the connection string of your TiDB cluster and your [OpenAI API key](https://platform.openai.com/docs/quickstart/step-2-set-up-your-api-key). This document uses [OpenAI](https://platform.openai.com/docs/introduction) as the embedding model provider.\n\nTo configure the environment variables, run the following code. The code prompts you to enter your connection string and OpenAI API key:\n\n```python\n# Use getpass to securely prompt for environment variables in your terminal.\nimport getpass\nimport os\n\n# Connection string format: \"mysql+pymysql://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE_NAME>\"\ntidb_connection_string = getpass.getpass(\"TiDB Connection String:\")\nos.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")\n```\n\nThe connection string uses the following format:\n\n```dotenv\nTIDB_DATABASE_URL=\"mysql+pymysql://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE_NAME>\"\n# For example: TIDB_DATABASE_URL=\"mysql+pymysql://root@127.0.0.1:4000/test\"\n```\n\nModify the parameters in the connection string according to your TiDB cluster. If you are running TiDB on your local machine, `<HOST>` is `127.0.0.1` by default. 
The initial `<PASSWORD>` is empty, so if you are starting the cluster for the first time, you can omit this field.\n\nThe parameters are described as follows:\n\n- `<USERNAME>`: The username to connect to the TiDB cluster.\n- `<PASSWORD>`: The password to connect to the TiDB cluster.\n- `<HOST>`: The host of the TiDB cluster.\n- `<PORT>`: The port of the TiDB cluster.\n- `<DATABASE_NAME>`: The name of the database you want to connect to.\n\n### Step 4. Load the sample document\n\n#### Step 4.1 Download the sample document\n\nIn your project directory, create a directory named `data/paul_graham/` and download the sample document [`paul_graham_essay.txt`](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt) from the [run-llama/llama_index](https://github.com/run-llama/llama_index) GitHub repository.\n\n```shell\n!mkdir -p 'data/paul_graham/'\n!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'\n```\n\n#### Step 4.2 Load the document\n\nLoad the sample document from `data/paul_graham/paul_graham_essay.txt` using the `SimpleDirectoryReader` class, and then add a `book` metadata field to each document so that you can filter on it in Step 7.\n\n```python\ndocuments = SimpleDirectoryReader(\"./data/paul_graham\").load_data()\nprint(\"Document ID:\", documents[0].doc_id)\n\n# Tag each document with metadata used for filtering later.\nfor document in documents:\n    document.metadata = {\"book\": \"paul_graham\"}\n```\n\n### Step 5. 
Embed and store document vectors\n\n#### Step 5.1 Initialize the TiDB vector store\n\nThe following code creates a table named `paul_graham_test` in TiDB, which is optimized for vector search.\n\n```python\ntidbvec = TiDBVectorStore(\n    connection_string=tidb_connection_string,\n    table_name=\"paul_graham_test\",\n    distance_strategy=\"cosine\",\n    vector_dimension=1536,  # Must match the output dimension of the embedding model.\n    drop_existing_table=False,\n)\n```\n\nAfter the code runs successfully, you can view and access the `paul_graham_test` table in your TiDB database.\n\n#### Step 5.2 Generate and store embeddings\n\nThe following code parses the documents, generates embeddings, and stores them in the TiDB vector store.\n\n```python\nstorage_context = StorageContext.from_defaults(vector_store=tidbvec)\nindex = VectorStoreIndex.from_documents(\n    documents, storage_context=storage_context, show_progress=True\n)\n```\n\nThe expected output is as follows:\n\n```plain\nParsing nodes: 100%|██████████| 1/1 [00:00<00:00,  8.76it/s]\nGenerating embeddings: 100%|██████████| 21/21 [00:02<00:00,  8.22it/s]\n```\n\n### Step 6. Perform a vector search\n\nThe following code creates a query engine based on the TiDB vector store and performs a semantic similarity search.\n\n```python\nquery_engine = index.as_query_engine()\nresponse = query_engine.query(\"What did the author do?\")\nprint(textwrap.fill(str(response), 100))\n```\n\n> **Note:**\n>\n> `TiDBVectorStore` only supports the [`default`](https://docs.llamaindex.ai/en/stable/api_reference/storage/vector_store/?h=vectorstorequerymode#llama_index.core.vector_stores.types.VectorStoreQueryMode) query mode.\n\nThe expected output is as follows:\n\n```plain\nThe author worked on writing, programming, building microcomputers, giving talks at conferences,\npublishing essays online, developing spam filters, painting, hosting dinner parties, and purchasing\na building for office use.\n```\n\n### Step 7. 
Search with metadata filters\n\nTo refine your searches, you can use metadata filters to retrieve specific nearest-neighbor results that match the applied filters.\n\n#### Query with `book != \"paul_graham\"` filter\n\nThe following example excludes results where the `book` metadata field is `\"paul_graham\"`. Because Step 4.2 sets this field to `\"paul_graham\"` for every document, no document matches the filter:\n\n```python\nfrom llama_index.core.vector_stores.types import (\n    MetadataFilter,\n    MetadataFilters,\n)\n\nquery_engine = index.as_query_engine(\n    filters=MetadataFilters(\n        filters=[\n            MetadataFilter(key=\"book\", value=\"paul_graham\", operator=\"!=\"),\n        ]\n    ),\n    similarity_top_k=2,\n)\nresponse = query_engine.query(\"What did the author learn?\")\nprint(textwrap.fill(str(response), 100))\n```\n\nThe expected output is as follows:\n\n```plain\nEmpty Response\n```\n\n#### Query with `book == \"paul_graham\"` filter\n\nThe following example filters results to include only documents where the `book` metadata field is `\"paul_graham\"`:\n\n```python\nfrom llama_index.core.vector_stores.types import (\n    MetadataFilter,\n    MetadataFilters,\n)\n\nquery_engine = index.as_query_engine(\n    filters=MetadataFilters(\n        filters=[\n            MetadataFilter(key=\"book\", value=\"paul_graham\", operator=\"==\"),\n        ]\n    ),\n    similarity_top_k=2,\n)\nresponse = query_engine.query(\"What did the author learn?\")\nprint(textwrap.fill(str(response), 100))\n```\n\nThe expected output is as follows:\n\n```plain\nThe author learned programming on an IBM 1401 using an early version of Fortran in 9th grade, then\nlater transitioned to working with microcomputers like the TRS-80 and Apple II. Additionally, the\nauthor studied philosophy in college but found it unfulfilling, leading to a switch to studying AI.\nLater on, the author attended art school in both the US and Italy, where they observed a lack of\nsubstantial teaching in the painting department.\n```\n\n### Step 8. 
Delete documents\n\nDelete the first document from the index:\n\n```python\ntidbvec.delete(documents[0].doc_id)\n```\n\nCheck whether the document has been deleted:\n\n```python\nquery_engine = index.as_query_engine()\nresponse = query_engine.query(\"What did the author learn?\")\nprint(textwrap.fill(str(response), 100))\n```\n\nThe expected output is as follows:\n\n```plain\nEmpty Response\n```\n\n"
  },
  {
    "path": "src/ai/integrations/tidb-mcp-claude-code.md",
    "content": "---\ntitle: Get started with Claude Code and TiDB MCP Server\ndescription: This guide shows you how to configure the TiDB MCP Server in Claude Code.\n---\n\n# Get started with Claude Code and TiDB MCP Server\n\nThis guide shows you how to configure the TiDB MCP Server in Claude Code.\n\n## Prerequisites\n\nBefore you begin, ensure you have the following:\n\n- **Claude Code**: Install it from [claude.com](https://claude.com/product/claude-code).\n- **Python (>=3.10) and uv**: Ensure Python (version 3.10 or later) and uv are installed. Follow the [installation guide](https://docs.astral.sh/uv/getting-started/installation/) to install uv.\n- **A TiDB cluster**: For a managed option, create a TiDB Cloud Starter cluster at [tidbcloud.com](https://tidbcloud.com/free-trial).\n\n## Connect to TiDB Cloud Starter (recommended)\n\nUse the TiDB Cloud console to generate a ready-to-run Claude Code command.\n\n1. Go to the [Clusters](https://tidbcloud.com/console/clusters) page, select your cluster, and then click **Use with AI Tools** in the upper-right corner.\n2. In the **Access `your_cluster_name` with AI tools** dialog, select the **Branch** and **Database** that Claude Code should access.\n3. Review the **Prerequisites** list in the dialog and install any missing dependencies.\n4. Configure the root password:\n\n   - If you have not set a password yet, click **Generate Password** and store it in a secure location (it is shown only once).\n   - If a password already exists, enter it in the **Enter the password for easy setup** field.\n   - If you forget the password, click **Reset password** in the **Prerequisites** section to generate a new one.\n\n5. 
Select the **Claude Code** tab, copy the setup command, and run it in your terminal.\n\n## Manual configuration (any TiDB cluster)\n\nIf you prefer manual setup, use one of the following methods and replace the placeholders with your connection parameters.\n\n### Method 1: CLI command\n\n```bash\nclaude mcp add --transport stdio TiDB \\\n  --env TIDB_HOST='<YOUR_TIDB_HOST>' \\\n  --env TIDB_PORT=<YOUR_TIDB_PORT> \\\n  --env TIDB_USERNAME='<YOUR_TIDB_USERNAME>' \\\n  --env TIDB_PASSWORD='<YOUR_TIDB_PASSWORD>' \\\n  --env TIDB_DATABASE='<YOUR_TIDB_DATABASE>' \\\n  -- uvx --from 'pytidb[mcp]' 'tidb-mcp-server'\n```\n\n### Method 2: Project config file\n\nAdd the following configuration to your project-level `.mcp.json` file. For details, see the [Claude Code MCP documentation](https://code.claude.com/docs/en/mcp#project-scope).\n\n```json\n{\n  \"mcpServers\": {\n    \"TiDB\": {\n      \"type\": \"stdio\",\n      \"command\": \"uvx\",\n      \"args\": [\"--from\", \"pytidb[mcp]\", \"tidb-mcp-server\"],\n      \"env\": {\n        \"TIDB_HOST\": \"<YOUR_TIDB_HOST>\",\n        \"TIDB_PORT\": \"<YOUR_TIDB_PORT>\",\n        \"TIDB_USERNAME\": \"<YOUR_TIDB_USERNAME>\",\n        \"TIDB_PASSWORD\": \"<YOUR_TIDB_PASSWORD>\",\n        \"TIDB_DATABASE\": \"<YOUR_TIDB_DATABASE>\"\n      }\n    }\n  }\n}\n```\n\n## See also\n\n- [TiDB MCP Server](./tidb-mcp-server.md)\n"
  },
  {
    "path": "src/ai/integrations/tidb-mcp-claude-desktop.md",
    "content": "---\ntitle: Get started with Claude Desktop and TiDB MCP Server\ndescription: This guide shows you how to configure the TiDB MCP Server in Claude Desktop.\n---\n\n# Get started with Claude Desktop and TiDB MCP Server\n\nThis guide shows you how to configure the TiDB MCP Server in Claude Desktop.\n\n## Prerequisites\n\nBefore you begin, ensure you have the following:\n\n- **Claude Desktop**: Download and install Claude Desktop from [claude.ai](https://claude.ai/download).\n- **Python (>=3.10) and uv**: Ensure Python (version 3.10 or later) and uv are installed. Follow the [installation guide](https://docs.astral.sh/uv/getting-started/installation/) to install uv.\n- **A TiDB Cloud Starter cluster**: You can create one for free at [tidbcloud.com](https://tidbcloud.com/free-trial).\n\n## Setup steps\n\nTo set up the TiDB MCP Server in Claude Desktop, take the following steps:\n\n1. Open the **Settings** dialog.\n2. Click the **Developer** tab in the dialog.\n3. Click **Edit Config** to open the MCP configuration file `claude_desktop_config.json`.\n4. Add the following configuration to the `claude_desktop_config.json` file:\n\n    ```json\n    {\n      \"mcpServers\": {\n        \"TiDB\": {\n          \"command\": \"uvx\",\n          \"args\": [\"--from\", \"pytidb[mcp]\", \"tidb-mcp-server\"],\n          \"env\": {\n            \"TIDB_HOST\": \"localhost\",\n            \"TIDB_PORT\": \"4000\",\n            \"TIDB_USERNAME\": \"root\",\n            \"TIDB_PASSWORD\": \"\",\n            \"TIDB_DATABASE\": \"test\"\n          }\n        }\n      }\n    }\n    ```\n\n5. Go to the [TiDB Cloud cluster page](https://tidbcloud.com/console/clusters) and navigate to the cluster you want to connect to.\n6. Click **Connect** to get the connection parameters, and then replace the `TIDB_HOST`, `TIDB_PORT`, `TIDB_USERNAME`, `TIDB_PASSWORD`, and `TIDB_DATABASE` values with your own.\n7. 
Restart Claude Desktop.\n\nFor more details, see [how to configure the MCP server in Claude Desktop](https://modelcontextprotocol.io/quickstart/user) in the MCP quickstart guide."
  },
  {
    "path": "src/ai/integrations/tidb-mcp-cursor.md",
    "content": "---\ntitle: Get started with Cursor and TiDB MCP Server\ndescription: This guide shows you how to configure the TiDB MCP Server in the Cursor editor.\n---\n\n# Get started with Cursor and TiDB MCP Server\n\nThis guide shows you how to configure the TiDB MCP Server in the Cursor editor.\n\nFor one-click installation, click the following button:\n\n[![Install TiDB MCP Server](https://cursor.com/deeplink/mcp-install-dark.svg)](cursor://anysphere.cursor-deeplink/mcp/install?name=TiDB&config=eyJjb21tYW5kIjoidXZ4IC0tZnJvbSBweXRpZGJbbWNwXSB0aWRiLW1jcC1zZXJ2ZXIiLCJlbnYiOnsiVElEQl9IT1NUIjoibG9jYWxob3N0IiwiVElEQl9QT1JUIjoiNDAwMCIsIlRJREJfVVNFUk5BTUUiOiJyb290IiwiVElEQl9QQVNTV09SRCI6IiIsIlRJREJfREFUQUJBU0UiOiJ0ZXN0In19)\n\n## Prerequisites\n\nBefore you begin, ensure you have the following:\n\n- **Cursor Editor**: Download and install Cursor from [cursor.com](https://cursor.com).\n- **Python (>=3.10) and uv**: Ensure Python (version 3.10 or later) and uv are installed. Follow the [installation guide](https://docs.astral.sh/uv/getting-started/installation/) to install uv.\n- **A TiDB cluster**: For a managed option, create a TiDB Cloud Starter cluster at [tidbcloud.com](https://tidbcloud.com/free-trial).\n\n## Connect to TiDB Cloud Starter (recommended)\n\nUse the TiDB Cloud console to create a Cursor configuration with your cluster credentials.\n\n1. Go to the [Clusters](https://tidbcloud.com/console/clusters) page, select your cluster, and then click **Use with AI Tools** in the upper-right corner.\n2. In the **Access `your_cluster_name` with AI tools** dialog, select the **Branch** and **Database** that Cursor should access.\n3. Review the **Prerequisites** list in the dialog and install any missing dependencies.\n4. Configure the root password:\n\n   - If you have not set a password yet, click **Generate Password** and store it in a secure location (it is shown only once).\n   - If a password already exists, enter it in the **Enter the password for easy setup** field.\n   - If you forget the password, click **Reset password** in the **Prerequisites** section to generate a new one.\n\n5. Select the **Cursor** tab, click **Add to Cursor**, and then click **Install** in Cursor.\n\n## Manual configuration (any TiDB cluster)\n\nIf you prefer manual setup, add the following configuration to your `.cursor/mcp.json` file and replace the placeholders with your connection parameters:\n\n```json\n{\n  \"mcpServers\": {\n    \"TiDB\": {\n      \"command\": \"uvx\",\n      \"args\": [\"--from\", \"pytidb[mcp]\", \"tidb-mcp-server\"],\n      \"env\": {\n        \"TIDB_HOST\": \"<YOUR_TIDB_HOST>\",\n        \"TIDB_PORT\": \"<YOUR_TIDB_PORT>\",\n        \"TIDB_USERNAME\": \"<YOUR_TIDB_USERNAME>\",\n        \"TIDB_PASSWORD\": \"<YOUR_TIDB_PASSWORD>\",\n        \"TIDB_DATABASE\": \"<YOUR_TIDB_DATABASE>\"\n      }\n    }\n  }\n}\n```\n\nFor more details, see the [Model Context Protocol documentation](https://docs.cursor.com/context/model-context-protocol#configuring-mcp-servers).\n\n## Troubleshooting\n\nIf you encounter issues installing the TiDB MCP Server, check the MCP logs in Cursor:\n\n1. In the main menu at the top of the editor, click **View** > **Output**.\n2. In the **Output** panel, select **MCP** from the drop-down list.\n3. If you see an error such as `[error] Could not start MCP server tidb-mcp-server: Error: spawn uvx ENOENT`, the `uvx` command is likely not in your `$PATH`. On macOS, you can install uv, which provides `uvx`, by running `brew install uv`.\n"
  },
  {
    "path": "src/ai/integrations/tidb-mcp-server.md",
    "content": "---\ntitle: TiDB MCP Server\ndescription: Manage your TiDB databases using natural language instructions with the TiDB MCP Server.\n---\n\n# TiDB MCP Server\n\nThe TiDB MCP Server is an open-source tool that enables you to interact with TiDB databases using natural language instructions.\n\n## Understanding MCP and TiDB MCP Server\n\nThe [Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction) is a protocol that standardizes communication between LLMs and external tools.\n\nMCP adopts a client-server architecture, allowing a host application to connect to multiple external servers:\n\n- **Hosts**: AI-powered applications, such as Claude Desktop or IDEs like Cursor, that initiate connections to MCP servers.\n\n- **Clients**: Components embedded within host applications that establish one-to-one connections with individual MCP servers.\n\n- **Servers**: External services, such as the **TiDB MCP Server**, that provide tools, context, and prompts to clients for interacting with external systems.\n\nThe **TiDB MCP Server** is an MCP-compatible server that provides tools and context to MCP clients for interacting with TiDB databases.\n\n## Prerequisites\n\nBefore you begin, ensure you have the following:\n\n- **An MCP-compatible client**: For example, [Cursor](./tidb-mcp-cursor.md) or [Claude Desktop](./tidb-mcp-claude-desktop.md).\n- **Python (>=3.10) and uv**: Ensure Python (version 3.10 or later) and uv are installed. Follow the [installation guide](https://docs.astral.sh/uv/getting-started/installation/) to install uv.\n- **A TiDB Cloud Starter cluster**: You can create a free TiDB Cloud Starter cluster at [tidbcloud.com](https://tidbcloud.com/free-trial).\n\n## Supported MCP clients\n\nSee the following guides for detailed examples of using the TiDB MCP Server with specific MCP clients:\n\n- [Cursor](./tidb-mcp-cursor.md)\n- [Claude Desktop](./tidb-mcp-claude-desktop.md)\n- [Visual Studio Code](./tidb-mcp-vscode.md)\n- [Windsurf](./tidb-mcp-windsurf.md)\n\nIf your MCP client is not listed, follow the setup steps in the next section.\n\n## Setup steps\n\nThe TiDB MCP Server supports two transport modes for integrating with MCP clients:\n\n- Standard Input/Output (STDIO) mode (default)\n- Server-Sent Events (SSE) mode\n\nIn STDIO mode, the MCP client launches the TiDB MCP Server as a subprocess, so you do not need to start a standalone server in advance. In SSE mode, you start the server yourself and then point your MCP client to its URL.\n\nChoose one of the modes to set up the TiDB MCP Server in your MCP client.\n\n### STDIO mode\n\nTo set up the TiDB MCP Server in your MCP client using STDIO mode, take the following steps:\n\n1. See your MCP client’s documentation to learn how to configure an MCP server.\n\n2. Go to the [TiDB Cloud clusters](https://tidbcloud.com/console/clusters) page and navigate to your cluster.\n\n3. Click **Connect** on the cluster page to get the connection parameters.\n\n4. Configure the TiDB MCP Server with your connection parameters in the `mcpServers` section of your AI application’s configuration file.\n\n    Example MCP configuration file:\n\n    ```json\n    {\n      \"mcpServers\": {\n        \"TiDB\": {\n          \"command\": \"uvx\",\n          \"args\": [\"--from\", \"pytidb[mcp]\", \"tidb-mcp-server\"],\n          \"env\": {\n            \"TIDB_HOST\": \"localhost\",\n            \"TIDB_PORT\": \"4000\",\n            \"TIDB_USERNAME\": \"root\",\n            \"TIDB_PASSWORD\": \"\",\n            \"TIDB_DATABASE\": \"test\"\n          }\n        }\n      }\n    }\n    ```\n\n### Server-Sent Events (SSE) mode\n\nTo set up the TiDB MCP Server in your MCP client using SSE mode, take the following steps:\n\n1. See your MCP client’s documentation to learn how to configure an MCP server.\n\n2. Go to the [TiDB Cloud clusters](https://tidbcloud.com/console/clusters) page and navigate to your cluster.\n\n3. Click **Connect** on the cluster page to get the connection parameters.\n\n4. Create a `.env` file with your own connection parameters.\n\n    Example `.env` file:\n\n    ```bash\n    cat > .env <<EOF\n    TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com\n    TIDB_PORT=4000\n    TIDB_USERNAME={prefix}.root\n    TIDB_PASSWORD={password}\n    TIDB_DATABASE=test\n    EOF\n    ```\n\n5. Start the TiDB MCP Server with the `--transport sse` option:\n\n    ```bash\n    uvx --from \"pytidb[mcp]\" tidb-mcp-server --transport sse\n    ```\n\n6. Add the `TiDB` MCP server configuration to the `mcpServers` section of your AI application’s configuration file:\n\n    ```json\n    {\n      \"mcpServers\": {\n        \"TiDB\": {\n          \"url\": \"http://localhost:8000/sse\"\n        }\n      }\n    }\n    ```\n\n## Supported actions (tools)\n\nThe TiDB MCP Server provides the following actions (tools) to MCP clients. You can use these tools to interact with your TiDB databases using natural language instructions.\n\n**Database management**\n\n- `show_databases`: Shows all databases in the TiDB cluster.\n\n    - `username`: Database username (string, optional)\n    - `password`: Database password (string, optional)\n\n- `switch_database`: Switches to a specific database.\n\n    - `db_name`: Database name to switch to (string, required)\n    - `username`: Database username (string, optional)\n    - `password`: Database password (string, optional)\n\n- `show_tables`: Shows all tables in the current database.\n\n**SQL query and execution**\n\n- `db_query`: Executes read-only SQL queries.\n\n    - `sql_stmt`: SQL query statement (string, required)\n\n- `db_execute`: Executes data modification SQL statements.\n\n    - `sql_stmts`: A single SQL statement or an array of SQL statements (string or array, required)\n\n**User management**\n\n- `db_create_user`: Creates a new database user.\n\n    - `username`: Name for the new user (string, required)\n    - `password`: Password for the new user (string, required)\n\n- `db_remove_user`: Removes an existing database user.\n\n    - `username`: Name of the user to remove (string, required)\n"
  },
  {
    "path": "src/ai/integrations/tidb-mcp-vscode.md",
    "content": "---\ntitle: Get started with Visual Studio Code and TiDB MCP Server\ndescription: This guide shows you how to configure the TiDB MCP Server in Visual Studio Code.\n---\n\n# Get started with Visual Studio Code and TiDB MCP Server\n\nThis guide shows you how to configure the TiDB MCP Server in Visual Studio Code (VS Code).\n\n## Prerequisites\n\nBefore you begin, ensure you have the following:\n\n- **Visual Studio Code**: Download and install VS Code from [code.visualstudio.com](https://code.visualstudio.com).\n- **Python (>=3.10) and uv**: Ensure Python (version 3.10 or later) and uv are installed. Follow the [installation guide](https://docs.astral.sh/uv/getting-started/installation/) to install uv.\n- **A TiDB cluster**: For a managed option, create a TiDB Cloud Starter cluster at [tidbcloud.com](https://tidbcloud.com/free-trial).\n\n## Connect to TiDB Cloud Starter (recommended)\n\nUse the TiDB Cloud console to generate a VS Code configuration.\n\n1. Go to the [Clusters](https://tidbcloud.com/console/clusters) page, select your cluster, and then click **Use with AI Tools** in the upper-right corner.\n2. In the **Access `your_cluster_name` with AI tools** dialog, select the **Branch** and **Database** that VS Code should access.\n3. Review the **Prerequisites** list in the dialog and install any missing dependencies.\n4. Configure the root password:\n\n   - If you have not set a password yet, click **Generate Password** and store it in a secure location (it is shown only once).\n   - If a password already exists, enter it in the **Enter the password for easy setup** field.\n   - If you forget the password, click **Reset password** in the **Prerequisites** section to generate a new one.\n\n5. 
Select the **VS Code** tab, click **Add to VS Code**, and then click **Install** in VS Code.\n\n## Manual configuration (any TiDB cluster)\n\nIf you prefer manual setup, add the following configuration to your `.vscode/mcp.json` file and replace the placeholders with your connection parameters:\n\n```json\n{\n  \"servers\": {\n    \"TiDB\": {\n      \"type\": \"stdio\",\n      \"command\": \"uvx\",\n      \"args\": [\"--from\", \"pytidb[mcp]\", \"tidb-mcp-server\"],\n      \"env\": {\n        \"TIDB_HOST\": \"<YOUR_TIDB_HOST>\",\n        \"TIDB_PORT\": \"<YOUR_TIDB_PORT>\",\n        \"TIDB_USERNAME\": \"<YOUR_TIDB_USERNAME>\",\n        \"TIDB_PASSWORD\": \"<YOUR_TIDB_PASSWORD>\",\n        \"TIDB_DATABASE\": \"<YOUR_TIDB_DATABASE>\"\n      }\n    }\n  }\n}\n```\n\n## See also\n\n- [TiDB MCP Server](./tidb-mcp-server.md)\n"
  },
  {
    "path": "src/ai/integrations/tidb-mcp-windsurf.md",
    "content": "---\ntitle: Get started with Windsurf and TiDB MCP Server\ndescription: This guide shows you how to configure the TiDB MCP Server in Windsurf.\n---\n\n# Get started with Windsurf and TiDB MCP Server\n\nThis guide shows you how to configure the TiDB MCP Server in Windsurf.\n\n## Prerequisites\n\nBefore you begin, ensure you have the following:\n\n- **Windsurf**: Download and install Windsurf from [windsurf.com](https://windsurf.com).\n- **Python (>=3.10) and uv**: Ensure Python (version 3.10 or later) and uv are installed. Follow the [installation guide](https://docs.astral.sh/uv/getting-started/installation/) to install uv.\n- **A TiDB cluster**: For a managed option, create a TiDB Cloud Starter cluster at [tidbcloud.com](https://tidbcloud.com/free-trial).\n\n## Connect to TiDB Cloud Starter (recommended)\n\nUse the TiDB Cloud console to gather the connection details, then update Windsurf's MCP configuration.\n\n1. Go to the [Clusters](https://tidbcloud.com/console/clusters) page, select your cluster, and then click **Use with AI Tools** in the upper-right corner.\n2. In the **Access `your_cluster_name` with AI tools** dialog, select the **Branch** and **Database** that Windsurf should access.\n3. Review the **Prerequisites** list in the dialog and install any missing dependencies.\n4. Configure the root password:\n\n   - If you have not set a password yet, click **Generate Password** and store it in a secure location (it is shown only once).\n   - If a password already exists, enter it in the **Enter the password for easy setup** field.\n   - If you forget the password, click **Reset password** in the **Prerequisites** section to generate a new one.\n\n5. Select the **Windsurf** tab and copy the provided connection values.\n6. Update your `mcp_config.json` file using the copied values. 
For more information, see the [Windsurf MCP documentation](https://docs.windsurf.com/windsurf/cascade/mcp#adding-a-new-mcp-plugin).\n\n## Manual configuration (any TiDB cluster)\n\nIf you prefer manual setup, update your `mcp_config.json` file as follows and replace the placeholders with your connection parameters:\n\n```json\n{\n  \"mcpServers\": {\n    \"TiDB\": {\n      \"command\": \"uvx\",\n      \"args\": [\"--from\", \"pytidb[mcp]\", \"tidb-mcp-server\"],\n      \"env\": {\n        \"TIDB_HOST\": \"<YOUR_TIDB_HOST>\",\n        \"TIDB_PORT\": \"<YOUR_TIDB_PORT>\",\n        \"TIDB_USERNAME\": \"<YOUR_TIDB_USERNAME>\",\n        \"TIDB_PASSWORD\": \"<YOUR_TIDB_PASSWORD>\",\n        \"TIDB_DATABASE\": \"<YOUR_TIDB_DATABASE>\"\n      }\n    }\n  }\n}\n```\n\n## See also\n\n- [TiDB MCP Server](./tidb-mcp-server.md)\n"
  },
  {
    "path": "src/ai/javascripts/mathjax.js",
    "content": "window.MathJax = {\n    tex: {\n        inlineMath: [[\"\\\\(\", \"\\\\)\"]],\n        displayMath: [[\"\\\\[\", \"\\\\]\"]],\n        processEscapes: true,\n        processEnvironments: true\n    },\n    options: {\n        ignoreHtmlClass: \".*|\",\n        processHtmlClass: \"arithmatex\"\n    }\n};\n  \ndocument$.subscribe(() => { \n    MathJax.startup.output.clearCache()\n    MathJax.typesetClear()\n    MathJax.texReset()\n    MathJax.typesetPromise()\n})"
  },
  {
    "path": "src/ai/quickstart.md",
    "content": "---\ntitle: Quickstart\ndescription: Get started with TiDB using the Python SDK.\n---\n\n# Quickstart\n\nIn this guide, you learn how to get started with [vector search](./concepts/vector-search.md) in TiDB using the Python SDK. Follow along to build your first AI application with TiDB.\n\n## Prerequisites\n\n- A TiDB cluster. Go to [tidbcloud.com](https://tidbcloud.com/) to create a free TiDB Cloud Starter cluster, or use [`tiup playground`](https://docs.pingcap.com/tidb/stable/quick-start-with-tidb/#deploy-a-local-test-cluster) to deploy a TiDB Self-Managed cluster for local testing.\n\n## Installation\n\n[pytidb](https://github.com/pingcap/pytidb) is the official Python SDK for TiDB, designed to help developers build AI applications efficiently.\n\nTo install the Python SDK, run the following command:\n\n```bash\npip install pytidb\n```\n\nTo use the built-in embedding functions, install the SDK with the `models` extra instead:\n\n```bash\npip install \"pytidb[models]\"\n```\n\n## Connect to database\n\n=== \"TiDB Cloud Starter\"\n\n    You can get the connection parameters from the [TiDB Cloud console](https://tidbcloud.com/clusters):\n\n    1. Navigate to the [Clusters page](https://tidbcloud.com/clusters), and then click the name of your target cluster to go to its overview page.\n    2. Click **Connect** in the upper-right corner. A connection dialog displays the connection parameters.\n\n    For example, if the connection parameters are displayed as follows:\n\n    ```text\n    HOST:     gateway01.us-east-1.prod.shared.aws.tidbcloud.com\n    PORT:     4000\n    USERNAME: 4EfqPF23YKBxaQb.root\n    PASSWORD: abcd1234\n    DATABASE: test\n    CA:       /etc/ssl/cert.pem\n    ```\n\n    The corresponding Python code to connect to the cluster is as follows:\n\n    ```python\n    from pytidb import TiDBClient\n\n    client = TiDBClient.connect(\n        host=\"gateway01.us-east-1.prod.shared.aws.tidbcloud.com\",\n        port=4000,\n        username=\"4EfqPF23YKBxaQb.root\",\n        password=\"abcd1234\",\n        database=\"test\",\n    )\n    ```\n\n    > **Note:** The preceding example is for demonstration purposes only. Replace the parameters with your own values and keep them secure.\n\n=== \"TiDB Self-Managed\"\n\n    The following is a basic example of connecting to a TiDB Self-Managed cluster:\n\n    ```python\n    from pytidb import TiDBClient\n\n    client = TiDBClient.connect(\n        host=\"localhost\",\n        port=4000,\n        username=\"root\",\n        password=\"\",\n        database=\"test\",\n        ensure_db=True,\n    )\n    ```\n\n    > **Tip:** Modify the connection parameters according to your deployment.\n\nOnce connected, you can use the `client` object to manage tables, query data, and more.\n\n## Create an embedding function\n\nWhen working with [embedding models](./concepts/vector-search.md#embedding-model), you can use an embedding function to automatically vectorize your data at both insertion and query time. Embedding functions support popular embedding model providers such as OpenAI, Jina AI, Hugging Face, Sentence Transformers, and others.\n\n=== \"OpenAI\"\n\n    Go to the [OpenAI platform](https://platform.openai.com/api-keys) to create an API key.\n\n    ```python\n    from pytidb.embeddings import EmbeddingFunction\n\n    text_embed = EmbeddingFunction(\n        model_name=\"openai/text-embedding-3-small\",\n        api_key=\"<your-openai-api-key>\",\n    )\n    ```\n\n=== \"Jina AI\"\n\n    Go to [Jina AI](https://jina.ai/embeddings/) to create an API key.\n\n    ```python\n    from pytidb.embeddings import EmbeddingFunction\n\n    text_embed = EmbeddingFunction(\n        model_name=\"jina/jina-embeddings-v3\",\n        api_key=\"<your-jina-api-key>\",\n    )\n    ```\n\n## Create a table\n\nAs an example, create a table named `chunks` with the following columns:\n\n- `id` (int): the ID of the chunk.\n- `text` (text): the text content of the chunk.\n- `text_vec` (vector): the vector embeddings of the text.\n- `user_id` (int): the ID of the user who created the chunk.\n\n=== \"Python\"\n\n    ```python hl_lines=\"6\"\n    from pytidb.schema import TableModel, Field\n\n    class Chunk(TableModel):\n        id: int | None = Field(default=None, primary_key=True)\n        text: str = Field()\n        text_vec: list[float] = text_embed.VectorField(source_field=\"text\")\n        user_id: int = Field()\n\n    table = client.create_table(schema=Chunk, if_exists=\"overwrite\")\n    ```\n\nAfter you create the table, you can use the `table` object to insert data, search data, and more.\n\n## Insert data\n\nAdd some sample data to the table:\n\n```python\ntable.bulk_insert([\n    # 👇 The text is automatically embedded and stored in the `text_vec` field.\n    Chunk(text=\"PyTiDB is a Python library for developers to connect to TiDB.\", user_id=2),\n    Chunk(text=\"LlamaIndex is a framework for building AI applications.\", user_id=2),\n    Chunk(text=\"OpenAI is a company and platform that provides AI models service and tools.\", user_id=3),\n])\n```\n\n## Search for nearest neighbors\n\nTo search for the nearest neighbors of a given query, use the `table.search()` method, which performs a [vector search](./guides/vector-search.md) by default.\n\n```python\nresults = (\n    table.search(\n        # 👇 Pass the query text directly; it is embedded into a query vector automatically.\n        \"A library for my artificial intelligence software\"\n    )\n    .limit(3)\n    .to_list()\n)\n```\n\nIn this example, vector search compares the query vector with the stored vectors in the `text_vec` field of the `chunks` table and returns the top 3 most semantically relevant results based on similarity scores.\n\nA smaller `_distance` means the two vectors are more similar. In this output, `_score` equals `1 - _distance`.\n\n```python title=\"Expected output\"\n[\n    {\n        'id': 2,\n        'text': 'LlamaIndex is a framework for building AI applications.',\n        'text_vec': [...],\n        'user_id': 2,\n        '_distance': 0.5719928358786761,\n        '_score': 0.4280071641213239\n    },\n    {\n        'id': 3,\n        'text': 'OpenAI is a company and platform that provides AI models service and tools.',\n        'text_vec': [...],\n        'user_id': 3,\n        '_distance': 0.603133726213383,\n        '_score': 0.396866273786617\n    },\n    {\n        'id': 1,\n        'text': 'PyTiDB is a Python library for developers to connect to TiDB.',\n        'text_vec': [...],\n        'user_id': 2,\n        '_distance': 0.6202191842385758,\n        '_score': 0.3797808157614242\n    }\n]\n```\n\n## Delete data\n\nTo delete a specific row from the table, use the `table.delete()` method:\n\n```python\ntable.delete({\n    \"id\": 1\n})\n```\n\n## Drop table\n\nWhen you no longer need a table, drop it using the `client.drop_table()` method:\n\n```python\nclient.drop_table(\"chunks\")\n```\n\n## Next steps\n\n- Learn more about [vector search](./guides/vector-search.md), [full-text search](./guides/fulltext-search.md), and [hybrid search](./guides/hybrid-search.md) in TiDB.\n"
  },
  {
    "path": "src/robots.txt",
    "content": "User-agent: *\nAllow: /\n\nSitemap:  https://pingcap.github.io/sitemap.xml\n"
  },
  {
    "path": "src/styles/extra.css",
    "content": "/* Dark mode */\n[data-md-color-scheme=\"tidb-dark\"] {\n\n    --md-default-fg-color:               hsla(0, 0%, 95%, 1);\n    --md-default-fg-color--light:        hsla(0, 0%, 95%, 0.7);\n    --md-default-fg-color--lighter:      hsla(0, 0%, 95%, 0.3);\n    --md-default-fg-color--lightest:     hsla(0, 0%, 90%, 0.12);\n    --md-default-bg-color:               hsl(0, 0%, 10%);\n    --md-default-bg-color--light:        hsla(0, 0%, 10%, 0.54);\n    --md-default-bg-color--lighter:      hsla(0, 0%, 10%, 0.32);\n    --md-default-bg-color--lightest:     hsla(0, 0%, 10%, 0.07);\n  \n    /* Code color shades */\n    --md-code-fg-color:                  hsla(220, 13%, 85%, 1);\n    --md-code-bg-color:                  hsla(223, 13%, 16%, 1);\n  \n    /* Code highlighting color shades */\n    --md-code-hl-color:                  hsla(220, 100%, 67%, 1);\n    --md-code-hl-color--light:           hsla(220, 91%, 39%, 0.1);\n  \n    /* Code highlighting syntax color shades */\n    --md-code-hl-number-color:           hsla(14, 80%, 55%, 1);\n    --md-code-hl-special-color:          hsla(340, 80%, 60%, 1);\n    --md-code-hl-function-color:         hsla(200, 80%, 65%, 1);\n    --md-code-hl-constant-color:         hsla(250, 70%, 65%, 1);\n    --md-code-hl-keyword-color:          hsla(219, 80%, 65%, 1);\n    --md-code-hl-string-color:           hsla(150, 60%, 45%, 1);\n    --md-code-hl-name-color:             var(--md-code-fg-color);\n    --md-code-hl-operator-color:         var(--md-default-fg-color--light);\n    --md-code-hl-punctuation-color:      var(--md-default-fg-color--light);\n    --md-code-hl-comment-color:          var(--md-default-fg-color--light);\n    --md-code-hl-generic-color:          var(--md-default-fg-color--light);\n    --md-code-hl-variable-color:         var(--md-default-fg-color--light);\n  \n    /* Typeset color shades */\n    --md-typeset-color:                  var(--md-default-fg-color);\n  \n    /* Typeset `a` color shades */\n    
--md-typeset-a-color:                var(--md-accent-fg-color);\n  \n    /* Typeset `del` and `ins` color shades */\n    --md-typeset-del-color:              hsla(6, 90%, 60%, 0.15);\n    --md-typeset-ins-color:              hsla(150, 90%, 44%, 0.15);\n  \n    /* Typeset `kbd` color shades */\n    --md-typeset-kbd-color:              hsla(0, 0%, 98%, 1);\n    --md-typeset-kbd-accent-color:       hsla(0, 100%, 100%, 1);\n    --md-typeset-kbd-border-color:       hsla(0, 0%, 72%, 1);\n  \n    /* Typeset `mark` color shades */\n    --md-typeset-mark-color:             hsla(48, 100%, 70%, 0.5);\n  \n    /* Typeset `table` color shades */\n    --md-typeset-table-color:            hsla(0, 0%, 0%, 0.12);\n    --md-typeset-table-color--light:     hsla(0, 0%, 0%, 0.035);\n  \n    /* Admonition color shades */\n    --md-admonition-fg-color:            var(--md-default-fg-color);\n    --md-admonition-bg-color:            var(--md-default-bg-color);\n  \n    /* Warning color shades */\n    --md-warning-fg-color:              hsla(0, 0%, 0%, 0.87);\n    --md-warning-bg-color:              hsla(60, 100%, 80%, 1);\n  \n    /* Footer color shades */\n    --md-footer-fg-color:                hsla(0, 0%, 100%, 1);\n    --md-footer-fg-color--light:         hsla(0, 0%, 100%, 0.7);\n    --md-footer-fg-color--lighter:       hsla(0, 0%, 100%, 0.45);\n    --md-footer-bg-color:                hsla(0, 0%, 0%, 0.87);\n    --md-footer-bg-color--dark:          hsla(0, 0%, 0%, 0.32);\n  \n    /* Shadow depth 1 */\n    --md-shadow-z1:\n      0 0.25rem 0.625rem hsla(0, 0%, 0%, 0.05),\n      0 0 0.0625rem  hsla(0, 0%, 0%, 0.1);\n  \n    /* Shadow depth 2 */\n    --md-shadow-z2:\n      0 0.25rem 0.625rem hsla(0, 0%, 0%, 0.1),\n      0 0 0.0625rem  hsla(0, 0%, 0%, 0.25);\n  \n    /* Shadow depth 3 */\n    --md-shadow-z3:\n      0 0.25rem 0.625rem hsla(0, 0%, 0%, 0.2),\n      0 0 0.0625rem  hsla(0, 0%, 0%, 0.35);\n\n    /* Primary - Navbar */\n    --md-primary-fg-color:               hsla(0, 0%, 0%, 
1);\n    --md-primary-fg-color--light:        hsla(0, 0%, 0%, 0.9);\n    --md-primary-fg-color--dark:         hsla(0, 0%, 0%, 0.8);\n    --md-primary-bg-color:               hsla(0, 0%, 100%, 1);\n    --md-primary-bg-color--light:        hsla(0, 0%, 100%, 0.7);\n\n    --md-accent-fg-color:                #de243d;\n    --md-accent-fg-color--transparent:   hsla(0, 0%, 100%, 0.5);\n    --md-accent-bg-color:                hsla(0, 0%, 100%, 0.9);\n    --md-accent-bg-color--light:         hsla(0, 0%, 100%, 0.8);\n}\n\n/* Light mode */\n[data-md-color-scheme=\"tidb-light\"] {\n    --md-default-fg-color:               hsla(0, 0%, 10%, 1);\n    --md-default-fg-color--light:        hsla(0, 0%, 10%, 0.7);\n    --md-default-fg-color--lighter:      hsla(0, 0%, 10%, 0.3);\n    --md-default-fg-color--lightest:     hsla(0, 0%, 10%, 0.12);\n    --md-default-bg-color:               hsl(0, 0%, 100%);\n    --md-default-bg-color--light:        hsla(0, 0%, 100%, 0.54);\n    --md-default-bg-color--lighter:      hsla(0, 0%, 100%, 0.32);\n    --md-default-bg-color--lightest:     hsla(0, 0%, 100%, 0.07);\n\n    /* Code color shades */\n    --md-code-fg-color:                  #2d2d2d;\n    --md-code-bg-color:                  #f7f7f9;\n\n    /* Typeset color shades */\n    --md-typeset-color:                  var(--md-default-fg-color);\n\n    /* Typeset `a` color shades */\n    --md-typeset-a-color:                var(--md-accent-fg-color);\n\n    /* Typeset `del` and `ins` color shades */\n    --md-typeset-del-color:              hsla(6, 90%, 60%, 0.15);\n    --md-typeset-ins-color:              hsla(150, 90%, 44%, 0.15);\n\n    /* Typeset `kbd` color shades */\n    --md-typeset-kbd-color:              hsla(0, 0%, 10%, 1);\n    --md-typeset-kbd-accent-color:       hsla(0, 0%, 0%, 1);\n    --md-typeset-kbd-border-color:       hsla(0, 0%, 72%, 1);\n\n    /* Typeset `mark` color shades */\n    --md-typeset-mark-color:             hsla(48, 100%, 70%, 0.5);\n\n    /* Typeset `table` color 
shades */\n    --md-typeset-table-color:            hsla(0, 0%, 0%, 0.12);\n    --md-typeset-table-color--light:     hsla(0, 0%, 0%, 0.035);\n\n    /* Admonition color shades */\n    --md-admonition-fg-color:            var(--md-default-fg-color);\n    --md-admonition-bg-color:            var(--md-default-bg-color);\n\n    /* Warning color shades */\n    --md-warning-fg-color:              hsla(0, 0%, 0%, 0.87);\n    --md-warning-bg-color:              hsla(60, 100%, 80%, 1);\n\n    /* Footer color shades */\n    --md-footer-fg-color:                hsla(0, 0%, 100%, 1);\n    --md-footer-fg-color--light:         hsla(0, 0%, 100%, 0.7);\n    --md-footer-fg-color--lighter:       hsla(0, 0%, 100%, 0.45);\n    --md-footer-bg-color:                hsla(0, 0%, 0%, 0.87);\n    --md-footer-bg-color--dark:          hsla(0, 0%, 0%, 0.32);\n\n    /* Shadow depth 1 */\n    --md-shadow-z1:\n      0 0.25rem 0.625rem hsla(0, 0%, 0%, 0.05),\n      0 0 0.0625rem  hsla(0, 0%, 0%, 0.1);\n\n    /* Shadow depth 2 */\n    --md-shadow-z2:\n      0 0.25rem 0.625rem hsla(0, 0%, 0%, 0.1),\n      0 0 0.0625rem  hsla(0, 0%, 0%, 0.25);\n\n    /* Shadow depth 3 */\n    --md-shadow-z3:\n      0 0.25rem 0.625rem hsla(0, 0%, 0%, 0.2),\n      0 0 0.0625rem  hsla(0, 0%, 0%, 0.35);\n\n    /* Primary - Navbar */\n    --md-primary-fg-color:               hsla(0, 0%, 0%, 1);\n    --md-primary-fg-color--light:        hsla(0, 0%, 0%, 0.9);\n    --md-primary-fg-color--dark:         hsla(0, 0%, 0%, 0.8);\n    --md-primary-bg-color:               hsla(0, 0%, 100%, 1);\n    --md-primary-bg-color--light:        hsla(0, 0%, 100%, 0.7);\n\n    --md-accent-fg-color:                #de243d;\n    --md-accent-fg-color--transparent:   hsla(0, 0%, 0%, 0.5);\n    --md-accent-bg-color:                hsla(0, 0%, 0%, 0.9);\n    --md-accent-bg-color--light:         hsla(0, 0%, 0%, 0.8);\n}\n\n.md-header__title {\n    margin-left: 0 !important;\n}\n"
  },
  {
    "path": "src/templates/demo_gallery_template.j2",
    "content": "---\ntitle: {{ config.title }}\ndescription: {{ config.meta.description }}\nhide:\n  - navigation\n  - toc\n  - pageTitle\n  - editButton \n---\n\n<style>\n\n/* CSS Variables */\n:root {\n  --brand-color: #de243d;\n  --brand-hover: #b71e34;\n  --border-radius-sm: 0.5rem;\n  --border-radius-md: 0.75rem;\n  --border-radius-lg: 1rem;\n  --spacing-sm: 1rem;\n  --spacing-md: 2rem;\n  --spacing-lg: 3rem;\n  --transition-fast: 0.2s;\n  --transition-normal: 0.3s;\n  --dark-overlay: rgba(255, 255, 255, 0.08);\n  --dark-border: rgba(255, 255, 255, 0.1);\n  --dark-bg-subtle: rgba(255, 255, 255, 0.05);\n}\n\n/* Smooth scrolling for the entire page */\nhtml {\n  scroll-behavior: smooth;\n}\n\n/* Gallery Container */\n.gallery-container {\n  max-width: 1280px;\n  margin: 0 auto;\n  padding: var(--spacing-md) var(--spacing-sm);\n}\n\n/* Header */\n.gallery-header {\n  text-align: center;\n  margin-bottom: var(--spacing-lg);\n}\n\n.gallery-title {\n  font-size: 72px !important;\n  font-weight: 800 !important;\n  margin-bottom: 8px !important;\n  line-height: 1 !important;\n  color: var(--md-default-fg-color) !important;\n}\n\n.gallery-description {\n  font-size: 22px !important;\n  color: var(--md-default-fg-color--light) !important;\n  padding: 0 120px;\n  margin-bottom: 5rem !important;\n}\n\n/* Gallery CTA link styles */\n.gallery-cta-link {\n    position: relative;\n    text-decoration: none;\n    transition: all 0.3s ease-in-out;\n}\n\n.gallery-cta-link:hover {\n    border-bottom: 3px solid var(--brand-color);\n}\n\n\n/* Layout */\n.gallery-layout {\n  display: flex;\n  gap: var(--spacing-sm) !important;\n}\n\n/* Sidebar */\n.gallery-sidebar {\n  width: 8rem;\n  flex-shrink: 0;\n}\n\n.sidebar-nav {\n  position: sticky;\n  top: 140px;\n}\n\n.sidebar-title {\n  font-size: 14px !important;\n  font-weight: 400 !important;\n  color: var(--md-default-fg-color--light) !important;\n  text-transform: uppercase;\n  letter-spacing: 0.05em;\n  margin: 0 0 0.5rem 0 
!important;\n}\n\n.sidebar-links {\n  display: flex;\n  flex-direction: column;\n  gap: 2px;\n  margin-left: -12px;\n}\n\n.sidebar-link {\n  display: block;\n  padding: 8px 12px;\n  border-radius: var(--border-radius-sm);\n  font-size: 14px;\n  font-weight: 400 !important;\n  color: var(--md-default-fg-color--light) !important;\n  text-decoration: none !important;\n  transition: all var(--transition-fast) ease;\n  text-align: left;\n  cursor: pointer;\n}\n\n.sidebar-link:hover {\n  background-color: var(--md-default-fg-color--lightest) !important;\n  color: var(--md-default-fg-color) !important;\n  font-weight: 500 !important;\n  transform: translateX(2px);\n}\n\n.sidebar-link:focus-visible {\n  outline: 2px solid var(--brand-color);\n  outline-offset: 2px;\n}\n\n/* Content */\n.gallery-content {\n  flex: 1;\n  padding: 0 var(--spacing-lg);\n}\n\n.gallery-section {\n  margin-bottom: var(--spacing-lg);\n  scroll-margin-top: 120px;\n}\n\n.section-title {\n  font-size: 24px !important;\n  font-weight: 700 !important;\n  color: var(--md-default-fg-color) !important;\n  margin: 0 0 1.5rem 0 !important;\n}\n\n.cards-grid {\n  display: grid;\n  grid-template-columns: repeat(3, 1fr);\n  gap: var(--spacing-sm) !important;\n}\n\n/* Cards */\n.gallery-card {\n  display: block;\n  background: var(--md-default-bg-color);\n  border: 1px solid var(--md-default-fg-color--lightest);\n  border-radius: var(--border-radius-md);\n  overflow: hidden;\n  transition: all var(--transition-normal) ease;\n  text-decoration: none !important;\n  color: inherit;\n}\n\n.gallery-card:hover {\n  transform: translateY(-2px);\n  box-shadow: var(--md-shadow-z2);\n  text-decoration: none !important;\n  outline: 2px solid var(--brand-color);\n  outline-offset: 2px;\n}\n\n\n.gallery-card:hover .card-title {\n  color: var(--brand-color) !important;\n}\n\n.card-image {\n  height: 8rem;\n  position: relative;\n  overflow: hidden;\n  background-color: var(--md-default-fg-color--lightest);\n  border-bottom: 
1px solid var(--md-default-fg-color--lightest);\n}\n\n.card-image img {\n  width: 100%;\n  height: 100%;\n  object-fit: cover;\n}\n\n.card-gradient {\n  display: flex;\n  align-items: center;\n  justify-content: center;\n  font-size: 2.5rem;\n}\n\n.card-badge {\n  position: absolute;\n  top: 0.5rem;\n  left: 0.5rem;\n  background: rgba(0, 0, 0, 0.7);\n  color: white;\n  padding: 0.2rem 0.4rem;\n  border-radius: 0.2rem;\n  font-size: 0.5rem !important;\n  font-weight: 500 !important;\n}\n\n.card-content {\n  padding: 12px;\n}\n\n.card-title {\n  font-size: 0.75rem !important;\n  font-weight: 600 !important;\n  line-height: 1.5 !important;\n  margin: 0 !important;\n  color: var(--md-default-fg-color) !important;\n}\n\n.card-description {\n  color: var(--md-default-fg-color--light) !important;\n  font-size: 0.65rem !important;\n  line-height: 1.5;\n  display: -webkit-box;\n  margin: 0;\n  -webkit-line-clamp: 3;\n  -webkit-box-orient: vertical;\n  overflow: hidden;\n}\n\n/* CTA */\n.gallery-cta {\n  background: linear-gradient(135deg, rgba(222, 36, 61, 0.08) 0%, rgba(99, 102, 241, 0.08) 100%);\n  border: 1px solid rgba(222, 36, 61, 0.1);\n  border-radius: var(--border-radius-lg);\n  padding: var(--spacing-md);\n  text-align: center;\n  margin-top: var(--spacing-lg);\n}\n\n.cta-title {\n  font-size: 1.5rem;\n  font-weight: 700;\n  color: var(--md-default-fg-color);\n  margin-bottom: var(--spacing-sm);\n}\n\n.cta-description {\n  color: var(--md-default-fg-color--light);\n  margin: 0 auto var(--spacing-md);\n  max-width: 42rem;\n}\n\n.cta-buttons {\n  display: flex;\n  justify-content: center;\n  gap: var(--spacing-sm);\n  flex-wrap: wrap;\n}\n\n/* Button shared styles */\n.btn-primary,\n.btn-secondary {\n  padding: 0.75rem 1.5rem;\n  border-radius: var(--border-radius-sm);\n  font-weight: 500;\n  text-decoration: none !important;\n  transition: all var(--transition-fast);\n}\n\n.btn-primary {\n  background-color: var(--brand-color);\n  color: white 
!important;\n}\n\n.btn-primary:hover {\n  background-color: var(--brand-hover);\n  color: white !important;\n}\n\n.btn-secondary {\n  border: 1px solid var(--md-default-fg-color--lighter);\n  background-color: var(--md-default-bg-color);\n  color: var(--md-default-fg-color) !important;\n}\n\n.btn-secondary:hover {\n  background-color: var(--md-default-fg-color--lightest);\n  color: var(--md-default-fg-color) !important;\n}\n\n/* Dark mode styles */\n[data-md-color-scheme=\"tidb-dark\"] .sidebar-link:hover {\n  background-color: var(--dark-overlay) !important;\n}\n\n[data-md-color-scheme=\"tidb-dark\"] .gallery-card {\n  border-color: var(--dark-border);\n}\n\n[data-md-color-scheme=\"tidb-dark\"] .card-image {\n  background-color: var(--dark-bg-subtle);\n  border-bottom-color: var(--dark-border);\n}\n\n[data-md-color-scheme=\"tidb-dark\"] .gallery-cta {\n  background: linear-gradient(135deg, rgba(222, 36, 61, 0.12) 0%, rgba(99, 102, 241, 0.12) 100%);\n  border-color: rgba(222, 36, 61, 0.2);\n}\n\n[data-md-color-scheme=\"tidb-dark\"] .btn-secondary:hover {\n  background-color: rgba(255, 255, 255, 0.1);\n}\n\n/* Responsive adjustments */\n@media (max-width: 768px) {\n  .gallery-layout { flex-direction: column; }\n  .gallery-sidebar { width: 100%; }\n  .sidebar-nav { position: static; }\n  .sidebar-links { flex-direction: row; gap: 0.5rem; flex-wrap: wrap; }\n  .sidebar-link { padding: 12px 16px !important; min-height: 44px; display: flex; align-items: center; }\n  .gallery-content { padding: 0; }\n  .gallery-description { padding: 0 20px; }\n  .cards-grid { grid-template-columns: 1fr; }\n  .gallery-title { font-size: 48px !important; }\n}\n\n@media (max-width: 1024px) and (min-width: 769px) {\n  .cards-grid { grid-template-columns: repeat(2, 1fr); }\n}\n\n</style>\n\n\n\n<div class=\"gallery-container\">\n  <div class=\"gallery-header\">\n    <h1 class=\"gallery-title\">{{ config.title }}</h1>\n    <p class=\"gallery-description\">\n      {{ config.description | safe 
}}\n    </p>\n  </div>\n\n  <div class=\"gallery-layout\">\n    <div class=\"gallery-sidebar\">\n      <div class=\"sidebar-nav\">\n        <h6 class=\"sidebar-title\">Categories</h6>\n        <nav class=\"sidebar-links\">\n          {%- for category in categories %}\n          <a href=\"#{{ category.id }}\" class=\"sidebar-link\">{{ category.title | replace('⭐ ', '') | replace('🚀 ', '') | replace('🔍 ', '') | replace('🤖 ', '') }}</a>\n          {%- endfor %}\n        </nav>\n      </div>\n    </div>\n    <div class=\"gallery-content\">\n      {%- for category in categories %}\n      {%- if category.demos %}\n      {%- set category_demos = [] %}\n      {%- for demo_id in category.demos %}\n      {%- set demo = demos | selectattr('id', 'equalto', demo_id) | first %}\n      {%- if demo %}{% set _ = category_demos.append(demo) %}{% endif %}\n      {%- endfor %}\n      <section id=\"{{ category.id }}\" class=\"gallery-section\">\n        <h2 class=\"section-title\">{{ category.title }}</h2>\n        <div class=\"cards-grid\">\n          {%- for demo in category_demos %}\n          <a href=\"{{ demo.link }}\" class=\"gallery-card\">\n            <div class=\"card-image{% if demo.background or demo.icon %} card-gradient{% endif %}\"{% if demo.background %} style=\"background: {{ demo.background }};\"{% endif %}>\n              {%- if demo.cover_image %}\n              <img src=\"{{ demo.cover_image }}\" alt=\"{{ demo.title }} Demo\">\n              {%- elif demo.icon %}\n              <div>{{ demo.icon }}</div>\n              {%- endif %}\n            </div>\n            <div class=\"card-content\">\n              <h3 class=\"card-title\">{{ demo.title }}</h3>\n              <p class=\"card-description\">\n                {{ demo.description }}\n              </p>\n            </div>\n          </a>\n          {%- endfor %}\n        </div>\n      </section>\n      {%- endif %}\n      {%- endfor %}\n      <div class=\"gallery-cta\">\n        <h3 class=\"cta-title\">{{ config.cta.title }}</h3>\n        <p class=\"cta-description\">\n          {{ config.cta.description }}\n        </p>\n        <div class=\"cta-buttons\">\n          {%- for button in config.cta.buttons %}\n          <a href=\"{{ button.url }}\"{% if button.external %} target=\"_blank\" rel=\"noopener noreferrer\"{% endif %} class=\"btn-{{ button.type }}\">{{ button.text }}</a>\n          {%- endfor %}\n        </div>\n      </div>\n    </div>\n  </div>\n</div>\n"
  },
  {
    "path": "src/templates/demo_page_template.j2",
    "content": "---\ntitle: {{ demo.title }}\ndescription: \"{{ demo.description }}\"\nsource_repo: \"{{ demo.doc_link | replace('/README.md', '') }}\"\n---\n\n{{ content | safe }}\n\n---\n\n## Related resources\n\n- **Source code**: [View on GitHub]({{ demo.doc_link | replace('/README.md', '') }})\n- **Category**: {{ demo.category | title }}\n{% if demo.description %}\n- **Description**: {{ demo.description }}\n{% endif %}\n\n[🏠 Back to Demo Gallery](../index.md){ .md-button .md-button--primary }\n"
  }
]