Repository: oobabooga/text-generation-webui
Branch: main
Commit: 256431f25869
Files: 118
Total size: 1013.0 KB

Directory structure:
text-generation-webui/

├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report_template.yml
│   │   └── feature_request.md
│   ├── dependabot.yml
│   ├── pull_request_template.md
│   └── workflows/
│       ├── build-everything-tgw.yml
│       ├── build-portable-release-cuda.yml
│       ├── build-portable-release-rocm.yml
│       ├── build-portable-release-vulkan.yml
│       └── build-portable-release.yml
├── .gitignore
├── LICENSE
├── README.md
├── cmd_linux.sh
├── cmd_macos.sh
├── cmd_windows.bat
├── docker/
│   ├── .dockerignore
│   ├── TensorRT-LLM/
│   │   └── Dockerfile
│   ├── amd/
│   │   ├── Dockerfile
│   │   └── docker-compose.yml
│   ├── cpu/
│   │   ├── Dockerfile
│   │   └── docker-compose.yml
│   ├── intel/
│   │   ├── Dockerfile
│   │   └── docker-compose.yml
│   └── nvidia/
│       ├── Dockerfile
│       └── docker-compose.yml
├── docs/
│   ├── 01 - Chat Tab.md
│   ├── 02 - Default and Notebook Tabs.md
│   ├── 03 - Parameters Tab.md
│   ├── 04 - Model Tab.md
│   ├── 05 - Training Tab.md
│   ├── 06 - Session Tab.md
│   ├── 07 - Extensions.md
│   ├── 08 - Additional Tips.md
│   ├── 09 - Docker.md
│   ├── 11 - AMD Setup.md
│   ├── 12 - OpenAI API.md
│   ├── 13 - Keyboard Shortcuts.md
│   ├── Image Generation Tutorial.md
│   ├── Multimodal Tutorial.md
│   ├── README.md
│   ├── Tool Calling Tutorial.md
│   └── What Works.md
├── download-model.py
├── js/
│   ├── dark_theme.js
│   ├── global_scope_js.js
│   ├── katex/
│   │   └── auto-render.js
│   ├── main.js
│   ├── save_files.js
│   ├── show_controls.js
│   ├── switch_tabs.js
│   └── update_big_picture.js
├── modules/
│   ├── LoRA.py
│   ├── callbacks.py
│   ├── chat.py
│   ├── evaluate.py
│   ├── exllamav3.py
│   ├── exllamav3_hf.py
│   ├── extensions.py
│   ├── grammar/
│   │   ├── grammar_utils.py
│   │   └── logits_process.py
│   ├── html_generator.py
│   ├── image_models.py
│   ├── image_utils.py
│   ├── llama_cpp_server.py
│   ├── loaders.py
│   ├── logging_colors.py
│   ├── logits.py
│   ├── metadata_gguf.py
│   ├── models.py
│   ├── models_settings.py
│   ├── paths.py
│   ├── presets.py
│   ├── prompts.py
│   ├── reasoning.py
│   ├── sampler_hijack.py
│   ├── sane_markdown_lists.py
│   ├── shared.py
│   ├── tensorrt_llm.py
│   ├── text_generation.py
│   ├── tool_parsing.py
│   ├── tool_use.py
│   ├── torch_utils.py
│   ├── training.py
│   ├── transformers_loader.py
│   ├── ui.py
│   ├── ui_chat.py
│   ├── ui_default.py
│   ├── ui_file_saving.py
│   ├── ui_image_generation.py
│   ├── ui_model_menu.py
│   ├── ui_notebook.py
│   ├── ui_parameters.py
│   ├── ui_session.py
│   ├── utils.py
│   └── web_search.py
├── one_click.py
├── requirements/
│   ├── full/
│   │   ├── requirements.txt
│   │   ├── requirements_amd.txt
│   │   ├── requirements_apple_intel.txt
│   │   ├── requirements_apple_silicon.txt
│   │   ├── requirements_cpu_only.txt
│   │   └── requirements_nowheels.txt
│   └── portable/
│       ├── requirements.txt
│       ├── requirements_amd.txt
│       ├── requirements_apple_intel.txt
│       ├── requirements_apple_silicon.txt
│       ├── requirements_cpu_only.txt
│       ├── requirements_cuda131.txt
│       ├── requirements_nowheels.txt
│       └── requirements_vulkan.txt
├── server.py
├── setup.cfg
├── start_linux.sh
├── start_macos.sh
├── start_windows.bat
├── update_wizard_linux.sh
├── update_wizard_macos.sh
└── update_wizard_windows.bat

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/ISSUE_TEMPLATE/bug_report_template.yml
================================================
name: "Bug report"
description: Report a bug
labels: [ "bug" ]
body:
  - type: markdown
    attributes:
      value: |
        Thanks for taking the time to fill out this bug report!
  - type: textarea
    id: bug-description
    attributes:
      label: Describe the bug
      description: A clear and concise description of what the bug is.
      placeholder: Bug description
    validations:
      required: true
  - type: checkboxes
    attributes:
      label: Is there an existing issue for this?
      description: Please search to see if an issue already exists for the issue you encountered.
      options:
        - label: I have searched the existing issues
          required: true
  - type: textarea
    id: reproduction
    attributes:
      label: Reproduction
      description: Please provide the steps necessary to reproduce your issue.
      placeholder: Reproduction
    validations:
      required: true
  - type: textarea
    id: screenshot
    attributes:
      label: Screenshot
      description: "If possible, please include screenshot(s) so that we can understand what the issue is."
  - type: textarea
    id: logs
    attributes:
      label: Logs
      description: "Please include the full stacktrace of the errors you get in the command-line (if any)."
      render: shell
    validations:
      required: true
  - type: textarea
    id: system-info
    attributes:
      label: System Info
      description: "Please share your operating system and GPU type (NVIDIA/AMD/Intel/Apple). If you are using a Google Colab notebook, mention that instead."
      render: shell
      placeholder: 
    validations:
      required: true


================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.md
================================================
---
name: Feature request
about: Suggest an improvement or new feature for the web UI
title: ''
labels: 'enhancement'
assignees: ''

---

**Description**

A clear and concise description of what you want to be implemented.

**Additional Context**

If applicable, please provide any extra information, external links, or screenshots that could be useful.


================================================
FILE: .github/dependabot.yml
================================================
# To get started with Dependabot version updates, you'll need to specify which
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for all configuration options:
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates

version: 2
updates:
  - package-ecosystem: "pip"
    directories:
      - "/requirements/full/"
      - "/requirements/portable/"
    target-branch: "dev"
    schedule:
      interval: "weekly"


================================================
FILE: .github/pull_request_template.md
================================================
## Checklist:

- [ ] I have read the [Contributing guidelines](https://github.com/oobabooga/text-generation-webui/wiki/Contributing-guidelines).


================================================
FILE: .github/workflows/build-everything-tgw.yml
================================================
name: Build Everything TGW

on:
  workflow_dispatch:
    inputs:
      version:
        description: 'Version tag of text-generation-webui to build: v3.0'
        default: 'v3.0'
        required: true
        type: string

permissions:
  contents: write

jobs:
  build_release_cuda_windows:
    name: CUDA Windows
    uses: ./.github/workflows/build-portable-release-cuda.yml
    with:
      version: ${{ inputs.version }}
      config: 'os:windows-2022'

  build_release_cuda_linux:
    name: CUDA Linux
    uses: ./.github/workflows/build-portable-release-cuda.yml
    with:
      version: ${{ inputs.version }}
      config: 'os:ubuntu-22.04'

  build_release_vulkan_windows:
    name: Vulkan Windows
    uses: ./.github/workflows/build-portable-release-vulkan.yml
    with:
      version: ${{ inputs.version }}
      config: 'os:windows-2022'

  build_release_vulkan_linux:
    name: Vulkan Linux
    uses: ./.github/workflows/build-portable-release-vulkan.yml
    with:
      version: ${{ inputs.version }}
      config: 'os:ubuntu-22.04'

  build_release_rocm_linux:
    name: ROCm Linux
    uses: ./.github/workflows/build-portable-release-rocm.yml
    with:
      version: ${{ inputs.version }}
      config: 'os:ubuntu-22.04'

  build_release_cpu_windows:
    name: CPU Windows
    uses: ./.github/workflows/build-portable-release.yml
    with:
      version: ${{ inputs.version }}
      config: 'os:windows-2022'

  build_release_cpu_linux:
    name: CPU Linux
    uses: ./.github/workflows/build-portable-release.yml
    with:
      version: ${{ inputs.version }}
      config: 'os:ubuntu-22.04'

  build_release_macos:
    name: macOS
    uses: ./.github/workflows/build-portable-release.yml
    with:
      version: ${{ inputs.version }}
      config: 'os:macos-15-intel,macos-14'


================================================
FILE: .github/workflows/build-portable-release-cuda.yml
================================================
name: Build CUDA

on:
  workflow_dispatch:
    inputs:
      version:
        description: 'Version tag of text-generation-webui to build: v3.0'
        default: 'v3.0'
        required: true
        type: string
      config:
        description: 'Override configurations to build: key1:item1-1,item1-2;key2:item2-1,item2-2'
        default: 'Default'
        required: false
        type: string
      exclude:
        description: 'Exclude build configurations: key1-1:item1-1,key1-2:item1-2;key2-1:item2-1,key2-2:item2-2'
        default: 'None'
        required: false
        type: string
  workflow_call:
    inputs:
      version:
        description: 'Version tag of text-generation-webui to build: v3.0'
        default: 'v3.0'
        required: true
        type: string
      config:
        description: 'Configurations to build: key1:item1-1,item1-2;key2:item2-1,item2-2'
        default: 'Default'
        required: false
        type: string
      exclude:
        description: 'Exclude build configurations: key1-1:item1-1,key1-2:item1-2;key2-1:item2-1,key2-2:item2-2'
        default: 'None'
        required: false
        type: string

permissions:
  contents: write

jobs:
  define_matrix:
    name: Define Build Matrix
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    defaults:
      run:
        shell: pwsh
    env:
      CONFIGIN: ${{ inputs.config }}
      EXCLUDEIN: ${{ inputs.exclude }}

    steps:
      - name: Define Job Output
        id: set-matrix
        run: |
          $matrix = @{
              'os' = @('ubuntu-22.04', 'windows-2022')
              'pyver' = @("3.13")
              'cuda' = @("12.4", "13.1")
          }

          if ($env:CONFIGIN -ne 'Default') {$env:CONFIGIN.split(';').foreach({$matrix[$_.split(':')[0]] = $_.split(':')[1].split(',')})}

          if ($env:EXCLUDEIN -ne 'None') {
              $exclusions = @()
              $exclusions += $env:EXCLUDEIN.split(';').replace(':','=').replace(',',"`n") | ConvertFrom-StringData
              $matrix['exclude'] = $exclusions
          }

          $matrixOut = ConvertTo-Json $matrix -Compress
          Write-Output ('matrix=' + $matrixOut) >> $env:GITHUB_OUTPUT

  build_wheels:
    name: ${{ matrix.os }} ${{ matrix.pyver }} CUDA ${{ matrix.cuda }}
    needs: define_matrix
    runs-on: ${{ matrix.os }}
    strategy:
      matrix: ${{ fromJSON(needs.define_matrix.outputs.matrix) }}
    defaults:
      run:
        shell: pwsh
    env:
      PCKGVER: ${{ inputs.version }}

    steps:
      - uses: actions/checkout@v6
        with:
          repository: 'oobabooga/text-generation-webui'
          ref: ${{ inputs.version }}
          submodules: 'recursive'

      - uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.pyver }}

      - name: Build Package
        shell: bash
        run: |
            VERSION_CLEAN="${{ inputs.version }}"
            VERSION_CLEAN="${VERSION_CLEAN#v}"
            cd ..
            cp -r text-generation-webui "text-generation-webui-${VERSION_CLEAN}"
            cd "text-generation-webui-${VERSION_CLEAN}"

            # Remove extensions that need additional requirements
            allowed=("character_bias" "gallery" "openai" "sd_api_pictures")
            find extensions/ -mindepth 1 -maxdepth 1 -type d | grep -v -E "$(printf '%s|' "${allowed[@]}" | sed 's/|$//')" | xargs rm -rf

            # Define common variables
            CUDA_VERSION="${{ matrix.cuda }}"
            VERSION="${{ inputs.version }}"

            # 1. Set platform-specific variables
            if [[ "$RUNNER_OS" == "Windows" ]]; then
                PLATFORM="windows"
                PYTHON_URL="https://github.com/astral-sh/python-build-standalone/releases/download/20260303/cpython-3.13.12+20260303-x86_64-pc-windows-msvc-install_only.tar.gz"
                PIP_PATH="portable_env/python.exe -m pip"
                PACKAGES_PATH="portable_env/Lib/site-packages"
                rm start_linux.sh start_macos.sh
            else
                PLATFORM="linux"
                PYTHON_URL="https://github.com/astral-sh/python-build-standalone/releases/download/20260303/cpython-3.13.12+20260303-x86_64-unknown-linux-gnu-install_only.tar.gz"
                PIP_PATH="portable_env/bin/python -m pip"
                PACKAGES_PATH="portable_env/lib/python3.13/site-packages"
                rm start_macos.sh start_windows.bat
            fi

            # 2. Download and extract Python
            cd ..
            echo "Downloading Python for $PLATFORM..."
            curl -L -o python-build.tar.gz "$PYTHON_URL"
            tar -xzf python-build.tar.gz
            mv python "text-generation-webui-${VERSION_CLEAN}/portable_env"

            # 3. Prepare requirements file based on CUDA version
            cd "text-generation-webui-${VERSION_CLEAN}"
            if [[ "$CUDA_VERSION" == "13.1" ]]; then
                REQ_FILE="requirements/portable/requirements_cuda131.txt"
            else
                REQ_FILE="requirements/portable/requirements.txt"
            fi

            # 4. Install packages
            echo "Installing Python packages from $REQ_FILE..."
            $PIP_PATH install --target="./$PACKAGES_PATH" -r "$REQ_FILE"

            # 5. Clean up
            rm -rf .git cmd* update_wizard* Colab-TextGen-GPU.ipynb docker setup.cfg .github .gitignore requirements/ one_click.py

            # 6. Create archive
            cd ..
            if [[ "$RUNNER_OS" == "Windows" ]]; then
                ARCHIVE_NAME="textgen-portable-${VERSION_CLEAN}-${PLATFORM}-cuda${CUDA_VERSION}.zip"
                echo "Creating archive: $ARCHIVE_NAME"
                powershell -Command "Compress-Archive -Path text-generation-webui-${VERSION_CLEAN} -DestinationPath $ARCHIVE_NAME"
            else
                ARCHIVE_NAME="textgen-portable-${VERSION_CLEAN}-${PLATFORM}-cuda${CUDA_VERSION}.tar.gz"
                echo "Creating archive: $ARCHIVE_NAME"
                tar czf "$ARCHIVE_NAME" "text-generation-webui-${VERSION_CLEAN}"
            fi

      - name: Upload files to a GitHub release
        id: upload-release
        uses: svenstaro/upload-release-action@2.7.0
        continue-on-error: true
        with:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
          file: ../textgen-portable-*
          tag: ${{ inputs.version }}
          file_glob: true
          make_latest: false
          overwrite: true

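The `define_matrix` step builds the job matrix in PowerShell from the `config` and `exclude` inputs. For illustration only, here is a Python sketch that mirrors that parsing. It then expands the result the way GitHub Actions documents matrix expansion: every combination of the listed values, minus any combination that matches all key/value pairs of an `exclude` entry (a partial match is enough). The helper names (`parse_config`, `parse_exclude`, `expand`, `matrix_output`) are hypothetical and not part of the repository.

```python
import itertools
import json

# Default matrix of build-portable-release-cuda.yml, copied from its define_matrix step.
DEFAULT_MATRIX = {
    "os": ["ubuntu-22.04", "windows-2022"],
    "pyver": ["3.13"],
    "cuda": ["12.4", "13.1"],
}


def parse_config(config, matrix):
    """Override whole matrix keys from 'key1:a,b;key2:c' (mirrors the CONFIGIN branch)."""
    result = {key: list(values) for key, values in matrix.items()}  # never mutate the default
    if config != "Default":
        for group in config.split(";"):
            parts = group.split(":")
            # Like $_.split(':')[1]: the segment after the first ':' becomes the value list.
            result[parts[0]] = parts[1].split(",")
    return result


def parse_exclude(exclude):
    """Turn 'k1:v1,k2:v2;k3:v3' into [{'k1': 'v1', 'k2': 'v2'}, {'k3': 'v3'}] (mirrors EXCLUDEIN)."""
    if exclude == "None":
        return []
    exclusions = []
    for group in exclude.split(";"):
        entry = {}
        for pair in group.split(","):
            key, value = pair.split(":", 1)
            entry[key.strip()] = value.strip()
        exclusions.append(entry)
    return exclusions


def expand(matrix, exclusions):
    """List job combinations: Cartesian product minus partial-match exclusions."""
    keys = list(matrix)
    jobs = []
    for values in itertools.product(*(matrix[key] for key in keys)):
        combo = dict(zip(keys, values))
        # An exclusion drops a combination when every key/value it lists matches.
        if any(all(combo.get(k) == v for k, v in ex.items()) for ex in exclusions):
            continue
        jobs.append(combo)
    return jobs


def matrix_output(config="Default", exclude="None"):
    """Build the 'matrix=<json>' line written to GITHUB_OUTPUT (key order may differ from PowerShell)."""
    matrix = parse_config(config, DEFAULT_MATRIX)
    exclusions = parse_exclude(exclude)
    if exclusions:
        matrix["exclude"] = exclusions
    return "matrix=" + json.dumps(matrix, separators=(",", ":"))


if __name__ == "__main__":
    # CUDA Windows job from build-everything-tgw.yml: config 'os:windows-2022'.
    for job in expand(parse_config("os:windows-2022", DEFAULT_MATRIX), []):
        print(job)
    print(matrix_output(exclude="os:windows-2022,cuda:12.4"))
```

With the `config: 'os:windows-2022'` override that `build-everything-tgw.yml` passes to the CUDA workflow, the four default jobs shrink to two, one per CUDA version. Note that `config` replaces a key's whole value list rather than appending to it.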

================================================
FILE: .github/workflows/build-portable-release-rocm.yml
================================================
name: Build ROCm

on:
  workflow_dispatch:
    inputs:
      version:
        description: 'Version tag of text-generation-webui to build: v3.0'
        default: 'v3.0'
        required: true
        type: string
      config:
        description: 'Override configurations to build: key1:item1-1,item1-2;key2:item2-1,item2-2'
        default: 'Default'
        required: false
        type: string
      exclude:
        description: 'Exclude build configurations: key1-1:item1-1,key1-2:item1-2;key2-1:item2-1,key2-2:item2-2'
        default: 'None'
        required: false
        type: string
  workflow_call:
    inputs:
      version:
        description: 'Version tag of text-generation-webui to build: v3.0'
        default: 'v3.0'
        required: true
        type: string
      config:
        description: 'Configurations to build: key1:item1-1,item1-2;key2:item2-1,item2-2'
        default: 'Default'
        required: false
        type: string
      exclude:
        description: 'Exclude build configurations: key1-1:item1-1,key1-2:item1-2;key2-1:item2-1,key2-2:item2-2'
        default: 'None'
        required: false
        type: string

permissions:
  contents: write

jobs:
  define_matrix:
    name: Define Build Matrix
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    defaults:
      run:
        shell: pwsh
    env:
      CONFIGIN: ${{ inputs.config }}
      EXCLUDEIN: ${{ inputs.exclude }}

    steps:
      - name: Define Job Output
        id: set-matrix
        run: |
          $matrix = @{
              'os' = @('ubuntu-22.04', 'windows-2022')
              'pyver' = @("3.13")
          }

          if ($env:CONFIGIN -ne 'Default') {$env:CONFIGIN.split(';').foreach({$matrix[$_.split(':')[0]] = $_.split(':')[1].split(',')})}

          if ($env:EXCLUDEIN -ne 'None') {
              $exclusions = @()
              $exclusions += $env:EXCLUDEIN.split(';').replace(':','=').replace(',',"`n") | ConvertFrom-StringData
              $matrix['exclude'] = $exclusions
          }

          $matrixOut = ConvertTo-Json $matrix -Compress
          Write-Output ('matrix=' + $matrixOut) >> $env:GITHUB_OUTPUT

  build_wheels:
    name: ${{ matrix.os }} ${{ matrix.pyver }}
    needs: define_matrix
    runs-on: ${{ matrix.os }}
    strategy:
      matrix: ${{ fromJSON(needs.define_matrix.outputs.matrix) }}
    defaults:
      run:
        shell: pwsh
    env:
      PCKGVER: ${{ inputs.version }}

    steps:
      - uses: actions/checkout@v6
        with:
          repository: 'oobabooga/text-generation-webui'
          ref: ${{ inputs.version }}
          submodules: 'recursive'

      - uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.pyver }}

      - name: Build Package
        shell: bash
        run: |
            VERSION_CLEAN="${{ inputs.version }}"
            VERSION_CLEAN="${VERSION_CLEAN#v}"
            cd ..
            cp -r text-generation-webui "text-generation-webui-${VERSION_CLEAN}"
            cd "text-generation-webui-${VERSION_CLEAN}"

            # Remove extensions that need additional requirements
            allowed=("character_bias" "gallery" "openai" "sd_api_pictures")
            find extensions/ -mindepth 1 -maxdepth 1 -type d | grep -v -E "$(printf '%s|' "${allowed[@]}" | sed 's/|$//')" | xargs rm -rf

            # Define common variables
            VERSION="${{ inputs.version }}"

            # 1. Set platform-specific variables
            if [[ "$RUNNER_OS" == "Windows" ]]; then
                PLATFORM="windows"
                PYTHON_URL="https://github.com/astral-sh/python-build-standalone/releases/download/20260303/cpython-3.13.12+20260303-x86_64-pc-windows-msvc-install_only.tar.gz"
                PIP_PATH="portable_env/python.exe -m pip"
                PACKAGES_PATH="portable_env/Lib/site-packages"
                rm start_linux.sh start_macos.sh
            else
                PLATFORM="linux"
                PYTHON_URL="https://github.com/astral-sh/python-build-standalone/releases/download/20260303/cpython-3.13.12+20260303-x86_64-unknown-linux-gnu-install_only.tar.gz"
                PIP_PATH="portable_env/bin/python -m pip"
                PACKAGES_PATH="portable_env/lib/python3.13/site-packages"
                rm start_macos.sh start_windows.bat
            fi

            # 2. Download and extract Python
            cd ..
            echo "Downloading Python for $PLATFORM..."
            curl -L -o python-build.tar.gz "$PYTHON_URL"
            tar -xzf python-build.tar.gz
            mv python "text-generation-webui-${VERSION_CLEAN}/portable_env"

            # 3. Prepare requirements file
            REQ_FILE="requirements/portable/requirements_amd.txt"

            cd "text-generation-webui-${VERSION_CLEAN}"

            # 4. Install packages
            echo "Installing Python packages from $REQ_FILE..."
            $PIP_PATH install --target="./$PACKAGES_PATH" -r "$REQ_FILE"

            # 5. Clean up
            rm -rf .git cmd* update_wizard* Colab-TextGen-GPU.ipynb docker setup.cfg .github .gitignore requirements/ one_click.py

            # 6. Create archive
            cd ..
            if [[ "$RUNNER_OS" == "Windows" ]]; then
                ARCHIVE_NAME="textgen-portable-${VERSION_CLEAN}-${PLATFORM}-rocm7.2.zip"
                echo "Creating archive: $ARCHIVE_NAME"
                powershell -Command "Compress-Archive -Path text-generation-webui-${VERSION_CLEAN} -DestinationPath $ARCHIVE_NAME"
            else
                ARCHIVE_NAME="textgen-portable-${VERSION_CLEAN}-${PLATFORM}-rocm7.2.tar.gz"
                echo "Creating archive: $ARCHIVE_NAME"
                tar czf "$ARCHIVE_NAME" "text-generation-webui-${VERSION_CLEAN}"
            fi

      - name: Upload files to a GitHub release
        id: upload-release
        uses: svenstaro/upload-release-action@2.7.0
        continue-on-error: true
        with:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
          file: ../textgen-portable-*
          tag: ${{ inputs.version }}
          file_glob: true
          make_latest: false
          overwrite: true

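Each build script keeps only the directories named in `allowed` and deletes every other direct subdirectory of `extensions/`. The Python sketch below expresses that selection with exact name matching. The helpers are hypothetical, and the directory names used in the examples (`my_extension`, `openai_extra`) are made up. It also emulates the shell filter to show one behavioral difference: the `grep -v -E` pattern is unanchored, so a directory such as `extensions/openai_extra` would survive the shell cleanup because its path contains `openai`, whereas exact matching would remove it.

```python
import re
import shutil
from pathlib import Path

# Extensions kept in the portable builds (the workflow's `allowed` array).
ALLOWED = ("character_bias", "gallery", "openai", "sd_api_pictures")


def extensions_to_remove(dir_names, allowed=ALLOWED):
    """Names not on the allow-list, compared as whole directory names."""
    return sorted(name for name in dir_names if name not in allowed)


def shell_filter_keeps(dir_name, allowed=ALLOWED):
    """Emulate `grep -E 'a|b|c'` on 'extensions/<name>': a substring match, not an exact one."""
    pattern = "|".join(allowed)
    return re.search(pattern, f"extensions/{dir_name}") is not None


def prune_extensions(root, allowed=ALLOWED, dry_run=False):
    """Delete direct subdirectories of root/extensions that are not allowed.

    Returns the removed names (or, with dry_run=True, the names that would be removed).
    """
    ext_dir = Path(root) / "extensions"
    # Like `find -mindepth 1 -maxdepth 1 -type d`: direct children that are real directories.
    names = [p.name for p in ext_dir.iterdir() if p.is_dir() and not p.is_symlink()]
    removed = extensions_to_remove(names, allowed)
    if not dry_run:
        for name in removed:
            shutil.rmtree(ext_dir / name)
    return removed
```

Anchoring the shell pattern to the end of the path would make the workflow's filter behave like the exact-match version.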

================================================
FILE: .github/workflows/build-portable-release-vulkan.yml
================================================
name: Build Vulkan

on:
  workflow_dispatch:
    inputs:
      version:
        description: 'Version tag of text-generation-webui to build: v3.0'
        default: 'v3.0'
        required: true
        type: string
      config:
        description: 'Override configurations to build: key1:item1-1,item1-2;key2:item2-1,item2-2'
        default: 'Default'
        required: false
        type: string
      exclude:
        description: 'Exclude build configurations: key1-1:item1-1,key1-2:item1-2;key2-1:item2-1,key2-2:item2-2'
        default: 'None'
        required: false
        type: string
  workflow_call:
    inputs:
      version:
        description: 'Version tag of text-generation-webui to build: v3.0'
        default: 'v3.0'
        required: true
        type: string
      config:
        description: 'Configurations to build: key1:item1-1,item1-2;key2:item2-1,item2-2'
        default: 'Default'
        required: false
        type: string
      exclude:
        description: 'Exclude build configurations: key1-1:item1-1,key1-2:item1-2;key2-1:item2-1,key2-2:item2-2'
        default: 'None'
        required: false
        type: string

permissions:
  contents: write

jobs:
  define_matrix:
    name: Define Build Matrix
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    defaults:
      run:
        shell: pwsh
    env:
      CONFIGIN: ${{ inputs.config }}
      EXCLUDEIN: ${{ inputs.exclude }}

    steps:
      - name: Define Job Output
        id: set-matrix
        run: |
          $matrix = @{
              'os' = @('ubuntu-22.04', 'windows-2022')
              'pyver' = @("3.13")
          }

          if ($env:CONFIGIN -ne 'Default') {$env:CONFIGIN.split(';').foreach({$matrix[$_.split(':')[0]] = $_.split(':')[1].split(',')})}

          if ($env:EXCLUDEIN -ne 'None') {
              $exclusions = @()
              $exclusions += $env:EXCLUDEIN.split(';').replace(':','=').replace(',',"`n") | ConvertFrom-StringData
              $matrix['exclude'] = $exclusions
          }

          $matrixOut = ConvertTo-Json $matrix -Compress
          Write-Output ('matrix=' + $matrixOut) >> $env:GITHUB_OUTPUT

  build_wheels:
    name: ${{ matrix.os }} ${{ matrix.pyver }}
    needs: define_matrix
    runs-on: ${{ matrix.os }}
    strategy:
      matrix: ${{ fromJSON(needs.define_matrix.outputs.matrix) }}
    defaults:
      run:
        shell: pwsh
    env:
      PCKGVER: ${{ inputs.version }}

    steps:
      - uses: actions/checkout@v6
        with:
          repository: 'oobabooga/text-generation-webui'
          ref: ${{ inputs.version }}
          submodules: 'recursive'

      - uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.pyver }}

      - name: Build Package
        shell: bash
        run: |
            VERSION_CLEAN="${{ inputs.version }}"
            VERSION_CLEAN="${VERSION_CLEAN#v}"
            cd ..
            cp -r text-generation-webui "text-generation-webui-${VERSION_CLEAN}"
            cd "text-generation-webui-${VERSION_CLEAN}"

            # Remove extensions that need additional requirements
            allowed=("character_bias" "gallery" "openai" "sd_api_pictures")
            find extensions/ -mindepth 1 -maxdepth 1 -type d | grep -v -E "$(printf '%s|' "${allowed[@]}" | sed 's/|$//')" | xargs rm -rf

            # Define common variables
            VERSION="${{ inputs.version }}"

            # 1. Set platform-specific variables
            if [[ "$RUNNER_OS" == "Windows" ]]; then
                PLATFORM="windows"
                PYTHON_URL="https://github.com/astral-sh/python-build-standalone/releases/download/20260303/cpython-3.13.12+20260303-x86_64-pc-windows-msvc-install_only.tar.gz"
                PIP_PATH="portable_env/python.exe -m pip"
                PACKAGES_PATH="portable_env/Lib/site-packages"
                rm start_linux.sh start_macos.sh
            else
                PLATFORM="linux"
                PYTHON_URL="https://github.com/astral-sh/python-build-standalone/releases/download/20260303/cpython-3.13.12+20260303-x86_64-unknown-linux-gnu-install_only.tar.gz"
                PIP_PATH="portable_env/bin/python -m pip"
                PACKAGES_PATH="portable_env/lib/python3.13/site-packages"
                rm start_macos.sh start_windows.bat
            fi

            # 2. Download and extract Python
            cd ..
            echo "Downloading Python for $PLATFORM..."
            curl -L -o python-build.tar.gz "$PYTHON_URL"
            tar -xzf python-build.tar.gz
            mv python "text-generation-webui-${VERSION_CLEAN}/portable_env"

            # 3. Prepare requirements file
            REQ_FILE="requirements/portable/requirements_vulkan.txt"

            cd "text-generation-webui-${VERSION_CLEAN}"

            # 4. Install packages
            echo "Installing Python packages from $REQ_FILE..."
            $PIP_PATH install --target="./$PACKAGES_PATH" -r "$REQ_FILE"

            # 5. Clean up
            rm -rf .git cmd* update_wizard* Colab-TextGen-GPU.ipynb docker setup.cfg .github .gitignore requirements/ one_click.py

            # 6. Create archive
            cd ..
            if [[ "$RUNNER_OS" == "Windows" ]]; then
                ARCHIVE_NAME="textgen-portable-${VERSION_CLEAN}-${PLATFORM}-vulkan.zip"
                echo "Creating archive: $ARCHIVE_NAME"
                powershell -Command "Compress-Archive -Path text-generation-webui-${VERSION_CLEAN} -DestinationPath $ARCHIVE_NAME"
            else
                ARCHIVE_NAME="textgen-portable-${VERSION_CLEAN}-${PLATFORM}-vulkan.tar.gz"
                echo "Creating archive: $ARCHIVE_NAME"
                tar czf "$ARCHIVE_NAME" "text-generation-webui-${VERSION_CLEAN}"
            fi

      - name: Upload files to a GitHub release
        id: upload-release
        uses: svenstaro/upload-release-action@2.7.0
        continue-on-error: true
        with:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
          file: ../textgen-portable-*
          tag: ${{ inputs.version }}
          file_glob: true
          make_latest: false
          overwrite: true

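The CUDA, ROCm, and Vulkan workflows differ mainly in which requirements file they install and how they name the resulting archive. The Python sketch below reproduces those two choices as written in the scripts above; `requirements_file` and `archive_name` are hypothetical helper names, and the sketch covers only these three GPU workflows. The `rocm7.2` suffix is hard-coded in the ROCm script, not derived from any installed ROCm version.

```python
from typing import Optional


def requirements_file(backend: str, cuda_version: Optional[str] = None) -> str:
    """Portable requirements file chosen by the CUDA, ROCm, and Vulkan workflows."""
    if backend == "cuda":
        # Only CUDA 13.1 has its own file; any other CUDA version uses the default one.
        if cuda_version == "13.1":
            return "requirements/portable/requirements_cuda131.txt"
        return "requirements/portable/requirements.txt"
    if backend == "rocm":
        return "requirements/portable/requirements_amd.txt"
    if backend == "vulkan":
        return "requirements/portable/requirements_vulkan.txt"
    raise ValueError(f"unknown backend: {backend!r}")


def archive_name(version_tag: str, runner_os: str, backend: str,
                 cuda_version: Optional[str] = None) -> str:
    """Archive name produced by step 6 of the GPU workflows."""
    # ${VERSION_CLEAN#v}: drop a single leading 'v', if any.
    version = version_tag.removeprefix("v")
    platform = "windows" if runner_os == "Windows" else "linux"
    suffix = {"cuda": f"cuda{cuda_version}", "rocm": "rocm7.2", "vulkan": "vulkan"}[backend]
    # Windows runners zip with Compress-Archive; Linux runners use tar czf.
    ext = "zip" if runner_os == "Windows" else "tar.gz"
    return f"textgen-portable-{version}-{platform}-{suffix}.{ext}"


if __name__ == "__main__":
    for cuda in ("12.4", "13.1"):
        print(requirements_file("cuda", cuda), archive_name("v3.0", "Linux", "cuda", cuda))
```

Every name starts with `textgen-portable-`, which is what the upload step's `file: ../textgen-portable-*` glob relies on.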

================================================
FILE: .github/workflows/build-portable-release.yml
================================================
name: Build CPU and macOS

on:
  workflow_dispatch:
    inputs:
      version:
        description: 'Version tag of text-generation-webui to build: v3.0'
        default: 'v3.0'
        required: true
        type: string
      config:
        description: 'Override configurations to build: key1:item1-1,item1-2;key2:item2-1,item2-2'
        default: 'Default'
        required: false
        type: string
      exclude:
        description: 'Exclude build configurations: key1-1:item1-1,key1-2:item1-2;key2-1:item2-1,key2-2:item2-2'
        default: 'None'
        required: false
        type: string
  workflow_call:
    inputs:
      version:
        description: 'Version tag of text-generation-webui to build: v3.0'
        default: 'v3.0'
        required: true
        type: string
      config:
        description: 'Configurations to build: key1:item1-1,item1-2;key2:item2-1,item2-2'
        default: 'Default'
        required: false
        type: string
      exclude:
        description: 'Exclude build configurations: key1-1:item1-1,key1-2:item1-2;key2-1:item2-1,key2-2:item2-2'
        default: 'None'
        required: false
        type: string

permissions:
  contents: write

jobs:
  define_matrix:
    name: Define Build Matrix
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    defaults:
      run:
        shell: pwsh
    env:
      CONFIGIN: ${{ inputs.config }}
      EXCLUDEIN: ${{ inputs.exclude }}

    steps:
      - name: Define Job Output
        id: set-matrix
        run: |
          $matrix = @{
              'os' = @('ubuntu-22.04', 'windows-2022', 'macos-14')
              'pyver' = @("3.13")
          }

          if ($env:CONFIGIN -ne 'Default') {$env:CONFIGIN.split(';').foreach({$matrix[$_.split(':')[0]] = $_.split(':')[1].split(',')})}

          if ($env:EXCLUDEIN -ne 'None') {
              $exclusions = @()
              $exclusions += $env:EXCLUDEIN.split(';').replace(':','=').replace(',',"`n") | ConvertFrom-StringData
              $matrix['exclude'] = $exclusions
          }

          $matrixOut = ConvertTo-Json $matrix -Compress
          Write-Output ('matrix=' + $matrixOut) >> $env:GITHUB_OUTPUT

  build_wheels:
    name: ${{ matrix.os }} ${{ matrix.pyver }}
    needs: define_matrix
    runs-on: ${{ matrix.os }}
    strategy:
      matrix: ${{ fromJSON(needs.define_matrix.outputs.matrix) }}
    defaults:
      run:
        shell: pwsh
    env:
      PCKGVER: ${{ inputs.version }}

    steps:
      - uses: actions/checkout@v6
        with:
          repository: 'oobabooga/text-generation-webui'
          ref: ${{ inputs.version }}
          submodules: 'recursive'

      - uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.pyver }}

      - name: Build Package
        shell: bash
        run: |
            VERSION_CLEAN="${{ inputs.version }}"
            VERSION_CLEAN="${VERSION_CLEAN#v}"
            cd ..
            cp -r text-generation-webui "text-generation-webui-${VERSION_CLEAN}"
            cd "text-generation-webui-${VERSION_CLEAN}"

            # Remove extensions that need additional requirements
            allowed=("character_bias" "gallery" "openai" "sd_api_pictures")
            find extensions/ -mindepth 1 -maxdepth 1 -type d | grep -v -E "$(printf '%s|' "${allowed[@]}" | sed 's/|$//')" | xargs rm -rf

            # Define common variables
            VERSION="${{ inputs.version }}"
            OS_TYPE="${{ matrix.os }}"

            # 1. Set platform-specific variables
            if [[ "$RUNNER_OS" == "Windows" ]]; then
                PLATFORM="windows-cpu"
                PYTHON_URL="https://github.com/astral-sh/python-build-standalone/releases/download/20260303/cpython-3.13.12+20260303-x86_64-pc-windows-msvc-install_only.tar.gz"
                PIP_PATH="portable_env/python.exe -m pip"
                PACKAGES_PATH="portable_env/Lib/site-packages"
                rm start_linux.sh start_macos.sh
            elif [[ "$RUNNER_OS" == "macOS" ]]; then
                if [[ "$OS_TYPE" == "macos-15-intel" ]]; then
                    PLATFORM="macos-x86_64"
                    PYTHON_URL="https://github.com/astral-sh/python-build-standalone/releases/download/20260303/cpython-3.13.12+20260303-x86_64-apple-darwin-install_only.tar.gz"
                    REQ_TYPE="apple_intel"
                else
                    PLATFORM="macos-arm64"
                    PYTHON_URL="https://github.com/astral-sh/python-build-standalone/releases/download/20260303/cpython-3.13.12+20260303-aarch64-apple-darwin-install_only.tar.gz"
                    REQ_TYPE="apple_silicon"
                fi
                PIP_PATH="portable_env/bin/python -m pip"
                PACKAGES_PATH="portable_env/lib/python3.13/site-packages"
                rm start_linux.sh start_windows.bat
            else
                # Linux case
                PLATFORM="linux-cpu"
                PYTHON_URL="https://github.com/astral-sh/python-build-standalone/releases/download/20260303/cpython-3.13.12+20260303-x86_64-unknown-linux-gnu-install_only.tar.gz"
                PIP_PATH="portable_env/bin/python -m pip"
                PACKAGES_PATH="portable_env/lib/python3.13/site-packages"
                rm start_macos.sh start_windows.bat
            fi

            # 2. Download and extract Python
            echo "Downloading Python for $PLATFORM..."
            cd ..
            curl -L -o python-build.tar.gz "$PYTHON_URL"
            tar -xzf python-build.tar.gz
            mv python "text-generation-webui-${VERSION_CLEAN}/portable_env"

            # 3. Prepare requirements file based on platform
            cd "text-generation-webui-${VERSION_CLEAN}"

            # Select requirements file based on platform
            if [[ "$RUNNER_OS" == "macOS" ]]; then
                if [[ "$OS_TYPE" == "macos-15-intel" ]]; then
                    REQ_FILE="requirements/portable/requirements_apple_intel.txt"
                else
                    REQ_FILE="requirements/portable/requirements_apple_silicon.txt"
                fi
            else
                REQ_FILE="requirements/portable/requirements_cpu_only.txt"
            fi

            echo "Using requirements file: $REQ_FILE"

            # 4. Install packages
            echo "Installing Python packages from $REQ_FILE..."
            $PIP_PATH install --target="./$PACKAGES_PATH" -r "$REQ_FILE"

            # 5. Clean up
            rm -rf .git cmd* update_wizard* Colab-TextGen-GPU.ipynb docker setup.cfg .github .gitignore requirements/ one_click.py

            # 6. Create archive
            cd ..
            if [[ "$RUNNER_OS" == "Windows" ]]; then
                ARCHIVE_NAME="textgen-portable-${VERSION_CLEAN}-${PLATFORM}.zip"
                echo "Creating archive: $ARCHIVE_NAME"
                powershell -Command "Compress-Archive -Path text-generation-webui-${VERSION_CLEAN} -DestinationPath $ARCHIVE_NAME"
            else
                ARCHIVE_NAME="textgen-portable-${VERSION_CLEAN}-${PLATFORM}.tar.gz"
                echo "Creating archive: $ARCHIVE_NAME"
                tar czf "$ARCHIVE_NAME" "text-generation-webui-${VERSION_CLEAN}"
            fi

      - name: Upload files to a GitHub release
        id: upload-release
        uses: svenstaro/upload-release-action@2.7.0
        continue-on-error: true
        with:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
          file: ../textgen-portable-*
          tag: ${{ inputs.version }}
          file_glob: true
          make_latest: false
          overwrite: true


================================================
FILE: .gitignore
================================================
/css
/extensions
/installer_files
/repositories
/user_data

.chroma
.DS_Store
.eslintrc.js
.idea
.installer_state.json
.venv
venv
.envrc
.direnv
.vs
.vscode
*.bak
*.ipynb
*.log
*pycache*
cert.pem
key.pem
package.json
package-lock.json
Thumbs.db
wandb

# ignore user docker config and top level links to docker files
/docker-compose.yaml
/docker-compose.yml
/Dockerfile
.env


================================================
FILE: LICENSE
================================================
                    GNU AFFERO GENERAL PUBLIC LICENSE
                       Version 3, 19 November 2007

 Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.

                            Preamble

  The GNU Affero General Public License is a free, copyleft license for
software and other kinds of works, specifically designed to ensure
cooperation with the community in the case of network server software.

  The licenses for most software and other practical works are designed
to take away your freedom to share and change the works.  By contrast,
our General Public Licenses are intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users.

  When we speak of free software, we are referring to freedom, not
price.  Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.

  Developers that use our General Public Licenses protect your rights
with two steps: (1) assert copyright on the software, and (2) offer
you this License which gives you legal permission to copy, distribute
and/or modify the software.

  A secondary benefit of defending all users' freedom is that
improvements made in alternate versions of the program, if they
receive widespread use, become available for other developers to
incorporate.  Many developers of free software are heartened and
encouraged by the resulting cooperation.  However, in the case of
software used on network servers, this result may fail to come about.
The GNU General Public License permits making a modified version and
letting the public access it on a server without ever releasing its
source code to the public.

  The GNU Affero General Public License is designed specifically to
ensure that, in such cases, the modified source code becomes available
to the community.  It requires the operator of a network server to
provide the source code of the modified version running there to the
users of that server.  Therefore, public use of a modified version, on
a publicly accessible server, gives the public access to the source
code of the modified version.

  An older license, called the Affero General Public License and
published by Affero, was designed to accomplish similar goals.  This is
a different license, not a version of the Affero GPL, but Affero has
released a new version of the Affero GPL which permits relicensing under
this license.

  The precise terms and conditions for copying, distribution and
modification follow.

                       TERMS AND CONDITIONS

  0. Definitions.

  "This License" refers to version 3 of the GNU Affero General Public License.

  "Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.

  "The Program" refers to any copyrightable work licensed under this
License.  Each licensee is addressed as "you".  "Licensees" and
"recipients" may be individuals or organizations.

  To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy.  The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.

  A "covered work" means either the unmodified Program or a work based
on the Program.

  To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy.  Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.

  To "convey" a work means any kind of propagation that enables other
parties to make or receive copies.  Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.

  An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License.  If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.

  1. Source Code.

  The "source code" for a work means the preferred form of the work
for making modifications to it.  "Object code" means any non-source
form of a work.

  A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.

  The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form.  A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.

  The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities.  However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work.  For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.

  The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.

  The Corresponding Source for a work in source code form is that
same work.

  2. Basic Permissions.

  All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met.  This License explicitly affirms your unlimited
permission to run the unmodified Program.  The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work.  This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.

  You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force.  You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright.  Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.

  Conveying under any other circumstances is permitted solely under
the conditions stated below.  Sublicensing is not allowed; section 10
makes it unnecessary.

  3. Protecting Users' Legal Rights From Anti-Circumvention Law.

  No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.

  When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.

  4. Conveying Verbatim Copies.

  You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.

  You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.

  5. Conveying Modified Source Versions.

  You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:

    a) The work must carry prominent notices stating that you modified
    it, and giving a relevant date.

    b) The work must carry prominent notices stating that it is
    released under this License and any conditions added under section
    7.  This requirement modifies the requirement in section 4 to
    "keep intact all notices".

    c) You must license the entire work, as a whole, under this
    License to anyone who comes into possession of a copy.  This
    License will therefore apply, along with any applicable section 7
    additional terms, to the whole of the work, and all its parts,
    regardless of how they are packaged.  This License gives no
    permission to license the work in any other way, but it does not
    invalidate such permission if you have separately received it.

    d) If the work has interactive user interfaces, each must display
    Appropriate Legal Notices; however, if the Program has interactive
    interfaces that do not display Appropriate Legal Notices, your
    work need not make them do so.

  A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit.  Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.

  6. Conveying Non-Source Forms.

  You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:

    a) Convey the object code in, or embodied in, a physical product
    (including a physical distribution medium), accompanied by the
    Corresponding Source fixed on a durable physical medium
    customarily used for software interchange.

    b) Convey the object code in, or embodied in, a physical product
    (including a physical distribution medium), accompanied by a
    written offer, valid for at least three years and valid for as
    long as you offer spare parts or customer support for that product
    model, to give anyone who possesses the object code either (1) a
    copy of the Corresponding Source for all the software in the
    product that is covered by this License, on a durable physical
    medium customarily used for software interchange, for a price no
    more than your reasonable cost of physically performing this
    conveying of source, or (2) access to copy the
    Corresponding Source from a network server at no charge.

    c) Convey individual copies of the object code with a copy of the
    written offer to provide the Corresponding Source.  This
    alternative is allowed only occasionally and noncommercially, and
    only if you received the object code with such an offer, in accord
    with subsection 6b.

    d) Convey the object code by offering access from a designated
    place (gratis or for a charge), and offer equivalent access to the
    Corresponding Source in the same way through the same place at no
    further charge.  You need not require recipients to copy the
    Corresponding Source along with the object code.  If the place to
    copy the object code is a network server, the Corresponding Source
    may be on a different server (operated by you or a third party)
    that supports equivalent copying facilities, provided you maintain
    clear directions next to the object code saying where to find the
    Corresponding Source.  Regardless of what server hosts the
    Corresponding Source, you remain obligated to ensure that it is
    available for as long as needed to satisfy these requirements.

    e) Convey the object code using peer-to-peer transmission, provided
    you inform other peers where the object code and Corresponding
    Source of the work are being offered to the general public at no
    charge under subsection 6d.

  A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.

  A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling.  In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage.  For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product.  A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.

  "Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source.  The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.

  If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information.  But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).

  The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed.  Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.

  Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.

  7. Additional Terms.

  "Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law.  If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.

  When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it.  (Additional permissions may be written to require their own
removal in certain cases when you modify the work.)  You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.

  Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:

    a) Disclaiming warranty or limiting liability differently from the
    terms of sections 15 and 16 of this License; or

    b) Requiring preservation of specified reasonable legal notices or
    author attributions in that material or in the Appropriate Legal
    Notices displayed by works containing it; or

    c) Prohibiting misrepresentation of the origin of that material, or
    requiring that modified versions of such material be marked in
    reasonable ways as different from the original version; or

    d) Limiting the use for publicity purposes of names of licensors or
    authors of the material; or

    e) Declining to grant rights under trademark law for use of some
    trade names, trademarks, or service marks; or

    f) Requiring indemnification of licensors and authors of that
    material by anyone who conveys the material (or modified versions of
    it) with contractual assumptions of liability to the recipient, for
    any liability that these contractual assumptions directly impose on
    those licensors and authors.

  All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10.  If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term.  If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.

  If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.

  Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.

  8. Termination.

  You may not propagate or modify a covered work except as expressly
provided under this License.  Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).

  However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.

  Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.

  Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License.  If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.

  9. Acceptance Not Required for Having Copies.

  You are not required to accept this License in order to receive or
run a copy of the Program.  Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance.  However,
nothing other than this License grants you permission to propagate or
modify any covered work.  These actions infringe copyright if you do
not accept this License.  Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.

  10. Automatic Licensing of Downstream Recipients.

  Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License.  You are not responsible
for enforcing compliance by third parties with this License.

  An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations.  If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.

  You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License.  For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.

  11. Patents.

  A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based.  The
work thus licensed is called the contributor's "contributor version".

  A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version.  For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.

  Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.

  In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement).  To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.

  If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients.  "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.

  If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.

  A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License.  You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.

  Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.

  12. No Surrender of Others' Freedom.

  If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License.  If you cannot convey a
covered work so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you may
not convey it at all.  For example, if you agree to terms that obligate you
to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.

  13. Remote Network Interaction; Use with the GNU General Public License.

  Notwithstanding any other provision of this License, if you modify the
Program, your modified version must prominently offer all users
interacting with it remotely through a computer network (if your version
supports such interaction) an opportunity to receive the Corresponding
Source of your version by providing access to the Corresponding Source
from a network server at no charge, through some standard or customary
means of facilitating copying of software.  This Corresponding Source
shall include the Corresponding Source for any work covered by version 3
of the GNU General Public License that is incorporated pursuant to the
following paragraph.

  Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU General Public License into a single
combined work, and to convey the resulting work.  The terms of this
License will continue to apply to the part which is the covered work,
but the work with which it is combined will remain governed by version
3 of the GNU General Public License.

  14. Revised Versions of this License.

  The Free Software Foundation may publish revised and/or new versions of
the GNU Affero General Public License from time to time.  Such new versions
will be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

  Each version is given a distinguishing version number.  If the
Program specifies that a certain numbered version of the GNU Affero General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation.  If the Program does not specify a version number of the
GNU Affero General Public License, you may choose any version ever published
by the Free Software Foundation.

  If the Program specifies that a proxy can decide which future
versions of the GNU Affero General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.

  Later license versions may give you additional or different
permissions.  However, no additional obligations are imposed on any
author or copyright holder as a result of your choosing to follow a
later version.

  15. Disclaimer of Warranty.

  THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

  16. Limitation of Liability.

  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.

  17. Interpretation of Sections 15 and 16.

  If the disclaimer of warranty and limitation of liability provided
above cannot be given local legal effect according to their terms,
reviewing courts shall apply local law that most closely approximates
an absolute waiver of all civil liability in connection with the
Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.

                     END OF TERMS AND CONDITIONS

            How to Apply These Terms to Your New Programs

  If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.

  To do so, attach the following notices to the program.  It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

    <one line to give the program's name and a brief idea of what it does.>
    Copyright (C) <year>  <name of author>

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU Affero General Public License as published
    by the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU Affero General Public License for more details.

    You should have received a copy of the GNU Affero General Public License
    along with this program.  If not, see <https://www.gnu.org/licenses/>.

Also add information on how to contact you by electronic and paper mail.

  If your software can interact with users remotely through a computer
network, you should also make sure that it provides a way for users to
get its source.  For example, if your program is a web application, its
interface could display a "Source" link that leads users to an archive
of the code.  There are many ways you could offer source, and different
solutions will be better for different programs; see section 13 for the
specific requirements.

  You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU AGPL, see
<https://www.gnu.org/licenses/>.


================================================
FILE: README.md
================================================
<div align="center" markdown="1">
   <sup>Special thanks to:</sup>
   <br>
   <br>
   <a href="https://go.warp.dev/text-generation-webui">
      <img alt="Warp sponsorship" width="400" src="https://raw.githubusercontent.com/warpdotdev/brand-assets/refs/heads/main/Github/Sponsor/Warp-Github-LG-02.png">
   </a>

### [Warp, built for coding with multiple AI agents](https://go.warp.dev/text-generation-webui)
[Available for macOS, Linux, & Windows](https://go.warp.dev/text-generation-webui)<br>
</div>
<hr>

# Text Generation Web UI

A Gradio web UI for running Large Language Models locally. 100% private and offline. Supports text generation, vision, tool-calling, training, image generation, and more.

[Try the Deep Reason extension](https://oobabooga.gumroad.com/l/deep_reason)

|![Image1](https://github.com/oobabooga/screenshots/raw/main/INSTRUCT-3.5.png) | ![Image2](https://github.com/oobabooga/screenshots/raw/main/CHAT-3.5.png) |
|:---:|:---:|
|![Image1](https://github.com/oobabooga/screenshots/raw/main/DEFAULT-3.5.png) | ![Image2](https://github.com/oobabooga/screenshots/raw/main/PARAMETERS-3.5.png) |

## Features

- **Multiple backends**: [llama.cpp](https://github.com/ggerganov/llama.cpp), [Transformers](https://github.com/huggingface/transformers), [ExLlamaV3](https://github.com/turboderp-org/exllamav3), and [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM). Switch between backends and models without restarting.
- **File attachments**: Upload text files, PDF documents, and .docx documents to talk about their contents.
- **Vision (multimodal)**: Attach images to messages for visual understanding ([tutorial](https://github.com/oobabooga/text-generation-webui/wiki/Multimodal-Tutorial)).
- **Tool-calling**: Models can call custom functions during chat, such as web search, page fetching, math, and more. Each tool is a single `.py` file, easy to create and extend ([tutorial](https://github.com/oobabooga/text-generation-webui/wiki/Tool-Calling-Tutorial)).
- **OpenAI-compatible API**: Chat and Completions endpoints with tool-calling support. Use as a local drop-in replacement for the OpenAI API ([examples](https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API#examples)).
- **Training**: Fine-tune LoRAs on multi-turn chat or raw text datasets. Supports resuming interrupted runs ([tutorial](https://github.com/oobabooga/text-generation-webui/wiki/05-%E2%80%90-Training-Tab)).
- **Image generation**: A dedicated tab for `diffusers` models like **Z-Image-Turbo**. Features 4-bit/8-bit quantization and a persistent gallery with metadata ([tutorial](https://github.com/oobabooga/text-generation-webui/wiki/Image-Generation-Tutorial)).
- **Easy setup**: [Portable builds](https://github.com/oobabooga/text-generation-webui/releases) (zero setup, just unzip and run) for GGUF models on Windows/Linux/macOS, or a one-click installer for the full feature set.
- 100% offline and private, with zero telemetry, external resources, or remote update requests.
- `instruct` mode for instruction-following (like ChatGPT), and `chat-instruct`/`chat` modes for talking to custom characters. Prompts are automatically formatted with Jinja2 templates.
- Edit messages, navigate between message versions, and branch conversations at any point.
- Free-form text generation in the Notebook tab without being limited to chat turns.
- Multiple sampling parameters and generation options for sophisticated text generation control.
- Aesthetic UI with dark and light themes.
- Syntax highlighting for code blocks and LaTeX rendering for mathematical expressions.
- Extension support, with numerous built-in and user-contributed extensions available. See the [wiki](https://github.com/oobabooga/text-generation-webui/wiki/07-%E2%80%90-Extensions) and [extensions directory](https://github.com/oobabooga/text-generation-webui-extensions) for details.

## How to install

#### ✅ Option 1: Portable builds (get started in 1 minute)

No installation needed: just download, unzip, and run. All dependencies are included.

Download from here: **https://github.com/oobabooga/text-generation-webui/releases**

- Builds are provided for Linux, Windows, and macOS, with options for CUDA, Vulkan, ROCm, and CPU-only.
- Compatible with GGUF (llama.cpp) models.

#### Option 2: Manual portable install with venv

Very fast setup that should work on any Python 3.9+:

```bash
# Clone repository
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install dependencies (choose appropriate file under requirements/portable for your hardware)
pip install -r requirements/portable/requirements.txt --upgrade

# Launch server (basic command)
python server.py --portable --api --auto-launch

# When done working, deactivate
deactivate
```
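
Because the launch command above includes `--api`, the server also exposes the OpenAI-compatible API. The sketch below sends one chat message with `curl`. It assumes the API listens on its default port 5000 on the same machine (change this with `--api-port`) and that `curl` is installed. `API_URL`, `API_KEY`, `json_escape`, `build_payload`, and `chat` are names made up for this sketch, not part of the project. Set `API_KEY` only if the server was started with `--api-key`.

```bash
#!/bin/sh
# Sketch: send one chat message to the OpenAI-compatible API.
# Assumptions: server started with --api, API on the default port 5000, curl installed.
# API_URL and API_KEY are read by this sketch only, not by the web UI.
API_URL="${API_URL:-http://127.0.0.1:5000}"

# Escape backslashes and double quotes so the prompt fits inside a JSON string.
# (Newlines and other control characters are not handled in this sketch.)
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body for the /v1/chat/completions endpoint.
build_payload() {
  printf '{"messages":[{"role":"user","content":"%s"}],"max_tokens":200}' "$(json_escape "$1")"
}

# POST the request; send a Bearer token only when API_KEY is set (see --api-key).
chat() {
  payload=$(build_payload "$1")
  if [ -n "${API_KEY:-}" ]; then
    curl -sS "$API_URL/v1/chat/completions" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $API_KEY" \
      -d "$payload"
  else
    curl -sS "$API_URL/v1/chat/completions" \
      -H "Content-Type: application/json" \
      -d "$payload"
  fi
}
```

Usage: `chat "Hello! Who are you?"` prints the raw JSON response. In the OpenAI response format, the reply text is found at `choices[0].message.content`.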

#### Option 3: One-click installer

For users who need additional backends (ExLlamaV3, Transformers), training, image generation, or extensions (TTS, voice input, translation, etc.). Requires about 10 GB of disk space and downloads PyTorch.

1. Clone the repository, or [download its source code](https://github.com/oobabooga/text-generation-webui/archive/refs/heads/main.zip) and extract it.
2. Run the startup script for your OS: `start_windows.bat`, `start_linux.sh`, or `start_macos.sh`.
3. When prompted, select your GPU vendor.
4. After installation, open `http://127.0.0.1:7860` in your browser.

To restart the web UI later, run the same `start_` script.

You can pass command-line flags directly (e.g., `./start_linux.sh --help`), or add them to `user_data/CMD_FLAGS.txt` (e.g., `--api` to enable the API).
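
For example, to make the `start_` script always enable the API, you can append the flag to `user_data/CMD_FLAGS.txt` from the repository folder. This sketch makes two assumptions: the file accepts flags written exactly as they would appear on the command line, and lines starting with `#` (like the examples the file ships with) are ignored. If either assumption seems off, check the comments in the file itself.

```bash
#!/bin/sh
# Run from the text-generation-webui folder.
f=user_data/CMD_FLAGS.txt
mkdir -p user_data
# If the file exists and lacks a trailing newline, add one so the flag starts on its own line.
if [ -s "$f" ] && [ -n "$(tail -c 1 "$f")" ]; then
  printf '\n' >> "$f"
fi
# Append --api only if that exact line is not already present (safe to re-run).
grep -qx -- '--api' "$f" 2>/dev/null || printf '%s\n' '--api' >> "$f"
```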

To update, run the update script for your OS: `update_wizard_windows.bat`, `update_wizard_linux.sh`, or `update_wizard_macos.sh`.

To reinstall with a fresh Python environment, delete the `installer_files` folder and run the `start_` script again.

<details>
<summary>
One-click installer details
</summary>

### One-click-installer

The script uses Miniforge to set up a Conda environment in the `installer_files` folder.

If you ever need to install something manually in the `installer_files` environment, you can launch an interactive shell using the cmd script: `cmd_linux.sh`, `cmd_windows.bat`, or `cmd_macos.sh`.

* There is no need to run any of those scripts (`start_`, `update_wizard_`, or `cmd_`) as admin/root.
* To install requirements for extensions, it is recommended to use the update wizard script with the "Install/update extensions requirements" option. At the end, this script will install the main requirements for the project to make sure that they take precedence in case of version conflicts.
* For automated installation, you can use the `GPU_CHOICE`, `LAUNCH_AFTER_INSTALL`, and `INSTALL_EXTENSIONS` environment variables. For instance: `GPU_CHOICE=A LAUNCH_AFTER_INSTALL=FALSE INSTALL_EXTENSIONS=TRUE ./start_linux.sh`.

</details>

<details>
<summary>
Manual full installation with conda or docker
</summary>

### Full installation with Conda

#### 0. Install Conda

https://github.com/conda-forge/miniforge

On Linux or WSL, Miniforge can be automatically installed with these two commands:

```
curl -sL "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh" > "Miniforge3.sh"
bash Miniforge3.sh
```

For other platforms, download from: https://github.com/conda-forge/miniforge/releases/latest

#### 1. Create a new conda environment

```
conda create -n textgen python=3.13
conda activate textgen
```

#### 2. Install PyTorch

| System | GPU | Command |
|--------|---------|---------|
| Linux/WSL | NVIDIA | `pip3 install torch==2.9.1 --index-url https://download.pytorch.org/whl/cu128` |
| Linux/WSL | CPU only | `pip3 install torch==2.9.1 --index-url https://download.pytorch.org/whl/cpu` |
| Linux | AMD | `pip3 install https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/torch-2.9.1%2Brocm7.2.0.lw.git7e1940d4-cp313-cp313-linux_x86_64.whl` |
| macOS + MPS | Any | `pip3 install torch==2.9.1` |
| Windows | NVIDIA | `pip3 install torch==2.9.1 --index-url https://download.pytorch.org/whl/cu128` |
| Windows | CPU only | `pip3 install torch==2.9.1` |

The up-to-date commands can be found here: https://pytorch.org/get-started/locally/.

If you need `nvcc` to compile some library manually, you will additionally need to install this:

```
conda install -y -c "nvidia/label/cuda-12.8.1" cuda
```

#### 3. Install the web UI

```
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements/full/<requirements file according to table below>
```

Requirements file to use:

| GPU | requirements file to use |
|--------|---------|
| NVIDIA | `requirements.txt` |
| AMD | `requirements_amd.txt` |
| CPU only | `requirements_cpu_only.txt` |
| Apple Intel | `requirements_apple_intel.txt` |
| Apple Silicon | `requirements_apple_silicon.txt` |
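
On Linux, WSL, or macOS, the table above can be turned into a small helper that suggests the install command. This is only a sketch. `pick_requirements` and `detect_gpu` are hypothetical helpers, and GPU detection just checks whether `nvidia-smi` or `rocminfo` is on the `PATH`, which is a heuristic. If you know your hardware, pick the file from the table directly. On Windows, choose manually.

```bash
#!/bin/sh
# Hypothetical helper: map OS, CPU architecture, and GPU vendor to a file from the table.
pick_requirements() {
  os=$1 arch=$2 gpu=$3
  case "$os" in
    Darwin)
      if [ "$arch" = arm64 ]; then echo requirements_apple_silicon.txt
      else echo requirements_apple_intel.txt; fi ;;
    *)
      case "$gpu" in
        nvidia) echo requirements.txt ;;
        amd) echo requirements_amd.txt ;;
        *) echo requirements_cpu_only.txt ;;
      esac ;;
  esac
}

# Heuristic: treat the presence of vendor tools on PATH as a sign of that GPU.
detect_gpu() {
  if command -v nvidia-smi >/dev/null 2>&1; then echo nvidia
  elif command -v rocminfo >/dev/null 2>&1; then echo amd
  else echo none; fi
}

# Print (not run) the suggested install command for this machine.
echo "pip install -r requirements/full/$(pick_requirements "$(uname -s)" "$(uname -m)" "$(detect_gpu)")"
```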

### Start the web UI

```
conda activate textgen
cd text-generation-webui
python server.py
```

Then browse to

`http://127.0.0.1:7860`
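
When scripting around the web UI, such as starting `python server.py` in the background and then running other commands, it can help to wait until that address responds. Below is a minimal sketch using `curl`. `wait_for_webui` is a hypothetical helper, and the default of 60 one-second attempts is an arbitrary choice.

```bash
#!/bin/sh
# Hypothetical helper: poll the web UI until it answers any HTTP request.
# Arguments: URL (default http://127.0.0.1:7860) and number of attempts (default 60).
wait_for_webui() {
  url=${1:-http://127.0.0.1:7860}
  tries=${2:-60}
  i=1
  while :; do
    if curl -s -o /dev/null "$url"; then
      echo "Web UI is up at $url"
      return 0
    fi
    if [ "$i" -ge "$tries" ]; then
      echo "No response from $url after $tries attempts" >&2
      return 1
    fi
    i=$((i + 1))
    sleep 1
  done
}
```

The function returns a non-zero status on timeout, so it can gate later steps with `wait_for_webui && ...`.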

#### Manual install

The `requirements*.txt` above contain various wheels precompiled through GitHub Actions. If you wish to compile things manually, or if you need to because no suitable wheels are available for your hardware, you can use `requirements_nowheels.txt` and then install your desired loaders manually.

### Alternative: Docker

```
# Run only the ln -s line that matches your hardware.
# For NVIDIA GPU:
ln -s docker/{nvidia/Dockerfile,nvidia/docker-compose.yml,.dockerignore} .
# For AMD GPU:
ln -s docker/{amd/Dockerfile,amd/docker-compose.yml,.dockerignore} .
# For Intel GPU:
ln -s docker/{intel/Dockerfile,intel/docker-compose.yml,.dockerignore} .
# For CPU only:
ln -s docker/{cpu/Dockerfile,cpu/docker-compose.yml,.dockerignore} .
# Then, in all cases:
cp docker/.env.example .env
# Create the logs/cache directories:
mkdir -p user_data/logs user_data/cache
# Edit .env and set:
#   TORCH_CUDA_ARCH_LIST  based on your GPU model
#   APP_RUNTIME_GID       your host user's group id (run `id -g` in a terminal)
#   BUILD_EXTENSIONS      optionally, a comma-separated list of extensions to build
# Edit user_data/CMD_FLAGS.txt and add the options you want to launch with (like --listen --cpu)
docker compose up --build
```

* You need to have Docker Compose v2.17 or higher installed. See [this guide](https://github.com/oobabooga/text-generation-webui/wiki/09-%E2%80%90-Docker) for instructions.
* For additional docker files, check out [this repository](https://github.com/Atinoda/text-generation-webui-docker).
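
Editing `.env` by hand works fine. For scripted setups, a small helper can fill in the values listed in the comments above. `set_env_var` is a hypothetical helper, not part of the repository, and it assumes `.env` uses plain `KEY=value` lines as copied from `docker/.env.example`.

```bash
#!/bin/sh
# Hypothetical helper (not part of the repository): set KEY=VALUE in an env file.
# Any existing line for KEY is removed and the new value is appended at the end;
# all other lines, including comments, are kept. The file is created if missing.
set_env_var() {
  key=$1 value=$2 file=$3
  tmp="$file.tmp.$$"
  # Copy every line except an existing KEY= assignment (grep exits 1 when nothing is left).
  grep -v "^$key=" "$file" > "$tmp" 2>/dev/null || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}
```

For example, `set_env_var APP_RUNTIME_GID "$(id -g)" .env` stores your host group id. On NVIDIA systems with a reasonably recent driver, `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` prints each GPU's compute capability (for example `8.6`), which is the kind of value `TORCH_CUDA_ARCH_LIST` expects. The helper appends the updated key rather than editing it in place, which keeps it portable across GNU and BSD tools at the cost of changing the line order.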

### Updating the requirements

From time to time, the `requirements*.txt` files change. To update, use these commands:

```
conda activate textgen
cd text-generation-webui
pip install -r <requirements file that you have used> --upgrade
```
</details>

<details>
<summary>
List of command-line flags
</summary>

```txt
usage: server.py [-h] [--user-data-dir USER_DATA_DIR] [--multi-user] [--model MODEL] [--lora LORA [LORA ...]] [--model-dir MODEL_DIR] [--lora-dir LORA_DIR] [--model-menu] [--settings SETTINGS]
                 [--extensions EXTENSIONS [EXTENSIONS ...]] [--verbose] [--idle-timeout IDLE_TIMEOUT] [--image-model IMAGE_MODEL] [--image-model-dir IMAGE_MODEL_DIR] [--image-dtype {bfloat16,float16}]
                 [--image-attn-backend {flash_attention_2,sdpa}] [--image-cpu-offload] [--image-compile] [--image-quant {none,bnb-8bit,bnb-4bit,torchao-int8wo,torchao-fp4,torchao-float8wo}]
                 [--loader LOADER] [--ctx-size N] [--cache-type N] [--model-draft MODEL_DRAFT] [--draft-max DRAFT_MAX] [--gpu-layers-draft GPU_LAYERS_DRAFT] [--device-draft DEVICE_DRAFT]
                 [--ctx-size-draft CTX_SIZE_DRAFT] [--spec-type {none,ngram-mod,ngram-simple,ngram-map-k,ngram-map-k4v,ngram-cache}] [--spec-ngram-size-n SPEC_NGRAM_SIZE_N]
                 [--spec-ngram-size-m SPEC_NGRAM_SIZE_M] [--spec-ngram-min-hits SPEC_NGRAM_MIN_HITS] [--gpu-layers N] [--cpu-moe] [--mmproj MMPROJ] [--streaming-llm] [--tensor-split TENSOR_SPLIT]
                 [--row-split] [--no-mmap] [--mlock] [--no-kv-offload] [--batch-size BATCH_SIZE] [--ubatch-size UBATCH_SIZE] [--threads THREADS] [--threads-batch THREADS_BATCH] [--numa]
                 [--parallel PARALLEL] [--fit-target FIT_TARGET] [--extra-flags EXTRA_FLAGS] [--cpu] [--cpu-memory CPU_MEMORY] [--disk] [--disk-cache-dir DISK_CACHE_DIR] [--load-in-8bit] [--bf16]
                 [--no-cache] [--trust-remote-code] [--force-safetensors] [--no_use_fast] [--attn-implementation IMPLEMENTATION] [--load-in-4bit] [--use_double_quant] [--compute_dtype COMPUTE_DTYPE]
                 [--quant_type QUANT_TYPE] [--gpu-split GPU_SPLIT] [--enable-tp] [--tp-backend TP_BACKEND] [--cfg-cache] [--listen] [--listen-port LISTEN_PORT] [--listen-host LISTEN_HOST] [--share]
                 [--auto-launch] [--gradio-auth GRADIO_AUTH] [--gradio-auth-path GRADIO_AUTH_PATH] [--ssl-keyfile SSL_KEYFILE] [--ssl-certfile SSL_CERTFILE] [--subpath SUBPATH] [--old-colors]
                 [--portable] [--api] [--public-api] [--public-api-id PUBLIC_API_ID] [--api-port API_PORT] [--api-key API_KEY] [--admin-key ADMIN_KEY] [--api-enable-ipv6] [--api-disable-ipv4]
                 [--nowebui] [--temperature N] [--dynatemp-low N] [--dynatemp-high N] [--dynatemp-exponent N] [--smoothing-factor N] [--smoothing-curve N] [--min-p N] [--top-p N] [--top-k N]
                 [--typical-p N] [--xtc-threshold N] [--xtc-probability N] [--epsilon-cutoff N] [--eta-cutoff N] [--tfs N] [--top-a N] [--top-n-sigma N] [--adaptive-target N] [--adaptive-decay N]
                 [--dry-multiplier N] [--dry-allowed-length N] [--dry-base N] [--repetition-penalty N] [--frequency-penalty N] [--presence-penalty N] [--encoder-repetition-penalty N]
                 [--no-repeat-ngram-size N] [--repetition-penalty-range N] [--penalty-alpha N] [--guidance-scale N] [--mirostat-mode N] [--mirostat-tau N] [--mirostat-eta N]
                 [--do-sample | --no-do-sample] [--dynamic-temperature | --no-dynamic-temperature] [--temperature-last | --no-temperature-last] [--sampler-priority N] [--dry-sequence-breakers N]
                 [--enable-thinking | --no-enable-thinking] [--reasoning-effort N] [--chat-template-file CHAT_TEMPLATE_FILE]

Text Generation Web UI

options:
  -h, --help                                           show this help message and exit

Basic settings:
  --user-data-dir USER_DATA_DIR                        Path to the user data directory. Default: auto-detected.
  --multi-user                                         Multi-user mode. Chat histories are not saved or automatically loaded. Best suited for small trusted teams.
  --model MODEL                                        Name of the model to load by default.
  --lora LORA [LORA ...]                               The list of LoRAs to load. If you want to load more than one LoRA, write the names separated by spaces.
  --model-dir MODEL_DIR                                Path to directory with all the models.
  --lora-dir LORA_DIR                                  Path to directory with all the loras.
  --model-menu                                         Show a model menu in the terminal when the web UI is first launched.
  --settings SETTINGS                                  Load the default interface settings from this yaml file. See user_data/settings-template.yaml for an example. If you create a file called
                                                       user_data/settings.yaml, this file will be loaded by default without the need to use the --settings flag.
  --extensions EXTENSIONS [EXTENSIONS ...]             The list of extensions to load. If you want to load more than one extension, write the names separated by spaces.
  --verbose                                            Print the prompts to the terminal.
  --idle-timeout IDLE_TIMEOUT                          Unload model after this many minutes of inactivity. It will be automatically reloaded when you try to use it again.

Image model:
  --image-model IMAGE_MODEL                            Name of the image model to select on startup (overrides saved setting).
  --image-model-dir IMAGE_MODEL_DIR                    Path to directory with all the image models.
  --image-dtype {bfloat16,float16}                     Data type for image model.
  --image-attn-backend {flash_attention_2,sdpa}        Attention backend for image model.
  --image-cpu-offload                                  Enable CPU offloading for image model.
  --image-compile                                      Compile the image model for faster inference.
  --image-quant {none,bnb-8bit,bnb-4bit,torchao-int8wo,torchao-fp4,torchao-float8wo}
                                                       Quantization method for image model.

Model loader:
  --loader LOADER                                      Choose the model loader manually, otherwise, it will get autodetected. Valid options: Transformers, llama.cpp, ExLlamav3_HF, ExLlamav3, TensorRT-
                                                       LLM.

Context and cache:
  --ctx-size, --n_ctx, --max_seq_len N                 Context size in tokens. 0 = auto for llama.cpp (requires gpu-layers=-1), 8192 for other loaders.
  --cache-type, --cache_type N                         KV cache type; valid options: llama.cpp - fp16, q8_0, q4_0; ExLlamaV3 - fp16, q2 to q8 (can specify k_bits and v_bits separately, e.g. q4_q8).

Speculative decoding:
  --model-draft MODEL_DRAFT                            Path to the draft model for speculative decoding.
  --draft-max DRAFT_MAX                                Number of tokens to draft for speculative decoding.
  --gpu-layers-draft GPU_LAYERS_DRAFT                  Number of layers to offload to the GPU for the draft model.
  --device-draft DEVICE_DRAFT                          Comma-separated list of devices to use for offloading the draft model. Example: CUDA0,CUDA1
  --ctx-size-draft CTX_SIZE_DRAFT                      Size of the prompt context for the draft model. If 0, uses the same as the main model.
  --spec-type {none,ngram-mod,ngram-simple,ngram-map-k,ngram-map-k4v,ngram-cache}
                                                       Draftless speculative decoding type. Recommended: ngram-mod.
  --spec-ngram-size-n SPEC_NGRAM_SIZE_N                N-gram lookup size for ngram speculative decoding.
  --spec-ngram-size-m SPEC_NGRAM_SIZE_M                Draft n-gram size for ngram speculative decoding.
  --spec-ngram-min-hits SPEC_NGRAM_MIN_HITS            Minimum n-gram hits for ngram-map speculative decoding.

llama.cpp:
  --gpu-layers, --n-gpu-layers N                       Number of layers to offload to the GPU. -1 = auto.
  --cpu-moe                                            Move the experts to the CPU (for MoE models).
  --mmproj MMPROJ                                      Path to the mmproj file for vision models.
  --streaming-llm                                      Activate StreamingLLM to avoid re-evaluating the entire prompt when old messages are removed.
  --tensor-split TENSOR_SPLIT                          Split the model across multiple GPUs. Comma-separated list of proportions. Example: 60,40.
  --row-split                                          Split the model by rows across GPUs. This may improve multi-gpu performance.
  --no-mmap                                            Prevent mmap from being used.
  --mlock                                              Force the system to keep the model in RAM.
  --no-kv-offload                                      Do not offload the K, Q, V to the GPU. This saves VRAM but reduces the performance.
  --batch-size BATCH_SIZE                              Maximum number of prompt tokens to batch together when calling llama-server. This is the application level batch size.
  --ubatch-size UBATCH_SIZE                            Maximum number of prompt tokens to batch together when calling llama-server. This is the max physical batch size for computation (device level).
  --threads THREADS                                    Number of threads to use.
  --threads-batch THREADS_BATCH                        Number of threads to use for batches/prompt processing.
  --numa                                               Activate NUMA task allocation for llama.cpp.
  --parallel PARALLEL                                  Number of parallel request slots. The context size is divided equally among slots. For example, to have 4 slots with 8192 context each, set
                                                       ctx_size to 32768.
  --fit-target FIT_TARGET                              Target VRAM margin per device for auto GPU layers, comma-separated list of values in MiB. A single value is broadcast across all devices.
                                                       Default: 1024.
  --extra-flags EXTRA_FLAGS                            Extra flags to pass to llama-server. Format: "flag1=value1,flag2,flag3=value3". Example: "override-tensor=exps=CPU"

Transformers/Accelerate:
  --cpu                                                Use the CPU to generate text. Warning: Training on CPU is extremely slow.
  --cpu-memory CPU_MEMORY                              Maximum CPU memory in GiB. Use this for CPU offloading.
  --disk                                               If the model is too large for your GPU(s) and CPU combined, send the remaining layers to the disk.
  --disk-cache-dir DISK_CACHE_DIR                      Directory to save the disk cache to.
  --load-in-8bit                                       Load the model with 8-bit precision (using bitsandbytes).
  --bf16                                               Load the model with bfloat16 precision. Requires NVIDIA Ampere GPU.
  --no-cache                                           Set use_cache to False while generating text. This reduces VRAM usage slightly, but it comes at a performance cost.
  --trust-remote-code                                  Set trust_remote_code=True while loading the model. Necessary for some models.
  --force-safetensors                                  Set use_safetensors=True while loading the model. This prevents arbitrary code execution.
  --no_use_fast                                        Set use_fast=False while loading the tokenizer (it's True by default). Use this if you have any problems related to use_fast.
  --attn-implementation IMPLEMENTATION                 Attention implementation. Valid options: sdpa, eager, flash_attention_2.

bitsandbytes 4-bit:
  --load-in-4bit                                       Load the model with 4-bit precision (using bitsandbytes).
  --use_double_quant                                   use_double_quant for 4-bit.
  --compute_dtype COMPUTE_DTYPE                        compute dtype for 4-bit. Valid options: bfloat16, float16, float32.
  --quant_type QUANT_TYPE                              quant_type for 4-bit. Valid options: nf4, fp4.

ExLlamaV3:
  --gpu-split GPU_SPLIT                                Comma-separated list of VRAM (in GB) to use per GPU device for model layers. Example: 20,7,7.
  --enable-tp, --enable_tp                             Enable Tensor Parallelism (TP) to split the model across GPUs.
  --tp-backend TP_BACKEND                              The backend for tensor parallelism. Valid options: native, nccl. Default: native.
  --cfg-cache                                          Create an additional cache for CFG negative prompts. Necessary to use CFG with that loader.

Gradio:
  --listen                                             Make the web UI reachable from your local network.
  --listen-port LISTEN_PORT                            The listening port that the server will use.
  --listen-host LISTEN_HOST                            The hostname that the server will use.
  --share                                              Create a public URL. This is useful for running the web UI on Google Colab or similar.
  --auto-launch                                        Open the web UI in the default browser upon launch.
  --gradio-auth GRADIO_AUTH                            Set Gradio authentication password in the format "username:password". Multiple credentials can also be supplied with "u1:p1,u2:p2,u3:p3".
  --gradio-auth-path GRADIO_AUTH_PATH                  Set the Gradio authentication file path. The file should contain one or more user:password pairs in the same format as above.
  --ssl-keyfile SSL_KEYFILE                            The path to the SSL certificate key file.
  --ssl-certfile SSL_CERTFILE                          The path to the SSL certificate cert file.
  --subpath SUBPATH                                    Customize the subpath for gradio, use with reverse proxy
  --old-colors                                         Use the legacy Gradio colors, before the December/2024 update.
  --portable                                           Hide features not available in portable mode like training.

API:
  --api                                                Enable the API extension.
  --public-api                                         Create a public URL for the API using Cloudflare.
  --public-api-id PUBLIC_API_ID                        Tunnel ID for named Cloudflare Tunnel. Use together with public-api option.
  --api-port API_PORT                                  The listening port for the API.
  --api-key API_KEY                                    API authentication key.
  --admin-key ADMIN_KEY                                API authentication key for admin tasks like loading and unloading models. If not set, will be the same as --api-key.
  --api-enable-ipv6                                    Enable IPv6 for the API
  --api-disable-ipv4                                   Disable IPv4 for the API
  --nowebui                                            Do not launch the Gradio UI. Useful for launching the API in standalone mode.

API generation defaults:
  --temperature N                                      Temperature
  --dynatemp-low N                                     Dynamic temperature low
  --dynatemp-high N                                    Dynamic temperature high
  --dynatemp-exponent N                                Dynamic temperature exponent
  --smoothing-factor N                                 Smoothing factor
  --smoothing-curve N                                  Smoothing curve
  --min-p N                                            Min P
  --top-p N                                            Top P
  --top-k N                                            Top K
  --typical-p N                                        Typical P
  --xtc-threshold N                                    XTC threshold
  --xtc-probability N                                  XTC probability
  --epsilon-cutoff N                                   Epsilon cutoff
  --eta-cutoff N                                       Eta cutoff
  --tfs N                                              TFS
  --top-a N                                            Top A
  --top-n-sigma N                                      Top N Sigma
  --adaptive-target N                                  Adaptive target
  --adaptive-decay N                                   Adaptive decay
  --dry-multiplier N                                   DRY multiplier
  --dry-allowed-length N                               DRY allowed length
  --dry-base N                                         DRY base
  --repetition-penalty N                               Repetition penalty
  --frequency-penalty N                                Frequency penalty
  --presence-penalty N                                 Presence penalty
  --encoder-repetition-penalty N                       Encoder repetition penalty
  --no-repeat-ngram-size N                             No repeat ngram size
  --repetition-penalty-range N                         Repetition penalty range
  --penalty-alpha N                                    Penalty alpha
  --guidance-scale N                                   Guidance scale
  --mirostat-mode N                                    Mirostat mode
  --mirostat-tau N                                     Mirostat tau
  --mirostat-eta N                                     Mirostat eta
  --do-sample, --no-do-sample                          Do sample
  --dynamic-temperature, --no-dynamic-temperature      Dynamic temperature
  --temperature-last, --no-temperature-last            Temperature last
  --sampler-priority N                                 Sampler priority
  --dry-sequence-breakers N                            DRY sequence breakers
  --enable-thinking, --no-enable-thinking              Enable thinking
  --reasoning-effort N                                 Reasoning effort
  --chat-template-file CHAT_TEMPLATE_FILE              Path to a chat template file (.jinja, .jinja2, or .yaml) to use as the default instruction template for API requests. Overrides the model's
                                                       built-in template.
```

</details>
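As a rough illustration of the API flags above, the sketch below builds (but does not send) a completion request for a server started with `--api` and `--api-key`. It assumes an OpenAI-compatible `/v1/completions` endpoint on the default API port 5000 and a `Bearer` authorization header. Passing extra sampling keys such as `min_p` in the request body is also an assumption. Presumably, any key left out falls back to the server-side values set under "API generation defaults".

```python
import json
import urllib.request


def build_completion_request(prompt, api_key=None, host="127.0.0.1", port=5000, **sampling):
    """Build (but do not send) a POST request for an OpenAI-style /v1/completions endpoint."""
    body = {"prompt": prompt, "max_tokens": 200}
    # Extra sampling keys (assumed to be accepted by the server), e.g. temperature, min_p.
    body.update(sampling)
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumed auth scheme: the --api-key value sent as a Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"http://{host}:{port}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_completion_request("Once upon a time", api_key="my-secret-key", temperature=0.7, min_p=0.05)

# Sending it requires a running server, e.g. `python server.py --api --api-key my-secret-key`:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["text"])
```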

## Downloading models

1. Download a GGUF model file from [Hugging Face](https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads&search=gguf).
2. Place it in the `user_data/models` folder.

That's it. The UI will detect it automatically.

To check what will fit your GPU, you can use the [VRAM Calculator](https://huggingface.co/spaces/oobabooga/accurate-gguf-vram-calculator).
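For a quick back-of-the-envelope check before using the calculator, the sketch below adds the model file size to an estimate of the KV cache. The cache holds one key and one value vector per layer, per context position, and per KV head. This is a simplification, not the calculator's method: it ignores compute buffers and other overhead, and the model dimensions in the example are illustrative numbers for a hypothetical model.

```python
def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, bytes_per_elem=2):
    """Rough KV-cache size: K and V (factor 2) for every layer, position, and KV head."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem


def rough_vram_gib(weights_file_bytes, **kv_kwargs):
    """Model file size (~weights) plus KV cache, in GiB. Ignores compute buffers and overhead."""
    return (weights_file_bytes + kv_cache_bytes(**kv_kwargs)) / 1024**3


# Hypothetical model: 36 layers, 8 KV heads of dimension 128, 8192-token context, 16-bit cache.
kv = kv_cache_bytes(n_layers=36, n_ctx=8192, n_kv_heads=8, head_dim=128)
print(kv / 1024**3)  # 1.125
```

Because the cache grows linearly with context length, halving the context roughly halves this term, which is often the easiest way to make a model fit.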

<details>
<summary>Other model types (Transformers, EXL3)</summary>

Models that consist of multiple files (like 16-bit Transformers models and EXL3 models) should be placed in a subfolder inside `user_data/models`:

```
text-generation-webui
└── user_data
    └── models
        └── Qwen_Qwen3-8B
            ├── config.json
            ├── generation_config.json
            ├── model-00001-of-00004.safetensors
            ├── ...
            ├── tokenizer_config.json
            └── tokenizer.json
```

These formats require the one-click installer (not the portable build).
</details>
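To make the two layouts concrete, here is a hypothetical sketch of how a model folder scan could tell them apart. It treats a top-level `.gguf` file as a single-file model and a subfolder containing `config.json` as a multi-file model. This is an illustration only, not the web UI's actual detection code.

```python
import tempfile
from pathlib import Path


def list_models(models_dir):
    """Hypothetical scan: top-level .gguf files plus subfolders that contain a config.json."""
    found = []
    for entry in sorted(Path(models_dir).iterdir()):
        if entry.is_file() and entry.suffix.lower() == ".gguf":
            found.append(entry.name)  # single-file model (GGUF)
        elif entry.is_dir() and (entry / "config.json").exists():
            found.append(entry.name)  # multi-file model (Transformers, EXL3)
    return found


# Usage: build a throwaway user_data/models-like layout and scan it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "model.Q4_K_M.gguf").touch()
    (root / "Qwen_Qwen3-8B").mkdir()
    (root / "Qwen_Qwen3-8B" / "config.json").write_text("{}")
    (root / "config.yaml").touch()  # not a model, so it is skipped
    demo = list_models(root)
print(demo)
```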

## Documentation

https://github.com/oobabooga/text-generation-webui/wiki

## Community

https://www.reddit.com/r/Oobabooga/

## Acknowledgments

- In August 2023, [Andreessen Horowitz](https://a16z.com/) (a16z) provided a generous grant to encourage and support my independent work on this project. I am **extremely** grateful for their trust and recognition.
- This project was inspired by [AUTOMATIC1111/stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) and wouldn't exist without it.


================================================
FILE: cmd_linux.sh
================================================
#!/usr/bin/env bash

cd "$(dirname "${BASH_SOURCE[0]}")"

if [[ "$(pwd)" =~ " " ]]; then echo This script relies on Miniforge which can not be silently installed under a path with spaces. && exit; fi

# deactivate existing conda envs as needed to avoid conflicts
{ conda deactivate && conda deactivate && conda deactivate; } 2> /dev/null

# config
CONDA_ROOT_PREFIX="$(pwd)/installer_files/conda"
INSTALL_ENV_DIR="$(pwd)/installer_files/env"

# environment isolation
export PYTHONNOUSERSITE=1
unset PYTHONPATH
unset PYTHONHOME
export CUDA_PATH="$INSTALL_ENV_DIR"
export CUDA_HOME="$CUDA_PATH"

# activate env
bash --init-file <(echo "source \"$CONDA_ROOT_PREFIX/etc/profile.d/conda.sh\" && conda activate \"$INSTALL_ENV_DIR\"")


================================================
FILE: cmd_macos.sh
================================================
#!/bin/bash

cd "$(dirname "${BASH_SOURCE[0]}")"

if [[ "$(pwd)" =~ " " ]]; then echo This script relies on Miniforge which can not be silently installed under a path with spaces. && exit; fi

# deactivate existing conda envs as needed to avoid conflicts
{ conda deactivate && conda deactivate && conda deactivate; } 2> /dev/null

# config
CONDA_ROOT_PREFIX="$(pwd)/installer_files/conda"
INSTALL_ENV_DIR="$(pwd)/installer_files/env"

# environment isolation
export PYTHONNOUSERSITE=1
unset PYTHONPATH
unset PYTHONHOME
export CUDA_PATH="$INSTALL_ENV_DIR"
export CUDA_HOME="$CUDA_PATH"

# activate env
source "$CONDA_ROOT_PREFIX/etc/profile.d/conda.sh"
conda activate "$INSTALL_ENV_DIR"
exec bash --norc


================================================
FILE: cmd_windows.bat
================================================
@echo off

cd /D "%~dp0"

set PATH=%PATH%;%SystemRoot%\system32

echo "%CD%"| findstr /C:" " >nul && echo This script relies on Miniforge which can not be silently installed under a path with spaces. && goto end

@rem fix failed install when installing to a separate drive
set TMP=%cd%\installer_files
set TEMP=%cd%\installer_files

@rem deactivate existing conda envs as needed to avoid conflicts
(call conda deactivate && call conda deactivate && call conda deactivate) 2>nul

@rem config
set CONDA_ROOT_PREFIX=%cd%\installer_files\conda
set INSTALL_ENV_DIR=%cd%\installer_files\env

@rem environment isolation
set PYTHONNOUSERSITE=1
set PYTHONPATH=
set PYTHONHOME=
set PYTHONUTF8=1
set "CUDA_PATH=%INSTALL_ENV_DIR%"
set "CUDA_HOME=%CUDA_PATH%"

@rem activate installer env
call "%CONDA_ROOT_PREFIX%\condabin\conda.bat" activate "%INSTALL_ENV_DIR%" || ( echo. && echo Miniforge hook not found. && goto end )

@rem enter commands
cmd /k "%*"

:end
pause


================================================
FILE: docker/.dockerignore
================================================
.env
Dockerfile
/user_data


================================================
FILE: docker/TensorRT-LLM/Dockerfile
================================================
FROM nvidia/cuda:13.0.1-cudnn-runtime-ubuntu24.04

# Install Python 3.12, Git, and OpenMPI
RUN apt update && apt install -y python3.12 python3-pip git build-essential openmpi-bin libopenmpi-dev

# Set the working directory
WORKDIR /app

# This is needed to avoid an error about "Failed to build mpi4py" in the next command
ENV LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH

# Install text-generation-webui
RUN git clone https://github.com/oobabooga/text-generation-webui
WORKDIR /app/text-generation-webui
RUN pip install --break-system-packages -r requirements/full/requirements.txt

# Install TensorRT-LLM
RUN pip3 install --break-system-packages tensorrt_llm==1.1.0 --extra-index-url https://pypi.nvidia.com

# Expose the necessary port for the Python server
EXPOSE 7860 5000

# Run the Python server.py script with the specified command
CMD ["python3", "server.py", "--api", "--listen"]


================================================
FILE: docker/amd/Dockerfile
================================================
# BUILDER
FROM ubuntu:22.04
WORKDIR /builder
ARG BUILD_EXTENSIONS="${BUILD_EXTENSIONS:-}"
ARG APP_UID="${APP_UID:-6972}"
ARG APP_GID="${APP_GID:-6972}"

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked,rw \
    apt update && \
    apt install --no-install-recommends -y git vim build-essential python3-dev pip bash curl && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /home/app/
RUN git clone https://github.com/oobabooga/text-generation-webui.git 
WORKDIR /home/app/text-generation-webui
RUN GPU_CHOICE=B LAUNCH_AFTER_INSTALL=FALSE INSTALL_EXTENSIONS=TRUE ./start_linux.sh --verbose
EXPOSE ${CONTAINER_PORT:-7860} ${CONTAINER_API_PORT:-5000}
WORKDIR /home/app/text-generation-webui
# set umask to ensure group read / write at runtime
CMD umask 0002 && export HOME=/home/app/text-generation-webui && ./start_linux.sh --listen


================================================
FILE: docker/amd/docker-compose.yml
================================================
version: "3.3"
services:
  text-generation-webui:
    build:
      context: .
      args:
        BUILD_EXTENSIONS: ${BUILD_EXTENSIONS:-}
        APP_GID: ${APP_GID:-6972}
        APP_UID: ${APP_UID:-6972}
    env_file: .env
    user: "${APP_RUNTIME_UID:-6972}:${APP_RUNTIME_GID:-6972}"
    ports:
      - "${HOST_PORT:-7860}:${CONTAINER_PORT:-7860}"
      - "${HOST_API_PORT:-5000}:${CONTAINER_API_PORT:-5000}"
    stdin_open: true
    group_add:
      - video
    tty: true
    ipc: host
    devices:
      - /dev/kfd
      - /dev/dri 
    cap_add: 
      - SYS_PTRACE
    security_opt:
      - seccomp=unconfined
    volumes:
      - ./user_data:/home/app/text-generation-webui/user_data


================================================
FILE: docker/cpu/Dockerfile
================================================
# BUILDER
FROM ubuntu:22.04
WORKDIR /builder
ARG BUILD_EXTENSIONS="${BUILD_EXTENSIONS:-}"
ARG APP_UID="${APP_UID:-6972}"
ARG APP_GID="${APP_GID:-6972}"

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked,rw \
    apt update && \
    apt install --no-install-recommends -y git vim build-essential python3-dev pip bash curl && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /home/app/
RUN git clone https://github.com/oobabooga/text-generation-webui.git 
WORKDIR /home/app/text-generation-webui
RUN GPU_CHOICE=N LAUNCH_AFTER_INSTALL=FALSE INSTALL_EXTENSIONS=TRUE ./start_linux.sh --verbose
EXPOSE ${CONTAINER_PORT:-7860} ${CONTAINER_API_PORT:-5000}
# set umask to ensure group read / write at runtime
WORKDIR /home/app/text-generation-webui
CMD umask 0002 && export HOME=/home/app/text-generation-webui && ./start_linux.sh --listen


================================================
FILE: docker/cpu/docker-compose.yml
================================================
version: "3.3"
services:
  text-generation-webui:
    build:
      context: .
      args:
        BUILD_EXTENSIONS: ${BUILD_EXTENSIONS:-}
        APP_GID: ${APP_GID:-6972}
        APP_UID: ${APP_UID:-6972}
    env_file: .env
    user: "${APP_RUNTIME_UID:-6972}:${APP_RUNTIME_GID:-6972}"
    ports:
      - "${HOST_PORT:-7860}:${CONTAINER_PORT:-7860}"
      - "${HOST_API_PORT:-5000}:${CONTAINER_API_PORT:-5000}"
    stdin_open: true
    tty: true
    volumes:
      - ./user_data:/home/app/text-generation-webui/user_data


================================================
FILE: docker/intel/Dockerfile
================================================
# BUILDER
FROM ubuntu:22.04
WORKDIR /builder
ARG BUILD_EXTENSIONS="${BUILD_EXTENSIONS:-}"
ARG APP_UID="${APP_UID:-6972}"
ARG APP_GID="${APP_GID:-6972}"

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked,rw \
    apt update && \
    apt install --no-install-recommends -y git vim build-essential python3-dev pip bash curl && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /home/app/
RUN git clone https://github.com/oobabooga/text-generation-webui.git 
WORKDIR /home/app/text-generation-webui
RUN GPU_CHOICE=D LAUNCH_AFTER_INSTALL=FALSE INSTALL_EXTENSIONS=TRUE ./start_linux.sh --verbose
EXPOSE ${CONTAINER_PORT:-7860} ${CONTAINER_API_PORT:-5000}
# set umask to ensure group read / write at runtime
WORKDIR /home/app/text-generation-webui
CMD umask 0002 && export HOME=/home/app/text-generation-webui && ./start_linux.sh --listen


================================================
FILE: docker/intel/docker-compose.yml
================================================
version: "3.3"
services:
  text-generation-webui:
    build:
      context: .
      args:
        BUILD_EXTENSIONS: ${BUILD_EXTENSIONS:-}
        APP_GID: ${APP_GID:-6972}
        APP_UID: ${APP_UID:-6972}
    env_file: .env
    user: "${APP_RUNTIME_UID:-6972}:${APP_RUNTIME_GID:-6972}"
    ports:
      - "${HOST_PORT:-7860}:${CONTAINER_PORT:-7860}"
      - "${HOST_API_PORT:-5000}:${CONTAINER_API_PORT:-5000}"
    stdin_open: true
    group_add:
      - video
    tty: true
    ipc: host
    devices:
      - /dev/kfd
      - /dev/dri 
    cap_add: 
      - SYS_PTRACE
    security_opt:
      - seccomp=unconfined
    volumes:
      - ./user_data:/home/app/text-generation-webui/user_data


================================================
FILE: docker/nvidia/Dockerfile
================================================
# BUILDER
FROM ubuntu:22.04
WORKDIR /builder
ARG TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST:-3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX}"
ARG BUILD_EXTENSIONS="${BUILD_EXTENSIONS:-}"
ARG APP_UID="${APP_UID:-6972}"
ARG APP_GID="${APP_GID:-6972}"

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked,rw \
    apt update && \
    apt install --no-install-recommends -y git vim build-essential python3-dev pip bash curl && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /home/app/
RUN git clone https://github.com/oobabooga/text-generation-webui.git 
WORKDIR /home/app/text-generation-webui
RUN GPU_CHOICE=A LAUNCH_AFTER_INSTALL=FALSE INSTALL_EXTENSIONS=TRUE ./start_linux.sh --verbose
EXPOSE ${CONTAINER_PORT:-7860} ${CONTAINER_API_PORT:-5000}
WORKDIR /home/app/text-generation-webui
# set umask to ensure group read / write at runtime
CMD umask 0002 && export HOME=/home/app/text-generation-webui && ./start_linux.sh --listen


================================================
FILE: docker/nvidia/docker-compose.yml
================================================
version: "3.3"
services:
  text-generation-webui:
    build:
      context: .
      args:
        # specify the CUDA compute capability of your card: https://developer.nvidia.com/cuda-gpus
        TORCH_CUDA_ARCH_LIST: ${TORCH_CUDA_ARCH_LIST:-8.6;8.9+PTX}
        BUILD_EXTENSIONS: ${BUILD_EXTENSIONS:-}
        APP_GID: ${APP_GID:-6972}
        APP_UID: ${APP_UID:-6972}
    env_file: .env
    user: "${APP_RUNTIME_UID:-6972}:${APP_RUNTIME_GID:-6972}"
    ports:
      - "${HOST_PORT:-7860}:${CONTAINER_PORT:-7860}"
      - "${HOST_API_PORT:-5000}:${CONTAINER_API_PORT:-5000}"
    stdin_open: true
    tty: true
    volumes:
      - ./user_data:/home/app/text-generation-webui/user_data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]


================================================
FILE: docs/01 - Chat Tab.md
================================================
Used to have multi-turn conversations with the model.

## Input area

The main action buttons are:

* **Send**: sends your message and makes the model start a reply.
* **Stop**: stops an ongoing generation as soon as the next token is generated (which can take a while for a slow model).

The hover menu (☰) that appears over the chat area contains:

* **Regenerate**: similar to Send, but your last message is used as input instead of the text in the input field. Note that if the temperature/top_p/top_k parameters are low in the "Parameters" tab of the UI, the new reply may end up identical to the previous one.
* **Continue**: makes the model attempt to continue the existing reply. In some cases, the model may simply end the existing turn immediately without generating anything new, but in other cases, it may generate a longer reply.
* **Remove last reply**: removes the last input/output pair from the history and sends your last message back into the input field.
* **Impersonate**: makes the model generate a new message on your behalf in the input field, taking into consideration the existing chat history.
* **Send dummy message**: adds a new message to the chat history without causing the model to generate a reply.
* **Send dummy reply**: adds a new reply to the chat history as if the model had generated this reply. Useful in conjunction with "Send dummy message".
* **Send to Notebook**: sends the entire chat prompt up to now to the Notebook tab.
* **Show controls**: checkbox that toggles the visibility of the sidebar controls (Start reply with, Mode, Chat style, etc.). Shortcut: Ctrl+S.

## Past chats

Allows you to switch between the current and previous conversations with the current character, or between the current and previous instruct conversations (if in "instruct" mode). The available buttons are:

* **Branch**: creates a branch of the current conversation at a specific message.
* **Rename**: allows you to give a unique name to the selected conversation.
* **🗑️**: deletes the selected conversation.
* **New chat**: starts a new conversation. If you are talking to a character that has a "Greeting" message defined, this message will be automatically added to the new history.

A search field is also available to filter conversations by name.

## Sidebar controls

The sidebar (toggled via "Show controls") contains:

* **Start reply with**: whatever you type there will appear at the start of every reply by the bot. This is useful to guide the response in the desired direction.
* **Reasoning effort**: controls the thinking depth for models that support it. Options: low, medium, high.
* **Enable thinking**: enables extended thinking mode for models that support it.
* **Activate web search**: when enabled, the model can search the web for information before replying. You can also set the number of pages to download.
* **Mode**: see below.
* **Chat style**: see below.
* **Command for chat-instruct mode**: the command that is used in chat-instruct mode to query the model to generate a reply on behalf of the character. Can be used creatively to generate specific kinds of responses. Inside this string, `<|character|>` is a placeholder that gets replaced with the bot name, and `<|prompt|>` is a placeholder that gets replaced with the full chat prompt.

## Mode

The most important input field. It defines how the chat prompt is formatted. There are 3 options: chat, chat-instruct, and instruct. It is worth going into more detail here because the differences are not obvious to many people.

### Instruction-following models

There are two kinds of models: base models, like Llama and GPT-J, and fine-tuned models, like Alpaca and Vicuna. Fine-tuned models are trained starting from base models, most often with the goal of getting the model to understand and respond to instructions just like ChatGPT does. Let's call such models *instruction-following models*.

Each instruction-following model was trained on a specific prompt format, and you have to use that exact prompt format if you want the model to follow your instructions as accurately as it can.

As an example, this is the Alpaca format:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Hi there!

### Response:
Hello! It's nice to meet you. What can I help with?

### Instruction:
How are you?

### Response:
I'm doing well, thank you for asking! Is there something specific you would like to talk about or ask me? I'm here to help answer any questions you may have.
```
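The Alpaca layout above is mechanical enough to generate in a few lines. The sketch below is only an illustration of the format, not how the web UI builds prompts (it uses Jinja2 instruction templates). Leaving the last bot reply as `None` ends the prompt with an open `### Response:` section for the model to complete.

```python
ALPACA_CONTEXT = ("Below is an instruction that describes a task. "
                  "Write a response that appropriately completes the request.")


def alpaca_prompt(turns, context=ALPACA_CONTEXT):
    """Format (user, bot) turns in Alpaca style; a bot reply of None leaves the prompt open."""
    parts = [context]
    for user, bot in turns:
        parts.append(f"### Instruction:\n{user}")
        parts.append(f"### Response:\n{bot}" if bot is not None else "### Response:\n")
    # Sections are separated by a blank line, as in the example above.
    return "\n\n".join(parts)


print(alpaca_prompt([("Hi there!", "Hello! It's nice to meet you. What can I help with?"),
                     ("How are you?", None)]))
```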

This format is characterized by a context string at the top, and alternating turns where each user input starts with `### Instruction:` and each bot turn starts with `### Response:`. There are also weirder formats, like the one used by the Llama-2-chat models released by Meta AI:

```
[INST] <<SYS>>
Answer the questions.
<</SYS>>
Hi there! [/INST] Hello! It's nice to meet you. What can I help with? </s><s>[INST] How are you? [/INST] I'm doing well, thank you for asking! Is there something specific you would like to talk about or ask me? I'm here to help answer any questions you may have.
```
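The same turn structure can produce the Llama-2-chat layout exactly as it is written in the example above. This is an illustrative sketch. In practice, `<s>` and `</s>` are usually special tokens that the tokenizer handles, not literal text.

```python
def llama2_prompt(system, turns):
    """Reproduce the Llama-2-chat layout shown above; a bot reply of None leaves the prompt open."""
    out = f"[INST] <<SYS>>\n{system}\n<</SYS>>\n"
    for i, (user, bot) in enumerate(turns):
        if i > 0:
            out += "[INST] "  # every turn after the first reopens an instruction block
        out += f"{user} [/INST]"
        if bot is not None:
            out += f" {bot} </s><s>"  # end-of-sequence, then beginning-of-sequence
    return out


print(llama2_prompt("Answer the questions.", [("Hi there!", "Hello!"), ("How are you?", None)]))
```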

In this format, there are special tokens at the end of each bot reply (`</s>`, the end of sequence token, and `<s>`, the beginning of sequence token); no new lines separating the turns; and the context string is written between `<<SYS>>` and `<</SYS>>`. Despite the intimidating look of this format, the logic is the same: there are user turns and bot turns, and each one appears in a specific place in the template.

It is important to emphasize that instruction-following models **have to be used with the exact prompt format that they were trained on**. Using those models with any other prompt format should be considered undefined behavior. The model will still generate replies, but they will follow your inputs less accurately.

With instruction-following models defined, we can move on to describing the 3 chat modes.

### Chat

Used for talking to the character defined under the "Character" tab using a simple chat prompt in this format:

```
Chiharu Yamada's Persona: Chiharu Yamada is a young, computer engineer-nerd with a knack for problem solving and a passion for technology.
You: Hi there!
Chiharu Yamada: Hello! It's nice to meet you. What can I help with?
You: How are you?
Chiharu Yamada: I'm doing well, thank you for asking! Is there something specific you would like to talk about or ask me? I'm here to help answer any questions you may have.
```
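A minimal sketch of how such a chat prompt can be assembled from the context string, the two names, and the history follows. The function and argument names are hypothetical. Ending an unanswered turn with the bare character name, so the model writes the next reply, is an assumption consistent with the chat-instruct example further down.

```python
def chat_prompt(context, history, your_name="You", character_name="Chiharu Yamada"):
    """history: list of (user_message, bot_reply); a bot_reply of None ends with the character's name."""
    lines = [context]
    for user, bot in history:
        lines.append(f"{your_name}: {user}")
        lines.append(f"{character_name}: {bot}" if bot is not None else f"{character_name}:")
    return "\n".join(lines)


print(chat_prompt("Chiharu Yamada's Persona: ...", [("Hi there!", "Hello!"), ("How are you?", None)]))
```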

This prompt uses 3 adjustable parameters from the "Character" tab:

* The **Context** string appears at the top of the prompt. Most often it describes the bot's personality and adds a few example messages to guide the model towards the desired reply length and format. This string never gets truncated: as the prompt size increases, old messages get removed one at a time until the prompt becomes smaller than the truncation length set under "Parameters" > "Generation" > "Truncate the prompt up to this length".
* The **Your name** string appears at the beginning of each user message. By default, this string is "You".
* The **Character's name** string appears at the beginning of each bot reply.
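The truncation rule for the Context string can be sketched as follows: the context is kept intact, and the oldest messages are dropped one at a time until the prompt fits. The budget would be `truncation_length - max_new_tokens`, following the formula in the Parameters tab. Counting words instead of real tokens is a stand-in assumption made so the sketch runs without a tokenizer.

```python
def fit_prompt(context, messages, max_prompt_tokens, count_tokens):
    """Keep the context; remove the oldest messages one at a time until the prompt fits."""
    messages = list(messages)
    while messages and count_tokens("\n".join([context] + messages)) > max_prompt_tokens:
        messages.pop(0)  # drop the oldest message first
    return "\n".join([context] + messages)


def count_words(text):
    # Stand-in for the model's tokenizer, which the real truncation would use.
    return len(text.split())


history = ["You: one two", "Bot: three four", "You: five"]
print(fit_prompt("Persona: a helpful bot.", history, max_prompt_tokens=9, count_tokens=count_words))
```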

Additionally, the **Greeting** string appears as the bot's opening message whenever the history is cleared.

The "Chat" option should typically be used only for base models or non-instruct fine-tunes, not for instruction-following models.

### Instruct

Used for talking to an instruction-following model using the prompt format defined under "Parameters" > "Instruction template". Think of this option as an offline ChatGPT.

The prompt format is defined by the **Instruction template** parameter in "Parameters" > "Instruction template", which is a Jinja2 template.

Note that when you load a model in the "Model" tab, the web UI will try to automatically detect its instruction template (if any), and will update the values under "Parameters" > "Instruction template" accordingly. This is done using a set of regular expressions defined in `user_data/models/config.yaml`. This detection is not guaranteed to be accurate. You should check the model card on Hugging Face to see if you are using the correct prompt format.
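The regex-based detection can be pictured as a lookup from model-name patterns to template names. The pattern table below is hypothetical and much smaller than the real rules in `user_data/models/config.yaml`. This sketch returns the first match, while the web UI's own resolution order may differ. That difference is one reason to double-check the result against the model card.

```python
import re

# Hypothetical pattern table; the real rules live in user_data/models/config.yaml.
TEMPLATE_PATTERNS = [
    (r".*llama-2.*chat", "Llama-v2"),
    (r".*alpaca", "Alpaca"),
    (r".*vicuna", "Vicuna-v1.1"),
]


def detect_template(model_name, patterns=TEMPLATE_PATTERNS):
    """Return the template of the first pattern matching the lowercased model name, else None."""
    name = model_name.lower()
    for pattern, template in patterns:
        if re.match(pattern, name):
            return template
    return None


print(detect_template("Llama-2-13b-chat-GPTQ"))
```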

### Chat-instruct

As said above, instruction-following models are meant to be used with their specific prompt templates. The chat-instruct mode allows you to use those templates to generate a chat reply, thus mixing Chat and Instruct modes (hence the name).

It works by creating a single instruction-following turn where a command is given followed by the regular chat prompt. Here is an example in Alpaca format:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Continue the chat dialogue below. Write a single reply for the character "Chiharu Yamada".
Chiharu Yamada's Persona: Chiharu Yamada is a young, computer engineer-nerd with a knack for problem solving and a passion for technology.
You: Hi there!
Chiharu Yamada: Hello! It's nice to meet you. What can I help with?
You: How are you?

### Response:
Chiharu Yamada:
```

Here, the command is

> Continue the chat dialogue below. Write a single reply for the character "Chiharu Yamada".

Below this command, the regular chat prompt is added, including its Context string and the chat history, and then the user turn ends. The bot turn starts with the "Character's name" string followed by `:`, thus prompting the instruction-following model to write a single reply for the character.
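The assembly described above can be sketched in Python. The `<|character|>` and `<|prompt|>` placeholders in the command are filled in, the result becomes the single Alpaca instruction, and the response opens with the character's name. The command string used here is an illustrative value that reproduces the example, not necessarily the web UI's default.

```python
ALPACA_CONTEXT = ("Below is an instruction that describes a task. "
                  "Write a response that appropriately completes the request.")

# Illustrative command reproducing the example above.
COMMAND = 'Continue the chat dialogue below. Write a single reply for the character "<|character|>".\n<|prompt|>'


def chat_instruct_alpaca(command, chat_prompt, character, context=ALPACA_CONTEXT):
    """Fill the placeholders, wrap the result as one Alpaca turn, and open the reply with the name."""
    user_turn = command.replace("<|character|>", character).replace("<|prompt|>", chat_prompt)
    return f"{context}\n\n### Instruction:\n{user_turn}\n\n### Response:\n{character}:"


chat = "You: Hi there!\nChiharu Yamada: Hello! It's nice to meet you. What can I help with?\nYou: How are you?"
print(chat_instruct_alpaca(COMMAND, chat, "Chiharu Yamada"))
```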

Note that you can get creative: instead of writing something trivial like "Write a single reply for the character", you could add more complex instructions like

> This is an adventure game, and your task is to write a reply in name of "<|character|>" where 3 options are given for the user to then choose from.

And it works:

![chat-instruct](https://github.com/oobabooga/text-generation-webui/assets/112222186/e38e3469-8263-4a10-b1a1-3c955026b8e7)

## Chat style

This defines the visual style of the chat UI. Each option is a CSS file defined under `text-generation-webui/css/chat_style-name.css`, where "name" is the name shown for that style in the dropdown menu. You can add a new style by copying `chat_style-cai-chat.css` to `chat_style-myNewStyle.css` and editing the contents of the new file. If you end up with a style that you like, you are highly encouraged to submit it to the repository.

The styles are only applied to chat and chat-instruct modes. Instruct mode has its own style defined in `text-generation-webui/css/html_instruct_style.css`.

## Character gallery

This menu is a built-in extension defined under `text-generation-webui/extensions/gallery`. It displays a gallery with your characters, and if you click on a character, it will be automatically selected in the Character tab.


================================================
FILE: docs/02 - Default and Notebook Tabs.md
================================================
Used to generate raw completions starting from your prompt.

## Default tab

This tab contains two main text boxes: Input, where you enter your prompt, and Output, where the model output will appear.

### Input

The number on the lower right of the Input box counts the number of tokens in the input. It gets updated whenever you update the input text as long as a model is loaded (otherwise there is no tokenizer to count the tokens).

Below the Input box, the following buttons can be found:

* **Continue**: starts a new generation taking as input the text in the "Output" box.
* **Generate**: starts a new generation.
* **Stop**: stops an ongoing generation as soon as the next token is generated (which can take a while for a slow model).

In the **Prompt** menu, you can select from saved prompts stored in `user_data/logs/notebook`. The **New** button creates a new prompt, the **Rename** button renames the selected prompt, and the 🗑️ button deletes it. The 🔄 button refreshes the list.

### Output

Five tabs can be found:

* **Raw**: where the raw text generated by the model appears.
* **Markdown**: it contains a "Render" button. You can click on it at any time to render the current output as markdown. This is particularly useful for models that generate LaTeX equations like GALACTICA.
* **HTML**: displays the output in an HTML style that is meant to be easier to read. Its style is defined under `text-generation-webui/css/html_readable_style.css`.
* **Logits**: when you click on "Get next token probabilities", this tab displays the 50 most likely next tokens and their probabilities based on your current input. If "Use samplers" is checked, the probabilities will be the ones after the sampling parameters in the "Parameters" > "Generation" tab are applied. Otherwise, they will be the raw probabilities generated by the model.
* **Tokens**: allows you to tokenize your prompt and see the ID numbers for the individual tokens.
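The raw probabilities in the Logits tab (the case where "Use samplers" is unchecked) come from a softmax over the model's output scores. The sketch below shows that computation and the top-k ranking on a toy vocabulary. The function name is hypothetical, and `k` defaults to 50 to mirror the tab.

```python
import math


def top_token_probs(logits, vocab, k=50):
    """Softmax over raw logits, then the k most likely tokens (no samplers applied)."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


for token, p in top_token_probs([2.0, 1.0, 0.0], ["the", "a", "an"]):
    print(f"{token!r}: {p:.3f}")
```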

## Notebook tab

The same as the Default tab, except that the output appears in the same text box as the input.

It contains the following additional button:

* **Regenerate**: uses your previous input for generation while discarding the last output.


================================================
FILE: docs/03 - Parameters Tab.md
================================================
## Generation

Contains parameters that control the text generation. 

### Quick rundown

LLMs work by generating one token at a time. Given your prompt, the model calculates the probabilities for every possible next token, and the next token is then chosen from that distribution.

* In *greedy decoding*, the most likely token is always picked.
* Most commonly, *sampling* techniques are used to choose from the next-token distribution in a more sophisticated way, with the goal of improving the quality of the generated text.
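The difference between the two strategies fits in a few lines. This sketch works on an already-computed probability list. Real samplers first reshape the distribution, using the parameters described below, before drawing.

```python
import random


def greedy(probs):
    """Greedy decoding: always pick the index of the most likely token."""
    return max(range(len(probs)), key=probs.__getitem__)


def sample(probs, rng=random):
    """Plain sampling: draw a token index with probability proportional to its weight."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = [0.1, 0.7, 0.2]
print(greedy(probs))                    # 1, every time
print(sample(probs, random.Random(0)))  # usually 1, sometimes 0 or 2
```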

### Preset menu

Can be used to save and load combinations of parameters for reuse.

* **🎲 button**: creates a random yet interpretable preset. Only 1 parameter of each category is included for the categories: removing tail tokens, avoiding repetition, and flattening the distribution. That is, top_p and top_k are not mixed, and neither are repetition_penalty and frequency_penalty. You can use this button to break out of a loop of bad generations after multiple "Regenerate" attempts.
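The "one parameter per category" rule can be illustrated with a small sketch. The category contents and value ranges below are hypothetical choices for illustration, not the values used by the web UI's 🎲 button.

```python
import random

# Hypothetical categories and value ranges.
CATEGORIES = {
    "removing tail tokens": {"top_p": (0.5, 0.95), "top_k": (10, 100), "min_p": (0.02, 0.2)},
    "avoiding repetition": {"repetition_penalty": (1.05, 1.3), "frequency_penalty": (0.0, 0.5)},
    "flattening the distribution": {"temperature": (0.5, 1.5)},
}


def random_preset(rng=random):
    """Pick exactly one parameter per category and give it a random value in its range."""
    preset = {}
    for params in CATEGORIES.values():
        name = rng.choice(sorted(params))
        low, high = params[name]
        if isinstance(low, int):
            preset[name] = rng.randint(low, high)  # integer parameters such as top_k
        else:
            preset[name] = round(rng.uniform(low, high), 2)
    return preset


print(random_preset(random.Random(42)))
```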

#### Built-in presets

These were obtained after a blind contest called "Preset Arena" where hundreds of people voted. The full results can be found [here](https://github.com/oobabooga/oobabooga.github.io/blob/main/arena/results.md).

A key takeaway is that the best presets are:

* **For Instruct**: Divine Intellect, Big O, simple-1.
* **For Chat**: Midnight Enigma, Yara, Shortwave.

The other presets are:

* Mirostat: a special decoding technique first implemented in llama.cpp and then adapted into this repository for all loaders. Many people have obtained positive results with it for chat.
* LLaMA-Precise: a legacy preset that was the default for the web UI before the Preset Arena.
* Debug-deterministic: disables sampling. It is useful for debugging, or if you intentionally want to use greedy decoding.

### Parameters description

For more information about the parameters, the [transformers documentation](https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig) is a good reference.

* **max_new_tokens**: Maximum number of tokens to generate. Don't set it higher than necessary: it is used in the truncation calculation through the formula `(prompt_length) = min(truncation_length - max_new_tokens, prompt_length)`, so your prompt will get truncated if you set it too high.
* **temperature**: Primary factor to control the randomness of outputs. 0 = deterministic (only the most likely token is used). Higher value = more randomness.
* **top_p**: If not set to 1, select the smallest set of most likely tokens whose probabilities add up to at least this number. Higher value = higher range of possible random results.
* **min_p**: Tokens with probability smaller than `(min_p) * (probability of the most likely token)` are discarded. This is the same as top_a but without squaring the probability.
* **top_k**: Similar to top_p, but select only the top_k most likely tokens instead. Higher value = higher range of possible random results.
* **repetition_penalty**: Penalty factor for repeating prior tokens. 1 means no penalty, higher value = less repetition, lower value = more repetition.
* **presence_penalty**: Similar to repetition_penalty, but with an additive offset on the raw token scores instead of a multiplicative factor. It may generate better results. 0 means no penalty, higher value = less repetition, lower value = more repetition. Previously called "additive_repetition_penalty".
* **frequency_penalty**: Repetition penalty that scales based on how many times the token has appeared in the context. Be careful with this; there's no limit to how much a token can be penalized.
* **repetition_penalty_range**: The number of most recent tokens to consider for repetition penalty. 0 means all tokens are used.
* **dry_multiplier**: Set to greater than 0 to enable DRY (Don't Repeat Yourself) sampling. It penalizes tokens that would extend a sequence that already appeared in the context. Recommended value: 0.8.
* **dry_allowed_length**: The longest sequence that can be repeated without being penalized by DRY. Shorter values make DRY more aggressive.
* **dry_base**: Controls how fast the DRY penalty grows with increasing sequence length.
* **typical_p**: If not set to 1, select only tokens that are at least this much more likely to appear than random tokens, given the prior text.
* **tfs**: Tries to detect a tail of low-probability tokens in the distribution and removes those tokens. See [this blog post](https://www.trentonbricken.com/Tail-Free-Sampling/) for details. The closer to 0, the more discarded tokens.
* **top_a**: Tokens with probability smaller than `(top_a) * (probability of the most likely token)^2` are discarded.
* **top_n_sigma**: Keeps only tokens within N standard deviations of the mean log-probability. Acts as an adaptive cutoff that adjusts to the shape of the distribution. 0 disables it.
* **xtc_threshold**: eXclusion from Top Choices (XTC) sampling. If 2 or more tokens have probability above this threshold, the top token may be removed. This encourages the model to use less common word choices and can increase creativity.
* **xtc_probability**: The probability that XTC removal will actually happen when the threshold condition is met. Set to 1 for it to always apply, or lower for occasional application.
* **epsilon_cutoff**: In units of 1e-4; a reasonable value is 3. This sets a probability floor below which tokens are excluded from being sampled.
* **eta_cutoff**: In units of 1e-4; a reasonable value is 3. The main parameter of the special Eta Sampling technique. See [this paper](https://arxiv.org/pdf/2210.15191.pdf) for a description.
* **guidance_scale**: The main parameter for Classifier-Free Guidance (CFG). [The paper](https://arxiv.org/pdf/2306.17806.pdf) suggests that 1.5 is a good value. It can be used in conjunction with a negative prompt or not.
* **Negative prompt**: Only used when `guidance_scale != 1`. It is most useful for instruct models and custom system messages. You place your full prompt in this field with the system message replaced with the default one for the model (like "You are Llama, a helpful assistant...") to make the model pay more attention to your custom system message.
* **penalty_alpha**: Contrastive Search is enabled by setting this to greater than zero and unchecking "do_sample". It should be used with a low value of top_k, for instance, top_k = 4.
* **mirostat_mode**: Activates Mirostat sampling, an adaptive decoding method that dynamically controls output perplexity for higher-quality text generation. 0 is disabled. 1 is the classic Mirostat algorithm described in [the paper](https://arxiv.org/abs/2007.14966), but can be less stable, or “wobbly,” and produce less coherent text. 2 is the improved version that is more stable and has lower perplexity, recommended for most use cases. 
*Note: Use either mirostat or dynamic_temperature, not both at the same time.*
* **mirostat_tau**: Target perplexity for Mirostat sampling. Controls how “surprising” the text is. Higher values = more diverse, lower = more predictable. Preset Arena suggests 8 as a good value.
* **mirostat_eta**: Learning rate for Mirostat’s perplexity adjustment. Higher values = adapts faster but less stable, lower values = slower but more stable. Preset Arena suggests 0.1 as a good value.
* **adaptive_target**: Target probability for adaptive-p sampling. This method adjusts the sampling threshold dynamically based on an exponential moving average of recent token probabilities. 0 disables it.
* **adaptive_decay**: EMA decay rate for adaptive-p sampling. Controls how quickly the running average adjusts. Default: 0.9.
* **dynamic_temperature**: Activates Dynamic Temperature. This modifies temperature to range between "dynatemp_low" (minimum) and "dynatemp_high" (maximum), with an entropy-based scaling. The steepness of the curve is controlled by "dynatemp_exponent". 
*Note: Use either dynamic_temperature or mirostat, not both at the same time.*
* **smoothing_factor**: Activates Quadratic Sampling. When `0 < smoothing_factor < 1`, the logits distribution becomes flatter. When `smoothing_factor > 1`, it becomes more peaked.
* **smoothing_curve**: Adjusts the dropoff curve of Quadratic Sampling. Higher values make the curve steeper. Only takes effect when smoothing_factor is set.
* **temperature_last**: Makes temperature the last sampler instead of the first. With this, you can remove low probability tokens with a sampler like min_p and then use a high temperature to make the model creative without losing coherency. Note: this parameter takes precedence over "Sampler priority". That means that `temperature`/`dynamic_temperature`/`quadratic_sampling` will be removed from wherever they are and moved to the end of the stack.
* **do_sample**: When unchecked, sampling is entirely disabled, and greedy decoding is used instead (the most likely token is always picked).
* **Seed**: Set the PyTorch seed to this number. Note that some loaders do not use PyTorch (notably llama.cpp). For these loaders, the seed has no effect.
* **encoder_repetition_penalty**: Also known as the "Hallucinations filter". Used to penalize tokens that are *not* in the prior text. Higher value = more likely to stay in context, lower value = more likely to diverge.
* **no_repeat_ngram_size**: If not set to 0, specifies the length of token sets that are completely blocked from repeating at all. Higher values = blocks larger phrases, lower values = blocks words or letters from repeating. Only 0 or high values are a good idea in most cases.
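Some of the filtering samplers above are easier to grasp in code. The following is a simplified Python sketch, not the web UI's implementation: it applies temperature, then top_k, top_p, and min_p to a toy distribution, renormalizing after each step. The fixed order here is an assumption for illustration; in the web UI the order is governed by "Sampler priority" and "temperature_last".

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def renormalize(probs):
    """Rescale the surviving probabilities so they sum to 1 again."""
    total = sum(probs)
    return [p / total for p in probs]


def ranked(probs):
    """Token indices sorted from most to least likely."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


def top_k_filter(probs, k):
    """Keep only the k most likely tokens."""
    keep = set(ranked(probs)[:k])
    return [p if i in keep else 0.0 for i, p in enumerate(probs)]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose probabilities add up to at least top_p."""
    keep, cumulative = set(), 0.0
    for i in ranked(probs):
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return [p if i in keep else 0.0 for i, p in enumerate(probs)]


def min_p_filter(probs, min_p):
    """Discard tokens less likely than min_p times the most likely token."""
    threshold = min_p * max(probs)
    return [p if p >= threshold else 0.0 for p in probs]


logits = [3.0, 2.5, 1.0, 0.2, -1.0]  # hypothetical scores
temperature = 0.7
probs = softmax([x / temperature for x in logits])  # temperature: <1 sharpens, >1 flattens
probs = renormalize(top_k_filter(probs, k=3))
probs = renormalize(top_p_filter(probs, top_p=0.9))
probs = renormalize(min_p_filter(probs, min_p=0.1))
print([round(p, 3) for p in probs])
```

Each filter only zeroes out tokens and the survivors are renormalized, so stacking several filters narrows the candidate pool step by step before the final random draw.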

To the right (or below if you are on mobile), the following parameters are present:

* **Truncate the prompt up to this length**: Used to prevent the prompt from getting bigger than the model's context length. In the case of the transformers loader, which allocates memory dynamically, this parameter can also be used to set a VRAM ceiling and prevent out-of-memory errors. This parameter is automatically updated with the model's context length (from "ctx_size" for loaders that use this parameter, and from the model metadata directly for loaders that do not) when you load a model.
* **Maximum number of tokens/second**: to make text readable in real-time in case the model is generating too fast. Good if you want to flex and tell everyone how good your GPU is.
* **Custom system message**: If not empty, will be used instead of the default system message in the instruction template. Useful for customizing the personality of the chatbot. Example: "You are a duck."
* **Custom stopping strings**: The model stops generating as soon as any of the strings set in this field is generated. Note that when generating text in the Chat tab, some default stopping strings are set regardless of this parameter, like "\nYour Name:" and "\nBot name:" for chat mode. That's why this parameter has a "Custom" in its name.
* **Custom token bans**: Allows you to ban the model from generating certain tokens altogether. You need to find the token IDs under "Default" > "Tokens" or "Notebook" > "Tokens", or by looking at the `tokenizer.json` for the model directly.
* **auto_max_new_tokens**: When checked, the max_new_tokens parameter is expanded in the backend to the available context length. The maximum length is given by the "truncation_length" parameter. This is useful for getting long replies in the Chat tab without having to click on "Continue" many times.
* **Ban the eos_token**: One of the possible tokens that a model can generate is the EOS (End of Sequence) token. When it is generated, generation stops, possibly before "max_new_tokens" is reached. When this parameter is checked, that token is banned from being generated, so the model will always generate "max_new_tokens" tokens.
* **Add the bos_token to the beginning of prompts**: By default, the tokenizer will add a BOS (Beginning of Sequence) token to your prompt. During training, BOS tokens are used to separate different documents. If unchecked, no BOS token will be added, and the model will interpret your prompt as being in the middle of a document instead of at the start of one. This significantly changes the output and can make it more creative.
* **Skip special tokens**: When decoding the generated tokens, skip special tokens from being converted to their text representation. Otherwise, BOS appears as `<s>`, EOS as `</s>`, etc.
* **prompt_lookup_num_tokens**: Activates Prompt Lookup Decoding, a form of speculative decoding for the Transformers loader. It guesses future tokens by looking for matching patterns in the prompt itself, which can speed up generation for tasks that involve repeating or paraphrasing parts of the input.
* **Activate text streaming**: When unchecked, the full response is output all at once instead of being streamed one word at a time. I recommend unchecking this parameter on high-latency networks, like when running the web UI on Google Colab or using `--share`.
* **Static KV cache**: Use a static cache for improved performance with the Transformers loader. May not be compatible with all models.
* **Sampler priority**: Allows you to customize the order in which the different samplers are applied. The first sampler on the list gets applied first. With this, custom orders like `top_p -> temperature -> top_k` can be defined.
* **DRY sequence breakers**: Tokens across which DRY sequence matching is not continued. Typically punctuation and special tokens. Only used when DRY is active (dry_multiplier > 0).
* **Load grammar from file**: Loads a GBNF grammar from a file under `user_data/grammars`. The output is written to the "Grammar" box below. You can also save and delete custom grammars using this menu.
* **Grammar**: Allows you to constrain the model output to a particular format. For instance, you can make the model generate lists, JSON, specific words, etc. Grammar is extremely powerful and I highly recommend it. The syntax looks a bit daunting at first sight, but it gets very easy once you understand it. See the [GBNF Guide](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md) for details.
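The truncation rule from the max_new_tokens description, together with the auto_max_new_tokens behavior, can be sketched in a few lines of Python. This is an illustration that assumes truncation keeps the most recent tokens (dropping the oldest ones, as a long chat history needs); the function names are hypothetical.

```python
def max_prompt_length(truncation_length, max_new_tokens):
    """Tokens left for the prompt after reserving room for the reply."""
    return truncation_length - max_new_tokens


def truncate_prompt(prompt_ids, truncation_length, max_new_tokens):
    """prompt_length = min(truncation_length - max_new_tokens, prompt_length), keeping the newest tokens."""
    limit = max_prompt_length(truncation_length, max_new_tokens)
    if limit <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    return prompt_ids[-limit:]


def auto_max_new_tokens(prompt_length, truncation_length):
    """With auto_max_new_tokens checked, the reply may use all remaining context."""
    return truncation_length - prompt_length


prompt = list(range(3000))  # stand-in for 3000 token IDs
kept = truncate_prompt(prompt, truncation_length=4096, max_new_tokens=2048)
print(len(kept), kept[0])  # 2048 952: the oldest 952 tokens were dropped
print(auto_max_new_tokens(len(kept), truncation_length=4096))  # 2048
```

With max_new_tokens set to 512 instead, up to 3584 prompt tokens would fit, and this 3000-token prompt would pass through untouched.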

### Chat tab controls

The following parameters appear in the Chat tab sidebar rather than the Parameters tab:

* **reasoning_effort**: Controls the thinking depth for models that support it (used by GPT-OSS). Options: low, medium, high.
* **enable_thinking**: Enables extended thinking mode for models that support it (used by Seed-OSS and pre-2507 Qwen3). When enabled, the model can use a thinking step before generating its reply.

## Instruction template

This sub-tab within the Parameters tab defines the instruction template used in the Chat tab when "instruct" or "chat-instruct" are selected under "Mode".

* **Saved instruction templates**: A dropdown menu where you can select a template. Click **Load** to apply it. The 💾 button saves the current template, and the 🗑️ button deletes the selected one.
* **Instruction template**: A Jinja2 template that defines the prompt format for the instruction-following conversation.
* **Send to notebook**: Send the full instruction template in string format to the Notebook tab.
* **Chat template**: A Jinja2 template that defines the prompt format for regular chat conversations with characters.

## Character tab

The Character tab is a separate top-level tab that contains the following sub-tabs:

### Character

Parameters that define the character used in the Chat tab when "chat" or "chat-instruct" are selected under "Mode".

* **Character**: A dropdown menu where you can select from saved characters, save a new character (💾 button), and delete the selected character (🗑️). The **Restore character** button resets the character to its last saved state.
* **Character's name**: The bot name as it appears in the prompt.
* **Context**: A string that is always at the top of the prompt. It never gets truncated. It usually defines the bot's personality and some key elements of the conversation.
* **Greeting**: An opening message for the bot. When set, it appears whenever you start a new chat.
* **Character picture**: A profile picture for the bot. To make it apply, you need to save the bot by clicking on 💾.
* **Your picture**: Your profile picture. It will be used in all conversations.

Note: the following replacements take place in the context and greeting fields when the chat prompt is generated:

* `{{char}}` and `<BOT>` get replaced with "Character's name".
* `{{user}}` and `<USER>` get replaced with "Your name".

So you can use those special placeholders in your character definitions. They are commonly found in TavernAI character cards.
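The substitution amounts to plain text replacement. The following Python sketch illustrates it; the function is hypothetical and matches the placeholders exactly as written above.

```python
def replace_placeholders(text, char_name, user_name):
    """Substitute the character and user placeholders in a context or greeting."""
    for placeholder in ("{{char}}", "<BOT>"):
        text = text.replace(placeholder, char_name)
    for placeholder in ("{{user}}", "<USER>"):
        text = text.replace(placeholder, user_name)
    return text


greeting = "Hi {{user}}! I'm {{char}}. <BOT> is glad to meet <USER>."
print(replace_placeholders(greeting, char_name="Aqua", user_name="Alex"))
# Hi Alex! I'm Aqua. Aqua is glad to meet Alex.
```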

### User

Allows you to create and manage user profiles.

* **User**: A dropdown to select, save (💾), or delete (🗑️) user profiles.
* **Name**: Your name as it appears in the prompt.
* **Description**: An optional description of yourself that can be referenced in conversations.

### Chat history

In this tab, you can download the current chat history in JSON format and upload a previously saved chat history.

When a history is uploaded, a new chat is created to hold it. That is, you don't lose your current chat in the Chat tab.

### Upload character

#### YAML or JSON

Allows you to upload characters in the YAML format used by the web UI, including optionally a profile picture.

#### TavernAI PNG

Allows you to upload a TavernAI character card. It will be converted to the internal YAML format of the web UI after upload.


================================================
FILE: docs/04 - Model Tab.md
================================================
This is where you load models, apply LoRAs to a loaded model, and download new models.

## Model loaders

### llama.cpp

Loads: GGUF models. Note: GGML models have been deprecated and do not work anymore.

Example: https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF

* **gpu_layers**: The number of layers to allocate to the GPU. If set to 0, only the CPU will be used. If you want to offload all layers, you can simply set this to the maximum value.
* **ctx_size**: Context length of the model. In llama.cpp, the cache is preallocated, so the higher this value, the higher the VRAM. It is automatically set to the maximum sequence length for the model based on the metadata inside the GGUF file, but you may need to lower this value to fit the model into your GPU. Set to 0 for automatic context size based on available memory. After loading the model, the "Truncate the prompt up to this length" parameter under "Parameters" > "Generation" is automatically set to your chosen "ctx_size" so that you don't have to set the same thing twice.
* **cache_type**: KV cache quantization type. Valid options: `fp16`, `q8_0`, `q4_0`. Lower quantization saves VRAM at the cost of some quality.
* **tensor_split**: For multi-gpu only. Sets the amount of memory to allocate per GPU as proportions. Not to be confused with other loaders where this is set in GB; here you can set something like `30,70` for 30%/70%.
* **batch_size**: Maximum number of prompt tokens to batch together when calling llama_eval.
* **ubatch_size**: Physical maximum batch size for prompt processing.
* **threads**: Number of threads. Recommended value: your number of physical cores.
* **threads_batch**: Number of threads for batch processing. Recommended value: your total number of cores (physical + virtual).
* **cpu_moe**: Force MoE expert layers to run on the CPU, keeping the rest on the GPU.
* **extra_flags**: Extra flags to pass to llama-server. Format: `flag1=value1,flag2,flag3=value3`. Example: `override-tensor=exps=CPU`.
* **mmproj**: Path to the mmproj file for multimodal (vision) models. This enables image understanding capabilities.
* **streaming_llm**: Experimental feature to avoid re-evaluating the entire prompt when part of it is removed, for instance, when you hit the context length for the model in chat mode and an old message is removed.
* **cpu**: Force a version of llama.cpp compiled without GPU acceleration to be used. Can usually be ignored. Only set this if you want to use CPU only and llama.cpp doesn't work otherwise.
* **row_split**: Split the model by rows across GPUs. This may improve multi-gpu performance.
* **no_kv_offload**: Do not offload the KV cache to the GPU. This saves VRAM but reduces performance.
* **no_mmap**: Loads the model into memory at once, possibly preventing I/O operations later on at the cost of a longer load time.
* **mlock**: Force the system to keep the model in RAM rather than swapping or compressing.
* **numa**: May improve performance on certain multi-cpu systems.
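The extra_flags format above maps naturally onto llama-server command-line arguments. The following Python sketch shows one way to do that conversion; it is a hypothetical helper, not the web UI's own parser. It splits each item only on its first `=` so that values like `exps=CPU` stay intact, and it assumes no flag value contains a comma.

```python
def extra_flags_to_args(extra_flags):
    """Convert 'flag1=value1,flag2,flag3=value3' into a llama-server style argument list."""
    args = []
    for item in extra_flags.split(","):
        item = item.strip()
        if not item:
            continue
        name, has_value, value = item.partition("=")  # split on the first '=' only
        args.append(name if name.startswith("-") else f"--{name}")
        if has_value:
            args.append(value)
    return args


print(extra_flags_to_args("override-tensor=exps=CPU"))
# ['--override-tensor', 'exps=CPU']
print(extra_flags_to_args("flag1=value1,flag2,flag3=value3"))
# ['--flag1', 'value1', '--flag2', '--flag3', 'value3']
```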

### Transformers

Loads: full precision (16-bit or 32-bit) models, as well as bitsandbytes-quantized models. The repository name usually doesn't contain GGUF or EXL3, and the model files are named `model.safetensors` or split into parts like `model-00001-of-00004.safetensors`.

Example: [https://huggingface.co/lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5).

Full precision models use a ton of VRAM, so you will usually want to select the "load_in_4bit" and "use_double_quant" options to load the model in 4-bit precision using bitsandbytes.

Options:

* **gpu_split**: When using multiple GPUs, sets the amount of VRAM in GB to allocate per GPU. Example: `20,7,7`.
* **cpu_memory**: Maximum CPU memory in GiB to use for CPU offloading via the accelerate library. Whatever doesn't fit in the GPU or CPU will go to a disk cache if the "disk" checkbox is enabled.
* **compute_dtype**: Used when "load_in_4bit" is checked. I recommend leaving the default value.
* **quant_type**: Used when "load_in_4bit" is checked. I recommend leaving the default value.
* **attn_implementation**: Choose the attention implementation. Valid options: `sdpa`, `eager`, `flash_attention_2`. The default (`sdpa`) works well in most cases; `flash_attention_2` may be useful for training.
* **cpu**: Loads the model in CPU mode using PyTorch. The model will be loaded in 32-bit precision, so a lot of RAM will be used. CPU inference with transformers predates llama.cpp and works, but it's a lot slower. Note: this parameter has a different interpretation in the llama.cpp loader (see above).
* **load_in_8bit**: Load the model in 8-bit precision using bitsandbytes. The 8-bit kernel in that library has been optimized for training and not inference, so load_in_8bit is slower than load_in_4bit (but more accurate).
* **bf16**: Use bfloat16 precision instead of float16 (the default). Only applies when quantization is not used.
* **disk**: Enable disk offloading for layers that don't fit into the GPU and CPU combined.
* **load_in_4bit**: Load the model in 4-bit precision using bitsandbytes.
* **use_double_quant**: Use double quantization with 4-bit loading for reduced memory usage.
* **trust-remote-code**: Some models use custom Python code to load the model or the tokenizer. For such models, this option needs to be set. It doesn't download any remote content: all it does is execute the .py files that get downloaded with the model. Those files can potentially include malicious code; I have never seen it happen, but it is in principle possible.
* **no_use_fast**: Do not use the "fast" version of the tokenizer. Can usually be ignored; only check this if you can't load the tokenizer for your model otherwise.
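In the Transformers ecosystem, per-device memory limits like gpu_split and cpu_memory are usually expressed as a `max_memory` dictionary passed to `from_pretrained` together with `device_map="auto"` (handled by accelerate). The following Python sketch builds such a dictionary from UI-style values; the helper name is hypothetical, and using the `GiB` suffix for the GPU values is an assumption (the UI describes gpu_split in GB).

```python
def build_max_memory(gpu_split="", cpu_memory=None):
    """Turn e.g. gpu_split="20,7,7" and cpu_memory=64 into an accelerate-style max_memory dict."""
    max_memory = {}
    amounts = [a.strip() for a in gpu_split.split(",") if a.strip()]
    for device_id, amount in enumerate(amounts):
        max_memory[device_id] = f"{amount}GiB"  # integer keys 0, 1, 2... are GPU indices
    if cpu_memory:
        max_memory["cpu"] = f"{cpu_memory}GiB"  # cap for layers offloaded to system RAM
    return max_memory


print(build_max_memory("20,7,7", cpu_memory=64))
# {0: '20GiB', 1: '7GiB', 2: '7GiB', 'cpu': '64GiB'}
```

The result could then be passed as `AutoModelForCausalLM.from_pretrained(path, device_map="auto", max_memory=...)`.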

### ExLlamav3_HF

Loads: EXL3 models. These models usually have "EXL3" or "exl3" in the model name.

Uses the ExLlamaV3 backend with Transformers samplers.

* **ctx_size**: Context length of the model. The cache is preallocated, so the higher this value, the higher the VRAM. It is automatically set to the maximum sequence length for the model based on its metadata, but you may need to lower this value to fit the model into your GPU. After loading the model, the "Truncate the prompt up to this length" parameter under "Parameters" > "Generation" is automatically set to your chosen "ctx_size" so that you don't have to set the same thing twice.
* **cache_type**: KV cache quantization type. Valid options: `fp16`, `q2` to `q8`. You can also specify key and value bits separately, e.g. `q4_q8`. Lower quantization saves VRAM at the cost of some quality.
* **gpu_split**: Comma-separated list of VRAM (in GB) to use per GPU device for model layers. Example: `20,7,7`.
* **cfg_cache**: Creates a second cache to hold the CFG negative prompts. You need to set this if and only if you intend to use CFG in the "Parameters" > "Generation" tab. Checking this parameter doubles the cache VRAM usage.
* **no_use_fast**: Do not use the "fast" version of the tokenizer.
* **enable_tp**: Enable Tensor Parallelism (TP) to split the model across GPUs.
* **tp_backend**: The backend for tensor parallelism. Valid options: `native`, `nccl`. Default: `native`.
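To see why ctx_size and cache_type matter so much for VRAM, here is a back-of-the-envelope Python sketch of KV cache size. It uses the standard estimate (one key and one value vector per layer, per KV head, per token) and ignores the small overhead of quantization scales. The layer and head numbers correspond to a Llama-2-7B-shaped model and are only an example; reading `q4_q8` as 4-bit keys and 8-bit values is an assumption.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_size, key_bits=16, value_bits=16):
    """Approximate KV cache size: keys and values for every layer, KV head, and token."""
    elements = n_layers * n_kv_heads * head_dim * ctx_size  # per tensor (K or V)
    return elements * (key_bits + value_bits) // 8


GIB = 2**30
shape = dict(n_layers=32, n_kv_heads=32, head_dim=128, ctx_size=4096)  # Llama-2-7B-like
for name, kb, vb in [("fp16", 16, 16), ("q8", 8, 8), ("q4_q8", 4, 8), ("q4", 4, 4)]:
    print(name, kv_cache_bytes(**shape, key_bits=kb, value_bits=vb) / GIB, "GiB")
# fp16 2.0 GiB, q8 1.0 GiB, q4_q8 0.75 GiB, q4 0.5 GiB
```

Doubling ctx_size doubles every figure, which is why lowering it is usually the first thing to try when a model does not fit.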

### ExLlamav3

The same as ExLlamav3_HF but using the internal samplers of ExLlamaV3 instead of the ones in the Transformers library. Supports speculative decoding with a draft model. Also supports multimodal (vision) models natively.

* **ctx_size**: Same as ExLlamav3_HF.
* **cache_type**: Same as ExLlamav3_HF.
* **gpu_split**: Same as ExLlamav3_HF.
* **enable_tp**: Enable Tensor Parallelism (TP) to split the model across GPUs.
* **tp_backend**: The backend for tensor parallelism. Valid options: `native`, `nccl`. Default: `native`.

### TensorRT-LLM

Loads: TensorRT-LLM engine models. These are highly optimized models compiled specifically for NVIDIA GPUs.

* **ctx_size**: Context length of the model.
* **cpp_runner**: Use the ModelRunnerCpp runner, which is faster than the default ModelRunner but doesn't support streaming yet.

## Model dropdown

Here you can select a model to be loaded, refresh the list of available models, load/unload/reload the selected model, and save the settings for the model. The "settings" are the values in the input fields (checkboxes, sliders, dropdowns) below this dropdown.

After saving, those settings will get restored whenever you select that model again in the dropdown menu.

If the **Autoload the model** checkbox is selected, the model will be loaded as soon as it is selected in this menu. Otherwise, you will have to click on the "Load" button.

## LoRA dropdown

Used to apply LoRAs to the model. Note that LoRA support is not implemented for all loaders. Check the [What Works](https://github.com/oobabooga/text-generation-webui/wiki/What-Works) page for details.

## Download model or LoRA

Here you can download a model or LoRA directly from the https://huggingface.co/ website.

* Models will be saved to `user_data/models`.
* LoRAs will be saved to `user_data/loras`.

In the input field, you can enter either the Hugging Face username/model path (like `facebook/galactica-125m`) or the full model URL (like `https://huggingface.co/facebook/galactica-125m`). To specify a branch, add it at the end after a ":" character like this: `facebook/galactica-125m:main`.
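The accepted input forms can be summarized with a small Python sketch that normalizes them to a repository ID and a branch. The helper is hypothetical (the repository ships its own `download-model.py` for actual downloads), and defaulting to `main` when no branch is given is an assumption.

```python
HF_PREFIX = "https://huggingface.co/"


def parse_model_spec(spec, default_branch="main"):
    """Normalize 'user/model', 'user/model:branch', or a full URL to (repo_id, branch)."""
    spec = spec.strip()
    if spec.startswith(HF_PREFIX):
        spec = spec[len(HF_PREFIX):]  # drop the URL prefix so its ':' is not mistaken for a branch
    spec = spec.strip("/")
    repo_id, _, branch = spec.partition(":")
    return repo_id, branch or default_branch


for s in ["facebook/galactica-125m",
          "https://huggingface.co/facebook/galactica-125m",
          "facebook/galactica-125m:main"]:
    print(parse_model_spec(s))  # ('facebook/galactica-125m', 'main') in all three cases
```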

To download a single file, as necessary for models in GGUF format, you can click on "Get file list" after entering the model path in the input field, and then copy and paste the desired file name in the "File name" field before clicking on "Download".


================================================
FILE: docs/05 - Training Tab.md
================================================
## Training Your Own LoRAs

A LoRA is tied to a specific model architecture — a LoRA trained on Llama 3 8B won't work on Mistral 7B. Train on the exact model you plan to use.

### Quick Start

1. Load your base model with the **Transformers** loader (no LoRAs loaded).
2. Open the **Training** tab > **Train LoRA**.
3. Pick a dataset and configure parameters (see [below](#parameters)).
4. Click **Start LoRA Training** and monitor the [loss](#loss).
5. When done, load the LoRA from the **Model** tab and test it.

### Resuming Training

To resume from a checkpoint, use the same LoRA name and uncheck `Override Existing Files`. If checkpoints exist (from `Save every n steps`), training will automatically resume from the latest one with full optimizer and scheduler state preserved. Note that you cannot change the `Rank` of an already created LoRA.

You should also use `Copy parameters from` to restore the UI settings (learning rate, epochs, etc.) from the previous run, so that training continues with the same configuration.

### Troubleshooting

- **Corrupted outputs**: Start over with a lower Learning Rate.
- **Not learning enough**: Run more epochs, or increase the Rank.
- **Unwanted formatting**: Tweak your dataset, or train for fewer steps.

## Instruction Templates

All instruction/chat training uses `apply_chat_template()` with Jinja2 templates. You have two options in the **Instruction Template** dropdown:

- **Chat Template**: Uses the model's built-in chat template from its tokenizer. Works with instruct/chat models that ship with a chat template (Llama 3, Qwen, Mistral, etc.).
- **Named template** (e.g. ChatML, Alpaca, Llama-v3, etc.): Loads a Jinja2 template from `user_data/instruction-templates/`. This is useful for base models that don't have a built-in template, or when you want to override the model's default template.

Both options are functionally identical — the only difference is where the Jinja2 template string comes from. In both cases:
- The dataset is tokenized via `apply_chat_template()`
- Labels are automatically masked so only assistant responses are trained on
- Multi-turn conversations are supported natively
- Special tokens are handled correctly by the template
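Label masking can be illustrated with a simplified Python sketch: tokens that should not be learned get the label -100, the default `ignore_index` of PyTorch's cross-entropy loss (also the convention in Hugging Face Transformers), so they contribute nothing to the loss. The token IDs are made up, and the hand-labeled segments only stand in for the step the web UI performs automatically.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(segments):
    """segments: (token_ids, is_assistant) pairs in conversation order."""
    input_ids, labels = [], []
    for token_ids, is_assistant in segments:
        input_ids.extend(token_ids)
        # Only assistant tokens keep their IDs as labels; everything else is masked.
        labels.extend(token_ids if is_assistant else [IGNORE_INDEX] * len(token_ids))
    return input_ids, labels


conversation = [
    ([101, 102, 103], False),  # system + user turn (hypothetical IDs)
    ([201, 202], True),        # assistant reply
    ([104, 105], False),       # next user turn
    ([203], True),             # next assistant reply
]
input_ids, labels = build_labels(conversation)
print(labels)  # [-100, -100, -100, 201, 202, -100, -100, 203]
```

Because the model still sees every token in `input_ids`, it learns to answer in context, but it is only graded on the assistant turns.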

The WebUI ships with 50+ templates in `user_data/instruction-templates/`. You can also add your own by creating a `.yaml` file with an `instruction_template` key containing a Jinja2 template string, or a plain `.jinja` file.

**Dataset formats:** Your JSON dataset can use either of these structures:

OpenAI messages format:
```json
[
  {
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is Python?"},
      {"role": "assistant", "content": "A programming language."},
      {"role": "user", "content": "What's it used for?"},
      {"role": "assistant", "content": "Web dev, data science, scripting, and more."}
    ]
  }
]
```

ShareGPT format (`conversations` key with `from`/`value` fields):
```json
[
  {
    "conversations": [
      {"from": "system", "value": "You are a helpful assistant."},
      {"from": "human", "value": "What is Python?"},
      {"from": "gpt", "value": "A programming language."},
      {"from": "human", "value": "What's it used for?"},
      {"from": "gpt", "value": "Web dev, data science, scripting, and more."}
    ]
  }
]
```
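Since both formats carry the same information, converting between them is a simple field mapping. The following Python sketch turns a ShareGPT record into the OpenAI messages format. The mapping covers the `from` values in the example above (`system`, `human`, `gpt`) plus `user` and `assistant` as an assumed convenience; it is an illustration, not the web UI's loader.

```python
# Assumed mapping: the first three come from the example above; "user"/"assistant" are a convenience
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant",
            "user": "user", "assistant": "assistant"}


def sharegpt_to_messages(record):
    """Convert one ShareGPT record ({"conversations": [...]}) into the messages format."""
    return {
        "messages": [
            {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
            for turn in record["conversations"]
        ]
    }


record = {"conversations": [
    {"from": "system", "value": "You are a helpful assistant."},
    {"from": "human", "value": "What is Python?"},
    {"from": "gpt", "value": "A programming language."},
]}
print(sharegpt_to_messages(record))
```

An unknown `from` value raises a `KeyError`, which makes malformed records fail loudly instead of being silently mislabeled.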

## Text Dataset

For pretraining-style training on raw text, use the **Text Dataset** tab. Your dataset should be a JSON file with one document per row, each with a `"text"` key:

```json
[
  {"text": "First document content..."},
  {"text": "Second document content..."}
]
```

This is the standard format used by most pretraining datasets (The Pile, RedPajama, etc.).

Each document is tokenized (with BOS token), concatenated into one long token sequence, and split into chunks of `Cutoff Length` tokens. The final chunk is padded if shorter than the cutoff length. When `Add EOS token` is enabled, an EOS token is appended after each document before concatenation, helping the model learn document boundaries.

- `Stride Length` controls the overlap between consecutive chunks in tokens. Set to 0 for non-overlapping chunks (the standard concatenate-and-split approach). Values like 256 or 512 create overlapping chunks that help the model learn context across chunk boundaries, at the cost of more training samples.
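The concatenate-and-split procedure can be sketched in Python as follows. This is a simplified illustration, not the web UI's code: the token IDs, the BOS/EOS/pad IDs, and the assumption that a stride of S makes each chunk start `Cutoff Length - S` tokens after the previous one are all hypothetical.

```python
def chunk_documents(docs, cutoff_len, stride=0, bos_id=1, eos_id=2, pad_id=0, add_eos=True):
    """Concatenate tokenized documents and split them into fixed-length chunks."""
    stream = []
    for tokens in docs:
        stream.append(bos_id)  # each document starts with BOS
        stream.extend(tokens)
        if add_eos:
            stream.append(eos_id)  # mark the document boundary
    step = cutoff_len - stride  # stride = overlap between consecutive chunks
    if step <= 0:
        raise ValueError("stride must be smaller than cutoff_len")
    chunks = []
    for start in range(0, len(stream), step):
        chunk = stream[start:start + cutoff_len]
        chunk += [pad_id] * (cutoff_len - len(chunk))  # only the final chunk can be short
        chunks.append(chunk)
        if start + cutoff_len >= len(stream):
            break  # this chunk already reached the end of the stream
    return chunks


docs = [[10, 11, 12], [20, 21]]  # two tiny "tokenized documents" (hypothetical IDs)
print(chunk_documents(docs, cutoff_len=4))
# [[1, 10, 11, 12], [2, 1, 20, 21], [2, 0, 0, 0]]
print(chunk_documents(docs, cutoff_len=4, stride=2))
# [[1, 10, 11, 12], [11, 12, 2, 1], [2, 1, 20, 21], [20, 21, 2, 0]]
```

The second call shows the trade-off described above: overlapping chunks repeat some tokens, so the same data yields more training samples.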

## Target Modules

By default, **Target all linear layers** is enabled. This uses peft's `all-linear` mode, which applies LoRA to every `nn.Linear` layer in the model except the output head (`lm_head`). It works for any model architecture.

If you uncheck it, you can manually select individual projection modules (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `down_proj`, `up_proj`). Targeting fewer modules reduces VRAM usage and adapter size, but also reduces how much the model can learn. The default selection of `q_proj` + `v_proj` is the minimum for basic style/format training.

## Parameters

Each parameter has a description in the UI. Below is guidance on the most important choices.

### VRAM

VRAM usage during training is roughly similar to inference with ~1000 tokens of context. If you can run the model, you can probably train LoRAs with the default settings. If you run out of VRAM, reduce `Micro Batch Size` or `Cutoff Length`. Training 4-bit quantized models uses more VRAM — set `Micro Batch Size` to `1` to compensate.

### Rank

Higher rank = more learning capacity = larger adapter = more VRAM. Use 4–8 for style/format, 128–256 to teach factual knowledge.
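The link between rank and adapter size is simple arithmetic: LoRA adds two low-rank matrices to each targeted layer, of shapes rank × d_in and d_out × rank, so each layer gains rank × (d_in + d_out) trainable parameters. The Python sketch below applies this to `q_proj` and `v_proj` in a Llama-2-7B-shaped model (32 layers, 4096 × 4096 projections); the shapes are an example, not a property of every model.

```python
def lora_layer_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one linear layer: A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


def adapter_params(target_shapes, rank, num_layers):
    """Total trainable parameters when the same modules are targeted in every layer."""
    return num_layers * sum(lora_layer_params(d_in, d_out, rank) for d_in, d_out in target_shapes)


targets = [(4096, 4096), (4096, 4096)]  # q_proj and v_proj in a Llama-2-7B-shaped model
for rank in (8, 128):
    n = adapter_params(targets, rank, num_layers=32)
    print(f"rank {rank}: {n:,} parameters, ~{n * 2 / 2**20:.0f} MiB at 2 bytes each")
# rank 8: 4,194,304 parameters, ~8 MiB at 2 bytes each
# rank 128: 67,108,864 parameters, ~128 MiB at 2 bytes each
```

Size grows linearly with rank, so going from 8 to 128 makes the adapter 16 times larger.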

### Learning Rate and Epochs

These control how aggressively the model learns and how many times it sees the data. Higher LR + fewer epochs = fast but rough. Lower LR + more epochs = slower but higher quality. The scheduler (default: cosine) decays the LR over the course of training — see [HuggingFace docs](https://huggingface.co/docs/transformers/main_classes/optimizer_schedules#schedules) for graphs of each option.
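For intuition about what the cosine scheduler does, here is a small Python sketch of cosine decay with optional linear warmup. It is written from the standard formula rather than copied from any library, and the learning rate and step counts are illustrative.

```python
import math


def cosine_lr(step, total_steps, base_lr, warmup_steps=0):
    """Learning rate at a given step: linear warmup, then cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # stay at 0 after the last scheduled step
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 25, 50, 75, 100):
    print(step, f"{cosine_lr(step, total_steps=100, base_lr=3e-4):.2e}")
```

The rate stays close to its peak early on, falls fastest in the middle of training, and flattens out near zero at the end.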

## Loss

When you're running training, the WebUI's console window will log reports that include, among other things, a numeric value named `Loss`. It starts as a high number and gradually decreases as training progresses.

Loss measures how far the model's predictions are from the training data, with `0` meaning a perfect match. It's calculated as the cross-entropy between the model's output distribution and the expected tokens.
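
As a worked example, the per-token cross-entropy is the negative log of the probability the model assigns to the expected token after a softmax over its logits. The pure-Python sketch below computes it for a single position; real training does this with framework tensors and typically averages over all predicted tokens in a batch.

```python
import math

def cross_entropy(logits, target_index):
    """Cross-entropy for one position: -log(softmax(logits)[target_index])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]

# A 4-token vocabulary with uniform logits gives ln(4), about 1.386.
print(cross_entropy([0.0, 0.0, 0.0, 0.0], 2))
# A confident, correct prediction gives a loss close to 0.
print(cross_entropy([10.0, 0.0, 0.0, 0.0], 0))
```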

In practice, a loss of `0` means the model has overfit — it memorized the training data at the expense of its general capabilities.

Loss is a balancing game: you want it low enough that the model learns your data, but not so low that it loses general knowledge. Generally, if it goes below `1.0`, overfitting is likely and you should stop training. In some cases you may want to go as low as `0.5` (if you need very predictable outputs). Different goals have different needs, so experiment and see what works best for you.

Note: if you see Loss start at or suddenly jump to exactly `0`, something has likely gone wrong in your training process (e.g., model corruption).
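
One way to read these numbers is through perplexity: the exponential of a mean cross-entropy loss (in nats), roughly the number of equally likely tokens the model is choosing between at each step. A loss of `1.0` corresponds to a perplexity of about 2.72. The hypothetical `describe_loss` helper below restates the rules of thumb above; the labels are guidance, not hard limits.

```python
import math

def describe_loss(loss):
    """Return (perplexity, note) for a mean cross-entropy loss, using the rules of thumb above."""
    perplexity = math.exp(loss)
    if loss == 0:
        note = "exactly 0: something has likely gone wrong"
    elif loss < 0.5:
        note = "below 0.5: lower than the range suggested above"
    elif loss < 1.0:
        note = "below 1.0: overfitting is likely unless you need very predictable outputs"
    else:
        note = "1.0 or above: typical while the model is still learning"
    return perplexity, note

for loss in (2.0, 1.0, 0.5, 0.0):
    ppl, note = describe_loss(loss)
    print(f"loss={loss:.1f}  perplexity={ppl:.2f}  {note}")
```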


================================================
FILE: docs/06 - Session Tab.md
================================================
Here you can restart the UI with new settings.

## Settings

* **Toggle light/dark theme**: switches between light and dark mode.
* **Show two columns in the Notebook tab**: toggles between the two-column Default layout and the single-column Notebook layout.
* **Turn long pasted text into attachments in the Chat tab**: when enabled, long pasted text is automatically converted into file attachments.
* **Include attachments/search results from previous messages in the chat prompt**: when enabled, attachments and web search results from earlier messages are included in subsequent prompts.

## Extensions & flags

* **Available extensions**: shows a list of extensions available under `text-generation-webui/extensions` and `text-generation-webui/user_data/extensions`. Note that some of these extensions may require manually installing Python requirements through the command: `pip install -r extensions/extension_name/requirements.txt`.
* **Boolean command-line flags**: shows command-line flags of bool (true/false) type.

After selecting your desired flags and extensions, you can restart the UI by clicking on **Apply flags/extensions and restart**.

## Install or update an extension

In this field, you can enter the GitHub URL of an extension and press Enter to either install it (i.e., clone it into `text-generation-webui/extensions`) or, if it is already cloned, update it with `git pull`.

Some extensions have additional Python requirements. To install them, run

```
pip install -r extensions/extension-name/requirements.txt
```

or

```
pip install -r extensions\extension-name\requirements.txt
```

if you are on Windows.

If you used the one-click installer, run this command in the terminal window that opens when you launch the `cmd_` script for your OS (`cmd_linux.sh`, `cmd_macos.sh`, or `cmd_windows.bat`).

## Saving UI defaults

The **Save extensions settings to user_data/settings.yaml** button gathers the visible values in the UI and saves them to `user_data/settings.yaml` so that your settings will persist across multiple restarts of the UI.

Note that preset parameters like temperature are not individually saved, so you need to first save your preset and select it in the preset menu before saving the defaults.


================================================
FILE: docs/07 - Extensions.md
================================================
# Extensions

Extensions are defined by files named `script.py` inside subfolders of either:
- `text-generation-webui/extensions`
- `text-generation-webui/user_data/extensions`

They are loaded at startup if the folder name is specified after the `--extensions` flag.

For instance, `extensions/silero_tts/script.py` or `user_data/extensions/silero_tts/script.py` gets loaded with `python server.py --extensions silero_tts`.

**Note:** Extensions in `user_data/extensions/` take priority over those in `extensions/` when both exist with the same name.
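
The lookup order described above can be sketched with `pathlib`. `find_extension_script` is a hypothetical helper that illustrates the documented priority; it is not the web UI's actual loader.

```python
from pathlib import Path

def find_extension_script(name, webui_root):
    """Return the script.py that would be loaded for extension `name`, or None.

    Follows the documented priority: user_data/extensions wins over extensions.
    """
    root = Path(webui_root)
    for base in (root / "user_data" / "extensions", root / "extensions"):
        script = base / name / "script.py"
        if script.is_file():
            return script
    return None

# Prints the path that would be used, or None if the extension is missing.
print(find_extension_script("silero_tts", "."))
```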

## [text-generation-webui-extensions](https://github.com/oobabooga/text-generation-webui-extensions)

The repository above contains a directory of user extensions.

If you create an extension, you are welcome to host it in a GitHub repository and submit a PR adding it to the list.

## Built-in extensions

|Extension|Description|
|---------|-----------|
|[openai](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai)| Creates an API that mimics the OpenAI API and can be used as a drop-in replacement. |
|[superboogav2](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/superboogav2)| Enhanced RAG extension with support for PDF, DOCX, and PPTX files. |
|[send_pictures](https://github.com/oobabooga/text-generation-webui/blob/main/extensions/send_pictures/)| Creates an image upload field that can be used to send images to the bot in chat mode. Captions are automatically generated using BLIP. |
|[coqui_tts](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/coqui_tts)| Text-to-speech extension using Coqui XTTS v2. |
|[silero_tts](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/silero_tts)| Text-to-speech extension using [Silero](https://github.com/snakers4/silero-models). When used in chat mode, responses are replaced with an audio widget. |
|[whisper_stt](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/whisper_stt)| Allows you to enter your inputs in chat mode using your microphone. |
|[perplexity_colors](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/perplexity_colors)| Colors each token in the output text by its associated probability, as derived from the model logits. |
|[google_translate](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/google_translate)| Automatically translates inputs and outputs using Google Translate.|
|[gallery](https://github.com/oobabooga/text-generation-webui/blob/main/extensions/gallery/)| Creates a gallery with the chat characters and their pictures. |
|[sd_api_pictures](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/sd_api_pictures)| Allows you to request pictures from the bot in chat mode, which will be generated using the AUTOMATIC1111 Stable Diffusion API. See examples [here](https://github.com/oobabooga/text-generation-webui/pull/309). |
|[long_replies](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/long_replies)| Forces longer replies by suppressing early newlines in the model output. |
|[ngrok](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/ngrok)| Allows you to access the web UI remotely using the ngrok reverse tunnel service (free). It's an alternative to the built-in Gradio `--share` feature. |
|[superbooga](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/superbooga)| An extension that uses ChromaDB to create an arbitrarily large pseudocontext, taking as input text files, URLs, or pasted text. Based on https://github.com/kaiokendev/superbig. |
|[character_bias](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/character_bias)| Just a very simple example that adds a hidden string at the beginning of the bot's reply in chat mode. |

## How to write an extension

The extensions framework is based on special functions and variables that you can define in `script.py`. The functions are the following:

| Function        | Description |
|-------------|-------------|
| `def setup()` | Is executed when the extension gets imported. |
| `def ui()` | Creates custom gradio elements when the UI is launched. | 
| `def custom_css()` | Returns custom CSS as a string. It is applied whenever the web UI is loaded. |
| `def custom_js()` | Same as above but for javascript. |
| `def input_modifier(string, state, is_chat=False)`  | Modifies the input string before it enters the model. In chat mode, it is applied to the user message. Otherwise, it is applied to the entire prompt. |
| `def output_modifier(string, state, is_chat=False)`  | Modifies the output string before it is presented in the UI. In chat mode, it is applied to the bot's reply. Otherwise, it is applied to the entire output. |
| `def chat_input_modifier(text, visible_text, state)` | Modifies both the visible and internal inputs in chat mode. Can be used to hijack the chat input with custom content. |
| `def bot_prefix_modifier(string, state)`  | Applied in chat mode to the prefix for the bot's reply. |
| `def state_modifier(state)`  | Modifies the dictionary containing the UI input parameters before it is used by the text generation functions. |
| `def history_modifier(history)`  | Modifies the chat history before the text generation in chat mode begins. |
| `def custom_generate_reply(...)` | Overrides the main text generation function. |
| `def custom_generate_chat_prompt(...)` | Overrides the prompt generator in chat mode. |
| `def tokenizer_modifier(state, prompt, input_ids, input_embeds)` | Modifies the `input_ids`/`input_embeds` fed to the model. Should return `prompt`, `input_ids`, `input_embeds`. See the `example` extension for a template. |
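| `def custom_tokenized_length(prompt)` | Used together with `tokenizer_modifier`; returns the length in tokens of `prompt`. See the `example` extension for a template. |
| `def logits_processor_modifier(processor_list, input_ids)` | Adds logits processors to `processor_list`, giving access to the next-token probabilities. Only used by loaders that use the transformers library for sampling. See the `example` extension for a template. |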

Additionally, you can define a special `params` dictionary. In it, the `display_name` key is used to define the displayed name of the extension in the UI, and the `is_tab` key is used to define whether the extension should appear in a new tab. By default, extensions appear at the bottom of the "Text generation" tab.

Example:

```python
params = {
    "display_name": "Google Translate",
    "is_tab": True,
}
```

The `params` dict may also contain variables that you want to be customizable through a `settings.yaml` file. For instance, assuming the extension is in `extensions/google_translate`, the variable `language string` in

```python
params = {
    "display_name": "Google Translate",
    "is_tab": True,
    "language string": "jp"
}
```

can be customized by adding a key called `google_translate-language string` to `settings.yaml`:

```yaml
google_translate-language string: 'fr'
```

That is, the syntax for the key is `extension_name-variable_name`.
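
A minimal sketch of how such keys could be applied to an extension's `params`, assuming only variables the extension already declares are overridden. `apply_extension_settings` is hypothetical, and the settings are shown as a plain dict; in practice they come from parsing `settings.yaml`.

```python
def apply_extension_settings(extension_name, params, settings):
    """Override declared params from keys of the form `extension_name-variable_name`."""
    for variable in params:
        key = f"{extension_name}-{variable}"
        if key in settings:
            params[variable] = settings[key]
    return params

params = {"display_name": "Google Translate", "is_tab": True, "language string": "jp"}
settings = {"google_translate-language string": "fr", "other_extension-foo": 1}
print(apply_extension_settings("google_translate", params, settings))
```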

## Using multiple extensions at the same time

You can activate more than one extension at a time by providing their names separated by spaces after `--extensions`. The input, output, and bot prefix modifiers will be applied in the specified order. 

Example:

```
python server.py --extensions enthusiasm translate # First apply enthusiasm, then translate
python server.py --extensions translate enthusiasm # First apply translate, then enthusiasm
```
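
Why the order matters can be shown with two toy output modifiers named after the extensions in the command above. Their bodies are hypothetical, and `apply_output_modifiers` only illustrates applying modifiers in command-line order.

```python
TOY_DICTIONARY = {"Hello": "Bonjour"}  # toy "translation" table, illustration only

def enthusiasm_output(string, state, is_chat=False):
    return string + "!"

def translate_output(string, state, is_chat=False):
    # Only translates exact matches, so it is sensitive to earlier changes.
    return TOY_DICTIONARY.get(string, string)

def apply_output_modifiers(string, state, modifiers):
    """Apply each extension's output_modifier in the order given after --extensions."""
    for modifier in modifiers:
        string = modifier(string, state)
    return string

print(apply_output_modifiers("Hello", {}, [enthusiasm_output, translate_output]))  # Hello!
print(apply_output_modifiers("Hello", {}, [translate_output, enthusiasm_output]))  # Bonjour!
```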

Note that for the following functions:
- `custom_generate_chat_prompt`
- `custom_generate_reply`
- `custom_tokenized_length`

only the first declaration encountered is used, and the rest are ignored.
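
The "first declaration wins" rule can be sketched as follows. Here `SimpleNamespace` objects stand in for imported `script.py` modules, and `first_override` is a hypothetical helper, not the web UI's code.

```python
from types import SimpleNamespace

def first_override(extensions, hook_name):
    """Return the first extension function named hook_name, ignoring later ones."""
    for ext in extensions:
        func = getattr(ext, hook_name, None)
        if callable(func):
            return func
    return None

ext_a = SimpleNamespace(custom_generate_reply=lambda *args, **kwargs: "from A")
ext_b = SimpleNamespace(custom_generate_reply=lambda *args, **kwargs: "from B")
ext_c = SimpleNamespace()  # defines no override

override = first_override([ext_c, ext_a, ext_b], "custom_generate_reply")
print(override())  # from A
```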

## A full example

The source code below can be found at [extensions/example/script.py](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/example/script.py).

```python
"""
An example of extension. It does nothing, but you can add transformations
before the return statements to customize the webui behavior.

Starting from history_modifier and ending in output_modifier, the
functions are declared in the same order that they are called at
generation time.
"""

import gradio as gr
import torch
from transformers import LogitsProcessor

from modules import chat, shared
from modules.text_generation import (
    decode,
    encode,
    generate_reply,
)

params = {
    "display_name": "Example Extension",
    "is_tab": False,
}

class MyLogits(LogitsProcessor):
    """
    Manipulates the probabilities for the next token before it gets sampled.
    Used in the logits_processor_modifier function below.
    """
    def __init__(self):
        pass

    def __call__(self, input_ids, scores):
        # probs = torch.softmax(scores, dim=-1, dtype=torch.float)
        # probs[0] /= probs[0].sum()
        # scores = torch.log(probs / (1 - probs))
        return scores

def history_modifier(history):
    """
    Modifies the chat history.
    Only used in chat mode.
    """
    return history

def state_modifier(state):
    """
    Modifies the state variable, which is a dictionary containing the input
    values in the UI like sliders and checkboxes.
    """
    return state

def chat_input_modifier(text, visible_text, state):
    """
    Modifies the user input string in chat mode (visible_text).
    You can also modify the internal representation of the user
    input (text) to change how it will appear in the prompt.
    """
    return text, visible_text

def input_modifier(string, state, is_chat=False):
    """
    In default/notebook modes, modifies the whole prompt.

    In chat mode, it is the same as chat_input_modifier but only applied
    to "text", here called "string", and not to "visible_text".
    """
    return string

def bot_prefix_modifier(string, state):
    """
    Modifies the prefix for the next bot reply in chat mode.
    By default, the prefix will be something like "Bot Name:".
    """
    return string

def tokenizer_modifier(state, prompt, input_ids, input_embeds):
    """
    Modifies the input ids and embeds fed to the model.
    Only used by loaders that use the transformers library for sampling.
    """
    return prompt, input_ids, input_embeds

def logits_processor_modifier(processor_list, input_ids):
    """
    Adds logits processors to the list, allowing you to access and modify
    the next token probabilities.
    Only used by loaders that use the transformers library for sampling.
    """
    processor_list.append(MyLogits())
    return processor_list

def output_modifier(string, state, is_chat=False):
    """
    Modifies the LLM output before it gets presented.

    In chat mode, the modified version goes into history['visible'],
    and the original version goes into history['internal'].
    """
    return string

def custom_generate_chat_prompt(user_input, state, **kwargs):
    """
    Replaces the function that generates the prompt from the chat history.
    Only used in chat mode.
    """
    result = chat.generate_chat_prompt(user_input, state, **kwargs)
    return result

def custom_css():
    """
    Returns a CSS string that gets appended to the CSS for the webui.
    """
    return ''

def custom_js():
    """
    Returns a javascript string that gets appended to the javascript
    for the webui.
    """
    return ''

def setup():
    """
    Gets executed only once, when the extension is imported.
    """
    pass

def ui():
    """
    Gets executed when the UI is drawn. Custom gradio elements and
    their corresponding event handlers should be defined here.

    To learn about gradio components, check out the docs:
    https://gradio.app/docs/
    """
    pass
```
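
For contrast with the no-op template above, here is a hypothetical extension that actually changes something. It appends a configurable suffix to every model output through `output_modifier`. The folder name `signature` and the `suffix` parameter are made up for illustration; following the `extension_name-variable_name` syntax, the suffix could be overridden in `settings.yaml` as `signature-suffix`.

```python
"""
Hypothetical extension: user_data/extensions/signature/script.py
Appends a configurable suffix to every model output.
"""

params = {
    "display_name": "Signature",
    "is_tab": False,
    "suffix": " [signed]",  # overridable as `signature-suffix` in settings.yaml
}

def output_modifier(string, state, is_chat=False):
    """Append params['suffix'] to the model output before it is displayed."""
    suffix = params["suffix"]
    if suffix and not string.endswith(suffix):
        string += suffix
    return string
```

It would be loaded with `python server.py --extensions signature`. Because it defines no `ui()`, it adds no controls to the interface.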


================================================
FILE: docs/08 - Additional Tips.md
================================================
## Audio notification

If your computer takes a long time to generate each response for the model that you are using, you can enable an audio notification for when the response is completed. This feature was kindly contributed by HappyWorldGames in [#1277](https://github.com/oobabooga/text-generation-webui/pull/1277).

### Installation

Place a file named `notification.mp3` in the same folder as `server.py`. You can find some examples here:

* https://pixabay.com/sound-effects/search/ding/?duration=0-30
* https://pixabay.com/sound-effects/search/notification/?duration=0-30

Source: https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/1126

This file will be automatically detected the next time you start the web UI.

## Miscellaneous info

### You can train LoRAs in CPU mode

Load the web UI with

```
python server.py --cpu
```

and start training the LoRA from the training tab as usual.

### You can check the sha256sum of downloaded models with the download script

```
python download-model.py facebook/galactica-125m --check
```

### The download script continues interrupted downloads by default

It doesn't start over.



================================================
FILE: docs/09 - Docker.md
================================================
Docker Compose is a way of installing and launching the web UI in an isolated Ubuntu image using only a few commands.

## Prerequisites

You need Docker Compose v2.17 or higher:

```
~$ docker compose version
Docker Compose version v2.21.0
```

Installation instructions: https://docs.docker.com/engine/install/

For NVIDIA GPUs, you also need the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).

## Quick start

There are four Docker variants available under `docker/`:

| Directory | GPU | Notes |
|-----------|-----|-------|
| `docker/nvidia` | NVIDIA | Requires NVIDIA Container Toolkit |
| `docker/amd` | AMD | Requires ROCm-compatible GPU |
| `docker/intel` | Intel Arc | Beta support |
| `docker/cpu` | None | CPU-only inference |

To launch (using NVIDIA as an example):

```bash
cd text-generation-webui/docker/nvidia
cp ../.env.example .env
# Optionally edit .env to customize ports, TORCH_CUDA_ARCH_LIST, etc.
docker compose up --build
```

The web UI will be available at `http://localhost:7860`.

## User data

Create a `user_data/` directory next to the `docker-compose.yml` to persist your models, characters, presets, and settings between container rebuilds:

```bash
mkdir -p user_data
```

This directory is mounted into the container at runtime. You can place a `CMD_FLAGS.txt` inside it to pass persistent flags to the web UI (e.g., `--api`).

Models can be downloaded through the web UI's **Model** tab once it's running; they are saved to `user_data/models/`.

## Dedicated docker repository

An external repository maintains a docker wrapper for this project as well as several pre-configured 'one-click' `docker compose` variants. It can be found at: [Atinoda/text-generation-webui-docker](https://github.com/Atinoda/text-generation-webui-docker).


================================================
FILE: docs/11 - AMD Setup.md
================================================
## Using an AMD GPU in Linux

Requires ROCm 6.4 to be installed.

### Option 1: One-click installer

The one-click installer (`start_linux.sh`) automatically detects AMD GPUs. When prompted, select the AMD option, or set the `GPU_CHOICE` environment variable before running:

```
GPU_CHOICE=B ./start_linux.sh
```

### Option 2: Manual conda install

Follow the manual conda installation instructions in the README, using the AMD PyTorch command:

```
pip3 install torch==2.9.1 --index-url https://download.pytorch.org/whl/rocm6.4
```

Then install the project requirements with the AMD requirements file:

```
pip install -r requirements/full/requirements_amd.txt
```


================================================
FILE: docs/12 - OpenAI API.md
================================================
## OpenAI compatible API

The main API for this project is meant to be a drop-in replacement for the OpenAI API, including the Chat and Completions endpoints.

* It is 100% offline and private.
* It doesn't create any logs.
* It doesn't connect to OpenAI.
* It doesn't use the openai-python library.

### Starting the API

Add `--api` to your command-line flags.

* To create a public Cloudflare URL, add the `--public-api` flag.
* To listen on your local network, add the `--listen` flag.
* To change the port, which is 5000 by default, use `--api-port 1234` (change 1234 to your desired port number).
* To use SSL, add `--ssl-keyfile key.pem --ssl-certfile cert.pem`. ⚠️ **Note**: this doesn't work with `--public-api` since Cloudflare already uses HTTPS by default.
* To use an API key for authentication, add `--api-key yourkey`.

### Examples

For the documentation with all the endpoints, parameters and their types, consult `http://127.0.0.1:5000/docs` or the [typing.py](https://github.com/oobabooga/text-generation-webui/blob/main/extensions/openai/typing.py) file.

The official examples in the [OpenAI documentation](https://platform.openai.com/docs/api-reference) should also work, and the same parameters apply (although the API here has more optional parameters).

#### Completions

```shell
curl http://127.0.0.1:5000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "This is a cake recipe:\n\n1.",
    "max_tokens": 512,
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20
  }'
```

#### Chat completions

Works best with instruction-following models. If the `instruction_template` parameter is not provided, it is guessed automatically from the model name using the regex patterns in `user_data/models/config.yaml`.

```shell
curl http://127.0.0.1:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20
  }'
```

#### Chat completions with characters

```shell
curl http://127.0.0.1:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Hello! Who are you?"
      }
    ],
    "mode": "chat-instruct",
    "character": "Example",
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20
  }'
```

#### Multimodal/vision (llama.cpp and ExLlamaV3)

##### With /v1/chat/completions (recommended!)

```shell
curl http://127.0.0.1:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Please describe what you see in this image."},
          {"type": "image_url", "image_url": {"url": "https://github.com/turboderp-org/exllamav3/blob/master/examples/media/cat.png?raw=true"}}
        ]
      }
    ],
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20
  }'
```

For base64-encoded images, just replace the inner "url" value with this format: `data:image/FORMAT;base64,BASE64_STRING` where FORMAT is the file type (png, jpeg, gif, etc.) and BASE64_STRING is your base64-encoded image data.
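
A small sketch for producing that data URL from a local file using only the standard library. `image_to_data_url` is a hypothetical helper, and it infers the MIME type from the file extension.

```python
import base64
import mimetypes
from pathlib import Path

def image_to_data_url(path):
    """Encode a local image file as data:image/FORMAT;base64,BASE64_STRING."""
    mime, _ = mimetypes.guess_type(str(path))
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot determine image type for {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Usage: put the result in the inner "url" field of an image_url content part, e.g.
# {"type": "image_url", "image_url": {"url": image_to_data_url("cat.png")}}
```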

##### With /v1/completions

```shell
curl http://127.0.0.1:5000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "About image <__media__> and image <__media__>, what I can say is that the first one"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://github.com/turboderp-org/exllamav3/blob/master/examples/media/cat.png?raw=true"
            }
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://github.com/turboderp-org/exllamav3/blob/master/examples/media/strawberry.png?raw=true"
            }
          }
        ]
      }
    ],
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20
  }'
```

For base64-encoded images, just replace the inner "url" values with this format: `data:image/FORMAT;base64,BASE64_STRING` where FORMAT is the file type (png, jpeg, gif, etc.) and BASE64_STRING is your base64-encoded image data.

#### Image generation

```shell
curl http://127.0.0.1:5000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "an orange tree",
    "steps": 9,
    "cfg_scale": 0,
    "batch_size": 1,
    "batch_count": 1
  }'
```

You need to load an image model first. You can do this via the UI, or by adding `--image-model your_model_name` when launching the server.

The output is a JSON object containing a `data` array. Each element has a `b64_json` field with the base64-encoded PNG image:

```json
{
  "created": 1764791227,
  "data": [
    {
      "b64_json": "iVBORw0KGgo..."
    }
  ]
}
```
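
To turn that response into files, decode each `b64_json` field and write the bytes as PNG. `save_generated_images` is a hypothetical helper, and the file naming scheme is arbitrary.

```python
import base64
from pathlib import Path

def save_generated_images(response_json, out_dir=".", prefix="image"):
    """Decode each b64_json entry from /v1/images/generations and save it as a PNG file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, item in enumerate(response_json.get("data", [])):
        path = out / f"{prefix}_{i}.png"
        path.write_bytes(base64.b64decode(item["b64_json"]))
        paths.append(path)
    return paths

# Usage with the JSON body returned by the endpoint, e.g.:
# response_json = requests.post(url, json=payload).json()
# print(save_generated_images(response_json, "outputs"))
```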

#### SSE streaming

```shell
curl http://127.0.0.1:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "stream": true
  }'
```
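
If you would rather not depend on `sseclient-py`, the stream can also be parsed by hand. The sketch below reads raw lines (for example from `requests`' `Response.iter_lines()`), keeps only `data:` fields, and concatenates the `delta` content of chat completion chunks. Stopping on a `[DONE]` sentinel is a defensive assumption borrowed from the OpenAI API; it is harmless if this server never sends one.

```python
import json

def iter_sse_payloads(lines):
    """Yield decoded JSON payloads from raw SSE lines of the form 'data: {...}'."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        yield json.loads(data)

def stream_text(payloads):
    """Concatenate the delta content of chat completion chunks."""
    return "".join(p["choices"][0]["delta"].get("content") or "" for p in payloads)

# With requests:
# resp = requests.post(url, json=data, stream=True)
# print(stream_text(iter_sse_payloads(resp.iter_lines())))
```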

#### Logits

```shell
curl -k http://127.0.0.1:5000/v1/internal/logits \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Who is best, Asuka or Rei? Answer:",
    "use_samplers": false
  }'
```

#### Logits after sampling parameters

```shell
curl -k http://127.0.0.1:5000/v1/internal/logits \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Who is best, Asuka or Rei? Answer:",
    "use_samplers": true,
    "top_k": 3
  }'
```

#### List models

```shell
curl -k http://127.0.0.1:5000/v1/internal/model/list \
  -H "Content-Type: application/json"
```

#### Load model

```shell
curl -k http://127.0.0.1:5000/v1/internal/model/load \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "Qwen_Qwen3-0.6B-Q4_K_M.gguf",
    "args": {
      "ctx_size": 32768,
      "flash_attn": true,
      "cache_type": "q8_0"
    }
  }'
```
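
The same request can be sent from Python. This sketch uses only the standard library (`urllib.request`) instead of `requests`. `build_load_request` is a hypothetical helper that builds the same JSON body as the curl example and optionally adds the Bearer header used with `--api-key`.

```python
import json
import urllib.request

def build_load_request(model_name, args, base_url="http://127.0.0.1:5000", api_key=None):
    """Build a POST request for /v1/internal/model/load with the given loader args."""
    body = {"model_name": model_name, "args": args}
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url}/v1/internal/model/load",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )

req = build_load_request(
    "Qwen_Qwen3-0.6B-Q4_K_M.gguf",
    {"ctx_size": 32768, "flash_attn": True, "cache_type": "q8_0"},
)
# Loading can take a while; 600 seconds is an arbitrary, generous timeout.
# with urllib.request.urlopen(req, timeout=600) as resp:
#     print(resp.status)
```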

#### Python chat example

```python
import requests

url = "http://127.0.0.1:5000/v1/chat/completions"

headers = {
    "Content-Type": "application/json"
}

history = []

while True:
    user_message = input("> ")
    history.append({"role": "user", "content": user_message})
    data = {
        "messages": history,
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20
    }

    response = requests.post(url, headers=headers, json=data, verify=False)
    assistant_message = response.json()['choices'][0]['message']['content']
    history.append({"role": "assistant", "content": assistant_message})
    print(assistant_message)
```

#### Python chat example with streaming

Start the script with `python -u` to see the output in real time.

```python
import requests
import sseclient  # pip install sseclient-py
import json

url = "http://127.0.0.1:5000/v1/chat/completions"

headers = {
    "Content-Type": "application/json"
}

history = []

while True:
    user_message = input("> ")
    history.append({"role": "user", "content": user_message})
    data = {
        "stream": True,
        "messages": history,
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20
    }

    stream_response = requests.post(url, headers=headers, json=data, verify=False, stream=True)
    client = sseclient.SSEClient(stream_response)

    assistant_message = ''
    for event in client.events():
        payload = json.loads(event.data)
        chunk = payload['choices'][0]['delta']['content']
        assistant_message += chunk
        print(chunk, end='')

    print()
    history.append({"role": "assistant", "content": assistant_message})
```

#### Python completions example with streaming

Start the script with `python -u` to see the output in real time.

```python
import json
import requests
import sseclient  # pip install sseclient-py

url = "http://127.0.0.1:5000/v1/completions"

headers = {
    "Content-Type": "application/json"
}

data = {
    "prompt": "This is a cake recipe:\n\n1.",
    "max_tokens": 512,
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "stream": True,
}

stream_response = requests.post(url, headers=headers, json=data, verify=False, stream=True)
client = sseclient.SSEClient(stream_response)

print(data['prompt'], end='')
for event in client.events():
    payload = json.loads(event.data)
    print(payload['choices'][0]['text'], end='')

print()
```

#### Python parallel requests example

The API supports handling multiple requests in parallel. For ExLlamaV3, this works out of the box. For llama.cpp, you need to pass `--parallel N` to set the number of concurrent slots.

```python
import concurrent.futures
import requests

url = "http://127.0.0.1:5000/v1/chat/completions"
prompts = [
    "Write a haiku about the ocean.",
    "Explain quantum computing in simple terms.",
    "Tell me a joke about programmers.",
]

def send_request(prompt):
    response = requests.post(url, json={
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 200,
    })
    return response.json()["choices"][0]["message"]["content"]

with concurrent.futures.ThreadPoolExecutor() as executor:
    results = list(executor.map(send_request, prompts))

for prompt, result in zip(prompts, results):
    print(f"Q: {prompt}\nA: {result}\n")
```

#### Python example with API key

Replace

```python
headers = {
    "Content-Type": "application/json"
}
```

with

```python
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer yourPassword123"
}
```

in any of the examples above.
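
To keep the key out of your source code, you can read it from an environment variable instead. `build_headers` is a hypothetical helper, and `TGW_API_KEY` is a made-up variable name; use whatever fits your setup.

```python
import os

def build_headers(api_key=None, env_var="TGW_API_KEY"):
    """Return request headers, adding a Bearer token when a key is available.

    TGW_API_KEY is a hypothetical variable name chosen for this example.
    """
    headers = {"Content-Type": "application/json"}
    key = api_key if api_key is not None else os.environ.get(env_var)
    if key:
        headers["Authorization"] = f"Bearer {key}"
    return headers

headers = build_headers()  # use this wherever the examples define `headers`
```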

#### Tool/Function calling

Use a model with tool-calling support (Qwen, Mistral, GPT-OSS, etc.). Tools are passed via the `tools` parameter, and the prompt is automatically formatted using the model's Jinja2 template.

When the model decides to call a tool, the response will have `finish_reason: "tool_calls"` and a `tool_calls` array with structured function names and arguments. You then execute the tool, send the result back as a `role: "tool"` message, and continue until the model responds with `finish_reason: "stop"`.

Some models call multiple tools in parallel (Qwen, Mistral), while others call one at a time (GPT-OSS). The loop below handles both styles.

```python
import json
import requests

url = "http://127.0.0.1:5000/v1/chat/completions"

# Define your tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_time",
            "description": "Get the current time in a given timezone",
            "parameters": {
                "type": "object",
                "properties": {
                    "timezone": {"type": "string", "description": "IANA timezone string"},
                },
                "required": ["timezone"]
            }
        }
    },
]


def execute_tool(name, arguments):
    """Replace this with your actual tool implementations."""
    if name == "get_weather":
        return {"temperature": 22, "condition": "sunny", "humidity": 45}
    elif name == "get_time":
        return {"time": "2:30 PM", "timezone": "JST"}
    return {"error": f"Unknown tool: {name}"}


messages = [{"role": "user", "content": "What time is it in Tokyo and what's the weather like there?"}]

# Tool-calling loop: keep going until the model gives a final answer
for _ in range(10):
    response = requests.post(url, json={"messages": messages, "tools": tools}).json()
    choice = response["choices"][0]

    if choice["finish_reason"] == "tool_calls":
        # Add the assistant's response (with tool_calls) to history
        messages.append({
            "role": "assistant",
            "content": choice["message"]["content"],
            "tool_calls": choice["message"]["tool_calls"],
        })

        # Execute each tool and add results to history
        for tool_call in choice["message"]["tool_calls"]:
            name = tool_call["function"]["name"]
            arguments = json.loads(tool_call["function"]["arguments"])
            result = execute_tool(name, arguments)

            print(f"Tool call: {name}({arguments}) => {result}")
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call["id"],
                "content": json.dumps(result),
            })
    else:
        # Final answer
        print(f"\nAssistant: {choice['message']['content']}")
        break
```
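
The loop above passes the model's `arguments` string straight to `json.loads`. Local models can occasionally emit malformed JSON or unexpected parameter names there, which would crash the loop. The following is a minimal sketch of a more defensive dispatcher that reports errors back to the model instead of raising. The names `TOOL_REGISTRY` and `run_tool_call` and the two stub tools are hypothetical; register your own functions in their place.

```python
import json


def get_weather(location):
    # Stub: replace with a real weather lookup
    return {"location": location, "temperature": 22, "condition": "sunny"}


def get_time(timezone):
    # Stub: replace with a real clock lookup
    return {"timezone": timezone, "time": "2:30 PM"}


# Hypothetical registry mapping tool names (as declared in `tools`) to callables
TOOL_REGISTRY = {
    "get_weather": get_weather,
    "get_time": get_time,
}


def run_tool_call(tool_call):
    """Execute one entry of message["tool_calls"] and return a "tool" message.

    Problems are returned to the model as a JSON error so the conversation
    loop keeps running and the model gets a chance to retry.
    """
    name = tool_call["function"]["name"]
    raw_args = tool_call["function"].get("arguments") or "{}"
    try:
        arguments = json.loads(raw_args)
        if not isinstance(arguments, dict):
            raise ValueError("arguments must be a JSON object")
    except ValueError as exc:  # also covers json.JSONDecodeError
        result = {"error": f"Invalid arguments for {name}: {exc}"}
    else:
        func = TOOL_REGISTRY.get(name)
        if func is None:
            result = {"error": f"Unknown tool: {name}"}
        else:
            try:
                result = func(**arguments)
            except TypeError as exc:
                # e.g. wrong or missing parameter names
                result = {"error": f"Bad parameters for {name}: {exc}"}

    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }
```

Inside the loop, the inner body then reduces to `messages.append(run_tool_call(tool_call))`.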

### Environment variables

The following environment variables can be used (they take precedence over everything else):

| Variable Name               | Description                      | Example Value                           |
|-----------------------------|----------------------------------|-----------------------------------------|
| `OPENEDAI_PORT`             | Port number                      | 5000                                    |
| `OPENEDAI_CERT_PATH`        | SSL certificate file path        | cert.pem                                |
| `OPENEDAI_KEY_PATH`         | SSL key file path                | key.pem                                 |
| `OPENEDAI_DEBUG`            | Enable debugging (set to 1)      | 1                                       |
| `OPENEDAI_EMBEDDING_MODEL`  | Embedding model (if applicable)  | sentence-transformers/all-mpnet-base-v2 |
| `OPENEDAI_EMBEDDING_DEVICE` | Embedding device (if applicable) | cuda                                    |
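
A client script on the same machine can derive the API address from the same variables. The following is a minimal sketch, assuming the server is reachable at 127.0.0.1, defaults to port 5000 (the port used throughout this page), and serves HTTPS only when both `OPENEDAI_CERT_PATH` and `OPENEDAI_KEY_PATH` are set. The helper name `local_api_base` is hypothetical.

```python
import os


def local_api_base(env=None, host="127.0.0.1"):
    """Build the base URL of the local OpenAI-compatible API from OPENEDAI_* variables."""
    env = os.environ if env is None else env
    # 5000 is the port used in the examples on this page
    port = int(env.get("OPENEDAI_PORT", "5000"))
    # Assumption: HTTPS is used only when both a certificate and a key are configured
    use_tls = bool(env.get("OPENEDAI_CERT_PATH")) and bool(env.get("OPENEDAI_KEY_PATH"))
    scheme = "https" if use_tls else "http"
    return f"{scheme}://{host}:{port}/v1"
```

Passing an explicit dict instead of reading `os.environ` keeps the helper easy to test; in a real script, call `local_api_base()` with no arguments.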

#### Persistent settings with `settings.yaml`

You can also set the following variables in your `settings.yaml` file:

```yaml
openai-embedding_device: cuda
openai-embedding_model: "sentence-transformers/all-mpnet-base-v2"
openai-debug: 1
```

### Third-party application setup

You can usually force an application that uses the OpenAI API to connect to the local API by using the following environment variables:

```shell
OPENAI_API_HOST=http://127.0.0.1:5000
```

or

```shell
OPENAI_API_KEY=sk-111111111111111111111111111111111111111111111111
OPENAI_API_BASE=http://127.0.0.1:5000/v1
```

With the [official Python openai client](https://github.com/openai/openai-python) (v1.x), the address can be set like this:

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-111111111111111111111111111111111111111111111111",
    base_url="http://127.0.0.1:5000/v1"
)

response = client.chat.completions.create(
    model="x",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```

With the [official Node.js openai client](https://github.com/openai/openai-node) (v4.x):

```js
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "http://127.0.0.1:5000/v1",
});

const response = await client.chat.completions.create({
  model: "x",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
```
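
The client calls above wait for the complete reply. `/v1/chat/completions` also supports streaming, as listed in the Compatibility section. The following is a minimal standard-library sketch, assuming the server sends OpenAI-style server-sent events: one `data: {json}` line per chunk, with new text in `choices[0].delta.content`, optionally terminated by `data: [DONE]`. The function names are hypothetical.

```python
import json
import urllib.request


def iter_sse_text(lines):
    """Yield text deltas from an iterable of raw SSE lines (bytes or str)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments (": ping") and other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            text = (choices[0].get("delta") or {}).get("content")
            if text:
                yield text


def stream_chat(prompt, base_url="http://127.0.0.1:5000/v1"):
    """POST a streaming chat request, print the reply as it arrives, and return it."""
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    parts = []
    with urllib.request.urlopen(request) as response:
        # The HTTP response can be iterated line by line
        for text in iter_sse_text(response):
            print(text, end="", flush=True)
            parts.append(text)
    print()
    return "".join(parts)
```

Keeping the SSE parsing in its own generator makes it testable without a running server.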

### Embeddings (alpha)

Embeddings require `sentence-transformers` to be installed; chat and completions work without it. The embeddings endpoint currently uses the Hugging Face model `sentence-transformers/all-mpnet-base-v2`, which is small and fast and produces 768-dimensional embeddings. The default model and embedding size may change in the future.

| Model name        | Dimensions | Max input tokens | Speed (sentences/s) | Size   | Avg. performance |
| ----------------- | ---------- | ---------------- | ------------------- | ------ | ---------------- |
| all-mpnet-base-v2 | 768        | 384              | 2800                | 420 MB | 63.3             |
| all-MiniLM-L6-v2  | 384        | 256              | 14200               | 80 MB  | 58.8             |

In short, all-MiniLM-L6-v2 is about 5x faster and about 5x smaller (80 MB vs. 420 MB), its embeddings take half the storage (384 vs. 768 dimensions), and it still offers good quality. Stats are from the [Sentence-Transformers pretrained models page](https://www.sbert.net/docs/pretrained_models.html). To change the model from the default, set the environment variable `OPENEDAI_EMBEDDING_MODEL`, e.g. `OPENEDAI_EMBEDDING_MODEL=all-MiniLM-L6-v2`.

Warning: You cannot mix embeddings from different models even if they have the same dimensions. They are not comparable.
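
The following is a minimal standard-library sketch of requesting embeddings and comparing them with cosine similarity, assuming the endpoint returns the OpenAI response shape (`data[i].embedding` with an `index` field). The request omits `model`, assuming the server falls back to its configured embedding model. `embed` and `cosine_similarity` are hypothetical helper names. Following the warning above, only compare vectors produced by the same model.

```python
import json
import math
import urllib.request


def embed(texts, base_url="http://127.0.0.1:5000/v1"):
    """Return one embedding vector per input string from the local /v1/embeddings endpoint."""
    # Assumption: without "model", the server uses its configured embedding model
    body = json.dumps({"input": texts}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)["data"]
    # Sort by "index" so the output order matches the input order
    return [item["embedding"] for item in sorted(data, key=lambda item: item["index"])]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension (and come from the same model)")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Usage: `a, b = embed(["cat", "kitten"])`, then `cosine_similarity(a, b)`.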

### Compatibility

| API endpoint              | notes                                                                       |
| ------------------------- | --------------------------------------------------------------------------- |
| /v1/chat/completions      | Use with instruction-following models. Supports streaming, tool calls.      |
| /v1/completions           | Text completion endpoint.                                                   |
| /v1/embeddings            | Using SentenceTransformer embeddings.                                       |
| /v1/images/generations    | Image generation, response_format='b64_json' only.                         |
| /v1/moderations           | Basic support via embeddings.                                               |
| /v1/models                | Lists models. Currently loaded model first.                                 |
| /v1/models/{id}           | Returns model info.                                                         |
| /v1/audio/\*              | Supported.                                                                  |
| /v1/images/edits          | Not yet supported.                                                          |
| /v1/images/variations     | Not yet supported.                                                          |
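
Because `/v1/images/generations` only supports `response_format='b64_json'`, clients must decode the images themselves. The following is a minimal standard-library sketch, assuming an OpenAI-style response (`data[i].b64_json`) and an image model already loaded in the web UI. The request sends only `prompt` and `response_format`, the helper names are hypothetical, and the `.png` file extension is an assumption about the returned format.

```python
import base64
import json
import urllib.request
from pathlib import Path


def decode_images(response_json):
    """Return raw image bytes for every item in an images/generations response."""
    return [base64.b64decode(item["b64_json"]) for item in response_json["data"]]


def generate_images(prompt, out_dir="images", base_url="http://127.0.0.1:5000/v1"):
    """Request images for `prompt` and save them as numbered files in `out_dir`."""
    body = json.dumps({"prompt": prompt, "response_format": "b64_json"}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/images/generations",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        images = decode_images(json.load(response))

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, image_bytes in enumerate(images):
        path = out / f"image_{i}.png"  # assumption: the server returns PNG data
        path.write_bytes(image_bytes)
        paths.append(path)
    return paths
```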

#### Applications

Almost everything needs the `OPENAI_API_KEY` and `OPENAI_API_BASE` environment variables set, but there are some exceptions.

| Compatibility | Application/Library  | Website                                                                        | Notes                                                                                     |
| ------------- | -------------------- | ------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------- |
| ✅❌          | openai-python        | https://github.com/openai/openai-python                                        | Use `OpenAI(base_url="http://127.0.0.1:5000/v1")`. Only the endpoints from above work.   |
| ✅❌          | openai-node          | https://github.com/openai/openai-node                                          | Use `new OpenAI({baseURL: "http://127.0.0.1:5000/v1"})`. See example above.              |
| ✅            | anse                 | https://github.com/anse-app/anse                                               | API Key & URL configurable in UI, Images also work.                                       |
| ✅            | shell_gpt            | https://github.com/TheR1D/shell_gpt                                            | OPENAI_API_HOST=http://127.0.0.1:5000                                                    |
| ✅            | gpt-shell            | https://github.com/jla/gpt-shell                                               | OPENAI_API_BASE=http://127.0.0.1:5000/v1                                                 |
| ✅            | gpt-discord-bot      | https://github.com/openai/gpt-discord-bot                                      | OPENAI_API_BASE=http://127.0.0.1:5000/v1                                                 |
| ✅            | OpenAI for Notepad++ | https://github.com/Krazal/nppopenai                                            | api_url=http://127.0.0.1:5000 in the config file, or environment variables.               |
| ✅            | vscode-openai        | https://marketplace.visualstudio.com/items?itemName=AndrewButson.vscode-openai | OPENAI_API_BASE=http://127.0.0.1:5000/v1                                                 |
| ✅❌          | langchain            | https://github.com/hwchase17/langchain                                         | Use `base_url="http://127.0.0.1:5000/v1"`. Results depend on model and prompt formatting. |


================================================
FILE: docs/13 - Keyboard Shortcuts.md
================================================
# Keyboard Shortcuts

#### General

| Shortcut                | Description                                      |
|-------------------------|--------------------------------------------------|
| Esc                     | Stop generation                                  |

#### Chat tab

| Shortcut                | Description                                      |
|-------------------------|--------------------------------------------------|
| Ctrl + S                | Show/hide chat controls                          |
| Ctrl + Enter            | Regenerate                                       |
| Alt + Enter             | Continue                                         |
| Ctrl + Shift + Backspace | Remove last                                     |
| Ctrl + Shift + M        | Impersonate                                      |
| ← (Left Arrow)          | Navigate to previous version of last assistant message |
| → (Right Arrow)         | Navigate to next version of last assistant message (or regenerate if at latest version) |


================================================
FILE: docs/Image Generation Tutorial.md
================================================
# Image Generation Tutorial

This feature allows you to generate images using `diffusers` models like [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) directly within the web UI.

<img alt="print" src="https://github.com/user-attachments/assets/5108de50-658b-4e93-b2ae-4656d076bc9d" />


## Installation

1. Clone the repository with

```
git clone https://github.com/oobabooga/text-generation-webui
```

or download it from [here](https://github.com/oobabooga/text-generation-webui/archive/refs/heads/main.zip) and unzip it.

2. Use the one-click installer.

- Windows: Double click on `start_windows.bat`
- Linux: Run `./start_linux.sh`
- macOS: Run `./start_macos.sh`

Note: Image generation does not work with the portable `.zip` builds on the [Releases page](https://github.com/oobabooga/text-generation-webui/releases). You need the "full" version of the web UI.

## Downloading a model

1. Once installation ends, browse to `http://127.0.0.1:7860/`.
2. Click on "Image AI" on the left.
3. Click on "Model" at the top.
4. In the "Download model" field, paste `https://huggingface.co/Tongyi-MAI/Z-Image-Turbo` and click "Download".
5. Wait for the download to finish (it's 31 GB).

## Loading the model

Select the quantization option in the "Quantization" menu and click "Load".

The VRAM usage of `Z-Image-Turbo` with each option is:

| Quantization Method | VRAM Usage |
| :--- | :--- |
| None (FP16/BF16) | 25613 MiB |
| bnb-8bit | 16301 MiB |
| bnb-8bit + CPU Offload | 16235 MiB |
| bnb-4bit | 11533 MiB |
| bnb-4bit + CPU Offload | 7677 MiB |
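
If you are unsure which option to choose, one rough rule is to take the least memory-saving option whose measured usage still fits your free VRAM with some headroom. The following is a minimal sketch using the numbers from the table above. The function name and the 1 GiB default headroom are assumptions; actual usage also depends on image resolution, drivers, and anything else loaded on the GPU, such as an LLM.

```python
# Measured VRAM usage for Z-Image-Turbo in MiB (copied from the table above),
# ordered from least to most memory-saving
VRAM_MIB = [
    ("None (FP16/BF16)", 25613),
    ("bnb-8bit", 16301),
    ("bnb-8bit + CPU Offload", 16235),
    ("bnb-4bit", 11533),
    ("bnb-4bit + CPU Offload", 7677),
]


def pick_quantization(free_vram_mib, headroom_mib=1024):
    """Return the first option in VRAM_MIB that fits, or None if none fits."""
    for option, usage in VRAM_MIB:
        if usage + headroom_mib <= free_vram_mib:
            return option
    return None
```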

The `torchao` options support `torch.compile` for faster image generation, with `float8wo` specifically providing native hardware acceleration for RTX 40-series and newer GPUs.

Note: The next time you launch the web UI, the model will get automatically loaded with your last settings when you try to generate an image. You do not need to go to the Model tab and click "Load" each time.

## Generating images

1. While still in the "Image AI" page, go to the "Generate" tab.
2. Type your prompt and click on the Generate button.

### Model-specific settings

- For Z-Image-Turbo, keep CFG Scale at 0 and Steps at 9. Do not write a Negative Prompt: it is ignored at this CFG Scale value.

### LLM Prompt Variations

To use this feature, you need to load an LLM in the main "Model" page on the left.

If you have no idea what to use, do this to get started:

1. Download [Qwen3-4B-Q3_K_M.gguf](https://huggingface.co/unsloth/Qwen3-4B-GGUF/resolve/main/Qwen3-4B-Q3_K_M.gguf) to your `text-generation-webui/user_data/models` folder.
2. Select the mo

SYMBOL INDEX (687 symbols across 55 files)

FILE: download-model.py
  class ModelDownloader (line 32) | class ModelDownloader:
    method __init__ (line 33) | def __init__(self, max_retries=7):
    method get_session (line 39) | def get_session(self):
    method sanitize_model_and_branch_names (line 59) | def sanitize_model_and_branch_names(self, model, branch):
    method get_download_links_from_huggingface (line 80) | def get_download_links_from_huggingface(self, model, branch, text_only...
    method get_output_folder (line 187) | def get_output_folder(self, model, branch, is_lora, is_llamacpp=False,...
    method progress_bar_slots (line 207) | def progress_bar_slots(self):
    method initialize_progress_bar_slots (line 213) | def initialize_progress_bar_slots(self, num_threads):
    method get_progress_bar_position (line 216) | def get_progress_bar_position(self):
    method release_progress_bar_position (line 225) | def release_progress_bar_position(self, slot):
    method get_single_file (line 229) | def get_single_file(self, url, output_folder, start_from_scratch=False):
    method start_download_threads (line 326) | def start_download_threads(self, file_list, output_folder, start_from_...
    method download_model_files (line 339) | def download_model_files(self, model, branch, links, sha256, output_fo...
    method check_model_files (line 363) | def check_model_files(self, model, branch, links, sha256, output_folder):

FILE: js/dark_theme.js
  function toggleDarkMode (line 1) | function toggleDarkMode() {

FILE: js/global_scope_js.js
  function copyToClipboard (line 5) | function copyToClipboard(element) {
  function fallbackCopyToClipboard (line 29) | function fallbackCopyToClipboard(text) {
  function branchHere (line 50) | function branchHere(element) {
  function editHere (line 84) | function editHere(buttonElement) {
  function startEditing (line 108) | function startEditing(messageElement, messageBody, isUserMessage) {
  function createEditingInterface (line 142) | function createEditingInterface(text) {
  function setupEditingHandlers (line 167) | function setupEditingHandlers(textarea, messageElement, originalHTML, me...
  function submitMessageEdit (line 197) | function submitMessageEdit(index, newText, isUserMessage) {
  function navigateVersion (line 220) | function navigateVersion(element, direction) {
  function regenerateClick (line 260) | function regenerateClick() {
  function continueClick (line 264) | function continueClick() {
  function removeLastClick (line 268) | function removeLastClick() {
  function autoScrollToBottom (line 272) | function autoScrollToBottom() {
  function updateInstructPadding (line 284) | function updateInstructPadding() {
  function handleMorphdomUpdate (line 303) | function handleMorphdomUpdate(data) {
  function applyMorphdomUpdate (line 314) | function applyMorphdomUpdate(data) {

FILE: js/katex/auto-render.js
  function r (line 13) | function r(e) {

FILE: js/main.js
  function isModifiedKeyboardEvent (line 52) | function isModifiedKeyboardEvent() {
  function isFocusedOnEditableTextbox (line 60) | function isFocusedOnEditableTextbox() {
  function isElementVisibleOnScreen (line 215) | function isElementVisibleOnScreen(element) {
  function showMenu (line 323) | function showMenu() {
  function hideMenu (line 327) | function hideMenu() {
  function isMouseOverButtonOrMenu (line 358) | function isMouseOverButtonOrMenu() {
  function respondToChatInputVisibility (line 436) | function respondToChatInputVisibility(element, callback) {
  function handleChatInputVisibilityChange (line 450) | function handleChatInputVisibilityChange(isVisible) {
  function addBigPicture (line 464) | function addBigPicture() {
  function deleteBigPicture (line 480) | function deleteBigPicture() {
  function toggleBigPicture (line 487) | function toggleBigPicture() {
  function respondToRenameVisibility (line 513) | function respondToRenameVisibility(element, callback) {
  function handleVisibilityChange (line 528) | function handleVisibilityChange(isVisible) {
  function moveToChatTab (line 602) | function moveToChatTab() {
  function restoreOriginalPosition (line 630) | function restoreOriginalPosition() {
  function handleIndividualSidebarClose (line 735) | function handleIndividualSidebarClose(event) {
  function toggleSidebar (line 754) | function toggleSidebar(sidebar, toggle, forceClose = false) {
  function isMobile (line 787) | function isMobile() {
  function initializeSidebars (line 792) | function initializeSidebars() {
  function createMobileTopBar (line 905) | function createMobileTopBar() {
  function navigateLastAssistantMessage (line 924) | function navigateLastAssistantMessage(direction) {
  constant MAX_PLAIN_TEXT_LENGTH (line 976) | const MAX_PLAIN_TEXT_LENGTH = 2500;
  function setupPasteHandler (line 978) | function setupPasteHandler() {
  function addMiniDeletes (line 1026) | function addMiniDeletes() {
  function updateMargin (line 1112) | function updateMargin() {

FILE: js/save_files.js
  function getCurrentTimestamp (line 2) | function getCurrentTimestamp() {
  function saveFile (line 10) | function saveFile(contents, filename) {
  function saveHistory (line 20) | function saveHistory(history, character, mode) {
  function saveSession (line 35) | function saveSession(session) {

FILE: js/show_controls.js
  function toggle_controls (line 3) | function toggle_controls(value) {

FILE: js/switch_tabs.js
  function scrollToTop (line 1) | function scrollToTop() {
  function findButtonsByText (line 5) | function findButtonsByText(buttonText) {
  function switch_to_chat (line 18) | function switch_to_chat() {
  function switch_to_notebook (line 23) | function switch_to_notebook() {
  function switch_to_generation_parameters (line 29) | function switch_to_generation_parameters() {
  function switch_to_character (line 35) | function switch_to_character() {
  function switch_to_image_ai_generate (line 40) | function switch_to_image_ai_generate() {

FILE: js/update_big_picture.js
  function updateBigPicture (line 1) | function updateBigPicture() {

FILE: modules/LoRA.py
  function add_lora_to_model (line 7) | def add_lora_to_model(lora_names):
  function get_lora_path (line 11) | def get_lora_path(lora_name):
  function add_lora_transformers (line 19) | def add_lora_transformers(lora_names):
  function merge_loras (line 74) | def merge_loras():

FILE: modules/callbacks.py
  class StopNowException (line 8) | class StopNowException(Exception):
  class Iteratorize (line 12) | class Iteratorize:
    method __init__ (line 21) | def __init__(self, func, args=None, kwargs=None, callback=None):
    method __iter__ (line 51) | def __iter__(self):
    method __next__ (line 54) | def __next__(self):
    method __del__ (line 61) | def __del__(self):
    method __enter__ (line 64) | def __enter__(self):
    method __exit__ (line 67) | def __exit__(self, exc_type, exc_val, exc_tb):

FILE: modules/chat.py
  function strftime_now (line 50) | def strftime_now(format):
  function get_current_timestamp (line 54) | def get_current_timestamp():
  function update_message_metadata (line 59) | def update_message_metadata(metadata_dict, role, index, **fields):
  function custom_tojson (line 85) | def custom_tojson(value, indent=None, ensure_ascii=True):
  function _raise_exception (line 93) | def _raise_exception(message):
  function get_compiled_template (line 102) | def get_compiled_template(template_str):
  function str_presenter (line 112) | def str_presenter(dumper, data):
  class _JsonDict (line 128) | class _JsonDict(dict):
    method __str__ (line 136) | def __str__(self):
    method __add__ (line 139) | def __add__(self, other):
    method __radd__ (line 142) | def __radd__(self, other):
  function _deserialize_tool_call_arguments (line 146) | def _deserialize_tool_call_arguments(tool_calls):
  function _expand_tool_sequence (line 172) | def _expand_tool_sequence(tool_seq):
  function generate_chat_prompt (line 215) | def generate_chat_prompt(user_input, state, **kwargs):
  function count_prompt_tokens (line 583) | def count_prompt_tokens(text_input, state):
  function get_stopping_strings (line 627) | def get_stopping_strings(state):
  function add_message_version (line 700) | def add_message_version(history, role, row_idx, is_current=True):
  function add_message_attachment (line 726) | def add_message_attachment(history, row_idx, file_path, is_user=True):
  function extract_pdf_text (line 806) | def extract_pdf_text(pdf_path):
  function extract_docx_text (line 822) | def extract_docx_text(docx_path):
  function generate_search_query (line 869) | def generate_search_query(user_message, state):
  function chatbot_wrapper (line 906) | def chatbot_wrapper(text, state, regenerate=False, _continue=False, load...
  function impersonate_wrapper (line 1147) | def impersonate_wrapper(textbox, state):
  function generate_chat_reply (line 1164) | def generate_chat_reply(text, state, regenerate=False, _continue=False, ...
  function character_is_loaded (line 1176) | def character_is_loaded(state, raise_exception=False):
  function generate_chat_reply_wrapper (line 1187) | def generate_chat_reply_wrapper(text, state, regenerate=False, _continue...
  function remove_last_message (line 1463) | def remove_last_message(history):
  function send_dummy_message (line 1483) | def send_dummy_message(text, state):
  function send_dummy_reply (line 1502) | def send_dummy_reply(text, state):
  function redraw_html (line 1527) | def redraw_html(history, name1, name2, mode, style, character, reset_cac...
  function start_new_chat (line 1531) | def start_new_chat(state, unique_id=None):
  function get_history_file_path (line 1553) | def get_history_file_path(unique_id, character, mode):
  function save_history (line 1562) | def save_history(history, unique_id, character, mode):
  function rename_history (line 1578) | def rename_history(old_id, new_id, character, mode):
  function get_paths (line 1595) | def get_paths(state):
  function find_all_histories (line 1618) | def find_all_histories(state):
  function find_all_histories_with_first_prompts (line 1627) | def find_all_histories_with_first_prompts(state):
  function load_latest_history (line 1671) | def load_latest_history(state):
  function load_history_after_deletion (line 1701) | def load_history_after_deletion(state, idx):
  function update_character_menu_after_deletion (line 1724) | def update_character_menu_after_deletion(idx):
  function get_chat_state_key (line 1732) | def get_chat_state_key(character, mode):
  function load_last_chat_state (line 1740) | def load_last_chat_state():
  function save_last_chat_state (line 1753) | def save_last_chat_state(character, mode, unique_id):
  function load_history (line 1771) | def load_history(unique_id, character, mode):
  function load_history_json (line 1799) | def load_history_json(file, history):
  function delete_history (line 1826) | def delete_history(unique_id, character, mode):
  function replace_character_names (line 1831) | def replace_character_names(text, name1, name2):
  function generate_pfp_cache (line 1836) | def generate_pfp_cache(character):
  function load_character (line 1859) | def load_character(character, name1, name2):
  function restore_character_for_ui (line 1906) | def restore_character_for_ui(state):
  function clear_character_for_ui (line 1926) | def clear_character_for_ui(state):
  function load_character_memoized (line 1944) | def load_character_memoized(character, name1, name2):
  function load_instruction_template_memoized (line 1949) | def load_instruction_template_memoized(template):
  function upload_character (line 1954) | def upload_character(file, img_path, tavern=False):
  function build_pygmalion_style_context (line 1988) | def build_pygmalion_style_context(data):
  function upload_tavern_character (line 2003) | def upload_tavern_character(img_path, _json):
  function check_tavern_character (line 2008) | def check_tavern_character(img_path):
  function upload_your_profile_picture (line 2026) | def upload_your_profile_picture(img_path):
  function generate_character_yaml (line 2041) | def generate_character_yaml(name, greeting, context):
  function generate_instruction_template_yaml (line 2052) | def generate_instruction_template_yaml(instruction_template):
  function save_character (line 2060) | def save_character(name, greeting, context, picture, filename):
  function delete_character (line 2076) | def delete_character(name, instruct=False):
  function generate_user_pfp_cache (line 2087) | def generate_user_pfp_cache(user):
  function load_user (line 2109) | def load_user(user_name, name1, user_bio):
  function generate_user_yaml (line 2149) | def generate_user_yaml(name, user_bio):
  function save_user (line 2159) | def save_user(name, user_bio, picture, filename):
  function delete_user (line 2181) | def delete_user(name):
  function update_user_menu_after_deletion (line 2193) | def update_user_menu_after_deletion(idx):
  function handle_user_menu_change (line 2207) | def handle_user_menu_change(state):
  function handle_save_user_click (line 2226) | def handle_save_user_click(name1):
  function my_yaml_output (line 2235) | def my_yaml_output(data):
  function handle_send_dummy_message_click (line 2249) | def handle_send_dummy_message_click(text, state):
  function handle_send_dummy_reply_click (line 2257) | def handle_send_dummy_reply_click(text, state):
  function handle_remove_last_click (line 2265) | def handle_remove_last_click(state):
  function handle_unique_id_select (line 2273) | def handle_unique_id_select(state):
  function handle_start_new_chat_click (line 2285) | def handle_start_new_chat_click(state):
  function handle_start_incognito_chat_click (line 2301) | def handle_start_incognito_chat_click(state):
  function handle_delete_chat_confirm_click (line 2315) | def handle_delete_chat_confirm_click(state):
  function handle_branch_chat_click (line 2334) | def handle_branch_chat_click(state):
  function handle_edit_message_click (line 2361) | def handle_edit_message_click(state):
  function handle_navigate_version_click (line 2408) | def handle_navigate_version_click(state):
  function handle_rename_chat_click (line 2460) | def handle_rename_chat_click():
  function handle_rename_chat_confirm (line 2468) | def handle_rename_chat_confirm(rename_to, state):
  function handle_search_chat_change (line 2486) | def handle_search_chat_change(state):
  function handle_upload_chat_history (line 2492) | def handle_upload_chat_history(load_chat_history, state):
  function handle_character_menu_change (line 2515) | def handle_character_menu_change(state):
  function handle_character_picture_change (line 2548) | def handle_character_picture_change(picture_path):
  function handle_mode_change (line 2568) | def handle_mode_change(state):
  function handle_save_character_click (line 2595) | def handle_save_character_click(name2):
  function handle_load_template_click (line 2603) | def handle_load_template_click(instruction_template):
  function handle_save_template_click (line 2612) | def handle_save_template_click(instruction_template_str):
  function handle_delete_template_click (line 2625) | def handle_delete_template_click(template):
  function handle_your_picture_change (line 2636) | def handle_your_picture_change(picture, state):
  function handle_send_instruction_click (line 2643) | def handle_send_instruction_click(state):
  function handle_send_chat_click (line 2656) | def handle_send_chat_click(state):

FILE: modules/evaluate.py
  function load_past_evaluations (line 14) | def load_past_evaluations():
  function save_past_evaluations (line 26) | def save_past_evaluations(df):
  function calculate_perplexity (line 34) | def calculate_perplexity(models, input_dataset, stride, _max_length):
  function add_entry_to_past_evaluations (line 136) | def add_entry_to_past_evaluations(perplexity, model, dataset, stride, ma...
  function is_in_past_evaluations (line 151) | def is_in_past_evaluations(model, dataset, stride, max_length):
  function generate_markdown_table (line 163) | def generate_markdown_table():

FILE: modules/exllamav3.py
  class LogitBiasFilter (line 41) | class LogitBiasFilter(Filter):
    method __init__ (line 44) | def __init__(self, tokenizer, logit_bias_dict):
    method reset (line 49) | def reset(self): pass
    method accept_token (line 50) | def accept_token(self, token): pass
    method is_completed (line 51) | def is_completed(self): return False
    method use_background_worker (line 52) | def use_background_worker(self): return False
    method get_next_logit_mask (line 54) | def get_next_logit_mask(self):
  class ConcurrentGenerator (line 64) | class ConcurrentGenerator:
    method __init__ (line 65) | def __init__(self, generator):
    method _iterate_loop (line 74) | def _iterate_loop(self):
    method submit (line 101) | def submit(self, job) -> queue.Queue:
    method cancel (line 109) | def cancel(self, job):
    method stop (line 116) | def stop(self):
  class Exllamav3Model (line 122) | class Exllamav3Model:
    method __init__ (line 123) | def __init__(self):
    method device (line 127) | def device(self) -> torch.device:
    method from_pretrained (line 131) | def from_pretrained(cls, path_to_model):
    method is_multimodal (line 257) | def is_multimodal(self) -> bool:
    method _process_images_for_generation (line 261) | def _process_images_for_generation(self, prompt: str, state: dict) -> ...
    method generate_with_streaming (line 316) | def generate_with_streaming(self, prompt, state):
    method _capture_logprobs (line 477) | def _capture_logprobs(self, result):
    method generate (line 496) | def generate(self, prompt, state):
    method get_logits (line 503) | def get_logits(self, token_ids, **kwargs):
    method encode (line 540) | def encode(self, string, **kwargs):
    method decode (line 544) | def decode(self, ids, **kwargs):
    method last_prompt_token_count (line 551) | def last_prompt_token_count(self):
    method unload (line 554) | def unload(self):

FILE: modules/exllamav3_hf.py
  class Exllamav3HF (line 28) | class Exllamav3HF(PreTrainedModel, GenerationMixin):
    method __init__ (line 29) | def __init__(self, model_dir):
    method _validate_model_class (line 94) | def _validate_model_class(self):
    method _validate_model_kwargs (line 97) | def _validate_model_kwargs(self, model_kwargs: Dict[str, Any]):
    method prepare_inputs_for_generation (line 100) | def prepare_inputs_for_generation(self, input_ids, **kwargs):
    method device (line 104) | def device(self) -> torch.device:
    method __call__ (line 107) | def __call__(self, *args, **kwargs):
    method from_pretrained (line 253) | def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union...
    method unload (line 262) | def unload(self):

FILE: modules/extensions.py
  function apply_settings (line 17) | def apply_settings(extension, name):
  function load_extensions (line 28) | def load_extensions():
  function iterator (line 83) | def iterator():
  function _apply_string_extensions (line 90) | def _apply_string_extensions(function_name, text, state, is_chat=False):
  function _apply_chat_input_extensions (line 121) | def _apply_chat_input_extensions(text, visible_text, state):
  function _apply_custom_generate_chat_prompt (line 130) | def _apply_custom_generate_chat_prompt(text, state, **kwargs):
  function _apply_state_modifier_extensions (line 139) | def _apply_state_modifier_extensions(state):
  function _apply_history_modifier_extensions (line 148) | def _apply_history_modifier_extensions(history):
  function _apply_tokenizer_extensions (line 157) | def _apply_tokenizer_extensions(function_name, state, prompt, input_ids,...
  function _apply_logits_processor_extensions (line 167) | def _apply_logits_processor_extensions(function_name, processor_list, in...
  function _apply_custom_tokenized_length (line 179) | def _apply_custom_tokenized_length(prompt):
  function _apply_custom_generate_reply (line 188) | def _apply_custom_generate_reply():
  function _apply_custom_css (line 196) | def _apply_custom_css():
  function _apply_custom_js (line 205) | def _apply_custom_js():
  function create_extensions_block (line 214) | def create_extensions_block():
  function create_extensions_tabs (line 229) | def create_extensions_tabs():
  function apply_extensions (line 255) | def apply_extensions(typ, *args, **kwargs):

FILE: modules/grammar/grammar_utils.py
  class ParseState (line 36) | class ParseState:
    method __init__ (line 37) | def __init__(self):
  function get_symbol_id (line 42) | def get_symbol_id(state, src):
  function generate_symbol_id (line 48) | def generate_symbol_id(state, base_name):
  function is_word_char (line 54) | def is_word_char(c):
  function hex_to_int (line 58) | def hex_to_int(c):
  function remove_leading_white_space (line 66) | def remove_leading_white_space(src, newline_ok):
  function parse_name (line 94) | def parse_name(src):
  function read_hex (line 103) | def read_hex(s):
  function parse_char (line 110) | def parse_char(src):
  function parse_sequence (line 142) | def parse_sequence(state, src, rule_name, outbuf, is_nested):
  function parse_alternates (line 249) | def parse_alternates(state, src, rule_name, rule_id, is_nested):
  function parse_rule (line 262) | def parse_rule(state, src):
  function parse_ebnf (line 282) | def parse_ebnf(src):
  function print_rule (line 300) | def print_rule(file, grammar_encoding, index, symbol_id_names):
  function print_grammar (line 333) | def print_grammar(file, state):
  class GrammarConstraint (line 353) | class GrammarConstraint(ABC):
    method __init__ (line 354) | def __init__(self, grammar_str, start_rule_name, tokenizer):
    method init_stacks (line 389) | def init_stacks(self):
    method advance_stack (line 404) | def advance_stack(self, stack):
    method accept_char (line 449) | def accept_char(self, *args, **kwargs):
    method accept_token_id (line 453) | def accept_token_id(self, *args, **kwargs):
    method filter_vocab (line 457) | def filter_vocab(self, *args, **kwargs):
  class IncrementalGrammarConstraint (line 461) | class IncrementalGrammarConstraint(GrammarConstraint):
    method __init__ (line 462) | def __init__(self, grammar_str, start_rule_name, tokenizer):
    method accept_char (line 465) | def accept_char(self, char, stacks):
    method accept_string (line 497) | def accept_string(self, string: str, stacks: List[List[int]]):
    method accept_token_id (line 502) | def accept_token_id(self, token_id: int, stacks: List[List[int]]):
    method accept_token_ids (line 520) | def accept_token_ids(self, token_ids: List[int], stacks: List[List[int...
    method batch_filter_vocab (line 529) | def batch_filter_vocab(self, batch_stacks, device):
    method filter_vocab (line 535) | def filter_vocab(self, stacks, device):
    method pos_char_acceptance (line 551) | def pos_char_acceptance(self, pos, char):
    method token_acceptance_for_stack (line 571) | def token_acceptance_for_stack(self, stack, device):
  class StaticGrammarConstraint (line 617) | class StaticGrammarConstraint(GrammarConstraint):
    method __init__ (line 618) | def __init__(self, grammar_str, start_rule_name, tokenizer):
    method accept_char (line 621) | def accept_char(self):
  class TokenTrie (line 633) | class TokenTrie:
    method __init__ (line 634) | def __init__(self, tokenizer):
    method id2str (line 640) | def id2str(self, token_id):
    method __len__ (line 643) | def __len__(self):
    method load_tokens (line 646) | def load_tokens(self, tokenizer):
    method insert_into_trie (line 689) | def insert_into_trie(self, trie, token_bytes, token_id):
  function initialize_grammar (line 699) | def initialize_grammar(grammar_string):

FILE: modules/grammar/logits_process.py
  class GrammarConstrainedLogitsProcessor (line 31) | class GrammarConstrainedLogitsProcessor(LogitsProcessor):
    method __init__ (line 32) | def __init__(self, grammar_constraint):
    method filter_logits (line 37) | def filter_logits(self, logits, device):
    method process_logits (line 47) | def process_logits(self, input_ids, scores, parse_start_index=None):
    method __call__ (line 103) | def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTen...

FILE: modules/html_generator.py
  function minify_css (line 21) | def minify_css(css: str) -> str:
  function fix_newlines (line 71) | def fix_newlines(string):
  function replace_quotes (line 78) | def replace_quotes(text):
  function replace_blockquote (line 108) | def replace_blockquote(m):
  function extract_thinking_block (line 112) | def extract_thinking_block(string):
  function build_tool_call_block (line 118) | def build_tool_call_block(header, body, message_id, index):
  function build_thinking_block (line 146) | def build_thinking_block(thinking_content, message_id, has_remaining_con...
  function build_main_content_block (line 172) | def build_main_content_block(content):
  function process_markdown_content (line 180) | def process_markdown_content(string):
  function convert_to_markdown (line 327) | def convert_to_markdown(string, message_id=None):
  function convert_to_markdown_wrapped (line 401) | def convert_to_markdown_wrapped(string, message_id=None, use_cache=True):
  function generate_basic_html (line 412) | def generate_basic_html(string):
  function make_thumbnail (line 419) | def make_thumbnail(image):
  function get_image_cache (line 427) | def get_image_cache(path):
  function format_message_timestamp (line 468) | def format_message_timestamp(history, role, index, tooltip_include_times...
  function format_message_attachments (line 480) | def format_message_attachments(history, role, index):
  function get_message_tooltip (line 518) | def get_message_tooltip(history, role, index, include_timestamp=True):
  function get_version_navigation_html (line 535) | def get_version_navigation_html(history, i, role):
  function actions_html (line 560) | def actions_html(history, i, role, info_message=""):
  function generate_instruct_html (line 590) | def generate_instruct_html(history, last_message_only=False):
  function get_character_image_with_cache_buster (line 648) | def get_character_image_with_cache_buster():
  function generate_cai_chat_html (line 658) | def generate_cai_chat_html(history, name1, name2, style, character, rese...
  function time_greeting (line 722) | def time_greeting():
  function chat_html_wrapper (line 732) | def chat_html_wrapper(history, name1, name2, mode, style, character, res...

FILE: modules/image_models.py
  function get_quantization_config (line 8) | def get_quantization_config(quant_method):
  function get_pipeline_type (line 87) | def get_pipeline_type(pipe):
  function load_image_model (line 103) | def load_image_model(model_name, dtype='bfloat16', attn_backend='sdpa', ...
  function unload_image_model (line 187) | def unload_image_model():

FILE: modules/image_utils.py
  function open_image_safely (line 12) | def open_image_safely(path):
  function convert_pil_to_base64 (line 26) | def convert_pil_to_base64(image: Image.Image) -> str:
  function decode_base64_image (line 35) | def decode_base64_image(base64_string: str) -> Image.Image:
  function process_message_content (line 49) | def process_message_content(content: Any) -> Tuple[str, List[Image.Image]]:
  function convert_image_attachments_to_pil (line 107) | def convert_image_attachments_to_pil(image_attachments: List[dict]) -> L...
  function convert_openai_messages_to_images (line 122) | def convert_openai_messages_to_images(messages: List[dict]) -> List[Imag...

FILE: modules/llama_cpp_server.py
  class LlamaServer (line 28) | class LlamaServer:
    method __init__ (line 29) | def __init__(self, model_path, server_path=None):
    method encode (line 46) | def encode(self, text, add_bos_token=False, **kwargs):
    method decode (line 60) | def decode(self, token_ids, **kwargs):
    method prepare_payload (line 70) | def prepare_payload(self, state):
    method _process_images_for_generation (line 154) | def _process_images_for_generation(self, state: dict) -> List[Any]:
    method is_multimodal (line 171) | def is_multimodal(self) -> bool:
    method generate_with_streaming (line 175) | def generate_with_streaming(self, prompt, state):
    method generate (line 272) | def generate(self, prompt, state):
    method get_logits (line 279) | def get_logits(self, prompt, state, n_probs=128, use_samplers=False):
    method _get_vocabulary_size (line 313) | def _get_vocabulary_size(self):
    method _get_bos_token (line 323) | def _get_bos_token(self):
    method _is_port_available (line 335) | def _is_port_available(self, port):
    method _find_available_port (line 344) | def _find_available_port(self):
    method _start_server (line 355) | def _start_server(self):
    method __enter__ (line 514) | def __enter__(self):
    method __exit__ (line 518) | def __exit__(self, exc_type, exc_val, exc_tb):
    method __del__ (line 522) | def __del__(self):
    method stop (line 526) | def stop(self):
  function filter_stderr_with_progress (line 539) | def filter_stderr_with_progress(process_stderr):

FILE: modules/loaders.py
  function transformers_samplers (line 79) | def transformers_samplers():
  function list_all_samplers (line 265) | def list_all_samplers():
  function blacklist_samplers (line 274) | def blacklist_samplers(loader, dynamic_temperature):
  function get_all_params (line 292) | def get_all_params():
  function list_model_elements (line 301) | def list_model_elements():
  function make_loader_params_visible (line 351) | def make_loader_params_visible(loader):

FILE: modules/logging_colors.py
  function setup_logging (line 6) | def setup_logging():

FILE: modules/logits.py
  function get_next_logits (line 15) | def get_next_logits(*args, **kwargs):
  function _get_next_logits (line 36) | def _get_next_logits(prompt, state, use_samplers, previous, top_logits=2...

FILE: modules/metadata_gguf.py
  class GGUFValueType (line 5) | class GGUFValueType(IntEnum):
  function get_single (line 50) | def get_single(value_type, file):
  function load_metadata (line 66) | def load_metadata(fname):

FILE: modules/models.py
  function load_model (line 12) | def load_model(model_name, loader=None):
  function llama_cpp_server_loader (line 74) | def llama_cpp_server_loader(model_name):
  function transformers_loader (line 97) | def transformers_loader(model_name):
  function ExLlamav3_HF_loader (line 102) | def ExLlamav3_HF_loader(model_name):
  function ExLlamav3_loader (line 108) | def ExLlamav3_loader(model_name):
  function TensorRT_LLM_loader (line 115) | def TensorRT_LLM_loader(model_name):
  function unload_model (line 125) | def unload_model(keep_model_name=False):
  function reload_model (line 149) | def reload_model():
  function unload_model_if_idle (line 154) | def unload_model_if_idle():

FILE: modules/models_settings.py
  function get_fallback_settings (line 14) | def get_fallback_settings():
  function get_model_metadata (line 24) | def get_model_metadata(model):
  function infer_loader (line 184) | def infer_loader(model_name, model_settings, hf_quant_method=None):
  function update_model_parameters (line 204) | def update_model_parameters(state, initial=False):
  function apply_model_settings_to_state (line 224) | def apply_model_settings_to_state(model, state):
  function save_model_settings (line 258) | def save_model_settings(model, state):
  function save_instruction_template (line 285) | def save_instruction_template(model, template):
  function load_gguf_metadata_with_cache (line 317) | def load_gguf_metadata_with_cache(model_file):
  function get_model_size_mb (line 321) | def get_model_size_mb(model_file: Path) -> float:
  function estimate_vram (line 339) | def estimate_vram(gguf_file, gpu_layers, ctx_size, cache_type):
  function update_gpu_layers_and_vram (line 390) | def update_gpu_layers_and_vram(loader, model, gpu_layers, ctx_size, cach...
  function load_instruction_template (line 402) | def load_instruction_template(template):
  function _jinja_template_from_old_format (line 421) | def _jinja_template_from_old_format(params, verbose=False):

FILE: modules/paths.py
  function resolve_user_data_dir (line 5) | def resolve_user_data_dir():

FILE: modules/presets.py
  function default_preset (line 54) | def default_preset():
  function presets_params (line 65) | def presets_params():
  function load_preset (line 69) | def load_preset(name, verbose=False):
  function load_preset_memoized (line 90) | def load_preset_memoized(name):
  function load_preset_for_ui (line 94) | def load_preset_for_ui(name, state):
  function reset_preset_for_ui (line 100) | def reset_preset_for_ui(name, state):
  function neutralize_samplers_for_ui (line 107) | def neutralize_samplers_for_ui(state):
  function loader_contains (line 114) | def loader_contains(sampler):
  function remove_defaults (line 121) | def remove_defaults(state):
  function generate_preset_yaml (line 132) | def generate_preset_yaml(state):

FILE: modules/prompts.py
  function load_prompt (line 7) | def load_prompt(fname):
  function count_tokens (line 32) | def count_tokens(text):

FILE: modules/reasoning.py
  function extract_reasoning (line 16) | def extract_reasoning(text, html_escaped=False):

FILE: modules/sampler_hijack.py
  class TemperatureLogitsWarperCustom (line 24) | class TemperatureLogitsWarperCustom(LogitsProcessor):
    method __init__ (line 29) | def __init__(self, temperature: float):
    method __call__ (line 42) | def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTen...
  class DynamicTemperatureLogitsWarper (line 47) | class DynamicTemperatureLogitsWarper(LogitsProcessor):
    method __init__ (line 52) | def __init__(self, dynatemp_low: float, dynatemp_high: float, dynatemp...
    method __call__ (line 57) | def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTen...
  class QuadraticSamplingLogitsWarper (line 105) | class QuadraticSamplingLogitsWarper(LogitsProcessor):
    method __init__ (line 110) | def __init__(self, smoothing_factor, smoothing_curve):
    method __call__ (line 114) | def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTen...
  class TailFreeLogitsWarper (line 132) | class TailFreeLogitsWarper(LogitsProcessor):
    method __init__ (line 133) | def __init__(self, tfs: float, filter_value: float = -float("Inf"), mi...
    method __call__ (line 141) | def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTen...
  class TopALogitsWarper (line 172) | class TopALogitsWarper(LogitsProcessor):
    method __init__ (line 173) | def __init__(self, top_a: float, filter_value: float = -float("Inf"), ...
    method __call__ (line 181) | def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTen...
  class TopNSigmaLogitsWarper (line 198) | class TopNSigmaLogitsWarper(LogitsProcessor):
    method __init__ (line 199) | def __init__(self, n_sigma: float = 2.0, filter_value: float = -float(...
    method __call__ (line 214) | def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTen...
  class AdaptivePLogitsWarper (line 238) | class AdaptivePLogitsWarper(LogitsProcessor):
    method __init__ (line 251) | def __init__(self, adaptive_target, adaptive_decay, filter_value=-floa...
    method __call__ (line 265) | def __call__(self, input_ids, scores):
  class XTCLogitsWarper (line 306) | class XTCLogitsWarper(LogitsProcessor):
    method __init__ (line 307) | def __init__(self, threshold: float, probability: float, filter_value:...
    method __call__ (line 318) | def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTen...
  class DRYLogitsProcessor (line 352) | class DRYLogitsProcessor(LogitsProcessor):
    method __init__ (line 353) | def __init__(self, multiplier: float, base: float, allowed_length: int...
    method __call__ (line 360) | def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTen...
  class MirostatLogitsWarper (line 424) | class MirostatLogitsWarper(LogitsProcessor):
    method __init__ (line 425) | def __init__(self, mirostat_mode: int, mirostat_tau: float, mirostat_e...
    method __call__ (line 437) | def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTen...
  class SpyLogitsWarper (line 473) | class SpyLogitsWarper(LogitsProcessor):
    method __init__ (line 474) | def __init__(self):
    method __call__ (line 477) | def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTen...
  class RepetitionPenaltyLogitsProcessorWithRange (line 483) | class RepetitionPenaltyLogitsProcessorWithRange(LogitsProcessor):
    method __init__ (line 484) | def __init__(self, penalty: float, _range: int):
    method apply_repetition_penalty (line 490) | def apply_repetition_penalty(self, input_ids_row, scores_row):
    method __call__ (line 499) | def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTen...
  class PresencePenaltyLogitsProcessor (line 507) | class PresencePenaltyLogitsProcessor(LogitsProcessor):
    method __init__ (line 508) | def __init__(self, presence_penalty: float, _range: int):
    method apply_presence_penalty (line 512) | def apply_presence_penalty(self, input_ids_row, scores_row):
    method __call__ (line 521) | def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTen...
  class FrequencyPenaltyLogitsProcessor (line 528) | class FrequencyPenaltyLogitsProcessor(LogitsProcessor):
    method __init__ (line 529) | def __init__(self, frequency_penalty: float, _range: int):
    method apply_frequency_penalty (line 533) | def apply_frequency_penalty(self, input_ids_row, scores_row):
    method __call__ (line 542) | def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTen...
  function get_logits_processor_patch (line 549) | def get_logits_processor_patch(self, **kwargs):
  function generation_config_init_patch (line 756) | def generation_config_init_patch(self, **kwargs):
  function hijack_samplers (line 786) | def hijack_samplers():

FILE: modules/sane_markdown_lists.py
  class SaneListIndentProcessor (line 42) | class SaneListIndentProcessor(ListIndentProcessor):
    method __init__ (line 54) | def __init__(self, *args):
    method test (line 58) | def test(self, parent: etree.Element, block: str) -> bool:
    method get_level (line 64) | def get_level(self, parent: etree.Element, block: str) -> tuple[int, e...
    method detab (line 91) | def detab(self, text: str, length: int | None = None) -> tuple[str, str]:
    method looseDetab (line 106) | def looseDetab(self, text: str, level: int = 1) -> str:
  class SaneOListProcessor (line 115) | class SaneOListProcessor(OListProcessor):
    method __init__ (line 123) | def __init__(self, parser: blockparser.BlockParser):
    method run (line 134) | def run(self, parent: etree.Element, blocks: list[str]) -> None:
    method looseDetab (line 193) | def looseDetab(self, text: str, indent_length: int, level: int = 1) ->...
    method get_items (line 201) | def get_items(self, block: str) -> list[str]:
  class SaneUListProcessor (line 231) | class SaneUListProcessor(SaneOListProcessor):
    method __init__ (line 238) | def __init__(self, parser: blockparser.BlockParser):
    method get_items (line 245) | def get_items(self, block: str) -> list[str]:
  class SaneParagraphProcessor (line 269) | class SaneParagraphProcessor(ParagraphProcessor):
    method __init__ (line 272) | def __init__(self, parser: BlockParser):
    method run (line 277) | def run(self, parent: etree.Element, blocks: list[str]) -> None:
  class SaneListExtension (line 320) | class SaneListExtension(Extension):
    method extendMarkdown (line 323) | def extendMarkdown(self, md):
  function makeExtension (line 334) | def makeExtension(**kwargs):  # pragma: no cover

FILE: modules/shared.py
  function do_cmd_flags_warnings (line 374) | def do_cmd_flags_warnings():
  function apply_image_model_cli_overrides (line 405) | def apply_image_model_cli_overrides():
  function fix_loader_name (line 421) | def fix_loader_name(name):
  function add_extension (line 438) | def add_extension(name, last=False):
  function is_chat (line 448) | def is_chat():
  function load_user_config (line 452) | def load_user_config():

FILE: modules/tensorrt_llm.py
  class TensorRTLLMModel (line 10) | class TensorRTLLMModel:
    method __init__ (line 11) | def __init__(self):
    method from_pretrained (line 15) | def from_pretrained(cls, path_to_model):
    method generate_with_streaming (line 28) | def generate_with_streaming(self, prompt, state):
    method generate (line 61) | def generate(self, prompt, state):
    method unload (line 68) | def unload(self):

FILE: modules/text_generation.py
  function generate_reply (line 20) | def generate_reply(*args, **kwargs):
  function _generate_reply (line 44) | def _generate_reply(question, state, stopping_strings=None, is_chat=Fals...
  function encode (line 128) | def encode(prompt, add_special_tokens=True, add_bos_token=True, truncati...
  function decode (line 178) | def decode(output_ids, skip_special_tokens=True):
  function get_encoded_length (line 185) | def get_encoded_length(prompt):
  function get_token_ids (line 193) | def get_token_ids(prompt):
  function get_max_prompt_length (line 204) | def get_max_prompt_length(state):
  function generate_reply_wrapper (line 208) | def generate_reply_wrapper(question, state, stopping_strings=None):
  function formatted_outputs (line 222) | def formatted_outputs(reply, model_name):
  function set_manual_seed (line 226) | def set_manual_seed(seed):
  function stop_everything_event (line 246) | def stop_everything_event():
  function apply_stopping_strings (line 250) | def apply_stopping_strings(reply, all_stop_strings):
  function get_reply_from_output_ids (line 275) | def get_reply_from_output_ids(output_ids, state=None, starting_from=0):
  function generate_reply_HF (line 300) | def generate_reply_HF(question, original_question, state, stopping_strin...
  function generate_reply_custom (line 489) | def generate_reply_custom(question, original_question, state, stopping_s...
  function print_prompt (line 528) | def print_prompt(prompt, max_chars=-1):

FILE: modules/tool_parsing.py
  function get_tool_call_id (line 6) | def get_tool_call_id() -> str:
  function streaming_tool_buffer_check (line 27) | def streaming_tool_buffer_check(text, markers=None, tool_names=None, che...
  function check_and_sanitize_tool_call_candidate (line 70) | def check_and_sanitize_tool_call_candidate(candidate_dict: dict, tool_na...
  function _extract_balanced_json (line 90) | def _extract_balanced_json(text: str, start: int) -> str | None:
  function _parse_channel_tool_calls (line 123) | def _parse_channel_tool_calls(answer: str, tool_names: list[str]):
  function _parse_mistral_token_tool_calls (line 166) | def _parse_mistral_token_tool_calls(answer: str, tool_names: list[str]):
  function _parse_bare_name_tool_calls (line 200) | def _parse_bare_name_tool_calls(answer: str, tool_names: list[str]):
  function _parse_xml_param_tool_calls (line 241) | def _parse_xml_param_tool_calls(answer: str, tool_names: list[str]):
  function _parse_kimi_tool_calls (line 282) | def _parse_kimi_tool_calls(answer: str, tool_names: list[str]):
  function _parse_minimax_tool_calls (line 320) | def _parse_minimax_tool_calls(answer: str, tool_names: list[str]):
  function _parse_deep_seek_tool_calls (line 361) | def _parse_deep_seek_tool_calls(answer: str, tool_names: list[str]):
  function _parse_glm_tool_calls (line 397) | def _parse_glm_tool_calls(answer: str, tool_names: list[str]):
  function _parse_pythonic_tool_calls (line 441) | def _parse_pythonic_tool_calls(answer: str, tool_names: list[str]):
  function detect_tool_call_format (line 560) | def detect_tool_call_format(template_str):
  function parse_tool_call (line 595) | def parse_tool_call(answer: str, tool_names: list[str], return_prefix: b...

FILE: modules/tool_use.py
  function get_available_tools (line 9) | def get_available_tools():
  function load_tools (line 16) | def load_tools(selected_names):
  function execute_tool (line 58) | def execute_tool(func_name, arguments, executors):

FILE: modules/torch_utils.py
  function get_device (line 10) | def get_device():
  function clear_torch_cache (line 25) | def clear_torch_cache():

FILE: modules/training.py
  function create_ui (line 36) | def create_ui():
  function do_interrupt (line 188) | def do_interrupt():
  function do_copy_params (line 193) | def do_copy_params(lora_name: str, *args):
  function change_rank_limit (line 212) | def change_rank_limit(use_higher_ranks: bool):
  function clean_path (line 217) | def clean_path(base_path: str, path: str):
  function get_instruction_templates (line 226) | def get_instruction_templates():
  function load_template (line 235) | def load_template(name):
  function backup_adapter (line 249) | def backup_adapter(input_folder):
  function calc_trainable_parameters (line 280) | def calc_trainable_parameters(model):
  function do_train (line 296) | def do_train(lora_name: str, always_override: bool, all_linear: bool, q_...
  function format_time (line 841) | def format_time(seconds: float):

FILE: modules/transformers_loader.py
  class _StopEverythingStoppingCriteria (line 27) | class _StopEverythingStoppingCriteria(transformers.StoppingCriteria):
    method __init__ (line 28) | def __init__(self):
    method __call__ (line 31) | def __call__(self, input_ids: torch.LongTensor, _scores: torch.FloatTe...
  class Stream (line 35) | class Stream(transformers.StoppingCriteria):
    method __init__ (line 36) | def __init__(self, callback_func=None):
    method __call__ (line 39) | def __call__(self, input_ids, scores) -> bool:
  class LogitsBiasProcessor (line 46) | class LogitsBiasProcessor(LogitsProcessor):
    method __init__ (line 47) | def __init__(self, logit_bias={}):
    method __call__ (line 54) | def __call__(self, input_ids: torch.LongTensor, logits: torch.FloatTen...
    method __repr__ (line 60) | def __repr__(self):
  class LogprobProcessor (line 64) | class LogprobProcessor(LogitsProcessor):
    method __init__ (line 65) | def __init__(self, logprobs=None):
    method __call__ (line 70) | def __call__(self, input_ids: torch.LongTensor, logits: torch.FloatTen...
    method __repr__ (line 81) | def __repr__(self):
  function load_tokenizer (line 85) | def load_tokenizer(model_name, tokenizer_dir=None):
  function load_model_HF (line 105) | def load_model_HF(model_name):
  function get_max_memory_dict (line 209) | def get_max_memory_dict():

FILE: modules/ui.py
  function list_model_elements (line 122) | def list_model_elements():
  function list_interface_input_elements (line 127) | def list_interface_input_elements():
  function gather_interface_values (line 263) | def gather_interface_values(*args):
  function apply_interface_values (line 283) | def apply_interface_values(state, use_persistent=False):
  function save_settings (line 300) | def save_settings(state, preset, extensions_list, show_controls, theme_s...
  function store_current_state_and_debounce (line 365) | def store_current_state_and_debounce(interface_state, preset, extensions...
  function _perform_debounced_save (line 388) | def _perform_debounced_save():
  function setup_auto_save (line 406) | def setup_auto_save():
  function create_refresh_button (line 532) | def create_refresh_button(refresh_component, refresh_method, refreshed_a...

FILE: modules/ui_chat.py
  function create_ui (line 17) | def create_ui():
  function create_character_settings_ui (line 138) | def create_character_settings_ui():
  function create_chat_settings_ui (line 197) | def create_chat_settings_ui():
  function create_event_handlers (line 222) | def create_event_handlers():

FILE: modules/ui_default.py
  function create_ui (line 19) | def create_ui():
  function create_event_handlers (line 76) | def create_event_handlers():
  function handle_new_prompt (line 158) | def handle_new_prompt():
  function handle_delete_prompt_confirm_default (line 169) | def handle_delete_prompt_confirm_default(prompt_name):
  function handle_rename_prompt_click_default (line 192) | def handle_rename_prompt_click_default(current_name):
  function handle_rename_prompt_confirm_default (line 201) | def handle_rename_prompt_confirm_default(new_name, current_name):

FILE: modules/ui_file_saving.py
  function create_ui (line 9) | def create_ui():
  function create_event_handlers (line 70) | def create_event_handlers():
  function handle_save_preset_confirm_click (line 98) | def handle_save_preset_confirm_click(filename, contents):
  function handle_save_confirm_click (line 114) | def handle_save_confirm_click(root_state, filename, contents):
  function handle_delete_confirm_click (line 127) | def handle_delete_confirm_click(root_state, filename):
  function handle_save_character_confirm_click (line 140) | def handle_save_character_confirm_click(name2, greeting, context, charac...
  function handle_delete_character_confirm_click (line 155) | def handle_delete_character_confirm_click(character):
  function handle_save_preset_click (line 170) | def handle_save_preset_click(state):
  function handle_delete_preset_click (line 179) | def handle_delete_preset_click(preset):
  function handle_save_grammar_click (line 189) | def handle_save_grammar_click(grammar_string):
  function handle_delete_grammar_click (line 200) | def handle_delete_grammar_click(grammar_file):
  function handle_save_user_confirm_click (line 210) | def handle_save_user_confirm_click(name1, user_bio, your_picture, filena...
  function handle_delete_user_confirm_click (line 225) | def handle_delete_user_confirm_click(user):

FILE: modules/ui_image_generation.py
  function round_to_step (line 51) | def round_to_step(value, step=STEP):
  function clamp (line 55) | def clamp(value, min_val, max_val):
  function apply_aspect_ratio (line 59) | def apply_aspect_ratio(aspect_ratio, current_width, current_height):
  function update_height_from_width (line 82) | def update_height_from_width(width, aspect_ratio):
  function update_width_from_height (line 93) | def update_width_from_height(height, aspect_ratio):
  function swap_dimensions_and_update_ratio (line 104) | def swap_dimensions_and_update_ratio(width, height, aspect_ratio):
  function build_generation_metadata (line 120) | def build_generation_metadata(state, actual_seed):
  function save_generated_images (line 135) | def save_generated_images(images, state, actual_seed):
  function read_image_metadata (line 164) | def read_image_metadata(image_path):
  function format_metadata_for_display (line 180) | def format_metadata_for_display(metadata):
  function get_all_history_images (line 213) | def get_all_history_images(force_refresh=False):
  function get_paginated_images (line 240) | def get_paginated_images(page=0, force_refresh=False):
  function get_initial_page_info (line 257) | def get_initial_page_info():
  function refresh_gallery (line 263) | def refresh_gallery(current_page=0):
  function go_to_page (line 270) | def go_to_page(page_num, current_page):
  function next_page (line 282) | def next_page(current_page):
  function prev_page (line 289) | def prev_page(current_page):
  function on_gallery_select (line 296) | def on_gallery_select(evt: gr.SelectData, current_page):
  function send_to_generate (line 321) | def send_to_generate(selected_image_path):
  function read_dropped_image_metadata (line 346) | def read_dropped_image_metadata(image_path):
  function create_ui (line 355) | def create_ui():
  function create_event_handlers (line 537) | def create_event_handlers():
  function generate_prompt_variation (line 692) | def generate_prompt_variation(state):
  function progress_bar_html (line 748) | def progress_bar_html(progress=0, text=""):
  function generate (line 761) | def generate(state, save_images=True):
  function load_image_model_wrapper (line 926) | def load_image_model_wrapper(model_name, dtype, attn_backend, cpu_offloa...
  function unload_image_model_wrapper (line 953) | def unload_image_model_wrapper():
  function download_image_model_wrapper (line 961) | def download_image_model_wrapper(model_path):

FILE: modules/ui_model_menu.py
  function create_ui (line 26) | def create_ui():
  function create_event_handlers (line 143) | def create_event_handlers():
  function load_model_wrapper (line 199) | def load_model_wrapper(selected_model, loader, autoload=False):
  function load_lora_wrapper (line 231) | def load_lora_wrapper(selected_loras):
  function download_model_wrapper (line 237) | def download_model_wrapper(repo_id, specific_file, progress=gr.Progress(...
  function update_truncation_length (line 385) | def update_truncation_length(current_length, state):
  function get_initial_vram_info (line 397) | def get_initial_vram_info():
  function get_initial_gpu_layers_max (line 410) | def get_initial_gpu_layers_max():
  function handle_load_model_event_initial (line 418) | def handle_load_model_event_initial(model, state):
  function handle_load_model_event_final (line 427) | def handle_load_model_event_final(truncation_length, loader, state):
  function handle_unload_model_click (line 432) | def handle_unload_model_click():
  function format_file_size (line 437) | def format_file_size(size_bytes):

FILE: modules/ui_notebook.py
  function create_ui (line 25) | def create_ui():
  function create_event_handlers (line 83) | def create_event_handlers():
  function generate_and_save_wrapper_notebook (line 171) | def generate_and_save_wrapper_notebook(textbox_content, interface_state,...
  function handle_new_prompt (line 193) | def handle_new_prompt():
  function handle_delete_prompt_confirm_notebook (line 204) | def handle_delete_prompt_confirm_notebook(prompt_name):
  function handle_rename_prompt_click_notebook (line 227) | def handle_rename_prompt_click_notebook(current_name):
  function handle_rename_prompt_confirm_notebook (line 235) | def handle_rename_prompt_confirm_notebook(new_name, current_name):
  function autosave_prompt (line 250) | def autosave_prompt(text, prompt_name):
  function safe_autosave_prompt (line 258) | def safe_autosave_prompt(content, prompt_name):
  function store_notebook_state_and_debounce (line 264) | def store_notebook_state_and_debounce(text, prompt_name):
  function _perform_notebook_debounced_save (line 281) | def _perform_notebook_debounced_save():

FILE: modules/ui_parameters.py
  function create_ui (line 9) | def create_ui():
  function create_event_handlers (line 112) | def create_event_handlers():
  function get_truncation_length (line 130) | def get_truncation_length():
  function load_grammar (line 137) | def load_grammar(name):

FILE: modules/ui_session.py
  function create_ui (line 7) | def create_ui():
  function handle_save_settings (line 52) | def handle_save_settings(state, preset, extensions, show_controls, theme):
  function handle_default_to_notebook_change (line 64) | def handle_default_to_notebook_change(show_two_columns, default_input, d...
  function set_interface_arguments (line 89) | def set_interface_arguments(extensions, bool_active):
  function get_boolean_arguments (line 104) | def get_boolean_arguments(active=False):

FILE: modules/utils.py
  function gradio (line 11) | def gradio(*keys):
  function sanitize_filename (line 18) | def sanitize_filename(name):
  function _is_path_allowed (line 29) | def _is_path_allowed(abs_path_str):
  function save_file (line 40) | def save_file(fname, contents):
  function delete_file (line 60) | def delete_file(fname):
  function current_time (line 76) | def current_time():
  function atoi (line 80) | def atoi(text):
  function replace_all (line 85) | def replace_all(text, dic):
  function natural_keys (line 92) | def natural_keys(text):
  function check_model_loaded (line 96) | def check_model_loaded():
  function resolve_model_path (line 110) | def resolve_model_path(model_name_or_path, image_model=False):
  function get_available_models (line 125) | def get_available_models():
  function get_available_image_models (line 179) | def get_available_image_models():
  function get_available_ggufs (line 197) | def get_available_ggufs():
  function get_available_mmproj (line 211) | def get_available_mmproj():
  function get_available_presets (line 224) | def get_available_presets():
  function get_available_prompts (line 228) | def get_available_prompts():
  function get_available_characters (line 244) | def get_available_characters():
  function get_available_users (line 249) | def get_available_users():
  function get_available_instruction_templates (line 256) | def get_available_instruction_templates():
  function get_available_extensions (line 265) | def get_available_extensions():
  function get_available_loras (line 280) | def get_available_loras():
  function get_datasets (line 284) | def get_datasets(path: str, ext: str):
  function get_chat_datasets (line 292) | def get_chat_datasets(path: str):
  function get_text_datasets (line 297) | def get_text_datasets(path: str):
  function _peek_json_keys (line 302) | def _peek_json_keys(filepath):
  function _is_chat_dataset (line 342) | def _is_chat_dataset(filepath):
  function _is_text_dataset (line 347) | def _is_text_dataset(filepath):
  function get_available_chat_styles (line 352) | def get_available_chat_styles():
  function get_available_grammars (line 356) | def get_available_grammars():

FILE: modules/web_search.py
  function _validate_url (line 17) | def _validate_url(url):
  function get_current_timestamp (line 37) | def get_current_timestamp():
  function download_web_page (line 42) | def download_web_page(url, timeout=10, include_links=False):
  function perform_web_search (line 79) | def perform_web_search(query, num_pages=3, max_workers=5, timeout=10, fe...
  function truncate_content_by_tokens (line 156) | def truncate_content_by_tokens(content, max_tokens=8192):
  function add_web_search_attachments (line 172) | def add_web_search_attachments(history, row_idx, user_message, search_qu...

FILE: one_click.py
  function signal_handler (line 27) | def signal_handler(sig, frame):
  function is_linux (line 34) | def is_linux():
  function is_windows (line 38) | def is_windows():
  function is_macos (line 42) | def is_macos():
  function is_x86_64 (line 46) | def is_x86_64():
  function is_installed (line 50) | def is_installed():
  function load_state (line 63) | def load_state():
  function save_state (line 74) | def save_state(state):
  function get_gpu_choice (line 80) | def get_gpu_choice():
  function get_pytorch_install_command (line 111) | def get_pytorch_install_command(gpu_choice):
  function get_pytorch_update_command (line 129) | def get_pytorch_update_command(gpu_choice):
  function get_requirements_file (line 147) | def get_requirements_file(gpu_choice):
  function get_current_commit (line 165) | def get_current_commit():
  function get_extensions_names (line 170) | def get_extensions_names():
  function check_env (line 174) | def check_env():
  function clear_cache (line 187) | def clear_cache():
  function run_cmd (line 192) | def run_cmd(cmd, assert_success=False, environment=False, capture_output...
  function print_big_message (line 218) | def print_big_message(message):
  function calculate_file_hash (line 228) | def calculate_file_hash(file_path):
  function generate_alphabetic_sequence (line 237) | def generate_alphabetic_sequence(index):
  function get_user_choice (line 247) | def get_user_choice(question, options_dict):
  function update_pytorch_and_python (line 265) | def update_pytorch_and_python():
  function clean_outdated_pytorch_cuda_dependencies (line 272) | def clean_outdated_pytorch_cuda_dependencies():
  function install_webui (line 290) | def install_webui():
  function update_requirements (line 323) | def update_requirements(initial_installation=False, pull=True):
  function install_extensions_requirements (line 467) | def install_extensions_requirements():
  function launch_webui (line 476) | def launch_webui():

FILE: server.py
  function signal_handler (line 35) | def signal_handler(sig, frame):
  function create_interface (line 56) | def create_interface():

Condensed preview: 118 files, each showing path, character count, and a content snippet (1,079K chars in full).

[
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report_template.yml",
    "chars": 1660,
    "preview": "name: \"Bug report\"\ndescription: Report a bug\nlabels: [ \"bug\" ]\nbody:\n  - type: markdown\n    attributes:\n      value: |\n "
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.md",
    "chars": 354,
    "preview": "---\nname: Feature request\nabout: Suggest an improvement or new feature for the web UI\ntitle: ''\nlabels: 'enhancement'\nas"
  },
  {
    "path": ".github/dependabot.yml",
    "chars": 517,
    "preview": "# To get started with Dependabot version updates, you'll need to specify which\n# package ecosystems to update and where "
  },
  {
    "path": ".github/pull_request_template.md",
    "chars": 145,
    "preview": "## Checklist:\n\n- [ ] I have read the [Contributing guidelines](https://github.com/oobabooga/text-generation-webui/wiki/C"
  },
  {
    "path": ".github/workflows/build-everything-tgw.yml",
    "chars": 1795,
    "preview": "name: Build Everything TGW\n\non:\n  workflow_dispatch:\n    inputs:\n      version:\n        description: 'Version tag of tex"
  },
  {
    "path": ".github/workflows/build-portable-release-cuda.yml",
    "chars": 6477,
    "preview": "name: Build CUDA\n\non:\n  workflow_dispatch:\n    inputs:\n      version:\n        description: 'Version tag of text-generati"
  },
  {
    "path": ".github/workflows/build-portable-release-rocm.yml",
    "chars": 6162,
    "preview": "name: Build ROCm\n\non:\n  workflow_dispatch:\n    inputs:\n      version:\n        description: 'Version tag of text-generati"
  },
  {
    "path": ".github/workflows/build-portable-release-vulkan.yml",
    "chars": 6165,
    "preview": "name: Build Vulkan\n\non:\n  workflow_dispatch:\n    inputs:\n      version:\n        description: 'Version tag of text-genera"
  },
  {
    "path": ".github/workflows/build-portable-release.yml",
    "chars": 7604,
    "preview": "name: Build CPU and macOS\n\non:\n  workflow_dispatch:\n    inputs:\n      version:\n        description: 'Version tag of text"
  },
  {
    "path": ".gitignore",
    "chars": 374,
    "preview": "/css\n/extensions\n/installer_files\n/repositories\n/user_data\n\n.chroma\n.DS_Store\n.eslintrc.js\n.idea\n.installer_state.json\n."
  },
  {
    "path": "LICENSE",
    "chars": 34523,
    "preview": "                    GNU AFFERO GENERAL PUBLIC LICENSE\n                       Version 3, 19 November 2007\n\n Copyright (C)"
  },
  {
    "path": "README.md",
    "chars": 30633,
    "preview": "<div align=\"center\" markdown=\"1\">\n   <sup>Special thanks to:</sup>\n   <br>\n   <br>\n   <a href=\"https://go.warp.dev/text-"
  },
  {
    "path": "cmd_linux.sh",
    "chars": 728,
    "preview": "#!/usr/bin/env bash\n\ncd \"$(dirname \"${BASH_SOURCE[0]}\")\"\n\nif [[ \"$(pwd)\" =~ \" \" ]]; then echo This script relies on Mini"
  },
  {
    "path": "cmd_macos.sh",
    "chars": 699,
    "preview": "#!/bin/bash\n\ncd \"$(dirname \"${BASH_SOURCE[0]}\")\"\n\nif [[ \"$(pwd)\" =~ \" \" ]]; then echo This script relies on Miniforge wh"
  },
  {
    "path": "cmd_windows.bat",
    "chars": 955,
    "preview": "@echo off\n\ncd /D \"%~dp0\"\n\nset PATH=%PATH%;%SystemRoot%\\system32\n\necho \"%CD%\"| findstr /C:\" \" >nul && echo This script re"
  },
  {
    "path": "docker/.dockerignore",
    "chars": 27,
    "preview": ".env\nDockerfile\n/user_data\n"
  },
  {
    "path": "docker/TensorRT-LLM/Dockerfile",
    "chars": 904,
    "preview": "FROM nvidia/cuda:13.0.1-cudnn-runtime-ubuntu24.04\n\n# Install Python 3.12, Git, and OpenMPI\nRUN apt update && apt install"
  },
  {
    "path": "docker/amd/Dockerfile",
    "chars": 834,
    "preview": "# BUILDER\nFROM ubuntu:22.04\nWORKDIR /builder\nARG BUILD_EXTENSIONS=\"${BUILD_EXTENSIONS:-}\"\nARG APP_UID=\"${APP_UID:-6972}\""
  },
  {
    "path": "docker/amd/docker-compose.yml",
    "chars": 691,
    "preview": "version: \"3.3\"\nservices:\n  text-generation-webui:\n    build:\n      context: .\n      args:\n        BUILD_EXTENSIONS: ${BU"
  },
  {
    "path": "docker/cpu/Dockerfile",
    "chars": 834,
    "preview": "# BUILDER\nFROM ubuntu:22.04\nWORKDIR /builder\nARG BUILD_EXTENSIONS=\"${BUILD_EXTENSIONS:-}\"\nARG APP_UID=\"${APP_UID:-6972}\""
  },
  {
    "path": "docker/cpu/docker-compose.yml",
    "chars": 522,
    "preview": "version: \"3.3\"\nservices:\n  text-generation-webui:\n    build:\n      context: .\n      args:\n        BUILD_EXTENSIONS: ${BU"
  },
  {
    "path": "docker/intel/Dockerfile",
    "chars": 834,
    "preview": "# BUILDER\nFROM ubuntu:22.04\nWORKDIR /builder\nARG BUILD_EXTENSIONS=\"${BUILD_EXTENSIONS:-}\"\nARG APP_UID=\"${APP_UID:-6972}\""
  },
  {
    "path": "docker/intel/docker-compose.yml",
    "chars": 691,
    "preview": "version: \"3.3\"\nservices:\n  text-generation-webui:\n    build:\n      context: .\n      args:\n        BUILD_EXTENSIONS: ${BU"
  },
  {
    "path": "docker/nvidia/Dockerfile",
    "chars": 922,
    "preview": "# BUILDER\nFROM ubuntu:22.04\nWORKDIR /builder\nARG TORCH_CUDA_ARCH_LIST=\"${TORCH_CUDA_ARCH_LIST:-3.5;5.0;6.0;6.1;7.0;7.5;8"
  },
  {
    "path": "docker/nvidia/docker-compose.yml",
    "chars": 843,
    "preview": "version: \"3.3\"\nservices:\n  text-generation-webui:\n    build:\n      context: .\n      args:\n        # specify which cuda v"
  },
  {
    "path": "docs/01 - Chat Tab.md",
    "chars": 10956,
    "preview": "Used to have multi-turn conversations with the model.\n\n## Input area\n\nThe main action buttons are:\n\n* **Send**: sends yo"
  },
  {
    "path": "docs/02 - Default and Notebook Tabs.md",
    "chars": 2229,
    "preview": "Used to generate raw completions starting from your prompt.\n\n## Default tab\n\nThis tab contains two main text boxes: Inpu"
  },
  {
    "path": "docs/03 - Parameters Tab.md",
    "chars": 17434,
    "preview": "## Generation\n\nContains parameters that control the text generation. \n\n### Quick rundown\n\nLLMs work by generating one to"
  },
  {
    "path": "docs/04 - Model Tab.md",
    "chars": 9586,
    "preview": "This is where you load models, apply LoRAs to a loaded model, and download new models.\n\n## Model loaders\n\n### llama.cpp\n"
  },
  {
    "path": "docs/05 - Training Tab.md",
    "chars": 7113,
    "preview": "## Training Your Own LoRAs\n\nA LoRA is tied to a specific model architecture — a LoRA trained on Llama 3 8B won't work on"
  },
  {
    "path": "docs/06 - Session Tab.md",
    "chars": 2267,
    "preview": "Here you can restart the UI with new settings.\n\n## Settings\n\n* **Toggle light/dark theme**: switches between light and d"
  },
  {
    "path": "docs/07 - Extensions.md",
    "chars": 11587,
    "preview": "# Extensions\n\nExtensions are defined by files named `script.py` inside subfolders of either:\n- `text-generation-webui/ex"
  },
  {
    "path": "docs/08 - Additional Tips.md",
    "chars": 1154,
    "preview": "## Audio notification\n\nIf your computer takes a long time to generate each response for the model that you are using, yo"
  },
  {
    "path": "docs/09 - Docker.md",
    "chars": 1851,
    "preview": "Docker Compose is a way of installing and launching the web UI in an isolated Ubuntu image using only a few commands.\n\n#"
  },
  {
    "path": "docs/11 - AMD Setup.md",
    "chars": 669,
    "preview": "## Using an AMD GPU in Linux\n\nRequires ROCm 6.4 to be installed.\n\n### Option 1: One-click installer\n\nThe one-click insta"
  },
  {
    "path": "docs/12 - OpenAI API.md",
    "chars": 20451,
    "preview": "## OpenAI compatible API\n\nThe main API for this project is meant to be a drop-in replacement to the OpenAI API, includin"
  },
  {
    "path": "docs/13 - Keyboard Shortcuts.md",
    "chars": 1045,
    "preview": "# Keyboard Shortcuts\n\n#### General\n\n| Shortcut                | Description                                      |\n|----"
  },
  {
    "path": "docs/Image Generation Tutorial.md",
    "chars": 3812,
    "preview": "# Image Generation Tutorial\n\nThis feature allows you to generate images using `diffusers` models like [Tongyi-MAI/Z-Imag"
  },
  {
    "path": "docs/Multimodal Tutorial.md",
    "chars": 2447,
    "preview": "## Getting started\n\n### 1. Find a multimodal model\n\nGGUF models with vision capabilities are uploaded along a `mmproj` f"
  },
  {
    "path": "docs/README.md",
    "chars": 219,
    "preview": "These files are a mirror of the documentation at:\n\n# https://github.com/oobabooga/text-generation-webui/wiki\n\nIt is reco"
  },
  {
    "path": "docs/Tool Calling Tutorial.md",
    "chars": 4692,
    "preview": "## Supported models\n\nThe following models are supported:\n\n- Qwen 3.5\n- GPT-OSS\n- Mistral Small / Devstral\n- DeepSeek V3\n"
  },
  {
    "path": "docs/What Works.md",
    "chars": 753,
    "preview": "## What Works\n\n| Loader         | Loading LoRAs | Training LoRAs | Multimodal | Perplexity evaluation |\n|---------------"
  },
  {
    "path": "download-model.py",
    "chars": 20018,
    "preview": "'''\nDownloads models from Hugging Face to user_data/models/username_modelname.\n\nExample:\npython download-model.py facebo"
  },
  {
    "path": "js/dark_theme.js",
    "chars": 797,
    "preview": "function toggleDarkMode() {\n  document.body.classList.toggle(\"dark\");\n  var currentCSS = document.getElementById(\"highli"
  },
  {
    "path": "js/global_scope_js.js",
    "chars": 14630,
    "preview": "// -------------------------------------------------\n// Event handlers\n// ----------------------------------------------"
  },
  {
    "path": "js/katex/auto-render.js",
    "chars": 8535,
    "preview": "! function(e, t) {\n    \"object\" == typeof exports && \"object\" == typeof module ? module.exports = t(require(\"katex\")) : "
  },
  {
    "path": "js/main.js",
    "chars": 38489,
    "preview": "// ------------------------------------------------\n// Main\n// ------------------------------------------------\n\n// Sync"
  },
  {
    "path": "js/save_files.js",
    "chars": 1282,
    "preview": "// Functions for downloading JSON files\nfunction getCurrentTimestamp() {\n  const now = new Date();\n  const timezoneOffse"
  },
  {
    "path": "js/show_controls.js",
    "chars": 1497,
    "preview": "const chatParent = document.querySelector(\".chat-parent\");\n\nfunction toggle_controls(value) {\n  const extensions = docum"
  },
  {
    "path": "js/switch_tabs.js",
    "chars": 1241,
    "preview": "function scrollToTop() {\n  window.scrollTo({ top: 0 });\n}\n\nfunction findButtonsByText(buttonText) {\n  const buttons = do"
  },
  {
    "path": "js/update_big_picture.js",
    "chars": 260,
    "preview": "function updateBigPicture() {\n  var existingElement = document.querySelector(\".bigProfilePicture\");\n  if (existingElemen"
  },
  {
    "path": "modules/LoRA.py",
    "chars": 2803,
    "preview": "from pathlib import Path\n\nimport modules.shared as shared\nfrom modules.logging_colors import logger\n\n\ndef add_lora_to_mo"
  },
  {
    "path": "modules/callbacks.py",
    "chars": 1583,
    "preview": "import traceback\nfrom queue import Queue\nfrom threading import Thread\n\nimport modules.shared as shared\n\n\nclass StopNowEx"
  },
  {
    "path": "modules/chat.py",
    "chars": 102296,
    "preview": "import base64\nimport copy\nimport functools\nimport html\nimport json\nimport pprint\nimport re\nimport shutil\nimport threadin"
  },
  {
    "path": "modules/evaluate.py",
    "chars": 6513,
    "preview": "import datetime\nfrom pathlib import Path\n\nimport pandas as pd\nfrom tqdm import tqdm\n\nfrom modules import shared\nfrom mod"
  },
  {
    "path": "modules/exllamav3.py",
    "chars": 23566,
    "preview": "import math\nimport queue\nimport threading\nimport traceback\nfrom pathlib import Path\nfrom typing import Any, List, Tuple\n"
  },
  {
    "path": "modules/exllamav3_hf.py",
    "chars": 10865,
    "preview": "import os\nimport traceback\nfrom pathlib import Path\nfrom typing import Any, Dict, Optional, Union\n\nimport torch\nfrom tor"
  },
  {
    "path": "modules/extensions.py",
    "chars": 9250,
    "preview": "import importlib\nimport importlib.util\nimport sys\nimport traceback\nfrom functools import partial\nfrom inspect import sig"
  },
  {
    "path": "modules/grammar/grammar_utils.py",
    "chars": 26751,
    "preview": "'''\nThis file has been 100% copied from this PR to the Transformers library:\nhttps://github.com/huggingface/transformers"
  },
  {
    "path": "modules/grammar/logits_process.py",
    "chars": 5152,
    "preview": "'''\nThis file has been 100% copied from this PR to the Transformers library:\nhttps://github.com/huggingface/transformers"
  },
  {
    "path": "modules/html_generator.py",
    "chars": 32511,
    "preview": "import datetime\nimport functools\nimport html\nimport os\nimport re\nimport time\nfrom pathlib import Path\n\nimport markdown\nf"
  },
  {
    "path": "modules/image_models.py",
    "chars": 6426,
    "preview": "import time\n\nimport modules.shared as shared\nfrom modules.logging_colors import logger\nfrom modules.utils import resolve"
  },
  {
    "path": "modules/image_utils.py",
    "chars": 4816,
    "preview": "import base64\nimport io\nimport os\nfrom pathlib import Path\nfrom typing import Any, List, Tuple\n\nfrom PIL import Image\n\nf"
  },
  {
    "path": "modules/llama_cpp_server.py",
    "chars": 23827,
    "preview": "import json\nimport os\nimport pprint\nimport re\nimport socket\nimport subprocess\nimport sys\nimport threading\nimport time\nfr"
  },
  {
    "path": "modules/loaders.py",
    "chars": 8449,
    "preview": "import functools\nfrom collections import OrderedDict\n\nloaders_and_params = OrderedDict({\n    'llama.cpp': [\n        'gpu"
  },
  {
    "path": "modules/logging_colors.py",
    "chars": 2695,
    "preview": "import logging\n\nlogger = logging.getLogger('text-generation-webui')\n\n\ndef setup_logging():\n    '''\n    Copied from: http"
  },
  {
    "path": "modules/logits.py",
    "chars": 4410,
    "preview": "import time\nimport traceback\n\nimport numpy as np\n\nfrom modules import models, shared\nfrom modules.logging_colors import "
  },
  {
    "path": "modules/metadata_gguf.py",
    "chars": 2565,
    "preview": "import struct\nfrom enum import IntEnum\n\n\nclass GGUFValueType(IntEnum):\n    UINT8 = 0\n    INT8 = 1\n    UINT16 = 2\n    INT"
  },
  {
    "path": "modules/models.py",
    "chars": 5412,
    "preview": "import sys\nimport time\n\nimport modules.shared as shared\nfrom modules.logging_colors import logger\nfrom modules.models_se"
  },
  {
    "path": "modules/models_settings.py",
    "chars": 17901,
    "preview": "import functools\nimport json\nimport re\nfrom math import floor\nfrom pathlib import Path\n\nimport yaml\n\nfrom modules import"
  },
  {
    "path": "modules/paths.py",
    "chars": 935,
    "preview": "import sys\nfrom pathlib import Path\n\n\ndef resolve_user_data_dir():\n    \"\"\"\n    Resolve the user_data directory path. Ord"
  },
  {
    "path": "modules/presets.py",
    "chars": 3960,
    "preview": "import functools\nimport pprint\nfrom pathlib import Path\n\nimport yaml\n\nfrom modules import shared\nfrom modules.loaders im"
  },
  {
    "path": "modules/prompts.py",
    "chars": 1036,
    "preview": "from pathlib import Path\n\nfrom modules import shared, utils\nfrom modules.text_generation import get_encoded_length\n\n\ndef"
  },
  {
    "path": "modules/reasoning.py",
    "chars": 3991,
    "preview": "import html as html_module\n\n# Thinking block format definitions: (start_tag, end_tag, content_start_tag)\n# Use None for "
  },
  {
    "path": "modules/sampler_hijack.py",
    "chars": 32988,
    "preview": "import json\nimport math\nimport pprint\nimport random\n\nimport torch\nimport transformers\nfrom transformers.generation.logit"
  },
  {
    "path": "modules/sane_markdown_lists.py",
    "chars": 13919,
    "preview": "# Code based on the Sane List Extension for Python-Markdown\n# =======================================\n\n# Modify the beha"
  },
  {
    "path": "modules/shared.py",
    "chars": 31842,
    "preview": "import argparse\nimport copy\nimport os\nimport shlex\nimport sys\nfrom collections import OrderedDict\nfrom pathlib import Pa"
  },
  {
    "path": "modules/tensorrt_llm.py",
    "chars": 2386,
    "preview": "from pathlib import Path\n\nfrom tensorrt_llm._tensorrt_engine import LLM\nfrom tensorrt_llm.llmapi import SamplingParams\n\n"
  },
  {
    "path": "modules/text_generation.py",
    "chars": 19384,
    "preview": "import ast\nimport copy\nimport html\nimport pprint\nimport random\nimport time\nimport traceback\n\nimport numpy as np\n\nimport "
  },
  {
    "path": "modules/tool_parsing.py",
    "chars": 24629,
    "preview": "import json\nimport random\nimport re\n\n\ndef get_tool_call_id() -> str:\n    letter_bytes = \"abcdefghijklmnopqrstuvwxyz01234"
  },
  {
    "path": "modules/tool_use.py",
    "chars": 2540,
    "preview": "import importlib.util\nimport json\n\nfrom modules import shared\nfrom modules.logging_colors import logger\nfrom modules.uti"
  },
  {
    "path": "modules/torch_utils.py",
    "chars": 1049,
    "preview": "import gc\n\nimport torch\nfrom accelerate.utils import is_npu_available, is_xpu_available\nfrom transformers import is_torc"
  },
  {
    "path": "modules/training.py",
    "chars": 44783,
    "preview": "import os\n\nos.environ[\"WANDB_MODE\"] = \"offline\"\n# os.environ[\"WANDB_DISABLED\"] = \"true\"\n\nimport json\nimport math\nimport "
  },
  {
    "path": "modules/transformers_loader.py",
    "chars": 8047,
    "preview": "import pprint\nfrom pathlib import Path\n\nimport torch\nimport torch.nn.functional as F\nimport transformers\nfrom accelerate"
  },
  {
    "path": "modules/ui.py",
    "chars": 18681,
    "preview": "import copy\nimport threading\nfrom pathlib import Path\n\nimport gradio as gr\nimport yaml\n\nimport extensions\nimport modules"
  },
  {
    "path": "modules/ui_chat.py",
    "chars": 31860,
    "preview": "import json\nfrom functools import partial\nfrom pathlib import Path\n\nimport gradio as gr\nfrom PIL import Image\n\nfrom modu"
  },
  {
    "path": "modules/ui_default.py",
    "chars": 12338,
    "preview": "from pathlib import Path\n\nimport gradio as gr\n\nfrom modules import logits, shared, ui, utils\nfrom modules.prompts import"
  },
  {
    "path": "modules/ui_file_saving.py",
    "chars": 11271,
    "preview": "import traceback\n\nimport gradio as gr\n\nfrom modules import chat, presets, shared, ui, utils\nfrom modules.utils import gr"
  },
  {
    "path": "modules/ui_image_generation.py",
    "chars": 39285,
    "preview": "import json\nimport os\nimport random\nimport time\nimport traceback\nfrom datetime import datetime\nfrom pathlib import Path\n"
  },
  {
    "path": "modules/ui_model_menu.py",
    "chars": 28289,
    "preview": "import importlib\nimport math\nimport queue\nimport threading\nimport traceback\nfrom functools import partial\nfrom pathlib i"
  },
  {
    "path": "modules/ui_notebook.py",
    "chars": 15018,
    "preview": "import threading\nimport time\nfrom pathlib import Path\n\nimport gradio as gr\n\nfrom modules import logits, shared, ui, util"
  },
  {
    "path": "modules/ui_parameters.py",
    "chars": 15646,
    "preview": "from pathlib import Path\n\nimport gradio as gr\n\nfrom modules import loaders, presets, shared, ui, ui_chat, utils\nfrom mod"
  },
  {
    "path": "modules/ui_session.py",
    "chars": 5743,
    "preview": "import gradio as gr\n\nfrom modules import shared, ui, utils\nfrom modules.utils import gradio\n\n\ndef create_ui():\n    mu = "
  },
  {
    "path": "modules/utils.py",
    "chars": 11750,
    "preview": "import os\nimport re\nfrom datetime import datetime\nfrom pathlib import Path\n\nfrom modules import shared\nfrom modules.logg"
  },
  {
    "path": "modules/web_search.py",
    "chars": 7755,
    "preview": "import concurrent.futures\nimport html\nimport ipaddress\nimport random\nimport re\nimport socket\nfrom concurrent.futures imp"
  },
  {
    "path": "one_click.py",
    "chars": 20724,
    "preview": "import argparse\nimport glob\nimport hashlib\nimport json\nimport os\nimport platform\nimport re\nimport signal\nimport site\nimp"
  },
  {
    "path": "requirements/full/requirements.txt",
    "chars": 2019,
    "preview": "accelerate==1.12.*\naudioop-lts<1.0; python_version >= \"3.13\"\nbitsandbytes==0.49.*\ndatasets\ndiffusers==0.37.*\neinops\nfast"
  },
  {
    "path": "requirements/full/requirements_amd.txt",
    "chars": 1068,
    "preview": "accelerate==1.12.*\naudioop-lts<1.0; python_version >= \"3.13\"\ndatasets\ndiffusers==0.37.*\neinops\nfastapi==0.112.4\nhuggingf"
  },
  {
    "path": "requirements/full/requirements_apple_intel.txt",
    "chars": 873,
    "preview": "accelerate==1.12.*\naudioop-lts<1.0; python_version >= \"3.13\"\ndatasets\ndiffusers==0.37.*\neinops\nfastapi==0.112.4\nhuggingf"
  },
  {
    "path": "requirements/full/requirements_apple_silicon.txt",
    "chars": 872,
    "preview": "accelerate==1.12.*\naudioop-lts<1.0; python_version >= \"3.13\"\ndatasets\ndiffusers==0.37.*\neinops\nfastapi==0.112.4\nhuggingf"
  },
  {
    "path": "requirements/full/requirements_cpu_only.txt",
    "chars": 1070,
    "preview": "accelerate==1.12.*\naudioop-lts<1.0; python_version >= \"3.13\"\ndatasets\ndiffusers==0.37.*\neinops\nfastapi==0.112.4\nhuggingf"
  },
  {
    "path": "requirements/full/requirements_nowheels.txt",
    "chars": 698,
    "preview": "accelerate==1.12.*\naudioop-lts<1.0; python_version >= \"3.13\"\ndatasets\ndiffusers==0.37.*\neinops\nfastapi==0.112.4\nhuggingf"
  },
  {
    "path": "requirements/portable/requirements.txt",
    "chars": 885,
    "preview": "audioop-lts<1.0; python_version >= \"3.13\"\nfastapi==0.112.4\nhuggingface-hub==1.5.*\njinja2==3.1.6\nmarkdown\nnumpy==2.2.*\npy"
  },
  {
    "path": "requirements/portable/requirements_amd.txt",
    "chars": 888,
    "preview": "audioop-lts<1.0; python_version >= \"3.13\"\nfastapi==0.112.4\nhuggingface-hub==1.5.*\njinja2==3.1.6\nmarkdown\nnumpy==2.2.*\npy"
  },
  {
    "path": "requirements/portable/requirements_apple_intel.txt",
    "chars": 693,
    "preview": "audioop-lts<1.0; python_version >= \"3.13\"\nfastapi==0.112.4\nhuggingface-hub==1.5.*\njinja2==3.1.6\nmarkdown\nnumpy==2.2.*\npy"
  },
  {
    "path": "requirements/portable/requirements_apple_silicon.txt",
    "chars": 692,
    "preview": "audioop-lts<1.0; python_version >= \"3.13\"\nfastapi==0.112.4\nhuggingface-hub==1.5.*\njinja2==3.1.6\nmarkdown\nnumpy==2.2.*\npy"
  },
  {
    "path": "requirements/portable/requirements_cpu_only.txt",
    "chars": 890,
    "preview": "audioop-lts<1.0; python_version >= \"3.13\"\nfastapi==0.112.4\nhuggingface-hub==1.5.*\njinja2==3.1.6\nmarkdown\nnumpy==2.2.*\npy"
  },
  {
    "path": "requirements/portable/requirements_cuda131.txt",
    "chars": 885,
    "preview": "audioop-lts<1.0; python_version >= \"3.13\"\nfastapi==0.112.4\nhuggingface-hub==1.5.*\njinja2==3.1.6\nmarkdown\nnumpy==2.2.*\npy"
  },
  {
    "path": "requirements/portable/requirements_nowheels.txt",
    "chars": 518,
    "preview": "audioop-lts<1.0; python_version >= \"3.13\"\nfastapi==0.112.4\nhuggingface-hub==1.5.*\njinja2==3.1.6\nmarkdown\nnumpy==2.2.*\npy"
  },
  {
    "path": "requirements/portable/requirements_vulkan.txt",
    "chars": 889,
    "preview": "audioop-lts<1.0; python_version >= \"3.13\"\nfastapi==0.112.4\nhuggingface-hub==1.5.*\njinja2==3.1.6\nmarkdown\nnumpy==2.2.*\npy"
  },
  {
    "path": "server.py",
    "chars": 13587,
    "preview": "import os\nimport signal\nimport sys\nimport time\nimport warnings\nfrom functools import partial\nfrom pathlib import Path\nfr"
  },
  {
    "path": "setup.cfg",
    "chars": 61,
    "preview": "[pycodestyle]\nmax-line-length = 120\nignore = E402, E501, E722"
  },
  {
    "path": "start_linux.sh",
    "chars": 2558,
    "preview": "#!/usr/bin/env bash\n\n# environment isolation\nexport PYTHONNOUSERSITE=1\nunset PYTHONPATH\nunset PYTHONHOME\n\ncd \"$(dirname "
  },
  {
    "path": "start_macos.sh",
    "chars": 2547,
    "preview": "#!/bin/bash\n\n# environment isolation\nexport PYTHONNOUSERSITE=1\nunset PYTHONPATH\nunset PYTHONHOME\n\ncd \"$(dirname \"$(readl"
  },
  {
    "path": "start_windows.bat",
    "chars": 4299,
    "preview": "@echo off\nsetlocal enabledelayedexpansion\n\n@rem environment isolation\nset PYTHONNOUSERSITE=1\nset PYTHONPATH=\nset PYTHONH"
  },
  {
    "path": "update_wizard_linux.sh",
    "chars": 889,
    "preview": "#!/usr/bin/env bash\n\ncd \"$(dirname \"${BASH_SOURCE[0]}\")\"\n\nif [[ \"$(pwd)\" =~ \" \" ]]; then echo This script relies on Mini"
  },
  {
    "path": "update_wizard_macos.sh",
    "chars": 881,
    "preview": "#!/bin/bash\n\ncd \"$(dirname \"${BASH_SOURCE[0]}\")\"\n\nif [[ \"$(pwd)\" =~ \" \" ]]; then echo This script relies on Miniforge wh"
  },
  {
    "path": "update_wizard_windows.bat",
    "chars": 1041,
    "preview": "@echo off\n\ncd /D \"%~dp0\"\n\nset PATH=%PATH%;%SystemRoot%\\system32\n\necho \"%CD%\"| findstr /C:\" \" >nul && echo This script re"
  }
]

About this extraction

This page contains the full source code of the oobabooga/text-generation-webui GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 118 files (1013.0 KB), approximately 240.9k tokens, and a symbol index with 687 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract, a GitHub-repository-to-text converter built by Nikandr Surkov.