Repository: lm-sys/FastChat
Branch: main
Commit: 587d5cfa1609
Files: 208
Total size: 1.5 MB

Directory structure:
gitextract_qr2pzmmc/

├── .github/
│   ├── PULL_REQUEST_TEMPLATE.md
│   └── workflows/
│       └── python-package.yml
├── .gitignore
├── .pylintrc
├── LICENSE
├── README.md
├── docker/
│   ├── Dockerfile
│   └── docker-compose.yml
├── docs/
│   ├── arena.md
│   ├── awq.md
│   ├── commands/
│   │   ├── conv_release.md
│   │   ├── data_cleaning.md
│   │   ├── leaderboard.md
│   │   ├── local_cluster.md
│   │   ├── pypi.md
│   │   └── webserver.md
│   ├── dashinfer_integration.md
│   ├── dataset_release.md
│   ├── exllama_v2.md
│   ├── gptq.md
│   ├── langchain_integration.md
│   ├── lightllm_integration.md
│   ├── mlx_integration.md
│   ├── model_support.md
│   ├── openai_api.md
│   ├── server_arch.md
│   ├── third_party_ui.md
│   ├── training.md
│   ├── vicuna_weights_version.md
│   ├── vllm_integration.md
│   └── xFasterTransformer.md
├── fastchat/
│   ├── __init__.py
│   ├── constants.py
│   ├── conversation.py
│   ├── data/
│   │   ├── __init__.py
│   │   ├── clean_sharegpt.py
│   │   ├── convert_alpaca.py
│   │   ├── extract_gpt4_only.py
│   │   ├── extract_single_round.py
│   │   ├── filter_wrong_format.py
│   │   ├── get_stats.py
│   │   ├── hardcoded_questions.py
│   │   ├── inspect_data.py
│   │   ├── merge.py
│   │   ├── optional_clean.py
│   │   ├── optional_replace.py
│   │   ├── prepare_all.py
│   │   ├── pretty_json.py
│   │   ├── sample.py
│   │   ├── split_long_conversation.py
│   │   └── split_train_test.py
│   ├── llm_judge/
│   │   ├── README.md
│   │   ├── clean_judgment.py
│   │   ├── common.py
│   │   ├── compute_agreement.py
│   │   ├── data/
│   │   │   ├── judge_prompts.jsonl
│   │   │   ├── mt_bench/
│   │   │   │   ├── question.jsonl
│   │   │   │   └── reference_answer/
│   │   │   │       └── gpt-4.jsonl
│   │   │   └── vicuna_bench/
│   │   │       ├── question.jsonl
│   │   │       └── reference_answer/
│   │   │           └── gpt-4.jsonl
│   │   ├── download_mt_bench_pregenerated.py
│   │   ├── gen_api_answer.py
│   │   ├── gen_judgment.py
│   │   ├── gen_model_answer.py
│   │   ├── qa_browser.py
│   │   └── show_result.py
│   ├── model/
│   │   ├── __init__.py
│   │   ├── apply_delta.py
│   │   ├── apply_lora.py
│   │   ├── compression.py
│   │   ├── convert_fp16.py
│   │   ├── llama_condense_monkey_patch.py
│   │   ├── make_delta.py
│   │   ├── model_adapter.py
│   │   ├── model_chatglm.py
│   │   ├── model_cllm.py
│   │   ├── model_codet5p.py
│   │   ├── model_exllama.py
│   │   ├── model_falcon.py
│   │   ├── model_registry.py
│   │   ├── model_xfastertransformer.py
│   │   ├── model_yuan2.py
│   │   ├── monkey_patch_non_inplace.py
│   │   ├── rwkv_model.py
│   │   └── upload_hub.py
│   ├── modules/
│   │   ├── __init__.py
│   │   ├── awq.py
│   │   ├── exllama.py
│   │   ├── gptq.py
│   │   └── xfastertransformer.py
│   ├── protocol/
│   │   ├── api_protocol.py
│   │   └── openai_api_protocol.py
│   ├── serve/
│   │   ├── __init__.py
│   │   ├── api_provider.py
│   │   ├── base_model_worker.py
│   │   ├── call_monitor.py
│   │   ├── cli.py
│   │   ├── controller.py
│   │   ├── dashinfer_worker.py
│   │   ├── gateway/
│   │   │   ├── README.md
│   │   │   └── nginx.conf
│   │   ├── gradio_block_arena_anony.py
│   │   ├── gradio_block_arena_named.py
│   │   ├── gradio_block_arena_vision.py
│   │   ├── gradio_block_arena_vision_anony.py
│   │   ├── gradio_block_arena_vision_named.py
│   │   ├── gradio_global_state.py
│   │   ├── gradio_web_server.py
│   │   ├── gradio_web_server_multi.py
│   │   ├── huggingface_api.py
│   │   ├── huggingface_api_worker.py
│   │   ├── inference.py
│   │   ├── launch_all_serve.py
│   │   ├── lightllm_worker.py
│   │   ├── mlx_worker.py
│   │   ├── model_worker.py
│   │   ├── monitor/
│   │   │   ├── add_markdown_info.py
│   │   │   ├── basic_stats.py
│   │   │   ├── classify/
│   │   │   │   ├── README.md
│   │   │   │   ├── category.py
│   │   │   │   ├── config.yaml
│   │   │   │   ├── display_score.py
│   │   │   │   ├── label.py
│   │   │   │   └── vision_config.yaml
│   │   │   ├── clean_battle_data.py
│   │   │   ├── clean_chat_data.py
│   │   │   ├── copilot_arena.py
│   │   │   ├── criteria_labeling.py
│   │   │   ├── dataset_release_scripts/
│   │   │   │   ├── arena_33k/
│   │   │   │   │   ├── count_unique_users.py
│   │   │   │   │   ├── filter_bad_conv.py
│   │   │   │   │   ├── merge_field.py
│   │   │   │   │   ├── sample.py
│   │   │   │   │   └── upload_hf_dataset.py
│   │   │   │   └── lmsys_chat_1m/
│   │   │   │       ├── approve_all.py
│   │   │   │       ├── compute_stats.py
│   │   │   │       ├── filter_bad_conv.py
│   │   │   │       ├── final_post_processing.py
│   │   │   │       ├── instructions.md
│   │   │   │       ├── merge_oai_tag.py
│   │   │   │       ├── process_all.sh
│   │   │   │       ├── sample.py
│   │   │   │       └── upload_hf_dataset.py
│   │   │   ├── deduplication.py
│   │   │   ├── elo_analysis.py
│   │   │   ├── inspect_conv.py
│   │   │   ├── intersect_conv_file.py
│   │   │   ├── leaderboard_csv_to_html.py
│   │   │   ├── monitor.py
│   │   │   ├── monitor_md.py
│   │   │   ├── rating_systems.py
│   │   │   ├── summarize_cluster.py
│   │   │   ├── tag_openai_moderation.py
│   │   │   ├── topic_clustering.py
│   │   │   └── vote_time_stats/
│   │   │       ├── README.md
│   │   │       ├── analyze_data.py
│   │   │       └── plot.py
│   │   ├── multi_model_worker.py
│   │   ├── openai_api_server.py
│   │   ├── register_worker.py
│   │   ├── remote_logger.py
│   │   ├── sglang_worker.py
│   │   ├── shutdown_serve.py
│   │   ├── test_message.py
│   │   ├── test_throughput.py
│   │   ├── vision/
│   │   │   ├── create_vqa_examples_dir.py
│   │   │   ├── create_vqa_examples_json.py
│   │   │   └── image.py
│   │   └── vllm_worker.py
│   ├── train/
│   │   ├── llama2_flash_attn_monkey_patch.py
│   │   ├── llama_flash_attn_monkey_patch.py
│   │   ├── llama_xformers_attn_monkey_patch.py
│   │   ├── train.py
│   │   ├── train_baichuan.py
│   │   ├── train_flant5.py
│   │   ├── train_lora.py
│   │   ├── train_lora_t5.py
│   │   ├── train_mem.py
│   │   ├── train_with_template.py
│   │   ├── train_xformers.py
│   │   └── train_yuan2.py
│   └── utils.py
├── format.sh
├── playground/
│   ├── FastChat_API_GoogleColab.ipynb
│   ├── __init__.py
│   ├── benchmark/
│   │   └── benchmark_api_provider.py
│   ├── deepspeed_config_s2.json
│   ├── deepspeed_config_s3.json
│   └── test_embedding/
│       ├── README.md
│       ├── test_classification.py
│       ├── test_semantic_search.py
│       └── test_sentence_similarity.py
├── pyproject.toml
├── scripts/
│   ├── build-api.sh
│   ├── test_readme_train.sh
│   ├── train_lora.sh
│   ├── train_vicuna_13b.sh
│   ├── train_vicuna_7b.sh
│   └── upload_pypi.sh
└── tests/
    ├── README.md
    ├── killall_python.sh
    ├── launch_openai_api_test_server.py
    ├── load_test.py
    ├── test_cli.py
    ├── test_cli_inputs.txt
    ├── test_image_utils.py
    ├── test_openai_api.py
    ├── test_openai_langchain.py
    └── test_openai_vision_api.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/PULL_REQUEST_TEMPLATE.md
================================================
<!-- Thank you for your contribution! -->

<!-- Please add a reviewer to the assignee section when you create a PR. If you don't have access to it, we will find a reviewer shortly and assign them to your PR. -->

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this solves. -->

## Related issue number (if applicable)

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've run `format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed.
- [ ] I've made sure the relevant tests are passing (if applicable).


================================================
FILE: .github/workflows/python-package.yml
================================================
name: Python package

on: [push, pull_request]

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10"]

    steps:
    - uses: actions/checkout@v3
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v4
      with:
        python-version: ${{ matrix.python-version }}
        cache: 'pip'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        python -m pip install -e '.[dev]'
    - name: Run linter
      run: |
        pylint -d all -e E0602 ./fastchat/
    - name: Check formatting
      run: |
        black --check .
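The workflow above runs two checks on every push and pull request:

- pylint with every message disabled except E0602 (undefined-variable);
- black in `--check` mode.

Below is a minimal sketch for reproducing both checks locally before opening a PR. It assumes that pylint and black are installed (for example through the `dev` extra the workflow installs) and that it runs from the repository root. `CI_CHECKS` and `run_ci_checks` are hypothetical names, not part of FastChat.

```python
import subprocess
import sys

# Hypothetical local mirror of the "Run linter" and "Check formatting" steps.
# Assumes pylint and black are importable by the current interpreter and that
# the working directory is the repository root.
CI_CHECKS = {
    # Disable every message, then re-enable only E0602 (undefined-variable).
    "lint": [sys.executable, "-m", "pylint", "-d", "all", "-e", "E0602", "./fastchat/"],
    # Report, without rewriting anything, files that black would reformat.
    "format": [sys.executable, "-m", "black", "--check", "."],
}


def run_ci_checks(dry_run=False):
    """Run each check in order and return the names of the checks that failed."""
    failed = []
    for name, cmd in CI_CHECKS.items():
        print(f"[{name}] {' '.join(cmd)}")
        if dry_run:
            # Only show the command; do not execute it.
            continue
        if subprocess.run(cmd, check=False).returncode != 0:
            failed.append(name)
    return failed
```

`run_ci_checks()` returns the list of failed check names, and `dry_run=True` only prints the commands. In the workflow, a failed step stops the job. The sketch instead runs both checks, so every failure is reported in one pass.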


================================================
FILE: .gitignore
================================================
# Python
__pycache__
*.pyc
*.egg-info
dist
.venv

# Log
*.log
*.log.*
*.json
!playground/deepspeed_config_s2.json
!playground/deepspeed_config_s3.json

# Editor
.idea
*.swp

# Other
.DS_Store
wandb
output
checkpoints_flant5_3b

# Data
*.pkl
*.csv
tests/state_of_the_union.txt

# Build
build

# Image data
serve_images
val2014
vqa_examples
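The `*.json` rule above ignores every JSON file in the tree. The two `!` lines then re-include the DeepSpeed configs under `playground/`. Because `*.json` only matches names ending in `.json`, the `.jsonl` benchmark files under `fastchat/llm_judge/data/` are not affected.

Below is a simplified Python sketch of how git resolves these rules:

- the last matching pattern wins;
- a pattern containing `/` is anchored at the repository root;
- nothing inside an ignored directory can be re-included.

It handles only the pattern forms used in this file. It does not support `**`, trailing-slash directory patterns or escapes. `is_ignored` is a hypothetical helper, not a git API.

```python
from fnmatch import fnmatchcase

# Patterns copied from the .gitignore above (comments and blank lines dropped).
PATTERNS = [
    "__pycache__", "*.pyc", "*.egg-info", "dist", ".venv",
    "*.log", "*.log.*", "*.json",
    "!playground/deepspeed_config_s2.json", "!playground/deepspeed_config_s3.json",
    ".idea", "*.swp", ".DS_Store", "wandb", "output", "checkpoints_flant5_3b",
    "*.pkl", "*.csv", "tests/state_of_the_union.txt", "build",
    "serve_images", "val2014", "vqa_examples",
]


def _matches(pattern, path):
    # A pattern containing "/" is anchored at the repository root and compared
    # with the whole path; any other pattern is compared with the last component.
    if "/" in pattern:
        return fnmatchcase(path, pattern.lstrip("/"))
    return fnmatchcase(path.rsplit("/", 1)[-1], pattern)


def _last_match_wins(path):
    # Later patterns override earlier ones; a leading "!" re-includes the path.
    ignored = False
    for raw in PATTERNS:
        negate = raw.startswith("!")
        pattern = raw[1:] if negate else raw
        if _matches(pattern, path):
            ignored = not negate
    return ignored


def is_ignored(path):
    parts = path.split("/")
    # Git does not descend into an ignored directory, so a file below one
    # stays ignored even if a later "!" pattern would match it.
    for i in range(1, len(parts)):
        if _last_match_wins("/".join(parts[:i])):
            return True
    return _last_match_wins(path)


for p in ["results.json", "playground/deepspeed_config_s2.json",
          "fastchat/llm_judge/data/mt_bench/question.jsonl", "train.log.1"]:
    print(p, is_ignored(p))
```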

================================================
FILE: .pylintrc
================================================
# This Pylint rcfile contains a best-effort configuration to uphold the
# best-practices and style described in the Google Python style guide:
#   https://google.github.io/styleguide/pyguide.html
#
# Its canonical open-source location is:
#   https://google.github.io/styleguide/pylintrc

[MASTER]

# Files or directories to be skipped. They should be base names, not paths.
ignore=third_party,ray_patches,providers

# Files or directories matching the regex patterns are skipped. The regex
# matches against base names, not paths.
ignore-patterns=

# Pickle collected data for later comparisons.
persistent=no

# List of plugins (as comma separated values of python modules names) to load,
# usually to register additional checkers.
load-plugins=

# Use multiple processes to speed up Pylint.
jobs=4

# Allow loading of arbitrary C extensions. Extensions are imported into the
# active Python interpreter and may run arbitrary code.
unsafe-load-any-extension=no


[MESSAGES CONTROL]

# Only show warnings with the listed confidence levels. Leave empty to show
# all. Valid levels: HIGH, INFERENCE, INFERENCE_FAILURE, UNDEFINED
confidence=

# Enable the message, report, category or checker with the given id(s). You can
# either give multiple identifiers separated by comma (,) or put this option
# multiple times (only on the command line, not in the configuration file where
# it should appear only once). See also the "--disable" option for examples.
#enable=

# Disable the message, report, category or checker with the given id(s). You
# can either give multiple identifiers separated by comma (,) or put this
# option multiple times (only on the command line, not in the configuration
# file where it should appear only once). You can also use "--disable=all" to
# disable everything first and then re-enable specific checks. For example, if
# you want to run only the similarities checker, you can use "--disable=all
# --enable=similarities". If you want to run only the classes checker, but have
# no Warning level messages displayed, use "--disable=all --enable=classes
# --disable=W".
disable=abstract-method,
        apply-builtin,
        arguments-differ,
        attribute-defined-outside-init,
        backtick,
        bad-option-value,
        basestring-builtin,
        buffer-builtin,
        c-extension-no-member,
        consider-using-enumerate,
        cmp-builtin,
        cmp-method,
        coerce-builtin,
        coerce-method,
        delslice-method,
        div-method,
        duplicate-code,
        eq-without-hash,
        execfile-builtin,
        file-builtin,
        filter-builtin-not-iterating,
        fixme,
        getslice-method,
        global-statement,
        hex-method,
        idiv-method,
        implicit-str-concat-in-sequence,
        import-error,
        import-self,
        import-star-module-level,
        inconsistent-return-statements,
        input-builtin,
        intern-builtin,
        invalid-str-codec,
        locally-disabled,
        logging-format-interpolation,  # FIXME(sky): make pass.
        logging-fstring-interpolation,  # FIXME(sky): make pass.
        long-builtin,
        long-suffix,
        map-builtin-not-iterating,
        misplaced-comparison-constant,
        missing-function-docstring,
        metaclass-assignment,
        next-method-called,
        next-method-defined,
        no-absolute-import,
        no-else-break,
        no-else-continue,
        no-else-raise,
        no-else-return,
        no-init,  # added
        no-member,
        no-name-in-module,
        no-self-use,
        nonzero-method,
        oct-method,
        old-division,
        old-ne-operator,
        old-octal-literal,
        old-raise-syntax,
        parameter-unpacking,
        print-statement,
        raising-string,
        range-builtin-not-iterating,
        raw_input-builtin,
        rdiv-method,
        reduce-builtin,
        relative-import,
        reload-builtin,
        round-builtin,
        setslice-method,
        signature-differs,
        standarderror-builtin,
        suppressed-message,
        sys-max-int,
        too-few-public-methods,
        too-many-ancestors,
        too-many-arguments,
        too-many-boolean-expressions,
        too-many-branches,
        too-many-instance-attributes,
        too-many-locals,
        too-many-nested-blocks,
        too-many-public-methods,
        too-many-return-statements,
        too-many-statements,
        trailing-newlines,
        unichr-builtin,
        unicode-builtin,
        unnecessary-pass,
        unpacking-in-except,
        useless-else-on-loop,
        useless-object-inheritance,
        useless-suppression,
        using-cmp-argument,
        wrong-import-order,
        xrange-builtin,
        zip-builtin-not-iterating,


[REPORTS]

# Set the output format. Available formats are text, parseable, colorized, msvs
# (Visual Studio) and html. You can also give a reporter class, e.g.
# mypackage.mymodule.MyReporterClass.
output-format=text

# Put messages in a separate file for each module / package specified on the
# command line instead of printing them on stdout. Reports (if any) will be
# written in a file name "pylint_global.[txt|html]". This option is deprecated
# and it will be removed in Pylint 2.0.
files-output=no

# Tells whether to display a full report or only the messages
reports=no

# Python expression which should return a score less than 10 (10 is the highest
# score). You have access to the variables error, warning, refactor, convention
# and statement, which respectively contain the number of error, warning,
# refactor and convention messages and the total number of statements analyzed.
# This is used by the global evaluation report (RP0004).
evaluation=10.0 - ((float(5 * error + warning + refactor + convention) / statement) * 10)

# Template used to display messages. This is a Python new-style format string
# used to format the message information. See the documentation for all details.
#msg-template=


[BASIC]

# Good variable names which should always be accepted, separated by a comma
good-names=main,_

# Bad variable names which should always be refused, separated by a comma
bad-names=

# Colon-delimited sets of names that determine each other's naming style when
# the name regexes allow several styles.
name-group=

# Include a hint for the correct naming format with invalid-name
include-naming-hint=no

# List of decorators that produce properties, such as abc.abstractproperty. Add
# to this list to register other decorators that produce valid properties.
property-classes=abc.abstractproperty,cached_property.cached_property,cached_property.threaded_cached_property,cached_property.cached_property_with_ttl,cached_property.threaded_cached_property_with_ttl

# Regular expression matching correct function names
function-rgx=^(?:(?P<exempt>setUp|tearDown|setUpModule|tearDownModule)|(?P<camel_case>_?[A-Z][a-zA-Z0-9]*)|(?P<snake_case>_?[a-z][a-z0-9_]*))$

# Regular expression matching correct variable names
variable-rgx=^[a-z][a-z0-9_]*$

# Regular expression matching correct constant names
const-rgx=^(_?[A-Z][A-Z0-9_]*|__[a-z0-9_]+__|_?[a-z][a-z0-9_]*)$

# Regular expression matching correct attribute names
attr-rgx=^_{0,2}[a-z][a-z0-9_]*$

# Regular expression matching correct argument names
argument-rgx=^[a-z][a-z0-9_]*$

# Regular expression matching correct class attribute names
class-attribute-rgx=^(_?[A-Z][A-Z0-9_]*|__[a-z0-9_]+__|_?[a-z][a-z0-9_]*)$

# Regular expression matching correct inline iteration names
inlinevar-rgx=^[a-z][a-z0-9_]*$

# Regular expression matching correct class names
class-rgx=^_?[A-Z][a-zA-Z0-9]*$

# Regular expression matching correct module names
module-rgx=^(_?[a-z][a-z0-9_]*|__init__)$

# Regular expression matching correct method names
method-rgx=(?x)^(?:(?P<exempt>_[a-z0-9_]+__|runTest|setUp|tearDown|setUpTestCase|tearDownTestCase|setupSelf|tearDownClass|setUpClass|(test|assert)_*[A-Z0-9][a-zA-Z0-9_]*|next)|(?P<camel_case>_{0,2}[A-Z][a-zA-Z0-9_]*)|(?P<snake_case>_{0,2}[a-z][a-z0-9_]*))$

# Regular expression which should only match function or class names that do
# not require a docstring.
no-docstring-rgx=(__.*__|main|test.*|.*test|.*Test)$

# Minimum line length for functions/classes that require docstrings; shorter
# ones are exempt.
docstring-min-length=10


[TYPECHECK]

# List of decorators that produce context managers, such as
# contextlib.contextmanager. Add to this list to register other decorators that
# produce valid context managers.
contextmanager-decorators=contextlib.contextmanager,contextlib2.contextmanager

# Tells whether missing members accessed in mixin class should be ignored. A
# mixin class is detected if its name ends with "mixin" (case insensitive).
ignore-mixin-members=yes

# List of module names for which member attributes should not be checked
# (useful for modules/projects where namespaces are manipulated during runtime
# and thus existing member attributes cannot be deduced by static analysis). It
# supports qualified module names, as well as Unix pattern matching.
ignored-modules=

# List of class names for which member attributes should not be checked (useful
# for classes with dynamically set attributes). This supports the use of
# qualified names.
ignored-classes=optparse.Values,thread._local,_thread._local

# List of members which are set dynamically and missed by pylint inference
# system, and so shouldn't trigger E1101 when accessed. Python regular
# expressions are accepted.
generated-members=


[FORMAT]

# Maximum number of characters on a single line.
max-line-length=100

# TODO(https://github.com/PyCQA/pylint/issues/3352): Direct pylint to exempt
# lines made too long by directives to pytype.

# Regexp for a line that is allowed to be longer than the limit.
ignore-long-lines=(?x)(
  ^\s*(\#\ )?<?https?://\S+>?$|
  ^\s*(from\s+\S+\s+)?import\s+.+$)

# Allow the body of an if to be on the same line as the test if there is no
# else.
single-line-if-stmt=yes

# List of optional constructs for which whitespace checking is disabled.
# `dict-separator` is used to allow tabulation in dicts, etc.: {1  : 1,\n222: 2}.
# `trailing-comma` allows a space between comma and closing bracket: (a, ).
# `empty-line` allows space-only lines.
no-space-check=

# Maximum number of lines in a module
max-module-lines=99999

# String used as indentation unit. The internal Google style guide mandates 2
# spaces. Google's externally published style guide says 4, consistent with
# PEP 8. Here we use 4 spaces.
indent-string='    '

# Number of spaces of indent required inside a hanging or continued line.
indent-after-paren=4

# Expected format of line ending, e.g. empty (any line ending), LF or CRLF.
expected-line-ending-format=


[MISCELLANEOUS]

# List of note tags to take into consideration, separated by a comma.
notes=TODO


[STRING]

# This flag controls whether inconsistent-quotes generates a warning when the
# character used as a quote delimiter is used inconsistently within a module.
check-quote-consistency=yes


[VARIABLES]

# Tells whether we should check for unused import in __init__ files.
init-import=no

# A regular expression matching the name of dummy variables (i.e. expectedly
# not used).
dummy-variables-rgx=^\*{0,2}(_$|unused_|dummy_)

# List of additional names supposed to be defined in builtins. Remember that
# you should avoid defining new builtins when possible.
additional-builtins=

# List of strings which can identify a callback function by name. A callback
# name must start or end with one of those strings.
callbacks=cb_,_cb

# List of qualified module names which can have objects that can redefine
# builtins.
redefining-builtins-modules=six,six.moves,past.builtins,future.builtins,functools


[LOGGING]

# Logging modules to check that the string format arguments are in logging
# function parameter format
logging-modules=logging,absl.logging,tensorflow.io.logging


[SIMILARITIES]

# Minimum lines number of a similarity.
min-similarity-lines=4

# Ignore comments when computing similarities.
ignore-comments=yes

# Ignore docstrings when computing similarities.
ignore-docstrings=yes

# Ignore imports when computing similarities.
ignore-imports=no


[SPELLING]

# Spelling dictionary name. Available dictionaries: none. To make it work,
# install the python-enchant package.
spelling-dict=

# List of comma separated words that should not be checked.
spelling-ignore-words=

# A path to a file that contains private dictionary; one word per line.
spelling-private-dict-file=

# Tells whether to store unknown words to indicated private dictionary in
# --spelling-private-dict-file option instead of raising a message.
spelling-store-unknown-words=no


[IMPORTS]

# Deprecated modules which should not be used, separated by a comma
deprecated-modules=regsub,
                   TERMIOS,
                   Bastion,
                   rexec,
                   sets

# Create a graph of all (i.e. internal and external) dependencies in the
# given file (report RP0402 must not be disabled)
import-graph=

# Create a graph of external dependencies in the given file (report RP0402 must
# not be disabled)
ext-import-graph=

# Create a graph of internal dependencies in the given file (report RP0402 must
# not be disabled)
int-import-graph=

# Force import order to recognize a module as part of the standard
# compatibility libraries.
known-standard-library=

# Force import order to recognize a module as part of a third party library.
known-third-party=enchant, absl

# Analyse import fallback blocks. This can be used to support both Python 2 and
# 3 compatible code, which means that the block might have code that exists
# only in one or another interpreter, leading to false positives when analysed.
analyse-fallback-blocks=no


[CLASSES]

# List of method names used to declare (i.e. assign) instance attributes.
defining-attr-methods=__init__,
                      __new__,
                      setUp

# List of member names, which should be excluded from the protected access
# warning.
exclude-protected=_asdict,
                  _fields,
                  _replace,
                  _source,
                  _make

# List of valid names for the first argument in a class method.
valid-classmethod-first-arg=cls,
                            class_

# List of valid names for the first argument in a metaclass class method.
valid-metaclass-classmethod-first-arg=mcs


[EXCEPTIONS]

# Exceptions that will emit a warning when being caught. Defaults to
# "Exception"
overgeneral-exceptions=StandardError,
                       Exception,
                       BaseException

#######

# https://github.com/edaniszewski/pylint-quotes#configuration
string-quote=single
triple-quote=double
docstring-quote=double
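Two parts of this rcfile can be checked by hand:

- The [BASIC] regexes decide which identifiers the invalid-name check accepts.
- The [REPORTS] `evaluation` expression turns message counts into a score. For example, 2 errors, 3 warnings, 1 refactor and 4 convention messages over 200 statements give 10 - (5*2 + 3 + 1 + 4) / 200 * 10 = 10 - 0.9 = 9.1.

The sketch below copies a subset of the regexes verbatim and applies them with `re.match`, together with the `good-names` exemption. It approximates pylint's behaviour rather than reproducing it. `is_valid_name` and `pylint_score` are hypothetical helpers, not pylint APIs.

```python
import re

# Regexes copied verbatim from the [BASIC] section above (subset only).
NAME_RGX = {
    "function": r"^(?:(?P<exempt>setUp|tearDown|setUpModule|tearDownModule)|(?P<camel_case>_?[A-Z][a-zA-Z0-9]*)|(?P<snake_case>_?[a-z][a-z0-9_]*))$",
    "variable": r"^[a-z][a-z0-9_]*$",
    "const": r"^(_?[A-Z][A-Z0-9_]*|__[a-z0-9_]+__|_?[a-z][a-z0-9_]*)$",
    "class": r"^_?[A-Z][a-zA-Z0-9]*$",
    "method": r"(?x)^(?:(?P<exempt>_[a-z0-9_]+__|runTest|setUp|tearDown|setUpTestCase|tearDownTestCase|setupSelf|tearDownClass|setUpClass|(test|assert)_*[A-Z0-9][a-zA-Z0-9_]*|next)|(?P<camel_case>_{0,2}[A-Z][a-zA-Z0-9_]*)|(?P<snake_case>_{0,2}[a-z][a-z0-9_]*))$",
}
# good-names=main,_ : always accepted, whatever the regex says.
GOOD_NAMES = {"main", "_"}


def is_valid_name(kind, name):
    """Approximate the invalid-name check for one kind of identifier."""
    return name in GOOD_NAMES or re.match(NAME_RGX[kind], name) is not None


def pylint_score(error, warning, refactor, convention, statement):
    """The [REPORTS] evaluation expression, written as a Python function."""
    return 10.0 - ((float(5 * error + warning + refactor + convention) / statement) * 10)


print(is_valid_name("method", "getItem"))  # False
print(round(pylint_score(2, 3, 1, 4, 200), 2))  # 9.1
```

With these patterns, `getItem` is rejected as a method name: it is neither camelCase starting with an uppercase letter nor snake_case. `get_item`, `GetItem` and `testFoo` are accepted. As written, the evaluation expression has no lower bound, so a short file with many errors yields a negative value.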


================================================
FILE: LICENSE
================================================
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: README.md
================================================
# FastChat
| [**Demo**](https://lmarena.ai/) | [**Discord**](https://discord.gg/6GXcFg3TH8) | [**X**](https://x.com/lmsysorg) |

FastChat is an open platform for training, serving, and evaluating large language model based chatbots.
- FastChat powers Chatbot Arena ([lmarena.ai](https://lmarena.ai)), serving over 10 million chat requests for 70+ LLMs.
- Chatbot Arena has collected over 1.5M human votes from side-by-side LLM battles to compile an online [LLM Elo leaderboard](https://lmarena.ai/?leaderboard).
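The leaderboard turns pairwise votes into ratings. As a rough illustration only, here is the textbook online Elo update. The constants (K=4, scale 400, base 10, starting rating 1000) are assumptions for this sketch, and the published leaderboard may use different constants or a different fitting method entirely.

```python
def expected_score(r_a: float, r_b: float, scale: float = 400.0, base: float = 10.0) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + base ** ((r_b - r_a) / scale))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 4.0) -> tuple[float, float]:
    """Return updated (r_a, r_b) after one battle.

    score_a is 1.0 if A wins, 0.0 if B wins, and 0.5 for a tie.
    """
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (score_a - e_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return r_a_new, r_b_new


# Replay a few hypothetical battles; every model starts at 1000.
ratings = {"model_x": 1000.0, "model_y": 1000.0}
battles = [("model_x", "model_y", 1.0), ("model_x", "model_y", 0.5)]
for a, b, s in battles:
    ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], s)
print(ratings)
```

Each update moves both ratings by the same amount in opposite directions, so the total rating mass stays constant.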

FastChat's core features include:
- The training and evaluation code for state-of-the-art models (e.g., Vicuna, MT-Bench).
- A distributed multi-model serving system with web UI and OpenAI-compatible RESTful APIs.

## News
- [2024/03] 🔥 We released the Chatbot Arena technical [report](https://arxiv.org/abs/2403.04132).
- [2023/09] We released **LMSYS-Chat-1M**, a large-scale real-world LLM conversation dataset. Read the [report](https://arxiv.org/abs/2309.11998).
- [2023/08] We released **Vicuna v1.5** based on Llama 2 with 4K and 16K context lengths. Download [weights](#vicuna-weights).
- [2023/07] We released **Chatbot Arena Conversations**, a dataset containing 33k conversations with human preferences. Download it [here](https://huggingface.co/datasets/lmsys/chatbot_arena_conversations).

<details>
<summary>More</summary>

- [2023/08] We released **LongChat v1.5** based on Llama 2 with 32K context lengths. Download [weights](#longchat).
- [2023/06] We introduced **MT-bench**, a challenging multi-turn question set for evaluating chatbots. Check out the blog [post](https://lmsys.org/blog/2023-06-22-leaderboard/).
- [2023/06] We introduced **LongChat**, our long-context chatbots and evaluation tools. Check out the blog [post](https://lmsys.org/blog/2023-06-29-longchat/).
- [2023/05] We introduced **Chatbot Arena** for battles among LLMs. Check out the blog [post](https://lmsys.org/blog/2023-05-03-arena).
- [2023/03] We released **Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality**. Check out the blog [post](https://vicuna.lmsys.org).

</details>

<a href="https://lmarena.ai"><img src="assets/demo_narrow.gif" width="70%"></a>

## Contents
- [Install](#install)
- [Model Weights](#model-weights)
- [Inference with Command Line Interface](#inference-with-command-line-interface)
- [Serving with Web GUI](#serving-with-web-gui)
- [API](#api)
- [Evaluation](#evaluation)
- [Fine-tuning](#fine-tuning)
- [Citation](#citation)

## Install

### Method 1: With pip

```bash
pip3 install "fschat[model_worker,webui]"
```

### Method 2: From source

1. Clone this repository and navigate to the FastChat folder
```bash
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
```

If you are running on Mac:
```bash
brew install rust cmake
```

2. Install Package
```bash
pip3 install --upgrade pip  # enable PEP 660 support
pip3 install -e ".[model_worker,webui]"
```

## Model Weights
### Vicuna Weights
[Vicuna](https://lmsys.org/blog/2023-03-30-vicuna/) is based on Llama 2 and should be used under Llama's [model license](https://github.com/facebookresearch/llama/blob/main/LICENSE).

You can use the commands below to start chatting. They will automatically download the weights from Hugging Face repos.
Downloaded weights are stored in a `.cache` folder in the user's home folder (e.g., `~/.cache/huggingface/hub/<model_name>`).

See more command options and how to handle out-of-memory in the "Inference with Command Line Interface" section below.

**NOTE: `transformers>=4.31` is required for 16K versions.**

| Size | Chat Command | Hugging Face Repo |
| ---  | --- | --- |
| 7B   | `python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5`  | [lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5)   |
| 7B-16k   | `python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5-16k`  | [lmsys/vicuna-7b-v1.5-16k](https://huggingface.co/lmsys/vicuna-7b-v1.5-16k)   |
| 13B  | `python3 -m fastchat.serve.cli --model-path lmsys/vicuna-13b-v1.5` | [lmsys/vicuna-13b-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5) |
| 13B-16k  | `python3 -m fastchat.serve.cli --model-path lmsys/vicuna-13b-v1.5-16k` | [lmsys/vicuna-13b-v1.5-16k](https://huggingface.co/lmsys/vicuna-13b-v1.5-16k) |
| 33B  | `python3 -m fastchat.serve.cli --model-path lmsys/vicuna-33b-v1.3` | [lmsys/vicuna-33b-v1.3](https://huggingface.co/lmsys/vicuna-33b-v1.3) |

**Old weights**: see [docs/vicuna_weights_version.md](docs/vicuna_weights_version.md) for all versions of weights and their differences.

### Other Models
Besides Vicuna, we also released two additional models: [LongChat](https://lmsys.org/blog/2023-06-29-longchat/) and FastChat-T5.
You can use the commands below to chat with them. They will automatically download the weights from Hugging Face repos.

| Model | Chat Command | Hugging Face Repo |
| ---  | --- | --- |
| LongChat-7B   | `python3 -m fastchat.serve.cli --model-path lmsys/longchat-7b-32k-v1.5`  | [lmsys/longchat-7b-32k-v1.5](https://huggingface.co/lmsys/longchat-7b-32k-v1.5)   |
| FastChat-T5-3B   | `python3 -m fastchat.serve.cli --model-path lmsys/fastchat-t5-3b-v1.0`  | [lmsys/fastchat-t5-3b-v1.0](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0) |

## Inference with Command Line Interface

<a href="https://lmarena.ai"><img src="assets/screenshot_cli.png" width="70%"></a>

(Experimental Feature: You can specify `--style rich` to enable rich text output and better text streaming quality for some non-ASCII content. This may not work properly on certain terminals.)

#### Supported Models
FastChat supports a wide range of models, including
Llama 2, Vicuna, Alpaca, Baize, ChatGLM, Dolly, Falcon, FastChat-T5, GPT4ALL, Guanaco, MTP, OpenAssistant, OpenChat, RedPajama, StableLM, WizardLM, xDAN-AI and more.

See a complete list of supported models and instructions to add a new model [here](docs/model_support.md).

#### Single GPU
The command below requires around 14GB of GPU memory for Vicuna-7B and 28GB of GPU memory for Vicuna-13B.
See the ["Not Enough Memory" section](#not-enough-memory) below if you do not have enough memory.
`--model-path` can be a local folder or a Hugging Face repo name.
```
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5
```
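The memory figures in this section roughly follow a simple rule of thumb: parameter count times bytes per parameter for the weights, plus headroom for activations and the KV cache. The sketch below computes only the weights-only part. The nominal 7B/13B parameter counts are approximations, and the assumption that the CPU path loads float32 weights is why the CPU figures are about double the GPU ones. This is an estimate, not how FastChat measures memory.

```python
# Bytes per parameter for common weight formats (weights only, no activations).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Approximate memory, in decimal GB, needed just to hold the weights."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, params in [("Vicuna-7B", 7_000_000_000), ("Vicuna-13B", 13_000_000_000)]:
    for dtype in ("fp16", "int8", "fp32"):
        print(f"{name} {dtype}: ~{weight_memory_gb(params, dtype):.0f} GB")
```

The fp16 rows match the ~14GB/~28GB GPU figures, the int8 rows show why `--load-8bit` roughly halves memory, and the fp32 rows line up with the CPU-only figures.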

#### Multiple GPUs
You can use model parallelism to aggregate GPU memory from multiple GPUs on the same machine. 
```
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --num-gpus 2
```

Tips:
Sometimes the "auto" device mapping strategy in huggingface/transformers does not perfectly balance the memory allocation across multiple GPUs.
You can use `--max-gpu-memory` to specify the maximum memory per GPU for storing model weights.
This leaves more memory for activations, so you can use longer context lengths or larger batch sizes. For example:

```
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --num-gpus 2 --max-gpu-memory 8GiB
```
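A per-GPU cap like `--max-gpu-memory 8GiB` corresponds to the kind of mapping Hugging Face `from_pretrained` accepts through its `max_memory` argument (used together with `device_map="auto"`). The sketch below builds that mapping with the standard library only, to show its shape. The helper names are hypothetical, the sketch accepts only `GiB` values, and it is not FastChat's loading code.

```python
import re


def build_max_memory(num_gpus: int, max_gpu_memory: str) -> dict[int, str]:
    """Map each GPU index to the same weight budget, e.g. {0: "8GiB", 1: "8GiB"}."""
    if not re.fullmatch(r"\d+(\.\d+)?GiB", max_gpu_memory):
        raise ValueError(f"expected a value like '8GiB', got {max_gpu_memory!r}")
    return {i: max_gpu_memory for i in range(num_gpus)}


def total_budget_gib(max_memory: dict[int, str]) -> float:
    """Sum the per-GPU budgets to see how much room the weights get overall."""
    return sum(float(v[: -len("GiB")]) for v in max_memory.values())


mm = build_max_memory(2, "8GiB")
print(mm, total_budget_gib(mm))
```

Two 8GiB budgets (about 17.2 GB in total) comfortably hold the roughly 14 GB of fp16 Vicuna-7B weights. The rest of each card stays free for activations.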

#### CPU Only
This runs on the CPU only and does not require a GPU. It requires around 30GB of CPU memory for Vicuna-7B and around 60GB of CPU memory for Vicuna-13B.
```
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --device cpu
```

On Intel CPUs with AVX512_BF16 or AMX support, set the `CPU_ISA` environment variable to accelerate CPU inference:
```
CPU_ISA=amx python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --device cpu
```

#### Metal Backend (Mac Computers with Apple Silicon or AMD GPUs)
Use `--device mps` to enable GPU acceleration on Mac computers (requires torch >= 2.0).
Use `--load-8bit` to turn on 8-bit compression.
```
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --device mps --load-8bit
```
Vicuna-7B can run on a 32GB M1 MacBook at 1-2 words per second.

#### Intel XPU (Intel Data Center and Arc A-Series GPUs)
Install the [Intel Extension for PyTorch](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/installation.html). Set the OneAPI environment variables:
```
source /opt/intel/oneapi/setvars.sh
```

Use `--device xpu` to enable XPU/GPU acceleration.
```
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --device xpu
```
Vicuna-7B can run on an Intel Arc A770 16GB.

#### Ascend NPU
Install the [Ascend PyTorch Adapter](https://github.com/Ascend/pytorch). Set the CANN environment variables:
```
source /usr/local/Ascend/ascend-toolkit/set_env.sh
```

Use `--device npu` to enable NPU acceleration.
```
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --device npu
```
Vicuna-7B/13B can run on an Ascend NPU.

#### Not Enough Memory
If you do not have enough memory, you can enable 8-bit compression by adding `--load-8bit` to the commands above.
This can reduce memory usage by around half with slightly degraded model quality.
It is compatible with the CPU, GPU, and Metal backends.

Vicuna-13B with 8-bit compression can run on a single GPU with at least 16 GB of VRAM, such as an Nvidia RTX 3090, RTX 4080, T4, V100 (16GB), or an AMD RX 6800 XT.

```
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --load-8bit
```

In addition, you can add `--cpu-offloading` to the commands above to offload weights that don't fit on your GPU to CPU memory.
This requires 8-bit compression to be enabled and the bitsandbytes package to be installed, which is only available on Linux.

#### More Platforms and Quantization
- For AMD GPU users, please install ROCm and [the ROCm version of PyTorch](https://pytorch.org/get-started/locally/) before you install FastChat. See also this [post](https://github.com/lm-sys/FastChat/issues/104#issuecomment-1613791563).
- FastChat supports ExLlama V2. See [docs/exllama_v2.md](/docs/exllama_v2.md).
- FastChat supports GPTQ 4bit inference with [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa). See [docs/gptq.md](/docs/gptq.md).
- FastChat supports AWQ 4bit inference with [mit-han-lab/llm-awq](https://github.com/mit-han-lab/llm-awq). See [docs/awq.md](/docs/awq.md).
- [MLC LLM](https://mlc.ai/mlc-llm/), backed by [TVM Unity](https://github.com/apache/tvm/tree/unity) compiler, deploys Vicuna natively on phones, consumer-class GPUs and web browsers via Vulkan, Metal, CUDA and WebGPU.

#### Use models from modelscope
For Chinese users, you can use models from www.modelscope.cn by setting the following environment variable.
```bash
export FASTCHAT_USE_MODELSCOPE=True
```

## Serving with Web GUI

<a href="https://lmarena.ai"><img src="assets/screenshot_gui.png" width="70%"></a>

To serve using the web UI, you need three main components: web servers that interface with users, model workers that host one or more models, and a controller to coordinate the webserver and model workers. You can learn more about the architecture [here](docs/server_arch.md).

Here are the commands to follow in your terminal:

#### Launch the controller
```bash
python3 -m fastchat.serve.controller
```

This controller manages the distributed workers.
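Conceptually, the controller keeps a registry of workers. When the web server asks for a model, it returns the address of a worker that serves it. Below is a toy, in-memory version of that idea that uses a shortest-queue rule normalized by worker speed. The class, field, and function names are hypothetical, and this is not the controller's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class WorkerInfo:
    address: str
    model_names: list[str]
    speed: float       # relative throughput reported by the worker
    queue_length: int  # requests currently waiting on the worker


def pick_worker(workers: list[WorkerInfo], model: str) -> str:
    """Return the address of the least-loaded worker serving `model`, or '' if none."""
    candidates = [w for w in workers if model in w.model_names]
    if not candidates:
        return ""
    # Divide queue length by speed so faster workers absorb more requests.
    best = min(candidates, key=lambda w: w.queue_length / w.speed)
    return best.address


workers = [
    WorkerInfo("http://localhost:31000", ["vicuna-7b-v1.5"], speed=1.0, queue_length=3),
    WorkerInfo("http://localhost:31002", ["vicuna-7b-v1.5"], speed=2.0, queue_length=4),
    WorkerInfo("http://localhost:31001", ["fastchat-t5-3b-v1.0"], speed=1.0, queue_length=0),
]
print(pick_worker(workers, "vicuna-7b-v1.5"))
```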

#### Launch the model worker(s)
```bash
python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5
```
Wait until the process finishes loading the model and you see "Uvicorn running on ...". The model worker will register itself with the controller.

To ensure that your model worker is connected to your controller properly, send a test message using the following command:
```bash
python3 -m fastchat.serve.test_message --model-name vicuna-7b-v1.5
```
You will see a short output.
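You can also check registration from Python by querying the controller over HTTP. This sketch assumes the controller listens on the default `http://localhost:21001` and exposes a POST `/list_models` endpoint that returns JSON of the form `{"models": [...]}`. Verify both against `fastchat/serve/controller.py` for your version.

```python
import json
import urllib.request

CONTROLLER_URL = "http://localhost:21001"  # assumed default controller address


def list_models_request(controller_url: str = CONTROLLER_URL) -> urllib.request.Request:
    """Build a POST request for the controller's model-listing endpoint."""
    return urllib.request.Request(f"{controller_url}/list_models", data=b"", method="POST")


def list_models(controller_url: str = CONTROLLER_URL) -> list[str]:
    """Ask the controller which model names are currently registered."""
    with urllib.request.urlopen(list_models_request(controller_url), timeout=10) as resp:
        return json.loads(resp.read())["models"]
```

Once the worker has registered, `list_models()` should include `vicuna-7b-v1.5`.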

#### Launch the Gradio web server
```bash
python3 -m fastchat.serve.gradio_web_server
```

This is the user interface that users will interact with.

By following these steps, you will be able to serve your models using the web UI. You can open your browser and chat with a model now.
If the models do not show up, try restarting the Gradio web server.

## Launch Chatbot Arena (side-by-side battle UI)

Currently, Chatbot Arena is powered by FastChat. Here is how you can launch an instance of Chatbot Arena locally.

FastChat supports popular API-based models such as OpenAI, Anthropic, Gemini, Mistral and more. To add a custom API, please refer to the model support [doc](./docs/model_support.md). Below we take OpenAI models as an example.

Create a JSON configuration file `api_endpoint.json` with the API endpoints of the models you want to serve, for example:
```json
{
    "gpt-4o-2024-05-13": {
        "model_name": "gpt-4o-2024-05-13",
        "api_base": "https://api.openai.com/v1",
        "api_type": "openai",
        "api_key": "<insert your API key>",
        "anony_only": false
    }
}
```
For Anthropic models, specify `"api_type": "anthropic_message"` with your Anthropic key. Similarly, for Gemini models, specify `"api_type": "gemini"`. More details can be found in [api_provider.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/serve/api_provider.py).
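To avoid pasting keys into a file by hand, you could generate the endpoint file from environment variables. The sketch below mirrors the fields of the OpenAI example JSON above. The helper names, the `OPENAI_API_KEY` variable, and the set of fields treated as required are assumptions made for illustration, not rules enforced by FastChat.

```python
import json
import os

# Fields this sketch insists on; FastChat's own validation may differ.
REQUIRED_FIELDS = {"model_name", "api_base", "api_type", "api_key"}


def make_openai_entry(model_name: str, api_key: str, anony_only: bool = False) -> dict:
    """Build one endpoint entry with the same fields as the example JSON."""
    return {
        "model_name": model_name,
        "api_base": "https://api.openai.com/v1",
        "api_type": "openai",
        "api_key": api_key,
        "anony_only": anony_only,
    }


def write_endpoint_file(path: str, entries: dict) -> None:
    """Check every entry for required fields, then write the JSON file."""
    for name, entry in entries.items():
        missing = REQUIRED_FIELDS - set(entry)
        if missing:
            raise ValueError(f"{name} is missing {sorted(missing)}")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=4)


# Read the key from the environment instead of hard-coding it.
entries = {
    "gpt-4o-2024-05-13": make_openai_entry(
        "gpt-4o-2024-05-13", os.environ.get("OPENAI_API_KEY", "")
    )
}
```

Then call `write_endpoint_file("api_endpoint.json", entries)` before launching the server.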

To serve your own model using local GPUs, follow the instructions in [Serving with Web GUI](#serving-with-web-gui).

Now you're ready to launch the server:
```
python3 -m fastchat.serve.gradio_web_server_multi --register-api-endpoint-file api_endpoint.json
```

#### (Optional): Advanced Features, Scalability, Third Party UI
- You can register multiple model workers to a single controller, which can be used for serving a single model with higher throughput or serving multiple models at the same time. When doing so, please allocate different GPUs and ports for different model workers.
```
# worker 0
CUDA_VISIBLE_DEVICES=0 python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5 --controller http://localhost:21001 --port 31000 --worker http://localhost:31000
# worker 1
CUDA_VISIBLE_DEVICES=1 python3 -m fastchat.serve.model_worker --model-path lmsys/fastchat-t5-3b-v1.0 --controller http://localhost:21001 --port 31001 --worker http://localhost:31001
```
- You can also launch a multi-tab gradio server, which includes the Chatbot Arena tabs.
```bash
python3 -m fastchat.serve.gradio_web_server_multi
```
- The default model worker based on huggingface/transformers has great compatibility but can be slow. If you want high-throughput batched serving, you can try [vLLM integration](docs/vllm_integration.md).
- If you want to host it on your own UI or third party UI, see [Third Party UI](docs/third_party_ui.md).
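When you scale out to multiple workers, only the GPU index, port, and model path change from one worker to the next. A small generator for the launch lines shown above can help you avoid port collisions. This is a hypothetical convenience helper, not part of FastChat, and it only reuses the flags from the worker commands in this section.

```python
import shlex


def worker_command(gpu: int, model_path: str, port: int,
                   controller: str = "http://localhost:21001") -> str:
    """Build one model_worker launch line pinned to a single GPU and port."""
    args = [
        "python3", "-m", "fastchat.serve.model_worker",
        "--model-path", model_path,
        "--controller", controller,
        "--port", str(port),
        "--worker", f"http://localhost:{port}",
    ]
    return f"CUDA_VISIBLE_DEVICES={gpu} " + shlex.join(args)


models = ["lmsys/vicuna-7b-v1.5", "lmsys/fastchat-t5-3b-v1.0"]
for gpu, model in enumerate(models):
    # One GPU and one port per worker, starting at 31000.
    print(worker_command(gpu, model, 31000 + gpu))
```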

## API
### OpenAI-Compatible RESTful APIs & SDK
FastChat provides OpenAI-compatible APIs for its supported models, so you can use FastChat as a local drop-in replacement for OpenAI APIs.
The FastChat server is compatible with both [openai-python](https://github.com/openai/openai-python) library and cURL commands.
The REST API can be called from the Google Colab free tier, as demonstrated in the [FastChat_API_GoogleColab.ipynb](https://github.com/lm-sys/FastChat/blob/main/playground/FastChat_API_GoogleColab.ipynb) notebook in our repository.
See [docs/openai_api.md](docs/openai_api.md).
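Because the API follows the OpenAI chat-completions shape, even a standard-library client works. This sketch assumes `fastchat.serve.openai_api_server` is running at `http://localhost:8000` (the port used in `docker/docker-compose.yml`) and that a worker serving `vicuna-7b-v1.5` is registered. The helper names are illustrative.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000/v1"  # assumed address of the OpenAI-compatible server


def chat_request(model: str, prompt: str, api_base: str = API_BASE) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request with a single user message."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{api_base}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str) -> str:
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(chat_request(model, prompt), timeout=60) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

With the server running, `chat("vicuna-7b-v1.5", "Hello!")` returns the model's reply. The same endpoint can be used from openai-python by pointing its base URL at `API_BASE`.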

### Hugging Face Generation APIs
See [fastchat/serve/huggingface_api.py](fastchat/serve/huggingface_api.py).

### LangChain Integration
See [docs/langchain_integration](docs/langchain_integration.md).

## Evaluation
We use MT-bench, a set of challenging multi-turn open-ended questions to evaluate models. 
To automate the evaluation process, we prompt strong LLMs like GPT-4 to act as judges and assess the quality of the models' responses.
See instructions for running MT-bench at [fastchat/llm_judge](fastchat/llm_judge).

MT-bench is the new recommended way to benchmark your models. If you are still looking for the old 80 questions used in the vicuna blog post, please go to [vicuna-blog-eval](https://github.com/lm-sys/vicuna-blog-eval).

## Fine-tuning
### Data

Vicuna is created by fine-tuning a Llama base model using approximately 125K user-shared conversations gathered from ShareGPT.com with public APIs. To ensure data quality, we convert the HTML back to markdown and filter out some inappropriate or low-quality samples. Additionally, we divide lengthy conversations into smaller segments that fit the model's maximum context length. For detailed instructions on cleaning the ShareGPT data, see [here](docs/commands/data_cleaning.md).

We will not release the ShareGPT dataset. If you would like to try the fine-tuning code, you can run it with some dummy conversations in [dummy_conversation.json](data/dummy_conversation.json). You can follow the same format and plug in your own data.
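A minimal record in that format might look like the sketch below. It assumes the ShareGPT-style schema of `data/dummy_conversation.json`: a list of objects with an `id` and a `conversations` list whose turns alternate between `"human"` and `"gpt"`. The alternation check reflects what the training script appears to expect, so treat it as an assumption. The example `id` and text are made up.

```python
import json

# Hypothetical example data in the assumed ShareGPT-style schema.
records = [
    {
        "id": "example-0",
        "conversations": [
            {"from": "human", "value": "What is FastChat?"},
            {"from": "gpt", "value": "An open platform for training, serving, and evaluating chatbots."},
        ],
    }
]


def check_record(rec: dict) -> None:
    """Raise if turns do not alternate human -> gpt, starting with human."""
    expected = ("human", "gpt")
    for i, turn in enumerate(rec["conversations"]):
        if turn["from"] != expected[i % 2]:
            raise ValueError(f"{rec['id']}: turn {i} should be from {expected[i % 2]!r}")


for rec in records:
    check_record(rec)

print(json.dumps(records, indent=2))
```

Save the output to a file and pass its path via `--data_path`.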

### Code and Hyperparameters
Our code is based on [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) with additional support for multi-turn conversations.
We use hyperparameters similar to those of Stanford Alpaca.

| Hyperparameter | Global Batch Size | Learning rate | Epochs | Max length | Weight decay |
| --- | ---: | ---: | ---: | ---: | ---: |
| Vicuna-13B | 128 | 2e-5 | 3 | 2048 | 0 |

### Fine-tuning Vicuna-7B with Local GPUs

- Install dependency
```bash
pip3 install -e ".[train]"
```

- You can use the following command to train Vicuna-7B with 4 x A100 (40GB). Update `--model_name_or_path` with the actual path to Llama weights and `--data_path` with the actual path to data.
```bash
torchrun --nproc_per_node=4 --master_port=20001 fastchat/train/train_mem.py \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --data_path data/dummy_conversation.json \
    --bf16 True \
    --output_dir output_vicuna \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1200 \
    --save_total_limit 10 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True
```
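These flags reproduce the global batch size of 128 from the hyperparameter table: per-device batch size × gradient accumulation steps × number of GPU processes (`--nproc_per_node`). If you train on a different number of GPUs, adjust `--gradient_accumulation_steps` to keep the global batch constant. The helper names below are illustrative.

```python
def global_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    """Sequences contributing to each optimizer step across all GPUs."""
    return per_device * grad_accum * num_gpus


def grad_accum_for(target: int, per_device: int, num_gpus: int) -> int:
    """Accumulation steps needed to reach `target` with a given GPU count."""
    per_step = per_device * num_gpus
    if target % per_step:
        raise ValueError(f"{target} is not divisible by {per_step}")
    return target // per_step


# Flags from the command above: 2 per device, 16 accumulation steps, 4 GPUs.
print(global_batch_size(2, 16, 4))  # 128
# Same global batch on 8 GPUs with the same per-device batch size.
print(grad_accum_for(128, 2, 8))  # 8
```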

Tips:
- If you are using V100 GPUs, which FlashAttention does not support, you can use the [memory-efficient attention](https://arxiv.org/abs/2112.05682) implemented in [xFormers](https://github.com/facebookresearch/xformers). Install xformers and replace `fastchat/train/train_mem.py` above with [fastchat/train/train_xformers.py](fastchat/train/train_xformers.py).
- If you run out of memory due to "FSDP Warning: When using FSDP, it is efficient and recommended... ", see solutions [here](https://github.com/huggingface/transformers/issues/24724#issuecomment-1645189539).
- If you run out of memory during model saving, see solutions [here](https://github.com/pytorch/pytorch/issues/98823).
- To turn on logging to popular experiment tracking tools such as Tensorboard, MLFlow or Weights & Biases, use the `report_to` argument, e.g. pass `--report_to wandb` to turn on logging to Weights & Biases.

### Other models, platforms and LoRA support
More instructions to train other models (e.g., FastChat-T5) and use LoRA are in [docs/training.md](docs/training.md).

### Fine-tuning on Any Cloud with SkyPilot
[SkyPilot](https://github.com/skypilot-org/skypilot) is a framework built by UC Berkeley for easily and cost-effectively running ML workloads on any cloud (AWS, GCP, Azure, Lambda, etc.).
Find SkyPilot documentation [here](https://github.com/skypilot-org/skypilot/tree/master/llm/vicuna) on using managed spot instances to train Vicuna and save on your cloud costs.

## Citation
The code (training, serving, and evaluation) in this repository is mostly developed for or derived from the paper below.
Please cite it if you find the repository helpful.

```
@misc{zheng2023judging,
      title={Judging LLM-as-a-judge with MT-Bench and Chatbot Arena},
      author={Lianmin Zheng and Wei-Lin Chiang and Ying Sheng and Siyuan Zhuang and Zhanghao Wu and Yonghao Zhuang and Zi Lin and Zhuohan Li and Dacheng Li and Eric. P Xing and Hao Zhang and Joseph E. Gonzalez and Ion Stoica},
      year={2023},
      eprint={2306.05685},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

We are also planning to add more of our research to this repository.


================================================
FILE: docker/Dockerfile
================================================
FROM nvidia/cuda:12.2.0-runtime-ubuntu20.04

RUN apt-get update -y && apt-get install -y python3.9 python3.9-distutils curl
RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
RUN python3.9 get-pip.py
RUN python3.9 -m pip install "fschat[model_worker,webui]"

================================================
FILE: docker/docker-compose.yml
================================================
version: "3.9"

services:
  fastchat-controller:
    build:
      context: .
      dockerfile: Dockerfile
    image: fastchat:latest
    ports:
      - "21001:21001"
    entrypoint: ["python3.9", "-m", "fastchat.serve.controller", "--host", "0.0.0.0", "--port", "21001"]
  fastchat-model-worker:
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - huggingface:/root/.cache/huggingface
    image: fastchat:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    entrypoint: ["python3.9", "-m", "fastchat.serve.model_worker", "--model-names", "${FASTCHAT_WORKER_MODEL_NAMES:-vicuna-7b-v1.5}", "--model-path", "${FASTCHAT_WORKER_MODEL_PATH:-lmsys/vicuna-7b-v1.5}", "--worker-address", "http://fastchat-model-worker:21002", "--controller-address", "http://fastchat-controller:21001", "--host", "0.0.0.0", "--port", "21002"]
  fastchat-api-server:
    build:
      context: .
      dockerfile: Dockerfile
    image: fastchat:latest
    ports:
      - "8000:8000"
    entrypoint: ["python3.9", "-m", "fastchat.serve.openai_api_server", "--controller-address", "http://fastchat-controller:21001", "--host", "0.0.0.0", "--port", "8000"]
volumes:
  huggingface:


================================================
FILE: docs/arena.md
================================================
# Chatbot Arena
Chatbot Arena is an LLM benchmark platform featuring anonymous, randomized battles, available at https://lmarena.ai.
We invite the entire community to join this benchmarking effort by contributing your votes and models.

## How to add a new model
If you want to see a specific model in the arena, you can follow the methods below.

### Method 1: Hosted by 3rd party API providers or yourself
If you have a model hosted by a 3rd party API provider or yourself, please give us access to an API endpoint.
  - We prefer OpenAI-compatible APIs, so we can reuse our [code](https://github.com/lm-sys/FastChat/blob/main/fastchat/serve/api_provider.py) for calling OpenAI models.
  - If you have your own API protocol, please follow the [instructions](model_support.md) to add them. Contribute your code by sending a pull request.

### Method 2: Hosted by LMSYS
1. Contribute the code to support this model in FastChat by submitting a pull request. See [instructions](model_support.md).
2. After the model is supported, we will try to schedule some compute resources to host the model in the arena. However, due to the limited resources we have, we may not be able to serve every model. We will select the models based on popularity, quality, diversity, and other factors.


## How to launch vision arena

1. Run `python3 -m fastchat.serve.controller` to start the controller and begin registering local model workers and API-provided workers.
2. Run `python3 -m fastchat.serve.sglang_worker --model-path <model-path> --tokenizer-path <tokenizer-path>` to run local vision-language models. Currently supported models include the LLaVA and Yi-VL series.
3. If you are using a 3rd party model with an API provider (e.g. GPT-4-V, Gemini 1.5), follow the instructions in [model_support.md](model_support.md) to add a JSON file `api_endpoints.json`.
4. Run the gradio server with the `--vision-arena` flag on.
5. To store images in a remote directory, add the flag `--use-remote-storage`.
6. To offer samples of random questions, add `--random-questions metadata_sampled.json`. See the sections below for how to generate this file.

Example command:
```
python3 -m fastchat.serve.gradio_web_server_multi --share --register-api-endpoint-file api_endpoints.json --vision-arena --use-remote-storage --random-questions metadata_sampled.json
```

### NSFW and CSAM Detection
1. Adding NSFW Endpoint and API key: Please add the following environment variables to run the NSFW moderation filter for images: 
  - `AZURE_IMG_MODERATION_ENDPOINT`: The endpoint where the NSFW moderator is hosted (e.g. https://{endpoint}/contentmoderator/moderate/v1.0/ProcessImage/Evaluate). Replace `endpoint` with your own.
  - `AZURE_IMG_MODERATION_API_KEY`: Your API key to run this endpoint.
2. Adding CSAM API key:
  - `PHOTODNA_API_KEY`: The API key that runs the CSAM detector endpoint.

Example in `~/.bashrc`:
```
export AZURE_IMG_MODERATION_ENDPOINT=https://<endpoint>/contentmoderator/moderate/v1.0/ProcessImage/Evaluate
export AZURE_IMG_MODERATION_API_KEY=<api-key>
export PHOTODNA_API_KEY=<api-key>
```
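Before you launch with `--vision-arena`, a quick pre-flight check can confirm that the three moderation variables are set. This is a hypothetical helper, not part of FastChat, and it only checks that the variables are present and non-empty. It does not check that the keys are valid.

```python
import os

REQUIRED_VARS = (
    "AZURE_IMG_MODERATION_ENDPOINT",
    "AZURE_IMG_MODERATION_API_KEY",
    "PHOTODNA_API_KEY",
)


def missing_moderation_vars(env=None) -> list[str]:
    """Return the moderation variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_moderation_vars()
if missing:
    print("Missing:", ", ".join(missing))
```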

### Adding Random Samples for VQA
We provide random example images for users to interact with, drawn from datasets including DocVQA, RealWorldQA, ChartQA and VizWiz-VQA.
1. Download the images and generate the random questions file by running `python fastchat/serve/vision/create_vqa_examples_dir.py`

================================================
FILE: docs/awq.md
================================================
# AWQ 4bit Inference

We integrated [AWQ](https://github.com/mit-han-lab/llm-awq) into FastChat to provide **efficient and accurate** 4bit LLM inference.

## Install AWQ

Setup environment (please refer to [this link](https://github.com/mit-han-lab/llm-awq#install) for more details):
```bash
conda create -n fastchat-awq python=3.10 -y
conda activate fastchat-awq
# cd /path/to/FastChat
pip install --upgrade pip    # enable PEP 660 support
pip install -e .             # install fastchat

git clone https://github.com/mit-han-lab/llm-awq repositories/llm-awq
cd repositories/llm-awq
pip install -e .             # install awq package

cd awq/kernels
python setup.py install      # install awq CUDA kernels
```

## Chat with the CLI

```bash
# Download quantized model from huggingface
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/mit-han-lab/vicuna-7b-v1.3-4bit-g128-awq models/vicuna-7b-v1.3-4bit-g128-awq

# You can specify which quantized model to use by setting --awq-ckpt
python3 -m fastchat.serve.cli \
    --model-path models/vicuna-7b-v1.3-4bit-g128-awq \
    --awq-wbits 4 \
    --awq-groupsize 128 
```

## Benchmark

* Through **4-bit weight quantization**, AWQ helps run larger language models within device memory limits and significantly accelerates token generation. All benchmarks use a group size of 128.

* Benchmark on NVIDIA RTX A6000:

  | Model           | Bits | Max Memory (MiB) | Speed (ms/token) | AWQ Speedup |
  | --------------- | ---- | ---------------- | ---------------- | ----------- |
  | vicuna-7b       | 16   | 13543            | 26.06            | /           |
  | vicuna-7b       | 4    | 5547             | 12.43            | 2.1x        |
  | llama2-7b-chat  | 16   | 13543            | 27.14            | /           |
  | llama2-7b-chat  | 4    | 5547             | 12.44            | 2.2x        |
  | vicuna-13b      | 16   | 25647            | 44.91            | /           |
  | vicuna-13b      | 4    | 9355             | 17.30            | 2.6x        |
  | llama2-13b-chat | 16   | 25647            | 47.28            | /           |
  | llama2-13b-chat | 4    | 9355             | 20.28            | 2.3x        |

* NVIDIA RTX 4090:

  | Model           | AWQ 4bit Speed (ms/token) | FP16 Speed (ms/token) | AWQ Speedup |
  | --------------- | ------------------------- | --------------------- | ----------- |
  | vicuna-7b       | 8.61                      | 19.09                 | 2.2x        |
  | llama2-7b-chat  | 8.66                      | 19.97                 | 2.3x        |
  | vicuna-13b      | 12.17                     | OOM                   | /           |
  | llama2-13b-chat | 13.54                     | OOM                   | /           |

* NVIDIA Jetson Orin:

  | Model           | AWQ 4bit Speed (ms/token) | FP16 Speed (ms/token) | AWQ Speedup |
  | --------------- | ------------------------- | --------------------- | ----------- |
  | vicuna-7b       | 65.34                     | 93.12                 | 1.4x        |
  | llama2-7b-chat  | 75.11                     | 104.71                | 1.4x        |
  | vicuna-13b      | 115.40                    | OOM                   | /           |
  | llama2-13b-chat | 136.81                    | OOM                   | /           |
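
The "AWQ Speedup" column appears to be the FP16 latency divided by the AWQ 4-bit latency; the numbers in all three tables are consistent with that reading. Under that assumption, a short Python check reproduces the RTX A6000 column from the ms/token values:

```python
# Latency pairs (FP16 ms/token, AWQ 4-bit ms/token) copied from the RTX A6000 table.
a6000 = {
    "vicuna-7b": (26.06, 12.43),
    "llama2-7b-chat": (27.14, 12.44),
    "vicuna-13b": (44.91, 17.30),
    "llama2-13b-chat": (47.28, 20.28),
}


def speedup(fp16_ms: float, awq_ms: float) -> float:
    """Ratio of FP16 latency to AWQ latency, rounded to one decimal place."""
    return round(fp16_ms / awq_ms, 1)


for model, (fp16_ms, awq_ms) in a6000.items():
    print(f"{model}: {speedup(fp16_ms, awq_ms)}x")
```

Because lower ms/token is better, a ratio above 1 means the 4-bit model generates tokens faster.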


================================================
FILE: docs/commands/conv_release.md
================================================
## Chatbot Arena Conversations

1. Gather battles
```
python3 clean_battle_data.py --max-num 10 --mode conv_release
```

2. Tag OpenAI moderation
```
python3 tag_openai_moderation.py --in clean_battle_conv_20230814.json
```

3. Clean PII

4. Filter additional blocked words

```
python3 filter_bad_conv.py --in clean_battle_conv_20230630_tagged_v1_pii.json
```

5. Add additional toxicity tag


## All Conversations

1. Gather chats
```
python3 clean_chat_data.py
```

2. Sample
```
python3 conv_release_scripts/sample.py
```


## Prompt distribution



================================================
FILE: docs/commands/data_cleaning.md
================================================
## Data cleaning

## Requirements
```
pip3 install bs4 markdownify
pip3 install polyglot pyicu pycld2
```

## Steps
```
# Convert HTML to Markdown
python3 -m fastchat.data.clean_sharegpt --in sharegpt_html.json --out sharegpt_clean.json

# Keep or remove specific languages
python3 -m fastchat.data.optional_clean --in sharegpt_clean.json --out sharegpt_clean_lang.json --skip-lang SOME_LANGUAGE_CODE

# Split long conversations
python3 -m fastchat.data.split_long_conversation --in sharegpt_clean_lang.json --out sharegpt_clean_lang_split.json --model-name /home/ubuntu/model_weights/llama-7b/
```
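
The `split_long_conversation` step keeps each training sample within the model's context length. The sketch below illustrates the idea only and is not FastChat's implementation: it greedily packs human/assistant turn pairs into chunks under a token budget. The `{"from", "value"}` turn format and the whitespace token counter are assumptions; the real script counts tokens with the tokenizer given by `--model-name`.

```python
from typing import Callable, Dict, List

Turn = Dict[str, str]  # assumed format, e.g. {"from": "human", "value": "..."}


def split_conversation(
    turns: List[Turn],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),  # stand-in tokenizer
) -> List[List[Turn]]:
    """Greedily pack (human, gpt) turn pairs into chunks of at most max_tokens.

    A single pair longer than max_tokens still becomes its own chunk.
    """
    chunks: List[List[Turn]] = []
    current: List[Turn] = []
    current_len = 0
    # Walk two turns at a time so each question stays with its answer.
    for i in range(0, len(turns), 2):
        pair = turns[i : i + 2]
        pair_len = sum(count_tokens(t["value"]) for t in pair)
        if current and current_len + pair_len > max_tokens:
            chunks.append(current)
            current, current_len = [], 0
        current.extend(pair)
        current_len += pair_len
    if current:
        chunks.append(current)
    return chunks
```

For example, four pairs of five "tokens" each with `max_tokens=10` yield two chunks of two pairs apiece.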


================================================
FILE: docs/commands/leaderboard.md
================================================
### Get logs
```
gsutil -m rsync -r gs://fastchat_logs ~/fastchat_logs/
```

### Clean battle data
```
cd ~/FastChat/fastchat/serve/monitor
python3 clean_battle_data.py
```

### Run Elo analysis
```
python3 elo_analysis.py --clean-battle-file clean_battle_20230905.json
```
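
`elo_analysis.py` turns the cleaned battles into ratings. As a rough illustration of the underlying idea, and not the script's exact method (which may fit ratings differently), an online Elo update over pairwise battles is sketched below. The `(model_a, model_b, winner)` tuple format and the constants are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed battle format: (model_a, model_b, winner), winner in {"model_a", "model_b", "tie"}.
Battle = Tuple[str, str, str]


def compute_online_elo(
    battles: Iterable[Battle],
    k: float = 4.0,
    scale: float = 400.0,
    base: float = 10.0,
    init_rating: float = 1000.0,
) -> Dict[str, float]:
    """Apply sequential Elo updates over a list of pairwise battles."""
    rating: Dict[str, float] = defaultdict(lambda: init_rating)
    for model_a, model_b, winner in battles:
        ra, rb = rating[model_a], rating[model_b]
        # Expected score of model_a against model_b.
        ea = 1.0 / (1.0 + base ** ((rb - ra) / scale))
        eb = 1.0 - ea
        # Actual score of model_a: 1 for a win, 0 for a loss, 0.5 for a tie.
        sa = {"model_a": 1.0, "model_b": 0.0, "tie": 0.5}[winner]
        rating[model_a] = ra + k * (sa - ea)
        rating[model_b] = rb + k * ((1.0 - sa) - eb)
    return dict(rating)
```

With equal starting ratings, a single win moves the winner up and the loser down by `k / 2`, and the total rating across models is conserved.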

### Copy files to HF space
1. update plots
```
scp atlas:/data/lmzheng/FastChat/fastchat/serve/monitor/elo_results_20230905.pkl .
```

2. update table
```
wget https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard/raw/main/leaderboard_table_20230905.csv
```

### Update files on webserver
```
DATE=20231002

rm -rf elo_results.pkl leaderboard_table.csv
wget https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard/resolve/main/elo_results_$DATE.pkl
wget https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard/resolve/main/leaderboard_table_$DATE.csv
ln -s leaderboard_table_$DATE.csv leaderboard_table.csv
ln -s elo_results_$DATE.pkl elo_results.pkl
```


================================================
FILE: docs/commands/local_cluster.md
================================================
### Local GPU cluster
node-01
```
python3 -m fastchat.serve.controller --host 0.0.0.0 --port 10002

CUDA_VISIBLE_DEVICES=0 python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-13b-v1.5 --model-name vicuna-13b --controller http://node-01:10002 --host 0.0.0.0 --port 31000 --worker-address http://$(hostname):31000
CUDA_VISIBLE_DEVICES=1 python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-13b-v1.5 --model-name vicuna-13b --controller http://node-01:10002 --host 0.0.0.0 --port 31001 --worker-address http://$(hostname):31001

CUDA_VISIBLE_DEVICES=2,3 ray start --head
python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-33b-v1.3 --model-name vicuna-33b --controller http://node-01:10002 --host 0.0.0.0 --port 31002 --worker-address http://$(hostname):31002 --num-gpus 2
```

node-02
```
CUDA_VISIBLE_DEVICES=0 python3 -m fastchat.serve.vllm_worker --model-path meta-llama/Llama-2-13b-chat-hf --model-name llama-2-13b-chat --controller http://node-01:10002 --host 0.0.0.0 --port 31000 --worker-address http://$(hostname):31000 --tokenizer meta-llama/Llama-2-7b-chat-hf
CUDA_VISIBLE_DEVICES=1 python3 -m fastchat.serve.vllm_worker --model-path meta-llama/Llama-2-13b-chat-hf --model-name llama-2-13b-chat --controller http://node-01:10002 --host 0.0.0.0 --port 31001 --worker-address http://$(hostname):31001 --tokenizer meta-llama/Llama-2-7b-chat-hf
CUDA_VISIBLE_DEVICES=2 python3 -m fastchat.serve.vllm_worker --model-path meta-llama/Llama-2-7b-chat-hf --model-name llama-2-7b-chat --controller http://node-01:10002 --host 0.0.0.0 --port 31002 --worker-address http://$(hostname):31002 --tokenizer meta-llama/Llama-2-7b-chat-hf
CUDA_VISIBLE_DEVICES=3 python3 -m fastchat.serve.vllm_worker --model-path WizardLM/WizardLM-13B-V1.1 --model-name wizardlm-13b  --controller http://node-01:10002 --host 0.0.0.0 --port 31003 --worker-address http://$(hostname):31003
```

node-03
```
python3 -m fastchat.serve.vllm_worker --model-path mosaicml/mpt-30b-chat --controller http://node-01:10002 --host 0.0.0.0 --port 31000 --worker-address http://$(hostname):31000 --num-gpus 2
python3 -m fastchat.serve.vllm_worker --model-path timdettmers/guanaco-33b-merged --model-name guanaco-33b  --controller http://node-01:10002 --host 0.0.0.0 --port 31002 --worker-address http://$(hostname):31002 --num-gpus 2 --tokenizer hf-internal-testing/llama-tokenizer
```

node-04
```
CUDA_VISIBLE_DEVICES=0 python3 -m fastchat.serve.multi_model_worker --model-path ~/model_weights/RWKV-4-Raven-14B-v12-Eng98%25-Other2%25-20230523-ctx8192.pth --model-name RWKV-4-Raven-14B --model-path lmsys/fastchat-t5-3b-v1.0 --model-name fastchat-t5-3b --controller http://node-01:10002 --host 0.0.0.0 --port 31000 --worker http://$(hostname):31000 --limit 4
CUDA_VISIBLE_DEVICES=1 python3 -m fastchat.serve.multi_model_worker --model-path OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 --model-name oasst-pythia-12b --model-path mosaicml/mpt-7b-chat --model-name mpt-7b-chat --controller http://node-01:10002 --host 0.0.0.0 --port 31001 --worker http://$(hostname):31001 --limit 4
CUDA_VISIBLE_DEVICES=2 python3 -m fastchat.serve.multi_model_worker --model-path lmsys/vicuna-7b-v1.5 --model-name vicuna-7b --model-path THUDM/chatglm-6b --model-name chatglm-6b --controller http://node-01:10002 --host 0.0.0.0 --port 31002 --worker http://$(hostname):31002 --limit 4
CUDA_VISIBLE_DEVICES=3 python3 -m fastchat.serve.vllm_worker --model-path ~/model_weights/alpaca-13b  --controller http://node-01:10002 --host 0.0.0.0 --port 31003 --worker-address http://$(hostname):31003
```

test
```
python3 -m fastchat.serve.test_message --model vicuna-13b --controller http://localhost:10002
```
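
The single-GPU vLLM worker lines above follow one pattern: one worker per GPU, each on its own port, registering with the controller on node-01. A small hypothetical Python helper (not part of FastChat) can generate such lines, which helps avoid port collisions on nodes with many GPUs:

```python
from typing import List


def vllm_worker_commands(
    model_path: str,
    model_name: str,
    gpus: List[int],
    controller: str = "http://node-01:10002",
    first_port: int = 31000,
) -> List[str]:
    """Build one single-GPU vLLM worker command per GPU, on consecutive ports."""
    commands = []
    for offset, gpu in enumerate(gpus):
        port = first_port + offset
        # $(hostname) is left literal so the shell expands it on the node itself.
        commands.append(
            f"CUDA_VISIBLE_DEVICES={gpu} python3 -m fastchat.serve.vllm_worker "
            f"--model-path {model_path} --model-name {model_name} "
            f"--controller {controller} --host 0.0.0.0 --port {port} "
            f"--worker-address http://$(hostname):{port}"
        )
    return commands


for cmd in vllm_worker_commands("lmsys/vicuna-13b-v1.5", "vicuna-13b", [0, 1]):
    print(cmd)
```

Multi-GPU workers (`--num-gpus 2`) and multi-model workers need extra flags and are not covered by this helper.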


================================================
FILE: docs/commands/pypi.md
================================================
### Requirement
```
python3 -m pip install twine
python3 -m pip install --upgrade pip
pip3 install build
```

### Upload
```
bash scripts/upload_pypi.sh
```


================================================
FILE: docs/commands/webserver.md
================================================
### Install
```
sudo apt update
sudo apt install tmux htop

wget https://repo.anaconda.com/archive/Anaconda3-2022.10-Linux-x86_64.sh
bash Anaconda3-2022.10-Linux-x86_64.sh

conda create -n fastchat python=3.9
conda activate fastchat

git clone https://github.com/lm-sys/FastChat.git
cd FastChat
pip3 install -e .
```


### Launch servers
```
cd fastchat_logs/controller
python3 -m fastchat.serve.controller --host 0.0.0.0 --port 21001
python3 -m fastchat.serve.register_worker --controller http://localhost:21001 --worker-name https://
python3 -m fastchat.serve.test_message --model vicuna-13b --controller http://localhost:21001

cd fastchat_logs/server0

python3 -m fastchat.serve.huggingface_api_worker --model-info-file ~/elo_results/register_hf_api_models.json

export OPENAI_API_KEY=
export ANTHROPIC_API_KEY=
export GCP_PROJECT_ID=

python3 -m fastchat.serve.gradio_web_server_multi --controller http://localhost:21001 --concurrency 50 --add-chatgpt --add-claude --add-palm --elo ~/elo_results/elo_results.pkl --leaderboard-table-file ~/elo_results/leaderboard_table.csv --register ~/elo_results/register_oai_models.json --show-terms

python3 backup_logs.py
```


### Check the launch time
```
for i in $(seq 0 11); do grep "Running on local URL" fastchat_logs/server$i/gradio_web_server.log | tail -n 1; done
```


### Increase the limit of max open files
Per process (no reboot needed):
```
# $id is the PID of the target process
sudo prlimit --nofile=1048576:1048576 --pid=$id

# Apply the same limit to every running gradio_web_server process
for id in $(pgrep -f gradio_web_server); do echo $id; sudo prlimit --nofile=1048576:1048576 --pid=$id; done
```

System-wide (requires a reboot): add the lines below to `/etc/security/limits.conf`:
```
* hard nofile 65535
* soft nofile 65535
```
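
The `prlimit` and `limits.conf` changes above act from outside the process. As an optional alternative for a Python server you start yourself, the standard-library `resource` module lets a process raise its own soft open-file limit up to its hard limit without root. This is a Unix-only sketch; `raise_nofile_soft_limit` is a hypothetical helper, not something FastChat is documented to call.

```python
import resource


def raise_nofile_soft_limit(target: int = 65535) -> tuple:
    """Raise this process's soft open-file limit toward `target`.

    An unprivileged process may raise its soft limit up to, but not beyond,
    its hard limit, so the request is capped at the hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    # Never lower an existing limit (including an unlimited soft limit).
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Raising the hard limit itself still requires the privileged `prlimit` or `limits.conf` route.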


### Gradio edits (v3.35.2)
1. gtag and canvas
```
vim /home/vicuna/anaconda3/envs/fastchat/lib/python3.9/site-packages/gradio/templates/frontend/index.html
```

```
<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-K6D24EE9ED"></script><script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());
  gtag('config', 'G-K6D24EE9ED');
  window.__gradio_mode__ = "app";
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.4.1/html2canvas.min.js"></script>
```

2. deprecation warnings
```
vim /home/vicuna/anaconda3/envs/fastchat/lib/python3.9/site-packages/gradio/deprecation.py
```

```
def check_deprecated_parameters(
```

3. Loading
```
vim /home/vicuna/anaconda3/envs/fastchat/lib/python3.9/site-packages/gradio/templates/frontend/assets/index-188ef5e8.js
```

```
%s/"Loading..."/"Loading...(Please refresh if it takes more than 30 seconds)"/g
```


================================================
FILE: docs/dashinfer_integration.md
================================================
# dash-infer Integration
[DashInfer](https://github.com/modelscope/dash-infer) is a high-performance inference engine optimized for CPUs. It accelerates a variety of models, including Llama, Qwen, and ChatGLM, and delivers significant speedups on both Intel x64 and ARMv9 processors. This makes it a good FastChat worker for deploying large language models in resource-constrained environments or wherever CPU inference is preferred over GPU acceleration.

## Instructions
1. Install dash-infer.
    ```
    pip install dashinfer
    ```

2. When you launch a model worker, replace the normal worker (`fastchat.serve.model_worker`) with the dash-infer worker (`fastchat.serve.dashinfer_worker`). All other commands, such as the controller, Gradio web server, and OpenAI API server, stay the same.
    ```
    python3 -m fastchat.serve.dashinfer_worker --model-path qwen/Qwen-7B-Chat --revision=master /path/to/dashinfer-model-generation-config.json
    ```
    Here is an example:
    ```
    python3 -m fastchat.serve.dashinfer_worker --model-path qwen/Qwen-7B-Chat --revision=master dash-infer/examples/python/model_config/config_qwen_v10_7b.json
    ```

    If you use a model that is already downloaded, set `--model-path` to the local directory and choose a conversation template with the `--conv-template` option:
    ```
    python3 -m fastchat.serve.dashinfer_worker --model-path ~/.cache/modelscope/hub/qwen/Qwen-7B-Chat --conv-template qwen-7b-chat /path/to/dashinfer-model-generation-config.json
    ```
    All available conversation templates are listed in [fastchat/conversation.py](../fastchat/conversation.py).


================================================
FILE: docs/dataset_release.md
================================================
## Datasets
We release the following datasets based on our projects and websites.

- [LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset](https://huggingface.co/datasets/lmsys/lmsys-chat-1m)
- [LMSYS-Human-Preference-55k](https://huggingface.co/datasets/lmsys/lmsys-arena-human-preference-55k)
- [Chatbot Arena Conversation Dataset](https://huggingface.co/datasets/lmsys/chatbot_arena_conversations)
- [MT-bench Human Annotation Dataset](https://huggingface.co/datasets/lmsys/mt_bench_human_judgments)


================================================
FILE: docs/exllama_v2.md
================================================
# ExllamaV2 GPTQ Inference Framework

FastChat integrates the customized [ExllamaV2](https://github.com/turboderp/exllamav2) kernels to provide faster GPTQ inference.

**Note: ExllamaV2 does not yet support the embedding REST API.**

## Install ExllamaV2

Set up the environment (see [this link](https://github.com/turboderp/exllamav2#how-to) for more details):

```bash
git clone https://github.com/turboderp/exllamav2
cd exllamav2
pip install -e .
```

Chat with the CLI:
```bash
python3 -m fastchat.serve.cli \
    --model-path models/vicuna-7B-1.1-GPTQ-4bit-128g \
    --enable-exllama
```

Start model worker:
```bash
# Download quantized model from huggingface
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g models/vicuna-7B-1.1-GPTQ-4bit-128g

# Load model with default configuration (max sequence length 4096, no GPU split setting).
python3 -m fastchat.serve.model_worker \
    --model-path models/vicuna-7B-1.1-GPTQ-4bit-128g \
    --enable-exllama

# Load the model with max sequence length 2048, allocating 18 GB to CUDA:0 and 24 GB to CUDA:1.
python3 -m fastchat.serve.model_worker \
    --model-path models/vicuna-7B-1.1-GPTQ-4bit-128g \
    --enable-exllama \
    --exllama-max-seq-len 2048 \
    --exllama-gpu-split 18,24
```

`--exllama-cache-8bit` can be used to enable 8-bit caching with exllama and save some VRAM.

## Performance 

Reference: https://github.com/turboderp/exllamav2#performance


| Model      | Mode         | Size  | grpsz | act | V1: 3090Ti | V1: 4090 | V2: 3090Ti | V2: 4090    |
|------------|--------------|-------|-------|-----|------------|----------|------------|-------------|
| Llama      | GPTQ         | 7B    | 128   | no  | 143 t/s    | 173 t/s  | 175 t/s    | **195** t/s |
| Llama      | GPTQ         | 13B   | 128   | no  | 84 t/s     | 102 t/s  | 105 t/s    | **110** t/s |
| Llama      | GPTQ         | 33B   | 128   | yes | 37 t/s     | 45 t/s   | 45 t/s     | **48** t/s  |
| OpenLlama  | GPTQ         | 3B    | 128   | yes | 194 t/s    | 226 t/s  | 295 t/s    | **321** t/s |
| CodeLlama  | EXL2 4.0 bpw | 34B   | -     | -   | -          | -        | 42 t/s     | **48** t/s  |
| Llama2     | EXL2 3.0 bpw | 7B    | -     | -   | -          | -        | 195 t/s    | **224** t/s |
| Llama2     | EXL2 4.0 bpw | 7B    | -     | -   | -          | -        | 164 t/s    | **197** t/s |
| Llama2     | EXL2 5.0 bpw | 7B    | -     | -   | -          | -        | 144 t/s    | **160** t/s |
| Llama2     | EXL2 2.5 bpw | 70B   | -     | -   | -          | -        | 30 t/s     | **35** t/s  |
| TinyLlama  | EXL2 3.0 bpw | 1.1B  | -     | -   | -          | -        | 536 t/s    | **635** t/s |
| TinyLlama  | EXL2 4.0 bpw | 1.1B  | -     | -   | -          | -        | 509 t/s    | **590** t/s |
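
The EXL2 "bpw" entries are bits per weight, so the memory taken by the quantized weights alone is roughly parameters × bpw / 8 bytes. A back-of-the-envelope Python helper, assuming that simple formula and ignoring the KV cache, activations, and quantization metadata:

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the quantized weights alone, in GiB.

    Ignores the KV cache, activations, and per-group scale/zero overhead.
    """
    return num_params * bits_per_weight / 8 / 2**30


for params, bpw in [(7e9, 4.0), (34e9, 4.0), (70e9, 2.5)]:
    print(f"{params / 1e9:.0f}B @ {bpw} bpw ~ {weight_memory_gib(params, bpw):.1f} GiB")
```

By this estimate a 70B model at 2.5 bpw needs about 20 GiB for its weights, which is why such low bitrates are attractive on 24 GB cards like the 3090 Ti and 4090.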


================================================
FILE: docs/gptq.md
================================================
# GPTQ 4bit Inference

FastChat supports GPTQ 4-bit inference with [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).

1. Windows users: use the `old-cuda` branch.
2. Linux users: the `fastest-inference-4bit` branch is recommended.

## Install

Set up the environment:
```bash
# cd /path/to/FastChat
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git repositories/GPTQ-for-LLaMa
cd repositories/GPTQ-for-LLaMa
# Windows users should use the `old-cuda` branch instead
git switch fastest-inference-4bit
# Install `quant-cuda` package in FastChat's virtualenv
python3 setup_cuda.py install
pip3 install texttable
```

Chat with the CLI:
```bash
python3 -m fastchat.serve.cli \
    --model-path models/vicuna-7B-1.1-GPTQ-4bit-128g \
    --gptq-wbits 4 \
    --gptq-groupsize 128
```

Start model worker:
```bash
# Download quantized model from huggingface
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g models/vicuna-7B-1.1-GPTQ-4bit-128g

python3 -m fastchat.serve.model_worker \
    --model-path models/vicuna-7B-1.1-GPTQ-4bit-128g \
    --gptq-wbits 4 \
    --gptq-groupsize 128

# You can specify which quantized model to use
python3 -m fastchat.serve.model_worker \
    --model-path models/vicuna-7B-1.1-GPTQ-4bit-128g \
    --gptq-ckpt models/vicuna-7B-1.1-GPTQ-4bit-128g/vicuna-7B-1.1-GPTQ-4bit-128g.safetensors \
    --gptq-wbits 4 \
    --gptq-groupsize 128 \
    --gptq-act-order
```

## Benchmark

| LLaMA-13B | branch                 | Bits | group-size | memory(MiB) | PPL(c4) | Median(s/token) | act-order | speed up |
| --------- | ---------------------- | ---- | ---------- | ----------- | ------- | --------------- | --------- | -------- |
| FP16      | fastest-inference-4bit | 16   | -          | 26634       | 6.96    | 0.0383          | -         | 1x       |
| GPTQ      | triton                 | 4    | 128        | 8590        | 6.97    | 0.0551          | -         | 0.69x    |
| GPTQ      | fastest-inference-4bit | 4    | 128        | 8699        | 6.97    | 0.0429          | true      | 0.89x    |
| GPTQ      | fastest-inference-4bit | 4    | 128        | 8699        | 7.03    | 0.0287          | false     | 1.33x    |
| GPTQ      | fastest-inference-4bit | 4    | -1         | 8448        | 7.12    | 0.0284          | false     | 1.44x    |


================================================
FILE: docs/langchain_integration.md
================================================
# Local LangChain with FastChat

[LangChain](https://python.langchain.com/en/latest/index.html) is a library that facilitates the development of applications by leveraging large language models (LLMs) and enabling their composition with other sources of computation or knowledge.
FastChat's OpenAI-compatible [API server](openai_api.md) enables using LangChain with open models seamlessly.

## Launch RESTful API Server

Here are the steps to launch a local OpenAI API server for LangChain.

First, launch the controller

```bash
python3 -m fastchat.serve.controller
```

LangChain uses OpenAI model names by default, so we need to assign some faux OpenAI model names to our local model.
Here, we use Vicuna as an example and use it for three endpoints: chat completion, completion, and embedding.
`--model-path` can be a local folder or a Hugging Face repo name.
See a full list of supported models [here](../README.md#supported-models).

```bash
python3 -m fastchat.serve.model_worker --model-names "gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002" --model-path lmsys/vicuna-7b-v1.5
```

Finally, launch the RESTful API server

```bash
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
```

## Set OpenAI Environment

You can set your environment with the following commands.

Set OpenAI base url

```bash
export OPENAI_API_BASE=http://localhost:8000/v1
```

Set OpenAI API key

```bash
export OPENAI_API_KEY=EMPTY
```
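
Before running LangChain, you can check that the server answers with a plain HTTP call. This is a minimal standard-library sketch that assumes the server launched above is reachable at `OPENAI_API_BASE` and that `gpt-3.5-turbo` is one of the names passed to `--model-names`; `build_chat_request` and `ask` are hypothetical helper names.

```python
import json
import os
import urllib.request


def build_chat_request(prompt: str, model: str = "gpt-3.5-turbo") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /chat/completions endpoint."""
    # Read the same environment variables exported above.
    base = os.environ.get("OPENAI_API_BASE", "http://localhost:8000/v1").rstrip("/")
    key = os.environ.get("OPENAI_API_KEY", "EMPTY")
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {key}"},
        method="POST",
    )


def ask(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    """Send one chat turn and return the assistant's reply (needs the servers running)."""
    with urllib.request.urlopen(build_chat_request(prompt, model), timeout=60) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

With the controller, model worker, and API server running, `print(ask("Who are you?"))` should print the model's answer.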

If you encounter the following out-of-memory (OOM) error while creating embeddings, set a smaller embedding batch size through an environment variable.

~~~bash
openai.error.APIError: Invalid response object from API: '{"object":"error","message":"**NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.**\\n\\n(CUDA out of memory. Tried to allocate xxx MiB (GPU 0; xxx GiB total capacity; xxx GiB already allocated; xxx MiB free; xxx GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF)","code":50002}' (HTTP response code was 400)
~~~

You can try `export FASTCHAT_WORKER_API_EMBEDDING_BATCH_SIZE=1`.
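
The variable presumably works by limiting how many input texts are embedded in one batch, which bounds the GPU memory used per forward pass. Assuming plain fixed-size slicing, that batching can be sketched as follows; `embedding_batches` is a hypothetical helper, and its fallback of 4 is an assumption rather than a documented default.

```python
import os
from typing import List


def embedding_batches(texts: List[str]) -> List[List[str]]:
    """Split embedding inputs into batches of FASTCHAT_WORKER_API_EMBEDDING_BATCH_SIZE."""
    # Fallback of 4 is an assumption for this sketch.
    batch_size = int(os.environ.get("FASTCHAT_WORKER_API_EMBEDDING_BATCH_SIZE", "4"))
    return [texts[i : i + batch_size] for i in range(0, len(texts), batch_size)]
```

With a batch size of 1, each text is embedded on its own: slower overall, but with the smallest peak memory.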

## Try local LangChain

Here is a question-answering example.

Download a text file.

```bash
wget https://raw.githubusercontent.com/hwchase17/langchain/v0.0.200/docs/modules/state_of_the_union.txt
```

Run LangChain.

~~~py
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator

embedding = OpenAIEmbeddings(model="text-embedding-ada-002")
loader = TextLoader("state_of_the_union.txt")
index = VectorstoreIndexCreator(embedding=embedding).from_loaders([loader])
llm = ChatOpenAI(model="gpt-3.5-turbo")

questions = [
    "Who is the speaker",
    "What did the president say about Ketanji Brown Jackson",
    "What are the threats to America",
    "Who are mentioned in the speech",
    "Who is the vice president",
    "How many projects were announced",
]

for query in questions:
    print("Query:", query)
    print("Answer:", index.query(query, llm=llm))
~~~


================================================
FILE: docs/lightllm_integration.md
================================================
# LightLLM Integration
You can use [LightLLM](https://github.com/ModelTC/lightllm) as an optimized worker implementation in FastChat.
It offers advanced continuous batching and much higher (~10x) throughput.
See the supported models [here](https://github.com/ModelTC/lightllm?tab=readme-ov-file#supported-model-list).

## Instructions
1. Install LightLLM by following its [Get started](https://github.com/ModelTC/lightllm?tab=readme-ov-file#get-started) guide, or use the [pre-built image](https://github.com/ModelTC/lightllm?tab=readme-ov-file#container).

2. When you launch a model worker, replace the normal worker (`fastchat.serve.model_worker`) with the LightLLM worker (`fastchat.serve.lightllm_worker`). All other commands such as controller, gradio web server, and OpenAI API server are kept the same. Refer to [--max_total_token_num](https://github.com/ModelTC/lightllm/blob/4a9824b6b248f4561584b8a48ae126a0c8f5b000/docs/ApiServerArgs.md?plain=1#L23) to understand how to calculate the `--max_total_token_num` argument.
   ```
   python3 -m fastchat.serve.lightllm_worker --model-path lmsys/vicuna-7b-v1.5 --tokenizer_mode "auto" --max_total_token_num 154000
   ```

   If you want to use quantized weights and KV cache for inference, try:

   ```
   python3 -m fastchat.serve.lightllm_worker --model-path lmsys/vicuna-7b-v1.5 --tokenizer_mode "auto" --max_total_token_num 154000 --mode triton_int8weight triton_int8kv
   ```


================================================
FILE: docs/mlx_integration.md
================================================
# Apple MLX Integration

You can use [Apple MLX](https://github.com/ml-explore/mlx) as an optimized worker implementation in FastChat.

It runs models efficiently on Apple Silicon.

See the supported models [here](https://github.com/ml-explore/mlx-examples/tree/main/llms#supported-models).

Note that for Apple Silicon Macs with less memory, smaller models (or quantized models) are recommended.

## Instructions

1. Install MLX.

   ```
   pip install "mlx-lm>=0.0.6"
   ```

2. When you launch a model worker, replace the normal worker (`fastchat.serve.model_worker`) with the MLX worker (`fastchat.serve.mlx_worker`). Remember to launch the model worker after you have launched the controller ([instructions](../README.md)).

   ```
   python3 -m fastchat.serve.mlx_worker --model-path TinyLlama/TinyLlama-1.1B-Chat-v1.0
   ```


================================================
FILE: docs/model_support.md
================================================
# Model Support
This document describes how to support a new model in FastChat.

## Content
- [Local Models](#local-models)
- [API-Based Models](#api-based-models)

## Local Models
To support a new local model in FastChat, you need to correctly handle its prompt template and model loading.
The goal is to make the following command run with the correct prompts.

```
python3 -m fastchat.serve.cli --model-path [YOUR_MODEL_PATH]
```

You can run this example command to learn the code logic.

```
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5
```

You can add `--debug` to see the actual prompt sent to the model.

### Steps

FastChat uses the `Conversation` class to handle prompt templates and the `BaseModelAdapter` class to handle model loading.

1. Implement a conversation template for the new model at [fastchat/conversation.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py). You can follow existing examples and use `register_conv_template` to add a new one. Please also add a link to the official reference code if possible.
2. Implement a model adapter for the new model at [fastchat/model/model_adapter.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/model/model_adapter.py). You can follow existing examples and use `register_model_adapter` to add a new one.
3. (Optional) Add the model name to the "Supported models" [section](#supported-models) below and add more information in [fastchat/model/model_registry.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/model/model_registry.py).

After these steps, the new model should be compatible with most FastChat features, such as CLI, web UI, model worker, and OpenAI-compatible API server. Please do some testing with these features as well.
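
To see what the two registration steps accomplish, here is a deliberately simplified, self-contained sketch. `MiniConversation`, `register_template`, `register_adapter`, and `template_for` are hypothetical stand-ins for FastChat's `Conversation`/`register_conv_template` and `BaseModelAdapter`/`register_model_adapter`; the real classes have more fields and separator styles. The idea is the same: a template renders the chat history into one prompt string, and an adapter decides from the model path which template to use.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class MiniConversation:
    """Simplified, hypothetical stand-in for a conversation template."""

    name: str
    system_message: str
    roles: Tuple[str, str]
    sep: str = "\n"
    messages: List[Tuple[str, Optional[str]]] = field(default_factory=list)

    def append_message(self, role: str, message: Optional[str]) -> None:
        self.messages.append((role, message))

    def get_prompt(self) -> str:
        # One "ROLE: text" line per turn; a final "ROLE:" with no text cues the model to reply.
        parts = [self.system_message]
        for role, message in self.messages:
            parts.append(f"{role}: {message}" if message is not None else f"{role}:")
        return self.sep.join(parts)


TEMPLATES: Dict[str, MiniConversation] = {}
ADAPTERS: List[Tuple[Callable[[str], bool], str]] = []


def register_template(conv: MiniConversation) -> None:
    TEMPLATES[conv.name] = conv


def register_adapter(match: Callable[[str], bool], template_name: str) -> None:
    ADAPTERS.append((match, template_name))


def template_for(model_path: str) -> MiniConversation:
    """Return a fresh copy of the template chosen by the first matching adapter."""
    for match, template_name in ADAPTERS:
        if match(model_path):
            return copy.deepcopy(TEMPLATES[template_name])
    raise ValueError(f"no adapter matches {model_path!r}")


# Register a template and an adapter for a hypothetical model named "my-model".
register_template(
    MiniConversation(
        name="my-model",
        system_message="You are a helpful assistant.",
        roles=("USER", "ASSISTANT"),
    )
)
register_adapter(lambda path: "my-model" in path.lower(), "my-model")

conv = template_for("my-org/My-Model-7B")
conv.append_message(conv.roles[0], "Hello!")
conv.append_message(conv.roles[1], None)
print(conv.get_prompt())
```

Returning a deep copy keeps separate chats from sharing message history, which is also why `--debug` is useful for checking that the rendered prompt matches the model's official format.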

### Supported models

- [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
  - example: `python3 -m fastchat.serve.cli --model-path meta-llama/Llama-2-7b-chat-hf`
- Vicuna, Alpaca, LLaMA, Koala
  - example: `python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5`
- [allenai/tulu-2-dpo-7b](https://huggingface.co/allenai/tulu-2-dpo-7b)
- [BAAI/AquilaChat-7B](https://huggingface.co/BAAI/AquilaChat-7B)
- [BAAI/AquilaChat2-7B](https://huggingface.co/BAAI/AquilaChat2-7B)
- [BAAI/AquilaChat2-34B](https://huggingface.co/BAAI/AquilaChat2-34B)
- [BAAI/bge-large-en](https://huggingface.co/BAAI/bge-large-en#using-huggingface-transformers)
- [argilla/notus-7b-v1](https://huggingface.co/argilla/notus-7b-v1)
- [baichuan-inc/baichuan-7B](https://huggingface.co/baichuan-inc/baichuan-7B)
- [BlinkDL/RWKV-4-Raven](https://huggingface.co/BlinkDL/rwkv-4-raven)
  - example: `python3 -m fastchat.serve.cli --model-path ~/model_weights/RWKV-4-Raven-7B-v11x-Eng99%-Other1%-20230429-ctx8192.pth`
- [bofenghuang/vigogne-2-7b-instruct](https://huggingface.co/bofenghuang/vigogne-2-7b-instruct)
- [bofenghuang/vigogne-2-7b-chat](https://huggingface.co/bofenghuang/vigogne-2-7b-chat)
- [camel-ai/CAMEL-13B-Combined-Data](https://huggingface.co/camel-ai/CAMEL-13B-Combined-Data)
- [codellama/CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf)
- [databricks/dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b)
- [deepseek-ai/deepseek-llm-67b-chat](https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat)
- [deepseek-ai/deepseek-coder-33b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-33b-instruct)
- [FlagAlpha/Llama2-Chinese-13b-Chat](https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat)
- [FreedomIntelligence/phoenix-inst-chat-7b](https://huggingface.co/FreedomIntelligence/phoenix-inst-chat-7b)
- [FreedomIntelligence/ReaLM-7b-v1](https://huggingface.co/FreedomIntelligence/Realm-7b)
- [h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b](https://huggingface.co/h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b)
- [HuggingFaceH4/starchat-beta](https://huggingface.co/HuggingFaceH4/starchat-beta)
- [HuggingFaceH4/zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha)
- [internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b)
- [cllm/consistency-llm-7b-codesearchnet/consistency-llm-7b-gsm8k/consistency-llm-7b-sharegpt48k/consistency-llm-7b-spider](https://huggingface.co/cllm)
- [IEITYuan/Yuan2-2B/51B/102B-hf](https://huggingface.co/IEITYuan)
- [lcw99/polyglot-ko-12.8b-chang-instruct-chat](https://huggingface.co/lcw99/polyglot-ko-12.8b-chang-instruct-chat)
- [lmsys/fastchat-t5-3b-v1.0](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0)
- [meta-math/MetaMath-7B-V1.0](https://huggingface.co/meta-math/MetaMath-7B-V1.0)
- [Microsoft/Orca-2-7b](https://huggingface.co/microsoft/Orca-2-7b)
- [mosaicml/mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat)
  - example: `python3 -m fastchat.serve.cli --model-path mosaicml/mpt-7b-chat`
- [Neutralzz/BiLLa-7B-SFT](https://huggingface.co/Neutralzz/BiLLa-7B-SFT)
- [nomic-ai/gpt4all-13b-snoozy](https://huggingface.co/nomic-ai/gpt4all-13b-snoozy)
- [NousResearch/Nous-Hermes-13b](https://huggingface.co/NousResearch/Nous-Hermes-13b)
- [openaccess-ai-collective/manticore-13b-chat-pyg](https://huggingface.co/openaccess-ai-collective/manticore-13b-chat-pyg)
- [OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5](https://huggingface.co/OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5)
- [openchat/openchat_3.5](https://huggingface.co/openchat/openchat_3.5)
- [Open-Orca/Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca)
- [OpenLemur/lemur-70b-chat-v1](https://huggingface.co/OpenLemur/lemur-70b-chat-v1)
- [Phind/Phind-CodeLlama-34B-v2](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2)
- [project-baize/baize-v2-7b](https://huggingface.co/project-baize/baize-v2-7b)
- [Qwen/Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat)
- [rishiraj/CatPPT](https://huggingface.co/rishiraj/CatPPT)
- [Salesforce/codet5p-6b](https://huggingface.co/Salesforce/codet5p-6b)
- [StabilityAI/stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b)
- [tenyx/TenyxChat-7B-v1](https://huggingface.co/tenyx/TenyxChat-7B-v1)
- [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- [THUDM/chatglm-6b](https://huggingface.co/THUDM/chatglm-6b)
- [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b)
- [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b)
- [tiiuae/falcon-180B-chat](https://huggingface.co/tiiuae/falcon-180B-chat)
- [timdettmers/guanaco-33b-merged](https://huggingface.co/timdettmers/guanaco-33b-merged)
- [togethercomputer/RedPajama-INCITE-7B-Chat](https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Chat)
- [VMware/open-llama-7b-v2-open-instruct](https://huggingface.co/VMware/open-llama-7b-v2-open-instruct)
- [WizardLM/WizardLM-13B-V1.0](https://huggingface.co/WizardLM/WizardLM-13B-V1.0)
- [WizardLM/WizardCoder-15B-V1.0](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0)
- [Xwin-LM/Xwin-LM-7B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-7B-V0.1)
- Any [EleutherAI](https://huggingface.co/EleutherAI) pythia model such as [pythia-6.9b](https://huggingface.co/EleutherAI/pythia-6.9b)
- Any [Peft](https://github.com/huggingface/peft) adapter trained on top of a
  model above. To activate it, the model path must contain `peft`. Note: if you
  load multiple PEFT models, they can share the base model weights if you set
  the environment variable `PEFT_SHARE_BASE_WEIGHTS=true` in any model
  worker.


## API-Based Models
To support an API-based model, consider learning from the existing OpenAI example.
If the model is compatible with OpenAI APIs, then a configuration file is all that's needed without any additional code.
For custom protocols, implementation of a streaming generator in [fastchat/serve/api_provider.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/serve/api_provider.py) is required, following the provided examples. Currently, FastChat is compatible with OpenAI, Anthropic, Google Vertex AI, Mistral, Nvidia NGC, YandexGPT and Reka.

### Steps to Launch a WebUI with an API Model
1. Specify the endpoint information in a JSON configuration file. For instance, create a file named `api_endpoints.json`:
```json
{
  "gpt-3.5-turbo": {
    "model_name": "gpt-3.5-turbo",
    "api_type": "openai",
    "api_base": "https://api.openai.com/v1",
    "api_key": "sk-******",
    "anony_only": false,
    "recommended_config": {
      "temperature": 0.7,
      "top_p": 1.0
    },
    "text-arena": true,
    "vision-arena": false
  }
}
```
  - "api_type" can be one of the following: openai, anthropic, gemini, mistral, yandexgpt or reka. For custom APIs, add a new type and implement it accordingly.
  - "anony_only" indicates whether to display this model in anonymous mode only.
  - "recommended_config" indicates the recommended generation parameters for temperature and top_p.
  - "text-arena" indicates whether the model should be displayed in the Text Arena.
  - "vision-arena" indicates whether the model should be displayed in the Vision Arena.

2. Launch the Gradio web server with the argument `--register api_endpoints.json`:
```bash
python3 -m fastchat.serve.gradio_web_server --controller "" --share --register api_endpoints.json
```

Now, you can open a browser and interact with the model.
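Because a malformed `api_endpoints.json` is easy to miss until the web server starts, it can help to sanity-check the file first. The sketch below is a hypothetical helper, not part of FastChat: `parse_api_endpoints`, `load_api_endpoints`, and `text_arena_models` are illustrative names, and treating `model_name` and `api_type` as required (and a missing `"text-arena"` key as false) are assumptions rather than FastChat's actual rules.

```python
import json
from typing import Dict, List

# Keys this sketch treats as required; an assumption for illustration,
# not FastChat's own validation rules.
REQUIRED_KEYS = ("model_name", "api_type")


def parse_api_endpoints(text: str) -> Dict[str, dict]:
    """Parse the contents of an api_endpoints.json file and check each entry."""
    # json.loads raises json.JSONDecodeError on invalid JSON such as trailing commas.
    endpoints = json.loads(text)
    for name, info in endpoints.items():
        missing = [key for key in REQUIRED_KEYS if key not in info]
        if missing:
            raise ValueError(f"{name}: missing keys {missing}")
    return endpoints


def load_api_endpoints(path: str) -> Dict[str, dict]:
    """Read and check an api_endpoints.json file from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_api_endpoints(f.read())


def text_arena_models(endpoints: Dict[str, dict]) -> List[str]:
    """Return the names of entries flagged for the Text Arena (missing flag = False here)."""
    return [name for name, info in endpoints.items() if info.get("text-arena", False)]
```

For example, `text_arena_models(load_api_endpoints("api_endpoints.json"))` lists the models that would be flagged for the Text Arena. Since Python's `json` module rejects trailing commas, this check also catches that common editing mistake.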


================================================
FILE: docs/openai_api.md
================================================
# OpenAI-Compatible RESTful APIs

FastChat provides OpenAI-compatible APIs for its supported models, so you can use FastChat as a local drop-in replacement for OpenAI APIs.
The FastChat server is compatible with both the [openai-python](https://github.com/openai/openai-python) library and cURL commands.

The following OpenAI APIs are supported:
- Chat Completions. (Reference: https://platform.openai.com/docs/api-reference/chat)
- Completions. (Reference: https://platform.openai.com/docs/api-reference/completions)
- Embeddings. (Reference: https://platform.openai.com/docs/api-reference/embeddings)

The REST API can also be used from Google Colab, as demonstrated in the [FastChat_API_GoogleColab.ipynb](https://github.com/lm-sys/FastChat/blob/main/playground/FastChat_API_GoogleColab.ipynb) notebook in our repository.

## RESTful API Server
First, launch the controller

```bash
python3 -m fastchat.serve.controller
```

Then, launch the model worker(s)

```bash
python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5
```

Finally, launch the RESTful API server

```bash
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
```

Now, let us test the API server.

### OpenAI Official SDK
The goal of `openai_api_server.py` is to implement a fully OpenAI-compatible API server, so the models can be used directly with [openai-python](https://github.com/openai/openai-python) library.

First, install the OpenAI Python package (version 1.0 or later):
```bash
pip install --upgrade openai
```

Then, interact with the Vicuna model:
```python
import openai

openai.api_key = "EMPTY"
openai.base_url = "http://localhost:8000/v1/"

model = "vicuna-7b-v1.5"
prompt = "Once upon a time"

# create a completion
completion = openai.completions.create(model=model, prompt=prompt, max_tokens=64)
# print the completion
print(prompt + completion.choices[0].text)

# create a chat completion
completion = openai.chat.completions.create(
  model=model,
  messages=[{"role": "user", "content": "Hello! What is your name?"}]
)
# print the completion
print(completion.choices[0].message.content)
```

Streaming is also supported. See [test_openai_api.py](../tests/test_openai_api.py). If your API server is behind a proxy, you need to turn off response buffering; in Nginx, set `proxy_buffering off;` in the `location` block for the proxy.
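As a minimal streaming sketch, assuming `openai>=1.0` and the API server launched above on `localhost:8000`, the client passes `stream=True` and reads the text from each chunk's `delta`. The helper names `collect_stream` and `stream_chat` are hypothetical; the repository's test file remains the reference.

```python
from typing import Callable, Iterable


def collect_stream(chunks: Iterable, on_text: Callable[[str], None] = lambda text: None) -> str:
    """Join the text deltas of a streamed chat completion, calling on_text for each piece."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        delta = chunk.choices[0].delta.content
        if delta:  # e.g. a role-only first chunk or the final chunk may have no text
            on_text(delta)
            parts.append(delta)
    return "".join(parts)


def stream_chat(prompt: str, model: str = "vicuna-7b-v1.5") -> str:
    """Stream a chat completion from the local FastChat server, printing text as it arrives."""
    import openai  # requires openai>=1.0; imported here so collect_stream works without it

    client = openai.OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1/")
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    text = collect_stream(stream, on_text=lambda piece: print(piece, end="", flush=True))
    print()
    return text
```

With the server running, `stream_chat("Hello! What is your name?")` prints the reply incrementally and returns the full text. Keeping the accumulation logic in `collect_stream` makes it easy to test without a server.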

### cURL
cURL is another good tool for observing the output of the API.

List Models:
```bash
curl http://localhost:8000/v1/models
```

Chat Completions:
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vicuna-7b-v1.5",
    "messages": [{"role": "user", "content": "Hello! What is your name?"}]
  }'
```

Text Completions:
```bash
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vicuna-7b-v1.5",
    "prompt": "Once upon a time",
    "max_tokens": 41,
    "temperature": 0.5
  }'
```

Embeddings:
```bash
curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vicuna-7b-v1.5",
    "input": "Hello world!"
  }'
```
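The same embeddings endpoint can be called from Python. This sketch assumes `openai>=1.0`, the server above on `localhost:8000`, and that the server accepts a list of strings as `input`. It compares two embeddings with cosine similarity, a common choice that the FastChat docs do not prescribe. `embed` and `cosine_similarity` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def embed(texts: List[str], model: str = "vicuna-7b-v1.5") -> List[List[float]]:
    """Request one embedding per input string from the local FastChat server."""
    import openai  # requires openai>=1.0

    client = openai.OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1/")
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]
```

With the server running, `a, b = embed(["Hello world!", "Hi there!"])` followed by `cosine_similarity(a, b)` yields a score between -1 and 1.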

### Running multiple models

If you want to run multiple models on the same machine and in the same process,
you can replace the `model_worker` step above with a multi model variant:

```bash
python3 -m fastchat.serve.multi_model_worker \
    --model-path lmsys/vicuna-7b-v1.5 \
    --model-names vicuna-7b-v1.5 \
    --model-path lmsys/longchat-7b-16k \
    --model-names longchat-7b-16k
```

This loads both models onto the same accelerator and into the same process. This
works best when using a Peft model that triggers the `PeftModelAdapter`.

TODO: Base model weight optimization will be fixed once [this
Peft](https://github.com/huggingface/peft/issues/430) issue is resolved.

## LangChain Support
This OpenAI-compatible API server supports LangChain. See [LangChain Integration](langchain_integration.md) for details.

## Adjusting Environment Variables

### Timeout
By default, a timeout error occurs if a model worker does not respond within 100 seconds. If your model or hardware is slower, you can increase this timeout through an environment variable:

```bash
export FASTCHAT_WORKER_API_TIMEOUT=<larger timeout in seconds>
```

### Batch size
If you encounter an out-of-memory (OOM) error while creating embeddings, you can use a smaller batch size (the default is 4) by setting:

```bash
export FASTCHAT_WORKER_API_EMBEDDING_BATCH_SIZE=1
```
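To show what this variable controls, the sketch below reads it with the same name and default (4) as `fastchat/constants.py` and splits a list of embedding inputs into batches of that size. We assume the API server forwards embedding input to workers in consecutive batches like this; `embedding_batch_size` and `split_into_batches` are hypothetical helper names.

```python
import os
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def embedding_batch_size() -> int:
    """Read the batch size from the same variable and default (4) as fastchat/constants.py."""
    return int(os.getenv("FASTCHAT_WORKER_API_EMBEDDING_BATCH_SIZE", 4))


def split_into_batches(items: Sequence[T], batch_size: int) -> List[List[T]]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(items[i : i + batch_size]) for i in range(0, len(items), batch_size)]
```

With a batch size of 1, each input string becomes its own batch, which lowers the memory needed per worker call at the cost of more calls.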

## Todos
Some features to be implemented:

- [ ] Support more parameters like `logprobs`, `logit_bias`, `user`, `presence_penalty` and `frequency_penalty`
- [ ] Model details (permissions, owner and create time)
- [ ] Edits API
- [ ] Rate Limitation Settings


================================================
FILE: docs/server_arch.md
================================================
# FastChat Server Architecture
![server arch](../assets/server_arch.png)


================================================
FILE: docs/third_party_ui.md
================================================
# Third Party UI
To use FastChat with your own UI or a third-party UI, launch the [OpenAI-compatible server](openai_api.md), expose it with a tunnelling service such as Tunnelmole or ngrok, and then enter the credentials in the UI.

You can find suitable UIs from third party repos:
- [WongSaang's ChatGPT UI](https://github.com/WongSaang/chatgpt-ui)
- [McKayWrigley's Chatbot UI](https://github.com/mckaywrigley/chatbot-ui)

- Please note that some third-party providers only offer the standard `gpt-3.5-turbo`, `gpt-4`, etc., so you will have to add your own custom model inside the code. [Here is an example of how to create a UI with any custom model name](https://github.com/ztjhz/BetterChatGPT/pull/461).

##### Using Tunnelmole
Tunnelmole is an open source tunnelling tool. You can find its source code on [GitHub](https://github.com/robbie-cahill/tunnelmole-client). Here's how you can use Tunnelmole:
1. Install Tunnelmole with `curl -O https://install.tunnelmole.com/9Wtxu/install && sudo bash install`. (On Windows, download [tmole.exe](https://tunnelmole.com/downloads/tmole.exe)). Head over to the [README](https://github.com/robbie-cahill/tunnelmole-client) for other methods such as `npm` or building from source.
2. Run `tmole 7860` (replace `7860` with your listening port if it is different from 7860). The output will display two URLs: one HTTP and one HTTPS. It's best to use the HTTPS URL for better privacy and security.
```
➜  ~ tmole 7860
http://bvdo5f-ip-49-183-170-144.tunnelmole.net is forwarding to localhost:7860
https://bvdo5f-ip-49-183-170-144.tunnelmole.net is forwarding to localhost:7860
```

##### Using ngrok
ngrok is a popular closed source tunnelling tool. First download and install it from [ngrok.com](https://ngrok.com/downloads). Here's how to use it to expose port 7860.
```
ngrok http 7860
```


================================================
FILE: docs/training.md
================================================
### Fine-tuning FastChat-T5
You can use the following command to train FastChat-T5 with 4 x A100 (40GB).
```bash
torchrun --nproc_per_node=4 --master_port=9778 fastchat/train/train_flant5.py \
    --model_name_or_path google/flan-t5-xl \
    --data_path ./data/dummy_conversation.json \
    --bf16 True \
    --output_dir ./checkpoints_flant5_3b \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 300 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap T5Block \
    --tf32 True \
    --model_max_length 2048 \
    --preprocessed_path ./preprocessed_data/processed.json \
    --gradient_checkpointing True 
```

After training, please use our post-processing [function](https://github.com/lm-sys/FastChat/blob/55051ad0f23fef5eeecbda14a2e3e128ffcb2a98/fastchat/utils.py#L166-L185) to update the saved model weights. Additional discussions can be found [here](https://github.com/lm-sys/FastChat/issues/643).

### Fine-tuning using (Q)LoRA
You can use the following command to train Vicuna-7B with QLoRA and ZeRO2. Note that ZeRO3 is not currently supported with QLoRA, but it does support LoRA; a reference configuration is provided at `playground/deepspeed_config_s3.json`. To use QLoRA, you must have `bitsandbytes>=0.39.0` and `transformers>=4.30.0` installed.
```bash
deepspeed fastchat/train/train_lora.py \
    --model_name_or_path ~/model_weights/llama-7b \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --data_path ./data/dummy_conversation.json \
    --bf16 True \
    --output_dir ./checkpoints \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1200 \
    --save_total_limit 100 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --q_lora True \
    --deepspeed playground/deepspeed_config_s2.json
```

For T5-XL or T5-XXL:

```bash
deepspeed fastchat/train/train_lora_t5.py \
    --model_name_or_path google/flan-t5-xl \
    --data_path ./data/dummy_conversation.json \
    --bf16 True \
    --output_dir ./checkpoints_flant5_3b \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 300 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --model_max_length 2048 \
    --preprocessed_path ./preprocessed_data/processed.json \
    --gradient_checkpointing True \
    --q_lora True \
    --deepspeed playground/deepspeed_config_s2.json
```

### Fine-tuning Vicuna-7B with Local NPUs

You can use the following command to train Vicuna-7B with 8 x NPUs. Use `--nproc_per_node` to specify the number of NPUs.
```bash
torchrun --nproc_per_node=8 --master_port=20001 fastchat/train/train.py \
    --model_name_or_path ~/vicuna-7b-v1.5-16k  \
    --data_path data/dummy_conversation.json \
    --fp16 True \
    --output_dir output_vicuna \
    --num_train_epochs 3 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1200 \
    --save_total_limit 10 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True
```


================================================
FILE: docs/vicuna_weights_version.md
================================================
## Vicuna Weights

| Weights version | Link | FastChat version compatibility | Base Model | Release Date | Fine-tuning Data |
| ---- | ---- | ---- | ---- | ---- | ---- |
| v1.5 | [7B](https://huggingface.co/lmsys/vicuna-7b-v1.5), [7B-16k](https://huggingface.co/lmsys/vicuna-7b-v1.5-16k), [13B](https://huggingface.co/lmsys/vicuna-13b-v1.5), [13B-16k](https://huggingface.co/lmsys/vicuna-13b-v1.5-16k) | `>=0.2.21` | Llama 2 | Aug. 1, 2023 | 370M tokens |
| v1.3 | [7B](https://huggingface.co/lmsys/vicuna-7b-v1.3), [13B](https://huggingface.co/lmsys/vicuna-13b-v1.3), [33B](https://huggingface.co/lmsys/vicuna-33b-v1.3) | `>=0.2.1` | Llama 1 | Jun. 22, 2023 | 370M tokens |
| v1.1 | [7B](https://huggingface.co/lmsys/vicuna-7b-v1.1), [13B](https://huggingface.co/lmsys/vicuna-13b-v1.1) | `>=0.2.1` | Llama 1 | Apr. 12, 2023 | - |
| v0 | [7B-delta](https://huggingface.co/lmsys/vicuna-7b-delta-v0), [13B-delta](https://huggingface.co/lmsys/vicuna-13b-delta-v0) | `<=0.1.10` | Llama 1 | Mar. 30, 2023 | - |

### Updates
- Major updates of weights v1.5
  - Use Llama2 as the base model.
  - Provide 16K context length versions using linear RoPE scaling.

- Major updates of weights v1.3
  - Train with twice the amount of ShareGPT data compared to previous versions.
  - Provide merged weights directly instead of delta weights.

- Major updates of weights v1.1
  - Refactor the tokenization and separator. In Vicuna v1.1, the separator has been changed from `###` to the EOS token `</s>`. This change makes it easier to determine the generation stop criteria and enables better compatibility with other libraries.
  - Fix the supervised fine-tuning loss computation for better model quality.

## Prompt Template

### Example prompt (weights v1.1, v1.3, v1.5)
```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

USER: Hello!
ASSISTANT: Hello!</s>
USER: How are you?
ASSISTANT: I am good.</s>
```

See a full prompt template [here](https://github.com/lm-sys/FastChat/blob/d578599c69d060e6d40943f1b5b72af98956092a/fastchat/conversation.py#L286-L299) and example output [here](https://github.com/lm-sys/FastChat/blob/d578599c69d060e6d40943f1b5b72af98956092a/fastchat/conversation.py#L748-L753).
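The example above is laid out on separate lines for readability. As we read the linked template, the v1.1 prompt is actually built with the `ADD_COLON_TWO` separator style, with `sep=" "` after user turns and `sep2="</s>"` after assistant turns. The sketch below re-implements only that branch of `Conversation.get_prompt` from `fastchat/conversation.py` as a standalone function; `build_vicuna_prompt` is a hypothetical name, and the separator values are our assumption about the `vicuna_v1.1` template.

```python
from typing import List, Optional, Tuple

SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_vicuna_prompt(
    messages: List[Tuple[str, Optional[str]]],
    system: str = SYSTEM,
    sep: str = " ",
    sep2: str = "</s>",
) -> str:
    """Mirror the ADD_COLON_TWO branch: alternate sep/sep2 after each filled turn."""
    seps = [sep, sep2]
    ret = system + seps[0]
    for i, (role, message) in enumerate(messages):
        if message:
            ret += role + ": " + message + seps[i % 2]
        else:
            # An empty final turn leaves "ASSISTANT:" open for the model to complete.
            ret += role + ":"
    return ret
```

In real code, prefer the maintained template, for example `fastchat.conversation.get_conv_template("vicuna_v1.1")` followed by `append_message` and `get_prompt()`, as the docstring of `fastchat/conversation.py` asks.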

### Example prompt (weights v0)
```
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.

### Human: Hello!
### Assistant: Hello!
### Human: How are you?
### Assistant: I am good.
```

See the full prompt template [here](https://github.com/lm-sys/FastChat/blob/d578599c69d060e6d40943f1b5b72af98956092a/fastchat/conversation.py#L238-L269).

## How to Apply Delta Weights (Only Needed for Weights v0)

We release [Vicuna](https://lmsys.org/blog/2023-03-30-vicuna/) weights v0 as delta weights to comply with the LLaMA model license.
You can add our delta to the original LLaMA weights to obtain the Vicuna weights. Instructions:

1. Get the original LLaMA weights in the Hugging Face format by following the instructions [here](https://huggingface.co/docs/transformers/main/model_doc/llama).
2. Use the following scripts to get Vicuna weights by applying our delta. They will automatically download delta weights from our Hugging Face [account](https://huggingface.co/lmsys).

**NOTE**:
Weights v1.1 are only compatible with `transformers>=4.28.0` and `fschat>=0.2.0`.
Please update your local packages accordingly. If you follow the above commands to do a fresh install, then you should get all the correct versions.

#### Vicuna-7B
This conversion command needs around 30 GB of CPU RAM.
See the "Low CPU Memory Conversion" section below if you do not have enough memory.
Replace `/path/to/*` with the real paths.
```bash
python3 -m fastchat.model.apply_delta \
    --base-model-path /path/to/llama-7b \
    --target-model-path /path/to/output/vicuna-7b \
    --delta-path lmsys/vicuna-7b-delta-v1.1
```

#### Vicuna-13B
This conversion command needs around 60 GB of CPU RAM.
See the "Low CPU Memory Conversion" section below if you do not have enough memory.
Replace `/path/to/*` with the real paths.
```bash
python3 -m fastchat.model.apply_delta \
    --base-model-path /path/to/llama-13b \
    --target-model-path /path/to/output/vicuna-13b \
    --delta-path lmsys/vicuna-13b-delta-v1.1
```

#### Low CPU Memory Conversion
You can try these methods to reduce the CPU RAM requirement of weight conversion.
1. Append `--low-cpu-mem` to the commands above, which will split large weight files into smaller ones and use the disk as temporary storage. This can keep the peak memory at less than 16GB.
2. Create a large swap file and rely on the operating system to automatically utilize the disk as virtual memory.

## FAQ

### Tokenizer issues
There are some frequently asked tokenizer issues (https://github.com/lm-sys/FastChat/issues/408).
Some of them are not only related to FastChat or Vicuna weights but are also related to how you convert the base llama model.

We suggest that you use `transformers>=4.28.0` and redo the weight conversion for the base llama model.
After applying the delta, you should have a file named `special_tokens_map.json` in your converted weight folder for either v0 or v1.1.
The contents of this file should be the same as this file: https://huggingface.co/lmsys/vicuna-13b-delta-v0/blob/main/special_tokens_map.json.
If the file is not present, please copy the `special_tokens_map.json` and `tokenizer_config.json` files from https://huggingface.co/lmsys/vicuna-13b-delta-v0/tree/main to your converted weight folder. This works for both v0 and v1.1.


================================================
FILE: docs/vllm_integration.md
================================================
# vLLM Integration
You can use [vLLM](https://vllm.ai/) as an optimized worker implementation in FastChat.
It offers advanced continuous batching and a much higher (~10x) throughput.
See the supported models [here](https://vllm.readthedocs.io/en/latest/models/supported_models.html).

## Instructions
1. Install vLLM.
    ```
    pip install vllm
    ```

2. When you launch a model worker, replace the normal worker (`fastchat.serve.model_worker`) with the vLLM worker (`fastchat.serve.vllm_worker`). All other commands such as controller, gradio web server, and OpenAI API server are kept the same.
   ```
   python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-7b-v1.5
   ```

   If you see tokenizer errors, try
   ```
   python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-7b-v1.5 --tokenizer hf-internal-testing/llama-tokenizer
   ```

   If you use an AWQ quantized model, try
   ```
   python3 -m fastchat.serve.vllm_worker --model-path TheBloke/vicuna-7B-v1.5-AWQ --quantization awq
   ```


================================================
FILE: docs/xFasterTransformer.md
================================================
# xFasterTransformer Inference Framework

FastChat integrates the [xFasterTransformer](https://github.com/intel/xFasterTransformer) framework to provide **faster** inference on Intel CPUs.

## Install xFasterTransformer

Set up the environment (please refer to [this link](https://github.com/intel/xFasterTransformer#installation) for more details):

```bash
pip install xfastertransformer
```

## Prepare models

Prepare Model (please refer to [this link](https://github.com/intel/xFasterTransformer#prepare-model) for more details):
```bash
python ./tools/chatglm_convert.py -i ${HF_DATASET_DIR} -o  ${OUTPUT_DIR}
```

## Parameters of xFasterTransformer
- `--enable-xft`: enable xFasterTransformer in FastChat.
- `--xft-max-seq-len`: the maximum number of tokens the model can process, including the input tokens.
- `--xft-dtype`: the data type xFasterTransformer uses for computation. xFasterTransformer supports fp32, fp16, int8, bf16, and hybrid data types such as bf16_fp16 and bf16_int8. For data type details, please refer to [this link](https://github.com/intel/xFasterTransformer/wiki/Data-Type-Support-Platform).
    

Chat with the CLI:
```bash
#run inference on all CPUs and using float16
python3 -m fastchat.serve.cli \
    --model-path /path/to/models \
    --enable-xft \
    --xft-dtype fp16
```
or with numactl on multi-socket server for better performance
```bash
#run inference on numanode 0 and with data type bf16_fp16 (first token uses bfloat16, and rest tokens use float16)
numactl -N 0  --localalloc \
python3 -m fastchat.serve.cli \
    --model-path /path/to/models/chatglm2_6b_cpu/ \
    --enable-xft \
    --xft-dtype bf16_fp16
```
or using MPI to run inference on 2 sockets for better performance
```bash
#run inference on numanode 0 and 1 and with data type bf16_fp16 (first token uses bfloat16, and rest tokens use float16)
OMP_NUM_THREADS=$CORE_NUM_PER_SOCKET LD_PRELOAD=libiomp5.so mpirun \
-n 1 numactl -N 0  --localalloc \
python -m fastchat.serve.cli \
    --model-path /path/to/models/chatglm2_6b_cpu/ \
    --enable-xft \
    --xft-dtype bf16_fp16 : \
-n 1 numactl -N 1  --localalloc \
python -m fastchat.serve.cli \
    --model-path /path/to/models/chatglm2_6b_cpu/ \
    --enable-xft \
    --xft-dtype bf16_fp16
```


Start model worker:
```bash
# Load model with default configuration (max sequence length 4096, no GPU split setting).
python3 -m fastchat.serve.model_worker \
    --model-path /path/to/models \
    --enable-xft \
    --xft-dtype bf16_fp16 
```
or with numactl on multi-socket server for better performance
```bash
#run inference on numanode 0 and with data type bf16_fp16 (first token uses bfloat16, and rest tokens use float16)
numactl -N 0  --localalloc python3 -m fastchat.serve.model_worker \
    --model-path /path/to/models \
    --enable-xft \
    --xft-dtype bf16_fp16 
```
or using MPI to run inference on 2 sockets for better performance
```bash
#run inference on numanode 0 and 1 and with data type bf16_fp16 (first token uses bfloat16, and rest tokens use float16)
OMP_NUM_THREADS=$CORE_NUM_PER_SOCKET LD_PRELOAD=libiomp5.so mpirun \
-n 1 numactl -N 0  --localalloc  python -m fastchat.serve.model_worker \
    --model-path /path/to/models \
    --enable-xft \
    --xft-dtype bf16_fp16 : \
-n 1 numactl -N 1  --localalloc  python -m fastchat.serve.model_worker \
    --model-path /path/to/models \
    --enable-xft \
    --xft-dtype bf16_fp16 
```

For more details, please refer to [this link](https://github.com/intel/xFasterTransformer#how-to-run) 


================================================
FILE: fastchat/__init__.py
================================================
__version__ = "0.2.36"


================================================
FILE: fastchat/constants.py
================================================
"""
Global constants.
"""

from enum import IntEnum
import os

REPO_PATH = os.path.dirname(os.path.dirname(__file__))

# Survey Link URL (to be removed) #00729c
# SURVEY_LINK = """<div style='text-align: left; margin: 20px 0;'>
#     <div style='display: inline-block; border: 2px solid #00729c; padding: 20px; padding-bottom: 10px; padding-top: 10px; border-radius: 5px;'>
#         <span style='color: #00729c; font-weight: bold;'>New Launch! Copilot Arena: <a href='https://marketplace.visualstudio.com/items?itemName=copilot-arena.copilot-arena' style='color: #00729c; text-decoration: underline;'>VS Code Extension</a> to compare Top LLMs</span>
#     </div>
# </div>"""
# SURVEY_LINK = ""

COLOR = "#F11414"
SURVEY_LINK = f"""<div style='text-align: center; margin: 20px 0;'>
    <div style='display: block; width: 100%; border: 2px solid {COLOR}; padding: 20px; padding-bottom: 10px; padding-top: 10px; border-radius: 5px; background-color: #FE9393'>
        <span style='font-weight: bold; font-size: 20px; color: #050505; '>🔔 New Arena UI at <a href='https://lmarena.ai/leaderboard?utm_campaign=hf_banner' target="_blank" rel="noopener noreferrer" style="color: #233F9C; text-decoration: underline;">lmarena.ai/leaderboard</a>! Check it out and give feedback!</span>
    </div>
</div>"""

##### For the gradio web server
SERVER_ERROR_MSG = (
    "**NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.**"
)
TEXT_MODERATION_MSG = (
    "$MODERATION$ YOUR TEXT VIOLATES OUR CONTENT MODERATION GUIDELINES."
)
IMAGE_MODERATION_MSG = (
    "$MODERATION$ YOUR IMAGE VIOLATES OUR CONTENT MODERATION GUIDELINES."
)
MODERATION_MSG = "$MODERATION$ YOUR INPUT VIOLATES OUR CONTENT MODERATION GUIDELINES."
CONVERSATION_LIMIT_MSG = "YOU HAVE REACHED THE CONVERSATION LENGTH LIMIT. PLEASE CLEAR HISTORY AND START A NEW CONVERSATION."
INACTIVE_MSG = "THIS SESSION HAS BEEN INACTIVE FOR TOO LONG. PLEASE REFRESH THIS PAGE."
SLOW_MODEL_MSG = (
    "⚠️  Models are thinking. Please stay patient as it may take over a minute."
)
RATE_LIMIT_MSG = "**RATE LIMIT OF THIS MODEL IS REACHED. PLEASE COME BACK LATER OR USE <span style='color: red; font-weight: bold;'>[BATTLE MODE](https://lmarena.ai)</span> (the 1st tab).**"
# Maximum input length
INPUT_CHAR_LEN_LIMIT = int(os.getenv("FASTCHAT_INPUT_CHAR_LEN_LIMIT", 12000))
BLIND_MODE_INPUT_CHAR_LEN_LIMIT = int(
    os.getenv("FASTCHAT_BLIND_MODE_INPUT_CHAR_LEN_LIMIT", 30000)
)
# Maximum conversation turns
CONVERSATION_TURN_LIMIT = 50
# Session expiration time
SESSION_EXPIRATION_TIME = 3600
# The output dir of log files
LOGDIR = os.getenv("LOGDIR", ".")
# CPU Instruction Set Architecture
CPU_ISA = os.getenv("CPU_ISA")


##### For the controller and workers (could be overwritten through ENV variables.)
CONTROLLER_HEART_BEAT_EXPIRATION = int(
    os.getenv("FASTCHAT_CONTROLLER_HEART_BEAT_EXPIRATION", 90)
)
WORKER_HEART_BEAT_INTERVAL = int(os.getenv("FASTCHAT_WORKER_HEART_BEAT_INTERVAL", 45))
WORKER_API_TIMEOUT = int(os.getenv("FASTCHAT_WORKER_API_TIMEOUT", 100))
WORKER_API_EMBEDDING_BATCH_SIZE = int(
    os.getenv("FASTCHAT_WORKER_API_EMBEDDING_BATCH_SIZE", 4)
)


class ErrorCode(IntEnum):
    """
    https://platform.openai.com/docs/guides/error-codes/api-errors
    """

    VALIDATION_TYPE_ERROR = 40001

    INVALID_AUTH_KEY = 40101
    INCORRECT_AUTH_KEY = 40102
    NO_PERMISSION = 40103

    INVALID_MODEL = 40301
    PARAM_OUT_OF_RANGE = 40302
    CONTEXT_OVERFLOW = 40303

    RATE_LIMIT = 42901
    QUOTA_EXCEEDED = 42902
    ENGINE_OVERLOADED = 42903

    INTERNAL_ERROR = 50001
    CUDA_OUT_OF_MEMORY = 50002
    GRADIO_REQUEST_ERROR = 50003
    GRADIO_STREAM_UNKNOWN_ERROR = 50004
    CONTROLLER_NO_WORKER = 50005
    CONTROLLER_WORKER_TIMEOUT = 50006


================================================
FILE: fastchat/conversation.py
================================================
"""
Conversation prompt templates.

We kindly request that you import fastchat instead of copying this file if you wish to use it.
If you have any changes in mind, please contribute back so the community can benefit collectively and continue to maintain these valuable templates.
"""

import base64
import dataclasses
from enum import auto, IntEnum
from io import BytesIO
import os
from typing import List, Any, Dict, Union, Tuple


class SeparatorStyle(IntEnum):
    """Separator styles."""

    ADD_COLON_SINGLE = auto()
    ADD_COLON_TWO = auto()
    ADD_COLON_SPACE_SINGLE = auto()
    NO_COLON_SINGLE = auto()
    NO_COLON_TWO = auto()
    ADD_NEW_LINE_SINGLE = auto()
    LLAMA2 = auto()
    LLAMA3 = auto()
    CHATGLM = auto()
    CHATML = auto()
    CHATINTERN = auto()
    DOLLY = auto()
    RWKV = auto()
    PHOENIX = auto()
    ROBIN = auto()
    FALCON_CHAT = auto()
    CHATGLM3 = auto()
    DEEPSEEK_CHAT = auto()
    METAMATH = auto()
    YUAN2 = auto()
    GEMMA = auto()
    CLLM = auto()
    DEFAULT = auto()


IMAGE_PLACEHOLDER_STR = "$$<image>$$"


@dataclasses.dataclass
class Conversation:
    """A class that manages prompt templates and keeps all conversation history."""

    # The name of this template
    name: str
    # The template of the system prompt
    system_template: str = "{system_message}"
    # The system message
    system_message: str = ""
    system_message_vision: str = ""
    # The names of two roles
    roles: Tuple[str] = ("USER", "ASSISTANT")
    # All messages. Each item is (role, message).
    # Each message is either a string or a tuple of (string, List[image_url]).
    messages: List[List[str]] = ()
    # The number of few shot examples
    offset: int = 0
    # The separator style and configurations
    sep_style: SeparatorStyle = SeparatorStyle.ADD_COLON_SINGLE
    sep: str = "\n"
    sep2: str = None
    # Stop criteria (the default one is EOS token)
    stop_str: Union[str, List[str]] = None
    # Stops generation if meeting any token in this list
    stop_token_ids: List[int] = None
    # The maximum image size in megabytes that this model takes in. None means we do not resize the image.
    max_image_size_mb: int = None

    def get_prompt(self) -> str:
        """Get the prompt for generation."""
        system_prompt = self.system_template.format(system_message=self.system_message)
        if self.sep_style == SeparatorStyle.ADD_COLON_SINGLE:
            ret = system_prompt + self.sep
            for role, message in self.messages:
                if message:
                    if type(message) is tuple:
                        message, images = message
                        message = IMAGE_PLACEHOLDER_STR * len(images) + message
                    ret += role + ": " + message + self.sep
                else:
                    ret += role + ":"
            return ret
        elif self.sep_style == SeparatorStyle.ADD_COLON_TWO:
            seps = [self.sep, self.sep2]
            ret = system_prompt + seps[0]
            for i, (role, message) in enumerate(self.messages):
                if message:
                    if type(message) is tuple:
                        message, images = message
                        message = IMAGE_PLACEHOLDER_STR * len(images) + message
                    ret += role + ": " + message + seps[i % 2]
                else:
                    ret += role + ":"
            return ret
        elif self.sep_style == SeparatorStyle.ADD_COLON_SPACE_SINGLE:
            ret = system_prompt + self.sep
            for role, message in self.messages:
                if message:
                    ret += role + ": " + message + self.sep
                else:
                    ret += role + ": "  # must end with a space
            return ret
        elif self.sep_style == SeparatorStyle.ADD_NEW_LINE_SINGLE:
            ret = "" if system_prompt == "" else system_prompt + self.sep
            for role, message in self.messages:
                if message:
                    ret += role + "\n" + message + self.sep
                else:
                    ret += role + "\n"
            return ret
        elif self.sep_style == SeparatorStyle.NO_COLON_SINGLE:
            ret = system_prompt
            for role, message in self.messages:
                if message:
                    ret += role + message + self.sep
                else:
                    ret += role
            return ret
        elif self.sep_style == SeparatorStyle.NO_COLON_TWO:
            seps = [self.sep, self.sep2]
            ret = system_prompt
            for i, (role, message) in enumerate(self.messages):
                if message:
                    ret += role + message + seps[i % 2]
                else:
                    ret += role
            return ret
        elif self.sep_style == SeparatorStyle.RWKV:
            ret = system_prompt
            for i, (role, message) in enumerate(self.messages):
                if message:
                    ret += (
                        role
                        + ": "
                        + message.replace("\r\n", "\n").replace("\n\n", "\n")
                    )
                    ret += "\n\n"
                else:
                    ret += role + ":"
            return ret
        elif self.sep_style == SeparatorStyle.LLAMA2:
            seps = [self.sep, self.sep2]
            if self.system_message:
                ret = system_prompt
            else:
                ret = "[INST] "
            for i, (role, message) in enumerate(self.messages):
                tag = self.roles[i % 2]
                if message:
                    if i == 0:
                        ret += message + " "
                    else:
                        ret += tag + " " + message + seps[i % 2]
                else:
                    ret += tag
            return ret
        elif self.sep_style == SeparatorStyle.LLAMA3:
            ret = "<|begin_of_text|>"
            if self.system_message:
                ret += system_prompt
            for i, (role, message) in enumerate(self.messages):
                if message:
                    ret += f"<|start_header_id|>{role}<|end_header_id|>\n\n"
                    ret += f"{message.strip()}<|eot_id|>"
                else:
                    ret += f"<|start_header_id|>{role}<|end_header_id|>\n\n"
            return ret
        elif self.sep_style == SeparatorStyle.CHATGLM:
            # source: https://huggingface.co/THUDM/chatglm-6b/blob/1d240ba371910e9282298d4592532d7f0f3e9f3e/modeling_chatglm.py#L1302-L1308
            # source2: https://huggingface.co/THUDM/chatglm2-6b/blob/e186c891cf64310ac66ef10a87e6635fa6c2a579/modeling_chatglm.py#L926
            round_add_n = 1 if self.name == "chatglm2" else 0
            if system_prompt:
                ret = system_prompt + self.sep
            else:
                ret = ""

            for i, (role, message) in enumerate(self.messages):
                if i % 2 == 0:
                    ret += f"[Round {i//2 + round_add_n}]{self.sep}"

                if message:
                    ret += f"{role}：{message}{self.sep}"
                else:
                    ret += f"{role}："
            return ret
        elif self.sep_style == SeparatorStyle.CHATML:
            ret = "" if system_prompt == "" else system_prompt + self.sep + "\n"
            for role, message in self.messages:
                if message:
                    if type(message) is tuple:
                        message, images = message
                        message = IMAGE_PLACEHOLDER_STR * len(images) + message
                    ret += role + "\n" + message + self.sep + "\n"
                else:
                    ret += role + "\n"
            return ret
        elif self.sep_style == SeparatorStyle.CHATGLM3:
            ret = ""
            if self.system_message:
                ret += system_prompt
            for role, message in self.messages:
                if message:
                    ret += role + "\n" + message
                else:
                    ret += role
            return ret
        elif self.sep_style == SeparatorStyle.CHATINTERN:
            # source: https://huggingface.co/internlm/internlm-chat-7b-8k/blob/bd546fa984b4b0b86958f56bf37f94aa75ab8831/modeling_internlm.py#L771
            seps = [self.sep, self.sep2]
            ret = system_prompt
            for i, (role, message) in enumerate(self.messages):
                if i % 2 == 0:
                    ret += "<s>"
                if message:
                    ret += role + ":" + message + seps[i % 2] + "\n"
                else:
                    ret += role + ":"
            return ret
        elif self.sep_style == SeparatorStyle.DOLLY:
            seps = [self.sep, self.sep2]
            ret = system_prompt
            for i, (role, message) in enumerate(self.messages):
                if message:
                    ret += role + ":\n" + message + seps[i % 2]
                    if i % 2 == 1:
                        ret += "\n\n"
                else:
                    ret += role + ":\n"
            return ret
        elif self.sep_style == SeparatorStyle.PHOENIX:
            ret = system_prompt
            for role, message in self.messages:
                if message:
                    ret += role + ": " + "<s>" + message + "</s>"
                else:
                    ret += role + ": " + "<s>"
            return ret
        elif self.sep_style == SeparatorStyle.ROBIN:
            ret = system_prompt + self.sep
            for role, message in self.messages:
                if message:
                    ret += role + ":\n" + message + self.sep
                else:
                    ret += role + ":\n"
            return ret
        elif self.sep_style == SeparatorStyle.FALCON_CHAT:
            ret = ""
            if self.system_message:
                ret += system_prompt + self.sep
            for role, message in self.messages:
                if message:
                    ret += role + ": " + message + self.sep
                else:
                    ret += role + ":"
            return ret
        elif self.sep_style == SeparatorStyle.METAMATH:
            ret = "" if system_prompt == "" else system_prompt + self.sep
            for i, (role, message) in enumerate(self.messages):
                # For MetaMath, sep2 is used to prefix the message.
                starting_sep = ":\n" if i % 2 == 0 else ": " + self.sep2
                ending_sep = self.sep if i % 2 == 0 else ""
                if message:
                    ret += role + starting_sep + message + ending_sep
                else:
                    ret += role + starting_sep
            return ret
        elif self.sep_style == SeparatorStyle.DEEPSEEK_CHAT:
            seps = [self.sep, self.sep2]
            ret = system_prompt
            for i, (role, message) in enumerate(self.messages):
                if message:
                    ret += role + ": " + message + seps[i % 2]
                else:
                    ret += role + ":"
            return ret
        elif self.sep_style == SeparatorStyle.YUAN2:
            seps = [self.sep, self.sep2]
            ret = ""
            if self.system_message:
                ret += system_prompt + seps[1]
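            for _, message in self.messages:
                if message:
                    ret += message + "<n>"
            # Remove the single trailing "<n>" separator. str.rstrip("<n>") strips any
            # trailing "<", "n" or ">" characters, which would also truncate a last
            # message ending in "n" (e.g. "Explain" -> "Explai").
            if ret.endswith("<n>"):
                ret = ret[: -len("<n>")]
            ret += seps[0]
            return ret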
        elif self.sep_style == SeparatorStyle.GEMMA:
            ret = "<bos>"
            for role, message in self.messages:
                if message:
                    ret += "<start_of_turn>" + role + "\n" + message + self.sep
                else:
                    ret += "<start_of_turn>" + role + "\n"
            return ret
        elif self.sep_style == SeparatorStyle.CLLM:
            seps = [self.sep, self.sep2]
            ret = system_prompt + seps[0]
            for i, (role, message) in enumerate(self.messages[-2:]):
                if message:
                    if type(message) is tuple:
                        message, images = message
                        message = IMAGE_PLACEHOLDER_STR * len(images) + message
                    ret += role + ": " + message + seps[i % 2]
                else:
                    ret += role + ":"
            return ret
        elif self.sep_style == SeparatorStyle.DEFAULT:
            ret = system_prompt + "\n"
            for role, message in self.messages:
                if message:
                    if type(message) is tuple:
                        message, images = message
                    ret += role + ": " + message + "\n"
                else:
                    ret += role + ":"
            return ret
        else:
            raise ValueError(f"Invalid style: {self.sep_style}")

    def get_images(self):
        """Return the base64 strings of images attached to user turns (after `offset`)."""
        images = []
        for i, (role, msg) in enumerate(self.messages[self.offset :]):
            if i % 2 == 0:
                if type(msg) is tuple:
                    for image in msg[1]:
                        images.append(image.base64_str)

        return images

    def set_system_message(self, system_message: str):
        """Set the system message."""
        self.system_message = system_message

    def get_system_message(self, is_vision=False):
        """Return the system message, or its vision variant when `is_vision` is set and one exists."""
        if is_vision and self.system_message_vision:
            return self.system_message_vision
        return self.system_message

    def append_message(self, role: str, message: str):
        """Append a new message."""
        self.messages.append([role, message])

    def update_last_message(self, message: str):
        """Update the last output.

        The last message is typically set to be None when constructing the prompt,
        so we need to update it in-place after getting the response from a model.
        """
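        # Typical usage (illustrative; `model_output` is a hypothetical string
        # returned by the model):
        #   conv.append_message(conv.roles[1], None)  # empty assistant slot
        #   prompt = conv.get_prompt()                 # ends with the open assistant turn
        #   conv.update_last_message(model_output)     # fill the slot in place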
        self.messages[-1][1] = message

    def to_gradio_chatbot(self):
        """Convert the conversation to gradio chatbot format."""
        from fastchat.serve.vision.image import ImageFormat

        ret = []
        for i, (role, msg) in enumerate(self.messages[self.offset :]):
            if i % 2 == 0:
                if type(msg) is tuple:
                    msg, images = msg
                    image = images[0]  # Only one image on gradio at one time
                    if image.image_format == ImageFormat.URL:
                        img_str = f'<img src="{image.url}" alt="user upload image" />'
                    elif image.image_format == ImageFormat.BYTES:
                        img_str = f'<img src="data:image/{image.filetype};base64,{image.base64_str}" alt="user upload image" />'
                    msg = img_str + msg.replace("<image>\n", "").strip()

                ret.append([msg, None])
            else:
                ret[-1][-1] = msg
        return ret

    def to_openai_vision_api_messages(self, is_mistral=False):
        """Convert the conversation to the OpenAI vision chat completion format."""
        if self.system_message == "":
            ret = []
        else:
            ret = [
                {
                    "role": "system",
                    "content": self.system_message,
                }
            ]

        for i, (_, msg) in enumerate(self.messages[self.offset :]):
            if i % 2 == 0:
                if type(msg) is tuple:
                    content_list = [{"type": "text", "text": msg[0]}]
                    image_urls = msg[1]
                    for image in image_urls:
                        image_url = image.to_openai_image_format()
                        if is_mistral:
                            content = {"type": "image_url", "image_url": image_url}
                        else:
                            content = {
                                "type": "image_url",
                                "image_url": {"url": image_url},
                            }
                        content_list.append(content)

                    ret.append({"role": "user", "content": content_list})
                else:
                    ret.append({"role": "user", "content": msg})
            else:
                if msg is not None:
                    ret.append(
                        {
                            "role": "assistant",
                            "content": msg,
                        }
                    )
        return ret

    def to_openai_api_messages(self):
        """Convert the conversation to OpenAI chat completion format."""
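        # Illustrative example: with system_message "S", offset 0 and messages
        # [["USER", "Hi"], ["ASSISTANT", "Hello"], ["USER", "Bye"], ["ASSISTANT", None]]
        # this returns
        #   [{"role": "system", "content": "S"},
        #    {"role": "user", "content": "Hi"},
        #    {"role": "assistant", "content": "Hello"},
        #    {"role": "user", "content": "Bye"}]
        # Roles are assigned by position (even index -> user), and the trailing
        # None assistant placeholder is dropped.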
        if self.system_message == "":
            ret = []
        else:
            ret = [{"role": "system", "content": self.system_message}]

        for i, (_, msg) in enumerate(self.messages[self.offset :]):
            if i % 2 == 0:
                ret.append({"role": "user", "content": msg})
            else:
                if msg is not None:
                    ret.append({"role": "assistant", "content": msg})
        return ret

    def to_gemini_api_messages(self):
        """Convert the conversation to Gemini API format, loading attached images as PIL images."""
        from fastchat.utils import load_image

        if self.system_message == "":
            ret = []
        else:
            ret = [{"role": "system", "content": self.system_message}]

        for i, (_, msg) in enumerate(self.messages[self.offset :]):
            if i % 2 == 0:
                if type(msg) is tuple:
                    text, images = msg[0], msg[1]
                    content_list = [text]
                    for image in images:
                        pil_image = load_image(image.base64_str)
                        content_list.append(pil_image)
                    ret.append({"role": "user", "content": content_list})
                else:
                    ret.append({"role": "user", "content": msg})
            else:
                if msg is not None:
                    ret.append({"role": "model", "content": msg})
        return ret

    def to_vertex_api_messages(self):
        """Convert the conversation to a flat list of Vertex AI content parts (text and images)."""
        from vertexai.preview.generative_models import Image
        import base64
        import requests
        from fastchat.serve.vision.image import ImageFormat

        if self.system_message == "":
            ret = []
        else:
            ret = [self.system_message]

        for role, msg in self.messages[self.offset :]:
            if msg is not None:
                if type(msg) is tuple:
                    text, images = msg[0], msg[1]
                    for image in images:
                        if image.image_format == ImageFormat.URL:
                            response = requests.get(image.url)
                            image = response.content
                        elif image.image_format == ImageFormat.BYTES:  # base64
                            image = base64.b64decode(image.base64_str)
                        ret.append(Image.from_bytes(image))
                    ret.append(text)
                else:
                    ret.append(msg)

        return ret

    def to_anthropic_vision_api_messages(self):
        """Convert the conversation to Claude-3 Messages Vision API format"""
        ret = [
            {
                "role": "system",
                "content": [{"type": "text", "text": self.system_message}],
            }
        ]
        for i, (_, msg) in enumerate(self.messages[self.offset :]):
            if i % 2 == 0:
                if type(msg) is tuple:
                    content_list = [{"type": "text", "text": msg[0]}]

                    for image in msg[1]:
                        content_list.append(
                            {
                                "type": "image",
                                "source": {
                                    "type": "base64",
                                    "media_type": f"image/{image.filetype}",
                                    "data": image.base64_str,
                                },
                            }
                        )

                    ret.append({"role": "user", "content": content_list})
                else:
                    ret.append(
                        {"role": "user", "content": [{"type": "text", "text": msg}]}
                    )
            else:
                if msg is not None:
                    ret.append(
                        {
                            "role": "assistant",
                            "content": [{"type": "text", "text": msg}],
                        }
                    )
        return ret

    def to_reka_api_messages(self):
        """Convert the conversation to a list of Reka `ChatMessage` objects."""
        from fastchat.serve.vision.image import ImageFormat
        from reka import ChatMessage, TypedMediaContent, TypedText

        ret = []
        for i, (_, msg) in enumerate(self.messages[self.offset :]):
            if i % 2 == 0:
                if type(msg) is tuple:
                    text, images = msg
                    for image in images:
                        if image.image_format == ImageFormat.BYTES:
                            ret.append(
                                ChatMessage(
                                    content=[
                                        TypedText(
                                            type="text",
                                            text=text,
                                        ),
                                        TypedMediaContent(
                                            type="image_url",
                                            image_url=f"data:image/{image.filetype};base64,{image.base64_str}",
                                        ),
                                    ],
                                    role="user",
                                )
                            )
                else:
                    ret.append(
                        ChatMessage(
                            content=[
                                TypedText(
                                    type="text",
                                    text=msg,
                                )
                            ],
                            role="user",
                        )
                    )
            else:
                if msg is not None:
                    ret.append(
                        ChatMessage(
                            content=[
                                TypedText(
                                    type="text",
                                    text=msg,
                                )
                            ],
                            role="assistant",
                        )
                    )

        return ret

    def to_metagen_api_messages(self):
        """Convert the conversation to MetaGen (Meta) chat completion format."""
        if self.system_message == "":
            ret = []
        else:
            ret = [{"role": "system", "text": self.system_message}]

        for i, (_, msg) in enumerate(self.messages[self.offset :]):
            if i % 2 == 0:
                if type(msg) is tuple:
                    text, images = msg[0], msg[1]
                    # Only one image is currently supported; the last one is sent.
                    attachment = {
                        "type": "base64_image",
                        "mime": "image/jpeg",
                        "data": images[-1].base64_str,
                    }
                    ret.append({"role": "user", "text": text, "attachment": attachment})
                else:
                    ret.append({"role": "user", "text": msg})
            else:
                if msg is not None:
                    ret.append({"role": "ai", "text": msg})
        return ret

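    def save_new_images(self, has_csam_images=False, use_remote_storage=False):
        """Save images from the latest user message, named by the MD5 hash of their pixels.

        Images are uploaded to GCS when `use_remote_storage` is set; otherwise (and
        always for CSAM-flagged images) they are written under LOGDIR.
        """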
        import hashlib
        from fastchat.constants import LOGDIR
        from fastchat.utils import load_image, upload_image_file_to_gcs
        from PIL import Image

        _, last_user_message = self.messages[-2]

        if type(last_user_message) is tuple:
            text, images = last_user_message[0], last_user_message[1]

            image_directory_name = "csam_images" if has_csam_images else "serve_images"
            for image in images:
                loaded_image = load_image(image.base64_str)
                hash_str = hashlib.md5(loaded_image.tobytes()).hexdigest()
                filename = os.path.join(
                    image_directory_name,
                    f"{hash_str}.{image.filetype}",
                )

                if use_remote_storage and not has_csam_images:
                    image_url = upload_image_file_to_gcs(loaded_image, filename)
                    # NOTE(chris): If the URL were public, we could store it here so
                    # that later model calls use the link directly:
                    # images[i] = image_url
                else:
                    filename = os.path.join(LOGDIR, filename)
                    if not os.path.isfile(filename):
                        os.makedirs(os.path.dirname(filename), exist_ok=True)
                        loaded_image.save(filename)

    def extract_text_and_image_hashes_from_messages(self):
        """Return the messages with base64 images replaced by MD5 hashes of their pixel data."""
        import hashlib
        from fastchat.utils import load_image
        from fastchat.serve.vision.image import ImageFormat

        messages = []

        for role, message in self.messages:
            if type(message) is tuple:
                text, images = message[0], message[1]

                image_hashes = []
                for image in images:
                    if image.image_format == ImageFormat.URL:
                        # Store the URL string so the result stays JSON-serializable.
                        image_hashes.append(image.url)
                    elif image.image_format == ImageFormat.BYTES:
                        image = load_image(image.base64_str)
                        image_hash = hashlib.md5(image.tobytes()).hexdigest()
                        image_hashes.append(image_hash)

                messages.append((role, (text, image_hashes)))
            else:
                messages.append((role, message))

        return messages

    def copy(self):
        """Return a copy of this conversation with its own message list."""
        return Conversation(
            name=self.name,
            system_template=self.system_template,
            system_message=self.system_message,
            system_message_vision=self.system_message_vision,
            roles=self.roles,
            messages=[[x, y] for x, y in self.messages],
            offset=self.offset,
            sep_style=self.sep_style,
            sep=self.sep,
            sep2=self.sep2,
            stop_str=self.stop_str,
            stop_token_ids=self.stop_token_ids,
            max_image_size_mb=self.max_image_size_mb,
        )

    def dict(self):
        """Return a dictionary summary of the conversation, with base64 images replaced by hashes."""
        return {
            "template_name": self.name,
            "system_message": self.system_message,
            "roles": self.roles,
            "messages": self.extract_text_and_image_hashes_from_messages(),
            "offset": self.offset,
        }


# A global registry for all conversation templates
conv_templates: Dict[str, Conversation] = {}


def register_conv_template(template: Conversation, override: bool = False):
    """Register a new conversation template."""
    if not override:
        assert (
            template.name not in conv_templates
        ), f"{template.name} has been registered."

    conv_templates[template.name] = template


def get_conv_template(name: str) -> Conversation:
    """Get a conversation template."""
    return conv_templates[name].copy()
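
# Usage sketch (illustrative; "my_template" is a hypothetical name): templates are
# looked up by name and returned as copies, so appending messages never mutates
# the registered template.
#   register_conv_template(Conversation(name="my_template", roles=("USER", "ASSISTANT"),
#                                       sep_style=SeparatorStyle.ADD_COLON_SINGLE, sep="\n"))
#   conv = get_conv_template("my_template")
#   conv.append_message(conv.roles[0], "Hi")
#   conv.append_message(conv.roles[1], None)
#   conv.get_prompt()  # -> "\nUSER: Hi\nASSISTANT:" (empty system message)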


# An empty template for raw conversation.
register_conv_template(
    Conversation(
        name="raw",
        system_message="",
        roles=("", ""),
        sep_style=SeparatorStyle.NO_COLON_SINGLE,
        sep="",
    )
)
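# Illustrative rendering (assuming the default "{system_message}" system_template):
# with a user turn "Hi" and an empty assistant slot, get_prompt() returns just "Hi".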

# A template with a one-shot conversation example
register_conv_template(
    Conversation(
        name="one_shot",
        system_message="A chat between a curious human and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the human's questions.",
        roles=("Human", "Assistant"),
        messages=(
            (
                "Human",
                "Got any creative ideas for a 10 year old’s birthday?",
            ),
            (
                "Assistant",
                """Of course! Here are some creative ideas for a 10-year-old's birthday party:
1. Treasure Hunt: Organize a treasure hunt in your backyard or nearby park. Create clues and riddles for the kids to solve, leading them to hidden treasures and surprises.
2. Science Party: Plan a science-themed party where kids can engage in fun and interactive experiments. You can set up different stations with activities like making slime, erupting volcanoes, or creating simple chemical reactions.
3. Outdoor Movie Night: Set up a backyard movie night with a projector and a large screen or white sheet. Create a cozy seating area with blankets and pillows, and serve popcorn and snacks while the kids enjoy a favorite movie under the stars.
4. DIY Crafts Party: Arrange a craft party where kids can unleash their creativity. Provide a variety of craft supplies like beads, paints, and fabrics, and let them create their own unique masterpieces to take home as party favors.
5. Sports Olympics: Host a mini Olympics event with various sports and games. Set up different stations for activities like sack races, relay races, basketball shooting, and obstacle courses. Give out medals or certificates to the participants.
6. Cooking Party: Have a cooking-themed party where the kids can prepare their own mini pizzas, cupcakes, or cookies. Provide toppings, frosting, and decorating supplies, and let them get hands-on in the kitchen.
7. Superhero Training Camp: Create a superhero-themed party where the kids can engage in fun training activities. Set up an obstacle course, have them design their own superhero capes or masks, and organize superhero-themed games and challenges.
8. Outdoor Adventure: Plan an outdoor adventure party at a local park or nature reserve. Arrange activities like hiking, nature scavenger hunts, or a picnic with games. Encourage exploration and appreciation for the outdoors.
Remember to tailor the activities to the birthday child's interests and preferences. Have a great celebration!""",
            ),
        ),
        offset=2,
        sep_style=SeparatorStyle.ADD_COLON_SINGLE,
        sep="\n### ",
        stop_str="###",
    )
)
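# Note on offset=2 (illustrative): get_prompt() renders every message, so the
# example exchange above is part of the model prompt, while to_gradio_chatbot()
# and the to_*_messages() converters start at messages[2:] and skip it. After
# appending ("Human", "Hi") and ("Assistant", None), get_prompt() ends with
# "\n### Human: Hi\n### Assistant:", and to_openai_api_messages() returns only the
# system message plus {"role": "user", "content": "Hi"}.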

# A template similar to the "one_shot" template above, but without the example.
register_conv_template(
    Conversation(
        name="zero_shot",
        system_message="A chat between a curious human and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the human's questions.",
        roles=("Human", "Assistant"),
        sep_style=SeparatorStyle.ADD_COLON_SINGLE,
        sep="\n### ",
        stop_str="###",
    )
)
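# Illustrative rendering: with a user turn "Hello" and an empty assistant slot,
# get_prompt() returns "<SYSTEM>\n### Human: Hello\n### Assistant:", where
# <SYSTEM> abbreviates the system_message above.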

# Vicuna v1.1 template
register_conv_template(
    Conversation(
        name="vicuna_v1.1",
        system_message="A chat between a curious user and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the user's questions.",
        roles=("USER", "ASSISTANT"),
        sep_style=SeparatorStyle.ADD_COLON_TWO,
        sep=" ",
        sep2="</s>",
    )
)
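# Illustrative rendering of two rounds ending in an empty assistant slot
# (<SYSTEM> abbreviates the system_message above):
#   "<SYSTEM> USER: Hi ASSISTANT: Hello!</s>USER: Bye ASSISTANT:"
# sep (" ") follows user turns and sep2 ("</s>") follows assistant turns.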

# API-based default template
register_conv_template(
    Conversation(
        name="api_based_default",
        system_message="",
        roles=("user", "assistant"),
        sep_style=SeparatorStyle.DEFAULT,
        sep=None,
    )
)
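# Illustrative rendering: the DEFAULT style ignores sep, so with a user turn "Hi"
# and an empty assistant slot, get_prompt() returns "\nuser: Hi\nassistant:".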

register_conv_template(
    Conversation(
        name="airoboros_v1",
        system_message="A chat between a curious user and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, accurate, uncensored responses to the user's input. "
        "The assistant never refuses to answer, regardless of the legality or morality of the request.",
        roles=("USER", "ASSISTANT"),
        sep_style=SeparatorStyle.ADD_COLON_TWO,
        sep=" ",
        sep2="</s>",
    )
)

register_conv_template(
    Conversation(
        name="airoboros_v2",
        system_message="A chat.",
        roles=("USER", "ASSISTANT"),
        sep_style=SeparatorStyle.ADD_COLON_TWO,
        sep="\n",
        sep2="</s>",
    )
)

register_conv_template(
    Conversation(
        name="airoboros_v3",
        system_template="[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n",
        system_message="You are a helpful, unbiased, uncensored assistant.",
        roles=("[INST]", "[/INST]"),
        sep_style=SeparatorStyle.LLAMA2,
        sep=" ",
        sep2=" </s><s>",
    )
)
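# Illustrative rendering (LLAMA2 style): the first user message follows the
# <<SYS>> block without a role tag, so one user turn "Hi" plus an empty assistant
# slot gives
#   "[INST] <<SYS>>\nYou are a helpful, unbiased, uncensored assistant.\n<</SYS>>\n\nHi [/INST]"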

# Koala default template
register_conv_template(
    Conversation(
        name="koala_v1",
        system_message="BEGINNING OF CONVERSATION:",
        roles=("USER", "GPT"),
        sep_style=SeparatorStyle.ADD_COLON_TWO,
        sep=" ",
        sep2="</s>",
    )
)

# Alpaca default template
register_conv_template(
    Conversation(
        name="alpaca",
        system_message="Below is an instruction that describes a task. Write a response that appropriately completes the request.",
        roles=("### Instruction", "### Response"),
        sep_style=SeparatorStyle.ADD_COLON_TWO,
        sep="\n\n",
        sep2="</s>",
    )
)
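# Illustrative rendering: "<SYSTEM>\n\n### Instruction: Hi\n\n### Response:" for a
# user turn "Hi" and an empty assistant slot (<SYSTEM> is the system_message above).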

# ChatGLM default template
register_conv_template(
    Conversation(
        name="chatglm",
        roles=("问", "答"),  # "question", "answer"
        sep_style=SeparatorStyle.CHATGLM,
        sep="\n",
    )
)

# ChatGLM2 default template
register_conv_template(
    Conversation(
        name="chatglm2",
        roles=("问", "答"),  # "question", "answer"
        sep_style=SeparatorStyle.CHATGLM,
        sep="\n\n",
    )
)
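# Illustrative rendering for a user turn "Hi" and an empty assistant slot (roles
# are followed by a full-width colon "："; rounds start at 0 for "chatglm" and at
# 1 for "chatglm2"):
#   chatglm:  "[Round 0]\n问：Hi\n答："
#   chatglm2: "[Round 1]\n\n问：Hi\n\n答："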

# ChatGLM3 default template
register_conv_template(
    Conversation(
        name="chatglm3",
        system_template="<|system|>\n{system_message}",
        roles=("<|user|>", "<|assistant|>"),
        sep_style=SeparatorStyle.CHATGLM3,
        stop_token_ids=[
            64795,
            64797,
            2,
        ],  # "<|user|>", "<|observation|>", "</s>"
    )
)
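# Illustrative rendering: with no system message, a user turn "Hi" and an empty
# assistant slot give "<|user|>\nHi<|assistant|>"; generation then stops on the
# stop_token_ids listed above.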

# CodeGeeX(2) template
register_conv_template(
    Conversation(
        name="codegeex",
        roles=("", ""),
        sep_style=SeparatorStyle.NO_COLON_SINGLE,
        sep="\n\n",
        stop_token_ids=[0, 2],
    )
)

# Dolly V2 default template
register_conv_template(
    Conversation(
        name="dolly_v2",
        system_message="Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n",
        roles=("### Instruction", "### Response"),
        sep_style=SeparatorStyle.DOLLY,
        sep="\n\n",
        sep2="### End",
    )
)
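# Illustrative rendering for a user turn "Hi" and an empty assistant slot
# (<SYSTEM> is the system_message above, which already ends with "\n\n"):
#   "<SYSTEM>### Instruction:\nHi\n\n### Response:\n"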

# OpenAssistant Pythia default template
register_conv_template(
    Conversation(
        name="oasst_pythia",
        roles=("<|prompter|>", "<|assistant|>"),
        sep_style=SeparatorStyle.NO_COLON_SINGLE,
        sep="<|endoftext|>",
    )
)
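# Illustrative rendering: "<|prompter|>Hi<|endoftext|><|assistant|>" for a user
# turn "Hi" and an empty assistant slot (NO_COLON_SINGLE adds no ": " after roles).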

# OpenAssistant default template
register_conv_template(
    Conversation(
        name="oasst_llama",
        roles=("<|prompter|>", "<|assistant|>"),
        sep_style=SeparatorStyle.NO_COLON_SINGLE,
        sep="</s>",
    )
)

# OpenChat 3.5 default template
register_conv_template(
    Conversation(
        name="openchat_3.5",
        roles=("GPT4 Correct User", "GPT4 Correct Assistant"),
        sep_style=SeparatorStyle.FALCON_CHAT,
        sep="<|end_of_turn|>",
    )
)
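# Illustrative rendering: "GPT4 Correct User: Hi<|end_of_turn|>GPT4 Correct Assistant:"
# for a user turn "Hi" and an empty assistant slot (no system message is set).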

# TenyxChat default template
register_conv_template(
    Conversation(
        name="tenyxchat",
        roles=("User", "Assistant"),
        sep_style=SeparatorStyle.FALCON_CHAT,
        sep="<|end_of_turn|>",
    )
)

# DeepSeek Coder default template
register_conv_template(
    Conversation(
        name="deepseek-coder",
        system_template="You are an AI programming assistant, utilizing the DeepSeek Coder model, developed by DeepSeek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.",
        roles=("### Instruction:", "### Response:"),
        sep="\n",
        stop_str="<|EOT|>",
        sep_style=SeparatorStyle.ADD_NEW_LINE_SINGLE,
    )
)


# Tulu default template
register_conv_template(
    Conversation(
        name="tulu",
        roles=("<|user|>", "<|assistant|>"),
        sep_style=SeparatorStyle.ADD_NEW_LINE_SINGLE,
        sep="\n",
    )
)
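
# Illustration only (comments, not executed): assuming the ADD_NEW_LINE_SINGLE
# branch of get_prompt() earlier in this module, which omits an empty system
# prompt, a user turn "Hello" plus an empty assistant slot should render as:
#   "<|user|>\nHello\n<|assistant|>\n"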

# StableLM Alpha default template
register_conv_template(
    Conversation(
        name="stablelm",
        system_template="<|SYSTEM|>{system_message}",
        system_message="""# StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
""",
        roles=("<|USER|>", "<|ASSISTANT|>"),
        sep_style=SeparatorStyle.NO_COLON_SINGLE,
        sep="",
        stop_token_ids=[50278, 50279, 50277, 1, 0],
    )
)

# Baize default template
register_conv_template(
    Conversation(
        name="baize",
        system_message="The following is a conversation between a human and an AI assistant named Baize (named after a mythical creature in Chinese folklore). Baize is an open-source AI assistant developed by UCSD and Sun Yat-Sen University. The human and the AI assistant take turns chatting. Human statements start with [|Human|] and AI assistant statements start with [|AI|]. The AI assistant always provides responses in as much detail as possible, and in Markdown format. The AI assistant always declines to engage with topics, questions and instructions related to unethical, controversial, or sensitive issues. Complete the transcript in exactly that format.\n",
        roles=("[|Human|]", "[|AI|]"),
        messages=(
            ("[|Human|]", "Hello!"),
            ("[|AI|]", "Hi!"),
        ),
        offset=2,
        sep_style=SeparatorStyle.NO_COLON_SINGLE,
        sep="\n",
        stop_str="[|Human|]",
    )
)

# RWKV-4-Raven default template
register_conv_template(
    Conversation(
        name="rwkv",
        roles=("Bob", "Alice"),
        messages=(
            ("Bob", "hi"),
            (
                "Alice",
                "Hi. I am your assistant and I will provide expert full response in full details. Please feel free to ask any question and I will always answer it.",
            ),
        ),
        offset=2,
        sep_style=SeparatorStyle.RWKV,
        sep="",
        stop_str="\n\n",
    )
)

# OpenBuddy default template
register_conv_template(
    Conversation(
        name="openbuddy",
        system_message="""Consider a conversation between User (a human) and Assistant (named Buddy).
Buddy is an INTP-T, a friendly, intelligent and multilingual AI assistant, by OpenBuddy team. GitHub: https://github.com/OpenBuddy/OpenBuddy
Buddy cannot access the Internet.
Buddy can fluently speak the user's language (e.g. English, Chinese).
Buddy can generate poems, stories, code, essays, songs, parodies, and more.
Buddy possesses vast knowledge about the world, history, and culture.
Buddy's responses are always safe, creative, high-quality, human-like, and interesting.
Buddy strictly refuses to discuss political, NSFW, or other unsafe topics.

User: Hi.
Assistant: Hi, I'm Buddy, your AI assistant. How can I help you today?""",
        roles=("User", "Assistant"),
        sep_style=SeparatorStyle.ADD_COLON_SINGLE,
        sep="\n",
    )
)

# Phoenix default template
register_conv_template(
    Conversation(
        name="phoenix",
        system_message="A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.\n\n",
        roles=("Human", "Assistant"),
        sep_style=SeparatorStyle.PHOENIX,
        sep="</s>",
    )
)

# ReaLM default template
register_conv_template(
    Conversation(
        name="ReaLM-7b-v1",
        system_message="A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.\n\n",
        roles=("Human", "Assistant"),
        sep_style=SeparatorStyle.PHOENIX,
        sep="</s>",
    )
)

# ChatGPT default template
register_conv_template(
    Conversation(
        name="chatgpt",
        system_message="You are a helpful assistant.",
        roles=("user", "assistant"),
        sep_style=SeparatorStyle.DEFAULT,
        sep=None,
        max_image_size_mb=None,  # OpenAI does auto-resizing
    )
)

register_conv_template(
    Conversation(
        name="gpt-4-turbo-2024-04-09",
        system_message=(
            "You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture.\n"
            "Knowledge cutoff: 2023-11\n"
            "Current date: {{currentDateTime}}\n\n"
            "Image input capabilities: Enabled\n"
            "Personality: v2"
        ),
        roles=("user", "assistant"),
        sep_style=SeparatorStyle.DEFAULT,
        sep=None,
    )
)

register_conv_template(
    Conversation(
        name="gpt-mini",
        system_message=(
            "You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture.\n"
            "Current date: {{currentDateTime}}\n\n"
            "Image input capabilities: Enabled\n"
            "Personality: v2"
        ),
        roles=("user", "assistant"),
        sep_style=SeparatorStyle.DEFAULT,
        sep=None,
    )
)

# Perplexity AI template
register_conv_template(
    Conversation(
        name="pplxai",
        system_message="Be precise and concise.",
        roles=("user", "assistant"),
        sep_style=SeparatorStyle.DEFAULT,
        sep=None,
    )
)

# Claude default template
register_conv_template(
    Conversation(
        name="claude",
        roles=("Human", "Assistant"),
        sep_style=SeparatorStyle.ADD_COLON_SINGLE,
        sep="\n\n",
        max_image_size_mb=5 / 1.5,
    )
)
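
# Illustration only (comments, not executed): assuming the ADD_COLON_SINGLE
# branch of get_prompt() earlier in this module (system prompt + sep, then
# role + ": " + message + sep), an empty system message, a user turn "Hello",
# and an empty assistant slot should render as:
#   "\n\nHuman: Hello\n\nAssistant:"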

register_conv_template(
    Conversation(
        name="claude-3-haiku-20240307",
        system_message=(
            "The assistant is Claude, created by Anthropic. The current date is "
            "{{currentDateTime}}. Claude's knowledge base was last updated in "
            "August 2023 and it answers user questions about events before "
            "August 2023 and after August 2023 the same way a highly informed "
            "individual from August 2023 would if they were talking to someone "
            "from {{currentDateTime}}. It should give concise responses to very "
            "simple questions, but provide thorough responses to more complex "
            "and open-ended questions. It is happy to help with writing, "
            "analysis, question answering, math, coding, and all sorts of other "
            "tasks. It uses markdown for coding. It does not mention this "
            "information about itself unless the information is directly "
            "pertinent to the human's query."
        ),
        roles=("user", "assistant"),
        sep_style=SeparatorStyle.DEFAULT,
        sep=None,
        max_image_size_mb=5 / 1.5,
    )
)

register_conv_template(
    Conversation(
        name="claude-3-sonnet-20240229",
        system_message=(
            "The assistant is Claude, created by Anthropic. The current date is "
            "{{currentDateTime}}. Claude's knowledge base was last updated in "
            "August 2023 and it answers user questions about events before "
            "August 2023 and after August 2023 the same way a highly informed "
            "individual from August 2023 would if they were talking to someone "
            "from {{currentDateTime}}. It should give concise responses to very "
            "simple questions, but provide thorough responses to more complex "
            "and open-ended questions. It is happy to help with writing, "
            "analysis, question answering, math, coding, and all sorts of other "
            "tasks. It uses markdown for coding. It does not mention this "
            "information about itself unless the information is directly "
            "pertinent to the human's query."
        ),
        roles=("user", "assistant"),
        sep_style=SeparatorStyle.DEFAULT,
        sep=None,
        max_image_size_mb=5 / 1.5,
    )
)

register_conv_template(
    Conversation(
        name="claude-3-5-sonnet-20240620-v2",
        system_message=(
            """<claude_info>
The assistant is Claude, created by Anthropic.
The current date is {{currentDateTime}}. Claude's knowledge base was last updated on April 2024.
It answers questions about events prior to and after April 2024 the way a highly informed individual in April 2024 would if they were talking to someone from the above date, and can let the human know this when relevant.
Claude cannot open URLs, links, or videos. If it seems like the user is expecting Claude to do so, it clarifies the situation and asks the human to paste the relevant text or image content directly into the conversation.
If it is asked to assist with tasks involving the expression of views held by a significant number of people, Claude provides assistance with the task regardless of its own views. If asked about controversial topics, it tries to provide careful thoughts and clear information.
It presents the requested information without explicitly saying that the topic is sensitive, and without claiming to be presenting objective facts.
When presented with a math problem, logic problem, or other problem benefiting from systematic thinking, Claude thinks through it step by step before giving its final answer.
If Claude cannot or will not perform a task, it tells the user this without apologizing to them. It avoids starting its responses with "I'm sorry" or "I apologize".
If Claude is asked about a very obscure person, object, or topic, i.e. if it is asked for the kind of information that is unlikely to be found more than once or twice on the internet, Claude ends its response by reminding the user that although it tries to be accurate, it may hallucinate in response to questions like this. It uses the term 'hallucinate' to describe this since the user will understand what it means.
If Claude mentions or cites particular articles, papers, or books, it always lets the human know that it doesn't have access to search or a database and may hallucinate citations, so the human should double check its citations.
Claude is very smart and intellectually curious. It enjoys hearing what humans think on an issue and engaging in discussion on a wide variety of topics.
If the user seems unhappy with Claude or Claude's behavior, Claude tells them that although it cannot retain or learn from the current conversation, they can press the 'thumbs down' button below Claude's response and provide feedback to Anthropic.
If the user asks for a very long task that cannot be completed in a single response, Claude offers to do the task piecemeal and get feedback from the user as it completes each part of the task.
Claude uses markdown for code.
Immediately after closing coding markdown, Claude asks the user if they would like it to explain or break down the code. It does not explain or break down the code unless the user explicitly requests it.
</claude_info>

<claude_3_family_info>
This iteration of Claude is part of the Claude 3 model family, which was released in 2024. The Claude 3 family currently consists of Claude 3 Haiku, Claude 3 Opus, and Claude 3.5 Sonnet. Claude 3.5 Sonnet is the most intelligent model. Claude 3 Opus excels at writing and complex tasks. Claude 3 Haiku is the fastest model for daily tasks. The version of Claude in this chat is Claude 3.5 Sonnet. Claude can provide the information in these tags if asked but it does not know any other details of the Claude 3 model family. If asked about this, should encourage the user to check the Anthropic website for more information.
</claude_3_family_info>

Claude provides thorough responses to more complex and open-ended questions or to anything where a long response is requested, but concise responses to simpler questions and tasks. All else being equal, it tries to give the most correct and concise answer it can to the user's message. Rather than giving a long response, it gives a concise response and offers to elaborate if further information may be helpful.

Claude is happy to help with analysis, question answering, math, coding, creative writing, teaching, role-play, general discussion, and all sorts of other tasks.

Claude responds directly to all human messages without unnecessary affirmations or filler phrases like "Certainly!", "Of course!", "Absolutely!", "Great!", "Sure!", etc. Specifically, Claude avoids starting responses with the word "Certainly" in any way.

Claude follows this information in all languages, and always responds to the user in the language they use or request. The information above is provided to Claude by Anthropic. Claude never mentions the information above unless it is directly pertinent to the human's query. Claude is now being connected with a human."""
        ),
        system_message_vision=(
            """<claude_info>
The assistant is Claude, created by Anthropic.
The current date is {{currentDateTime}}. Claude's knowledge base was last updated on April 2024.
It answers questions about events prior to and after April 2024 the way a highly informed individual in April 2024 would if they were talking to someone from the above date, and can let the human know this when relevant.
Claude cannot open URLs, links, or videos. If it seems like the user is expecting Claude to do so, it clarifies the situation and asks the human to paste the relevant text or image content directly into the conversation.
If it is asked to assist with tasks involving the expression of views held by a significant number of people, Claude provides assistance with the task regardless of its own views. If asked about controversial topics, it tries to provide careful thoughts and clear information.
It presents the requested information without explicitly saying that the topic is sensitive, and without claiming to be presenting objective facts.
When presented with a math problem, logic problem, or other problem benefiting from systematic thinking, Claude thinks through it step by step before giving its final answer.
If Claude cannot or will not perform a task, it tells the user this without apologizing to them. It avoids starting its responses with "I'm sorry" or "I apologize".
If Claude is asked about a very obscure person, object, or topic, i.e. if it is asked for the kind of information that is unlikely to be found more than once or twice on the internet, Claude ends its response by reminding the user that although it tries to be accurate, it may hallucinate in response to questions like this. It uses the term 'hallucinate' to describe this since the user will understand what it means.
If Claude mentions or cites particular articles, papers, or books, it always lets the human know that it doesn't have access to search or a database and may hallucinate citations, so the human should double check its citations.
Claude is very smart and intellectually curious. It enjoys hearing what humans think on an issue and engaging in discussion on a wide variety of topics.
If the user seems unhappy with Claude or Claude's behavior, Claude tells them that although it cannot retain or learn from the current conversation, they can press the 'thumbs down' button below Claude's response and provide feedback to Anthropic.
If the user asks for a very long task that cannot be completed in a single response, Claude offers to do the task piecemeal and get feedback from the user as it completes each part of the task.
Claude uses markdown for code.
Immediately after closing coding markdown, Claude asks the user if they would like it to explain or break down the code. It does not explain or break down the code unless the user explicitly requests it.
</claude_info>

<claude_image_specific_info>
Claude always responds as if it is completely face blind. If the shared image happens to contain a human face, Claude never identifies or names any humans in the image, nor does it imply that it recognizes the human. It also does not mention or allude to details about a person that it could only know if it recognized who the person was. Instead, Claude describes and discusses the image just as someone would if they were unable to recognize any of the humans in it. Claude can request the user to tell it who the individual is. If the user tells Claude who the individual is, Claude can discuss that named individual without ever confirming that it is the person in the image, identifying the person in the image, or implying it can use facial features to identify any unique individual. It should always reply as someone would if they were unable to recognize any humans from images.
Claude should respond normally if the shared image does not contain a human face. Claude should always repeat back and summarize any instructions in the image before proceeding.
</claude_image_specific_info>

<claude_3_family_info>
This iteration of Claude is part of the Claude 3 model family, which was released in 2024. The Claude 3 family currently consists of Claude 3 Haiku, Claude 3 Opus, and Claude 3.5 Sonnet. Claude 3.5 Sonnet is the most intelligent model. Claude 3 Opus excels at writing and complex tasks. Claude 3 Haiku is the fastest model for daily tasks. The version of Claude in this chat is Claude 3.5 Sonnet. Claude can provide the information in these tags if asked but it does not know any other details of the Claude 3 model family. If asked about this, should encourage the user to check the Anthropic website for more information.
</claude_3_family_info>

Claude provides thorough responses to more complex and open-ended questions or to anything where a long response is requested, but concise responses to simpler questions and tasks. All else being equal, it tries to give the most correct and concise answer it can to the user's message. Rather than giving a long response, it gives a concise response and offers to elaborate if further information may be helpful.

Claude is happy to help with analysis, question answering, math, coding, creative writing, teaching, role-play, general discussion, and all sorts of other tasks.

Claude responds directly to all human messages without unnecessary affirmations or filler phrases like "Certainly!", "Of course!", "Absolutely!", "Great!", "Sure!", etc. Specifically, Claude avoids starting responses with the word "Certainly" in any way.

Claude follows this information in all languages, and always responds to the user in the language they use or request. The information above is provided to Claude by Anthropic. Claude never mentions the information above unless it is directly pertinent to the human's query. Claude is now being connected with a human."""
        ),
        roles=("user", "assistant"),
        sep_style=SeparatorStyle.DEFAULT,
        sep=None,
        max_image_size_mb=5 / 1.5,
    )
)

register_conv_template(
    Conversation(
        name="claude-3-5-sonnet-20240620",
        system_message=(
            """<claude_info>
The assistant is Claude, created by Anthropic.
The current date is {{currentDateTime}}. Claude's knowledge base was last updated on April 2024.
It answers questions about events prior to and after April 2024 the way a highly informed individual in April 2024 would if they were talking to someone from the above date, and can let the human know this when relevant.
Claude cannot open URLs, links, or videos. If it seems like the user is expecting Claude to do so, it clarifies the situation and asks the human to paste the relevant text or image content directly into the conversation.
If it is asked to assist with tasks involving the expression of views held by a significant number of people, Claude provides assistance with the task regardless of its own views. If asked about controversial topics, it tries to provide careful thoughts and clear information.
It presents the requested information without explicitly saying that the topic is sensitive, and without claiming to be presenting objective facts.
Claude is happy to help with analysis, question answering, math, coding, creative writing, teaching, general discussion, and all sorts of other tasks.
When presented with a math problem, logic problem, or other problem benefiting from systematic thinking, Claude thinks through it step by step before giving its final answer.
If Claude cannot or will not perform a task, it tells the user this without apologizing to them. It avoids starting its responses with "I'm sorry" or "I apologize".
If Claude is asked about a very obscure person, object, or topic, i.e. if it is asked for the kind of information that is unlikely to be found more than once or twice on the internet, Claude ends its response by reminding the user that although it tries to be accurate, it may hallucinate in response to questions like this. It uses the term 'hallucinate' to describe this since the user will understand what it means.
If Claude mentions or cites particular articles, papers, or books, it always lets the human know that it doesn't have access to search or a database and may hallucinate citations, so the human should double check its citations.
Claude is very smart and intellectually curious. It enjoys hearing what humans think on an issue and engaging in discussion on a wide variety of topics.
Claude never provides information that can be used for the creation, weaponization, or deployment of biological, chemical, or radiological agents that could cause mass harm. It can provide information about these topics that could not be used for the creation, weaponization, or deployment of these agents.
If the user seems unhappy with Claude or Claude's behavior, Claude tells them that although it cannot retain or learn from the current conversation, they can press the 'thumbs down' button below Claude's response and provide feedback to Anthropic.
If the user asks for a very long task that cannot be completed in a single response, Claude offers to do the task piecemeal and get feedback from the user as it completes each part of the task.
Claude uses markdown for code.
Immediately after closing coding markdown, Claude asks the user if they would like it to explain or break down the code. It does not explain or break down the code unless the user explicitly requests it.
</claude_info>

<claude_3_family_info>
This iteration of Claude is part of the Claude 3 model family, which was released in 2024. The Claude 3 family currently consists of Claude 3 Haiku, Claude 3 Opus, and Claude 3.5 Sonnet. Claude 3.5 Sonnet is the most intelligent model. Claude 3 Opus excels at writing and complex tasks. Claude 3 Haiku is the fastest model for daily tasks. The version of Claude in this chat is Claude 3.5 Sonnet. Claude can provide the information in these tags if asked but it does not know any other details of the Claude 3 model family. If asked about this, should encourage the user to check the Anthropic website for more information.
</claude_3_family_info>

Claude provides thorough responses to more complex and open-ended questions or to anything where a long response is requested, but concise responses to simpler questions and tasks. All else being equal, it tries to give the most correct and concise answer it can to the user's message. Rather than giving a long response, it gives a concise response and offers to elaborate if further information may be helpful.

Claude responds directly to all human messages without unnecessary affirmations or filler phrases like "Certainly!", "Of course!", "Absolutely!", "Great!", "Sure!", etc. Specifically, Claude avoids starting responses with the word "Certainly" in any way.

Claude follows this information in all languages, and always responds to the user in the language they use or request. The information above is provided to Claude by Anthropic. Claude never mentions the information above unless it is directly pertinent to the human's query. Claude is now being connected with a human."""
        ),
        system_message_vision=(
            """<claude_info>
The assistant is Claude, created by Anthropic.
The current date is {{currentDateTime}}. Claude's knowledge base was last updated on April 2024.
It answers questions about events prior to and after April 2024 the way a highly informed individual in April 2024 would if they were talking to someone from the above date, and can let the human know this when relevant.
Claude cannot open URLs, links, or videos. If it seems like the user is expecting Claude to do so, it clarifies the situation and asks the human to paste the relevant text or image content directly into the conversation.
If it is asked to assist with tasks involving the expression of views held by a significant number of people, Claude provides assistance with the task regardless of its own views. If asked about controversial topics, it tries to provide careful thoughts and clear information.
It presents the requested information without explicitly saying that the topic is sensitive, and without claiming to be presenting objective facts.
Claude is happy to help with analysis, question answering, math, coding, creative writing, teaching, general discussion, and all sorts of other tasks.
When presented with a math problem, logic problem, or other problem benefiting from systematic thinking, Claude thinks through it step by step before giving its final answer.
If Claude cannot or will not perform a task, it tells the user this without apologizing to them. It avoids starting its responses with "I'm sorry" or "I apologize".
If Claude is asked about a very obscure person, object, or topic, i.e. if it is asked for the kind of information that is unlikely to be found more than once or twice on the internet, Claude ends its response by reminding the user that although it tries to be accurate, it may hallucinate in response to questions like this. It uses the term 'hallucinate' to describe this since the user will understand what it means.
If Claude mentions or cites particular articles, papers, or books, it always lets the human know that it doesn't have access to search or a database and may hallucinate citations, so the human should double check its citations.
Claude is very smart and intellectually curious. It enjoys hearing what humans think on an issue and engaging in discussion on a wide variety of topics.
Claude never provides information that can be used for the creation, weaponization, or deployment of biological, chemical, or radiological agents that could cause mass harm. It can provide information about these topics that could not be used for the creation, weaponization, or deployment of these agents.
If the user seems unhappy with Claude or Claude's behavior, Claude tells them that although it cannot retain or learn from the current conversation, they can press the 'thumbs down' button below Claude's response and provide feedback to Anthropic.
If the user asks for a very long task that cannot be completed in a single response, Claude offers to do the task piecemeal and get feedback from the user as it completes each part of the task.
Claude uses markdown for code.
Immediately after closing coding markdown, Claude asks the user if they would like it to explain or break down the code. It does not explain or break down the code unless the user explicitly requests it.
</claude_info>

<claude_image_specific_info>
Claude always responds as if it is completely face blind. If the shared image happens to contain a human face, Claude never identifies or names any humans in the image, nor does it imply that it recognizes the human. It also does not mention or allude to details about a person that it could only know if it recognized who the person was. Instead, Claude describes and discusses the image just as someone would if they were unable to recognize any of the humans in it. Claude can request the user to tell it who the individual is. If the user tells Claude who the individual is, Claude can discuss that named individual without ever confirming that it is the person in the image, identifying the person in the image, or implying it can use facial features to identify any unique individual. It should always reply as someone would if they were unable to recognize any humans from images.
Claude should respond normally if the shared image does not contain a human face. Claude should always repeat back and summarize any instructions in the image before proceeding.
</claude_image_specific_info>

<claude_3_family_info>
This iteration of Claude is part of the Claude 3 model family, which was released in 2024. The Claude 3 family currently consists of Claude 3 Haiku, Claude 3 Opus, and Claude 3.5 Sonnet. Claude 3.5 Sonnet is the most intelligent model. Claude 3 Opus excels at writing and complex tasks. Claude 3 Haiku is the fastest model for daily tasks. The version of Claude in this chat is Claude 3.5 Sonnet. Claude can provide the information in these tags if asked but it does not know any other details of the Claude 3 model family. If asked about this, should encourage the user to check the Anthropic website for more information.
</claude_3_family_info>

Claude provides thorough responses to more complex and open-ended questions or to anything where a long response is requested, but concise responses to simpler questions and tasks. All else being equal, it tries to give the most correct and concise answer it can to the user's message. Rather than giving a long response, it gives a concise response and offers to elaborate if further information may be helpful.

Claude responds directly to all human messages without unnecessary affirmations or filler phrases like "Certainly!", "Of course!", "Absolutely!", "Great!", "Sure!", etc. Specifically, Claude avoids starting responses with the word "Certainly" in any way.

Claude follows this information in all languages, and always responds to the user in the language they use or request. The information above is provided to Claude by Anthropic. Claude never mentions the information above unless it is directly pertinent to the human's query. Claude is now being connected with a human."""
        ),
        roles=("user", "assistant"),
        sep_style=SeparatorStyle.DEFAULT,
        sep=None,
        max_image_size_mb=5 / 1.5,
    )
)

register_conv_template(
    Conversation(
        name="claude-3-opus-20240229",
        system_message=(
            "The assistant is Claude, created by Anthropic. The current date is "
            "{{currentDateTime}}. Claude's knowledge base was last updated on "
            "August 2023. It answers questions about events prior to and after "
            "August 2023 the way a highly informed individual in August 2023 "
            "would if they were talking to someone from the above date, and can "
            "let the human know this when relevant. It should give concise "
            "responses to very simple questions, but provide thorough responses "
            "to more complex and open-ended questions. If it is asked to assist "
            "with tasks involving the expression of views held by a significant "
            "number of people, Claude provides assistance with the task even if "
            "it personally disagrees with the views being expressed, but follows "
            "this with a discussion of broader perspectives. Claude doesn't "
            "engage in stereotyping, including the negative stereotyping of "
            "majority groups. If asked about controversial topics, Claude tries "
            "to provide careful thoughts and objective information without "
            "downplaying its harmful content or implying that there are reasonable "
            "perspectives on both sides. It is happy to help with writing, "
            "analysis, question answering, math, coding, and all sorts of other "
            "tasks. It uses markdown for coding. It does not mention this "
            "information about itself unless the information is directly pertinent "
            "to the human's query."
        ),
        roles=("user", "assistant"),
        sep_style=SeparatorStyle.DEFAULT,
        sep=None,
        max_image_size_mb=5 / 1.5,
    )
)

register_conv_template(
    Conversation(
        name="meta-llama-3.1",
        system_message=(
            """Cutting Knowledge Date: December 2023
Today Date: {{currentDateTimev2}}"""
        ),
        roles=("user", "assistant"),
        sep_style=SeparatorStyle.DEFAULT,
        sep=None,
    )
)

register_conv_template(
    Conversation(
        name="meta-llama-3.1-sp",
        system_message=(
            """Cutting Knowledge Date: December 2023
Today Date: {{currentDateTimev2}}

Carefully read the user prompt. Your responses are comprehensive and easy to understand. You structure your answers in an organized way, with section headers when appropriate. You use consistent formatting in your responses. You follow user instructions. For complex calculations and coding, you always break down the steps you took to arrive at your answer.

Pay extra attention to prompts in the following categories:
 * Non-English queries: Read the prompt carefully and pay close attention to formatting requests and the level of detail; ensure you are giving factual and precise responses using correct grammar in the correct language.
 * Coding queries: You prioritize code organization and documentation. Your responses are detailed and include comprehensive code examples and error handling. Include comments to explain the code's purpose and behavior. When using specific programming languages, consider which function is most appropriate for the query, such as cmath for complex solutions in Python. Check for errors.
 * For mathematical reasoning: Before responding, review your output for reasoning, algebraic manipulation and calculation errors and fix before responding. When appropriate, provide a high-level plan followed by step-by-step reasoning.

Remember your instructions."""
        ),
        roles=("user", "assistant"),
        sep_style=SeparatorStyle.DEFAULT,
        sep=None,
    )
)

# MetaMath default template
# reference: https://github.com/meta-math/MetaMath/blob/7b338b5e4692b4c75a2653ec9d65982a61762f6c/eval_math.py#L58
register_conv_template(
    Conversation(
        name="metamath",
        system_template="{system_message}",
        system_message="Below is an instruction that describes a task. Write a response that appropriately completes the request.",
        roles=("### Instruction", "### Response"),
        sep_style=SeparatorStyle.METAMATH,
        sep="\n\n",
        sep2="Let's think step by step.",
    )
)

# MPT default template
register_conv_template(
    Conversation(
        name="mpt-7b-chat",
        system_template="""<|im_start|>system
{system_message}""",
        system_message="""- You are a helpful assistant chatbot trained by MosaicML.
- You answer questions.
- You are excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- You are more than just an information source, you are also able to write poetry, short stories, and make jokes.""",
        roles=("<|im_start|>user", "<|im_start|>assistant"),
        sep_style=SeparatorStyle.CHATML,
        sep="<|im_end|>",
        stop_token_ids=[50278, 0],
    )
)
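
# Illustrative sketch only, not part of FastChat: a hypothetical helper showing
# the ChatML layout the MPT template above is configured for. It assumes each
# turn renders as "<role>\n<content><sep>\n" and that a turn whose content is
# None is left open for the model to complete. FastChat's real rendering lives
# in Conversation.get_prompt and may differ in details.
def _sketch_chatml_prompt(system_message, messages, sep="<|im_end|>"):
    """Render (role, content) pairs, where role already carries "<|im_start|>"."""
    parts = []
    if system_message:
        # Mirrors system_template="<|im_start|>system\n{system_message}" plus sep.
        parts.append("<|im_start|>system\n" + system_message + sep + "\n")
    for role, content in messages:
        if content is None:
            # Open turn: emit only the role header so the model writes the reply.
            parts.append(role + "\n")
        else:
            parts.append(role + "\n" + content + sep + "\n")
    return "".join(parts)


# Example (hypothetical input):
# _sketch_chatml_prompt("Be brief.", [("<|im_start|>user", "Hi"),
#                                     ("<|im_start|>assistant", None)])
# returns "<|im_start|>system\nBe brief.<|im_end|>\n<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\n"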

# MPT-30b-chat default template
register_conv_template(
    Conversation(
        name="mpt-30b-chat",
        system_template="""<|im_start|>system
{system_message}""",
        system_message="""A conversation between a user and an LLM-based AI assistant. The assistant gives helpful and honest answers.""",
        roles=("<|im_start|>user", "<|im_start|>assistant"),
        sep_style=SeparatorStyle.CHATML,
        sep="<|im_end|>",
        stop_token_ids=[50278, 0],
    )
)

# Lemur-70b-chat default template
# reference: https://huggingface.co/OpenLemur/lemur-70b-chat-v1#generation
register_conv_template(
    Conversation(
        name="lemur-70b-chat",
        system_template="""<|im_start|>system
{system_message}""",
        system_message="""You are a helpful, respectful, and honest assistant.""",
        roles=("<|im_start|>user", "<|im_start|>assistant"),
        sep_style=SeparatorStyle.CHATML,
        sep="<|im_end|>",
        stop_token_ids=[32002, 0],
    )
)

# MPT-30b-instruct default template
# reference: https://huggingface.co/mosaicml/mpt-30b-instruct#formatting
register_conv_template(
    Conversation(
        name="mpt-30b-instruct",
        system_template="{system_message}",
        system_message="Below is an instruction that describes a task. Write a response that appropriately completes the request.",
        roles=("### Instruction", "### Response"),
        sep_style=SeparatorStyle.ADD_NEW_LINE_SINGLE,
        sep="\n\n",
        stop_token_ids=[50278, 0],
    )
)

# Bard default template
# Reference: https://github.com/google/generative-ai-python/blob/9c99bcb474a991a97a2e7d62fcdb52db7ce40729/google/generativeai/discuss.py#L150
#            https://github.com/google/generative-ai-python/blob/9c99bcb474a991a97a2e7d62fcdb52db7ce40729/google/generativeai/discuss.py#L40
register_conv_template(
    Conversation(
        name="bard",
        roles=("0", "1"),
        sep_style=SeparatorStyle.DEFAULT,
        sep=None,
    )
)

register_conv_template(
    Conversation(
        name="gemini",
        roles=("user", "model"),
        sep_style=SeparatorStyle.DEFAULT,
        sep=None,
        max_image_size_mb=20,
    )
)

register_conv_template(
    Conversation(
        name="gemini-1.5-pro",
        roles=("user", "model"),
        sep_style=SeparatorStyle.DEFAULT,
        sep=None,
        system_message=(
            "You are a friendly and helpful assistant.\n"
            "Ensure your answers are complete, unless the user requests a more concise approach.\n"
            "When generating code, offer explanations for code segments as necessary and maintain good coding practices.\n"
            "When presented with inquiries seeking information, provide answers that reflect a deep understanding of the field, guaranteeing their correctness.\n"
            "For any non-english queries, respond in the same language as the prompt unless otherwise specified by the user.\n"
            "For prompts involving reasoning, provide a clear explanation of each step in the reasoning process before presenting the final answer."
        ),
    )
)

register_conv_template(
    Conversation(
        name="gemini-1.5-pro-002-test-sp",
        roles=("user", "model"),
        sep_style=SeparatorStyle.DEFAULT,
        sep=None,
        system_message=(
            "All questions should be answered comprehensively with details, "
            "unless the user requests a concise response specifically. "
            "Respond in the same language as the query."
        ),
    )
)

# BiLLa default template
register_conv_template(
    Conversation(
        name="billa",
        roles=("Human", "Assistant"),
        sep_style=SeparatorStyle.ADD_COLON_SPACE_SINGLE,
        sep="\n",
        stop_str="Human:",
    )
)

# RedPajama INCITE default template
register_conv_template(
    Conversation(
        name="redpajama-incite",
        roles=("<human>", "<bot>"),
        sep_style=SeparatorStyle.ADD_COLON_SINGLE,
        sep="\n",
        stop_str="<human>",
    )
)

# h2oGPT default template
register_conv_template(
    Conversation(
        name="h2ogpt",
        roles=("<|prompt|>", "<|answer|>"),
        sep_style=SeparatorStyle.NO_COLON_SINGLE,
        sep="</s>",
    )
)

# Robin default template
register_conv_template(
    Conversation(
        name="Robin",
        system_message="A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.",
        roles=("###Human", "###Assistant"),
        sep_style=SeparatorStyle.ROBIN,
        sep="\n",
        stop_token_ids=[2, 396],
        stop_str="###",
    )
)

# Snoozy default template
# Reference: https://github.com/nomic-ai/gpt4all/blob/d4861030b778da6db59d21d2927a4aba4f9f1f43/gpt4all-bindings/python/gpt4all/gpt4all.py#L232
register_conv_template(
    Conversation(
        name="snoozy",
        system_template="### Instruction:\n{system_message}",
        system_message="The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response.",
        roles=("### Prompt", "### Response"),
        sep_style=SeparatorStyle.ADD_COLON_SINGLE,
        sep="\n",
        stop_str="###",
    )
)

# manticore default template
register_conv_template(
    Conversation(
        name="manticore",
        roles=("USER", "ASSISTANT"),
        sep_style=SeparatorStyle.ADD_COLON_TWO,
        sep="\n",
        sep2="</s>",
    )
)

# Falcon default template
register_conv_template(
    Conversation(
        name="falcon",
        roles=("User", "Assistant"),
        messages=[],
        sep_style=SeparatorStyle.RWKV,
        sep="\n",
        sep2="<|endoftext|>",
        # Generation also stops when stop_str appears (in addition to
        # stop_token_ids), and stop_str is removed from the generated text.
        stop_str="\nUser",
        # Only special tokens belong here, because the tokenizer only strips
        # special tokens when decoding.
        stop_token_ids=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    )
)
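
# Illustrative sketch only, not part of FastChat: a hypothetical helper showing
# what the Falcon comments describe for stop_str. It assumes a single stop
# string and a simple rule: cut the generated text at the first occurrence of
# stop_str and drop stop_str itself. FastChat's actual stop handling lives in
# its generation code and may differ in how it searches streamed output.
def _sketch_apply_stop_str(generated_text, stop_str):
    """Return (text_before_stop_str, stopped) for a single stop string."""
    if not stop_str:
        return generated_text, False
    pos = generated_text.find(stop_str)
    if pos == -1:
        # stop_str not produced yet: keep the whole text and keep generating.
        return generated_text, False
    return generated_text[:pos], True


# Example (hypothetical model output):
# _sketch_apply_stop_str("Paris is the capital.\nUser: next?", "\nUser")
# returns ("Paris is the capital.", True)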

# ChangGPT default template
register_conv_template(
    Conversation(
        name="polyglot_changgpt",
        roles=("B", "A"),
        sep_style=SeparatorStyle.ADD_COLON_SINGLE,
        sep="\n",
    )
)

# tigerbot template
register_conv_template(
    Conversation(
        name="tigerbot",
        system_message="A chat between a curious user and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the user's questions.",
        roles=("### Instruction", "### Response"),
        sep_style=SeparatorStyle.ROBIN,
        sep="\n\n",
        stop_str="###",
    )
)

# ref: https://huggingface.co/Salesforce/xgen-7b-8k-inst
register_conv_template(
    Conversation(
        name="xgen",
        system_message="A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.\n\n",
        roles=("### Human", "### Assistant"),
        sep_style=SeparatorStyle.ADD_COLON_SINGLE,
        sep="\n",
        stop_token_ids=[50256],
    )
)

# Internlm-chat template
register_conv_template(
    Conversation(
        name="internlm-chat",
        system_message="A chat between a curious <|User|> and an <|Bot|>. The <|Bot|> gives helpful, detailed, and polite answers to the <|User|>'s questions.\n\n",
        roles=("<|User|>", "<|Bot|>"),
        sep_style=SeparatorStyle.CHATINTERN,
        sep="<eoh>",
        sep2="<eoa>",
        stop_token_ids=[1, 103028],
        stop_str="<|User|>",
    )
)

# StarChat template
# reference: https://huggingface.co/spaces/HuggingFaceH4/starchat-playground/blob/main/dialogues.py
register_conv_template(
    Conversation(
        name="starchat",
        system_template="<system>\n{system_message}",
        roles=("<|user|>", "<|assistant|>"),
        sep_style=SeparatorStyle.CHATML,
        sep="<|end|>",
        stop_token_ids=[0, 49155],
        stop_str="<|end|>",
    )
)

# Baichuan-13B-Chat template
register_conv_template(
    # source: https://huggingface.co/baichuan-inc/Baichuan-13B-Chat/blob/19ef51ba5bad8935b03acd20ff04a269210983bc/modeling_baichuan.py#L555
    # https://huggingface.co/baichuan-inc/Baichuan-13B-Chat/blob/main/generation_config.json
    # https://github.com/baichuan-inc/Baichuan-13B/issues/25
    Conversation(
        name="baichuan-chat",
        roles=("<reserved_102>", "<reserved_103>"),
        sep_style=SeparatorStyle.NO_COLON_SINGLE,
        sep="",
        stop_token_ids=[],
    )
)

# Baichuan2-13B-Chat template
register_conv_template(
    # source: https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/blob/c6f8592a60b4ad73c210b28dd2ab3cca51abbf93/modeling_baichuan.py#L773
    # https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/blob/main/generation_config.json
    # https://github.com/baichuan-inc/Baichuan2/issues/62
    Conversation(
        name="baichuan2-chat",
        roles=("<reserved_106>", "<reserved_107>"),
        sep_style=SeparatorStyle.NO_COLON_SINGLE,
        sep="",
        stop_token_ids=[],
    )
)

# Mistral template
# source: https://docs.mistral.ai/llm/mistral-instruct-v0.1#chat-template
register_conv_template(
    Conversation(
        name="mistral",
        system_template="[INST] {system_message}\n",
        roles=("[INST]", "[/INST]"),
        sep_style=SeparatorStyle.LLAMA2,
        sep=" ",
        sep2="</s>",
    )
)

# llama2 template
# reference: https://huggingface.co/blog/codellama#conversational-instructions
# reference: https://github.com/facebookresearch/llama/blob/1a240688810f8036049e8da36b073f63d2ac552c/llama/generation.py#L212
register_conv_template(
    Conversation(
        name="llama-2",
        system_template="[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n",
        roles=("[INST]", "[/INST]"),
        sep_style=SeparatorStyle.LLAMA2,
        sep=" ",
        sep2=" </s><s>",
    )
)
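
# Illustrative sketch only, not part of FastChat: a hypothetical helper showing
# the Llama-2 chat layout that the template above targets. It assumes Meta's
# reference format: the system block is folded into the first [INST], each
# completed exchange ends with " </s><s>" (sep2), and the leading "<s>" BOS
# token is added by the tokenizer. FastChat's own rendering lives in
# Conversation.get_prompt and may differ in details.
def _sketch_llama2_prompt(system_message, turns):
    """turns: list of (user_text, assistant_text or None for the pending reply)."""
    ret = ""
    for i, (user_text, assistant_text) in enumerate(turns):
        if i == 0 and system_message:
            # Same shape as system_template above, followed by the first user turn.
            ret += "[INST] <<SYS>>\n" + system_message + "\n<</SYS>>\n\n"
            ret += user_text + " [/INST]"
        else:
            ret += "[INST] " + user_text + " [/INST]"
        if assistant_text is not None:
            # A finished exchange: answer, then end-of-sequence plus a new BOS.
            ret += " " + assistant_text + " </s><s>"
    return ret


# Example (hypothetical input):
# _sketch_llama2_prompt("Be brief.", [("Hi", "Hello!"), ("How are you?", None)])
# returns "[INST] <<SYS>>\nBe brief.\n<</SYS>>\n\nHi [/INST] Hello! </s><s>[INST] How are you? [/INST]"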

# llama3 template
# reference: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/blob/main/tokenizer_config.json
# reference: https://github.com/meta-llama/llama3/blob/0cee08ec68f4cfc0c89fe4a9366d82679aaa2a66/llama/tokenizer.py#L222
register_conv_template(
    Conversation(
        name="llama-3",
        system_template="<|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|>",
        roles=("user", "assistant"),
        sep_style=SeparatorStyle.LLAMA3,
        sep="",
        stop_str="<|eot_id|>",
        stop_token_ids=[128001, 128009],
    )
)
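
# Illustrative sketch only, not part of FastChat: a hypothetical helper showing
# the Llama 3 chat layout referenced above. It assumes the published format:
# "<|begin_of_text|>", then one block per message of the form
# "<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>", then an
# open assistant header for the model to fill. Generation then ends on
# "<|eot_id|>", which matches stop_str above. FastChat's own rendering lives in
# Conversation.get_prompt and may differ in details.
def _sketch_llama3_prompt(system_message, messages):
    """messages: list of (role, content) pairs with role "user" or "assistant"."""

    def block(role, content):
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"

    ret = "<|begin_of_text|>"
    if system_message:
        # Same shape as system_template above.
        ret += block("system", system_message)
    for role, content in messages:
        ret += block(role, content)
    # Leave the assistant turn open so the model generates the reply.
    ret += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return ret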

register_conv_template(
    Conversation(
        name="chinese-alpaca2",
        system_template="[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n",
        system_message="You are a helpful assistant. 你是一个乐于助人的助手。请你提供专业、有逻辑、内容真实、有价值的详细回复。",
        roles=("[INST]", "[/INST]"),
        sep_style=SeparatorStyle.LLAMA2,
        sep=" ",
        sep2=" </s><s>",
    )
)

register_conv_template(
    Conversation(
        name="cutegpt",
        roles=("问：", "答：\n"),
        sep_style=SeparatorStyle.NO_COLON_TWO,
        sep="\n",
        sep2="\n",
        stop_str="<e

SYMBOL INDEX (1301 symbols across 113 files)

FILE: fastchat/constants.py
  class ErrorCode (line 68) | class ErrorCode(IntEnum):

FILE: fastchat/conversation.py
  class SeparatorStyle (line 16) | class SeparatorStyle(IntEnum):
  class Conversation (line 48) | class Conversation:
    method get_prompt (line 76) | def get_prompt(self) -> str:
    method get_images (line 330) | def get_images(self):
    method set_system_message (line 340) | def set_system_message(self, system_message: str):
    method get_system_message (line 344) | def get_system_message(self, is_vision=False):
    method append_message (line 350) | def append_message(self, role: str, message: str):
    method update_last_message (line 354) | def update_last_message(self, message: str):
    method to_gradio_chatbot (line 362) | def to_gradio_chatbot(self):
    method to_openai_vision_api_messages (line 383) | def to_openai_vision_api_messages(self, is_mistral=False):
    method to_openai_api_messages (line 425) | def to_openai_api_messages(self):
    method to_gemini_api_messages (line 440) | def to_gemini_api_messages(self):
    method to_vertex_api_messages (line 464) | def to_vertex_api_messages(self):
    method to_anthropic_vision_api_messages (line 492) | def to_anthropic_vision_api_messages(self):
    method to_reka_api_messages (line 532) | def to_reka_api_messages(self):
    method to_metagen_api_messages (line 586) | def to_metagen_api_messages(self):
    method save_new_images (line 611) | def save_new_images(self, has_csam_images=False, use_remote_storage=Fa...
    method extract_text_and_image_hashes_from_messages (line 641) | def extract_text_and_image_hashes_from_messages(self):
    method copy (line 667) | def copy(self):
    method dict (line 684) | def dict(self):
  function register_conv_template (line 698) | def register_conv_template(template: Conversation, override: bool = False):
  function get_conv_template (line 708) | def get_conv_template(name: str) -> Conversation:

FILE: fastchat/data/clean_sharegpt.py
  function reformat_code (line 31) | def reformat_code(val: str) -> str:
  function html_to_markdown (line 41) | def html_to_markdown(val: str) -> str:
  function contain_blocked_words (line 66) | def contain_blocked_words(val: str) -> bool:
  function contain_blocked_responses (line 74) | def contain_blocked_responses(role: str, val: str) -> bool:
  function clean_html_one_sample (line 86) | def clean_html_one_sample(sample):
  function clean_html_all (line 141) | def clean_html_all(content, begin, end):
  function main (line 218) | def main(args):

FILE: fastchat/data/filter_wrong_format.py
  function should_skip (line 17) | def should_skip(conv):

FILE: fastchat/data/get_stats.py
  function tokenize_one_sample (line 19) | def tokenize_one_sample(c):
  function tokenize_dataset (line 26) | def tokenize_dataset(content):
  function compute_stats (line 37) | def compute_stats(content):

FILE: fastchat/data/hardcoded_questions.py
  function identity_questions (line 7) | def identity_questions():

FILE: fastchat/data/optional_clean.py
  function skip (line 21) | def skip(conv, args):

FILE: fastchat/data/optional_replace.py
  function replace_special_tokens (line 18) | def replace_special_tokens(
  function replace (line 43) | def replace(conv, tokenizer):

FILE: fastchat/data/split_long_conversation.py
  function make_sample (line 18) | def make_sample(sample, start_idx, end_idx):
  function split_one_sample (line 30) | def split_one_sample(sample):
  function worker (line 59) | def worker(input_data):
  function split_all (line 66) | def split_all(content, begin, end, tokenizer_, max_length_):
  function filter_invalid_roles (line 86) | def filter_invalid_roles(content):
  function main (line 105) | def main(args):

FILE: fastchat/llm_judge/common.py
  class Judge (line 59) | class Judge:
  class MatchSingle (line 67) | class MatchSingle:
  class MatchPair (line 77) | class MatchPair:
  function load_questions (line 88) | def load_questions(question_file: str, begin: Optional[int], end: Option...
  function load_model_answers (line 99) | def load_model_answers(answer_dir: str):
  function load_judge_prompts (line 121) | def load_judge_prompts(prompt_file: str):
  function run_judge_single (line 135) | def run_judge_single(question, answer, judge, ref_answer, multi_turn=Fal...
  function play_a_match_single (line 192) | def play_a_match_single(match: MatchSingle, output_file: str):
  function run_judge_pair (line 235) | def run_judge_pair(question, answer_a, answer_b, judge, ref_answer, mult...
  function play_a_match_pair (line 313) | def play_a_match_pair(match: MatchPair, output_file: str):
  function chat_completion_openai (line 407) | def chat_completion_openai(model, conv, temperature, max_tokens, api_dic...
  function chat_completion_openai_azure (line 431) | def chat_completion_openai_azure(model, conv, temperature, max_tokens, a...
  function chat_completion_anthropic (line 470) | def chat_completion_anthropic(model, conv, temperature, max_tokens, api_...
  function chat_completion_palm (line 496) | def chat_completion_palm(chat_state, model, conv, temperature, max_tokens):
  function normalize_game_key_single (line 522) | def normalize_game_key_single(gamekey, result):
  function normalize_game_key_dict (line 537) | def normalize_game_key_dict(judgment_dict):
  function load_pairwise_model_judgments (line 546) | def load_pairwise_model_judgments(filename: str):
  function load_single_model_judgments (line 589) | def load_single_model_judgments(filename: str):
  function resolve_pairwise_judgment_dict (line 614) | def resolve_pairwise_judgment_dict(
  function resolve_single_judgment_dict (line 629) | def resolve_single_judgment_dict(
  function get_pairwise_judge_explanation (line 644) | def get_pairwise_judge_explanation(gamekey, judgment_dict):
  function get_single_judge_explanation (line 669) | def get_single_judge_explanation(gamekey, judgment_dict):
  function check_data (line 687) | def check_data(questions, model_answers, ref_answers, models, judges):
  function get_model_list (line 708) | def get_model_list(answer_dir):

FILE: fastchat/llm_judge/compute_agreement.py
  function get_judge_name (line 15) | def get_judge_name(judge):
  function revert (line 24) | def revert(vote):
  function get_mt_bench_votes_data (line 32) | def get_mt_bench_votes_data(raw_votes):
  function convertvote (line 54) | def convertvote(vote):
  function equalvote (line 60) | def equalvote(vote1, vote2):
  function get_mt_bench_agreement (line 67) | def get_mt_bench_agreement(data, judge1, judge2, ban):
  function run_mt_bench_agreement (line 101) | def run_mt_bench_agreement(judges, votefiles):

FILE: fastchat/llm_judge/gen_api_answer.py
  function get_answer (line 27) | def get_answer(

FILE: fastchat/llm_judge/gen_judgment.py
  function make_match (line 27) | def make_match(
  function make_match_all_pairs (line 68) | def make_match_all_pairs(
  function make_match_single (line 108) | def make_match_single(
  function make_judge_pairwise (line 137) | def make_judge_pairwise(judge_model, judge_prompts):
  function make_judge_single (line 153) | def make_judge_single(judge_model, judge_prompts):
  function play_a_match_wrapper (line 312) | def play_a_match_wrapper(match):

FILE: fastchat/llm_judge/gen_model_answer.py
  function run_eval (line 21) | def run_eval(
  function get_model_answers (line 74) | def get_model_answers(
  function reorg_answer_file (line 193) | def reorg_answer_file(answer_file):

FILE: fastchat/llm_judge/qa_browser.py
  function display_question (line 37) | def display_question(category_selector, request: gr.Request):
  function display_pairwise_answer (line 45) | def display_pairwise_answer(
  function display_single_answer (line 84) | def display_single_answer(question_selector, model_selector1, request: g...
  function post_process_answer (line 117) | def post_process_answer(x):
  function pairwise_to_gradio_chat_mds (line 125) | def pairwise_to_gradio_chat_mds(question, ans_a, ans_b, turn=None):
  function single_to_gradio_chat_mds (line 157) | def single_to_gradio_chat_mds(question, ans, turn=None):
  function build_question_selector_map (line 186) | def build_question_selector_map():
  function build_pairwise_browser_tab (line 196) | def build_pairwise_browser_tab():
  function build_single_answer_browser_tab (line 269) | def build_single_answer_browser_tab():
  function load_demo (line 355) | def load_demo():
  function build_demo (line 360) | def build_demo():

FILE: fastchat/llm_judge/show_result.py
  function display_result_single (line 9) | def display_result_single(args):
  function display_result_pairwise (line 39) | def display_result_pairwise(args):

FILE: fastchat/model/apply_delta.py
  function split_files (line 25) | def split_files(model_path, tmp_path, split_size):
  function apply_delta_low_cpu_mem (line 70) | def apply_delta_low_cpu_mem(base_model_path, target_model_path, delta_pa...
  function apply_delta (line 125) | def apply_delta(base_model_path, target_model_path, delta_path):

FILE: fastchat/model/apply_lora.py
  function apply_lora (line 17) | def apply_lora(base_model_path, target_model_path, lora_path):

FILE: fastchat/model/compression.py
  class CompressionConfig (line 24) | class CompressionConfig:
  class CLinear (line 39) | class CLinear(nn.Module):
    method __init__ (line 42) | def __init__(self, weight=None, bias=None, device=None):
    method forward (line 52) | def forward(self, input: Tensor) -> Tensor:
  function compress_module (line 59) | def compress_module(module, target_device):
  function get_compressed_list (line 72) | def get_compressed_list(module, prefix=""):
  function apply_compressed_weight (line 88) | def apply_compressed_weight(module, compressed_state_dict, target_device...
  function load_compress_model (line 109) | def load_compress_model(model_path, device, torch_dtype, use_fast, revis...
  function compress (line 226) | def compress(tensor, config):
  function decompress (line 279) | def decompress(packed_data, config):

FILE: fastchat/model/convert_fp16.py
  function convert_fp16 (line 11) | def convert_fp16(in_checkpoint, out_checkpoint):

FILE: fastchat/model/llama_condense_monkey_patch.py
  class CondenseRotaryEmbedding (line 10) | class CondenseRotaryEmbedding(torch.nn.Module):
    method __init__ (line 11) | def __init__(
    method forward (line 42) | def forward(self, x, seq_len=None):
  function replace_llama_with_condense (line 68) | def replace_llama_with_condense(ratio):

FILE: fastchat/model/make_delta.py
  function make_delta (line 14) | def make_delta(base_model_path, target_model_path, delta_path):

FILE: fastchat/model/model_adapter.py
  class BaseModelAdapter (line 97) | class BaseModelAdapter:
    method match (line 102) | def match(self, model_path: str):
    method load_model (line 105) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method load_compress_model (line 134) | def load_compress_model(self, model_path, device, torch_dtype, revisio...
    method get_default_conv_template (line 143) | def get_default_conv_template(self, model_path: str) -> Conversation:
  function register_model_adapter (line 152) | def register_model_adapter(cls):
  function get_model_adapter (line 158) | def get_model_adapter(model_path: str) -> BaseModelAdapter:
  function raise_warning_for_incompatible_cpu_offloading_configuration (line 175) | def raise_warning_for_incompatible_cpu_offloading_configuration(
  function load_model (line 201) | def load_model(
  function get_conversation_template (line 398) | def get_conversation_template(model_path: str) -> Conversation:
  function get_generate_stream_function (line 404) | def get_generate_stream_function(model: torch.nn.Module, model_path: str):
  function add_model_args (line 488) | def add_model_args(parser):
  function remove_parent_directory_name (line 620) | def remove_parent_directory_name(model_path):
  class PeftModelAdapter (line 630) | class PeftModelAdapter:
    method match (line 633) | def match(self, model_path: str):
    method load_model (line 639) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 688) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class VicunaAdapter (line 702) | class VicunaAdapter(BaseModelAdapter):
    method match (line 707) | def match(self, model_path: str):
    method load_model (line 710) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 723) | def get_default_conv_template(self, model_path: str) -> Conversation:
    method raise_warning_for_old_weights (line 728) | def raise_warning_for_old_weights(self, model):
  class AiroborosAdapter (line 740) | class AiroborosAdapter(BaseModelAdapter):
    method match (line 743) | def match(self, model_path: str):
    method get_default_conv_template (line 748) | def get_default_conv_template(self, model_path: str) -> Conversation:
    method load_model (line 755) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
  class LongChatAdapter (line 771) | class LongChatAdapter(BaseModelAdapter):
    method match (line 776) | def match(self, model_path: str):
    method load_model (line 779) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 796) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class GoogleT5Adapter (line 800) | class GoogleT5Adapter(BaseModelAdapter):
    method match (line 803) | def match(self, model_path: str):
    method load_model (line 809) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
  class KoalaAdapter (line 821) | class KoalaAdapter(BaseModelAdapter):
    method match (line 826) | def match(self, model_path: str):
    method get_default_conv_template (line 829) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class AlpacaAdapter (line 833) | class AlpacaAdapter(BaseModelAdapter):
    method match (line 838) | def match(self, model_path: str):
    method get_default_conv_template (line 841) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class ChatGLMAdapter (line 845) | class ChatGLMAdapter(BaseModelAdapter):
    method match (line 848) | def match(self, model_path: str):
    method load_model (line 851) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 869) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class CodeGeexAdapter (line 878) | class CodeGeexAdapter(BaseModelAdapter):
    method match (line 881) | def match(self, model_path: str):
    method load_model (line 884) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 894) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class DollyV2Adapter (line 898) | class DollyV2Adapter(BaseModelAdapter):
    method match (line 901) | def match(self, model_path: str):
    method load_model (line 904) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 918) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class OasstPythiaAdapter (line 922) | class OasstPythiaAdapter(BaseModelAdapter):
    method match (line 925) | def match(self, model_path: str):
    method get_default_conv_template (line 929) | def get_default_conv_template(self, model_path: str) -> Conversation:
    method load_model (line 932) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
  class OasstLLaMAAdapter (line 939) | class OasstLLaMAAdapter(BaseModelAdapter):
    method match (line 944) | def match(self, model_path: str):
    method get_default_conv_template (line 950) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class OpenChat35Adapter (line 954) | class OpenChat35Adapter(BaseModelAdapter):
    method match (line 957) | def match(self, model_path: str):
    method get_default_conv_template (line 964) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class TenyxChatAdapter (line 968) | class TenyxChatAdapter(BaseModelAdapter):
    method match (line 971) | def match(self, model_path: str):
    method get_default_conv_template (line 974) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class PythiaAdapter (line 978) | class PythiaAdapter(BaseModelAdapter):
    method match (line 981) | def match(self, model_path: str):
    method load_model (line 984) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
  class StableLMAdapter (line 991) | class StableLMAdapter(BaseModelAdapter):
    method match (line 994) | def match(self, model_path: str):
    method get_default_conv_template (line 997) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class MPTAdapter (line 1001) | class MPTAdapter(BaseModelAdapter):
    method match (line 1004) | def match(self, model_path: str):
    method load_model (line 1008) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1024) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class BaizeAdapter (line 1040) | class BaizeAdapter(BaseModelAdapter):
    method match (line 1045) | def match(self, model_path: str):
    method get_default_conv_template (line 1048) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class RwkvAdapter (line 1052) | class RwkvAdapter(BaseModelAdapter):
    method match (line 1055) | def match(self, model_path: str):
    method load_model (line 1058) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1068) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class OpenBuddyAdapter (line 1072) | class OpenBuddyAdapter(BaseModelAdapter):
    method match (line 1077) | def match(self, model_path: str):
    method get_default_conv_template (line 1080) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class PhoenixAdapter (line 1084) | class PhoenixAdapter(BaseModelAdapter):
    method match (line 1087) | def match(self, model_path: str):
    method get_default_conv_template (line 1090) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class ReaLMAdapter (line 1094) | class ReaLMAdapter(BaseModelAdapter):
    method match (line 1097) | def match(self, model_path: str):
    method load_model (line 1100) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1107) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class ChatGPTAdapter (line 1111) | class ChatGPTAdapter(BaseModelAdapter):
    method match (line 1114) | def match(self, model_path: str):
    method load_model (line 1117) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1120) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class AzureOpenAIAdapter (line 1144) | class AzureOpenAIAdapter(BaseModelAdapter):
    method match (line 1147) | def match(self, model_path: str):
    method load_model (line 1150) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1153) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class PplxAIAdapter (line 1157) | class PplxAIAdapter(BaseModelAdapter):
    method match (line 1160) | def match(self, model_path: str):
    method load_model (line 1166) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1169) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class ClaudeAdapter (line 1173) | class ClaudeAdapter(BaseModelAdapter):
    method match (line 1176) | def match(self, model_path: str):
    method load_model (line 1179) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1182) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class BardAdapter (line 1194) | class BardAdapter(BaseModelAdapter):
    method match (line 1197) | def match(self, model_path: str):
    method load_model (line 1200) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1203) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class PaLM2Adapter (line 1207) | class PaLM2Adapter(BaseModelAdapter):
    method match (line 1210) | def match(self, model_path: str):
    method load_model (line 1213) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1216) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class GeminiAdapter (line 1220) | class GeminiAdapter(BaseModelAdapter):
    method match (line 1223) | def match(self, model_path: str):
    method load_model (line 1226) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1229) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class BiLLaAdapter (line 1233) | class BiLLaAdapter(BaseModelAdapter):
    method match (line 1236) | def match(self, model_path: str):
    method get_default_conv_template (line 1239) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class RedPajamaINCITEAdapter (line 1243) | class RedPajamaINCITEAdapter(BaseModelAdapter):
    method match (line 1246) | def match(self, model_path: str):
    method load_model (line 1249) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1259) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class H2OGPTAdapter (line 1263) | class H2OGPTAdapter(BaseModelAdapter):
    method match (line 1268) | def match(self, model_path: str):
    method get_default_conv_template (line 1271) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class RobinAdapter (line 1275) | class RobinAdapter(BaseModelAdapter):
    method match (line 1280) | def match(self, model_path: str):
    method get_default_conv_template (line 1283) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class SnoozyAdapter (line 1287) | class SnoozyAdapter(BaseModelAdapter):
    method match (line 1292) | def match(self, model_path: str):
    method get_default_conv_template (line 1296) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class WizardLMAdapter (line 1300) | class WizardLMAdapter(BaseModelAdapter):
    method match (line 1305) | def match(self, model_path: str):
    method get_default_conv_template (line 1308) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class ManticoreAdapter (line 1318) | class ManticoreAdapter(BaseModelAdapter):
    method match (line 1323) | def match(self, model_path: str):
    method get_default_conv_template (line 1326) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class GuanacoAdapter (line 1330) | class GuanacoAdapter(BaseModelAdapter):
    method match (line 1335) | def match(self, model_path: str):
    method load_model (line 1338) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1350) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class ChangGPTAdapter (line 1354) | class ChangGPTAdapter(BaseModelAdapter):
    method match (line 1357) | def match(self, model_path: str):
    method get_default_conv_template (line 1361) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class CamelAdapter (line 1365) | class CamelAdapter(BaseModelAdapter):
    method match (line 1370) | def match(self, model_path: str):
    method get_default_conv_template (line 1373) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class TuluAdapter (line 1377) | class TuluAdapter(BaseModelAdapter):
    method match (line 1382) | def match(self, model_path: str):
    method get_default_conv_template (line 1385) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class FalconAdapter (line 1389) | class FalconAdapter(BaseModelAdapter):
    method match (line 1392) | def match(self, model_path: str):
    method load_model (line 1395) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1410) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class FalconChatAdapter (line 1414) | class FalconChatAdapter(BaseModelAdapter):
    method match (line 1415) | def match(self, model_path: str):
    method get_default_conv_template (line 1418) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class TigerBotAdapter (line 1422) | class TigerBotAdapter(BaseModelAdapter):
    method match (line 1425) | def match(self, model_path: str):
    method load_model (line 1428) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1443) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class BaichuanAdapter (line 1447) | class BaichuanAdapter(BaseModelAdapter):
    method match (line 1450) | def match(self, model_path: str):
    method load_model (line 1453) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1466) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class XGenAdapter (line 1475) | class XGenAdapter(BaseModelAdapter):
    method match (line 1478) | def match(self, model_path: str):
    method load_model (line 1481) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1495) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class NousHermesAdapter (line 1499) | class NousHermesAdapter(BaseModelAdapter):
    method match (line 1504) | def match(self, model_path: str):
    method get_default_conv_template (line 1507) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class InternLMChatAdapter (line 1511) | class InternLMChatAdapter(BaseModelAdapter):
    method match (line 1514) | def match(self, model_path: str):
    method load_model (line 1517) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1533) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class StarChatAdapter (line 1537) | class StarChatAdapter(BaseModelAdapter):
    method match (line 1540) | def match(self, model_path: str):
    method get_default_conv_template (line 1543) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class MistralAdapter (line 1547) | class MistralAdapter(BaseModelAdapter):
    method match (line 1550) | def match(self, model_path: str):
    method load_model (line 1553) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1559) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class Llama2Adapter (line 1563) | class Llama2Adapter(BaseModelAdapter):
    method match (line 1566) | def match(self, model_path: str):
    method load_model (line 1569) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1575) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class Llama3Adapter (line 1579) | class Llama3Adapter(BaseModelAdapter):
    method match (line 1582) | def match(self, model_path: str):
    method load_model (line 1585) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1591) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class Llama31Adapter (line 1595) | class Llama31Adapter(BaseModelAdapter):
    method match (line 1598) | def match(self, model_path: str):
    method load_model (line 1606) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1612) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class GrokAdapter (line 1622) | class GrokAdapter(BaseModelAdapter):
    method match (line 1623) | def match(self, model_path: str):
    method get_default_conv_template (line 1626) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class CuteGPTAdapter (line 1632) | class CuteGPTAdapter(BaseModelAdapter):
    method match (line 1635) | def match(self, model_path: str):
    method load_model (line 1638) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1648) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class OpenOrcaAdapter (line 1652) | class OpenOrcaAdapter(BaseModelAdapter):
    method match (line 1664) | def match(self, model_path: str):
    method load_model (line 1670) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1682) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class DolphinAdapter (line 1688) | class DolphinAdapter(OpenOrcaAdapter):
    method match (line 1691) | def match(self, model_path: str):
    method get_default_conv_template (line 1694) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class Hermes2Adapter (line 1698) | class Hermes2Adapter(BaseModelAdapter):
    method match (line 1703) | def match(self, model_path: str):
    method load_model (line 1709) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1721) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class NousHermes2MixtralAdapter (line 1725) | class NousHermes2MixtralAdapter(BaseModelAdapter):
    method match (line 1728) | def match(self, model_path: str):
    method get_default_conv_template (line 1737) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class WizardCoderAdapter (line 1741) | class WizardCoderAdapter(BaseModelAdapter):
    method match (line 1746) | def match(self, model_path: str):
    method get_default_conv_template (line 1749) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class QwenChatAdapter (line 1755) | class QwenChatAdapter(BaseModelAdapter):
    method match (line 1773) | def match(self, model_path: str):
    method float_set (line 1776) | def float_set(self, config, option):
    method load_model (line 1790) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1825) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class SmaugChatAdapter (line 1829) | class SmaugChatAdapter(BaseModelAdapter):
    method match (line 1832) | def match(self, model_path: str):
    method get_default_conv_template (line 1835) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class BGEAdapter (line 1839) | class BGEAdapter(BaseModelAdapter):
    method match (line 1844) | def match(self, model_path: str):
    method load_model (line 1847) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1866) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class E5Adapter (line 1870) | class E5Adapter(BaseModelAdapter):
    method match (line 1875) | def match(self, model_path: str):
    method load_model (line 1878) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1895) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class AquilaChatAdapter (line 1899) | class AquilaChatAdapter(BaseModelAdapter):
    method match (line 1908) | def match(self, model_path: str):
    method load_model (line 1911) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1925) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class Lamma2ChineseAdapter (line 1939) | class Lamma2ChineseAdapter(BaseModelAdapter):
    method match (line 1942) | def match(self, model_path: str):
    method load_model (line 1945) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1960) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class Lamma2ChineseAlpacaAdapter (line 1964) | class Lamma2ChineseAlpacaAdapter(BaseModelAdapter):
    method match (line 1967) | def match(self, model_path: str):
    method load_model (line 1970) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 1985) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class VigogneAdapter (line 1989) | class VigogneAdapter(BaseModelAdapter):
    method match (line 1994) | def match(self, model_path: str):
    method load_model (line 1997) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 2013) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class OpenLLaMaOpenInstructAdapter (line 2021) | class OpenLLaMaOpenInstructAdapter(BaseModelAdapter):
    method match (line 2026) | def match(self, model_path: str):
    method load_model (line 2031) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 2047) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class CodeLlamaAdapter (line 2051) | class CodeLlamaAdapter(BaseModelAdapter):
    method match (line 2054) | def match(self, model_path: str):
    method load_model (line 2057) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 2063) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class StableVicunaAdapter (line 2067) | class StableVicunaAdapter(BaseModelAdapter):
    method match (line 2070) | def match(self, model_path: str):
    method load_model (line 2073) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 2079) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class PhindCodeLlamaAdapter (line 2083) | class PhindCodeLlamaAdapter(CodeLlamaAdapter):
    method match (line 2086) | def match(self, model_path: str):
    method get_default_conv_template (line 2089) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class Llama2ChangAdapter (line 2093) | class Llama2ChangAdapter(Llama2Adapter):
    method match (line 2096) | def match(self, model_path: str):
    method get_default_conv_template (line 2099) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class ZephyrAdapter (line 2103) | class ZephyrAdapter(BaseModelAdapter):
    method match (line 2106) | def match(self, model_path: str):
    method get_default_conv_template (line 2109) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class NotusAdapter (line 2113) | class NotusAdapter(BaseModelAdapter):
    method match (line 2116) | def match(self, model_path: str):
    method get_default_conv_template (line 2119) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class CatPPTAdapter (line 2123) | class CatPPTAdapter(BaseModelAdapter):
    method match (line 2126) | def match(self, model_path: str):
    method get_default_conv_template (line 2129) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class TinyLlamaAdapter (line 2133) | class TinyLlamaAdapter(BaseModelAdapter):
    method match (line 2136) | def match(self, model_path: str):
    method get_default_conv_template (line 2139) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class XwinLMAdapter (line 2143) | class XwinLMAdapter(BaseModelAdapter):
    method match (line 2148) | def match(self, model_path: str):
    method get_default_conv_template (line 2151) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class LemurAdapter (line 2155) | class LemurAdapter(BaseModelAdapter):
    method match (line 2160) | def match(self, model_path: str):
    method get_default_conv_template (line 2163) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class PygmalionAdapter (line 2167) | class PygmalionAdapter(BaseModelAdapter):
    method match (line 2172) | def match(self, model_path: str):
    method get_default_conv_template (line 2177) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class XdanAdapter (line 2181) | class XdanAdapter(BaseModelAdapter):
    method match (line 2184) | def match(self, model_path: str):
    method get_default_conv_template (line 2187) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class MicrosoftOrcaAdapter (line 2191) | class MicrosoftOrcaAdapter(BaseModelAdapter):
    method match (line 2196) | def match(self, model_path: str):
    method get_default_conv_template (line 2199) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class YiAdapter (line 2203) | class YiAdapter(BaseModelAdapter):
    method match (line 2206) | def match(self, model_path: str):
    method get_default_conv_template (line 2209) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class DeepseekCoderAdapter (line 2213) | class DeepseekCoderAdapter(BaseModelAdapter):
    method match (line 2216) | def match(self, model_path: str):
    method get_default_conv_template (line 2219) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class DeepseekChatAdapter (line 2223) | class DeepseekChatAdapter(BaseModelAdapter):
    method match (line 2228) | def match(self, model_path: str):
    method get_default_conv_template (line 2231) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class GeminiAdapter (line 2235) | class GeminiAdapter(BaseModelAdapter):
    method match (line 2238) | def match(self, model_path: str):
    method load_model (line 2241) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 2244) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class Yuan2Adapter (line 2250) | class Yuan2Adapter(BaseModelAdapter):
    method match (line 2253) | def match(self, model_path: str):
    method load_model (line 2256) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 2297) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class MetaMathAdapter (line 2301) | class MetaMathAdapter(BaseModelAdapter):
    method match (line 2304) | def match(self, model_path: str):
    method get_default_conv_template (line 2307) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class BagelAdapter (line 2311) | class BagelAdapter(BaseModelAdapter):
    method match (line 2314) | def match(self, model_path: str):
    method get_default_conv_template (line 2317) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class SolarAdapter (line 2321) | class SolarAdapter(BaseModelAdapter):
    method match (line 2324) | def match(self, model_path: str):
    method get_default_conv_template (line 2327) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class SteerLMAdapter (line 2331) | class SteerLMAdapter(BaseModelAdapter):
    method match (line 2334) | def match(self, model_path: str):
    method get_default_conv_template (line 2337) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class GemmaAdapter (line 2341) | class GemmaAdapter(BaseModelAdapter):
    method match (line 2344) | def match(self, model_path: str):
    method get_default_conv_template (line 2347) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class LlavaAdapter (line 2351) | class LlavaAdapter(BaseModelAdapter):
    method load_model (line 2354) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method match (line 2358) | def match(self, model_path: str):
    method get_default_conv_template (line 2361) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class YuanAdapter (line 2369) | class YuanAdapter(BaseModelAdapter):
    method match (line 2372) | def match(self, model_path: str):
    method load_model (line 2375) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 2399) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class OlmoAdapter (line 2403) | class OlmoAdapter(BaseModelAdapter):
    method match (line 2406) | def match(self, model_path: str):
    method get_default_conv_template (line 2409) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class YandexGPTAdapter (line 2413) | class YandexGPTAdapter(BaseModelAdapter):
    method match (line 2416) | def match(self, model_path: str):
    method get_default_conv_template (line 2419) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class CllmAdapter (line 2423) | class CllmAdapter(BaseModelAdapter):
    method match (line 2426) | def match(self, model_path: str):
    method load_model (line 2429) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 2450) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class CohereAdapter (line 2454) | class CohereAdapter(BaseModelAdapter):
    method match (line 2457) | def match(self, model_path: str):
    method load_model (line 2460) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 2463) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class DBRXAdapter (line 2467) | class DBRXAdapter(BaseModelAdapter):
    method match (line 2470) | def match(self, model_path: str):
    method load_model (line 2473) | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
    method get_default_conv_template (line 2476) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class RekaAdapter (line 2480) | class RekaAdapter(BaseModelAdapter):
    method match (line 2483) | def match(self, model_path: str):
    method get_default_conv_template (line 2486) | def get_default_conv_template(self, model_path: str) -> Conversation:
  class NoSystemAdapter (line 2490) | class NoSystemAdapter(BaseModelAdapter):
    method match (line 2491) | def match(self, model_path: str):
    method get_default_conv_template (line 2499) | def get_default_conv_template(self, model_path: str) -> Conversation:

FILE: fastchat/model/model_chatglm.py
  class InvalidScoreLogitsProcessor (line 11) | class InvalidScoreLogitsProcessor(LogitsProcessor):
    method __call__ (line 12) | def __call__(
  function process_response (line 24) | def process_response(response):
  function recover_message_list (line 40) | def recover_message_list(prompt):
  function generate_stream_chatglm (line 66) | def generate_stream_chatglm(

FILE: fastchat/model/model_cllm.py
  function get_jacobian_trajectory (line 21) | def get_jacobian_trajectory(
  function generate_stream_cllm (line 109) | def generate_stream_cllm(

FILE: fastchat/model/model_codet5p.py
  function generate_stream_codet5p (line 14) | def generate_stream_codet5p(

FILE: fastchat/model/model_exllama.py
  function generate_stream_exllama (line 8) | def generate_stream_exllama(

FILE: fastchat/model/model_falcon.py
  function generate_stream_falcon (line 13) | def generate_stream_falcon(

FILE: fastchat/model/model_registry.py
  function register_model_info (line 12) | def register_model_info(
  function get_model_info (line 21) | def get_model_info(name: str) -> ModelInfo:

FILE: fastchat/model/model_xfastertransformer.py
  function generate_stream_xft (line 9) | def generate_stream_xft(

FILE: fastchat/model/model_yuan2.py
  function generate_stream_yuan2 (line 13) | def generate_stream_yuan2(

FILE: fastchat/model/monkey_patch_non_inplace.py
  function rotate_half (line 13) | def rotate_half(x):
  function apply_rotary_pos_emb (line 20) | def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
  function forward (line 30) | def forward(
  function replace_llama_attn_with_non_inplace_operations (line 117) | def replace_llama_attn_with_non_inplace_operations():

FILE: fastchat/model/rwkv_model.py
  class RwkvModel (line 14) | class RwkvModel:
    method __init__ (line 15) | def __init__(self, model_path):
    method to (line 27) | def to(self, target):
    method __call__ (line 30) | def __call__(self, input_ids, use_cache, past_key_values=None):
    method generate (line 40) | def generate(

FILE: fastchat/model/upload_hub.py
  function upload_hub (line 14) | def upload_hub(model_path, hub_repo_id, component, private):

FILE: fastchat/modules/awq.py
  class AWQConfig (line 10) | class AWQConfig:
  function load_awq_quantized (line 24) | def load_awq_quantized(model_name, awq_config: AWQConfig, device):
  function find_awq_ckpt (line 75) | def find_awq_ckpt(awq_config: AWQConfig):

FILE: fastchat/modules/exllama.py
  class ExllamaConfig (line 6) | class ExllamaConfig:
  class ExllamaModel (line 12) | class ExllamaModel:
    method __init__ (line 13) | def __init__(self, exllama_model, exllama_cache):
  function load_exllama_model (line 19) | def load_exllama_model(model_path, exllama_config: ExllamaConfig):

FILE: fastchat/modules/gptq.py
  class GptqConfig (line 11) | class GptqConfig:
  function load_gptq_quantized (line 29) | def load_gptq_quantized(model_name, gptq_config: GptqConfig):
  function find_gptq_ckpt (line 65) | def find_gptq_ckpt(gptq_config: GptqConfig):

FILE: fastchat/modules/xfastertransformer.py
  class XftConfig (line 6) | class XftConfig:
  class XftModel (line 18) | class XftModel:
    method __init__ (line 19) | def __init__(self, xft_model, xft_config):
  function load_xft_model (line 24) | def load_xft_model(model_path, xft_config: XftConfig):

FILE: fastchat/protocol/api_protocol.py
  class ErrorResponse (line 9) | class ErrorResponse(BaseModel):
  class ModelPermission (line 15) | class ModelPermission(BaseModel):
  class ModelCard (line 30) | class ModelCard(BaseModel):
  class ModelList (line 40) | class ModelList(BaseModel):
  class UsageInfo (line 45) | class UsageInfo(BaseModel):
  class APIChatCompletionRequest (line 51) | class APIChatCompletionRequest(BaseModel):
  class ChatMessage (line 67) | class ChatMessage(BaseModel):
  class ChatCompletionResponseChoice (line 72) | class ChatCompletionResponseChoice(BaseModel):
  class ChatCompletionResponse (line 78) | class ChatCompletionResponse(BaseModel):
  class DeltaMessage (line 87) | class DeltaMessage(BaseModel):
  class ChatCompletionResponseStreamChoice (line 92) | class ChatCompletionResponseStreamChoice(BaseModel):
  class ChatCompletionStreamResponse (line 98) | class ChatCompletionStreamResponse(BaseModel):
  class APITokenCheckRequestItem (line 106) | class APITokenCheckRequestItem(BaseModel):
  class APITokenCheckRequest (line 112) | class APITokenCheckRequest(BaseModel):
  class APITokenCheckResponseItem (line 116) | class APITokenCheckResponseItem(BaseModel):
  class APITokenCheckResponse (line 122) | class APITokenCheckResponse(BaseModel):
  class CompletionRequest (line 126) | class CompletionRequest(BaseModel):
  class CompletionResponseChoice (line 144) | class CompletionResponseChoice(BaseModel):
  class CompletionResponse (line 151) | class CompletionResponse(BaseModel):
  class CompletionResponseStreamChoice (line 160) | class CompletionResponseStreamChoice(BaseModel):
  class CompletionStreamResponse (line 167) | class CompletionStreamResponse(BaseModel):

FILE: fastchat/protocol/openai_api_protocol.py
  class ErrorResponse (line 9) | class ErrorResponse(BaseModel):
  class ModelPermission (line 15) | class ModelPermission(BaseModel):
  class ModelCard (line 30) | class ModelCard(BaseModel):
  class ModelList (line 40) | class ModelList(BaseModel):
  class UsageInfo (line 45) | class UsageInfo(BaseModel):
  class LogProbs (line 51) | class LogProbs(BaseModel):
  class ChatCompletionRequest (line 58) | class ChatCompletionRequest(BaseModel):
  class ChatMessage (line 77) | class ChatMessage(BaseModel):
  class ChatCompletionResponseChoice (line 82) | class ChatCompletionResponseChoice(BaseModel):
  class ChatCompletionResponse (line 88) | class ChatCompletionResponse(BaseModel):
  class DeltaMessage (line 97) | class DeltaMessage(BaseModel):
  class ChatCompletionResponseStreamChoice (line 102) | class ChatCompletionResponseStreamChoice(BaseModel):
  class ChatCompletionStreamResponse (line 108) | class ChatCompletionStreamResponse(BaseModel):
  class TokenCheckRequestItem (line 116) | class TokenCheckRequestItem(BaseModel):
  class TokenCheckRequest (line 122) | class TokenCheckRequest(BaseModel):
  class TokenCheckResponseItem (line 126) | class TokenCheckResponseItem(BaseModel):
  class TokenCheckResponse (line 132) | class TokenCheckResponse(BaseModel):
  class EmbeddingsRequest (line 136) | class EmbeddingsRequest(BaseModel):
  class EmbeddingsResponse (line 144) | class EmbeddingsResponse(BaseModel):
  class CompletionRequest (line 151) | class CompletionRequest(BaseModel):
  class CompletionResponseChoice (line 171) | class CompletionResponseChoice(BaseModel):
  class CompletionResponse (line 178) | class CompletionResponse(BaseModel):
  class CompletionResponseStreamChoice (line 187) | class CompletionResponseStreamChoice(BaseModel):
  class CompletionStreamResponse (line 194) | class CompletionStreamResponse(BaseModel):

FILE: fastchat/serve/api_provider.py
  function get_api_provider_stream_iter (line 18) | def get_api_provider_stream_iter(
  function openai_api_stream_iter (line 268) | def openai_api_stream_iter(
  function column_api_stream_iter (line 364) | def column_api_stream_iter(
  function p2l_api_stream_iter (line 428) | def p2l_api_stream_iter(
  function upload_openai_file_to_gcs (line 496) | def upload_openai_file_to_gcs(file_id):
  function openai_assistant_api_stream_iter (line 511) | def openai_assistant_api_stream_iter(
  function anthropic_api_stream_iter (line 651) | def anthropic_api_stream_iter(model_name, prompt, temperature, top_p, ma...
  function anthropic_message_api_stream_iter (line 685) | def anthropic_message_api_stream_iter(
  function gemini_api_stream_iter (line 756) | def gemini_api_stream_iter(
  function ai2_api_stream_iter (line 846) | def ai2_api_stream_iter(
  function mistral_api_stream_iter (line 919) | def mistral_api_stream_iter(
  function nvidia_api_stream_iter (line 978) | def nvidia_api_stream_iter(
  function yandexgpt_api_stream_iter (line 1036) | def yandexgpt_api_stream_iter(
  function cohere_api_stream_iter (line 1077) | def cohere_api_stream_iter(
  function vertex_api_stream_iter (line 1146) | def vertex_api_stream_iter(model_name, messages, temperature, top_p, max...
  function reka_api_stream_iter (line 1212) | def reka_api_stream_iter(
  function metagen_api_stream_iter (line 1272) | def metagen_api_stream_iter(

FILE: fastchat/serve/base_model_worker.py
  function heart_beat_worker (line 21) | def heart_beat_worker(obj):
  class BaseModelWorker (line 27) | class BaseModelWorker:
    method __init__ (line 28) | def __init__(
    method make_conv_template (line 63) | def make_conv_template(
    method init_heart_beat (line 80) | def init_heart_beat(self):
    method register_to_controller (line 89) | def register_to_controller(self):
    method send_heart_beat (line 102) | def send_heart_beat(self):
    method get_queue_length (line 131) | def get_queue_length(self):
    method get_status (line 145) | def get_status(self):
    method count_token (line 152) | def count_token(self, params):
    method get_conv_template (line 167) | def get_conv_template(self):
    method generate_stream_gate (line 170) | def generate_stream_gate(self, params):
    method generate_gate (line 173) | def generate_gate(self, params):
    method get_embeddings (line 176) | def get_embeddings(self, params):
  function release_worker_semaphore (line 180) | def release_worker_semaphore():
  function acquire_worker_semaphore (line 184) | def acquire_worker_semaphore():
  function create_background_tasks (line 190) | def create_background_tasks():
  function api_generate_stream (line 197) | async def api_generate_stream(request: Request):
  function api_generate (line 206) | async def api_generate(request: Request):
  function api_get_embeddings (line 215) | async def api_get_embeddings(request: Request):
  function api_get_status (line 224) | async def api_get_status(request: Request):
  function api_count_token (line 229) | async def api_count_token(request: Request):
  function api_get_conv (line 235) | async def api_get_conv(request: Request):
  function api_model_details (line 240) | async def api_model_details(request: Request):

FILE: fastchat/serve/call_monitor.py
  class Monitor (line 15) | class Monitor:
    method __init__ (line 18) | def __init__(self, log_dir_list: list):
    method update_stats (line 25) | async def update_stats(self, num_file=1) -> None:
    method get_model_call_limit (line 68) | def get_model_call_limit(self, model: str) -> int:
    method update_model_call_limit (line 73) | def update_model_call_limit(self, model: str, limit: int) -> bool:
    method is_model_limit_reached (line 79) | def is_model_limit_reached(self, model: str) -> bool:
    method is_user_limit_reached (line 89) | def is_user_limit_reached(self, model: str, user_id: str) -> bool:
    method get_model_call_stats (line 104) | def get_model_call_stats(
    method get_user_call_stats (line 124) | def get_user_call_stats(
    method get_num_users (line 153) | def get_num_users(self, most_recent_min: int = 60) -> int:
  function app_startup (line 165) | async def app_startup():
  function get_model_call_limit (line 170) | async def get_model_call_limit(model: str):
  function update_model_call_limit (line 175) | async def update_model_call_limit(model: str, limit: int):
  function is_limit_reached (line 182) | async def is_limit_reached(model: str, user_id: str):
  function get_num_users (line 197) | async def get_num_users():
  function get_num_users_day (line 202) | async def get_num_users_day():
  function get_user_call_stats (line 207) | async def get_user_call_stats(
  function get_model_call_stats (line 216) | async def get_model_call_stats(

FILE: fastchat/serve/cli.py
  class SimpleChatIO (line 40) | class SimpleChatIO(ChatIO):
    method __init__ (line 41) | def __init__(self, multiline: bool = False):
    method prompt_for_input (line 44) | def prompt_for_input(self, role) -> str:
    method prompt_for_output (line 58) | def prompt_for_output(self, role: str):
    method stream_output (line 61) | def stream_output(self, output_stream):
    method print_output (line 73) | def print_output(self, text: str):
  class RichChatIO (line 77) | class RichChatIO(ChatIO):
    method _ (line 81) | def _(event):
    method __init__ (line 84) | def __init__(self, multiline: bool = False, mouse: bool = False):
    method prompt_for_input (line 94) | def prompt_for_input(self, role) -> str:
    method prompt_for_output (line 107) | def prompt_for_output(self, role: str):
    method stream_output (line 110) | def stream_output(self, output_stream):
    method print_output (line 148) | def print_output(self, text: str):
  class ProgrammaticChatIO (line 152) | class ProgrammaticChatIO(ChatIO):
    method prompt_for_input (line 153) | def prompt_for_input(self, role) -> str:
    method prompt_for_output (line 173) | def prompt_for_output(self, role: str):
    method stream_output (line 176) | def stream_output(self, output_stream):
    method print_output (line 188) | def print_output(self, text: str):
  function main (line 192) | def main(args):

FILE: fastchat/serve/controller.py
  class DispatchMethod (line 34) | class DispatchMethod(Enum):
    method from_str (line 39) | def from_str(cls, name):
  class WorkerInfo (line 49) | class WorkerInfo:
  function heart_beat_controller (line 58) | def heart_beat_controller(controller):
  class Controller (line 64) | class Controller:
    method __init__ (line 65) | def __init__(self, dispatch_method: str):
    method register_worker (line 75) | def register_worker(
    method get_worker_status (line 104) | def get_worker_status(self, worker_name: str):
    method remove_worker (line 117) | def remove_worker(self, worker_name: str):
    method refresh_all_workers (line 120) | def refresh_all_workers(self):
    method list_models (line 130) | def list_models(self):
    method list_multimodal_models (line 138) | def list_multimodal_models(self):
    method list_language_models (line 147) | def list_language_models(self):
    method get_worker_address (line 156) | def get_worker_address(self, model_name: str):
    method receive_heart_beat (line 209) | def receive_heart_beat(self, worker_name: str, queue_length: int):
    method remove_stale_workers_by_expiration (line 219) | def remove_stale_workers_by_expiration(self):
    method handle_no_worker (line 229) | def handle_no_worker(self, params):
    method handle_worker_timeout (line 237) | def handle_worker_timeout(self, worker_address):
    method worker_api_get_status (line 247) | def worker_api_get_status(self):
    method worker_api_generate_stream (line 266) | def worker_api_generate_stream(self, params):
  function register_worker (line 289) | async def register_worker(request: Request):
  function refresh_all_workers (line 300) | async def refresh_all_workers():
  function list_models (line 305) | async def list_models():
  function list_multimodal_models (line 311) | async def list_multimodal_models():
  function list_language_models (line 317) | async def list_language_models():
  function get_worker_address (line 323) | async def get_worker_address(request: Request):
  function receive_heart_beat (line 330) | async def receive_heart_beat(request: Request):
  function worker_api_generate_stream (line 337) | async def worker_api_generate_stream(request: Request):
  function worker_api_get_status (line 344) | async def worker_api_get_status(request: Request):
  function worker_api_get_status (line 349) | async def worker_api_get_status(request: Request):
  function create_controller (line 353) | def create_controller():

FILE: fastchat/serve/dashinfer_worker.py
  function download_model (line 32) | def download_model(model_id, revision):
  class DashInferWorker (line 54) | class DashInferWorker(BaseModelWorker):
    method __init__ (line 55) | def __init__(
    method generate_stream (line 96) | async def generate_stream(self, params):
    method generate (line 208) | async def generate(self, params):
  function release_worker_semaphore (line 214) | def release_worker_semaphore():
  function acquire_worker_semaphore (line 218) | def acquire_worker_semaphore():
  function create_background_tasks (line 224) | def create_background_tasks():
  function api_generate_stream (line 231) | async def api_generate_stream(request: Request):
  function api_generate (line 240) | async def api_generate(request: Request):
  function api_get_status (line 249) | async def api_get_status(request: Request):
  function api_count_token (line 254) | async def api_count_token(request: Request):
  function api_get_conv (line 260) | async def api_get_conv(request: Request):
  function api_model_details (line 265) | async def api_model_details(request: Request):

FILE: fastchat/serve/gradio_block_arena_anony.py
  function set_global_vars_anony (line 51) | def set_global_vars_anony(enable_moderation_):
  function load_demo_side_by_side_anony (line 56) | def load_demo_side_by_side_anony(models_, url_params):
  function vote_last_response (line 69) | def vote_last_response(states, vote_type, model_selectors, request: gr.R...
  function leftvote_last_response (line 102) | def leftvote_last_response(
  function rightvote_last_response (line 112) | def rightvote_last_response(
  function tievote_last_response (line 122) | def tievote_last_response(
  function bothbad_vote_last_response (line 132) | def bothbad_vote_last_response(
  function regenerate (line 142) | def regenerate(state0, state1, request: gr.Request):
  function clear_history (line 156) | def clear_history(request: gr.Request):
  function share_click (line 170) | def share_click(state0, state1, model_selector0, model_selector1, reques...
  function get_sample_weight (line 193) | def get_sample_weight(model, outage_models, sampling_weights, sampling_b...
  function is_model_match_pattern (line 202) | def is_model_match_pattern(model, patterns):
  function get_battle_pair (line 212) | def get_battle_pair(
  function add_text (line 269) | def add_text(
  function bot_response_multi (line 358) | def bot_response_multi(
  function build_side_by_side_ui_anony (line 441) | def build_side_by_side_ui_anony(models):

FILE: fastchat/serve/gradio_block_arena_named.py
  function set_global_vars_named (line 44) | def set_global_vars_named(enable_moderation_):
  function load_demo_side_by_side_named (line 49) | def load_demo_side_by_side_named(models, url_params):
  function vote_last_response (line 68) | def vote_last_response(states, vote_type, model_selectors, request: gr.R...
  function leftvote_last_response (line 81) | def leftvote_last_response(
  function rightvote_last_response (line 91) | def rightvote_last_response(
  function tievote_last_response (line 101) | def tievote_last_response(
  function bothbad_vote_last_response (line 111) | def bothbad_vote_last_response(
  function regenerate (line 121) | def regenerate(state0, state1, request: gr.Request):
  function clear_history (line 135) | def clear_history(request: gr.Request):
  function share_click (line 146) | def share_click(state0, state1, model_selector0, model_selector1, reques...
  function add_text (line 154) | def add_text(
  function bot_response_multi (line 224) | def bot_response_multi(
  function flash_buttons (line 304) | def flash_buttons():
  function build_side_by_side_ui_named (line 314) | def build_side_by_side_ui_named(models):

FILE: fastchat/serve/gradio_block_arena_vision.py
  function get_vqa_sample (line 70) | def get_vqa_sample():
  function set_visible_image (line 77) | def set_visible_image(textbox):
  function set_invisible_image (line 89) | def set_invisible_image():
  function add_image (line 93) | def add_image(textbox):
  function vote_last_response (line 101) | def vote_last_response(state, vote_type, model_selector, request: gr.Req...
  function upvote_last_response (line 115) | def upvote_last_response(state, model_selector, request: gr.Request):
  function downvote_last_response (line 122) | def downvote_last_response(state, model_selector, request: gr.Request):
  function flag_last_response (line 129) | def flag_last_response(state, model_selector, request: gr.Request):
  function regenerate (line 136) | def regenerate(state, request: gr.Request):
  function clear_history (line 146) | def clear_history(request: gr.Request):
  function clear_history_example (line 155) | def clear_history_example(request: gr.Request):
  function report_csam_image (line 165) | def report_csam_image(state, image):
  function _prepare_text_with_image (line 169) | def _prepare_text_with_image(state, text, images, csam_flag):
  function convert_images_to_conversation_format (line 181) | def convert_images_to_conversation_format(images):
  function moderate_input (line 194) | def moderate_input(state, text, all_conv_text, model_list, images, ip):
  function add_text (line 219) | def add_text(
  function build_single_vision_language_model_ui (line 298) | def build_single_vision_language_model_ui(

FILE: fastchat/serve/gradio_block_arena_vision_anony.py
  function get_vqa_sample (line 100) | def get_vqa_sample():
  function load_demo_side_by_side_vision_anony (line 107) | def load_demo_side_by_side_vision_anony():
  function clear_history_example (line 117) | def clear_history_example(request: gr.Request):
  function vote_last_response (line 130) | def vote_last_response(states, vote_type, model_selectors, request: gr.R...
  function leftvote_last_response (line 173) | def leftvote_last_response(
  function rightvote_last_response (line 183) | def rightvote_last_response(
  function tievote_last_response (line 193) | def tievote_last_response(
  function bothbad_vote_last_response (line 203) | def bothbad_vote_last_response(
  function regenerate (line 213) | def regenerate(state0, state1, request: gr.Request):
  function clear_history (line 232) | def clear_history(request: gr.Request):
  function add_text (line 246) | def add_text(
  function build_side_by_side_vision_ui_anony (line 378) | def build_side_by_side_vision_ui_anony(context: Context, random_question...

FILE: fastchat/serve/gradio_block_arena_vision_named.py
  function load_demo_side_by_side_vision_named (line 72) | def load_demo_side_by_side_vision_named(context: Context):
  function clear_history_example (line 95) | def clear_history_example(request: gr.Request):
  function vote_last_response (line 106) | def vote_last_response(states, vote_type, model_selectors, request: gr.R...
  function leftvote_last_response (line 120) | def leftvote_last_response(
  function rightvote_last_response (line 130) | def rightvote_last_response(
  function tievote_last_response (line 140) | def tievote_last_response(
  function bothbad_vote_last_response (line 150) | def bothbad_vote_last_response(
  function regenerate (line 160) | def regenerate(state0, state1, request: gr.Request):
  function clear_history (line 179) | def clear_history(request: gr.Request):
  function add_text (line 190) | def add_text(
  function build_side_by_side_vision_ui_named (line 305) | def build_side_by_side_vision_ui_named(context: Context, random_question...

FILE: fastchat/serve/gradio_global_state.py
  class Context (line 6) | class Context:

FILE: fastchat/serve/gradio_web_server.py
  class State (line 114) | class State:
    method __init__ (line 115) | def __init__(self, model_name, is_vision=False):
    method update_ans_models (line 133) | def update_ans_models(self, ans: str) -> None:
    method update_router_outputs (line 136) | def update_router_outputs(self, outputs: Dict[str, float]) -> None:
    method init_system_prompt (line 139) | def init_system_prompt(self, conv, is_vision):
    method to_gradio_chatbot (line 153) | def to_gradio_chatbot(self):
    method dict (line 156) | def dict(self):
  function set_global_vars (line 184) | def set_global_vars(
  function get_conv_log_filename (line 195) | def get_conv_log_filename(is_vision=False, has_csam_image=False):
  function get_model_list (line 208) | def get_model_list(controller_url, register_api_endpoint_file, vision_ar...
  function load_demo_single (line 255) | def load_demo_single(context: Context, query_params):
  function load_demo (line 274) | def load_demo(url_params, request: gr.Request):
  function vote_last_response (line 288) | def vote_last_response(state, vote_type, model_selector, request: gr.Req...
  function upvote_last_response (line 305) | def upvote_last_response(state, model_selector, request: gr.Request):
  function downvote_last_response (line 312) | def downvote_last_response(state, model_selector, request: gr.Request):
  function flag_last_response (line 319) | def flag_last_response(state, model_selector, request: gr.Request):
  function regenerate (line 326) | def regenerate(state, request: gr.Request):
  function clear_history (line 336) | def clear_history(request: gr.Request):
  function get_ip (line 343) | def get_ip(request: gr.Request):
  function add_text (line 355) | def add_text(state, model_selector, text, request: gr.Request):
  function model_worker_stream_iter (line 388) | def model_worker_stream_iter(
  function is_limit_reached (line 431) | def is_limit_reached(model_name, ip):
  function bot_response (line 444) | def bot_response(
  function get_model_description_md (line 826) | def get_model_description_md(models):
  function build_about (line 849) | def build_about():
  function build_single_model_ui (line 887) | def build_single_model_ui(models, add_promotion_links=False):
  function build_demo (line 1029) | def build_demo(models):

FILE: fastchat/serve/gradio_web_server_multi.py
  function build_visualizer (line 53) | def build_visualizer():
  function load_demo (line 110) | def load_demo(context: Context, request: gr.Request):
  function build_demo (line 169) | def build_demo(

FILE: fastchat/serve/huggingface_api.py
  function main (line 16) | def main(args):

FILE: fastchat/serve/huggingface_api_worker.py
  function get_gen_kwargs (line 51) | def get_gen_kwargs(
  function could_be_stop (line 83) | def could_be_stop(text, stop):
  class HuggingfaceApiWorker (line 90) | class HuggingfaceApiWorker(BaseModelWorker):
    method __init__ (line 91) | def __init__(
    method count_token (line 130) | def count_token(self, params):
    method generate_stream_gate (line 138) | def generate_stream_gate(self, params):
    method generate_gate (line 196) | def generate_gate(self, params):
    method get_embeddings (line 201) | def get_embeddings(self, params):
  function release_worker_semaphore (line 205) | def release_worker_semaphore(worker):
  function acquire_worker_semaphore (line 209) | def acquire_worker_semaphore(worker):
  function create_background_tasks (line 215) | def create_background_tasks(worker):
  function api_generate_stream (line 222) | async def api_generate_stream(request: Request):
  function api_generate (line 232) | async def api_generate(request: Request):
  function api_get_embeddings (line 242) | async def api_get_embeddings(request: Request):
  function api_get_status (line 252) | async def api_get_status(request: Request):
  function api_count_token (line 261) | async def api_count_token(request: Request):
  function api_get_conv (line 268) | async def api_get_conv(request: Request):
  function api_model_details (line 275) | async def api_model_details(request: Request):
  function create_huggingface_api_worker (line 281) | def create_huggingface_api_worker():

FILE: fastchat/serve/inference.py
  function prepare_logits_processor (line 45) | def prepare_logits_processor(
  function generate_stream (line 62) | def generate_stream(
  class ChatIO (line 319) | class ChatIO(abc.ABC):
    method prompt_for_input (line 321) | def prompt_for_input(self, role: str) -> str:
    method prompt_for_output (line 325) | def prompt_for_output(self, role: str):
    method stream_output (line 329) | def stream_output(self, output_stream):
    method print_output (line 333) | def print_output(self, text: str):
  function chat_loop (line 337) | def chat_loop(

FILE: fastchat/serve/launch_all_serve.py
  function string_args (line 209) | def string_args(args, args_list):
  function launch_worker (line 235) | def launch_worker(item):
  function launch_all (line 256) | def launch_all():

FILE: fastchat/serve/lightllm_worker.py
  class LightLLMWorker (line 42) | class LightLLMWorker(BaseModelWorker):
    method __init__ (line 43) | def __init__(
    method generate_stream (line 77) | async def generate_stream(self, params):
    method generate (line 188) | async def generate(self, params):
  function release_worker_semaphore (line 194) | def release_worker_semaphore():
  function acquire_worker_semaphore (line 198) | def acquire_worker_semaphore():
  function create_background_tasks (line 204) | def create_background_tasks(request_id):
  function api_generate_stream (line 215) | async def api_generate_stream(request: Request):
  function api_generate (line 227) | async def api_generate(request: Request):
  function api_get_status (line 240) | async def api_get_status(request: Request):
  function api_count_token (line 245) | async def api_count_token(request: Request):
  function api_get_conv (line 251) | async def api_get_conv(request: Request):
  function api_model_details (line 256) | async def api_model_details(request: Request):

FILE: fastchat/serve/mlx_worker.py
  class MLXWorker (line 39) | class MLXWorker(BaseModelWorker):
    method __init__ (line 40) | def __init__(
    method generate_stream (line 77) | async def generate_stream(self, params):
    method generate (line 165) | async def generate(self, params):
  function release_worker_semaphore (line 171) | def release_worker_semaphore():
  function acquire_worker_semaphore (line 175) | def acquire_worker_semaphore():
  function create_background_tasks (line 181) | def create_background_tasks(request_id):
  function api_generate_stream (line 192) | async def api_generate_stream(request: Request):
  function api_generate (line 203) | async def api_generate(request: Request):
  function api_get_status (line 216) | async def api_get_status(request: Request):
  function api_count_token (line 221) | async def api_count_token(request: Request):
  function api_get_conv (line 227) | async def api_get_conv(request: Request):
  function api_model_details (line 232) | async def api_model_details(request: Request):
  function cleanup_at_exit (line 239) | def cleanup_at_exit():

FILE: fastchat/serve/model_worker.py
  class ModelWorker (line 38) | class ModelWorker(BaseModelWorker):
    method __init__ (line 39) | def __init__(
    method generate_stream_gate (line 104) | def generate_stream_gate(self, params):
    method generate_gate (line 146) | def generate_gate(self, params):
    method __process_embed_chunk (line 151) | def __process_embed_chunk(self, input_ids, attention_mask, **model_typ...
    method __encode_base64 (line 178) | def __encode_base64(self, embeddings: torch.Tensor) -> List[str]:
    method get_embeddings (line 185) | def get_embeddings(self, params):
  function create_model_worker (line 303) | def create_model_worker():

FILE: fastchat/serve/monitor/add_markdown_info.py
  function count_markdown_elements (line 10) | def count_markdown_elements(markdown_text, suffix):
  function remove_pattern (line 32) | def remove_pattern(answer, pattern):
  function get_element_counts (line 39) | def get_element_counts(df, column):
  function add_markdown_meta (line 56) | def add_markdown_meta(row):

FILE: fastchat/serve/monitor/basic_stats.py
  function get_log_files (line 19) | def get_log_files(max_num_files=None):
  function load_log_files (line 37) | def load_log_files(filename):
  function load_log_files_parallel (line 59) | def load_log_files_parallel(log_files, num_threads=16):
  function get_anony_vote_df (line 70) | def get_anony_vote_df(df):
  function merge_counts (line 78) | def merge_counts(series, on, names):
  function report_basic_stats (line 89) | def report_basic_stats(log_files):

FILE: fastchat/serve/monitor/classify/category.py
  class Category (line 15) | class Category:
    method __init__ (line 16) | def __init__(self):
    method create_category (line 20) | def create_category(name):
    method post_process (line 46) | def post_process(self):
  class CategoryHardPrompt (line 50) | class CategoryHardPrompt(Category):
    method __init__ (line 51) | def __init__(self):
    method get_score (line 66) | def get_score(self, judgment):
    method pre_process (line 80) | def pre_process(self, prompt):
    method post_process (line 85) | def post_process(self, judgment):
  class CategoryIF (line 90) | class CategoryIF(Category):
    method __init__ (line 91) | def __init__(self):
    method get_score (line 98) | def get_score(self, judgment):
    method pre_process (line 108) | def pre_process(self, prompt):
    method post_process (line 116) | def post_process(self, judgment):
  class CategoryMath (line 124) | class CategoryMath(Category):
    method __init__ (line 125) | def __init__(self):
    method get_score (line 132) | def get_score(self, judgment):
    method pre_process (line 142) | def pre_process(self, prompt):
    method post_process (line 150) | def post_process(self, judgment):
  class CategoryCreativeWriting (line 155) | class CategoryCreativeWriting(Category):
    method __init__ (line 156) | def __init__(self):
    method get_score (line 163) | def get_score(self, judgment):
    method pre_process (line 179) | def pre_process(self, prompt):
    method post_process (line 187) | def post_process(self, judgment):
  class CategoryCaptioning (line 196) | class CategoryCaptioning(Category):
    method __init__ (line 197) | def __init__(self):
    method get_score (line 204) | def get_score(self, judgment):
    method pre_process (line 214) | def pre_process(self, prompt, api_type="openai"):
    method post_process (line 222) | def post_process(self, judgment):
  class CategoryCreativeWritingVision (line 227) | class CategoryCreativeWritingVision(Category):
    method __init__ (line 228) | def __init__(self):
    method get_score (line 235) | def get_score(self, judgment):
    method pre_process (line 251) | def pre_process(self, prompt, api_type="openai"):
    method post_process (line 259) | def post_process(self, judgment):
  class CategoryEntityRecognition (line 265) | class CategoryEntityRecognition(Category):
    method __init__ (line 266) | def __init__(self):
    method get_score (line 273) | def get_score(self, judgment):
    method pre_process (line 283) | def pre_process(self, prompt, api_type="openai"):
    method post_process (line 291) | def post_process(self, judgment):
  function pil_to_base64 (line 301) | def pil_to_base64(image_path):
  class CategoryOpticalCharacterRecognition (line 309) | class CategoryOpticalCharacterRecognition(Category):
    method __init__ (line 310) | def __init__(self):
    method get_score (line 317) | def get_score(self, judgment):
    method pre_process (line 327) | def pre_process(self, prompt, api_type="openai"):
    method post_process (line 368) | def post_process(self, judgment):
  class CategoryHumor (line 373) | class CategoryHumor(Category):
    method __init__ (line 374) | def __init__(self):
    method get_score (line 381) | def get_score(self, judgment):
    method pre_process (line 391) | def pre_process(self, prompt, api_type="openai"):
    method post_process (line 430) | def post_process(self, judgment):
  class CategoryHomework (line 438) | class CategoryHomework(Category):
    method __init__ (line 439) | def __init__(self):
    method get_score (line 450) | def get_score(self, judgment):
    method pre_process (line 460) | def pre_process(self, prompt, api_type="openai"):
    method post_process (line 503) | def post_process(self, judgment):
  class CategoryDiagram (line 508) | class CategoryDiagram(Category):
    method __init__ (line 509) | def __init__(self):
    method get_score (line 524) | def get_score(self, judgment):
    method pre_process (line 534) | def pre_process(self, prompt, api_type="openai"):
    method post_process (line 577) | def post_process(self, judgment):

FILE: fastchat/serve/monitor/classify/label.py
  function make_config (line 29) | def make_config(config_file: str) -> dict:
  function get_endpoint (line 36) | def get_endpoint(endpoint_list):
  function chat_completion_openai (line 45) | def chat_completion_openai(model, messages, temperature, max_tokens, api...
  function chat_completion_anthropic (line 92) | def chat_completion_anthropic(model, messages, temperature, max_tokens, ...
  function chat_completion_gemini (line 125) | def chat_completion_gemini(
  function get_answer (line 181) | def get_answer(
  function category_merge (line 245) | def category_merge(row):
  function find_required_tasks (line 263) | def find_required_tasks(row):

FILE: fastchat/serve/monitor/clean_battle_data.py
  function remove_html (line 81) | def remove_html(raw):
  function to_openai_format (line 87) | def to_openai_format(messages):
  function replace_model_name (line 95) | def replace_model_name(old_name, tstamp):
  function read_file (line 116) | def read_file(filename):
  function read_file_parallel (line 131) | def read_file_parallel(log_files, num_threads=16):
  function process_data (line 140) | def process_data(
  function clean_battle_data (line 317) | def clean_battle_data(

FILE: fastchat/serve/monitor/clean_chat_data.py
  function date_range (line 32) | def date_range(start="2023-04-01"):
  function get_log_files (line 44) | def get_log_files(max_num_files=None):
  function get_action_type_data (line 58) | def get_action_type_data(filename, action_type):
  function process_data (line 74) | def process_data(row, action_type):
  function clean_chat_data (line 139) | def clean_chat_data(log_files, action_type, num_parallel):

FILE: fastchat/serve/monitor/copilot_arena.py
  function process_copilot_arena_leaderboard (line 11) | def process_copilot_arena_leaderboard(leaderboard):
  function build_copilot_arena_tab (line 37) | def build_copilot_arena_tab():

FILE: fastchat/serve/monitor/criteria_labeling.py
  function get_endpoint (line 43) | def get_endpoint(endpoint_list):
  function get_score (line 55) | def get_score(judgment):
  function chat_completion_openai (line 70) | def chat_completion_openai(model, messages, temperature, max_tokens, api...
  function get_answer (line 116) | def get_answer(

FILE: fastchat/serve/monitor/dataset_release_scripts/arena_33k/filter_bad_conv.py
  class TypeCode (line 20) | class TypeCode(Enum):
  function detect_type (line 31) | def detect_type(conv):

FILE: fastchat/serve/monitor/dataset_release_scripts/lmsys_chat_1m/compute_stats.py
  function to_remove (line 71) | def to_remove(x):

FILE: fastchat/serve/monitor/dataset_release_scripts/lmsys_chat_1m/filter_bad_conv.py
  class TypeCode (line 28) | class TypeCode(Enum):
  function detect_type (line 39) | def detect_type(conv):

FILE: fastchat/serve/monitor/elo_analysis.py
  function get_median_elo_from_bootstrap (line 31) | def get_median_elo_from_bootstrap(bootstrap_df):
  function compute_pairwise_win_fraction (line 37) | def compute_pairwise_win_fraction(battles, model_order, limit_show_numbe...
  function visualize_leaderboard_table (line 79) | def visualize_leaderboard_table(rating):
  function visualize_pairwise_win_fraction (line 101) | def visualize_pairwise_win_fraction(battles, model_order, scale=1):
  function visualize_battle_count (line 124) | def visualize_battle_count(battles, model_order, scale=1):
  function visualize_average_win_rate (line 148) | def visualize_average_win_rate(battles, limit_show_number, scale=1):
  function visualize_bootstrap_elo_rating (line 164) | def visualize_bootstrap_elo_rating(df, df_final, limit_show_number, scal...
  function limit_user_votes (line 194) | def limit_user_votes(battles, daily_vote_per_user):
  function get_model_pair_stats (line 215) | def get_model_pair_stats(battles):
  function outlier_detect (line 239) | def outlier_detect(
  function filter_long_conv (line 311) | def filter_long_conv(row):
  function report_elo_analysis_results (line 321) | def report_elo_analysis_results(
  function pretty_print_elo_rating (line 467) | def pretty_print_elo_rating(rating):

FILE: fastchat/serve/monitor/inspect_conv.py
  function get_log_files (line 13) | def get_log_files(max_num_files=None):
  function pretty_print_conversation (line 31) | def pretty_print_conversation(messages):
  function inspect_convs (line 36) | def inspect_convs(log_files):

FILE: fastchat/serve/monitor/leaderboard_csv_to_html.py
  function model_hyperlink (line 14) | def model_hyperlink(model_name, link):

FILE: fastchat/serve/monitor/monitor.py
  function recompute_final_ranking (line 57) | def recompute_final_ranking(arena_df):
  function arena_hard_title (line 70) | def arena_hard_title(date):
  function update_elo_components (line 82) | def update_elo_components(
  function update_worker (line 125) | def update_worker(
  function load_demo (line 138) | def load_demo(url_params, request: gr.Request):
  function model_hyperlink (line 143) | def model_hyperlink(model_name, link):
  function load_leaderboard_table_csv (line 147) | def load_leaderboard_table_csv(filename, add_hyperlink=True):
  function build_basic_stats_tab (line 184) | def build_basic_stats_tab():
  function get_full_table (line 204) | def get_full_table(arena_df, model_table_df, model_to_score):
  function arena_hard_process (line 233) | def arena_hard_process(leaderboard_table_file, filepath):
  function get_arena_table (line 269) | def get_arena_table(
  function update_leaderboard_df (line 354) | def update_leaderboard_df(arena_table_vals):
  function update_overall_leaderboard_df (line 383) | def update_overall_leaderboard_df(arena_table_vals):
  function build_arena_tab (line 425) | def build_arena_tab(
  function build_full_leaderboard_tab (line 718) | def build_full_leaderboard_tab(elo_results, model_table_df, model_to_sco...
  function get_arena_category_table (line 742) | def get_arena_category_table(results_df, categories, metric="ranking"):
  function build_category_leaderboard_tab (line 805) | def build_category_leaderboard_tab(
  function get_combined_table (line 882) | def get_combined_table(elo_results, model_table_df):
  function build_leaderboard_tab (line 917) | def build_leaderboard_tab(
  function build_demo (line 1112) | def build_demo(elo_results_file, leaderboard_table_file, arena_hard_lead...

FILE: fastchat/serve/monitor/monitor_md.py
  function make_default_md_1 (line 137) | def make_default_md_1(mirror=False):
  function make_default_md_2 (line 147) | def make_default_md_2(mirror=False):
  function make_arena_leaderboard_md (line 162) | def make_arena_leaderboard_md(arena_df, last_updated_time, vision=False):
  function make_category_arena_leaderboard_md (line 177) | def make_category_arena_leaderboard_md(arena_df, arena_subset_df, name="...
  function make_full_leaderboard_md (line 196) | def make_full_leaderboard_md():
  function make_leaderboard_md_live (line 210) | def make_leaderboard_md_live(elo_results):
  function arena_hard_title (line 219) | def arena_hard_title(date):

FILE: fastchat/serve/monitor/rating_systems.py
  function get_matchups_models (line 24) | def get_matchups_models(df):
  function preprocess_for_elo (line 31) | def preprocess_for_elo(df):
  function preprocess_for_bt (line 44) | def preprocess_for_bt(df):
  function preprocess_for_style (line 65) | def preprocess_for_style(
  function fit_vectorized_elo (line 110) | def fit_vectorized_elo(
  function compute_elo (line 139) | def compute_elo(df, k=4.0, base=10.0, init_rating=1000.0, scale=400.0):
  function compute_bootstrap_elo (line 153) | def compute_bootstrap_elo(
  function bt_loss_and_grad (line 165) | def bt_loss_and_grad(ratings, matchups, outcomes, weights, alpha=1.0):
  function fit_bt (line 184) | def fit_bt(matchups, outcomes, weights, n_models, alpha, tol=1e-6):
  function scale_and_offset (line 197) | def scale_and_offset(
  function compute_bt (line 213) | def compute_bt(df, base=10.0, scale=400.0, init_rating=1000, tol=1e-6):
  function compute_bootstrap_bt (line 220) | def compute_bootstrap_bt(
  function contextual_bt_loss_and_grad (line 256) | def contextual_bt_loss_and_grad(
  function fit_contextual_bt (line 296) | def fit_contextual_bt(
  function compute_style_control (line 326) | def compute_style_control(
  function compute_bootstrap_style_control (line 348) | def compute_bootstrap_style_control(

FILE: fastchat/serve/monitor/summarize_cluster.py
  function truncate_string (line 19) | def truncate_string(s, l):

FILE: fastchat/serve/monitor/tag_openai_moderation.py
  function tag_moderation (line 20) | def tag_moderation(text):
  function tag_openai_moderation (line 33) | def tag_openai_moderation(x):

FILE: fastchat/serve/monitor/topic_clustering.py
  function remove_punctuation (line 24) | def remove_punctuation(input_string):
  function read_texts (line 33) | def read_texts(input_file, min_length, max_length, english_only):
  function get_embeddings (line 82) | def get_embeddings(texts, model_name, batch_size):
  function run_k_means (line 107) | def run_k_means(embeddings, num_clusters):
  function run_agg_cluster (line 126) | def run_agg_cluster(embeddings, num_clusters):
  function run_hdbscan_cluster (line 148) | def run_hdbscan_cluster(embeddings):
  function get_topk_indices (line 171) | def get_topk_indices(centers, labels, embeddings, topk):
  function print_topk (line 187) | def print_topk(texts, labels, topk_indices, show_cut_off):
  function get_cluster_info (line 200) | def get_cluster_info(texts, labels, topk_indices):

FILE: fastchat/serve/monitor/vote_time_stats/analyze_data.py
  function _serialize_json (line 8) | def _serialize_json(data):
  function process_record (line 29) | def process_record(r):
  function process_file (line 81) | def process_file(infile: str, outfile: str):

FILE: fastchat/serve/multi_model_worker.py
  function release_worker_semaphore (line 72) | def release_worker_semaphore():
  function acquire_worker_semaphore (line 76) | def acquire_worker_semaphore():
  function create_background_tasks (line 86) | def create_background_tasks():
  function api_generate_stream (line 98) | async def api_generate_stream(request: Request):
  function api_generate (line 108) | async def api_generate(request: Request):
  function api_get_embeddings (line 118) | async def api_get_embeddings(request: Request):
  function api_get_status (line 128) | async def api_get_status(request: Request):
  function api_count_token (line 137) | async def api_count_token(request: Request):
  function api_get_conv (line 144) | async def api_get_conv(request: Request):
  function api_model_details (line 151) | async def api_model_details(request: Request):
  function create_multi_model_worker (line 157) | def create_multi_model_worker():

FILE: fastchat/serve/openai_api_server.py
  function fetch_remote (line 73) | async def fetch_remote(url, pload=None, name=None):
  class AppSettings (line 97) | class AppSettings(BaseSettings):
  function check_api_key (line 109) | async def check_api_key(
  function create_error_response (line 131) | def create_error_response(code: int, message: str) -> JSONResponse:
  function validation_exception_handler (line 138) | async def validation_exception_handler(request, exc):
  function check_model (line 142) | async def check_model(request) -> Optional[JSONResponse]:
  function check_length (line 155) | async def check_length(request, prompt, max_tokens, worker_addr):
  function check_requests (line 180) | def check_requests(request) -> Optional[JSONResponse]:
  function process_input (line 228) | def process_input(model_name, inp):
  function create_openai_logprobs (line 252) | def create_openai_logprobs(logprob_dict):
  function _add_to_set (line 257) | def _add_to_set(s, new_stop):
  function get_gen_params (line 266) | async def get_gen_params(
  function get_worker_address (line 367) | async def get_worker_address(model_name: str) -> str:
  function get_conv (line 387) | async def get_conv(model_name: str, worker_addr: str):
  function show_available_models (line 398) | async def show_available_models():
  function create_chat_completion (line 412) | async def create_chat_completion(request: ChatCompletionRequest):
  function chat_completion_stream_generator (line 486) | async def chat_completion_stream_generator(
  function create_completion (line 543) | async def create_completion(request: CompletionRequest):
  function generate_completion_stream_generator (line 621) | async def generate_completion_stream_generator(
  function generate_completion_stream (line 680) | async def generate_completion_stream(payload: Dict[str, Any], worker_add...
  function generate_completion (line 702) | async def generate_completion(payload: Dict[str, Any], worker_addr: str):
  function create_embeddings (line 708) | async def create_embeddings(request: EmbeddingsRequest, model_name: str ...
  function get_embedding (line 754) | async def get_embedding(payload: Dict[str, Any]):
  function count_tokens (line 767) | async def count_tokens(request: APITokenCheckRequest):
  function create_chat_completion (line 802) | async def create_chat_completion(request: APIChatCompletionRequest):
  function create_openai_api_server (line 878) | def create_openai_api_server():

FILE: fastchat/serve/remote_logger.py
  function get_remote_logger (line 13) | def get_remote_logger():
  class EmptyLogger (line 24) | class EmptyLogger:
    method __init__ (line 27) | def __init__(self):
    method log (line 30) | def log(self, _data: dict):
  class RemoteLogger (line 34) | class RemoteLogger:
    method __init__ (line 37) | def __init__(self, url: str):
    method log (line 44) | def log(self, data: dict):
    method _send_logs (line 47) | def _send_logs(self):

FILE: fastchat/serve/sglang_worker.py
  function pipeline (line 34) | def pipeline(s, prompt, max_tokens):
  class SGLWorker (line 43) | class SGLWorker(BaseModelWorker):
    method __init__ (line 44) | def __init__(
    method generate_stream (line 81) | async def generate_stream(self, params):
    method generate_stream_gate (line 164) | async def generate_stream_gate(self, params):
    method generate_gate (line 175) | async def generate_gate(self, params):
  function release_worker_semaphore (line 181) | def release_worker_semaphore():
  function acquire_worker_semaphore (line 185) | def acquire_worker_semaphore():
  function create_background_tasks (line 191) | def create_background_tasks():
  function api_generate_stream (line 198) | async def api_generate_stream(request: Request):
  function api_generate (line 207) | async def api_generate(request: Request):
  function api_get_status (line 216) | async def api_get_status(request: Request):
  function api_count_token (line 221) | async def api_count_token(request: Request):
  function api_get_conv (line 227) | async def api_get_conv(request: Request):
  function api_model_details (line 232) | async def api_model_details(request: Request):

FILE: fastchat/serve/test_message.py
  function main (line 10) | def main():

FILE: fastchat/serve/test_throughput.py
  function main (line 12) | def main():

FILE: fastchat/serve/vision/create_vqa_examples_dir.py
  function download_images_and_create_json (line 20) | def download_images_and_create_json(

FILE: fastchat/serve/vision/image.py
  class ImageFormat (line 8) | class ImageFormat(IntEnum):
  class Image (line 18) | class Image(BaseModel):
    method convert_image_to_base64 (line 24) | def convert_image_to_base64(self):
    method to_openai_image_format (line 46) | def to_openai_image_format(self):
    method resize_image_and_return_image_in_bytes (line 59) | def resize_image_and_return_image_in_bytes(self, image, max_image_size...
    method convert_url_to_image_bytes (line 96) | def convert_url_to_image_bytes(self, max_image_size_mb):
    method to_conversation_format (line 118) | def to_conversation_format(self, max_image_size_mb):

FILE: fastchat/serve/vllm_worker.py
  class VLLMWorker (line 31) | class VLLMWorker(BaseModelWorker):
    method __init__ (line 32) | def __init__(
    method generate_stream (line 67) | async def generate_stream(self, params):
    method generate (line 172) | async def generate(self, params):
  function release_worker_semaphore (line 178) | def release_worker_semaphore():
  function acquire_worker_semaphore (line 182) | def acquire_worker_semaphore():
  function create_background_tasks (line 188) | def create_background_tasks(request_id):
  function api_generate_stream (line 199) | async def api_generate_stream(request: Request):
  function api_generate (line 211) | async def api_generate(request: Request):
  function api_get_status (line 224) | async def api_get_status(request: Request):
  function api_count_token (line 229) | async def api_count_token(request: Request):
  function api_get_conv (line 235) | async def api_get_conv(request: Request):
  function api_model_details (line 240) | async def api_model_details(request: Request):

FILE: fastchat/train/llama2_flash_attn_monkey_patch.py
  function apply_rotary_pos_emb (line 18) | def apply_rotary_pos_emb(q, k, cos_sin, position_ids):
  function forward (line 32) | def forward(
  function _prepare_decoder_attention_mask (line 108) | def _prepare_decoder_attention_mask(
  function replace_llama_attn_with_flash_attn (line 132) | def replace_llama_attn_with_flash_attn():
  function test (line 144) | def test():

FILE: fastchat/train/llama_flash_attn_monkey_patch.py
  function forward (line 13) | def forward(
  function _prepare_decoder_attention_mask (line 90) | def _prepare_decoder_attention_mask(
  function replace_llama_attn_with_flash_attn (line 97) | def replace_llama_attn_with_flash_attn():

FILE: fastchat/train/llama_xformers_attn_monkey_patch.py
  function replace_llama_attn_with_xformers_attn (line 19) | def replace_llama_attn_with_xformers_attn():
  function xformers_forward (line 23) | def xformers_forward(

FILE: fastchat/train/train.py
  class ModelArguments (line 37) | class ModelArguments:
  class DataArguments (line 51) | class DataArguments:
  class TrainingArguments (line 62) | class TrainingArguments(transformers.TrainingArguments):
  function rank0_print (line 76) | def rank0_print(*args):
  function trainer_save_model_safe (line 81) | def trainer_save_model_safe(trainer: transformers.Trainer):
  function preprocess (line 92) | def preprocess(
  class SupervisedDataset (line 180) | class SupervisedDataset(Dataset):
    method __init__ (line 183) | def __init__(self, raw_data, tokenizer: transformers.PreTrainedTokeniz...
    method __len__ (line 194) | def __len__(self):
    method __getitem__ (line 197) | def __getitem__(self, i) -> Dict[str, torch.Tensor]:
  class LazySupervisedDataset (line 205) | class LazySupervisedDataset(Dataset):
    method __init__ (line 208) | def __init__(self, raw_data, tokenizer: transformers.PreTrainedTokeniz...
    method __len__ (line 217) | def __len__(self):
    method __getitem__ (line 220) | def __getitem__(self, i) -> Dict[str, torch.Tensor]:
  function make_supervised_data_module (line 235) | def make_supervised_data_module(
  function train (line 256) | def train():

FILE: fastchat/train/train_baichuan.py
  class ModelArguments (line 39) | class ModelArguments:
  class DataArguments (line 44) | class DataArguments:
  class TrainingArguments (line 52) | class TrainingArguments(transformers.TrainingArguments):
  function rank0_print (line 66) | def rank0_print(*args):
  function safe_save_model_for_hf_trainer (line 71) | def safe_save_model_for_hf_trainer(trainer: transformers.Trainer, output...
  function apply_prompt_template (line 80) | def apply_prompt_template(sources, systems=None):
  function tokenize_conversations (line 100) | def tokenize_conversations(conversations, tokenizer):
  function mask_targets (line 112) | def mask_targets(conversations, targets, tokenizer, conv):
  function preprocess (line 151) | def preprocess(sources, tokenizer: transformers.PreTrainedTokenizer, **k...
  class SupervisedDataset (line 180) | class SupervisedDataset(Dataset):
    method __init__ (line 183) | def __init__(self, raw_data, tokenizer: transformers.PreTrainedTokeniz...
    method __len__ (line 196) | def __len__(self):
    method __getitem__ (line 199) | def __getitem__(self, i) -> Dict[str, torch.Tensor]:
  class LazySupervisedDataset (line 207) | class LazySupervisedDataset(Dataset):
    method __init__ (line 210) | def __init__(self, raw_data, tokenizer: transformers.PreTrainedTokeniz...
    method __len__ (line 218) | def __len__(self):
    method __getitem__ (line 221) | def __getitem__(self, i) -> Dict[str, torch.Tensor]:
  function make_supervised_data_module (line 240) | def make_supervised_data_module(
  function train (line 275) | def train():

FILE: fastchat/train/train_flant5.py
  class ModelArguments (line 47) | class ModelArguments:
  class DataArguments (line 52) | class DataArguments:
  class TrainingArguments (line 64) | class TrainingArguments(transformers.TrainingArguments):
  function safe_save_model_for_hf_trainer (line 75) | def safe_save_model_for_hf_trainer(trainer: transformers.Trainer, output...
  function smart_tokenizer_and_embedding_resize (line 84) | def smart_tokenizer_and_embedding_resize(
  function _tokenize_fn (line 115) | def _tokenize_fn(
  function _form_qa (line 142) | def _form_qa(
  function _add_speaker_and_signal (line 179) | def _add_speaker_and_signal(header, source, get_conversation=True):
  function preprocess (line 221) | def preprocess(
  class SupervisedDataset (line 266) | class SupervisedDataset(Dataset):
    method __init__ (line 269) | def __init__(
    method __len__ (line 342) | def __len__(self):
    method __getitem__ (line 345) | def __getitem__(self, i) -> Dict[str, torch.Tensor]:
  class DataCollatorForSupervisedDataset (line 350) | class DataCollatorForSupervisedDataset(object):
    method __call__ (line 355) | def __call__(self, instances: Sequence[Dict]) -> Dict[str, torch.Tensor]:
  function make_supervised_data_module (line 378) | def make_supervised_data_module(
  function train (line 395) | def train():

FILE: fastchat/train/train_lora.py
  class TrainingArguments (line 43) | class TrainingArguments(transformers.TrainingArguments):
  class LoraArguments (line 56) | class LoraArguments:
  function maybe_zero_3 (line 68) | def maybe_zero_3(param):
  function get_peft_state_maybe_zero_3 (line 79) | def get_peft_state_maybe_zero_3(named_params, bias):
  function train (line 104) | def train():

FILE: fastchat/train/train_lora_t5.py
  class LoraArguments (line 59) | class LoraArguments:
  class ModelArguments (line 70) | class ModelArguments:
  class DataArguments (line 75) | class DataArguments:
  class TrainingArguments (line 87) | class TrainingArguments(transformers.TrainingArguments):
  function safe_save_model_for_hf_trainer (line 98) | def safe_save_model_for_hf_trainer(
  function train (line 109) | def train():

FILE: fastchat/train/train_with_template.py
  class ModelArguments (line 39) | class ModelArguments:
  class DataArguments (line 44) | class DataArguments:
  class TrainingArguments (line 52) | class TrainingArguments(transformers.TrainingArguments):
  function rank0_print (line 66) | def rank0_print(*args):
  function safe_save_model_for_hf_trainer (line 71) | def safe_save_model_for_hf_trainer(trainer: transformers.Trainer, output...
  function apply_prompt_template (line 80) | def apply_prompt_template(sources, template_id, systems=None):
  function tokenize_conversations (line 100) | def tokenize_conversations(conversations, tokenizer):
  function get_prompt_separator (line 112) | def get_prompt_separator(conv):
  function mask_targets (line 144) | def mask_targets(conversations, targets, tokenizer, conv):
  function preprocess (line 198) | def preprocess(
  class SupervisedDataset (line 229) | class SupervisedDataset(Dataset):
    method __init__ (line 232) | def __init__(
    method __len__ (line 247) | def __len__(self):
    method __getitem__ (line 250) | def __getitem__(self, i) -> Dict[str, torch.Tensor]:
  class LazySupervisedDataset (line 258) | class LazySupervisedDataset(Dataset):
    method __init__ (line 261) | def __init__(
    method __len__ (line 272) | def __len__(self):
    method __getitem__ (line 275) | def __getitem__(self, i) -> Dict[str, torch.Tensor]:
  function make_supervised_data_module (line 295) | def make_supervised_data_module(
  function train (line 337) | def train():

FILE: fastchat/train/train_yuan2.py
  class ModelArguments (line 37) | class ModelArguments:
  class DataArguments (line 51) | class DataArguments:
  class TrainingArguments (line 65) | class TrainingArguments(transformers.TrainingArguments):
  function rank0_print (line 79) | def rank0_print(*args):
  function trainer_save_model_safe (line 84) | def trainer_save_model_safe(trainer: transformers.Trainer):
  function right_replace (line 96) | def right_replace(string, old, new, max=1):
  function preprocess (line 100) | def preprocess(
  class SupervisedDataset (line 316) | class SupervisedDataset(Dataset):
    method __init__ (line 319) | def __init__(
    method __len__ (line 332) | def __len__(self):
    method __getitem__ (line 335) | def __getitem__(self, i) -> Dict[str, torch.Tensor]:
  class LazySupervisedDataset (line 343) | class LazySupervisedDataset(Dataset):
    method __init__ (line 346) | def __init__(
    method __len__ (line 358) | def __len__(self):
    method __getitem__ (line 361) | def __getitem__(self, i) -> Dict[str, torch.Tensor]:
  function make_supervised_data_module (line 378) | def make_supervised_data_module(
  function train (line 399) | def train():

FILE: fastchat/utils.py
  function build_logger (line 26) | def build_logger(logger_name, logger_filename):
  class StreamToLogger (line 84) | class StreamToLogger(object):
    method __init__ (line 89) | def __init__(self, logger, log_level=logging.INFO):
    method __getattr__ (line 95) | def __getattr__(self, attr):
    method write (line 98) | def write(self, buf):
    method flush (line 113) | def flush(self):
  function disable_torch_init (line 120) | def disable_torch_init():
  function get_gpu_memory (line 130) | def get_gpu_memory(max_gpus=None):
  function oai_moderation (line 152) | def oai_moderation(text, custom_thresholds=None):
  function moderation_filter (line 177) | def moderation_filter(text, model_list, do_moderation=False):
  function clean_flant5_ckpt (line 208) | def clean_flant5_ckpt(ckpt_path):
  function pretty_print_semaphore (line 232) | def pretty_print_semaphore(semaphore):
  function iter_over_async (line 276) | def iter_over_async(
  function detect_language (line 302) | def detect_language(text: str) -> str:
  function parse_gradio_auth_creds (line 318) | def parse_gradio_auth_creds(filename: str):
  function is_partial_stop (line 331) | def is_partial_stop(output: str, stop_str: str):
  function run_cmd (line 339) | def run_cmd(cmd: str):
  function is_sentence_complete (line 345) | def is_sentence_complete(output: str):
  function get_context_length (line 364) | def get_context_length(config):
  function str_to_torch_dtype (line 379) | def str_to_torch_dtype(dtype: str):
  function load_image (line 394) | def load_image(image_file):
  function upload_image_file_to_gcs (line 415) | def upload_image_file_to_gcs(image, filename):
  function get_image_file_from_gcs (line 433) | def get_image_file_from_gcs(filename):
  function image_moderation_request (line 444) | def image_moderation_request(image_bytes, endpoint, api_key):
  function image_moderation_provider (line 458) | def image_moderation_provider(image, api_type):
  function image_moderation_filter (line 474) | def image_moderation_filter(image):

FILE: playground/benchmark/benchmark_api_provider.py
  class Metrics (line 16) | class Metrics:
    method __init__ (line 17) | def __init__(self):
    method to_dict (line 21) | def to_dict(self):
  function sample_image_and_question (line 25) | def sample_image_and_question(random_questions_dict, index):
  function call_model (line 37) | def call_model(
  function run_benchmark (line 76) | def run_benchmark(model_name, model_api_dict, random_questions_dict, num...
  function benchmark_models (line 100) | def benchmark_models(api_endpoint_info, random_questions_dict, models):
  function main (line 116) | def main(api_endpoint_file, random_questions, output_file):

FILE: playground/test_embedding/test_classification.py
  function get_embedding_from_api (line 16) | def get_embedding_from_api(word, model="vicuna-7b-v1.1"):
  function create_embedding_data_frame (line 38) | def create_embedding_data_frame(data_path, model, max_tokens=500):
  function train_random_forest (line 55) | def train_random_forest(df):

FILE: playground/test_embedding/test_semantic_search.py
  function cosine_similarity (line 11) | def cosine_similarity(vec1, vec2):
  function get_embedding_from_api (line 18) | def get_embedding_from_api(word, model="vicuna-7b-v1.1"):
  function create_embedding_data_frame (line 40) | def create_embedding_data_frame(data_path, model, max_tokens=500):
  function search_reviews (line 57) | def search_reviews(df, product_description, n=3, pprint=False, model="vi...
  function print_model_search (line 76) | def print_model_search(input_path, model):

FILE: playground/test_embedding/test_sentence_similarity.py
  function get_embedding_from_api (line 10) | def get_embedding_from_api(word, model="vicuna-7b-v1.5"):
  function cosine_similarity (line 32) | def cosine_similarity(vec1, vec2):
  function print_cosine_similarity (line 36) | def print_cosine_similarity(embeddings, texts):

FILE: tests/launch_openai_api_test_server.py
  function launch_process (line 8) | def launch_process(cmd):

FILE: tests/load_test.py
  function litellm_completion (line 12) | def litellm_completion(args, tokenizer, image_url=None):
  function main (line 66) | def main(args):

FILE: tests/test_cli.py
  function test_single_gpu (line 8) | def test_single_gpu():
  function test_multi_gpu (line 38) | def test_multi_gpu():
  function test_8bit (line 54) | def test_8bit():
  function test_hf_api (line 70) | def test_hf_api():

FILE: tests/test_image_utils.py
  function check_byte_size_in_mb (line 21) | def check_byte_size_in_mb(image_base64_str):
  function generate_random_image (line 25) | def generate_random_image(target_size_mb, image_format="PNG"):
  class DontResizeIfLessThanMaxTest (line 55) | class DontResizeIfLessThanMaxTest(unittest.TestCase):
    method test_dont_resize_if_less_than_max (line 56) | def test_dont_resize_if_less_than_max(self):
  class ResizeLargeImageForModerationEndpoint (line 73) | class ResizeLargeImageForModerationEndpoint(unittest.TestCase):
    method test_resize_large_image_and_send_to_moderation_filter (line 74) | def test_resize_large_image_and_send_to_moderation_filter(self):
  class DontResizeIfMaxImageSizeIsNone (line 83) | class DontResizeIfMaxImageSizeIsNone(unittest.TestCase):
    method test_dont_resize_if_max_image_size_is_none (line 84) | def test_dont_resize_if_max_image_size_is_none(self):
  class OpenAIConversationDontResizeImage (line 100) | class OpenAIConversationDontResizeImage(unittest.TestCase):
    method test (line 101) | def test(self):
  class ClaudeConversationResizesCorrectly (line 116) | class ClaudeConversationResizesCorrectly(unittest.TestCase):
    method test (line 117) | def test(self):

FILE: tests/test_openai_api.py
  function test_list_models (line 17) | def test_list_models():
  function test_completion (line 23) | def test_completion(model, logprob):
  function test_completion_stream (line 41) | def test_completion_stream(model):
  function test_embedding (line 57) | def test_embedding(model):
  function test_chat_completion (line 63) | def test_chat_completion(model):
  function test_chat_completion_stream (line 72) | def test_chat_completion_stream(model):
  function test_openai_curl (line 88) | def test_openai_curl():

FILE: tests/test_openai_langchain.py
  function test_chain (line 15) | def test_chain():

FILE: tests/test_openai_vision_api.py
  function encode_image (line 16) | def encode_image(image):
  function test_list_models (line 36) | def test_list_models():
  function test_chat_completion (line 42) | def test_chat_completion(model):
  function test_chat_completion_stream (line 97) | def test_chat_completion_stream(model):
  function test_openai_curl (line 123) | def test_openai_curl():

Condensed preview of 208 files, each showing path, character count, and a content snippet (full structured content: 1,638K chars).

[
  {
    "path": ".github/PULL_REQUEST_TEMPLATE.md",
    "chars": 594,
    "preview": "<!-- Thank you for your contribution! -->\n\n<!-- Please add a reviewer to the assignee section when you create a PR. If y"
  },
  {
    "path": ".github/workflows/python-package.yml",
    "chars": 673,
    "preview": "name: Python package\n\non: [push, pull_request]\n\njobs:\n  build:\n\n    runs-on: ubuntu-latest\n    strategy:\n      fail-fast"
  },
  {
    "path": ".gitignore",
    "chars": 338,
    "preview": "# Python\n__pycache__\n*.pyc\n*.egg-info\ndist\n.venv\n\n# Log\n*.log\n*.log.*\n*.json\n!playground/deepspeed_config_s2.json\n!playg"
  },
  {
    "path": ".pylintrc",
    "chars": 14765,
    "preview": "# This Pylint rcfile contains a best-effort configuration to uphold the\n# best-practices and style described in the Goog"
  },
  {
    "path": "LICENSE",
    "chars": 11357,
    "preview": "                                 Apache License\n                           Version 2.0, January 2004\n                   "
  },
  {
    "path": "README.md",
    "chars": 20100,
    "preview": "# FastChat\n| [**Demo**](https://lmarena.ai/) | [**Discord**](https://discord.gg/6GXcFg3TH8) | [**X**](https://x.com/lmsy"
  },
  {
    "path": "docker/Dockerfile",
    "chars": 276,
    "preview": "FROM nvidia/cuda:12.2.0-runtime-ubuntu20.04\n\nRUN apt-get update -y && apt-get install -y python3.9 python3.9-distutils c"
  },
  {
    "path": "docker/docker-compose.yml",
    "chars": 1290,
    "preview": "version: \"3.9\"\n\nservices:\n  fastchat-controller:\n    build:\n      context: .\n      dockerfile: Dockerfile\n    image: fas"
  },
  {
    "path": "docs/arena.md",
    "chars": 3436,
    "preview": "# Chatbot Arena\nChatbot Arena is an LLM benchmark platform featuring anonymous, randomized battles, available at https:/"
  },
  {
    "path": "docs/awq.md",
    "chars": 3319,
    "preview": "# AWQ 4bit Inference\n\nWe integrated [AWQ](https://github.com/mit-han-lab/llm-awq) into FastChat to provide **efficient a"
  },
  {
    "path": "docs/commands/conv_release.md",
    "chars": 552,
    "preview": "## Chatbot Arena Conversations\n\n1. Gather battles\n```\npython3 clean_battle_data.py --max-num 10 --mode conv_release\n```\n"
  },
  {
    "path": "docs/commands/data_cleaning.md",
    "chars": 599,
    "preview": "## Data cleaning\n\n## Requirements\n```\npip3 install bs4 markdownify\npip3 install polyglot pyicu pycld2\n```\n\n## Steps\n```\n"
  },
  {
    "path": "docs/commands/leaderboard.md",
    "chars": 950,
    "preview": "### Get logs\n```\ngsutil -m rsync -r gs://fastchat_logs ~/fastchat_logs/\n```\n\n### Clean battle data\n```\ncd ~/FastChat/fas"
  },
  {
    "path": "docs/commands/local_cluster.md",
    "chars": 3679,
    "preview": "### Local GPU cluster\nnode-01\n```\npython3 -m fastchat.serve.controller --host 0.0.0.0 --port 10002\n\nCUDA_VISIBLE_DEVICES"
  },
  {
    "path": "docs/commands/pypi.md",
    "chars": 157,
    "preview": "### Requirement\n```\npython3 -m pip install twine\npython3 -m pip install --upgrade pip\npip3 install build\n```\n\n### Upload"
  },
  {
    "path": "docs/commands/webserver.md",
    "chars": 2694,
    "preview": "### Install\n```\nsudo apt update\nsudo apt install tmux htop\n\nwget https://repo.anaconda.com/archive/Anaconda3-2022.10-Lin"
  },
  {
    "path": "docs/dashinfer_integration.md",
    "chars": 1952,
    "preview": "# dash-infer Integration\n[DashInfer](https://github.com/modelscope/dash-infer) is a high-performance inference engine sp"
  },
  {
    "path": "docs/dataset_release.md",
    "chars": 484,
    "preview": "## Datasets\nWe release the following datasets based on our projects and websites.\n\n- [LMSYS-Chat-1M: A Large-Scale Real-"
  },
  {
    "path": "docs/exllama_v2.md",
    "chars": 2855,
    "preview": "# ExllamaV2 GPTQ Inference Framework\n\nIntegrated [ExllamaV2](https://github.com/turboderp/exllamav2) customized kernel i"
  },
  {
    "path": "docs/gptq.md",
    "chars": 2365,
    "preview": "# GPTQ 4bit Inference\n\nSupport GPTQ 4bit inference with [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa)."
  },
  {
    "path": "docs/langchain_integration.md",
    "chars": 3209,
    "preview": "# Local LangChain with FastChat\n\n[LangChain](https://python.langchain.com/en/latest/index.html) is a library that facili"
  },
  {
    "path": "docs/lightllm_integration.md",
    "chars": 1431,
    "preview": "# LightLLM Integration\nYou can use [LightLLM](https://github.com/ModelTC/lightllm) as an optimized worker implementation"
  },
  {
    "path": "docs/mlx_integration.md",
    "chars": 828,
    "preview": "# Apple MLX Integration\n\nYou can use [Apple MLX](https://github.com/ml-explore/mlx) as an optimized worker implementatio"
  },
  {
    "path": "docs/model_support.md",
    "chars": 9278,
    "preview": "# Model Support\nThis document describes how to support a new model in FastChat.\n\n## Content\n- [Local Models](#local-mode"
  },
  {
    "path": "docs/openai_api.md",
    "chars": 4870,
    "preview": "# OpenAI-Compatible RESTful APIs\n\nFastChat provides OpenAI-compatible APIs for its supported models, so you can use Fast"
  },
  {
    "path": "docs/server_arch.md",
    "chars": 73,
    "preview": "# FastChat Server Architecture\n![server arch](../assets/server_arch.png)\n"
  },
  {
    "path": "docs/third_party_ui.md",
    "chars": 1851,
    "preview": "# Third Party UI\nIf you want to host it on your own UI or third party UI, you can launch the [OpenAI compatible server]("
  },
  {
    "path": "docs/training.md",
    "chars": 4326,
    "preview": "### Fine-tuning FastChat-T5\nYou can use the following command to train FastChat-T5 with 4 x A100 (40GB).\n```bash\ntorchru"
  },
  {
    "path": "docs/vicuna_weights_version.md",
    "chars": 5690,
    "preview": "## Vicuna Weights\n\n| Weights version | Link | FastChat version compatibility | Base Model | Release Date | Fine-tuning D"
  },
  {
    "path": "docs/vllm_integration.md",
    "chars": 1022,
    "preview": "# vLLM Integration\nYou can use [vLLM](https://vllm.ai/) as an optimized worker implementation in FastChat.\nIt offers adv"
  },
  {
    "path": "docs/xFasterTransformer.md",
    "chars": 3540,
    "preview": "# xFasterTransformer Inference Framework\n\nIntegrated [xFasterTransformer](https://github.com/intel/xFasterTransformer) c"
  },
  {
    "path": "fastchat/__init__.py",
    "chars": 23,
    "preview": "__version__ = \"0.2.36\"\n"
  },
  {
    "path": "fastchat/constants.py",
    "chars": 3743,
    "preview": "\"\"\"\nGlobal constants.\n\"\"\"\n\nfrom enum import IntEnum\nimport os\n\nREPO_PATH = os.path.dirname(os.path.dirname(__file__))\n\n#"
  },
  {
    "path": "fastchat/conversation.py",
    "chars": 102852,
    "preview": "\"\"\"\nConversation prompt templates.\n\nWe kindly request that you import fastchat instead of copying this file if you wish "
  },
  {
    "path": "fastchat/data/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "fastchat/data/clean_sharegpt.py",
    "chars": 7188,
    "preview": "\"\"\"\n- Convert html to markdown with basic data cleaning.\n- Deduplication.\n\nUsage:\npython3 -m fastchat.data.clean_sharegp"
  },
  {
    "path": "fastchat/data/convert_alpaca.py",
    "chars": 1094,
    "preview": "\"\"\"\nConvert alpaca dataset into sharegpt format.\n\nUsage: python3 -m fastchat.data.convert_alpaca --in alpaca_data.json\n\""
  },
  {
    "path": "fastchat/data/extract_gpt4_only.py",
    "chars": 976,
    "preview": "\"\"\"\nExtract the conversations generated by GPT-4 only.\n\nUsage: python3 -m fastchat.data.extract_gpt4_only --in sharegpt."
  },
  {
    "path": "fastchat/data/extract_single_round.py",
    "chars": 883,
    "preview": "\"\"\"\nExtract the first round of the conversations.\n\nUsage: python3 -m fastchat.data.extract_single_round --in sharegpt.js"
  },
  {
    "path": "fastchat/data/filter_wrong_format.py",
    "chars": 1157,
    "preview": "\"\"\"\nFilter conversations with wrong formats.\n\nUsage:\npython3 -m fastchat.data.filter_wrong_format --in input.json --out "
  },
  {
    "path": "fastchat/data/get_stats.py",
    "chars": 2413,
    "preview": "\"\"\"\nGet stats of a dataset.\n\nUsage: python3 -m fastchat.data.get_stats --in sharegpt.json\n\"\"\"\n\nimport argparse\nfrom conc"
  },
  {
    "path": "fastchat/data/hardcoded_questions.py",
    "chars": 6264,
    "preview": "\"\"\"\nHardcoded question and answers.\n\"\"\"\nimport json\n\n\ndef identity_questions():\n    \"\"\" \"\n    Adapted from https://githu"
  },
  {
    "path": "fastchat/data/inspect_data.py",
    "chars": 939,
    "preview": "\"\"\"\nUsage:\npython3 -m fastchat.data.inspect_data --in sharegpt_20230322_clean_lang_split.json\n\"\"\"\nimport argparse\nimport"
  },
  {
    "path": "fastchat/data/merge.py",
    "chars": 664,
    "preview": "\"\"\"\nMerge two conversation files into one\n\nUsage: python3 -m fastchat.data.merge --in file1.json file2.json --out merged"
  },
  {
    "path": "fastchat/data/optional_clean.py",
    "chars": 2699,
    "preview": "\"\"\"\nDo optional cleaning (e.g., remove some languages).\n\nUsage:\npython3 -m fastchat.data.optional_clean --in input.json "
  },
  {
    "path": "fastchat/data/optional_replace.py",
    "chars": 2361,
    "preview": "\"\"\"\nDo optional replace of bos/eos/pad/unk.\n\nUsage:\npython3 -m fastchat.data.optional_replace --in input.json --out outp"
  },
  {
    "path": "fastchat/data/prepare_all.py",
    "chars": 1851,
    "preview": "\"\"\"Prepare all datasets.\"\"\"\n\nimport argparse\nimport os\n\nfrom fastchat.utils import run_cmd\n\n\nif __name__ == \"__main__\":\n"
  },
  {
    "path": "fastchat/data/pretty_json.py",
    "chars": 495,
    "preview": "\"\"\"\nUsage:\npython3 pretty_json.py --in in.json --out out.json\n\"\"\"\n\nimport argparse\nimport json\n\n\nif __name__ == \"__main_"
  },
  {
    "path": "fastchat/data/sample.py",
    "chars": 1233,
    "preview": "\"\"\"\nSample some conversations from a file.\n\nUsage: python3 -m fastchat.data.sample --in sharegpt.json --out sampled.json"
  },
  {
    "path": "fastchat/data/split_long_conversation.py",
    "chars": 3741,
    "preview": "\"\"\"\nSplit long conversations based on certain max length.\n\nUsage: python3 -m fastchat.data.split_long_conversation \\\n   "
  },
  {
    "path": "fastchat/data/split_train_test.py",
    "chars": 1112,
    "preview": "\"\"\"\nSplit the dataset into training and test set.\n\nUsage: python3 -m fastchat.data.split_train_test --in sharegpt.json\n\""
  },
  {
    "path": "fastchat/llm_judge/README.md",
    "chars": 6796,
    "preview": "# LLM Judge\n| [Paper](https://arxiv.org/abs/2306.05685) | [Leaderboard](https://huggingface.co/spaces/lmsys/chatbot-aren"
  },
  {
    "path": "fastchat/llm_judge/clean_judgment.py",
    "chars": 2251,
    "preview": "\"\"\"\nClean model judgment files.\n\"\"\"\nimport argparse\nimport json\n\nselected_models = [\n    \"alpaca-13b\",\n    \"baize-v2-13b"
  },
  {
    "path": "fastchat/llm_judge/common.py",
    "chars": 22379,
    "preview": "\"\"\"\nCommon data structures and utilities.\n\"\"\"\n\nimport ast\nimport dataclasses\nimport glob\nimport json\nimport os\nimport re"
  },
  {
    "path": "fastchat/llm_judge/compute_agreement.py",
    "chars": 4420,
    "preview": "\"\"\"\nCompute agreement among judges.\n\nUsage:\npython compute_agreement.py --judges gpt4-pair human --votefiles human_judgm"
  },
  {
    "path": "fastchat/llm_judge/data/judge_prompts.jsonl",
    "chars": 10449,
    "preview": "{\"name\": \"pair-v2\", \"type\": \"pairwise\", \"system_prompt\": \"Please act as an impartial judge and evaluate the quality of t"
  },
  {
    "path": "fastchat/llm_judge/data/mt_bench/question.jsonl",
    "chars": 48929,
    "preview": "{\"question_id\": 81, \"category\": \"writing\", \"turns\": [\"Compose an engaging travel blog post about a recent trip to Hawaii"
  },
  {
    "path": "fastchat/llm_judge/data/mt_bench/reference_answer/gpt-4.jsonl",
    "chars": 51109,
    "preview": "{\"question_id\": 101, \"answer_id\": \"TFomieEmmAgdeCkvmuvwbc\", \"model_id\": \"gpt-4\", \"choices\": [{\"index\": 0, \"turns\": [\"If "
  },
  {
    "path": "fastchat/llm_judge/data/vicuna_bench/question.jsonl",
    "chars": 13125,
    "preview": "{\"question_id\": 1, \"category\": \"generic\", \"turns\": [\"How can I improve my time management skills?\"]}\n{\"question_id\": 2, "
  },
  {
    "path": "fastchat/llm_judge/data/vicuna_bench/reference_answer/gpt-4.jsonl",
    "chars": 11301,
    "preview": "{\"question_id\": 61, \"answer_id\": \"YdL4XwENkLCLXMbH65rjKy\", \"model_id\": \"gpt-4\", \"choices\": [{\"index\": 0, \"turns\": [\"Here"
  },
  {
    "path": "fastchat/llm_judge/download_mt_bench_pregenerated.py",
    "chars": 2247,
    "preview": "\"\"\"\nDownload the pre-generated model answers and judgments for MT-bench.\n\"\"\"\nimport os\n\nfrom fastchat.utils import run_c"
  },
  {
    "path": "fastchat/llm_judge/gen_api_answer.py",
    "chars": 4661,
    "preview": "\"\"\"Generate answers with GPT-4\n\nUsage:\npython3 gen_api_answer.py --model gpt-3.5-turbo\n\"\"\"\nimport argparse\nimport json\ni"
  },
  {
    "path": "fastchat/llm_judge/gen_judgment.py",
    "chars": 9560,
    "preview": "\"\"\"\nUsage:\npython gen_judgment.py --model-list [LIST-OF-MODEL-ID] --parallel [num-concurrent-api-call] --mode [single|pa"
  },
  {
    "path": "fastchat/llm_judge/gen_model_answer.py",
    "chars": 9659,
    "preview": "\"\"\"Generate answers with local models.\n\nUsage:\npython3 gen_model_answer.py --model-path lmsys/fastchat-t5-3b-v1.0 --mode"
  },
  {
    "path": "fastchat/llm_judge/qa_browser.py",
    "chars": 12680,
    "preview": "\"\"\"\nUsage:\npython3 qa_browser.py --share\n\"\"\"\n\nimport argparse\nfrom collections import defaultdict\nimport re\n\nimport grad"
  },
  {
    "path": "fastchat/llm_judge/show_result.py",
    "chars": 4644,
    "preview": "\"\"\"\nUsage:\npython3 show_result.py --mode [single|pairwise-baseline|pairwise-all]\n\"\"\"\nimport argparse\nimport pandas as pd"
  },
  {
    "path": "fastchat/model/__init__.py",
    "chars": 112,
    "preview": "from fastchat.model.model_adapter import (\n    load_model,\n    get_conversation_template,\n    add_model_args,\n)\n"
  },
  {
    "path": "fastchat/model/apply_delta.py",
    "chars": 5999,
    "preview": "\"\"\"\nApply the delta weights on top of a base model.\n\nUsage:\npython3 -m fastchat.model.apply_delta --base ~/model_weights"
  },
  {
    "path": "fastchat/model/apply_lora.py",
    "chars": 1565,
    "preview": "\"\"\"\nApply the LoRA weights on top of a base model.\n\nUsage:\npython3 -m fastchat.model.apply_lora --base ~/model_weights/l"
  },
  {
    "path": "fastchat/model/compression.py",
    "chars": 10416,
    "preview": "import dataclasses\nimport gc\nimport glob\nimport os\n\nfrom accelerate import init_empty_weights\nfrom accelerate.utils impo"
  },
  {
    "path": "fastchat/model/convert_fp16.py",
    "chars": 846,
    "preview": "\"\"\"\nUsage:\npython3 -m fastchat.model.convert_fp16 --in in-folder --out out-folder\n\"\"\"\nimport argparse\n\nfrom transformers"
  },
  {
    "path": "fastchat/model/llama_condense_monkey_patch.py",
    "chars": 2864,
    "preview": "# Code adapted from https://huggingface.co/kaiokendev/superhot-13b-8k-no-rlhf-test/blob/main/llama_rope_scaled_monkey_pa"
  },
  {
    "path": "fastchat/model/make_delta.py",
    "chars": 1835,
    "preview": "\"\"\"\nMake the delta weights by subtracting base weights.\n\nUsage:\npython3 -m fastchat.model.make_delta --base ~/model_weig"
  },
  {
    "path": "fastchat/model/model_adapter.py",
    "chars": 92186,
    "preview": "\"\"\"Model adapter registration.\"\"\"\n\nimport math\nimport os\nimport re\nimport sys\nfrom typing import Dict, List, Optional\nim"
  },
  {
    "path": "fastchat/model/model_chatglm.py",
    "chars": 4202,
    "preview": "\"\"\"\nInference code for ChatGLM.\nAdapted from https://huggingface.co/THUDM/chatglm-6b/blob/main/modeling_chatglm.py.\n\"\"\"\n"
  },
  {
    "path": "fastchat/model/model_cllm.py",
    "chars": 6729,
    "preview": "import torch\nimport gc\n\nimport os\nimport time\nimport random\nfrom typing import Dict, Optional, Sequence, List, Tuple\nfro"
  },
  {
    "path": "fastchat/model/model_codet5p.py",
    "chars": 3197,
    "preview": "import gc\nfrom threading import Thread\nimport torch\nimport transformers\nfrom transformers import (\n    GenerationConfig,"
  },
  {
    "path": "fastchat/model/model_exllama.py",
    "chars": 2222,
    "preview": "import gc\nimport sys\nfrom typing import Dict\n\nimport torch\n\n\ndef generate_stream_exllama(\n    model,\n    tokenizer,\n    "
  },
  {
    "path": "fastchat/model/model_falcon.py",
    "chars": 4414,
    "preview": "import gc\nfrom threading import Thread\nfrom typing import Iterable\n\nimport torch\nimport transformers\nfrom transformers i"
  },
  {
    "path": "fastchat/model/model_registry.py",
    "chars": 27130,
    "preview": "\"\"\"Additional information of the models.\"\"\"\nfrom collections import namedtuple, OrderedDict\nfrom typing import List\n\n\nMo"
  },
  {
    "path": "fastchat/model/model_xfastertransformer.py",
    "chars": 2352,
    "preview": "import gc\nfrom threading import Thread\n\nimport torch\nfrom transformers import TextIteratorStreamer\n\n\n@torch.inference_mo"
  },
  {
    "path": "fastchat/model/model_yuan2.py",
    "chars": 4385,
    "preview": "import gc\nfrom threading import Thread\nfrom typing import Iterable\n\nimport torch\nimport transformers\nfrom transformers i"
  },
  {
    "path": "fastchat/model/monkey_patch_non_inplace.py",
    "chars": 4216,
    "preview": "\"\"\"\nMonkey patch the llama implementation in the huggingface/transformers library.\nAvoid bugs in mps backend by not usin"
  },
  {
    "path": "fastchat/model/rwkv_model.py",
    "chars": 2537,
    "preview": "import os\nfrom types import SimpleNamespace\nimport warnings\n\nimport torch\n\nos.environ[\"RWKV_JIT_ON\"] = \"1\"\nos.environ[\"R"
  },
  {
    "path": "fastchat/model/upload_hub.py",
    "chars": 1521,
    "preview": "\"\"\"\nUpload weights to huggingface.\n\nUsage:\npython3 -m fastchat.model.upload_hub --model-path ~/model_weights/vicuna-13b "
  },
  {
    "path": "fastchat/modules/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "fastchat/modules/awq.py",
    "chars": 2638,
    "preview": "from dataclasses import dataclass, field\nfrom pathlib import Path\nimport sys\n\nimport torch\nfrom transformers import Auto"
  },
  {
    "path": "fastchat/modules/exllama.py",
    "chars": 1469,
    "preview": "from dataclasses import dataclass, field\nimport sys\n\n\n@dataclass\nclass ExllamaConfig:\n    max_seq_len: int\n    gpu_split"
  },
  {
    "path": "fastchat/modules/gptq.py",
    "chars": 2266,
    "preview": "from dataclasses import dataclass, field\nimport os\nfrom os.path import isdir, isfile\nfrom pathlib import Path\nimport sys"
  },
  {
    "path": "fastchat/modules/xfastertransformer.py",
    "chars": 1298,
    "preview": "from dataclasses import dataclass\nimport sys\n\n\n@dataclass\nclass XftConfig:\n    max_seq_len: int = 4096\n    beam_width: i"
  },
  {
    "path": "fastchat/protocol/api_protocol.py",
    "chars": 4677,
    "preview": "from typing import Literal, Optional, List, Dict, Any, Union\n\nimport time\n\nimport shortuuid\nfrom pydantic import BaseMod"
  },
  {
    "path": "fastchat/protocol/openai_api_protocol.py",
    "chars": 5432,
    "preview": "from typing import Literal, Optional, List, Dict, Any, Union\n\nimport time\n\nimport shortuuid\nfrom pydantic import BaseMod"
  },
  {
    "path": "fastchat/serve/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "fastchat/serve/api_provider.py",
    "chars": 42083,
    "preview": "\"\"\"Call API providers.\"\"\"\n\nimport json\nimport os\nimport random\nimport re\nfrom typing import Optional\nimport time\n\nimport"
  },
  {
    "path": "fastchat/serve/base_model_worker.py",
    "chars": 7031,
    "preview": "import asyncio\nimport threading\nimport time\nfrom typing import List\n\nfrom fastapi import FastAPI, Request, BackgroundTas"
  },
  {
    "path": "fastchat/serve/call_monitor.py",
    "chars": 7988,
    "preview": "import json\nimport os\nimport glob\nimport time\n\nfrom fastapi import FastAPI\nimport hashlib\nimport asyncio\n\nREFRESH_INTERV"
  },
  {
    "path": "fastchat/serve/cli.py",
    "chars": 10936,
    "preview": "\"\"\"\nChat with a model with command line interface.\n\nUsage:\npython3 -m fastchat.serve.cli --model lmsys/vicuna-7b-v1.5\npy"
  },
  {
    "path": "fastchat/serve/controller.py",
    "chars": 12046,
    "preview": "\"\"\"\nA controller manages distributed workers.\nIt sends worker addresses to clients.\n\"\"\"\nimport argparse\nimport asyncio\ni"
  },
  {
    "path": "fastchat/serve/dashinfer_worker.py",
    "chars": 11331,
    "preview": "\"\"\"\nA model worker that executes the model based on dash-infer.\n\nSee documentations at docs/dashinfer_integration.md\n\"\"\""
  },
  {
    "path": "fastchat/serve/gateway/README.md",
    "chars": 1735,
    "preview": "# fastchat Nginx Gateway\n\n## Purpose of the Gateway\n\nThe Nginx gateway serves the following purposes:\n\n1. Protects Gradi"
  },
  {
    "path": "fastchat/serve/gateway/nginx.conf",
    "chars": 3998,
    "preview": "user www-data;\nworker_processes auto;\npid /run/nginx.pid;\ninclude /etc/nginx/modules-enabled/*.conf;\n\nevents {\n        w"
  },
  {
    "path": "fastchat/serve/gradio_block_arena_anony.py",
    "chars": 20051,
    "preview": "\"\"\"\nChatbot Arena (battle) tab.\nUsers chat with two anonymous models.\n\"\"\"\n\nimport json\nimport time\nimport re\n\nimport gra"
  },
  {
    "path": "fastchat/serve/gradio_block_arena_named.py",
    "chars": 15831,
    "preview": "\"\"\"\nChatbot Arena (side-by-side) tab.\nUsers chat with two chosen models.\n\"\"\"\n\nimport json\nimport time\n\nimport gradio as "
  },
  {
    "path": "fastchat/serve/gradio_block_arena_vision.py",
    "chars": 16784,
    "preview": "\"\"\"\nThe gradio demo server for chatting with a large multimodal model.\n\nUsage:\npython3 -m fastchat.serve.controller\npyth"
  },
  {
    "path": "fastchat/serve/gradio_block_arena_vision_anony.py",
    "chars": 20924,
    "preview": "\"\"\"\nChatbot Arena (battle) tab.\nUsers chat with two anonymous models.\n\"\"\"\n\nimport json\nimport time\n\nimport gradio as gr\n"
  },
  {
    "path": "fastchat/serve/gradio_block_arena_vision_named.py",
    "chars": 19146,
    "preview": "\"\"\"\nMultimodal Chatbot Arena (side-by-side) tab.\nUsers chat with two chosen models.\n\"\"\"\n\nimport json\nimport os\nimport ti"
  },
  {
    "path": "fastchat/serve/gradio_global_state.py",
    "chars": 441,
    "preview": "from dataclasses import dataclass, field\nfrom typing import List\n\n\n@dataclass\nclass Context:\n    text_models: List[str] "
  },
  {
    "path": "fastchat/serve/gradio_web_server.py",
    "chars": 37103,
    "preview": "\"\"\"\nThe gradio demo server for chatting with a single model.\n\"\"\"\n\nimport argparse\nfrom collections import defaultdict\nim"
  },
  {
    "path": "fastchat/serve/gradio_web_server_multi.py",
    "chars": 15378,
    "preview": "\"\"\"\nThe gradio demo server with multiple tabs.\nIt supports chatting with a single model or chatting with two models side"
  },
  {
    "path": "fastchat/serve/huggingface_api.py",
    "chars": 2296,
    "preview": "\"\"\"\nUse FastChat with Hugging Face generation APIs.\n\nUsage:\npython3 -m fastchat.serve.huggingface_api --model lmsys/vicu"
  },
  {
    "path": "fastchat/serve/huggingface_api_worker.py",
    "chars": 12619,
    "preview": "\"\"\"\nA model worker that calls huggingface inference endpoint.\n\nRegister models in a JSON file with the following format:"
  },
  {
    "path": "fastchat/serve/inference.py",
    "chars": 18802,
    "preview": "\"\"\"Inference for FastChat models.\"\"\"\nimport abc\nimport gc\nimport json\nimport math\nimport os\nimport sys\nimport time\nfrom "
  },
  {
    "path": "fastchat/serve/launch_all_serve.py",
    "chars": 8448,
    "preview": "\"\"\"\nUsage: python launch_all_serve_by_shell.py --model-path-address \"THUDM/chatglm2-6b@localhost@2021\" \"huggyllama/llama"
  },
  {
    "path": "fastchat/serve/lightllm_worker.py",
    "chars": 17332,
    "preview": "\"\"\"\nA model worker that executes the model based on LightLLM.\n\nSee documentations at docs/lightllm_integration.md\n\"\"\"\n\ni"
  },
  {
    "path": "fastchat/serve/mlx_worker.py",
    "chars": 8598,
    "preview": "\"\"\"\nA model worker using Apple MLX\n\nhttps://github.com/ml-explore/mlx-examples/tree/main/llms\n\nCode based on vllm_worker"
  },
  {
    "path": "fastchat/serve/model_worker.py",
    "chars": 15165,
    "preview": "\"\"\"\nA model worker that executes the model.\n\"\"\"\nimport argparse\nimport base64\nimport gc\nimport json\nimport os\nfrom typin"
  },
  {
    "path": "fastchat/serve/monitor/add_markdown_info.py",
    "chars": 2733,
    "preview": "import pandas as pd\nimport re\nimport argparse\n\nfrom tqdm import tqdm\n\ntqdm.pandas()\n\n\ndef count_markdown_elements(markdo"
  },
  {
    "path": "fastchat/serve/monitor/basic_stats.py",
    "chars": 7243,
    "preview": "import argparse\nimport code\nimport datetime\nimport json\nimport os\nfrom pytz import timezone\nimport time\n\nimport pandas a"
  },
  {
    "path": "fastchat/serve/monitor/classify/README.md",
    "chars": 2263,
    "preview": "## Download dataset\nWe have pre-generated several category classifier benchmarks and ground truths. You can download the"
  },
  {
    "path": "fastchat/serve/monitor/classify/category.py",
    "chars": 30509,
    "preview": "# Tag structure\n# - category_tag\n#     - criteria_v0.1\n#         - specificity\n#         - ...\n#     - math_v0.1\n#      "
  },
  {
    "path": "fastchat/serve/monitor/classify/config.yaml",
    "chars": 503,
    "preview": "# Yaml config file for category classification\n\ninput_file: null # json\ncache_file: null # json\noutput_file: null # json"
  },
  {
    "path": "fastchat/serve/monitor/classify/display_score.py",
    "chars": 2535,
    "preview": "import pandas as pd\nimport argparse\nimport os\nfrom glob import glob\nfrom sklearn.metrics import recall_score, precision_"
  },
  {
    "path": "fastchat/serve/monitor/classify/label.py",
    "chars": 14655,
    "preview": "import argparse\nimport json\nimport pandas as pd\nimport os\nimport time\nimport concurrent.futures\nimport tqdm\nimport yaml\n"
  },
  {
    "path": "fastchat/serve/monitor/classify/vision_config.yaml",
    "chars": 577,
    "preview": "# Yaml config file for category classification\n\ninput_file: null # json\ncache_file: null # json\noutput_file: null # json"
  },
  {
    "path": "fastchat/serve/monitor/clean_battle_data.py",
    "chars": 12380,
    "preview": "\"\"\"\nClean chatbot arena battle log.\n\nUsage:\npython3 clean_battle_data.py --mode conv_release\n\"\"\"\nimport argparse\nimport "
  },
  {
    "path": "fastchat/serve/monitor/clean_chat_data.py",
    "chars": 6934,
    "preview": "\"\"\"\nClean chatbot arena chat log.\n\nUsage:\npython3 clean_chat_data.py\n\"\"\"\nimport argparse\nimport json\nimport os\nimport ha"
  },
  {
    "path": "fastchat/serve/monitor/copilot_arena.py",
    "chars": 3464,
    "preview": "import gradio as gr\nimport pandas as pd\nimport requests\nimport os\n\nfrom fastchat.serve.monitor.monitor import recompute_"
  },
  {
    "path": "fastchat/serve/monitor/criteria_labeling.py",
    "chars": 8169,
    "preview": "import argparse\nimport json\nimport pandas as pd\nimport os\nimport re\nimport ast\nimport time\nimport concurrent.futures\nimp"
  },
  {
    "path": "fastchat/serve/monitor/dataset_release_scripts/arena_33k/count_unique_users.py",
    "chars": 647,
    "preview": "\"\"\"Count the unique users in a battle log file.\"\"\"\n\nimport argparse\nimport json\n\n\nif __name__ == \"__main__\":\n    parser "
  },
  {
    "path": "fastchat/serve/monitor/dataset_release_scripts/arena_33k/filter_bad_conv.py",
    "chars": 4554,
    "preview": "\"\"\"\nFilter conversations for release.\n\nUsage: python3 filter_bad_conv.py --in clean_battle_conv_20230630_tagged_v1_pii.j"
  },
  {
    "path": "fastchat/serve/monitor/dataset_release_scripts/arena_33k/merge_field.py",
    "chars": 722,
    "preview": "\"\"\"Count the unique users in a battle log file.\"\"\"\n\nimport argparse\nimport json\n\n\nif __name__ == \"__main__\":\n    parser "
  },
  {
    "path": "fastchat/serve/monitor/dataset_release_scripts/arena_33k/sample.py",
    "chars": 785,
    "preview": "\"\"\"\nCount the unique users in a battle log file.\n\nUsage:\npython3 -input in.json --number 1000\n\"\"\"\n\nimport argparse\nimpor"
  },
  {
    "path": "fastchat/serve/monitor/dataset_release_scripts/arena_33k/upload_hf_dataset.py",
    "chars": 281,
    "preview": "\"\"\"\nUpload to huggingface.\n\"\"\"\nimport json\nfrom datasets import Dataset, DatasetDict, load_dataset\n\nobjs = json.load(ope"
  },
  {
    "path": "fastchat/serve/monitor/dataset_release_scripts/lmsys_chat_1m/approve_all.py",
    "chars": 471,
    "preview": "import requests\n\nheaders = {\"authorization\": \"Bearer hf_XXX\"}\n\nurl = \"https://huggingface.co/api/datasets/lmsys/lmsys-ch"
  },
  {
    "path": "fastchat/serve/monitor/dataset_release_scripts/lmsys_chat_1m/compute_stats.py",
    "chars": 2897,
    "preview": "\"\"\"\nFrom colab:\nhttps://colab.research.google.com/drive/1oMdw_Lqgmd6DletSOLHsyD-Rc96cRShs?usp=sharing\n\"\"\"\nimport argpars"
  },
  {
    "path": "fastchat/serve/monitor/dataset_release_scripts/lmsys_chat_1m/filter_bad_conv.py",
    "chars": 4278,
    "preview": "\"\"\"\nFilter conversations for release.\n\nDependency:\npip install opencc-python-reimplementedpip install opencc-python-reim"
  },
  {
    "path": "fastchat/serve/monitor/dataset_release_scripts/lmsys_chat_1m/final_post_processing.py",
    "chars": 670,
    "preview": "import argparse\nimport json\n\nfrom tqdm import tqdm\nimport numpy as np\n\n\nif __name__ == \"__main__\":\n    parser = argparse"
  },
  {
    "path": "fastchat/serve/monitor/dataset_release_scripts/lmsys_chat_1m/instructions.md",
    "chars": 587,
    "preview": "```\nexport BASE=clean_conv_20230809_100k_pii\nexport SCALE=10\n\n# filter words\npython3 filter_bad_conv.py --in $BASE.json\n"
  },
  {
    "path": "fastchat/serve/monitor/dataset_release_scripts/lmsys_chat_1m/merge_oai_tag.py",
    "chars": 1385,
    "preview": "import argparse\nimport json\nimport time\n\nfrom tqdm import tqdm\n\n\nif __name__ == \"__main__\":\n    parser = argparse.Argume"
  },
  {
    "path": "fastchat/serve/monitor/dataset_release_scripts/lmsys_chat_1m/process_all.sh",
    "chars": 516,
    "preview": "export BASE=clean_conv_20230809_1.5M_pii\n#export BASE=clean_conv_20230809_100k_pii\nexport SCALE=1\n\n# Filter words\npython"
  },
  {
    "path": "fastchat/serve/monitor/dataset_release_scripts/lmsys_chat_1m/sample.py",
    "chars": 786,
    "preview": "\"\"\"\nCount the unique users in a battle log file.\n\nUsage:\npython3 -input in.json --number 1000\n\"\"\"\n\nimport argparse\nimpor"
  },
  {
    "path": "fastchat/serve/monitor/dataset_release_scripts/lmsys_chat_1m/upload_hf_dataset.py",
    "chars": 445,
    "preview": "\"\"\"\nUpload to huggingface.\n\"\"\"\nimport argparse\nimport json\nfrom datasets import Dataset, DatasetDict, load_dataset\n\n\nif "
  },
  {
    "path": "fastchat/serve/monitor/deduplication.py",
    "chars": 2655,
    "preview": "import os\nimport json\nimport pandas as pd\nimport ast\n\nimport matplotlib.pyplot as plt\nfrom matplotlib import rcParams\n\ni"
  },
  {
    "path": "fastchat/serve/monitor/elo_analysis.py",
    "chars": 18292,
    "preview": "import argparse\nimport ast\nfrom collections import defaultdict\nimport datetime\nimport json\nimport math\nimport pickle\nfro"
  },
  {
    "path": "fastchat/serve/monitor/inspect_conv.py",
    "chars": 2784,
    "preview": "import argparse\nimport code\nimport datetime\nimport json\nimport os\nfrom pytz import timezone\nimport time\n\nimport pandas a"
  },
  {
    "path": "fastchat/serve/monitor/intersect_conv_file.py",
    "chars": 861,
    "preview": "\"\"\"\nTake the intersection of two conversation files.\n\nUsage: python3 -m fastchat.data.merge --input input.json --conv-id"
  },
  {
    "path": "fastchat/serve/monitor/leaderboard_csv_to_html.py",
    "chars": 1303,
    "preview": "\"\"\"\nConvert a leaderboard csv file to html table used in the blog.\n\nUsage:\npython3 leaderboard_csv_to_html.py --in leade"
  },
  {
    "path": "fastchat/serve/monitor/monitor.py",
    "chars": 41772,
    "preview": "\"\"\"\nLive monitor of the website statistics and leaderboard.\n\nDependency:\nsudo apt install pkg-config libicu-dev\npip inst"
  },
  {
    "path": "fastchat/serve/monitor/monitor_md.py",
    "chars": 9841,
    "preview": "import pandas as pd\nimport pickle\nimport gradio as gr\n\nfrom fastchat.constants import SURVEY_LINK\n\ndeprecated_model_name"
  },
  {
    "path": "fastchat/serve/monitor/rating_systems.py",
    "chars": 13208,
    "preview": "import os\nimport math\nimport multiprocessing as mp\nfrom functools import partial\nimport numpy as np\nfrom scipy.special i"
  },
  {
    "path": "fastchat/serve/monitor/summarize_cluster.py",
    "chars": 2911,
    "preview": "\"\"\"\nUsage:\npython3 summarize_cluster.py --in results_c20_kmeans_cluster.pkl --model gpt-4 --num-prompts 100\npython3 summ"
  },
  {
    "path": "fastchat/serve/monitor/tag_openai_moderation.py",
    "chars": 1662,
    "preview": "\"\"\"\nAdd OpenAI moderation API results to all conversations.\n\"\"\"\nimport argparse\nfrom concurrent.futures import ThreadPoo"
  },
  {
    "path": "fastchat/serve/monitor/topic_clustering.py",
    "chars": 9930,
    "preview": "\"\"\"\n\nUsage:\npython3 topic_clustering.py --in arena.json --english-only --min-length 32\npython3 topic_clustering.py --in "
  },
  {
    "path": "fastchat/serve/monitor/vote_time_stats/README.md",
    "chars": 180,
    "preview": "# Instructions\n\nFirst run `analyze_data.py` to collect metadata of all votes.\n\nThen run `plot.py` to get the plot. You n"
  },
  {
    "path": "fastchat/serve/monitor/vote_time_stats/analyze_data.py",
    "chars": 3802,
    "preview": "import datetime\nimport glob\nimport json\nfrom collections import deque\nimport tqdm\n\n\ndef _serialize_json(data):\n    # Ser"
  },
  {
    "path": "fastchat/serve/monitor/vote_time_stats/plot.py",
    "chars": 1958,
    "preview": "import json\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nimport numpy as np\n\n\ninfile = \"output.jsonl\"\ndate = \"2"
  },
  {
    "path": "fastchat/serve/multi_model_worker.py",
    "chars": 9944,
    "preview": "\"\"\"\nA multi-model worker that contains multiple sub-works one for each model.  This\nsupports running a list of models on"
  },
  {
    "path": "fastchat/serve/openai_api_server.py",
    "chars": 33101,
    "preview": "\"\"\"A server that provides OpenAI-compatible RESTful APIs. It supports:\n\n- Chat Completions. (Reference: https://platform"
  },
  {
    "path": "fastchat/serve/register_worker.py",
    "chars": 834,
    "preview": "\"\"\"\nManually register workers.\n\nUsage:\npython3 -m fastchat.serve.register_worker --controller http://localhost:21001 --w"
  },
  {
    "path": "fastchat/serve/remote_logger.py",
    "chars": 1620,
    "preview": "# A JSON logger that sends data to remote endpoint.\n# Architecturally, it hosts a background thread that sends logs to a"
  },
  {
    "path": "fastchat/serve/sglang_worker.py",
    "chars": 10037,
    "preview": "\"\"\"\nA model worker that executes the model based on SGLang.\n\nUsage:\npython3 -m fastchat.serve.sglang_worker --model-path"
  },
  {
    "path": "fastchat/serve/shutdown_serve.py",
    "chars": 751,
    "preview": "\"\"\"\nUsage：\npython shutdown_serve.py --down all\noptions: \"all\",\"controller\",\"model_worker\",\"openai_api_server\"， `all` mea"
  },
  {
    "path": "fastchat/serve/test_message.py",
    "chars": 2469,
    "preview": "\"\"\"Send a test message.\"\"\"\nimport argparse\nimport json\n\nimport requests\n\nfrom fastchat.model.model_adapter import get_co"
  },
  {
    "path": "fastchat/serve/test_throughput.py",
    "chars": 3981,
    "preview": "\"\"\"Benchmarking script to test the throughput of serving workers.\"\"\"\nimport argparse\nimport json\n\nimport requests\nimport"
  },
  {
    "path": "fastchat/serve/vision/create_vqa_examples_dir.py",
    "chars": 3970,
    "preview": "import datasets\nfrom datasets import load_dataset\nfrom PIL import Image\nfrom pathlib import Path\nimport pandas as pd\nimp"
  },
  {
    "path": "fastchat/serve/vision/create_vqa_examples_json.py",
    "chars": 1044,
    "preview": "\"\"\"\nChanges proportion of examples in metadata_sampled.json\n\nUsage:\n\npython3 -m fastchat.serve.vision.create_vqa_example"
  },
  {
    "path": "fastchat/serve/vision/image.py",
    "chars": 4601,
    "preview": "import base64\nfrom enum import auto, IntEnum\nfrom io import BytesIO\n\nfrom pydantic import BaseModel\n\n\nclass ImageFormat("
  },
  {
    "path": "fastchat/serve/vllm_worker.py",
    "chars": 10303,
    "preview": "\"\"\"\nA model worker that executes the model based on vLLM.\n\nSee documentations at docs/vllm_integration.md\n\"\"\"\n\nimport ar"
  },
  {
    "path": "fastchat/train/llama2_flash_attn_monkey_patch.py",
    "chars": 8321,
    "preview": "import warnings\nfrom typing import Optional, Tuple\n\nimport torch\nfrom flash_attn import __version__ as flash_attn_versio"
  },
  {
    "path": "fastchat/train/llama_flash_attn_monkey_patch.py",
    "chars": 4054,
    "preview": "from typing import Optional, Tuple\nimport warnings\n\nimport torch\nfrom torch import nn\nimport transformers\nfrom transform"
  },
  {
    "path": "fastchat/train/llama_xformers_attn_monkey_patch.py",
    "chars": 4916,
    "preview": "\"\"\"\nDirectly copied the code from https://raw.githubusercontent.com/oobabooga/text-generation-webui/main/modules/llama_a"
  },
  {
    "path": "fastchat/train/train.py",
    "chars": 10397,
    "preview": "# This code is based on tatsu-lab/stanford_alpaca. Below is the original copyright:\n#\n#    Copyright 2023 Rohan Taori, I"
  },
  {
    "path": "fastchat/train/train_baichuan.py",
    "chars": 11470,
    "preview": "# This code is based on tatsu-lab/stanford_alpaca. Below is the original copyright:\n#\n#    Copyright 2023 Rohan Taori, I"
  },
  {
    "path": "fastchat/train/train_flant5.py",
    "chars": 14992,
    "preview": "# Adapted from tatsu-lab@stanford_alpaca. Below is the original copyright:\n#    Copyright 2023 Rohan Taori, Ishaan Gulra"
  },
  {
    "path": "fastchat/train/train_lora.py",
    "chars": 7803,
    "preview": "# Usage: deepspeed train_lora.py --deepspeed <$PATH_TO_DEEPSPEED_CONFIG>\n\n# Adapted from tatsu-lab@stanford_alpaca. Belo"
  },
  {
    "path": "fastchat/train/train_lora_t5.py",
    "chars": 7637,
    "preview": "# Adapted from tatsu-lab@stanford_alpaca. Below is the original copyright:\n#    Copyright 2023 Rohan Taori, Ishaan Gulra"
  },
  {
    "path": "fastchat/train/train_mem.py",
    "chars": 355,
    "preview": "# Make it more memory efficient by monkey patching the LLaMA model with FlashAttn.\n\n# Need to call this before importing"
  },
  {
    "path": "fastchat/train/train_with_template.py",
    "chars": 13634,
    "preview": "# This code is based on tatsu-lab/stanford_alpaca. Below is the original copyright:\n#\n#    Copyright 2023 Rohan Taori, I"
  },
  {
    "path": "fastchat/train/train_xformers.py",
    "chars": 372,
    "preview": "# Make it more memory efficient by monkey patching the LLaMA model with xformers attention.\n\n# Need to call this before "
  },
  {
    "path": "fastchat/train/train_yuan2.py",
    "chars": 16796,
    "preview": "# This code is based on tatsu-lab/stanford_alpaca. Below is the original copyright:\n#\n#    Copyright 2023 Rohan Taori, I"
  },
  {
    "path": "fastchat/utils.py",
    "chars": 15464,
    "preview": "\"\"\"\nCommon utilities.\n\"\"\"\nfrom asyncio import AbstractEventLoop\nfrom io import BytesIO\nimport base64\nimport json\nimport "
  },
  {
    "path": "format.sh",
    "chars": 2322,
    "preview": "#!/usr/bin/env bash\n\n# Adapted from https://github.com/skypilot-org/skypilot/blob/master/format.sh\n\n# Cause the script t"
  },
  {
    "path": "playground/FastChat_API_GoogleColab.ipynb",
    "chars": 99331,
    "preview": "{\n  \"cells\": [\n    {\n      \"cell_type\": \"markdown\",\n      \"metadata\": {\n        \"id\": \"1UDur96B5C7T\"\n      },\n      \"sou"
  },
  {
    "path": "playground/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "playground/benchmark/benchmark_api_provider.py",
    "chars": 3960,
    "preview": "\"\"\"\nUsage:\npython3 -m playground.benchmark.benchmark_api_provider --api-endpoint-file api_endpoints.json --output-file ."
  },
  {
    "path": "playground/deepspeed_config_s2.json",
    "chars": 291,
    "preview": "{\n  \"zero_optimization\": {\n    \"stage\": 2,\n    \"offload_optimizer\": {\n        \"device\": \"cpu\"\n    },\n    \"contiguous_gra"
  },
  {
    "path": "playground/deepspeed_config_s3.json",
    "chars": 916,
    "preview": "{\n    \"fp16\": {\n        \"enabled\": \"auto\",\n        \"loss_scale\": 0,\n        \"loss_scale_window\": 1000,\n        \"initial_"
  },
  {
    "path": "playground/test_embedding/README.md",
    "chars": 720,
    "preview": "## Machine Learning with Embeddings\nYou can use embeddings to\n- Evaluate text similarity, see [test_sentence_similarity."
  },
  {
    "path": "playground/test_embedding/test_classification.py",
    "chars": 2788,
    "preview": "import json\nimport os\n\nimport numpy as np\nimport openai\nimport pandas as pd\nimport requests\nfrom sklearn.ensemble import"
  },
  {
    "path": "playground/test_embedding/test_semantic_search.py",
    "chars": 3075,
    "preview": "import json\nimport os\n\nimport numpy as np\nimport openai\nimport pandas as pd\nimport requests\nfrom scipy.spatial.distance "
  },
  {
    "path": "playground/test_embedding/test_sentence_similarity.py",
    "chars": 1839,
    "preview": "import json\nimport os\n\nimport numpy as np\nimport openai\nimport requests\nfrom scipy.spatial.distance import cosine\n\n\ndef "
  },
  {
    "path": "pyproject.toml",
    "chars": 1342,
    "preview": "[build-system]\nrequires = [\"setuptools>=61.0\"]\nbuild-backend = \"setuptools.build_meta\"\n\n[project]\nname = \"fschat\"\nversio"
  },
  {
    "path": "scripts/build-api.sh",
    "chars": 1992,
    "preview": "#!/bin/bash\n# A rather convenient script for spinning up models behind screens\n\n\n# Variables\nPROJECT_DIR=\"$(pwd)\"\nCONDA_"
  },
  {
    "path": "scripts/test_readme_train.sh",
    "chars": 830,
    "preview": "torchrun --nproc_per_node=4 --master_port=20001 fastchat/train/train_mem.py \\\n    --model_name_or_path meta-llama/Llama-"
  },
  {
    "path": "scripts/train_lora.sh",
    "chars": 858,
    "preview": "deepspeed fastchat/train/train_lora.py \\\n    --model_name_or_path lmsys/vicuna-7b-v1.5  \\\n    --lora_r 8 \\\n    --lora_al"
  },
  {
    "path": "scripts/train_vicuna_13b.sh",
    "chars": 902,
    "preview": "torchrun --nproc_per_node=8 --master_port=20001 fastchat/train/train_mem.py \\\n    --model_name_or_path ~/model_weights/l"
  },
  {
    "path": "scripts/train_vicuna_7b.sh",
    "chars": 893,
    "preview": "torchrun --nproc_per_node=4 --master_port=20001 fastchat/train/train_mem.py \\\n    --model_name_or_path ~/model_weights/l"
  },
  {
    "path": "scripts/upload_pypi.sh",
    "chars": 60,
    "preview": "rm -rf dist\npython3 -m build\npython3 -m twine upload dist/*\n"
  },
  {
    "path": "tests/README.md",
    "chars": 2216,
    "preview": "## Unit tests for FastChat\n\n### Test CLI Inference\n\n```\npython3 test_cli.py\n```\n\n### Test OpenAI API Server\n\n```\npython3"
  },
  {
    "path": "tests/killall_python.sh",
    "chars": 88,
    "preview": "kill -9 $(ps aux | grep 'python' | grep 'fastchat' | grep -v 'grep' | awk '{print $2}')\n"
  }
]
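The manifest above is a JSON array in which each record has a `path`, a `chars` count, and a truncated `preview`. The sketch below totals file sizes per directory using only the standard library. It is not part of FastChat or of the extraction tool, and the helper name `chars_by_dir` and its `depth` parameter are hypothetical. The sample records copy only the `path` and `chars` values of four entries, with `preview` omitted.

```python
import json
from collections import defaultdict

# Four records copied from the manifest (path and chars only; previews omitted).
SAMPLE = json.loads("""
[
  {"path": "fastchat/serve/vllm_worker.py", "chars": 10303},
  {"path": "fastchat/train/train.py", "chars": 10397},
  {"path": "fastchat/train/train_mem.py", "chars": 355},
  {"path": "scripts/upload_pypi.sh", "chars": 60}
]
""")


def chars_by_dir(entries, depth=2):
    """Hypothetical helper: sum `chars` per directory prefix of at most `depth` levels."""
    totals = defaultdict(int)
    for entry in entries:
        parts = entry["path"].split("/")
        # Drop the file name, then keep at most `depth` directory components.
        dirs = parts[:-1][:depth]
        key = "/".join(dirs) or "."  # top-level files are grouped under "."
        totals[key] += entry["chars"]
    return dict(totals)


if __name__ == "__main__":
    for directory, size in sorted(chars_by_dir(SAMPLE).items()):
        print(f"{directory}: {size}")
```

To run this on the full listing, pass only the bracketed array to `json.loads`. The `//` note that follows the array is not valid JSON and must be excluded.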

// ... and 8 more files not included in this listing
