Full Code of xorbitsai/inference for AI

Tokens: 5.7M
Symbols: 8657

Repository: xorbitsai/inference
Branch: main
Commit: ebc027138775
Files: 1635
Total size: 67.4 MB

Directory structure:
gitextract_u_nl6j7f/
├── .dockerignore
├── .gitattributes
├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.yaml
│   │   └── feature_request.yaml
│   └── workflows/
│       ├── assign.yaml
│       ├── docker-cd.yaml
│       ├── issue.yaml
│       ├── pr_auto_run_gen_docs.yaml
│       ├── python.yaml
│       └── release.yaml
├── .gitignore
├── .pre-commit-config.yaml
├── .readthedocs.yaml
├── LICENSE
├── MANIFEST.in
├── README.md
├── README_ja_JP.md
├── README_zh_CN.md
├── benchmark/
│   ├── README.md
│   ├── benchmark_embedding.py
│   ├── benchmark_latency.py
│   ├── benchmark_long.py
│   ├── benchmark_rerank.py
│   ├── benchmark_runner.py
│   ├── benchmark_serving.py
│   └── utils.py
├── doc/
│   ├── Makefile
│   ├── source/
│   │   ├── _static/
│   │   │   └── switcher.json
│   │   ├── conf.py
│   │   ├── development/
│   │   │   ├── contributing_codebase.rst
│   │   │   ├── contributing_environment.rst
│   │   │   ├── index.rst
│   │   │   └── xinference_internals.rst
│   │   ├── examples/
│   │   │   ├── ai_podcast.rst
│   │   │   ├── chatbot.rst
│   │   │   ├── gradio_chatinterface.rst
│   │   │   ├── index.rst
│   │   │   ├── langchain_streamlit_doc_chat.rst
│   │   │   └── pdf_chatbot.rst
│   │   ├── gen_docs.py
│   │   ├── getting_started/
│   │   │   ├── environments.rst
│   │   │   ├── index.rst
│   │   │   ├── installation.rst
│   │   │   ├── installation_npu.rst
│   │   │   ├── logging.rst
│   │   │   ├── release_notes.rst
│   │   │   ├── troubleshooting.rst
│   │   │   ├── using_docker_image.rst
│   │   │   ├── using_kubernetes.rst
│   │   │   └── using_xinference.rst
│   │   ├── index.rst
│   │   ├── locale/
│   │   │   └── zh_CN/
│   │   │       └── LC_MESSAGES/
│   │   │           ├── development/
│   │   │           │   ├── contributing_codebase.po
│   │   │           │   ├── contributing_environment.po
│   │   │           │   ├── index.po
│   │   │           │   └── xinference_internals.po
│   │   │           ├── examples/
│   │   │           │   ├── ai_podcast.po
│   │   │           │   ├── chatbot.po
│   │   │           │   ├── gradio_chatinterface.po
│   │   │           │   ├── index.po
│   │   │           │   ├── langchain_streamlit_doc_chat.po
│   │   │           │   └── pdf_chatbot.po
│   │   │           ├── getting_started/
│   │   │           │   ├── environments.po
│   │   │           │   ├── index.po
│   │   │           │   ├── installation.po
│   │   │           │   ├── installation_npu.po
│   │   │           │   ├── logging.po
│   │   │           │   ├── release_notes.po
│   │   │           │   ├── troubleshooting.po
│   │   │           │   ├── using_docker_image.po
│   │   │           │   ├── using_kubernetes.po
│   │   │           │   └── using_xinference.po
│   │   │           ├── getting_started.po
│   │   │           ├── index.po
│   │   │           ├── models/
│   │   │           │   ├── builtin/
│   │   │           │   │   ├── audio/
│   │   │           │   │   │   └── index.po
│   │   │           │   │   ├── embedding/
│   │   │           │   │   │   ├── bge-base-en-v1.5.po
│   │   │           │   │   │   ├── bge-base-en.po
│   │   │           │   │   │   ├── bge-base-zh-v1.5.po
│   │   │           │   │   │   ├── bge-base-zh.po
│   │   │           │   │   │   ├── bge-large-en-v1.5.po
│   │   │           │   │   │   ├── bge-large-en.po
│   │   │           │   │   │   ├── bge-large-zh-noinstruct.po
│   │   │           │   │   │   ├── bge-large-zh-v1.5.po
│   │   │           │   │   │   ├── bge-large-zh.po
│   │   │           │   │   │   ├── bge-small-en-v1.5.po
│   │   │           │   │   │   ├── bge-small-zh-v1.5.po
│   │   │           │   │   │   ├── bge-small-zh.po
│   │   │           │   │   │   ├── e5-large-v2.po
│   │   │           │   │   │   ├── gte-base.po
│   │   │           │   │   │   ├── gte-large.po
│   │   │           │   │   │   ├── index.po
│   │   │           │   │   │   ├── jina-embeddings-v2-base-en.po
│   │   │           │   │   │   ├── jina-embeddings-v2-small-en.po
│   │   │           │   │   │   └── multilingual-e5-large.po
│   │   │           │   │   ├── image/
│   │   │           │   │   │   ├── flux.1-dev.po
│   │   │           │   │   │   ├── flux.1-schnell.po
│   │   │           │   │   │   ├── index.po
│   │   │           │   │   │   ├── kolors.po
│   │   │           │   │   │   ├── sd-turbo.po
│   │   │           │   │   │   ├── sd3-medium.po
│   │   │           │   │   │   ├── sdxl-turbo.po
│   │   │           │   │   │   ├── stable-diffusion-2-inpainting.po
│   │   │           │   │   │   ├── stable-diffusion-inpainting.po
│   │   │           │   │   │   ├── stable-diffusion-v1.5.po
│   │   │           │   │   │   ├── stable-diffusion-xl-base-1.0.po
│   │   │           │   │   │   └── stable-diffusion-xl-inpainting.po
│   │   │           │   │   ├── index.po
│   │   │           │   │   ├── llm/
│   │   │           │   │   │   ├── baichuan-2-chat.po
│   │   │           │   │   │   ├── baichuan-2.po
│   │   │           │   │   │   ├── baichuan-chat.po
│   │   │           │   │   │   ├── baichuan.po
│   │   │           │   │   │   ├── chatglm.po
│   │   │           │   │   │   ├── chatglm2-32k.po
│   │   │           │   │   │   ├── chatglm2.po
│   │   │           │   │   │   ├── chatglm3-32k.po
│   │   │           │   │   │   ├── chatglm3.po
│   │   │           │   │   │   ├── code-llama-instruct.po
│   │   │           │   │   │   ├── code-llama-python.po
│   │   │           │   │   │   ├── code-llama.po
│   │   │           │   │   │   ├── deepseek-chat.po
│   │   │           │   │   │   ├── deepseek-coder-instruct.po
│   │   │           │   │   │   ├── falcon-instruct.po
│   │   │           │   │   │   ├── falcon.po
│   │   │           │   │   │   ├── glaive-coder.po
│   │   │           │   │   │   ├── gorilla-openfunctions-v1.po
│   │   │           │   │   │   ├── gpt-2.po
│   │   │           │   │   │   ├── index.po
│   │   │           │   │   │   ├── internlm-20b.po
│   │   │           │   │   │   ├── internlm-7b.po
│   │   │           │   │   │   ├── internlm-chat-20b.po
│   │   │           │   │   │   ├── internlm-chat-7b.po
│   │   │           │   │   │   ├── llama-2-chat.po
│   │   │           │   │   │   ├── llama-2.po
│   │   │           │   │   │   ├── mistral-instruct-v0.1.po
│   │   │           │   │   │   ├── mistral-instruct-v0.2.po
│   │   │           │   │   │   ├── mistral-v0.1.po
│   │   │           │   │   │   ├── mixtral-instruct-v0.1.po
│   │   │           │   │   │   ├── mixtral-v0.1.po
│   │   │           │   │   │   ├── openbuddy.po
│   │   │           │   │   │   ├── openhermes-2.5.po
│   │   │           │   │   │   ├── opt.po
│   │   │           │   │   │   ├── orca.po
│   │   │           │   │   │   ├── qwen-chat.po
│   │   │           │   │   │   ├── starchat-beta.po
│   │   │           │   │   │   ├── starcoder.po
│   │   │           │   │   │   ├── starcoderplus.po
│   │   │           │   │   │   ├── tiny-llama.po
│   │   │           │   │   │   ├── vicuna-v1.3.po
│   │   │           │   │   │   ├── vicuna-v1.5-16k.po
│   │   │           │   │   │   ├── vicuna-v1.5.po
│   │   │           │   │   │   ├── wizardcoder-python-v1.0.po
│   │   │           │   │   │   ├── wizardlm-v1.0.po
│   │   │           │   │   │   ├── wizardmath-v1.0.po
│   │   │           │   │   │   ├── xverse-chat.po
│   │   │           │   │   │   ├── xverse.po
│   │   │           │   │   │   ├── yi-200k.po
│   │   │           │   │   │   ├── yi-chat.po
│   │   │           │   │   │   ├── yi.po
│   │   │           │   │   │   ├── zephyr-7b-alpha.po
│   │   │           │   │   │   └── zephyr-7b-beta.po
│   │   │           │   │   ├── rerank/
│   │   │           │   │   │   ├── bge-reranker-base.po
│   │   │           │   │   │   ├── bge-reranker-large.po
│   │   │           │   │   │   └── index.po
│   │   │           │   │   └── video/
│   │   │           │   │       ├── cogvideox-2b.po
│   │   │           │   │       └── index.po
│   │   │           │   ├── custom.po
│   │   │           │   ├── index.po
│   │   │           │   ├── lora.po
│   │   │           │   ├── model_abilities/
│   │   │           │   │   ├── audio.po
│   │   │           │   │   ├── chat.po
│   │   │           │   │   ├── embed.po
│   │   │           │   │   ├── flexible.po
│   │   │           │   │   ├── image.po
│   │   │           │   │   ├── index.po
│   │   │           │   │   ├── multimodal.po
│   │   │           │   │   ├── rerank.po
│   │   │           │   │   ├── tools.po
│   │   │           │   │   └── video.po
│   │   │           │   ├── model_memory.po
│   │   │           │   ├── model_update.po
│   │   │           │   ├── source/
│   │   │           │   │   └── source.po
│   │   │           │   ├── sources/
│   │   │           │   │   └── sources.po
│   │   │           │   ├── virtualenv.po
│   │   │           │   ├── xinference_model_hub.po
│   │   │           │   └── xinference_models_hub.po
│   │   │           ├── reference/
│   │   │           │   └── index.po
│   │   │           ├── reference.po
│   │   │           └── user_guide/
│   │   │               ├── auth_system.po
│   │   │               ├── backends.po
│   │   │               ├── cache_management.po
│   │   │               ├── client_api.po
│   │   │               ├── continuous_batching.po
│   │   │               ├── distributed_inference.po
│   │   │               ├── index.po
│   │   │               ├── launch.po
│   │   │               └── vllm_enhancement.po
│   │   ├── models/
│   │   │   ├── builtin/
│   │   │   │   ├── audio/
│   │   │   │   │   ├── belle-distilwhisper-large-v2-zh.rst
│   │   │   │   │   ├── belle-whisper-large-v2-zh.rst
│   │   │   │   │   ├── belle-whisper-large-v3-zh.rst
│   │   │   │   │   ├── chattts.rst
│   │   │   │   │   ├── cosyvoice-300m-instruct.rst
│   │   │   │   │   ├── cosyvoice-300m-sft.rst
│   │   │   │   │   ├── cosyvoice-300m.rst
│   │   │   │   │   ├── cosyvoice2-0.5b.rst
│   │   │   │   │   ├── f5-tts-mlx.rst
│   │   │   │   │   ├── f5-tts.rst
│   │   │   │   │   ├── fishspeech-1.5.rst
│   │   │   │   │   ├── fun-asr-mlt-nano-2512.rst
│   │   │   │   │   ├── fun-asr-nano-2512.rst
│   │   │   │   │   ├── index.rst
│   │   │   │   │   ├── indextts2.rst
│   │   │   │   │   ├── kokoro-82m-mlx.rst
│   │   │   │   │   ├── kokoro-82m-v1.1-zh.rst
│   │   │   │   │   ├── kokoro-82m.rst
│   │   │   │   │   ├── megatts3.rst
│   │   │   │   │   ├── melotts-chinese.rst
│   │   │   │   │   ├── melotts-english-v2.rst
│   │   │   │   │   ├── melotts-english-v3.rst
│   │   │   │   │   ├── melotts-english.rst
│   │   │   │   │   ├── melotts-french.rst
│   │   │   │   │   ├── melotts-japanese.rst
│   │   │   │   │   ├── melotts-korean.rst
│   │   │   │   │   ├── melotts-spanish.rst
│   │   │   │   │   ├── paraformer-zh-hotword.rst
│   │   │   │   │   ├── paraformer-zh-long.rst
│   │   │   │   │   ├── paraformer-zh-spk.rst
│   │   │   │   │   ├── paraformer-zh.rst
│   │   │   │   │   ├── qwen3-asr-0.6b.rst
│   │   │   │   │   ├── qwen3-asr-1.7b.rst
│   │   │   │   │   ├── seaco-paraformer-zh.rst
│   │   │   │   │   ├── sensevoicesmall.rst
│   │   │   │   │   ├── whisper-base-mlx.rst
│   │   │   │   │   ├── whisper-base.en-mlx.rst
│   │   │   │   │   ├── whisper-base.en.rst
│   │   │   │   │   ├── whisper-base.rst
│   │   │   │   │   ├── whisper-large-v3-mlx.rst
│   │   │   │   │   ├── whisper-large-v3-turbo-mlx.rst
│   │   │   │   │   ├── whisper-large-v3-turbo.rst
│   │   │   │   │   ├── whisper-large-v3.rst
│   │   │   │   │   ├── whisper-medium-mlx.rst
│   │   │   │   │   ├── whisper-medium.en-mlx.rst
│   │   │   │   │   ├── whisper-medium.en.rst
│   │   │   │   │   ├── whisper-medium.rst
│   │   │   │   │   ├── whisper-small-mlx.rst
│   │   │   │   │   ├── whisper-small.en-mlx.rst
│   │   │   │   │   ├── whisper-small.en.rst
│   │   │   │   │   ├── whisper-small.rst
│   │   │   │   │   ├── whisper-tiny-mlx.rst
│   │   │   │   │   ├── whisper-tiny.en-mlx.rst
│   │   │   │   │   ├── whisper-tiny.en.rst
│   │   │   │   │   └── whisper-tiny.rst
│   │   │   │   ├── embedding/
│   │   │   │   │   ├── bce-embedding-base_v1.rst
│   │   │   │   │   ├── bge-base-en-v1.5.rst
│   │   │   │   │   ├── bge-base-en.rst
│   │   │   │   │   ├── bge-base-zh-v1.5.rst
│   │   │   │   │   ├── bge-base-zh.rst
│   │   │   │   │   ├── bge-large-en-v1.5.rst
│   │   │   │   │   ├── bge-large-en.rst
│   │   │   │   │   ├── bge-large-zh-noinstruct.rst
│   │   │   │   │   ├── bge-large-zh-v1.5.rst
│   │   │   │   │   ├── bge-large-zh.rst
│   │   │   │   │   ├── bge-m3.rst
│   │   │   │   │   ├── bge-small-en-v1.5.rst
│   │   │   │   │   ├── bge-small-zh-v1.5.rst
│   │   │   │   │   ├── bge-small-zh.rst
│   │   │   │   │   ├── e5-large-v2.rst
│   │   │   │   │   ├── gme-qwen2-vl-2b-instruct.rst
│   │   │   │   │   ├── gme-qwen2-vl-7b-instruct.rst
│   │   │   │   │   ├── gte-base.rst
│   │   │   │   │   ├── gte-large.rst
│   │   │   │   │   ├── gte-qwen2.rst
│   │   │   │   │   ├── index.rst
│   │   │   │   │   ├── jina-clip-v2.rst
│   │   │   │   │   ├── jina-embeddings-v2-base-en.rst
│   │   │   │   │   ├── jina-embeddings-v2-base-zh.rst
│   │   │   │   │   ├── jina-embeddings-v2-small-en.rst
│   │   │   │   │   ├── jina-embeddings-v3.rst
│   │   │   │   │   ├── jina-embeddings-v4.rst
│   │   │   │   │   ├── m3e-base.rst
│   │   │   │   │   ├── m3e-large.rst
│   │   │   │   │   ├── m3e-small.rst
│   │   │   │   │   ├── multilingual-e5-large.rst
│   │   │   │   │   ├── qwen3-embedding-0.6b.rst
│   │   │   │   │   ├── qwen3-embedding-4b.rst
│   │   │   │   │   ├── qwen3-embedding-8b.rst
│   │   │   │   │   ├── qwen3-vl-embedding-2b.rst
│   │   │   │   │   ├── qwen3-vl-embedding-8b.rst
│   │   │   │   │   ├── text2vec-base-chinese-paraphrase.rst
│   │   │   │   │   ├── text2vec-base-chinese-sentence.rst
│   │   │   │   │   ├── text2vec-base-chinese.rst
│   │   │   │   │   ├── text2vec-base-multilingual.rst
│   │   │   │   │   └── text2vec-large-chinese.rst
│   │   │   │   ├── image/
│   │   │   │   │   ├── cogview4.rst
│   │   │   │   │   ├── deepseek-ocr.rst
│   │   │   │   │   ├── flux.1-dev.rst
│   │   │   │   │   ├── flux.1-kontext-dev.rst
│   │   │   │   │   ├── flux.1-schnell.rst
│   │   │   │   │   ├── flux.2-dev.rst
│   │   │   │   │   ├── flux.2-klein-4b.rst
│   │   │   │   │   ├── flux.2-klein-9b.rst
│   │   │   │   │   ├── got-ocr2_0.rst
│   │   │   │   │   ├── hunyuandit-v1.2-distilled.rst
│   │   │   │   │   ├── hunyuandit-v1.2.rst
│   │   │   │   │   ├── hunyuanocr.rst
│   │   │   │   │   ├── index.rst
│   │   │   │   │   ├── kolors.rst
│   │   │   │   │   ├── mineru2.5-2509-1.2b.rst
│   │   │   │   │   ├── paddleocr-vl.rst
│   │   │   │   │   ├── qwen-image-2512.rst
│   │   │   │   │   ├── qwen-image-edit-2509.rst
│   │   │   │   │   ├── qwen-image-edit-2511.rst
│   │   │   │   │   ├── qwen-image-edit.rst
│   │   │   │   │   ├── qwen-image-layered.rst
│   │   │   │   │   ├── qwen-image.rst
│   │   │   │   │   ├── sd-turbo.rst
│   │   │   │   │   ├── sd3-medium.rst
│   │   │   │   │   ├── sd3.5-large-turbo.rst
│   │   │   │   │   ├── sd3.5-large.rst
│   │   │   │   │   ├── sd3.5-medium.rst
│   │   │   │   │   ├── sdxl-turbo.rst
│   │   │   │   │   ├── stable-diffusion-2-inpainting.rst
│   │   │   │   │   ├── stable-diffusion-inpainting.rst
│   │   │   │   │   ├── stable-diffusion-v1.5.rst
│   │   │   │   │   ├── stable-diffusion-xl-base-1.0.rst
│   │   │   │   │   ├── stable-diffusion-xl-inpainting.rst
│   │   │   │   │   ├── z-image-turbo.rst
│   │   │   │   │   └── z-image.rst
│   │   │   │   ├── index.rst
│   │   │   │   ├── llm/
│   │   │   │   │   ├── baichuan-2-chat.rst
│   │   │   │   │   ├── baichuan-2.rst
│   │   │   │   │   ├── baichuan-m2.rst
│   │   │   │   │   ├── code-llama-instruct.rst
│   │   │   │   │   ├── code-llama-python.rst
│   │   │   │   │   ├── code-llama.rst
│   │   │   │   │   ├── codegeex4.rst
│   │   │   │   │   ├── codeqwen1.5-chat.rst
│   │   │   │   │   ├── codeqwen1.5.rst
│   │   │   │   │   ├── codeshell-chat.rst
│   │   │   │   │   ├── codeshell.rst
│   │   │   │   │   ├── codestral-v0.1.rst
│   │   │   │   │   ├── cogagent.rst
│   │   │   │   │   ├── deepseek-chat.rst
│   │   │   │   │   ├── deepseek-coder-instruct.rst
│   │   │   │   │   ├── deepseek-coder.rst
│   │   │   │   │   ├── deepseek-prover-v2.rst
│   │   │   │   │   ├── deepseek-r1-0528-qwen3.rst
│   │   │   │   │   ├── deepseek-r1-0528.rst
│   │   │   │   │   ├── deepseek-r1-distill-llama.rst
│   │   │   │   │   ├── deepseek-r1-distill-qwen.rst
│   │   │   │   │   ├── deepseek-r1.rst
│   │   │   │   │   ├── deepseek-v2-chat-0628.rst
│   │   │   │   │   ├── deepseek-v2-chat.rst
│   │   │   │   │   ├── deepseek-v2.5.rst
│   │   │   │   │   ├── deepseek-v3-0324.rst
│   │   │   │   │   ├── deepseek-v3.1.rst
│   │   │   │   │   ├── deepseek-v3.2-exp.rst
│   │   │   │   │   ├── deepseek-v3.2.rst
│   │   │   │   │   ├── deepseek-v3.rst
│   │   │   │   │   ├── deepseek-vl2.rst
│   │   │   │   │   ├── deepseek.rst
│   │   │   │   │   ├── dianjin-r1.rst
│   │   │   │   │   ├── ernie4.5.rst
│   │   │   │   │   ├── fin-r1.rst
│   │   │   │   │   ├── gemma-3-1b-it.rst
│   │   │   │   │   ├── gemma-3-it.rst
│   │   │   │   │   ├── glm-4.1v-thinking.rst
│   │   │   │   │   ├── glm-4.5.rst
│   │   │   │   │   ├── glm-4.5v.rst
│   │   │   │   │   ├── glm-4.6.rst
│   │   │   │   │   ├── glm-4.7-flash.rst
│   │   │   │   │   ├── glm-4.7.rst
│   │   │   │   │   ├── glm-4v.rst
│   │   │   │   │   ├── glm-5.rst
│   │   │   │   │   ├── glm-edge-chat.rst
│   │   │   │   │   ├── glm4-0414.rst
│   │   │   │   │   ├── glm4-chat-1m.rst
│   │   │   │   │   ├── glm4-chat.rst
│   │   │   │   │   ├── gorilla-openfunctions-v2.rst
│   │   │   │   │   ├── gpt-2.rst
│   │   │   │   │   ├── gpt-oss.rst
│   │   │   │   │   ├── huatuogpt-o1-llama-3.1.rst
│   │   │   │   │   ├── huatuogpt-o1-qwen2.5.rst
│   │   │   │   │   ├── index.rst
│   │   │   │   │   ├── internlm3-instruct.rst
│   │   │   │   │   ├── internvl3.rst
│   │   │   │   │   ├── kat-v1.rst
│   │   │   │   │   ├── kimi-k2.5.rst
│   │   │   │   │   ├── llama-2-chat.rst
│   │   │   │   │   ├── llama-2.rst
│   │   │   │   │   ├── llama-3-instruct.rst
│   │   │   │   │   ├── llama-3.1-instruct.rst
│   │   │   │   │   ├── llama-3.1.rst
│   │   │   │   │   ├── llama-3.2-vision-instruct.rst
│   │   │   │   │   ├── llama-3.2-vision.rst
│   │   │   │   │   ├── llama-3.3-instruct.rst
│   │   │   │   │   ├── llama-3.rst
│   │   │   │   │   ├── marco-o1.rst
│   │   │   │   │   ├── mineru2.5-2509-1.2b.rst
│   │   │   │   │   ├── minicpm-2b-dpo-bf16.rst
│   │   │   │   │   ├── minicpm-2b-dpo-fp16.rst
│   │   │   │   │   ├── minicpm-2b-dpo-fp32.rst
│   │   │   │   │   ├── minicpm-2b-sft-bf16.rst
│   │   │   │   │   ├── minicpm-2b-sft-fp32.rst
│   │   │   │   │   ├── minicpm-v-2.6.rst
│   │   │   │   │   ├── minicpm-v-4.5.rst
│   │   │   │   │   ├── minicpm3-4b.rst
│   │   │   │   │   ├── minicpm4.rst
│   │   │   │   │   ├── minimax-m2.5.rst
│   │   │   │   │   ├── minimax-m2.rst
│   │   │   │   │   ├── mistral-instruct-v0.1.rst
│   │   │   │   │   ├── mistral-instruct-v0.2.rst
│   │   │   │   │   ├── mistral-instruct-v0.3.rst
│   │   │   │   │   ├── mistral-large-instruct.rst
│   │   │   │   │   ├── mistral-nemo-instruct.rst
│   │   │   │   │   ├── mistral-v0.1.rst
│   │   │   │   │   ├── mixtral-8x22b-instruct-v0.1.rst
│   │   │   │   │   ├── mixtral-instruct-v0.1.rst
│   │   │   │   │   ├── mixtral-v0.1.rst
│   │   │   │   │   ├── moonlight-16b-a3b-instruct.rst
│   │   │   │   │   ├── openhermes-2.5.rst
│   │   │   │   │   ├── opt.rst
│   │   │   │   │   ├── orion-chat.rst
│   │   │   │   │   ├── ovis2.rst
│   │   │   │   │   ├── phi-2.rst
│   │   │   │   │   ├── phi-3-mini-128k-instruct.rst
│   │   │   │   │   ├── phi-3-mini-4k-instruct.rst
│   │   │   │   │   ├── qvq-72b-preview.rst
│   │   │   │   │   ├── qwen-chat.rst
│   │   │   │   │   ├── qwen1.5-chat.rst
│   │   │   │   │   ├── qwen1.5-moe-chat.rst
│   │   │   │   │   ├── qwen2-audio-instruct.rst
│   │   │   │   │   ├── qwen2-instruct.rst
│   │   │   │   │   ├── qwen2-moe-instruct.rst
│   │   │   │   │   ├── qwen2-vl-instruct.rst
│   │   │   │   │   ├── qwen2.5-coder-instruct.rst
│   │   │   │   │   ├── qwen2.5-coder.rst
│   │   │   │   │   ├── qwen2.5-instruct-1m.rst
│   │   │   │   │   ├── qwen2.5-instruct.rst
│   │   │   │   │   ├── qwen2.5-omni.rst
│   │   │   │   │   ├── qwen2.5-vl-instruct.rst
│   │   │   │   │   ├── qwen2.5.rst
│   │   │   │   │   ├── qwen3-coder.rst
│   │   │   │   │   ├── qwen3-instruct.rst
│   │   │   │   │   ├── qwen3-next-instruct.rst
│   │   │   │   │   ├── qwen3-next-thinking.rst
│   │   │   │   │   ├── qwen3-omni-instruct.rst
│   │   │   │   │   ├── qwen3-omni-thinking.rst
│   │   │   │   │   ├── qwen3-thinking.rst
│   │   │   │   │   ├── qwen3-vl-instruct.rst
│   │   │   │   │   ├── qwen3-vl-thinking.rst
│   │   │   │   │   ├── qwen3.5.rst
│   │   │   │   │   ├── qwen3.rst
│   │   │   │   │   ├── qwenlong-l1.rst
│   │   │   │   │   ├── qwq-32b-preview.rst
│   │   │   │   │   ├── qwq-32b.rst
│   │   │   │   │   ├── seallm_v2.5.rst
│   │   │   │   │   ├── seallm_v2.rst
│   │   │   │   │   ├── seallms-v3.rst
│   │   │   │   │   ├── seed-oss.rst
│   │   │   │   │   ├── skywork-math.rst
│   │   │   │   │   ├── skywork-or1-preview.rst
│   │   │   │   │   ├── skywork-or1.rst
│   │   │   │   │   ├── skywork.rst
│   │   │   │   │   ├── telechat.rst
│   │   │   │   │   ├── tiny-llama.rst
│   │   │   │   │   ├── wizardcoder-python-v1.0.rst
│   │   │   │   │   ├── wizardmath-v1.0.rst
│   │   │   │   │   ├── xiyansql-qwencoder-2504.rst
│   │   │   │   │   ├── xverse-chat.rst
│   │   │   │   │   ├── xverse.rst
│   │   │   │   │   ├── yi-1.5-chat-16k.rst
│   │   │   │   │   ├── yi-1.5-chat.rst
│   │   │   │   │   ├── yi-1.5.rst
│   │   │   │   │   ├── yi-200k.rst
│   │   │   │   │   ├── yi-chat.rst
│   │   │   │   │   └── yi.rst
│   │   │   │   ├── rerank/
│   │   │   │   │   ├── bce-reranker-base_v1.rst
│   │   │   │   │   ├── bge-reranker-base.rst
│   │   │   │   │   ├── bge-reranker-large.rst
│   │   │   │   │   ├── bge-reranker-v2-gemma.rst
│   │   │   │   │   ├── bge-reranker-v2-m3.rst
│   │   │   │   │   ├── bge-reranker-v2-minicpm-layerwise.rst
│   │   │   │   │   ├── index.rst
│   │   │   │   │   ├── jina-reranker-v2.rst
│   │   │   │   │   ├── jina-reranker-v3.rst
│   │   │   │   │   ├── minicpm-reranker.rst
│   │   │   │   │   ├── qwen3-reranker-0.6b.rst
│   │   │   │   │   ├── qwen3-reranker-4b.rst
│   │   │   │   │   ├── qwen3-reranker-8b.rst
│   │   │   │   │   ├── qwen3-vl-reranker-2b.rst
│   │   │   │   │   └── qwen3-vl-reranker-8b.rst
│   │   │   │   └── video/
│   │   │   │       ├── cogvideox-2b.rst
│   │   │   │       ├── cogvideox-5b.rst
│   │   │   │       ├── hunyuanvideo.rst
│   │   │   │       ├── index.rst
│   │   │   │       ├── wan2.1-1.3b.rst
│   │   │   │       ├── wan2.1-14b.rst
│   │   │   │       ├── wan2.1-flf2v-14b-720p.rst
│   │   │   │       ├── wan2.1-i2v-14b-480p.rst
│   │   │   │       ├── wan2.1-i2v-14b-720p.rst
│   │   │   │       ├── wan2.2-a14b.rst
│   │   │   │       ├── wan2.2-i2v-a14b.rst
│   │   │   │       └── wan2.2-ti2v-5b.rst
│   │   │   ├── custom.rst
│   │   │   ├── index.rst
│   │   │   ├── lora.rst
│   │   │   ├── model_abilities/
│   │   │   │   ├── audio.rst
│   │   │   │   ├── chat.rst
│   │   │   │   ├── embed.rst
│   │   │   │   ├── flexible.rst
│   │   │   │   ├── image.rst
│   │   │   │   ├── index.rst
│   │   │   │   ├── multimodal.rst
│   │   │   │   ├── rerank.rst
│   │   │   │   ├── tools.rst
│   │   │   │   └── video.rst
│   │   │   ├── model_memory.rst
│   │   │   ├── model_update.rst
│   │   │   ├── sources/
│   │   │   │   └── sources.rst
│   │   │   ├── virtualenv.rst
│   │   │   └── xinference_models_hub.rst
│   │   ├── norm_zh.py
│   │   ├── reference/
│   │   │   └── index.rst
│   │   └── user_guide/
│   │       ├── auth_system.rst
│   │       ├── backends.rst
│   │       ├── client_api.rst
│   │       ├── continuous_batching.rst
│   │       ├── distributed_inference.rst
│   │       ├── index.rst
│   │       ├── launch.rst
│   │       ├── metrics.rst
│   │       └── vllm_enhancement.rst
│   └── templates/
│       ├── audio.rst.jinja
│       ├── audio_index.rst.jinja
│       ├── embedding.rst.jinja
│       ├── embedding_index.rst.jinja
│       ├── image.rst.jinja
│       ├── image_index.rst.jinja
│       ├── llm.rst.jinja
│       ├── llm_index.rst.jinja
│       ├── metrics.jinja
│       ├── rerank.rst.jinja
│       ├── rerank_index.rst.jinja
│       ├── video.rst.jinja
│       └── video_index.rst.jinja
├── examples/
│   ├── AI_podcast.py
│   ├── AI_podcast_ZH.py
│   ├── AI_translate.py
│   ├── Custom_StableDiffusion_ControlNet.ipynb
│   ├── FunctionCall.ipynb
│   ├── LangChain_QA.ipynb
│   ├── LangChain_Streamlit_Doc_Chat.py
│   ├── StableDiffusionControlNet.ipynb
│   ├── Xinference_Quick_Start.ipynb
│   ├── audio_to_text.ipynb
│   ├── chat.py
│   ├── chat_vl.ipynb
│   └── gradio_chatinterface.py
├── pyproject.toml
├── setup.cfg
├── setup.py
├── versioneer.py
└── xinference/
    ├── __init__.py
    ├── _compat.py
    ├── _version.py
    ├── api/
    │   ├── __init__.py
    │   ├── dependencies.py
    │   ├── oauth2/
    │   │   ├── __init__.py
    │   │   ├── auth_service.py
    │   │   ├── types.py
    │   │   └── utils.py
    │   ├── responses.py
    │   ├── restful_api.py
    │   ├── routers/
    │   │   ├── __init__.py
    │   │   ├── admin.py
    │   │   ├── audio.py
    │   │   ├── embeddings.py
    │   │   ├── images.py
    │   │   ├── llm.py
    │   │   ├── models.py
    │   │   ├── rerank.py
    │   │   └── videos.py
    │   ├── schemas/
    │   │   ├── __init__.py
    │   │   └── requests.py
    │   ├── tests/
    │   │   ├── __init__.py
    │   │   ├── test_admin.py
    │   │   └── test_utils.py
    │   └── utils.py
    ├── client/
    │   ├── __init__.py
    │   ├── common.py
    │   ├── handlers.py
    │   ├── restful/
    │   │   ├── __init__.py
    │   │   ├── async_restful_client.py
    │   │   └── restful_client.py
    │   └── tests/
    │       ├── __init__.py
    │       ├── test_async_client.py
    │       ├── test_async_client_with_auth.py
    │       ├── test_client.py
    │       └── test_client_with_auth.py
    ├── conftest.py
    ├── constants.py
    ├── core/
    │   ├── __init__.py
    │   ├── cache_tracker.py
    │   ├── event.py
    │   ├── launch_strategy.py
    │   ├── metrics.py
    │   ├── model.py
    │   ├── otel.py
    │   ├── progress_tracker.py
    │   ├── resource.py
    │   ├── status_guard.py
    │   ├── supervisor.py
    │   ├── tests/
    │   │   ├── __init__.py
    │   │   ├── test_continuous_batching.py
    │   │   ├── test_launch_strategy.py
    │   │   ├── test_metrics.py
    │   │   ├── test_model.py
    │   │   ├── test_progressor.py
    │   │   ├── test_restful_api.py
    │   │   ├── test_types.py
    │   │   ├── test_utils.py
    │   │   └── test_worker.py
    │   ├── utils.py
    │   ├── virtual_env_manager.py
    │   └── worker.py
    ├── deploy/
    │   ├── __init__.py
    │   ├── cmdline.py
    │   ├── docker/
    │   │   ├── Dockerfile
    │   │   ├── Dockerfile.cpu
    │   │   ├── docker-compose-distributed.yml
    │   │   ├── docker-compose.yml
    │   │   ├── requirements/
    │   │   │   ├── requirements-base.txt
    │   │   │   ├── requirements-ml.txt
    │   │   │   └── requirements-models.txt
    │   │   └── requirements_cpu/
    │   │       ├── requirements_cpu-base.txt
    │   │       ├── requirements_cpu-ml.txt
    │   │       └── requirements_cpu-models.txt
    │   ├── local.py
    │   ├── supervisor.py
    │   ├── test/
    │   │   ├── __init__.py
    │   │   └── test_cmdline.py
    │   ├── utils.py
    │   └── worker.py
    ├── device_utils.py
    ├── fields.py
    ├── isolation.py
    ├── model/
    │   ├── __init__.py
    │   ├── audio/
    │   │   ├── __init__.py
    │   │   ├── chattts.py
    │   │   ├── core.py
    │   │   ├── cosyvoice.py
    │   │   ├── custom.py
    │   │   ├── f5tts.py
    │   │   ├── f5tts_mlx.py
    │   │   ├── fish_speech.py
    │   │   ├── funasr.py
    │   │   ├── indextts2.py
    │   │   ├── kokoro.py
    │   │   ├── kokoro_mlx.py
    │   │   ├── kokoro_zh.py
    │   │   ├── megatts.py
    │   │   ├── melotts.py
    │   │   ├── model_spec.json
    │   │   ├── qwen3_asr.py
    │   │   ├── tests/
    │   │   │   ├── __init__.py
    │   │   │   ├── bbc_news.npy
    │   │   │   ├── jfk.flac
    │   │   │   ├── test_chattts.py
    │   │   │   ├── test_cosyvoice.py
    │   │   │   ├── test_f5tts.py
    │   │   │   ├── test_f5tts_mlx.py
    │   │   │   ├── test_fish_speech.py
    │   │   │   ├── test_funasr.py
    │   │   │   ├── test_kokoro.py
    │   │   │   ├── test_megatts.py
    │   │   │   ├── test_melotts.py
    │   │   │   ├── test_whisper.py
    │   │   │   └── test_whisper_mlx.py
    │   │   ├── utils.py
    │   │   ├── whisper.py
    │   │   └── whisper_mlx.py
    │   ├── batch.py
    │   ├── cache_manager.py
    │   ├── core.py
    │   ├── custom.py
    │   ├── embedding/
    │   │   ├── __init__.py
    │   │   ├── cache_manager.py
    │   │   ├── core.py
    │   │   ├── custom.py
    │   │   ├── embed_family.py
    │   │   ├── flag/
    │   │   │   ├── __init__.py
    │   │   │   ├── core.py
    │   │   │   └── tests/
    │   │   │       ├── __init__.py
    │   │   │       └── test_flag.py
    │   │   ├── llama_cpp/
    │   │   │   ├── __init__.py
    │   │   │   ├── core.py
    │   │   │   └── tests/
    │   │   │       ├── __init__.py
    │   │   │       └── test_llama_cpp.py
    │   │   ├── model_spec.json
    │   │   ├── sentence_transformers/
    │   │   │   ├── __init__.py
    │   │   │   ├── core.py
    │   │   │   └── tests/
    │   │   │       ├── __init__.py
    │   │   │       └── test_sentence_transformers.py
    │   │   ├── tests/
    │   │   │   ├── __init__.py
    │   │   │   ├── test_embedding_models.py
    │   │   │   ├── test_integrated_embedding.py
    │   │   │   └── test_qwen3_vl_engine_params.py
    │   │   └── vllm/
    │   │       ├── __init__.py
    │   │       ├── core.py
    │   │       └── tests/
    │   │           ├── __init__.py
    │   │           └── test_vllm_embedding.py
    │   ├── flexible/
    │   │   ├── __init__.py
    │   │   ├── core.py
    │   │   ├── custom.py
    │   │   ├── launchers/
    │   │   │   ├── __init__.py
    │   │   │   ├── image_process_launcher.py
    │   │   │   ├── modelscope_launcher.py
    │   │   │   ├── transformers_launcher.py
    │   │   │   └── yolo_launcher.py
    │   │   ├── tests/
    │   │   │   ├── __init__.py
    │   │   │   └── test_flexible_models.py
    │   │   └── utils.py
    │   ├── image/
    │   │   ├── __init__.py
    │   │   ├── cache_manager.py
    │   │   ├── core.py
    │   │   ├── custom.py
    │   │   ├── engine.py
    │   │   ├── engine_family.py
    │   │   ├── model_spec.json
    │   │   ├── ocr/
    │   │   │   ├── __init__.py
    │   │   │   ├── deepseek_ocr.py
    │   │   │   ├── got_ocr2.py
    │   │   │   ├── hunyuan_ocr.py
    │   │   │   ├── mlx.py
    │   │   │   ├── ocr_family.py
    │   │   │   ├── paddleocr_vl.py
    │   │   │   └── vllm.py
    │   │   ├── scheduler/
    │   │   │   ├── __init__.py
    │   │   │   └── flux.py
    │   │   ├── sdapi.py
    │   │   ├── stable_diffusion/
    │   │   │   ├── __init__.py
    │   │   │   ├── core.py
    │   │   │   └── mlx.py
    │   │   ├── tests/
    │   │   │   ├── __init__.py
    │   │   │   ├── test_got_ocr2.py
    │   │   │   └── test_stable_diffusion.py
    │   │   └── utils.py
    │   ├── llm/
    │   │   ├── __init__.py
    │   │   ├── cache_manager.py
    │   │   ├── config_parser.py
    │   │   ├── core.py
    │   │   ├── custom.py
    │   │   ├── harmony.py
    │   │   ├── llama_cpp/
    │   │   │   ├── __init__.py
    │   │   │   ├── core.py
    │   │   │   └── tests/
    │   │   │       ├── __init__.py
    │   │   │       ├── test_gguf.py
    │   │   │       └── test_structured.py
    │   │   ├── llm_family.json
    │   │   ├── llm_family.py
    │   │   ├── lmdeploy/
    │   │   │   ├── __init__.py
    │   │   │   ├── core.py
    │   │   │   └── tests/
    │   │   │       └── __init__.py
    │   │   ├── memory.py
    │   │   ├── mlx/
    │   │   │   ├── __init__.py
    │   │   │   ├── core.py
    │   │   │   ├── distributed_models/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── core.py
    │   │   │   │   ├── deepseek_v3.py
    │   │   │   │   ├── qwen2.py
    │   │   │   │   ├── qwen3.py
    │   │   │   │   └── qwen3_moe.py
    │   │   │   └── tests/
    │   │   │       ├── __init__.py
    │   │   │       ├── test_distributed_model.py
    │   │   │       └── test_mlx.py
    │   │   ├── reasoning_parser.py
    │   │   ├── sglang/
    │   │   │   ├── __init__.py
    │   │   │   └── core.py
    │   │   ├── tests/
    │   │   │   ├── __init__.py
    │   │   │   ├── test_harmony.py
    │   │   │   ├── test_llm_family.py
    │   │   │   ├── test_llm_model.py
    │   │   │   ├── test_memory_estimate.py
    │   │   │   ├── test_multimodal.py
    │   │   │   ├── test_stream_options.py
    │   │   │   └── test_utils.py
    │   │   ├── tool_parsers/
    │   │   │   ├── __init__.py
    │   │   │   ├── abstract_tool_parser.py
    │   │   │   ├── deepseek_r1_tool_parser.py
    │   │   │   ├── deepseek_v3_1_tool_parser.py
    │   │   │   ├── deepseek_v3_tool_parser.py
    │   │   │   ├── glm4_tool_parser.py
    │   │   │   ├── llama3_tool_parser.py
    │   │   │   ├── minimax_tool_parser.py
    │   │   │   ├── qwen_tool_parser.py
    │   │   │   └── tests/
    │   │   │       ├── __init__.py
    │   │   │       ├── test_deepseek_r1_tool_parser.py
    │   │   │       ├── test_deepseek_v3_1_tool_parser.py
    │   │   │       ├── test_deepseek_v3_tool_parser.py
    │   │   │       ├── test_glm4_tool_parser.py
    │   │   │       ├── test_llama3_tool_parser.py
    │   │   │       └── test_qwen_tool_parser.py
    │   │   ├── transformers/
    │   │   │   ├── __init__.py
    │   │   │   ├── chatglm.py
    │   │   │   ├── core.py
    │   │   │   ├── deepseek_v2.py
    │   │   │   ├── gemma3.py
    │   │   │   ├── gpt_oss.py
    │   │   │   ├── multimodal/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── cogagent.py
    │   │   │   │   ├── core.py
    │   │   │   │   ├── deepseek_vl2.py
    │   │   │   │   ├── gemma3.py
    │   │   │   │   ├── glm4_1v.py
    │   │   │   │   ├── glm4v.py
    │   │   │   │   ├── intern_vl.py
    │   │   │   │   ├── minicpmv26.py
    │   │   │   │   ├── minicpmv45.py
    │   │   │   │   ├── ovis2.py
    │   │   │   │   ├── qwen-omni.py
    │   │   │   │   ├── qwen2_audio.py
    │   │   │   │   └── qwen2_vl.py
    │   │   │   ├── opt.py
    │   │   │   ├── tensorizer_utils.py
    │   │   │   ├── tests/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── test_opt.py
    │   │   │   │   └── test_tensorizer.py
    │   │   │   └── utils.py
    │   │   ├── utils.py
    │   │   └── vllm/
    │   │       ├── __init__.py
    │   │       ├── core.py
    │   │       ├── distributed_executor.py
    │   │       ├── distributed_executor_v1.py
    │   │       ├── tests/
    │   │       │   ├── __init__.py
    │   │       │   ├── test_core_chat_model.py
    │   │       │   └── test_distributed_executor.py
    │   │       ├── utils.py
    │   │       └── xavier/
    │   │           ├── __init__.py
    │   │           ├── allocator.py
    │   │           ├── block.py
    │   │           ├── block_manager.py
    │   │           ├── block_tracker.py
    │   │           ├── collective.py
    │   │           ├── collective_manager.py
    │   │           ├── engine.py
    │   │           ├── executor.py
    │   │           ├── scheduler.py
    │   │           ├── test/
    │   │           │   ├── __init__.py
    │   │           │   └── test_xavier.py
    │   │           ├── transfer.py
    │   │           └── utils.py
    │   ├── rerank/
    │   │   ├── __init__.py
    │   │   ├── cache_manager.py
    │   │   ├── core.py
    │   │   ├── custom.py
    │   │   ├── llama_cpp/
    │   │   │   ├── __init__.py
    │   │   │   ├── core.py
    │   │   │   └── tests/
    │   │   │       ├── __init__.py
    │   │   │       └── test_llama_cpp.py
    │   │   ├── model_spec.json
    │   │   ├── rerank_family.py
    │   │   ├── sentence_transformers/
    │   │   │   ├── __init__.py
    │   │   │   ├── core.py
    │   │   │   └── tests/
    │   │   │       ├── __init__.py
    │   │   │       └── test_sentence_transformers.py
    │   │   ├── tests/
    │   │   │   ├── __init__.py
    │   │   │   ├── test_qwen3_vl_reranker_virtualenv.py
    │   │   │   └── test_rerank.py
    │   │   ├── utils.py
    │   │   └── vllm/
    │   │       ├── __init__.py
    │   │       ├── core.py
    │   │       └── tests/
    │   │           ├── __init__.py
    │   │           └── test_vllm.py
    │   ├── scheduler/
    │   │   ├── __init__.py
    │   │   ├── batch.py
    │   │   ├── core.py
    │   │   └── request.py
    │   ├── tests/
    │   │   ├── __init__.py
    │   │   └── test_utils.py
    │   ├── utils.py
    │   └── video/
    │       ├── __init__.py
    │       ├── cache_manager.py
    │       ├── core.py
    │       ├── diffusers.py
    │       ├── model_spec.json
    │       └── tests/
    │           ├── __init__.py
    │           └── test_diffusers_video.py
    ├── thirdparty/
    │   ├── __init__.py
    │   ├── audiotools/
    │   │   ├── __init__.py
    │   │   ├── core/
    │   │   │   ├── __init__.py
    │   │   │   ├── audio_signal.py
    │   │   │   ├── display.py
    │   │   │   ├── dsp.py
    │   │   │   ├── effects.py
    │   │   │   ├── ffmpeg.py
    │   │   │   ├── loudness.py
    │   │   │   ├── playback.py
    │   │   │   ├── templates/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── headers.html
    │   │   │   │   ├── pandoc.css
    │   │   │   │   └── widget.html
    │   │   │   ├── util.py
    │   │   │   └── whisper.py
    │   │   ├── data/
    │   │   │   ├── __init__.py
    │   │   │   ├── datasets.py
    │   │   │   ├── preprocess.py
    │   │   │   └── transforms.py
    │   │   ├── metrics/
    │   │   │   ├── __init__.py
    │   │   │   ├── distance.py
    │   │   │   ├── quality.py
    │   │   │   └── spectral.py
    │   │   ├── ml/
    │   │   │   ├── __init__.py
    │   │   │   ├── accelerator.py
    │   │   │   ├── decorators.py
    │   │   │   ├── experiment.py
    │   │   │   └── layers/
    │   │   │       ├── __init__.py
    │   │   │       ├── base.py
    │   │   │       └── spectral_gate.py
    │   │   ├── post.py
    │   │   └── preference.py
    │   ├── cosyvoice/
    │   │   ├── __init__.py
    │   │   ├── bin/
    │   │   │   ├── average_model.py
    │   │   │   ├── export_jit.py
    │   │   │   ├── export_onnx.py
    │   │   │   ├── inference_deprecated.py
    │   │   │   ├── spk2info.pt
    │   │   │   └── train.py
    │   │   ├── cli/
    │   │   │   ├── __init__.py
    │   │   │   ├── cosyvoice.py
    │   │   │   ├── frontend.py
    │   │   │   └── model.py
    │   │   ├── dataset/
    │   │   │   ├── __init__.py
    │   │   │   ├── dataset.py
    │   │   │   └── processor.py
    │   │   ├── flow/
    │   │   │   ├── decoder.py
    │   │   │   ├── flow.py
    │   │   │   ├── flow_matching.py
    │   │   │   └── length_regulator.py
    │   │   ├── hifigan/
    │   │   │   ├── discriminator.py
    │   │   │   ├── f0_predictor.py
    │   │   │   ├── generator.py
    │   │   │   └── hifigan.py
    │   │   ├── llm/
    │   │   │   └── llm.py
    │   │   ├── tokenizer/
    │   │   │   ├── assets/
    │   │   │   │   └── multilingual_zh_ja_yue_char_del.tiktoken
    │   │   │   └── tokenizer.py
    │   │   ├── transformer/
    │   │   │   ├── __init__.py
    │   │   │   ├── activation.py
    │   │   │   ├── attention.py
    │   │   │   ├── convolution.py
    │   │   │   ├── decoder.py
    │   │   │   ├── decoder_layer.py
    │   │   │   ├── embedding.py
    │   │   │   ├── encoder.py
    │   │   │   ├── encoder_layer.py
    │   │   │   ├── label_smoothing_loss.py
    │   │   │   ├── positionwise_feed_forward.py
    │   │   │   ├── subsampling.py
    │   │   │   └── upsample_encoder.py
    │   │   ├── utils/
    │   │   │   ├── __init__.py
    │   │   │   ├── class_utils.py
    │   │   │   ├── common.py
    │   │   │   ├── executor.py
    │   │   │   ├── file_utils.py
    │   │   │   ├── frontend_utils.py
    │   │   │   ├── losses.py
    │   │   │   ├── mask.py
    │   │   │   ├── scheduler.py
    │   │   │   └── train_utils.py
    │   │   └── vllm/
    │   │       └── cosyvoice2.py
    │   ├── deepseek_vl/
    │   │   ├── __init__.py
    │   │   ├── models/
    │   │   │   ├── __init__.py
    │   │   │   ├── clip_encoder.py
    │   │   │   ├── image_processing_vlm.py
    │   │   │   ├── modeling_vlm.py
    │   │   │   ├── processing_vlm.py
    │   │   │   ├── projector.py
    │   │   │   ├── sam.py
    │   │   │   └── siglip_vit.py
    │   │   ├── serve/
    │   │   │   ├── __init__.py
    │   │   │   ├── app_deepseek.py
    │   │   │   ├── app_modules/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── gradio_utils.py
    │   │   │   │   ├── overwrites.py
    │   │   │   │   ├── presets.py
    │   │   │   │   └── utils.py
    │   │   │   ├── assets/
    │   │   │   │   ├── Kelpy-Codos.js
    │   │   │   │   ├── custom.css
    │   │   │   │   └── custom.js
    │   │   │   └── inference.py
    │   │   └── utils/
    │   │       ├── __init__.py
    │   │       ├── conversation.py
    │   │       └── io.py
    │   ├── deepseek_vl2/
    │   │   ├── __init__.py
    │   │   ├── models/
    │   │   │   ├── __init__.py
    │   │   │   ├── configuration_deepseek.py
    │   │   │   ├── conversation.py
    │   │   │   ├── modeling_deepseek.py
    │   │   │   ├── modeling_deepseek_vl_v2.py
    │   │   │   ├── processing_deepseek_vl_v2.py
    │   │   │   └── siglip_vit.py
    │   │   ├── serve/
    │   │   │   ├── __init__.py
    │   │   │   ├── app_modules/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── gradio_utils.py
    │   │   │   │   ├── overwrites.py
    │   │   │   │   ├── presets.py
    │   │   │   │   └── utils.py
    │   │   │   ├── assets/
    │   │   │   │   ├── Kelpy-Codos.js
    │   │   │   │   ├── custom.css
    │   │   │   │   ├── custom.js
    │   │   │   │   └── simsun.ttc
    │   │   │   └── inference.py
    │   │   └── utils/
    │   │       ├── __init__.py
    │   │       └── io.py
    │   ├── f5_tts/
    │   │   ├── __init__.py
    │   │   ├── api.py
    │   │   ├── configs/
    │   │   │   ├── E2TTS_Base_train.yaml
    │   │   │   ├── E2TTS_Small_train.yaml
    │   │   │   ├── F5TTS_Base_train.yaml
    │   │   │   └── F5TTS_Small_train.yaml
    │   │   ├── eval/
    │   │   │   ├── README.md
    │   │   │   ├── ecapa_tdnn.py
    │   │   │   ├── eval_infer_batch.py
    │   │   │   ├── eval_infer_batch.sh
    │   │   │   ├── eval_librispeech_test_clean.py
    │   │   │   ├── eval_seedtts_testset.py
    │   │   │   └── utils_eval.py
    │   │   ├── infer/
    │   │   │   ├── README.md
    │   │   │   ├── examples/
    │   │   │   │   ├── basic/
    │   │   │   │   │   └── basic.toml
    │   │   │   │   ├── multi/
    │   │   │   │   │   ├── country.flac
    │   │   │   │   │   ├── main.flac
    │   │   │   │   │   ├── story.toml
    │   │   │   │   │   ├── story.txt
    │   │   │   │   │   └── town.flac
    │   │   │   │   └── vocab.txt
    │   │   │   ├── infer_cli.py
    │   │   │   ├── infer_gradio.py
    │   │   │   ├── speech_edit.py
    │   │   │   └── utils_infer.py
    │   │   ├── model/
    │   │   │   ├── __init__.py
    │   │   │   ├── backbones/
    │   │   │   │   ├── README.md
    │   │   │   │   ├── dit.py
    │   │   │   │   ├── mmdit.py
    │   │   │   │   └── unett.py
    │   │   │   ├── cfm.py
    │   │   │   ├── dataset.py
    │   │   │   ├── modules.py
    │   │   │   ├── trainer.py
    │   │   │   └── utils.py
    │   │   ├── scripts/
    │   │   │   ├── count_max_epoch.py
    │   │   │   └── count_params_gflops.py
    │   │   ├── socket_server.py
    │   │   └── train/
    │   │       ├── README.md
    │   │       ├── datasets/
    │   │       │   ├── prepare_csv_wavs.py
    │   │       │   ├── prepare_emilia.py
    │   │       │   ├── prepare_libritts.py
    │   │       │   ├── prepare_ljspeech.py
    │   │       │   └── prepare_wenetspeech4tts.py
    │   │       ├── finetune_cli.py
    │   │       ├── finetune_gradio.py
    │   │       └── train.py
    │   ├── fish_speech/
    │   │   ├── __init__.py
    │   │   ├── fish_speech/
    │   │   │   ├── __init__.py
    │   │   │   ├── callbacks/
    │   │   │   │   ├── __init__.py
    │   │   │   │   └── grad_norm.py
    │   │   │   ├── configs/
    │   │   │   │   ├── base.yaml
    │   │   │   │   ├── firefly_gan_vq.yaml
    │   │   │   │   ├── lora/
    │   │   │   │   │   └── r_8_alpha_16.yaml
    │   │   │   │   └── text2semantic_finetune.yaml
    │   │   │   ├── conversation.py
    │   │   │   ├── datasets/
    │   │   │   │   ├── concat_repeat.py
    │   │   │   │   ├── protos/
    │   │   │   │   │   ├── text-data.proto
    │   │   │   │   │   ├── text_data_pb2.py
    │   │   │   │   │   └── text_data_stream.py
    │   │   │   │   ├── semantic.py
    │   │   │   │   └── vqgan.py
    │   │   │   ├── i18n/
    │   │   │   │   ├── README.md
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── core.py
    │   │   │   │   ├── locale/
    │   │   │   │   │   ├── en_US.json
    │   │   │   │   │   ├── es_ES.json
    │   │   │   │   │   ├── ja_JP.json
    │   │   │   │   │   ├── ko_KR.json
    │   │   │   │   │   ├── pt_BR.json
    │   │   │   │   │   └── zh_CN.json
    │   │   │   │   └── scan.py
    │   │   │   ├── models/
    │   │   │   │   ├── text2semantic/
    │   │   │   │   │   ├── __init__.py
    │   │   │   │   │   ├── lit_module.py
    │   │   │   │   │   ├── llama.py
    │   │   │   │   │   └── lora.py
    │   │   │   │   └── vqgan/
    │   │   │   │       ├── __init__.py
    │   │   │   │       ├── modules/
    │   │   │   │       │   ├── firefly.py
    │   │   │   │       │   └── fsq.py
    │   │   │   │       └── utils.py
    │   │   │   ├── scheduler.py
    │   │   │   ├── text/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── chn_text_norm/
    │   │   │   │   │   ├── .gitignore
    │   │   │   │   │   ├── README.md
    │   │   │   │   │   ├── __init__.py
    │   │   │   │   │   ├── basic_class.py
    │   │   │   │   │   ├── basic_constant.py
    │   │   │   │   │   ├── basic_util.py
    │   │   │   │   │   ├── cardinal.py
    │   │   │   │   │   ├── date.py
    │   │   │   │   │   ├── digit.py
    │   │   │   │   │   ├── fraction.py
    │   │   │   │   │   ├── money.py
    │   │   │   │   │   ├── percentage.py
    │   │   │   │   │   ├── telephone.py
    │   │   │   │   │   └── text.py
    │   │   │   │   ├── clean.py
    │   │   │   │   └── spliter.py
    │   │   │   ├── tokenizer.py
    │   │   │   ├── train.py
    │   │   │   ├── utils/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── braceexpand.py
    │   │   │   │   ├── context.py
    │   │   │   │   ├── file.py
    │   │   │   │   ├── instantiators.py
    │   │   │   │   ├── logger.py
    │   │   │   │   ├── logging_utils.py
    │   │   │   │   ├── rich_utils.py
    │   │   │   │   ├── spectrogram.py
    │   │   │   │   └── utils.py
    │   │   │   └── webui/
    │   │   │       ├── css/
    │   │   │       │   └── style.css
    │   │   │       ├── html/
    │   │   │       │   └── footer.html
    │   │   │       ├── js/
    │   │   │       │   └── animate.js
    │   │   │       ├── launch_utils.py
    │   │   │       └── manage.py
    │   │   └── tools/
    │   │       ├── api_client.py
    │   │       ├── api_server.py
    │   │       ├── download_models.py
    │   │       ├── e2e_webui.py
    │   │       ├── extract_model.py
    │   │       ├── file.py
    │   │       ├── fish_e2e.py
    │   │       ├── inference_engine/
    │   │       │   ├── __init__.py
    │   │       │   ├── reference_loader.py
    │   │       │   ├── utils.py
    │   │       │   └── vq_manager.py
    │   │       ├── llama/
    │   │       │   ├── build_dataset.py
    │   │       │   ├── eval_in_context.py
    │   │       │   ├── generate.py
    │   │       │   ├── merge_lora.py
    │   │       │   ├── quantize.py
    │   │       │   └── rebuild_tokenizer.py
    │   │       ├── run_webui.py
    │   │       ├── schema.py
    │   │       ├── sensevoice/
    │   │       │   ├── README.md
    │   │       │   ├── __init__.py
    │   │       │   ├── auto_model.py
    │   │       │   ├── fun_asr.py
    │   │       │   └── vad_utils.py
    │   │       ├── server/
    │   │       │   ├── agent/
    │   │       │   │   ├── __init__.py
    │   │       │   │   ├── generate.py
    │   │       │   │   ├── generation_utils.py
    │   │       │   │   └── pre_generation_utils.py
    │   │       │   ├── api_utils.py
    │   │       │   ├── exception_handler.py
    │   │       │   ├── inference.py
    │   │       │   ├── model_manager.py
    │   │       │   ├── model_utils.py
    │   │       │   └── views.py
    │   │       ├── smart_pad.py
    │   │       ├── vqgan/
    │   │       │   ├── create_train_split.py
    │   │       │   ├── extract_vq.py
    │   │       │   └── inference.py
    │   │       ├── webui/
    │   │       │   ├── __init__.py
    │   │       │   ├── inference.py
    │   │       │   └── variables.py
    │   │       └── whisper_asr.py
    │   ├── indextts/
    │   │   ├── BigVGAN/
    │   │   │   ├── ECAPA_TDNN.py
    │   │   │   ├── __init__.py
    │   │   │   ├── activations.py
    │   │   │   ├── alias_free_activation/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── cuda/
    │   │   │   │   │   ├── .gitignore
    │   │   │   │   │   ├── __init__.py
    │   │   │   │   │   ├── activation1d.py
    │   │   │   │   │   ├── anti_alias_activation.cpp
    │   │   │   │   │   ├── anti_alias_activation_cuda.cu
    │   │   │   │   │   ├── compat.h
    │   │   │   │   │   ├── load.py
    │   │   │   │   │   └── type_shim.h
    │   │   │   │   └── torch/
    │   │   │   │       ├── __init__.py
    │   │   │   │       ├── act.py
    │   │   │   │       ├── filter.py
    │   │   │   │       └── resample.py
    │   │   │   ├── alias_free_torch/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── act.py
    │   │   │   │   ├── filter.py
    │   │   │   │   └── resample.py
    │   │   │   ├── bigvgan.py
    │   │   │   ├── models.py
    │   │   │   ├── nnet/
    │   │   │   │   ├── CNN.py
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── linear.py
    │   │   │   │   └── normalization.py
    │   │   │   └── utils.py
    │   │   ├── __init__.py
    │   │   ├── cli.py
    │   │   ├── gpt/
    │   │   │   ├── __init__.py
    │   │   │   ├── conformer/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── attention.py
    │   │   │   │   ├── embedding.py
    │   │   │   │   └── subsampling.py
    │   │   │   ├── conformer_encoder.py
    │   │   │   ├── model.py
    │   │   │   ├── model_v2.py
    │   │   │   ├── perceiver.py
    │   │   │   ├── transformers_beam_search.py
    │   │   │   ├── transformers_generation_utils.py
    │   │   │   ├── transformers_gpt2.py
    │   │   │   └── transformers_modeling_utils.py
    │   │   ├── infer.py
    │   │   ├── infer_v2.py
    │   │   ├── s2mel/
    │   │   │   ├── dac/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── __main__.py
    │   │   │   │   ├── model/
    │   │   │   │   │   ├── __init__.py
    │   │   │   │   │   ├── base.py
    │   │   │   │   │   ├── dac.py
    │   │   │   │   │   ├── discriminator.py
    │   │   │   │   │   └── encodec.py
    │   │   │   │   ├── nn/
    │   │   │   │   │   ├── __init__.py
    │   │   │   │   │   ├── layers.py
    │   │   │   │   │   ├── loss.py
    │   │   │   │   │   └── quantize.py
    │   │   │   │   └── utils/
    │   │   │   │       ├── __init__.py
    │   │   │   │       ├── decode.py
    │   │   │   │       └── encode.py
    │   │   │   ├── hf_utils.py
    │   │   │   ├── modules/
    │   │   │   │   ├── alias_free_torch/
    │   │   │   │   │   ├── __init__.py
    │   │   │   │   │   ├── act.py
    │   │   │   │   │   ├── filter.py
    │   │   │   │   │   └── resample.py
    │   │   │   │   ├── audio.py
    │   │   │   │   ├── bigvgan/
    │   │   │   │   │   ├── activations.py
    │   │   │   │   │   ├── alias_free_activation/
    │   │   │   │   │   │   ├── cuda/
    │   │   │   │   │   │   │   ├── __init__.py
    │   │   │   │   │   │   │   ├── activation1d.py
    │   │   │   │   │   │   │   ├── anti_alias_activation.cpp
    │   │   │   │   │   │   │   ├── anti_alias_activation_cuda.cu
    │   │   │   │   │   │   │   ├── compat.h
    │   │   │   │   │   │   │   ├── load.py
    │   │   │   │   │   │   │   └── type_shim.h
    │   │   │   │   │   │   └── torch/
    │   │   │   │   │   │       ├── __init__.py
    │   │   │   │   │   │       ├── act.py
    │   │   │   │   │   │       ├── filter.py
    │   │   │   │   │   │       └── resample.py
    │   │   │   │   │   ├── bigvgan.py
    │   │   │   │   │   ├── config.json
    │   │   │   │   │   ├── env.py
    │   │   │   │   │   ├── meldataset.py
    │   │   │   │   │   └── utils.py
    │   │   │   │   ├── campplus/
    │   │   │   │   │   ├── DTDNN.py
    │   │   │   │   │   ├── classifier.py
    │   │   │   │   │   └── layers.py
    │   │   │   │   ├── commons.py
    │   │   │   │   ├── diffusion_transformer.py
    │   │   │   │   ├── encodec.py
    │   │   │   │   ├── flow_matching.py
    │   │   │   │   ├── gpt_fast/
    │   │   │   │   │   ├── generate.py
    │   │   │   │   │   ├── model.py
    │   │   │   │   │   └── quantize.py
    │   │   │   │   ├── hifigan/
    │   │   │   │   │   ├── f0_predictor.py
    │   │   │   │   │   └── generator.py
    │   │   │   │   ├── layers.py
    │   │   │   │   ├── length_regulator.py
    │   │   │   │   ├── openvoice/
    │   │   │   │   │   ├── __init__.py
    │   │   │   │   │   ├── api.py
    │   │   │   │   │   ├── attentions.py
    │   │   │   │   │   ├── checkpoints_v2/
    │   │   │   │   │   │   └── converter/
    │   │   │   │   │   │       └── config.json
    │   │   │   │   │   ├── commons.py
    │   │   │   │   │   ├── mel_processing.py
    │   │   │   │   │   ├── models.py
    │   │   │   │   │   ├── modules.py
    │   │   │   │   │   ├── openvoice_app.py
    │   │   │   │   │   ├── se_extractor.py
    │   │   │   │   │   ├── transforms.py
    │   │   │   │   │   └── utils.py
    │   │   │   │   ├── quantize.py
    │   │   │   │   ├── rmvpe.py
    │   │   │   │   ├── vocos/
    │   │   │   │   │   ├── __init__.py
    │   │   │   │   │   ├── heads.py
    │   │   │   │   │   ├── helpers.py
    │   │   │   │   │   ├── loss.py
    │   │   │   │   │   ├── models.py
    │   │   │   │   │   ├── modules.py
    │   │   │   │   │   ├── pretrained.py
    │   │   │   │   │   └── spectral_ops.py
    │   │   │   │   └── wavenet.py
    │   │   │   ├── optimizers.py
    │   │   │   └── wav2vecbert_extract.py
    │   │   ├── utils/
    │   │   │   ├── __init__.py
    │   │   │   ├── arch_util.py
    │   │   │   ├── checkpoint.py
    │   │   │   ├── common.py
    │   │   │   ├── feature_extractors.py
    │   │   │   ├── front.py
    │   │   │   ├── maskgct/
    │   │   │   │   └── models/
    │   │   │   │       ├── codec/
    │   │   │   │       │   ├── __init__.py
    │   │   │   │       │   ├── amphion_codec/
    │   │   │   │       │   │   ├── codec.py
    │   │   │   │       │   │   ├── quantize/
    │   │   │   │       │   │   │   ├── __init__.py
    │   │   │   │       │   │   │   ├── factorized_vector_quantize.py
    │   │   │   │       │   │   │   ├── lookup_free_quantize.py
    │   │   │   │       │   │   │   ├── residual_vq.py
    │   │   │   │       │   │   │   └── vector_quantize.py
    │   │   │   │       │   │   └── vocos.py
    │   │   │   │       │   ├── codec_dataset.py
    │   │   │   │       │   ├── codec_inference.py
    │   │   │   │       │   ├── codec_sampler.py
    │   │   │   │       │   ├── codec_trainer.py
    │   │   │   │       │   ├── facodec/
    │   │   │   │       │   │   ├── __init__.py
    │   │   │   │       │   │   ├── alias_free_torch/
    │   │   │   │       │   │   │   ├── __init__.py
    │   │   │   │       │   │   │   ├── act.py
    │   │   │   │       │   │   │   ├── filter.py
    │   │   │   │       │   │   │   └── resample.py
    │   │   │   │       │   │   ├── facodec_dataset.py
    │   │   │   │       │   │   ├── facodec_inference.py
    │   │   │   │       │   │   ├── facodec_trainer.py
    │   │   │   │       │   │   ├── modules/
    │   │   │   │       │   │   │   ├── JDC/
    │   │   │   │       │   │   │   │   ├── __init__.py
    │   │   │   │       │   │   │   │   ├── bst.t7
    │   │   │   │       │   │   │   │   └── model.py
    │   │   │   │       │   │   │   ├── attentions.py
    │   │   │   │       │   │   │   ├── commons.py
    │   │   │   │       │   │   │   ├── gradient_reversal.py
    │   │   │   │       │   │   │   ├── layers.py
    │   │   │   │       │   │   │   ├── quantize.py
    │   │   │   │       │   │   │   ├── style_encoder.py
    │   │   │   │       │   │   │   └── wavenet.py
    │   │   │   │       │   │   └── optimizer.py
    │   │   │   │       │   ├── kmeans/
    │   │   │   │       │   │   ├── repcodec_model.py
    │   │   │   │       │   │   └── vocos.py
    │   │   │   │       │   ├── melvqgan/
    │   │   │   │       │   │   └── melspec.py
    │   │   │   │       │   ├── ns3_codec/
    │   │   │   │       │   │   ├── README.md
    │   │   │   │       │   │   ├── __init__.py
    │   │   │   │       │   │   ├── alias_free_torch/
    │   │   │   │       │   │   │   ├── __init__.py
    │   │   │   │       │   │   │   ├── act.py
    │   │   │   │       │   │   │   ├── filter.py
    │   │   │   │       │   │   │   └── resample.py
    │   │   │   │       │   │   ├── facodec.py
    │   │   │   │       │   │   ├── gradient_reversal.py
    │   │   │   │       │   │   ├── melspec.py
    │   │   │   │       │   │   ├── quantize/
    │   │   │   │       │   │   │   ├── __init__.py
    │   │   │   │       │   │   │   ├── fvq.py
    │   │   │   │       │   │   │   └── rvq.py
    │   │   │   │       │   │   └── transformer.py
    │   │   │   │       │   ├── speechtokenizer/
    │   │   │   │       │   │   ├── model.py
    │   │   │   │       │   │   └── modules/
    │   │   │   │       │   │       ├── __init__.py
    │   │   │   │       │   │       ├── conv.py
    │   │   │   │       │   │       ├── lstm.py
    │   │   │   │       │   │       ├── norm.py
    │   │   │   │       │   │       ├── quantization/
    │   │   │   │       │   │       │   ├── __init__.py
    │   │   │   │       │   │       │   ├── ac.py
    │   │   │   │       │   │       │   ├── core_vq.py
    │   │   │   │       │   │       │   ├── distrib.py
    │   │   │   │       │   │       │   └── vq.py
    │   │   │   │       │   │       └── seanet.py
    │   │   │   │       │   └── vevo/
    │   │   │   │       │       └── vevo_repcodec.py
    │   │   │   │       └── tts/
    │   │   │   │           └── maskgct/
    │   │   │   │               ├── ckpt/
    │   │   │   │               │   └── wav2vec2bert_stats.pt
    │   │   │   │               ├── llama_nar.py
    │   │   │   │               └── maskgct_s2a.py
    │   │   │   ├── maskgct_utils.py
    │   │   │   ├── text_utils.py
    │   │   │   ├── typical_sampling.py
    │   │   │   ├── utils.py
    │   │   │   ├── webui_utils.py
    │   │   │   └── xtransformers.py
    │   │   └── vqvae/
    │   │       ├── __init__.py
    │   │       └── xtts_dvae.py
    │   ├── internvl/
    │   │   ├── __init__.py
    │   │   └── conversation.py
    │   ├── llava/
    │   │   ├── __init__.py
    │   │   ├── conversation.py
    │   │   ├── mm_utils.py
    │   │   └── model/
    │   │       ├── __init__.py
    │   │       ├── clip_encoder/
    │   │       │   ├── __init__.py
    │   │       │   ├── builder.py
    │   │       │   └── clip_encoder.py
    │   │       ├── constants.py
    │   │       ├── llava_arch.py
    │   │       ├── llava_llama.py
    │   │       └── multimodal_projector/
    │   │           ├── __init__.py
    │   │           └── builder.py
    │   ├── matcha/
    │   │   ├── VERSION
    │   │   ├── __init__.py
    │   │   ├── app.py
    │   │   ├── cli.py
    │   │   ├── data/
    │   │   │   ├── __init__.py
    │   │   │   ├── components/
    │   │   │   │   └── __init__.py
    │   │   │   └── text_mel_datamodule.py
    │   │   ├── hifigan/
    │   │   │   ├── LICENSE
    │   │   │   ├── README.md
    │   │   │   ├── __init__.py
    │   │   │   ├── config.py
    │   │   │   ├── denoiser.py
    │   │   │   ├── env.py
    │   │   │   ├── meldataset.py
    │   │   │   ├── models.py
    │   │   │   └── xutils.py
    │   │   ├── models/
    │   │   │   ├── __init__.py
    │   │   │   ├── baselightningmodule.py
    │   │   │   ├── components/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── decoder.py
    │   │   │   │   ├── flow_matching.py
    │   │   │   │   ├── text_encoder.py
    │   │   │   │   └── transformer.py
    │   │   │   └── matcha_tts.py
    │   │   ├── onnx/
    │   │   │   ├── __init__.py
    │   │   │   ├── export.py
    │   │   │   └── infer.py
    │   │   ├── text/
    │   │   │   ├── __init__.py
    │   │   │   ├── cleaners.py
    │   │   │   ├── numbers.py
    │   │   │   └── symbols.py
    │   │   ├── train.py
    │   │   └── utils/
    │   │       ├── __init__.py
    │   │       ├── audio.py
    │   │       ├── generate_data_statistics.py
    │   │       ├── get_durations_from_trained_model.py
    │   │       ├── instantiators.py
    │   │       ├── logging_utils.py
    │   │       ├── model.py
    │   │       ├── monotonic_align/
    │   │       │   ├── __init__.py
    │   │       │   ├── core.pyx
    │   │       │   └── setup.py
    │   │       ├── pylogger.py
    │   │       ├── rich_utils.py
    │   │       └── utils.py
    │   ├── megatts3/
    │   │   ├── __init__.py
    │   │   └── tts/
    │   │       ├── frontend_function.py
    │   │       ├── gradio_api.py
    │   │       ├── infer_cli.py
    │   │       ├── modules/
    │   │       │   ├── aligner/
    │   │       │   │   └── whisper_small.py
    │   │       │   ├── ar_dur/
    │   │       │   │   ├── ar_dur_predictor.py
    │   │       │   │   └── commons/
    │   │       │   │       ├── layers.py
    │   │       │   │       ├── nar_tts_modules.py
    │   │       │   │       ├── rel_transformer.py
    │   │       │   │       ├── rot_transformer.py
    │   │       │   │       ├── seq_utils.py
    │   │       │   │       └── transformer.py
    │   │       │   ├── llm_dit/
    │   │       │   │   ├── cfm.py
    │   │       │   │   ├── dit.py
    │   │       │   │   ├── time_embedding.py
    │   │       │   │   └── transformer.py
    │   │       │   └── wavvae/
    │   │       │       ├── decoder/
    │   │       │       │   ├── diag_gaussian.py
    │   │       │       │   ├── hifigan_modules.py
    │   │       │       │   ├── seanet_encoder.py
    │   │       │       │   └── wavvae_v3.py
    │   │       │       └── encoder/
    │   │       │           └── common_modules/
    │   │       │               ├── conv.py
    │   │       │               ├── lstm.py
    │   │       │               └── seanet.py
    │   │       └── utils/
    │   │           ├── audio_utils/
    │   │           │   ├── align.py
    │   │           │   ├── io.py
    │   │           │   └── plot.py
    │   │           ├── commons/
    │   │           │   ├── ckpt_utils.py
    │   │           │   └── hparams.py
    │   │           └── text_utils/
    │   │               ├── dict.json
    │   │               ├── ph_tone_convert.py
    │   │               ├── split_text.py
    │   │               └── text_encoder.py
    │   ├── melo/
    │   │   ├── __init__.py
    │   │   ├── api.py
    │   │   ├── app.py
    │   │   ├── attentions.py
    │   │   ├── commons.py
    │   │   ├── configs/
    │   │   │   └── config.json
    │   │   ├── data/
    │   │   │   └── example/
    │   │   │       └── metadata.list
    │   │   ├── data_utils.py
    │   │   ├── download_utils.py
    │   │   ├── infer.py
    │   │   ├── init_downloads.py
    │   │   ├── losses.py
    │   │   ├── main.py
    │   │   ├── mel_processing.py
    │   │   ├── models.py
    │   │   ├── modules.py
    │   │   ├── monotonic_align/
    │   │   │   ├── __init__.py
    │   │   │   └── core.py
    │   │   ├── preprocess_text.py
    │   │   ├── split_utils.py
    │   │   ├── text/
    │   │   │   ├── __init__.py
    │   │   │   ├── chinese.py
    │   │   │   ├── chinese_bert.py
    │   │   │   ├── chinese_mix.py
    │   │   │   ├── cleaner.py
    │   │   │   ├── cleaner_multiling.py
    │   │   │   ├── cmudict.rep
    │   │   │   ├── cmudict_cache.pickle
    │   │   │   ├── english.py
    │   │   │   ├── english_bert.py
    │   │   │   ├── english_utils/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── abbreviations.py
    │   │   │   │   ├── number_norm.py
    │   │   │   │   └── time_norm.py
    │   │   │   ├── es_phonemizer/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── base.py
    │   │   │   │   ├── cleaner.py
    │   │   │   │   ├── es_symbols.json
    │   │   │   │   ├── es_symbols.txt
    │   │   │   │   ├── es_symbols_v2.json
    │   │   │   │   ├── es_to_ipa.py
    │   │   │   │   ├── example_ipa.txt
    │   │   │   │   ├── gruut_wrapper.py
    │   │   │   │   ├── punctuation.py
    │   │   │   │   ├── spanish_symbols.txt
    │   │   │   │   └── test.ipynb
    │   │   │   ├── fr_phonemizer/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── base.py
    │   │   │   │   ├── cleaner.py
    │   │   │   │   ├── en_symbols.json
    │   │   │   │   ├── example_ipa.txt
    │   │   │   │   ├── fr_symbols.json
    │   │   │   │   ├── fr_to_ipa.py
    │   │   │   │   ├── french_abbreviations.py
    │   │   │   │   ├── french_symbols.txt
    │   │   │   │   ├── gruut_wrapper.py
    │   │   │   │   └── punctuation.py
    │   │   │   ├── french.py
    │   │   │   ├── french_bert.py
    │   │   │   ├── japanese.py
    │   │   │   ├── japanese_bert.py
    │   │   │   ├── ko_dictionary.py
    │   │   │   ├── korean.py
    │   │   │   ├── opencpop-strict.txt
    │   │   │   ├── spanish.py
    │   │   │   ├── spanish_bert.py
    │   │   │   ├── symbols.py
    │   │   │   └── tone_sandhi.py
    │   │   ├── train.py
    │   │   ├── train.sh
    │   │   ├── transforms.py
    │   │   └── utils.py
    │   ├── mlx/
    │   │   ├── __init__.py
    │   │   └── flux/
    │   │       ├── __init__.py
    │   │       ├── autoencoder.py
    │   │       ├── clip.py
    │   │       ├── datasets.py
    │   │       ├── flux.py
    │   │       ├── layers.py
    │   │       ├── lora.py
    │   │       ├── model.py
    │   │       ├── sampler.py
    │   │       ├── t5.py
    │   │       ├── tokenizers.py
    │   │       ├── trainer.py
    │   │       └── utils.py
    │   └── whisper/
    │       ├── __init__.py
    │       ├── __main__.py
    │       ├── assets/
    │       │   ├── gpt2.tiktoken
    │       │   ├── mel_filters.npz
    │       │   └── multilingual.tiktoken
    │       ├── audio.py
    │       ├── decoding.py
    │       ├── model.py
    │       ├── normalizers/
    │       │   ├── __init__.py
    │       │   ├── basic.py
    │       │   ├── english.json
    │       │   └── english.py
    │       ├── timing.py
    │       ├── tokenizer.py
    │       ├── transcribe.py
    │       ├── triton_ops.py
    │       ├── utils.py
    │       └── version.py
    ├── types.py
    ├── ui/
    │   ├── __init__.py
    │   ├── gradio/
    │   │   ├── __init__.py
    │   │   ├── chat_interface.py
    │   │   ├── media_interface.py
    │   │   └── utils/
    │   │       ├── __init__.py
    │   │       └── latex.py
    │   └── web/
    │       └── ui/
    │           ├── .eslintignore
    │           ├── .eslintrc.yml
    │           ├── .gitignore
    │           ├── .prettierignore
    │           ├── .prettierrc.yml
    │           ├── package.json
    │           ├── public/
    │           │   └── index.html
    │           └── src/
    │               ├── App.js
    │               ├── components/
    │               │   ├── MenuSide.js
    │               │   ├── Title.js
    │               │   ├── alertComponent.js
    │               │   ├── apiContext.js
    │               │   ├── authAlertDialog.js
    │               │   ├── copyComponent.js
    │               │   ├── deleteDialog.js
    │               │   ├── errorMessageSnackBar.js
    │               │   ├── fetchWrapper.js
    │               │   ├── fetcher.js
    │               │   ├── hotkeyFocusTextField.js
    │               │   ├── successMessageSnackBar.js
    │               │   ├── tableTitle.js
    │               │   ├── themeButton.js
    │               │   ├── themeContext.js
    │               │   ├── titleTypography.js
    │               │   ├── translateButton.js
    │               │   ├── utils.js
    │               │   └── versionLabel.js
    │               ├── i18n.js
    │               ├── index.css
    │               ├── index.js
    │               ├── locales/
    │               │   ├── en.json
    │               │   ├── ja.json
    │               │   ├── ko.json
    │               │   └── zh.json
    │               ├── router/
    │               │   └── index.js
    │               ├── scenes/
    │               │   ├── _layout/
    │               │   │   └── index.js
    │               │   ├── cluster_info/
    │               │   │   ├── index.js
    │               │   │   ├── nodeInfo.js
    │               │   │   └── style.js
    │               │   ├── launch_model/
    │               │   │   ├── LaunchModel.js
    │               │   │   ├── components/
    │               │   │   │   ├── cachedListDialog.js
    │               │   │   │   ├── commandBuilder.js
    │               │   │   │   ├── dynamicFieldList.js
    │               │   │   │   ├── editCustomModelDialog.js
    │               │   │   │   ├── launchModelDrawer.js
    │               │   │   │   ├── modelFormConfig.js
    │               │   │   │   ├── pasteDialog.js
    │               │   │   │   ├── progress.js
    │               │   │   │   ├── selectField.js
    │               │   │   │   └── virtualenvListDialog.js
    │               │   │   ├── data/
    │               │   │   │   └── data.js
    │               │   │   ├── index.js
    │               │   │   ├── launchCustom.js
    │               │   │   ├── modelCard.js
    │               │   │   └── styles/
    │               │   │       └── modelCardStyle.css
    │               │   ├── login/
    │               │   │   ├── header.js
    │               │   │   └── login.js
    │               │   ├── register_model/
    │               │   │   ├── components/
    │               │   │   │   ├── addControlnet.js
    │               │   │   │   ├── addModelSpecs.js
    │               │   │   │   ├── addStop.js
    │               │   │   │   └── addVirtualenv.js
    │               │   │   ├── data/
    │               │   │   │   └── languages.js
    │               │   │   ├── index.js
    │               │   │   ├── registerModel.js
    │               │   │   └── styles/
    │               │   │       └── registerModelStyle.css
    │               │   └── running_models/
    │               │       └── index.js
    │               └── theme.js
    └── utils.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .dockerignore
================================================
doc/
.idea/
.github/
build/
xinference.egg-info/
xinference/web/ui/build/
xinference/web/ui/node_modules/


================================================
FILE: .gitattributes
================================================
xinference/_version.py export-subst


================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.yaml
================================================
name: "Bug Report"
description: Submit a bug report to help us improve Xinference. Please provide as much useful information as possible rather than simply describing what happened. / 提交一个问题报告来帮助我们改进 Xinference。你必须提供有用的信息而不只是描述发生的现象，否则将不予处理。
body:
  - type: textarea
    id: system-info
    attributes:
      label: System Info / 系统信息
      description: Your operating environment / 您的运行环境信息
      placeholder: Includes Cuda version, transformers / xllamacpp / vllm version, Python version, operating system... / 包括Cuda版本，transformers / xllamacpp / vllm版本，Python版本，操作系统等。
    validations:
      required: true

  - type: checkboxes
    id: information-scripts-examples
    attributes:
      label: Running Xinference with Docker? / 是否使用 Docker 运行 Xinference？
      description: 'How are you using Xinference? / 以何种方式使用 Xinference？'
      options:
        - label: docker / docker
        - label: pip install / 通过 pip install 安装
        - label: installation from source / 从源码安装

  - type: textarea
    id: start-way
    attributes:
      label: Version info / 版本信息
      description: The version of Xinference you are running / Xinference 版本
    validations:
      required: true

  - type: textarea
    id: commandline
    attributes:
      label: The command used to start Xinference / 用以启动 xinference 的命令
      description: |
        Please provide the command used to start Xinference. 
        If it is a distributed scenario, the commands for starting the supervisor and worker need to be listed separately. 
        If it is a Docker scenario, please provide the complete command for starting Xinference through Docker. 
        If it is another method, please describe it specifically.

        请提供启动 xinference 的命令。
        如果是分布式场景，启动 supervisor 和 worker 的命令需要分别列出。
        如果是docker场景，请提供通过 docker 启动 xinference 的完整命令。
        如果是其他方式，请具体描述。
    validations:
      required: true

  - type: textarea
    id: reproduction
    validations:
      required: true
    attributes:
      label: Reproduction / 复现过程
      description: |
        Please provide a code example that reproduces the problem you encountered, preferably with a minimal reproduction unit.
        If you have code snippets, error messages, stack traces, please provide them here as well.
        Please format your code correctly using code tags. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
        Do not use screenshots, as they are difficult to read and (more importantly) do not allow others to copy and paste your code.
        
        请提供能重现您遇到的问题的代码示例,最好是最小复现单元。
        如果您有代码片段、错误信息、堆栈跟踪、涉及的命令行操作等也请在此提供。
        请使用代码标签正确格式化您的代码。请参见 https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
        请勿使用截图，因为截图难以阅读，而且（更重要的是）不允许他人复制粘贴您的代码。
      placeholder: |
        Steps to reproduce the behavior/复现Bug的步骤:
          
          1.
          2.
          3.

  - type: textarea
    id: expected-behavior
    validations:
      required: true
    attributes:
      label: Expected behavior / 期待表现
      description: "A clear and concise description of what you would expect to happen. / 简单描述您期望发生的事情。"

================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.yaml
================================================
name: "Feature request"
description: Submit a request for a new Xinference feature / 提交一个新的 Xinference 的功能建议
labels: [ "feature" ]
body:
  - type: textarea
    id: feature-request
    validations:
      required: true
    attributes:
      label: Feature request / 功能建议
      description: |
        A brief description of the functional proposal.
        对功能建议的简述。

  - type: textarea
    id: motivation
    validations:
      required: true
    attributes:
      label: Motivation / 动机
      description: |
        Your motivation for making the suggestion. If that motivation is related to another GitHub issue, link to it here.
        您提出建议的动机。如果该动机与另一个 GitHub 问题有关，请在此处提供对应的链接。

  - type: textarea
    id: contribution
    validations:
      required: true
    attributes:
      label: Your contribution / 您的贡献
      description: |
        
        Your PR link or any other link you can help with.
        您的PR链接或者其他您能提供帮助的链接。

================================================
FILE: .github/workflows/assign.yaml
================================================
name: Assign
on:
  issue_comment:
    types: created

permissions:
  contents: read

jobs:
  issue_assign:
    permissions:
      issues: write
      pull-requests: write
    runs-on: ubuntu-22.04
    steps:
    - if: github.event.comment.body == 'take'
      run: |
        echo "Assigning issue ${{ github.event.issue.number }} to ${{ github.event.comment.user.login }}"
        curl -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" -d '{"assignees": ["${{ github.event.comment.user.login }}"]}' https://api.github.com/repos/${{ github.repository }}/issues/${{ github.event.issue.number }}/assignees

================================================
FILE: .github/workflows/docker-cd.yaml
================================================
name: Xinference CD for DockerHub

on:
  schedule:
    - cron: '0 18 * * *'
  push:
    tags:
      - '*'
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  build:
    timeout-minutes: 240
    runs-on: self-hosted
    strategy:
      matrix:
        python-version: [ "3.10" ]
    steps:
      - name: Check out code
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
          submodules: recursive

      - name: Log in to Docker Hub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_PASSWORD }}

      - name: Build and push Docker image
        shell: bash
        if: ${{ github.repository == 'xorbitsai/inference' }}
        env:
          DOCKER_ORG: ${{ secrets.DOCKERHUB_USERNAME }}
          PY_VERSION: ${{ matrix.python-version }}
        run: |
          if [[ "$GITHUB_REF" =~ ^"refs/tags/" ]]; then
            export GIT_TAG=$(echo "$GITHUB_REF" | sed -e "s/refs\/tags\///g")
          fi

          docker system prune -f -a

          if [[ -n "$GIT_TAG" ]]; then
            BRANCHES="$GIT_TAG"
            echo "Will handle tag $BRANCHES"
          else
            MAINBRANCH=$(git rev-parse --abbrev-ref HEAD)
            BRANCHES="$MAINBRANCH"
          fi
          
          for branch in $BRANCHES; do
            if [[ -n "$GIT_TAG" ]]; then
              export IMAGE_TAG="$GIT_TAG"
            else
              git checkout $branch
              export IMAGE_TAG="nightly-$branch"
            fi
            docker build -t "$DOCKER_ORG/xinference:${IMAGE_TAG}" --progress=plain -f xinference/deploy/docker/Dockerfile .
            docker push "$DOCKER_ORG/xinference:${IMAGE_TAG}"
            docker build -t "$DOCKER_ORG/xinference:${IMAGE_TAG}-cpu" --progress=plain -f xinference/deploy/docker/Dockerfile.cpu .
            docker push "$DOCKER_ORG/xinference:${IMAGE_TAG}-cpu"
            echo "XINFERENCE_IMAGE_TAG=${IMAGE_TAG}" >> $GITHUB_ENV
          done
          
          if [[ -n "$GIT_TAG" ]]; then
            docker tag "$DOCKER_ORG/xinference:${GIT_TAG}" "$DOCKER_ORG/xinference:latest"
            docker push "$DOCKER_ORG/xinference:latest"
            docker tag "$DOCKER_ORG/xinference:${GIT_TAG}-cpu" "$DOCKER_ORG/xinference:latest-cpu"
            docker push "$DOCKER_ORG/xinference:latest-cpu"
            echo "XINFERENCE_GIT_TAG=${GIT_TAG}" >> $GITHUB_ENV
          fi

      - name: Clean docker image cache
        shell: bash
        if: ${{ github.repository == 'xorbitsai/inference' }}
        run: |
          docker system prune -f -a


================================================
FILE: .github/workflows/issue.yaml
================================================
name: Close inactive issues
on:
  schedule:
    - cron: "0 19 * * *"
  workflow_dispatch:

jobs:
  close-issues:
    runs-on: ubuntu-latest
    permissions:
      issues: write
      pull-requests: write
    steps:
      - uses: actions/stale@v9
        with:
          days-before-issue-stale: 14
          days-before-issue-close: 10
          stale-issue-label: "stale"
          stale-issue-message: "This issue is stale because it has been open for 14 days with no activity."
          close-issue-message: "This issue was closed because it has been inactive for 10 days since being marked as stale."
          days-before-pr-stale: -1
          days-before-pr-close: -1
          operations-per-run: 500
          repo-token: ${{ secrets.GITHUB_TOKEN }}


================================================
FILE: .github/workflows/pr_auto_run_gen_docs.yaml
================================================
name: Auto run gen_docs.py and commit changes to PR

on:
  pull_request_target:
    types: [opened, synchronize]

permissions:
  contents: write
  pull-requests: write

jobs:
  run-gen-docs-and-commit:
    if: startsWith(github.event.pull_request.head.ref, 'chore/models-sync/')
    runs-on: ubuntu-latest
    steps:
      - name: Checkout base repository (trusted scripts)
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.base.ref }}
          repository: ${{ github.repository }}
          path: main
          fetch-depth: 0

      - name: Checkout PR head branch (working copy)
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.ref }}
          repository: ${{ github.event.pull_request.head.repo.full_name }}
          path: pr
          fetch-depth: 0

      - name: Decide whether to run gen_docs for latest commit
        id: decide
        working-directory: pr
        run: |
          set -e
          MSG="$(git log -1 --pretty=%B || echo "")"
          echo "Latest commit message: $MSG"
          if echo "$MSG" | grep -Eiq '\[(skip ci|ci skip)\]'; then
            echo "Skip token found in commit message; will not run."
            echo "run=false" >> $GITHUB_OUTPUT
            exit 0
          fi

          HEAD_SHA="$(git rev-parse HEAD)"
          BASE_SHA="${{ github.event.pull_request.base.sha }}"
          RANGE="$BASE_SHA...$HEAD_SHA"
          echo "Diff range (full PR): $RANGE"

          CHANGED_FILES="$(git diff --name-only "$RANGE" || true)"
          echo "Changed files in PR range:"
          echo "$CHANGED_FILES"

          RUN="false"
          for f in $CHANGED_FILES; do
            case "$f" in
              xinference/model/llm/llm_family.json|xinference/model/embedding/model_spec.json|xinference/model/rerank/model_spec.json|xinference/model/image/model_spec.json|xinference/model/audio/model_spec.json|xinference/model/video/model_spec.json)
                RUN="true"; break;;
            esac
          done
          echo "run=$RUN" >> $GITHUB_OUTPUT

      - name: Set up Python
        if: steps.decide.outputs.run == 'true'
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install gen_docs dependencies
        if: steps.decide.outputs.run == 'true'
        run: |
          python -m pip install --upgrade pip
          python -m pip install jinja2
          python -m pip install "xinference[doc]"

      - name: Run gen_docs.py if present
        if: steps.decide.outputs.run == 'true'
        working-directory: pr
        run: |
          echo "[Debug] CWD: $(pwd)"
          echo "[Debug] List ../main:"
          ls -la ../main || true
          echo "[Debug] List ../main/doc/source:"
          ls -la ../main/doc/source || true

          # Use PR branch's gen_docs.py if it exists, otherwise use main branch's
          if [ -f "doc/source/gen_docs.py" ]; then
            echo "Using PR branch's doc/source/gen_docs.py"
            echo "Running pr/doc/source/gen_docs.py from its directory"
            (cd doc/source && python -u gen_docs.py)
          elif [ -f "../main/doc/source/gen_docs.py" ]; then
            echo "Copying main/doc/source/gen_docs.py into PR workspace"
            mkdir -p doc/source
            cp -f ../main/doc/source/gen_docs.py doc/source/gen_docs.py
            echo "Running pr/doc/source/gen_docs.py from its directory"
            (cd doc/source && python -u gen_docs.py)
          elif [ -f "gen_docs.py" ]; then
            echo "Using PR branch's gen_docs.py"
            echo "Running pr/gen_docs.py"
            python -u gen_docs.py
          elif [ -f "../main/gen_docs.py" ]; then
            echo "Copying main/gen_docs.py into PR workspace"
            cp -f ../main/gen_docs.py gen_docs.py
            echo "Running pr/gen_docs.py"
            python -u gen_docs.py
          else
            echo "gen_docs.py not found in main repository, skipping."
          fi

      - name: Stage and commit changes back to PR branch
        if: steps.decide.outputs.run == 'true'
        working-directory: pr
        run: |
          echo "[Debug] Before staging:" && git status --porcelain

          echo "[Debug] check-ignore for generated file:"
          git check-ignore -v doc/source/_generated/auto_generated.txt || echo "Not ignored"
          git add -A
          git add -f doc/source/_generated || true

          echo "[Debug] After staging:" && git status --porcelain
          echo "[Debug] Staged diff:" && git diff --cached --name-status || true

          if ! git diff --cached --quiet; then
            git config user.name "github-actions[bot]"
            git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
            git commit -m "chore(docs): auto-run gen_docs.py"
          else
            echo "No changes to commit."
          fi

      - name: Push back for same-repo PR
        env:
          BRANCH: ${{ github.event.pull_request.head.ref }}
        if: steps.decide.outputs.run == 'true' && github.event.pull_request.head.repo.full_name == github.repository
        working-directory: pr
        run: |
          echo "Pushing changes to same-repo PR..."
          git push origin HEAD:$BRANCH || echo "No changes to push."

      - name: Push back for fork PR using maintainer PAT
        if: steps.decide.outputs.run == 'true' && github.event.pull_request.head.repo.full_name != github.repository && github.event.pull_request.maintainer_can_modify
        env:
          PUSH_TOKEN: ${{ secrets.PUSH_TOKEN }}
          BRANCH: ${{ github.event.pull_request.head.ref }}
          HEAD_FULL_NAME: ${{ github.event.pull_request.head.repo.full_name }}
        working-directory: pr
        run: |
          if [ -z "$PUSH_TOKEN" ]; then
            echo "Missing secrets.PUSH_TOKEN; cannot push to fork. Skipping push."
            exit 0
          fi
          echo "Pushing changes to fork PR using maintainer PAT..."
          git remote set-url origin "https://x-access-token:${PUSH_TOKEN}@github.com/${HEAD_FULL_NAME}.git"
          git push origin HEAD:$BRANCH || echo "No changes to push."

      - name: Skip push for fork PR without maintainer edit permission
        if: steps.decide.outputs.run == 'true' && github.event.pull_request.head.repo.full_name != github.repository && !github.event.pull_request.maintainer_can_modify
        run: |
          echo "Fork PR does not allow edits by maintainers; run succeeded but skipping push of commits."


================================================
FILE: .github/workflows/python.yaml
================================================
name: Python CI

on:
  push:
    branches:
      - '*'
  pull_request:
    types: ['opened', 'reopened', 'synchronize']

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  lint:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ "ubuntu-latest" ]
        python-version: [ "3.10" ]
    steps:
      - name: Check out code
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
          submodules: recursive
      - name: Set up Python environment
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - name: Install pre-commit
        run: pip install pre-commit
      - name: Run pre-commit
        run: pre-commit run --all-files
      - name: Set up Node.js
        uses: actions/setup-node@v1
        with:
          node-version: 16
      # ESLint and Prettier must be in `package.json`
      - name: Install Node.js dependencies
        run: cd xinference/ui/web/ui && npm ci
      - name: ESLint Check
        run: cd xinference/ui/web/ui && npx eslint .
      - name: Prettier Check
        run: cd xinference/ui/web/ui && ./node_modules/.bin/prettier --check .

  build_test_job:
    runs-on: ${{ matrix.os }}
    needs: lint
    env:
      CONDA_ENV: test
      SELF_HOST_PYTHON: /root/miniconda3/envs/inference_test/bin/python
      SELF_HOST_CONDA: /root/miniconda3/condabin/conda
    defaults:
      run:
        shell: bash -l {0}
    strategy:
      fail-fast: false
      matrix:
        os: [ "ubuntu-latest", "macos-latest", "windows-latest" ]
        python-version: [ "3.10", "3.11", "3.12", "3.13" ]
        module: [ "xinference" ]
        exclude:
          - { os: macos-latest, python-version: "3.11" }
          - { os: macos-latest, python-version: "3.12" }
          - { os: windows-latest, python-version: "3.11" }
          - { os: windows-latest, python-version: "3.12" }
        include:
          - { os: self-hosted, module: gpu, python-version: "3.11"}
          - { os: macos-latest, module: metal, python-version: "3.10" }

    steps:
      - name: Check out code
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
          submodules: recursive

      - name: Set up conda ${{ matrix.python-version }}
        uses: conda-incubator/setup-miniconda@v3
        if: ${{ matrix.module != 'gpu' }}
        with:
          python-version: ${{ matrix.python-version }}
          activate-environment: ${{ env.CONDA_ENV }}

      # Important for python == 3.12 and 3.13
      - name: Update pip and setuptools
        if: ${{ matrix.python-version == '3.12' || matrix.python-version == '3.13' }}
        run: |
          python -m pip install -U pip "setuptools<82"

      # Install torch up front for Python 3.13 (from the default package index)
      - name: Install torch for Python 3.13
        if: ${{ matrix.python-version == '3.13'}}
        run: |
          python -m pip install torch torchvision torchaudio

      - name: Install numpy
        if: |
          (startsWith(matrix.os, 'macos') && (matrix.python-version == '3.13')) || 
          (startsWith(matrix.os, 'windows'))
        run: |
          python -m pip install "numpy<2"

      - name: Install dependencies
        env:
          MODULE: ${{ matrix.module }}
          OS: ${{ matrix.os }}
        if: ${{ matrix.module != 'gpu' }}
        run: |
          if [ "$OS" == "ubuntu-latest" ]; then
            sudo rm -rf /usr/share/dotnet
            sudo rm -rf /opt/ghc
            sudo rm -rf "/usr/local/share/boost"
            sudo rm -rf "$AGENT_TOOLSDIRECTORY"
          fi
          pip install -e ".[dev]"
          pip install "xllamacpp>=0.2.0"
          if [ "$MODULE" == "metal" ]; then
            conda install -c conda-forge "ffmpeg<7"
            pip install "mlx>=0.22.0"
            pip install mlx-lm
            pip install "mlx-vlm>=0.3.4"
            pip install mlx-whisper
            pip install f5-tts-mlx
            pip install "qwen-vl-utils!=0.0.9"
            pip install tomli 
          else
            pip install "transformers<4.49"
            pip install attrdict
            pip install "timm>=0.9.16"
            if [ "${{ matrix.python-version }}" != "3.13" ]; then
              pip install torch torchvision
            fi
            pip install accelerate
            pip install sentencepiece
            pip install transformers_stream_generator
            pip install bitsandbytes
            pip install "sentence-transformers>=5.1.1"
            pip install modelscope
            pip install diffusers
            pip install protobuf
            pip install FlagEmbedding
            pip install "tenacity>=8.2.0,<8.4.0"
            pip install "jinja2==3.1.2"
            pip install jj-pytorchvideo
            pip install "qwen-vl-utils!=0.0.9"
            pip install datamodel_code_generator
            pip install jsonschema
          fi
        working-directory: .

      - name: Clean up disk
        if: |
          (startsWith(matrix.os, 'ubuntu'))
        run: |
          sudo rm -rf /usr/share/dotnet
          sudo rm -rf /usr/local/lib/android
          sudo rm -rf /opt/ghc
          sudo apt-get clean
          sudo rm -rf /var/lib/apt/lists/*
          df -h 
          
      - name: Fix SSL on Windows
        if: startsWith(matrix.os, 'windows')
        shell: bash
        run: |
          echo "activate conda env"

          source $CONDA/etc/profile.d/conda.sh || true
          conda activate $CONDA_ENV || true
          
          python -V
          which python
          
          echo "before: $SSL_CERT_FILE"
          
          python -m pip install --quiet certifi || true
          
          SSL_CERT_FILE=$(python -c "import certifi,os;print(os.path.normpath(certifi.where()))")
          
          export SSL_CERT_FILE
          export REQUESTS_CA_BUNDLE=$SSL_CERT_FILE
          export CURL_CA_BUNDLE=$SSL_CERT_FILE
          
          echo "after: $SSL_CERT_FILE"
          echo "SSL_CERT_FILE=$(python -c 'import certifi;print(certifi.where())')" >> $GITHUB_ENV
    
      - name: Test with pytest
        env:
          MODULE: ${{ matrix.module }}
          PYTORCH_MPS_HIGH_WATERMARK_RATIO: 1.0
          PYTORCH_MPS_LOW_WATERMARK_RATIO: 0.2
          XFORMERS_FORCE_DISABLE_TRITON: 1
          TORCH_DISABLE_FLASH_ATTENTION: 1
        run: |
          if [ "$MODULE" == "gpu" ]; then
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U -e ".[audio,dev]"
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U "openai>1"
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U modelscope
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U gguf
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U uv
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U sse_starlette
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U xoscar
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U "python-jose[cryptography]"
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U "passlib[bcrypt]"
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U "aioprometheus[starlette]"
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U "pynvml"
            ${{ env.SELF_HOST_PYTHON }} -m pip install "transformers==4.53.2"
            ${{ env.SELF_HOST_PYTHON }} -m pip install "funasr==1.2.7"
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U "nemo_text_processing<1.1.0"
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U "omegaconf~=2.3.0"
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U "WeTextProcessing<1.0.4"
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U librosa
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U xxhash
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U "ChatTTS>=0.2.1"
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U HyperPyYAML
            ${{ env.SELF_HOST_PYTHON }} -m pip uninstall -y matcha-tts
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U "onnxruntime-gpu==1.16.0; sys_platform == 'linux'"
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U openai-whisper
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U "torch==2.7.0" "torchaudio==2.7.0" "torchvision==0.22.0"
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U "loguru"
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U "natsort"
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U "loralib"
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U "ormsgpack"
            ${{ env.SELF_HOST_PYTHON }} -m pip uninstall -y opencc
            ${{ env.SELF_HOST_PYTHON }} -m pip uninstall -y "faster_whisper"
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U accelerate
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U verovio
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U cachetools
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U silero-vad
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U pydantic
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U diffusers
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U onnx
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U onnxconverter_common
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U torchdiffeq
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U "x_transformers>=1.31.14"
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U pypinyin
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U tomli
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U vocos
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U jieba
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U soundfile
            ${{ env.SELF_HOST_PYTHON }} -m pip install tensorizer
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U sentence-transformers
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U FlagEmbedding
            ${{ env.SELF_HOST_PYTHON }} -m pip install -U "peft<=0.17.1"
            ${{ env.SELF_HOST_PYTHON }} -m pip install "xllamacpp>=0.2.0" --index-url https://xorbitsai.github.io/xllamacpp/whl/cu124 --extra-index-url https://pypi.org/simple
            ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \
              --disable-warnings \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/core/tests/test_continuous_batching.py && \
            ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/embedding/tests/test_qwen3_vl_engine_params.py && \
            ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/rerank/tests/test_qwen3_vl_reranker_virtualenv.py && \
            ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/image/tests/test_stable_diffusion.py && \
            ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/image/tests/test_got_ocr2.py && \
            ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_whisper.py && \
            ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_funasr.py && \
            ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_chattts.py && \
            ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_cosyvoice.py && \
            ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_f5tts.py && \
            ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_melotts.py && \
            ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_kokoro.py && \
            ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_fish_speech.py && \
            ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_megatts.py && \
            ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/embedding/tests/test_integrated_embedding.py && \
            ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/embedding/vllm/tests/test_vllm_embedding.py && \
            ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/llm/transformers/tests/test_tensorizer.py && \
            ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/llm/tests/test_llm_model.py
          elif [ "$MODULE" == "metal" ]; then
            pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/llm/mlx/tests/test_mlx.py && \
            pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_whisper_mlx.py && \
            pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_f5tts_mlx.py && \
            pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/llm/mlx/tests/test_distributed_model.py
          else
            pytest --timeout=3000 \
              -W ignore::PendingDeprecationWarning \
              -vv \
              --cov-config=setup.cfg \
              --cov-report=xml \
              --cov=xinference \
              --ignore xinference/core/tests/test_continuous_batching.py \
              --ignore xinference/model/image/tests/test_stable_diffusion.py \
              --ignore xinference/model/image/tests/test_got_ocr2.py \
              --ignore xinference/model/audio/tests \
              --ignore xinference/model/embedding/tests/test_integrated_embedding.py \
              --ignore xinference/model/llm/transformers/tests/test_tensorizer.py \
              --ignore xinference/model/llm/tests/test_llm_model.py \
              --ignore xinference/model/llm/vllm \
              --ignore xinference/model/llm/sglang \
              --ignore xinference/client/tests/test_client.py \
              --ignore xinference/client/tests/test_async_client.py \
              --ignore xinference/model/llm/mlx \
              xinference
          
          fi
        working-directory: .


================================================
FILE: .github/workflows/release.yaml
================================================
name: Build and upload to PyPI

on:
  push:
    tags:
      - '*'

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true


jobs:
  build-publish:
    name: Build and publish Python distribution to PyPI
    runs-on: ubuntu-latest

    steps:
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - uses: actions/checkout@v3
      - name: Install pypa/build
        run: >-
          python3 -m
          pip install
          build "setuptools<82"
          --user
      - name: Build web
        run: >-
          python setup.py build_web
      - name: Build a binary wheel and a source tarball
        run: >-
          python3 -m
          build
          --sdist
          --wheel
          --outdir dist/
          .
      # If this is the xorbitsai repo, upload to PyPI
      - uses: pypa/gh-action-pypi-publish@v1.5.0
        if: github.repository == 'xorbitsai/inference'
        with:
          user: __token__
          password: ${{ secrets.PYPI_PASSWORD }}

      # If this is not the xorbitsai repo, upload to TestPyPI
      - uses: pypa/gh-action-pypi-publish@v1.5.0
        if: github.repository != 'xorbitsai/inference'
        with:
          user: __token__
          password: ${{ secrets.TEST_PYPI_PASSWORD }}
          verbose: true
          repository_url: https://test.pypi.org/legacy/


================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
generated/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# IDEs
.idea
.vscode
*.iml

# VIM
*.sw*

# web stuff
node_modules/
static/

# Local docs (project notes, refactoring plans, etc.)
docs/

# doc
doc/source/savefig/

# local env
local_env

asv/results

.DS_Store

# Exclude markdown files except README files
*.md
!README.md
!README_*.md


================================================
FILE: .pre-commit-config.yaml
================================================
files: xinference
repos:
  - repo: https://github.com/psf/black
    rev: 25.1.0
    hooks:
      - id: black
        exclude: thirdparty
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: end-of-file-fixer
        exclude: ^xinference/thirdparty
      - id: trailing-whitespace
        exclude: thirdparty
  - repo: https://github.com/PyCQA/flake8
    rev: 6.0.0
    hooks:
      - id: flake8
        args: [--config, setup.cfg]
        exclude: thirdparty
  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
        args: [--sp, setup.cfg]
        exclude: thirdparty
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.15.0
    hooks:
      - id: mypy
        additional_dependencies: ["tokenize-rt==3.2.0", "types-requests", "types-tabulate"]
        args: [--ignore-missing-imports, --follow-imports, skip]
        exclude: thirdparty
  - repo: https://github.com/codespell-project/codespell
    rev: v2.2.2
    hooks:
      - id: codespell
        args: [ --config, setup.cfg]
        exclude: thirdparty


================================================
FILE: .readthedocs.yaml
================================================
version: 2

# Build documentation in the doc/ directory with Sphinx
sphinx:
   configuration: doc/source/conf.py

build:
  os: ubuntu-20.04
  tools:
    python: "3.10"

python:
  install:
    - method: pip
      path: .
      extra_requirements:
        - doc

submodules:
  include: all
  recursive: true


================================================
FILE: LICENSE
================================================
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

================================================
FILE: MANIFEST.in
================================================
global-include *.pyx
global-include *.pxd
global-include xinference/**/*.json
global-exclude *.c
global-exclude *.cpp
include setup.cfg
include pyproject.toml
global-exclude .DS_Store
include versioneer.py
include xinference/_version.py
global-exclude conftest.py
include xinference/locale/*.json
include xinference/model/llm/*.json
include xinference/model/embedding/*.json
graft xinference/thirdparty
global-include xinference/ui/web/ui/build/**/*

================================================
FILE: README.md
================================================
<div align="center">
<img src="./assets/xorbits-logo.png" width="180px" alt="xorbits" />

# Xorbits Inference: Model Serving Made Easy 🤖

<p align="center">
  <a href="https://xinference.io/en">Xinference Enterprise</a> ·
  <a href="https://inference.readthedocs.io/en/latest/getting_started/installation.html#installation">Self-hosting</a> ·
  <a href="https://inference.readthedocs.io/">Documentation</a>
</p>

[![PyPI Latest Release](https://img.shields.io/pypi/v/xinference.svg?style=for-the-badge)](https://pypi.org/project/xinference/)
[![License](https://img.shields.io/pypi/l/xinference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE)
[![Build Status](https://img.shields.io/github/actions/workflow/status/xorbitsai/inference/python.yaml?branch=main&style=for-the-badge&label=GITHUB%20ACTIONS&logo=github)](https://actions-badge.atrox.dev/xorbitsai/inference/goto?ref=main)
[![Docker Pulls](https://img.shields.io/docker/pulls/xprobe/xinference?style=for-the-badge&logo=docker)](https://hub.docker.com/r/xprobe/xinference)
[![Discord](https://img.shields.io/badge/join_Discord-5462eb.svg?logo=discord&style=for-the-badge&logoColor=%23f5f5f5)](https://discord.gg/Xw9tszSkr5)
[![Twitter](https://img.shields.io/twitter/follow/xorbitsio?logo=x&style=for-the-badge)](https://twitter.com/xorbitsio)

<p align="center">
  <a href="./README.md"><img alt="README in English" src="https://img.shields.io/badge/English-454545?style=for-the-badge"></a>
  <a href="./README_zh_CN.md"><img alt="简体中文版自述文件" src="https://img.shields.io/badge/中文介绍-d9d9d9?style=for-the-badge"></a>
  <a href="./README_ja_JP.md"><img alt="日本語のREADME" src="https://img.shields.io/badge/日本語-d9d9d9?style=for-the-badge"></a>
</p>

</div>
<br />


Xorbits Inference (Xinference) is a powerful and versatile library designed to serve language, 
speech recognition, and multimodal models. With Xorbits Inference, you can effortlessly deploy 
and serve your own or state-of-the-art built-in models using just a single command. Whether you are a 
researcher, developer, or data scientist, Xorbits Inference empowers you to unleash the full 
potential of cutting-edge AI models.

<div align="center">
<i><a href="https://discord.gg/Xw9tszSkr5">👉 Join our Discord community!</a></i>
</div>

## 🔥 Hot Topics
### Framework Enhancements
- Agent-native Serving: Xinference integrates with [Xagent](https://github.com/xorbitsai/xagent) to enable dynamic planning, tool use, and autonomous multi-step reasoning — moving beyond static pipelines.
- Auto batch: Multiple concurrent requests are automatically batched, significantly improving throughput: [#4197](https://github.com/xorbitsai/inference/pull/4197)
- [Xllamacpp](https://github.com/xorbitsai/xllamacpp): New llama.cpp Python binding, maintained by the Xinference team, that supports continuous batching and is more production-ready: [#2997](https://github.com/xorbitsai/inference/pull/2997)
- Distributed inference: running models across workers: [#2877](https://github.com/xorbitsai/inference/pull/2877)
- VLLM enhancement: Shared KV cache across multiple replicas: [#2732](https://github.com/xorbitsai/inference/pull/2732)
### New Models
- Built-in support for [Qwen-3.5](https://github.com/QwenLM/Qwen3.5): [#4639](https://github.com/xorbitsai/inference/pull/4639)
- Built-in support for [GLM-5](https://github.com/zai-org/GLM-5): [#4638](https://github.com/xorbitsai/inference/pull/4638)
- Built-in support for [MiniMax-M2.5](https://github.com/MiniMax-AI/MiniMax-M2.5): [#4630](https://github.com/xorbitsai/inference/pull/4630)
- Built-in support for [Kimi-K2.5](https://github.com/MoonshotAI/Kimi-K2.5): [#4631](https://github.com/xorbitsai/inference/pull/4631)
- Built-in support for [FLUX.2-Klein](https://bfl.ai/models/flux-2-klein): [#4596](https://github.com/xorbitsai/inference/pull/4596)
- Built-in support for [Qwen3-ASR](https://github.com/QwenLM/Qwen3-ASR): [#4581](https://github.com/xorbitsai/inference/pull/4581)
- Built-in support for [GLM-4.7](https://huggingface.co/zai-org/GLM-4.7): [#4565](https://github.com/xorbitsai/inference/pull/4565)
- Built-in support for [MinerU2.5-2509-1.2B](https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B): [#4569](https://github.com/xorbitsai/inference/pull/4569)
### Integrations
- [Xagent](https://github.com/xorbitsai/xagent): an enterprise agent platform for building and running AI agents with planning, memory, and tool use — not limited to rigid workflows.
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
- [FastGPT](https://github.com/labring/FastGPT): a knowledge-based platform built on LLMs that offers out-of-the-box data processing and model invocation capabilities and allows workflow orchestration through Flow visualization.
- [RAGFlow](https://github.com/infiniflow/ragflow): an open-source RAG engine based on deep document understanding.
- [MaxKB](https://github.com/1Panel-dev/MaxKB): MaxKB (Max Knowledge Brain) is a powerful and easy-to-use AI assistant that integrates Retrieval-Augmented Generation (RAG) pipelines, supports robust workflows, and provides advanced MCP tool-use capabilities.


## Key Features
🌟 **Model Serving Made Easy**: Simplify the process of serving large language, speech 
recognition, and multimodal models. You can set up and deploy your models
for experimentation and production with a single command.

⚡️ **State-of-the-Art Models**: Experiment with cutting-edge built-in models using a single
command. Xinference provides access to state-of-the-art open-source models!

🖥 **Heterogeneous Hardware Utilization**: Make the most of your hardware resources with
[ggml](https://github.com/ggerganov/ggml). Xorbits Inference intelligently utilizes heterogeneous
hardware, including GPUs and CPUs, to accelerate your model inference tasks.

⚙️ **Flexible API and Interfaces**: Offer multiple interfaces for interacting
with your models, including an OpenAI-compatible RESTful API (with Function Calling), RPC, a CLI,
and a WebUI for seamless model management and interaction.

🌐 **Distributed Deployment**: Excel in distributed deployment scenarios, 
allowing the seamless distribution of model inference across multiple devices or machines.

🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference seamlessly integrates
with popular third-party libraries including [LangChain](https://python.langchain.com/docs/integrations/providers/xinference), [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window), [Dify](https://docs.dify.ai/advanced/model-configuration/xinference), and [Chatbox](https://chatboxai.app/).

## Why Xinference
| Feature                                        | Xinference | FastChat | OpenLLM | RayLLM |
|------------------------------------------------|------------|----------|---------|--------|
| OpenAI-Compatible RESTful API                  | ✅ | ✅ | ✅ | ✅ |
| vLLM Integrations                              | ✅ | ✅ | ✅ | ✅ |
| More Inference Engines (GGML, TensorRT)        | ✅ | ❌ | ✅ | ✅ |
| More Platforms (CPU, Metal)                    | ✅ | ✅ | ❌ | ❌ |
| Multi-node Cluster Deployment                  | ✅ | ❌ | ❌ | ✅ |
| Image Models (Text-to-Image)                   | ✅ | ✅ | ❌ | ❌ |
| Text Embedding Models                          | ✅ | ❌ | ❌ | ❌ |
| Multimodal Models                              | ✅ | ❌ | ❌ | ❌ |
| Audio Models                                   | ✅ | ❌ | ❌ | ❌ |
| More OpenAI Functionalities (Function Calling) | ✅ | ❌ | ❌ | ❌ |

## Using Xinference

- **Self-hosting Xinference Community Edition**<br/>
Quickly get Xinference running in your environment with this [starter guide](#getting-started).
Use our [documentation](https://inference.readthedocs.io/) for further references and more in-depth instructions.

- **Xinference for enterprise / organizations**<br/>
We provide additional enterprise-centric features. [Send us an email](mailto:business@xprobe.io?subject=[GitHub]Business%20License%20Inquiry) to discuss enterprise needs.<br/>

## Staying Ahead

Star Xinference on GitHub and be instantly notified of new releases.

![star-us](assets/stay_ahead.gif)

## Getting Started

* [Docs](https://inference.readthedocs.io/en/latest/index.html)
* [Built-in Models](https://inference.readthedocs.io/en/latest/models/builtin/index.html)
* [Custom Models](https://inference.readthedocs.io/en/latest/models/custom.html)
* [Deployment Docs](https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html)
* [Examples and Tutorials](https://inference.readthedocs.io/en/latest/examples/index.html)

### Jupyter Notebook

The lightest way to experience Xinference is to try our [Jupyter Notebook on Google Colab](https://colab.research.google.com/github/xorbitsai/inference/blob/main/examples/Xinference_Quick_Start.ipynb).

### Docker 

NVIDIA GPU users can start the Xinference server using the [Xinference Docker Image](https://inference.readthedocs.io/en/latest/getting_started/using_docker_image.html). Before running the command below, ensure that both [Docker](https://docs.docker.com/get-docker/) and [CUDA](https://developer.nvidia.com/cuda-downloads) are set up on your system.

```bash
docker run --name xinference -d -p 9997:9997 -e XINFERENCE_HOME=/data -v </on/your/host>:/data --gpus all xprobe/xinference:latest xinference-local -H 0.0.0.0
```

### K8s via helm

Ensure that you have GPU support in your Kubernetes cluster, then install as follows.

```bash
# add repo
helm repo add xinference https://xorbitsai.github.io/xinference-helm-charts

# update indexes and query xinference versions
helm repo update xinference
helm search repo xinference/xinference --devel --versions

# install xinference
helm install xinference xinference/xinference -n xinference --version 0.0.1-v<xinference_release_version>
```

For more customized installation methods on K8s, please refer to the [documentation](https://inference.readthedocs.io/en/latest/getting_started/using_kubernetes.html).

### Quick Start

Install Xinference by using pip as follows. (For more options, see [Installation page](https://inference.readthedocs.io/en/latest/getting_started/installation.html).)

```bash
pip install "xinference[all]"
```

To start a local instance of Xinference, run the following command:

```bash
$ xinference-local
```

Once Xinference is running, there are multiple ways you can try it: via the web UI, via cURL,
via the command line, or via Xinference's Python client. Check out our [docs](https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html#run-xinference-locally) for the guide.

![web UI](assets/screenshot.png)
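
As a rough illustration of the RESTful route, the sketch below builds a chat request for a locally running server using only the Python standard library. The `/v1/chat/completions` path and port `9997` are the ones used by the benchmark scripts in this repository; the model UID `my-model` and the helper names `build_chat_request` and `send_chat_request` are hypothetical placeholders, and the payload follows the general OpenAI chat format rather than anything specific to Xinference.

```python
import json
import urllib.request


def build_chat_request(host, port, model_uid, prompt):
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    url = f"http://{host}:{port}/v1/chat/completions"
    payload = {
        "model": model_uid,  # UID of a model you have already launched
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# "my-model" is a placeholder; replace it with the UID of your own model.
request = build_chat_request("localhost", 9997, "my-model", "Hello!")
print(request.full_url)
```

With a server running and a model launched, calling `send_chat_request(request)` should return the parsed response body.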

## Getting involved

| Platform                                                                                        | Purpose                                     |
|-------------------------------------------------------------------------------------------------|---------------------------------------------|
| [GitHub Issues](https://github.com/xorbitsai/inference/issues)                                  | Reporting bugs and filing feature requests. |
| [Discord](https://discord.gg/Xw9tszSkr5) | Collaborating with other Xinference users.  |
| [Twitter](https://twitter.com/xorbitsio)                                                        | Staying up-to-date on new features.         |

## Citation

If this work is helpful, please kindly cite as:

```bibtex
@inproceedings{lu2024xinference,
    title = "Xinference: Making Large Model Serving Easy",
    author = "Lu, Weizheng and Xiong, Lingfeng and Zhang, Feng and Qin, Xuye and Chen, Yueguo",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-demo.30",
    pages = "291--300",
}
```

## Contributors

<a href="https://github.com/xorbitsai/inference/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=xorbitsai/inference" />
</a>

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=xorbitsai/inference&type=Date)](https://star-history.com/#xorbitsai/inference&Date)

================================================
FILE: README_ja_JP.md
================================================
<div align="center">
<img src="./assets/xorbits-logo.png" width="180px" alt="xorbits" />

# Xorbits Inference: モデルサービングを簡単に 🤖

<p align="center">
  <a href="https://xinference.io/ja">Xinference Enterprise（企業版）</a> ·
  <a href="https://inference.readthedocs.io/en/latest/getting_started/installation.html#installation">セルフホスティング</a> ·
  <a href="https://inference.readthedocs.io/">ドキュメント</a>
</p>

[![PyPI Latest Release](https://img.shields.io/pypi/v/xinference.svg?style=for-the-badge)](https://pypi.org/project/xinference/)
[![License](https://img.shields.io/pypi/l/xinference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE)
[![Build Status](https://img.shields.io/github/actions/workflow/status/xorbitsai/inference/python.yaml?branch=main&style=for-the-badge&label=GITHUB%20ACTIONS&logo=github)](https://actions-badge.atrox.dev/xorbitsai/inference/goto?ref=main)
[![Docker Pulls](https://img.shields.io/docker/pulls/xprobe/xinference?style=for-the-badge&logo=docker)](https://hub.docker.com/r/xprobe/xinference)
[![Discord](https://img.shields.io/badge/join_Discord-5462eb.svg?logo=discord&style=for-the-badge&logoColor=%23f5f5f5)](https://discord.gg/Xw9tszSkr5)
[![Twitter](https://img.shields.io/twitter/follow/xorbitsio?logo=x&style=for-the-badge)](https://twitter.com/xorbitsio)

<p align="center">
  <a href="./README.md"><img alt="README in English" src="https://img.shields.io/badge/English-d9d9d9?style=for-the-badge"></a>
  <a href="./README_zh_CN.md"><img alt="简体中文版自述文件" src="https://img.shields.io/badge/中文介绍-d9d9d9?style=for-the-badge"></a>
  <a href="./README_ja_JP.md"><img alt="日本語のREADME" src="https://img.shields.io/badge/日本語-454545?style=for-the-badge"></a>
</p>
</div>
<br />


Xorbits Inference(Xinference) は、言語、音声認識、マルチモーダルモデルのために
設計された強力で汎用性の高いライブラリです。 Xorbits Inference を使えば、たった 1 つのコマンドで、
あなたや最先端のビルトインモデルを簡単にデプロイし、提供することができます。 Xorbits Inference は、
研究者、開発者、データサイエンティストを問わず、最先端の AI モデルの可能性を最大限に引き出すことができます。

<div align="center">
<i><a href="https://discord.gg/Xw9tszSkr5">👉 Discord コミュニティにご参加ください！</a></i>
</div>


## 主な特徴
🌟 **モデルサービングを簡単に**: 大規模な言語、音声認識、マルチモーダルモデルの提供プロセスを簡素化します。
1つのコマンドで、実験用と本番用のモデルをセットアップしてデプロイできます。

⚡️ **最先端モデル**: コマンド1つで最先端のビルトインモデルを実験。
Inference は、最先端のオープンソースモデルへのアクセスを提供します！

🖥 **異機種ハードウェアの利用**: [ggml](https://github.com/ggerganov/ggml) でハードウェアリソースを最大限に活用しましょう。
Xorbits Inference は、GPU や CPU を含む異種ハードウェアをインテリジェントに利用し、モデル推論タスクを高速化します。

⚙️ **柔軟な API とインターフェース**: OpenAI互換のRESTful API（Function Callingを含む）、RPC、コマンドライン、Web UIなど、
多様なインターフェースを提供し、モデルの管理と相互作用を容易にします。

🌐 **分散デプロイメント**: 分散デプロイメントのシナリオに優れ、複数のデバイスやマシンにモデルの推論をシームレスに分散させることができます。

🔌 **サードパーティライブラリとの組み込み統合**: Xorbits Inference は、[LangChain](https://python.langchain.com/docs/integrations/providers/xinference)
や [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window) のような人気のあるサードパーティライブラリと
シームレスに統合されています。

## なぜ Xinference を選ぶのか
| 機能 | Xinference | FastChat | OpenLLM | RayLLM |
|------|------------|----------|---------|--------|
| OpenAI 互換の RESTful API | ✅ | ✅ | ✅ | ✅ |
| vLLM 統合 | ✅ | ✅ | ✅ | ✅ |
| その他の推論エンジン（GGML、TensorRT） | ✅ | ❌ | ✅ | ✅ |
| その他のプラットフォーム（CPU、Metal） | ✅ | ✅ | ❌ | ❌ |
| マルチノードクラスター展開 | ✅ | ❌ | ❌ | ✅ |
| 画像モデル（テキストから画像へ） | ✅ | ✅ | ❌ | ❌ |
| テキスト埋め込みモデル | ✅ | ❌ | ❌ | ❌ |
| マルチモーダルモデル | ✅ | ❌ | ❌ | ❌ |
| より多くのOpenAI機能（関数呼び出し） | ✅ | ❌ | ❌ | ❌ |

## 入門ガイド

**始める前に、GitHubで私たちにスターを付けてください。そうすると、新しいリリースの通知を即座に受け取ることができます！**

* [ドキュメント](https://inference.readthedocs.io/en/latest/index.html)
* [組み込みモデル](https://inference.readthedocs.io/en/latest/models/builtin/index.html)
* [カスタムモデル](https://inference.readthedocs.io/en/latest/models/custom.html)
* [デプロイメントドキュメント](https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html)
* [例とチュートリアル](https://inference.readthedocs.io/en/latest/examples/index.html)

### Jupyter Notebook

Xinferenceを体験する最軽量な方法は、私たちの[Google Colab上のJupyterノートブック](https://colab.research.google.com/github/xorbitsai/inference/blob/main/examples/Xinference_Quick_Start.ipynb)を試すことです。

### Docker

Nvidia GPUユーザーは、[Xinference Dockerイメージ](https://inference.readthedocs.io/en/latest/getting_started/using_docker_image.html)を使用してXinferenceサーバーを開始することができます。インストールコマンドを実行する前に、システムに[Docker](https://docs.docker.com/get-docker/)と[CUDA](https://developer.nvidia.com/cuda-downloads)が設定されていることを確認してください。

### クイックスタート

以下のようにpipを使用してXinferenceをインストールします。（他のオプションについては、[インストールページ](https://inference.readthedocs.io/en/latest/getting_started/installation.html)を参照してください。）

```bash
pip install "xinference[all]"
```

ローカルインスタンスのXinferenceを開始するには、次のコマンドを実行します：

```bash
$ xinference-local
```

Xinferenceが実行されると、Web UI、cURL、コマンドライン、またはXinferenceのPythonクライアントを介して試すことができます。詳細は[ドキュメント](https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html#run-xinference-locally)をご覧ください。

![Web UI](assets/screenshot.png)

## 関与する

| プラットフォーム                                                                                        | 目的                    |
|-------------------------------------------------------------------------------------------------|-----------------------|
| [Github イシュー](https://github.com/xorbitsai/inference/issues)                                    | バグ報告と機能リクエストの提出。      |
| [Discord](https://discord.gg/Xw9tszSkr5) | 他のXinferenceユーザーとの協力。 |
| [Twitter](https://twitter.com/xorbitsio)                                                        | 新機能に関する最新情報の入手。       |

## 引用

この仕事が役立つ場合は、以下のように引用してください：

```bibtex
@inproceedings{lu2024xinference,
    title = "Xinference: Making Large Model Serving Easy",
    author = "Lu, Weizheng and Xiong, Lingfeng and Zhang, Feng and Qin, Xuye and Chen, Yueguo",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-demo.30",
    pages = "291--300",
}
```

## 寄稿者

<a href="https://github.com/xorbitsai/inference/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=xorbitsai/inference" />
</a>

================================================
FILE: README_zh_CN.md
================================================
<div align="center">
<img src="./assets/xorbits-logo.png" width="180px" alt="xorbits" />

# Xorbits Inference：模型推理， 轻而易举 🤖

<p align="center">
  <a href="https://xinference.cn">Xinference 企业版</a> ·
  <a href="https://inference.readthedocs.io/zh-cn/latest/getting_started/installation.html#installation">自托管</a> ·
  <a href="https://inference.readthedocs.io/zh-cn/latest/index.html">文档</a>
</p>

[![PyPI Latest Release](https://img.shields.io/pypi/v/xinference.svg?style=for-the-badge)](https://pypi.org/project/xinference/)
[![License](https://img.shields.io/pypi/l/xinference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE)
[![Build Status](https://img.shields.io/github/actions/workflow/status/xorbitsai/inference/python.yaml?branch=main&style=for-the-badge&label=GITHUB%20ACTIONS&logo=github)](https://actions-badge.atrox.dev/xorbitsai/inference/goto?ref=main)
[![Docker Pulls](https://img.shields.io/docker/pulls/xprobe/xinference?style=for-the-badge&logo=docker)](https://hub.docker.com/r/xprobe/xinference)
[![WeChat](https://img.shields.io/badge/添加微信小助手-07C160?style=for-the-badge&logo=wechat&logoColor=white)](https://xinference.cn/images/WeCom.jpg)
[![Zhihu](https://img.shields.io/static/v1?style=for-the-badge&message=未来速度&color=0084FF&logo=Zhihu&logoColor=FFFFFF&label=)](https://www.zhihu.com/org/xorbits)

<p align="center">
  <a href="./README.md"><img alt="README in English" src="https://img.shields.io/badge/English-d9d9d9?style=for-the-badge"></a>
  <a href="./README_zh_CN.md"><img alt="简体中文版自述文件" src="https://img.shields.io/badge/中文介绍-454545?style=for-the-badge"></a>
  <a href="./README_ja_JP.md"><img alt="日本語のREADME" src="https://img.shields.io/badge/日本語-d9d9d9?style=for-the-badge"></a>
</p>
</div>
<br />


Xorbits Inference（Xinference）是一个性能强大且功能全面的分布式推理框架。可用于大语言模型（LLM），语音识别模型，多模态模型等各种模型的推理。通过 Xorbits Inference，你可以轻松地一键部署你自己的模型或内置的前沿开源模型。无论你是研究者，开发者，或是数据科学家，都可以通过 Xorbits Inference 与最前沿的 AI 模型，发掘更多可能。


<div align="center">
<i><a href="https://xinference.cn/images/WeCom.jpg">👉 添加企业微信、加入Xinference社区!</a></i>
</div>

## 🔥 近期热点
### 框架增强
- Agent 原生服务能力：Xinference 与 [Xagent](https://github.com/xorbitsai/xagent) 深度集成，支持动态规划、工具调用与多步自主推理，突破传统静态流程的限制。
- 自动 Batch：多个并发请求会被自动合批处理，大幅提升吞吐量：[#4197](https://github.com/xorbitsai/inference/pull/4197)
- 支持寒武纪芯片：[#3693](https://github.com/xorbitsai/inference/pull/3693)
- [Xllamacpp](https://github.com/xorbitsai/xllamacpp): 全新llama.cpp Python binding，由 Xinference 团队维护，支持持续并行且更生产可用: [#2997](https://github.com/xorbitsai/inference/pull/2997)
- 分布式推理：在多个 worker 上运行大尺寸模型：[#2877](https://github.com/xorbitsai/inference/pull/2877)
- VLLM 引擎增强: 跨副本共享KV Cache: [#2732](https://github.com/xorbitsai/inference/pull/2732)
### 新模型
- 内置 [Qwen-3.5](https://github.com/QwenLM/Qwen3.5): [#4639](https://github.com/xorbitsai/inference/pull/4639)
- 内置 [GLM-5](https://github.com/zai-org/GLM-5): [#4638](https://github.com/xorbitsai/inference/pull/4638)
- 内置 [MiniMax-M2.5](https://github.com/MiniMax-AI/MiniMax-M2.5): [#4630](https://github.com/xorbitsai/inference/pull/4630)
- 内置 [Kimi-K2.5](https://github.com/MoonshotAI/Kimi-K2.5): [#4631](https://github.com/xorbitsai/inference/pull/4631)
- 内置 [FLUX.2-Klein](https://bfl.ai/models/flux-2-klein): [#4596](https://github.com/xorbitsai/inference/pull/4596)
- 内置 [Qwen3-ASR](https://github.com/QwenLM/Qwen3-ASR): [#4581](https://github.com/xorbitsai/inference/pull/4581)
- 内置 [GLM-4.7](https://huggingface.co/zai-org/GLM-4.7): [#4565](https://github.com/xorbitsai/inference/pull/4565)
- 内置 [MinerU2.5-2509-1.2B](https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B): [#4569](https://github.com/xorbitsai/inference/pull/4569)
### 集成
- [Xagent](https://github.com/xorbitsai/xagent)：企业级 Agent 平台，用于构建和运行具备规划、记忆与工具调用能力的智能体，不再受限于僵化的工作流。
- [FastGPT](https://doc.fastai.site/docs/development/custom-models/xinference/)：一个基于 LLM 大模型的开源 AI 知识库构建平台。提供了开箱即用的数据处理、模型调用、RAG 检索、可视化 AI 工作流编排等能力，帮助您轻松实现复杂的问答场景。
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): 一个涵盖了大型语言模型开发、部署、维护和优化的 LLMOps 平台。
- [RAGFlow](https://github.com/infiniflow/ragflow): 是一款基于深度文档理解构建的开源 RAG 引擎。
- [MaxKB](https://github.com/1Panel-dev/MaxKB): MaxKB = Max Knowledge Base，是一款基于大语言模型和 RAG 的开源知识库问答系统，广泛应用于智能客服、企业内部知识库、学术研究与教育等场景。

## 主要功能
🌟 **模型推理，轻而易举**：大语言模型，语音识别模型，多模态模型的部署流程被大大简化。一个命令即可完成模型的部署工作。 

⚡️ **前沿模型，应有尽有**：框架内置众多中英文的前沿大语言模型，包括 baichuan，chatglm2 等，一键即可体验！内置模型列表还在快速更新中！

🖥 **异构硬件，快如闪电**：通过 [ggml](https://github.com/ggerganov/ggml)，同时使用你的 GPU 与 CPU 进行推理，降低延迟，提高吞吐！

⚙️ **接口调用，灵活多样**：提供多种使用模型的接口，包括 OpenAI 兼容的 RESTful API（包括 Function Calling），RPC，命令行，web UI 等等。方便模型的管理与交互。

🌐 **集群计算，分布协同**: 支持分布式部署，通过内置的资源调度器，让不同大小的模型按需调度到不同机器，充分使用集群资源。

🔌 **开放生态，无缝对接**: 与流行的三方库无缝对接，包括 [LangChain](https://python.langchain.com/docs/integrations/providers/xinference)，[LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window)，[Dify](https://docs.dify.ai/advanced/model-configuration/xinference)，以及 [Chatbox](https://chatboxai.app/)。

## 为什么选择 Xinference
| 功能特点                    | Xinference | FastChat | OpenLLM | RayLLM |
|-------------------------|------------|----------|---------|--------|
| 兼容 OpenAI 的 RESTful API | ✅ | ✅ | ✅ | ✅ |
| vLLM 集成                 | ✅ | ✅ | ✅ | ✅ |
| 更多推理引擎（GGML、TensorRT）   | ✅ | ❌ | ✅ | ✅ |
| 更多平台支持（CPU、Metal）       | ✅ | ✅ | ❌ | ❌ |
| 分布式集群部署                 | ✅ | ❌ | ❌ | ✅ |
| 图像模型（文生图）               | ✅ | ✅ | ❌ | ❌ |
| 文本嵌入模型                  | ✅ | ❌ | ❌ | ❌ |
| 多模态模型                   | ✅ | ❌ | ❌ | ❌ |
| 语音识别模型                  | ✅ | ❌ | ❌ | ❌ |
| 更多 OpenAI 功能 (函数调用)     | ✅ | ❌ | ❌ | ❌ |

## 使用 Xinference

- **自托管 Xinference 社区版**<br/>
使用 [入门指南](#getting-started) 快速在你自己的环境中运行 Xinference。
参考 [文档](https://inference.readthedocs.io/zh-cn) 以获得参考和更多说明。

- **面向企业/组织的 Xinference 版本**<br/>
我们提供额外的面向企业的功能。 [通过企业微信联系](https://xinference.cn/images/WeCom.jpg)
或 [提交表单](https://w8v6grm432.feishu.cn/share/base/form/shrcn9u1EBXQxmGMqILEjguuGoh) 讨论企业需求。<br/>

## 保持领先

在 GitHub 上给 Xinference Star，并立即收到新版本的通知。

![star-us](assets/stay_ahead.gif)

## 入门指南

* [文档](https://inference.readthedocs.io/zh-cn/latest/index.html)
* [内置模型](https://inference.readthedocs.io/zh-cn/latest/models/builtin/index.html)
* [自定义模型](https://inference.readthedocs.io/zh-cn/latest/models/custom.html)
* [部署文档](https://inference.readthedocs.io/zh-cn/latest/getting_started/using_xinference.html)
* [示例和教程](https://inference.readthedocs.io/zh-cn/latest/examples/index.html)

### Jupyter Notebook

体验 Xinference 最轻量级的方式是使用我们 [Google Colab 上的 Jupyter Notebook](https://colab.research.google.com/github/xorbitsai/inference/blob/main/examples/Xinference_Quick_Start.ipynb)。

### Docker

Nvidia GPU 用户可以使用[Xinference Docker 镜像](https://inference.readthedocs.io/zh-cn/latest/getting_started/using_docker_image.html) 启动 Xinference 服务器。在执行安装命令之前，确保你的系统中已经安装了 [Docker](https://docs.docker.com/get-docker/) 和 [CUDA](https://developer.nvidia.com/cuda-downloads)。

### Kubernetes

确保你的 Kubernetes 集群开启了 GPU 支持，然后通过 `helm` 进行如下方式的安装。

```
# 新增xinference仓库
helm repo add xinference https://xorbitsai.github.io/xinference-helm-charts

# 更新仓库，查询可安装的版本
helm repo update xinference
helm search repo xinference/xinference --devel --versions

# 在K8s中安装xinference
helm install xinference xinference/xinference -n xinference --version 0.0.1-v<xinference_release_version>
```

更多定制化安装方式，请参考[文档](https://inference.readthedocs.io/en/latest/getting_started/using_kubernetes.html)。

### 快速开始

使用 pip 安装 Xinference，操作如下。（更多选项，请参阅[安装页面](https://inference.readthedocs.io/zh-cn/latest/getting_started/installation.html)。）

```bash
pip install "xinference[all]"
```

要启动一个本地的 Xinference 实例，请运行以下命令：

```bash
$ xinference-local
```

一旦 Xinference 运行起来，你可以通过多种方式尝试它：通过网络界面、通过 cURL、通过命令行或通过 Xinference 的 Python 客户端。更多指南，请查看我们的[文档](https://inference.readthedocs.io/zh-cn/latest/getting_started/using_xinference.html#run-xinference-locally)。

![网络界面](assets/screenshot.png)

## 参与其中

| 平台                                                                                              | 目的                   |
|-------------------------------------------------------------------------------------------------|----------------------|
| [Github 问题](https://github.com/xorbitsai/inference/issues)                                      | 报告错误和提交功能请求。         |
| [Discord](https://discord.gg/Xw9tszSkr5) | 与其他 Xinference 用户合作。 |
| [Twitter](https://twitter.com/xorbitsio)                                                        | 及时了解新功能。             |
| [微信社群](https://xinference.cn/images/WeCom.jpg)                                     | 与其他 Xinference 用户交流。 |
| [知乎](https://zhihu.com/org/xorbits)                                                             | 了解团队最新的进展。           |

## 引用

如果您觉得此项目有帮助，请以如下格式引用我们：

```bibtex
@inproceedings{lu2024xinference,
    title = "Xinference: Making Large Model Serving Easy",
    author = "Lu, Weizheng and Xiong, Lingfeng and Zhang, Feng and Qin, Xuye and Chen, Yueguo",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-demo.30",
    pages = "291--300",
}
```

## 合作

* [琶洲实验室 | 黄埔](https://www.pazhoulab-huangpu.com/#/)

## 贡献者

<a href="https://github.com/xorbitsai/inference/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=xorbitsai/inference" />
</a>

## Star 历史

[![Star History Chart](https://api.star-history.com/svg?repos=xorbitsai/inference&type=Date)](https://star-history.com/#xorbitsai/inference&Date)

================================================
FILE: benchmark/README.md
================================================
# Benchmarking Xinference

## Downloading the ShareGPT dataset

You can download the dataset by running:
```bash
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
```

## Benchmarking latency

This tool samples prompts from the dataset and runs the benchmark with serialized requests, sending them one at a time.

```bash
python benchmark_latency.py --dataset /path/to/ShareGPT_V3_unfiltered_cleaned_split.json \
                            --tokenizer /path/to/tokenizer \
                            --num-prompts 100 \
                            --model-uid ${model_uid}
```
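
The idea of a serialized latency run can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic of `benchmark_latency.py`: `measure_latencies`, `summarize`, and `fake_send` are hypothetical names, and `fake_send` stands in for a real HTTP call to the model.

```python
import statistics
import time


def measure_latencies(send_request, requests):
    """Send requests one at a time and record each latency in seconds."""
    latencies = []
    for request in requests:
        start = time.perf_counter()
        send_request(request)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Return simple summary statistics for a list of latencies."""
    ordered = sorted(latencies)
    # Nearest-rank style index for the 90th percentile, clamped to the list.
    p90_index = min(len(ordered) - 1, int(0.9 * len(ordered)))
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p90": ordered[p90_index],
    }


def fake_send(request):
    """Placeholder for a real request; sleeps briefly instead."""
    time.sleep(0.001)


stats = summarize(measure_latencies(fake_send, ["a", "b", "c"]))
print(stats)
```

Because each request finishes before the next one starts, the numbers reflect per-request latency without queueing effects from other clients.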

## Benchmarking serving

This tool samples prompts from the dataset and runs the benchmark with parallel requests.

```bash
python benchmark_serving.py --dataset /path/to/ShareGPT_V3_unfiltered_cleaned_split.json \
                            --tokenizer /path/to/tokenizer \
                            --model-uid ${model_uid} \
                            --num-prompts 100 --concurrency 50
```
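
A bounded-concurrency run can be sketched with `asyncio` as below. This is an illustrative pattern only and may differ from what `benchmark_serving.py` actually does: `run_concurrently` and `fake_send` are hypothetical names, and `fake_send` replaces the real network request.

```python
import asyncio


async def run_concurrently(send_request, requests, concurrency):
    """Process all requests with at most `concurrency` in flight at once."""
    queue = asyncio.Queue()
    for request in requests:
        queue.put_nowait(request)
    results = []

    async def worker():
        # Each worker pulls requests until the queue is drained.
        while True:
            try:
                request = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            results.append(await send_request(request))

    await asyncio.gather(*(worker() for _ in range(concurrency)))
    return results


async def fake_send(request):
    """Placeholder for a real request; returns a derived value."""
    await asyncio.sleep(0.001)
    return request * 2


results = asyncio.run(run_concurrently(fake_send, list(range(10)), concurrency=4))
print(len(results))
```

The `concurrency` argument plays the role of the `--concurrency` option: it caps how many requests are outstanding at any moment.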

## Benchmarking long context serving

This tool generates long prompts that ask the model to sort random numbers, sized according to the specified context length.

```bash
python benchmark/benchmark_long.py --context-length ${context_length} \
                                   --tokenizer /path/to/tokenizer \
                                   --model-uid ${model_uid} \
                                   --num-prompts 32 -c 16
```
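
One way to picture such a prompt is sketched below. The prompt wording, the value range, and the helper names `make_sort_prompt`, `make_prompt_for_length`, and `count_tokens` are all assumptions made for illustration; the actual script may build its prompts differently, and a real run would count tokens with the model's tokenizer rather than by splitting on whitespace.

```python
import random


def make_sort_prompt(num_values, seed=0):
    """Build a prompt asking to sort `num_values` random integers."""
    rng = random.Random(seed)
    values = [rng.randint(0, 9999) for _ in range(num_values)]
    prompt = "Sort the following numbers in ascending order: " + ", ".join(
        str(v) for v in values
    )
    return prompt, sorted(values)


def make_prompt_for_length(target_tokens, count_tokens, seed=0):
    """Double the list size until the prompt reaches the target token count."""
    num_values = 1
    prompt, expected = make_sort_prompt(num_values, seed)
    while count_tokens(prompt) < target_tokens:
        num_values *= 2
        prompt, expected = make_sort_prompt(num_values, seed)
    return prompt, expected


def count_tokens(text):
    """Whitespace stand-in for a real tokenizer (an assumption)."""
    return len(text.split())


prompt, expected = make_prompt_for_length(100, count_tokens)
print(count_tokens(prompt), len(expected))
```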

## Common Options for Benchmarking Tools
- `--stream`. Enables streaming responses, so output is received incrementally instead of after the whole response has been generated.

- `--print-error`. Prints detailed error messages if any errors are encountered during execution, which helps with troubleshooting.

These options are available in all benchmarking tools in this suite.
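
Both options are plain boolean switches. The sketch below shows how such flags behave with `argparse`, mirroring the `store_true` style used in the benchmark scripts; `build_parser` is a hypothetical helper and the parser contains only these two options.

```python
import argparse


def build_parser():
    """Create a parser with the two shared benchmark switches."""
    parser = argparse.ArgumentParser(description="Benchmark option sketch.")
    parser.add_argument(
        "--stream", action="store_true", help="Enable streaming responses."
    )
    parser.add_argument(
        "--print-error",
        action="store_true",
        help="Print detailed error messages if any errors are encountered.",
    )
    return parser


# Passing only --stream leaves print_error at its default of False.
args = build_parser().parse_args(["--stream"])
print(args.stream, args.print_error)
```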


================================================
FILE: benchmark/benchmark_embedding.py
================================================
# Copyright 2022-2025 XProbe Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import asyncio
import logging
import random
import time
import aiohttp
from typing import List, Dict, Optional
from datasets import load_dataset
import numpy as np
from benchmark_runner import ConcurrentBenchmarkRunner


logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class EmbeddingBenchmarkRunner(ConcurrentBenchmarkRunner):
    def __init__(
        self,
        api_url: str,
        model_uid: str,
        input_requests: List[Dict],
        stream: bool,
        concurrency: int,
        api_key: Optional[str] = None,
        print_error: bool = False,
    ):
        super().__init__(
            api_url,
            model_uid,
            input_requests,
            stream,
            concurrency,
            api_key,
            print_error,
        )

    async def _run(self):
        tasks = []
        for i in range(self.concurrency):
            tasks.append(asyncio.create_task(self.worker(i)))

        await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)

    async def worker(self, i: int):
        r = random.Random(i)
        index = r.randint(0, len(self.input_requests) - 1)
        while self.left > 0:
            request = self.input_requests[index]
            index += 1
            index = index % len(self.input_requests)
            await self.send_request(request)
            self.left -= 1
            # Print trailing spaces to overwrite the previous line as the count shrinks.
            print("\rdone_request, left %d    " % (self.left), end="")
        # The last one
        print("")

    async def send_request(self, request, warming_up: bool = False):
        input = request["sentence"]
        request_start_time = time.time()

        pload = {
            "model": self.model_uid,
            "input": input,
        }

        headers = {"User-Agent": "Benchmark Client"}
        if self.api_key:
            headers["Authorization"] = f"Bearer {self.api_key}"

        timeout = aiohttp.ClientTimeout(total=3 * 3600)
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.post(
                self.api_url, headers=headers, json=pload
            ) as response:
                resp = await response.json()
                if response.status == 200:
                    request_end_time = time.time()
                    request_latency = request_end_time - request_start_time
                    if not warming_up:
                        self.outputs.append(request_latency)
                else:
                    logger.error(f"Failed to create embedding: {resp}")


def main(args: argparse.Namespace):
    print(args)

    random.seed(args.seed)
    np.random.seed(args.seed)

    api_url = f"http://{args.host}:{args.port}/v1/embeddings"
    model_uid = args.model_uid

    logger.info("Preparing for benchmark.")
    dataset = load_dataset(args.dataset, args.subset)
    input_requests = dataset["test"].to_list()
    if args.num_query > 0:
        input_requests = input_requests[: args.num_query]
    else:
        args.num_query = len(input_requests)

    logger.info("Benchmark starts.")

    benchmark = EmbeddingBenchmarkRunner(
        api_url,
        model_uid,
        input_requests,
        args.stream,
        concurrency=args.concurrency,
        api_key=args.api_key,
        print_error=args.print_error,
    )
    asyncio.run(benchmark.run())

    # TODO: Print the results of request_latency in detail.
    # benchmark.print_stats() needs to be overridden
    print(f"Total time: {benchmark.benchmark_time:.2f} s")
    print(f"Throughput: {args.num_query / benchmark.benchmark_time:.2f} requests/s")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Stress test the embedding model.")
    parser.add_argument("--host", type=str, default="localhost")
    parser.add_argument("--port", type=int, default=9997)
    parser.add_argument(
        "--dataset",
        type=str,
        default="clue",
        help="Name of the dataset.",
    )
    parser.add_argument(
        "--subset",
        type=str,
        default="tnews",
        help="Subset of the dataset.",
    )
    parser.add_argument(
        "--concurrency",
        "-c",
        type=int,
        default=256,
        help="Number of concurrent requests to send.",
    )
    parser.add_argument(
        "--num-query",
        "-q",
        type=int,
        default=-1,
        help="Set the query dataset count, default is all",
    )
    parser.add_argument(
        "--trust-remote-code",
        action="store_true",
        help="Trust remote code from huggingface.",
    )
    parser.add_argument(
        "--model-uid", type=str, required=True, help="Xinference model UID."
    )
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument(
        "--stream", action="store_true", help="Enable streaming responses."
    )
    parser.add_argument(
        "--api-key", type=str, default=None, help="Authorization api key",
    )
    parser.add_argument(
        "--print-error",
        action="store_true",
        help="Print detailed error messages if any errors are encountered."
    )
    args = parser.parse_args()
    main(args)


================================================
FILE: benchmark/benchmark_latency.py
================================================
# Copyright 2022-2023 XProbe Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import asyncio
import logging
import random

import numpy as np
from utils import get_tokenizer, sample_requests
from benchmark_runner import BenchmarkRunner


logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class LatencyBenchmarkRunner(BenchmarkRunner):
    async def _run(self):
        total_requests = len(self.input_requests)
        for i, request in enumerate(self.input_requests):
            await self.send_request(request)
            remaining = total_requests - (i + 1)
            print(
                f"\rProcessed {i + 1}/{total_requests} requests, {remaining} remaining.",
                end="",
            )
        print("")


def main(args: argparse.Namespace):
    print(args)
    random.seed(args.seed)
    np.random.seed(args.seed)

    api_url = f"http://{args.host}:{args.port}/v1/chat/completions"
    model_uid = args.model_uid

    logger.info("Preparing for benchmark.")
    tokenizer = get_tokenizer(args.tokenizer, trust_remote_code=args.trust_remote_code)
    input_requests = sample_requests(args.dataset, args.num_prompts, tokenizer)

    logger.info("Benchmark starts.")

    benchmark = LatencyBenchmarkRunner(
        api_url,
        model_uid,
        input_requests,
        args.stream,
        args.api_key,
        args.print_error,
    )
    asyncio.run(benchmark.run())

    benchmark.print_stats()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Benchmark the latency of processing a single batch of requests."
    )
    parser.add_argument("--host", type=str, default="localhost")
    parser.add_argument("--port", type=int, default=9997)
    parser.add_argument(
        "--dataset", type=str, required=True, help="Path to the dataset."
    )
    parser.add_argument(
        "--tokenizer", type=str, required=True, help="Name or path of the tokenizer."
    )
    parser.add_argument(
        "--num-prompts", type=int, default=100, help="Number of prompts to process."
    )
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument(
        "--trust-remote-code",
        action="store_true",
        help="Trust remote code from huggingface.",
    )
    parser.add_argument(
        "--model-uid", type=str, required=True, help="Xinference model UID."
    )
    parser.add_argument(
        "--stream", action="store_true", help="Enable streaming responses."
    )
    parser.add_argument(
        "--api-key",
        type=str,
        default=None,
        help="Authorization api key",
    )
    parser.add_argument(
        "--print-error",
        action="store_true",
        help="Print detailed error messages if any errors are encountered."
    )
    args = parser.parse_args()
    main(args)


================================================
FILE: benchmark/benchmark_long.py
================================================
# Copyright 2022-2023 XProbe Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import asyncio
import logging
import random

import numpy as np

from utils import generate_sorting_prompts, get_tokenizer
from benchmark_runner import ConcurrentBenchmarkRunner


logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class LongBenchmarkRunner(ConcurrentBenchmarkRunner):
    async def _run(self):
        tasks = []
        for i in range(self.concurrency):
            tasks.append(asyncio.create_task(self.worker(i)))

        await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)

    async def worker(self, i: int):
        r = random.Random(i)
        index = r.randint(0, len(self.input_requests) - 1)
        while self.left > 0:
            request = self.input_requests[index]
            index += 1
            index = index % len(self.input_requests)
            await self.send_request(request)
            self.left -= 1
            # Print trailing spaces to overwrite the previous, longer line as `left` decreases.
            print("\rdone_request, left %d    " % (self.left), end="")
        # The last one
        print("")


def main(args: argparse.Namespace):
    if args.concurrency > args.num_prompts:
        print("Clamping concurrency to num_prompts (%d)" % (args.num_prompts))
        args.concurrency = args.num_prompts
    print(args)

    random.seed(args.seed)
    np.random.seed(args.seed)

    api_url = f"http://{args.host}:{args.port}/v1/chat/completions"
    model_uid = args.model_uid

    logger.info("Preparing for benchmark.")
    tokenizer = get_tokenizer(args.tokenizer, trust_remote_code=args.trust_remote_code)
    # XXX: generate_sorting_prompts() currently only generates prompts between 1/2 and 2/3
    # of context_length because tokenizers vary across models; consider improving this in the future.
    input_requests = generate_sorting_prompts(
        args.concurrency, args.context_length, args.context_length / 2 - 20, tokenizer
    )

    logger.info("Benchmark starts.")

    benchmark = LongBenchmarkRunner(
        api_url,
        model_uid,
        input_requests,
        args.stream,
        concurrency=args.concurrency,
        api_key=args.api_key,
        print_error=args.print_error,
    )
    asyncio.run(benchmark.run())

    benchmark.print_stats()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Benchmark the online serving throughput with long context."
    )
    parser.add_argument("--host", type=str, default="localhost")
    parser.add_argument("--port", type=int, default=9997)
    parser.add_argument(
        "--tokenizer", type=str, required=True, help="Name or path of the tokenizer."
    )
    parser.add_argument(
        "--context-length", type=int, default=32768, help="Model context length."
    )
    parser.add_argument(
        "--num-prompts", type=int, default=16, help="Number of prompts to process."
    )
    parser.add_argument(
        "--concurrency",
        "-c",
        type=int,
        default=16,
        help="Number of concurrent requests to send.",
    )
    parser.add_argument(
        "--trust-remote-code",
        action="store_true",
        help="Trust remote code from huggingface.",
    )
    parser.add_argument(
        "--model-uid", type=str, required=True, help="Xinference model UID."
    )
    parser.add_argument(
        "--api-key", type=str, default=None, help="Authorization api key",
    )
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument(
        "--stream", action="store_true", help="Enable streaming responses."
    )
    parser.add_argument(
        "--print-error",
        action="store_true",
        help="Print detailed error messages if any errors are encountered."
    )
    args = parser.parse_args()
    main(args)


================================================
FILE: benchmark/benchmark_rerank.py
================================================
# Copyright 2022-2023 XProbe Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import asyncio
import logging
import random
import time
import aiohttp
from typing import List, Dict, Optional
from datasets import load_dataset
import numpy as np
from benchmark_runner import ConcurrentBenchmarkRunner


logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class RerankBenchmarkRunner(ConcurrentBenchmarkRunner):
    def __init__(
        self,
        api_url: str,
        model_uid: str,
        input_requests: List[Dict],
        stream: bool,
        top_n: int,
        concurrency: int,
        api_key: Optional[str] = None,
        print_error: bool = False,
    ):
        super().__init__(
            api_url,
            model_uid,
            input_requests,
            stream,
            concurrency,
            api_key,
            print_error,
        )
        self.top_n = top_n

    async def _run(self):
        tasks = []
        for i in range(self.concurrency):
            tasks.append(asyncio.create_task(self.worker(i)))

        await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)

    async def worker(self, i: int):
        r = random.Random(i)
        index = r.randint(0, len(self.input_requests) - 1)
        while self.left > 0:
            request = self.input_requests[index]
            index += 1
            index = index % len(self.input_requests)
            await self.send_request(request)
            self.left -= 1
            # Print trailing spaces to overwrite the previous, longer line as `left` decreases.
            print("\rdone_request, left %d    " % (self.left), end="")
        # The last one
        print("")

    async def send_request(self, request, warming_up: bool = False):
        prompt, documents = request["query"], request["positive"]
        request_start_time = time.time()

        pload = {
            "model": self.model_uid,
            "top_n": self.top_n,
            "query": prompt,
            "documents": documents,
        }

        headers = {"User-Agent": "Benchmark Client"}
        if self.api_key:
            headers["Authorization"] = f"Bearer {self.api_key}"

        timeout = aiohttp.ClientTimeout(total=3 * 3600)
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.post(
                self.api_url, headers=headers, json=pload
            ) as response:
                if response.status == 200:
                    # Read the full body before stopping the clock.
                    await response.json()
                    request_end_time = time.time()
                    request_latency = request_end_time - request_start_time
                    if not warming_up:
                        self.outputs.append(request_latency)
                else:
                    # Error bodies are not guaranteed to be JSON, so log them as text.
                    logger.error(f"Failed to rerank: {await response.text()}")


def main(args: argparse.Namespace):
    print(args)

    random.seed(args.seed)
    np.random.seed(args.seed)

    api_url = f"http://{args.host}:{args.port}/v1/rerank"
    model_uid = args.model_uid

    logger.info("Preparing for benchmark.")
    dataset = load_dataset(args.dataset)
    input_requests = dataset["test"].remove_columns("negative").to_list()
    if args.num_query > 0:
        input_requests = input_requests[: args.num_query]
    else:
        args.num_query = len(input_requests)

    logger.info("Benchmark starts.")

    benchmark = RerankBenchmarkRunner(
        api_url,
        model_uid,
        input_requests,
        args.stream,
        top_n=args.top_n,
        concurrency=args.concurrency,
        api_key=args.api_key,
        print_error=args.print_error,
    )
    asyncio.run(benchmark.run())

    # TODO: Print the results of request_latency in detail.
    # benchmark.print_stats() needs to be overridden
    print(f"Total time: {benchmark.benchmark_time:.2f} s")
    print(f"Throughput: {args.num_query / benchmark.benchmark_time:.2f} requests/s")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Stress test the rerank model.")
    parser.add_argument("--host", type=str, default="localhost")
    parser.add_argument("--port", type=int, default=9997)
    parser.add_argument(
        "--dataset",
        type=str,
        default="mteb/scidocs-reranking",
        help="Path to the dataset.",
    )
    parser.add_argument(
        "--concurrency",
        "-c",
        type=int,
        default=16,
        help="Number of concurrent requests to send.",
    )
    parser.add_argument(
        "--top-n",
        "-n",
        type=int,
        default=5,
        help="Number of top-ranked documents to return from the rerank.",
    )
    parser.add_argument(
        "--num-query",
        "-q",
        type=int,
        default=-1,
        help="Number of queries to take from the dataset; the default (-1) uses all of them.",
    )
    parser.add_argument(
        "--trust-remote-code",
        action="store_true",
        help="Trust remote code from huggingface.",
    )
    parser.add_argument(
        "--model-uid", type=str, required=True, help="Xinference model UID."
    )
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument(
        "--stream", action="store_true", help="Enable streaming responses."
    )
    parser.add_argument(
        "--api-key", type=str, default=None, help="Authorization api key",
    )
    parser.add_argument(
        "--print-error",
        action="store_true",
        help="Print detailed error messages if any errors are encountered."
    )
    args = parser.parse_args()
    main(args)


================================================
FILE: benchmark/benchmark_runner.py
================================================
# Copyright 2022-2023 XProbe Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import aiohttp
import json
import sys
import traceback
import warnings
import logging
from dataclasses import dataclass, field
import time
from typing import List, Optional, Tuple

import numpy as np

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

AIOHTTP_TIMEOUT = aiohttp.ClientTimeout(total=3 * 3600)


def remove_prefix(text: str, prefix: str) -> str:
    if text.startswith(prefix):
        return text[len(prefix) :].strip()
    return text.strip()


@dataclass
class RequestOutput:
    success: bool = False
    prompt_len: int = 0
    completion_tokens: int = 0
    latency: float = 0.0
    ttft: float = 0.0
    itl: List[float] = field(default_factory=list)  # List of inter-token latencies
    error: str = ""


class BenchmarkRunner:
    def __init__(
        self,
        api_url: str,
        model_uid: str,
        input_requests: List[Tuple[str, int, int]],
        stream: bool,
        api_key: Optional[str] = None,
        print_error: bool = False,
    ):
        self.api_url = api_url
        self.model_uid = model_uid
        self.input_requests = input_requests
        self.outputs: List[RequestOutput] = []
        self.benchmark_time = None
        self.stream = stream
        self.api_key = api_key
        self.print_error = print_error

    async def run(self):
        await self.warm_up()
        start_time = time.time()
        await self._run()
        end_time = time.time()
        self.benchmark_time = end_time - start_time

    async def warm_up(self, num_requests: int = 5):
        logger.info("Warming up...")
        for i in range(min(num_requests, len(self.input_requests))):
            request = self.input_requests[i]
            await self.send_request(request, warming_up=True)
        logger.info("Warm-up completed.")

    async def _run(self):
        pass

    async def send_request(self, request: tuple, warming_up: bool = False):
        prompt, prompt_len, output_len = request

        if self.stream:
            pload = {
                "model": self.model_uid,
                "n": 1,
                "temperature": 0.6,
                "top_p": 0.9,
                "max_tokens": output_len,
                "stream": True,
                "messages": [{"role": "user", "content": prompt}],
                "stream_options": {"include_usage": True},
            }
        else:
            pload = {
                "model": self.model_uid,
                "n": 1,
                "temperature": 0.6,
                "top_p": 0.9,
                "max_tokens": output_len,
                "stream": False,
                "messages": [{"role": "user", "content": prompt}],
            }

        headers = {"User-Agent": "Benchmark Client"}
        if self.api_key:
            headers["Authorization"] = f"Bearer {self.api_key}"

        async with aiohttp.ClientSession(timeout=AIOHTTP_TIMEOUT) as session:
            output = RequestOutput(prompt_len=prompt_len)
            ttft = 0.0
            st = time.perf_counter()
            most_recent_timestamp = st

            try:
                async with session.post(
                    self.api_url, headers=headers, json=pload
                ) as response:
                    if response.status == 200:
                        if self.stream:
                            async for chunk_bytes in response.content:
                                # {
                                #     "id": "chataec79465-dfea-46af-81b9-c28124063fc0",
                                #     "model": "llama-3-instruct",
                                #     "created": 1721202668,
                                #     "object": "chat.completion.chunk",
                                #     "choices": [
                                #         {
                                #             "index": 0,
                                #             "delta": {"role": "assistant", "content": ""},
                                #             "finish_reason": null,
                                #         }
                                #     ],
                                # }
                                chunk_bytes = chunk_bytes.strip()
                                if not chunk_bytes:
                                    continue

                                chunk = remove_prefix(chunk_bytes.decode("utf-8"), "data:")

                                if chunk == "[DONE]":
                                    latency = time.perf_counter() - st
                                else:
                                    timestamp = time.perf_counter()
                                    data = json.loads(chunk)

                                    # First token
                                    if ttft == 0.0:
                                        # Reuse the chunk arrival time so JSON parsing is not counted.
                                        ttft = timestamp - st
                                        output.ttft = ttft

                                    # Decoding phase
                                    else:
                                        output.itl.append(timestamp - most_recent_timestamp)

                                    most_recent_timestamp = timestamp

                            output.latency = latency
                            output.success = True
                            output.completion_tokens = data["usage"]["completion_tokens"]
                        else:
                            resp = await response.json()
                            output.latency = time.perf_counter() - st
                            output.success = True
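                            output.completion_tokens = resp["usage"]["completion_tokens"]
                    else:
                        # Record non-200 responses so --print-error has something to show.
                        output.error = f"HTTP {response.status}: {await response.text()}"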
            except Exception:
                output.success = False
                exc_info = sys.exc_info()
                output.error = "".join(traceback.format_exception(*exc_info))

            if not warming_up:
                self.outputs.append(output)

    def print_stats(self):
        total_time = self.benchmark_time

        if self.stream:
            # Initialize variables for metrics
            total_input = 0
            completed = 0
            actual_output_lens = []
            itls = []
            tpots = []
            ttfts = []

            for output in self.outputs:
                if output.success:
                    actual_output_lens.append(output.completion_tokens)
                    total_input += output.prompt_len
                    if output.completion_tokens > 1:
                        tpots.append(
                            (output.latency - output.ttft)
                            / (output.completion_tokens - 1)
                        )
                    itls += output.itl
                    ttfts.append(output.ttft)
                    completed += 1
                else:
                    actual_output_lens.append(0)

            if completed == 0:
                warnings.warn(
                    "All requests failed. This is likely due to a misconfiguration "
                    "on the benchmark arguments.",
                    stacklevel=2,
                )

            # Calculate statistics
            total_output = sum(actual_output_lens)
            request_throughput = completed / total_time if total_time > 0 else 0
            input_throughput = total_input / total_time if total_time > 0 else 0
            output_throughput = total_output / total_time if total_time > 0 else 0

            mean_ttft = np.mean(ttfts) * 1000 if ttfts else 0
            median_ttft = np.median(ttfts) * 1000 if ttfts else 0
            std_ttft = np.std(ttfts) * 1000 if ttfts else 0
            p99_ttft = np.percentile(ttfts, 99) * 1000 if ttfts else 0

            mean_tpot = np.mean(tpots) * 1000 if tpots else 0
            median_tpot = np.median(tpots) * 1000 if tpots else 0
            std_tpot = np.std(tpots) * 1000 if tpots else 0
            p99_tpot = np.percentile(tpots, 99) * 1000 if tpots else 0

            mean_itl = np.mean(itls) * 1000 if itls else 0
            median_itl = np.median(itls) * 1000 if itls else 0
            std_itl = np.std(itls) * 1000 if itls else 0
            p99_itl = np.percentile(itls, 99) * 1000 if itls else 0

            # Print benchmark results
            print("{s:{c}^{n}}".format(s=" Benchmark Result ", n=50, c="="))
            print("{:<40} {:<10}".format("Successful requests:", completed))
            print("{:<40} {:<10.2f}".format("Benchmark duration (s):", total_time))
            print("{:<40} {:<10}".format("Total input tokens:", total_input))
            print("{:<40} {:<10}".format("Total generated tokens:", total_output))
            print(
                "{:<40} {:<10.2f}".format(
                    "Request throughput (req/s):", request_throughput
                )
            )
            print(
                "{:<40} {:<10.2f}".format(
                    "Input token throughput (tok/s):", input_throughput
                )
            )
            print(
                "{:<40} {:<10.2f}".format(
                    "Output token throughput (tok/s):", output_throughput
                )
            )

            print("{s:{c}^{n}}".format(s="Time to First Token", n=50, c="-"))
            print("{:<40} {:<10.4f}".format("Mean TTFT (ms):", mean_ttft))
            print("{:<40} {:<10.4f}".format("Median TTFT (ms):", median_ttft))
            print("{:<40} {:<10.4f}".format("Std TTFT (ms):", std_ttft))
            print("{:<40} {:<10.4f}".format("P99 TTFT (ms):", p99_ttft))

            print(
                "{s:{c}^{n}}".format(
                    s="Time per Output Token (excl. 1st token)", n=50, c="-"
                )
            )
            print("{:<40} {:<10.4f}".format("Mean TPOT (ms):", mean_tpot))
            print("{:<40} {:<10.4f}".format("Median TPOT (ms):", median_tpot))
            print("{:<40} {:<10.4f}".format("Std TPOT (ms):", std_tpot))
            print("{:<40} {:<10.4f}".format("P99 TPOT (ms):", p99_tpot))

            print("{s:{c}^{n}}".format(s="Inter-token Latency", n=50, c="-"))
            print("{:<40} {:<10.4f}".format("Mean ITL (ms):", mean_itl))
            print("{:<40} {:<10.4f}".format("Median ITL (ms):", median_itl))
            print("{:<40} {:<10.4f}".format("Std ITL (ms):", std_itl))
            print("{:<40} {:<10.4f}".format("P99 ITL (ms):", p99_itl))

            print("=" * 50)
        else:
            # Initialize variables for metrics
            total_input = 0
            completed = 0
            actual_output_lens = []
            latencies = []
            per_token_latencies = []
            per_output_token_latencies = []

            for output in self.outputs:
                if output.success:
                    actual_output_lens.append(output.completion_tokens)
                    total_input += output.prompt_len
                    latencies.append(output.latency)
                    per_token_latencies.append(
                        output.latency / (output.prompt_len + output.completion_tokens)
                    )
                    if output.completion_tokens > 0:
                        per_output_token_latencies.append(
                            output.latency / output.completion_tokens
                        )
                    completed += 1
                else:
                    actual_output_lens.append(0)

            if completed == 0:
                warnings.warn(
                    "All requests failed. This is likely due to a misconfiguration "
                    "on the benchmark arguments.",
                    stacklevel=2,
                )

            # Calculate statistics
            total_output = sum(actual_output_lens)
            # Count only successful requests, matching the streaming branch.
            request_throughput = completed / total_time if total_time > 0 else 0
            input_throughput = total_input / total_time if total_time > 0 else 0
            output_throughput = total_output / total_time if total_time > 0 else 0

            mean_latency = np.mean(latencies) if latencies else 0
            mean_per_token_latency = (
                np.mean(per_token_latencies) if per_token_latencies else 0
            )
            mean_per_output_token_latency = (
                np.mean(per_output_token_latencies) if per_output_token_latencies else 0
            )

            # Print benchmark results
            print("{s:{c}^{n}}".format(s=" Benchmark Result ", n=50, c="="))
            print("{:<40} {:<10}".format("Successful requests:", completed))
            print("{:<40} {:<10.2f}".format("Benchmark duration (s):", total_time))
            print("{:<40} {:<10}".format("Total input tokens:", total_input))
            print("{:<40} {:<10}".format("Total generated tokens:", total_output))
            print(
                "{:<40} {:<10.2f}".format(
                    "Request throughput (req/s):", request_throughput
                )
            )
            print(
                "{:<40} {:<10.2f}".format(
                    "Input token throughput (tok/s):", input_throughput
                )
            )
            print(
                "{:<40} {:<10.2f}".format(
                    "Output token throughput (tok/s):", output_throughput
                )
            )

            print("{s:{c}^{n}}".format(s="Latency Statistics", n=50, c="-"))
            print("{:<40} {:<10.4f}".format("Mean latency (s):", mean_latency))
            print(
                "{:<40} {:<10.4f}".format(
                    "Mean latency per token (s):", mean_per_token_latency
                )
            )
            print(
                "{:<40} {:<10.4f}".format(
                    "Mean latency per output token (s):", mean_per_output_token_latency
                )
            )

            print("=" * 50)

            print(f"Total time: {total_time:.2f} s")
            print(f"Throughput: {request_throughput:.2f} requests/s")

        if completed < len(self.input_requests):
            if self.print_error:
                logger.info("Errors encountered during benchmark:")
                for output in self.outputs:
                    if not output.success:
                        print(f"Error for prompt with length {output.prompt_len}: {output.error}")
            else:
                logger.info(
                    "Errors were encountered during the benchmark. Run with --print-error to see detailed error messages."
                )


class ConcurrentBenchmarkRunner(BenchmarkRunner):
    def __init__(
        self,
        api_url: str,
        model_uid: str,
        input_requests: List[Tuple[str, int, int]],
        stream: bool,
        concurrency: int,
        api_key: Optional[str] = None,
        print_error: bool = False,
    ):
        super().__init__(
            api_url,
            model_uid,
            input_requests,
            stream,
            api_key,
            print_error,
        )
        self.concurrency = concurrency
        self.left = len(input_requests)

    async def worker(self):
        pass


================================================
FILE: benchmark/benchmark_serving.py
================================================
# Copyright 2022-2023 XProbe Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import asyncio
import logging
import random
from typing import List, Tuple, Optional

import numpy as np

from utils import sample_requests, get_tokenizer
from benchmark_runner import ConcurrentBenchmarkRunner


logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class ServingBenchmarkRunner(ConcurrentBenchmarkRunner):
    def __init__(
        self,
        api_url: str,
        model_uid: str,
        input_requests: List[Tuple[str, int, int]],
        stream: bool,
        concurrency: int,
        request_rate: float,
        api_key: Optional[str] = None,
        print_error: bool = False,
    ):
        super().__init__(
            api_url,
            model_uid,
            input_requests,
            stream,
            concurrency,
            api_key,
            print_error,
        )
        self.request_rate = request_rate
        self.queue = None  # delay the creation of the queue

    async def _run(self):
        tasks = []

        for _ in range(self.concurrency):
            tasks.append(asyncio.create_task(self.worker()))

        await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)

    async def warm_up(self, num_requests: int = 5):
        if self.queue is None:
            self.queue = asyncio.Queue(len(self.input_requests))

        logger.info(f"Enqueuing {len(self.input_requests)} requests.")
        for req in self.input_requests:
            await self.queue.put(req)
        await super().warm_up(num_requests)

    async def worker(self):
        """
        Wait for requests dispatched by run(), then send each one with send_request().
        Once all requests are done, most workers will block on self.queue,
        but at least one worker will exit."""
        while self.left > 0:
            request = await self.queue.get()
            await self.send_request(request)
            self.left -= 1
            print("\rdone_request, left %d    " % (self.left), end="")

            if self.request_rate != float("inf"):
                # A finite request rate means requests are paced; with an
                # infinite rate there is no need to wait.
                # Sample the request interval from the exponential distribution.
                interval = np.random.exponential(1.0 / self.request_rate)
                # The next request will be sent after the interval.
                await asyncio.sleep(interval)
        print("")


def main(args: argparse.Namespace):
    if args.concurrency > args.num_prompts:
        print("Fix concurrency with num_prompts %d" % (args.num_prompts))
        args.concurrency = args.num_prompts
    print(args)

    random.seed(args.seed)
    np.random.seed(args.seed)

    api_url = f"http://{args.host}:{args.port}/v1/chat/completions"
    model_uid = args.model_uid

    logger.info("Preparing for benchmark.")
    tokenizer = get_tokenizer(args.tokenizer, trust_remote_code=args.trust_remote_code)
    input_requests = sample_requests(
        args.dataset,
        args.num_prompts,
        tokenizer,
        prompt_len_limit=args.prompt_len_limit,
    )

    logger.info("Benchmark starts.")

    benchmark = ServingBenchmarkRunner(
        api_url,
        model_uid,
        input_requests,
        args.stream,
        request_rate=args.request_rate,
        concurrency=args.concurrency,
        api_key=args.api_key,
        print_error=args.print_error,
    )
    asyncio.run(benchmark.run())

    benchmark.print_stats()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Benchmark the online serving throughput."
    )
    parser.add_argument("--host", type=str, default="localhost")
    parser.add_argument("--port", type=int, default=9997)
    parser.add_argument(
        "--dataset", type=str, required=True, help="Path to the dataset."
    )
    parser.add_argument(
        "--tokenizer", type=str, required=True, help="Name or path of the tokenizer."
    )
    parser.add_argument(
        "--num-prompts", type=int, default=100, help="Number of prompts to process."
    )
    parser.add_argument(
        "--prompt-len-limit", type=int, default=1024, help="Prompt length limitation."
    )
    parser.add_argument(
        "--api-key",
        type=str,
        default=None,
        help="Authorization api key",
    )
    parser.add_argument(
        "--concurrency",
        "-c",
        type=int,
        default=100,
        help="Number of requests to send concurrently",
    )
    parser.add_argument(
        "--request-rate",
        type=float,
        default=float("inf"),
        help="Number of requests per second. If this is inf, "
        "then all the requests are sent at time 0. "
        "Otherwise, we use a Poisson process to synthesize "
        "the request arrival times.",
    )
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument(
        "--trust-remote-code",
        action="store_true",
        help="Trust remote code from huggingface.",
    )
    parser.add_argument("--model-uid", type=str, help="Xinference model UID.")
    parser.add_argument(
        "--stream", action="store_true", help="Enable streaming responses."
    )
    parser.add_argument(
        "--print-error",
        action="store_true",
        help="Print detailed error messages if any errors are encountered.",
    )
    args = parser.parse_args()
    main(args)


================================================
FILE: benchmark/utils.py
================================================
# Copyright 2022-2023 XProbe Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
import logging
import random
from typing import TYPE_CHECKING, List, Tuple

from transformers import AutoTokenizer, PreTrainedTokenizerFast

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


if TYPE_CHECKING:
    from transformers import PreTrainedTokenizerBase

# A fast LLaMA tokenizer with the pre-processed `tokenizer.json` file.
_FAST_LLAMA_TOKENIZER = "hf-internal-testing/llama-tokenizer"


def get_tokenizer(
    tokenizer_name: str,
    *args,
    tokenizer_mode: str = "auto",
    trust_remote_code: bool = False,
    **kwargs,
) -> "PreTrainedTokenizerBase":
    """Gets a tokenizer for the given model name via Huggingface."""
    if tokenizer_mode == "slow":
        if kwargs.get("use_fast", False):
            raise ValueError("Cannot use the fast tokenizer in slow tokenizer mode.")
        kwargs["use_fast"] = False

    if (
        "llama" in tokenizer_name.lower()
        and kwargs.get("use_fast", True)
        and tokenizer_name != _FAST_LLAMA_TOKENIZER
    ):
        logger.info(
            "For some LLaMA-based models, initializing the fast tokenizer may "
            "take a long time. To eliminate the initialization time, consider "
            f"using '{_FAST_LLAMA_TOKENIZER}' instead of the original "
            "tokenizer."
        )
    try:
        tokenizer = AutoTokenizer.from_pretrained(
            tokenizer_name, *args, trust_remote_code=trust_remote_code, **kwargs
        )
    except TypeError as e:
        # The LLaMA tokenizer causes a protobuf error in some environments.
        err_msg = (
            "Failed to load the tokenizer. If you are using a LLaMA-based "
            f"model, use '{_FAST_LLAMA_TOKENIZER}' instead of the original "
            "tokenizer."
        )
        raise RuntimeError(err_msg) from e
    except ValueError as e:
        # If the error pertains to the tokenizer class not existing or not
        # currently being imported, suggest using the --trust-remote-code flag.
        if not trust_remote_code and (
            "does not exist or is not currently imported." in str(e)
            or "requires you to execute the tokenizer file" in str(e)
        ):
            err_msg = (
                "Failed to load the tokenizer. If the tokenizer is a custom "
                "tokenizer not yet available in the HuggingFace transformers "
                "library, consider setting `trust_remote_code=True` in LLM "
                "or using the `--trust-remote-code` flag in the CLI."
            )
            raise RuntimeError(err_msg) from e
        else:
            raise e

    if not isinstance(tokenizer, PreTrainedTokenizerFast):
        logger.warning(
            "Using a slow tokenizer. This might cause a significant "
            "slowdown. Consider using a fast tokenizer instead."
        )
    return tokenizer


def sample_requests(
    dataset_path: str,
    num_requests: int,
    tokenizer: "PreTrainedTokenizerBase",
    prompt_len_limit: int = 1024,
) -> List[Tuple[str, int, int]]:
    # Load the dataset.
    with open(dataset_path) as f:
        dataset = json.load(f)
    # Filter out the conversations with fewer than 2 turns.
    dataset = [data for data in dataset if len(data["conversations"]) >= 2]
    # Only keep the first two turns of each conversation.
    dataset = [
        (data["conversations"][0]["value"], data["conversations"][1]["value"])
        for data in dataset
    ]

    # Tokenize the prompts and completions.
    prompts = [prompt for prompt, _ in dataset]
    prompt_token_ids = tokenizer(prompts).input_ids
    completions = [completion for _, completion in dataset]
    completion_token_ids = tokenizer(completions).input_ids
    tokenized_dataset = []
    for i in range(len(dataset)):
        output_len = len(completion_token_ids[i])
        tokenized_dataset.append((prompts[i], prompt_token_ids[i], output_len))

    # Filter out too long sequences.
    filtered_dataset: List[Tuple[str, int, int]] = []
    for prompt, prompt_token_ids, output_len in tokenized_dataset:
        prompt_len = len(prompt_token_ids)
        if prompt_len < 4 or output_len < 4:
            # Prune too short sequences.
            # This is because TGI causes errors when the input or output length
            # is too short.
            continue
        if (
            prompt_len > prompt_len_limit
            or prompt_len + output_len > prompt_len_limit * 2
        ):
            # Prune too long sequences.
            continue
        filtered_dataset.append((prompt, prompt_len, output_len))

    # Sample the requests.
    sampled_requests = random.sample(filtered_dataset, num_requests)
    return sampled_requests


def generate_sorting_prompts(
    num_prompts: int,
    context_length: int,
    prompt_len_limit: int,
    tokenizer: "PreTrainedTokenizerBase",
) -> List[Tuple[str, int, int]]:
    prompts = []
    for i in range(0, num_prompts):
        random_nums = []
        _prompt_len = 0
        while True:
            r_str = "%s" % random.randint(0, 99)
            r_len = len(r_str) + 1
            if r_len + _prompt_len > prompt_len_limit:
                break
            random_nums.append(r_str)
            _prompt_len += r_len
        prompt = "Sort the numbers:" + ",".join(random_nums)
        prompts.append(prompt)
    prompt_token_ids = tokenizer(prompts).input_ids
    dataset = []
    for i in range(0, len(prompts)):
        prompt_len = len(prompt_token_ids[i])
        dataset.append((prompts[i], prompt_len, context_length - prompt_len))
    return dataset


================================================
FILE: doc/Makefile
================================================
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SPHINXINTL    ?= sphinx-intl
SOURCEDIR     = source
BUILDDIR      = build

# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS  = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) $(SOURCEDIR)
I18NSPHINXLANGS = -l zh_CN

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile html_zh_cn html_ja_jp gettext

html_zh_cn:
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) -t zh_cn -D language='zh_CN' "$(SOURCEDIR)" $(BUILDDIR)/html_zh_cn

gettext:
	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
	$(SPHINXINTL) update -p $(BUILDDIR)/locale $(I18NSPHINXLANGS)
	python $(SOURCEDIR)/norm_zh.py

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

================================================
FILE: doc/source/_static/switcher.json
================================================
[
  {
    "name": "简体中文(Chinese)",
    "version": "zh-cn",
    "url": "https://inference.readthedocs.io/zh-cn/latest/"
  },
  {
    "name": "English",
    "version": "en",
    "url": "https://inference.readthedocs.io/en/latest/",
    "preferred": true
  }
]

================================================
FILE: doc/source/conf.py
================================================
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))


# -- Project information -----------------------------------------------------

project = 'Xinference'
copyright = '2025, Xorbits Inc.'
author = 'xorbitsai'


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    "sphinx.ext.mathjax",
    "sphinx.ext.ifconfig",
    "sphinx.ext.intersphinx",
    "sphinx.ext.viewcode",
    "sphinx.ext.githubpages",
    "sphinx.ext.autosummary",
    "sphinx.ext.napoleon",
    "sphinx_tabs.tabs",
    "sphinx_design",
    "IPython.sphinxext.ipython_directive",
    "IPython.sphinxext.ipython_console_highlighting",
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []

# i18n
locale_dirs = ["locale/"]  # path is example but recommended.
gettext_compact = False  # optional


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages.  See the documentation for
# a list of builtin themes.
#
html_theme = 'pydata_sphinx_theme'
html_title = "Xinference"

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

# Define the json_url for our version switcher.
version_match = os.environ.get("READTHEDOCS_LANGUAGE")
json_url = "https://inference.readthedocs.io/en/latest/_static/switcher.json"
if not version_match:
    version_match = 'en'

html_theme_options = {
    "show_toc_level": 2,
    "header_links_before_dropdown": 7,
    "icon_links": [
        {
            "name": "GitHub",
            "url": "https://github.com/xorbitsai/inference",
            "icon": "fa-brands fa-github",
            "type": "fontawesome",
        },
    ],
    "navbar_align": "content",  # [left, content, right] For testing that the navbar items align properly
    "navbar_start": ["navbar-logo", "version-switcher"],
    "navbar_center": ["navbar-nav"],
    "switcher": {
        "json_url": json_url,
        "version_match": version_match,
    },
}


if version_match != 'zh-cn':
    html_theme_options['icon_links'].extend([{
        "name": "Discord",
        "url": "https://discord.gg/Xw9tszSkr5",
        "icon": "fa-brands fa-discord",
        "type": "fontawesome",
    },
    {
        "name": "Twitter",
        "url": "https://twitter.com/xorbitsio",
        "icon": "fa-brands fa-twitter",
        "type": "fontawesome",
    }])
    html_theme_options["external_links"] = [
        {"name": "Official Site", "url": "https://xinference.io"},
    ]
    html_theme_options["header_links_before_dropdown"] = 6
else:
    html_theme_options['icon_links'].extend([{
        "name": "WeChat",
        "url": "https://xinference.cn/images/WeCom.jpg",
        "icon": "fa-brands fa-weixin",
        "type": "fontawesome",
    },
    {
        "name": "Zhihu",
        "url": "https://zhihu.com/org/xorbits",
        "icon": "fa-brands fa-zhihu",
        "type": "fontawesome",
    }])
    html_theme_options["external_links"] = [
        {"name": "产品官网", "url": "https://xinference.cn"},
    ]

html_favicon = "_static/favicon.svg"


================================================
FILE: doc/source/development/contributing_codebase.rst
================================================
=============================
Contributing to the code base
=============================

.. contents:: Table of contents:
   :local:

Code standards
--------------

Writing good code is not just about what you write. It is also about *how* you write it.
During Continuous Integration testing, several tools will be run to check your code for stylistic errors.
Good style is a requirement for submitting code to Xinference.

In addition, it is important that we do not make sudden changes to the code that
could have the potential to break a lot of user code as a result. Therefore
we need it to be as backwards compatible as possible to avoid mass breakages.

Autofixing formatting errors
----------------------------

Continuous Integration runs code formatting checks
like ``black``, ``flake8``, ``isort``, and others using `pre-commit hooks <https://pre-commit.com/>`_.
Any warnings generated by these checks will cause Continuous Integration to fail. Therefore,
it is advisable to run the checks yourself before submitting code. This
can be done by installing ``pre-commit``::

    pip install pre-commit

and then running::

    pre-commit install

from the root of the Xinference repository. This setup ensures that all styling checks are
automatically executed each time you commit changes without your needing to run each one manually.
In addition, using ``pre-commit`` will also allow you to more easily
remain up-to-date with our code checks as they change.

Note that if needed, you can skip these checks with ``git commit --no-verify``.

If you don't want to use ``pre-commit`` as part of your workflow, you can still use it
to run its checks with::

    pre-commit run --files <files you have modified>

without needing to have done ``pre-commit install`` beforehand.

If you want to run checks on all recently committed files on upstream/main you can use::

    pre-commit run --from-ref=upstream/main --to-ref=HEAD --all-files

without needing to have done ``pre-commit install`` beforehand.

.. note::

    You may consider periodically running ``pre-commit gc`` to clean up repos
    which are no longer used.

.. note::

    If you have conflicting installations of ``virtualenv``, it could lead to
    errors - refer to `here <https://github.com/pypa/virtualenv/issues/1875>`_.

    Also, due to a `bug in virtualenv <https://github.com/pypa/virtualenv/issues/1986>`_,
    you may run into issues if you're using conda. To solve this, you can downgrade
    ``virtualenv`` to version ``20.0.33``.

Backwards compatibility
-----------------------

Please try to maintain backward compatibility. If you think breakage is necessary,
clearly state why as part of the pull request. Be careful when changing method
signatures, and add deprecation warnings where needed. Also add the ``deprecated`` Sphinx
directive to the deprecated functions or methods.

You'll also need to

1. Write a new test that asserts a warning is issued when calling with the deprecated argument
2. Update all of Xinference's existing tests and code to use the new argument
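
As a minimal sketch of step 1, the example below deprecates an argument and tests
that the warning is issued. The function ``launch`` and its arguments ``n_gpu`` and
``num_gpus`` are hypothetical names chosen for illustration; they are not taken
from the Xinference code base.

```python
import warnings


def launch(model_name, n_gpu=None, num_gpus=None):
    """Hypothetical function in which ``num_gpus`` was renamed to ``n_gpu``."""
    if num_gpus is not None:
        # Keep the old argument working, but tell callers to migrate.
        warnings.warn(
            "`num_gpus` is deprecated, use `n_gpu` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        n_gpu = num_gpus
    return model_name, n_gpu


def test_launch_warns_on_deprecated_argument():
    # Record every warning raised inside the block, regardless of filters.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = launch("my-model", num_gpus=1)
    # The old argument still works and a DeprecationWarning was issued.
    assert result == ("my-model", 1)
    assert any(issubclass(w.category, DeprecationWarning) for w in caught)


test_launch_warns_on_deprecated_argument()
```

When the test suite runs under ``pytest``, the same check can be written with
``pytest.warns(DeprecationWarning)``.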

Type hints
----------

Xinference strongly encourages the use of :pep:`484` style type hints. New development should
contain type hints and pull requests to annotate existing code are accepted as well!

Test-driven development
-----------------------

Xinference is serious about testing and strongly encourages contributors to embrace
`test-driven development (TDD) <https://en.wikipedia.org/wiki/Test-driven_development>`_.
This development process "relies on the repetition of a very short development cycle:
first the developer writes an (initially failing) automated test case that defines a desired
improvement or new function, then produces the minimum amount of code to pass that test."
So, before actually writing any code, you should write your tests. Often the test can be
taken from the original GitHub issue. However, it is always worth considering additional
use cases and writing corresponding tests.

Adding tests is frequently requested after code is pushed to Xinference. Thus,
it is worth getting in the habit of writing tests ahead of time so this is never an issue.

================================================
FILE: doc/source/development/contributing_environment.rst
================================================
==================================
Creating a development environment
==================================

.. contents:: Table of contents:
   :local:

Before proceeding with any code modifications, it's essential to set up the necessary environment for Xinference development,
which includes familiarizing yourself with Git usage, establishing an isolated environment, installing Xinference, and compiling the frontend.

Getting started with Git
-------------------------

Now that you have identified an issue you wish to resolve, an enhancement to incorporate, or documentation to enhance,
it's crucial to acquaint yourself with GitHub and the Xinference codebase.

To the new user, working with Git is one of the more intimidating aspects of contributing to Xinference.
It can very quickly become overwhelming, but sticking to the guidelines below will help simplify the process 
and minimize potential issues. As always, if you are having difficulties please
feel free to ask for help.

The code is hosted on `GitHub <https://github.com/xorbitsai/inference>`_. To
contribute you will need to sign up for a `free GitHub account
<https://github.com/signup/free>`_. We use `Git <https://git-scm.com/>`_ for
version control to allow many people to work together on the project.

`GitHub has instructions <https://help.github.com/set-up-git-redirect>`__ for installing git,
setting up your SSH key, and configuring git. All these steps need to be completed before
you can work seamlessly between your local repository and GitHub.

Some great resources for learning Git:

* `Official Git Documentation <https://git-scm.com/doc>`_
* `Pro Git Book <https://git-scm.com/book/en/v2>`_
* `Git Tutorial by Atlassian <https://www.atlassian.com/git/tutorials>`_
* `Git - Concise Guide <http://rogerdudler.github.io/git-guide/index.zh.html>`_

.. note::
   If the speed of ``git clone`` is slow, you can use the following command
   to add a proxy:

   ::

      export https_proxy=YourProxyAddress

Creating an isolated environment
--------------------------------

Before installing Xinference, we recommend creating an isolated
environment, preferably with Conda, to simplify the steps that follow.

::

   conda create --name xinf
   conda activate xinf

``xinf`` can be replaced with a custom Conda environment name.

Afterward, you'll need to install Python and Node.js (npm) in the newly created
Conda environment. Here are the commands:

::

   conda install python=3.12
   conda install nodejs

Install from source code
------------------------

Before we begin, please make sure that you have cloned the repository.
Assuming it was cloned into a directory named ``inference``, ``cd`` into that directory,
where the ``setup.cfg`` and ``setup.py`` files are located, and run the following commands:

::

   pip install -e .
   xinference-local

If the commands run successfully, you can use Xinference normally. For
detailed usage instructions, refer to
`using_xinference <https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html>`__.

If errors occur or the process freezes during execution, the next step
is to compile the frontend.

Frontend Compilation
--------------------

Navigate to the ``inference/xinference/ui/web/ui`` directory. Then, execute the following command
to clear the cache:

::

   npm cache clean

If the command fails to execute, you can try adding the ``--force`` option.

.. note::
   If the ``node_modules`` folder already exists in this directory,
   it's recommended to manually delete it before cleaning the cache.

Next, execute the following command in this directory to compile the
frontend:

::

   npm install
   npm run build

Again, if the first command fails to execute, you can try adding the ``--force`` option.

After compiling the frontend, you can ``cd`` back to the directory
where the ``setup.cfg`` and ``setup.py`` files are located,
and install Xinference via ``pip install -e .``.


================================================
FILE: doc/source/development/index.rst
================================================
.. _development_index:

===========
Development
===========

.. toctree::
    :maxdepth: 2

    contributing_environment
    contributing_codebase
    xinference_internals


================================================
FILE: doc/source/development/xinference_internals.rst
================================================
===========================
The internals of Xinference
===========================

.. contents:: Table of contents:
   :local:

Overview
========
Xinference leverages `Xoscar <https://github.com/xorbitsai/xoscar>`_, an actor programming framework we designed,
as its core component to manage machines, devices, and model inference processes. Each actor serves as a basic
unit for model inference, and various inference backends can be integrated into the actor, enabling us to support
multiple inference engines and hardware. These actors are hosted and scheduled within actor pools, which are
designed to be asynchronous and non-blocking and function as resource pools.

.. raw:: html

    <img class="align-center" alt="actor" src="../_static/actor.svg" style="background-color: transparent" width="77%">

====

Both supervisor and worker are actor instances. Initially, an actor pool, serving as a resource pool, needs to be created
on each server; and each actor can utilize a CPU core or a GPU device. Each server has its own address (IP address or
hostname), so actors on different computing nodes can communicate with each other through these addresses. See `Actor`_ for more information.

RESTful API
===========
The RESTful API is implemented using `FastAPI <https://github.com/tiangolo/fastapi>`_, as specified in
`api/restful_api.py <https://github.com/xorbitsai/inference/tree/main/xinference/api/restful_api.py>`_.

::

  self._router.add_api_route("/status", self.get_status, methods=["GET"])

This registers the ``/status`` API; its corresponding function is ``get_status``. You can connect a
RESTful API to the backend function you want in `api/restful_api.py <https://github.com/xorbitsai/inference/tree/main/xinference/api/restful_api.py>`_.

Command Line
============
The Command Line is implemented using `Click <https://click.palletsprojects.com/>`_, as specified in
`deploy/cmdline.py <https://github.com/xorbitsai/inference/tree/main/xinference/deploy/cmdline.py>`_,
allowing users to interact with the Xinference deployment features directly from the terminal.

Entry Points
------------
Xinference provides the following commands:

- ``xinference``: Provides commands for model management, including registering/unregistering models, listing all
  registered/running models, and launching or terminating specific models.
  It also features interactive commands like generate and chat for testing and interacting with deployed models in real-time.

- ``xinference-local``: Starts a local Xinference service.

- ``xinference-supervisor``: Initiates a supervisor process that manages and monitors worker actors within a distributed setup.

- ``xinference-worker``: Starts a worker process that executes tasks assigned by the supervisor, utilizing available
  computational resources effectively.

Each command is equipped with ``options`` and ``flags`` to customize its behavior, such as specifying log levels,
host addresses, port numbers, and other relevant settings.

Python projects define command-line console entry points in ``setup.cfg`` or ``setup.py``.

::

  console_scripts =
      xinference = xinference.deploy.cmdline:cli
      xinference-local = xinference.deploy.cmdline:local
      xinference-supervisor = xinference.deploy.cmdline:supervisor
      xinference-worker = xinference.deploy.cmdline:worker

The ``xinference`` command, for example, resolves to ``cli`` in the ``xinference.deploy.cmdline`` module.
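
To illustrate how a ``module:attribute`` entry-point string maps to a Python object,
the sketch below resolves one by hand. The helper ``resolve_entry_point`` is a
hypothetical name, and ``json:dumps`` stands in for ``xinference.deploy.cmdline:cli``
so that the example runs without Xinference installed; the installer-generated
console script performs an equivalent import-and-call.

```python
import importlib
from typing import Any


def resolve_entry_point(spec: str) -> Any:
    """Resolve a ``module:attribute`` string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    # The attribute part may be dotted, e.g. ``pkg.mod:Class.method``.
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj


# ``json:dumps`` is only a stand-in target for this illustration.
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))
```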

Click
-----
We use Click to define the options of each command:

::

  @click.option(
      "--host",
      "-H",
      default=XINFERENCE_DEFAULT_DISTRIBUTED_HOST,
      type=str,
      help="Specify the host address for the supervisor.",
  )
  @click.option(
      "--port",
      "-p",
      default=XINFERENCE_DEFAULT_ENDPOINT_PORT,
      type=int,
      help="Specify the port number for the Xinference web ui and service.",
  )

For example, the ``xinference-local`` command allows you to define the host address and port.
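
A self-contained sketch of such a command is shown below. It assumes ``click`` is
installed; the constants ``DEFAULT_HOST`` and ``DEFAULT_PORT`` and the body of
``local`` are placeholders for illustration and do not reproduce the real
``xinference-local`` implementation.

```python
import click
from click.testing import CliRunner

# Hypothetical defaults; the real constants are defined inside Xinference.
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 9997


@click.command()
@click.option("--host", "-H", default=DEFAULT_HOST, type=str,
              help="Specify the host address.")
@click.option("--port", "-p", default=DEFAULT_PORT, type=int,
              help="Specify the port number.")
def local(host: str, port: int) -> None:
    # Click parses the options and passes them in as keyword arguments.
    click.echo(f"Would start a local service on {host}:{port}")


# CliRunner invokes the command in-process, which is convenient for tests.
runner = CliRunner()
result = runner.invoke(local, ["-H", "0.0.0.0", "--port", "8000"])
print(result.output)
```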

Actor
=====
Xinference is fundamentally based on `Xoscar <https://github.com/xorbitsai/xoscar>`_, our actor framework,
which can manage computational resources and Python processes to support scalable and concurrent programming.
The following pseudocode shows how our Worker Actor works; the actual Worker Actor is more complex than this.

::

  import xoscar as xo

  class WorkerActor(xo.Actor):
      def __init__(self, *args, **kwargs):
          ...
      async def launch_model(self, model_id, n_gpu, ...):
          # launch an inference engine, use specific model class to load model checkpoints
          ...
      async def list_models(self):
          # list models on this actor
          ...
      async def terminate_model(self, model_id):
          # terminate the model
          ...
      async def __post_create__(self):
          # called after the actor instance is created
          ...
      async def __pre_destroy__(self):
          # called before the actor instance is destroyed
          ...

We use the ``WorkerActor`` as an example to illustrate how Xinference is built. Each actor class
is a standard Python class that inherits from ``xoscar.Actor``. An instance of this class is a specific actor
within the actor pool.

- **Define Actor Actions**: Each actor needs to define certain actions or behaviors to accomplish specific tasks.
  For instance, the model inference ``WorkerActor`` needs to launch the model (``launch_model``), list the models
  in this actor (``list_models``), and terminate a model (``terminate_model``). Two special methods are worth
  noting: ``__post_create__`` is invoked after the actor instance is created, allowing for necessary initialization,
  and ``__pre_destroy__`` is called before the actor instance is destroyed, allowing for cleanup or finalization tasks.

- **Reference Actor and Invoke Methods**: When an actor is created, it yields a reference variable so that other
  actors can reference it. An actor can also be referenced by its address. Suppose the ``WorkerActor``
  is created and the reference variable is ``worker_ref``; the ``launch_model`` method of this actor class can
  then be invoked by calling ``worker_ref.launch_model()``.
  Even if the actor's method is originally a synchronous method, it becomes an asynchronous method
  when called through an actor reference.

- **Inference Engine**: The actor can manage processes, and the inference engine is also a process. In the
  ``launch_model`` method of the ``WorkerActor``, we can initialize different inference engines according to the
  user's needs. Therefore, Xinference can support multiple inference engines and can easily adapt to new inference
  engines in the future.

See the `Xoscar documentation <https://xoscar.dev/en/latest/getting_started/llm-inference.html>`_ for more actor use cases.
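
The idea that a method becomes asynchronous when called through a reference can be
sketched with the standard library alone. ``ToyWorker`` and ``ToyRef`` below are
hypothetical classes written for this illustration; they do not use or reproduce
the Xoscar API, and everything runs in a single process.

```python
import asyncio
import inspect


class ToyWorker:
    """Hypothetical actor holding a table of launched models."""

    def __init__(self):
        self._models = {}

    def launch_model(self, model_id):
        # A plain synchronous method.
        self._models[model_id] = "running"
        return model_id

    async def list_models(self):
        return sorted(self._models)


class ToyRef:
    """Proxy that makes every method call on the actor awaitable."""

    def __init__(self, actor):
        self._actor = actor

    def __getattr__(self, name):
        method = getattr(self._actor, name)

        async def call(*args, **kwargs):
            result = method(*args, **kwargs)
            # Await coroutine results so sync and async methods look alike.
            if inspect.isawaitable(result):
                result = await result
            return result

        return call


async def main():
    worker_ref = ToyRef(ToyWorker())
    # ``launch_model`` is synchronous on the actor but awaited via the reference.
    await worker_ref.launch_model("model-a")
    return await worker_ref.list_models()


models = asyncio.run(main())
print(models)
```

A real actor framework would additionally route the call to the process or machine
that hosts the actor; the proxy here only shows the calling convention.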

Asynchronous Programming
========================

Both Xinference and Xoscar make heavy use of asynchronous programming with ``asyncio``.
Asynchronous programming is a paradigm in which calls do not block.
Instead, requests and function calls are issued and executed in the background,
and their results are returned in the future. This enables us to perform
activities concurrently.
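
As a small illustration of this non-blocking behaviour, the sketch below issues
three simulated requests at once with ``asyncio.gather``. The coroutine
``fake_request`` is a hypothetical placeholder that sleeps instead of doing real
work.

```python
import asyncio
import time


async def fake_request(name: str, delay: float) -> str:
    # ``asyncio.sleep`` yields control, so other coroutines run in the meantime.
    await asyncio.sleep(delay)
    return name


async def main():
    start = time.perf_counter()
    # All three coroutines are scheduled together and wait concurrently.
    results = await asyncio.gather(
        fake_request("a", 0.2),
        fake_request("b", 0.2),
        fake_request("c", 0.2),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

Because the waits overlap, the total time is close to the longest single delay
rather than the sum of all three.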

If you're not familiar with Python's ``asyncio``, the following tutorials may help:

  - `Python Asyncio Tutorial <https://bbc.github.io/cloudfit-public-docs/asyncio/asyncio-part-1.html>`__

  - `Real Python's asyncio Tutorial <https://realpython.com/async-io-python/>`__

  - `Python Official Documentation <https://docs.python.org/3/library/asyncio.html>`__


Model
=====

Xinference supports different types of models including large language models (LLMs), image models, audio models, embedding models, etc.
All models are implemented in `model/ <https://github.com/xorbitsai/inference/tree/main/xinference/model>`_.

LLM
---

Take `model/llm/ <https://github.com/xorbitsai/inference/tree/main/xinference/model/llm>`_ as an example: it focuses on
the management and instantiation of LLMs. It includes detailed implementations for loading, configuring,
and deploying LLMs.

We support many backends such as GGML, PyTorch, and vLLM. Our generated content is compatible with the OpenAI format, supporting features such as streaming output and the chat completion format (for chat models only).
Therefore, there is a lot of adaptation work to be done after the model generates content. These tasks are not difficult, but they do require some time. When writing this part of the code, please refer to the `OpenAI API documentation <https://platform.openai.com/docs/introduction>`_ and the documentation of various inference backends, and make the necessary adaptations.
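
To give a feel for this adaptation work, here is a rough sketch that wraps raw generated text in dictionaries shaped
like OpenAI chat completion responses. The helper names are hypothetical and this is not Xinference's actual code;
the field names follow the publicly documented OpenAI response shape, and optional fields such as ``usage`` are omitted.

```python
import time
import uuid


def to_chat_completion(model_uid, text, finish_reason="stop"):
    # Full (non-streaming) response: the whole answer sits in "message".
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model_uid,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": text},
                "finish_reason": finish_reason,
            }
        ],
    }


def to_chat_completion_chunk(model_uid, delta_text, finish_reason=None):
    # Streaming response: each chunk carries only the newly generated text in "delta".
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model_uid,
        "choices": [
            {
                "index": 0,
                "delta": {"content": delta_text},
                "finish_reason": finish_reason,
            }
        ],
    }
```

Each backend returns its own output structure, so in practice a conversion step like this would be needed per backend.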

JSON
----

In `model/llm/llm_family.json <https://github.com/xorbitsai/inference/blob/main/xinference/model/llm/llm_family.json>`_,
we use JSON to manage the metadata of emerging open-source models. Adding a new model does not require writing new code;
it only requires appending new metadata to the existing JSON file.

::

  {
      "model_name": "llama-2-chat",
      "model_ability": ["chat"],
      "model_specs": [
          {
              "model_format": "ggmlv3",
              "model_size_in_billions": 70,
              "quantization": ["q8_0", ...],
              "model_id": "TheBloke/Llama-2-70B-Chat-GGML",
          },
          ...
      ],
      "prompt_style": {
          "style_name": "LLAMA2",
          "system_prompt": "<s>[INST] <<SYS>>\nYou are a helpful AI assistant.\n<</SYS>>\n\n",
          "roles": ["[INST]", "[/INST]"],
          "stop_token_ids": [2],
          "stop": ["</s>"]
      }
  }

This is an example of how to define the Llama-2 chat model. The ``model_specs`` define the information of the model, as one model family
usually comes with various sizes, quantization methods, and file formats.
For instance, the ``model_format`` could be ``pytorch`` (using Hugging Face Transformers or vLLM as backend),
``ggmlv3`` (a file format of GGML, the tensor library associated with llama.cpp), or ``gptq`` (a post-training quantization framework).
The ``model_id`` defines the repository of the model hub from which Xinference downloads the checkpoint files.
Furthermore, due to distinct instruction-tuning processes, different model families have varying prompt styles.
The ``prompt_style`` in the JSON file specifies how to format prompts for this particular model.
For example, ``system_prompt`` and ``roles`` are used to specify the instructions and personality of the model.
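
The sketch below shows one way the ``prompt_style`` fields from the example above could be combined into a prompt string.
It is a simplified illustration, not Xinference's actual prompt builder: ``build_prompt`` is a hypothetical helper, and
the way turns are joined with ``</s>`` and ``<s>`` is an assumption based on the commonly described Llama-2 chat layout.

```python
def build_prompt(prompt_style, chat_history, user_message):
    """Assemble a prompt from the system prompt, past turns, and the new message.

    chat_history is a list of (user_text, assistant_text) pairs.
    """
    user_role, assistant_role = prompt_style["roles"]
    turns = list(chat_history) + [(user_message, None)]

    # The system prompt already opens the first user turn with "[INST]".
    prompt = prompt_style["system_prompt"]
    for i, (user_text, assistant_text) in enumerate(turns):
        if i > 0:
            prompt += f"<s>{user_role} "
        prompt += f"{user_text} {assistant_role}"
        if assistant_text is not None:
            prompt += f" {assistant_text} </s>"
    return prompt


LLAMA2_STYLE = {
    "style_name": "LLAMA2",
    "system_prompt": "<s>[INST] <<SYS>>\nYou are a helpful AI assistant.\n<</SYS>>\n\n",
    "roles": ["[INST]", "[/INST]"],
}
```

For instance, ``build_prompt(LLAMA2_STYLE, [("Hi", "Hello!")], "Bye")`` yields the system prompt followed by
``Hi [/INST] Hello! </s><s>[INST] Bye [/INST]``, leaving the model to continue after the final ``[/INST]``.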

Code Walkthrough
================

The main code is located in `xinference/ <https://github.com/xorbitsai/inference/tree/main/xinference>`_:

- `api/ <https://github.com/xorbitsai/inference/tree/main/xinference/api>`_: `restful_api.py <https://github.com/xorbitsai/inference/tree/main/xinference/api/restful_api.py>`_
  is the core part that sets up and runs the RESTful APIs.
  It integrates an authentication service (the specific code is located in `oauth2/ <https://github.com/xorbitsai/inference/tree/main/xinference/api/oauth2>`_),
  as some or all endpoints require user authentication.

- `client/ <https://github.com/xorbitsai/inference/tree/main/xinference/client>`_: This is the client of Xinference.

  - `oscar/ <https://github.com/xorbitsai/inference/tree/main/xinference/client/oscar>`_ defines the Actor Client which acts as
    a client interface for interacting with models deployed in a Xinference cluster.

  - `restful/ <https://github.com/xorbitsai/inference/tree/main/xinference/client/restful>`_ implements a RESTful client for
    interacting with a Xinference service.

- `core/ <https://github.com/xorbitsai/inference/tree/main/xinference/core>`_: This is the core part of Xinference.

  - `metrics.py <https://github.com/xorbitsai/inference/tree/main/xinference/core/metrics.py>`_ and
    `resource.py <https://github.com/xorbitsai/inference/tree/main/xinference/core/resource.py>`_
    define a set of tools for collecting and reporting metrics and the status of node resources, including model throughput,
    latency, the usage of CPU and GPU, memory usage, and more.

  - `image_interface.py <https://github.com/xorbitsai/inference/tree/main/xinference/core/image_interface.py>`_ and
    `chat_interface.py <https://github.com/xorbitsai/inference/tree/main/xinference/core/chat_interface.py>`_
    implement `Gradio <https://github.com/gradio-app/gradio>`_ interfaces for image and chat models, respectively.
    These interfaces allow users to interact with models through a Web UI, such as generating images or engaging in chat.
    They build user interfaces using the gradio package and communicate with backend models through our RESTful APIs.

  - `worker.py <https://github.com/xorbitsai/inference/tree/main/xinference/core/worker.py>`_ and
    `supervisor.py <https://github.com/xorbitsai/inference/tree/main/xinference/core/supervisor.py>`_
    respectively define the logic for worker actors and the supervisor actor. Worker actors are responsible for carrying out specific
    model computation tasks, while the supervisor actor manages the lifecycle of worker nodes, schedules tasks, and monitors system states.

  - `status_guard.py <https://github.com/xorbitsai/inference/tree/main/xinference/core/status_guard.py>`_ implements a status monitor
    to track the status of models (like creating, updating, terminating, etc.). It allows querying status information of model instances
    and managing these statuses based on the model's UID.

  - `cache_tracker.py <https://github.com/xorbitsai/inference/tree/main/xinference/core/cache_tracker.py>`_ defines a cache tracker for
    recording and managing cache status and information of model versions. It supports recording cache locations and statuses of model
    versions and querying model version information based on model names.

  - `event.py <https://github.com/xorbitsai/inference/tree/main/xinference/core/event.py>`_ defines an event collector for gathering and
    reporting various runtime events of models, such as information, warnings, and errors.

  - `model.py <https://github.com/xorbitsai/inference/tree/main/xinference/core/model.py>`_ defines a Model Actor, the core component for
    direct model interactions. The Model Actor is responsible for executing model inference requests, handling input and output data streams,
    and supporting various types of model operations.

- `deploy/ <https://github.com/xorbitsai/inference/tree/main/xinference/deploy>`_: It provides a command-line interface (CLI) for interacting
  with the Xinference framework, allowing users to perform operations by command line. See `Command Line`_ for more information.

- `locale/ <https://github.com/xorbitsai/inference/tree/main/xinference/locale>`_: It supports multi-language localization. By simply adding
  and updating JSON translation files, it becomes possible to support more languages, improving user experience.

- `model/ <https://github.com/xorbitsai/inference/tree/main/xinference/model>`_: It provides a structure for model descriptions, creation,
  and caching. See `Model`_ for more information.

- `web/ui/ <https://github.com/xorbitsai/inference/tree/main/xinference/web/ui>`_: The JavaScript code of the frontend (Web UI).


================================================
FILE: doc/source/examples/ai_podcast.rst
================================================
.. _examples_ai_podcast:

======================
Example: AI Podcast 🎙
======================

**Description**:

🎙️AI Podcast - Voice Conversations with Multiple Agents on M2 Max 💻

**Support Language** :

English (AI_Podcast.py)

Chinese (AI_Podcast_ZH.py)

**Used Technology (EN version)** :

    @ `OpenAI <https://twitter.com/OpenAI>`_ 's `whisper <https://pypi.org/project/openai-whisper/>`_

    @ `ggerganov <https://twitter.com/ggerganov>`_ 's `ggml <https://github.com/ggerganov/ggml>`_

    @ `WizardLM_AI <https://twitter.com/WizardLM_AI>`_ 's `wizardlm v1.0 <https://huggingface.co/WizardLM>`_

    @ `lmsysorg <https://twitter.com/lmsysorg>`_ 's `vicuna v1.3 <https://huggingface.co/lmsys/vicuna-7b-v1.3>`_

    @ `Xinference <https://github.com/xorbitsai/inference>`_ as a launcher

**Detailed Explanation on the Demo Functionality** :

1. Launch the WizardLM model and the Vicuna model with Xorbits Inference when the program starts.
   Initiate the chatroom by giving the two chatbots their names and telling them that there is a human user
   called "username", where "username" is given by the user's input. Initialize an empty chat history for the chatroom.

2. Use the audio device to record into a file, and transcribe the file using OpenAI's Whisper to obtain human-readable text as a string.

3. Based on the input message string, determine which agents the user wants to talk to. Call the target agents and
   pass in the input string and chat history for the model to generate a response.

4. When the responses are ready, use macOS's ``say`` command to produce audio through the speaker. Each agent has its
   own voice while speaking.

5. Store the user input and the agent response in the chat history, and keep looping until the user
   explicitly says words like "see you".

**Highlight Features with Xinference** :

1. With Xinference's distributed system, we can easily deploy two different models in the same session and in the
   same "chatroom". With enough resources, the framework can deploy any number of models you like at the same time.

2. With Xinference, you can deploy the model easily by just adding a few lines of code.
   For example, launching the vicuna model in the demo takes just::

     args = parser.parse_args()
     endpoint = args.endpoint
     client = Client(endpoint)

     model_a = "vicuna-v1.3"
     model_a_uid = client.launch_model(
         model_name=model_a,
         model_format="ggmlv3",
         model_size_in_billions=7,
         quantization="q4_0",
         n_ctx=2048,
     )
     model_a_ref = client.get_model(model_a_uid)

   Then, the Xinference client will handle "target model downloading and caching", "setting up the environment and process
   for the model", and "running the service at the selected endpoint". You are now ready to play with your LLM.

**Original Demo Video** :

    * `🎙️AI Podcast - Voice Conversations with Multiple Agents on M2 Max💻🔥🤖 <https://twitter.com/yichaocheng/status/1679129417778442240>`_

**Source Code** :

    * `AI_Podcast <https://github.com/xorbitsai/inference/blob/main/examples/AI_podcast.py>`_ (English Version)

    * `AI_Podcast_ZH <https://github.com/xorbitsai/inference/blob/main/examples/AI_podcast_ZH.py>`_ (Chinese Version)

================================================
FILE: doc/source/examples/chatbot.rst
================================================
.. _examples_chatbot:

========================
Example: CLI chatbot 🤖️
========================

**Description**:

Demonstrates how to interact with Xinference to chat with an LLM-backed AI agent from the command line 💻

**Used Technology**:

    @ `ggerganov <https://twitter.com/ggerganov>`_ 's `ggml <https://github.com/ggerganov/ggml>`_

    @ `Xinference <https://github.com/xorbitsai/inference>`_ as a launcher

    @ All LLaMA and Chatglm models supported by `Xorbitsio inference <https://github.com/xorbitsai/inference>`_

**Detailed Explanation on the Demo Functionality** :

1. Take the user's command-line input in the terminal and extract the parameters required for launching the model.

2. Launch Xinference and automatically deploy the model the user requested into the cluster.

3. Initialize an empty chat history to store all the context in the chatroom.

4. Repeatedly ask for the user's input as a prompt and let the model generate a response based on the prompt and the
   chat history. Show the response in the terminal.

5. Store the user's input and the agent's response in the chat history as context for the upcoming rounds.
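
Steps 3 to 5 can be sketched as a small loop. This is only an outline under stated assumptions: ``generate`` is a
hypothetical callable standing in for a model launched through Xinference, ``echo_model`` is a stub used in its place,
and the stop words are invented for the example.

```python
def chat_loop(generate, inputs, stop_words=("exit", "quit")):
    """Feed each input to the model together with the accumulated history."""
    chat_history = []  # step 3: start with an empty history
    for prompt in inputs:
        if prompt.strip().lower() in stop_words:
            break
        # Step 4: the model sees the new prompt plus all earlier turns.
        response = generate(prompt, chat_history)
        # Step 5: record both sides of the turn as context for later rounds.
        chat_history.append({"role": "user", "content": prompt})
        chat_history.append({"role": "assistant", "content": response})
    return chat_history


def echo_model(prompt, chat_history):
    # Stub model: reports the turn number derived from the history length.
    return f"(turn {len(chat_history) // 2 + 1}) you said: {prompt}"
```

In the real example, ``inputs`` would come from the terminal and each response would be printed as it arrives.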

**Source Code** :
    * `chat <https://github.com/RayJi01/Xprobe_inference/blob/main/examples/chat.py>`_

================================================
FILE: doc/source/examples/gradio_chatinterface.rst
================================================
.. _examples_gradio_chatinterface:

===============================
Example: Gradio ChatInterface🤗
===============================

**Description**:

This example showcases how to build a chatbot in 120 lines of code with Gradio ChatInterface and a local LLM served by Xinference

**Used Technology**:

    @ `Xinference <https://github.com/xorbitsai/inference>`_ as an LLM hosting service

    @ `Gradio <https://github.com/gradio-app/gradio>`_ as a web interface for the chatbot

**Detailed Explanation on the Demo Functionality** :

* Parse user-provided command line arguments to capture essential model parameters such as model name, size, format, and quantization.

* Establish a connection to the Xinference framework and deploy the specified model, ensuring it's ready for real-time interactions.

* Implement helper functions (flatten and to_chat) to efficiently handle and store chat interactions, ensuring the model has context for generating relevant responses.

* Set up an interactive chat interface using Gradio, allowing users to communicate with the model in a user-friendly environment.

* Activate the Gradio web interface, enabling users to start their chat sessions and receive model-generated responses based on their queries.

**Source Code** :
    * `Gradio ChatInterface <https://github.com/xorbitsai/inference/blob/main/examples/gradio_chatinterface.py>`_

================================================
FILE: doc/source/examples/index.rst
================================================
.. _examples_index:

========
Examples
========

.. toctree::
   :maxdepth: 2
   :hidden:

   ai_podcast
   chatbot
   gradio_chatinterface
   pdf_chatbot
   langchain_streamlit_doc_chat

Here you can find examples and resources to learn about how to use Xinference.

Demos
=====

End-to-end applications of using Xinference:

* `Voice Conversations with AI Agents on M2 Max <ai_podcast.html>`_

* `Interacting with LLM Models: A Command-Line Example <chatbot.html>`_

* `Interacting with LLM Models: A Gradio ChatInterface Example <gradio_chatinterface.html>`_

* `PDF Chatbot with Local LLM and Embeddings <pdf_chatbot.html>`_

* `Local Doc Conversations with LangChain and Streamlit <langchain_streamlit_doc_chat.html>`_

If you come across other examples in your own workflows we encourage you to contribute a `PR <https://github.com/xorbitsai/inference/pulls>`_!


Tutorials
=========

The following tutorials cover the basics of using Xinference in different scenarios:

* `[Notebook] Question-answering(QA) Application with Xinference, Milvus and LangChain <https://github.com/RayJi01/Xprobe_inference/blob/main/examples/LangChain_QA.ipynb>`_

* `Using Xinference local LLMs within LlamaIndex <https://docs.llamaindex.ai/en/stable/examples/llm/xinference_local_deployment.html>`_

* `[Chinese] 如何让 Chatbox 接入开源大模型，实现免费聊天 <https://zhuanlan.zhihu.com/p/655765551>`_

* `[Chinese] 摆脱 OpenAI 依赖，8 分钟教你用开源生态构建全栈 AI 应用 <https://mp.weixin.qq.com/s/cXBC0dikldNiGwOwPuJfUQ>`_

* `[Chinese] 使用全套开源工具构建 LLM 应用实战： 在 Dify 调用 Baichuan 开源模型能力 <https://mp.weixin.qq.com/s/JWYWyJxS3ludMpMDZKw_Dw>`_


Third-Party Library Integrations
================================

Xinference is designed to seamlessly integrate and deploy open-sourced AI models, so we want to incorporate support for mainstream toolkits
in the AI landscape. Xinference can be used with the following third-party libraries:

* LangChain `Text Embedding Models <https://python.langchain.com/docs/integrations/text_embedding/xinference>`_ and `LLMs <https://python.langchain.com/docs/integrations/llms/xinference>`_

* `LlamaIndex Xinference LLM <https://docs.llamaindex.ai/en/stable/api_reference/llms/xinference.html>`_


================================================
FILE: doc/source/examples/langchain_streamlit_doc_chat.rst
================================================
.. _examples_langchain_streamlit_doc_chat:

=======================================
Example: LangChain Streamlit Doc Chat📄
=======================================

**Description**:

This Streamlit-based application demonstrates an AI chatbot powered by local LLM and embedding models

**Used Technology**:

    @ `Xinference <https://github.com/xorbitsai/inference>`_: as the LLM and embedding model hosting service

    @ `LangChain <https://github.com/langchain-ai/langchain>`_: orchestrates the entire document processing and query answering pipeline

    @ `Streamlit <https://streamlit.io/>`_: for interactive user interface

**Detailed Explanation on the Demo Functionality** :

* Streamlit UI for uploading text files, enhancing user interaction.

* Texts are split into chunks and embedded using Xinference for efficient processing.

* Executes similarity searches on embedded texts to pinpoint relevant sections for user queries.

* Utilizes a structured prompt template for focused LLM interactions.

* Xinference's LLM processes queries within the context of relevant document parts, providing accurate responses.

* The system facilitates effective and context-sensitive document exploration, aiding users in information retrieval.
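
The chunk, embed, search, and prompt steps above can be outlined in plain Python. This sketch deliberately swaps in a
toy bag-of-words embedding so it runs without any model; in the real application the embeddings come from a model
served by Xinference and the pipeline is orchestrated by LangChain. All function names here are hypothetical.

```python
import math
from collections import Counter


def split_into_chunks(text, words_per_chunk=5):
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def toy_embed(text):
    # Bag-of-words counts; a real setup would call an embedding model instead.
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine_similarity(a, b):
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunk(chunks, query):
    # Similarity search: pick the chunk closest to the query.
    query_vector = toy_embed(query)
    return max(chunks, key=lambda c: cosine_similarity(toy_embed(c), query_vector))


def build_prompt(context, question):
    # Structured template that keeps the LLM focused on the retrieved context.
    return (
        "Answer using only this context:\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The string returned by ``build_prompt`` is what would be sent to the LLM, so the model answers within the context of the most relevant chunk.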

**Source Code** :
    * `LangChain Streamlit Doc Chat <https://github.com/xorbitsai/inference/blob/main/examples/LangChain_Streamlit_Doc_Chat.py>`_

================================================
FILE: doc/source/examples/pdf_chatbot.rst
================================================
.. _examples_pdf_chatbot:

======================
Example: PDF Chatbot📚
======================

**Description**:

This example showcases how to build a PDF chatbot with local LLM and Embedding models

**Used Technology**:

    @ `Xinference <https://github.com/xorbitsai/inference>`_ as an LLM hosting service

    @ `LlamaIndex <https://github.com/run-llama/llama_index>`_ for orchestrating the entire RAG pipeline 

    @ `Streamlit <https://streamlit.io/>`_ for interactive UI

**Detailed Explanation on the Demo Functionality** :

* Craft a Dockerfile to simplify the process and ensure easy reproducibility.

* Set up models with Xinference and expose two ports for accessing them.

* Leverage Streamlit for seamless file uploads and interactive communication with the chat engine.

* 5x faster doc embedding than OpenAI's API.

* Leverage GGML to offload models to the GPU for faster inference and shorter waits for responses.

**Source Code** :
    * `PDF Chatbot <https://github.com/onesuper/PDF-Chatbot-Local-LLM-Embeddings>`_

================================================
FILE: doc/source/gen_docs.py
================================================
# Copyright 2022-2023 XProbe Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
import os
import sys
from collections import defaultdict

from jinja2 import Environment, FileSystemLoader

# Mock engine libraries before importing xinference modules
def mock_engine_libraries():
    """Mock engine libraries to make them appear installed for documentation generation"""
    from types import ModuleType
    from importlib.machinery import ModuleSpec
    
    # Create mock vllm module
    vllm_mock = ModuleType('vllm')
    vllm_mock.__version__ = "1.0.0"  # Latest version for full feature support
    vllm_mock.__spec__ = ModuleSpec('vllm', None)
    vllm_mock.__file__ = "mock_vllm.py"
    
    # Create mock mlx module with core submodule

    mlx_mock = ModuleType('mlx')
    mlx_mock.__version__ = "1.0.0"
    mlx_mock.__spec__ = ModuleSpec('mlx', None)
    mlx_mock.__file__ = "mock_mlx.py"
    
    mlx_core_mock = ModuleType('mlx.core')
    mlx_core_mock.__spec__ = ModuleSpec('mlx.core', None)
    mlx_core_mock.__file__ = "mock_mlx_core.py"
    # Add required attributes for xoscar serialization
    mlx_core_mock.array = type('MockArray', (), {})
    mlx_mock.core = mlx_core_mock
    
    # Create mock lmdeploy module  
    lmdeploy_mock = ModuleType('lmdeploy')
    lmdeploy_mock.__version__ = "0.6.0"
    lmdeploy_mock.__spec__ = ModuleSpec('lmdeploy', None)
    lmdeploy_mock.__file__ = "mock_lmdeploy.py"
    
    # Create mock sglang module
    sglang_mock = ModuleType('sglang')
    sglang_mock.__version__ = "0.3.0"
    sglang_mock.__spec__ = ModuleSpec('sglang', None)
    sglang_mock.__file__ = "mock_sglang.py"

    # Create mock xllamacpp module with proper module spec for importlib.util.find_spec
    import importlib.util
    import importlib.machinery

    xllamacpp_mock = ModuleType('xllamacpp')
    xllamacpp_mock.__version__ = "1.0.0"

    # Create a proper ModuleSpec that importlib.util.find_spec can find
    xllamacpp_spec = importlib.machinery.ModuleSpec('xllamacpp', None)
    xllamacpp_spec.origin = "mock_xllamacpp.py"
    xllamacpp_mock.__spec__ = xllamacpp_spec
    xllamacpp_mock.__file__ = "mock_xllamacpp.py"

    # Create mock mlx_lm module
    mlx_lm_mock = ModuleType('mlx_lm')
    mlx_lm_mock.__version__ = "1.0.0"
    mlx_lm_mock.__spec__ = ModuleSpec('mlx_lm', None)
    mlx_lm_mock.__file__ = "mock_mlx_lm.py"

    # Create mock mlx_vlm module
    mlx_vlm_mock = ModuleType('mlx_vlm')
    mlx_vlm_mock.__version__ = "1.0.0"
    mlx_vlm_mock.__spec__ = ModuleSpec('mlx_vlm', None)
    mlx_vlm_mock.__file__ = "mock_mlx_vlm.py"

    # Mock these modules in sys.modules
    sys.modules['vllm'] = vllm_mock
    sys.modules['mlx'] = mlx_mock
    sys.modules['mlx.core'] = mlx_core_mock
    sys.modules['lmdeploy'] = lmdeploy_mock
    sys.modules['sglang'] = sglang_mock
    sys.modules['xllamacpp'] = xllamacpp_mock
    sys.modules['mlx_lm'] = mlx_lm_mock
    sys.modules['mlx_vlm'] = mlx_vlm_mock

# Apply mocking before importing xinference modules
mock_engine_libraries()

# Mock platform checks BEFORE importing xinference modules
def mock_platform_checks():
    """Mock platform and hardware checks for documentation generation"""
    # Import and mock engine checks without modifying system-wide platform settings
    try:
        # Mock vLLM platform checks
        import xinference.model.llm.vllm.core as vllm_core
        vllm_core.VLLMModel._is_linux = lambda: True
        vllm_core.VLLMModel._has_cuda_device = lambda: True
        vllm_core.VLLMChatModel._is_linux = lambda: True
        vllm_core.VLLMChatModel._has_cuda_device = lambda: True
        vllm_core.VLLMMultiModel._is_linux = lambda: True
        vllm_core.VLLMMultiModel._has_cuda_device = lambda: True

        # Mock SGLang platform checks if available
        try:
            import xinference.model.llm.sglang.core as sglang_core
            sglang_core.SGLANGModel._is_linux = lambda: True
            sglang_core.SGLANGModel._has_cuda_device = lambda: True
            sglang_core.SGLANGChatModel._is_linux = lambda: True
            sglang_core.SGLANGChatModel._has_cuda_device = lambda: True
            sglang_core.SGLANGVisionModel._is_linux = lambda: True
            sglang_core.SGLANGVisionModel._has_cuda_device = lambda: True
        except ImportError:
            pass

        # Mock LMDEPLOY platform checks if available
        try:
            import xinference.model.llm.lmdeploy.core as lmdeploy_core
            lmdeploy_core.LMDeployModel._is_linux = lambda: True
            lmdeploy_core.LMDeployModel._has_cuda_device = lambda: True
            lmdeploy_core.LMDeployChatModel._is_linux = lambda: True
            lmdeploy_core.LMDeployChatModel._has_cuda_device = lambda: True
        except ImportError:
            pass

        # Mock MLX engine platform checks by monkey-patching the imports within MLX module
        try:
            # First, let's monkey-patch sys and platform imports within the MLX module only
            import xinference.model.llm.mlx.core as mlx_core

            # Create mock objects that look like sys.platform and platform functions
            class MockSys:
                platform = "darwin"

            class MockPlatform:
                @staticmethod
                def system():
                    return "Darwin"

                @staticmethod
                def processor():
                    return "arm"

            # Store original references
            original_mlx_match = mlx_core.MLXModel.match_json
            original_mlx_chat_match = mlx_core.MLXChatModel.match_json
            original_mlx_vision_match = mlx_core.MLXVisionModel.match_json

            # Now create wrapper functions that replace sys and platform only during the platform check
            def create_wrapped_match_json(original_match):
                def wrapped_match_json(cls, llm_family, llm_spec, quantization):
                    # Temporarily replace sys and platform in the MLX module
                    import sys as original_sys
                    import platform as original_platform

                    # Replace sys and platform temporarily
                    mlx_core.sys = MockSys()
                    mlx_core.platform = MockPlatform()

                    try:
                        # Call the original match_json which will now see the mocked platform
                        result = original_match.__func__(cls, llm_family, llm_spec, quantization)
                        return result
                    finally:
                        # Restore original sys and platform
                        mlx_core.sys = original_sys
                        mlx_core.platform = original_platform

                return classmethod(wrapped_match_json)

            # Apply the wrapped match_json methods
            mlx_core.MLXModel.match_json = create_wrapped_match_json(original_mlx_match)
            mlx_core.MLXChatModel.match_json = create_wrapped_match_json(original_mlx_chat_match)
            mlx_core.MLXVisionModel.match_json = create_wrapped_match_json(original_mlx_vision_match)

        except ImportError:
            pass

    except Exception as e:
        # If any mocking fails, continue without it
        print(f"Warning: Could not mock some engine platform checks: {e}")

mock_platform_checks()

from xinference.model.llm.llm_family import SUPPORTED_ENGINES, check_engine_by_spec_parameters
from xinference.model.llm.vllm.core import VLLM_INSTALLED, VLLM_SUPPORTED_MODELS, VLLM_SUPPORTED_CHAT_MODELS

# Re-register engines so that the registry reflects the mocked platform checks
from xinference.model.llm import generate_engine_config_by_model_family
from xinference.model.llm.llm_family import BUILTIN_LLM_FAMILIES, LLM_ENGINES

# Clear existing engine configurations
LLM_ENGINES.clear()

# Re-register all model families with mocked platform checks
for family in BUILTIN_LLM_FAMILIES:
    generate_engine_config_by_model_family(family)

MODEL_HUB_HUGGING_FACE = "Hugging Face"
MODEL_HUB_MODELSCOPE = "ModelScope"
_LEGACY_TRANSFORMERS_FORMATS = {"pytorch", "gptq", "awq", "bnb"}


def build_architecture_to_models(models):
    architecture_to_models = defaultdict(list)
    for model in models:
        for architecture in model.get("architectures", []) or []:
            architecture_to_models[architecture].append(model["model_name"])
    return architecture_to_models


def get_metrics_from_url(metrics_url):
    from prometheus_client.parser import text_string_to_metric_families
    import requests

    metrics = requests.get(metrics_url).content
    result = []
    for family in text_string_to_metric_families(metrics.decode("utf-8")):
        result.append({
            "name": family.name,
            "type": family.type,
            "help": family.documentation,
        })
    return result


def _can_use_transformers_legacy(model, model_spec):
    if model_spec.get("model_format") not in _LEGACY_TRANSFORMERS_FORMATS:
        return False
    abilities = set(model.get("model_ability", []))
    return "chat" in abilities or "generate" in abilities

def _extract_primary_model_src(model):
    if model.get("model_specs"):
        for spec in model["model_specs"]:
            if isinstance(spec, dict) and "model_src" in spec:
                return spec["model_src"]
    return model.get("model_src")

def main():
    template_dir = '../templates' 
    env = Environment(loader=FileSystemLoader(template_dir))

    with open('../../xinference/model/llm/llm_family.json', 'r') as model_file:
        models = json.load(model_file)

        model_by_names = { m['model_name']: m for m in models}

        sorted_models = []
        output_dir = './models/builtin/llm'
        os.makedirs(output_dir, exist_ok=True)
        current_files = {f for f in os.listdir(output_dir) if os.path.isfile(os.path.join(output_dir, f))}

        for model_name in sorted(model_by_names, key=str.lower):

            model = model_by_names[model_name]
            sorted_models.append(model)

            for model_spec in model['model_specs']:
                model_spec['model_hubs'] = []
                # Reset per spec so a spec without a known hub cannot reuse stale values
                quantizations = []
                
                # Process different model sources
                if 'model_src' in model_spec:
                    # Handle new model_src structure
                    if 'huggingface' in model_spec['model_src']:
                        hf_src = model_spec['model_src']['huggingface']
                        model_spec['model_hubs'].append({
                            'name': MODEL_HUB_HUGGING_FACE,
                            'url': f"https://huggingface.co/{hf_src['model_id']}"
                        })
                        # Set model_id and quantizations for template compatibility
                        model_spec['model_id'] = hf_src['model_id']
                        model_spec['quantizations'] = hf_src['quantizations']
                        quantizations = hf_src['quantizations']
                    
                    if 'modelscope' in model_spec['model_src']:
                        ms_src = model_spec['model_src']['modelscope']
                        model_spec['model_hubs'].append({
                            'name': MODEL_HUB_MODELSCOPE,
                            'url': f"https://modelscope.cn/models/{ms_src['model_id']}"
                        })
                        
                    # If only modelscope exists and no huggingface, use modelscope data
                    if 'modelscope' in model_spec['model_src'] and 'huggingface' not in model_spec['model_src']:
                        ms_src = model_spec['model_src']['modelscope']
                        model_spec['model_id'] = ms_src['model_id']
                        model_spec['quantizations'] = ms_src['quantizations']
                        quantizations = ms_src['quantizations']
                else:
                    # Fallback for old format if still exists
                    model_spec['model_hubs'].append({
                        'name': MODEL_HUB_HUGGING_FACE,
                        'url': f"https://huggingface.co/{model_spec['model_id']}"
                    })
                    quantizations = model_spec.get('quantizations', [])

                # model engines
                engines = []
                for engine in SUPPORTED_ENGINES:
                    for quantization in quantizations:
                        size = model_spec['model_size_in_billions']
                        if isinstance(size, str) and '_' not in size:
                            size = int(size)
                        try:
                            check_engine_by_spec_parameters(engine, model_name, model_spec['model_format'],
                                                            size, quantization)
                        except ValueError:
                            if engine == "Transformers" and _can_use_transformers_legacy(
                                model, model_spec
                            ):
                                engines.append(engine)
                            continue
                        else:
                            engines.append(engine)
                model_spec['engines'] = sorted(set(engines), reverse=True)

            rendered = env.get_template('llm.rst.jinja').render(model)
            output_file_name = f"{model['model_name'].lower()}.rst"
            if output_file_name in current_files:
                current_files.remove(output_file_name)
            output_file_path = os.path.join(output_dir, output_file_name)
            with open(output_file_path, 'w') as output_file:
                output_file.write(rendered)
                print(output_file_path)

        if current_files:
            for f in current_files:
                print(f"remove {f}")
                os.remove(os.path.join(output_dir, f))

        index_file_path = os.path.join(output_dir, "index.rst")
        with open(index_file_path, "w") as file:
            rendered_index = env.get_template('llm_index.rst.jinja').render(models=sorted_models)
            file.write(rendered_index)
        llm_sorted_models = sorted_models


    with open('../../xinference/model/embedding/model_spec.json', 'r') as file:
        models = json.load(file)

        model_by_names = {m['model_name']: m for m in models}

        sorted_models = []
        output_dir = './models/builtin/embedding'
        os.makedirs(output_dir, exist_ok=True)

        for model_name in sorted(model_by_names, key=str.lower):
            model = model_by_names[model_name]

            sorted_models.append(model)

            model['model_hubs'] = []
            
            # Process model specs for new model_src structure
            if 'model_specs' in model and model['model_specs']:
                model_spec = model['model_specs'][0]  # Use first spec for model hubs
                if 'model_src' in model_spec:
                    if 'huggingface' in model_spec['model_src']:
                        hf_src = model_spec['model_src']['huggingface']
                        model['model_hubs'].append({
                            'name': MODEL_HUB_HUGGING_FACE,
                            'url': f"https://huggingface.co/{hf_src['model_id']}"
                        })
                        # Set model_id for template compatibility (prefer huggingface)
                        model['model_id'] = hf_src['model_id']
                    
                    if 'modelscope' in model_spec['model_src']:
                        ms_src = model_spec['model_src']['modelscope']
                        model['model_hubs'].append({
                            'name': MODEL_HUB_MODELSCOPE,
                            'url': f"https://modelscope.cn/models/{ms_src['model_id']}"
                        })
                        # Only set modelscope model_id if no huggingface exists
                        if 'huggingface' not in model_spec['model_src']:
                            model['model_id'] = ms_src['model_id']
                else:
                    # Fallback for old format
                    model_id = model_spec.get('model_id', model.get('model_id', ''))
                    model['model_id'] = model_id
                    model['model_hubs'].append({
                        'name': MODEL_HUB_HUGGING_FACE,
                        'url': f"https://huggingface.co/{model_id}"
                    })
            else:
                # Fallback for very old format
                if 'model_id' in model:
                    model['model_hubs'].append({
                        'name': MODEL_HUB_HUGGING_FACE,
                        'url': f"https://huggingface.co/{model['model_id']}"
                    })

            rendered = env.get_template('embedding.rst.jinja').render(model)
            output_file_path = os.path.join(output_dir, f"{model['model_name'].lower()}.rst")
            with open(output_file_path, 'w') as output_file:
                output_file.write(rendered)
                print(output_file_path)

        index_file_path = os.path.join(output_dir, "index.rst")
        with open(index_file_path, "w") as file:
            rendered_index = env.get_template('embedding_index.rst.jinja').render(models=sorted_models)
            file.write(rendered_index)

    with open('../../xinference/model/rerank/model_spec.json', 'r') as file:
        models = json.load(file)

        sorted_models = sorted(models, key=lambda x: x['model_name'].lower())
        output_dir = './models/builtin/rerank'
        os.makedirs(output_dir, exist_ok=True)

        for model in sorted_models:
            # Initialize model_hubs list
            model['model_hubs'] = []
            
            # Process model specs for new model_src structure
            model_spec = model['model_specs'][0]  # Use first spec for model hubs
            if 'model_src' in model_spec:
                if 'huggingface' in model_spec['model_src']:
                    hf_src = model_spec['model_src']['huggingface']
                    model['model_hubs'].append({
                        'name': MODEL_HUB_HUGGING_FACE,
                        'url': f"https://huggingface.co/{hf_src['model_id']}"
                    })
                    # Set model_id for template compatibility (prefer huggingface)
                    model['model_id'] = hf_src['model_id']
                
                if 'modelscope' in model_spec['model_src']:
                    ms_src = model_spec['model_src']['modelscope']
                    model['model_hubs'].append({
                        'name': MODEL_HUB_MODELSCOPE,
                        'url': f"https://modelscope.cn/models/{ms_src['model_id']}"
                    })
                    # Only set modelscope model_id if no huggingface exists
                    if 'huggingface' not in model_spec['model_src']:
                        model['model_id'] = ms_src['model_id']
            
            rendered = env.get_template('rerank.rst.jinja').render(model)
            output_file_path = os.path.join(output_dir, f"{model['model_name'].lower()}.rst")
            with open(output_file_path, 'w') as output_file:
                output_file.write(rendered)

        index_file_path = os.path.join(output_dir, "index.rst")
        with open(index_file_path, "w") as file:
            rendered_index = env.get_template('rerank_index.rst.jinja').render(models=sorted_models)
            file.write(rendered_index)

    with open('../../xinference/model/image/model_spec.json', 'r') as file:
        models = json.load(file)

        sorted_models = sorted(models, key=lambda x: x['model_name'].lower())
        output_dir = './models/builtin/image'
        os.makedirs(output_dir, exist_ok=True)

        for model in sorted_models:
            # Process model_src for template compatibility
            model_src = _extract_primary_model_src(model)
            if model_src:
                if 'huggingface' in model_src:
                    hf_src = model_src['huggingface']
                    model['model_id'] = hf_src['model_id']
                    # Handle GGUF related fields
                    if 'gguf_model_id' in hf_src:
                        model['gguf_model_id'] = hf_src['gguf_model_id']
                    if 'gguf_quantizations' in hf_src:
                        model['gguf_quantizations'] = ", ".join(hf_src['gguf_quantizations'])
                    # Handle Lightning related fields
                    if 'lightning_model_id' in hf_src:
                        model['lightning_model_id'] = hf_src['lightning_model_id']
                    if 'lightning_versions' in hf_src:
                        model['lightning_versions'] = ", ".join(hf_src['lightning_versions'])
                elif 'modelscope' in model_src:
                    model['model_id'] = model_src['modelscope']['model_id']
            
            available_controlnet = [cn["model_name"] for cn in model.get("controlnet", [])]
            if not available_controlnet:
                available_controlnet = None
            model["available_controlnet"] = available_controlnet
            model["model_ability"] = ', '.join(model.get("model_ability", []))
            
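            # Normalize gguf_quantizations to a comma-separated string:
            # old-format specs may carry a top-level list; a missing key becomes ""
            gguf_quantizations = model.get("gguf_quantizations") or []
            if not isinstance(gguf_quantizations, str):
                model["gguf_quantizations"] = ", ".join(gguf_quantizations)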
            
            rendered = env.get_template('image.rst.jinja').render(model)
            output_file_path = os.path.join(output_dir, f"{model['model_name'].lower()}.rst")
            with open(output_file_path, 'w') as output_file:
                output_file.write(rendered)

        index_file_path = os.path.join(output_dir, "index.rst")
        with open(index_file_path, "w") as file:
            rendered_index = env.get_template('image_index.rst.jinja').render(models=sorted_models)
            file.write(rendered_index)

    with open('../../xinference/model/audio/model_spec.json', 'r') as file:
        models = json.load(file)

        sorted_models = sorted(models, key=lambda x: x['model_name'].lower())
        output_dir = './models/builtin/audio'
        os.makedirs(output_dir, exist_ok=True)

        for model in sorted_models:
            # Process model_src for template compatibility
            model_src = _extract_primary_model_src(model)
            if model_src:
                if 'huggingface' in model_src:
                    model['model_id'] = model_src['huggingface']['model_id']
                elif 'modelscope' in model_src:
                    model['model_id'] = model_src['modelscope']['model_id']
            
            rendered = env.get_template('audio.rst.jinja').render(model)
            output_file_path = os.path.join(output_dir, f"{model['model_name'].lower()}.rst")
            with open(output_file_path, 'w') as output_file:
                output_file.write(rendered)

        index_file_path = os.path.join(output_dir, "index.rst")
        with open(index_file_path, "w") as file:
            rendered_index = env.get_template('audio_index.rst.jinja').render(models=sorted_models)
            file.write(rendered_index)

    with open('../../xinference/model/video/model_spec.json', 'r') as file:
        models = json.load(file)

        sorted_models = sorted(models, key=lambda x: x['model_name'].lower())
        output_dir = './models/builtin/video'
        os.makedirs(output_dir, exist_ok=True)

        for model in sorted_models:
            # Process model_src for template compatibility
            model_src = _extract_primary_model_src(model)
            if model_src:
                if 'huggingface' in model_src:
                    model['model_id'] = model_src['huggingface']['model_id']
                elif 'modelscope' in model_src:
                    model['model_id'] = model_src['modelscope']['model_id']
            
            model["model_ability"] = ', '.join(model.get("model_ability", []))
            rendered = env.get_template('video.rst.jinja').render(model)
            output_file_path = os.path.join(output_dir, f"{model['model_name'].lower()}.rst")
            with open(output_file_path, 'w') as output_file:
                output_file.write(rendered)

        index_file_path = os.path.join(output_dir, "index.rst")
        with open(index_file_path, "w") as file:
            rendered_index = env.get_template('video_index.rst.jinja').render(models=sorted_models)
            file.write(rendered_index)

    if VLLM_INSTALLED:
        architecture_to_models = build_architecture_to_models(llm_sorted_models)
        supported_architectures = []
        for architecture in VLLM_SUPPORTED_MODELS + VLLM_SUPPORTED_CHAT_MODELS:
            if architecture not in supported_architectures:
                supported_architectures.append(architecture)
        groups = []
        for architecture in supported_architectures:
            if architecture in architecture_to_models:
                model_names = sorted(set(architecture_to_models[architecture]), key=str.lower)
                groups.append(model_names)
            else:
                groups.append([architecture])
        groups = [', '.join("``%s``" % m for m in group) for group in groups]
        vllm_model_str = '\n'.join('- %s' % group for group in groups)
        for fn in ['getting_started/installation.rst', 'user_guide/backends.rst']:
            with open(fn) as f:
                content = f.read()
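            start_label = '.. vllm_start'
            end_label = '.. vllm_end'
            start_idx = content.find(start_label)
            end = content.find(end_label)
            # str.find returns -1 when a marker is missing; skip the file
            # rather than splicing at a bogus offset and corrupting it
            if start_idx == -1 or end < start_idx:
                print(f"Skip {fn}: vllm markers not found")
                continue
            start = start_idx + len(start_label)
            new_content = content[:start] + '\n\n' + vllm_model_str + '\n' + content[end:]
            with open(fn, 'w') as f:
                f.write(new_content)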

    try:
        output_dir = './user_guide'
        os.makedirs(output_dir, exist_ok=True)

        supervisor_metrics = get_metrics_from_url("http://127.0.0.1:9997/metrics")
        worker_metrics = get_metrics_from_url("http://127.0.0.1:9977/metrics")
        all_metrics = {"supervisor_metrics": supervisor_metrics, "worker_metrics": worker_metrics}
        rendered = env.get_template('metrics.jinja').render(all_metrics)
        output_file_path = os.path.join(output_dir, "metrics.rst")
        with open(output_file_path, 'w') as output_file:
            output_file.write(rendered)
    except Exception:
        print("Skipping metrics doc generation; start a local Xinference server first with: `xinference-local -mp 9977`.")


if __name__ == "__main__":
    main()


================================================
FILE: doc/source/getting_started/environments.rst
================================================
.. _environments:

======================
Environment Variables
======================

XINFERENCE_ENDPOINT
~~~~~~~~~~~~~~~~~~~~
The endpoint of the Xinference service, used by clients to connect to it.
The default value is ``http://127.0.0.1:9997``; the actual endpoint also appears in the logs.
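
As a minimal sketch, a client script could resolve the endpoint from this
variable and fall back to the documented default. The helper name
``resolve_endpoint`` is hypothetical and not part of Xinference:

```python
import os

DEFAULT_ENDPOINT = "http://127.0.0.1:9997"  # default documented above


def resolve_endpoint(env=None):
    """Return the endpoint from XINFERENCE_ENDPOINT, or the default.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    # Treat an empty value as unset; drop a trailing slash for consistency.
    value = env.get("XINFERENCE_ENDPOINT", "").strip()
    return (value or DEFAULT_ENDPOINT).rstrip("/")


print(resolve_endpoint({}))
```

Passing a plain dictionary instead of ``os.environ`` keeps the helper easy to
test without touching the real process environment.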

XINFERENCE_MODEL_SRC
~~~~~~~~~~~~~~~~~~~~~
The model hub used for downloading models. The default is ``huggingface``;
set it to ``modelscope`` to download from ModelScope instead.
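
One way to apply this setting from a script is to build an environment
mapping for a child process. The sketch below only prepares the mapping; the
function name ``env_with_model_src`` is an assumption made for illustration:

```python
import os


def env_with_model_src(model_src, base_env=None):
    """Return a copy of the environment with XINFERENCE_MODEL_SRC set.

    Hypothetical helper; the original environment is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["XINFERENCE_MODEL_SRC"] = model_src
    return env


# The returned mapping could be passed as ``env=`` to subprocess.run()
# when launching the server from a script.
env = env_with_model_src("modelscope", base_env={"PATH": "/usr/bin"})
print(env["XINFERENCE_MODEL_SRC"])
```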

.. _environments_xinference_home:

XINFERENCE_HOME
~~~~~~~~~~~~~~~~
By default, Xinference uses ``<HOME>/.xinference`` as its home path to store
necessary files such as logs and models, where ``<HOME>`` is the home
directory of the current user. You can change this directory by setting this
environment variable.
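
The lookup described above can be sketched as follows. This is an
illustration of the documented behavior, not Xinference's own code, and
``resolve_home`` is a hypothetical name:

```python
import os
from pathlib import Path


def resolve_home(env=None):
    """Return the home path: XINFERENCE_HOME if set, else <HOME>/.xinference."""
    env = os.environ if env is None else env
    custom = env.get("XINFERENCE_HOME")
    if custom:
        return Path(custom)
    # Fall back to the current user's home directory.
    return Path.home() / ".xinference"


print(resolve_home({"XINFERENCE_HOME": "/data/xinference"}))
```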

XINFERENCE_HEALTH_CHECK_FAILURE_THRESHOLD
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The maximum number of failed health checks tolerated at Xinference startup.
Default value is 5.

XINFERENCE_HEALTH_CHECK_INTERVAL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Health check interval (seconds) at Xinference startup.
Default value is 5.

XINFERENCE_HEALTH_CHECK_TIMEOUT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Health check timeout (seconds) at Xinference startup.
Default value is 10.

XINFERENCE_DISABLE_HEALTH_CHECK
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Xinference automatically performs a health check at startup.
Set this environment variable to 1 to disable the health check.
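
To illustrate how the four health check variables could interact, here is a
hedged sketch of a polling loop. It assumes that each check receives the
timeout, that checks are spaced by the interval, and that polling stops once
the number of failures reaches the threshold; Xinference's actual logic may
differ. ``wait_until_healthy`` and the ``check`` callable are hypothetical:

```python
import os
import time


def wait_until_healthy(check, env=None, sleep=time.sleep):
    """Poll ``check(timeout)`` until it succeeds or too many checks fail.

    ``check`` is a caller-supplied callable (an assumption of this sketch)
    that returns True when the service is healthy.
    """
    env = os.environ if env is None else env
    if env.get("XINFERENCE_DISABLE_HEALTH_CHECK") == "1":
        return True  # health check disabled
    # Defaults mirror the values documented above.
    threshold = int(env.get("XINFERENCE_HEALTH_CHECK_FAILURE_THRESHOLD", 5))
    interval = float(env.get("XINFERENCE_HEALTH_CHECK_INTERVAL", 5))
    timeout = float(env.get("XINFERENCE_HEALTH_CHECK_TIMEOUT", 10))

    failures = 0
    while failures < threshold:
        if check(timeout):
            return True
        failures += 1
        if failures < threshold:
            sleep(interval)  # wait before the next attempt
    return False


# Usage: a fake check that succeeds on the third attempt.
attempts = []


def flaky(timeout):
    attempts.append(timeout)
    return len(attempts) >= 3


ok = wait_until_healthy(flaky, env={}, sleep=lambda seconds: None)
print(ok, len(attempts))
```

Injecting ``sleep`` and ``env`` keeps the loop testable without real delays
or changes to the process environment.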

XINFERENCE_DISABLE_METRICS
~~~~~~~~~~~~~~~~~~~~~~~~~~

│   │   │   │   │   ├── kimi-k2.5.rst
│   │   │   │   │   ├── llama-2-chat.rst
│   │   │   │   │   ├── llama-2.rst
│   │   │   │   │   ├── llama-3-instruct.rst
│   │   │   │   │   ├── llama-3.1-instruct.rst
│   │   │   │   │   ├── llama-3.1.rst
│   │   │   │   │   ├── llama-3.2-vision-instruct.rst
│   │   │   │   │   ├── llama-3.2-vision.rst
│   │   │   │   │   ├── llama-3.3-instruct.rst
│   │   │   │   │   ├── llama-3.rst
│   │   │   │   │   ├── marco-o1.rst
│   │   │   │   │   ├── mineru2.5-2509-1.2b.rst
│   │   │   │   │   ├── minicpm-2b-dpo-bf16.rst
│   │   │   │   │   ├── minicpm-2b-dpo-fp16.rst
│   │   │   │   │   ├── minicpm-2b-dpo-fp32.rst
│   │   │   │   │   ├── minicpm-2b-sft-bf16.rst
│   │   │   │   │   ├── minicpm-2b-sft-fp32.rst
│   │   │   │   │   ├── minicpm-v-2.6.rst
│   │   │   │   │   ├── minicpm-v-4.5.rst
│   │   │   │   │   ├── minicpm3-4b.rst
│   │   │   │   │   ├── minicpm4.rst
│   │   │   │   │   ├── minimax-m2.5.rst
│   │   │   │   │   ├── minimax-m2.rst
│   │   │   │   │   ├── mistral-instruct-v0.1.rst
│   │   │   │   │   ├── mistral-instruct-v0.2.rst
│   │   │   │   │   ├── mistral-instruct-v0.3.rst
│   │   │   │   │   ├── mistral-large-instruct.rst
│   │   │   │   │   ├── mistral-nemo-instruct.rst
│   │   │   │   │   ├── mistral-v0.1.rst
│   │   │   │   │   ├── mixtral-8x22b-instruct-v0.1.rst
│   │   │   │   │   ├── mixtral-instruct-v0.1.rst
│   │   │   │   │   ├── mixtral-v0.1.rst
│   │   │   │   │   ├── moonlight-16b-a3b-instruct.rst
│   │   │   │   │   ├── openhermes-2.5.rst
│   │   │   │   │   ├── opt.rst
│   │   │   │   │   ├── orion-chat.rst
│   │   │   │   │   ├── ovis2.rst
│   │   │   │   │   ├── phi-2.rst
│   │   │   │   │   ├── phi-3-mini-128k-instruct.rst
│   │   │   │   │   ├── phi-3-mini-4k-instruct.rst
│   │   │   │   │   ├── qvq-72b-preview.rst
│   │   │   │   │   ├── qwen-chat.rst
│   │   │   │   │   ├── qwen1.5-chat.rst
│   │   │   │   │   ├── qwen1.5-moe-chat.rst
│   │   │   │   │   ├── qwen2-audio-instruct.rst
│   │   │   │   │   ├── qwen2-instruct.rst
│   │   │   │   │   ├── qwen2-moe-instruct.rst
│   │   │   │   │   ├── qwen2-vl-instruct.rst
│   │   │   │   │   ├── qwen2.5-coder-instruct.rst
│   │   │   │   │   ├── qwen2.5-coder.rst
│   │   │   │   │   ├── qwen2.5-instruct-1m.rst
│   │   │   │   │   ├── qwen2.5-instruct.rst
│   │   │   │   │   ├── qwen2.5-omni.rst
│   │   │   │   │   ├── qwen2.5-vl-instruct.rst
│   │   │   │   │   ├── qwen2.5.rst
│   │   │   │   │   ├── qwen3-coder.rst
│   │   │   │   │   ├── qwen3-instruct.rst
│   │   │   │   │   ├── qwen3-next-instruct.rst
│   │   │   │   │   ├── qwen3-next-thinking.rst
│   │   │   │   │   ├── qwen3-omni-instruct.rst
│   │   │   │   │   ├── qwen3-omni-thinking.rst
│   │   │   │   │   ├── qwen3-thinking.rst
│   │   │   │   │   ├── qwen3-vl-instruct.rst
│   │   │   │   │   ├── qwen3-vl-thinking.rst
│   │   │   │   │   ├── qwen3.5.rst
│   │   │   │   │   ├── qwen3.rst
│   │   │   │   │   ├── qwenlong-l1.rst
│   │   │   │   │   ├── qwq-32b-preview.rst
│   │   │   │   │   ├── qwq-32b.rst
│   │   │   │   │   ├── seallm_v2.5.rst
│   │   │   │   │   ├── seallm_v2.rst
│   │   │   │   │   ├── seallms-v3.rst
│   │   │   │   │   ├── seed-oss.rst
│   │   │   │   │   ├── skywork-math.rst
│   │   │   │   │   ├── skywork-or1-preview.rst
│   │   │   │   │   ├── skywork-or1.rst
│   │   │   │   │   ├── skywork.rst
│   │   │   │   │   ├── telechat.rst
│   │   │   │   │   ├── tiny-llama.rst
│   │   │   │   │   ├── wizardcoder-python-v1.0.rst
│   │   │   │   │   ├── wizardmath-v1.0.rst
│   │   │   │   │   ├── xiyansql-qwencoder-2504.rst
│   │   │   │   │   ├── xverse-chat.rst
│   │   │   │   │   ├── xverse.rst
│   │   │   │   │   ├── yi-1.5-chat-16k.rst
│   │   │   │   │   ├── yi-1.5-chat.rst
│   │   │   │   │   ├── yi-1.5.rst
│   │   │   │   │   ├── yi-200k.rst
│   │   │   │   │   ├── yi-chat.rst
│   │   │   │   │   └── yi.rst
│   │   │   │   ├── rerank/
│   │   │   │   │   ├── bce-reranker-base_v1.rst
│   │   │   │   │   ├── bge-reranker-base.rst
│   │   │   │   │   ├── bge-reranker-large.rst
│   │   │   │   │   ├── bge-reranker-v2-gemma.rst
│   │   │   │   │   ├── bge-reranker-v2-m3.rst
│   │   │   │   │   ├── bge-reranker-v2-minicpm-layerwise.rst
│   │   │   │   │   ├── index.rst
│   │   │   │   │   ├── jina-reranker-v2.rst
│   │   │   │   │   ├── jina-reranker-v3.rst
│   │   │   │   │   ├── minicpm-reranker.rst
│   │   │   │   │   ├── qwen3-reranker-0.6b.rst
│   │   │   │   │   ├── qwen3-reranker-4b.rst
│   │   │   │   │   ├── qwen3-reranker-8b.rst
│   │   │   │   │   ├── qwen3-vl-reranker-2b.rst
│   │   │   │   │   └── qwen3-vl-reranker-8b.rst
│   │   │   │   └── video/
│   │   │   │       ├── cogvideox-2b.rst
│   │   │   │       ├── cogvideox-5b.rst
│   │   │   │       ├── hunyuanvideo.rst
│   │   │   │       ├── index.rst
│   │   │   │       ├── wan2.1-1.3b.rst
│   │   │   │       ├── wan2.1-14b.rst
│   │   │   │       ├── wan2.1-flf2v-14b-720p.rst
│   │   │   │       ├── wan2.1-i2v-14b-480p.rst
│   │   │   │       ├── wan2.1-i2v-14b-720p.rst
│   │   │   │       ├── wan2.2-a14b.rst
│   │   │   │       ├── wan2.2-i2v-a14b.rst
│   │   │   │       └── wan2.2-ti2v-5b.rst
│   │   │   ├── custom.rst
│   │   │   ├── index.rst
│   │   │   ├── lora.rst
│   │   │   ├── model_abilities/
│   │   │   │   ├── audio.rst
│   │   │   │   ├── chat.rst
│   │   │   │   ├── embed.rst
│   │   │   │   ├── flexible.rst
│   │   │   │   ├── image.rst
│   │   │   │   ├── index.rst
│   │   │   │   ├── multimodal.rst
│   │   │   │   ├── rerank.rst
│   │   │   │   ├── tools.rst
│   │   │   │   └── video.rst
│   │   │   ├── model_memory.rst
│   │   │   ├── model_update.rst
│   │   │   ├── sources/
│   │   │   │   └── sources.rst
│   │   │   ├── virtualenv.rst
│   │   │   └── xinference_models_hub.rst
│   │   ├── norm_zh.py
│   │   ├── reference/
│   │   │   └── index.rst
│   │   └── user_guide/
│   │       ├── auth_system.rst
│   │       ├── backends.rst
│   │       ├── client_api.rst
│   │       ├── continuous_batching.rst
│   │       ├── distributed_inference.rst
│   │       ├── index.rst
│   │       ├── launch.rst
│   │       ├── metrics.rst
│   │       └── vllm_enhancement.rst
│   └── templates/
│       ├── audio.rst.jinja
│       ├── audio_index.rst.jinja
│       ├── embedding.rst.jinja
│       ├── embedding_index.rst.jinja
│       ├── image.rst.jinja
│       ├── image_index.rst.jinja
│       ├── llm.rst.jinja
│       ├── llm_index.rst.jinja
│       ├── metrics.jinja
│       ├── rerank.rst.jinja
│       ├── rerank_index.rst.jinja
│       ├── video.rst.jinja
│       └── video_index.rst.jinja
├── examples/
│   ├── AI_podcast.py
│   ├── AI_podcast_ZH.py
│   ├── AI_translate.py
│   ├── Custom_StableDiffusion_ControlNet.ipynb
│   ├── FunctionCall.ipynb
│   ├── LangChain_QA.ipynb
│   ├── LangChain_Streamlit_Doc_Chat.py
│   ├── StableDiffusionControlNet.ipynb
│   ├── Xinference_Quick_Start.ipynb
│   ├── audio_to_text.ipynb
│   ├── chat.py
│   ├── chat_vl.ipynb
│   └── gradio_chatinterface.py
├── pyproject.toml
├── setup.cfg
├── setup.py
├── versioneer.py
└── xinference/
    ├── __init__.py
    ├── _compat.py
    ├── _version.py
    ├── api/
    │   ├── __init__.py
    │   ├── dependencies.py
    │   ├── oauth2/
    │   │   ├── __init__.py
    │   │   ├── auth_service.py
    │   │   ├── types.py
    │   │   └── utils.py
    │   ├── responses.py
    │   ├── restful_api.py
    │   ├── routers/
    │   │   ├── __init__.py
    │   │   ├── admin.py
    │   │   ├── audio.py
    │   │   ├── embeddings.py
    │   │   ├── images.py
    │   │   ├── llm.py
    │   │   ├── models.py
    │   │   ├── rerank.py
    │   │   └── videos.py
    │   ├── schemas/
    │   │   ├── __init__.py
    │   │   └── requests.py
    │   ├── tests/
    │   │   ├── __init__.py
    │   │   ├── test_admin.py
    │   │   └── test_utils.py
    │   └── utils.py
    ├── client/
    │   ├── __init__.py
    │   ├── common.py
    │   ├── handlers.py
    │   ├── restful/
    │   │   ├── __init__.py
    │   │   ├── async_restful_client.py
    │   │   └── restful_client.py
    │   └── tests/
    │       ├── __init__.py
    │       ├── test_async_client.py
    │       ├── test_async_client_with_auth.py
    │       ├── test_client.py
    │       └── test_client_with_auth.py
    ├── conftest.py
    ├── constants.py
    ├── core/
    │   ├── __init__.py
    │   ├── cache_tracker.py
    │   ├── event.py
    │   ├── launch_strategy.py
    │   ├── metrics.py
    │   ├── model.py
    │   ├── otel.py
    │   ├── progress_tracker.py
    │   ├── resource.py
    │   ├── status_guard.py
    │   ├── supervisor.py
    │   ├── tests/
    │   │   ├── __init__.py
    │   │   ├── test_continuous_batching.py
    │   │   ├── test_launch_strategy.py
    │   │   ├── test_metrics.py
    │   │   ├── test_model.py
    │   │   ├── test_progressor.py
    │   │   ├── test_restful_api.py
    │   │   ├── test_types.py
    │   │   ├── test_utils.py
    │   │   └── test_worker.py
    │   ├── utils.py
    │   ├── virtual_env_manager.py
    │   └── worker.py
    ├── deploy/
    │   ├── __init__.py
    │   ├── cmdline.py
    │   ├── docker/
    │   │   ├── Dockerfile
    │   │   ├── Dockerfile.cpu
    │   │   ├── docker-compose-distributed.yml
    │   │   ├── docker-compose.yml
    │   │   ├── requirements/
    │   │   │   ├── requirements-base.txt
    │   │   │   ├── requirements-ml.txt
    │   │   │   └── requirements-models.txt
    │   │   └── requirements_cpu/
    │   │       ├── requirements_cpu-base.txt
    │   │       ├── requirements_cpu-ml.txt
    │   │       └── requirements_cpu-models.txt
    │   ├── local.py
    │   ├── supervisor.py
    │   ├── test/
    │   │   ├── __init__.py
    │   │   └── test_cmdline.py
    │   ├── utils.py
    │   └── worker.py
    ├── device_utils.py
    ├── fields.py
    ├── isolation.py
    ├── model/
    │   ├── __init__.py
    │   ├── audio/
    │   │   ├── __init__.py
    │   │   ├── chattts.py
    │   │   ├── core.py
    │   │   ├── cosyvoice.py
    │   │   ├── custom.py
    │   │   ├── f5tts.py
    │   │   ├── f5tts_mlx.py
    │   │   ├── fish_speech.py
    │   │   ├── funasr.py
    │   │   ├── indextts2.py
    │   │   ├── kokoro.py
    │   │   ├── kokoro_mlx.py
    │   │   ├── kokoro_zh.py
    │   │   ├── megatts.py
    │   │   ├── melotts.py
    │   │   ├── model_spec.json
    │   │   ├── qwen3_asr.py
    │   │   ├── tests/
    │   │   │   ├── __init__.py
    │   │   │   ├── bbc_news.npy
    │   │   │   ├── jfk.flac
    │   │   │   ├── test_chattts.py
    │   │   │   ├── test_cosyvoice.py
    │   │   │   ├── test_f5tts.py
    │   │   │   ├── test_f5tts_mlx.py
    │   │   │   ├── test_fish_speech.py
    │   │   │   ├── test_funasr.py
    │   │   │   ├── test_kokoro.py
    │   │   │   ├── test_megatts.py
    │   │   │   ├── test_melotts.py
    │   │   │   ├── test_whisper.py
    │   │   │   └── test_whisper_mlx.py
    │   │   ├── utils.py
    │   │   ├── whisper.py
    │   │   └── whisper_mlx.py
    │   ├── batch.py
    │   ├── cache_manager.py
    │   ├── core.py
    │   ├── custom.py
    │   ├── embedding/
    │   │   ├── __init__.py
    │   │   ├── cache_manager.py
    │   │   ├── core.py
    │   │   ├── custom.py
    │   │   ├── embed_family.py
    │   │   ├── flag/
    │   │   │   ├── __init__.py
    │   │   │   ├── core.py
    │   │   │   └── tests/
    │   │   │       ├── __init__.py
    │   │   │       └── test_flag.py
    │   │   ├── llama_cpp/
    │   │   │   ├── __init__.py
    │   │   │   ├── core.py
    │   │   │   └── tests/
    │   │   │       ├── __init__.py
    │   │   │       └── test_llama_cpp.py
    │   │   ├── model_spec.json
    │   │   ├── sentence_transformers/
    │   │   │   ├── __init__.py
    │   │   │   ├── core.py
    │   │   │   └── tests/
    │   │   │       ├── __init__.py
    │   │   │       └── test_sentence_transformers.py
    │   │   ├── tests/
    │   │   │   ├── __init__.py
    │   │   │   ├── test_embedding_models.py
    │   │   │   ├── test_integrated_embedding.py
    │   │   │   └── test_qwen3_vl_engine_params.py
    │   │   └── vllm/
    │   │       ├── __init__.py
    │   │       ├── core.py
    │   │       └── tests/
    │   │           ├── __init__.py
    │   │           └── test_vllm_embedding.py
    │   ├── flexible/
    │   │   ├── __init__.py
    │   │   ├── core.py
    │   │   ├── custom.py
    │   │   ├── launchers/
    │   │   │   ├── __init__.py
    │   │   │   ├── image_process_launcher.py
    │   │   │   ├── modelscope_launcher.py
    │   │   │   ├── transformers_launcher.py
    │   │   │   └── yolo_launcher.py
    │   │   ├── tests/
    │   │   │   ├── __init__.py
    │   │   │   └── test_flexible_models.py
    │   │   └── utils.py
    │   ├── image/
    │   │   ├── __init__.py
    │   │   ├── cache_manager.py
    │   │   ├── core.py
    │   │   ├── custom.py
    │   │   ├── engine.py
    │   │   ├── engine_family.py
    │   │   ├── model_spec.json
    │   │   ├── ocr/
    │   │   │   ├── __init__.py
    │   │   │   ├── deepseek_ocr.py
    │   │   │   ├── got_ocr2.py
    │   │   │   ├── hunyuan_ocr.py
    │   │   │   ├── mlx.py
    │   │   │   ├── ocr_family.py
    │   │   │   ├── paddleocr_vl.py
    │   │   │   └── vllm.py
    │   │   ├── scheduler/
    │   │   │   ├── __init__.py
    │   │   │   └── flux.py
    │   │   ├── sdapi.py
    │   │   ├── stable_diffusion/
    │   │   │   ├── __init__.py
    │   │   │   ├── core.py
    │   │   │   └── mlx.py
    │   │   ├── tests/
    │   │   │   ├── __init__.py
    │   │   │   ├── test_got_ocr2.py
    │   │   │   └── test_stable_diffusion.py
    │   │   └── utils.py
    │   ├── llm/
    │   │   ├── __init__.py
    │   │   ├── cache_manager.py
    │   │   ├── config_parser.py
    │   │   ├── core.py
    │   │   ├── custom.py
    │   │   ├── harmony.py
    │   │   ├── llama_cpp/
    │   │   │   ├── __init__.py
    │   │   │   ├── core.py
    │   │   │   └── tests/
    │   │   │       ├── __init__.py
    │   │   │       ├── test_gguf.py
    │   │   │       └── test_structured.py
    │   │   ├── llm_family.json
    │   │   ├── llm_family.py
    │   │   ├── lmdeploy/
    │   │   │   ├── __init__.py
    │   │   │   ├── core.py
    │   │   │   └── tests/
    │   │   │       └── __init__.py
    │   │   ├── memory.py
    │   │   ├── mlx/
    │   │   │   ├── __init__.py
    │   │   │   ├── core.py
    │   │   │   ├── distributed_models/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── core.py
    │   │   │   │   ├── deepseek_v3.py
    │   │   │   │   ├── qwen2.py
    │   │   │   │   ├── qwen3.py
    │   │   │   │   └── qwen3_moe.py
    │   │   │   └── tests/
    │   │   │       ├── __init__.py
    │   │   │       ├── test_distributed_model.py
    │   │   │       └── test_mlx.py
    │   │   ├── reasoning_parser.py
    │   │   ├── sglang/
    │   │   │   ├── __init__.py
    │   │   │   └── core.py
    │   │   ├── tests/
    │   │   │   ├── __init__.py
    │   │   │   ├── test_harmony.py
    │   │   │   ├── test_llm_family.py
    │   │   │   ├── test_llm_model.py
    │   │   │   ├── test_memory_estimate.py
    │   │   │   ├── test_multimodal.py
    │   │   │   ├── test_stream_options.py
    │   │   │   └── test_utils.py
    │   │   ├── tool_parsers/
    │   │   │   ├── __init__.py
    │   │   │   ├── abstract_tool_parser.py
    │   │   │   ├── deepseek_r1_tool_parser.py
    │   │   │   ├── deepseek_v3_1_tool_parser.py
    │   │   │   ├── deepseek_v3_tool_parser.py
    │   │   │   ├── glm4_tool_parser.py
    │   │   │   ├── llama3_tool_parser.py
    │   │   │   ├── minimax_tool_parser.py
    │   │   │   ├── qwen_tool_parser.py
    │   │   │   └── tests/
    │   │   │       ├── __init__.py
    │   │   │       ├── test_deepseek_r1_tool_parser.py
    │   │   │       ├── test_deepseek_v3_1_tool_parser.py
    │   │   │       ├── test_deepseek_v3_tool_parser.py
    │   │   │       ├── test_glm4_tool_parser.py
    │   │   │       ├── test_llama3_tool_parser.py
    │   │   │       └── test_qwen_tool_parser.py
    │   │   ├── transformers/
    │   │   │   ├── __init__.py
    │   │   │   ├── chatglm.py
    │   │   │   ├── core.py
    │   │   │   ├── deepseek_v2.py
    │   │   │   ├── gemma3.py
    │   │   │   ├── gpt_oss.py
    │   │   │   ├── multimodal/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── cogagent.py
    │   │   │   │   ├── core.py
    │   │   │   │   ├── deepseek_vl2.py
    │   │   │   │   ├── gemma3.py
    │   │   │   │   ├── glm4_1v.py
    │   │   │   │   ├── glm4v.py
    │   │   │   │   ├── intern_vl.py
    │   │   │   │   ├── minicpmv26.py
    │   │   │   │   ├── minicpmv45.py
    │   │   │   │   ├── ovis2.py
    │   │   │   │   ├── qwen-omni.py
    │   │   │   │   ├── qwen2_audio.py
    │   │   │   │   └── qwen2_vl.py
    │   │   │   ├── opt.py
    │   │   │   ├── tensorizer_utils.py
    │   │   │   ├── tests/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── test_opt.py
    │   │   │   │   └── test_tensorizer.py
    │   │   │   └── utils.py
    │   │   ├── utils.py
    │   │   └── vllm/
    │   │       ├── __init__.py
    │   │       ├── core.py
    │   │       ├── distributed_executor.py
    │   │       ├── distributed_executor_v1.py
    │   │       ├── tests/
    │   │       │   ├── __init__.py
    │   │       │   ├── test_core_chat_model.py
    │   │       │   └── test_distributed_executor.py
    │   │       ├── utils.py
    │   │       └── xavier/
    │   │           ├── __init__.py
    │   │           ├── allocator.py
    │   │           ├── block.py
    │   │           ├── block_manager.py
    │   │           ├── block_tracker.py
    │   │           ├── collective.py
    │   │           ├── collective_manager.py
    │   │           ├── engine.py
    │   │           ├── executor.py
    │   │           ├── scheduler.py
    │   │           ├── test/
    │   │           │   ├── __init__.py
    │   │           │   └── test_xavier.py
    │   │           ├── transfer.py
    │   │           └── utils.py
    │   ├── rerank/
    │   │   ├── __init__.py
    │   │   ├── cache_manager.py
    │   │   ├── core.py
    │   │   ├── custom.py
    │   │   ├── llama_cpp/
    │   │   │   ├── __init__.py
    │   │   │   ├── core.py
    │   │   │   └── tests/
    │   │   │       ├── __init__.py
    │   │   │       └── test_llama_cpp.py
    │   │   ├── model_spec.json
    │   │   ├── rerank_family.py
    │   │   ├── sentence_transformers/
    │   │   │   ├── __init__.py
    │   │   │   ├── core.py
    │   │   │   └── tests/
    │   │   │       ├── __init__.py
    │   │   │       └── test_sentence_transformers.py
    │   │   ├── tests/
    │   │   │   ├── __init__.py
    │   │   │   ├── test_qwen3_vl_reranker_virtualenv.py
    │   │   │   └── test_rerank.py
    │   │   ├── utils.py
    │   │   └── vllm/
    │   │       ├── __init__.py
    │   │       ├── core.py
    │   │       └── tests/
    │   │           ├── __init__.py
    │   │           └── test_vllm.py
    │   ├── scheduler/
    │   │   ├── __init__.py
    │   │   ├── batch.py
    │   │   ├── core.py
    │   │   └── request.py
    │   ├── tests/
    │   │   ├── __init__.py
    │   │   └── test_utils.py
    │   ├── utils.py
    │   └── video/
    │       ├── __init__.py
    │       ├── cache_manager.py
    │       ├── core.py
    │       ├── diffusers.py
    │       ├── model_spec.json
    │       └── tests/
    │           ├── __init__.py
    │           └── test_diffusers_video.py
    ├── thirdparty/
    │   ├── __init__.py
    │   ├── audiotools/
    │   │   ├── __init__.py
    │   │   ├── core/
    │   │   │   ├── __init__.py
    │   │   │   ├── audio_signal.py
    │   │   │   ├── display.py
    │   │   │   ├── dsp.py
    │   │   │   ├── effects.py
    │   │   │   ├── ffmpeg.py
    │   │   │   ├── loudness.py
    │   │   │   ├── playback.py
    │   │   │   ├── templates/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── headers.html
    │   │   │   │   ├── pandoc.css
    │   │   │   │   └── widget.html
    │   │   │   ├── util.py
    │   │   │   └── whisper.py
    │   │   ├── data/
    │   │   │   ├── __init__.py
    │   │   │   ├── datasets.py
    │   │   │   ├── preprocess.py
    │   │   │   └── transforms.py
    │   │   ├── metrics/
    │   │   │   ├── __init__.py
    │   │   │   ├── distance.py
    │   │   │   ├── quality.py
    │   │   │   └── spectral.py
    │   │   ├── ml/
    │   │   │   ├── __init__.py
    │   │   │   ├── accelerator.py
    │   │   │   ├── decorators.py
    │   │   │   ├── experiment.py
    │   │   │   └── layers/
    │   │   │       ├── __init__.py
    │   │   │       ├── base.py
    │   │   │       └── spectral_gate.py
    │   │   ├── post.py
    │   │   └── preference.py
    │   ├── cosyvoice/
    │   │   ├── __init__.py
    │   │   ├── bin/
    │   │   │   ├── average_model.py
    │   │   │   ├── export_jit.py
    │   │   │   ├── export_onnx.py
    │   │   │   ├── inference_deprecated.py
    │   │   │   ├── spk2info.pt
    │   │   │   └── train.py
    │   │   ├── cli/
    │   │   │   ├── __init__.py
    │   │   │   ├── cosyvoice.py
    │   │   │   ├── frontend.py
    │   │   │   └── model.py
    │   │   ├── dataset/
    │   │   │   ├── __init__.py
    │   │   │   ├── dataset.py
    │   │   │   └── processor.py
    │   │   ├── flow/
    │   │   │   ├── decoder.py
    │   │   │   ├── flow.py
    │   │   │   ├── flow_matching.py
    │   │   │   └── length_regulator.py
    │   │   ├── hifigan/
    │   │   │   ├── discriminator.py
    │   │   │   ├── f0_predictor.py
    │   │   │   ├── generator.py
    │   │   │   └── hifigan.py
    │   │   ├── llm/
    │   │   │   └── llm.py
    │   │   ├── tokenizer/
    │   │   │   ├── assets/
    │   │   │   │   └── multilingual_zh_ja_yue_char_del.tiktoken
    │   │   │   └── tokenizer.py
    │   │   ├── transformer/
    │   │   │   ├── __init__.py
    │   │   │   ├── activation.py
    │   │   │   ├── attention.py
    │   │   │   ├── convolution.py
    │   │   │   ├── decoder.py
    │   │   │   ├── decoder_layer.py
    │   │   │   ├── embedding.py
    │   │   │   ├── encoder.py
    │   │   │   ├── encoder_layer.py
    │   │   │   ├── label_smoothing_loss.py
    │   │   │   ├── positionwise_feed_forward.py
    │   │   │   ├── subsampling.py
    │   │   │   └── upsample_encoder.py
    │   │   ├── utils/
    │   │   │   ├── __init__.py
    │   │   │   ├── class_utils.py
    │   │   │   ├── common.py
    │   │   │   ├── executor.py
    │   │   │   ├── file_utils.py
    │   │   │   ├── frontend_utils.py
    │   │   │   ├── losses.py
    │   │   │   ├── mask.py
    │   │   │   ├── scheduler.py
    │   │   │   └── train_utils.py
    │   │   └── vllm/
    │   │       └── cosyvoice2.py
    │   ├── deepseek_vl/
    │   │   ├── __init__.py
    │   │   ├── models/
    │   │   │   ├── __init__.py
    │   │   │   ├── clip_encoder.py
    │   │   │   ├── image_processing_vlm.py
    │   │   │   ├── modeling_vlm.py
    │   │   │   ├── processing_vlm.py
    │   │   │   ├── projector.py
    │   │   │   ├── sam.py
    │   │   │   └── siglip_vit.py
    │   │   ├── serve/
    │   │   │   ├── __init__.py
    │   │   │   ├── app_deepseek.py
    │   │   │   ├── app_modules/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── gradio_utils.py
    │   │   │   │   ├── overwrites.py
    │   │   │   │   ├── presets.py
    │   │   │   │   └── utils.py
    │   │   │   ├── assets/
    │   │   │   │   ├── Kelpy-Codos.js
    │   │   │   │   ├── custom.css
    │   │   │   │   └── custom.js
    │   │   │   └── inference.py
    │   │   └── utils/
    │   │       ├── __init__.py
    │   │       ├── conversation.py
    │   │       └── io.py
    │   ├── deepseek_vl2/
    │   │   ├── __init__.py
    │   │   ├── models/
    │   │   │   ├── __init__.py
    │   │   │   ├── configuration_deepseek.py
    │   │   │   ├── conversation.py
    │   │   │   ├── modeling_deepseek.py
    │   │   │   ├── modeling_deepseek_vl_v2.py
    │   │   │   ├── processing_deepseek_vl_v2.py
    │   │   │   └── siglip_vit.py
    │   │   ├── serve/
    │   │   │   ├── __init__.py
    │   │   │   ├── app_modules/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── gradio_utils.py
    │   │   │   │   ├── overwrites.py
    │   │   │   │   ├── presets.py
    │   │   │   │   └── utils.py
    │   │   │   ├── assets/
    │   │   │   │   ├── Kelpy-Codos.js
    │   │   │   │   ├── custom.css
    │   │   │   │   ├── custom.js
    │   │   │   │   └── simsun.ttc
    │   │   │   └── inference.py
    │   │   └── utils/
    │   │       ├── __init__.py
    │   │       └── io.py
    │   ├── f5_tts/
    │   │   ├── __init__.py
    │   │   ├── api.py
    │   │   ├── configs/
    │   │   │   ├── E2TTS_Base_train.yaml
    │   │   │   ├── E2TTS_Small_train.yaml
    │   │   │   ├── F5TTS_Base_train.yaml
    │   │   │   └── F5TTS_Small_train.yaml
    │   │   ├── eval/
    │   │   │   ├── README.md
    │   │   │   ├── ecapa_tdnn.py
    │   │   │   ├── eval_infer_batch.py
    │   │   │   ├── eval_infer_batch.sh
    │   │   │   ├── eval_librispeech_test_clean.py
    │   │   │   ├── eval_seedtts_testset.py
    │   │   │   └── utils_eval.py
    │   │   ├── infer/
    │   │   │   ├── README.md
    │   │   │   ├── examples/
    │   │   │   │   ├── basic/
    │   │   │   │   │   └── basic.toml
    │   │   │   │   ├── multi/
    │   │   │   │   │   ├── country.flac
    │   │   │   │   │   ├── main.flac
    │   │   │   │   │   ├── story.toml
    │   │   │   │   │   ├── story.txt
    │   │   │   │   │   └── town.flac
    │   │   │   │   └── vocab.txt
    │   │   │   ├── infer_cli.py
    │   │   │   ├── infer_gradio.py
    │   │   │   ├── speech_edit.py
    │   │   │   └── utils_infer.py
    │   │   ├── model/
    │   │   │   ├── __init__.py
    │   │   │   ├── backbones/
    │   │   │   │   ├── README.md
    │   │   │   │   ├── dit.py
    │   │   │   │   ├── mmdit.py
    │   │   │   │   └── unett.py
    │   │   │   ├── cfm.py
    │   │   │   ├── dataset.py
    │   │   │   ├── modules.py
    │   │   │   ├── trainer.py
    │   │   │   └── utils.py
    │   │   ├── scripts/
    │   │   │   ├── count_max_epoch.py
    │   │   │   └── count_params_gflops.py
    │   │   ├── socket_server.py
    │   │   └── train/
    │   │       ├── README.md
    │   │       ├── datasets/
    │   │       │   ├── prepare_csv_wavs.py
    │   │       │   ├── prepare_emilia.py
    │   │       │   ├── prepare_libritts.py
    │   │       │   ├── prepare_ljspeech.py
    │   │       │   └── prepare_wenetspeech4tts.py
    │   │       ├── finetune_cli.py
    │   │       ├── finetune_gradio.py
    │   │       └── train.py
    │   ├── fish_speech/
    │   │   ├── __init__.py
    │   │   ├── fish_speech/
    │   │   │   ├── __init__.py
    │   │   │   ├── callbacks/
    │   │   │   │   ├── __init__.py
    │   │   │   │   └── grad_norm.py
    │   │   │   ├── configs/
    │   │   │   │   ├── base.yaml
    │   │   │   │   ├── firefly_gan_vq.yaml
    │   │   │   │   ├── lora/
    │   │   │   │   │   └── r_8_alpha_16.yaml
    │   │   │   │   └── text2semantic_finetune.yaml
    │   │   │   ├── conversation.py
    │   │   │   ├── datasets/
    │   │   │   │   ├── concat_repeat.py
    │   │   │   │   ├── protos/
    │   │   │   │   │   ├── text-data.proto
    │   │   │   │   │   ├── text_data_pb2.py
    │   │   │   │   │   └── text_data_stream.py
    │   │   │   │   ├── semantic.py
    │   │   │   │   └── vqgan.py
    │   │   │   ├── i18n/
    │   │   │   │   ├── README.md
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── core.py
    │   │   │   │   ├── locale/
    │   │   │   │   │   ├── en_US.json
    │   │   │   │   │   ├── es_ES.json
    │   │   │   │   │   ├── ja_JP.json
    │   │   │   │   │   ├── ko_KR.json
    │   │   │   │   │   ├── pt_BR.json
    │   │   │   │   │   └── zh_CN.json
    │   │   │   │   └── scan.py
    │   │   │   ├── models/
    │   │   │   │   ├── text2semantic/
    │   │   │   │   │   ├── __init__.py
    │   │   │   │   │   ├── lit_module.py
    │   │   │   │   │   ├── llama.py
    │   │   │   │   │   └── lora.py
    │   │   │   │   └── vqgan/
    │   │   │   │       ├── __init__.py
    │   │   │   │       ├── modules/
    │   │   │   │       │   ├── firefly.py
    │   │   │   │       │   └── fsq.py
    │   │   │   │       └── utils.py
    │   │   │   ├── scheduler.py
    │   │   │   ├── text/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── chn_text_norm/
    │   │   │   │   │   ├── .gitignore
    │   │   │   │   │   ├── README.md
    │   │   │   │   │   ├── __init__.py
    │   │   │   │   │   ├── basic_class.py
    │   │   │   │   │   ├── basic_constant.py
    │   │   │   │   │   ├── basic_util.py
    │   │   │   │   │   ├── cardinal.py
    │   │   │   │   │   ├── date.py
    │   │   │   │   │   ├── digit.py
    │   │   │   │   │   ├── fraction.py
    │   │   │   │   │   ├── money.py
    │   │   │   │   │   ├── percentage.py
    │   │   │   │   │   ├── telephone.py
    │   │   │   │   │   └── text.py
    │   │   │   │   ├── clean.py
    │   │   │   │   └── spliter.py
    │   │   │   ├── tokenizer.py
    │   │   │   ├── train.py
    │   │   │   ├── utils/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── braceexpand.py
    │   │   │   │   ├── context.py
    │   │   │   │   ├── file.py
    │   │   │   │   ├── instantiators.py
    │   │   │   │   ├── logger.py
    │   │   │   │   ├── logging_utils.py
    │   │   │   │   ├── rich_utils.py
    │   │   │   │   ├── spectrogram.py
    │   │   │   │   └── utils.py
    │   │   │   └── webui/
    │   │   │       ├── css/
    │   │   │       │   └── style.css
    │   │   │       ├── html/
    │   │   │       │   └── footer.html
    │   │   │       ├── js/
    │   │   │       │   └── animate.js
    │   │   │       ├── launch_utils.py
    │   │   │       └── manage.py
    │   │   └── tools/
    │   │       ├── api_client.py
    │   │       ├── api_server.py
    │   │       ├── download_models.py
    │   │       ├── e2e_webui.py
    │   │       ├── extract_model.py
    │   │       ├── file.py
    │   │       ├── fish_e2e.py
    │   │       ├── inference_engine/
    │   │       │   ├── __init__.py
    │   │       │   ├── reference_loader.py
    │   │       │   ├── utils.py
    │   │       │   └── vq_manager.py
    │   │       ├── llama/
    │   │       │   ├── build_dataset.py
    │   │       │   ├── eval_in_context.py
    │   │       │   ├── generate.py
    │   │       │   ├── merge_lora.py
    │   │       │   ├── quantize.py
    │   │       │   └── rebuild_tokenizer.py
    │   │       ├── run_webui.py
    │   │       ├── schema.py
    │   │       ├── sensevoice/
    │   │       │   ├── README.md
    │   │       │   ├── __init__.py
    │   │       │   ├── auto_model.py
    │   │       │   ├── fun_asr.py
    │   │       │   └── vad_utils.py
    │   │       ├── server/
    │   │       │   ├── agent/
    │   │       │   │   ├── __init__.py
    │   │       │   │   ├── generate.py
    │   │       │   │   ├── generation_utils.py
    │   │       │   │   └── pre_generation_utils.py
    │   │       │   ├── api_utils.py
    │   │       │   ├── exception_handler.py
    │   │       │   ├── inference.py
    │   │       │   ├── model_manager.py
    │   │       │   ├── model_utils.py
    │   │       │   └── views.py
    │   │       ├── smart_pad.py
    │   │       ├── vqgan/
    │   │       │   ├── create_train_split.py
    │   │       │   ├── extract_vq.py
    │   │       │   └── inference.py
    │   │       ├── webui/
    │   │       │   ├── __init__.py
    │   │       │   ├── inference.py
    │   │       │   └── variables.py
    │   │       └── whisper_asr.py
    │   ├── indextts/
    │   │   ├── BigVGAN/
    │   │   │   ├── ECAPA_TDNN.py
    │   │   │   ├── __init__.py
    │   │   │   ├── activations.py
    │   │   │   ├── alias_free_activation/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── cuda/
    │   │   │   │   │   ├── .gitignore
    │   │   │   │   │   ├── __init__.py
    │   │   │   │   │   ├── activation1d.py
    │   │   │   │   │   ├── anti_alias_activation.cpp
    │   │   │   │   │   ├── anti_alias_activation_cuda.cu
    │   │   │   │   │   ├── compat.h
    │   │   │   │   │   ├── load.py
    │   │   │   │   │   └── type_shim.h
    │   │   │   │   └── torch/
    │   │   │   │       ├── __init__.py
    │   │   │   │       ├── act.py
    │   │   │   │       ├── filter.py
    │   │   │   │       └── resample.py
    │   │   │   ├── alias_free_torch/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── act.py
    │   │   │   │   ├── filter.py
    │   │   │   │   └── resample.py
    │   │   │   ├── bigvgan.py
    │   │   │   ├── models.py
    │   │   │   ├── nnet/
    │   │   │   │   ├── CNN.py
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── linear.py
    │   │   │   │   └── normalization.py
    │   │   │   └── utils.py
    │   │   ├── __init__.py
    │   │   ├── cli.py
    │   │   ├── gpt/
    │   │   │   ├── __init__.py
    │   │   │   ├── conformer/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── attention.py
    │   │   │   │   ├── embedding.py
    │   │   │   │   └── subsampling.py
    │   │   │   ├── conformer_encoder.py
    │   │   │   ├── model.py
    │   │   │   ├── model_v2.py
    │   │   │   ├── perceiver.py
    │   │   │   ├── transformers_beam_search.py
    │   │   │   ├── transformers_generation_utils.py
    │   │   │   ├── transformers_gpt2.py
    │   │   │   └── transformers_modeling_utils.py
    │   │   ├── infer.py
    │   │   ├── infer_v2.py
    │   │   ├── s2mel/
    │   │   │   ├── dac/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── __main__.py
    │   │   │   │   ├── model/
    │   │   │   │   │   ├── __init__.py
    │   │   │   │   │   ├── base.py
    │   │   │   │   │   ├── dac.py
    │   │   │   │   │   ├── discriminator.py
    │   │   │   │   │   └── encodec.py
    │   │   │   │   ├── nn/
    │   │   │   │   │   ├── __init__.py
    │   │   │   │   │   ├── layers.py
    │   │   │   │   │   ├── loss.py
    │   │   │   │   │   └── quantize.py
    │   │   │   │   └── utils/
    │   │   │   │       ├── __init__.py
    │   │   │   │       ├── decode.py
    │   │   │   │       └── encode.py
    │   │   │   ├── hf_utils.py
    │   │   │   ├── modules/
    │   │   │   │   ├── alias_free_torch/
    │   │   │   │   │   ├── __init__.py
    │   │   │   │   │   ├── act.py
    │   │   │   │   │   ├── filter.py
    │   │   │   │   │   └── resample.py
    │   │   │   │   ├── audio.py
    │   │   │   │   ├── bigvgan/
    │   │   │   │   │   ├── activations.py
    │   │   │   │   │   ├── alias_free_activation/
    │   │   │   │   │   │   ├── cuda/
    │   │   │   │   │   │   │   ├── __init__.py
    │   │   │   │   │   │   │   ├── activation1d.py
    │   │   │   │   │   │   │   ├── anti_alias_activation.cpp
    │   │   │   │   │   │   │   ├── anti_alias_activation_cuda.cu
    │   │   │   │   │   │   │   ├── compat.h
    │   │   │   │   │   │   │   ├── load.py
    │   │   │   │   │   │   │   └── type_shim.h
    │   │   │   │   │   │   └── torch/
    │   │   │   │   │   │       ├── __init__.py
    │   │   │   │   │   │       ├── act.py
    │   │   │   │   │   │       ├── filter.py
    │   │   │   │   │   │       └── resample.py
    │   │   │   │   │   ├── bigvgan.py
    │   │   │   │   │   ├── config.json
    │   │   │   │   │   ├── env.py
    │   │   │   │   │   ├── meldataset.py
    │   │   │   │   │   └── utils.py
    │   │   │   │   ├── campplus/
    │   │   │   │   │   ├── DTDNN.py
    │   │   │   │   │   ├── classifier.py
    │   │   │   │   │   └── layers.py
    │   │   │   │   ├── commons.py
    │   │   │   │   ├── diffusion_transformer.py
    │   │   │   │   ├── encodec.py
    │   │   │   │   ├── flow_matching.py
    │   │   │   │   ├── gpt_fast/
    │   │   │   │   │   ├── generate.py
    │   │   │   │   │   ├── model.py
    │   │   │   │   │   └── quantize.py
    │   │   │   │   ├── hifigan/
    │   │   │   │   │   ├── f0_predictor.py
    │   │   │   │   │   └── generator.py
    │   │   │   │   ├── layers.py
    │   │   │   │   ├── length_regulator.py
    │   │   │   │   ├── openvoice/
    │   │   │   │   │   ├── __init__.py
    │   │   │   │   │   ├── api.py
    │   │   │   │   │   ├── attentions.py
    │   │   │   │   │   ├── checkpoints_v2/
    │   │   │   │   │   │   └── converter/
    │   │   │   │   │   │       └── config.json
    │   │   │   │   │   ├── commons.py
    │   │   │   │   │   ├── mel_processing.py
    │   │   │   │   │   ├── models.py
    │   │   │   │   │   ├── modules.py
    │   │   │   │   │   ├── openvoice_app.py
    │   │   │   │   │   ├── se_extractor.py
    │   │   │   │   │   ├── transforms.py
    │   │   │   │   │   └── utils.py
    │   │   │   │   ├── quantize.py
    │   │   │   │   ├── rmvpe.py
    │   │   │   │   ├── vocos/
    │   │   │   │   │   ├── __init__.py
    │   │   │   │   │   ├── heads.py
    │   │   │   │   │   ├── helpers.py
    │   │   │   │   │   ├── loss.py
    │   │   │   │   │   ├── models.py
    │   │   │   │   │   ├── modules.py
    │   │   │   │   │   ├── pretrained.py
    │   │   │   │   │   └── spectral_ops.py
    │   │   │   │   └── wavenet.py
    │   │   │   ├── optimizers.py
    │   │   │   └── wav2vecbert_extract.py
    │   │   ├── utils/
    │   │   │   ├── __init__.py
    │   │   │   ├── arch_util.py
    │   │   │   ├── checkpoint.py
    │   │   │   ├── common.py
    │   │   │   ├── feature_extractors.py
    │   │   │   ├── front.py
    │   │   │   ├── maskgct/
    │   │   │   │   └── models/
    │   │   │   │       ├── codec/
    │   │   │   │       │   ├── __init__.py
    │   │   │   │       │   ├── amphion_codec/
    │   │   │   │       │   │   ├── codec.py
    │   │   │   │       │   │   ├── quantize/
    │   │   │   │       │   │   │   ├── __init__.py
    │   │   │   │       │   │   │   ├── factorized_vector_quantize.py
    │   │   │   │       │   │   │   ├── lookup_free_quantize.py
    │   │   │   │       │   │   │   ├── residual_vq.py
    │   │   │   │       │   │   │   └── vector_quantize.py
    │   │   │   │       │   │   └── vocos.py
    │   │   │   │       │   ├── codec_dataset.py
    │   │   │   │       │   ├── codec_inference.py
    │   │   │   │       │   ├── codec_sampler.py
    │   │   │   │       │   ├── codec_trainer.py
    │   │   │   │       │   ├── facodec/
    │   │   │   │       │   │   ├── __init__.py
    │   │   │   │       │   │   ├── alias_free_torch/
    │   │   │   │       │   │   │   ├── __init__.py
    │   │   │   │       │   │   │   ├── act.py
    │   │   │   │       │   │   │   ├── filter.py
    │   │   │   │       │   │   │   └── resample.py
    │   │   │   │       │   │   ├── facodec_dataset.py
    │   │   │   │       │   │   ├── facodec_inference.py
    │   │   │   │       │   │   ├── facodec_trainer.py
    │   │   │   │       │   │   ├── modules/
    │   │   │   │       │   │   │   ├── JDC/
    │   │   │   │       │   │   │   │   ├── __init__.py
    │   │   │   │       │   │   │   │   ├── bst.t7
    │   │   │   │       │   │   │   │   └── model.py
    │   │   │   │       │   │   │   ├── attentions.py
    │   │   │   │       │   │   │   ├── commons.py
    │   │   │   │       │   │   │   ├── gradient_reversal.py
    │   │   │   │       │   │   │   ├── layers.py
    │   │   │   │       │   │   │   ├── quantize.py
    │   │   │   │       │   │   │   ├── style_encoder.py
    │   │   │   │       │   │   │   └── wavenet.py
    │   │   │   │       │   │   └── optimizer.py
    │   │   │   │       │   ├── kmeans/
    │   │   │   │       │   │   ├── repcodec_model.py
    │   │   │   │       │   │   └── vocos.py
    │   │   │   │       │   ├── melvqgan/
    │   │   │   │       │   │   └── melspec.py
    │   │   │   │       │   ├── ns3_codec/
    │   │   │   │       │   │   ├── README.md
    │   │   │   │       │   │   ├── __init__.py
    │   │   │   │       │   │   ├── alias_free_torch/
    │   │   │   │       │   │   │   ├── __init__.py
    │   │   │   │       │   │   │   ├── act.py
    │   │   │   │       │   │   │   ├── filter.py
    │   │   │   │       │   │   │   └── resample.py
    │   │   │   │       │   │   ├── facodec.py
    │   │   │   │       │   │   ├── gradient_reversal.py
    │   │   │   │       │   │   ├── melspec.py
    │   │   │   │       │   │   ├── quantize/
    │   │   │   │       │   │   │   ├── __init__.py
    │   │   │   │       │   │   │   ├── fvq.py
    │   │   │   │       │   │   │   └── rvq.py
    │   │   │   │       │   │   └── transformer.py
    │   │   │   │       │   ├── speechtokenizer/
    │   │   │   │       │   │   ├── model.py
    │   │   │   │       │   │   └── modules/
    │   │   │   │       │   │       ├── __init__.py
    │   │   │   │       │   │       ├── conv.py
    │   │   │   │       │   │       ├── lstm.py
    │   │   │   │       │   │       ├── norm.py
    │   │   │   │       │   │       ├── quantization/
    │   │   │   │       │   │       │   ├── __init__.py
    │   │   │   │       │   │       │   ├── ac.py
    │   │   │   │       │   │       │   ├── core_vq.py
    │   │   │   │       │   │       │   ├── distrib.py
    │   │   │   │       │   │       │   └── vq.py
    │   │   │   │       │   │       └── seanet.py
    │   │   │   │       │   └── vevo/
    │   │   │   │       │       └── vevo_repcodec.py
    │   │   │   │       └── tts/
    │   │   │   │           └── maskgct/
    │   │   │   │               ├── ckpt/
    │   │   │   │               │   └── wav2vec2bert_stats.pt
    │   │   │   │               ├── llama_nar.py
    │   │   │   │               └── maskgct_s2a.py
    │   │   │   ├── maskgct_utils.py
    │   │   │   ├── text_utils.py
    │   │   │   ├── typical_sampling.py
    │   │   │   ├── utils.py
    │   │   │   ├── webui_utils.py
    │   │   │   └── xtransformers.py
    │   │   └── vqvae/
    │   │       ├── __init__.py
    │   │       └── xtts_dvae.py
    │   ├── internvl/
    │   │   ├── __init__.py
    │   │   └── conversation.py
    │   ├── llava/
    │   │   ├── __init__.py
    │   │   ├── conversation.py
    │   │   ├── mm_utils.py
    │   │   └── model/
    │   │       ├── __init__.py
    │   │       ├── clip_encoder/
    │   │       │   ├── __init__.py
    │   │       │   ├── builder.py
    │   │       │   └── clip_encoder.py
    │   │       ├── constants.py
    │   │       ├── llava_arch.py
    │   │       ├── llava_llama.py
    │   │       └── multimodal_projector/
    │   │           ├── __init__.py
    │   │           └── builder.py
    │   ├── matcha/
    │   │   ├── VERSION
    │   │   ├── __init__.py
    │   │   ├── app.py
    │   │   ├── cli.py
    │   │   ├── data/
    │   │   │   ├── __init__.py
    │   │   │   ├── components/
    │   │   │   │   └── __init__.py
    │   │   │   └── text_mel_datamodule.py
    │   │   ├── hifigan/
    │   │   │   ├── LICENSE
    │   │   │   ├── README.md
    │   │   │   ├── __init__.py
    │   │   │   ├── config.py
    │   │   │   ├── denoiser.py
    │   │   │   ├── env.py
    │   │   │   ├── meldataset.py
    │   │   │   ├── models.py
    │   │   │   └── xutils.py
    │   │   ├── models/
    │   │   │   ├── __init__.py
    │   │   │   ├── baselightningmodule.py
    │   │   │   ├── components/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── decoder.py
    │   │   │   │   ├── flow_matching.py
    │   │   │   │   ├── text_encoder.py
    │   │   │   │   └── transformer.py
    │   │   │   └── matcha_tts.py
    │   │   ├── onnx/
    │   │   │   ├── __init__.py
    │   │   │   ├── export.py
    │   │   │   └── infer.py
    │   │   ├── text/
    │   │   │   ├── __init__.py
    │   │   │   ├── cleaners.py
    │   │   │   ├── numbers.py
    │   │   │   └── symbols.py
    │   │   ├── train.py
    │   │   └── utils/
    │   │       ├── __init__.py
    │   │       ├── audio.py
    │   │       ├── generate_data_statistics.py
    │   │       ├── get_durations_from_trained_model.py
    │   │       ├── instantiators.py
    │   │       ├── logging_utils.py
    │   │       ├── model.py
    │   │       ├── monotonic_align/
    │   │       │   ├── __init__.py
    │   │       │   ├── core.pyx
    │   │       │   └── setup.py
    │   │       ├── pylogger.py
    │   │       ├── rich_utils.py
    │   │       └── utils.py
    │   ├── megatts3/
    │   │   ├── __init__.py
    │   │   └── tts/
    │   │       ├── frontend_function.py
    │   │       ├── gradio_api.py
    │   │       ├── infer_cli.py
    │   │       ├── modules/
    │   │       │   ├── aligner/
    │   │       │   │   └── whisper_small.py
    │   │       │   ├── ar_dur/
    │   │       │   │   ├── ar_dur_predictor.py
    │   │       │   │   └── commons/
    │   │       │   │       ├── layers.py
    │   │       │   │       ├── nar_tts_modules.py
    │   │       │   │       ├── rel_transformer.py
    │   │       │   │       ├── rot_transformer.py
    │   │       │   │       ├── seq_utils.py
    │   │       │   │       └── transformer.py
    │   │       │   ├── llm_dit/
    │   │       │   │   ├── cfm.py
    │   │       │   │   ├── dit.py
    │   │       │   │   ├── time_embedding.py
    │   │       │   │   └── transformer.py
    │   │       │   └── wavvae/
    │   │       │       ├── decoder/
    │   │       │       │   ├── diag_gaussian.py
    │   │       │       │   ├── hifigan_modules.py
    │   │       │       │   ├── seanet_encoder.py
    │   │       │       │   └── wavvae_v3.py
    │   │       │       └── encoder/
    │   │       │           └── common_modules/
    │   │       │               ├── conv.py
    │   │       │               ├── lstm.py
    │   │       │               └── seanet.py
    │   │       └── utils/
    │   │           ├── audio_utils/
    │   │           │   ├── align.py
    │   │           │   ├── io.py
    │   │           │   └── plot.py
    │   │           ├── commons/
    │   │           │   ├── ckpt_utils.py
    │   │           │   └── hparams.py
    │   │           └── text_utils/
    │   │               ├── dict.json
    │   │               ├── ph_tone_convert.py
    │   │               ├── split_text.py
    │   │               └── text_encoder.py
    │   ├── melo/
    │   │   ├── __init__.py
    │   │   ├── api.py
    │   │   ├── app.py
    │   │   ├── attentions.py
    │   │   ├── commons.py
    │   │   ├── configs/
    │   │   │   └── config.json
    │   │   ├── data/
    │   │   │   └── example/
    │   │   │       └── metadata.list
    │   │   ├── data_utils.py
    │   │   ├── download_utils.py
    │   │   ├── infer.py
    │   │   ├── init_downloads.py
    │   │   ├── losses.py
    │   │   ├── main.py
    │   │   ├── mel_processing.py
    │   │   ├── models.py
    │   │   ├── modules.py
    │   │   ├── monotonic_align/
    │   │   │   ├── __init__.py
    │   │   │   └── core.py
    │   │   ├── preprocess_text.py
    │   │   ├── split_utils.py
    │   │   ├── text/
    │   │   │   ├── __init__.py
    │   │   │   ├── chinese.py
    │   │   │   ├── chinese_bert.py
    │   │   │   ├── chinese_mix.py
    │   │   │   ├── cleaner.py
    │   │   │   ├── cleaner_multiling.py
    │   │   │   ├── cmudict.rep
    │   │   │   ├── cmudict_cache.pickle
    │   │   │   ├── english.py
    │   │   │   ├── english_bert.py
    │   │   │   ├── english_utils/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── abbreviations.py
    │   │   │   │   ├── number_norm.py
    │   │   │   │   └── time_norm.py
    │   │   │   ├── es_phonemizer/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── base.py
    │   │   │   │   ├── cleaner.py
    │   │   │   │   ├── es_symbols.json
    │   │   │   │   ├── es_symbols.txt
    │   │   │   │   ├── es_symbols_v2.json
    │   │   │   │   ├── es_to_ipa.py
    │   │   │   │   ├── example_ipa.txt
    │   │   │   │   ├── gruut_wrapper.py
    │   │   │   │   ├── punctuation.py
    │   │   │   │   ├── spanish_symbols.txt
    │   │   │   │   └── test.ipynb
    │   │   │   ├── fr_phonemizer/
    │   │   │   │   ├── __init__.py
    │   │   │   │   ├── base.py
    │   │   │   │   ├── cleaner.py
    │   │   │   │   ├── en_symbols.json
    │   │   │   │   ├── example_ipa.txt
    │   │   │   │   ├── fr_symbols.json
    │   │   │   │   ├── fr_to_ipa.py
    │   │   │   │   ├── french_abbreviations.py
    │   │   │   │   ├── french_symbols.txt
    │   │   │   │   ├── gruut_wrapper.py
    │   │   │   │   └── punctuation.py
    │   │   │   ├── french.py
    │   │   │   ├── french_bert.py
    │   │   │   ├── japanese.py
    │   │   │   ├── japanese_bert.py
    │   │   │   ├── ko_dictionary.py
    │   │   │   ├── korean.py
    │   │   │   ├── opencpop-strict.txt
    │   │   │   ├── spanish.py
    │   │   │   ├── spanish_bert.py
    │   │   │   ├── symbols.py
    │   │   │   └── tone_sandhi.py
    │   │   ├── train.py
    │   │   ├── train.sh
    │   │   ├── transforms.py
    │   │   └── utils.py
    │   ├── mlx/
    │   │   ├── __init__.py
    │   │   └── flux/
    │   │       ├── __init__.py
    │   │       ├── autoencoder.py
    │   │       ├── clip.py
    │   │       ├── datasets.py
    │   │       ├── flux.py
    │   │       ├── layers.py
    │   │       ├── lora.py
    │   │       ├── model.py
    │   │       ├── sampler.py
    │   │       ├── t5.py
    │   │       ├── tokenizers.py
    │   │       ├── trainer.py
    │   │       └── utils.py
    │   └── whisper/
    │       ├── __init__.py
    │       ├── __main__.py
    │       ├── assets/
    │       │   ├── gpt2.tiktoken
    │       │   ├── mel_filters.npz
    │       │   └── multilingual.tiktoken
    │       ├── audio.py
    │       ├── decoding.py
    │       ├── model.py
    │       ├── normalizers/
    │       │   ├── __init__.py
    │       │   ├── basic.py
    │       │   ├── english.json
    │       │   └── english.py
    │       ├── timing.py
    │       ├── tokenizer.py
    │       ├── transcribe.py
    │       ├── triton_ops.py
    │       ├── utils.py
    │       └── version.py
    ├── types.py
    ├── ui/
    │   ├── __init__.py
    │   ├── gradio/
    │   │   ├── __init__.py
    │   │   ├── chat_interface.py
    │   │   ├── media_interface.py
    │   │   └── utils/
    │   │       ├── __init__.py
    │   │       └── latex.py
    │   └── web/
    │       └── ui/
    │           ├── .eslintignore
    │           ├── .eslintrc.yml
    │           ├── .gitignore
    │           ├── .prettierignore
    │           ├── .prettierrc.yml
    │           ├── package.json
    │           ├── public/
    │           │   └── index.html
    │           └── src/
    │               ├── App.js
    │               ├── components/
    │               │   ├── MenuSide.js
    │               │   ├── Title.js
    │               │   ├── alertComponent.js
    │               │   ├── apiContext.js
    │               │   ├── authAlertDialog.js
    │               │   ├── copyComponent.js
    │               │   ├── deleteDialog.js
    │               │   ├── errorMessageSnackBar.js
    │               │   ├── fetchWrapper.js
    │               │   ├── fetcher.js
    │               │   ├── hotkeyFocusTextField.js
    │               │   ├── successMessageSnackBar.js
    │               │   ├── tableTitle.js
    │               │   ├── themeButton.js
    │               │   ├── themeContext.js
    │               │   ├── titleTypography.js
    │               │   ├── translateButton.js
    │               │   ├── utils.js
    │               │   └── versionLabel.js
    │               ├── i18n.js
    │               ├── index.css
    │               ├── index.js
    │               ├── locales/
    │               │   ├── en.json
    │               │   ├── ja.json
    │               │   ├── ko.json
    │               │   └── zh.json
    │               ├── router/
    │               │   └── index.js
    │               ├── scenes/
    │               │   ├── _layout/
    │               │   │   └── index.js
    │               │   ├── cluster_info/
    │               │   │   ├── index.js
    │               │   │   ├── nodeInfo.js
    │               │   │   └── style.js
    │               │   ├── launch_model/
    │               │   │   ├── LaunchModel.js
    │               │   │   ├── components/
    │               │   │   │   ├── cachedListDialog.js
    │               │   │   │   ├── commandBuilder.js
    │               │   │   │   ├── dynamicFieldList.js
    │               │   │   │   ├── editCustomModelDialog.js
    │               │   │   │   ├── launchModelDrawer.js
    │               │   │   │   ├── modelFormConfig.js
    │               │   │   │   ├── pasteDialog.js
    │               │   │   │   ├── progress.js
    │               │   │   │   ├── selectField.js
    │               │   │   │   └── virtualenvListDialog.js
    │               │   │   ├── data/
    │               │   │   │   └── data.js
    │               │   │   ├── index.js
    │               │   │   ├── launchCustom.js
    │               │   │   ├── modelCard.js
    │               │   │   └── styles/
    │               │   │       └── modelCardStyle.css
    │               │   ├── login/
    │               │   │   ├── header.js
    │               │   │   └── login.js
    │               │   ├── register_model/
    │               │   │   ├── components/
    │               │   │   │   ├── addControlnet.js
    │               │   │   │   ├── addModelSpecs.js
    │               │   │   │   ├── addStop.js
    │               │   │   │   └── addVirtualenv.js
    │               │   │   ├── data/
    │               │   │   │   └── languages.js
    │               │   │   ├── index.js
    │               │   │   ├── registerModel.js
    │               │   │   └── styles/
    │               │   │       └── registerModelStyle.css
    │               │   └── running_models/
    │               │       └── index.js
    │               └── theme.js
    └── utils.py

SYMBOL INDEX (8657 symbols across 776 files)

FILE: benchmark/benchmark_embedding.py
  class EmbeddingBenchmarkRunner (line 31) | class EmbeddingBenchmarkRunner(ConcurrentBenchmarkRunner):
    method __init__ (line 32) | def __init__(
    method _run (line 52) | async def _run(self):
    method worker (line 59) | async def worker(self, i: int):
    method send_request (line 73) | async def send_request(self, request, warming_up: bool = False):
  function main (line 101) | def main(args: argparse.Namespace):

FILE: benchmark/benchmark_latency.py
  class LatencyBenchmarkRunner (line 29) | class LatencyBenchmarkRunner(BenchmarkRunner):
    method _run (line 30) | async def _run(self):
  function main (line 42) | def main(args: argparse.Namespace):

FILE: benchmark/benchmark_long.py
  class LongBenchmarkRunner (line 30) | class LongBenchmarkRunner(ConcurrentBenchmarkRunner):
    method _run (line 31) | async def _run(self):
    method worker (line 38) | async def worker(self, i: int):
  function main (line 53) | def main(args: argparse.Namespace):

FILE: benchmark/benchmark_rerank.py
  class RerankBenchmarkRunner (line 31) | class RerankBenchmarkRunner(ConcurrentBenchmarkRunner):
    method __init__ (line 32) | def __init__(
    method _run (line 54) | async def _run(self):
    method worker (line 61) | async def worker(self, i: int):
    method send_request (line 75) | async def send_request(self, request, warming_up: bool = False):
  function main (line 105) | def main(args: argparse.Namespace):

FILE: benchmark/benchmark_runner.py
  function remove_prefix (line 33) | def remove_prefix(text: str, prefix: str) -> str:
  class RequestOutput (line 40) | class RequestOutput:
  class BenchmarkRunner (line 50) | class BenchmarkRunner:
    method __init__ (line 51) | def __init__(
    method run (line 69) | async def run(self):
    method warm_up (line 76) | async def warm_up(self, num_requests: int = 5):
    method _run (line 83) | async def _run(self):
    method send_request (line 86) | async def send_request(self, request: tuple, warming_up: bool = False):
    method print_stats (line 180) | def print_stats(self):
  class ConcurrentBenchmarkRunner (line 378) | class ConcurrentBenchmarkRunner(BenchmarkRunner):
    method __init__ (line 379) | def __init__(
    method worker (line 400) | async def worker(self):

FILE: benchmark/benchmark_serving.py
  class ServingBenchmarkRunner (line 31) | class ServingBenchmarkRunner(ConcurrentBenchmarkRunner):
    method __init__ (line 32) | def __init__(
    method _run (line 55) | async def _run(self):
    method warm_up (line 63) | async def warm_up(self, num_requests: int = 5):
    method worker (line 72) | async def worker(self):
  function main (line 92) | def main(args: argparse.Namespace):

FILE: benchmark/utils.py
  function get_tokenizer (line 33) | def get_tokenizer(
  function sample_requests (line 94) | def sample_requests(
  function generate_sorting_prompts (line 143) | def generate_sorting_prompts(

FILE: doc/source/gen_docs.py
  function mock_engine_libraries (line 23) | def mock_engine_libraries():
  function mock_platform_checks (line 99) | def mock_platform_checks():
  function build_architecture_to_models (line 215) | def build_architecture_to_models(models):
  function get_metrics_from_url (line 223) | def get_metrics_from_url(metrics_url):
  function _can_use_transformers_legacy (line 238) | def _can_use_transformers_legacy(model, model_spec):
  function _extract_primary_model_src (line 244) | def _extract_primary_model_src(model):
  function main (line 251) | def main():

FILE: doc/source/norm_zh.py
  function _zh_len (line 24) | def _zh_len(s):
  function _zh_split (line 34) | def _zh_split(s):
  function _normalize (line 53) | def _normalize(string, prefix="", width=76):
  function main (line 112) | def main():

FILE: examples/AI_podcast.py
  function get_audio_devices (line 103) | def get_audio_devices() -> str:
  function callback (line 122) | def callback(indata, frames, time, status):
  function record_unlimited (line 129) | def record_unlimited() -> numpy.ndarray:
  function format_prompt (line 162) | def format_prompt(model, audio_input) -> str:
  function text_to_audio (line 168) | def text_to_audio(response, voice_id):
  function chat_with_bot (line 185) | def chat_with_bot(
  function check_word_order (line 337) | def check_word_order(string, first_word, second_word) -> int:

FILE: examples/AI_podcast_ZH.py
  function get_audio_devices (line 110) | def get_audio_devices() -> str:
  function callback (line 131) | def callback(indata, frames, time, status):
  function record_unlimited (line 139) | def record_unlimited() -> numpy.ndarray:
  function lanuch_model (line 176) | def lanuch_model(alice_or_bob, model_a, username, model_uid, system_prom...
  function format_prompt (line 213) | def format_prompt(model, audio_input) -> str:
  function text_to_audio (line 219) | def text_to_audio(response, voice_id):
  function construct_Baichuan_prompt (line 239) | def construct_Baichuan_prompt(
  function _base_sanitize_generate_config (line 262) | def _base_sanitize_generate_config() -> PytorchGenerateConfig:
  function baichuan_sanitize_generate_config (line 273) | def baichuan_sanitize_generate_config() -> PytorchGenerateConfig:
  function _convert_completion_to_chat (line 285) | def _convert_completion_to_chat(completion: Completion) -> ChatCompletion:
  function chat_with_bot (line 305) | def chat_with_bot(
  function check_word_order (line 490) | def check_word_order(string, first_word, second_word) -> int:

FILE: examples/AI_translate.py
  function _prompt (line 24) | def _prompt(text):

FILE: examples/LangChain_Streamlit_Doc_Chat.py
  function write_text_file (line 14) | def write_text_file(content, file_path):

FILE: examples/gradio_chatinterface.py
  function flatten (line 80) | def flatten(matrix: List[List[str]]) -> List[str]:
  function to_chat (line 86) | def to_chat(lst: List[str]) -> List[Dict[str, str]]:
  function generate_wrapper (line 98) | def generate_wrapper(message: str, history: List[List[str]]) -> str:

FILE: setup.py
  class ExtraCommandMixin (line 53) | class ExtraCommandMixin:
    method run (line 56) | def run(self):
    method register_pre_command (line 61) | def register_pre_command(cls, cmd):
  class CustomInstall (line 65) | class CustomInstall(ExtraCommandMixin, install):
  class CustomDevelop (line 69) | class CustomDevelop(ExtraCommandMixin, develop):
  class CustomSDist (line 73) | class CustomSDist(ExtraCommandMixin, sdist):
  class BuildWeb (line 76) | class BuildWeb(Command):
    method initialize_options (line 87) | def initialize_options(self):
    method finalize_options (line 90) | def finalize_options(self):
    method run (line 94) | def run(cls):
  function build_long_description (line 128) | def build_long_description():

FILE: versioneer.py
  class VersioneerConfig (line 331) | class VersioneerConfig:
  function get_root (line 335) | def get_root():
  function get_config_from_root (line 378) | def get_config_from_root(root):
  class NotThisMethod (line 416) | class NotThisMethod(Exception):
  function register_vcs_handler (line 425) | def register_vcs_handler(vcs, method):  # decorator
  function run_command (line 436) | def run_command(commands, args, cwd=None, verbose=False, hide_stderr=Fal...
  function git_get_keywords (line 1146) | def git_get_keywords(versionfile_abs):
  function git_versions_from_keywords (line 1174) | def git_versions_from_keywords(keywords, tag_prefix, verbose):
  function git_pieces_from_vcs (line 1245) | def git_pieces_from_vcs(tag_prefix, root, verbose, runner=run_command):
  function do_vcs_install (line 1385) | def do_vcs_install(versionfile_source, ipy):
  function versions_from_parentdir (line 1423) | def versions_from_parentdir(parentdir_prefix, root, verbose):
  function versions_from_file (line 1471) | def versions_from_file(filename):
  function write_to_version_file (line 1490) | def write_to_version_file(filename, versions):
  function plus_or_dot (line 1500) | def plus_or_dot(pieces):
  function render_pep440 (line 1507) | def render_pep440(pieces):
  function render_pep440_branch (line 1531) | def render_pep440_branch(pieces):
  function pep440_split_post (line 1560) | def pep440_split_post(ver):
  function render_pep440_pre (line 1570) | def render_pep440_pre(pieces):
  function render_pep440_post (line 1594) | def render_pep440_post(pieces):
  function render_pep440_post_branch (line 1621) | def render_pep440_post_branch(pieces):
  function render_pep440_old (line 1650) | def render_pep440_old(pieces):
  function render_git_describe (line 1672) | def render_git_describe(pieces):
  function render_git_describe_long (line 1692) | def render_git_describe_long(pieces):
  function render (line 1712) | def render(pieces, style):
  class VersioneerBadRootError (line 1754) | class VersioneerBadRootError(Exception):
  function get_versions (line 1758) | def get_versions(verbose=False):
  function get_version (line 1839) | def get_version():
  function get_cmdclass (line 1844) | def get_cmdclass(cmdclass=None):
  function do_setup (line 2160) | def do_setup():
  function scan_setup_py (line 2217) | def scan_setup_py():
  function setup_command (line 2254) | def setup_command():

FILE: xinference/__init__.py
  function _install (line 34) | def _install():

FILE: xinference/_compat.py
  class JSONSchema (line 78) | class JSONSchema(BaseModel):
  class ResponseFormatJSONSchema (line 85) | class ResponseFormatJSONSchema(BaseModel):
  class CreateChatCompletionOpenAI (line 95) | class CreateChatCompletionOpenAI(BaseModel):

FILE: xinference/_version.py
  function get_keywords (line 22) | def get_keywords():
  class VersioneerConfig (line 35) | class VersioneerConfig:
  function get_config (line 39) | def get_config():
  class NotThisMethod (line 53) | class NotThisMethod(Exception):
  function register_vcs_handler (line 61) | def register_vcs_handler(vcs, method):  # decorator
  function run_command (line 74) | def run_command(commands, args, cwd=None, verbose=False, hide_stderr=Fal...
  function versions_from_parentdir (line 120) | def versions_from_parentdir(parentdir_prefix, root, verbose):
  function git_get_keywords (line 151) | def git_get_keywords(versionfile_abs):
  function git_versions_from_keywords (line 179) | def git_versions_from_keywords(keywords, tag_prefix, verbose):
  function git_pieces_from_vcs (line 250) | def git_pieces_from_vcs(tag_prefix, root, verbose, runner=run_command):
  function plus_or_dot (line 390) | def plus_or_dot(pieces):
  function render_pep440 (line 397) | def render_pep440(pieces):
  function render_pep440_branch (line 421) | def render_pep440_branch(pieces):
  function pep440_split_post (line 450) | def pep440_split_post(ver):
  function render_pep440_pre (line 460) | def render_pep440_pre(pieces):
  function render_pep440_post (line 484) | def render_pep440_post(pieces):
  function render_pep440_post_branch (line 511) | def render_pep440_post_branch(pieces):
  function render_pep440_old (line 540) | def render_pep440_old(pieces):
  function render_git_describe (line 562) | def render_git_describe(pieces):
  function render_git_describe_long (line 582) | def render_git_describe_long(pieces):
  function render (line 602) | def render(pieces, style):
  function get_versions (line 644) | def get_versions():

FILE: xinference/api/dependencies.py
  function get_api (line 32) | def get_api(request: Request) -> "RESTfulAPI":

FILE: xinference/api/oauth2/auth_service.py
  class TokenData (line 30) | class TokenData(BaseModel):
  class AuthService (line 35) | class AuthService:
    method __init__ (line 36) | def __init__(self, auth_config_file: Optional[str]):
    method config (line 41) | def config(self):
    method is_legal_api_key (line 45) | def is_legal_api_key(key: str) -> bool:
    method init_auth_config (line 49) | def init_auth_config(self):
    method __call__ (line 70) | def __call__(
    method get_user (line 119) | def get_user(self, username: str) -> Optional[User]:
    method get_user_and_scopes_with_api_key (line 125) | def get_user_and_scopes_with_api_key(
    method authenticate_user (line 134) | def authenticate_user(self, username: str, password: str):
    method generate_token_for_user (line 142) | def generate_token_for_user(self, username: str, password: str):

FILE: xinference/api/oauth2/types.py
  class LoginUserForm (line 19) | class LoginUserForm(BaseModel):
  class User (line 24) | class User(LoginUserForm):
  class AuthConfig (line 29) | class AuthConfig(BaseModel):
  class AuthStartupConfig (line 35) | class AuthStartupConfig(BaseModel):

FILE: xinference/api/oauth2/utils.py
  function create_access_token (line 21) | def create_access_token(
  function verify_password (line 37) | def verify_password(plain_password, hashed_password):
  function get_password_hash (line 52) | def get_password_hash(password):

FILE: xinference/api/responses.py
  class JSONResponse (line 24) | class JSONResponse(StarletteJSONResponse):
    method render (line 27) | def render(self, content: Any) -> bytes:

FILE: xinference/api/restful_api.py
  class RESTfulAPI (line 91) | class RESTfulAPI(CancelMixin):
    method __init__ (line 95) | def __init__(
    method _init_allowed_ip_list (line 114) | def _init_allowed_ip_list(self):
    method _is_ip_allowed (line 138) | def _is_ip_allowed(self, ip: str) -> bool:
    method is_authenticated (line 151) | def is_authenticated(self):
    method handle_request_limit_error (line 155) | def handle_request_limit_error(e: Exception):
    method _set_trace_model (line 160) | def _set_trace_model(model_uid: Optional[str]) -> None:
    method _set_trace_model_type (line 176) | def _set_trace_model_type(model_type: Optional[str]) -> None:
    method _get_supervisor_ref (line 191) | async def _get_supervisor_ref(self) -> xo.ActorRefType[SupervisorActor]:
    method _get_event_collector_ref (line 198) | async def _get_event_collector_ref(self) -> xo.ActorRefType[EventColle...
    method _report_error_event (line 205) | async def _report_error_event(self, model_uid: Optional[str], content:...
    method serve (line 223) | def serve(self, logging_conf: Optional[dict] = None):
    method _get_builtin_prompts (line 357) | async def _get_builtin_prompts(self) -> JSONResponse:
    method _get_builtin_families (line 368) | async def _get_builtin_families(self) -> JSONResponse:
    method build_llm_registration_from_config (line 379) | async def build_llm_registration_from_config(
    method list_models (line 400) | async def list_models(self) -> JSONResponse:
    method anthropic_list_models (line 422) | async def anthropic_list_models(self) -> JSONResponse:
    method anthropic_get_model (line 449) | async def anthropic_get_model(self, model_id: str) -> JSONResponse:
    method describe_model (line 474) | async def describe_model(self, model_uid: str) -> JSONResponse:
    method launch_model (line 486) | async def launch_model(
    method get_instance_info (line 598) | async def get_instance_info(
    method get_model_replicas (line 612) | async def get_model_replicas(self, model_uid: str) -> JSONResponse:
    method get_launch_model_progress (line 626) | async def get_launch_model_progress(self, model_uid: str) -> JSONRespo...
    method cancel_launch_model (line 636) | async def cancel_launch_model(self, model_uid: str) -> JSONResponse:
    method launch_model_by_version (line 646) | async def launch_model_by_version(
    method get_model_versions (line 674) | async def get_model_versions(
    method build_gradio_interface (line 686) | async def build_gradio_interface(
    method build_gradio_media_interface (line 729) | async def build_gradio_media_interface(
    method terminate_model (line 769) | async def terminate_model(self, model_uid: str) -> JSONResponse:
    method _get_model_last_error (line 790) | async def _get_model_last_error(self, replica_model_uid: bytes, e: Exc...
    method create_completion (line 803) | async def create_completion(self, request: Request) -> Response:
    method create_message (line 878) | async def create_message(self, request: Request) -> Response:
    method create_embedding (line 1019) | async def create_embedding(self, request: Request) -> Response:
    method convert_ids_to_tokens (line 1048) | async def convert_ids_to_tokens(self, request: Request) -> Response:
    method rerank (line 1075) | async def rerank(self, request: Request) -> Response:
    method create_transcriptions (line 1108) | async def create_transcriptions(
    method create_translations (line 1152) | async def create_translations(
    method create_speech (line 1196) | async def create_speech(
    method create_images (line 1258) | async def create_images(self, request: Request) -> Response:
    method sdapi_options (line 1292) | async def sdapi_options(self, request: Request) -> Response:
    method sdapi_sd_models (line 1312) | async def sdapi_sd_models(self, request: Request) -> Response:
    method sdapi_samplers (line 1325) | async def sdapi_samplers(self, request: Request) -> Response:
    method sdapi_txt2img (line 1338) | async def sdapi_txt2img(self, request: Request) -> Response:
    method sdapi_img2img (line 1362) | async def sdapi_img2img(self, request: Request) -> Response:
    method create_variations (line 1386) | async def create_variations(
    method create_inpainting (line 1443) | async def create_inpainting(
    method create_ocr (line 1498) | async def create_ocr(
    method create_image_edits (line 1537) | async def create_image_edits(
    method _stream_image_edit (line 1701) | async def _stream_image_edit(
    method create_flexible_infer (line 1778) | async def create_flexible_infer(self, request: Request) -> Response:
    method create_videos (line 1803) | async def create_videos(self, request: Request) -> Response:
    method create_videos_from_images (line 1835) | async def create_videos_from_images(
    method create_videos_from_first_last_frame (line 1879) | async def create_videos_from_first_last_frame(
    method create_chat_completion (line 1925) | async def create_chat_completion(self, request: Request) -> Response:
    method query_engines_by_model_name (line 2077) | async def query_engines_by_model_name(
    method register_model (line 2100) | async def register_model(self, model_type: str, request: Request) -> J...
    method unregister_model (line 2118) | async def unregister_model(self, model_type: str, model_name: str) -> ...
    method update_model_type (line 2131) | async def update_model_type(self, request: Request) -> JSONResponse:
    method list_model_registrations (line 2167) | async def list_model_registrations(
    method get_model_registrations (line 2189) | async def get_model_registrations(
    method get_model_events (line 2204) | async def get_model_events(self, model_uid: str) -> JSONResponse:
    method abort_request (line 2216) | async def abort_request(
    method list_vllm_supported_model_families (line 2240) | async def list_vllm_supported_model_families(self) -> JSONResponse:
    method extract_guided_params (line 2257) | def extract_guided_params(raw_body: dict) -> dict:
    method _convert_openai_to_anthropic (line 2303) | def _convert_openai_to_anthropic(self, openai_response: dict, model: s...
  function run (line 2394) | def run(
  function run_in_subprocess (line 2429) | def run_in_subprocess(

FILE: xinference/api/routers/__init__.py
  function register_all_routes (line 17) | def register_all_routes(api: RESTfulAPI) -> None:

FILE: xinference/api/routers/admin.py
  function get_status (line 26) | async def get_status(api: "RESTfulAPI" = Depends(get_api)) -> JSONResponse:
  function get_address (line 36) | async def get_address(api: "RESTfulAPI" = Depends(get_api)) -> JSONRespo...
  function login_for_access_token (line 40) | async def login_for_access_token(
  function is_cluster_authenticated (line 50) | async def is_cluster_authenticated(
  function get_cluster_device_info (line 56) | async def get_cluster_device_info(
  function get_cluster_version (line 69) | async def get_cluster_version() -> JSONResponse:
  function get_devices_count (line 78) | async def get_devices_count(
  function get_workers_info (line 90) | async def get_workers_info(
  function get_supervisor_info (line 105) | async def get_supervisor_info(
  function abort_cluster (line 120) | async def abort_cluster(
  function list_cached_models (line 136) | async def list_cached_models(
  function list_model_files (line 154) | async def list_model_files(
  function confirm_and_remove_model (line 176) | async def confirm_and_remove_model(
  function list_virtual_envs (line 195) | async def list_virtual_envs(
  function remove_virtual_env (line 216) | async def remove_virtual_env(
  function get_progress (line 242) | async def get_progress(
  function register_routes (line 260) | def register_routes(api: "RESTfulAPI") -> None:

FILE: xinference/api/routers/audio.py
  function register_routes (line 13) | def register_routes(api: "RESTfulAPI") -> None:

FILE: xinference/api/routers/embeddings.py
  function register_routes (line 13) | def register_routes(api: "RESTfulAPI") -> None:

FILE: xinference/api/routers/images.py
  function register_routes (line 15) | def register_routes(api: "RESTfulAPI") -> None:

FILE: xinference/api/routers/llm.py
  function register_routes (line 15) | def register_routes(api: "RESTfulAPI") -> None:

FILE: xinference/api/routers/models.py
  function register_routes (line 13) | def register_routes(api: "RESTfulAPI") -> None:

FILE: xinference/api/routers/rerank.py
  function register_routes (line 13) | def register_routes(api: "RESTfulAPI") -> None:

FILE: xinference/api/routers/videos.py
  function register_routes (line 15) | def register_routes(api: "RESTfulAPI") -> None:

FILE: xinference/api/schemas/requests.py
  class CreateCompletionRequest (line 15) | class CreateCompletionRequest(CreateCompletion):
    class Config (line 16) | class Config:
  class CreateEmbeddingRequest (line 25) | class CreateEmbeddingRequest(BaseModel):
    class Config (line 32) | class Config:
  class RerankRequest (line 40) | class RerankRequest(BaseModel):
  class TextToImageRequest (line 51) | class TextToImageRequest(BaseModel):
  class SDAPIOptionsRequest (line 61) | class SDAPIOptionsRequest(BaseModel):
  class SDAPITxt2imgRequst (line 65) | class SDAPITxt2imgRequst(BaseModel):
  class SDAPIImg2imgRequst (line 81) | class SDAPIImg2imgRequst(BaseModel):
  class TextToVideoRequest (line 98) | class TextToVideoRequest(BaseModel):
  class SpeechRequest (line 106) | class SpeechRequest(BaseModel):
  class RegisterModelRequest (line 116) | class RegisterModelRequest(BaseModel):
  class AutoConfigLLMRequest (line 122) | class AutoConfigLLMRequest(BaseModel):
  class UpdateModelRequest (line 127) | class UpdateModelRequest(BaseModel):
  class BuildGradioInterfaceRequest (line 131) | class BuildGradioInterfaceRequest(BaseModel):
  class BuildGradioMediaInterfaceRequest (line 143) | class BuildGradioMediaInterfaceRequest(BaseModel):

FILE: xinference/api/tests/test_admin.py
  function _json_body (line 26) | def _json_body(response):
  function mock_supervisor (line 31) | def mock_supervisor():
  function mock_api (line 53) | def mock_api(mock_supervisor):
  function test_get_status_returns_200_and_data (line 66) | async def test_get_status_returns_200_and_data(mock_api, mock_supervisor):
  function test_get_status_raises_500_on_supervisor_error (line 74) | async def test_get_status_raises_500_on_supervisor_error(mock_api, mock_...
  function test_get_address_returns_supervisor_address (line 83) | async def test_get_address_returns_supervisor_address(mock_api):
  function test_get_cluster_version_returns_version (line 91) | async def test_get_cluster_version_returns_version():
  function test_is_cluster_authenticated_returns_auth_flag (line 99) | async def test_is_cluster_authenticated_returns_auth_flag(mock_api):
  function test_login_for_access_token_returns_token (line 111) | async def test_login_for_access_token_returns_token(mock_api):
  function test_get_cluster_device_info_returns_data (line 127) | async def test_get_cluster_device_info_returns_data(mock_api, mock_super...
  function test_get_devices_count_returns_data (line 140) | async def test_get_devices_count_returns_data(mock_api, mock_supervisor):
  function test_get_workers_info_returns_data (line 148) | async def test_get_workers_info_returns_data(mock_api, mock_supervisor):
  function test_get_supervisor_info_returns_data (line 158) | async def test_get_supervisor_info_returns_data(mock_api, mock_supervisor):
  function test_abort_cluster_returns_result_and_does_not_kill_in_test (line 166) | async def test_abort_cluster_returns_result_and_does_not_kill_in_test(
  function test_list_cached_models_returns_list (line 177) | async def test_list_cached_models_returns_list(mock_api, mock_supervisor):
  function test_list_model_files_returns_paths (line 188) | async def test_list_model_files_returns_paths(mock_api, mock_supervisor):
  function test_confirm_and_remove_model_returns_result (line 203) | async def test_confirm_and_remove_model_returns_result(mock_api, mock_su...
  function test_list_virtual_envs_returns_list (line 213) | async def test_list_virtual_envs_returns_list(mock_api, mock_supervisor):
  function test_remove_virtual_env_requires_model_name (line 226) | async def test_remove_virtual_env_requires_model_name(mock_api):
  function test_remove_virtual_env_returns_result (line 240) | async def test_remove_virtual_env_returns_result(mock_api, mock_supervis...
  function test_get_progress_returns_progress (line 254) | async def test_get_progress_returns_progress(mock_api, mock_supervisor):
  function test_get_progress_raises_400_on_key_error (line 263) | async def test_get_progress_raises_400_on_key_error(mock_api, mock_super...

FILE: xinference/api/tests/test_utils.py
  class DummyModel (line 21) | class DummyModel:
    method __init__ (line 24) | def __init__(self, uid: str):
  class DummySupervisor (line 28) | class DummySupervisor:
    method __init__ (line 31) | def __init__(self, models=None):
    method get_model (line 34) | async def get_model(self, model_uid: str):
  class TestRequireModel (line 40) | class TestRequireModel:
    method test_successful_get (line 44) | async def test_successful_get(self):
    method test_model_not_found_raises_400 (line 56) | async def test_model_not_found_raises_400(self):
    method test_unexpected_error_raises_500 (line 69) | async def test_unexpected_error_raises_500(self):
    method test_reports_error_event (line 81) | async def test_reports_error_event(self):
    method test_no_report_error_event_when_none (line 101) | async def test_no_report_error_event_when_none(self):

FILE: xinference/api/utils.py
  function require_model (line 27) | async def require_model(

FILE: xinference/client/common.py
  function convert_float_to_int_or_str (line 19) | def convert_float_to_int_or_str(model_size: float) -> Union[int, str]:
  function streaming_response_iterator (line 30) | def streaming_response_iterator(
  function async_streaming_response_iterator (line 66) | async def async_streaming_response_iterator(

FILE: xinference/client/restful/async_restful_client.py
  function _filter_params (line 35) | def _filter_params(params: Dict[Any, Any]) -> Dict[Any, Any]:
  function _get_error_string (line 43) | async def _get_error_string(response: aiohttp.ClientResponse) -> str:
  function _release_response (line 61) | async def _release_response(response: aiohttp.ClientResponse):
  class AsyncRESTfulModelHandle (line 67) | class AsyncRESTfulModelHandle:
    method __init__ (line 73) | def __init__(self, model_uid: str, base_url: str, auth_headers: Dict):
    method close (line 82) | async def close(self):
    method __del__ (line 88) | def __del__(self):
  class AsyncRESTfulEmbeddingModelHandle (line 94) | class AsyncRESTfulEmbeddingModelHandle(AsyncRESTfulModelHandle):
    method create_embedding (line 95) | async def create_embedding(
    method convert_ids_to_tokens (line 136) | async def convert_ids_to_tokens(
  class AsyncRESTfulRerankModelHandle (line 177) | class AsyncRESTfulRerankModelHandle(AsyncRESTfulModelHandle):
    method rerank (line 178) | async def rerank(
  class AsyncRESTfulImageModelHandle (line 239) | class AsyncRESTfulImageModelHandle(AsyncRESTfulModelHandle):
    method text_to_image (line 240) | async def text_to_image(
    method image_to_image (line 287) | async def image_to_image(
    method image_edit (line 370) | async def image_edit(
    method inpainting (line 521) | async def inpainting(
    method ocr (line 601) | async def ocr(self, image: Union[str, bytes], **kwargs):
  class AsyncRESTfulVideoModelHandle (line 623) | class AsyncRESTfulVideoModelHandle(AsyncRESTfulModelHandle):
    method text_to_video (line 624) | async def text_to_video(
    method image_to_video (line 663) | async def image_to_video(
    method flf_to_video (line 712) | async def flf_to_video(
  class AsyncRESTfulGenerateModelHandle (line 780) | class AsyncRESTfulGenerateModelHandle(AsyncRESTfulModelHandle):
    method generate (line 781) | async def generate(
  class AsyncRESTfulChatModelHandle (line 835) | class AsyncRESTfulChatModelHandle(AsyncRESTfulGenerateModelHandle):
    method chat (line 836) | async def chat(
  class AsyncRESTfulAudioModelHandle (line 917) | class AsyncRESTfulAudioModelHandle(AsyncRESTfulModelHandle):
    method transcriptions (line 918) | async def transcriptions(
    method translations (line 987) | async def translations(
    method speech (line 1054) | async def speech(
  class AsyncRESTfulFlexibleModelHandle (line 1142) | class AsyncRESTfulFlexibleModelHandle(AsyncRESTfulModelHandle):
    method infer (line 1143) | async def infer(
  class AsyncClient (line 1177) | class AsyncClient:
    method __init__ (line 1178) | def __init__(self, base_url, api_key: Optional[str] = None):
    method close (line 1190) | async def close(self):
    method __del__ (line 1196) | def __del__(self):
    method _set_token (line 1201) | def _set_token(self, token: Optional[str]):
    method _get_token (line 1206) | def _get_token(self) -> Optional[str]:
    method _check_cluster_authenticated (line 1213) | def _check_cluster_authenticated(self):
    method vllm_models (line 1231) | async def vllm_models(self) -> Dict[str, Any]:
    method login (line 1248) | async def login(self, username: str, password: str):
    method list_models (line 1269) | async def list_models(self) -> Dict[str, Dict[str, Any]]:
    method launch_model (line 1293) | async def launch_model(
    method terminate_model (line 1410) | async def terminate_model(self, model_uid: str):
    method get_launch_model_progress (line 1435) | async def get_launch_model_progress(self, model_uid: str) -> dict:
    method cancel_launch_model (line 1465) | async def cancel_launch_model(self, model_uid: str):
    method get_instance_info (line 1488) | async def get_instance_info(self, model_name: str, model_uid: str):
    method _get_supervisor_internal_address (line 1501) | async def _get_supervisor_internal_address(self):
    method get_model (line 1510) | async def get_model(self, model_uid: str) -> AsyncRESTfulModelHandle:
    method describe_model (line 1580) | async def describe_model(self, model_uid: str):
    method register_model (line 1632) | async def register_model(
    method unregister_model (line 1672) | async def unregister_model(self, model_type: str, model_name: str):
    method list_model_registrations (line 1699) | async def list_model_registrations(self, model_type: str) -> List[Dict...
    method list_cached_models (line 1730) | async def list_cached_models(
    method list_deletable_models (line 1769) | async def list_deletable_models(
    method confirm_and_remove_model (line 1801) | async def confirm_and_remove_model(
    method get_model_registration (line 1833) | async def get_model_registration(
    method query_engine_by_model_name (line 1862) | async def query_engine_by_model_name(
    method abort_request (line 1899) | async def abort_request(
    method get_workers_info (line 1934) | async def get_workers_info(self):
    method get_supervisor_info (line 1945) | async def get_supervisor_info(self):
    method get_progress (line 1956) | async def get_progress(self, request_id: str):
    method abort_cluster (line 1967) | async def abort_cluster(self):

FILE: xinference/client/restful/restful_client.py
  function _get_error_string (line 34) | def _get_error_string(response: requests.Response) -> str:
  class RESTfulModelHandle (line 47) | class RESTfulModelHandle:
    method __init__ (line 53) | def __init__(self, model_uid: str, base_url: str, auth_headers: Dict):
    method close (line 59) | def close(self):
    method __del__ (line 67) | def __del__(self):
  class RESTfulEmbeddingModelHandle (line 72) | class RESTfulEmbeddingModelHandle(RESTfulModelHandle):
    method create_embedding (line 73) | def create_embedding(self, input: Union[str, List[str]], **kwargs) -> ...
    method convert_ids_to_tokens (line 109) | def convert_ids_to_tokens(
  class RESTfulRerankModelHandle (line 147) | class RESTfulRerankModelHandle(RESTfulModelHandle):
    method rerank (line 148) | def rerank(
  class RESTfulImageModelHandle (line 206) | class RESTfulImageModelHandle(RESTfulModelHandle):
    method text_to_image (line 207) | def text_to_image(
    method image_to_image (line 251) | def image_to_image(
    method image_edit (line 332) | def image_edit(
    method inpainting (line 463) | def inpainting(
    method ocr (line 538) | def ocr(self, image: Union[str, bytes], **kwargs):
  class RESTfulVideoModelHandle (line 558) | class RESTfulVideoModelHandle(RESTfulModelHandle):
    method text_to_video (line 559) | def text_to_video(
    method image_to_video (line 595) | def image_to_video(
    method flf_to_video (line 642) | def flf_to_video(
  class RESTfulGenerateModelHandle (line 696) | class RESTfulGenerateModelHandle(RESTfulModelHandle):
    method generate (line 697) | def generate(
  class RESTfulChatModelHandle (line 751) | class RESTfulChatModelHandle(RESTfulGenerateModelHandle):
    method chat (line 752) | def chat(
  class RESTfulAudioModelHandle (line 832) | class RESTfulAudioModelHandle(RESTfulModelHandle):
    method transcriptions (line 833) | def transcriptions(
    method translations (line 898) | def translations(
    method speech (line 961) | def speech(
  class RESTfulFlexibleModelHandle (line 1042) | class RESTfulFlexibleModelHandle(RESTfulModelHandle):
    method infer (line 1043) | def infer(
  class Client (line 1079) | class Client:
    method __init__ (line 1080) | def __init__(self, base_url, api_key: Optional[str] = None):
    method close (line 1089) | def close(self):
    method __del__ (line 1097) | def __del__(self):
    method _set_token (line 1101) | def _set_token(self, token: Optional[str]):
    method _get_token (line 1106) | def _get_token(self) -> Optional[str]:
    method _check_cluster_authenticated (line 1113) | def _check_cluster_authenticated(self):
    method vllm_models (line 1127) | def vllm_models(self) -> Dict[str, Any]:
    method login (line 1140) | def login(self, username: str, password: str):
    method list_models (line 1156) | def list_models(self) -> Dict[str, Dict[str, Any]]:
    method launch_model (line 1179) | def launch_model(
    method terminate_model (line 1308) | def terminate_model(self, model_uid: str):
    method get_launch_model_progress (line 1332) | def get_launch_model_progress(self, model_uid: str) -> dict:
    method cancel_launch_model (line 1360) | def cancel_launch_model(self, model_uid: str):
    method get_instance_info (line 1382) | def get_instance_info(self, model_name: str, model_uid: str):
    method _get_supervisor_internal_address (line 1394) | def _get_supervisor_internal_address(self):
    method get_model (line 1402) | def get_model(self, model_uid: str) -> RESTfulModelHandle:
    method describe_model (line 1472) | def describe_model(self, model_uid: str):
    method register_model (line 1522) | def register_model(
    method unregister_model (line 1559) | def unregister_model(self, model_type: str, model_name: str):
    method list_model_registrations (line 1585) | def list_model_registrations(
    method list_cached_models (line 1619) | def list_cached_models(
    method list_deletable_models (line 1657) | def list_deletable_models(
    method confirm_and_remove_model (line 1687) | def confirm_and_remove_model(
    method get_model_registration (line 1717) | def get_model_registration(
    method query_engine_by_model_name (line 1745) | def query_engine_by_model_name(
    method abort_request (line 1781) | def abort_request(self, model_uid: str, request_id: str, block_duratio...
    method get_workers_info (line 1813) | def get_workers_info(self):
    method get_supervisor_info (line 1823) | def get_supervisor_info(self):
    method get_progress (line 1833) | def get_progress(self, request_id: str):
    method abort_cluster (line 1843) | def abort_cluster(self):

FILE: xinference/client/tests/test_async_client.py
  class _DummyAsyncResponse (line 30) | class _DummyAsyncResponse:
    method json (line 33) | async def json(self):
    method release (line 36) | def release(self):
    method wait_for_close (line 39) | async def wait_for_close(self):
  class _DummyAsyncSession (line 43) | class _DummyAsyncSession:
    method __init__ (line 44) | def __init__(self):
    method post (line 47) | async def post(self, url, json=None, headers=None):
    method close (line 51) | async def close(self):
  function test_async_RESTful_client (line 57) | async def test_async_RESTful_client(setup):
  function test_async_query_engines_by_name (line 184) | async def test_async_query_engines_by_name(setup):
  function test_async_list_cached_models (line 199) | async def test_async_list_cached_models(setup):
  function test_async_RESTful_client_for_embedding (line 208) | async def test_async_RESTful_client_for_embedding(setup):
  function test_async_RESTful_client_custom_model (line 229) | async def test_async_RESTful_client_custom_model(setup):
  function test_async_client_from_modelscope (line 370) | async def test_async_client_from_modelscope(setup):
  function test_async_client_custom_embedding_model (line 395) | async def test_async_client_custom_embedding_model(setup):
  function test_async_rerank (line 448) | async def test_async_rerank(setup):
  function set_auto_recover_limit (line 509) | def set_auto_recover_limit():
  function set_test_oom_error (line 516) | def set_test_oom_error():
  function setup_cluster (line 523) | def setup_cluster():
  function test_auto_recover (line 558) | async def test_auto_recover(set_auto_recover_limit, setup_cluster):
  function test_model_error (line 598) | async def test_model_error(set_test_oom_error, setup_cluster):
  function test_async_restful_chat_enable_thinking_injected (line 620) | async def test_async_restful_chat_enable_thinking_injected():

FILE: xinference/client/tests/test_async_client_with_auth.py
  function test_async_client_auth (line 22) | async def test_async_client_auth(setup_with_auth):

FILE: xinference/client/tests/test_client.py
  class _DummyResponse (line 33) | class _DummyResponse:
    method json (line 36) | def json(self):
  class _DummySession (line 40) | class _DummySession:
    method __init__ (line 41) | def __init__(self):
    method post (line 44) | def post(self, url, json=None, stream=None, headers=None):
    method close (line 48) | def close(self):
  function test_RESTful_client (line 53) | def test_RESTful_client(setup):
  function test_query_engines_by_name (line 192) | def test_query_engines_by_name(setup):
  function test_list_cached_models (line 202) | def test_list_cached_models(setup):
  function test_RESTful_client_for_embedding (line 209) | def test_RESTful_client_for_embedding(setup):
  function test_RESTful_client_custom_model (line 227) | def test_RESTful_client_custom_model(setup):
  function test_client_from_modelscope (line 360) | def test_client_from_modelscope(setup):
  function test_client_error (line 380) | def test_client_error():
  function test_client_custom_embedding_model (line 392) | def test_client_custom_embedding_model(setup):
  function set_auto_recover_limit (line 442) | def set_auto_recover_limit():
  function set_test_oom_error (line 449) | def set_test_oom_error():
  function setup_cluster (line 456) | def setup_cluster():
  function test_auto_recover (line 490) | def test_auto_recover(set_auto_recover_limit, setup_cluster):
  function test_model_error (line 528) | def test_model_error(set_test_oom_error, setup_cluster):
  function test_restful_chat_enable_thinking_injected (line 548) | def test_restful_chat_enable_thinking_injected():

FILE: xinference/client/tests/test_client_with_auth.py
  function test_client_auth (line 21) | def test_client_auth(setup_with_auth):

FILE: xinference/conftest.py
  function api_health_check (line 104) | def api_health_check(endpoint: str, max_attempts: int, sleep_interval: i...
  function _start_test_cluster (line 128) | async def _start_test_cluster(
  function run_test_cluster (line 154) | def run_test_cluster(address: str, logging_conf: Optional[Dict] = None):
  function run_test_cluster_in_subprocess (line 167) | def run_test_cluster_in_subprocess(
  function setup (line 179) | def setup():
  function setup_with_file_logging (line 211) | def setup_with_file_logging():
  function setup_with_auth (line 243) | def setup_with_auth():

FILE: xinference/constants.py
  function get_xinference_home (line 61) | def get_xinference_home() -> str:

FILE: xinference/core/cache_tracker.py
  class CacheTrackerActor (line 22) | class CacheTrackerActor(xo.Actor):
    method __init__ (line 23) | def __init__(self):
    method default_uid (line 28) | def default_uid(cls) -> str:
    method _map_address_to_file_location (line 32) | def _map_address_to_file_location(
    method _update_file_location (line 44) | def _update_file_location(data: Dict, origin_version_info: Dict):
    method record_model_version (line 51) | def record_model_version(self, version_info: Dict[str, List[Dict]], ad...
    method update_cache_status (line 72) | def update_cache_status(
    method unregister_model_version (line 91) | def unregister_model_version(self, model_name: str):
    method get_model_versions (line 94) | def get_model_versions(self, model_name: str) -> List[Dict]:
    method get_model_version_count (line 101) | def get_model_version_count(self, model_name: str) -> int:
    method list_cached_models (line 104) | def list_cached_models(
    method list_deletable_models (line 126) | def list_deletable_models(self, model_version: str, worker_ip: str) ->...
    method confirm_and_remove_model (line 140) | def confirm_and_remove_model(self, model_version: str, worker_ip: str):

FILE: xinference/core/event.py
  class EventType (line 24) | class EventType(Enum):
  class Event (line 30) | class Event(TypedDict):
  class EventCollectorActor (line 36) | class EventCollectorActor(xo.StatelessActor):
    method __init__ (line 37) | def __init__(self):
    method default_uid (line 44) | def default_uid(cls) -> str:
    method get_model_events (line 47) | def get_model_events(self, model_uid: str) -> List[Dict]:
    method report_event (line 54) | def report_event(self, model_uid: str, event: Event):

FILE: xinference/core/launch_strategy.py
  class LaunchStrategy (line 25) | class LaunchStrategy(ABC):
    method select_worker (line 31) | def select_worker(
  class IdleFirstLaunchStrategy (line 47) | class IdleFirstLaunchStrategy(LaunchStrategy):
    method __init__ (line 53) | def __init__(self, worker_status: Dict[str, "WorkerStatus"]):
    method _select_least_loaded_gpu (line 57) | def _select_least_loaded_gpu(
    method _reserve_slot (line 99) | def _reserve_slot(
    method select_worker (line 114) | def select_worker(

FILE: xinference/core/metrics.py
  function record_metrics (line 42) | def record_metrics(name, op, kwargs):
  function launch_metrics_export_server (line 47) | def launch_metrics_export_server(q, host=None, port=None):

FILE: xinference/core/model.py
  class _OutOfMemoryError (line 61) | class _OutOfMemoryError(Exception):
  function register_batching_multimodal_models (line 76) | def register_batching_multimodal_models(*model_names: str):
  function request_limit (line 86) | def request_limit(fn):
  function oom_check (line 122) | def oom_check(fn):
  class ModelActor (line 153) | class ModelActor(xo.StatelessActor, CancelMixin):
    method __pre_destroy__ (line 156) | async def __pre_destroy__(self):
    method __init__ (line 195) | def __init__(
    method __post_create__ (line 245) | async def __post_create__(self):
    method __repr__ (line 254) | def __repr__(self) -> str:
    method __getattr__ (line 257) | def __getattr__(self, attr: str):
    method decrease_serve_count (line 260) | def decrease_serve_count(self):
    method start_transfer_for_vllm (line 264) | async def start_transfer_for_vllm(self, rank_addresses: List[str]):
    method _record_completion_metrics (line 286) | async def _record_completion_metrics(
    method _get_worker_ref (line 323) | async def _get_worker_ref(self) -> xo.ActorRefType["WorkerActor"]:
    method _get_progress_tracker_ref (line 332) | async def _get_progress_tracker_ref(
    method _get_progressor (line 343) | async def _get_progressor(self, request_id: str):
    method is_vllm_backend (line 354) | def is_vllm_backend(self) -> bool:
    method is_sglang_backend (line 359) | def is_sglang_backend(self) -> bool:
    method load (line 364) | async def load(self):
    method wait_for_load (line 393) | async def wait_for_load(self):
    method need_create_pools (line 397) | def need_create_pools(self):
    method set_pool_addresses (line 400) | def set_pool_addresses(self, pool_addresses: List[str]):
    method get_pool_addresses (line 404) | def get_pool_addresses(self) -> Optional[List[str]]:
    method set_worker_addresses (line 409) | def set_worker_addresses(self, shard: int, worker_addresses: List[str]):
    method model_uid (line 413) | def model_uid(self):
    method get_driver_info (line 424) | def get_driver_info(self):
    method stop (line 430) | async def stop(self):
    method _handle_oom_error (line 436) | async def _handle_oom_error(self, ex):
    method _to_generator (line 447) | def _to_generator(self, output_type: str, gen: types.GeneratorType):
    method _to_async_gen (line 486) | async def _to_async_gen(self, output_type: str, gen: types.AsyncGenera...
    method _handle_pending_requests (line 527) | async def _handle_pending_requests(self):
    method _call_wrapper_json (line 568) | async def _call_wrapper_json(self, fn: Callable, *args, **kwargs):
    method _call_wrapper_binary (line 571) | async def _call_wrapper_binary(self, fn: Callable, *args, **kwargs):
    method _call_wrapper (line 575) | async def _call_wrapper(self, output_type: str, fn: Callable, *args, *...
    method generate (line 643) | async def generate(self, prompt: str, *args, **kwargs):
    method chat (line 669) | async def chat(self, messages: List[Dict], *args, **kwargs):
    method abort_request (line 715) | async def abort_request(
    method create_embedding (line 736) | async def create_embedding(self, input: Union[str, List[str]], *args, ...
    method convert_ids_to_tokens (line 749) | async def convert_ids_to_tokens(
    method rerank (line 762) | async def rerank(
    method transcriptions (line 790) | async def transcriptions(
    method translations (line 818) | async def translations(
    method speech (line 846) | async def speech(
    method text_to_image (line 872) | async def text_to_image(
    method txt2img (line 901) | async def txt2img(
    method image_to_image (line 920) | async def image_to_image(
    method img2img (line 953) | async def img2img(
    method inpainting (line 972) | async def inpainting(
    method ocr (line 1009) | async def ocr(
    method infer (line 1026) | async def infer(
    method text_to_video (line 1044) | async def text_to_video(
    method image_to_video (line 1069) | async def image_to_video(
    method flf_to_video (line 1098) | async def flf_to_video(
    method record_metrics (line 1127) | async def record_metrics(self, name, op, kwargs):
    method get_pending_requests_count (line 1131) | async def get_pending_requests_count(self):

FILE: xinference/core/otel.py
  function setup_otel (line 49) | def setup_otel(
  function _build_headers (line 150) | def _build_headers(api_key: str) -> Optional[dict]:
  function _setup_tracing (line 157) | def _setup_tracing(
  function _build_span_exporter (line 197) | def _build_span_exporter(
  function _setup_metrics (line 230) | def _setup_metrics(
  function _build_metric_exporter (line 260) | def _build_metric_exporter(
  function _instrument_fastapi (line 292) | def _instrument_fastapi(app) -> None:  # type: ignore[type-arg]
  class ClusterMetricsCollector (line 305) | class ClusterMetricsCollector:
    method __init__ (line 324) | def __init__(self) -> None:
    method update (line 328) | def update(self, worker_address: str, node_info: dict) -> None:
    method remove_worker (line 332) | def remove_worker(self, worker_address: str) -> None:
    method register (line 336) | def register(self) -> None:
    method _cpu_utilization_cb (line 393) | def _cpu_utilization_cb(self, options):  # type: ignore[no-untyped-def]
    method _cpu_count_cb (line 401) | def _cpu_count_cb(self, options):  # type: ignore[no-untyped-def]
    method _memory_used_cb (line 411) | def _memory_used_cb(self, options):  # type: ignore[no-untyped-def]
    method _memory_total_cb (line 419) | def _memory_total_cb(self, options):  # type: ignore[no-untyped-def]
    method _gpu_utilization_cb (line 429) | def _gpu_utilization_cb(self, options):  # type: ignore[no-untyped-def]
    method _gpu_mem_used_cb (line 445) | def _gpu_mem_used_cb(self, options):  # type: ignore[no-untyped-def]
    method _gpu_mem_total_cb (line 461) | def _gpu_mem_total_cb(self, options):  # type: ignore[no-untyped-def]
    method _gpu_mem_free_cb (line 477) | def _gpu_mem_free_cb(self, options):  # type: ignore[no-untyped-def]
  function get_cluster_metrics_collector (line 498) | def get_cluster_metrics_collector() -> Optional[ClusterMetricsCollector]:

FILE: xinference/core/progress_tracker.py
  class _ProgressInfo (line 39) | class _ProgressInfo:
  class ProgressTrackerActor (line 45) | class ProgressTrackerActor(xo.StatelessActor):
    method default_uid (line 49) | def default_uid(cls) -> str:
    method __init__ (line 52) | def __init__(
    method __post_create__ (line 64) | async def __post_create__(self):
    method __pre_destroy__ (line 67) | async def __pre_destroy__(self):
    method _clear_finished (line 71) | async def _clear_finished(self):
    method start (line 95) | def start(self, request_id: str, info: Optional[str] = None):
    method set_progress (line 100) | def set_progress(
    method get_progress (line 113) | def get_progress(self, request_id: str) -> float:
    method get_progress_info (line 116) | def get_progress_info(self, request_id: str) -> Tuple[float, Optional[...
  class Progressor (line 121) | class Progressor:
    method __init__ (line 124) | def __init__(
    method start (line 144) | async def start(self):
    method split_stages (line 148) | def split_stages(self, n_stage: int, stage_weight: Optional[List[float...
    method __enter__ (line 165) | def __enter__(self):
    method __exit__ (line 172) | def __exit__(self, exc_type, exc_val, exc_tb):
    method set_progress (line 180) | def set_progress(self, progress: float, info: Optional[str] = None):

FILE: xinference/core/resource.py
  class ResourceStatus (line 24) | class ResourceStatus:
  class GPUStatus (line 33) | class GPUStatus:
  function gather_node_info (line 42) | def gather_node_info() -> Dict[str, Union[ResourceStatus, GPUStatus]]:

FILE: xinference/core/status_guard.py
  class LaunchStatus (line 25) | class LaunchStatus(Enum):
  class ReplicaStatus (line 34) | class ReplicaStatus(BaseModel):
  class InstanceInfo (line 45) | class InstanceInfo(BaseModel):
    method update (line 56) | def update(self, **kwargs):
  class StatusGuardActor (line 61) | class StatusGuardActor(xo.StatelessActor):
    method __init__ (line 62) | def __init__(self):
    method default_uid (line 67) | def default_uid(cls) -> str:
    method _drop_terminated_info (line 71) | def _drop_terminated_info(instance_infos: List[InstanceInfo]) -> List[...
    method set_instance_info (line 78) | def set_instance_info(self, model_uid: str, info: InstanceInfo):
    method get_instance_info (line 81) | def get_instance_info(
    method get_instance_count (line 100) | def get_instance_count(self, model_name: str) -> int:
    method update_instance_info (line 103) | def update_instance_info(self, model_uid: str, info: Dict):
    method update_replica_status (line 106) | def update_replica_status(
    method get_replica_statuses (line 141) | def get_replica_statuses(self, model_uid: str) -> List[ReplicaStatus]:

FILE: xinference/core/supervisor.py
  function callback_for_async_launch (line 88) | def callback_for_async_launch(model_uid: str):
  class WorkerStatus (line 94) | class WorkerStatus:
  class ReplicaInfo (line 101) | class ReplicaInfo:
  class SupervisorActor (line 109) | class SupervisorActor(xo.StatelessActor):
    method __init__ (line 110) | def __init__(self):
    method default_uid (line 126) | def default_uid(cls) -> str:
    method _get_worker_ref_by_ip (line 129) | def _get_worker_ref_by_ip(
    method __post_create__ (line 138) | async def __post_create__(self):
    method get_cluster_device_info (line 307) | async def get_cluster_device_info(self, detailed: bool = False) -> List:
    method get_builtin_prompts (line 355) | async def get_builtin_prompts() -> Dict[str, Any]:
    method get_builtin_families (line 361) | async def get_builtin_families() -> Dict[str, List[str]]:
    method get_devices_count (line 385) | async def get_devices_count(self) -> int:
    method _choose_worker (line 415) | async def _choose_worker(
    method get_status (line 439) | def get_status(self) -> Dict:
    method _get_spec_dicts (line 445) | def _get_spec_dicts(
    method _to_llm_reg (line 465) | async def _to_llm_reg(
    method _to_embedding_model_reg (line 489) | async def _to_embedding_model_reg(
    method _to_rerank_model_reg (line 516) | async def _to_rerank_model_reg(
    method _to_image_model_reg (line 543) | async def _to_image_model_reg(
    method _to_audio_model_reg (line 578) | async def _to_audio_model_reg(
    method _to_video_model_reg (line 613) | async def _to_video_model_reg(
    method _to_flexible_model_reg (line 648) | async def _to_flexible_model_reg(
    method list_model_registrations (line 670) | async def list_model_registrations(
    method get_model_registration (line 695) | async def get_model_registration(self, model_type: str, model_name: st...
    method query_engines_by_model_name (line 706) | async def query_engines_by_model_name(
    method register_model (line 734) | async def register_model(
    method _sync_register_model (line 802) | async def _sync_register_model(
    method unregister_model (line 827) | async def unregister_model(self, model_type: str, model_name: str):
    method update_model_type (line 841) | async def update_model_type(self, model_type: str):
    method _gen_model_uid (line 876) | def _gen_model_uid(self, model_name: str) -> str:
    method get_model_versions (line 884) | async def get_model_versions(self, model_type: str, model_name: str) -...
    method get_model_version_count (line 887) | async def get_model_version_count(self, model_name: str) -> int:
    method launch_model_by_version (line 891) | async def launch_model_by_version(
    method _get_worker_refs_by_ip (line 923) | def _get_worker_refs_by_ip(self, ip: str) -> List[xo.ActorRefType["Wor...
    method launch_builtin_model (line 937) | async def launch_builtin_model(
    method _launch_builtin_sharded_model (line 1396) | async def _launch_builtin_sharded_model(
    method get_launch_builtin_model_progress (line 1557) | async def get_launch_builtin_model_progress(self, model_uid: str) -> f...
    method cancel_launch_builtin_model (line 1576) | async def cancel_launch_builtin_model(self, model_uid: str):
    method get_instance_info (line 1599) | async def get_instance_info(
    method get_replica_statuses (line 1607) | async def get_replica_statuses(self, model_uid: str) -> List[Dict]:
    method get_instance_count (line 1622) | async def get_instance_count(self, model_name: str) -> int:
    method _check_dead_nodes (line 1625) | async def _check_dead_nodes(self):
    method terminate_model (line 1679) | async def terminate_model(self, model_uid: str, suppress_exception=Fal...
    method get_model (line 1740) | async def get_model(self, model_uid: str) -> xo.ActorRefType["ModelAct...
    method get_model_status (line 1763) | async def get_model_status(self, replica_model_uid: str):
    method describe_model (line 1778) | async def describe_model(self, model_uid: str) -> Dict[str, Any]:
    method list_models (line 1801) | async def list_models(self) -> Dict[str, Dict[str, Any]]:
    method is_local_deployment (line 1813) | def is_local_deployment(self) -> bool:
    method list_cached_models (line 1821) | async def list_cached_models(
    method abort_request (line 1849) | async def abort_request(
    method add_worker (line 1886) | async def add_worker(self, worker_address: str):
    method remove_worker (line 1900) | async def remove_worker(self, worker_address: str):
    method report_worker_status (line 1928) | async def report_worker_status(
    method list_deletable_models (line 1957) | async def list_deletable_models(
    method confirm_and_remove_model (line 1982) | async def confirm_and_remove_model(
    method list_virtual_envs (line 2008) | async def list_virtual_envs(
    method list_virtual_env_packages (line 2043) | async def list_virtual_env_packages(
    method remove_virtual_env (line 2082) | async def remove_virtual_env(
    method get_workers_info (line 2135) | async def get_workers_info(self) -> List[Dict[str, Any]]:
    method get_supervisor_info (line 2141) | async def get_supervisor_info(self) -> Dict[str, Any]:
    method trigger_exit (line 2147) | async def trigger_exit(self) -> bool:
    method abort_cluster (line 2155) | async def abort_cluster(self) -> bool:
    method record_metrics (line 2164) | def record_metrics(name, op, kwargs):
    method get_progress (line 2167) | async def get_progress(self, request_id: str) -> float:
    method call_collective_manager (line 2170) | async def call_collective_manager(

FILE: xinference/core/tests/test_continuous_batching.py
  class BaseThread (line 27) | class BaseThread(threading.Thread):
    method __init__ (line 28) | def __init__(self):
    method run_internal (line 32) | def run_internal(self):
    method run (line 35) | def run(self):
    method join (line 41) | def join(self, timeout=None):
  class InferenceThread (line 47) | class InferenceThread(BaseThread):
    method __init__ (line 48) | def __init__(self, prompt, generate_config, client, model):
    method stream (line 57) | def stream(self):
    method run_internal (line 60) | def run_internal(self):
  class InferenceThreadWithError (line 79) | class InferenceThreadWithError(InferenceThread):
    method __init__ (line 80) | def __init__(self, prompt, generate_config, client, model, sleep=None):
    method run_internal (line 84) | def run_internal(self):
  class AbortThread (line 98) | class AbortThread(BaseThread):
    method __init__ (line 99) | def __init__(self, client, model_uid, request_id, expected_res, sleep=...
    method run_internal (line 107) | def run_internal(self):
  function test_continuous_batching (line 118) | def test_continuous_batching(setup):

FILE: xinference/core/tests/test_launch_strategy.py
  class DummyRef (line 23) | class DummyRef:
    method __init__ (line 24) | def __init__(self, address: str):
  class DummyWorkerRef (line 28) | class DummyWorkerRef:
    method __init__ (line 29) | def __init__(self, address: str, model_count: int, launched: list):
    method get_model_count (line 34) | async def get_model_count(self) -> int:
    method launch_builtin_model (line 37) | async def launch_builtin_model(self, *args, **kwargs):
    method wait_for_load (line 43) | async def wait_for_load(self, model_uid: str):
  class DummyStatusGuard (line 47) | class DummyStatusGuard:
    method __init__ (line 48) | def __init__(self):
    method set_instance_info (line 51) | async def set_instance_info(self, model_uid: str, instance_info):
    method update_instance_info (line 54) | async def update_instance_info(self, model_uid: str, updates: dict):
  function test_assign_replica_gpu_single_slot_reused (line 58) | def test_assign_replica_gpu_single_slot_reused():
  function test_assign_replica_gpu_slicing (line 64) | def test_assign_replica_gpu_slicing():
  function test_idle_first_prefers_empty_gpu (line 71) | def test_idle_first_prefers_empty_gpu(monkeypatch):
  function test_idle_first_balances_with_reserve (line 91) | def test_idle_first_balances_with_reserve(monkeypatch):
  function test_idle_first_fallback_to_count_when_no_alloc (line 113) | def test_idle_first_fallback_to_count_when_no_alloc():
  function test_multi_worker_multi_gpu_even_distribution (line 127) | def test_multi_worker_multi_gpu_even_distribution():
  function test_cpu_fallback_no_gpu_alloc (line 169) | def test_cpu_fallback_no_gpu_alloc():
  function test_idle_first_multi_gpu_single_worker (line 184) | def test_idle_first_multi_gpu_single_worker():
  function test_idle_first_multi_gpu_two_workers (line 204) | def test_idle_first_multi_gpu_two_workers():
  function test_distributed_launch_avoids_same_worker_for_shards (line 234) | async def test_distributed_launch_avoids_same_worker_for_shards():

FILE: xinference/core/tests/test_metrics.py
  function setup_cluster (line 21) | def setup_cluster():
  function test_metrics_exporter_server (line 57) | async def test_metrics_exporter_server(setup_cluster):
  function disable_metrics (line 95) | def disable_metrics():
  function test_disable_metrics_exporter_server (line 104) | async def test_disable_metrics_exporter_server(disable_metrics, setup_cl...
  function test_metrics_exporter_data (line 128) | async def test_metrics_exporter_data(setup_cluster):

FILE: xinference/core/tests/test_model.py
  class MockModelFamily (line 28) | class MockModelFamily:
    method to_description (line 29) | def to_description(self) -> dict:
  class MockModel (line 33) | class MockModel:
    method __init__ (line 34) | def __init__(self):
    method generate (line 37) | async def generate(self, prompt, **kwargs):
  class MockModelActor (line 46) | class MockModelActor(ModelActor):
    method __init__ (line 47) | def __init__(
    method __pre_destroy__ (line 61) | async def __pre_destroy__(self):
    method record_metrics (line 64) | async def record_metrics(self, name, op, kwargs):
  function setup_pool (line 69) | async def setup_pool():
  function test_concurrent_call (line 78) | async def test_concurrent_call(setup_pool):

FILE: xinference/core/tests/test_progressor.py
  function test_progressor (line 25) | async def test_progressor():

FILE: xinference/core/tests/test_restful_api.py
  class _DummyRequest (line 30) | class _DummyRequest:
    method __init__ (line 31) | def __init__(self, payload):
    method json (line 34) | async def json(self):
  function test_restful_api (line 39) | async def test_restful_api(setup):
  function test_restful_api_for_embedding (line 325) | def test_restful_api_for_embedding(setup):
  function _check_invalid_tool_calls (line 398) | def _check_invalid_tool_calls(endpoint, model_uid_res):
  function test_restful_api_for_tool_calls (line 449) | def test_restful_api_for_tool_calls(setup, model_format, quantization):
  function test_restful_api_for_llama3_tool_calls (line 628) | def test_restful_api_for_llama3_tool_calls(setup, model_format, quantiza...
  function test_restful_api_for_gorilla_openfunctions_tool_calls (line 719) | def test_restful_api_for_gorilla_openfunctions_tool_calls(
  function test_restful_api_for_qwen_tool_calls (line 820) | def test_restful_api_for_qwen_tool_calls(setup, model_format, quantizati...
  function test_restful_api_with_request_limits (line 1040) | def test_restful_api_with_request_limits(setup):
  function test_openai (line 1106) | async def test_openai(setup):
  function test_lang_chain (line 1169) | def test_lang_chain(setup):
  function test_chat_completion_enable_thinking_injected (line 1265) | async def test_chat_completion_enable_thinking_injected(payload, expected):
  function test_launch_model_async (line 1291) | def test_launch_model_async(setup):
  function test_cancel_launch_model (line 1333) | def test_cancel_launch_model(setup):
  function test_events (line 1372) | def test_events(setup):
  function test_launch_model_by_version (line 1412) | def test_launch_model_by_version(setup):
  function test_builtin_families (line 1448) | def test_builtin_families(setup):
  function anthropic_setup (line 1469) | def anthropic_setup():
  function test_convert_openai_to_anthropic_with_tools (line 1528) | def test_convert_openai_to_anthropic_with_tools(anthropic_setup):
  function test_convert_openai_to_anthropic_without_tools (line 1555) | def test_convert_openai_to_anthropic_without_tools(anthropic_setup):
  function test_convert_openai_to_anthropic_mixed_content (line 1579) | def test_convert_openai_to_anthropic_mixed_content(anthropic_setup):
  function test_convert_openai_to_anthropic_multiple_tools (line 1628) | def test_convert_openai_to_anthropic_multiple_tools(anthropic_setup):
  function test_convert_openai_to_anthropic_empty_response (line 1684) | def test_convert_openai_to_anthropic_empty_response(anthropic_setup):
  function test_convert_openai_to_anthropic_invalid_tool_arguments (line 1698) | def test_convert_openai_to_anthropic_invalid_tool_arguments(anthropic_se...
  function test_anthropic_tools_response_format (line 1737) | def test_anthropic_tools_response_format(anthropic_setup):
  function anthropic_api (line 1778) | def anthropic_api():
  function mock_supervisor (line 1786) | def mock_supervisor():
  function sample_models (line 1793) | def sample_models():
  function test_anthropic_list_models (line 1816) | async def test_anthropic_list_models(anthropic_api, mock_supervisor, sam...
  function test_anthropic_get_model_found (line 1838) | async def test_anthropic_get_model_found(anthropic_api, mock_supervisor,...
  function test_anthropic_models_format_compatibility (line 1860) | async def test_anthropic_models_format_compatibility(
  function test_anthropic_models_include_original_fields (line 1892) | async def test_anthropic_models_include_original_fields(

FILE: xinference/core/tests/test_types.py
  function check_fields (line 25) | def check_fields(a, b):
  function test_create_completion_types (line 48) | def test_create_completion_types():
  function test_create_chat_completion_types (line 68) | def test_create_chat_completion_types():
  function test_openai_requests (line 91) | def test_openai_requests():
  function test_create_message_with_tools (line 135) | def test_create_message_with_tools():
  function test_create_message_without_tools (line 176) | def test_create_message_without_tools():
  function test_create_message_with_tool_choice_function (line 197) | def test_create_message_with_tool_choice_function():
  function test_create_message_with_optional_fields (line 229) | def test_create_message_with_optional_fields():
  function test_model_and_messages_type (line 256) | def test_model_and_messages_type():
  function test_message_create_params_structure (line 275) | def test_message_create_params_structure():

FILE: xinference/core/tests/test_utils.py
  function test_replica_model_uid (line 23) | def test_replica_model_uid():
  class DummyVirtualEnvManager (line 35) | class DummyVirtualEnvManager:
    method __init__ (line 36) | def __init__(self, python_path: str):
    method get_python_path (line 39) | def get_python_path(self) -> str:
  function test_build_subpool_envs_for_virtual_env_disabled (line 43) | def test_build_subpool_envs_for_virtual_env_disabled():
  function test_build_subpool_envs_for_virtual_env_enabled (line 51) | def test_build_subpool_envs_for_virtual_env_enabled():

FILE: xinference/core/tests/test_worker.py
  class MockWorkerActor (line 26) | class MockWorkerActor(WorkerActor):
    method __init__ (line 27) | def __init__(
    method __post_create__ (line 35) | async def __post_create__(self):
    method __pre_destroy__ (line 38) | async def __pre_destroy__(self):
    method get_gpu_to_model_uid (line 41) | def get_gpu_to_model_uid(self):
    method get_user_specified_gpu_to_model_uids (line 44) | def get_user_specified_gpu_to_model_uids(self):
    method set_allow_multi_replica_per_gpu (line 47) | def set_allow_multi_replica_per_gpu(self, allow: bool):
    method is_model_vllm_backend (line 50) | async def is_model_vllm_backend(self, model_uid):
    method launch_builtin_model (line 62) | async def launch_builtin_model(
    method terminate_model (line 79) | async def terminate_model(self, model_uid: str):
  function setup_pool (line 88) | async def setup_pool():
  function test_allocate_cuda_devices (line 97) | async def test_allocate_cuda_devices(setup_pool):
  function test_terminate_model_flag (line 124) | async def test_terminate_model_flag(setup_pool):
  function test_merge_virtual_env_packages_override_and_append (line 173) | def test_merge_virtual_env_packages_override_and_append():
  class DummyVirtualEnvManager (line 191) | class DummyVirtualEnvManager:
    method __init__ (line 192) | def __init__(self):
    method install_packages (line 196) | def install_packages(self, packages, **kwargs):
  function test_prepare_virtual_env_injects_engine_vars (line 200) | def test_prepare_virtual_env_injects_engine_vars():
  function test_prepare_virtual_env_without_engine_vars (line 217) | def test_prepare_virtual_env_without_engine_vars():
  function test_prepare_virtual_env_inherit_pip_config (line 233) | def test_prepare_virtual_env_inherit_pip_config(monkeypatch):
  function test_prepare_virtual_env_keeps_system_markers (line 254) | def test_prepare_virtual_env_keeps_system_markers():
  function test_launch_embedding_model (line 280) | async def test_launch_embedding_model(setup_pool):
  function test_launch_model_with_gpu_idx (line 350) | async def test_launch_model_with_gpu_idx(setup_pool):

FILE: xinference/core/utils.py
  class AbortRequestMessage (line 37) | class AbortRequestMessage(Enum):
  function truncate_log_arg (line 43) | def truncate_log_arg(arg) -> str:
  function log_async (line 50) | def log_async(
  function log_sync (line 119) | def log_sync(logger, level=logging.DEBUG, log_exception=True):
  function iter_replica_model_uid (line 160) | def iter_replica_model_uid(model_uid: str, replica: int) -> Generator[st...
  function build_replica_model_uid (line 169) | def build_replica_model_uid(model_uid: str, rep_id: int) -> str:
  function parse_replica_model_uid (line 176) | def parse_replica_model_uid(replica_model_uid: str) -> Tuple[str, int]:
  function is_valid_model_uid (line 188) | def is_valid_model_uid(model_uid: str) -> bool:
  function gen_random_string (line 195) | def gen_random_string(length: int) -> str:
  function json_dumps (line 199) | def json_dumps(o):
  function purge_dir (line 208) | def purge_dir(d):
  function parse_model_version (line 223) | def parse_model_version(model_version: str, model_type: str) -> Tuple:
  function merge_virtual_env_packages (line 252) | def merge_virtual_env_packages(
  function build_subpool_envs_for_virtual_env (line 295) | def build_subpool_envs_for_virtual_env(
  function apply_engine_virtualenv_settings (line 318) | def apply_engine_virtualenv_settings(
  function filter_virtualenv_packages_by_markers (line 340) | def filter_virtualenv_packages_by_markers(
  function assign_replica_gpu (line 419) | def assign_replica_gpu(
  class CancelMixin (line 439) | class CancelMixin:
    method __init__ (line 442) | def __init__(self):
    method _add_running_task (line 447) | def _add_running_task(self, request_id: Optional[str]):
    method _cancel_running_task (line 462) | def _cancel_running_task(

FILE: xinference/core/virtual_env_manager.py
  function get_engine_virtualenv_packages (line 79) | def get_engine_virtualenv_packages(model_engine: Optional[str]) -> List[...
  function get_engine_virtualenv_extra_index_urls (line 85) | def get_engine_virtualenv_extra_index_urls(
  function get_engine_virtualenv_index_strategy (line 98) | def get_engine_virtualenv_index_strategy(model_engine: Optional[str]) ->...
  function resolve_virtualenv_python_path (line 104) | def resolve_virtualenv_python_path(virtual_env_manager: Any) -> Optional...
  function expand_engine_dependency_placeholders (line 129) | def expand_engine_dependency_placeholders(
  class VirtualEnvManager (line 157) | class VirtualEnvManager:
    method __init__ (line 165) | def __init__(self, worker_address: str):
    method list_virtual_envs (line 174) | def list_virtual_envs(
    method remove_virtual_env (line 231) | def remove_virtual_env(
    method check_virtual_env_exists (line 390) | def check_virtual_env_exists(self, model_name: str) -> Dict[str, Any]:
    method list_virtual_env_packages (line 403) | def list_virtual_env_packages(self, model_name: str) -> Dict[str, Any]:
    method _detect_python_version (line 421) | def _detect_python_version(self, env_path: str) -> str:
    method _is_valid_python_version (line 476) | def _is_valid_python_version(self, python_version: str) -> bool:

FILE: xinference/core/worker.py
  class ModelStatus (line 110) | class ModelStatus:
  class LaunchInfo (line 115) | class LaunchInfo:
  class WorkerActor (line 125) | class WorkerActor(xo.StatelessActor):
    method __init__ (line 126) | def __init__(
    method recover_sub_pool (line 206) | async def recover_sub_pool(self, address):
    method default_uid (line 262) | def default_uid(cls) -> str:
    method _get_spec_dicts_with_cache_status (line 265) | def _get_spec_dicts_with_cache_status(
    method _prefer_model_hub (line 287) | def _prefer_model_hub(self, model_family: Any, preferred_hub: str = "h...
    method __post_create__ (line 307) | async def __post_create__(self):
    method __pre_destroy__ (line 433) | async def __pre_destroy__(self):
    method trigger_exit (line 436) | async def trigger_exit(self) -> bool:
    method get_supervisor_ref (line 444) | async def get_supervisor_ref(self, add_worker: bool = True) -> xo.Acto...
    method get_devices_count (line 500) | def get_devices_count():
    method get_model_count (line 506) | def get_model_count(self) -> int:
    method is_model_vllm_backend (line 509) | async def is_model_vllm_backend(self, model_uid: str) -> bool:
    method allocate_devices (line 515) | def allocate_devices(self, model_uid: str, n_gpu: int) -> List[int]:
    method allocate_devices_with_gpu_idx (line 560) | async def allocate_devices_with_gpu_idx(
    method get_gpu_allocation_status (line 598) | async def get_gpu_allocation_status(self) -> Dict[str, Any]:
    method release_devices (line 610) | def release_devices(self, model_uid: str):
    method _create_subpool (line 628) | async def _create_subpool(
    method _check_model_is_valid (line 662) | def _check_model_is_valid(self, model_name: str, model_format: Optiona...
    method register_model (line 670) | async def register_model(self, model_type: str, model: str, persist: b...
    method unregister_model (line 694) | async def unregister_model(self, model_type: str, model_name: str):
    method update_model_type (line 703) | async def update_model_type(self, model_type: str):
    method _store_complete_model_configurations (line 794) | async def _store_complete_model_configurations(self, model_type: str, ...
    method list_model_registrations (line 833) | async def list_model_registrations(
    method get_model_registration (line 1154) | async def get_model_registration(self, model_type: str, model_name: st...
    method query_engines_by_model_name (line 1239) | async def query_engines_by_model_name(
    method _get_model_ability (line 1255) | async def _get_model_ability(self, model: Any, model_type: str) -> Lis...
    method update_cache_status (line 1271) | async def update_cache_status(self, model_name: str, version_info: Any):
    method _create_virtual_env_manager (line 1286) | def _create_virtual_env_manager(
    method _prepare_virtual_env (line 1373) | def _prepare_virtual_env(
    method _get_progressor (line 1460) | async def _get_progressor(self, request_id: str):
    method _upload_download_progress (line 1479) | def _upload_download_progress(
    method launch_builtin_model (line 1490) | async def launch_builtin_model(
    method wait_for_load (line 1803) | async def wait_for_load(self, model_uid: str):
    method cancel_launch_model (line 1808) | async def cancel_launch_model(self, model_uid: str):
    method terminate_model (line 1841) | async def terminate_model(self, model_uid: str, is_model_die=False):
    method get_model_launch_status (line 1948) | def get_model_launch_status(self, model_uid: str) -> Optional[str]:
    method list_models (line 1963) | async def list_models(self) -> Dict[str, Dict[str, Any]]:
    method get_model (line 1967) | def get_model(self, model_uid: str) -> xo.ActorRefType["ModelActor"]:
    method describe_model (line 1977) | def describe_model(self, model_uid: str) -> Dict[str, Any]:
    method report_status (line 1983) | async def report_status(self):
    method _periodical_report_status (line 1996) | async def _periodical_report_status(self):
    method list_cached_models (line 2016) | async def list_cached_models(
    method list_deletable_models (line 2040) | async def list_deletable_models(self, model_version: str) -> List[str]:
    method confirm_and_remove_model (line 2079) | async def confirm_and_remove_model(self, model_version: str) -> bool:
    method list_virtual_envs (line 2101) | async def list_virtual_envs(
    method list_virtual_env_packages (line 2127) | async def list_virtual_env_packages(self, model_name: str) -> Dict[str...
    method remove_virtual_env (line 2131) | async def remove_virtual_env(
    method get_workers_info (line 2142) | async def get_workers_info(self) -> Dict[str, Any]:
    method update_model_status (line 2149) | def update_model_status(self, model_uid: str, **kwargs):
    method get_model_status (line 2155) | def get_model_status(self, model_uid: str):
    method record_metrics (line 2159) | def record_metrics(name, op, kwargs):
    method start_transfer_for_vllm (line 2162) | async def start_transfer_for_vllm(
    method launch_rank0_model (line 2169) | async def launch_rank0_model(
    method recover_model (line 2202) | async def recover_model(self, launch_args: Dict[str, Any]):

FILE: xinference/deploy/cmdline.py
  function get_endpoint (line 59) | def get_endpoint(endpoint: Optional[str]) -> str:
  function get_hash_endpoint (line 71) | def get_hash_endpoint(endpoint: str) -> str:
  function get_stored_token (line 79) | def get_stored_token(
  function start_local_cluster (line 95) | def start_local_cluster(
  function cli (line 158) | def cli(
  function local (line 218) | def local(
  function supervisor (line 272) | def supervisor(
  function worker (line 334) | def worker(
  function register_model (line 394) | def register_model(
  function unregister_model (line 437) | def unregister_model(
  function list_model_registrations (line 475) | def list_model_registrations(
  function list_cached_models (line 624) | def list_cached_models(
  function remove_cache (line 680) | def remove_cache(
  function model_launch (line 893) | def model_launch(
  function model_list (line 1074) | def model_list(endpoint: Optional[str], api_key: Optional[str]):
  function model_terminate (line 1208) | def model_terminate(
  function model_generate (line 1246) | def model_generate(
  function model_chat (line 1349) | def model_chat(
  function vllm_models (line 1452) | def vllm_models(endpoint: Optional[str], api_key: Optional[str]):
  function cluster_login (line 1475) | def cluster_login(
  function query_engine_by_model_name (line 1536) | def query_engine_by_model_name(
  function cal_model_mem (line 1691) | def cal_model_mem(
  function stop_cluster (line 1752) | def stop_cluster(endpoint: str, api_key: Optional[str], check: bool):

FILE: xinference/deploy/local.py
  function _start_local_cluster (line 42) | async def _start_local_cluster(
  function run (line 82) | def run(
  function run_in_subprocess (line 119) | def run_in_subprocess(
  function main (line 149) | def main(

FILE: xinference/deploy/supervisor.py
  function _start_supervisor (line 35) | async def _start_supervisor(address: str, logging_conf: Optional[Dict] =...
  function run (line 52) | def run(address: str, logging_conf: Optional[Dict] = None):
  function run_in_subprocess (line 65) | def run_in_subprocess(
  function main (line 73) | def main(

FILE: xinference/deploy/test/test_cmdline.py
  function test_cmdline (line 37) | def test_cmdline(setup, stream, model_uid):
  function test_cmdline_model_path_error (line 150) | def test_cmdline_model_path_error(setup):
  function test_cmdline_of_custom_model (line 182) | def test_cmdline_of_custom_model(setup):
  function test_rotate_logs (line 275) | def test_rotate_logs(setup_with_file_logging):
  function test_list_cached_models (line 314) | def test_list_cached_models(setup):
  function test_remove_cache (line 331) | def test_remove_cache(setup):
  function test_launch_error_in_passing_parameters (line 345) | def test_launch_error_in_passing_parameters():

FILE: xinference/deploy/utils.py
  class LoggerNameFilter (line 34) | class LoggerNameFilter(logging.Filter):
    method filter (line 35) | def filter(self, record):
  function get_log_file (line 42) | def get_log_file(sub_dir: str):
  function get_config_dict (line 55) | def get_config_dict(
  function create_worker_actor_pool (line 141) | async def create_worker_actor_pool(
  function health_check (line 152) | def health_check(address: str, max_attempts: int, sleep_interval: int = ...
  function get_timestamp_ms (line 190) | def get_timestamp_ms():
  function handle_click_args_type (line 196) | def handle_click_args_type(arg: str) -> Any:
  function set_envs (line 218) | def set_envs(key: str, value: str):

FILE: xinference/deploy/worker.py
  function start_worker_components (line 30) | async def start_worker_components(
  function _start_worker (line 58) | async def _start_worker(
  function main (line 78) | def main(

FILE: xinference/device_utils.py
  function is_vacc_available (line 29) | def is_vacc_available() -> bool:
  function is_xpu_available (line 39) | def is_xpu_available() -> bool:
  function is_npu_available (line 43) | def is_npu_available() -> bool:
  function is_mlu_available (line 53) | def is_mlu_available() -> bool:
  function is_musa_available (line 63) | def is_musa_available() -> bool:
  function get_available_device (line 74) | def get_available_device() -> DeviceType:
  function is_device_available (line 92) | def is_device_available(device: str) -> bool:
  function move_model_to_available_device (line 113) | def move_model_to_available_device(model):
  function get_device_preferred_dtype (line 122) | def get_device_preferred_dtype(device: str) -> Union[torch.dtype, None]:
  function is_hf_accelerate_supported (line 140) | def is_hf_accelerate_supported(device: str) -> bool:
  function empty_cache (line 150) | def empty_cache():
  function get_available_device_env_name (line 177) | def get_available_device_env_name():
  function gpu_count (line 181) | def gpu_count():
  function _get_nvidia_gpu_mem_info (line 199) | def _get_nvidia_gpu_mem_info(gpu_id: int) -> Dict[str, float]:
  function get_nvidia_gpu_info (line 220) | def get_nvidia_gpu_info() -> Dict:

FILE: xinference/isolation.py
  class Isolation (line 20) | class Isolation:
    method __init__ (line 22) | def __init__(
    method _run (line 36) | def _run(self):
    method _cancel_all_tasks (line 43) | def _cancel_all_tasks(loop):
    method start (line 65) | def start(self):
    method call (line 73) | def call(self, coro: Coroutine) -> Any:
    method thread_ident (line 78) | def thread_ident(self):
    method loop (line 82) | def loop(self):
    method _stop (line 85) | async def _stop(self):
    method stop (line 88) | def stop(self):

FILE: xinference/model/__init__.py
  function _install (line 16) | def _install():

FILE: xinference/model/audio/__init__.py
  function register_custom_model (line 41) | def register_custom_model():
  function _need_filter (line 63) | def _need_filter(spec: dict):
  function _install (line 71) | def _install():
  function register_builtin_model (line 97) | def register_builtin_model():
  function has_downloaded_models (line 102) | def has_downloaded_models():
  function load_downloaded_models (line 109) | def load_downloaded_models():
  function load_model_family_from_json (line 124) | def load_model_family_from_json(json_filename, target_families):

FILE: xinference/model/audio/chattts.py
  class ChatTTSModel (line 28) | class ChatTTSModel:
    method __init__ (line 29) | def __init__(
    method model_ability (line 46) | def model_ability(self):
    method load (line 49) | def load(self):
    method speech (line 64) | def speech(

FILE: xinference/model/audio/core.py
  function get_audio_model_descriptions (line 42) | def get_audio_model_descriptions():
  class AudioModelFamilyV2 (line 48) | class AudioModelFamilyV2(CacheableModelSpec, ModelInstanceInfoMixin):
    class Config (line 62) | class Config:
    method to_description (line 65) | def to_description(self):
    method to_version_info (line 76) | def to_version_info(self):
  function generate_audio_description (line 88) | def generate_audio_description(
  function match_audio (line 96) | def match_audio(
  function create_audio_model_instance (line 136) | def create_audio_model_instance(

FILE: xinference/model/audio/cosyvoice.py
  class CosyVoiceModel (line 26) | class CosyVoiceModel:
    method __init__ (line 27) | def __init__(
    method model_ability (line 45) | def model_ability(self):
    method load (line 48) | def load(self):
    method _speech_handle (line 80) | def _speech_handle(
    method speech (line 154) | def speech(

FILE: xinference/model/audio/custom.py
  class CustomAudioModelFamilyV2 (line 32) | class CustomAudioModelFamilyV2(AudioModelFamilyV2):
    method parse_raw (line 39) | def parse_raw(
  class AudioModelRegistry (line 75) | class AudioModelRegistry(ModelRegistry):
    method __init__ (line 78) | def __init__(self):
  function get_user_defined_audios (line 86) | def get_user_defined_audios() -> List[CustomAudioModelFamilyV2]:
  function register_audio (line 93) | def register_audio(model_spec: CustomAudioModelFamilyV2, persist: bool):
  function unregister_audio (line 100) | def unregister_audio(model_name: str, raise_error: bool = True):

FILE: xinference/model/audio/f5tts.py
  class F5TTSModel (line 27) | class F5TTSModel:
    method __init__ (line 28) | def __init__(
    method model_ability (line 46) | def model_ability(self):
    method load (line 49) | def load(self):
    method _infer (line 92) | def _infer(self, ref_audio, ref_text, text_gen, model_obj, mel_spec_ty...
    method speech (line 152) | def speech(

FILE: xinference/model/audio/f5tts_mlx.py
  class F5TTSMLXModel (line 32) | class F5TTSMLXModel:
    method __init__ (line 33) | def __init__(
    method model_ability (line 51) | def model_ability(self):
    method load (line 54) | def load(self):
    method speech (line 127) | def speech(

FILE: xinference/model/audio/fish_speech.py
  function wav_chunk_header (line 31) | def wav_chunk_header(sample_rate=44100, bit_depth=16, channels=1):
  class FishSpeechModel (line 46) | class FishSpeechModel:
    method __init__ (line 47) | def __init__(
    method model_ability (line 66) | def model_ability(self):
    method load (line 69) | def load(self):
    method speech (line 115) | def speech(

FILE: xinference/model/audio/funasr.py
  class FunASRModel (line 27) | class FunASRModel:
    method __init__ (line 28) | def __init__(
    method model_ability (line 45) | def model_ability(self):
    method convert_to_openai_format (line 48) | def convert_to_openai_format(self, input_data):
    method load (line 86) | def load(self):
    method transcriptions (line 113) | def transcriptions(
    method translations (line 164) | def translations(

FILE: xinference/model/audio/indextts2.py
  class Indextts2 (line 27) | class Indextts2:
    method __init__ (line 28) | def __init__(
    method model_ability (line 45) | def model_ability(self):
    method load (line 48) | def load(self):
    method speech (line 80) | def speech(

FILE: xinference/model/audio/kokoro.py
  class KokoroModel (line 28) | class KokoroModel:
    method __init__ (line 29) | def __init__(
    method model_ability (line 46) | def model_ability(self):
    method load (line 49) | def load(self):
    method speech (line 88) | def speech(

FILE: xinference/model/audio/kokoro_mlx.py
  class KokoroMLXModel (line 28) | class KokoroMLXModel:
    method __init__ (line 29) | def __init__(
    method model_ability (line 46) | def model_ability(self):
    method load (line 49) | def load(self):
    method speech (line 68) | def speech(

FILE: xinference/model/audio/kokoro_zh.py
  class KokoroZHModel (line 30) | class KokoroZHModel:
    method __init__ (line 31) | def __init__(
    method _en_callable (line 48) | def _en_callable(self, text):
    method model_ability (line 60) | def model_ability(self):
    method load (line 63) | def load(self):
    method speech (line 89) | def speech(

FILE: xinference/model/audio/megatts.py
  class MegaTTSModel (line 25) | class MegaTTSModel:
    method __init__ (line 26) | def __init__(
    method model_ability (line 44) | def model_ability(self):
    method load (line 47) | def load(self):
    method speech (line 62) | def speech(

FILE: xinference/model/audio/melotts.py
  class MeloTTSModel (line 26) | class MeloTTSModel:
    method __init__ (line 27) | def __init__(
    method model_ability (line 44) | def model_ability(self):
    method load (line 47) | def load(self):
    method speech (line 76) | def speech(

FILE: xinference/model/audio/qwen3_asr.py
  class Qwen3ASRModel (line 31) | class Qwen3ASRModel:
    method __init__ (line 32) | def __init__(
    method model_ability (line 49) | def model_ability(self):
    method load (line 52) | def load(self):
    method _extract_text_and_language (line 87) | def _extract_text_and_language(self, result) -> Tuple[str, Optional[st...
    method transcriptions (line 105) | def transcriptions(
    method translations (line 145) | def translations(

FILE: xinference/model/audio/tests/test_chattts.py
  function test_chattts (line 21) | def test_chattts(setup):

FILE: xinference/model/audio/tests/test_cosyvoice.py
  function test_cosyvoice_sft (line 23) | def test_cosyvoice_sft(setup, model_name):
  function test_cosyvoice (line 76) | def test_cosyvoice(setup, model_name):
  function test_cosyvoice_instruct (line 127) | def test_cosyvoice_instruct(setup, model_name):

FILE: xinference/model/audio/tests/test_f5tts.py
  function test_f5tts (line 19) | def test_f5tts(setup):

FILE: xinference/model/audio/tests/test_f5tts_mlx.py
  function test_f5tts_mlx (line 19) | def test_f5tts_mlx(setup):

FILE: xinference/model/audio/tests/test_fish_speech.py
  function test_fish_speech (line 19) | def test_fish_speech(setup):

FILE: xinference/model/audio/tests/test_funasr.py
  function test_restful_api_for_funasr (line 20) | def test_restful_api_for_funasr(setup):
  function test_verbose_for_funasr (line 55) | def test_verbose_for_funasr(setup):

FILE: xinference/model/audio/tests/test_kokoro.py
  function test_kokoro (line 19) | def test_kokoro(setup):
  function test_kokoro_zh (line 55) | def test_kokoro_zh(setup):

FILE: xinference/model/audio/tests/test_megatts.py
  function test_megatts (line 18) | def test_megatts(setup):

FILE: xinference/model/audio/tests/test_melotts.py
  function test_melotts (line 19) | def test_melotts(setup):

FILE: xinference/model/audio/tests/test_whisper.py
  function test_restful_api_for_whisper (line 22) | def test_restful_api_for_whisper(setup):
  function test_transcriptions_for_whisper (line 82) | def test_transcriptions_for_whisper(setup):
  function test_register_custom_audio (line 141) | def test_register_custom_audio():
  function test_persistent_custom_audio (line 181) | def test_persistent_custom_audio():

FILE: xinference/model/audio/tests/test_whisper_mlx.py
  function test_restful_api_for_whisper (line 26) | def test_restful_api_for_whisper(setup):
  function test_transcriptions_for_whisper (line 80) | def test_transcriptions_for_whisper(setup):

FILE: xinference/model/audio/utils.py
  function _extract_pcm_from_wav_bytes (line 28) | def _extract_pcm_from_wav_bytes(wav_bytes):
  function ensure_sample_rate (line 35) | def ensure_sample_rate(
  function audio_stream_generator (line 62) | def audio_stream_generator(
  function audio_to_bytes (line 108) | def audio_to_bytes(response_format: str, sample_rate: int, tensor: "torc...

FILE: xinference/model/audio/whisper.py
  class WhisperModelConfig (line 34) | class WhisperModelConfig(TypedDict, total=False):
  class WhisperModel (line 41) | class WhisperModel:
    method __init__ (line 42) | def __init__(
    method _sanitize_model_config (line 62) | def _sanitize_model_config(
    method model_ability (line 74) | def model_ability(self):
    method load (line 77) | def load(self):
    method _call_model (line 112) | def _call_model(
    method transcriptions (line 201) | def transcriptions(
    method translations (line 226) | def translations(

FILE: xinference/model/audio/whisper_mlx.py
  class WhisperMLXModel (line 26) | class WhisperMLXModel:
    method __init__ (line 27) | def __init__(
    method model_ability (line 45) | def model_ability(self):
    method load (line 48) | def load(self):
    method transcriptions (line 93) | def transcriptions(
    method translations (line 112) | def translations(
    method _call (line 135) | def _call(

FILE: xinference/model/batch.py
  class BatchMixin (line 27) | class BatchMixin:
    method __init__ (line 32) | def __init__(self, func: _ExtensibleWrapper, **kwargs):
    method queue (line 48) | def queue(self):
    method _ensure_process_batch_running (line 53) | def _ensure_process_batch_running(self):
    method _get_batch_size (line 61) | def _get_batch_size(self, *args, **kwargs) -> int:
    method _process_batch (line 64) | async def _process_batch(self):
    method _wrap_method (line 107) | def _wrap_method(self):

FILE: xinference/model/cache_manager.py
  class CacheManager (line 12) | class CacheManager:
    method __init__ (line 15) | def __init__(self, model_family: "CacheableModelSpec"):
    method get_cache_dir (line 29) | def get_cache_dir(self):
    method get_cache_status (line 32) | def get_cache_status(self):
    method _cache_from_uri (line 36) | def _cache_from_uri(self, model_spec: "CacheableModelSpec") -> str:
    method _cache (line 60) | def _cache(self) -> str:
    method cache (line 117) | def cache(self) -> str:
    method register_custom_model (line 120) | def register_custom_model(self, model_type: str):
    method unregister_custom_model (line 130) | def unregister_custom_model(self, model_type: str):

FILE: xinference/model/core.py
  function create_model_instance (line 20) | def create_model_instance(
  class CacheableModelSpec (line 126) | class CacheableModelSpec(BaseModel):
  class VirtualEnvSettings (line 134) | class VirtualEnvSettings(BaseModel):

FILE: xinference/model/custom.py
  class ModelRegistry (line 29) | class ModelRegistry:
    method __init__ (line 32) | def __init__(self) -> None:
    method find_model (line 37) | def find_model(self, model_name: str):
    method get_custom_models (line 45) | def get_custom_models(self):
    method check_model_uri (line 49) | def check_model_uri(self, model_spec: "CacheableModelSpec"):
    method add_ud_model (line 56) | def add_ud_model(self, model_spec):
    method register (line 59) | def register(self, model_spec: "CacheableModelSpec", persist: bool):
    method remove_ud_model (line 83) | def remove_ud_model(self, model_spec):
    method remove_ud_model_files (line 86) | def remove_ud_model_files(self, model_spec):
    method unregister (line 92) | def unregister(
  class RegistryManager (line 110) | class RegistryManager:
    method get_registry (line 114) | def get_registry(cls, model_type: str) -> ModelRegistry:
  function migrate_from_v1_to_v2 (line 140) | def migrate_from_v1_to_v2(model_type: str, model_spec_cls: Type):

FILE: xinference/model/embedding/__init__.py
  function register_builtin_model (line 45) | def register_builtin_model():
  function register_custom_model (line 50) | def register_custom_model():
  function check_format_with_engine (line 72) | def check_format_with_engine(model_format, engine):
  function generate_engine_config_by_model_name (line 80) | def generate_engine_config_by_model_name(model_family: "EmbeddingModelFa...
  function has_downloaded_models (line 118) | def has_downloaded_models():
  function load_downloaded_models (line 127) | def load_downloaded_models():
  function load_model_family_from_json (line 144) | def load_model_family_from_json(json_filename, target_families):
  function load_downloaded_models_to_dict (line 168) | def load_downloaded_models_to_dict(target_dict):
  function _install (line 184) | def _install():

FILE: xinference/model/embedding/cache_manager.py
  class EmbeddingCacheManager (line 10) | class EmbeddingCacheManager(CacheManager):
    method __init__ (line 11) | def __init__(self, model_family: "EmbeddingModelFamilyV2"):
    method cache (line 25) | def cache(self) -> str:

FILE: xinference/model/embedding/core.py
  function get_embedding_model_descriptions (line 49) | def get_embedding_model_descriptions():
  class TransformersEmbeddingSpecV1 (line 55) | class TransformersEmbeddingSpecV1(BaseModel):
  class LlamaCppEmbeddingSpecV1 (line 64) | class LlamaCppEmbeddingSpecV1(BaseModel):
  class EmbeddingModelFamilyV2 (line 83) | class EmbeddingModelFamilyV2(BaseModel, ModelInstanceInfoMixin):
    class Config (line 93) | class Config:
    method to_description (line 96) | def to_description(self):
    method to_version_info (line 111) | def to_version_info(self):
  function get_model_version (line 125) | def get_model_version(embedding_model: EmbeddingModelFamilyV2) -> str:
  function generate_embedding_description (line 130) | def generate_embedding_description(
  class EmbeddingModel (line 142) | class EmbeddingModel(abc.ABC):
    method __init__ (line 143) | def __init__(
    method check_lib (line 166) | def check_lib(cls) -> Union[bool, Tuple[bool, str]]:
    method match_json (line 171) | def match_json(
    method match (line 180) | def match(
    method load (line 196) | def load(self):
    method _fix_langchain_openai_inputs (line 201) | def _fix_langchain_openai_inputs(
    method _text_length (line 239) | def _text_length(text):
    method _create_embedding (line 250) | def _create_embedding(
    method create_embedding (line 271) | def create_embedding(
    method create_embedding (line 279) | def create_embedding(self, args_list, kwargs_list):
    method _extract_sentences_kwargs (line 329) | def _extract_sentences_kwargs(self, args, kwargs):
    method _get_batch_size (line 363) | def _get_batch_size(self, *args, **kwargs) -> int:
    method convert_ids_to_tokens (line 370) | def convert_ids_to_tokens(
    method _clean_cache_if_needed (line 400) | def _clean_cache_if_needed(self, all_token_nums: int):
  function create_embedding_model_instance (line 416) | def create_embedding_model_instance(

FILE: xinference/model/embedding/custom.py
  class CustomEmbeddingModelFamilyV2 (line 25) | class CustomEmbeddingModelFamilyV2(EmbeddingModelFamilyV2):
  class EmbeddingModelRegistry (line 32) | class EmbeddingModelRegistry(ModelRegistry):
    method __init__ (line 35) | def __init__(self):
    method add_ud_model (line 42) | def add_ud_model(self, model_spec):
    method check_model_uri (line 48) | def check_model_uri(self, model_family: "EmbeddingModelFamilyV2"):
    method remove_ud_model (line 56) | def remove_ud_model(self, model_family: "CustomEmbeddingModelFamilyV2"):
    method remove_ud_model_files (line 62) | def remove_ud_model_files(self, model_family: "CustomEmbeddingModelFam...
  function get_user_defined_embeddings (line 72) | def get_user_defined_embeddings() -> List[EmbeddingModelFamilyV2]:
  function register_embedding (line 79) | def register_embedding(model_family: CustomEmbeddingModelFamilyV2, persi...
  function unregister_embedding (line 86) | def unregister_embedding(model_name: str, raise_error: bool = True):

FILE: xinference/model/embedding/embed_family.py
  function match_embedding (line 31) | def match_embedding(
  function check_engine_by_model_name_and_engine (line 118) | def check_engine_by_model_name_and_engine(
  function check_engine_by_model_name_and_engine_with_virtual_env (line 147) | def check_engine_by_model_name_and_engine_with_virtual_env(

FILE: xinference/model/embedding/flag/core.py
  class FlagEmbeddingModel (line 40) | class FlagEmbeddingModel(EmbeddingModel, BatchMixin):
    method __init__ (line 41) | def __init__(
    method load (line 57) | def load(self):
    method _create_embedding (line 105) | def _create_embedding(
    method check_lib (line 285) | def check_lib(cls) -> Union[bool, Tuple[bool, str]]:
    method match_json (line 292) | def match_json(

FILE: xinference/model/embedding/flag/tests/test_flag.py
  function test_embedding_model_with_flag (line 42) | async def test_embedding_model_with_flag():

FILE: xinference/model/embedding/llama_cpp/core.py
  class _Done (line 34) | class _Done:
  class _Error (line 38) | class _Error:
    method __init__ (line 39) | def __init__(self, msg):
  class XllamaCppEmbeddingModel (line 43) | class XllamaCppEmbeddingModel(EmbeddingModel, BatchMixin):
    method __init__ (line 44) | def __init__(self, *args, **kwargs) -> None:
    method _sanitize_model_config (line 52) | def _sanitize_model_config(self, llamacpp_model_config: Optional[dict]...
    method _is_darwin_and_apple_silicon (line 67) | def _is_darwin_and_apple_silicon(self):
    method _is_linux (line 70) | def _is_linux(self):
    method load (line 73) | def load(self):
    method _create_embedding (line 197) | def _create_embedding(
    method check_lib (line 232) | def check_lib(cls) -> Union[bool, Tuple[bool, str]]:
    method match_json (line 239) | def match_json(

FILE: xinference/model/embedding/llama_cpp/tests/test_llama_cpp.py
  function test_embedding_model_with_xllamacpp (line 42) | async def test_embedding_model_with_xllamacpp():

FILE: xinference/model/embedding/sentence_transformers/core.py
  class SentenceTransformerEmbeddingModel (line 32) | class SentenceTransformerEmbeddingModel(EmbeddingModel, BatchMixin):
    method __init__ (line 33) | def __init__(self, *args, **kwargs) -> None:
    method load (line 38) | def load(self):
    method _create_embedding (line 158) | def _create_embedding(
    method _normalize_vl_inputs (line 455) | def _normalize_vl_inputs(
    method _create_qwen3_vl_embedding (line 471) | def _create_qwen3_vl_embedding(self, sentences, **kwargs):
    method check_lib (line 503) | def check_lib(cls) -> Union[bool, Tuple[bool, str]]:
    method match_json (line 507) | def match_json(

FILE: xinference/model/embedding/sentence_transformers/tests/test_sentence_transformers.py
  function test_embedding_model_with_sentence_transformer (line 42) | async def test_embedding_model_with_sentence_transformer():
  function test_embedding_model_with_sentence_transformer_truncate_dim (line 81) | async def test_embedding_model_with_sentence_transformer_truncate_dim():

FILE: xinference/model/embedding/tests/test_embedding_models.py
  function test_engine_supported (line 76) | def test_engine_supported():
  function test_model_from_modelscope (line 83) | async def test_model_from_modelscope():
  function test_get_cache_status (line 103) | def test_get_cache_status():
  function test_from_local_uri (line 115) | def test_from_local_uri():
  function test_register_custom_embedding (line 143) | def test_register_custom_embedding():
  function test_register_fault_embedding (line 224) | def test_register_fault_embedding():
  function test_convert_ids_to_tokens (line 304) | def test_convert_ids_to_tokens():

FILE: xinference/model/embedding/tests/test_integrated_embedding.py
  function test_sparse_embedding (line 25) | def test_sparse_embedding(setup):
  function test_clip_embedding (line 48) | def test_clip_embedding(setup):
  function test_llama_cpp_embedding (line 83) | def test_llama_cpp_embedding(setup):

FILE: xinference/model/embedding/tests/test_qwen3_vl_engine_params.py
  function _assert_engine_params (line 35) | def _assert_engine_params(params, engine_name):
  function test_qwen3_vl_embedding_engine_params_with_virtualenv (line 44) | def test_qwen3_vl_embedding_engine_params_with_virtualenv():
  function _get_cached_model_path (line 54) | def _get_cached_model_path():
  function _get_virtualenv_site_packages (line 60) | def _get_virtualenv_site_packages(env_path: str) -> str:
  function _purge_modules (line 67) | def _purge_modules(prefixes):
  function _prepare_engine_virtualenv (line 73) | def _prepare_engine_virtualenv(engine_name: str, virtual_env_packages=No...
  function test_qwen3_vl_embedding_sentence_transformers_startup_virtualenv (line 120) | def test_qwen3_vl_embedding_sentence_transformers_startup_virtualenv():

FILE: xinference/model/embedding/vllm/core.py
  class VLLMEmbeddingModel (line 30) | class VLLMEmbeddingModel(EmbeddingModel, BatchMixin):
    method __init__ (line 31) | def __init__(self, *args, **kwargs):
    method load (line 36) | def load(self):
    method _get_detailed_instruct (line 85) | def _get_detailed_instruct(task_description: str, query: str) -> str:
    method _create_embedding (line 88) | def _create_embedding(
    method _embed_vl (line 172) | def _embed_vl(
    method check_lib (line 229) | def check_lib(cls) -> Union[bool, Tuple[bool, str]]:
    method match_json (line 236) | def match_json(
    method wait_for_load (line 262) | def wait_for_load(self):
    method _set_context_length (line 266) | def _set_context_length(self):

FILE: xinference/model/embedding/vllm/tests/test_vllm_embedding.py
  function test_embedding_model_with_vllm (line 46) | async def test_embedding_model_with_vllm():
  function test_embedding_model_with_vllm_long_text (line 86) | async def test_embedding_model_with_vllm_long_text():
  function test_change_dim (line 161) | def test_change_dim(setup):

FILE: xinference/model/flexible/__init__.py
  function register_custom_model (line 37) | def register_custom_model():
  function _install (line 54) | def _install():

FILE: xinference/model/flexible/core.py
  class FlexibleModelSpec (line 28) | class FlexibleModelSpec(CacheableModelSpec, ModelInstanceInfoMixin):
    method parser_args (line 37) | def parser_args(self):
    class Config (line 40) | class Config:
    method to_description (line 43) | def to_description(self):
    method to_version_info (line 53) | def to_version_info(self):
  function generate_flexible_model_description (line 63) | def generate_flexible_model_description(
  function get_flexible_model_descriptions (line 75) | def get_flexible_model_descriptions():
  class FlexibleModel (line 81) | class FlexibleModel:
    method __init__ (line 82) | def __init__(
    method load (line 96) | def load(self):
    method infer (line 101) | def infer(self, *args, **kwargs):
    method model_uid (line 108) | def model_uid(self):
    method model_path (line 112) | def model_path(self):
    method device (line 116) | def device(self):
    method config (line 120) | def config(self):
  function match_flexible_model (line 124) | def match_flexible_model(model_name):
  function create_flexible_model_instance (line 133) | def create_flexible_model_instance(

FILE: xinference/model/flexible/custom.py
  class FlexibleModelRegistry (line 9) | class FlexibleModelRegistry(ModelRegistry):
    method __init__ (line 12) | def __init__(self):
    method register (line 19) | def register(self, model_spec: "FlexibleModelSpec", persist: bool):
  function get_flexible_models (line 51) | def get_flexible_models():
  function register_flexible_model (line 58) | def register_flexible_model(model_spec: "FlexibleModelSpec", persist: bo...
  function unregister_flexible_model (line 65) | def unregister_flexible_model(model_name: str, raise_error: bool = True):

FILE: xinference/model/flexible/launchers/image_process_launcher.py
  class ImageRemoveBackgroundModel (line 25) | class ImageRemoveBackgroundModel(FlexibleModel):
    method infer (line 26) | def infer(self, *args, **kwargs):
  function launcher (line 58) | def launcher(model_uid: str, model_spec: FlexibleModelSpec, **kwargs) ->...

FILE: xinference/model/flexible/launchers/modelscope_launcher.py
  class ModelScopePipelineModel (line 18) | class ModelScopePipelineModel(FlexibleModel):
    method load (line 19) | def load(self):
    method infer (line 32) | def infer(self, *args, **kwargs):
  function launcher (line 36) | def launcher(model_uid: str, model_spec: FlexibleModelSpec, **kwargs) ->...

FILE: xinference/model/flexible/launchers/transformers_launcher.py
  class MockModel (line 20) | class MockModel(FlexibleModel):
    method infer (line 21) | def infer(self, *args, **kwargs):
  class AutoModel (line 25) | class AutoModel(FlexibleModel):
    method load (line 26) | def load(self):
    method infer (line 30) | def infer(self, *args, **kwargs):
  class TransformersTextClassificationModel (line 34) | class TransformersTextClassificationModel(FlexibleModel):
    method load (line 35) | def load(self):
    method infer (line 40) | def infer(self, *args, **kwargs):
  function launcher (line 44) | def launcher(model_uid: str, model_spec: FlexibleModelSpec, **kwargs) ->...

FILE: xinference/model/flexible/launchers/yolo_launcher.py
  class UltralyticsModel (line 25) | class UltralyticsModel(FlexibleModel):
    method load (line 26) | def load(self):
    method infer (line 35) | def infer(self, *args, **kwargs):
  function launcher (line 53) | def launcher(model_uid: str, model_spec: FlexibleModelSpec, **kwargs) ->...

FILE: xinference/model/flexible/tests/test_flexible_models.py
  function test_register_flexible_model (line 20) | def test_register_flexible_model():
  function test_model (line 39) | def test_model():

FILE: xinference/model/flexible/utils.py
  function get_launcher (line 18) | def get_launcher(launcher_name: str):

FILE: xinference/model/image/__init__.py
  function register_custom_model (line 43) | def register_custom_model():
  function _install (line 65) | def _install():
  function register_builtin_model (line 102) | def register_builtin_model():
  function has_downloaded_models (line 107) | def has_downloaded_models():
  function load_downloaded_models (line 114) | def load_downloaded_models():
  function load_model_family_from_json (line 129) | def load_model_family_from_json(json_filename, target_families):

FILE: xinference/model/image/cache_manager.py
  class ImageCacheManager (line 7) | class ImageCacheManager(CacheManager):
    method __init__ (line 8) | def __init__(self, model_family):
    method cache_gguf (line 24) | def cache_gguf(self, quantization: Optional[str] = None):
    method cache_lightning (line 80) | def cache_lightning(self, lightning_version: Optional[str] = None):

FILE: xinference/model/image/core.py
  function get_image_model_descriptions (line 32) | def get_image_model_descriptions():
  class ImageModelFamilyV2 (line 38) | class ImageModelFamilyV2(CacheableModelSpec, ModelInstanceInfoMixin):
    class Config (line 58) | class Config:
    method to_description (line 61) | def to_description(self):
    method to_version_info (line 77) | def to_version_info(self):
  function generate_image_description (line 106) | def generate_image_description(
  function match_diffusion (line 114) | def match_diffusion(
  function create_ocr_model_instance (line 155) | def create_ocr_model_instance(
  function create_image_model_instance (line 206) | def create_image_model_instance(
  function _select_ocr_model_family (line 341) | def _select_ocr_model_family(

FILE: xinference/model/image/custom.py
  class CustomImageModelFamilyV2 (line 24) | class CustomImageModelFamilyV2(ImageModelFamilyV2):
  class ImageModelRegistry (line 35) | class ImageModelRegistry(ModelRegistry):
    method __init__ (line 38) | def __init__(self):
  function get_user_defined_images (line 46) | def get_user_defined_images() -> List[ImageModelFamilyV2]:
  function register_image (line 53) | def register_image(model_spec: CustomImageModelFamilyV2, persist: bool):
  function unregister_image (line 68) | def unregister_image(model_name: str, raise_error: bool = True):

FILE: xinference/model/image/engine.py
  class DiffusersImageModel (line 24) | class DiffusersImageModel(DiffusionModel, ImageEngineModel):
    method match (line 30) | def match(cls, model_family: "ImageModelFamilyV2") -> bool:
  class VLLMImageModel (line 34) | class VLLMImageModel(ImageEngineModel):
    method match (line 36) | def match(cls, model_family: "ImageModelFamilyV2") -> bool:
    method check_lib (line 41) | def check_lib(cls):
  class SGLangImageModel (line 48) | class SGLangImageModel(ImageEngineModel):
    method match (line 50) | def match(cls, model_family: "ImageModelFamilyV2") -> bool:
    method check_lib (line 55) | def check_lib(cls):
  function register_builtin_image_engines (line 62) | def register_builtin_image_engines() -> None:

FILE: xinference/model/image/engine_family.py
  class ImageEngineModel (line 25) | class ImageEngineModel:
    method __init__ (line 28) | def __init__(self, *args: Any, **kwargs: Any) -> None:
    method match (line 32) | def match(cls, model_family: "ImageModelFamilyV2") -> bool:
    method check_lib (line 36) | def check_lib(cls) -> Union[bool, Tuple[bool, str]]:
  function check_engine_by_model_name_and_engine (line 48) | def check_engine_by_model_name_and_engine(
  function check_engine_by_model_name_and_engine_with_virtual_env (line 81) | def check_engine_by_model_name_and_engine_with_virtual_env(
  function generate_engine_config_by_model_name (line 134) | def generate_engine_config_by_model_name(model_family: "ImageModelFamily...

FILE: xinference/model/image/ocr/__init__.py
  function register_builtin_ocr_engines (line 36) | def register_builtin_ocr_engines() -> None:

FILE: xinference/model/image/ocr/deepseek_ocr.py
  class DeepSeekOCRModelSize (line 35) | class DeepSeekOCRModelSize:
    method __init__ (line 44) | def __init__(self, size_type: str):
    method from_string (line 64) | def from_string(cls, size_str: str) -> "DeepSeekOCRModelSize":
    method __str__ (line 68) | def __str__(self) -> str:
  function load_image (line 72) | def load_image(image_path: str) -> Optional[PIL.Image.Image]:
  function find_closest_aspect_ratio (line 87) | def find_closest_aspect_ratio(
  function dynamic_preprocess (line 112) | def dynamic_preprocess(
  function normalize_transform (line 166) | def normalize_transform(
  class BasicImageTransform (line 183) | class BasicImageTransform:
    method __init__ (line 186) | def __init__(
    method __call__ (line 206) | def __call__(self, x: PIL.Image.Image) -> torch.Tensor:
  function re_match (line 210) | def re_match(text: str) -> Tuple[List[Tuple], List[str], List[str]]:
  function extract_coordinates_and_label (line 225) | def extract_coordinates_and_label(
  function draw_bounding_boxes (line 239) | def draw_bounding_boxes(
  function process_image_with_refs (line 339) | def process_image_with_refs(
  function clean_ocr_annotations (line 347) | def clean_ocr_annotations(text: str) -> str:
  function extract_text_blocks (line 374) | def extract_text_blocks(text: str) -> List[Dict[str, Any]]:
  class DeepSeekOCRModel (line 416) | class DeepSeekOCRModel(OCRModel):
    method match (line 420) | def match(cls, model_family: "ImageModelFamilyV2") -> bool:
    method __init__ (line 424) | def __init__(
    method model_ability (line 445) | def model_ability(self):
    method load (line 448) | def load(self):
    method ocr (line 487) | def ocr(
    method visualize_ocr (line 583) | def visualize_ocr(
    method _visualize_single (line 653) | def _visualize_single(
    method _ocr_single (line 795) | def _ocr_single(
    method infer (line 940) | def infer(

FILE: xinference/model/image/ocr/got_ocr2.py
  class GotOCR2Model (line 28) | class GotOCR2Model(OCRModel):
    method match (line 32) | def match(cls, model_family: "ImageModelFamilyV2") -> bool:
    method __init__ (line 35) | def __init__(
    method model_ability (line 56) | def model_ability(self):
    method load (line 59) | def load(self):
    method ocr (line 75) | def ocr(

FILE: xinference/model/image/ocr/hunyuan_ocr.py
  class HunyuanOCRModel (line 29) | class HunyuanOCRModel(OCRModel):
    method match (line 33) | def match(cls, model_family: "ImageModelFamilyV2") -> bool:
    method __init__ (line 36) | def __init__(
    method model_ability (line 54) | def model_ability(self):
    method _load (line 57) | def _load(self):
    method load (line 82) | def load(self):
    method ocr (line 86) | def ocr(self, image: PIL.Image.Image, prompt: Optional[str] = None, **...

FILE: xinference/model/image/ocr/mlx.py
  class MLXDeepSeekOCRModel (line 30) | class MLXDeepSeekOCRModel(DeepSeekOCRModel):
    method __init__ (line 33) | def __init__(
    method match (line 45) | def match(cls, model_family) -> bool:
    method check_lib (line 50) | def check_lib(cls):
    method load (line 55) | def load(self):
    method ocr (line 245) | def ocr(
    method _ocr_single (line 257) | def _ocr_single(
    method _prepare_inputs (line 278) | def _prepare_inputs(self, image: PIL.Image.Image, prompt: str):
    method _generate_text (line 293) | def _generate_text(self, image: PIL.Image.Image, prompt: str, **kwargs...

FILE: xinference/model/image/ocr/ocr_family.py
  class OCRModel (line 25) | class OCRModel:
    method __init__ (line 28) | def __init__(self, *args: Any, **kwargs: Any) -> None:
    method match (line 32) | def match(cls, model_family: "ImageModelFamilyV2") -> bool:
    method check_lib (line 36) | def check_lib(cls) -> Union[bool, Tuple[bool, str]]:
  function check_engine_by_model_name_and_engine (line 48) | def check_engine_by_model_name_and_engine(
  function check_engine_by_model_name_and_engine_with_virtual_env (line 77) | def check_engine_by_model_name_and_engine_with_virtual_env(
  function generate_engine_config_by_model_name (line 126) | def generate_engine_config_by_model_name(model_family: "ImageModelFamily...

FILE: xinference/model/image/ocr/paddleocr_vl.py
  class PaddleOCRVLModel (line 29) | class PaddleOCRVLModel(OCRModel):
    method match (line 35) | def match(cls, model_family: "ImageModelFamilyV2") -> bool:
    method __init__ (line 38) | def __init__(
    method model_ability (line 59) | def model_ability(self):
    method load (line 62) | def load(self):
    method ocr (line 99) | def ocr(
    method _process_single (line 173) | def _process_single(

FILE: xinference/model/image/ocr/vllm.py
  function _load_vllm_model (line 29) | def _load_vllm_model(model_path: str, model_kwargs: Dict[str, Any]):
  function _sanitize_vllm_kwargs (line 54) | def _sanitize_vllm_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
  function _filter_engine_args (line 69) | def _filter_engine_args(model_kwargs: Dict[str, Any]) -> Dict[str, Any]:
  function _build_sampling_params (line 82) | def _build_sampling_params(kwargs: Dict[str, Any]):
  function _extract_text (line 107) | def _extract_text(outputs: List[Any]) -> List[str]:
  function _shutdown_vllm_model (line 117) | def _shutdown_vllm_model(model: Any) -> None:
  class VLLMDeepSeekOCRModel (line 146) | class VLLMDeepSeekOCRModel(DeepSeekOCRModel):
    method load (line 149) | def load(self):
    method stop (line 154) | def stop(self):
    method _prepare_inputs (line 159) | def _prepare_inputs(
    method ocr (line 167) | def ocr(
    method visualize_ocr (line 200) | def visualize_ocr(
  class VLLMGotOCR2Model (line 232) | class VLLMGotOCR2Model(GotOCR2Model):
  class VLLMHunyuanOCRModel (line 236) | class VLLMHunyuanOCRModel(HunyuanOCRModel):
    method load (line 239) | def load(self):
    method stop (line 249) | def stop(self):
    method _build_prompt (line 255) | def _build_prompt(self, image: PIL.Image.Image, prompt: str) -> str:
    method ocr (line 272) | def ocr(
  class VLLMPaddleOCRVLModel (line 307) | class VLLMPaddleOCRVLModel(PaddleOCRVLModel):

FILE: xinference/model/image/scheduler/flux.py
  class Text2ImageRequest (line 36) | class Text2ImageRequest:
    method __init__ (line 37) | def __init__(
    method _set_width_and_height (line 71) | def _set_width_and_height(self):
    method set_generate_kwargs (line 74) | def set_generate_kwargs(self, generate_kwargs: Dict):
    method prompt (line 78) | def prompt(self):
    method n (line 82) | def n(self):
    method size (line 86) | def size(self):
    method response_format (line 90) | def response_format(self):
    method kwargs (line 94) | def kwargs(self):
    method width (line 98) | def width(self):
    method height (line 102) | def height(self):
    method generate_kwargs (line 106) | def generate_kwargs(self):
    method request_id (line 110) | def request_id(self):
  class FluxBatchSchedulerActor (line 114) | class FluxBatchSchedulerActor(xo.StatelessActor):
    method gen_uid (line 116) | def gen_uid(cls, model_uid: str):
    method __init__ (line 119) | def __init__(self):
    method set_model (line 129) | def set_model(self, model):
    method __post_create__ (line 135) | async def __post_create__(self):
    method __pre_destroy__ (line 144) | async def __pre_destroy__(self):
    method add_request (line 154) | async def add_request(self, unique_id: str, future, *args, **kwargs):
    method abort_request (line 163) | async def abort_request(self, req_id: str) -> str:
    method _handle_request (line 178) | def _handle_request(
    method _empty_cache (line 226) | def _empty_cache():
    method step (line 231) | async def step(self):
    method run (line 263) | async def run(self):
  function _cat_tensors (line 275) | def _cat_tensors(infos: List[Dict]) -> Dict:
  function _batch_text_to_image_internal (line 286) | def _batch_text_to_image_internal(
  function _batch_text_to_image (line 511) | def _batch_text_to_image(

FILE: xinference/model/image/sdapi.py
  class SDAPIToDiffusersConverter (line 22) | class SDAPIToDiffusersConverter:
    method convert_to_diffusers (line 56) | def convert_to_diffusers(sd_type: str, params: dict) -> dict:
    method get_available_args (line 72) | def get_available_args(sd_type: str) -> set:
  class SDAPIDiffusionModelMixin (line 78) | class SDAPIDiffusionModelMixin:
    method _check_kwargs (line 80) | def _check_kwargs(sd_type: str, kwargs: dict):
    method txt2img (line 106) | def txt2img(self, **kwargs):
    method _decode_b64_img (line 118) | def _decode_b64_img(img_str: str) -> Image:
    method img2img (line 127) | def img2img(self, **kwargs):

FILE: xinference/model/image/stable_diffusion/core.py
  function model_accept_param (line 70) | def model_accept_param(params: Union[str, List[str]], model: Any) -> bool:
  class DiffusionModel (line 87) | class DiffusionModel(SDAPIDiffusionModelMixin):
    method __init__ (line 88) | def __init__(
    method model_ability (line 130) | def model_ability(self):
    method _is_flux2_model (line 133) | def _is_flux2_model(self) -> bool:
    method _get_pipeline_type (line 140) | def _get_pipeline_type(ability: str) -> type:
    method _get_controlnet_model (line 151) | def _get_controlnet_model(self, name: str, path: str):
    method _get_model (line 162) | def _get_model(
    method _apply_lora (line 221) | def _apply_lora(self):
    method _get_layer_cls (line 234) | def _get_layer_cls(self, layer: str):
    method load (line 242) | def load(self):
    method _should_use_batching (line 378) | def _should_use_batching(self) -> bool:
    method _get_quantize_config (line 384) | def _get_quantize_config(self, method: str, quantization: str, module:...
    method _quantize_text_encoder (line 436) | def _quantize_text_encoder(self, quantize_text_encoder: Optional[str]):
    method _quantize_transformer (line 475) | def _quantize_transformer(self):
    method _quantize_transformer_gguf (line 511) | def _quantize_transformer_gguf(self):
    method _process_lightning (line 527) | def _process_lightning(self, kwargs):
    method _load_to_device (line 564) | def _load_to_device(self, model):
    method get_max_num_images_for_batching (line 593) | def get_max_num_images_for_batching(self):
    method _get_scheduler (line 597) | def _get_scheduler(model: Any, sampler_name: str):
    method _need_set_scheduler (line 681) | def _need_set_scheduler(self, scheduler: Any) -> bool:
    method _reset_when_done (line 693) | def _reset_when_done(self, model: Any, sampler_name: str):
    method _release_after (line 708) | def _release_after():
    method _wrap_deepcache (line 718) | def _wrap_deepcache(self, model: Any):
    method _process_progressor (line 730) | def _process_progressor(kwargs: dict):
    method _call_model (line 749) | def _call_model(
    method _filter_kwargs (line 794) | def _filter_kwargs(cls, model, kwargs: dict):
    method text_to_image (line 805) | async def text_to_image(
    method _ensure_scheduler_started (line 832) | async def _ensure_scheduler_started(self):
    method _gen_config_for_lightning (line 837) | def _gen_config_for_lightning(self, kwargs):
    method _direct_text_to_image (line 849) | async def _direct_text_to_image(
    method abort_request (line 871) | async def abort_request(self, request_id: str) -> str:
    method pad_to_multiple (line 884) | def pad_to_multiple(image, multiple=8):
    method _model_expects_four_channel_input (line 892) | def _model_expects_four_channel_input(model: Any) -> bool:
    method _ensure_four_channel_image (line 898) | def _ensure_four_channel_image(image: Any, model: Any):
    method _ensure_three_channel_image (line 919) | def _ensure_three_channel_image(image: Any):
    method image_to_image (line 932) | def image_to_image(
    method inpainting (line 997) | def inpainting(

FILE: xinference/model/image/stable_diffusion/mlx.py
  function quantization_predicate (line 36) | def quantization_predicate(name: str, m) -> bool:
  function to_latent_size (line 40) | def to_latent_size(image_size: Tuple[int, int]):
  class MLXDiffusionModel (line 54) | class MLXDiffusionModel(SDAPIDiffusionModelMixin):
    method __init__ (line 55) | def __init__(
    method model_ability (line 81) | def model_ability(self):
    method support_model (line 85) | def support_model(model_name: str) -> bool:
    method load (line 88) | def load(self):
    method _apply_lora (line 118) | def _apply_lora(self):
    method _release_after (line 136) | def _release_after():
    method text_to_image (line 145) | def text_to_image(
    method image_to_image (line 217) | def image_to_image(self, **kwargs):
    method inpainting (line 220) | def inpainting(self, **kwargs):

FILE: xinference/model/image/tests/test_got_ocr2.py
  function test_got_ocr2 (line 30) | def test_got_ocr2(setup):

FILE: xinference/model/image/tests/test_stable_diffusion.py
  function test_model (line 48) | async def test_model():
  function test_progressor (line 73) | async def test_progressor():
  function test_restful_api_for_image_with_canny_controlnet (line 117) | def test_restful_api_for_image_with_canny_controlnet(setup):
  function test_restful_api_for_image_with_mlsd_controlnet (line 159) | def test_restful_api_for_image_with_mlsd_controlnet(setup):
  function test_restful_api_abort (line 205) | def test_restful_api_abort(setup, model_name):
  function test_restful_api_for_sd_turbo (line 263) | def test_restful_api_for_sd_turbo(setup, model_name):
  function test_restful_api_for_sd_image2image (line 306) | def test_restful_api_for_sd_image2image(setup):
  function test_restful_api_for_sd_inpainting (line 342) | def test_restful_api_for_sd_inpainting(setup):
  function test_get_cache_status (line 383) | def test_get_cache_status():
  function test_register_custom_image (line 395) | def test_register_custom_image():
  function test_persist_custom_image (line 418) | def test_persist_custom_image():
  function test_launch_custom_image (line 450) | def test_launch_custom_image(setup):
  function test_launch_custom_image_with_controlnet (line 511) | def test_launch_custom_image_with_controlnet(setup):

FILE: xinference/model/image/utils.py
  function get_model_version (line 30) | def get_model_version(
  function _flatten_images (line 40) | def _flatten_images(images):
  function _needs_png (line 52) | def _needs_png(image) -> bool:
  function handle_image_result (line 60) | def handle_image_result(response_format: str, images) -> ImageList:

FILE: xinference/model/llm/__init__.py
  function register_builtin_model (line 51) | def register_builtin_model():
  function check_format_with_engine (line 56) | def check_format_with_engine(model_format, engine):
  function generate_engine_config_by_model_family (line 65) | def generate_engine_config_by_model_family(model_family: "LLMFamilyV2"):
  function register_custom_model (line 122) | def register_custom_model():
  function has_downloaded_models (line 142) | def has_downloaded_models():
  function load_downloaded_models (line 151) | def load_downloaded_models():
  function load_model_family_from_json (line 168) | def load_model_family_from_json(json_filename, target_families):
  function _install (line 220) | def _install():

FILE: xinference/model/llm/cache_manager.py
  class LLMCacheManager (line 28) | class LLMCacheManager(CacheManager):
    method __init__ (line 29) | def __init__(
    method cache_uri (line 51) | def cache_uri(self) -> str:
    method cache_from_huggingface (line 75) | def cache_from_huggingface(self) -> str:
    method cache_from_modelscope (line 148) | def cache_from_modelscope(self) -> str:
    method cache_from_openmind_hub (line 223) | def cache_from_openmind_hub(self) -> str:
    method cache_from_csghub (line 252) | def cache_from_csghub(self) -> str:
    method cache (line 314) | def cache(self) -> str:

FILE: xinference/model/llm/config_parser.py
  function _resolve_config_and_dir (line 8) | def _resolve_config_and_dir(model_path: str) -> Tuple[str, str]:
  function _load_json_file (line 23) | def _load_json_file(path: str) -> Dict[str, Any]:
  function _load_tokenizer_config (line 28) | def _load_tokenizer_config(model_dir: str) -> Optional[Dict[str, Any]]:
  function _load_chat_template_file (line 35) | def _load_chat_template_file(model_dir: str) -> Optional[str]:
  function _get_first_value (line 44) | def _get_first_value(config: Dict[str, Any], *keys: str) -> Optional[Any]:
  function _infer_context_length (line 52) | def _infer_context_length(config: Dict[str, Any]) -> int:
  function _normalize_architectures (line 64) | def _normalize_architectures(config: Dict[str, Any]) -> List[str]:
  function _match_family_by_architectures (line 73) | def _match_family_by_architectures(
  function _load_builtin_families (line 90) | def _load_builtin_families() -> List[Dict[str, Any]]:
  function _infer_languages (line 102) | def _infer_languages(config: Dict[str, Any]) -> List[str]:
  function _format_size_in_billions (line 111) | def _format_size_in_billions(size_in_billions: float) -> Union[int, str]:
  function _extract_numeric_size (line 118) | def _extract_numeric_size(value: Any) -> Optional[float]:
  function _infer_model_size_in_billions (line 143) | def _infer_model_size_in_billions(config: Dict[str, Any]) -> Optional[Un...
  function _infer_quantization (line 206) | def _infer_quantization(config: Dict[str, Any], model_format: str) -> str:
  function _extract_chat_template (line 228) | def _extract_chat_template(tokenizer_config: Optional[Dict[str, Any]]) -...
  function _infer_model_format (line 237) | def _infer_model_format(config: Dict[str, Any]) -> str:
  function build_llm_registration_from_local_config (line 255) | def build_llm_registration_from_local_config(

FILE: xinference/model/llm/core.py
  function get_llm_version_infos (line 41) | def get_llm_version_infos():
  class LLM (line 47) | class LLM(abc.ABC):
    method __init__ (line 50) | def __init__(
    method check_lib (line 73) | def check_lib(cls) -> Union[bool, Tuple[bool, str]]:
    method _is_darwin_and_apple_silicon (line 77) | def _is_darwin_and_apple_silicon():
    method _is_linux (line 81) | def _is_linux():
    method _has_cuda_device (line 86) | def _has_cuda_device():
    method _has_mlu_device (line 109) | def _has_mlu_device():
    method _has_vacc_device (line 126) | def _has_vacc_device():
    method _has_musa_device (line 140) | def _has_musa_device():
    method _get_cuda_count (line 166) | def _get_cuda_count():
    method load (line 184) | def load(self):
    method match (line 188) | def match(
    method match_json (line 199) | def match_json(
    method prepare_parse_reasoning_content (line 204) | def prepare_parse_reasoning_content(
    method prepare_parse_tool_calls (line 220) | def prepare_parse_tool_calls(self):
  function generate_llm_version_info (line 240) | def generate_llm_version_info(llm_family: "LLMFamilyV2") -> Dict[str, Li...
  function create_llm_model_instance (line 259) | def create_llm_model_instance(

FILE: xinference/model/llm/custom.py
  class LLMModelRegistry (line 30) | class LLMModelRegistry(ModelRegistry):
    method __init__ (line 33) | def __init__(self):
    method add_ud_model (line 40) | def add_ud_model(self, model_spec):
    method check_model_uri (line 46) | def check_model_uri(self, llm_family: "LLMFamilyV2"):
    method remove_ud_model (line 54) | def remove_ud_model(self, llm_family: "LLMFamilyV2"):
    method remove_ud_model_files (line 60) | def remove_ud_model_files(self, llm_family: "LLMFamilyV2"):
  function get_user_defined_llm_families (line 70) | def get_user_defined_llm_families():
  function register_llm (line 77) | def register_llm(llm_family: "LLMFamilyV2", persist: bool):
  function unregister_llm (line 84) | def unregister_llm(model_name: str, raise_error: bool = True):

FILE: xinference/model/llm/harmony.py
  class HarmonyStreamParser (line 22) | class HarmonyStreamParser:
    method __init__ (line 23) | def __init__(self):
    method feed (line 29) | def feed(self, text):
  function async_stream_harmony_chat_completion (line 123) | async def async_stream_harmony_chat_completion(

FILE: xinference/model/llm/llama_cpp/core.py
  function _schema_to_grammar (line 33) | def _schema_to_grammar(schema: Dict[str, Any]) -> Optional[str]:
  function _apply_response_format (line 46) | def _apply_response_format(generate_config: Dict[str, Any]) -> None:
  class _Done (line 63) | class _Done:
  class _Error (line 67) | class _Error:
    method __init__ (line 68) | def __init__(self, msg):
  class XllamaCppModel (line 72) | class XllamaCppModel(LLM, ChatModelMixin):
    method __init__ (line 75) | def __init__(
    method _sanitize_model_config (line 87) | def _sanitize_model_config(self, llamacpp_model_config: Optional[dict]...
    method check_lib (line 112) | def check_lib(cls) -> Union[bool, Tuple[bool, str]]:
    method match_json (line 119) | def match_json(
    method load (line 131) | def load(self):
    method generate (line 273) | def generate(
    method chat (line 331) | def chat(

FILE: xinference/model/llm/llama_cpp/tests/test_gguf.py
  function test_gguf (line 22) | def test_gguf(setup):
  function surprise_image_base64 (line 42) | def surprise_image_base64():
  function test_gguf_multimodal (line 51) | def test_gguf_multimodal(setup, surprise_image_base64):

FILE: xinference/model/llm/llama_cpp/tests/test_structured.py
  class CarType (line 18) | class CarType(str, Enum):
  class CarDescription (line 25) | class CarDescription(BaseModel):
  function _load_json_from_message (line 31) | def _load_json_from_message(message: Any) -> Dict[str, Any]:
  function test_apply_response_format_sets_grammar (line 65) | def test_apply_response_format_sets_grammar(monkeypatch):
  function test_apply_response_format_handles_conversion_failure (line 89) | def test_apply_response_format_handles_conversion_failure(monkeypatch):
  function test_apply_response_format_ignores_non_schema (line 116) | def test_apply_response_format_ignores_non_schema(monkeypatch):
  function test_apply_response_format_uses_real_xllamacpp_if_available (line 123) | def test_apply_response_format_uses_real_xllamacpp_if_available():
  function test_llamacpp_qwen3_json_schema (line 151) | def test_llamacpp_qwen3_json_schema(setup):

FILE: xinference/model/llm/llm_family.py
  class LlamaCppLLMSpecV2 (line 52) | class LlamaCppLLMSpecV2(BaseModel):
    method validate_model_size_with_radix (line 69) | def validate_model_size_with_radix(cls, v: object) -> object:
  class PytorchLLMSpecV2 (line 80) | class PytorchLLMSpecV2(BaseModel):
    method validate_model_size_with_radix (line 93) | def validate_model_size_with_radix(cls, v: object) -> object:
  class MLXLLMSpecV2 (line 104) | class MLXLLMSpecV2(BaseModel):
    method validate_model_size_with_radix (line 117) | def validate_model_size_with_radix(cls, v: object) -> object:
  class LLMFamilyV2 (line 128) | class LLMFamilyV2(BaseModel, ModelInstanceInfoMixin):
    class Config (line 160) | class Config:
    method _resolve_architectures (line 163) | def _resolve_architectures(self) -> Optional[List[str]]:
    method has_architecture (line 173) | def has_architecture(self, *architectures: str) -> bool:
    method matches_supported_architectures (line 179) | def matches_supported_architectures(
    method to_description (line 187) | def to_description(self):
    method to_version_info (line 207) | def to_version_info(self):
  class CustomLLMFamilyV2 (line 235) | class CustomLLMFamilyV2(LLMFamilyV2):
    method parse_raw (line 237) | def parse_raw(
  function register_transformer (line 347) | def register_transformer(cls):
  function cache_model_tokenizer_and_config (line 365) | def cache_model_tokenizer_and_config(
  function cache_model_config (line 413) | def cache_model_config(llm_family: LLMFamilyV2):
  function _get_cache_dir_for_model_mem (line 438) | def _get_cache_dir_for_model_mem(
  function match_model_size (line 462) | def match_model_size(
  function convert_model_size_to_float (line 481) | def convert_model_size_to_float(
  function match_llm (line 495) | def match_llm(
  function check_engine_by_spec_parameters (line 593) | def check_engine_by_spec_parameters(
  function check_engine_by_spec_parameters_with_virtual_env (line 625) | def check_engine_by_spec_parameters_with_virtual_env(

FILE: xinference/model/llm/lmdeploy/core.py
  class LMDeployModelConfig (line 50) | class LMDeployModelConfig(TypedDict, total=False):
  class LMDeployGenerateConfig (line 68) | class LMDeployGenerateConfig(TypedDict, total=False):
  class LMDeployModel (line 84) | class LMDeployModel(LLM):
    method __init__ (line 87) | def __init__(
    method _sanitize_model_config (line 102) | def _sanitize_model_config(
    method load (line 112) | def load(self):
    method check_lib (line 126) | def check_lib(cls) -> Union[bool, Tuple[bool, str]]:
    method match_json (line 133) | def match_json(
    method generate (line 138) | def generate(
  class LMDeployChatModel (line 146) | class LMDeployChatModel(LMDeployModel, ChatModelMixin):
    method load (line 147) | def load(self):
    method match_json (line 185) | def match_json(
    method async_chat (line 202) | async def async_chat(
    method _chat_stream (line 229) | async def _chat_stream(self, messages, include_usage):
    method _chat (line 279) | async def _chat(self, messages) -> ChatCompletion:
    method _generate (line 306) | async def _generate(
    method _get_prompt_input (line 478) | async def _get_prompt_input(

FILE: xinference/model/llm/memory.py
  class ModelLayersInfo (line 43) | class ModelLayersInfo:
  class ModelMemInfo (line 52) | class ModelMemInfo:
  function estimate_llm_gpu_memory (line 100) | def estimate_llm_gpu_memory(
  function estimate_llm_gpu_memory_details (line 130) | def estimate_llm_gpu_memory_details(
  function _load_item_from_json (line 182) | def _load_item_from_json(config_data: Any, *keys: str) -> str:
  function load_model_config_json (line 191) | def load_model_config_json(config_path: str) -> ModelLayersInfo:
  function get_model_layers_info (line 213) | def get_model_layers_info(
  function _get_default_layers_from_size (line 238) | def _get_default_layers_from_size(size_in_billion: float) -> ModelLayers...
  function _convert_to_mb_model_size (line 275) | def _convert_to_mb_model_size(model_size: float, quantization: Optional[...
  function _compute_inference_only_activation_memory (line 289) | def _compute_inference_only_activation_memory(
  function _compute_model_size_gguf (line 302) | def _compute_model_size_gguf(info: ModelLayersInfo, quantization: str) -...

FILE: xinference/model/llm/mlx/core.py
  class MLXBatchModel (line 64) | class MLXBatchModel:
    method __init__ (line 78) | def __init__(
    method _get_lock (line 92) | def _get_lock() -> asyncio.Lock:
    method _get_or_create_generator (line 98) | def _get_or_create_generator(self, temperature: float, top_p: float):
    method _ensure_background_worker (line 132) | def _ensure_background_worker(self, gen_dict):
    method _background_worker (line 138) | async def _background_worker(self, gen_dict):
    method generate_stream (line 188) | async def generate_stream(
    method generate (line 332) | async def generate(
  class MLXModelConfig (line 357) | class MLXModelConfig(TypedDict, total=False):
  class MLXGenerateConfig (line 371) | class MLXGenerateConfig(TypedDict, total=False):
  class PromptCache (line 387) | class PromptCache:
  function get_context_length (line 393) | def get_context_length(config: dict) -> int:
  class MLXModel (line 410) | class MLXModel(LLM):
    method __init__ (line 414) | def __init__(
    method set_loop (line 444) | def set_loop(self, loop: asyncio.AbstractEventLoop):
    method _cleanup_memory (line 449) | def _cleanup_memory(self):
    method driver_info (line 460) | def driver_info(self) -> Optional[dict]:
    method set_shard_info (line 463) | def set_shard_info(self, shard: int, address: str):
    method get_rank_addresses (line 471) | async def get_rank_addresses(self) -> Optional[Dict[int, str]]:
    method _sanitize_model_config (line 475) | def _sanitize_model_config(
    method _sanitize_generate_config (line 485) | def _sanitize_generate_config(
    method _load_model (line 505) | def _load_model(self, **kwargs):
    method _load_model_shard (line 552) | def _load_model_shard(self, **kwargs):
    method _get_classes (line 617) | def _get_classes(config: dict):
    method load (line 643) | def load(self):
    method wait_for_load (line 695) | def wait_for_load(self):
    method check_lib (line 733) | def check_lib(cls) -> Union[bool, Tuple[bool, str]]:
    method match_json (line 740) | def match_json(
    method _get_prompt_cache (line 753) | def _get_prompt_cache(
    method _generate_stream_inner (line 778) | def _generate_stream_inner(self, **kwargs):
    method _prepare_inputs (line 812) | def _prepare_inputs(
    method _generate_stream (line 821) | def _generate_stream(
    method _run_non_drivers (line 928) | def _run_non_drivers(
    method async_generate (line 959) | async def async_generate(
  class MLXChatModel (line 1106) | class MLXChatModel(MLXModel, ChatModelMixin):
    method _sanitize_generate_config (line 1109) | def _sanitize_generate_config(
    method match_json (line 1125) | def match_json(
    method async_chat (line 1138) | async def async_chat(
  class MLXVisionModel (line 1193) | class MLXVisionModel(MLXModel, ChatModelMixin):
    method check_lib (line 1197) | def check_lib(cls) -> Union[bool, Tuple[bool, str]]:
    method match_json (line 1204) | def match_json(
    method generate (line 1215) | def generate(
    method wait_for_load (line 1270) | def wait_for_load(self):
    method _load_model (line 1278) | def _load_model(self, **kwargs):
    method load (line 1294) | def load(self):
    method _generate_stream_inner (line 1317) | def _generate_stream_inner(self, **kwargs):
    method _prepare_inputs (line 1380) | def _prepare_inputs(
    method chat (line 1439) | def chat(

FILE: xinference/model/llm/mlx/distributed_models/core.py
  class ReceiverActor (line 32) | class ReceiverActor(xo.StatelessActor):
    method __init__ (line 33) | def __init__(self, *args, **kwargs):
    method gen_uid (line 39) | def gen_uid(cls, uid: str, rank: int):
    method send (line 42) | async def send(self, data: "mx.array"):
    method recv (line 49) | async def recv(self):
  class DistributedModelMixin (line 53) | class DistributedModelMixin:
    method __init__ (line 63) | def __init__(self):
    method prepare (line 73) | def prepare(self):
    method _send_stage_result (line 83) | def _send_stage_result(self, result: "mx.array"):
    method _wait_prev_stage_result (line 109) | def _wait_prev_stage_result(self):
    method _broadcast_result (line 122) | def _broadcast_result(self, result: "mx.array"):
    method _get_result (line 144) | def _get_result(self) -> "mx.array":
    method pipeline (line 154) | def pipeline(self):
  class SafeKVCache (line 167) | class SafeKVCache:
    method __init__ (line 174) | def __init__(self):
    method state (line 180) | def state(self):
    method state (line 193) | def state(self, v):
    method __getattr__ (line 203) | def __getattr__(self, name):

FILE: xinference/model/llm/mlx/distributed_models/deepseek_v3.py
  class DeepseekV3Model (line 27) | class DeepseekV3Model(_DeepseekV3Model, DistributedModelMixin):
    method __init__ (line 28) | def __init__(self, *args, **kwargs):
    method __call__ (line 32) | def __call__(
  class Model (line 69) | class Model(_Model):
    method __init__ (line 70) | def __init__(self, config: ModelArgs):

FILE: xinference/model/llm/mlx/distributed_models/qwen2.py
  class Qwen2Model (line 30) | class Qwen2Model(_Qwen2Model, DistributedModelMixin):
    method __init__ (line 31) | def __init__(self, *args, **kwargs):
    method __call__ (line 35) | def __call__(
  class Model (line 74) | class Model(_Model):
    method __init__ (line 75) | def __init__(self, args: ModelArgs):

FILE: xinference/model/llm/mlx/distributed_models/qwen3.py
  class Qwen3Model (line 30) | class Qwen3Model(_Qwen3Model, DistributedModelMixin):
    method __init__ (line 31) | def __init__(self, *args, **kwargs):
    method __call__ (line 35) | def __call__(
  class Model (line 75) | class Model(_Model):
    method __init__ (line 76) | def __init__(self, args: ModelArgs):

FILE: xinference/model/llm/mlx/distributed_models/qwen3_moe.py
  class Qwen3MoeModel (line 29) | class Qwen3MoeModel(_Qwen3MoeModel, DistributedModelMixin):
    method __init__ (line 30) | def __init__(self, *args, **kwargs):
    method __call__ (line 34) | def __call__(
  class Model (line 70) | class Model(_Model):
    method __init__ (line 71) | def __init__(self, args: ModelArgs):

FILE: xinference/model/llm/mlx/tests/test_distributed_model.py
  class ModelActor (line 25) | class ModelActor(xo.StatelessActor):
    method __init__ (line 26) | def __init__(self, rank: int, model_uid: str, model_path: str):
    method set_rank_addresses (line 33) | def set_rank_addresses(self, rank_addresses):
    method _load (line 36) | def _load(self):
    method load (line 71) | async def load(self):
    method _generate (line 74) | def _generate(self, prompt: str, **kwargs):
    method generate (line 84) | async def generate(self, prompt: str, **kwargs):
  function setup_pool (line 89) | async def setup_pool():
  function test_distributed (line 102) | async def test_distributed(setup_pool):

FILE: xinference/model/llm/mlx/tests/test_mlx.py
  class InferenceThread (line 26) | class InferenceThread(threading.Thread):
    method __init__ (line 29) | def __init__(self, prompt, generate_config, model):
    method run (line 37) | def run(self):
    method join (line 62) | def join(self, timeout=None):
  function test_load_mlx (line 73) | def test_load_mlx(setup):
  function test_load_mlx_vision (line 102) | def test_load_mlx_vision(setup):
  function test_mlx_parallel_inference (line 153) | def test_mlx_parallel_inference(setup):

FILE: xinference/model/llm/reasoning_parser.py
  class ReasoningParser (line 12) | class ReasoningParser:
    method __init__ (line 15) | def __init__(
    method extract_reasoning_content_streaming (line 32) | def extract_reasoning_content_streaming(
    method extract_reasoning_content (line 128) | def extract_reasoning_content(
    method check_content_parser (line 162) | def check_content_parser(self) -> bool:
    method _create_chat_completion_chunk (line 172) | def _create_chat_completion_chunk(
    method _create_completion_chunk (line 200) | def _create_completion_chunk(
    method is_enable_thinking (line 227) | def is_enable_thinking(self):
    method prepare_reasoning_content_streaming (line 233) | async def prepare_reasoning_content_streaming(
    method prepare_reasoning_content_sync (line 302) | def prepare_reasoning_content_sync(self, chunks: Iterator[CompletionCh...
    method prepare_reasoning_content (line 365) | def prepare_reasoning_content(self, completion):
    method prepare_first_reasoning_content_chunk (line 392) | def prepare_first_reasoning_content_chunk(

FILE: xinference/model/llm/sglang/core.py
  class SGLANGModelConfig (line 50) | class SGLANGModelConfig(TypedDict, total=False):
  class SGLANGGenerateConfig (line 66) | class SGLANGGenerateConfig(TypedDict, total=False):
  class SGLANGModel (line 116) | class SGLANGModel(LLM):
    method __init__ (line 119) | def __init__(
    method driver_info (line 137) | def driver_info(self) -> Optional[dict]:
    method load (line 140) | def load(self):
    method wait_for_load (line 241) | def wait_for_load(self):
    method stop (line 252) | def stop(self):
    method _sanitize_model_config (line 256) | def _sanitize_model_config(
    method _apply_fp4_config (line 286) | def _apply_fp4_config(self, model_config: SGLANGModelConfig) -> None:
    method _sanitize_generate_config (line 299) | def _sanitize_generate_config(
    method check_lib (line 334) | def check_lib(cls) -> Union[bool, Tuple[bool, str]]:
    method match_json (line 341) | def match_json(
    method _convert_state_to_completion_chunk (line 376) | def _convert_state_to_completion_chunk(
    method _convert_state_to_completion (line 413) | def _convert_state_to_completion(
    method _filter_sampling_params (line 450) | def _filter_sampling_params(cls, sampling_params: dict):
    method _stream_generate (line 455) | async def _stream_generate(
    method _non_stream_generate (line 494) | async def _non_stream_generate(
    method async_generate (line 514) | async def async_generate(
  class SGLANGChatModel (line 657) | class SGLANGChatModel(SGLANGModel, ChatModelMixin):
    method match_json (line 659) | def match_json(
    method _sanitize_chat_config (line 689) | def _sanitize_chat_config(
    method is_tool_call_chunk_start (line 702) | def is_tool_call_chunk_start(chunk):
    method is_tool_call_chunk_end (line 706) | def is_tool_call_chunk_end(chunk):
    method async_chat (line 709) | async def async_chat(
  class SGLANGVisionModel (line 756) | class SGLANGVisionModel(SGLANGModel, ChatModelMixin):
    method match_json (line 758) | def match_json(
    method _sanitize_chat_config (line 794) | def _sanitize_chat_config(
    method async_chat (line 805) | async def async_chat(

FILE: xinference/model/llm/tests/test_harmony.py
  function test_streaming_parser_multiple_texts (line 20) | def test_streaming_parser_multiple_texts():
  function test_harmony_streaming_and_nonstreaming (line 59) | async def test_harmony_streaming_and_nonstreaming():
  function test_async_stream_chunks (line 443) | async def test_async_stream_chunks():

FILE: xinference/model/llm/tests/test_llm_family.py
  function test_deserialize_llm_family_v1 (line 36) | def test_deserialize_llm_family_v1():
  function test_cache_from_huggingface_pytorch (line 114) | def test_cache_from_huggingface_pytorch():
  function test_cache_from_huggingface_gguf (line 142) | def test_cache_from_huggingface_gguf():
  function test_cache_from_uri_local (line 176) | def test_cache_from_uri_local():
  function test_custom_llm (line 209) | def test_custom_llm():
  function test_persistent_custom_llm (line 240) | def test_persistent_custom_llm():
  function test_is_locale_chinese_simplified (line 278) | def test_is_locale_chinese_simplified():
  function test_match_llm (line 292) | def test_match_llm():
  function test_is_valid_file_uri (line 327) | def test_is_valid_file_uri():
  function test_get_cache_status_pytorch (line 333) | def test_get_cache_status_pytorch():
  function test_get_cache_status_gguf (line 371) | def test_get_cache_status_gguf():
  function test_parse_chat_template (line 407) | def test_parse_chat_template():
  function test_match_model_size (line 520) | def test_match_model_size():
  function test_convert_model_size_to_float (line 537) | def test_convert_model_size_to_float():
  function test_quert_engine_vLLM (line 548) | def test_quert_engine_vLLM():
  function test_quert_engine_SGLang (line 608) | def test_quert_engine_SGLang():
  function test_query_engine_general (line 665) | def test_query_engine_general():

FILE: xinference/model/llm/tests/test_llm_model.py
  function test_restful_api_for_deepseek_with_reasoning (line 42) | async def test_restful_api_for_deepseek_with_reasoning(
  function test_restful_api_for_deepseek_without_reasoning (line 103) | async def test_restful_api_for_deepseek_without_reasoning(
  function test_qwen3_with_thinking_params (line 171) | async def test_qwen3_with_thinking_params(
  function test_qwen3_with_tools (line 237) | async def test_qwen3_with_tools(setup):
  function setup_cluster (line 340) | def setup_cluster():
  function test_qwen3_enable_thinking (line 385) | async def test_qwen3_enable_thinking(

FILE: xinference/model/llm/tests/test_memory_estimate.py
  function test_llm_estimate_memory (line 18) | def test_llm_estimate_memory():

FILE: xinference/model/llm/tests/test_multimodal.py
  function test_restful_api_for_qwen_vl (line 24) | def test_restful_api_for_qwen_vl(setup, model_format, quantization):
  function test_restful_api_for_yi_vl (line 136) | def test_restful_api_for_yi_vl(setup, model_format, quantization):
  function test_restful_api_for_deepseek_vl (line 224) | def test_restful_api_for_deepseek_vl(setup, model_format, quantization):
  function test_restful_api_for_qwen_audio (line 326) | def test_restful_api_for_qwen_audio(setup):

FILE: xinference/model/llm/tests/test_stream_options.py
  function test_openai_stream_options_llamacpp_chatglm (line 27) | async def test_openai_stream_options_llamacpp_chatglm(setup):
  function test_openai_stream_options_llamacpp (line 115) | async def test_openai_stream_options_llamacpp(setup):
  function test_openai_stream_options_pytorch_chatglm (line 204) | async def test_openai_stream_options_pytorch_chatglm(setup):
  function test_openai_stream_options_pytorch (line 294) | async def test_openai_stream_options_pytorch(setup):
  function test_openai_stream_options_pytorch_deepseek_vl (line 384) | async def test_openai_stream_options_pytorch_deepseek_vl(setup):
  function test_openai_stream_options_pytorch_internlm2 (line 474) | async def test_openai_stream_options_pytorch_internlm2(setup):
  function test_openai_stream_options_pytorch_qwen_vl (line 564) | async def test_openai_stream_options_pytorch_qwen_vl(setup):
  function test_openai_stream_options_pytorch_yi_vl (line 654) | async def test_openai_stream_options_pytorch_yi_vl(setup):
  function test_openai_stream_options_sgalng (line 744) | async def test_openai_stream_options_sgalng(setup):
  function test_openai_stream_options_vllm (line 834) | async def test_openai_stream_options_vllm(setup):
  function test_openai_stream_tools_vllm (line 924) | async def test_openai_stream_tools_vllm(setup):

FILE: xinference/model/llm/tests/test_utils.py
  function test_is_valid_model_name (line 20) | def test_is_valid_model_name():
  function filter_ids_and_created (line 36) | def filter_ids_and_created(data):
  function test_post_process_completion_chunk_without_thinking (line 48) | def test_post_process_completion_chunk_without_thinking():
  function test_post_process_completion_chunk_with_thinking (line 448) | def test_post_process_completion_chunk_with_thinking():
  function test_post_process_completion_chunk_with_parser (line 1018) | def test_post_process_completion_chunk_with_parser():
  function test_post_process_completion_without_thinking (line 1522) | def test_post_process_completion_without_thinking():
  function test_post_process_completion_with_thinking (line 1580) | def test_post_process_completion_with_thinking():
  function test_post_process_completion_with_parser (line 1636) | def test_post_process_completion_with_parser():

FILE: xinference/model/llm/tool_parsers/__init__.py
  function register_tool_parser (line 8) | def register_tool_parser(name: str):

FILE: xinference/model/llm/tool_parsers/abstract_tool_parser.py
  class ToolParser (line 1) | class ToolParser:
    method extract_tool_calls (line 8) | def extract_tool_calls(self, model_output: str):
    method extract_tool_calls_streaming (line 20) | def extract_tool_calls_streaming(

FILE: xinference/model/llm/tool_parsers/deepseek_r1_tool_parser.py
  class DeepseekR1ToolParser (line 13) | class DeepseekR1ToolParser(ToolParser):
    method __init__ (line 21) | def __init__(self):
    method extract_tool_calls (line 47) | def extract_tool_calls(
    method _get_function_calls (line 137) | def _get_function_calls(self, model_output: str) -> List[str]:
    method extract_tool_calls_streaming (line 164) | def extract_tool_calls_streaming(

FILE: xinference/model/llm/tool_parsers/deepseek_v3_1_tool_parser.py
  class DeepseekV3_1ToolParser (line 13) | class DeepseekV3_1ToolParser(ToolParser):
    method __init__ (line 22) | def __init__(self):
    method extract_tool_calls (line 33) | def extract_tool_calls(
    method extract_tool_calls_streaming (line 104) | def extract_tool_calls_streaming(

FILE: xinference/model/llm/tool_parsers/deepseek_v3_tool_parser.py
  class DeepseekV3ToolParser (line 13) | class DeepseekV3ToolParser(ToolParser):
    method __init__ (line 22) | def __init__(self):
    method _parse_json_function_call (line 30) | def _parse_json_function_call(
    method extract_tool_calls (line 50) | def extract_tool_calls(
    method extract_tool_calls_streaming (line 123) | def extract_tool_calls_streaming(

FILE: xinference/model/llm/tool_parsers/glm4_tool_parser.py
  class Glm4ToolParser (line 12) | class Glm4ToolParser(ToolParser):
    method __init__ (line 21) | def __init__(self):
    method _parse_json_function_call (line 29) | def _parse_json_function_call(
    method extract_tool_calls (line 49) | def extract_tool_calls(
    method extract_tool_calls_streaming (line 94) | def extract_tool_calls_streaming(

FILE: xinference/model/llm/tool_parsers/llama3_tool_parser.py
  class Llama3ToolParser (line 11) | class Llama3ToolParser(ToolParser):
    method __init__ (line 20) | def __init__(self):
    method extract_tool_calls (line 26) | def extract_tool_calls(
    method extract_tool_calls_streaming (line 51) | def extract_tool_calls_streaming(

FILE: xinference/model/llm/tool_parsers/minimax_tool_parser.py
  class MiniMaxToolParser (line 13) | class MiniMaxToolParser(ToolParser):
    method __init__ (line 21) | def __init__(self):
    method _parse_param_value (line 46) | def _parse_param_value(self, value: str) -> Any:
    method _parse_invoke_calls (line 55) | def _parse_invoke_calls(self, tool_block: str) -> List[Tuple[str, Dict...
    method _get_function_calls (line 64) | def _get_function_calls(self, model_output: str) -> List[str]:
    method _get_function_calls_streaming (line 76) | def _get_function_calls_streaming(self, model_output: str) -> List[str]:
    method is_contain_think (line 80) | def is_contain_think(self, model_output: str) -> bool:
    method _has_unclosed_tool_call (line 83) | def _has_unclosed_tool_call(self, text: str) -> bool:
    method extract_tool_calls (line 90) | def extract_tool_calls(
    method extract_tool_calls_streaming (line 124) | def extract_tool_calls_streaming(

FILE: xinference/model/llm/tool_parsers/qwen_tool_parser.py
  class QwenToolParser (line 13) | class QwenToolParser(ToolParser):
    method __init__ (line 22) | def __init__(self):
    method _parse_json_function_call (line 47) | def _parse_json_function_call(
    method _parse_json_function_call_stream (line 85) | def _parse_json_function_call_stream(
    method is_contain_think_end_token (line 105) | def is_contain_think_end_token(self, model_output: str) -> bool:
    method is_contain_think (line 117) | def is_contain_think(self, model_output: str) -> bool:
    method is_contain_tool_call (line 129) | def is_contain_tool_call(self, model_output: str) -> bool:
    method is_contain_tool_call_start_token (line 141) | def is_contain_tool_call_start_token(self, model_output: str) -> bool:
    method is_contain_tool_call_end_token (line 153) | def is_contain_tool_call_end_token(self, model_output: str) -> bool:
    method _get_function_calls (line 165) | def _get_function_calls(self, model_output: str) -> List[str]:
    method _get_function_calls_streaming (line 192) | def _get_function_calls_streaming(self, model_output: str) -> List[str]:
    method extract_tool_calls (line 207) | def extract_tool_calls(
    method _has_unclosed_tool_call (line 275) | def _has_unclosed_tool_call(self, text: str) -> bool:
    method extract_tool_calls_streaming (line 294) | def extract_tool_calls_streaming(

FILE: xinference/model/llm/tool_parsers/tests/test_deepseek_r1_tool_parser.py
  function test_tool_parser_extract_calls_without_thinking (line 4) | def test_tool_parser_extract_calls_without_thinking():

FILE: xinference/model/llm/tool_parsers/tests/test_deepseek_v3_1_tool_parser.py
  function test_extract_tool_calls_single_call (line 4) | def test_extract_tool_calls_single_call():
  function test_extract_tool_calls_multiple_calls (line 17) | def test_extract_tool_calls_multiple_calls():
  function test_extract_tool_calls_no_tool_call (line 36) | def test_extract_tool_calls_no_tool_call():
  function test_extract_tool_calls_streaming_full_sequence (line 217) | def test_extract_tool_calls_streaming_full_sequence():
  function test_extract_tool_calls_streaming_multi_sequence (line 242) | def test_extract_tool_calls_streaming_multi_sequence():
  function test_extract_tool_calls_streaming_split_token (line 268) | def test_extract_tool_calls_streaming_split_token():
  function test_extract_tool_calls_invalid_json (line 301) | def test_extract_tool_calls_invalid_json():

FILE: xinference/model/llm/tool_parsers/tests/test_deepseek_v3_tool_parser.py
  function test_tool_parser_extract_calls_without_thinking (line 4) | def test_tool_parser_extract_calls_without_thinking():

FILE: xinference/model/llm/tool_parsers/tests/test_glm4_tool_parser.py
  function test_tool_parser_extract_calls (line 4) | def test_tool_parser_extract_calls():
  function test_tool_parser_extract_calls_streaming (line 16) | def test_tool_parser_extract_calls_streaming():

FILE: xinference/model/llm/tool_parsers/tests/test_qwen_tool_parser.py
  function test_tool_parser_extract_calls_streaming_without_thinking_multi (line 4) | def test_tool_parser_extract_calls_streaming_without_thinking_multi():
  function test_tool_parser_extract_calls_streaming_without_thinking (line 287) | def test_tool_parser_extract_calls_streaming_without_thinking():
  function test_tool_parser_extract_calls_streaming_with_thinking (line 410) | def test_tool_parser_extract_calls_streaming_with_thinking():
  function test_tool_parser_extract_calls_streaming_with_parser (line 583) | def test_tool_parser_extract_calls_streaming_with_parser():
  function test_tool_parser_extract_calls_without_thinking_multi (line 714) | def test_tool_parser_extract_calls_without_thinking_multi():
  function test_tool_parser_extract_calls_without_thinking (line 729) | def test_tool_parser_extract_calls_without_thinking():
  function test_tool_parser_extract_calls_with_thinking (line 741) | def test_tool_parser_extract_calls_with_thinking():
  function test_tool_parser_extract_calls_with_parser (line 757) | def test_tool_parser_extract_calls_with_parser():

FILE: xinference/model/llm/transformers/__init__.py
  function import_submodules (line 22) | def import_submodules(package_path: str, package_name: str, globals_dict...

FILE: xinference/model/llm/transformers/chatglm.py
  class ChatglmPytorchChatModel (line 39) | class ChatglmPytorchChatModel(PytorchChatModel):
    method __init__ (line 42) | def __init__(
    method _get_model_class (line 58) | def _get_model_class(self):
    method _load_model (line 63) | def _load_model(self, **kwargs):
    method match_json (line 88) | def match_json(
    method _handle_tools (line 102) | def _handle_tools(self, messages, generate_config):
    method _process_messages (line 120) | def _process_messages(messages, tools=None, tool_choice="none"):
    method _process_response_non_streaming (line 208) | def _process_response_non_streaming(
    method _process_response_streaming (line 268) | def _process_response_streaming(output, tools, end=False):
    method _stream_chat (line 299) | def _stream_chat(self, inputs, tools, **kwargs):
    method _get_generate_kwargs (line 327) | def _get_generate_kwargs(generate_config):
    method chat (line 352) | def chat(  # type: ignore
    method prepare_sanitize_generate_config (line 445) | def prepare_sanitize_generate_config(self, req: InferenceRequest):
    method prepare_batch_inference (line 459) | def prepare_batch_inference(self, req_list: List[InferenceRequest]):
    method handle_chat_result_non_streaming (line 495) | def handle_chat_result_non_streaming(self, req: InferenceRequest):
    method handle_chat_result_streaming (line 509) | def handle_chat_result_streaming(self, req: InferenceRequest):

FILE: xinference/model/llm/transformers/core.py
  function register_non_default_model (line 63) | def register_non_default_model(*architectures: str):
  class PytorchModel (line 92) | class PytorchModel(LLM):
    method __init__ (line 95) | def __init__(
    method _sanitize_model_config (line 111) | def _sanitize_model_config(
    method _sanitize_generate_config (line 129) | def _sanitize_generate_config(
    method _check_tensorizer_integrity (line 145) | def _check_tensorizer_integrity(self):
    method _load_tensorizer (line 158) | def _load_tensorizer(self, **kwargs):
    method _save_tensorizer (line 172) | def _save_tensorizer(self, **kwargs):
    method _get_model_class (line 180) | def _get_model_class(self):
    method _get_components (line 185) | def _get_components(self, **kwargs):
    method _load_model (line 202) | def _load_model(self, **kwargs):
    method _apply_lora (line 227) | def _apply_lora(self):
    method apply_bnb_quantization (line 251) | def apply_bnb_quantization(
    method apply_fp_quantization (line 277) | def apply_fp_quantization(
    method apply_quantization_config (line 313) | def apply_quantization_config(
    method load (line 322) | def load(self):
    method _should_use_batching (line 410) | def _should_use_batching(self) -> bool:
    method _ensure_scheduler_started (line 439) | async def _ensure_scheduler_started(self):
    method generate (line 444) | async def generate(self, prompt: str, generate_config: Optional[dict] ...
    method _direct_generate (line 475) | async def _direct_generate(
    method _queue_to_async_generator (line 481) | async def _queue_to_async_generator(self, queue):
    method abort_request (line 502) | async def abort_request(self, request_id: str) -> Optional[str]:
    method stop_scheduler (line 510) | async def stop_scheduler(self):
    method stop (line 515) | def stop(self):
    method check_lib (line 537) | def check_lib(cls) -> Union[bool, Tuple[bool, str]]:
    method match_json (line 544) | def match_json(
    method build_prefill_attention_mask (line 561) | def build_prefill_attention_mask(
    method build_decode_attention_mask (line 594) | def build_decode_attention_mask(
    method build_prefill_position_ids (line 631) | def build_prefill_position_ids(
    method build_decode_position_ids (line 654) | def build_decode_position_ids(
    method build_prefill_token_type_ids (line 668) | def build_prefill_token_type_ids(
    method build_decode_token_type_ids (line 677) | def build_decode_token_type_ids(
    method build_prefill_inputs (line 686) | def build_prefill_inputs(self, prompts: List, req_list: List[Inference...
    method build_prefill_kwargs (line 698) | def build_prefill_kwargs(self, prompts: List, req_list: List[Inference...
    method build_decode_kwargs (line 720) | def build_decode_kwargs(
    method get_batch_size_and_seq_len_indexes_from_kv (line 743) | def get_batch_size_and_seq_len_indexes_from_kv() -> Tuple[int, int]:
    method get_dtype (line 751) | def get_dtype(self):
    method get_context_len (line 755) | def get_context_len(self):
    method get_max_num_seqs (line 759) | def get_max_num_seqs(self) -> int:
    method prepare_sanitize_generate_config (line 762) | def prepare_sanitize_generate_config(self, req: InferenceRequest):
    method merge_kv_cache (line 765) | def merge_kv_cache(self, past_cache, new_cache):
    method prepare_batch_inference (line 870) | def prepare_batch_inference(self, req_list: List[InferenceRequest]):
    method get_builtin_stop_token_ids (line 904) | def get_builtin_stop_token_ids(self) -> Tuple:
    method handle_batch_inference_results (line 921) | def handle_batch_inference_results(self, req_list: List[InferenceReque...
    method batch_inference (line 956) | def batch_inference(self, req_list: List[InferenceRequest]):
    method build_reduced_kv_cache (line 965) | def build_reduced_kv_cache(self, cache, skipped_indexes: Set[int]):
  class PytorchChatModel (line 976) | class PytorchChatModel(PytorchModel, ChatModelMixin):
    method __init__ (line 977) | def __init__(
    method _sanitize_generate_config (line 993) | def _sanitize_generate_config(
    method match_json (line 1009) | def match_json(
    method chat (line 1026) | async def chat(
    method _direct_chat (line 1061) | async def _direct_chat(
    method load (line 1069) | def load(self):
    method _get_full_prompt (line 1072) | def _get_full_prompt(self, messages: List[Dict], tools, generate_confi...
    method prepare_batch_inference (line 1098) | def prepare_batch_inference(self, req_list: List[InferenceRequest]):
    method handle_chat_result_non_streaming (line 1114) | def handle_chat_result_non_streaming(self, req: InferenceRequest):
    method handle_chat_result_streaming (line 1126) | def handle_chat_result_streaming(self, req: InferenceRequest):
    method handle_batch_inference_results (line 1151) | def handle_batch_inference_results(self, req_list: List[InferenceReque...

FILE: xinference/model/llm/transformers/deepseek_v2.py
  class DeepSeekV2PytorchChatModel (line 27) | class DeepSeekV2PytorchChatModel(PytorchChatModel):
    method _load_model (line 30) | def _load_model(self, **kwargs):
    method match_json (line 64) | def match_json(

FILE: xinference/model/llm/transformers/gemma3.py
  class Gemma3TextChatModel (line 26) | class Gemma3TextChatModel(PytorchChatModel):
    method match_json (line 30) | def match_json(
    method _load_model (line 45) | def _load_model(self, **kwargs):
    method _get_full_prompt (line 62) | def _get_full_prompt(self, messages: List[Dict], tools, generate_confi...
    method build_prefill_kwargs (line 65) | def build_prefill_kwargs(self, prompts: List, req_list: List[Inference...
    method merge_kv_cache (line 94) | def merge_kv_cache(self, past_cache, new_cache):
    method build_decode_attention_mask (line 128) | def build_decode_attention_mask(
    method build_decode_position_ids (line 136) | def build_decode_position_ids(
    method build_reduced_kv_cache (line 144) | def build_reduced_kv_cache(self, cache, skipped_indexes: Set[int]):

FILE: xinference/model/llm/transformers/gpt_oss.py
  class GPTOSSPytorchChatModel (line 33) | class GPTOSSPytorchChatModel(PytorchChatModel):
    method _sanitize_model_config (line 36) | def _sanitize_model_config(
    method match_json (line 44) | def match_json(
    method chat (line 61) | async def chat(  # type:ignore

FILE: xinference/model/llm/transformers/multimodal/cogagent.py
  class CogAgentChatModel (line 34) | class CogAgentChatModel(PytorchMultiModalModel):
    method __init__ (line 37) | def __init__(self, *args, **kws):
    method match_json (line 51) | def match_json(
    method decide_device (line 63) | def decide_device(self):
    method load_processor (line 67) | def load_processor(self):
    method load_multimodal_model (line 74) | def load_multimodal_model(self):
    method _message_content_to_cogagent (line 86) | def _message_content_to_cogagent(self, content):
    method _history_content_to_cogagent (line 115) | def _history_content_to_cogagent(self, chat_history: List[Dict]):
    method _get_query_and_history (line 156) | def _get_query_and_history(
    method build_inputs_from_messages (line 184) | def build_inputs_from_messages(
    method build_generate_kwargs (line 216) | def build_generate_kwargs(
    method build_streaming_iter (line 229) | def build_streaming_iter(

FILE: xinference/model/llm/transformers/multimodal/core.py
  class PytorchMultiModalModel (line 29) | class PytorchMultiModalModel(PytorchChatModel):
    method __init__ (line 30) | def __init__(self, *args, **kwargs):
    method decide_device (line 38) | def decide_device(self):
    method load_processor (line 45) | def load_processor(self):
    method load_multimodal_model (line 52) | def load_multimodal_model(self):
    method load (line 58) | def load(self):
    method build_inputs_from_messages (line 70) | def build_inputs_from_messages(
    method build_generate_kwargs (line 83) | def build_generate_kwargs(
    method build_streaming_iter (line 94) | def build_streaming_iter(
    method get_stop_strs (line 106) | def get_stop_strs(self) -> List[str]:
    method check_conditions (line 109) | def check_conditions(self, new_text: str) -> Tuple[str, bool]:
    method generate_non_streaming (line 117) | def generate_non_streaming(
    method generate_streaming (line 149) | def generate_streaming(
    method chat (line 271) | def chat(

FILE: xinference/model/llm/transformers/multimodal/deepseek_vl2.py
  class DeepSeekVL2ChatModel (line 35) | class DeepSeekVL2ChatModel(PytorchMultiModalModel):
    method __init__ (line 38) | def __init__(self, *args, **kwargs):
    method match_json (line 43) | def match_json(
    method decide_device (line 55) | def decide_device(self):
    method load_processor (line 60) | def load_processor(self):
    method load_multimodal_model (line 69) | def load_multimodal_model(self):
    method _message_content_to_deepseek (line 85) | def _message_content_to_deepseek(content) -> Tuple[str, List[str]]:
    method get_stop_strs (line 148) | def get_stop_strs(self) -> List[str]:
    method build_generate_kwargs (line 153) | def build_generate_kwargs(self, generate_config: Dict):
    method build_inputs_from_messages (line 157) | def build_inputs_from_messages(
    method build_streaming_iter (line 213) | def build_streaming_iter(
    method check_conditions (line 228) | def check_conditions(self, new_text: str) -> Tuple[str, bool]:

FILE: xinference/model/llm/transformers/multimodal/gemma3.py
  class Gemma3ChatModel (line 29) | class Gemma3ChatModel(PytorchMultiModalModel):
    method match_json (line 33) | def match_json(
    method _sanitize_model_config (line 48) | def _sanitize_model_config(
    method decide_device (line 57) | def decide_device(self):
    method load_processor (line 62) | def load_processor(self):
    method load_multimodal_model (line 74) | def load_multimodal_model(self):
    method build_inputs_from_messages (line 82) | def build_inputs_from_messages(
    method build_generate_kwargs (line 97) | def build_generate_kwargs(
    method build_streaming_iter (line 106) | def build_streaming_iter(

FILE: xinference/model/llm/transformers/multimodal/glm4_1v.py
  class Glm4_1VModel (line 35) | class Glm4_1VModel(PytorchMultiModalModel):
    method match_json (line 42) | def match_json(
    method decide_device (line 54) | def decide_device(self):
    method load_processor (line 58) | def load_processor(self):
    method load_multimodal_model (line 64) | def load_multimodal_model(self):
    method _get_processed_msgs (line 79) | def _get_processed_msgs(messages: List[Dict]) -> List[Dict]:
    method build_inputs_from_messages (line 120) | def build_inputs_from_messages(
    method get_stop_strs (line 146) | def get_stop_strs(self) -> List[str]:
    method get_builtin_stop_token_ids (line 149) | def get_builtin_stop_token_ids(self) -> Tuple:
    method build_generate_kwargs (line 154) | def build_generate_kwargs(
    method build_streaming_iter (line 166) | def build_streaming_iter(

FILE: xinference/model/llm/transformers/multimodal/glm4v.py
  class Glm4VModel (line 37) | class Glm4VModel(PytorchMultiModalModel):
    method match_json (line 44) | def match_json(
    method decide_device (line 56) | def decide_device(self):
    method load_processor (line 60) | def load_processor(self):
    method load_multimodal_model (line 67) | def load_multimodal_model(self):
    method _get_processed_msgs (line 83) | def _get_processed_msgs(messages: List[Dict]) -> List[Dict]:
    method build_inputs_from_messages (line 118) | def build_inputs_from_messages(
    method build_generate_kwargs (line 134) | def build_generate_kwargs(
    method get_stop_strs (line 145) | def get_stop_strs(self) -> List[str]:
    method build_streaming_iter (line 148) | def build_streaming_iter(
    method _get_full_prompt (line 172) | def _get_full_prompt(self, messages, tools, generate_config: dict):
    method prepare_sanitize_generate_config (line 186) | def prepare_sanitize_generate_config(self, req: InferenceRequest):
    method build_prefill_inputs (line 199) | def build_prefill_inputs(self, prompts: List, req_list: List[Inference...
    method is_empty (line 236) | def is_empty(images_list: Optional[List[List[torch.Tensor]]]):
    method get_full_attention_mask (line 248) | def get_full_attention_mask(
    method build_prefill_kwargs (line 286) | def build_prefill_kwargs(self, prompts: List, req_list: List[Inference...
    method build_decode_attention_mask (line 304) | def build_decode_attention_mask(

FILE: xinference/model/llm/transformers/multimodal/intern_vl.py
  class InternVLChatModel (line 32) | class InternVLChatModel(PytorchMultiModalModel):
    method match_json (line 39) | def match_json(
    method decide_device (line 51) | def decide_device(self):
    method load_processor (line 82) | def load_processor(self):
    method load_multimodal_model (line 89) | def load_multimodal_model(self):
    method _build_transform (line 106) | def _build_transform(self, input_size=448):
    method _get_index (line 125) | def _get_index(bound, fps, max_frame, first_idx=0, num_segments=32):
    method _find_closest_aspect_ratio (line 143) | def _find_closest_aspect_ratio(
    method _dynamic_preprocess (line 160) | def _dynamic_preprocess(
    method _load_video (line 205) | def _load_video(
    method _message_content_to_intern (line 233) | def _message_content_to_intern(self, content, image_cnt):
    method _get_prompt_and_chat_history (line 270) | def _get_prompt_and_chat_history(
    method _load_image (line 295) | def _load_image(self, image_file, input_size=448, max_num=12):
    method build_inputs_from_messages (line 305) | def build_inputs_from_messages(
    method build_generate_kwargs (line 383) | def build_generate_kwargs(
    method build_streaming_iter (line 393) | def build_streaming_iter(
    method check_conditions (line 415) | def check_conditions(self, new_text: str) -> Tuple[str, bool]:

FILE: xinference/model/llm/transformers/multimodal/minicpmv26.py
  class MiniCPMV26Model (line 36) | class MiniCPMV26Model(PytorchMultiModalModel):
    method match_json (line 40) | def match_json(
    method _sanitize_model_config (line 52) | def _sanitize_model_config(
    method decide_device (line 61) | def decide_device(self):
    method load_processor (line 70) | def load_processor(self):
    method load_multimodal_model (line 86) | def load_multimodal_model(self):
    method _message_content_to_chat (line 109) | def _message_content_to_chat(self, content):
    method _convert_to_specific_style (line 167) | def _convert_to_specific_style(self, messages: List[Dict]) -> Tuple:
    method build_inputs_from_messages (line 203) | def build_inputs_from_messages(
    method build_generate_kwargs (line 215) | def build_generate_kwargs(
    method build_streaming_iter (line 221) | def build_streaming_iter(
    method prepare_sanitize_generate_config (line 234) | def prepare_sanitize_generate_config(self, req: InferenceRequest):
    method _handle_input_ids_and_images (line 253) | def _handle_input_ids_and_images(self, msgs: List[Dict]) -> Dict:
    method _get_full_prompt (line 286) | def _get_full_prompt(self, messages: List[Dict], tools, generate_confi...
    method build_prefill_kwargs (line 294) | def build_prefill_kwargs(self, prompts: List, req_list: List[Inference...
    method build_decode_position_ids (line 327) | def build_decode_position_ids(
    method batch_inference (line 332) | def batch_inference(self, req_list: List[InferenceRequest]):

FILE: xinference/model/llm/transformers/multimodal/minicpmv45.py
  class MiniCPMV45Model (line 36) | class MiniCPMV45Model(PytorchMultiModalModel):
    method match_json (line 40) | def match_json(
    method _sanitize_model_config (line 52) | def _sanitize_model_config(
    method decide_device (line 62) | def decide_device(self):
    method load_processor (line 71) | def load_processor(self):
    method load_multimodal_model (line 87) | def load_multimodal_model(self):
    method _message_content_to_chat (line 110) | def _message_content_to_chat(self, content):
    method _convert_to_specific_style (line 168) | def _convert_to_specific_style(self, messages: List[Dict]) -> Tuple:
    method build_inputs_from_messages (line 204) | def build_inputs_from_messages(
    method build_generate_kwargs (line 216) | def build_generate_kwargs(
    method build_streaming_iter (line 222) | def build_streaming_iter(
    method prepare_sanitize_generate_config (line 235) | def prepare_sanitize_generate_config(self, req: InferenceRequest):
    method _handle_input_ids_and_images (line 254) | def _handle_input_ids_and_images(self, msgs: List[Dict]) -> Dict:
    method _get_full_prompt (line 288) | def _get_full_prompt(self, messages: List[Dict], tools, generate_confi...
    method build_prefill_kwargs (line 296) | def build_prefill_kwargs(self, prompts: List, req_list: List[Inference...
    method build_decode_position_ids (line 329) | def build_decode_position_ids(
    method batch_inference (line 334) | def batch_inference(self, req_list: List[InferenceRequest]):

FILE: xinference/model/llm/transformers/multimodal/ovis2.py
  class Ovis2ChatModel (line 30) | class Ovis2ChatModel(PytorchMultiModalModel):
    method __init__ (line 33) | def __init__(self, *args, **kws):
    method match_json (line 39) | def match_json(
    method decide_device (line 56) | def decide_device(self):
    method load_processor (line 59) | def load_processor(self):
    method load_multimodal_model (line 62) | def load_multimodal_model(self):
    method _parse_messages_ovis (line 77) | def _parse_messages_ovis(messages: List[Dict]) -> List[Dict]:
    method _convert_video_tensors_to_pil (line 96) | def _convert_video_tensors_to_pil(video_inputs: List) -> List[Image.Im...
    method _generate_chat_data (line 137) | def _generate_chat_data(self, messages: List[Dict]):
    method build_generate_kwargs (line 194) | def build_generate_kwargs(
    method build_inputs_from_messages (line 210) | def build_inputs_from_messages(
    method build_streaming_iter (line 232) | def build_streaming_iter(

FILE: xinference/model/llm/transformers/multimodal/qwen-omni.py
  class QwenOmniChatModel (line 43) | class QwenOmniChatModel(PytorchMultiModalModel):
    method __init__ (line 53) | def __init__(self, *args, **kwargs):
    method match_json (line 62) | def match_json(
    method decide_device (line 87) | def decide_device(self):
    method load_processor (line 92) | def load_processor(self):
    method load_multimodal_model (line 103) | def load_multimodal_model(self):
    method _transform_messages (line 132) | def _transform_messages(
    method build_inputs_from_messages (line 150) | def build_inputs_from_messages(
    method build_generate_kwargs (line 181) | def build_generate_kwargs(
    method build_streaming_iter (line 192) | def build_streaming_iter(
    method generate_non_streaming (line 211) | def generate_non_streaming(

FILE: xinference/model/llm/transformers/multimodal/qwen2_audio.py
  class Qwen2AudioChatModel (line 32) | class Qwen2AudioChatModel(PytorchMultiModalModel):
    method match_json (line 36) | def match_json(
    method decide_device (line 48) | def decide_device(self):
    method load_processor (line 52) | def load_processor(self):
    method load_multimodal_model (line 64) | def load_multimodal_model(self):
    method _transform_messages (line 76) | def _transform_messages(
    method build_inputs_from_messages (line 99) | def build_inputs_from_messages(
    method build_generate_kwargs (line 113) | def build_generate_kwargs(
    method build_streaming_iter (line 119) | def build_streaming_iter(

FILE: xinference/model/llm/transformers/multimodal/qwen2_vl.py
  class Qwen2VLChatModel (line 44) | class Qwen2VLChatModel(PytorchMultiModalModel):
    method _sanitize_model_config (line 52) | def _sanitize_model_config(
    method match_json (line 62) | def match_json(
    method decide_device (line 86) | def decide_device(self):
    method load_processor (line 92) | def load_processor(self):
    method load_multimodal_model (line 105) | def load_multimodal_model(self):
    method build_inputs_from_messages (line 176) | def build_inputs_from_messages(
    method build_generate_kwargs (line 199) | def build_generate_kwargs(self, generate_config: Dict) -> Dict[str, Any]:
    method build_streaming_iter (line 204) | def build_streaming_iter(
    method prepare_sanitize_generate_config (line 232) | def prepare_sanitize_generate_config(self, req: InferenceRequest):
    method _get_full_prompt (line 244) | def _get_full_prompt(self, messages: List[Dict], tools, generate_confi...
    method build_prefill_kwargs (line 247) | def build_prefill_kwargs(self, prompts: List, req_list: List[Inference...

FILE: xinference/model/llm/transformers/opt.py
  class OptPytorchModel (line 25) | class OptPytorchModel(PytorchModel):
    method __init__ (line 28) | def __init__(
    method match_json (line 45) | def match_json(
    method build_prefill_position_ids (line 57) | def build_prefill_position_ids(
    method build_decode_position_ids (line 67) | def build_decode_position_ids(

FILE: xinference/model/llm/transformers/tensorizer_utils.py
  function _filter_kwargs (line 41) | def _filter_kwargs(kwargs):
  function _file_is_non_empty (line 48) | def _file_is_non_empty(
  function get_tensorizer_dir (line 57) | def get_tensorizer_dir(model_path: str) -> str:
  function check_tensorizer_integrity (line 62) | def check_tensorizer_integrity(
  function load_from_tensorizer (line 79) | def load_from_tensorizer(
  function _load_pretrained_from_tensorizer (line 131) | def _load_pretrained_from_tensorizer(
  function _load_model_from_tensorizer (line 168) | def _load_model_from_tensorizer(
  function save_to_tensorizer (line 260) | def save_to_tensorizer(
  function _tensorizer_serialize_model (line 276) | def _tensorizer_serialize_model(
  function _tensorizer_serialize_pretrained (line 310) | def _tensorizer_serialize_pretrained(

FILE: xinference/model/llm/transformers/tests/test_opt.py
  function test_opt_pytorch_model (line 25) | async def test_opt_pytorch_model(setup, quantization):
  function test_opt_fp4_model (line 76) | async def test_opt_fp4_model(setup):

FILE: xinference/model/llm/transformers/tests/test_tensorizer.py
  class TestTensorizerSerializeModel (line 19) | class TestTensorizerSerializeModel:
    method setup_and_teardown (line 21) | def setup_and_teardown(self):
    method _cleanup_directory (line 64) | def _cleanup_directory(self, directory):
    method test_tensor_file_exists (line 69) | def test_tensor_file_exists(self):
  function mock_environment (line 95) | def mock_environment(tmp_path):
  function test_tensorizer_serialize_model_cache_exists (line 108) | def test_tensorizer_serialize_model_cache_exists(

FILE: xinference/model/llm/transformers/utils.py
  function get_context_length (line 47) | def get_context_length(config) -> int:
  function prepare_logits_processor (line 70) | def prepare_logits_processor(
  function _get_token_from_logits (line 86) | def _get_token_from_logits(
  function _pad_to_max_length (line 115) | def _pad_to_max_length(x: List[int], max_len: int, pad: int) -> List[int]:
  function _pad_seqs_inplace (line 120) | def _pad_seqs_inplace(seqs: List[List[int]], reqs: List[InferenceRequest...
  function get_max_src_len (line 132) | def get_max_src_len(context_len: int, r: InferenceRequest) -> int:
  function pad_prefill_tokens (line 140) | def pad_prefill_tokens(
  function _get_completion (line 153) | def _get_completion(
  function _get_pad_param (line 188) | def _get_pad_param(seq_len_idx: int, pad_len: int) -> Tuple:
  function get_batch_size_and_seq_len_from_kv_cache (line 194) | def get_batch_size_and_seq_len_from_kv_cache(kv, xinf_model_obj: "Pytorc...
  function convert_to_cache_cls (line 216) | def convert_to_cache_cls(cache) -> DynamicCache:
  function _batch_inference_one_step_internal (line 226) | def _batch_inference_one_step_internal(
  function batch_inference_one_step (line 484) | def batch_inference_one_step(

FILE: xinference/model/llm/utils.py
  class ChatModelMixin (line 110) | class ChatModelMixin:
    method __init__ (line 111) | def __init__(self):
    method _compile_jinja_template (line 119) | def _compile_jinja_template(chat_template):
    method _build_from_raw_template (line 136) | def _build_from_raw_template(
    method get_full_context (line 145) | def get_full_context(
    method _get_chat_template_kwargs_from_generate_config (line 179) | def _get_chat_template_kwargs_from_generate_config(
    method convert_messages_with_content_list_to_str_conversion (line 205) | def convert_messages_with_content_list_to_str_conversion(
    method get_specific_prompt (line 224) | def get_specific_prompt(model_family: str, messages: List[ChatCompleti...
    method _to_chat_completion_chunk (line 294) | def _to_chat_completion_chunk(
    method _get_first_chat_completion_chunk (line 386) | def _get_first_chat_completion_chunk(
    method _get_final_chat_completion_chunk (line 418) | def _get_final_chat_completion_chunk(
    method _to_chat_completion_chunks (line 438) | def _to_chat_completion_chunks(
    method _tools_to_messages_for_deepseek (line 476) | def _tools_to_messages_for_deepseek(
    method _async_to_chat_completion_chunks (line 505) | async def _async_to_chat_completion_chunks(
    method _to_chat_completion (line 543) | def _to_chat_completion(
    method _eval_glm_chat_arguments (line 600) | def _eval_glm_chat_arguments(c) -> List[Tuple]:
    method _handle_qwen_tool_result (line 617) | def _handle_qwen_tool_result(cls, text: str) -> List[Tuple]:
    method _eval_qwen_chat_arguments (line 686) | def _eval_qwen_chat_arguments(
    method _eval_llama3_chat_arguments (line 695) | def _eval_llama3_chat_arguments(cls, c) -> List[Tuple]:
    method _eval_deepseek_chat_arguments (line 704) | def _eval_deepseek_chat_arguments(cls, c) -> List[Tuple]:
    method _eval_deepseek_r1_arguments (line 772) | def _eval_deepseek_r1_arguments(cls, c) -> List[Tuple]:
    method _eval_tool_arguments (line 818) | def _eval_tool_arguments(
    method _post_process_completion_chunk (line 840) | def _post_process_completion_chunk(
    method _post_process_completion (line 922) | def _post_process_completion(
    method _transform_messages (line 1013) | def _transform_messages(
    method _async_to_tool_completion_chunks (line 1050) | async def _async_to_tool_completion_chunks(
  function get_model_version (line 1093) | def get_model_version(
  function _decode_image (line 1102) | def _decode_image(_url):
  function _decode_image_without_rgb (line 1121) | def _decode_image_without_rgb(_url):
  function generate_completion_chunk (line 1141) | def generate_completion_chunk(
  function generate_completion (line 1175) | def generate_completion(
  function generate_chat_completion (line 1201) | def generate_chat_completion(
  function get_stop_token_ids_from_config_file (line 1230) | def get_stop_token_ids_from_config_file(model_path: str) -> Optional[Lis...
  function normalize_response_format (line 1252) | def normalize_response_format(
  function parse_messages (line 1278) | def parse_messages(messages: List[Dict]) -> Tuple:

FILE: xinference/model/llm/vllm/core.py
  class VLLMModelConfig (line 88) | class VLLMModelConfig(TypedDict, total=False):
  class VLLMGenerateConfig (line 117) | class VLLMGenerateConfig(TypedDict, total=False):
  function _get_effective_vllm_version (line 165) | def _get_effective_vllm_version() -> version.Version:
  function _virtual_env_allows_missing_vllm (line 177) | def _virtual_env_allows_missing_vllm() -> bool:
  function _append_unique (line 185) | def _append_unique(target: List[str], *items: str) -> None:
  function _update_vllm_supported_lists (line 209) | def _update_vllm_supported_lists() -> None:
  class VLLMModel (line 325) | class VLLMModel(LLM):
    method __init__ (line 328) | def __init__(
    method set_xavier_config (line 358) | def set_xavier_config(self, value: Optional[Dict]):
    method set_worker_addresses (line 361) | def set_worker_addresses(self, shard: int, worker_addresses: List[str]):
    method driver_info (line 371) | def driver_info(self) -> Optional[dict]:
    method need_create_pools (line 375) | def need_create_pools(self):
    method set_pool_addresses (line 378) | def set_pool_addresses(self, pool_addresses: List[str]):
    method get_pool_addresses (line 381) | def get_pool_addresses(self) -> Optional[List[str]]:
    method set_loop (line 384) | def set_loop(self, loop: asyncio.AbstractEventLoop):
    method _is_vllm_v1 (line 389) | def _is_vllm_v1(self) -> bool:
    method load (line 405) | def load(self):
    method wait_for_load (line 628) | def wait_for_load(self):
    method _set_context_length (line 640) | def _set_context_length(self):
    method _enable_v1_if_supported (line 652) | def _enable_v1_if_supported(self, engine_args: "vllm.AsyncEngineArgs"):
    method _preprocess_load_gguf (line 696) | def _preprocess_load_gguf(self):
    method stop (line 740) | def stop(self):
    method init_xavier (line 760) | async def init_xavier(self):
    method _check_healthy (line 763) | async def _check_healthy(self, interval: int = 30):
    method parse_str_field_to_dict (line 783) | def parse_str_field_to_dict(
    method _sanitize_model_config (line 828) | def _sanitize_model_config(
    method _sanitize_generate_config (line 891) | def _sanitize_generate_config(
    method check_lib (line 979) | def check_lib(cls) -> Union[bool, Tuple[bool, str]]:
    method match_json (line 989) | def match_json(
    method _convert_request_output_to_completion_chunk (line 1031) | def _convert_request_output_to_completion_chunk(
    method _convert_request_output_to_completion (line 1058) | def _convert_request_output_to_completion(
    method _get_tokenizer (line 1090) | async def _get_tokenizer(self, lora_request: Any) -> Any:
    method _tokenize (line 1124) | def _tokenize(self, tokenizer: Any, prompt: str, config: dict) -> List...
    method _gen_tokens_prompt (line 1148) | async def _gen_tokens_prompt(
    method async_generate (line 1162) | async def async_generate(
  class VLLMChatModel (line 1532) | class VLLMChatModel(VLLMModel, ChatModelMixin):
    method match_json (line 1534) | def match_json(
    method _sanitize_chat_config (line 1580) | def _sanitize_chat_config(
    method is_tool_call_chunk_start (line 1616) | def is_tool_call_chunk_start(chunk):
    method is_tool_call_chunk_end (line 1620) | def is_tool_call_chunk_end(chunk):
    method prefill_messages (line 1624) | def prefill_messages(messages: List[Dict]) -> List[Dict]:
    method async_chat (line 1650) | async def async_chat(
  class VLLMMultiModel (line 1718) | class VLLMMultiModel(VLLMModel, ChatModelMixin):
    method match_json (line 1720) | def match_json(
    method _attach_video_metadata (line 1777) | def _attach_video_metadata(
    method _sanitize_model_config (line 1801) | def _sanitize_model_config(
    method _sanitize_chat_config (line 1842) | def _sanitize_chat_config(
    method _gen_tokens_prompt (line 1861) | async def _gen_tokens_prompt(
    method _handle_base64_images (line 1882) | def _handle_base64_images(self, messages, temp_files):
    method async_chat (line 1930) | async def async_chat(

FILE: xinference/model/llm/vllm/distributed_executor.py
  class WorkerActor (line 45) | class WorkerActor(xo.StatelessActor):
    method __init__ (line 46) | def __init__(self, vllm_config: "VllmConfig", rpc_rank: int = 0, **kwa...
    method __post_create__ (line 50) | async def __post_create__(self):
    method __getattr__ (line 59) | def __getattr__(self, item):
    method gen_uid (line 63) | def gen_uid(cls, rank):
    method execute_method (line 66) | def execute_method(self, method: Union[str, Callable], *args, **kwargs):
  class WorkerWrapper (line 82) | class WorkerWrapper:
    method __init__ (line 83) | def __init__(
    method execute_method (line 91) | def execute_method(self, method: Union[str, Callable], *args, **kwargs):
    method execute_method_async (line 95) | async def execute_method_async(self, method: Union[str, Callable], *ar...
    method kill (line 98) | def kill(self):
  class XinferenceDistr

Condensed preview — 1635 files, each showing path, character count, and a content snippet. Download the .json file for the full structured content (23,502K chars).

[
  {
    "path": ".dockerignore",
    "chars": 106,
    "preview": "doc/\n.idea/\n.github/\nbuild/\nxinference.egg-info/\nxinference/web/ui/build/\nxinference/web/ui/node_modules/\n"
  },
  {
    "path": ".gitattributes",
    "chars": 36,
    "preview": "xinference/_version.py export-subst\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.yaml",
    "chars": 3204,
    "preview": "name: \"Bug Report\"\ndescription: Submit a bug report to help us improve Xinference. You should provide useful information"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.yaml",
    "chars": 932,
    "preview": "name: \"Feature request\"\ndescription: Submit a request for a new Xinference feature / 提交一个新的 Xinference 的功能建议\nlabels: [ \""
  },
  {
    "path": ".github/workflows/assign.yaml",
    "chars": 606,
    "preview": "name: Assign\non:\n  issue_comment:\n    types: created\n\npermissions:\n  contents: read\n\njobs:\n  issue_assign:\n    permissio"
  },
  {
    "path": ".github/workflows/docker-cd.yaml",
    "chars": 2690,
    "preview": "name: Xinference CD for DockerHub\n\non:\n  schedule:\n    - cron: '0 18 * * *'\n  push:\n    tags:\n      - '*'\n  workflow_dis"
  },
  {
    "path": ".github/workflows/issue.yaml",
    "chars": 760,
    "preview": "name: Close inactive issues\non:\n  schedule:\n    - cron: \"0 19 * * *\"\n  workflow_dispatch:\n\njobs:\n  close-issues:\n    run"
  },
  {
    "path": ".github/workflows/pr_auto_run_gen_docs.yaml",
    "chars": 6561,
    "preview": "name: Auto run gen_docs.py and commit changes to PR\n\non:\n  pull_request_target:\n    types: [opened, synchronize]\n\npermis"
  },
  {
    "path": ".github/workflows/python.yaml",
    "chars": 16842,
    "preview": "name: Python CI\n\non:\n  push:\n    branches:\n      - '*'\n  pull_request:\n    types: ['opened', 'reopened', 'synchronize']\n"
  },
  {
    "path": ".github/workflows/release.yaml",
    "chars": 1389,
    "preview": "name: Build and upload to PyPI\n\non:\n  push:\n    tags:\n      - '*'\n\nconcurrency:\n  group: ${{ github.workflow }}-${{ gith"
  },
  {
    "path": ".gitignore",
    "chars": 2078,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
  },
  {
    "path": ".pre-commit-config.yaml",
    "chars": 1102,
    "preview": "files: xinference\nrepos:\n  - repo: https://github.com/psf/black\n    rev: 25.1.0\n    hooks:\n      - id: black\n        exc"
  },
  {
    "path": ".readthedocs.yaml",
    "chars": 307,
    "preview": "version: 2\n\n# Build documentation in the docs/ directory with Sphinx\nsphinx:\n   configuration: doc/source/conf.py\n\nbuild"
  },
  {
    "path": "LICENSE",
    "chars": 11356,
    "preview": "                                 Apache License\n                           Version 2.0, January 2004\n                   "
  },
  {
    "path": "MANIFEST.in",
    "chars": 449,
    "preview": "global-include *.pyx\nglobal-include *.pxd\nglobal-include xinference/**/*.json\nglobal-exclude *.c\nglobal-exclude *.cpp\nin"
  },
  {
    "path": "README.md",
    "chars": 12557,
    "preview": "<div align=\"center\">\n<img src=\"./assets/xorbits-logo.png\" width=\"180px\" alt=\"xorbits\" />\n\n# Xorbits Inference: Model Ser"
  },
  {
    "path": "README_ja_JP.md",
    "chars": 6235,
    "preview": "<div align=\"center\">\n<img src=\"./assets/xorbits-logo.png\" width=\"180px\" alt=\"xorbits\" />\n\n# Xorbits Inference: モデルサービングを"
  },
  {
    "path": "README_zh_CN.md",
    "chars": 9724,
    "preview": "<div align=\"center\">\n<img src=\"./assets/xorbits-logo.png\" width=\"180px\" alt=\"xorbits\" />\n\n# Xorbits Inference：模型推理， 轻而易举"
  },
  {
    "path": "benchmark/README.md",
    "chars": 1916,
    "preview": "# Benchmarking Xinference\n\n## Downloading the ShareGPT dataset\n\nYou can download the dataset by running:\n```bash\nwget ht"
  },
  {
    "path": "benchmark/benchmark_embedding.py",
    "chars": 5820,
    "preview": "# Copyright 2022-2025 XProbe Inc.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use "
  },
  {
    "path": "benchmark/benchmark_latency.py",
    "chars": 3314,
    "preview": "# Copyright 2022-2023 XProbe Inc.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use "
  },
  {
    "path": "benchmark/benchmark_long.py",
    "chars": 4302,
    "preview": "# Copyright 2022-2023 XProbe Inc.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use "
  },
  {
    "path": "benchmark/benchmark_rerank.py",
    "chars": 6023,
    "preview": "# Copyright 2022-2023 XProbe Inc.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use "
  },
  {
    "path": "benchmark/benchmark_runner.py",
    "chars": 15640,
    "preview": "# Copyright 2022-2023 XProbe Inc.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use "
  },
  {
    "path": "benchmark/benchmark_serving.py",
    "chars": 5916,
    "preview": "# Copyright 2022-2023 XProbe Inc.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use "
  },
  {
    "path": "benchmark/utils.py",
    "chars": 6166,
    "preview": "# Copyright 2022-2023 XProbe Inc.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use "
  },
  {
    "path": "doc/Makefile",
    "chars": 1157,
    "preview": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line, and also\n# from the "
  },
  {
    "path": "doc/source/_static/switcher.json",
    "chars": 257,
    "preview": "[\n  {\n    \"name\": \"简体中文(Chinese)\",\n    \"version\": \"zh-cn\",\n    \"url\": \"https://inference.readthedocs.io/zh-cn/latest/\"\n "
  },
  {
    "path": "doc/source/conf.py",
    "chars": 4217,
    "preview": "# Configuration file for the Sphinx documentation builder.\n#\n# This file only contains a selection of the most common op"
  },
  {
    "path": "doc/source/development/contributing_codebase.rst",
    "chars": 4167,
    "preview": "=============================\nContributing to the code base\n=============================\n\n.. contents:: Table of conten"
  },
  {
    "path": "doc/source/development/contributing_environment.rst",
    "chars": 3966,
    "preview": "==================================\nCreating a development environment\n==================================\n\n.. contents:: "
  },
  {
    "path": "doc/source/development/index.rst",
    "chars": 172,
    "preview": ".. _development_index:\n\n===========\nDevelopment\n===========\n\n.. toctree::\n    :maxdepth: 2\n\n    contributing_environment"
  },
  {
    "path": "doc/source/development/xinference_internals.rst",
    "chars": 15258,
    "preview": "===========================\nThe internals of Xinference\n===========================\n\n.. contents:: Table of contents:\n  "
  },
  {
    "path": "doc/source/examples/ai_podcast.rst",
    "chars": 3269,
    "preview": ".. _examples_ai_podcast:\n\n======================\nExample: AI Podcast 🎙\n======================\n\n**Description**:\n\n🎙️AI Po"
  },
  {
    "path": "doc/source/examples/chatbot.rst",
    "chars": 1275,
    "preview": ".. _examples_chatbot:\n\n========================\nExample: CLI chatbot 🤖️\n========================\n\n**Description**:\n\nDemo"
  },
  {
    "path": "doc/source/examples/gradio_chatinterface.rst",
    "chars": 1376,
    "preview": ".. _examples_gradio_chatinterface:\n\n===============================\nExample: Gradio ChatInterface🤗\n====================="
  },
  {
    "path": "doc/source/examples/index.rst",
    "chars": 2180,
    "preview": ".. _examples_index:\n\n========\nExamples\n========\n\n.. toctree::\n   :maxdepth: 2\n   :hidden:\n\n   ai_podcast\n   chatbot\n   g"
  },
  {
    "path": "doc/source/examples/langchain_streamlit_doc_chat.rst",
    "chars": 1390,
    "preview": ".. _examples_langchain_streamlit_doc_chat:\n\n=======================================\nExample: LangChain Streamlit Doc Cha"
  },
  {
    "path": "doc/source/examples/pdf_chatbot.rst",
    "chars": 1062,
    "preview": ".. _examples_pdf_chatbot:\n\n======================\nExample: PDF Chatbot📚\n======================\n\n**Description**:\n\nThis e"
  },
  {
    "path": "doc/source/gen_docs.py",
    "chars": 27256,
    "preview": "# Copyright 2022-2023 XProbe Inc.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use "
  },
  {
    "path": "doc/source/getting_started/environments.rst",
    "chars": 3543,
    "preview": ".. _environments:\n\n======================\nEnvironments Variables\n======================\n\nXINFERENCE_ENDPOINT\n~~~~~~~~~~~"
  },
  {
    "path": "doc/source/getting_started/index.rst",
    "chars": 249,
    "preview": ".. _getting_started_index:\n\n===============\nGetting Started\n===============\n\n\n.. toctree::\n   :maxdepth: 2\n\n   installat"
  },
  {
    "path": "doc/source/getting_started/installation.rst",
    "chars": 6816,
    "preview": ".. _installation:\n\n============\nInstallation\n============\nXinference can be installed with ``pip`` on Linux, Windows, an"
  },
  {
    "path": "doc/source/getting_started/installation_npu.rst",
    "chars": 1661,
    "preview": ".. _installation_npu:\n\n\n=================================\nInstallation Guide for Ascend NPU\n============================"
  },
  {
    "path": "doc/source/getting_started/logging.rst",
    "chars": 1993,
    "preview": ".. _logging:\n\n=====================\nLogging in Xinference\n=====================\n\nConfigure Log Level\n###################"
  },
  {
    "path": "doc/source/getting_started/release_notes.rst",
    "chars": 3043,
    "preview": ".. _release_ntoes:\n\nRelease Notes\n=============\n\nThis page provides a version-by-version index of Xinference release not"
  },
  {
    "path": "doc/source/getting_started/troubleshooting.rst",
    "chars": 11927,
    "preview": ".. _troubleshooting:\n\n===============\nTroubleshooting\n===============\n\n\nNo huggingface repo access\n====================="
  },
  {
    "path": "doc/source/getting_started/using_docker_image.rst",
    "chars": 5390,
    "preview": ".. _using_docker_image:\n\n=======================\nXinference Docker Image\n=======================\n\nXinference provides of"
  },
  {
    "path": "doc/source/getting_started/using_kubernetes.rst",
    "chars": 2899,
    "preview": ".. _using_kubernetes:\n\n########################\nXinference on Kubernetes\n########################\n\n************\nHelm Sup"
  },
  {
    "path": "doc/source/getting_started/using_xinference.rst",
    "chars": 16025,
    "preview": ".. _using_xinference:\n\n================\nUsing Xinference\n================\n\n\nRun Xinference Locally\n====================="
  },
  {
    "path": "doc/source/index.rst",
    "chars": 6227,
    "preview": ".. _index:\n\n======================\nWelcome to Xinference!\n======================\n\n.. toctree::\n   :maxdepth: 2\n   :hidde"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/development/contributing_codebase.po",
    "chars": 8084,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/development/contributing_environment.po",
    "chars": 8155,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/development/index.po",
    "chars": 711,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/development/xinference_internals.po",
    "chars": 30084,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/examples/ai_podcast.po",
    "chars": 6656,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/examples/chatbot.po",
    "chars": 3258,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/examples/gradio_chatinterface.po",
    "chars": 3456,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/examples/index.po",
    "chars": 4598,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/examples/langchain_streamlit_doc_chat.po",
    "chars": 3667,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/examples/pdf_chatbot.po",
    "chars": 3049,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/getting_started/environments.po",
    "chars": 8166,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/getting_started/index.po",
    "chars": 719,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/getting_started/installation.po",
    "chars": 13125,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/getting_started/installation_npu.po",
    "chars": 3148,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/getting_started/logging.po",
    "chars": 3731,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/getting_started/release_notes.po",
    "chars": 4841,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2025, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/getting_started/troubleshooting.po",
    "chars": 20625,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_docker_image.po",
    "chars": 9767,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_kubernetes.po",
    "chars": 5066,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_xinference.po",
    "chars": 20502,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/getting_started.po",
    "chars": 719,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/index.po",
    "chars": 4952,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/audio/index.po",
    "chars": 879,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-base-en-v1.5.po",
    "chars": 1806,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-base-en.po",
    "chars": 1728,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-base-zh-v1.5.po",
    "chars": 1806,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-base-zh.po",
    "chars": 1728,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-large-en-v1.5.po",
    "chars": 1822,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-large-en.po",
    "chars": 1744,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-large-zh-noinstruct.po",
    "chars": 1912,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-large-zh-v1.5.po",
    "chars": 1822,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-large-zh.po",
    "chars": 1744,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-small-en-v1.5.po",
    "chars": 1821,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-small-zh-v1.5.po",
    "chars": 1821,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-small-zh.po",
    "chars": 1743,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/e5-large-v2.po",
    "chars": 1740,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/gte-base.po",
    "chars": 1691,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/gte-large.po",
    "chars": 1707,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/index.po",
    "chars": 838,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/jina-embeddings-v2-base-en.po",
    "chars": 1961,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/jina-embeddings-v2-small-en.po",
    "chars": 1976,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/multilingual-e5-large.po",
    "chars": 1890,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/flux.1-dev.po",
    "chars": 1464,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/flux.1-schnell.po",
    "chars": 1508,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/index.po",
    "chars": 879,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/kolors.po",
    "chars": 1437,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/sd-turbo.po",
    "chars": 1436,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/sd3-medium.po",
    "chars": 1496,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/sdxl-turbo.po",
    "chars": 1458,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/stable-diffusion-2-inpainting.po",
    "chars": 1667,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/stable-diffusion-inpainting.po",
    "chars": 1642,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/stable-diffusion-v1.5.po",
    "chars": 1656,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/stable-diffusion-xl-base-1.0.po",
    "chars": 1696,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/stable-diffusion-xl-inpainting.po",
    "chars": 1684,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/index.po",
    "chars": 716,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/baichuan-2-chat.po",
    "chars": 3184,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/baichuan-2.po",
    "chars": 3087,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/baichuan-chat.po",
    "chars": 2300,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/baichuan.po",
    "chars": 3639,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/chatglm.po",
    "chars": 2867,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/chatglm2-32k.po",
    "chars": 2351,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/chatglm2.po",
    "chars": 3041,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/chatglm3-32k.po",
    "chars": 2357,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/chatglm3.po",
    "chars": 3013,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/code-llama-instruct.po",
    "chars": 6565,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/code-llama-python.po",
    "chars": 6280,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/code-llama.po",
    "chars": 5754,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/deepseek-chat.po",
    "chars": 4681,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/deepseek-coder-instruct.po",
    "chars": 6644,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/falcon-instruct.po",
    "chars": 3074,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/falcon.po",
    "chars": 2750,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/glaive-coder.po",
    "chars": 2332,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/gorilla-openfunctions-v1.po",
    "chars": 3462,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/gpt-2.po",
    "chars": 2155,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/index.po",
    "chars": 23867,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/internlm-20b.po",
    "chars": 2360,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/internlm-7b.po",
    "chars": 2368,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/internlm-chat-20b.po",
    "chars": 2508,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/internlm-chat-7b.po",
    "chars": 2425,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/llama-2-chat.po",
    "chars": 5882,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/llama-2.po",
    "chars": 5344,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/mistral-instruct-v0.1.po",
    "chars": 3406,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/mistral-instruct-v0.2.po",
    "chars": 3515,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/mistral-v0.1.po",
    "chars": 3223,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/mixtral-instruct-v0.1.po",
    "chars": 3395,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/mixtral-v0.1.po",
    "chars": 3148,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/openbuddy.po",
    "chars": 2417,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/openhermes-2.5.po",
    "chars": 3144,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/opt.po",
    "chars": 2124,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/orca.po",
    "chars": 3360,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/qwen-chat.po",
    "chars": 7922,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/starchat-beta.po",
    "chars": 2304,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/starcoder.po",
    "chars": 2254,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/starcoderplus.po",
    "chars": 2310,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/tiny-llama.po",
    "chars": 2425,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/vicuna-v1.3.po",
    "chars": 5534,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/vicuna-v1.5-16k.po",
    "chars": 3014,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/vicuna-v1.5.po",
    "chars": 2892,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/wizardcoder-python-v1.0.po",
    "chars": 6395,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/wizardlm-v1.0.po",
    "chars": 3092,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/wizardmath-v1.0.po",
    "chars": 3788,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/xverse-chat.po",
    "chars": 2999,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/xverse.po",
    "chars": 3575,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/yi-200k.po",
    "chars": 2920,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/yi-chat.po",
    "chars": 2994,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/yi.po",
    "chars": 3472,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/zephyr-7b-alpha.po",
    "chars": 2421,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/zephyr-7b-beta.po",
    "chars": 2400,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/rerank/bge-reranker-base.po",
    "chars": 1349,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/rerank/bge-reranker-large.po",
    "chars": 1359,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/rerank/index.po",
    "chars": 828,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/video/cogvideox-2b.po",
    "chars": 1360,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/video/index.po",
    "chars": 879,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/custom.po",
    "chars": 16197,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/index.po",
    "chars": 6347,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/lora.po",
    "chars": 3132,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/audio.po",
    "chars": 30230,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/chat.po",
    "chars": 7978,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/embed.po",
    "chars": 4768,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/flexible.po",
    "chars": 10050,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/image.po",
    "chars": 17890,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/index.po",
    "chars": 726,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/multimodal.po",
    "chars": 8161,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/rerank.po",
    "chars": 1666,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/tools.po",
    "chars": 6717,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/video.po",
    "chars": 9698,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/model_memory.po",
    "chars": 3869,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/model_update.po",
    "chars": 2185,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2025, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/source/source.po",
    "chars": 4139,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/sources/sources.po",
    "chars": 2669,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po",
    "chars": 17514,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/xinference_model_hub.po",
    "chars": 9088,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2025, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/models/xinference_models_hub.po",
    "chars": 10329,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2025, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/reference/index.po",
    "chars": 14246,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/reference.po",
    "chars": 713,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/user_guide/auth_system.po",
    "chars": 9379,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/user_guide/backends.po",
    "chars": 15128,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/user_guide/cache_management.po",
    "chars": 726,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/user_guide/client_api.po",
    "chars": 6017,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/user_guide/continuous_batching.po",
    "chars": 6243,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/user_guide/distributed_inference.po",
    "chars": 4823,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/user_guide/index.po",
    "chars": 709,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/user_guide/launch.po",
    "chars": 7851,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2025, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/locale/zh_CN/LC_MESSAGES/user_guide/vllm_enhancement.po",
    "chars": 2561,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2023, Xorbits Inc.\n# This file is distributed under the same license as the Xi"
  },
  {
    "path": "doc/source/models/builtin/audio/belle-distilwhisper-large-v2-zh.rst",
    "chars": 512,
    "preview": ".. _models_builtin_belle-distilwhisper-large-v2-zh:\n\n===============================\nBelle-distilwhisper-large-v2-zh\n==="
  },
  {
    "path": "doc/source/models/builtin/audio/belle-whisper-large-v2-zh.rst",
    "chars": 470,
    "preview": ".. _models_builtin_belle-whisper-large-v2-zh:\n\n=========================\nBelle-whisper-large-v2-zh\n====================="
  },
  {
    "path": "doc/source/models/builtin/audio/belle-whisper-large-v3-zh.rst",
    "chars": 470,
    "preview": ".. _models_builtin_belle-whisper-large-v3-zh:\n\n=========================\nBelle-whisper-large-v3-zh\n====================="
  },
  {
    "path": "doc/source/models/builtin/audio/chattts.rst",
    "chars": 366,
    "preview": ".. _models_builtin_chattts:\n\n=======\nChatTTS\n=======\n\n- **Model Name:** ChatTTS\n- **Model Family:** ChatTTS\n- **Abilitie"
  },
  {
    "path": "doc/source/models/builtin/audio/cosyvoice-300m-instruct.rst",
    "chars": 485,
    "preview": ".. _models_builtin_cosyvoice-300m-instruct:\n\n=======================\nCosyVoice-300M-Instruct\n=======================\n\n- "
  }
]

// ... and 1435 more files (download for full content)

About this extraction

This page contains the full source code of the xorbitsai/inference GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 1635 files (67.4 MB), approximately 5.7M tokens, and a symbol index with 8657 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
