Repository: xorbitsai/inference
Branch: main
Commit: ebc027138775
Files: 1635
Total size: 67.4 MB

Directory structure:
gitextract_u_nl6j7f/ ├── .dockerignore ├── .gitattributes ├── .github/ │ ├── ISSUE_TEMPLATE/ │ │ ├── bug_report.yaml │ │ └── feature_request.yaml │ └── workflows/ │ ├── assign.yaml │ ├── docker-cd.yaml │ ├── issue.yaml │ ├── pr_auto_run_gen_docs.yaml │ ├── python.yaml │ └── release.yaml ├── .gitignore ├── .pre-commit-config.yaml ├── .readthedocs.yaml ├── LICENSE ├── MANIFEST.in ├── README.md ├── README_ja_JP.md ├── README_zh_CN.md ├── benchmark/ │ ├── README.md │ ├── benchmark_embedding.py │ ├── benchmark_latency.py │ ├── benchmark_long.py │ ├── benchmark_rerank.py │ ├── benchmark_runner.py │ ├── benchmark_serving.py │ └── utils.py ├── doc/ │ ├── Makefile │ ├── source/ │ │ ├── _static/ │ │ │ └── switcher.json │ │ ├── conf.py │ │ ├── development/ │ │ │ ├── contributing_codebase.rst │ │ │ ├── contributing_environment.rst │ │ │ ├── index.rst │ │ │ └── xinference_internals.rst │ │ ├── examples/ │ │ │ ├── ai_podcast.rst │ │ │ ├── chatbot.rst │ │ │ ├── gradio_chatinterface.rst │ │ │ ├── index.rst │ │ │ ├── langchain_streamlit_doc_chat.rst │ │ │ └── pdf_chatbot.rst │ │ ├── gen_docs.py │ │ ├── getting_started/ │ │ │ ├── environments.rst │ │ │ ├── index.rst │ │ │ ├── installation.rst │ │ │ ├── installation_npu.rst │ │ │ ├── logging.rst │ │ │ ├── release_notes.rst │ │ │ ├── troubleshooting.rst │ │ │ ├── using_docker_image.rst │ │ │ ├── using_kubernetes.rst │ │ │ └── using_xinference.rst │ │ ├── index.rst │ │ ├── locale/ │ │ │ └── zh_CN/ │ │ │ └── LC_MESSAGES/ │ │ │ ├── development/ │ │ │ │ ├── contributing_codebase.po │ │ │ │ ├── contributing_environment.po │ │ │ │ ├── index.po │ │ │ │ └── xinference_internals.po │ │ │ ├── examples/ │ │ │ │ ├── ai_podcast.po │ │ │ │ ├── chatbot.po │ │ │ │ ├── gradio_chatinterface.po │ │ │ │ ├── index.po │ │ │ │ ├── langchain_streamlit_doc_chat.po │ │ │ │ └── pdf_chatbot.po │ │ │ ├── getting_started/ │ │ │ │ ├── 
environments.po │ │ │ │ ├── index.po │ │ │ │ ├── installation.po │ │ │ │ ├── installation_npu.po │ │ │ │ ├── logging.po │ │ │ │ ├── release_notes.po │ │ │ │ ├── troubleshooting.po │ │ │ │ ├── using_docker_image.po │ │ │ │ ├── using_kubernetes.po │ │ │ │ └── using_xinference.po │ │ │ ├── getting_started.po │ │ │ ├── index.po │ │ │ ├── models/ │ │ │ │ ├── builtin/ │ │ │ │ │ ├── audio/ │ │ │ │ │ │ └── index.po │ │ │ │ │ ├── embedding/ │ │ │ │ │ │ ├── bge-base-en-v1.5.po │ │ │ │ │ │ ├── bge-base-en.po │ │ │ │ │ │ ├── bge-base-zh-v1.5.po │ │ │ │ │ │ ├── bge-base-zh.po │ │ │ │ │ │ ├── bge-large-en-v1.5.po │ │ │ │ │ │ ├── bge-large-en.po │ │ │ │ │ │ ├── bge-large-zh-noinstruct.po │ │ │ │ │ │ ├── bge-large-zh-v1.5.po │ │ │ │ │ │ ├── bge-large-zh.po │ │ │ │ │ │ ├── bge-small-en-v1.5.po │ │ │ │ │ │ ├── bge-small-zh-v1.5.po │ │ │ │ │ │ ├── bge-small-zh.po │ │ │ │ │ │ ├── e5-large-v2.po │ │ │ │ │ │ ├── gte-base.po │ │ │ │ │ │ ├── gte-large.po │ │ │ │ │ │ ├── index.po │ │ │ │ │ │ ├── jina-embeddings-v2-base-en.po │ │ │ │ │ │ ├── jina-embeddings-v2-small-en.po │ │ │ │ │ │ └── multilingual-e5-large.po │ │ │ │ │ ├── image/ │ │ │ │ │ │ ├── flux.1-dev.po │ │ │ │ │ │ ├── flux.1-schnell.po │ │ │ │ │ │ ├── index.po │ │ │ │ │ │ ├── kolors.po │ │ │ │ │ │ ├── sd-turbo.po │ │ │ │ │ │ ├── sd3-medium.po │ │ │ │ │ │ ├── sdxl-turbo.po │ │ │ │ │ │ ├── stable-diffusion-2-inpainting.po │ │ │ │ │ │ ├── stable-diffusion-inpainting.po │ │ │ │ │ │ ├── stable-diffusion-v1.5.po │ │ │ │ │ │ ├── stable-diffusion-xl-base-1.0.po │ │ │ │ │ │ └── stable-diffusion-xl-inpainting.po │ │ │ │ │ ├── index.po │ │ │ │ │ ├── llm/ │ │ │ │ │ │ ├── baichuan-2-chat.po │ │ │ │ │ │ ├── baichuan-2.po │ │ │ │ │ │ ├── baichuan-chat.po │ │ │ │ │ │ ├── baichuan.po │ │ │ │ │ │ ├── chatglm.po │ │ │ │ │ │ ├── chatglm2-32k.po │ │ │ │ │ │ ├── chatglm2.po │ │ │ │ │ │ ├── chatglm3-32k.po │ │ │ │ │ │ ├── chatglm3.po │ │ │ │ │ │ ├── code-llama-instruct.po │ │ │ │ │ │ ├── code-llama-python.po │ │ │ │ │ │ ├── code-llama.po │ │ │ │ │ │ ├── 
deepseek-chat.po │ │ │ │ │ │ ├── deepseek-coder-instruct.po │ │ │ │ │ │ ├── falcon-instruct.po │ │ │ │ │ │ ├── falcon.po │ │ │ │ │ │ ├── glaive-coder.po │ │ │ │ │ │ ├── gorilla-openfunctions-v1.po │ │ │ │ │ │ ├── gpt-2.po │ │ │ │ │ │ ├── index.po │ │ │ │ │ │ ├── internlm-20b.po │ │ │ │ │ │ ├── internlm-7b.po │ │ │ │ │ │ ├── internlm-chat-20b.po │ │ │ │ │ │ ├── internlm-chat-7b.po │ │ │ │ │ │ ├── llama-2-chat.po │ │ │ │ │ │ ├── llama-2.po │ │ │ │ │ │ ├── mistral-instruct-v0.1.po │ │ │ │ │ │ ├── mistral-instruct-v0.2.po │ │ │ │ │ │ ├── mistral-v0.1.po │ │ │ │ │ │ ├── mixtral-instruct-v0.1.po │ │ │ │ │ │ ├── mixtral-v0.1.po │ │ │ │ │ │ ├── openbuddy.po │ │ │ │ │ │ ├── openhermes-2.5.po │ │ │ │ │ │ ├── opt.po │ │ │ │ │ │ ├── orca.po │ │ │ │ │ │ ├── qwen-chat.po │ │ │ │ │ │ ├── starchat-beta.po │ │ │ │ │ │ ├── starcoder.po │ │ │ │ │ │ ├── starcoderplus.po │ │ │ │ │ │ ├── tiny-llama.po │ │ │ │ │ │ ├── vicuna-v1.3.po │ │ │ │ │ │ ├── vicuna-v1.5-16k.po │ │ │ │ │ │ ├── vicuna-v1.5.po │ │ │ │ │ │ ├── wizardcoder-python-v1.0.po │ │ │ │ │ │ ├── wizardlm-v1.0.po │ │ │ │ │ │ ├── wizardmath-v1.0.po │ │ │ │ │ │ ├── xverse-chat.po │ │ │ │ │ │ ├── xverse.po │ │ │ │ │ │ ├── yi-200k.po │ │ │ │ │ │ ├── yi-chat.po │ │ │ │ │ │ ├── yi.po │ │ │ │ │ │ ├── zephyr-7b-alpha.po │ │ │ │ │ │ └── zephyr-7b-beta.po │ │ │ │ │ ├── rerank/ │ │ │ │ │ │ ├── bge-reranker-base.po │ │ │ │ │ │ ├── bge-reranker-large.po │ │ │ │ │ │ └── index.po │ │ │ │ │ └── video/ │ │ │ │ │ ├── cogvideox-2b.po │ │ │ │ │ └── index.po │ │ │ │ ├── custom.po │ │ │ │ ├── index.po │ │ │ │ ├── lora.po │ │ │ │ ├── model_abilities/ │ │ │ │ │ ├── audio.po │ │ │ │ │ ├── chat.po │ │ │ │ │ ├── embed.po │ │ │ │ │ ├── flexible.po │ │ │ │ │ ├── image.po │ │ │ │ │ ├── index.po │ │ │ │ │ ├── multimodal.po │ │ │ │ │ ├── rerank.po │ │ │ │ │ ├── tools.po │ │ │ │ │ └── video.po │ │ │ │ ├── model_memory.po │ │ │ │ ├── model_update.po │ │ │ │ ├── source/ │ │ │ │ │ └── source.po │ │ │ │ ├── sources/ │ │ │ │ │ └── sources.po │ │ │ │ ├── 
virtualenv.po │ │ │ │ ├── xinference_model_hub.po │ │ │ │ └── xinference_models_hub.po │ │ │ ├── reference/ │ │ │ │ └── index.po │ │ │ ├── reference.po │ │ │ └── user_guide/ │ │ │ ├── auth_system.po │ │ │ ├── backends.po │ │ │ ├── cache_management.po │ │ │ ├── client_api.po │ │ │ ├── continuous_batching.po │ │ │ ├── distributed_inference.po │ │ │ ├── index.po │ │ │ ├── launch.po │ │ │ └── vllm_enhancement.po │ │ ├── models/ │ │ │ ├── builtin/ │ │ │ │ ├── audio/ │ │ │ │ │ ├── belle-distilwhisper-large-v2-zh.rst │ │ │ │ │ ├── belle-whisper-large-v2-zh.rst │ │ │ │ │ ├── belle-whisper-large-v3-zh.rst │ │ │ │ │ ├── chattts.rst │ │ │ │ │ ├── cosyvoice-300m-instruct.rst │ │ │ │ │ ├── cosyvoice-300m-sft.rst │ │ │ │ │ ├── cosyvoice-300m.rst │ │ │ │ │ ├── cosyvoice2-0.5b.rst │ │ │ │ │ ├── f5-tts-mlx.rst │ │ │ │ │ ├── f5-tts.rst │ │ │ │ │ ├── fishspeech-1.5.rst │ │ │ │ │ ├── fun-asr-mlt-nano-2512.rst │ │ │ │ │ ├── fun-asr-nano-2512.rst │ │ │ │ │ ├── index.rst │ │ │ │ │ ├── indextts2.rst │ │ │ │ │ ├── kokoro-82m-mlx.rst │ │ │ │ │ ├── kokoro-82m-v1.1-zh.rst │ │ │ │ │ ├── kokoro-82m.rst │ │ │ │ │ ├── megatts3.rst │ │ │ │ │ ├── melotts-chinese.rst │ │ │ │ │ ├── melotts-english-v2.rst │ │ │ │ │ ├── melotts-english-v3.rst │ │ │ │ │ ├── melotts-english.rst │ │ │ │ │ ├── melotts-french.rst │ │ │ │ │ ├── melotts-japanese.rst │ │ │ │ │ ├── melotts-korean.rst │ │ │ │ │ ├── melotts-spanish.rst │ │ │ │ │ ├── paraformer-zh-hotword.rst │ │ │ │ │ ├── paraformer-zh-long.rst │ │ │ │ │ ├── paraformer-zh-spk.rst │ │ │ │ │ ├── paraformer-zh.rst │ │ │ │ │ ├── qwen3-asr-0.6b.rst │ │ │ │ │ ├── qwen3-asr-1.7b.rst │ │ │ │ │ ├── seaco-paraformer-zh.rst │ │ │ │ │ ├── sensevoicesmall.rst │ │ │ │ │ ├── whisper-base-mlx.rst │ │ │ │ │ ├── whisper-base.en-mlx.rst │ │ │ │ │ ├── whisper-base.en.rst │ │ │ │ │ ├── whisper-base.rst │ │ │ │ │ ├── whisper-large-v3-mlx.rst │ │ │ │ │ ├── whisper-large-v3-turbo-mlx.rst │ │ │ │ │ ├── whisper-large-v3-turbo.rst │ │ │ │ │ ├── whisper-large-v3.rst │ │ │ │ │ ├── 
whisper-medium-mlx.rst │ │ │ │ │ ├── whisper-medium.en-mlx.rst │ │ │ │ │ ├── whisper-medium.en.rst │ │ │ │ │ ├── whisper-medium.rst │ │ │ │ │ ├── whisper-small-mlx.rst │ │ │ │ │ ├── whisper-small.en-mlx.rst │ │ │ │ │ ├── whisper-small.en.rst │ │ │ │ │ ├── whisper-small.rst │ │ │ │ │ ├── whisper-tiny-mlx.rst │ │ │ │ │ ├── whisper-tiny.en-mlx.rst │ │ │ │ │ ├── whisper-tiny.en.rst │ │ │ │ │ └── whisper-tiny.rst │ │ │ │ ├── embedding/ │ │ │ │ │ ├── bce-embedding-base_v1.rst │ │ │ │ │ ├── bge-base-en-v1.5.rst │ │ │ │ │ ├── bge-base-en.rst │ │ │ │ │ ├── bge-base-zh-v1.5.rst │ │ │ │ │ ├── bge-base-zh.rst │ │ │ │ │ ├── bge-large-en-v1.5.rst │ │ │ │ │ ├── bge-large-en.rst │ │ │ │ │ ├── bge-large-zh-noinstruct.rst │ │ │ │ │ ├── bge-large-zh-v1.5.rst │ │ │ │ │ ├── bge-large-zh.rst │ │ │ │ │ ├── bge-m3.rst │ │ │ │ │ ├── bge-small-en-v1.5.rst │ │ │ │ │ ├── bge-small-zh-v1.5.rst │ │ │ │ │ ├── bge-small-zh.rst │ │ │ │ │ ├── e5-large-v2.rst │ │ │ │ │ ├── gme-qwen2-vl-2b-instruct.rst │ │ │ │ │ ├── gme-qwen2-vl-7b-instruct.rst │ │ │ │ │ ├── gte-base.rst │ │ │ │ │ ├── gte-large.rst │ │ │ │ │ ├── gte-qwen2.rst │ │ │ │ │ ├── index.rst │ │ │ │ │ ├── jina-clip-v2.rst │ │ │ │ │ ├── jina-embeddings-v2-base-en.rst │ │ │ │ │ ├── jina-embeddings-v2-base-zh.rst │ │ │ │ │ ├── jina-embeddings-v2-small-en.rst │ │ │ │ │ ├── jina-embeddings-v3.rst │ │ │ │ │ ├── jina-embeddings-v4.rst │ │ │ │ │ ├── m3e-base.rst │ │ │ │ │ ├── m3e-large.rst │ │ │ │ │ ├── m3e-small.rst │ │ │ │ │ ├── multilingual-e5-large.rst │ │ │ │ │ ├── qwen3-embedding-0.6b.rst │ │ │ │ │ ├── qwen3-embedding-4b.rst │ │ │ │ │ ├── qwen3-embedding-8b.rst │ │ │ │ │ ├── qwen3-vl-embedding-2b.rst │ │ │ │ │ ├── qwen3-vl-embedding-8b.rst │ │ │ │ │ ├── text2vec-base-chinese-paraphrase.rst │ │ │ │ │ ├── text2vec-base-chinese-sentence.rst │ │ │ │ │ ├── text2vec-base-chinese.rst │ │ │ │ │ ├── text2vec-base-multilingual.rst │ │ │ │ │ └── text2vec-large-chinese.rst │ │ │ │ ├── image/ │ │ │ │ │ ├── cogview4.rst │ │ │ │ │ ├── deepseek-ocr.rst │ │ │ │ 
│ ├── flux.1-dev.rst │ │ │ │ │ ├── flux.1-kontext-dev.rst │ │ │ │ │ ├── flux.1-schnell.rst │ │ │ │ │ ├── flux.2-dev.rst │ │ │ │ │ ├── flux.2-klein-4b.rst │ │ │ │ │ ├── flux.2-klein-9b.rst │ │ │ │ │ ├── got-ocr2_0.rst │ │ │ │ │ ├── hunyuandit-v1.2-distilled.rst │ │ │ │ │ ├── hunyuandit-v1.2.rst │ │ │ │ │ ├── hunyuanocr.rst │ │ │ │ │ ├── index.rst │ │ │ │ │ ├── kolors.rst │ │ │ │ │ ├── mineru2.5-2509-1.2b.rst │ │ │ │ │ ├── paddleocr-vl.rst │ │ │ │ │ ├── qwen-image-2512.rst │ │ │ │ │ ├── qwen-image-edit-2509.rst │ │ │ │ │ ├── qwen-image-edit-2511.rst │ │ │ │ │ ├── qwen-image-edit.rst │ │ │ │ │ ├── qwen-image-layered.rst │ │ │ │ │ ├── qwen-image.rst │ │ │ │ │ ├── sd-turbo.rst │ │ │ │ │ ├── sd3-medium.rst │ │ │ │ │ ├── sd3.5-large-turbo.rst │ │ │ │ │ ├── sd3.5-large.rst │ │ │ │ │ ├── sd3.5-medium.rst │ │ │ │ │ ├── sdxl-turbo.rst │ │ │ │ │ ├── stable-diffusion-2-inpainting.rst │ │ │ │ │ ├── stable-diffusion-inpainting.rst │ │ │ │ │ ├── stable-diffusion-v1.5.rst │ │ │ │ │ ├── stable-diffusion-xl-base-1.0.rst │ │ │ │ │ ├── stable-diffusion-xl-inpainting.rst │ │ │ │ │ ├── z-image-turbo.rst │ │ │ │ │ └── z-image.rst │ │ │ │ ├── index.rst │ │ │ │ ├── llm/ │ │ │ │ │ ├── baichuan-2-chat.rst │ │ │ │ │ ├── baichuan-2.rst │ │ │ │ │ ├── baichuan-m2.rst │ │ │ │ │ ├── code-llama-instruct.rst │ │ │ │ │ ├── code-llama-python.rst │ │ │ │ │ ├── code-llama.rst │ │ │ │ │ ├── codegeex4.rst │ │ │ │ │ ├── codeqwen1.5-chat.rst │ │ │ │ │ ├── codeqwen1.5.rst │ │ │ │ │ ├── codeshell-chat.rst │ │ │ │ │ ├── codeshell.rst │ │ │ │ │ ├── codestral-v0.1.rst │ │ │ │ │ ├── cogagent.rst │ │ │ │ │ ├── deepseek-chat.rst │ │ │ │ │ ├── deepseek-coder-instruct.rst │ │ │ │ │ ├── deepseek-coder.rst │ │ │ │ │ ├── deepseek-prover-v2.rst │ │ │ │ │ ├── deepseek-r1-0528-qwen3.rst │ │ │ │ │ ├── deepseek-r1-0528.rst │ │ │ │ │ ├── deepseek-r1-distill-llama.rst │ │ │ │ │ ├── deepseek-r1-distill-qwen.rst │ │ │ │ │ ├── deepseek-r1.rst │ │ │ │ │ ├── deepseek-v2-chat-0628.rst │ │ │ │ │ ├── deepseek-v2-chat.rst │ │ │ │ │ ├── 
deepseek-v2.5.rst │ │ │ │ │ ├── deepseek-v3-0324.rst │ │ │ │ │ ├── deepseek-v3.1.rst │ │ │ │ │ ├── deepseek-v3.2-exp.rst │ │ │ │ │ ├── deepseek-v3.2.rst │ │ │ │ │ ├── deepseek-v3.rst │ │ │ │ │ ├── deepseek-vl2.rst │ │ │ │ │ ├── deepseek.rst │ │ │ │ │ ├── dianjin-r1.rst │ │ │ │ │ ├── ernie4.5.rst │ │ │ │ │ ├── fin-r1.rst │ │ │ │ │ ├── gemma-3-1b-it.rst │ │ │ │ │ ├── gemma-3-it.rst │ │ │ │ │ ├── glm-4.1v-thinking.rst │ │ │ │ │ ├── glm-4.5.rst │ │ │ │ │ ├── glm-4.5v.rst │ │ │ │ │ ├── glm-4.6.rst │ │ │ │ │ ├── glm-4.7-flash.rst │ │ │ │ │ ├── glm-4.7.rst │ │ │ │ │ ├── glm-4v.rst │ │ │ │ │ ├── glm-5.rst │ │ │ │ │ ├── glm-edge-chat.rst │ │ │ │ │ ├── glm4-0414.rst │ │ │ │ │ ├── glm4-chat-1m.rst │ │ │ │ │ ├── glm4-chat.rst │ │ │ │ │ ├── gorilla-openfunctions-v2.rst │ │ │ │ │ ├── gpt-2.rst │ │ │ │ │ ├── gpt-oss.rst │ │ │ │ │ ├── huatuogpt-o1-llama-3.1.rst │ │ │ │ │ ├── huatuogpt-o1-qwen2.5.rst │ │ │ │ │ ├── index.rst │ │ │ │ │ ├── internlm3-instruct.rst │ │ │ │ │ ├── internvl3.rst │ │ │ │ │ ├── kat-v1.rst │ │ │ │ │ ├── kimi-k2.5.rst │ │ │ │ │ ├── llama-2-chat.rst │ │ │ │ │ ├── llama-2.rst │ │ │ │ │ ├── llama-3-instruct.rst │ │ │ │ │ ├── llama-3.1-instruct.rst │ │ │ │ │ ├── llama-3.1.rst │ │ │ │ │ ├── llama-3.2-vision-instruct.rst │ │ │ │ │ ├── llama-3.2-vision.rst │ │ │ │ │ ├── llama-3.3-instruct.rst │ │ │ │ │ ├── llama-3.rst │ │ │ │ │ ├── marco-o1.rst │ │ │ │ │ ├── mineru2.5-2509-1.2b.rst │ │ │ │ │ ├── minicpm-2b-dpo-bf16.rst │ │ │ │ │ ├── minicpm-2b-dpo-fp16.rst │ │ │ │ │ ├── minicpm-2b-dpo-fp32.rst │ │ │ │ │ ├── minicpm-2b-sft-bf16.rst │ │ │ │ │ ├── minicpm-2b-sft-fp32.rst │ │ │ │ │ ├── minicpm-v-2.6.rst │ │ │ │ │ ├── minicpm-v-4.5.rst │ │ │ │ │ ├── minicpm3-4b.rst │ │ │ │ │ ├── minicpm4.rst │ │ │ │ │ ├── minimax-m2.5.rst │ │ │ │ │ ├── minimax-m2.rst │ │ │ │ │ ├── mistral-instruct-v0.1.rst │ │ │ │ │ ├── mistral-instruct-v0.2.rst │ │ │ │ │ ├── mistral-instruct-v0.3.rst │ │ │ │ │ ├── mistral-large-instruct.rst │ │ │ │ │ ├── mistral-nemo-instruct.rst │ │ │ │ │ ├── 
mistral-v0.1.rst │ │ │ │ │ ├── mixtral-8x22b-instruct-v0.1.rst │ │ │ │ │ ├── mixtral-instruct-v0.1.rst │ │ │ │ │ ├── mixtral-v0.1.rst │ │ │ │ │ ├── moonlight-16b-a3b-instruct.rst │ │ │ │ │ ├── openhermes-2.5.rst │ │ │ │ │ ├── opt.rst │ │ │ │ │ ├── orion-chat.rst │ │ │ │ │ ├── ovis2.rst │ │ │ │ │ ├── phi-2.rst │ │ │ │ │ ├── phi-3-mini-128k-instruct.rst │ │ │ │ │ ├── phi-3-mini-4k-instruct.rst │ │ │ │ │ ├── qvq-72b-preview.rst │ │ │ │ │ ├── qwen-chat.rst │ │ │ │ │ ├── qwen1.5-chat.rst │ │ │ │ │ ├── qwen1.5-moe-chat.rst │ │ │ │ │ ├── qwen2-audio-instruct.rst │ │ │ │ │ ├── qwen2-instruct.rst │ │ │ │ │ ├── qwen2-moe-instruct.rst │ │ │ │ │ ├── qwen2-vl-instruct.rst │ │ │ │ │ ├── qwen2.5-coder-instruct.rst │ │ │ │ │ ├── qwen2.5-coder.rst │ │ │ │ │ ├── qwen2.5-instruct-1m.rst │ │ │ │ │ ├── qwen2.5-instruct.rst │ │ │ │ │ ├── qwen2.5-omni.rst │ │ │ │ │ ├── qwen2.5-vl-instruct.rst │ │ │ │ │ ├── qwen2.5.rst │ │ │ │ │ ├── qwen3-coder.rst │ │ │ │ │ ├── qwen3-instruct.rst │ │ │ │ │ ├── qwen3-next-instruct.rst │ │ │ │ │ ├── qwen3-next-thinking.rst │ │ │ │ │ ├── qwen3-omni-instruct.rst │ │ │ │ │ ├── qwen3-omni-thinking.rst │ │ │ │ │ ├── qwen3-thinking.rst │ │ │ │ │ ├── qwen3-vl-instruct.rst │ │ │ │ │ ├── qwen3-vl-thinking.rst │ │ │ │ │ ├── qwen3.5.rst │ │ │ │ │ ├── qwen3.rst │ │ │ │ │ ├── qwenlong-l1.rst │ │ │ │ │ ├── qwq-32b-preview.rst │ │ │ │ │ ├── qwq-32b.rst │ │ │ │ │ ├── seallm_v2.5.rst │ │ │ │ │ ├── seallm_v2.rst │ │ │ │ │ ├── seallms-v3.rst │ │ │ │ │ ├── seed-oss.rst │ │ │ │ │ ├── skywork-math.rst │ │ │ │ │ ├── skywork-or1-preview.rst │ │ │ │ │ ├── skywork-or1.rst │ │ │ │ │ ├── skywork.rst │ │ │ │ │ ├── telechat.rst │ │ │ │ │ ├── tiny-llama.rst │ │ │ │ │ ├── wizardcoder-python-v1.0.rst │ │ │ │ │ ├── wizardmath-v1.0.rst │ │ │ │ │ ├── xiyansql-qwencoder-2504.rst │ │ │ │ │ ├── xverse-chat.rst │ │ │ │ │ ├── xverse.rst │ │ │ │ │ ├── yi-1.5-chat-16k.rst │ │ │ │ │ ├── yi-1.5-chat.rst │ │ │ │ │ ├── yi-1.5.rst │ │ │ │ │ ├── yi-200k.rst │ │ │ │ │ ├── yi-chat.rst │ │ │ │ │ └── yi.rst 
│ │ │ │ ├── rerank/ │ │ │ │ │ ├── bce-reranker-base_v1.rst │ │ │ │ │ ├── bge-reranker-base.rst │ │ │ │ │ ├── bge-reranker-large.rst │ │ │ │ │ ├── bge-reranker-v2-gemma.rst │ │ │ │ │ ├── bge-reranker-v2-m3.rst │ │ │ │ │ ├── bge-reranker-v2-minicpm-layerwise.rst │ │ │ │ │ ├── index.rst │ │ │ │ │ ├── jina-reranker-v2.rst │ │ │ │ │ ├── jina-reranker-v3.rst │ │ │ │ │ ├── minicpm-reranker.rst │ │ │ │ │ ├── qwen3-reranker-0.6b.rst │ │ │ │ │ ├── qwen3-reranker-4b.rst │ │ │ │ │ ├── qwen3-reranker-8b.rst │ │ │ │ │ ├── qwen3-vl-reranker-2b.rst │ │ │ │ │ └── qwen3-vl-reranker-8b.rst │ │ │ │ └── video/ │ │ │ │ ├── cogvideox-2b.rst │ │ │ │ ├── cogvideox-5b.rst │ │ │ │ ├── hunyuanvideo.rst │ │ │ │ ├── index.rst │ │ │ │ ├── wan2.1-1.3b.rst │ │ │ │ ├── wan2.1-14b.rst │ │ │ │ ├── wan2.1-flf2v-14b-720p.rst │ │ │ │ ├── wan2.1-i2v-14b-480p.rst │ │ │ │ ├── wan2.1-i2v-14b-720p.rst │ │ │ │ ├── wan2.2-a14b.rst │ │ │ │ ├── wan2.2-i2v-a14b.rst │ │ │ │ └── wan2.2-ti2v-5b.rst │ │ │ ├── custom.rst │ │ │ ├── index.rst │ │ │ ├── lora.rst │ │ │ ├── model_abilities/ │ │ │ │ ├── audio.rst │ │ │ │ ├── chat.rst │ │ │ │ ├── embed.rst │ │ │ │ ├── flexible.rst │ │ │ │ ├── image.rst │ │ │ │ ├── index.rst │ │ │ │ ├── multimodal.rst │ │ │ │ ├── rerank.rst │ │ │ │ ├── tools.rst │ │ │ │ └── video.rst │ │ │ ├── model_memory.rst │ │ │ ├── model_update.rst │ │ │ ├── sources/ │ │ │ │ └── sources.rst │ │ │ ├── virtualenv.rst │ │ │ └── xinference_models_hub.rst │ │ ├── norm_zh.py │ │ ├── reference/ │ │ │ └── index.rst │ │ └── user_guide/ │ │ ├── auth_system.rst │ │ ├── backends.rst │ │ ├── client_api.rst │ │ ├── continuous_batching.rst │ │ ├── distributed_inference.rst │ │ ├── index.rst │ │ ├── launch.rst │ │ ├── metrics.rst │ │ └── vllm_enhancement.rst │ └── templates/ │ ├── audio.rst.jinja │ ├── audio_index.rst.jinja │ ├── embedding.rst.jinja │ ├── embedding_index.rst.jinja │ ├── image.rst.jinja │ ├── image_index.rst.jinja │ ├── llm.rst.jinja │ ├── llm_index.rst.jinja │ ├── metrics.jinja │ ├── rerank.rst.jinja │ 
├── rerank_index.rst.jinja │ ├── video.rst.jinja │ └── video_index.rst.jinja ├── examples/ │ ├── AI_podcast.py │ ├── AI_podcast_ZH.py │ ├── AI_translate.py │ ├── Custom_StableDiffusion_ControlNet.ipynb │ ├── FunctionCall.ipynb │ ├── LangChain_QA.ipynb │ ├── LangChain_Streamlit_Doc_Chat.py │ ├── StableDiffusionControlNet.ipynb │ ├── Xinference_Quick_Start.ipynb │ ├── audio_to_text.ipynb │ ├── chat.py │ ├── chat_vl.ipynb │ └── gradio_chatinterface.py ├── pyproject.toml ├── setup.cfg ├── setup.py ├── versioneer.py └── xinference/ ├── __init__.py ├── _compat.py ├── _version.py ├── api/ │ ├── __init__.py │ ├── dependencies.py │ ├── oauth2/ │ │ ├── __init__.py │ │ ├── auth_service.py │ │ ├── types.py │ │ └── utils.py │ ├── responses.py │ ├── restful_api.py │ ├── routers/ │ │ ├── __init__.py │ │ ├── admin.py │ │ ├── audio.py │ │ ├── embeddings.py │ │ ├── images.py │ │ ├── llm.py │ │ ├── models.py │ │ ├── rerank.py │ │ └── videos.py │ ├── schemas/ │ │ ├── __init__.py │ │ └── requests.py │ ├── tests/ │ │ ├── __init__.py │ │ ├── test_admin.py │ │ └── test_utils.py │ └── utils.py ├── client/ │ ├── __init__.py │ ├── common.py │ ├── handlers.py │ ├── restful/ │ │ ├── __init__.py │ │ ├── async_restful_client.py │ │ └── restful_client.py │ └── tests/ │ ├── __init__.py │ ├── test_async_client.py │ ├── test_async_client_with_auth.py │ ├── test_client.py │ └── test_client_with_auth.py ├── conftest.py ├── constants.py ├── core/ │ ├── __init__.py │ ├── cache_tracker.py │ ├── event.py │ ├── launch_strategy.py │ ├── metrics.py │ ├── model.py │ ├── otel.py │ ├── progress_tracker.py │ ├── resource.py │ ├── status_guard.py │ ├── supervisor.py │ ├── tests/ │ │ ├── __init__.py │ │ ├── test_continuous_batching.py │ │ ├── test_launch_strategy.py │ │ ├── test_metrics.py │ │ ├── test_model.py │ │ ├── test_progressor.py │ │ ├── test_restful_api.py │ │ ├── test_types.py │ │ ├── test_utils.py │ │ └── test_worker.py │ ├── utils.py │ ├── virtual_env_manager.py │ └── worker.py ├── deploy/ │ ├── 
__init__.py │ ├── cmdline.py │ ├── docker/ │ │ ├── Dockerfile │ │ ├── Dockerfile.cpu │ │ ├── docker-compose-distributed.yml │ │ ├── docker-compose.yml │ │ ├── requirements/ │ │ │ ├── requirements-base.txt │ │ │ ├── requirements-ml.txt │ │ │ └── requirements-models.txt │ │ └── requirements_cpu/ │ │ ├── requirements_cpu-base.txt │ │ ├── requirements_cpu-ml.txt │ │ └── requirements_cpu-models.txt │ ├── local.py │ ├── supervisor.py │ ├── test/ │ │ ├── __init__.py │ │ └── test_cmdline.py │ ├── utils.py │ └── worker.py ├── device_utils.py ├── fields.py ├── isolation.py ├── model/ │ ├── __init__.py │ ├── audio/ │ │ ├── __init__.py │ │ ├── chattts.py │ │ ├── core.py │ │ ├── cosyvoice.py │ │ ├── custom.py │ │ ├── f5tts.py │ │ ├── f5tts_mlx.py │ │ ├── fish_speech.py │ │ ├── funasr.py │ │ ├── indextts2.py │ │ ├── kokoro.py │ │ ├── kokoro_mlx.py │ │ ├── kokoro_zh.py │ │ ├── megatts.py │ │ ├── melotts.py │ │ ├── model_spec.json │ │ ├── qwen3_asr.py │ │ ├── tests/ │ │ │ ├── __init__.py │ │ │ ├── bbc_news.npy │ │ │ ├── jfk.flac │ │ │ ├── test_chattts.py │ │ │ ├── test_cosyvoice.py │ │ │ ├── test_f5tts.py │ │ │ ├── test_f5tts_mlx.py │ │ │ ├── test_fish_speech.py │ │ │ ├── test_funasr.py │ │ │ ├── test_kokoro.py │ │ │ ├── test_megatts.py │ │ │ ├── test_melotts.py │ │ │ ├── test_whisper.py │ │ │ └── test_whisper_mlx.py │ │ ├── utils.py │ │ ├── whisper.py │ │ └── whisper_mlx.py │ ├── batch.py │ ├── cache_manager.py │ ├── core.py │ ├── custom.py │ ├── embedding/ │ │ ├── __init__.py │ │ ├── cache_manager.py │ │ ├── core.py │ │ ├── custom.py │ │ ├── embed_family.py │ │ ├── flag/ │ │ │ ├── __init__.py │ │ │ ├── core.py │ │ │ └── tests/ │ │ │ ├── __init__.py │ │ │ └── test_flag.py │ │ ├── llama_cpp/ │ │ │ ├── __init__.py │ │ │ ├── core.py │ │ │ └── tests/ │ │ │ ├── __init__.py │ │ │ └── test_llama_cpp.py │ │ ├── model_spec.json │ │ ├── sentence_transformers/ │ │ │ ├── __init__.py │ │ │ ├── core.py │ │ │ └── tests/ │ │ │ ├── __init__.py │ │ │ └── test_sentence_transformers.py │ │ ├── 
tests/ │ │ │ ├── __init__.py │ │ │ ├── test_embedding_models.py │ │ │ ├── test_integrated_embedding.py │ │ │ └── test_qwen3_vl_engine_params.py │ │ └── vllm/ │ │ ├── __init__.py │ │ ├── core.py │ │ └── tests/ │ │ ├── __init__.py │ │ └── test_vllm_embedding.py │ ├── flexible/ │ │ ├── __init__.py │ │ ├── core.py │ │ ├── custom.py │ │ ├── launchers/ │ │ │ ├── __init__.py │ │ │ ├── image_process_launcher.py │ │ │ ├── modelscope_launcher.py │ │ │ ├── transformers_launcher.py │ │ │ └── yolo_launcher.py │ │ ├── tests/ │ │ │ ├── __init__.py │ │ │ └── test_flexible_models.py │ │ └── utils.py │ ├── image/ │ │ ├── __init__.py │ │ ├── cache_manager.py │ │ ├── core.py │ │ ├── custom.py │ │ ├── engine.py │ │ ├── engine_family.py │ │ ├── model_spec.json │ │ ├── ocr/ │ │ │ ├── __init__.py │ │ │ ├── deepseek_ocr.py │ │ │ ├── got_ocr2.py │ │ │ ├── hunyuan_ocr.py │ │ │ ├── mlx.py │ │ │ ├── ocr_family.py │ │ │ ├── paddleocr_vl.py │ │ │ └── vllm.py │ │ ├── scheduler/ │ │ │ ├── __init__.py │ │ │ └── flux.py │ │ ├── sdapi.py │ │ ├── stable_diffusion/ │ │ │ ├── __init__.py │ │ │ ├── core.py │ │ │ └── mlx.py │ │ ├── tests/ │ │ │ ├── __init__.py │ │ │ ├── test_got_ocr2.py │ │ │ └── test_stable_diffusion.py │ │ └── utils.py │ ├── llm/ │ │ ├── __init__.py │ │ ├── cache_manager.py │ │ ├── config_parser.py │ │ ├── core.py │ │ ├── custom.py │ │ ├── harmony.py │ │ ├── llama_cpp/ │ │ │ ├── __init__.py │ │ │ ├── core.py │ │ │ └── tests/ │ │ │ ├── __init__.py │ │ │ ├── test_gguf.py │ │ │ └── test_structured.py │ │ ├── llm_family.json │ │ ├── llm_family.py │ │ ├── lmdeploy/ │ │ │ ├── __init__.py │ │ │ ├── core.py │ │ │ └── tests/ │ │ │ └── __init__.py │ │ ├── memory.py │ │ ├── mlx/ │ │ │ ├── __init__.py │ │ │ ├── core.py │ │ │ ├── distributed_models/ │ │ │ │ ├── __init__.py │ │ │ │ ├── core.py │ │ │ │ ├── deepseek_v3.py │ │ │ │ ├── qwen2.py │ │ │ │ ├── qwen3.py │ │ │ │ └── qwen3_moe.py │ │ │ └── tests/ │ │ │ ├── __init__.py │ │ │ ├── test_distributed_model.py │ │ │ └── test_mlx.py │ │ ├── 
reasoning_parser.py │ │ ├── sglang/ │ │ │ ├── __init__.py │ │ │ └── core.py │ │ ├── tests/ │ │ │ ├── __init__.py │ │ │ ├── test_harmony.py │ │ │ ├── test_llm_family.py │ │ │ ├── test_llm_model.py │ │ │ ├── test_memory_estimate.py │ │ │ ├── test_multimodal.py │ │ │ ├── test_stream_options.py │ │ │ └── test_utils.py │ │ ├── tool_parsers/ │ │ │ ├── __init__.py │ │ │ ├── abstract_tool_parser.py │ │ │ ├── deepseek_r1_tool_parser.py │ │ │ ├── deepseek_v3_1_tool_parser.py │ │ │ ├── deepseek_v3_tool_parser.py │ │ │ ├── glm4_tool_parser.py │ │ │ ├── llama3_tool_parser.py │ │ │ ├── minimax_tool_parser.py │ │ │ ├── qwen_tool_parser.py │ │ │ └── tests/ │ │ │ ├── __init__.py │ │ │ ├── test_deepseek_r1_tool_parser.py │ │ │ ├── test_deepseek_v3_1_tool_parser.py │ │ │ ├── test_deepseek_v3_tool_parser.py │ │ │ ├── test_glm4_tool_parser.py │ │ │ ├── test_llama3_tool_parser.py │ │ │ └── test_qwen_tool_parser.py │ │ ├── transformers/ │ │ │ ├── __init__.py │ │ │ ├── chatglm.py │ │ │ ├── core.py │ │ │ ├── deepseek_v2.py │ │ │ ├── gemma3.py │ │ │ ├── gpt_oss.py │ │ │ ├── multimodal/ │ │ │ │ ├── __init__.py │ │ │ │ ├── cogagent.py │ │ │ │ ├── core.py │ │ │ │ ├── deepseek_vl2.py │ │ │ │ ├── gemma3.py │ │ │ │ ├── glm4_1v.py │ │ │ │ ├── glm4v.py │ │ │ │ ├── intern_vl.py │ │ │ │ ├── minicpmv26.py │ │ │ │ ├── minicpmv45.py │ │ │ │ ├── ovis2.py │ │ │ │ ├── qwen-omni.py │ │ │ │ ├── qwen2_audio.py │ │ │ │ └── qwen2_vl.py │ │ │ ├── opt.py │ │ │ ├── tensorizer_utils.py │ │ │ ├── tests/ │ │ │ │ ├── __init__.py │ │ │ │ ├── test_opt.py │ │ │ │ └── test_tensorizer.py │ │ │ └── utils.py │ │ ├── utils.py │ │ └── vllm/ │ │ ├── __init__.py │ │ ├── core.py │ │ ├── distributed_executor.py │ │ ├── distributed_executor_v1.py │ │ ├── tests/ │ │ │ ├── __init__.py │ │ │ ├── test_core_chat_model.py │ │ │ └── test_distributed_executor.py │ │ ├── utils.py │ │ └── xavier/ │ │ ├── __init__.py │ │ ├── allocator.py │ │ ├── block.py │ │ ├── block_manager.py │ │ ├── block_tracker.py │ │ ├── collective.py │ │ ├── 
collective_manager.py │ │ ├── engine.py │ │ ├── executor.py │ │ ├── scheduler.py │ │ ├── test/ │ │ │ ├── __init__.py │ │ │ └── test_xavier.py │ │ ├── transfer.py │ │ └── utils.py │ ├── rerank/ │ │ ├── __init__.py │ │ ├── cache_manager.py │ │ ├── core.py │ │ ├── custom.py │ │ ├── llama_cpp/ │ │ │ ├── __init__.py │ │ │ ├── core.py │ │ │ └── tests/ │ │ │ ├── __init__.py │ │ │ └── test_llama_cpp.py │ │ ├── model_spec.json │ │ ├── rerank_family.py │ │ ├── sentence_transformers/ │ │ │ ├── __init__.py │ │ │ ├── core.py │ │ │ └── tests/ │ │ │ ├── __init__.py │ │ │ └── test_sentence_transformers.py │ │ ├── tests/ │ │ │ ├── __init__.py │ │ │ ├── test_qwen3_vl_reranker_virtualenv.py │ │ │ └── test_rerank.py │ │ ├── utils.py │ │ └── vllm/ │ │ ├── __init__.py │ │ ├── core.py │ │ └── tests/ │ │ ├── __init__.py │ │ └── test_vllm.py │ ├── scheduler/ │ │ ├── __init__.py │ │ ├── batch.py │ │ ├── core.py │ │ └── request.py │ ├── tests/ │ │ ├── __init__.py │ │ └── test_utils.py │ ├── utils.py │ └── video/ │ ├── __init__.py │ ├── cache_manager.py │ ├── core.py │ ├── diffusers.py │ ├── model_spec.json │ └── tests/ │ ├── __init__.py │ └── test_diffusers_video.py ├── thirdparty/ │ ├── __init__.py │ ├── audiotools/ │ │ ├── __init__.py │ │ ├── core/ │ │ │ ├── __init__.py │ │ │ ├── audio_signal.py │ │ │ ├── display.py │ │ │ ├── dsp.py │ │ │ ├── effects.py │ │ │ ├── ffmpeg.py │ │ │ ├── loudness.py │ │ │ ├── playback.py │ │ │ ├── templates/ │ │ │ │ ├── __init__.py │ │ │ │ ├── headers.html │ │ │ │ ├── pandoc.css │ │ │ │ └── widget.html │ │ │ ├── util.py │ │ │ └── whisper.py │ │ ├── data/ │ │ │ ├── __init__.py │ │ │ ├── datasets.py │ │ │ ├── preprocess.py │ │ │ └── transforms.py │ │ ├── metrics/ │ │ │ ├── __init__.py │ │ │ ├── distance.py │ │ │ ├── quality.py │ │ │ └── spectral.py │ │ ├── ml/ │ │ │ ├── __init__.py │ │ │ ├── accelerator.py │ │ │ ├── decorators.py │ │ │ ├── experiment.py │ │ │ └── layers/ │ │ │ ├── __init__.py │ │ │ ├── base.py │ │ │ └── spectral_gate.py │ │ ├── post.py │ │ └── 
preference.py │ ├── cosyvoice/ │ │ ├── __init__.py │ │ ├── bin/ │ │ │ ├── average_model.py │ │ │ ├── export_jit.py │ │ │ ├── export_onnx.py │ │ │ ├── inference_deprecated.py │ │ │ ├── spk2info.pt │ │ │ └── train.py │ │ ├── cli/ │ │ │ ├── __init__.py │ │ │ ├── cosyvoice.py │ │ │ ├── frontend.py │ │ │ └── model.py │ │ ├── dataset/ │ │ │ ├── __init__.py │ │ │ ├── dataset.py │ │ │ └── processor.py │ │ ├── flow/ │ │ │ ├── decoder.py │ │ │ ├── flow.py │ │ │ ├── flow_matching.py │ │ │ └── length_regulator.py │ │ ├── hifigan/ │ │ │ ├── discriminator.py │ │ │ ├── f0_predictor.py │ │ │ ├── generator.py │ │ │ └── hifigan.py │ │ ├── llm/ │ │ │ └── llm.py │ │ ├── tokenizer/ │ │ │ ├── assets/ │ │ │ │ └── multilingual_zh_ja_yue_char_del.tiktoken │ │ │ └── tokenizer.py │ │ ├── transformer/ │ │ │ ├── __init__.py │ │ │ ├── activation.py │ │ │ ├── attention.py │ │ │ ├── convolution.py │ │ │ ├── decoder.py │ │ │ ├── decoder_layer.py │ │ │ ├── embedding.py │ │ │ ├── encoder.py │ │ │ ├── encoder_layer.py │ │ │ ├── label_smoothing_loss.py │ │ │ ├── positionwise_feed_forward.py │ │ │ ├── subsampling.py │ │ │ └── upsample_encoder.py │ │ ├── utils/ │ │ │ ├── __init__.py │ │ │ ├── class_utils.py │ │ │ ├── common.py │ │ │ ├── executor.py │ │ │ ├── file_utils.py │ │ │ ├── frontend_utils.py │ │ │ ├── losses.py │ │ │ ├── mask.py │ │ │ ├── scheduler.py │ │ │ └── train_utils.py │ │ └── vllm/ │ │ └── cosyvoice2.py │ ├── deepseek_vl/ │ │ ├── __init__.py │ │ ├── models/ │ │ │ ├── __init__.py │ │ │ ├── clip_encoder.py │ │ │ ├── image_processing_vlm.py │ │ │ ├── modeling_vlm.py │ │ │ ├── processing_vlm.py │ │ │ ├── projector.py │ │ │ ├── sam.py │ │ │ └── siglip_vit.py │ │ ├── serve/ │ │ │ ├── __init__.py │ │ │ ├── app_deepseek.py │ │ │ ├── app_modules/ │ │ │ │ ├── __init__.py │ │ │ │ ├── gradio_utils.py │ │ │ │ ├── overwrites.py │ │ │ │ ├── presets.py │ │ │ │ └── utils.py │ │ │ ├── assets/ │ │ │ │ ├── Kelpy-Codos.js │ │ │ │ ├── custom.css │ │ │ │ └── custom.js │ │ │ └── inference.py │ │ └── utils/ │ │ 
├── __init__.py │ │ ├── conversation.py │ │ └── io.py │ ├── deepseek_vl2/ │ │ ├── __init__.py │ │ ├── models/ │ │ │ ├── __init__.py │ │ │ ├── configuration_deepseek.py │ │ │ ├── conversation.py │ │ │ ├── modeling_deepseek.py │ │ │ ├── modeling_deepseek_vl_v2.py │ │ │ ├── processing_deepseek_vl_v2.py │ │ │ └── siglip_vit.py │ │ ├── serve/ │ │ │ ├── __init__.py │ │ │ ├── app_modules/ │ │ │ │ ├── __init__.py │ │ │ │ ├── gradio_utils.py │ │ │ │ ├── overwrites.py │ │ │ │ ├── presets.py │ │ │ │ └── utils.py │ │ │ ├── assets/ │ │ │ │ ├── Kelpy-Codos.js │ │ │ │ ├── custom.css │ │ │ │ ├── custom.js │ │ │ │ └── simsun.ttc │ │ │ └── inference.py │ │ └── utils/ │ │ ├── __init__.py │ │ └── io.py │ ├── f5_tts/ │ │ ├── __init__.py │ │ ├── api.py │ │ ├── configs/ │ │ │ ├── E2TTS_Base_train.yaml │ │ │ ├── E2TTS_Small_train.yaml │ │ │ ├── F5TTS_Base_train.yaml │ │ │ └── F5TTS_Small_train.yaml │ │ ├── eval/ │ │ │ ├── README.md │ │ │ ├── ecapa_tdnn.py │ │ │ ├── eval_infer_batch.py │ │ │ ├── eval_infer_batch.sh │ │ │ ├── eval_librispeech_test_clean.py │ │ │ ├── eval_seedtts_testset.py │ │ │ └── utils_eval.py │ │ ├── infer/ │ │ │ ├── README.md │ │ │ ├── examples/ │ │ │ │ ├── basic/ │ │ │ │ │ └── basic.toml │ │ │ │ ├── multi/ │ │ │ │ │ ├── country.flac │ │ │ │ │ ├── main.flac │ │ │ │ │ ├── story.toml │ │ │ │ │ ├── story.txt │ │ │ │ │ └── town.flac │ │ │ │ └── vocab.txt │ │ │ ├── infer_cli.py │ │ │ ├── infer_gradio.py │ │ │ ├── speech_edit.py │ │ │ └── utils_infer.py │ │ ├── model/ │ │ │ ├── __init__.py │ │ │ ├── backbones/ │ │ │ │ ├── README.md │ │ │ │ ├── dit.py │ │ │ │ ├── mmdit.py │ │ │ │ └── unett.py │ │ │ ├── cfm.py │ │ │ ├── dataset.py │ │ │ ├── modules.py │ │ │ ├── trainer.py │ │ │ └── utils.py │ │ ├── scripts/ │ │ │ ├── count_max_epoch.py │ │ │ └── count_params_gflops.py │ │ ├── socket_server.py │ │ └── train/ │ │ ├── README.md │ │ ├── datasets/ │ │ │ ├── prepare_csv_wavs.py │ │ │ ├── prepare_emilia.py │ │ │ ├── prepare_libritts.py │ │ │ ├── prepare_ljspeech.py │ │ │ └── 
prepare_wenetspeech4tts.py │ │ ├── finetune_cli.py │ │ ├── finetune_gradio.py │ │ └── train.py │ ├── fish_speech/ │ │ ├── __init__.py │ │ ├── fish_speech/ │ │ │ ├── __init__.py │ │ │ ├── callbacks/ │ │ │ │ ├── __init__.py │ │ │ │ └── grad_norm.py │ │ │ ├── configs/ │ │ │ │ ├── base.yaml │ │ │ │ ├── firefly_gan_vq.yaml │ │ │ │ ├── lora/ │ │ │ │ │ └── r_8_alpha_16.yaml │ │ │ │ └── text2semantic_finetune.yaml │ │ │ ├── conversation.py │ │ │ ├── datasets/ │ │ │ │ ├── concat_repeat.py │ │ │ │ ├── protos/ │ │ │ │ │ ├── text-data.proto │ │ │ │ │ ├── text_data_pb2.py │ │ │ │ │ └── text_data_stream.py │ │ │ │ ├── semantic.py │ │ │ │ └── vqgan.py │ │ │ ├── i18n/ │ │ │ │ ├── README.md │ │ │ │ ├── __init__.py │ │ │ │ ├── core.py │ │ │ │ ├── locale/ │ │ │ │ │ ├── en_US.json │ │ │ │ │ ├── es_ES.json │ │ │ │ │ ├── ja_JP.json │ │ │ │ │ ├── ko_KR.json │ │ │ │ │ ├── pt_BR.json │ │ │ │ │ └── zh_CN.json │ │ │ │ └── scan.py │ │ │ ├── models/ │ │ │ │ ├── text2semantic/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── lit_module.py │ │ │ │ │ ├── llama.py │ │ │ │ │ └── lora.py │ │ │ │ └── vqgan/ │ │ │ │ ├── __init__.py │ │ │ │ ├── modules/ │ │ │ │ │ ├── firefly.py │ │ │ │ │ └── fsq.py │ │ │ │ └── utils.py │ │ │ ├── scheduler.py │ │ │ ├── text/ │ │ │ │ ├── __init__.py │ │ │ │ ├── chn_text_norm/ │ │ │ │ │ ├── .gitignore │ │ │ │ │ ├── README.md │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── basic_class.py │ │ │ │ │ ├── basic_constant.py │ │ │ │ │ ├── basic_util.py │ │ │ │ │ ├── cardinal.py │ │ │ │ │ ├── date.py │ │ │ │ │ ├── digit.py │ │ │ │ │ ├── fraction.py │ │ │ │ │ ├── money.py │ │ │ │ │ ├── percentage.py │ │ │ │ │ ├── telephone.py │ │ │ │ │ └── text.py │ │ │ │ ├── clean.py │ │ │ │ └── spliter.py │ │ │ ├── tokenizer.py │ │ │ ├── train.py │ │ │ ├── utils/ │ │ │ │ ├── __init__.py │ │ │ │ ├── braceexpand.py │ │ │ │ ├── context.py │ │ │ │ ├── file.py │ │ │ │ ├── instantiators.py │ │ │ │ ├── logger.py │ │ │ │ ├── logging_utils.py │ │ │ │ ├── rich_utils.py │ │ │ │ ├── spectrogram.py │ │ │ │ └── utils.py │ │ │ 
└── webui/ │ │ │ ├── css/ │ │ │ │ └── style.css │ │ │ ├── html/ │ │ │ │ └── footer.html │ │ │ ├── js/ │ │ │ │ └── animate.js │ │ │ ├── launch_utils.py │ │ │ └── manage.py │ │ └── tools/ │ │ ├── api_client.py │ │ ├── api_server.py │ │ ├── download_models.py │ │ ├── e2e_webui.py │ │ ├── extract_model.py │ │ ├── file.py │ │ ├── fish_e2e.py │ │ ├── inference_engine/ │ │ │ ├── __init__.py │ │ │ ├── reference_loader.py │ │ │ ├── utils.py │ │ │ └── vq_manager.py │ │ ├── llama/ │ │ │ ├── build_dataset.py │ │ │ ├── eval_in_context.py │ │ │ ├── generate.py │ │ │ ├── merge_lora.py │ │ │ ├── quantize.py │ │ │ └── rebuild_tokenizer.py │ │ ├── run_webui.py │ │ ├── schema.py │ │ ├── sensevoice/ │ │ │ ├── README.md │ │ │ ├── __init__.py │ │ │ ├── auto_model.py │ │ │ ├── fun_asr.py │ │ │ └── vad_utils.py │ │ ├── server/ │ │ │ ├── agent/ │ │ │ │ ├── __init__.py │ │ │ │ ├── generate.py │ │ │ │ ├── generation_utils.py │ │ │ │ └── pre_generation_utils.py │ │ │ ├── api_utils.py │ │ │ ├── exception_handler.py │ │ │ ├── inference.py │ │ │ ├── model_manager.py │ │ │ ├── model_utils.py │ │ │ └── views.py │ │ ├── smart_pad.py │ │ ├── vqgan/ │ │ │ ├── create_train_split.py │ │ │ ├── extract_vq.py │ │ │ └── inference.py │ │ ├── webui/ │ │ │ ├── __init__.py │ │ │ ├── inference.py │ │ │ └── variables.py │ │ └── whisper_asr.py │ ├── indextts/ │ │ ├── BigVGAN/ │ │ │ ├── ECAPA_TDNN.py │ │ │ ├── __init__.py │ │ │ ├── activations.py │ │ │ ├── alias_free_activation/ │ │ │ │ ├── __init__.py │ │ │ │ ├── cuda/ │ │ │ │ │ ├── .gitignore │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── activation1d.py │ │ │ │ │ ├── anti_alias_activation.cpp │ │ │ │ │ ├── anti_alias_activation_cuda.cu │ │ │ │ │ ├── compat.h │ │ │ │ │ ├── load.py │ │ │ │ │ └── type_shim.h │ │ │ │ └── torch/ │ │ │ │ ├── __init__.py │ │ │ │ ├── act.py │ │ │ │ ├── filter.py │ │ │ │ └── resample.py │ │ │ ├── alias_free_torch/ │ │ │ │ ├── __init__.py │ │ │ │ ├── act.py │ │ │ │ ├── filter.py │ │ │ │ └── resample.py │ │ │ ├── bigvgan.py │ │ │ ├── models.py │ 
│ │ ├── nnet/ │ │ │ │ ├── CNN.py │ │ │ │ ├── __init__.py │ │ │ │ ├── linear.py │ │ │ │ └── normalization.py │ │ │ └── utils.py │ │ ├── __init__.py │ │ ├── cli.py │ │ ├── gpt/ │ │ │ ├── __init__.py │ │ │ ├── conformer/ │ │ │ │ ├── __init__.py │ │ │ │ ├── attention.py │ │ │ │ ├── embedding.py │ │ │ │ └── subsampling.py │ │ │ ├── conformer_encoder.py │ │ │ ├── model.py │ │ │ ├── model_v2.py │ │ │ ├── perceiver.py │ │ │ ├── transformers_beam_search.py │ │ │ ├── transformers_generation_utils.py │ │ │ ├── transformers_gpt2.py │ │ │ └── transformers_modeling_utils.py │ │ ├── infer.py │ │ ├── infer_v2.py │ │ ├── s2mel/ │ │ │ ├── dac/ │ │ │ │ ├── __init__.py │ │ │ │ ├── __main__.py │ │ │ │ ├── model/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── base.py │ │ │ │ │ ├── dac.py │ │ │ │ │ ├── discriminator.py │ │ │ │ │ └── encodec.py │ │ │ │ ├── nn/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── layers.py │ │ │ │ │ ├── loss.py │ │ │ │ │ └── quantize.py │ │ │ │ └── utils/ │ │ │ │ ├── __init__.py │ │ │ │ ├── decode.py │ │ │ │ └── encode.py │ │ │ ├── hf_utils.py │ │ │ ├── modules/ │ │ │ │ ├── alias_free_torch/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── act.py │ │ │ │ │ ├── filter.py │ │ │ │ │ └── resample.py │ │ │ │ ├── audio.py │ │ │ │ ├── bigvgan/ │ │ │ │ │ ├── activations.py │ │ │ │ │ ├── alias_free_activation/ │ │ │ │ │ │ ├── cuda/ │ │ │ │ │ │ │ ├── __init__.py │ │ │ │ │ │ │ ├── activation1d.py │ │ │ │ │ │ │ ├── anti_alias_activation.cpp │ │ │ │ │ │ │ ├── anti_alias_activation_cuda.cu │ │ │ │ │ │ │ ├── compat.h │ │ │ │ │ │ │ ├── load.py │ │ │ │ │ │ │ └── type_shim.h │ │ │ │ │ │ └── torch/ │ │ │ │ │ │ ├── __init__.py │ │ │ │ │ │ ├── act.py │ │ │ │ │ │ ├── filter.py │ │ │ │ │ │ └── resample.py │ │ │ │ │ ├── bigvgan.py │ │ │ │ │ ├── config.json │ │ │ │ │ ├── env.py │ │ │ │ │ ├── meldataset.py │ │ │ │ │ └── utils.py │ │ │ │ ├── campplus/ │ │ │ │ │ ├── DTDNN.py │ │ │ │ │ ├── classifier.py │ │ │ │ │ └── layers.py │ │ │ │ ├── commons.py │ │ │ │ ├── diffusion_transformer.py │ │ │ │ ├── encodec.py │ │ 
│ │ ├── flow_matching.py │ │ │ │ ├── gpt_fast/ │ │ │ │ │ ├── generate.py │ │ │ │ │ ├── model.py │ │ │ │ │ └── quantize.py │ │ │ │ ├── hifigan/ │ │ │ │ │ ├── f0_predictor.py │ │ │ │ │ └── generator.py │ │ │ │ ├── layers.py │ │ │ │ ├── length_regulator.py │ │ │ │ ├── openvoice/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── api.py │ │ │ │ │ ├── attentions.py │ │ │ │ │ ├── checkpoints_v2/ │ │ │ │ │ │ └── converter/ │ │ │ │ │ │ └── config.json │ │ │ │ │ ├── commons.py │ │ │ │ │ ├── mel_processing.py │ │ │ │ │ ├── models.py │ │ │ │ │ ├── modules.py │ │ │ │ │ ├── openvoice_app.py │ │ │ │ │ ├── se_extractor.py │ │ │ │ │ ├── transforms.py │ │ │ │ │ └── utils.py │ │ │ │ ├── quantize.py │ │ │ │ ├── rmvpe.py │ │ │ │ ├── vocos/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── heads.py │ │ │ │ │ ├── helpers.py │ │ │ │ │ ├── loss.py │ │ │ │ │ ├── models.py │ │ │ │ │ ├── modules.py │ │ │ │ │ ├── pretrained.py │ │ │ │ │ └── spectral_ops.py │ │ │ │ └── wavenet.py │ │ │ ├── optimizers.py │ │ │ └── wav2vecbert_extract.py │ │ ├── utils/ │ │ │ ├── __init__.py │ │ │ ├── arch_util.py │ │ │ ├── checkpoint.py │ │ │ ├── common.py │ │ │ ├── feature_extractors.py │ │ │ ├── front.py │ │ │ ├── maskgct/ │ │ │ │ └── models/ │ │ │ │ ├── codec/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── amphion_codec/ │ │ │ │ │ │ ├── codec.py │ │ │ │ │ │ ├── quantize/ │ │ │ │ │ │ │ ├── __init__.py │ │ │ │ │ │ │ ├── factorized_vector_quantize.py │ │ │ │ │ │ │ ├── lookup_free_quantize.py │ │ │ │ │ │ │ ├── residual_vq.py │ │ │ │ │ │ │ └── vector_quantize.py │ │ │ │ │ │ └── vocos.py │ │ │ │ │ ├── codec_dataset.py │ │ │ │ │ ├── codec_inference.py │ │ │ │ │ ├── codec_sampler.py │ │ │ │ │ ├── codec_trainer.py │ │ │ │ │ ├── facodec/ │ │ │ │ │ │ ├── __init__.py │ │ │ │ │ │ ├── alias_free_torch/ │ │ │ │ │ │ │ ├── __init__.py │ │ │ │ │ │ │ ├── act.py │ │ │ │ │ │ │ ├── filter.py │ │ │ │ │ │ │ └── resample.py │ │ │ │ │ │ ├── facodec_dataset.py │ │ │ │ │ │ ├── facodec_inference.py │ │ │ │ │ │ ├── facodec_trainer.py │ │ │ │ │ │ ├── modules/ │ │ │ │ 
│ │ │ ├── JDC/ │ │ │ │ │ │ │ │ ├── __init__.py │ │ │ │ │ │ │ │ ├── bst.t7 │ │ │ │ │ │ │ │ └── model.py │ │ │ │ │ │ │ ├── attentions.py │ │ │ │ │ │ │ ├── commons.py │ │ │ │ │ │ │ ├── gradient_reversal.py │ │ │ │ │ │ │ ├── layers.py │ │ │ │ │ │ │ ├── quantize.py │ │ │ │ │ │ │ ├── style_encoder.py │ │ │ │ │ │ │ └── wavenet.py │ │ │ │ │ │ └── optimizer.py │ │ │ │ │ ├── kmeans/ │ │ │ │ │ │ ├── repcodec_model.py │ │ │ │ │ │ └── vocos.py │ │ │ │ │ ├── melvqgan/ │ │ │ │ │ │ └── melspec.py │ │ │ │ │ ├── ns3_codec/ │ │ │ │ │ │ ├── README.md │ │ │ │ │ │ ├── __init__.py │ │ │ │ │ │ ├── alias_free_torch/ │ │ │ │ │ │ │ ├── __init__.py │ │ │ │ │ │ │ ├── act.py │ │ │ │ │ │ │ ├── filter.py │ │ │ │ │ │ │ └── resample.py │ │ │ │ │ │ ├── facodec.py │ │ │ │ │ │ ├── gradient_reversal.py │ │ │ │ │ │ ├── melspec.py │ │ │ │ │ │ ├── quantize/ │ │ │ │ │ │ │ ├── __init__.py │ │ │ │ │ │ │ ├── fvq.py │ │ │ │ │ │ │ └── rvq.py │ │ │ │ │ │ └── transformer.py │ │ │ │ │ ├── speechtokenizer/ │ │ │ │ │ │ ├── model.py │ │ │ │ │ │ └── modules/ │ │ │ │ │ │ ├── __init__.py │ │ │ │ │ │ ├── conv.py │ │ │ │ │ │ ├── lstm.py │ │ │ │ │ │ ├── norm.py │ │ │ │ │ │ ├── quantization/ │ │ │ │ │ │ │ ├── __init__.py │ │ │ │ │ │ │ ├── ac.py │ │ │ │ │ │ │ ├── core_vq.py │ │ │ │ │ │ │ ├── distrib.py │ │ │ │ │ │ │ └── vq.py │ │ │ │ │ │ └── seanet.py │ │ │ │ │ └── vevo/ │ │ │ │ │ └── vevo_repcodec.py │ │ │ │ └── tts/ │ │ │ │ └── maskgct/ │ │ │ │ ├── ckpt/ │ │ │ │ │ └── wav2vec2bert_stats.pt │ │ │ │ ├── llama_nar.py │ │ │ │ └── maskgct_s2a.py │ │ │ ├── maskgct_utils.py │ │ │ ├── text_utils.py │ │ │ ├── typical_sampling.py │ │ │ ├── utils.py │ │ │ ├── webui_utils.py │ │ │ └── xtransformers.py │ │ └── vqvae/ │ │ ├── __init__.py │ │ └── xtts_dvae.py │ ├── internvl/ │ │ ├── __init__.py │ │ └── conversation.py │ ├── llava/ │ │ ├── __init__.py │ │ ├── conversation.py │ │ ├── mm_utils.py │ │ └── model/ │ │ ├── __init__.py │ │ ├── clip_encoder/ │ │ │ ├── __init__.py │ │ │ ├── builder.py │ │ │ └── clip_encoder.py │ │ ├── constants.py 
│ │ ├── llava_arch.py │ │ ├── llava_llama.py │ │ └── multimodal_projector/ │ │ ├── __init__.py │ │ └── builder.py │ ├── matcha/ │ │ ├── VERSION │ │ ├── __init__.py │ │ ├── app.py │ │ ├── cli.py │ │ ├── data/ │ │ │ ├── __init__.py │ │ │ ├── components/ │ │ │ │ └── __init__.py │ │ │ └── text_mel_datamodule.py │ │ ├── hifigan/ │ │ │ ├── LICENSE │ │ │ ├── README.md │ │ │ ├── __init__.py │ │ │ ├── config.py │ │ │ ├── denoiser.py │ │ │ ├── env.py │ │ │ ├── meldataset.py │ │ │ ├── models.py │ │ │ └── xutils.py │ │ ├── models/ │ │ │ ├── __init__.py │ │ │ ├── baselightningmodule.py │ │ │ ├── components/ │ │ │ │ ├── __init__.py │ │ │ │ ├── decoder.py │ │ │ │ ├── flow_matching.py │ │ │ │ ├── text_encoder.py │ │ │ │ └── transformer.py │ │ │ └── matcha_tts.py │ │ ├── onnx/ │ │ │ ├── __init__.py │ │ │ ├── export.py │ │ │ └── infer.py │ │ ├── text/ │ │ │ ├── __init__.py │ │ │ ├── cleaners.py │ │ │ ├── numbers.py │ │ │ └── symbols.py │ │ ├── train.py │ │ └── utils/ │ │ ├── __init__.py │ │ ├── audio.py │ │ ├── generate_data_statistics.py │ │ ├── get_durations_from_trained_model.py │ │ ├── instantiators.py │ │ ├── logging_utils.py │ │ ├── model.py │ │ ├── monotonic_align/ │ │ │ ├── __init__.py │ │ │ ├── core.pyx │ │ │ └── setup.py │ │ ├── pylogger.py │ │ ├── rich_utils.py │ │ └── utils.py │ ├── megatts3/ │ │ ├── __init__.py │ │ └── tts/ │ │ ├── frontend_function.py │ │ ├── gradio_api.py │ │ ├── infer_cli.py │ │ ├── modules/ │ │ │ ├── aligner/ │ │ │ │ └── whisper_small.py │ │ │ ├── ar_dur/ │ │ │ │ ├── ar_dur_predictor.py │ │ │ │ └── commons/ │ │ │ │ ├── layers.py │ │ │ │ ├── nar_tts_modules.py │ │ │ │ ├── rel_transformer.py │ │ │ │ ├── rot_transformer.py │ │ │ │ ├── seq_utils.py │ │ │ │ └── transformer.py │ │ │ ├── llm_dit/ │ │ │ │ ├── cfm.py │ │ │ │ ├── dit.py │ │ │ │ ├── time_embedding.py │ │ │ │ └── transformer.py │ │ │ └── wavvae/ │ │ │ ├── decoder/ │ │ │ │ ├── diag_gaussian.py │ │ │ │ ├── hifigan_modules.py │ │ │ │ ├── seanet_encoder.py │ │ │ │ └── wavvae_v3.py │ │ │ └── 
encoder/ │ │ │ └── common_modules/ │ │ │ ├── conv.py │ │ │ ├── lstm.py │ │ │ └── seanet.py │ │ └── utils/ │ │ ├── audio_utils/ │ │ │ ├── align.py │ │ │ ├── io.py │ │ │ └── plot.py │ │ ├── commons/ │ │ │ ├── ckpt_utils.py │ │ │ └── hparams.py │ │ └── text_utils/ │ │ ├── dict.json │ │ ├── ph_tone_convert.py │ │ ├── split_text.py │ │ └── text_encoder.py │ ├── melo/ │ │ ├── __init__.py │ │ ├── api.py │ │ ├── app.py │ │ ├── attentions.py │ │ ├── commons.py │ │ ├── configs/ │ │ │ └── config.json │ │ ├── data/ │ │ │ └── example/ │ │ │ └── metadata.list │ │ ├── data_utils.py │ │ ├── download_utils.py │ │ ├── infer.py │ │ ├── init_downloads.py │ │ ├── losses.py │ │ ├── main.py │ │ ├── mel_processing.py │ │ ├── models.py │ │ ├── modules.py │ │ ├── monotonic_align/ │ │ │ ├── __init__.py │ │ │ └── core.py │ │ ├── preprocess_text.py │ │ ├── split_utils.py │ │ ├── text/ │ │ │ ├── __init__.py │ │ │ ├── chinese.py │ │ │ ├── chinese_bert.py │ │ │ ├── chinese_mix.py │ │ │ ├── cleaner.py │ │ │ ├── cleaner_multiling.py │ │ │ ├── cmudict.rep │ │ │ ├── cmudict_cache.pickle │ │ │ ├── english.py │ │ │ ├── english_bert.py │ │ │ ├── english_utils/ │ │ │ │ ├── __init__.py │ │ │ │ ├── abbreviations.py │ │ │ │ ├── number_norm.py │ │ │ │ └── time_norm.py │ │ │ ├── es_phonemizer/ │ │ │ │ ├── __init__.py │ │ │ │ ├── base.py │ │ │ │ ├── cleaner.py │ │ │ │ ├── es_symbols.json │ │ │ │ ├── es_symbols.txt │ │ │ │ ├── es_symbols_v2.json │ │ │ │ ├── es_to_ipa.py │ │ │ │ ├── example_ipa.txt │ │ │ │ ├── gruut_wrapper.py │ │ │ │ ├── punctuation.py │ │ │ │ ├── spanish_symbols.txt │ │ │ │ └── test.ipynb │ │ │ ├── fr_phonemizer/ │ │ │ │ ├── __init__.py │ │ │ │ ├── base.py │ │ │ │ ├── cleaner.py │ │ │ │ ├── en_symbols.json │ │ │ │ ├── example_ipa.txt │ │ │ │ ├── fr_symbols.json │ │ │ │ ├── fr_to_ipa.py │ │ │ │ ├── french_abbreviations.py │ │ │ │ ├── french_symbols.txt │ │ │ │ ├── gruut_wrapper.py │ │ │ │ └── punctuation.py │ │ │ ├── french.py │ │ │ ├── french_bert.py │ │ │ ├── japanese.py │ │ │ ├── 
japanese_bert.py │ │ │ ├── ko_dictionary.py │ │ │ ├── korean.py │ │ │ ├── opencpop-strict.txt │ │ │ ├── spanish.py │ │ │ ├── spanish_bert.py │ │ │ ├── symbols.py │ │ │ └── tone_sandhi.py │ │ ├── train.py │ │ ├── train.sh │ │ ├── transforms.py │ │ └── utils.py │ ├── mlx/ │ │ ├── __init__.py │ │ └── flux/ │ │ ├── __init__.py │ │ ├── autoencoder.py │ │ ├── clip.py │ │ ├── datasets.py │ │ ├── flux.py │ │ ├── layers.py │ │ ├── lora.py │ │ ├── model.py │ │ ├── sampler.py │ │ ├── t5.py │ │ ├── tokenizers.py │ │ ├── trainer.py │ │ └── utils.py │ └── whisper/ │ ├── __init__.py │ ├── __main__.py │ ├── assets/ │ │ ├── gpt2.tiktoken │ │ ├── mel_filters.npz │ │ └── multilingual.tiktoken │ ├── audio.py │ ├── decoding.py │ ├── model.py │ ├── normalizers/ │ │ ├── __init__.py │ │ ├── basic.py │ │ ├── english.json │ │ └── english.py │ ├── timing.py │ ├── tokenizer.py │ ├── transcribe.py │ ├── triton_ops.py │ ├── utils.py │ └── version.py ├── types.py ├── ui/ │ ├── __init__.py │ ├── gradio/ │ │ ├── __init__.py │ │ ├── chat_interface.py │ │ ├── media_interface.py │ │ └── utils/ │ │ ├── __init__.py │ │ └── latex.py │ └── web/ │ └── ui/ │ ├── .eslintignore │ ├── .eslintrc.yml │ ├── .gitignore │ ├── .prettierignore │ ├── .prettierrc.yml │ ├── package.json │ ├── public/ │ │ └── index.html │ └── src/ │ ├── App.js │ ├── components/ │ │ ├── MenuSide.js │ │ ├── Title.js │ │ ├── alertComponent.js │ │ ├── apiContext.js │ │ ├── authAlertDialog.js │ │ ├── copyComponent.js │ │ ├── deleteDialog.js │ │ ├── errorMessageSnackBar.js │ │ ├── fetchWrapper.js │ │ ├── fetcher.js │ │ ├── hotkeyFocusTextField.js │ │ ├── successMessageSnackBar.js │ │ ├── tableTitle.js │ │ ├── themeButton.js │ │ ├── themeContext.js │ │ ├── titleTypography.js │ │ ├── translateButton.js │ │ ├── utils.js │ │ └── versionLabel.js │ ├── i18n.js │ ├── index.css │ ├── index.js │ ├── locales/ │ │ ├── en.json │ │ ├── ja.json │ │ ├── ko.json │ │ └── zh.json │ ├── router/ │ │ └── index.js │ ├── scenes/ │ │ ├── _layout/ │ │ │ └── index.js 
│ │ ├── cluster_info/ │ │ │ ├── index.js │ │ │ ├── nodeInfo.js │ │ │ └── style.js │ │ ├── launch_model/ │ │ │ ├── LaunchModel.js │ │ │ ├── components/ │ │ │ │ ├── cachedListDialog.js │ │ │ │ ├── commandBuilder.js │ │ │ │ ├── dynamicFieldList.js │ │ │ │ ├── editCustomModelDialog.js │ │ │ │ ├── launchModelDrawer.js │ │ │ │ ├── modelFormConfig.js │ │ │ │ ├── pasteDialog.js │ │ │ │ ├── progress.js │ │ │ │ ├── selectField.js │ │ │ │ └── virtualenvListDialog.js │ │ │ ├── data/ │ │ │ │ └── data.js │ │ │ ├── index.js │ │ │ ├── launchCustom.js │ │ │ ├── modelCard.js │ │ │ └── styles/ │ │ │ └── modelCardStyle.css │ │ ├── login/ │ │ │ ├── header.js │ │ │ └── login.js │ │ ├── register_model/ │ │ │ ├── components/ │ │ │ │ ├── addControlnet.js │ │ │ │ ├── addModelSpecs.js │ │ │ │ ├── addStop.js │ │ │ │ └── addVirtualenv.js │ │ │ ├── data/ │ │ │ │ └── languages.js │ │ │ ├── index.js │ │ │ ├── registerModel.js │ │ │ └── styles/ │ │ │ └── registerModelStyle.css │ │ └── running_models/ │ │ └── index.js │ └── theme.js └── utils.py ================================================ FILE CONTENTS ================================================ ================================================ FILE: .dockerignore ================================================ doc/ .idea/ .github/ build/ xinference.egg-info/ xinference/web/ui/build/ xinference/web/ui/node_modules/ ================================================ FILE: .gitattributes ================================================ xinference/_version.py export-subst ================================================ FILE: .github/ISSUE_TEMPLATE/bug_report.yaml ================================================ name: "Bug Report" description: Submit a bug report to help us improve Xinference. You should provide useful information AMAP rather than simply describing what happened. 
/ 提交一个问题报告来帮助我们改进 Xinference。你必须提供有用的信息而不只是描述发生的现象,否则将不予处理。 body: - type: textarea id: system-info attributes: label: System Info / 系统信息 description: Your operating environment / 您的运行环境信息 placeholder: Includes CUDA version, transformers / xllamacpp / vllm version, Python version, operating system... / 包括CUDA版本,transformers / xllamacpp / vllm版本,Python版本,操作系统等。 validations: required: true - type: checkboxes id: information-scripts-examples attributes: label: Running Xinference with Docker? / 是否使用 Docker 运行 Xinference? description: 'How are you using Xinference? / 以何种方式使用 Xinference?' options: - label: docker / docker - label: pip install / 通过 pip install 安装 - label: installation from source / 从源码安装 - type: textarea id: start-way attributes: label: Version info / 版本信息 description: The version of Xinference you are running / Xinference 版本 validations: required: true - type: textarea id: commandline attributes: label: The command used to start Xinference / 用以启动 xinference 的命令 description: | Please provide the command used to start Xinference. If it is a distributed scenario, the commands for starting the supervisor and worker need to be listed separately. If it is a Docker scenario, please provide the complete command for starting Xinference through Docker. If it is another method, please describe it specifically. 请提供启动 xinference 的命令。 如果是分布式场景,启动 supervisor 和 worker 的命令需要分别列出。 如果是docker场景,请提供通过 docker 启动 xinference 的完整命令。 如果是其他方式,请具体描述。 validations: required: true - type: textarea id: reproduction validations: required: true attributes: label: Reproduction / 复现过程 description: | Please provide a code example that reproduces the problem you encountered, preferably with a minimal reproduction unit. If you have code snippets, error messages, stack traces, please provide them here as well. Please format your code correctly using code tags.
See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting Do not use screenshots, as they are difficult to read and (more importantly) do not allow others to copy and paste your code. 请提供能重现您遇到的问题的代码示例,最好是最小复现单元。 如果您有代码片段、错误信息、堆栈跟踪、涉及的命令行操作等也请在此提供。 请使用代码标签正确格式化您的代码。请参见 https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting 请勿使用截图,因为截图难以阅读,而且(更重要的是)不允许他人复制粘贴您的代码。 placeholder: | Steps to reproduce the behavior/复现Bug的步骤: 1. 2. 3. - type: textarea id: expected-behavior validations: required: true attributes: label: Expected behavior / 期待表现 description: "A clear and concise description of what you would expect to happen. / 简单描述您期望发生的事情。" ================================================ FILE: .github/ISSUE_TEMPLATE/feature_request.yaml ================================================ name: "Feature request" description: Submit a request for a new Xinference feature / 提交一个新的 Xinference 的功能建议 labels: [ "feature" ] body: - type: textarea id: feature-request validations: required: true attributes: label: Feature request / 功能建议 description: | A brief description of the functional proposal. 对功能建议的简述。 - type: textarea id: motivation validations: required: true attributes: label: Motivation / 动机 description: | Your motivation for making the suggestion. If that motivation is related to another GitHub issue, link to it here. 您提出建议的动机。如果该动机与另一个 GitHub 问题有关,请在此处提供对应的链接。 - type: textarea id: contribution validations: required: true attributes: label: Your contribution / 您的贡献 description: | Your PR link or any other link you can help with. 
您的PR链接或者其他您能提供帮助的链接。 ================================================ FILE: .github/workflows/assign.yaml ================================================ name: Assign on: issue_comment: types: created permissions: contents: read jobs: issue_assign: permissions: issues: write pull-requests: write runs-on: ubuntu-22.04 steps: - if: github.event.comment.body == 'take' run: | echo "Assigning issue ${{ github.event.issue.number }} to ${{ github.event.comment.user.login }}" curl -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" -d '{"assignees": ["${{ github.event.comment.user.login }}"]}' https://api.github.com/repos/${{ github.repository }}/issues/${{ github.event.issue.number }}/assignees ================================================ FILE: .github/workflows/docker-cd.yaml ================================================ name: Xinference CD for DockerHub on: schedule: - cron: '0 18 * * *' push: tags: - '*' workflow_dispatch: concurrency: group: ${{ github.workflow }}-${{ github.ref }} cancel-in-progress: true jobs: build: timeout-minutes: 240 runs-on: self-hosted strategy: matrix: python-version: [ "3.10" ] steps: - name: Check out code uses: actions/checkout@v3 with: fetch-depth: 0 submodules: recursive - name: Log in to Docker Hub uses: docker/login-action@v1 with: username: ${{ secrets.DOCKERHUB_USERNAME }} password: ${{ secrets.DOCKERHUB_PASSWORD }} - name: Build and push Docker image shell: bash if: ${{ github.repository == 'xorbitsai/inference' }} env: DOCKER_ORG: ${{ secrets.DOCKERHUB_USERNAME }} PY_VERSION: ${{ matrix.python-version }} run: | if [[ "$GITHUB_REF" =~ ^"refs/tags/" ]]; then export GIT_TAG=$(echo "$GITHUB_REF" | sed -e "s/refs\/tags\///g") fi docker system prune -f -a if [[ -n "$GIT_TAG" ]]; then BRANCHES="$GIT_TAG" echo "Will handle tag $BRANCHES" else MAINBRANCH=$(git rev-parse --abbrev-ref HEAD) BRANCHES="$MAINBRANCH" fi for branch in $BRANCHES; do if [[ -n "$GIT_TAG" ]]; then export IMAGE_TAG="$GIT_TAG" else git checkout $branch export 
IMAGE_TAG="nightly-$branch" fi docker build -t "$DOCKER_ORG/xinference:${IMAGE_TAG}" --progress=plain -f xinference/deploy/docker/Dockerfile . docker push "$DOCKER_ORG/xinference:${IMAGE_TAG}" docker build -t "$DOCKER_ORG/xinference:${IMAGE_TAG}-cpu" --progress=plain -f xinference/deploy/docker/Dockerfile.cpu . docker push "$DOCKER_ORG/xinference:${IMAGE_TAG}-cpu" echo "XINFERENCE_IMAGE_TAG=${IMAGE_TAG}" >> $GITHUB_ENV done if [[ -n "$GIT_TAG" ]]; then docker tag "$DOCKER_ORG/xinference:${GIT_TAG}" "$DOCKER_ORG/xinference:latest" docker push "$DOCKER_ORG/xinference:latest" docker tag "$DOCKER_ORG/xinference:${GIT_TAG}-cpu" "$DOCKER_ORG/xinference:latest-cpu" docker push "$DOCKER_ORG/xinference:latest-cpu" echo "XINFERENCE_GIT_TAG=${GIT_TAG}" >> $GITHUB_ENV fi - name: Clean docker image cache shell: bash if: ${{ github.repository == 'xorbitsai/inference' }} run: | docker system prune -f -a ================================================ FILE: .github/workflows/issue.yaml ================================================ name: Close inactive issues on: schedule: - cron: "0 19 * * *" workflow_dispatch: jobs: close-issues: runs-on: ubuntu-latest permissions: issues: write pull-requests: write steps: - uses: actions/stale@v9 with: days-before-issue-stale: 14 days-before-issue-close: 10 stale-issue-label: "stale" stale-issue-message: "This issue is stale because it has been open for 14 days with no activity." close-issue-message: "This issue was closed because it has been inactive for 10 days since being marked as stale." 
days-before-pr-stale: -1 days-before-pr-close: -1 operations-per-run: 500 repo-token: ${{ secrets.GITHUB_TOKEN }} ================================================ FILE: .github/workflows/pr_auto_run_gen_docs.yaml ================================================ name: Auto run gen_docs.py and commit changes to PR on: pull_request_target: types: [opened, synchronize] permissions: contents: write pull-requests: write jobs: run-gen-docs-and-commit: if: startsWith(github.event.pull_request.head.ref, 'chore/models-sync/') runs-on: ubuntu-latest steps: - name: Checkout base repository (trusted scripts) uses: actions/checkout@v4 with: ref: ${{ github.event.pull_request.base.ref }} repository: ${{ github.repository }} path: main fetch-depth: 0 - name: Checkout PR head branch (working copy) uses: actions/checkout@v4 with: ref: ${{ github.event.pull_request.head.ref }} repository: ${{ github.event.pull_request.head.repo.full_name }} path: pr fetch-depth: 0 - name: Decide whether to run gen_docs for latest commit id: decide working-directory: pr run: | set -e MSG="$(git log -1 --pretty=%B || echo "")" echo "Latest commit message: $MSG" if echo "$MSG" | grep -Eiq '\[(skip ci|ci skip)\]'; then echo "Skip token found in commit message; will not run." 
echo "run=false" >> $GITHUB_OUTPUT exit 0 fi HEAD_SHA="$(git rev-parse HEAD)" BASE_SHA="${{ github.event.pull_request.base.sha }}" RANGE="$BASE_SHA...$HEAD_SHA" echo "Diff range (full PR): $RANGE" CHANGED_FILES="$(git diff --name-only "$RANGE" || true)" echo "Changed files in PR range:" echo "$CHANGED_FILES" RUN="false" for f in $CHANGED_FILES; do case "$f" in xinference/model/llm/llm_family.json|xinference/model/embedding/model_spec.json|xinference/model/rerank/model_spec.json|xinference/model/image/model_spec.json|xinference/model/audio/model_spec.json|xinference/model/video/model_spec.json) RUN="true"; break;; esac done echo "run=$RUN" >> $GITHUB_OUTPUT - name: Set up Python if: steps.decide.outputs.run == 'true' uses: actions/setup-python@v5 with: python-version: '3.10' - name: Install gen_docs dependencies if: steps.decide.outputs.run == 'true' run: | python -m pip install --upgrade pip python -m pip install jinja2 python -m pip install "xinference[doc]" - name: Run gen_docs.py if present if: steps.decide.outputs.run == 'true' working-directory: pr run: | echo "[Debug] CWD: $(pwd)" echo "[Debug] List ../main:" ls -la ../main || true echo "[Debug] List ../main/doc/source:" ls -la ../main/doc/source || true # Use PR branch's gen_docs.py if it exists, otherwise use main branch's if [ -f "doc/source/gen_docs.py" ]; then echo "Using PR branch's doc/source/gen_docs.py" echo "Running pr/doc/source/gen_docs.py from its directory" (cd doc/source && python -u gen_docs.py) elif [ -f "../main/doc/source/gen_docs.py" ]; then echo "Copying main/doc/source/gen_docs.py into PR workspace" mkdir -p doc/source cp -f ../main/doc/source/gen_docs.py doc/source/gen_docs.py echo "Running pr/doc/source/gen_docs.py from its directory" (cd doc/source && python -u gen_docs.py) elif [ -f "gen_docs.py" ]; then echo "Using PR branch's gen_docs.py" echo "Running pr/gen_docs.py" python -u gen_docs.py elif [ -f "../main/gen_docs.py" ]; then echo "Copying main/gen_docs.py into PR workspace" cp 
-f ../main/gen_docs.py gen_docs.py echo "Running pr/gen_docs.py" python -u gen_docs.py else echo "gen_docs.py not found in main repository, skipping." fi - name: Stage and commit changes back to PR branch if: steps.decide.outputs.run == 'true' working-directory: pr run: | echo "[Debug] Before staging:" && git status --porcelain echo "[Debug] check-ignore for generated file:" git check-ignore -v doc/source/_generated/auto_generated.txt || echo "Not ignored" git add -A git add -f doc/source/_generated || true echo "[Debug] After staging:" && git status --porcelain echo "[Debug] Staged diff:" && git diff --cached --name-status || true if ! git diff --cached --quiet; then git config user.name "github-actions[bot]" git config user.email "41898282+github-actions[bot]@users.noreply.github.com" git commit -m "chore(docs): auto-run gen_docs.py" else echo "No changes to commit." fi - name: Push back for same-repo PR env: BRANCH: ${{ github.event.pull_request.head.ref }} if: steps.decide.outputs.run == 'true' && github.event.pull_request.head.repo.full_name == github.repository working-directory: pr run: | echo "Pushing changes to same-repo PR..." git push origin HEAD:$BRANCH || echo "No changes to push." - name: Push back for fork PR using maintainer PAT if: steps.decide.outputs.run == 'true' && github.event.pull_request.head.repo.full_name != github.repository && github.event.pull_request.maintainer_can_modify env: PUSH_TOKEN: ${{ secrets.PUSH_TOKEN }} BRANCH: ${{ github.event.pull_request.head.ref }} HEAD_FULL_NAME: ${{ github.event.pull_request.head.repo.full_name }} working-directory: pr run: | if [ -z "$PUSH_TOKEN" ]; then echo "Missing secrets.PUSH_TOKEN; cannot push to fork. Skipping push." exit 0 fi echo "Pushing changes to fork PR using maintainer PAT..." git remote set-url origin "https://x-access-token:${PUSH_TOKEN}@github.com/${HEAD_FULL_NAME}.git" git push origin HEAD:$BRANCH || echo "No changes to push." 
- name: Skip push for fork PR without maintainer edit permission if: steps.decide.outputs.run == 'true' && github.event.pull_request.head.repo.full_name != github.repository && !github.event.pull_request.maintainer_can_modify run: | echo "Fork PR does not allow edits by maintainers; run succeeded but skipping the push." ================================================ FILE: .github/workflows/python.yaml ================================================ name: Python CI on: push: branches: - '*' pull_request: types: ['opened', 'reopened', 'synchronize'] concurrency: group: ${{ github.workflow }}-${{ github.ref }} cancel-in-progress: true jobs: lint: runs-on: ${{ matrix.os }} strategy: fail-fast: false matrix: os: [ "ubuntu-latest" ] python-version: [ "3.10" ] steps: - name: Check out code uses: actions/checkout@v3 with: fetch-depth: 0 submodules: recursive - name: Set up Python environment uses: actions/setup-python@v4 with: python-version: "3.10" - name: Install pre-commit run: pip install pre-commit - name: Run pre-commit run: pre-commit run --all-files - name: Set up Node.js uses: actions/setup-node@v1 with: node-version: 16 # ESLint and Prettier must be in `package.json` - name: Install Node.js dependencies run: cd xinference/ui/web/ui && npm ci - name: ESLint Check run: cd xinference/ui/web/ui && npx eslint . - name: Prettier Check run: cd xinference/ui/web/ui && ./node_modules/.bin/prettier --check . 
build_test_job: runs-on: ${{ matrix.os }} needs: lint env: CONDA_ENV: test SELF_HOST_PYTHON: /root/miniconda3/envs/inference_test/bin/python SELF_HOST_CONDA: /root/miniconda3/condabin/conda defaults: run: shell: bash -l {0} strategy: fail-fast: false matrix: os: [ "ubuntu-latest", "macos-latest", "windows-latest" ] python-version: [ "3.10", "3.11", "3.12", "3.13" ] module: [ "xinference" ] exclude: - { os: macos-latest, python-version: 3.11 } - { os: macos-latest, python-version: 3.12 } - { os: windows-latest, python-version: 3.11 } - { os: windows-latest, python-version: 3.12 } include: - { os: self-hosted, module: gpu, python-version: "3.11"} - { os: macos-latest, module: metal, python-version: "3.10" } steps: - name: Check out code uses: actions/checkout@v3 with: fetch-depth: 0 submodules: recursive - name: Set up conda ${{ matrix.python-version }} uses: conda-incubator/setup-miniconda@v3 if: ${{ matrix.module != 'gpu' }} with: python-version: ${{ matrix.python-version }} activate-environment: ${{ env.CONDA_ENV }} # Important for python == 3.12 and 3.13 - name: Update pip and setuptools if: ${{ matrix.python-version == '3.12' || matrix.python-version == '3.13' }} run: | python -m pip install -U pip "setuptools<82" # Install torch for Python 3.13 using nightly builds - name: Install torch for Python 3.13 if: ${{ matrix.python-version == '3.13'}} run: | python -m pip install torch torchvision torchaudio - name: Install numpy if: | (startsWith(matrix.os, 'macos') && (matrix.python-version == '3.13')) || (startsWith(matrix.os, 'windows')) run: | python -m pip install "numpy<2" - name: Install dependencies env: MODULE: ${{ matrix.module }} OS: ${{ matrix.os }} if: ${{ matrix.module != 'gpu' }} run: | if [ "$OS" == "ubuntu-latest" ]; then sudo rm -rf /usr/share/dotnet sudo rm -rf /opt/ghc sudo rm -rf "/usr/local/share/boost" sudo rm -rf "$AGENT_TOOLSDIRECTORY" fi pip install -e ".[dev]" pip install "xllamacpp>=0.2.0" if [ "$MODULE" == "metal" ]; then conda install -c 
conda-forge "ffmpeg<7" pip install "mlx>=0.22.0" pip install mlx-lm pip install "mlx-vlm>=0.3.4" pip install mlx-whisper pip install f5-tts-mlx pip install qwen-vl-utils!=0.0.9 pip install tomli else pip install "transformers<4.49" pip install attrdict pip install "timm>=0.9.16" if [ "${{ matrix.python-version }}" != "3.13" ]; then pip install torch torchvision fi pip install accelerate pip install sentencepiece pip install transformers_stream_generator pip install bitsandbytes pip install "sentence-transformers>=5.1.1" pip install modelscope pip install diffusers pip install protobuf pip install FlagEmbedding pip install "tenacity>=8.2.0,<8.4.0" pip install "jinja2==3.1.2" pip install jj-pytorchvideo pip install qwen-vl-utils!=0.0.9 pip install datamodel_code_generator pip install jsonschema fi working-directory: . - name: Clean up disk if: | (startsWith(matrix.os, 'ubuntu')) run: | sudo rm -rf /usr/share/dotnet sudo rm -rf /usr/local/lib/android sudo rm -rf /opt/ghc sudo apt-get clean sudo rm -rf /var/lib/apt/lists/* df -h - name: Fix SSL on Windows if: startsWith(matrix.os, 'windows') shell: bash run: | echo "activate conda env" source $CONDA/etc/profile.d/conda.sh || true conda activate $CONDA_ENV || true python -V which python echo "before: $SSL_CERT_FILE" python -m pip install --quiet certifi || true SSL_CERT_FILE=$(python -c "import certifi,os;print(os.path.normpath(certifi.where()))") export SSL_CERT_FILE export REQUESTS_CA_BUNDLE=$SSL_CERT_FILE export CURL_CA_BUNDLE=$SSL_CERT_FILE echo "after: $SSL_CERT_FILE" echo "SSL_CERT_FILE=$(python -c 'import certifi;print(certifi.where())')" >> $GITHUB_ENV - name: Test with pytest env: MODULE: ${{ matrix.module }} PYTORCH_MPS_HIGH_WATERMARK_RATIO: 1.0 PYTORCH_MPS_LOW_WATERMARK_RATIO: 0.2 XFORMERS_FORCE_DISABLE_TRITON: 1 TORCH_DISABLE_FLASH_ATTENTION: 1 run: | if [ "$MODULE" == "gpu" ]; then ${{ env.SELF_HOST_PYTHON }} -m pip install -U -e ".[audio,dev]" ${{ env.SELF_HOST_PYTHON }} -m pip install -U "openai>1" ${{ 
env.SELF_HOST_PYTHON }} -m pip install -U modelscope ${{ env.SELF_HOST_PYTHON }} -m pip install -U gguf ${{ env.SELF_HOST_PYTHON }} -m pip install -U uv ${{ env.SELF_HOST_PYTHON }} -m pip install -U sse_starlette ${{ env.SELF_HOST_PYTHON }} -m pip install -U xoscar ${{ env.SELF_HOST_PYTHON }} -m pip install -U "python-jose[cryptography]" ${{ env.SELF_HOST_PYTHON }} -m pip install -U "passlib[bcrypt]" ${{ env.SELF_HOST_PYTHON }} -m pip install -U "aioprometheus[starlette]" ${{ env.SELF_HOST_PYTHON }} -m pip install -U "pynvml" ${{ env.SELF_HOST_PYTHON }} -m pip install "transformers==4.53.2" ${{ env.SELF_HOST_PYTHON }} -m pip install "funasr==1.2.7" ${{ env.SELF_HOST_PYTHON }} -m pip install -U "nemo_text_processing<1.1.0" ${{ env.SELF_HOST_PYTHON }} -m pip install -U "omegaconf~=2.3.0" ${{ env.SELF_HOST_PYTHON }} -m pip install -U "WeTextProcessing<1.0.4" ${{ env.SELF_HOST_PYTHON }} -m pip install -U librosa ${{ env.SELF_HOST_PYTHON }} -m pip install -U xxhash ${{ env.SELF_HOST_PYTHON }} -m pip install -U "ChatTTS>=0.2.1" ${{ env.SELF_HOST_PYTHON }} -m pip install -U HyperPyYAML ${{ env.SELF_HOST_PYTHON }} -m pip uninstall -y matcha-tts ${{ env.SELF_HOST_PYTHON }} -m pip install -U "onnxruntime-gpu==1.16.0; sys_platform == 'linux'" ${{ env.SELF_HOST_PYTHON }} -m pip install -U openai-whisper ${{ env.SELF_HOST_PYTHON }} -m pip install -U "torch==2.7.0" "torchaudio==2.7.0" "torchvision==0.22.0" ${{ env.SELF_HOST_PYTHON }} -m pip install -U "loguru" ${{ env.SELF_HOST_PYTHON }} -m pip install -U "natsort" ${{ env.SELF_HOST_PYTHON }} -m pip install -U "loralib" ${{ env.SELF_HOST_PYTHON }} -m pip install -U "ormsgpack" ${{ env.SELF_HOST_PYTHON }} -m pip uninstall -y opencc ${{ env.SELF_HOST_PYTHON }} -m pip uninstall -y "faster_whisper" ${{ env.SELF_HOST_PYTHON }} -m pip install -U accelerate ${{ env.SELF_HOST_PYTHON }} -m pip install -U verovio ${{ env.SELF_HOST_PYTHON }} -m pip install -U cachetools ${{ env.SELF_HOST_PYTHON }} -m pip install -U silero-vad ${{
env.SELF_HOST_PYTHON }} -m pip install -U pydantic ${{ env.SELF_HOST_PYTHON }} -m pip install -U diffusers ${{ env.SELF_HOST_PYTHON }} -m pip install -U onnx ${{ env.SELF_HOST_PYTHON }} -m pip install -U onnxconverter_common ${{ env.SELF_HOST_PYTHON }} -m pip install -U torchdiffeq ${{ env.SELF_HOST_PYTHON }} -m pip install -U "x_transformers>=1.31.14" ${{ env.SELF_HOST_PYTHON }} -m pip install -U pypinyin ${{ env.SELF_HOST_PYTHON }} -m pip install -U tomli ${{ env.SELF_HOST_PYTHON }} -m pip install -U vocos ${{ env.SELF_HOST_PYTHON }} -m pip install -U jieba ${{ env.SELF_HOST_PYTHON }} -m pip install -U soundfile ${{ env.SELF_HOST_PYTHON }} -m pip install tensorizer ${{ env.SELF_HOST_PYTHON }} -m pip install -U sentence-transformers ${{ env.SELF_HOST_PYTHON }} -m pip install -U FlagEmbedding ${{ env.SELF_HOST_PYTHON }} -m pip install -U "peft<=0.17.1" ${{ env.SELF_HOST_PYTHON }} -m pip install "xllamacpp>=0.2.0" --index-url https://xorbitsai.github.io/xllamacpp/whl/cu124 --extra-index-url https://pypi.org/simple ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \ --disable-warnings \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/core/tests/test_continuous_batching.py && \ ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/embedding/tests/test_qwen3_vl_engine_params.py && \ ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/rerank/tests/test_qwen3_vl_reranker_virtualenv.py && \ ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/image/tests/test_stable_diffusion.py && \ ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg 
--cov-report=xml --cov=xinference xinference/model/image/tests/test_got_ocr2.py && \ ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_whisper.py && \ ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_funasr.py && \ ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_chattts.py && \ ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_cosyvoice.py && \ ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_f5tts.py && \ ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_f5tts.py && \ ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_melotts.py && \ ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_kokoro.py && \ ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_fish_speech.py && \ ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ 
--cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_megatts.py && \ ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/embedding/tests/test_integrated_embedding.py && \ ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/embedding/vllm/tests/test_vllm_embedding.py && \ ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/llm/transformers/tests/test_tensorizer.py && \ ${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/llm/tests/test_llm_model.py elif [ "$MODULE" == "metal" ]; then pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/llm/mlx/tests/test_mlx.py && \ pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_whisper_mlx.py && \ pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/audio/tests/test_f5tts_mlx.py && \ pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ --cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/model/llm/mlx/tests/test_distributed_model.py else pytest --timeout=3000 \ -W ignore::PendingDeprecationWarning \ -vv \ --cov-config=setup.cfg \ --cov-report=xml \ --cov=xinference \ --ignore xinference/core/tests/test_continuous_batching.py \ --ignore xinference/model/image/tests/test_stable_diffusion.py \ --ignore xinference/model/image/tests/test_got_ocr2.py \ 
--ignore xinference/model/audio/tests \ --ignore xinference/model/embedding/tests/test_integrated_embedding.py \ --ignore xinference/model/llm/transformers/tests/test_tensorizer.py \ --ignore xinference/model/llm/tests/test_llm_model.py \ --ignore xinference/model/llm/vllm \ --ignore xinference/model/llm/sglang \ --ignore xinference/client/tests/test_client.py \ --ignore xinference/client/tests/test_async_client.py \ --ignore xinference/model/llm/mlx \ xinference fi working-directory: . ================================================ FILE: .github/workflows/release.yaml ================================================ name: Build and upload to PyPI on: push: tags: - '*' concurrency: group: ${{ github.workflow }}-${{ github.ref }} cancel-in-progress: true jobs: build-publish: name: Build and publish Python distribution to PyPI runs-on: ubuntu-latest steps: - name: Set up Python uses: actions/setup-python@v4 with: python-version: "3.10" - uses: actions/checkout@v3 - name: Install pypa/build run: >- python3 -m pip install build "setuptools<82" --user - name: Build web run: >- python setup.py build_web - name: Build a binary wheel and a source tarball run: >- python3 -m build --sdist --wheel --outdir dist/ . 
# if is xorbitsai repo, upload to pypi - uses: pypa/gh-action-pypi-publish@v1.5.0 if: github.repository == 'xorbitsai/inference' with: user: __token__ password: ${{ secrets.PYPI_PASSWORD }} # if is not xorbitsai repo, upload to test - uses: pypa/gh-action-pypi-publish@v1.5.0 if: github.repository != 'xorbitsai/inference' with: user: __token__ password: ${{ secrets.TEST_PYPI_PASSWORD }} verbose: true repository_url: https://test.pypi.org/legacy/ ================================================ FILE: .gitignore ================================================ # Byte-compiled / optimized / DLL files __pycache__/ *.py[cod] *$py.class # C extensions *.so # Distribution / packaging .Python build/ develop-eggs/ dist/ downloads/ eggs/ .eggs/ lib64/ parts/ sdist/ var/ wheels/ pip-wheel-metadata/ share/python-wheels/ *.egg-info/ .installed.cfg *.egg MANIFEST # PyInstaller # Usually these files are written by a python script from a template # before PyInstaller builds the exe, so as to inject date/other infos into it. *.manifest *.spec # Installer logs pip-log.txt pip-delete-this-directory.txt # Unit test / coverage reports htmlcov/ .tox/ .nox/ .coverage .coverage.* .cache nosetests.xml coverage.xml *.cover *.py,cover .hypothesis/ .pytest_cache/ # Translations *.mo *.pot # Django stuff: *.log local_settings.py db.sqlite3 db.sqlite3-journal # Flask stuff: instance/ .webassets-cache # Scrapy stuff: .scrapy # Sphinx documentation generated/ # PyBuilder target/ # Jupyter Notebook .ipynb_checkpoints # IPython profile_default/ ipython_config.py # pyenv .python-version # pipenv # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. # However, in case of collaboration, if having platform-specific dependencies or dependencies # having no cross-platform support, pipenv may install dependencies that don't work, or not # install all needed dependencies. #Pipfile.lock # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow __pypackages__/ # Celery stuff celerybeat-schedule celerybeat.pid # SageMath parsed files *.sage.py # Environments .env .venv env/ venv/ ENV/ env.bak/ venv.bak/ # Spyder project settings .spyderproject .spyproject # Rope project settings .ropeproject # mkdocs documentation /site # mypy .mypy_cache/ .dmypy.json dmypy.json # Pyre type checker .pyre/ # IDEs .idea .vscode *.iml # VIM *.sw* # web stuff node_modules/ static/ # Local docs (project notes, refactoring plans, etc.) docs/ # doc doc/source/savefig/ # local env local_env asv/results .DS_Store # Exclude markdown files except README files *.md !README.md !README_*.md ================================================ FILE: .pre-commit-config.yaml ================================================ files: xinference repos: - repo: https://github.com/psf/black rev: 25.1.0 hooks: - id: black exclude: thirdparty - repo: https://github.com/pre-commit/pre-commit-hooks rev: v5.0.0 hooks: - id: end-of-file-fixer exclude: ^xinference/thirdparty - id: trailing-whitespace exclude: thirdparty - repo: https://github.com/PyCQA/flake8 rev: 6.0.0 hooks: - id: flake8 args: [--config, setup.cfg] exclude: thirdparty - repo: https://github.com/pycqa/isort rev: 5.12.0 hooks: - id: isort args: [--sp, setup.cfg] exclude: thirdparty - repo: https://github.com/pre-commit/mirrors-mypy rev: v1.15.0 hooks: - id: mypy additional_dependencies: ["tokenize-rt==3.2.0", "types-requests", "types-tabulate"] args: [--ignore-missing-imports, --follow-imports, skip] exclude: thirdparty - repo: https://github.com/codespell-project/codespell rev: v2.2.2 hooks: - id: codespell args: [ --config, setup.cfg] exclude: thirdparty ================================================ FILE: .readthedocs.yaml ================================================ version: 2 # Build documentation in the doc/ directory with Sphinx sphinx: configuration: doc/source/conf.py build: os: ubuntu-20.04 tools: python: "3.10" python: install: - method: pip
path: . extra_requirements: - doc submodules: include: all recursive: true ================================================ FILE: LICENSE ================================================ Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). 
"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. 
Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the 
Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. 
Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. 
To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ================================================ FILE: MANIFEST.in ================================================ global-include *.pyx global-include *.pxd global-include xinference/**/*.json global-exclude *.c global-exclude *.cpp include setup.cfg include pyproject.toml global-exclude .DS_Store include versioneer.py include xinference/_version.py global-exclude conftest.py include xinference/locale/*.json include xinference/model/llm/*.json include xinference/model/embedding/*.json graft xinference/thirdparty global-include xinference/ui/web/ui/build/**/* ================================================ FILE: README.md ================================================
# Xorbits Inference: Model Serving Made Easy 🤖

Xinference Enterprise · Self-hosting · Documentation

[![PyPI Latest Release](https://img.shields.io/pypi/v/xinference.svg?style=for-the-badge)](https://pypi.org/project/xinference/) [![License](https://img.shields.io/pypi/l/xinference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE) [![Build Status](https://img.shields.io/github/actions/workflow/status/xorbitsai/inference/python.yaml?branch=main&style=for-the-badge&label=GITHUB%20ACTIONS&logo=github)](https://actions-badge.atrox.dev/xorbitsai/inference/goto?ref=main) [![Docker Pulls](https://img.shields.io/docker/pulls/xprobe/xinference?style=for-the-badge&logo=docker)](https://hub.docker.com/r/xprobe/xinference) [![Discord](https://img.shields.io/badge/join_Discord-5462eb.svg?logo=discord&style=for-the-badge&logoColor=%23f5f5f5)](https://discord.gg/Xw9tszSkr5) [![Twitter](https://img.shields.io/twitter/follow/xorbitsio?logo=x&style=for-the-badge)](https://twitter.com/xorbitsio)

README in English · README in Simplified Chinese · README in Japanese


Xorbits Inference (Xinference) is a powerful and versatile library designed to serve language, speech recognition, and multimodal models. With Xorbits Inference, you can effortlessly deploy and serve your own or state-of-the-art built-in models using just a single command. Whether you are a researcher, developer, or data scientist, Xorbits Inference empowers you to unleash the full potential of cutting-edge AI models.
👉 Join our Discord community!
## 🔥 Hot Topics

### Framework Enhancements

- Agent-native Serving: Xinference integrates with [Xagent](https://github.com/xorbitsai/xagent) to enable dynamic planning, tool use, and autonomous multi-step reasoning, moving beyond static pipelines.
- Auto batch: Multiple concurrent requests are automatically batched, significantly improving throughput: [#4197](https://github.com/xorbitsai/inference/pull/4197)
- [Xllamacpp](https://github.com/xorbitsai/xllamacpp): A new llama.cpp Python binding, maintained by the Xinference team, that supports continuous batching and is more production-ready: [#2997](https://github.com/xorbitsai/inference/pull/2997)
- Distributed inference: Running models across workers: [#2877](https://github.com/xorbitsai/inference/pull/2877)
- vLLM enhancement: Shared KV cache across multiple replicas: [#2732](https://github.com/xorbitsai/inference/pull/2732)

### New Models

- Built-in support for [Qwen-3.5](https://github.com/QwenLM/Qwen3.5): [#4639](https://github.com/xorbitsai/inference/pull/4639)
- Built-in support for [GLM-5](https://github.com/zai-org/GLM-5): [#4638](https://github.com/xorbitsai/inference/pull/4638)
- Built-in support for [MiniMax-M2.5](https://github.com/MiniMax-AI/MiniMax-M2.5): [#4630](https://github.com/xorbitsai/inference/pull/4630)
- Built-in support for [Kimi-K2.5](https://github.com/MoonshotAI/Kimi-K2.5): [#4631](https://github.com/xorbitsai/inference/pull/4631)
- Built-in support for [FLUX.2-Klein](https://bfl.ai/models/flux-2-klein): [#4596](https://github.com/xorbitsai/inference/pull/4596)
- Built-in support for [Qwen3-ASR](https://github.com/QwenLM/Qwen3-ASR): [#4581](https://github.com/xorbitsai/inference/pull/4581)
- Built-in support for [GLM-4.7](https://huggingface.co/zai-org/GLM-4.7): [#4565](https://github.com/xorbitsai/inference/pull/4565)
- Built-in support for [MinerU2.5-2509-1.2B](https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B): [#4569](https://github.com/xorbitsai/inference/pull/4569)

### Integrations

- [Xagent](https://github.com/xorbitsai/xagent): an enterprise agent platform for building and running AI agents with planning, memory, and tool use, not limited to rigid workflows.
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
- [FastGPT](https://github.com/labring/FastGPT): a knowledge-based platform built on LLMs that offers out-of-the-box data processing and model invocation capabilities and allows workflow orchestration through Flow visualization.
- [RAGFlow](https://github.com/infiniflow/ragflow): an open-source RAG engine based on deep document understanding.
- [MaxKB](https://github.com/1Panel-dev/MaxKB): MaxKB (Max Knowledge Brain) is a powerful and easy-to-use AI assistant that integrates Retrieval-Augmented Generation (RAG) pipelines, supports robust workflows, and provides advanced MCP tool-use capabilities.

## Key Features

🌟 **Model Serving Made Easy**: Simplify the process of serving large language, speech recognition, and multimodal models. You can set up and deploy your models for experimentation and production with a single command.

⚡️ **State-of-the-Art Models**: Experiment with cutting-edge built-in models using a single command. Xinference provides access to state-of-the-art open-source models!

🖥 **Heterogeneous Hardware Utilization**: Make the most of your hardware resources with [ggml](https://github.com/ggerganov/ggml). Xorbits Inference intelligently utilizes heterogeneous hardware, including GPUs and CPUs, to accelerate your model inference tasks.

⚙️ **Flexible API and Interfaces**: Offer multiple interfaces for interacting with your models, supporting an OpenAI-compatible RESTful API (including the Function Calling API), RPC, CLI, and WebUI for seamless model management and interaction.

🌐 **Distributed Deployment**: Excel in distributed deployment scenarios, allowing the seamless distribution of model inference across multiple devices or machines.

🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference seamlessly integrates with popular third-party libraries including [LangChain](https://python.langchain.com/docs/integrations/providers/xinference), [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window), [Dify](https://docs.dify.ai/advanced/model-configuration/xinference), and [Chatbox](https://chatboxai.app/).

## Why Xinference

| Feature                                        | Xinference | FastChat | OpenLLM | RayLLM |
|------------------------------------------------|------------|----------|---------|--------|
| OpenAI-Compatible RESTful API                  | ✅ | ✅ | ✅ | ✅ |
| vLLM Integrations                              | ✅ | ✅ | ✅ | ✅ |
| More Inference Engines (GGML, TensorRT)        | ✅ | ❌ | ✅ | ✅ |
| More Platforms (CPU, Metal)                    | ✅ | ✅ | ❌ | ❌ |
| Multi-node Cluster Deployment                  | ✅ | ❌ | ❌ | ✅ |
| Image Models (Text-to-Image)                   | ✅ | ✅ | ❌ | ❌ |
| Text Embedding Models                          | ✅ | ❌ | ❌ | ❌ |
| Multimodal Models                              | ✅ | ❌ | ❌ | ❌ |
| Audio Models                                   | ✅ | ❌ | ❌ | ❌ |
| More OpenAI Functionalities (Function Calling) | ✅ | ❌ | ❌ | ❌ |

## Using Xinference

- **Self-hosting Xinference Community Edition**: Quickly get Xinference running in your environment with this [starter guide](#getting-started). Use our [documentation](https://inference.readthedocs.io/) for further references and more in-depth instructions.
- **Xinference for enterprise / organizations**: We provide additional enterprise-centric features. [Send us an email](mailto:business@xprobe.io?subject=[GitHub]Business%20License%20Inquiry) to discuss enterprise needs.

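Once a self-hosted server is running, the OpenAI-compatible RESTful API listed under Key Features can be called with a plain HTTP request. The following is a minimal sketch rather than the official client. It assumes the server listens on port 9997 (the port published by the Docker command in this README), that chat completions are served at the OpenAI-style path `/v1/chat/completions`, and that a model has already been launched; the model name `my-llm` and the helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request

# Assumption: a local Xinference server on the port published in this README.
BASE_URL = "http://localhost:9997"


def build_chat_request(model_uid, prompt, base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model_uid,  # hypothetical model name, e.g. "my-llm"
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",  # assumed OpenAI-style path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model_uid, prompt, base_url=BASE_URL):
    """Send the request and return the assistant's reply text.

    Assumes the response follows the OpenAI chat completion shape.
    """
    request = build_chat_request(model_uid, prompt, base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running and a model launched, `chat("my-llm", "Hello")` would return the reply text. Building the request separately from sending it keeps the payload easy to inspect without a live server.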
## Staying Ahead

Star Xinference on GitHub and be instantly notified of new releases.

![star-us](assets/stay_ahead.gif)

## Getting Started

* [Docs](https://inference.readthedocs.io/en/latest/index.html)
* [Built-in Models](https://inference.readthedocs.io/en/latest/models/builtin/index.html)
* [Custom Models](https://inference.readthedocs.io/en/latest/models/custom.html)
* [Deployment Docs](https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html)
* [Examples and Tutorials](https://inference.readthedocs.io/en/latest/examples/index.html)

### Jupyter Notebook

The lightest way to experience Xinference is to try our [Jupyter Notebook on Google Colab](https://colab.research.google.com/github/xorbitsai/inference/blob/main/examples/Xinference_Quick_Start.ipynb).

### Docker

NVIDIA GPU users can start the Xinference server using the [Xinference Docker Image](https://inference.readthedocs.io/en/latest/getting_started/using_docker_image.html). Before running the command below, ensure that both [Docker](https://docs.docker.com/get-docker/) and [CUDA](https://developer.nvidia.com/cuda-downloads) are set up on your system.

```bash
docker run --name xinference -d -p 9997:9997 -e XINFERENCE_HOME=/data -v :/data --gpus all xprobe/xinference:latest xinference-local -H 0.0.0.0
```

### K8s via helm

Ensure that you have GPU support in your Kubernetes cluster, then install as follows.

```
# add repo
helm repo add xinference https://xorbitsai.github.io/xinference-helm-charts

# update indexes and query xinference versions
helm repo update xinference
helm search repo xinference/xinference --devel --versions

# install xinference
helm install xinference xinference/xinference -n xinference --version 0.0.1-v
```

For more customized installation methods on K8s, please refer to the [documentation](https://inference.readthedocs.io/en/latest/getting_started/using_kubernetes.html).

### Quick Start

Install Xinference using pip as follows. (For more options, see the [Installation page](https://inference.readthedocs.io/en/latest/getting_started/installation.html).)

```bash
pip install "xinference[all]"
```

To start a local instance of Xinference, run the following command:

```bash
$ xinference-local
```

Once Xinference is running, there are multiple ways you can try it: via the web UI, via cURL, via the command line, or via Xinference's Python client. Check out our [docs](https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html#run-xinference-locally) for the guide.

![web UI](assets/screenshot.png)

## Getting involved

| Platform | Purpose |
|----------|---------|
| [GitHub Issues](https://github.com/xorbitsai/inference/issues) | Reporting bugs and filing feature requests. |
| [Discord](https://discord.gg/Xw9tszSkr5) | Collaborating with other Xinference users. |
| [Twitter](https://twitter.com/xorbitsio) | Staying up-to-date on new features. |

## Citation

If this work is helpful, please kindly cite as:

```bibtex
@inproceedings{lu2024xinference,
    title = "Xinference: Making Large Model Serving Easy",
    author = "Lu, Weizheng and Xiong, Lingfeng and Zhang, Feng and Qin, Xuye and Chen, Yueguo",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-demo.30",
    pages = "291--300",
}
```

## Contributors

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=xorbitsai/inference&type=Date)](https://star-history.com/#xorbitsai/inference&Date)

================================================
FILE: README_ja_JP.md
================================================
xorbits # Xorbits Inference: モデルサービングを簡単に 🤖

Xinference Enterprise(企業版) · セルフホスティング · ドキュメント

[![PyPI Latest Release](https://img.shields.io/pypi/v/xinference.svg?style=for-the-badge)](https://pypi.org/project/xinference/) [![License](https://img.shields.io/pypi/l/xinference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE) [![Build Status](https://img.shields.io/github/actions/workflow/status/xorbitsai/inference/python.yaml?branch=main&style=for-the-badge&label=GITHUB%20ACTIONS&logo=github)](https://actions-badge.atrox.dev/xorbitsai/inference/goto?ref=main) [![Docker Pulls](https://img.shields.io/docker/pulls/xprobe/xinference?style=for-the-badge&logo=docker)](https://hub.docker.com/r/xprobe/xinference) [![Discord](https://img.shields.io/badge/join_Discord-5462eb.svg?logo=discord&style=for-the-badge&logoColor=%23f5f5f5)](https://discord.gg/Xw9tszSkr5) [![Twitter](https://img.shields.io/twitter/follow/xorbitsio?logo=x&style=for-the-badge)](https://twitter.com/xorbitsio)

README in English 简体中文版自述文件 日本語のREADME


Xorbits Inference(Xinference) は、言語、音声認識、マルチモーダルモデルのために 設計された強力で汎用性の高いライブラリです。 Xorbits Inference を使えば、たった 1 つのコマンドで、 あなたや最先端のビルトインモデルを簡単にデプロイし、提供することができます。 Xorbits Inference は、 研究者、開発者、データサイエンティストを問わず、最先端の AI モデルの可能性を最大限に引き出すことができます。
👉 Discord コミュニティにご参加ください!
## 主な特徴 🌟 **モデルサービングを簡単に**: 大規模な言語、音声認識、マルチモーダルモデルの提供プロセスを簡素化します。 1つのコマンドで、実験用と本番用のモデルをセットアップしてデプロイできます。 ⚡️ **最先端モデル**: コマンド1つで最先端のビルトインモデルを実験。 Inference は、最先端のオープンソースモデルへのアクセスを提供します! 🖥 **異機種ハードウェアの利用**: [ggml](https://github.com/ggerganov/ggml) でハードウェアリソースを最大限に活用しましょう。 Xorbits Inference は、GPU や CPU を含む異種ハードウェアをインテリジェントに利用し、モデル推論タスクを高速化します。 ⚙️ **柔軟な API とインターフェース**: OpenAI互換のRESTful API(Function Callingを含む)、RPC、コマンドライン、Web UIなど、 多様なインターフェースを提供し、モデルの管理と相互作用を容易にします。 🌐 **配布デプロイメント**: Excel の分散展開シナリオでは、複数のデバイスやマシンにモデルの推論をシームレスに分散させることができます。 🔌 **サードパーティライブラリとの組み込み統合**: Xorbits Inference は、[LangChain](https://python.langchain.com/docs/integrations/providers/xinference) や [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window) のような人気のあるサードパーティライブラリと シームレスに統合されています。 ## なぜ Xinference を選ぶのか | 機能 | Xinference | FastChat | OpenLLM | RayLLM | |------|------------|----------|---------|--------| | OpenAI 互換の RESTful API | ✅ | ✅ | ✅ | ✅ | | vLLM 統合 | ✅ | ✅ | ✅ | ✅ | | その他の推論エンジン(GGML、TensorRT) | ✅ | ❌ | ✅ | ✅ | | その他のプラットフォーム(CPU、Metal) | ✅ | ✅ | ❌ | ❌ | | マルチノードクラスター展開 | ✅ | ❌ | ❌ | ✅ | | 画像モデル(テキストから画像へ) | ✅ | ✅ | ❌ | ❌ | | テキスト埋め込みモデル | ✅ | ❌ | ❌ | ❌ | | マルチモーダルモデル | ✅ | ❌ | ❌ | ❌ | | より多くのOpenAI機能(関数呼び出し) | ✅ | ❌ | ❌ | ❌ | ## 入門ガイド **始める前に、GitHubで私たちにスターを付けてください。そうすると、新しいリリースの通知を即座に受け取ることができます!** * [ドキュメント](https://inference.readthedocs.io/en/latest/index.html) * [組み込みモデル](https://inference.readthedocs.io/en/latest/models/builtin/index.html) * [カスタムモデル](https://inference.readthedocs.io/en/latest/models/custom.html) * [デプロイメントドキュメント](https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html) * [例とチュートリアル](https://inference.readthedocs.io/en/latest/examples/index.html) ### Jupyter Notebook Xinferenceを体験する最軽量な方法は、私たちの[Google 
Colab上のJupyterノートブック](https://colab.research.google.com/github/xorbitsai/inference/blob/main/examples/Xinference_Quick_Start.ipynb)を試すことです]。 ### Docker Nvidia GPUユーザーは、[Xinference Dockerイメージ](https://inference.readthedocs.io/en/latest/getting_started/using_docker_image.html)を使用してXinferenceサーバーを開始することができます。インストールコマンドを実行する前に、システムに[Docker](https://docs.docker.com/get-docker/)と[CUDA](https://developer.nvidia.com/cuda-downloads)が設定されていることを確認してください。 ### クイックスタート 以下のようにpipを使用してXinferenceをインストールします。(他のオプションについては、[インストールページ](https://inference.readthedocs.io/en/latest/getting_started/installation.html)を参照してください。) ```bash pip install "xinference[all]" ``` ローカルインスタンスのXinferenceを開始するには、次のコマンドを実行します: ```bash $ xinference-local ``` Xinferenceが実行されると、Web UI、cURL、コマンドライン、またはXinferenceのPythonクライアントを介して試すことができます。詳細は[ドキュメント](https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html#run-xinference-locally)をご覧ください。 ![Web UI](assets/screenshot.png) ## 関与する | プラットフォーム | 目的 | |-------------------------------------------------------------------------------------------------|-----------------------| | [Github イシュー](https://github.com/xorbitsai/inference/issues) | バグ報告と機能リクエストの提出。 | | [Discord](https://discord.gg/Xw9tszSkr5) | 他のXinferenceユーザーとの協力。 | | [Twitter](https://twitter.com/xorbitsio) | 新機能に関する最新情報の入手。 | ## 引用 この仕事が役立つ場合は、以下のように引用してください: ```bibtex @inproceedings{lu2024xinference, title = "Xinference: Making Large Model Serving Easy", author = "Lu, Weizheng and Xiong, Lingfeng and Zhang, Feng and Qin, Xuye and Chen, Yueguo", booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations", month = nov, year = "2024", address = "Miami, Florida, USA", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2024.emnlp-demo.30", pages = "291--300", } ``` ## 寄稿者 ================================================ FILE: README_zh_CN.md 
================================================
xorbits # Xorbits Inference:模型推理, 轻而易举 🤖

Xinference 企业版 · 自托管 · 文档

[![PyPI Latest Release](https://img.shields.io/pypi/v/xinference.svg?style=for-the-badge)](https://pypi.org/project/xinference/) [![License](https://img.shields.io/pypi/l/xinference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE) [![Build Status](https://img.shields.io/github/actions/workflow/status/xorbitsai/inference/python.yaml?branch=main&style=for-the-badge&label=GITHUB%20ACTIONS&logo=github)](https://actions-badge.atrox.dev/xorbitsai/inference/goto?ref=main) [![Docker Pulls](https://img.shields.io/docker/pulls/xprobe/xinference?style=for-the-badge&logo=docker)](https://hub.docker.com/r/xprobe/xinference) [![WeChat](https://img.shields.io/badge/添加微信小助手-07C160?style=for-the-badge&logo=wechat&logoColor=white)](https://xinference.cn/images/WeCom.jpg) [![Zhihu](https://img.shields.io/static/v1?style=for-the-badge&message=未来速度&color=0084FF&logo=Zhihu&logoColor=FFFFFF&label=)](https://www.zhihu.com/org/xorbits)

README in English 简体中文版自述文件 日本語のREADME


Xorbits Inference(Xinference)是一个性能强大且功能全面的分布式推理框架。可用于大语言模型(LLM),语音识别模型,多模态模型等各种模型的推理。通过 Xorbits Inference,你可以轻松地一键部署你自己的模型或内置的前沿开源模型。无论你是研究者,开发者,或是数据科学家,都可以通过 Xorbits Inference 与最前沿的 AI 模型,发掘更多可能。
👉 添加企业微信、加入Xinference社区!
## 🔥 近期热点 ### 框架增强 - Agent 原生服务能力:Xinference 与 [Xagent](https://github.com/xorbitsai/xagent) 深度集成,支持动态规划、工具调用与多步自主推理,突破传统静态流程的限制。 - 自动 Batch: 多个并发请求会被自动合批处理,大幅提升吞吐量。: [#4197](https://github.com/xorbitsai/inference/pull/4197) - 支持寒武纪芯片:[#3693](https://github.com/xorbitsai/inference/pull/3693) - [Xllamacpp](https://github.com/xorbitsai/xllamacpp): 全新llama.cpp Python binding,由 Xinference 团队维护,支持持续并行且更生产可用: [#2997](https://github.com/xorbitsai/inference/pull/2997) - 分布式推理:在多个 worker 上运行大尺寸模型:[#2877](https://github.com/xorbitsai/inference/pull/2877) - VLLM 引擎增强: 跨副本共享KV Cache: [#2732](https://github.com/xorbitsai/inference/pull/2732) ### 新模型 - 内置 [Qwen-3.5](https://github.com/QwenLM/Qwen3.5): [#4639](https://github.com/xorbitsai/inference/pull/4639) - 内置 [GLM-5](https://github.com/zai-org/GLM-5): [#4638](https://github.com/xorbitsai/inference/pull/4638) - 内置 [MiniMax-M2.5](https://github.com/MiniMax-AI/MiniMax-M2.5): [#4630](https://github.com/xorbitsai/inference/pull/4630) - 内置 [Kimi-K2.5](https://github.com/MoonshotAI/Kimi-K2.5): [#4631](https://github.com/xorbitsai/inference/pull/4631) - 内置 [FLUX.2-Klein](https://bfl.ai/models/flux-2-klein): [#4596](https://github.com/xorbitsai/inference/pull/4596) - 内置 [Qwen3-ASR](https://github.com/QwenLM/Qwen3-ASR): [#4581](https://github.com/xorbitsai/inference/pull/4581) - 内置 [GLM-4.7](https://huggingface.co/zai-org/GLM-4.7): [#4565](https://github.com/xorbitsai/inference/pull/4565) - 内置 [MinerU2.5-2509-1.2B](https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B): [#4569](https://github.com/xorbitsai/inference/pull/4569) ### 集成 - [Xagent](https://github.com/xorbitsai/xagent):企业级 Agent 平台,用于构建和运行具备规划、记忆与工具调用能力的智能体,不再受限于僵化的工作流。 - [FastGPT](https://doc.fastai.site/docs/development/custom-models/xinference/):一个基于 LLM 大模型的开源 AI 知识库构建平台。提供了开箱即用的数据处理、模型调用、RAG 检索、可视化 AI 工作流编排等能力,帮助您轻松实现复杂的问答场景。 - [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): 一个涵盖了大型语言模型开发、部署、维护和优化的 LLMOps 平台。 - 
[RAGFlow](https://github.com/infiniflow/ragflow): 是一款基于深度文档理解构建的开源 RAG 引擎。 - [MaxKB](https://github.com/1Panel-dev/MaxKB): MaxKB = Max Knowledge Base,是一款基于大语言模型和 RAG 的开源知识库问答系统,广泛应用于智能客服、企业内部知识库、学术研究与教育等场景。 ## 主要功能 🌟 **模型推理,轻而易举**:大语言模型,语音识别模型,多模态模型的部署流程被大大简化。一个命令即可完成模型的部署工作。 ⚡️ **前沿模型,应有尽有**:框架内置众多中英文的前沿大语言模型,包括 baichuan,chatglm2 等,一键即可体验!内置模型列表还在快速更新中! 🖥 **异构硬件,快如闪电**:通过 [ggml](https://github.com/ggerganov/ggml),同时使用你的 GPU 与 CPU 进行推理,降低延迟,提高吞吐! ⚙️ **接口调用,灵活多样**:提供多种使用模型的接口,包括 OpenAI 兼容的 RESTful API(包括 Function Calling),RPC,命令行,web UI 等等。方便模型的管理与交互。 🌐 **集群计算,分布协同**: 支持分布式部署,通过内置的资源调度器,让不同大小的模型按需调度到不同机器,充分使用集群资源。 🔌 **开放生态,无缝对接**: 与流行的三方库无缝对接,包括 [LangChain](https://python.langchain.com/docs/integrations/providers/xinference),[LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window),[Dify](https://docs.dify.ai/advanced/model-configuration/xinference),以及 [Chatbox](https://chatboxai.app/)。 ## 为什么选择 Xinference | 功能特点 | Xinference | FastChat | OpenLLM | RayLLM | |-------------------------|------------|----------|---------|--------| | 兼容 OpenAI 的 RESTful API | ✅ | ✅ | ✅ | ✅ | | vLLM 集成 | ✅ | ✅ | ✅ | ✅ | | 更多推理引擎(GGML、TensorRT) | ✅ | ❌ | ✅ | ✅ | | 更多平台支持(CPU、Metal) | ✅ | ✅ | ❌ | ❌ | | 分布式集群部署 | ✅ | ❌ | ❌ | ✅ | | 图像模型(文生图) | ✅ | ✅ | ❌ | ❌ | | 文本嵌入模型 | ✅ | ❌ | ❌ | ❌ | | 多模态模型 | ✅ | ❌ | ❌ | ❌ | | 语音识别模型 | ✅ | ❌ | ❌ | ❌ | | 更多 OpenAI 功能 (函数调用) | ✅ | ❌ | ❌ | ❌ | ## 使用 Xinference - **自托管 Xinference 社区版
** 使用 [入门指南](#getting-started) 快速在你自己的环境中运行 Xinference。 参考 [文档](https://inference.readthedocs.io/zh-cn) 以获得参考和更多说明。 - **面向企业/组织的 Xinference 版本
** 我们提供额外的面向企业的功能。 [通过企业微信联系](https://xinference.cn/images/WeCom.jpg) 或 [提交表单](https://w8v6grm432.feishu.cn/share/base/form/shrcn9u1EBXQxmGMqILEjguuGoh) 讨论企业需求。
## 保持领先 在 GitHub 上给 Xinference Star,并立即收到新版本的通知。 ![star-us](assets/stay_ahead.gif) ## 入门指南 * [文档](https://inference.readthedocs.io/zh-cn/latest/index.html) * [内置模型](https://inference.readthedocs.io/zh-cn/latest/models/builtin/index.html) * [自定义模型](https://inference.readthedocs.io/zh-cn/latest/models/custom.html) * [部署文档](https://inference.readthedocs.io/zh-cn/latest/getting_started/using_xinference.html) * [示例和教程](https://inference.readthedocs.io/zh-cn/latest/examples/index.html) ### Jupyter Notebook 体验 Xinference 最轻量级的方式是使用我们 [Google Colab 上的 Jupyter Notebook](https://colab.research.google.com/github/xorbitsai/inference/blob/main/examples/Xinference_Quick_Start.ipynb)。 ### Docker Nvidia GPU 用户可以使用[Xinference Docker 镜像](https://inference.readthedocs.io/zh-cn/latest/getting_started/using_docker_image.html) 启动 Xinference 服务器。在执行安装命令之前,确保你的系统中已经安装了 [Docker](https://docs.docker.com/get-docker/) 和 [CUDA](https://developer.nvidia.com/cuda-downloads)。 ### Kubernetes 确保你的 Kubernetes 集群开启了 GPU 支持,然后通过 `helm` 进行如下方式的安装。 ``` # 新增xinference仓库 helm repo add xinference https://xorbitsai.github.io/xinference-helm-charts # 更新仓库,查询可安装的版本 helm repo update xinference helm search repo xinference/xinference --devel --versions # 在K8s中安装xinference helm install xinference xinference/xinference -n xinference --version 0.0.1-v ``` 更多定制化安装方式,请参考[文档](https://inference.readthedocs.io/en/latest/getting_started/using_kubernetes.html)。 ### 快速开始 使用 pip 安装 Xinference,操作如下。(更多选项,请参阅[安装页面](https://inference.readthedocs.io/zh-cn/latest/getting_started/installation.html)。) ```bash pip install "xinference[all]" ``` 要启动一个本地的 Xinference 实例,请运行以下命令: ```bash $ xinference-local ``` 一旦 Xinference 运行起来,你可以通过多种方式尝试它:通过网络界面、通过 cURL、通过命令行或通过 Xinference 的 Python 客户端。更多指南,请查看我们的[文档](https://inference.readthedocs.io/zh-cn/latest/getting_started/using_xinference.html#run-xinference-locally)。 ![网络界面](assets/screenshot.png) ## 参与其中 | 平台 | 目的 | 
|-------------------------------------------------------------------------------------------------|----------------------|
| [Github 问题](https://github.com/xorbitsai/inference/issues) | 报告错误和提交功能请求。 |
| [Discord](https://discord.gg/Xw9tszSkr5) | 与其他 Xinference 用户合作。 |
| [Twitter](https://twitter.com/xorbitsio) | 及时了解新功能。 |
| [微信社群](https://xinference.cn/images/WeCom.jpg) | 与其他 Xinference 用户交流。 |
| [知乎](https://zhihu.com/org/xorbits) | 了解团队最新的进展。 |

## 引用

如果您觉得此项目有帮助,请以如下格式引用我们:

```bibtex
@inproceedings{lu2024xinference,
    title = "Xinference: Making Large Model Serving Easy",
    author = "Lu, Weizheng and Xiong, Lingfeng and Zhang, Feng and Qin, Xuye and Chen, Yueguo",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-demo.30",
    pages = "291--300",
}
```

## 合作

* [琶洲实验室 | 黄埔](https://www.pazhoulab-huangpu.com/#/)

## 贡献者

## Star 历史

[![Star History Chart](https://api.star-history.com/svg?repos=xorbitsai/inference&type=Date)](https://star-history.com/#xorbitsai/inference&Date)

================================================
FILE: benchmark/README.md
================================================
# Benchmarking Xinference

## Downloading the ShareGPT dataset

You can download the dataset by running:

```bash
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
```

## Benchmarking latency

This tool samples prompts from the dataset and runs the benchmark with serialized requests, sending one request at a time.

```bash
python benchmark_latency.py --dataset /path/to/ShareGPT_V3_unfiltered_cleaned_split.json \
    --tokenizer /path/to/tokenizer \
    --num-prompts 100 \
    --model-uid ${model_uid}
```

## Benchmarking serving

This tool samples prompts from the dataset and runs the benchmark with parallel requests.

```bash
python benchmark_serving.py --dataset /path/to/ShareGPT_V3_unfiltered_cleaned_split.json \
    --tokenizer /path/to/tokenizer \
    --model-uid ${model_uid} \
    --num-prompts 100 --concurrency 50
```

## Benchmarking long context serving

This tool generates long prompts that ask the model to sort random numbers, sized according to the specified context length.

```
python benchmark/benchmark_long.py --context-length ${context_length} --tokenizer /path/to/tokenizer \
    --model-uid ${model_uid} \
    --num-prompts 32 -c 16
```

## Common Options for Benchmarking Tools

- `--stream`: Enables streaming responses, so output is received incrementally instead of after the full response has been generated.
- `--print-error`: Prints detailed error messages if any errors are encountered during execution, which helps with troubleshooting.

These options are available in all benchmarking tools in this suite.

================================================
FILE: benchmark/benchmark_embedding.py
================================================
# Copyright 2022-2025 XProbe Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import asyncio
import logging
import random
import time

import aiohttp
from typing import List, Dict, Optional
from datasets import load_dataset
import numpy as np
from benchmark_runner import ConcurrentBenchmarkRunner

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class EmbeddingBenchmarkRunner(ConcurrentBenchmarkRunner):
    def __init__(
        self,
        api_url: str,
        model_uid: str,
        input_requests: List[Dict],
        stream: bool,
        concurrency: int,
        api_key: Optional[str] = None,
        print_error: bool = False,
    ):
        super().__init__(
            api_url,
            model_uid,
            input_requests,
            stream,
            concurrency,
            api_key,
            print_error,
        )

    async def _run(self):
        # Start one worker task per unit of concurrency.
        tasks = []
        for i in range(self.concurrency):
            tasks.append(asyncio.create_task(self.worker(i)))

        await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)

    async def worker(self, i: int):
        # Each worker starts at its own seeded offset and cycles through the requests.
        r = random.Random(i)
        index = r.randint(0, len(self.input_requests) - 1)
        while self.left > 0:
            request = self.input_requests[index]
            index += 1
            index = index % len(self.input_requests)
            await self.send_request(request)
            self.left -= 1
            # Print trailing space to overwrite the previous output when the count shrinks.
            print("\rdone_request, left %d " % (self.left), end="")
        # The last one
        print("")

    async def send_request(self, request, warming_up: bool = False):
        input = request["sentence"]
        request_start_time = time.time()

        pload = {
            "model": self.model_uid,
            "input": input,
        }

        headers = {"User-Agent": "Benchmark Client"}
        if self.api_key:
            headers["Authorization"] = f"Bearer {self.api_key}"

        timeout = aiohttp.ClientTimeout(total=3 * 3600)
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.post(
                self.api_url, headers=headers, json=pload
            ) as response:
                resp = await response.json()
                if response.status == 200:
                    request_end_time = time.time()
                    request_latency = request_end_time - request_start_time
                    # Warm-up requests are not recorded.
                    if not warming_up:
                        self.outputs.append(request_latency)
                else:
                    logger.error(f"Failed to create embedding: {resp}")


def main(args: argparse.Namespace):
    print(args)

    random.seed(args.seed)
    np.random.seed(args.seed)

    api_url = f"http://{args.host}:{args.port}/v1/embeddings"
    model_uid = args.model_uid

    logger.info("Preparing for benchmark.")
    dataset = load_dataset(args.dataset, args.subset)
    input_requests = dataset["test"].to_list()
    if args.num_query > 0:
        input_requests = input_requests[: args.num_query]
    else:
        args.num_query = len(input_requests)

    logger.info("Benchmark starts.")

    benchmark = EmbeddingBenchmarkRunner(
        api_url,
        model_uid,
        input_requests,
        args.stream,
        concurrency=args.concurrency,
        api_key=args.api_key,
        print_error=args.print_error,
    )
    asyncio.run(benchmark.run())

    # TODO: Print the results of request_latency in detail.
    # benchmark.print_stats() needs to be overridden
    print(f"Total time: {benchmark.benchmark_time:.2f} s")
    print(f"Throughput: {args.num_query / benchmark.benchmark_time:.2f} requests/s")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Stress test the embedding model.")
    parser.add_argument("--host", type=str, default="localhost")
    parser.add_argument("--port", type=int, default=9997)
    parser.add_argument(
        "--dataset",
        type=str,
        default="clue",
        help="Name of the dataset.",
    )
    parser.add_argument(
        "--subset",
        type=str,
        default="tnews",
        help="Subset of the dataset.",
    )
    parser.add_argument(
        "--concurrency",
        "-c",
        type=int,
        default=256,
        help="Number of concurrent requests to send.",
    )
    parser.add_argument(
        "--num-query",
        "-q",
        type=int,
        default=-1,
        help="Number of queries to take from the dataset; the default uses all of them.",
    )
    parser.add_argument(
        "--trust-remote-code",
        action="store_true",
        help="Trust remote code from huggingface.",
    )
    parser.add_argument(
        "--model-uid", type=str, required=True, help="Xinference model UID."
    )
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument(
        "--stream", action="store_true", help="Enable streaming responses."
    )
    parser.add_argument(
        "--api-key",
        type=str,
        default=None,
        help="Authorization api key",
    )
    parser.add_argument(
        "--print-error",
        action="store_true",
        help="Print detailed error messages if any errors are encountered.",
    )
    args = parser.parse_args()
    main(args)

================================================
FILE: benchmark/benchmark_latency.py
================================================
# Copyright 2022-2023 XProbe Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import asyncio
import logging
import random

import numpy as np
from utils import get_tokenizer, sample_requests
from benchmark_runner import BenchmarkRunner

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class LatencyBenchmarkRunner(BenchmarkRunner):
    async def _run(self):
        # Requests are sent strictly one after another.
        total_requests = len(self.input_requests)
        for i, request in enumerate(self.input_requests):
            await self.send_request(request)
            remaining = total_requests - (i + 1)
            print(
                f"\rProcessed {i + 1}/{total_requests} requests, {remaining} remaining.",
                end="",
            )
        print("")


def main(args: argparse.Namespace):
    print(args)

    random.seed(args.seed)
    np.random.seed(args.seed)

    api_url = f"http://{args.host}:{args.port}/v1/chat/completions"
    model_uid = args.model_uid

    logger.info("Preparing for benchmark.")
    tokenizer = get_tokenizer(args.tokenizer, trust_remote_code=args.trust_remote_code)
    input_requests = sample_requests(args.dataset, args.num_prompts, tokenizer)

    logger.info("Benchmark starts.")

    benchmark = LatencyBenchmarkRunner(
        api_url,
        model_uid,
        input_requests,
        args.stream,
        args.api_key,
        args.print_error,
    )
    asyncio.run(benchmark.run())

    benchmark.print_stats()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Benchmark the latency of processing a single batch of requests."
    )
    parser.add_argument("--host", type=str, default="localhost")
    parser.add_argument("--port", type=int, default=9997)
    parser.add_argument(
        "--dataset", type=str, required=True, help="Path to the dataset."
    )
    parser.add_argument(
        "--tokenizer", type=str, required=True, help="Name or path of the tokenizer."
    )
    parser.add_argument(
        "--num-prompts", type=int, default=100, help="Number of prompts to process."
    )
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument(
        "--trust-remote-code",
        action="store_true",
        help="Trust remote code from huggingface.",
    )
    parser.add_argument("--model-uid", type=str, help="Xinference model UID.")
    parser.add_argument(
        "--stream", action="store_true", help="Enable streaming responses."
    )
    parser.add_argument(
        "--api-key",
        type=str,
        default=None,
        help="Authorization api key",
    )
    parser.add_argument(
        "--print-error",
        action="store_true",
        help="Print detailed error messages if any errors are encountered.",
    )
    args = parser.parse_args()
    main(args)

================================================
FILE: benchmark/benchmark_long.py
================================================
# Copyright 2022-2023 XProbe Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import asyncio
import logging
import random

import numpy as np
from utils import generate_sorting_prompts, get_tokenizer
from benchmark_runner import ConcurrentBenchmarkRunner

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class LongBenchmarkRunner(ConcurrentBenchmarkRunner):
    async def _run(self):
        # Start one worker task per unit of concurrency.
        tasks = []
        for i in range(self.concurrency):
            tasks.append(asyncio.create_task(self.worker(i)))

        await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)

    async def worker(self, i: int):
        # Each worker starts at its own seeded offset and cycles through the requests.
        r = random.Random(i)
        index = r.randint(0, len(self.input_requests) - 1)
        while self.left > 0:
            request = self.input_requests[index]
            index += 1
            index = index % len(self.input_requests)
            await self.send_request(request)
            self.left -= 1
            # Print trailing space to overwrite the previous output when the count shrinks.
            print("\rdone_request, left %d " % (self.left), end="")
        # The last one
        print("")


def main(args: argparse.Namespace):
    if args.concurrency > args.num_prompts:
        print("Fix concurrency with num_prompts %d" % (args.num_prompts))
        args.concurrency = args.num_prompts
    print(args)

    random.seed(args.seed)
    np.random.seed(args.seed)

    api_url = f"http://{args.host}:{args.port}/v1/chat/completions"
    model_uid = args.model_uid

    logger.info("Preparing for benchmark.")
    tokenizer = get_tokenizer(args.tokenizer, trust_remote_code=args.trust_remote_code)
    # XXX: generate_sorting_prompts() currently only generates prompts of 1/2 to 2/3
    # of context_length, because tokenizers vary by model; consider improving this
    # in the future.
    input_requests = generate_sorting_prompts(
        args.concurrency, args.context_length, args.context_length / 2 - 20, tokenizer
    )

    logger.info("Benchmark starts.")

    benchmark = LongBenchmarkRunner(
        api_url,
        model_uid,
        input_requests,
        args.stream,
        concurrency=args.concurrency,
        api_key=args.api_key,
        print_error=args.print_error,
    )
    asyncio.run(benchmark.run())

    benchmark.print_stats()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Benchmark the online serving throughput with long context."
    )
    parser.add_argument("--host", type=str, default="localhost")
    parser.add_argument("--port", type=int, default=9997)
    parser.add_argument(
        "--tokenizer", type=str, required=True, help="Name or path of the tokenizer."
    )
    parser.add_argument(
        "--context-length", type=int, default=32768, help="Model context length."
    )
    parser.add_argument(
        "--num-prompts", type=int, default=16, help="Number of prompts to process."
    )
    parser.add_argument(
        "--concurrency",
        "-c",
        type=int,
        default=16,
        help="Number of concurrent requests to send.",
    )
    parser.add_argument(
        "--trust-remote-code",
        action="store_true",
        help="Trust remote code from huggingface.",
    )
    parser.add_argument("--model-uid", type=str, help="Xinference model UID.")
    parser.add_argument(
        "--api-key",
        type=str,
        default=None,
        help="Authorization api key",
    )
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument(
        "--stream", action="store_true", help="Enable streaming responses."
    )
    parser.add_argument(
        "--print-error",
        action="store_true",
        help="Print detailed error messages if any errors are encountered.",
    )
    args = parser.parse_args()
    main(args)

================================================
FILE: benchmark/benchmark_rerank.py
================================================
# Copyright 2022-2023 XProbe Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import asyncio
import logging
import random
import time

import aiohttp
from typing import List, Dict, Optional
from datasets import load_dataset
import numpy as np
from benchmark_runner import ConcurrentBenchmarkRunner

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class RerankBenchmarkRunner(ConcurrentBenchmarkRunner):
    def __init__(
        self,
        api_url: str,
        model_uid: str,
        input_requests: List[Dict],
        stream: bool,
        top_n: int,
        concurrency: int,
        api_key: Optional[str] = None,
        print_error: bool = False,
    ):
        super().__init__(
            api_url,
            model_uid,
            input_requests,
            stream,
            concurrency,
            api_key,
            print_error,
        )
        self.top_n = top_n

    async def _run(self):
        # Start one worker task per unit of concurrency.
        tasks = []
        for i in range(self.concurrency):
            tasks.append(asyncio.create_task(self.worker(i)))

        await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)

    async def worker(self, i: int):
        # Each worker starts at its own seeded offset and cycles through the requests.
        r = random.Random(i)
        index = r.randint(0, len(self.input_requests) - 1)
        while self.left > 0:
            request = self.input_requests[index]
            index += 1
            index = index % len(self.input_requests)
            await self.send_request(request)
            self.left -= 1
            # Print trailing space to overwrite the previous output when the count shrinks.
            print("\rdone_request, left %d " % (self.left), end="")
        # The last one
        print("")

    async def send_request(self, request, warming_up: bool = False):
        prompt, documents = request["query"], request["positive"]
        request_start_time = time.time()

        pload = {
            "model": self.model_uid,
            "top_n": self.top_n,
            "query": prompt,
            "documents": documents,
        }

        headers = {"User-Agent": "Benchmark Client"}
        if self.api_key:
            headers["Authorization"] = f"Bearer {self.api_key}"

        timeout = aiohttp.ClientTimeout(total=3 * 3600)
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.post(
                self.api_url, headers=headers, json=pload
            ) as response:
                resp = await response.json()
                if response.status == 200:
                    request_end_time = time.time()
                    request_latency = request_end_time - request_start_time
                    # Warm-up requests are not recorded.
                    if not warming_up:
                        self.outputs.append(request_latency)
                else:
                    logger.error(f"Failed to rerank: {resp}")


def main(args: argparse.Namespace):
    print(args)

    random.seed(args.seed)
    np.random.seed(args.seed)

    api_url = f"http://{args.host}:{args.port}/v1/rerank"
    model_uid = args.model_uid

    logger.info("Preparing for benchmark.")
    dataset = load_dataset(args.dataset)
    input_requests = dataset["test"].remove_columns("negative").to_list()
    if args.num_query > 0:
        input_requests = input_requests[: args.num_query]
    else:
        args.num_query = len(input_requests)

    logger.info("Benchmark starts.")

    benchmark = RerankBenchmarkRunner(
        api_url,
        model_uid,
        input_requests,
        args.stream,
        top_n=args.top_n,
        concurrency=args.concurrency,
        api_key=args.api_key,
        print_error=args.print_error,
    )
    asyncio.run(benchmark.run())

    # TODO: Print the results of request_latency in detail.
    # benchmark.print_stats() needs to be overridden
    print(f"Total time: {benchmark.benchmark_time:.2f} s")
    print(f"Throughput: {args.num_query / benchmark.benchmark_time:.2f} requests/s")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Stress test the rerank model.")
    parser.add_argument("--host", type=str, default="localhost")
    parser.add_argument("--port", type=int, default=9997)
    parser.add_argument(
        "--dataset",
        type=str,
        default="mteb/scidocs-reranking",
        help="Path to the dataset.",
    )
    parser.add_argument(
        "--concurrency",
        "-c",
        type=int,
        default=16,
        help="Number of concurrent requests to send.",
    )
    parser.add_argument(
        "--top-n",
        "-n",
        type=int,
        default=5,
        help="Number of top results to request from the rerank model.",
    )
    parser.add_argument(
        "--num-query",
        "-q",
        type=int,
        default=-1,
        help="Number of queries to take from the dataset; the default uses all of them.",
    )
    parser.add_argument(
        "--trust-remote-code",
        action="store_true",
        help="Trust remote code from huggingface.",
    )
    parser.add_argument(
        "--model-uid", type=str, required=True, help="Xinference model UID."
    )
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument(
        "--stream", action="store_true", help="Enable streaming responses."
    )
    parser.add_argument(
        "--api-key",
        type=str,
        default=None,
        help="Authorization api key",
    )
    parser.add_argument(
        "--print-error",
        action="store_true",
        help="Print detailed error messages if any errors are encountered.",
    )
    args = parser.parse_args()
    main(args)

================================================
FILE: benchmark/benchmark_runner.py
================================================
# Copyright 2022-2023 XProbe Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. import aiohttp import json import sys import traceback import warnings import logging from dataclasses import dataclass, field import time from typing import List, Optional, Tuple import numpy as np logging.basicConfig(level=logging.INFO) logger = logging.getLogger(__name__) AIOHTTP_TIMEOUT = aiohttp.ClientTimeout(total=3 * 3600) def remove_prefix(text: str, prefix: str) -> str: if text.startswith(prefix): return text[len(prefix) :].strip() return text.strip() @dataclass class RequestOutput: success: bool = False prompt_len: int = 0 completion_tokens: int = 0 latency: float = 0.0 ttft: float = 0.0 itl: List[float] = field(default_factory=list) # List of inter-token latencies error: str = "" class BenchmarkRunner: def __init__( self, api_url: str, model_uid: str, input_requests: List[Tuple[str, int, int]], stream: bool, api_key: Optional[str] = None, print_error: bool = False, ): self.api_url = api_url self.model_uid = model_uid self.input_requests = input_requests self.outputs: List[RequestOutput] = [] self.benchmark_time = None self.stream = stream self.api_key = api_key self.print_error = print_error async def run(self): await self.warm_up() start_time = time.time() await self._run() end_time = time.time() self.benchmark_time = end_time - start_time async def warm_up(self, num_requests: int = 5): logger.info("Warming up...") for i in range(min(num_requests, len(self.input_requests))): request = self.input_requests[i] await self.send_request(request, warming_up=True) logger.info("Warm-up completed.") async def _run(self): pass async def send_request(self, request: 
tuple, warming_up: bool = False): prompt, prompt_len, output_len = request if self.stream: pload = { "model": self.model_uid, "n": 1, "temperature": 0.6, "top_p": 0.9, "max_tokens": output_len, "stream": True, "messages": [{"role": "user", "content": prompt}], "stream_options": {"include_usage": True}, } else: pload = { "model": self.model_uid, "n": 1, "temperature": 0.6, "top_p": 0.9, "max_tokens": output_len, "stream": False, "messages": [{"role": "user", "content": prompt}], } headers = {"User-Agent": "Benchmark Client"} if self.api_key: headers["Authorization"] = f"Bearer {self.api_key}" async with aiohttp.ClientSession(timeout=AIOHTTP_TIMEOUT) as session: output = RequestOutput(prompt_len=prompt_len) ttft = 0.0 st = time.perf_counter() most_recent_timestamp = st try: async with session.post( self.api_url, headers=headers, json=pload ) as response: if response.status == 200: if self.stream: async for chunk_bytes in response.content: # { # "id": "chataec79465-dfea-46af-81b9-c28124063fc0", # "model": "llama-3-instruct", # "created": 1721202668, # "object": "chat.completion.chunk", # "choices": [ # { # "index": 0, # "delta": {"role": "assistant", "content": ""}, # "finish_reason": null, # } # ], # } chunk_bytes = chunk_bytes.strip() if not chunk_bytes: continue chunk = remove_prefix(chunk_bytes.decode("utf-8"), "data:") if chunk == "[DONE]": latency = time.perf_counter() - st else: timestamp = time.perf_counter() data = json.loads(chunk) # First token if ttft == 0.0: ttft = time.perf_counter() - st output.ttft = ttft # Decoding phase else: output.itl.append(timestamp - most_recent_timestamp) most_recent_timestamp = timestamp output.latency = latency output.success = True output.completion_tokens = data["usage"]["completion_tokens"] else: resp = await response.json() output.latency = time.perf_counter() - st output.success = True output.completion_tokens = resp["usage"]["completion_tokens"] except Exception: output.success = False exc_info = sys.exc_info() 
output.error = "".join(traceback.format_exception(*exc_info)) if not warming_up: self.outputs.append(output) def print_stats(self): total_time = self.benchmark_time if self.stream: # Initialize variables for metrics total_input = 0 completed = 0 actual_output_lens = [] itls = [] tpots = [] ttfts = [] for output in self.outputs: if output.success: actual_output_lens.append(output.completion_tokens) total_input += output.prompt_len if output.completion_tokens > 1: tpots.append( (output.latency - output.ttft) / (output.completion_tokens - 1) ) itls += output.itl ttfts.append(output.ttft) completed += 1 else: actual_output_lens.append(0) if completed == 0: warnings.warn( "All requests failed. This is likely due to a misconfiguration " "on the benchmark arguments.", stacklevel=2, ) # Calculate statistics total_output = sum(actual_output_lens) request_throughput = completed / total_time if total_time > 0 else 0 input_throughput = total_input / total_time if total_time > 0 else 0 output_throughput = total_output / total_time if total_time > 0 else 0 mean_ttft = np.mean(ttfts) * 1000 if ttfts else 0 median_ttft = np.median(ttfts) * 1000 if ttfts else 0 std_ttft = np.std(ttfts) * 1000 if ttfts else 0 p99_ttft = np.percentile(ttfts, 99) * 1000 if ttfts else 0 mean_tpot = np.mean(tpots) * 1000 if tpots else 0 median_tpot = np.median(tpots) * 1000 if tpots else 0 std_tpot = np.std(tpots) * 1000 if tpots else 0 p99_tpot = np.percentile(tpots, 99) * 1000 if tpots else 0 mean_itl = np.mean(itls) * 1000 if itls else 0 median_itl = np.median(itls) * 1000 if itls else 0 std_itl = np.std(itls) * 1000 if itls else 0 p99_itl = np.percentile(itls, 99) * 1000 if itls else 0 # Print benchmark results print("{s:{c}^{n}}".format(s=" Benchmark Result ", n=50, c="=")) print("{:<40} {:<10}".format("Successful requests:", completed)) print("{:<40} {:<10.2f}".format("Benchmark duration (s):", total_time)) print("{:<40} {:<10}".format("Total input tokens:", total_input)) print("{:<40} 
{:<10}".format("Total generated tokens:", total_output)) print( "{:<40} {:<10.2f}".format( "Request throughput (req/s):", request_throughput ) ) print( "{:<40} {:<10.2f}".format( "Input token throughput (tok/s):", input_throughput ) ) print( "{:<40} {:<10.2f}".format( "Output token throughput (tok/s):", output_throughput ) ) print("{s:{c}^{n}}".format(s="Time to First Token", n=50, c="-")) print("{:<40} {:<10.4f}".format("Mean TTFT (ms):", mean_ttft)) print("{:<40} {:<10.4f}".format("Median TTFT (ms):", median_ttft)) print("{:<40} {:<10.4f}".format("Std TTFT (ms):", std_ttft)) print("{:<40} {:<10.4f}".format("P99 TTFT (ms):", p99_ttft)) print( "{s:{c}^{n}}".format( s="Time per Output Token (excl. 1st token)", n=50, c="-" ) ) print("{:<40} {:<10.4f}".format("Mean TPOT (ms):", mean_tpot)) print("{:<40} {:<10.4f}".format("Median TPOT (ms):", median_tpot)) print("{:<40} {:<10.4f}".format("Std TPOT (ms):", std_tpot)) print("{:<40} {:<10.4f}".format("P99 TPOT (ms):", p99_tpot)) print("{s:{c}^{n}}".format(s="Inter-token Latency", n=50, c="-")) print("{:<40} {:<10.4f}".format("Mean ITL (ms):", mean_itl)) print("{:<40} {:<10.4f}".format("Median ITL (ms):", median_itl)) print("{:<40} {:<10.4f}".format("Std ITL (ms):", std_itl)) print("{:<40} {:<10.4f}".format("P99 ITL (ms):", p99_itl)) print("=" * 50) else: # Initialize variables for metrics total_input = 0 completed = 0 actual_output_lens = [] latencies = [] per_token_latencies = [] per_output_token_latencies = [] for output in self.outputs: if output.success: actual_output_lens.append(output.completion_tokens) total_input += output.prompt_len latencies.append(output.latency) per_token_latencies.append( output.latency / (output.prompt_len + output.completion_tokens) ) if output.completion_tokens > 0: per_output_token_latencies.append( output.latency / output.completion_tokens ) completed += 1 else: actual_output_lens.append(0) if completed == 0: warnings.warn( "All requests failed. 
This is likely due to a misconfiguration " "on the benchmark arguments.", stacklevel=2, ) # Calculate statistics total_output = sum(actual_output_lens) request_throughput = len(self.outputs) / total_time if total_time > 0 else 0 input_throughput = total_input / total_time if total_time > 0 else 0 output_throughput = total_output / total_time if total_time > 0 else 0 mean_latency = np.mean(latencies) if latencies else 0 mean_per_token_latency = ( np.mean(per_token_latencies) if per_token_latencies else 0 ) mean_per_output_token_latency = ( np.mean(per_output_token_latencies) if per_output_token_latencies else 0 ) # Print benchmark results print("{s:{c}^{n}}".format(s=" Benchmark Result ", n=50, c="=")) print("{:<40} {:<10}".format("Successful requests:", completed)) print("{:<40} {:<10.2f}".format("Benchmark duration (s):", total_time)) print("{:<40} {:<10}".format("Total input tokens:", total_input)) print("{:<40} {:<10}".format("Total generated tokens:", total_output)) print( "{:<40} {:<10.2f}".format( "Request throughput (req/s):", request_throughput ) ) print( "{:<40} {:<10.2f}".format( "Input token throughput (tok/s):", input_throughput ) ) print( "{:<40} {:<10.2f}".format( "Output token throughput (tok/s):", output_throughput ) ) print("{s:{c}^{n}}".format(s="Latency Statistics", n=50, c="-")) print("{:<40} {:<10.4f}".format("Mean latency (s):", mean_latency)) print( "{:<40} {:<10.4f}".format( "Mean latency per token (s):", mean_per_token_latency ) ) print( "{:<40} {:<10.4f}".format( "Mean latency per output token (s):", mean_per_output_token_latency ) ) print("=" * 50) print(f"Total time: {total_time:.2f} s") print(f"Throughput: {len(self.outputs) / total_time:.2f} requests/s") if completed < len(self.input_requests): if self.print_error: logger.info("Errors encountered during benchmark:") for output in self.outputs: if not output.success: print(f"Error for prompt with length {output.prompt_len}: {output.error}") else: logger.info( "Errors were encountered 
during the benchmark. Run with --print-error to see detailed error messages." ) class ConcurrentBenchmarkRunner(BenchmarkRunner): def __init__( self, api_url: str, model_uid: str, input_requests: List[Tuple[str, int, int]], stream: bool, concurrency: int, api_key: Optional[str] = None, print_error: bool = False, ): super().__init__( api_url, model_uid, input_requests, stream, api_key, print_error, ) self.concurrency = concurrency self.left = len(input_requests) async def worker(self): pass ================================================ FILE: benchmark/benchmark_serving.py ================================================ # Copyright 2022-2023 XProbe Inc. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
import argparse import asyncio import logging import random from typing import List, Tuple, Optional import numpy as np from utils import sample_requests, get_tokenizer from benchmark_runner import ConcurrentBenchmarkRunner logging.basicConfig(level=logging.INFO) logger = logging.getLogger(__name__) class ServingBenchmarkRunner(ConcurrentBenchmarkRunner): def __init__( self, api_url: str, model_uid: str, input_requests: List[Tuple[str, int, int]], stream: bool, concurrency: int, request_rate: float, api_key: Optional[str] = None, print_error: bool = False, ): super().__init__( api_url, model_uid, input_requests, stream, concurrency, api_key, print_error, ) self.request_rate = request_rate self.queue = None # delay the creation of the queue async def _run(self): tasks = [] for _ in range(self.concurrency): tasks.append(asyncio.create_task(self.worker())) await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED) async def warm_up(self, num_requests: int = 5): if self.queue is None: self.queue = asyncio.Queue(len(self.input_requests)) logger.info(f"Enqueuing {len(self.input_requests)} requests.") for req in iter(self.input_requests): await self.queue.put(req) await super().warm_up(num_requests) async def worker(self): """ wait request dispatch by run(), and then send_request. When all request is done, most worker will hang on self.queue, but at least one worker will exit""" while self.left > 0: request = await self.queue.get() await self.send_request(request) self.left -= 1 print("\rdone_request, left %d " % (self.left), end="") if self.request_rate != float("inf"): # If the request rate is infinity, then we don't need to wait. # Sample the request interval from the exponential distribution. interval = np.random.exponential(1.0 / self.request_rate) # The next request will be sent after the interval. 
await asyncio.sleep(interval) print("") def main(args: argparse.Namespace): if args.concurrency > args.num_prompts: print("Fix concurrency with num_prompts %d" % (args.num_prompts)) args.concurrency = args.num_prompts print(args) random.seed(args.seed) np.random.seed(args.seed) api_url = f"http://{args.host}:{args.port}/v1/chat/completions" model_uid = args.model_uid logger.info("Preparing for benchmark.") tokenizer = get_tokenizer(args.tokenizer, trust_remote_code=args.trust_remote_code) input_requests = sample_requests( args.dataset, args.num_prompts, tokenizer, prompt_len_limit=args.prompt_len_limit, ) logger.info("Benchmark starts.") benchmark = ServingBenchmarkRunner( api_url, model_uid, input_requests, args.stream, request_rate=args.request_rate, concurrency=args.concurrency, api_key=args.api_key, print_error=args.print_error, ) asyncio.run(benchmark.run()) benchmark.print_stats() if __name__ == "__main__": parser = argparse.ArgumentParser( description="Benchmark the online serving throughput." ) parser.add_argument("--host", type=str, default="localhost") parser.add_argument("--port", type=int, default=9997) parser.add_argument( "--dataset", type=str, required=True, help="Path to the dataset." ) parser.add_argument( "--tokenizer", type=str, required=True, help="Name or path of the tokenizer." ) parser.add_argument( "--num-prompts", type=int, default=100, help="Number of prompts to process." ) parser.add_argument( "--prompt-len-limit", type=int, default=1024, help="Prompt length limitation." ) parser.add_argument( "--api-key", type=str, default=None, help="Authorization api key", ) parser.add_argument( "--concurrency", "-c", type=int, default=100, help="Set the concurrency of request to send", ) parser.add_argument( "--request-rate", type=float, default=float("inf"), help="Number of requests per second. If this is inf, " "then all the requests are sent at time 0. 
" "Otherwise, we use Poisson process to synthesize " "the request arrival times.", ) parser.add_argument("--seed", type=int, default=0) parser.add_argument( "--trust-remote-code", action="store_true", help="Trust remote code from huggingface.", ) parser.add_argument("--model-uid", type=str, help="Xinference model UID.") parser.add_argument( "--stream", action="store_true", help="Enable streaming responses." ) parser.add_argument( "--print-error", action="store_true", help="Print detailed error messages if any errors encountered." ) args = parser.parse_args() main(args) ================================================ FILE: benchmark/utils.py ================================================ # Copyright 2022-2023 XProbe Inc. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. import json import logging import random from typing import TYPE_CHECKING, List, Tuple from transformers import AutoTokenizer, PreTrainedTokenizerFast logging.basicConfig(level=logging.INFO) logger = logging.getLogger(__name__) if TYPE_CHECKING: from transformers import PreTrainedTokenizerBase # A fast LLaMA tokenizer with the pre-processed `tokenizer.json` file. 
_FAST_LLAMA_TOKENIZER = "hf-internal-testing/llama-tokenizer" def get_tokenizer( tokenizer_name: str, *args, tokenizer_mode: str = "auto", trust_remote_code: bool = False, **kwargs, ) -> "PreTrainedTokenizerBase": """Gets a tokenizer for the given model name via Huggingface.""" if tokenizer_mode == "slow": if kwargs.get("use_fast", False): raise ValueError("Cannot use the fast tokenizer in slow tokenizer mode.") kwargs["use_fast"] = False if ( "llama" in tokenizer_name.lower() and kwargs.get("use_fast", True) and tokenizer_name != _FAST_LLAMA_TOKENIZER ): logger.info( "For some LLaMA-based models, initializing the fast tokenizer may " "take a long time. To eliminate the initialization time, consider " f"using '{_FAST_LLAMA_TOKENIZER}' instead of the original " "tokenizer." ) try: tokenizer = AutoTokenizer.from_pretrained( tokenizer_name, *args, trust_remote_code=trust_remote_code, **kwargs ) except TypeError as e: # The LLaMA tokenizer causes a protobuf error in some environments. err_msg = ( "Failed to load the tokenizer. If you are using a LLaMA-based " f"model, use '{_FAST_LLAMA_TOKENIZER}' instead of the original " "tokenizer." ) raise RuntimeError(err_msg) from e except ValueError as e: # If the error pertains to the tokenizer class not existing or not # currently being imported, suggest using the --trust-remote-code flag. if not trust_remote_code and ( "does not exist or is not currently imported." in str(e) or "requires you to execute the tokenizer file" in str(e) ): err_msg = ( "Failed to load the tokenizer. If the tokenizer is a custom " "tokenizer not yet available in the HuggingFace transformers " "library, consider setting `trust_remote_code=True` in LLM " "or using the `--trust-remote-code` flag in the CLI." ) raise RuntimeError(err_msg) from e else: raise e if not isinstance(tokenizer, PreTrainedTokenizerFast): logger.warning( "Using a slow tokenizer. This might cause a significant " "slowdown. Consider using a fast tokenizer instead." 
) return tokenizer def sample_requests( dataset_path: str, num_requests: int, tokenizer: "PreTrainedTokenizerBase", prompt_len_limit: int = 1024, ) -> List[Tuple[str, int, int]]: # Load the dataset. with open(dataset_path) as f: dataset = json.load(f) # Filter out the conversations with less than 2 turns. dataset = [data for data in dataset if len(data["conversations"]) >= 2] # Only keep the first two turns of each conversation. dataset = [ (data["conversations"][0]["value"], data["conversations"][1]["value"]) for data in dataset ] # Tokenize the prompts and completions. prompts = [prompt for prompt, _ in dataset] prompt_token_ids = tokenizer(prompts).input_ids completions = [completion for _, completion in dataset] completion_token_ids = tokenizer(completions).input_ids tokenized_dataset = [] for i in range(len(dataset)): output_len = len(completion_token_ids[i]) tokenized_dataset.append((prompts[i], prompt_token_ids[i], output_len)) # Filter out too long sequences. filtered_dataset: List[Tuple[str, int, int]] = [] for prompt, prompt_token_ids, output_len in tokenized_dataset: prompt_len = len(prompt_token_ids) if prompt_len < 4 or output_len < 4: # Prune too short sequences. # This is because TGI causes errors when the input or output length # is too short. continue if ( prompt_len > prompt_len_limit or prompt_len + output_len > prompt_len_limit * 2 ): # Prune too long sequences. continue filtered_dataset.append((prompt, prompt_len, output_len)) # Sample the requests. 
sampled_requests = random.sample(filtered_dataset, num_requests) return sampled_requests def generate_sorting_prompts( num_prompts: int, context_length: int, prompt_len_limit: int, tokenizer: "PreTrainedTokenizerBase", ) -> List[Tuple[str, int, int]]: prompts = [] for i in range(0, num_prompts): random_nums = [] _prompt_len = 0 while True: r_str = "%s" % random.randint(0, 99) r_len = len(r_str) + 1 if r_len + _prompt_len > prompt_len_limit: break random_nums.append(r_str) _prompt_len += r_len prompt = "Sort the numbers:" + ",".join(random_nums) prompts.append(prompt) prompt_token_ids = tokenizer(prompts).input_ids dataset = [] for i in range(0, len(prompts)): prompt_len = len(prompt_token_ids[i]) dataset.append((prompts[i], prompt_len, context_length - prompt_len)) return dataset ================================================ FILE: doc/Makefile ================================================ # Minimal makefile for Sphinx documentation # # You can set these variables from the command line, and also # from the environment for the first two. SPHINXOPTS ?= SPHINXBUILD ?= sphinx-build SPHINXINTL ?= sphinx-intl SOURCEDIR = source BUILDDIR = build # the i18n builder cannot share the environment and doctrees with the others I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) $(SOURCEDIR) I18NSPHINXLANGS = -l zh_CN # Put it first so that "make" without argument is like "make help". help: @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) .PHONY: help Makefile html_zh_cn html_ja_jp gettext html_zh_cn: $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) -t zh_cn -D language='zh_CN' "$(SOURCEDIR)" $(BUILDDIR)/html_zh_cn gettext: $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale $(SPHINXINTL) update -p $(BUILDDIR)/locale $(I18NSPHINXLANGS) python $(SOURCEDIR)/norm_zh.py # Catch-all target: route all unknown targets to Sphinx using the new # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). 
%: Makefile @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) ================================================ FILE: doc/source/_static/switcher.json ================================================ [ { "name": "简体中文(Chinese)", "version": "zh-cn", "url": "https://inference.readthedocs.io/zh-cn/latest/" }, { "name": "English", "version": "en", "url": "https://inference.readthedocs.io/en/latest/", "preferred": true } ] ================================================ FILE: doc/source/conf.py ================================================ # Configuration file for the Sphinx documentation builder. # # This file only contains a selection of the most common options. For a full # list see the documentation: # https://www.sphinx-doc.org/en/master/usage/configuration.html # -- Path setup -------------------------------------------------------------- # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. # import os # import sys # sys.path.insert(0, os.path.abspath('.')) # -- Project information ----------------------------------------------------- project = 'Xinference' copyright = '2025, Xorbits Inc.' author = 'xorbitsai' # -- General configuration --------------------------------------------------- # Add any Sphinx extension module names here, as strings. They can be # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom # ones. extensions = [ "sphinx.ext.mathjax", "sphinx.ext.ifconfig", "sphinx.ext.intersphinx", "sphinx.ext.viewcode", "sphinx.ext.githubpages", "sphinx.ext.autosummary", "sphinx.ext.napoleon", "sphinx_tabs.tabs", "sphinx_design", "IPython.sphinxext.ipython_directive", "IPython.sphinxext.ipython_console_highlighting", ] # Add any paths that contain templates here, relative to this directory. 
templates_path = ['_templates'] # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. # This pattern also affects html_static_path and html_extra_path. exclude_patterns = [] # i18n locale_dirs = ["locale/"] # path is example but recommended. gettext_compact = False # optional # -- Options for HTML output ------------------------------------------------- # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. # html_theme = 'pydata_sphinx_theme' html_title = "Xinference" # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". html_static_path = ['_static'] # Define the json_url for our version switcher. version_match = os.environ.get("READTHEDOCS_LANGUAGE") json_url = "https://inference.readthedocs.io/en/latest/_static/switcher.json" if not version_match: version_match = 'en' html_theme_options = { "show_toc_level": 2, "header_links_before_dropdown": 7, "icon_links": [ { "name": "GitHub", "url": "https://github.com/xorbitsai/inference", "icon": "fa-brands fa-github", "type": "fontawesome", }, ], "navbar_align": "content", # [left, content, right] For testing that the navbar items align properly "navbar_start": ["navbar-logo", "version-switcher"], "navbar_center": ["navbar-nav"], "switcher": { "json_url": json_url, "version_match": version_match, }, } if version_match != 'zh-cn': html_theme_options['icon_links'].extend([{ "name": "Discord", "url": "https://discord.gg/Xw9tszSkr5", "icon": "fa-brands fa-discord", "type": "fontawesome", }, { "name": "Twitter", "url": "https://twitter.com/xorbitsio", "icon": "fa-brands fa-twitter", "type": "fontawesome", }]) html_theme_options["external_links"] = [ {"name": "Official Site", "url": "https://xinference.io"}, ] 
    html_theme_options["header_links_before_dropdown"] = 6
else:
    html_theme_options['icon_links'].extend([{
        "name": "WeChat",
        "url": "https://xinference.cn/images/WeCom.jpg",
        "icon": "fa-brands fa-weixin",
        "type": "fontawesome",
    }, {
        "name": "Zhihu",
        "url": "https://zhihu.com/org/xorbits",
        "icon": "fa-brands fa-zhihu",
        "type": "fontawesome",
    }])
    html_theme_options["external_links"] = [
        {"name": "产品官网", "url": "https://xinference.cn"},
    ]

html_favicon = "_static/favicon.svg"


================================================
FILE: doc/source/development/contributing_codebase.rst
================================================
=============================
Contributing to the code base
=============================

.. contents:: Table of contents:
   :local:

Code standards
--------------

Writing good code is not just about what you write. It is also about *how* you
write it. During Continuous Integration testing, several tools will be run to
check your code for stylistic errors. Good style is a requirement for submitting
code to Xinference.

In addition, it is important that we do not make sudden changes to the code
that could have the potential to break a lot of user code as a result.
Therefore we need it to be as backwards compatible as possible to avoid mass breakages.

Autofixing formatting errors
----------------------------

Moreover, Continuous Integration will run code formatting checks like ``black``,
``flake8``, ``isort``, and others using `pre-commit hooks `_.
Any warnings generated by these checks will cause the Continuous Integration to fail.
Therefore, it is advisable to run the check yourself before submitting code. This
can be done by installing ``pre-commit``::

    pip install pre-commit

and then running::

    pre-commit install

from the root of the Xinference repository. This setup ensures that all styling checks are
automatically executed each time you commit changes without your needing to run each one manually.
In addition, using ``pre-commit`` will also allow you to more easily
remain up-to-date with our code checks as they change.

Note that if needed, you can skip these checks with ``git commit --no-verify``.

If you don't want to use ``pre-commit`` as part of your workflow, you can still use it
to run its checks with::

    pre-commit run --files 

without needing to have done ``pre-commit install`` beforehand.

If you want to run checks on all recently committed files on upstream/main you can use::

    pre-commit run --from-ref=upstream/main --to-ref=HEAD --all-files

without needing to have done ``pre-commit install`` beforehand.

.. note::

    You may consider periodically running ``pre-commit gc`` to clean up repos
    which are no longer used.

.. note::

    If you have conflicting installations of ``virtualenv``, it could lead to
    errors - refer to `here `_.

    Also, due to a `bug in virtualenv `_,
    you may run into issues if you're using conda. To solve this, you can downgrade
    ``virtualenv`` to version ``20.0.33``.

Backwards compatibility
-----------------------

Please try to maintain backward compatibility. If you think breakage is necessary,
clearly state why as part of the pull request. Be careful when changing method
signatures: add deprecation warnings where needed, and add the deprecated Sphinx
directive to the deprecated functions or methods.

You'll also need to

1. Write a new test that asserts a warning is issued when calling with the deprecated argument
2. Update all of Xinference's existing tests and code to use the new argument

Type hints
----------

Xinference strongly encourages the use of :pep:`484` style type hints. New development
should contain type hints, and pull requests to annotate existing code are accepted as well!

Test-driven development
-----------------------

Xinference is serious about testing and strongly encourages contributors to embrace
`test-driven development (TDD) `_.
This development process "relies on the repetition of a very short development cycle:
first the developer writes an (initially failing) automated test case that defines a desired
improvement or new function, then produces the minimum amount of code to pass that test."
So, before actually writing any code, you should write your tests. Often the test can be
taken from the original GitHub issue. However, it is always worth considering additional
use cases and writing corresponding tests.

Adding tests is frequently requested after code is pushed to Xinference. Thus,
it is worth getting in the habit of writing tests ahead of time so this is never an issue.


================================================
FILE: doc/source/development/contributing_environment.rst
================================================
==================================
Creating a development environment
==================================

.. contents:: Table of contents:
   :local:

Before proceeding with any code modifications, it's essential to set up the necessary environment
for Xinference development, which includes familiarizing yourself with Git usage, establishing
an isolated environment, installing Xinference, and compiling the frontend.

Getting started with Git
-------------------------

Now that you have identified an issue you wish to resolve, an enhancement to incorporate,
or documentation to improve, it's crucial to acquaint yourself with GitHub and the Xinference codebase.

To the new user, working with Git is one of the more intimidating aspects of contributing to Xinference.
It can very quickly become overwhelming, but sticking to the guidelines below will help simplify the process
and minimize potential issues. As always, if you are having difficulties please feel free to ask for help.

The code is hosted on `GitHub `_. To contribute you will need to sign up for a
`free GitHub account `_. We use `Git `_ for version control to allow many people
to work together on the project.
`GitHub has instructions `__ for installing git, setting up your SSH key, and configuring git. All these steps need to be completed before you can work seamlessly between your local repository and GitHub.

Some great resources for learning Git:

* `Official Git Documentation `_
* `Pro Git Book `_
* `Git Tutorial by Atlassian `_
* `Git - Concise Guide `_

.. note::

    If ``git clone`` is slow, you can use the following command to set a proxy:

    ::

        export https_proxy=YourProxyAddress

Creating an isolated environment
--------------------------------

Before installing Xinference, it's recommended to create an isolated environment, preferably with Conda, to simplify subsequent operations.

::

    conda create --name xinf
    conda activate xinf

``xinf`` can be replaced with a custom Conda environment name.

Afterward, you'll need to install Python and Node.js (npm) in the newly created Conda environment. Here are the commands:

::

    conda install python=3.12
    conda install nodejs

Install from source code
------------------------

Before we begin, please make sure that you have cloned the repository. Assuming you cloned the repository into a directory named ``inference``, ``cd`` into this directory, where the ``setup.cfg`` and ``setup.py`` files are located, and run the following commands:

::

    pip install -e .
    xinference-local

If the commands run successfully, you can use Xinference normally. For detailed usage instructions, refer to `using_xinference `__.

If errors occur or the process freezes during execution, the next step is to compile the frontend.

Frontend Compilation
--------------------

Navigate to the ``inference/xinference/ui/web/ui`` directory. Then, execute the following command to clear the cache:

::

    npm cache clean

If the command fails to execute, you can try adding the ``--force`` option.

.. note::

    If the ``node_modules`` folder already exists in this directory, it's recommended to manually delete it before cleaning the cache.
Next, execute the following commands in this directory to compile the frontend:

::

    npm install
    npm run build

Again, if the first command fails to execute, you can try adding the ``--force`` option.

After compiling the frontend, you can ``cd`` back to the directory where the ``setup.cfg`` and ``setup.py`` files are located, and install Xinference via ``pip install -e .``.

================================================
FILE: doc/source/development/index.rst
================================================
.. _development_index:

===========
Development
===========

.. toctree::
   :maxdepth: 2

   contributing_environment
   contributing_codebase
   xinference_internals

================================================
FILE: doc/source/development/xinference_internals.rst
================================================
===========================
The internals of Xinference
===========================

.. contents:: Table of contents:
   :local:

Overview
========

Xinference leverages `Xoscar `_, an actor programming framework we designed, as its core component to manage machines, devices, and model inference processes. Each actor serves as a basic unit for model inference, and various inference backends can be integrated into the actor, enabling us to support multiple inference engines and hardware. These actors are hosted and scheduled within actor pools, which are designed to be asynchronous and non-blocking and function as resource pools.

[Figure: actor]

Both supervisor and worker are actor instances. Initially, an actor pool, serving as a resource pool, needs to be created on each server, and each actor can utilize a CPU core or a GPU device. Each server has its own address (IP address or hostname), so actors on different computing nodes can communicate with each other through these addresses. See `Actor`_ for more information.

RESTful API
===========

The RESTful API is implemented using `FastAPI `_, as specified in `api/restful_api.py `_.
::

    self._router.add_api_route("/status", self.get_status, methods=["GET"])

This is an example for the ``/status`` API; its corresponding function is ``get_status``. You can connect a RESTful API route to any backend function you want in `api/restful_api.py `_.

Command Line
============

The command line is implemented using `Click `_, as specified in `deploy/cmdline.py `_, allowing users to interact with the Xinference deployment features directly from the terminal.

Entry Points
------------

Take the commands we implemented as examples:

- ``xinference``: Provides commands for model management, including registering/unregistering models, listing all registered/running models, and launching or terminating specific models. It also features interactive commands like ``generate`` and ``chat`` for testing and interacting with deployed models in real time.

- ``xinference-local``: Starts a local Xinference service.

- ``xinference-supervisor``: Initiates a supervisor process that manages and monitors worker actors within a distributed setup.

- ``xinference-worker``: Starts a worker process that executes tasks assigned by the supervisor, utilizing available computational resources effectively.

Each command is equipped with ``options`` and ``flags`` to customize its behavior, such as specifying log levels, host addresses, port numbers, and other relevant settings.

Python projects define command-line console entry points in ``setup.cfg`` or ``setup.py``.

::

    console_scripts =
        xinference = xinference.deploy.cmdline:cli
        xinference-local = xinference.deploy.cmdline:local
        xinference-supervisor = xinference.deploy.cmdline:supervisor
        xinference-worker = xinference.deploy.cmdline:worker

The ``xinference`` command maps to the code in ``xinference.deploy.cmdline:cli``.
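
To make the ``module:attribute`` notation of console entry points more concrete, the following is a minimal sketch of how such a string can be resolved to a callable. It is not the actual packaging machinery (the installer generates wrapper scripts for that); the helper name ``resolve_entry_point`` is hypothetical, and the example resolves ``json:dumps`` from the standard library so that it runs without Xinference installed.

```python
import importlib


def resolve_entry_point(spec: str):
    """Resolve a "package.module:attribute" string to the object it names.

    Hypothetical helper for illustration only; real console scripts are
    generated by the packaging tools at install time.
    """
    module_name, _, attr_path = spec.partition(":")
    if not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    # Import the module named on the left of the colon ...
    obj = importlib.import_module(module_name)
    # ... then walk the (possibly dotted) attribute path on the right.
    for name in attr_path.split("."):
        obj = getattr(obj, name)
    return obj


if __name__ == "__main__":
    # With Xinference installed, "xinference.deploy.cmdline:cli" would be
    # resolved the same way; "json:dumps" is used here as a stand-in.
    dumps = resolve_entry_point("json:dumps")
    print(dumps({"status": "ok"}))
```

Under this reading, running ``xinference`` in a terminal amounts to importing ``xinference.deploy.cmdline`` and calling its ``cli`` object.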
Click
-----

We use Click to implement a specific command:

::

    @click.option(
        "--host",
        "-H",
        default=XINFERENCE_DEFAULT_DISTRIBUTED_HOST,
        type=str,
        help="Specify the host address for the supervisor.",
    )
    @click.option(
        "--port",
        "-p",
        default=XINFERENCE_DEFAULT_ENDPOINT_PORT,
        type=int,
        help="Specify the port number for the Xinference web ui and service.",
    )

For example, the ``xinference-local`` command allows you to define the host address and port.

Actor
=====

Xinference is fundamentally based on `Xoscar `_, our actor framework, which can manage computational resources and Python processes to support scalable and concurrent programming. The following pseudocode demonstrates how our Worker Actor works; the actual Worker Actor is more complex than this.

::

    import xoscar as xo

    class WorkerActor(xo.Actor):
        def __init__(self, *args, **kwargs):
            ...

        async def launch_model(self, model_id, n_gpu, **kwargs):
            # launch an inference engine, use specific model class to load model checkpoints
            ...

        async def list_models(self):
            # list models on this actor
            ...

        async def terminate_model(self, model_id):
            # terminate the model
            ...

        async def __post_create__(self):
            # called after the actor instance is created
            ...

        async def __pre_destroy__(self):
            # called before the actor instance is destroyed
            ...

We use the ``WorkerActor`` as an example to illustrate how we build Xinference. Each actor class is a standard Python class that inherits from ``xoscar.Actor``. An instance of this class is a specific actor within the actor pool.

- **Define Actor Actions**: Each actor needs to define certain actions or behaviors to accomplish specific tasks. For instance, the model inference ``WorkerActor`` needs to launch the model (``launch_model``), list the models in this actor (``list_models``), and terminate a model (``terminate_model``). There are two special methods worth noting. ``__post_create__`` is invoked after the actor is created, allowing for necessary initializations. ``__pre_destroy__`` is called before the actor is destroyed, allowing for cleanup or finalization tasks.

- **Reference Actor and Invoke Methods**: When an actor is created, it yields a reference variable so that other actors can reference it. An actor can also be referenced by its address. Suppose the ``WorkerActor`` has been created and its reference variable is ``worker_ref``; the ``launch_model`` method of this actor class can then be invoked by calling ``worker_ref.launch_model()``. Even if the actor's method is originally a synchronous method, it becomes an asynchronous method when called through an actor reference.

- **Inference Engine**: The actor can manage the process, and the inference engine is also a process. In the launch model part of the ``WorkerActor``, we can initialize different inference engines according to the user's needs. Therefore, Xinference can support multiple inference engines and can easily adapt to new inference engines in the future.

See the `Xoscar document `_ for more actor use cases.

Asynchronous Programming
========================

Both Xinference and Xoscar make heavy use of asynchronous programming with ``asyncio``. Asynchronous programming is a programming paradigm that does not block. Instead, requests and function calls are issued and executed in the background and results are returned in the future. This enables us to perform activities concurrently.

If you're not familiar with Python's ``asyncio``, the following tutorials can help:

- `Python Asyncio Tutorial `__
- `Real Python's asyncio Tutorial `__
- `Python Official Documentation `__

Model
=====

Xinference supports different types of models including large language models (LLMs), image models, audio models, embedding models, etc. All models are implemented in `model/ `_.

LLM
---

Take `model/llm/ `_ as an example: it focuses on the management and instantiation of LLMs. It includes detailed implementations for loading, configuring, and deploying LLMs.
We support many backends such as GGML, PyTorch, and vLLM. Our generated content is compatible with the OpenAI format, supporting features such as streaming output and returning the chat completion format (for chat models only). Therefore, there is a lot of adaptation work to be done after the model generates content. These tasks are not difficult, but they do require some time. When writing this part of the code, please refer to the `OpenAI API documentation `_ and the documentation of the various inference backends, and make the necessary adaptations.

JSON
----

In `model/llm/llm_family.json `_, we utilize JSON files to manage the metadata of emerging open-source models. Adding a new model does not necessitate writing new code; it merely requires appending new metadata to the existing JSON file.

::

    {
        "model_name": "llama-2-chat",
        "model_ability": ["chat"],
        "model_specs": [
            {
                "model_format": "ggmlv3",
                "model_size_in_billions": 70,
                "quantization": ["q8_0", ...],
                "model_id": "TheBloke/Llama-2-70B-Chat-GGML"
            },
            ...
        ],
        "prompt_style": {
            "style_name": "LLAMA2",
            "system_prompt": "[INST] <<SYS>>\nYou are a helpful AI assistant.\n<</SYS>>\n\n",
            "roles": ["[INST]", "[/INST]"],
            "stop_token_ids": [2],
            "stop": ["</s>"]
        }
    }

This is an example of how to define the Llama-2 chat model. The ``model_specs`` define the information of the model, as one model family usually comes with various sizes, quantization methods, and file formats. For instance, the ``model_format`` could be ``pytorch`` (using Hugging Face Transformers or vLLM as backend), ``ggmlv3`` (a tensor library associated with llama.cpp), or ``gptq`` (a post-training quantization framework). The ``model_id`` defines the repository of the model hub from which Xinference downloads the checkpoint files. Furthermore, due to distinct instruction-tuning processes, different model families have varying prompt styles. The ``prompt_style`` in the JSON file specifies how to format prompts for this particular model.
For example, ``system_prompt`` and ``roles`` are used to specify the instructions and personality of the model.

Code Walkthrough
================

The main code is located in the `xinference/ `_:

- `api/ `_: `restful_api.py `_ is the core part that sets up and runs the RESTful APIs. It integrates an authentication service (the specific code is located in `oauth2/ `_), as some or all endpoints require user authentication.

- `client/ `_: This is the client of Xinference.

  - `oscar/ `_ defines the Actor Client which acts as a client interface for interacting with models deployed in a Xinference cluster.

  - `restful/ `_ implements a RESTful client for interacting with a Xinference service.

- `core/ `_: This is the core part of Xinference.

  - `metrics.py `_ and `resource.py `_ define a set of tools for collecting and reporting metrics and the status of node resources, including model throughput, latency, the usage of CPU and GPU, memory usage, and more.

  - `image_interface.py `_ and `chat_interface.py `_ implement `Gradio `_ interfaces for image and chat models, respectively. These interfaces allow users to interact with models through a Web UI, such as generating images or engaging in chat. They build user interfaces using the gradio package and communicate with backend models through our RESTful APIs.

  - `worker.py `_ and `supervisor.py `_ respectively define the logic for the worker actors and the supervisor actor. Worker actors are responsible for carrying out specific model computation tasks, while the supervisor actor manages the lifecycle of worker nodes, schedules tasks, and monitors system states.

  - `status_guard.py `_ implements a status monitor to track the status of models (like creating, updating, terminating, etc.). It allows querying status information of model instances and managing these statuses based on the model's UID.

  - `cache_tracker.py `_ defines a cache tracker for recording and managing cache status and information of model versions. It supports recording cache locations and statuses of model versions and querying model version information based on model names.

  - `event.py `_ defines an event collector for gathering and reporting various runtime events of models, such as information, warnings, and errors.

  - `model.py `_ defines a Model Actor, the core component for direct model interactions. The Model Actor is responsible for executing model inference requests, handling input and output data streams, and supporting various types of model operations.

- `deploy/ `_: It provides a command-line interface (CLI) for interacting with the Xinference framework, allowing users to perform operations from the command line. See `Command Line`_ for more information.

- `locale/ `_: It supports multi-language localization. By simply adding and updating JSON translation files, it becomes possible to support more languages, improving user experience.

- `model/ `_: It provides a structure for model descriptions, creation, and caching. See `Model`_ for more information.

- `web/ui/ `_: The JavaScript code of the frontend (Web UI).

================================================
FILE: doc/source/examples/ai_podcast.rst
================================================
.. _examples_ai_podcast:

======================
Example: AI Podcast 🎙
======================

**Description**:

🎙️AI Podcast - Voice Conversations with Multiple Agents on M2 Max 💻

**Support Language** :

English (AI_Podcast.py)

Chinese (AI_Podcast_ZH.py)

**Used Technology (EN version)** :

    @ `OpenAI `_ 's `whisper `_

    @ `ggerganov `_ 's `ggml `_

    @ `WizardLM_AI `_ 's `wizardlm v1.0 `_

    @ `lmsysorg `_ 's `vicuna v1.3 `_

    @ `Xinference `_ as a launcher

**Detailed Explanation on the Demo Functionality** :

1. Launch the WizardLM model and the Vicuna model with Xorbits Inference when the program starts. Initiate the chatroom by giving the two chatbots their names and telling them that there is a human user called "username", where "username" is given by the user's input. Initialize an empty chat history for the chatroom.

2. Use the audio device to store a recording in a file, and transcribe the file using OpenAI's Whisper to obtain human-readable text as a string.

3. Based on the input message string, determine which agents the user wants to talk to. Call the target agents and pass in the input string and chat history for the model to generate a response.

4. When the responses are ready, use macOS's ``say`` command to produce audio through the speaker. Each agent has its own voice while speaking.

5. Store the user input and the agent response in the chat history, and keep looping until the user explicitly says words like "see you" in their responses.

**Highlight Features with Xinference** :

1. With Xinference's distributed system, we can easily deploy two different models in the same session and in the same "chatroom". With enough resources, the framework can deploy any number of models you like at the same time.

2. With Xinference, you can deploy the model easily by just adding a few lines of code. For example, launching the vicuna model in the demo takes just::

    args = parser.parse_args()
    endpoint = args.endpoint
    client = Client(endpoint)

    model_a = "vicuna-v1.3"
    model_a_uid = client.launch_model(
        model_name=model_a,
        model_format="ggmlv3",
        model_size_in_billions=7,
        quantization="q4_0",
        n_ctx=2048,
    )
    model_a_ref = client.get_model(model_a_uid)

   Then, the Xinference client will handle downloading and caching the target model, setting up the environment and process for the model, and running the service at the selected endpoint. You are now ready to play with your LLM.

**Original Demo Video** :

* `🎙️AI Podcast - Voice Conversations with Multiple Agents on M2 Max💻🔥🤖 `_

**Source Code** :

* `AI_Podcast `_ (English Version)

* `AI_Podcast_ZH `_ (Chinese Version)

================================================
FILE: doc/source/examples/chatbot.rst
================================================
.. _examples_chatbot:

========================
Example: CLI chatbot 🤖️
========================

**Description**:

Demonstrate how to interact with Xinference to play with LLM chat functionality with an AI agent in the command line 💻

**Used Technology**:

    @ `ggerganov `_ 's `ggml `_

    @ `Xinference `_ as a launcher

    @ All LLaMA and ChatGLM models supported by `Xorbitsio inference `_

**Detailed Explanation on the Demo Functionality** :

1. Take the user's command-line input in the terminal and extract the parameters required for launching the model.

2. Launch the Xinference framework and automatically deploy the model the user requested into the cluster.

3. Initialize an empty chat history to store all the context in the chatroom.

4. Repeatedly ask for the user's input as the prompt and let the model generate a response based on the prompt and the chat history. Show the response in the terminal.

5. Store the user's input and the agent's response in the chat history as context for the upcoming rounds.

**Source Code** :

* `chat `_

================================================
FILE: doc/source/examples/gradio_chatinterface.rst
================================================
.. _examples_gradio_chatinterface:

===============================
Example: Gradio ChatInterface🤗
===============================

**Description**:

This example showcases how to build a chatbot with 120 lines of code with Gradio ChatInterface and Xinference local LLM

**Used Technology**:

    @ `Xinference `_ as an LLM model hosting service

    @ `Gradio `_ as a web interface for the chatbot

**Detailed Explanation on the Demo Functionality** :

* Parse user-provided command line arguments to capture essential model parameters such as model name, size, format, and quantization.

* Establish a connection to the Xinference framework and deploy the specified model, ensuring it's ready for real-time interactions.
* Implement helper functions (flatten and to_chat) to efficiently handle and store chat interactions, ensuring the model has context for generating relevant responses.

* Set up an interactive chat interface using Gradio, allowing users to communicate with the model in a user-friendly environment.

* Activate the Gradio web interface, enabling users to start their chat sessions and receive model-generated responses based on their queries.

**Source Code** :

* `Gradio ChatInterface `_

================================================
FILE: doc/source/examples/index.rst
================================================
.. _examples_index:

========
Examples
========

.. toctree::
   :maxdepth: 2
   :hidden:

   ai_podcast
   chatbot
   gradio_chatinterface
   pdf_chatbot
   langchain_streamlit_doc_chat

Here you can find examples and resources to learn about how to use Xinference.

Demos
=====

End-to-end applications of using Xinference:

* `Voice Conversations with AI Agents on M2 Max `_
* `Interacting with LLM Models: A Command-Line Example `_
* `Interacting with LLM Models: A Gradio ChatInterface Example `_
* `PDF Chatbot with Local LLM and Embeddings `_
* `Local Doc Conversations with LangChain and Streamlit `_

If you come across other examples in your own workflows we encourage you to contribute a `PR `_!

Tutorials
=========

The following tutorials cover the basics of using Xinference in different scenarios:

* `[Notebook] Question-answering(QA) Application with Xinference, Milvus and LangChain `_
* `Using Xinference local LLMs within LlamaIndex `_
* `[Chinese] 如何让 Chatbox 接入开源大模型,实现免费聊天 `_
* `[Chinese] 摆脱 OpenAI 依赖,8 分钟教你用开源生态构建全栈 AI 应用 `_
* `[Chinese] 使用全套开源工具构建 LLM 应用实战: 在 Dify 调用 Baichuan 开源模型能力 `_

Third-Party Library Integrations
================================

Xinference is designed to seamlessly integrate and deploy open-sourced AI models, so we want to incorporate support for mainstream toolkits in the AI landscape.
Xinference can be used with the following third-party libraries:

* LangChain `Text Embedding Models `_ and `LLMs `_
* `LlamaIndex Xinference LLM `_

================================================
FILE: doc/source/examples/langchain_streamlit_doc_chat.rst
================================================
.. _examples_langchain_streamlit_doc_chat:

=======================================
Example: LangChain Streamlit Doc Chat📄
=======================================

**Description**:

This Streamlit-based application demonstrates an AI chatbot powered by a local LLM and embedding models

**Used Technology**:

    @ `Xinference `_: as the LLM and embedding model hosting service

    @ `LangChain `_: orchestrates the entire document processing and query answering pipeline

    @ `Streamlit `_: for interactive user interface

**Detailed Explanation on the Demo Functionality** :

* Streamlit UI for uploading text files, enhancing user interaction.

* Texts are split into chunks and embedded using Xinference for efficient processing.

* Executes similarity searches on embedded texts to pinpoint relevant sections for user queries.

* Utilizes a structured prompt template for focused LLM interactions.

* Xinference's LLM processes queries within the context of relevant document parts, providing accurate responses.

* The system facilitates effective and context-sensitive document exploration, aiding users in information retrieval.

**Source Code** :

* `LangChain Streamlit Doc Chat `_

================================================
FILE: doc/source/examples/pdf_chatbot.rst
================================================
.. _examples_pdf_chatbot:

======================
Example: PDF Chatbot📚
======================

**Description**:

This example showcases how to build a PDF chatbot with a local LLM and embedding models

**Used Technology**:

    @ `Xinference `_ as an LLM model hosting service

    @ `LlamaIndex `_ for orchestrating the entire RAG pipeline

    @ `Streamlit `_ for interactive UI

**Detailed Explanation on the Demo Functionality** :

* Crafted a Dockerfile to simplify the process and ensure easy reproducibility.

* Set up models with Xinference and expose two ports for accessing them.

* Leverage Streamlit for seamless file uploads and interactive communication with the chat engine.

* 5x faster doc embedding than OpenAI's API.

* Leveraging the power of GGML to offload models to the GPU, ensuring swift acceleration. Fewer long waits for responses.

**Source Code** :

* `PDF Chatbot `_

================================================
FILE: doc/source/gen_docs.py
================================================
# Copyright 2022-2023 XProbe Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import json
import os
import sys
from collections import defaultdict

from jinja2 import Environment, FileSystemLoader


# Mock engine libraries before importing xinference modules
def mock_engine_libraries():
    """Mock engine libraries to make them appear installed for documentation generation"""
    from types import ModuleType
    from importlib.machinery import ModuleSpec

    # Create mock vllm module
    vllm_mock = ModuleType('vllm')
    vllm_mock.__version__ = "1.0.0"  # Latest version for full feature support
    vllm_mock.__spec__ = ModuleSpec('vllm', None)
    vllm_mock.__file__ = "mock_vllm.py"

    # Create mock mlx module with core submodule
    mlx_mock = ModuleType('mlx')
    mlx_mock.__version__ = "1.0.0"
    mlx_mock.__spec__ = ModuleSpec('mlx', None)
    mlx_mock.__file__ = "mock_mlx.py"

    mlx_core_mock = ModuleType('mlx.core')
    mlx_core_mock.__spec__ = ModuleSpec('mlx.core', None)
    mlx_core_mock.__file__ = "mock_mlx_core.py"
    # Add required attributes for xoscar serialization
    mlx_core_mock.array = type('MockArray', (), {})
    mlx_mock.core = mlx_core_mock

    # Create mock lmdeploy module
    lmdeploy_mock = ModuleType('lmdeploy')
    lmdeploy_mock.__version__ = "0.6.0"
    lmdeploy_mock.__spec__ = ModuleSpec('lmdeploy', None)
    lmdeploy_mock.__file__ = "mock_lmdeploy.py"

    # Create mock sglang module
    sglang_mock = ModuleType('sglang')
    sglang_mock.__version__ = "0.3.0"
    sglang_mock.__spec__ = ModuleSpec('sglang', None)
    sglang_mock.__file__ = "mock_sglang.py"

    # Create mock xllamacpp module with proper module spec for importlib.util.find_spec
    import importlib.util
    import importlib.machinery
    xllamacpp_mock = ModuleType('xllamacpp')
    xllamacpp_mock.__version__ = "1.0.0"
    # Create a proper ModuleSpec that importlib.util.find_spec can find
    xllamacpp_spec = importlib.machinery.ModuleSpec('xllamacpp', None)
    xllamacpp_spec.origin = "mock_xllamacpp.py"
    xllamacpp_mock.__spec__ = xllamacpp_spec
    xllamacpp_mock.__file__ = "mock_xllamacpp.py"

    # Create mock mlx_lm module
    mlx_lm_mock = ModuleType('mlx_lm')
    mlx_lm_mock.__version__ = "1.0.0"
    mlx_lm_mock.__spec__ = ModuleSpec('mlx_lm', None)
    mlx_lm_mock.__file__ = "mock_mlx_lm.py"

    # Create mock mlx_vlm module
    mlx_vlm_mock = ModuleType('mlx_vlm')
    mlx_vlm_mock.__version__ = "1.0.0"
    mlx_vlm_mock.__spec__ = ModuleSpec('mlx_vlm', None)
    mlx_vlm_mock.__file__ = "mock_mlx_vlm.py"

    # Mock these modules in sys.modules
    sys.modules['vllm'] = vllm_mock
    sys.modules['mlx'] = mlx_mock
    sys.modules['mlx.core'] = mlx_core_mock
    sys.modules['lmdeploy'] = lmdeploy_mock
    sys.modules['sglang'] = sglang_mock
    sys.modules['xllamacpp'] = xllamacpp_mock
    sys.modules['mlx_lm'] = mlx_lm_mock
    sys.modules['mlx_vlm'] = mlx_vlm_mock


# Apply mocking before importing xinference modules
mock_engine_libraries()


# Mock platform checks BEFORE importing xinference modules
def mock_platform_checks():
    """Mock platform and hardware checks for documentation generation"""
    # Import and mock engine checks without modifying system-wide platform settings
    try:
        # Mock vLLM platform checks
        import xinference.model.llm.vllm.core as vllm_core
        vllm_core.VLLMModel._is_linux = lambda: True
        vllm_core.VLLMModel._has_cuda_device = lambda: True
        vllm_core.VLLMChatModel._is_linux = lambda: True
        vllm_core.VLLMChatModel._has_cuda_device = lambda: True
        vllm_core.VLLMMultiModel._is_linux = lambda: True
        vllm_core.VLLMMultiModel._has_cuda_device = lambda: True

        # Mock SGLang platform checks if available
        try:
            import xinference.model.llm.sglang.core as sglang_core
            sglang_core.SGLANGModel._is_linux = lambda: True
            sglang_core.SGLANGModel._has_cuda_device = lambda: True
            sglang_core.SGLANGChatModel._is_linux = lambda: True
            sglang_core.SGLANGChatModel._has_cuda_device = lambda: True
            sglang_core.SGLANGVisionModel._is_linux = lambda: True
            sglang_core.SGLANGVisionModel._has_cuda_device = lambda: True
        except ImportError:
            pass

        # Mock LMDEPLOY platform checks if available
        try:
            import xinference.model.llm.lmdeploy.core as lmdeploy_core
            lmdeploy_core.LMDeployModel._is_linux = lambda: True
            lmdeploy_core.LMDeployModel._has_cuda_device = lambda: True
            lmdeploy_core.LMDeployChatModel._is_linux = lambda: True
            lmdeploy_core.LMDeployChatModel._has_cuda_device = lambda: True
        except ImportError:
            pass

        # Mock MLX engine platform checks by monkey-patching the imports within MLX module
        try:
            # First, let's monkey-patch sys and platform imports within the MLX module only
            import xinference.model.llm.mlx.core as mlx_core

            # Create mock objects that look like sys.platform and platform functions
            class MockSys:
                platform = "darwin"

            class MockPlatform:
                @staticmethod
                def system():
                    return "Darwin"

                @staticmethod
                def processor():
                    return "arm"

            # Store original references
            original_mlx_match = mlx_core.MLXModel.match_json
            original_mlx_chat_match = mlx_core.MLXChatModel.match_json
            original_mlx_vision_match = mlx_core.MLXVisionModel.match_json

            # Now create wrapper functions that replace sys and platform only during the platform check
            def create_wrapped_match_json(original_match):
                def wrapped_match_json(cls, llm_family, llm_spec, quantization):
                    # Temporarily replace sys and platform in the MLX module
                    import sys as original_sys
                    import platform as original_platform

                    # Replace sys and platform temporarily
                    mlx_core.sys = MockSys()
                    mlx_core.platform = MockPlatform()

                    try:
                        # Call the original match_json which will now see the mocked platform
                        result = original_match.__func__(cls, llm_family, llm_spec, quantization)
                        return result
                    finally:
                        # Restore original sys and platform
                        mlx_core.sys = original_sys
                        mlx_core.platform = original_platform

                return classmethod(wrapped_match_json)

            # Apply the wrapped match_json methods
            mlx_core.MLXModel.match_json = create_wrapped_match_json(original_mlx_match)
            mlx_core.MLXChatModel.match_json = create_wrapped_match_json(original_mlx_chat_match)
            mlx_core.MLXVisionModel.match_json = create_wrapped_match_json(original_mlx_vision_match)
        except ImportError:
            pass

    except Exception as e:
        # If any mocking fails, continue without it
        print(f"Warning: Could not mock some engine platform checks: {e}")
        pass


mock_platform_checks()
from xinference.model.llm.llm_family import SUPPORTED_ENGINES, check_engine_by_spec_parameters
from xinference.model.llm.vllm.core import VLLM_INSTALLED, VLLM_SUPPORTED_MODELS, VLLM_SUPPORTED_CHAT_MODELS

# Mock platform checks again after imports to ensure they stick
# Re-register engines with mocked platform checks
from xinference.model.llm import generate_engine_config_by_model_family
from xinference.model.llm.llm_family import BUILTIN_LLM_FAMILIES, LLM_ENGINES

# Clear existing engine configurations
LLM_ENGINES.clear()

# Re-register all model families with mocked platform checks
for family in BUILTIN_LLM_FAMILIES:
    generate_engine_config_by_model_family(family)

MODEL_HUB_HUGGING_FACE = "Hugging Face"
MODEL_HUB_MODELSCOPE = "ModelScope"
_LEGACY_TRANSFORMERS_FORMATS = {"pytorch", "gptq", "awq", "bnb"}


def build_architecture_to_models(models):
    architecture_to_models = defaultdict(list)
    for model in models:
        for architecture in model.get("architectures", []) or []:
            architecture_to_models[architecture].append(model["model_name"])
    return architecture_to_models


def get_metrics_from_url(metrics_url):
    from prometheus_client.parser import text_string_to_metric_families
    import requests

    metrics = requests.get(metrics_url).content
    result = []
    for family in text_string_to_metric_families(metrics.decode("utf-8")):
        result.append({
            "name": family.name,
            "type": family.type,
            "help": family.documentation,
        })
    return result


def _can_use_transformers_legacy(model, model_spec):
    if model_spec.get("model_format") not in _LEGACY_TRANSFORMERS_FORMATS:
        return False
    abilities = set(model.get("model_ability", []))
    return "chat" in abilities or "generate" in abilities


def _extract_primary_model_src(model):
    if model.get("model_specs"):
        for spec in model["model_specs"]:
            if isinstance(spec, dict) and "model_src" in spec:
                return spec["model_src"]
    return model.get("model_src")


def main():
    template_dir = '../templates'
    env = Environment(loader=FileSystemLoader(template_dir))

    with open('../../xinference/model/llm/llm_family.json', 'r') as model_file:
        models = json.load(model_file)

        model_by_names = { m['model_name']: m for m in models}

        sorted_models = []
        output_dir = './models/builtin/llm'
        os.makedirs(output_dir, exist_ok=True)
        current_files = {f for f in os.listdir(output_dir) if os.path.isfile(os.path.join(output_dir, f))}

        for model_name in sorted(model_by_names, key=str.lower):
            model = model_by_names[model_name]

            sorted_models.append(model)

            for model_spec in model['model_specs']:
                model_spec['model_hubs'] = []

                # Process different model sources
                if 'model_src' in model_spec:
                    # Handle new model_src structure
                    if 'huggingface' in model_spec['model_src']:
                        hf_src = model_spec['model_src']['huggingface']
                        model_spec['model_hubs'].append({
                            'name': MODEL_HUB_HUGGING_FACE,
                            'url': f"https://huggingface.co/{hf_src['model_id']}"
                        })
                        # Set model_id and quantizations for template compatibility
                        model_spec['model_id'] = hf_src['model_id']
                        model_spec['quantizations'] = hf_src['quantizations']
                        quantizations = hf_src['quantizations']

                    if 'modelscope' in model_spec['model_src']:
                        ms_src = model_spec['model_src']['modelscope']
                        model_spec['model_hubs'].append({
                            'name': MODEL_HUB_MODELSCOPE,
                            'url': f"https://modelscope.cn/models/{ms_src['model_id']}"
                        })

                    # If only modelscope exists and no huggingface, use modelscope data
                    if 'modelscope' in model_spec['model_src'] and 'huggingface' not in model_spec['model_src']:
                        ms_src = model_spec['model_src']['modelscope']
                        model_spec['model_id'] = ms_src['model_id']
                        model_spec['quantizations'] = ms_src['quantizations']
                        quantizations = ms_src['quantizations']
                else:
                    # Fallback for old format if still exists
                    model_spec['model_hubs'].append({
                        'name': MODEL_HUB_HUGGING_FACE,
                        'url': f"https://huggingface.co/{model_spec['model_id']}"
                    })
                    quantizations = model_spec.get('quantizations', [])

                # model engines
                engines = []
                for engine in SUPPORTED_ENGINES:
                    for quantization in quantizations:
                        size = model_spec['model_size_in_billions']
                        if isinstance(size, str) and '_' not in size:
                            size = int(size)
                        try:
                            check_engine_by_spec_parameters(engine, model_name, model_spec['model_format'],
                                                            size, quantization)
                        except ValueError:
                            if engine == "Transformers" and _can_use_transformers_legacy(
                                model, model_spec
                            ):
                                engines.append(engine)
                            continue
                        else:
                            engines.append(engine)
                model_spec['engines'] = sorted(list(set(engines)), reverse=True)

            rendered = env.get_template('llm.rst.jinja').render(model)
            output_file_name = f"{model['model_name'].lower()}.rst"
            if output_file_name in current_files:
                current_files.remove(output_file_name)
            output_file_path = os.path.join(output_dir, output_file_name)
            with open(output_file_path, 'w') as output_file:
                output_file.write(rendered)
            print(output_file_path)

        if current_files:
            for f in current_files:
                print(f"remove {f}")
                os.remove(os.path.join(output_dir, f))

        index_file_path = os.path.join(output_dir, "index.rst")
        with open(index_file_path, "w") as file:
            rendered_index = env.get_template('llm_index.rst.jinja').render(models=sorted_models)
            file.write(rendered_index)

    llm_sorted_models = sorted_models

    with open('../../xinference/model/embedding/model_spec.json', 'r') as file:
        models = json.load(file)

        model_by_names = { m['model_name']: m for m in models}

        sorted_models = []
        output_dir = './models/builtin/embedding'
        os.makedirs(output_dir, exist_ok=True)

        for model_name in sorted(model_by_names, key=str.lower):
            model = model_by_names[model_name]
            sorted_models.append(model)

            model['model_hubs'] = []

            # Process model specs for new model_src structure
            if 'model_specs' in model and model['model_specs']:
                model_spec = model['model_specs'][0]  # Use first spec for model hubs
                if 'model_src' in model_spec:
                    if 'huggingface' in model_spec['model_src']:
                        hf_src = model_spec['model_src']['huggingface']
                        model['model_hubs'].append({
                            'name': MODEL_HUB_HUGGING_FACE,
                            'url': f"https://huggingface.co/{hf_src['model_id']}"
                        })
                        # Set model_id for template compatibility (prefer huggingface)
                        model['model_id'] =
hf_src['model_id'] if 'modelscope' in model_spec['model_src']: ms_src = model_spec['model_src']['modelscope'] model['model_hubs'].append({ 'name': MODEL_HUB_MODELSCOPE, 'url': f"https://modelscope.cn/models/{ms_src['model_id']}" }) # Only set modelscope model_id if no huggingface exists if 'huggingface' not in model_spec['model_src']: model['model_id'] = ms_src['model_id'] else: # Fallback for old format model_id = model_spec.get('model_id', model.get('model_id', '')) model['model_id'] = model_id model['model_hubs'].append({ 'name': MODEL_HUB_HUGGING_FACE, 'url': f"https://huggingface.co/{model_id}" }) else: # Fallback for very old format if 'model_id' in model: model['model_hubs'].append({ 'name': MODEL_HUB_HUGGING_FACE, 'url': f"https://huggingface.co/{model['model_id']}" }) rendered = env.get_template('embedding.rst.jinja').render(model) output_file_path = os.path.join(output_dir, f"{model['model_name'].lower()}.rst") with open(output_file_path, 'w') as output_file: output_file.write(rendered) print(output_file_path) index_file_path = os.path.join(output_dir, "index.rst") with open(index_file_path, "w") as file: rendered_index = env.get_template('embedding_index.rst.jinja').render(models=sorted_models) file.write(rendered_index) with open('../../xinference/model/rerank/model_spec.json', 'r') as file: models = json.load(file) sorted_models = sorted(models, key=lambda x: x['model_name'].lower()) output_dir = './models/builtin/rerank' os.makedirs(output_dir, exist_ok=True) for model in sorted_models: # Initialize model_hubs list model['model_hubs'] = [] # Process model specs for new model_src structure model_spec = model['model_specs'][0] # Use first spec for model hubs if 'model_src' in model_spec: if 'huggingface' in model_spec['model_src']: hf_src = model_spec['model_src']['huggingface'] model['model_hubs'].append({ 'name': MODEL_HUB_HUGGING_FACE, 'url': f"https://huggingface.co/{hf_src['model_id']}" }) # Set model_id for template compatibility (prefer 
huggingface) model['model_id'] = hf_src['model_id'] if 'modelscope' in model_spec['model_src']: ms_src = model_spec['model_src']['modelscope'] model['model_hubs'].append({ 'name': MODEL_HUB_MODELSCOPE, 'url': f"https://modelscope.cn/models/{ms_src['model_id']}" }) # Only set modelscope model_id if no huggingface exists if 'huggingface' not in model_spec['model_src']: model['model_id'] = ms_src['model_id'] rendered = env.get_template('rerank.rst.jinja').render(model) output_file_path = os.path.join(output_dir, f"{model['model_name'].lower()}.rst") with open(output_file_path, 'w') as output_file: output_file.write(rendered) index_file_path = os.path.join(output_dir, "index.rst") with open(index_file_path, "w") as file: rendered_index = env.get_template('rerank_index.rst.jinja').render(models=sorted_models) file.write(rendered_index) with open('../../xinference/model/image/model_spec.json', 'r') as file: models = json.load(file) sorted_models = sorted(models, key=lambda x: x['model_name'].lower()) output_dir = './models/builtin/image' os.makedirs(output_dir, exist_ok=True) for model in sorted_models: # Process model_src for template compatibility model_src = _extract_primary_model_src(model) if model_src: if 'huggingface' in model_src: hf_src = model_src['huggingface'] model['model_id'] = hf_src['model_id'] # Handle GGUF related fields if 'gguf_model_id' in hf_src: model['gguf_model_id'] = hf_src['gguf_model_id'] if 'gguf_quantizations' in hf_src: model['gguf_quantizations'] = ", ".join(hf_src['gguf_quantizations']) # Handle Lightning related fields if 'lightning_model_id' in hf_src: model['lightning_model_id'] = hf_src['lightning_model_id'] if 'lightning_versions' in hf_src: model['lightning_versions'] = ", ".join(hf_src['lightning_versions']) elif 'modelscope' in model_src: model['model_id'] = model_src['modelscope']['model_id'] available_controlnet = [cn["model_name"] for cn in model.get("controlnet", [])] if not available_controlnet: available_controlnet = None 
model["available_controlnet"] = available_controlnet model["model_ability"] = ', '.join(model.get("model_ability")) # Ensure gguf_quantizations is properly formatted (fallback for old format) if "gguf_quantizations" not in model: model["gguf_quantizations"] = ", ".join(model.get("gguf_quantizations", [])) rendered = env.get_template('image.rst.jinja').render(model) output_file_path = os.path.join(output_dir, f"{model['model_name'].lower()}.rst") with open(output_file_path, 'w') as output_file: output_file.write(rendered) index_file_path = os.path.join(output_dir, "index.rst") with open(index_file_path, "w") as file: rendered_index = env.get_template('image_index.rst.jinja').render(models=sorted_models) file.write(rendered_index) with open('../../xinference/model/audio/model_spec.json', 'r') as file: models = json.load(file) sorted_models = sorted(models, key=lambda x: x['model_name'].lower()) output_dir = './models/builtin/audio' os.makedirs(output_dir, exist_ok=True) for model in sorted_models: # Process model_src for template compatibility model_src = _extract_primary_model_src(model) if model_src: if 'huggingface' in model_src: model['model_id'] = model_src['huggingface']['model_id'] elif 'modelscope' in model_src: model['model_id'] = model_src['modelscope']['model_id'] rendered = env.get_template('audio.rst.jinja').render(model) output_file_path = os.path.join(output_dir, f"{model['model_name'].lower()}.rst") with open(output_file_path, 'w') as output_file: output_file.write(rendered) index_file_path = os.path.join(output_dir, "index.rst") with open(index_file_path, "w") as file: rendered_index = env.get_template('audio_index.rst.jinja').render(models=sorted_models) file.write(rendered_index) with open('../../xinference/model/video/model_spec.json', 'r') as file: models = json.load(file) sorted_models = sorted(models, key=lambda x: x['model_name'].lower()) output_dir = './models/builtin/video' os.makedirs(output_dir, exist_ok=True) for model in sorted_models: # 
Process model_src for template compatibility model_src = _extract_primary_model_src(model) if model_src: if 'huggingface' in model_src: model['model_id'] = model_src['huggingface']['model_id'] elif 'modelscope' in model_src: model['model_id'] = model_src['modelscope']['model_id'] model["model_ability"] = ', '.join(model.get("model_ability")) rendered = env.get_template('video.rst.jinja').render(model) output_file_path = os.path.join(output_dir, f"{model['model_name'].lower()}.rst") with open(output_file_path, 'w') as output_file: output_file.write(rendered) index_file_path = os.path.join(output_dir, "index.rst") with open(index_file_path, "w") as file: rendered_index = env.get_template('video_index.rst.jinja').render(models=sorted_models) file.write(rendered_index) if VLLM_INSTALLED: architecture_to_models = build_architecture_to_models(llm_sorted_models) supported_architectures = [] for architecture in VLLM_SUPPORTED_MODELS + VLLM_SUPPORTED_CHAT_MODELS: if architecture not in supported_architectures: supported_architectures.append(architecture) groups = [] for architecture in supported_architectures: if architecture in architecture_to_models: model_names = sorted(set(architecture_to_models[architecture]), key=str.lower) groups.append(model_names) else: groups.append([architecture]) groups = [', '.join("``%s``" % m for m in group) for group in groups] vllm_model_str = '\n'.join('- %s' % group for group in groups) for fn in ['getting_started/installation.rst', 'user_guide/backends.rst']: with open(fn) as f: content = f.read() start_label = '.. vllm_start' end_label = '.. 
vllm_end' start = content.find(start_label) + len(start_label) end = content.find(end_label) new_content = content[:start] + '\n\n' + vllm_model_str + '\n' + content[end:] with open(fn, 'w') as f: f.write(new_content) try: output_dir = './user_guide' os.makedirs(output_dir, exist_ok=True) supervisor_metrics = get_metrics_from_url("http://127.0.0.1:9997/metrics") worker_metrics = get_metrics_from_url("http://127.0.0.1:9977/metrics") all_metrics = {"supervisor_metrics": supervisor_metrics, "worker_metrics": worker_metrics} rendered = env.get_template('metrics.jinja').render(all_metrics) output_file_path = os.path.join(output_dir, "metrics.rst") with open(output_file_path, 'w') as output_file: output_file.write(rendered) except Exception: print("Skip generate metrics doc, please start a local xinference server by: `xinference-local -mp 9977`.") if __name__ == "__main__": main() ================================================ FILE: doc/source/getting_started/environments.rst ================================================ .. _environments: ====================== Environment Variables ====================== XINFERENCE_ENDPOINT ~~~~~~~~~~~~~~~~~~~~ Endpoint of the Xinference service, used by clients to connect to it. The default value is ``http://127.0.0.1:9997``; the actual endpoint is printed in the logs. XINFERENCE_MODEL_SRC ~~~~~~~~~~~~~~~~~~~~~ Model hub used for downloading models. The default is ``huggingface``; set it to ``modelscope`` to download from ModelScope instead. .. _environments_xinference_home: XINFERENCE_HOME ~~~~~~~~~~~~~~~~ By default, Xinference uses ``<HOME>/.xinference`` as its home path to store necessary files such as logs and models, where ``<HOME>`` is the home directory of the current user. You can change this directory by setting this environment variable. XINFERENCE_HEALTH_CHECK_FAILURE_THRESHOLD ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The maximum number of failed health checks tolerated at Xinference startup. Default value is 5.
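
As a rough illustration of how the variables described so far fall back to their defaults, the sketch below reads them from the process environment. The helper name ``read_xinference_env`` is hypothetical and not part of Xinference; only the variable names and default values are taken from this page.

```python
import os
from pathlib import Path


def read_xinference_env(environ=None):
    """Hypothetical helper: resolve a few documented variables with their documented defaults."""
    env = os.environ if environ is None else environ
    return {
        # Endpoint that clients connect to.
        "endpoint": env.get("XINFERENCE_ENDPOINT", "http://127.0.0.1:9997"),
        # Model hub used for downloads ("huggingface" or "modelscope").
        "model_src": env.get("XINFERENCE_MODEL_SRC", "huggingface"),
        # Home path for logs and models; defaults to .xinference in the user's home directory.
        "home": env.get("XINFERENCE_HOME", str(Path.home() / ".xinference")),
        # Number of failed health checks tolerated at startup.
        "health_check_failure_threshold": int(
            env.get("XINFERENCE_HEALTH_CHECK_FAILURE_THRESHOLD", "5")
        ),
    }
```

Passing an explicit dictionary instead of ``os.environ`` makes it easy to check the fallback behaviour without touching the real environment.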
XINFERENCE_HEALTH_CHECK_INTERVAL ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Health check interval (seconds) at Xinference startup. Default value is 5. XINFERENCE_HEALTH_CHECK_TIMEOUT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Health check timeout (seconds) at Xinference startup. Default value is 10. XINFERENCE_DISABLE_HEALTH_CHECK ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Xinference automatically runs health checks at startup. Setting this environment variable to 1 disables the health check. XINFERENCE_DISABLE_METRICS ~~~~~~~~~~~~~~~~~~~~~~~~~~ By default, Xinference enables the metrics exporter on the supervisor and the worker. Setting this environment variable to 1 disables the ``/metrics`` endpoint on the supervisor and the HTTP service on the worker (which only provides the ``/metrics`` endpoint). XINFERENCE_DOWNLOAD_MAX_ATTEMPTS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Maximum download retry attempts for model files. Default value is 3. XINFERENCE_TEXT_TO_IMAGE_BATCHING_SIZE ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Enable continuous batching for text-to-image models by specifying the target image size (e.g., ``1024*1024``). Default is unset. XINFERENCE_SSE_PING_ATTEMPTS_SECONDS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Server-Sent Events keepalive ping interval (seconds). Default value is 600. XINFERENCE_MAX_TOKENS ~~~~~~~~~~~~~~~~~~~~~ Global max tokens limit override for requests. Default is unset. XINFERENCE_ALLOWED_IPS ~~~~~~~~~~~~~~~~~~~~~~ Restrict access to specified IPs or CIDR blocks. Default is unset (no restriction). XINFERENCE_BATCH_SIZE ~~~~~~~~~~~~~~~~~~~~~ Default batch size used by the server when batching is enabled. Default value is 32. XINFERENCE_BATCH_INTERVAL ~~~~~~~~~~~~~~~~~~~~~~~~~ Default batching interval (seconds). Default value is 0.003. XINFERENCE_ALLOW_MULTI_REPLICA_PER_GPU ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Whether to allow multiple replicas on a single GPU. Default value is 1 (enabled). XINFERENCE_LAUNCH_STRATEGY ~~~~~~~~~~~~~~~~~~~~~~~~~~ GPU allocation strategy for replicas.
Default is ``IDLE_FIRST_LAUNCH_STRATEGY``. XINFERENCE_ENABLE_VIRTUAL_ENV ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Enable model virtual environments globally. Default value is 1 (enabled, starting from v2.0). XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Skip packages already present in system site-packages when creating virtual environments. Default value is 1. XINFERENCE_CSG_TOKEN ~~~~~~~~~~~~~~~~~~~~ Authentication token for CSGHub model source. Default is unset. XINFERENCE_CSG_ENDPOINT ~~~~~~~~~~~~~~~~~~~~~~~ CSGHub endpoint for model source. Default value is ``https://hub-stg.opencsg.com/``. ================================================ FILE: doc/source/getting_started/index.rst ================================================ .. _getting_started_index: =============== Getting Started =============== .. toctree:: :maxdepth: 2 installation using_xinference logging using_docker_image using_kubernetes troubleshooting environments release_notes ================================================ FILE: doc/source/getting_started/installation.rst ================================================ .. _installation: ============ Installation ============ Xinference can be installed with ``pip`` on Linux, Windows, and macOS. To run models using Xinference, you will need to install the backend corresponding to the type of model you intend to serve. If you aim to serve all supported models, you can install all the necessary dependencies with a single command:: pip install "xinference[all]" .. versionchanged:: v1.8.1 Due to irreconcilable package dependency conflicts between vLLM and sglang, we have removed sglang from the all extra. If you want to use sglang, please install it separately via ``pip install 'xinference[sglang]'``. Several usage scenarios require special attention. .. 
admonition:: **GGUF format** with **llama.cpp engine** In this situation, it's advised to install its dependencies manually based on your hardware specifications to enable acceleration. For more details, see the :ref:`installation_gguf` section. .. admonition:: **AWQ or GPTQ** format with **transformers engine** **This section was added in v1.6.0.** The dependencies for these formats require special options and are difficult to install, so please run the command below in advance: .. code-block:: bash pip install "xinference[transformers_quantization]" --no-build-isolation Some dependencies like ``transformers`` might be downgraded; you can run ``pip install "xinference[all]"`` afterwards. If you want to install only the necessary backends, here's a breakdown of how to do it. .. _inference_backend: Transformers Backend ~~~~~~~~~~~~~~~~~~~~ PyTorch (transformers) supports the inference of most state-of-the-art models. It is the default backend for models in PyTorch format:: pip install "xinference[transformers]" Notes: - The transformers engine supports ``pytorch`` / ``gptq`` / ``awq`` / ``bnb`` / ``fp4`` formats. - FP4 format requires ``transformers`` with ``FPQuantConfig`` support. If you see an import error, please upgrade ``transformers`` to a newer version. vLLM Backend ~~~~~~~~~~~~ vLLM is a fast and easy-to-use library for LLM inference and serving. Xinference will choose vLLM as the backend to achieve better throughput when the following conditions are met: - The model format is ``pytorch``, ``gptq``, ``awq``, ``fp4``, ``fp8`` or ``bnb``. - When the model format is ``pytorch``, the quantization is ``none``. - When the model format is ``awq``, the quantization is ``Int4``. - When the model format is ``gptq``, the quantization is ``Int3``, ``Int4`` or ``Int8``.
- The system is Linux and has at least one CUDA device - The model family (for custom models) / model name (for builtin models) is within the list of models supported by vLLM Currently, supported models include: .. vllm_start - ``code-llama``, ``code-llama-instruct``, ``code-llama-python``, ``deepseek``, ``deepseek-chat``, ``deepseek-coder``, ``deepseek-coder-instruct``, ``deepseek-r1-distill-llama``, ``gorilla-openfunctions-v2``, ``HuatuoGPT-o1-LLaMA-3.1``, ``llama-2``, ``llama-2-chat``, ``llama-3``, ``llama-3-instruct``, ``llama-3.1``, ``llama-3.1-instruct``, ``llama-3.3-instruct``, ``tiny-llama``, ``wizardcoder-python-v1.0``, ``wizardmath-v1.0``, ``Yi``, ``Yi-1.5``, ``Yi-1.5-chat``, ``Yi-1.5-chat-16k``, ``Yi-200k``, ``Yi-chat`` - ``codestral-v0.1``, ``mistral-instruct-v0.1``, ``mistral-instruct-v0.2``, ``mistral-instruct-v0.3``, ``mistral-large-instruct``, ``mistral-nemo-instruct``, ``mistral-v0.1``, ``openhermes-2.5``, ``seallm_v2`` - ``Baichuan-M2``, ``codeqwen1.5``, ``codeqwen1.5-chat``, ``deepseek-r1-distill-qwen``, ``DianJin-R1``, ``fin-r1``, ``HuatuoGPT-o1-Qwen2.5``, ``KAT-V1``, ``marco-o1``, ``qwen1.5-chat``, ``qwen2-instruct``, ``qwen2.5``, ``qwen2.5-coder``, ``qwen2.5-coder-instruct``, ``qwen2.5-instruct``, ``qwen2.5-instruct-1m``, ``qwenLong-l1``, ``QwQ-32B``, ``QwQ-32B-Preview``, ``seallms-v3``, ``skywork-or1``, ``skywork-or1-preview``, ``XiYanSQL-QwenCoder-2504`` - ``llama-3.2-vision``, ``llama-3.2-vision-instruct`` - ``baichuan-2``, ``baichuan-2-chat`` - ``InternLM2ForCausalLM`` - ``qwen-chat`` - ``mixtral-8x22B-instruct-v0.1``, ``mixtral-instruct-v0.1``, ``mixtral-v0.1`` - ``cogagent`` - ``glm-edge-chat``, ``glm4-chat``, ``glm4-chat-1m`` - ``codegeex4``, ``glm-4v`` - ``seallm_v2.5`` - ``orion-chat`` - ``qwen1.5-moe-chat``, ``qwen2-moe-instruct`` - ``CohereForCausalLM`` - ``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, ``deepseek-v2.5``, ``deepseek-vl2`` - ``deepseek-prover-v2``, ``deepseek-r1``, ``deepseek-r1-0528``, ``deepseek-v3``, 
``deepseek-v3-0324``, ``Deepseek-V3.1``, ``moonlight-16b-a3b-instruct`` - ``deepseek-r1-0528-qwen3``, ``qwen3`` - ``minicpm3-4b`` - ``internlm3-instruct`` - ``gemma-3-1b-it`` - ``glm4-0414`` - ``minicpm-2b-dpo-bf16``, ``minicpm-2b-dpo-fp16``, ``minicpm-2b-dpo-fp32``, ``minicpm-2b-sft-bf16``, ``minicpm-2b-sft-fp32``, ``minicpm4`` - ``Ernie4.5`` - ``Qwen3-Coder``, ``Qwen3-Instruct``, ``Qwen3-Thinking`` - ``glm-4.5``, ``GLM-4.6``, ``GLM-4.7`` - ``gpt-oss`` - ``seed-oss`` - ``Qwen3-Next-Instruct``, ``Qwen3-Next-Thinking`` - ``DeepSeek-V3.2``, ``DeepSeek-V3.2-Exp`` - ``MiniMax-M2``, ``MiniMax-M2.5`` - ``glm-5`` .. vllm_end To install Xinference and vLLM:: pip install "xinference[vllm]" # FlashInfer is optional but required for specific functionalities such as sliding window attention with Gemma 2. # For CUDA 12.4 & torch 2.4 to support sliding window attention for gemma 2 and llama 3.1 style rope pip install flashinfer -i https://flashinfer.ai/whl/cu124/torch2.4 # For other CUDA & torch versions, please check https://docs.flashinfer.ai/installation.html .. _installation_gguf: Llama.cpp Backend ~~~~~~~~~~~~~~~~~ Xinference supports models in ``gguf`` format via ``xllamacpp``. `xllamacpp `_ is developed by Xinference team, and is the sole backend for llama.cpp since v1.6.0. .. warning:: Since Xinference v1.5.0, ``llama-cpp-python`` is deprecated. Since Xinference v1.6.0, ``llama-cpp-python`` has been removed. Initial setup:: pip install "xinference[llama_cpp]" For more installation instructions for ``xllamacpp`` to enable GPU acceleration, please refer to: https://github.com/xorbitsai/xllamacpp SGLang Backend ~~~~~~~~~~~~~~ SGLang has a high-performance inference runtime with RadixAttention. It significantly accelerates the execution of complex LLM programs by automatic KV cache reuse across multiple calls. And it also supports other common techniques like continuous batching and tensor parallelism. 
Initial setup:: pip install "xinference[sglang]" MLX Backend ~~~~~~~~~~~ MLX-lm is designed for Apple silicon users to run LLMs efficiently. Initial setup:: pip install "xinference[mlx]" Other Platforms ~~~~~~~~~~~~~~~ * :ref:`Ascend NPU ` ================================================ FILE: doc/source/getting_started/installation_npu.rst ================================================ .. _installation_npu: ================================= Installation Guide for Ascend NPU ================================= Xinference can run on Ascend NPU. Follow the instructions below to install it. .. warning:: The open-source version relies on Transformers for inference, which can be slow on chips like 310p3. We provide an enterprise version that supports the MindIE engine, offering better performance and compatibility for Ascend NPU. Refer to `Xinference Enterprise `_ Installing PyTorch and Ascend extension for PyTorch ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Install the CPU version of PyTorch and the corresponding Ascend extension. The commands below take PyTorch v2.1.0 as an example. .. code-block:: bash pip3 install torch==2.1.0 torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cpu Then install `Ascend extension for PyTorch `_. .. code-block:: bash pip3 install 'numpy<2.0' pip3 install decorator pip3 install torch-npu==2.1.0.post3 Run the command below to check that it correctly prints the Ascend NPU count. .. code-block:: bash python -c "import torch; import torch_npu; print(torch.npu.device_count())" Installing Xinference ~~~~~~~~~~~~~~~~~~~~~ .. code-block:: bash pip3 install xinference Now you can use Xinference according to :ref:`doc `. The ``Transformers`` backend is the only engine available for Ascend NPU in the open-source version. Enterprise Support ~~~~~~~~~~~~~~~~~~ If you encounter any performance or other issues with Ascend NPU, please reach out to us via `link `_.
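
The device-count check shown earlier in this guide can also be wrapped in a small Python function, which is convenient in setup scripts that should not crash on machines without the Ascend stack. This is only a sketch: the function name ``ascend_npu_count`` is hypothetical, and it assumes, as the one-line command above does, that importing ``torch_npu`` makes ``torch.npu`` available.

```python
def ascend_npu_count():
    """Return the number of visible Ascend NPUs, or None if torch / torch_npu cannot be imported."""
    try:
        import torch
        import torch_npu  # noqa: F401  (imported for its side effect of enabling torch.npu)
    except ImportError:
        # Either PyTorch or the Ascend extension is not installed in this environment.
        return None
    return torch.npu.device_count()


if __name__ == "__main__":
    count = ascend_npu_count()
    if count is None:
        print("torch or torch_npu is not installed")
    else:
        print(f"Ascend NPU count: {count}")
```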
================================================ FILE: doc/source/getting_started/logging.rst ================================================ .. _logging: ===================== Logging in Xinference ===================== Configure Log Level ################### You can configure the log level with the ``--log-level`` option. For example, to start a local cluster with the ``DEBUG`` log level: .. code-block:: bash xinference-local --log-level debug Log Files ######### Xinference rotates its log files. By default, logs rotate when they reach 100MB (maxBytes), and up to 30 backup files (backupCount) are kept. Note that the log level configured above takes effect in both the command line logs and the log files. Log Directory Structure *********************** All the logs are stored in the ``<XINFERENCE_HOME>/logs`` directory, where ``<XINFERENCE_HOME>`` can be configured as mentioned in :ref:`using_xinference`. Xinference creates a subdirectory under the log directory ``<XINFERENCE_HOME>/logs``. The name of the subdirectory corresponds to the Xinference cluster startup time in milliseconds. Local deployment ================ In a local deployment, the logs of Xinference supervisor and Xinference workers are combined into a single file. An example of the log directory structure is shown below:: <XINFERENCE_HOME>/logs └── local_1699503558105 └── xinference.log where ``1699503558105`` is the timestamp when the Xinference cluster was created. Therefore, when you create a cluster locally multiple times, you can look for the corresponding logs based on this timestamp. Distributed deployment ====================== In a distributed deployment, Xinference supervisor and Xinference workers each create their own subdirectory under the log directory. The name of the subdirectory starts with the role name, followed by the role startup time in milliseconds.
An example of the log directory structure is shown below:: /logs └── supervisor_1699503558908 └── xinference.log worker_1699503559105 └── xinference.log ================================================ FILE: doc/source/getting_started/release_notes.rst ================================================ .. _release_ntoes: Release Notes ============= This page provides a version-by-version index of Xinference release notes. For detailed updates, please visit the corresponding links below. +-----------------+--------------------------------------------------------------------------------+ | Version | Release Notes | +=================+================================================================================+ | v2.3.0 | `View release notes `_ | +-----------------+--------------------------------------------------------------------------------+ | v2.2.0 | `View release notes `_ | +-----------------+--------------------------------------------------------------------------------+ | v2.1.0 | `View release notes `_ | +-----------------+--------------------------------------------------------------------------------+ | v2.0.0 | `View release notes `_ | +-----------------+--------------------------------------------------------------------------------+ | v1.17.0 | `View release notes `_ | +-----------------+--------------------------------------------------------------------------------+ | v1.16.0 | `View release notes `_ | +-----------------+--------------------------------------------------------------------------------+ | v1.15.0 | `View release notes `_ | +-----------------+--------------------------------------------------------------------------------+ | v1.14.0 | `View release notes `_ | +-----------------+--------------------------------------------------------------------------------+ | v1.13.0 | `View release notes `_ | +-----------------+--------------------------------------------------------------------------------+ | v1.12.0 | `View release notes `_ | 
+-----------------+--------------------------------------------------------------------------------+ | v1.11.0.post1 | `View release notes `_ | +-----------------+--------------------------------------------------------------------------------+ | v1.10.1 | `View release notes `_ | +-----------------+--------------------------------------------------------------------------------+ ---- For older versions and source history, see our GitHub releases page: https://github.com/xorbitsai/inference/releases ================================================ FILE: doc/source/getting_started/troubleshooting.rst ================================================ .. _troubleshooting: =============== Troubleshooting =============== No huggingface repo access ========================== Sometimes, you may face errors accessing huggingface models, such as the following message when accessing `llama2`: .. code-block:: text Cannot access gated repo for url https://huggingface.co/api/models/meta-llama/Llama-2-7b-hf. Repo model meta-llama/Llama-2-7b-hf is gated. You must be authenticated to access it. This typically indicates either a lack of access rights to the repository or missing huggingface access tokens. The following sections provide guidance on addressing these issues. Get access to the huggingface repo ---------------------------------- To obtain access, navigate to the desired huggingface repository and agree to its terms and conditions. As an illustration, for the `llama2` model, you can use this link: `https://huggingface.co/meta-llama/Llama-2-7b-hf `_. Set up credentials to access huggingface ---------------------------------------- Your credential to access huggingface can be found online at `https://huggingface.co/settings/tokens `_. You can set the token as an environmental variable, with ``export HUGGING_FACE_HUB_TOKEN=your_token_here``. 
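
When Xinference is started from a Python script rather than an interactive shell, the same token can be placed into the environment of the child process. The sketch below only builds that environment; the helper name ``env_with_hf_token`` is hypothetical, and the variable name is the one used in the ``export`` command above.

```python
import os


def env_with_hf_token(token, base_env=None):
    """Return a copy of the environment with HUGGING_FACE_HUB_TOKEN set to the given token."""
    if not token:
        raise ValueError("token must be a non-empty string")
    # Copy so that the caller's environment (or os.environ) is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["HUGGING_FACE_HUB_TOKEN"] = token
    return env
```

The returned dictionary can then be passed as the ``env`` argument of ``subprocess.run`` when launching ``xinference-local``.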
Incompatibility Between NVIDIA Driver and PyTorch Version ========================================================= If you are using an NVIDIA GPU, you may face the following error: .. code-block:: text UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 10010). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ..\c10\cuda\CUDAFunctions.cpp:112.) This typically indicates that your CUDA driver version is not compatible with the PyTorch version you are using. Go to `https://pytorch.org `_ to install a PyTorch version that has been compiled with your version of the CUDA driver. **Do not install a CUDA version older than 11.8; a version between 11.8 and 12.1 is preferred.** For example, if your CUDA driver version is 11.8, you can install PyTorch with the following command: .. code-block:: bash pip install torch==2.0.1+cu118 Xinference service cannot be accessed from external systems through ``<IP>:9997`` ================================================================================= Use the ``-H 0.0.0.0`` parameter when starting Xinference: .. code:: bash xinference-local -H 0.0.0.0 The Xinference service will then listen on all network interfaces (not limited to ``127.0.0.1`` or ``localhost``). If you are using the :ref:`using_docker_image`, please add ``-p <PORT>:9997`` to the docker run command; the service is then accessible through ``<IP>:<PORT>`` of the local machine. Launching a built-in model takes a long time, and sometimes the model fails to download ======================================================================================= Xinference by default uses HuggingFace as the source for models. If your machines are in Mainland China, there might be accessibility issues when using built-in models.
To address this, add environment variable ``XINFERENCE_MODEL_SRC=modelscope`` when starting the Xinference to change the model source to ModelScope, which is optimized for Mainland China. If you’re starting Xinference with Docker, include ``-e XINFERENCE_MODEL_SRC=modelscope`` during the docker run command. When using the official Docker image, RayWorkerVllm died due to OOM, causing the model to fail to load ======================================================================================================= Docker's ``--shm-size`` parameter is used to set the size of shared memory. The default size of shared memory (/dev/shm) is 64MB, which may be too small for vLLM backend. You can increase its size by setting the ``--shm-size`` parameter as follows: .. code:: bash docker run --shm-size=128g ... Missing ``model_engine`` parameter when launching LLM models ============================================================ Since version ``v0.11.0``, launching LLM models requires an additional ``model_engine`` parameter. For specific information, please refer to :ref:`here `. Resolving MKL Threading Layer Conflicts ======================================== When starting the Xinference server, you may encounter the error: ``ValueError: Model architectures ['Qwen2ForCausalLM'] failed to be inspected. Please check the logs for more details.`` The underlying cause shown in the logs is: .. code-block:: text Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp-a34b3233.so.1 library. Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it. This typically occurs when NumPy was installed via conda. Conda's NumPy is built with Intel MKL optimizations, which conflicts with the GNU OpenMP library (libgomp) already loaded in your environment. Solution 1: Override the Threading Layer ----------------------------------------- Force Intel's Math Kernel Library to use GNU's OpenMP implementation: .. 
code-block:: bash MKL_THREADING_LAYER=GNU xinference-local Solution 2: Reinstall NumPy with pip ------------------------------------- Uninstall conda's NumPy and reinstall using pip: .. code-block:: bash pip uninstall -y numpy && pip install numpy #Or just --force-reinstall pip install --force-reinstall numpy Related Note: vLLM and PyTorch ------------------------------- If you're using vLLM, avoid installing PyTorch with conda. Refer to the official vLLM installation guide for GPU-specific instructions: https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html Configuring PyPI Mirrors to Speed Up Package Installation ========================================================== If you're in Mainland China, using a PyPI mirror can significantly speed up package installation. Here are some commonly used mirrors: - Tsinghua University: ``https://pypi.tuna.tsinghua.edu.cn/simple`` - Alibaba Cloud: ``https://mirrors.aliyun.com/pypi/simple/`` - Tencent Cloud: ``https://mirrors.cloud.tencent.com/pypi/simple`` However, be aware that some packages may not be available on certain mirrors. For example, if you're installing ``xinference[audio]`` using only the Aliyun mirror, the installation may fail. This happens because ``num2words``, a dependency used by ``MeloTTS``, is not available on the Aliyun mirror. As a result, ``pip install xinference[audio]`` will resolve to older versions like ``xinference==1.2.0`` and ``xoscar==0.8.0`` (as of Oct 27, 2025). These older versions are incompatible and will produce the error: ``MainActorPool.append_sub_pool() got an unexpected keyword argument 'start_method'`` .. code-block:: bash curl -s https://mirrors.aliyun.com/pypi/simple/num2words/ | grep -i "num2words" # Returns NOTHING! But it works on Tsinghua or Tencent mirrors. 
# uv pip install "xinference[audio]" will then install the following packages (as of Oct 27, 2025): + x-transformers==2.10.2 + xinference==1.2.0 + xoscar==0.8.0 To avoid this issue when installing the xinference audio package, use multiple mirrors: .. code-block:: bash uv pip install xinference[audio] --index-url https://mirrors.aliyun.com/pypi/simple --extra-index-url https://pypi.tuna.tsinghua.edu.cn/simple # Optional: Set this globally in your uv config mkdir -p ~/.config/uv cat >> ~/.config/uv/uv.toml << EOF index-url = "https://mirrors.aliyun.com/pypi/simple" extra-index-url = ["https://pypi.tuna.tsinghua.edu.cn/simple"] EOF Installing Xinference 1.12.0 with uv Fails (As of November 2025) ================================================================= **Note:** This is a temporary issue due to the current package ecosystem and uv prioritizing **higher versions for direct dependencies** over **indirect dependencies**. Symptom ------- When installing xinference 1.12.0 as of November 2025 using ``uv pip install xinference``, you may encounter an issue where very old package versions are installed, particularly: - ``transformers==4.12.2`` (from 2021) - ``tokenizers==0.10.3`` (from 2021) - ``huggingface-hub==1.0.1`` Then uv fails with "Failed to build `tokenizers==0.10.3`" Root Cause ---------- This occurs because uv prioritizes **higher versions for direct dependencies** over **indirect dependencies**: 1. xinference 1.12.0 specifies ``huggingface-hub>=0.19.4`` as a **direct dependency** (no upper bound) 2. uv selects the latest: ``huggingface-hub==1.0.1`` as of November 06 2025 3. However, ``transformers<=4.57.3`` (an **indirect dependency** via ``peft``) requires ``huggingface-hub<1.0`` 4. To resolve the conflict, uv keeps the direct dependency at 1.0.1 and downgrades the indirect dependency ``transformers`` to ancient version 4.12.2 **This is by design in uv**: it prioritizes what you explicitly ask for (direct dependencies) over transitive dependencies. 
Refer to https://github.com/astral-sh/uv/issues/16601 **Update:** The latest transformers 4.57.3 (as in 2026.01.05) still requires ``huggingface-hub<1.0``. Solutions --------- **Solution 1: Pre-constrain huggingface-hub (Recommended)** Explicitly constrain ``huggingface-hub`` to a compatible version range: .. code-block:: bash uv pip install "huggingface-hub>=0.34.0,<1.0" xinference This forces uv to select a ``huggingface-hub`` version that's compatible with modern ``transformers``. **Solution 2: Make transformers a direct dependency** By specifying ``transformers`` explicitly, it becomes a direct dependency and uv will prefer higher versions: .. code-block:: bash uv pip install transformers xinference **Solution 3: Use pip** Or just resort to using ``pip install xinference`` which will resolve to the following versions - ``transformers==4.57.1`` - ``huggingface-hub==0.36.0`` - ``tokenizers==0.22.1`` vLLM + Torch + Xinference Compatibility Issue (Segmentation Fault) =================================================================== Symptom ------- If you have **vLLM < 0.12.0** installed and upgrade xinference (particularly using ``uv pip install -U xinference``), xinference may fail to start with a segmentation fault: .. code-block:: text root@server:/home# xinference-local --host 0.0.0.0 --port 9997 INFO 12-30 17:35:37 [__init__.py:216] Automatically detected platform cuda. Aborted (core dumped) Root Cause ---------- This issue has three contributing factors: 1. **Binary Incompatibility**: vLLM versions before 0.12.0 were compiled against PyTorch 2.8.0. These versions are incompatible with PyTorch 2.9. Reference: `vLLM v0.12.0 Release Notes `_ 2. **Xinference's Unbounded Torch Dependency**: Xinference's ``setup.cfg`` does not specify an upper bound for PyTorch: .. code-block:: ini [options] install_requires = torch # No version constraint! This allows package managers to upgrade PyTorch to incompatible versions. 3. 
**Different Package Manager Behaviors**: - **pip**: Conservative - only upgrades the specified package unless dependencies are incompatible - **uv with -U flag**: Aggressive - re-resolves ALL dependencies and picks latest versions Therefore, if you are not yet ready to upgrade your entire stack and only want to upgrade xinference, use either: - ``pip install -U xinference`` (keeps PyTorch unchanged, only upgrades xinference) - ``uv pip install "xinference==1.16.0"`` (without the -U flag; this also upgrades only xinference) ================================================ FILE: doc/source/getting_started/using_docker_image.rst ================================================ .. _using_docker_image: ======================= Xinference Docker Image ======================= Xinference provides official images on DockerHub. .. versionchanged:: v2.0 Starting from **Xinference v2.0**, to use the CUDA version of the image, the minimum CUDA version must be **CUDA 12.9**. Prerequisites ============= * The image can only run in an environment with GPUs and CUDA installed, because Xinference in the image relies on NVIDIA GPUs for acceleration. * CUDA must be successfully installed on the host machine. You can verify this by checking that the ``nvidia-smi`` command runs successfully. * For CUDA version >= 12.9, the CUDA version in the docker image is ``12.9``, the CUDA version on the host machine should be ``12.9`` or above, and the NVIDIA driver version should be ``575`` or above. * Ensure `NVIDIA Container Toolkit `_ is installed. Docker Image ============ The official image of Xinference is available on DockerHub in the repository ``xprobe/xinference``. Available tags include: * ``nightly-main``: This image is built daily from the `GitHub main branch `_ and generally does not guarantee stability. * ``v``: This image is built each time a Xinference release version is published, and it is typically more stable.
* ``latest``: This image is built with the latest Xinference release version. * For CPU version, add ``-cpu`` suffix, e.g. ``nightly-main-cpu``. Dockerfile for custom build =========================== If you need to build the Xinference image according to your own requirements, the source code for the Dockerfile is located at `xinference/deploy/docker/Dockerfile `_ for reference. Please make sure to be in the top-level directory of Xinference when using this Dockerfile. For example: .. code-block:: bash git clone https://github.com/xorbitsai/inference.git cd inference docker build --progress=plain -t test -f xinference/deploy/docker/Dockerfile . Image usage =========== You can start Xinference in the container like this, simultaneously mapping port 9997 in the container to port 9998 on the host, enabling debug logging, and downloading models from modelscope. .. code-block:: bash docker run -e XINFERENCE_MODEL_SRC=modelscope -p 9998:9997 --gpus all xprobe/xinference:v xinference-local -H 0.0.0.0 --log-level debug .. warning:: * The option ``--gpus`` is essential and cannot be omitted, because as mentioned earlier, the image requires the host machine to have a GPU. Otherwise, errors will occur. * The ``-H 0.0.0.0`` parameter after the ``xinference-local`` command cannot be omitted. Otherwise, the host machine may not be able to access the port inside the container. * You can add multiple ``-e`` options to introduce multiple environment variables. Certainly, if you prefer, you can also manually enter the docker container and start Xinference in any desired way. .. note:: For multiple GPUs, make sure to set the shared memory size, for example: `docker run --shm-size=128g ...` Mount your volume for loading and saving models =============================================== The image does not contain any model files by default, and it downloads the models into the container. 
Typically, you would need to mount a directory on the host machine to the docker container, so that Xinference can download the models onto it, allowing for reuse. In this case, you need to specify a volume when running the Docker image and configure environment variables for Xinference: .. code-block:: bash docker run -v : -e XINFERENCE_HOME= -p 9998:9997 --gpus all xprobe/xinference:v xinference-local -H 0.0.0.0 The principle behind the above command is to mount the specified directory from the host machine into the container, and then set the ``XINFERENCE_HOME`` environment variable to point to that directory inside the container. This way, all downloaded model files will be stored in the directory you specified on the host machine. You don't have to worry about losing them when the Docker container stops, and the next time you run it, you can directly use the existing models without the need for repetitive downloads. If you downloaded the model using the default path on the host machine, and since the xinference cache directory stores the model using symbolic links, you need to mount the directory where the original file is located into the container as well. For example, if you are using HuggingFace and Modelscope as model hub, you would need to mount the corresponding directories into the container. Generally, the cache directories for HuggingFace and Modelscope are located at /.cache/huggingface and /.cache/modelscope. The command would be like: .. code-block:: bash docker run \ -v /.xinference:/root/.xinference \ -v /.cache/huggingface:/root/.cache/huggingface \ -v /.cache/modelscope:/root/.cache/modelscope \ -p 9997:9997 \ --gpus all \ xprobe/xinference:v \ xinference-local -H 0.0.0.0 ================================================ FILE: doc/source/getting_started/using_kubernetes.rst ================================================ .. 
_using_kubernetes: ######################## Xinference on Kubernetes ######################## ************ Helm Support ************ Xinference provides a method for installation in a Kubernetes cluster via ``Helm`` . Prerequisites ============= * You have a fully functional Kubernetes cluster. * Enable GPU support in Kubernetes, refer to `here `_. * ``Helm`` is correctly installed. Steps ===== #. Add xinference helm repo. .. code-block:: bash helm repo add xinference https://xorbitsai.github.io/xinference-helm-charts #. Update xinference helm repo indexes and query versions. .. code-block:: bash helm repo update xinference helm search repo xinference/xinference --devel --versions #. Install .. code-block:: bash helm install xinference xinference/xinference -n xinference --version Customized Installation ======================= The installation method mentioned above sets up a Xinference cluster similar to a single-machine setup, with only one worker and all startup parameters at their default values. However, this is usually not the desired setup. Below are some common custom installation configurations. #. I need to download models from ``ModelScope``. .. code-block:: bash helm install xinference xinference/xinference -n xinference --version --set config.model_src="modelscope" #. I want to use cpu image of xinference (or use any other version of xinference images). .. code-block:: bash helm install xinference xinference/xinference -n xinference --version --set config.xinference_image="" #. I want to have 4 Xinference workers, with each worker managing 4 GPUs. .. code-block:: bash helm install xinference xinference/xinference -n xinference --version --set config.worker_num=4 --set config.gpu_per_worker="4" The above installation method is based on Helm ``--set`` option. For more complex custom installations, such as multiple workers with shared storage, it is highly recommended to use your own ``values.yaml`` file with Helm ``-f`` option for installation. 
The default ``values.yaml`` file is located `here `_. Some examples can be found `here `_. ****************** KubeBlocks Support ****************** You can also install Xinference in Kubernetes using the third-party ``KubeBlocks``. This method is not maintained by Xinference and does not guarantee timely updates or availability. Please refer to the documentation at `here `_. ================================================ FILE: doc/source/getting_started/using_xinference.rst ================================================ .. _using_xinference: ================ Using Xinference ================ Run Xinference Locally ====================== Let's start by running Xinference on a local machine and running a classic LLM model: ``qwen2.5-instruct``. After this quickstart, you will move on to learning how to deploy Xinference in a cluster environment. Start Local Server ------------------ First, please ensure that you have installed Xinference according to the instructions provided :ref:`here `. To start a local instance of Xinference, run the following command: .. tabs:: .. tab:: shell .. code-block:: bash xinference-local --host 0.0.0.0 --port 9997 .. tab:: output .. code-block:: bash INFO Xinference supervisor 0.0.0.0:64570 started INFO Xinference worker 0.0.0.0:64570 started INFO Starting Xinference at endpoint: http://0.0.0.0:9997 INFO Uvicorn running on http://0.0.0.0:9997 (Press CTRL+C to quit) .. note:: By default, Xinference uses ``/.xinference`` as home path to store necessary files such as logs and models, where ```` is the home path of current user. You can change this directory by configuring the environment variable ``XINFERENCE_HOME``. For example: .. code-block:: bash XINFERENCE_HOME=/tmp/xinference xinference-local --host 0.0.0.0 --port 9997 Congrats! You now have Xinference running on your local machine. 
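To confirm from a script that the local server is actually answering, one option is to probe the ``/docs`` page that the server exposes on its HTTP endpoint. This is a minimal sketch using only the standard library; the helper name ``is_server_up`` is hypothetical and not part of the Xinference client, and treating an HTTP 200 from ``/docs`` as "running" is an assumption.

```python
import urllib.error
import urllib.request


def is_server_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the API docs page under base_url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/docs", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


if __name__ == "__main__":
    # Endpoint taken from the xinference-local command shown above.
    print(is_server_up("http://127.0.0.1:9997"))
```

The function returns ``False`` instead of raising, so it can be called in a retry loop while the server is still starting.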
Once Xinference is running, there are multiple ways we can try it: via the web UI, via cURL, via the command line, or via Xinference's Python client. You can visit the web UI at `http://127.0.0.1:9997/ui `_ and visit `http://127.0.0.1:9997/docs `_ to inspect the API docs. You can install the Xinference command line tool and Python client using the following command: .. code-block:: bash pip install xinference The command line tool is ``xinference``. You can list the available commands by running: .. tabs:: .. tab:: shell .. code-block:: bash xinference --help .. tab:: output .. code-block:: bash Usage: xinference [OPTIONS] COMMAND [ARGS]... Options: -v, --version Show the version and exit. --log-level TEXT -H, --host TEXT -p, --port INTEGER --help Show this message and exit. Commands: cached cal-model-mem chat engine generate launch list login register registrations remove-cache stop-cluster terminate unregister vllm-models You can install the Xinference Python client with minimal dependencies using the following command. Please ensure that the version of the client matches the version of the Xinference server. .. code-block:: bash pip install xinference-client==${SERVER_VERSION} .. _about_model_engine: About Model Engine ------------------ Since ``v0.11.0``, before launching an LLM, you need to specify the inference engine you want to use. Currently, xinference supports the following inference engines: * ``vllm`` * ``sglang`` * ``llama.cpp`` * ``transformers`` * ``MLX`` For details about these inference engines, please refer to :ref:`here `. Note that when launching an LLM, the ``model_format`` and ``quantization`` of the model you want to launch are closely related to the inference engine. You can use the ``xinference engine`` command to query the valid parameter combinations for the model you want to launch. This shows under which conditions a model can run on which inference engines. For example: #.
I would like to query about which inference engines the ``qwen-chat`` model can run on, and what are their respective parameters. .. code-block:: bash xinference engine -e --model-name qwen-chat #. I want to run ``qwen-chat`` with ``VLLM`` as the inference engine, but I don't know how to configure the other parameters. .. code-block:: bash xinference engine -e --model-name qwen-chat --model-engine vllm #. I want to launch the ``qwen-chat`` model in the ``GGUF`` format, and I need to know how to configure the remaining parameters. .. code-block:: bash xinference engine -e --model-name qwen-chat -f ggufv2 In summary, compared to previous versions, when launching LLM models, you need to additionally pass the ``model_engine`` parameter. You can retrieve information about the supported inference engines and their related parameter combinations through the ``xinference engine`` command. .. note:: Here are some recommendations on when to use which engine: - **Linux** - When possible, prioritize using **vLLM** or **SGLang** for better performance. - If resources are limited, consider using **llama.cpp**, as it offers more quantization options. - For other cases, consider using **Transformers**, which supports nearly all models. - **Windows** - It is recommended to use **WSL**, and in this case, follow the same choices as Linux. - Otherwise, prefer **llama.cpp**, and for unsupported models, opt for **Transformers**. - **Mac** - If supported by the model, use the **MLX engine**, as it delivers the best performance. - For other cases, prefer **llama.cpp**, and for unsupported models, choose **Transformers**. Run qwen2.5-instruct -------------------- Let's start by running a built-in model: ``qwen2.5-instruct``. When you start a model for the first time, Xinference will download the model parameters from HuggingFace, which might take a few minutes depending on the size of the model weights. 
We cache the model files locally, so there's no need to redownload them for subsequent starts. .. note:: Xinference also allows you to download models from other sites. You can do this by setting an environment variable when launching Xinference. For example, if you want to download models from `modelscope `_, do the following: .. code-block:: bash XINFERENCE_MODEL_SRC=modelscope xinference-local --host 0.0.0.0 --port 9997 We can specify the model's UID using the ``--model-uid`` or ``-u`` flag. If not specified, Xinference will generate a unique ID. The default unique ID will be identical to the model name. .. tabs:: .. code-tab:: bash shell xinference launch --model-engine -n qwen2.5-instruct -s 0_5 -f pytorch .. code-tab:: bash cURL curl -X 'POST' \ 'http://127.0.0.1:9997/v1/models' \ -H 'accept: application/json' \ -H 'Content-Type: application/json' \ -d '{ "model_engine": "", "model_name": "qwen2.5-instruct", "model_format": "pytorch", "size_in_billions": "0_5" }' .. code-tab:: python from xinference.client import RESTfulClient client = RESTfulClient("http://127.0.0.1:9997") model_uid = client.launch_model( model_engine="", model_name="qwen2.5-instruct", model_format="pytorch", size_in_billions="0_5" ) print('Model uid: ' + model_uid) .. code-tab:: bash output Model uid: qwen2.5-instruct .. note:: For some engines, such as vllm, users need to specify the engine-related parameters when running models. In this case, you can directly specify the parameter name and value in the command line, for example: .. code-block:: bash xinference launch --model-engine vllm -n qwen2.5-instruct -s 0_5 -f pytorch --gpu_memory_utilization 0.9 `gpu_memory_utilization=0.9` will pass to vllm when launching model. .. note:: For more tips on model launching, refer to :ref:`launch`. Congrats! You now have ``qwen2.5-instruct`` running by Xinference. Once the model is running, we can try it out either via cURL, or via Xinference's python client: .. tabs:: .. 
code-tab:: bash cURL curl -X 'POST' \ 'http://127.0.0.1:9997/v1/chat/completions' \ -H 'accept: application/json' \ -H 'Content-Type: application/json' \ -d '{ "model": "qwen2.5-instruct", "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "What is the largest animal?" } ] }' .. code-tab:: python from xinference.client import RESTfulClient client = RESTfulClient("http://127.0.0.1:9997") model = client.get_model("qwen2.5-instruct") model.chat( messages=[ {"role": "user", "content": "Who won the world series in 2020?"} ] ) .. code-tab:: json output { "id": "chatcmpl-8d76b65a-bad0-42ef-912d-4a0533d90d61", "model": "qwen2.5-instruct", "object": "chat.completion", "created": 1688919187, "choices": [ { "index": 0, "message": { "role": "assistant", "content": "The largest animal that has been scientifically measured is the blue whale, which has a maximum length of around 23 meters (75 feet) for adult animals and can weigh up to 150,000 pounds (68,000 kg). However, it is important to note that this is just an estimate and that the largest animal known to science may be larger still. Some scientists believe that the largest animals may not have a clear \"size\" in the same way that humans do, as their size can vary depending on the environment and the stage of their life." }, "finish_reason": "None" } ], "usage": { "prompt_tokens": -1, "completion_tokens": -1, "total_tokens": -1 } } Xinference provides OpenAI-compatible APIs for its supported models, so you can use Xinference as a local drop-in replacement for OpenAI APIs. For example: .. 
code-block:: python from openai import OpenAI client = OpenAI(base_url="http://127.0.0.1:9997/v1", api_key="not used actually") response = client.chat.completions.create( model="qwen2.5-instruct", messages=[ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the largest animal?"} ] ) print(response) The following OpenAI APIs are supported: - Chat Completions: `https://platform.openai.com/docs/api-reference/chat `_ - Completions: `https://platform.openai.com/docs/api-reference/completions `_ - Embeddings: `https://platform.openai.com/docs/api-reference/embeddings `_ Xinference also supports Anthropic API via base url ``http://127.0.0.1:9997/anthropic``, you can use Xinference in Claude Code and so forth. Refer to :ref:`anthropic client ` for more details. Manage Models ------------- In addition to launching models, Xinference offers various ways to manage the entire lifecycle of models. You can manage models in Xinference through the command line, cURL, or Xinference's python client. You can list all models of a certain type that are available to launch in Xinference: .. tabs:: .. code-tab:: bash shell xinference registrations -t LLM .. code-tab:: bash cURL curl http://127.0.0.1:9997/v1/model_registrations/LLM .. code-tab:: python from xinference.client import RESTfulClient client = RESTfulClient("http://127.0.0.1:9997") print(client.list_model_registrations(model_type='LLM')) The following command gives you the currently running models in Xinference: .. tabs:: .. code-tab:: bash shell xinference list .. code-tab:: bash cURL curl http://127.0.0.1:9997/v1/models .. code-tab:: python from xinference.client import RESTfulClient client = RESTfulClient("http://127.0.0.1:9997") print(client.list_models()) When you no longer need a model that is currently running, you can remove it in the following way to free up the resources it occupies: .. tabs:: .. 
code-tab:: bash shell xinference terminate --model-uid "qwen2.5-instruct" .. code-tab:: bash cURL curl -X DELETE http://127.0.0.1:9997/v1/models/qwen2.5-instruct .. code-tab:: python from xinference.client import RESTfulClient client = RESTfulClient("http://127.0.0.1:9997") client.terminate_model(model_uid="qwen2.5-instruct") .. _distributed_getting_started: Deploy Xinference In a Cluster ============================== To deploy Xinference in a cluster, you need to start a Xinference supervisor on one server and Xinference workers on the other servers. First, make sure you have already installed Xinference on each of the servers according to the instructions provided :ref:`here `. Then follow the steps below: Start the Supervisor -------------------- On the server where you want to run the Xinference supervisor, run the following command: .. code-block:: bash xinference-supervisor -H "${supervisor_host}" Replace ``${supervisor_host}`` with the actual host of your supervisor server. You can visit the supervisor's web UI at `http://${supervisor_host}:9997/ui `_ and visit `http://${supervisor_host}:9997/docs `_ to inspect the API docs. Start the Workers ----------------- On each of the other servers where you want to run Xinference workers, run the following command: .. code-block:: bash xinference-worker -e "http://${supervisor_host}:9997" -H "${worker_host}" .. note:: You must replace ``${worker_host}`` with the actual host of your worker server. .. note:: If you need to interact with Xinference in a cluster via the command line, include the ``-e`` or ``--endpoint`` flag to specify the supervisor server's endpoint. For example: .. code-block:: bash xinference launch -n qwen2.5-instruct -s 0_5 -f pytorch -e "http://${supervisor_host}:9997" Using Xinference With Docker ============================= To start Xinference in a Docker container, run the following command: Run On Nvidia GPU Host ----------------------- For CUDA 12.4: .. 
code-block:: bash docker run -e XINFERENCE_MODEL_SRC=modelscope -p 9998:9997 --gpus all xprobe/xinference: xinference-local -H 0.0.0.0 --log-level debug For CUDA 12.8: .. versionadded:: v1.8.1 The CUDA 12.8 version is experimental; feedback is welcome to help us improve it. .. versionchanged:: v1.16.0 The CUDA 12.8 version was removed in v1.16.0. .. code-block:: bash docker run -e XINFERENCE_MODEL_SRC=modelscope -p 9998:9997 --gpus all xprobe/xinference:-cu128 xinference-local -H 0.0.0.0 --log-level debug For CUDA 12.9: .. versionadded:: v1.16.0 CUDA 12.9 will become the default version when Xinference v2.0.0 is released. .. code-block:: bash docker run -e XINFERENCE_MODEL_SRC=modelscope -p 9998:9997 --gpus all xprobe/xinference:-cu129 xinference-local -H 0.0.0.0 --log-level debug Run On CPU Only Host ----------------------- .. code-block:: bash docker run -e XINFERENCE_MODEL_SRC=modelscope -p 9998:9997 xprobe/xinference:-cpu xinference-local -H 0.0.0.0 --log-level debug Replace ```` with a Xinference version, e.g. ``v0.10.3``; ``latest`` can be used for the latest version. For more on Docker usage, refer to :ref:`Using Docker Image `. What's Next? ============ Congratulations on getting started with Xinference! To help you navigate and make the most out of this powerful tool, here are some resources and guides: * :ref:`How to Use Client APIs for Different Types of Models ` * :ref:`Choosing the Right Backends for Your Needs ` ================================================ FILE: doc/source/index.rst ================================================ .. _index: ====================== Welcome to Xinference! ====================== .. toctree:: :maxdepth: 2 :hidden: getting_started/index models/index user_guide/index examples/index reference/index development/index Xorbits Inference (Xinference) is an open-source platform to streamline the operation and integration of a wide array of AI models.
With Xinference, you're empowered to run inference using any open-source LLMs, embedding models, and multimodal models either in the cloud or on your own premises, and create robust AI-driven applications. Developing Real-world AI Applications with Xinference ----------------------------------------------------- .. tabs:: .. code-tab:: python LLM from xinference.client import Client client = Client("http://localhost:9997") model = client.get_model("MODEL_UID") # Chat to LLM model.chat( messages=[{"role": "system", "content": "You are a helpful assistant"}, {"role": "user", "content": "What is the largest animal?"}], generate_config={"max_tokens": 1024} ) # Chat to VL model model.chat( messages=[ { "role": "user", "content": [ {"type": "text", "text": "What’s in this image?"}, { "type": "image_url", "image_url": { "url": "http://i.epochtimes.com/assets/uploads/2020/07/shutterstock_675595789-600x400.jpg", }, }, ], } ], generate_config={"max_tokens": 1024} ) .. code-tab:: python Embedding from xinference.client import Client client = Client("http://localhost:9997") model = client.get_model("MODEL_UID") model.create_embedding("What is the capital of China?") .. code-tab:: python Image from xinference.client import Client client = Client("http://localhost:9997") model = client.get_model("MODEL_UID") model.text_to_image("An astronaut walking on the mars") .. code-tab:: python Audio from xinference.client import Client client = Client("http://localhost:9997") model = client.get_model("MODEL_UID") with open("speech.mp3", "rb") as audio_file: model.transcriptions(audio_file.read()) .. code-tab:: python Rerank from xinference.client import Client client = Client("http://localhost:9997") model = client.get_model("MODEL_UID") query = "A man is eating pasta." corpus = [ "A man is eating food.", "A man is eating a piece of bread.", "The girl is carrying a baby.", "A man is riding a horse.", "A woman is playing violin." ] print(model.rerank(corpus, query)) .. 
code-tab:: python Video

      from xinference.client import Client

      client = Client("http://localhost:9997")
      model = client.get_model("MODEL_UID")
      model.text_to_video("")

Getting Started
---------------

.. grid:: 2

   .. grid-item-card:: Install Xinference
      :link: installation
      :link-type: ref

      Install Xinference on Linux, Windows, and macOS.

   .. grid-item-card:: Try it out!
      :link: using_xinference
      :link-type: ref

      Start by running Xinference on a local machine.

.. grid:: 2

   .. grid-item-card:: Explore models
      :link: models_builtin_index
      :link-type: ref

      Explore a wide range of models supported by Xinference.

   .. grid-item-card:: Register your own model
      :link: models_custom
      :link-type: ref

      Register model weights and turn them into an API.

Explore the API
---------------

.. grid:: 2

   .. grid-item-card:: Chat & Generate
      :link: chat
      :link-type: ref

      Learn how to chat with LLMs in Xinference.

   .. grid-item-card:: Tools
      :link: tools
      :link-type: ref

      Learn how to connect LLMs with external tools.

.. grid:: 2

   .. grid-item-card:: Embeddings
      :link: embed
      :link-type: ref

      Learn how to create text embeddings in Xinference.

   .. grid-item-card:: Rerank
      :link: rerank
      :link-type: ref

      Learn how to use rerank models in Xinference.

.. grid:: 2

   .. grid-item-card:: Images
      :link: image
      :link-type: ref

      Learn how to generate images with Xinference.

   .. grid-item-card:: Multimodal
      :link: multimodal
      :link-type: ref

      Learn how to process images and audio with LLMs.

.. grid:: 2

   .. grid-item-card:: Audio
      :link: audio
      :link-type: ref

      Learn how to turn audio into text or text into audio with Xinference.

   .. grid-item-card:: Video
      :link: video
      :link-type: ref

      Learn how to generate video with Xinference.

.. grid:: 2

   .. grid-item-card:: Flexible
      :link: flexible
      :link-type: ref

      Learn how to run inference on traditional ML models with Xinference.

Getting Involved
----------------

.. grid::
   :gutter: 1

   .. grid-item::

      .. div:: sd-font-weight-normal sd-fs-5

         Get Latest News

      .. grid:: 1
         :gutter: 3

         ..
grid-item-card:: :link: https://twitter.com/Xorbitsio :fab:`twitter` Follow us on Twitter .. grid-item-card:: :link: https://zhihu.com/org/xorbits :fab:`zhihu` Read our blogs .. grid-item:: .. div:: sd-font-weight-normal sd-fs-5 Get Support .. grid:: 1 :gutter: 3 .. grid-item-card:: :link: https://xinference.cn/images/WeCom.jpg :fab:`weixin` Find community on WeChat .. grid-item-card:: :link: https://discord.gg/Xw9tszSkr5 :fab:`discord` Find community on Discord .. grid-item-card:: :link: https://github.com/xorbitsai/inference/issues/new/choose :fab:`github` Open an issue .. grid-item:: .. div:: sd-fs-5 Contribute to Xinference .. grid:: 1 :gutter: 3 .. grid-item-card:: :link: https://github.com/xorbitsai/inference/pulls :fab:`github` Create a pull request ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/development/contributing_codebase.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-03-07 15:03+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.14.0\n" #: ../../source/development/contributing_codebase.rst:3 msgid "Contributing to the code base" msgstr "代码库开发指南" #: ../../source/development/contributing_codebase.rst:6 msgid "Table of contents:" msgstr "目录" #: ../../source/development/contributing_codebase.rst:9 msgid "Code standards" msgstr "代码规范" #: ../../source/development/contributing_codebase.rst:11 msgid "" "Writing good code is not just about what you write. It is also about " "*how* you write it. 
During Continuous Integration testing, several tools " "will be run to check your code for stylistic errors. Good style is a " "requirement for submitting code to Xinference." msgstr "" "写出好的代码不仅在于你写了什么,更在于你是如何写的。在持续集成测试期间,会有多个工具来检查您的代码是否存在风格错误。良好的编程风格是提交代码到 " "Xinference 的要求之一。" #: ../../source/development/contributing_codebase.rst:15 msgid "" "In addition, it is important that we do not make sudden changes to the " "code that could have the potential to break a lot of user code as a " "result. Therefore we need it to be as backwards compatible as possible to" " avoid mass breakages." msgstr "此外,不要对代码进行突然的更改,这可能会导致大量用户代码出现问题。所以,我们需要尽可能地保持向后兼容,以避免大规模的故障。" #: ../../source/development/contributing_codebase.rst:20 msgid "Autofixing formatting errors" msgstr "自动修复格式错误" #: ../../source/development/contributing_codebase.rst:22 msgid "" "Moreover, Continuous Integration will run code formatting checks like " "``black``, ``flake8``, ``isort``, and others using `pre-commit hooks " "`_ Any warnings generated by these checks will " "cause the Continuous Integration to fail. Therefore, it is advisable to " "run the check yourself before submitting code. This can be done by " "installing ``pre-commit``::" msgstr "" "此外,持续集成将使用 `pre-commit hooks `_ 运行诸如 ``black``、``flake8``、``isort`` " "等代码格式检查工具。任何由这些检查生成的警告都将导致持续集成失败。因此,建议在提交代码之前自行运行这些检查。" "可以通过在 Xinference 仓库的根目录下安装 ``pre-commit`` 来完成这一操作:" #: ../../source/development/contributing_codebase.rst:30 msgid "and then running::" msgstr "然后执行命令:" #: ../../source/development/contributing_codebase.rst:34 msgid "" "from the root of the Xinference repository. This setup ensures that all " "styling checks are automatically executed each time you commit changes " "without your needing to run each one manually. In addition, using ``pre-" "commit`` will also allow you to more easily remain up-to-date with our " "code checks as they change." 
msgstr "" "安装好了以后就能确保每次提交更改时都会自动执行所有样式检查,无需手动逐个运行。" "此外,使用 ``pre-commit`` 也能让您更轻松地在我们的代码检查发生更改的时候保持同步。" #: ../../source/development/contributing_codebase.rst:39 msgid "" "Note that if needed, you can skip these checks with ``git commit --no-" "verify``." msgstr "请注意,如果需要,您可以通过使用 ``git commit --no-verify`` 命令来跳过这些检查。" #: ../../source/development/contributing_codebase.rst:41 msgid "" "If you don't want to use ``pre-commit`` as part of your workflow, you can" " still use it to run its checks with::" msgstr "如果您不想将 ``pre-commit`` 作为工作流程的一部分,仍然可以运行如下命令来使用它进行检查:" #: ../../source/development/contributing_codebase.rst:46 #: ../../source/development/contributing_codebase.rst:52 msgid "without needing to have done ``pre-commit install`` beforehand." msgstr "而不需要事先执行 ``pre-commit install``。" #: ../../source/development/contributing_codebase.rst:48 msgid "" "If you want to run checks on all recently committed files on " "upstream/main you can use::" msgstr "如果您想在所有最近提交的文件上运行检查,您可以使用以下命令:" #: ../../source/development/contributing_codebase.rst:56 msgid "" "You may consider periodically running ``pre-commit gc`` to clean up repos" " which are no longer used." msgstr "您可以考虑定期运行 ``pre-commit gc`` 命令来清理不再使用的存储库。" #: ../../source/development/contributing_codebase.rst:61 msgid "" "If you have conflicting installations of ``virtualenv``, if could lead to" " errors - refer to `here " "`_." msgstr "如果您安装了冲突的 ``virtualenv`` 版本,可能会导致错误 - 可以参考" " `这里 `_ 。" #: ../../source/development/contributing_codebase.rst:64 msgid "" "Also, due to a `bug in virtualenv " "`_, you may run into " "issues if you're using conda. To solve this, you can downgrade " "``virtualenv`` to version ``20.0.33``." 
msgstr "" "此外,由于 ``virtualenv`` 中的一个 `错误 `_ ,如果您使用 conda,可能会遇到问题。" "要解决这个问题,您可以将 ``virtualenv`` 降级到版本 ``20.0.33``。" #: ../../source/development/contributing_codebase.rst:69 msgid "Backwards compatibility" msgstr "向后兼容" #: ../../source/development/contributing_codebase.rst:71 msgid "" "Please try to maintain backward compatibility. If you think breakage is " "necessary, clearly state why as part of the pull request. Also, be " "careful when changing method signatures and add deprecation warnings " "where needed. Also, add the deprecated sphinx directive to the deprecated" " functions or methods." msgstr "" "请尽量保持向后兼容性。如果您认为必须进行更改,请在拉取请求中说明清楚原因。同时,在更改方法签名时要小心,并在需要时添加弃用警告。此外,为弃用的函数或方法添加弃用的" " sphinx 指令。" #: ../../source/development/contributing_codebase.rst:76 msgid "You'll also need to" msgstr "同时你还需要" #: ../../source/development/contributing_codebase.rst:78 msgid "" "Write a new test that asserts a warning is issued when calling with the " "deprecated argument" msgstr "编写一个新的测试样例,在调用带有弃用参数时会发出警告。" #: ../../source/development/contributing_codebase.rst:79 msgid "Update all of Xinference existing tests and code to use the new argument" msgstr "更新所有 Xinference 现有的测试样例和代码,以使用新的参数。" #: ../../source/development/contributing_codebase.rst:82 msgid "Type hints" msgstr "类型提示" #: ../../source/development/contributing_codebase.rst:84 msgid "" "Xinference strongly encourages the use of :pep:`484` style type hints. " "New development should contain type hints and pull requests to annotate " "existing code are accepted as well!" msgstr "Xinference 强烈鼓励使用 :pep:`484` 风格的类型提示。新的开发应包含类型提示,并且对现有代码进行注释的拉取请求也是可以接受的!" #: ../../source/development/contributing_codebase.rst:88 msgid "Test-driven development" msgstr "测试驱动开发" #: ../../source/development/contributing_codebase.rst:90 msgid "" "Xinference is serious about testing and strongly encourages contributors " "to embrace `test-driven development (TDD) `_. 
This development process \"relies on the " "repetition of a very short development cycle: first the developer writes " "an (initially failing) automated test case that defines a desired " "improvement or new function, then produces the minimum amount of code to " "pass that test.\" So, before actually writing any code, you should write " "your tests. Often the test can be taken from the original GitHub issue. " "However, it is always worth considering additional use cases and writing " "corresponding tests." msgstr "" "Xinference 非常重视测试,并强烈鼓励贡献者采用 `测试驱动开发(TDD) `_ 。这种开发过程 " "\"依赖于非常短的开发周期的重复:首先,开发者编写一个(初始为失败的)自动化测试样例来定义所需的改进或新功能,然后用最少的代码来通过该测试。\"因此,在实际编写任何代码之前,您应该编写您的测试样例。通常,测试样例可以从原始的" " GitHub issue 中获取。然而,值得考虑额外的情况并编写相应的测试样例。" #: ../../source/development/contributing_codebase.rst:99 msgid "" "Adding tests is frequently requested after code is pushed to Xinference. " "Thus, it is worth getting in the habit of writing tests ahead of time so " "this is never an issue." msgstr "在将代码推送到 Xinference 之后,经常会要求添加测试样例。因此,养成提前编写测试样例的习惯非常重要,这样就不会出现问题。" #~ msgid "Pre-commit" #~ msgstr "Pre-commit" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/development/contributing_environment.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2025-08-02 23:15+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.14.0\n" #: ../../source/development/contributing_environment.rst:3 msgid "Creating a development environment" msgstr "创建开发环境" #: ../../source/development/contributing_environment.rst:6 msgid "Table of contents:" msgstr "目录" #: ../../source/development/contributing_environment.rst:8 msgid "" "Before proceeding with any code modifications, it's essential to set up " "the necessary environment for Xinference development, which includes " "familiarizing yourself with Git usage, establishing an isolated " "environment, installing Xinference, and compiling the frontend." msgstr "" "在进行任何代码修改之前,建立起适用于 Xinference 开发的必要环境至关重要。" "包括熟悉 Git 的使用、建立一个独立的环境、安装 Xinference 以及前端部分的" "编译。" #: ../../source/development/contributing_environment.rst:12 msgid "Getting started with Git" msgstr "Git 的使用" #: ../../source/development/contributing_environment.rst:14 msgid "" "Now that you have identified an issue you wish to resolve, an enhancement" " to incorporate, or documentation to enhance, it's crucial to acquaint " "yourself with GitHub and the Xinference codebase." msgstr "" "当你有一个需要修复的问题、需要添加的增强功能或需要改进的文档时,熟悉 " "GitHub 和 Xinference 代码库很重要。" #: ../../source/development/contributing_environment.rst:17 msgid "" "To the new user, working with Git is one of the more intimidating aspects" " of contributing to Xinference. It can very quickly become overwhelming, " "but sticking to the guidelines below will help simplify the process and " "minimize potential issues. As always, if you are having difficulties " "please feel free to ask for help." 
msgstr "" "对新用户来说,使用 Git 是参与 Xinference 开发最令人畏惧的方面之一。很快就" "会感到压力山大,但以下指南将有助于简化流程并减少潜在问题。如果您遇到" "难以解决的问题,欢迎在社区寻求帮助。" #: ../../source/development/contributing_environment.rst:22 msgid "" "The code is hosted on `GitHub `_." " To contribute you will need to sign up for a `free GitHub account " "`_. We use `Git `_ " "for version control to allow many people to work together on the project." msgstr "" "Xinference 的代码托管在 `GitHub `" "_ 。要参与 Xinference 代码贡献,你需要注册一个 `免费的 GitHub 账户 `_ 。我们使用 `Git `_ " "进行版本控制,以便大家共同参与项目的开发。" #: ../../source/development/contributing_environment.rst:27 msgid "" "`GitHub has instructions `__" " for installing git, setting up your SSH key, and configuring git. All " "these steps need to be completed before you can work seamlessly between " "your local repository and GitHub." msgstr "" "你可以参考 `GitHub 指南 `_ " "来安装 git,设置 SSH 密钥以及配置 git。你需要完成这些步骤以确保你的本地" "仓库和 GitHub 可以正常工作,后续的工作才可以顺利进行。" #: ../../source/development/contributing_environment.rst:31 msgid "Some great resources for learning Git:" msgstr "以下是一些很好的学习 Git 的资源:" #: ../../source/development/contributing_environment.rst:33 msgid "`Official Git Documentation `_" msgstr "`Git 官方文档 `_" #: ../../source/development/contributing_environment.rst:34 msgid "`Pro Git Book `_" msgstr "`Pro Git 书籍 `_" #: ../../source/development/contributing_environment.rst:35 msgid "`Git Tutorial by Atlassian `_" msgstr "`Atlassian 提供的 Git 教程 `_" #: ../../source/development/contributing_environment.rst:36 msgid "" "`Git - Concise Guide `_" msgstr "`Git-简明指南 `_" #: ../../source/development/contributing_environment.rst:39 msgid "" "If the speed of ``git clone`` is slow, you can use the following command " "to add a proxy:" msgstr "如果在 ``git clone`` 代码的时候速度较慢,可以通过如下命令添加代理" #: ../../source/development/contributing_environment.rst:47 msgid "Creating an isolated environment" msgstr "创建一个隔离环境" #: ../../source/development/contributing_environment.rst:49 msgid "" "Before formally installing Xinference, it's recommended to create 
an " "isolated environment, using Conda recommended, for ease of subsequent " "operations." msgstr "在正式安装Xinference之前,建议使用 Conda 创建一个隔离环境方便后续操作。" #: ../../source/development/contributing_environment.rst:57 msgid "``xinf`` can be replaced with a custom Conda environment name." msgstr "``xinf`` 可替换为自定义的 Conda 环境名。" #: ../../source/development/contributing_environment.rst:59 msgid "" "Afterward, you'll need to install Python and Node.js (npm) in the newly " "created Conda environment. Here are the commands:" msgstr "随后需要在新建的 Conda 环境中安装 Python 以及 Node.js (npm)。命令如下:" #: ../../source/development/contributing_environment.rst:68 msgid "Install from source code" msgstr "从源码安装" #: ../../source/development/contributing_environment.rst:70 msgid "" "Before we begin, please make sure that you have cloned the repository. " "Suppose you clone the repository as ``inference`` directory, ``cd`` to " "this directory where the ``setup.cfg`` and ``setup.py`` files are " "located, and run the following command:" msgstr "" "在开始之前,请确保您已经克隆了存储库。假设您将存储库克隆到名为 ``" "inference`` 的目录中,请进入该目录,其中包含 ``setup.cfg`` 和 ``setup.py`" "` 文件,并执行以下命令:" #: ../../source/development/contributing_environment.rst:79 msgid "" "If the commands run successfully, you can use Xinference normally. For " "detailed usage instructions, refer to `using_xinference " "`__." msgstr "" "如果命令能够成功运行,接下来就能正常使用 Xinference 了,使用教程详情见 `" "使用 `__。" #: ../../source/development/contributing_environment.rst:83 msgid "" "If errors occur or the process freezes during execution, the next step is" " to compile the frontend." msgstr "如果出现报错或者在运行过程中卡死,那就需要进行下一步前端编译。" #: ../../source/development/contributing_environment.rst:87 msgid "Frontend Compilation" msgstr "前端编译" #: ../../source/development/contributing_environment.rst:89 msgid "" "Navigate to the ``inference/xinference/ui/web/ui`` directory. 
Then, " "execute the following command to clear the cache:" msgstr "" "首先需要进入 ``inference/xinference/ui/web/ui`` 目录下,随后执行如下命令清除" "缓存:" #: ../../source/development/contributing_environment.rst:96 msgid "" "If the command fails to execute, you can try adding the ``--force`` " "option." msgstr "如果命令执行失败,您可以尝试添加 ``--force`` 选项" #: ../../source/development/contributing_environment.rst:99 msgid "" "If the ``node_modules`` folder already exists in this directory, it's " "recommended to manually delete it before cleaning the cache." msgstr "如果该目录下已经存在 ``node_modules`` 文件夹的话建议先手动删除该文件夹" #: ../../source/development/contributing_environment.rst:102 msgid "" "Next, execute the following command in this directory to compile the " "frontend:" msgstr "接着在该目录下执行以下命令进行前端编译:" #: ../../source/development/contributing_environment.rst:110 msgid "" "Still, if the first command fails to execute, you can try adding the " "``--force`` option." msgstr "如果第一个命令执行失败,您仍然可以尝试通过添加 ``--force`` 选项解决" #: ../../source/development/contributing_environment.rst:112 msgid "" "After compiling the frontend, you can ``cd`` back to the directory where " "the ``setup.cfg`` and ``setup.py`` files are located, and install " "Xinference via ``pip install -e .``." msgstr "" "编译完前端后,您可以返回到包含 ``setup.cfg`` 和 ``setup.py`` 文件的目录," "然后通过 ``pip install -e .`` 安装 Xinference。" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/development/index.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-03-06 12:05+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.14.0\n" #: ../../source/development/index.rst:5 msgid "Development" msgstr "开发指南" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/development/xinference_internals.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-05-31 11:46+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.14.0\n" #: ../../source/development/xinference_internals.rst:3 msgid "The internals of Xinference" msgstr "Xinference 的内部结构" #: ../../source/development/xinference_internals.rst:6 msgid "Table of contents:" msgstr "目录" #: ../../source/development/xinference_internals.rst:9 msgid "Overview" msgstr "概述" #: ../../source/development/xinference_internals.rst:10 msgid "" "Xinference leverages `Xoscar `_, an " "actor programming framework we designed, as its core component to manage " "machines, devices, and model inference processes. Each actor serves as a " "basic unit for model inference and various inference backends can be " "integrate into the actor, enabling us to support multiple inference " "engines and hardware. 
These actors are hosted and scheduled within actor " "pools, which are designed to be asynchronous and non-blocking and " "function as resource pools." msgstr "" "Xinference 利用我们设计的 actor 编程框架 `Xoscar `_ 作为其核心组件,以管理机器、设备和模型推理进程。每个 " "actor 都是模型推理的基本单元,各种推理后端可以集成到 actor 中,从而使我们" "能够支持多种推理引擎和硬件。这些 actor 在 actor 池中托管和调度,actor 池" "具有资源池的功能,actor 的设计是异步和非阻塞的。" #: ../../source/development/xinference_internals.rst:22 msgid "" "Both supervisor and worker are actor instances. Initially, an actor pool," " serving as a resource pool, needs to be created on each server; and each" " actor can utilize a CPU core or a GPU device. Each server has its own " "address (IP address or hostname), so actors on different computing nodes " "can communicate with each other through these addresses. See `Actor`_ for" " more information." msgstr "" "supervisor 和 worker 都是 actor 实例。需要先在每台服务器上创建一个作为" "资源池的 actor 池;每个 actor 可以使用一个 CPU 内核或一块 GPU 设备。每台" "服务器都有自己的地址(IP 地址或主机名),因此不同计算节点上的 actor 可以" "通过这些地址相互通信。更多信息,请参阅 `Actor`_。" #: ../../source/development/xinference_internals.rst:27 msgid "RESTful API" msgstr "RESTful API" #: ../../source/development/xinference_internals.rst:28 msgid "" "The RESTful API is implemented using `FastAPI " "`_, as specified in " "`api/restful_api.py " "`_." msgstr "" "RESTful API 是利用 `FastAPI `_ 实现" "的,具体代码在 `api/restful_api.py `_。" #: ../../source/development/xinference_internals.rst:35 msgid "" "This is an example of the API ``/status``, it's corresponding function is" " ``get_status``. You can add connection between RESTful API and the " "backend function you want in `api/restful_api.py " "`_."
msgstr "" "这是一个 API 的示例,API ``/status`` 对应函数 ``get_status``。您可以在 `" "api/restful_api.py `_ 中添加 RESTful API 和对应后端函数之间的" "关系。" #: ../../source/development/xinference_internals.rst:39 msgid "Command Line" msgstr "命令行" #: ../../source/development/xinference_internals.rst:40 msgid "" "The Command Line is implemented using `Click " "`_, as specified in " "`deploy/cmdline.py " "`_," " allowing users to interact with the Xinference deployment features " "directly from the terminal." msgstr "" "命令行是通过 `Click `_ 实现的,具体" "代码在 `deploy/cmdline.py `_,命令行允许用户直接在终端与 " "Xinference 进行交互。" #: ../../source/development/xinference_internals.rst:45 msgid "Entry Points" msgstr "入口点" #: ../../source/development/xinference_internals.rst:46 msgid "Take the command-lines we implemented as examples:" msgstr "以我们实现的命令行为例:" #: ../../source/development/xinference_internals.rst:48 msgid "" "``xinference``: Provides commands for model management, including " "registering/unregistering models, listing all registered/running models, " "and launching or terminating specific models. It also features " "interactive commands like generate and chat for testing and interacting " "with deployed models in real-time." msgstr "" "``xinference``:提供命令用于模型管理,包括注册/取消注册模型、列出所有已" "注册/运行的模型,以及启动或终止特定模型。它还提供生成语言和聊天等交互式" "命令,用于测试或交互已部署的模型。" #: ../../source/development/xinference_internals.rst:52 msgid "``xinference-local``: Starts a local Xinference service." msgstr "``xinference-local``:启动一个本地 Xinference 服务。" #: ../../source/development/xinference_internals.rst:54 msgid "" "``xinference-supervisor``: Initiates a supervisor process that manages " "and monitors worker actors within a distributed setup." msgstr "" "``xinference-supervisor``:启动 supervisor 进程,在分布式环境中管理和监控" " worker actors。" #: ../../source/development/xinference_internals.rst:56 msgid "" "``xinference-worker``: Starts a worker process that executes tasks " "assigned by the supervisor, utilizing available computational resources " "effectively." 
msgstr "" "``xinference-worker``:启动 worker 进程,利用可用计算资源,执行 " "supervisor 分配的任务。" #: ../../source/development/xinference_internals.rst:59 msgid "" "Each command is equipped with ``options`` and ``flags`` to customize its " "behavior, such as specifying log levels, host addresses, port numbers, " "and other relevant settings." msgstr "" "每条命令都配有 ``option`` 和 ``flag``,可自定义其行为,如指定日志级别、" "主机地址、端口号和其他相关设置。" #: ../../source/development/xinference_internals.rst:62 msgid "" "Python projects define command-line console entry points in `setup.cfg` " "or `setup.py`." msgstr "Python 项目会在 `setup.cfg` 或 `setup.py` 中定义命令行控制台入口点。" #: ../../source/development/xinference_internals.rst:72 msgid "" "The command-line ``xinference`` can be referred to code in " "``xinference.deploy.cmdline:cli``." msgstr "命令行 ``xinference`` 可参考 ``xinference.deploy.cmdline:cli`` 中的代码。" #: ../../source/development/xinference_internals.rst:75 msgid "Click" msgstr "Click" #: ../../source/development/xinference_internals.rst:76 msgid "We use Click to implement a specific command-line:" msgstr "我们使用 Click 来实现特定的命令行:" #: ../../source/development/xinference_internals.rst:95 msgid "" "For example, the ``xinference-local`` command allows you to define the " "host address and port." msgstr "例如,``xinference-local`` 命令允许您定义主机地址和端口。" #: ../../source/development/xinference_internals.rst:98 msgid "Actor" msgstr "Actor" #: ../../source/development/xinference_internals.rst:99 msgid "" "Xinference is fundamentally based on `Xoscar " "`_, our actor framework, which can " "manage computational resources and Python processes to support scalable " "and concurrent programming. The following is a pseudocode demonstrating " "how our Worker Actor works, the actual Worker Actor is more complex than " "this." 
msgstr "" "Xinference 以 `Xoscar `_ 为基础," "Xoscar 是我们的 actor 框架,可以管理计算资源和 Python 进程,支持可扩展的" "并发编程。下面的伪代码演示了 Worker Actor 的工作原理,实际的 Worker Actor" " 要比这个复杂得多。" #: ../../source/development/xinference_internals.rst:126 msgid "" "We use the ``WorkerActor`` as an example to illustrate how we build the " "Xinference. Each actor class is a standard Python class that inherits " "from ``xoscar.Actor``. An instance of this class is a specific actor " "within the actor pool." msgstr "" "我们以 ``WorkerActor`` 为例,说明如何构建 Xinference。每个 actor 类都是" "继承自 ``xoscar.Actor`` 的标准 Python 类。该类的实例就是 actor 池中的一个" "特定的 actor。" #: ../../source/development/xinference_internals.rst:130 msgid "" "**Define Actor Actions**: Each actor needs to define certain actions or " "behaviors to accomplish specific tasks. For instance, the model inference" " ``WorkerActor`` needs to launch the model (``launch_model``), list the " "models in this actor (``list_models``), terminate a model " "(``terminate_model``). There are two special methods worth noting. The " "``__post_create__`` is invoked before the actor is created, allowing for " "necessary initializations. The ``__pre_destroy__`` is called after the " "actor is destroyed, allowing for cleanup or finalization tasks." msgstr "" "**定义 Actor 的行为**:每个 actor 都需要定义某些动作或行为来完成特定任务" "。例如,模型推理 ``WorkerActor`` 需要启动模型(``launch_model``)、列出该" " actor 中的模型(``list_models``)、终止模型(``terminate_model``)。有" "两个特殊方法值得注意。``__post_create__`` 在创建 actor 之前调用,进行必要" "的初始化。而 ``__pre_destroy__`` 会在 actor 被销毁后调用,执行清理任务。" #: ../../source/development/xinference_internals.rst:136 msgid "" "**Reference Actor and Invoke Methods**: When an actor is created, it " "yields a reference variable so that other actors can reference it. The " "actor reference can also be referenced with the address. Suppose the " "``WorkerActor`` is created and the reference variable is ``worker_ref``," " the ``launch_model`` method of this actor class can be invoked by " "calling ``worker_ref.launch_model()``.
Even if the actor's method is " "originally a synchronized method, when called with an actor reference, it" " will become as an asynchronous method." msgstr "" "**引用 Actor 和调用方法**:当创建一个 Actor 时,它会产生一个引用变量," "以便其他 Actor 可以引用它。Actor 也可以用 IP 地址来引用。假设创建了 ``" "WorkerActor``,且引用变量为 ``worker_ref``,那么就可以通过调用 ``worker_" "ref.launch_model()`` 来调用该 Actor 类的 ``launch_model``。即使 actor 中" "的方法原来是一个传统的阻塞式的方法,当我们使用引用变量调用这个方法时,它" "也变成了一个异步方法。" #: ../../source/development/xinference_internals.rst:143 msgid "" "**Inference Engine**: The actor can manage the process, and the inference" " engine is also a process. In the launch model part of the " "``WorkerActor``, we can initialize different inference engines according " "to the user's need. Therefore, Xinference can support multiple inference " "engines and can easily adapt to new inference engines in the future." msgstr "" "**推理引擎**:Actor 可以管理进程,而推理引擎也是一种进程。在 ``" "WorkerActor`` 的启动模型部分,我们可以根据用户的需要初始化不同的推理引擎" "。因此,Xinference 可以支持多种推理引擎,并能轻松适应未来的新推理引擎。" #: ../../source/development/xinference_internals.rst:148 msgid "" "See `Xoscar document `_ for more actor use cases." msgstr "" "请参阅 `Xoscar 文档 `_ 了解更多 Actor 用例。" #: ../../source/development/xinference_internals.rst:151 msgid "Asynchronous Programming" msgstr "异步编程" #: ../../source/development/xinference_internals.rst:153 msgid "" "Both Xinference and Xoscar highly utilize asynchronous programming of " "``asyncio``. Asynchronous programming is a programming paradigm that does" " not block. Instead, requests and function calls are issued and executed " "in the background and results are returned in the future. This enables us" " to perform activities concurrently." 
msgstr "" "Xinference 和 Xoscar 非常依赖异步编程库 ``asyncio``。异步编程是一种非阻塞" "的编程范式。相比于传统的阻塞式的函数调用,异步编程中的请求或函数调用在" "后台执行,运行结果在未来某个时刻返回。异步编程的优势是使得可以同时并发" "进行很多不同的活动或任务。" #: ../../source/development/xinference_internals.rst:159 msgid "" "If you're not familiar with Pythons's ``asyncio``, you can see more " "tutorials for help:" msgstr "如果您不熟悉 Python 的 ``asyncio``,可以查看更多教程以获得帮助:" #: ../../source/development/xinference_internals.rst:161 msgid "" "`Python Asyncio Tutorial `__" msgstr "" "`Python Asyncio 教程 `__" #: ../../source/development/xinference_internals.rst:163 msgid "" "`Real Python's asyncio Tutorial `__" msgstr "`Real Python asyncio 教程 `__" #: ../../source/development/xinference_internals.rst:165 msgid "" "`Python Official Documentation " "`__" msgstr "`Python 官方文档 `__" #: ../../source/development/xinference_internals.rst:169 msgid "Model" msgstr "模型" #: ../../source/development/xinference_internals.rst:171 msgid "" "Xinference supports different types of models including large language " "models (LLMs), image models, audio models, embedding models, etc. All " "models are implemented in `model/ " "`_." msgstr "" "Xinference 支持不同类型的模型,包括大型语言模型(LLM)、图像模型、音频" "模型、嵌入模型等。所有模型在 `model/ `_ 文件夹下实现。" #: ../../source/development/xinference_internals.rst:175 msgid "LLM" msgstr "" #: ../../source/development/xinference_internals.rst:177 msgid "" "Take `model/llm/ " "`_" " for example, it focuses on the management and instantiation of LLMs. It " "includes detailed implementations for loading, configuring, and deploying" " LLMs." msgstr "" "以 `model/llm/ `_ 为例,它主要管理和启动 LLM,包括加载、配置和运行" "大语言模型。" #: ../../source/development/xinference_internals.rst:181 msgid "" "We support many backends such as GGML, PyTorch, and vLLM. Our generated " "content is compatible with the format of OpenAI, supporting features such" " as streaming output and returning chat completion format (for chat " "models only). Therefore, there is a lot of adaptation work to be done " "after the model generate content. 
These tasks are not difficult, but they" " do require some time. When writing this part of the code, please refer " "to the `OpenAI API documentation " "`_ and the documentation " "of various inference backends, and make the necessary adaptations." msgstr "" "我们支持不同的推理后端,比如 GGML、PyTorch 和 vLLM。我们生成的内容与 " "OpenAI 的格式兼容,比如支持流式输出(stream),对话模型以 chat completion" " 格式返回。因此模型输出内容后要做很多适配工作。这些工作并不难,但需要一些" "时间。编写这部分代码时,请参考 `OpenAI 的 API 文档 `_ 和各个推理后端的文档,做必要的适配。" #: ../../source/development/xinference_internals.rst:185 msgid "JSON" msgstr "" #: ../../source/development/xinference_internals.rst:187 msgid "" "In `model/llm/llm_family.json " "`_," " we utilize JSON files to manage the metadata of emerging open-source " "models. Adding a new model does not necessitate writing new code, it " "merely requires appending new metadata to the existing JSON file." msgstr "" "在 `model/llm/llm_family.json `_ 中,我们利用 JSON 文件" "来管理新出现的开源模型的元数据。添加一个新模型并不需要编写新代码,只需要" "将新的元数据添加到现有的 JSON 文件中即可。" #: ../../source/development/xinference_internals.rst:214 msgid "" "This is an example of how to define the Llama-2 chat model. The " "``model_specs`` define the information of the model, as one model family " "usually comes with various sizes, quantization methods, and file formats." " For instance, the ``model_format`` could be ``pytorch`` (using Hugging " "Face Transformers or vLLM as backend), ``ggmlv3`` (a tensor library " "associated with llama.cpp), or ``gptq`` (a post-training quantization " "framework). The ``model_id`` defines the repository of the model hub from" " which Xinference downloads the checkpoint files. Furthermore, due to " "distinct instruction-tuning processes, different model families have " "varying prompt styles. The ``prompt_style`` in the JSON file specifies " "how to format prompts for this particular model. For example, " "``system_prompt`` and ``roles`` are used to specify the instructions and " "personality of the model." 
msgstr "" "这是一个如何定义 Llama-2 聊天模型的示例。``model_specs`` 定义了模型的信息" ",因为一个模型系列通常有不同的尺寸、量化方法和文件格式。例如,``model_" "format`` 可以是 ``pytorch`` (使用 Hugging Face Transformers 或 vLLM 作为" "后端)、 ``ggmlv3`` (与 llama.cpp 相关的张量库)或 ``gptq`` (训练后量化" "框架)。 ``model_id`` 定义了模型中心的资源库,Xinference 从模型中心下载" "检查点文件。此外,由于不同的指令调整过程,不同的模型系列有不同的提示风格" "。JSON 文件中的 ``prompt_style`` 指定了该特定模型的提示格式。例如,``" "system_prompt`` 和 ``roles`` 用于指定模型的指令和个性。" #: ../../source/development/xinference_internals.rst:224 msgid "Code Walkthrough" msgstr "代码指南" #: ../../source/development/xinference_internals.rst:226 msgid "" "The main code is located in the `xinference/ " "`_:" msgstr "" "主要代码位于 `xinference/ `_:" #: ../../source/development/xinference_internals.rst:228 msgid "" "`api/ " "`_: " "`restful_api.py " "`_" " is the core part that sets up and runs the RESTful APIs. It integrates " "an authentication service (the specific code is located in `oauth2/ " "`_)," " as some or all endpointsrequire user authentication." msgstr "" "`api/ `_" ":`restful_api.py `_ 是设置和运行 RESTful API 的核心部分。它" "集成了一个身份验证服务(具体代码位于 `oauth2/ `_),因为部分或所有" "端口需要用户身份验证。" #: ../../source/development/xinference_internals.rst:233 msgid "" "`client/ " "`_: " "This is the client of Xinference." msgstr "" "`client/ `_:这是 Xinference 的客户端。" #: ../../source/development/xinference_internals.rst:235 msgid "" "`oscar/ " "`_" " defines the Actor Client which acts as a client interface for " "interacting with models deployed in a Xinference cluster." msgstr "" "`oscar/ `_ 定义了 Actor 客户端,它是一个客户端接口,用于与 " "Xinference 中的模型交互。" #: ../../source/development/xinference_internals.rst:238 msgid "" "`restful/ " "`_" " implements a RESTful client for interacting with a Xinference service." msgstr "" "`restful/ `_ 实现与 Xinference 服务交互的 RESTful 客户端。" #: ../../source/development/xinference_internals.rst:241 msgid "" "`core/ " "`_: " "This is the core part of Xinference." 
msgstr "" "`core/ " "`_:这是 Xinference 的核心部分。" #: ../../source/development/xinference_internals.rst:243 msgid "" "`metrics.py " "`_" " and `resource.py " "`_" " defines a set of tools for collecting and reporting metrics and the " "status of node resources, including model throughput, latency, the usage " "of CPU and GPU, memory usage, and more." msgstr "" "`metrics.py `_ 和 `resource.py `_ 定义了一套用于收集和" "报告指标以及节点资源状态的工具,包括模型吞吐量、延迟、CPU 和 GPU 的使用率" "、内存使用率等。" #: ../../source/development/xinference_internals.rst:248 msgid "" "`image_interface.py " "`_" " and `chat_interface.py " "`_" " implement `Gradio `_ interfaces " "for image and chat models, respectively. These interfaces allow users to " "interact with models through a Web UI, such as generating images or " "engaging in chat. They build user interfaces using the gradio package and" " communicate with backend models through our RESTful APIs." msgstr "" "`image_interface.py `_ 和 `chat_interface.py `_ 分别为图像和聊天模型实现了 `Gradio `_ 接口。这些接口允许用户通过 Web 界面与模型进行交互,例如生成图像" "或进行聊天。代码使用 Gradio 软件包构建用户界面,并通过我们的 RESTful API " "与后端模型通信。" #: ../../source/development/xinference_internals.rst:254 msgid "" "`worker.py " "`_" " and `supervisor.py " "`_" " respectively define the logic for worker actors and supervisor actor. " "Worker actors are responsible for carrying out specific model computation" " tasks, while supervisor actors manage the lifecycle of worker nodes, " "schedule tasks, and monitor system states." msgstr "" "`worker.py `_ 和 `supervisor.py `_ 分别定义了 worker " "actor 和 supervisor actor 的逻辑。worker actor 负责执行特定的模型计算任务" ",而 supervisor actor 则管理 Worker 节点的生命周期和任务调度,并监控系统" "状态。" #: ../../source/development/xinference_internals.rst:259 msgid "" "`status_guard.py " "`_" " implements a status monitor to track the status of models (like " "creating, updating, terminating, etc.). It allows querying status " "information of model instances and managing these statuses based on the " "model's UID." 
msgstr "" "`status_guard.py `_ 实现了一个状态监视器,用于跟踪模型的" "状态(如创建、更新、终止等)。它允许根据模型的 UID 查询模型实例的状态信息" "并管理这些状态。" #: ../../source/development/xinference_internals.rst:263 msgid "" "`cache_tracker.py " "`_" " defines a cache tracker for recording and managing cache status and " "information of model versions. It supports recording cache locations and " "statuses of model versions and querying model version information based " "on model names." msgstr "" "`cache_tracker.py `_ 定义了一个缓存跟踪器,用于记录和管理" "缓存状态和模型版本信息。它支持记录缓存位置和模型版本的状态,并根据模型" "名称查询模型版本信息。" #: ../../source/development/xinference_internals.rst:267 msgid "" "`event.py " "`_" " defines an event collector for gathering and reporting various runtime " "events of models, such as information, warnings, and errors. `model.py " "`_" " defines a Model Actor, the core component for direct model interactions." " The Model Actor is responsible for executing model inference requests, " "handling input and output data streams, and supports various types of " "model operations." msgstr "" "`event.py `_ 定义了一个事件收集器,用于收集和报告各种运行时模型的事件" ",如信息、警告和错误。`model.py `_ 定义了一个模型 Actor,它是与模型" "直接交互的核心组件。模型 Actor 负责执行模型推理请求,处理输入和输出数据流" ",并支持各种模型操作。这两个部分都使用 `Xoscar `_ 进行并发和分布式执行。" #: ../../source/development/xinference_internals.rst:273 msgid "" "`deploy/ " "`_: " "It provides a command-line interface (CLI) for interacting with the " "Xinference framework, allowing users to perform operations by command " "line. See `Command Line`_ for more information." msgstr "" "`deploy/ `_:它提供了一个命令行界面(CLI),用于与 Xinference 框架进行交互" ",允许用户通过命令行进行操作。更多信息,请参见 `Command Line`_。" #: ../../source/development/xinference_internals.rst:276 msgid "" "`locale/ " "`_: " "It supports multi-language localization. By simply adding and updating " "JSON translation files, it becomes possible to support more languages, " "improving user experience."
msgstr "" "`locale/ `_:它支持多语言本地化。只需添加和更新 JSON 翻译文件,就可以支持更" "多语言,改善用户体验。" #: ../../source/development/xinference_internals.rst:279 msgid "" "`model/ " "`_: It" " provides a structure for model descriptions, creation, and caching. See " "`Model`_ for more information." msgstr "" "`model/ `_:它为模型描述、创建和缓存提供了一个框架。请参见 `Model`_ 以获取" "更多信息。" #: ../../source/development/xinference_internals.rst:282 msgid "" "`web/ui/ " "`_: " "The js code of the frontend (Web UI)." msgstr "" "`web/ui/ `_:前端(用户界面)的js代码。" #~ msgid "" #~ "Xinference supports different types of " #~ "models including large language models " #~ "(LLMs), image models, audio models, " #~ "embedding models, etc" #~ msgstr "" #~ "Xinference 支持不同类型的模型,包括大型" #~ "语言模型(LLM)、图像模型、音频模型、" #~ "嵌入模型等。所有模型都在 `model/ " #~ "`" #~ "_ 中实现。以 `llm/ `_ 为例" #~ ",它侧重于 LLM 的管理和实例化。它" #~ "包括加载、配置和部署 LLM 的详细实现" #~ ",包括处理不同类型的量化和模型格式。" #~ "在 `llm/ `_ 中,它支持许多后" #~ "端,如 `GGML `_、`" #~ "PyTorch `_、`SGLang `" #~ "_ 和 `vLLM `_。" #~ msgid "" #~ "Take `llm/ " #~ "`_" #~ " for example, it focuses on the " #~ "management and instantiation of LLMs. It" #~ " includes detailed implementations for " #~ "loading, configuring, and deploying LLMs, " #~ "including handling different types of " #~ "quantization and model formats. In `llm/" #~ " " #~ "`_," #~ " it supports many backends such as" #~ " `GGML " #~ "`_," #~ " `PyTorch " #~ "`_," #~ " `SGLang " #~ "`_" #~ " and `vLLM " #~ "`_." #~ msgstr "" #~ "所有模型都在 `model/ `_ 中实现。以 " #~ "`llm/ `_ 为例,它侧重于 LLM" #~ " 的管理和实例化。它包括加载、配置" #~ "和部署 LLM 的详细实现,包括处理不同" #~ "类型的量化和模型格式。在 `llm/ " #~ "`_ 中,它支持许多后端,如 `" #~ "GGML `_、`PyTorch `_" #~ "、`SGLang `_ 和 `vLLM " #~ "`_。" #~ msgid "" #~ "All models are implemented in `model/" #~ " " #~ "`_." #~ msgstr "" #~ "所有模型在 `model/ `_ 中实现" #~ msgid "" #~ "In ``model/llm/``, it supports many " #~ "backends such as GGML, PyTorch, and " #~ "vLLM." 
#~ msgstr "在 ``model/llm/`` 下,我们支持了很多后端,比如 GGML、PyTorch、和 vLLM。" #~ msgid "" #~ "`oscar/ " #~ "`_" #~ " defines the Actor Client which acts" #~ " as a client interface for " #~ "interacting with models deployed in a" #~ " server environment. It includes " #~ "functionalities to register/unregister models, " #~ "launch/terminate models, and interact with " #~ "different types of models. This part " #~ "heavily utilizes ``asyncio`` for asynchronous" #~ " operations. See `Concurrency`_ for more" #~ " information." #~ msgstr "" #~ "`oscar/ `_ 定义了 Actor " #~ "客户端,它是一个客户端接口,用于与部署" #~ "在服务器环境中的模型交互。它包括注册/" #~ "取消注册模型、启动/终止模型,以及与" #~ "不同类型的模型交互的功能。这部分主要使用" #~ " ``asyncio`` 进行异步操作。更多" #~ "信息,请参见 `Concurrency`_。" #~ msgid "Concurrency" #~ msgstr "并发性" #~ msgid "" #~ "Both Xinference and Xoscar highly " #~ "utilize coroutine programming of ``asyncio``." #~ msgstr "Xinference 和 Xoscar 都高度利用了 ``asyncio`` 进行协程编程。" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/examples/ai_podcast.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2023-12-25 17:11+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.11.0\n" #: ../../source/examples/ai_podcast.rst:5 msgid "Example: AI Podcast 🎙" msgstr "示例:智能播客 🎙" #: ../../source/examples/ai_podcast.rst:7 msgid "**Description**:" msgstr "**描述**:" #: ../../source/examples/ai_podcast.rst:9 msgid "🎙️AI Podcast - Voice Conversations with Multiple Agents on M2 Max 💻" msgstr "🎙️AI播客 - 在M2 Max芯片上进行多智能体语音对话 💻" #: ../../source/examples/ai_podcast.rst:11 msgid "**Support Language** :" msgstr "**支持语言**:" #: ../../source/examples/ai_podcast.rst:13 msgid "English (AI_Podcast.py)" msgstr "英文对应代码文件:AI_Podcast.py" #: ../../source/examples/ai_podcast.rst:15 msgid "Chinese (AI_Podcast_ZH.py)" msgstr "中文对应代码文件:AI_Podcast_ZH.py" #: ../../source/examples/ai_podcast.rst:17 msgid "**Used Technology (EN version)** :" msgstr "**英文版本涉及技术**:" #: ../../source/examples/ai_podcast.rst:19 msgid "" "@ `OpenAI `_ 's `whisper " "`_" msgstr "" "@ `OpenAI `_ `whisper `_" #: ../../source/examples/ai_podcast.rst:21 msgid "" "@ `ggerganov `_ 's `ggml " "`_" msgstr "" "@ `ggerganov `_ `ggml `_" #: ../../source/examples/ai_podcast.rst:23 msgid "" "@ `WizardLM_AI `_ 's `wizardlm v1.0 " "`_" msgstr "" "@ `WizardLM_AI `_ `wizardlm v1.0 `_" #: ../../source/examples/ai_podcast.rst:25 msgid "" "@ `lmsysorg `_ 's `vicuna v1.3 " "`_" msgstr "" "@ `lmsysorg `_ `vicuna v1.3 `_" #: ../../source/examples/ai_podcast.rst:27 msgid "@ `Xinference `_ as a launcher" msgstr "@ `Xinference `_ 作为平台" #: ../../source/examples/ai_podcast.rst:29 msgid "**Detailed Explanation on the Demo Functionality** :" msgstr "**关于演示功能的详细说明**:" #: ../../source/examples/ai_podcast.rst:31 msgid ""
"Generate the Wizardlm Model and Vicuna Model when the program is " "launching with Xorbits Inference. Initiate the Chatroom by giving the two" " chatbot their names and telling them that there is a human user called " "\"username\", where \"username\" is given by user's input. Initialize a " "empty chat history for the chatroom." msgstr "" "启动 Xinference,部署 Wizardlm 模型和 Vicuna 模型。" "通过为两个模型指定名称并告诉它们有一个名为“username”的人类用户来启动聊天室,其中“username”是由用户输入提供的。然后为聊天室初始化一个空的聊天历史。" #: ../../source/examples/ai_podcast.rst:35 msgid "" "Use Audio device to store recording into file, and transcribe the file " "using OpenAI's Whisper to receive a human readable text as string." msgstr "" "使用音频设备将录音存储到文件中,然后使用OpenAI的Whisper将文件转录为人类可读的文本字符串。" #: ../../source/examples/ai_podcast.rst:37 msgid "" "Based on the input message string, determine which agents the user want " "to talk to. Call the target agents and parse in the input string and chat" " history for the model to generate." msgstr "" "基于输入的消息字符串,确定用户想要与哪些代理(模型)进行对话。调用这些目标代理并将用户输入字符串和聊天历史作为输入让模型去生成对应的内容。" #: ../../source/examples/ai_podcast.rst:40 msgid "" "When the responses are ready, use Macos's \"Say\" Command to produce " "audio through speaker. Each agents have their own voice while speaking." msgstr "" "当模型的输出准备好时,使用 macOS 的“Say”命令通过扬声器生成音频。每个代理在说话时都有自己的声音。" #: ../../source/examples/ai_podcast.rst:43 msgid "" "Store the user input and the agent response into chat history, and " "recursively looping the program until user explicitly says words like " "\"see you\" in their responses." msgstr "" "将用户输入和代理响应存储到聊天历史中,并循环递归程序,直到用户明确在其响应中说出“再见”之类的话语。" #: ../../source/examples/ai_podcast.rst:46 msgid "**Highlight Features with Xinference** :" msgstr "**Xinference的突出特性**:" #: ../../source/examples/ai_podcast.rst:48 msgid "" "With Xinference's distributed system, we can easily deploy two different " "models in the same session and in the same \"chatroom\". 
With enough " "resources, the framework can deploy any amount of models you like at the " "same time." msgstr "" "借助 Xinference 的分布式系统,我们可以轻松在同一会话和同一“聊天室”中部署两个不同的模型。" "在足够的资源情况下,该框架可以同时部署任意数量的模型。" #: ../../source/examples/ai_podcast.rst:51 msgid "" "With Xinference, you can deploy the model easily by just adding a few " "lines of code. For examples, for launching the vicuna model in the demo, " "just by::" msgstr "" "使用 Xinference,只需添加几行代码就可以轻松部署模型。例如,在演示中启动vicuna模型,只需:" #: ../../source/examples/ai_podcast.rst:68 msgid "" "Then, the Xinference client will handle \"target model downloading and " "caching\", \"set up environment and process for the model\", and \"run " "the service at selected endpoint. \" You are now ready to play with your " "llm model." msgstr "" "然后,Xinference 客户端将处理“目标模型的下载和缓存”、“为模型设置环境和进程”以及“在选择的端点运行服务”。你现在已经准备好与你的 llm 模型交互。" #: ../../source/examples/ai_podcast.rst:71 msgid "**Original Demo Video** :" msgstr "**原始演示视频**" #: ../../source/examples/ai_podcast.rst:73 msgid "" "`🎙️AI Podcast - Voice Conversations with Multiple Agents on M2 Max💻🔥🤖 <" "https://twitter.com/yichaocheng/status/1679129417778442240>`_" msgstr "" "`🎙️AI播客 - 在M2 Max芯片上进行多智能体语音对话💻🔥🤖 `_" #: ../../source/examples/ai_podcast.rst:75 msgid "**Source Code** :" msgstr "**源代码**:" #: ../../source/examples/ai_podcast.rst:77 msgid "" "`AI_Podcast " "`_" " (English Version)" msgstr "" "`AI播客 `_(英文版)" #: ../../source/examples/ai_podcast.rst:79 msgid "" "`AI_Podcast_ZH " "`_" " (Chinese Version)" msgstr "" "`AI播客 `_(中文版)" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/examples/chatbot.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2023-11-01 10:48+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/examples/chatbot.rst:5 msgid "Example: CLI chatbot 🤖️" msgstr "示例:命令行聊天机器人 🤖️" #: ../../source/examples/chatbot.rst:7 msgid "**Description**:" msgstr "**描述**:" #: ../../source/examples/chatbot.rst:9 msgid "" "Demonstrate how to interact with Xinference to play with LLM chat " "functionality with an AI agent in command line💻" msgstr "" "演示如何与 Xinference 交互,在命令行中基于 LLM 的聊天功能与 AI 代理互动。💻" #: ../../source/examples/chatbot.rst:11 msgid "**Used Technology**:" msgstr "**涉及技术**:" #: ../../source/examples/chatbot.rst:13 msgid "" "@ `ggerganov `_ 's `ggml " "`_" msgstr "" "@ `ggerganov `_ `ggml `_" #: ../../source/examples/chatbot.rst:15 msgid "@ `Xinference `_ as a launcher" msgstr "" "@ `Xinference `_ 作为平台" #: ../../source/examples/chatbot.rst:17 msgid "" "@ All LLaMA and Chatglm models supported by `Xorbitsio inference " "`_" msgstr "" "由 `Xinference 推理 `_ 支持的所有 LLaMA 和 Chatglm 模型" #: ../../source/examples/chatbot.rst:19 msgid "**Detailed Explanation on the Demo Functionality** :" msgstr "**关于演示功能的详细说明**:" #: ../../source/examples/chatbot.rst:21 msgid "" "Take the user command line input in the terminal and grab the required " "parameters for model launching." msgstr "" "在终端中接受用户的命令行输入,并获取启动模型所需的参数。" #: ../../source/examples/chatbot.rst:23 msgid "" "Launch the Xinference frameworks and automatically deploy the model user " "demanded into the cluster." msgstr "" "启动 Xinference 框架,并自动将用户需求的模型部署到集群中。" #: ../../source/examples/chatbot.rst:25 msgid "Initialize an empty chat history to store all the context in the chatroom." 
msgstr "" "初始化一个空的聊天历史,以存储聊天室中的所有上下文。" #: ../../source/examples/chatbot.rst:27 msgid "" "Recursively ask for user's input as prompt and let the model to generate " "response based on the prompt and the chat history. Show the Output of the" " response in the terminal." msgstr "" "递归地请求用户的输入作为提示词,让模型基于提示词和聊天历史生成响应。在终端中显示响应的输出。" #: ../../source/examples/chatbot.rst:30 msgid "" "Store the user's input and agent's response into the chat history as " "context for the upcoming rounds." msgstr "" "将用户的输入和代理的响应存储到聊天历史中,作为即将到来的对话轮次的上下文。" #: ../../source/examples/chatbot.rst:32 msgid "**Source Code** :" msgstr "**源代码**:" #: ../../source/examples/chatbot.rst:33 msgid "" "`chat " "`_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/examples/gradio_chatinterface.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2023-11-01 10:48+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/examples/gradio_chatinterface.rst:5 msgid "Example: Gradio ChatInterface🤗" msgstr "示例:Gradio 聊天界面🤗" #: ../../source/examples/gradio_chatinterface.rst:7 msgid "**Description**:" msgstr "**描述**:" #: ../../source/examples/gradio_chatinterface.rst:9 msgid "" "This example showcases how to build a chatbot with 120 lines of code with" " Gradio ChatInterface and Xinference local LLM" msgstr "" "这个例子展示了如何使用Gradio ChatInterface 聊天界面接口和 Xinference 本地LLM构建一个只有120行代码的聊天机器人。" #: ../../source/examples/gradio_chatinterface.rst:11 msgid "**Used Technology**:" msgstr "**涉及技术**:" #: ../../source/examples/gradio_chatinterface.rst:13 msgid "" "@ `Xinference `_ as a LLM model " "hosting service" msgstr "" "@ `Xinference `_ 作为 LLM 模型托管服务" #: ../../source/examples/gradio_chatinterface.rst:15 msgid "" "@ `Gradio `_ as a web interface for" " the chatbot" msgstr "" "@ `Gradio `_ 作为聊天机器人的 Web 界面" #: ../../source/examples/gradio_chatinterface.rst:17 msgid "**Detailed Explanation on the Demo Functionality** :" msgstr "**关于演示功能的详细说明**:" #: ../../source/examples/gradio_chatinterface.rst:19 msgid "" "Parse user-provided command line arguments to capture essential model " "parameters such as model name, size, format, and quantization." msgstr "" "解析用户提供的命令行参数,以捕获关键的模型参数,如模型名称、大小、格式和量化方式。" #: ../../source/examples/gradio_chatinterface.rst:21 msgid "" "Establish a connection to the Xinference framework and deploy the " "specified model, ensuring it's ready for real-time interactions." 
msgstr "" "建立与 Xinference 框架的连接并部署指定的模型,确保它准备好进行实时交互。" #: ../../source/examples/gradio_chatinterface.rst:23 msgid "" "Implement helper functions (flatten and to_chat) to efficiently handle " "and store chat interactions, ensuring the model has context for " "generating relevant responses." msgstr "" "实现辅助函数(flatten和to_chat),以高效处理和存储聊天交互,确保模型具有生成相关响应的上下文。" #: ../../source/examples/gradio_chatinterface.rst:25 msgid "" "Set up an interactive chat interface using Gradio, allowing users to " "communicate with the model in a user-friendly environment." msgstr "" "使用 Gradio 设置交互式聊天界面,允许用户在用户友好的环境中与模型进行通信。" #: ../../source/examples/gradio_chatinterface.rst:27 msgid "" "Activate the Gradio web interface, enabling users to start their chat " "sessions and receive model-generated responses based on their queries." msgstr "" "启动 Gradio Web 界面,使用户能够开始他们的聊天会话,并根据他们的查询接收模型生成的响应。" #: ../../source/examples/gradio_chatinterface.rst:29 msgid "**Source Code** :" msgstr "**源代码**:" #: ../../source/examples/gradio_chatinterface.rst:30 msgid "" "`Gradio ChatInterface " "`_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/examples/index.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2023-12-25 17:11+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.11.0\n" #: ../../source/examples/index.rst:5 msgid "Examples" msgstr "示例" #: ../../source/examples/index.rst:17 msgid "" "Here you can find examples and resources to learn about how to use " "Xinference." msgstr "在这里,你可以找到关于如何使用 Xinference 的示例和学习资源。" #: ../../source/examples/index.rst:20 msgid "Demos" msgstr "示例" #: ../../source/examples/index.rst:22 msgid "End-to-end applications of using Xinference:" msgstr "使用 Xinference 的端到端应用:" #: ../../source/examples/index.rst:24 msgid "`Voice Conversations with AI Agents on M2 Max `_" msgstr "`在M2 Max上与AI代理进行语音对话的播客 `_" #: ../../source/examples/index.rst:26 msgid "`Interacting with LLM Models: A Command-Line Example `_" msgstr "`与 LLM 模型互动:命令行示例 `_" #: ../../source/examples/index.rst:28 msgid "" "`Interacting with LLM Models: A Gradio ChatInterface Example " "`_" msgstr "`与 LLM 模型互动:Gradio 聊天页面示例 `_" #: ../../source/examples/index.rst:30 msgid "`PDF Chatbot with Local LLM and Embeddings `_" msgstr "`使用本地 LLM 和 embedding 模型的 PDF 聊天机器人 `_" #: ../../source/examples/index.rst:32 msgid "" "`Local Doc Conversations with LangChain and Streamlit " "`_" msgstr "`使用 LangChain 和 Streamlit 进行本地文档对话 `_" #: ../../source/examples/index.rst:34 msgid "" "If you come across other examples in your own workflows we encourage you " "to contribute a `PR `_!" msgstr "如果在你自己的工作流中遇到其他示例,我们鼓励你贡献一个`PR `_!" 
#: ../../source/examples/index.rst:38 msgid "Tutorials" msgstr "教程" #: ../../source/examples/index.rst:40 msgid "" "The following tutorials cover the basics of using Xinference in different" " scenarios:" msgstr "以下教程涵盖了在不同场景中使用 Xinference 的基础知识:" #: ../../source/examples/index.rst:42 msgid "" "`[Notebook] Question-answering(QA) Application with Xinference, Milvus " "and LangChain " "`_" msgstr "`[Notebook] 使用Xinference、Milvus 和 LangChain 的问答(QA)应用 " "`_" #: ../../source/examples/index.rst:44 msgid "" "`Using Xinference local LLMs within LlamaIndex " "`_" msgstr "`在 LlamaIndex 中使用 Xinference 本地 LLMs `_" #: ../../source/examples/index.rst:46 msgid "" "`[Chinese] 如何让 Chatbox 接入开源大模型,实现免费聊天 `_" msgstr "" #: ../../source/examples/index.rst:48 msgid "" "`[Chinese] 摆脱 OpenAI 依赖,8 分钟教你用开源生态构建全栈 AI 应用 `_" msgstr "" #: ../../source/examples/index.rst:50 msgid "" "`[Chinese] 使用全套开源工具构建 LLM 应用实战: 在 Dify 调用 Baichuan 开源" "模型能力 `_" msgstr "" #: ../../source/examples/index.rst:54 msgid "Third-Party Library Integrations" msgstr "第三方库集成" #: ../../source/examples/index.rst:56 msgid "" "Xinference is designed to seamlessly integrate and deploy open-sourced AI" " models, so we want to incorporate support for mainstream toolkits in the" " AI landscape. Xinference can be used with the following third-party " "libraries:" msgstr "" "Xinference 能够无缝集成和部署开源 AI 模型,因此支持 AI 领域主流工具包。Xinference 可以与以下第三方库一起使用:" #: ../../source/examples/index.rst:59 msgid "" "LangChain `Text Embedding Models " "`_" " and `LLMs " "`_" msgstr "" #: ../../source/examples/index.rst:61 msgid "" "`LlamaIndex Xinference LLM " "`_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/examples/langchain_streamlit_doc_chat.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2023-12-25 17:11+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.11.0\n" #: ../../source/examples/langchain_streamlit_doc_chat.rst:5 msgid "Example: LangChain Streamlit Doc Chat📄" msgstr "示例:LangChain Streamlit 文档聊天📄" #: ../../source/examples/langchain_streamlit_doc_chat.rst:7 msgid "**Description**:" msgstr "**描述**:" #: ../../source/examples/langchain_streamlit_doc_chat.rst:9 msgid "" "This Streamlit-based application demonstrates a AI chatbot powered by " "local LLM and embedding models" msgstr "这个基于 Streamlit 的应用演示了由本地 LLM 和 embedding 模型提供支持的 AI 聊天机器人。" #: ../../source/examples/langchain_streamlit_doc_chat.rst:11 msgid "**Used Technology**:" msgstr "**涉及技术**:" #: ../../source/examples/langchain_streamlit_doc_chat.rst:13 msgid "" "@ `Xinference `_: as the LLM and " "embedding model hosting service" msgstr "@ `Xinference `_:作为 LLM 和 embedding 模型托管服务" #: ../../source/examples/langchain_streamlit_doc_chat.rst:15 msgid "" "@ `LangChain `_: orchestrates " "the entire document processing and query answering pipeline" msgstr "@ `LangChain `_:编排整个文档处理和查询回答的管道" #: ../../source/examples/langchain_streamlit_doc_chat.rst:17 msgid "@ `Streamlit `_: for interactive user interface" msgstr "@ `Streamlit `_:用于交互式用户界面" #: ../../source/examples/langchain_streamlit_doc_chat.rst:19 msgid "**Detailed Explanation on the Demo Functionality** :" msgstr "**关于演示功能的详细说明**:" #: ../../source/examples/langchain_streamlit_doc_chat.rst:21 msgid "Streamlit UI for uploading text files, enhancing user interaction." 
msgstr "Streamlit 用户界面,用于上传文本文件,提升用户交互。" #: ../../source/examples/langchain_streamlit_doc_chat.rst:23 msgid "" "Texts are split into chunks and embedded using Xinference for efficient " "processing." msgstr "文本被分割成块,并使用 Xinference 进行 embed 操作,以实现高效的处理。" #: ../../source/examples/langchain_streamlit_doc_chat.rst:25 msgid "" "Executes similarity searches on embedded texts to pinpoint relevant " "sections for user queries." msgstr "对嵌入的文本执行相似性搜索,以精确定位用户查询的相关部分。" #: ../../source/examples/langchain_streamlit_doc_chat.rst:27 msgid "Utilizes a structured prompt template for focused LLM interactions." msgstr "利用结构化的提示词模板与 LLM 模型进行交互。" #: ../../source/examples/langchain_streamlit_doc_chat.rst:29 msgid "" "Xinference's LLM processes queries within the context of relevant " "document parts, providing accurate responses." msgstr "" "Xinference 的 LLM 在相关文档部分的上下文中处理查询,提供准确的响应。" #: ../../source/examples/langchain_streamlit_doc_chat.rst:31 msgid "" "The system facilitates effective and context-sensitive document " "exploration, aiding users in information retrieval." msgstr "" "该系统实现了有效的、上下文敏感的文档搜索,帮助用户进行高效信息检索。" #: ../../source/examples/langchain_streamlit_doc_chat.rst:33 msgid "**Source Code** :" msgstr "**源代码**:" #: ../../source/examples/langchain_streamlit_doc_chat.rst:34 msgid "" "`LangChain Streamlit Doc Chat " "`_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/examples/pdf_chatbot.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2023-11-01 10:48+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/examples/pdf_chatbot.rst:5 msgid "Example: PDF Chatbot📚" msgstr "示例:PDF 聊天机器人📚" #: ../../source/examples/pdf_chatbot.rst:7 msgid "**Description**:" msgstr "**描述**:" #: ../../source/examples/pdf_chatbot.rst:9 msgid "" "This example showcases how to build a PDF chatbot with local LLM and " "Embedding models" msgstr "" "这个例子展示了如何使用本地 LLM 和 embedding 模型构建PDF聊天机器人。" #: ../../source/examples/pdf_chatbot.rst:11 msgid "**Used Technology**:" msgstr "**涉及技术**:" #: ../../source/examples/pdf_chatbot.rst:13 msgid "" "@ `Xinference `_ as a LLM model " "hosting service" msgstr "@ `Xinference `_ 作为LLM模型托管服务" #: ../../source/examples/pdf_chatbot.rst:15 msgid "" "@ `LlamaIndex `_ for " "orchestrating the entire RAG pipeline" msgstr "@ `LlamaIndex `_ 用于编排整个RAG管道" #: ../../source/examples/pdf_chatbot.rst:17 msgid "@ `Streamlit `_ for interactive UI" msgstr "@ `Streamlit `_ 用于交互式用户界面" #: ../../source/examples/pdf_chatbot.rst:19 msgid "**Detailed Explanation on the Demo Functionality** :" msgstr "**关于演示功能的详细说明**:" #: ../../source/examples/pdf_chatbot.rst:21 msgid "" "Crafted a Dockerfile to simplify the process and ensure easy " "reproducibility." msgstr "制作了一个Dockerfile,通过 docker 简化了部署流程并确保易于复现。" #: ../../source/examples/pdf_chatbot.rst:23 msgid "Set up models with Xinference and expose two ports for accessing them." msgstr "使用 Xinference 拉起 LLM 和 embedding 模型,并暴露两个端口以访问它们。" #: ../../source/examples/pdf_chatbot.rst:25 msgid "" "Leverage Streamlit for seamless file uploads and interactive " "communication with the chat engine." 
msgstr "利用 Streamlit 实现无缝文件上传和与聊天引擎的交互通信。" #: ../../source/examples/pdf_chatbot.rst:27 msgid "5x faster doc embedding than OpenAI's API." msgstr "文档 embedding 速度比 OpenAI 的 API快5倍。" #: ../../source/examples/pdf_chatbot.rst:29 msgid "" "Leveraging the power of GGML to offload models to the GPU, ensuring swift" " acceleration. Less long waits for returns." msgstr "利用 GGML 的强大功能将模型置于GPU上运行,确保加速、减少等待返回的时间。" #: ../../source/examples/pdf_chatbot.rst:31 msgid "**Source Code** :" msgstr "**源代码**:" #: ../../source/examples/pdf_chatbot.rst:32 msgid "" "`PDF Chatbot `_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/getting_started/environments.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2026-01-28 11:54+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.17.0\n" #: ../../source/getting_started/environments.rst:5 msgid "Environments Variables" msgstr "环境变量" #: ../../source/getting_started/environments.rst:8 msgid "XINFERENCE_ENDPOINT" msgstr "XINFERENCE_ENDPOINT" #: ../../source/getting_started/environments.rst:9 msgid "" "Endpoint of Xinference, used to connect to Xinference service. Default " "value is http://127.0.0.1:9997 , you can get it through logs." 
msgstr "" "Xinference 的服务地址,用来与 Xinference 连接。默认地址是 " "http://127.0.0.1:9997,可以在日志中获得这个地址。" #: ../../source/getting_started/environments.rst:13 msgid "XINFERENCE_MODEL_SRC" msgstr "XINFERENCE_MODEL_SRC" #: ../../source/getting_started/environments.rst:14 msgid "" "Modelhub used for downloading models. Default is \"huggingface\", or you " "can set \"modelscope\" as downloading source." msgstr "配置模型下载仓库。默认下载源是 \"huggingface\",也可以设置为 \"modelscope\" 作为下载源。" #: ../../source/getting_started/environments.rst:20 msgid "XINFERENCE_HOME" msgstr "XINFERENCE_HOME" #: ../../source/getting_started/environments.rst:21 msgid "" "By default, Xinference uses ``/.xinference`` as home path to store " "necessary files such as logs and models, where ```` is the home " "path of current user. You can change this directory by configuring this " "environment variable." msgstr "" "Xinference 默认使用 ``/.xinference`` 作为默认目录来存储模型以及日志等必要的文件。其中 " "```` 是当前用户的主目录。可以通过配置这个环境变量来修改默认目录。" #: ../../source/getting_started/environments.rst:27 msgid "XINFERENCE_HEALTH_CHECK_FAILURE_THRESHOLD" msgstr "XINFERENCE_HEALTH_CHECK_FAILURE_THRESHOLD" #: ../../source/getting_started/environments.rst:28 msgid "" "The maximum number of failed health checks tolerated at Xinference " "startup. Default value is 5." msgstr "Xinference启动时允许的最大健康检查失败次数。默认值为5。" #: ../../source/getting_started/environments.rst:32 msgid "XINFERENCE_HEALTH_CHECK_INTERVAL" msgstr "XINFERENCE_HEALTH_CHECK_INTERVAL" #: ../../source/getting_started/environments.rst:33 msgid "Health check interval (seconds) at Xinference startup. Default value is 5." msgstr "Xinference启动时的健康检查间隔(秒)。默认值为5。" #: ../../source/getting_started/environments.rst:37 msgid "XINFERENCE_HEALTH_CHECK_TIMEOUT" msgstr "XINFERENCE_HEALTH_CHECK_TIMEOUT" #: ../../source/getting_started/environments.rst:38 msgid "Health check timeout (seconds) at Xinference startup. Default value is 10." 
msgstr "Xinference启动时的健康检查超时时间(秒)。默认值为10。" #: ../../source/getting_started/environments.rst:42 msgid "XINFERENCE_DISABLE_HEALTH_CHECK" msgstr "XINFERENCE_DISABLE_HEALTH_CHECK" #: ../../source/getting_started/environments.rst:43 msgid "" "Xinference will automatically report health check at Xinference startup. " "Setting this environment to 1 can disable health check." msgstr "在满足条件时,Xinference 会自动汇报worker健康状况,设置该环境变量为 1 可以禁用健康检查。" #: ../../source/getting_started/environments.rst:47 msgid "XINFERENCE_DISABLE_METRICS" msgstr "XINFERENCE_DISABLE_METRICS" #: ../../source/getting_started/environments.rst:48 msgid "" "Xinference will by default enable the metrics exporter on the supervisor " "and worker. Setting this environment to 1 will disable the /metrics " "endpoint on the supervisor and the HTTP service (only provide the " "/metrics endpoint) on the worker." msgstr "" "Xinference 会默认在 supervisor 和 worker 上启用 metrics exporter。设置环境变量为 1可以在 " "supervisor 上禁用 /metrics 端点,并在 worker 上禁用 HTTP 服务(仅提供 /metrics 端点)" #: ../../source/getting_started/environments.rst:53 msgid "XINFERENCE_DOWNLOAD_MAX_ATTEMPTS" msgstr "XINFERENCE_DOWNLOAD_MAX_ATTEMPTS" #: ../../source/getting_started/environments.rst:54 msgid "Maximum download retry attempts for model files. Default value is 3." msgstr "模型文件的最大下载重试次数。默认值为3。" #: ../../source/getting_started/environments.rst:58 msgid "XINFERENCE_TEXT_TO_IMAGE_BATCHING_SIZE" msgstr "XINFERENCE_TEXT_TO_IMAGE_BATCHING_SIZE" #: ../../source/getting_started/environments.rst:59 msgid "" "Enable continuous batching for text-to-image models by specifying the " "target image size (e.g., ``1024*1024``). Default is unset." msgstr "通过指定目标图像尺寸(例如 ``1024*1024`` )为文本转图像模型启用连续批处理。默认未设置。" #: ../../source/getting_started/environments.rst:63 msgid "XINFERENCE_SSE_PING_ATTEMPTS_SECONDS" msgstr "XINFERENCE_SSE_PING_ATTEMPTS_SECONDS" #: ../../source/getting_started/environments.rst:64 msgid "" "Server-Sent Events keepalive ping interval (seconds). 
Default value is " "600." msgstr "服务器发送事件保持活动状态的ping间隔(秒)。默认值为600。" #: ../../source/getting_started/environments.rst:68 msgid "XINFERENCE_MAX_TOKENS" msgstr "XINFERENCE_MAX_TOKENS" #: ../../source/getting_started/environments.rst:69 msgid "Global max tokens limit override for requests. Default is unset." msgstr "请求的全局最大tokens限制覆盖。默认值为未设置。" #: ../../source/getting_started/environments.rst:72 msgid "XINFERENCE_ALLOWED_IPS" msgstr "XINFERENCE_ALLOWED_IPS" #: ../../source/getting_started/environments.rst:73 msgid "" "Restrict access to specified IPs or CIDR blocks. Default is unset (no " "restriction)." msgstr "限制访问特定IP地址或CIDR地址块。默认未设置(无限制)。" #: ../../source/getting_started/environments.rst:76 msgid "XINFERENCE_BATCH_SIZE" msgstr "XINFERENCE_BATCH_SIZE" #: ../../source/getting_started/environments.rst:77 msgid "" "Default batch size used by the server when batching is enabled. Default " "value is 32." msgstr "启用批处理时服务器使用的默认批处理大小。默认值为32。" #: ../../source/getting_started/environments.rst:81 msgid "XINFERENCE_BATCH_INTERVAL" msgstr "XINFERENCE_BATCH_INTERVAL" #: ../../source/getting_started/environments.rst:82 msgid "Default batching interval (seconds). Default value is 0.003." msgstr "默认批处理间隔(秒)。默认值为0.003。" #: ../../source/getting_started/environments.rst:86 msgid "XINFERENCE_ALLOW_MULTI_REPLICA_PER_GPU" msgstr "XINFERENCE_ALLOW_MULTI_REPLICA_PER_GPU" #: ../../source/getting_started/environments.rst:87 msgid "" "Whether to allow multiple replicas on a single GPU. Default value is 1 " "(enabled)." msgstr "是否允许在单个GPU上创建多个副本。默认值为1 (启用)。" #: ../../source/getting_started/environments.rst:91 msgid "XINFERENCE_LAUNCH_STRATEGY" msgstr "XINFERENCE_LAUNCH_STRATEGY" #: ../../source/getting_started/environments.rst:92 msgid "" "GPU allocation strategy for replicas. Default is " "``IDLE_FIRST_LAUNCH_STRATEGY``." 
msgstr "副本的GPU分配策略。默认值为 ``IDLE_FIRST_LAUNCH_STRATEGY`` 。" #: ../../source/getting_started/environments.rst:95 msgid "XINFERENCE_ENABLE_VIRTUAL_ENV" msgstr "XINFERENCE_ENABLE_VIRTUAL_ENV" #: ../../source/getting_started/environments.rst:96 msgid "" "Enable model virtual environments globally. Default value is 1 (enabled, " "starting from v2.0)." msgstr "全局启用模型虚拟环境。默认值为1(启用,自v2.0版本生效)" #: ../../source/getting_started/environments.rst:100 msgid "XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED" msgstr "XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED" #: ../../source/getting_started/environments.rst:101 msgid "" "Skip packages already present in system site-packages when creating " "virtual environments. Default value is 1." msgstr "创建虚拟环境时跳过系统site-packages中已存在的包。默认值为1。" #: ../../source/getting_started/environments.rst:105 msgid "XINFERENCE_CSG_TOKEN" msgstr "XINFERENCE_CSG_TOKEN" #: ../../source/getting_started/environments.rst:106 msgid "Authentication token for CSGHub model source. Default is unset." msgstr "CSGHub模型源的认证令牌。默认值为未设置。" #: ../../source/getting_started/environments.rst:110 msgid "XINFERENCE_CSG_ENDPOINT" msgstr "XINFERENCE_CSG_ENDPOINT" #: ../../source/getting_started/environments.rst:111 msgid "" "CSGHub endpoint for model source. Default value is ``https://hub-" "stg.opencsg.com/``." msgstr "CSGHub 模型源端点。默认值为 ``https://hub-stg.opencsg.com/`` 。" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/getting_started/index.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2023-10-16 10:33+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/getting_started/index.rst:5 msgid "Getting Started" msgstr "入门指南" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/getting_started/installation.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2026-01-28 11:54+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.17.0\n" #: ../../source/getting_started/installation.rst:5 msgid "Installation" msgstr "安装" #: ../../source/getting_started/installation.rst:6 msgid "" "Xinference can be installed with ``pip`` on Linux, Windows, and macOS. To" " run models using Xinference, you will need to install the backend " "corresponding to the type of model you intend to serve." 
msgstr "" "Xinference 在 Linux, Windows, MacOS 上都可以通过 ``pip`` 来安装。如果需要使用 Xinference " "进行模型推理,可以根据不同的模型指定不同的引擎。" #: ../../source/getting_started/installation.rst:8 msgid "" "If you aim to serve all supported models, you can install all the " "necessary dependencies with a single command::" msgstr "如果你希望能够推理所有支持的模型,可以用以下命令安装所有需要的依赖:" #: ../../source/getting_started/installation.rst:14 msgid "" "Due to irreconcilable package dependency conflicts between vLLM and " "sglang, we have removed sglang from the all extra. If you want to use " "sglang, please install it separately via ``pip install " "'xinference[sglang]'``." msgstr "" "由于 vllm 和 sglang 在包依赖上无法调和,因此,我们从 all 里移除了 sglang,如果要使用 sglang,请使用 ``pip " "install 'xinference[sglang]'`` 。" #: ../../source/getting_started/installation.rst:17 msgid "Several usage scenarios require special attention." msgstr "某些使用场景需要特别注意。" #: ../../source/getting_started/installation.rst:19 msgid "**GGUF format** with **llama.cpp engine**" msgstr "**GGUF 格式** 配合 **llama.cpp 引擎** 使用" #: ../../source/getting_started/installation.rst:21 msgid "" "In this situation, it's advised to install its dependencies manually " "based on your hardware specifications to enable acceleration. For more " "details, see the :ref:`installation_gguf` section." msgstr "在这种情况下,建议根据您的硬件规格手动安装其依赖项以启用加速。更多详情请参见 :ref:`installation_gguf` 部分。" #: ../../source/getting_started/installation.rst:23 msgid "**AWQ or GPTQ** format with **transformers engine**" msgstr "**AWQ 或 GPTQ 格式** 配合 **transformers 引擎** 使用" #: ../../source/getting_started/installation.rst:25 msgid "**This section is added in v1.6.0.**" msgstr "**本节内容新增于 v1.6.0。**" #: ../../source/getting_started/installation.rst:27 msgid "" "This is because the dependencies at this stage require special options " "and are difficult to install. 
Please run command below in advance" msgstr "这是因为此阶段的依赖项需要特殊选项,并且安装起来比较困难。请提前运行以下命令" #: ../../source/getting_started/installation.rst:33 msgid "" "Some dependencies like ``transformers`` might be downgraded, you can run " "``pip install \"xinference[all]\"`` afterwards." msgstr "" "某些依赖项,如 ``transformers``,可能会被降级,您可以之后运行 ``pip install " "\"xinference[all]\"``。" #: ../../source/getting_started/installation.rst:36 msgid "" "If you want to install only the necessary backends, here's a breakdown of" " how to do it." msgstr "如果你只想安装必要的依赖,接下来是如何操作的详细步骤。" #: ../../source/getting_started/installation.rst:41 msgid "Transformers Backend" msgstr "Transformers 引擎" #: ../../source/getting_started/installation.rst:42 msgid "" "PyTorch (transformers) supports the inference of most state-of-art " "models. It is the default backend for models in PyTorch format::" msgstr "PyTorch(transformers) 引擎支持几乎有所的最新模型,这是 Pytorch 模型默认使用的引擎:" #: ../../source/getting_started/installation.rst:46 msgid "Notes:" msgstr "注意:" #: ../../source/getting_started/installation.rst:48 msgid "" "The transformers engine supports ``pytorch`` / ``gptq`` / ``awq`` / " "``bnb`` / ``fp4`` formats." msgstr "Transformers引擎支持 ``pytorch`` / ``gptq`` / ``awq`` / ``bnb`` / ``fp4`` 格式。" #: ../../source/getting_started/installation.rst:49 msgid "" "FP4 format requires ``transformers`` with ``FPQuantConfig`` support. If " "you see an import error, please upgrade ``transformers`` to a newer " "version." msgstr "FP4格式需要支持FPQuantConfig的transformers库。若遇到导入错误,请将transformers升级至新版本。" #: ../../source/getting_started/installation.rst:54 msgid "vLLM Backend" msgstr "vLLM 引擎" #: ../../source/getting_started/installation.rst:55 msgid "" "vLLM is a fast and easy-to-use library for LLM inference and serving. 
" "Xinference will choose vLLM as the backend to achieve better throughput " "when the following conditions are met:" msgstr "vLLM 是一个支持高并发的高性能大模型推理引擎。当满足以下条件时,Xinference 会自动选择 vllm 作为引擎来达到更高的吞吐量:" #: ../../source/getting_started/installation.rst:57 msgid "" "The model format is ``pytorch``, ``gptq``, ``awq``, ``fp4``, ``fp8`` or " "``bnb``." msgstr "模型格式为 ``pytorch`` , ``gptq`` , ``awq`` , ``fp4`` , ``fp8`` 或者 ``bnb`` 。" #: ../../source/getting_started/installation.rst:58 msgid "When the model format is ``pytorch``, the quantization is ``none``." msgstr "当模型格式为 ``pytorch`` 时,量化选项需为 ``none`` 。" #: ../../source/getting_started/installation.rst:59 msgid "When the model format is ``awq``, the quantization is ``Int4``." msgstr "当模型格式为 ``awq`` 时,量化选项需为 ``Int4`` 。" #: ../../source/getting_started/installation.rst:60 msgid "" "When the model format is ``gptq``, the quantization is ``Int3``, ``Int4``" " or ``Int8``." msgstr "当模型格式为 ``gptq`` 时,量化选项需为 ``Int3`` 、 ``Int4`` 或者 ``Int8`` 。" #: ../../source/getting_started/installation.rst:61 msgid "The system is Linux and has at least one CUDA device" msgstr "操作系统为 Linux 并且至少有一个支持 CUDA 的设备" #: ../../source/getting_started/installation.rst:62 msgid "" "The model family (for custom models) / model name (for builtin models) is" " within the list of models supported by vLLM" msgstr "自定义模型的 ``model_family`` 字段和内置模型的 ``model_name`` 字段在 vLLM 的支持列表中。" #: ../../source/getting_started/installation.rst:64 msgid "Currently, supported models include:" msgstr "目前,支持的模型包括:" #: ../../source/getting_started/installation.rst:68 msgid "" "``code-llama``, ``code-llama-instruct``, ``code-llama-python``, " "``deepseek``, ``deepseek-chat``, ``deepseek-coder``, ``deepseek-coder-" "instruct``, ``deepseek-r1-distill-llama``, ``gorilla-openfunctions-v2``, " "``HuatuoGPT-o1-LLaMA-3.1``, ``llama-2``, ``llama-2-chat``, ``llama-3``, " "``llama-3-instruct``, ``llama-3.1``, ``llama-3.1-instruct``, " "``llama-3.3-instruct``, ``tiny-llama``, 
``wizardcoder-python-v1.0``, " "``wizardmath-v1.0``, ``Yi``, ``Yi-1.5``, ``Yi-1.5-chat``, ``Yi-1.5-chat-" "16k``, ``Yi-200k``, ``Yi-chat``" msgstr "" #: ../../source/getting_started/installation.rst:69 msgid "" "``codestral-v0.1``, ``mistral-instruct-v0.1``, ``mistral-instruct-v0.2``," " ``mistral-instruct-v0.3``, ``mistral-large-instruct``, ``mistral-nemo-" "instruct``, ``mistral-v0.1``, ``openhermes-2.5``, ``seallm_v2``" msgstr "" #: ../../source/getting_started/installation.rst:70 msgid "" "``Baichuan-M2``, ``codeqwen1.5``, ``codeqwen1.5-chat``, ``deepseek-r1" "-distill-qwen``, ``DianJin-R1``, ``fin-r1``, ``HuatuoGPT-o1-Qwen2.5``, " "``KAT-V1``, ``marco-o1``, ``qwen1.5-chat``, ``qwen2-instruct``, " "``qwen2.5``, ``qwen2.5-coder``, ``qwen2.5-coder-instruct``, " "``qwen2.5-instruct``, ``qwen2.5-instruct-1m``, ``qwenLong-l1``, ``QwQ-" "32B``, ``QwQ-32B-Preview``, ``seallms-v3``, ``skywork-or1``, ``skywork-" "or1-preview``, ``XiYanSQL-QwenCoder-2504``" msgstr "" #: ../../source/getting_started/installation.rst:71 msgid "``llama-3.2-vision``, ``llama-3.2-vision-instruct``" msgstr "" #: ../../source/getting_started/installation.rst:72 msgid "``baichuan-2``, ``baichuan-2-chat``" msgstr "" #: ../../source/getting_started/installation.rst:73 msgid "``InternLM2ForCausalLM``" msgstr "" #: ../../source/getting_started/installation.rst:74 msgid "``qwen-chat``" msgstr "" #: ../../source/getting_started/installation.rst:75 msgid "" "``mixtral-8x22B-instruct-v0.1``, ``mixtral-instruct-v0.1``, " "``mixtral-v0.1``" msgstr "" #: ../../source/getting_started/installation.rst:76 msgid "``cogagent``" msgstr "" #: ../../source/getting_started/installation.rst:77 msgid "``glm-edge-chat``, ``glm4-chat``, ``glm4-chat-1m``" msgstr "" #: ../../source/getting_started/installation.rst:78 msgid "``codegeex4``, ``glm-4v``" msgstr "" #: ../../source/getting_started/installation.rst:79 msgid "``seallm_v2.5``" msgstr "" #: ../../source/getting_started/installation.rst:80 msgid "``orion-chat``" 
msgstr "" #: ../../source/getting_started/installation.rst:81 msgid "``qwen1.5-moe-chat``, ``qwen2-moe-instruct``" msgstr "" #: ../../source/getting_started/installation.rst:82 msgid "``CohereForCausalLM``" msgstr "" #: ../../source/getting_started/installation.rst:83 msgid "" "``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, ``deepseek-v2.5``, " "``deepseek-vl2``" msgstr "" #: ../../source/getting_started/installation.rst:84 msgid "" "``deepseek-prover-v2``, ``deepseek-r1``, ``deepseek-r1-0528``, " "``deepseek-v3``, ``deepseek-v3-0324``, ``Deepseek-V3.1``, ``moonlight-" "16b-a3b-instruct``" msgstr "" #: ../../source/getting_started/installation.rst:85 msgid "``deepseek-r1-0528-qwen3``, ``qwen3``" msgstr "" #: ../../source/getting_started/installation.rst:86 msgid "``minicpm3-4b``" msgstr "" #: ../../source/getting_started/installation.rst:87 msgid "``internlm3-instruct``" msgstr "" #: ../../source/getting_started/installation.rst:88 msgid "``gemma-3-1b-it``" msgstr "" #: ../../source/getting_started/installation.rst:89 msgid "``glm4-0414``" msgstr "" #: ../../source/getting_started/installation.rst:90 msgid "" "``minicpm-2b-dpo-bf16``, ``minicpm-2b-dpo-fp16``, ``minicpm-2b-dpo-" "fp32``, ``minicpm-2b-sft-bf16``, ``minicpm-2b-sft-fp32``, ``minicpm4``" msgstr "" #: ../../source/getting_started/installation.rst:91 msgid "``Ernie4.5``" msgstr "" #: ../../source/getting_started/installation.rst:92 msgid "``Qwen3-Coder``, ``Qwen3-Instruct``, ``Qwen3-Thinking``" msgstr "" #: ../../source/getting_started/installation.rst:93 msgid "``glm-4.5``" msgstr "" #: ../../source/getting_started/installation.rst:94 msgid "``gpt-oss``" msgstr "" #: ../../source/getting_started/installation.rst:95 msgid "``seed-oss``" msgstr "" #: ../../source/getting_started/installation.rst:96 msgid "``Qwen3-Next-Instruct``, ``Qwen3-Next-Thinking``" msgstr "" #: ../../source/getting_started/installation.rst:97 msgid "``DeepSeek-V3.2``, ``DeepSeek-V3.2-Exp``" msgstr "" #: 
../../source/getting_started/installation.rst:98 msgid "``MiniMax-M2``" msgstr "" #: ../../source/getting_started/installation.rst:101 msgid "To install Xinference and vLLM::" msgstr "安装 xinference 和 vLLM:" #: ../../source/getting_started/installation.rst:114 msgid "Llama.cpp Backend" msgstr "Llama.cpp 引擎" #: ../../source/getting_started/installation.rst:115 msgid "" "Xinference supports models in ``gguf`` format via ``xllamacpp``. " "`xllamacpp `_ is developed by " "Xinference team, and is the sole backend for llama.cpp since v1.6.0." msgstr "" "Xinference 通过 xllamacpp 支持 gguf 格式的模型。`xllamacpp " "`_ 由 Xinference 团队开发,并从 v1.6.0 " "开始成为 llama.cpp 的唯一后端。" #: ../../source/getting_started/installation.rst:121 msgid "" "Since Xinference v1.5.0, ``llama-cpp-python`` is deprecated. Since " "Xinference v1.6.0, ``llama-cpp-python`` has been removed." msgstr "" "自 Xinference v1.5.0 起,``llama-cpp-python`` 被弃用;在 Xinference 从 v1.6.0 " "开始,该后端已被移除。" #: ../../source/getting_started/installation.rst:124 #: ../../source/getting_started/installation.rst:134 #: ../../source/getting_started/installation.rst:143 msgid "Initial setup::" msgstr "初始步骤:" #: ../../source/getting_started/installation.rst:128 msgid "" "For more installation instructions for ``xllamacpp`` to enable GPU " "acceleration, please refer to: https://github.com/xorbitsai/xllamacpp" msgstr "" "更多的 ``xllamacpp`` 安装说明以便开启 GPU " "加速,请参考:https://github.com/xorbitsai/xllamacpp" #: ../../source/getting_started/installation.rst:131 msgid "SGLang Backend" msgstr "SGLang 引擎" #: ../../source/getting_started/installation.rst:132 msgid "" "SGLang has a high-performance inference runtime with RadixAttention. It " "significantly accelerates the execution of complex LLM programs by " "automatic KV cache reuse across multiple calls. And it also supports " "other common techniques like continuous batching and tensor parallelism." 
msgstr "" "SGLang 具有基于 RadixAttention 的高性能推理运行时。它通过在多个调用之间自动重用KV缓存,显著加速了复杂 LLM " "程序的执行。它还支持其他常见推理技术,如连续批处理和张量并行处理。" #: ../../source/getting_started/installation.rst:140 msgid "MLX Backend" msgstr "MLX 引擎" #: ../../source/getting_started/installation.rst:141 msgid "MLX-lm is designed for Apple silicon users to run LLM efficiently." msgstr "MLX-lm 用来在苹果 silicon 芯片上提供高效的 LLM 推理。" #: ../../source/getting_started/installation.rst:148 msgid "Other Platforms" msgstr "其他平台" #: ../../source/getting_started/installation.rst:150 msgid ":ref:`Ascend NPU `" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/getting_started/installation_npu.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. # msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2025-07-07 16:58+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.14.0\n" #: ../../source/getting_started/installation_npu.rst:6 msgid "Installation Guide for Ascend NPU" msgstr "在昇腾 NPU 上安装" #: ../../source/getting_started/installation_npu.rst:7 msgid "Xinference can run on Ascend NPU, follow below instructions to install." msgstr "Xinference 能在昇腾 NPU 上运行,使用如下命令安装。" #: ../../source/getting_started/installation_npu.rst:11 msgid "" "The open-source version relies on Transformers for inference, which can " "be slow on chips like 310p3. We provide an enterprise version that " "supports the MindIE engine, offering better performance and compatibility" " for Ascend NPU. 
Refer to `Xinference Enterprise " "`_" msgstr "" "开源版本依赖 Transformers 进行推理,在 310p3 等芯片上会存在运行慢的问题。" "我们提供了支持 MindIE 引擎,性能更为强大,兼容性更好的企业版本来支持 " "Ascend NPU。详细参考 `Xinference 企业版 `_" #: ../../source/getting_started/installation_npu.rst:18 msgid "Installing PyTorch and Ascend extension for PyTorch" msgstr "安装 PyTorch 和昇腾扩展" #: ../../source/getting_started/installation_npu.rst:19 msgid "Install PyTorch CPU version and corresponding Ascend extension." msgstr "安装 PyTorch CPU 版本和相应的昇腾扩展。" #: ../../source/getting_started/installation_npu.rst:21 msgid "Take PyTorch v2.1.0 as example." msgstr "以 PyTorch v2.1.0 为例。" #: ../../source/getting_started/installation_npu.rst:27 msgid "" "Then install `Ascend extension for PyTorch " "`_." msgstr "接着安装 `昇腾 PyTorch 扩展 `_." #: ../../source/getting_started/installation_npu.rst:35 msgid "Running below command to see if it correctly prints the Ascend NPU count." msgstr "运行如下命令查看,如果正常运行,会打印昇腾 NPU 的个数。" #: ../../source/getting_started/installation_npu.rst:42 msgid "Installing Xinference" msgstr "安装 Xinference" #: ../../source/getting_started/installation_npu.rst:48 msgid "" "Now you can use xinference according to :ref:`doc `. " "``Transformers`` backend is the only available engine supported for " "Ascend NPU for open source version." msgstr "" "现在你可以参考 :ref:`文档 ` 来使用 Xinference。``" "Transformers`` 是开源唯一支持的昇腾 NPU 的引擎。" #: ../../source/getting_started/installation_npu.rst:52 msgid "Enterprise Support" msgstr "企业支持" #: ../../source/getting_started/installation_npu.rst:53 msgid "" "If you encounter any performance or other issues for Ascend NPU, please " "reach out to us via `link `_." msgstr "" "如果你在昇腾 NPU 遇到任何性能和其他问题,欢迎垂询 Xinference 企业版,在 `" "这里 `_ 联系我们" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/getting_started/logging.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. 
# This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2023-11-15 19:32+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.11.0\n" #: ../../source/getting_started/logging.rst:5 msgid "Logging in Xinference" msgstr "日志" #: ../../source/getting_started/logging.rst:8 msgid "Configure Log Level" msgstr "日志等级" #: ../../source/getting_started/logging.rst:9 msgid "" "You can configure the log level with the ``--log-level`` option. For " "example, starting a local cluster with ``DEBUG`` log level:" msgstr "" "你可以通过 ``--log-level`` 选项来配置 Xinference 集群的日志等级。例如,以" " ``DEBUG`` 日志等级启动 Xinference 本地集群:" #: ../../source/getting_started/logging.rst:18 msgid "Log Files" msgstr "日志文件" #: ../../source/getting_started/logging.rst:19 msgid "" "Xinference supports log rotation of log files. By default, logs rotate " "when they reach 100MB (maxBytes), and up to 30 backup files (backupCount)" " are kept. Note that the log level configured above takes effect in both " "the command line logs and the log files." msgstr "" "Xinference 支持滚动日志文件。默认情况下,当单个日志文件达到 100MB 时会" "生成新的日志备份文件,系统会保留最近的30份日志备份。上述配置日志等级的" "方式会同时影响命令行日志和日志文件。" #: ../../source/getting_started/logging.rst:24 msgid "Log Directory Structure" msgstr "日志目录结构" #: ../../source/getting_started/logging.rst:25 msgid "" "All the logs are stored in the ``/logs`` directory, " "where ```` can be configured as mentioned in " ":ref:`using_xinference`." msgstr "" "首先,所有的日志存储在 ``/logs`` 目录中,其中 ``<" "XINFERENCE_HOME>`` 的配置方式请参考 :ref:`using_xinference` 。" #: ../../source/getting_started/logging.rst:27 msgid "" "Xinference creates a subdirectory under the log directory " "``/logs``. 
The name of the subdirectory corresponds to " "the Xinference cluster startup time in milliseconds." msgstr "" "其次,Xinference 在日志目录 ``/logs`` 下创建一个子目录。" "子目录的名称对应于 Xinference 集群启动的时间(以毫秒为单位)。" #: ../../source/getting_started/logging.rst:31 msgid "Local deployment" msgstr "本地部署" #: ../../source/getting_started/logging.rst:32 msgid "" "In a local deployment, the logs of Xinference supervisor and Xinference " "workers are combined into a single file. An example of the log directory " "structure is shown below::" msgstr "" "在本地部署中,Xinference supervisor 和 Xinference workers 的日志被合并到" "一个文件中。日志目录结构如下所示:" #: ../../source/getting_started/logging.rst:38 msgid "" "where ``1699503558105`` is the timestamp when the Xinference cluster was " "created. Therefore, when you create a cluster locally multiple times, you" " can look for the corresponding logs based on this timestamp." msgstr "" "其中,``1699503558105`` 是 Xinference 集群创建时的时间戳。因此,当你在" "本地多次创建集群时,可以根据此时间戳查找相应的日志。" #: ../../source/getting_started/logging.rst:42 msgid "Distributed deployment" msgstr "分布式部署" #: ../../source/getting_started/logging.rst:43 msgid "" "In a distributed deployment, Xinference supervisor and Xinference workers" " each create their own subdirectory under the log directory. The name of " "the subdirectory starts with the role name, followed by the role startup " "time in milliseconds. An example of the log directory structure is shown " "below::" msgstr "" "在分布式部署中,Xinference supervisor 和 Xinference workers 分别在日志" "目录下创建自己的子目录。子目录的名称以集群角色名称开头,然后是启动时间(" "以毫秒为单位)。如下所示:" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/getting_started/release_notes.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2025, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2025. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2026-03-15 14:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.14.0\n" #: ../../source/getting_started/release_notes.rst:4 #: ../../source/getting_started/release_notes.rst:10 msgid "Release Notes" msgstr "版本发布说明" #: ../../source/getting_started/release_notes.rst:6 msgid "" "This page provides a version-by-version index of Xinference release " "notes. For detailed updates, please visit the corresponding links below." msgstr "" "本页面提供 Xinference 各版本的发布说明索引。有关详细更新,请访问对应的" "链接。" #: ../../source/getting_started/release_notes.rst:10 msgid "Version" msgstr "版本" #: ../../source/getting_started/release_notes.rst:12 msgid "v2.3.0" msgstr "" #: ../../source/getting_started/release_notes.rst:12 msgid "`View release notes `_" msgstr "`查看版本说明 `_" #: ../../source/getting_started/release_notes.rst:14 msgid "v2.2.0" msgstr "" #: ../../source/getting_started/release_notes.rst:14 msgid "`View release notes `_" msgstr "`查看版本说明 `_" #: ../../source/getting_started/release_notes.rst:16 msgid "v2.1.0" msgstr "" #: ../../source/getting_started/release_notes.rst:16 msgid "`View release notes `_" msgstr "`查看版本说明 `_" #: ../../source/getting_started/release_notes.rst:18 msgid "v2.0.0" msgstr "" #: ../../source/getting_started/release_notes.rst:18 msgid "`View release notes `_" msgstr "`查看版本说明 `_" #: ../../source/getting_started/release_notes.rst:20 msgid "v1.17.0" msgstr "" #: ../../source/getting_started/release_notes.rst:20 msgid "`View release notes `_" msgstr "`查看版本说明 `_" #: ../../source/getting_started/release_notes.rst:22 msgid "v1.16.0" msgstr "" #: ../../source/getting_started/release_notes.rst:22 msgid "`View release notes `_" 
msgstr "`查看版本说明 `_" #: ../../source/getting_started/release_notes.rst:24 msgid "v1.15.0" msgstr "" #: ../../source/getting_started/release_notes.rst:24 msgid "`View release notes `_" msgstr "`查看版本说明 `_" #: ../../source/getting_started/release_notes.rst:26 msgid "v1.14.0" msgstr "" #: ../../source/getting_started/release_notes.rst:26 msgid "`View release notes `_" msgstr "`查看版本说明 `_" #: ../../source/getting_started/release_notes.rst:28 msgid "v1.13.0" msgstr "" #: ../../source/getting_started/release_notes.rst:28 msgid "`View release notes `_" msgstr "`查看版本说明 `_" #: ../../source/getting_started/release_notes.rst:30 msgid "v1.12.0" msgstr "" #: ../../source/getting_started/release_notes.rst:30 msgid "`View release notes `_" msgstr "`查看版本说明 `_" #: ../../source/getting_started/release_notes.rst:32 msgid "v1.11.0.post1" msgstr "" #: ../../source/getting_started/release_notes.rst:32 msgid "" "`View release notes " "`_" msgstr "`查看版本说明 `_" #: ../../source/getting_started/release_notes.rst:34 msgid "v1.10.1" msgstr "" #: ../../source/getting_started/release_notes.rst:34 msgid "`View release notes `_" msgstr "`查看版本说明 `_" #: ../../source/getting_started/release_notes.rst:39 msgid "" "For older versions and source history, see our GitHub releases page: " "https://github.com/xorbitsai/inference/releases" msgstr "" "更多历史版本及源代码,请访问 GitHub Releases:https://github.com/" "xorbitsai/inference/releases" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/getting_started/troubleshooting.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2026-01-07 15:50+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.14.0\n" #: ../../source/getting_started/troubleshooting.rst:5 msgid "Troubleshooting" msgstr "故障排除" #: ../../source/getting_started/troubleshooting.rst:9 msgid "No huggingface repo access" msgstr "没有 huggingface 仓库权限" #: ../../source/getting_started/troubleshooting.rst:11 msgid "" "Sometimes, you may face errors accessing huggingface models, such as the " "following message when accessing `llama2`:" msgstr "" "获取模型时,有时候会遇到权限问题。比如在获取 ``llama2`` 模型时可能会有" "以下提示:" #: ../../source/getting_started/troubleshooting.rst:18 msgid "" "This typically indicates either a lack of access rights to the repository" " or missing huggingface access tokens. The following sections provide " "guidance on addressing these issues." msgstr "" "这种情况一般是缺少 huggingface 仓库的权限,或者是没有配置 huggingface " "token。可以按照接下来的方式解决这个问题。" #: ../../source/getting_started/troubleshooting.rst:22 msgid "Get access to the huggingface repo" msgstr "申请 huggingface 仓库权限" #: ../../source/getting_started/troubleshooting.rst:24 msgid "" "To obtain access, navigate to the desired huggingface repository and " "agree to its terms and conditions. As an illustration, for the `llama2` " "model, you can use this link: `https://huggingface.co/meta-llama/Llama-2" "-7b-hf `_." msgstr "" "想要获取访问权限,打开对应的 huggingface 仓库,同意其条款和注意事项。以 `" "`llama2`` 为例,可以打开这个链接去申请:`https://huggingface.co/meta-" "llama/Llama-2-7b-hf `_." 
#: ../../source/getting_started/troubleshooting.rst:29 msgid "Set up credentials to access huggingface" msgstr "设置访问 huggingface 凭证" #: ../../source/getting_started/troubleshooting.rst:31 msgid "" "Your credential to access huggingface can be found online at " "`https://huggingface.co/settings/tokens " "`_." msgstr "" "可以在 huggingface 页面找到凭证,`https://huggingface.co/settings/tokens " "`_." #: ../../source/getting_started/troubleshooting.rst:33 msgid "" "You can set the token as an environmental variable, with ``export " "HUGGING_FACE_HUB_TOKEN=your_token_here``." msgstr "" "可以通过设置环境变量设置访问凭证,``export HUGGING_FACE_HUB_TOKEN=your_" "token_here``。" #: ../../source/getting_started/troubleshooting.rst:37 msgid "Incompatibility Between NVIDIA Driver and PyTorch Version" msgstr "英伟达驱动和 PyTorch 版本不匹配" #: ../../source/getting_started/troubleshooting.rst:39 msgid "If you are using a NVIDIA GPU, you may face the following error:" msgstr "如果你在使用英伟达显卡,你可能会遇到以下错误:" #: ../../source/getting_started/troubleshooting.rst:50 msgid "" "This typically indicates that your CUDA driver version is not compatible " "with the PyTorch version you are using." msgstr "这种情况一般是 CUDA 的版本和 Pytorch 版本不兼容导致的。" #: ../../source/getting_started/troubleshooting.rst:52 msgid "" "Go to `https://pytorch.org `_ to install a PyTorch " "version that has been compiled with your version of the CUDA driver. 
**Do" " not install a cuda version smaller than 11.8, preferably between 11.8 " "and 12.1.**" msgstr "" "可以到 `https://pytorch.org `_ 官网安装和 CUDA 对应" "的预编译版本的 PyTorch。同时,**请检查安装的 CUDA 版本不要小于 11.8,最好" "版本在 11.8 到 12.1 之间。**" #: ../../source/getting_started/troubleshooting.rst:55 msgid "" "Say if your CUDA driver version is 11.8, then you can install PyTorch " "with the following command:" msgstr "比如你的 CUDA 版本是 11.8,可以使用以下命令安装对应的 PyTorch:" #: ../../source/getting_started/troubleshooting.rst:63 msgid "" "Xinference service cannot be accessed from external systems through " "``:9997``" msgstr "外部系统无法通过 ``:9997`` 访问 Xinference 服务" #: ../../source/getting_started/troubleshooting.rst:65 msgid "Use ``-H 0.0.0.0`` parameter in when starting Xinference:" msgstr "在启动 Xinference 时记得要加上 ``-H 0.0.0.0`` 参数:" #: ../../source/getting_started/troubleshooting.rst:71 msgid "" "Then Xinference service will listen on all network interfaces (not " "limited to ``127.0.0.1`` or ``localhost``)." msgstr "" "那么 Xinference 服务将监听所有网络接口(而不仅限于 ``127.0.0.1`` 或 ``" "localhost``)。" #: ../../source/getting_started/troubleshooting.rst:73 msgid "" "If you are using the :ref:`using_docker_image`, please add ``-p " ":9997`` during the docker run command, then access is available " "through ``:`` of the local machine." msgstr "" "如果使用的是 :ref:`using_docker_image`,请在 Docker 运行命令中加上 ``-p " ":9997`` ,你就可以通过本地机器的 ``:`` 进行访问。" #: ../../source/getting_started/troubleshooting.rst:78 msgid "" "Launching a built-in model takes a long time, and sometimes the model " "fails to download" msgstr "启动内置模型需要很长时间,模型有时下载失败" #: ../../source/getting_started/troubleshooting.rst:80 msgid "" "Xinference by default uses HuggingFace as the source for models. If your " "machines are in Mainland China, there might be accessibility issues when " "using built-in models."
msgstr "" "Xinference 默认使用 HuggingFace 作为模型源。如果你的机器在中国大陆,使用" "内置模型可能会有访问问题。" #: ../../source/getting_started/troubleshooting.rst:84 msgid "" "To address this, add environment variable " "``XINFERENCE_MODEL_SRC=modelscope`` when starting the Xinference to " "change the model source to ModelScope, which is optimized for Mainland " "China." msgstr "" "要解决这个问题,可以在启动 Xinference 时添加环境变量 ``XINFERENCE_MODEL_" "SRC=modelscope``,将模型源更改为 ModelScope,在中国大陆下载速度更快。" #: ../../source/getting_started/troubleshooting.rst:88 msgid "" "If you’re starting Xinference with Docker, include ``-e XINFERENCE_MODEL" "_SRC=modelscope`` during the docker run command." msgstr "" "如果你用 Docker 启动 Xinference,可以在 Docker 命令中包含 ``-e XINFERENCE" "_MODEL_SRC=modelscope`` 选项。" #: ../../source/getting_started/troubleshooting.rst:92 msgid "" "When using the official Docker image, RayWorkerVllm died due to OOM, " "causing the model to fail to load" msgstr "使用官方 Docker 镜像时,RayWorkerVllm 因 OOM 而终止,导致模型无法加载" #: ../../source/getting_started/troubleshooting.rst:94 msgid "" "Docker's ``--shm-size`` parameter is used to set the size of shared " "memory. The default size of shared memory (/dev/shm) is 64MB, which may " "be too small for vLLM backend." msgstr "" "Docker 的 ``--shm-size`` 参数可以用来设置共享内存的大小。共享内存(/dev/" "shm)的默认大小是 64MB,对于 vLLM 后端来说可能不够。" #: ../../source/getting_started/troubleshooting.rst:98 msgid "" "You can increase its size by setting the ``--shm-size`` parameter as " "follows:" msgstr "你可以通过设置参数 ``--shm-size`` 来增加它的大小:" #: ../../source/getting_started/troubleshooting.rst:106 msgid "Missing ``model_engine`` parameter when launching LLM models" msgstr "加载 LLM 模型时提示缺失 ``model_engine`` 参数" #: ../../source/getting_started/troubleshooting.rst:108 msgid "" "Since version ``v0.11.0``, launching LLM models requires an additional " "``model_engine`` parameter. For specific information, please refer to " ":ref:`here `."
msgstr "" "自 ``v0.11.0`` 版本开始,加载 LLM 模型时需要传入额外参数 ``model_engine``" " 。具体信息请参考 :ref:`这里 ` 。" #: ../../source/getting_started/troubleshooting.rst:112 msgid "Resolving MKL Threading Layer Conflicts" msgstr "解决 MKL 线程层冲突" #: ../../source/getting_started/troubleshooting.rst:114 msgid "" "When starting the Xinference server, you may encounter the error: " "``ValueError: Model architectures ['Qwen2ForCausalLM'] failed to be " "inspected. Please check the logs for more details.``" msgstr "" "在启动 Xinference 服务器时,可能会遇到错误:``ValueError: Model " "architectures ['Qwen2ForCausalLM'] failed to be inspected. Please check" " the logs for more details.``" #: ../../source/getting_started/troubleshooting.rst:116 msgid "The underlying cause shown in the logs is:" msgstr "日志中显示的根本原因是:" #: ../../source/getting_started/troubleshooting.rst:123 msgid "" "This typically occurs when NumPy was installed via conda. Conda's NumPy " "is built with Intel MKL optimizations, which conflicts with the GNU " "OpenMP library (libgomp) already loaded in your environment."
msgstr "" "这通常是因为你的 NumPy 是通过 conda 安装的,而 conda 的 NumPy 是使用 " "Intel MKL 优化构建的,这导致它与环境中已加载的 GNU OpenMP 库(libgomp)" "产生冲突。" #: ../../source/getting_started/troubleshooting.rst:126 msgid "Solution 1: Override the Threading Layer" msgstr "解决方案 1:重写线程层" #: ../../source/getting_started/troubleshooting.rst:128 msgid "Force Intel's Math Kernel Library to use GNU's OpenMP implementation:" msgstr "" "设置 MKL_THREADING_LAYER=GNU 可以强制 Intel 数学核心库(MKL)使用 GNU 的 " "OpenMP 实现:" #: ../../source/getting_started/troubleshooting.rst:135 msgid "Solution 2: Reinstall NumPy with pip" msgstr "解决方案 2:使用 pip 重新安装 NumPy" #: ../../source/getting_started/troubleshooting.rst:137 msgid "Uninstall conda's NumPy and reinstall using pip:" msgstr "卸载 conda 安装的 numpy,然后使用 pip 重新安装。" #: ../../source/getting_started/troubleshooting.rst:146 msgid "Related Note: vLLM and PyTorch" msgstr "相关说明:vLLM 与 PyTorch" #: ../../source/getting_started/troubleshooting.rst:148 msgid "" "If you're using vLLM, avoid installing PyTorch with conda. Refer to the " "official vLLM installation guide for GPU-specific instructions: " "https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html" msgstr "" "如果你在使用 vLLM,请避免通过 conda 安装 PyTorch。有关特定 GPU 的安装说明" ",请参阅 vLLM 官方安装指南:https://docs.vllm.ai/en/latest/getting_" "started/installation/gpu.html" #: ../../source/getting_started/troubleshooting.rst:151 msgid "Configuring PyPI Mirrors to Speed Up Package Installation" msgstr "配置 PyPI 镜像以加快软件包安装速度" #: ../../source/getting_started/troubleshooting.rst:153 msgid "" "If you're in Mainland China, using a PyPI mirror can significantly speed " "up package installation. 
Here are some commonly used mirrors:" msgstr "" "如果你在中国大陆,使用 PyPI 镜像可以显著加快软件包的安装速度。以下是一些" "常用的镜像源:" #: ../../source/getting_started/troubleshooting.rst:155 msgid "Tsinghua University: ``https://pypi.tuna.tsinghua.edu.cn/simple``" msgstr "清华大学镜像:``https://pypi.tuna.tsinghua.edu.cn/simple``" #: ../../source/getting_started/troubleshooting.rst:156 msgid "Alibaba Cloud: ``https://mirrors.aliyun.com/pypi/simple/``" msgstr "阿里云镜像:``https://mirrors.aliyun.com/pypi/simple/``" #: ../../source/getting_started/troubleshooting.rst:157 msgid "Tencent Cloud: ``https://mirrors.cloud.tencent.com/pypi/simple``" msgstr "腾讯云镜像:``https://mirrors.cloud.tencent.com/pypi/simple``" #: ../../source/getting_started/troubleshooting.rst:159 msgid "" "However, be aware that some packages may not be available on certain " "mirrors. For example, if you're installing ``xinference[audio]`` using " "only the Aliyun mirror, the installation may fail." msgstr "" "但请注意,某些镜像源上可能缺少部分软件包。例如,如果你仅使用阿里云镜像" "安装 ``xinference[audio]``,安装可能会失败。" #: ../../source/getting_started/troubleshooting.rst:161 msgid "" "This happens because ``num2words``, a dependency used by ``MeloTTS``, is " "not available on the Aliyun mirror. As a result, ``pip install " "xinference[audio]`` will resolve to older versions like " "``xinference==1.2.0`` and ``xoscar==0.8.0`` (as of Oct 27, 2025)." 
msgstr "" "这是因为 ``MeloTTS`` 所依赖的 ``num2words`` 软件包在阿里云镜像上不可用。" "因此,在执行 ``pip install xinference[audio]`` 时,可能会回退安装旧版本," "如 ``xinference==1.2.0`` 和 ``xoscar==0.8.0`` (截至 2025 年 10 月 27 日" ")。" #: ../../source/getting_started/troubleshooting.rst:163 msgid "" "These older versions are incompatible and will produce the error: " "``MainActorPool.append_sub_pool() got an unexpected keyword argument " "'start_method'``" msgstr "" "这些旧版本不兼容,会导致以下错误:``MainActorPool.append_sub_pool() got " "an unexpected keyword argument 'start_method'``" #: ../../source/getting_started/troubleshooting.rst:174 msgid "" "To avoid this issue when installing the xinference audio package, use " "multiple mirrors:" msgstr "为避免在安装 xinference 音频包时出现此问题,建议同时使用多个镜像源:" #: ../../source/getting_started/troubleshooting.rst:188 msgid "Installing Xinference 1.12.0 with uv Fails (As of November 2025)" msgstr "使用 uv 安装 Xinference 1.12.0 失败(截至 2025 年 11 月)" #: ../../source/getting_started/troubleshooting.rst:190 msgid "" "**Note:** This is a temporary issue due to the current package ecosystem " "and uv prioritizing **higher versions for direct dependencies** over " "**indirect dependencies**." 
msgstr "" "**注意:** 这是一个临时性问题,原因在于当前的软件包生态系统以及 uv 的依赖" "解析策略——它会优先选择 **直接依赖的高版本**,而不是 **间接依赖的版本**。" #: ../../source/getting_started/troubleshooting.rst:193 #: ../../source/getting_started/troubleshooting.rst:250 msgid "Symptom" msgstr "症状" #: ../../source/getting_started/troubleshooting.rst:195 msgid "" "When installing xinference 1.12.0 as of November 2025 using ``uv pip " "install xinference``, you may encounter an issue where very old package " "versions are installed, particularly:" msgstr "" "在 2025 年 11 月使用 ``uv pip install xinference`` 安装 xinference 1.12.0" " 时,你可能会遇到安装到非常旧版本依赖包的问题,尤其是:" #: ../../source/getting_started/troubleshooting.rst:197 msgid "``transformers==4.12.2`` (from 2021)" msgstr "``transformers==4.12.2`` (来自 2021 年的版本)" #: ../../source/getting_started/troubleshooting.rst:198 msgid "``tokenizers==0.10.3`` (from 2021)" msgstr "``tokenizers==0.10.3`` (来自 2021 年的版本)" #: ../../source/getting_started/troubleshooting.rst:199 msgid "``huggingface-hub==1.0.1``" msgstr "" #: ../../source/getting_started/troubleshooting.rst:201 msgid "Then uv fails with \"Failed to build `tokenizers==0.10.3`\"" msgstr "" "随后 uv 报错:\"Failed to build `tokenizers==0.10.3`\"(构建 `tokenizers=" "=0.10.3` 失败)" #: ../../source/getting_started/troubleshooting.rst:204 #: ../../source/getting_started/troubleshooting.rst:261 msgid "Root Cause" msgstr "根本原因" #: ../../source/getting_started/troubleshooting.rst:206 msgid "" "This occurs because uv prioritizes **higher versions for direct " "dependencies** over **indirect dependencies**:" msgstr "" "出现该问题的原因是 uv 会优先选择 **直接依赖的高版本**,而忽略 **间接依赖*" "* 中的版本要求:" #: ../../source/getting_started/troubleshooting.rst:208 msgid "" "xinference 1.12.0 specifies ``huggingface-hub>=0.19.4`` as a **direct " "dependency** (no upper bound)" msgstr "" "xinference 1.12.0 将 ``huggingface-hub>=0.19.4`` 指定为 **直接依赖** (" "没有上限约束)" #: ../../source/getting_started/troubleshooting.rst:209 msgid "uv selects the latest: ``huggingface-hub==1.0.1`` as of November 06 
2025" msgstr "截至 2025 年 11 月 6 日,uv 会选择最新版本:``huggingface-hub==1.0.1``" #: ../../source/getting_started/troubleshooting.rst:210 msgid "" "However, ``transformers<=4.57.3`` (an **indirect dependency** via " "``peft``) requires ``huggingface-hub<1.0``" msgstr "" "然而,``transformers<=4.57.3`` (通过 ``peft`` 引入的 **间接依赖** )要求 ``" "huggingface-hub<1.0``" #: ../../source/getting_started/troubleshooting.rst:211 msgid "" "To resolve the conflict, uv keeps the direct dependency at 1.0.1 and " "downgrades the indirect dependency ``transformers`` to ancient version " "4.12.2" msgstr "" "为了解决依赖冲突,uv 保留了直接依赖 ``huggingface-hub==1.0.1``,并将间接" "依赖 ``transformers`` 降级到了非常旧的版本 4.12.2。" #: ../../source/getting_started/troubleshooting.rst:213 msgid "" "**This is by design in uv**: it prioritizes what you explicitly ask for " "(direct dependencies) over transitive dependencies. Refer to " "https://github.com/astral-sh/uv/issues/16601" msgstr "" "**这属于 uv 的设计特性**:它会优先满足你显式指定的依赖(直接依赖),而非" "传递依赖。参考链接:https://github.com/astral-sh/uv/issues/16601" #: ../../source/getting_started/troubleshooting.rst:215 msgid "" "**Update:** The latest transformers 4.57.3 (as in 2026.01.05) still " "requires ``huggingface-hub<1.0``." msgstr "" "**更新:** 截至 2026.01.05,transformers 最新版本 4.57.3 依然" "依赖 ``huggingface-hub<1.0``。" #: ../../source/getting_started/troubleshooting.rst:218 msgid "Solutions" msgstr "解决方案" #: ../../source/getting_started/troubleshooting.rst:220 msgid "**Solution 1: Pre-constrain huggingface-hub (Recommended)**" msgstr "**解决方案 1:预先限定 huggingface-hub 版本(推荐)**" #: ../../source/getting_started/troubleshooting.rst:222 msgid "Explicitly constrain ``huggingface-hub`` to a compatible version range:" msgstr "显式地将 ``huggingface-hub`` 限定在一个兼容的版本范围内:" #: ../../source/getting_started/troubleshooting.rst:228 msgid "" "This forces uv to select a ``huggingface-hub`` version that's compatible " "with modern ``transformers``."
msgstr "" "这样可以强制 uv 选择与现代版本 ``transformers`` 兼容的 ``huggingface-hub`" "` 版本。" #: ../../source/getting_started/troubleshooting.rst:230 msgid "**Solution 2: Make transformers a direct dependency**" msgstr "**解决方案 2:将 transformers 设为直接依赖**" #: ../../source/getting_started/troubleshooting.rst:232 msgid "" "By specifying ``transformers`` explicitly, it becomes a direct dependency" " and uv will prefer higher versions:" msgstr "通过显式指定 ``transformers``,它会成为直接依赖,uv 将优先选择更高版本:" #: ../../source/getting_started/troubleshooting.rst:238 msgid "**Solution 3: Use pip**" msgstr "**解决方案 3:使用 pip**" #: ../../source/getting_started/troubleshooting.rst:240 msgid "" "Or just resort to using ``pip install xinference`` which will resolve to " "the following versions" msgstr "或者直接使用 ``pip install xinference``,它会自动解析到以下版本组合:" #: ../../source/getting_started/troubleshooting.rst:242 msgid "``transformers==4.57.1``" msgstr "" #: ../../source/getting_started/troubleshooting.rst:243 msgid "``huggingface-hub==0.36.0``" msgstr "" #: ../../source/getting_started/troubleshooting.rst:244 msgid "``tokenizers==0.22.1``" msgstr "" #: ../../source/getting_started/troubleshooting.rst:247 msgid "vLLM + Torch + Xinference Compatibility Issue (Segmentation Fault)" msgstr "vLLM + Torch + Xinference 兼容性问题(段错误)" #: ../../source/getting_started/troubleshooting.rst:252 msgid "" "If you have **vLLM < 0.12.0** installed and upgrade xinference " "(particularly using ``uv pip install -U xinference``), xinference may " "fail to start with a segmentation fault:" msgstr "" "如果你安装的是 **vLLM < 0.12.0**,并且升级了 xinference " "(尤其是使用 ``uv pip install -U xinference`` 时)," "xinference 可能会在启动时因为段错误而失败:" #: ../../source/getting_started/troubleshooting.rst:263 msgid "This issue has three contributing factors:" msgstr "该问题由三个因素共同导致:" #: ../../source/getting_started/troubleshooting.rst:265 msgid "" "**Binary Incompatibility**: vLLM versions before 0.12.0 were compiled " "against PyTorch 2.8.0. 
These versions are incompatible with PyTorch 2.9. " "Reference: `vLLM v0.12.0 Release Notes `_" msgstr "" "**二进制不兼容**:vLLM 在 0.12.0 之前的版本是基于 PyTorch 2.8.0 编译的," "这些版本与 PyTorch 2.9 不兼容。" "参考:`vLLM v0.12.0 发布说明 `_" #: ../../source/getting_started/troubleshooting.rst:267 msgid "" "**Xinference's Unbounded Torch Dependency**: Xinference's ``setup.cfg`` " "does not specify an upper bound for PyTorch:" msgstr "" "**Xinference 对 Torch 依赖未设置上限**:Xinference 的 ``setup.cfg`` " "中没有为 PyTorch 指定版本上限:" #: ../../source/getting_started/troubleshooting.rst:275 msgid "This allows package managers to upgrade PyTorch to incompatible versions." msgstr "" #: ../../source/getting_started/troubleshooting.rst:277 msgid "**Different Package Manager Behaviors**:" msgstr "**不同包管理器的行为差异**:" #: ../../source/getting_started/troubleshooting.rst:279 msgid "" "**pip**: Conservative - only upgrades the specified package unless " "dependencies are incompatible" msgstr "" "**pip**:较为保守 —— 仅在依赖不兼容时,才会升级相关依赖,否则只升级指定的包" #: ../../source/getting_started/troubleshooting.rst:280 msgid "" "**uv with -U flag**: Aggressive - re-resolves ALL dependencies and picks " "latest versions" msgstr "" "**使用 -U 参数的 uv**:策略较为激进 —— 会重新解析**所有**依赖,并选择最新版本" #: ../../source/getting_started/troubleshooting.rst:283 msgid "" "Therefore before you're ready to upgrade your entire stack and just want " "to upgrade xinference, use either:" msgstr "" "因此,在你尚未准备好升级整个技术栈、而只是想升级 xinference 时,可以选择使用:" #: ../../source/getting_started/troubleshooting.rst:285 msgid "" "``pip install -U xinference`` (keeps PyTorch unchanged, only upgrades " "xinference)" msgstr "``pip install -U xinference`` (保持 PyTorch 版本不变,仅升级 xinference)" #: ../../source/getting_started/troubleshooting.rst:286 msgid "" "``uv pip install \"xinference==1.16.0\"`` (without -U flag, only upgrades" " xinference too)" msgstr "" "``uv pip install \"xinference==1.16.0\"`` (不使用 -U 参数,同样只会升级" " xinference)" ================================================ FILE: 
doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_docker_image.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2026-01-29 16:49+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.17.0\n" #: ../../source/getting_started/using_docker_image.rst:5 msgid "Xinference Docker Image" msgstr "Docker 镜像" #: ../../source/getting_started/using_docker_image.rst:7 msgid "Xinference provides official images for use on Dockerhub." msgstr "Xinference 在 Dockerhub 和 阿里云容器镜像服务 中上传了官方镜像。" #: ../../source/getting_started/using_docker_image.rst:11 msgid "" "Starting from **Xinference v2.0**, to use the CUDA version of the image, " "the minimum CUDA version must be **CUDA 12.9**." msgstr "" "从 **Xinference v2.0** 开始,如果要使用cuda版本的镜像,cuda版本最低要达到 **CUDA 12.9** 。" #: ../../source/getting_started/using_docker_image.rst:14 msgid "Prerequisites" msgstr "准备工作" #: ../../source/getting_started/using_docker_image.rst:15 msgid "" "The image can only run in an environment with GPUs and CUDA installed, " "because Xinference in the image relies on Nvidia GPUs for acceleration." msgstr "Xinference 使用 GPU 加速推理,该镜像需要在有 GPU 显卡并且安装 CUDA 的机器上运行。" #: ../../source/getting_started/using_docker_image.rst:16 msgid "" "CUDA must be successfully installed on the host machine. This can be " "determined by whether you can successfully execute the ``nvidia-smi`` " "command." 
msgstr "保证 CUDA 在机器上正确安装。可以使用 ``nvidia-smi`` 检查是否正确运行。" #: ../../source/getting_started/using_docker_image.rst:17 msgid "" "For CUDA version >= 12.9, CUDA version in the docker image is ``12.9``, " "and the CUDA version on the host machine should be ``12.9`` or above, and" " the NVIDIA driver version should be ``575`` or above." msgstr "" "对于 CUDA 版本 >= 12.9,Docker 镜像中使用的 CUDA 版本为 ``12.9``。宿主机上的 CUDA 版本需为 " "``12.9`` 或以上,同时 NVIDIA 驱动版本需为 ``575`` 或以上。" #: ../../source/getting_started/using_docker_image.rst:18 msgid "" "Ensure `NVIDIA Container Toolkit `_ installed." msgstr "" "请确保已安装 `NVIDIA Container Toolkit `_ 。" #: ../../source/getting_started/using_docker_image.rst:22 msgid "Docker Image" msgstr "Docker 镜像" #: ../../source/getting_started/using_docker_image.rst:23 msgid "" "The official image of Xinference is available on DockerHub in the " "repository ``xprobe/xinference``. Available tags include:" msgstr "Xinference 官方镜像已发布在 DockerHub 上的 ``xprobe/xinference`` 仓库中。当前可用的标签包括:" #: ../../source/getting_started/using_docker_image.rst:26 msgid "" "``nightly-main``: This image is built daily from the `GitHub main branch " "`_ and generally does not " "guarantee stability." msgstr "``nightly-main``: 这个镜像会每天从 GitHub main 分支更新制作,不保证稳定可靠。" #: ../../source/getting_started/using_docker_image.rst:27 msgid "" "``v``: This image is built each time a Xinference " "release version is published, and it is typically more stable." msgstr "``v``: 这个镜像会在 Xinference 每次发布的时候制作,通常可以认为是稳定可靠的。" #: ../../source/getting_started/using_docker_image.rst:28 msgid "" "``latest``: This image is built with the latest Xinference release " "version." msgstr "``latest``: 这个镜像会在 Xinference 发布时指向最新的发布版本" #: ../../source/getting_started/using_docker_image.rst:29 msgid "For CPU version, add ``-cpu`` suffix, e.g. ``nightly-main-cpu``." 
msgstr "对于 CPU 版本,增加 ``-cpu`` 后缀,如 ``nightly-main-cpu``。" #: ../../source/getting_started/using_docker_image.rst:33 msgid "Dockerfile for custom build" msgstr "自定义镜像" #: ../../source/getting_started/using_docker_image.rst:34 msgid "" "If you need to build the Xinference image according to your own " "requirements, the source code for the Dockerfile is located at " "`xinference/deploy/docker/Dockerfile " "`_" " for reference. Please make sure to be in the top-level directory of " "Xinference when using this Dockerfile. For example:" msgstr "" "如果需要安装额外的依赖,可以参考 `xinference/deploy/docker/Dockerfile " "`_" " 。请确保使用 Dockerfile 制作镜像时在 Xinference 项目的根目录下。比如:" #: ../../source/getting_started/using_docker_image.rst:45 msgid "Image usage" msgstr "使用镜像" #: ../../source/getting_started/using_docker_image.rst:46 msgid "" "You can start Xinference in the container like this, simultaneously " "mapping port 9997 in the container to port 9998 on the host, enabling " "debug logging, and downloading models from modelscope." msgstr "" "你可以使用如下方式在容器内启动 Xinference,同时将 9997 端口映射到宿主机的 9998 端口,并且指定日志级别为 " "DEBUG,也可以指定需要的环境变量。" #: ../../source/getting_started/using_docker_image.rst:54 msgid "" "The option ``--gpus`` is essential and cannot be omitted, because as " "mentioned earlier, the image requires the host machine to have a GPU. " "Otherwise, errors will occur." msgstr "``--gpus`` 必须指定,正如前文描述,镜像必须运行在有 GPU 的机器上,否则会出现错误。" #: ../../source/getting_started/using_docker_image.rst:55 msgid "" "The ``-H 0.0.0.0`` parameter after the ``xinference-local`` command " "cannot be omitted. Otherwise, the host machine may not be able to access " "the port inside the container." msgstr "``-H 0.0.0.0`` 也是必须指定的,否则在容器外无法连接到 Xinference 服务。" #: ../../source/getting_started/using_docker_image.rst:56 msgid "" "You can add multiple ``-e`` options to introduce multiple environment " "variables." 
msgstr "可以指定多个 ``-e`` 选项赋值多个环境变量。" #: ../../source/getting_started/using_docker_image.rst:59 msgid "" "Certainly, if you prefer, you can also manually enter the docker " "container and start Xinference in any desired way." msgstr "当然,也可以运行容器后,进入容器内手动拉起 Xinference。" #: ../../source/getting_started/using_docker_image.rst:63 msgid "" "For multiple GPUs, make sure to set the shared memory size, for example: " "`docker run --shm-size=128g ...`" msgstr "对于多张 GPU,确保设置共享内存大小,例如:`docker run --shm-size=128g ...`" #: ../../source/getting_started/using_docker_image.rst:67 msgid "Mount your volume for loading and saving models" msgstr "挂载模型目录" #: ../../source/getting_started/using_docker_image.rst:68 msgid "" "The image does not contain any model files by default, and it downloads " "the models into the container. Typically, you would need to mount a " "directory on the host machine to the docker container, so that Xinference" " can download the models onto it, allowing for reuse. In this case, you " "need to specify a volume when running the Docker image and configure " "environment variables for Xinference:" msgstr "" "默认情况下,镜像中不包含任何模型文件,使用过程中会在容器内下载模型。如果需要使用已经下载好的模型,需要将宿主机的目录挂载到容器内。这种情况下,需要在运行容器时指定本地卷,并且为" " Xinference 配置环境变量。" #: ../../source/getting_started/using_docker_image.rst:77 msgid "" "The principle behind the above command is to mount the specified " "directory from the host machine into the container, and then set the " "``XINFERENCE_HOME`` environment variable to point to that directory " "inside the container. This way, all downloaded model files will be stored" " in the directory you specified on the host machine. You don't have to " "worry about losing them when the Docker container stops, and the next " "time you run it, you can directly use the existing models without the " "need for repetitive downloads." 
msgstr "" "上述命令的原理是将主机上指定的目录挂载到容器中,并设置 ``XINFERENCE_HOME`` " "环境变量指向容器内的该目录。这样,所有下载的模型文件将存储在您在主机上指定的目录中。您无需担心在 Docker " "容器停止时丢失这些文件,下次运行容器时,您可以直接使用现有的模型,无需重复下载。" #: ../../source/getting_started/using_docker_image.rst:81 msgid "" "If you downloaded the model using the default path on the host machine, " "and since the xinference cache directory stores the model using symbolic " "links, you need to mount the directory where the original file is located" " into the container as well. For example, if you are using HuggingFace " "and Modelscope as model hub, you would need to mount the corresponding " "directories into the container. Generally, the cache directories for " "HuggingFace and Modelscope are located at /.cache/huggingface " "and /.cache/modelscope. The command would be like:" msgstr "" "如果你在宿主机使用的默认路径下载的模型,由于 xinference cache " "目录是用的软链的方式存储模型,需要将原文件所在的目录也挂载到容器内。例如你使用 huggingface 和 modelscope " "作为模型仓库,那么需要将这两个对应的目录挂载到容器内,一般对应的 cache 目录分别在 " "/.cache/huggingface 和 /.cache/modelscope,使用的命令如下:" #~ msgid "" #~ "For CUDA 12.8, add ``-cu128`` suffix," #~ " e.g. ``nightly-main-cu128``. (Xinference" #~ " version should be between v1.8.1 and" #~ " v1.15.0)" #~ msgstr "" #~ "对于 CUDA 12.8 版本,增加 ``-cu128`` 后缀,如 " #~ "``nightly-main-cu128`` 。(Xinference 版本需要介于 " #~ "v1.8.1 和 v1.15.0)" #~ msgid "" #~ "Starting from **Xinference v2.0**, only " #~ "``-cu129`` and ``-cpu`` images are " #~ "officially provided." #~ msgstr "" #~ msgid "" #~ "Starting from **Xinference v2.0**, only " #~ "two image variants are provided: the " #~ "default (no suffix, **CUDA 12.9**) and" #~ " the ``-cpu`` image." #~ msgstr "" #~ "从 **Xinference v2.0** 开始,仅提供两种镜像变体:默认镜像(无后缀, " #~ "**CUDA 12.9** )和 ``-cpu`` 镜像。" #~ msgid "" #~ "For CUDA 12.9, no suffix, e.g. " #~ "``nightly-main``. 
(Xinference version should " #~ "be v2.0 at least)" #~ msgstr "对于 CUDA 12.9,不带后缀,例如 ``nightly-main`` 。(Xinference 版本应至少为 v2.0)" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_kubernetes.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-08-02 15:15+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.11.0\n" #: ../../source/getting_started/using_kubernetes.rst:5 msgid "Xinference on Kubernetes" msgstr "在 Kubernetes 集群中安装 Xinference" #: ../../source/getting_started/using_kubernetes.rst:9 msgid "Helm Support" msgstr "基于原生 Helm 的方式" #: ../../source/getting_started/using_kubernetes.rst:10 msgid "" "Xinference provides a method for installation in a Kubernetes cluster via" " ``Helm`` ." msgstr "" "Xinference 提供基于原生 Helm 在 Kubernetes 集群中安装的方式。" #: ../../source/getting_started/using_kubernetes.rst:14 msgid "Prerequisites" msgstr "" "准备条件" #: ../../source/getting_started/using_kubernetes.rst:15 msgid "You have a fully functional Kubernetes cluster." msgstr "" "一个可用的 Kubernetes 集群。" #: ../../source/getting_started/using_kubernetes.rst:16 msgid "" "Enable GPU support in Kubernetes, refer to `here " "`_." msgstr "" "在 Kubernetes 中开启 GPU 支持,参考 `这里 `_ 。" #: ../../source/getting_started/using_kubernetes.rst:17 msgid "``Helm`` is correctly installed." msgstr "" "正确安装 ``Helm`` 。" #: ../../source/getting_started/using_kubernetes.rst:21 msgid "Steps" msgstr "" "具体步骤" #: ../../source/getting_started/using_kubernetes.rst:22 msgid "Add xinference helm repo." 
msgstr "新增 Xinference Helm 仓库" #: ../../source/getting_started/using_kubernetes.rst:28 msgid "Update xinference helm repo indexes and query versions." msgstr "更新仓库索引,查询可安装版本" #: ../../source/getting_started/using_kubernetes.rst:35 msgid "Install" msgstr "安装" #: ../../source/getting_started/using_kubernetes.rst:43 msgid "Customized Installation" msgstr "自定义安装" #: ../../source/getting_started/using_kubernetes.rst:44 msgid "" "The installation method mentioned above sets up a Xinference cluster " "similar to a single-machine setup, with only one worker and all startup " "parameters at their default values. However, this is usually not the " "desired setup." msgstr "" "上述安装方式安装了一个类似单机的 Xinference 集群,也就是只有一个 worker 节点,同时其他启动参数均保持默认。但这通常不是理想的配置。" #: ../../source/getting_started/using_kubernetes.rst:48 msgid "Below are some common custom installation configurations." msgstr "下面展示了一些常见的自定义安装配置。" #: ../../source/getting_started/using_kubernetes.rst:50 msgid "I need to download models from ``ModelScope``." msgstr "我需要从 ``ModelScope`` 下载模型。" #: ../../source/getting_started/using_kubernetes.rst:56 msgid "" "I want to use cpu image of xinference (or use any other version of " "xinference images)." msgstr "我想使用 CPU 版本的 Xinference 镜像(或者其他版本的镜像)。" #: ../../source/getting_started/using_kubernetes.rst:62 msgid "I want to have 4 Xinference workers, with each worker managing 4 GPUs." msgstr "我需要启动 4 个 Xinference worker 节点,每个 worker 管理 4 个 GPU。" #: ../../source/getting_started/using_kubernetes.rst:68 msgid "" "The above installation method is based on Helm ``--set`` option. For more" " complex custom installations, such as multiple workers with shared " "storage, it is highly recommended to use your own ``values.yaml`` file " "with Helm ``-f`` option for installation."
msgstr "" "上面的安装方式基于 Helm ``--set`` 选项。对于更加复杂的自定义安装场景,例如多个 worker 共享存储," "非常推荐使用你自己的 ``values.yaml`` 文件,然后通过 Helm ``-f`` 选项进行安装。" #: ../../source/getting_started/using_kubernetes.rst:72 msgid "" "The default ``values.yaml`` file is located `here " "`_. Some examples can be " "found `here `_." msgstr "" "默认安装方式对应的 ``values.yaml`` 文件位于 `这里 `_ 。" "而 `这里 `_ 有一些示例供参考。" #: ../../source/getting_started/using_kubernetes.rst:78 msgid "KubeBlocks Support" msgstr "基于第三方 KubeBlocks 的方式" #: ../../source/getting_started/using_kubernetes.rst:79 msgid "" "You can also install Xinference in Kubernetes using the third-party " "``KubeBlocks``. This method is not maintained by Xinference and does not " "guarantee timely updates or availability. Please refer to the " "documentation at `here `_." msgstr "" "你也可以通过第三方的 ``KubeBlocks`` 来在 K8s 集群中安装 Xinference 。这种方式不是 Xinference 官方维护的," "因此无法严格保证实时更新和可用性。请参考 `文档 `_ 。" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_xinference.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2025-12-29 12:19+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.14.0\n" #: ../../source/getting_started/using_xinference.rst:5 msgid "Using Xinference" msgstr "使用" #: ../../source/getting_started/using_xinference.rst:9 msgid "Run Xinference Locally" msgstr "本地运行 Xinference" #: ../../source/getting_started/using_xinference.rst:11 msgid "" "Let's start by running Xinference on a local machine and running a " "classic LLM model: ``qwen2.5-instruct``." msgstr "" "让我们以一个经典的大语言模型 ``qwen2.5-instruct`` 来展示如何在本地用 " "Xinference 运行大模型。" #: ../../source/getting_started/using_xinference.rst:13 msgid "" "After this quickstart, you will move on to learning how to deploy " "Xinference in a cluster environment." msgstr "" "在这个快速入门之后,可以继续学习如何在一个分布式集群环境下部署 Xinference" "。" #: ../../source/getting_started/using_xinference.rst:16 msgid "Start Local Server" msgstr "拉起本地服务" #: ../../source/getting_started/using_xinference.rst:18 msgid "" "First, please ensure that you have installed Xinference according to the " "instructions provided :ref:`here `. 
To start a local " "instance of Xinference, run the following command:" msgstr "" "首先,请根据这个 :ref:`文档 ` 的指导确保本地安装了 " "Xinference。使用以下命令拉起本地的 Xinference 服务:" #: ../../source/getting_started/using_xinference.rst:23 #: ../../source/getting_started/using_xinference.rst:65 msgid "shell" msgstr "shell" #: ../../source/getting_started/using_xinference.rst:29 #: ../../source/getting_started/using_xinference.rst:71 msgid "output" msgstr "输出" #: ../../source/getting_started/using_xinference.rst:39 msgid "" "By default, Xinference uses ``/.xinference`` as home path to store " "necessary files such as logs and models, where ```` is the home " "path of current user." msgstr "" "默认情况下,Xinference 会使用 ``/.xinference`` 作为主目录来存储一些" "必要的信息,比如日志文件和模型文件,其中 ```` 就是当前用户的主目录" "。" #: ../../source/getting_started/using_xinference.rst:42 msgid "" "You can change this directory by configuring the environment variable " "``XINFERENCE_HOME``. For example:" msgstr "你可以通过配置环境变量 ``XINFERENCE_HOME`` 修改主目录, 比如:" #: ../../source/getting_started/using_xinference.rst:49 msgid "" "Congrats! You now have Xinference running on your local machine. Once " "Xinference is running, there are multiple ways we can try it: via the web" " UI, via cURL, via the command line, or via the Xinference's python " "client." msgstr "" "恭喜!你已经在本地拉起了 Xinference 服务。一旦 Xinference 服务运行起来," "可以有多种方式来使用,包括使用网页、cURL 命令、命令行或者是 Xinference 的" " Python SDK。" #: ../../source/getting_started/using_xinference.rst:52 msgid "" "You can visit the web UI at `http://127.0.0.1:9997/ui " "`_ and visit `http://127.0.0.1:9997/docs " "`_ to inspect the API docs." 
msgstr "" "可以通过访问 `http://127.0.0.1:9997/ui `_ 来" "使用 UI,访问 `http://127.0.0.1:9997/docs `_ " "来查看 API 文档。" #: ../../source/getting_started/using_xinference.rst:55 msgid "" "You can install the Xinference command line tool and Python client using " "the following command:" msgstr "可以通过以下命令安装后,利用 Xinference 命令行工具或者 Python 代码来使用:" #: ../../source/getting_started/using_xinference.rst:61 msgid "" "The command line tool is ``xinference``. You can list the commands that " "can be used by running:" msgstr "命令行工具是 ``xinference``。可以通过以下命令查看有哪些可以使用的命令:" #: ../../source/getting_started/using_xinference.rst:102 msgid "" "You can install the Xinference Python client with minimal dependencies " "using the following command. Please ensure that the version of the client" " matches the version of the Xinference server." msgstr "" "如果只需要安装 Xinference 的 Python SDK,可以使用以下命令安装最少依赖。" "需要注意的是版本必须和 Xinference 服务的版本保持匹配。" #: ../../source/getting_started/using_xinference.rst:112 msgid "About Model Engine" msgstr "关于模型的推理引擎" #: ../../source/getting_started/using_xinference.rst:113 msgid "" "Since ``v0.11.0`` , before launching the LLM model, you need to specify " "the inference engine you want to run. Currently, xinference supports the " "following inference engines:" msgstr "" "自 ``v0.11.0`` 版本开始,在加载 LLM 模型之前,你需要指定具体的推理引擎。" "当前,Xinference 支持以下推理引擎:" #: ../../source/getting_started/using_xinference.rst:116 msgid "``vllm``" msgstr "" #: ../../source/getting_started/using_xinference.rst:117 msgid "``sglang``" msgstr "" #: ../../source/getting_started/using_xinference.rst:118 msgid "``llama.cpp``" msgstr "" #: ../../source/getting_started/using_xinference.rst:119 msgid "``transformers``" msgstr "" #: ../../source/getting_started/using_xinference.rst:120 msgid "``MLX``" msgstr "" #: ../../source/getting_started/using_xinference.rst:122 msgid "" "About the details of these inference engine, please refer to :ref:`here " "`." 
msgstr "关于这些推理引擎的详细信息,请参考 :ref:`这里 ` 。" #: ../../source/getting_started/using_xinference.rst:124 msgid "" "Note that when launching a LLM model, the ``model_format`` and " "``quantization`` of the model you want to launch is closely related to " "the inference engine." msgstr "" "注意,当加载 LLM 模型时,所能运行的引擎与 ``model_format`` 和 ``" "quantization`` 参数息息相关。" #: ../../source/getting_started/using_xinference.rst:127 msgid "" "You can use ``xinference engine`` command to query the combination of " "parameters of the model you want to launch. This will demonstrate under " "what conditions a model can run on which inference engines." msgstr "Xinference 提供了 ``xinference engine`` 命令帮助你查询相关的参数组合。" #: ../../source/getting_started/using_xinference.rst:130 msgid "For example:" msgstr "例如:" #: ../../source/getting_started/using_xinference.rst:132 msgid "" "I would like to query about which inference engines the ``qwen-chat`` " "model can run on, and what are their respective parameters." msgstr "" "我想查询与 ``qwen-chat`` 模型相关的参数组合,以决定它能够怎样跑在各种推理" "引擎上。" #: ../../source/getting_started/using_xinference.rst:138 msgid "" "I want to run ``qwen-chat`` with ``VLLM`` as the inference engine, but I " "don't know how to configure the other parameters." msgstr "" "我想将 ``qwen-chat`` 跑在 ``VLLM`` 推理引擎上,但是我不知道什么样的其他" "参数符合这个要求。" #: ../../source/getting_started/using_xinference.rst:144 msgid "" "I want to launch the ``qwen-chat`` model in the ``GGUF`` format, and I " "need to know how to configure the remaining parameters." msgstr "我想加载 ``GGUF`` 格式的 ``qwen-chat`` 模型,我需要知道其余的参数组合。" #: ../../source/getting_started/using_xinference.rst:151 msgid "" "In summary, compared to previous versions, when launching LLM models, you" " need to additionally pass the ``model_engine`` parameter. You can " "retrieve information about the supported inference engines and their " "related parameter combinations through the ``xinference engine`` command." 
msgstr "" "总之,相比于之前的版本,当加载 LLM 模型时,需要额外传入 ``model_engine`` " "参数。你可以通过 ``xinference engine`` 命令查询你想运行的推理引擎与其他" "参数组合的关系。" #: ../../source/getting_started/using_xinference.rst:158 msgid "Here are some recommendations on when to use which engine:" msgstr "关于何时使用什么引擎,以下是一些建议:" #: ../../source/getting_started/using_xinference.rst:160 msgid "**Linux**" msgstr "" #: ../../source/getting_started/using_xinference.rst:162 msgid "" "When possible, prioritize using **vLLM** or **SGLang** for better " "performance." msgstr "在能使用的情况下,优先使用 **vLLM** 或 **SGLang**,因为它们有更好的性能。" #: ../../source/getting_started/using_xinference.rst:163 msgid "" "If resources are limited, consider using **llama.cpp**, as it offers more" " quantization options." msgstr "如果资源有限,可以考虑使用 **llama.cpp**,因为它提供了更多的量化选项。" #: ../../source/getting_started/using_xinference.rst:164 msgid "" "For other cases, consider using **Transformers**, which supports nearly " "all models." msgstr "其他情况考虑使用 **Transformers**,它几乎支持所有的模型。" #: ../../source/getting_started/using_xinference.rst:166 msgid "**Windows**" msgstr "" #: ../../source/getting_started/using_xinference.rst:168 msgid "" "It is recommended to use **WSL**, and in this case, follow the same " "choices as Linux." msgstr "推荐使用 **WSL**,这个时候选择和 Linux 一致。" #: ../../source/getting_started/using_xinference.rst:169 msgid "" "Otherwise, prefer **llama.cpp**, and for unsupported models, opt for " "**Transformers**." msgstr "" "其他时候推荐使用 **llama.cpp**,对于不支持的模型,选择使用 **Transformers" "**。" #: ../../source/getting_started/using_xinference.rst:171 msgid "**Mac**" msgstr "" #: ../../source/getting_started/using_xinference.rst:173 msgid "" "If supported by the model, use the **MLX engine**, as it delivers the " "best performance." msgstr "在模型支持的情况下,推荐使用 **MLX 引擎**,它有着最好的性能。" #: ../../source/getting_started/using_xinference.rst:174 msgid "" "For other cases, prefer **llama.cpp**, and for unsupported models, choose" " **Transformers**." 
msgstr "" "其他时候推荐使用 **llama.cpp**,对于不支持的模型,选择使用 **Transformers" "**。" #: ../../source/getting_started/using_xinference.rst:178 msgid "Run qwen2.5-instruct" msgstr "运行 qwen2.5-instruct" #: ../../source/getting_started/using_xinference.rst:180 msgid "" "Let's start by running a built-in model: ``qwen2.5-instruct``. When you " "start a model for the first time, Xinference will download the model " "parameters from HuggingFace, which might take a few minutes depending on " "the size of the model weights. We cache the model files locally, so " "there's no need to redownload them for subsequent starts." msgstr "" "让我们来运行一个内置的 ``qwen2.5-instruct`` 模型。第一次运行某个模型时" ",Xinference 会从 HuggingFace 下载模型参数,根据模型权重的大小可能需要" "几分钟。下载完成后,Xinference 会在本地缓存模型文件,以后再运行" "相同的模型不需要重新下载。" #: ../../source/getting_started/using_xinference.rst:185 msgid "" "Xinference also allows you to download models from other sites. You can " "do this by setting an environment variable when launching Xinference. For" " example, if you want to download models from `modelscope " "`_, do the following:" msgstr "" "Xinference 也允许从其他模型托管平台下载模型。可以通过在拉起 Xinference 时" "指定环境变量,比如,如果想要从 ModelScope 中下载模型,可以使用如下命令:" #: ../../source/getting_started/using_xinference.rst:193 msgid "" "We can specify the model's UID using the ``--model-uid`` or ``-u`` flag. " "If not specified, Xinference will generate a unique ID. The default " "unique ID will be identical to the model name." msgstr "" "可以使用 ``--model-uid`` 或者 ``-u`` 参数指定模型的 UID,如果没有指定," "Xinference 会生成一个唯一的 ID。默认的 ID 和模型名保持一致。" #: ../../source/getting_started/using_xinference.rst:232 msgid "" "For some engines, such as vllm, users need to specify the engine-related " "parameters when running models. 
In this case, you can directly specify " "the parameter name and value in the command line, for example:" msgstr "" "对于一些推理引擎,比如 vllm,用户需要在运行模型时指定引擎相关的参数,这种" "情况下直接在命令行中指定对应的参数名和值即可,比如:" #: ../../source/getting_started/using_xinference.rst:240 msgid "`gpu_memory_utilization=0.9` will pass to vllm when launching model." msgstr "在运行模型时,`gpu_memory_utilization=0.9` 会传到 vllm 后端。" #: ../../source/getting_started/using_xinference.rst:243 msgid "For more tips on model launching, refer to :ref:`launch`." msgstr "关于模型加载更多技巧,参考 :ref:`launch`。" #: ../../source/getting_started/using_xinference.rst:245 msgid "" "Congrats! You now have ``qwen2.5-instruct`` running by Xinference. Once " "the model is running, we can try it out either via cURL, or via " "Xinference's python client:" msgstr "" "到这一步,恭喜你已经成功通过 Xinference 将 ``qwen2.5-instruct`` 运行起来" "了。一旦这个模型在运行中,我们可以通过命令行、cURL 或者是 Python 代码来" "与之交互:" #: ../../source/getting_started/using_xinference.rst:305 msgid "" "Xinference provides OpenAI-compatible APIs for its supported models, so " "you can use Xinference as a local drop-in replacement for OpenAI APIs. 
" "For example:" msgstr "" "Xinference 提供了与 OpenAI 兼容的 API,所以可以将 Xinference 运行的模型" "当成 OpenAI的本地替代。比如:" #: ../../source/getting_started/using_xinference.rst:321 msgid "The following OpenAI APIs are supported:" msgstr "以下是支持的 OpenAI 的 API:" #: ../../source/getting_started/using_xinference.rst:323 msgid "" "Chat Completions: `https://platform.openai.com/docs/api-reference/chat " "`_" msgstr "" "对话生成:`https://platform.openai.com/docs/api-reference/chat `_" #: ../../source/getting_started/using_xinference.rst:325 msgid "" "Completions: `https://platform.openai.com/docs/api-reference/completions " "`_" msgstr "" "生成: `https://platform.openai.com/docs/api-reference/completions `_" #: ../../source/getting_started/using_xinference.rst:327 msgid "" "Embeddings: `https://platform.openai.com/docs/api-reference/embeddings " "`_" msgstr "" "向量生成:`https://platform.openai.com/docs/api-reference/embeddings <" "https://platform.openai.com/docs/api-reference/embeddings>`_" #: ../../source/getting_started/using_xinference.rst:329 msgid "" "Xinference also supports Anthropic API via base url " "``http://127.0.0.1:9997/anthropic``, you can use Xinference in Claude " "Code and so forth. Refer to :ref:`anthropic client ` " "for more details." msgstr "" "Xinference 还支持通过基础 URL ``http://127.0.0.1:9997/anthropic`` 调用 " "Anthropic API,你可以在 Claude Code 等环境中使用 Xinference。更多详情" "请参阅 :ref:`anthropic client `。" #: ../../source/getting_started/using_xinference.rst:333 msgid "Manage Models" msgstr "管理模型" #: ../../source/getting_started/using_xinference.rst:335 msgid "" "In addition to launching models, Xinference offers various ways to manage" " the entire lifecycle of models. You can manage models in Xinference " "through the command line, cURL, or Xinference's python client." 
msgstr "" "除了启动模型,Xinference 提供了管理模型整个生命周期的能力。同样的,你可以" "使用命令行、cURL 以及 Python 代码来管理:" #: ../../source/getting_started/using_xinference.rst:338 msgid "" "You can list all models of a certain type that are available to launch in" " Xinference:" msgstr "可以列出所有 Xinference 支持的指定类型的模型:" #: ../../source/getting_started/using_xinference.rst:356 msgid "" "The following command gives you the currently running models in " "Xinference:" msgstr "接下来的命令可以列出所有在运行的模型:" #: ../../source/getting_started/using_xinference.rst:374 msgid "" "When you no longer need a model that is currently running, you can remove" " it in the following way to free up the resources it occupies:" msgstr "当你不需要某个正在运行的模型,可以通过以下的方式来停止它并释放资源:" #: ../../source/getting_started/using_xinference.rst:395 msgid "Deploy Xinference In a Cluster" msgstr "集群中部署 Xinference" #: ../../source/getting_started/using_xinference.rst:397 msgid "" "To deploy Xinference in a cluster, you need to start a Xinference " "supervisor on one server and Xinference workers on the other servers." msgstr "" "若要在集群环境中部署 Xinference,需要在一台机器中启动 supervisor 节点,并" "在当前或者其他节点启动 worker 节点" #: ../../source/getting_started/using_xinference.rst:400 msgid "" "First, make sure you have already installed Xinference on each of the " "servers according to the instructions provided :ref:`here " "`. Then follow the steps below:" msgstr "" "首先,根据 :ref:`文档 ` 确保所有的服务器上都安装了 " "Xinference。接下来按照步骤:" #: ../../source/getting_started/using_xinference.rst:404 msgid "Start the Supervisor" msgstr "启动 Supervisor" #: ../../source/getting_started/using_xinference.rst:405 msgid "" "On the server where you want to run the Xinference supervisor, run the " "following command:" msgstr "在服务器上执行以下命令来启动 Supervisor 节点:" #: ../../source/getting_started/using_xinference.rst:411 msgid "" "Replace ``${supervisor_host}`` with the actual host of your supervisor " "server." 
msgstr "用当前节点的 IP 来替换 ``${supervisor_host}``。" #: ../../source/getting_started/using_xinference.rst:414 msgid "" "You can the supervisor's web UI at `http://${supervisor_host}:9997/ui " "`_ and visit " "`http://${supervisor_host}:9997/docs " "`_ to inspect the API docs." msgstr "" "可以在 `http://${supervisor_host}:9997/ui `_ 访问 web UI,在 `http://${supervisor_host}:9997/docs `_ 访问 API 文档。" #: ../../source/getting_started/using_xinference.rst:418 msgid "Start the Workers" msgstr "启动 Worker" #: ../../source/getting_started/using_xinference.rst:420 msgid "" "On each of the other servers where you want to run Xinference workers, " "run the following command:" msgstr "在需要启动 Xinference worker 的机器上执行以下命令:" #: ../../source/getting_started/using_xinference.rst:427 msgid "" "Note that you must replace ``${worker_host}`` with the actual host of " "your worker server." msgstr "需要注意的是,必须使用当前Worker节点的 IP 来替换 ``${worker_host}``。" #: ../../source/getting_started/using_xinference.rst:430 msgid "" "Note that if you need to interact with the Xinference in a cluster via " "the command line, you should include the ``-e`` or ``--endpoint`` flag to" " specify the supervisor server's endpoint. 
For example:" msgstr "" "需要注意的是,如果你需要通过命令行与集群交互,应该通过 ``-e`` 或者 ``--" "endpoint`` 参数来指定 supervisor 的地址,比如:" #: ../../source/getting_started/using_xinference.rst:438 msgid "Using Xinference With Docker" msgstr "使用 Docker 部署 Xinference" #: ../../source/getting_started/using_xinference.rst:440 msgid "To start Xinference in a Docker container, run the following command:" msgstr "用以下命令在容器中运行 Xinference:" #: ../../source/getting_started/using_xinference.rst:443 msgid "Run On Nvidia GPU Host" msgstr "在拥有英伟达显卡的机器上运行" #: ../../source/getting_started/using_xinference.rst:445 msgid "For cuda 12.4:" msgstr "对于 cuda 12.4:" #: ../../source/getting_started/using_xinference.rst:451 msgid "For cuda 12.8:" msgstr "对于 cuda 12.8:" #: ../../source/getting_started/using_xinference.rst:453 msgid "" "CUDA 12.8 version is experimental, welcome to give us feedbacks to help " "us to improve." msgstr "CUDA 12.8 版本是实验性质,欢迎给我们反馈以改进。" #: ../../source/getting_started/using_xinference.rst:456 msgid "CUDA 12.8 version is removed in v1.16.0 ." msgstr "CUDA 12.8 版本已在 v1.16.0 中移除。" #: ../../source/getting_started/using_xinference.rst:463 msgid "For cuda 12.9:" msgstr "对于 cuda 12.9:" #: ../../source/getting_started/using_xinference.rst:465 msgid "CUDA 12.9 will become the default version when Xinference v2.0.0 released." msgstr "在 Xinference v2.0.0 发布后,CUDA 12.9 将成为默认版本。" #: ../../source/getting_started/using_xinference.rst:473 msgid "Run On CPU Only Host" msgstr "在只有 CPU 的机器上运行" #: ../../source/getting_started/using_xinference.rst:479 msgid "" "Replace ```` with Xinference versions, e.g. ``v0.10.3``, " "``latest`` can be used for the latest version." msgstr "" "将 ```` 替换为 Xinference 的版本,比如 ``v0.10.3``,可以用 " "``latest`` 来用于最新版本。" #: ../../source/getting_started/using_xinference.rst:481 msgid "" "For more docker usage, refer to :ref:`Using Docker Image " "`." msgstr "更多 docker 使用,请参考 :ref:`使用 docker 镜像 `。" #: ../../source/getting_started/using_xinference.rst:485 msgid "What's Next?" 
msgstr "更多" #: ../../source/getting_started/using_xinference.rst:487 msgid "" "Congratulations on getting started with Xinference! To help you navigate " "and make the most out of this powerful tool, here are some resources and " "guides:" msgstr "" "恭喜你,已经初步掌握了 Xinference 的用法!为了帮助你更好地使用工具,下面" "是其他的一些文档和指导资源:" #: ../../source/getting_started/using_xinference.rst:490 msgid "" ":ref:`How to Use Client APIs for Different Types of Models " "`" msgstr ":ref:`如何使用 Python 创建不同类型的模型 `" #: ../../source/getting_started/using_xinference.rst:492 msgid ":ref:`Choosing the Right Backends for Your Needs `" msgstr ":ref:`选择正确的推理引擎 ` " #~ msgid ". code-block:: bash" #~ msgstr "" #~ msgid "" #~ "docker run -e XINFERENCE_MODEL_SRC=modelscope " #~ "-p 9998:9997 --gpus all " #~ "xprobe/xinference:-cu129 xinference-local" #~ " -H 0.0.0.0 --log-level debug" #~ msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/getting_started.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2023-07-18 10:54+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/getting_started/index.rst:5 msgid "Getting Started" msgstr "入门指南" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/index.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. 
# FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2025-06-26 13:20+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.14.0\n" #: ../../source/index.rst:5 msgid "Welcome to Xinference!" msgstr "欢迎来到 Xinference!" #: ../../source/index.rst:19 msgid "" "Xorbits Inference (Xinference) is an open-source platform to streamline " "the operation and integration of a wide array of AI models. With " "Xinference, you're empowered to run inference using any open-source LLMs," " embedding models, and multimodal models either in the cloud or on your " "own premises, and create robust AI-driven applications." msgstr "" "Xorbits Inference (Xinference) 是一个开源平台,用于简化各种 AI 模型的运行" "和集成。借助 Xinference,您可以使用任何开源 LLM、嵌入模型和多模态模型在" "云端或本地环境中运行推理,并创建强大的 AI 应用。" #: ../../source/index.rst:25 msgid "Developing Real-world AI Applications with Xinference" msgstr "使用 Xinference 开发真实场景的 AI 应用" #: ../../source/index.rst:117 msgid "Getting Started" msgstr "入门指南" #: ../../source/index.rst:121 msgid "Install Xinference" msgstr "安装 Xinference" #: ../../source/index.rst:125 msgid "Install Xinference on Linux, Windows, and macOS." msgstr "在 Linux、Windows 和 macOS 上安装 Xinference。" #: ../../source/index.rst:127 msgid "Try it out!" msgstr "立即体验!" #: ../../source/index.rst:131 msgid "Start by running Xinference on a local machine." msgstr "首先在本地计算机上运行 Xinference。" #: ../../source/index.rst:136 msgid "Explore models" msgstr "探索模型" #: ../../source/index.rst:140 msgid "Explore a wide range of models supported by Xinference." 
msgstr "探索 Xinference 支持的各种模型。" #: ../../source/index.rst:142 msgid "Register your own model" msgstr "注册你自己的模型" #: ../../source/index.rst:146 msgid "Register model weights and turn it into an API." msgstr "注册模型权重,并转化为 API" #: ../../source/index.rst:151 msgid "Explore the API" msgstr "探索 API" #: ../../source/index.rst:155 msgid "Chat & Generate" msgstr "聊天 & 生成" #: ../../source/index.rst:159 msgid "Learn how to chat with LLMs in Xinference." msgstr "学习如何在 Xinference 中与 LLM聊天。" #: ../../source/index.rst:161 msgid "Tools" msgstr "工具" #: ../../source/index.rst:165 msgid "Learn how to connect LLM with external tools." msgstr "学习如何将 LLM 与外部工具连接起来。" #: ../../source/index.rst:170 msgid "Embeddings" msgstr "嵌入" #: ../../source/index.rst:174 msgid "Learn how to create text embeddings in Xinference." msgstr "学习如何在 Xinference 中创建文本嵌入。" #: ../../source/index.rst:176 msgid "Rerank" msgstr "重排序" #: ../../source/index.rst:180 msgid "Learn how to use rerank models in Xinference." msgstr "学习如何在 Xinference 中使用重排序模型。" #: ../../source/index.rst:185 msgid "Images" msgstr "图像" #: ../../source/index.rst:189 msgid "Learn how to generate images with Xinference." msgstr "学习如何使用Xinference生成图像。" #: ../../source/index.rst:191 msgid "Multimodal" msgstr "多模态" #: ../../source/index.rst:195 msgid "Learn how to process images and audio with LLMs." msgstr "学习如何使用 LLM 处理图像和音频。" #: ../../source/index.rst:200 msgid "Audio" msgstr "音频" #: ../../source/index.rst:204 msgid "Learn how to turn audio into text or text into audio with Xinference." msgstr "学习如何使用 Xinference 将音频转换为文本或将文本转换为音频。" #: ../../source/index.rst:206 msgid "Video" msgstr "视频" #: ../../source/index.rst:210 msgid "Learn how to generate video with Xinference." msgstr "学习如何使用Xinference生成视频。" #: ../../source/index.rst:214 msgid "Flexible" msgstr "灵活模型" #: ../../source/index.rst:218 msgid "Learn how to inference traditional ML models with Xinference." 
msgstr "了解如何使用 Xinference 推理传统机器学习模型。" #: ../../source/index.rst:222 msgid "Getting Involved" msgstr "参与我们" #: ../../source/index.rst:231 msgid "Get Latest News" msgstr "最新资讯" #: ../../source/index.rst:239 msgid ":fab:`twitter` Follow us on Twitter" msgstr ":fab:`twitter` 在 Twitter 上关注我们" #: ../../source/index.rst:244 msgid ":fab:`zhihu` Read our blogs" msgstr ":fab:`zhihu` 阅读知乎博客" #: ../../source/index.rst:251 msgid "Get Support" msgstr "寻求帮助" #: ../../source/index.rst:259 msgid ":fab:`weixin` Find community on WeChat" msgstr ":fab:`weixin` 微信社区" #: ../../source/index.rst:264 msgid ":fab:`discord` Find community on Discord" msgstr ":fab:`discord` Discord 社区" #: ../../source/index.rst:269 msgid ":fab:`github` Open an issue" msgstr ":fab:`github` 在 Github 上提 issue" #: ../../source/index.rst:276 msgid "Contribute to Xinference" msgstr "贡献" #: ../../source/index.rst:284 msgid ":fab:`github` Create a pull request" msgstr ":fab:`github` 在 Github 上提 PR" #~ msgid "Vision" #~ msgstr "视觉" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/audio/index.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-02-01 16:47+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.13.1\n" #: ../../source/models/builtin/audio/index.rst:5 msgid "Audio Models" msgstr "音频模型" #: ../../source/models/builtin/audio/index.rst:7 msgid "The following is a list of built-in audio models in Xinference:" msgstr "以下是 Xinference 中内置的音频模型列表:" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-base-en-v1.5.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/embedding/bge-base-en-v1.5.rst:5 msgid "bge-base-en-v1.5" msgstr "" #: ../../source/models/builtin/embedding/bge-base-en-v1.5.rst:7 msgid "**Model Name:** bge-base-en-v1.5" msgstr "" #: ../../source/models/builtin/embedding/bge-base-en-v1.5.rst:8 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/embedding/bge-base-en-v1.5.rst:9 msgid "**Abilities:** embed" msgstr "" #: ../../source/models/builtin/embedding/bge-base-en-v1.5.rst:12 msgid "Specifications" msgstr "" #: ../../source/models/builtin/embedding/bge-base-en-v1.5.rst:14 msgid "**Dimensions:** 768" msgstr "" #: ../../source/models/builtin/embedding/bge-base-en-v1.5.rst:15 msgid "**Max Tokens:** 512" msgstr "" #: ../../source/models/builtin/embedding/bge-base-en-v1.5.rst:16 msgid "**Model ID:** BAAI/bge-base-en-v1.5" msgstr "" #: ../../source/models/builtin/embedding/bge-base-en-v1.5.rst:17 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/embedding/bge-base-en-v1.5.rst:19 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-base-en.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/embedding/bge-base-en.rst:5 msgid "bge-base-en" msgstr "" #: ../../source/models/builtin/embedding/bge-base-en.rst:7 msgid "**Model Name:** bge-base-en" msgstr "" #: ../../source/models/builtin/embedding/bge-base-en.rst:8 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/embedding/bge-base-en.rst:9 msgid "**Abilities:** embed" msgstr "" #: ../../source/models/builtin/embedding/bge-base-en.rst:12 msgid "Specifications" msgstr "" #: ../../source/models/builtin/embedding/bge-base-en.rst:14 msgid "**Dimensions:** 768" msgstr "" #: ../../source/models/builtin/embedding/bge-base-en.rst:15 msgid "**Max Tokens:** 512" msgstr "" #: ../../source/models/builtin/embedding/bge-base-en.rst:16 msgid "**Model ID:** BAAI/bge-base-en" msgstr "" #: ../../source/models/builtin/embedding/bge-base-en.rst:17 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/embedding/bge-base-en.rst:19 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-base-zh-v1.5.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/embedding/bge-base-zh-v1.5.rst:5 msgid "bge-base-zh-v1.5" msgstr "" #: ../../source/models/builtin/embedding/bge-base-zh-v1.5.rst:7 msgid "**Model Name:** bge-base-zh-v1.5" msgstr "" #: ../../source/models/builtin/embedding/bge-base-zh-v1.5.rst:8 msgid "**Languages:** zh" msgstr "" #: ../../source/models/builtin/embedding/bge-base-zh-v1.5.rst:9 msgid "**Abilities:** embed" msgstr "" #: ../../source/models/builtin/embedding/bge-base-zh-v1.5.rst:12 msgid "Specifications" msgstr "" #: ../../source/models/builtin/embedding/bge-base-zh-v1.5.rst:14 msgid "**Dimensions:** 768" msgstr "" #: ../../source/models/builtin/embedding/bge-base-zh-v1.5.rst:15 msgid "**Max Tokens:** 512" msgstr "" #: ../../source/models/builtin/embedding/bge-base-zh-v1.5.rst:16 msgid "**Model ID:** BAAI/bge-base-zh-v1.5" msgstr "" #: ../../source/models/builtin/embedding/bge-base-zh-v1.5.rst:17 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/embedding/bge-base-zh-v1.5.rst:19 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-base-zh.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/embedding/bge-base-zh.rst:5 msgid "bge-base-zh" msgstr "" #: ../../source/models/builtin/embedding/bge-base-zh.rst:7 msgid "**Model Name:** bge-base-zh" msgstr "" #: ../../source/models/builtin/embedding/bge-base-zh.rst:8 msgid "**Languages:** zh" msgstr "" #: ../../source/models/builtin/embedding/bge-base-zh.rst:9 msgid "**Abilities:** embed" msgstr "" #: ../../source/models/builtin/embedding/bge-base-zh.rst:12 msgid "Specifications" msgstr "" #: ../../source/models/builtin/embedding/bge-base-zh.rst:14 msgid "**Dimensions:** 768" msgstr "" #: ../../source/models/builtin/embedding/bge-base-zh.rst:15 msgid "**Max Tokens:** 512" msgstr "" #: ../../source/models/builtin/embedding/bge-base-zh.rst:16 msgid "**Model ID:** BAAI/bge-base-zh" msgstr "" #: ../../source/models/builtin/embedding/bge-base-zh.rst:17 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/embedding/bge-base-zh.rst:19 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-large-en-v1.5.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/embedding/bge-large-en-v1.5.rst:5 msgid "bge-large-en-v1.5" msgstr "" #: ../../source/models/builtin/embedding/bge-large-en-v1.5.rst:7 msgid "**Model Name:** bge-large-en-v1.5" msgstr "" #: ../../source/models/builtin/embedding/bge-large-en-v1.5.rst:8 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/embedding/bge-large-en-v1.5.rst:9 msgid "**Abilities:** embed" msgstr "" #: ../../source/models/builtin/embedding/bge-large-en-v1.5.rst:12 msgid "Specifications" msgstr "" #: ../../source/models/builtin/embedding/bge-large-en-v1.5.rst:14 msgid "**Dimensions:** 1024" msgstr "" #: ../../source/models/builtin/embedding/bge-large-en-v1.5.rst:15 msgid "**Max Tokens:** 512" msgstr "" #: ../../source/models/builtin/embedding/bge-large-en-v1.5.rst:16 msgid "**Model ID:** BAAI/bge-large-en-v1.5" msgstr "" #: ../../source/models/builtin/embedding/bge-large-en-v1.5.rst:17 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/embedding/bge-large-en-v1.5.rst:19 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-large-en.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/embedding/bge-large-en.rst:5 msgid "bge-large-en" msgstr "" #: ../../source/models/builtin/embedding/bge-large-en.rst:7 msgid "**Model Name:** bge-large-en" msgstr "" #: ../../source/models/builtin/embedding/bge-large-en.rst:8 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/embedding/bge-large-en.rst:9 msgid "**Abilities:** embed" msgstr "" #: ../../source/models/builtin/embedding/bge-large-en.rst:12 msgid "Specifications" msgstr "" #: ../../source/models/builtin/embedding/bge-large-en.rst:14 msgid "**Dimensions:** 1024" msgstr "" #: ../../source/models/builtin/embedding/bge-large-en.rst:15 msgid "**Max Tokens:** 512" msgstr "" #: ../../source/models/builtin/embedding/bge-large-en.rst:16 msgid "**Model ID:** BAAI/bge-large-en" msgstr "" #: ../../source/models/builtin/embedding/bge-large-en.rst:17 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/embedding/bge-large-en.rst:19 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-large-zh-noinstruct.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/embedding/bge-large-zh-noinstruct.rst:5 msgid "bge-large-zh-noinstruct" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh-noinstruct.rst:7 msgid "**Model Name:** bge-large-zh-noinstruct" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh-noinstruct.rst:8 msgid "**Languages:** zh" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh-noinstruct.rst:9 msgid "**Abilities:** embed" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh-noinstruct.rst:12 msgid "Specifications" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh-noinstruct.rst:14 msgid "**Dimensions:** 1024" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh-noinstruct.rst:15 msgid "**Max Tokens:** 512" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh-noinstruct.rst:16 msgid "**Model ID:** BAAI/bge-large-zh-noinstruct" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh-noinstruct.rst:17 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh-noinstruct.rst:19 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-large-zh-v1.5.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/embedding/bge-large-zh-v1.5.rst:5 msgid "bge-large-zh-v1.5" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh-v1.5.rst:7 msgid "**Model Name:** bge-large-zh-v1.5" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh-v1.5.rst:8 msgid "**Languages:** zh" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh-v1.5.rst:9 msgid "**Abilities:** embed" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh-v1.5.rst:12 msgid "Specifications" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh-v1.5.rst:14 msgid "**Dimensions:** 1024" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh-v1.5.rst:15 msgid "**Max Tokens:** 512" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh-v1.5.rst:16 msgid "**Model ID:** BAAI/bge-large-zh-v1.5" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh-v1.5.rst:17 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh-v1.5.rst:19 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-large-zh.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/embedding/bge-large-zh.rst:5 msgid "bge-large-zh" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh.rst:7 msgid "**Model Name:** bge-large-zh" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh.rst:8 msgid "**Languages:** zh" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh.rst:9 msgid "**Abilities:** embed" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh.rst:12 msgid "Specifications" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh.rst:14 msgid "**Dimensions:** 1024" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh.rst:15 msgid "**Max Tokens:** 512" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh.rst:16 msgid "**Model ID:** BAAI/bge-large-zh" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh.rst:17 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/embedding/bge-large-zh.rst:19 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-small-en-v1.5.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/embedding/bge-small-en-v1.5.rst:5 msgid "bge-small-en-v1.5" msgstr "" #: ../../source/models/builtin/embedding/bge-small-en-v1.5.rst:7 msgid "**Model Name:** bge-small-en-v1.5" msgstr "" #: ../../source/models/builtin/embedding/bge-small-en-v1.5.rst:8 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/embedding/bge-small-en-v1.5.rst:9 msgid "**Abilities:** embed" msgstr "" #: ../../source/models/builtin/embedding/bge-small-en-v1.5.rst:12 msgid "Specifications" msgstr "" #: ../../source/models/builtin/embedding/bge-small-en-v1.5.rst:14 msgid "**Dimensions:** 384" msgstr "" #: ../../source/models/builtin/embedding/bge-small-en-v1.5.rst:15 msgid "**Max Tokens:** 512" msgstr "" #: ../../source/models/builtin/embedding/bge-small-en-v1.5.rst:16 msgid "**Model ID:** BAAI/bge-small-en-v1.5" msgstr "" #: ../../source/models/builtin/embedding/bge-small-en-v1.5.rst:17 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/embedding/bge-small-en-v1.5.rst:19 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-small-zh-v1.5.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/embedding/bge-small-zh-v1.5.rst:5 msgid "bge-small-zh-v1.5" msgstr "" #: ../../source/models/builtin/embedding/bge-small-zh-v1.5.rst:7 msgid "**Model Name:** bge-small-zh-v1.5" msgstr "" #: ../../source/models/builtin/embedding/bge-small-zh-v1.5.rst:8 msgid "**Languages:** zh" msgstr "" #: ../../source/models/builtin/embedding/bge-small-zh-v1.5.rst:9 msgid "**Abilities:** embed" msgstr "" #: ../../source/models/builtin/embedding/bge-small-zh-v1.5.rst:12 msgid "Specifications" msgstr "" #: ../../source/models/builtin/embedding/bge-small-zh-v1.5.rst:14 msgid "**Dimensions:** 512" msgstr "" #: ../../source/models/builtin/embedding/bge-small-zh-v1.5.rst:15 msgid "**Max Tokens:** 512" msgstr "" #: ../../source/models/builtin/embedding/bge-small-zh-v1.5.rst:16 msgid "**Model ID:** BAAI/bge-small-zh-v1.5" msgstr "" #: ../../source/models/builtin/embedding/bge-small-zh-v1.5.rst:17 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/embedding/bge-small-zh-v1.5.rst:19 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/bge-small-zh.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/embedding/bge-small-zh.rst:5 msgid "bge-small-zh" msgstr "" #: ../../source/models/builtin/embedding/bge-small-zh.rst:7 msgid "**Model Name:** bge-small-zh" msgstr "" #: ../../source/models/builtin/embedding/bge-small-zh.rst:8 msgid "**Languages:** zh" msgstr "" #: ../../source/models/builtin/embedding/bge-small-zh.rst:9 msgid "**Abilities:** embed" msgstr "" #: ../../source/models/builtin/embedding/bge-small-zh.rst:12 msgid "Specifications" msgstr "" #: ../../source/models/builtin/embedding/bge-small-zh.rst:14 msgid "**Dimensions:** 512" msgstr "" #: ../../source/models/builtin/embedding/bge-small-zh.rst:15 msgid "**Max Tokens:** 512" msgstr "" #: ../../source/models/builtin/embedding/bge-small-zh.rst:16 msgid "**Model ID:** BAAI/bge-small-zh" msgstr "" #: ../../source/models/builtin/embedding/bge-small-zh.rst:17 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/embedding/bge-small-zh.rst:19 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/e5-large-v2.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/embedding/e5-large-v2.rst:5 msgid "e5-large-v2" msgstr "" #: ../../source/models/builtin/embedding/e5-large-v2.rst:7 msgid "**Model Name:** e5-large-v2" msgstr "" #: ../../source/models/builtin/embedding/e5-large-v2.rst:8 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/embedding/e5-large-v2.rst:9 msgid "**Abilities:** embed" msgstr "" #: ../../source/models/builtin/embedding/e5-large-v2.rst:12 msgid "Specifications" msgstr "" #: ../../source/models/builtin/embedding/e5-large-v2.rst:14 msgid "**Dimensions:** 1024" msgstr "" #: ../../source/models/builtin/embedding/e5-large-v2.rst:15 msgid "**Max Tokens:** 512" msgstr "" #: ../../source/models/builtin/embedding/e5-large-v2.rst:16 msgid "**Model ID:** intfloat/e5-large-v2" msgstr "" #: ../../source/models/builtin/embedding/e5-large-v2.rst:17 msgid "" "**Model Hubs**: `Hugging Face " "`_, `ModelScope " "`_" msgstr "" #: ../../source/models/builtin/embedding/e5-large-v2.rst:19 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/gte-base.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/embedding/gte-base.rst:5 msgid "gte-base" msgstr "" #: ../../source/models/builtin/embedding/gte-base.rst:7 msgid "**Model Name:** gte-base" msgstr "" #: ../../source/models/builtin/embedding/gte-base.rst:8 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/embedding/gte-base.rst:9 msgid "**Abilities:** embed" msgstr "" #: ../../source/models/builtin/embedding/gte-base.rst:12 msgid "Specifications" msgstr "" #: ../../source/models/builtin/embedding/gte-base.rst:14 msgid "**Dimensions:** 768" msgstr "" #: ../../source/models/builtin/embedding/gte-base.rst:15 msgid "**Max Tokens:** 512" msgstr "" #: ../../source/models/builtin/embedding/gte-base.rst:16 msgid "**Model ID:** thenlper/gte-base" msgstr "" #: ../../source/models/builtin/embedding/gte-base.rst:17 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/embedding/gte-base.rst:19 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/gte-large.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/embedding/gte-large.rst:5 msgid "gte-large" msgstr "" #: ../../source/models/builtin/embedding/gte-large.rst:7 msgid "**Model Name:** gte-large" msgstr "" #: ../../source/models/builtin/embedding/gte-large.rst:8 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/embedding/gte-large.rst:9 msgid "**Abilities:** embed" msgstr "" #: ../../source/models/builtin/embedding/gte-large.rst:12 msgid "Specifications" msgstr "" #: ../../source/models/builtin/embedding/gte-large.rst:14 msgid "**Dimensions:** 1024" msgstr "" #: ../../source/models/builtin/embedding/gte-large.rst:15 msgid "**Max Tokens:** 512" msgstr "" #: ../../source/models/builtin/embedding/gte-large.rst:16 msgid "**Model ID:** thenlper/gte-large" msgstr "" #: ../../source/models/builtin/embedding/gte-large.rst:17 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/embedding/gte-large.rst:19 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/index.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2023-12-25 17:11+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.11.0\n" #: ../../source/models/builtin/embedding/index.rst:5 msgid "Embedding Models" msgstr "嵌入模型" #: ../../source/models/builtin/embedding/index.rst:7 msgid "The following is a list of built-in embedding models in Xinference:" msgstr "以下是 Xinference 中内置的嵌入模型列表:" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/jina-embeddings-v2-base-en.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/embedding/jina-embeddings-v2-base-en.rst:5 msgid "jina-embeddings-v2-base-en" msgstr "" #: ../../source/models/builtin/embedding/jina-embeddings-v2-base-en.rst:7 msgid "**Model Name:** jina-embeddings-v2-base-en" msgstr "" #: ../../source/models/builtin/embedding/jina-embeddings-v2-base-en.rst:8 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/embedding/jina-embeddings-v2-base-en.rst:9 msgid "**Abilities:** embed" msgstr "" #: ../../source/models/builtin/embedding/jina-embeddings-v2-base-en.rst:12 msgid "Specifications" msgstr "" #: 
../../source/models/builtin/embedding/jina-embeddings-v2-base-en.rst:14 msgid "**Dimensions:** 512" msgstr "" #: ../../source/models/builtin/embedding/jina-embeddings-v2-base-en.rst:15 msgid "**Max Tokens:** 8192" msgstr "" #: ../../source/models/builtin/embedding/jina-embeddings-v2-base-en.rst:16 msgid "**Model ID:** jinaai/jina-embeddings-v2-base-en" msgstr "" #: ../../source/models/builtin/embedding/jina-embeddings-v2-base-en.rst:17 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope " "`_" msgstr "" #: ../../source/models/builtin/embedding/jina-embeddings-v2-base-en.rst:19 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/jina-embeddings-v2-small-en.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/embedding/jina-embeddings-v2-small-en.rst:5 msgid "jina-embeddings-v2-small-en" msgstr "" #: ../../source/models/builtin/embedding/jina-embeddings-v2-small-en.rst:7 msgid "**Model Name:** jina-embeddings-v2-small-en" msgstr "" #: ../../source/models/builtin/embedding/jina-embeddings-v2-small-en.rst:8 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/embedding/jina-embeddings-v2-small-en.rst:9 msgid "**Abilities:** embed" msgstr "" #: ../../source/models/builtin/embedding/jina-embeddings-v2-small-en.rst:12 msgid "Specifications" msgstr "" #: ../../source/models/builtin/embedding/jina-embeddings-v2-small-en.rst:14 msgid "**Dimensions:** 512" msgstr "" #: ../../source/models/builtin/embedding/jina-embeddings-v2-small-en.rst:15 msgid "**Max Tokens:** 8192" msgstr "" #: ../../source/models/builtin/embedding/jina-embeddings-v2-small-en.rst:16 msgid "**Model ID:** jinaai/jina-embeddings-v2-small-en" msgstr "" #: ../../source/models/builtin/embedding/jina-embeddings-v2-small-en.rst:17 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope " "`_" msgstr "" #: ../../source/models/builtin/embedding/jina-embeddings-v2-small-en.rst:19 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/embedding/multilingual-e5-large.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/embedding/multilingual-e5-large.rst:5 msgid "multilingual-e5-large" msgstr "" #: ../../source/models/builtin/embedding/multilingual-e5-large.rst:7 msgid "**Model Name:** multilingual-e5-large" msgstr "" #: ../../source/models/builtin/embedding/multilingual-e5-large.rst:8 msgid "**Languages:** zh" msgstr "" #: ../../source/models/builtin/embedding/multilingual-e5-large.rst:9 msgid "**Abilities:** embed" msgstr "" #: ../../source/models/builtin/embedding/multilingual-e5-large.rst:12 msgid "Specifications" msgstr "" #: ../../source/models/builtin/embedding/multilingual-e5-large.rst:14 msgid "**Dimensions:** 1024" msgstr "" #: ../../source/models/builtin/embedding/multilingual-e5-large.rst:15 msgid "**Max Tokens:** 514" msgstr "" #: ../../source/models/builtin/embedding/multilingual-e5-large.rst:16 msgid "**Model ID:** intfloat/multilingual-e5-large" msgstr "" #: ../../source/models/builtin/embedding/multilingual-e5-large.rst:17 msgid "" "**Model Hubs**: `Hugging Face " "`_, `ModelScope " "`_" msgstr "" #: ../../source/models/builtin/embedding/multilingual-e5-large.rst:19 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/flux.1-dev.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-08-09 19:13+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.14.0\n" #: ../../source/models/builtin/image/flux.1-dev.rst:5 msgid "FLUX.1-dev" msgstr "" #: ../../source/models/builtin/image/flux.1-dev.rst:7 msgid "**Model Name:** FLUX.1-dev" msgstr "" #: ../../source/models/builtin/image/flux.1-dev.rst:8 msgid "**Model Family:** stable_diffusion" msgstr "" #: ../../source/models/builtin/image/flux.1-dev.rst:9 msgid "**Abilities:** text2image" msgstr "" #: ../../source/models/builtin/image/flux.1-dev.rst:10 msgid "**Available ControlNet:** None" msgstr "" #: ../../source/models/builtin/image/flux.1-dev.rst:13 msgid "Specifications" msgstr "" #: ../../source/models/builtin/image/flux.1-dev.rst:15 msgid "**Model ID:** black-forest-labs/FLUX.1-dev" msgstr "" #: ../../source/models/builtin/image/flux.1-dev.rst:17 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/flux.1-schnell.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-08-09 19:13+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.14.0\n" #: ../../source/models/builtin/image/flux.1-schnell.rst:5 msgid "FLUX.1-schnell" msgstr "" #: ../../source/models/builtin/image/flux.1-schnell.rst:7 msgid "**Model Name:** FLUX.1-schnell" msgstr "" #: ../../source/models/builtin/image/flux.1-schnell.rst:8 msgid "**Model Family:** stable_diffusion" msgstr "" #: ../../source/models/builtin/image/flux.1-schnell.rst:9 msgid "**Abilities:** text2image" msgstr "" #: ../../source/models/builtin/image/flux.1-schnell.rst:10 msgid "**Available ControlNet:** None" msgstr "" #: ../../source/models/builtin/image/flux.1-schnell.rst:13 msgid "Specifications" msgstr "" #: ../../source/models/builtin/image/flux.1-schnell.rst:15 msgid "**Model ID:** black-forest-labs/FLUX.1-schnell" msgstr "" #: ../../source/models/builtin/image/flux.1-schnell.rst:17 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/index.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-03-11 13:33+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.14.0\n" #: ../../source/models/builtin/image/index.rst:5 msgid "Image Models" msgstr "图像模型" #: ../../source/models/builtin/image/index.rst:7 msgid "The following is a list of built-in image models in Xinference:" msgstr "以下是 Xinference 中内置的图像模型列表:" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/kolors.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-08-09 19:13+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.14.0\n" #: ../../source/models/builtin/image/kolors.rst:5 msgid "kolors" msgstr "" #: ../../source/models/builtin/image/kolors.rst:7 msgid "**Model Name:** kolors" msgstr "" #: ../../source/models/builtin/image/kolors.rst:8 msgid "**Model Family:** stable_diffusion" msgstr "" #: ../../source/models/builtin/image/kolors.rst:9 msgid "**Abilities:** text2image, image2image" msgstr "" #: ../../source/models/builtin/image/kolors.rst:10 msgid "**Available ControlNet:** None" msgstr "" #: ../../source/models/builtin/image/kolors.rst:13 msgid "Specifications" msgstr "" #: ../../source/models/builtin/image/kolors.rst:15 msgid "**Model ID:** Kwai-Kolors/Kolors-diffusers" msgstr "" #: ../../source/models/builtin/image/kolors.rst:17 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/sd-turbo.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. 
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-08-09 19:13+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language: zh_CN\n"
"Language-Team: zh_CN \n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.14.0\n"

#: ../../source/models/builtin/image/sd-turbo.rst:5
msgid "sd-turbo"
msgstr ""

#: ../../source/models/builtin/image/sd-turbo.rst:7
msgid "**Model Name:** sd-turbo"
msgstr ""

#: ../../source/models/builtin/image/sd-turbo.rst:8
msgid "**Model Family:** stable_diffusion"
msgstr ""

#: ../../source/models/builtin/image/sd-turbo.rst:9
msgid "**Abilities:** text2image"
msgstr ""

#: ../../source/models/builtin/image/sd-turbo.rst:10
msgid "**Available ControlNet:** None"
msgstr ""

#: ../../source/models/builtin/image/sd-turbo.rst:13
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/image/sd-turbo.rst:15
msgid "**Model ID:** stabilityai/sd-turbo"
msgstr ""

#: ../../source/models/builtin/image/sd-turbo.rst:17
msgid "Execute the following command to launch the model::"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/sd3-medium.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2024.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-08-09 19:13+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language: zh_CN\n"
"Language-Team: zh_CN \n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.14.0\n"

#: ../../source/models/builtin/image/sd3-medium.rst:5
msgid "sd3-medium"
msgstr ""

#: ../../source/models/builtin/image/sd3-medium.rst:7
msgid "**Model Name:** sd3-medium"
msgstr ""

#: ../../source/models/builtin/image/sd3-medium.rst:8
msgid "**Model Family:** stable_diffusion"
msgstr ""

#: ../../source/models/builtin/image/sd3-medium.rst:9
msgid "**Abilities:** text2image, image2image"
msgstr ""

#: ../../source/models/builtin/image/sd3-medium.rst:10
msgid "**Available ControlNet:** None"
msgstr ""

#: ../../source/models/builtin/image/sd3-medium.rst:13
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/image/sd3-medium.rst:15
msgid "**Model ID:** stabilityai/stable-diffusion-3-medium-diffusers"
msgstr ""

#: ../../source/models/builtin/image/sd3-medium.rst:17
msgid "Execute the following command to launch the model::"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/sdxl-turbo.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2024.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-08-09 19:13+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language: zh_CN\n"
"Language-Team: zh_CN \n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.14.0\n"

#: ../../source/models/builtin/image/sdxl-turbo.rst:5
msgid "sdxl-turbo"
msgstr ""

#: ../../source/models/builtin/image/sdxl-turbo.rst:7
msgid "**Model Name:** sdxl-turbo"
msgstr ""

#: ../../source/models/builtin/image/sdxl-turbo.rst:8
msgid "**Model Family:** stable_diffusion"
msgstr ""

#: ../../source/models/builtin/image/sdxl-turbo.rst:9
msgid "**Abilities:** text2image"
msgstr ""

#: ../../source/models/builtin/image/sdxl-turbo.rst:10
msgid "**Available ControlNet:** None"
msgstr ""

#: ../../source/models/builtin/image/sdxl-turbo.rst:13
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/image/sdxl-turbo.rst:15
msgid "**Model ID:** stabilityai/sdxl-turbo"
msgstr ""

#: ../../source/models/builtin/image/sdxl-turbo.rst:17
msgid "Execute the following command to launch the model::"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/stable-diffusion-2-inpainting.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2024.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-07-28 22:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language: zh_CN\n"
"Language-Team: zh_CN \n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.14.0\n"

#: ../../source/models/builtin/image/stable-diffusion-2-inpainting.rst:5
msgid "stable-diffusion-2-inpainting"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-2-inpainting.rst:7
msgid "**Model Name:** stable-diffusion-2-inpainting"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-2-inpainting.rst:8
msgid "**Model Family:** stable_diffusion"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-2-inpainting.rst:9
msgid "**Abilities:** inpainting"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-2-inpainting.rst:10
msgid "**Available ControlNet:** None"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-2-inpainting.rst:13
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-2-inpainting.rst:15
msgid "**Model ID:** stabilityai/stable-diffusion-2-inpainting"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-2-inpainting.rst:17
msgid "Execute the following command to launch the model::"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/stable-diffusion-inpainting.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2024.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-07-28 22:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language: zh_CN\n"
"Language-Team: zh_CN \n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.14.0\n"

#: ../../source/models/builtin/image/stable-diffusion-inpainting.rst:5
msgid "stable-diffusion-inpainting"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-inpainting.rst:7
msgid "**Model Name:** stable-diffusion-inpainting"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-inpainting.rst:8
msgid "**Model Family:** stable_diffusion"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-inpainting.rst:9
msgid "**Abilities:** inpainting"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-inpainting.rst:10
msgid "**Available ControlNet:** None"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-inpainting.rst:13
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-inpainting.rst:15
msgid "**Model ID:** runwayml/stable-diffusion-inpainting"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-inpainting.rst:17
msgid "Execute the following command to launch the model::"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/stable-diffusion-v1.5.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2024.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-08-09 19:13+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language: zh_CN\n"
"Language-Team: zh_CN \n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.14.0\n"

#: ../../source/models/builtin/image/stable-diffusion-v1.5.rst:5
msgid "stable-diffusion-v1.5"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-v1.5.rst:7
msgid "**Model Name:** stable-diffusion-v1.5"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-v1.5.rst:8
msgid "**Model Family:** stable_diffusion"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-v1.5.rst:9
msgid "**Abilities:** text2image, image2image"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-v1.5.rst:10
msgid ""
"**Available ControlNet:** ['canny', 'mlsd', 'hed', 'scribble', "
"'openpose', 'normal', 'seg']"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-v1.5.rst:13
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-v1.5.rst:15
msgid "**Model ID:** runwayml/stable-diffusion-v1-5"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-v1.5.rst:17
msgid "Execute the following command to launch the model::"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/stable-diffusion-xl-base-1.0.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2024.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-08-09 19:13+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language: zh_CN\n"
"Language-Team: zh_CN \n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.14.0\n"

#: ../../source/models/builtin/image/stable-diffusion-xl-base-1.0.rst:5
msgid "stable-diffusion-xl-base-1.0"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-xl-base-1.0.rst:7
msgid "**Model Name:** stable-diffusion-xl-base-1.0"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-xl-base-1.0.rst:8
msgid "**Model Family:** stable_diffusion"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-xl-base-1.0.rst:9
msgid "**Abilities:** text2image, image2image"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-xl-base-1.0.rst:10
msgid "**Available ControlNet:** ['canny', 'depth', 'zoe-depth']"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-xl-base-1.0.rst:13
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-xl-base-1.0.rst:15
msgid "**Model ID:** stabilityai/stable-diffusion-xl-base-1.0"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-xl-base-1.0.rst:17
msgid "Execute the following command to launch the model::"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/image/stable-diffusion-xl-inpainting.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2024.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-07-28 22:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language: zh_CN\n"
"Language-Team: zh_CN \n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.14.0\n"

#: ../../source/models/builtin/image/stable-diffusion-xl-inpainting.rst:5
msgid "stable-diffusion-xl-inpainting"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-xl-inpainting.rst:7
msgid "**Model Name:** stable-diffusion-xl-inpainting"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-xl-inpainting.rst:8
msgid "**Model Family:** stable_diffusion"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-xl-inpainting.rst:9
msgid "**Abilities:** inpainting"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-xl-inpainting.rst:10
msgid "**Available ControlNet:** None"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-xl-inpainting.rst:13
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-xl-inpainting.rst:15
msgid "**Model ID:** diffusers/stable-diffusion-xl-1.0-inpainting-0.1"
msgstr ""

#: ../../source/models/builtin/image/stable-diffusion-xl-inpainting.rst:17
msgid "Execute the following command to launch the model::"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/index.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2023-12-25 17:11+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language: zh_CN\n"
"Language-Team: zh_CN \n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.11.0\n"

#: ../../source/models/builtin/index.rst:5
msgid "Builtin Models"
msgstr "内置模型"

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/baichuan-2-chat.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/baichuan-2-chat.rst:5
msgid "baichuan-2-chat"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2-chat.rst:7
msgid "**Context Length:** 4096"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2-chat.rst:8
msgid "**Model Name:** baichuan-2-chat"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2-chat.rst:9
msgid "**Languages:** en, zh"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2-chat.rst:10
msgid "**Abilities:** chat"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2-chat.rst:11
msgid ""
"**Description:** Baichuan2-chat is a fine-tuned version of the Baichuan "
"LLM, specializing in chatting."
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2-chat.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2-chat.rst:18
msgid "Model Spec 1 (pytorch, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2-chat.rst:20
#: ../../source/models/builtin/llm/baichuan-2-chat.rst:35
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2-chat.rst:21
msgid "**Model Size (in billions):** 7"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2-chat.rst:22
#: ../../source/models/builtin/llm/baichuan-2-chat.rst:37
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2-chat.rst:23
msgid "**Model ID:** baichuan-inc/Baichuan2-7B-Chat"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2-chat.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope `_"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2-chat.rst:26
#: ../../source/models/builtin/llm/baichuan-2-chat.rst:41
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2-chat.rst:33
msgid "Model Spec 2 (pytorch, 13 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2-chat.rst:36
msgid "**Model Size (in billions):** 13"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2-chat.rst:38
msgid "**Model ID:** baichuan-inc/Baichuan2-13B-Chat"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2-chat.rst:39
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope `_"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/baichuan-2.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/baichuan-2.rst:5
msgid "baichuan-2"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2.rst:7
msgid "**Context Length:** 4096"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2.rst:8
msgid "**Model Name:** baichuan-2"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2.rst:9
msgid "**Languages:** en, zh"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2.rst:10
msgid "**Abilities:** generate"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2.rst:11
msgid ""
"**Description:** Baichuan2 is an open-source Transformer based LLM that "
"is trained on both Chinese and English data."
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2.rst:18
msgid "Model Spec 1 (pytorch, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2.rst:20
#: ../../source/models/builtin/llm/baichuan-2.rst:35
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2.rst:21
msgid "**Model Size (in billions):** 7"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2.rst:22
#: ../../source/models/builtin/llm/baichuan-2.rst:37
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2.rst:23
msgid "**Model ID:** baichuan-inc/Baichuan2-7B-Base"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope `_"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2.rst:26
#: ../../source/models/builtin/llm/baichuan-2.rst:41
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2.rst:33
msgid "Model Spec 2 (pytorch, 13 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2.rst:36
msgid "**Model Size (in billions):** 13"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2.rst:38
msgid "**Model ID:** baichuan-inc/Baichuan2-13B-Base"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-2.rst:39
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope `_"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/baichuan-chat.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/baichuan-chat.rst:5
msgid "baichuan-chat"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-chat.rst:7
msgid "**Context Length:** 4096"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-chat.rst:8
msgid "**Model Name:** baichuan-chat"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-chat.rst:9
msgid "**Languages:** en, zh"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-chat.rst:10
msgid "**Abilities:** chat"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-chat.rst:11
msgid ""
"**Description:** Baichuan-chat is a fine-tuned version of the Baichuan "
"LLM, specializing in chatting."
msgstr ""

#: ../../source/models/builtin/llm/baichuan-chat.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-chat.rst:18
msgid "Model Spec 1 (pytorch, 13 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-chat.rst:20
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-chat.rst:21
msgid "**Model Size (in billions):** 13"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-chat.rst:22
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-chat.rst:23
msgid "**Model ID:** baichuan-inc/Baichuan-13B-Chat"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-chat.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

#: ../../source/models/builtin/llm/baichuan-chat.rst:26
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/baichuan.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/baichuan.rst:5
msgid "baichuan"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:7
msgid "**Context Length:** 4096"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:8
msgid "**Model Name:** baichuan"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:9
msgid "**Languages:** en, zh"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:10
msgid "**Abilities:** generate"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:11
msgid ""
"**Description:** Baichuan is an open-source Transformer based LLM that is"
" trained on both Chinese and English data."
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:18
msgid "Model Spec 1 (ggmlv3, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:20
msgid "**Model Format:** ggmlv3"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:21
#: ../../source/models/builtin/llm/baichuan.rst:36
msgid "**Model Size (in billions):** 7"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:22
msgid ""
"**Quantizations:** q2_K, q3_K_L, q3_K_M, q3_K_S, q4_0, q4_1, q4_K_M, "
"q4_K_S, q5_0, q5_1, q5_K_M, q5_K_S, q6_K, q8_0"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:23
msgid "**Model ID:** TheBloke/baichuan-llama-7B-GGML"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:26
#: ../../source/models/builtin/llm/baichuan.rst:41
#: ../../source/models/builtin/llm/baichuan.rst:56
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:33
msgid "Model Spec 2 (pytorch, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:35
#: ../../source/models/builtin/llm/baichuan.rst:50
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:37
#: ../../source/models/builtin/llm/baichuan.rst:52
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:38
msgid "**Model ID:** baichuan-inc/Baichuan-7B"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:39
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:48
msgid "Model Spec 3 (pytorch, 13 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:51
msgid "**Model Size (in billions):** 13"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:53
msgid "**Model ID:** baichuan-inc/Baichuan-13B-Base"
msgstr ""

#: ../../source/models/builtin/llm/baichuan.rst:54
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/chatglm.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/chatglm.rst:5
msgid "chatglm"
msgstr ""

#: ../../source/models/builtin/llm/chatglm.rst:7
msgid "**Context Length:** 2048"
msgstr ""

#: ../../source/models/builtin/llm/chatglm.rst:8
msgid "**Model Name:** chatglm"
msgstr ""

#: ../../source/models/builtin/llm/chatglm.rst:9
msgid "**Languages:** en, zh"
msgstr ""

#: ../../source/models/builtin/llm/chatglm.rst:10
msgid "**Abilities:** chat"
msgstr ""

#: ../../source/models/builtin/llm/chatglm.rst:11
msgid ""
"**Description:** ChatGLM is an open-source General Language Model (GLM) "
"based LLM trained on both Chinese and English data."
msgstr ""

#: ../../source/models/builtin/llm/chatglm.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/chatglm.rst:18
msgid "Model Spec 1 (ggmlv3, 6 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/chatglm.rst:20
msgid "**Model Format:** ggmlv3"
msgstr ""

#: ../../source/models/builtin/llm/chatglm.rst:21
#: ../../source/models/builtin/llm/chatglm.rst:36
msgid "**Model Size (in billions):** 6"
msgstr ""

#: ../../source/models/builtin/llm/chatglm.rst:22
msgid "**Quantizations:** q4_0, q4_1, q5_0, q5_1, q8_0"
msgstr ""

#: ../../source/models/builtin/llm/chatglm.rst:23
msgid "**Model ID:** Xorbits/chatglm-6B-GGML"
msgstr ""

#: ../../source/models/builtin/llm/chatglm.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

#: ../../source/models/builtin/llm/chatglm.rst:26
#: ../../source/models/builtin/llm/chatglm.rst:41
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

#: ../../source/models/builtin/llm/chatglm.rst:33
msgid "Model Spec 2 (pytorch, 6 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/chatglm.rst:35
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/chatglm.rst:37
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/chatglm.rst:38
msgid "**Model ID:** THUDM/chatglm-6b"
msgstr ""

#: ../../source/models/builtin/llm/chatglm.rst:39
msgid "**Model Hubs**: `Hugging Face `_"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/chatglm2-32k.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/chatglm2-32k.rst:5
msgid "chatglm2-32k"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2-32k.rst:7
msgid "**Context Length:** 32768"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2-32k.rst:8
msgid "**Model Name:** chatglm2-32k"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2-32k.rst:9
msgid "**Languages:** en, zh"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2-32k.rst:10
msgid "**Abilities:** chat"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2-32k.rst:11
msgid ""
"**Description:** ChatGLM2-32k is a special version of ChatGLM2, with a "
"context window of 32k tokens instead of 8k."
msgstr ""

#: ../../source/models/builtin/llm/chatglm2-32k.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2-32k.rst:18
msgid "Model Spec 1 (pytorch, 6 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2-32k.rst:20
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2-32k.rst:21
msgid "**Model Size (in billions):** 6"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2-32k.rst:22
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2-32k.rst:23
msgid "**Model ID:** THUDM/chatglm2-6b-32k"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2-32k.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope `_"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2-32k.rst:26
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/chatglm2.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/chatglm2.rst:5
msgid "chatglm2"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2.rst:7
msgid "**Context Length:** 8192"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2.rst:8
msgid "**Model Name:** chatglm2"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2.rst:9
msgid "**Languages:** en, zh"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2.rst:10
msgid "**Abilities:** chat"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2.rst:11
msgid ""
"**Description:** ChatGLM2 is the second generation of ChatGLM, still "
"open-source and trained on Chinese and English data."
msgstr ""

#: ../../source/models/builtin/llm/chatglm2.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2.rst:18
msgid "Model Spec 1 (ggmlv3, 6 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2.rst:20
msgid "**Model Format:** ggmlv3"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2.rst:21
#: ../../source/models/builtin/llm/chatglm2.rst:36
msgid "**Model Size (in billions):** 6"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2.rst:22
msgid "**Quantizations:** q4_0, q4_1, q5_0, q5_1, q8_0"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2.rst:23
msgid "**Model ID:** Xorbits/chatglm2-6B-GGML"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope `_"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2.rst:26
#: ../../source/models/builtin/llm/chatglm2.rst:41
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2.rst:33
msgid "Model Spec 2 (pytorch, 6 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2.rst:35
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2.rst:37
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2.rst:38
msgid "**Model ID:** THUDM/chatglm2-6b"
msgstr ""

#: ../../source/models/builtin/llm/chatglm2.rst:39
msgid ""
"**Model Hubs**: `Hugging Face "
"`_, `ModelScope "
"`_"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/chatglm3-32k.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/chatglm3-32k.rst:5
msgid "chatglm3-32k"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3-32k.rst:7
msgid "**Context Length:** 32768"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3-32k.rst:8
msgid "**Model Name:** chatglm3-32k"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3-32k.rst:9
msgid "**Languages:** en, zh"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3-32k.rst:10
msgid "**Abilities:** chat"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3-32k.rst:11
msgid ""
"**Description:** ChatGLM3 is the third generation of ChatGLM, still open-"
"source and trained on Chinese and English data."
msgstr ""

#: ../../source/models/builtin/llm/chatglm3-32k.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3-32k.rst:18
msgid "Model Spec 1 (pytorch, 6 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3-32k.rst:20
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3-32k.rst:21
msgid "**Model Size (in billions):** 6"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3-32k.rst:22
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3-32k.rst:23
msgid "**Model ID:** THUDM/chatglm3-6b-32k"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3-32k.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope `_"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3-32k.rst:26
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/chatglm3.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/chatglm3.rst:5
msgid "chatglm3"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3.rst:7
msgid "**Context Length:** 8192"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3.rst:8
msgid "**Model Name:** chatglm3"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3.rst:9
msgid "**Languages:** en, zh"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3.rst:10
msgid "**Abilities:** chat"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3.rst:11
msgid ""
"**Description:** ChatGLM3 is the third generation of ChatGLM, still open-"
"source and trained on Chinese and English data."
msgstr ""

#: ../../source/models/builtin/llm/chatglm3.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3.rst:18
msgid "Model Spec 1 (ggmlv3, 6 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3.rst:20
msgid "**Model Format:** ggmlv3"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3.rst:21
#: ../../source/models/builtin/llm/chatglm3.rst:36
msgid "**Model Size (in billions):** 6"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3.rst:22
msgid "**Quantizations:** q4_0"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3.rst:23
msgid "**Model ID:** Xorbits/chatglm3-6B-GGML"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope "
"`_"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3.rst:26
#: ../../source/models/builtin/llm/chatglm3.rst:41
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3.rst:33
msgid "Model Spec 2 (pytorch, 6 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3.rst:35
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3.rst:37
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3.rst:38
msgid "**Model ID:** THUDM/chatglm3-6b"
msgstr ""

#: ../../source/models/builtin/llm/chatglm3.rst:39
msgid ""
"**Model Hubs**: `Hugging Face "
"`_, `ModelScope "
"`_"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/code-llama-instruct.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/code-llama-instruct.rst:5
msgid "code-llama-instruct"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:7
msgid "**Context Length:** 100000"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:8
msgid "**Model Name:** code-llama-instruct"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:9
msgid "**Languages:** en"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:10
msgid "**Abilities:** chat"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:11
msgid ""
"**Description:** Code-Llama-Instruct is an instruct-tuned version of the "
"Code-Llama LLM."
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:18
msgid "Model Spec 1 (pytorch, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:20
#: ../../source/models/builtin/llm/code-llama-instruct.rst:35
#: ../../source/models/builtin/llm/code-llama-instruct.rst:50
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:21
#: ../../source/models/builtin/llm/code-llama-instruct.rst:66
msgid "**Model Size (in billions):** 7"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:22
#: ../../source/models/builtin/llm/code-llama-instruct.rst:37
#: ../../source/models/builtin/llm/code-llama-instruct.rst:52
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:23
msgid "**Model ID:** codellama/CodeLlama-7b-Instruct-hf"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope `_"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:26
#: ../../source/models/builtin/llm/code-llama-instruct.rst:41
#: ../../source/models/builtin/llm/code-llama-instruct.rst:56
#: ../../source/models/builtin/llm/code-llama-instruct.rst:71
#: ../../source/models/builtin/llm/code-llama-instruct.rst:86
#: ../../source/models/builtin/llm/code-llama-instruct.rst:101
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:33
msgid "Model Spec 2 (pytorch, 13 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:36
#: ../../source/models/builtin/llm/code-llama-instruct.rst:81
msgid "**Model Size (in billions):** 13"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:38
msgid "**Model ID:** codellama/CodeLlama-13b-Instruct-hf"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:39
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope `_"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:48
msgid "Model Spec 3 (pytorch, 34 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:51
#: ../../source/models/builtin/llm/code-llama-instruct.rst:96
msgid "**Model Size (in billions):** 34"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:53
msgid "**Model ID:** codellama/CodeLlama-34b-Instruct-hf"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:54
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope `_"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:63
msgid "Model Spec 4 (ggufv2, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:65
#: ../../source/models/builtin/llm/code-llama-instruct.rst:80
#: ../../source/models/builtin/llm/code-llama-instruct.rst:95
msgid "**Model Format:** ggufv2"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:67
#: ../../source/models/builtin/llm/code-llama-instruct.rst:82
#: ../../source/models/builtin/llm/code-llama-instruct.rst:97
msgid ""
"**Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, "
"Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:68
msgid "**Model ID:** TheBloke/CodeLlama-7B-Instruct-GGUF"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:69
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope "
"`_"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:78
msgid "Model Spec 5 (ggufv2, 13 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:83
msgid "**Model ID:** TheBloke/CodeLlama-13B-Instruct-GGUF"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:84
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope "
"`_"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:93
msgid "Model Spec 6 (ggufv2, 34 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:98
msgid "**Model ID:** TheBloke/CodeLlama-34B-Instruct-GGUF"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-instruct.rst:99
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope "
"`_"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/code-llama-python.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/code-llama-python.rst:5
msgid "code-llama-python"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:7
msgid "**Context Length:** 100000"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:8
msgid "**Model Name:** code-llama-python"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:9
msgid "**Languages:** en"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:10
msgid "**Abilities:** generate"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:11
msgid ""
"**Description:** Code-Llama-Python is a fine-tuned version of the Code-"
"Llama LLM, specializing in Python."
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:18
msgid "Model Spec 1 (pytorch, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:20
#: ../../source/models/builtin/llm/code-llama-python.rst:35
#: ../../source/models/builtin/llm/code-llama-python.rst:50
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:21
#: ../../source/models/builtin/llm/code-llama-python.rst:66
msgid "**Model Size (in billions):** 7"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:22
#: ../../source/models/builtin/llm/code-llama-python.rst:37
#: ../../source/models/builtin/llm/code-llama-python.rst:52
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:23
msgid "**Model ID:** TheBloke/CodeLlama-7B-Python-fp16"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope "
"`_"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:26
#: ../../source/models/builtin/llm/code-llama-python.rst:41
#: ../../source/models/builtin/llm/code-llama-python.rst:56
#: ../../source/models/builtin/llm/code-llama-python.rst:71
#: ../../source/models/builtin/llm/code-llama-python.rst:86
#: ../../source/models/builtin/llm/code-llama-python.rst:101
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:33
msgid "Model Spec 2 (pytorch, 13 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:36
#: ../../source/models/builtin/llm/code-llama-python.rst:81
msgid "**Model Size (in billions):** 13"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:38
msgid "**Model ID:** TheBloke/CodeLlama-13B-Python-fp16"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:39
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope `_"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:48
msgid "Model Spec 3 (pytorch, 34 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:51
#: ../../source/models/builtin/llm/code-llama-python.rst:96
msgid "**Model Size (in billions):** 34"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:53
msgid "**Model ID:** TheBloke/CodeLlama-34B-Python-fp16"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:54
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:63
msgid "Model Spec 4 (ggufv2, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:65
#: ../../source/models/builtin/llm/code-llama-python.rst:80
#: ../../source/models/builtin/llm/code-llama-python.rst:95
msgid "**Model Format:** ggufv2"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:67
#: ../../source/models/builtin/llm/code-llama-python.rst:82
#: ../../source/models/builtin/llm/code-llama-python.rst:97
msgid ""
"**Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, "
"Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:68
msgid "**Model ID:** TheBloke/CodeLlama-7B-Python-GGUF"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:69
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope "
"`_"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:78
msgid "Model Spec 5 (ggufv2, 13 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:83
msgid "**Model ID:** TheBloke/CodeLlama-13B-Python-GGUF"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:84
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope "
"`_"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:93
msgid "Model Spec 6 (ggufv2, 34 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:98
msgid "**Model ID:** TheBloke/CodeLlama-34B-Python-GGUF"
msgstr ""

#: ../../source/models/builtin/llm/code-llama-python.rst:99
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/code-llama.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/code-llama.rst:5
msgid "code-llama"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:7
msgid "**Context Length:** 100000"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:8
msgid "**Model Name:** code-llama"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:9
msgid "**Languages:** en"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:10
msgid "**Abilities:** generate"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:11
msgid ""
"**Description:** Code-Llama is an open-source LLM trained by fine-tuning "
"LLaMA2 for generating and discussing code."
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:18
msgid "Model Spec 1 (pytorch, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:20
#: ../../source/models/builtin/llm/code-llama.rst:35
#: ../../source/models/builtin/llm/code-llama.rst:50
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:21
#: ../../source/models/builtin/llm/code-llama.rst:66
msgid "**Model Size (in billions):** 7"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:22
#: ../../source/models/builtin/llm/code-llama.rst:37
#: ../../source/models/builtin/llm/code-llama.rst:52
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:23
msgid "**Model ID:** TheBloke/CodeLlama-7B-fp16"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope `_"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:26
#: ../../source/models/builtin/llm/code-llama.rst:41
#: ../../source/models/builtin/llm/code-llama.rst:56
#: ../../source/models/builtin/llm/code-llama.rst:71
#: ../../source/models/builtin/llm/code-llama.rst:86
#: ../../source/models/builtin/llm/code-llama.rst:101
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:33
msgid "Model Spec 2 (pytorch, 13 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:36
#: ../../source/models/builtin/llm/code-llama.rst:81
msgid "**Model Size (in billions):** 13"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:38
msgid "**Model ID:** TheBloke/CodeLlama-13B-fp16"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:39
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope `_"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:48
msgid "Model Spec 3 (pytorch, 34 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:51
#: ../../source/models/builtin/llm/code-llama.rst:96
msgid "**Model Size (in billions):** 34"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:53
msgid "**Model ID:** TheBloke/CodeLlama-34B-fp16"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:54
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope `_"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:63
msgid "Model Spec 4 (ggufv2, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:65
#: ../../source/models/builtin/llm/code-llama.rst:80
#: ../../source/models/builtin/llm/code-llama.rst:95
msgid "**Model Format:** ggufv2"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:67
#: ../../source/models/builtin/llm/code-llama.rst:82
#: ../../source/models/builtin/llm/code-llama.rst:97
msgid ""
"**Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, "
"Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:68
msgid "**Model ID:** TheBloke/CodeLlama-7B-GGUF"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:69
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:78
msgid "Model Spec 5 (ggufv2, 13 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:83
msgid "**Model ID:** TheBloke/CodeLlama-13B-GGUF"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:84
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:93
msgid "Model Spec 6 (ggufv2, 34 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:98
msgid "**Model ID:** TheBloke/CodeLlama-34B-GGUF"
msgstr ""

#: ../../source/models/builtin/llm/code-llama.rst:99
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""
================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/deepseek-chat.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/deepseek-chat.rst:5
msgid "deepseek-chat"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:7
msgid "**Context Length:** 4096"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:8
msgid "**Model Name:** deepseek-chat"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:9
msgid "**Languages:** en, zh"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:10
msgid "**Abilities:** chat"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:11
msgid ""
"**Description:** DeepSeek LLM is an advanced language model comprising 67"
" billion parameters. It has been trained from scratch on a vast dataset "
"of 2 trillion tokens in both English and Chinese."
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:18
msgid "Model Spec 1 (pytorch, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:20
#: ../../source/models/builtin/llm/deepseek-chat.rst:35
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:21
#: ../../source/models/builtin/llm/deepseek-chat.rst:51
msgid "**Model Size (in billions):** 7"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:22
#: ../../source/models/builtin/llm/deepseek-chat.rst:37
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:23
msgid "**Model ID:** deepseek-ai/deepseek-llm-7b-chat"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope `_"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:26
#: ../../source/models/builtin/llm/deepseek-chat.rst:41
#: ../../source/models/builtin/llm/deepseek-chat.rst:56
#: ../../source/models/builtin/llm/deepseek-chat.rst:71
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:33
msgid "Model Spec 2 (pytorch, 67 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:36
#: ../../source/models/builtin/llm/deepseek-chat.rst:66
msgid "**Model Size (in billions):** 67"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:38
msgid "**Model ID:** deepseek-ai/deepseek-llm-67b-chat"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:39
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope `_"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:48
msgid "Model Spec 3 (ggufv2, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:50
#: ../../source/models/builtin/llm/deepseek-chat.rst:65
msgid "**Model Format:** ggufv2"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:52
#: ../../source/models/builtin/llm/deepseek-chat.rst:67
msgid ""
"**Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, "
"Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:53
msgid "**Model ID:** TheBloke/deepseek-llm-7B-chat-GGUF"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:54
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:63
msgid "Model Spec 4 (ggufv2, 67 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:68
msgid "**Model ID:** TheBloke/deepseek-llm-67b-chat-GGUF"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-chat.rst:69
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/deepseek-coder-instruct.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:5
msgid "deepseek-coder-instruct"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:7
msgid "**Context Length:** 4096"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:8
msgid "**Model Name:** deepseek-coder-instruct"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:9
msgid "**Languages:** en, zh"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:10
msgid "**Abilities:** chat"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:11
msgid ""
"**Description:** deepseek-coder-instruct is a model initialized from "
"deepseek-coder-base and fine-tuned on 2B tokens of instruction data."
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:18
msgid "Model Spec 1 (pytorch, 1_3 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:20
#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:35
#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:50
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:21
#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:66
msgid "**Model Size (in billions):** 1_3"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:22
#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:37
#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:52
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:23
msgid "**Model ID:** deepseek-ai/deepseek-coder-1.3b-instruct"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope "
"`_"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:26
#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:41
#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:56
#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:71
#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:86
#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:101
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:33
msgid "Model Spec 2 (pytorch, 6_7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:36
#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:81
msgid "**Model Size (in billions):** 6_7"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:38
msgid "**Model ID:** deepseek-ai/deepseek-coder-6.7b-instruct"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:39
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope "
"`_"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:48
msgid "Model Spec 3 (pytorch, 33 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:51
#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:96
msgid "**Model Size (in billions):** 33"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:53
msgid "**Model ID:** deepseek-ai/deepseek-coder-33b-instruct"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:54
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope "
"`_"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:63
msgid "Model Spec 4 (ggufv2, 1_3 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:65
#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:80
#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:95
msgid "**Model Format:** ggufv2"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:67
#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:82
#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:97
msgid ""
"**Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, "
"Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:68
msgid "**Model ID:** TheBloke/deepseek-coder-1.3b-instruct-GGUF"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:69
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:78
msgid "Model Spec 5 (ggufv2, 6_7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:83
msgid "**Model ID:** TheBloke/deepseek-coder-6.7B-instruct-GGUF"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:84
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:93
msgid "Model Spec 6 (ggufv2, 33 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:98
msgid "**Model ID:** TheBloke/deepseek-coder-33B-instruct-GGUF"
msgstr ""

#: ../../source/models/builtin/llm/deepseek-coder-instruct.rst:99
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/falcon-instruct.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/falcon-instruct.rst:5 msgid "falcon-instruct" msgstr "" #: ../../source/models/builtin/llm/falcon-instruct.rst:7 msgid "**Context Length:** 2048" msgstr "" #: ../../source/models/builtin/llm/falcon-instruct.rst:8 msgid "**Model Name:** falcon-instruct" msgstr "" #: ../../source/models/builtin/llm/falcon-instruct.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/falcon-instruct.rst:10 msgid "**Abilities:** chat" msgstr "" #: ../../source/models/builtin/llm/falcon-instruct.rst:11 msgid "" "**Description:** Falcon-instruct is a fine-tuned version of the Falcon " "LLM, specializing in chatting." 
msgstr "" #: ../../source/models/builtin/llm/falcon-instruct.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/falcon-instruct.rst:18 msgid "Model Spec 1 (pytorch, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/falcon-instruct.rst:20 #: ../../source/models/builtin/llm/falcon-instruct.rst:35 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/falcon-instruct.rst:21 msgid "**Model Size (in billions):** 7" msgstr "" #: ../../source/models/builtin/llm/falcon-instruct.rst:22 #: ../../source/models/builtin/llm/falcon-instruct.rst:37 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/falcon-instruct.rst:23 msgid "**Model ID:** tiiuae/falcon-7b-instruct" msgstr "" #: ../../source/models/builtin/llm/falcon-instruct.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/falcon-instruct.rst:26 #: ../../source/models/builtin/llm/falcon-instruct.rst:41 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" #: ../../source/models/builtin/llm/falcon-instruct.rst:33 msgid "Model Spec 2 (pytorch, 40 Billion)" msgstr "" #: ../../source/models/builtin/llm/falcon-instruct.rst:36 msgid "**Model Size (in billions):** 40" msgstr "" #: ../../source/models/builtin/llm/falcon-instruct.rst:38 msgid "**Model ID:** tiiuae/falcon-40b-instruct" msgstr "" #: ../../source/models/builtin/llm/falcon-instruct.rst:39 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/falcon.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/falcon.rst:5 msgid "falcon" msgstr "" #: ../../source/models/builtin/llm/falcon.rst:7 msgid "**Context Length:** 2048" msgstr "" #: ../../source/models/builtin/llm/falcon.rst:8 msgid "**Model Name:** falcon" msgstr "" #: ../../source/models/builtin/llm/falcon.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/falcon.rst:10 msgid "**Abilities:** generate" msgstr "" #: ../../source/models/builtin/llm/falcon.rst:11 msgid "" "**Description:** Falcon is an open-source Transformer based LLM trained " "on the RefinedWeb dataset." msgstr "" #: ../../source/models/builtin/llm/falcon.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/falcon.rst:18 msgid "Model Spec 1 (pytorch, 40 Billion)" msgstr "" #: ../../source/models/builtin/llm/falcon.rst:20 #: ../../source/models/builtin/llm/falcon.rst:35 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/falcon.rst:21 msgid "**Model Size (in billions):** 40" msgstr "" #: ../../source/models/builtin/llm/falcon.rst:22 #: ../../source/models/builtin/llm/falcon.rst:37 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/falcon.rst:23 msgid "**Model ID:** tiiuae/falcon-40b" msgstr "" #: ../../source/models/builtin/llm/falcon.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/falcon.rst:26 #: ../../source/models/builtin/llm/falcon.rst:41 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method 
from the options" " listed above::" msgstr "" #: ../../source/models/builtin/llm/falcon.rst:33 msgid "Model Spec 2 (pytorch, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/falcon.rst:36 msgid "**Model Size (in billions):** 7" msgstr "" #: ../../source/models/builtin/llm/falcon.rst:38 msgid "**Model ID:** tiiuae/falcon-7b" msgstr "" #: ../../source/models/builtin/llm/falcon.rst:39 msgid "**Model Hubs**: `Hugging Face `_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/glaive-coder.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/glaive-coder.rst:5 msgid "glaive-coder" msgstr "" #: ../../source/models/builtin/llm/glaive-coder.rst:7 msgid "**Context Length:** 100000" msgstr "" #: ../../source/models/builtin/llm/glaive-coder.rst:8 msgid "**Model Name:** glaive-coder" msgstr "" #: ../../source/models/builtin/llm/glaive-coder.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/glaive-coder.rst:10 msgid "**Abilities:** chat" msgstr "" #: ../../source/models/builtin/llm/glaive-coder.rst:11 msgid "" "**Description:** A code model trained on a dataset of ~140k programming " "related problems and solutions generated from Glaive’s synthetic data " "generation platform." 
msgstr "" #: ../../source/models/builtin/llm/glaive-coder.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/glaive-coder.rst:18 msgid "Model Spec 1 (pytorch, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/glaive-coder.rst:20 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/glaive-coder.rst:21 msgid "**Model Size (in billions):** 7" msgstr "" #: ../../source/models/builtin/llm/glaive-coder.rst:22 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/glaive-coder.rst:23 msgid "**Model ID:** glaiveai/glaive-coder-7b" msgstr "" #: ../../source/models/builtin/llm/glaive-coder.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/glaive-coder.rst:26 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/gorilla-openfunctions-v1.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:5 msgid "gorilla-openfunctions-v1" msgstr "" #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:7 msgid "**Context Length:** 4096" msgstr "" #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:8 msgid "**Model Name:** gorilla-openfunctions-v1" msgstr "" #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:10 msgid "**Abilities:** chat" msgstr "" #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:11 msgid "" "**Description:** OpenFunctions is designed to extend Large Language Model" " (LLM) Chat Completion feature to formulate executable APIs call given " "natural language instructions and API context." 
msgstr "" #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:18 msgid "Model Spec 1 (pytorch, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:20 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:21 #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:36 msgid "**Model Size (in billions):** 7" msgstr "" #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:22 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:23 msgid "**Model ID:** gorilla-llm/gorilla-openfunctions-v1" msgstr "" #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:26 #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:41 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:33 msgid "Model Spec 2 (ggufv2, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:35 msgid "**Model Format:** ggufv2" msgstr "" #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:37 msgid "" "**Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, " "Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0" msgstr "" #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:38 msgid "**Model ID:** TheBloke/gorilla-openfunctions-v1-GGUF" msgstr "" #: ../../source/models/builtin/llm/gorilla-openfunctions-v1.rst:39 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" ================================================ FILE: 
doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/gpt-2.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/gpt-2.rst:5 msgid "gpt-2" msgstr "" #: ../../source/models/builtin/llm/gpt-2.rst:7 msgid "**Context Length:** 1024" msgstr "" #: ../../source/models/builtin/llm/gpt-2.rst:8 msgid "**Model Name:** gpt-2" msgstr "" #: ../../source/models/builtin/llm/gpt-2.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/gpt-2.rst:10 msgid "**Abilities:** generate" msgstr "" #: ../../source/models/builtin/llm/gpt-2.rst:11 msgid "" "**Description:** GPT-2 is a Transformer-based LLM that is trained on " "WebTest, a 40 GB dataset of Reddit posts with 3+ upvotes." 
msgstr "" #: ../../source/models/builtin/llm/gpt-2.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/gpt-2.rst:18 msgid "Model Spec 1 (ggmlv3, 1 Billion)" msgstr "" #: ../../source/models/builtin/llm/gpt-2.rst:20 msgid "**Model Format:** ggmlv3" msgstr "" #: ../../source/models/builtin/llm/gpt-2.rst:21 msgid "**Model Size (in billions):** 1" msgstr "" #: ../../source/models/builtin/llm/gpt-2.rst:22 msgid "**Quantizations:** none" msgstr "" #: ../../source/models/builtin/llm/gpt-2.rst:23 msgid "**Model ID:** marella/gpt-2-ggml" msgstr "" #: ../../source/models/builtin/llm/gpt-2.rst:24 msgid "" "**Model Hubs**: `Hugging Face " "`_" msgstr "" #: ../../source/models/builtin/llm/gpt-2.rst:26 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/index.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-02-01 16:47+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.13.1\n" #: ../../source/models/builtin/llm/index.rst:5 msgid "Large language Models" msgstr "大语言模型" #: ../../source/models/builtin/llm/index.rst:7 msgid "The following is a list of built-in LLM in Xinference:" msgstr "以下是 Xinference 中内置的 LLM 列表:" #: ../../source/models/builtin/llm/index.rst:13 msgid "MODEL NAME" msgstr "" #: ../../source/models/builtin/llm/index.rst:14 msgid "ABILITIES" msgstr "" #: ../../source/models/builtin/llm/index.rst:15 msgid "COTNEXT_LENGTH" msgstr "" #: ../../source/models/builtin/llm/index.rst:16 msgid "DESCRIPTION" msgstr "" #: ../../source/models/builtin/llm/index.rst:19 msgid ":ref:`baichuan `" msgstr "" #: ../../source/models/builtin/llm/index.rst:20 #: ../../source/models/builtin/llm/index.rst:25 #: ../../source/models/builtin/llm/index.rst:65 #: ../../source/models/builtin/llm/index.rst:75 #: ../../source/models/builtin/llm/index.rst:90 #: ../../source/models/builtin/llm/index.rst:110 #: ../../source/models/builtin/llm/index.rst:115 #: ../../source/models/builtin/llm/index.rst:120 #: ../../source/models/builtin/llm/index.rst:140 #: ../../source/models/builtin/llm/index.rst:160 #: ../../source/models/builtin/llm/index.rst:170 #: ../../source/models/builtin/llm/index.rst:185 #: ../../source/models/builtin/llm/index.rst:205 #: ../../source/models/builtin/llm/index.rst:220 #: ../../source/models/builtin/llm/index.rst:225 #: ../../source/models/builtin/llm/index.rst:235 #: ../../source/models/builtin/llm/index.rst:240 #: ../../source/models/builtin/llm/index.rst:245 #: ../../source/models/builtin/llm/index.rst:280 #: ../../source/models/builtin/llm/index.rst:290 #: 
../../source/models/builtin/llm/index.rst:295 msgid "generate" msgstr "" #: ../../source/models/builtin/llm/index.rst:21 #: ../../source/models/builtin/llm/index.rst:26 #: ../../source/models/builtin/llm/index.rst:31 #: ../../source/models/builtin/llm/index.rst:36 #: ../../source/models/builtin/llm/index.rst:81 #: ../../source/models/builtin/llm/index.rst:86 #: ../../source/models/builtin/llm/index.rst:106 #: ../../source/models/builtin/llm/index.rst:131 #: ../../source/models/builtin/llm/index.rst:141 #: ../../source/models/builtin/llm/index.rst:146 #: ../../source/models/builtin/llm/index.rst:196 #: ../../source/models/builtin/llm/index.rst:201 #: ../../source/models/builtin/llm/index.rst:216 #: ../../source/models/builtin/llm/index.rst:221 #: ../../source/models/builtin/llm/index.rst:226 #: ../../source/models/builtin/llm/index.rst:256 #: ../../source/models/builtin/llm/index.rst:291 msgid "4096" msgstr "" #: ../../source/models/builtin/llm/index.rst:22 msgid "" "Baichuan is an open-source Transformer based LLM that is trained on both " "Chinese and English data." msgstr "" #: ../../source/models/builtin/llm/index.rst:24 msgid ":ref:`baichuan-2 `" msgstr "" #: ../../source/models/builtin/llm/index.rst:27 msgid "" "Baichuan2 is an open-source Transformer based LLM that is trained on both" " Chinese and English data." 
msgstr "" #: ../../source/models/builtin/llm/index.rst:29 msgid ":ref:`baichuan-2-chat `" msgstr "" #: ../../source/models/builtin/llm/index.rst:30 #: ../../source/models/builtin/llm/index.rst:35 #: ../../source/models/builtin/llm/index.rst:40 #: ../../source/models/builtin/llm/index.rst:45 #: ../../source/models/builtin/llm/index.rst:50 #: ../../source/models/builtin/llm/index.rst:60 #: ../../source/models/builtin/llm/index.rst:70 #: ../../source/models/builtin/llm/index.rst:80 #: ../../source/models/builtin/llm/index.rst:85 #: ../../source/models/builtin/llm/index.rst:95 #: ../../source/models/builtin/llm/index.rst:100 #: ../../source/models/builtin/llm/index.rst:105 #: ../../source/models/builtin/llm/index.rst:125 #: ../../source/models/builtin/llm/index.rst:130 #: ../../source/models/builtin/llm/index.rst:135 #: ../../source/models/builtin/llm/index.rst:145 #: ../../source/models/builtin/llm/index.rst:150 #: ../../source/models/builtin/llm/index.rst:155 #: ../../source/models/builtin/llm/index.rst:165 #: ../../source/models/builtin/llm/index.rst:175 #: ../../source/models/builtin/llm/index.rst:180 #: ../../source/models/builtin/llm/index.rst:190 #: ../../source/models/builtin/llm/index.rst:195 #: ../../source/models/builtin/llm/index.rst:200 #: ../../source/models/builtin/llm/index.rst:230 #: ../../source/models/builtin/llm/index.rst:250 #: ../../source/models/builtin/llm/index.rst:255 #: ../../source/models/builtin/llm/index.rst:260 #: ../../source/models/builtin/llm/index.rst:265 #: ../../source/models/builtin/llm/index.rst:270 #: ../../source/models/builtin/llm/index.rst:275 #: ../../source/models/builtin/llm/index.rst:285 #: ../../source/models/builtin/llm/index.rst:300 #: ../../source/models/builtin/llm/index.rst:310 #: ../../source/models/builtin/llm/index.rst:315 msgid "chat" msgstr "" #: ../../source/models/builtin/llm/index.rst:32 msgid "" "Baichuan2-chat is a fine-tuned version of the Baichuan LLM, specializing " "in chatting." 
msgstr "" #: ../../source/models/builtin/llm/index.rst:34 msgid ":ref:`baichuan-chat `" msgstr "" #: ../../source/models/builtin/llm/index.rst:37 msgid "" "Baichuan-chat is a fine-tuned version of the Baichuan LLM, specializing " "in chatting." msgstr "" #: ../../source/models/builtin/llm/index.rst:39 msgid ":ref:`chatglm `" msgstr "" #: ../../source/models/builtin/llm/index.rst:41 #: ../../source/models/builtin/llm/index.rst:91 #: ../../source/models/builtin/llm/index.rst:96 #: ../../source/models/builtin/llm/index.rst:176 #: ../../source/models/builtin/llm/index.rst:186 #: ../../source/models/builtin/llm/index.rst:191 #: ../../source/models/builtin/llm/index.rst:206 #: ../../source/models/builtin/llm/index.rst:246 #: ../../source/models/builtin/llm/index.rst:251 #: ../../source/models/builtin/llm/index.rst:271 #: ../../source/models/builtin/llm/index.rst:276 #: ../../source/models/builtin/llm/index.rst:281 #: ../../source/models/builtin/llm/index.rst:286 msgid "2048" msgstr "" #: ../../source/models/builtin/llm/index.rst:42 msgid "" "ChatGLM is an open-source General Language Model (GLM) based LLM trained " "on both Chinese and English data." 
msgstr "" #: ../../source/models/builtin/llm/index.rst:44 msgid ":ref:`chatglm2 `" msgstr "" #: ../../source/models/builtin/llm/index.rst:46 #: ../../source/models/builtin/llm/index.rst:56 #: ../../source/models/builtin/llm/index.rst:121 #: ../../source/models/builtin/llm/index.rst:151 #: ../../source/models/builtin/llm/index.rst:156 #: ../../source/models/builtin/llm/index.rst:161 #: ../../source/models/builtin/llm/index.rst:181 #: ../../source/models/builtin/llm/index.rst:231 #: ../../source/models/builtin/llm/index.rst:236 #: ../../source/models/builtin/llm/index.rst:241 #: ../../source/models/builtin/llm/index.rst:311 #: ../../source/models/builtin/llm/index.rst:316 msgid "8192" msgstr "" #: ../../source/models/builtin/llm/index.rst:47 msgid "" "ChatGLM2 is the second generation of ChatGLM, still open-source and " "trained on Chinese and English data." msgstr "" #: ../../source/models/builtin/llm/index.rst:49 msgid ":ref:`chatglm2-32k `" msgstr "" #: ../../source/models/builtin/llm/index.rst:51 #: ../../source/models/builtin/llm/index.rst:61 #: ../../source/models/builtin/llm/index.rst:166 #: ../../source/models/builtin/llm/index.rst:171 #: ../../source/models/builtin/llm/index.rst:211 msgid "32768" msgstr "" #: ../../source/models/builtin/llm/index.rst:52 msgid "" "ChatGLM2-32k is a special version of ChatGLM2, with a context window of " "32k tokens instead of 8k." msgstr "" #: ../../source/models/builtin/llm/index.rst:54 msgid ":ref:`chatglm3 `" msgstr "" #: ../../source/models/builtin/llm/index.rst:55 #: ../../source/models/builtin/llm/index.rst:210 msgid "chat, tools" msgstr "" #: ../../source/models/builtin/llm/index.rst:57 #: ../../source/models/builtin/llm/index.rst:62 msgid "" "ChatGLM3 is the third generation of ChatGLM, still open-source and " "trained on Chinese and English data." 
msgstr "" #: ../../source/models/builtin/llm/index.rst:59 msgid ":ref:`chatglm3-32k `" msgstr "" #: ../../source/models/builtin/llm/index.rst:64 msgid ":ref:`code-llama `" msgstr "" #: ../../source/models/builtin/llm/index.rst:66 #: ../../source/models/builtin/llm/index.rst:71 #: ../../source/models/builtin/llm/index.rst:76 #: ../../source/models/builtin/llm/index.rst:101 #: ../../source/models/builtin/llm/index.rst:266 msgid "100000" msgstr "" #: ../../source/models/builtin/llm/index.rst:67 msgid "" "Code-Llama is an open-source LLM trained by fine-tuning LLaMA2 for " "generating and discussing code." msgstr "" #: ../../source/models/builtin/llm/index.rst:69 msgid ":ref:`code-llama-instruct `" msgstr "" #: ../../source/models/builtin/llm/index.rst:72 msgid "Code-Llama-Instruct is an instruct-tuned version of the Code-Llama LLM." msgstr "" #: ../../source/models/builtin/llm/index.rst:74 msgid ":ref:`code-llama-python `" msgstr "" #: ../../source/models/builtin/llm/index.rst:77 msgid "" "Code-Llama-Python is a fine-tuned version of the Code-Llama LLM, " "specializing in Python." msgstr "" #: ../../source/models/builtin/llm/index.rst:79 msgid ":ref:`deepseek-chat `" msgstr "" #: ../../source/models/builtin/llm/index.rst:82 msgid "" "DeepSeek LLM is an advanced language model comprising 67 billion " "parameters. It has been trained from scratch on a vast dataset of 2 " "trillion tokens in both English and Chinese." msgstr "" #: ../../source/models/builtin/llm/index.rst:84 msgid ":ref:`deepseek-coder-instruct `" msgstr "" #: ../../source/models/builtin/llm/index.rst:87 msgid "" "deepseek-coder-instruct is a model initialized from deepseek-coder-base " "and fine-tuned on 2B tokens of instruction data." msgstr "" #: ../../source/models/builtin/llm/index.rst:89 msgid ":ref:`falcon `" msgstr "" #: ../../source/models/builtin/llm/index.rst:92 msgid "" "Falcon is an open-source Transformer based LLM trained on the RefinedWeb " "dataset." 
msgstr "" #: ../../source/models/builtin/llm/index.rst:94 msgid ":ref:`falcon-instruct `" msgstr "" #: ../../source/models/builtin/llm/index.rst:97 msgid "" "Falcon-instruct is a fine-tuned version of the Falcon LLM, specializing " "in chatting." msgstr "" #: ../../source/models/builtin/llm/index.rst:99 msgid ":ref:`glaive-coder `" msgstr "" #: ../../source/models/builtin/llm/index.rst:102 msgid "" "A code model trained on a dataset of ~140k programming related problems " "and solutions generated from Glaive’s synthetic data generation platform" "." msgstr "" #: ../../source/models/builtin/llm/index.rst:104 msgid ":ref:`gorilla-openfunctions-v1 `" msgstr "" #: ../../source/models/builtin/llm/index.rst:107 msgid "" "OpenFunctions is designed to extend Large Language Model (LLM) Chat " "Completion feature to formulate executable APIs call given natural " "language instructions and API context." msgstr "" #: ../../source/models/builtin/llm/index.rst:109 msgid ":ref:`gpt-2 `" msgstr "" #: ../../source/models/builtin/llm/index.rst:111 msgid "1024" msgstr "" #: ../../source/models/builtin/llm/index.rst:112 msgid "" "GPT-2 is a Transformer-based LLM that is trained on WebTest, a 40 GB " "dataset of Reddit posts with 3+ upvotes." msgstr "" #: ../../source/models/builtin/llm/index.rst:114 msgid ":ref:`internlm-20b `" msgstr "" #: ../../source/models/builtin/llm/index.rst:116 #: ../../source/models/builtin/llm/index.rst:126 #: ../../source/models/builtin/llm/index.rst:261 msgid "16384" msgstr "" #: ../../source/models/builtin/llm/index.rst:117 msgid "" "Pre-trained on over 2.3T Tokens containing high-quality English, Chinese," " and code data." msgstr "" #: ../../source/models/builtin/llm/index.rst:119 msgid ":ref:`internlm-7b `" msgstr "" #: ../../source/models/builtin/llm/index.rst:122 msgid "" "InternLM is a Transformer-based LLM that is trained on both Chinese and " "English data, focusing on practical scenarios." 
msgstr "" #: ../../source/models/builtin/llm/index.rst:124 msgid ":ref:`internlm-chat-20b `" msgstr "" #: ../../source/models/builtin/llm/index.rst:127 msgid "" "Pre-trained on over 2.3T Tokens containing high-quality English, Chinese," " and code data. The Chat version has undergone SFT and RLHF training." msgstr "" #: ../../source/models/builtin/llm/index.rst:129 msgid ":ref:`internlm-chat-7b `" msgstr "" #: ../../source/models/builtin/llm/index.rst:132 msgid "" "Internlm-chat is a fine-tuned version of the Internlm LLM, specializing " "in chatting." msgstr "" #: ../../source/models/builtin/llm/index.rst:134 msgid ":ref:`internlm2-chat `" msgstr "" #: ../../source/models/builtin/llm/index.rst:136 #: ../../source/models/builtin/llm/index.rst:296 #: ../../source/models/builtin/llm/index.rst:301 #: ../../source/models/builtin/llm/index.rst:306 msgid "204800" msgstr "" #: ../../source/models/builtin/llm/index.rst:137 msgid "The second generation of the InternLM model, InternLM2." msgstr "" #: ../../source/models/builtin/llm/index.rst:139 msgid ":ref:`llama-2 `" msgstr "" #: ../../source/models/builtin/llm/index.rst:142 msgid "" "Llama-2 is the second generation of Llama, open-source and trained on a " "larger amount of data." msgstr "" #: ../../source/models/builtin/llm/index.rst:144 msgid ":ref:`llama-2-chat `" msgstr "" #: ../../source/models/builtin/llm/index.rst:147 msgid "" "Llama-2-Chat is a fine-tuned version of the Llama-2 LLM, specializing in " "chatting." msgstr "" #: ../../source/models/builtin/llm/index.rst:149 msgid ":ref:`mistral-instruct-v0.1 `" msgstr "" #: ../../source/models/builtin/llm/index.rst:152 msgid "" "Mistral-7B-Instruct is a fine-tuned version of the Mistral-7B LLM on " "public datasets, specializing in chatting." 
msgstr "" #: ../../source/models/builtin/llm/index.rst:154 msgid ":ref:`mistral-instruct-v0.2 `" msgstr "" #: ../../source/models/builtin/llm/index.rst:157 msgid "" "The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved " "instruct fine-tuned version of Mistral-7B-Instruct-v0.1." msgstr "" #: ../../source/models/builtin/llm/index.rst:159 msgid ":ref:`mistral-v0.1 `" msgstr "" #: ../../source/models/builtin/llm/index.rst:162 msgid "" "Mistral-7B is a unmoderated Transformer based LLM claiming to outperform " "Llama2 on all benchmarks." msgstr "" #: ../../source/models/builtin/llm/index.rst:164 msgid ":ref:`mixtral-instruct-v0.1 `" msgstr "" #: ../../source/models/builtin/llm/index.rst:167 msgid "" "Mistral-8x7B-Instruct is a fine-tuned version of the Mistral-8x7B LLM, " "specializing in chatting." msgstr "" #: ../../source/models/builtin/llm/index.rst:169 msgid ":ref:`mixtral-v0.1 `" msgstr "" #: ../../source/models/builtin/llm/index.rst:172 msgid "" "The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative " "Sparse Mixture of Experts." msgstr "" #: ../../source/models/builtin/llm/index.rst:174 msgid ":ref:`openbuddy `" msgstr "" #: ../../source/models/builtin/llm/index.rst:177 msgid "" "OpenBuddy is a powerful open multilingual chatbot model aimed at global " "users." msgstr "" #: ../../source/models/builtin/llm/index.rst:179 msgid ":ref:`openhermes-2.5 `" msgstr "" #: ../../source/models/builtin/llm/index.rst:182 msgid "" "Openhermes 2.5 is a fine-tuned version of Mistral-7B-v0.1 on primarily " "GPT-4 generated data." msgstr "" #: ../../source/models/builtin/llm/index.rst:184 msgid ":ref:`opt `" msgstr "" #: ../../source/models/builtin/llm/index.rst:187 msgid "" "Opt is an open-source, decoder-only, Transformer based LLM that was " "designed to replicate GPT-3." 
msgstr "" #: ../../source/models/builtin/llm/index.rst:189 msgid ":ref:`orca `" msgstr "" #: ../../source/models/builtin/llm/index.rst:192 msgid "" "Orca is an LLM trained by fine-tuning LLaMA on explanation traces " "obtained from GPT-4." msgstr "" #: ../../source/models/builtin/llm/index.rst:194 msgid ":ref:`orion-chat `" msgstr "" #: ../../source/models/builtin/llm/index.rst:197 #: ../../source/models/builtin/llm/index.rst:202 msgid "" "Orion-14B series models are open-source multilingual large language " "models trained from scratch by OrionStarAI." msgstr "" #: ../../source/models/builtin/llm/index.rst:199 msgid ":ref:`orion-chat-rag `" msgstr "" #: ../../source/models/builtin/llm/index.rst:204 msgid ":ref:`phi-2 `" msgstr "" #: ../../source/models/builtin/llm/index.rst:207 msgid "" "Phi-2 is a 2.7B Transformer based LLM used for research on model safety, " "trained with data similar to Phi-1.5 but augmented with synthetic texts " "and curated websites." msgstr "" #: ../../source/models/builtin/llm/index.rst:209 msgid ":ref:`qwen-chat `" msgstr "" #: ../../source/models/builtin/llm/index.rst:212 msgid "" "Qwen-chat is a fine-tuned version of the Qwen LLM trained with alignment " "techniques, specializing in chatting." msgstr "" #: ../../source/models/builtin/llm/index.rst:214 msgid ":ref:`qwen-vl-chat `" msgstr "" #: ../../source/models/builtin/llm/index.rst:215 #: ../../source/models/builtin/llm/index.rst:305 msgid "chat, vision" msgstr "" #: ../../source/models/builtin/llm/index.rst:217 msgid "" "Qwen-VL-Chat supports more flexible interaction, such as multiple image " "inputs, multi-round question answering, and creative capabilities." msgstr "" #: ../../source/models/builtin/llm/index.rst:219 msgid ":ref:`skywork `" msgstr "" #: ../../source/models/builtin/llm/index.rst:222 #: ../../source/models/builtin/llm/index.rst:227 msgid "" "Skywork is a series of large models developed by the Kunlun Group · " "Skywork team." 
msgstr ""

#: ../../source/models/builtin/llm/index.rst:224
msgid ":ref:`skywork-math `"
msgstr ""

#: ../../source/models/builtin/llm/index.rst:229
msgid ":ref:`starchat-beta `"
msgstr ""

#: ../../source/models/builtin/llm/index.rst:232
msgid ""
"Starchat-beta is a fine-tuned version of the Starcoderplus LLM, "
"specializing in coding assistance."
msgstr ""

#: ../../source/models/builtin/llm/index.rst:234
msgid ":ref:`starcoder `"
msgstr ""

#: ../../source/models/builtin/llm/index.rst:237
msgid ""
"Starcoder is an open-source Transformer based LLM that is trained on "
"permissively licensed data from GitHub."
msgstr ""

#: ../../source/models/builtin/llm/index.rst:239
msgid ":ref:`starcoderplus `"
msgstr ""

#: ../../source/models/builtin/llm/index.rst:242
msgid ""
"Starcoderplus is an open-source LLM trained by fine-tuning Starcoder on "
"RedefinedWeb and StarCoderData datasets."
msgstr ""

#: ../../source/models/builtin/llm/index.rst:244
msgid ":ref:`tiny-llama `"
msgstr ""

#: ../../source/models/builtin/llm/index.rst:247
msgid ""
"The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion "
"tokens."
msgstr ""

#: ../../source/models/builtin/llm/index.rst:249
msgid ":ref:`vicuna-v1.3 `"
msgstr ""

#: ../../source/models/builtin/llm/index.rst:252
#: ../../source/models/builtin/llm/index.rst:257
msgid ""
"Vicuna is an open-source LLM trained by fine-tuning LLaMA on data "
"collected from ShareGPT."
msgstr ""

#: ../../source/models/builtin/llm/index.rst:254
msgid ":ref:`vicuna-v1.5 `"
msgstr ""

#: ../../source/models/builtin/llm/index.rst:259
msgid ":ref:`vicuna-v1.5-16k `"
msgstr ""

#: ../../source/models/builtin/llm/index.rst:262
msgid ""
"Vicuna-v1.5-16k is a special version of Vicuna-v1.5, with a context "
"window of 16k tokens instead of 4k."
msgstr ""

#: ../../source/models/builtin/llm/index.rst:264
msgid ":ref:`wizardcoder-python-v1.0 `"
msgstr ""

#: ../../source/models/builtin/llm/index.rst:269
msgid ":ref:`wizardlm-v1.0 `"
msgstr ""

#: ../../source/models/builtin/llm/index.rst:272
msgid ""
"WizardLM is an open-source LLM trained by fine-tuning LLaMA with Evol-"
"Instruct."
msgstr ""

#: ../../source/models/builtin/llm/index.rst:274
msgid ":ref:`wizardmath-v1.0 `"
msgstr ""

#: ../../source/models/builtin/llm/index.rst:277
msgid ""
"WizardMath is an open-source LLM trained by fine-tuning Llama2 with Evol-"
"Instruct, specializing in math."
msgstr ""

#: ../../source/models/builtin/llm/index.rst:279
msgid ":ref:`xverse `"
msgstr ""

#: ../../source/models/builtin/llm/index.rst:282
msgid ""
"XVERSE is a multilingual large language model, independently developed by"
" Shenzhen Yuanxiang Technology."
msgstr ""

#: ../../source/models/builtin/llm/index.rst:284
msgid ":ref:`xverse-chat `"
msgstr ""

#: ../../source/models/builtin/llm/index.rst:287
msgid "XVERSEB-Chat is the aligned version of model XVERSE."
msgstr ""

#: ../../source/models/builtin/llm/index.rst:289
msgid ":ref:`yi `"
msgstr ""

#: ../../source/models/builtin/llm/index.rst:292
#: ../../source/models/builtin/llm/index.rst:297
#: ../../source/models/builtin/llm/index.rst:302
msgid ""
"The Yi series models are large language models trained from scratch by "
"developers at 01.AI."
msgstr ""

#: ../../source/models/builtin/llm/index.rst:294
msgid ":ref:`yi-200k `"
msgstr ""

#: ../../source/models/builtin/llm/index.rst:299
msgid ":ref:`yi-chat `"
msgstr ""

#: ../../source/models/builtin/llm/index.rst:304
msgid ":ref:`yi-vl-chat `"
msgstr ""

#: ../../source/models/builtin/llm/index.rst:307
msgid ""
"Yi Vision Language (Yi-VL) model is the open-source, multimodal version "
"of the Yi Large Language Model (LLM) series, enabling content "
"comprehension, recognition, and multi-round conversations about images."
msgstr ""

#: ../../source/models/builtin/llm/index.rst:309
msgid ":ref:`zephyr-7b-alpha `"
msgstr ""

#: ../../source/models/builtin/llm/index.rst:312
msgid ""
"Zephyr-7B-α is the first model in the series, and is a fine-tuned "
"version of mistralai/Mistral-7B-v0.1."
msgstr ""

#: ../../source/models/builtin/llm/index.rst:314
msgid ":ref:`zephyr-7b-beta `"
msgstr ""

#: ../../source/models/builtin/llm/index.rst:317
msgid ""
"Zephyr-7B-β is the second model in the series, and is a fine-tuned "
"version of mistralai/Mistral-7B-v0.1"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/internlm-20b.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/internlm-20b.rst:5
msgid "internlm-20b"
msgstr ""

#: ../../source/models/builtin/llm/internlm-20b.rst:7
msgid "**Context Length:** 16384"
msgstr ""

#: ../../source/models/builtin/llm/internlm-20b.rst:8
msgid "**Model Name:** internlm-20b"
msgstr ""

#: ../../source/models/builtin/llm/internlm-20b.rst:9
msgid "**Languages:** en, zh"
msgstr ""

#: ../../source/models/builtin/llm/internlm-20b.rst:10
msgid "**Abilities:** generate"
msgstr ""

#: ../../source/models/builtin/llm/internlm-20b.rst:11
msgid ""
"**Description:** Pre-trained on over 2.3T Tokens containing high-quality "
"English, Chinese, and code data."
msgstr ""

#: ../../source/models/builtin/llm/internlm-20b.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/internlm-20b.rst:18
msgid "Model Spec 1 (pytorch, 20 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/internlm-20b.rst:20
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/internlm-20b.rst:21
msgid "**Model Size (in billions):** 20"
msgstr ""

#: ../../source/models/builtin/llm/internlm-20b.rst:22
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/internlm-20b.rst:23
msgid "**Model ID:** internlm/internlm-20b"
msgstr ""

#: ../../source/models/builtin/llm/internlm-20b.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope `_"
msgstr ""

#: ../../source/models/builtin/llm/internlm-20b.rst:26
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/internlm-7b.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/internlm-7b.rst:5
msgid "internlm-7b"
msgstr ""

#: ../../source/models/builtin/llm/internlm-7b.rst:7
msgid "**Context Length:** 8192"
msgstr ""

#: ../../source/models/builtin/llm/internlm-7b.rst:8
msgid "**Model Name:** internlm-7b"
msgstr ""

#: ../../source/models/builtin/llm/internlm-7b.rst:9
msgid "**Languages:** en, zh"
msgstr ""

#: ../../source/models/builtin/llm/internlm-7b.rst:10
msgid "**Abilities:** generate"
msgstr ""

#: ../../source/models/builtin/llm/internlm-7b.rst:11
msgid ""
"**Description:** InternLM is a Transformer-based LLM that is trained on "
"both Chinese and English data, focusing on practical scenarios."
msgstr ""

#: ../../source/models/builtin/llm/internlm-7b.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/internlm-7b.rst:18
msgid "Model Spec 1 (pytorch, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/internlm-7b.rst:20
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/internlm-7b.rst:21
msgid "**Model Size (in billions):** 7"
msgstr ""

#: ../../source/models/builtin/llm/internlm-7b.rst:22
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/internlm-7b.rst:23
msgid "**Model ID:** internlm/internlm-7b"
msgstr ""

#: ../../source/models/builtin/llm/internlm-7b.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope `_"
msgstr ""

#: ../../source/models/builtin/llm/internlm-7b.rst:26
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/internlm-chat-20b.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/internlm-chat-20b.rst:5
msgid "internlm-chat-20b"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-20b.rst:7
msgid "**Context Length:** 16384"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-20b.rst:8
msgid "**Model Name:** internlm-chat-20b"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-20b.rst:9
msgid "**Languages:** en, zh"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-20b.rst:10
msgid "**Abilities:** chat"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-20b.rst:11
msgid ""
"**Description:** Pre-trained on over 2.3T Tokens containing high-quality "
"English, Chinese, and code data. The Chat version has undergone SFT and "
"RLHF training."
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-20b.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-20b.rst:18
msgid "Model Spec 1 (pytorch, 20 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-20b.rst:20
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-20b.rst:21
msgid "**Model Size (in billions):** 20"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-20b.rst:22
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-20b.rst:23
msgid "**Model ID:** internlm/internlm-chat-20b"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-20b.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope "
"`_"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-20b.rst:26
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/internlm-chat-7b.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/internlm-chat-7b.rst:5
msgid "internlm-chat-7b"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-7b.rst:7
msgid "**Context Length:** 4096"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-7b.rst:8
msgid "**Model Name:** internlm-chat-7b"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-7b.rst:9
msgid "**Languages:** en, zh"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-7b.rst:10
msgid "**Abilities:** chat"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-7b.rst:11
msgid ""
"**Description:** Internlm-chat is a fine-tuned version of the Internlm "
"LLM, specializing in chatting."
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-7b.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-7b.rst:18
msgid "Model Spec 1 (pytorch, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-7b.rst:20
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-7b.rst:21
msgid "**Model Size (in billions):** 7"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-7b.rst:22
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-7b.rst:23
msgid "**Model ID:** internlm/internlm-chat-7b"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-7b.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope "
"`_"
msgstr ""

#: ../../source/models/builtin/llm/internlm-chat-7b.rst:26
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/llama-2-chat.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/llama-2-chat.rst:5
msgid "llama-2-chat"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:7
msgid "**Context Length:** 4096"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:8
msgid "**Model Name:** llama-2-chat"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:9
msgid "**Languages:** en"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:10
msgid "**Abilities:** chat"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:11
msgid ""
"**Description:** Llama-2-Chat is a fine-tuned version of the Llama-2 LLM,"
" specializing in chatting."
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:18
msgid "Model Spec 1 (ggmlv3, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:20
#: ../../source/models/builtin/llm/llama-2-chat.rst:35
#: ../../source/models/builtin/llm/llama-2-chat.rst:50
msgid "**Model Format:** ggmlv3"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:21
#: ../../source/models/builtin/llm/llama-2-chat.rst:66
msgid "**Model Size (in billions):** 7"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:22
#: ../../source/models/builtin/llm/llama-2-chat.rst:37
#: ../../source/models/builtin/llm/llama-2-chat.rst:52
msgid ""
"**Quantizations:** q2_K, q3_K_L, q3_K_M, q3_K_S, q4_0, q4_1, q4_K_M, "
"q4_K_S, q5_0, q5_1, q5_K_M, q5_K_S, q6_K, q8_0"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:23
msgid "**Model ID:** TheBloke/Llama-2-7B-Chat-GGML"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:26
#: ../../source/models/builtin/llm/llama-2-chat.rst:41
#: ../../source/models/builtin/llm/llama-2-chat.rst:56
#: ../../source/models/builtin/llm/llama-2-chat.rst:71
#: ../../source/models/builtin/llm/llama-2-chat.rst:86
#: ../../source/models/builtin/llm/llama-2-chat.rst:101
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:33
msgid "Model Spec 2 (ggmlv3, 13 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:36
#: ../../source/models/builtin/llm/llama-2-chat.rst:81
msgid "**Model Size (in billions):** 13"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:38
msgid "**Model ID:** TheBloke/Llama-2-13B-chat-GGML"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:39
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:48
msgid "Model Spec 3 (ggmlv3, 70 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:51
#: ../../source/models/builtin/llm/llama-2-chat.rst:96
msgid "**Model Size (in billions):** 70"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:53
msgid "**Model ID:** TheBloke/Llama-2-70B-Chat-GGML"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:54
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:63
msgid "Model Spec 4 (pytorch, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:65
#: ../../source/models/builtin/llm/llama-2-chat.rst:80
#: ../../source/models/builtin/llm/llama-2-chat.rst:95
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:67
#: ../../source/models/builtin/llm/llama-2-chat.rst:82
#: ../../source/models/builtin/llm/llama-2-chat.rst:97
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:68
msgid "**Model ID:** meta-llama/Llama-2-7b-chat-hf"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:69
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope "
"`_"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:78
msgid "Model Spec 5 (pytorch, 13 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:83
msgid "**Model ID:** meta-llama/Llama-2-13b-chat-hf"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:84
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope "
"`_"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:93
msgid "Model Spec 6 (pytorch, 70 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:98
msgid "**Model ID:** meta-llama/Llama-2-70b-chat-hf"
msgstr ""

#: ../../source/models/builtin/llm/llama-2-chat.rst:99
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope "
"`_"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/llama-2.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/llama-2.rst:5
msgid "llama-2"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:7
msgid "**Context Length:** 4096"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:8
msgid "**Model Name:** llama-2"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:9
msgid "**Languages:** en"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:10
msgid "**Abilities:** generate"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:11
msgid ""
"**Description:** Llama-2 is the second generation of Llama, open-source "
"and trained on a larger amount of data."
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:18
msgid "Model Spec 1 (ggmlv3, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:20
#: ../../source/models/builtin/llm/llama-2.rst:35
#: ../../source/models/builtin/llm/llama-2.rst:50
msgid "**Model Format:** ggmlv3"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:21
#: ../../source/models/builtin/llm/llama-2.rst:66
msgid "**Model Size (in billions):** 7"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:22
#: ../../source/models/builtin/llm/llama-2.rst:37
#: ../../source/models/builtin/llm/llama-2.rst:52
msgid ""
"**Quantizations:** q2_K, q3_K_L, q3_K_M, q3_K_S, q4_0, q4_1, q4_K_M, "
"q4_K_S, q5_0, q5_1, q5_K_M, q5_K_S, q6_K, q8_0"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:23
msgid "**Model ID:** TheBloke/Llama-2-7B-GGML"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:26
#: ../../source/models/builtin/llm/llama-2.rst:41
#: ../../source/models/builtin/llm/llama-2.rst:56
#: ../../source/models/builtin/llm/llama-2.rst:71
#: ../../source/models/builtin/llm/llama-2.rst:86
#: ../../source/models/builtin/llm/llama-2.rst:101
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:33
msgid "Model Spec 2 (ggmlv3, 13 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:36
#: ../../source/models/builtin/llm/llama-2.rst:81
msgid "**Model Size (in billions):** 13"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:38
msgid "**Model ID:** TheBloke/Llama-2-13B-GGML"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:39
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:48
msgid "Model Spec 3 (ggmlv3, 70 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:51
#: ../../source/models/builtin/llm/llama-2.rst:96
msgid "**Model Size (in billions):** 70"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:53
msgid "**Model ID:** TheBloke/Llama-2-70B-GGML"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:54
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:63
msgid "Model Spec 4 (pytorch, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:65
#: ../../source/models/builtin/llm/llama-2.rst:80
#: ../../source/models/builtin/llm/llama-2.rst:95
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:67
#: ../../source/models/builtin/llm/llama-2.rst:82
#: ../../source/models/builtin/llm/llama-2.rst:97
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:68
msgid "**Model ID:** meta-llama/Llama-2-7b-hf"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:69
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:78
msgid "Model Spec 5 (pytorch, 13 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:83
msgid "**Model ID:** meta-llama/Llama-2-13b-hf"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:84
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:93
msgid "Model Spec 6 (pytorch, 70 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:98
msgid "**Model ID:** meta-llama/Llama-2-70b-hf"
msgstr ""

#: ../../source/models/builtin/llm/llama-2.rst:99
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/mistral-instruct-v0.1.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:5
msgid "mistral-instruct-v0.1"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:7
msgid "**Context Length:** 8192"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:8
msgid "**Model Name:** mistral-instruct-v0.1"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:9
msgid "**Languages:** en"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:10
msgid "**Abilities:** chat"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:11
msgid ""
"**Description:** Mistral-7B-Instruct is a fine-tuned version of the "
"Mistral-7B LLM on public datasets, specializing in chatting."
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:18
msgid "Model Spec 1 (pytorch, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:20
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:21
#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:36
msgid "**Model Size (in billions):** 7"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:22
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:23
msgid "**Model ID:** mistralai/Mistral-7B-Instruct-v0.1"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope `_"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:26
#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:41
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:33
msgid "Model Spec 2 (ggufv2, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:35
msgid "**Model Format:** ggufv2"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:37
msgid ""
"**Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, "
"Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:38
msgid "**Model ID:** TheBloke/Mistral-7B-Instruct-v0.1-GGUF"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.1.rst:39
msgid ""
"**Model Hubs**: `Hugging Face `_"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/mistral-instruct-v0.2.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:5
msgid "mistral-instruct-v0.2"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:7
msgid "**Context Length:** 8192"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:8
msgid "**Model Name:** mistral-instruct-v0.2"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:9
msgid "**Languages:** en"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:10
msgid "**Abilities:** chat"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:11
msgid ""
"**Description:** The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) "
"is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1."
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:14
msgid "Specifications"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:18
msgid "Model Spec 1 (pytorch, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:20
msgid "**Model Format:** pytorch"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:21
#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:36
msgid "**Model Size (in billions):** 7"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:22
msgid "**Quantizations:** 4-bit, 8-bit, none"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:23
msgid "**Model ID:** mistralai/Mistral-7B-Instruct-v0.2"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:24
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope `_"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:26
#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:41
msgid ""
"Execute the following command to launch the model, remember to replace "
"``${quantization}`` with your chosen quantization method from the options"
" listed above::"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:33
msgid "Model Spec 2 (ggufv2, 7 Billion)"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:35
msgid "**Model Format:** ggufv2"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:37
msgid ""
"**Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, "
"Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:38
msgid "**Model ID:** TheBloke/Mistral-7B-Instruct-v0.2-GGUF"
msgstr ""

#: ../../source/models/builtin/llm/mistral-instruct-v0.2.rst:39
msgid ""
"**Model Hubs**: `Hugging Face `_, `ModelScope "
"`_"
msgstr ""

================================================
FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/mistral-v0.1.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Xorbits Inc.
# This file is distributed under the same license as the Xinference package.
# FIRST AUTHOR , 2023.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-01-03 15:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../source/models/builtin/llm/mistral-v0.1.rst:5
msgid "mistral-v0.1"
msgstr ""

#: ../../source/models/builtin/llm/mistral-v0.1.rst:7
msgid "**Context Length:** 8192"
msgstr ""

#: ../../source/models/builtin/llm/mistral-v0.1.rst:8
msgid "**Model Name:** mistral-v0.1"
msgstr ""

#: ../../source/models/builtin/llm/mistral-v0.1.rst:9
msgid "**Languages:** en"
msgstr ""

#: ../../source/models/builtin/llm/mistral-v0.1.rst:10
msgid "**Abilities:** generate"
msgstr ""

#: ../../source/models/builtin/llm/mistral-v0.1.rst:11
msgid ""
"**Description:** Mistral-7B is a unmoderated Transformer based LLM "
"claiming to outperform Llama2 on all benchmarks."
msgstr "" #: ../../source/models/builtin/llm/mistral-v0.1.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/mistral-v0.1.rst:18 msgid "Model Spec 1 (pytorch, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/mistral-v0.1.rst:20 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/mistral-v0.1.rst:21 #: ../../source/models/builtin/llm/mistral-v0.1.rst:36 msgid "**Model Size (in billions):** 7" msgstr "" #: ../../source/models/builtin/llm/mistral-v0.1.rst:22 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/mistral-v0.1.rst:23 msgid "**Model ID:** mistralai/Mistral-7B-v0.1" msgstr "" #: ../../source/models/builtin/llm/mistral-v0.1.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/mistral-v0.1.rst:26 #: ../../source/models/builtin/llm/mistral-v0.1.rst:41 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" #: ../../source/models/builtin/llm/mistral-v0.1.rst:33 msgid "Model Spec 2 (ggufv2, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/mistral-v0.1.rst:35 msgid "**Model Format:** ggufv2" msgstr "" #: ../../source/models/builtin/llm/mistral-v0.1.rst:37 msgid "" "**Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, " "Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0" msgstr "" #: ../../source/models/builtin/llm/mistral-v0.1.rst:38 msgid "**Model ID:** TheBloke/Mistral-7B-v0.1-GGUF" msgstr "" #: ../../source/models/builtin/llm/mistral-v0.1.rst:39 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/mixtral-instruct-v0.1.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. 
# This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:5 msgid "mixtral-instruct-v0.1" msgstr "" #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:7 msgid "**Context Length:** 32768" msgstr "" #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:8 msgid "**Model Name:** mixtral-instruct-v0.1" msgstr "" #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:9 msgid "**Languages:** en, fr, it, de, es" msgstr "" #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:10 msgid "**Abilities:** chat" msgstr "" #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:11 msgid "" "**Description:** Mistral-8x7B-Instruct is a fine-tuned version of the " "Mistral-8x7B LLM, specializing in chatting." 
msgstr "" #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:18 msgid "Model Spec 1 (pytorch, 46_7 Billion)" msgstr "" #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:20 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:21 #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:36 msgid "**Model Size (in billions):** 46_7" msgstr "" #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:22 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:23 msgid "**Model ID:** mistralai/Mixtral-8x7B-Instruct-v0.1" msgstr "" #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:26 #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:41 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:33 msgid "Model Spec 2 (ggufv2, 46_7 Billion)" msgstr "" #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:35 msgid "**Model Format:** ggufv2" msgstr "" #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:37 msgid "**Quantizations:** Q2_K, Q3_K_M, Q4_0, Q4_K_M, Q5_0, Q5_K_M, Q6_K, Q8_0" msgstr "" #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:38 msgid "**Model ID:** TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF" msgstr "" #: ../../source/models/builtin/llm/mixtral-instruct-v0.1.rst:39 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/mixtral-v0.1.po 
================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/mixtral-v0.1.rst:5 msgid "mixtral-v0.1" msgstr "" #: ../../source/models/builtin/llm/mixtral-v0.1.rst:7 msgid "**Context Length:** 32768" msgstr "" #: ../../source/models/builtin/llm/mixtral-v0.1.rst:8 msgid "**Model Name:** mixtral-v0.1" msgstr "" #: ../../source/models/builtin/llm/mixtral-v0.1.rst:9 msgid "**Languages:** en, fr, it, de, es" msgstr "" #: ../../source/models/builtin/llm/mixtral-v0.1.rst:10 msgid "**Abilities:** generate" msgstr "" #: ../../source/models/builtin/llm/mixtral-v0.1.rst:11 msgid "" "**Description:** The Mixtral-8x7B Large Language Model (LLM) is a " "pretrained generative Sparse Mixture of Experts." 
msgstr "" #: ../../source/models/builtin/llm/mixtral-v0.1.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/mixtral-v0.1.rst:18 msgid "Model Spec 1 (pytorch, 46_7 Billion)" msgstr "" #: ../../source/models/builtin/llm/mixtral-v0.1.rst:20 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/mixtral-v0.1.rst:21 #: ../../source/models/builtin/llm/mixtral-v0.1.rst:36 msgid "**Model Size (in billions):** 46_7" msgstr "" #: ../../source/models/builtin/llm/mixtral-v0.1.rst:22 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/mixtral-v0.1.rst:23 msgid "**Model ID:** mistralai/Mixtral-8x7B-v0.1" msgstr "" #: ../../source/models/builtin/llm/mixtral-v0.1.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/mixtral-v0.1.rst:26 #: ../../source/models/builtin/llm/mixtral-v0.1.rst:41 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" #: ../../source/models/builtin/llm/mixtral-v0.1.rst:33 msgid "Model Spec 2 (ggufv2, 46_7 Billion)" msgstr "" #: ../../source/models/builtin/llm/mixtral-v0.1.rst:35 msgid "**Model Format:** ggufv2" msgstr "" #: ../../source/models/builtin/llm/mixtral-v0.1.rst:37 msgid "**Quantizations:** Q2_K, Q3_K_M, Q4_0, Q4_K_M, Q5_0, Q5_K_M, Q6_K, Q8_0" msgstr "" #: ../../source/models/builtin/llm/mixtral-v0.1.rst:38 msgid "**Model ID:** TheBloke/Mixtral-8x7B-v0.1-GGUF" msgstr "" #: ../../source/models/builtin/llm/mixtral-v0.1.rst:39 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/openbuddy.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. 
# This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/openbuddy.rst:5 msgid "OpenBuddy" msgstr "" #: ../../source/models/builtin/llm/openbuddy.rst:7 msgid "**Context Length:** 2048" msgstr "" #: ../../source/models/builtin/llm/openbuddy.rst:8 msgid "**Model Name:** OpenBuddy" msgstr "" #: ../../source/models/builtin/llm/openbuddy.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/openbuddy.rst:10 msgid "**Abilities:** chat" msgstr "" #: ../../source/models/builtin/llm/openbuddy.rst:11 msgid "" "**Description:** OpenBuddy is a powerful open multilingual chatbot model " "aimed at global users." 
msgstr "" #: ../../source/models/builtin/llm/openbuddy.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/openbuddy.rst:18 msgid "Model Spec 1 (ggmlv3, 13 Billion)" msgstr "" #: ../../source/models/builtin/llm/openbuddy.rst:20 msgid "**Model Format:** ggmlv3" msgstr "" #: ../../source/models/builtin/llm/openbuddy.rst:21 msgid "**Model Size (in billions):** 13" msgstr "" #: ../../source/models/builtin/llm/openbuddy.rst:22 msgid "" "**Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_1, Q4_K_S, " "Q4_K_M, Q5_0, Q5_1, Q5_K_S, Q5_K_M, Q6_K, Q8_0" msgstr "" #: ../../source/models/builtin/llm/openbuddy.rst:23 msgid "**Model ID:** TheBloke/OpenBuddy-Llama2-13B-v11.1-GGML" msgstr "" #: ../../source/models/builtin/llm/openbuddy.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope " "`_" msgstr "" #: ../../source/models/builtin/llm/openbuddy.rst:26 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/openhermes-2.5.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/openhermes-2.5.rst:5 msgid "openhermes-2.5" msgstr "" #: ../../source/models/builtin/llm/openhermes-2.5.rst:7 msgid "**Context Length:** 8192" msgstr "" #: ../../source/models/builtin/llm/openhermes-2.5.rst:8 msgid "**Model Name:** openhermes-2.5" msgstr "" #: ../../source/models/builtin/llm/openhermes-2.5.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/openhermes-2.5.rst:10 msgid "**Abilities:** chat" msgstr "" #: ../../source/models/builtin/llm/openhermes-2.5.rst:11 msgid "" "**Description:** Openhermes 2.5 is a fine-tuned version of Mistral-" "7B-v0.1 on primarily GPT-4 generated data." 
msgstr "" #: ../../source/models/builtin/llm/openhermes-2.5.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/openhermes-2.5.rst:18 msgid "Model Spec 1 (pytorch, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/openhermes-2.5.rst:20 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/openhermes-2.5.rst:21 #: ../../source/models/builtin/llm/openhermes-2.5.rst:36 msgid "**Model Size (in billions):** 7" msgstr "" #: ../../source/models/builtin/llm/openhermes-2.5.rst:22 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/openhermes-2.5.rst:23 msgid "**Model ID:** teknium/OpenHermes-2.5-Mistral-7B" msgstr "" #: ../../source/models/builtin/llm/openhermes-2.5.rst:24 msgid "" "**Model Hubs**: `Hugging Face " "`_" msgstr "" #: ../../source/models/builtin/llm/openhermes-2.5.rst:26 #: ../../source/models/builtin/llm/openhermes-2.5.rst:41 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" #: ../../source/models/builtin/llm/openhermes-2.5.rst:33 msgid "Model Spec 2 (ggufv2, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/openhermes-2.5.rst:35 msgid "**Model Format:** ggufv2" msgstr "" #: ../../source/models/builtin/llm/openhermes-2.5.rst:37 msgid "" "**Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, " "Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0" msgstr "" #: ../../source/models/builtin/llm/openhermes-2.5.rst:38 msgid "**Model ID:** TheBloke/OpenHermes-2.5-Mistral-7B-GGUF" msgstr "" #: ../../source/models/builtin/llm/openhermes-2.5.rst:39 msgid "" "**Model Hubs**: `Hugging Face " "`_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/opt.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. 
# This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/opt.rst:5 msgid "opt" msgstr "" #: ../../source/models/builtin/llm/opt.rst:7 msgid "**Context Length:** 2048" msgstr "" #: ../../source/models/builtin/llm/opt.rst:8 msgid "**Model Name:** opt" msgstr "" #: ../../source/models/builtin/llm/opt.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/opt.rst:10 msgid "**Abilities:** generate" msgstr "" #: ../../source/models/builtin/llm/opt.rst:11 msgid "" "**Description:** Opt is an open-source, decoder-only, Transformer based " "LLM that was designed to replicate GPT-3." 
msgstr "" #: ../../source/models/builtin/llm/opt.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/opt.rst:18 msgid "Model Spec 1 (pytorch, 1 Billion)" msgstr "" #: ../../source/models/builtin/llm/opt.rst:20 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/opt.rst:21 msgid "**Model Size (in billions):** 1" msgstr "" #: ../../source/models/builtin/llm/opt.rst:22 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/opt.rst:23 msgid "**Model ID:** facebook/opt-125m" msgstr "" #: ../../source/models/builtin/llm/opt.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/opt.rst:26 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/orca.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/orca.rst:5 msgid "orca" msgstr "" #: ../../source/models/builtin/llm/orca.rst:7 msgid "**Context Length:** 2048" msgstr "" #: ../../source/models/builtin/llm/orca.rst:8 msgid "**Model Name:** orca" msgstr "" #: ../../source/models/builtin/llm/orca.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/orca.rst:10 msgid "**Abilities:** chat" msgstr "" #: ../../source/models/builtin/llm/orca.rst:11 msgid "" "**Description:** Orca is an LLM trained by fine-tuning LLaMA on " "explanation traces obtained from GPT-4." msgstr "" #: ../../source/models/builtin/llm/orca.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/orca.rst:18 msgid "Model Spec 1 (ggmlv3, 3 Billion)" msgstr "" #: ../../source/models/builtin/llm/orca.rst:20 #: ../../source/models/builtin/llm/orca.rst:35 #: ../../source/models/builtin/llm/orca.rst:50 msgid "**Model Format:** ggmlv3" msgstr "" #: ../../source/models/builtin/llm/orca.rst:21 msgid "**Model Size (in billions):** 3" msgstr "" #: ../../source/models/builtin/llm/orca.rst:22 #: ../../source/models/builtin/llm/orca.rst:37 #: ../../source/models/builtin/llm/orca.rst:52 msgid "**Quantizations:** q4_0, q4_1, q5_0, q5_1, q8_0" msgstr "" #: ../../source/models/builtin/llm/orca.rst:23 msgid "**Model ID:** TheBloke/orca_mini_3B-GGML" msgstr "" #: ../../source/models/builtin/llm/orca.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/orca.rst:26 #: ../../source/models/builtin/llm/orca.rst:41 #: ../../source/models/builtin/llm/orca.rst:56 msgid "" "Execute the 
following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" #: ../../source/models/builtin/llm/orca.rst:33 msgid "Model Spec 2 (ggmlv3, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/orca.rst:36 msgid "**Model Size (in billions):** 7" msgstr "" #: ../../source/models/builtin/llm/orca.rst:38 msgid "**Model ID:** TheBloke/orca_mini_7B-GGML" msgstr "" #: ../../source/models/builtin/llm/orca.rst:39 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/orca.rst:48 msgid "Model Spec 3 (ggmlv3, 13 Billion)" msgstr "" #: ../../source/models/builtin/llm/orca.rst:51 msgid "**Model Size (in billions):** 13" msgstr "" #: ../../source/models/builtin/llm/orca.rst:53 msgid "**Model ID:** TheBloke/orca_mini_13B-GGML" msgstr "" #: ../../source/models/builtin/llm/orca.rst:54 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/qwen-chat.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/qwen-chat.rst:5 msgid "qwen-chat" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:7 msgid "**Context Length:** 2048" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:8 msgid "**Model Name:** qwen-chat" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:9 msgid "**Languages:** en, zh" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:10 msgid "**Abilities:** chat" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:11 msgid "" "**Description:** Qwen-chat is a fine-tuned version of the Qwen LLM " "trained with alignment techniques, specializing in chatting." 
msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:18 msgid "Model Spec 1 (ggufv2, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:20 #: ../../source/models/builtin/llm/qwen-chat.rst:35 msgid "**Model Format:** ggufv2" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:21 #: ../../source/models/builtin/llm/qwen-chat.rst:66 #: ../../source/models/builtin/llm/qwen-chat.rst:111 msgid "**Model Size (in billions):** 7" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:22 #: ../../source/models/builtin/llm/qwen-chat.rst:37 msgid "**Quantizations:** Q4_K_M" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:23 msgid "**Model ID:** Xorbits/Qwen-7B-Chat-GGUF" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:26 #: ../../source/models/builtin/llm/qwen-chat.rst:41 #: ../../source/models/builtin/llm/qwen-chat.rst:56 #: ../../source/models/builtin/llm/qwen-chat.rst:71 #: ../../source/models/builtin/llm/qwen-chat.rst:86 #: ../../source/models/builtin/llm/qwen-chat.rst:101 #: ../../source/models/builtin/llm/qwen-chat.rst:116 #: ../../source/models/builtin/llm/qwen-chat.rst:131 #: ../../source/models/builtin/llm/qwen-chat.rst:146 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:33 msgid "Model Spec 2 (ggufv2, 14 Billion)" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:36 #: ../../source/models/builtin/llm/qwen-chat.rst:81 #: ../../source/models/builtin/llm/qwen-chat.rst:126 msgid "**Model Size (in billions):** 14" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:38 msgid "**Model ID:** 
Xorbits/Qwen-14B-Chat-GGUF" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:39 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:48 msgid "Model Spec 3 (pytorch, 1_8 Billion)" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:50 #: ../../source/models/builtin/llm/qwen-chat.rst:65 #: ../../source/models/builtin/llm/qwen-chat.rst:80 #: ../../source/models/builtin/llm/qwen-chat.rst:95 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:51 msgid "**Model Size (in billions):** 1_8" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:52 #: ../../source/models/builtin/llm/qwen-chat.rst:67 #: ../../source/models/builtin/llm/qwen-chat.rst:82 #: ../../source/models/builtin/llm/qwen-chat.rst:97 msgid "**Quantizations:** none" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:53 msgid "**Model ID:** Qwen/Qwen-1_8B-Chat" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:54 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:63 msgid "Model Spec 4 (pytorch, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:68 msgid "**Model ID:** Qwen/Qwen-7B-Chat" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:69 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:78 msgid "Model Spec 5 (pytorch, 14 Billion)" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:83 msgid "**Model ID:** Qwen/Qwen-14B-Chat" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:84 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:93 msgid "Model Spec 6 (pytorch, 72 Billion)" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:96 #: ../../source/models/builtin/llm/qwen-chat.rst:141 msgid "**Model Size (in 
billions):** 72" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:98 msgid "**Model ID:** Qwen/Qwen-72B-Chat" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:99 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:108 msgid "Model Spec 7 (gptq, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:110 #: ../../source/models/builtin/llm/qwen-chat.rst:125 #: ../../source/models/builtin/llm/qwen-chat.rst:140 msgid "**Model Format:** gptq" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:112 #: ../../source/models/builtin/llm/qwen-chat.rst:127 #: ../../source/models/builtin/llm/qwen-chat.rst:142 msgid "**Quantizations:** Int4, Int8" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:113 msgid "**Model ID:** Qwen/Qwen-7B-Chat-{quantization}" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:114 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:123 msgid "Model Spec 8 (gptq, 14 Billion)" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:128 msgid "**Model ID:** Qwen/Qwen-14B-Chat-{quantization}" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:129 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:138 msgid "Model Spec 9 (gptq, 72 Billion)" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:143 msgid "**Model ID:** Qwen/Qwen-72B-Chat-{quantization}" msgstr "" #: ../../source/models/builtin/llm/qwen-chat.rst:144 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/starchat-beta.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. 
# This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/starchat-beta.rst:5 msgid "starchat-beta" msgstr "" #: ../../source/models/builtin/llm/starchat-beta.rst:7 msgid "**Context Length:** 8192" msgstr "" #: ../../source/models/builtin/llm/starchat-beta.rst:8 msgid "**Model Name:** starchat-beta" msgstr "" #: ../../source/models/builtin/llm/starchat-beta.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/starchat-beta.rst:10 msgid "**Abilities:** chat" msgstr "" #: ../../source/models/builtin/llm/starchat-beta.rst:11 msgid "" "**Description:** Starchat-beta is a fine-tuned version of the " "Starcoderplus LLM, specializing in coding assistance." 
msgstr "" #: ../../source/models/builtin/llm/starchat-beta.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/starchat-beta.rst:18 msgid "Model Spec 1 (pytorch, 16 Billion)" msgstr "" #: ../../source/models/builtin/llm/starchat-beta.rst:20 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/starchat-beta.rst:21 msgid "**Model Size (in billions):** 16" msgstr "" #: ../../source/models/builtin/llm/starchat-beta.rst:22 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/starchat-beta.rst:23 msgid "**Model ID:** HuggingFaceH4/starchat-beta" msgstr "" #: ../../source/models/builtin/llm/starchat-beta.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/starchat-beta.rst:26 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/starcoder.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/starcoder.rst:5 msgid "starcoder" msgstr "" #: ../../source/models/builtin/llm/starcoder.rst:7 msgid "**Context Length:** 8192" msgstr "" #: ../../source/models/builtin/llm/starcoder.rst:8 msgid "**Model Name:** starcoder" msgstr "" #: ../../source/models/builtin/llm/starcoder.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/starcoder.rst:10 msgid "**Abilities:** generate" msgstr "" #: ../../source/models/builtin/llm/starcoder.rst:11 msgid "" "**Description:** Starcoder is an open-source Transformer based LLM that " "is trained on permissively licensed data from GitHub." 
msgstr "" #: ../../source/models/builtin/llm/starcoder.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/starcoder.rst:18 msgid "Model Spec 1 (ggmlv3, 16 Billion)" msgstr "" #: ../../source/models/builtin/llm/starcoder.rst:20 msgid "**Model Format:** ggmlv3" msgstr "" #: ../../source/models/builtin/llm/starcoder.rst:21 msgid "**Model Size (in billions):** 16" msgstr "" #: ../../source/models/builtin/llm/starcoder.rst:22 msgid "**Quantizations:** q4_0, q4_1, q5_0, q5_1, q8_0" msgstr "" #: ../../source/models/builtin/llm/starcoder.rst:23 msgid "**Model ID:** TheBloke/starcoder-GGML" msgstr "" #: ../../source/models/builtin/llm/starcoder.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/starcoder.rst:26 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/starcoderplus.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/starcoderplus.rst:5 msgid "starcoderplus" msgstr "" #: ../../source/models/builtin/llm/starcoderplus.rst:7 msgid "**Context Length:** 8192" msgstr "" #: ../../source/models/builtin/llm/starcoderplus.rst:8 msgid "**Model Name:** starcoderplus" msgstr "" #: ../../source/models/builtin/llm/starcoderplus.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/starcoderplus.rst:10 msgid "**Abilities:** generate" msgstr "" #: ../../source/models/builtin/llm/starcoderplus.rst:11 msgid "" "**Description:** Starcoderplus is an open-source LLM trained by fine-" "tuning Starcoder on RedefinedWeb and StarCoderData datasets." 
msgstr "" #: ../../source/models/builtin/llm/starcoderplus.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/starcoderplus.rst:18 msgid "Model Spec 1 (pytorch, 16 Billion)" msgstr "" #: ../../source/models/builtin/llm/starcoderplus.rst:20 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/starcoderplus.rst:21 msgid "**Model Size (in billions):** 16" msgstr "" #: ../../source/models/builtin/llm/starcoderplus.rst:22 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/starcoderplus.rst:23 msgid "**Model ID:** bigcode/starcoderplus" msgstr "" #: ../../source/models/builtin/llm/starcoderplus.rst:24 msgid "" "**Model Hubs**: `Hugging Face " "`_" msgstr "" #: ../../source/models/builtin/llm/starcoderplus.rst:26 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/tiny-llama.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/tiny-llama.rst:5 msgid "tiny-llama" msgstr "" #: ../../source/models/builtin/llm/tiny-llama.rst:7 msgid "**Context Length:** 2048" msgstr "" #: ../../source/models/builtin/llm/tiny-llama.rst:8 msgid "**Model Name:** tiny-llama" msgstr "" #: ../../source/models/builtin/llm/tiny-llama.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/tiny-llama.rst:10 msgid "**Abilities:** generate" msgstr "" #: ../../source/models/builtin/llm/tiny-llama.rst:11 msgid "" "**Description:** The TinyLlama project aims to pretrain a 1.1B Llama " "model on 3 trillion tokens." msgstr "" #: ../../source/models/builtin/llm/tiny-llama.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/tiny-llama.rst:18 msgid "Model Spec 1 (ggufv2, 1 Billion)" msgstr "" #: ../../source/models/builtin/llm/tiny-llama.rst:20 msgid "**Model Format:** ggufv2" msgstr "" #: ../../source/models/builtin/llm/tiny-llama.rst:21 msgid "**Model Size (in billions):** 1" msgstr "" #: ../../source/models/builtin/llm/tiny-llama.rst:22 msgid "" "**Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, " "Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0" msgstr "" #: ../../source/models/builtin/llm/tiny-llama.rst:23 msgid "**Model ID:** TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF" msgstr "" #: ../../source/models/builtin/llm/tiny-llama.rst:24 msgid "" "**Model Hubs**: `Hugging Face " "`_, " "`ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/tiny-llama.rst:26 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your 
chosen quantization method from the options" " listed above::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/vicuna-v1.3.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:5 msgid "vicuna-v1.3" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:7 msgid "**Context Length:** 2048" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:8 msgid "**Model Name:** vicuna-v1.3" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:10 msgid "**Abilities:** chat" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:11 msgid "" "**Description:** Vicuna is an open-source LLM trained by fine-tuning " "LLaMA on data collected from ShareGPT." 
msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:18 msgid "Model Spec 1 (ggmlv3, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:20 #: ../../source/models/builtin/llm/vicuna-v1.3.rst:35 #: ../../source/models/builtin/llm/vicuna-v1.3.rst:50 msgid "**Model Format:** ggmlv3" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:21 #: ../../source/models/builtin/llm/vicuna-v1.3.rst:96 msgid "**Model Size (in billions):** 7" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:22 #: ../../source/models/builtin/llm/vicuna-v1.3.rst:37 #: ../../source/models/builtin/llm/vicuna-v1.3.rst:52 msgid "" "**Quantizations:** q2_K, q3_K_L, q3_K_M, q3_K_S, q4_0, q4_1, q4_K_M, " "q4_K_S, q5_0, q5_1, q5_K_M, q5_K_S, q6_K, q8_0" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:23 msgid "**Model ID:** TheBloke/vicuna-7B-v1.3-GGML" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:26 #: ../../source/models/builtin/llm/vicuna-v1.3.rst:41 #: ../../source/models/builtin/llm/vicuna-v1.3.rst:56 #: ../../source/models/builtin/llm/vicuna-v1.3.rst:71 #: ../../source/models/builtin/llm/vicuna-v1.3.rst:86 #: ../../source/models/builtin/llm/vicuna-v1.3.rst:101 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:33 msgid "Model Spec 2 (ggmlv3, 13 Billion)" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:36 #: ../../source/models/builtin/llm/vicuna-v1.3.rst:81 msgid "**Model Size (in billions):** 13" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:38 msgid "**Model ID:** TheBloke/vicuna-13b-v1.3.0-GGML" msgstr "" #: 
../../source/models/builtin/llm/vicuna-v1.3.rst:39 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:48 msgid "Model Spec 3 (ggmlv3, 33 Billion)" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:51 #: ../../source/models/builtin/llm/vicuna-v1.3.rst:66 msgid "**Model Size (in billions):** 33" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:53 msgid "**Model ID:** TheBloke/vicuna-33B-GGML" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:54 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:63 msgid "Model Spec 4 (pytorch, 33 Billion)" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:65 #: ../../source/models/builtin/llm/vicuna-v1.3.rst:80 #: ../../source/models/builtin/llm/vicuna-v1.3.rst:95 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:67 #: ../../source/models/builtin/llm/vicuna-v1.3.rst:82 #: ../../source/models/builtin/llm/vicuna-v1.3.rst:97 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:68 msgid "**Model ID:** lmsys/vicuna-33b-v1.3" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:69 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:78 msgid "Model Spec 5 (pytorch, 13 Billion)" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:83 msgid "**Model ID:** lmsys/vicuna-13b-v1.3" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:84 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:93 msgid "Model Spec 6 (pytorch, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:98 msgid "**Model ID:** lmsys/vicuna-7b-v1.3" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.3.rst:99 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" 
================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/vicuna-v1.5-16k.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:5 msgid "vicuna-v1.5-16k" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:7 msgid "**Context Length:** 16384" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:8 msgid "**Model Name:** vicuna-v1.5-16k" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:10 msgid "**Abilities:** chat" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:11 msgid "" "**Description:** Vicuna-v1.5-16k is a special version of Vicuna-v1.5, " "with a context window of 16k tokens instead of 4k." 
msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:18 msgid "Model Spec 1 (pytorch, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:20 #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:35 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:21 msgid "**Model Size (in billions):** 7" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:22 #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:37 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:23 msgid "**Model ID:** lmsys/vicuna-7b-v1.5-16k" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:26 #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:41 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:33 msgid "Model Spec 2 (pytorch, 13 Billion)" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:36 msgid "**Model Size (in billions):** 13" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:38 msgid "**Model ID:** lmsys/vicuna-13b-v1.5-16k" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5-16k.rst:39 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/vicuna-v1.5.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/vicuna-v1.5.rst:5 msgid "vicuna-v1.5" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5.rst:7 msgid "**Context Length:** 4096" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5.rst:8 msgid "**Model Name:** vicuna-v1.5" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5.rst:10 msgid "**Abilities:** chat" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5.rst:11 msgid "" "**Description:** Vicuna is an open-source LLM trained by fine-tuning " "LLaMA on data collected from ShareGPT." 
msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5.rst:18 msgid "Model Spec 1 (pytorch, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5.rst:20 #: ../../source/models/builtin/llm/vicuna-v1.5.rst:35 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5.rst:21 msgid "**Model Size (in billions):** 7" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5.rst:22 #: ../../source/models/builtin/llm/vicuna-v1.5.rst:37 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5.rst:23 msgid "**Model ID:** lmsys/vicuna-7b-v1.5" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5.rst:26 #: ../../source/models/builtin/llm/vicuna-v1.5.rst:41 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5.rst:33 msgid "Model Spec 2 (pytorch, 13 Billion)" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5.rst:36 msgid "**Model Size (in billions):** 13" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5.rst:38 msgid "**Model ID:** lmsys/vicuna-13b-v1.5" msgstr "" #: ../../source/models/builtin/llm/vicuna-v1.5.rst:39 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/wizardcoder-python-v1.0.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:5 msgid "wizardcoder-python-v1.0" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:7 msgid "**Context Length:** 100000" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:8 msgid "**Model Name:** wizardcoder-python-v1.0" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:10 msgid "**Abilities:** chat" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:11 msgid "**Description:**" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:18 msgid "Model Spec 1 (pytorch, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:20 #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:35 #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:50 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:21 #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:66 msgid "**Model Size (in billions):** 7" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:22 #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:37 #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:52 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:23 msgid "**Model 
ID:** WizardLM/WizardCoder-Python-7B-V1.0" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:26 #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:41 #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:56 #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:71 #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:86 #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:101 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:33 msgid "Model Spec 2 (pytorch, 13 Billion)" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:36 #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:81 msgid "**Model Size (in billions):** 13" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:38 msgid "**Model ID:** WizardLM/WizardCoder-Python-13B-V1.0" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:39 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope " "`_" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:48 msgid "Model Spec 3 (pytorch, 34 Billion)" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:51 #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:96 msgid "**Model Size (in billions):** 34" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:53 msgid "**Model ID:** WizardLM/WizardCoder-Python-34B-V1.0" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:54 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope " "`_" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:63 msgid "Model Spec 4 
(ggufv2, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:65 #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:80 #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:95 msgid "**Model Format:** ggufv2" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:67 #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:82 #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:97 msgid "" "**Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, " "Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:68 msgid "**Model ID:** TheBloke/WizardCoder-Python-7B-V1.0-GGUF" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:69 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:78 msgid "Model Spec 5 (ggufv2, 13 Billion)" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:83 msgid "**Model ID:** TheBloke/WizardCoder-Python-13B-V1.0-GGUF" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:84 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:93 msgid "Model Spec 6 (ggufv2, 34 Billion)" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:98 msgid "**Model ID:** TheBloke/WizardCoder-Python-34B-V1.0-GGUF" msgstr "" #: ../../source/models/builtin/llm/wizardcoder-python-v1.0.rst:99 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/wizardlm-v1.0.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:5 msgid "wizardlm-v1.0" msgstr "" #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:7 msgid "**Context Length:** 2048" msgstr "" #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:8 msgid "**Model Name:** wizardlm-v1.0" msgstr "" #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:10 msgid "**Abilities:** chat" msgstr "" #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:11 msgid "" "**Description:** WizardLM is an open-source LLM trained by fine-tuning " "LLaMA with Evol-Instruct." 
msgstr "" #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:18 msgid "Model Spec 1 (ggmlv3, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:20 #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:35 msgid "**Model Format:** ggmlv3" msgstr "" #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:21 msgid "**Model Size (in billions):** 7" msgstr "" #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:22 #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:37 msgid "" "**Quantizations:** q2_K, q3_K_L, q3_K_M, q3_K_S, q4_0, q4_1, q4_K_M, " "q4_K_S, q5_0, q5_1, q5_K_M, q5_K_S, q6_K, q8_0" msgstr "" #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:23 msgid "**Model ID:** TheBloke/WizardLM-7B-V1.0-Uncensored-GGML" msgstr "" #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:26 #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:41 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:33 msgid "Model Spec 2 (ggmlv3, 13 Billion)" msgstr "" #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:36 msgid "**Model Size (in billions):** 13" msgstr "" #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:38 msgid "**Model ID:** TheBloke/WizardLM-13B-V1.0-Uncensored-GGML" msgstr "" #: ../../source/models/builtin/llm/wizardlm-v1.0.rst:39 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/wizardmath-v1.0.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. 
# This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:5 msgid "wizardmath-v1.0" msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:7 msgid "**Context Length:** 2048" msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:8 msgid "**Model Name:** wizardmath-v1.0" msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:10 msgid "**Abilities:** chat" msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:11 msgid "" "**Description:** WizardMath is an open-source LLM trained by fine-tuning " "Llama2 with Evol-Instruct, specializing in math." 
msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:18 msgid "Model Spec 1 (pytorch, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:20 #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:35 #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:50 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:21 msgid "**Model Size (in billions):** 7" msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:22 #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:37 #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:52 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:23 msgid "**Model ID:** WizardLM/WizardMath-7B-V1.0" msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:26 #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:41 #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:56 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:33 msgid "Model Spec 2 (pytorch, 13 Billion)" msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:36 msgid "**Model Size (in billions):** 13" msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:38 msgid "**Model ID:** WizardLM/WizardMath-13B-V1.0" msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:39 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:48 msgid "Model Spec 3 (pytorch, 70 Billion)" msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:51 msgid 
"**Model Size (in billions):** 70" msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:53 msgid "**Model ID:** WizardLM/WizardMath-70B-V1.0" msgstr "" #: ../../source/models/builtin/llm/wizardmath-v1.0.rst:54 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/xverse-chat.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/xverse-chat.rst:5 msgid "xverse-chat" msgstr "" #: ../../source/models/builtin/llm/xverse-chat.rst:7 msgid "**Context Length:** 2048" msgstr "" #: ../../source/models/builtin/llm/xverse-chat.rst:8 msgid "**Model Name:** xverse-chat" msgstr "" #: ../../source/models/builtin/llm/xverse-chat.rst:9 msgid "**Languages:** en, zh" msgstr "" #: ../../source/models/builtin/llm/xverse-chat.rst:10 msgid "**Abilities:** chat" msgstr "" #: ../../source/models/builtin/llm/xverse-chat.rst:11 msgid "**Description:** XVERSEB-Chat is the aligned version of model XVERSE." 
msgstr "" #: ../../source/models/builtin/llm/xverse-chat.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/xverse-chat.rst:18 msgid "Model Spec 1 (pytorch, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/xverse-chat.rst:20 #: ../../source/models/builtin/llm/xverse-chat.rst:35 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/xverse-chat.rst:21 msgid "**Model Size (in billions):** 7" msgstr "" #: ../../source/models/builtin/llm/xverse-chat.rst:22 #: ../../source/models/builtin/llm/xverse-chat.rst:37 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/xverse-chat.rst:23 msgid "**Model ID:** xverse/XVERSE-7B-Chat" msgstr "" #: ../../source/models/builtin/llm/xverse-chat.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/xverse-chat.rst:26 #: ../../source/models/builtin/llm/xverse-chat.rst:41 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" #: ../../source/models/builtin/llm/xverse-chat.rst:33 msgid "Model Spec 2 (pytorch, 13 Billion)" msgstr "" #: ../../source/models/builtin/llm/xverse-chat.rst:36 msgid "**Model Size (in billions):** 13" msgstr "" #: ../../source/models/builtin/llm/xverse-chat.rst:38 msgid "**Model ID:** xverse/XVERSE-13B-Chat" msgstr "" #: ../../source/models/builtin/llm/xverse-chat.rst:39 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/xverse.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/xverse.rst:5 msgid "xverse" msgstr "" #: ../../source/models/builtin/llm/xverse.rst:7 msgid "**Context Length:** 2048" msgstr "" #: ../../source/models/builtin/llm/xverse.rst:8 msgid "**Model Name:** xverse" msgstr "" #: ../../source/models/builtin/llm/xverse.rst:9 msgid "**Languages:** en, zh" msgstr "" #: ../../source/models/builtin/llm/xverse.rst:10 msgid "**Abilities:** generate" msgstr "" #: ../../source/models/builtin/llm/xverse.rst:11 msgid "" "**Description:** XVERSE is a multilingual large language model, " "independently developed by Shenzhen Yuanxiang Technology." msgstr "" #: ../../source/models/builtin/llm/xverse.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/xverse.rst:18 msgid "Model Spec 1 (pytorch, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/xverse.rst:20 #: ../../source/models/builtin/llm/xverse.rst:35 #: ../../source/models/builtin/llm/xverse.rst:50 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/xverse.rst:21 msgid "**Model Size (in billions):** 7" msgstr "" #: ../../source/models/builtin/llm/xverse.rst:22 #: ../../source/models/builtin/llm/xverse.rst:37 #: ../../source/models/builtin/llm/xverse.rst:52 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/xverse.rst:23 msgid "**Model ID:** xverse/XVERSE-7B" msgstr "" #: ../../source/models/builtin/llm/xverse.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/xverse.rst:26 #: ../../source/models/builtin/llm/xverse.rst:41 #: 
../../source/models/builtin/llm/xverse.rst:56 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" #: ../../source/models/builtin/llm/xverse.rst:33 msgid "Model Spec 2 (pytorch, 13 Billion)" msgstr "" #: ../../source/models/builtin/llm/xverse.rst:36 msgid "**Model Size (in billions):** 13" msgstr "" #: ../../source/models/builtin/llm/xverse.rst:38 msgid "**Model ID:** xverse/XVERSE-13B" msgstr "" #: ../../source/models/builtin/llm/xverse.rst:39 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/xverse.rst:48 msgid "Model Spec 3 (pytorch, 65 Billion)" msgstr "" #: ../../source/models/builtin/llm/xverse.rst:51 msgid "**Model Size (in billions):** 65" msgstr "" #: ../../source/models/builtin/llm/xverse.rst:53 msgid "**Model ID:** xverse/XVERSE-65B" msgstr "" #: ../../source/models/builtin/llm/xverse.rst:54 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/yi-200k.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/yi-200k.rst:5 msgid "Yi-200k" msgstr "" #: ../../source/models/builtin/llm/yi-200k.rst:7 msgid "**Context Length:** 204800" msgstr "" #: ../../source/models/builtin/llm/yi-200k.rst:8 msgid "**Model Name:** Yi-200k" msgstr "" #: ../../source/models/builtin/llm/yi-200k.rst:9 msgid "**Languages:** en, zh" msgstr "" #: ../../source/models/builtin/llm/yi-200k.rst:10 msgid "**Abilities:** generate" msgstr "" #: ../../source/models/builtin/llm/yi-200k.rst:11 msgid "" "**Description:** The Yi series models are large language models trained " "from scratch by developers at 01.AI." msgstr "" #: ../../source/models/builtin/llm/yi-200k.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/yi-200k.rst:18 msgid "Model Spec 1 (pytorch, 6 Billion)" msgstr "" #: ../../source/models/builtin/llm/yi-200k.rst:20 #: ../../source/models/builtin/llm/yi-200k.rst:35 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/yi-200k.rst:21 msgid "**Model Size (in billions):** 6" msgstr "" #: ../../source/models/builtin/llm/yi-200k.rst:22 #: ../../source/models/builtin/llm/yi-200k.rst:37 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/yi-200k.rst:23 msgid "**Model ID:** 01-ai/Yi-6B-200K" msgstr "" #: ../../source/models/builtin/llm/yi-200k.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/yi-200k.rst:26 #: ../../source/models/builtin/llm/yi-200k.rst:41 msgid "" "Execute the following command to launch the model, remember to replace " 
"``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" #: ../../source/models/builtin/llm/yi-200k.rst:33 msgid "Model Spec 2 (pytorch, 34 Billion)" msgstr "" #: ../../source/models/builtin/llm/yi-200k.rst:36 msgid "**Model Size (in billions):** 34" msgstr "" #: ../../source/models/builtin/llm/yi-200k.rst:38 msgid "**Model ID:** 01-ai/Yi-34B-200K" msgstr "" #: ../../source/models/builtin/llm/yi-200k.rst:39 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/yi-chat.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/yi-chat.rst:5 msgid "Yi-chat" msgstr "" #: ../../source/models/builtin/llm/yi-chat.rst:7 msgid "**Context Length:** 204800" msgstr "" #: ../../source/models/builtin/llm/yi-chat.rst:8 msgid "**Model Name:** Yi-chat" msgstr "" #: ../../source/models/builtin/llm/yi-chat.rst:9 msgid "**Languages:** en, zh" msgstr "" #: ../../source/models/builtin/llm/yi-chat.rst:10 msgid "**Abilities:** chat" msgstr "" #: ../../source/models/builtin/llm/yi-chat.rst:11 msgid "" "**Description:** The Yi series models are large language models trained " "from scratch by developers at 01.AI." 
msgstr "" #: ../../source/models/builtin/llm/yi-chat.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/yi-chat.rst:18 msgid "Model Spec 1 (pytorch, 34 Billion)" msgstr "" #: ../../source/models/builtin/llm/yi-chat.rst:20 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/yi-chat.rst:21 #: ../../source/models/builtin/llm/yi-chat.rst:36 msgid "**Model Size (in billions):** 34" msgstr "" #: ../../source/models/builtin/llm/yi-chat.rst:22 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/yi-chat.rst:23 msgid "**Model ID:** 01-ai/Yi-34B-Chat" msgstr "" #: ../../source/models/builtin/llm/yi-chat.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/yi-chat.rst:26 #: ../../source/models/builtin/llm/yi-chat.rst:41 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" #: ../../source/models/builtin/llm/yi-chat.rst:33 msgid "Model Spec 2 (ggufv2, 34 Billion)" msgstr "" #: ../../source/models/builtin/llm/yi-chat.rst:35 msgid "**Model Format:** ggufv2" msgstr "" #: ../../source/models/builtin/llm/yi-chat.rst:37 msgid "" "**Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, " "Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0" msgstr "" #: ../../source/models/builtin/llm/yi-chat.rst:38 msgid "**Model ID:** TheBloke/Yi-34B-Chat-GGUF" msgstr "" #: ../../source/models/builtin/llm/yi-chat.rst:39 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/yi.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/yi.rst:5 msgid "Yi" msgstr "" #: ../../source/models/builtin/llm/yi.rst:7 msgid "**Context Length:** 4096" msgstr "" #: ../../source/models/builtin/llm/yi.rst:8 msgid "**Model Name:** Yi" msgstr "" #: ../../source/models/builtin/llm/yi.rst:9 msgid "**Languages:** en, zh" msgstr "" #: ../../source/models/builtin/llm/yi.rst:10 msgid "**Abilities:** generate" msgstr "" #: ../../source/models/builtin/llm/yi.rst:11 msgid "" "**Description:** The Yi series models are large language models trained " "from scratch by developers at 01.AI." msgstr "" #: ../../source/models/builtin/llm/yi.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/yi.rst:18 msgid "Model Spec 1 (ggufv2, 34 Billion)" msgstr "" #: ../../source/models/builtin/llm/yi.rst:20 msgid "**Model Format:** ggufv2" msgstr "" #: ../../source/models/builtin/llm/yi.rst:21 #: ../../source/models/builtin/llm/yi.rst:51 msgid "**Model Size (in billions):** 34" msgstr "" #: ../../source/models/builtin/llm/yi.rst:22 msgid "" "**Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, " "Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0" msgstr "" #: ../../source/models/builtin/llm/yi.rst:23 msgid "**Model ID:** TheBloke/Yi-34B-GGUF" msgstr "" #: ../../source/models/builtin/llm/yi.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_" msgstr "" #: ../../source/models/builtin/llm/yi.rst:26 #: ../../source/models/builtin/llm/yi.rst:41 #: ../../source/models/builtin/llm/yi.rst:56 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen 
quantization method from the options" " listed above::" msgstr "" #: ../../source/models/builtin/llm/yi.rst:33 msgid "Model Spec 2 (pytorch, 6 Billion)" msgstr "" #: ../../source/models/builtin/llm/yi.rst:35 #: ../../source/models/builtin/llm/yi.rst:50 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/yi.rst:36 msgid "**Model Size (in billions):** 6" msgstr "" #: ../../source/models/builtin/llm/yi.rst:37 #: ../../source/models/builtin/llm/yi.rst:52 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/yi.rst:38 msgid "**Model ID:** 01-ai/Yi-6B" msgstr "" #: ../../source/models/builtin/llm/yi.rst:39 msgid "" "**Model Hubs**: `Hugging Face `_, " "`ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/yi.rst:48 msgid "Model Spec 3 (pytorch, 34 Billion)" msgstr "" #: ../../source/models/builtin/llm/yi.rst:53 msgid "**Model ID:** 01-ai/Yi-34B" msgstr "" #: ../../source/models/builtin/llm/yi.rst:54 msgid "" "**Model Hubs**: `Hugging Face `_, " "`ModelScope `_" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/zephyr-7b-alpha.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/zephyr-7b-alpha.rst:5 msgid "zephyr-7b-alpha" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-alpha.rst:7 msgid "**Context Length:** 8192" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-alpha.rst:8 msgid "**Model Name:** zephyr-7b-alpha" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-alpha.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-alpha.rst:10 msgid "**Abilities:** chat" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-alpha.rst:11 msgid "" "**Description:** Zephyr-7B-α is the first model in the series, and is a " "fine-tuned version of mistralai/Mistral-7B-v0.1." 
msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-alpha.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-alpha.rst:18 msgid "Model Spec 1 (pytorch, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-alpha.rst:20 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-alpha.rst:21 msgid "**Model Size (in billions):** 7" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-alpha.rst:22 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-alpha.rst:23 msgid "**Model ID:** HuggingFaceH4/zephyr-7b-alpha" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-alpha.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope " "`_" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-alpha.rst:26 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your chosen quantization method from the options" " listed above::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/llm/zephyr-7b-beta.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-03 15:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/models/builtin/llm/zephyr-7b-beta.rst:5 msgid "zephyr-7b-beta" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-beta.rst:7 msgid "**Context Length:** 8192" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-beta.rst:8 msgid "**Model Name:** zephyr-7b-beta" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-beta.rst:9 msgid "**Languages:** en" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-beta.rst:10 msgid "**Abilities:** chat" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-beta.rst:11 msgid "" "**Description:** Zephyr-7B-β is the second model in the series, and is a " "fine-tuned version of mistralai/Mistral-7B-v0.1" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-beta.rst:14 msgid "Specifications" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-beta.rst:18 msgid "Model Spec 1 (pytorch, 7 Billion)" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-beta.rst:20 msgid "**Model Format:** pytorch" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-beta.rst:21 msgid "**Model Size (in billions):** 7" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-beta.rst:22 msgid "**Quantizations:** 4-bit, 8-bit, none" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-beta.rst:23 msgid "**Model ID:** HuggingFaceH4/zephyr-7b-beta" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-beta.rst:24 msgid "" "**Model Hubs**: `Hugging Face `_, `ModelScope `_" msgstr "" #: ../../source/models/builtin/llm/zephyr-7b-beta.rst:26 msgid "" "Execute the following command to launch the model, remember to replace " "``${quantization}`` with your 
chosen quantization method from the options" " listed above::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/rerank/bge-reranker-base.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2023-12-25 17:11+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.11.0\n" #: ../../source/models/builtin/rerank/bge-reranker-base.rst:5 msgid "bge-reranker-base" msgstr "" #: ../../source/models/builtin/rerank/bge-reranker-base.rst:7 msgid "**Model Name:** bge-reranker-base" msgstr "" #: ../../source/models/builtin/rerank/bge-reranker-base.rst:8 msgid "**Languages:** en, zh" msgstr "" #: ../../source/models/builtin/rerank/bge-reranker-base.rst:9 msgid "**Abilities:** rerank" msgstr "" #: ../../source/models/builtin/rerank/bge-reranker-base.rst:12 msgid "Specifications" msgstr "" #: ../../source/models/builtin/rerank/bge-reranker-base.rst:14 msgid "**Model ID:** BAAI/bge-reranker-base" msgstr "" #: ../../source/models/builtin/rerank/bge-reranker-base.rst:16 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/rerank/bge-reranker-large.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2023-12-25 17:11+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.11.0\n" #: ../../source/models/builtin/rerank/bge-reranker-large.rst:5 msgid "bge-reranker-large" msgstr "" #: ../../source/models/builtin/rerank/bge-reranker-large.rst:7 msgid "**Model Name:** bge-reranker-large" msgstr "" #: ../../source/models/builtin/rerank/bge-reranker-large.rst:8 msgid "**Languages:** en, zh" msgstr "" #: ../../source/models/builtin/rerank/bge-reranker-large.rst:9 msgid "**Abilities:** rerank" msgstr "" #: ../../source/models/builtin/rerank/bge-reranker-large.rst:12 msgid "Specifications" msgstr "" #: ../../source/models/builtin/rerank/bge-reranker-large.rst:14 msgid "**Model ID:** BAAI/bge-reranker-large" msgstr "" #: ../../source/models/builtin/rerank/bge-reranker-large.rst:16 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/rerank/index.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2023-12-25 17:11+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.11.0\n" #: ../../source/models/builtin/rerank/index.rst:5 msgid "Rerank Models" msgstr "重排序模型" #: ../../source/models/builtin/rerank/index.rst:7 msgid "The following is a list of built-in rerank models in Xinference:" msgstr "以下是 Xinference 中内置的重排序模型列表:" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/video/cogvideox-2b.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-08-13 17:44+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.14.0\n" #: ../../source/models/builtin/video/cogvideox-2b.rst:5 msgid "CogVideoX-2b" msgstr "" #: ../../source/models/builtin/video/cogvideox-2b.rst:7 msgid "**Model Name:** CogVideoX-2b" msgstr "" #: ../../source/models/builtin/video/cogvideox-2b.rst:8 msgid "**Model Family:** CogVideoX" msgstr "" #: ../../source/models/builtin/video/cogvideox-2b.rst:9 msgid "**Abilities:** text2video" msgstr "" #: ../../source/models/builtin/video/cogvideox-2b.rst:12 msgid "Specifications" msgstr "" #: ../../source/models/builtin/video/cogvideox-2b.rst:14 msgid "**Model ID:** THUDM/CogVideoX-2b" msgstr "" #: 
../../source/models/builtin/video/cogvideox-2b.rst:16 msgid "Execute the following command to launch the model::" msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/builtin/video/index.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-08-13 17:44+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.14.0\n" #: ../../source/models/builtin/video/index.rst:5 msgid "Video Models" msgstr "视频模型" #: ../../source/models/builtin/video/index.rst:7 msgid "The following is a list of built-in video models in Xinference:" msgstr "以下是 Xinference 中内置的视频模型列表:" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/custom.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2026-01-28 14:31+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.17.0\n" #: ../../source/models/custom.rst:5 msgid "Custom Models" msgstr "自定义模型" #: ../../source/models/custom.rst:6 msgid "" "Xinference provides a flexible and comprehensive way to integrate, " "manage, and utilize custom models." msgstr "Xinference 提供了一种灵活而全面的方式来集成、管理和应用自定义模型。" #: ../../source/models/custom.rst:10 msgid "Directly launch an existing model" msgstr "无需注册而直接启动自定义模型" #: ../../source/models/custom.rst:11 msgid "" "Since ``v0.14.0``, you can directly launch an existing model by passing " "``model_path`` to the launch interface without downloading it. This way " "requires that the model's ``model_family`` is among the built-in " "supported models, and eliminates the hassle of registering the model." msgstr "" "从 ``v0.14.0`` 版本开始,如果你需要注册的模型的家族是 Xinference 内置支持的模型,你可以直接通过 launch 接口中的 " "``model_path`` 参数来启动它,从而免去注册步骤的麻烦。现在非常推荐使用这种方式。" #: ../../source/models/custom.rst:15 msgid "For example:" msgstr "例如:" #: ../../source/models/custom.rst:47 msgid "" "The above example demonstrates how to directly launch a qwen1.5-chat " "model file without registering it." msgstr "上面的例子展示了当我已有 qwen1.5-chat 模型文件时,如何直接 launch 它。" #: ../../source/models/custom.rst:49 msgid "" "For distributed scenarios, if your model file is on a specific worker, " "you can directly launch it using the ``worker_ip`` and ``model_path`` " "parameters with the launch interface." msgstr "" "对于分布式场景,将你的模型文件置于某个 worker ,然后通过 launch 接口的 ``worker_ip`` 和 " "``model_path`` 参数来达到直接 launch 的效果。" #: ../../source/models/custom.rst:53 msgid "" "For CLI usage, prefer ``--model-path`` (kebab-case). 
``--model_path`` is " "legacy-compatible but not recommended." msgstr "" "对于命令行界面(CLI)的使用,请优先使用 ``--model-path``(短横线分隔的小写形式,即 kebab-case)。``--model_path`` " "兼容旧版规范,但不建议使用。" #: ../../source/models/custom.rst:56 msgid "Define a custom model" msgstr "定义一个自定义模型" #: ../../source/models/custom.rst:59 msgid "Web UI: Automatic LLM Config Parsing" msgstr "Web UI:自动解析大型语言模型配置" #: ../../source/models/custom.rst:61 msgid "" "When registering a custom LLM via the Web UI, Xinference can " "automatically parse the model configuration and pre-fill key fields for " "you." msgstr "通过Web UI注册自定义LLM时,Xinference可自动解析模型配置并为您预填关键字段。" #: ../../source/models/custom.rst:64 msgid "You only need to provide:" msgstr "您仅需要提供:" #: ../../source/models/custom.rst:66 msgid "**Model path / Model ID** (where the model lives, local path or hub ID)" msgstr " **模型路径/模型ID** (模型所在位置,本地路径或 Hub ID)" #: ../../source/models/custom.rst:67 msgid "**Model Family**" msgstr " **模型家族** " #: ../../source/models/custom.rst:69 msgid "After parsing, the UI can auto-populate fields such as:" msgstr "解析后,用户界面可自动填充以下字段:" #: ../../source/models/custom.rst:71 msgid "``Context Length``" msgstr " ``上下文长度`` " #: ../../source/models/custom.rst:72 msgid "``Model_Languages``" msgstr " ``模型语言`` " #: ../../source/models/custom.rst:73 msgid "``Model_Abilities``" msgstr " ``模型能力`` " #: ../../source/models/custom.rst:74 msgid "``Model_Specs``" msgstr " ``模型规格`` " #: ../../source/models/custom.rst:76 msgid "You can review and edit these fields before saving the custom model."
msgstr "在保存自定义模型之前,您可以查看并编辑这些字段。" #: ../../source/models/custom.rst:78 msgid "Define a custom model based on the following templates:" msgstr "基于以下模板定义一个自定义模型:" #: ../../source/models/custom.rst:82 msgid "LLM" msgstr "语言模型" #: ../../source/models/custom.rst:129 msgid "embedding" msgstr "嵌入模型" #: ../../source/models/custom.rst:164 msgid "Rerank" msgstr "重排序模型" #: ../../source/models/custom.rst:199 msgid "image" msgstr "图像模型" #: ../../source/models/custom.rst:234 msgid "audio" msgstr "音频模型" #: ../../source/models/custom.rst:267 msgid "flexible" msgstr "灵活模型" #: ../../source/models/custom.rst:294 msgid "" "model_name: A string defining the name of the model. The name must start " "with a letter or a digit and can only contain letters, digits, " "underscores, or dashes." msgstr "model_name: 模型名称。名称必须以字母或数字开头,且只能包含字母、数字、下划线或短划线。" #: ../../source/models/custom.rst:295 msgid "" "context_length: An optional integer that specifies the maximum context " "size the model was trained to accommodate, encompassing both the input " "and output lengths. If not defined, the default value is 2048 tokens " "(~1,500 words)." msgstr "" "context_length: " "一个可选的整数,模型支持的最大上下文长度,包括输入和输出长度。如果未定义,默认值为2048个token(约1,500个词)。" #: ../../source/models/custom.rst:296 msgid "" "dimensions: An interger defining the size of the vector output by the " "embedding model." msgstr "dimensions: 一个整数,用于定义嵌入模型输出的向量大小。" #: ../../source/models/custom.rst:297 msgid "" "max_tokens: An interger defining the maximum number of input tokens the " "embedding model can process in a single request." msgstr "max_tokens: 一个整数,定义嵌入模型在单次请求中可处理的最大输入token数量。" #: ../../source/models/custom.rst:298 msgid "" "model_lang: A list of strings representing the supported languages for " "the model. Example: [\"en\"], which means that the model supports " "English." 
msgstr "model_lang: 一个字符串列表,表示模型支持的语言。例如:['en'],表示该模型支持英语。" #: ../../source/models/custom.rst:299 msgid "" "model_ability: A list of strings defining the abilities of the model. It " "could include options like \"embed\", \"generate\", and \"chat\". In this" " case, the model has the ability to \"generate\"." msgstr "" "model_ability: 一个字符串列表,定义模型的能力。它可以包括像 'embed'、'generate' 和 'chat' " "这样的选项。示例表示模型具有 'generate' 的能力。" #: ../../source/models/custom.rst:300 msgid "" "model_family: A required string representing the family of the model you " "want to register. This parameter must not conflict with any builtin model" " names." msgstr "model_family: 一个必要的字符串,表示要注册的模型族。该参数名称不得与任何内置模型名称冲突。" #: ../../source/models/custom.rst:301 msgid "" "model_specs: An array of objects defining the specifications of the " "model. These include:" msgstr "model_specs: 一个包含定义模型规格的对象数组。这些规格包括:" #: ../../source/models/custom.rst:302 msgid "" "model_format: A string that defines the model format, like \"pytorch\" or" " \"ggufv2\"." msgstr "model_format: 一个定义模型格式的字符串,可以是 'pytorch' 或 'ggufv2'。" #: ../../source/models/custom.rst:303 msgid "" "model_size_in_billions: An integer defining the size of the model in " "billions of parameters." msgstr "model_size_in_billions: 一个整数,定义模型的参数量,以十亿为单位。" #: ../../source/models/custom.rst:304 msgid "" "quantizations: A list of strings defining the available quantizations for" " the model. For PyTorch models, it could be \"4-bit\", \"8-bit\", or " "\"none\". For ggufv2 models, the quantizations should correspond to " "values that work with the ``model_file_name_template``. Some engines also" " support ``fp4`` / ``fp8`` / ``bnb`` formats (see :ref:`installation` for" " backend support details)." 
msgstr "" "quantizations: 一个字符串列表,定义模型的量化方式。对于 PyTorch 模型,它可以是 \"4-bit\"、\"8-bit\" 或" " \"none\"。对于 ggufv2 模型,量化方式应与 ``model_file_name_template`` 中的值对应。" "某些引擎还支持 ``fp4`` / ``fp8`` / ``bnb`` 格式(后端支持详情请参见 :ref:`installation` )。" #: ../../source/models/custom.rst:306 msgid "" "model_id: A string representing the model ID, possibly referring to an " "identifier used by Hugging Face. **If model_uri is missing, Xinference " "will try to download the model from the huggingface repository specified " "here.**." msgstr "" "model_id:代表模型 id 的字符串,可以是该模型对应的 HuggingFace 仓库 id。如果 model_uri " "字段缺失,Xinference 将尝试从此id指示的HuggingFace仓库下载该模型。" #: ../../source/models/custom.rst:307 msgid "" "model_hub: A string representing where to download the model from, like " "\"Huggingface\" or \"modelscope\"" msgstr "model_hub: 一个可选字符串,表示从何处下载模型,例如 HuggingFace 或 modelscope。" #: ../../source/models/custom.rst:308 msgid "" "model_uri: A string representing the URI where the model can be loaded " "from, such as \"file:///path/to/llama-2-7b\". **When the model format is " "ggufv2, model_uri must be the specific file path. When the model format " "is pytorch, model_uri must be the path to the directory containing the " "model files.** If model URI is absent, Xinference will try to download " "the model from Hugging Face with the model ID." msgstr "" "model_uri:表示模型文件位置的字符串,例如本地目录:\"file:///path/to/llama-2-7b\"。当 " "model_format 是 ggufv2 ,此字段必须是具体的模型文件路径。而当 model_format 是 pytorch " "时,此字段必须是一个包含所有模型文件的目录。" "如果 model_uri 缺失,Xinference 将尝试根据 model_id 从 Hugging Face 下载该模型。" #: ../../source/models/custom.rst:309 msgid "" "model_revision: A string representing the specific version or commit hash" " of the model files to use from the repository." msgstr "model_revision: 一个字符串,表示从存储库中使用的模型文件的具体版本或提交哈希值。" #: ../../source/models/custom.rst:310 msgid "" "chat_template: If ``model_ability`` includes ``chat`` , you must " "configure this option to generate the correct full prompt during chat. " "This is a Jinja template string.
Usually, you can find it in the " "``tokenizer_config.json`` file within the model directory." msgstr "" "chat_template:如果 ``model_ability`` 中包含 ``chat`` " ",那么此选项必须配置以生成合适的完整提示词。这是一个 Jinja 模版字符串。通常,你可以在模型目录的 " "``tokenizer_config.json`` 文件中找到。" #: ../../source/models/custom.rst:311 msgid "" "stop_token_ids: If ``model_ability`` includes ``chat`` , you can " "configure this option to control when the model stops during chat. This " "is a list of integers, and you can typically extract the corresponding " "values from the ``generation_config.json`` or ``tokenizer_config.json`` " "file in the model directory." msgstr "" "stop_token_ids:如果 ``model_ability`` 中包含 ``chat`` " ",那么推荐配置此选项以合理控制对话的停止。这是一个包含整数的列表,你可以在模型目录的 ``generation_config.json`` 和 " "``tokenizer_config.json`` 文件中提取相应的值。" #: ../../source/models/custom.rst:312 msgid "" "stop: If ``model_ability`` includes ``chat`` , you can configure this " "option to control when the model stops during chat. This is a list of " "strings, and you can typically extract the corresponding values from the " "``generation_config.json`` or ``tokenizer_config.json`` file in the model" " directory." msgstr "" "stop:如果 ``model_ability`` 中包含 ``chat`` " ",那么推荐配置此选项以合理控制对话的停止。这是一个包含字符串的列表,你可以在模型目录的 ``tokenizer_config.json`` " "文件中找到 token 值对应的字符串。" #: ../../source/models/custom.rst:313 msgid "" "reasoning_start_tag: A special token or prompt used to explicitly " "instruct the LLM to begin its chain-of-thought or reasoning process in " "its output." msgstr "reasoning_start_tag: 一个特殊的 token 或 prompt,用于明确指示大语言模型在其输出中思维链或推理过程的起点。" #: ../../source/models/custom.rst:314 msgid "" "reasoning_end_tag: A special token or prompt used to explicitly mark the " "end of the model's chain-of-thought or reasoning process in its output." 
msgstr "reasoning_end_tag: 一个特殊的 token 或 prompt,用于明确指示大语言模型在其输出中思维链或推理过程的终点。" #: ../../source/models/custom.rst:315 msgid "" "cache_config: A string representing the parameters and rules for how the " "system stores and manages temporary data (cache)." msgstr "cache_config: 一个字符串,表示系统存储和管理临时数据(缓存)的参数。" #: ../../source/models/custom.rst:316 msgid "" "virtualenv: A settings object for model dependency isolation. Please " "refer to :ref:`this document ` for details." msgstr "" #: ../../source/models/custom.rst:319 msgid "Register a Custom Model" msgstr "注册一个自定义模型" #: ../../source/models/custom.rst:321 msgid "Register a custom model programmatically:" msgstr "以代码的方式注册自定义模型" #: ../../source/models/custom.rst:336 ../../source/models/custom.rst:354 #: ../../source/models/custom.rst:369 ../../source/models/custom.rst:424 msgid "Or via CLI:" msgstr "以命令行的方式" #: ../../source/models/custom.rst:342 msgid "" "Note that replace the ```` above with ``LLM``, ``embedding`` " "or ``rerank``. The same as below." 
msgstr "注意将上面的 ```` 替换为 ``LLM``、``embedding`` 或 ``rerank`` ,下同。" #: ../../source/models/custom.rst:346 msgid "List the Built-in and Custom Models" msgstr "列举内置和自定义模型" #: ../../source/models/custom.rst:348 msgid "List built-in and custom models programmatically:" msgstr "以代码的方式列举内置和自定义模型" #: ../../source/models/custom.rst:361 msgid "Launch the Custom Model" msgstr "启动自定义模型" #: ../../source/models/custom.rst:363 msgid "Launch the custom model programmatically:" msgstr "以代码的方式启动自定义模型" #: ../../source/models/custom.rst:376 msgid "Interact with the Custom Model" msgstr "使用自定义模型" #: ../../source/models/custom.rst:378 msgid "Invoke the model programmatically:" msgstr "以代码的方式调用模型" #: ../../source/models/custom.rst:385 msgid "Result:" msgstr "结果为:" #: ../../source/models/custom.rst:409 #, python-brace-format msgid "Or via CLI, replace ``${UID}`` with real model UID:" msgstr "或者以命令行的方式,用实际的模型 UID 替换 ``${UID}``:" #: ../../source/models/custom.rst:416 msgid "Unregister the Custom Model" msgstr "注销自定义模型" #: ../../source/models/custom.rst:418 msgid "Unregister the custom model programmatically:" msgstr "以代码的方式注销自定义模型" #~ msgid "" #~ "model_file_name_template: Required by gguf " #~ "models. An f-string template used for" #~ " defining the model file name based" #~ " on the quantization. **Note that " #~ "this field is just a template for" #~ " the format of the ggufv2 model " #~ "file, do not fill in the specific" #~ " path of the model file.**" #~ msgstr "" #~ "model_file_name_template: gguf 模型所需。一个 f-string " #~ "模板,用于根据量化定义模型文件名。注意,这里不要填入文件的路径。" #~ msgid "Define a custom embedding model" #~ msgstr "定义自定义 embedding 模型" #~ msgid "Define a custom embedding model based on the following template:" #~ msgstr "基于以下模板定义一个自定义 embedding 模型:" #~ msgid "dimensions: A integer that specifies the embedding dimensions." #~ msgstr "dimensions: 表示 embedding 维度的整型值。" #~ msgid "" #~ "max_tokens: A integer that represents " #~ "the max sequence length that the " #~ "embedding model supports."
#~ msgstr "max_tokens: 表示 embedding 模型支持的最大输入序列长度的整型值。" #~ msgid "" #~ "language: A list of strings representing" #~ " the supported languages for the " #~ "model. Example: [\"en\"], which means " #~ "that the model supports English." #~ msgstr "model_lang: 一个字符串列表,表示模型支持的语言。例如:['en'],表示该模型支持英语。" #~ msgid "" #~ "model_id: A string representing the " #~ "model ID, possibly referring to an " #~ "identifier used by Hugging Face." #~ msgstr "model_id: 一个表示模型标识的字符串,类似 HuggingFace 或 ModelScope 使用的标识符。" #~ msgid "" #~ "model_uri: A string representing the URI" #~ " where the model can be loaded " #~ "from, such as \"file:///path/to/your_model\". " #~ "If model URI is absent, Xinference " #~ "will try to download the model " #~ "from Hugging Face with the model " #~ "ID." #~ msgstr "" #~ "model_uri: 表示模型的 URI 的字符串,例如 " #~ "\"file:///path/to/llama-2-7b\"。如果模型 URI 不存在,Xinference " #~ "将尝试使用 model_id 从 HuggingFace 或 " #~ "ModelScope 下载模型。" #~ msgid "Define a custom Rerank model" #~ msgstr "定义自定义 rerank 模型" #~ msgid "Define a custom rerank model based on the following template:" #~ msgstr "基于以下模板定义一个自定义大语言模型:" #~ msgid "" #~ "type: A string defining the type " #~ "of the model, including ``normal``, " #~ "``LLM-based`` and ``LLM-based layerwise``." #~ msgstr "type: 表示模型的类型,可选值包括 ``normal``、``LLM-based`` 和 ``LLM-based layerwise``。" #~ msgid "" #~ "virtualenv: An array refers to the " #~ "name or path of a self-contained" #~ " Python environment used to isolate " #~ "dependencies required to run a specific" #~ " model or project. Please refer to" #~ " :ref:`this document `." 
#~ msgstr "" #~ "virtualenv: 一个数组,指代用于隔离特定模型或项目运行所依赖的独立环境名称或路径。详情请阅读 " #~ ":ref:`这个文档 `。" #~ msgid "**Model engine** (e.g., transformers / vllm / sglang)" #~ msgstr "" #~ msgid "``model_family`` / ``model_name``" #~ msgstr "" #~ msgid "``model_format``" #~ msgstr "" #~ msgid "``model_size_in_billions``" #~ msgstr "" #~ msgid "``quantization`` (if detectable)" #~ msgstr "" #~ msgid "``architectures`` and other model metadata (when available)" #~ msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/index.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2025-06-26 13:20+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.14.0\n" #: ../../source/models/index.rst:5 msgid "Models" msgstr "模型" #: ../../source/models/index.rst:8 msgid "List Models" msgstr "模型列表" #: ../../source/models/index.rst:10 msgid "" "You can list all models of a certain type that are available to launch in" " Xinference:" msgstr "你可以列出 Xinference 中所有可以启动的、某种类型的模型:" #: ../../source/models/index.rst:29 msgid "The following ``MODEL_TYPE`` is supported by Xinference:" msgstr "Xinference 支持以下 ``MODEL_TYPE``:" #: ../../source/models/index.rst:33 msgid "LLM" msgstr "" #: ../../source/models/index.rst:37 msgid "Text generation models or large language models" msgstr "文本生成模型或大型语言模型" #: ../../source/models/index.rst:39 msgid "embedding" msgstr "" #: ../../source/models/index.rst:43 msgid "Text embeddings models" msgstr "文本嵌入模型" #: 
../../source/models/index.rst:47 msgid "image" msgstr "" #: ../../source/models/index.rst:51 msgid "Image generation or manipulation models" msgstr "图像生成或处理模型" #: ../../source/models/index.rst:53 msgid "audio" msgstr "" #: ../../source/models/index.rst:57 msgid "Audio models" msgstr "音频模型" #: ../../source/models/index.rst:61 msgid "rerank" msgstr "" #: ../../source/models/index.rst:65 msgid "Rerank models" msgstr "重排序模型" #: ../../source/models/index.rst:67 msgid "video" msgstr "" #: ../../source/models/index.rst:71 msgid "Video models" msgstr "视频模型" #: ../../source/models/index.rst:75 ../../source/models/index.rst:240 msgid "flexible" msgstr "灵活模型" #: ../../source/models/index.rst:79 msgid "Flexible models (Traditional ML Models)" msgstr "灵活模型(传统机器学习模型)" #: ../../source/models/index.rst:84 msgid "" "You can see all the built-in models supported by xinference :ref:`here " "`. If the model you need is not available, " "Xinference also allows you to register your own :ref:`custom models " "`." msgstr "" "你可以在 :ref:`这里 ` 查看 Xinference 支持的所有" "内置模型。如果你需要的模型不可用,Xinference 还允许你注册自己的 :ref:`" "自定义模型 `。" #: ../../source/models/index.rst:89 msgid "Launch and Terminate Model" msgstr "启动和停止模型" #: ../../source/models/index.rst:91 msgid "" "Each running model instance will be assigned a unique model uid. By " "default, the model uid is equal to the model name. This unique id can be " "used as a handle for the further usage. You can manually assign it by " "passing ``--model-uid`` option in the launch command." 
msgstr "" "每个运行的模型实例将被分配一个唯一的模型uid。默认情况下,模型uid等于模型" "名。这个 ID 是后续使用模型实例的句柄,启动命令 ``--model-uid`` 选项可以" "手动指定它。" #: ../../source/models/index.rst:95 msgid "" "You can launch a model in Xinference either via command line or " "Xinference's Python client:" msgstr "你可以通过命令行或者 Xinference 的 Python 客户端来启动一个模型。" #: ../../source/models/index.rst:122 msgid "" "For model type ``LLM``, launching the model requires not only specifying " "the model name, but also the size of the parameters , the model format " "and the model engine. Please refer to the list of LLM :ref:`model " "families `." msgstr "" "对于模型类型 ``LLM``,启动模型不仅需要指定模型名称,还需要参数的大小、" "模型格式以及模型引擎。请参考 :ref:`models_llm_index` 文档。" #: ../../source/models/index.rst:125 msgid "" "The following command gives you the currently running models in " "Xinference:" msgstr "以下命令可以列出 Xinference 中正在运行的模型:" #: ../../source/models/index.rst:146 msgid "" "When you no longer need a model that is currently running, you can remove" " it in the following way to free up the resources it occupies:" msgstr "当你不再需要当前正在运行的模型时,以下列方式释放其占用的资源:" #: ../../source/models/index.rst:170 msgid "" "For models that are no longer maintained and depend on outdated libraries" " (such as ``transformers``), we recommend enabling the :ref:`Model " "Virtual Environment ` feature to ensure they can run " "properly in a compatible environment." msgstr "" "对于不再维护且依赖旧版库(如 ``transformers`` )的模型,建议启用 :ref:`" "模型虚拟空间 ` 功能,以确保它们能在兼容的环境中正常" "运行。" #: ../../source/models/index.rst:176 msgid "Model Usage" msgstr "模型使用" #: ../../source/models/index.rst:181 msgid "Chat & Generate" msgstr "聊天 & 生成" #: ../../source/models/index.rst:185 msgid "Learn how to chat with LLMs in Xinference." msgstr "学习如何在 Xinference 中与 LLM聊天。" #: ../../source/models/index.rst:187 msgid "Tools" msgstr "工具" #: ../../source/models/index.rst:191 msgid "Learn how to connect LLM with external tools." 
msgstr "学习如何将 LLM 与外部工具连接起来。" #: ../../source/models/index.rst:196 msgid "Embeddings" msgstr "嵌入" #: ../../source/models/index.rst:200 msgid "Learn how to create text embeddings in Xinference." msgstr "学习如何在 Xinference 中创建文本嵌入。" #: ../../source/models/index.rst:202 msgid "Rerank" msgstr "重排序" #: ../../source/models/index.rst:206 msgid "Learn how to use rerank models in Xinference." msgstr "学习如何在 Xinference 中使用重排序模型。" #: ../../source/models/index.rst:211 msgid "Images" msgstr "图像" #: ../../source/models/index.rst:215 msgid "Learn how to generate images with Xinference." msgstr "学习如何使用Xinference生成图像。" #: ../../source/models/index.rst:217 msgid "Multimodal" msgstr "多模态" #: ../../source/models/index.rst:221 msgid "Learn how to process images and audio with LLMs." msgstr "学习如何使用 LLM 处理图像和音频。" #: ../../source/models/index.rst:226 msgid "Audio" msgstr "音频" #: ../../source/models/index.rst:230 msgid "Learn how to turn audio into text or text into audio with Xinference." msgstr "学习如何使用 Xinference 将音频转换为文本或将文本转换为音频。" #: ../../source/models/index.rst:232 msgid "Video" msgstr "视频" #: ../../source/models/index.rst:236 msgid "Learn how to generate video with Xinference." msgstr "学习如何使用Xinference生成视频。" #: ../../source/models/index.rst:244 msgid "Learn how to inference traditional ML models with Xinference." msgstr "了解如何使用 Xinference 推理传统机器学习模型。" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/lora.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-05-16 14:18+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.11.0\n" #: ../../source/models/lora.rst:5 msgid "LoRA Integration" msgstr "集成LoRA" #: ../../source/models/lora.rst:7 msgid "" "Currently, Xinference supports launching ``LLM`` and ``image`` models " "with an attached LoRA fine-tuned model." msgstr "" "当前,Xinference 可以在启动 ``LLM`` 和 ``image`` 模型时连带一个 LoRA 微调" "模型用以辅助基础模型。" #: ../../source/models/lora.rst:10 msgid "Usage" msgstr "使用方式" #: ../../source/models/lora.rst:13 msgid "Launch" msgstr "启动" #: ../../source/models/lora.rst:14 msgid "" "Different from built-in models, xinference currently does not involve " "managing LoRA models. Users need to first download the LoRA model " "themselves and then provide the storage path of the model files to " "xinference." msgstr "" "不同于内置模型,Xinference 目前不会涉及管理 LoRA 模型。用户需要首先下载" "对应的 LoRA 模型,然后将模型存储路径提供给 Xinference 。" #: ../../source/models/lora.rst:54 msgid "Apply" msgstr "应用" #: ../../source/models/lora.rst:55 msgid "" "For LLM models, you can only configure one lora model you want when you " "use the model. Specifically, specify that the ``lora_name`` parameter be " "configured in the ``generate_config``. ``lora_name`` corresponds to the " "name of the lora in the LAUNCH procedure described above." msgstr "" "对于大语言模型,使用时指定其中一个 lora 。具体地,在 ``generate_config`` 参数中配置 ``lora_name`` 参数。" "``lora_name`` 对应 launch 过程中你的配置。" #: ../../source/models/lora.rst:75 msgid "Note" msgstr "注意事项" #: ../../source/models/lora.rst:77 msgid "" "The options ``image_lora_load_kwargs`` and ``image_lora_fuse_kwargs`` are" " only applicable to models with model_type ``image``. 
They correspond to " "the parameters in the ``load_lora_weights`` and ``fuse_lora`` interfaces " "of the ``diffusers`` library. If launching an LLM model, these parameters" " are not required." msgstr "" "上述 ``image_lora_load_kwargs`` 和 ``image_lora_fuse_kwargs`` 选项只应用" "于 ``image`` 模型。它们对应于 ``diffusers`` 库的 ``load_lora_weights`` 和" " ``fuse_lora`` 接口中的额外参数。如果启动的是 ``LLM`` 模型,则无需设置" "这些选项。" #: ../../source/models/lora.rst:81 msgid "" "You need to add the parameter lora_name during inference to specify the " "corresponding lora model. You can specify it in the Additional Inputs " "option." msgstr "" "推理时需要添加 lora_name 参数来指定对应的 lora 模型。你可以在 Additional Inputs " "选项中指定它。" #: ../../source/models/lora.rst:83 msgid "" "For LLM chat models, currently only LoRA models are supported that do not" " change the prompt style." msgstr "" "对于 ``LLM`` 聊天模型,当前仅支持那些微调后不更改原始基础模型的提示词模版" "的 LoRA 模型。" #: ../../source/models/lora.rst:85 msgid "When using GPU, both LoRA and its base model occupy the same devices." msgstr "使用 GPU 时,LoRA 模型与其基础模型占用相同的设备。" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/audio.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2025-11-10 11:08+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.17.0\n" #: ../../source/models/model_abilities/audio.rst:5 msgid "Audio" msgstr "音频" #: ../../source/models/model_abilities/audio.rst:7 msgid "Learn how to turn audio into text or text into audio with Xinference."
msgstr "学习如何使用 Xinference 将音频转换为文本或将文本转换为音频。" #: ../../source/models/model_abilities/audio.rst:11 msgid "Introduction" msgstr "介绍" #: ../../source/models/model_abilities/audio.rst:14 msgid "The Audio API provides three methods for interacting with audio:" msgstr "Audio API提供了三种与音频交互的方法:" #: ../../source/models/model_abilities/audio.rst:17 msgid "The transcriptions endpoint transcribes audio into the input language." msgstr "转录端点将音频转录为输入语言。" #: ../../source/models/model_abilities/audio.rst:18 msgid "The translations endpoint translates audio into English." msgstr "翻译端点将音频翻译为英文。" #: ../../source/models/model_abilities/audio.rst:19 msgid "The speech endpoint generates audio from the input text." msgstr "语音端点根据输入文本生成音频。" #: ../../source/models/model_abilities/audio.rst:26 msgid "API ENDPOINT" msgstr "API 端点" #: ../../source/models/model_abilities/audio.rst:27 msgid "OpenAI-compatible ENDPOINT" msgstr "OpenAI 兼容端点" #: ../../source/models/model_abilities/audio.rst:29 msgid "Transcription API" msgstr "转录 API" #: ../../source/models/model_abilities/audio.rst:30 msgid "/v1/audio/transcriptions" msgstr "/v1/audio/transcriptions" #: ../../source/models/model_abilities/audio.rst:32 msgid "Translation API" msgstr "翻译 API" #: ../../source/models/model_abilities/audio.rst:33 msgid "/v1/audio/translations" msgstr "/v1/audio/translations" #: ../../source/models/model_abilities/audio.rst:35 msgid "Speech API" msgstr "语音 API" #: ../../source/models/model_abilities/audio.rst:36 msgid "/v1/audio/speech" msgstr "/v1/audio/speech" #: ../../source/models/model_abilities/audio.rst:40 msgid "Supported models" msgstr "支持的模型列表" #: ../../source/models/model_abilities/audio.rst:42 msgid "The audio API is supported with the following models in Xinference:" msgstr "在Xinference中,以下模型支持音频API:" #: ../../source/models/model_abilities/audio.rst:45 msgid "Audio to text" msgstr "语音转文本" #: ../../source/models/model_abilities/audio.rst:47 msgid ":ref:`whisper-tiny `" msgstr "" #:
../../source/models/model_abilities/audio.rst:48 msgid ":ref:`whisper-tiny.en `" msgstr "" #: ../../source/models/model_abilities/audio.rst:49 msgid ":ref:`whisper-base `" msgstr "" #: ../../source/models/model_abilities/audio.rst:50 msgid ":ref:`whisper-base.en `" msgstr "" #: ../../source/models/model_abilities/audio.rst:51 msgid ":ref:`whisper-medium `" msgstr "" #: ../../source/models/model_abilities/audio.rst:52 msgid ":ref:`whisper-medium.en `" msgstr "" #: ../../source/models/model_abilities/audio.rst:53 msgid ":ref:`whisper-large-v3 `" msgstr "" #: ../../source/models/model_abilities/audio.rst:54 msgid ":ref:`whisper-large-v3-turbo `" msgstr "" #: ../../source/models/model_abilities/audio.rst:55 msgid "" ":ref:`Belle-distilwhisper-large-v2-zh `" msgstr "" #: ../../source/models/model_abilities/audio.rst:56 msgid "" ":ref:`Belle-whisper-large-v2-zh `" msgstr "" #: ../../source/models/model_abilities/audio.rst:57 msgid "" ":ref:`Belle-whisper-large-v3-zh `" msgstr "" #: ../../source/models/model_abilities/audio.rst:58 msgid ":ref:`SenseVoiceSmall `" msgstr "" #: ../../source/models/model_abilities/audio.rst:59 msgid ":ref:`Paraformer-zh `" msgstr "" #: ../../source/models/model_abilities/audio.rst:61 #: ../../source/models/model_abilities/audio.rst:99 msgid "For Mac M-series chips only:" msgstr "仅适用于 Mac M 系列芯片:" #: ../../source/models/model_abilities/audio.rst:63 msgid ":ref:`whisper-tiny-mlx `" msgstr "" #: ../../source/models/model_abilities/audio.rst:64 msgid ":ref:`whisper-tiny.en-mlx `" msgstr "" #: ../../source/models/model_abilities/audio.rst:65 msgid ":ref:`whisper-base-mlx `" msgstr "" #: ../../source/models/model_abilities/audio.rst:66 msgid ":ref:`whisper-base.en-mlx `" msgstr "" #: ../../source/models/model_abilities/audio.rst:67 msgid ":ref:`whisper-medium-mlx `" msgstr "" #: ../../source/models/model_abilities/audio.rst:68 msgid ":ref:`whisper-medium.en-mlx `" msgstr "" #: ../../source/models/model_abilities/audio.rst:69 msgid 
":ref:`whisper-large-v3-mlx `" msgstr "" #: ../../source/models/model_abilities/audio.rst:70 msgid "" ":ref:`whisper-large-v3-turbo-mlx `" msgstr "" #: ../../source/models/model_abilities/audio.rst:74 msgid "Text to audio (TTS)" msgstr "文本转语音(TTS)" #: ../../source/models/model_abilities/audio.rst:76 msgid "" "**Models supporting zero-shot** (direct synthesis without reference " "audio):" msgstr "**支持zero-shot的模型** (无需参考音频)" #: ../../source/models/model_abilities/audio.rst:78 msgid ":ref:`ChatTTS `" msgstr "" #: ../../source/models/model_abilities/audio.rst:79 msgid ":ref:`CosyVoice-300M-SFT `" msgstr "" #: ../../source/models/model_abilities/audio.rst:80 msgid ":ref:`CosyVoice-300M-Instruct `" msgstr "" #: ../../source/models/model_abilities/audio.rst:81 msgid "MeloTTS series" msgstr "" #: ../../source/models/model_abilities/audio.rst:82 msgid ":ref:`Kokoro-82M `" msgstr "" #: ../../source/models/model_abilities/audio.rst:83 #: ../../source/models/model_abilities/audio.rst:102 msgid ":ref:`Kokoro-82M-MLX `" msgstr "" #: ../../source/models/model_abilities/audio.rst:84 msgid ":ref:`MegaTTS3 `" msgstr "" #: ../../source/models/model_abilities/audio.rst:86 msgid "**Models supporting voice cloning** (requires reference audio):" msgstr "**支持语音克隆的模型** (需要参考音频)" #: ../../source/models/model_abilities/audio.rst:88 msgid ":ref:`CosyVoice-300M `" msgstr "" #: ../../source/models/model_abilities/audio.rst:89 msgid ":ref:`CosyVoice 2.0 `" msgstr "" #: ../../source/models/model_abilities/audio.rst:90 msgid ":ref:`FishSpeech-1.5 `" msgstr "" #: ../../source/models/model_abilities/audio.rst:91 msgid ":ref:`F5-TTS `" msgstr "" #: ../../source/models/model_abilities/audio.rst:92 #: ../../source/models/model_abilities/audio.rst:101 msgid ":ref:`F5-TTS-MLX `" msgstr "" #: ../../source/models/model_abilities/audio.rst:93 #: ../../source/models/model_abilities/audio.rst:97 msgid ":ref:`IndexTTS2 `" msgstr "" #: ../../source/models/model_abilities/audio.rst:95 msgid "**Models supporting 
emotion control**:" msgstr "**支持情感控制的模型**" #: ../../source/models/model_abilities/audio.rst:105 msgid "Quickstart" msgstr "快速入门" #: ../../source/models/model_abilities/audio.rst:108 msgid "Transcription" msgstr "转录" #: ../../source/models/model_abilities/audio.rst:110 msgid "" "The Transcription API mimics OpenAI's `create transcriptions API " "`_. We can try Transcription API out " "either via cURL, OpenAI Client, or Xinference's python client:" msgstr "" "Transcription API 模仿了 OpenAI 的 `create transcriptions API `_。你" "可以通过 cURL、OpenAI Client 或者 Xinference 的 Python 客户端来尝试 " "Transcription API:" #: ../../source/models/model_abilities/audio.rst:161 msgid "Translation" msgstr "翻译" #: ../../source/models/model_abilities/audio.rst:163 msgid "" "The Translation API mimics OpenAI's `create translations API " "`_. We can try Translation API out " "either via cURL, OpenAI Client, or Xinference's python client:" msgstr "" "Translation API 模仿了 OpenAI 的 `create translations API `_。你可以" "通过 cURL、OpenAI Client 或 Xinference 的 Python 客户端来尝试使用 " "Translation API:" #: ../../source/models/model_abilities/audio.rst:213 msgid "Speech" msgstr "语音" #: ../../source/models/model_abilities/audio.rst:217 msgid "" "The Speech API mimics OpenAI's `create speech API " "`_. 
We" " can try Speech API out either via cURL, OpenAI Client, or Xinference's " "python client:" msgstr "" "Speech API 模仿了 OpenAI 的 `create speech API `_。你可以通过 cURL、" "OpenAI Client 或者 Xinference 的 Python 客户端来尝试 Speech API:" #: ../../source/models/model_abilities/audio.rst:220 msgid "Speech API use non-stream by default as" msgstr "Speech API 默认使用非流式" #: ../../source/models/model_abilities/audio.rst:222 msgid "" "The stream output of ChatTTS is not as good as the non-stream output, " "please refer to: https://github.com/2noise/ChatTTS/pull/564" msgstr "" "ChatTTS 的流式输出不如非流式的效果好,参考:https://github.com/2noise/" "ChatTTS/pull/564" #: ../../source/models/model_abilities/audio.rst:223 msgid "" "The stream requires ffmpeg<7: " "https://pytorch.org/audio/stable/installation.html#optional-dependencies" msgstr "" "流式要求 ffmpeg<7:https://pytorch.org/audio/stable/installation.html#" "optional-dependencies" #: ../../source/models/model_abilities/audio.rst:275 msgid "ChatTTS Usage" msgstr "ChatTTS 使用" #: ../../source/models/model_abilities/audio.rst:277 #: ../../source/models/model_abilities/audio.rst:480 msgid "Basic usage, refer to :ref:`audio speech usage `." msgstr "基本使用,参考 :ref:`语音使用章节 `。" #: ../../source/models/model_abilities/audio.rst:279 msgid "" "Fixed tone color. We can use fixed tone color provided by " "https://github.com/6drf21e/ChatTTS_Speaker, Download the " "`evaluation_result.csv " "`_" " , take ``seed_2155`` as example, we get the ``emb_data`` of it." msgstr "" "固定音色。我们可以使用由 https://github.com/6drf21e/ChatTTS_Speaker 提供" "的固定音色,下载 `evaluation_result.csv `_ ,以 ``seed_2155`` " "音色作为例子,我们使用 ``emb_data`` 列的数据。" #: ../../source/models/model_abilities/audio.rst:292 msgid "Use the fixed tone color of ``seed_2155`` to generate speech."
msgstr "使用 ``seed_2155`` 固定音色来创建语音。" #: ../../source/models/model_abilities/audio.rst:308 msgid "CosyVoice Usage" msgstr "CosyVoice 模型使用" #: ../../source/models/model_abilities/audio.rst:310 msgid "" "CosyVoice has two versions: CosyVoice 1.0 and CosyVoice 2.0. CosyVoice " "1.0 has three different models:" msgstr "" "CosyVoice 有两个版本:CosyVoice 1.0 和 CosyVoice 2.0。CosyVoice 1.0 有 3 " "个不同模型:" #: ../../source/models/model_abilities/audio.rst:312 msgid "" "**CosyVoice-300M-SFT**: Choose this model if you just want to convert " "text to audio. There are pretrained voices available: ['中文女', '中文男'" ", '日语男', '粤语女', '英文女', '英文男', '韩语女']" msgstr "" "**CosyVoice-300M-SFT**: 如果你只想把文本转换为语音,选择这个模型。它提供" "了一些预训练的音色: ['中文女', '中文男', '日语男', '粤语女', '英文女', '" "英文男', '韩语女']" #: ../../source/models/model_abilities/audio.rst:313 msgid "" "**CosyVoice-300M**: Choose this model if you want to clone voice or " "convert text to audio in different languages. The ``prompt_speech`` is " "always required and should be a WAV file. For optimal performance, use a " "sample rate of 16,000 Hz." msgstr "" "**CosyVoice-300M**: 如果你想克隆声音或者把文本转换成另一种语言的语音," "选择这个模型。使用这个模型,你必须提供 ``prompt_speech`` WAV格式音频文件" ",请使用 16,000 Hz 采样率以获得更好的性能。" #: ../../source/models/model_abilities/audio.rst:314 msgid "" "**CosyVoice-300M-Instruct**: Choose this model If you need precise " "control over the tone and pitch." msgstr "**CosyVoice-300M-Instruct**: 如果你想精确控制音调和音色,选择这个模型。" #: ../../source/models/model_abilities/audio.rst:316 msgid "Basic usage, launch model ``CosyVoice-300M-SFT``." msgstr "基本使用,加载模型 ``CosyVoice-300M-SFT``。" #: ../../source/models/model_abilities/audio.rst:365 msgid "Clone voice, launch model ``CosyVoice-300M``." msgstr "克隆声音,加载模型 ``CosyVoice-300M``。" #: ../../source/models/model_abilities/audio.rst:395 msgid "Cross lingual usage, launch model ``CosyVoice-300M``." 
msgstr "跨语言使用,加载模型 ``CosyVoice-300M``。" #: ../../source/models/model_abilities/audio.rst:418 msgid "Instruction based, launch model ``CosyVoice-300M-Instruct``." msgstr "基于指令的声音合成,加载模型 ``CosyVoice-300M-Instruct``。" #: ../../source/models/model_abilities/audio.rst:435 msgid "" "CosyVoice 2.0 only has one model, it provides all the capabilities of the" " three CosyVoice models. The usage is the same as CosyVoice." msgstr "" "CosyVoice 2.0 只有一个模型,但它包含了 CosyVoice 三个模型的所有能力。使用" "方法与 CosyVoice 一样。" #: ../../source/models/model_abilities/audio.rst:437 msgid "CosyVoice 2.0 stream usage, launch model ``CosyVoice2-0.5B``." msgstr "CosyVoice 2.0 流式使用,加载模型 ``CosyVoice2-0.5B``。" #: ../../source/models/model_abilities/audio.rst:474 msgid "" "More instructions and examples, could be found at https://fun-audio-" "llm.github.io/ ." msgstr "更多指令和例子,可以参考 https://fun-audio-llm.github.io/ 。" #: ../../source/models/model_abilities/audio.rst:478 msgid "FishSpeech Usage" msgstr "FishSpeech 模型使用" #: ../../source/models/model_abilities/audio.rst:482 msgid "" "Clone voice, launch model ``FishSpeech-1.5``. Please use `prompt_speech` " "instead of `reference_audio` and `prompt_text` instead of " "`reference_text` to clone voice from the reference audio for the " "FishSpeech model. This arguments is aligned to voice cloning of " "CosyVoice." 
msgstr "" "克隆语音,启动模型 ``FishSpeech-1.5``。请使用 `prompt_speech`而不是 `" "reference_audio` 以及 `prompt_text` 而不是 `reference_text` 来为 " "FishSpeech 模型提供参考音频。这个参数和 CosyVoice 的语音克隆保持一致。" #: ../../source/models/model_abilities/audio.rst:509 msgid "Paraformer Usage" msgstr "Paraformer 使用说明" #: ../../source/models/model_abilities/audio.rst:512 msgid "model" msgstr "" #: ../../source/models/model_abilities/audio.rst:512 msgid "vad" msgstr "语音活动检测(vad)" #: ../../source/models/model_abilities/audio.rst:512 msgid "punc" msgstr "标点恢复(punc)" #: ../../source/models/model_abilities/audio.rst:512 msgid "timestamp" msgstr "时间戳" #: ../../source/models/model_abilities/audio.rst:512 msgid "speaker" msgstr "说话人" #: ../../source/models/model_abilities/audio.rst:512 msgid "hotword" msgstr "热词" #: ../../source/models/model_abilities/audio.rst:514 msgid ":ref:`models_builtin_paraformer-zh`" msgstr "" #: ../../source/models/model_abilities/audio.rst:514 #: ../../source/models/model_abilities/audio.rst:516 #: ../../source/models/model_abilities/audio.rst:518 #: ../../source/models/model_abilities/audio.rst:520 #: ../../source/models/model_abilities/audio.rst:522 msgid "yes" msgstr "是" #: ../../source/models/model_abilities/audio.rst:514 #: ../../source/models/model_abilities/audio.rst:516 #: ../../source/models/model_abilities/audio.rst:518 #: ../../source/models/model_abilities/audio.rst:520 msgid "no" msgstr "否" #: ../../source/models/model_abilities/audio.rst:516 msgid ":ref:`models_builtin_paraformer-zh-hotword`" msgstr "" #: ../../source/models/model_abilities/audio.rst:518 msgid ":ref:`models_builtin_paraformer-zh-spk`" msgstr "" #: ../../source/models/model_abilities/audio.rst:520 msgid ":ref:`models_builtin_paraformer-zh-long`" msgstr "" #: ../../source/models/model_abilities/audio.rst:522 msgid ":ref:`models_builtin_seaco-paraformer-zh` (recommend)" msgstr ":ref:`models_builtin_seaco-paraformer-zh` (推荐)" #: ../../source/models/model_abilities/audio.rst:525 msgid "**VAD & Punctuation 
Usage**" msgstr "**VAD 与标点符号的使用**" #: ../../source/models/model_abilities/audio.rst:527 msgid "All Paraformer models support VAD and punctuation." msgstr "所有 Paraformer 模型均支持 VAD 和标点功能。" #: ../../source/models/model_abilities/audio.rst:529 msgid "**Timestamp & Speaker Usage**" msgstr "**时间戳和说话人识别使用说明**" #: ../../source/models/model_abilities/audio.rst:531 msgid "Only the following models support `timestamp` and `speaker`:" msgstr "仅以下模型支持 `时间戳` 和 `说话人` 识别:" #: ../../source/models/model_abilities/audio.rst:533 msgid "`paraformer-zh-spk`" msgstr "" #: ../../source/models/model_abilities/audio.rst:534 msgid "`paraformer-zh-long`" msgstr "" #: ../../source/models/model_abilities/audio.rst:535 #: ../../source/models/model_abilities/audio.rst:560 msgid "`seaco-paraformer-zh`" msgstr "" #: ../../source/models/model_abilities/audio.rst:537 msgid "Among them, only `paraformer-zh-spk` enables **speaker info by default**." msgstr "其中,仅 `paraformer-zh-spk` 默认启用说话人识别功能。" #: ../../source/models/model_abilities/audio.rst:539 msgid "" "If you need speaker info when using `paraformer-zh-long` or `seaco-" "paraformer-zh`:" msgstr "" "如果你使用的是 `paraformer-zh-long` 或 `seaco-paraformer-zh`,且需要启用" "说话人识别功能:" #: ../../source/models/model_abilities/audio.rst:541 msgid "" "In Web UI: add an extra parameter with key ``spk_model`` and value " "``cam++``" msgstr "在 Web UI 中:添加名为 ``spk_model``、值为 ``cam++`` 的参数" #: ../../source/models/model_abilities/audio.rst:542 msgid "In command line: add the option ``--spk_model cam++``" msgstr "在命令行中:添加参数 ``--spk_model cam++``" #: ../../source/models/model_abilities/audio.rst:544 #: ../../source/models/model_abilities/audio.rst:562 msgid "Example:" msgstr "示例:" #: ../../source/models/model_abilities/audio.rst:555 msgid "**Hotword Usage**" msgstr "**热词功能使用说明**" #: ../../source/models/model_abilities/audio.rst:557 msgid "Only the following models support `hotword`:" msgstr "仅以下模型支持 `hotword` (热词功能):" #: ../../source/models/model_abilities/audio.rst:559 msgid 
"`paraformer-zh-hotword`" msgstr "" #: ../../source/models/model_abilities/audio.rst:576 msgid "SenseVoiceSmall Offline Usage" msgstr "SenseVoiceSmall 离线使用" #: ../../source/models/model_abilities/audio.rst:578 msgid "" "Now SenseVoiceSmall use a small vad model ``fsmn-vad``, it will be " "downloaded thus network required." msgstr "" "现在 SenseVoiceSmall 使用一个小的 VAD 模型 ``fsmn-vad``,因此它需要网络来" "下载。" #: ../../source/models/model_abilities/audio.rst:580 msgid "For offline environment, you can download the vad model in advance." msgstr "对于离线环境,你可以提前下载这个 VAD 模型。" #: ../../source/models/model_abilities/audio.rst:582 msgid "" "Download from `huggingface `_ or " "`modelscope `_. Assume downloaded to ``/path/to/fsmn-vad``." msgstr "" "从 `huggingface `_ 或者 `" "modelscope `_ 下载。假设下载到 ``/path/to/fsmn-vad``。" #: ../../source/models/model_abilities/audio.rst:585 msgid "" "Then when launching SenseVoiceSmall with Web UI, you can add an " "additional parameter with key ``vad_model`` and value ``/path/to/fsmn-" "vad`` which is the downloaded path. When launching with command line, you" " can add an option ``--vad_model /path/to/fsmn-vad``." msgstr "" "然后当用 Web UI 加载 SenseVoiceSmall 时,添加额外选项,key 是 ``vad_model" "``,值是之前的下载路径 ``/path/to/fsmn-vad``。用命令行加载时,增加选项 ``" "--vad_model /path/to/fsmn-vad``。" #: ../../source/models/model_abilities/audio.rst:590 msgid "Kokoro Usage" msgstr "Kokoro 模型使用" #: ../../source/models/model_abilities/audio.rst:592 msgid "" "The Kokoro model supports multiple languages, but the default language is" " English. If you want to use other languages, such as Chinese, you need " "to install additional dependency packages and add an additional parameter" " when starting the model." 
msgstr "" "Kokoro模型支持多语言,默认是英文。如果你想使用非默认语言,例如中文,则" "需要安装额外依赖包并且在模型启动时增加对应参数。" #: ../../source/models/model_abilities/audio.rst:596 msgid "pip install misaki[zh]" msgstr "" #: ../../source/models/model_abilities/audio.rst:598 msgid "" "Initialize the model with the parameter lang_code='z', For all available " "``lang_code`` options, please refer to `kokoro source code " "`_. " "If the model is started through the web UI, an additional parameter needs" " to be added, with the key as ``lang_code`` and the value as ``z``. If " "the model is started through the xinference client, the parameters are " "passed via the launch_model interface:" msgstr "" "使用 lang_code='z' 参数初始化模型,可以参考 `kokoro source code `_ 查看所有" "支持的 lang_code。如果你是通过 Web UI启动的模型,则需要添加额外参数,key" "是 ``lang_code``,value是 ``z``。如果你是通过 xinference client启动的模型" ",则可以参考如下代码传递参数:" #: ../../source/models/model_abilities/audio.rst:615 msgid "" "When inferring, the voice must start with 'z', for example: " "``zf_xiaoyi``. The currently supported voices are: " "https://huggingface.co/hexgrad/Kokoro-82M/tree/main/voices. For example:" msgstr "" "当推理时,需要使用 'z' 开头的 voice,例如:``zf_xiaoyi``。目前支持的 " "voices 可以参考 https://huggingface.co/hexgrad/Kokoro-82M/tree/main/" "voices。使用方法如下:" #: ../../source/models/model_abilities/audio.rst:625 msgid "IndexTTS2 Usage" msgstr "IndexTTS2 使用" #: ../../source/models/model_abilities/audio.rst:627 msgid "" "The IndexTTS2 model supports emotion control, you can use this feature by" " specifying some additional parameters. 
Here are several examples of how " "to use IndexTTS2:" msgstr "" "IndexTTS2模型支持情感控制,你可以通过使用一些额外的参数来使用这个功能。" "以下为IndexTTS2的使用方式:" #: ../../source/models/model_abilities/audio.rst:630 msgid "Synthesize new speech with a single reference audio file (voice cloning):" msgstr "单一参考音频(音色克隆):" #: ../../source/models/model_abilities/audio.rst:646 msgid "" "Using a separate, emotional reference audio file to condition the speech " "synthesis:" msgstr "指定情感参考音频:" #: ../../source/models/model_abilities/audio.rst:666 msgid "" "When an emotional reference audio file is specified, you can optionally " "set the ``emo_alpha`` to adjust how much it affects the output. Valid " "range is ``0.0 - 1.0`` , and the default value is ``1.0`` (100%):" msgstr "" "当指定情感参考音频时,可以选择设置 ``emo_alpha`` 参数以调整其对输出的影响" "程度。有效范围为 ``0.0 - 1.0`` ,默认值为 ``1.0`` (100%)。" #: ../../source/models/model_abilities/audio.rst:689 msgid "" "It's also possible to omit the emotional reference audio and instead " "provide an 8-float list specifying the intensity of each emotion, in the " "following order: ``[happy, angry, sad, afraid, disgusted, melancholic, " "surprised, calm]`` . You can additionally use the ``use_random`` " "parameter to introduce stochasticity during inference; the default is " "``False`` , and setting it to ``True`` enables randomness:" msgstr "" "可以省略情绪参考音频,转而提供一个包含8个浮点数的列表,按以下顺序指定每种" "情绪的强度: ``[快乐, 愤怒, 悲伤, 恐惧, 厌恶, 忧郁, 惊讶, 平静]`` 。您还" "可以使用 ``use_random`` 参数在推理过程中引入随机性情绪;默认值为 ``False`" "` ,设置为 ``True`` 即可启用随机性情绪。" #: ../../source/models/model_abilities/audio.rst:712 msgid "" "Alternatively, you can enable ``use_emo_text`` to guide the emotions " "based on your provided ``text`` script. Your text script will then " "automatically be converted into emotion vectors. It's recommended to use " "``emo_alpha`` around 0.6 (or lower) when using the text emotion modes, " "for more natural sounding speech. 
You can introduce randomness with " "``use_random`` (default: ``False``; ``True`` enables randomness):" msgstr "" "或者,您可以启用 ``use_emo_text`` 功能,根据您提供的 ``text`` 脚本引导" "情感表达。您的文本脚本将自动转换为情感向量。使用文本情感模式时,建议将 ``" "emo_alpha`` 设置为 0.6 左右(或更低),以获得更自然的语音效果。您可通过 `" "`use_random`` 引入随机性(默认值:``False`` ;``True`` 启用随机性):" #: ../../source/models/model_abilities/audio.rst:737 msgid "" "It's also possible to directly provide a specific text emotion " "description via the ``emo_text`` parameter. Your emotion text will then " "automatically be converted into emotion vectors. This gives you separate " "control of the text script and the text emotion description:" msgstr "" "您也可以通过 ``emo_text`` 参数直接提供特定的文本情绪描述。您的情绪文本将" "自动转换为情绪向量。这使您能够分别控制文本脚本和文本情绪描述:" #: ../../source/models/model_abilities/audio.rst:761 msgid "IndexTTS2 Offline Usage" msgstr "IndexTTS2 离线使用" #: ../../source/models/model_abilities/audio.rst:763 msgid "" "IndexTTS2 requires several small models that are downloaded automatically" " during initialization. For offline environments, you can download these " "models to a single directory and specify the directory path." 
msgstr "" "IndexTTS2需要多个小型模型,这些模型会在初始化过程中自动下载。在离线环境中" ",您可以将这些模型下载到单一目录,并指定该目录路径。" #: ../../source/models/model_abilities/audio.rst:766 msgid "**Easy Setup Method**" msgstr "**简易设置方法**" #: ../../source/models/model_abilities/audio.rst:768 msgid "" "The simplest way to set up offline usage is to Use the `hf download` " "command to download the small model in advance:" msgstr "设置离线使用的最简单方法是使用 `hf download` 命令提前下载这些小型模型:" #: ../../source/models/model_abilities/audio.rst:781 msgid "The final directory structure should look like this:" msgstr "最终的目录结构应如下所示:" #: ../../source/models/model_abilities/audio.rst:791 msgid "**Required Models**" msgstr "**所需模型**" #: ../../source/models/model_abilities/audio.rst:793 msgid "The small models are automatically mapped as follows:" msgstr "小型模型将按以下方式自动映射:" #: ../../source/models/model_abilities/audio.rst:795 msgid "" "**w2v-bert-2.0** (``models--facebook--w2v-bert-2.0``) - Feature " "extraction model" msgstr "**w2v-bert-2.0** (``models--facebook--w2v-bert-2.0``) - 特征提取模型" #: ../../source/models/model_abilities/audio.rst:796 msgid "**campplus** (``models--funasr--campplus``) - Speaker recognition model" msgstr "**campplus** (``models--funasr--campplus``) - 说话人识别模型" #: ../../source/models/model_abilities/audio.rst:797 msgid "" "**bigvgan** (``models--nvidia--bigvgan_v2_22khz_80band_256x``) - Vocoder " "model" msgstr "" "**bigvgan** (``models--nvidia--bigvgan_v2_22khz_80band_256x``) - 声码器" "模型" #: ../../source/models/model_abilities/audio.rst:798 msgid "" "**semantic_codec** (``models--amphion--MaskGCT``) - Semantic " "encoding/decoding model" msgstr "**semantic_codec** (``models--amphion--MaskGCT``) - 语义编码/解码模型" #: ../../source/models/model_abilities/audio.rst:800 msgid "**Launching IndexTTS2 with Offline Models**" msgstr "**使用离线模型启动 IndexTTS2**" #: ../../source/models/model_abilities/audio.rst:802 msgid "" "When launching IndexTTS2 with Web UI, you can add an additional " "parameter: - ``small_models_dir`` - Path to directory containing 
all " "small models" msgstr "" "在通过Web UI启动IndexTTS2时,可添加额外参数:- ``small_models_dir`` - " "包含所有小型模型的目录路径" #: ../../source/models/model_abilities/audio.rst:805 msgid "When launching with command line, you can add the option:" msgstr "在通过命令行启动时,您可以添加以下选项:" #: ../../source/models/model_abilities/audio.rst:812 msgid "When launching with Python client:" msgstr "使用 Python 客户端启动时:" #~ msgid "**random sampling**" #~ msgstr "" #~ msgid "" #~ "Enabling random sampling will reduce the" #~ " voice cloning fidelity of the speech" #~ " synthesis." #~ msgstr "" #~ msgid "" #~ "5.Alternatively, you can enable `use_emo_text`" #~ " to guide the emotions based on" #~ msgstr "" #~ msgid "" #~ "your provided `text` script. Your text" #~ " script will then automatically be " #~ "converted into emotion vectors. It's " #~ "recommended to use `emo_alpha` around " #~ "0.6 (or lower) when using the text" #~ " emotion modes, for more natural " #~ "sounding speech. You can introduce " #~ "randomness with `use_random` (default: " #~ "`False`; `True` enables randomness):" #~ msgstr "" #~ msgid "" #~ "The required small models are: 1. " #~ "**w2v-bert-2.0** - Feature extraction model" #~ " (place in ``w2v-bert-2.0/`` subdirectory)" #~ " 2. **semantic_codec** - Semantic " #~ "encoding/decoding model (place in " #~ "``semantic_codec/`` subdirectory) 3. **campplus**" #~ " - Speaker recognition model (place " #~ "in ``campplus/`` subdirectory) 4. **bigvgan**" #~ " - Vocoder model (place in " #~ "``bigvgan/`` subdirectory)" #~ msgstr "" #~ "所需的小型模型包括:1. **w2v-" #~ "bert-2.0** - 特征提取模型(放置于" #~ "``w2v-bert-2.0/``子目录)2. " #~ "**semantic_codec** - 语义编码/解码" #~ "模型(放置于``semantic_codec/``" #~ "子目录)3. **campplus** - 说话" #~ "人识别模型(放置于``campplus/``" #~ "子目录) 4. 
**bigvgan** - 声" #~ "码器模型(放置于``bigvgan/``子目录" #~ ")" #~ msgid "" #~ "Assume downloaded to ``/path/to/small_models`` " #~ "with the following structure:" #~ msgstr "假设下载到``/path/to/small_models``目录,其结构如下:" #~ msgid "" #~ "**Find your Hugging Face cache " #~ "directory** (usually ``~/.cache/huggingface/hub/``)" #~ msgstr "" #~ "**查找您的Hugging Face缓存目录** " #~ "(通常位于 ``~/.cache/huggingface/" #~ "hub/`` )" #~ msgid "**Copy the required models** to your target directory:" #~ msgstr "**将所需模型** 复制到目标目录:" #~ msgid "**Note about Directory Structure**" #~ msgstr "**关于目录结构的说明**" #~ msgid "" #~ "The ``snapshots/`` directories contain " #~ "version-specific model files with hash " #~ "names. Xinference automatically detects and" #~ " uses the correct snapshot directory, " #~ "so you don't need to worry about" #~ " the exact hash values." #~ msgstr "" #~ "``snapshots/`` 目录包含具有哈希名称" #~ "的特定版本模型文件。Xinference会自动检测并" #~ "使用正确的快照目录,因此您无需担心精确" #~ "的哈希值。" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/chat.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2026-01-01 22:58+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.14.0\n" #: ../../source/models/model_abilities/chat.rst:5 msgid "Chat & Generate" msgstr "聊天 & 生成" #: ../../source/models/model_abilities/chat.rst:7 msgid "Learn how to chat with LLMs in Xinference." 
msgstr "学习如何在 Xinference 中与 LLM 聊天。" #: ../../source/models/model_abilities/chat.rst:10 msgid "Introduction" msgstr "介绍" #: ../../source/models/model_abilities/chat.rst:12 msgid "" "Models equipped with ``chat`` or ``generate`` abilities are frequently " "referred to as large language models (LLM) or text generation models. " "These models are designed to respond with text outputs to the inputs they" " receive, commonly known as \"prompts\". Typically, one can direct these " "models using specific instructions or by providing concrete examples " "illustrating how to accomplish a task." msgstr "" "具备 ``chat`` 或 ``generate`` 能力的模型通常被称为大型语言模型(LLM)或" "文本生成模型。这些模型旨在根据接收到的输入以文本输出方式进行回应,通常被" "称为“提示”。一般来说,可以通过特定指令或提供具体示例来引导这些模型完成" "任务。" #: ../../source/models/model_abilities/chat.rst:17 msgid "" "Models with ``generate`` capacities are typically pre-trained large " "language models. On the other hand, models equipped with ``chat`` " "capabilities are finely-tuned and aligned LLMs, optimized for dialogues " "use case. In most cases, models ending with \"chat\" (e.g. " "``llama-2-chat``, ``qwen-chat``, etc) are identified as having ``chat`` " "capabilities." msgstr "" "具备 ``generate`` 能力的模型通常是预训练的大型语言模型。另一方面,配备 ``" "chat`` 功能的模型是经过精调和对齐的 LLM(Language Model),专为对话场景" "进行优化。在大多数情况下,以“chat”结尾的模型(例如 ``llama-2-chat``,``" "qwen-chat`` 等)则具有 ``chat`` 功能。" #: ../../source/models/model_abilities/chat.rst:22 msgid "" "The Chat API and Generate API offer two distinct approaches for " "interacting with LLMs:" msgstr "Chat API 和 Generate API 提供了两种不同的与 LLMs 进行交互的方法:" #: ../../source/models/model_abilities/chat.rst:24 msgid "" "The Chat API (like OpenAI's `Chat Completion API " "`__) can " "conduct multi-turn conversations." msgstr "" "Chat API(类似于 OpenAI 的 `Chat Completion API `__)可以进行多轮对话。" #: ../../source/models/model_abilities/chat.rst:27 msgid "" "The Generate API (like OpenAI's legacy `Completions API " "`__) " "allows you to generate text based on a text prompt." 
msgstr "" "Generate API(类似于 OpenAI 的 `Completions API `__ )允许您根据文本提示生成" "文本。" #: ../../source/models/model_abilities/chat.rst:34 msgid "MODEL ABILITY" msgstr "模型能力" #: ../../source/models/model_abilities/chat.rst:35 msgid "API ENDPOINT" msgstr "API 端点" #: ../../source/models/model_abilities/chat.rst:36 msgid "OpenAI-compatible ENDPOINT" msgstr "OpenAI 兼容端点" #: ../../source/models/model_abilities/chat.rst:38 msgid "chat" msgstr "" #: ../../source/models/model_abilities/chat.rst:39 #: ../../source/models/model_abilities/chat.rst:56 msgid "Chat API" msgstr "" #: ../../source/models/model_abilities/chat.rst:40 msgid "/v1/chat/completions" msgstr "" #: ../../source/models/model_abilities/chat.rst:42 msgid "generate" msgstr "" #: ../../source/models/model_abilities/chat.rst:43 #: ../../source/models/model_abilities/chat.rst:222 msgid "Generate API" msgstr "" #: ../../source/models/model_abilities/chat.rst:44 msgid "/v1/completions" msgstr "" #: ../../source/models/model_abilities/chat.rst:48 msgid "Supported models" msgstr "支持的模型列表" #: ../../source/models/model_abilities/chat.rst:50 msgid "" "You can examine the abilities of all the :ref:`builtin LLM models in " "Xinference `." msgstr "" "你可以查看所有 :ref:`Xinference 中内置的 LLM 模型的能力 `。" #: ../../source/models/model_abilities/chat.rst:53 msgid "Chat Models" msgstr "聊天模型" #: ../../source/models/model_abilities/chat.rst:58 msgid "" "The Chat API mimics OpenAI's `Chat Completion API " "`__. 
We can " "try Chat API out either via cURL, OpenAI Client, or Xinference's python " "client:" msgstr "" "尝试使用 cURL、OpenAI Client 或 Xinference的 Python 客户端来测试 Chat API" ":" #: ../../source/models/model_abilities/chat.rst:145 msgid "You can find more examples of Chat API in the tutorial notebook:" msgstr "你可以在教程笔记本中找到更多 Chat API 的示例。" #: ../../source/models/model_abilities/chat.rst:149 msgid "Gradio Chat" msgstr "" #: ../../source/models/model_abilities/chat.rst:152 msgid "" "Learn from an example of utilizing the Chat API with the Xinference " "Python client." msgstr "学习如何使用 Xinference 的 Chat API 和 Python 客户端的示例。" #: ../../source/models/model_abilities/chat.rst:155 msgid "Hybrid Thinking Models" msgstr "混合思考模型" #: ../../source/models/model_abilities/chat.rst:157 msgid "" "Some LLMs are marked as ``hybrid`` and can run with or without thinking " "mode." msgstr "" "部分大型语言模型标记为 ``混合型`` ,可选择是否启用思考模式运行。" #: ../../source/models/model_abilities/chat.rst:159 msgid "Request-level ``enable_thinking`` is added in v1.17.0" msgstr "请求级别的 ``enable_thinking`` 开关在 v1.17.0 支持" #: ../../source/models/model_abilities/chat.rst:162 msgid "" "Xinference exposes a request-level ``enable_thinking`` switch that works " "across different model templates (e.g. Qwen uses ``enable_thinking`` " "while some DeepSeek templates use ``thinking``)." msgstr "" "Xinference提供请求级别的 ``enable_thinking`` 开关,该开关适用于不同模型" "模板(例如Qwen使用 ``enable_thinking`` ,而部分DeepSeek模板使用 ``" "thinking`` )。" #: ../../source/models/model_abilities/chat.rst:165 msgid "Usage examples:" msgstr "使用示例:" #: ../../source/models/model_abilities/chat.rst:219 msgid "Generate Models" msgstr "生成模型" #: ../../source/models/model_abilities/chat.rst:224 msgid "" "The Generate API mirrors OpenAI's legacy `Completions API " "`__." msgstr "" "Generate API 复刻了 OpenAI 的 `Completions API `__。 " #: ../../source/models/model_abilities/chat.rst:226 msgid "" "The difference between the Generate API and the Chat API lies primarily " "in the form of input. 
Opposite to the Chat API that takes a list of " "messages as input, the Generate API accepts a freeform text string named " "\"prompt\"." msgstr "" "Generate API 和 Chat API 之间的区别主要在于输入形式。Chat API 接受一个" "消息列表作为输入,Generate API 接受一个名为 prompt 的自由文本字符串作为" "输入。" #: ../../source/models/model_abilities/chat.rst:297 msgid "FAQ" msgstr "" #: ../../source/models/model_abilities/chat.rst:300 msgid "" "Does Xinference's LLM provide integration methods for LangChain or " "LlamaIndex?" msgstr "Xinference 的 LLM 是否提供与 LangChain 或 LlamaIndex 的集成方法?" #: ../../source/models/model_abilities/chat.rst:302 msgid "" "Yes, you can refer to the related sections in their respective official " "Xinference documentation. Here are the links:" msgstr "是的,你可以参考它们各自官方Xinference文档中的相关部分。以下是链接:" #: ../../source/models/model_abilities/chat.rst:304 msgid "" "`LangChain LLMs: Xinference " "`__" msgstr "" #: ../../source/models/model_abilities/chat.rst:306 msgid "" "`LlamaIndex LLM integrations: Xinference " "`__" msgstr "" #~ msgid "Quickstart" #~ msgstr "快速入门" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/embed.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-02-01 16:47+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.13.1\n" #: ../../source/models/model_abilities/embed.rst:5 msgid "Embeddings" msgstr "嵌入" #: ../../source/models/model_abilities/embed.rst:8 msgid "Learn how to create text embeddings in Xinference." 
msgstr "学习如何在 Xinference 中创建文本嵌入。" #: ../../source/models/model_abilities/embed.rst:12 msgid "Introduction" msgstr "介绍" #: ../../source/models/model_abilities/embed.rst:14 msgid "" "Text embeddings are used to quantify how related different pieces of text" " are. They can be used in various applications including search, " "clustering, recommendations, anomaly detection, diversity measurement, " "and classification." msgstr "文本嵌入用于量化不同文本之间的相关性。它们可以用于多种应用场景,包括搜索、聚类、推荐、异常检测、多样性度量和分类。" #: ../../source/models/model_abilities/embed.rst:17 msgid "" "An embedding is a vector of floating point numbers. The proximity between" " two vectors serves as an indicator of their similarity. Less distance " "implies a higher correlation, while a larger distance indicates reduced " "correlation." msgstr "嵌入是一个由浮点数组成的向量。两个向量之间的接近程度可以作为它们相似性的指标。" "距离越小表示相关性越高,而距离越大则表示相关性越低。" #: ../../source/models/model_abilities/embed.rst:20 msgid "" "Embedding models in Xinference can be invoked through the Embeddings API " "to create embeddings. The Embeddings API mimics OpenAI's `create " "embeddings API `_." msgstr "Xinference 中的嵌入模型可以通过 Embeddings API 调用,以创建嵌入。" "Embeddings API 模仿了 OpenAI 的 `create embeddings API `_。" #: ../../source/models/model_abilities/embed.rst:27 msgid "API ENDPOINT" msgstr "API 端点" #: ../../source/models/model_abilities/embed.rst:28 msgid "OpenAI-compatible ENDPOINT" msgstr "OpenAI 兼容端点" #: ../../source/models/model_abilities/embed.rst:30 msgid "Embeddings API" msgstr "" #: ../../source/models/model_abilities/embed.rst:31 msgid "/v1/embeddings" msgstr "" #: ../../source/models/model_abilities/embed.rst:35 msgid "Supported models" msgstr "支持的模型列表" #: ../../source/models/model_abilities/embed.rst:37 msgid "" "You can examine all the :ref:`builtin embedding models in Xinference " "`." 
msgstr "你可以查看所有 :ref:`Xinference 中内置的嵌入模型 `。" #: ../../source/models/model_abilities/embed.rst:41 msgid "Quickstart" msgstr "快速入门" #: ../../source/models/model_abilities/embed.rst:43 msgid "" "We can try Embeddings API out either via cURL, OpenAI Client, or " "Xinference's python client:" msgstr "我们可以通过 cURL、OpenAI Client 或 Xinference 的 Python 客户端来尝试 Embeddings API:" #: ../../source/models/model_abilities/embed.rst:103 msgid "You can find more examples of ``embed`` ability in the tutorial notebook:" msgstr "你可以在教程笔记本中找到更多关于 ``embed`` 能力的示例:" #: ../../source/models/model_abilities/embed.rst:107 msgid "LangChain Streamlit Doc Chat" msgstr "LangChain Streamlit 文档对话" #: ../../source/models/model_abilities/embed.rst:110 msgid "Learn from an example demonstrating how to use embed API via LangChain" msgstr "从一个示例中学习如何通过 LangChain 使用嵌入 API" #: ../../source/models/model_abilities/embed.rst:114 msgid "FAQ" msgstr "" #: ../../source/models/model_abilities/embed.rst:117 msgid "Does the LLM in Xinference support Embeddings API?" msgstr "Xinference 中的 LLM 是否支持 Embeddings API?" #: ../../source/models/model_abilities/embed.rst:119 msgid "" "No. Xinference doesn't provide embed API for LLMs due to considerations " "of performance." msgstr "不支持。出于性能考虑,Xinference 没有为 LLM 提供嵌入 API。" #: ../../source/models/model_abilities/embed.rst:123 msgid "Does Embeddings API provides integration method for LangChain?" msgstr "Embeddings API 是否提供了与 LangChain 的集成方法?" #: ../../source/models/model_abilities/embed.rst:125 msgid "" "Yes, you can refer to the related sections in LangChain's respective " "official Xinference documentation. Here is the link: `Text Embedding " "Models: Xinference " "`_" msgstr "是的,你可以参考 LangChain 官方文档中有关 Xinference 的部分。链接如下:" "`Text Embedding Models: Xinference `_" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/flexible.po ================================================ # SOME DESCRIPTIVE TITLE. 
# Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2025. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2025-06-26 13:20+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.14.0\n" #: ../../source/models/model_abilities/flexible.rst:5 msgid "Traditional ML models (Experimental)" msgstr "传统机器学习模型(实验性质)" #: ../../source/models/model_abilities/flexible.rst:7 msgid "" "Learn how to inference traditional machine learning models with " "Xinference. These flexibly extensible models are referred to as " "**Flexible Models** within Xinference." msgstr "" "了解如何使用 Xinference 推理传统机器学习模型。在 Xinference 中,这些灵活可扩展的模型被称为 **灵活模型**。" #: ../../source/models/model_abilities/flexible.rst:10 msgid "" "This ability is public since v1.7.1, now the API is not stable and may " "change during evolving." msgstr "" "该功能自 v1.7.1 版本起公开,目前 API 尚不稳定,可能会在后续迭代中发生变化。" #: ../../source/models/model_abilities/flexible.rst:15 msgid "Introduction" msgstr "介绍" #: ../../source/models/model_abilities/flexible.rst:17 msgid "" "Traditional machine learning models can still play a significant role " "within an LLM-centric ecosystem." msgstr "" "传统机器学习模型在以大模型为核心的生态系统中仍然能发挥重要作用。" #: ../../source/models/model_abilities/flexible.rst:19 msgid "" "Xinference provides flexible extensibility for performing inference with " "traditional machine learning models. 
It includes built-in support for " "loading and running the following types of models:" msgstr "" "Xinference 提供了灵活的扩展能力,用于推理传统机器学习模型。它内置支持加载和运行以下类型的模型:" #: ../../source/models/model_abilities/flexible.rst:22 msgid "" "HuggingFace Pipelines for tasks such as classification using models " "hosted on HuggingFace." msgstr "" "使用 HuggingFace 托管模型的 HuggingFace Pipeline,可用于分类等任务。" #: ../../source/models/model_abilities/flexible.rst:23 msgid "" "ModelScope Pipelines for tasks such as classification using models from " "ModelScope." msgstr "" "使用 ModelScope 上模型的 ModelScope Pipeline,可用于分类等任务。" #: ../../source/models/model_abilities/flexible.rst:24 msgid "YOLO for image detection and related computer vision tasks." msgstr "" "YOLO 用于图像检测及相关计算机视觉任务。" #: ../../source/models/model_abilities/flexible.rst:26 msgid "" "A wide range of traditional machine learning models can be used with " "Xinference. For each of the categories above, we will walk through a " "representative example to demonstrate how to perform inference step by " "step on the Xinference platform." msgstr "" "Xinference 支持多种传统机器学习模型。针对上述每个类别,我们将通过一个代表性示例," "逐步演示如何在 Xinference 平台上进行推理。" #: ../../source/models/model_abilities/flexible.rst:31 msgid "Built-in Model Support Examples" msgstr "内置模型支持案例" #: ../../source/models/model_abilities/flexible.rst:34 msgid "HuggingFace Pipeline Model" msgstr "HuggingFace Pipeline 模型" #: ../../source/models/model_abilities/flexible.rst:36 msgid "" "First, we use `FacebookAI/roberta-large-mnli " "`_ as an example. " "This is a zero-shot classification model. For other types of models, " "simply specify the corresponding task (which is also a parameter of the " "Pipeline) when registering the model." 
msgstr "" "首先,我们以 `FacebookAI/roberta-large-mnli `_ 为例。" "该模型属于零样本分类模型。对于其他类型的模型,注册时只需指定对应的任务(也是 Pipeline 的参数)。" #: ../../source/models/model_abilities/flexible.rst:41 msgid "Download the model to the following path::" msgstr "将模型下载到以下路径::" #: ../../source/models/model_abilities/flexible.rst:45 msgid "" "Next, we demonstrate how to register this flexible model in the " "Xinference Web UI. For the following examples, unless we have to, we will" " skip the UI steps and focus on the core logic." msgstr "接下来,我们演示如何在 Xinference Web UI 中注册该灵活模型。后续示例中,除非必要,我们将跳过界面操作,专注于核心逻辑。" #: ../../source/models/model_abilities/flexible.rst:52 msgid "The corresponding custom model JSON file is as follows:" msgstr "对应的自定义模型 JSON 文件如下:" #: ../../source/models/model_abilities/flexible.rst:77 msgid "" "Refer to the section :ref:`register_custom_model` for instructions on " "registering the model using either code or the command line." msgstr "" "请参见章节 :ref:`register_custom_model`,了解如何通过代码或命令行注册模型。" #: ../../source/models/model_abilities/flexible.rst:79 msgid "" "Next, load the model by selecting **Launch Model** / **Custom Model** / " "**Flexible Model** in the Web UI. The loading procedure is the same as " "for other model types." msgstr "" "接下来,在 Web UI 中选择 **启动模型** / **自定义模型** / **灵活模型** 来加载模型。" "加载流程与其他模型类型相同。" #: ../../source/models/model_abilities/flexible.rst:82 msgid "" "When using the command line, remember to specify the option ``--model-" "type flexible``." msgstr "" "使用命令行时,请记得指定参数 ``--model-type flexible``。" #: ../../source/models/model_abilities/flexible.rst:84 msgid "" "After the model is successfully loaded, we can perform inference using " "the following method." msgstr "模型成功加载后,我们可以通过以下方式进行推理。" #: ../../source/models/model_abilities/flexible.rst:120 msgid "ModelScope Pipeline Model" msgstr "ModelScope Pipeline 模型" #: ../../source/models/model_abilities/flexible.rst:122 msgid "" "ModelScope Pipeline models are very similar to Huggingface ones. 
The only" " difference lies in the launcher used." msgstr "" "ModelScope Pipeline 模型与 Huggingface 模型非常相似,唯一的区别在于使用的 launcher 不同。" #: ../../source/models/model_abilities/flexible.rst:125 msgid "" "We take a zero-shot classification model from ModelScope as an example. " "The model is `iic/nlp_structbert_zero-shot-classification_chinese-base " "`_." msgstr "" "我们以 ModelScope 上的一个零样本分类模型为例。模型为 `iic/nlp_structbert_zero-shot-classification_chinese-base " "`_。" #: ../../source/models/model_abilities/flexible.rst:128 msgid "" "Here, we make use of Xinference's model virtual environment feature. This" " is because the model used in this example requires " "``transformers==4.50.3`` to run properly. To isolate the environment, we " "use a :ref:`virtual env ` when registering the model." msgstr "" "这里我们使用了 Xinference 的模型虚拟环境功能。因为本示例中使用的模型需要 ``transformers==4.50.3`` 才能正常运行。" "为了隔离运行环境,我们在注册模型时使用了 :ref:`虚拟环境 `。" #: ../../source/models/model_abilities/flexible.rst:132 msgid "" "When specifying custom packages during registration, the syntax is the " "same as for regular packages, with a few special cases. Since the virtual" " environment is still based on the site packages of the Python runtime " "where Xinference is running, we need to explicitly include " "``#system_numpy#``. Packages wrapped in ``#system_xx#`` ensure consistency " "with the base environment during virtual environment creation; otherwise," " it may easily result in runtime errors." 
msgstr "" "注册模型时指定自定义包的语法与普通包相同,但有一些特殊情况。由于虚拟环境仍基于 Xinference 运行的 Python 解释器的" " site-packages,我们需要显式包含 ``#system_numpy#``。包名用 ``#system_xx#`` 包裹," "确保虚拟环境创建时与基础环境一致,否则很容易导致运行时错误。" #: ../../source/models/model_abilities/flexible.rst:136 msgid "Registering via Web UI:" msgstr "注册方式(Web UI):" #: ../../source/models/model_abilities/flexible.rst:142 msgid "Corresponding json file:" msgstr "对应的 JSON 文件:" #: ../../source/models/model_abilities/flexible.rst:170 #: ../../source/models/model_abilities/flexible.rst:241 msgid "Inference the model:" msgstr "模型推理:" #: ../../source/models/model_abilities/flexible.rst:206 msgid "YOLO" msgstr "" #: ../../source/models/model_abilities/flexible.rst:208 msgid "" "YOLO is a popular real-time object detection model, widely used in image " "detection and video analysis scenarios." msgstr "YOLO 是一种流行的实时目标检测模型,广泛应用于图像检测和视频分析场景。" #: ../../source/models/model_abilities/flexible.rst:210 msgid "" "First, download the YOLO weights. Here, we use the `yolov11s.pt " "`_ file as an example." msgstr "" "首先,下载 YOLO 权重。这里我们以 `yolov11s.pt `_ 文件为例。" #: ../../source/models/model_abilities/flexible.rst:213 msgid "JSON file of model definition:" msgstr "" "模型定义的 JSON 文件:" #: ../../source/models/model_abilities/flexible.rst:300 msgid "Writing a Custom Flexible Model" msgstr "编写自定义灵活模型" #: ../../source/models/model_abilities/flexible.rst:302 msgid "" "First, we implement a custom launcher with a simple model for sentiment " "scoring. In this example, we do not use any actual model weights, so the " "``load`` function does not perform any model loading." msgstr "首先,我们实现了一个用于情感评分的简单自定义 launcher。在此示例中,我们未使用任何实际模型权重," "因此 ``load`` 函数不执行任何模型加载操作。" #: ../../source/models/model_abilities/flexible.rst:334 msgid "The model JSON definition is as follows:" msgstr "模型 JSON 定义如下:" #: ../../source/models/model_abilities/flexible.rst:359 msgid "Here, we extend the model by passing in a custom-defined ``pos`` value." 
msgstr "这里我们通过传入自定义的 ``pos`` 值扩展了模型。" #: ../../source/models/model_abilities/flexible.rst:361 msgid "Finally, let's verify the result:" msgstr "最后,我们验证下结果:" #: ../../source/models/model_abilities/flexible.rst:380 msgid "Conclusion" msgstr "结论" #: ../../source/models/model_abilities/flexible.rst:382 msgid "" "The built-in Flexible Model launchers in Xinference can be found at " "`Github " "`_." " Contributions are welcome to support more traditional machine learning " "models!" msgstr "Xinference 内置的灵活模型 launcher 可以在 " "`Github `_ 找到," "欢迎贡献更多传统机器学习模型的支持!" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/image.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2025-12-04 17:26+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.17.0\n" #: ../../source/models/model_abilities/image.rst:5 msgid "Images" msgstr "图像" #: ../../source/models/model_abilities/image.rst:7 msgid "Learn how to generate images with Xinference." msgstr "学习如何使用 Xinference 生成图像。" #: ../../source/models/model_abilities/image.rst:11 msgid "Introduction" msgstr "介绍" #: ../../source/models/model_abilities/image.rst:14 msgid "The Images API provides two methods for interacting with images:" msgstr "Images API提供了两种与图像交互的方法:" #: ../../source/models/model_abilities/image.rst:17 msgid "" "The Text-to-image endpoint create images from scratch based on a text " "prompt." 
msgstr "文生图端点根据文本从零开始创建图像。" #: ../../source/models/model_abilities/image.rst:18 msgid "" "The Image-to-image endpoint allows you to generate a variation of a given" " image." msgstr "图生图端点允许您生成给定图像的变体。" #: ../../source/models/model_abilities/image.rst:25 msgid "API ENDPOINT" msgstr "API 端点" #: ../../source/models/model_abilities/image.rst:26 msgid "OpenAI-compatible ENDPOINT" msgstr "OpenAI 兼容端点" #: ../../source/models/model_abilities/image.rst:28 msgid "Text-to-Image API" msgstr "" #: ../../source/models/model_abilities/image.rst:29 msgid "/v1/images/generations" msgstr "" #: ../../source/models/model_abilities/image.rst:31 msgid "Image-to-image API" msgstr "" #: ../../source/models/model_abilities/image.rst:32 msgid "/v1/images/variations" msgstr "" #: ../../source/models/model_abilities/image.rst:35 msgid "Supported models" msgstr "支持的模型列表" #: ../../source/models/model_abilities/image.rst:37 msgid "" "The Text-to-image API is supported with the following models in " "Xinference:" msgstr "Text-to-image API 在 Xinference 中支持以下模型:" #: ../../source/models/model_abilities/image.rst:39 msgid "sd-turbo" msgstr "" #: ../../source/models/model_abilities/image.rst:40 msgid "sdxl-turbo" msgstr "" #: ../../source/models/model_abilities/image.rst:41 msgid "stable-diffusion-v1.5" msgstr "" #: ../../source/models/model_abilities/image.rst:42 msgid "stable-diffusion-xl-base-1.0" msgstr "" #: ../../source/models/model_abilities/image.rst:43 #: ../../source/models/model_abilities/image.rst:213 msgid "sd3-medium" msgstr "" #: ../../source/models/model_abilities/image.rst:44 #: ../../source/models/model_abilities/image.rst:215 #: ../../source/models/model_abilities/image.rst:252 msgid "sd3.5-medium" msgstr "" #: ../../source/models/model_abilities/image.rst:45 #: ../../source/models/model_abilities/image.rst:217 #: ../../source/models/model_abilities/image.rst:254 msgid "sd3.5-large" msgstr "" #: ../../source/models/model_abilities/image.rst:46 #: 
../../source/models/model_abilities/image.rst:219 #: ../../source/models/model_abilities/image.rst:256 msgid "sd3.5-large-turbo" msgstr "" #: ../../source/models/model_abilities/image.rst:47 #: ../../source/models/model_abilities/image.rst:211 #: ../../source/models/model_abilities/image.rst:250 msgid "FLUX.1-schnell" msgstr "" #: ../../source/models/model_abilities/image.rst:48 #: ../../source/models/model_abilities/image.rst:209 #: ../../source/models/model_abilities/image.rst:248 msgid "FLUX.1-dev" msgstr "" #: ../../source/models/model_abilities/image.rst:49 msgid "Kolors" msgstr "" #: ../../source/models/model_abilities/image.rst:50 msgid "hunyuandit-v1.2" msgstr "" #: ../../source/models/model_abilities/image.rst:51 msgid "hunyuandit-v1.2-distilled" msgstr "" #: ../../source/models/model_abilities/image.rst:52 msgid "cogview4" msgstr "" #: ../../source/models/model_abilities/image.rst:53 #: ../../source/models/model_abilities/image.rst:221 #: ../../source/models/model_abilities/image.rst:258 #: ../../source/models/model_abilities/image.rst:292 msgid "Qwen-Image" msgstr "" #: ../../source/models/model_abilities/image.rst:55 #, fuzzy msgid "Image-to-image supported models:" msgstr "支持的模型列表" #: ../../source/models/model_abilities/image.rst:57 msgid "Flux.1-Kontext-dev" msgstr "" #: ../../source/models/model_abilities/image.rst:58 #: ../../source/models/model_abilities/image.rst:223 #: ../../source/models/model_abilities/image.rst:260 #: ../../source/models/model_abilities/image.rst:294 msgid "Qwen-Image-Edit" msgstr "" #: ../../source/models/model_abilities/image.rst:62 msgid "Quickstart" msgstr "快速入门" #: ../../source/models/model_abilities/image.rst:65 msgid "Text-to-image" msgstr "文生图" #: ../../source/models/model_abilities/image.rst:67 msgid "" "The Text-to-image API mimics OpenAI's `create images API " "`_. 
We can " "try Text-to-image API out either via cURL, OpenAI Client, or Xinference's" " python client:" msgstr "" "可以通过 cURL、OpenAI Client 或 Xinference 的方式尝试使用 Text-to-image " "API。" #: ../../source/models/model_abilities/image.rst:121 msgid "Image-to-image" msgstr "图生图" #: ../../source/models/model_abilities/image.rst:123 msgid "" "The Image-to-image API mimics OpenAI's `create image variation API " "`_. We can try image-to-image API out " "either via cURL, OpenAI Client, or Xinference's python client:" msgstr "" "图生图 API 模拟了 OpenAI 的 `图像变体创建 API `_。我们可以通过 cURL、" "OpenAI 客户端,或 Xinference 的 Python 客户端来尝试使用图生图 API:" #: ../../source/models/model_abilities/image.rst:176 msgid "Memory optimization for Large Image Models e.g. SD3-Medium, FLUX.1" msgstr "大型图像模型(例如 SD3-Medium、FLUX.1)的内存优化" #: ../../source/models/model_abilities/image.rst:180 msgid "" "From v0.16.1, Xinference by default enabled quantization for large image " "models like Flux.1 and SD3.5 series. So if your Xinference version is " "newer than v0.16.1, You barely need to do anything to run those large " "image models on GPUs with small memory." msgstr "" "从 v0.16.1 开始,Xinference 默认对大图像模型如 Flux.1 和 SD3.5 系列开启" "量化。如果你使用新于 v0.16.1 的 Xinference 版本,你不需要做什么事情来在小" " GPU 显存的机器上来运行这些大型图像模型。" #: ../../source/models/model_abilities/image.rst:185 msgid "Useful extra parameters can be passed to launch including:" msgstr "有用的传递给加载模型的额外参数包括:" #: ../../source/models/model_abilities/image.rst:187 msgid "" "``--cpu_offload True``: specifying ``True`` will offload the components " "of the model to CPU during inference in order to save memory, while " "seeing a slight increase in inference latency. Model offloading will only" " move a model component onto the GPU when it needs to be executed, while " "keeping the remaining components on the CPU." 
msgstr "" "``--cpu_offload True``:指定 ``True`` 会在推理过程中将模型的组件卸载到 " "CPU 上以节省内存,这会导致推理延迟略有增加。模型卸载仅会在需要执行时将" "模型组件移动到 GPU 上,同时保持其余组件在 CPU 上" #: ../../source/models/model_abilities/image.rst:191 msgid "" "``--quantize_text_encoder ``: We leveraged the " "``bitsandbytes`` library to load and quantize the T5-XXL text encoder to " "8-bit precision. This allows you to keep using all text encoders while " "only slightly impacting performance." msgstr "" "``--quantize_text_encoder ``:我们利用 ``bitsandbytes" "`` 库加载并量化 T5-XXL 文本编码器至8位精度。这使得你能够在仅轻微影响性能" "的情况下继续使用全部文本编码器。" #: ../../source/models/model_abilities/image.rst:194 msgid "" "``--text_encoder_3 None``, for sd3-medium, removing the memory-intensive " "4.7B parameter T5-XXL text encoder during inference can significantly " "decrease the memory requirements with only a slight loss in performance." msgstr "" "``--text_encoder_3 None``,对于 sd3-medium,移除在推理过程中内存密集型的" "47亿参数T5-XXL文本编码器可以显著降低内存需求,而仅造成性能上的轻微损失。" #: ../../source/models/model_abilities/image.rst:197 msgid "``--transformer_nf4 True``: use nf4 for transformer quantization." msgstr "``--transformer_nf4 True`` :使用 nf4 量化 transformer。" #: ../../source/models/model_abilities/image.rst:198 msgid "" "``--quantize``: Only work for MLX on Mac, Flux.1-dev and Flux.1-schnell " "will switch to MLX engine on Mac, and ``quantize`` can be used to " "quantize the model." msgstr "" "``--quantize`` :只对 Mac 上的 MLX 引擎生效,Flux.1-dev 和 Flux.1-schnell" "会在 Mac 上使用 MLX 引擎计算,``quantize`` 可以用来量化模型。" #: ../../source/models/model_abilities/image.rst:201 msgid "" "For WebUI, Just add additional parameters, e.g. add key ``cpu_offload`` " "and value ``True`` to enable cpu offloading." msgstr "" "对于 WebUI,只需要添加额外参数,比如,添加 key ``cpu_offload`` 以及值 ``" "True`` 来开启 CPU 卸载。" #: ../../source/models/model_abilities/image.rst:204 msgid "Below list default options that used from v0.16.1." 
msgstr "如下列出了从 v0.16.1 开始默认使用的参数。" #: ../../source/models/model_abilities/image.rst:207 #: ../../source/models/model_abilities/image.rst:246 #: ../../source/models/model_abilities/image.rst:290 msgid "Model" msgstr "模型" #: ../../source/models/model_abilities/image.rst:207 msgid "quantize_text_encoder" msgstr "" #: ../../source/models/model_abilities/image.rst:207 msgid "quantize" msgstr "" #: ../../source/models/model_abilities/image.rst:207 msgid "transformer_nf4" msgstr "" #: ../../source/models/model_abilities/image.rst:209 #: ../../source/models/model_abilities/image.rst:211 msgid "text_encoder_2" msgstr "" #: ../../source/models/model_abilities/image.rst:209 #: ../../source/models/model_abilities/image.rst:211 #: ../../source/models/model_abilities/image.rst:217 #: ../../source/models/model_abilities/image.rst:219 msgid "True" msgstr "" #: ../../source/models/model_abilities/image.rst:209 #: ../../source/models/model_abilities/image.rst:211 #: ../../source/models/model_abilities/image.rst:213 #: ../../source/models/model_abilities/image.rst:215 #: ../../source/models/model_abilities/image.rst:221 #: ../../source/models/model_abilities/image.rst:223 msgid "False" msgstr "" #: ../../source/models/model_abilities/image.rst:213 #: ../../source/models/model_abilities/image.rst:215 #: ../../source/models/model_abilities/image.rst:217 #: ../../source/models/model_abilities/image.rst:219 msgid "text_encoder_3" msgstr "" #: ../../source/models/model_abilities/image.rst:213 #: ../../source/models/model_abilities/image.rst:215 #: ../../source/models/model_abilities/image.rst:217 #: ../../source/models/model_abilities/image.rst:219 #: ../../source/models/model_abilities/image.rst:221 #: ../../source/models/model_abilities/image.rst:223 msgid "N/A" msgstr "" #: ../../source/models/model_abilities/image.rst:221 #: ../../source/models/model_abilities/image.rst:223 msgid "text_encoder" msgstr "" #: ../../source/models/model_abilities/image.rst:228 msgid "" "If you want to 
disable some quantization, just set the corresponding " "option to False. e.g. for Web UI, set key ``quantize_text_encoder`` and " "value ``False`` and for command line, specify ``--quantize_text_encoder " "False`` to disable quantization for text encoder." msgstr "" "如果你想关闭某些量化,只需要设置相应的选项为 False。比如,对于 Web UI," "设置 key ``quantize_text_encoder`` 和值 ``False``,或对于命令行,指定 ``" "--quantize_text_encoder False`` 来关闭 text encoder 的量化。" #: ../../source/models/model_abilities/image.rst:233 msgid "" "For :ref:`CogView4 `, we found that quantization" " has a significant impact on the model. Therefore, when GPU memory is " "limited, we recommend enabling the CPU offload option in the Web UI, and " "specifying ``--cpu_offload True`` when loading the model via the command " "line." msgstr "" "对于 :ref:`CogView4 `,我们发现量化对模型的影响" "较大。因此,当显存有限时,我们推荐在 Web UI 中启用 CPU offload 选项,在" "命令行加载模型时指定 ``--cpu_offload True``。" #: ../../source/models/model_abilities/image.rst:238 msgid "GGUF file format" msgstr "GGUF 文件格式" #: ../../source/models/model_abilities/image.rst:240 msgid "" "GGUF file format for transformer provides various quantization options. " "To use gguf file, you can specify additional option ``gguf_quantization``" " for web UI, or ``--gguf_quantization`` for command line for those image " "models which support internally by Xinference. Below is the mode list." 
msgstr "" "GGUF 文件格式为 transformer 模块提供了丰富的量化选项。要使用 GGUF 文件," "你可以在 Web 界面上指定额外选项 ``gguf_quantization`` ,或者在命令行指定 " "``--gguf_quantization`` ,以为 Xinference 内建支持 GGUF 量化的模型开启。" "如下是内置支持的模型。" #: ../../source/models/model_abilities/image.rst:246 msgid "supported gguf quantization" msgstr "支持 GGUF 量化格式" #: ../../source/models/model_abilities/image.rst:248 #: ../../source/models/model_abilities/image.rst:250 msgid "F16, Q2_K, Q3_K_S, Q4_0, Q4_1, Q4_K_S, Q5_0, Q5_1, Q5_K_S, Q6_K, Q8_0" msgstr "" #: ../../source/models/model_abilities/image.rst:252 #: ../../source/models/model_abilities/image.rst:258 msgid "" "F16, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, " "Q5_K_S, Q6_K, Q8_0" msgstr "" #: ../../source/models/model_abilities/image.rst:254 #: ../../source/models/model_abilities/image.rst:256 msgid "F16, Q4_0, Q4_1, Q5_0, Q5_1, Q8_0" msgstr "" #: ../../source/models/model_abilities/image.rst:260 #: ../../source/models/model_abilities/image.rst:262 msgid "" "Q2_K, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, " "Q5_K_S, Q6_K, Q8_0" msgstr "" #: ../../source/models/model_abilities/image.rst:262 #: ../../source/models/model_abilities/image.rst:296 msgid "Qwen-Image-Edit-2509" msgstr "" #: ../../source/models/model_abilities/image.rst:267 msgid "" "We stronly recommend to enable additional option ``cpu_offload`` with " "value ``True`` for WebUI, or specify ``--cpu_offload True`` for command " "line." msgstr "" "我们强烈推荐在 WebUI 上开启额外选项 ``cpu_offload`` 并指定为 ``True``,或" "对命令行,指定 ``--cpu_offload True``。" #: ../../source/models/model_abilities/image.rst:270 msgid "Example:" msgstr "例如:" #: ../../source/models/model_abilities/image.rst:276 msgid "" "With ``Q2_K`` quantization, you only need around 5 GiB GPU memory to run " "Flux.1-dev." 
msgstr "使用 ``Q2_K`` 量化,你只需要大约 5GB 的显存来运行 Flux.1-dev。" #: ../../source/models/model_abilities/image.rst:278 msgid "" "For those models gguf options are not supported internally, or you want " "to download gguf files on you own, you can specify additional option " "``gguf_model_path`` for web UI or spcecify ``--gguf_model_path " "/path/to/model_quant.gguf`` for command line." msgstr "" "对于非内建支持 GGUF 量化的模型,或者你希望自己下载 GGUF 文件,你可以在 " "Web UI 指定额外选项 ``gguf_model_path`` 或者用命令行指定 ``--gguf_model_" "path /path/to/model_quant.gguf`` 。" #: ../../source/models/model_abilities/image.rst:283 msgid "Lightning LORA Support" msgstr "Lightning LORA 支持" #: ../../source/models/model_abilities/image.rst:285 msgid "" "Lightning LORA performs distillation on models in the form of LoRA, " "reducing the number of inference steps while maintaining model " "performance, and significantly speeding up inference. The following " "models currently support this LoRA:" msgstr "" "Lightning LORA 以 LoRA 的形式对模型进行蒸馏,在保持模型性能的同时减少推理" "步数,并大幅提升推理速度。以下模型目前支持该 LoRA:" #: ../../source/models/model_abilities/image.rst:290 msgid "Supported lightning version" msgstr "支持的 Lightning 版本" #: ../../source/models/model_abilities/image.rst:292 msgid "4steps-V1.0-bf16, 4steps-V1.0, 8steps-V1.0, 8steps-V1.1-bf16, 8steps-V1.1" msgstr "" #: ../../source/models/model_abilities/image.rst:294 msgid "4steps-V1.0-bf16, 4steps-V1.0, 8steps-V1.0-bf16, 8steps-V1.0" msgstr "" #: ../../source/models/model_abilities/image.rst:296 msgid "4steps-V1.0-bf16, 4steps-V1.0-fp32, 8steps-V1.0-bf16, 8steps-V1.0-fp32" msgstr "" #: ../../source/models/model_abilities/image.rst:299 msgid "" "4 steps or 8 steps refer to the inference steps " "(``num_inference_steps``). When ``lightning_version`` is specified, " "Xinference will automatically set the number of inference steps." 
msgstr "" "4 步或 8 步是指推理步数( ``num_inference_steps`` )。当指定了 ``" "lightning_version`` 时,Xinference 会自动设定推理步数。" #: ../../source/models/model_abilities/image.rst:302 msgid "" "When using it, select the lightning version in the interface, or specify " "it via the command line." msgstr "使用时,可以在界面上选择 lightning 版本,或者通过命令行指定。" #: ../../source/models/model_abilities/image.rst:308 msgid "Use the command line with ``--lightning_version ``." msgstr "在命令行中使用 ``--lightning_version ``。" #: ../../source/models/model_abilities/image.rst:310 msgid "" "For those who have downloaded the lightning LoRA files themselves, you " "can specify them via the Lightning Model Path in the interface or by " "using the command line option ``--lightning_model_path``." msgstr "" "对于自行下载了 lightning LoRA 文件的用户,可以在界面上通过 Lightning " "Model Path 指定,或者使用命令行参数 ``--lightning_model_path`` 。" #: ../../source/models/model_abilities/image.rst:313 msgid "" "For example, using ``4steps-V1.0``, the inference time is reduced from " "the original 34s to 3s." msgstr "例如,使用 ``4steps-V1.0`` 时,推理时间从原来的 34 秒减少到 3 秒。" #: ../../source/models/model_abilities/image.rst:316 msgid "OCR" msgstr "" #: ../../source/models/model_abilities/image.rst:318 msgid "The OCR API accepts image bytes and returns the OCR text." msgstr "OCR API 接受图像字节并返回 OCR 文本。" #: ../../source/models/model_abilities/image.rst:320 msgid "We can try OCR API out either via cURL, or Xinference's python client:" msgstr "可以通过 cURL 或 Xinference 的 Python 客户端来尝试 OCR API。" #~ msgid "You can find more examples of Images API in the tutorial notebook:" #~ msgstr "你可以在教程笔记本中找到更多 Images API 的示例。" #~ msgid "Stable Diffusion ControlNet" #~ msgstr "" #~ msgid "Learn from a Stable Diffusion ControlNet example" #~ msgstr "学习一个 Stable Diffusion 控制网络的示例" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/index.po ================================================ # SOME DESCRIPTIVE TITLE. 
# Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-02-01 16:47+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.13.1\n" #: ../../source/models/model_abilities/index.rst:5 msgid "Model Abilities" msgstr "模型能力" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/multimodal.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2025-08-25 03:59+0000\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.17.0\n" #: ../../source/models/model_abilities/multimodal.rst:5 msgid "Multimodal" msgstr "多模态" #: ../../source/models/model_abilities/multimodal.rst:7 msgid "Learn how to process images and audio with LLMs." msgstr "学习如何使用 LLM 处理图像和音频。" #: ../../source/models/model_abilities/multimodal.rst:11 msgid "Vision" msgstr "视觉" #: ../../source/models/model_abilities/multimodal.rst:13 msgid "" "With the ``vision`` ability you can have your model take in images and " "answer questions about them. 
Within Xinference, this indicates that " "certain models are capable of processing image inputs when conducting " "dialogues via the Chat API." msgstr "" "通过 ``vision`` 能力,您可以让模型接收图像并回答有关它们的问题。在 Xinference 中,这表示某些模型在通过 Chat API " "进行对话时能够处理图像输入。" #: ../../source/models/model_abilities/multimodal.rst:19 #: ../../source/models/model_abilities/multimodal.rst:190 msgid "Supported models" msgstr "支持的模型列表" #: ../../source/models/model_abilities/multimodal.rst:21 msgid "" "The ``vision`` ability is supported with the following models in " "Xinference:" msgstr "在 Xinference 中支持 ``vision`` 功能的模型如下:" #: ../../source/models/model_abilities/multimodal.rst:23 msgid ":ref:`qwen-vl-chat `" msgstr "" #: ../../source/models/model_abilities/multimodal.rst:24 msgid ":ref:`deepseek-vl-chat `" msgstr "" #: ../../source/models/model_abilities/multimodal.rst:25 msgid ":ref:`omnilmm `" msgstr "" #: ../../source/models/model_abilities/multimodal.rst:26 msgid ":ref:`cogvlm2 `" msgstr "" #: ../../source/models/model_abilities/multimodal.rst:27 msgid ":ref:`MiniCPM-Llama3-V 2.5 `" msgstr "" #: ../../source/models/model_abilities/multimodal.rst:28 msgid ":ref:`GLM-4V `" msgstr "" #: ../../source/models/model_abilities/multimodal.rst:29 msgid ":ref:`MiniCPM-Llama3-V 2.6 `" msgstr "" #: ../../source/models/model_abilities/multimodal.rst:30 msgid ":ref:`qwen2-vl-instruct `" msgstr "" #: ../../source/models/model_abilities/multimodal.rst:31 msgid ":ref:`llama-3.2-vision `" msgstr "" #: ../../source/models/model_abilities/multimodal.rst:32 msgid ":ref:`llama-3.2-vision-instruct `" msgstr "" #: ../../source/models/model_abilities/multimodal.rst:33 msgid ":ref:`glm-edge-v `" msgstr "" #: ../../source/models/model_abilities/multimodal.rst:34 msgid ":ref:`qwen2.5-vl-instruct `" msgstr "" #: ../../source/models/model_abilities/multimodal.rst:35 msgid ":ref:`gemma-3-it `" msgstr "" #: ../../source/models/model_abilities/multimodal.rst:36 msgid ":ref:`deepseek-vl2 `" msgstr "" #: 
../../source/models/model_abilities/multimodal.rst:37 msgid ":ref:`internvl3 `" msgstr "" #: ../../source/models/model_abilities/multimodal.rst:41 #: ../../source/models/model_abilities/multimodal.rst:197 msgid "Quickstart" msgstr "快速入门" #: ../../source/models/model_abilities/multimodal.rst:43 msgid "" "Images are made available to the model in two main ways: by passing a " "link to the image or by passing the base64 encoded image directly in the " "request." msgstr "模型可以通过两种主要方式获取图像:通过传递图像的链接或直接在请求中传递 base64 编码的图像。" #: ../../source/models/model_abilities/multimodal.rst:47 msgid "Example using OpenAI Client" msgstr "使用 OpenAI 客户端的示例" #: ../../source/models/model_abilities/multimodal.rst:78 msgid "Uploading base 64 encoded images" msgstr "上传 Base64 编码的图片" #: ../../source/models/model_abilities/multimodal.rst:121 msgid "Limiting Images Per Prompt" msgstr "限制每轮对话中的图像数量" #: ../../source/models/model_abilities/multimodal.rst:123 msgid "" "For vision models using the VLLM backend, you can use the " "``limit_mm_per_prompt`` parameter to limit the number of images that can " "be processed in each conversation turn. This helps control memory usage " "and improve performance." 
msgstr "" "对于使用 VLLM 后端的视觉模型,你可以通过 ``limit_mm_per_prompt`` " "参数来限制每轮对话中可以处理的图像数量。这有助于控制内存使用和提高性能。" #: ../../source/models/model_abilities/multimodal.rst:142 msgid "Alternatively, you can launch the model using the command line:" msgstr "或者,你可以使用命令行启动模型:" #: ../../source/models/model_abilities/multimodal.rst:155 msgid "" "For Web UI, you can set the ``limit_mm_per_prompt`` parameter in the " "launch form:" msgstr "对于 Web UI,你可以在vLLM引擎表单中设置 ``limit_mm_per_prompt`` 参数:" #: ../../source/models/model_abilities/multimodal.rst:161 msgid "This parameter provides the following benefits:" msgstr "此参数提供以下好处:" #: ../../source/models/model_abilities/multimodal.rst:163 msgid "**image**: Sets the maximum number of images allowed per conversation turn" msgstr "**image**: 设置每轮对话中允许的最大图像数量" #: ../../source/models/model_abilities/multimodal.rst:164 msgid "Helps prevent memory overflow, especially when processing multiple images" msgstr "有助于防止内存溢出,特别是在处理多张图像时" #: ../../source/models/model_abilities/multimodal.rst:165 msgid "Improves model inference stability and performance" msgstr "提高模型推理的稳定性和性能" #: ../../source/models/model_abilities/multimodal.rst:166 msgid "Applies to all VLLM-based vision models" msgstr "适用于所有基于 VLLM 的视觉模型" #: ../../source/models/model_abilities/multimodal.rst:169 msgid "" "The ``limit_mm_per_prompt`` parameter only takes effect when using the " "VLLM backend. If your model uses other backends, this parameter will be " "ignored." 
msgstr "``limit_mm_per_prompt`` 参数仅在使用 VLLM 后端时生效。如果你的模型使用其他后端,此参数将被忽略。" #: ../../source/models/model_abilities/multimodal.rst:171 msgid "You can find more examples of ``vision`` ability in the tutorial notebook:" msgstr "你可以在教程笔记本中找到更多关于 ``vision`` 能力的示例。" #: ../../source/models/model_abilities/multimodal.rst:175 msgid "Qwen VL Chat" msgstr "Qwen VL Chat" #: ../../source/models/model_abilities/multimodal.rst:178 msgid "Learn vision ability from a example using qwen-vl-chat" msgstr "通过使用 qwen-vl-chat 的示例来学习使用 LLM 的视觉能力" #: ../../source/models/model_abilities/multimodal.rst:182 msgid "Audio" msgstr "音频" #: ../../source/models/model_abilities/multimodal.rst:184 msgid "" "With the ``audio`` ability you can have your model take in audio and " "performing audio analysis or direct textual responses with regard to " "speech instructions. Within Xinference, this indicates that certain " "models are capable of processing audio inputs when conducting dialogues " "via the Chat API." msgstr "" "通过“音频”功能,您的模型可以接收音频并执行音频分析或根据语音指令直接生成文本响应。在 Xinference 中,这表示某些模型在通过 Chat " "API 进行对话时能够处理音频输入。" #: ../../source/models/model_abilities/multimodal.rst:192 msgid "" "The ``audio`` ability is supported with the following models in " "Xinference:" msgstr "“音频”功能在 Xinference 中支持以下模型:" #: ../../source/models/model_abilities/multimodal.rst:194 msgid ":ref:`qwen2-audio-instruct `" msgstr "" #: ../../source/models/model_abilities/multimodal.rst:199 msgid "" "Audios are made available to the model in two main ways: by passing a " "link to the image or by passing the audio url directly in the request." 
msgstr "音频可以通过两种主要方式提供给模型:通过传递图像链接或在请求中直接传递音频 URL。" #: ../../source/models/model_abilities/multimodal.rst:204 msgid "Chat with audio" msgstr "带有音频的聊天" #~ msgid ":ref:`yi-vl-chat `" #~ msgstr "" #~ msgid ":ref:`internvl-chat `" #~ msgstr "" #~ msgid ":ref:`internvl2 `" #~ msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/rerank.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-02-01 16:47+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.13.1\n" #: ../../source/models/model_abilities/rerank.rst:5 msgid "Rerank" msgstr "重排序" #: ../../source/models/model_abilities/rerank.rst:7 msgid "Learn how to use rerank models in Xinference." msgstr "学习如何在Xinference中使用重新排序模型。" #: ../../source/models/model_abilities/rerank.rst:11 msgid "Introduction" msgstr "介绍" #: ../../source/models/model_abilities/rerank.rst:13 msgid "" "Given a query and a list of documents, Rerank indexes the documents from " "most to least semantically relevant to the query. Rerank models in " "Xinference can be invoked through the Rerank endpoint to rank a list of " "documents." 
msgstr "给定一个查询和一系列文档,Rerank 会根据与查询的语义相关性从最相关到最不相关对文档进行重新排序。" "在 Xinference 中,可以通过 Rerank 端点调用 Rerank 模型来对一系列文档进行排序。" #: ../../source/models/model_abilities/rerank.rst:18 msgid "Quickstart" msgstr "快速入门" #: ../../source/models/model_abilities/rerank.rst:20 msgid "" "We can try Rerank API out either via cURL, OpenAI Client, or Xinference's" " python client:" msgstr "我们可以通过 cURL、OpenAI Client 或 Xinference 的 Python 客户端来尝试使用 Rerank API:" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/tools.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2025-09-05 16:44+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.17.0\n" #: ../../source/models/model_abilities/tools.rst:5 msgid "Tools" msgstr "工具" #: ../../source/models/model_abilities/tools.rst:7 msgid "Learn how to connect LLM with external tools." msgstr "学习如何将 LLM 与外部工具连接起来。" #: ../../source/models/model_abilities/tools.rst:11 msgid "Introduction" msgstr "介绍" #: ../../source/models/model_abilities/tools.rst:13 msgid "With the ``tools`` ability you can have your model use external tools." msgstr "通过 ``tools`` 功能,您可以让您的模型使用外部工具。" #: ../../source/models/model_abilities/tools.rst:16 msgid "" "Like `OpenAI's Function calling API " "`_, you can " "define the functions along with their parameters and have the model " "dynamically choose which function to call and what parameters to pass to " "it."
msgstr "" "就像 `OpenAI 的 Function calling API " "`_ " "一样,你可以定义带有参数的函数,并让模型动态选择要调用哪个函数以及传递给它什么参数。" #: ../../source/models/model_abilities/tools.rst:19 msgid "This is the general process for calling a function:" msgstr "这是调用函数的一般过程:" #: ../../source/models/model_abilities/tools.rst:21 msgid "" "You submit a query, detailing the functions, their parameters, and " "descriptions." msgstr "您提交一个查询,详细说明函数、它们的参数和描述。" #: ../../source/models/model_abilities/tools.rst:22 msgid "" "The LLM decides whether to initiate the function. If chosen not to, it " "replies in everyday language, either offering a solution based on its " "inherent understanding or asking further details about the query and tool" " usage. On deciding to use a tool, it recommends the suitable API and " "instructions for its usage, framed in JSON." msgstr "" "LLM " "决定是否启动功能。如果选择不启动,它会用日常语言回复,要么基于其内在理解提供解决方案,要么询问有关查询和工具使用的进一步细节。在决定使用工具时,它会推荐适合的" " API 和 JSON 格式的使用说明。" #: ../../source/models/model_abilities/tools.rst:25 msgid "" "Following that, you implement the API call within your application and " "send the returned response back to the LLM for result analysis and " "proceeding with the next steps." msgstr "接下来,你在应用程序中实现 API 调用,并将返回的响应发送回 LLM 进行结果分析,并继续执行下一步操作。" #: ../../source/models/model_abilities/tools.rst:28 msgid "" "There is no dedicated API endpoint implemented for ``tools`` ability. It " "must be used in combination with Chat API." 
msgstr "目前没有为 ``tools`` 功能实现专用的 API 端点。它必须与 Chat API 结合使用。" #: ../../source/models/model_abilities/tools.rst:31 msgid "Supported models" msgstr "支持的模型列表" #: ../../source/models/model_abilities/tools.rst:33 msgid "" "The ``tools`` ability is supported with the following models in " "Xinference:" msgstr "Xinference 支持以下模型使用 ``tools`` 功能:" #: ../../source/models/model_abilities/tools.rst:35 msgid ":ref:`models_llm_glm4-chat`" msgstr "" #: ../../source/models/model_abilities/tools.rst:36 msgid ":ref:`models_llm_glm4-chat-1m`" msgstr "" #: ../../source/models/model_abilities/tools.rst:37 msgid ":ref:`models_llm_llama-3.1-instruct`" msgstr "" #: ../../source/models/model_abilities/tools.rst:38 msgid ":ref:`models_llm_llama-3.3-instruct`" msgstr "" #: ../../source/models/model_abilities/tools.rst:39 msgid ":ref:`models_llm_qwen1.5-chat`" msgstr "" #: ../../source/models/model_abilities/tools.rst:40 msgid ":ref:`models_llm_qwen1.5-moe-chat`" msgstr "" #: ../../source/models/model_abilities/tools.rst:41 msgid ":ref:`models_llm_qwen2-instruct`" msgstr "" #: ../../source/models/model_abilities/tools.rst:42 msgid ":ref:`models_llm_qwen2-moe-instruct`" msgstr "" #: ../../source/models/model_abilities/tools.rst:43 msgid ":ref:`models_llm_qwen2.5-instruct`" msgstr "" #: ../../source/models/model_abilities/tools.rst:44 msgid ":ref:`models_llm_qwen2.5-coder-instruct`" msgstr "" #: ../../source/models/model_abilities/tools.rst:45 msgid ":ref:`models_llm_qwq-32b`" msgstr "" #: ../../source/models/model_abilities/tools.rst:46 msgid ":ref:`models_llm_qwen3`" msgstr "" #: ../../source/models/model_abilities/tools.rst:47 msgid ":ref:`models_llm_qwen3-instruct`" msgstr "" #: ../../source/models/model_abilities/tools.rst:48 msgid ":ref:`models_llm_qwen3-coder`" msgstr "" #: ../../source/models/model_abilities/tools.rst:49 msgid ":ref:`models_llm_deepseek-v3`" msgstr "" #: ../../source/models/model_abilities/tools.rst:50 msgid ":ref:`models_llm_deepseek-r1-0528`" msgstr "" #: 
../../source/models/model_abilities/tools.rst:53 msgid "Quickstart" msgstr "快速入门" #: ../../source/models/model_abilities/tools.rst:55 msgid "" "An optional parameter ``tools`` in the Chat API can be used to provide " "function specifications. The purpose of this is to enable models to " "generate function arguments which adhere to the provided specifications." msgstr "Chat API 中的可选参数 ``tools`` 可以用于提供函数规范。其目的是使模型能够生成符合所提供规范的函数参数。" #: ../../source/models/model_abilities/tools.rst:59 msgid "Example using OpenAI Client" msgstr "使用 OpenAI 客户端的示例" #: ../../source/models/model_abilities/tools.rst:108 #: ../../source/models/model_abilities/tools.rst:171 msgid "The output will be:" msgstr "输出结果是:" #: ../../source/models/model_abilities/tools.rst:126 #, fuzzy msgid "Example using Anthropic Client" msgstr "使用 Anthropic 客户端的示例" #: ../../source/models/model_abilities/tools.rst:191 msgid "" "Finish reason will be ``tool_calls`` if the LLM uses a tool call. " "Othewise it will be the default finish reason." msgstr "如果 LLM 使用了工具调用,完成原因将是 ``tool_calls`` 。否则,它将是默认的完成原因。" #: ../../source/models/model_abilities/tools.rst:196 msgid "" "The API will not actually execute any function calls. It is up to " "developers to execute function calls using model outputs." msgstr "API 本身不会执行任何函数调用。开发者需要使用模型输出来执行函数调用。" #: ../../source/models/model_abilities/tools.rst:200 msgid "You can find more examples of ``tools`` ability in the tutorial notebook:" msgstr "你可以在教程笔记本中找到更多关于 ``tools`` 能力的示例。" #: ../../source/models/model_abilities/tools.rst:204 msgid "Function calling" msgstr "函数调用" #: ../../source/models/model_abilities/tools.rst:207 msgid "Learn from a complete example demonstrating function calling" msgstr "学习一个完整的示例,演示函数调用的过程。" #~ msgid ":ref:`models_llm_qwen-chat`" #~ msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/video.po ================================================ # SOME DESCRIPTIVE TITLE. 
# Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2025-06-01 16:29+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.17.0\n" #: ../../source/models/model_abilities/video.rst:5 msgid "Video (Experimental)" msgstr "视频(实验性质)" #: ../../source/models/model_abilities/video.rst:7 msgid "Learn how to generate videos with Xinference." msgstr "学习如何使用 Xinference 生成视频。" #: ../../source/models/model_abilities/video.rst:11 msgid "Introduction" msgstr "介绍" #: ../../source/models/model_abilities/video.rst:14 msgid "The Video API provides the ability to interact with videos:" msgstr "Video API 提供了和视频交互的方式:" #: ../../source/models/model_abilities/video.rst:17 msgid "" "The text-to-video endpoint create videos from scratch based on a text " "prompt." msgstr "Text-to-video 端点根据文本提示词从头开始创建视频。" #: ../../source/models/model_abilities/video.rst:18 msgid "" "The image-to-video endpoint create videos from scratch based on an input " "image." msgstr "Image-to-video 端点根据输入图片从头开始创建视频。" #: ../../source/models/model_abilities/video.rst:19 msgid "" "The firstlastframe-to-video endpoint creates videos based on the " "transition between a first and a last frame."
msgstr "firstlastframe-to-video 接口根据首帧和尾帧之间的过渡生成视频。" #: ../../source/models/model_abilities/video.rst:39 msgid "Supported models" msgstr "支持的模型列表" #: ../../source/models/model_abilities/video.rst:41 msgid "" "The text-to-video API is supported with the following models in " "Xinference:" msgstr "Text-to-video API 在 Xinference 中支持以下模型:" #: ../../source/models/model_abilities/video.rst:43 msgid ":ref:`CogVideoX-2b `" msgstr "" #: ../../source/models/model_abilities/video.rst:44 msgid ":ref:`CogVideoX-5b `" msgstr "" #: ../../source/models/model_abilities/video.rst:45 msgid ":ref:`HunyuanVideo `" msgstr "" #: ../../source/models/model_abilities/video.rst:46 msgid ":ref:`Wan2.1-1.3B `" msgstr "" #: ../../source/models/model_abilities/video.rst:47 msgid ":ref:`Wan2.1-14B `" msgstr "" #: ../../source/models/model_abilities/video.rst:49 msgid "" "The image-to-video API is supported with the following models in " "Xinference:" msgstr "Image-to-video API 在 Xinference 中支持以下模型:" #: ../../source/models/model_abilities/video.rst:51 msgid ":ref:`Wan2.1-i2v-14B-480p `" msgstr "" #: ../../source/models/model_abilities/video.rst:52 msgid ":ref:`Wan2.1-i2v-14B-720p `" msgstr "" #: ../../source/models/model_abilities/video.rst:54 msgid "" "The firstlastframe-to-video API is supported with the following models in" " Xinference:" msgstr "Xinference 中支持以下模型使用 firstlastframe-to-video 接口:" #: ../../source/models/model_abilities/video.rst:56 msgid ":ref:`Wan2.1-flf2v-14B-720p `" msgstr "" #: ../../source/models/model_abilities/video.rst:59 msgid "Quickstart" msgstr "快速入门" #: ../../source/models/model_abilities/video.rst:62 msgid "Text-to-video" msgstr "文生视频" #: ../../source/models/model_abilities/video.rst:64 msgid "" "You can try text-to-video API out either via cURL, or Xinference's python" " client:" msgstr "你可以通过 cURL 或 Xinference 的 Python 客户端来体验 text-to-video 接口:" #: ../../source/models/model_abilities/video.rst:91 msgid "Image-to-video" msgstr "图生视频" #:
../../source/models/model_abilities/video.rst:93 msgid "" "You can try image-to-video API out either via cURL, or Xinference's " "python client:" msgstr "你可以通过 cURL 或 Xinference 的 Python 客户端来体验 image-to-video 接口:" #: ../../source/models/model_abilities/video.rst:118 msgid "FirstLastFrame-to-video" msgstr "首尾帧生视频" #: ../../source/models/model_abilities/video.rst:120 msgid "" "You can try firstlastframe-to-video API out either via cURL, or " "Xinference's python client:" msgstr "你可以通过 cURL 或 Xinference 的 Python 客户端来体验 firstlastframe-to-video 接口:" #: ../../source/models/model_abilities/video.rst:147 msgid "Memory optimization" msgstr "内存优化" #: ../../source/models/model_abilities/video.rst:149 msgid "" "Video generation will occupy huge GPU memory, for instance, running " "CogVideoX may require up to around 35 GB GPU memory." msgstr "" "视频生成会占用大量显存,举例来说,运行 CogVideoX 可能会使用到约 35 GB 的" "显存。" #: ../../source/models/model_abilities/video.rst:152 msgid "" "Xinference supports several options to optimize video model memory (VRAM)" " usage." msgstr "Xinference 支持若干选项,来优化视频模型显存(VRAM)使用。" #: ../../source/models/model_abilities/video.rst:154 msgid "CPU offloading or block level group offloading." msgstr "CPU 卸载或块级分组卸载。" #: ../../source/models/model_abilities/video.rst:155 msgid "Layerwise casting." msgstr "逐层类型转换(Layerwise casting)。" #: ../../source/models/model_abilities/video.rst:159 msgid "" "CPU offloading and Block Level Group Offloading cannot be enabled at the " "same time, but layerwise casting can be used in combination with either " "of them." msgstr "CPU 卸载和块级分组卸载不能同时开启,但逐层类型转换可以与其中之一配合使用。" #: ../../source/models/model_abilities/video.rst:163 msgid "CPU offloading" msgstr "CPU 卸载" #: ../../source/models/model_abilities/video.rst:165 msgid "" "CPU offloading keeps the model weights on the CPU and only loads them to " "the GPU when a forward pass needs to be executed.
It is suitable for " "scenarios with extremely limited GPU memory, but it has a significant " "impact on performance." msgstr "" "CPU 卸载会将模型权重保留在 CPU 上,仅在执行前向传播时才加载到 GPU。适用于" "显存极其有限的场景,但对性能影响较大。" #: ../../source/models/model_abilities/video.rst:169 msgid "" "When running on GPU whose memory is less than 24 GB, we recommend to add " "``--cpu_offload True`` when launching model. For Web UI, add an extra " "option, ``cpu_offload`` with value set to ``True``." msgstr "" "当使用显存小于 24 GB 的 GPU 时,建议在启动模型时添加 ``--cpu_offload True" "``。对于 Web UI,可添加额外选项 ``cpu_offload``,值设为 ``True``。" #: ../../source/models/model_abilities/video.rst:178 msgid "Block Level Group Offloading" msgstr "块级分组卸载" #: ../../source/models/model_abilities/video.rst:180 msgid "" "Block Level Group Offloading groups multiple internal layers of the model" " (such as ``torch.nn.ModuleList`` or ``torch.nn.Sequential``) and loads " "these groups from the CPU to the GPU as needed during inference. Compared" " to CPU offloading, it uses more memory but has less impact on " "performance." msgstr "" "块级分组卸载将模型的多个内部层(如 ``torch.nn.ModuleList`` 或 ``torch.nn." "Sequential``)分组,并根据需要在推理过程中将这些分组从 CPU 加载到 GPU。与" " CPU 卸载相比,它使用更多的内存,但对性能的影响更小。" #: ../../source/models/model_abilities/video.rst:184 msgid "" "For the command line, add the ``--group_offload True`` option; for the " "Web UI, add an additional option ``group_offload`` with the value set to " "``True``." msgstr "" "对于命令行,添加 ``--group_offload True`` 选项;对于 Web UI,添加一个额外" "选项 ``group_offload``,值设为 ``True``。" #: ../../source/models/model_abilities/video.rst:187 msgid "" "We can speed up group offloading inference, by enabling the use of CUDA " "streams. However, using CUDA streams requires moving the model parameters" " into pinned memory. This allocation is handled by Pytorch under the " "hood, and can result in a significant spike in CPU RAM usage. Please " "consider this option if your CPU RAM is atleast 2X the size of the model " "you are group offloading. 
Enable CUDA streams via adding ``--use_stream " "True`` for command line; for the Web UI, add an additional option " "``use_stream`` with the value set to ``True``." msgstr "" "通过启用 CUDA 流,我们可以加速分组卸载推理。然而,使用 CUDA 流需要将模型" "参数移动到固定内存中。这项分配由 Pytorch 在后台处理,并可能导致 CPU RAM " "使用量显著增加。如果您的 CPU RAM 至少是模型大小的两倍,请考虑使用此选项。" "通过在命令行中添加 ``--use_stream True`` 启用 CUDA 流;对于 Web UI,添加" "一个额外选项 ``use_stream``,值设为 ``True``。" #: ../../source/models/model_abilities/video.rst:199 msgid "Applying Layerwise Casting to the Transformer" msgstr "将逐层类型转换应用于 Transformer" #: ../../source/models/model_abilities/video.rst:201 msgid "" "Layerwise casting will downcast each layer’s weights to ``torch.float8_" "e4m3fn``, temporarily upcast to ``torch.bfloat16`` during the forward " "pass of the layer, then revert to ``torch.float8_e4m3fn`` afterward. This" " approach reduces memory requirements by approximately 50% while " "introducing a minor quality reduction in the generated video due to the " "precision trade-off. Enable layerwise casting via adding ``--layerwise_" "cast True`` for command line; for the Web UI, add an additional option ``" "layerwise_cast`` with the value set to ``True``." msgstr "" "逐层类型转换将把每个层的权重降级为 ``torch.float8_e4m3fn``,在层的前向" "传播过程中暂时升级为 ``torch.bfloat16``,然后在之后恢复为 ``torch.float8_" "e4m3fn``。这种方法将内存需求减少约 50%,同时由于精度折衷,生成的视频质量" "会略有下降。通过在命令行中添加 ``--layerwise_cast True`` 来启用逐层" "类型转换;对于 Web UI,添加一个额外选项 ``layerwise_cast``,值设为 ``True" "``。" #: ../../source/models/model_abilities/video.rst:208 msgid "This example will require 20GB of VRAM." 
msgstr "此示例将需要 20GB 的显存。" #~ msgid "OpenAI-compatible ENDPOINT" #~ msgstr "OpenAI 兼容端点" #~ msgid "API" #~ msgstr "" #~ msgid "Endpoint" #~ msgstr "端点" #~ msgid "Text-to-Video API" #~ msgstr "文生视频 API" #~ msgid "/v1/video/generations" #~ msgstr "" #~ msgid "Image-to-Video API" #~ msgstr "图生视频 API" #~ msgid "/v1/video/generations/image" #~ msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/model_memory.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-06-06 17:26+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.15.0\n" #: ../../source/models/model_memory.rst:5 msgid "Model Memory Calculation" msgstr "模型显存使用量计算" #: ../../source/models/model_memory.rst:7 msgid "" "For better planning of VMEM usage, xinference provided tool for model " "memory calculation: ``cal-model-mem``" msgstr "为了更好规划显存使用, Xinference 提供了计算模型显存使用量的工具:``cal-model-mem``" #: ../../source/models/model_memory.rst:9 msgid "Use algorithm from https://github.com/RahulSChand/gpu_poor" msgstr "算法来自:https://github.com/RahulSChand/gpu_poor" #: ../../source/models/model_memory.rst:11 msgid "Output: model_mem, kv_cache, overhead, active_mem" msgstr "输出:model_mem, kv_cache, overhead, active_mem" #: ../../source/models/model_memory.rst:13 msgid "" "Example: To calculate memory usage for qwen1.5-chat, run the following " "command:" msgstr "示例:计算 qwen1.5-chat 模型的显存用量,可以运行以下示例指令:" #: ../../source/models/model_memory.rst:37 msgid "Syntax" msgstr 
"语法" #: ../../source/models/model_memory.rst:39 msgid "--size-in-billions {model_size}" msgstr "" #: ../../source/models/model_memory.rst:42 msgid "-s {model_size}" msgstr "" #: ../../source/models/model_memory.rst:45 msgid "" "Set the model size. Specify the model size in billions of parameters. " "Format accept 1_8 and 1.8. For example, 7 for 7.0B model size." msgstr "设置模型大小。以十亿个参数为单位指定模型大小。参数格式接受形式如 1_8 和 1.8. 例如,7 表示 7.0B 的模型大小。" #: ../../source/models/model_memory.rst:50 msgid "--quantization {precision}" msgstr "" #: ../../source/models/model_memory.rst:53 msgid "-q {precision} *(Optional)*" msgstr "-q {precision} *(可选)*" #: ../../source/models/model_memory.rst:56 msgid "" "Define the quantization settings for the model. For example, Int4 for " "INT4 quantization." msgstr "指定模型的量化配置。例如:Int4 参数表示使用 INT4 量化。" #: ../../source/models/model_memory.rst:60 msgid "--model-name {model_name}" msgstr "" #: ../../source/models/model_memory.rst:63 msgid "-n {model_name} *(Optional)*" msgstr "-n {model_name} *(可选)*" #: ../../source/models/model_memory.rst:66 msgid "" "Specify the model's name. If provided, fetch model config from " "huggingface/modelscope; If not specified, use default model layer to " "estimate." msgstr "" "指定模型名称。如果提供此参数,将从 huggingface/modelscope 中获取模型配置;如果没有指定,将使用默认的 layer " "参数粗略估计。" #: ../../source/models/model_memory.rst:70 msgid "--context-length {context_length}" msgstr "" #: ../../source/models/model_memory.rst:73 msgid "-c {context_length}" msgstr "" #: ../../source/models/model_memory.rst:76 msgid "" "Specify the maximum number of tokens(context length) that your model " "support." msgstr "指定模型的最大上下文长度。" #: ../../source/models/model_memory.rst:79 msgid "--model-format {format}" msgstr "" #: ../../source/models/model_memory.rst:82 msgid "-f {format}" msgstr "" #: ../../source/models/model_memory.rst:85 msgid "Specify the format of the model, e.g. pytorch, ggmlv3, etc." msgstr "指定模型的格式,例如:pytorch, ggmlv3, etc." 
#: ../../source/models/model_memory.rst:89 msgid "" "The environment variable ``HF_ENDPOINT`` could set the endpoint of " "HuggingFace. e.g. hf-mirror, etc. Please refer to :ref:`this document " "`" msgstr "" "利用环境变量 ``HF_ENDPOINT`` 可设置 HuggingFace 服务器的 Endpoint。例如,当网络不佳时可以选择 hf-" "mirror 作为 Endpoint. 更多请参考 :ref:`此文档 `" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/model_update.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2025, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2025. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2025-11-14 14:48+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.17.0\n" #: ../../source/models/model_update.rst:5 msgid "Model Update" msgstr "模型更新" #: ../../source/models/model_update.rst:8 msgid "" "This section briefly introduces a common operation on the \"Launch " "Model\" page: updating the model list. It corresponds to the \"Type " "Selection + Update\" button at the top of the page, which is used to " "quickly refresh models of a specific type." msgstr "" "本节简要介绍“启动模型”页面上的一项常见操作:更新模型列表。它对应页面" "顶部的“类型选择 + 更新”按钮,用于快速刷新某一类型的模型。" #: ../../source/models/model_update.rst:10 msgid "" "Model update rely on the online model list service provided by " ":ref:`xinference_models_hub` ." msgstr "模型更新依赖 :ref:`xinference_models_hub` 提供的在线模型列表服务。" #: ../../source/models/model_update.rst:17 msgid "Update Models" msgstr "更新模型" #: ../../source/models/model_update.rst:19 msgid "" "Operation Location: \"Type Selection\" dropdown and \"Update\" button at " "the top right of the page." 
msgstr "操作位置:页面右上角的“类型选择”下拉框和“更新”按钮。" #: ../../source/models/model_update.rst:20 msgid "Usage:" msgstr "使用方法:" #: ../../source/models/model_update.rst:21 msgid "" "Select a model type from the dropdown (such as llm, embedding, rerank, " "image, audio, video)." msgstr "" "从下拉菜单中选择模型类型(例如 llm、embedding、rerank、image、audio、" "video)。" #: ../../source/models/model_update.rst:22 msgid "" "Click the \"Update\" button, the page will send an update request to the " "backend, then automatically jump to the corresponding Tab and refresh the" " model list of that type." msgstr "" "点击“更新”按钮后,页面会向后端发送更新请求,并自动跳转到对应的选项卡," "刷新该类型的模型列表。" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/source/source.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2023-10-20 15:15+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.11.0\n" #: ../../source/models/source/source.rst:5 msgid "Download Source" msgstr "" #: ../../source/models/source/source.rst:7 msgid "Xinference supports downloading various models from different sources." msgstr "" #: ../../source/models/source/source.rst:10 msgid "HuggingFace" msgstr "" #: ../../source/models/source/source.rst:11 msgid "" "Xinference directly downloads the required models from the official " "`Hugging Face model repository `_ by " "default." 
msgstr "" #: ../../source/models/source/source.rst:14 msgid "ModelScope" msgstr "" #: ../../source/models/source/source.rst:15 msgid "" "Users can choose to download models from the `ModelScope model repository" " `_." msgstr "" #: ../../source/models/source/source.rst:17 msgid "Xinference supports downloading the following models from ModelScope:" msgstr "" #: ../../source/models/source/source.rst:30 msgid "LLM Models" msgstr "" #: ../../source/models/source/source.rst:20 msgid "llama-2-chat" msgstr "" #: ../../source/models/source/source.rst:21 msgid "tiny-llama" msgstr "" #: ../../source/models/source/source.rst:22 msgid "baichuan-2-chat" msgstr "" #: ../../source/models/source/source.rst:23 msgid "baichuan-2" msgstr "" #: ../../source/models/source/source.rst:24 msgid "chatglm2" msgstr "" #: ../../source/models/source/source.rst:25 msgid "chatglm2-32k" msgstr "" #: ../../source/models/source/source.rst:26 msgid "internlm-7b" msgstr "" #: ../../source/models/source/source.rst:27 msgid "internlm-chat-7b" msgstr "" #: ../../source/models/source/source.rst:28 msgid "internlm-20b" msgstr "" #: ../../source/models/source/source.rst:29 msgid "internlm-chat-20b" msgstr "" #: ../../source/models/source/source.rst:30 msgid "wizardcoder-python-v1.0" msgstr "" #: ../../source/models/source/source.rst:49 msgid "Embedding Models" msgstr "" #: ../../source/models/source/source.rst:33 msgid "bge-large-en" msgstr "" #: ../../source/models/source/source.rst:34 msgid "bge-base-en" msgstr "" #: ../../source/models/source/source.rst:35 msgid "gte-large" msgstr "" #: ../../source/models/source/source.rst:36 msgid "gte-base" msgstr "" #: ../../source/models/source/source.rst:37 msgid "e5-large-v2" msgstr "" #: ../../source/models/source/source.rst:38 msgid "bge-large-zh" msgstr "" #: ../../source/models/source/source.rst:39 msgid "bge-large-zh-noinstruct" msgstr "" #: ../../source/models/source/source.rst:40 msgid "bge-base-zh" msgstr "" #: ../../source/models/source/source.rst:41 
msgid "multilingual-e5-large" msgstr "" #: ../../source/models/source/source.rst:42 msgid "bge-small-zh" msgstr "" #: ../../source/models/source/source.rst:43 msgid "bge-small-zh-v1.5" msgstr "" #: ../../source/models/source/source.rst:44 msgid "bge-base-zh-v1.5" msgstr "" #: ../../source/models/source/source.rst:45 msgid "bge-large-zh-v1.5" msgstr "" #: ../../source/models/source/source.rst:46 msgid "bge-small-en-v1.5" msgstr "" #: ../../source/models/source/source.rst:47 msgid "bge-base-en-v1.5" msgstr "" #: ../../source/models/source/source.rst:48 msgid "bge-large-en-v1.5" msgstr "" #: ../../source/models/source/source.rst:51 msgid "" "One of the following settings will make Xinference download models from " "ModelScope:" msgstr "" #: ../../source/models/source/source.rst:53 msgid "The operating system's language is set to Simplified Chinese (zh_CN)." msgstr "" #: ../../source/models/source/source.rst:54 msgid "Set the environment variable ``XINFERENCE_MODEL_SRC=modelscope``." msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/sources/sources.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2024-01-02 16:27+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.13.1\n" #: ../../source/models/sources/sources.rst:5 msgid "Download Sources" msgstr "模型来源" #: ../../source/models/sources/sources.rst:7 msgid "Xinference supports downloading various models from different sources." 
msgstr "Xinference 支持从不同的来源下载各种模型。" #: ../../source/models/sources/sources.rst:10 msgid "HuggingFace" msgstr "" #: ../../source/models/sources/sources.rst:11 msgid "" "Xinference directly downloads the required models from the official " "`Hugging Face model repository `_ by " "default." msgstr "" "Xinference 默认直接从 `Hugging Face 官方模型仓库 `_ " "下载所需的模型。" #: ../../source/models/sources/sources.rst:14 msgid "" "If you have trouble connecting to Huggingface, you can use a mirror " "website to download with setting the environment variable " "``HF_ENDPOINT=https://hf-mirror.com``." msgstr "" "如果你的网络无法连接到 HuggingFace ,你可以通过环境变量指定 HuggingFace 镜像网站:``HF_ENDPOINT=https" "://hf-mirror.com`` 。" #: ../../source/models/sources/sources.rst:18 msgid "ModelScope" msgstr "" #: ../../source/models/sources/sources.rst:20 msgid "" "When Xinference detects that the system's language is set to Simplified " "Chinese, it will automatically set the model download source to " "`ModelScope `_." msgstr "当 Xinference 检测到系统语言设置为“简体中文”时,会将模型下载源设置为 `ModelScope `_。" #: ../../source/models/sources/sources.rst:23 msgid "" "You can also achieve this by manually setting an environment variable " "``XINFERENCE_MODEL_SRC=modelscope``." msgstr "你也可以通过手动设置环境变量 ``XINFERENCE_MODEL_SRC=modelscope`` 来实现这一点。" #: ../../source/models/sources/sources.rst:25 msgid "" "Please check the detail page of a model to confirm whether the model " "supports downloading from ModelScope. If a model spec supports " "downloading from ModelScope, the \"Model Hubs\" section in the spec " "information will include \"ModelScope\"." msgstr "请在模型的详情页面上查看它是否支持从 ModelScope 进行下载。如果一个模型支持从 ModelScope 下载,模型信息中的 Model Hubs 这一项会包含 ModelScope。" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. 
# FIRST AUTHOR , 2025. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2026-01-30 19:14+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.17.0\n" #: ../../source/models/virtualenv.rst:6 msgid "Model Virtual Environments" msgstr "模型虚拟环境" #: ../../source/models/virtualenv.rst:11 msgid "Background" msgstr "背景" #: ../../source/models/virtualenv.rst:13 msgid "" "Some models are no longer maintained after their release, and the " "versions of the libraries they depend on remain outdated. For example, " "the ``GOT-OCR2`` model still relies on ``transformers`` version 4.37.2. " "If this library is updated to a newer version, the model can no longer " "function properly. On the other hand, many newer models require the " "latest version of ``transformers``. This version mismatch leads to " "dependency conflicts." msgstr "" "一些模型在发布后不再维护,其依赖的库版本也保持在较旧的状态。例如,``GOT-OCR2`` 模型仍依赖于 ``transformers`` " "4.37.2。如果将该库升级为新版本,模型将无法正常运行;而许多新模型又需要最新版本的 " "``transformers``。这种版本差异会导致依赖冲突。" #: ../../source/models/virtualenv.rst:19 msgid "Solution" msgstr "解决方案" #: ../../source/models/virtualenv.rst:21 msgid "" "To address this issue, we have introduced the **Model Virtual " "Environment** feature." msgstr "为了解决这个问题,我们引入了 **模型虚拟环境** 功能。" #: ../../source/models/virtualenv.rst:23 msgid "Install requirements for this functionality via" msgstr "通过以下命令安装该功能所需的依赖" #: ../../source/models/virtualenv.rst:32 msgid "" "Enable by setting environment variable " "``XINFERENCE_ENABLE_VIRTUAL_ENV=1``." 
msgstr "通过设置环境变量 ``XINFERENCE_ENABLE_VIRTUAL_ENV=1`` 启用该功能。" #: ../../source/models/virtualenv.rst:34 ../../source/models/virtualenv.rst:217 #: ../../source/models/virtualenv.rst:233 msgid "Example usage:" msgstr "使用示例:" #: ../../source/models/virtualenv.rst:46 msgid "This feature requires internet access or a self-hosted PyPI mirror." msgstr "该功能需要联网,或使用自建的 PyPI 镜像服务。" #: ../../source/models/virtualenv.rst:48 msgid "Xinference will by default inherit the config for current pip." msgstr "Xinference 默认会继承当前 pip 的配置。" #: ../../source/models/virtualenv.rst:52 msgid "" "Note: When launching a vLLM/SgLang engine model inside a virtual " "environment, if you encounter a cuDNN error, you can set:" msgstr "注意:在虚拟环境中启动vLLM/SgLang引擎模型时,若遇到cuDNN错误,可设置:" #: ../../source/models/virtualenv.rst:63 msgid "" "Starting from **Xinference v2.0**, the model virtual environment feature " "is enabled by default (i.e., ``XINFERENCE_ENABLE_VIRTUAL_ENV`` defaults " "to ``1``)." msgstr "" "从 **Xinference v2.0** 开始,模型虚拟环境功能默认启用(即 ``XINFERENCE_ENABLE_VIRTUAL_ENV``" " 默认值为 ``1`` )。" #: ../../source/models/virtualenv.rst:66 msgid "" "To disable it globally, set ``XINFERENCE_ENABLE_VIRTUAL_ENV=0`` when " "starting Xinference." msgstr "要全局禁用该功能,请在启动Xinference时设置 ``XINFERENCE_ENABLE_VIRTUAL_ENV=0`` 。" #: ../../source/models/virtualenv.rst:68 msgid "" "When enabled, Xinference will automatically create a dedicated virtual " "environment for each model when it is loaded, and install its specific " "dependencies there. This prevents dependency conflicts between models, " "allowing them to run in isolation without affecting one another." 
msgstr "" "启用该功能后,Xinference " "会在加载模型时自动为其创建专属的虚拟环境,并在其中安装对应依赖。这可避免模型之间的依赖冲突,确保各模型在相互隔离的环境中独立运行。" #: ../../source/models/virtualenv.rst:73 msgid "Using Virtual Environments (v2.0)" msgstr "使用虚拟环境(v2.0)" #: ../../source/models/virtualenv.rst:76 msgid "Global toggle" msgstr "全局开关" #: ../../source/models/virtualenv.rst:78 msgid "" "Virtual environments are enabled by default starting from v2.0. You can " "still override this globally:" msgstr "从 v2.0 版本开始,虚拟环境默认处于启用状态。您仍可通过全局设置覆盖此选项:" #: ../../source/models/virtualenv.rst:89 msgid "Per-model override at launch time" msgstr "启动时按模型覆盖" #: ../../source/models/virtualenv.rst:91 msgid "You can override the global setting when launching a model:" msgstr "在启动模型时,您可以覆盖全局设置:" #: ../../source/models/virtualenv.rst:102 msgid "Add or override packages at launch time" msgstr "在启动时添加或覆盖包" #: ../../source/models/virtualenv.rst:104 msgid "Use ``--virtual-env-package`` (or ``-vp``) multiple times:" msgstr "可多次使用 ``--virtual-env-package`` (或 ``-vp`` ):" #: ../../source/models/virtualenv.rst:112 msgid "" "If you specify a package that already exists in the model's default " "virtualenv package list, your version replaces the default instead of " "being appended."
msgstr "若指定的软件包已在模型的默认虚拟环境软件包列表中存在,则您指定的版本将覆盖默认版本,而非追加至列表中。" #: ../../source/models/virtualenv.rst:117 msgid "Storage Location" msgstr "存储位置" #: ../../source/models/virtualenv.rst:119 msgid "By default, the model’s virtual environment is stored under path:" msgstr "默认情况下,模型的虚拟环境存储在以下路径" #: ../../source/models/virtualenv.rst:121 #, python-brace-format msgid "" "Before v1.6.0: :ref:`XINFERENCE_HOME ` / " "virtualenv / {model_name}" msgstr "" "在 v1.6.0 之前::ref:`XINFERENCE_HOME ` / " "virtualenv / {model_name}" #: ../../source/models/virtualenv.rst:122 #, python-brace-format msgid "" "From v1.6.0 to v1.13.0: :ref:`XINFERENCE_HOME " "` / virtualenv / v2 / {model_name}" msgstr "" "从 v1.6.0 到 v1.13.0::ref:`XINFERENCE_HOME ` " "/ virtualenv / v2 / {model_name}" #: ../../source/models/virtualenv.rst:123 #, python-brace-format msgid "" "Since v1.14.0: :ref:`XINFERENCE_HOME ` / " "virtualenv / v3 / {model_name} / {python_version}" msgstr "" "从 v1.14.0 开始::ref:`XINFERENCE_HOME ` / " "virtualenv / v3 / {model_name} / {python_version}" #: ../../source/models/virtualenv.rst:124 #, python-brace-format msgid "" "Since v2.0: :ref:`XINFERENCE_HOME ` / " "virtualenv / v4 / {model_name} / {model_engine} / {python_version}" msgstr "" "自 v2.0 起::ref:`XINFERENCE_HOME ` / " "virtualenv / v4 / {model_name} / {model_engine} / {python_version}" #: ../../source/models/virtualenv.rst:127 msgid "Skip Installed Libraries" msgstr "跳过已安装的库" #: ../../source/models/virtualenv.rst:133 msgid "" "This feature requires ``xoscar >= 0.7.12``, which is the minimum Xoscar " "version required for Xinference v1.8.1." msgstr "此功能要求 ``xoscar >= 0.7.12``,这是 Xinference v1.8.1 需要的最低 Xoscar 版本。" #: ../../source/models/virtualenv.rst:135 msgid "" "``xinference`` uses the ``uv`` tool to create virtual environments, with " "the current Python **system site-packages** set as the base environment. 
" "By default, ``uv`` **does not check for existing packages in the system " "environment** and reinstalls all dependencies in the virtual environment." " This ensures better isolation from system packages but can result in " "redundant installations, longer setup times, and increased disk usage." msgstr "" "``xinference`` 使用 ``uv`` 工具创建虚拟环境,并将当前 Python 的 **system site-packages** " "设置为基础环境。默认情况下,``uv`` " "**不会检查系统环境中是否已有包**,而是会在虚拟环境中重新安装所有依赖。这种方式可以更好地与系统包隔离,但可能导致重复安装、初始化时间变长以及磁盘占用增加。" #: ../../source/models/virtualenv.rst:139 msgid "" "Starting from ``v1.8.1``, an **experimental feature** is available: by " "setting the environment variable " "``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=1``, ``uv`` will **skip packages " "already available in system site-packages**." msgstr "" "从 ``v1.8.1`` 开始,提供了一个 **实验功能**:通过设置环境变量 " "``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=1``,``uv`` 将会 **跳过系统 site-" "packages 中已存在的包**。" #: ../../source/models/virtualenv.rst:144 msgid "" "This feature is enabled by default in ``v2.0``. To disable it, set " "``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=0``." msgstr "" "此功能在 ``v2.0`` 版本中默认启用。若需禁用,请设置 " "``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=0`` 。" #: ../../source/models/virtualenv.rst:147 msgid "Advantages" msgstr "优势" #: ../../source/models/virtualenv.rst:149 msgid "" "Avoid redundant installations of large dependencies (e.g., ``torch`` + " "``CUDA``)." msgstr "避免重复安装大型依赖(例如 ``torch`` + ``CUDA`` )。" #: ../../source/models/virtualenv.rst:150 msgid "Speed up virtual environment creation." msgstr "加快虚拟环境创建速度。" #: ../../source/models/virtualenv.rst:151 msgid "Reduce disk usage." 
msgstr "减少磁盘空间占用。" #: ../../source/models/virtualenv.rst:154 msgid "Usage" msgstr "使用" #: ../../source/models/virtualenv.rst:166 msgid "Performance Comparison" msgstr "性能对比" #: ../../source/models/virtualenv.rst:168 msgid "Using the ``CosyVoice 0.5B`` model as an example:" msgstr "以 ``CosyVoice 0.5B`` 模型为例:" #: ../../source/models/virtualenv.rst:170 msgid "**Without this feature enabled**::" msgstr "**未开启该功能时**::" #: ../../source/models/virtualenv.rst:181 msgid "**With this feature enabled**::" msgstr "**开启该功能后**::" #: ../../source/models/virtualenv.rst:196 msgid "Model Launching: Toggle Virtual Environments and Customize Dependencies" msgstr "模型加载:开关虚拟环境并自定义依赖" #: ../../source/models/virtualenv.rst:200 msgid "" "Starting from v1.8.1, we support toggling the virtual environment for " "individual model launching, as well as overriding the model's default " "settings with custom package dependencies." msgstr "从 v1.8.1 开始,我们支持对单个模型加载开关虚拟环境,并用自定义包依赖覆盖模型的默认设置。" #: ../../source/models/virtualenv.rst:204 msgid "Toggle Virtual Environment" msgstr "开关模型虚拟环境" #: ../../source/models/virtualenv.rst:206 msgid "" "When loading a model, you can specify whether to enable the model's " "virtual environment. If not specified, the setting will follow the " "environment variable configuration." msgstr "加载模型时,可以指定是否启用模型的虚拟环境。如果未指定,则默认遵循环境变量的配置。" #: ../../source/models/virtualenv.rst:209 msgid "" "For the Web UI, this can be toggled on or off through the optional " "settings switch." msgstr "在 Web UI 中,可以通过可选设置开关打开或关闭该功能。" #: ../../source/models/virtualenv.rst:215 msgid "" "For command-line loading, use the ``--enable-virtual-env`` option to " "enable the virtual environment, or ``--disable-virtual-env`` to disable " "it."
msgstr "" "命令行加载时,使用 ``--enable-virtual-env`` 选项启用虚拟环境,使用 ``--disable-virtual-env`` " "选项禁用虚拟环境。" #: ../../source/models/virtualenv.rst:224 msgid "Set Virtual Environment Package Dependencies" msgstr "设置虚拟环境包依赖" #: ../../source/models/virtualenv.rst:226 msgid "" "For supported models, Xinference has already defined the package " "dependencies and version requirements within the virtual environment. " "However, if you need to specify particular versions or install additional" " dependencies, you can manually provide them during model loading." msgstr "对于支持的模型,Xinference 已经在虚拟环境中定义了包依赖和版本要求。但如果需要指定特定版本或安装额外依赖,可以在加载模型时手动提供。" #: ../../source/models/virtualenv.rst:229 msgid "" "In the Web UI, you can add custom dependencies by clicking the plus icon " "in the same location as the virtual environment toggle." msgstr "在 Web UI 中,可以在虚拟环境开关同一位置点击加号图标来添加自定义依赖。" #: ../../source/models/virtualenv.rst:231 msgid "" "For the command line, use ``--virtual-env-package`` or ``-vp`` to specify" " a single package version." msgstr "命令行中,使用 ``--virtual-env-package`` 或 ``-vp`` 来指定单个包版本。" #: ../../source/models/virtualenv.rst:239 msgid "" "In addition to the standard way of specifying package dependencies, such " "as ``transformers==xxx``, Xinference also supports some extended syntax." msgstr "除了常规的包依赖指定方式(如 ``transformers==xxx``),Xinference 还支持一些扩展语法。" #: ../../source/models/virtualenv.rst:241 msgid "" "``#system_xxx#``: Using the same version as the system site packages, " "such as ``#system_numpy#``, ensures that the installed package matches " "the system site package version of numpy. This helps prevent dependency " "conflicts." 
msgstr "" "``#system_xxx#``:使用与系统 site packages 相同的版本,例如 " "``#system_numpy#``,确保安装的包版本与系统 site packages 中的 numpy 版本一致,防止依赖冲突。" #: ../../source/models/virtualenv.rst:248 msgid "Manage Virtual Enviroments" msgstr "虚拟环境管理" #: ../../source/models/virtualenv.rst:252 msgid "" "Xinference provides comprehensive virtual environment management for " "model dependencies, allowing you to create isolated Python environments " "for each model with specific package requirements." msgstr "Xinference 提供全面的虚拟环境管理功能,允许您为每个模型创建独立的 Python 环境,满足特定的包依赖需求。" #: ../../source/models/virtualenv.rst:264 msgid "Key Features" msgstr "核心功能" #: ../../source/models/virtualenv.rst:266 msgid "" "**Multiple Python Version Support**: Each model can have virtual " "environments with different Python versions (e.g., Python 3.10.18, " "3.11.5), enabling compatibility with various model requirements." msgstr "" "**多 Python 版本支持** : 每个模型可以拥有不同 Python 版本的虚拟环境(例如 Python " "3.10.18、3.11.5),实现与各种模型要求的兼容性。" #: ../../source/models/virtualenv.rst:271 msgid "" "**Isolated Dependencies**: Each virtual environment contains its own set " "of packages, preventing conflicts between different models' requirements." msgstr "**依赖隔离** : 每个虚拟环境包含自己独立的包集合,防止不同模型之间的依赖冲突。" #: ../../source/models/virtualenv.rst:276 msgid "Management Operations" msgstr "管理操作" #: ../../source/models/virtualenv.rst:278 msgid "" "**Listing Virtual Environments**: View all virtual environments across " "your cluster, filtered by model name or worker IP address." msgstr "**列出虚拟环境** : 查看集群中的所有虚拟环境,支持按模型名称或工作节点 IP 地址过滤。" #: ../../source/models/virtualenv.rst:282 msgid "" "**Creating Environments**: Automatically created when launching models " "with enable_virtual_env=true. The system detects your current Python " "version and creates an isolated environment with the required packages." 
msgstr "" "**创建环境** : 当使用 enable_virtual_env=true 启动模型时自动创建。系统会检测当前的 Python " "版本并创建包含所需包的独立环境。" #: ../../source/models/virtualenv.rst:287 msgid "" "**Removing Environments**: Delete specific virtual environments by model " "name and optionally Python version, or remove all environments for a " "model." msgstr "**删除环境** : 可按模型名称和可选的 Python 版本删除特定虚拟环境,或删除模型的所有环境。" #: ../../source/models/virtualenv.rst:292 msgid "ModelHub JSON for Xinference Models" msgstr "ModelHub JSON 格式(适用于 Xinference 模型)" #: ../../source/models/virtualenv.rst:294 msgid "" "If you plan to add a model to a model hub for Xinference, define a " "``virtualenv`` block in the model JSON. Starting from v2.0 (v4 flow), " "**engine-aware markers are recommended** so one JSON can cover multiple " "engines." msgstr "" "若计划将模型添加至 Xinference 的 Model Hub,请在模型 JSON 中定义一个 ``virtualenv`` 块。自 v2.0(v4 流程)起, " "**建议使用引擎感知标记** ,以便单个 JSON 文件覆盖多个引擎。" #: ../../source/models/virtualenv.rst:298 msgid "" "Important rule: If a new model supports a specific engine, you **must** " "include at least one package entry for that engine in " "``virtualenv.packages`` and attach a marker, for example ``#engine# == " "\"vllm\"``. Engine availability checks rely on these markers when virtual" " environments are enabled." msgstr "" "重要规则:若新模型支持特定引擎,则 **必须** 在 ``virtualenv.packages`` " "中至少包含该引擎的一个包条目,并附加标记(例如 ``#engine# == \"vllm\"`` " ")。当虚拟环境启用时,引擎可用性检查依赖这些标记进行验证。" #: ../../source/models/virtualenv.rst:325 msgid "``packages`` (required): list of pip requirement strings or markers." msgstr " ``packages`` (必填):pip 要求字符串或标记的列表。" #: ../../source/models/virtualenv.rst:326 msgid "" "``inherit_pip_config`` (default ``true``): inherit system pip " "configuration if present." msgstr " ``inherit_pip_config`` (默认值为 ``true`` ):若存在系统 pip 配置文件,则继承其设置。" #: ../../source/models/virtualenv.rst:327 msgid "" "``index_url`` / ``extra_index_url`` / ``find_links`` / ``trusted_host``: " "pip index and mirror controls."
msgstr "" " ``index_url`` / ``extra_index_url`` / ``find_links`` / ``trusted_host`` " ": pip 索引和镜像控制。" #: ../../source/models/virtualenv.rst:329 msgid "" "``index_strategy``: passed through to the virtualenv installer (used by " "some engines)." msgstr " ``index_strategy`` :传递给虚拟环境安装程序(由某些引擎使用)。" #: ../../source/models/virtualenv.rst:330 msgid "``no_build_isolation``: pip build isolation switch for tricky builds." msgstr " ``no_build_isolation`` :用于处理复杂构建的pip构建隔离开关。" #: ../../source/models/virtualenv.rst:335 msgid "Use wrapped placeholders to inject engine defaults:" msgstr "使用包裹的占位符注入引擎默认值:" #: ../../source/models/virtualenv.rst:337 msgid "``#vllm_dependencies#``" msgstr "" #: ../../source/models/virtualenv.rst:338 msgid "``#sglang_dependencies#``" msgstr "" #: ../../source/models/virtualenv.rst:339 msgid "``#mlx_dependencies#``" msgstr "" #: ../../source/models/virtualenv.rst:340 msgid "``#transformers_dependencies#``" msgstr "" #: ../../source/models/virtualenv.rst:341 msgid "``#llama_cpp_dependencies#``" msgstr "" #: ../../source/models/virtualenv.rst:342 msgid "``#diffusers_dependencies#``" msgstr "" #: ../../source/models/virtualenv.rst:343 msgid "``#sentence_transformers_dependencies#``" msgstr "" #: ../../source/models/virtualenv.rst:348 msgid "" "Markers use ``#engine#`` or ``#model_engine#`` comparisons (case-" "sensitive). Engine values are passed in lowercase internally, so prefer " "lowercase values, for example ``#engine# == \"vllm\"`` or ``#engine# == " "\"transformers\"``." msgstr "" "标记使用 ``#engine#`` 或 ``#model_engine#`` " "进行比较(区分大小写)。引擎值在内部以小写形式传递,因此建议使用小写值,例如 ``#engine# == \"vllm\"`` 或 " "``#engine# == \"transformers\"`` 。" #~ msgid "Manage Virtual Enviroments" #~ msgstr "虚拟环境管理" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/xinference_model_hub.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2025, Xorbits Inc. 
# This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2025. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2025-11-13 17:28+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.17.0\n" #: ../../source/models/xinference_model_hub.rst:5 msgid "Xinference Models Hub User Guide" msgstr "Xinference 模型中心用户指南" #: ../../source/models/xinference_model_hub.rst:9 msgid "Overview" msgstr "概述" #: ../../source/models/xinference_model_hub.rst:11 msgid "" "`Xinference Models Hub `_ is a full-stack " "platform for managing and sharing models, providing a complete solution " "for model registration, browsing, review workflows, and collaborative " "model management." msgstr "" "`Xinference 模型中心 `_ 是一个用于管理和共享模型的全栈平台," "提供模型注册、浏览、评审流程及协作管理的完整解决方案。" #: ../../source/models/xinference_model_hub.rst:14 msgid "User Guide for Regular Users" msgstr "普通用户指南" #: ../../source/models/xinference_model_hub.rst:16 msgid "" "This section introduces the main features available to regular registered" " users, including model browsing and personal center management." msgstr "本节介绍普通注册用户可使用的主要功能,包括模型浏览和个人中心管理。" #: ../../source/models/xinference_model_hub.rst:18 msgid "" "**Audience:** Regular registered users without model maintenance or " "public model registration permissions." 
msgstr "**适用对象:** 无模型维护或公共模型注册权限的普通注册用户。" #: ../../source/models/xinference_model_hub.rst:21 #: ../../source/models/xinference_model_hub.rst:60 msgid "Core Features" msgstr "核心功能" #: ../../source/models/xinference_model_hub.rst:24 msgid "Browse Models" msgstr "浏览模型" #: ../../source/models/xinference_model_hub.rst:26 msgid "**Function:** View available models and click to see details" msgstr "**功能:** 查看可用模型并点击查看详细信息" #: ../../source/models/xinference_model_hub.rst:27 msgid "**Location:** Navigation bar → “Models”" msgstr "**位置:** 导航栏 → “模型”" #: ../../source/models/xinference_model_hub.rst:28 msgid "" "**Model Details Page:** The default tab is “README,” where you can view" " model descriptions, usage instructions, and important notes" msgstr "**模型详情页:** 默认展示 “README” 标签页,可查看模型描述、使用说明及重要注意事项" #: ../../source/models/xinference_model_hub.rst:31 msgid "Some advanced models are only visible to authorized users." msgstr "部分高级模型仅对授权用户可见。" #: ../../source/models/xinference_model_hub.rst:34 msgid "User Center" msgstr "用户中心" #: ../../source/models/xinference_model_hub.rst:36 msgid "**Function:** View and manage personal information" msgstr "**功能:** 查看与管理个人信息" #: ../../source/models/xinference_model_hub.rst:37 msgid "**Location:** Click the avatar in the top-right corner → “User Center”" msgstr "**位置:** 点击右上角头像 → “用户中心”" #: ../../source/models/xinference_model_hub.rst:38 msgid "**Account Management:** Update personal profile, email, and other details" msgstr "**账户管理:** 更新个人资料、邮箱及其他信息" #: ../../source/models/xinference_model_hub.rst:39 msgid "" "**Token Management:** Configure a GitHub Token used for model submissions" " or updates" msgstr "**Token 管理:** 配置用于提交或更新模型的 GitHub Token" #: ../../source/models/xinference_model_hub.rst:42 msgid "" "If no token is configured, the system will use a default token to create " "pull requests (PRs)." 
msgstr "若未配置 Token,系统将使用默认 Token 创建 Pull Request(PR)。" #: ../../source/models/xinference_model_hub.rst:45 msgid "Workflow" msgstr "使用流程" #: ../../source/models/xinference_model_hub.rst:47 msgid "**Register an account** → Log in → Browse models" msgstr "**注册账户** → 登录 → 浏览模型" #: ../../source/models/xinference_model_hub.rst:48 msgid "" "**Reset password:** Click “Forgot Password” on the login page and " "follow the email instructions" msgstr "**重置密码:** 在登录页点击 “忘记密码”,并根据邮件提示操作" #: ../../source/models/xinference_model_hub.rst:49 msgid "**Logout:** Click the avatar in the top-right corner → “Logout”" msgstr "**退出登录:** 点击右上角头像 → “退出登录”" #: ../../source/models/xinference_model_hub.rst:52 msgid "Guide for Model Maintainers" msgstr "模型维护者指南" #: ../../source/models/xinference_model_hub.rst:54 msgid "" "This section is for users with model registration or maintenance " "permissions. It introduces model registration, maintenance, and the " "review workflow." msgstr "本节面向拥有模型注册或维护权限的用户,介绍模型注册、维护及审核流程。" #: ../../source/models/xinference_model_hub.rst:56 msgid "" "**Audience:** Users with model registration or maintenance permissions. " "If you wish to become a model maintainer, you can `contact us " "`_." msgstr "" "**适用对象:** 拥有模型注册或维护权限的用户。若希望成为模型维护者,可 " "`联系我们 `_。" #: ../../source/models/xinference_model_hub.rst:62 msgid "" "Includes all features available to regular users, plus the following " "advanced functions." 
msgstr "包含普通用户的全部功能,并额外提供以下高级功能。" #: ../../source/models/xinference_model_hub.rst:65 msgid "Model Registration" msgstr "模型注册" #: ../../source/models/xinference_model_hub.rst:67 msgid "**Function:** Submit new models" msgstr "**功能:** 提交新模型" #: ../../source/models/xinference_model_hub.rst:68 msgid "" "**Location:** Click the avatar in the top-right corner → “Model " "Registration”" msgstr "**位置:** 点击右上角头像 → “模型注册”" #: ../../source/models/xinference_model_hub.rst:70 #: ../../source/models/xinference_model_hub.rst:112 msgid "**Operation Steps:**" msgstr "**操作步骤:**" #: ../../source/models/xinference_model_hub.rst:72 msgid "Fill in basic model information" msgstr "填写模型基础信息" #: ../../source/models/xinference_model_hub.rst:73 msgid "Complete the README (click “Get README” to auto-generate)" msgstr "完善 README(可点击 “获取 README” 自动生成)" #: ../../source/models/xinference_model_hub.rst:74 msgid "Submit (for public models, enable the “Public Model” parameter)" msgstr "提交(若为公共模型,请启用 “公共模型” 参数)" #: ../../source/models/xinference_model_hub.rst:77 msgid "" "When registering a public model, the system will automatically create a " "PR in the `xorbitsai/inference` repository. If the user has configured a " "GitHub Token in their personal settings, the system will use that Token " "to submit the PR; otherwise, the default Token will be used." msgstr "" "注册公共模型时,系统会自动在 `xorbitsai/inference` 仓库中创建 PR。" "若用户已在个人中心配置 GitHub Token,系统将使用该 Token 提交 PR;" "否则将使用默认 Token。" #: ../../source/models/xinference_model_hub.rst:80 msgid "**Notes:**" msgstr "**注意事项:**" #: ../../source/models/xinference_model_hub.rst:82 msgid "" "Enterprise model registration requires enabling the “Public Model” " "parameter first." 
msgstr "企业模型注册需先启用 “公共模型” 参数。" #: ../../source/models/xinference_model_hub.rst:85 msgid "My Models" msgstr "我的模型" #: ../../source/models/xinference_model_hub.rst:87 msgid "**Function:** View models associated with your account" msgstr "**功能:** 查看与当前账户关联的模型" #: ../../source/models/xinference_model_hub.rst:88 msgid "**Location:** Click the avatar in the top-right corner → “My Models”" msgstr "**位置:** 点击右上角头像 → “我的模型”" #: ../../source/models/xinference_model_hub.rst:91 msgid "Model Maintenance" msgstr "模型维护" #: ../../source/models/xinference_model_hub.rst:93 msgid "**Function:** Modify and manage existing models" msgstr "**功能:** 修改与管理已有模型" #: ../../source/models/xinference_model_hub.rst:94 msgid "**Location:** Model Details → “Settings” icon" msgstr "**位置:** 模型详情页 → “设置” 图标" #: ../../source/models/xinference_model_hub.rst:97 msgid "" "When updating the JSON of a public model or modifying expiration " "attributes, the system automatically creates a PR in the " "`xorbitsai/inference` repository." 
msgstr "当更新公共模型的 JSON 或修改过期属性时,系统会自动在 `xorbitsai/inference` 仓库中创建 PR。" #: ../../source/models/xinference_model_hub.rst:100 msgid "Review Workflow" msgstr "审核流程" #: ../../source/models/xinference_model_hub.rst:102 msgid "**For Submitters:**" msgstr "**提交者操作:**" #: ../../source/models/xinference_model_hub.rst:104 msgid "Submit a model" msgstr "提交模型" #: ../../source/models/xinference_model_hub.rst:105 msgid "Check the review status" msgstr "查看审核状态" #: ../../source/models/xinference_model_hub.rst:106 msgid "Modify based on reviewer feedback" msgstr "根据审核反馈进行修改" #: ../../source/models/xinference_model_hub.rst:108 msgid "**For Reviewers:**" msgstr "**审核者操作:**" #: ../../source/models/xinference_model_hub.rst:110 msgid "" "**Required Permissions:** Model review list access and model review " "permissions" msgstr "**所需权限:** 模型审核列表访问权限与审核权限" #: ../../source/models/xinference_model_hub.rst:114 msgid "Enter the review list" msgstr "进入审核列表" #: ../../source/models/xinference_model_hub.rst:115 msgid "Evaluate model quality and compliance" msgstr "评估模型质量与合规性" #: ../../source/models/xinference_model_hub.rst:116 msgid "Approve or reject, providing feedback as needed" msgstr "根据需要通过或拒绝,并提供审核反馈" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/models/xinference_models_hub.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2025, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2025. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2025-11-14 14:59+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.17.0\n" #: ../../source/models/xinference_models_hub.rst:5 msgid "Xinference Models Hub" msgstr "Xinference Models Hub" #: ../../source/models/xinference_models_hub.rst:9 msgid "Overview" msgstr "概述" #: ../../source/models/xinference_models_hub.rst:11 msgid "" "The `Xinference Models Hub `_ is Xinference" "’s unified platform for model management and collaboration. It provides " "end-to-end support for model browsing, registration, review, updates, and" " collaborative maintenance, serving both regular users and model " "maintainers." msgstr "`Xinference Models Hub `_ 是 Xinference 用于模型管理与协作的统一平台。它为模型浏览、注册、评审、更新与协作维护提供端到端支持,面向普通用户与模型维护者提供全流程能力。" #: ../../source/models/xinference_models_hub.rst:13 msgid "" "Before the introduction of the Models Hub, model JSON files were manually" " submitted and modified directly through PRs in the open-source " "repository (`xorbitsai/inference`). This approach resulted in " "uncontrolled versioning, long iteration cycles, and delays in delivering " "updated models—since model updates were tied to product release cycles, " "users could not obtain new models promptly." msgstr "在引入 Models Hub 之前,模型 JSON 文件需要由用户直接通过 PR 修改 `xorbitsai/inference` 开源仓库。这种方式导致模型版本不可控、迭代周期长、模型更新依赖产品发版流程,用户无法及时获得最新模型。" #: ../../source/models/xinference_models_hub.rst:15 msgid "" "With centralized online model management, the Xinference Models Hub " "requires that **all model information—including metadata, parameters, and" " the README—be edited within the platform**. 
Based on modifications made " "by :ref:`model maintainers `, the system **" "automatically generates and submits PRs** to the `xorbitsai/inference` " "repository, ensuring a standardized, automated, and traceable workflow " "that eliminates inconsistencies caused by manual edits." msgstr "通过集中式在线模型管理机制,Xinference Models Hub 要求 **所有模型信息(含元数据、参数及 README)均在平台中完成编辑**。系统会根据 :ref:`模型维护者 ` 的修改 **自动生成并提交 PR** 到 `xorbitsai/inference` 仓库,实现流程标准化、自动化与可追溯,避免手动修改带来的不一致问题。" #: ../../source/models/xinference_models_hub.rst:17 msgid "" "Users can obtain the latest model list at any time through the " ":ref:`model_update` feature, significantly improving model delivery " "efficiency and overall experience." msgstr "用户可通过 :ref:`model_update` 功能随时获取最新模型列表,从而显著提升模型交付效率与整体使用体验。" #: ../../source/models/xinference_models_hub.rst:20 msgid "User Guide for Regular Users" msgstr "普通用户指南" #: ../../source/models/xinference_models_hub.rst:22 msgid "" "This section introduces the basic features available to regular " "registered users." msgstr "本节介绍普通注册用户可使用的基础功能。" #: ../../source/models/xinference_models_hub.rst:24 msgid "" "**Audience:** Regular users without model registration or maintenance " "permissions." 
msgstr "**适用对象:** 无模型注册或维护权限的普通用户。" #: ../../source/models/xinference_models_hub.rst:27 msgid "Core Features" msgstr "核心功能" #: ../../source/models/xinference_models_hub.rst:29 msgid "Regular users can browse public models without logging in:" msgstr "普通用户无需登录即可浏览公开模型:" #: ../../source/models/xinference_models_hub.rst:31 msgid "**Access:** `Xinference Models Hub `_" msgstr "**访问入口:** `Xinference Models Hub `_" #: ../../source/models/xinference_models_hub.rst:34 msgid "Browse Models" msgstr "浏览模型" #: ../../source/models/xinference_models_hub.rst:36 msgid "**Function:** View all publicly available models" msgstr "**功能:** 查看所有公开可用模型" #: ../../source/models/xinference_models_hub.rst:37 msgid "**Location:** Navigation bar → “Models”" msgstr "**入口位置:** 导航栏 → “Models”" #: ../../source/models/xinference_models_hub.rst:38 msgid "" "**Model Details:** The default tab is “README,” which includes model " "description, usage guide, and important notes" msgstr "**模型详情:** 默认展示 README 标签,包含模型说明、使用指南与注意事项" #: ../../source/models/xinference_models_hub.rst:41 msgid "" "Certain advanced or enterprise-level models are visible only to " "authorized users." msgstr "部分高级或企业版模型仅对具备权限的用户可见。" #: ../../source/models/xinference_models_hub.rst:46 msgid "Guide for Model Maintainers" msgstr "模型维护者指南" #: ../../source/models/xinference_models_hub.rst:48 msgid "" "This section describes the features available to users with model " "registration or maintenance permissions, including model registration, " "updates, and review workflows." msgstr "本节介绍具备模型注册或维护权限的用户可使用的功能,包括模型注册、更新与评审流程。" #: ../../source/models/xinference_models_hub.rst:50 msgid "" "**Audience:** Users with model registration or maintenance permissions. " "To become a model maintainer, you may `contact us " "`_." 
msgstr "**适用对象:** 具有模型注册或维护权限的用户。如需成为模型维护者,可 `联系我们 `_。" #: ../../source/models/xinference_models_hub.rst:54 msgid "Core Features (Login Required)" msgstr "核心功能(需登录)" #: ../../source/models/xinference_models_hub.rst:56 msgid "" "Model maintainers have access to the following advanced features in " "addition to the capabilities available to regular users." msgstr "模型维护者在普通用户功能基础上,可以使用以下高级能力:" #: ../../source/models/xinference_models_hub.rst:59 msgid "User Center" msgstr "用户中心" #: ../../source/models/xinference_models_hub.rst:61 msgid "**Function:** View and manage personal information" msgstr "**功能:** 查看与管理个人信息" #: ../../source/models/xinference_models_hub.rst:62 msgid "**Location:** Top-right avatar → “User Center”" msgstr "**入口位置:** 右上角头像 → “用户中心”" #: ../../source/models/xinference_models_hub.rst:63 msgid "**Account Management:** Update profile, email, and other information" msgstr "**账号管理:** 更新个人资料、邮箱等信息" #: ../../source/models/xinference_models_hub.rst:64 msgid "" "**Token Management:** Configure a personal GitHub Token for model " "submissions or updates" msgstr "**Token 管理:** 配置个人 GitHub Token 以提交或更新模型" #: ../../source/models/xinference_models_hub.rst:67 msgid "" "If no GitHub Token is configured, the system will use a default Token " "when generating PRs." 
msgstr "若未配置个人 GitHub Token,系统将在生成 PR 时使用默认 Token。" #: ../../source/models/xinference_models_hub.rst:70 msgid "Model Registration" msgstr "模型注册" #: ../../source/models/xinference_models_hub.rst:72 msgid "**Function:** Register new models and submit them for review" msgstr "**功能:** 注册新模型并提交评审" #: ../../source/models/xinference_models_hub.rst:73 msgid "" "**Location:** After logging in → Top-right avatar → “Model " "Registration”" msgstr "**入口位置:** 登录后 → 右上角头像 → “模型注册”" #: ../../source/models/xinference_models_hub.rst:75 msgid "**Submission Steps:**" msgstr "**提交步骤:**" #: ../../source/models/xinference_models_hub.rst:77 msgid "Fill in basic model information (name, engine, format, etc.)" msgstr "填写基础模型信息(名称、引擎、格式等)" #: ../../source/models/xinference_models_hub.rst:78 msgid "Edit the README (click “Get README” to auto-generate a template)" msgstr "编辑 README(点击“获取 README”可自动生成模板)" #: ../../source/models/xinference_models_hub.rst:79 msgid "" "Submit the model (enable the “Public Model” parameter if registering a " "public model)" msgstr "提交模型(如需注册公开模型,请启用“公开模型”参数)" #: ../../source/models/xinference_models_hub.rst:82 msgid "" "When registering a public model, the system automatically creates a PR in" " the `xorbitsai/inference` repository." 
msgstr "在注册公开模型时,系统将自动在 `xorbitsai/inference` 仓库创建 PR。" #: ../../source/models/xinference_models_hub.rst:85 msgid "My Models" msgstr "我的模型" #: ../../source/models/xinference_models_hub.rst:87 msgid "**Function:** View all models associated with the current account" msgstr "**功能:** 查看与当前账号相关的所有模型" #: ../../source/models/xinference_models_hub.rst:88 msgid "**Location:** After logging in → Top-right avatar → “My Models”" msgstr "**入口位置:** 登录后 → 右上角头像 → “我的模型”" #: ../../source/models/xinference_models_hub.rst:91 msgid "Model Maintenance" msgstr "模型维护" #: ../../source/models/xinference_models_hub.rst:93 msgid "**Function:** Modify the model’s JSON or README" msgstr "**功能:** 修改模型的 JSON 或 README" #: ../../source/models/xinference_models_hub.rst:94 msgid "**Location:** Model details page → “Settings” icon" msgstr "**入口位置:** 模型详情页 → “设置” 图标" #: ../../source/models/xinference_models_hub.rst:97 msgid "" "When updating a public model, the system automatically creates a PR in " "the `xorbitsai/inference` repository." msgstr "在更新公开模型时,系统将自动在 `xorbitsai/inference` 仓库创建 PR。" #: ../../source/models/xinference_models_hub.rst:100 msgid "Review Workflow" msgstr "评审流程" #: ../../source/models/xinference_models_hub.rst:102 msgid "**Submitter Workflow:**" msgstr "**提交者流程:**" #: ../../source/models/xinference_models_hub.rst:104 msgid "Submit a model" msgstr "提交模型" #: ../../source/models/xinference_models_hub.rst:105 msgid "Wait for review" msgstr "等待评审" #: ../../source/models/xinference_models_hub.rst:106 msgid "Revise based on reviewer feedback" msgstr "根据评审意见修改" #: ../../source/models/xinference_models_hub.rst:107 msgid "Updated PRs are generated automatically (for public models)" msgstr "对于公开模型,PR 会在修改后自动更新" #: ../../source/models/xinference_models_hub.rst:109 msgid "" "**Reviewer Permissions:** Access to the model review list and model " "review privileges." 
msgstr "**评审者权限:** 可访问模型评审列表并执行评审操作。" #: ../../source/models/xinference_models_hub.rst:111 msgid "**Reviewer Workflow:**" msgstr "**评审者流程:**" #: ../../source/models/xinference_models_hub.rst:113 msgid "Enter the “Review List”" msgstr "进入“评审列表”" #: ../../source/models/xinference_models_hub.rst:114 msgid "Evaluate the model’s quality, completeness, and compliance" msgstr "评估模型质量、信息完整性与规范性" #: ../../source/models/xinference_models_hub.rst:115 msgid "Approve or reject, providing feedback as necessary" msgstr "通过或驳回,并给出相应反馈" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/reference/index.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2026-01-28 11:54+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.17.0\n" #: ../../source/reference/index.rst:5 msgid "API Reference" msgstr "API 指南" #: ../../source/reference/index.rst:9 msgid "Client" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client `\\ " "\\(base\\_url\\[\\, api\\_key\\]\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.describe_model " "`\\ \\(...\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "Get model information via RESTful APIs." 
msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.get_model " "`\\ \\(model\\_uid\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "Launch the model based on the parameters on the server via RESTful APIs." msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.get_model_registration " "`\\ \\(...\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "Get the model with the model type and model name registered on the server." msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.get_launch_model_progress " "`\\ \\(...\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "Get progress of the specific model." msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.cancel_launch_model " "`\\ \\(...\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "Cancel launching model." msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.get_instance_info " "`\\ \\(...\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.launch_model " "`\\ \\(model\\_name\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.list_model_registrations " "`\\ \\(...\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "List models registered on the server." msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.list_models " "`\\ \\(\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "Retrieve the model specifications from the Server." msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.list_cached_models " "`\\ \\(\\[...\\]\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "Get a list of cached models." 
msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.list_deletable_models " "`\\ \\(...\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "Get the cached models with the model path cached on the server." msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.confirm_and_remove_model " "`\\ \\(...\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "Remove the cached models with the model name cached on the server." msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.query_engine_by_model_name " "`\\ \\(...\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "Get the engine parameters with the model name registered on the server." msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.register_model " "`\\ \\(...\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "Register a custom model." msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.terminate_model " "`\\ \\(...\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "Terminate the specific model running on the server." msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.abort_request " "`\\ \\(...\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "Abort a request." 
msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.vllm_models " "`\\ \\(\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.login " "`\\ \\(username\\, ...\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.get_workers_info " "`\\ \\(\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.get_supervisor_info " "`\\ \\(\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.get_progress " "`\\ \\(request\\_id\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.abort_cluster " "`\\ \\(\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.unregister_model " "`\\ \\(...\\)" msgstr "" #: ../../source/reference/index.rst:39::1 msgid "Unregister a custom model." msgstr "" #: ../../source/reference/index.rst:41 msgid "Model Handles" msgstr "" #: ../../source/reference/index.rst:45 msgid "ChatModelHandle" msgstr "" #: ../../source/reference/index.rst:54::1 msgid "" ":py:obj:`xinference.client.handlers.ChatModelHandle " "`\\" msgstr "" #: ../../source/reference/index.rst:54::1 msgid "" "alias of " ":py:class:`~xinference.client.restful.restful_client.RESTfulChatModelHandle`" msgstr "" #: ../../source/reference/index.rst:54::1 msgid "" ":py:obj:`xinference.client.handlers.ChatModelHandle.chat " "`\\ \\(...\\)" msgstr "" #: ../../source/reference/index.rst:54::1 msgid "" "Given a list of messages comprising a conversation, the model will return" " a response via RESTful APIs." 
msgstr "" #: ../../source/reference/index.rst:54::1 msgid "" ":py:obj:`xinference.client.handlers.ChatModelHandle.generate " "`\\ \\(prompt\\)" msgstr "" #: ../../source/reference/index.rst:54::1 #: ../../source/reference/index.rst:84::1 msgid "" "Creates a completion for the provided prompt and parameters via RESTful " "APIs." msgstr "" #: ../../source/reference/index.rst:56 msgid "EmbeddingModelHandle" msgstr "" #: ../../source/reference/index.rst:64::1 msgid "" ":py:obj:`xinference.client.handlers.EmbeddingModelHandle " "`\\" msgstr "" #: ../../source/reference/index.rst:64::1 msgid "" "alias of " ":py:class:`~xinference.client.restful.restful_client.RESTfulEmbeddingModelHandle`" msgstr "" #: ../../source/reference/index.rst:64::1 msgid "" ":py:obj:`xinference.client.handlers.EmbeddingModelHandle.create_embedding" " `\\ " "\\(...\\)" msgstr "" #: ../../source/reference/index.rst:64::1 msgid "Create an Embedding from user input via RESTful APIs." msgstr "" #: ../../source/reference/index.rst:66 msgid "RerankModelHandle" msgstr "" #: ../../source/reference/index.rst:74::1 msgid "" ":py:obj:`xinference.client.restful.restful_client.RESTfulRerankModelHandle" " `\\ " "\\(...\\)" msgstr "" #: ../../source/reference/index.rst:74::1 msgid "" ":py:obj:`xinference.client.restful.restful_client.RESTfulRerankModelHandle.rerank" " " "`\\" " \\(...\\)" msgstr "" #: ../../source/reference/index.rst:74::1 msgid "" "Returns an ordered list of documents ordered by their relevance to the " "provided query." 
msgstr "" #: ../../source/reference/index.rst:76 msgid "GenerateModelHandle" msgstr "" #: ../../source/reference/index.rst:84::1 msgid "" ":py:obj:`xinference.client.handlers.GenerateModelHandle " "`\\" msgstr "" #: ../../source/reference/index.rst:84::1 msgid "" "alias of " ":py:class:`~xinference.client.restful.restful_client.RESTfulGenerateModelHandle`" msgstr "" #: ../../source/reference/index.rst:84::1 msgid "" ":py:obj:`xinference.client.handlers.GenerateModelHandle.generate " "`\\ \\(prompt\\)" msgstr "" #: ../../source/reference/index.rst:86 msgid "ImageModelHandle" msgstr "" #: ../../source/reference/index.rst:94::1 msgid "" ":py:obj:`xinference.client.handlers.ImageModelHandle " "`\\" msgstr "" #: ../../source/reference/index.rst:94::1 msgid "" "alias of " ":py:class:`~xinference.client.restful.restful_client.RESTfulImageModelHandle`" msgstr "" #: ../../source/reference/index.rst:94::1 msgid "" ":py:obj:`xinference.client.handlers.ImageModelHandle.text_to_image " "`\\ " "\\(prompt\\)" msgstr "" #: ../../source/reference/index.rst:94::1 msgid "Creates an image by the input text." msgstr "" #: ../../source/reference/index.rst:96 msgid "AudioModelHandle" msgstr "" #: ../../source/reference/index.rst:106::1 msgid "" ":py:obj:`xinference.client.handlers.AudioModelHandle " "`\\" msgstr "" #: ../../source/reference/index.rst:106::1 msgid "" "alias of " ":py:class:`~xinference.client.restful.restful_client.RESTfulAudioModelHandle`" msgstr "" #: ../../source/reference/index.rst:106::1 msgid "" ":py:obj:`xinference.client.handlers.AudioModelHandle.transcriptions " "`\\ " "\\(audio\\)" msgstr "" #: ../../source/reference/index.rst:106::1 msgid "Transcribes audio into the input language." msgstr "" #: ../../source/reference/index.rst:106::1 msgid "" ":py:obj:`xinference.client.handlers.AudioModelHandle.translations " "`\\ \\(audio\\)" msgstr "" #: ../../source/reference/index.rst:106::1 msgid "Translates audio into English." 
msgstr "" #: ../../source/reference/index.rst:106::1 msgid "" ":py:obj:`xinference.client.handlers.AudioModelHandle.speech " "`\\ \\(input\\)" msgstr "" #: ../../source/reference/index.rst:106::1 msgid "Generates audio from the input text." msgstr "" #: ../../source/reference/index.rst:108 msgid "FlexibleModelHandle" msgstr "" #: ../../source/reference/index.rst:116::1 msgid "" ":py:obj:`xinference.client.restful.restful_client.RESTfulFlexibleModelHandle" " `\\" " \\(...\\)" msgstr "" #: ../../source/reference/index.rst:116::1 msgid "" ":py:obj:`xinference.client.restful.restful_client.RESTfulFlexibleModelHandle.infer" " " "`\\" " \\(...\\)" msgstr "" #: ../../source/reference/index.rst:116::1 msgid "Call flexible model." msgstr "" #: ../../source/reference/index.rst:118 msgid "VideoModelHandle" msgstr "" #: ../../source/reference/index.rst:124::1 msgid "" ":py:obj:`xinference.client.handlers.VideoModelHandle " "`\\" msgstr "" #: ../../source/reference/index.rst:124::1 msgid "" "alias of " ":py:class:`~xinference.client.restful.restful_client.RESTfulVideoModelHandle`" msgstr "" #: ../../source/reference/index.rst:124::1 msgid "" ":py:obj:`xinference.client.handlers.VideoModelHandle.text_to_video " "`\\ " "\\(prompt\\)" msgstr "" #: ../../source/reference/index.rst:124::1 msgid "Creates a video by the input text." msgstr "" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/reference.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2023-07-18 10:54+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/reference/index.rst:5 msgid "API Reference" msgstr "API 指南" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/user_guide/auth_system.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2025-03-19 12:55+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.14.0\n" #: ../../source/user_guide/auth_system.rst:5 msgid "Simple OAuth2 System (experimental)" msgstr "OAuth2 系统(实验性质)" #: ../../source/user_guide/auth_system.rst:7 msgid "" "Xinference builds an In-memory OAuth2 authentication and authorization " "system using the account-password mode." msgstr "" "Xinference 使用了账号密码的模式构建了一个基于内存的 OAuth2 的身份验证和" "授权系统。" #: ../../source/user_guide/auth_system.rst:10 msgid "" "If you don't have authentication and authorization requirements, you can " "use Xinference as before, without any changes." 
msgstr "" "如果没有身份验证和授权的要求,可以像之前一样使用 Xinference,无需任何改动" "。" #: ../../source/user_guide/auth_system.rst:14 msgid "Permissions" msgstr "权限" #: ../../source/user_guide/auth_system.rst:15 msgid "" "Currently, Xinference system internally defines some interface " "permissions:" msgstr "目前,Xinference 内部定义了以下几个接口权限:" #: ../../source/user_guide/auth_system.rst:17 msgid "``models:list``: Permission to list models and get models' information." msgstr "``models:list``: 获取模型列表和信息的权限。" #: ../../source/user_guide/auth_system.rst:18 msgid "``models:read``: Permission to use models." msgstr "``models:read``: 使用模型的权限。" #: ../../source/user_guide/auth_system.rst:19 msgid "``models:register``: Permission to register custom models." msgstr "``models:register``: 注册模型的权限。" #: ../../source/user_guide/auth_system.rst:20 msgid "``models:unregister``: Permission to unregister custom models." msgstr "``models:unregister``: 取消注册模型的权限。" #: ../../source/user_guide/auth_system.rst:21 msgid "``models:start``: Permission to launch models." msgstr "``models:start``: 启动模型的权限。" #: ../../source/user_guide/auth_system.rst:22 msgid "``models:stop``: Permission to stop running models." msgstr "``models:stop``: 停止模型的权限。" #: ../../source/user_guide/auth_system.rst:23 msgid "``admin``: Administrators have permissions for all interfaces." msgstr "``admin``: 管理员拥有所有接口的权限。" #: ../../source/user_guide/auth_system.rst:27 msgid "Startup" msgstr "开始使用" #: ../../source/user_guide/auth_system.rst:28 msgid "" "All authentication and authorization information needs to be specified " "and loaded into memory when Xinference is started. Xinference requires a " "JSON-formatted file with the following specific fields:" msgstr "" "在启动 Xinference 时,需要指定所有的验证和授权信息。当前,Xinference 需要" "一个 JSON 文件,其中包含以下特定字段:" #: ../../source/user_guide/auth_system.rst:67 msgid "" "``auth_config``: This field is used to configure security-related " "information." 
msgstr "``auth_config``: 这个字段配置与安全相关的信息。" #: ../../source/user_guide/auth_system.rst:69 msgid "" "``algorithm``: The algorithm used for token generation and parsing. " "``HS`` series algorithms are recommended. For example, ``HS256``, " "``HS384`` or ``HS512``." msgstr "" "``algorithm``: 用于令牌生成与解析的算法。推荐使用 ``HS`` 系列算法,例如 `" "`HS256``,``HS384`` 或者 ``HS512`` 算法。" #: ../../source/user_guide/auth_system.rst:71 msgid "" "``secret_key``: The secret_key used for token generation and parsing. Use" " this command to generate the secret_key adapted to the ``HS`` " "algorithms: ``openssl rand -hex 32``." msgstr "" "``secret_key``: 用于令牌生成和解析的密钥。可以使用该命令生成适配 ``HS`` " "系列算法的密钥:``openssl rand -hex 32`` 。" #: ../../source/user_guide/auth_system.rst:73 msgid "" "``token_expire_in_minutes``: Reserved field indicating the expiration " "time of the token. The current open-source version of Xinference does not" " check the expiration time of tokens." msgstr "" "``token_expire_in_minutes``: 保留字段,表示令牌失效时间。目前 Xinference " "开源版本不会检查令牌过期时间。" #: ../../source/user_guide/auth_system.rst:75 msgid "" "``user_config``: This field is used to configure user and permission " "information. Each user information is composed of these fields:" msgstr "" "``user_config``: 这个字段用来配置用户和权限信息。每个用户信息由以下字段" "组成:" #: ../../source/user_guide/auth_system.rst:77 msgid "``username``: string field for username." msgstr "``username``: 字符串,表示用户名" #: ../../source/user_guide/auth_system.rst:79 msgid "``password``: string field for password." msgstr "``password``: 字符串,表示密码" #: ../../source/user_guide/auth_system.rst:81 msgid "" "``permissions``: A list containing strings representing the permissions " "that this user has. The permissions are described as above." msgstr "" "``permissions``: 字符串列表,表示该用户拥有的权限。权限描述如上权限部分" "文档所述。" #: ../../source/user_guide/auth_system.rst:83 msgid "" "``api_keys``: A list containing strings representing the api-keys of this" " user. 
With these api-keys, user can access the xinference interfaces " "without the need to signin. The api-key here is formatted similar to the " "``OPENAI_API_KEY`` , always starting with ``sk-``, followed by 13 " "alphanumeric characters." msgstr "" "``api_keys``: 字符串列表,表示该用户拥有的 api-key 。用户可以通过这些 api" "-key ,无需登录步骤即可访问 xinference 接口。这里的 api_key 组成与 ``" "OPENAI_API_KEY`` 相似,总是以 ``sk-`` 开头,后跟 13 个数字、大小写字母。" #: ../../source/user_guide/auth_system.rst:86 msgid "" "Once you have configured such a JSON file, use the ``--auth-config`` " "option to enable Xinference with the authentication and authorization " "system. For example, for local startup:" msgstr "" "配置好这样一个 JSON 文件后,可以使用 ``--auth-config`` 选项启用具有" "身份验证和授权系统的 Xinference。例如,本地启动的命令如下所示:" #: ../../source/user_guide/auth_system.rst:93 msgid "" "For distributed startup, just specify this option when starting the " "supervisor:" msgstr "在分布式环境下,只需要在启动 supervisor 时指定这个选项:" #: ../../source/user_guide/auth_system.rst:101 msgid "Usage" msgstr "使用" #: ../../source/user_guide/auth_system.rst:102 msgid "" "For Xinference with the authentication and authorization system enabled, " "all usage remains the same, except for the addition of a login step at " "the beginning or using the api-key." msgstr "" "使用带有权限管理的 Xinference 服务与正常的版本保持一致,只是在开始阶段" "添加了登录步骤,或者使用 api-key 进行鉴权。" #: ../../source/user_guide/auth_system.rst:105 msgid "Signin" msgstr "基于用户名-密码的使用方式" #: ../../source/user_guide/auth_system.rst:106 msgid "Signin for command line users:" msgstr "使用命令行登录:" #: ../../source/user_guide/auth_system.rst:113 msgid "For python SDK users:" msgstr "使用 Python SDK 登录:" #: ../../source/user_guide/auth_system.rst:122 msgid "" "For web UI users, when opening the web UI, you will first be directed to " "the login page. After logging in, you can use the web UI normally." 
msgstr "" "对于 Web UI 的用户,在打开 Web UI 时,将首先跳转到登录页面。登录后,就" "可以正常使用Web UI 的功能。" #: ../../source/user_guide/auth_system.rst:125 msgid "Api-Key" msgstr "基于 Api-Key 鉴权的使用方式" #: ../../source/user_guide/auth_system.rst:126 msgid "" "For command line users, just add ``--api-key`` or ``-ak`` option in the " "command you want to use." msgstr "" "对于命令行用户,仅需在所要运行的命令上新增 ``--api-key`` 或 ``-ak`` 选项" "即可。" #: ../../source/user_guide/auth_system.rst:133 msgid "" "For python SDK users, pass the ``api_key`` parameter when initializing " "the client, just like the ``OPENAI`` Python client." msgstr "" "对于 Python 客户端用户,在客户端对象初始化时传入 ``api_key`` 参数即可,就" "像 ``OPENAI`` 客户端那样。" #: ../../source/user_guide/auth_system.rst:141 msgid "Xinference is also compatible with the ``OPENAI`` Python SDK as well." msgstr "当然,Xinference 也与 ``OPENAI`` Python 客户端的使用方式完全兼容。" #: ../../source/user_guide/auth_system.rst:149 msgid "" "For http request, pass ``Authorization: Bearer api-key`` in request " "header." msgstr "对于 HTTP 请求,在请求头中传递 ``Authorization: Bearer api-key``。" #: ../../source/user_guide/auth_system.rst:159 msgid "Http Status Code" msgstr "Http 状态码" #: ../../source/user_guide/auth_system.rst:160 msgid "Add the following two HTTP status codes:" msgstr "添加了以下两种 HTTP 状态码:" #: ../../source/user_guide/auth_system.rst:162 msgid "``401 Unauthorized``: login information or token verifies failed." msgstr "``401 Unauthorized``: 登录信息或者令牌验证失效。" #: ../../source/user_guide/auth_system.rst:163 msgid "``403 Forbidden``: No enough permissions when accessing interfaces." msgstr "``403 Forbidden``: 没有足够的权限访问接口。" #: ../../source/user_guide/auth_system.rst:165 msgid "" "For the command line, SDK, or web UI users, there will be clear " "information prompts when encountering authorization and permissions " "issues." 
msgstr "" "对于命令行、SDK 或 Web UI 用户,在遇到授权和权限问题时,会有明确的信息" "提示。" #: ../../source/user_guide/auth_system.rst:169 msgid "Note" msgstr "注意" #: ../../source/user_guide/auth_system.rst:170 msgid "" "This feature is still in an experimental stage. Feel free to provide " "feedback on usage issues or improvement suggestions through `GitHub " "issues `_ or `our Slack " "`_." msgstr "" "该功能处于实验阶段。欢迎通过 `GitHub issues `_ 或者 `Slack `_ 提供反馈和建议。" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/user_guide/backends.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2026-01-28 11:54+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.17.0\n" #: ../../source/user_guide/backends.rst:5 msgid "Backends" msgstr "推理引擎" #: ../../source/user_guide/backends.rst:7 msgid "" "Xinference supports multiple backends for different models. After the " "user specifies the model, xinference will automatically select the " "appropriate backend." msgstr "Xinference 对于不同模型支持不同的推理引擎。用户选择模型后,Xinference 会自动选择合适的引擎" #: ../../source/user_guide/backends.rst:11 msgid "llama.cpp" msgstr "" #: ../../source/user_guide/backends.rst:13 msgid "" "Xinference now supports `xllamacpp " "`_ which developed by Xinference " "team to run llama.cpp backend. `llama.cpp` is developed based on the " "tensor library `ggml`, supporting inference of the LLaMA series models " "and their variants." 
msgstr "" "Xinference 目前支持由 Xinference 团队开发的 `xllamacpp " "`_ 作为 llama.cpp 后端运行。`llama.cpp` " "基于张量库 `ggml` 开发,支持 LLaMA 系列模型及其变体的推理。" #: ../../source/user_guide/backends.rst:20 msgid "" "Since Xinference v1.5.0, ``xllamacpp`` becomes default option for " "llama.cpp, and ``llama-cpp-python`` is deprecated. Since Xinference " "v1.6.0, ``llama-cpp-python`` has been removed." msgstr "" "自 Xinference v1.5.0 起,``xllamacpp`` 成为 llama.cpp 的默认选项,``llama-cpp-" "python`` 被弃用;从 Xinference v1.6.0 开始,``llama-cpp-python`` 已被移除。" #: ../../source/user_guide/backends.rst:25 msgid "" "For all configurable llama.cpp parameters, please refer to the definition" " of the ``common_params`` structure in ``llama.cpp`` `common.h " "`_" msgstr "" "请参考 ``llama.cpp`` 的 `common.h `_ 中 ``common_params`` " "结构体定义设置参数。" #: ../../source/user_guide/backends.rst:27 msgid "" "There may be some nested parameters. For example, ``sampling.top_k``. " "Just use the ``.`` to separate nested parameters." msgstr "可能会有嵌套多层的参数。例如,``sampling.top_k``。请使用 ``.`` 来分割嵌套参数。" #: ../../source/user_guide/backends.rst:29 msgid "Here is an example of setting nested sampling parameters in WebUI:" msgstr "这里有一个在 WebUI 中设置嵌套 sampling 参数的例子:" #: ../../source/user_guide/backends.rst:36 msgid "Auto NGL" msgstr "自动 NGL" #: ../../source/user_guide/backends.rst:38 msgid "" "Auto GPU layers estimation is enabled since v1.6.1 when ``n-gpu-layers`` " "is not specified (default is -1)." msgstr "自 v1.6.1 起,当未指定 n-gpu-layers(默认为 -1)时,将自动启用 GPU 层数估算功能。" #: ../../source/user_guide/backends.rst:41 msgid "" "This feature automatically detects the number of GPU layers (NGL) for the" " llama.cpp backend. Please be aware that this is not an accurate " "calculation. Therefore, the ``-ngl`` result might not be the most " "optimized, and there is still a chance of encountering an out-of-memory " "error." 
msgstr "" "这个特性可以为 llama.cpp 后端自动设置 GPU 层数(NGL)。请注意这并不是一个精确的计算,因此 ``-ngl`` " "结果可能不是最优的,并且仍然可能遇到显存不足的错误。" #: ../../source/user_guide/backends.rst:45 msgid "" "Currently, there is no official implementation for auto ngl. Please refer" " to the following issues for more information:" msgstr "目前自动 NGL 没有官方支持。请参考下面 issue 来了解更多详情:" #: ../../source/user_guide/backends.rst:47 msgid "https://github.com/ggml-org/llama.cpp/issues/13860" msgstr "" #: ../../source/user_guide/backends.rst:48 msgid "https://github.com/ggml-org/llama.cpp/pull/6502" msgstr "" #: ../../source/user_guide/backends.rst:50 msgid "" "Our implementation is based on the Ollama auto ngl, but there are some " "differences:" msgstr "我们的实现是基于 Ollama 的自动 NGL,但是有一些不同之处:" #: ../../source/user_guide/backends.rst:52 msgid "" "We utilize device information detected by `xllamacpp " "`_." msgstr "我们使用 `xllamacpp `_ 提供的设备信息。" #: ../../source/user_guide/backends.rst:53 msgid "" "We have removed support for less popular architectures, these " "architectures will use the default calculation." msgstr "我们删除了一些不常见的架构支持,这些架构下会使用默认计算逻辑。" #: ../../source/user_guide/backends.rst:54 msgid "" "We fall back to offloading all the layers to the GPU if the auto ngl " "fails." msgstr "如果自动 NGL 失败,我们会尝试全部加载到 GPU。" #: ../../source/user_guide/backends.rst:55 msgid "" "We do not support multimodal projectors embedded into the model GGUF, as " "this is a very experimental feature." msgstr "我们不支持多模态投影器内嵌到模型的 GGUF,这种格式的模型目前还处于实验阶段。" #: ../../source/user_guide/backends.rst:59 msgid "Common Issues" msgstr "常见问题" #: ../../source/user_guide/backends.rst:61 #, python-brace-format msgid "" "**Server error: {'code': 500, 'message': 'failed to process image', " "'type': 'server_error'}**" msgstr "" #: ../../source/user_guide/backends.rst:63 #: ../../source/user_guide/backends.rst:87 msgid "The error logs from server:" msgstr "服务端日志:" #: ../../source/user_guide/backends.rst:78 msgid "" "This could be caused by running out of memory. 
You can try reducing " "memory usage by decreasing ``n_ctx``." msgstr "可能由于内存不足导致。你可以尝试减小 ``n_ctx`` 解决。" #: ../../source/user_guide/backends.rst:80 #, python-brace-format msgid "" "**Server error: {'code': 400, 'message': 'the request exceeds the " "available context size. try increasing the context size or enable context" " shift', 'type': 'invalid_request_error'}**" msgstr "" #: ../../source/user_guide/backends.rst:82 msgid "" "If you are using the multimodal feature, the ``ctx_shift`` is disabled by" " default. Please increase the context size by either increasing ``n_ctx``" " or reducing ``n_parallel``." msgstr "" "如果你正在使用 multimodal 功能,``ctx_shift`` 会被默认关闭。请尝试增加 ``n_ctx`` 或者减小 " "``n_parallel`` 以增加每个 slot 的 context 大小。" #: ../../source/user_guide/backends.rst:85 #, python-brace-format msgid "" "**Server error: {'code': 500, 'message': 'Input prompt is too big " "compared to KV size. Please try increasing KV size.', 'type': " "'server_error'}**" msgstr "" #: ../../source/user_guide/backends.rst:97 msgid "" "This could be caused by the KV cache allocation failure. You can try to " "reduce the context size by either reducing ``n_ctx`` or increasing " "``n_parallel``, or loading a partial model onto the GPU by adjusting " "``n_gpu_layers``. Be aware that if you are handling inference requests " "serially, increasing ``n_parallel`` can't improve the latency or " "throughput." msgstr "" "可能由于 KV cache 创建失败导致。你可以通过减小 ``n_ctx`` 或者增加 ``n_parallel`` 或者调节 " "``n_gpu_layers`` 参数加载部分模型到 GPU 来解决。请注意,如果你只处理串行推理请求,增加 ``n_parallel`` " "并不会带来性能提升。" #: ../../source/user_guide/backends.rst:102 msgid "transformers" msgstr "transformers" #: ../../source/user_guide/backends.rst:103 msgid "" "Transformers supports the inference of most state-of-art models. It is " "the default backend for models in PyTorch format." 
msgstr "Transformers 支持绝大部分新出的模型。它是 PyTorch 格式模型默认使用的引擎。" #: ../../source/user_guide/backends.rst:108 msgid "vLLM" msgstr "vLLM" #: ../../source/user_guide/backends.rst:109 msgid "vLLM is a fast and easy-to-use library for LLM inference and serving." msgstr "vLLM 是一个非常高效并且易用的大语言模型推理引擎。" #: ../../source/user_guide/backends.rst:111 msgid "vLLM is fast with:" msgstr "vLLM 具有以下特点:" #: ../../source/user_guide/backends.rst:113 msgid "State-of-the-art serving throughput" msgstr "领先的推理吞吐量" #: ../../source/user_guide/backends.rst:114 msgid "Efficient management of attention key and value memory with PagedAttention" msgstr "使用 PagedAttention 高效管理注意力键和值的内存" #: ../../source/user_guide/backends.rst:115 msgid "Continuous batching of incoming requests" msgstr "对传入请求进行连续批处理" #: ../../source/user_guide/backends.rst:116 msgid "Optimized CUDA kernels" msgstr "优化的 CUDA 内核" #: ../../source/user_guide/backends.rst:118 msgid "" "When the following conditions are met, Xinference will choose vLLM as the" " inference engine:" msgstr "当满足以下条件时,Xinference 会自动选择 vLLM 作为推理引擎:" #: ../../source/user_guide/backends.rst:120 msgid "" "The model format is ``pytorch``, ``gptq``, ``awq``, ``fp4``, ``fp8`` or " "``bnb``." msgstr "模型格式为 ``pytorch`` , ``gptq`` , ``awq`` , ``fp4`` , ``fp8`` 或者 ``bnb`` 。" #: ../../source/user_guide/backends.rst:121 msgid "When the model format is ``pytorch``, the quantization is ``none``." msgstr "当模型格式为 ``pytorch`` 时,量化选项需为 ``none`` 。" #: ../../source/user_guide/backends.rst:122 msgid "When the model format is ``awq``, the quantization is ``Int4``." msgstr "当模型格式为 ``awq`` 时,量化选项需为 ``Int4`` 。" #: ../../source/user_guide/backends.rst:123 msgid "" "When the model format is ``gptq``, the quantization is ``Int3``, ``Int4``" " or ``Int8``."
msgstr "当模型格式为 ``gptq`` 时,量化选项需为 ``Int3``, ``Int4`` 或 ``Int8`` 。" #: ../../source/user_guide/backends.rst:124 msgid "The system is Linux and has at least one CUDA device" msgstr "操作系统为 Linux 并且至少有一个支持 CUDA 的设备" #: ../../source/user_guide/backends.rst:125 msgid "" "The model family (for custom models) / model name (for builtin models) is" " within the list of models supported by vLLM" msgstr "自定义模型的 ``model_family`` 字段和内置模型的 ``model_name`` 字段在 vLLM 的支持列表中。" #: ../../source/user_guide/backends.rst:127 msgid "Currently, supported model includes:" msgstr "目前,支持的模型包括:" #: ../../source/user_guide/backends.rst:131 msgid "" "``code-llama``, ``code-llama-instruct``, ``code-llama-python``, " "``deepseek``, ``deepseek-chat``, ``deepseek-coder``, ``deepseek-coder-" "instruct``, ``deepseek-r1-distill-llama``, ``gorilla-openfunctions-v2``, " "``HuatuoGPT-o1-LLaMA-3.1``, ``llama-2``, ``llama-2-chat``, ``llama-3``, " "``llama-3-instruct``, ``llama-3.1``, ``llama-3.1-instruct``, " "``llama-3.3-instruct``, ``tiny-llama``, ``wizardcoder-python-v1.0``, " "``wizardmath-v1.0``, ``Yi``, ``Yi-1.5``, ``Yi-1.5-chat``, ``Yi-1.5-chat-" "16k``, ``Yi-200k``, ``Yi-chat``" msgstr "" #: ../../source/user_guide/backends.rst:132 msgid "" "``codestral-v0.1``, ``mistral-instruct-v0.1``, ``mistral-instruct-v0.2``," " ``mistral-instruct-v0.3``, ``mistral-large-instruct``, ``mistral-nemo-" "instruct``, ``mistral-v0.1``, ``openhermes-2.5``, ``seallm_v2``" msgstr "" #: ../../source/user_guide/backends.rst:133 msgid "" "``Baichuan-M2``, ``codeqwen1.5``, ``codeqwen1.5-chat``, ``deepseek-r1" "-distill-qwen``, ``DianJin-R1``, ``fin-r1``, ``HuatuoGPT-o1-Qwen2.5``, " "``KAT-V1``, ``marco-o1``, ``qwen1.5-chat``, ``qwen2-instruct``, " "``qwen2.5``, ``qwen2.5-coder``, ``qwen2.5-coder-instruct``, " "``qwen2.5-instruct``, ``qwen2.5-instruct-1m``, ``qwenLong-l1``, ``QwQ-" "32B``, ``QwQ-32B-Preview``, ``seallms-v3``, ``skywork-or1``, ``skywork-" "or1-preview``, ``XiYanSQL-QwenCoder-2504``" msgstr "" #: 
../../source/user_guide/backends.rst:134 msgid "``llama-3.2-vision``, ``llama-3.2-vision-instruct``" msgstr "" #: ../../source/user_guide/backends.rst:135 msgid "``baichuan-2``, ``baichuan-2-chat``" msgstr "" #: ../../source/user_guide/backends.rst:136 msgid "``InternLM2ForCausalLM``" msgstr "" #: ../../source/user_guide/backends.rst:137 msgid "``qwen-chat``" msgstr "" #: ../../source/user_guide/backends.rst:138 msgid "" "``mixtral-8x22B-instruct-v0.1``, ``mixtral-instruct-v0.1``, " "``mixtral-v0.1``" msgstr "" #: ../../source/user_guide/backends.rst:139 msgid "``cogagent``" msgstr "" #: ../../source/user_guide/backends.rst:140 msgid "``glm-edge-chat``, ``glm4-chat``, ``glm4-chat-1m``" msgstr "" #: ../../source/user_guide/backends.rst:141 msgid "``codegeex4``, ``glm-4v``" msgstr "" #: ../../source/user_guide/backends.rst:142 msgid "``seallm_v2.5``" msgstr "" #: ../../source/user_guide/backends.rst:143 msgid "``orion-chat``" msgstr "" #: ../../source/user_guide/backends.rst:144 msgid "``qwen1.5-moe-chat``, ``qwen2-moe-instruct``" msgstr "" #: ../../source/user_guide/backends.rst:145 msgid "``CohereForCausalLM``" msgstr "" #: ../../source/user_guide/backends.rst:146 msgid "" "``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, ``deepseek-v2.5``, " "``deepseek-vl2``" msgstr "" #: ../../source/user_guide/backends.rst:147 msgid "" "``deepseek-prover-v2``, ``deepseek-r1``, ``deepseek-r1-0528``, " "``deepseek-v3``, ``deepseek-v3-0324``, ``Deepseek-V3.1``, ``moonlight-" "16b-a3b-instruct``" msgstr "" #: ../../source/user_guide/backends.rst:148 msgid "``deepseek-r1-0528-qwen3``, ``qwen3``" msgstr "" #: ../../source/user_guide/backends.rst:149 msgid "``minicpm3-4b``" msgstr "" #: ../../source/user_guide/backends.rst:150 msgid "``internlm3-instruct``" msgstr "" #: ../../source/user_guide/backends.rst:151 msgid "``gemma-3-1b-it``" msgstr "" #: ../../source/user_guide/backends.rst:152 msgid "``glm4-0414``" msgstr "" #: ../../source/user_guide/backends.rst:153 msgid "" 
"``minicpm-2b-dpo-bf16``, ``minicpm-2b-dpo-fp16``, ``minicpm-2b-dpo-" "fp32``, ``minicpm-2b-sft-bf16``, ``minicpm-2b-sft-fp32``, ``minicpm4``" msgstr "" #: ../../source/user_guide/backends.rst:154 msgid "``Ernie4.5``" msgstr "" #: ../../source/user_guide/backends.rst:155 msgid "``Qwen3-Coder``, ``Qwen3-Instruct``, ``Qwen3-Thinking``" msgstr "" #: ../../source/user_guide/backends.rst:156 msgid "``glm-4.5``" msgstr "" #: ../../source/user_guide/backends.rst:157 msgid "``gpt-oss``" msgstr "" #: ../../source/user_guide/backends.rst:158 msgid "``seed-oss``" msgstr "" #: ../../source/user_guide/backends.rst:159 msgid "``Qwen3-Next-Instruct``, ``Qwen3-Next-Thinking``" msgstr "" #: ../../source/user_guide/backends.rst:160 msgid "``DeepSeek-V3.2``, ``DeepSeek-V3.2-Exp``" msgstr "" #: ../../source/user_guide/backends.rst:161 msgid "``MiniMax-M2``" msgstr "" #: ../../source/user_guide/backends.rst:167 msgid "SGLang" msgstr "" #: ../../source/user_guide/backends.rst:168 msgid "" "`SGLang `_ has a high-performance " "inference runtime with RadixAttention. It significantly accelerates the " "execution of complex LLM programs by automatic KV cache reuse across " "multiple calls. And it also supports other common techniques like " "continuous batching and tensor parallelism." msgstr "" "`SGLang `_ 具有基于 RadixAttention " "的高性能推理运行时。它通过在多个调用之间自动重用KV缓存,显著加速了复杂 LLM " "程序的执行。它还支持其他常见推理技术,如连续批处理和张量并行处理。" #: ../../source/user_guide/backends.rst:175 msgid "MLX" msgstr "" #: ../../source/user_guide/backends.rst:176 msgid "" "`MLX `_ " "provides efficient runtime to run LLM on Apple silicon. It's recommended " "to use for Mac users when running on Apple silicon if the model has MLX " "format support." 
msgstr "" "`MLX `_ 提供在苹果 " "silicon 芯片上高效运行 LLM 的方式。在模型包含 MLX 格式的时候,推荐使用苹果 silicon 芯片的 Mac 用户使用 MLX " "引擎。" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/user_guide/cache_management.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2023-10-16 10:33+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/user_guide/cache_management.rst:5 msgid "Cache Management" msgstr "缓存管理" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/user_guide/client_api.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2025-09-09 12:13+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.17.0\n" #: ../../source/user_guide/client_api.rst:5 msgid "Client API" msgstr "客户端 API" #: ../../source/user_guide/client_api.rst:7 msgid "Complete Client API Reference: :ref:`reference_index`" msgstr "完整的 API 指南: :ref:`reference_index`" #: ../../source/user_guide/client_api.rst:9 msgid "" "To utilize the Client API, initiate the xinference server using the " "command below:" msgstr "使用 Client API,需要先使用以下命令拉起 Xinference 服务:" #: ../../source/user_guide/client_api.rst:18 msgid "" "Based on the log above, the endpoint is `http://127.0.0.1:9997`. Users " "can connect to the xinference server through this endpoint using the " "Client." msgstr "" "在命令日志里会打印服务地址,上述日志中为 `http://127.0.0.1:9997`。用户可以通过 Client 连接 Xinference " "服务。" #: ../../source/user_guide/client_api.rst:20 msgid "" "Models are categorized into LLM, embedding, image, etc. We plan to " "introduce more model types in the future."
msgstr "所有模型被分为 LLM、embedding、image 等类型。后续可能会支持更多类型的模型。" #: ../../source/user_guide/client_api.rst:23 msgid "LLM" msgstr "LLM" #: ../../source/user_guide/client_api.rst:25 msgid "To list the available built-in LLM models:" msgstr "列出所有内置支持的 LLM 模型:" #: ../../source/user_guide/client_api.rst:38 msgid "To initialize an LLM and chat:" msgstr "初始化一个大语言模型并且与之对话:" #: ../../source/user_guide/client_api.rst:41 #: ../../source/user_guide/client_api.rst:183 #: ../../source/user_guide/client_api.rst:254 #: ../../source/user_guide/client_api.rst:323 msgid "Xinference Client" msgstr "Xinference Client" #: ../../source/user_guide/client_api.rst:65 #: ../../source/user_guide/client_api.rst:215 #: ../../source/user_guide/client_api.rst:278 #: ../../source/user_guide/client_api.rst:347 msgid "OpenAI Client" msgstr "OpenAI Client" #: ../../source/user_guide/client_api.rst:67 msgid "" "Openai client request with the same function as before, excluding launch " "model. More details refer to: https://platform.openai.com/docs/api-" "reference/chat?lang=python" msgstr "" "使用 Openai 发送请求时,除了创建模型,其余的请求都保持与 Openai 的接口兼容。Openai 使用方式可以参考 " "https://platform.openai.com/docs/api-reference/chat?lang=python" #: ../../source/user_guide/client_api.rst:89 msgid "OpenAI Client Tool Calls" msgstr "OpenAI 工具调用" #: ../../source/user_guide/client_api.rst:134 #: ../../source/user_guide/client_api.rst:197 #: ../../source/user_guide/client_api.rst:229 #: ../../source/user_guide/client_api.rst:269 #: ../../source/user_guide/client_api.rst:293 #: ../../source/user_guide/client_api.rst:337 #: ../../source/user_guide/client_api.rst:362 #: ../../source/user_guide/client_api.rst:391 msgid "Output:" msgstr "输出:" #: ../../source/user_guide/client_api.rst:142 #, fuzzy msgid "Anthropic Client" msgstr "Anthropic Client" #: ../../source/user_guide/client_api.rst:144 #, fuzzy msgid "Anthropic API's access address is: /anthropic/v1/messages" msgstr "Anthropic的API访问地址为:/anthropic/v1/messages" #:
../../source/user_guide/client_api.rst:165 msgid "Embedding" msgstr "Embedding" #: ../../source/user_guide/client_api.rst:167 msgid "To list the available built-in embedding models:" msgstr "列出所有内置支持的 embedding 模型:" #: ../../source/user_guide/client_api.rst:180 msgid "To launch an embedding model and embed text:" msgstr "拉起 embedding 模型并使用文本向量化:" #: ../../source/user_guide/client_api.rst:217 msgid "" "Openai client request with the same function as before, excluding launch " "model. More details refer to: https://platform.openai.com/docs/api-" "reference/embeddings?lang=python" msgstr "" "使用 Openai 发送请求时,除了创建模型,其余的请求都保持与 Openai 的接口兼容。Openai 使用方式可以参考 " "https://platform.openai.com/docs/api-reference/embeddings?lang=python" #: ../../source/user_guide/client_api.rst:236 msgid "Image" msgstr "图片" #: ../../source/user_guide/client_api.rst:238 #: ../../source/user_guide/client_api.rst:303 msgid "To list the available built-in image models:" msgstr "列出所有内置的文生图模型:" #: ../../source/user_guide/client_api.rst:251 msgid "To initiate an image model and generate an image using a text prompt:" msgstr "初始化一个文生图模型并通过提示词生成图片:" #: ../../source/user_guide/client_api.rst:280 msgid "" "Openai client request with the same function as before, excluding launch " "model. More details refer to: https://platform.openai.com/docs/api-" "reference/images/create?lang=python" msgstr "" "使用 Openai 发送请求时,除了创建模型,其余的请求都保持与 Openai 的接口兼容。Openai 使用方式可以参考 " "https://platform.openai.com/docs/api-reference/images/create?lang=python" #: ../../source/user_guide/client_api.rst:301 msgid "Audio" msgstr "" #: ../../source/user_guide/client_api.rst:320 msgid "To initiate an audio model and get text from an audio:" msgstr "初始化一个语音模型并通过语音生成文字:" #: ../../source/user_guide/client_api.rst:349 msgid "" "Openai client request with the same function as before. 
More details " "refer to: https://platform.openai.com/docs/api-" "reference/audio/createTranscription" msgstr "" "使用 Openai 发送请求时,除了创建模型,其余的请求都保持与 Openai 的接口兼容。Openai 使用方式可以参考 " "https://platform.openai.com/docs/api-reference/audio/createTranscription" #: ../../source/user_guide/client_api.rst:370 msgid "Rerank" msgstr "Rerank" #: ../../source/user_guide/client_api.rst:371 msgid "To launch a rerank model and compute the similarity scores:" msgstr "拉起 rerank 模型并计算文本相似度:" #~ msgid "" #~ "Anthropic client request with the same" #~ " function as before, excluding launch " #~ "model." #~ msgstr "使用 Anthropic 发送请求时,除了创建模型,其余的请求都保持与 Anthropic 的接口兼容。" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/user_guide/continuous_batching.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2024. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2025-05-27 19:02+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.11.0\n" #: ../../source/user_guide/continuous_batching.rst:5 msgid "Continuous Batching" msgstr "连续批处理" #: ../../source/user_guide/continuous_batching.rst:7 msgid "" "Continuous batching, as a means to improve throughput during model " "serving, has already been implemented in inference engines like ``VLLM``." " Xinference aims to provide this optimization capability when using the " "transformers engine as well."
msgstr "" "连续批处理是诸如 ``VLLM`` 这样的推理引擎中提升吞吐的重要技术。Xinference " "旨在通过这项技术提升 ``transformers`` 推理引擎的吞吐。" #: ../../source/user_guide/continuous_batching.rst:11 msgid "Usage" msgstr "使用方式" #: ../../source/user_guide/continuous_batching.rst:14 msgid "LLM" msgstr "大语言模型" #: ../../source/user_guide/continuous_batching.rst:15 msgid "Currently, this feature can be enabled under the following conditions:" msgstr "当前,此功能在满足以下条件时开启:" #: ../../source/user_guide/continuous_batching.rst:17 msgid "" "First, set the environment variable " "``XINFERENCE_TRANSFORMERS_ENABLE_BATCHING`` to ``1`` when starting " "xinference. For example:" msgstr "" "首先,启动 Xinference 时需要将环境变量 ``XINFERENCE_TRANSFORMERS_ENABLE_" "BATCHING`` 置为 ``1`` 。" #: ../../source/user_guide/continuous_batching.rst:25 msgid "" "Since ``v0.16.0``, this feature is turned on by default and is no longer " "required to set the ``XINFERENCE_TRANSFORMERS_ENABLE_BATCHING`` " "environment variable. This environment variable has been removed." msgstr "" "自 ``v0.16.0`` 开始,此功能默认开启,不再需要设置 ``XINFERENCE_" "TRANSFORMERS_ENABLE_BATCHING`` 环境变量,且该环境变量已被移除。" #: ../../source/user_guide/continuous_batching.rst:30 msgid "" "Then, ensure that the ``transformers`` engine is selected when launching " "the model. For example:" msgstr "然后,启动 LLM 模型时选择 ``transformers`` 推理引擎。例如:" #: ../../source/user_guide/continuous_batching.rst:66 msgid "" "Once this feature is enabled, all requests for LLMs will be managed by " "continuous batching, and the average throughput of requests made to a " "single model will increase. The usage of the LLM interface remains " "exactly the same as before, with no differences." msgstr "" "一旦此功能开启,LLM 模型的所有接口将被此功能接管。所有接口的使用方式没有" "任何变化。" #: ../../source/user_guide/continuous_batching.rst:71 msgid "Image Model" msgstr "图像模型" #: ../../source/user_guide/continuous_batching.rst:72 msgid "" "Currently, for image models, only the ``text_to_image`` interface is " "supported for ``FLUX.1`` series models." 
msgstr "" "当前只有 ``FLUX.1`` 系列模型的 ``text_to_image`` (文生图)接口支持此功能" "。" #: ../../source/user_guide/continuous_batching.rst:74 msgid "" "Enabling this feature requires setting the environment variable " "``XINFERENCE_TEXT_TO_IMAGE_BATCHING_SIZE``, which indicates the ``size`` " "of the generated images." msgstr "" "图像模型开启此功能需要在启动 xinference 时指定 ``XINFERENCE_TEXT_TO_IMAGE" "_BATCHING_SIZE`` 环境变量,表示生成图片的大小。" #: ../../source/user_guide/continuous_batching.rst:76 msgid "For example, starting xinference like this:" msgstr "例如,像这样启动 xinference:" #: ../../source/user_guide/continuous_batching.rst:83 msgid "" "Then just use the ``text_to_image`` interface as before, and nothing else" " needs to be changed." msgstr "接下来正常使用 ``text_to_image`` 接口即可,其他什么都不需要改变。" #: ../../source/user_guide/continuous_batching.rst:86 msgid "Abort your request" msgstr "中止请求" #: ../../source/user_guide/continuous_batching.rst:87 msgid "In this mode, you can abort requests that are in the process of inference." msgstr "此功能中,你可以优雅地中止正在推理中的请求。" #: ../../source/user_guide/continuous_batching.rst:89 msgid "First, add ``request_id`` option in ``generate_config``. For example:" msgstr "首先,在推理请求的 ``generate_config`` 中指定 ``request_id`` 选项。例如:" #: ../../source/user_guide/continuous_batching.rst:98 msgid "" "Then, abort the request using the ``request_id`` you have set. For " "example:" msgstr "接着,带着你指定的 ``request_id`` 去中止该请求。例如:" #: ../../source/user_guide/continuous_batching.rst:106 msgid "" "Note that if your request has already finished, aborting the request will" " be a no-op. Image models also support this feature." msgstr "注意,如果你的请求已经结束,那么此操作将什么都不做。" #: ../../source/user_guide/continuous_batching.rst:110 msgid "Note" msgstr "注意事项" #: ../../source/user_guide/continuous_batching.rst:112 msgid "" "Currently, for ``LLM`` models, this feature only supports the " "``generate``, ``chat``, ``tool call`` and ``vision`` tasks." 
msgstr "" "当前,此功能仅支持 LLM 模型的 ``generate``, ``chat``, ``tool call`` (" "工具调用)和 ``vision`` (多模态) 功能。" #: ../../source/user_guide/continuous_batching.rst:114 msgid "" "Currently, for ``image`` models, this feature only supports the " "``text_to_image`` tasks. Only ``FLUX.1`` series models are supported." msgstr "" "当前,对于图像模型,仅支持 ``FLUX.1`` 系列模型的 ``text_to_image`` (文生" "图)功能。" #: ../../source/user_guide/continuous_batching.rst:116 msgid "" "For ``vision`` tasks, currently only ``qwen2-vl-instruct``, ``qwen2.5-vl-" "instruct``, ``QvQ-72B-Preview``, ``glm-4v`` and ``MiniCPM-V-2.6`` (only " "for image tasks) models are supported. More models will be supported in " "the future. Please let us know your requirements." msgstr "对于多模态任务,当前支持 ``qwen2-vl-instruct``,``qwen2.5-vl-instruct``,``QvQ-72B-Preview``,``glm-4v`` " "和 ``MiniCPM-V-2.6``。未来将加入更多模型,敬请期待。" #: ../../source/user_guide/continuous_batching.rst:118 msgid "" "If using GPU inference, this method will consume more GPU memory. Please " "be cautious when increasing the number of concurrent requests to the same" " model. The ``launch_model`` interface provides the ``max_num_seqs`` " "parameter to adjust the concurrency level, with a default value of " "``16``." msgstr "" "如果使用 GPU 推理,此功能对显存要求较高。因此请谨慎提高对同一个模型的并发" "请求量。``launch_model`` 接口提供可选参数 ``max_num_seqs`` 用于调整并发度" ",默认值为 ``16`` 。" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/user_guide/distributed_inference.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2025.
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2026-01-29 11:03+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.17.0\n" #: ../../source/user_guide/distributed_inference.rst:5 msgid "Distributed Inference" msgstr "分布式推理" #: ../../source/user_guide/distributed_inference.rst:6 msgid "" "Some language models including **DeepSeek V3**, **DeepSeek R1**, etc are " "too large to fit into GPus on a single machine, Xinference supported " "running these models across multiple machines." msgstr "" "一些语言模型,包括 **DeepSeek V3**、**DeepSeek R1** 等,体积过大,无法适配单台机器上的 " "GPU,Xinference 支持在多台机器上运行这些模型。" #: ../../source/user_guide/distributed_inference.rst:13 msgid "Supported Engines" msgstr "支持的引擎" #: ../../source/user_guide/distributed_inference.rst:14 msgid "Now, Xinference supported below engines to run models across workers." msgstr "现在,Xinference 支持如下引擎在多台 worker 上运行模型。" #: ../../source/user_guide/distributed_inference.rst:16 msgid ":ref:`SGLang ` (supported in v1.3.0)" msgstr ":ref:`SGLang ` (在 v1.3.0 中支持)" #: ../../source/user_guide/distributed_inference.rst:17 msgid ":ref:`vLLM ` (supported in v1.4.1)" msgstr ":ref:`vLLM ` (在 v1.4.1 中支持)" #: ../../source/user_guide/distributed_inference.rst:18 msgid "" ":ref:`MLX ` (supported in v1.7.1), MLX distributed currently" " does not support all models. The following model types are supported at " "this time. If you have additional requirements, feel free to submit a " "GitHub issue at `https://github.com/xorbitsai/inference/issues " "`_ to request support." 
msgstr "" ":ref:`MLX ` (自 v1.7.1 " "起支持)目前在分布式模式下并不支持所有模型。目前支持以下几种模型类型。如果你有其他需求,欢迎在 " "`https://github.com/xorbitsai/inference/issues " "`_ 提交 GitHub issue 来请求支持。" #: ../../source/user_guide/distributed_inference.rst:22 msgid "DeepSeek v3 and R1" msgstr "DeepSeek v3 和 R1" #: ../../source/user_guide/distributed_inference.rst:23 msgid "Qwen2.5-instruct and the models have the same model architectures." msgstr "Qwen2.5-instruct 及其他具有相同模型架构的模型。" #: ../../source/user_guide/distributed_inference.rst:24 msgid "Qwen3 and the models have the same model architectures." msgstr "Qwen3 及其他具有相同模型架构的模型。" #: ../../source/user_guide/distributed_inference.rst:25 msgid "Qwen3-moe and the models have the same model architectures." msgstr "Qwen3-moe 及其他具有相同模型架构的模型。" #: ../../source/user_guide/distributed_inference.rst:30 msgid "Usage" msgstr "使用" #: ../../source/user_guide/distributed_inference.rst:31 msgid "" "First you need at least 2 workers to support distributed inference. Refer" " to :ref:`running Xinference in cluster ` to" " create a Xinference cluster including supervisor and workers." msgstr "" "首先,您需要至少 2 个工作节点来支持分布式推理。请参考 :ref:`在集群中运行 Xinference " "` 以创建包含 supervisor 节点和 worker 节点的 Xinference" " 集群。" #: ../../source/user_guide/distributed_inference.rst:35 msgid "" "vLLM (v0.11.0+) note: Starting from vLLM v0.11.0, distributed deployment " "with vLLM requires Xinference >= v1.17.1. In addition to setting " "``--n-worker`` as before, you must also set ``tensor_parallel_size`` (set" " it to the **GPU count**) and ``pipeline_parallel_size=1`` when launching" " the model." 
msgstr "" "vLLM(v0.11.0+)注意事项:从vLLM v0.11.0版本开始,使用vLLM进行分布式部署需要Xinference >= " "v1.17.1版本。除原有的 ``--n-worker`` 参数设置外,启动模型时还必须同时设置 " "``tensor_parallel_size`` (将其设置为 **GPU数量** ) 和 ``pipeline_parallel_size=1`` 参数。" #: ../../source/user_guide/distributed_inference.rst:40 msgid "" "Then if are using web UI, choose expected machines for ``worker count`` " "in the optional configurations, if you are using command line, add " "``--n-worker `` when launching a model. The model will be" " launched across multiple workers accordingly." msgstr "" "然后,如果您使用的是 Web UI,请在可选配置中选择期望的机器数量作为 ``worker count``;如果您使用的是命令行,启动模型时请添加" " ``--n-worker <机器数量>``。模型将相应地在多个工作节点上启动。" #: ../../source/user_guide/distributed_inference.rst:48 msgid "" "``GPU count`` on web UI, or ``--n-gpu`` for command line now mean GPUs " "count per worker if you are using distributed inference." msgstr "使用分布式推理时,在 Web UI 中的 ``GPU count`` 或命令行中的 ``--n-gpu`` 现在表示每个工作节点的 GPU 数量。" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/user_guide/index.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2023. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2023-10-16 10:33+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.12.1\n" #: ../../source/user_guide/index.rst:5 msgid "User Guide" msgstr "用户指南" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/user_guide/launch.po ================================================ # SOME DESCRIPTIVE TITLE. # Copyright (C) 2025, Xorbits Inc. 
# This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2025. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2026-01-28 11:54+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" "Language-Team: zh_CN \n" "Plural-Forms: nplurals=1; plural=0;\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.17.0\n" #: ../../source/user_guide/launch.rst:5 msgid "Model Launching Instructions" msgstr "模型加载指南" #: ../../source/user_guide/launch.rst:7 msgid "This document aims to provide a functional overview of model launching." msgstr "本文档旨在提供模型加载的功能说明。" #: ../../source/user_guide/launch.rst:10 msgid "Replica" msgstr "副本" #: ../../source/user_guide/launch.rst:12 msgid "" "Replicas specify the number of model instances to load. For example, if " "you have two GPUs and each can host one replica of the model, you can set" " the replica count to 2. This way, two identical instances of the model " "will be distributed across the two GPUs. Xinference automatically load-" "balances requests to ensure even distribution across multiple GPUs. " "Meanwhile, users see it as a single model, which greatly improves overall" " resource utilization." msgstr "" "副本用来指定模型加载的实例份数。比如,你有两张 GPU,每张卡可以放下模型的一个副本,你可以设置副本数为 " "2。这样,两个完全相同的模型实例将分布在这两张 GPU 上。Xinference " "会自动进行负载均衡,确保请求均匀分配到多张卡上。用户看到的仍是一个模型,这大大提升了整体资源利用率。" #: ../../source/user_guide/launch.rst:17 msgid "Traditional Multi-Instance Deployment:" msgstr "旧版本多实例部署:" #: ../../source/user_guide/launch.rst:19 msgid "" "When you have multiple GPU cards, each capable of hosting one model " "instance, you can set the number of instances equal to the number of " "GPUs. 
For example:" msgstr "当您拥有多张GPU显卡时,每张显卡可承载一个模型实例,此时可将实例数量设置为等于GPU数量。例如:" #: ../../source/user_guide/launch.rst:21 msgid "2 GPUs, 2 instances: Each GPU runs one model instance" msgstr "2张GPU,2个实例:每张GPU运行一个模型实例" #: ../../source/user_guide/launch.rst:22 msgid "4 GPUs, 4 instances: Each GPU runs one model instance" msgstr "4张GPU,4个实例:每张GPU运行一个模型实例" #: ../../source/user_guide/launch.rst:26 msgid "Introduce a new environment variable:" msgstr "引入一个新的环境变量:" #: ../../source/user_guide/launch.rst:32 msgid "" "Control whether to enable the single GPU multi-copy feature Default " "value: 1" msgstr "控制是否启用单GPU多副本功能,默认值:1" #: ../../source/user_guide/launch.rst:35 msgid "New Feature: Smart Replica Deployment" msgstr "新功能:智能副本部署" #: ../../source/user_guide/launch.rst:37 msgid "Single GPU Multi-Replica" msgstr "单GPU多副本" #: ../../source/user_guide/launch.rst:39 msgid "New Support: Run multiple model replicas even with just one GPU." msgstr "新增支持:即使仅有一块GPU,也能运行多个模型副本。" #: ../../source/user_guide/launch.rst:41 msgid "Scenario: You have 1 GPU with sufficient VRAM" msgstr "场景:您拥有1个GPU且显存充足" #: ../../source/user_guide/launch.rst:42 msgid "Configuration: Replica Count = 3, GPU Count = 1" msgstr "配置:副本数量=3,GPU数量=1" #: ../../source/user_guide/launch.rst:43 msgid "Result: 3 model instances running on the same GPU, sharing GPU resources" msgstr "结果:3个模型实例,在同一GPU上运行,共享GPU资源" #: ../../source/user_guide/launch.rst:45 msgid "Hybrid GPU Allocation" msgstr "混合GPU分配" #: ../../source/user_guide/launch.rst:47 msgid "" "Smart Allocation: Number of replicas may differ from GPU count; system " "intelligently distributes" msgstr "智能分配: 副本数可以不等于GPU数量,系统会智能分配" #: ../../source/user_guide/launch.rst:49 msgid "Scenario: You have 2 GPUs and need 3 replicas" msgstr "场景: 你有2张GPU,需要3个副本" #: ../../source/user_guide/launch.rst:50 msgid "Configuration: Replicas=3, GPUs=2" msgstr "配置: 副本数=3,GPU数量=2" #: ../../source/user_guide/launch.rst:51 msgid "Result: GPU0 runs 2 instances, GPU1 runs 1 instance" msgstr "结果: 
GPU0运行2个实例,GPU1运行1个实例" #: ../../source/user_guide/launch.rst:54 msgid "GPU Allocation Strategy" msgstr "GPU 分配策略" #: ../../source/user_guide/launch.rst:56 msgid "" "The current policy is *Idle First*: The scheduler always attempts to " "assign replicas to the least utilized GPU. Use the " "``XINFERENCE_LAUNCH_STRATEGY`` parameter to choose launch strategy." msgstr "" "当前策略为 *空闲优先* :调度器始终尝试将副本分配至最空闲的GPU。使用 ``XINFERENCE_LAUNCH_STRATEGY`` " "参数选择启动策略。" #: ../../source/user_guide/launch.rst:59 msgid "Set Environment Variables" msgstr "设置环境变量" #: ../../source/user_guide/launch.rst:63 msgid "" "Sometimes, we want to specify environment variables for a particular " "model at runtime. Since v1.8.1, Xinference provides the capability to " "configure these individually without needing to set them before starting " "Xinference." msgstr "" "有时我们希望在运行时为特定模型指定环境变量。从 v1.8.1 开始,Xinference 提供了单独配置环境变量的功能,无需在启动 " "Xinference 前设置。" #: ../../source/user_guide/launch.rst:66 msgid "For Web UI." msgstr "针对 Web UI。" #: ../../source/user_guide/launch.rst:72 msgid "" "When using the command line, use ``--env`` to specify an environment " "variable." msgstr "命令行使用时,使用 ``--env`` 指定环境变量。" #: ../../source/user_guide/launch.rst:74 ../../source/user_guide/launch.rst:123 msgid "Example usage:" msgstr "示例用法:" #: ../../source/user_guide/launch.rst:80 msgid "" "Take vLLM as an example: it has versions V1 and V0, and by default, it " "automatically determines which version to use. If you want to force the " "use of V0 by setting ``VLLM_USE_V1=0`` when launching a model, you can " "specify this during model launching." msgstr "" "以 vLLM 为例,它有 V1 和 V0 两个版本,默认会自动判定使用哪个版本。如果想在加载模型时强制通过设置 ``VLLM_USE_V1=0``" " 来使用 V0,可以指定该环境变量。" #: ../../source/user_guide/launch.rst:84 msgid "Configuring Model Virtual Environment" msgstr "配置模型虚拟空间" #: ../../source/user_guide/launch.rst:88 msgid "" "For this part, please refer to :ref:`toggling virtual environments and " "customizing dependencies `."
msgstr "对于这部分,请参考 :ref:`开关虚拟空间和定制依赖 `。" #: ../../source/user_guide/launch.rst:91 msgid "Batching / Continuous Batching" msgstr "批处理 / 连续批处理" #: ../../source/user_guide/launch.rst:93 msgid "" "Xinference supports batching for higher throughput. For LLMs on the " "``transformers`` engine, continuous batching is available and can be " "enabled via environment variables at launch time." msgstr "" "Xinference支持批处理以提升吞吐量。对于基于 ``transformers`` " "引擎的大型语言模型,可启用连续批处理功能,该功能可在启动时通过环境变量进行配置。" #: ../../source/user_guide/launch.rst:96 msgid "Key settings:" msgstr "关键设置:" #: ../../source/user_guide/launch.rst:98 msgid "" "``XINFERENCE_BATCH_SIZE`` and ``XINFERENCE_BATCH_INTERVAL`` for general " "batching behavior." msgstr " ``XINFERENCE_BATCH_SIZE`` 和 ``XINFERENCE_BATCH_INTERVAL`` 用于控制常规的批处理行为。" #: ../../source/user_guide/launch.rst:99 msgid "" "``XINFERENCE_TEXT_TO_IMAGE_BATCHING_SIZE`` for text-to-image models (when" " supported)." msgstr " ``XINFERENCE_TEXT_TO_IMAGE_BATCHING_SIZE``(文本转图像模型,当支持时)。" #: ../../source/user_guide/launch.rst:101 msgid "Example (LLM, transformers):" msgstr "示例(大型语言模型,Transformers):" #: ../../source/user_guide/launch.rst:108 msgid "Example (text-to-image):" msgstr "示例(文生图):" #: ../../source/user_guide/launch.rst:114 msgid "" "For detailed behavior, supported models, and aborting requests, see " ":ref:`Continuous Batching `." msgstr "" "有关详细行为、支持的模型以及中止请求的信息,请参阅" " :ref:`连续批处理 ` 。" #: ../../source/user_guide/launch.rst:118 msgid "Thinking Mode" msgstr "思考模式" #: ../../source/user_guide/launch.rst:120 msgid "" "Some hybrid reasoning models (for example, Qwen3) support an optional " "*thinking mode*. You can enable this at launch time via ``--enable-" "thinking``." msgstr "某些混合推理模型(例如Qwen3)支持可选的 *思考模式* 。您可在启动时通过 ``--enable-thinking`` 参数启用该功能。" ================================================ FILE: doc/source/locale/zh_CN/LC_MESSAGES/user_guide/vllm_enhancement.po ================================================ # SOME DESCRIPTIVE TITLE. 
# Copyright (C) 2023, Xorbits Inc. # This file is distributed under the same license as the Xinference package. # FIRST AUTHOR , 2025. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2026-01-15 13:56+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" "Generated-By: Babel 2.14.0\n" #: ../../source/user_guide/vllm_enhancement.rst:5 msgid "Xavier: Share KV Cache between vllm replicas" msgstr "Xavier: 多VLLM副本间共享KV Cache" #: ../../source/user_guide/vllm_enhancement.rst:6 msgid "" "For scenarios such as long document queries and multi-round " "conversations, the computation during the inference prefill phase can be " "particularly heavy, which affects overall throughput and the latency of " "individual inferences. Xinference enhances the vllm engine by introducing" " the ``Xavier`` framework, enabling KV cache sharing across multiple vllm" " instances. This allows KV cache computed by other replicas to be " "directly reused, avoiding redundant computations." msgstr "" "对于长文档查询和多轮对话等场景,在推理预填充阶段的计算可能特别繁重,这会" "影响整体吞吐量和单次推理的延迟。Xinference 通过引入 ``Xavier`` 框架来增强" " vllm 引擎,支持在多个 vllm 实例之间共享 KV 缓存。这使得其他副本计算出的 " "KV 缓存可以被直接重用,从而避免了冗余计算。" #: ../../source/user_guide/vllm_enhancement.rst:15 msgid "Usage" msgstr "使用" #: ../../source/user_guide/vllm_enhancement.rst:16 msgid "" "Simply add the parameter ``enable_xavier=True`` when starting the vllm " "model." msgstr "" "启动 vllm 模型时设置选项 ``enable_xavier=True`` 即可。" #: ../../source/user_guide/vllm_enhancement.rst:20 msgid "Limitations" msgstr "限制" #: ../../source/user_guide/vllm_enhancement.rst:21 msgid "" "Xavier requires vllm version >= ``0.7.0``, and currently not supports for" " vllm version >= ``0.11.0`` due to vllm reconstruction." 
msgstr "" "Xavier 要求 vllm 版本不低于 ``0.7.0`` 。由于 vllm 重构,暂不支持 ``0.11.0`` 及以上版本的 vllm。" #: ../../source/user_guide/vllm_enhancement.rst:22 msgid "" "Due to the underlying communication not recognizing ``0.0.0.0``, the " "actual IP address needs to be passed when starting Xinference, for " "example: ``xinference-local -H 192.168.xx.xx``." msgstr "" "由于底层通信无法识别 ``0.0.0.0`` 地址,启动 xinference 时需要配置实际的 " "IP 地址,例如:``xinference-local -H 192.168.xx.xx`` 。" #: ../../source/user_guide/vllm_enhancement.rst:23 msgid "Xavier only works for Nvidia product." msgstr "Xavier 只支持 Nvidia 显卡。" ================================================ FILE: doc/source/models/builtin/audio/belle-distilwhisper-large-v2-zh.rst ================================================ .. _models_builtin_belle-distilwhisper-large-v2-zh: =============================== Belle-distilwhisper-large-v2-zh =============================== - **Model Name:** Belle-distilwhisper-large-v2-zh - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** BELLE-2/Belle-distilwhisper-large-v2-zh Execute the following command to launch the model:: xinference launch --model-name Belle-distilwhisper-large-v2-zh --model-type audio ================================================ FILE: doc/source/models/builtin/audio/belle-whisper-large-v2-zh.rst ================================================ ..
_models_builtin_belle-whisper-large-v2-zh: ========================= Belle-whisper-large-v2-zh ========================= - **Model Name:** Belle-whisper-large-v2-zh - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** BELLE-2/Belle-whisper-large-v2-zh Execute the following command to launch the model:: xinference launch --model-name Belle-whisper-large-v2-zh --model-type audio ================================================ FILE: doc/source/models/builtin/audio/belle-whisper-large-v3-zh.rst ================================================ .. _models_builtin_belle-whisper-large-v3-zh: ========================= Belle-whisper-large-v3-zh ========================= - **Model Name:** Belle-whisper-large-v3-zh - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** BELLE-2/Belle-whisper-large-v3-zh Execute the following command to launch the model:: xinference launch --model-name Belle-whisper-large-v3-zh --model-type audio ================================================ FILE: doc/source/models/builtin/audio/chattts.rst ================================================ .. _models_builtin_chattts: ======= ChatTTS ======= - **Model Name:** ChatTTS - **Model Family:** ChatTTS - **Abilities:** ['text2audio', 'text2audio_zero_shot'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** 2Noise/ChatTTS Execute the following command to launch the model:: xinference launch --model-name ChatTTS --model-type audio ================================================ FILE: doc/source/models/builtin/audio/cosyvoice-300m-instruct.rst ================================================ .. 
_models_builtin_cosyvoice-300m-instruct: ======================= CosyVoice-300M-Instruct ======================= - **Model Name:** CosyVoice-300M-Instruct - **Model Family:** CosyVoice - **Abilities:** ['text2audio', 'text2audio_zero_shot'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** FunAudioLLM/CosyVoice-300M-Instruct Execute the following command to launch the model:: xinference launch --model-name CosyVoice-300M-Instruct --model-type audio ================================================ FILE: doc/source/models/builtin/audio/cosyvoice-300m-sft.rst ================================================ .. _models_builtin_cosyvoice-300m-sft: ================== CosyVoice-300M-SFT ================== - **Model Name:** CosyVoice-300M-SFT - **Model Family:** CosyVoice - **Abilities:** ['text2audio', 'text2audio_zero_shot'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** FunAudioLLM/CosyVoice-300M-SFT Execute the following command to launch the model:: xinference launch --model-name CosyVoice-300M-SFT --model-type audio ================================================ FILE: doc/source/models/builtin/audio/cosyvoice-300m.rst ================================================ .. _models_builtin_cosyvoice-300m: ============== CosyVoice-300M ============== - **Model Name:** CosyVoice-300M - **Model Family:** CosyVoice - **Abilities:** ['text2audio', 'text2audio_voice_cloning'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** FunAudioLLM/CosyVoice-300M Execute the following command to launch the model:: xinference launch --model-name CosyVoice-300M --model-type audio ================================================ FILE: doc/source/models/builtin/audio/cosyvoice2-0.5b.rst ================================================ .. 
_models_builtin_cosyvoice2-0.5b: =============== CosyVoice2-0.5B =============== - **Model Name:** CosyVoice2-0.5B - **Model Family:** CosyVoice - **Abilities:** ['text2audio', 'text2audio_zero_shot', 'text2audio_voice_cloning'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** JunHowie/CosyVoice2-0.5B Execute the following command to launch the model:: xinference launch --model-name CosyVoice2-0.5B --model-type audio ================================================ FILE: doc/source/models/builtin/audio/f5-tts-mlx.rst ================================================ .. _models_builtin_f5-tts-mlx: ========== F5-TTS-MLX ========== - **Model Name:** F5-TTS-MLX - **Model Family:** F5-TTS-MLX - **Abilities:** ['text2audio', 'text2audio_zero_shot', 'text2audio_voice_cloning'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** lucasnewman/f5-tts-mlx Execute the following command to launch the model:: xinference launch --model-name F5-TTS-MLX --model-type audio ================================================ FILE: doc/source/models/builtin/audio/f5-tts.rst ================================================ .. _models_builtin_f5-tts: ====== F5-TTS ====== - **Model Name:** F5-TTS - **Model Family:** F5-TTS - **Abilities:** ['text2audio', 'text2audio_zero_shot', 'text2audio_voice_cloning'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** SWivid/F5-TTS Execute the following command to launch the model:: xinference launch --model-name F5-TTS --model-type audio ================================================ FILE: doc/source/models/builtin/audio/fishspeech-1.5.rst ================================================ .. 
_models_builtin_fishspeech-1.5: ============== FishSpeech-1.5 ============== - **Model Name:** FishSpeech-1.5 - **Model Family:** FishAudio - **Abilities:** ['text2audio', 'text2audio_zero_shot', 'text2audio_voice_cloning'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** fishaudio/fish-speech-1.5 Execute the following command to launch the model:: xinference launch --model-name FishSpeech-1.5 --model-type audio ================================================ FILE: doc/source/models/builtin/audio/fun-asr-mlt-nano-2512.rst ================================================ .. _models_builtin_fun-asr-mlt-nano-2512: ===================== Fun-ASR-MLT-Nano-2512 ===================== - **Model Name:** Fun-ASR-MLT-Nano-2512 - **Model Family:** funasr - **Abilities:** ['audio2text'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** FunAudioLLM/Fun-ASR-MLT-Nano-2512 Execute the following command to launch the model:: xinference launch --model-name Fun-ASR-MLT-Nano-2512 --model-type audio ================================================ FILE: doc/source/models/builtin/audio/fun-asr-nano-2512.rst ================================================ .. _models_builtin_fun-asr-nano-2512: ================= Fun-ASR-Nano-2512 ================= - **Model Name:** Fun-ASR-Nano-2512 - **Model Family:** funasr - **Abilities:** ['audio2text'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** FunAudioLLM/Fun-ASR-Nano-2512 Execute the following command to launch the model:: xinference launch --model-name Fun-ASR-Nano-2512 --model-type audio ================================================ FILE: doc/source/models/builtin/audio/index.rst ================================================ .. _models_audio_index: ================ Audio Models ================ The following is a list of built-in audio models in Xinference: .. 
toctree:: :maxdepth: 1 belle-distilwhisper-large-v2-zh belle-whisper-large-v2-zh belle-whisper-large-v3-zh chattts cosyvoice-300m cosyvoice-300m-instruct cosyvoice-300m-sft cosyvoice2-0.5b f5-tts f5-tts-mlx fishspeech-1.5 fun-asr-mlt-nano-2512 fun-asr-nano-2512 indextts2 kokoro-82m kokoro-82m-mlx kokoro-82m-v1.1-zh megatts3 melotts-chinese melotts-english melotts-english-v2 melotts-english-v3 melotts-french melotts-japanese melotts-korean melotts-spanish paraformer-zh paraformer-zh-hotword paraformer-zh-long paraformer-zh-spk qwen3-asr-0.6b qwen3-asr-1.7b seaco-paraformer-zh sensevoicesmall whisper-base whisper-base-mlx whisper-base.en whisper-base.en-mlx whisper-large-v3 whisper-large-v3-mlx whisper-large-v3-turbo whisper-large-v3-turbo-mlx whisper-medium whisper-medium-mlx whisper-medium.en whisper-medium.en-mlx whisper-small whisper-small-mlx whisper-small.en whisper-small.en-mlx whisper-tiny whisper-tiny-mlx whisper-tiny.en whisper-tiny.en-mlx ================================================ FILE: doc/source/models/builtin/audio/indextts2.rst ================================================ .. _models_builtin_indextts2: ========= IndexTTS2 ========= - **Model Name:** IndexTTS2 - **Model Family:** IndexTTS2 - **Abilities:** ['text2audio', 'text2audio_zero_shot', 'text2audio_voice_cloning', 'text2audio_emotion_control'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** IndexTeam/IndexTTS-2 Execute the following command to launch the model:: xinference launch --model-name IndexTTS2 --model-type audio ================================================ FILE: doc/source/models/builtin/audio/kokoro-82m-mlx.rst ================================================ .. 
_models_builtin_kokoro-82m-mlx: ============== Kokoro-82M-MLX ============== - **Model Name:** Kokoro-82M-MLX - **Model Family:** Kokoro-MLX - **Abilities:** ['text2audio', 'text2audio_zero_shot'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** prince-canuma/Kokoro-82M Execute the following command to launch the model:: xinference launch --model-name Kokoro-82M-MLX --model-type audio ================================================ FILE: doc/source/models/builtin/audio/kokoro-82m-v1.1-zh.rst ================================================ .. _models_builtin_kokoro-82m-v1.1-zh: ================== Kokoro-82M-v1.1-zh ================== - **Model Name:** Kokoro-82M-v1.1-zh - **Model Family:** Kokoro-zh - **Abilities:** ['text2audio', 'text2audio_zero_shot'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** hexgrad/Kokoro-82M-v1.1-zh Execute the following command to launch the model:: xinference launch --model-name Kokoro-82M-v1.1-zh --model-type audio ================================================ FILE: doc/source/models/builtin/audio/kokoro-82m.rst ================================================ .. _models_builtin_kokoro-82m: ========== Kokoro-82M ========== - **Model Name:** Kokoro-82M - **Model Family:** Kokoro - **Abilities:** ['text2audio', 'text2audio_zero_shot'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** hexgrad/Kokoro-82M Execute the following command to launch the model:: xinference launch --model-name Kokoro-82M --model-type audio ================================================ FILE: doc/source/models/builtin/audio/megatts3.rst ================================================ .. 
_models_builtin_megatts3: ======== MegaTTS3 ======== - **Model Name:** MegaTTS3 - **Model Family:** MegaTTS - **Abilities:** ['text2audio', 'text2audio_zero_shot'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** ByteDance/MegaTTS3 Execute the following command to launch the model:: xinference launch --model-name MegaTTS3 --model-type audio ================================================ FILE: doc/source/models/builtin/audio/melotts-chinese.rst ================================================ .. _models_builtin_melotts-chinese: =============== MeloTTS-Chinese =============== - **Model Name:** MeloTTS-Chinese - **Model Family:** MeloTTS - **Abilities:** ['text2audio', 'text2audio_zero_shot'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** myshell-ai/MeloTTS-Chinese Execute the following command to launch the model:: xinference launch --model-name MeloTTS-Chinese --model-type audio ================================================ FILE: doc/source/models/builtin/audio/melotts-english-v2.rst ================================================ .. _models_builtin_melotts-english-v2: ================== MeloTTS-English-v2 ================== - **Model Name:** MeloTTS-English-v2 - **Model Family:** MeloTTS - **Abilities:** ['text2audio', 'text2audio_zero_shot'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** myshell-ai/MeloTTS-English-v2 Execute the following command to launch the model:: xinference launch --model-name MeloTTS-English-v2 --model-type audio ================================================ FILE: doc/source/models/builtin/audio/melotts-english-v3.rst ================================================ .. 
_models_builtin_melotts-english-v3: ================== MeloTTS-English-v3 ================== - **Model Name:** MeloTTS-English-v3 - **Model Family:** MeloTTS - **Abilities:** ['text2audio', 'text2audio_zero_shot'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** myshell-ai/MeloTTS-English-v3 Execute the following command to launch the model:: xinference launch --model-name MeloTTS-English-v3 --model-type audio ================================================ FILE: doc/source/models/builtin/audio/melotts-english.rst ================================================ .. _models_builtin_melotts-english: =============== MeloTTS-English =============== - **Model Name:** MeloTTS-English - **Model Family:** MeloTTS - **Abilities:** ['text2audio', 'text2audio_zero_shot'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** myshell-ai/MeloTTS-English Execute the following command to launch the model:: xinference launch --model-name MeloTTS-English --model-type audio ================================================ FILE: doc/source/models/builtin/audio/melotts-french.rst ================================================ .. _models_builtin_melotts-french: ============== MeloTTS-French ============== - **Model Name:** MeloTTS-French - **Model Family:** MeloTTS - **Abilities:** ['text2audio', 'text2audio_zero_shot'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** myshell-ai/MeloTTS-French Execute the following command to launch the model:: xinference launch --model-name MeloTTS-French --model-type audio ================================================ FILE: doc/source/models/builtin/audio/melotts-japanese.rst ================================================ .. 
_models_builtin_melotts-japanese: ================ MeloTTS-Japanese ================ - **Model Name:** MeloTTS-Japanese - **Model Family:** MeloTTS - **Abilities:** ['text2audio', 'text2audio_zero_shot'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** myshell-ai/MeloTTS-Japanese Execute the following command to launch the model:: xinference launch --model-name MeloTTS-Japanese --model-type audio ================================================ FILE: doc/source/models/builtin/audio/melotts-korean.rst ================================================ .. _models_builtin_melotts-korean: ============== MeloTTS-Korean ============== - **Model Name:** MeloTTS-Korean - **Model Family:** MeloTTS - **Abilities:** ['text2audio', 'text2audio_zero_shot'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** myshell-ai/MeloTTS-Korean Execute the following command to launch the model:: xinference launch --model-name MeloTTS-Korean --model-type audio ================================================ FILE: doc/source/models/builtin/audio/melotts-spanish.rst ================================================ .. _models_builtin_melotts-spanish: =============== MeloTTS-Spanish =============== - **Model Name:** MeloTTS-Spanish - **Model Family:** MeloTTS - **Abilities:** ['text2audio', 'text2audio_zero_shot'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** myshell-ai/MeloTTS-Spanish Execute the following command to launch the model:: xinference launch --model-name MeloTTS-Spanish --model-type audio ================================================ FILE: doc/source/models/builtin/audio/paraformer-zh-hotword.rst ================================================ .. 
_models_builtin_paraformer-zh-hotword: ===================== paraformer-zh-hotword ===================== - **Model Name:** paraformer-zh-hotword - **Model Family:** funasr - **Abilities:** ['audio2text'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** JunHowie/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404 Execute the following command to launch the model:: xinference launch --model-name paraformer-zh-hotword --model-type audio ================================================ FILE: doc/source/models/builtin/audio/paraformer-zh-long.rst ================================================ .. _models_builtin_paraformer-zh-long: ================== paraformer-zh-long ================== - **Model Name:** paraformer-zh-long - **Model Family:** funasr - **Abilities:** ['audio2text'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** JunHowie/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch Execute the following command to launch the model:: xinference launch --model-name paraformer-zh-long --model-type audio ================================================ FILE: doc/source/models/builtin/audio/paraformer-zh-spk.rst ================================================ .. _models_builtin_paraformer-zh-spk: ================= paraformer-zh-spk ================= - **Model Name:** paraformer-zh-spk - **Model Family:** funasr - **Abilities:** ['audio2text'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** JunHowie/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn Execute the following command to launch the model:: xinference launch --model-name paraformer-zh-spk --model-type audio ================================================ FILE: doc/source/models/builtin/audio/paraformer-zh.rst ================================================ .. 
_models_builtin_paraformer-zh: ============= paraformer-zh ============= - **Model Name:** paraformer-zh - **Model Family:** funasr - **Abilities:** ['audio2text'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** funasr/paraformer-zh Execute the following command to launch the model:: xinference launch --model-name paraformer-zh --model-type audio ================================================ FILE: doc/source/models/builtin/audio/qwen3-asr-0.6b.rst ================================================ .. _models_builtin_qwen3-asr-0.6b: ============== Qwen3-ASR-0.6B ============== - **Model Name:** Qwen3-ASR-0.6B - **Model Family:** qwen3_asr - **Abilities:** ['audio2text'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** Qwen/Qwen3-ASR-0.6B Execute the following command to launch the model:: xinference launch --model-name Qwen3-ASR-0.6B --model-type audio ================================================ FILE: doc/source/models/builtin/audio/qwen3-asr-1.7b.rst ================================================ .. _models_builtin_qwen3-asr-1.7b: ============== Qwen3-ASR-1.7B ============== - **Model Name:** Qwen3-ASR-1.7B - **Model Family:** qwen3_asr - **Abilities:** ['audio2text'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** Qwen/Qwen3-ASR-1.7B Execute the following command to launch the model:: xinference launch --model-name Qwen3-ASR-1.7B --model-type audio ================================================ FILE: doc/source/models/builtin/audio/seaco-paraformer-zh.rst ================================================ .. 
_models_builtin_seaco-paraformer-zh: =================== seaco-paraformer-zh =================== - **Model Name:** seaco-paraformer-zh - **Model Family:** funasr - **Abilities:** ['audio2text'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** JunHowie/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch Execute the following command to launch the model:: xinference launch --model-name seaco-paraformer-zh --model-type audio ================================================ FILE: doc/source/models/builtin/audio/sensevoicesmall.rst ================================================ .. _models_builtin_sensevoicesmall: =============== SenseVoiceSmall =============== - **Model Name:** SenseVoiceSmall - **Model Family:** funasr - **Abilities:** ['audio2text'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** FunAudioLLM/SenseVoiceSmall Execute the following command to launch the model:: xinference launch --model-name SenseVoiceSmall --model-type audio ================================================ FILE: doc/source/models/builtin/audio/whisper-base-mlx.rst ================================================ .. _models_builtin_whisper-base-mlx: ================ whisper-base-mlx ================ - **Model Name:** whisper-base-mlx - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** mlx-community/whisper-base-mlx Execute the following command to launch the model:: xinference launch --model-name whisper-base-mlx --model-type audio ================================================ FILE: doc/source/models/builtin/audio/whisper-base.en-mlx.rst ================================================ .. 
_models_builtin_whisper-base.en-mlx: =================== whisper-base.en-mlx =================== - **Model Name:** whisper-base.en-mlx - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** mlx-community/whisper-base.en-mlx Execute the following command to launch the model:: xinference launch --model-name whisper-base.en-mlx --model-type audio ================================================ FILE: doc/source/models/builtin/audio/whisper-base.en.rst ================================================ .. _models_builtin_whisper-base.en: =============== whisper-base.en =============== - **Model Name:** whisper-base.en - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** openai/whisper-base.en Execute the following command to launch the model:: xinference launch --model-name whisper-base.en --model-type audio ================================================ FILE: doc/source/models/builtin/audio/whisper-base.rst ================================================ .. _models_builtin_whisper-base: ============ whisper-base ============ - **Model Name:** whisper-base - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** openai/whisper-base Execute the following command to launch the model:: xinference launch --model-name whisper-base --model-type audio ================================================ FILE: doc/source/models/builtin/audio/whisper-large-v3-mlx.rst ================================================ .. 
_models_builtin_whisper-large-v3-mlx: ==================== whisper-large-v3-mlx ==================== - **Model Name:** whisper-large-v3-mlx - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** mlx-community/whisper-large-v3-mlx Execute the following command to launch the model:: xinference launch --model-name whisper-large-v3-mlx --model-type audio ================================================ FILE: doc/source/models/builtin/audio/whisper-large-v3-turbo-mlx.rst ================================================ .. _models_builtin_whisper-large-v3-turbo-mlx: ========================== whisper-large-v3-turbo-mlx ========================== - **Model Name:** whisper-large-v3-turbo-mlx - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** mlx-community/whisper-large-v3-turbo Execute the following command to launch the model:: xinference launch --model-name whisper-large-v3-turbo-mlx --model-type audio ================================================ FILE: doc/source/models/builtin/audio/whisper-large-v3-turbo.rst ================================================ .. _models_builtin_whisper-large-v3-turbo: ====================== whisper-large-v3-turbo ====================== - **Model Name:** whisper-large-v3-turbo - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** openai/whisper-large-v3-turbo Execute the following command to launch the model:: xinference launch --model-name whisper-large-v3-turbo --model-type audio ================================================ FILE: doc/source/models/builtin/audio/whisper-large-v3.rst ================================================ .. 
_models_builtin_whisper-large-v3: ================ whisper-large-v3 ================ - **Model Name:** whisper-large-v3 - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** openai/whisper-large-v3 Execute the following command to launch the model:: xinference launch --model-name whisper-large-v3 --model-type audio ================================================ FILE: doc/source/models/builtin/audio/whisper-medium-mlx.rst ================================================ .. _models_builtin_whisper-medium-mlx: ================== whisper-medium-mlx ================== - **Model Name:** whisper-medium-mlx - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** mlx-community/whisper-medium-mlx Execute the following command to launch the model:: xinference launch --model-name whisper-medium-mlx --model-type audio ================================================ FILE: doc/source/models/builtin/audio/whisper-medium.en-mlx.rst ================================================ .. _models_builtin_whisper-medium.en-mlx: ===================== whisper-medium.en-mlx ===================== - **Model Name:** whisper-medium.en-mlx - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** mlx-community/whisper-medium.en-mlx Execute the following command to launch the model:: xinference launch --model-name whisper-medium.en-mlx --model-type audio ================================================ FILE: doc/source/models/builtin/audio/whisper-medium.en.rst ================================================ .. 
_models_builtin_whisper-medium.en: ================= whisper-medium.en ================= - **Model Name:** whisper-medium.en - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** openai/whisper-medium.en Execute the following command to launch the model:: xinference launch --model-name whisper-medium.en --model-type audio ================================================ FILE: doc/source/models/builtin/audio/whisper-medium.rst ================================================ .. _models_builtin_whisper-medium: ============== whisper-medium ============== - **Model Name:** whisper-medium - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** openai/whisper-medium Execute the following command to launch the model:: xinference launch --model-name whisper-medium --model-type audio ================================================ FILE: doc/source/models/builtin/audio/whisper-small-mlx.rst ================================================ .. _models_builtin_whisper-small-mlx: ================= whisper-small-mlx ================= - **Model Name:** whisper-small-mlx - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** mlx-community/whisper-small-mlx Execute the following command to launch the model:: xinference launch --model-name whisper-small-mlx --model-type audio ================================================ FILE: doc/source/models/builtin/audio/whisper-small.en-mlx.rst ================================================ .. 
_models_builtin_whisper-small.en-mlx: ==================== whisper-small.en-mlx ==================== - **Model Name:** whisper-small.en-mlx - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** mlx-community/whisper-small.en-mlx Execute the following command to launch the model:: xinference launch --model-name whisper-small.en-mlx --model-type audio ================================================ FILE: doc/source/models/builtin/audio/whisper-small.en.rst ================================================ .. _models_builtin_whisper-small.en: ================ whisper-small.en ================ - **Model Name:** whisper-small.en - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** openai/whisper-small.en Execute the following command to launch the model:: xinference launch --model-name whisper-small.en --model-type audio ================================================ FILE: doc/source/models/builtin/audio/whisper-small.rst ================================================ .. _models_builtin_whisper-small: ============= whisper-small ============= - **Model Name:** whisper-small - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** openai/whisper-small Execute the following command to launch the model:: xinference launch --model-name whisper-small --model-type audio ================================================ FILE: doc/source/models/builtin/audio/whisper-tiny-mlx.rst ================================================ .. 
_models_builtin_whisper-tiny-mlx: ================ whisper-tiny-mlx ================ - **Model Name:** whisper-tiny-mlx - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** mlx-community/whisper-tiny Execute the following command to launch the model:: xinference launch --model-name whisper-tiny-mlx --model-type audio ================================================ FILE: doc/source/models/builtin/audio/whisper-tiny.en-mlx.rst ================================================ .. _models_builtin_whisper-tiny.en-mlx: =================== whisper-tiny.en-mlx =================== - **Model Name:** whisper-tiny.en-mlx - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** mlx-community/whisper-tiny.en-mlx Execute the following command to launch the model:: xinference launch --model-name whisper-tiny.en-mlx --model-type audio ================================================ FILE: doc/source/models/builtin/audio/whisper-tiny.en.rst ================================================ .. _models_builtin_whisper-tiny.en: =============== whisper-tiny.en =============== - **Model Name:** whisper-tiny.en - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** False Specifications ^^^^^^^^^^^^^^ - **Model ID:** openai/whisper-tiny.en Execute the following command to launch the model:: xinference launch --model-name whisper-tiny.en --model-type audio ================================================ FILE: doc/source/models/builtin/audio/whisper-tiny.rst ================================================ .. 
_models_builtin_whisper-tiny: ============ whisper-tiny ============ - **Model Name:** whisper-tiny - **Model Family:** whisper - **Abilities:** ['audio2text'] - **Multilingual:** True Specifications ^^^^^^^^^^^^^^ - **Model ID:** openai/whisper-tiny Execute the following command to launch the model:: xinference launch --model-name whisper-tiny --model-type audio ================================================ FILE: doc/source/models/builtin/embedding/bce-embedding-base_v1.rst ================================================ .. _models_builtin_bce-embedding-base_v1: ===================== bce-embedding-base_v1 ===================== - **Model Name:** bce-embedding-base_v1 - **Languages:** zh, en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 768 - **Max Tokens:** 512 - **Model ID:** maidalun1020/bce-embedding-base_v1 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name bce-embedding-base_v1 --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/bge-base-en-v1.5.rst ================================================ .. _models_builtin_bge-base-en-v1.5: ================ bge-base-en-v1.5 ================ - **Model Name:** bge-base-en-v1.5 - **Languages:** en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 768 - **Max Tokens:** 512 - **Model ID:** BAAI/bge-base-en-v1.5 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name bge-base-en-v1.5 --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/bge-base-en.rst ================================================ .. 
_models_builtin_bge-base-en: =========== bge-base-en =========== - **Model Name:** bge-base-en - **Languages:** en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 768 - **Max Tokens:** 512 - **Model ID:** BAAI/bge-base-en - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name bge-base-en --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/bge-base-zh-v1.5.rst ================================================ .. _models_builtin_bge-base-zh-v1.5: ================ bge-base-zh-v1.5 ================ - **Model Name:** bge-base-zh-v1.5 - **Languages:** zh - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 768 - **Max Tokens:** 512 - **Model ID:** BAAI/bge-base-zh-v1.5 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name bge-base-zh-v1.5 --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/bge-base-zh.rst ================================================ .. _models_builtin_bge-base-zh: =========== bge-base-zh =========== - **Model Name:** bge-base-zh - **Languages:** zh - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 768 - **Max Tokens:** 512 - **Model ID:** BAAI/bge-base-zh - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name bge-base-zh --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/bge-large-en-v1.5.rst ================================================ .. 
_models_builtin_bge-large-en-v1.5: ================= bge-large-en-v1.5 ================= - **Model Name:** bge-large-en-v1.5 - **Languages:** en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 1024 - **Max Tokens:** 512 - **Model ID:** BAAI/bge-large-en-v1.5 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name bge-large-en-v1.5 --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/bge-large-en.rst ================================================ .. _models_builtin_bge-large-en: ============ bge-large-en ============ - **Model Name:** bge-large-en - **Languages:** en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 1024 - **Max Tokens:** 512 - **Model ID:** BAAI/bge-large-en - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name bge-large-en --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/bge-large-zh-noinstruct.rst ================================================ .. _models_builtin_bge-large-zh-noinstruct: ======================= bge-large-zh-noinstruct ======================= - **Model Name:** bge-large-zh-noinstruct - **Languages:** zh - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 1024 - **Max Tokens:** 512 - **Model ID:** BAAI/bge-large-zh-noinstruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name bge-large-zh-noinstruct --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/bge-large-zh-v1.5.rst ================================================ .. 
_models_builtin_bge-large-zh-v1.5: ================= bge-large-zh-v1.5 ================= - **Model Name:** bge-large-zh-v1.5 - **Languages:** zh - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 1024 - **Max Tokens:** 512 - **Model ID:** BAAI/bge-large-zh-v1.5 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name bge-large-zh-v1.5 --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/bge-large-zh.rst ================================================ .. _models_builtin_bge-large-zh: ============ bge-large-zh ============ - **Model Name:** bge-large-zh - **Languages:** zh - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 1024 - **Max Tokens:** 512 - **Model ID:** BAAI/bge-large-zh - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name bge-large-zh --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/bge-m3.rst ================================================ .. _models_builtin_bge-m3: ====== bge-m3 ====== - **Model Name:** bge-m3 - **Languages:** zh, en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 1024 - **Max Tokens:** 8192 - **Model ID:** BAAI/bge-m3 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name bge-m3 --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/bge-small-en-v1.5.rst ================================================ .. 
_models_builtin_bge-small-en-v1.5: ================= bge-small-en-v1.5 ================= - **Model Name:** bge-small-en-v1.5 - **Languages:** en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 384 - **Max Tokens:** 512 - **Model ID:** BAAI/bge-small-en-v1.5 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name bge-small-en-v1.5 --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/bge-small-zh-v1.5.rst ================================================ .. _models_builtin_bge-small-zh-v1.5: ================= bge-small-zh-v1.5 ================= - **Model Name:** bge-small-zh-v1.5 - **Languages:** zh - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 512 - **Max Tokens:** 512 - **Model ID:** BAAI/bge-small-zh-v1.5 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name bge-small-zh-v1.5 --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/bge-small-zh.rst ================================================ .. _models_builtin_bge-small-zh: ============ bge-small-zh ============ - **Model Name:** bge-small-zh - **Languages:** zh - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 512 - **Max Tokens:** 512 - **Model ID:** BAAI/bge-small-zh - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name bge-small-zh --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/e5-large-v2.rst ================================================ .. 
_models_builtin_e5-large-v2: =========== e5-large-v2 =========== - **Model Name:** e5-large-v2 - **Languages:** en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 1024 - **Max Tokens:** 512 - **Model ID:** intfloat/e5-large-v2 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name e5-large-v2 --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/gme-qwen2-vl-2b-instruct.rst ================================================ .. _models_builtin_gme-qwen2-vl-2b-instruct: ======================== gme-Qwen2-VL-2B-Instruct ======================== - **Model Name:** gme-Qwen2-VL-2B-Instruct - **Languages:** en, zh - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 1536 - **Max Tokens:** 32768 - **Model ID:** Alibaba-NLP/gme-Qwen2-VL-2B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name gme-Qwen2-VL-2B-Instruct --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/gme-qwen2-vl-7b-instruct.rst ================================================ .. _models_builtin_gme-qwen2-vl-7b-instruct: ======================== gme-Qwen2-VL-7B-Instruct ======================== - **Model Name:** gme-Qwen2-VL-7B-Instruct - **Languages:** en, zh - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 3584 - **Max Tokens:** 32768 - **Model ID:** Alibaba-NLP/gme-Qwen2-VL-7B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name gme-Qwen2-VL-7B-Instruct --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/gte-base.rst ================================================ .. 
_models_builtin_gte-base: ======== gte-base ======== - **Model Name:** gte-base - **Languages:** en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 768 - **Max Tokens:** 512 - **Model ID:** thenlper/gte-base - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name gte-base --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/gte-large.rst ================================================ .. _models_builtin_gte-large: ========= gte-large ========= - **Model Name:** gte-large - **Languages:** en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 1024 - **Max Tokens:** 512 - **Model ID:** thenlper/gte-large - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name gte-large --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/gte-qwen2.rst ================================================ .. _models_builtin_gte-qwen2: ========= gte-Qwen2 ========= - **Model Name:** gte-Qwen2 - **Languages:** zh, en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 3584 - **Max Tokens:** 32000 - **Model ID:** Alibaba-NLP/gte-Qwen2-7B-instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name gte-Qwen2 --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/index.rst ================================================ .. _models_embedding_index: ================ Embedding Models ================ The following is a list of built-in embedding models in Xinference: .. 
toctree:: :maxdepth: 1 bce-embedding-base_v1 bge-base-en bge-base-en-v1.5 bge-base-zh bge-base-zh-v1.5 bge-large-en bge-large-en-v1.5 bge-large-zh bge-large-zh-noinstruct bge-large-zh-v1.5 bge-m3 bge-small-en-v1.5 bge-small-zh bge-small-zh-v1.5 e5-large-v2 gme-qwen2-vl-2b-instruct gme-qwen2-vl-7b-instruct gte-base gte-large gte-qwen2 jina-clip-v2 jina-embeddings-v2-base-en jina-embeddings-v2-base-zh jina-embeddings-v2-small-en jina-embeddings-v3 jina-embeddings-v4 m3e-base m3e-large m3e-small multilingual-e5-large qwen3-embedding-0.6b qwen3-embedding-4b qwen3-embedding-8b qwen3-vl-embedding-2b qwen3-vl-embedding-8b text2vec-base-chinese text2vec-base-chinese-paraphrase text2vec-base-chinese-sentence text2vec-base-multilingual text2vec-large-chinese ================================================ FILE: doc/source/models/builtin/embedding/jina-clip-v2.rst ================================================ .. _models_builtin_jina-clip-v2: ============ jina-clip-v2 ============ - **Model Name:** jina-clip-v2 - **Languages:** 89 languages supported - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 1024 - **Max Tokens:** 8192 - **Model ID:** jinaai/jina-clip-v2 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name jina-clip-v2 --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/jina-embeddings-v2-base-en.rst ================================================ .. 
_models_builtin_jina-embeddings-v2-base-en: ========================== jina-embeddings-v2-base-en ========================== - **Model Name:** jina-embeddings-v2-base-en - **Languages:** en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 512 - **Max Tokens:** 8192 - **Model ID:** jinaai/jina-embeddings-v2-base-en - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name jina-embeddings-v2-base-en --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/jina-embeddings-v2-base-zh.rst ================================================ .. _models_builtin_jina-embeddings-v2-base-zh: ========================== jina-embeddings-v2-base-zh ========================== - **Model Name:** jina-embeddings-v2-base-zh - **Languages:** zh, en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 768 - **Max Tokens:** 8192 - **Model ID:** jinaai/jina-embeddings-v2-base-zh - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name jina-embeddings-v2-base-zh --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/jina-embeddings-v2-small-en.rst ================================================ .. 
_models_builtin_jina-embeddings-v2-small-en: =========================== jina-embeddings-v2-small-en =========================== - **Model Name:** jina-embeddings-v2-small-en - **Languages:** en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 512 - **Max Tokens:** 8192 - **Model ID:** jinaai/jina-embeddings-v2-small-en - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name jina-embeddings-v2-small-en --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/jina-embeddings-v3.rst ================================================ .. _models_builtin_jina-embeddings-v3: ================== jina-embeddings-v3 ================== - **Model Name:** jina-embeddings-v3 - **Languages:** zh, en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 1024 - **Max Tokens:** 8192 - **Model ID:** jinaai/jina-embeddings-v3 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name jina-embeddings-v3 --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/jina-embeddings-v4.rst ================================================ .. _models_builtin_jina-embeddings-v4: ================== jina-embeddings-v4 ================== - **Model Name:** jina-embeddings-v4 - **Languages:** 30+ languages supported - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 2048 - **Max Tokens:** 32768 - **Model ID:** jinaai/jina-embeddings-v4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name jina-embeddings-v4 --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/m3e-base.rst ================================================ .. 
_models_builtin_m3e-base: ======== m3e-base ======== - **Model Name:** m3e-base - **Languages:** zh, en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 768 - **Max Tokens:** 512 - **Model ID:** moka-ai/m3e-base - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name m3e-base --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/m3e-large.rst ================================================ .. _models_builtin_m3e-large: ========= m3e-large ========= - **Model Name:** m3e-large - **Languages:** zh, en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 1024 - **Max Tokens:** 512 - **Model ID:** moka-ai/m3e-large - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name m3e-large --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/m3e-small.rst ================================================ .. _models_builtin_m3e-small: ========= m3e-small ========= - **Model Name:** m3e-small - **Languages:** zh, en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 512 - **Max Tokens:** 512 - **Model ID:** moka-ai/m3e-small - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name m3e-small --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/multilingual-e5-large.rst ================================================ .. 
_models_builtin_multilingual-e5-large: ===================== multilingual-e5-large ===================== - **Model Name:** multilingual-e5-large - **Languages:** zh - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 1024 - **Max Tokens:** 514 - **Model ID:** intfloat/multilingual-e5-large - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name multilingual-e5-large --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/qwen3-embedding-0.6b.rst ================================================ .. _models_builtin_qwen3-embedding-0.6b: ==================== Qwen3-Embedding-0.6B ==================== - **Model Name:** Qwen3-Embedding-0.6B - **Languages:** zh, en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 1024 - **Max Tokens:** 32768 - **Model ID:** Qwen/Qwen3-Embedding-0.6B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name Qwen3-Embedding-0.6B --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/qwen3-embedding-4b.rst ================================================ .. _models_builtin_qwen3-embedding-4b: ================== Qwen3-Embedding-4B ================== - **Model Name:** Qwen3-Embedding-4B - **Languages:** zh, en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 2560 - **Max Tokens:** 32768 - **Model ID:** Qwen/Qwen3-Embedding-4B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name Qwen3-Embedding-4B --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/qwen3-embedding-8b.rst ================================================ .. 
_models_builtin_qwen3-embedding-8b: ================== Qwen3-Embedding-8B ================== - **Model Name:** Qwen3-Embedding-8B - **Languages:** zh, en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 4096 - **Max Tokens:** 32768 - **Model ID:** Qwen/Qwen3-Embedding-8B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name Qwen3-Embedding-8B --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/qwen3-vl-embedding-2b.rst ================================================ .. _models_builtin_qwen3-vl-embedding-2b: ===================== Qwen3-VL-Embedding-2B ===================== - **Model Name:** Qwen3-VL-Embedding-2B - **Languages:** zh, en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 2048 - **Max Tokens:** 8192 - **Model ID:** Qwen/Qwen3-VL-Embedding-2B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name Qwen3-VL-Embedding-2B --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/qwen3-vl-embedding-8b.rst ================================================ .. _models_builtin_qwen3-vl-embedding-8b: ===================== Qwen3-VL-Embedding-8B ===================== - **Model Name:** Qwen3-VL-Embedding-8B - **Languages:** zh, en - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 4096 - **Max Tokens:** 8192 - **Model ID:** Qwen/Qwen3-VL-Embedding-8B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name Qwen3-VL-Embedding-8B --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/text2vec-base-chinese-paraphrase.rst ================================================ .. 
_models_builtin_text2vec-base-chinese-paraphrase: ================================ text2vec-base-chinese-paraphrase ================================ - **Model Name:** text2vec-base-chinese-paraphrase - **Languages:** zh - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 768 - **Max Tokens:** 256 - **Model ID:** shibing624/text2vec-base-chinese-paraphrase - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name text2vec-base-chinese-paraphrase --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/text2vec-base-chinese-sentence.rst ================================================ .. _models_builtin_text2vec-base-chinese-sentence: ============================== text2vec-base-chinese-sentence ============================== - **Model Name:** text2vec-base-chinese-sentence - **Languages:** zh - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 768 - **Max Tokens:** 256 - **Model ID:** shibing624/text2vec-base-chinese-sentence - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model:: xinference launch --model-name text2vec-base-chinese-sentence --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/text2vec-base-chinese.rst ================================================ .. 
_models_builtin_text2vec-base-chinese: ===================== text2vec-base-chinese ===================== - **Model Name:** text2vec-base-chinese - **Languages:** zh - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 768 - **Max Tokens:** 128 - **Model ID:** shibing624/text2vec-base-chinese - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name text2vec-base-chinese --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/text2vec-base-multilingual.rst ================================================ .. _models_builtin_text2vec-base-multilingual: ========================== text2vec-base-multilingual ========================== - **Model Name:** text2vec-base-multilingual - **Languages:** zh - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 384 - **Max Tokens:** 256 - **Model ID:** shibing624/text2vec-base-multilingual - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model:: xinference launch --model-name text2vec-base-multilingual --model-type embedding ================================================ FILE: doc/source/models/builtin/embedding/text2vec-large-chinese.rst ================================================ .. 
_models_builtin_text2vec-large-chinese: ====================== text2vec-large-chinese ====================== - **Model Name:** text2vec-large-chinese - **Languages:** zh - **Abilities:** embed Specifications ^^^^^^^^^^^^^^ - **Dimensions:** 1024 - **Max Tokens:** 256 - **Model ID:** shibing624/text2vec-bge-large-chinese - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model:: xinference launch --model-name text2vec-large-chinese --model-type embedding ================================================ FILE: doc/source/models/builtin/image/cogview4.rst ================================================ .. _models_builtin_cogview4: ======== cogview4 ======== - **Model Name:** cogview4 - **Model Family:** stable_diffusion - **Abilities:** text2image - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** THUDM/CogView4-6B Execute the following command to launch the model:: xinference launch --model-name cogview4 --model-type image ================================================ FILE: doc/source/models/builtin/image/deepseek-ocr.rst ================================================ .. _models_builtin_deepseek-ocr: ============ DeepSeek-OCR ============ - **Model Name:** DeepSeek-OCR - **Model Family:** ocr - **Abilities:** ocr - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** deepseek-ai/DeepSeek-OCR Execute the following command to launch the model:: xinference launch --model-name DeepSeek-OCR --model-type image ================================================ FILE: doc/source/models/builtin/image/flux.1-dev.rst ================================================ .. 
_models_builtin_flux.1-dev: ========== FLUX.1-dev ========== - **Model Name:** FLUX.1-dev - **Model Family:** stable_diffusion - **Abilities:** text2image, image2image, inpainting - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** black-forest-labs/FLUX.1-dev - **GGUF Model ID**: city96/FLUX.1-dev-gguf - **GGUF Quantizations**: F16, Q2_K, Q3_K_S, Q4_0, Q4_1, Q4_K_S, Q5_0, Q5_1, Q5_K_S, Q6_K, Q8_0 Execute the following command to launch the model:: xinference launch --model-name FLUX.1-dev --model-type image For GGUF quantization, use the following command:: xinference launch --model-name FLUX.1-dev --model-type image --gguf_quantization ${gguf_quantization} --cpu_offload True ================================================ FILE: doc/source/models/builtin/image/flux.1-kontext-dev.rst ================================================ .. _models_builtin_flux.1-kontext-dev: ================== FLUX.1-Kontext-dev ================== - **Model Name:** FLUX.1-Kontext-dev - **Model Family:** stable_diffusion - **Abilities:** image2image - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** black-forest-labs/FLUX.1-Kontext-dev - **GGUF Model ID**: bullerwins/FLUX.1-Kontext-dev-GGUF - **GGUF Quantizations**: BF16, Q2_K, Q3_K_S, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K, Q8_0 Execute the following command to launch the model:: xinference launch --model-name FLUX.1-Kontext-dev --model-type image For GGUF quantization, use the following command:: xinference launch --model-name FLUX.1-Kontext-dev --model-type image --gguf_quantization ${gguf_quantization} --cpu_offload True ================================================ FILE: doc/source/models/builtin/image/flux.1-schnell.rst ================================================ .. 
_models_builtin_flux.1-schnell: ============== FLUX.1-schnell ============== - **Model Name:** FLUX.1-schnell - **Model Family:** stable_diffusion - **Abilities:** text2image, image2image, inpainting - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** black-forest-labs/FLUX.1-schnell - **GGUF Model ID**: city96/FLUX.1-schnell-gguf - **GGUF Quantizations**: F16, Q2_K, Q3_K_S, Q4_0, Q4_1, Q4_K_S, Q5_0, Q5_1, Q5_K_S, Q6_K, Q8_0 Execute the following command to launch the model:: xinference launch --model-name FLUX.1-schnell --model-type image For GGUF quantization, use the following command:: xinference launch --model-name FLUX.1-schnell --model-type image --gguf_quantization ${gguf_quantization} --cpu_offload True ================================================ FILE: doc/source/models/builtin/image/flux.2-dev.rst ================================================ .. _models_builtin_flux.2-dev: ========== FLUX.2-dev ========== - **Model Name:** FLUX.2-dev - **Model Family:** stable_diffusion - **Abilities:** text2image, image2image, inpainting - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** black-forest-labs/FLUX.2-dev - **GGUF Model ID**: city96/FLUX.2-dev-gguf - **GGUF Quantizations**: BF16, Q2_K, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, Q5_K_S, Q6_K, Q8_0 Execute the following command to launch the model:: xinference launch --model-name FLUX.2-dev --model-type image For GGUF quantization, use the following command:: xinference launch --model-name FLUX.2-dev --model-type image --gguf_quantization ${gguf_quantization} --cpu_offload True ================================================ FILE: doc/source/models/builtin/image/flux.2-klein-4b.rst ================================================ .. 
_models_builtin_flux.2-klein-4b: =============== FLUX.2-klein-4B =============== - **Model Name:** FLUX.2-klein-4B - **Model Family:** stable_diffusion - **Abilities:** text2image, image2image - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** black-forest-labs/FLUX.2-klein-4B Execute the following command to launch the model:: xinference launch --model-name FLUX.2-klein-4B --model-type image ================================================ FILE: doc/source/models/builtin/image/flux.2-klein-9b.rst ================================================ .. _models_builtin_flux.2-klein-9b: =============== FLUX.2-klein-9B =============== - **Model Name:** FLUX.2-klein-9B - **Model Family:** stable_diffusion - **Abilities:** text2image, image2image - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** black-forest-labs/FLUX.2-klein-9B Execute the following command to launch the model:: xinference launch --model-name FLUX.2-klein-9B --model-type image ================================================ FILE: doc/source/models/builtin/image/got-ocr2_0.rst ================================================ .. _models_builtin_got-ocr2_0: ========== GOT-OCR2_0 ========== - **Model Name:** GOT-OCR2_0 - **Model Family:** ocr - **Abilities:** ocr - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** stepfun-ai/GOT-OCR2_0 Execute the following command to launch the model:: xinference launch --model-name GOT-OCR2_0 --model-type image ================================================ FILE: doc/source/models/builtin/image/hunyuandit-v1.2-distilled.rst ================================================ .. 
_models_builtin_hunyuandit-v1.2-distilled: ========================= HunyuanDiT-v1.2-Distilled ========================= - **Model Name:** HunyuanDiT-v1.2-Distilled - **Model Family:** stable_diffusion - **Abilities:** text2image - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers-Distilled Execute the following command to launch the model:: xinference launch --model-name HunyuanDiT-v1.2-Distilled --model-type image ================================================ FILE: doc/source/models/builtin/image/hunyuandit-v1.2.rst ================================================ .. _models_builtin_hunyuandit-v1.2: =============== HunyuanDiT-v1.2 =============== - **Model Name:** HunyuanDiT-v1.2 - **Model Family:** stable_diffusion - **Abilities:** text2image - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers Execute the following command to launch the model:: xinference launch --model-name HunyuanDiT-v1.2 --model-type image ================================================ FILE: doc/source/models/builtin/image/hunyuanocr.rst ================================================ .. _models_builtin_hunyuanocr: ========== HunyuanOCR ========== - **Model Name:** HunyuanOCR - **Model Family:** ocr - **Abilities:** ocr - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** tencent/HunyuanOCR Execute the following command to launch the model:: xinference launch --model-name HunyuanOCR --model-type image ================================================ FILE: doc/source/models/builtin/image/index.rst ================================================ .. _models_image_index: ================ Image Models ================ The following is a list of built-in image models in Xinference: .. 
toctree:: :maxdepth: 1 cogview4 deepseek-ocr flux.1-dev flux.1-kontext-dev flux.1-schnell flux.2-dev flux.2-klein-4b flux.2-klein-9b got-ocr2_0 hunyuandit-v1.2 hunyuandit-v1.2-distilled hunyuanocr kolors mineru2.5-2509-1.2b paddleocr-vl qwen-image qwen-image-2512 qwen-image-edit qwen-image-edit-2509 qwen-image-edit-2511 qwen-image-layered sd-turbo sd3-medium sd3.5-large sd3.5-large-turbo sd3.5-medium sdxl-turbo stable-diffusion-2-inpainting stable-diffusion-inpainting stable-diffusion-v1.5 stable-diffusion-xl-base-1.0 stable-diffusion-xl-inpainting z-image z-image-turbo ================================================ FILE: doc/source/models/builtin/image/kolors.rst ================================================ .. _models_builtin_kolors: ====== kolors ====== - **Model Name:** kolors - **Model Family:** stable_diffusion - **Abilities:** text2image, image2image - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** Kwai-Kolors/Kolors-diffusers Execute the following command to launch the model:: xinference launch --model-name kolors --model-type image ================================================ FILE: doc/source/models/builtin/image/mineru2.5-2509-1.2b.rst ================================================ .. _models_builtin_mineru2.5-2509-1.2b: =================== MinerU2.5-2509-1.2B =================== - **Model Name:** MinerU2.5-2509-1.2B - **Model Family:** ocr - **Abilities:** ocr - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** opendatalab/MinerU2.5-2509-1.2B Execute the following command to launch the model:: xinference launch --model-name MinerU2.5-2509-1.2B --model-type image ================================================ FILE: doc/source/models/builtin/image/paddleocr-vl.rst ================================================ .. 
_models_builtin_paddleocr-vl: ============ PaddleOCR-VL ============ - **Model Name:** PaddleOCR-VL - **Model Family:** ocr - **Abilities:** ocr - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** PaddlePaddle/PaddleOCR-VL Execute the following command to launch the model:: xinference launch --model-name PaddleOCR-VL --model-type image ================================================ FILE: doc/source/models/builtin/image/qwen-image-2512.rst ================================================ .. _models_builtin_qwen-image-2512: =============== Qwen-Image-2512 =============== - **Model Name:** Qwen-Image-2512 - **Model Family:** stable_diffusion - **Abilities:** text2image, image2image, inpainting - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** Qwen/Qwen-Image-2512 - **GGUF Model ID**: unsloth/Qwen-Image-2512-GGUF - **GGUF Quantizations**: BF16, F16, Q2_K, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Lightning Model ID**: lightx2v/Qwen-Image-2512-Lightning - **Lightning Versions**: 4steps-V1.0-bf16, 4steps-V1.0-fp32 Execute the following command to launch the model:: xinference launch --model-name Qwen-Image-2512 --model-type image For GGUF quantization, use the following command:: xinference launch --model-name Qwen-Image-2512 --model-type image --gguf_quantization ${gguf_quantization} --cpu_offload True For Lightning LoRA acceleration, use the following command:: xinference launch --model-name Qwen-Image-2512 --model-type image --lightning_version ${lightning_version} ================================================ FILE: doc/source/models/builtin/image/qwen-image-edit-2509.rst ================================================ .. 
_models_builtin_qwen-image-edit-2509: ==================== Qwen-Image-Edit-2509 ==================== - **Model Name:** Qwen-Image-Edit-2509 - **Model Family:** stable_diffusion - **Abilities:** image2image - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** Qwen/Qwen-Image-Edit-2509 - **GGUF Model ID**: QuantStack/Qwen-Image-Edit-2509-GGUF - **GGUF Quantizations**: Q2_K, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Lightning Model ID**: lightx2v/Qwen-Image-Lightning - **Lightning Versions**: 4steps-V1.0-bf16, 4steps-V1.0-fp32, 8steps-V1.0-bf16, 8steps-V1.0-fp32 Execute the following command to launch the model:: xinference launch --model-name Qwen-Image-Edit-2509 --model-type image For GGUF quantization, use the following command:: xinference launch --model-name Qwen-Image-Edit-2509 --model-type image --gguf_quantization ${gguf_quantization} --cpu_offload True For Lightning LoRA acceleration, use the following command:: xinference launch --model-name Qwen-Image-Edit-2509 --model-type image --lightning_version ${lightning_version} ================================================ FILE: doc/source/models/builtin/image/qwen-image-edit-2511.rst ================================================ .. 
_models_builtin_qwen-image-edit-2511: ==================== Qwen-Image-Edit-2511 ==================== - **Model Name:** Qwen-Image-Edit-2511 - **Model Family:** stable_diffusion - **Abilities:** image2image - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** Qwen/Qwen-Image-Edit-2511 - **GGUF Model ID**: unsloth/Qwen-Image-Edit-2511-GGUF - **GGUF Quantizations**: BF16, F16, Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Lightning Model ID**: lightx2v/Qwen-Image-Edit-2511-Lightning - **Lightning Versions**: 4steps-V1.0-bf16, 4steps-V1.0-fp32 Execute the following command to launch the model:: xinference launch --model-name Qwen-Image-Edit-2511 --model-type image For GGUF quantization, use the command below, replacing ``${gguf_quantization}`` with one of the GGUF quantizations listed above:: xinference launch --model-name Qwen-Image-Edit-2511 --model-type image --gguf_quantization ${gguf_quantization} --cpu_offload True For Lightning LoRA acceleration, use the command below, replacing ``${lightning_version}`` with one of the Lightning versions listed above:: xinference launch --model-name Qwen-Image-Edit-2511 --model-type image --lightning_version ${lightning_version} ================================================ FILE: doc/source/models/builtin/image/qwen-image-edit.rst ================================================ ..
_models_builtin_qwen-image-edit: =============== Qwen-Image-Edit =============== - **Model Name:** Qwen-Image-Edit - **Model Family:** stable_diffusion - **Abilities:** image2image - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** Qwen/Qwen-Image-Edit - **GGUF Model ID**: QuantStack/Qwen-Image-Edit-GGUF - **GGUF Quantizations**: Q2_K, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Lightning Model ID**: lightx2v/Qwen-Image-Lightning - **Lightning Versions**: 4steps-V1.0-bf16, 4steps-V1.0, 8steps-V1.0-bf16, 8steps-V1.0 Execute the following command to launch the model:: xinference launch --model-name Qwen-Image-Edit --model-type image For GGUF quantization, use the command below, replacing ``${gguf_quantization}`` with one of the GGUF quantizations listed above:: xinference launch --model-name Qwen-Image-Edit --model-type image --gguf_quantization ${gguf_quantization} --cpu_offload True For Lightning LoRA acceleration, use the command below, replacing ``${lightning_version}`` with one of the Lightning versions listed above:: xinference launch --model-name Qwen-Image-Edit --model-type image --lightning_version ${lightning_version} ================================================ FILE: doc/source/models/builtin/image/qwen-image-layered.rst ================================================ .. _models_builtin_qwen-image-layered: ================== Qwen-Image-Layered ================== - **Model Name:** Qwen-Image-Layered - **Model Family:** stable_diffusion - **Abilities:** image2image - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** Qwen/Qwen-Image-Layered Execute the following command to launch the model:: xinference launch --model-name Qwen-Image-Layered --model-type image ================================================ FILE: doc/source/models/builtin/image/qwen-image.rst ================================================ ..
_models_builtin_qwen-image: ========== Qwen-Image ========== - **Model Name:** Qwen-Image - **Model Family:** stable_diffusion - **Abilities:** text2image, image2image, inpainting - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** Qwen/Qwen-Image - **GGUF Model ID**: city96/Qwen-Image-gguf - **GGUF Quantizations**: F16, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Lightning Model ID**: lightx2v/Qwen-Image-Lightning - **Lightning Versions**: 4steps-V1.0-bf16, 4steps-V1.0, 8steps-V1.0, 8steps-V1.1-bf16, 8steps-V1.1 Execute the following command to launch the model:: xinference launch --model-name Qwen-Image --model-type image For GGUF quantization, use the command below, replacing ``${gguf_quantization}`` with one of the GGUF quantizations listed above:: xinference launch --model-name Qwen-Image --model-type image --gguf_quantization ${gguf_quantization} --cpu_offload True For Lightning LoRA acceleration, use the command below, replacing ``${lightning_version}`` with one of the Lightning versions listed above:: xinference launch --model-name Qwen-Image --model-type image --lightning_version ${lightning_version} ================================================ FILE: doc/source/models/builtin/image/sd-turbo.rst ================================================ .. _models_builtin_sd-turbo: ======== sd-turbo ======== - **Model Name:** sd-turbo - **Model Family:** stable_diffusion - **Abilities:** text2image - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** stabilityai/sd-turbo Execute the following command to launch the model:: xinference launch --model-name sd-turbo --model-type image ================================================ FILE: doc/source/models/builtin/image/sd3-medium.rst ================================================ ..
_models_builtin_sd3-medium: ========== sd3-medium ========== - **Model Name:** sd3-medium - **Model Family:** stable_diffusion - **Abilities:** text2image, image2image, inpainting - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** stabilityai/stable-diffusion-3-medium-diffusers Execute the following command to launch the model:: xinference launch --model-name sd3-medium --model-type image ================================================ FILE: doc/source/models/builtin/image/sd3.5-large-turbo.rst ================================================ .. _models_builtin_sd3.5-large-turbo: ================= sd3.5-large-turbo ================= - **Model Name:** sd3.5-large-turbo - **Model Family:** stable_diffusion - **Abilities:** text2image, image2image, inpainting - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** stabilityai/stable-diffusion-3.5-large-turbo - **GGUF Model ID**: city96/stable-diffusion-3.5-large-turbo-gguf - **GGUF Quantizations**: F16, Q4_0, Q4_1, Q5_0, Q5_1, Q8_0 Execute the following command to launch the model:: xinference launch --model-name sd3.5-large-turbo --model-type image For GGUF quantization, use the command below, replacing ``${gguf_quantization}`` with one of the GGUF quantizations listed above:: xinference launch --model-name sd3.5-large-turbo --model-type image --gguf_quantization ${gguf_quantization} --cpu_offload True ================================================ FILE: doc/source/models/builtin/image/sd3.5-large.rst ================================================ ..
_models_builtin_sd3.5-large: =========== sd3.5-large =========== - **Model Name:** sd3.5-large - **Model Family:** stable_diffusion - **Abilities:** text2image, image2image, inpainting - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** stabilityai/stable-diffusion-3.5-large - **GGUF Model ID**: city96/stable-diffusion-3.5-large-gguf - **GGUF Quantizations**: F16, Q4_0, Q4_1, Q5_0, Q5_1, Q8_0 Execute the following command to launch the model:: xinference launch --model-name sd3.5-large --model-type image For GGUF quantization, use the command below, replacing ``${gguf_quantization}`` with one of the GGUF quantizations listed above:: xinference launch --model-name sd3.5-large --model-type image --gguf_quantization ${gguf_quantization} --cpu_offload True ================================================ FILE: doc/source/models/builtin/image/sd3.5-medium.rst ================================================ .. _models_builtin_sd3.5-medium: ============ sd3.5-medium ============ - **Model Name:** sd3.5-medium - **Model Family:** stable_diffusion - **Abilities:** text2image, image2image, inpainting - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** stabilityai/stable-diffusion-3.5-medium - **GGUF Model ID**: city96/stable-diffusion-3.5-medium-gguf - **GGUF Quantizations**: F16, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, Q5_K_S, Q6_K, Q8_0 Execute the following command to launch the model:: xinference launch --model-name sd3.5-medium --model-type image For GGUF quantization, use the command below, replacing ``${gguf_quantization}`` with one of the GGUF quantizations listed above:: xinference launch --model-name sd3.5-medium --model-type image --gguf_quantization ${gguf_quantization} --cpu_offload True ================================================ FILE: doc/source/models/builtin/image/sdxl-turbo.rst ================================================ ..
_models_builtin_sdxl-turbo: ========== sdxl-turbo ========== - **Model Name:** sdxl-turbo - **Model Family:** stable_diffusion - **Abilities:** text2image - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** stabilityai/sdxl-turbo Execute the following command to launch the model:: xinference launch --model-name sdxl-turbo --model-type image ================================================ FILE: doc/source/models/builtin/image/stable-diffusion-2-inpainting.rst ================================================ .. _models_builtin_stable-diffusion-2-inpainting: ============================= stable-diffusion-2-inpainting ============================= - **Model Name:** stable-diffusion-2-inpainting - **Model Family:** stable_diffusion - **Abilities:** inpainting - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** stabilityai/stable-diffusion-2-inpainting Execute the following command to launch the model:: xinference launch --model-name stable-diffusion-2-inpainting --model-type image ================================================ FILE: doc/source/models/builtin/image/stable-diffusion-inpainting.rst ================================================ .. _models_builtin_stable-diffusion-inpainting: =========================== stable-diffusion-inpainting =========================== - **Model Name:** stable-diffusion-inpainting - **Model Family:** stable_diffusion - **Abilities:** inpainting - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** runwayml/stable-diffusion-inpainting Execute the following command to launch the model:: xinference launch --model-name stable-diffusion-inpainting --model-type image ================================================ FILE: doc/source/models/builtin/image/stable-diffusion-v1.5.rst ================================================ .. 
_models_builtin_stable-diffusion-v1.5: ===================== stable-diffusion-v1.5 ===================== - **Model Name:** stable-diffusion-v1.5 - **Model Family:** stable_diffusion - **Abilities:** text2image, image2image - **Available ControlNet:** ['canny', 'mlsd', 'hed', 'scribble', 'openpose', 'normal', 'seg'] Specifications ^^^^^^^^^^^^^^ - **Model ID:** runwayml/stable-diffusion-v1-5 Execute the following command to launch the model:: xinference launch --model-name stable-diffusion-v1.5 --model-type image ================================================ FILE: doc/source/models/builtin/image/stable-diffusion-xl-base-1.0.rst ================================================ .. _models_builtin_stable-diffusion-xl-base-1.0: ============================ stable-diffusion-xl-base-1.0 ============================ - **Model Name:** stable-diffusion-xl-base-1.0 - **Model Family:** stable_diffusion - **Abilities:** text2image, image2image - **Available ControlNet:** ['canny', 'depth', 'zoe-depth'] Specifications ^^^^^^^^^^^^^^ - **Model ID:** stabilityai/stable-diffusion-xl-base-1.0 Execute the following command to launch the model:: xinference launch --model-name stable-diffusion-xl-base-1.0 --model-type image ================================================ FILE: doc/source/models/builtin/image/stable-diffusion-xl-inpainting.rst ================================================ .. 
_models_builtin_stable-diffusion-xl-inpainting: ============================== stable-diffusion-xl-inpainting ============================== - **Model Name:** stable-diffusion-xl-inpainting - **Model Family:** stable_diffusion - **Abilities:** inpainting - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** diffusers/stable-diffusion-xl-1.0-inpainting-0.1 Execute the following command to launch the model:: xinference launch --model-name stable-diffusion-xl-inpainting --model-type image ================================================ FILE: doc/source/models/builtin/image/z-image-turbo.rst ================================================ .. _models_builtin_z-image-turbo: ============= Z-Image-Turbo ============= - **Model Name:** Z-Image-Turbo - **Model Family:** stable_diffusion - **Abilities:** text2image, image2image - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** Tongyi-MAI/Z-Image-Turbo - **GGUF Model ID**: unsloth/Z-Image-Turbo-GGUF - **GGUF Quantizations**: BF16, F16, Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, Q5_K_S, Q6_K, Q8_0 Execute the following command to launch the model:: xinference launch --model-name Z-Image-Turbo --model-type image For GGUF quantization, use the command below, replacing ``${gguf_quantization}`` with one of the GGUF quantizations listed above:: xinference launch --model-name Z-Image-Turbo --model-type image --gguf_quantization ${gguf_quantization} --cpu_offload True ================================================ FILE: doc/source/models/builtin/image/z-image.rst ================================================ ..
_models_builtin_z-image: ======= Z-Image ======= - **Model Name:** Z-Image - **Model Family:** stable_diffusion - **Abilities:** text2image, image2image - **Available ControlNet:** None Specifications ^^^^^^^^^^^^^^ - **Model ID:** Tongyi-MAI/Z-Image Execute the following command to launch the model:: xinference launch --model-name Z-Image --model-type image ================================================ FILE: doc/source/models/builtin/index.rst ================================================ .. _models_builtin_index: ============== Builtin Models ============== .. toctree:: :maxdepth: 1 llm/index embedding/index image/index audio/index rerank/index video/index ================================================ FILE: doc/source/models/builtin/llm/baichuan-2-chat.rst ================================================ .. _models_llm_baichuan-2-chat: ======================================== baichuan-2-chat ======================================== - **Context Length:** 4096 - **Model Name:** baichuan-2-chat - **Languages:** en, zh - **Abilities:** chat - **Description:** Baichuan2-chat is a fine-tuned version of the Baichuan LLM, specializing in chatting. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** baichuan-inc/Baichuan2-7B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name baichuan-2-chat --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 13 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** baichuan-inc/Baichuan2-13B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name baichuan-2-chat --size-in-billions 13 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/baichuan-2.rst ================================================ .. _models_llm_baichuan-2: ======================================== baichuan-2 ======================================== - **Context Length:** 4096 - **Model Name:** baichuan-2 - **Languages:** en, zh - **Abilities:** generate - **Description:** Baichuan2 is an open-source Transformer-based LLM trained on both Chinese and English data.
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** baichuan-inc/Baichuan2-7B-Base - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name baichuan-2 --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 13 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** baichuan-inc/Baichuan2-13B-Base - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name baichuan-2 --size-in-billions 13 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/baichuan-m2.rst ================================================ .. _models_llm_baichuan-m2: ======================================== Baichuan-M2 ======================================== - **Context Length:** 131072 - **Model Name:** Baichuan-M2 - **Languages:** en, zh - **Abilities:** chat, reasoning, hybrid, tools - **Description:** Baichuan-M2-32B is Baichuan AI's medical-enhanced reasoning model, the second medical model released by Baichuan. Designed for real-world medical reasoning tasks, this model builds upon Qwen2.5-32B with an innovative Large Verifier System.
Through domain-specific fine-tuning on real-world medical questions, it achieves breakthrough medical performance while maintaining strong general capabilities. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 32 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** baichuan-inc/Baichuan-M2-32B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Baichuan-M2 --size-in-billions 32 --model-format pytorch --quantization ${quantization} Model Spec 2 (gptq, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 32 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** baichuan-inc/Baichuan-M2-32B-GPTQ-Int4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Baichuan-M2 --size-in-billions 32 --model-format gptq --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/code-llama-instruct.rst ================================================ .. _models_llm_code-llama-instruct: ======================================== code-llama-instruct ======================================== - **Context Length:** 100000 - **Model Name:** code-llama-instruct - **Languages:** en - **Abilities:** chat - **Description:** Code-Llama-Instruct is an instruct-tuned version of the Code-Llama LLM.
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** codellama/CodeLlama-7b-Instruct-hf - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name code-llama-instruct --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 13 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** codellama/CodeLlama-13b-Instruct-hf - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name code-llama-instruct --size-in-billions 13 --model-format pytorch --quantization ${quantization} Model Spec 3 (pytorch, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 34 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** codellama/CodeLlama-34b-Instruct-hf - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name code-llama-instruct --size-in-billions 34 --model-format pytorch --quantization ${quantization} Model Spec 4 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2
- **Model Size (in billions):** 7 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** TheBloke/CodeLlama-7B-Instruct-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name code-llama-instruct --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} Model Spec 5 (ggufv2, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 13 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** TheBloke/CodeLlama-13B-Instruct-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name code-llama-instruct --size-in-billions 13 --model-format ggufv2 --quantization ${quantization} Model Spec 6 (ggufv2, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 34 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** TheBloke/CodeLlama-34B-Instruct-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name code-llama-instruct --size-in-billions 34 --model-format ggufv2 --quantization ${quantization}
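The launch commands for the code-llama-instruct specs differ only in engine, size, format, and quantization. As a rough sketch, and not part of Xinference itself, a hypothetical helper named ``build_launch_command`` could assemble the argument list and reject a quantization that is not among the options listed for a spec. The helper name and the validation step are assumptions made for illustration; only the CLI flags and the quantization names are taken from the specs above.

```python
import shlex

# Quantizations listed above for the ggufv2 specs of code-llama-instruct.
GGUF_QUANTIZATIONS = {
    "Q2_K", "Q3_K_L", "Q3_K_M", "Q3_K_S", "Q4_0", "Q4_K_M",
    "Q4_K_S", "Q5_0", "Q5_K_M", "Q5_K_S", "Q6_K", "Q8_0",
}


def build_launch_command(engine, model_name, size_in_billions,
                         model_format, quantization, allowed_quantizations):
    """Hypothetical helper: return the `xinference launch` argv for one spec."""
    # Fail early if the quantization is not one of the options for this spec.
    if quantization not in allowed_quantizations:
        raise ValueError(
            f"{quantization!r} is not listed for {model_name} ({model_format})"
        )
    return [
        "xinference", "launch",
        "--model-engine", engine,
        "--model-name", model_name,
        "--size-in-billions", str(size_in_billions),
        "--model-format", model_format,
        "--quantization", quantization,
    ]


command = build_launch_command(
    "llama.cpp", "code-llama-instruct", 7, "ggufv2", "Q4_K_M",
    GGUF_QUANTIZATIONS,
)
# Print a shell-ready command line for copy and paste.
print(shlex.join(command))
```

Returning a list instead of a single string keeps the arguments safe to pass to ``subprocess.run`` without shell quoting; ``shlex.join`` is used only to display the command.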
================================================ FILE: doc/source/models/builtin/llm/code-llama-python.rst ================================================ .. _models_llm_code-llama-python: ======================================== code-llama-python ======================================== - **Context Length:** 100000 - **Model Name:** code-llama-python - **Languages:** en - **Abilities:** generate - **Description:** Code-Llama-Python is a fine-tuned version of the Code-Llama LLM, specializing in Python. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/CodeLlama-7B-Python-fp16 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name code-llama-python --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 13 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/CodeLlama-13B-Python-fp16 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name code-llama-python --size-in-billions 13 --model-format pytorch --quantization ${quantization} Model Spec 3 (pytorch, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 34 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:**
TheBloke/CodeLlama-34B-Python-fp16 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name code-llama-python --size-in-billions 34 --model-format pytorch --quantization ${quantization} Model Spec 4 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: llama.cpp - **Model ID:** TheBloke/CodeLlama-7B-Python-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name code-llama-python --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} Model Spec 5 (ggufv2, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 13 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: llama.cpp - **Model ID:** TheBloke/CodeLlama-13B-Python-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name code-llama-python --size-in-billions 13 --model-format ggufv2 --quantization ${quantization} Model Spec 6 (ggufv2, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 34 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**:
llama.cpp - **Model ID:** TheBloke/CodeLlama-34B-Python-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name code-llama-python --size-in-billions 34 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/code-llama.rst ================================================ .. _models_llm_code-llama: ======================================== code-llama ======================================== - **Context Length:** 100000 - **Model Name:** code-llama - **Languages:** en - **Abilities:** generate - **Description:** Code-Llama is an open-source LLM trained by fine-tuning LLaMA2 for generating and discussing code. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/CodeLlama-7B-fp16 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name code-llama --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 13 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/CodeLlama-13B-fp16 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above::
xinference launch --model-engine ${engine} --model-name code-llama --size-in-billions 13 --model-format pytorch --quantization ${quantization} Model Spec 3 (pytorch, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 34 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/CodeLlama-34B-fp16 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name code-llama --size-in-billions 34 --model-format pytorch --quantization ${quantization} Model Spec 4 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: llama.cpp - **Model ID:** TheBloke/CodeLlama-7B-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name code-llama --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} Model Spec 5 (ggufv2, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 13 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: llama.cpp - **Model ID:** TheBloke/CodeLlama-13B-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name code-llama --size-in-billions 13
--model-format ggufv2 --quantization ${quantization} Model Spec 6 (ggufv2, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 34 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: llama.cpp - **Model ID:** TheBloke/CodeLlama-34B-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name code-llama --size-in-billions 34 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/codegeex4.rst ================================================ .. _models_llm_codegeex4: ======================================== codegeex4 ======================================== - **Context Length:** 131072 - **Model Name:** codegeex4 - **Languages:** en, zh - **Abilities:** chat - **Description:** The open-source version of the latest CodeGeeX4 model series. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 9 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** zai-org/codegeex4-all-9b - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model. Remember to replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name codegeex4 --size-in-billions 9 --model-format pytorch --quantization ${quantization} Model Spec 2 (ggufv2, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 9 - **Quantizations:** IQ2_M, IQ3_M, Q4_K_M, Q5_K_M, Q6_K_L, Q8_0 - **Engines**: vLLM,
llama.cpp - **Model ID:** zai-org/codegeex4-all-9b-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name codegeex4 --size-in-billions 9 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/codeqwen1.5-chat.rst ================================================ .. _models_llm_codeqwen1.5-chat: ======================================== codeqwen1.5-chat ======================================== - **Context Length:** 65536 - **Model Name:** codeqwen1.5-chat - **Languages:** en, zh - **Abilities:** chat - **Description:** CodeQwen1.5 is the code-specific version of Qwen1.5. It is a transformer-based, decoder-only language model pretrained on a large amount of code data. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - **Quantizations:** q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** Qwen/CodeQwen1.5-7B-Chat-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name codeqwen1.5-chat --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} Model Spec 2 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/CodeQwen1.5-7B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the
model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name codeqwen1.5-chat --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 3 (awq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/CodeQwen1.5-7B-Chat-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name codeqwen1.5-chat --size-in-billions 7 --model-format awq --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/codeqwen1.5.rst ================================================ .. _models_llm_codeqwen1.5: ======================================== codeqwen1.5 ======================================== - **Context Length:** 65536 - **Model Name:** codeqwen1.5 - **Languages:** en, zh - **Abilities:** generate - **Description:** CodeQwen1.5 is the code-specific version of Qwen1.5. It is a transformer-based, decoder-only language model pretrained on a large amount of code data.
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/CodeQwen1.5-7B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name codeqwen1.5 --size-in-billions 7 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/codeshell-chat.rst ================================================ .. _models_llm_codeshell-chat: ======================================== codeshell-chat ======================================== - **Context Length:** 8194 - **Model Name:** codeshell-chat - **Languages:** en, zh - **Abilities:** chat - **Description:** CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** WisdomShell/CodeShell-7B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name codeshell-chat --size-in-billions 7 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/codeshell.rst ================================================ .. 
_models_llm_codeshell: ======================================== codeshell ======================================== - **Context Length:** 8194 - **Model Name:** codeshell - **Languages:** en, zh - **Abilities:** generate - **Description:** CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** WisdomShell/CodeShell-7B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name codeshell --size-in-billions 7 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/codestral-v0.1.rst ================================================ .. 
_models_llm_codestral-v0.1: ======================================== codestral-v0.1 ======================================== - **Context Length:** 32768 - **Model Name:** codestral-v0.1 - **Languages:** en - **Abilities:** generate - **Description:** Codestral-22B-v0.1 is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 22 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 22 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** mistralai/Codestral-22B-v0.1 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name codestral-v0.1 --size-in-billions 22 --model-format pytorch --quantization ${quantization} Model Spec 2 (ggufv2, 22 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 22 - **Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_K_S, Q4_K_M, Q5_K_S, Q5_K_M, Q6_K, Q8_0 - **Engines**: llama.cpp - **Model ID:** bartowski/Codestral-22B-v0.1-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name codestral-v0.1 --size-in-billions 22 --model-format ggufv2 --quantization ${quantization} Model Spec 3 (mlx, 22 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 22 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/Codestral-22B-v0.1-4bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the
model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name codestral-v0.1 --size-in-billions 22 --model-format mlx --quantization ${quantization} Model Spec 4 (mlx, 22 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 22 - **Quantizations:** 8bit - **Engines**: MLX - **Model ID:** mlx-community/Codestral-22B-v0.1-8bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name codestral-v0.1 --size-in-billions 22 --model-format mlx --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/cogagent.rst ================================================ .. _models_llm_cogagent: ======================================== cogagent ======================================== - **Context Length:** 4096 - **Model Name:** cogagent - **Languages:** en, zh - **Abilities:** chat, vision - **Description:** The CogAgent-9B-20241220 model is based on GLM-4V-9B, a bilingual open-source VLM base model. Through data collection and optimization, multi-stage training, and strategy improvements, CogAgent-9B-20241220 achieves significant advancements in GUI perception, inference prediction accuracy, action space completeness, and task generalizability. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 9 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** zai-org/cogagent-9b-20241220 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name cogagent --size-in-billions 9 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/deepseek-chat.rst ================================================ .. _models_llm_deepseek-chat: ======================================== deepseek-chat ======================================== - **Context Length:** 4096 - **Model Name:** deepseek-chat - **Languages:** en, zh - **Abilities:** chat - **Description:** DeepSeek LLM is an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/deepseek-llm-7b-chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-chat --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 67 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 67 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/deepseek-llm-67b-chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-chat --size-in-billions 67 --model-format pytorch --quantization ${quantization} Model Spec 3 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** TheBloke/deepseek-llm-7B-chat-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-chat --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} Model Spec 4 (ggufv2, 67 Billion) ++++++++++++++++++++++++++++++++++++++++ - 
**Model Format:** ggufv2 - **Model Size (in billions):** 67 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** TheBloke/deepseek-llm-67b-chat-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-chat --size-in-billions 67 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/deepseek-coder-instruct.rst ================================================ .. _models_llm_deepseek-coder-instruct: ======================================== deepseek-coder-instruct ======================================== - **Context Length:** 16384 - **Model Name:** deepseek-coder-instruct - **Languages:** en, zh - **Abilities:** chat - **Description:** deepseek-coder-instruct is a model initialized from deepseek-coder-base and fine-tuned on 2B tokens of instruction data. 
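
Several specifications for this model write the size with an underscore in place of the decimal point (``1_3``, ``6_7``), while the corresponding model IDs use ``1.3b`` and ``6.7b``. The sketch below shows one way to derive the underscore form in a POSIX shell. It assumes that the underscore form is what ``--size-in-billions`` expects, as the launch commands on this page suggest; the variable names are illustrative and not part of Xinference.

```shell
# Illustrative variable names; only the flag name and the values come from this page.
size="1.3"                                     # size as it appears in the model ID (1.3b)
size_arg=$(printf '%s' "$size" | tr '.' '_')   # underscore form used by the specs on this page
echo "--size-in-billions ${size_arg}"
```

Sizes without a fractional part, such as ``7`` or ``33``, pass through this conversion unchanged.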
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 1_3 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 1_3 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/deepseek-coder-1.3b-instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder-instruct --size-in-billions 1_3 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 6_7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 6_7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/deepseek-coder-6.7b-instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder-instruct --size-in-billions 6_7 --model-format pytorch --quantization ${quantization} Model Spec 3 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/deepseek-coder-7b-instruct-v1.5 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder-instruct --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 4 (pytorch, 33 Billion) ++++++++++++++++++++++++++++++++++++++++ - 
**Model Format:** pytorch - **Model Size (in billions):** 33 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/deepseek-coder-33b-instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder-instruct --size-in-billions 33 --model-format pytorch --quantization ${quantization} Model Spec 5 (ggufv2, 1_3 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 1_3 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** TheBloke/deepseek-coder-1.3b-instruct-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder-instruct --size-in-billions 1_3 --model-format ggufv2 --quantization ${quantization} Model Spec 6 (ggufv2, 6_7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 6_7 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** TheBloke/deepseek-coder-6.7B-instruct-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder-instruct --size-in-billions 6_7 --model-format ggufv2 --quantization ${quantization} Model Spec 7 (ggufv2, 7 Billion) 
++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - **Quantizations:** Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** LoneStriker/deepseek-coder-7b-instruct-v1.5-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder-instruct --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} Model Spec 8 (ggufv2, 33 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 33 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** TheBloke/deepseek-coder-33B-instruct-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder-instruct --size-in-billions 33 --model-format ggufv2 --quantization ${quantization} Model Spec 9 (gptq, 1_3 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 1_3 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/deepseek-coder-1.3b-instruct-GPTQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder-instruct --size-in-billions 1_3 --model-format gptq --quantization ${quantization} Model Spec 10 (gptq, 6_7 Billion) 
++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 6_7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/deepseek-coder-6.7B-instruct-GPTQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder-instruct --size-in-billions 6_7 --model-format gptq --quantization ${quantization} Model Spec 11 (gptq, 33 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 33 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/deepseek-coder-33B-instruct-GPTQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder-instruct --size-in-billions 33 --model-format gptq --quantization ${quantization} Model Spec 12 (awq, 1_3 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 1_3 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/deepseek-coder-1.3b-instruct-AWQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder-instruct --size-in-billions 1_3 --model-format awq --quantization ${quantization} Model Spec 13 (awq, 6_7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 6_7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang 
- **Model ID:** TheBloke/deepseek-coder-6.7B-instruct-AWQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder-instruct --size-in-billions 6_7 --model-format awq --quantization ${quantization} Model Spec 14 (awq, 33 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 33 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/deepseek-coder-33B-instruct-AWQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder-instruct --size-in-billions 33 --model-format awq --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/deepseek-coder.rst ================================================ .. _models_llm_deepseek-coder: ======================================== deepseek-coder ======================================== - **Context Length:** 16384 - **Model Name:** deepseek-coder - **Languages:** en, zh - **Abilities:** generate - **Description:** Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 1_3 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 1_3 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/deepseek-coder-1.3b-base - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder --size-in-billions 1_3 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 6_7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 6_7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/deepseek-coder-6.7b-base - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder --size-in-billions 6_7 --model-format pytorch --quantization ${quantization} Model Spec 3 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/deepseek-coder-7b-base-v1.5 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 4 (pytorch, 33 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size 
(in billions):** 33 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/deepseek-coder-33b-base - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder --size-in-billions 33 --model-format pytorch --quantization ${quantization} Model Spec 5 (ggufv2, 1_3 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 1_3 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: llama.cpp - **Model ID:** TheBloke/deepseek-coder-1.3b-base-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder --size-in-billions 1_3 --model-format ggufv2 --quantization ${quantization} Model Spec 6 (ggufv2, 6_7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 6_7 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: llama.cpp - **Model ID:** TheBloke/deepseek-coder-6.7B-base-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder --size-in-billions 6_7 --model-format ggufv2 --quantization ${quantization} Model Spec 7 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - **Quantizations:** Q2_K, 
Q3_K_L, Q3_K_M, Q3_K_S, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: llama.cpp - **Model ID:** dagbs/deepseek-coder-7b-base-v1.5-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} Model Spec 8 (ggufv2, 33 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 33 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: llama.cpp - **Model ID:** TheBloke/deepseek-coder-33B-base-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder --size-in-billions 33 --model-format ggufv2 --quantization ${quantization} Model Spec 9 (gptq, 1_3 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 1_3 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/deepseek-coder-1.3b-base-GPTQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder --size-in-billions 1_3 --model-format gptq --quantization ${quantization} Model Spec 10 (gptq, 6_7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 6_7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** 
TheBloke/deepseek-coder-6.7B-base-GPTQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder --size-in-billions 6_7 --model-format gptq --quantization ${quantization} Model Spec 11 (gptq, 33 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 33 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/deepseek-coder-33B-base-GPTQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder --size-in-billions 33 --model-format gptq --quantization ${quantization} Model Spec 12 (awq, 1_3 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 1_3 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/deepseek-coder-1.3b-base-AWQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder --size-in-billions 1_3 --model-format awq --quantization ${quantization} Model Spec 13 (awq, 6_7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 6_7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/deepseek-coder-6.7B-base-AWQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from 
the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder --size-in-billions 6_7 --model-format awq --quantization ${quantization} Model Spec 14 (awq, 33 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 33 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/deepseek-coder-33B-base-AWQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-coder --size-in-billions 33 --model-format awq --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/deepseek-prover-v2.rst ================================================ .. _models_llm_deepseek-prover-v2: ======================================== deepseek-prover-v2 ======================================== - **Context Length:** 163840 - **Model Name:** deepseek-prover-v2 - **Languages:** en, zh - **Abilities:** chat, reasoning - **Description:** We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. 
This process enables us to integrate both informal and formal mathematical reasoning into a unified model. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 671 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 671 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/DeepSeek-Prover-V2-671B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-prover-v2 --size-in-billions 671 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/DeepSeek-Prover-V2-7B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-prover-v2 --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 3 (mlx, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 7 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/DeepSeek-Prover-V2-7B-4bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-prover-v2 --size-in-billions 7 --model-format mlx --quantization ${quantization} 
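The launch commands above contain two placeholders, ``${engine}`` and ``${quantization}``. As a minimal sketch, the snippet below fills both in for Model Spec 3 (mlx, 7 Billion), taking the values from that spec's **Engines** and **Quantizations** entries. The variable names ``engine``, ``quantization`` and ``launch_cmd`` are illustrative only, and the exact spelling of the engine name that ``xinference launch`` accepts is an assumption (it is copied verbatim from the list above). The command is only printed, not executed, so it can be reviewed first.

```shell
# Illustrative values taken from Model Spec 3 (mlx, 7 Billion) above.
# Assumption: the engine name is passed exactly as listed under "Engines".
engine="MLX"
quantization="4bit"

# Assemble the launch command with both placeholders substituted.
launch_cmd="xinference launch --model-engine ${engine} --model-name deepseek-prover-v2 --size-in-billions 7 --model-format mlx --quantization ${quantization}"

# Print the command for review instead of running it.
echo "${launch_cmd}"
```

Once the printed command looks right, it can be pasted into a shell on a machine where Xinference is already running. For the pytorch specs, whose only listed quantization is ``none``, the same pattern applies with ``quantization="none"``.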
================================================ FILE: doc/source/models/builtin/llm/deepseek-r1-0528-qwen3.rst ================================================ .. _models_llm_deepseek-r1-0528-qwen3: ======================================== deepseek-r1-0528-qwen3 ======================================== - **Context Length:** 131072 - **Model Name:** deepseek-r1-0528-qwen3 - **Languages:** en, zh - **Abilities:** chat, reasoning - **Description:** The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek-R1-0528. In the latest update, DeepSeek R1 has significantly improved its depth of reasoning and inference capabilities by leveraging increased computational resources and introducing algorithmic optimization mechanisms during post-training. The model has demonstrated outstanding performance across various benchmark evaluations, including mathematics, programming, and general logic. Its overall performance is now approaching that of leading models, such as O3 and Gemini 2.5 Pro. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 8 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/DeepSeek-R1-0528-Qwen3-8B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-0528-qwen3 --size-in-billions 8 --model-format pytorch --quantization ${quantization} Model Spec 2 (gptq, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 8 - **Quantizations:** Int4-W4A16, Int8-W8A16 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** QuantTrio/DeepSeek-R1-0528-Qwen3-8B-{quantization} - **Model Hubs**: 
`Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-0528-qwen3 --size-in-billions 8 --model-format gptq --quantization ${quantization} Model Spec 3 (gptq, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 8 - **Quantizations:** Int4-Int8Mix - **Engines**: vLLM, Transformers, SGLang - **Model ID:** QuantTrio/DeepSeek-R1-0528-Qwen3-8B-GPTQ-Int4-Int8Mix - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-0528-qwen3 --size-in-billions 8 --model-format gptq --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/deepseek-r1-0528.rst ================================================ .. _models_llm_deepseek-r1-0528: ======================================== deepseek-r1-0528 ======================================== - **Context Length:** 163840 - **Model Name:** deepseek-r1-0528 - **Languages:** en, zh - **Abilities:** chat, reasoning, tools - **Description:** DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 671 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 671 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/DeepSeek-R1-0528 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-0528 --size-in-billions 671 --model-format pytorch --quantization ${quantization} Model Spec 2 (gptq, 671 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 671 - **Quantizations:** Int4-Int8Mix-Lite, Int4-Int8Mix-Compact, Int4-Int8Mix-Medium - **Engines**: vLLM, Transformers, SGLang - **Model ID:** QuantTrio/DeepSeek-R1-0528-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-0528 --size-in-billions 671 --model-format gptq --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/deepseek-r1-distill-llama.rst ================================================ .. 
_models_llm_deepseek-r1-distill-llama: ======================================== deepseek-r1-distill-llama ======================================== - **Context Length:** 131072 - **Model Name:** deepseek-r1-distill-llama - **Languages:** en, zh - **Abilities:** chat, reasoning - **Description:** deepseek-r1-distill-llama is distilled from DeepSeek-R1 based on Llama Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 8 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/DeepSeek-R1-Distill-Llama-8B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-llama --size-in-billions 8 --model-format pytorch --quantization ${quantization} Model Spec 2 (mlx, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 8 - **Quantizations:** 3bit, 4bit, 6bit, 8bit, bf16 - **Engines**: MLX - **Model ID:** mlx-community/DeepSeek-R1-Distill-Llama-8B-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-llama --size-in-billions 8 --model-format mlx --quantization ${quantization} Model Spec 3 (pytorch, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 70 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/DeepSeek-R1-Distill-Llama-70B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the 
following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-llama --size-in-billions 70 --model-format pytorch --quantization ${quantization} Model Spec 4 (ggufv2, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 70 - **Quantizations:** Q2_K, Q2_K_L, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16 - **Engines**: vLLM, llama.cpp - **Model ID:** unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-llama --size-in-billions 70 --model-format ggufv2 --quantization ${quantization} Model Spec 5 (mlx, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 70 - **Quantizations:** 3bit, 4bit, 6bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/DeepSeek-R1-Distill-Llama-70B-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-llama --size-in-billions 70 --model-format mlx --quantization ${quantization} Model Spec 6 (awq, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 8 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** jakiAJK/DeepSeek-R1-Distill-Llama-8B_AWQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with 
your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-llama --size-in-billions 8 --model-format awq --quantization ${quantization} Model Spec 7 (gptq, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 8 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** jakiAJK/DeepSeek-R1-Distill-Llama-8B_GPTQ-int4 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-llama --size-in-billions 8 --model-format gptq --quantization ${quantization} Model Spec 8 (ggufv2, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 1_5 - **Quantizations:** Q2_K, Q2_K_L, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16 - **Engines**: vLLM, llama.cpp - **Model ID:** unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-llama --size-in-billions 1_5 --model-format ggufv2 --quantization ${quantization} Model Spec 9 (awq, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 70 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** casperhansen/deepseek-r1-distill-llama-70b-awq - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name 
deepseek-r1-distill-llama --size-in-billions 70 --model-format awq --quantization ${quantization} Model Spec 10 (gptq, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 70 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** empirischtech/DeepSeek-R1-Distill-Llama-70B-gptq-4bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-llama --size-in-billions 70 --model-format gptq --quantization ${quantization} Model Spec 11 (ggufv2, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 8 - **Quantizations:** Q2_K, Q2_K_L, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16 - **Engines**: vLLM, llama.cpp - **Model ID:** unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF - **Model Hubs**: `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-llama --size-in-billions 8 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/deepseek-r1-distill-qwen.rst ================================================ .. 
_models_llm_deepseek-r1-distill-qwen: ======================================== deepseek-r1-distill-qwen ======================================== - **Context Length:** 131072 - **Model Name:** deepseek-r1-distill-qwen - **Languages:** en, zh - **Abilities:** chat, reasoning - **Description:** deepseek-r1-distill-qwen is distilled from DeepSeek-R1 based on Qwen Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 1_5 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-qwen --size-in-billions 1_5 --model-format pytorch --quantization ${quantization} Model Spec 2 (ggufv2, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 1_5 - **Quantizations:** Q2_K, Q2_K_L, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-qwen --size-in-billions 1_5 --model-format ggufv2 --quantization ${quantization} Model Spec 3 (mlx, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 1_5 - **Quantizations:** 3bit, 4bit, 6bit, 8bit, bf16 - **Engines**: MLX - **Model ID:** mlx-community/DeepSeek-R1-Distill-Qwen-1.5B-{quantization} - **Model Hubs**: 
`Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-qwen --size-in-billions 1_5 --model-format mlx --quantization ${quantization} Model Spec 4 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/DeepSeek-R1-Distill-Qwen-7B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-qwen --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 5 (gptq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** jakiAJK/DeepSeek-R1-Distill-Qwen-7B_GPTQ-int4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-qwen --size-in-billions 7 --model-format gptq --quantization ${quantization} Model Spec 6 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - **Quantizations:** Q2_K, Q2_K_L, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16 - **Engines**: vLLM, llama.cpp - **Model ID:** unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch 
the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-qwen --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} Model Spec 7 (pytorch, 14 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 14 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/DeepSeek-R1-Distill-Qwen-14B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-qwen --size-in-billions 14 --model-format pytorch --quantization ${quantization} Model Spec 8 (ggufv2, 14 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 14 - **Quantizations:** Q2_K, Q2_K_L, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16 - **Engines**: vLLM, llama.cpp - **Model ID:** unsloth/DeepSeek-R1-Distill-Qwen-14B-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-qwen --size-in-billions 14 --model-format ggufv2 --quantization ${quantization} Model Spec 9 (pytorch, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 32 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/DeepSeek-R1-Distill-Qwen-32B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your 
chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-qwen --size-in-billions 32 --model-format pytorch --quantization ${quantization} Model Spec 10 (ggufv2, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 32 - **Quantizations:** Q2_K, Q2_K_L, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16 - **Engines**: vLLM, llama.cpp - **Model ID:** unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-qwen --size-in-billions 32 --model-format ggufv2 --quantization ${quantization} Model Spec 11 (awq, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 1_5 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** casperhansen/deepseek-r1-distill-qwen-1.5b-awq - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-qwen --size-in-billions 1_5 --model-format awq --quantization ${quantization} Model Spec 12 (gptq, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 1_5 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** jakiAJK/DeepSeek-R1-Distill-Qwen-1.5B_GPTQ-int4 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine 
${engine} --model-name deepseek-r1-distill-qwen --size-in-billions 1_5 --model-format gptq --quantization ${quantization} Model Spec 13 (awq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** jakiAJK/DeepSeek-R1-Distill-Qwen-7B_AWQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-qwen --size-in-billions 7 --model-format awq --quantization ${quantization} Model Spec 14 (mlx, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 7 - **Quantizations:** 3bit, 4bit, 6bit, 8bit, bf16 - **Engines**: MLX - **Model ID:** mlx-community/DeepSeek-R1-Distill-Qwen-7B-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-qwen --size-in-billions 7 --model-format mlx --quantization ${quantization} Model Spec 15 (awq, 14 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 14 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** casperhansen/deepseek-r1-distill-qwen-14b-awq - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-qwen --size-in-billions 14 --model-format awq --quantization ${quantization} Model Spec 16 (mlx, 14 
Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 14 - **Quantizations:** 3bit, 4bit, 6bit, 8bit, bf16 - **Engines**: MLX - **Model ID:** mlx-community/DeepSeek-R1-Distill-Qwen-14B-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-qwen --size-in-billions 14 --model-format mlx --quantization ${quantization} Model Spec 17 (awq, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 32 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** casperhansen/deepseek-r1-distill-qwen-32b-awq - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-qwen --size-in-billions 32 --model-format awq --quantization ${quantization} Model Spec 18 (mlx, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 32 - **Quantizations:** 3bit, 4bit, 6bit, 8bit, bf16 - **Engines**: MLX - **Model ID:** mlx-community/DeepSeek-R1-Distill-Qwen-32B-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-qwen --size-in-billions 32 --model-format mlx --quantization ${quantization} Model Spec 19 (gptq, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in 
billions):** 32 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** tclf90/deepseek-r1-distill-qwen-32b-gptq-int4 - **Model Hubs**: `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1-distill-qwen --size-in-billions 32 --model-format gptq --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/deepseek-r1.rst ================================================ .. _models_llm_deepseek-r1: ======================================== deepseek-r1 ======================================== - **Context Length:** 163840 - **Model Name:** deepseek-r1 - **Languages:** en, zh - **Abilities:** chat, reasoning - **Description:** DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 671 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 671 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/DeepSeek-R1 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1 --size-in-billions 671 --model-format pytorch --quantization ${quantization} Model Spec 2 (awq, 671 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 671 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** cognitivecomputations/DeepSeek-R1-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1 --size-in-billions 671 --model-format awq --quantization ${quantization} Model Spec 3 (ggufv2, 671 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 671 - **Quantizations:** UD-IQ1_S, UD-IQ1_M, UD-IQ2_XXS, UD-Q2_K_XL, Q2_K, Q2_K_L, Q2_K_XS, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, BF16 - **Engines**: vLLM, llama.cpp - **Model ID:** unsloth/DeepSeek-R1-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1 --size-in-billions 671 --model-format ggufv2 --quantization ${quantization} Model Spec 4 (mlx, 671 Billion) 
++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 671 - **Quantizations:** 2bit, 3bit, 4bit - **Engines**: MLX - **Model ID:** mlx-community/DeepSeek-R1-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-r1 --size-in-billions 671 --model-format mlx --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/deepseek-v2-chat-0628.rst ================================================ .. _models_llm_deepseek-v2-chat-0628: ======================================== deepseek-v2-chat-0628 ======================================== - **Context Length:** 128000 - **Model Name:** deepseek-v2-chat-0628 - **Languages:** en, zh - **Abilities:** chat - **Description:** DeepSeek-V2-Chat-0628 is an improved version of DeepSeek-V2-Chat. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 236 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 236 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/DeepSeek-V2-Chat-0628 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-v2-chat-0628 --size-in-billions 236 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/deepseek-v2-chat.rst ================================================ .. 
_models_llm_deepseek-v2-chat: ======================================== deepseek-v2-chat ======================================== - **Context Length:** 128000 - **Model Name:** deepseek-v2-chat - **Languages:** en, zh - **Abilities:** chat - **Description:** DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 16 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 16 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/DeepSeek-V2-Lite-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-v2-chat --size-in-billions 16 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 236 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 236 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** deepseek-ai/DeepSeek-V2-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name deepseek-v2-chat --size-in-billions 236 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/deepseek-v2.5.rst ================================================ .. 
_models_llm_deepseek-v2.5:

========================================
deepseek-v2.5
========================================

- **Context Length:** 128000
- **Model Name:** deepseek-v2.5
- **Languages:** en, zh
- **Abilities:** chat
- **Description:** DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions.

Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 236 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 236
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** deepseek-ai/DeepSeek-V2.5
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name deepseek-v2.5 --size-in-billions 236 --model-format pytorch --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/deepseek-v3-0324.rst
================================================
.. _models_llm_deepseek-v3-0324:

========================================
deepseek-v3-0324
========================================

- **Context Length:** 163840
- **Model Name:** deepseek-v3-0324
- **Languages:** en, zh
- **Abilities:** chat
- **Description:** DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 671 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 671
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** deepseek-ai/DeepSeek-V3-0324
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name deepseek-v3-0324 --size-in-billions 671 --model-format pytorch --quantization ${quantization}

Model Spec 2 (awq, 671 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 671
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** cognitivecomputations/DeepSeek-V3-0324-AWQ
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name deepseek-v3-0324 --size-in-billions 671 --model-format awq --quantization ${quantization}

Model Spec 3 (mlx, 671 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 671
- **Quantizations:** 4bit, 5bit, 8bit
- **Engines**: MLX
- **Model ID:** mlx-community/DeepSeek-V3-0324-{quantization}
- **Model Hubs**: `Hugging Face `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name deepseek-v3-0324 --size-in-billions 671 --model-format mlx --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/deepseek-v3.1.rst
================================================
.. _models_llm_deepseek-v3.1:

========================================
Deepseek-V3.1
========================================

- **Context Length:** 131072
- **Model Name:** Deepseek-V3.1
- **Languages:** en, zh
- **Abilities:** chat, reasoning, hybrid, tools
- **Description:** DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode.

Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 671 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 671
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** deepseek-ai/DeepSeek-V3.1
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Deepseek-V3.1 --size-in-billions 671 --model-format pytorch --quantization ${quantization}

Model Spec 2 (gptq, 671 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 671
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** cpatonn/DeepSeek-V3.1-GPTQ-4bit
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Deepseek-V3.1 --size-in-billions 671 --model-format gptq --quantization ${quantization}

Model Spec 3 (awq, 671 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 671
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** QuantTrio/DeepSeek-V3.1-AWQ
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to
launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Deepseek-V3.1 --size-in-billions 671 --model-format awq --quantization ${quantization}

Model Spec 4 (mlx, 671 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 671
- **Quantizations:** 8bit, 4bit
- **Engines**: MLX
- **Model ID:** mlx-community/DeepSeek-V3.1-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Deepseek-V3.1 --size-in-billions 671 --model-format mlx --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/deepseek-v3.2-exp.rst
================================================
.. _models_llm_deepseek-v3.2-exp:

========================================
DeepSeek-V3.2-Exp
========================================

- **Context Length:** 163840
- **Model Name:** DeepSeek-V3.2-Exp
- **Languages:** en, zh
- **Abilities:** chat, reasoning, hybrid, tools
- **Description:** We are excited to announce the official release of DeepSeek-V3.2-Exp, an experimental version of our model. As an intermediate step toward our next-generation architecture, V3.2-Exp builds upon V3.1-Terminus by introducing DeepSeek Sparse Attention, a sparse attention mechanism designed to explore and validate optimizations for training and inference efficiency in long-context scenarios.
Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 671 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 671
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** deepseek-ai/DeepSeek-V3.2-Exp
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name DeepSeek-V3.2-Exp --size-in-billions 671 --model-format pytorch --quantization ${quantization}

Model Spec 2 (awq, 671 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 671
- **Quantizations:** AWQ, AWQ-Lite
- **Engines**: Transformers
- **Model ID:** QuantTrio/DeepSeek-V3.2-Exp-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name DeepSeek-V3.2-Exp --size-in-billions 671 --model-format awq --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/deepseek-v3.2.rst
================================================
..
_models_llm_deepseek-v3.2:

========================================
DeepSeek-V3.2
========================================

- **Context Length:** 163840
- **Model Name:** DeepSeek-V3.2
- **Languages:** en, zh
- **Abilities:** chat, reasoning, hybrid, tools
- **Description:** We introduce DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance.

Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 671 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 671
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** deepseek-ai/DeepSeek-V3.2
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name DeepSeek-V3.2 --size-in-billions 671 --model-format pytorch --quantization ${quantization}

Model Spec 2 (awq, 671 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 671
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers
- **Model ID:** QuantTrio/DeepSeek-V3.2-AWQ
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name DeepSeek-V3.2 --size-in-billions 671 --model-format awq --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/deepseek-v3.rst
================================================
..
_models_llm_deepseek-v3:

========================================
deepseek-v3
========================================

- **Context Length:** 163840
- **Model Name:** deepseek-v3
- **Languages:** en, zh
- **Abilities:** chat
- **Description:** DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.

Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 671 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 671
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** deepseek-ai/DeepSeek-V3
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name deepseek-v3 --size-in-billions 671 --model-format pytorch --quantization ${quantization}

Model Spec 2 (awq, 671 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 671
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** cognitivecomputations/DeepSeek-V3-AWQ
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name deepseek-v3 --size-in-billions 671 --model-format awq --quantization ${quantization}

Model Spec 3 (ggufv2, 671 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 671
- **Quantizations:** Q2_K_L, Q2_K_XS, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0
- **Engines**: vLLM, llama.cpp
- **Model ID:** unsloth/DeepSeek-V3-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the
model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name deepseek-v3 --size-in-billions 671 --model-format ggufv2 --quantization ${quantization}

Model Spec 4 (mlx, 671 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 671
- **Quantizations:** 3bit, 4bit
- **Engines**: MLX
- **Model ID:** mlx-community/DeepSeek-V3-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name deepseek-v3 --size-in-billions 671 --model-format mlx --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/deepseek-vl2.rst
================================================
.. _models_llm_deepseek-vl2:

========================================
deepseek-vl2
========================================

- **Context Length:** 4096
- **Model Name:** deepseek-vl2
- **Languages:** en, zh
- **Abilities:** chat, vision
- **Description:** DeepSeek-VL2 is an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL. DeepSeek-VL2 demonstrates superior capabilities across various tasks, including but not limited to visual question answering, optical character recognition, document/table/chart understanding, and visual grounding.
Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 27 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 27
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** deepseek-ai/deepseek-vl2
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name deepseek-vl2 --size-in-billions 27 --model-format pytorch --quantization ${quantization}

Model Spec 2 (pytorch, 16 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 16
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** deepseek-ai/deepseek-vl2-small
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name deepseek-vl2 --size-in-billions 16 --model-format pytorch --quantization ${quantization}

Model Spec 3 (pytorch, 3 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 3
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** deepseek-ai/deepseek-vl2-tiny
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name deepseek-vl2 --size-in-billions 3 --model-format pytorch --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/deepseek.rst
================================================
.. _models_llm_deepseek:

========================================
deepseek
========================================

- **Context Length:** 4096
- **Model Name:** deepseek
- **Languages:** en, zh
- **Abilities:** generate
- **Description:** DeepSeek LLM is trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese.

Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 7
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** deepseek-ai/deepseek-llm-7b-base
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name deepseek --size-in-billions 7 --model-format pytorch --quantization ${quantization}

Model Spec 2 (pytorch, 67 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 67
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** deepseek-ai/deepseek-llm-67b-base
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name deepseek --size-in-billions 67 --model-format pytorch --quantization ${quantization}

Model Spec 3 (ggufv2, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 7
- **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0
- **Engines**: llama.cpp
- **Model ID:** TheBloke/deepseek-llm-7B-chat-GGUF
- **Model Hubs**: `Hugging Face `__

Execute
the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name deepseek --size-in-billions 7 --model-format ggufv2 --quantization ${quantization}

Model Spec 4 (ggufv2, 67 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 67
- **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0
- **Engines**: llama.cpp
- **Model ID:** TheBloke/deepseek-llm-67b-chat-GGUF
- **Model Hubs**: `Hugging Face `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name deepseek --size-in-billions 67 --model-format ggufv2 --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/dianjin-r1.rst
================================================
.. _models_llm_dianjin-r1:

========================================
DianJin-R1
========================================

- **Context Length:** 32768
- **Model Name:** DianJin-R1
- **Languages:** en, zh
- **Abilities:** chat, tools
- **Description:** Tongyi DianJin is a financial intelligence solution platform built by Alibaba Cloud, dedicated to providing financial business developers with a convenient artificial intelligence application development environment.
Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 7
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** DianJin/DianJin-R1-7B
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name DianJin-R1 --size-in-billions 7 --model-format pytorch --quantization ${quantization}

Model Spec 2 (pytorch, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 32
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** DianJin/DianJin-R1-32B
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name DianJin-R1 --size-in-billions 32 --model-format pytorch --quantization ${quantization}

Model Spec 3 (ggufv2, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 7
- **Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, IQ4_XS, Q4_K_S, Q4_K_M, Q5_K_S, Q5_K_M, Q6_K, Q8_0, f16
- **Engines**: vLLM, llama.cpp
- **Model ID:** mradermacher/DianJin-R1-7B-GGUF
- **Model Hubs**: `Hugging Face `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name DianJin-R1 --size-in-billions 7 --model-format ggufv2 --quantization ${quantization}

Model Spec 4 (ggufv2, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model
Size (in billions):** 7
- **Quantizations:** i1-IQ1_S, i1-IQ1_M, i1-IQ2_XXS, i1-IQ2_XS, i1-IQ2_S, i1-IQ2_M, i1-Q2_K_S, i1-Q2_K, i1-IQ3_XXS, i1-IQ3_XS, i1-Q3_K_S, i1-IQ3_S, i1-IQ3_M, i1-Q3_K_M, i1-Q3_K_L, i1-IQ4_XS, i1-IQ4_NL, i1-Q4_0, i1-Q4_K_S, i1-Q4_K_M, i1-Q4_1, i1-Q5_K_S, i1-Q5_K_M, i1-Q6_K
- **Engines**: vLLM, llama.cpp
- **Model ID:** mradermacher/DianJin-R1-7B-i1-GGUF
- **Model Hubs**: `Hugging Face `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name DianJin-R1 --size-in-billions 7 --model-format ggufv2 --quantization ${quantization}

Model Spec 5 (ggufv2, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 32
- **Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, IQ4_XS, Q4_K_S, Q4_K_M, Q5_K_S, Q5_K_M, Q6_K, Q8_0
- **Engines**: vLLM, llama.cpp
- **Model ID:** mradermacher/DianJin-R1-32B-GGUF
- **Model Hubs**: `Hugging Face `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name DianJin-R1 --size-in-billions 32 --model-format ggufv2 --quantization ${quantization}

Model Spec 6 (ggufv2, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 32
- **Quantizations:** i1-IQ1_S, i1-IQ1_M, i1-IQ2_XXS, i1-IQ2_XS, i1-IQ2_S, i1-IQ2_M, i1-Q2_K_S, i1-Q2_K, i1-IQ3_XXS, i1-IQ3_XS, i1-Q3_K_S, i1-IQ3_S, i1-IQ3_M, i1-Q3_K_M, i1-Q3_K_L, i1-IQ4_XS, i1-Q4_0, i1-Q4_K_S, i1-Q4_K_M, i1-Q4_1, i1-Q5_K_S, i1-Q5_K_M, i1-Q6_K
- **Engines**: vLLM, llama.cpp
- **Model ID:** mradermacher/DianJin-R1-32B-i1-GGUF
- **Model Hubs**: `Hugging Face `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen
quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name DianJin-R1 --size-in-billions 32 --model-format ggufv2 --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/ernie4.5.rst
================================================
.. _models_llm_ernie4.5:

========================================
Ernie4.5
========================================

- **Context Length:** 131072
- **Model Name:** Ernie4.5
- **Languages:** en, zh
- **Abilities:** chat
- **Description:** ERNIE 4.5 is a new family of large-scale multimodal models comprising 10 distinct variants.

Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 0_3 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 0_3
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** baidu/ERNIE-4.5-0.3B-PT
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Ernie4.5 --size-in-billions 0_3 --model-format pytorch --quantization ${quantization}

Model Spec 2 (ggufv2, 0_3 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 0_3
- **Quantizations:** Q2_K, Q2_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K, Q8_0, F16
- **Engines**: vLLM, llama.cpp
- **Model ID:** unsloth/ERNIE-4.5-0.3B-PT-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Ernie4.5 --size-in-billions 0_3 --model-format ggufv2 --quantization ${quantization}

Model Spec 3 (mlx, 0_3
Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 0_3
- **Quantizations:** 4bit, bf16
- **Engines**: MLX
- **Model ID:** mlx-community/ERNIE-4.5-0.3B-PT-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Ernie4.5 --size-in-billions 0_3 --model-format mlx --quantization ${quantization}

Model Spec 4 (pytorch, 21 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 21
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** baidu/ERNIE-4.5-21B-A3B-Base-PT
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Ernie4.5 --size-in-billions 21 --model-format pytorch --quantization ${quantization}

Model Spec 5 (ggufv2, 21 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 21
- **Quantizations:** Q2_K, Q2_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K, Q8_0, BF16
- **Engines**: vLLM, llama.cpp
- **Model ID:** unsloth/ERNIE-4.5-21B-A3B-PT-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Ernie4.5 --size-in-billions 21 --model-format ggufv2 --quantization ${quantization}

Model Spec 6 (mlx, 21 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 21
-
**Quantizations:** 4bit, 5bit, 6bit, 8bit, bf16
- **Engines**: MLX
- **Model ID:** mlx-community/ERNIE-4.5-21B-A3B-PT-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Ernie4.5 --size-in-billions 21 --model-format mlx --quantization ${quantization}

Model Spec 7 (pytorch, 300 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 300
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** baidu/ERNIE-4.5-300B-A47B-PT
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Ernie4.5 --size-in-billions 300 --model-format pytorch --quantization ${quantization}

Model Spec 8 (ggufv2, 300 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 300
- **Quantizations:** Q2_K, Q4_K_M, Q6_K, Q8_0
- **Engines**: vLLM, llama.cpp
- **Model ID:** unsloth/ERNIE-4.5-300B-A47B-PT-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Ernie4.5 --size-in-billions 300 --model-format ggufv2 --quantization ${quantization}

Model Spec 9 (mlx, 300 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 300
- **Quantizations:** 4bit
- **Engines**: MLX
- **Model ID:** mlx-community/ERNIE-4.5-300B-47B-PT-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope
`__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Ernie4.5 --size-in-billions 300 --model-format mlx --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/fin-r1.rst
================================================
.. _models_llm_fin-r1:

========================================
fin-r1
========================================

- **Context Length:** 131072
- **Model Name:** fin-r1
- **Languages:** en, zh
- **Abilities:** chat
- **Description:** Fin-R1 is a large language model specifically designed for the field of financial reasoning.

Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 7
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** SUFE-AIFLM-Lab/Fin-R1
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name fin-r1 --size-in-billions 7 --model-format pytorch --quantization ${quantization}

Model Spec 2 (gptq, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 7
- **Quantizations:** Int4, Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** JunHowie/Fin-R1-GPTQ-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name fin-r1 --size-in-billions 7 --model-format gptq --quantization ${quantization}
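
As an illustration of the placeholder substitution described above, the following is a minimal sketch that fills in the launch command for the fin-r1 gptq spec with ``vLLM`` and ``Int4``, both taken from the bullets of that spec. The helper names ``build_launch_command`` and ``resolve_model_id`` are hypothetical and not part of Xinference. The sketch also assumes that the ``{quantization}`` placeholder in a Model ID is filled by plain string substitution, which this page does not confirm.

```python
import shlex


def build_launch_command(engine, model_name, size_in_billions, model_format, quantization):
    """Return the argument list for an ``xinference launch`` call (hypothetical helper)."""
    return [
        "xinference", "launch",
        "--model-engine", engine,
        "--model-name", model_name,
        "--size-in-billions", str(size_in_billions),
        "--model-format", model_format,
        "--quantization", quantization,
    ]


def resolve_model_id(template, quantization):
    """Fill the {quantization} placeholder of a Model ID (assumed plain substitution)."""
    return template.replace("{quantization}", quantization)


# Values copied from the fin-r1 gptq spec: engine "vLLM", quantization "Int4".
command = build_launch_command("vLLM", "fin-r1", 7, "gptq", "Int4")
print(shlex.join(command))
print(resolve_model_id("JunHowie/Fin-R1-GPTQ-{quantization}", "Int4"))
```

The engine spelling is copied from the Engines bullet. Whether the CLI accepts other spellings or letter cases is not covered here.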
Model Spec 3 (fp8, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** fp8
- **Model Size (in billions):** 7
- **Quantizations:** FP8
- **Engines**: vLLM, SGLang
- **Model ID:** JunHowie/Fin-R1-FP8-Dynamic
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the engines listed above and ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name fin-r1 --size-in-billions 7 --model-format fp8 --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/gemma-3-1b-it.rst
================================================
.. _models_llm_gemma-3-1b-it:

========================================
gemma-3-1b-it
========================================

- **Context Length:** 32768
- **Model Name:** gemma-3-1b-it
- **Languages:** en
- **Abilities:** chat
- **Description:** Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 1 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 1 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** google/gemma-3-1b-it - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name gemma-3-1b-it --size-in-billions 1 --model-format pytorch --quantization ${quantization} Model Spec 2 (mlx, 1 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 1 - **Quantizations:** 4bit, 6bit, 8bit, fp16 - **Engines**: MLX - **Model ID:** mlx-community/gemma-3-1b-it-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name gemma-3-1b-it --size-in-billions 1 --model-format mlx --quantization ${quantization} Model Spec 3 (ggufv2, 1 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 1 - **Quantizations:** IQ2_M, IQ3_M, IQ3_XS, IQ3_XXS, IQ4_NL, IQ4_XS, Q2_K, Q2_K_L, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_L, Q4_K_M, Q4_K_S, Q5_K_L, Q5_K_M, Q5_K_S, Q6_K, Q6_K_L, Q8_0, bf16 - **Engines**: vLLM, llama.cpp - **Model ID:** bartowski/google_gemma-3-1b-it-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name gemma-3-1b-it --size-in-billions 1 --model-format ggufv2 --quantization 
${quantization} ================================================ FILE: doc/source/models/builtin/llm/gemma-3-it.rst ================================================ .. _models_llm_gemma-3-it: ======================================== gemma-3-it ======================================== - **Context Length:** 131072 - **Model Name:** gemma-3-it - **Languages:** en - **Abilities:** chat, vision - **Description:** Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 4 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** google/gemma-3-4b-it - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name gemma-3-it --size-in-billions 4 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 12 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 12 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** google/gemma-3-12b-it - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name gemma-3-it --size-in-billions 12 --model-format pytorch --quantization ${quantization} Model Spec 3 (pytorch, 27 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 27 - **Quantizations:** none - **Engines**: vLLM, Transformers, 
SGLang - **Model ID:** google/gemma-3-27b-it - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name gemma-3-it --size-in-billions 27 --model-format pytorch --quantization ${quantization} Model Spec 4 (mlx, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 4 - **Quantizations:** 4bit, 6bit, 8bit, fp16 - **Engines**: MLX - **Model ID:** mlx-community/gemma-3-4b-it-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name gemma-3-it --size-in-billions 4 --model-format mlx --quantization ${quantization} Model Spec 5 (mlx, 12 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 12 - **Quantizations:** 4bit, 6bit, 8bit, fp16 - **Engines**: MLX - **Model ID:** mlx-community/gemma-3-12b-it-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name gemma-3-it --size-in-billions 12 --model-format mlx --quantization ${quantization} Model Spec 6 (mlx, 27 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 27 - **Quantizations:** 4bit, 6bit, 8bit, fp16 - **Engines**: MLX - **Model ID:** mlx-community/gemma-3-27b-it-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace 
``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name gemma-3-it --size-in-billions 27 --model-format mlx --quantization ${quantization} Model Spec 7 (ggufv2, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 4 - **Quantizations:** IQ2_M, IQ3_M, IQ3_XS, IQ3_XXS, IQ4_NL, IQ4_XS, Q2_K, Q2_K_L, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_L, Q4_K_M, Q4_K_S, Q5_K_L, Q5_K_M, Q5_K_S, Q6_K, Q6_K_L, Q8_0, bf16 - **Engines**: llama.cpp - **Model ID:** bartowski/google_gemma-3-4b-it-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name gemma-3-it --size-in-billions 4 --model-format ggufv2 --quantization ${quantization} Model Spec 8 (ggufv2, 12 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 12 - **Quantizations:** IQ2_M, IQ3_M, IQ3_XS, IQ3_XXS, IQ4_NL, IQ4_XS, Q2_K, Q2_K_L, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_L, Q4_K_M, Q4_K_S, Q5_K_L, Q5_K_M, Q5_K_S, Q6_K, Q6_K_L, Q8_0, bf16 - **Engines**: llama.cpp - **Model ID:** bartowski/google_gemma-3-12b-it-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name gemma-3-it --size-in-billions 12 --model-format ggufv2 --quantization ${quantization} Model Spec 9 (ggufv2, 27 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 27 - **Quantizations:** IQ2_M, IQ3_M, IQ3_XS, IQ3_XXS, IQ4_NL, IQ4_XS, Q2_K, Q2_K_L, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, 
Q4_1, Q4_K_L, Q4_K_M, Q4_K_S, Q5_K_L, Q5_K_M, Q5_K_S, Q6_K, Q6_K_L, Q8_0, bf16 - **Engines**: llama.cpp - **Model ID:** bartowski/google_gemma-3-27b-it-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name gemma-3-it --size-in-billions 27 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/glm-4.1v-thinking.rst ================================================ .. _models_llm_glm-4.1v-thinking: ======================================== glm-4.1v-thinking ======================================== - **Context Length:** 65536 - **Model Name:** glm-4.1v-thinking - **Languages:** en, zh - **Abilities:** chat, vision, reasoning, tools - **Description:** GLM-4.1V-9B-Thinking, designed to explore the upper limits of reasoning in vision-language models. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 9 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** zai-org/GLM-4.1V-9B-Thinking - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-4.1v-thinking --size-in-billions 9 --model-format pytorch --quantization ${quantization} Model Spec 2 (awq, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 9 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/GLM-4.1V-9B-Thinking-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-4.1v-thinking --size-in-billions 9 --model-format awq --quantization ${quantization} Model Spec 3 (gptq, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 9 - **Quantizations:** Int4-Int8Mix - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/GLM-4.1V-9B-Thinking-GPTQ-Int4-Int8Mix - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-4.1v-thinking --size-in-billions 9 --model-format gptq --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/glm-4.5.rst 
================================================ .. _models_llm_glm-4.5: ======================================== glm-4.5 ======================================== - **Context Length:** 131072 - **Model Name:** glm-4.5 - **Languages:** en, zh - **Abilities:** chat, reasoning, hybrid, tools - **Description:** The GLM-4.5 series models are foundation models designed for intelligent agents. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 355 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 355 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** zai-org/GLM-4.5 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 355 --model-format pytorch --quantization ${quantization} Model Spec 2 (fp8, 355 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 355 - **Quantizations:** FP8 - **Engines**: vLLM - **Model ID:** zai-org/GLM-4.5-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 355 --model-format fp8 --quantization ${quantization} Model Spec 3 (gptq, 355 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 355 - **Quantizations:** Int4-Int8Mix - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/GLM-4.5-GPTQ-Int4-Int8Mix - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization 
method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 355 --model-format gptq --quantization ${quantization} Model Spec 4 (awq, 355 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 355 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/GLM-4.5-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 355 --model-format awq --quantization ${quantization} Model Spec 5 (mlx, 355 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 355 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/GLM-4.5-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 355 --model-format mlx --quantization ${quantization} Model Spec 6 (pytorch, 106 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 106 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** zai-org/GLM-4.5-Air - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 106 --model-format pytorch --quantization ${quantization} Model Spec 7 (fp8, 106 Billion) 
++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 106 - **Quantizations:** FP8 - **Engines**: vLLM - **Model ID:** zai-org/GLM-4.5-Air-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 106 --model-format fp8 --quantization ${quantization} Model Spec 8 (gptq, 106 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 106 - **Quantizations:** Int4-Int8Mix - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/GLM-4.5-Air-GPTQ-Int4-Int8Mix - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 106 --model-format gptq --quantization ${quantization} Model Spec 9 (awq, 106 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 106 - **Quantizations:** AWQ-FP16Mix - **Engines**: Transformers - **Model ID:** QuantTrio/GLM-4.5-Air-AWQ-FP16Mix - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 106 --model-format awq --quantization ${quantization} Model Spec 10 (awq, 106 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 106 - **Quantizations:** 4bit - **Engines**: vLLM, Transformers - **Model ID:** cpatonn-mirror/GLM-4.5-Air-AWQ-4bit - **Model 
Hubs**: `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 106 --model-format awq --quantization ${quantization} Model Spec 11 (mlx, 106 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 106 - **Quantizations:** 2bit, 3bit, 4bit, 5bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/GLM-4.5-Air-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 106 --model-format mlx --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/glm-4.5v.rst ================================================ .. _models_llm_glm-4.5v: ======================================== glm-4.5v ======================================== - **Context Length:** 131072 - **Model Name:** glm-4.5v - **Languages:** en, zh - **Abilities:** chat, vision, reasoning, tools - **Description:** GLM-4.5V is based on ZhipuAI’s next-generation flagship text foundation model GLM-4.5-Air (106B parameters, 12B active). It continues the technical approach of GLM-4.1V-Thinking, achieving SOTA performance among models of the same scale on 42 public vision-language benchmarks. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 106 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 106 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** zai-org/GLM-4.5V - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-4.5v --size-in-billions 106 --model-format pytorch --quantization ${quantization} Model Spec 2 (fp8, 106 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 106 - **Quantizations:** FP8 - **Engines**: vLLM, Transformers - **Model ID:** zai-org/GLM-4.5V-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-4.5v --size-in-billions 106 --model-format fp8 --quantization ${quantization} Model Spec 3 (awq, 106 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 106 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/GLM-4.5V-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-4.5v --size-in-billions 106 --model-format awq --quantization ${quantization} Model Spec 4 (mlx, 106 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 106 - **Quantizations:** 3bit, 4bit, 5bit, 6bit, 8bit - **Engines**: Transformers, MLX - 
**Model ID:** mlx-community/GLM-4.5V-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-4.5v --size-in-billions 106 --model-format mlx --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/glm-4.6.rst ================================================ .. _models_llm_glm-4.6: ======================================== GLM-4.6 ======================================== - **Context Length:** 202752 - **Model Name:** GLM-4.6 - **Languages:** en, zh - **Abilities:** chat, reasoning, hybrid, tools - **Description:** GLM-4.6 significantly enhances context length (up to 200K tokens), code generation, reasoning with tool use, agent capabilities, and human-aligned writing compared to GLM-4.5. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 355 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 355 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** zai-org/GLM-4.6 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name GLM-4.6 --size-in-billions 355 --model-format pytorch --quantization ${quantization} Model Spec 2 (fp8, 355 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 355 - **Quantizations:** FP8 - **Engines**: vLLM - **Model ID:** zai-org/GLM-4.6-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method 
from the options listed above:: xinference launch --model-engine ${engine} --model-name GLM-4.6 --size-in-billions 355 --model-format fp8 --quantization ${quantization} Model Spec 3 (gptq, 355 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 355 - **Quantizations:** Int4-Int8Mix - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/GLM-4.6-GPTQ-Int4-Int8Mix - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name GLM-4.6 --size-in-billions 355 --model-format gptq --quantization ${quantization} Model Spec 4 (awq, 355 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 355 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/GLM-4.6-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name GLM-4.6 --size-in-billions 355 --model-format awq --quantization ${quantization} Model Spec 5 (mlx, 355 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 355 - **Quantizations:** 4bit, 5bit - **Engines**: MLX - **Model ID:** mlx-community/GLM-4.6-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name GLM-4.6 --size-in-billions 355 --model-format mlx --quantization ${quantization} ================================================ FILE: 
doc/source/models/builtin/llm/glm-4.7-flash.rst ================================================ .. _models_llm_glm-4.7-flash: ======================================== GLM-4.7-Flash ======================================== - **Context Length:** 202752 - **Model Name:** GLM-4.7-Flash - **Languages:** en, zh - **Abilities:** chat, reasoning, hybrid, tools - **Description:** GLM-4.7-Flash is a 30B-A3B MoE model. As the strongest model in the 30B class, it offers a lightweight deployment option that balances performance and efficiency. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 30 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** zai-org/GLM-4.7-Flash - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name GLM-4.7-Flash --size-in-billions 30 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/glm-4.7.rst ================================================ .. _models_llm_glm-4.7: ======================================== GLM-4.7 ======================================== - **Context Length:** 202752 - **Model Name:** GLM-4.7 - **Languages:** en, zh - **Abilities:** chat, reasoning, hybrid, tools - **Description:** GLM-4.7 significantly advances core and multilingual agentic coding, UI/vibe coding, tool use, and complex reasoning—outperforming GLM-4.6 across benchmarks like SWE-bench, Terminal Bench 2.0, τ²-Bench, and HLE—while also improving chat, creative writing, and role-play. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 355 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 355 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** zai-org/GLM-4.7 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name GLM-4.7 --size-in-billions 355 --model-format pytorch --quantization ${quantization} Model Spec 2 (fp8, 355 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 355 - **Quantizations:** FP8 - **Engines**: vLLM - **Model ID:** zai-org/GLM-4.7-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name GLM-4.7 --size-in-billions 355 --model-format fp8 --quantization ${quantization} Model Spec 3 (gptq, 355 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 355 - **Quantizations:** Int4-Int8Mix - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/GLM-4.7-GPTQ-Int4-Int8Mix - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name GLM-4.7 --size-in-billions 355 --model-format gptq --quantization ${quantization} Model Spec 4 (awq, 355 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 355 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** 
QuantTrio/GLM-4.7-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name GLM-4.7 --size-in-billions 355 --model-format awq --quantization ${quantization} Model Spec 5 (mlx, 355 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 355 - **Quantizations:** 4bit, 6bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/GLM-4.7-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name GLM-4.7 --size-in-billions 355 --model-format mlx --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/glm-4v.rst ================================================ .. _models_llm_glm-4v: ======================================== glm-4v ======================================== - **Context Length:** 8192 - **Model Name:** glm-4v - **Languages:** en, zh - **Abilities:** chat, vision - **Description:** GLM4 is the open source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 9 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** zai-org/glm-4v-9b - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-4v --size-in-billions 9 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/glm-5.rst ================================================ .. _models_llm_glm-5: ======================================== glm-5 ======================================== - **Context Length:** 202752 - **Model Name:** glm-5 - **Languages:** en, zh - **Abilities:** chat, vision, tools, reasoning - **Description:** We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity. Reinforcement learning aims to bridge the gap between competence and excellence in pre-trained models. However, deploying it at scale for LLMs is a challenge due to the RL training inefficiency. To this end, we developed slime, a novel asynchronous RL infrastructure that substantially improves training throughput and efficiency, enabling more fine-grained post-training iterations. 
With advances in both pre-training and post-training, GLM-5 delivers significant improvement compared to GLM-4.7 across a wide range of academic benchmarks and achieves best-in-class performance among all open-source models in the world on reasoning, coding, and agentic tasks, closing the gap with frontier models. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 744 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 744 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** zai-org/GLM-5 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-5 --size-in-billions 744 --model-format pytorch --quantization ${quantization} Model Spec 2 (fp8, 744 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 744 - **Quantizations:** FP8 - **Engines**: vLLM - **Model ID:** zai-org/GLM-5-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-5 --size-in-billions 744 --model-format fp8 --quantization ${quantization} Model Spec 3 (ggufv2, 744 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 744 - **Quantizations:** UD-TQ1_0 - **Engines**: vLLM, llama.cpp - **Model ID:** unsloth/GLM-5-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-5 
--size-in-billions 744 --model-format ggufv2 --quantization ${quantization} Model Spec 4 (mlx, 744 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 744 - **Quantizations:** 4bit, 8bit-MXFP8 - **Engines**: MLX - **Model ID:** mlx-community/GLM-5-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-5 --size-in-billions 744 --model-format mlx --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/glm-edge-chat.rst ================================================ .. _models_llm_glm-edge-chat: ======================================== glm-edge-chat ======================================== - **Context Length:** 8192 - **Model Name:** glm-edge-chat - **Languages:** en, zh - **Abilities:** chat - **Description:** The GLM-Edge series is our attempt to face the end-side real-life scenarios, which consists of two sizes of large-language dialogue models and multimodal comprehension models (GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, GLM-Edge-V-5B). Among them, the 1.5B / 2B model is mainly for platforms such as mobile phones and cars, and the 4B / 5B model is mainly for platforms such as PCs. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 1_5 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** zai-org/glm-edge-1.5b-chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-edge-chat --size-in-billions 1_5 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 4 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** zai-org/glm-edge-4b-chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-edge-chat --size-in-billions 4 --model-format pytorch --quantization ${quantization} Model Spec 3 (ggufv2, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 1_5 - **Quantizations:** Q4_0, Q4_1, Q4_K, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** zai-org/glm-edge-1.5b-chat-gguf - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-edge-chat --size-in-billions 1_5 --model-format ggufv2 --quantization ${quantization} Model Spec 4 (ggufv2, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model 
Format:** ggufv2 - **Model Size (in billions):** 1_5 - **Quantizations:** F16 - **Engines**: vLLM, llama.cpp - **Model ID:** zai-org/glm-edge-1.5b-chat-gguf - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-edge-chat --size-in-billions 1_5 --model-format ggufv2 --quantization ${quantization} Model Spec 5 (ggufv2, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 4 - **Quantizations:** Q4_0, Q4_1, Q4_K, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** zai-org/glm-edge-4b-chat-gguf - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-edge-chat --size-in-billions 4 --model-format ggufv2 --quantization ${quantization} Model Spec 6 (ggufv2, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 4 - **Quantizations:** F16 - **Engines**: vLLM, llama.cpp - **Model ID:** zai-org/glm-edge-4b-chat-gguf - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm-edge-chat --size-in-billions 4 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/glm4-0414.rst ================================================ .. 
_models_llm_glm4-0414: ======================================== glm4-0414 ======================================== - **Context Length:** 32768 - **Model Name:** glm4-0414 - **Languages:** en, zh - **Abilities:** chat, tools - **Description:** The GLM family welcomes new members, the GLM-4-32B-0414 series models, featuring 32 billion parameters. Its performance is comparable to OpenAI’s GPT series and DeepSeek’s V3/R1 series Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 9 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** zai-org/GLM-4-9B-0414 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm4-0414 --size-in-billions 9 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 32 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** zai-org/GLM-4-32B-0414 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm4-0414 --size-in-billions 32 --model-format pytorch --quantization ${quantization} Model Spec 3 (mlx, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 9 - **Quantizations:** 4bit, 6bit, 8bit, bf16 - **Engines**: MLX - **Model ID:** mlx-community/GLM-4-9B-0414-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, 
remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm4-0414 --size-in-billions 9 --model-format mlx --quantization ${quantization} Model Spec 4 (mlx, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 32 - **Quantizations:** 4bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/GLM-4-32B-0414-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm4-0414 --size-in-billions 32 --model-format mlx --quantization ${quantization} Model Spec 5 (ggufv2, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 9 - **Quantizations:** IQ2_M, IQ3_M, IQ3_XS, IQ3_XXS, IQ4_NL, IQ4_XS, Q2_K, Q2_K_L, Q3_K_L, Q3_K_M, Q3_K_S, Q3_K_XL, Q4_0, Q4_1, Q4_K_L, Q4_K_M, Q4_K_S, Q5_K_L, Q5_K_M, Q5_K_S, Q6_K, Q6_K_L, Q8_0, bf16 - **Engines**: vLLM, llama.cpp - **Model ID:** bartowski/THUDM_GLM-4-9B-0414-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm4-0414 --size-in-billions 9 --model-format ggufv2 --quantization ${quantization} Model Spec 6 (ggufv2, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 32 - **Quantizations:** IQ2_M, IQ2_S, IQ2_XS, IQ3_M, IQ3_XS, IQ3_XXS, IQ4_NL, IQ4_XS, Q2_K, Q2_K_L, Q3_K_L, Q3_K_M, Q3_K_S, Q3_K_XL, Q4_0, Q4_1, Q4_K_L, Q4_K_M, Q4_K_S, Q5_K_L, Q5_K_M, Q5_K_S, Q6_K, Q6_K_L, Q8_0 - **Engines**: vLLM, llama.cpp - **Model 
ID:** bartowski/THUDM_GLM-4-9B-0414-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm4-0414 --size-in-billions 32 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/glm4-chat-1m.rst ================================================ .. _models_llm_glm4-chat-1m: ======================================== glm4-chat-1m ======================================== - **Context Length:** 1048576 - **Model Name:** glm4-chat-1m - **Languages:** en, zh - **Abilities:** chat, tools - **Description:** GLM4 is the open source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 9 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** zai-org/glm-4-9b-chat-1m-hf - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm4-chat-1m --size-in-billions 9 --model-format pytorch --quantization ${quantization} Model Spec 2 (ggufv2, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 9 - **Quantizations:** Q2_K, IQ3_XS, IQ3_S, IQ3_M, Q3_K_S, Q3_K_L, Q3_K, IQ4_XS, IQ4_NL, Q4_K_S, Q4_K, Q5_K_S, Q5_K, Q6_K, Q8_0, BF16, FP16 - **Engines**: vLLM, llama.cpp - **Model ID:** legraphista/glm-4-9b-chat-1m-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch 
the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm4-chat-1m --size-in-billions 9 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/glm4-chat.rst ================================================ .. _models_llm_glm4-chat: ======================================== glm4-chat ======================================== - **Context Length:** 131072 - **Model Name:** glm4-chat - **Languages:** en, zh - **Abilities:** chat, tools - **Description:** GLM4 is the open source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 9 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** zai-org/glm-4-9b-chat-hf - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name glm4-chat --size-in-billions 9 --model-format pytorch --quantization ${quantization} Model Spec 2 (ggufv2, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 9 - **Quantizations:** Q2_K, IQ3_XS, IQ3_S, IQ3_M, Q3_K_S, Q3_K_L, Q3_K, IQ4_XS, IQ4_NL, Q4_K_S, Q4_K, Q5_K_S, Q5_K, Q6_K, Q8_0, BF16, FP16 - **Engines**: vLLM, llama.cpp - **Model ID:** legraphista/glm-4-9b-chat-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine 
${engine} --model-name glm4-chat --size-in-billions 9 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/gorilla-openfunctions-v2.rst ================================================ .. _models_llm_gorilla-openfunctions-v2: ======================================== gorilla-openfunctions-v2 ======================================== - **Context Length:** 4096 - **Model Name:** gorilla-openfunctions-v2 - **Languages:** en - **Abilities:** chat - **Description:** OpenFunctions is designed to extend the Large Language Model (LLM) Chat Completion feature to formulate executable API calls given natural language instructions and API context. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** gorilla-llm/gorilla-openfunctions-v2 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name gorilla-openfunctions-v2 --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K - **Engines**: vLLM, llama.cpp - **Model ID:** gorilla-llm//gorilla-openfunctions-v2-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name gorilla-openfunctions-v2 --size-in-billions 7 --model-format
ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/gpt-2.rst ================================================ .. _models_llm_gpt-2: ======================================== gpt-2 ======================================== - **Context Length:** 1024 - **Model Name:** gpt-2 - **Languages:** en - **Abilities:** generate - **Description:** GPT-2 is a Transformer-based LLM that is trained on WebText, a 40 GB dataset of web pages linked from Reddit posts with 3+ upvotes. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 1_5 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** openai-community/gpt2 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name gpt-2 --size-in-billions 1_5 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/gpt-oss.rst ================================================ .. _models_llm_gpt-oss: ======================================== gpt-oss ======================================== - **Context Length:** 131072 - **Model Name:** gpt-oss - **Languages:** en - **Abilities:** chat, reasoning - **Description:** gpt-oss series, OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases.
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 20 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 20 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** openai/gpt-oss-20b - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name gpt-oss --size-in-billions 20 --model-format pytorch --quantization ${quantization} Model Spec 2 (bnb, 20 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** bnb - **Model Size (in billions):** 20 - **Quantizations:** 4-bit - **Engines**: vLLM, Transformers - **Model ID:** unsloth/gpt-oss-20b-bnb-4bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name gpt-oss --size-in-billions 20 --model-format bnb --quantization ${quantization} Model Spec 3 (pytorch, 120 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 120 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** openai/gpt-oss-120b - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name gpt-oss --size-in-billions 120 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/huatuogpt-o1-llama-3.1.rst ================================================ .. 
_models_llm_huatuogpt-o1-llama-3.1: ======================================== HuatuoGPT-o1-LLaMA-3.1 ======================================== - **Context Length:** 131072 - **Model Name:** HuatuoGPT-o1-LLaMA-3.1 - **Languages:** en - **Abilities:** chat, tools - **Description:** HuatuoGPT-o1 is a medical LLM designed for advanced medical reasoning. It generates a complex thought process, reflecting and refining its reasoning, before providing a final response. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 8 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** FreedomIntelligence/HuatuoGPT-o1-8B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name HuatuoGPT-o1-LLaMA-3.1 --size-in-billions 8 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 70 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** FreedomIntelligence/HuatuoGPT-o1-70B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name HuatuoGPT-o1-LLaMA-3.1 --size-in-billions 70 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/huatuogpt-o1-qwen2.5.rst ================================================ .. 
_models_llm_huatuogpt-o1-qwen2.5: ======================================== HuatuoGPT-o1-Qwen2.5 ======================================== - **Context Length:** 32768 - **Model Name:** HuatuoGPT-o1-Qwen2.5 - **Languages:** en, zh - **Abilities:** chat, tools - **Description:** HuatuoGPT-o1 is a medical LLM designed for advanced medical reasoning. It generates a complex thought process, reflecting and refining its reasoning, before providing a final response. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** FreedomIntelligence/HuatuoGPT-o1-7B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name HuatuoGPT-o1-Qwen2.5 --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 72 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** FreedomIntelligence/HuatuoGPT-o1-72B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name HuatuoGPT-o1-Qwen2.5 --size-in-billions 72 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/index.rst ================================================ .. 
_models_llm_index: ===================== Large Language Models ===================== The following is a list of built-in LLMs in Xinference: .. list-table:: :widths: 25 25 25 50 :header-rows: 1 * - MODEL NAME - ABILITIES - CONTEXT_LENGTH - DESCRIPTION * - :ref:`baichuan-2 ` - generate - 4096 - Baichuan2 is an open-source Transformer-based LLM that is trained on both Chinese and English data. * - :ref:`baichuan-2-chat ` - chat - 4096 - Baichuan2-chat is a fine-tuned version of the Baichuan LLM, specializing in chatting. * - :ref:`baichuan-m2 ` - chat, reasoning, hybrid, tools - 131072 - Baichuan-M2-32B is Baichuan AI's medical-enhanced reasoning model, the second medical model released by Baichuan. Designed for real-world medical reasoning tasks, this model builds upon Qwen2.5-32B with an innovative Large Verifier System. Through domain-specific fine-tuning on real-world medical questions, it achieves breakthrough medical performance while maintaining strong general capabilities. * - :ref:`code-llama ` - generate - 100000 - Code-Llama is an open-source LLM trained by fine-tuning LLaMA2 for generating and discussing code. * - :ref:`code-llama-instruct ` - chat - 100000 - Code-Llama-Instruct is an instruct-tuned version of the Code-Llama LLM. * - :ref:`code-llama-python ` - generate - 100000 - Code-Llama-Python is a fine-tuned version of the Code-Llama LLM, specializing in Python. * - :ref:`codegeex4 ` - chat - 131072 - The open-source version of the latest CodeGeeX4 model series. * - :ref:`codeqwen1.5 ` - generate - 65536 - CodeQwen1.5 is the Code-Specific version of Qwen1.5. It is a transformer-based decoder-only language model pretrained on a large amount of code data. * - :ref:`codeqwen1.5-chat ` - chat - 65536 - CodeQwen1.5 is the Code-Specific version of Qwen1.5. It is a transformer-based decoder-only language model pretrained on a large amount of code data.
* - :ref:`codeshell ` - generate - 8194 - CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University. * - :ref:`codeshell-chat ` - chat - 8194 - CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University. * - :ref:`codestral-v0.1 ` - generate - 32768 - Codestral-22B-v0.1 is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash. * - :ref:`cogagent ` - chat, vision - 4096 - The CogAgent-9B-20241220 model is based on GLM-4V-9B, a bilingual open-source VLM base model. Through data collection and optimization, multi-stage training, and strategy improvements, CogAgent-9B-20241220 achieves significant advancements in GUI perception, inference prediction accuracy, action space completeness, and task generalizability. * - :ref:`deepseek ` - generate - 4096 - DeepSeek LLM is trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. * - :ref:`deepseek-chat ` - chat - 4096 - DeepSeek LLM is an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. * - :ref:`deepseek-coder ` - generate - 16384 - DeepSeek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. * - :ref:`deepseek-coder-instruct ` - chat - 16384 - deepseek-coder-instruct is a model initialized from deepseek-coder-base and fine-tuned on 2B tokens of instruction data. * - :ref:`deepseek-prover-v2 ` - chat, reasoning - 163840 - We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3.
The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. This process enables us to integrate both informal and formal mathematical reasoning into a unified model * - :ref:`deepseek-r1 ` - chat, reasoning - 163840 - DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. * - :ref:`deepseek-r1-0528 ` - chat, reasoning, tools - 163840 - DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. * - :ref:`deepseek-r1-0528-qwen3 ` - chat, reasoning - 131072 - The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek-R1-0528. In the latest update, DeepSeek R1 has significantly improved its depth of reasoning and inference capabilities by leveraging increased computational resources and introducing algorithmic optimization mechanisms during post-training. The model has demonstrated outstanding performance across various benchmark evaluations, including mathematics, programming, and general logic. Its overall performance is now approaching that of leading models, such as O3 and Gemini 2.5 Pro * - :ref:`deepseek-r1-distill-llama ` - chat, reasoning - 131072 - deepseek-r1-distill-llama is distilled from DeepSeek-R1 based on Llama * - :ref:`deepseek-r1-distill-qwen ` - chat, reasoning - 131072 - deepseek-r1-distill-qwen is distilled from DeepSeek-R1 based on Qwen * - :ref:`deepseek-v2-chat ` - chat - 128000 - DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. 
* - :ref:`deepseek-v2-chat-0628 ` - chat - 128000 - DeepSeek-V2-Chat-0628 is an improved version of DeepSeek-V2-Chat. * - :ref:`deepseek-v2.5 ` - chat - 128000 - DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions. * - :ref:`deepseek-v3 ` - chat - 163840 - DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. * - :ref:`deepseek-v3-0324 ` - chat - 163840 - DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. * - :ref:`deepseek-v3.1 ` - chat, reasoning, hybrid, tools - 131072 - DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. * - :ref:`deepseek-v3.2 ` - chat, reasoning, hybrid, tools - 163840 - We introduce DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance * - :ref:`deepseek-v3.2-exp ` - chat, reasoning, hybrid, tools - 163840 - We are excited to announce the official release of DeepSeek-V3.2-Exp, an experimental version of our model. As an intermediate step toward our next-generation architecture, V3.2-Exp builds upon V3.1-Terminus by introducing DeepSeek Sparse Attention—a sparse attention mechanism designed to explore and validate optimizations for training and inference efficiency in long-context scenarios. * - :ref:`deepseek-vl2 ` - chat, vision - 4096 - DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL. DeepSeek-VL2 demonstrates superior capabilities across various tasks, including but not limited to visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. 
* - :ref:`dianjin-r1 ` - chat, tools - 32768 - Tongyi DianJin is a financial intelligence solution platform built by Alibaba Cloud, dedicated to providing financial business developers with a convenient artificial intelligence application development environment. * - :ref:`ernie4.5 ` - chat - 131072 - ERNIE 4.5, a new family of large-scale multimodal models comprising 10 distinct variants. * - :ref:`fin-r1 ` - chat - 131072 - Fin-R1 is a large language model specifically designed for the field of financial reasoning * - :ref:`gemma-3-1b-it ` - chat - 32768 - Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. * - :ref:`gemma-3-it ` - chat, vision - 131072 - Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. * - :ref:`glm-4.1v-thinking ` - chat, vision, reasoning, tools - 65536 - GLM-4.1V-9B-Thinking, designed to explore the upper limits of reasoning in vision-language models. * - :ref:`glm-4.5 ` - chat, reasoning, hybrid, tools - 131072 - The GLM-4.5 series models are foundation models designed for intelligent agents. * - :ref:`glm-4.5v ` - chat, vision, reasoning, tools - 131072 - GLM-4.5V is based on ZhipuAI’s next-generation flagship text foundation model GLM-4.5-Air (106B parameters, 12B active). It continues the technical approach of GLM-4.1V-Thinking, achieving SOTA performance among models of the same scale on 42 public vision-language benchmarks. * - :ref:`glm-4.6 ` - chat, reasoning, hybrid, tools - 202752 - GLM-4.6 significantly enhances context length (up to 200K tokens), code generation, reasoning with tool use, agent capabilities, and human-aligned writing compared to GLM-4.5. 
* - :ref:`glm-4.7 ` - chat, reasoning, hybrid, tools - 202752 - GLM-4.7 significantly advances core and multilingual agentic coding, UI/vibe coding, tool use, and complex reasoning—outperforming GLM-4.6 across benchmarks like SWE-bench, Terminal Bench 2.0, τ²-Bench, and HLE—while also improving chat, creative writing, and role-play. * - :ref:`glm-4.7-flash ` - chat, reasoning, hybrid, tools - 202752 - GLM-4.7-Flash is a 30B-A3B MoE model. As the strongest model in the 30B class, it offers a lightweight deployment option that balances performance and efficiency. * - :ref:`glm-4v ` - chat, vision - 8192 - GLM4 is the open source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. * - :ref:`glm-5 ` - chat, vision, tools, reasoning - 202752 - We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity. Reinforcement learning aims to bridge the gap between competence and excellence in pre-trained models. However, deploying it at scale for LLMs is a challenge due to the RL training inefficiency. To this end, we developed slime, a novel asynchronous RL infrastructure that substantially improves training throughput and efficiency, enabling more fine-grained post-training iterations. 
With advances in both pre-training and post-training, GLM-5 delivers significant improvement compared to GLM-4.7 across a wide range of academic benchmarks and achieves best-in-class performance among all open-source models in the world on reasoning, coding, and agentic tasks, closing the gap with frontier models. * - :ref:`glm-edge-chat ` - chat - 8192 - The GLM-Edge series is our attempt to address real-life end-side scenarios. It consists of two sizes each of large-language dialogue models and multimodal comprehension models (GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, GLM-Edge-V-5B). Among them, the 1.5B / 2B models are mainly for platforms such as mobile phones and cars, and the 4B / 5B models are mainly for platforms such as PCs. * - :ref:`glm4-0414 ` - chat, tools - 32768 - The GLM family welcomes new members, the GLM-4-32B-0414 series models, featuring 32 billion parameters. Its performance is comparable to OpenAI’s GPT series and DeepSeek’s V3/R1 series. * - :ref:`glm4-chat ` - chat, tools - 131072 - GLM4 is the open source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. * - :ref:`glm4-chat-1m ` - chat, tools - 1048576 - GLM4 is the open source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. * - :ref:`gorilla-openfunctions-v2 ` - chat - 4096 - OpenFunctions is designed to extend the Large Language Model (LLM) Chat Completion feature to formulate executable API calls given natural-language instructions and API context. * - :ref:`gpt-2 ` - generate - 1024 - GPT-2 is a Transformer-based LLM that is trained on WebText, a 40 GB dataset of web pages linked from Reddit posts with 3+ upvotes. * - :ref:`gpt-oss ` - chat, reasoning - 131072 - gpt-oss series, OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases.
* - :ref:`huatuogpt-o1-llama-3.1 ` - chat, tools - 131072 - HuatuoGPT-o1 is a medical LLM designed for advanced medical reasoning. It generates a complex thought process, reflecting and refining its reasoning, before providing a final response. * - :ref:`huatuogpt-o1-qwen2.5 ` - chat, tools - 32768 - HuatuoGPT-o1 is a medical LLM designed for advanced medical reasoning. It generates a complex thought process, reflecting and refining its reasoning, before providing a final response. * - :ref:`internlm3-instruct ` - chat, tools - 32768 - InternLM3 has open-sourced an 8-billion parameter instruction model, InternLM3-8B-Instruct, designed for general-purpose usage and advanced reasoning. * - :ref:`internvl3 ` - chat, vision - 8192 - InternVL3, an advanced multimodal large language model (MLLM) series that demonstrates superior overall performance. * - :ref:`kat-v1 ` - chat - 131072 - Kwaipilot-AutoThink ranks first among all open-source models on LiveCodeBench Pro, a challenging benchmark explicitly designed to prevent data leakage, and even surpasses strong proprietary systems such as Seed and o3-mini. * - :ref:`kimi-k2.5 ` - chat, vision - 262144 - Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base. It seamlessly integrates vision and language understanding with advanced agentic capabilities, instant and thinking modes, as well as conversational and agentic paradigms. * - :ref:`llama-2 ` - generate - 4096 - Llama-2 is the second generation of Llama, open-source and trained on a larger amount of data. * - :ref:`llama-2-chat ` - chat - 4096 - Llama-2-Chat is a fine-tuned version of the Llama-2 LLM, specializing in chatting. 
* - :ref:`llama-3 ` - generate - 8192 - Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. * - :ref:`llama-3-instruct ` - chat - 8192 - The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. * - :ref:`llama-3.1 ` - generate - 131072 - Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. * - :ref:`llama-3.1-instruct ` - chat, tools - 131072 - The Llama 3.1 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. * - :ref:`llama-3.2-vision ` - generate, vision - 131072 - The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. * - :ref:`llama-3.2-vision-instruct ` - chat, vision - 131072 - The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. * - :ref:`llama-3.3-instruct ` - chat, tools - 131072 - The Llama 3.3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. * - :ref:`marco-o1 ` - chat, tools - 32768 - Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions * - :ref:`mineru2.5-2509-1.2b ` - chat, vision - 32768 - MinerU2.5-2509-1.2B is a vision language model for document understanding. * - :ref:`minicpm-2b-dpo-bf16 ` - chat - 4096 - MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. * - :ref:`minicpm-2b-dpo-fp16 ` - chat - 4096 - MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.
* - :ref:`minicpm-2b-dpo-fp32 ` - chat - 4096 - MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. * - :ref:`minicpm-2b-sft-bf16 ` - chat - 4096 - MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. * - :ref:`minicpm-2b-sft-fp32 ` - chat - 4096 - MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. * - :ref:`minicpm-v-2.6 ` - chat, vision - 32768 - MiniCPM-V 2.6 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. * - :ref:`minicpm-v-4.5 ` - chat, vision - 32768 - MiniCPM-V 4.5 is an improved version in the MiniCPM-V series with enhanced multimodal capabilities and better performance. * - :ref:`minicpm3-4b ` - chat - 32768 - MiniCPM3-4B is the 3rd generation of the MiniCPM series. The overall performance of MiniCPM3-4B surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, being comparable with many recent 7B~9B models. * - :ref:`minicpm4 ` - chat - 32768 - The MiniCPM4 series are highly efficient large language models (LLMs) designed explicitly for end-side devices; they achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. * - :ref:`minimax-m2 ` - chat, tools, reasoning - 196608 - MiniMax-M2, a Mini model built for Max coding & agentic workflows. * - :ref:`minimax-m2.5 ` - chat, tools, reasoning - 196608 - MiniMax-M2.5, a Mini model built for Max coding & agentic workflows. * - :ref:`mistral-instruct-v0.1 ` - chat - 8192 - Mistral-7B-Instruct is a fine-tuned version of the Mistral-7B LLM on public datasets, specializing in chatting.
* - :ref:`mistral-instruct-v0.2 ` - chat - 8192 - The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1. * - :ref:`mistral-instruct-v0.3 ` - chat - 32768 - The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of Mistral-7B-v0.3. * - :ref:`mistral-large-instruct ` - chat - 131072 - Mistral-Large-Instruct-2407 is an advanced dense Large Language Model (LLM) of 123B parameters with state-of-the-art reasoning, knowledge and coding capabilities. * - :ref:`mistral-nemo-instruct ` - chat - 1024000 - The Mistral-Nemo-Instruct-2407 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-Nemo-Base-2407. * - :ref:`mistral-v0.1 ` - generate - 8192 - Mistral-7B is an unmoderated Transformer-based LLM claiming to outperform Llama2 on all benchmarks. * - :ref:`mixtral-8x22b-instruct-v0.1 ` - chat - 65536 - The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mixtral-8x22B-v0.1, specializing in chatting. * - :ref:`mixtral-instruct-v0.1 ` - chat - 32768 - Mixtral-8x7B-Instruct is a fine-tuned version of the Mixtral-8x7B LLM, specializing in chatting. * - :ref:`mixtral-v0.1 ` - generate - 32768 - The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. * - :ref:`moonlight-16b-a3b-instruct ` - chat - 8192 - Kimi Muon is Scalable for LLM Training * - :ref:`openhermes-2.5 ` - chat - 8192 - OpenHermes 2.5 is a fine-tuned version of Mistral-7B-v0.1 on primarily GPT-4 generated data. * - :ref:`opt ` - generate - 2048 - OPT is an open-source, decoder-only, Transformer-based LLM that was designed to replicate GPT-3. * - :ref:`orion-chat ` - chat - 4096 - Orion-14B series models are open-source multilingual large language models trained from scratch by OrionStarAI.
* - :ref:`ovis2 ` - chat, vision - 32768 - Ovis (Open VISion) is a novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings. * - :ref:`phi-2 ` - generate - 2048 - Phi-2 is a 2.7B Transformer based LLM used for research on model safety, trained with data similar to Phi-1.5 but augmented with synthetic texts and curated websites. * - :ref:`phi-3-mini-128k-instruct ` - chat - 128000 - The Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets. * - :ref:`phi-3-mini-4k-instruct ` - chat - 4096 - The Phi-3-Mini-4k-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets. * - :ref:`qvq-72b-preview ` - chat, vision - 32768 - QVQ-72B-Preview is an experimental research model developed by the Qwen team, focusing on enhancing visual reasoning capabilities. * - :ref:`qwen-chat ` - chat - 32768 - Qwen-chat is a fine-tuned version of the Qwen LLM trained with alignment techniques, specializing in chatting. * - :ref:`qwen1.5-chat ` - chat, tools - 32768 - Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. * - :ref:`qwen1.5-moe-chat ` - chat, tools - 32768 - Qwen1.5-MoE is a transformer-based MoE decoder-only language model pretrained on a large amount of data. * - :ref:`qwen2-audio-instruct ` - chat, audio - 32768 - Qwen2-Audio: A large-scale audio-language model which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions. * - :ref:`qwen2-instruct ` - chat, tools - 32768 - Qwen2 is the new series of Qwen large language models * - :ref:`qwen2-moe-instruct ` - chat, tools - 32768 - Qwen2 is the new series of Qwen large language models. 
* - :ref:`qwen2-vl-instruct ` - chat, vision - 32768 - Qwen2-VL: To See the World More Clearly. Qwen2-VL is the latest version of the vision language models in the Qwen model families. * - :ref:`qwen2.5 ` - generate - 32768 - Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. * - :ref:`qwen2.5-coder ` - generate - 32768 - Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). * - :ref:`qwen2.5-coder-instruct ` - chat, tools - 32768 - Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). * - :ref:`qwen2.5-instruct ` - chat, tools - 32768 - Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. * - :ref:`qwen2.5-instruct-1m ` - chat - 1010000 - Qwen2.5-1M is the long-context version of the Qwen2.5 series models, supporting a context length of up to 1M tokens. * - :ref:`qwen2.5-omni ` - chat, vision, audio, omni - 32768 - Qwen2.5-Omni: the new flagship end-to-end multimodal model in the Qwen series. * - :ref:`qwen2.5-vl-instruct ` - chat, vision - 128000 - Qwen2.5-VL is the latest version of the vision language models in the Qwen model families. * - :ref:`qwen3 ` - chat, reasoning, hybrid, tools - 40960 - Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.
* - :ref:`qwen3-coder ` - chat, tools - 262144 - we're announcing Qwen3-Coder, our most agentic code model to date * - :ref:`qwen3-instruct ` - chat, tools - 262144 - We introduce the updated version of the Qwen3-235B-A22B non-thinking mode, named Qwen3-235B-A22B-Instruct-2507 * - :ref:`qwen3-next-instruct ` - chat, tools - 262144 - Qwen3-Next-80B-A3B is the first installment in the Qwen3-Next series * - :ref:`qwen3-next-thinking ` - chat, reasoning, tools - 262144 - Qwen3-Next-80B-A3B is the first installment in the Qwen3-Next series * - :ref:`qwen3-omni-instruct ` - chat, vision, audio, omni, tools - 262144 - Qwen3-Omni is the natively end-to-end multilingual omni-modal foundation models. It processes text, images, audio, and video, and delivers real-time streaming responses in both text and natural speech. We introduce several architectural upgrades to improve performance and efficiency. * - :ref:`qwen3-omni-thinking ` - chat, vision, audio, omni, reasoning, tools - 262144 - Qwen3-Omni is the natively end-to-end multilingual omni-modal foundation models. It processes text, images, audio, and video, and delivers real-time streaming responses in both text and natural speech. We introduce several architectural upgrades to improve performance and efficiency. * - :ref:`qwen3-thinking ` - chat, reasoning, tools - 262144 - we have continued to scale the thinking capability of Qwen3-235B-A22B, improving both the quality and depth of reasoning * - :ref:`qwen3-vl-instruct ` - chat, vision, tools - 262144 - Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date. * - :ref:`qwen3-vl-thinking ` - chat, vision, reasoning, tools - 262144 - Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date. * - :ref:`qwen3.5 ` - chat, vision, tools, reasoning - 262144 - Over recent months, we have intensified our focus on developing foundation models that deliver exceptional utility and performance. 
Qwen3.5 represents a significant leap forward, integrating breakthroughs in multimodal learning, architectural efficiency, reinforcement learning scale, and global accessibility to empower developers and enterprises with unprecedented capability and efficiency * - :ref:`qwenlong-l1 ` - chat - 32768 - QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning * - :ref:`qwq-32b ` - chat, reasoning, tools - 131072 - QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which is capable of achieving competitive performance against state-of-the-art reasoning models, e.g., DeepSeek-R1, o1-mini. * - :ref:`qwq-32b-preview ` - chat - 32768 - QwQ-32B-Preview is an experimental research model developed by the Qwen Team, focused on advancing AI reasoning capabilities. * - :ref:`seallm_v2 ` - generate - 8192 - We introduce SeaLLM-7B-v2, the state-of-the-art multilingual LLM for Southeast Asian (SEA) languages * - :ref:`seallm_v2.5 ` - generate - 8192 - We introduce SeaLLM-7B-v2.5, the state-of-the-art multilingual LLM for Southeast Asian (SEA) languages * - :ref:`seallms-v3 ` - chat - 32768 - SeaLLMs - Large Language Models for Southeast Asia * - :ref:`seed-oss ` - chat, reasoning, tools - 524288 - Seed-OSS is a series of open-source large language models developed by ByteDance's Seed Team, designed for powerful long-context, reasoning, agent and general capabilities, and versatile developer-friendly features. Although trained with only 12T tokens, Seed-OSS achieves excellent performance on several popular open benchmarks. * - :ref:`skywork ` - generate - 4096 - Skywork is a series of large models developed by the Kunlun Group · Skywork team. 
* - :ref:`skywork-math ` - generate - 4096 - Skywork is a series of large models developed by the Kunlun Group · Skywork team. * - :ref:`skywork-or1 ` - chat - 131072 - We release the final version of Skywork-OR1 (Open Reasoner 1) series of models, including * - :ref:`skywork-or1-preview ` - chat - 32768 - The Skywork-OR1 (Open Reasoner 1) model series consists of powerful math and code reasoning models trained using large-scale rule-based reinforcement learning with carefully designed datasets and training recipes. * - :ref:`telechat ` - chat - 8192 - TeleChat is a large language model developed and trained by China Telecom Artificial Intelligence Technology Co., LTD. The 7B model base is trained with 1.5 trillion Tokens and 3 trillion Tokens and Chinese high-quality corpus. * - :ref:`tiny-llama ` - generate - 2048 - The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. * - :ref:`wizardcoder-python-v1.0 ` - chat - 100000 - * - :ref:`wizardmath-v1.0 ` - chat - 2048 - WizardMath is an open-source LLM trained by fine-tuning Llama2 with Evol-Instruct, specializing in math. * - :ref:`xiyansql-qwencoder-2504 ` - chat, tools - 32768 - The XiYanSQL-QwenCoder models are multi-dialect SQL base models that demonstrate robust SQL generation capabilities. * - :ref:`xverse ` - generate - 2048 - XVERSE is a multilingual large language model, independently developed by Shenzhen Yuanxiang Technology. * - :ref:`xverse-chat ` - chat - 2048 - XVERSE-Chat is the aligned version of the XVERSE model. * - :ref:`yi ` - generate - 4096 - The Yi series models are large language models trained from scratch by developers at 01.AI. * - :ref:`yi-1.5 ` - generate - 4096 - Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. * - :ref:`yi-1.5-chat ` - chat - 4096 - Yi-1.5 is an upgraded version of Yi.
It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. * - :ref:`yi-1.5-chat-16k ` - chat - 16384 - Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. * - :ref:`yi-200k ` - generate - 262144 - The Yi series models are large language models trained from scratch by developers at 01.AI. * - :ref:`yi-chat ` - chat - 4096 - The Yi series models are large language models trained from scratch by developers at 01.AI. .. toctree:: :maxdepth: 3 baichuan-2 baichuan-2-chat baichuan-m2 code-llama code-llama-instruct code-llama-python codegeex4 codeqwen1.5 codeqwen1.5-chat codeshell codeshell-chat codestral-v0.1 cogagent deepseek deepseek-chat deepseek-coder deepseek-coder-instruct deepseek-prover-v2 deepseek-r1 deepseek-r1-0528 deepseek-r1-0528-qwen3 deepseek-r1-distill-llama deepseek-r1-distill-qwen deepseek-v2-chat deepseek-v2-chat-0628 deepseek-v2.5 deepseek-v3 deepseek-v3-0324 deepseek-v3.1 deepseek-v3.2 deepseek-v3.2-exp deepseek-vl2 dianjin-r1 ernie4.5 fin-r1 gemma-3-1b-it gemma-3-it glm-4.1v-thinking glm-4.5 glm-4.5v glm-4.6 glm-4.7 glm-4.7-flash glm-4v glm-5 glm-edge-chat glm4-0414 glm4-chat glm4-chat-1m gorilla-openfunctions-v2 gpt-2 gpt-oss huatuogpt-o1-llama-3.1 huatuogpt-o1-qwen2.5 internlm3-instruct internvl3 kat-v1 kimi-k2.5 llama-2 llama-2-chat llama-3 llama-3-instruct llama-3.1 llama-3.1-instruct llama-3.2-vision llama-3.2-vision-instruct llama-3.3-instruct marco-o1 mineru2.5-2509-1.2b minicpm-2b-dpo-bf16 minicpm-2b-dpo-fp16 minicpm-2b-dpo-fp32 minicpm-2b-sft-bf16 minicpm-2b-sft-fp32 minicpm-v-2.6 minicpm-v-4.5 minicpm3-4b minicpm4 minimax-m2 minimax-m2.5 mistral-instruct-v0.1 mistral-instruct-v0.2 mistral-instruct-v0.3 mistral-large-instruct mistral-nemo-instruct mistral-v0.1 mixtral-8x22b-instruct-v0.1 mixtral-instruct-v0.1 mixtral-v0.1 moonlight-16b-a3b-instruct 
openhermes-2.5 opt orion-chat ovis2 phi-2 phi-3-mini-128k-instruct phi-3-mini-4k-instruct qvq-72b-preview qwen-chat qwen1.5-chat qwen1.5-moe-chat qwen2-audio-instruct qwen2-instruct qwen2-moe-instruct qwen2-vl-instruct qwen2.5 qwen2.5-coder qwen2.5-coder-instruct qwen2.5-instruct qwen2.5-instruct-1m qwen2.5-omni qwen2.5-vl-instruct qwen3 qwen3-coder qwen3-instruct qwen3-next-instruct qwen3-next-thinking qwen3-omni-instruct qwen3-omni-thinking qwen3-thinking qwen3-vl-instruct qwen3-vl-thinking qwen3.5 qwenlong-l1 qwq-32b qwq-32b-preview seallm_v2 seallm_v2.5 seallms-v3 seed-oss skywork skywork-math skywork-or1 skywork-or1-preview telechat tiny-llama wizardcoder-python-v1.0 wizardmath-v1.0 xiyansql-qwencoder-2504 xverse xverse-chat yi yi-1.5 yi-1.5-chat yi-1.5-chat-16k yi-200k yi-chat ================================================ FILE: doc/source/models/builtin/llm/internlm3-instruct.rst ================================================ .. _models_llm_internlm3-instruct: ======================================== internlm3-instruct ======================================== - **Context Length:** 32768 - **Model Name:** internlm3-instruct - **Languages:** en, zh - **Abilities:** chat, tools - **Description:** InternLM3 has open-sourced an 8-billion parameter instruction model, InternLM3-8B-Instruct, designed for general-purpose usage and advanced reasoning. 
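Each model spec on this page pairs one model format with its own set of quantizations and engines, so not every combination is valid. The sketch below is one possible way to check a combination before assembling the ``xinference launch`` command. The spec data is transcribed from the Specifications section of this page. The ``INTERNLM3_SPECS`` table and the ``build_launch_command`` helper are hypothetical names used for illustration and are not part of Xinference. It also assumes the CLI accepts the engine names spelled exactly as they are listed.

```python
import shlex

# Spec data transcribed from the Specifications section of this page.
# The dict layout and the helper below are illustrative only; they are
# not part of the Xinference code base.
INTERNLM3_SPECS = {
    "pytorch": {"quantizations": ["none"], "engines": ["vLLM", "Transformers"]},
    "gptq": {"quantizations": ["Int4"], "engines": ["vLLM", "Transformers"]},
    "awq": {"quantizations": ["Int4"], "engines": ["vLLM", "Transformers"]},
    "ggufv2": {
        "quantizations": [
            "q2_k", "q3_k_m", "q4_0", "q4_k_m",
            "q5_0", "q5_k_m", "q6_k", "q8_0",
        ],
        "engines": ["vLLM", "llama.cpp"],
    },
    "mlx": {"quantizations": ["4bit"], "engines": ["MLX"]},
}


def build_launch_command(model_format, quantization, engine):
    """Return the launch command string, or raise ValueError if the
    combination is not listed for internlm3-instruct."""
    spec = INTERNLM3_SPECS.get(model_format)
    if spec is None:
        raise ValueError(f"unknown model format: {model_format!r}")
    if quantization not in spec["quantizations"]:
        raise ValueError(
            f"{quantization!r} is not listed for format {model_format!r}"
        )
    if engine not in spec["engines"]:
        raise ValueError(f"{engine!r} is not listed for format {model_format!r}")
    argv = [
        "xinference", "launch",
        "--model-engine", engine,
        "--model-name", "internlm3-instruct",
        "--size-in-billions", "8",
        "--model-format", model_format,
        "--quantization", quantization,
    ]
    # shlex.join quotes arguments only where the shell would need it.
    return shlex.join(argv)


if __name__ == "__main__":
    print(build_launch_command("ggufv2", "q4_k_m", "llama.cpp"))
```

Keeping the spec data in a plain table means a mismatched pair, such as ``mlx`` with ``Int4``, fails early with a clear message instead of at launch time.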
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 8 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** internlm/internlm3-8b-instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name internlm3-instruct --size-in-billions 8 --model-format pytorch --quantization ${quantization} Model Spec 2 (gptq, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 8 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** internlm/internlm3-8b-instruct-gptq-int4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name internlm3-instruct --size-in-billions 8 --model-format gptq --quantization ${quantization} Model Spec 3 (awq, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 8 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** internlm/internlm3-8b-instruct-awq - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name internlm3-instruct --size-in-billions 8 --model-format awq --quantization ${quantization} Model Spec 4 (ggufv2, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 8 - **Quantizations:** 
q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** internlm/internlm3-8b-instruct-gguf - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name internlm3-instruct --size-in-billions 8 --model-format ggufv2 --quantization ${quantization} Model Spec 5 (mlx, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 8 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/internlm3-8b-instruct-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name internlm3-instruct --size-in-billions 8 --model-format mlx --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/internvl3.rst ================================================ .. _models_llm_internvl3: ======================================== InternVL3 ======================================== - **Context Length:** 8192 - **Model Name:** InternVL3 - **Languages:** en, zh - **Abilities:** chat, vision - **Description:** InternVL3, an advanced multimodal large language model (MLLM) series that demonstrates superior overall performance. 
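The InternVL3 specifications on this page follow a regular grid: each of seven model sizes is offered in ``pytorch`` format (quantization ``none``) and in ``awq`` format (quantization ``Int4``). As a rough sketch, the grid and the matching launch commands can be enumerated as below. The helper names ``internvl3_specs`` and ``launch_command`` are hypothetical, the numbering simply mirrors the order of the listing, and the engine value is assumed to be passed as spelled in the **Engines** field.

```python
from itertools import product

# Sizes (in billions) and format -> quantization pairs as listed on this page.
SIZES = [1, 2, 8, 9, 14, 38, 78]
FORMATS = {"pytorch": "none", "awq": "Int4"}


def internvl3_specs():
    """Enumerate the specs in the same order as the page (hypothetical helper)."""
    specs = []
    # product() varies the format fastest, matching Spec 1, 2, 3, ... on the page.
    for number, (size, fmt) in enumerate(product(SIZES, FORMATS), start=1):
        suffix = "-AWQ" if fmt == "awq" else ""
        specs.append({
            "number": number,
            "size": size,
            "format": fmt,
            "quantization": FORMATS[fmt],
            "model_id": f"OpenGVLab/InternVL3-{size}B{suffix}",
        })
    return specs


def launch_command(spec, engine):
    """Fill the launch template from the page for one spec and one engine."""
    return (
        f"xinference launch --model-engine {engine} --model-name InternVL3 "
        f"--size-in-billions {spec['size']} --model-format {spec['format']} "
        f"--quantization {spec['quantization']}"
    )


if __name__ == "__main__":
    for spec in internvl3_specs():
        print(f"Model Spec {spec['number']}: {launch_command(spec, 'Transformers')}")
```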
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 1 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 1 - **Quantizations:** none - **Engines**: vLLM, Transformers, LMDEPLOY - **Model ID:** OpenGVLab/InternVL3-1B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name InternVL3 --size-in-billions 1 --model-format pytorch --quantization ${quantization} Model Spec 2 (awq, 1 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 1 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, LMDEPLOY - **Model ID:** OpenGVLab/InternVL3-1B-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name InternVL3 --size-in-billions 1 --model-format awq --quantization ${quantization} Model Spec 3 (pytorch, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 2 - **Quantizations:** none - **Engines**: vLLM, Transformers, LMDEPLOY - **Model ID:** OpenGVLab/InternVL3-2B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name InternVL3 --size-in-billions 2 --model-format pytorch --quantization ${quantization} Model Spec 4 (awq, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 2 - **Quantizations:** Int4 - **Engines**: vLLM, 
Transformers, LMDEPLOY - **Model ID:** OpenGVLab/InternVL3-2B-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name InternVL3 --size-in-billions 2 --model-format awq --quantization ${quantization} Model Spec 5 (pytorch, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 8 - **Quantizations:** none - **Engines**: vLLM, Transformers, LMDEPLOY - **Model ID:** OpenGVLab/InternVL3-8B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name InternVL3 --size-in-billions 8 --model-format pytorch --quantization ${quantization} Model Spec 6 (awq, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 8 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, LMDEPLOY - **Model ID:** OpenGVLab/InternVL3-8B-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name InternVL3 --size-in-billions 8 --model-format awq --quantization ${quantization} Model Spec 7 (pytorch, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 9 - **Quantizations:** none - **Engines**: vLLM, Transformers, LMDEPLOY - **Model ID:** OpenGVLab/InternVL3-9B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` 
with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name InternVL3 --size-in-billions 9 --model-format pytorch --quantization ${quantization} Model Spec 8 (awq, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 9 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, LMDEPLOY - **Model ID:** OpenGVLab/InternVL3-9B-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name InternVL3 --size-in-billions 9 --model-format awq --quantization ${quantization} Model Spec 9 (pytorch, 14 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 14 - **Quantizations:** none - **Engines**: vLLM, Transformers, LMDEPLOY - **Model ID:** OpenGVLab/InternVL3-14B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name InternVL3 --size-in-billions 14 --model-format pytorch --quantization ${quantization} Model Spec 10 (awq, 14 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 14 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, LMDEPLOY - **Model ID:** OpenGVLab/InternVL3-14B-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name InternVL3 --size-in-billions 14 --model-format awq --quantization 
${quantization} Model Spec 11 (pytorch, 38 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 38 - **Quantizations:** none - **Engines**: vLLM, Transformers, LMDEPLOY - **Model ID:** OpenGVLab/InternVL3-38B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name InternVL3 --size-in-billions 38 --model-format pytorch --quantization ${quantization} Model Spec 12 (awq, 38 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 38 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, LMDEPLOY - **Model ID:** OpenGVLab/InternVL3-38B-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name InternVL3 --size-in-billions 38 --model-format awq --quantization ${quantization} Model Spec 13 (pytorch, 78 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 78 - **Quantizations:** none - **Engines**: vLLM, Transformers, LMDEPLOY - **Model ID:** OpenGVLab/InternVL3-78B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name InternVL3 --size-in-billions 78 --model-format pytorch --quantization ${quantization} Model Spec 14 (awq, 78 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 78 - **Quantizations:** Int4 - **Engines**: 
vLLM, Transformers, LMDEPLOY - **Model ID:** OpenGVLab/InternVL3-78B-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name InternVL3 --size-in-billions 78 --model-format awq --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/kat-v1.rst ================================================ .. _models_llm_kat-v1: ======================================== KAT-V1 ======================================== - **Context Length:** 131072 - **Model Name:** KAT-V1 - **Languages:** en, zh - **Abilities:** chat - **Description:** Kwaipilot-AutoThink ranks first among all open-source models on LiveCodeBench Pro, a challenging benchmark explicitly designed to prevent data leakage, and even surpasses strong proprietary systems such as Seed and o3-mini. 
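The launch commands on this page all follow one template with two placeholders, ``${engine}`` and ``${quantization}``. As a minimal sketch (not part of Xinference itself), the hypothetical helper ``build_launch_command`` below assembles that command line from the fields of a model spec. It only builds and prints the argument list; it does not start a model. The example values come from the KAT-V1 awq spec documented on this page. Passing the engine name exactly as it is displayed in the **Engines** field is an assumption.

```python
import shlex


def build_launch_command(engine, model_name, size_in_billions, model_format, quantization):
    """Return the argv list for an ``xinference launch`` call (hypothetical helper)."""
    return [
        "xinference", "launch",
        "--model-engine", engine,                     # fills ${engine}
        "--model-name", model_name,
        "--size-in-billions", str(size_in_billions),
        "--model-format", model_format,
        "--quantization", quantization,               # fills ${quantization}
    ]


# Values from the KAT-V1 awq spec: 40 billion parameters, Int4 quantization.
argv = build_launch_command("vLLM", "KAT-V1", 40, "awq", "Int4")
command = shlex.join(argv)
print(command)
```

Keeping the command as a list makes it straightforward to hand to ``subprocess.run(argv)`` once a running Xinference cluster is available, without re-parsing a shell string.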
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 40 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 40 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Kwaipilot/KAT-V1-40B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name KAT-V1 --size-in-billions 40 --model-format pytorch --quantization ${quantization} Model Spec 2 (gptq, 40 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 40 - **Quantizations:** Int4-Int8Mix - **Engines**: vLLM, Transformers, SGLang - **Model ID:** QuantTrio/KAT-V1-40B-GPTQ-Int4-Int8Mix - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name KAT-V1 --size-in-billions 40 --model-format gptq --quantization ${quantization} Model Spec 3 (awq, 40 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 40 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** QuantTrio/KAT-V1-40B-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name KAT-V1 --size-in-billions 40 --model-format awq --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/kimi-k2.5.rst ================================================ .. 
_models_llm_kimi-k2.5: ======================================== Kimi-K2.5 ======================================== - **Context Length:** 262144 - **Model Name:** Kimi-K2.5 - **Languages:** en, zh - **Abilities:** chat, vision - **Description:** Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base. It seamlessly integrates vision and language understanding with advanced agentic capabilities, instant and thinking modes, as well as conversational and agentic paradigms. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 1058_59 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 1058_59 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** moonshotai/Kimi-K2.5 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Kimi-K2.5 --size-in-billions 1058_59 --model-format pytorch --quantization ${quantization} Model Spec 2 (ggufv2, 1058_59 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 1058_59 - **Quantizations:** none - **Engines**: llama.cpp - **Model ID:** unsloth/Kimi-K2.5-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Kimi-K2.5 --size-in-billions 1058_59 --model-format ggufv2 --quantization ${quantization} Model Spec 3 (mlx, 1058_59 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 1058_59 - **Quantizations:** 3bit - 
**Engines**: MLX - **Model ID:** mlx-community/Kimi-K2.5-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Kimi-K2.5 --size-in-billions 1058_59 --model-format mlx --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/llama-2-chat.rst ================================================ .. _models_llm_llama-2-chat: ======================================== llama-2-chat ======================================== - **Context Length:** 4096 - **Model Name:** llama-2-chat - **Languages:** en - **Abilities:** chat - **Description:** Llama-2-Chat is a fine-tuned version of the Llama-2 LLM, specializing in chatting. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Llama-2-7b-chat-hf - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2-chat --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 13 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Llama-2-13b-chat-hf - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options 
listed above:: xinference launch --model-engine ${engine} --model-name llama-2-chat --size-in-billions 13 --model-format pytorch --quantization ${quantization} Model Spec 3 (pytorch, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 70 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Llama-2-70b-chat-hf - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2-chat --size-in-billions 70 --model-format pytorch --quantization ${quantization} Model Spec 4 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - **Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** TheBloke/Llama-2-7B-Chat-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2-chat --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} Model Spec 5 (ggufv2, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 13 - **Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** TheBloke/Llama-2-13B-chat-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch 
--model-engine ${engine} --model-name llama-2-chat --size-in-billions 13 --model-format ggufv2 --quantization ${quantization} Model Spec 6 (ggufv2, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 70 - **Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M - **Engines**: vLLM, llama.cpp - **Model ID:** TheBloke/Llama-2-70B-Chat-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2-chat --size-in-billions 70 --model-format ggufv2 --quantization ${quantization} Model Spec 7 (gptq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/Llama-2-7B-Chat-GPTQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2-chat --size-in-billions 7 --model-format gptq --quantization ${quantization} Model Spec 8 (gptq, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 70 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/Llama-2-70B-Chat-GPTQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2-chat --size-in-billions 70 --model-format gptq --quantization ${quantization} Model Spec 9 (awq, 70 Billion) 
++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 70 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/Llama-2-70B-Chat-AWQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2-chat --size-in-billions 70 --model-format awq --quantization ${quantization} Model Spec 10 (awq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/Llama-2-7B-Chat-AWQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2-chat --size-in-billions 7 --model-format awq --quantization ${quantization} Model Spec 11 (gptq, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 13 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/Llama-2-13B-chat-GPTQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2-chat --size-in-billions 13 --model-format gptq --quantization ${quantization} Model Spec 12 (awq, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 13 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/Llama-2-13B-chat-AWQ - **Model Hubs**: `Hugging Face `__ 
Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2-chat --size-in-billions 13 --model-format awq --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/llama-2.rst ================================================ .. _models_llm_llama-2: ======================================== llama-2 ======================================== - **Context Length:** 4096 - **Model Name:** llama-2 - **Languages:** en - **Abilities:** generate - **Description:** Llama-2 is the second generation of Llama, open-source and trained on a larger amount of data. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - **Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0 - **Engines**: llama.cpp - **Model ID:** TheBloke/Llama-2-7B-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2 --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} Model Spec 2 (gptq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/Llama-2-7B-GPTQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2 --size-in-billions 7 --model-format gptq 
--quantization ${quantization} Model Spec 3 (awq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/Llama-2-7B-AWQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2 --size-in-billions 7 --model-format awq --quantization ${quantization} Model Spec 4 (ggufv2, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 13 - **Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0 - **Engines**: llama.cpp - **Model ID:** TheBloke/Llama-2-13B-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2 --size-in-billions 13 --model-format ggufv2 --quantization ${quantization} Model Spec 5 (ggufv2, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 70 - **Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M - **Engines**: llama.cpp - **Model ID:** TheBloke/Llama-2-70B-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2 --size-in-billions 70 --model-format ggufv2 --quantization ${quantization} Model Spec 6 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in 
billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Llama-2-7b-hf - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2 --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 7 (pytorch, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 13 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Llama-2-13b-hf - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2 --size-in-billions 13 --model-format pytorch --quantization ${quantization} Model Spec 8 (gptq, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 13 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/Llama-2-13B-GPTQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2 --size-in-billions 13 --model-format gptq --quantization ${quantization} Model Spec 9 (awq, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 13 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/Llama-2-13B-AWQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your 
chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2 --size-in-billions 13 --model-format awq --quantization ${quantization} Model Spec 10 (pytorch, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 70 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Llama-2-70b-hf - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2 --size-in-billions 70 --model-format pytorch --quantization ${quantization} Model Spec 11 (gptq, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 70 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/Llama-2-70B-GPTQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2 --size-in-billions 70 --model-format gptq --quantization ${quantization} Model Spec 12 (awq, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 70 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/Llama-2-70B-AWQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-2 --size-in-billions 70 --model-format awq --quantization ${quantization} ================================================ FILE: 
doc/source/models/builtin/llm/llama-3-instruct.rst ================================================ .. _models_llm_llama-3-instruct: ======================================== llama-3-instruct ======================================== - **Context Length:** 8192 - **Model Name:** llama-3-instruct - **Languages:** en - **Abilities:** chat - **Description:** The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 8 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Meta-Llama-3-8B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3-instruct --size-in-billions 8 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 70 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Meta-Llama-3-70B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3-instruct --size-in-billions 70 --model-format pytorch --quantization ${quantization} Model Spec 3 (ggufv2, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 8 - **Quantizations:** IQ3_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0 - **Engines**: 
vLLM, llama.cpp - **Model ID:** lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3-instruct --size-in-billions 8 --model-format ggufv2 --quantization ${quantization} Model Spec 4 (ggufv2, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 70 - **Quantizations:** IQ1_M, IQ2_XS, Q4_K_M - **Engines**: vLLM, llama.cpp - **Model ID:** lmstudio-community/Meta-Llama-3-70B-Instruct-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3-instruct --size-in-billions 70 --model-format ggufv2 --quantization ${quantization} Model Spec 5 (mlx, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 8 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/Meta-Llama-3-8B-Instruct-4bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3-instruct --size-in-billions 8 --model-format mlx --quantization ${quantization} Model Spec 6 (mlx, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 8 - **Quantizations:** 8bit - **Engines**: MLX - **Model ID:** mlx-community/Meta-Llama-3-8B-Instruct-8bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen 
quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3-instruct --size-in-billions 8 --model-format mlx --quantization ${quantization} Model Spec 7 (mlx, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 8 - **Quantizations:** none - **Engines**: MLX - **Model ID:** mlx-community/Meta-Llama-3-8B-Instruct - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3-instruct --size-in-billions 8 --model-format mlx --quantization ${quantization} Model Spec 8 (mlx, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 70 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/Meta-Llama-3-70B-Instruct-4bit-mlx - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3-instruct --size-in-billions 70 --model-format mlx --quantization ${quantization} Model Spec 9 (mlx, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 70 - **Quantizations:** 8bit - **Engines**: MLX - **Model ID:** mlx-community/Meta-Llama-3-70B-Instruct-8bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3-instruct --size-in-billions 70 --model-format mlx --quantization ${quantization} Model Spec 10 (mlx, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - 
**Model Format:** mlx - **Model Size (in billions):** 70 - **Quantizations:** none - **Engines**: MLX - **Model ID:** mlx-community/Meta-Llama-3-70B-Instruct-mlx-unquantized - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3-instruct --size-in-billions 70 --model-format mlx --quantization ${quantization} Model Spec 11 (gptq, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 8 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TechxGenus/Meta-Llama-3-8B-Instruct-GPTQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3-instruct --size-in-billions 8 --model-format gptq --quantization ${quantization} Model Spec 12 (gptq, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 70 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TechxGenus/Meta-Llama-3-70B-Instruct-GPTQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3-instruct --size-in-billions 70 --model-format gptq --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/llama-3.1-instruct.rst ================================================ .. 
_models_llm_llama-3.1-instruct: ======================================== llama-3.1-instruct ======================================== - **Context Length:** 131072 - **Model Name:** llama-3.1-instruct - **Languages:** en, de, fr, it, pt, hi, es, th - **Abilities:** chat, tools - **Description:** The Llama 3.1 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 8 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Meta-Llama-3.1-8B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1-instruct --size-in-billions 8 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 8 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1-instruct --size-in-billions 8 --model-format pytorch --quantization ${quantization} Model Spec 3 (gptq, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 8 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** 
hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1-instruct --size-in-billions 8 --model-format gptq --quantization ${quantization} Model Spec 4 (awq, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 8 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1-instruct --size-in-billions 8 --model-format awq --quantization ${quantization} Model Spec 5 (pytorch, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 70 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Meta-Llama-3.1-70B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1-instruct --size-in-billions 70 --model-format pytorch --quantization ${quantization} Model Spec 6 (pytorch, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 70 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the 
following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1-instruct --size-in-billions 70 --model-format pytorch --quantization ${quantization} Model Spec 7 (gptq, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 70 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1-instruct --size-in-billions 70 --model-format gptq --quantization ${quantization} Model Spec 8 (awq, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 70 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1-instruct --size-in-billions 70 --model-format awq --quantization ${quantization} Model Spec 9 (pytorch, 405 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 405 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Meta-Llama-3.1-405B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization 
method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1-instruct --size-in-billions 405 --model-format pytorch --quantization ${quantization} Model Spec 10 (gptq, 405 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 405 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1-instruct --size-in-billions 405 --model-format gptq --quantization ${quantization} Model Spec 11 (awq, 405 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 405 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1-instruct --size-in-billions 405 --model-format awq --quantization ${quantization} Model Spec 12 (ggufv2, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 8 - **Quantizations:** Q3_K_L, IQ4_XS, Q4_K_M, Q5_K_M, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch 
--model-engine ${engine} --model-name llama-3.1-instruct --size-in-billions 8 --model-format ggufv2 --quantization ${quantization} Model Spec 13 (ggufv2, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 70 - **Quantizations:** IQ2_M, IQ4_XS, Q2_K, Q3_K_S, Q4_K_M, Q5_K_M, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** lmstudio-community/Meta-Llama-3.1-70B-Instruct-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1-instruct --size-in-billions 70 --model-format ggufv2 --quantization ${quantization} Model Spec 14 (mlx, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 8 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/Meta-Llama-3.1-8B-Instruct-4bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1-instruct --size-in-billions 8 --model-format mlx --quantization ${quantization} Model Spec 15 (mlx, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 8 - **Quantizations:** 8bit - **Engines**: MLX - **Model ID:** mlx-community/Meta-Llama-3.1-8B-Instruct-8bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1-instruct --size-in-billions 8 --model-format mlx --quantization ${quantization} Model Spec 16 (mlx, 8 Billion) 
++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 8 - **Quantizations:** none - **Engines**: MLX - **Model ID:** mlx-community/Meta-Llama-3.1-8B-Instruct - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1-instruct --size-in-billions 8 --model-format mlx --quantization ${quantization} Model Spec 17 (mlx, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 70 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/Meta-Llama-3.1-70B-Instruct-4bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1-instruct --size-in-billions 70 --model-format mlx --quantization ${quantization} Model Spec 18 (mlx, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 70 - **Quantizations:** 8bit - **Engines**: MLX - **Model ID:** mlx-community/Meta-Llama-3.1-70B-Instruct-8bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1-instruct --size-in-billions 70 --model-format mlx --quantization ${quantization} Model Spec 19 (mlx, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 70 - **Quantizations:** none - **Engines**: MLX - **Model ID:** mlx-community/Meta-Llama-3.1-70B-Instruct-bf16 - **Model Hubs**: `Hugging Face `__ Execute the 
following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1-instruct --size-in-billions 70 --model-format mlx --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/llama-3.1.rst ================================================ .. _models_llm_llama-3.1: ======================================== llama-3.1 ======================================== - **Context Length:** 131072 - **Model Name:** llama-3.1 - **Languages:** en, de, fr, it, pt, hi, es, th - **Abilities:** generate - **Description:** Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 8 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Meta-Llama-3.1-8B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1 --size-in-billions 8 --model-format pytorch --quantization ${quantization} Model Spec 2 (ggufv2, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 8 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: llama.cpp - **Model ID:** QuantFactory/Meta-Llama-3.1-8B-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine 
${engine} --model-name llama-3.1 --size-in-billions 8 --model-format ggufv2 --quantization ${quantization} Model Spec 3 (pytorch, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 70 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Meta-Llama-3.1-70B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1 --size-in-billions 70 --model-format pytorch --quantization ${quantization} Model Spec 4 (pytorch, 405 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 405 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Meta-Llama-3.1-405B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.1 --size-in-billions 405 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/llama-3.2-vision-instruct.rst ================================================ .. _models_llm_llama-3.2-vision-instruct: ======================================== llama-3.2-vision-instruct ======================================== - **Context Length:** 131072 - **Model Name:** llama-3.2-vision-instruct - **Languages:** en, de, fr, it, pt, hi, es, th - **Abilities:** chat, vision - **Description:** Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image... 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 11 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 11 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Llama-3.2-11B-Vision-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.2-vision-instruct --size-in-billions 11 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 90 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 90 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Llama-3.2-90B-Vision-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.2-vision-instruct --size-in-billions 90 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/llama-3.2-vision.rst ================================================ .. _models_llm_llama-3.2-vision: ======================================== llama-3.2-vision ======================================== - **Context Length:** 131072 - **Model Name:** llama-3.2-vision - **Languages:** en, de, fr, it, pt, hi, es, th - **Abilities:** generate, vision - **Description:** The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image... 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 11 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 11 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Meta-Llama-3.2-11B-Vision - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.2-vision --size-in-billions 11 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 90 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 90 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Meta-Llama-3.2-90B-Vision - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.2-vision --size-in-billions 90 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/llama-3.3-instruct.rst ================================================ .. _models_llm_llama-3.3-instruct: ======================================== llama-3.3-instruct ======================================== - **Context Length:** 131072 - **Model Name:** llama-3.3-instruct - **Languages:** en, de, fr, it, pt, hi, es, th - **Abilities:** chat, tools - **Description:** The Llama 3.3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. 
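Every specification on this page is launched with the same five ``xinference launch`` flags, so the command for a given spec can be assembled from its listed fields. The following is a minimal sketch of that assembly, not part of Xinference: the helper name ``build_launch_command`` is hypothetical, it uses only the flags that appear in the launch commands on this page, and it assumes the engine value can be passed exactly as spelled in the **Engines** list.

```python
import shlex


def build_launch_command(engine, model_name, size_in_billions, model_format, quantization):
    """Return the argument list for an `xinference launch` call.

    Hypothetical helper: it only arranges the five flags shown in the
    launch commands on this page and does not validate the values.
    """
    return [
        "xinference", "launch",
        "--model-engine", engine,
        "--model-name", model_name,
        "--size-in-billions", str(size_in_billions),
        "--model-format", model_format,
        "--quantization", quantization,
    ]


# Values copied from the awq, 70 Billion spec of llama-3.3-instruct.
# Passing the engine as "vLLM" (the spelling in the Engines list) is an assumption.
cmd = build_launch_command("vLLM", "llama-3.3-instruct", 70, "awq", "Int4")

# shlex.join quotes arguments where needed, so the string is safe to paste into a shell.
print(shlex.join(cmd))
```

Building the command as a list rather than a formatted string keeps each value a separate argument, which avoids quoting problems if the list is later handed to ``subprocess.run``.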
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 70 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Llama-3.3-70B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.3-instruct --size-in-billions 70 --model-format pytorch --quantization ${quantization} Model Spec 2 (gptq, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 70 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** shuyuej/Llama-3.3-70B-Instruct-GPTQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.3-instruct --size-in-billions 70 --model-format gptq --quantization ${quantization} Model Spec 3 (awq, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 70 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** casperhansen/llama-3.3-70b-instruct-awq - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.3-instruct --size-in-billions 70 --model-format awq --quantization ${quantization} Model Spec 4 (mlx, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 70 - **Quantizations:** 
3bit, 4bit, 6bit, 8bit, fp16 - **Engines**: MLX - **Model ID:** mlx-community/Llama-3.3-70B-Instruct-{quantization} - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.3-instruct --size-in-billions 70 --model-format mlx --quantization ${quantization} Model Spec 5 (ggufv2, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 70 - **Quantizations:** Q3_K_L, Q4_K_M, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** lmstudio-community/Llama-3.3-70B-Instruct-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3.3-instruct --size-in-billions 70 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/llama-3.rst ================================================ .. 
_models_llm_llama-3: ======================================== llama-3 ======================================== - **Context Length:** 8192 - **Model Name:** llama-3 - **Languages:** en - **Abilities:** generate - **Description:** Llama 3 is an auto-regressive language model that uses an optimized transformer architecture Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 8 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Meta-Llama-3-8B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3 --size-in-billions 8 --model-format pytorch --quantization ${quantization} Model Spec 2 (ggufv2, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 8 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: llama.cpp - **Model ID:** QuantFactory/Meta-Llama-3-8B-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3 --size-in-billions 8 --model-format ggufv2 --quantization ${quantization} Model Spec 3 (pytorch, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 70 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** meta-llama/Meta-Llama-3-70B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace 
``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3 --size-in-billions 70 --model-format pytorch --quantization ${quantization} Model Spec 4 (ggufv2, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 70 - **Quantizations:** Q4_K_M, Q5_K_M - **Engines**: llama.cpp - **Model ID:** NousResearch/Meta-Llama-3-70B-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name llama-3 --size-in-billions 70 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/marco-o1.rst ================================================ .. _models_llm_marco-o1: ======================================== marco-o1 ======================================== - **Context Length:** 32768 - **Model Name:** marco-o1 - **Languages:** en, zh - **Abilities:** chat, tools - **Description:** Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** AIDC-AI/Marco-o1 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name marco-o1 --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model 
Size (in billions):** 7 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** QuantFactory/Marco-o1-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name marco-o1 --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/mineru2.5-2509-1.2b.rst ================================================ .. _models_llm_mineru2.5-2509-1.2b: ======================================== MinerU2.5-2509-1.2B ======================================== - **Context Length:** 32768 - **Model Name:** MinerU2.5-2509-1.2B - **Languages:** en, zh - **Abilities:** chat, vision - **Description:** MinerU2.5-2509-1.2B is a vision language model for document understanding. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 1_2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 1_2 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** opendatalab/MinerU2.5-2509-1.2B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name MinerU2.5-2509-1.2B --size-in-billions 1_2 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/minicpm-2b-dpo-bf16.rst ================================================ .. 
_models_llm_minicpm-2b-dpo-bf16: ======================================== minicpm-2b-dpo-bf16 ======================================== - **Context Length:** 4096 - **Model Name:** minicpm-2b-dpo-bf16 - **Languages:** zh - **Abilities:** chat - **Description:** MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 2 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** openbmb/MiniCPM-2B-dpo-bf16 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name minicpm-2b-dpo-bf16 --size-in-billions 2 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/minicpm-2b-dpo-fp16.rst ================================================ .. _models_llm_minicpm-2b-dpo-fp16: ======================================== minicpm-2b-dpo-fp16 ======================================== - **Context Length:** 4096 - **Model Name:** minicpm-2b-dpo-fp16 - **Languages:** zh - **Abilities:** chat - **Description:** MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 2 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** openbmb/MiniCPM-2B-dpo-fp16 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name minicpm-2b-dpo-fp16 --size-in-billions 2 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/minicpm-2b-dpo-fp32.rst ================================================ .. _models_llm_minicpm-2b-dpo-fp32: ======================================== minicpm-2b-dpo-fp32 ======================================== - **Context Length:** 4096 - **Model Name:** minicpm-2b-dpo-fp32 - **Languages:** zh - **Abilities:** chat - **Description:** MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 2 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** openbmb/MiniCPM-2B-dpo-fp32 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name minicpm-2b-dpo-fp32 --size-in-billions 2 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/minicpm-2b-sft-bf16.rst ================================================ .. 
_models_llm_minicpm-2b-sft-bf16: ======================================== minicpm-2b-sft-bf16 ======================================== - **Context Length:** 4096 - **Model Name:** minicpm-2b-sft-bf16 - **Languages:** zh - **Abilities:** chat - **Description:** MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 2 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** openbmb/MiniCPM-2B-sft-bf16 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name minicpm-2b-sft-bf16 --size-in-billions 2 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/minicpm-2b-sft-fp32.rst ================================================ .. _models_llm_minicpm-2b-sft-fp32: ======================================== minicpm-2b-sft-fp32 ======================================== - **Context Length:** 4096 - **Model Name:** minicpm-2b-sft-fp32 - **Languages:** zh - **Abilities:** chat - **Description:** MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 2 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** openbmb/MiniCPM-2B-sft-fp32 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name minicpm-2b-sft-fp32 --size-in-billions 2 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/minicpm-v-2.6.rst ================================================ .. _models_llm_minicpm-v-2.6: ======================================== MiniCPM-V-2.6 ======================================== - **Context Length:** 32768 - **Model Name:** MiniCPM-V-2.6 - **Languages:** en, zh - **Abilities:** chat, vision - **Description:** MiniCPM-V 2.6 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 8 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** openbmb/MiniCPM-V-2_6 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name MiniCPM-V-2.6 --size-in-billions 8 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 8 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** openbmb/MiniCPM-V-2_6-int4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name MiniCPM-V-2.6 --size-in-billions 8 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/minicpm-v-4.5.rst ================================================ .. _models_llm_minicpm-v-4.5: ======================================== MiniCPM-V-4.5 ======================================== - **Context Length:** 32768 - **Model Name:** MiniCPM-V-4.5 - **Languages:** en, zh - **Abilities:** chat, vision - **Description:** MiniCPM-V 4.5 is an improved version in the MiniCPM-V series with enhanced multimodal capabilities and better performance. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 8 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** openbmb/MiniCPM-V-4_5 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name MiniCPM-V-4.5 --size-in-billions 8 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 8 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** openbmb/MiniCPM-V-4_5-int4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name MiniCPM-V-4.5 --size-in-billions 8 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/minicpm3-4b.rst ================================================ .. _models_llm_minicpm3-4b: ======================================== minicpm3-4b ======================================== - **Context Length:** 32768 - **Model Name:** minicpm3-4b - **Languages:** zh - **Abilities:** chat - **Description:** MiniCPM3-4B is the 3rd generation of MiniCPM series. The overall performance of MiniCPM3-4B surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, being comparable with many recent 7B~9B models. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 4 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** openbmb/MiniCPM3-4B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name minicpm3-4b --size-in-billions 4 --model-format pytorch --quantization ${quantization} Model Spec 2 (gptq, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 4 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** openbmb/MiniCPM3-4B-GPTQ-Int4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name minicpm3-4b --size-in-billions 4 --model-format gptq --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/minicpm4.rst ================================================ .. _models_llm_minicpm4: ======================================== minicpm4 ======================================== - **Context Length:** 32768 - **Model Name:** minicpm4 - **Languages:** zh - **Abilities:** chat - **Description:** MiniCPM4 series are highly efficient large language models (LLMs) designed explicitly for end-side devices, which achieves this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. 
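The launch commands in the specifications for this model all follow the same flag pattern, with ``${engine}`` and ``${quantization}`` left as placeholders. As a minimal sketch, the substitution can be done in Python before handing the command to a shell. The helper name ``build_launch_command`` is hypothetical and not part of Xinference; the example values are taken from the first spec, and it is an assumption that the CLI accepts the engine name spelled exactly as it appears in the **Engines** list.

```python
import shlex


def build_launch_command(engine, model_name, size_in_billions, model_format, quantization):
    """Fill in the placeholders of the documented `xinference launch` command.

    Hypothetical helper for illustration only; it uses just the flags that
    appear in the launch commands on this page.
    """
    argv = [
        "xinference", "launch",
        "--model-engine", engine,
        "--model-name", model_name,
        # Sizes such as "0_5" are passed through as written in the spec.
        "--size-in-billions", str(size_in_billions),
        "--model-format", model_format,
        "--quantization", quantization,
    ]
    # shlex.join quotes any argument that would otherwise be split by a shell.
    return shlex.join(argv)


# Assumed example values: pytorch spec, 0_5 size, quantization "none".
command = build_launch_command("Transformers", "minicpm4", "0_5", "pytorch", "none")
print(command)
```

The resulting string can be pasted into a terminal or split again with ``shlex.split`` and passed to ``subprocess.run``. Choose the engine and quantization from the values listed for the spec you are launching.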
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 0_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 0_5 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** JunHowie/MiniCPM4-0.5B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name minicpm4 --size-in-billions 0_5 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 8 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** JunHowie/MiniCPM4-8B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name minicpm4 --size-in-billions 8 --model-format pytorch --quantization ${quantization} Model Spec 3 (mlx, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 8 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/MiniCPM4-8B-4bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name minicpm4 --size-in-billions 8 --model-format mlx --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/minimax-m2.5.rst ================================================ .. 
_models_llm_minimax-m2.5: ======================================== MiniMax-M2.5 ======================================== - **Context Length:** 196608 - **Model Name:** MiniMax-M2.5 - **Languages:** en, zh - **Abilities:** chat, tools, reasoning - **Description:** MiniMax-M2.5, a Mini model built for Max coding & agentic workflows. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 230 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 230 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** MiniMaxAI/MiniMax-M2.5 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name MiniMax-M2.5 --size-in-billions 230 --model-format pytorch --quantization ${quantization} Model Spec 2 (ggufv2, 230 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 230 - **Quantizations:** UD-TQ1_0 - **Engines**: vLLM, llama.cpp - **Model ID:** unsloth/MiniMax-M2.5-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name MiniMax-M2.5 --size-in-billions 230 --model-format ggufv2 --quantization ${quantization} Model Spec 3 (mlx, 230 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 230 - **Quantizations:** 3bit, 4bit, 5bit, 6bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/MiniMax-M2.5-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization 
method from the options listed above:: xinference launch --model-engine ${engine} --model-name MiniMax-M2.5 --size-in-billions 230 --model-format mlx --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/minimax-m2.rst ================================================ .. _models_llm_minimax-m2: ======================================== MiniMax-M2 ======================================== - **Context Length:** 196608 - **Model Name:** MiniMax-M2 - **Languages:** en, zh - **Abilities:** chat, tools, reasoning - **Description:** MiniMax-M2, a Mini model built for Max coding & agentic workflows. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 230 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 230 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** MiniMaxAI/MiniMax-M2 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name MiniMax-M2 --size-in-billions 230 --model-format pytorch --quantization ${quantization} Model Spec 2 (awq, 230 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 230 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/MiniMax-M2-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name MiniMax-M2 --size-in-billions 230 --model-format awq --quantization ${quantization} Model Spec 3 (mlx, 230 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 
230 - **Quantizations:** 3bit, 4bit, 5bit, 6bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/MiniMax-M2-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name MiniMax-M2 --size-in-billions 230 --model-format mlx --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/mistral-instruct-v0.1.rst ================================================ .. _models_llm_mistral-instruct-v0.1: ======================================== mistral-instruct-v0.1 ======================================== - **Context Length:** 8192 - **Model Name:** mistral-instruct-v0.1 - **Languages:** en - **Abilities:** chat - **Description:** Mistral-7B-Instruct is a fine-tuned version of the Mistral-7B LLM on public datasets, specializing in chatting. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** mistralai/Mistral-7B-Instruct-v0.1 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-instruct-v0.1 --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (awq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/Mistral-7B-Instruct-v0.1-AWQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-instruct-v0.1 --size-in-billions 7 --model-format awq --quantization ${quantization} Model Spec 3 (gptq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/Mistral-7B-Instruct-v0.1-GPTQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-instruct-v0.1 --size-in-billions 7 --model-format gptq --quantization ${quantization} Model Spec 4 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - 
**Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** TheBloke/Mistral-7B-Instruct-v0.1-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-instruct-v0.1 --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/mistral-instruct-v0.2.rst ================================================ .. _models_llm_mistral-instruct-v0.2: ======================================== mistral-instruct-v0.2 ======================================== - **Context Length:** 8192 - **Model Name:** mistral-instruct-v0.2 - **Languages:** en - **Abilities:** chat - **Description:** The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** mistralai/Mistral-7B-Instruct-v0.2 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-instruct-v0.2 --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (gptq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/Mistral-7B-Instruct-v0.2-GPTQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-instruct-v0.2 --size-in-billions 7 --model-format gptq --quantization ${quantization} Model Spec 3 (awq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/Mistral-7B-Instruct-v0.2-AWQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-instruct-v0.2 --size-in-billions 7 --model-format awq --quantization ${quantization} Model Spec 4 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - 
**Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** TheBloke/Mistral-7B-Instruct-v0.2-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-instruct-v0.2 --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/mistral-instruct-v0.3.rst ================================================ .. _models_llm_mistral-instruct-v0.3: ======================================== mistral-instruct-v0.3 ======================================== - **Context Length:** 32768 - **Model Name:** mistral-instruct-v0.3 - **Languages:** en - **Abilities:** chat - **Description:** The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** mistralai/Mistral-7B-Instruct-v0.3 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-instruct-v0.3 --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (gptq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** neuralmagic/Mistral-7B-Instruct-v0.3-GPTQ-4bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-instruct-v0.3 --size-in-billions 7 --model-format gptq --quantization ${quantization} Model Spec 3 (awq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** solidrust/Mistral-7B-Instruct-v0.3-AWQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-instruct-v0.3 --size-in-billions 7 --model-format awq --quantization ${quantization} Model Spec 4 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - **Quantizations:** 
Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_K_S, Q4_K_M, Q5_K_S, Q5_K_M, Q6_K, Q8_0, fp16 - **Engines**: vLLM, llama.cpp - **Model ID:** MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-instruct-v0.3 --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/mistral-large-instruct.rst ================================================ .. _models_llm_mistral-large-instruct: ======================================== mistral-large-instruct ======================================== - **Context Length:** 131072 - **Model Name:** mistral-large-instruct - **Languages:** en, fr, de, es, it, pt, zh, ru, ja, ko - **Abilities:** chat - **Description:** Mistral-Large-Instruct-2407 is an advanced dense Large Language Model (LLM) of 123B parameters with state-of-the-art reasoning, knowledge and coding capabilities. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 123 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 123 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** mistralai/Mistral-Large-Instruct-2407 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-large-instruct --size-in-billions 123 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 123 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 123 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** unsloth/Mistral-Large-Instruct-2407-bnb-4bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-large-instruct --size-in-billions 123 --model-format pytorch --quantization ${quantization} Model Spec 3 (gptq, 123 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 123 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** ModelCloud/Mistral-Large-Instruct-2407-gptq-4bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-large-instruct --size-in-billions 123 --model-format gptq --quantization ${quantization} Model Spec 4 (awq, 123 Billion) ++++++++++++++++++++++++++++++++++++++++ - 
**Model Format:** awq - **Model Size (in billions):** 123 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TechxGenus/Mistral-Large-Instruct-2407-AWQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-large-instruct --size-in-billions 123 --model-format awq --quantization ${quantization} Model Spec 5 (ggufv2, 123 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 123 - **Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_K_S, Q4_K_M - **Engines**: vLLM, llama.cpp - **Model ID:** MaziyarPanahi/Mistral-Large-Instruct-2407-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-large-instruct --size-in-billions 123 --model-format ggufv2 --quantization ${quantization} Model Spec 6 (mlx, 123 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 123 - **Quantizations:** none - **Engines**: MLX - **Model ID:** mlx-community/Mistral-Large-Instruct-2407-bf16 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-large-instruct --size-in-billions 123 --model-format mlx --quantization ${quantization} Model Spec 7 (mlx, 123 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 123 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** 
mlx-community/Mistral-Large-Instruct-2407-4bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-large-instruct --size-in-billions 123 --model-format mlx --quantization ${quantization} Model Spec 8 (mlx, 123 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 123 - **Quantizations:** 8bit - **Engines**: MLX - **Model ID:** mlx-community/Mistral-Large-Instruct-2407-8bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-large-instruct --size-in-billions 123 --model-format mlx --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/mistral-nemo-instruct.rst ================================================ .. 
_models_llm_mistral-nemo-instruct: ======================================== mistral-nemo-instruct ======================================== - **Context Length:** 1024000 - **Model Name:** mistral-nemo-instruct - **Languages:** en, fr, de, es, it, pt, zh, ru, ja - **Abilities:** chat - **Description:** The Mistral-Nemo-Instruct-2407 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-Nemo-Base-2407. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 12 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 12 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** mistralai/Mistral-Nemo-Instruct-2407 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-nemo-instruct --size-in-billions 12 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 12 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 12 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-nemo-instruct --size-in-billions 12 --model-format pytorch --quantization ${quantization} Model Spec 3 (pytorch, 12 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 12 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** afrizalha/Mistral-Nemo-Instruct-2407-bnb-8bit - **Model 
Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-nemo-instruct --size-in-billions 12 --model-format pytorch --quantization ${quantization} Model Spec 4 (gptq, 12 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 12 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** ModelCloud/Mistral-Nemo-Instruct-2407-gptq-4bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-nemo-instruct --size-in-billions 12 --model-format gptq --quantization ${quantization} Model Spec 5 (awq, 12 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 12 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** casperhansen/mistral-nemo-instruct-2407-awq - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-nemo-instruct --size-in-billions 12 --model-format awq --quantization ${quantization} Model Spec 6 (ggufv2, 12 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 12 - **Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_K_S, Q4_K_M, Q5_K_S, Q5_K_M, Q6_K, Q8_0, fp16 - **Engines**: vLLM, llama.cpp - **Model ID:** MaziyarPanahi/Mistral-Nemo-Instruct-2407-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch 
the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-nemo-instruct --size-in-billions 12 --model-format ggufv2 --quantization ${quantization} Model Spec 7 (mlx, 12 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 12 - **Quantizations:** none - **Engines**: MLX - **Model ID:** mlx-community/Mistral-Nemo-Instruct-2407-bf16 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-nemo-instruct --size-in-billions 12 --model-format mlx --quantization ${quantization} Model Spec 8 (mlx, 12 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 12 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/Mistral-Nemo-Instruct-2407-4bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-nemo-instruct --size-in-billions 12 --model-format mlx --quantization ${quantization} Model Spec 9 (mlx, 12 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 12 - **Quantizations:** 8bit - **Engines**: MLX - **Model ID:** mlx-community/Mistral-Nemo-Instruct-2407-8bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mistral-nemo-instruct --size-in-billions 12 --model-format mlx 
--quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/mistral-v0.1.rst
================================================
.. _models_llm_mistral-v0.1:

========================================
mistral-v0.1
========================================

- **Context Length:** 8192
- **Model Name:** mistral-v0.1
- **Languages:** en
- **Abilities:** generate
- **Description:** Mistral-7B is an unmoderated Transformer-based LLM claiming to outperform Llama2 on all benchmarks.

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 7
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** mistralai/Mistral-7B-v0.1
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name mistral-v0.1 --size-in-billions 7 --model-format pytorch --quantization ${quantization}


Model Spec 2 (ggufv2, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 7
- **Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0
- **Engines**: llama.cpp
- **Model ID:** TheBloke/Mistral-7B-v0.1-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name mistral-v0.1 --size-in-billions 7 --model-format ggufv2 --quantization ${quantization}


================================================
FILE: doc/source/models/builtin/llm/mixtral-8x22b-instruct-v0.1.rst
================================================
..
_models_llm_mixtral-8x22b-instruct-v0.1: ======================================== mixtral-8x22B-instruct-v0.1 ======================================== - **Context Length:** 65536 - **Model Name:** mixtral-8x22B-instruct-v0.1 - **Languages:** en, fr, it, de, es - **Abilities:** chat - **Description:** The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mixtral-8x22B-v0.1, specializing in chatting. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 141 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 141 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** mistralai/Mixtral-8x22B-Instruct-v0.1 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mixtral-8x22B-instruct-v0.1 --size-in-billions 141 --model-format pytorch --quantization ${quantization} Model Spec 2 (awq, 141 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 141 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** MaziyarPanahi/Mixtral-8x22B-Instruct-v0.1-AWQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mixtral-8x22B-instruct-v0.1 --size-in-billions 141 --model-format awq --quantization ${quantization} Model Spec 3 (gptq, 141 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 141 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** jarrelscy/Mixtral-8x22B-Instruct-v0.1-GPTQ-4bit - **Model Hubs**: 
`Hugging Face `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name mixtral-8x22B-instruct-v0.1 --size-in-billions 141 --model-format gptq --quantization ${quantization}


Model Spec 4 (ggufv2, 141 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 141
- **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6, Q8_0, fp16
- **Engines**: vLLM, llama.cpp
- **Model ID:** MaziyarPanahi/Mixtral-8x22B-Instruct-v0.1-GGUF
- **Model Hubs**: `Hugging Face `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name mixtral-8x22B-instruct-v0.1 --size-in-billions 141 --model-format ggufv2 --quantization ${quantization}


================================================
FILE: doc/source/models/builtin/llm/mixtral-instruct-v0.1.rst
================================================
.. _models_llm_mixtral-instruct-v0.1:

========================================
mixtral-instruct-v0.1
========================================

- **Context Length:** 32768
- **Model Name:** mixtral-instruct-v0.1
- **Languages:** en, fr, it, de, es
- **Abilities:** chat
- **Description:** Mixtral-8x7B-Instruct is a fine-tuned version of the Mixtral-8x7B LLM, specializing in chatting.

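
Every launch command on these pages has the same shape, so it can be assembled from a spec's fields. The following is a minimal sketch, not part of Xinference: the helper name ``build_launch_command`` is hypothetical, and passing the engine exactly as it is spelled under **Engines** is an assumption. It only builds the command string and does not run it.

```python
import shlex


def build_launch_command(engine, model_name, size_in_billions, model_format, quantization):
    """Assemble an `xinference launch` command line from one model spec's fields.

    Hypothetical helper for illustration; only the flags shown in the
    documented launch commands are used.
    """
    argv = [
        "xinference", "launch",
        "--model-engine", engine,
        "--model-name", model_name,
        # Sizes such as "46_7" are passed through as written in the spec.
        "--size-in-billions", str(size_in_billions),
        "--model-format", model_format,
        "--quantization", quantization,
    ]
    # shlex.join quotes any argument that needs it for a POSIX shell.
    return shlex.join(argv)


# Example values taken from the awq spec of mixtral-instruct-v0.1;
# the engine spelling is an assumption.
command = build_launch_command("vLLM", "mixtral-instruct-v0.1", "46_7", "awq", "Int4")
print(command)
```

Keeping the arguments as a list until the last step avoids hand-written quoting mistakes if a value ever contains spaces or shell metacharacters.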
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 46_7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 46_7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** mistralai/Mixtral-8x7B-Instruct-v0.1 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mixtral-instruct-v0.1 --size-in-billions 46_7 --model-format pytorch --quantization ${quantization} Model Spec 2 (awq, 46_7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 46_7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mixtral-instruct-v0.1 --size-in-billions 46_7 --model-format awq --quantization ${quantization} Model Spec 3 (gptq, 46_7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 46_7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mixtral-instruct-v0.1 --size-in-billions 46_7 --model-format gptq --quantization ${quantization} Model Spec 4 (ggufv2, 46_7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size 
(in billions):** 46_7 - **Quantizations:** Q2_K, Q3_K_M, Q4_0, Q4_K_M, Q5_0, Q5_K_M, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mixtral-instruct-v0.1 --size-in-billions 46_7 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/mixtral-v0.1.rst ================================================ .. _models_llm_mixtral-v0.1: ======================================== mixtral-v0.1 ======================================== - **Context Length:** 32768 - **Model Name:** mixtral-v0.1 - **Languages:** en, fr, it, de, es - **Abilities:** generate - **Description:** The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 46_7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 46_7 - **Quantizations:** none - **Engines**: Transformers, SGLang - **Model ID:** mistralai/Mixtral-8x7B-v0.1 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mixtral-v0.1 --size-in-billions 46_7 --model-format pytorch --quantization ${quantization} Model Spec 2 (gptq, 46_7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 46_7 - **Quantizations:** Int4 - **Engines**: Transformers, SGLang - **Model ID:** TheBloke/Mixtral-8x7B-v0.1-GPTQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mixtral-v0.1 --size-in-billions 46_7 --model-format gptq --quantization ${quantization} Model Spec 3 (ggufv2, 46_7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 46_7 - **Quantizations:** Q2_K, Q3_K_M, Q4_0, Q4_K_M, Q5_0, Q5_K_M, Q6_K, Q8_0 - **Engines**: llama.cpp - **Model ID:** TheBloke/Mixtral-8x7B-v0.1-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name mixtral-v0.1 --size-in-billions 46_7 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/moonlight-16b-a3b-instruct.rst 
================================================ .. _models_llm_moonlight-16b-a3b-instruct: ======================================== moonlight-16b-a3b-instruct ======================================== - **Context Length:** 8192 - **Model Name:** moonlight-16b-a3b-instruct - **Languages:** en, zh - **Abilities:** chat - **Description:** Kimi Muon is Scalable for LLM Training Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 3 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 3 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** moonshotai/Moonlight-16B-A3B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name moonlight-16b-a3b-instruct --size-in-billions 3 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/openhermes-2.5.rst ================================================ .. _models_llm_openhermes-2.5: ======================================== openhermes-2.5 ======================================== - **Context Length:** 8192 - **Model Name:** openhermes-2.5 - **Languages:** en - **Abilities:** chat - **Description:** Openhermes 2.5 is a fine-tuned version of Mistral-7B-v0.1 on primarily GPT-4 generated data. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** teknium/OpenHermes-2.5-Mistral-7B - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name openhermes-2.5 --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - **Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** TheBloke/OpenHermes-2.5-Mistral-7B-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name openhermes-2.5 --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/opt.rst ================================================ .. _models_llm_opt: ======================================== opt ======================================== - **Context Length:** 2048 - **Model Name:** opt - **Languages:** en - **Abilities:** generate - **Description:** Opt is an open-source, decoder-only, Transformer based LLM that was designed to replicate GPT-3. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 1 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 1 - **Quantizations:** none - **Engines**: Transformers, SGLang - **Model ID:** facebook/opt-125m - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name opt --size-in-billions 1 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/orion-chat.rst ================================================ .. _models_llm_orion-chat: ======================================== orion-chat ======================================== - **Context Length:** 4096 - **Model Name:** orion-chat - **Languages:** en, zh - **Abilities:** chat - **Description:** Orion-14B series models are open-source multilingual large language models trained from scratch by OrionStarAI. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 14 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 14 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** OrionStarAI/Orion-14B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name orion-chat --size-in-billions 14 --model-format pytorch --quantization ${quantization} Model Spec 2 (awq, 14 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 14 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** OrionStarAI/Orion-14B-Chat-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name orion-chat --size-in-billions 14 --model-format awq --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/ovis2.rst ================================================ .. _models_llm_ovis2: ======================================== Ovis2 ======================================== - **Context Length:** 32768 - **Model Name:** Ovis2 - **Languages:** en, zh - **Abilities:** chat, vision - **Description:** Ovis (Open VISion) is a novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings. 
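
Several Model IDs on these pages, for example ``AIDC-AI/Ovis2-2B-GPTQ-{quantization}``, contain a ``{quantization}`` placeholder. Assuming the placeholder is filled with the chosen quantization name verbatim, the concrete ID can be sketched as below. The helper name ``resolve_model_id`` is hypothetical, and the sketch does not check that the resulting repository exists on any hub.

```python
def resolve_model_id(model_id_template, quantization):
    """Fill the {quantization} placeholder of a Model ID template.

    Hypothetical helper for illustration. IDs without a placeholder
    are returned unchanged by str.format.
    """
    return model_id_template.format(quantization=quantization)


# Template taken from the Ovis2 gptq specs; "Int4" is one of the listed quantizations.
resolved = resolve_model_id("AIDC-AI/Ovis2-2B-GPTQ-{quantization}", "Int4")
print(resolved)
```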
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 1 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 1 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** AIDC-AI/Ovis2-1B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Ovis2 --size-in-billions 1 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 2 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** AIDC-AI/Ovis2-2B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Ovis2 --size-in-billions 2 --model-format pytorch --quantization ${quantization} Model Spec 3 (pytorch, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 4 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** AIDC-AI/Ovis2-4B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Ovis2 --size-in-billions 4 --model-format pytorch --quantization ${quantization} Model Spec 4 (pytorch, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 8 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** AIDC-AI/Ovis2-8B - **Model Hubs**: 
`Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Ovis2 --size-in-billions 8 --model-format pytorch --quantization ${quantization} Model Spec 5 (pytorch, 16 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 16 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** AIDC-AI/Ovis2-16B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Ovis2 --size-in-billions 16 --model-format pytorch --quantization ${quantization} Model Spec 6 (pytorch, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 34 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** AIDC-AI/Ovis2-34B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Ovis2 --size-in-billions 34 --model-format pytorch --quantization ${quantization} Model Spec 7 (gptq, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 2 - **Quantizations:** Int4 - **Engines**: Transformers - **Model ID:** AIDC-AI/Ovis2-2B-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name 
Ovis2 --size-in-billions 2 --model-format gptq --quantization ${quantization} Model Spec 8 (gptq, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 4 - **Quantizations:** Int4 - **Engines**: Transformers - **Model ID:** AIDC-AI/Ovis2-4B-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Ovis2 --size-in-billions 4 --model-format gptq --quantization ${quantization} Model Spec 9 (gptq, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 8 - **Quantizations:** Int4 - **Engines**: Transformers - **Model ID:** AIDC-AI/Ovis2-8B-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Ovis2 --size-in-billions 8 --model-format gptq --quantization ${quantization} Model Spec 10 (gptq, 16 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 16 - **Quantizations:** Int4 - **Engines**: Transformers - **Model ID:** AIDC-AI/Ovis2-16B-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Ovis2 --size-in-billions 16 --model-format gptq --quantization ${quantization} Model Spec 11 (gptq, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 34 - **Quantizations:** Int4, 
Int8 - **Engines**: Transformers - **Model ID:** AIDC-AI/Ovis2-34B-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Ovis2 --size-in-billions 34 --model-format gptq --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/phi-2.rst ================================================ .. _models_llm_phi-2: ======================================== phi-2 ======================================== - **Context Length:** 2048 - **Model Name:** phi-2 - **Languages:** en - **Abilities:** generate - **Description:** Phi-2 is a 2.7B Transformer based LLM used for research on model safety, trained with data similar to Phi-1.5 but augmented with synthetic texts and curated websites. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (ggufv2, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 2 - **Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0 - **Engines**: llama.cpp - **Model ID:** TheBloke/phi-2-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name phi-2 --size-in-billions 2 --model-format ggufv2 --quantization ${quantization} Model Spec 2 (pytorch, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 2 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** microsoft/phi-2 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your 
chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name phi-2 --size-in-billions 2 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/phi-3-mini-128k-instruct.rst ================================================ .. _models_llm_phi-3-mini-128k-instruct: ======================================== phi-3-mini-128k-instruct ======================================== - **Context Length:** 128000 - **Model Name:** phi-3-mini-128k-instruct - **Languages:** en - **Abilities:** chat - **Description:** The Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 4 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** microsoft/Phi-3-mini-128k-instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name phi-3-mini-128k-instruct --size-in-billions 4 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/phi-3-mini-4k-instruct.rst ================================================ .. _models_llm_phi-3-mini-4k-instruct: ======================================== phi-3-mini-4k-instruct ======================================== - **Context Length:** 4096 - **Model Name:** phi-3-mini-4k-instruct - **Languages:** en - **Abilities:** chat - **Description:** The Phi-3-Mini-4k-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (ggufv2, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 4 - **Quantizations:** fp16, q4 - **Engines**: llama.cpp - **Model ID:** microsoft/Phi-3-mini-4k-instruct-gguf - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name phi-3-mini-4k-instruct --size-in-billions 4 --model-format ggufv2 --quantization ${quantization} Model Spec 2 (pytorch, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 4 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** microsoft/Phi-3-mini-4k-instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name phi-3-mini-4k-instruct --size-in-billions 4 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/qvq-72b-preview.rst ================================================ .. _models_llm_qvq-72b-preview: ======================================== QvQ-72B-Preview ======================================== - **Context Length:** 32768 - **Model Name:** QvQ-72B-Preview - **Languages:** en, zh - **Abilities:** chat, vision - **Description:** QVQ-72B-Preview is an experimental research model developed by the Qwen team, focusing on enhancing visual reasoning capabilities. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 72 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/QVQ-72B-Preview - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name QvQ-72B-Preview --size-in-billions 72 --model-format pytorch --quantization ${quantization} Model Spec 2 (mlx, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 72 - **Quantizations:** 3bit, 4bit, 6bit, 8bit, bf16 - **Engines**: MLX - **Model ID:** mlx-community/QVQ-72B-Preview-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name QvQ-72B-Preview --size-in-billions 72 --model-format mlx --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/qwen-chat.rst ================================================ .. _models_llm_qwen-chat: ======================================== qwen-chat ======================================== - **Context Length:** 32768 - **Model Name:** qwen-chat - **Languages:** en, zh - **Abilities:** chat - **Description:** Qwen-chat is a fine-tuned version of the Qwen LLM trained with alignment techniques, specializing in chatting. 
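
Some sizes on these pages are written with an underscore, such as ``1_8`` for the spec whose Model ID is ``Qwen/Qwen-1_8B-Chat``. Assuming the underscore stands for a decimal point (an interpretation inferred from the Model IDs, not stated on this page), a size can be read as a number as sketched below. The helper name ``size_to_float`` is hypothetical.

```python
def size_to_float(size_in_billions):
    """Read a size such as "1_8" or 72 as a float number of billions.

    Hypothetical helper for illustration. The underscore is replaced
    explicitly because Python's float() treats "1_8" as a digit-group
    separator and would return 18.0.
    """
    return float(str(size_in_billions).replace("_", "."))


print(size_to_float("1_8"))
print(size_to_float(72))
```

On the command line the size is still passed exactly as written in the spec, for example ``--size-in-billions 1_8``.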
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - **Quantizations:** Q4_K_M - **Engines**: vLLM, llama.cpp - **Model ID:** Xorbits/Qwen-7B-Chat-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen-chat --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} Model Spec 2 (ggufv2, 14 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 14 - **Quantizations:** Q4_K_M - **Engines**: vLLM, llama.cpp - **Model ID:** Xorbits/Qwen-14B-Chat-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen-chat --size-in-billions 14 --model-format ggufv2 --quantization ${quantization} Model Spec 3 (pytorch, 1_8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 1_8 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen-1_8B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen-chat --size-in-billions 1_8 --model-format pytorch --quantization ${quantization} Model Spec 4 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, 
Transformers, SGLang - **Model ID:** Qwen/Qwen-7B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen-chat --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 5 (pytorch, 14 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 14 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen-14B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen-chat --size-in-billions 14 --model-format pytorch --quantization ${quantization} Model Spec 6 (pytorch, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 72 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen-72B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen-chat --size-in-billions 72 --model-format pytorch --quantization ${quantization} Model Spec 7 (gptq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 7 - **Quantizations:** Int4, Int8 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen-7B-Chat-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace 
``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen-chat --size-in-billions 7 --model-format gptq --quantization ${quantization} Model Spec 8 (gptq, 1_8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 1_8 - **Quantizations:** Int4, Int8 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen-1_8B-Chat-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen-chat --size-in-billions 1_8 --model-format gptq --quantization ${quantization} Model Spec 9 (gptq, 14 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 14 - **Quantizations:** Int4, Int8 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen-14B-Chat-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen-chat --size-in-billions 14 --model-format gptq --quantization ${quantization} Model Spec 10 (gptq, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 72 - **Quantizations:** Int4, Int8 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen-72B-Chat-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen-chat 
--size-in-billions 72 --model-format gptq --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/qwen1.5-chat.rst ================================================ .. _models_llm_qwen1.5-chat: ======================================== qwen1.5-chat ======================================== - **Context Length:** 32768 - **Model Name:** qwen1.5-chat - **Languages:** en, zh - **Abilities:** chat, tools - **Description:** Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 0_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 0_5 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-0.5B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 0_5 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 1_8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 1_8 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-1.8B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 1_8 --model-format pytorch --quantization ${quantization} Model Spec 3 (pytorch, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 4 - 
**Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-4B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 4 --model-format pytorch --quantization ${quantization} Model Spec 4 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-7B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 5 (pytorch, 14 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 14 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-14B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 14 --model-format pytorch --quantization ${quantization} Model Spec 6 (pytorch, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 32 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-32B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch 
the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 32 --model-format pytorch --quantization ${quantization} Model Spec 7 (pytorch, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 72 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-72B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 72 --model-format pytorch --quantization ${quantization} Model Spec 8 (pytorch, 110 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 110 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-110B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 110 --model-format pytorch --quantization ${quantization} Model Spec 9 (gptq, 0_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 0_5 - **Quantizations:** Int4, Int8 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-0.5B-Chat-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} 
--model-name qwen1.5-chat --size-in-billions 0_5 --model-format gptq --quantization ${quantization} Model Spec 10 (gptq, 1_8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 1_8 - **Quantizations:** Int4, Int8 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-1.8B-Chat-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 1_8 --model-format gptq --quantization ${quantization} Model Spec 11 (gptq, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 4 - **Quantizations:** Int4, Int8 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-4B-Chat-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 4 --model-format gptq --quantization ${quantization} Model Spec 12 (gptq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 7 - **Quantizations:** Int4, Int8 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-7B-Chat-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 7 --model-format gptq --quantization ${quantization} Model Spec 13 (gptq, 14 Billion) 
++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 14 - **Quantizations:** Int4, Int8 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-14B-Chat-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 14 --model-format gptq --quantization ${quantization} Model Spec 14 (gptq, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 32 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-32B-Chat-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 32 --model-format gptq --quantization ${quantization} Model Spec 15 (gptq, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 72 - **Quantizations:** Int4, Int8 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-72B-Chat-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 72 --model-format gptq --quantization ${quantization} Model Spec 16 (gptq, 110 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 110 - **Quantizations:** Int4 - 
**Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-110B-Chat-GPTQ-Int4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 110 --model-format gptq --quantization ${quantization} Model Spec 17 (awq, 0_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 0_5 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-0.5B-Chat-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 0_5 --model-format awq --quantization ${quantization} Model Spec 18 (awq, 1_8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 1_8 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-1.8B-Chat-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 1_8 --model-format awq --quantization ${quantization} Model Spec 19 (awq, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 4 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-4B-Chat-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to 
replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 4 --model-format awq --quantization ${quantization} Model Spec 20 (awq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-7B-Chat-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 7 --model-format awq --quantization ${quantization} Model Spec 21 (awq, 14 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 14 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-14B-Chat-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 14 --model-format awq --quantization ${quantization} Model Spec 22 (awq, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 32 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-32B-Chat-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 32 --model-format awq 
--quantization ${quantization} Model Spec 23 (awq, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 72 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-72B-Chat-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 72 --model-format awq --quantization ${quantization} Model Spec 24 (awq, 110 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 110 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-110B-Chat-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 110 --model-format awq --quantization ${quantization} Model Spec 25 (ggufv2, 0_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 0_5 - **Quantizations:** q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** Qwen/Qwen1.5-0.5B-Chat-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 0_5 --model-format ggufv2 --quantization ${quantization} Model Spec 26 (ggufv2, 1_8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in 
billions):** 1_8 - **Quantizations:** q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** Qwen/Qwen1.5-1.8B-Chat-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 1_8 --model-format ggufv2 --quantization ${quantization} Model Spec 27 (ggufv2, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 4 - **Quantizations:** q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** Qwen/Qwen1.5-4B-Chat-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 4 --model-format ggufv2 --quantization ${quantization} Model Spec 28 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - **Quantizations:** q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** Qwen/Qwen1.5-7B-Chat-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} Model Spec 29 (ggufv2, 14 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 14 - **Quantizations:** q2_k, q3_k_m, q4_0, q4_k_m, q5_0, 
q5_k_m, q6_k, q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** Qwen/Qwen1.5-14B-Chat-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 14 --model-format ggufv2 --quantization ${quantization} Model Spec 30 (ggufv2, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 32 - **Quantizations:** q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** Qwen/Qwen1.5-32B-Chat-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 32 --model-format ggufv2 --quantization ${quantization} Model Spec 31 (ggufv2, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 72 - **Quantizations:** q2_k, q3_k_m, q4_k_m - **Engines**: vLLM, llama.cpp - **Model ID:** Qwen/Qwen1.5-72B-Chat-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-chat --size-in-billions 72 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/qwen1.5-moe-chat.rst ================================================ .. 
_models_llm_qwen1.5-moe-chat: ======================================== qwen1.5-moe-chat ======================================== - **Context Length:** 32768 - **Model Name:** qwen1.5-moe-chat - **Languages:** en, zh - **Abilities:** chat, tools - **Description:** Qwen1.5-MoE is a transformer-based MoE decoder-only language model pretrained on a large amount of data. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 2_7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 2_7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-MoE-A2.7B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-moe-chat --size-in-billions 2_7 --model-format pytorch --quantization ${quantization} Model Spec 2 (gptq, 2_7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 2_7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen1.5-moe-chat --size-in-billions 2_7 --model-format gptq --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/qwen2-audio-instruct.rst ================================================ .. 
_models_llm_qwen2-audio-instruct: ======================================== qwen2-audio-instruct ======================================== - **Context Length:** 32768 - **Model Name:** qwen2-audio-instruct - **Languages:** en, zh - **Abilities:** chat, audio - **Description:** Qwen2-Audio is a large-scale audio-language model that accepts various audio signal inputs and either performs audio analysis or responds directly in text to speech instructions. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen2-Audio-7B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-audio-instruct --size-in-billions 7 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/qwen2-instruct.rst ================================================ .. 
_models_llm_qwen2-instruct: ======================================== qwen2-instruct ======================================== - **Context Length:** 32768 - **Model Name:** qwen2-instruct - **Languages:** en, zh - **Abilities:** chat, tools - **Description:** Qwen2 is the new series of Qwen large language models. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 0_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 0_5 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2-0.5B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 0_5 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 1_5 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2-1.5B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 1_5 --model-format pytorch --quantization ${quantization} Model Spec 3 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2-7B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the 
options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 4 (pytorch, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 72 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2-72B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 72 --model-format pytorch --quantization ${quantization} Model Spec 5 (gptq, 0_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 0_5 - **Quantizations:** Int4, Int8 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2-0.5B-Instruct-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 0_5 --model-format gptq --quantization ${quantization} Model Spec 6 (gptq, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 1_5 - **Quantizations:** Int4, Int8 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2-1.5B-Instruct-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 1_5 --model-format 
gptq --quantization ${quantization} Model Spec 7 (gptq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 7 - **Quantizations:** Int4, Int8 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2-7B-Instruct-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 7 --model-format gptq --quantization ${quantization} Model Spec 8 (gptq, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 72 - **Quantizations:** Int4, Int8 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2-72B-Instruct-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 72 --model-format gptq --quantization ${quantization} Model Spec 9 (awq, 0_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 0_5 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2-0.5B-Instruct-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 0_5 --model-format awq --quantization ${quantization} Model Spec 10 (awq, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in 
billions):** 1_5 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2-1.5B-Instruct-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 1_5 --model-format awq --quantization ${quantization} Model Spec 11 (awq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2-7B-Instruct-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 7 --model-format awq --quantization ${quantization} Model Spec 12 (awq, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 72 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2-72B-Instruct-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 72 --model-format awq --quantization ${quantization} Model Spec 13 (fp8, 0_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 0_5 - **Quantizations:** fp8 - **Engines**: vLLM, SGLang - **Model ID:** neuralmagic/Qwen2-0.5B-Instruct-FP8 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch 
the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 0_5 --model-format fp8 --quantization ${quantization} Model Spec 14 (fp8, 0_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 0_5 - **Quantizations:** fp8 - **Engines**: vLLM, SGLang - **Model ID:** neuralmagic/Qwen2-0.5B-Instruct-FP8 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 0_5 --model-format fp8 --quantization ${quantization} Model Spec 15 (fp8, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 1_5 - **Quantizations:** fp8 - **Engines**: vLLM, SGLang - **Model ID:** neuralmagic/Qwen2-1.5B-Instruct-FP8 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 1_5 --model-format fp8 --quantization ${quantization} Model Spec 16 (fp8, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 7 - **Quantizations:** fp8 - **Engines**: vLLM, SGLang - **Model ID:** neuralmagic/Qwen2-7B-Instruct-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 7 --model-format fp8 --quantization 
${quantization} Model Spec 17 (fp8, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 72 - **Quantizations:** fp8 - **Engines**: vLLM, SGLang - **Model ID:** neuralmagic/Qwen2-72B-Instruct-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 72 --model-format fp8 --quantization ${quantization} Model Spec 18 (mlx, 0_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 0_5 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** Qwen/Qwen2-0.5B-Instruct-MLX - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 0_5 --model-format mlx --quantization ${quantization} Model Spec 19 (mlx, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 1_5 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** Qwen/Qwen2-1.5B-Instruct-MLX - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 1_5 --model-format mlx --quantization ${quantization} Model Spec 20 (mlx, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 7 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** Qwen/Qwen2-7B-Instruct-MLX - **Model 
Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 7 --model-format mlx --quantization ${quantization} Model Spec 21 (mlx, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 72 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen2-72B-Instruct-4bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 72 --model-format mlx --quantization ${quantization} Model Spec 22 (ggufv2, 0_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 0_5 - **Quantizations:** q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0, fp16 - **Engines**: vLLM, llama.cpp - **Model ID:** Qwen/Qwen2-0.5B-Instruct-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 0_5 --model-format ggufv2 --quantization ${quantization} Model Spec 23 (ggufv2, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 1_5 - **Quantizations:** q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0, fp16 - **Engines**: vLLM, llama.cpp - **Model ID:** Qwen/Qwen2-1.5B-Instruct-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace 
``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 1_5 --model-format ggufv2 --quantization ${quantization} Model Spec 24 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - **Quantizations:** q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0, fp16 - **Engines**: vLLM, llama.cpp - **Model ID:** Qwen/Qwen2-7B-Instruct-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} Model Spec 25 (ggufv2, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 72 - **Quantizations:** q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0, fp16 - **Engines**: vLLM, llama.cpp - **Model ID:** Qwen/Qwen2-72B-Instruct-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-instruct --size-in-billions 72 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/qwen2-moe-instruct.rst ================================================ .. 
_models_llm_qwen2-moe-instruct: ======================================== qwen2-moe-instruct ======================================== - **Context Length:** 32768 - **Model Name:** qwen2-moe-instruct - **Languages:** en, zh - **Abilities:** chat, tools - **Description:** Qwen2 is the new series of Qwen large language models. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 14 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 14 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2-57B-A14B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-moe-instruct --size-in-billions 14 --model-format pytorch --quantization ${quantization} Model Spec 2 (gptq, 14 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 14 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-moe-instruct --size-in-billions 14 --model-format gptq --quantization ${quantization} Model Spec 3 (ggufv2, 14 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 14 - **Quantizations:** q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0, fp16 - **Engines**: vLLM, llama.cpp - **Model ID:** Qwen/Qwen2-57B-A14B-Instruct-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to 
replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-moe-instruct --size-in-billions 14 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/qwen2-vl-instruct.rst ================================================ .. _models_llm_qwen2-vl-instruct: ======================================== qwen2-vl-instruct ======================================== - **Context Length:** 32768 - **Model Name:** qwen2-vl-instruct - **Languages:** en, zh - **Abilities:** chat, vision - **Description:** Qwen2-VL: To See the World More Clearly. Qwen2-VL is the latest version of the vision language models in the Qwen model families. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 2 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen2-VL-2B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 2 --model-format pytorch --quantization ${quantization} Model Spec 2 (gptq, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 2 - **Quantizations:** Int8 - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 2 
--model-format gptq --quantization ${quantization} Model Spec 3 (gptq, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 2 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 2 --model-format gptq --quantization ${quantization} Model Spec 4 (awq, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 2 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen2-VL-2B-Instruct-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 2 --model-format awq --quantization ${quantization} Model Spec 5 (mlx, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 2 - **Quantizations:** 4bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen2-VL-2B-Instruct-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 2 --model-format mlx --quantization ${quantization} Model Spec 6 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - 
**Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen2-VL-7B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 7 (gptq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 7 - **Quantizations:** Int8 - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 7 --model-format gptq --quantization ${quantization} Model Spec 8 (gptq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 7 --model-format gptq --quantization ${quantization} Model Spec 9 (awq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen2-VL-7B-Instruct-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the 
model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 7 --model-format awq --quantization ${quantization} Model Spec 10 (pytorch, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 72 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen2-VL-72B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 72 --model-format pytorch --quantization ${quantization} Model Spec 11 (awq, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 72 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen2-VL-72B-Instruct-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 72 --model-format awq --quantization ${quantization} Model Spec 12 (gptq, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 72 - **Quantizations:** Int4, Int8 - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen2-VL-72B-Instruct-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name 
qwen2-vl-instruct --size-in-billions 72 --model-format gptq --quantization ${quantization} Model Spec 13 (mlx, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 72 - **Quantizations:** 4bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen2-VL-72B-Instruct-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 72 --model-format mlx --quantization ${quantization} Model Spec 14 (mlx, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 7 - **Quantizations:** 4bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen2-VL-7B-Instruct-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 7 --model-format mlx --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/qwen2.5-coder-instruct.rst ================================================ .. _models_llm_qwen2.5-coder-instruct: ======================================== qwen2.5-coder-instruct ======================================== - **Context Length:** 32768 - **Model Name:** qwen2.5-coder-instruct - **Languages:** en, zh - **Abilities:** chat, tools - **Description:** Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). 
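Every launch command in the specifications that follow has the same shape: only the engine, size, format, and quantization change from one spec to the next. As a minimal sketch, the substitution can be done programmatically. The helper name ``build_launch_command`` is hypothetical and not part of Xinference; the example values come from the pytorch, 7 Billion spec of this model. The engine is spelled as it appears in the **Engines** list, and whether the CLI expects exactly that spelling is an assumption to verify against your installation.

```python
import shlex


def build_launch_command(engine, size_in_billions, model_format, quantization,
                         model_name="qwen2.5-coder-instruct"):
    """Return the argv list for one model spec (hypothetical helper).

    Only flags that appear in the documented launch command are used.
    """
    return [
        "xinference", "launch",
        "--model-engine", engine,
        "--model-name", model_name,
        # Sizes such as 0_5 are passed through as written in the spec.
        "--size-in-billions", str(size_in_billions),
        "--model-format", model_format,
        "--quantization", quantization,
    ]


# Values taken from the pytorch, 7 Billion spec; the engine spelling is an assumption.
argv = build_launch_command("Transformers", 7, "pytorch", "none")

# Print a shell-ready command line for inspection before running it.
print(shlex.join(argv))
```

Building an argument list rather than a single string keeps each value a separate token, so the same list could be handed to ``subprocess.run`` once the command looks right.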
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 0_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 0_5 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-Coder-0.5B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 0_5 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 1_5 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-Coder-1.5B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 1_5 --model-format pytorch --quantization ${quantization} Model Spec 3 (pytorch, 3 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 3 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-Coder-3B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 3 --model-format pytorch --quantization ${quantization} Model Spec 4 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** 
pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-Coder-7B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 5 (pytorch, 14 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 14 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-Coder-14B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 14 --model-format pytorch --quantization ${quantization} Model Spec 6 (pytorch, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 32 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-Coder-32B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 32 --model-format pytorch --quantization ${quantization} Model Spec 7 (gptq, 0_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 0_5 - **Quantizations:** Int4, Int8 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** 
Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 0_5 --model-format gptq --quantization ${quantization} Model Spec 8 (gptq, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 1_5 - **Quantizations:** Int4, Int8 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 1_5 --model-format gptq --quantization ${quantization} Model Spec 9 (awq, 0_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 0_5 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-Coder-0.5B-Instruct-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 0_5 --model-format awq --quantization ${quantization} Model Spec 10 (awq, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 1_5 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute 
the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 1_5 --model-format awq --quantization ${quantization} Model Spec 11 (ggufv2, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 1_5 - **Quantizations:** q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 1_5 --model-format ggufv2 --quantization ${quantization} Model Spec 12 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - **Quantizations:** q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** Qwen/Qwen2.5-Coder-7B-Instruct-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} Model Spec 13 (gptq, 3 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 3 - **Quantizations:** Int4, Int8 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember 
to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 3 --model-format gptq --quantization ${quantization}

Model Spec 14 (gptq, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 7
- **Quantizations:** Int4, Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-{quantization}
- **Model Hubs**: `Hugging Face `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 7 --model-format gptq --quantization ${quantization}

Model Spec 15 (gptq, 14 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 14
- **Quantizations:** Int4, Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-{quantization}
- **Model Hubs**: `Hugging Face `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 14 --model-format gptq --quantization ${quantization}

Model Spec 16 (gptq, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 32
- **Quantizations:** Int4, Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-{quantization}
- **Model Hubs**: `Hugging Face `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 32 --model-format gptq --quantization ${quantization}

Model Spec 17 (awq, 3 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 3
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-Coder-3B-Instruct-AWQ
- **Model Hubs**: `Hugging Face `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 3 --model-format awq --quantization ${quantization}

Model Spec 18 (awq, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 7
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-Coder-7B-Instruct-AWQ
- **Model Hubs**: `Hugging Face `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 7 --model-format awq --quantization ${quantization}

Model Spec 19 (awq, 14 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 14
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-Coder-14B-Instruct-AWQ
- **Model Hubs**: `Hugging Face `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 14 --model-format awq --quantization ${quantization}

Model Spec 20 (awq, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 32
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-Coder-32B-Instruct-AWQ
- **Model Hubs**: `Hugging Face `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 32 --model-format awq --quantization ${quantization}

Model Spec 21 (gptq, 3 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 3
- **Quantizations:** Int4, Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-{quantization}
- **Model Hubs**: `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 3 --model-format gptq --quantization ${quantization}

Model Spec 22 (gptq, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 7
- **Quantizations:** Int4, Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-{quantization}
- **Model Hubs**: `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 7 --model-format gptq --quantization ${quantization}

Model Spec 23 (gptq, 14 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 14
- **Quantizations:** Int4, Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-{quantization}
- **Model Hubs**: `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 14 --model-format gptq --quantization ${quantization}

Model Spec 24 (gptq, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 32
- **Quantizations:** Int4, Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-{quantization}
- **Model Hubs**: `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 32 --model-format gptq --quantization ${quantization}

Model Spec 25 (awq, 3 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 3
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** qwen/Qwen2.5-Coder-3B-Instruct-AWQ
- **Model Hubs**: `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 3 --model-format awq --quantization ${quantization}

Model Spec 26 (awq, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 7
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** qwen/Qwen2.5-Coder-7B-Instruct-AWQ
- **Model Hubs**: `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 7 --model-format awq --quantization ${quantization}

Model Spec 27 (awq, 14 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 14
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** qwen/Qwen2.5-Coder-14B-Instruct-AWQ
- **Model Hubs**: `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 14 --model-format awq --quantization ${quantization}

Model Spec 28 (awq, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 32
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** qwen/Qwen2.5-Coder-32B-Instruct-AWQ
- **Model Hubs**: `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder-instruct --size-in-billions 32 --model-format awq --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/qwen2.5-coder.rst
================================================
.. _models_llm_qwen2.5-coder:

========================================
qwen2.5-coder
========================================

- **Context Length:** 32768
- **Model Name:** qwen2.5-coder
- **Languages:** en, zh
- **Abilities:** generate
- **Description:** Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen).
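
Every spec entry in the Specifications section uses the same ``xinference launch`` flags and varies only the values. As a minimal sketch, the pattern can be reproduced from a spec's fields. The helper ``build_launch_command`` is hypothetical and not part of Xinference; it only assembles a command string and does not run it:

```python
import shlex


def build_launch_command(model_name, size_in_billions, model_format, engine, quantization):
    """Assemble the `xinference launch` command string used in the spec entries.

    Hypothetical helper for illustration; values are passed through unchanged.
    """
    args = [
        "xinference", "launch",
        "--model-engine", engine,
        "--model-name", model_name,
        # Sizes are kept as strings because the docs write fractional sizes as 0_5, 1_5.
        "--size-in-billions", str(size_in_billions),
        "--model-format", model_format,
        "--quantization", quantization,
    ]
    # shlex.join quotes any argument that would need quoting in a POSIX shell.
    return shlex.join(args)


if __name__ == "__main__":
    print(build_launch_command("qwen2.5-coder", "7", "pytorch", "Transformers", "none"))
```

The engine value is taken verbatim from the Engines list of a spec; which spellings the CLI accepts is not confirmed here, so treat the value as an assumption to check against your installation.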
Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 0_5 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 0_5
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-Coder-0.5B
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder --size-in-billions 0_5 --model-format pytorch --quantization ${quantization}

Model Spec 2 (pytorch, 1_5 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 1_5
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-Coder-1.5B
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder --size-in-billions 1_5 --model-format pytorch --quantization ${quantization}

Model Spec 3 (pytorch, 3 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 3
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-Coder-3B
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder --size-in-billions 3 --model-format pytorch --quantization ${quantization}

Model Spec 4 (pytorch, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 7
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-Coder-7B
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder --size-in-billions 7 --model-format pytorch --quantization ${quantization}

Model Spec 5 (pytorch, 14 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 14
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-Coder-14B
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder --size-in-billions 14 --model-format pytorch --quantization ${quantization}

Model Spec 6 (pytorch, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 32
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-Coder-32B
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-coder --size-in-billions 32 --model-format pytorch --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/qwen2.5-instruct-1m.rst
================================================
.. _models_llm_qwen2.5-instruct-1m:

========================================
qwen2.5-instruct-1m
========================================

- **Context Length:** 1010000
- **Model Name:** qwen2.5-instruct-1m
- **Languages:** en, zh
- **Abilities:** chat
- **Description:** Qwen2.5-1M is the long-context version of the Qwen2.5 series models, supporting a context length of up to 1M tokens.

Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 7
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-7B-Instruct-1M
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct-1m --size-in-billions 7 --model-format pytorch --quantization ${quantization}

Model Spec 2 (pytorch, 14 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 14
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-14B-Instruct-1M
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct-1m --size-in-billions 14 --model-format pytorch --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/qwen2.5-instruct.rst
================================================
.. _models_llm_qwen2.5-instruct:

========================================
qwen2.5-instruct
========================================

- **Context Length:** 32768
- **Model Name:** qwen2.5-instruct
- **Languages:** en, zh
- **Abilities:** chat, tools
- **Description:** Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters.

Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 0_5 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 0_5
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-0.5B-Instruct
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 0_5 --model-format pytorch --quantization ${quantization}

Model Spec 2 (pytorch, 1_5 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 1_5
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-1.5B-Instruct
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 1_5 --model-format pytorch --quantization ${quantization}

Model Spec 3 (pytorch, 3 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 3
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-3B-Instruct
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 3 --model-format pytorch --quantization ${quantization}

Model Spec 4 (pytorch, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 7
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-7B-Instruct
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 7 --model-format pytorch --quantization ${quantization}

Model Spec 5 (pytorch, 14 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 14
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-14B-Instruct
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 14 --model-format pytorch --quantization ${quantization}

Model Spec 6 (pytorch, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 32
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-32B-Instruct
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 32 --model-format pytorch --quantization ${quantization}

Model Spec 7 (pytorch, 72 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 72
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-72B-Instruct
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 72 --model-format pytorch --quantization ${quantization}

Model Spec 8 (gptq, 0_5 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 0_5
- **Quantizations:** Int4, Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-0.5B-Instruct-GPTQ-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 0_5 --model-format gptq --quantization ${quantization}

Model Spec 9 (gptq, 1_5 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 1_5
- **Quantizations:** Int4, Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-1.5B-Instruct-GPTQ-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 1_5 --model-format gptq --quantization ${quantization}

Model Spec 10 (gptq, 3 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 3
- **Quantizations:** Int4, Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-3B-Instruct-GPTQ-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 3 --model-format gptq --quantization ${quantization}

Model Spec 11 (gptq, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 7
- **Quantizations:** Int4, Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-7B-Instruct-GPTQ-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 7 --model-format gptq --quantization ${quantization}

Model Spec 12 (gptq, 14 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 14
- **Quantizations:** Int4, Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-14B-Instruct-GPTQ-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 14 --model-format gptq --quantization ${quantization}

Model Spec 13 (gptq, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 32
- **Quantizations:** Int4, Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-32B-Instruct-GPTQ-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 32 --model-format gptq --quantization ${quantization}

Model Spec 14 (gptq, 72 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 72
- **Quantizations:** Int4, Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-72B-Instruct-GPTQ-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 72 --model-format gptq --quantization ${quantization}

Model Spec 15 (awq, 0_5 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 0_5
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-0.5B-Instruct-AWQ
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 0_5 --model-format awq --quantization ${quantization}

Model Spec 16 (awq, 1_5 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 1_5
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-1.5B-Instruct-AWQ
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 1_5 --model-format awq --quantization ${quantization}

Model Spec 17 (awq, 3 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 3
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-3B-Instruct-AWQ
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 3 --model-format awq --quantization ${quantization}

Model Spec 18 (awq, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 7
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-7B-Instruct-AWQ
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 7 --model-format awq --quantization ${quantization}

Model Spec 19 (awq, 14 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 14
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-14B-Instruct-AWQ
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 14 --model-format awq --quantization ${quantization}

Model Spec 20 (awq, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 32
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-32B-Instruct-AWQ
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 32 --model-format awq --quantization ${quantization}

Model Spec 21 (awq, 72 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 72
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen2.5-72B-Instruct-AWQ
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 72 --model-format awq --quantization ${quantization}

Model Spec 22 (ggufv2, 0_5 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 0_5
- **Quantizations:** q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0
- **Engines**: vLLM, llama.cpp
- **Model ID:** Qwen/Qwen2.5-0.5B-Instruct-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 0_5 --model-format ggufv2 --quantization ${quantization}

Model Spec 23 (ggufv2, 1_5 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 1_5
- **Quantizations:** q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0
- **Engines**: vLLM, llama.cpp
- **Model ID:** Qwen/Qwen2.5-1.5B-Instruct-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 1_5 --model-format ggufv2 --quantization ${quantization}

Model Spec 24 (ggufv2, 3 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 3
- **Quantizations:** q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0
- **Engines**: vLLM, llama.cpp
- **Model ID:** Qwen/Qwen2.5-3B-Instruct-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 3 --model-format ggufv2 --quantization ${quantization}

Model Spec 25 (ggufv2, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 7
- **Quantizations:** q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0
- **Engines**: vLLM, llama.cpp
- **Model ID:** Qwen/Qwen2.5-7B-Instruct-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 7 --model-format ggufv2 --quantization ${quantization}

Model Spec 26 (ggufv2, 14 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 14
- **Quantizations:** q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0
- **Engines**: vLLM, llama.cpp
- **Model ID:** Qwen/Qwen2.5-14B-Instruct-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 14 --model-format ggufv2 --quantization ${quantization}

Model Spec 27 (ggufv2, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 32
- **Quantizations:** q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0
- **Engines**: vLLM, llama.cpp
- **Model ID:** Qwen/Qwen2.5-32B-Instruct-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 32 --model-format ggufv2 --quantization ${quantization}

Model Spec 28 (mlx, 3 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 3
- **Quantizations:** 4bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen2.5-3B-Instruct-4bit
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 3 --model-format mlx --quantization ${quantization}

Model Spec 29 (mlx, 3 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 3
- **Quantizations:** 8bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen2.5-3B-Instruct-8bit
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 3 --model-format mlx --quantization ${quantization}

Model Spec 30 (mlx, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 7
- **Quantizations:** 4bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen2.5-7B-Instruct-4bit
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 7 --model-format mlx --quantization ${quantization}

Model Spec 31 (mlx, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 7
- **Quantizations:** 8bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen2.5-7B-Instruct-8bit
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 7 --model-format mlx --quantization ${quantization}

Model Spec 32 (mlx, 14 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 14
- **Quantizations:** 4bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen2.5-14B-Instruct-4bit
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 14 --model-format mlx --quantization ${quantization}

Model Spec 33 (mlx, 14 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 14
- **Quantizations:** 8bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen2.5-14B-Instruct-8bit
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 14 --model-format mlx --quantization ${quantization}

Model Spec 34 (mlx, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 32
- **Quantizations:** 4bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen2.5-32B-Instruct-4bit
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 32 --model-format mlx --quantization ${quantization}

Model Spec 35 (mlx, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 32
- **Quantizations:** 8bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen2.5-32B-Instruct-8bit
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 32 --model-format mlx --quantization ${quantization}

Model Spec 36 (mlx, 72 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 72
- **Quantizations:** 4bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen2.5-72B-Instruct-4bit
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 72 --model-format mlx --quantization ${quantization}

Model Spec 37 (mlx, 72 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 72
- **Quantizations:** 8bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen2.5-72B-Instruct-8bit
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 72 --model-format mlx --quantization ${quantization}

Model Spec 38 (ggufv2, 72 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 72
- **Quantizations:** q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, q8_0, fp16
- **Engines**: vLLM, llama.cpp
- **Model ID:** Qwen/Qwen2.5-72B-Instruct-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 72 --model-format ggufv2 --quantization ${quantization}

Model Spec 39 (mlx, 0_5 Billion)
++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 0_5 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen2.5-0.5B-Instruct-4bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 0_5 --model-format mlx --quantization ${quantization} Model Spec 40 (mlx, 0_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 0_5 - **Quantizations:** 8bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen2.5-0.5B-Instruct-8bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 0_5 --model-format mlx --quantization ${quantization} Model Spec 41 (mlx, 0_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 0_5 - **Quantizations:** none - **Engines**: MLX - **Model ID:** mlx-community/Qwen2.5-0.5B-Instruct-bf16 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 0_5 --model-format mlx --quantization ${quantization} Model Spec 42 (mlx, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 1_5 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen2.5-1.5B-Instruct-4bit - **Model Hubs**: `Hugging Face `__ Execute the following 
command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 1_5 --model-format mlx --quantization ${quantization} Model Spec 43 (mlx, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 1_5 - **Quantizations:** 8bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen2.5-1.5B-Instruct-8bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 1_5 --model-format mlx --quantization ${quantization} Model Spec 44 (mlx, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 1_5 - **Quantizations:** none - **Engines**: MLX - **Model ID:** mlx-community/Qwen2.5-1.5B-Instruct-bf16 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 1_5 --model-format mlx --quantization ${quantization} Model Spec 45 (mlx, 3 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 3 - **Quantizations:** none - **Engines**: MLX - **Model ID:** mlx-community/Qwen2.5-3B-Instruct-bf16 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 3 --model-format mlx --quantization 
${quantization} Model Spec 46 (mlx, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: MLX - **Model ID:** mlx-community/Qwen2.5-7B-Instruct-bf16 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 7 --model-format mlx --quantization ${quantization} Model Spec 47 (mlx, 14 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 14 - **Quantizations:** none - **Engines**: MLX - **Model ID:** mlx-community/Qwen2.5-14B-Instruct-bf16 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 14 --model-format mlx --quantization ${quantization} Model Spec 48 (mlx, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 32 - **Quantizations:** none - **Engines**: MLX - **Model ID:** mlx-community/Qwen2.5-32B-Instruct-bf16 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 32 --model-format mlx --quantization ${quantization} Model Spec 49 (mlx, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 72 - **Quantizations:** none - **Engines**: MLX - **Model ID:** mlx-community/Qwen2.5-72B-Instruct-bf16 - **Model 
Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-instruct --size-in-billions 72 --model-format mlx --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/qwen2.5-omni.rst ================================================ .. _models_llm_qwen2.5-omni: ======================================== qwen2.5-omni ======================================== - **Context Length:** 32768 - **Model Name:** qwen2.5-omni - **Languages:** en, zh - **Abilities:** chat, vision, audio, omni - **Description:** Qwen2.5-Omni: the new flagship end-to-end multimodal model in the Qwen series. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 3 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 3 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen2.5-Omni-3B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-omni --size-in-billions 3 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen2.5-Omni-7B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-omni 
--size-in-billions 7 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/qwen2.5-vl-instruct.rst ================================================ .. _models_llm_qwen2.5-vl-instruct: ======================================== qwen2.5-vl-instruct ======================================== - **Context Length:** 128000 - **Model Name:** qwen2.5-vl-instruct - **Languages:** en, zh - **Abilities:** chat, vision - **Description:** Qwen2.5-VL: Qwen2.5-VL is the latest version of the vision language models in the Qwen model families. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 3 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 3 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-VL-3B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-vl-instruct --size-in-billions 3 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-VL-7B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-vl-instruct --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 3 (pytorch, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in
billions):** 32 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-VL-32B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-vl-instruct --size-in-billions 32 --model-format pytorch --quantization ${quantization} Model Spec 4 (pytorch, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 72 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-VL-72B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-vl-instruct --size-in-billions 72 --model-format pytorch --quantization ${quantization} Model Spec 5 (awq, 3 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 3 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-VL-3B-Instruct-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-vl-instruct --size-in-billions 3 --model-format awq --quantization ${quantization} Model Spec 6 (awq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-VL-7B-Instruct-AWQ - **Model Hubs**: `Hugging Face `__, 
`ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-vl-instruct --size-in-billions 7 --model-format awq --quantization ${quantization} Model Spec 7 (awq, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 32 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-VL-32B-Instruct-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-vl-instruct --size-in-billions 32 --model-format awq --quantization ${quantization} Model Spec 8 (awq, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 72 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-VL-72B-Instruct-AWQ - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-vl-instruct --size-in-billions 72 --model-format awq --quantization ${quantization} Model Spec 9 (mlx, 3 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 3 - **Quantizations:** 3bit, 4bit, 6bit, 8bit, bf16 - **Engines**: MLX - **Model ID:** mlx-community/Qwen2.5-VL-3B-Instruct-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed 
above:: xinference launch --model-engine ${engine} --model-name qwen2.5-vl-instruct --size-in-billions 3 --model-format mlx --quantization ${quantization} Model Spec 10 (mlx, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 7 - **Quantizations:** 3bit, 4bit, 6bit, 8bit, bf16 - **Engines**: MLX - **Model ID:** mlx-community/Qwen2.5-VL-7B-Instruct-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-vl-instruct --size-in-billions 7 --model-format mlx --quantization ${quantization} Model Spec 11 (mlx, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 32 - **Quantizations:** 4bit, 6bit, 8bit, bf16 - **Engines**: MLX - **Model ID:** mlx-community/Qwen2.5-VL-32B-Instruct-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-vl-instruct --size-in-billions 32 --model-format mlx --quantization ${quantization} Model Spec 12 (mlx, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 72 - **Quantizations:** 3bit, 4bit, 6bit, 8bit, bf16 - **Engines**: MLX - **Model ID:** mlx-community/Qwen2.5-VL-72B-Instruct-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5-vl-instruct --size-in-billions 72 
--model-format mlx --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/qwen2.5.rst ================================================ .. _models_llm_qwen2.5: ======================================== qwen2.5 ======================================== - **Context Length:** 32768 - **Model Name:** qwen2.5 - **Languages:** en, zh - **Abilities:** generate - **Description:** Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 0_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 0_5 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-0.5B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5 --size-in-billions 0_5 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 1_5 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-1.5B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5 --size-in-billions 1_5 --model-format pytorch --quantization ${quantization} Model Spec 3 (pytorch, 3 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 3 - 
**Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-3B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5 --size-in-billions 3 --model-format pytorch --quantization ${quantization} Model Spec 4 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-7B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5 --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 5 (pytorch, 14 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 14 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-14B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5 --size-in-billions 14 --model-format pytorch --quantization ${quantization} Model Spec 6 (pytorch, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 32 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-32B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace 
``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5 --size-in-billions 32 --model-format pytorch --quantization ${quantization} Model Spec 7 (pytorch, 72 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 72 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen2.5-72B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen2.5 --size-in-billions 72 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/qwen3-coder.rst ================================================ .. _models_llm_qwen3-coder: ======================================== Qwen3-Coder ======================================== - **Context Length:** 262144 - **Model Name:** Qwen3-Coder - **Languages:** en, zh - **Abilities:** chat, tools - **Description:** We're announcing Qwen3-Coder, our most agentic code model to date. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 480 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 480 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3-Coder-480B-A35B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Coder --size-in-billions 480 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 30 Billion)
++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 30 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3-Coder-30B-A3B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Coder --size-in-billions 30 --model-format pytorch --quantization ${quantization} Model Spec 3 (fp8, 480 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 480 - **Quantizations:** fp8 - **Engines**: vLLM - **Model ID:** Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Coder --size-in-billions 480 --model-format fp8 --quantization ${quantization} Model Spec 4 (fp8, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 30 - **Quantizations:** fp8 - **Engines**: vLLM - **Model ID:** Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Coder --size-in-billions 30 --model-format fp8 --quantization ${quantization} Model Spec 5 (gptq, 480 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 480 - **Quantizations:** Int4-Int8Mix - **Engines**: vLLM, Transformers - **Model ID:** 
QuantTrio/Qwen3-Coder-480B-A35B-Instruct-GPTQ-Int4-Int8Mix - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Coder --size-in-billions 480 --model-format gptq --quantization ${quantization} Model Spec 6 (gptq, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 30 - **Quantizations:** Int8 - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/Qwen3-Coder-30B-A3B-Instruct-GPTQ-Int8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Coder --size-in-billions 30 --model-format gptq --quantization ${quantization} Model Spec 7 (awq, 480 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 480 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/Qwen3-Coder-480B-A35B-Instruct-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Coder --size-in-billions 480 --model-format awq --quantization ${quantization} Model Spec 8 (awq, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 30 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to 
replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Coder --size-in-billions 30 --model-format awq --quantization ${quantization} Model Spec 9 (mlx, 480 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 480 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-Coder-480B-A35B-Instruct-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Coder --size-in-billions 480 --model-format mlx --quantization ${quantization} Model Spec 10 (mlx, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 30 - **Quantizations:** 3bit, 4bit, 5bit, 6bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-Coder-30B-A3B-Instruct-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Coder --size-in-billions 30 --model-format mlx --quantization ${quantization} Model Spec 11 (ggufv2, 480 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 480 - **Quantizations:** BF16, IQ4_NL, IQ4_XS, Q2_K, Q2_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K, Q8_0, UD-IQ3_XXS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL - **Engines**: vLLM, llama.cpp - **Model ID:** unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command 
to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Coder --size-in-billions 480 --model-format ggufv2 --quantization ${quantization}

Model Spec 12 (ggufv2, 30 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 30
- **Quantizations:** BF16, IQ4_NL, IQ4_XS, Q2_K, Q2_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K, Q8_0, UD-IQ1_M, UD-IQ1_S, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_XXS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL, UD-TQ1_0
- **Engines**: vLLM, llama.cpp
- **Model ID:** unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Coder --size-in-billions 30 --model-format ggufv2 --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/qwen3-instruct.rst
================================================
.. _models_llm_qwen3-instruct:

========================================
Qwen3-Instruct
========================================

- **Context Length:** 262144
- **Model Name:** Qwen3-Instruct
- **Languages:** en, zh
- **Abilities:** chat, tools
- **Description:** We introduce the updated version of the Qwen3-235B-A22B non-thinking mode, named Qwen3-235B-A22B-Instruct-2507

Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 235 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 235
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** Qwen/Qwen3-235B-A22B-Instruct-2507
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Instruct --size-in-billions 235 --model-format pytorch --quantization ${quantization}

Model Spec 2 (pytorch, 30 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 30
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** Qwen/Qwen3-30B-A3B-Instruct-2507
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Instruct --size-in-billions 30 --model-format pytorch --quantization ${quantization}

Model Spec 3 (pytorch, 4 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 4
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** Qwen/Qwen3-4B-Instruct-2507
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Instruct --size-in-billions 4 --model-format pytorch --quantization ${quantization}

Model Spec 4 (fp8, 235 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** fp8
- **Model Size (in billions):** 235
- **Quantizations:** fp8
- **Engines**: vLLM
- **Model ID:** Qwen/Qwen3-235B-A22B-Instruct-2507-FP8
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Instruct --size-in-billions 235 --model-format fp8 --quantization ${quantization}

Model Spec 5 (fp8, 30 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** fp8
- **Model Size (in billions):** 30
- **Quantizations:** fp8
- **Engines**: vLLM
- **Model ID:** Qwen/Qwen3-30B-A3B-Instruct-2507-FP8
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Instruct --size-in-billions 30 --model-format fp8 --quantization ${quantization}

Model Spec 6 (fp8, 4 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** fp8
- **Model Size (in billions):** 4
- **Quantizations:** none
- **Engines**: vLLM
- **Model ID:** Qwen/Qwen3-4B-Instruct-2507-FP8
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Instruct --size-in-billions 4 --model-format fp8 --quantization ${quantization}

Model Spec 7 (gptq, 235 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 235
- **Quantizations:** Int4-Int8Mix
- **Engines**: vLLM, Transformers
- **Model ID:** QuantTrio/Qwen3-235B-A22B-Instruct-2507-GPTQ-Int4-Int8Mix
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Instruct --size-in-billions 235 --model-format gptq --quantization ${quantization}

Model Spec 8 (gptq, 30 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 30
- **Quantizations:** Int8
- **Engines**: vLLM, Transformers
- **Model ID:** QuantTrio/Qwen3-30B-A3B-Instruct-2507-GPTQ-Int8
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Instruct --size-in-billions 30 --model-format gptq --quantization ${quantization}

Model Spec 9 (gptq, 4 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 4
- **Quantizations:** Int4, Int8
- **Engines**: vLLM, Transformers
- **Model ID:** JunHowie/Qwen3-4B-Instruct-2507-GPTQ-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Instruct --size-in-billions 4 --model-format gptq --quantization ${quantization}

Model Spec 10 (awq, 235 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 235
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers
- **Model ID:** QuantTrio/Qwen3-235B-A22B-Instruct-2507-AWQ
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Instruct --size-in-billions 235 --model-format awq --quantization ${quantization}

Model Spec 11 (awq, 30 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 30
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers
- **Model ID:** cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Instruct --size-in-billions 30 --model-format awq --quantization ${quantization}

Model Spec 12 (awq, 4 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 4
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers
- **Model ID:** Eslzzyl/Qwen3-4B-Instruct-2507-AWQ
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Instruct --size-in-billions 4 --model-format awq --quantization ${quantization}

Model Spec 13 (mlx, 235 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 235
- **Quantizations:** 3bit, 4bit, 5bit, 6bit, 8bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen3-235B-A22B-Instruct-2507-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Instruct --size-in-billions 235 --model-format mlx --quantization ${quantization}

Model Spec 14 (mlx, 30 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 30
- **Quantizations:** 4bit, 5bit, 6bit, 8bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen3-30B-A3B-Instruct-2507-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Instruct --size-in-billions 30 --model-format mlx --quantization ${quantization}

Model Spec 15 (mlx, 4 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 4
- **Quantizations:** 4bit, 5bit, 6bit, 8bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen3-4B-Instruct-2507-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Instruct --size-in-billions 4 --model-format mlx --quantization ${quantization}

Model Spec 16 (ggufv2, 235 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 235
- **Quantizations:** BF16, IQ4_XS, Q2_K, Q2_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K, Q8_0, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL
- **Engines**: vLLM, llama.cpp
- **Model ID:** unsloth/Qwen3-235B-A22B-Instruct-2507-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Instruct --size-in-billions 235 --model-format ggufv2 --quantization ${quantization}

Model Spec 17 (ggufv2, 30 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 30
- **Quantizations:** BF16, IQ4_NL, IQ4_XS, Q2_K, Q2_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K, Q8_0, UD-IQ1_M, UD-IQ1_S, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_XXS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL, UD-TQ1_0
- **Engines**: vLLM, llama.cpp
- **Model ID:** unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Instruct --size-in-billions 30 --model-format ggufv2 --quantization ${quantization}

Model Spec 18 (ggufv2, 4 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 4
- **Quantizations:** BF16, IQ4_NL, IQ4_XS, Q2_K, Q2_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K, Q8_0, UD-IQ1_M, UD-IQ1_S, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_XXS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL
- **Engines**: vLLM, llama.cpp
- **Model ID:** unsloth/Qwen3-4B-Instruct-2507-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Instruct --size-in-billions 4 --model-format ggufv2 --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/qwen3-next-instruct.rst
================================================
.. _models_llm_qwen3-next-instruct:

========================================
Qwen3-Next-Instruct
========================================

- **Context Length:** 262144
- **Model Name:** Qwen3-Next-Instruct
- **Languages:** en, zh
- **Abilities:** chat, tools
- **Description:** Qwen3-Next-80B-A3B is the first installment in the Qwen3-Next series

Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 80 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 80
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** Qwen/Qwen3-Next-80B-A3B-Instruct
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Next-Instruct --size-in-billions 80 --model-format pytorch --quantization ${quantization}

Model Spec 2 (fp8, 80 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** fp8
- **Model Size (in billions):** 80
- **Quantizations:** fp8
- **Engines**: vLLM
- **Model ID:** Qwen/Qwen3-Next-80B-A3B-Instruct-FP8
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Next-Instruct --size-in-billions 80 --model-format fp8 --quantization ${quantization}

Model Spec 3 (awq, 80 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 80
- **Quantizations:** 4bit, 8bit
- **Engines**: vLLM, Transformers
- **Model ID:** cpatonn/Qwen3-Next-80B-A3B-Instruct-AWQ-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Next-Instruct --size-in-billions 80 --model-format awq --quantization ${quantization}

Model Spec 4 (mlx, 80 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 80
- **Quantizations:** 4bit, 5bit, 6bit, 8bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen3-Next-80B-A3B-Instruct-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Next-Instruct --size-in-billions 80 --model-format mlx --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/qwen3-next-thinking.rst
================================================
.. _models_llm_qwen3-next-thinking:

========================================
Qwen3-Next-Thinking
========================================

- **Context Length:** 262144
- **Model Name:** Qwen3-Next-Thinking
- **Languages:** en, zh
- **Abilities:** chat, reasoning, tools
- **Description:** Qwen3-Next-80B-A3B is the first installment in the Qwen3-Next series

Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 80 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 80
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** Qwen/Qwen3-Next-80B-A3B-Thinking
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Next-Thinking --size-in-billions 80 --model-format pytorch --quantization ${quantization}

Model Spec 2 (fp8, 80 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** fp8
- **Model Size (in billions):** 80
- **Quantizations:** fp8
- **Engines**: vLLM
- **Model ID:** Qwen/Qwen3-Next-80B-A3B-Thinking-FP8
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Next-Thinking --size-in-billions 80 --model-format fp8 --quantization ${quantization}

Model Spec 3 (awq, 80 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 80
- **Quantizations:** 4bit, 8bit
- **Engines**: vLLM, Transformers
- **Model ID:** cpatonn/Qwen3-Next-80B-A3B-Thinking-AWQ-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Next-Thinking --size-in-billions 80 --model-format awq --quantization ${quantization}

Model Spec 4 (mlx, 80 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 80
- **Quantizations:** 4bit, 5bit, 6bit, 8bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen3-Next-80B-A3B-Thinking-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Next-Thinking --size-in-billions 80 --model-format mlx --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/qwen3-omni-instruct.rst
================================================
.. _models_llm_qwen3-omni-instruct:

========================================
Qwen3-Omni-Instruct
========================================

- **Context Length:** 262144
- **Model Name:** Qwen3-Omni-Instruct
- **Languages:** en, zh
- **Abilities:** chat, vision, audio, omni, tools
- **Description:** Qwen3-Omni is a natively end-to-end multilingual omni-modal foundation model. It processes text, images, audio, and video, and delivers real-time streaming responses in both text and natural speech. We introduce several architectural upgrades to improve performance and efficiency.
Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 30 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 30
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** Qwen/Qwen3-Omni-30B-A3B-Instruct
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Omni-Instruct --size-in-billions 30 --model-format pytorch --quantization ${quantization}

Model Spec 2 (awq, 30 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 30
- **Quantizations:** 4bit, 8bit
- **Engines**: vLLM, Transformers
- **Model ID:** cpatonn/Qwen3-Omni-30B-A3B-Instruct-AWQ-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Omni-Instruct --size-in-billions 30 --model-format awq --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/qwen3-omni-thinking.rst
================================================
.. _models_llm_qwen3-omni-thinking:

========================================
Qwen3-Omni-Thinking
========================================

- **Context Length:** 262144
- **Model Name:** Qwen3-Omni-Thinking
- **Languages:** en, zh
- **Abilities:** chat, vision, audio, omni, reasoning, tools
- **Description:** Qwen3-Omni is a natively end-to-end multilingual omni-modal foundation model. It processes text, images, audio, and video, and delivers real-time streaming responses in both text and natural speech. We introduce several architectural upgrades to improve performance and efficiency.

Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 30 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 30
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** Qwen/Qwen3-Omni-30B-A3B-Thinking
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Omni-Thinking --size-in-billions 30 --model-format pytorch --quantization ${quantization}

Model Spec 2 (awq, 30 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 30
- **Quantizations:** 4bit, 8bit
- **Engines**: vLLM, Transformers
- **Model ID:** cpatonn/Qwen3-Omni-30B-A3B-Thinking-AWQ-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Omni-Thinking --size-in-billions 30 --model-format awq --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/qwen3-thinking.rst
================================================
.. _models_llm_qwen3-thinking:

========================================
Qwen3-Thinking
========================================

- **Context Length:** 262144
- **Model Name:** Qwen3-Thinking
- **Languages:** en, zh
- **Abilities:** chat, reasoning, tools
- **Description:** We have continued to scale the thinking capability of Qwen3-235B-A22B, improving both the quality and depth of reasoning

Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 235 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 235
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** Qwen/Qwen3-235B-A22B-Thinking-2507
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Thinking --size-in-billions 235 --model-format pytorch --quantization ${quantization}

Model Spec 2 (pytorch, 30 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 30
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** Qwen/Qwen3-30B-A3B-Thinking-2507
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Thinking --size-in-billions 30 --model-format pytorch --quantization ${quantization}

Model Spec 3 (pytorch, 4 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 4
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** Qwen/Qwen3-4B-Thinking-2507
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Thinking --size-in-billions 4 --model-format pytorch --quantization ${quantization}

Model Spec 4 (fp8, 235 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** fp8
- **Model Size (in billions):** 235
- **Quantizations:** fp8
- **Engines**: vLLM
- **Model ID:** Qwen/Qwen3-235B-A22B-Thinking-2507-FP8
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Thinking --size-in-billions 235 --model-format fp8 --quantization ${quantization}

Model Spec 5 (fp8, 30 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** fp8
- **Model Size (in billions):** 30
- **Quantizations:** fp8
- **Engines**: vLLM
- **Model ID:** Qwen/Qwen3-30B-A3B-Thinking-2507-FP8
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Thinking --size-in-billions 30 --model-format fp8 --quantization ${quantization}

Model Spec 6 (fp8, 4 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** fp8
- **Model Size (in billions):** 4
- **Quantizations:** none
- **Engines**: vLLM
- **Model ID:** Qwen/Qwen3-4B-Thinking-2507-FP8
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-Thinking --size-in-billions 4 --model-format fp8 --quantization ${quantization}
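
The launch commands on this page differ only in the values passed to the same five flags. As a rough sketch, and not part of Xinference itself, the Python helper below assembles that command line from a spec's fields. The function name ``build_launch_command`` and the optional ``allowed_quantizations`` check are illustrative assumptions; only flags that appear in the commands above are used.

```python
import shlex


def build_launch_command(engine, model_name, size_in_billions,
                         model_format, quantization,
                         allowed_quantizations=None):
    """Return the ``xinference launch`` command line as a single string.

    Hypothetical helper for illustration: it only formats the flags shown
    on this page and does not call Xinference.
    """
    # Optionally reject a quantization that is not listed for the spec.
    if allowed_quantizations is not None and quantization not in allowed_quantizations:
        raise ValueError(
            f"{quantization!r} is not one of {sorted(allowed_quantizations)}"
        )
    argv = [
        "xinference", "launch",
        "--model-engine", engine,
        "--model-name", model_name,
        "--size-in-billions", str(size_in_billions),
        "--model-format", model_format,
        "--quantization", quantization,
    ]
    # shlex.join quotes any argument that the shell would otherwise split.
    return shlex.join(argv)


if __name__ == "__main__":
    # Values taken from Model Spec 5 (fp8, 30 Billion) of Qwen3-Thinking.
    print(build_launch_command("vLLM", "Qwen3-Thinking", 30, "fp8", "fp8",
                               allowed_quantizations=["fp8"]))
```

Run as a script, this prints the Model Spec 5 command with ``${engine}`` and ``${quantization}`` filled in. Passing the spec's quantization list as ``allowed_quantizations`` catches a mistyped value before the command is ever executed.
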
Model Spec 7 (gptq, 235 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 235 - **Quantizations:** Int4-Int8Mix - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/Qwen3-235B-A22B-Thinking-2507-GPTQ-Int4-Int8Mix - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Thinking --size-in-billions 235 --model-format gptq --quantization ${quantization} Model Spec 8 (gptq, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 30 - **Quantizations:** Int8 - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/Qwen3-30B-A3B-Thinking-2507-GPTQ-Int8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Thinking --size-in-billions 30 --model-format gptq --quantization ${quantization} Model Spec 9 (gptq, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 4 - **Quantizations:** Int4, Int8 - **Engines**: vLLM, Transformers - **Model ID:** JunHowie/Qwen3-4B-Thinking-2507-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Thinking --size-in-billions 4 --model-format gptq --quantization ${quantization} Model Spec 10 (awq, 235 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 
235 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/Qwen3-235B-A22B-Thinking-2507-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Thinking --size-in-billions 235 --model-format awq --quantization ${quantization} Model Spec 11 (awq, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 30 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/Qwen3-30B-A3B-Thinking-2507-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Thinking --size-in-billions 30 --model-format awq --quantization ${quantization} Model Spec 12 (awq, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 4 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** Eslzzyl/Qwen3-4B-Thinking-2507-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Thinking --size-in-billions 4 --model-format awq --quantization ${quantization} Model Spec 13 (mlx, 235 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 235 - **Quantizations:** 3bit, 4bit, 5bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-235B-A22B-Thinking-2507-{quantization} - **Model Hubs**: `Hugging Face `__, 
`ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Thinking --size-in-billions 235 --model-format mlx --quantization ${quantization} Model Spec 14 (mlx, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 30 - **Quantizations:** 4bit, bf16 - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-30B-A3B-Thinking-2507-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Thinking --size-in-billions 30 --model-format mlx --quantization ${quantization} Model Spec 15 (mlx, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 4 - **Quantizations:** 4bit, 5bit, 6bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-4B-Thinking-2507-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Thinking --size-in-billions 4 --model-format mlx --quantization ${quantization} Model Spec 16 (ggufv2, 235 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 235 - **Quantizations:** BF16, IQ4_XS, Q2_K, Q2_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K, Q8_0, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL - **Engines**: vLLM, llama.cpp - **Model ID:** unsloth/Qwen3-235B-A22B-Thinking-2507-GGUF - **Model Hubs**: `Hugging 
Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Thinking --size-in-billions 235 --model-format ggufv2 --quantization ${quantization} Model Spec 17 (ggufv2, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 30 - **Quantizations:** BF16, IQ4_NL, IQ4_XS, Q2_K, Q2_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K, Q8_0, UD-IQ1_M, UD-IQ1_S, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_XXS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL, UD-TQ1_0 - **Engines**: vLLM, llama.cpp - **Model ID:** unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Thinking --size-in-billions 30 --model-format ggufv2 --quantization ${quantization} Model Spec 18 (ggufv2, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 4 - **Quantizations:** BF16, IQ4_NL, IQ4_XS, Q2_K, Q2_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K, Q8_0, UD-IQ1_M, UD-IQ1_S, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_XXS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL - **Engines**: vLLM, llama.cpp - **Model ID:** unsloth/Qwen3-4B-Thinking-2507-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-Thinking --size-in-billions 4 --model-format ggufv2 --quantization 
${quantization}

================================================
FILE: doc/source/models/builtin/llm/qwen3-vl-instruct.rst
================================================
.. _models_llm_qwen3-vl-instruct:

========================================
Qwen3-VL-Instruct
========================================

- **Context Length:** 262144
- **Model Name:** Qwen3-VL-Instruct
- **Languages:** en, zh
- **Abilities:** chat, vision, tools
- **Description:** Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.

Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 235 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 235
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** Qwen/Qwen3-VL-235B-A22B-Instruct
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 235 --model-format pytorch --quantization ${quantization}

Model Spec 2 (fp8, 235 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** fp8
- **Model Size (in billions):** 235
- **Quantizations:** fp8
- **Engines**: vLLM, Transformers
- **Model ID:** Qwen/Qwen3-VL-235B-A22B-Instruct-FP8
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 235 --model-format fp8 --quantization ${quantization}

Model Spec 3 (awq, 235 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 235
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers
- **Model ID:**
QuantTrio/Qwen3-VL-235B-A22B-Instruct-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 235 --model-format awq --quantization ${quantization} Model Spec 4 (pytorch, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 30 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3-VL-30B-A3B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 30 --model-format pytorch --quantization ${quantization} Model Spec 5 (fp8, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 30 - **Quantizations:** fp8 - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 30 --model-format fp8 --quantization ${quantization} Model Spec 6 (awq, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 30 - **Quantizations:** 4bit, 8bit - **Engines**: vLLM, Transformers - **Model ID:** cpatonn/Qwen3-VL-30B-A3B-Instruct-AWQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to 
replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 30 --model-format awq --quantization ${quantization} Model Spec 7 (pytorch, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 32 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3-VL-32B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 32 --model-format pytorch --quantization ${quantization} Model Spec 8 (fp8, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 32 - **Quantizations:** fp8 - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3-VL-32B-Instruct-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 32 --model-format fp8 --quantization ${quantization} Model Spec 9 (awq, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 32 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/Qwen3-VL-32B-Instruct-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 32 
--model-format awq --quantization ${quantization} Model Spec 10 (pytorch, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 8 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3-VL-8B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 8 --model-format pytorch --quantization ${quantization} Model Spec 11 (fp8, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 8 - **Quantizations:** fp8 - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3-VL-8B-Instruct-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 8 --model-format fp8 --quantization ${quantization} Model Spec 12 (awq, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 8 - **Quantizations:** 4bit, 8bit - **Engines**: vLLM, Transformers - **Model ID:** cpatonn/Qwen3-VL-8B-Instruct-AWQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 8 --model-format awq --quantization ${quantization} Model Spec 13 (pytorch, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 
4 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3-VL-4B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 4 --model-format pytorch --quantization ${quantization} Model Spec 14 (fp8, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 4 - **Quantizations:** fp8 - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3-VL-4B-Instruct-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 4 --model-format fp8 --quantization ${quantization} Model Spec 15 (awq, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 4 - **Quantizations:** 4bit, 8bit - **Engines**: vLLM, Transformers - **Model ID:** cpatonn/Qwen3-VL-4B-Instruct-AWQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 4 --model-format awq --quantization ${quantization} Model Spec 16 (pytorch, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 2 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3-VL-2B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command 
to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 2 --model-format pytorch --quantization ${quantization} Model Spec 17 (fp8, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 2 - **Quantizations:** fp8 - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3-VL-2B-Instruct-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 2 --model-format fp8 --quantization ${quantization} Model Spec 18 (mlx, 235 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 235 - **Quantizations:** 4bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-235B-A22B-Instruct-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 235 --model-format mlx --quantization ${quantization} Model Spec 19 (mlx, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 30 - **Quantizations:** 4bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-30B-A3B-Instruct-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} 
--model-name Qwen3-VL-Instruct --size-in-billions 30 --model-format mlx --quantization ${quantization} Model Spec 20 (mlx, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 32 - **Quantizations:** 4bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-32B-Instruct-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 32 --model-format mlx --quantization ${quantization} Model Spec 21 (mlx, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 8 - **Quantizations:** 4bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-8B-Instruct-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 8 --model-format mlx --quantization ${quantization} Model Spec 22 (mlx, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 2 - **Quantizations:** 4bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-2B-Instruct-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 2 --model-format mlx --quantization ${quantization} Model Spec 23 (mlx, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - 
**Model Format:** mlx - **Model Size (in billions):** 2 - **Quantizations:** bf16 - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-2B-Instruct-bf16 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 2 --model-format mlx --quantization ${quantization} Model Spec 24 (mlx, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 4 - **Quantizations:** 3bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-4B-Instruct-3bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 4 --model-format mlx --quantization ${quantization} Model Spec 25 (mlx, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 4 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-4B-Instruct-4bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 4 --model-format mlx --quantization ${quantization} Model Spec 26 (mlx, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 4 - **Quantizations:** 8bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-4B-Instruct-8bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following 
command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 4 --model-format mlx --quantization ${quantization} Model Spec 27 (mlx, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 4 - **Quantizations:** bf16 - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-4B-Instruct-bf16 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 4 --model-format mlx --quantization ${quantization} Model Spec 28 (mlx, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 8 - **Quantizations:** 3bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-8B-Instruct-3bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 8 --model-format mlx --quantization ${quantization} Model Spec 29 (mlx, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 8 - **Quantizations:** 5bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-8B-Instruct-5bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 8 
--model-format mlx --quantization ${quantization} Model Spec 30 (mlx, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 8 - **Quantizations:** bf16 - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-8B-Instruct-bf16 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 8 --model-format mlx --quantization ${quantization} Model Spec 31 (mlx, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 30 - **Quantizations:** 3bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-30B-A3B-Instruct-3bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 30 --model-format mlx --quantization ${quantization} Model Spec 32 (mlx, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 30 - **Quantizations:** bf16 - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-30B-A3B-Instruct-bf16 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 30 --model-format mlx --quantization ${quantization} Model Spec 33 (mlx, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 32 - **Quantizations:** 3bit - 
**Engines**: MLX
- **Model ID:** mlx-community/Qwen3-VL-32B-Instruct-3bit
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-VL-Instruct --size-in-billions 32 --model-format mlx --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/qwen3-vl-thinking.rst
================================================
.. _models_llm_qwen3-vl-thinking:

========================================
Qwen3-VL-Thinking
========================================

- **Context Length:** 262144
- **Model Name:** Qwen3-VL-Thinking
- **Languages:** en, zh
- **Abilities:** chat, vision, reasoning, tools
- **Description:** Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.

Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 235 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 235
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** Qwen/Qwen3-VL-235B-A22B-Thinking
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 235 --model-format pytorch --quantization ${quantization}

Model Spec 2 (fp8, 235 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** fp8
- **Model Size (in billions):** 235
- **Quantizations:** fp8
- **Engines**: vLLM, Transformers
- **Model ID:** Qwen/Qwen3-VL-235B-A22B-Thinking-FP8
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model, remember to replace ``${quantization}``
with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 235 --model-format fp8 --quantization ${quantization} Model Spec 3 (awq, 235 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 235 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/Qwen3-VL-235B-A22B-Thinking-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 235 --model-format awq --quantization ${quantization} Model Spec 4 (pytorch, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 30 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3-VL-30B-A3B-Thinking - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 30 --model-format pytorch --quantization ${quantization} Model Spec 5 (fp8, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 30 - **Quantizations:** fp8 - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3-VL-30B-A3B-Thinking-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 30 
--model-format fp8 --quantization ${quantization} Model Spec 6 (awq, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 30 - **Quantizations:** 4bit, 8bit - **Engines**: vLLM, Transformers - **Model ID:** cpatonn/Qwen3-VL-30B-A3B-Thinking-AWQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 30 --model-format awq --quantization ${quantization} Model Spec 7 (mlx, 235 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 235 - **Quantizations:** 4bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-235B-A22B-Thinking-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 235 --model-format mlx --quantization ${quantization} Model Spec 8 (mlx, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 30 - **Quantizations:** 4bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-30B-A3B-Thinking-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 30 --model-format mlx --quantization ${quantization} Model Spec 9 (mlx, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx 
- **Model Size (in billions):** 2 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-2B-Thinking-4bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 2 --model-format mlx --quantization ${quantization} Model Spec 10 (mlx, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 2 - **Quantizations:** 8bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-2B-Thinking-8bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 2 --model-format mlx --quantization ${quantization} Model Spec 11 (mlx, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 2 - **Quantizations:** bf16 - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-2B-Thinking-bf16 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 2 --model-format mlx --quantization ${quantization} Model Spec 12 (mlx, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 4 - **Quantizations:** 3bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-4B-Thinking-3bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the 
model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 4 --model-format mlx --quantization ${quantization} Model Spec 13 (mlx, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 4 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-4B-Thinking-4bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 4 --model-format mlx --quantization ${quantization} Model Spec 14 (mlx, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 4 - **Quantizations:** 8bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-4B-Thinking-8bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 4 --model-format mlx --quantization ${quantization} Model Spec 15 (mlx, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 4 - **Quantizations:** bf16 - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-4B-Thinking-bf16 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 4 --model-format mlx 
--quantization ${quantization} Model Spec 16 (mlx, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 8 - **Quantizations:** 3bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-8B-Thinking-3bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 8 --model-format mlx --quantization ${quantization} Model Spec 17 (mlx, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 8 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-8B-Thinking-4bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 8 --model-format mlx --quantization ${quantization} Model Spec 18 (mlx, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 8 - **Quantizations:** 5bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-8B-Thinking-5bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 8 --model-format mlx --quantization ${quantization} Model Spec 19 (mlx, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 8 - **Quantizations:** 8bit - **Engines**: MLX - **Model ID:** 
mlx-community/Qwen3-VL-8B-Thinking-8bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 8 --model-format mlx --quantization ${quantization} Model Spec 20 (mlx, 8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 8 - **Quantizations:** bf16 - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-8B-Thinking-bf16 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 8 --model-format mlx --quantization ${quantization} Model Spec 21 (mlx, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 30 - **Quantizations:** 3bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-30B-A3B-Thinking-3bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 30 --model-format mlx --quantization ${quantization} Model Spec 22 (mlx, 30 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 30 - **Quantizations:** bf16 - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-30B-A3B-Thinking-bf16 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization 
method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 30 --model-format mlx --quantization ${quantization} Model Spec 23 (mlx, 235 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 235 - **Quantizations:** 3bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-VL-235B-A22B-Thinking-3bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Qwen3-VL-Thinking --size-in-billions 235 --model-format mlx --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/qwen3.5.rst ================================================ .. _models_llm_qwen3.5: ======================================== qwen3.5 ======================================== - **Context Length:** 262144 - **Model Name:** qwen3.5 - **Languages:** en, zh - **Abilities:** chat, vision, tools, reasoning - **Description:** Over recent months, we have intensified our focus on developing foundation models that deliver exceptional utility and performance. 
Qwen3.5 represents a significant leap forward, integrating breakthroughs in multimodal learning, architectural efficiency, reinforcement learning scale, and global accessibility to empower developers and enterprises with unprecedented capability and efficiency. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 397 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 397 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3.5-397B-A17B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 397 --model-format pytorch --quantization ${quantization} Model Spec 2 (fp8, 397 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 397 - **Quantizations:** FP8 - **Engines**: vLLM - **Model ID:** Qwen/Qwen3.5-397B-A17B-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 397 --model-format fp8 --quantization ${quantization} Model Spec 3 (gptq, 397 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 397 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3.5-397B-A17B-GPTQ-Int4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 397
--model-format gptq --quantization ${quantization} Model Spec 4 (awq, 397 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 397 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/Qwen3.5-397B-A17B-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 397 --model-format awq --quantization ${quantization} Model Spec 5 (ggufv2, 397 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 397 - **Quantizations:** UD-TQ1_0 - **Engines**: llama.cpp - **Model ID:** unsloth/Qwen3.5-397B-A17B-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 397 --model-format ggufv2 --quantization ${quantization} Model Spec 6 (mlx, 397 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 397 - **Quantizations:** 4bit, 5bit, 6bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3.5-397B-A17B-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 397 --model-format mlx --quantization ${quantization} Model Spec 7 (pytorch, 122 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 122 - 
**Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3.5-122B-A10B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 122 --model-format pytorch --quantization ${quantization} Model Spec 8 (fp8, 122 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 122 - **Quantizations:** FP8 - **Engines**: vLLM - **Model ID:** Qwen/Qwen3.5-122B-A10B-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 122 --model-format fp8 --quantization ${quantization} Model Spec 9 (gptq, 122 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 122 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3.5-122B-A10B-GPTQ-Int4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 122 --model-format gptq --quantization ${quantization} Model Spec 10 (awq, 122 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 122 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/Qwen3.5-122B-A10B-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace 
``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 122 --model-format awq --quantization ${quantization} Model Spec 11 (ggufv2, 122 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 122 - **Quantizations:** UD-IQ1_M, UD-IQ1_S, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_S, UD-IQ3_XXS - **Engines**: llama.cpp - **Model ID:** unsloth/Qwen3.5-122B-A10B-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 122 --model-format ggufv2 --quantization ${quantization} Model Spec 12 (mlx, 122 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 122 - **Quantizations:** 4bit, 5bit, 6bit, 8bit, bf16 - **Engines**: MLX - **Model ID:** mlx-community/Qwen3.5-122B-A10B-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 122 --model-format mlx --quantization ${quantization} Model Spec 13 (pytorch, 35 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 35 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3.5-35B-A3B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name 
qwen3.5 --size-in-billions 35 --model-format pytorch --quantization ${quantization} Model Spec 14 (fp8, 35 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 35 - **Quantizations:** FP8 - **Engines**: vLLM - **Model ID:** Qwen/Qwen3.5-35B-A3B-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 35 --model-format fp8 --quantization ${quantization} Model Spec 15 (gptq, 35 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 35 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3.5-35B-A3B-GPTQ-Int4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 35 --model-format gptq --quantization ${quantization} Model Spec 16 (awq, 35 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 35 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/Qwen3.5-35B-A3B-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 35 --model-format awq --quantization ${quantization} Model Spec 17 (ggufv2, 35 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 35 - **Quantizations:** MXFP4_MOE, 
Q3_K_M, Q3_K_S, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K, Q8_0, UD-IQ1_M, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_S, UD-IQ3_XXS, UD-IQ4_NL, UD-IQ4_XS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_L, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_S, UD-Q6_K_XL, UD-Q8_K_XL - **Engines**: llama.cpp - **Model ID:** unsloth/Qwen3.5-35B-A3B-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 35 --model-format ggufv2 --quantization ${quantization} Model Spec 18 (mlx, 35 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 35 - **Quantizations:** 4bit, 5bit, 6bit, 8bit, bf16 - **Engines**: MLX - **Model ID:** mlx-community/Qwen3.5-35B-A3B-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 35 --model-format mlx --quantization ${quantization} Model Spec 19 (pytorch, 27 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 27 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3.5-27B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 27 --model-format pytorch --quantization ${quantization} Model Spec 20 (fp8, 27 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 27 - 
**Quantizations:** FP8 - **Engines**: vLLM - **Model ID:** Qwen/Qwen3.5-27B-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 27 --model-format fp8 --quantization ${quantization} Model Spec 21 (gptq, 27 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 27 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3.5-27B-GPTQ-Int4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 27 --model-format gptq --quantization ${quantization} Model Spec 22 (awq, 27 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 27 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/Qwen3.5-27B-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 27 --model-format awq --quantization ${quantization} Model Spec 23 (ggufv2, 27 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 27 - **Quantizations:** IQ4_NL, IQ4_XS, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K, Q8_0, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_XXS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL - **Engines**: llama.cpp - **Model ID:** 
unsloth/Qwen3.5-27B-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 27 --model-format ggufv2 --quantization ${quantization} Model Spec 24 (mlx, 27 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 27 - **Quantizations:** 4bit, 5bit, 6bit, 8bit, bf16, mxfp8 - **Engines**: MLX - **Model ID:** mlx-community/Qwen3.5-27B-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 27 --model-format mlx --quantization ${quantization} Model Spec 25 (pytorch, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 9 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3.5-9B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 9 --model-format pytorch --quantization ${quantization} Model Spec 26 (awq, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 9 - **Quantizations:** Int4 - **Engines**: Transformers - **Model ID:** QuantTrio/Qwen3.5-9B-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options 
listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 9 --model-format awq --quantization ${quantization} Model Spec 27 (ggufv2, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 9 - **Quantizations:** BF16, IQ4_NL, IQ4_XS, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K, Q8_0, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_XXS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL - **Engines**: llama.cpp - **Model ID:** unsloth/Qwen3.5-9B-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 9 --model-format ggufv2 --quantization ${quantization} Model Spec 28 (mlx, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 9 - **Quantizations:** 4bit, 5bit, 6bit, 8bit, bf16 - **Engines**: MLX - **Model ID:** mlx-community/Qwen3.5-9B-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 9 --model-format mlx --quantization ${quantization} Model Spec 29 (pytorch, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 4 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3.5-4B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch 
--model-engine ${engine} --model-name qwen3.5 --size-in-billions 4 --model-format pytorch --quantization ${quantization} Model Spec 30 (awq, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 4 - **Quantizations:** Int4 - **Engines**: Transformers - **Model ID:** QuantTrio/Qwen3.5-4B-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 4 --model-format awq --quantization ${quantization} Model Spec 31 (ggufv2, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 4 - **Quantizations:** BF16, IQ4_NL, IQ4_XS, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K, Q8_0, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_XXS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL - **Engines**: llama.cpp - **Model ID:** unsloth/Qwen3.5-4B-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 4 --model-format ggufv2 --quantization ${quantization} Model Spec 32 (mlx, 4 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 4 - **Quantizations:** 3bit, 4bit, 6bit, 8bit - **Engines**: MLX - **Model ID:** mlx-community/Qwen3.5-4B-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 
--size-in-billions 4 --model-format mlx --quantization ${quantization} Model Spec 33 (pytorch, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 2 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3.5-2B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 2 --model-format pytorch --quantization ${quantization} Model Spec 34 (awq, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 2 - **Quantizations:** Int4 - **Engines**: Transformers - **Model ID:** QuantTrio/Qwen3.5-2B-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 2 --model-format awq --quantization ${quantization} Model Spec 35 (ggufv2, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 2 - **Quantizations:** BF16, IQ4_NL, IQ4_XS, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K, Q8_0, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_XXS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL - **Engines**: llama.cpp - **Model ID:** unsloth/Qwen3.5-2B-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 2 --model-format ggufv2 --quantization ${quantization} 
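As a minimal sketch of the substitution described above, the snippet below fills in both placeholders for Model Spec 35 (ggufv2, 2 Billion). The engine and quantization values are picked from the lists in that spec purely for illustration, and passing the engine name exactly as it appears under **Engines** is an assumption. The command is printed rather than executed so it can be reviewed first.

```shell
# Illustrative values taken from the options listed for Model Spec 35.
engine="llama.cpp"        # assumption: engine name passed as listed under Engines
quantization="Q4_K_M"     # any entry from the Quantizations list of that spec

# Assemble the launch command with both placeholders filled in.
cmd="xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 2 --model-format ggufv2 --quantization ${quantization}"

# Print the command for review; run it once the values look right.
echo "${cmd}"
```

Keeping the two values in variables makes it easy to switch to another quantization from the same list without retyping the rest of the command.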
Model Spec 36 (mlx, 2 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 2 - **Quantizations:** 3bit, 4bit, 5bit, 6bit, 8bit, bf16, mxfp4, mxfp8, nvfp4 - **Engines**: MLX - **Model ID:** mlx-community/Qwen3.5-2B-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 2 --model-format mlx --quantization ${quantization} Model Spec 37 (pytorch, 0_8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 0_8 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** Qwen/Qwen3.5-0.8B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 0_8 --model-format pytorch --quantization ${quantization} Model Spec 38 (ggufv2, 0_8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 0_8 - **Quantizations:** BF16, IQ4_NL, IQ4_XS, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K, Q8_0, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_XXS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL - **Engines**: llama.cpp - **Model ID:** unsloth/Qwen3.5-0.8B-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 0_8 --model-format ggufv2 --quantization ${quantization} 
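Model Spec 38 (ggufv2, 0_8 Billion) writes its size as ``0_8``, while its Model ID uses ``0.8B``. Assuming the underscore simply stands in for the decimal point (an inference from those two fields, not something this page states), the sketch below derives the ``--size-in-billions`` value from a decimal size. The variable names are hypothetical, and the command is only printed, not executed.

```shell
# Hypothetical helper variables; only the xinference flags come from this page.
size_decimal="0.8"
# Assumption: the CLI size is the decimal size with "." replaced by "_".
size_flag=$(printf '%s' "${size_decimal}" | tr '.' '_')

engine="llama.cpp"      # assumption: engine name passed as listed under Engines
quantization="Q8_0"     # one of the quantizations listed for Model Spec 38

# Assemble and print the launch command for review.
cmd="xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions ${size_flag} --model-format ggufv2 --quantization ${quantization}"
echo "${cmd}"
```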
Model Spec 39 (mlx, 0_8 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 0_8 - **Quantizations:** 3bit, 4bit, 5bit, 6bit, 8bit, bf16, mxfp4, mxfp8, nvfp4 - **Engines**: MLX - **Model ID:** mlx-community/Qwen3.5-0.8B-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3.5 --size-in-billions 0_8 --model-format mlx --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/qwen3.rst ================================================ .. _models_llm_qwen3: ======================================== qwen3 ======================================== - **Context Length:** 40960 - **Model Name:** qwen3 - **Languages:** en, zh - **Abilities:** chat, reasoning, hybrid, tools - **Description:** Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 0_6 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 0_6 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen3-0.6B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 0_6 --model-format pytorch --quantization ${quantization} Model Spec 2 (fp8, 0_6 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 0_6 - **Quantizations:** fp8 - **Engines**: vLLM, SGLang - **Model ID:** Qwen/Qwen3-0.6B-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 0_6 --model-format fp8 --quantization ${quantization} Model Spec 3 (gptq, 0_6 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 0_6 - **Quantizations:** Int8 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen3-0.6B-GPTQ-Int8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 0_6 --model-format gptq --quantization ${quantization} Model Spec 4 (gptq, 0_6 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 0_6 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model 
ID:** JunHowie/Qwen3-0.6B-GPTQ-Int4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 0_6 --model-format gptq --quantization ${quantization} Model Spec 5 (mlx, 0_6 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 0_6 - **Quantizations:** 3bit, 4bit, 6bit, 8bit, bf16 - **Engines**: MLX - **Model ID:** mlx-community/Qwen3-0.6B-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 0_6 --model-format mlx --quantization ${quantization} Model Spec 6 (ggufv2, 0_6 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 0_6 - **Quantizations:** Q2_K, Q2_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q5_K_M, Q6_K, Q8_0, BF16, UD-IQ1_M, UD-IQ1_S, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_XXS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL, IQ4_NL, IQ4_XS - **Engines**: vLLM, llama.cpp - **Model ID:** unsloth/Qwen3-0.6B-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 0_6 --model-format ggufv2 --quantization ${quantization} Model Spec 7 (pytorch, 1_7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 1_7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang 
- **Model ID:** Qwen/Qwen3-1.7B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 1_7 --model-format pytorch --quantization ${quantization} Model Spec 8 (fp8, 1_7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** fp8 - **Model Size (in billions):** 1_7 - **Quantizations:** fp8 - **Engines**: vLLM, SGLang - **Model ID:** Qwen/Qwen3-1.7B-FP8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 1_7 --model-format fp8 --quantization ${quantization} Model Spec 9 (gptq, 1_7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 1_7 - **Quantizations:** Int8 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Qwen/Qwen3-1.7B-GPTQ-Int8 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 1_7 --model-format gptq --quantization ${quantization} Model Spec 10 (gptq, 1_7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 1_7 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** JunHowie/Qwen3-1.7B-GPTQ-Int4 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options 
listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 1_7 --model-format gptq --quantization ${quantization}

Model Spec 11 (mlx, 1_7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 1_7
- **Quantizations:** 3bit, 4bit, 6bit, 8bit, bf16
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen3-1.7B-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 1_7 --model-format mlx --quantization ${quantization}

Model Spec 12 (ggufv2, 1_7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 1_7
- **Quantizations:** Q2_K, Q2_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q5_K_M, Q6_K, Q8_0, BF16, UD-IQ1_M, UD-IQ1_S, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_XXS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL, IQ4_NL, IQ4_XS
- **Engines**: vLLM, llama.cpp
- **Model ID:** unsloth/Qwen3-1.7B-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 1_7 --model-format ggufv2 --quantization ${quantization}

Model Spec 13 (pytorch, 4 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 4
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen3-4B
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed
above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 4 --model-format pytorch --quantization ${quantization}

Model Spec 14 (fp8, 4 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** fp8
- **Model Size (in billions):** 4
- **Quantizations:** fp8
- **Engines**: vLLM, SGLang
- **Model ID:** Qwen/Qwen3-4B-FP8
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 4 --model-format fp8 --quantization ${quantization}

Model Spec 15 (awq, 4 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 4
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen3-4B-AWQ
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 4 --model-format awq --quantization ${quantization}

Model Spec 16 (gptq, 4 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 4
- **Quantizations:** Int4, Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** JunHowie/Qwen3-4B-GPTQ-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 4 --model-format gptq --quantization ${quantization}

Model Spec 17 (mlx, 4 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model
Size (in billions):** 4
- **Quantizations:** 3bit, 4bit, 6bit, 8bit, bf16
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen3-4B-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 4 --model-format mlx --quantization ${quantization}

Model Spec 18 (ggufv2, 4 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 4
- **Quantizations:** Q2_K, Q2_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q5_K_M, Q6_K, Q8_0, BF16, UD-IQ1_M, UD-IQ1_S, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_XXS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL, IQ4_NL, IQ4_XS
- **Engines**: vLLM, llama.cpp
- **Model ID:** unsloth/Qwen3-4B-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 4 --model-format ggufv2 --quantization ${quantization}

Model Spec 19 (pytorch, 8 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 8
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen3-8B
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 8 --model-format pytorch --quantization ${quantization}

Model Spec 20 (fp8, 8 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** fp8
- **Model Size (in billions):**
8
- **Quantizations:** fp8
- **Engines**: vLLM, SGLang
- **Model ID:** Qwen/Qwen3-8B-FP8
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 8 --model-format fp8 --quantization ${quantization}

Model Spec 21 (awq, 8 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 8
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen3-8B-AWQ
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 8 --model-format awq --quantization ${quantization}

Model Spec 22 (gptq, 8 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 8
- **Quantizations:** Int4, Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** JunHowie/Qwen3-8B-GPTQ-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 8 --model-format gptq --quantization ${quantization}

Model Spec 23 (mlx, 8 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 8
- **Quantizations:** 3bit, 4bit, 6bit, 8bit, bf16
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen3-8B-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace
``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 8 --model-format mlx --quantization ${quantization}

Model Spec 24 (ggufv2, 8 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 8
- **Quantizations:** Q2_K, Q2_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q5_K_M, Q6_K, Q8_0, BF16, UD-IQ1_M, UD-IQ1_S, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_XXS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL, IQ4_NL, IQ4_XS
- **Engines**: vLLM, llama.cpp
- **Model ID:** unsloth/Qwen3-8B-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 8 --model-format ggufv2 --quantization ${quantization}

Model Spec 25 (pytorch, 14 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 14
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen3-14B
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 14 --model-format pytorch --quantization ${quantization}

Model Spec 26 (fp8, 14 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** fp8
- **Model Size (in billions):** 14
- **Quantizations:** fp8
- **Engines**: vLLM, SGLang
- **Model ID:** Qwen/Qwen3-14B-FP8
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization
method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 14 --model-format fp8 --quantization ${quantization}

Model Spec 27 (awq, 14 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 14
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen3-14B-AWQ
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 14 --model-format awq --quantization ${quantization}

Model Spec 28 (gptq, 14 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 14
- **Quantizations:** Int4, Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** JunHowie/Qwen3-14B-GPTQ-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 14 --model-format gptq --quantization ${quantization}

Model Spec 29 (mlx, 14 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 14
- **Quantizations:** 3bit, 4bit, 6bit, 8bit, bf16
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen3-14B-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 14 --model-format mlx --quantization ${quantization}

Model Spec 30 (ggufv2, 14
Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 14
- **Quantizations:** Q2_K, Q2_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q5_K_M, Q6_K, Q8_0, BF16, UD-IQ1_M, UD-IQ1_S, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_XXS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL, IQ4_NL, IQ4_XS
- **Engines**: vLLM, llama.cpp
- **Model ID:** unsloth/Qwen3-14B-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 14 --model-format ggufv2 --quantization ${quantization}

Model Spec 31 (pytorch, 30 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 30
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen3-30B-A3B
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 30 --model-format pytorch --quantization ${quantization}

Model Spec 32 (fp8, 30 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** fp8
- **Model Size (in billions):** 30
- **Quantizations:** fp8
- **Engines**: vLLM, SGLang
- **Model ID:** Qwen/Qwen3-30B-A3B-FP8
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 30 --model-format fp8 --quantization ${quantization}

Model Spec 33 (gptq, 30 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 30
- **Quantizations:** Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** JunHowie/Qwen3-30B-A3B-GPTQ-Int8
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 30 --model-format gptq --quantization ${quantization}

Model Spec 34 (gptq, 30 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 30
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen3-30B-A3B-GPTQ-Int4
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 30 --model-format gptq --quantization ${quantization}

Model Spec 35 (mlx, 30 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 30
- **Quantizations:** 4bit, 6bit, 8bit, bf16
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen3-30B-A3B-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 30 --model-format mlx --quantization ${quantization}

Model Spec 36 (ggufv2, 30 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 30
- **Quantizations:** Q2_K, Q2_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q5_K_M, Q6_K, Q8_0, BF16,
UD-IQ1_M, UD-IQ1_S, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_XXS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL, IQ4_NL, IQ4_XS
- **Engines**: vLLM, llama.cpp
- **Model ID:** unsloth/Qwen3-30B-A3B-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 30 --model-format ggufv2 --quantization ${quantization}

Model Spec 37 (pytorch, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 32
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen3-32B
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 32 --model-format pytorch --quantization ${quantization}

Model Spec 38 (fp8, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** fp8
- **Model Size (in billions):** 32
- **Quantizations:** fp8
- **Engines**: vLLM, SGLang
- **Model ID:** Qwen/Qwen3-32B-FP8
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 32 --model-format fp8 --quantization ${quantization}

Model Spec 39 (awq, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 32
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen3-32B-AWQ
- **Model Hubs**: `Hugging Face `__,
`ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 32 --model-format awq --quantization ${quantization}

Model Spec 40 (gptq, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 32
- **Quantizations:** Int4, Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** JunHowie/Qwen3-32B-GPTQ-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 32 --model-format gptq --quantization ${quantization}

Model Spec 41 (mlx, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 32
- **Quantizations:** 4bit, 6bit, 8bit, bf16
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen3-32B-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 32 --model-format mlx --quantization ${quantization}

Model Spec 42 (ggufv2, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 32
- **Quantizations:** Q2_K, Q2_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q5_K_M, Q6_K, Q8_0, BF16, UD-IQ1_M, UD-IQ1_S, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_XXS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL, IQ4_NL, IQ4_XS
- **Engines**: vLLM, llama.cpp
- **Model ID:** unsloth/Qwen3-32B-GGUF
- **Model Hubs**: `Hugging Face
`__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 32 --model-format ggufv2 --quantization ${quantization}

Model Spec 43 (pytorch, 235 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 235
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen3-235B-A22B
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 235 --model-format pytorch --quantization ${quantization}

Model Spec 44 (fp8, 235 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** fp8
- **Model Size (in billions):** 235
- **Quantizations:** fp8
- **Engines**: vLLM, SGLang
- **Model ID:** Qwen/Qwen3-235B-A22B-FP8
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 235 --model-format fp8 --quantization ${quantization}

Model Spec 45 (gptq, 235 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 235
- **Quantizations:** Int8
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** QuantTrio/Qwen3-235B-A22B-GPTQ-Int8
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine
   ${engine} --model-name qwen3 --size-in-billions 235 --model-format gptq --quantization ${quantization}

Model Spec 46 (gptq, 235 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 235
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/Qwen3-235B-A22B-GPTQ-Int4
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 235 --model-format gptq --quantization ${quantization}

Model Spec 47 (mlx, 235 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 235
- **Quantizations:** 3bit, 4bit, 8bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen/Qwen3-235B-A22B-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 235 --model-format mlx --quantization ${quantization}

Model Spec 48 (ggufv2, 235 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 235
- **Quantizations:** Q2_K, Q2_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q5_K_M, Q6_K, Q8_0, BF16, UD-Q2_K_XL, UD-Q3_K_XL, IQ4_NL, IQ4_XS
- **Engines**: vLLM, llama.cpp
- **Model ID:** unsloth/Qwen3-235B-A22B-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwen3 --size-in-billions 235 --model-format ggufv2 --quantization
   ${quantization}

================================================
FILE: doc/source/models/builtin/llm/qwenlong-l1.rst
================================================
.. _models_llm_qwenlong-l1:

========================================
qwenLong-l1
========================================

- **Context Length:** 32768
- **Model Name:** qwenLong-l1
- **Languages:** en, zh
- **Abilities:** chat
- **Description:** QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 32
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Tongyi-Zhiwen/QwenLong-L1-32B
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwenLong-l1 --size-in-billions 32 --model-format pytorch --quantization ${quantization}

Model Spec 2 (awq, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 32
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Tongyi-Zhiwen/QwenLong-L1-32B-AWQ
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name qwenLong-l1 --size-in-billions 32 --model-format awq --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/qwq-32b-preview.rst
================================================
..
_models_llm_qwq-32b-preview:

========================================
QwQ-32B-Preview
========================================

- **Context Length:** 32768
- **Model Name:** QwQ-32B-Preview
- **Languages:** en, zh
- **Abilities:** chat
- **Description:** QwQ-32B-Preview is an experimental research model developed by the Qwen Team, focused on advancing AI reasoning capabilities.

Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 32
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/QwQ-32B-Preview
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name QwQ-32B-Preview --size-in-billions 32 --model-format pytorch --quantization ${quantization}

Model Spec 2 (awq, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 32
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** KirillR/QwQ-32B-Preview-AWQ
- **Model Hubs**: `Hugging Face `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name QwQ-32B-Preview --size-in-billions 32 --model-format awq --quantization ${quantization}

Model Spec 3 (ggufv2, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 32
- **Quantizations:** Q3_K_L, Q4_K_M, Q6_K, Q8_0
- **Engines**: vLLM, llama.cpp
- **Model ID:** lmstudio-community/QwQ-32B-Preview-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace
``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name QwQ-32B-Preview --size-in-billions 32 --model-format ggufv2 --quantization ${quantization}

Model Spec 4 (mlx, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 32
- **Quantizations:** 4bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen_QwQ-32B-Preview_MLX-4bit
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name QwQ-32B-Preview --size-in-billions 32 --model-format mlx --quantization ${quantization}

Model Spec 5 (mlx, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 32
- **Quantizations:** 8bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen_QwQ-32B-Preview_MLX-8bit
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name QwQ-32B-Preview --size-in-billions 32 --model-format mlx --quantization ${quantization}

Model Spec 6 (mlx, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 32
- **Quantizations:** none
- **Engines**: MLX
- **Model ID:** mlx-community/QwQ-32B-Preview-bf16
- **Model Hubs**: `Hugging Face `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name QwQ-32B-Preview --size-in-billions 32 --model-format mlx --quantization ${quantization}
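
The QwQ-32B-Preview launch commands above differ only in engine, model format, and quantization, so they can be assembled and checked programmatically. The following is a minimal sketch and not part of Xinference: ``build_launch_command`` and ``QWQ_32B_PREVIEW_QUANTIZATIONS`` are hypothetical names, the table only mirrors the quantizations listed in the specs above, and the engine string is passed through without validation, so check the exact engine spelling your installation accepts.

```python
import shlex

# Hypothetical lookup table: quantizations listed above for each
# QwQ-32B-Preview model format (the three mlx specs are merged).
QWQ_32B_PREVIEW_QUANTIZATIONS = {
    "pytorch": ["none"],
    "awq": ["Int4"],
    "ggufv2": ["Q3_K_L", "Q4_K_M", "Q6_K", "Q8_0"],
    "mlx": ["4bit", "8bit", "none"],
}


def build_launch_command(engine, model_format, quantization, size_in_billions="32"):
    """Return the ``xinference launch`` command line as a single string.

    Raises ValueError when the format or quantization is not in the table.
    The engine value is an assumption supplied by the caller and is not checked.
    """
    allowed = QWQ_32B_PREVIEW_QUANTIZATIONS.get(model_format)
    if allowed is None:
        raise ValueError(f"unknown model format: {model_format!r}")
    if quantization not in allowed:
        raise ValueError(
            f"quantization {quantization!r} is not listed for {model_format!r}; "
            f"choose one of {allowed}"
        )
    args = [
        "xinference", "launch",
        "--model-engine", engine,
        "--model-name", "QwQ-32B-Preview",
        "--size-in-billions", str(size_in_billions),
        "--model-format", model_format,
        "--quantization", quantization,
    ]
    # shlex.join quotes each argument so the string is safe to paste into a shell.
    return shlex.join(args)


if __name__ == "__main__":
    print(build_launch_command("llama.cpp", "ggufv2", "Q4_K_M"))
```

The helper only builds the command string. Running it still requires a reachable Xinference server and an engine that is installed on the target machine.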
================================================
FILE: doc/source/models/builtin/llm/qwq-32b.rst
================================================
.. _models_llm_qwq-32b:

========================================
QwQ-32B
========================================

- **Context Length:** 131072
- **Model Name:** QwQ-32B
- **Languages:** en, zh
- **Abilities:** chat, reasoning, tools
- **Description:** QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which is capable of achieving competitive performance against state-of-the-art reasoning models, e.g., DeepSeek-R1, o1-mini.

Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 32
- **Quantizations:** none
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/QwQ-32B
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name QwQ-32B --size-in-billions 32 --model-format pytorch --quantization ${quantization}

Model Spec 2 (awq, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 32
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers, SGLang
- **Model ID:** Qwen/QwQ-32B-AWQ
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name QwQ-32B --size-in-billions 32 --model-format awq --quantization
   ${quantization}

Model Spec 3 (mlx, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 32
- **Quantizations:** 3bit, 4bit, 6bit, 8bit, bf16
- **Engines**: MLX
- **Model ID:** mlx-community/QwQ-32B-{quantization}
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name QwQ-32B --size-in-billions 32 --model-format mlx --quantization ${quantization}

Model Spec 4 (ggufv2, 32 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 32
- **Quantizations:** BF16, IQ4_NL, IQ4_XS, Q2_K, Q2_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q5_K_M, Q6_K, UD-IQ1_M, UD-IQ1_S, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_XXS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL, Q8_0
- **Engines**: vLLM, llama.cpp
- **Model ID:** unsloth/QwQ-32B-GGUF
- **Model Hubs**: `Hugging Face `__, `ModelScope `__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name QwQ-32B --size-in-billions 32 --model-format ggufv2 --quantization ${quantization}

================================================
FILE: doc/source/models/builtin/llm/seallm_v2.5.rst
================================================
..
_models_llm_seallm_v2.5: ======================================== seallm_v2.5 ======================================== - **Context Length:** 8192 - **Model Name:** seallm_v2.5 - **Languages:** en, zh, vi, id, th, ms, km, lo, my, tl - **Abilities:** generate - **Description:** We introduce SeaLLM-7B-v2.5, the state-of-the-art multilingual LLM for Southeast Asian (SEA) languages Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** SeaLLMs/SeaLLM-7B-v2.5 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name seallm_v2.5 --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - **Quantizations:** Q4_K_M, Q8_0 - **Engines**: llama.cpp - **Model ID:** SeaLLMs/SeaLLM-7B-v2.5-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name seallm_v2.5 --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/seallm_v2.rst ================================================ .. 
_models_llm_seallm_v2: ======================================== seallm_v2 ======================================== - **Context Length:** 8192 - **Model Name:** seallm_v2 - **Languages:** en, zh, vi, id, th, ms, km, lo, my, tl - **Abilities:** generate - **Description:** We introduce SeaLLM-7B-v2, the state-of-the-art multilingual LLM for Southeast Asian (SEA) languages Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** SeaLLMs/SeaLLM-7B-v2 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name seallm_v2 --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - **Quantizations:** Q4_0, Q8_0 - **Engines**: llama.cpp - **Model ID:** SeaLLMs/SeaLLM-7B-v2-gguf - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name seallm_v2 --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/seallms-v3.rst ================================================ .. 
_models_llm_seallms-v3: ======================================== seallms-v3 ======================================== - **Context Length:** 32768 - **Model Name:** seallms-v3 - **Languages:** en, zh, id, vi, th, ph, ms, mm, kh, la, in - **Abilities:** chat - **Description:** SeaLLMs - Large Language Models for Southeast Asia Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 1_5 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 1_5 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** SeaLLMs/SeaLLMs-v3-1.5B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name seallms-v3 --size-in-billions 1_5 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** SeaLLMs/SeaLLMs-v3-7B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name seallms-v3 --size-in-billions 7 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/seed-oss.rst ================================================ .. 
_models_llm_seed-oss: ======================================== seed-oss ======================================== - **Context Length:** 524288 - **Model Name:** seed-oss - **Languages:** en, zh - **Abilities:** chat, reasoning, tools - **Description:** Seed-OSS is a series of open-source large language models developed by ByteDance's Seed Team, designed for powerful long-context, reasoning, agent and general capabilities, and versatile developer-friendly features. Although trained with only 12T tokens, Seed-OSS achieves excellent performance on several popular open benchmarks. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 36 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 36 - **Quantizations:** none - **Engines**: vLLM, Transformers - **Model ID:** ByteDance-Seed/Seed-OSS-36B-Instruct - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name seed-oss --size-in-billions 36 --model-format pytorch --quantization ${quantization} Model Spec 2 (gptq, 36 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 36 - **Quantizations:** Int8, Int4, Int3 - **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/Seed-OSS-36B-Instruct-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name seed-oss --size-in-billions 36 --model-format gptq --quantization ${quantization} Model Spec 3 (awq, 36 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 36 - **Quantizations:** Int4 
- **Engines**: vLLM, Transformers - **Model ID:** QuantTrio/Seed-OSS-36B-Instruct-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name seed-oss --size-in-billions 36 --model-format awq --quantization ${quantization} Model Spec 4 (mlx, 36 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 36 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/Seed-OSS-36B-Instruct-4bit - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name seed-oss --size-in-billions 36 --model-format mlx --quantization ${quantization} Model Spec 5 (ggufv2, 36 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 36 - **Quantizations:** BF16, IQ4_NL, IQ4_XS, Q2_K, Q2_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K, Q8_0, UD-IQ1_M, UD-IQ1_S, UD-IQ2_M, UD-IQ2_XXS, UD-IQ3_XXS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q5_K_XL, UD-Q6_K_XL, UD-Q8_K_XL - **Engines**: vLLM, llama.cpp - **Model ID:** unsloth/Seed-OSS-36B-Instruct-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name seed-oss --size-in-billions 36 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/skywork-math.rst ================================================ .. 
_models_llm_skywork-math: ======================================== Skywork-Math ======================================== - **Context Length:** 4096 - **Model Name:** Skywork-Math - **Languages:** en, zh - **Abilities:** generate - **Description:** Skywork is a series of large models developed by the Kunlun Group · Skywork team. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 13 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** skywork/Skywork-13B-Math - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Skywork-Math --size-in-billions 13 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/skywork-or1-preview.rst ================================================ .. _models_llm_skywork-or1-preview: ======================================== skywork-or1-preview ======================================== - **Context Length:** 32768 - **Model Name:** skywork-or1-preview - **Languages:** en, zh - **Abilities:** chat - **Description:** The Skywork-OR1 (Open Reasoner 1) model series consists of powerful math and code reasoning models trained using large-scale rule-based reinforcement learning with carefully designed datasets and training recipes. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 32 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Skywork/Skywork-OR1-32B-Preview - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name skywork-or1-preview --size-in-billions 32 --model-format pytorch --quantization ${quantization} Model Spec 2 (gptq, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 32 - **Quantizations:** Int4, int8 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** JunHowie/Skywork-OR1-32B-Preview-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name skywork-or1-preview --size-in-billions 32 --model-format gptq --quantization ${quantization} Model Spec 3 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Skywork/Skywork-OR1-7B-Preview - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name skywork-or1-preview --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 4 (ggufv2, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** 
ggufv2 - **Model Size (in billions):** 32 - **Quantizations:** IQ2_M, IQ2_S, IQ2_XS, IQ3_M, IQ3_XS, IQ3_XXS, IQ4_NL, IQ4_XS, Q2_K, Q2_K_L, Q3_K_L, Q3_K_M, Q3_K_S, Q3_K_XL, Q4_0, Q4_1, Q4_K_L, Q4_K_M, Q4_K_S, Q5_K_L, Q5_K_M, Q5_K_S, Q6_K, Q6_K_L, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** bartowski/Skywork_Skywork-OR1-32B-Preview-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name skywork-or1-preview --size-in-billions 32 --model-format ggufv2 --quantization ${quantization} Model Spec 5 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - **Quantizations:** IQ2_M, IQ2_S, IQ2_XS, IQ3_M, IQ3_XS, IQ3_XXS, IQ4_NL, IQ4_XS, Q2_K, Q2_K_L, Q3_K_L, Q3_K_M, Q3_K_S, Q3_K_XL, Q4_0, Q4_1, Q4_K_L, Q4_K_M, Q4_K_S, Q5_K_L, Q5_K_M, Q5_K_S, Q6_K, Q6_K_L, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** bartowski/Skywork_Skywork-OR1-7B-Preview-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name skywork-or1-preview --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/skywork-or1.rst ================================================ .. 
_models_llm_skywork-or1: ======================================== skywork-or1 ======================================== - **Context Length:** 131072 - **Model Name:** skywork-or1 - **Languages:** en, zh - **Abilities:** chat - **Description:** We release the final version of Skywork-OR1 (Open Reasoner 1) series of models, including Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 32 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Skywork/Skywork-OR1-32B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name skywork-or1 --size-in-billions 32 --model-format pytorch --quantization ${quantization} Model Spec 2 (gptq, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 32 - **Quantizations:** Int8, Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** JunHowie/Skywork-OR1-32B-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name skywork-or1 --size-in-billions 32 --model-format gptq --quantization ${quantization} Model Spec 3 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** Skywork/Skywork-OR1-7B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen 
quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name skywork-or1 --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 4 (gptq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 7 - **Quantizations:** Int8, Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** JunHowie/Skywork-OR1-7B-GPTQ-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name skywork-or1 --size-in-billions 7 --model-format gptq --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/skywork.rst ================================================ .. _models_llm_skywork: ======================================== Skywork ======================================== - **Context Length:** 4096 - **Model Name:** Skywork - **Languages:** en, zh - **Abilities:** generate - **Description:** Skywork is a series of large models developed by the Kunlun Group · Skywork team. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 13 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** skywork/Skywork-13B-base - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Skywork --size-in-billions 13 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/telechat.rst ================================================ .. _models_llm_telechat: ======================================== telechat ======================================== - **Context Length:** 8192 - **Model Name:** telechat - **Languages:** en, zh - **Abilities:** chat - **Description:** TeleChat is a large language model developed and trained by China Telecom Artificial Intelligence Technology Co., Ltd. The 7B base model is trained on 1.5 trillion tokens and the 12B base model on 3 trillion tokens of high-quality Chinese and English corpus.
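As a concrete illustration of the placeholders used in the launch commands on this page, the sketch below fills in ``${engine}`` and ``${quantization}`` for the gptq, 7 Billion spec listed under Specifications. The values ``Transformers`` and ``int4`` are copied from that spec's Engines and Quantizations fields. Resolving the ``{quantization}`` part of the Model ID by plain substitution is an assumption, and the command is printed rather than executed, so no running Xinference server is needed to try it.

```shell
# Values taken from the gptq, 7 Billion spec on this page.
engine="Transformers"
quantization="int4"

# Assumption: the {quantization} placeholder in the Model ID is filled in literally.
model_id="Tele-AI/telechat-7B-${quantization}"

# Build the launch command with both placeholders resolved.
# It is echoed for review instead of being executed.
cmd="xinference launch --model-engine ${engine} --model-name telechat --size-in-billions 7 --model-format gptq --quantization ${quantization}"
echo "Model ID: ${model_id}"
echo "${cmd}"
```

Once the printed command looks right, it can be pasted into a shell on a machine where Xinference is installed.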
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** Tele-AI/telechat-7B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name telechat --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (gptq, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 7 - **Quantizations:** int4, int8 - **Engines**: Transformers - **Model ID:** Tele-AI/telechat-7B-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name telechat --size-in-billions 7 --model-format gptq --quantization ${quantization} Model Spec 3 (pytorch, 12 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 12 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** Tele-AI/TeleChat-12B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name telechat --size-in-billions 12 --model-format pytorch --quantization ${quantization} Model Spec 4 (gptq, 12 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 12 - **Quantizations:** int4, int8 - **Engines**: Transformers - **Model ID:** 
Tele-AI/TeleChat-12B-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name telechat --size-in-billions 12 --model-format gptq --quantization ${quantization} Model Spec 5 (pytorch, 52 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 52 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** Tele-AI/TeleChat-52B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name telechat --size-in-billions 52 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/tiny-llama.rst ================================================ .. _models_llm_tiny-llama: ======================================== tiny-llama ======================================== - **Context Length:** 2048 - **Model Name:** tiny-llama - **Languages:** en - **Abilities:** generate - **Description:** The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. 
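The ggufv2 spec for this model lists many quantizations, so it can help to check a choice against that list before launching. The following is a minimal sketch under stated assumptions: ``Q4_K_M`` is only an illustrative pick, the ``allowed`` variable is a hand-copied version of the Quantizations field from the spec on this page, and the command is printed rather than run.

```shell
# Quantizations listed for the ggufv2, 1 Billion spec on this page.
allowed="Q2_K Q3_K_L Q3_K_M Q3_K_S Q4_0 Q4_K_M Q4_K_S Q5_0 Q5_K_M Q5_K_S Q6_K Q8_0"

engine="llama.cpp"       # the only engine listed for this spec
quantization="Q4_K_M"    # illustrative choice; any entry in $allowed is valid

# Check the choice against the list before building the command.
# Padding both sides with spaces avoids partial matches such as "Q4_K".
case " ${allowed} " in
  *" ${quantization} "*) valid=yes ;;
  *) valid=no ;;
esac

if [ "${valid}" = "yes" ]; then
  cmd="xinference launch --model-engine ${engine} --model-name tiny-llama --size-in-billions 1 --model-format ggufv2 --quantization ${quantization}"
  echo "${cmd}"
else
  echo "unsupported quantization: ${quantization}" >&2
fi
```

The check is purely local; whether a given quantization file can actually be downloaded still depends on the model hub.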
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (ggufv2, 1 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 1 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: llama.cpp - **Model ID:** TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name tiny-llama --size-in-billions 1 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/wizardcoder-python-v1.0.rst ================================================ .. _models_llm_wizardcoder-python-v1.0: ======================================== wizardcoder-python-v1.0 ======================================== - **Context Length:** 100000 - **Model Name:** wizardcoder-python-v1.0 - **Languages:** en - **Abilities:** chat - **Description:** Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 13 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** WizardLMTeam/WizardCoder-Python-13B-V1.0 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name wizardcoder-python-v1.0 --size-in-billions 13 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 34 - **Quantizations:** none - **Engines**: vLLM, 
Transformers, SGLang - **Model ID:** WizardLMTeam/WizardCoder-Python-34B-V1.0 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name wizardcoder-python-v1.0 --size-in-billions 34 --model-format pytorch --quantization ${quantization} Model Spec 3 (ggufv2, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 7 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** TheBloke/WizardCoder-Python-7B-V1.0-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name wizardcoder-python-v1.0 --size-in-billions 7 --model-format ggufv2 --quantization ${quantization} Model Spec 4 (ggufv2, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 13 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** TheBloke/WizardCoder-Python-13B-V1.0-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name wizardcoder-python-v1.0 --size-in-billions 13 --model-format ggufv2 --quantization ${quantization} Model Spec 5 (ggufv2, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 34 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, 
Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** TheBloke/WizardCoder-Python-34B-V1.0-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name wizardcoder-python-v1.0 --size-in-billions 34 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/wizardmath-v1.0.rst ================================================ .. _models_llm_wizardmath-v1.0: ======================================== wizardmath-v1.0 ======================================== - **Context Length:** 2048 - **Model Name:** wizardmath-v1.0 - **Languages:** en - **Abilities:** chat - **Description:** WizardMath is an open-source LLM trained by fine-tuning Llama2 with Evol-Instruct, specializing in math. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** WizardLMTeam/WizardMath-7B-V1.0 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name wizardmath-v1.0 --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 70 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 70 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** WizardLMTeam/WizardMath-70B-V1.0 - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name wizardmath-v1.0 --size-in-billions 70 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/xiyansql-qwencoder-2504.rst ================================================ .. _models_llm_xiyansql-qwencoder-2504: ======================================== XiYanSQL-QwenCoder-2504 ======================================== - **Context Length:** 32768 - **Model Name:** XiYanSQL-QwenCoder-2504 - **Languages:** en, zh - **Abilities:** chat, tools - **Description:** The XiYanSQL-QwenCoder models, as multi-dialect SQL base models, demonstrate robust SQL generation capabilities.
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** XGenerationLab/XiYanSQL-QwenCoder-7B-2504 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name XiYanSQL-QwenCoder-2504 --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 32 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 32 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** XGenerationLab/XiYanSQL-QwenCoder-32B-2504 - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name XiYanSQL-QwenCoder-2504 --size-in-billions 32 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/xverse-chat.rst ================================================ .. _models_llm_xverse-chat: ======================================== xverse-chat ======================================== - **Context Length:** 2048 - **Model Name:** xverse-chat - **Languages:** en, zh - **Abilities:** chat - **Description:** XVERSE-Chat is the aligned version of model XVERSE.
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** xverse/XVERSE-7B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name xverse-chat --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 13 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** xverse/XVERSE-13B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name xverse-chat --size-in-billions 13 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/xverse.rst ================================================ .. _models_llm_xverse: ======================================== xverse ======================================== - **Context Length:** 2048 - **Model Name:** xverse - **Languages:** en, zh - **Abilities:** generate - **Description:** XVERSE is a multilingual large language model, independently developed by Shenzhen Yuanxiang Technology. 
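All xverse specs on this page share the same format, engine and quantization and differ only in model size, so the launch commands can be generated in a loop. This is a sketch rather than a recommended workflow: the sizes ``7``, ``13`` and ``65`` and the values ``Transformers`` and ``none`` are copied from the specs on this page, the ``cmds`` variable is only an illustrative accumulator, and the commands are printed instead of executed.

```shell
engine="Transformers"   # the only engine listed for the xverse specs
quantization="none"     # the only quantization listed for the pytorch format

# Print one launch command per model size listed on this page,
# and collect them in $cmds (one command per line).
cmds=""
for size in 7 13 65; do
  line="xinference launch --model-engine ${engine} --model-name xverse --size-in-billions ${size} --model-format pytorch --quantization ${quantization}"
  echo "${line}"
  cmds="${cmds}${line}
"
done
```

In practice only the size that fits the available hardware would be launched; the loop just makes the substitution pattern explicit.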
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 7 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 7 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** xverse/XVERSE-7B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name xverse --size-in-billions 7 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 13 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 13 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** xverse/XVERSE-13B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name xverse --size-in-billions 13 --model-format pytorch --quantization ${quantization} Model Spec 3 (pytorch, 65 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 65 - **Quantizations:** none - **Engines**: Transformers - **Model ID:** xverse/XVERSE-65B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name xverse --size-in-billions 65 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/yi-1.5-chat-16k.rst ================================================ .. 
_models_llm_yi-1.5-chat-16k: ======================================== Yi-1.5-chat-16k ======================================== - **Context Length:** 16384 - **Model Name:** Yi-1.5-chat-16k - **Languages:** en, zh - **Abilities:** chat - **Description:** Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 9 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** 01-ai/Yi-1.5-9B-Chat-16K - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5-chat-16k --size-in-billions 9 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 34 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** 01-ai/Yi-1.5-34B-Chat-16K - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5-chat-16k --size-in-billions 34 --model-format pytorch --quantization ${quantization} Model Spec 3 (ggufv2, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 9 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** 
QuantFactory/Yi-1.5-9B-Chat-16K-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5-chat-16k --size-in-billions 9 --model-format ggufv2 --quantization ${quantization} Model Spec 4 (ggufv2, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 34 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** bartowski/Yi-1.5-34B-Chat-16K-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5-chat-16k --size-in-billions 34 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/yi-1.5-chat.rst ================================================ .. _models_llm_yi-1.5-chat: ======================================== Yi-1.5-chat ======================================== - **Context Length:** 4096 - **Model Name:** Yi-1.5-chat - **Languages:** en, zh - **Abilities:** chat - **Description:** Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. 
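The launch commands on these pages share one shape: only the model name, size, format, engine, and quantization change, and both ``${engine}`` and ``${quantization}`` have to be filled in from the values listed for the chosen spec. A minimal sketch of assembling such a command is shown here. The helper name ``build_launch_command`` is hypothetical and not part of Xinference, and passing the engine exactly as it is spelled under **Engines** is an assumption.

```python
import shlex


def build_launch_command(model_name, size_in_billions, model_format,
                         engine, quantization):
    """Return the argv list for an ``xinference launch`` call.

    Hypothetical helper for illustration only: it assembles the flags
    that appear in the documented commands and does not run anything.
    """
    return [
        "xinference", "launch",
        "--model-engine", engine,
        "--model-name", model_name,
        "--size-in-billions", str(size_in_billions),
        "--model-format", model_format,
        "--quantization", quantization,
    ]


# Example: the ggufv2, 9 Billion spec of Yi-1.5-chat, using the Q4_K_M
# quantization and the llama.cpp engine listed for that spec.
argv = build_launch_command("Yi-1.5-chat", 9, "ggufv2", "llama.cpp", "Q4_K_M")
print(shlex.join(argv))
```

Returning a list rather than a single string keeps each value as its own argument, so the result can be handed to ``subprocess.run`` without extra quoting.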
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 6 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 6 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** 01-ai/Yi-1.5-6B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5-chat --size-in-billions 6 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 9 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** 01-ai/Yi-1.5-9B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5-chat --size-in-billions 9 --model-format pytorch --quantization ${quantization} Model Spec 3 (pytorch, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 34 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** 01-ai/Yi-1.5-34B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5-chat --size-in-billions 34 --model-format pytorch --quantization ${quantization} Model Spec 4 (ggufv2, 6 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 6 - **Quantizations:** Q3_K_L, Q4_K_M, 
Q5_K_M, Q6_K, Q8_0, f32 - **Engines**: vLLM, llama.cpp - **Model ID:** lmstudio-community/Yi-1.5-6B-Chat-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5-chat --size-in-billions 6 --model-format ggufv2 --quantization ${quantization} Model Spec 5 (ggufv2, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 9 - **Quantizations:** Q3_K_L, Q4_K_M, Q5_K_M, Q6_K, Q8_0, f32 - **Engines**: vLLM, llama.cpp - **Model ID:** lmstudio-community/Yi-1.5-9B-Chat-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5-chat --size-in-billions 9 --model-format ggufv2 --quantization ${quantization} Model Spec 6 (ggufv2, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 34 - **Quantizations:** Q2_K, Q3_K_L, Q4_K_M, Q5_K_M, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** lmstudio-community/Yi-1.5-34B-Chat-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5-chat --size-in-billions 34 --model-format ggufv2 --quantization ${quantization} Model Spec 7 (gptq, 6 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 6 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** modelscope/Yi-1.5-6B-Chat-GPTQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute 
the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5-chat --size-in-billions 6 --model-format gptq --quantization ${quantization} Model Spec 8 (gptq, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 9 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** modelscope/Yi-1.5-9B-Chat-GPTQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5-chat --size-in-billions 9 --model-format gptq --quantization ${quantization} Model Spec 9 (gptq, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 34 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** modelscope/Yi-1.5-34B-Chat-GPTQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5-chat --size-in-billions 34 --model-format gptq --quantization ${quantization} Model Spec 10 (awq, 6 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 6 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** modelscope/Yi-1.5-6B-Chat-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} 
--model-name Yi-1.5-chat --size-in-billions 6 --model-format awq --quantization ${quantization} Model Spec 11 (awq, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 9 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** modelscope/Yi-1.5-9B-Chat-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5-chat --size-in-billions 9 --model-format awq --quantization ${quantization} Model Spec 12 (awq, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** awq - **Model Size (in billions):** 34 - **Quantizations:** Int4 - **Engines**: vLLM, Transformers, SGLang - **Model ID:** modelscope/Yi-1.5-34B-Chat-AWQ - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5-chat --size-in-billions 34 --model-format awq --quantization ${quantization} Model Spec 13 (mlx, 6 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 6 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/Yi-1.5-6B-Chat-4bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5-chat --size-in-billions 6 --model-format mlx --quantization ${quantization} Model Spec 14 (mlx, 6 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 6 - 
**Quantizations:** 8bit - **Engines**: MLX - **Model ID:** mlx-community/Yi-1.5-6B-Chat-8bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5-chat --size-in-billions 6 --model-format mlx --quantization ${quantization} Model Spec 15 (mlx, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 9 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/Yi-1.5-9B-Chat-4bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5-chat --size-in-billions 9 --model-format mlx --quantization ${quantization} Model Spec 16 (mlx, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 9 - **Quantizations:** 8bit - **Engines**: MLX - **Model ID:** mlx-community/Yi-1.5-9B-Chat-8bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5-chat --size-in-billions 9 --model-format mlx --quantization ${quantization} Model Spec 17 (mlx, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 34 - **Quantizations:** 4bit - **Engines**: MLX - **Model ID:** mlx-community/Yi-1.5-34B-Chat-4bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch 
--model-engine ${engine} --model-name Yi-1.5-chat --size-in-billions 34 --model-format mlx --quantization ${quantization} Model Spec 18 (mlx, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** mlx - **Model Size (in billions):** 34 - **Quantizations:** 8bit - **Engines**: MLX - **Model ID:** mlx-community/Yi-1.5-34B-Chat-8bit - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5-chat --size-in-billions 34 --model-format mlx --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/yi-1.5.rst ================================================ .. _models_llm_yi-1.5: ======================================== Yi-1.5 ======================================== - **Context Length:** 4096 - **Model Name:** Yi-1.5 - **Languages:** en, zh - **Abilities:** generate - **Description:** Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. 
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 6 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 6 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** 01-ai/Yi-1.5-6B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5 --size-in-billions 6 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 9 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** 01-ai/Yi-1.5-9B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5 --size-in-billions 9 --model-format pytorch --quantization ${quantization} Model Spec 3 (pytorch, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 34 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** 01-ai/Yi-1.5-34B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-1.5 --size-in-billions 34 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/yi-200k.rst ================================================ .. 
_models_llm_yi-200k: ======================================== Yi-200k ======================================== - **Context Length:** 262144 - **Model Name:** Yi-200k - **Languages:** en, zh - **Abilities:** generate - **Description:** The Yi series models are large language models trained from scratch by developers at 01.AI. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (pytorch, 6 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 6 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** 01-ai/Yi-6B-200K - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-200k --size-in-billions 6 --model-format pytorch --quantization ${quantization} Model Spec 2 (pytorch, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 34 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** 01-ai/Yi-34B-200K - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-200k --size-in-billions 34 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/yi-chat.rst ================================================ .. 
_models_llm_yi-chat: ======================================== Yi-chat ======================================== - **Context Length:** 4096 - **Model Name:** Yi-chat - **Languages:** en, zh - **Abilities:** chat - **Description:** The Yi series models are large language models trained from scratch by developers at 01.AI. Specifications ^^^^^^^^^^^^^^ Model Spec 1 (gptq, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** gptq - **Model Size (in billions):** 34 - **Quantizations:** 8bits - **Engines**: vLLM, Transformers, SGLang - **Model ID:** 01-ai/Yi-34B-Chat-{quantization} - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-chat --size-in-billions 34 --model-format gptq --quantization ${quantization} Model Spec 2 (pytorch, 6 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 6 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** 01-ai/Yi-6B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-chat --size-in-billions 6 --model-format pytorch --quantization ${quantization} Model Spec 3 (pytorch, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 34 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** 01-ai/Yi-34B-Chat - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: 
xinference launch --model-engine ${engine} --model-name Yi-chat --size-in-billions 34 --model-format pytorch --quantization ${quantization} Model Spec 4 (ggufv2, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 34 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: vLLM, llama.cpp - **Model ID:** TheBloke/Yi-34B-Chat-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi-chat --size-in-billions 34 --model-format ggufv2 --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/llm/yi.rst ================================================ .. _models_llm_yi: ======================================== Yi ======================================== - **Context Length:** 4096 - **Model Name:** Yi - **Languages:** en, zh - **Abilities:** generate - **Description:** The Yi series models are large language models trained from scratch by developers at 01.AI. 
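The same launch can probably be issued from Python instead of the command line. The sketch below assumes that the installed Xinference version exposes ``xinference.client.Client`` with a ``launch_model`` method taking the keyword names used here, and that a server is already running at the given endpoint. The helper names ``launch_kwargs`` and ``launch`` and the endpoint address are illustrative assumptions, not part of these pages.

```python
def launch_kwargs(model_name, size_in_billions, model_format,
                  engine, quantization):
    """Map the CLI flags onto keyword arguments for ``Client.launch_model``.

    The keyword names are an assumption about the client API; check them
    against the installed Xinference version.
    """
    return {
        "model_name": model_name,
        "model_engine": engine,
        "model_size_in_billions": size_in_billions,
        "model_format": model_format,
        "quantization": quantization,
    }


def launch(endpoint, **kwargs):
    """Launch a model on a running Xinference server and return its UID."""
    # Imported lazily so launch_kwargs works without xinference installed.
    from xinference.client import Client

    client = Client(endpoint)
    return client.launch_model(**kwargs)


# Example: the ggufv2, 34 Billion spec of Yi with Q4_K_M on llama.cpp.
kwargs = launch_kwargs("Yi", 34, "ggufv2", "llama.cpp", "Q4_K_M")
print(kwargs)
# With a server running (the endpoint below is an assumption):
# model_uid = launch("http://127.0.0.1:9997", **kwargs)
```

Keeping the argument mapping separate from the network call makes it possible to inspect or log the arguments before anything is launched.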
Specifications ^^^^^^^^^^^^^^ Model Spec 1 (ggufv2, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** ggufv2 - **Model Size (in billions):** 34 - **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0 - **Engines**: llama.cpp - **Model ID:** TheBloke/Yi-34B-GGUF - **Model Hubs**: `Hugging Face `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi --size-in-billions 34 --model-format ggufv2 --quantization ${quantization} Model Spec 2 (pytorch, 6 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 6 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** 01-ai/Yi-6B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi --size-in-billions 6 --model-format pytorch --quantization ${quantization} Model Spec 3 (pytorch, 9 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 9 - **Quantizations:** none - **Engines**: vLLM, Transformers, SGLang - **Model ID:** 01-ai/Yi-9B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi --size-in-billions 9 --model-format pytorch --quantization ${quantization} Model Spec 4 (pytorch, 34 Billion) ++++++++++++++++++++++++++++++++++++++++ - **Model Format:** pytorch - **Model Size (in billions):** 34 - **Quantizations:** none - 
**Engines**: vLLM, Transformers, SGLang - **Model ID:** 01-ai/Yi-34B - **Model Hubs**: `Hugging Face `__, `ModelScope `__ Execute the following command to launch the model, remember to replace ``${quantization}`` with your chosen quantization method from the options listed above:: xinference launch --model-engine ${engine} --model-name Yi --size-in-billions 34 --model-format pytorch --quantization ${quantization} ================================================ FILE: doc/source/models/builtin/rerank/bce-reranker-base_v1.rst ================================================ .. _models_builtin_bce-reranker-base_v1: ==================== bce-reranker-base_v1 ==================== - **Model Name:** bce-reranker-base_v1 - **Languages:** en, zh - **Abilities:** rerank Specifications ^^^^^^^^^^^^^^ - **Model ID:** maidalun1020/bce-reranker-base_v1 Execute the following command to launch the model:: xinference launch --model-name bce-reranker-base_v1 --model-type rerank ================================================ FILE: doc/source/models/builtin/rerank/bge-reranker-base.rst ================================================ .. _models_builtin_bge-reranker-base: ================= bge-reranker-base ================= - **Model Name:** bge-reranker-base - **Languages:** en, zh - **Abilities:** rerank Specifications ^^^^^^^^^^^^^^ - **Model ID:** BAAI/bge-reranker-base Execute the following command to launch the model:: xinference launch --model-name bge-reranker-base --model-type rerank ================================================ FILE: doc/source/models/builtin/rerank/bge-reranker-large.rst ================================================ .. 
_models_builtin_bge-reranker-large: ================== bge-reranker-large ================== - **Model Name:** bge-reranker-large - **Languages:** en, zh - **Abilities:** rerank Specifications ^^^^^^^^^^^^^^ - **Model ID:** BAAI/bge-reranker-large Execute the following command to launch the model:: xinference launch --model-name bge-reranker-large --model-type rerank ================================================ FILE: doc/source/models/builtin/rerank/bge-reranker-v2-gemma.rst ================================================ .. _models_builtin_bge-reranker-v2-gemma: ===================== bge-reranker-v2-gemma ===================== - **Model Name:** bge-reranker-v2-gemma - **Languages:** en, zh, multilingual - **Abilities:** rerank Specifications ^^^^^^^^^^^^^^ - **Model ID:** BAAI/bge-reranker-v2-gemma Execute the following command to launch the model:: xinference launch --model-name bge-reranker-v2-gemma --model-type rerank ================================================ FILE: doc/source/models/builtin/rerank/bge-reranker-v2-m3.rst ================================================ .. _models_builtin_bge-reranker-v2-m3: ================== bge-reranker-v2-m3 ================== - **Model Name:** bge-reranker-v2-m3 - **Languages:** en, zh, multilingual - **Abilities:** rerank Specifications ^^^^^^^^^^^^^^ - **Model ID:** BAAI/bge-reranker-v2-m3 Execute the following command to launch the model:: xinference launch --model-name bge-reranker-v2-m3 --model-type rerank ================================================ FILE: doc/source/models/builtin/rerank/bge-reranker-v2-minicpm-layerwise.rst ================================================ .. 
_models_builtin_bge-reranker-v2-minicpm-layerwise: ================================= bge-reranker-v2-minicpm-layerwise ================================= - **Model Name:** bge-reranker-v2-minicpm-layerwise - **Languages:** en, zh, multilingual - **Abilities:** rerank Specifications ^^^^^^^^^^^^^^ - **Model ID:** BAAI/bge-reranker-v2-minicpm-layerwise Execute the following command to launch the model:: xinference launch --model-name bge-reranker-v2-minicpm-layerwise --model-type rerank ================================================ FILE: doc/source/models/builtin/rerank/index.rst ================================================ .. _models_rerank_index: ================ Rerank Models ================ The following is a list of built-in rerank models in Xinference: .. toctree:: :maxdepth: 1 bce-reranker-base_v1 bge-reranker-base bge-reranker-large bge-reranker-v2-gemma bge-reranker-v2-m3 bge-reranker-v2-minicpm-layerwise jina-reranker-v2 jina-reranker-v3 minicpm-reranker qwen3-reranker-0.6b qwen3-reranker-4b qwen3-reranker-8b qwen3-vl-reranker-2b qwen3-vl-reranker-8b ================================================ FILE: doc/source/models/builtin/rerank/jina-reranker-v2.rst ================================================ .. _models_builtin_jina-reranker-v2: ================ jina-reranker-v2 ================ - **Model Name:** jina-reranker-v2 - **Languages:** en, zh, multilingual - **Abilities:** rerank Specifications ^^^^^^^^^^^^^^ - **Model ID:** jinaai/jina-reranker-v2-base-multilingual Execute the following command to launch the model:: xinference launch --model-name jina-reranker-v2 --model-type rerank ================================================ FILE: doc/source/models/builtin/rerank/jina-reranker-v3.rst ================================================ .. 
_models_builtin_jina-reranker-v3: ================ jina-reranker-v3 ================ - **Model Name:** jina-reranker-v3 - **Languages:** en, zh, multilingual - **Abilities:** rerank Specifications ^^^^^^^^^^^^^^ - **Model ID:** jinaai/jina-reranker-v3 Execute the following command to launch the model:: xinference launch --model-name jina-reranker-v3 --model-type rerank ================================================ FILE: doc/source/models/builtin/rerank/minicpm-reranker.rst ================================================ .. _models_builtin_minicpm-reranker: ================ minicpm-reranker ================ - **Model Name:** minicpm-reranker - **Languages:** en, zh - **Abilities:** rerank Specifications ^^^^^^^^^^^^^^ - **Model ID:** openbmb/MiniCPM-Reranker Execute the following command to launch the model:: xinference launch --model-name minicpm-reranker --model-type rerank ================================================ FILE: doc/source/models/builtin/rerank/qwen3-reranker-0.6b.rst ================================================ .. _models_builtin_qwen3-reranker-0.6b: =================== Qwen3-Reranker-0.6B =================== - **Model Name:** Qwen3-Reranker-0.6B - **Languages:** en, zh - **Abilities:** rerank Specifications ^^^^^^^^^^^^^^ - **Model ID:** Qwen/Qwen3-Reranker-0.6B Execute the following command to launch the model:: xinference launch --model-name Qwen3-Reranker-0.6B --model-type rerank ================================================ FILE: doc/source/models/builtin/rerank/qwen3-reranker-4b.rst ================================================ .. 
_models_builtin_qwen3-reranker-4b: ================= Qwen3-Reranker-4B ================= - **Model Name:** Qwen3-Reranker-4B - **Languages:** en, zh - **Abilities:** rerank Specifications ^^^^^^^^^^^^^^ - **Model ID:** Qwen/Qwen3-Reranker-4B Execute the following command to launch the model:: xinference launch --model-name Qwen3-Reranker-4B --model-type rerank ================================================ FILE: doc/source/models/builtin/rerank/qwen3-reranker-8b.rst ================================================ .. _models_builtin_qwen3-reranker-8b: ================= Qwen3-Reranker-8B ================= - **Model Name:** Qwen3-Reranker-8B - **Languages:** en, zh - **Abilities:** rerank Specifications ^^^^^^^^^^^^^^ - **Model ID:** Qwen/Qwen3-Reranker-8B Execute the following command to launch the model:: xinference launch --model-name Qwen3-Reranker-8B --model-type rerank ================================================ FILE: doc/source/models/builtin/rerank/qwen3-vl-reranker-2b.rst ================================================ .. _models_builtin_qwen3-vl-reranker-2b: ==================== Qwen3-VL-Reranker-2B ==================== - **Model Name:** Qwen3-VL-Reranker-2B - **Languages:** en, zh - **Abilities:** rerank Specifications ^^^^^^^^^^^^^^ - **Model ID:** Qwen/Qwen3-VL-Reranker-2B Execute the following command to launch the model:: xinference launch --model-name Qwen3-VL-Reranker-2B --model-type rerank ================================================ FILE: doc/source/models/builtin/rerank/qwen3-vl-reranker-8b.rst ================================================ .. 
_models_builtin_qwen3-vl-reranker-8b: ==================== Qwen3-VL-Reranker-8B ==================== - **Model Name:** Qwen3-VL-Reranker-8B - **Languages:** en, zh - **Abilities:** rerank Specifications ^^^^^^^^^^^^^^ - **Model ID:** Qwen/Qwen3-VL-Reranker-8B Execute the following command to launch the model:: xinference launch --model-name Qwen3-VL-Reranker-8B --model-type rerank ================================================ FILE: doc/source/models/builtin/video/cogvideox-2b.rst ================================================ .. _models_builtin_cogvideox-2b: ============ CogVideoX-2b ============ - **Model Name:** CogVideoX-2b - **Model Family:** CogVideoX - **Abilities:** text2video Specifications ^^^^^^^^^^^^^^ - **Model ID:** THUDM/CogVideoX-2b Execute the following command to launch the model:: xinference launch --model-name CogVideoX-2b --model-type video ================================================ FILE: doc/source/models/builtin/video/cogvideox-5b.rst ================================================ .. _models_builtin_cogvideox-5b: ============ CogVideoX-5b ============ - **Model Name:** CogVideoX-5b - **Model Family:** CogVideoX - **Abilities:** text2video Specifications ^^^^^^^^^^^^^^ - **Model ID:** THUDM/CogVideoX-5b Execute the following command to launch the model:: xinference launch --model-name CogVideoX-5b --model-type video ================================================ FILE: doc/source/models/builtin/video/hunyuanvideo.rst ================================================ .. 
_models_builtin_hunyuanvideo: ============ HunyuanVideo ============ - **Model Name:** HunyuanVideo - **Model Family:** HunyuanVideo - **Abilities:** text2video Specifications ^^^^^^^^^^^^^^ - **Model ID:** hunyuanvideo-community/HunyuanVideo Execute the following command to launch the model:: xinference launch --model-name HunyuanVideo --model-type video ================================================ FILE: doc/source/models/builtin/video/index.rst ================================================ .. _models_video_index: ================ Video Models ================ The following is a list of built-in video models in Xinference: .. toctree:: :maxdepth: 1 cogvideox-2b cogvideox-5b hunyuanvideo wan2.1-1.3b wan2.1-14b wan2.1-flf2v-14b-720p wan2.1-i2v-14b-480p wan2.1-i2v-14b-720p wan2.2-a14b wan2.2-i2v-a14b wan2.2-ti2v-5b ================================================ FILE: doc/source/models/builtin/video/wan2.1-1.3b.rst ================================================ .. _models_builtin_wan2.1-1.3b: =========== Wan2.1-1.3B =========== - **Model Name:** Wan2.1-1.3B - **Model Family:** Wan - **Abilities:** text2video Specifications ^^^^^^^^^^^^^^ - **Model ID:** Wan-AI/Wan2.1-T2V-1.3B-Diffusers Execute the following command to launch the model:: xinference launch --model-name Wan2.1-1.3B --model-type video ================================================ FILE: doc/source/models/builtin/video/wan2.1-14b.rst ================================================ .. _models_builtin_wan2.1-14b: ========== Wan2.1-14B ========== - **Model Name:** Wan2.1-14B - **Model Family:** Wan - **Abilities:** text2video Specifications ^^^^^^^^^^^^^^ - **Model ID:** Wan-AI/Wan2.1-T2V-14B-Diffusers Execute the following command to launch the model:: xinference launch --model-name Wan2.1-14B --model-type video ================================================ FILE: doc/source/models/builtin/video/wan2.1-flf2v-14b-720p.rst ================================================ .. 
_models_builtin_wan2.1-flf2v-14b-720p: ===================== Wan2.1-flf2v-14B-720p ===================== - **Model Name:** Wan2.1-flf2v-14B-720p - **Model Family:** Wan - **Abilities:** firstlastframe2video Specifications ^^^^^^^^^^^^^^ - **Model ID:** Wan-AI/Wan2.1-FLF2V-14B-720P-diffusers Execute the following command to launch the model:: xinference launch --model-name Wan2.1-flf2v-14B-720p --model-type video ================================================ FILE: doc/source/models/builtin/video/wan2.1-i2v-14b-480p.rst ================================================ .. _models_builtin_wan2.1-i2v-14b-480p: =================== Wan2.1-i2v-14B-480p =================== - **Model Name:** Wan2.1-i2v-14B-480p - **Model Family:** Wan - **Abilities:** image2video Specifications ^^^^^^^^^^^^^^ - **Model ID:** Wan-AI/Wan2.1-I2V-14B-480P-Diffusers Execute the following command to launch the model:: xinference launch --model-name Wan2.1-i2v-14B-480p --model-type video ================================================ FILE: doc/source/models/builtin/video/wan2.1-i2v-14b-720p.rst ================================================ .. _models_builtin_wan2.1-i2v-14b-720p: =================== Wan2.1-i2v-14B-720p =================== - **Model Name:** Wan2.1-i2v-14B-720p - **Model Family:** Wan - **Abilities:** image2video Specifications ^^^^^^^^^^^^^^ - **Model ID:** Wan-AI/Wan2.1-I2V-14B-720P-Diffusers Execute the following command to launch the model:: xinference launch --model-name Wan2.1-i2v-14B-720p --model-type video ================================================ FILE: doc/source/models/builtin/video/wan2.2-a14b.rst ================================================ .. 
_models_builtin_wan2.2-a14b: =========== Wan2.2-A14B =========== - **Model Name:** Wan2.2-A14B - **Model Family:** Wan - **Abilities:** text2video Specifications ^^^^^^^^^^^^^^ - **Model ID:** Wan-AI/Wan2.2-T2V-A14B-Diffusers Execute the following command to launch the model:: xinference launch --model-name Wan2.2-A14B --model-type video ================================================ FILE: doc/source/models/builtin/video/wan2.2-i2v-a14b.rst ================================================ .. _models_builtin_wan2.2-i2v-a14b: =============== Wan2.2-i2v-A14B =============== - **Model Name:** Wan2.2-i2v-A14B - **Model Family:** Wan - **Abilities:** image2video Specifications ^^^^^^^^^^^^^^ - **Model ID:** Wan-AI/Wan2.2-I2V-A14B-Diffusers Execute the following command to launch the model:: xinference launch --model-name Wan2.2-i2v-A14B --model-type video ================================================ FILE: doc/source/models/builtin/video/wan2.2-ti2v-5b.rst ================================================ .. _models_builtin_wan2.2-ti2v-5b: ============== Wan2.2-ti2v-5B ============== - **Model Name:** Wan2.2-ti2v-5B - **Model Family:** Wan - **Abilities:** text2video, image2video Specifications ^^^^^^^^^^^^^^ - **Model ID:** Wan-AI/Wan2.2-TI2V-5B-Diffusers Execute the following command to launch the model:: xinference launch --model-name Wan2.2-ti2v-5B --model-type video ================================================ FILE: doc/source/models/custom.rst ================================================ .. _models_custom: ============= Custom Models ============= Xinference provides a flexible and comprehensive way to integrate, manage, and utilize custom models. Directly launch an existing model ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Since ``v0.14.0``, you can directly launch an existing model by passing ``model_path`` to the launch interface without downloading it. 
This requires the model's ``model_family`` to be one of the built-in supported families, and it spares you from registering the model. For example: .. tabs:: .. code-tab:: bash shell xinference launch --model-path --model-engine -n qwen1.5-chat .. code-tab:: bash cURL curl -X 'POST' \ 'http://127.0.0.1:9997/v1/models' \ -H 'accept: application/json' \ -H 'Content-Type: application/json' \ -d '{ "model_engine": "", "model_name": "qwen1.5-chat", "model_path": "" }' .. code-tab:: python from xinference.client import RESTfulClient client = RESTfulClient("http://127.0.0.1:9997") model_uid = client.launch_model( model_engine="", model_name="qwen1.5-chat", model_path="" ) print('Model uid: ' + model_uid) The above example demonstrates how to directly launch a qwen1.5-chat model file without registering it. For distributed scenarios, if your model file is on a specific worker, you can directly launch it using the ``worker_ip`` and ``model_path`` parameters with the launch interface. .. note:: For CLI usage, prefer ``--model-path`` (kebab-case). ``--model_path`` is legacy-compatible but not recommended. Define a custom model ~~~~~~~~~~~~~~~~~~~~~~~~~ Web UI: Automatic LLM Config Parsing ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. versionadded:: v2.0.0 When registering a custom LLM via the Web UI, Xinference can automatically parse the model configuration and pre-fill key fields for you. You only need to provide: - **Model path / Model ID** (where the model lives, local path or hub ID) - **Model Family** After parsing, the UI can auto-populate fields such as: - ``Context Length`` - ``Model_Languages`` - ``Model_Abilities`` - ``Model_Specs`` You can review and edit these fields before saving the custom model. Define a custom model based on the following templates: .. tabs:: .. tab:: LLM ..
code-block:: json { "version": 2, "context_length": 32768, "model_name": "custom-qwen-2.5", "model_lang": [ "en", "zh" ], "model_ability": [ "generate" ], "model_description": "This is a custom model description.", "model_family": "my-custom-qwen-2.5", "model_specs": [ { "model_format": "pytorch", "model_size_in_billions": "0_5", "quantization": "none", "model_id": null, "model_hub": "huggingface", "model_uri": "file:///path/to/models--Qwen--Qwen2.5-0.5B", "model_revision": null, "activated_size_in_billions": null } ], "chat_template": null, "stop_token_ids": null, "stop": null, "reasoning_start_tag": null, "reasoning_end_tag": null, "cache_config": null, "virtualenv": { "packages": [], "inherit_pip_config": true, "index_url": null, "extra_index_url": null, "find_links": null, "trusted_host": null, "no_build_isolation": null }, "is_builtin": false } .. tab:: embedding .. code-block:: json { "version": 2, "model_name": "my-bge-large-zh-v1.5", "dimensions": 1024, "max_tokens": 512, "language": [ "zh" ], "model_specs": [ { "model_format": "pytorch", "model_hub": "huggingface", "model_id": null, "model_uri": "file:///path/to/my-bge-large-zh-v1.5", "model_revision": null, "quantization": "none" } ], "cache_config": null, "virtualenv": { "packages": [], "inherit_pip_config": true, "index_url": null, "extra_index_url": null, "find_links": null, "trusted_host": null, "no_build_isolation": null }, "is_builtin": false } .. tab:: Rerank .. code-block:: json { "version": 2, "model_name": "my-bge-reranker-base", "model_specs": [ { "model_format": "pytorch", "model_hub": "huggingface", "model_id": null, "model_revision": null, "model_uri": "file:///path/to/my-bge-reranker-base", "quantization": "none" } ], "language": [ "en", "zh" ], "type": "unknown", "max_tokens": 512, "virtualenv": { "packages": [], "inherit_pip_config": true, "index_url": null, "extra_index_url": null, "find_links": null, "trusted_host": null, "no_build_isolation": null }, "is_builtin": false } .. 
tab:: image .. code-block:: json { "model_name": "my-qwen-image", "model_id": null, "model_revision": null, "model_hub": "huggingface", "cache_config": null, "version": 2, "model_family": "stable_diffusion", "model_ability": null, "controlnet": [], "default_model_config": {}, "default_generate_config": {}, "gguf_model_id": null, "gguf_quantizations": null, "gguf_model_file_name_template": null, "lightning_model_id": null, "lightning_versions": null, "lightning_model_file_name_template": null, "virtualenv": { "packages": [], "inherit_pip_config": true, "index_url": null, "extra_index_url": null, "find_links": null, "trusted_host": null, "no_build_isolation": null }, "model_uri": "file:///path/to/my-qwen-image", "is_builtin": false } .. tab:: audio .. code-block:: json { "model_name": "my-ChatTTS", "model_id": null, "model_revision": null, "model_hub": "huggingface", "cache_config": null, "version": 2, "model_family": "ChatTTS", "multilingual": false, "language": null, "model_ability": [ "text2audio" ], "default_model_config": null, "default_transcription_config": null, "engine": null, "virtualenv": { "packages": [], "inherit_pip_config": true, "index_url": null, "extra_index_url": null, "find_links": null, "trusted_host": null, "no_build_isolation": null }, "model_uri": "file:///path/to/my-ChatTTS", "is_builtin": false } .. tab:: flexible .. code-block:: json { "model_name": "my-flexible-model", "model_id": null, "model_revision": null, "model_hub": "huggingface", "cache_config": null, "version": 2, "model_description": "This is a model description.", "model_uri": "file:///path/to/my-flexible-model", "launcher": "xinference.model.flexible.launchers.transformers", "launcher_args": "{}", "virtualenv": { "packages": [], "inherit_pip_config": true, "index_url": null, "extra_index_url": null, "find_links": null, "trusted_host": null, "no_build_isolation": null }, "is_builtin": false } * model_name: A string defining the name of the model. 
The name must start with a letter or a digit and can only contain letters, digits, underscores, or dashes. * context_length: An optional integer that specifies the maximum context size the model was trained to accommodate, encompassing both the input and output lengths. If not defined, the default value is 2048 tokens (~1,500 words). * dimensions: An integer defining the size of the vector output by the embedding model. * max_tokens: An integer defining the maximum number of input tokens the embedding model can process in a single request. * model_lang: A list of strings representing the supported languages for the model. Example: ["en"], which means that the model supports English. * model_ability: A list of strings defining the abilities of the model. It could include options like "embed", "generate", and "chat". In this case, the model has the ability to "generate". * model_family: A required string representing the family of the model you want to register. This parameter must not conflict with any builtin model names. * model_specs: An array of objects defining the specifications of the model. These include: * model_format: A string that defines the model format, like "pytorch" or "ggufv2". * model_size_in_billions: An integer defining the size of the model in billions of parameters. * quantizations: A list of strings defining the available quantizations for the model. For PyTorch models, it could be "4-bit", "8-bit", or "none". For ggufv2 models, the quantizations should correspond to values that work with the ``model_file_name_template``. Some engines also support ``fp4`` / ``fp8`` / ``bnb`` formats (see :ref:`installation` for backend support details). * model_id: A string representing the model ID, possibly referring to an identifier used by Hugging Face. **If model_uri is missing, Xinference will try to download the model from the Hugging Face repository specified here.**
* model_hub: A string representing where to download the model from, like "huggingface" or "modelscope". * model_uri: A string representing the URI where the model can be loaded from, such as "file:///path/to/llama-2-7b". **When the model format is ggufv2, model_uri must be the specific file path. When the model format is pytorch, model_uri must be the path to the directory containing the model files.** If ``model_uri`` is absent, Xinference will try to download the model from Hugging Face using the model ID. * model_revision: A string representing the specific version or commit hash of the model files to use from the repository. * chat_template: If ``model_ability`` includes ``chat``, you must configure this option to generate the correct full prompt during chat. This is a Jinja template string. Usually, you can find it in the ``tokenizer_config.json`` file within the model directory. * stop_token_ids: If ``model_ability`` includes ``chat``, you can configure this option to control when the model stops during chat. This is a list of integers, and you can typically extract the corresponding values from the ``generation_config.json`` or ``tokenizer_config.json`` file in the model directory. * stop: If ``model_ability`` includes ``chat``, you can configure this option to control when the model stops during chat. This is a list of strings, and you can typically extract the corresponding values from the ``generation_config.json`` or ``tokenizer_config.json`` file in the model directory. * reasoning_start_tag: A special token or prompt used to explicitly instruct the LLM to begin its chain-of-thought or reasoning process in its output. * reasoning_end_tag: A special token or prompt used to explicitly mark the end of the model's chain-of-thought or reasoning process in its output. * cache_config: A string representing the parameters and rules for how the system stores and manages temporary data (cache). * virtualenv: A settings object for model dependency isolation.
Please refer to :ref:`this document ` for details. Register a Custom Model ~~~~~~~~~~~~~~~~~~~~~~~ Register a custom model programmatically: .. code-block:: python import json from xinference.client import Client with open('model.json') as fd: model = fd.read() # replace with real xinference endpoint endpoint = 'http://localhost:9997' client = Client(endpoint) client.register_model(model_type="", model=model, persist=False) Or via CLI: .. code-block:: bash xinference register --model-type --file model.json --persist Replace the model type placeholder ```` above with ``LLM``, ``embedding`` or ``rerank``. The same applies to the commands below. List the Built-in and Custom Models ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ List built-in and custom models programmatically: .. code-block:: python registrations = client.list_model_registrations(model_type="") Or via CLI: .. code-block:: bash xinference registrations --model-type Launch the Custom Model ~~~~~~~~~~~~~~~~~~~~~~~ Launch the custom model programmatically: .. code-block:: python uid = client.launch_model(model_name='custom-llama-2', model_format='pytorch') Or via CLI: .. code-block:: bash xinference launch --model-name custom-llama-2 --model-format pytorch Interact with the Custom Model ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Invoke the model programmatically: .. code-block:: python model = client.get_model(model_uid=uid) model.generate('What is the largest animal in the world?') Result: .. code-block:: json { "id":"cmpl-a4a9d9fc-7703-4a44-82af-fce9e3c0e52a", "object":"text_completion", "created":1692024624, "model":"43e1f69a-3ab0-11ee-8f69-fa163e74fa2d", "choices":[ { "text":"\nWhat does an octopus look like?\nHow many human hours has an octopus been watching you for?", "index":0, "logprobs":"None", "finish_reason":"stop" } ], "usage":{ "prompt_tokens":10, "completion_tokens":23, "total_tokens":33 } } Or via CLI, replacing ``${UID}`` with the real model UID: ..
code-block:: bash xinference generate --model-uid ${UID} Unregister the Custom Model ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Unregister the custom model programmatically: .. code-block:: python model = client.unregister_model(model_type="", model_name='custom-llama-2') Or via CLI: .. code-block:: bash xinference unregister --model-type --model-name custom-llama-2 ================================================ FILE: doc/source/models/index.rst ================================================ .. _models_index: ====== Models ====== List Models ============================ You can list all models of a certain type that are available to launch in Xinference: .. tabs:: .. code-tab:: bash shell xinference registrations --model-type \ [--endpoint "http://:"] \ .. code-tab:: bash cURL curl http://:/v1/model_registrations/ .. code-tab:: python from xinference.client import Client client = Client("http://:") print(client.list_model_registrations(model_type='')) The following ``MODEL_TYPE`` is supported by Xinference: .. grid:: 2 .. grid-item-card:: LLM :link: models_llm_index :link-type: ref Text generation models or large language models .. grid-item-card:: embedding :link: models_embedding_index :link-type: ref Text embeddings models .. grid:: 2 .. grid-item-card:: image :link: models_image_index :link-type: ref Image generation or manipulation models .. grid-item-card:: audio :link: models_audio_index :link-type: ref Audio models .. grid:: 2 .. grid-item-card:: rerank :link: models_rerank_index :link-type: ref Rerank models .. grid-item-card:: video :link: models_video_index :link-type: ref Video models .. grid:: 2 .. grid-item-card:: flexible :link: flexible :link-type: ref Flexible models (Traditional ML Models) You can see all the built-in models supported by xinference :ref:`here `. If the model you need is not available, Xinference also allows you to register your own :ref:`custom models `. 
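The model types listed above can be checked locally before a request is sent. The following is a minimal sketch, not part of the Xinference client: the helper name ``registrations_url`` is hypothetical, and it assumes that the model type is appended as the last path segment of ``/v1/model_registrations/``, as the cURL tab above suggests.

```python
from urllib.parse import quote

# Model types named on this page; anything else is rejected before a request is built.
SUPPORTED_MODEL_TYPES = (
    "LLM", "embedding", "image", "audio", "rerank", "video", "flexible",
)


def registrations_url(endpoint: str, model_type: str) -> str:
    """Build the URL used to list registrations of one model type.

    Assumption: the model type is the final path segment of
    /v1/model_registrations/ on the given endpoint.
    """
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"unsupported model type: {model_type!r}")
    # Strip a trailing slash so the joined URL never contains "//v1".
    return f"{endpoint.rstrip('/')}/v1/model_registrations/{quote(model_type)}"


if __name__ == "__main__":
    # The endpoint below is only an example address.
    for model_type in SUPPORTED_MODEL_TYPES:
        print(registrations_url("http://127.0.0.1:9997", model_type))
```

Rejecting an unknown type locally gives a clearer error than a failed HTTP call; the resulting URL can then be passed to ``curl`` or any HTTP client.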
Launch and Terminate Model ============================ Each running model instance is assigned a unique model UID. By default, the model UID equals the model name. This unique ID serves as a handle for later use. You can assign it manually by passing the ``--model-uid`` option to the launch command. You can launch a model in Xinference either via command line or Xinference's Python client: .. tabs:: .. code-tab:: bash shell xinference launch --model-name \ [--model-engine ] \ [--model-type ] \ [--model-uid ] \ [--endpoint "http://:"] .. code-tab:: python from xinference.client import Client client = Client("http://:") model_uid = client.launch_model( model_name="", model_engine="", model_type="", model_uid="" ) print(model_uid) For model type ``LLM``, launching the model requires specifying not only the model name, but also the parameter size, the model format and the model engine. Please refer to the list of LLM :ref:`model families `. The following command gives you the currently running models in Xinference: .. tabs:: .. code-tab:: bash shell xinference list [--endpoint "http://:"] .. code-tab:: bash cURL curl http://:/v1/models .. code-tab:: python from xinference.client import Client client = Client("http://:") print(client.list_models()) When you no longer need a model that is currently running, you can remove it in the following way to free up the resources it occupies: .. tabs:: .. code-tab:: bash shell xinference terminate --model-uid "" [--endpoint "http://:"] .. code-tab:: bash cURL curl -X DELETE http://:/v1/models/ .. code-tab:: python from xinference.client import Client client = Client("http://:") client.terminate_model(model_uid="") .. note:: For models that are no longer maintained and depend on outdated libraries (such as ``transformers``), we recommend enabling the :ref:`Model Virtual Environment ` feature to ensure they can run properly in a compatible environment. Model Usage ============================ ..
grid:: 2 .. grid-item-card:: Chat & Generate :link: chat :link-type: ref Learn how to chat with LLMs in Xinference. .. grid-item-card:: Tools :link: tools :link-type: ref Learn how to connect LLMs with external tools. .. grid:: 2 .. grid-item-card:: Embeddings :link: embed :link-type: ref Learn how to create text embeddings in Xinference. .. grid-item-card:: Rerank :link: rerank :link-type: ref Learn how to use rerank models in Xinference. .. grid:: 2 .. grid-item-card:: Images :link: image :link-type: ref Learn how to generate images with Xinference. .. grid-item-card:: Multimodal :link: multimodal :link-type: ref Learn how to process images and audio with LLMs. .. grid:: 2 .. grid-item-card:: Audio :link: audio :link-type: ref Learn how to turn audio into text or text into audio with Xinference. .. grid-item-card:: Video :link: video :link-type: ref Learn how to generate video with Xinference. .. grid:: 2 .. grid-item-card:: flexible :link: flexible :link-type: ref Learn how to run inference on traditional ML models with Xinference. .. toctree:: :maxdepth: 2 xinference_models_hub model_abilities/index builtin/index custom model_update sources/sources virtualenv lora model_memory ================================================ FILE: doc/source/models/lora.rst ================================================ .. _lora: ================ LoRA Integration ================ Currently, Xinference supports launching ``LLM`` and ``image`` models with an attached LoRA fine-tuned model. Usage ##### Launch ====== Unlike built-in models, LoRA models are not currently managed by Xinference. Users need to first download the LoRA model themselves and then provide the storage path of the model files to Xinference. .. tabs:: .. code-tab:: bash shell xinference launch --lora-modules --lora-modules --image-lora-load-kwargs --image-lora-load-kwargs --image-lora-fuse-kwargs --image-lora-fuse-kwargs ..
code-tab:: python from xinference.client import Client client = Client("http://:") lora_model1={'lora_name': , 'local_path': } lora_model2={'lora_name': , 'local_path': } lora_models=[lora_model1, lora_model2] image_lora_load_kwargs={'': , '': } image_lora_fuse_kwargs={'': , '': } peft_model_config = { "image_lora_load_kwargs": image_lora_load_kwargs, "image_lora_fuse_kwargs": image_lora_fuse_kwargs, "lora_list": lora_models } client.launch_model( , peft_model_config=peft_model_config ) Apply ===== For LLM models, only one LoRA model can be applied per request. Select it by setting the ``lora_name`` parameter in the ``generate_config``. ``lora_name`` corresponds to the name given to the LoRA in the Launch step described above. .. tabs:: .. code-tab:: python from xinference.client import Client client = Client("http://:") model = client.get_model("") model.chat( messages=[{"role": "user", "content": ""}], generate_config={"lora_name": ""} ) Note #### * The options ``image_lora_load_kwargs`` and ``image_lora_fuse_kwargs`` are only applicable to models with model_type ``image``. They correspond to the parameters in the ``load_lora_weights`` and ``fuse_lora`` interfaces of the ``diffusers`` library. If launching an LLM model, these parameters are not required. * You need to add the parameter ``lora_name`` during inference to specify the corresponding LoRA model. You can specify it in the Additional Inputs option. * For LLM chat models, only LoRA models that do not change the prompt style are currently supported. * When using GPU, both the LoRA and its base model occupy the same devices. ================================================ FILE: doc/source/models/model_abilities/audio.rst ================================================ .. _audio: ===== Audio ===== Learn how to turn audio into text or text into audio with Xinference.
Introduction ================== The Audio API provides three methods for interacting with audio: * The transcriptions endpoint transcribes audio into the input language. * The translations endpoint translates audio into English. * The speech endpoint generates audio from the input text. .. list-table:: :widths: 25 50 :header-rows: 1 * - API ENDPOINT - OpenAI-compatible ENDPOINT * - Transcription API - /v1/audio/transcriptions * - Translation API - /v1/audio/translations * - Speech API - /v1/audio/speech Supported models ------------------- The audio API is supported with the following models in Xinference: Audio to text ~~~~~~~~~~~~~ * :ref:`whisper-tiny ` * :ref:`whisper-tiny.en ` * :ref:`whisper-base ` * :ref:`whisper-base.en ` * :ref:`whisper-medium ` * :ref:`whisper-medium.en ` * :ref:`whisper-large-v3 ` * :ref:`whisper-large-v3-turbo ` * :ref:`Belle-distilwhisper-large-v2-zh ` * :ref:`Belle-whisper-large-v2-zh ` * :ref:`Belle-whisper-large-v3-zh ` * :ref:`SenseVoiceSmall ` * :ref:`Paraformer-zh ` For Mac M-series chips only: * :ref:`whisper-tiny-mlx ` * :ref:`whisper-tiny.en-mlx ` * :ref:`whisper-base-mlx ` * :ref:`whisper-base.en-mlx ` * :ref:`whisper-medium-mlx ` * :ref:`whisper-medium.en-mlx ` * :ref:`whisper-large-v3-mlx ` * :ref:`whisper-large-v3-turbo-mlx ` Text to audio (TTS) ~~~~~~~~~~~~~~~~~~~~~ **Models supporting zero-shot** (direct synthesis without reference audio): * :ref:`ChatTTS ` * :ref:`CosyVoice-300M-SFT ` * :ref:`CosyVoice-300M-Instruct ` * MeloTTS series * :ref:`Kokoro-82M ` * :ref:`Kokoro-82M-MLX ` * :ref:`MegaTTS3 ` **Models supporting voice cloning** (requires reference audio): * :ref:`CosyVoice-300M ` * :ref:`CosyVoice 2.0 ` * :ref:`FishSpeech-1.5 ` * :ref:`F5-TTS ` * :ref:`F5-TTS-MLX ` * :ref:`IndexTTS2 ` **Models supporting emotion control**: * :ref:`IndexTTS2 ` For Mac M-series chips only: * :ref:`F5-TTS-MLX ` * :ref:`Kokoro-82M-MLX ` Quickstart =================== Transcription -------------------- The Transcription API mimics 
OpenAI's `create transcriptions API `_. We can try Transcription API out either via cURL, OpenAI Client, or Xinference's python client: .. tabs:: .. code-tab:: bash cURL curl -X 'POST' \ 'http://:/v1/audio/transcriptions' \ -H 'accept: application/json' \ -H 'Content-Type: application/json' \ -d '{ "model": "", "file": "