Repository: QwenLM/Qwen3
Branch: main
Commit: 7a2f61ffc7a2
Files: 101
Total size: 27.6 MB

Directory structure:
gitextract_9etz2yip/

├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.yml
│   │   └── config.yml
│   └── workflows/
│       └── inactive.yml
├── .gitignore
├── .readthedocs.yaml
├── README.md
├── docker/
│   ├── Dockerfile-cu121
│   ├── docker_cli_demo.sh
│   └── docker_web_demo.sh
├── docs/
│   ├── Makefile
│   ├── README.md
│   ├── locales/
│   │   └── zh_CN/
│   │       └── LC_MESSAGES/
│   │           ├── deployment/
│   │           │   ├── dstack.po
│   │           │   ├── openllm.po
│   │           │   ├── sglang.po
│   │           │   ├── skypilot.po
│   │           │   ├── tgi.po
│   │           │   └── vllm.po
│   │           ├── framework/
│   │           │   ├── Langchain.po
│   │           │   ├── LlamaIndex.po
│   │           │   ├── function_call.po
│   │           │   └── qwen_agent.po
│   │           ├── getting_started/
│   │           │   ├── concepts.po
│   │           │   ├── quantization_benchmark.po
│   │           │   ├── quickstart.po
│   │           │   ├── speed_benchmark.po
│   │           │   └── thinking_budget.po
│   │           ├── index.po
│   │           ├── inference/
│   │           │   └── transformers.po
│   │           ├── quantization/
│   │           │   ├── awq.po
│   │           │   ├── gptq.po
│   │           │   └── llama.cpp.po
│   │           ├── run_locally/
│   │           │   ├── llama.cpp.po
│   │           │   ├── mlx-lm.po
│   │           │   └── ollama.po
│   │           └── training/
│   │               ├── axolotl.po
│   │               ├── llama_factory.po
│   │               ├── ms_swift.po
│   │               ├── unsloth.po
│   │               └── verl.po
│   ├── make.bat
│   ├── requirements-docs.txt
│   └── source/
│       ├── _static/
│       │   ├── css/
│       │   │   └── custom.css
│       │   └── design-tabs.js
│       ├── assets/
│       │   └── qwen3_nonthinking.jinja
│       ├── conf.py
│       ├── deployment/
│       │   ├── dstack.rst
│       │   ├── openllm.rst
│       │   ├── sglang.md
│       │   ├── skypilot.rst
│       │   ├── tgi.rst
│       │   └── vllm.md
│       ├── framework/
│       │   ├── Langchain.rst
│       │   ├── LlamaIndex.rst
│       │   ├── function_call.md
│       │   └── qwen_agent.rst
│       ├── getting_started/
│       │   ├── concepts.md
│       │   ├── quantization_benchmark.rst
│       │   ├── quickstart.md
│       │   ├── speed_benchmark.md
│       │   └── thinking_budget.md
│       ├── index.rst
│       ├── inference/
│       │   └── transformers.md
│       ├── quantization/
│       │   ├── awq.md
│       │   ├── gptq.md
│       │   └── llama.cpp.md
│       ├── run_locally/
│       │   ├── llama.cpp.md
│       │   ├── lmstudio.md
│       │   ├── mlx-lm.md
│       │   └── ollama.md
│       └── training/
│           ├── axolotl.md
│           ├── llama_factory.md
│           ├── ms_swift.md
│           ├── unsloth.md
│           └── verl.md
├── eval/
│   ├── README.md
│   ├── configs/
│   │   └── ARCAGI-Qwen3-235B-A22B-Instruct-2507.yaml
│   ├── data/
│   │   └── arc_agi_1.jsonl
│   ├── eval/
│   │   ├── arc_agi_1.py
│   │   └── eval.py
│   ├── eval_res/
│   │   └── ARCAGI-Qwen3-235B-A22B-Instruct-2507_eval_result.txt
│   ├── generate_api_answers/
│   │   ├── infer_multithread.py
│   │   └── utils_vllm.py
│   ├── output/
│   │   ├── ARCAGI-Qwen3-235B-A22B-Instruct-2507.jsonl
│   │   └── ARCAGI-Qwen3-235B-A22B-Instruct-2507_details.jsonl
│   └── requirements.txt
└── examples/
    ├── README.md
    ├── demo/
    │   ├── cli_demo.py
    │   └── web_demo.py
    ├── gcu-support/
    │   ├── README.md
    │   └── gcu_demo.py
    ├── llama-factory/
    │   ├── finetune-zh.md
    │   ├── qwen2-7b-full-sft.yaml
    │   ├── qwen2-7b-lora-sft.yaml
    │   ├── qwen2-7b-merge-lora.yaml
    │   └── qwen2-7b-qlora-sft.yaml
    └── speed-benchmark/
        ├── README.md
        ├── README_zh.md
        ├── requirements-perf-transformers.txt
        ├── requirements-perf-vllm.txt
        ├── speed_benchmark_transformers.py
        └── speed_benchmark_vllm.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.yml
================================================
name: 🐞 Bug Report
description: Something unexpected happened, errors or badcases
body:
  - type: markdown
    attributes:
      value: |
        **We appreciate your time and effort in filing this report. ❤**

        Issues are a vital part of open-source collaboration. They help identify and resolve real-world problems, improve the project for everyone, and create a transparent, well-documented knowledge base that benefits the entire community. **By reporting an issue, you're not just solving a problem for yourself --- you're helping others who may encounter the same challenge.**

        To ensure your issue is addressed efficiently and effectively, please follow these guidelines when submitting:

        1.  **Use English**  
            Please write your issue in English. This ensures the widest possible audience can understand and contribute. Most translation tools also work best with English, making your report more accessible globally.

        2.  **Choose a clear and descriptive title**  
            Your title should summarize the issue clearly and include relevant details such as the model ID (e.g., `Qwen3-8B`, `Qwen3-30B-A3B-Instruct-2507-FP8`) and the framework or environment (e.g., `transformers`, `vllm`).  
            Example: `Qwen3-8B-Instruct-2507 generates gibberish with SGLang under CUDA 12.1`

        3.  **Provide reproducible steps**  
            Help us help you: include a minimal, self-contained code snippet and clear instructions to reproduce the issue. Be sure to specify:
            - Your environment (OS, Python version, package versions)
            - The exact commands or code used; for badcases, also the prompts and the sampling parameters
            - Expected vs. actual behavior

        4.  **Be precise and concise**  
            - Focus on **one issue per report**. If you have multiple bugs, please open separate issues.
            - Include only the context and code necessary to understand and reproduce the problem.
            - Think of your issue as a future reference for others --- aim to make it a helpful documentation entry.


        Also keep in mind that this repository is dedicated to the open-weight versions of the **Qwen language models**. 
        - For **Qwen-Coder**, **Qwen-VL**, **Qwen-Omni**, or other specialized models, refer to their respective repositories.
        - For **Qwen API services** or closed-source variants, please contact Alibaba Cloud support directly.

  - type: textarea
    attributes:
      label: Description
      description: |
        Please describe the problem you have encountered.
    validations:
      required: true

  - type: textarea
    attributes:
      label: Reproduction
      description: | 
        Please provide minimal reproducible code. For badcases, Qwen Chat share links are also good.
    validations:
      required: true

  - type: textarea
    attributes:
      label: Logs
      description: | 
        Please copy and paste any relevant log output. This will be automatically formatted into code, so no need for backticks.
        If the log is too long, feel free to put it in a public gist and link it in the issue: https://gist.github.com. 
      render: shell
    validations:
      required: false

  - type: textarea
    attributes:
      label: Environment Information
      description: |
        Please provide information about your environment, e.g., the software versions and the information on the OS, GPUs, CUDA, and NVIDIA Driver if GPUs are used.
    validations:
      required: true
  
  - type: checkboxes
    attributes:
      label: Known Issue
      options:
        - label: The issue has not already been addressed in Documentation, Issues, and Discussions.
  


================================================
FILE: .github/ISSUE_TEMPLATE/config.yml
================================================
blank_issues_enabled: false
contact_links:
  - name: 🚀 Feature Request
    url: https://github.com/QwenLM/Qwen3/discussions/categories/polls
    about: Create a poll and see if others are interested in a new feature as well
  - name: 🙏 Question
    url: https://github.com/QwenLM/Qwen3/discussions/categories/q-a
    about: Quick question, what/when/where/why/how ...
  - name: 💬 General Discussion
    url: https://github.com/QwenLM/Qwen3/discussions/categories/general
    about: Discuss in general

================================================
FILE: .github/workflows/inactive.yml
================================================
name: Close and lock inactive threads

on:
  schedule:
    - cron: "0 8 * * *"
  workflow_dispatch:

permissions:
  actions: write
  issues: write
  pull-requests: write

jobs:
  close:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/stale@v10
        with:
          days-before-issue-stale: 30
          days-before-issue-close: 7
          stale-issue-label: inactive
          stale-issue-message: >
            This issue has been automatically marked as inactive 
            due to lack of recent activity. 
            Should you believe it remains unresolved and warrants attention, 
            kindly leave a comment on this thread. 
          exempt-issue-labels: enhancement
          days-before-pr-stale: -1
          days-before-pr-close: -1
          operations-per-run: 128
  lock:
    runs-on: ubuntu-latest
    steps:
      - uses: dessant/lock-threads@v5
        with:
          issue-inactive-days: '30'
          issue-comment: >
            This issue has been automatically locked since there
            has not been any recent activity after it was closed.
            Please open a new issue for related bugs.
          pr-inactive-days: '30'
          pr-comment: >
            This pull request has been automatically locked since there
            has not been any recent activity after it was closed.
            Please open a new issue for related bugs.
          process-only: "issues,prs"



================================================
FILE: .gitignore
================================================
# Sphinx documentation
docs/_build/
docs/build/
docs/**/*.mo
.vscode
.idea

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class


================================================
FILE: .readthedocs.yaml
================================================
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

version: 2

build:
  os: ubuntu-22.04
  tools:
    python: "3"

sphinx:
   configuration: docs/source/conf.py

# If using Sphinx, optionally build your docs in additional formats such as PDF
# formats:
#    - pdf

# Optionally declare the Python requirements required to build your docs
python:
   install:
   - requirements: docs/requirements-docs.txt


================================================
FILE: README.md
================================================
# Qwen3

<p align="center">
    <img src="https://qianwen-res.oss-accelerate-overseas.aliyuncs.com/logo_qwen3.png" width="400"/>
</p>

<p align="center">
          💜 <a href="https://chat.qwen.ai/"><b>Qwen Chat</b></a>&nbsp&nbsp | &nbsp&nbsp🤗 <a href="https://huggingface.co/Qwen">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/qwen">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://arxiv.org/abs/2505.09388">Paper</a> &nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://qwenlm.github.io/blog/qwen3/">Blog</a> &nbsp&nbsp ｜ &nbsp&nbsp📖 <a href="https://qwen.readthedocs.io/">Documentation</a>
<br>
🖥️ <a href="https://huggingface.co/spaces/Qwen/Qwen3-Demo">Demo</a>&nbsp&nbsp | &nbsp&nbsp💬 <a href="https://github.com/QwenLM/Qwen/blob/main/assets/wechat.png">WeChat (微信)</a>&nbsp&nbsp | &nbsp&nbsp🫨 <a href="https://discord.gg/CV4E9rpNSD">Discord</a>&nbsp&nbsp
</p>


Visit our Hugging Face or ModelScope organization (click links above), search checkpoints with names starting with `Qwen3-` or visit the [Qwen3 collection](https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f), and you will find all you need! Enjoy!

To learn more about Qwen3, feel free to read our documentation \[[EN](https://qwen.readthedocs.io/en/latest/)|[ZH](https://qwen.readthedocs.io/zh-cn/latest/)\]. Our documentation consists of the following sections:

- Quickstart: the basic usages and demonstrations;
- Inference: the guidance for the inference with Transformers, including batch inference, streaming, etc.;
- Run Locally: the instructions for running LLM locally on CPU and GPU, with frameworks like llama.cpp, Ollama, and LM Studio;
- Deployment: the demonstration of how to deploy Qwen for large-scale inference with frameworks like SGLang, vLLM, TGI, etc.;
- Quantization: the practice of quantizing LLMs with GPTQ, AWQ, as well as the guidance for how to make high-quality quantized GGUF files;
- Training: the instructions for post-training, including SFT and RLHF (TODO) with frameworks like Axolotl, LLaMA-Factory, etc.
- Framework: the usage of Qwen with frameworks for application, e.g., RAG, Agent, etc.

## Introduction

### Qwen3-2507

Over the past three months, we continued to explore the potential of the Qwen3 families and we are excited to introduce the updated **Qwen3-2507** in two variants, Qwen3-Instruct-2507 and Qwen3-Thinking-2507, and three sizes, 235B-A22B, 30B-A3B, and 4B.

**Qwen3-Instruct-2507** is the updated version of the previous Qwen3 non-thinking mode, featuring the following key enhancements:  

- **Significant improvements** in general capabilities, including **instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage**.  
- **Substantial gains** in long-tail knowledge coverage across **multiple languages**.  
- **Markedly better alignment** with user preferences in **subjective and open-ended tasks**, enabling more helpful responses and higher-quality text generation.  
- **Enhanced capabilities** in **256K-token long-context understanding**, extendable up to **1 million tokens**.

**Qwen3-Thinking-2507** is the continuation of the Qwen3 thinking model, with improved quality and depth of reasoning, featuring the following key enhancements:
- **Significantly improved performance** on reasoning tasks, including logical reasoning, mathematics, science, coding, and academic benchmarks that typically require human expertise — achieving **state-of-the-art results among open-weight thinking models**.
- **Markedly better general capabilities**, such as instruction following, tool usage, text generation, and alignment with human preferences.
- **Enhanced 256K long-context understanding** capabilities, extendable up to **1 million tokens**.


<details>
    <summary><b>Previous Qwen3 Release</b></summary>
    <h3>Qwen3 (aka Qwen3-2504)</h3>
    <p>
    We are excited to announce the release of Qwen3, the latest addition to the Qwen family of large language models. 
    These models represent our most advanced and intelligent systems to date, improving from our experience in building QwQ and Qwen2.5.
    We are making the weights of Qwen3 available to the public, including both dense and Mixture-of-Experts (MoE) models. 
    <br><br>
    The highlights from Qwen3 include:
        <ul>
            <li><b>Dense and Mixture-of-Experts (MoE) models of various sizes</b>, available in 0.6B, 1.7B, 4B, 8B, 14B, 32B and 30B-A3B, 235B-A22B.</li>
            <li><b>Seamless switching between thinking mode</b> (for complex logical reasoning, math, and coding) and <b>non-thinking mode</b> (for efficient, general-purpose chat), ensuring optimal performance across various scenarios.</li>
            <li><b>Significantly enhanced reasoning capabilities</b>, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.</li>
            <li><b>Superior human preference alignment</b>, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.</li>
            <li><b>Expertise in agent capabilities</b>, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.</li>
            <li><b>Support of 100+ languages and dialects</b> with strong capabilities for <b>multilingual instruction following</b> and <b>translation</b>.</li>
        </ul>
    </p>
</details>


## News
- 2025.08.08: You can now use Qwen3-2507 to handle ultra-long inputs of **1 million tokens**! See the updated modelcards ([235B-A22B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507), [235B-A22B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507), [30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507), [30B-A3B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507)) for how to enable this feature.
- 2025.08.06: The final open release of Qwen3-2507, [Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) and [Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507), is out!
- 2025.07.31: Qwen3-30B-A3B-Thinking-2507 is released. Check out the [modelcard](https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507) for more details!
- 2025.07.30: Qwen3-30B-A3B-Instruct-2507 is released. Check out the [modelcard](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507) for more details!
- 2025.07.25: We released the updated version of Qwen3-235B-A22B thinking mode, named Qwen3-235B-A22B-Thinking-2507. Check out the [modelcard](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507) for more details!
- 2025.07.21: We released the updated version of Qwen3-235B-A22B non-thinking mode, named Qwen3-235B-A22B-Instruct-2507, featuring significant enhancements over the previous version and supporting 256K-token long-context understanding. Check our [modelcard](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507) for more details!
- 2025.04.29: We released the Qwen3 series. Check our [blog](https://qwenlm.github.io/blog/qwen3) for more details!
- 2024.09.19: We released the Qwen2.5 series. This time there are 3 extra model sizes: 3B, 14B, and 32B for more possibilities. Check our [blog](https://qwenlm.github.io/blog/qwen2.5) for more!
- 2024.06.06: We released the Qwen2 series. Check our [blog](https://qwenlm.github.io/blog/qwen2/)!
- 2024.03.28: We released the first MoE model of Qwen: Qwen1.5-MoE-A2.7B! Temporarily, only HF transformers and vLLM support the model. We will soon add the support of llama.cpp, mlx-lm, etc. Check our [blog](https://qwenlm.github.io/blog/qwen-moe/) for more information!
- 2024.02.05: We released the Qwen1.5 series.

## Performance

Detailed evaluation results are reported in this [📑 blog (Qwen3-2504)](https://qwenlm.github.io/blog/qwen3/) and this [📑 blog (Qwen3-2507) \[coming soon\]]().

For requirements on GPU memory and the respective throughput, see results [here](https://qwen.readthedocs.io/en/latest/getting_started/speed_benchmark.html).

## Run Qwen3

### 🤗 Transformers

Transformers is a library of pretrained natural language processing models for inference and training. 
The latest version of `transformers` is recommended and `transformers>=4.51.0` is required.

#### Qwen3-Instruct-2507

The following contains a code snippet illustrating how to use Qwen3-30B-A3B-Instruct-2507 to generate content based on given inputs. 
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B-Instruct-2507"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)
```

> [!Note]
> Qwen3-Instruct-2507 supports only non-thinking mode and does not generate ``<think></think>`` blocks in its output. Meanwhile, specifying `enable_thinking=False` is no longer required.


#### Qwen3-Thinking-2507

The following contains a code snippet illustrating how to use Qwen3-30B-A3B-Thinking-2507 to generate content based on given inputs. 
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B-Thinking-2507"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)  # no opening <think> tag
print("content:", content)

```
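
The parsing step in the snippet above can be isolated into a small helper and checked without loading a model. The following is only a sketch: `split_thinking` is a hypothetical name introduced here, and the token ID `151668` for `</think>` is taken from the comment in the snippet above.

```python
from typing import List, Tuple

THINK_END_ID = 151668  # </think>, as noted in the snippet above


def split_thinking(
    output_ids: List[int], think_end_id: int = THINK_END_ID
) -> Tuple[List[int], List[int]]:
    """Split generated token IDs at the last </think> token.

    Returns (thinking_ids, content_ids). The </think> token itself stays at
    the end of thinking_ids, matching the slicing in the snippet above.
    If the token is absent, everything is treated as content.
    """
    try:
        # Searching the reversed list yields the position just after the last </think>.
        index = len(output_ids) - output_ids[::-1].index(think_end_id)
    except ValueError:
        index = 0
    return output_ids[:index], output_ids[index:]


# Toy IDs standing in for real tokens.
thinking_ids, content_ids = split_thinking([11, 12, THINK_END_ID, 21, 22])
print(thinking_ids)  # [11, 12, 151668]
print(content_ids)   # [21, 22]
```

Both parts can then be passed to `tokenizer.decode(..., skip_special_tokens=True)` as in the snippet above.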

> [!Note]
> Qwen3-Thinking-2507 supports only thinking mode.
> Additionally, to enforce model thinking, the default chat template automatically includes `<think>`. Therefore, it is normal for the model's output to contain only `</think>` without an explicit opening `<think>` tag.
> 
> Qwen3-Thinking-2507 also features an increased thinking length. We strongly recommend its use in highly complex reasoning tasks with adequate maximum generation length.



<details>
    <summary><b>Switching Thinking/Non-thinking Modes for Previous Qwen3 Models</b></summary>
    <p>
    By default, Qwen3 models think before responding.
    This can be controlled by
        <ul>
            <li><code>enable_thinking=False</code>: Passing <code>enable_thinking=False</code> to <code>tokenizer.apply_chat_template</code> will strictly prevent the model from generating thinking content.</li>
            <li><code>/think</code> and <code>/no_think</code> instructions: Use those words in the system or user message to signify whether Qwen3 should think. In multi-turn conversations, the latest instruction is followed.</li>
        </ul>
    </p>
</details>
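
For the previous Qwen3 models, the two controls described above might be combined as in the following sketch. `build_prompt` is a hypothetical helper introduced here for illustration. It assumes `tokenizer` is a Hugging Face tokenizer loaded with `AutoTokenizer.from_pretrained`, as in the earlier snippets.

```python
from typing import Any, Dict, List


def build_prompt(
    tokenizer: Any, messages: List[Dict[str, str]], enable_thinking: bool = True
) -> str:
    """Render a chat prompt, forwarding enable_thinking to the chat template."""
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=enable_thinking,
    )


# The /no_think (or /think) instruction goes into a system or user message.
messages = [
    {"role": "user", "content": "How many r's are in strawberry? /no_think"}
]

# Usage (requires downloading the tokenizer, so it is left commented out):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
# text = build_prompt(tokenizer, messages, enable_thinking=False)
```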


### ModelScope

We strongly advise users, especially those in mainland China, to use ModelScope. 
ModelScope adopts a Python API similar to Transformers.
The CLI tool `modelscope download` can help you solve issues concerning downloading checkpoints.
For vLLM and SGLang, the environment variables `VLLM_USE_MODELSCOPE=true` and `SGLANG_USE_MODELSCOPE=true` can be used, respectively.


### llama.cpp

[`llama.cpp`](https://github.com/ggml-org/llama.cpp) enables LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware.
`llama.cpp>=b5401` is recommended for the full support of Qwen3.

To use the CLI, run the following in a terminal:
```shell
./llama-cli -hf Qwen/Qwen3-8B-GGUF:Q8_0 --jinja --color -ngl 99 -fa -sm row --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 -c 40960 -n 32768 --no-context-shift
# CTRL+C to exit
```

To use the API server, run the following in a terminal:
```shell
./llama-server -hf Qwen/Qwen3-8B-GGUF:Q8_0 --jinja --reasoning-format deepseek -ngl 99 -fa -sm row --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 -c 40960 -n 32768 --no-context-shift --port 8080
```
A simple web front end will be at `http://localhost:8080` and an OpenAI-compatible API will be at `http://localhost:8080/v1`.
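
As a rough illustration of calling the OpenAI-compatible API, the sketch below builds a chat completion request using only the Python standard library. `build_chat_request` is a hypothetical helper, the `model` value is a placeholder, and `max_tokens` is an arbitrary illustrative choice. The temperature and top-p values mirror the server command above.

```python
import json
import urllib.request
from typing import Dict, List


def build_chat_request(
    base_url: str, messages: List[Dict[str, str]], model: str = "qwen3"
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,  # placeholder name (assumption); adjust for your server
        "messages": messages,
        "temperature": 0.6,
        "top_p": 0.95,
        "max_tokens": 1024,  # illustrative limit, not a recommendation
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "http://localhost:8080/v1",
    [{"role": "user", "content": "Give me a short introduction to large language models."}],
)
print(request.full_url)

# Sending requires a running server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
#     print(reply["choices"][0]["message"]["content"])
```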

For additional guides, please refer to [our documentation](https://qwen.readthedocs.io/en/latest/run_locally/llama.cpp.html).

> [!Note]
> llama.cpp adopts "rotating context management" and infinite generation is made possible by evicting earlier tokens.
> It can be configured by parameters, and the commands above effectively disable it.
> For more details, please refer to [our documentation](https://qwen.readthedocs.io/en/latest/run_locally/llama.cpp.html#llama-cli).

### Ollama

After [installing Ollama](https://ollama.com/), you can initiate the Ollama service with the following command (Ollama v0.9.0 or higher is recommended):
```shell
ollama serve
# You need to keep this service running whenever you are using ollama
```

To pull a model checkpoint and run the model, use the `ollama run` command. You can specify a model size by adding a suffix to `qwen3`, such as `:8b` or `:30b-a3b`:
```shell
ollama run qwen3:8b
# To set parameters, type "/set parameter num_ctx 40960" and "/set parameter num_predict 32768"
# To exit, type "/bye" and press ENTER
# For Qwen3-2504 models,
# - To enable thinking, which is the default, type "/set think"
# - To disable thinking, type "/set nothink"
```

You can also access the Ollama service via its OpenAI-compatible API. 
Please note that you need to (1) keep `ollama serve` running while using the API, and (2) execute `ollama run qwen3:8b` before utilizing this API to ensure that the model checkpoint is prepared.
The API is at `http://localhost:11434/v1/` by default.

For additional details, please visit [ollama.com](https://ollama.com/).

> [!Note]
> Ollama's naming may not be consistent with Qwen's original naming.
> For example, `qwen3:30b-a3b` in Ollama points to `qwen3:30b-a3b-thinking-2507-q4_K_M` as of August 2025.
> Please check <https://ollama.com/library/qwen3/tags> before use.


> [!Note]
> Ollama adopts the same "rotating context management" as llama.cpp.
> However, its default settings (`num_ctx` 2048 and `num_predict` -1), which imply infinite generation with a 2048-token context,
> could cause trouble for Qwen3 models.
> We recommend setting `num_ctx` and `num_predict` properly.
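
When calling Ollama programmatically, the same two parameters can be sent per request. The sketch below builds a payload for Ollama's native `/api/chat` endpoint, which is not otherwise covered in this README. `build_ollama_chat_payload` is a hypothetical helper, and the default values simply reuse the numbers from the `ollama run` example above.

```python
import json
from typing import Any, Dict, List


def build_ollama_chat_payload(
    model: str,
    messages: List[Dict[str, str]],
    num_ctx: int = 40960,
    num_predict: int = 32768,
) -> Dict[str, Any]:
    """Build a request body that sets the context size and generation limit explicitly."""
    return {
        "model": model,
        "messages": messages,
        "stream": False,  # ask for a single JSON response instead of a stream
        "options": {"num_ctx": num_ctx, "num_predict": num_predict},
    }


payload = build_ollama_chat_payload(
    "qwen3:8b", [{"role": "user", "content": "Hello"}]
)
print(json.dumps(payload, indent=2))
# POST this JSON to http://localhost:11434/api/chat while `ollama serve` is running.
```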

### LMStudio

Qwen3 is already supported by [lmstudio.ai](https://lmstudio.ai/). You can directly use LMStudio with our GGUF files.

### ExecuTorch

To export and run on ExecuTorch (iOS, Android, Mac, Linux, and more), please follow this [example](https://github.com/pytorch/executorch/blob/main/examples/models/qwen3/README.md).

### MNN

To export and run on MNN, which supports Qwen3 on mobile devices, please visit [Alibaba MNN](https://github.com/alibaba/MNN).

### MLX LM

If you are running on Apple Silicon, [`mlx-lm`](https://github.com/ml-explore/mlx-lm) also supports Qwen3 (`mlx-lm>=0.24.0`). 
Look for models ending with MLX on Hugging Face Hub.


### OpenVINO

If you are running on Intel CPU or GPU, [OpenVINO toolkit](https://github.com/openvinotoolkit) supports Qwen3.
You can follow this [chatbot example](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/llm-chatbot/llm-chatbot.ipynb).


## Deploy Qwen3

Qwen3 is supported by multiple inference frameworks. 
Here we demonstrate the usage of `SGLang`, `vLLM` and `TensorRT-LLM`.
You can also find Qwen3 models from various inference providers, e.g., [Alibaba Cloud Model Studio](https://www.alibabacloud.com/en/product/modelstudio).


### SGLang

[SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models and vision language models.
SGLang can be used to launch a server with an OpenAI-compatible API. 
`sglang>=0.4.6.post1` is required.

For Qwen3-Instruct-2507, 
```shell
python -m sglang.launch_server --model-path Qwen/Qwen3-30B-A3B-Instruct-2507 --port 30000 --context-length 262144
```

For Qwen3-Thinking-2507,
```shell
python -m sglang.launch_server --model-path Qwen/Qwen3-30B-A3B-Thinking-2507 --port 30000 --context-length 262144 --reasoning-parser deepseek-r1
```

For Qwen3-2504 models,
```shell
python -m sglang.launch_server --model-path Qwen/Qwen3-8B --port 30000 --context-length 131072 --reasoning-parser qwen3
```
An OpenAI-compatible API will be available at `http://localhost:30000/v1`.

> [!Note]
> Because SGLang preprocesses API requests and drops all `reasoning_content` fields, the quality of **multi-step tool use with Qwen3 thinking models**, which relies on the related thinking content being present, may be suboptimal. While fixes are being worked on, as a workaround, we recommend passing the content as it is, without extracting the thinking content; the chat template will handle the processing correctly.


### vLLM

[vLLM](https://github.com/vllm-project/vllm) is a high-throughput and memory-efficient inference and serving engine for LLMs.
`vllm>=0.9.0` is recommended.

For Qwen3-Instruct-2507, 
```shell
vllm serve Qwen/Qwen3-30B-A3B-Instruct-2507 --port 8000 --max-model-len 262144
```

For Qwen3-Thinking-2507,
```shell
vllm serve Qwen/Qwen3-30B-A3B-Thinking-2507 --port 8000 --max-model-len 262144 --enable-reasoning --reasoning-parser deepseek_r1
```

For Qwen3-2504 models,
```shell
vllm serve Qwen/Qwen3-8B --port 8000 --max-model-len 131072 --enable-reasoning --reasoning-parser qwen3
```
An OpenAI-compatible API will be available at `http://localhost:8000/v1`.

> [!Note]
> Because vLLM preprocesses API requests and drops all `reasoning_content` fields, the quality of **multi-step tool use with Qwen3 thinking models**, which relies on the related thinking content being present, may be suboptimal. While fixes are being worked on, as a workaround, we recommend passing the content as it is, without extracting the thinking content; the chat template will handle the processing correctly.
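
If the thinking content has already been separated into a `reasoning_content` field (for example, because a reasoning parser is enabled), one possible way to follow the workaround in the note above is to merge it back into `content` before appending the assistant turn to the history. The sketch below rests on assumptions: `inline_reasoning` is a hypothetical helper, and wrapping the thinking in `<think>...</think>` ahead of the answer is our assumption about a layout the chat template can process.

```python
from typing import Any, Dict


def inline_reasoning(message: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an assistant message with the thinking kept inside `content`."""
    merged = {key: value for key, value in message.items() if key != "reasoning_content"}
    reasoning = message.get("reasoning_content")
    if reasoning:
        answer = message.get("content") or ""
        # Assumed layout: thinking wrapped in <think>...</think>, then the answer.
        merged["content"] = f"<think>\n{reasoning.strip()}\n</think>\n\n{answer.lstrip()}"
    return merged


# Hand-written message for illustration only.
assistant_message = {
    "role": "assistant",
    "content": "The answer is 4.",
    "reasoning_content": "2 + 2 = 4",
}
print(inline_reasoning(assistant_message))
```

Other fields, such as `tool_calls`, are copied unchanged, so the merged message can be appended to the conversation in place of the original.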

### TensorRT-LLM

[TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) is an open-source LLM inference engine from NVIDIA, which provides optimizations including custom attention kernels, quantization and more on NVIDIA GPUs. Qwen3 is supported in its re-architected [PyTorch backend](https://nvidia.github.io/TensorRT-LLM/torch.html). `tensorrt_llm>=0.20.0rc3` is recommended. Please refer to the [README](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/models/core/qwen/README.md#qwen3) page for more details.

```shell
trtllm-serve Qwen/Qwen3-8B --host localhost --port 8000 --backend pytorch
```
An OpenAI-compatible API will be available at `http://localhost:8000/v1`.

### MindIE

For deployment on Ascend NPUs, please visit [Modelers](https://modelers.cn/) and search for Qwen3.

<!-- 
### OpenLLM

[OpenLLM](https://github.com/bentoml/OpenLLM) allows you to easily run Qwen2.5 as OpenAI-compatible APIs. You can start a model server using `openllm serve`. For example:

```bash
openllm serve qwen2.5:7b
```

The server is active at `http://localhost:3000/`, providing OpenAI-compatible APIs. You can create an OpenAI client to call its chat API. For more information, refer to [our documentation](https://qwen.readthedocs.io/en/latest/deployment/openllm.html). -->


## Build with Qwen3

### Tool Use

For tool use capabilities, we recommend taking a look at [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent), which provides a wrapper around these APIs to support tool use or function calling with MCP support.
Tool use with Qwen3 can also be conducted with SGLang, vLLM, Transformers, llama.cpp, Ollama, etc.
Follow the guides in our documentation to see how to enable it.


### Finetuning

We advise you to use training frameworks, including [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl), [UnSloth](https://github.com/unslothai/unsloth), [Swift](https://github.com/modelscope/swift), [Llama-Factory](https://github.com/hiyouga/LLaMA-Factory), etc., to finetune your models with SFT, DPO, GRPO, etc.


## License Agreement

All our open-weight models are licensed under Apache 2.0. 
You can find the license files in the respective Hugging Face repositories.

## Citation

If you find our work helpful, feel free to cite us.

```bibtex
@article{qwen3,
    title={Qwen3 Technical Report}, 
    author={An Yang and Anfeng Li and Baosong Yang and Beichen Zhang and Binyuan Hui and Bo Zheng and Bowen Yu and Chang Gao and Chengen Huang and Chenxu Lv and Chujie Zheng and Dayiheng Liu and Fan Zhou and Fei Huang and Feng Hu and Hao Ge and Haoran Wei and Huan Lin and Jialong Tang and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Yang and Jiaxi Yang and Jing Zhou and Jingren Zhou and Junyang Lin and Kai Dang and Keqin Bao and Kexin Yang and Le Yu and Lianghao Deng and Mei Li and Mingfeng Xue and Mingze Li and Pei Zhang and Peng Wang and Qin Zhu and Rui Men and Ruize Gao and Shixuan Liu and Shuang Luo and Tianhao Li and Tianyi Tang and Wenbiao Yin and Xingzhang Ren and Xinyu Wang and Xinyu Zhang and Xuancheng Ren and Yang Fan and Yang Su and Yichang Zhang and Yinger Zhang and Yu Wan and Yuqiong Liu and Zekun Wang and Zeyu Cui and Zhenru Zhang and Zhipeng Zhou and Zihan Qiu},
    journal = {arXiv preprint arXiv:2505.09388},
    year={2025}
}

@article{qwen2.5,
    title   = {Qwen2.5 Technical Report}, 
    author  = {An Yang and Baosong Yang and Beichen Zhang and Binyuan Hui and Bo Zheng and Bowen Yu and Chengyuan Li and Dayiheng Liu and Fei Huang and Haoran Wei and Huan Lin and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Yang and Jiaxi Yang and Jingren Zhou and Junyang Lin and Kai Dang and Keming Lu and Keqin Bao and Kexin Yang and Le Yu and Mei Li and Mingfeng Xue and Pei Zhang and Qin Zhu and Rui Men and Runji Lin and Tianhao Li and Tingyu Xia and Xingzhang Ren and Xuancheng Ren and Yang Fan and Yang Su and Yichang Zhang and Yu Wan and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zihan Qiu},
    journal = {arXiv preprint arXiv:2412.15115},
    year    = {2024}
}

@article{qwen2,
    title   = {Qwen2 Technical Report}, 
    author  = {An Yang and Baosong Yang and Binyuan Hui and Bo Zheng and Bowen Yu and Chang Zhou and Chengpeng Li and Chengyuan Li and Dayiheng Liu and Fei Huang and Guanting Dong and Haoran Wei and Huan Lin and Jialong Tang and Jialin Wang and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Ma and Jin Xu and Jingren Zhou and Jinze Bai and Jinzheng He and Junyang Lin and Kai Dang and Keming Lu and Keqin Chen and Kexin Yang and Mei Li and Mingfeng Xue and Na Ni and Pei Zhang and Peng Wang and Ru Peng and Rui Men and Ruize Gao and Runji Lin and Shijie Wang and Shuai Bai and Sinan Tan and Tianhang Zhu and Tianhao Li and Tianyu Liu and Wenbin Ge and Xiaodong Deng and Xiaohuan Zhou and Xingzhang Ren and Xinyu Zhang and Xipin Wei and Xuancheng Ren and Yang Fan and Yang Yao and Yichang Zhang and Yu Wan and Yunfei Chu and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zhihao Fan},
    journal = {arXiv preprint arXiv:2407.10671},
    year    = {2024}
}
```

## Contact Us
If you would like to leave a message for our research team or product team, join our [Discord](https://discord.gg/z3GAxXZ9Ce) or [WeChat groups](assets/wechat.png)!


================================================
FILE: docker/Dockerfile-cu121
================================================
ARG CUDA_VERSION=12.1.0
ARG from=nvidia/cuda:${CUDA_VERSION}-cudnn8-devel-ubuntu20.04

FROM ${from} as base

RUN <<EOF
apt update -y && apt upgrade -y && apt install -y --no-install-recommends  \
    git \
    git-lfs \
    python3 \
    python3-pip \
    python3-dev \
    wget \
    vim \
&& rm -rf /var/lib/apt/lists/*
EOF

RUN ln -s /usr/bin/python3 /usr/bin/python

RUN git lfs install

FROM base as dev

WORKDIR /

RUN mkdir -p /data/shared/Qwen

WORKDIR /data/shared/Qwen/

FROM dev as bundle_req
RUN pip install --no-cache-dir networkx==3.1
RUN pip3 install --no-cache-dir torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121
RUN pip3 install --no-cache-dir transformers==4.40.2 accelerate tiktoken einops scipy
    
FROM bundle_req as bundle_finetune
ARG BUNDLE_FINETUNE=true

RUN <<EOF
if [ "$BUNDLE_FINETUNE" = "true" ]; then
    cd /data/shared/Qwen

    # Full-finetune / LoRA.
    pip3 install --no-cache-dir "deepspeed==0.14.2" "peft==0.11.1"

    # Q-LoRA.
    apt update -y && DEBIAN_FRONTEND=noninteractive apt install -y --no-install-recommends \
        libopenmpi-dev openmpi-bin \
        && rm -rf /var/lib/apt/lists/*
    pip3 install --no-cache-dir "optimum==1.20.0" "auto-gptq==0.7.1" "autoawq==0.2.5" mpi4py
fi
EOF

FROM bundle_finetune as bundle_vllm
ARG BUNDLE_VLLM=true

RUN <<EOF
if [ "$BUNDLE_VLLM" = "true" ]; then
    cd /data/shared/Qwen

    pip3 install --no-cache-dir vllm==0.4.3 "fschat[model_worker,webui]==0.2.36"
fi
EOF

FROM bundle_vllm as bundle_flash_attention
ARG BUNDLE_FLASH_ATTENTION=true

RUN <<EOF 
if [ "$BUNDLE_FLASH_ATTENTION" = "true" ]; then
    pip3 install --no-cache-dir flash-attn==2.5.8 --no-build-isolation
fi
EOF

FROM bundle_flash_attention as final

COPY ../examples/sft/* ./
COPY ../examples/demo/* ./

EXPOSE 80


================================================
FILE: docker/docker_cli_demo.sh
================================================
#!/usr/bin/env bash
#
# This script automatically pulls the Docker image from Docker Hub and starts a container to run the Qwen-Chat cli-demo.

IMAGE_NAME=qwenllm/qwen:2-cu121
QWEN_CHECKPOINT_PATH=/path/to/Qwen-Instruct
CONTAINER_NAME=qwen2

function usage() {
    echo '
Usage: bash docker/docker_cli_demo.sh [-i IMAGE_NAME] -c [/path/to/Qwen-Instruct] [-n CONTAINER_NAME]
'
}

while [[ "$1" != "" ]]; do
    case $1 in
        -i | --image-name )
            shift
            IMAGE_NAME=$1
            ;;
        -c | --checkpoint )
            shift
            QWEN_CHECKPOINT_PATH=$1
            ;;
        -n | --container-name )
            shift
            CONTAINER_NAME=$1
            ;;
        -h | --help )
            usage
            exit 0
            ;;
        * )
            echo "Unknown argument ${1}"
            exit 1
            ;;
    esac
    shift
done

if [ ! -e "${QWEN_CHECKPOINT_PATH}/config.json" ]; then
    echo "Checkpoint config.json file not found in ${QWEN_CHECKPOINT_PATH}, exit."
    exit 1
fi

sudo docker pull ${IMAGE_NAME} || {
    echo "Pulling image ${IMAGE_NAME} failed, exit."
    exit 1
}

sudo docker run --gpus all --rm --name ${CONTAINER_NAME} \
    --mount type=bind,source=${QWEN_CHECKPOINT_PATH},target=/data/shared/Qwen/Qwen-Instruct \
    -it ${IMAGE_NAME} \
    python cli_demo.py -c /data/shared/Qwen/Qwen-Instruct/

================================================
FILE: docker/docker_web_demo.sh
================================================
#!/usr/bin/env bash
#
# This script automatically pulls the Docker image from Docker Hub and starts a daemon container to run the Qwen-Chat web-demo.

IMAGE_NAME=qwenllm/qwen:2-cu121
QWEN_CHECKPOINT_PATH=/path/to/Qwen-Instruct
PORT=8901
CONTAINER_NAME=qwen2

function usage() {
    echo '
Usage: bash docker/docker_web_demo.sh [-i IMAGE_NAME] -c [/path/to/Qwen-Instruct] [-n CONTAINER_NAME] [--port PORT]
'
}

while [[ "$1" != "" ]]; do
    case $1 in
        -i | --image-name )
            shift
            IMAGE_NAME=$1
            ;;
        -c | --checkpoint )
            shift
            QWEN_CHECKPOINT_PATH=$1
            ;;
        -n | --container-name )
            shift
            CONTAINER_NAME=$1
            ;;
        --port )
            shift
            PORT=$1
            ;;
        -h | --help )
            usage
            exit 0
            ;;
        * )
            echo "Unknown argument ${1}"
            exit 1
            ;;
    esac
    shift
done

if [ ! -e "${QWEN_CHECKPOINT_PATH}/config.json" ]; then
    echo "Checkpoint config.json file not found in ${QWEN_CHECKPOINT_PATH}, exit."
    exit 1
fi

sudo docker pull ${IMAGE_NAME} || {
    echo "Pulling image ${IMAGE_NAME} failed, exit."
    exit 1
}

sudo docker run --gpus all -d --restart always --name ${CONTAINER_NAME} \
    -v /var/run/docker.sock:/var/run/docker.sock -p ${PORT}:80 \
    --mount type=bind,source=${QWEN_CHECKPOINT_PATH},target=/data/shared/Qwen/Qwen-Instruct \
    -it ${IMAGE_NAME} \
    python web_demo.py --server-port 80 --server-name 0.0.0.0 -c /data/shared/Qwen/Qwen-Instruct/ && {
    echo "Successfully started web demo. Open 'http://localhost:${PORT}' to try!
Run \`docker logs ${CONTAINER_NAME}\` to check demo status.
Run \`docker rm -f ${CONTAINER_NAME}\` to stop and remove the demo."
}

================================================
FILE: docs/Makefile
================================================
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = source
BUILDDIR      = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

================================================
FILE: docs/README.md
================================================
# Qwen Documentation

This is the source of the documentation at <https://qwen.readthedocs.io>.

## Quick Start

We use `sphinx` to manage the documentation and use the `furo` theme.
To get started, simply run
```bash
pip install -r requirements-docs.txt
```

Then run `make html` or `sphinx-build -M html source build` to compile the docs; the output is placed under the `build/html` directory.


## Translation

The documentation is available in both English and Simplified Chinese. We use
`sphinx-intl` to work with Sphinx translation flow, following [this article](https://www.sphinx-doc.org/en/master/usage/advanced/intl.html).

You need to install the Python package `sphinx-intl` before starting.

1. After updating the English documentation, run `make gettext`, and the pot files will be placed in the `build/gettext` directory. `make gettext` can be slow if the doc is long.

2. Use the generated pot files to update the po files:
    ```bash
    sphinx-intl update -p build/gettext -l zh_CN -w 0
    ```

3. Translate the po files in `locales/zh_CN/LC_MESSAGES`. Pay attention to fuzzy matches (messages after `#, fuzzy`). Please be careful not to break reST notation.

4. Build the translated documentation: `make -e SPHINXOPTS="-D language='zh_CN'" html` or `sphinx-build -M html source build -D language=zh_CN`
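
To see how much of a po file still needs attention in step 3, a rough count of translated, untranslated, and fuzzy entries can help. The sketch below is a deliberately simple line-based reader written for illustration; `po_stats` is a hypothetical helper, it ignores plural forms and `msgctxt`, and it is not a substitute for gettext's or Babel's own tooling.

```python
def _unquote(token):
    """Strip the surrounding double quotes from a .po string token."""
    token = token.strip()
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return token[1:-1]
    return token


def po_stats(text):
    """Count translated, untranslated, and fuzzy entries in .po file content.

    The header entry (the one with an empty msgid) is skipped.
    """
    stats = {"translated": 0, "untranslated": 0, "fuzzy": 0}
    entry = {"msgid": "", "msgstr": ""}
    field = None   # field that quoted continuation lines belong to
    fuzzy = False  # set by a "#, fuzzy" flag comment

    def flush():
        nonlocal entry, field, fuzzy
        if field == "msgstr" and entry["msgid"]:  # complete, non-header entry
            if fuzzy:
                stats["fuzzy"] += 1
            elif entry["msgstr"]:
                stats["translated"] += 1
            else:
                stats["untranslated"] += 1
        entry, field, fuzzy = {"msgid": "", "msgstr": ""}, None, False

    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            if field == "msgstr":  # a blank line or comment ends the entry
                flush()
            if line.startswith("#,") and "fuzzy" in line:
                fuzzy = True
        elif line.startswith("msgid "):
            if field == "msgstr":
                flush()
            field = "msgid"
            entry["msgid"] = _unquote(line[len("msgid "):])
        elif line.startswith("msgstr "):
            field = "msgstr"
            entry["msgstr"] = _unquote(line[len("msgstr "):])
        elif line.startswith('"') and field is not None:
            entry[field] += _unquote(line)  # continuation of a multi-line string
    flush()
    return stats
```

For a single file, read it with `encoding="utf-8"` and pass the text to `po_stats`; entries counted as `fuzzy` or `untranslated` are the ones to review.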

## Auto Build

```bash
pip install sphinx-autobuild
```

To autobuild the default version:
```bash
sphinx-autobuild source build/html
```

To autobuild the translated version:
```bash
sphinx-autobuild source build/html -D language=zh_CN --watch locales/zh_CN
```

By default, the docs are served at `http://127.0.0.1:8000`.

================================================
FILE: docs/locales/zh_CN/LC_MESSAGES/deployment/dstack.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2024, Qwen Team
# This file is distributed under the same license as the Qwen package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-28 10:50+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"

#: ../../source/deployment/dstack.rst:2 dfac4ff2e6e7425290c3cd12a2de701c
msgid "dstack"
msgstr ""

#: ../../source/deployment/dstack.rst:4 2438a502621e4637bac3fa19171a5e53
msgid "`dstack <https://github.com/dstackai/dstack>`__ is an open-source alternative to Kubernetes and Slurm, designed to simplify GPU allocation and AI workload orchestration for ML teams across top clouds, on-prem clusters, and accelerators."
msgstr ""

#: ../../source/deployment/dstack.rst:7 1ff23a34c6ec4236b5b9d73e7d1d6241
msgid "Prerequisites"
msgstr ""

#: ../../source/deployment/dstack.rst:8 5f95e757ef4f4cba85ce773801be340d
msgid "Before you start, install dstack by following the `installation instructions <https://dstack.ai/docs/installation/>`__. Once dstack server is up, you can initialize your workspace as shown below:"
msgstr ""

#: ../../source/deployment/dstack.rst:17 ccf222149a8d43d2bf716c2a39956d77
msgid "Deploy Qwen3-30B-A3B"
msgstr ""

#: ../../source/deployment/dstack.rst:19 45b4f1af973546ab9651757fcb13b9e9
msgid "Deploy ``Qwen3-30B-A3B`` on instances available with cloud providers configured in your ``~/.dstack/server/config.yml`` file."
msgstr ""

#: ../../source/deployment/dstack.rst:21 8565ac9fdb394e32b10269087dfc18c7
msgid "You can use ``SgLang``, ``TGI`` or ``vLLM`` to serve the model. Here we use ``SgLang`` as an example."
msgstr ""

#: ../../source/deployment/dstack.rst:23 9ed2f6fbcd3f408fa0d34a3199709122
msgid "Create a `service <https://dstack.ai/docs/concepts/services/>`__ configuration file named ``serve-30b.dstack.yml`` with the following content:"
msgstr ""

#: ../../source/deployment/dstack.rst:49 0973beefd01b4c8081a5d4d2113dc7c4
msgid "For other inference backends such as vLLM or TGI, visit the `dstack Inference Examples <https://dstack.ai/examples/#inference>`__ documentation."
msgstr ""

#: ../../source/deployment/dstack.rst:51 826cb0f7e041443db0a8382fd918e3b7
msgid "Go ahead and apply the service configuration:"
msgstr ""

#: ../../source/deployment/dstack.rst:58 d16702dc64694eeaba319277a3ab4a03
msgid "Access the Service"
msgstr ""

#: ../../source/deployment/dstack.rst:60 7edacaff1d53424190978e77cd557190
msgid "After the service is successfully deployed, you can access the service's endpoint in the following ways:"
msgstr ""

#: ../../source/deployment/dstack.rst e83ef74bbe7e4e5eaf5f7a10773c9d46
msgid "CURL"
msgstr ""

#: ../../source/deployment/dstack.rst:66 9f51986795d3414f96dd65790157e723
msgid "Access through service endpoint at ``<dstack server URL>/proxy/services/<project name>/<run name>/``"
msgstr ""

#: ../../source/deployment/dstack.rst:84 9a8130ecf20c4e42ac9994e2145bfcec
msgid "When starting the dstack server, an admin token is automatically generated:"
msgstr ""

#: ../../source/deployment/dstack.rst 94c7a3424a19432ebfc0a98eb0725d42
msgid "Chat UI"
msgstr ""

#: ../../source/deployment/dstack.rst:93 5c7bb346537b456da005af909a333b09
msgid "Access through dstack's Chat UI at ``<dstack server URL>/projects/<project name>/models/<run name>/``"
msgstr ""

#: ../../source/deployment/dstack.rst 11cd02dcfb214277988135c49b839775
msgid "Gateway"
msgstr ""

#: ../../source/deployment/dstack.rst:102 e1e19487dd6f4ae8b12f728b39bef5d6
msgid "Running services for development purposes doesn't require setting up a gateway."
msgstr ""

#: ../../source/deployment/dstack.rst:104 bf94ccabbeaa491c9827e983e7f9950a
msgid "However, you'll need a gateway in the following cases:"
msgstr ""

#: ../../source/deployment/dstack.rst:106 15278aaab8214461b9fa17c95549f1cc
msgid "To use auto-scaling or rate limits"
msgstr ""

#: ../../source/deployment/dstack.rst:107 5cf53dda95e24cfda6c3a62e32632461
msgid "To enable HTTPS for the endpoint and map it to your domain"
msgstr ""

#: ../../source/deployment/dstack.rst:108 b91006984e1b42298a79360df47c942e
msgid "If your service requires WebSockets"
msgstr ""

#: ../../source/deployment/dstack.rst:109 a0ecebb158d048b7bb366e166509ab31
msgid "If your service cannot work with a path prefix"
msgstr ""

#: ../../source/deployment/dstack.rst:111 df78b814ee044508979d32d16c6fa418
msgid "For detailed information about gateway configuration and usage, refer to the `dstack documentation on gateways <https://dstack.ai/docs/concepts/gateways/>`__."
msgstr ""

#: ../../source/deployment/dstack.rst:114 da366f09f068481898788356a2720d00
msgid "Replicas and Auto Scaling"
msgstr ""

#: ../../source/deployment/dstack.rst:116 1814b084b9344951b8fda9bc315ff652
msgid "You can auto scale the service by specifying additional configurations in the ``serve-30b.dstack.yml``."
msgstr ""

#: ../../source/deployment/dstack.rst:118 a2ca8abfc03b4008a679e44ed42a6224
msgid "Set ``replicas: min..max`` to define the minimum and maximum number of replicas"
msgstr ""

#: ../../source/deployment/dstack.rst:119 8cfabedabf1f4d64bb6708f07852e3f8
msgid "Configure ``scaling`` rules to determine when to scale up or down"
msgstr ""

#: ../../source/deployment/dstack.rst:121 fcd56774d834404fa6561d65b46afe74
msgid "Below is a complete configuration example with auto-scaling enabled:"
msgstr ""

#: ../../source/deployment/dstack.rst:153 dd2ed9086cf5424ab14b641044da1279
msgid "The scaling property requires a gateway to be set up."
msgstr ""

#: ../../source/deployment/dstack.rst:156 3986bd1e05de49048a64cdf4d6782f8a
msgid "See also"
msgstr ""

#: ../../source/deployment/dstack.rst:157 7dc365aeac714fa5ba2989f0cb1c7e9c
msgid "**Fleets**: Create cloud and on-prem clusters using `Fleets <https://dstack.ai/docs/concepts/fleets/>`__."
msgstr ""

#: ../../source/deployment/dstack.rst:158 5e647ecf87164fccbe9589e3d9c540b9
msgid "**Dev Environments**: Experiment and test before deploying to production using `Dev Environments <https://dstack.ai/docs/concepts/dev-environments/>`__."
msgstr ""

#: ../../source/deployment/dstack.rst:159 36774ccbc7e6446a9bfcdf029e13fe58
msgid "**Tasks**: Schedule single node or distributed training using `Tasks <https://dstack.ai/docs/concepts/tasks/>`__."
msgstr ""

#: ../../source/deployment/dstack.rst:160 72cc0103fb1e4c82ae284fa1e3633bc4
msgid "**Services**: Deploy models as secure, auto-scaling OpenAI-compatible endpoints using `Services <https://dstack.ai/docs/concepts/services/>`__."
msgstr ""

#: ../../source/deployment/dstack.rst:161 ba9f121d4831485e84c3cf922edc3982
msgid "**Metrics**: Monitor performance with automatically tracked metrics via CLI or UI using `Metrics <https://dstack.ai/docs/guides/metrics/>`__."
msgstr ""



================================================
FILE: docs/locales/zh_CN/LC_MESSAGES/deployment/openllm.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2024, Qwen Team
# This file is distributed under the same license as the Qwen package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2024.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-04-28 19:42+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"

#: ../../Qwen/source/deployment/openllm.rst:2 986ea00cb5af4a0d82f974ed79a82430
msgid "OpenLLM"
msgstr "OpenLLM"

#: ../../Qwen/source/deployment/openllm.rst:5 78be03fbdccb429892b03bf84596411b
msgid "To be updated for Qwen3."
msgstr "仍需为Qwen3更新。"

#: ../../Qwen/source/deployment/openllm.rst:7 a001f11d1c5440188121d20b3baf59db
msgid "OpenLLM allows developers to run Qwen2.5 models of different sizes as OpenAI-compatible APIs with a single command. It features a built-in chat UI, state-of-the-art inference backends, and a simplified workflow for creating enterprise-grade cloud deployment with Qwen2.5. Visit `the OpenLLM repository <https://github.com/bentoml/OpenLLM/>`_ to learn more."
msgstr "OpenLLM 允许开发者通过一个命令运行不同大小的 Qwen2.5 模型，提供 OpenAI 兼容的 API。它具有内置的聊天 UI，先进的推理后端，以及简化的工作流程来使用 Qwen2.5 创建企业级云部署。访问 `OpenLLM 仓库 <https://github.com/bentoml/OpenLLM/>`_ 了解更多信息。"

#: ../../Qwen/source/deployment/openllm.rst:10 229f89c3be65442bbe15905d75a0d13d
msgid "Installation"
msgstr "安装"

#: ../../Qwen/source/deployment/openllm.rst:12 79421f700fbc426cb6ce9841aff67503
msgid "Install OpenLLM using ``pip``."
msgstr "使用 ``pip`` 安装 OpenLLM。"

#: ../../Qwen/source/deployment/openllm.rst:18 69cfd6fe2e274173ad4065be91b71472
msgid "Verify the installation and display the help information:"
msgstr "验证安装并显示帮助信息："

#: ../../Qwen/source/deployment/openllm.rst:25 503cae99b14c4ef4b322b8ec0bd2d32d
msgid "Quickstart"
msgstr "快速开始"

#: ../../Qwen/source/deployment/openllm.rst:27 0ea788c801404d8780404611c87644b0
msgid "Before you run any Qwen2.5 model, ensure your model repository is up to date by syncing it with OpenLLM's latest official repository."
msgstr "在运行任何 Qwen2.5 模型之前，确保您的模型仓库与 OpenLLM 的最新官方仓库同步。"

#: ../../Qwen/source/deployment/openllm.rst:33 8852ff46ecdb45b2bfc9885bbfaacb02
msgid "List the supported Qwen2.5 models:"
msgstr "列出支持的 Qwen2.5 模型："

#: ../../Qwen/source/deployment/openllm.rst:39 3e4f6c11396844adb30d4e5812339484
msgid "The results also display the required GPU resources and supported platforms:"
msgstr "结果还会显示所需的 GPU 资源和支持的平台："

#: ../../Qwen/source/deployment/openllm.rst:57 ac4c0db02f5249d5882940820779db9a
msgid "To start a server with one of the models, use ``openllm serve`` like this:"
msgstr "要使用其中一个模型来启动服务器，请使用 ``openllm serve`` 命令，例如："

#: ../../Qwen/source/deployment/openllm.rst:63 0a1d3ec35c684e3bb3e971c916aa9be7
msgid "By default, the server starts at ``http://localhost:3000/``."
msgstr "默认情况下，服务器启动在 http://localhost:3000/。"

#: ../../Qwen/source/deployment/openllm.rst:66 2e787de9a62f4342bdf8f88ee0df5379
msgid "Interact with the model server"
msgstr "与模型服务器交互"

#: ../../Qwen/source/deployment/openllm.rst:68 b22802ad9027458bb30ea0da665fea36
msgid "With the model server up and running, you can call its APIs in the following ways:"
msgstr "服务器运行后，可以通过以下方式调用其 API："

#: ../../Qwen/source/deployment/openllm.rst 76214ea690094930899d6f2eddcc1454
msgid "CURL"
msgstr "CURL"

#: ../../Qwen/source/deployment/openllm.rst:74 42775a3df58f474782d29f2f82707bd9
msgid "Send an HTTP request to its ``/generate`` endpoint via CURL:"
msgstr "通过 CURL 向其 ``/generate`` 端点发送 HTTP 请求："

#: ../../Qwen/source/deployment/openllm.rst 4f0ff3eee2ab49dda5a72bd611a9d45e
msgid "Python client"
msgstr "Python 客户端"

#: ../../Qwen/source/deployment/openllm.rst:91 ce2e11a46e434798947b1e74ce82a19c
msgid "Call the OpenAI-compatible endpoints with frameworks and tools that support the OpenAI API protocol. Here is an example:"
msgstr "使用支持 OpenAI API 协议的框架和工具来调用。例如："

#: ../../Qwen/source/deployment/openllm.rst 107921d1a855430ca70c8c163d37c7f2
msgid "Chat UI"
msgstr "聊天 UI"

#: ../../Qwen/source/deployment/openllm.rst:118
#: b92df2759cd54c2b8316e2a160ede656
msgid "OpenLLM provides a chat UI at the ``/chat`` endpoint for the LLM server at http://localhost:3000/chat."
msgstr "OpenLLM 为 LLM 服务器提供的聊天 UI 位于 ``/chat`` 端点，地址为 http://localhost:3000/chat。"

#: ../../Qwen/source/deployment/openllm.rst:123
#: 0d3fa679178f443caf9c87623001be1f
msgid "Model repository"
msgstr "模型仓库"

#: ../../Qwen/source/deployment/openllm.rst:125
#: 54d6a9bdcc064aeb95a23b60d3d575ab
msgid "A model repository in OpenLLM represents a catalog of available LLMs. You can add your own repository to OpenLLM with custom Qwen2.5 variants for your specific needs. See our `documentation to learn details <https://github.com/bentoml/OpenLLM?tab=readme-ov-file#model-repository>`_."
msgstr "OpenLLM 中的模型仓库表示可用的 LLM 目录。您可以为 OpenLLM 添加自定义的 Qwen2.5 模型仓库，以满足您的特定需求。请参阅 `我们的文档 <https://github.com/bentoml/OpenLLM?tab=readme-ov-file#model-repository>`_ 了解详细信息。"



================================================
FILE: docs/locales/zh_CN/LC_MESSAGES/deployment/sglang.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2024, Qwen Team
# This file is distributed under the same license as the Qwen package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-05-07 19:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"

#: ../../source/deployment/sglang.md:1 e05607ecb34c453aa8f805ea62edf34f
msgid "SGLang"
msgstr ""

#: ../../source/deployment/sglang.md:3 54dde79baa664197a2f3a5bb52383b70
msgid "[SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models and vision language models."
msgstr "[SGLang](https://github.com/sgl-project/sglang) 是一个用于大型语言模型和视觉语言模型的快速推理框架。"

#: ../../source/deployment/sglang.md:5 1ae08e7b1ffc4f0290eefb616eac1b63
msgid "To learn more about SGLang, please refer to the [documentation](https://docs.sglang.ai/)."
msgstr "要了解更多关于 SGLang 的信息，请参阅[官方文档](https://docs.sglang.ai/)。"

#: ../../source/deployment/sglang.md:7 927f96387c844f79a7cfa592e64fc1b2
msgid "Environment Setup"
msgstr "环境配置"

#: ../../source/deployment/sglang.md:9 e04e805b59364e96a366fa088fae04e4
msgid "By default, you can install `sglang` with pip in a clean environment:"
msgstr "默认情况下，你可以通过 pip 在新环境中安装 `sglang` ： "

#: ../../source/deployment/sglang.md:15 fcb185985f1b4c1589200ac4af2a6aee
msgid "If you have encountered issues in installation, please feel free to check the official document for installation ([link](https://docs.sglang.ai/start/install.html))."
msgstr "如果在安装过程中遇到问题，请随时查阅官方安装文档（[链接](https://docs.sglang.ai/start/install.html)）"

#: ../../source/deployment/sglang.md:17 a0f36bc7b4e24d598d381e2705f73eb1
msgid "API Service"
msgstr "API 服务"

#: ../../source/deployment/sglang.md:19 4d7006fa87884605b48700b05f602bb1
msgid "It is easy to build an OpenAI-compatible API service with SGLang, which can be deployed as a server that implements OpenAI API protocol. By default, it starts the server at `http://localhost:30000`.  You can specify the address with `--host` and `--port` arguments.  Run the command as shown below:"
msgstr "借助 SGLang ，构建一个与OpenAI API兼容的API服务十分简便，该服务可以作为实现OpenAI API协议的服务器进行部署。默认情况下，它将在 `http://localhost:30000` 启动服务器。您可以通过 `--host` 和 `--port` 参数来自定义地址。请按照以下所示运行命令："

#: ../../source/deployment/sglang.md:27 6d10b2003b9b4dd0b9dca0a2e8d33fd6
msgid "By default, if the `--model-path` does not point to a valid local directory, it will download the model files from the Hugging Face Hub. To download model from ModelScope, set the following before running the above command:"
msgstr "默认情况下，如果模型未指向有效的本地目录，它将从 Hugging Face Hub 下载模型文件。要从 ModelScope 下载模型，请在运行上述命令之前设置以下内容："

#: ../../source/deployment/sglang.md:33 d3cee58928964c5dba7720884d6c5189
msgid "For distributed inference with tensor parallelism, it is as simple as"
msgstr "对于使用张量并行的分布式推理，操作非常简单："

#: ../../source/deployment/sglang.md:37 4c8600c0f3ac4d0e803af9c089d73dae
msgid "The above command will use tensor parallelism on 4 GPUs. You should change the number of GPUs according to your demand."
msgstr "上述命令将在 4 块 GPU 上使用张量并行。您应根据需求调整 GPU 的数量。"

#: ../../source/deployment/sglang.md:40 4ca7c9376bd84c65a877134047aeee37
msgid "Basic Usage"
msgstr "基本用法"

#: ../../source/deployment/sglang.md:42 bd805ae178b6401c925a959334b64b88
msgid "Then, you can use the [create chat interface](https://platform.openai.com/docs/api-reference/chat/completions/create) to communicate with Qwen:"
msgstr "然后，您可以利用 [create chat interface](https://platform.openai.com/docs/api-reference/chat/completions/create) 来与Qwen进行对话："

#: ../../source/deployment/sglang.md 2f867c83bdce4a4286842da69aa68640
#: 418b07dd6a574642bfa89052103763e9
msgid "curl"
msgstr ""

#: ../../source/deployment/sglang.md 14df52980bfe41689ac8dc8699be2134
#: 7a50af3d10534acfbf980ac0d2ee92e5
msgid "Python"
msgstr ""

#: ../../source/deployment/sglang.md:62 ../../source/deployment/sglang.md:126
#: 669de086434740279e9cf7c54fb42e56 a3f9e92506374567a4660de9071567e8
msgid "You can use the API client with the `openai` Python SDK as shown below:"
msgstr "或者您可以如下面所示使用 `openai` Python SDK中的 API 客户端："

#: ../../source/deployment/sglang.md:92 d8321f81e9624419b5e0fdb7012816e4
msgid "While the default sampling parameters would work most of the time for thinking mode, it is recommended to adjust the sampling parameters according to your application,  and always pass the sampling parameters to the API."
msgstr "虽然默认的采样参数在大多数情况下适用于思考模式，但建议根据您的应用调整采样参数，并始终将采样参数传递给 API。"

#: ../../source/deployment/sglang.md:98 d6379b9f885748ca89bd3fe6c3362376
msgid "Thinking & Non-Thinking Modes"
msgstr "思考与非思考模式"

#: ../../source/deployment/sglang.md:100 f82eb1dfcc934667ac5aee0600140794
msgid "Qwen3 models will think before respond. This behavior could be controlled by either the hard switch, which could disable thinking completely, or the soft switch, where the model follows the instruction of the user on whether it should think."
msgstr "Qwen3 模型会在回复前进行思考。这种行为可以通过硬开关（完全禁用思考）或软开关（模型遵循用户关于是否应该思考的指令）来控制。"

#: ../../source/deployment/sglang.md:103 bac5d71126f04d149c0d674b7b2f7ec8
msgid "The hard switch is available in SGLang through the following configuration to the API call. To disable thinking, use"
msgstr "硬开关在 SGLang 中可以通过以下 API 调用配置使用。要禁用思考，请使用"

#: ../../source/deployment/sglang.md:158 09ccfb31c140452399460ed1357afc28
msgid "Please note that passing `enable_thinking` is not OpenAI API compatible. The exact method may differ among frameworks."
msgstr "请注意，`enable_thinking`并非OpenAI API定义的参数，具体传入方式可能因推理框架不同而不同。"

#: ../../source/deployment/sglang.md:163 650e618e24044303b48b6bc9d4ccc239
msgid "To completely disable thinking, you could use [a custom chat template](../../source/assets/qwen3_nonthinking.jinja) when starting the model:"
msgstr "要完全禁用思考，您可以在启动模型时使用[自定义聊天模板](../../source/assets/qwen3_nonthinking.jinja)："

#: ../../source/deployment/sglang.md:169 9c0dc646158541a991045064cfa5b258
msgid "The chat template prevents the model from generating thinking content, even if the user instructs the model to do so with `/think`."
msgstr "该聊天模板会阻止模型生成思考内容，即使用户通过 `/think` 指示模型这样做。"

#: ../../source/deployment/sglang.md:174 c23b692035b14b1099c8a148956457a5
msgid "It is recommended to set sampling parameters differently for thinking and non-thinking modes."
msgstr "建议为思考模式和非思考模式分别设置不同的采样参数。"

#: ../../source/deployment/sglang.md:177 c5c258baa5fa46ccbadb58573699a0f1
msgid "Parsing Thinking Content"
msgstr "解析思考内容"

#: ../../source/deployment/sglang.md:179 02d90ad41ecb4d51ae9f55458670843e
msgid "SGLang supports parsing the thinking content from the model generation into structured messages:"
msgstr "SGLang 支持将模型生成的思考内容解析为结构化消息："

#: ../../source/deployment/sglang.md:184 854a73931a9e404b9942a10dd2702023
msgid "The response message will have a field named `reasoning_content` in addition to `content`, containing the thinking content generated by the model."
msgstr "响应消息除了包含 `content` 字段外，还会有一个名为 `reasoning_content` 的字段，其中包含模型生成的思考内容。"

#: ../../source/deployment/sglang.md:187 0bae083925f64ec7984c1b7c86d00ac1
msgid "Please note that this feature is not OpenAI API compatible."
msgstr "请注意，此功能与 OpenAI API 规范不一致。"

#: ../../source/deployment/sglang.md:191 f23a3deb557a4d808cef5bdaad6dcf16
msgid "`enable_thinking=False` may not be compatible with this feature. If you need to pass `enable_thinking=False` to the API, please consider disabling parsing thinking content."
msgstr "`enable_thinking=False` 可能与思考内容解析不兼容。如果需要向 API 传递 `enable_thinking=False`，请考虑禁用该功能。"

#: ../../source/deployment/sglang.md:195 930b8e7391204fc68d6473fec1d2e4e0
msgid "Parsing Tool Calls"
msgstr "解析工具调用"

#: ../../source/deployment/sglang.md:197 8fb5272b079543219b125e70da4f89d3
msgid "SGLang supports parsing the tool calling content from the model generation into structured messages:"
msgstr "SGLang 支持将模型生成的工具调用内容解析为结构化消息："

#: ../../source/deployment/sglang.md:202 28ca5e5fc8694b839b91cb3f7f38a0cb
msgid "For more information, please refer to [our guide on Function Calling](../framework/function_call.md)."
msgstr "详细信息，请参阅[函数调用的指南](../framework/function_call.md)。"

#: ../../source/deployment/sglang.md:204 59cd747bac244c57afc56b7f3d041df8
msgid "Structured/JSON Output"
msgstr "结构化/JSON输出"

#: ../../source/deployment/sglang.md:206 4534e68747c041d5addd24c36fbc8250
msgid "SGLang supports structured/JSON output.  Please refer to [SGLang's documentation](https://docs.sglang.ai/backend/structured_outputs.html#OpenAI-Compatible-API). Besides, it is also recommended to instruct the model to generate the specific format in the system message or in your prompt."
msgstr "SGLang 支持结构化/JSON 输出。请参阅[SGLang 文档](https://docs.sglang.ai/backend/structured_outputs.html#OpenAI-Compatible-API)。此外，还建议在系统消息或您的提示中指示模型生成特定格式。"

#: ../../source/deployment/sglang.md:210 734cfd6d921e4706a07e112237b09b38
msgid "Serving Quantized models"
msgstr "部署量化模型"

#: ../../source/deployment/sglang.md:212 e7b0890292ad44278e910b6ee97f6d2d
msgid "Qwen3 comes with two types of pre-quantized models, FP8 and AWQ."
msgstr "Qwen3 提供了两种类型的预量化模型：FP8 和 AWQ。"

#: ../../source/deployment/sglang.md:214 0bb52b4e43504cb8ac143e594247a0e0
msgid "The command serving those models are the same as the original models except for the name change:"
msgstr "部署这些模型的命令与原始模型相同，只是名称有所更改："

#: ../../source/deployment/sglang.md:223 714f8f196af24271b6967dd038614f88
msgid "Context Length"
msgstr "上下文长度"

#: ../../source/deployment/sglang.md:225 ad211116852345b8bfb9bb9e58027486
msgid "The context length for Qwen3 models in pretraining is up to 32,768 tokens. To handle context length substantially exceeding 32,768 tokens, RoPE scaling techniques should be applied. We have validated the performance of [YaRN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts."
msgstr "Qwen3 模型在预训练中的上下文长度最长为 32,768 个 token。为了处理显著超过 32,768 个 token 的上下文长度，应应用 RoPE 缩放技术。我们已经验证了 [YaRN](https://arxiv.org/abs/2309.00071) 的性能，这是一种增强模型长度外推的技术，可确保在长文本上的最佳性能。"

#: ../../source/deployment/sglang.md:229 d243e7a41b214c289be782db495e82f4
msgid "SGLang supports YaRN, which can be configured as"
msgstr "SGLang 支持 YaRN，可以配置为"

#: ../../source/deployment/sglang.md:235 c15ed6a15a714884ab3024654203ec06
msgid "SGLang implements static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts.** We advise adding the `rope_scaling` configuration only when processing long contexts is required.  It is also recommended to modify the `factor` as needed. For example, if the typical context length for your application is 65,536 tokens, it would be better to set `factor` as 2.0."
msgstr "SGLang 实现了静态 YaRN，这意味着无论输入长度如何，缩放因子都保持不变，**这可能会对较短文本的性能产生影响。** 我们建议仅在需要处理长上下文时添加 `rope_scaling` 配置。还建议根据需要调整 `factor`。例如，如果您的应用程序的典型上下文长度为 65,536 个 token，则最好将 `factor` 设置为 2.0。"

#: ../../source/deployment/sglang.md:241 e0528eb23e2a454585b46ef178d28a79
msgid "The default `max_position_embeddings` in `config.json` is set to 40,960, which is used by SGLang. This allocation includes reserving 32,768 tokens for outputs and 8,192 tokens for typical prompts, which is sufficient for most scenarios involving short text processing and leave adequate room for model thinking. If the average context length does not exceed 32,768 tokens, we do not recommend enabling YaRN in this scenario, as it may potentially degrade model performance."
msgstr "`config.json` 中的默认 `max_position_embeddings` 被设置为 40,960，SGLang 将使用该值。此分配包括为输出保留 32,768 个 token，为典型提示保留 8,192 个 token，这足以应对大多数涉及短文本处理的场景，并为模型思考留出充足空间。如果平均上下文长度不超过 32,768 个 token，我们不建议在此场景中启用 YaRN，因为这可能会降低模型性能。"

#~ msgid "Please note that `sglang` relies on `flashinfer-python` and has strict dependencies on `torch` and its CUDA versions. Check the note in the official document for installation ([link](https://docs.sglang.ai/start/install.html)) for more help."
#~ msgstr "请留意预构建的 `sglang` 依赖 `flashinfer-python`，并对`torch`和其CUDA版本有强依赖。请查看[官方文档](https://docs.sglang.ai/start/install.html)中的注意事项以获取有关安装的帮助。"

#~ msgid "This feature has not been released. For more information, please see this [pull request](https://github.com/sgl-project/sglang/pull/5551)."
#~ msgstr "此功能尚未发布。更多信息，请参阅此[pull request](https://github.com/sgl-project/sglang/pull/5551)。"



================================================
FILE: docs/locales/zh_CN/LC_MESSAGES/deployment/skypilot.po
================================================
# Copyright (C) 2024, Qwen Team, Alibaba Group.
# This file is distributed under the same license as the Qwen package.
#
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-04-28 19:42+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"

#: ../../Qwen/source/deployment/skypilot.rst:2 795ad4f30e27494d93675f71bb1a5cc4
msgid "SkyPilot"
msgstr ""

#: ../../Qwen/source/deployment/skypilot.rst:5 aad807db94a24d868c9c1b364b47e152
msgid "To be updated for Qwen3."
msgstr "仍需为Qwen3更新。"

#: ../../Qwen/source/deployment/skypilot.rst:8 d6bbf736584f4bbfa9c300d50a2ed669
msgid "What is SkyPilot"
msgstr "SkyPilot 是什么"

#: ../../Qwen/source/deployment/skypilot.rst:10
#: b66facae41bf493880e43044e2915a45
msgid "SkyPilot is a framework for running LLMs, AI, and batch jobs on any cloud, offering maximum cost savings, the highest GPU availability, and managed execution. Its features include:"
msgstr "SkyPilot 是一个可以在任何云上运行 LLM 、 AI 应用以及批量任务的框架，旨在实现最大程度的成本节省、最高的 GPU 可用性以及受管理的执行过程。其特性包括："

#: ../../Qwen/source/deployment/skypilot.rst:14
#: 621f021163c549d0aadb1c911a3a3ef5
msgid "Get the best GPU availability by utilizing multiple resources pools across multiple regions and clouds."
msgstr "通过跨区域和跨云充分利用多个资源池，以获得最佳的 GPU 可用性。"

#: ../../Qwen/source/deployment/skypilot.rst:16
#: ea1723c3b5be454cad3219836f4386d8
msgid "Pay absolute minimum — SkyPilot picks the cheapest resources across regions and clouds. No managed solution markups."
msgstr "把费用降到最低—— SkyPilot 在各区域和云平台中为您挑选最便宜的资源。无需任何托管解决方案的额外加价。"

#: ../../Qwen/source/deployment/skypilot.rst:18
#: e479693ecf08411ca35d8d0727c8f441
msgid "Scale up to multiple replicas across different locations and accelerators, all served with a single endpoint"
msgstr "将服务扩展到跨不同位置和加速器的多个副本上，所有副本通过单一 endpoint 对外提供服务"

#: ../../Qwen/source/deployment/skypilot.rst:20
#: 1f9cdd2ae2544d1faa8a4c463ee0e42c
msgid "Everything stays in your cloud account (your VMs & buckets)"
msgstr "所有内容均保存在您的云账户中（包括您的虚拟机和 bucket ）"

#: ../../Qwen/source/deployment/skypilot.rst:21
#: 5bb9b617764942d989e5093463a359f0
msgid "Completely private - no one else sees your chat history"
msgstr "完全私密 - 没有其他人能看到您的聊天记录"

#: ../../Qwen/source/deployment/skypilot.rst:24
#: cf0c456ac72f40ac98790c11dc243317
msgid "Install SkyPilot"
msgstr "安装 SkyPilot"

#: ../../Qwen/source/deployment/skypilot.rst:26
#: 78d86c1fa8104b138b01aed640b262fc
msgid "We advise you to follow the `instruction <https://skypilot.readthedocs.io/en/latest/getting-started/installation.html>`__ to install SkyPilot. Here we provide a simple example of using ``pip`` for the installation as shown below."
msgstr "我们建议您按照 `指示 <https://skypilot.readthedocs.io/en/latest/getting-started/installation.html>`__ 安装 SkyPilot 。以下为您提供了一个使用 ``pip`` 进行安装的简单示例："

#: ../../Qwen/source/deployment/skypilot.rst:38
#: a7c88265bf404f55b85388c81a240199
msgid "After that, you need to verify cloud access with a command like:"
msgstr "随后，您需要用如下命令确认是否能使用云："

#: ../../Qwen/source/deployment/skypilot.rst:44
#: 72025dfba0144f63a720f6da0dd39bfa
msgid "For more information, check the `official document <https://skypilot.readthedocs.io/en/latest/getting-started/installation.html>`__ and see if you have set up your cloud accounts correctly."
msgstr "若需更多信息，请查阅 `官方文档 <https://skypilot.readthedocs.io/en/latest/getting-started/installation.html>`__ ，确认您的云账户设置是否正确无误。"

#: ../../Qwen/source/deployment/skypilot.rst:47
#: 61be006061554e5ea40d55497e11e192
msgid "Alternatively, you can also use the official docker image with SkyPilot master branch automatically cloned by running:"
msgstr "或者，您也可以使用官方提供的 docker 镜像，可以自动克隆 SkyPilot 的主分支："

#: ../../Qwen/source/deployment/skypilot.rst:63
#: 4ae89fb44c6643a3a82fca5cee622af4
msgid "Running Qwen2.5-72B-Instruct with SkyPilot"
msgstr "使用 SkyPilot 运行 Qwen2.5-72B-Instruct"

#: ../../Qwen/source/deployment/skypilot.rst:65
#: 1bc4973c2eb745689ded0af54ba33e0e
msgid "Start serving Qwen2.5-72B-Instruct on a single instance with any available GPU in the list specified in `serve-72b.yaml <https://github.com/skypilot-org/skypilot/blob/master/llm/qwen/serve-72b.yaml>`__ with a vLLM-powered OpenAI-compatible endpoint:"
msgstr "`serve-72b.yaml <https://github.com/skypilot-org/skypilot/blob/master/llm/qwen/serve-72b.yaml>`__ 中列出了支持的 GPU 。您可使用配备这类 GPU 的单个运算实例来部署 Qwen2.5-72B-Instruct 服务。该服务由 vLLM 搭建，并与 OpenAI API 兼容。以下为部署方法："

#: ../../Qwen/source/deployment/skypilot.rst:74
#: ../../Qwen/source/deployment/skypilot.rst:123
#: ac3692ed16974facbd58b6886cd111af b325de015e7b4bb0a91491d3f7418792
msgid "**Before launching, make sure you have changed Qwen/Qwen2-72B-Instruct to Qwen/Qwen2.5-72B-Instruct in the YAML file.**"
msgstr "**在启动之前，请先将 YAML 文件中的 Qwen/Qwen2-72B-Instruct 修改为 Qwen/Qwen2.5-72B-Instruct。**"

#: ../../Qwen/source/deployment/skypilot.rst:76
#: 6046b3c86fae4a43878fbadbeb33fbd8
msgid "Send a request to the endpoint for completion:"
msgstr "向该 endpoint 发送续写请求："

#: ../../Qwen/source/deployment/skypilot.rst:90
#: 2ec56c2028a94f568fd2c1a65063d25a
msgid "Send a request for chat completion:"
msgstr "向该 endpoint 发送对话续写请求："

#: ../../Qwen/source/deployment/skypilot.rst:112
#: c8e140ddfd914ff5a460621a7ca1891e
msgid "Scale up the service with SkyPilot Serve"
msgstr "使用 SkyPilot Serve 扩展服务规模"

#: ../../Qwen/source/deployment/skypilot.rst:114
#: 0db304ab396d45adb6017d78cd1ee4a2
msgid "With `SkyPilot Serve <https://skypilot.readthedocs.io/en/latest/serving/sky-serve.html>`__, a serving library built on top of SkyPilot, scaling up the Qwen service is as simple as running:"
msgstr "`SkyPilot Serve <https://skypilot.readthedocs.io/en/latest/serving/sky-serve.html>`__ 是构建在 SkyPilot 之上的服务库。使用它扩展 Qwen 的服务规模非常容易，只需运行："

#: ../../Qwen/source/deployment/skypilot.rst:125
#: 25bbbf9e49be44d3899074ff97202d71
msgid "This will start the service with multiple replicas on the cheapest available locations and accelerators. SkyServe will automatically manage the replicas, monitor their health, autoscale based on load, and restart them when needed."
msgstr "这将启动服务，使用多个副本部署在最经济的可用位置和加速器上。 SkyServe 将自动管理这些副本，监控其健康状况，根据负载进行自动伸缩，并在必要时重启它们。"

#: ../../Qwen/source/deployment/skypilot.rst:130
#: bda628bab7ef41a0918dc4b80a9b3cfe
msgid "A single endpoint will be returned and any request sent to the endpoint will be routed to the ready replicas."
msgstr "将返回一个 endpoint ，所有发送至该 endpoint 的请求都将被路由至就绪状态的副本。"

#: ../../Qwen/source/deployment/skypilot.rst:133
#: b232dbbdcf674d56bcf9c0331c020864
msgid "To check the status of the service, run:"
msgstr "运行如下命令检查服务的状态："

#: ../../Qwen/source/deployment/skypilot.rst:139
#: 556b854caf7243fb93f253ebe2dc9033
msgid "After a while, you will see the following output:"
msgstr "稍等片刻后，您将看到如下输出："

#: ../../Qwen/source/deployment/skypilot.rst:152
#: 5a6055c5a42c4b2db6693c1095688de8
msgid "As shown, the service is now backed by 2 replicas, one on Azure and one on GCP, and the accelerator type is chosen to be **the cheapest available one** on the clouds. That said, it maximizes the availability of the service while minimizing the cost."
msgstr "如上所示，该服务现由两个副本提供支持，一个位于 Azure 平台，另一个位于 GCP 平台。同时，已为服务选择云服务商提供的 **最经济实惠** 的加速器类型。这样既最大限度地提升了服务的可用性，又尽可能降低了成本。"

#: ../../Qwen/source/deployment/skypilot.rst:157
#: a18533d33dc54a1091ded0b4bba0a1eb
msgid "To access the model, we use a ``curl -L`` command (``-L`` to follow redirect) to send the request to the endpoint:"
msgstr "要访问模型，我们使用 ``curl -L`` 命令（ ``-L`` 用于跟随重定向）将请求发送到 endpoint ："

#: ../../Qwen/source/deployment/skypilot.rst:182
#: 34cd50fd79e24d8895075f7841b025e4
msgid "Accessing Qwen2.5 with Chat GUI"
msgstr "使用 Chat GUI 调用 Qwen2.5"

#: ../../Qwen/source/deployment/skypilot.rst:184
#: ca6994cda1cb469e83ce8c026bb67e42
msgid "It is also possible to access the Qwen2.5 service with GUI by connecting a `FastChat GUI server <https://github.com/lm-sys/FastChat>`__ to the endpoint launched above (see `gui.yaml <https://github.com/skypilot-org/skypilot/blob/master/llm/qwen/gui.yaml>`__)."
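msgstr "也可以将 `FastChat GUI 服务器 <https://github.com/lm-sys/FastChat>`__ 连接到上面启动的 endpoint ，通过 GUI 访问 Qwen2.5 服务（参见 `gui.yaml <https://github.com/skypilot-org/skypilot/blob/master/llm/qwen/gui.yaml>`__ ）。"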

#: ../../Qwen/source/deployment/skypilot.rst:188
#: 99a63e55ab5c46258c20ab89cdfa39dc
msgid "Start the Chat Web UI:"
msgstr "开启一个 Chat Web UI ："

#: ../../Qwen/source/deployment/skypilot.rst:194
#: e61593a092c146f8a06af896d6af17f2
msgid "**Before launching, make sure you have changed Qwen/Qwen1.5-72B-Chat to Qwen/Qwen2.5-72B-Instruct in the YAML file.**"
msgstr "**在启动之前，请先将 YAML 文件中的 Qwen/Qwen1.5-72B-Chat 修改为 Qwen/Qwen2.5-72B-Instruct。**"

#: ../../Qwen/source/deployment/skypilot.rst:196
#: 9631068a8b424aa8af6dc6911daac7a9
msgid "Then, we can access the GUI at the returned gradio link:"
msgstr "随后，我们可以通过返回的 gradio 链接来访问 GUI ："

#: ../../Qwen/source/deployment/skypilot.rst:202
#: 1464a56dcd06404aafbe6d7d2c72212b
msgid "Note that you may get better results by using a different temperature and top_p value."
msgstr "你可以通过使用不同的温度和 top_p 值来尝试取得更好的结果。"

#: ../../Qwen/source/deployment/skypilot.rst:205
#: d257f49d835e4c12b28bc680bb78a9cb
msgid "Summary"
msgstr "总结"

#: ../../Qwen/source/deployment/skypilot.rst:207
#: 06b9684a19774eaba4f69862332c5166
msgid "With SkyPilot, it is easy for you to deploy Qwen2.5 on any cloud. We advise you to read the official doc for more usages and updates. Check `this <https://skypilot.readthedocs.io/>`__ out!"
msgstr "通过 SkyPilot ，您可以轻松地在任何云上部署 Qwen2.5 。我们建议您阅读 `官方文档 <https://skypilot.readthedocs.io/>`__ 了解更多用法和最新进展。"



================================================
FILE: docs/locales/zh_CN/LC_MESSAGES/deployment/tgi.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2024, Qwen Team
# This file is distributed under the same license as the Qwen package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2024.
#
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-04-28 19:42+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"

#: ../../Qwen/source/deployment/tgi.rst:2 2abcc96f9deb4b9187ac9d88fc69e929
msgid "TGI"
msgstr ""

#: ../../Qwen/source/deployment/tgi.rst:5 2d124d7cb95f47388aa48c662932ef9b
msgid "To be updated for Qwen3."
msgstr "仍需为Qwen3更新。"

#: ../../Qwen/source/deployment/tgi.rst:7 4e5d299c4fdd46d5aba38c9af5765792
msgid "Hugging Face's Text Generation Inference (TGI) is a production-ready framework specifically designed for deploying and serving large language models (LLMs) for text generation tasks. It offers a seamless deployment experience, powered by a robust set of features:"
msgstr "Hugging Face 的 Text Generation Inference (TGI) 是一个专为部署大规模语言模型 (Large Language Models, LLMs) 而设计的生产级框架。TGI提供了流畅的部署体验，并稳定支持如下特性："

#: ../../Qwen/source/deployment/tgi.rst:9 ecd4fc11a95140959915d062791ceba1
msgid "`Speculative Decoding <Speculative Decoding_>`_: Accelerates generation speeds."
msgstr "`推测解码 (Speculative Decoding) <Speculative Decoding_>`_ ：提升生成速度。"

#: ../../Qwen/source/deployment/tgi.rst:10 84590a56416348bf85b3f296cf57e257
msgid "`Tensor Parallelism`_: Enables efficient deployment across multiple GPUs."
msgstr "张量并行 (`Tensor Parallelism`_) ：高效多卡部署。"

#: ../../Qwen/source/deployment/tgi.rst:11 a996d6ecd7b94c5cb9752d370f29a9b1
msgid "`Token Streaming`_: Allows for the continuous generation of text."
msgstr "流式生成 (`Token Streaming`_) ：支持持续性生成文本。"

#: ../../Qwen/source/deployment/tgi.rst:12 8f591c045ba34f4581bb19652db9f9b3
msgid "Versatile Device Support: Works seamlessly with `AMD`_, `Gaudi`_ and `AWS Inferentia`_."
msgstr "灵活的硬件支持：与 `AMD`_ ， `Gaudi`_ 和 `AWS Inferentia`_ 无缝衔接。"

#: ../../Qwen/source/deployment/tgi.rst:21 5e8a98b91fc146e0b581422faa683a18
msgid "Installation"
msgstr "安装"

#: ../../Qwen/source/deployment/tgi.rst:23 684ef25bfb0e460999d6dcccce41b85f
msgid "The easiest way to use TGI is via the TGI docker image. In this guide, we show how to use TGI with docker."
msgstr "使用 TGI 最简单的方式是通过 TGI docker 镜像。本文将介绍如何通过 docker 使用 TGI 。"

#: ../../Qwen/source/deployment/tgi.rst:25 c563fa3eccb04d00a477c1d2e8b15c38
msgid "It's possible to run it locally via Conda or build locally. Please refer to `Installation Guide <https://huggingface.co/docs/text-generation-inference/installation>`_  and `CLI tool <https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/using_cli>`_ for detailed instructions."
msgstr "也可通过 Conda 实机安装或搭建服务。请参考 `Installation Guide <https://huggingface.co/docs/text-generation-inference/installation>`_ 与 `CLI tool <https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/using_cli>`_ 以了解详细说明。"

#: ../../Qwen/source/deployment/tgi.rst:28 b55fc58ff4cb472abca08296409c7837
msgid "Deploy Qwen2.5 with TGI"
msgstr "通过 TGI 部署 Qwen2.5"

#: ../../Qwen/source/deployment/tgi.rst:30 586a8425ec5d413592fd7daf579c7e87
msgid "**Find a Qwen2.5 Model:** Choose a model from `the Qwen2.5 collection <https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e>`_."
msgstr "**选定 Qwen2.5 模型：** 从 `the Qwen2.5 collection <https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e>`_ 中挑选模型。"

#: ../../Qwen/source/deployment/tgi.rst:31 50fcab8da35941eca308786979dbaf38
msgid "**Deployment Command:** Run the following command in your terminal, replacing ``model`` with your chosen Qwen2.5 model ID and ``volume`` with the path to your local data directory:"
msgstr "**部署命令：** 在终端中运行以下命令，注意替换 ``model`` 为选定的 Qwen2.5 模型 ID 、 ``volume`` 为本地的数据路径："

#: ../../Qwen/source/deployment/tgi.rst:42 2a800533a7d84bdeab1da0976b0cab53
msgid "Using TGI API"
msgstr "使用 TGI API"

#: ../../Qwen/source/deployment/tgi.rst:44 f05d1ec08140452782d0659543fad7d1
msgid "Once deployed, the model will be available on the mapped port (8080)."
msgstr "一旦成功部署，API 将于选定的映射端口 (8080) 提供服务。"

#: ../../Qwen/source/deployment/tgi.rst:46 f265dc1522b049c98ba31fd5d255c50f
msgid "TGI comes with a handy API for streaming response:"
msgstr "TGI 提供了简单直接的 API 支持流式生成："

#: ../../Qwen/source/deployment/tgi.rst:54 e9cc4c0571b74bd08b2a59347503e653
msgid "It's also available on OpenAI style API:"
msgstr "也可使用 OpenAI 风格的 API 使用 TGI ："

#: ../../Qwen/source/deployment/tgi.rst:73 5dc7e9c74fc04483ba8e5dcdd7052020
msgid "The model field in the JSON is not used by TGI, you can put anything."
msgstr "JSON 中的 model 字段不会被 TGI 识别，您可传入任意值。"

#: ../../Qwen/source/deployment/tgi.rst:75 d60f837152014cda8baebc90d65d1cc0
#, python-format
msgid "Refer to the `TGI Swagger UI <https://huggingface.github.io/text-generation-inference/#/Text%20Generation%20Inference/completions>`_ for a complete API reference."
msgstr "完整 API 文档，请查阅 `TGI Swagger UI <https://huggingface.github.io/text-generation-inference/#/Text%20Generation%20Inference/completions>`_ 。"

#: ../../Qwen/source/deployment/tgi.rst:77 b59564031e5548088aef828f9753e337
msgid "You can also use Python API:"
msgstr "你也可以使用 Python 访问 API ："

#: ../../Qwen/source/deployment/tgi.rst:106 62646cecb024479ebfeca5f3063e7322
msgid "Quantization for Performance"
msgstr "量化"

#: ../../Qwen/source/deployment/tgi.rst:108 4a8d39bf37be4820afb230f9a977b431
msgid "Data-dependent quantization (GPTQ and AWQ)"
msgstr "依赖数据的量化方案（ GPTQ 与 AWQ ）"

#: ../../Qwen/source/deployment/tgi.rst:110 ef2b18f47e4f4f7ebb017be628cb0be9
msgid "Both GPTQ and AWQ models are data-dependent. The official quantized models can be found from `the Qwen2.5 collection`_ and you can also quantize models with your own dataset to make it perform better on your use case."
msgstr "GPTQ 与 AWQ 均依赖数据进行量化。我们提供了预先量化好的模型，请于 `the Qwen2.5 collection`_ 查找。你也可以使用自己的数据集自行量化，以在你的场景中取得更好效果。"

#: ../../Qwen/source/deployment/tgi.rst:112 53d94278a2e3409abb9980ebc7c96c24
msgid "The following shows the command to start TGI with Qwen2.5-7B-Instruct-GPTQ-Int4:"
msgstr "以下是通过 TGI 部署 Qwen2.5-7B-Instruct-GPTQ-Int4 的指令："

#: ../../Qwen/source/deployment/tgi.rst:122 68ff8a07d0eb40cfa67d79e01adea070
msgid "If the model is quantized with AWQ, e.g. Qwen/Qwen2.5-7B-Instruct-AWQ, please use ``--quantize awq``."
msgstr "如果模型是 AWQ 量化的，如 Qwen/Qwen2.5-7B-Instruct-AWQ ，请使用 ``--quantize awq`` 。"

#: ../../Qwen/source/deployment/tgi.rst:124 b4c3b82b1f2a43a8a02383fd0afbda5f
msgid "Data-agnostic quantization"
msgstr "不依赖数据的量化方案"

#: ../../Qwen/source/deployment/tgi.rst:126 7a6b89c94b72407482b96790f5bbd272
msgid "EETQ on the other side is not data dependent and can be used with any model. Note that we're passing in the original model (instead of a quantized model) with the ``--quantize eetq`` flag."
msgstr "EETQ 则是一种不依赖数据的量化方案，可直接用于任意模型。请注意，我们需要传入原始模型（而非量化模型），并使用 ``--quantize eetq`` 标志。"

#: ../../Qwen/source/deployment/tgi.rst:138 763166da65924887b3bba99ea4d2baab
msgid "Multi-Accelerators Deployment"
msgstr "多卡部署"

#: ../../Qwen/source/deployment/tgi.rst:140 ddcfcff947894f168c7945ae9c42a579
msgid "Use the ``--num-shard`` flag to specify the number of accelerators. Please also use ``--shm-size 1g`` to enable shared memory for optimal NCCL performance (`reference <https://github.com/huggingface/text-generation-inference?tab=readme-ov-file#a-note-on-shared-memory-shm>`__):"
msgstr "使用 ``--num-shard`` 指定卡的数量。请务必传入 ``--shm-size 1g`` 让 NCCL 发挥最好性能 (`说明 <https://github.com/huggingface/text-generation-inference?tab=readme-ov-file#a-note-on-shared-memory-shm>`__) ："

#: ../../Qwen/source/deployment/tgi.rst:151 520c46fb404c4ec9bf89280e4a71f1e8
msgid "Speculative Decoding"
msgstr "推测性解码 (Speculative Decoding)"

#: ../../Qwen/source/deployment/tgi.rst:153 74c6b65f76b74d56ad109af9da11f66e
msgid "Speculative decoding can reduce the time per token by speculating on the next token. Use the ``--speculative-decoding`` flag, setting the value to the number of tokens to speculate on (default: 0 for no speculation):"
msgstr "推测性解码 (Speculative Decoding) 通过预先推测下一 token 来节约每 token 需要的时间。使用 ``--speculative-decoding`` 设定预先推测 token 的数量 （默认为0，表示不预先推测）："

#: ../../Qwen/source/deployment/tgi.rst:164 dee05ee0fb1a4f2da42b250192d943f5
msgid "The overall performance of speculative decoding highly depends on the type of task. It works best for code or highly repetitive text."
msgstr "推测性解码的加速效果依赖于任务类型，对于代码或重复性较高的文本生成任务，提速更明显。"

#: ../../Qwen/source/deployment/tgi.rst:166 731f300bc1174589901dd5feb26e8b2f
msgid "More context on speculative decoding can be found `here <https://huggingface.co/docs/text-generation-inference/conceptual/speculation>`__."
msgstr "更多说明可查阅 `此文档 <https://huggingface.co/docs/text-generation-inference/conceptual/speculation>`__ 。"

#: ../../Qwen/source/deployment/tgi.rst:170 65a7d5553dd145398f9705c1ee6c28f0
msgid "Zero-Code Deployment with HF Inference Endpoints"
msgstr "使用 HF Inference Endpoints 零代码部署"

#: ../../Qwen/source/deployment/tgi.rst:172 721c3a7578f846ae8e21e595923e17e7
msgid "For effortless deployment, leverage Hugging Face Inference Endpoints:"
msgstr "使用 Hugging Face Inference Endpoints 不费吹灰之力："

#: ../../Qwen/source/deployment/tgi.rst:174 7741607488d94a9f8be2ffcb6a5322fb
msgid "**GUI interface:** `<https://huggingface.co/inference-endpoints/dedicated>`__"
msgstr ""

#: ../../Qwen/source/deployment/tgi.rst:175 02ff4520e66f4a42828483da7d25445f
msgid "**Coding interface:** `<https://huggingface.co/blog/tgi-messages-api>`__"
msgstr ""

#: ../../Qwen/source/deployment/tgi.rst:177 d35f9dd4bc96400cb6c7584012d2df49
msgid "Once deployed, the endpoint can be used as usual."
msgstr "一旦部署成功，服务使用与本地无异。"

#: ../../Qwen/source/deployment/tgi.rst:181 61c1b825bbf24be2aaaeb99de3f0660e
msgid "Common Issues"
msgstr "常见问题"

#: ../../Qwen/source/deployment/tgi.rst:183 b55a2d286fc24dbe92b79ab5c010c7af
msgid "Qwen2.5 supports long context lengths, so carefully choose the values for ``--max-batch-prefill-tokens``, ``--max-total-tokens``, and ``--max-input-tokens`` to avoid potential out-of-memory (OOM) issues. If an OOM occurs, you'll receive an error message upon startup. The following shows an example to modify those parameters:"
msgstr "Qwen2.5 支持长上下文，请谨慎设定 ``--max-batch-prefill-tokens`` ， ``--max-total-tokens`` 和 ``--max-input-tokens`` 以避免 out-of-memory (OOM) 。如果发生 OOM ，您将在启动 TGI 时收到错误提示。以下为修改这些参数的示例："



================================================
FILE: docs/locales/zh_CN/LC_MESSAGES/deployment/vllm.po
================================================
# Copyright (C) 2024, Qwen Team, Alibaba Group.
# This file is distributed under the same license as the Qwen package.
#
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-06-13 16:50+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.16.0\n"

#: ../../source/deployment/vllm.md:1 d5c0a6a59a4e4efdba77515c0c05f04a
msgid "vLLM"
msgstr ""

#: ../../source/deployment/vllm.md:3 56317e0cd3104065a8496366f9bdcb67
msgid "We recommend you trying [vLLM](https://github.com/vllm-project/vllm) for your deployment of Qwen.  It is simple to use, and it is fast with state-of-the-art serving throughput, efficient management of attention key value memory with PagedAttention, continuous batching of input requests, optimized CUDA kernels, etc.  To learn more about vLLM, please refer to the [paper](https://arxiv.org/abs/2309.06180) and [documentation](https://docs.vllm.ai/)."
msgstr "我们建议您在部署 Qwen 时尝试使用 [vLLM](https://github.com/vllm-project/vllm)。它易于使用，且具有最先进的服务吞吐量、高效的注意力键值内存管理（通过PagedAttention实现）、连续批处理输入请求、优化的CUDA内核等功能。要了解更多关于vLLM的信息，请参阅 [论文](https://arxiv.org/abs/2309.06180) 和 [文档](https://docs.vllm.ai/)。"

#: ../../source/deployment/vllm.md:7 81077ec6d7594b85b64857ee22883693
msgid "Environment Setup"
msgstr "环境配置"

#: ../../source/deployment/vllm.md:9 f80ee07832674dbfb3c547d0bd003720
msgid "By default, you can install `vllm` with pip in a clean environment:"
msgstr "默认情况下，你可以通过 pip 在新环境中安装 `vllm` ： "

#: ../../source/deployment/vllm.md:15 7c0520bacc8941108def669c821603a9
msgid "Please note that the prebuilt `vllm` has strict dependencies on `torch` and its CUDA versions. Check the note in the official document for installation ([link](https://docs.vllm.ai/en/latest/getting_started/installation.html)) for more help."
msgstr "请留意预构建的`vllm`对`torch`和其CUDA版本有强依赖。请查看[vLLM官方文档](https://docs.vllm.ai/en/latest/getting_started/installation.html)中的注意事项以获取有关安装的帮助。"

#: ../../source/deployment/vllm.md:18 ab785eb82e7f40a08269510d1cb5610d
msgid "API Service"
msgstr "API 服务"

#: ../../source/deployment/vllm.md:20 fa0b4510d60d489c94435c334100e413
msgid "It is easy to build an OpenAI-compatible API service with vLLM, which can be deployed as a server that implements OpenAI API protocol. By default, it starts the server at `http://localhost:8000`.  You can specify the address with `--host` and `--port` arguments.  Run the command as shown below:"
msgstr "借助vLLM，构建一个与OpenAI API兼容的API服务十分简便，该服务可以作为实现OpenAI API协议的服务器进行部署。默认情况下，它将在 `http://localhost:8000` 启动服务器。您可以通过 `--host` 和 `--port` 参数来自定义地址。请按照以下所示运行命令："

#: ../../source/deployment/vllm.md:28 66e317eb19424af7a54eb52093e7945b
msgid "By default, if the model does not point to a valid local directory, it will download the model files from the Hugging Face Hub. To download model from ModelScope, set the following before running the above command:"
msgstr "默认情况下，如果模型未指向有效的本地目录，它将从 Hugging Face Hub 下载模型文件。要从 ModelScope 下载模型，请在运行上述命令之前设置以下内容："

#: ../../source/deployment/vllm.md:34 4d9631b6922c4f08b9be1ec85a956309
msgid "For distributed inference with tensor parallelism, it is as simple as"
msgstr "对于使用张量并行的分布式推理，操作非常简单："

#: ../../source/deployment/vllm.md:38 33cc8d44a1434af2af200659add2e57f
msgid "The above command will use tensor parallelism on 4 GPUs. You should change the number of GPUs according to your demand."
msgstr "上述命令将在 4 块 GPU 上使用张量并行。您应根据需求调整 GPU 的数量。"

#: ../../source/deployment/vllm.md:41 605a634a44e3401ab46c77933aa2817e
msgid "Basic Usage"
msgstr "基本用法"

#: ../../source/deployment/vllm.md:43 58fb05b376b545e4892584539004c4c8
msgid "Then, you can use the [create chat interface](https://platform.openai.com/docs/api-reference/chat/completions/create) to communicate with Qwen:"
msgstr "然后，您可以利用 [create chat interface](https://platform.openai.com/docs/api-reference/chat/completions/create) 来与Qwen进行对话："

#: ../../source/deployment/vllm.md b9a21dd5fe924a36ae4d655aa7c2d127
#: f95affb6c52340bd8623c319fa8a159f
msgid "curl"
msgstr ""

#: ../../source/deployment/vllm.md 3c6242c05b90435dbf1c971ce061e127
#: 92e58b5637bd4af1bc4df269071a3df5
msgid "Python"
msgstr ""

#: ../../source/deployment/vllm.md:63 ../../source/deployment/vllm.md:129
#: 12be4f0cf2b9495faee25799b266a799 855aea8cb2fa48f7a63c6a511cb03fd5
msgid "You can use the API client with the `openai` Python SDK as shown below:"
msgstr "或者您可以如下面所示使用 `openai` Python SDK中的 API 客户端："

#: ../../source/deployment/vllm.md:93 85d6fac1e7e24de1983be918b3e0ea3e
msgid "`vllm` will use the sampling parameters from the `generation_config.json` in the model files."
msgstr "`vllm` 将使用模型文件中 `generation_config.json` 的采样参数。"

#: ../../source/deployment/vllm.md:95 1323928e307d4ed9bcd1e538e61f0d2d
msgid "While the default sampling parameters would work most of the time for thinking mode, it is recommended to adjust the sampling parameters according to your application,  and always pass the sampling parameters to the API."
msgstr "虽然默认的采样参数在大多数情况下适用于思考模式，但建议根据您的应用调整采样参数，并始终将采样参数传递给 API。"

#: ../../source/deployment/vllm.md:101 84e3b8ae27ab4242973d411876f79250
msgid "Thinking & Non-Thinking Modes"
msgstr "思考与非思考模式"

#: ../../source/deployment/vllm.md:103 80dddb284cb8439399ad0e3c2227cc0e
msgid "Qwen3 models will think before respond. This behavior could be controlled by either the hard switch, which could disable thinking completely, or the soft switch, where the model follows the instruction of the user on whether it should think."
msgstr "Qwen3 模型会在回复前进行思考。这种行为可以通过硬开关（完全禁用思考）或软开关（模型遵循用户关于是否应该思考的指令）来控制。"

#: ../../source/deployment/vllm.md:106 88d3ca0f6faf4b8b80012f0a1b3a53ea
msgid "The hard switch is available in vLLM through the following configuration to the API call. To disable thinking, use"
msgstr "硬开关在 vLLM 中可以通过以下 API 调用配置使用。要禁用思考，请使用"

#: ../../source/deployment/vllm.md:162 bfc88862f9e5470e8d8e213003c310bc
msgid "Please note that passing `enable_thinking` is not OpenAI API compatible. The exact method may differ among frameworks."
msgstr "请注意，`enable_thinking`并非OpenAI API定义的参数，具体传入方式可能因推理框架不同而不同。"

#: ../../source/deployment/vllm.md:167 1d3ab8c3e2d24c0d9961b38686661005
msgid "To completely disable thinking, you could use [a custom chat template](../../source/assets/qwen3_nonthinking.jinja) when starting the model:"
msgstr "要完全禁用思考，您可以在启动模型时使用[自定义聊天模板](../../source/assets/qwen3_nonthinking.jinja)："

#: ../../source/deployment/vllm.md:173 495c3e31b2614dd8bad7f0eba0d48236
msgid "The chat template prevents the model from generating thinking content, even if the user instructs the model to do so with `/think`."
msgstr "该聊天模板会阻止模型生成思考内容，即使用户通过 `/think` 指示模型这样做。"

#: ../../source/deployment/vllm.md:178 02a8fc8d1024402ebc4f9449aa2500f3
msgid "It is recommended to set sampling parameters differently for thinking and non-thinking modes."
msgstr "建议为思考模式和非思考模式分别设置不同的采样参数。"

#: ../../source/deployment/vllm.md:182 35e320018c0646cb82ad0d2f9fc54ea8
msgid "Parsing Thinking Content"
msgstr "解析思考内容"

#: ../../source/deployment/vllm.md:184 efeeeb8ff56e4f37af53bd7a6e440d65
msgid "vLLM supports parsing the thinking content from the model generation into structured messages:"
msgstr "vLLM 支持将模型生成的思考内容解析为结构化消息："

#: ../../source/deployment/vllm.md:189 a79e5778da03462199a629ed038526a8
msgid "Since vLLM 0.9.0, one can also use"
msgstr "自 vLLM 0.9.0 版本，也可以使用"

#: ../../source/deployment/vllm.md:194 7fdb9af5391b402280be7aeaa514dcd3
msgid "The response message will have a field named `reasoning_content` in addition to `content`, containing the thinking content generated by the model."
msgstr "响应消息除了包含 `content` 字段外，还会有一个名为 `reasoning_content` 的字段，其中包含模型生成的思考内容。"

#: ../../source/deployment/vllm.md:197 7036cf771a4e4fddbcbf92cc52de59d0
msgid "Please note that this feature is not OpenAI API compatible."
msgstr "请注意，此功能与 OpenAI API 规范不一致。"

#: ../../source/deployment/vllm.md:201 5e0250bbffa94e7dac6c9278e0d87ab6
msgid "As of vLLM 0.8.5, `enable_thinking=False` is not compatible with this feature. If you need to pass `enable_thinking=False` to the API, you should disable parsing thinking content. This is resolved in vLLM 0.9.0 with the `qwen3` reasoning parser."
msgstr "在 vLLM 0.8.5 版本中，`enable_thinking=False` 与此功能不兼容。如果需要向 API 传递 `enable_thinking=False`，则应禁用解析思考内容。此问题已在 vLLM 0.9.0 中通过 `qwen3` 思考解析器得到解决。"

#: ../../source/deployment/vllm.md:206 aa45c227d5d040dea6522bf2a45576b0
msgid "Parsing Tool Calls"
msgstr "解析工具调用"

#: ../../source/deployment/vllm.md:208 56c12209324f41fcbed13409de6d0aa5
msgid "vLLM supports parsing the tool calling content from the model generation into structured messages:"
msgstr "vLLM 支持将模型生成的工具调用内容解析为结构化消息："

#: ../../source/deployment/vllm.md:213 7a12c4073d874bff81f012d7da241e9c
msgid "For more information, please refer to [our guide on Function Calling](../framework/function_call.md#vllm)."
msgstr "详细信息，请参阅[函数调用的指南](../framework/function_call.md#vllm)。"

#: ../../source/deployment/vllm.md:215 827710cca0954bfaaeaa8a67fef1efa6
msgid "Structured/JSON Output"
msgstr "结构化/JSON输出"

#: ../../source/deployment/vllm.md:217 ac630e8cc3b04ad8af427e677896b7ee
msgid "vLLM supports structured/JSON output.  Please refer to [vLLM's documentation](https://docs.vllm.ai/en/stable/serving/openai_compatible_server.html#extra-parameters-for-chat-api) for the `guided_json` parameters. Besides, it is also recommended to instruct the model to generate the specific format in the system message or in your prompt."
msgstr "vLLM 支持结构化/JSON 输出。请参照[vLLM文档](https://docs.vllm.ai/en/stable/serving/openai_compatible_server.html#extra-parameters-for-chat-api)了解 `guided_json` 参数。此外，也建议在系统消息或用户提示中指示模型生成特定格式，避免仅依赖于推理参数配置。"

#: ../../source/deployment/vllm.md:222 5534e3b2ef2242c69964016b526e1a01
msgid "Serving Quantized models"
msgstr "部署量化模型"

#: ../../source/deployment/vllm.md:224 dbd8323a4bc6415da9fde179b23dac78
msgid "Qwen3 comes with two types of pre-quantized models, FP8 and AWQ."
msgstr "Qwen3 提供了两种类型的预量化模型：FP8 和 AWQ。"

#: ../../source/deployment/vllm.md:226 1aedffe7b0874ac289badc829e035300
msgid "The command serving those models are the same as the original models except for the name change:"
msgstr "部署这些模型的命令与原始模型相同，只是名称有所更改："

#: ../../source/deployment/vllm.md:236 6d91506e31194076857d992f51d3f336
msgid "The FP8 models of Qwen3 are block-wise quant, which is supported on NVIDIA GPUs with compute capability > 8.9, that is, Ada Lovelace, Hopper, and later GPUs and runs as w8a8."
msgstr "Qwen3 的 FP8 模型采用分块 (block-wise) 量化，该功能支持在 compute capability > 8.9 的 NVIDIA GPU 上运行，即 Ada Lovelace、Hopper 及更新的 GPU，并以 w8a8 方式运行。"

#: ../../source/deployment/vllm.md:238 f26fbf5e37134274b435efaeada7af6b
msgid "Since vLLM v0.9.0, FP8 Marlin has supported block-wise quants (running as w8a16) and you can also run Qwen3 FP8 models on Ampere cards."
msgstr "从 vLLM v0.9.0 开始，FP8 Marlin 已支持分块量化（以 w8a16 方式运行），您还可以在 Ampere 显卡上运行 Qwen3 FP8 模型。"

#: ../../source/deployment/vllm.md:242 dca096f25d794f88bb446eb2f39f3570
msgid "If you encountered the following error when deploying the FP8 models, it indicates that the tensor parallel size does not agree with the model weights:"
msgstr "如果在部署 FP8 模型时遇到以下错误，这表明张量并行大小与模型权重不匹配："

#: ../../source/deployment/vllm.md:249 609be8d8c32f48c5ade0f1a32873bbf8
msgid "We recommend lowering the degree of tensor parallel, e.g., `--tensor-parallel-size 4` or enabling expert parallel, e.g., `--tensor-parallel-size 8 --enable-expert-parallel`."
msgstr "目前，我们建议降低张量并行的程度，例如使用 `--tensor-parallel-size 4`，或者启用专家并行，例如使用 `--tensor-parallel-size 8 --enable-expert-parallel`。"

#: ../../source/deployment/vllm.md:252 027c1c2d8d6a46a78c69451e96b6544a
msgid "Context Length"
msgstr "上下文长度"

#: ../../source/deployment/vllm.md:254 a9dd53625c294f8682cc2fbedce37f45
msgid "The context length for Qwen3 models in pretraining is up to 32,768 tokens. To handle context length substantially exceeding 32,768 tokens, RoPE scaling techniques should be applied. We have validated the performance of [YaRN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts."
msgstr "Qwen3 模型在预训练中的上下文长度最长为 32,768 个 token。为了处理显著超过 32,768 个 token 的上下文长度，应应用 RoPE 缩放技术。我们已经验证了 [YaRN](https://arxiv.org/abs/2309.00071) 的性能，这是一种增强模型长度外推的技术，可确保在长文本上的最佳性能。"

#: ../../source/deployment/vllm.md:258 988e60ef751944c09274ce28aecf3fac
msgid "vLLM supports YaRN, which can be configured as"
msgstr "vLLM 支持 YaRN，可以配置为"

#: ../../source/deployment/vllm.md:264 7da95eb3121944859d583a7a5885c5c4
msgid "vLLM implements static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts.** We advise adding the `rope_scaling` configuration only when processing long contexts is required.  It is also recommended to modify the `factor` as needed. For example, if the typical context length for your application is 65,536 tokens, it would be better to set `factor` as 2.0."
msgstr "vLLM 实现了静态 YaRN，这意味着无论输入长度如何，缩放因子都保持不变，**这可能会对较短文本的性能产生影响。** 我们建议仅在需要处理长上下文时添加 `rope_scaling` 配置。还建议根据需要调整 `factor`。例如，如果您的应用程序的典型上下文长度为 65,536 个 token，则最好将 `factor` 设置为 2.0。"

#: ../../source/deployment/vllm.md:270 fa10485203e3408dabb81687868ce059
msgid "The default `max_position_embeddings` in `config.json` is set to 40,960, which used by vLLM, if `--max-model-len` is not specified. This allocation includes reserving 32,768 tokens for outputs and 8,192 tokens for typical prompts, which is sufficient for most scenarios involving short text processing and leave adequate room for model thinking. If the average context length does not exceed 32,768 tokens, we do not recommend enabling YaRN in this scenario, as it may potentially degrade model performance."
msgstr "如果未指定 `--max-model-len`，`config.json` 中的默认 `max_position_embeddings` 被设置为 40,960，vLLM 将使用该值。此分配包括为输出保留 32,768 个 token，为典型提示保留 8,192 个 token，这足以应对大多数涉及短文本处理的场景，并为模型思考留出充足空间。如果平均上下文长度不超过 32,768 个 token，我们不建议在此场景中启用 YaRN，因为这可能会降低模型性能。"

#: ../../source/deployment/vllm.md:275 fe82482bd3e1406998560b372e2707df
msgid "Python Library"
msgstr "Python 库使用"

#: ../../source/deployment/vllm.md:277 82fb6b76e9c24978b9bee04927b3d5d4
msgid "vLLM can also be directly used as a Python library, which is convenient for offline batch inference but lack some API-only features, such as parsing model generation to structure messages."
msgstr "vLLM 也可以直接用作 Python 库，这对离线批量推理非常方便，但缺少一些仅限 API 的功能，例如将模型生成解析为结构化消息。"

#: ../../source/deployment/vllm.md:279 857f6312cd4e44ccb34197793db59165
msgid "The following shows the basic usage of vLLM as a library:"
msgstr "以下展示了将 vLLM 用作库的基本用法："

#: ../../source/deployment/vllm.md:316 0c88c07558e04cbd8f2e6cf04091ef62
msgid "Since vLLM v0.9.0, you can also use the `LLM.chat` interface which includes support for `chat_template_kwargs`:"
msgstr "自 vLLM v0.9.0 开始，`LLM.chat` 支持 `chat_template_kwargs` 参数，因而也可以使用以下方法："

#: ../../source/deployment/vllm.md:347 de026655e79449f28854a1cbcc182d9f
msgid "FAQ"
msgstr "常见问题解答"

#: ../../source/deployment/vllm.md:349 30e351b0aebb4e71b187d74895124662
msgid "You may encounter OOM issues that are pretty annoying. We recommend two arguments for you to make some fix."
msgstr "您可能会遇到令人烦恼的OOM（内存溢出）问题。我们推荐您尝试两个参数进行修复。"

#: ../../source/deployment/vllm.md:352 cc5e992512ff4aa9bdede2e3791264f3
msgid "The first one is `--max-model-len`. Our provided default `max_position_embedding` is `40960` and thus the maximum length for the serving is also this value, leading to higher requirements of memory. Reducing it to a proper length for yourself often helps with the OOM issue."
msgstr "第一个参数是 `--max-model-len` 。我们提供的默认最大位置嵌入（`max_position_embedding`）为 40960 ，因此服务时的最大长度也是这个值，这会导致更高的内存需求。将此值适当减小通常有助于解决OOM问题。"

#: ../../source/deployment/vllm.md:355 9e8f5e6e064841de8bfa0f581f28c25b
msgid "Another argument you can pay attention to is `--gpu-memory-utilization`. vLLM will pre-allocate this much GPU memory. By default, it is `0.9`. This is also why you find a vLLM service always takes so much memory. If you are in eager mode (by default it is not), you can level it up to tackle the OOM problem. Otherwise, CUDA Graphs are used, which will use GPU memory not controlled by vLLM, and you should try lowering it. If it doesn't work, you should try `--enforce-eager`, which may slow down inference, or reduce the `--max-model-len`."
msgstr "另一个您可以关注的参数是 `--gpu-memory-utilization` 。 vLLM将预分配该参数指定比例的显存。默认情况下，该值为 `0.9`。这也是为什么您发现一个vLLM服务总是占用大量内存的原因。如果您使用了eager模式（默认不是），您可以将其调高以应对OOM问题。反之，vLLM会使用CUDA Graphs，而CUDA Graphs会额外占用不受vLLM管理的显存；此时，您应当尝试降低`--gpu-memory-utilization`。如果还是无法解决，可以尝试`--enforce-eager`（这会影响推理效率）或缩小`--max-model-len`。"

#: ../../source/deployment/vllm.md:364 2dd3f6552c53423ca7ce0576047f2ab6
msgid "For more usage guide with vLLM, please see vLLM's [Qwen3 Usage Guide](https://github.com/vllm-project/vllm/issues/17327)."
msgstr "有关 vLLM 的更多使用指南，请参阅 vLLM 的[Qwen3 使用指南](https://github.com/vllm-project/vllm/issues/17327)。"



================================================
FILE: docs/locales/zh_CN/LC_MESSAGES/framework/Langchain.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2024, Qwen Team
# This file is distributed under the same license as the Qwen package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2024.
#
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-04-28 19:42+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"

#: ../../Qwen/source/framework/Langchain.rst:2 6f9b66430d9c495592b1e275fdfd7c9e
msgid "Langchain"
msgstr ""

#: ../../Qwen/source/framework/Langchain.rst:5 1205af46f88e4d6681003403109385c3
msgid "To be updated for Qwen3."
msgstr "仍需为Qwen3更新。"

#: ../../Qwen/source/framework/Langchain.rst:7 115ee7b1c8404629a8f98175264cc114
msgid "This guide helps you build a question-answering application based on a local knowledge base using ``Qwen2.5-7B-Instruct`` with ``langchain``. The goal is to establish a knowledge base Q&A solution."
msgstr "本教程旨在帮助您利用 ``Qwen2.5-7B-Instruct`` 与 ``langchain`` ，基于本地知识库构建问答应用。目标是建立一个知识库问答解决方案。"

#: ../../Qwen/source/framework/Langchain.rst:12
#: 7257b95612fb423bb9ca73212fd12a02
msgid "Basic Usage"
msgstr "基础用法"

#: ../../Qwen/source/framework/Langchain.rst:14
#: fecf7a682dcc4c15a53da1f7cdf145e5
msgid "The implementation process of this project includes loading files -> reading text -> segmenting text -> vectorizing text -> vectorizing questions -> matching the top k most similar text vectors with the question vectors -> incorporating the matched text as context along with the question into the prompt -> submitting to the Qwen2.5-7B-Instruct to generate an answer. Below is an example:"
msgstr "该项目的实现流程包括加载文件 -> 阅读文本 -> 文本分段 -> 文本向量化 -> 问题向量化 -> 将最相似的前k个文本向量与问题向量匹配 -> 将匹配的文本作为上下文连同问题一起纳入提示 -> 提交给Qwen2.5-7B-Instruct生成答案。以下是一个示例："

#: ../../Qwen/source/framework/Langchain.rst:98
#: 6ad1ebd2ef4a49f9aa66cfdf777e1290
msgid "After loading the Qwen2.5-7B-Instruct model, you should specify the txt file for retrieval."
msgstr "加载Qwen2.5-7B-Instruct模型后，您可以指定需要用于知识库问答的txt文件。"

#: ../../Qwen/source/framework/Langchain.rst:274
#: 00467b1e4e294a26b9f49886633331e0
msgid "Next Step"
msgstr "下一步"

#: ../../Qwen/source/framework/Langchain.rst:276
#: 15ed906687054af78545290ba0746380
msgid "Now you can chat with Qwen2.5 use your own document. Continue to read the documentation and try to figure out more advanced usages of model retrieval!"
msgstr "现在，您可以在您自己的文档上与Qwen2.5进行交流。继续阅读文档，尝试探索模型检索的更多高级用法！"



================================================
FILE: docs/locales/zh_CN/LC_MESSAGES/framework/LlamaIndex.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2024, Qwen Team
# This file is distributed under the same license as the Qwen package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2024.
#
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-04-28 19:42+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"

#: ../../Qwen/source/framework/LlamaIndex.rst:2
#: 2e41f8696c20488d8593b670c6361edf
msgid "LlamaIndex"
msgstr "LlamaIndex"

#: ../../Qwen/source/framework/LlamaIndex.rst:5
#: 20b3836fd391457bb00bf75b61e23e0d
msgid "To be updated for Qwen3."
msgstr "仍需为Qwen3更新。"

#: ../../Qwen/source/framework/LlamaIndex.rst:7
#: 86d9e6f0684749aab40a9824cd026fa3
msgid "To connect Qwen2.5 with external data, such as documents, web pages, etc., we offer a tutorial on `LlamaIndex <https://www.llamaindex.ai/>`__. This guide helps you quickly implement retrieval-augmented generation (RAG) using LlamaIndex with Qwen2.5."
msgstr "为了实现 Qwen2.5 与外部数据（例如文档、网页等）的连接，我们提供了 `LlamaIndex <https://www.llamaindex.ai/>`__ 的详细教程。本指南旨在帮助用户利用 LlamaIndex 与 Qwen2.5 快速部署检索增强生成（RAG）技术。"

#: ../../Qwen/source/framework/LlamaIndex.rst:11
#: 71ed222858054687a5b33222bb6ac086
msgid "Preparation"
msgstr "环境准备"

#: ../../Qwen/source/framework/LlamaIndex.rst:13
#: 161d9153d6484dd5a1f1bdb340847814
msgid "To implement RAG, we advise you to install the LlamaIndex-related packages first."
msgstr "为实现检索增强生成（RAG），我们建议您首先安装与 LlamaIndex 相关的软件包。"

#: ../../Qwen/source/framework/LlamaIndex.rst:16
#: a8d6acb1001a42c88185b971ae2de3bf
msgid "The following is a simple code snippet showing how to do this:"
msgstr "以下是一个简单的代码示例："

#: ../../Qwen/source/framework/LlamaIndex.rst:25
#: e441d3b8fb6d4a13b52e1560ef250b16
msgid "Set Parameters"
msgstr "设置参数"

#: ../../Qwen/source/framework/LlamaIndex.rst:27
#: c2481804c3f34c7f883eed92ffa3111e
msgid "Now we can set up LLM, embedding model, and the related configurations. Qwen2.5-Instruct supports conversations in multiple languages, including English and Chinese. You can use the ``bge-base-en-v1.5`` model to retrieve from English documents, and you can download the ``bge-base-zh-v1.5`` model to retrieve from Chinese documents. You can also choose ``bge-large`` or ``bge-small`` as the embedding model or modify the context window size or text chunk size depending on your computing resources. Qwen2.5 model families support a maximum of 32K context window size (up to 128K for 7B, 14B, 32B, and 72B, requiring extra configuration)"
msgstr "现在，我们可以设置语言模型和向量模型。Qwen2.5-Instruct支持包括英语和中文在内的多种语言对话。您可以使用 ``bge-base-en-v1.5`` 模型来检索英文文档，下载 ``bge-base-zh-v1.5`` 模型以检索中文文档。根据您的计算资源，您还可以选择 ``bge-large`` 或 ``bge-small`` 作为向量模型，或调整上下文窗口大小或文本块大小。Qwen2.5模型系列支持最大32K上下文窗口大小（7B 、14B 、32B 及 72B可扩展支持 128K 上下文，但需要额外配置）"

#: ../../Qwen/source/framework/LlamaIndex.rst:85
#: 74c35d5a03734c289d162dfa3813ada6
msgid "Build Index"
msgstr "构建索引"

#: ../../Qwen/source/framework/LlamaIndex.rst:87
#: c49859d4ea5f49dba1fa2263f3ae284d
msgid "Now we can build index from documents or websites."
msgstr "现在我们可以从文档或网站构建索引。"

#: ../../Qwen/source/framework/LlamaIndex.rst:89
#: b460d000037e4266a4d9f43d38f1f9b0
msgid "The following code snippet demonstrates how to build an index for files (regardless of whether they are in PDF or TXT format) in a local folder named 'document'."
msgstr "以下代码片段展示了如何为本地名为'document'的文件夹中的文件（无论是PDF格式还是TXT格式）构建索引。"

#: ../../Qwen/source/framework/LlamaIndex.rst:102
#: a416d18b227940e29fac1f59851ff8c4
msgid "The following code snippet demonstrates how to build an index for the content in a list of websites."
msgstr "以下代码片段展示了如何为一系列网站的内容构建索引。"

#: ../../Qwen/source/framework/LlamaIndex.rst:118
#: 487cf928d048424fa1b50438f701137c
msgid "To save and load the index, you can use the following code snippet."
msgstr "要保存和加载已构建的索引，您可以使用以下代码示例。"

#: ../../Qwen/source/framework/LlamaIndex.rst:132
#: c68419c4318d46e891f5df9191be6d2d
msgid "RAG"
msgstr "检索增强（RAG）"

#: ../../Qwen/source/framework/LlamaIndex.rst:134
#: 8ad20a8f43fe496084a40f963ba97440
msgid "Now you can perform queries, and Qwen2.5 will answer based on the content of the indexed documents."
msgstr "现在您可以输入查询，Qwen2.5 将基于索引文档的内容提供答案。"



================================================
FILE: docs/locales/zh_CN/LC_MESSAGES/framework/function_call.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2024, Qwen Team
# This file is distributed under the same license as the Qwen package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2024.
#
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-06-13 16:36+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.16.0\n"

#: ../../source/framework/function_call.md:6 10546d07829648458ac4f91b91967697
msgid "Function Calling"
msgstr "函数调用"

#: ../../source/framework/function_call.md:9 97bb46c965d44cb4900c947356c571fd
msgid "Preface"
msgstr "前言"

#: ../../source/framework/function_call.md:11 539a099d2e764c8181e17c9c41f8053b
msgid "Function calling with large language models is a huge and evolving topic. It is particularly important for AI applications:"
msgstr "使用大型语言模型进行函数调用 (Function Calling) 是一个庞大且不断发展的主题。这对AI应用尤为重要："

#: ../../source/framework/function_call.md:13 96e919c9808842ddb3225121c70a350b
msgid "either for AI-native applications that strive to work around the shortcomings of current AI technology,"
msgstr "无论是为了绕过当前AI技术的局限性，而设计的原生AI应用，"

#: ../../source/framework/function_call.md:14 e7f29fd2c11b413bb08860d1a1c8ad8d
msgid "or for existing applications that seeks the integration of AI technology to improve performance, user interaction and experience, or efficiency."
msgstr "还是为了提升性能、用户交互与体验或效率，寻求整合AI技术的现有应用。"

#: ../../source/framework/function_call.md:16 cc49601082014a989c503f4461c8a942
msgid "We will talk about how Qwen3 can be used to support function calling and how it can be used to achieve your goals, from the inference usage for developing application to the inner workings for hardcore customizations.  In this guide,"
msgstr "我们将讨论如何使用 Qwen3 来支持函数调用，以及如何利用它来实现您的目标，从用于开发应用程序的推理用法到针对硬核定制的内部工作机制。在本指南中，"

#: ../../source/framework/function_call.md:18 bad8e302d5044edd80cb9da02c0f6dbe
msgid "We will first demonstrate how to use function calling with Qwen3."
msgstr "我们首先将展示如何使用Qwen3进行函数调用。"

#: ../../source/framework/function_call.md:19 41f975e599264589af4819e9aee96f76
msgid "Then, we will introduce the technical details on functional calling with Qwen3, which are mainly about the templates."
msgstr "接着，我们将介绍使用Qwen3进行函数调用的技术细节，主要涉及模板的使用。"

#: ../../source/framework/function_call.md:21 cad1eaec5df646509123bd97566c69c5
msgid "Before starting, there is one thing we have not yet introduced, that is ..."
msgstr "在开始之前，还有一件事我们尚未介绍，那就是…"

#: ../../source/framework/function_call.md:23 129f1c1d7bd7486f80955cda6cf77e3f
msgid "What is function calling?"
msgstr "什么是函数调用？"

#: ../../source/framework/function_call.md:26 cdf0c1412f234bd8bcc386683a38b131
msgid "There is another term \"tool use\" that may be used to refer to the same concept. While some may argue that tools are a generalized form of functions, at present, their difference exists only technically as different I/O types of programming interfaces."
msgstr "这一概念也可能被称为“工具使用” (\"tool use\")。虽然有人认为“工具”是“函数”的泛化形式，但在当前，它们的区别仅在技术层面上，表现为编程接口的不同输入输出类型。"

#: ../../source/framework/function_call.md:30 93085d6f71dd4349832e0c8832b57966
msgid "Large language models (LLMs) are powerful things. However, sometimes LLMs by themselves are simply not capable enough."
msgstr "大型语言模型（LLMs）确实强大。然而，有时候单靠大型语言模型的能力还是不够的。"

#: ../../source/framework/function_call.md:32 d441a861290d4a0aaa6fea713b628287
msgid "On the one hand, LLMs have inherent modeling limitations.  For one, they do not know things that are not in their training data, which include those happened after their training ended. In addition, they learn things in the way of likelihood, which suggests that they may not be precise enough for tasks with fixed rule sets, e.g., mathematical computation."
msgstr "一方面，大型语言模型存在建模局限性。首先，对于训练数据中没有的信息，包括训练结束后发生的事情，它们并不了解。此外，它们通过概率方式学习，这意味着对于有固定规则集的任务，如数学计算，可能不够精确。"

#: ../../source/framework/function_call.md:35 f52f0935e2a346e3b1631eeaf2223fb0
msgid "On the other hand, it is not easy to use LLMs as a Plug-and-Play service programmatically with other things. LLMs mostly talk in words that are open to interpretation and thus ambiguous, while other software or applications or systems talk in code and through programming interfaces that are pre-defined and fixed and structured."
msgstr "另一方面，将大型语言模型作为即插即用服务与其它系统进行编程式协作，并非易事。大型语言模型的表达多含主观解释成分，因而产生歧义；而其他软件、应用或系统则通过预定义、固定和结构化的代码及编程接口进行沟通。"

#: ../../source/framework/function_call.md:38 4ff86e51b366418ba9cd2ff9a8fceb27
msgid "To this end, function calling establishes a common protocol that specifies how LLMs should interact with the other things. The procedure is mainly as follows:"
msgstr "为此，函数调用确立了一个通用协议，规定了大型语言模型应与其他实体互动的流程。主要流程如下："

#: ../../source/framework/function_call.md:40 1bc24a6d6a5140cdb2f274b7df61a429
msgid "The application provides a set of functions and the instructions of the functions to an LLM."
msgstr "应用程序向大型语言模型提供一组函数及其使用说明。"

#: ../../source/framework/function_call.md:41 8569c9c13fd34caab163f4f9154e45bc
msgid "The LLM choose to or not to, or is forced to use one or many of the functions, in response to user queries."
msgstr "大型语言模型根据用户查询，选择使用或不使用，或被迫使用一个或多个函数。"

#: ../../source/framework/function_call.md:42 dac3cb931c3c4755b276886d5568340a
msgid "If the LLM chooses to use the functions, it states how the functions should be used based on the function instructions."
msgstr "如果大型语言模型选择使用这些函数，它会根据函数说明指出应如何使用这些函数。"

#: ../../source/framework/function_call.md:43 a6439df6e0354e96b321fbfaa260dbd1
msgid "The chosen functions are used as such by the application and the results are obtained, which are then given to the LLM if further interaction is needed."
msgstr "应用程序按照选择使用这些函数，并获取结果。如果需要进一步互动，结果将提供给大型语言模型。"

#: ../../source/framework/function_call.md:45 8cef5c9125154720952742a4f51e19e2
msgid "There are many ways for LLMs to understand and follow this protocol. As always, the key is prompt engineering or an internalized template known by the model. We recommend using Hermes-style tool use for Qwen3 to maximize function calling performance."
msgstr "大型语言模型（LLMs）有许多方式来理解和遵循该协议。一如既往，关键在于提示工程或模型已内化的模板。我们建议对 Qwen3 使用 Hermes 风格的工具调用方法，以最大化函数调用性能。"

#: ../../source/framework/function_call.md:49 486bf0eaff5449f3966f948cf341570f
msgid "Inference with Function Calling"
msgstr "使用函数调用进行推理"

#: ../../source/framework/function_call.md:51 6d50be829cba4f43b587c494dc74a820
msgid "As function calling is essentially implemented using prompt engineering, you could manually construct the model inputs for Qwen3 models. However, frameworks with function calling support can help you with all that laborious work."
msgstr "由于函数调用本质上是通过提示工程实现的，您可以手动构建Qwen3模型的输入。但是，支持函数调用的框架可以帮助您完成所有繁重的工作。"

#: ../../source/framework/function_call.md:54 5afa6df4e2d94df89b26b66bce596264
msgid "In the following, we will introduce the usage (via dedicated function calling chat template) with"
msgstr "接下来，我们将介绍（通过专用的函数调用模板）使用"

#: ../../source/framework/function_call.md:55 380f90e66ba44ad686e981536dd29e12
msgid "**Qwen-Agent**,"
msgstr "**Qwen-Agent**，"

#: ../../source/framework/function_call.md:56 79b6f6475bc64a7bbe397ba0c94d43b1
msgid "**vLLM**."
msgstr "**vLLM**。"

#: ../../source/framework/function_call.md:58 30fff59c00c64a70821c167a944fd3f7
msgid "The Example Case"
msgstr "案例"

#: ../../source/framework/function_call.md:60 2b4cdb8daedd4da49877af5ecc67ba8b
msgid "Let's also use an example to demonstrate the inference usage. We assume **Python 3.11** is used as the programming language."
msgstr "我们同样通过一个示例来展示推理的使用方法。假设我们使用的编程语言是**Python 3.11**。"

#: ../../source/framework/function_call.md:63 ae5619a2a7ff41608d3420cd0caaba79
msgid "**Scenario**: Suppose we would like to ask the model about the temperature of a location. Normally, the model would reply that it cannot provide real-time information. But we have two tools that can be used to obtain the current temperature of and the temperature at a given date of a city respectively, and we would like the model to make use of them."
msgstr "**场景**：假设我们要询问模型某个地点的温度。通常，模型会回答无法提供实时信息。但我们有两个工具，可以分别获取城市的当前温度和指定日期的温度，我们希望模型能够利用这些工具。"

#: ../../source/framework/function_call.md:67 2617f66327e64038a5933bd825f099b8
msgid "To set up the example case, you can use the following code:"
msgstr "为了这个示例案例，您可以使用以下代码："

#: ../../source/framework/function_call.md a419b632036846fdb90427683502c597
msgid "Preparation Code"
msgstr "准备代码"

#: ../../source/framework/function_call.md:173 77ce44e0fb3b43f6814da6a2ad36e511
msgid "In particular, the tools should be described using JSON Schema and the messages should contain as much available information as possible. You can find the explanations of the tools and messages below:"
msgstr "工具应使用JSON Schema进行描述，消息应包含尽可能多的有效信息。您可以在下面找到工具和消息的解释："

#: ../../source/framework/function_call.md 48a8432a5d384a4ba10f39fcea96b825
msgid "Example Tools"
msgstr "示例工具"

#: ../../source/framework/function_call.md:178 f18d3697d6c347a2b407b21ae874f865
msgid "The tools should be described using the following JSON:"
msgstr "工具应使用以下JSON进行描述："

#: ../../source/framework/function_call.md:242 6e55bd9040be4bc296ed948788780347
msgid "For each **tool**, it is a JSON object with two fields:"
msgstr "对于每个**工具**，它是一个具有两个字段的JSON object："

#: ../../source/framework/function_call.md:243 ae43f666fcab4e5b93ff009688305fce
msgid "`type`: a string specifying the type of the tool, currently only `\"function\"` is valid"
msgstr "`type`：string，用于指定工具类型，目前仅`\"function\"`有效"

#: ../../source/framework/function_call.md:244 35cc77d63bf6455bb3376bec483b0f9b
msgid "`function`: an object detailing the instructions to use the function"
msgstr "`function`：object，详细说明了如何使用该函数"

#: ../../source/framework/function_call.md:246 b6f78c8fbb9b40d5a6c0c10e5273b85e
msgid "For each **function**, it is a JSON object with three fields:"
msgstr "对于每个**function**，它是一个具有三个字段的JSON object："

#: ../../source/framework/function_call.md:247 0905c8ca11bb4454abb26ba42a3b29c3
msgid "`name`: a string indicating the name of the function"
msgstr "`name`：string 表示函数名称"

#: ../../source/framework/function_call.md:248 d787950be0144e0a83f18d7a06d43de0
msgid "`description`: a string describing what the function is used for"
msgstr "`description`：string 描述函数用途"

#: ../../source/framework/function_call.md:249 ba88a6e75ae8462d892a3ec3e9a6edaa
msgid "`parameters`: [a JSON Schema](https://json-schema.org/learn/getting-started-step-by-step) that specifies the parameters the function accepts. Please refer to the linked documentation for how to compose a JSON Schema. Notable fields include `type`, `required`, and `enum`."
msgstr "`parameters`：[JSON Schema](https://json-schema.org/learn/getting-started-step-by-step)，用于指定函数接受的参数。请参阅链接文档以了解如何构建JSON Schema。值得注意的字段包括`type`、`required`和`enum`。"

#: ../../source/framework/function_call.md:251 abd8fe46919340a582f532932f9c1b52
msgid "Most frameworks use the tool format and some may use the function format. Which one to use should be obvious according to the naming."
msgstr "大多数框架使用“工具”格式，有些可能使用“函数”格式。根据命名，应该很明显应该使用哪一个。"

#: ../../source/framework/function_call.md 6c17006423714b9bb29379bd3674e1a0
msgid "Example Messages"
msgstr "示例消息"

#: ../../source/framework/function_call.md:258 1cd11a9c9f9b4f2ba308599a87f82613
msgid "Our query is `What's the temperature in San Francisco now? How about tomorrow? Current Date: 2024-09-30.`."
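msgstr "我们的查询是 `What's the temperature in San Francisco now? How about tomorrow? Current Date: 2024-09-30.`。"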

#: ../../source/framework/function_call.md:267 3f124f3c5ef54673bb6e12d8bd1bada3
msgid "Qwen-Agent"
msgstr ""

#: ../../source/framework/function_call.md:269 761ea5172da64e9d8911cc567d7e7779
msgid "[Qwen-Agent](https://github.com/QwenLM/Qwen-Agent) is actually a Python Agent framework for developing AI applications. Although its intended use cases are higher-level than efficient inference, it does contain the **canonical implementation** of function calling for Qwen3. It provides the function calling ability for Qwen3 to an OpenAI-compatible API through templates that is transparent to users."
msgstr "[Qwen-Agent](https://github.com/QwenLM/Qwen-Agent) 实际上是一个用于开发AI应用的Python智能体框架。尽管其设计用例比高效推理更高级，但它确实包含了Qwen3函数调用的**规范实现**。基于OpenAI兼容API，它可以通过模板为Qwen3提供对用户透明的函数调用能力。"

#: ../../source/framework/function_call.md:273 194c41677b7b4a6a81802980f4973988
msgid "It is worth noting that for reasoning models like Qwen3, it is *not recommended* to use tool call template based on stopwords, such as ReAct, because the model may output stopwords in the thought section, potentially leading to unexpected behavior in tool calls."
msgstr "值得注意的是，对于 Qwen3 这类推理模型，*不建议*使用基于停止词的工具调用模板（例如 ReAct），因为模型可能会在思考部分输出停止词，从而导致工具调用出现意外行为。"

#: ../../source/framework/function_call.md:275 02e0092b1db04537bb75bb8a9728b0c9
msgid "Before starting, let's make sure the latest library is installed:"
msgstr "在开始之前，让我们确保已安装了最新的库："

#: ../../source/framework/function_call.md:280
#: ../../source/framework/function_call.md:450 8e49db2c62cc438193a1a385778c9908
msgid "Preparing"
msgstr "准备工作"

#: ../../source/framework/function_call.md:282 1181a68368ce484f9beec38d971108aa
msgid "Qwen-Agent can wrap an OpenAI-compatible API that does not support function calling. You can serve such an API with most inference frameworks or obtain one from cloud providers like DashScope or Together."
msgstr "Qwen-Agent可以封装一个不支持函数调用的OpenAI兼容API。您可以使用大多数推理框架来提供此类API，或者从DashScope或Together等云提供商处获取一个。"

#: ../../source/framework/function_call.md:285 d7708c50cf554212866c284d73a79c9e
msgid "Assuming there is an OpenAI-compatible API at `http://localhost:8000/v1`, Qwen-Agent provides a shortcut function `get_chat_model` to obtain a model inference class with function calling support:"
msgstr "假设在`http://localhost:8000/v1`处有一个OpenAI兼容API，Qwen-Agent提供了一个快捷函数`get_chat_model`，用于获取具有函数调用支持的模型推理类："

#: ../../source/framework/function_call.md:302 e5676cec6583445888cf806ef54de829
msgid "In the above, `model_server` is the `api_base` common used in other OpenAI-compatible API clients. It is advised to provide the `api_key` (but not via plaintext in the code), even if the API server does not check it, in which case, you can set it to anything. You can pass model parameters to the model by `generate_cfg`. Here we demonstrate how to control the think and no_think modes of Qwen3.  Different APIs may have different control methods."
msgstr "在上述代码中，`model_server` 是其他兼容 OpenAI 的 API 客户端常用的 `api_base`。建议提供 `api_key`（但不要以代码中的明文形式提供），即使 API 服务器不检查它，在这种情况下，您可以将其设置为任意值。您可以通过 `generate_cfg` 将模型参数传递给模型。在此我们演示如何控制 Qwen3 的思考与非思考模式。不同的 API 可能有不同的控制方法。"

#: ../../source/framework/function_call.md:307 42c1289baeef45b1a471e606bdafbe9a
msgid "For model inputs, the common message structure for system, user, and assistant history should be used:"
msgstr "对于模型输入，应使用系统、用户和助手历史记录的通用消息结构："

#: ../../source/framework/function_call.md:313 64e0a941165d49299e80aa8270f10bfb
msgid "At the time, Qwen-Agent works with functions instead of tools. This requires a small change to our tool descriptions, that is, extracting the function fields:"
msgstr "目前，Qwen-Agent使用“函数”而非“工具”。这需要对我们的工具描述做一点小改动，即提取函数字段："

#: ../../source/framework/function_call.md:320
#: ../../source/framework/function_call.md:482 6babaeb87a48402086b0db01d6a0c5e2
msgid "Tool Calls and Tool Results"
msgstr "工具调用和工具结果"

#: ../../source/framework/function_call.md:322 53c98c106e2542f49d418348118717ea
msgid "To interact with the model, the `chat` method should be used:"
msgstr "为了与模型交互，应使用`chat`方法："

#: ../../source/framework/function_call.md:333 a0e820c457e146298945b5df4cc918c5
msgid "The `chat` method returns a generator of list, each of which may contain multiple messages."
msgstr "`chat`方法返回一个列表的生成器，每个列表可能包含多条消息。"

#: ../../source/framework/function_call.md:336 4627b7f2ddd9427aa10a127ea5d0d2b4
msgid "The results of `no_think` mode:"
msgstr "`no_think`模式的结果："

#: ../../source/framework/function_call.md:344 bc34f89540c044d3809f29474f5cfe43
msgid "The results of `think` mode:"
msgstr "`think`模式的结果："

#: ../../source/framework/function_call.md:353 89732d0ca8ce4d93b666cdd51f8e126a
msgid "As we can see, Qwen-Agent attempts to parse the model generation in an easier to use structural format. The details related to function calls are placed in the `function_call` field of the messages:"
msgstr "我们可以看到，Qwen-Agent试图以更易于使用的结构化格式解析模型生成。与函数调用相关的详细信息被放置在消息的`function_call`字段中："

#: ../../source/framework/function_call.md:355 d08d78b0219a4bbda4b6431e7f661eaa
msgid "`name`: a string representing the function to call"
msgstr "`name`：代表要调用的函数的字符串"

#: ../../source/framework/function_call.md:356 5a9245e840994576a89c28e4ce4f9378
msgid "`arguments`: a JSON-formatted string representing the arguments the function should be called with"
msgstr "`arguments`：表示调用函数时应传入的参数的JSON格式字符串"

#: ../../source/framework/function_call.md:358 e108e1ea05564aac9a398e1093dab1ce
msgid "In the thinking mode, it will first generate a thought and then generate the tool call(s)."
msgstr "在思考模式下，模型会先生成一段思考，然后再生成工具调用。"

#: ../../source/framework/function_call.md:360 50cc88e82a004ab99199f46549f4aaa2
msgid "Then comes the critical part -- checking and applying the function call:"
msgstr "接下来是关键部分——检查和应用函数调用："

#: ../../source/framework/function_call.md:376 6992f24f3f984594ac99e4267cd20144
msgid "To get tool results:"
msgstr "获取工具结果："

#: ../../source/framework/function_call.md:377 23a3a2cbf74347948c6629be3d68be0f
msgid "line 1: We should iterate the function calls in the order the model generates them."
msgstr "第1行：我们应该按模型生成它们的顺序迭代函数调用。"

#: ../../source/framework/function_call.md:378 762ba5a6310248dc886921fa542aa667
msgid "line 2: We can check if a function call is needed as deemed by the model by checking the `function_call` field of the generated messages."
msgstr "第2行：通过检查生成消息的`function_call`字段，我们可以判断模型是否认为需要进行函数调用。"

#: ../../source/framework/function_call.md:379 bfbb86b6b1b24ca598e6402043417cbf
msgid "line 3-4: The related details including the name and the arguments of the function can also be found there, which are `name` and `arguments` respectively."
msgstr "第3-4行：相关详情，包括函数名称和参数，也可以在那里找到，分别是`name`和`arguments`。"

#: ../../source/framework/function_call.md:380 7d108413597041c48363681e79da9813
msgid "line 6: With the details, one should call the function and obtain the results. Here, we assume there is a function named [`get_function_by_name`](#prepcode) to help us get the related function by its name."
msgstr "第6行：有了这些细节，应该调用函数并获取结果。这里，我们假设有一个名为[`get_function_by_name`](#prepcode)的函数来帮助我们根据名称获取相关函数。"

#: ../../source/framework/function_call.md:382 237839b4eb4048f6b22ab8d9c94f20c4
msgid "line 8-12: With the result obtained, add the function result to the messages as `content` and with `role` as `\"function\"`."
msgstr "第8-12行：获得结果后，将函数结果作为`content`添加到消息中，并将`role`设置为`\"function\"`。"

#: ../../source/framework/function_call.md:384 556c8b3b12544c5ab2fb8e5d2b7a818c
msgid "Now the messages are:"
msgstr "现在消息如下："

#: ../../source/framework/function_call.md:386
#: ../../source/framework/function_call.md:421 dce3c134764b4690b2b0b37e36a9dbcb
#: edd9cf9e63af4a869eb3e7bb07c88328
msgid "`no_think` mode:"
msgstr "`no_think`模式："

#: ../../source/framework/function_call.md:397
#: ../../source/framework/function_call.md:428 2f611a03bd624b2db666fb07fa8f658d
#: c18021fb05d5485ebe2e1afa6371ef65
msgid "`think` mode:"
msgstr "`think`模式："

#: ../../source/framework/function_call.md:409
#: ../../source/framework/function_call.md:570 a5f394f05e9a4f80958a16e43ed31e7d
msgid "Final Response"
msgstr "最终响应"

#: ../../source/framework/function_call.md:411 215d04b89c1749f99831d4baa72f678d
msgid "Finally, run the model again to get the final model results:"
msgstr "最后，再次运行模型以获取最终的模型结果："

#: ../../source/framework/function_call.md:419 823711ca9c364335948f2e994740417e
msgid "The final response should be like"
msgstr "最终响应应如下所示"

#: ../../source/framework/function_call.md:438 a5f84b2e746e4669bd836f43afaf16a2
msgid "vLLM"
msgstr ""

#: ../../source/framework/function_call.md:440 4d64c942ff674ba88a9753c25635374d
msgid "vLLM is a fast and easy-to-use library for LLM inference and serving. It uses the tokenizer from `transformers` to format the input, so we should have no trouble preparing the input. In addition, vLLm also implements helper functions so that generated tool calls can be parsed automatically if the format is supported."
msgstr "vLLM 是一个快速且易于使用的库，用于大型语言模型的推理和部署。它使用 `transformers` 中的分词器来格式化输入，因此我们在准备输入时应该不会遇到任何问题。此外，vLLM 还实现了辅助函数，以便在支持的情况下自动解析生成的工具调用。"

#: ../../source/framework/function_call.md:444 f65e19e7888c4d9ea42f4c82ca89b496
msgid "`vllm` >= v0.8.5."
msgstr ""

#: ../../source/framework/function_call.md:446 6df4774659a84791b0dff63ed1b211c1
msgid "For more information, check the [vLLM documentation](https://docs.vllm.ai/en/stable/serving/openai_compatible_server.html#tool-calling-in-the-chat-completion-api)."
msgstr "更多信息，请查阅 [vLLM 文档](https://docs.vllm.ai/en/stable/serving/openai_compatible_server.html#tool-calling-in-the-chat-completion-api)。"

#: ../../source/framework/function_call.md:448 c134b90394c941d49115838d2280a343
msgid "We will use the OpenAI-Compatible API by `vllm` with the API client from the `openai` Python library."
msgstr "在本指南中，我们将使用 `vllm` 提供的 OpenAI 兼容 API，并通过 `openai` Python 库的 API 客户端来进行操作。"

#: ../../source/framework/function_call.md:452 401f8cd3cf204f459a7f7c42e7dc43e7
msgid "For Qwen3, the chat template in tokenizer_config.json has already included support for the Hermes-style tool use. We simply need to start a OpenAI-compatible API with vLLM:"
msgstr "对于 Qwen3，`tokenizer_config.json` 中的聊天模板已经包含了对 Hermes 风格工具调用的支持。我们只需要启动一个由 vLLM 提供的 OpenAI 兼容 API 即可："

#: ../../source/framework/function_call.md:459 e27b0ff05a97431dbaa2a18d4e5b6ab5
msgid "The inputs are the same with those in [the preparation code](#prepcode):"
msgstr "输入与[准备代码](#prepcode)中的相同："

#: ../../source/framework/function_call.md:466 bac07622ec554e9ba4090940724d8d21
msgid "Let's also initialize the client:"
msgstr "我们同样需要初始化API客户端："

#: ../../source/framework/function_call.md:484 46d45caf4eab44c69ad519d82e84818c
msgid "We can use the create chat completions endpoint to query the model.  Here is an example of the `no_think` mode:"
msgstr "我们可以使用create chat completions endpoint查询模型。以下是`no_think`模式的例子："

#: ../../source/framework/function_call.md:502 4198241de2fb4e4ba5433b0c579734e1
msgid "vLLM should be able to parse the tool calls for us, and the main fields in the response (`response.choices[0]`) should be like"
msgstr "vLLM应当可以为我们解析工具调用，回复的主要字段(`response.choices[0]`)应如下所示："

#: ../../source/framework/function_call.md:529 050ffcaa034f4cd7b1405ab5a561b969
msgid "Note that the function arguments are JSON-formatted strings, which Qwen-Agent follows."
msgstr "请注意这里函数的参数是JSON格式字符串，Qwen-Agent与其一致。"

#: ../../source/framework/function_call.md:531 e3012b896829473cb0e1802047f5819e
msgid "As before, chances are that there are corner cases where tool calls are generated but they are malformed and cannot be parsed. For production code, we should try parsing by ourselves."
msgstr "如前所述，有可能存在边界情况，模型生成了工具调用但格式不良也无法被解析。对于生产代码，我们需要尝试自行解析。"

#: ../../source/framework/function_call.md:534 de4fb4d7f1b342e0849b049e81fb8585
msgid "Then, we can obtain the tool results and add them to the messages as shown below:"
msgstr "随后，我们可以调用工具并获得结果，然后将它们加入消息中："

#: ../../source/framework/function_call.md:555 e5e63bf3a7e34d0683d244549cbbbce1
msgid "It should be noted that the OpenAI API uses `tool_call_id` to identify the relation between tool results and tool calls."
msgstr "这里需要注意OpenAI API使用`tool_call_id`字段来识别工具结果和工具调用间的联系。"

#: ../../source/framework/function_call.md:557 deda1dccce204294ba69c74f24e4588f
msgid "The messages are now like"
msgstr "现在消息如下："

#: ../../source/framework/function_call.md:572 2dd5c4ae2dd84633809c8365c10fdaa8
msgid "Let's call the endpoint again to seed the tool results and get response:"
msgstr "让我们再次查询接口，以给模型提供工具结果并获得回复："

#: ../../source/framework/function_call.md:589 15f3ce61d4044fb1a06f75bdeb8140b2
msgid "The final response (`response.choices[0].message.content`) should be like"
msgstr "最终响应 (`response.choices[0].message.content`) 应如下所示"

#: ../../source/framework/function_call.md:595 1e8b7940bbbc46a8bbafb59afb086819
msgid "Finally"
msgstr "最后"

#: ../../source/framework/function_call.md:597 b588a910fcef435183d4d476851c10fb
msgid "In whichever way you choose to use function calling with Qwen3, keep in mind that the limitation and the perks of prompt engineering applies:"
msgstr "无论你选择哪种方式在Qwen3中使用函数调用，请记住，提示工程的局限与优势同样适用："

#: ../../source/framework/function_call.md:598 feaec442d07c4d3fa7e20d3dd9724db1
msgid "It is not guaranteed that the model generation will always follow the protocol even with proper prompting or templates. Especially, for the templates that are more complex and relies more on the model itself to think and stay on track than the ones that are simpler and relies on the template and the use of control or special tokens. The latter one, of course, requires some kind of training. In production code, be prepared that if it breaks, countermeasures or rectifications are in place."
msgstr "即使使用了恰当的提示或模板，也无法保证模型生成始终遵循协议。相比更简单、依赖模板本身以及控制或特殊标记的模板，那些更复杂、更依赖模型自身思考并保持不偏离的模板尤其如此。当然，前一类较简单的模板需要某种训练。在生产代码中，要为协议被破坏的情况准备好应对或纠正措施。"

#: ../../source/framework/function_call.md:602 d86a7ee6f6f64e40bea8bdb0991f3361
msgid "If in certain scenarios, the generation is not up to expectation, you can refine the template to add more instructions or constraints. While the templates mentioned here are general enough, they may not be the best or the most specific or the most concise for your use cases. The ultimate solution is fine-tuning using your own data."
msgstr "如果在某些场景下，生成结果未达到预期，你可以细化模板以添加更多指令或约束。尽管这里提到的模板足够通用，但对于你的具体使用案例，它们可能不是最佳的、最具体的或最简洁的。最终解决方案是使用你自己的数据进行微调。"

#: ../../source/framework/function_call.md:606 e09bd8deef0e413698e55cc1dfbc3283
msgid "Have fun prompting!"
msgstr "享受提示的乐趣吧！"


================================================
FILE: docs/locales/zh_CN/LC_MESSAGES/framework/qwen_agent.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2024, Qwen Team
# This file is distributed under the same license as the Qwen package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2024.
#
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-05-16 18:57+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"

#: ../../source/framework/qwen_agent.rst:2 74719c4bae294c5ea93e9f8542cef14c
msgid "Qwen-Agent"
msgstr "Qwen-Agent"

#: ../../source/framework/qwen_agent.rst:4 2a3d08cd70a34436bf5d9e1617a4d392
msgid "`Qwen-Agent <https://github.com/QwenLM/Qwen-Agent>`__ is a framework for developing LLM applications based on the instruction following, tool usage, planning, and memory capabilities of Qwen."
msgstr "`Qwen-Agent <https://github.com/QwenLM/Qwen-Agent>`__ 是一个基于 Qwen 的指令跟随、工具使用、计划和记忆能力来开发 LLM 应用程序的框架。"

#: ../../source/framework/qwen_agent.rst:8 ada0e9f26c6748768f66c1c62e4b6d75
msgid "This is a simple tutorial on using Qwen-Agent to quickly experience the agentic capabilities of Qwen3. For more detailed information, please refer to `Qwen-Agent <https://github.com/QwenLM/Qwen-Agent>`__ repository."
msgstr "本教程展示基于 Qwen-Agent 快速体验 Qwen3 智能体能力的流程。更多信息请参考 `Qwen-Agent <https://github.com/QwenLM/Qwen-Agent>`__ 仓库。"

#: ../../source/framework/qwen_agent.rst:14 b0997eeb63844471b1637075add23cb0
msgid "Installation"
msgstr "安装"

#: ../../source/framework/qwen_agent.rst:16 b6ba4e319cd24dee88d5cdbde60b096b
msgid "Install the stable version from PyPI:"
msgstr "从 PyPI 安装 Qwen-Agent 的稳定版本："

#: ../../source/framework/qwen_agent.rst:29 3f5ac104f7a647b99b1146d72fdda96d
msgid "Developing Your Own Agent"
msgstr "开发您自己的智能体"

#: ../../source/framework/qwen_agent.rst:31 e85c4ea6d3d9406da3823cb4f188ffc4
msgid "Qwen3 excels in tool calling capabilities. Qwen-Agent encapsulates tool-calling templates and tool-calling parsers internally, greatly reducing coding complexity."
msgstr "Qwen3 在工具调用能力方面表现出色。Qwen-Agent 内部封装了工具调用模板和工具调用解析器，大大降低了编码复杂性。"

#: ../../source/framework/qwen_agent.rst:35 d63624897ee340fb8d31eeea6a02e995
msgid "To define the available tools, you can use the MCP configuration file, use the integrated tool of Qwen-Agent, or integrate other tools by yourself."
msgstr "要定义可用的工具，您可以使用 MCP 配置文件，使用 Qwen-Agent 的集成工具，或者自行集成其他工具。"

#: ../../source/framework/qwen_agent.rst:112 ada0e9f26c6748768f66c1c62e4b6d75
msgid "For more detailed examples and MCP cookbooks, please refer to `Qwen-Agent <https://github.com/QwenLM/Qwen-Agent>`__ repository."
msgstr "有关更详细的示例和 MCP 使用指南，请参阅 `Qwen-Agent <https://github.com/QwenLM/Qwen-Agent>`__ 仓库。"

#~ msgid "To be updated for Qwen3."
#~ msgstr "仍需为Qwen3更新。"

#~ msgid "Qwen-Agent provides atomic components such as LLMs and prompts, as well as high-level components such as Agents. The example below uses the Assistant component as an illustration, demonstrating how to add custom tools and quickly develop an agent that uses tools."
#~ msgstr "Qwen-Agent 提供包括语言模型和提示词等原子级组件，及智能体等高级组件在内的多种组件。以下示例选取助理组件进行展示，阐述了如何整合自定义工具以及如何迅速开发出一个能够应用这些工具的代理程序。"

#~ msgid "The framework also provides more atomic components for developers to combine. For additional showcases, please refer to `examples <https://github.com/QwenLM/Qwen-Agent/tree/main/examples>`__."
#~ msgstr "该框架还为开发者提供了更多的原子组件以供组合使用。欲了解更多示例，请参见 `examples <https://github.com/QwenLM/Qwen-Agent/tree/main/examples>`__。"

#~ msgid "This is the simplest tutorial on using Qwen-Agent to quickly experience the agentic capabilities of Qwen3. For more detailed information, please refer to `Qwen-Agent <https://github.com/QwenLM/Qwen-Agent>`__ repository."
#~ msgstr ""



================================================
FILE: docs/locales/zh_CN/LC_MESSAGES/getting_started/concepts.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2024, Qwen Team
# This file is distributed under the same license as the Qwen package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2024.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-04-28 19:42+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"

#: ../../Qwen/source/getting_started/concepts.md:1
#: 581ec8a4d8dd4b5a99caf167b796a6e9
msgid "Key Concepts"
msgstr "核心概念"

#: ../../Qwen/source/getting_started/concepts.md:4
#: fc803dd8f02a4caf9be29e42364659a0
msgid "To be updated for Qwen3."
msgstr "仍需为Qwen3更新。"

#: ../../Qwen/source/getting_started/concepts.md:7
#: 834244ff25a040fe91f63682732dd416
msgid "Qwen"
msgstr "通义千问 (Qwen)"

#: ../../Qwen/source/getting_started/concepts.md:9
#: ee9dee3630614908860b2144007186fd
msgid "Qwen (Chinese: 通义千问; pinyin: _Tongyi Qianwen_) is the large language model and large multimodal model series of the Qwen Team, Alibaba Group.  Qwen is capable of natural language understanding, text generation, vision understanding, audio understanding, tool use, role play, playing as AI agent, etc.  Both language models and multimodal models are pre-trained on large-scale multilingual and multimodal data and post-trained on quality data for aligning to human preferences."
msgstr "通义千问（英文： Qwen ；读作： _kùn_）是由阿里巴巴通义千问团队开发的大规模语言和多模态系列模型。通义千问可以执行自然语言理解、文本生成、视觉理解、音频理解、工具调用、角色扮演、智能体等多种任务。语言和多模态模型均在大规模、多语言、多模态数据上进行预训练，并在高质量语料上后训练以与人类偏好对齐。"

#: ../../Qwen/source/getting_started/concepts.md:13
#: 6a37d9a0b6e2414a9b7ede0e095476af
msgid "There is the proprietary version and the open-weight version."
msgstr "通义千问分为闭源版本和开放权重版本。"

#: ../../Qwen/source/getting_started/concepts.md:15
#: 4fba11f4661b4e469f88dc3917b27427
msgid "The proprietary versions include"
msgstr "闭源版本包括"

#: ../../Qwen/source/getting_started/concepts.md:16
#: ../../Qwen/source/getting_started/concepts.md:31
#: be8423cea0b447c2b15de596c120f541 d07679ae34d0463f96aeff896a759118
msgid "Qwen: the language models"
msgstr "通义千问 (Qwen)：语言模型"

#: ../../Qwen/source/getting_started/concepts.md:17
#: a1461ec445034ba099aa58b1a13375a0
msgid "Qwen Max"
msgstr "Qwen Max"

#: ../../Qwen/source/getting_started/concepts.md:18
#: 19f8d7108d69464a8d1ce2980c1e4e92
msgid "Qwen Plus"
msgstr "Qwen Plus"

#: ../../Qwen/source/getting_started/concepts.md:19
#: ede369bc8dd24052ad674131f4a3b68a
msgid "Qwen Turbo"
msgstr ""

#: ../../Qwen/source/getting_started/concepts.md:20
#: ../../Qwen/source/getting_started/concepts.md:36
#: ddb0acdec40b4f79a3e6517f86727e4b e4df2227d36a46ee8644ce77f9fc1dc0
msgid "Qwen-VL: the vision-language models"
msgstr "通义千问 VL (Qwen-VL): 视觉语言模型"

#: ../../Qwen/source/getting_started/concepts.md:21
#: f9f5a5b50af44e90999a87661cdf4e5a
msgid "Qwen-VL Max"
msgstr ""

#: ../../Qwen/source/getting_started/concepts.md:22
#: fd0074955211498c8520ef3405bf312f
msgid "Qwen-VL Plus"
msgstr ""

#: ../../Qwen/source/getting_started/concepts.md:23
#: 40c8d32d570c4a76a5392c8e296c3793
msgid "Qwen-VL OCR"
msgstr ""

#: ../../Qwen/source/getting_started/concepts.md:24
#: ../../Qwen/source/getting_started/concepts.md:39
#: c0e45bd6e6b44ac7b18ef6a511c0999e f84666b662ab4d5ea41766d46f34fbc0
msgid "Qwen-Audio: the audio-language models"
msgstr "通义千问 Audio: 音频语言模型"

#: ../../Qwen/source/getting_started/concepts.md:25
#: 0584dbb5e76949ea965661c535e982d7
msgid "Qwen-Audio Turbo"
msgstr ""

#: ../../Qwen/source/getting_started/concepts.md:26
#: aa78dd31bce94f6db05be93976278455
msgid "Qwen-Audio ASR"
msgstr ""

#: ../../Qwen/source/getting_started/concepts.md:28
#: df255434cec04d12b8e2d048d4e5baf8
msgid "You can learn more about them at Alibaba Cloud Model Studio ([China Site](https://help.aliyun.com/zh/model-studio/getting-started/models#9f8890ce29g5u) \\[zh\\], [International Site](https://www.alibabacloud.com/en/product/modelstudio))."
msgstr ""

#: ../../Qwen/source/getting_started/concepts.md:30
#: bc0fbc68d29b49da90efba3358f5013f
msgid "The spectrum for the open-weight models spans over"
msgstr "开源模型包括："

#: ../../Qwen/source/getting_started/concepts.md:32
#: e3107d97ea1b4e0284c2a33c0da02813
msgid "[Qwen](https://github.com/QwenLM/Qwen): 1.8B, 7B, 14B, and 72B models"
msgstr "[Qwen](https://github.com/QwenLM/Qwen): 1.8B、 7B、 14B 及 72B 模型"

#: ../../Qwen/source/getting_started/concepts.md:33
#: 8918b660d015430a8d14c3c62b87b19d
msgid "[Qwen1.5](https://github.com/QwenLM/Qwen1.5/tree/v1.5): 0.5B, 1.8B, 4B, 14BA2.7B, 7B, 14B, 32B, 72B, and 110B models"
msgstr "[Qwen1.5](https://github.com/QwenLM/Qwen1.5/tree/v1.5): 0.5B、 1.8B、 4B、 14BA2.7B、 7B、 14B、 32B、 72B 及 110B 模型"

#: ../../Qwen/source/getting_started/concepts.md:34
#: c5ad94aa9d524a7290d9d0ec35321641
msgid "[Qwen2](https://github.com/QwenLM/Qwen2/tree/v2.0): 0.5B, 1.5B, 7B, 57A14B, and 72B models"
msgstr "[Qwen2](https://github.com/QwenLM/Qwen2/tree/v2.0): 0.5B、 1.5B、 7B、 57A14B 及 72B 模型"

#: ../../Qwen/source/getting_started/concepts.md:35
#: 5c38bd713ca847b4bb552971cdd75a99
msgid "[Qwen2.5](https://github.com/QwenLM/Qwen2.5/): 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B models"
msgstr "[Qwen2.5](https://github.com/QwenLM/Qwen2.5/): 0.5B、 1.5B、 3B、 7B、 14B、 32B 及 72B 模型"

#: ../../Qwen/source/getting_started/concepts.md:37
#: aa36bbcafdf742a9addd2a7b32705a02
msgid "[Qwen-VL](https://github.com/QwenLM/Qwen-VL): 7B-based models"
msgstr "[Qwen-VL](https://github.com/QwenLM/Qwen-VL): 基于 7B 的模型"

#: ../../Qwen/source/getting_started/concepts.md:38
#: 9d1d663950d34fefbfc7df37fa1def7a
msgid "[Qwen2-VL](https://github.com/QwenLM/Qwen2-VL): 2B, 7B, and 72B-based models"
msgstr "[Qwen2-VL](https://github.com/QwenLM/Qwen2-VL): 基于 2B 、 7B 和 72B 的模型"

#: ../../Qwen/source/getting_started/concepts.md:40
#: bb8a3431ea1f4cc99b4b8dd78e55d9ad
msgid "[Qwen-Audio](https://github.com/QwenLM/Qwen-Audio): 7B-based model"
msgstr "[Qwen-Audio](https://github.com/QwenLM/Qwen-Audio): 基于 7B 的模型"

#: ../../Qwen/source/getting_started/concepts.md:41
#: 2421a5d0f547440bbf0211274bf44d5d
msgid "[Qwen2-Audio](https://github.com/QwenLM/Qwen2-Audio): 7B-based models"
msgstr "[Qwen2-Audio](https://github.com/QwenLM/Qwen2-Audio): 基于 7B 的模型"

#: ../../Qwen/source/getting_started/concepts.md:42
#: df8610d7dfbf4651a955dc909b727061
msgid "Q*Q: the reasoning models"
msgstr "Q*Q：推理模型"

#: ../../Qwen/source/getting_started/concepts.md:43
#: 67e209da7f5848adab885598a9069f11
msgid "[QwQ-Preview](https://github.com/QwenLM/Qwen2.5/): 32B LLM"
msgstr "[QwQ-Preview](https://github.com/QwenLM/Qwen2.5/): 32B 大语言模型"

#: ../../Qwen/source/getting_started/concepts.md:44
#: 4e9ad66b735a4109ab8ec727486c463c
msgid "[QVQ-Preview](https://github.com/QwenLM/Qwen2-VL): 72B VLM"
msgstr "[QVQ-Preview](https://github.com/QwenLM/Qwen2-VL): 72B 视觉语言模型"

#: ../../Qwen/source/getting_started/concepts.md:45
#: 728cd9f1dc9d4502ad9a3702e802fc2e
msgid "CodeQwen/Qwen-Coder: the language models for coding"
msgstr "Code通义千问 / 通义千问Coder：代码语言模型"

#: ../../Qwen/source/getting_started/concepts.md:46
#: 133fd513d7084b54bfe910fda13a42ec
msgid "[CodeQwen1.5](https://github.com/QwenLM/CodeQwen1.5): 7B models"
msgstr "[CodeQwen1.5](https://github.com/QwenLM/CodeQwen1.5): 7B 模型"

#: ../../Qwen/source/getting_started/concepts.md:47
#: a903957acc0d458b8200788144be0b4d
msgid "[Qwen2.5-Coder](https://github.com/QwenLM/Qwen2.5-Coder): 0.5B, 1.5B, 3B, 7B, 14B, and 32B models"
msgstr "[Qwen2.5-Coder](https://github.com/QwenLM/Qwen2.5-Coder): 0.5B、 1.5B、 3B、 7B、 14B 及 32B 模型"

#: ../../Qwen/source/getting_started/concepts.md:48
#: 6c47a9310a6945719b35da4bff3e0c9e
msgid "Qwen-Math: the language models for mathematics"
msgstr "通义千问 Math：数学语言模型"

#: ../../Qwen/source/getting_started/concepts.md:49
#: fadbf7de806d4f288fc4355b52bcc060
msgid "[Qwen2-Math](https://github.com/QwenLM/Qwen2-Math): 1.5B, 7B, and 72B models"
msgstr "[Qwen2-Math](https://github.com/QwenLM/Qwen2-Math)： 1.5B、 7B 及 72B 模型"

#: ../../Qwen/source/getting_started/concepts.md:50
#: 0066352e253345288d16bb1a8df40e1c
msgid "[Qwen2.5-Math](https://github.com/QwenLM/Qwen2.5-Math): 1.5B, 7B, and 72B models"
msgstr "[Qwen2.5-Math](https://github.com/QwenLM/Qwen2.5-Math)： 1.5B、 7B 及 72B 模型"

#: ../../Qwen/source/getting_started/concepts.md:51
#: b45ed6f1601c41f8a33f6b2b6ff8b47b
msgid "Qwen-Math-RM: the reward models for mathematics"
msgstr "通义千问 Math-RM：数学奖励模型"

#: ../../Qwen/source/getting_started/concepts.md:52
#: 286e8dd455ef4bab91821d399dd4a582
msgid "[Qwen2-Math-RM](https://github.com/QwenLM/Qwen2-Math): 72B models"
msgstr "[Qwen2-Math-RM](https://github.com/QwenLM/Qwen2-Math)： 72B 模型"

#: ../../Qwen/source/getting_started/concepts.md:53
#: 81eb8401de1646309924a74e633b9b45
msgid "[Qwen2.5-Math-RM](https://github.com/QwenLM/Qwen2.5-Math): 72B models"
msgstr "[Qwen2.5-Math-RM](https://github.com/QwenLM/Qwen2.5-Math)： 72B 模型"

#: ../../Qwen/source/getting_started/concepts.md:54
#: e0cd026299ba4809a86504afbe2dd8d5
msgid "[Qwen2.5-Math-PRM](https://github.com/QwenLM/Qwen2.5-Math): 7B and 72B models"
msgstr "[Qwen2.5-Math-PRM](https://github.com/QwenLM/Qwen2.5-Math)： 7B 及 72B 模型"

#: ../../Qwen/source/getting_started/concepts.md:56
#: acec8c22ff094ebe8295cad38ec7a8db
msgid "**In this document, our focus is Qwen, the language models.**"
msgstr "**本文档针对通义千问 (Qwen) 语言模型。**"

#: ../../Qwen/source/getting_started/concepts.md:58
#: e1e6ade4e85b4975bf992ed0a9c99140
msgid "Causal Language Models"
msgstr "因果语言模型 (Causal Language Models)"

#: ../../Qwen/source/getting_started/concepts.md:60
#: 593921d01e7a41caa52eda69db81c908
msgid "Causal language models, also known as autoregressive language models or decoder-only language models, are a type of machine learning model designed to predict the next token in a sequence based on the preceding tokens.  In other words, they generate text one token at a time, using the previously generated tokens as context.  The \"causal\" aspect refers to the fact that the model only considers the past context (the already generated tokens) when predicting the next token, not any future tokens."
msgstr "因果语言模型 (causal language models)，也被称为自回归语言模型 (autoregressive language models) 或仅解码器语言模型 (decoder-only language models) ，是一种机器学习模型，旨在根据序列中的前导 token 预测下一个 token 。换句话说，它使用之前生成的 token 作为上下文，一次生成一个 token 的文本。\"因果\"指的是模型在预测下一个 token 时只考虑过去的上下文（即已生成的 token ），而不考虑任何未来的 token 。"

#: ../../Qwen/source/getting_started/concepts.md:64
#: 4b31da2c06c54107857edcb2764e0019
msgid "Causal language models are widely used for various natural language processing tasks involving text completion and generation.  They have been particularly successful in generating coherent and contextually relevant text, making them a cornerstone of modern natural language understanding and generation systems."
msgstr "因果语言模型被广泛用于涉及文本补全和生成的各种自然语言处理任务。它们在生成连贯且具有上下文关联性的文本方面尤其成功，这使得它们成为现代自然语言理解和生成系统的基础。"

#: ../../Qwen/source/getting_started/concepts.md:67
#: 98f73b1f049641038ec1b310a219b209
msgid "**Takeaway: Qwen models are causal language models suitable for text completion.**"
msgstr "**要点：Qwen 模型是适用于文本补全的因果语言模型。**"

#: ../../Qwen/source/getting_started/concepts.md
#: 2f5c19be905046e1ae669119e3bb6e7c
msgid "Learn more about language models"
msgstr "了解更多关于语言模型的信息"

#: ../../Qwen/source/getting_started/concepts.md:71
#: 557d7c8bafb94a34b76b6d96a3ce46ff
msgid "They are three main kinds of models that are commonly referred to as language models in deep learning:"
msgstr "在深度学习中，被称为语言模型的主要有三类："

#: ../../Qwen/source/getting_started/concepts.md:72
#: 89ef0f95d0f5492f877ddceb0233d2fc
msgid "Sequence-to-sequence models: T5 and the likes"
msgstr "序列到序列模型 (sequence-to-sequence models)：T5及其类似模型"

#: ../../Qwen/source/getting_started/concepts.md:74
#: 80f14b7e5beb41d7920772b053681e24
msgid "Sequence-to-sequence models use both an encoder to capture the entire input sequence and a decoder to generate an output sequence. They are widely used for tasks like machine translation, text summarization, etc."
msgstr "序列到序列模型同时使用编码器来捕获整个输入序列，以及解码器来生成输出序列。它们广泛应用于诸如机器翻译、文本摘要等任务。"

#: ../../Qwen/source/getting_started/concepts.md:77
#: 0b15c87feae5409f80999e86ad5f5942
msgid "Bidirectional models or encoder-only models: BERT and the likes"
msgstr "双向模型 (bidirectional models) 或仅编码器模型 (encoder-only models) ：BERT及其类似模型"

#: ../../Qwen/source/getting_started/concepts.md:79
#: 7439fe506ee64fbfaba86bb409cb76ca
msgid "Bidirectional models can access both past and future context in a sequence during training. They cannot generate sequential outputs in real-time due to the need for future context. They are widely used as embedding models and subsequently used for text classification."
msgstr "双向模型在训练期间可以访问序列中的过去和未来上下文。由于需要未来上下文，它们无法实时生成顺序输出。它们广泛用作嵌入模型，并随后用于文本分类。"

#: ../../Qwen/source/getting_started/concepts.md:83
#: c7f7ae809802445bbaafc7d7f783c71a
msgid "Casual language models or decoder-only models: GPT and the likes"
msgstr "因果语言模型 (causal language models) 或仅解码器模型 (decoder-only models) ：GPT及其类似模型"

#: ../../Qwen/source/getting_started/concepts.md:85
#: b2825bdbf41c485c849444fc734fde43
msgid "Causal language models operate unidirectionally in a strictly forward direction, predicting each subsequent word based only on the previous words in the sequence.  This unidirectional nature ensures that the model's predictions do not rely on future context, making them suitable for tasks like text completion and generation."
msgstr "因果语言模型以严格向前的单向方式运行，仅根据序列中的前导词汇预测每个后续词汇。这种单向性确保了模型的预测不依赖于未来上下文，使它们适合于文本补全和生成等任务。"

#: ../../Qwen/source/getting_started/concepts.md:89
#: 26bfa80a4e224b9ca3494f83fc37b0b6
msgid "Pre-training & Base models"
msgstr "预训练 (Pre-training) 和基模型 (Base models)"

#: ../../Qwen/source/getting_started/concepts.md:91
#: d75a1bc5132a43e8b41ce24b8021e7ab
msgid "Base language models are foundational models trained on extensive corpora of text to predict the next word in a sequence.  Their main goal is to capture the statistical patterns and structures of language, enabling them to generate coherent and contextually relevant text.  These models are versatile and can be adapted to various natural language processing tasks through fine-tuning.  While adept at producing fluent text, they may require in-context learning or additional training to follow specific instructions or perform complex reasoning tasks effectively. For Qwen models, the base models are those without \"-Instruct\" indicators, such as Qwen2.5-7B and Qwen2.5-72B."
msgstr "基础语言模型 (base language models) 是在大量文本语料库上训练的基本模型，用于预测序列中的下一个词。它们的主要目标是捕捉语言的统计模式和结构，使它们能够生成连贯且具有上下文关联性的文本。这些模型具有多功能性，可以通过微调适应各种自然语言处理任务。虽然擅长生成流畅的文本，但它们可能需要情境学习 (in-context learning)或额外训练才能遵循特定指令或有效执行复杂推理任务。对于 Qwen 模型，基础模型是指那些没有 \"-Instruct\" 标识符的模型，例如 Qwen2.5-7B 和 Qwen2.5-72B 。"

#: ../../Qwen/source/getting_started/concepts.md:97
#: 7f7321ea84f34e29beabf6122a77ec64
msgid "**Takeaway: Use base models for in-context learning, downstream fine-tuning, etc.**"
msgstr "**要点：使用基础模型进行情境学习、下游微调等。**"

#: ../../Qwen/source/getting_started/concepts.md:99
#: b1d8ca8221c0494796dda85ac2456389
msgid "Post-training & Instruction-tuned models"
msgstr "后训练 (Post-training) 和指令微调模型 (Instruction-tuned models)"

#: ../../Qwen/source/getting_started/concepts.md:101
#: 2f55c1d2c9234c44ab55bf90fcb1b10f
msgid "Instruction-tuned language models are specialized models designed to understand and execute specific instructions in conversational styles. These models are fine-tuned to interpret user commands accurately and can perform tasks such as summarization, translation, and question answering with improved accuracy and consistency.  Unlike base models, which are trained on large corpora of text, instruction-tuned models undergo additional training using datasets that contain examples of instructions and their desired outcomes, often in multiple turns. This kind of training makes them ideal for applications requiring targeted functionalities while maintaining the ability to generate fluent and coherent text. For Qwen models, the instruction-tuned models are those with the \"-Instruct\" suffix, such as Qwen2.5-7B-Instruct and Qwen2.5-72B-Instruct. [^instruct-chat]"
msgstr "指令微调语言模型 (Instruction-tuned language models) 是专门设计用于理解并以对话风格执行特定指令的模型。这些模型经过微调，能准确地解释用户命令，并能以更高的准确性和一致性执行诸如摘要、翻译和问答等任务。与在大量文本语料库上训练的基础模型不同，指令调优模型会使用包含指令示例及其预期结果的数据集进行额外训练，通常涵盖多个回合。这种训练方式使它们非常适合需要特定功能的应用，同时保持生成流畅且连贯文本的能力。对于 Qwen 模型，指令调优模型是指带有 \"-Instruct\" 后缀的模型，例如 Qwen2.5-7B-Instruct 和 Qwen2.5-72B-Instruct 。 [^instruct-chat]"

#: ../../Qwen/source/getting_started/concepts.md:107
#: d5b5590ccf434715bd57d0746f196cfe
msgid "**Takeaway: Use instruction-tuned models for conducting tasks in conversations, downstream fine-tuning, etc.**"
msgstr "**要点：使用指令微调模型进行对话式的任务执行、下游微调等。**"

#: ../../Qwen/source/getting_started/concepts.md:112
#: 5dc4cca1e5104c67b1a3bcdd004e7a9d
msgid "Tokens & Tokenization"
msgstr "Tokens & Tokenization"

#: ../../Qwen/source/getting_started/concepts.md:114
#: 9e3a74bf95fd40e49fef921a0d0df6ff
msgid "Tokens represent the fundamental units that models process and generate.  They can represent texts in human languages (regular tokens) or represent specific functionality like keywords in programming languages (control tokens [^special]). Typically, a tokenizer is used to split text into regular tokens, which can be words, subwords, or characters depending on the specific tokenization scheme employed, and furnish the token sequence with control tokens as needed. The vocabulary size, or the total number of unique tokens a model recognizes, significantly impacts its performance and versatility.  Larger language models often use sophisticated tokenization methods to handle the vast diversity of human language while keeping the vocabulary size manageable. Qwen use a relatively large vocabulary of 151,646 tokens in total."
msgstr "token 代表模型处理和生成的基本单位。它们可以表示人类语言中的文本（常规 token），或者表示特定功能，如编程语言中的关键字（控制 token [^special]）。通常，使用 tokenizer 将文本分割成常规 token ，这些 token 可以是单词、子词或字符，具体取决于所采用的特定 tokenization 方案，并按需为 token 序列添加控制 token 。词表大小，即模型识别的唯一 token 总数，对模型的性能和多功能性有重大影响。大型语言模型通常使用复杂的 tokenization 来处理人类语言的广阔多样性，同时保持词表大小可控。Qwen 词表相对较大，有 15 1646 个 token。"

#: ../../Qwen/source/getting_started/concepts.md:123
#: 9e1c049b23fc403ea61919a755ae865a
msgid "**Takeaway: Tokenization method and vocabulary size is important.**"
msgstr "**要点：tokenization 和词表大小很重要。**"

#: ../../Qwen/source/getting_started/concepts.md:125
#: 0a01476839134505b1e2e004f67c876b
msgid "Byte-level Byte Pair Encoding"
msgstr "Byte-level Byte Pair Encoding"

#: ../../Qwen/source/getting_started/concepts.md:127
#: e461340d6e834aaeb233649a70618165
msgid "Qwen adopts a subword tokenization method called Byte Pair Encoding (BPE), which attempts to learn the composition of tokens that can represent the text with the fewest tokens.  For example, the string \" tokenization\" is decomposed as \" token\" and \"ization\" (note that the space is part of the token). Especially, the tokenization of Qwen ensures that there is no unknown words and all texts can be transformed to token sequences."
msgstr "Qwen采用了名为字节对编码（Byte Pair Encoding，简称BPE）的子词tokenization方法，这种方法试图学习能够用最少的 token 表示文本的 token 组合。例如，字符串\" tokenization\"被分解为\" token\"和\"ization\"（注意空格是 token 的一部分）。特别地，Qwen的 tokenization 确保了不存在未知词汇，并且所有文本都可以转换为 token 序列。"

#: ../../Qwen/source/getting_started/concepts.md:131
#: af40a128cbe44fb59a057f9477737197
msgid "There are 151,643 tokens as a result of BPE in the vocabulary of Qwen, which is a large vocabulary efficient for diverse languages. As a rule of thumb, 1 token is 3~4 characters for English texts and 1.5~1.8 characters for Chinese texts."
msgstr "Qwen词表中因BPE而产生的 token 数量为 15 1643 个，这是一个适用于多种语言的大词表。一般而言，对于英语文本，1个token大约是3~4个字符；而对于中文文本，则大约是1.5~1.8个汉字。"

#: ../../Qwen/source/getting_started/concepts.md:134
#: 3b92bf813f14474f842584fa9bf4fdee
msgid "**Takeaway: Qwen processes texts in subwords and there are no unknown words.**"
msgstr "**要点：Qwen 以子词形式处理文本，不存在未知词汇。**"

#: ../../Qwen/source/getting_started/concepts.md
#: b29e165e1810403dbcd90cfedd8c73a6
msgid "Learn more about tokenization in Qwen"
msgstr "了解更多关于 Qwen tokenization 的信息"

#: ../../Qwen/source/getting_started/concepts.md:137
#: b7fa098dbce946c9847eb414f7d52b9e
msgid "Qwen uses byte-level BPE (BBPE) on UTF-8 encoded texts.  It starts by treating each byte as a token and then iteratively merges the most frequent pairs of tokens occurring the texts into larger tokens until the desired vocabulary size is met."
msgstr "Qwen 使用基于字节的BPE (BBPE) 对UTF-8编码的文本进行处理。它开始时将每个字节视为一个 token ，然后迭代地将文本中最频繁出现的 token 对合并成更大的 token，直到达到所需的词表大小。"

#: ../../Qwen/source/getting_started/concepts.md:140
#: 504bb23b689949dd9bbee78f97d7e0a0
msgid "In byte-level BPE, minimum 256 tokens are needed to tokenize every piece of text and avoid the out of vocabulary (OOV) problem. In comparison, character-level BPE needs every Unicode character in its vocabulary to avoid OOV and the Unicode Standard contains 154,998 characters as of Unicode Version 16.0."
msgstr "在基于字节的BPE中，至少需要256个 token 来对每段文本进行 tokenization，并避免未登录词（out of vocabulary, OOV）问题。相比之下，基于字符的 BPE 需要其词表中包含所有 Unicode 字符以避免未登录词，而截至 Unicode 版本16.0，Unicode标准包含 15 4998 个字符。"

#: ../../Qwen/source/getting_started/concepts.md:143
#: cfed44d0c905486cb7e12838014249e1
msgid "One limitation to keep in mind for byte-level BPE is that the individual tokens in the vocabulary may not be seemingly semantically meaningful or even valid UTF-8 byte sequences, and in certain aspects, they should be viewed as a text compression scheme."
msgstr "基于字节的BPE的一个限制是，词表中的个别 token 可能看似没有语义意义，甚至不是有效的 UTF-8 字节序列，在某些方面，它们应该被视为一种文本压缩方案。"

#: ../../Qwen/source/getting_started/concepts.md:146
#: 4c6140ebdb0742e199793a7da566943e
msgid "Control Tokens & Chat Template"
msgstr "控制 Token 和 对话模板"

#: ../../Qwen/source/getting_started/concepts.md:148
#: 7fab9c7227b94996bbdd30a2dd6a11cc
msgid "Control tokens and chat templates both serve as mechanisms to guide the model's behavior and outputs."
msgstr "控制 token 和对话模板都作为指导模型行为和输出的机制。"

#: ../../Qwen/source/getting_started/concepts.md:150
#: 9d38b62cddc34442bffc173b6c5e15ea
msgid "Control tokens are special tokens inserted into the sequence that signifies meta information. For example, in pre-training, multiple documents may be packed into a single sequence. For Qwen, the control token \"<|endoftext|>\" is inserted after each document to signify that the document has ended and a new document will proceed."
msgstr "控制token是插入到序列中的特殊token，表示元信息。例如，在预训练中，多个文档可以被打包成一个单一的序列。对于Qwen，控制 token \"<|endoftext|>\" 在每个文档后插入，表示文档已经结束，新的文档将开始。"

#: ../../Qwen/source/getting_started/concepts.md:154
#: aed5af70b3de447b9b3c1312f040f103
msgid "Chat templates provide a structured format for conversational interactions, where predefined placeholders or prompts are used to elicit responses from the model that adhere to a desired dialogue flow or context. Different models may use different kinds of chat template to format the conversations.  It is crucial to use the designated one to ensure the precise control over the LLM's generation process."
msgstr "对话模板为对话交互提供了结构化的格式，其中使用预定义的占位符或提示来从模型中引发遵循期望的对话流程或上下文的响应。不同的模型可能使用不同类型的对话模板来格式化对话。使用指定的模板对于确保对语言模型生成过程的精确控制至关重要。"

#: ../../Qwen/source/getting_started/concepts.md:158
#: 7acbb7b28f1746a8b779a004a7dc2d93
msgid "Qwen uses the following format (ChatML[^chatml]), making use of control tokens to format each turn in the conversations"
msgstr "Qwen使用以下格式（ChatML[^chatml]），利用控制 token 来格式化对话中的每一轮。"

#: ../../Qwen/source/getting_started/concepts.md:163
#: 33f3aee8869748fa9f7a51c7efa76338
msgid "The user input take the role of `user` and the model generation takes the role of `assistant`.  Qwen also supports the meta message that instruct the model to perform specific actions or generate text with certain characteristics, such as altering tone, style, or content, which takes the role of `system` and the content defaults to \"You are Qwen, created by Alibaba Cloud. You are a helpful assistant.\""
msgstr "用户输入扮演 `user` 的 role ，而模型生成则承担 `assistant` 的 role 。 Qwen 还支持元消息，该消息指导模型执行特定操作或生成具有特定特性的文本，例如改变语气、风格或内容，这将承担 `system` 的 role，且内容默认为 \"You are Qwen, created by Alibaba Cloud. You are a helpful assistant.\" 。"

#: ../../Qwen/source/getting_started/concepts.md:166
#: 0129cbc394614f5f94047592df13c9b6
msgid "The following is a full example:"
msgstr "下面为一个完整示例"

#: ../../Qwen/source/getting_started/concepts.md:183
#: 59bab0422fa34a19ab2995e6ff15dc56
msgid "Starting from Qwen2.5, the Qwen model family including multimodal and specialized models will use a unified vocabulary, which contains control tokens from all subfamilies. There are 22 control tokens in the vocabulary of Qwen2.5, making the vocabulary size totaling 151,665:"
msgstr "从 Qwen2.5 开始，Qwen 模型家族，包括多模态和专项模型，将使用统一的词表，其中包含了所有子系列的控制 token 。Qwen2.5 的词表中有 22 个控制 token，使得词表的总规模达到 15 1665 ："

#: ../../Qwen/source/getting_started/concepts.md:185
#: 701bd6f896634b0aaf2920d883268a16
msgid "1 general: `<|endoftext|>`"
msgstr "通用 token 1个：`<|endoftext|>`"

#: ../../Qwen/source/getting_started/concepts.md:186
#: 7e78239f93a245dbb046d4ae2afe8a72
msgid "2 for chat: `<|im_start|>` and `<|im_end|>`"
msgstr "对话 token 2个：`<|im_start|>` 和 `<|im_end|>`"

#: ../../Qwen/source/getting_started/concepts.md:187
#: eb686086dfe44d53a5cdfc98e9bbaad8
msgid "2 for tool use: `<tool_call>` and `</tool_call>`"
msgstr "工具调用 token 2个： `<tool_call>` 和 `</tool_call>`"

#: ../../Qwen/source/getting_started/concepts.md:188
#: c8259cada9e94790a759a4b1f8edaf2d
msgid "11 for vision"
msgstr "视觉相关 token 11个"

#: ../../Qwen/source/getting_started/concepts.md:189
#: 9b67870139b144c8ae4451e3deb1c1c5
msgid "6 for coding"
msgstr "代码相关 token 6个"

#: ../../Qwen/source/getting_started/concepts.md:191
#: 32c9581187f640d2a37cca85390bf1de
msgid "**Takeaway: Qwen uses ChatML with control tokens for chat template.**"
msgstr "**要点: Qwen 使用带有控制 token 的 ChatML 作为对话模板。**"

#: ../../Qwen/source/getting_started/concepts.md:195
#: 74d8b323a0864a9c94a78f154a5c86c0
msgid "Length Limit"
msgstr "长度限制"

#: ../../Qwen/source/getting_started/concepts.md:197
#: 2833c71b35d94ff0b6825f86bc9be098
msgid "As Qwen models are causal language models, in theory there is only one length limit of the entire sequence. However, since there is often packing in training and each sequence may contain multiple individual pieces of texts.  **How long the model can generate or complete ultimately depends on the use case and in that case how long each document (for pre-training) or each turn (for post-training) is in training.**"
msgstr "由于 Qwen 模型是因果语言模型，理论上整个序列只有一个长度限制。然而，由于在训练中通常存在打包现象，每个序列可能包含多个独立的文本片段。**模型能够生成或完成的长度最终取决于具体的应用场景，以及在这种情况下，预训练时每份文档或后训练时每轮对话的长度。**"

#: ../../Qwen/source/getting_started/concepts.md:201
#: 1d25c6232d924639b313a1a66d1990c9
msgid "For Qwen2.5, the packed sequence length in training is 32,768 tokens.[^yarn] The maximum document length in pre-training is this length. The maximum message length for user and assistant is different in post-training. In general, the assistant message could be up to 8192 tokens."
msgstr "对于Qwen2.5，在训练中的打包序列长度为 3 2768 个 token [^yarn]。预训练中的最大文档长度即为此长度。而后训练中，user和assistant的最大消息长度则有所不同。一般情况下，assistant消息长度可达 8192 个 token。"

#: ../../Qwen/source/getting_started/concepts.md:209
#: f39c2748eccb486794c941d23b23835c
msgid "**Takeaway: Qwen2.5 models can process texts of 32K or 128K tokens and up to 8K tokens can be assistant output.**"
msgstr "**要点：Qwen2.5 模型可以处理 32K 或 128K token 长的文本，其中最多 8K token 可作为 assistant 输出。**"

#: ../../Qwen/source/getting_started/concepts.md:109
#: 7195ff6a5d1a4e6881f272081c9885d7
msgid "Previously, they are known as the chat models and with the \"-Chat\" suffix. Starting from Qwen2, the name is changed to follow the common practice. For Qwen, \"-Instruct\" and \"-Chat\" should be regarded as synonymous."
msgstr "此前，它们被称为对话模型，并带有\"-Chat\"后缀。从Qwen2开始，名称变更为遵循通用做法。对于Qwen，\"-Instruct\"和\"-Chat\"应被视为同义词。"

#: ../../Qwen/source/getting_started/concepts.md:121
#: f50caec63c8948a894dbf8c718f0b2d8
msgid "Control tokens can be called special tokens. However, the meaning of special tokens need to be interpreted based on the contexts: special tokens may contain extra regular tokens."
msgstr "控制 token 也可以称为“特殊 token”。但是，特殊 token 的意义需要根据上下文进行解释：特殊 token 也可能包含额外的常规 token。"

#: ../../Qwen/source/getting_started/concepts.md:193
#: fc70e6f93b71452ca0d09aa0ff28dd54
msgid "For historical reference only, ChatML is first described by the OpenAI Python SDK. The last available version is [this](https://github.com/openai/openai-python/blob/v0.28.1/chatml.md). Please also be aware that that document lists use cases intended for OpenAI models. For Qwen2.5 models, please only use as in our guide."
msgstr "仅供历史参考，ChatML最初由OpenAI的Python SDK描述。可获取的最新版本是[这个](https://github.com/openai/openai-python/blob/v0.28.1/chatml.md)。请注意，该文档列出的应用案例是为OpenAI模型设计的。对于Qwen2.5模型，请仅按照我们的指南使用。"

#: ../../Qwen/source/getting_started/concepts.md:206
#: a08b83b36c2d4e8d8f3dbb020ecb37a2
msgid "The sequence length can be extended to 131,072 tokens for Qwen2.5-7B, Qwen2.5-14B, Qwen2.5-32B, and Qwen2.5-72B models with YaRN.      Please refer to the model card on how to enable YaRN in vLLM."
msgstr "使用YaRN，Qwen2.5-7B、Qwen2.5-14B、Qwen2.5-32B和Qwen2.5-72B模型的序列长度可以扩展到13 1072个token。请参考模型卡片了解如何在 vLLM 中启用 YaRN。"

#~ msgid "There is the proprietary version hosted exclusively at [Alibaba Cloud \\[zh\\]](https://help.aliyun.com/zh/model-studio/developer-reference/tongyi-qianwen-llm/) and the open-weight version."
#~ msgstr "通义千问分为[闭源](https://help.aliyun.com/zh/model-studio/developer-reference/tongyi-qianwen-llm/)和开源两大版本。"



================================================
FILE: docs/locales/zh_CN/LC_MESSAGES/getting_started/quantization_benchmark.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2024, Qwen Team
# This file is distributed under the same license as the Qwen package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2024.
#
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-04-28 19:42+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:2
#: 6d4d3bb3020f4e4d8dba0ca5778cdcae
msgid "Performance of Quantized Models"
msgstr "量化模型效果评估"

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:5
#: 3a541cd8cba74edf9b06b46f59eaaf38
msgid "To be updated for Qwen3."
msgstr "仍需为Qwen3更新。"

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:7
#: 3a95fc299de141dea4fc729ef907ce17
msgid "This section reports the generation performance of quantized models (including GPTQ and AWQ) of the Qwen2 series. Specifically, we report:"
msgstr "本部分介绍Qwen2量化模型（包括GPTQ与AWQ量化方案）的效果评估，有以下数据集"

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:11
#: 9386a3b95eb340568185da78224a1ccd
msgid "MMLU (Accuracy)"
msgstr "MMLU （准确率）"

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:12
#: 3cd93b881c90488895c61298104bc7fb
msgid "C-Eval (Accuracy)"
msgstr "C-Eval （准确率）"

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:13
#: 7ac4bb515b0a49699d4eb95fc433bb51
msgid "IFEval (Strict Prompt-Level Accuracy)"
msgstr "IFEval （提示词级的严格准确率，Strict Prompt-Level Accuracy）"

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:15
#: 08e3f35820344c93877618815650b866
msgid "We use greedy decoding in evaluating all models."
msgstr "所有模型均使用贪心解码。"

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:18
#: 9aec40221219455d8fc4e473e5acf09c
msgid "Quantization"
msgstr "量化模型"

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:18
#: 93f274f4751f445d85f04937b25c7f7d
msgid "Average"
msgstr "平均"

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:18
#: 776612f5dd4a40d98976bdfe4896508c
msgid "MMLU"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:18
#: f6e8014116cf4179a934d601ee61d04d
msgid "C-Eval"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:18
#: 0c40e96c4a3b4cdeaaf1a95ff1aa8f98
msgid "IFEval"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:20
#: 773ccb0f10bd4cf690e819af51c40e76
msgid "Qwen2-72B-Instruct"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:20
#: ../../Qwen/source/getting_started/quantization_benchmark.rst:28
#: ../../Qwen/source/getting_started/quantization_benchmark.rst:36
#: ../../Qwen/source/getting_started/quantization_benchmark.rst:44
#: 71e180f75e624b738d56ec2a1fad253c 7ebe73a2e96445c4bb733845c3190240
#: bd5a3b8861d646fa9e8d8bc51bb1b80c cc79a78b34f94c18b7bdaf1bfcc8824d
msgid "BF16"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:20
#: ../../Qwen/source/getting_started/quantization_benchmark.rst:22
#: 08517ffc3e6e4ceb812c3d8710307266 2e879d3d1fef4c878b097550d745e7ae
msgid "81.3"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:20
#: f795aa42cf7d42ccb5a573a5f44be79f
msgid "82.3"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:20
#: 01c54f3da3454e178a07a9f88ed5302b
msgid "83.8"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:20
#: 7651df5ccaa14b11a3a89827a5265ae8
msgid "77.6"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:22
#: ../../Qwen/source/getting_started/quantization_benchmark.rst:30
#: ../../Qwen/source/getting_started/quantization_benchmark.rst:38
#: ../../Qwen/source/getting_started/quantization_benchmark.rst:46
#: 04de04c9ff3640f096301e76fdd291de 301aa8e494ff4fe4aefcc8cfb7a4c065
#: d395be41cf144318a1faeccc6f6965c8 ec513d10a75d44b8bd134287a57b5cdd
msgid "GPTQ-Int8"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:22
#: 411166db878d4d8f8515e9f5d78a651c
msgid "80.7"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:22
#: e63ce8a2f1cc4cec9b52521015e2aebe
msgid "83.4"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:22
#: e6be6c30e0d740d39c6c8807e2d4f5f8
msgid "77.5"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:24
#: ../../Qwen/source/getting_started/quantization_benchmark.rst:32
#: ../../Qwen/source/getting_started/quantization_benchmark.rst:40
#: ../../Qwen/source/getting_started/quantization_benchmark.rst:48
#: 21720ff324814b2b865f37a40c3586b5 4644a49bcdfd457b84eb5b2771177d78
#: 560dcb4bfa6e45088faefdb504d629a5 7044a0d2dd6945138ea385287ab5bf33
msgid "GPTQ-Int4"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:24
#: 1cb55cd40b3c484d8213c15375b2ad68
msgid "81.2"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:24
#: 32b889d9ef014f2ab6be6881e20d40ae
msgid "80.8"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:24
#: ../../Qwen/source/getting_started/quantization_benchmark.rst:26
#: ba86de9eb27b40e0ba6a57580aed89c3 eed2e99c0edc426e81ec24e961fe971e
msgid "83.9"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:24
#: ee3a3132082048d5b79721fa84f6f816
msgid "78.9"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:26
#: ../../Qwen/source/getting_started/quantization_benchmark.rst:34
#: ../../Qwen/source/getting_started/quantization_benchmark.rst:42
#: ../../Qwen/source/getting_started/quantization_benchmark.rst:50
#: 632f832fc1f249fa92764538b698550d 8c7ccf4f75f44b27bb1b5aac544836cb
#: b473937c2be94c3490483bb5a820e2fe bc1abd77dd27412992d21bda1831a2a8
msgid "AWQ"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:26
#: 2711a3f907224e51ba30818b2e730a30
msgid "80.4"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:26
#: ca9624c0258b425ba53f024b086c173a
msgid "80.5"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:26
#: 2f4b57d4394c4cb187407145ce8d5f1e
msgid "76.9"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:28
#: 48cc75ed7bf04778b327c7b03d418e37
msgid "Qwen2-7B-Instruct"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:28
#: 75182905b74a41099ff859fb86752e99
msgid "66.9"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:28
#: 80cda712e9dc482fac24952d3bb27b28
msgid "70.5"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:28
#: 0701d66bc3084aef8937e4b687705f37
msgid "77.2"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:28
#: 8efb5c133644420c808dfd78f8fcde2f
msgid "53.1"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:30
#: 2076e02516bd4ff1856bc12a8d6bd320
msgid "66.2"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:30
#: 588f4ad13845491d9589ea094265d532
msgid "69.1"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:30
#: 0c79963a231a402eb6db1671e851be38
msgid "76.7"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:30
#: 5d525163672f456289990489459466ae
msgid "52.9"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:32
#: ../../Qwen/source/getting_started/quantization_benchmark.rst:34
#: 9283ca6491194b59a5edf57228f9b5af a4123c0691a442f6850ae25615c108af
msgid "64.1"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:32
#: 9e7ffb49aac34129894b0582c0d8aba1
msgid "67.8"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:32
#: 7c2fc310e5764b7fbf6034ffd3a5d26d
msgid "75.2"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:32
#: 33e6b6e590a64c08adccf0bb161c1046
msgid "49.4"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:34
#: b3cbe7665bdf4f4388f015fb6606540e
msgid "67.4"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:34
#: a47d3b52e80249f986c4339b9d3fff10
msgid "73.6"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:34
#: d76543cff2df434185fbe51712024679
msgid "51.4"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:36
#: cee2c965036d41c6a93ffbf9a9788e4b
msgid "Qwen2-1.5B-Instruct"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:36
#: 8c9d1cd8fb5a4d75b85d0edcb9ed69df
msgid "48.4"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:36
#: f5e05b0942a24e2b9cac753932ad51c4
msgid "52.4"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:36
#: c6f81ec529004598aa14c55228ff9538
msgid "63.8"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:36
#: 5b2b4092d04f4d02a56bd0df5807e2c5
msgid "29.0"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:38
#: 08d2bf82e83f4a889d622c72c1e1b3b2
msgid "48.1"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:38
#: 3d8ea738153f467ba55d50e6bf0f84c0
msgid "53.0"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:38
#: 8755d6c4c1e64cd38122f08a92bd90ca
msgid "62.5"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:38
#: 1c403dbb3692472a88706cb4b4a1f0f3
msgid "28.8"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:40
#: f3f43ea77edc4ff0969e2466e6fe13e1
msgid "45.0"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:40
#: 9d070c4b9f3e4fceb27b29ecdf90eb41
msgid "50.7"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:40
#: 24ff991704c440deb34b92512f89c371
msgid "57.4"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:40
#: b4645b7317a44cb795fc4190149dd0e0
msgid "27.0"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:42
#: eeee44d1d65647569999de94e72c00cb
msgid "46.5"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:42
#: 41630bee9142494c801083cd5d213dc0
msgid "51.6"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:42
#: 762395735fb34bccbc4d057968bbfbf1
msgid "58.1"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:42
#: f5915835bcb24051bebed452fc398728
msgid "29.9"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:44
#: 39108e2a66444ca780a720f115251308
msgid "Qwen2-0.5B-Instruct"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:44
#: ../../Qwen/source/getting_started/quantization_benchmark.rst:50
#: 2795adace57c401cb8bacc00082dfd53 a59271d53e434d17a8a0a19529158f2c
msgid "34.4"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:44
#: c93982789e4e453eb5a02d64f02cb74f
msgid "37.9"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:44
#: 213dfd43b2254a2caec1d4b1d231ed55
msgid "45.2"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:44
#: 11de22e2a04a4c04b0b91d09d028b853
msgid "20.0"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:46
#: 84b6570bcc8d4c6598336d5bc9b9d36a
msgid "32.6"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:46
#: b79e88232d114f43a179dcc5b0477c97
msgid "35.6"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:46
#: 1166b675e1e64e18a82c3219f321e248
msgid "43.9"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:46
#: fdf340d39b074778b55d36f477f8dc0a
msgid "18.1"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:48
#: ed930e1b13dd4c5caf80b2a180a1bcc3
msgid "29.7"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:48
#: c3d5617389634f7e96c66b4f869379a9
msgid "33.0"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:48
#: 4573b471c48d4028ad6fb378e75f40aa
msgid "39.2"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:48
#: c867c42e916f493b9715b1adf656ddcb
msgid "16.8"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:50
#: 20d4c89c335648bb93f07ebfb8ce9fce
msgid "31.1"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:50
#: 25400aeaf79d49cb914ffa5ff26bfe03
msgid "42.1"
msgstr ""

#: ../../Qwen/source/getting_started/quantization_benchmark.rst:50
#: d15e246b65b0427d970b78deffd8c2bc
msgid "16.7"
msgstr ""



================================================
FILE: docs/locales/zh_CN/LC_MESSAGES/getting_started/quickstart.po
================================================
# Copyright (C) 2024, Qwen Team, Alibaba Group.
# This file is distributed under the same license as the Qwen package.
#
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-06-13 16:36+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.16.0\n"

#: ../../source/getting_started/quickstart.md:1
#: a99b6a1db1374218a20b06bcd0c57957
msgid "Quickstart"
msgstr "快速开始"

#: ../../source/getting_started/quickstart.md:3
#: 1da6c3f04eb24db8b697e094163096a1
msgid "This guide helps you quickly start using Qwen3.  We provide examples of [Hugging Face Transformers](https://github.com/huggingface/transformers) as well as [ModelScope](https://github.com/modelscope/modelscope), and [vLLM](https://github.com/vllm-project/vllm) for deployment."
msgstr "本指南帮助您快速上手 Qwen3 的使用，并提供了如下示例： [Hugging Face Transformers](https://github.com/huggingface/transformers) 以及 [ModelScope](https://github.com/modelscope/modelscope) 和 [vLLM](https://github.com/vllm-project/vllm) 在部署时的应用实例。"

#: ../../source/getting_started/quickstart.md:6
#: 11c38e7141f941efb448e7099935b8a9
msgid "You can find Qwen3 models in [the Qwen3 collection](https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f) at Hugging Face Hub and [the Qwen3 collection](https://www.modelscope.cn/collections/Qwen3-9743180bdc6b48) at ModelScope."
msgstr "你可以在 Hugging Face Hub 的 [Qwen3 collection](https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f) 或 ModelScope 的 [Qwen3 collection](https://www.modelscope.cn/collections/Qwen3-9743180bdc6b48) 中寻找 Qwen3 模型。"

#: ../../source/getting_started/quickstart.md:8
#: 842c24eb7d30496baf9025af21ca1ed0
msgid "Transformers"
msgstr "Transformers"

#: ../../source/getting_started/quickstart.md:10
#: d0c61f87b0b347ae91e115ba60ebd46e
msgid "To get a quick start with Qwen3, you can try the inference with `transformers` first. Make sure that you have installed `transformers>=4.51.0`. We advise you to use Python 3.10 or higher, and PyTorch 2.6 or higher."
msgstr "要快速上手 Qwen3 ，我们建议您首先尝试使用 `transformers` 进行推理。请确保已安装了 `transformers>=4.51.0` 版本。我们建议您使用 Python 3.10 或以上版本， PyTorch 2.6 或以上版本。"

#: ../../source/getting_started/quickstart.md:14
#: a7fbb3d015b4440ca14ba00821f84fb0
msgid "The following is a very simple code snippet showing how to run Qwen3-8B:"
msgstr "以下是一个非常简单的代码片段示例，展示如何运行 Qwen3-8B："

#: ../../source/getting_started/quickstart.md:63
#: b55178516d31433c9ead5287e9abd3b4
msgid "Qwen3 will think before respond, similar to QwQ models. This means the model will use its reasoning abilities to enhance the quality of generated responses. The model will first generate thinking content wrapped in a `<think>...</think>` block, followed by the final response."
msgstr "Qwen3 将在实际回复前思考，与 QwQ 模型类似。这意味着模型将运用其推理能力来提升生成回复的质量。模型会首先生成包含在 `<think>...</think>` 块中的思考内容，随后给出最终回复。"

#: ../../source/getting_started/quickstart.md:67
#: e31fa076c2264d9fa2ae5d8516b7f4a7
msgid "Hard Switch: To strictly disable the model's thinking behavior, aligning its functionality with the previous Qwen2.5-Instruct models, you can set `enable_thinking=False` when formatting the text."
msgstr "硬开关：为了严格禁用模型的思考行为，使其功能与之前的Qwen2.5-Instruct模型保持一致，您可以在格式化文本时设置`enable_thinking=False`。"

#: ../../source/getting_started/quickstart.md:77
#: a6c11c7147e54cf9a86d65b4f80840c3
msgid "It can be particularly useful in scenarios where disabling thinking is essential for enhancing efficiency."
msgstr "在某些需要通过禁用思考来提升效率的场景中，这一功能尤其有用。"

#: ../../source/getting_started/quickstart.md:79
#: 57bd7b66fe3b4dbfabe7439dc67b7d5f
msgid "Soft Switch: Qwen3 also understands the user's instruction on its thinking behavior, in particular, the soft switch `/think` and `/no_think`. You can add them to user prompts or system messages to switch the model's thinking mode from turn to turn.  The model will follow the most recent instruction in multi-turn conversations."
msgstr "软开关：Qwen3 还能够理解用户对其思考行为的指令，特别是软开关 `/think` 和 `/no_think`。您可以将这些指令添加到用户 (user) 或系统 (system) 消息中，以在对话轮次之间灵活切换模型的思考模式。在多轮对话中，模型将遵循最近的指令。"

#: ../../source/getting_started/quickstart.md:85
#: 73fa1ee92c7b4a71a2724aedb665dd1b
msgid "For thinking mode, use Temperature=0.6, TopP=0.95, TopK=20, and MinP=0 (the default setting in `generation_config.json`). DO NOT use greedy decoding, as it can lead to performance degradation and endless repetitions.  For more detailed guidance, please refer to the Best Practices section."
msgstr "对于思考模式，使用 Temperature=0.6，TopP=0.95，TopK=20，以及 MinP=0（`generation_config.json` 中的默认设置）。不要使用贪婪解码，因为它可能导致性能下降和无尽的重复。更多详细指导，请参阅最佳实践部分。"

#: ../../source/getting_started/quickstart.md:89
#: 4ce3b1cf5e1349628a66c101432fd748
msgid "For non-thinking mode, we suggest using Temperature=0.7, TopP=0.8, TopK=20, and MinP=0."
msgstr "对于非思考模式，我们建议使用 Temperature=0.7，TopP=0.8，TopK=20，以及 MinP=0。"

#: ../../source/getting_started/quickstart.md:93
#: 34d138d1ca8c4ecab4002b354cfe64d2
msgid "ModelScope"
msgstr "魔搭 (ModelScope)"

#: ../../source/getting_started/quickstart.md:95
#: 0c25fecd5e42412a80214fbc35b08226
msgid "To tackle with downloading issues, we advise you to try [ModelScope](https://github.com/modelscope/modelscope). Before starting, you need to install `modelscope` with `pip`."
msgstr "为了解决下载问题，我们建议您尝试从 [ModelScope](https://github.com/modelscope/modelscope) 进行下载。开始之前，需要使用 `pip` 安装 `modelscope` 。"

#: ../../source/getting_started/quickstart.md:98
#: d8ccf4eafeb849ae8bd49d9fb2281c60
msgid "`modelscope` adopts a programmatic interface similar (but not identical) to `transformers`. For basic usage, you can simply change the first line of code above to the following:"
msgstr "`modelscope` 采用了与 `transformers` 类似（但不完全一致）的编程接口。对于基础使用，仅需将上面代码第一行做如下修改："

#: ../../source/getting_started/quickstart.md:105
#: f17c53400572487ca066135facd711bb
msgid "For more information, please refer to [the documentation of `modelscope`](https://www.modelscope.cn/docs)."
msgstr "欲获取更多信息，请参考 [`modelscope` 文档](https://www.modelscope.cn/docs)。"

#: ../../source/getting_started/quickstart.md:107
#: 6e3028c50b2146bd932d168662e57620
msgid "OpenAI API Compatibility"
msgstr ""

#: ../../source/getting_started/quickstart.md:109
#: 046b0833e5b744cda72d7a7ef5672cc2
msgid "You can serve Qwen3 via OpenAI-compatible APIs using frameworks such as vLLM, SGLang, and interact with the API using common HTTP clients or the OpenAI SDKs."
msgstr ""

#: ../../source/getting_started/quickstart.md:112
#: 1542f3cbe7ba4adab50dfc85da110c36
msgid "Here we take Qwen3-8B as an example to start the API:"
msgstr ""

#: ../../source/getting_started/quickstart.md:114
#: 4bfa52cf82914d73b0fcbbceafa4ff8a
msgid "SGLang (`sglang>=0.4.6.post1` is required):"
msgstr ""

#: ../../source/getting_started/quickstart.md:120
#: cc7a9ae4d4fe4b86be41b376d7334024
msgid "vLLM (`vllm>=0.8.5` is recommended):"
msgstr ""

#: ../../source/getting_started/quickstart.md:126
#: 7a690ab7e23f4505a355abd50e647101
msgid "Then, you can use the [create chat interface](https://platform.openai.com/docs/api-reference/chat/completions/create) to communicate with Qwen:"
msgstr "然后，可以使用 [\"create chat\" interface](https://platform.openai.com/docs/api-reference/chat/completions/create) 来与 Qwen 进行交流："

#: ../../source/getting_started/quickstart.md 9cd82768a97142acac1c2c63a05e1ad3
msgid "curl"
msgstr ""

#: ../../source/getting_started/quickstart.md 7d220176fa6b4622ae74431873d49396
msgid "Python"
msgstr ""

#: ../../source/getting_started/quickstart.md:146
#: f55afdfc6fa54f468b1b554fab082263
msgid "You can use the API client with the `openai` Python SDK as shown below:"
msgstr "您可以按照下面所示的方式，使用 `openai` Python SDK中的客户端："

#: ../../source/getting_started/quickstart.md:175
#: d0370e64550a4c51b6146ad7dfb52f97
msgid "While the soft switch is always available, the hard switch is also available in the API through the following configuration to the API call. For more usage, please refer to our document on [SGLang](../deployment/sglang) and [vLLM](../deployment/vllm)."
msgstr "虽然软开关始终可用，但硬开关也可以通过以下 API 调用配置在 API 中使用。更多用法，请参阅我们关于 [SGLang](../deployment/sglang) 和 [vLLM](../deployment/vllm) 的文档。"

#: ../../source/getting_started/quickstart.md:178
#: e48c576eba5e4c5db9ef0a48882e18c2
msgid "Thinking Budget"
msgstr "思考预算"

#: ../../source/getting_started/quickstart.md:180
#: cc5be59588e34164b7f2e84a7d8b82c0
msgid "Qwen3 supports the configuration of thinking budget. It is achieved by ending the thinking process once the budget is reached and guiding the model to generate the \"summary\" with an early-stopping prompt."
msgstr "Qwen3 支持配置思考预算。其实现方式是，一旦达到预算，便结束思考过程，并通过提前停止提示引导模型生成“总结”。"

#: ../../source/getting_started/quickstart.md:183
#: 8a2760351bc04093adc5017b58ec4981
msgid "Since this feature involves customization specific to each model, it is currently not available in the open-source frameworks and only implemented by [the Alibaba Cloud Model Studio API](https://www.alibabacloud.com/help/en/model-studio/deep-thinking#6f0633b9cdts1)."
msgstr "由于此功能涉及针对模型的定制，目前在开源框架中不可用，仅由[阿里云百炼API](https://bailian.console.aliyun.com/?tab=doc#/doc/?type=model&url=https%3A%2F%2Fhelp.aliyun.com%2Fdocument_detail%2F2870973.html&renderType=iframe)实现。"

#: ../../source/getting_started/quickstart.md:185
#: 01acc4a8caa443dc8da81862d2e7d6bb
msgid "However, with existing open-source frameworks, one can generate twice to implement this feature as follows:"
msgstr "然而，利用现有的开源框架，可以通过两次生成来实现此功能，具体如下："

#: ../../source/getting_started/quickstart.md:186
#: 677f5e6ad5b1416aa4bbadff3e6537a1
msgid "For the first time, generate tokens up to the thinking budget and check if the thinking process is finished. If the thinking process is not finished, append the early-stopping prompt."
msgstr "第一次生成时，最多生成不超过思考预算的 token 数量，并检查思考过程是否完成。如果思考过程未完成，则追加提前停止提示。"

#: ../../source/getting_started/quickstart.md:187
#: 479c92c1bc9d4a74b18d5c175d1e6cda
msgid "For the second time, continue generation until the end of the content or the upper length limit is fulfilled."
msgstr "第二次生成时，继续生成直到内容结束或达到长度上限。"

#: ../../source/getting_started/quickstart.md:189
#: dd7834e30f9344ddad21792279ad4732
msgid "The following snippet shows the implementation with Hugging Face Transformers:"
msgstr "以下代码片段展示了使用Hugging Face Transformers的实现："

#: ../../source/getting_started/quickstart.md:262
#: 85f448fb112e4742bc5d4bd05979f30d
msgid "You should see the output in the console like the following"
msgstr "您应该会在控制台中看到类似以下的输出："

#: ../../source/getting_started/quickstart.md:274
#: 4fc4ae8cdbb54c8b97b2da717d61e42e
msgid "For purpose of demonstration only, `thinking_budget` is set to 16. However, `thinking_budget` should not be set to that low in practice. We recommend tuning `thinking_budget` based on the latency users can accept and setting it higher than 1024 for meaningful improvements across tasks."
msgstr "出于示例目的，`thinking_budget` 被设置为 16。然而，在实际应用中不应将其设置得如此低。我们建议根据用户可接受的延迟调整 `thinking_budget`，并将其设置为高于 1024，以在各项任务中获得有意义的改进。"

#: ../../source/getting_started/quickstart.md:278
#: f5b4018640c24a3cb6dd1e958f2860d9
msgid "If thinking is not desired at all, developers should make use of the hard switch instead."
msgstr "如果完全不需要思考，开发者应改用硬开关。"

#: ../../source/getting_started/quickstart.md:281
#: a6bbbe9da06c43c09cd3756e075e6103
msgid "Next Step"
msgstr "下一步"

#: ../../source/getting_started/quickstart.md:283
#: 40f5270a92bf490ab4fa0d6af2761044
msgid "Now, you can have fun with Qwen3 models.  Would love to know more about its usage?  Feel free to check other documents in this documentation."
msgstr "现在，您可以尽情探索 Qwen3 模型的各种用途。若想了解更多，请随时查阅本文档中的其他内容。"


================================================
FILE: docs/locales/zh_CN/LC_MESSAGES/getting_started/speed_benchmark.po
================================================
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2024, Qwen Team
# This file is distributed under the same license as the Qwen package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2024.
#
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-05-20 17:08+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.16.0\n"

#: ../../source/getting_started/speed_benchmark.md:1
#: d7da8abe501c46e99fee28d56435b08b
msgid "Speed Benchmark"
msgstr "效率评估"

#: ../../source/getting_started/speed_benchmark.md:3
#: 478eb0d956db47ccafda9ec05e9c7253
msgid "We report the speed performance of bfloat16 models and quantized models (including FP8, GPTQ, AWQ) of the Qwen3 series.  Specifically, we report the inference speed (tokens/s) as well as memory footprint (GB) under different context lengths."
msgstr "本部分介绍 Qwen3 系列 bfloat16 模型和量化模型（包括 FP8、GPTQ、AWQ）的效率测试结果，具体包括不同上下文长度下的推理速度 (tokens/s) 与显存占用 (GB)。"

#: ../../source/getting_started/speed_benchmark.md:6
#: c3e1ab9d5f8b4a54bff5d5a56ea6a2bb
msgid "Environments"
msgstr "环境配置"

#: ../../source/getting_started/speed_benchmark.md:8
#: 6b7764c599eb4693be742f161daf5cc4
msgid "Hugging Face Transformers"
msgstr ""

#: ../../source/getting_started/speed_benchmark.md:10
#: ../../source/getting_started/speed_benchmark.md:25
#: aa80ee8b1abd4be1b0a9a88cff496938 c43bf971a253469c832f5e874eda86a9
msgid "**Hardware**:"
msgstr "**硬件**:"

#: ../../source/getting_started/speed_benchmark.md:11
#: ../../source/getting_started/speed_benchmark.md:26
#: 471dbb8bd26c41a58f485dabf6643aa0
msgid "NVIDIA H20 96GB"
msgstr ""

#: ../../source/getting_started/speed_benchmark.md:12
#: 04347c8d74a44a18af8fb11cbf47bb42
msgid "**Software for Non-AutoAWQ**:"
msgstr "**非AutoAWQ的软件环境**:"

#: ../../source/getting_started/speed_benchmark.md:13
#: 5bf39b332ff04657838c29c627a0e3a6
msgid "PyTorch 2.6.0"
msgstr ""

#: ../../source/getting_started/speed_benchmark.md:14
#: e1ca0b138a5248a59f6fb59331ff09c1
msgid "Flash Attention 2.7.4"
msgstr ""

#: ../../source/getting_started/speed_benchmark.md:15
#: ../../source/getting_started/speed_benchmark.md:19
#: ../../source/getting_started/speed_benchmark.md:29
#: c763f81c0b2443d594f7b7c6c1cacce0
msgid "Transformers 4.51.3"
msgstr ""

#: ../../source/getting_started/speed_benchmark.md:16
#: 07c9d4a46b1446d9a334ad590b23ae8c
msgid "GPTQModel 2.2.0+cu128torch2.6"
msgstr ""

#: ../../source/getting_started/speed_benchmark.md:17
#: a13e8f4aa82f4693af93c20cda549afe
msgid "**Software for AutoAWQ**:"
msgstr "**AutoAWQ的软件环境**:"

#: ../../source/getting_started/speed_benchmark.md:18
#: ../../source/getting_started/speed_benchmark.md:28
#: 14b90e1a69ce40f6a1edc3eb0fbf92b3
msgid "PyTorch 2.6.0+cu124"
msgstr ""

#: ../../source/getting_started/speed_benchmark.md:20
#: c8709c79343448838e2e6423ae888b6e
msgid "AutoAWQ 0.2.9"
msgstr ""

#: ../../source/getting_started/speed_benchmark.md:21
#: c8bd3eae4c8e497197bd3da6e79ded0a
msgid "AutoAWQ_kernels 0.0.9"
msgstr ""

#: ../../source/getting_started/speed_benchmark.md:24
#: cc0dad54e88d4e13857f3a45888b6030
msgid "SGLang"
msgstr ""

#: ../../source/getting_started/speed_benchmark.md:27
#: 931979d5babc4459940b65cc29f091b3
msgid "**Software**:"
msgstr "**软件环境**:"

#: ../../source/getting_started/speed_benchmark.md:30
#: 23713b47351b4b82949d982019a725b1
msgid "SGLang 0.4.6.post1"
msgstr ""

#: ../../source/getting_started/speed_benchmark.md:31
#: 8a1e4b407a5e46748b9fcbd29b7624ef
msgid "SGL-kernel 0.1.0"
msgstr ""

#: ../../source/getting_started/speed_benchmark.md:32
#: bc8a5ce98be34cd3987b015f51b607a1
msgid "vLLM 0.7.2 (Required by SGLang for AWQ quantization)"
msgstr "vLLM 0.7.2 (被SGLang AWQ量化依赖)"

#: ../../source/getting_started/speed_benchmark.md:34
#: 1b5176a8dd9647738f81c3f726f9e9dd
msgid "Notes"
msgstr "备注"

#: ../../source/getting_started/speed_benchmark.md:36
#: 5b67ede5b69845158d7de45576b5f511
msgid "**Inference Speed (tokens/s)** is calculated as:"
msgstr "**推理速度（tokens/s）** 的计算公式为："

#: ../../source/getting_started/speed_benchmark.md:38
#: 8d9cad9e886e4ba09e3a9907eecdc85e
msgid "\\text{Speed} = \\frac{\\text{tokens}_{\\text{prompt}} + \\text{tokens}_{\\text{generation}}}{\\text{time}}"
msgstr ""

#: ../../source/getting_started/speed_benchmark.md:42
#: 5e23850b08f54a5b93ca312f77ad16cb
msgid "We use a **batch size of 1** and the **minimum number of GPUs** possible for evaluation."
msgstr "评测时使用 **batch size 为 1**，并使用**尽可能少的 GPU 数量**。"

#: ../../source/getting_started/speed_benchmark.md:44
#: 25f5f831928e428a857e76cb66be39dc
msgid "We test the **speed and memory usage** when generating **2048 tokens**, with input lengths of `1`, `6144`, `14336`, `30720`, `63488`, and `129024` tok

SYMBOL INDEX (45 symbols across 10 files)

FILE: docs/source/_static/design-tabs.js
  function create_key (line 18) | function create_key(el) {
  function ready (line 29) | function ready() {
  function onSDLabelClick (line 89) | function onSDLabelClick() {

FILE: docs/source/conf.py
  class MockedClassDocumenter (line 109) | class MockedClassDocumenter(autodoc.ClassDocumenter):
    method add_line (line 112) | def add_line(self, line: str, source: str, *lineno: int) -> None:

FILE: eval/eval/arc_agi_1.py
  function parse_model_output (line 6) | def parse_model_output(output):
  function solution_score (line 30) | def solution_score(predicted, ground_truth):
  function compute_scores_arc_agi_1 (line 36) | def compute_scores_arc_agi_1(jobs, cache_path):
  function save_cache (line 56) | def save_cache(jobs, cache_path):

FILE: eval/eval/eval.py
  function get_after_think (line 12) | def get_after_think(text):
  function main (line 20) | def main():

FILE: eval/generate_api_answers/infer_multithread.py
  function count_completed_samples (line 16) | def count_completed_samples(output_file):
  function process_item (line 31) | def process_item(
  function main (line 67) | def main():

FILE: eval/generate_api_answers/utils_vllm.py
  class ClientError (line 16) | class ClientError(RuntimeError):
  function get_content (line 20) | def get_content(

FILE: examples/demo/cli_demo.py
  function _setup_readline (line 60) | def _setup_readline():
  function _load_model_tokenizer (line 83) | def _load_model_tokenizer(args):
  function _gc (line 105) | def _gc():
  function _clear_screen (line 113) | def _clear_screen():
  function _print_history (line 120) | def _print_history(history):
  function _get_input (line 129) | def _get_input() -> str:
  function _chat_stream (line 143) | def _chat_stream(model, tokenizer, query, history):
  function main (line 169) | def main():

FILE: examples/demo/web_demo.py
  function _get_args (line 18) | def _get_args():
  function _load_model_tokenizer (line 54) | def _load_model_tokenizer(args):
  function _chat_stream (line 76) | def _chat_stream(model, tokenizer, query, history):
  function _gc (line 102) | def _gc():
  function _launch_demo (line 110) | def _launch_demo(args, model, tokenizer):
  function main (line 197) | def main():

FILE: examples/speed-benchmark/speed_benchmark_transformers.py
  class SpeedBenchmarkTransformers (line 19) | class SpeedBenchmarkTransformers:
    method __init__ (line 30) | def __init__(self, model_id_or_path, use_modelscope: bool = True, outp...
    method run (line 59) | def run(self, context_length: int, generate_length: int) -> str:
    method save_result (line 125) | def save_result(data: dict, out_file: str) -> None:
  function main (line 135) | def main():

FILE: examples/speed-benchmark/speed_benchmark_vllm.py
  class SpeedBenchmarkVllm (line 32) | class SpeedBenchmarkVllm:
    method __init__ (line 38) | def __init__(self, experiment_config: dict, sampling_params: SamplingP...
    method _reprs (line 70) | def _reprs(self, o):
    method create_query (line 73) | def create_query(self, length: int, limited_size: int = 96) -> Tuple[s...
    method run_infer (line 91) | def run_infer(self, query: str):
    method run (line 101) | def run(self):
    method collect_statistics (line 154) | def collect_statistics(model_id_or_path, data, out_length, in_length, ...
    method print_table (line 172) | def print_table(results):
    method save_result (line 177) | def save_result(data: dict, out_file: str) -> None:
  function main (line 187) | def main():

Condensed preview — 101 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (3,782K chars).

[
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.yml",
    "chars": 3739,
    "preview": "name: 🐞 Bug Report\ndescription: Something unexpected happened, errors or badcases\nbody:\n  - type: markdown\n    attribute"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/config.yml",
    "chars": 500,
    "preview": "blank_issues_enabled: false\ncontact_links:\n  - name: 🚀 Feature Request\n    url: https://github.com/QwenLM/Qwen3/discussi"
  },
  {
    "path": ".github/workflows/inactive.yml",
    "chars": 1484,
    "preview": "name: Close and lock inactive threads\r\n\r\non:\r\n  schedule:\r\n    - cron: \"0 8 * * *\"\r\n  workflow_dispatch:\r\n\r\npermissions:"
  },
  {
    "path": ".gitignore",
    "chars": 150,
    "preview": "# Sphinx documentation\ndocs/_build/\ndocs/build/\ndocs/**/*.mo\n.vscode\n.idea\n\n# Byte-compiled / optimized / DLL files\n__py"
  },
  {
    "path": ".readthedocs.yaml",
    "chars": 465,
    "preview": "# Read the Docs configuration file\n# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details\n\nversion:"
  },
  {
    "path": "README.md",
    "chars": 24685,
    "preview": "# Qwen3\n\n<p align=\"center\">\n    <img src=\"https://qianwen-res.oss-accelerate-overseas.aliyuncs.com/logo_qwen3.png\" width"
  },
  {
    "path": "docker/Dockerfile-cu121",
    "chars": 1828,
    "preview": "ARG CUDA_VERSION=12.1.0\nARG from=nvidia/cuda:${CUDA_VERSION}-cudnn8-devel-ubuntu20.04\n\nFROM ${from} as base\n\nRUN <<EOF\na"
  },
  {
    "path": "docker/docker_cli_demo.sh",
    "chars": 1375,
    "preview": "#!/usr/bin/env bash\n#\n# This script will automatically pull docker image from DockerHub, and start a container to run th"
  },
  {
    "path": "docker/docker_web_demo.sh",
    "chars": 1812,
    "preview": "#!/usr/bin/env bash\n#\n# This script will automatically pull docker image from DockerHub, and start a daemon container to"
  },
  {
    "path": "docs/Makefile",
    "chars": 637,
    "preview": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line, and also\n# from the "
  },
  {
    "path": "docs/README.md",
    "chars": 1672,
    "preview": "# Qwen Documentation\r\n\r\nThis is the source of the documentation at <https://qwen.readthedocs.io>.\r\n\r\n## Quick Start\r\n\r\nW"
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/deployment/dstack.po",
    "chars": 7067,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2024, Qwen Team\n# This file is distributed under the same license as the Qwen "
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/deployment/openllm.po",
    "chars": 5162,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2024, Qwen Team\n# This file is distributed under the same license as the Qwen "
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/deployment/sglang.po",
    "chars": 12564,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2024, Qwen Team\n# This file is distributed under the same license as the Qwen "
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/deployment/skypilot.po",
    "chars": 9535,
    "preview": "# Copyright (C) 2024, Qwen Team, Alibaba Group.\n# This file is distributed under the same license as the Qwen package.\n#"
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/deployment/tgi.po",
    "chars": 10591,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2024, Qwen Team\n# This file is distributed under the same license as the Qwen "
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/deployment/vllm.po",
    "chars": 17100,
    "preview": "# Copyright (C) 2024, Qwen Team, Alibaba Group.\n# This file is distributed under the same license as the Qwen package.\n#"
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/framework/Langchain.po",
    "chars": 2683,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2024, Qwen Team\n# This file is distributed under the same license as the Qwen "
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/framework/LlamaIndex.po",
    "chars": 4422,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2024, Qwen Team\n# This file is distributed under the same license as the Qwen "
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/framework/function_call.po",
    "chars": 25731,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2024, Qwen Team\n# This file is distributed under the same license as the Qwen "
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/framework/qwen_agent.po",
    "chars": 3876,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2024, Qwen Team\n# This file is distributed under the same license as the Qwen "
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/getting_started/concepts.po",
    "chars": 29694,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2024, Qwen Team\n# This file is distributed under the same license as the Qwen "
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/getting_started/quantization_benchmark.po",
    "chars": 12893,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2024, Qwen Team\n# This file is distributed under the same license as the Qwen "
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/getting_started/quickstart.po",
    "chars": 11774,
    "preview": "# Copyright (C) 2024, Qwen Team, Alibaba Group.\n# This file is distributed under the same license as the Qwen package.\n#"
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/getting_started/speed_benchmark.po",
    "chars": 24974,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2024, Qwen Team\n# This file is distributed under the same license as the Qwen "
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/getting_started/thinking_budget.po",
    "chars": 2374,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2024, Qwen Team\n# This file is distributed under the same license as the Qwen "
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/index.po",
    "chars": 7154,
    "preview": "# Copyright (C) 2024, Qwen Team, Alibaba Group.\n# This file is distributed under the same license as the Qwen package.\n#"
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/inference/transformers.po",
    "chars": 14146,
    "preview": "# Copyright (C) 2024, Qwen Team, Alibaba Group.\n# This file is distributed under the same license as the Qwen package.\n#"
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/quantization/awq.po",
    "chars": 6522,
    "preview": "# Copyright (C) 2024, Qwen Team, Alibaba Group.\n# This file is distributed under the same license as the Qwen package.\n#"
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/quantization/gptq.po",
    "chars": 13360,
    "preview": "# Copyright (C) 2024, Qwen Team, Alibaba Group.\n# This file is distributed under the same license as the Qwen package.\n#"
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/quantization/llama.cpp.po",
    "chars": 13719,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2024, Qwen Team\n# This file is distributed under the same license as the Qwen "
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/run_locally/llama.cpp.po",
    "chars": 41965,
    "preview": "# Copyright (C) 2024, Qwen Team, Alibaba Group.\n# This file is distributed under the same license as the Qwen package.\n#"
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/run_locally/mlx-lm.po",
    "chars": 3481,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2024, Qwen Team\n# This file is distributed under the same license as the Qwen "
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/run_locally/ollama.po",
    "chars": 4582,
    "preview": "# Copyright (C) 2024, Qwen Team, Alibaba Group.\n# This file is distributed under the same license as the Qwen package.\n#"
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/training/axolotl.po",
    "chars": 9694,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2024, Qwen Team\n# This file is distributed under the same license as the Qwen "
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/training/llama_factory.po",
    "chars": 6104,
    "preview": "# Copyright (C) 2024, Qwen Team, Alibaba Group.\n# This file is distributed under the same license as the Qwen package.\n#"
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/training/ms_swift.po",
    "chars": 17852,
    "preview": "# Copyright (C) 2024, Qwen Team\n# This file is distributed under the same license as the Qwen package.\n#\nmsgid \"\"\nmsgstr"
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/training/unsloth.po",
    "chars": 10406,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2024, Qwen Team\n# This file is distributed under the same license as the Qwen "
  },
  {
    "path": "docs/locales/zh_CN/LC_MESSAGES/training/verl.po",
    "chars": 6652,
    "preview": "# SOME DESCRIPTIVE TITLE.\n# Copyright (C) 2024, Qwen Team\n# This file is distributed under the same license as the Qwen "
  },
  {
    "path": "docs/make.bat",
    "chars": 769,
    "preview": "@ECHO OFF\n\npushd %~dp0\n\nREM Command file for Sphinx documentation\n\nif \"%SPHINXBUILD%\" == \"\" (\n\tset SPHINXBUILD=sphinx-bu"
  },
  {
    "path": "docs/requirements-docs.txt",
    "chars": 79,
    "preview": "furo\nmyst-parser==4.0.0\nsphinx<8,>4.5.0\nsphinx-copybutton\nsphinx-design>=0.6.0\n"
  },
  {
    "path": "docs/source/_static/css/custom.css",
    "chars": 1324,
    "preview": "html {\r\n    font-size: 16px;\r\n}\r\n\r\nh1 {\r\n    font-size: 1.75rem;\r\n    line-height: 2.5rem;\r\n}\r\n\r\nh2 {\r\n    font-size: 1."
  },
  {
    "path": "docs/source/_static/design-tabs.js",
    "chars": 3100,
    "preview": "// @ts-check\r\n\r\n// Extra JS capability for selected tabs to be synced\r\n// The selection is stored in local storage so th"
  },
  {
    "path": "docs/source/assets/qwen3_nonthinking.jinja",
    "chars": 4148,
    "preview": "{%- if tools %}\r\n    {{- '<|im_start|>system\\n' }}\r\n    {%- if messages[0].role == 'system' %}\r\n        {{- messages[0]."
  },
  {
    "path": "docs/source/conf.py",
    "chars": 3798,
    "preview": "# Configuration file for the Sphinx documentation builder.\n#\n# This file only contains a selection of the most common op"
  },
  {
    "path": "docs/source/deployment/dstack.rst",
    "chars": 5305,
    "preview": "dstack\n========\n\n`dstack <https://github.com/dstackai/dstack>`__ is an open-source alternative to Kubernetes and Slurm, "
  },
  {
    "path": "docs/source/deployment/openllm.rst",
    "chars": 4241,
    "preview": "OpenLLM\n=======\n\n.. attention:: \n    To be updated for Qwen3.\n\nOpenLLM allows developers to run Qwen2.5 models of differ"
  },
  {
    "path": "docs/source/deployment/sglang.md",
    "chars": 8703,
    "preview": "# SGLang\r\n\r\n[SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models and vi"
  },
  {
    "path": "docs/source/deployment/skypilot.rst",
    "chars": 7152,
    "preview": "SkyPilot\n========\n\n.. attention:: \n    To be updated for Qwen3.\n\nWhat is SkyPilot\n----------------\n\nSkyPilot is a framew"
  },
  {
    "path": "docs/source/deployment/tgi.rst",
    "chars": 8277,
    "preview": "TGI\n=====================\n\n.. attention:: \n    To be updated for Qwen3.\n\nHugging Face's Text Generation Inference (TGI) "
  },
  {
    "path": "docs/source/deployment/vllm.md",
    "chars": 13320,
    "preview": "# vLLM\r\n\r\nWe recommend you trying [vLLM](https://github.com/vllm-project/vllm) for your deployment of Qwen. \r\nIt is simp"
  },
  {
    "path": "docs/source/framework/Langchain.rst",
    "chars": 10657,
    "preview": "Langchain\n==========================\n\n.. attention:: \n    To be updated for Qwen3.\n\nThis guide helps you build a questio"
  },
  {
    "path": "docs/source/framework/LlamaIndex.rst",
    "chars": 5388,
    "preview": "LlamaIndex\n==========\n\n.. attention:: \n    To be updated for Qwen3.\n\nTo connect Qwen2.5 with external data, such as docu"
  },
  {
    "path": "docs/source/framework/function_call.md",
    "chars": 28969,
    "preview": "---\r\nmyst:\r\n  number_code_blocks: [\"python3\"]\r\n---\r\n\r\n# Function Calling\r\n\r\n\r\n## Preface\r\n\r\nFunction calling with large "
  },
  {
    "path": "docs/source/framework/qwen_agent.rst",
    "chars": 3936,
    "preview": "Qwen-Agent\n==========\n\n`Qwen-Agent <https://github.com/QwenLM/Qwen-Agent>`__ is a framework for\ndeveloping LLM applicati"
  },
  {
    "path": "docs/source/getting_started/concepts.md",
    "chars": 13079,
    "preview": "# Key Concepts\r\n\r\n## Qwen\r\n\r\nQwen (Chinese: 通义千问; pinyin: _Tongyi Qianwen_) is the large language model and large multim"
  },
  {
    "path": "docs/source/getting_started/quantization_benchmark.rst",
    "chars": 3009,
    "preview": "Performance of Quantized Models\n==================================\n\n.. attention:: \n    To be updated for Qwen3.\n\nThis s"
  },
  {
    "path": "docs/source/getting_started/quickstart.md",
    "chars": 21754,
    "preview": "# Quickstart\r\n\r\nThis guide helps you quickly start using Qwen3. \r\nWe provide examples of [Hugging Face Transformers](htt"
  },
  {
    "path": "docs/source/getting_started/speed_benchmark.md",
    "chars": 31150,
    "preview": "# Speed Benchmark\r\n\r\nWe report the speed performance of bfloat16 models and quantized models (including FP8, GPTQ, AWQ) "
  },
  {
    "path": "docs/source/getting_started/thinking_budget.md",
    "chars": 3812,
    "preview": "# Thinking budget\n\nThis example demonstrates the inference process with thinking budgets using Qwen3 series models. The "
  },
  {
    "path": "docs/source/index.rst",
    "chars": 4924,
    "preview": "Welcome to Qwen!\n================\n\n.. figure:: https://qianwen-res.oss-accelerate-overseas.aliyuncs.com/logo_qwen3.png\n "
  },
  {
    "path": "docs/source/inference/transformers.md",
    "chars": 11793,
    "preview": "# Transformers\r\n\r\nTransformers is a library of pretrained natural language processing for inference and training. \r\nDeve"
  },
  {
    "path": "docs/source/quantization/awq.md",
    "chars": 6102,
    "preview": "# AWQ\r\n\r\n:::{attention}\r\nTo be updated for Qwen3.\r\n:::\r\n\r\nFor quantized models, one of our recommendations is the usage "
  },
  {
    "path": "docs/source/quantization/gptq.md",
    "chars": 11207,
    "preview": "# GPTQ\r\n\r\n:::{attention}\r\nTo be updated for Qwen3.\r\n:::\r\n\r\n[GPTQ](https://arxiv.org/abs/2210.17323) is a quantization me"
  },
  {
    "path": "docs/source/quantization/llama.cpp.md",
    "chars": 7943,
    "preview": "# llama.cpp\r\n\r\nQuantization is a major topic for local inference of LLMs, as it reduces the memory footprint.\r\nUndoubtab"
  },
  {
    "path": "docs/source/run_locally/llama.cpp.md",
    "chars": 15415,
    "preview": "# llama.cpp\r\n\r\n[^GGUF]: GPT-Generated Unified Format\r\n\r\n:::{dropdown} llama.cpp as a C++ library\r\nBefore starting, let's"
  },
  {
    "path": "docs/source/run_locally/lmstudio.md",
    "chars": 3350,
    "preview": "# LM Studio\n\n[LM Studio](https://lmstudio.ai) is a powerful desktop application for experimenting & developing with loca"
  },
  {
    "path": "docs/source/run_locally/mlx-lm.md",
    "chars": 1845,
    "preview": "# MLX LM\r\n\r\n:::{attention}\r\nTo be updated for Qwen3.\r\n:::\r\n\r\n[mlx-lm](https://github.com/ml-explore/mlx-examples/tree/ma"
  },
  {
    "path": "docs/source/run_locally/ollama.md",
    "chars": 3523,
    "preview": "# Ollama\r\n\r\n:::{attention}\r\nTo be updated for Qwen3.\r\n:::\r\n\r\n[Ollama](https://ollama.com/) helps you run LLMs locally wi"
  },
  {
    "path": "docs/source/training/axolotl.md",
    "chars": 4102,
    "preview": "# Axolotl\r\n\r\nThis guide will help you get started with post-training (SFT, RLHF, RM, PRM) for Qwen3 / Qwen3_MOE using Ax"
  },
  {
    "path": "docs/source/training/llama_factory.md",
    "chars": 4990,
    "preview": "# LLaMA-Factory\r\n\r\n:::{attention}\r\nTo be updated for Qwen3.\r\n:::\r\n\r\nHere we provide a script for supervised finetuning Q"
  },
  {
    "path": "docs/source/training/ms_swift.md",
    "chars": 15183,
    "preview": "# MS-SWIFT\r\n\r\nModelScope SWIFT (**ms-swift**) is the large model and multimodal large model training and deployment fram"
  },
  {
    "path": "docs/source/training/unsloth.md",
    "chars": 4437,
    "preview": "# Unsloth\n\nThis guide will teach you how to easily train Qwen3 models with Unsloth. Unsloth simplifies local model train"
  },
  {
    "path": "docs/source/training/verl.md",
    "chars": 4816,
    "preview": "# verl\r\n\r\nverl is a flexible, efficient and production-ready RL training library for large language models (LLMs).\r\n\r\nve"
  },
  {
    "path": "eval/README.md",
    "chars": 3041,
    "preview": "This folder provides scripts to reproduce evaluation results across various benchmarks for the **Qwen** series of large "
  },
  {
    "path": "eval/configs/ARCAGI-Qwen3-235B-A22B-Instruct-2507.yaml",
    "chars": 830,
    "preview": "# Data from https://github.com/fchollet/ARC-AGI/tree/399030444e0ab0cc8b4e199870fb20b863846f34/data/evaluation\n# Prompt T"
  },
  {
    "path": "eval/data/arc_agi_1.jsonl",
    "chars": 2928514,
    "preview": "{\"prompt\": \"Here are the example input and output pairs from which you should learn the underlying rule to later predict"
  },
  {
    "path": "eval/eval/arc_agi_1.py",
    "chars": 2162,
    "preview": "import json\nimport re\nfrom collections import defaultdict\nimport numpy as np\n\ndef parse_model_output(output):\n    try:\n "
  },
  {
    "path": "eval/eval/eval.py",
    "chars": 2703,
    "preview": "import json\nimport argparse\nfrom tqdm import tqdm\nimport os\nimport yaml\n\nALL_TASKS = {}\n\nfrom arc_agi_1 import compute_s"
  },
  {
    "path": "eval/eval_res/ARCAGI-Qwen3-235B-A22B-Instruct-2507_eval_result.txt",
    "chars": 298,
    "preview": "\n--- Evaluation Configuration Information ---\nModel Output File Path: output/ARCAGI-Qwen3-235B-A22B-Instruct-2507.jsonl\n"
  },
  {
    "path": "eval/generate_api_answers/infer_multithread.py",
    "chars": 5670,
    "preview": "import json\nimport argparse\nfrom tqdm import tqdm\nimport copy\nimport concurrent.futures\nimport threading\nimport os\nimpor"
  },
  {
    "path": "eval/generate_api_answers/utils_vllm.py",
    "chars": 3067,
    "preview": "import os\nimport time\nimport random\nimport openai\nimport logging\nfrom packaging.version import parse as parse_version\n\nI"
  },
  {
    "path": "eval/requirements.txt",
    "chars": 78,
    "preview": "# common\nopenai>=0.28.1,<=1.65.5\npackaging\nnumpy\ntqdm\ndatasets==2.14.6\npyyaml\n"
  },
  {
    "path": "examples/README.md",
    "chars": 150,
    "preview": "# Examples\r\n\r\n> [!IMPORTANT]\r\n> The examples in this directory should be considered deprecated at the moment and they ar"
  },
  {
    "path": "examples/demo/cli_demo.py",
    "chars": 9204,
    "preview": "# Copyright (c) Alibaba Cloud.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the roo"
  },
  {
    "path": "examples/demo/web_demo.py",
    "chars": 6504,
    "preview": "# Copyright (c) Alibaba Cloud.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the roo"
  },
  {
    "path": "examples/gcu-support/README.md",
    "chars": 1747,
    "preview": "# Qwen2.5 推理\n\n## 1、配置运行环境\n\n**安装驱动**\n\n```\n# <version_id> 为软件包具体版本号。\nchmod +x TopsRider_i3x_<version_id>_deb_amd64.run\n./T"
  },
  {
    "path": "examples/gcu-support/gcu_demo.py",
    "chars": 1097,
    "preview": "try:\n    import torch_gcu # 导入 torch_gcu\n    from torch_gcu import transfer_to_gcu #  transfer_to_gcu\nexcept Exception a"
  },
  {
    "path": "examples/llama-factory/finetune-zh.md",
    "chars": 7152,
    "preview": "# 使用LLaMA-Factory微调Qwen模型\n\n## LLAMA-Factory简介\n[LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)是一个简单易用且高效的大模型训练框"
  },
  {
    "path": "examples/llama-factory/qwen2-7b-full-sft.yaml",
    "chars": 676,
    "preview": "### model\nmodel_name_or_path: Qwen/Qwen2-7B-Instruct\n\n### method\nstage: sft\ndo_train: true\nfinetuning_type: full\ndeepspe"
  },
  {
    "path": "examples/llama-factory/qwen2-7b-lora-sft.yaml",
    "chars": 713,
    "preview": "### model\nmodel_name_or_path: Qwen/Qwen2-7B-Instruct\n\n### method\nstage: sft\ndo_train: true\nfinetuning_type: lora\nlora_ta"
  },
  {
    "path": "examples/llama-factory/qwen2-7b-merge-lora.yaml",
    "chars": 327,
    "preview": "### Note: DO NOT use quantized model or quantization_bit when merging lora adapters\n\n### model\nmodel_name_or_path: Qwen/"
  },
  {
    "path": "examples/llama-factory/qwen2-7b-qlora-sft.yaml",
    "chars": 830,
    "preview": "### model\nmodel_name_or_path: Qwen/Qwen2-7B-Instruct\n\n### method\nstage: sft\ndo_train: true\nfinetuning_type: lora\nlora_ta"
  },
  {
    "path": "examples/speed-benchmark/README.md",
    "chars": 7123,
    "preview": "# Speed Benchmark\n\nThis document introduces the speed benchmark testing process for the Qwen2.5 series models (original "
  },
  {
    "path": "examples/speed-benchmark/README_zh.md",
    "chars": 5428,
    "preview": "# 效率评估\n\n本文介绍Qwen2.5系列模型（原始模型和量化模型）的效率测试流程，详细报告可参考 [Qwen2.5模型效率评估报告](https://qwen.readthedocs.io/en/latest/benchmark/spee"
  },
  {
    "path": "examples/speed-benchmark/requirements-perf-transformers.txt",
    "chars": 303,
    "preview": "# Note: install following requirements saparately\n# pip install torch==2.3.1\n# pip install git+https://github.com/AutoGP"
  },
  {
    "path": "examples/speed-benchmark/requirements-perf-vllm.txt",
    "chars": 64,
    "preview": "vllm==0.6.3.post1\ntorch==2.4.0\nmodelscope[framework]\naccelerate\n"
  },
  {
    "path": "examples/speed-benchmark/speed_benchmark_transformers.py",
    "chars": 7187,
    "preview": "# Copyright (c) Alibaba Cloud.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the roo"
  },
  {
    "path": "examples/speed-benchmark/speed_benchmark_vllm.py",
    "chars": 10458,
    "preview": "# Copyright (c) Alibaba Cloud.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the roo"
  }
]

// ... and 2 more files (download for full content)

About this extraction

This page contains the full source code of the QwenLM/Qwen3 GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 101 files (27.6 MB), approximately 920.6k tokens, and a symbol index with 45 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
