Full Code of QwenLM/Qwen-Agent for AI

Repository: QwenLM/Qwen-Agent
Branch: main
Commit: 31a4d36d1236
Files: 304
Total size: 4.7 MB

Directory structure:
gitextract_aif8221u/

├── .github/
│   └── workflows/
│       └── deploy-docs.yml
├── .gitignore
├── .pre-commit-config.yaml
├── LICENSE
├── MANIFEST.in
├── README.md
├── README_CN.md
├── benchmark/
│   ├── code_interpreter/
│   │   ├── README.md
│   │   ├── code_interpreter.py
│   │   ├── config.py
│   │   ├── inference_and_execute.py
│   │   ├── metrics/
│   │   │   ├── __init__.py
│   │   │   ├── code_execution.py
│   │   │   ├── gsm8k.py
│   │   │   └── visualization.py
│   │   ├── models/
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   ├── dashscope.py
│   │   │   ├── llm.py
│   │   │   └── qwen.py
│   │   ├── parser/
│   │   │   ├── __init__.py
│   │   │   ├── internlm_parser.py
│   │   │   └── react_parser.py
│   │   ├── prompt/
│   │   │   ├── __init__.py
│   │   │   ├── internlm_react.py
│   │   │   ├── llama_react.py
│   │   │   ├── qwen_react.py
│   │   │   └── react.py
│   │   ├── requirements.txt
│   │   └── utils/
│   │       ├── __init__.py
│   │       ├── code_utils.py
│   │       └── data_utils.py
│   └── deepplanning/
│       ├── README.md
│       ├── aggregate_results.py
│       ├── env.example
│       ├── models_config.json
│       ├── requirements.txt
│       ├── run_all.sh
│       ├── shoppingplanning/
│       │   ├── README.md
│       │   ├── agent/
│       │   │   ├── call_llm.py
│       │   │   ├── prompts.py
│       │   │   └── shopping_agent.py
│       │   ├── data/
│       │   │   ├── level_1_query_meta.json
│       │   │   ├── level_2_query_meta.json
│       │   │   └── level_3_query_meta.json
│       │   ├── evaluation/
│       │   │   ├── evaluation_pipeline.py
│       │   │   └── score_statistics.py
│       │   ├── run.py
│       │   ├── run.sh
│       │   └── tools/
│       │       ├── __init__.py
│       │       ├── add_coupon_to_cart.py
│       │       ├── add_product_to_cart.py
│       │       ├── base_shopping_tool.py
│       │       ├── calculate_transport_time_tool.py
│       │       ├── delete_coupon_from_cart.py
│       │       ├── delete_product_from_cart.py
│       │       ├── filter_by_applicable_coupons_tool.py
│       │       ├── filter_by_brand_tool.py
│       │       ├── filter_by_color_tool.py
│       │       ├── filter_by_range_tool.py
│       │       ├── filter_by_size_tool.py
│       │       ├── get_cart_info.py
│       │       ├── get_product_details_tool.py
│       │       ├── get_user_info.py
│       │       ├── search_products_tool.py
│       │       ├── shopping_tool_schema.json
│       │       └── sort_product_tool.py
│       └── travelplanning/
│           ├── README.md
│           ├── agent/
│           │   ├── __init__.py
│           │   ├── call_llm.py
│           │   ├── prompts.py
│           │   └── tools_fn_agent.py
│           ├── data/
│           │   ├── travelplanning_query_en.json
│           │   └── travelplanning_query_zh.json
│           ├── evaluation/
│           │   ├── __init__.py
│           │   ├── constraints_commonsense.py
│           │   ├── constraints_hard.py
│           │   ├── convert_report.py
│           │   ├── eval_converted.py
│           │   └── utils.py
│           ├── run.py
│           ├── run.sh
│           └── tools/
│               ├── __init__.py
│               ├── attraction_query_tool.py
│               ├── base_travel_tool.py
│               ├── flight_query_tool.py
│               ├── hotel_query_tool.py
│               ├── location_search_tool.py
│               ├── restaurant_query_tool.py
│               ├── roadroute_query_tool.py
│               ├── tool_schema.json
│               ├── tool_schema_en.json
│               ├── tool_schema_zh.json
│               └── train_query_tool.py
├── browser_qwen/
│   ├── background.js
│   ├── manifest.json
│   └── src/
│       ├── content.js
│       ├── popup.html
│       └── popup.js
├── browser_qwen.md
├── browser_qwen_cn.md
├── examples/
│   ├── __init__.py
│   ├── assistant_add_custom_tool.py
│   ├── assistant_audio.py
│   ├── assistant_mcp_sqlite_bot.py
│   ├── assistant_omni.py
│   ├── assistant_qwen3.5.py
│   ├── assistant_qwen3.py
│   ├── assistant_qwen3_coder.py
│   ├── assistant_qwen3vl.py
│   ├── assistant_qwq.py
│   ├── assistant_rag.py
│   ├── assistant_weather_bot.py
│   ├── function_calling.py
│   ├── function_calling_in_parallel.py
│   ├── gpt_mentions.py
│   ├── group_chat_chess.py
│   ├── group_chat_demo.py
│   ├── llm_quick_chat_oai.py
│   ├── llm_riddles.py
│   ├── llm_vl_mix_text.py
│   ├── long_dialogue.py
│   ├── multi_agent_router.py
│   ├── parallel_doc_qa.py
│   ├── qwen2vl_assistant_tooluse.py
│   ├── qwen2vl_assistant_video.py
│   ├── qwen2vl_function_calling.py
│   ├── react_data_analysis.py
│   ├── resource/
│   │   └── stock_prices.csv
│   ├── tir_math.py
│   ├── virtual_memory_qa.py
│   └── visual_storytelling.py
├── qwen-agent-docs/
│   └── website/
│       ├── .gitignore
│       ├── app/
│       │   ├── [lang]/
│       │   │   ├── [[...mdxPath]]/
│       │   │   │   ├── index.css
│       │   │   │   └── page.jsx
│       │   │   └── layout.tsx
│       │   ├── layout.tsx
│       │   ├── page.tsx
│       │   ├── robots.ts
│       │   └── sitemap.ts
│       ├── content/
│       │   └── en/
│       │       ├── _meta.ts
│       │       ├── benchmarks/
│       │       │   ├── _meta.ts
│       │       │   ├── deepplanning/
│       │       │   │   └── index.mdx
│       │       │   └── index.md
│       │       ├── guide/
│       │       │   ├── _meta.ts
│       │       │   ├── core_moduls/
│       │       │   │   ├── _meta.ts
│       │       │   │   ├── agent.md
│       │       │   │   ├── context.md
│       │       │   │   ├── llm.md
│       │       │   │   ├── mcp.md
│       │       │   │   ├── rag.md
│       │       │   │   ├── schema.md
│       │       │   │   └── tool.md
│       │       │   ├── get_started/
│       │       │   │   ├── _meta.ts
│       │       │   │   ├── configuration.md
│       │       │   │   ├── features.md
│       │       │   │   ├── install.md
│       │       │   │   └── quickstart.md
│       │       │   └── index.md
│       │       └── index.md
│       ├── mdx-components.tsx
│       ├── next-env.d.ts
│       ├── next.config.mjs
│       ├── package.json
│       ├── postcss.config.js
│       ├── public/
│       │   ├── .nojekyll
│       │   ├── fonts/
│       │   │   ├── Monoton/
│       │   │   │   └── OFL.txt
│       │   │   └── Orbitron/
│       │   │       ├── OFL.txt
│       │   │       └── README.txt
│       │   └── site.webmanifest
│       ├── src/
│       │   └── components/
│       │       ├── font-loader.tsx
│       │       ├── leaderboard.tsx
│       │       └── locale-anchor.tsx
│       └── tsconfig.json
├── qwen_agent/
│   ├── __init__.py
│   ├── agent.py
│   ├── agents/
│   │   ├── __init__.py
│   │   ├── article_agent.py
│   │   ├── assistant.py
│   │   ├── dialogue_retrieval_agent.py
│   │   ├── dialogue_simulator.py
│   │   ├── doc_qa/
│   │   │   ├── __init__.py
│   │   │   ├── basic_doc_qa.py
│   │   │   ├── parallel_doc_qa.py
│   │   │   ├── parallel_doc_qa_member.py
│   │   │   └── parallel_doc_qa_summary.py
│   │   ├── fncall_agent.py
│   │   ├── group_chat.py
│   │   ├── group_chat_auto_router.py
│   │   ├── group_chat_creator.py
│   │   ├── human_simulator.py
│   │   ├── keygen_strategies/
│   │   │   ├── __init__.py
│   │   │   ├── gen_keyword.py
│   │   │   ├── gen_keyword_with_knowledge.py
│   │   │   ├── split_query.py
│   │   │   ├── split_query_then_gen_keyword.py
│   │   │   └── split_query_then_gen_keyword_with_knowledge.py
│   │   ├── memo_assistant.py
│   │   ├── react_chat.py
│   │   ├── router.py
│   │   ├── tir_agent.py
│   │   ├── user_agent.py
│   │   ├── virtual_memory_agent.py
│   │   ├── write_from_scratch.py
│   │   └── writing/
│   │       ├── __init__.py
│   │       ├── continue_writing.py
│   │       ├── expand_writing.py
│   │       └── outline_writing.py
│   ├── gui/
│   │   ├── __init__.py
│   │   ├── assets/
│   │   │   ├── app.css
│   │   │   └── appBot.css
│   │   ├── gradio_dep.py
│   │   ├── gradio_utils.py
│   │   ├── utils.py
│   │   └── web_ui.py
│   ├── llm/
│   │   ├── __init__.py
│   │   ├── azure.py
│   │   ├── base.py
│   │   ├── fncall_prompts/
│   │   │   ├── __init__.py
│   │   │   ├── base_fncall_prompt.py
│   │   │   ├── nous_fncall_prompt.py
│   │   │   └── qwen_fncall_prompt.py
│   │   ├── function_calling.py
│   │   ├── oai.py
│   │   ├── openvino.py
│   │   ├── qwen_dashscope.py
│   │   ├── qwenaudio_dashscope.py
│   │   ├── qwenomni_oai.py
│   │   ├── qwenvl_dashscope.py
│   │   ├── qwenvl_oai.py
│   │   ├── qwenvlo_dashscope.py
│   │   ├── schema.py
│   │   └── transformers_llm.py
│   ├── log.py
│   ├── memory/
│   │   ├── __init__.py
│   │   └── memory.py
│   ├── multi_agent_hub.py
│   ├── settings.py
│   ├── tools/
│   │   ├── __init__.py
│   │   ├── amap_weather.py
│   │   ├── base.py
│   │   ├── code_interpreter.py
│   │   ├── doc_parser.py
│   │   ├── extract_doc_vocabulary.py
│   │   ├── image_gen.py
│   │   ├── image_search.py
│   │   ├── image_zoom_in_qwen3vl.py
│   │   ├── mcp_manager.py
│   │   ├── python_executor.py
│   │   ├── resource/
│   │   │   ├── code_interpreter_image.dockerfile
│   │   │   └── code_interpreter_init_kernel.py
│   │   ├── retrieval.py
│   │   ├── search_tools/
│   │   │   ├── __init__.py
│   │   │   ├── base_search.py
│   │   │   ├── front_page_search.py
│   │   │   ├── hybrid_search.py
│   │   │   ├── keyword_search.py
│   │   │   └── vector_search.py
│   │   ├── simple_doc_parser.py
│   │   ├── storage.py
│   │   ├── web_extractor.py
│   │   └── web_search.py
│   └── utils/
│       ├── __init__.py
│       ├── output_beautify.py
│       ├── parallel_executor.py
│       ├── qwen.tiktoken
│       ├── str_processing.py
│       ├── tokenization_qwen.py
│       └── utils.py
├── qwen_server/
│   ├── __init__.py
│   ├── add_qwen_libs.py
│   ├── assistant_server.py
│   ├── css/
│   │   └── main.css
│   ├── database_server.py
│   ├── js/
│   │   └── main.js
│   ├── output_beautify.py
│   ├── schema.py
│   ├── server_config.json
│   ├── utils.py
│   └── workstation_server.py
├── run_server.py
├── setup.py
└── tests/
    ├── agents/
    │   ├── test_article_agent.py
    │   ├── test_assistant.py
    │   ├── test_custom_tool_object.py
    │   ├── test_doc_qa.py
    │   ├── test_parallel_qa.py
    │   ├── test_react_chat.py
    │   └── test_router.py
    ├── examples/
    │   ├── test_examples.py
    │   ├── test_long_dialogue.py
    │   └── test_vm_qa.py
    ├── llm/
    │   ├── test_continue.py
    │   ├── test_dashscope.py
    │   ├── test_function_content.py
    │   └── test_oai.py
    ├── memory/
    │   └── test_memory.py
    ├── qwen_server/
    │   └── test_database_server.py
    └── tools/
        ├── test_doc_parser.py
        ├── test_hybrid_search.py
        ├── test_keyword_search.py
        ├── test_simple_doc_parser.py
        ├── test_tools.py
        └── test_vector_search.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/workflows/deploy-docs.yml
================================================
name: Deploy to GitHub Pages

on:
  push:
    branches:
      - main  # or the name of your main branch
    paths:
      - 'qwen-agent-docs/website/**'
  workflow_dispatch:  # allow manual triggering

permissions:
  contents: read
  pages: write
  id-token: write

# Prevent concurrent deployments
concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  build:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: qwen-agent-docs/website
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '18'
          cache: 'npm'
          cache-dependency-path: 'qwen-agent-docs/website/package-lock.json'

      - name: Install dependencies
        run: npm ci

      - name: Build website
        run: npm run build
        env:
          NODE_ENV: production

      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: qwen-agent-docs/website/out

  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4



================================================
FILE: .gitignore
================================================
env
*.pyc
__pycache__

.idea
.vscode
.DS_Store
*.ipynb_checkpoints

qwen_agent/llm/gpt.py
qwen_agent/llm/tools.py
workspace/*

benchmark/log/*
benchmark/output_data/*
benchmark/upload_file/*
benchmark/upload_file_clean/*
benchmark/eval_data/
Qwen-Agent

docqa/*
log/*
log.jsonl

ai_agent/debug.json
ai_agent/local_prompts/*
**/debug.json
**/debug.log*
debug.json
ai_agent/log.jsonl
qwen_agent.egg-info/*
build/*
dist/*

examples/*.ipynb
**/workspace/*
test/*
tests/env.sh
examples/docqa_multi_agent.py
examples/docqa_multihp_agents.py
**/workspace/*
test/*
tests/env.sh
examples/data/*
test.db

benchmark/deepplanning/travelplanning/database/database_en/
benchmark/deepplanning/travelplanning/database/database_zh/
benchmark/deepplanning/travelplanning/.env
__pycache__/
benchmark/deepplanning/travelplanning/CUSTOM_AGENT.md
benchmark/deepplanning/travelplanning/MODEL_CONFIG.md



# Website (Next.js/Node.js)
qwen-agent-docs/website/node_modules/
#qwen-agent-docs/website/package-lock.json
qwen-agent-docs/website/.next/
qwen-agent-docs/website/out/
qwen-agent-docs/website/.env*
qwen-agent-docs/website/.temp-source-repo/
qwen-agent-docs/website/.source-docs/
qwen-agent-docs/website/last-sync.json
qwen-agent-docs/website/_pagefind/
qwen-agent-docs/website/*.tsbuildinfo


================================================
FILE: .pre-commit-config.yaml
================================================
repos:
  - repo: https://github.com/pycqa/flake8.git
    rev: 5.0.4
    hooks:
      - id: flake8
        args: ["--max-line-length=300", "--extend-ignore=E231,E702,E251,W604"]  # TODO: Set to 120 and `pre-commit run --all-files`.
  - repo: https://github.com/PyCQA/isort.git
    rev: 5.11.5
    hooks:
      - id: isort
        args: ["--line-length", "120"]
  - repo: https://github.com/pre-commit/mirrors-yapf.git
    rev: v0.32.0
    hooks:
      - id: yapf
        args: ["--style", "{based_on_style: google, column_limit: 120}", "-i"]
  - repo: https://github.com/pre-commit/pre-commit-hooks.git
    rev: v4.3.0
    hooks:
      - id: trailing-whitespace
      - id: check-yaml
      - id: end-of-file-fixer
      - id: requirements-txt-fixer
      - id: double-quote-string-fixer
      - id: check-merge-conflict
      - id: fix-encoding-pragma
        args: ["--remove"]
      - id: mixed-line-ending
        args: ["--fix=lf"]


================================================
FILE: LICENSE
================================================

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: MANIFEST.in
================================================
include qwen_agent/utils/qwen.tiktoken
recursive-include qwen_agent/tools/resource *


================================================
FILE: README.md
================================================
<!---
Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

[中文](https://github.com/QwenLM/Qwen-Agent/blob/main/README_CN.md) ｜ English

<p align="center">
    <img src="https://qianwen-res.oss-accelerate-overseas.aliyuncs.com/logo_qwen_agent.png" width="400"/>
</p>
<br>

<p align="center">
          💜 <a href="https://chat.qwen.ai/"><b>Qwen Chat</b></a>&nbsp&nbsp | &nbsp&nbsp🤗 <a href="https://huggingface.co/Qwen">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/qwen">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://qwenlm.github.io/">Blog</a> &nbsp&nbsp ｜ &nbsp&nbsp📖 <a href="https://qwenlm.github.io/Qwen-Agent/en/">Documentation</a>

<br>
📊 <a href="https://qwenlm.github.io/Qwen-Agent/en/benchmarks/deepplanning/">Benchmark</a>&nbsp&nbsp | &nbsp&nbsp💬 <a href="https://github.com/QwenLM/Qwen/blob/main/assets/wechat.png">WeChat (微信)</a>&nbsp&nbsp | &nbsp&nbsp🫨 <a href="https://discord.gg/CV4E9rpNSD">Discord</a>&nbsp&nbsp
</p>


Qwen-Agent is a framework for developing LLM applications based on the instruction following, tool usage, planning, and
memory capabilities of Qwen.
It also comes with example applications such as Browser Assistant, Code Interpreter, and Custom Assistant.
Qwen-Agent now serves as the backend of [Qwen Chat](https://chat.qwen.ai/).

# News
* 🔥🔥🔥Feb 16, 2026: Open-sourced Qwen3.5. For usage examples, refer to [Qwen3.5 Agent Demo](./examples/assistant_qwen3.5.py).
* Jan 27, 2026: Open-sourced agent evaluation benchmark [DeepPlanning](https://qwenlm.github.io/Qwen-Agent/en/benchmarks/deepplanning/) and added Qwen-Agent [documentation](https://qwenlm.github.io/Qwen-Agent/en/guide/).
* Sep 23, 2025: Added [Qwen3-VL Tool-call Demo](./examples/cookbook_think_with_images.ipynb), supporting tools such as zoom in, image search, and web search.
* Jul 23, 2025: Added [Qwen3-Coder Tool-call Demo](./examples/assistant_qwen3_coder.py); added native API tool call interface support, such as using vLLM's built-in tool call parsing.
* May 1, 2025: Added [Qwen3 Tool-call Demo](./examples/assistant_qwen3.py) and [MCP Cookbooks](./examples/).
* Mar 18, 2025: Added support for the `reasoning_content` field; adjusted the default [Function Call template](./qwen_agent/llm/fncall_prompts/nous_fncall_prompt.py), which is applicable to the Qwen2.5 series general models and QwQ-32B. If you need the old version of the template, refer to the [example](./examples/function_calling.py) for how to pass the relevant parameters.
* Mar 7, 2025: Added [QwQ-32B Tool-call Demo](./examples/assistant_qwq.py). It supports parallel, multi-step, and multi-turn tool calls.
* Dec 3, 2024: Upgraded the GUI to be based on Gradio 5. Note: the GUI requires Python 3.10 or higher.
* Sep 18, 2024: Added [Qwen2.5-Math Demo](./examples/tir_math.py) to showcase the Tool-Integrated Reasoning capabilities of Qwen2.5-Math. Note: The Python executor is not sandboxed and is intended for local testing only, not for production use.

# Getting Started

## Installation

- Install the stable version from PyPI:
```bash
pip install -U "qwen-agent[gui,rag,code_interpreter,mcp]"
# Or use `pip install -U qwen-agent` for the minimal requirements.
# The optional requirements, specified in square brackets, are:
#   [gui] for Gradio-based GUI support;
#   [rag] for RAG support;
#   [code_interpreter] for Code Interpreter support;
#   [mcp] for MCP support.
```

- Alternatively, you can install the latest development version from the source:
```bash
git clone https://github.com/QwenLM/Qwen-Agent.git
cd Qwen-Agent
pip install -e ./"[gui,rag,code_interpreter,mcp]"
# Or `pip install -e ./` for minimal requirements.
```

## Preparation: Model Service

You can either use the model service provided by Alibaba
Cloud's [DashScope](https://help.aliyun.com/zh/dashscope/developer-reference/quick-start), or deploy and use your own
model service using the open-source Qwen models.

- If you choose to use the model service offered by DashScope, please ensure that you set the environment
variable `DASHSCOPE_API_KEY` to your unique DashScope API key.

- Alternatively, if you prefer to deploy and use your own model service, please follow the instructions provided in the README of Qwen2 for deploying an OpenAI-compatible API service.
Specifically, consult the [vLLM](https://github.com/QwenLM/Qwen2?tab=readme-ov-file#vllm) section for high-throughput GPU deployment or the [Ollama](https://github.com/QwenLM/Qwen2?tab=readme-ov-file#ollama) section for local CPU (+GPU) deployment.
For the QwQ and Qwen3 models, it is recommended that you **do not** add the `--enable-auto-tool-choice` and `--tool-call-parser hermes` parameters, as Qwen-Agent parses the tool outputs from vLLM on its own.
For Qwen3-Coder, it is recommended to enable both of these parameters, use vLLM's built-in tool parsing, and set the `use_raw_api` parameter (see [usage](#how-to-pass-llm-parameters-to-the-agent)).
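
The two recommendations above lead to two slightly different `llm_cfg` dictionaries. The following is a minimal sketch only: `build_llm_cfg` is a hypothetical helper (not part of Qwen-Agent), and the served model names and the local URL are assumptions that you should replace with your own deployment's values.

```python
def build_llm_cfg(model: str, use_raw_api: bool) -> dict:
    """Build an llm_cfg dict for an OpenAI-compatible server (hypothetical helper)."""
    cfg = {
        'model': model,
        'model_server': 'http://localhost:8000/v1',  # assumed local vLLM endpoint
        'api_key': 'EMPTY',
        'generate_cfg': {},
    }
    if use_raw_api:
        # Rely on the server's native tool-call parsing, i.e. vLLM started with
        # --enable-auto-tool-choice and --tool-call-parser.
        cfg['generate_cfg']['use_raw_api'] = True
    return cfg


# QwQ / Qwen3: the server is started without the tool-call flags,
# and Qwen-Agent parses the tool calls itself.
qwen3_cfg = build_llm_cfg('qwen3-32b', use_raw_api=False)

# Qwen3-Coder: the server parses tool calls, so the native API is used.
# The model name below is a placeholder, not a confirmed served name.
coder_cfg = build_llm_cfg('qwen3-coder', use_raw_api=True)

print(qwen3_cfg['generate_cfg'])
print(coder_cfg['generate_cfg'])
```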

## Developing Your Own Agent

Qwen-Agent offers atomic components, such as LLMs (which inherit from `class BaseChatModel` and come with [function calling](https://github.com/QwenLM/Qwen-Agent/blob/main/examples/function_calling.py)) and Tools (which inherit
from `class BaseTool`), along with high-level components like Agents (derived from `class Agent`).

The following example illustrates the process of creating an agent capable of reading PDF files and utilizing tools, as
well as incorporating a custom tool:

```py
import pprint
import urllib.parse
import json5
from qwen_agent.agents import Assistant
from qwen_agent.tools.base import BaseTool, register_tool
from qwen_agent.utils.output_beautify import typewriter_print


# Step 1 (Optional): Add a custom tool named `my_image_gen`.
@register_tool('my_image_gen')
class MyImageGen(BaseTool):
    # The `description` tells the agent the functionality of this tool.
    description = 'AI painting (image generation) service, input text description, and return the image URL drawn based on text information.'
    # The `parameters` tell the agent what input parameters the tool has.
    parameters = [{
        'name': 'prompt',
        'type': 'string',
        'description': 'Detailed description of the desired image content, in English',
        'required': True
    }]

    def call(self, params: str, **kwargs) -> str:
        # `params` are the arguments generated by the LLM agent.
        prompt = json5.loads(params)['prompt']
        prompt = urllib.parse.quote(prompt)
        return json5.dumps(
            {'image_url': f'https://image.pollinations.ai/prompt/{prompt}'},
            ensure_ascii=False)


# Step 2: Configure the LLM you are using.
llm_cfg = {
    # Use the model service provided by DashScope:
    'model': 'qwen-max-latest',
    'model_type': 'qwen_dashscope',
    # 'api_key': 'YOUR_DASHSCOPE_API_KEY',
    # It will use the `DASHSCOPE_API_KEY` environment variable if 'api_key' is not set here.

    # Use a model service compatible with the OpenAI API, such as vLLM or Ollama:
    # 'model': 'Qwen2.5-7B-Instruct',
    # 'model_server': 'http://localhost:8000/v1',  # base_url, also known as api_base
    # 'api_key': 'EMPTY',

    # (Optional) LLM hyperparameters for generation:
    'generate_cfg': {
        'top_p': 0.8
    }
}

# Step 3: Create an agent. Here we use the `Assistant` agent as an example, which is capable of using tools and reading files.
system_instruction = '''After receiving the user's request, you should:
- first draw an image and obtain the image url,
- then run code `requests.get(image_url)` to download the image,
- and finally select an image operation from the given document to process the image.
Please show the image using `plt.show()`.'''
tools = ['my_image_gen', 'code_interpreter']  # `code_interpreter` is a built-in tool for executing code. For configuration details, please refer to the FAQ.
files = ['./examples/resource/doc.pdf']  # Give the bot a PDF file to read.
bot = Assistant(llm=llm_cfg,
                system_message=system_instruction,
                function_list=tools,
                files=files)

# Step 4: Run the agent as a chatbot.
messages = []  # This stores the chat history.
while True:
    # For example, enter the query "draw a dog and rotate it 90 degrees".
    query = input('\nuser query: ')
    # Append the user query to the chat history.
    messages.append({'role': 'user', 'content': query})
    response = []
    response_plain_text = ''
    print('bot response:')
    for response in bot.run(messages=messages):
        # Streaming output.
        response_plain_text = typewriter_print(response, response_plain_text)
    # Append the bot responses to the chat history.
    messages.extend(response)
```
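
To see what the custom tool in Step 1 returns without installing the framework, the body of `MyImageGen.call` can be reproduced with the standard library. This is a sketch under assumptions: `build_image_url` is a hypothetical stand-in name, and the standard `json` module is swapped in for `json5`.

```python
import json
import urllib.parse


def build_image_url(params: str) -> str:
    """Mirror the logic of MyImageGen.call using only the standard library."""
    # `params` is the JSON string of arguments generated by the LLM.
    prompt = json.loads(params)['prompt']
    # Percent-encode the prompt so that spaces and punctuation are URL-safe.
    encoded = urllib.parse.quote(prompt)
    return json.dumps(
        {'image_url': f'https://image.pollinations.ai/prompt/{encoded}'},
        ensure_ascii=False)


result = build_image_url('{"prompt": "a cute dog"}')
print(result)
# {"image_url": "https://image.pollinations.ai/prompt/a%20cute%20dog"}
```

The tool returns a JSON string rather than a bare URL, so the agent receives a structured observation that it can quote in the next step.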

In addition to using built-in agent implementations such as `class Assistant`, you can also develop your own agent implementation by inheriting from `class Agent`.
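
Whichever agent class you use, its output is consumed the same way as in Step 4 above. The sketch below illustrates that pattern with a hypothetical stand-in generator, `fake_run`, instead of a real agent. It assumes that each yielded value is the full response list generated so far, which is what the Step 4 loop relies on when it extends the history only once after the loop.

```python
from typing import Dict, Iterator, List


def fake_run(messages: List[Dict]) -> Iterator[List[Dict]]:
    """Hypothetical stand-in for `bot.run`: yields the response so far, growing each time."""
    text = 'Here is your rotated dog.'
    for end in range(5, len(text) + 5, 5):
        yield [{'role': 'assistant', 'content': text[:end]}]


messages = [{'role': 'user', 'content': 'draw a dog and rotate it 90 degrees'}]
response = []
printed = ''
for response in fake_run(messages):
    full = response[-1]['content']
    # Print only the newly generated suffix, as a typewriter-style printer would.
    print(full[len(printed):], end='', flush=True)
    printed = full
print()

# After the loop, `response` holds the final snapshot, so the history is extended once.
messages.extend(response)
```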

The framework also provides a convenient GUI interface, supporting the rapid deployment of Gradio Demos for Agents.
For example, in the case above, you can quickly launch a Gradio Demo using the following code:

```py
from qwen_agent.gui import WebUI
WebUI(bot).run()  # `bot` is the agent defined in the code above; the definition is not repeated here to save space.
```
Now you can chat with the Agent in the web UI. Please refer to the [examples](https://github.com/QwenLM/Qwen-Agent/blob/main/examples) directory for more usage examples.

# FAQ
## How to Use the Code Interpreter Tool?

We implement a code interpreter tool based on local Docker containers. You can enable the built-in `code_interpreter` tool for your agent, allowing it to autonomously write code according to specific scenarios, execute it within an isolated sandbox environment, and return the execution results.

⚠️ **Note**: Before using this tool, please ensure that Docker is installed and running on your local operating system. The time required to build the container image for the first time depends on your network conditions. For Docker installation and setup instructions, please refer to the [official documentation](https://docs.docker.com/desktop/).


## How to Use MCP?

You can select the required tools on the open-source [MCP server website](https://github.com/modelcontextprotocol/servers) and configure the relevant environment.

Example of MCP invocation format:
```
{
    "mcpServers": {
        "memory": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-memory"]
        },
        "filesystem": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/files"]
        },
        "sqlite" : {
            "command": "uvx",
            "args": [
                "mcp-server-sqlite",
                "--db-path",
                "test.db"
            ]
        }
    }
}
```
For more details, refer to the [MCP usage example](./examples/assistant_mcp_sqlite_bot.py).
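
As a rough sketch of how such a configuration is used from Python, the dictionary can be loaded, sanity-checked, and placed in the agent's tool list. `check_mcp_config` is a hypothetical helper written for this illustration, and passing the dictionary as one entry of `function_list` is an assumption based on the linked example, so please verify it against that file.

```python
import json

mcp_config = json.loads('''
{
    "mcpServers": {
        "sqlite": {
            "command": "uvx",
            "args": ["mcp-server-sqlite", "--db-path", "test.db"]
        }
    }
}
''')


def check_mcp_config(config: dict) -> list:
    """Return the server names after checking each entry (hypothetical helper)."""
    names = []
    for name, server in config['mcpServers'].items():
        if not isinstance(server.get('command'), str):
            raise ValueError(f'MCP server {name!r} needs a string "command"')
        if not isinstance(server.get('args', []), list):
            raise ValueError(f'MCP server {name!r} needs a list of "args"')
        names.append(name)
    return names


print(check_mcp_config(mcp_config))

# Assumed usage: the dict sits next to built-in tool names in the tool list.
tools = [mcp_config, 'code_interpreter']
# bot = Assistant(llm=llm_cfg, function_list=tools)  # requires qwen-agent[mcp]
```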

The dependencies required to run this example are as follows:
```
# Node.js (Download and install the latest version from the Node.js official website)
# uv 0.4.18 or higher (Check with uv --version)
# Git (Check with git --version)
# SQLite (Check with sqlite3 --version)

# For macOS users, you can install these components using Homebrew:
brew install uv git sqlite3

# For Windows users, you can install these components using winget:
winget install --id=astral-sh.uv -e
winget install git.git sqlite.sqlite
```
## Do you have function calling (aka tool calling)?

Yes. The LLM classes provide [function calling](https://github.com/QwenLM/Qwen-Agent/blob/main/examples/function_calling.py). Additionally, some Agent classes, such as FnCallAgent and ReActChat, are also built upon the function calling capability.

The current default tool calling template natively supports **Parallel Function Calls**.
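
To make "parallel" concrete, the sketch below dispatches two function calls requested in a single model turn and appends one result message per call. It does not call a model: the `responses` list is hand-written, its message shape (`function_call` with `name` and `arguments`, results with role `function`) is an assumption modeled on the linked example, and `get_current_weather` is a hypothetical tool returning canned data.

```python
import json


def get_current_weather(location: str, unit: str = 'celsius') -> str:
    """Hypothetical tool that returns canned data as a JSON string."""
    return json.dumps({'location': location, 'temperature': '22', 'unit': unit})


available_functions = {'get_current_weather': get_current_weather}

# Assumed shape of one model turn that requests two calls in parallel.
responses = [
    {'role': 'assistant', 'content': '',
     'function_call': {'name': 'get_current_weather', 'arguments': '{"location": "Tokyo"}'}},
    {'role': 'assistant', 'content': '',
     'function_call': {'name': 'get_current_weather', 'arguments': '{"location": "Paris"}'}},
]

messages = [{'role': 'user', 'content': "What's the weather in Tokyo and Paris?"}]
messages.extend(responses)

for message in responses:
    fn_call = message.get('function_call')
    if not fn_call:
        continue
    fn = available_functions[fn_call['name']]
    fn_args = json.loads(fn_call['arguments'])
    # One `function` message per call, in the same order as the requests.
    messages.append({'role': 'function', 'name': fn_call['name'], 'content': fn(**fn_args)})

print(len(messages))
```

The extended `messages` list would then be sent back to the model so that it can compose a final answer from both results.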

## How to pass LLM parameters to the Agent?
```py
llm_cfg = {
    # The model name being used:
    'model': 'qwen3-32b',
    # The model service being used:
    'model_type': 'qwen_dashscope',
    # If 'api_key' is not set here, it will default to reading the `DASHSCOPE_API_KEY` environment variable:
    'api_key': 'YOUR_DASHSCOPE_API_KEY',

    # Using an OpenAI API compatible model service, such as vLLM or Ollama:
    # 'model': 'qwen3-32b',
    # 'model_server': 'http://localhost:8000/v1',  # base_url, also known as api_base
    # 'api_key': 'EMPTY',

    # (Optional) LLM hyperparameters:
    'generate_cfg': {
        # This parameter affects the tool-call parsing logic. Default is False:
          # Set to True when the reasoning is embedded in the content, e.g. `<think>this is the thought</think>this is the answer`
          # Set to False when the response carries separate reasoning_content and content fields
        # 'thought_in_content': True,

        # Tool-call template: default is nous (recommended for qwen3):
        # 'fncall_prompt_type': 'nous',

        # Maximum input length; messages are truncated if they exceed this length. Adjust according to the model API:
        # 'max_input_tokens': 58000,

        # Parameters passed directly to the model API, such as top_p, enable_thinking, etc., according to the API specification:
        # 'top_p': 0.8,

        # Use the API's native tool-call interface:
        # 'use_raw_api': True,
    }
}
```
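
The `thought_in_content` comment distinguishes two response layouts. The sketch below shows what the "thought in content" layout looks like when separated into reasoning and answer. It is an illustration only, not Qwen-Agent's actual parser, and `split_thought` is a hypothetical helper.

```python
import re
from typing import Tuple

# Matches a leading <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r'^\s*<think>(.*?)</think>', re.DOTALL)


def split_thought(content: str) -> Tuple[str, str]:
    """Split `<think>...</think>answer` into (reasoning, answer)."""
    match = THINK_RE.match(content)
    if match is None:
        # No inline reasoning: the whole string is the answer.
        return '', content
    return match.group(1).strip(), content[match.end():].strip()


reasoning, answer = split_thought('<think>this is the thought</think>this is the answer')
print(reasoning)  # this is the thought
print(answer)     # this is the answer
```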

## How to do question-answering over super-long documents involving 1M tokens?

We have released [a fast RAG solution](https://github.com/QwenLM/Qwen-Agent/blob/main/examples/assistant_rag.py), as well as [an expensive but competitive agent](https://github.com/QwenLM/Qwen-Agent/blob/main/examples/parallel_doc_qa.py), for doing question-answering over super-long documents. They have managed to outperform native long-context models on two challenging benchmarks while being more efficient, and perform perfectly in the single-needle "needle-in-the-haystack" pressure test involving 1M-token contexts. See the [blog](https://qwenlm.github.io/blog/qwen-agent-2405/) for technical details.

<p align="center">
    <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/assets/qwen_agent/qwen-agent-2405-blog-long-context-results.png" width="400"/>
</p>

# Application: BrowserQwen

BrowserQwen is a browser assistant built upon Qwen-Agent. Please refer to its [documentation](https://github.com/QwenLM/Qwen-Agent/blob/main/browser_qwen.md) for details.

# Disclaimer

The Docker container-based code interpreter mounts only the specified working directory and implements basic sandbox isolation, but it should still be used with caution in production environments.


================================================
FILE: README_CN.md
================================================
<!---
Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

中文 ｜ [English](./README.md)

<p align="center">
    <img src="https://qianwen-res.oss-accelerate-overseas.aliyuncs.com/logo_qwen_agent.png" width="400"/>
</p>
<br>

<p align="center">
          💜 <a href="https://chat.qwen.ai/"><b>Qwen Chat</b></a>&nbsp&nbsp | &nbsp&nbsp🤗 <a href="https://huggingface.co/Qwen">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/qwen">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://qwenlm.github.io/">Blog</a> &nbsp&nbsp ｜ &nbsp&nbsp📖 <a href="https://qwenlm.github.io/Qwen-Agent/en/">Documentation</a>

<br>
📊 <a href="https://qwenlm.github.io/Qwen-Agent/en/benchmarks/deepplanning/">Benchmark</a>&nbsp&nbsp | &nbsp&nbsp💬 <a href="https://github.com/QwenLM/Qwen/blob/main/assets/wechat.png">WeChat (微信)</a>&nbsp&nbsp | &nbsp&nbsp🫨 <a href="https://discord.gg/CV4E9rpNSD">Discord</a>&nbsp&nbsp
</p>

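Qwen-Agent is a development framework. Developers can build Agent applications on it, making full use of the instruction-following, tool-use, planning, and memory capabilities of the Qwen models. The project also provides example applications such as a browser assistant, a code interpreter, and a custom assistant.
Qwen-Agent now serves as the backend of [Qwen Chat](https://chat.qwen.ai/).

# News
* 🔥🔥🔥Feb 16, 2026: Open-sourced Qwen3.5. For a usage example, refer to the [Qwen3.5 Agent Demo](./examples/assistant_qwen3.5.py).
* Jan 27, 2026: Open-sourced the agent evaluation benchmark [DeepPlanning](https://qwenlm.github.io/Qwen-Agent/en/benchmarks/deepplanning/) and added the Qwen-Agent [documentation](https://qwenlm.github.io/Qwen-Agent/en/guide/).
* Sep 23, 2025: Added [Qwen3-VL Tool-call Demo](./examples/cookbook_think_with_images.ipynb), which supports tools such as image cutout, image search, and text search.
* Jul 23, 2025: Added [Qwen3-Coder Tool-call Demo](./examples/assistant_qwen3_coder.py); added support for the native API tool-call interface, for example using vLLM's built-in tool-call parsing.
* May 1, 2025: Added [Qwen3 Tool-call Demo](./examples/assistant_qwen3.py); added [MCP cookbooks](./examples/).
* Mar 18, 2025: Added support for the `reasoning_content` field; adjusted the default [Function Call template](./qwen_agent/llm/fncall_prompts/nous_fncall_prompt.py) (applicable to the Qwen2.5 series general models and QwQ-32B). To use the previous template, refer to the [example](./examples/function_calling.py) for how to pass the parameter.
* Mar 7, 2025: Added [QwQ-32B Tool-call Demo](./examples/assistant_qwq.py), which supports parallel, multi-step, and multi-turn tool calls.
* Dec 3, 2024: Upgraded the GUI to Gradio 5. Note: The GUI requires Python 3.10 or higher.
* Sep 18, 2024: Added [Qwen2.5-Math Demo](./examples/tir_math.py) to showcase the Tool-Integrated Reasoning capabilities of Qwen2.5-Math. Note: The Python executor is not sandboxed and is intended for local testing only, not for production use.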

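# Getting Started

## Installation

- Install the stable version from PyPI:
```bash
pip install -U "qwen-agent[rag,code_interpreter,gui,mcp]"
# Or use `pip install -U qwen-agent` for the minimal requirements.
# The optional requirements, specified in square brackets, are:
#   [gui] for Gradio-based GUI support;
#   [rag] for RAG support;
#   [code_interpreter] for Code Interpreter support;
#   [mcp] for MCP support.
```

- Alternatively, you can install the latest development version from the source: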
```bash
git clone https://github.com/QwenLM/Qwen-Agent.git
cd Qwen-Agent
pip install -e ./"[gui,rag,code_interpreter,mcp]"
# Or use `pip install -e ./` for the minimal requirements.
```


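## Preparation: Model Service

Qwen-Agent supports the Qwen model service provided by Alibaba Cloud's [DashScope](https://help.aliyun.com/zh/dashscope/developer-reference/quick-start), and also supports open-source Qwen model services accessed through the OpenAI API.

- To use the model service provided by DashScope, simply set the environment variable `DASHSCOPE_API_KEY` to your DashScope API key.

- Alternatively, if you prefer to deploy and use your own model service, follow the instructions in the Qwen2 README to deploy an OpenAI-compatible API service.
Specifically, see the [vLLM](https://github.com/QwenLM/Qwen2?tab=readme-ov-file#vllm) section for high-throughput GPU deployment, or the [Ollama](https://github.com/QwenLM/Qwen2?tab=readme-ov-file#ollama) section for local CPU (+GPU) deployment.

Note that for the QwQ and Qwen3 models, it is recommended that you **do not** add the `--enable-auto-tool-choice` and `--tool-call-parser hermes` parameters when starting the service, as Qwen-Agent parses the tool outputs from vLLM on its own.
For Qwen3-Coder, it is recommended to enable both of these parameters, use vLLM's built-in tool parsing, and set the `use_raw_api` parameter (see [usage](#how-to-pass-llm-parameters-to-the-agent)).

## Developing Your Own Agent

The framework provides atomic components such as LLMs (which inherit from `class BaseChatModel` and provide [Function Calling](./examples/function_calling.py)) and Tools (which inherit from `class BaseTool`), as well as high-level abstractions such as Agents (which inherit from `class Agent`).

The following example shows how to add a custom tool and quickly develop an agent with a persona, a knowledge base, and tool-use capabilities: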

```py
import pprint
import urllib.parse
import json5
from qwen_agent.agents import Assistant
from qwen_agent.tools.base import BaseTool, register_tool
from qwen_agent.utils.output_beautify import typewriter_print


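# Step 1 (Optional): Add a custom tool named `my_image_gen`.
@register_tool('my_image_gen')
class MyImageGen(BaseTool):
    # The `description` tells the agent the functionality of this tool.
    description = 'AI painting (image generation) service, input text description, and return the image URL drawn based on text information.'
    # The `parameters` tell the agent what input parameters the tool has.
    parameters = [{
        'name': 'prompt',
        'type': 'string',
        'description': 'Detailed description of the desired image content',
        'required': True
    }]

    def call(self, params: str, **kwargs) -> str:
        # `params` are the arguments generated by the LLM agent.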
        prompt = json5.loads(params)['prompt']
        prompt = urllib.parse.quote(prompt)
        return json5.dumps(
            {'image_url': f'https://image.pollinations.ai/prompt/{prompt}'},
            ensure_ascii=False)


# Step 2: Configure the LLM you are using.
llm_cfg = {
    # Use the model service provided by DashScope:
    'model': 'qwen-max-latest',
    'model_type': 'qwen_dashscope',
    # 'api_key': 'YOUR_DASHSCOPE_API_KEY',
    # It will use the `DASHSCOPE_API_KEY` environment variable if 'api_key' is not set here.

    # Use a model service compatible with the OpenAI API, such as vLLM or Ollama:
    # 'model': 'Qwen2.5-7B-Instruct',
    # 'model_server': 'http://localhost:8000/v1',  # base_url, also known as api_base
    # 'api_key': 'EMPTY',

    # (Optional) LLM hyperparameters:
    'generate_cfg': {
        'top_p': 0.8
    }
}

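# Step 3: Create an agent. Here we use the `Assistant` agent as an example, which is capable of using tools and reading files.
system_instruction = '''After receiving the user's request, you should:
- first draw an image and obtain the image url,
- then run code `requests.get(image_url)` to download the image,
- and finally select an image operation from the given document to process the image.
Please show the image using `plt.show()`.
Always reply to the user in Chinese.'''
tools = ['my_image_gen', 'code_interpreter']  # `code_interpreter` is a built-in tool for executing code. For configuration details, please refer to the FAQ.
files = ['./examples/resource/doc.pdf']  # Give the agent a PDF file to read.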
bot = Assistant(llm=llm_cfg,
                system_message=system_instruction,
                function_list=tools,
                files=files)

# Step 4: Run the agent as a chatbot.
messages = []  # This stores the chat history.
while True:
    # For example, enter the query "draw a dog and rotate it 90 degrees".
    query = input('\nuser query: ')
    # Append the user query to the chat history.
    messages.append({'role': 'user', 'content': query})
    response = []
    response_plain_text = ''
    print('bot response:')
    for response in bot.run(messages=messages):
        # Streaming output.
        response_plain_text = typewriter_print(response, response_plain_text)
    # Append the bot responses to the chat history.
    messages.extend(response)
```

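In addition to using built-in agent implementations such as `class Assistant`, you can also develop your own agent implementation by inheriting from `class Agent`.

The framework also provides a convenient GUI interface, supporting the rapid deployment of Gradio Demos for Agents.
For example, in the case above, you can quickly launch a Gradio Demo using the following code:

```py
from qwen_agent.gui import WebUI
WebUI(bot).run()  # `bot` is the agent defined in the code above; the definition is not repeated here to save space.
```

Now you can chat with the Agent in the web UI. Please refer to the [examples](./examples) directory for more usage examples.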

# FAQ
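## How to Use the Code Interpreter Tool?

We provide a code interpreter implementation based on local Docker containers. You can enable the built-in `code_interpreter` tool for your agent, allowing it to autonomously write code according to specific scenarios, execute it within an isolated sandbox environment, and return the execution results.

⚠️ **Note**: Before using this tool, please ensure that Docker is installed and running on your local operating system. The time required to build the container image for the first time depends on your network conditions. For Docker installation and setup instructions, please refer to the [official documentation](https://docs.docker.com/desktop/).

## How to Use MCP?

You can select the required tools on the open-source [MCP server website](https://github.com/modelcontextprotocol/servers) and configure the relevant environment.

MCP invocation format in Qwen-Agent: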
```
{
    "mcpServers": {
        "memory": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-memory"]
        },
        "filesystem": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/files"]
        },
        "sqlite" : {
            "command": "uvx",
            "args": [
                "mcp-server-sqlite",
                "--db-path",
                "test.db"
            ]
        }
    }
}
```
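For more details, refer to the [MCP usage example](./examples/assistant_mcp_sqlite_bot.py).

The additional dependencies required to run this example are as follows:
```
# Node.js (download and install the latest version from the Node.js official website, https://nodejs.org/)
# uv 0.4.18 or higher (check with uv --version)
# Git (check with git --version)
# SQLite (check with sqlite3 --version)

# For macOS users, you can install these components using Homebrew:
brew install uv git sqlite3

# For Windows users, you can install these components using winget:
winget install --id=astral-sh.uv -e
winget install git.git sqlite.sqlite
```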

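## Do you have function calling (aka tool calling)?

Yes. The LLM classes provide [function calling](https://github.com/QwenLM/Qwen-Agent/blob/main/examples/function_calling.py). Additionally, some Agent classes, such as FnCallAgent and ReActChat, are also built upon the function calling capability.

The current default tool calling template natively supports **Parallel Function Calls**.

## How to pass LLM parameters to the Agent?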
```py
llm_cfg = {
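    # The model name being used:
    'model': 'qwen3-32b',
    # The model service being used:
    'model_type': 'qwen_dashscope',
    # If 'api_key' is not set here, it will default to reading the `DASHSCOPE_API_KEY` environment variable:
    'api_key': 'YOUR_DASHSCOPE_API_KEY',

    # Using an OpenAI API compatible model service, such as vLLM or Ollama:
    # 'model': 'qwen3-32b',
    # 'model_server': 'http://localhost:8000/v1',  # base_url, also known as api_base
    # 'api_key': 'EMPTY',

    # (Optional) LLM hyperparameters:
    'generate_cfg': {
        # This parameter affects the tool-call parsing logic. Default is False:
          # Set to True when the reasoning is embedded in the content, e.g. `<think>this is the thought</think>this is the answer`
          # Set to False when the response carries separate reasoning_content and content fields
        # 'thought_in_content': True,

        # Tool-call template: default is nous (recommended for qwen3):
        # 'fncall_prompt_type': 'nous',

        # Maximum input length; messages are truncated if they exceed this length. Adjust according to the model API:
        # 'max_input_tokens': 58000,

        # Parameters passed directly to the model API, such as top_p, enable_thinking, etc., according to the API specification:
        # 'top_p': 0.8,

        # Use the API's native tool-call interface:
        # 'use_raw_api': True,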
    }
}
```

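## How to do question-answering over super-long documents?

We have released [a fast RAG solution](https://github.com/QwenLM/Qwen-Agent/blob/main/examples/assistant_rag.py), as well as [a more expensive but more accurate agent](https://github.com/QwenLM/Qwen-Agent/blob/main/examples/parallel_doc_qa.py), for doing question-answering over super-long documents. They outperform native long-context models on two challenging benchmarks while being more efficient, and perform perfectly in the single-needle "needle-in-the-haystack" stress test involving 1M-token contexts. See the [blog](https://qwenlm.github.io/blog/qwen-agent-2405/) for technical details.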

<p align="center">
    <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/assets/qwen_agent/qwen-agent-2405-blog-long-context-results.png" width="400"/>
</p>

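# Application: BrowserQwen

BrowserQwen is a browser assistant built upon Qwen-Agent. Please refer to its [documentation](browser_qwen_cn.md) for details.

# Disclaimer

The Docker container-based code interpreter mounts only the specified working directory and implements basic sandbox isolation, but it should still be used with caution in production environments.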


================================================
FILE: benchmark/code_interpreter/README.md
================================================
# Code Interpreter Benchmark

## Introduction
To assess the ability of LLMs to use the Python Code Interpreter for mathematical problem solving, data visualization, and other general-purpose tasks such as file handling and web scraping, we have created and open-sourced a benchmark designed specifically to evaluate these capabilities.

### Metrics
The metrics are divided into two parts: code executability and code correctness.
- Code executability: whether the LLM-generated code can be executed without errors.
- Code correctness: whether the LLM-generated code produces the correct result when run.

### Domain
When evaluating the accuracy of the code execution results (code correctness), we further divide the tasks into two domains: `Math` and `Visualization`.
For code executability, we calculate the executable rate of the generated code on `General problem-solving` tasks.
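
As a rough illustration of how the two kinds of metrics differ, the sketch below computes both from a list of per-sample records. The record fields `executed` and `correct` are hypothetical names chosen for this example and are not the benchmark's actual data schema.

```python
from typing import Dict, List


def executable_rate(records: List[Dict]) -> float:
    """Percentage of samples whose generated code ran without raising an error."""
    if not records:
        return 0.0
    return 100.0 * sum(1 for r in records if r['executed']) / len(records)


def accuracy(records: List[Dict]) -> float:
    """Percentage of samples whose execution result was judged correct."""
    if not records:
        return 0.0
    return 100.0 * sum(1 for r in records if r['correct']) / len(records)


# Hypothetical results for four samples.
records = [
    {'executed': True, 'correct': True},
    {'executed': True, 'correct': False},
    {'executed': False, 'correct': False},
    {'executed': True, 'correct': True},
]

print(f'{executable_rate(records):.1f}')  # 75.0
print(f'{accuracy(records):.1f}')         # 50.0
```

Code that runs is not necessarily correct, which is why the two numbers are reported separately in the tables below.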

## Results
- Qwen-7B-Chat refers to the version updated after September 25, 2023.
- The code correctness judger model for `Visualization` changed from `Qwen-vl-chat` to `gpt-4-vision-preview` in version 20231206.

<table>
    <tr>
        <th colspan="5" align="center">In-house Code Interpreter Benchmark (Version 20231206)</th>
    </tr>
    <tr>
        <th rowspan="2" align="center">Model</th>
        <th colspan="3" align="center">Accuracy of Code Execution Results (%)</th>
        <th colspan="1" align="center">Executable Rate of Code (%)</th>
    </tr>
    <tr>
        <th align="center">Math↑</th><th align="center">Visualization-Hard↑</th><th align="center">Visualization-Easy↑</th><th align="center">General↑</th>
    </tr>
    <tr>
        <td>GPT-4</td>
        <td align="center">82.8</td>
        <td align="center">66.7</td>
        <td align="center">60.8</td>
        <td align="center">82.8</td>
    </tr>
    <tr>
        <td>GPT-3.5</td>
        <td align="center">47.3</td>
        <td align="center">33.3</td>
        <td align="center">55.7</td>
        <td align="center">74.1</td>
    </tr>
    <tr>
        <td>LLaMA2-13B-Chat</td>
        <td align="center">8.3</td>
        <td align="center">1.2</td>
        <td align="center">15.2</td>
        <td align="center">48.3</td>
    </tr>
    <tr>
        <td>CodeLLaMA-13B-Instruct</td>
        <td align="center">28.2</td>
        <td align="center">15.5</td>
        <td align="center">21.5</td>
        <td align="center">74.1</td>
    </tr>
    <tr>
        <td>InternLM-20B-Chat</td>
        <td align="center">34.6</td>
        <td align="center">10.7</td>
        <td align="center">24.1</td>
        <td align="center">65.5</td>
    </tr>
    <tr>
        <td>ChatGLM3-6B</td>
        <td align="center">54.2</td>
        <td align="center">4.8</td>
        <td align="center">15.2</td>
        <td align="center">62.1</td>
    </tr>
    <tr>
        <td>Qwen-1.8B-Chat</td>
        <td align="center">25.6</td>
        <td align="center">21.4</td>
        <td align="center">22.8</td>
        <td align="center">65.5</td>
    </tr>
    <tr>
        <td>Qwen-7B-Chat</td>
        <td align="center">41.9</td>
        <td align="center">23.8</td>
        <td align="center">38.0</td>
        <td align="center">67.2</td>
    </tr>
    <tr>
        <td>Qwen-14B-Chat</td>
        <td align="center">58.4</td>
        <td align="center">31.0</td>
        <td align="center">45.6</td>
        <td align="center">65.5</td>
    </tr>
    <tr>
        <td>Qwen-72B-Chat</td>
        <td align="center">72.7</td>
        <td align="center">41.7</td>
        <td align="center">43.0</td>
        <td align="center">82.8</td>
    </tr>
</table>

For reference, we also provide the results obtained with `Qwen-vl-plus` as the code correctness judger model for the `Visualization` task.

<table>
    <tr>
        <th colspan="3" align="center">Code Correctness Judger Model = Qwen-vl-plus</th>
    </tr>
    <tr>
        <th rowspan="2" align="center">Model</th>
        <th colspan="2" align="center">Accuracy of Code Execution Results (%)</th>
    </tr>
    <tr>
        <th align="center">Visualization-Hard↑</th>
        <th align="center">Visualization-Easy↑</th>
    </tr>
    <tr>
        <td>LLaMA2-13B-Chat</td>
        <td align="center">2.4</td>
        <td align="center">17.7</td>
    </tr>
    <tr>
        <td>CodeLLaMA-13B-Instruct</td>
        <td align="center">17.9</td>
        <td align="center">34.2</td>
    </tr>
    <tr>
        <td>InternLM-20B-Chat</td>
        <td align="center">9.5</td>
        <td align="center">31.7</td>
    </tr>
    <tr>
        <td>ChatGLM3-6B</td>
        <td align="center">10.7</td>
        <td align="center">29.1</td>
    </tr>
    <tr>
        <td>Qwen-1.8B-Chat</td>
        <td align="center">32.1</td>
        <td align="center">32.9</td>
    </tr>
    <tr>
        <td>Qwen-7B-Chat</td>
        <td align="center">26.2</td>
        <td align="center">39.2</td>
    </tr>
    <tr>
        <td>Qwen-14B-Chat</td>
        <td align="center">36.9</td>
        <td align="center">41.8</td>
    </tr>
    <tr>
        <td>Qwen-72B-Chat</td>
        <td align="center">38.1</td>
        <td align="center">38.0</td>
    </tr>
</table>



## Usage

### Installation

```shell
git clone https://github.com/QwenLM/Qwen-Agent.git
cd Qwen-Agent/benchmark/code_interpreter
pip install -r requirements.txt
```

### Dataset Download
```shell
cd Qwen-Agent/benchmark/code_interpreter  # Skip this step if you are already in this directory.
wget https://qianwen-res.oss-cn-beijing.aliyuncs.com/assets/qwen_agent/benchmark_code_interpreter_data.zip
unzip benchmark_code_interpreter_data.zip
mkdir eval_data
mv eval_code_interpreter_v1.jsonl eval_data/
```

### Evaluation
To reproduce the full benchmark results, run the following script:

```Shell
python inference_and_execute.py --model {model_name}
```

{model_name}:
- qwen-1.8b-chat
- qwen-7b-chat
- qwen-14b-chat
- qwen-72b-chat
- llama-2-7b-chat
- llama-2-13b-chat
- codellama-7b-instruct
- codellama-13b-instruct
- internlm-7b-chat-1.1
- internlm-20b-chat

The benchmark will run the test cases and generate the performance results. The results will be saved in the `output_data` directory.

**Notes**:
Please install the `simhei.ttf` font so that matplotlib renders text correctly when evaluating the visualization task. Prepare `simhei.ttf` (it can be found on any Windows PC) and then run the following code snippet:
```python
import os
import matplotlib
target_font_path = os.path.join(
    os.path.abspath(
        os.path.join(matplotlib.matplotlib_fname(), os.path.pardir)),
        'fonts', 'ttf', 'simhei.ttf')
os.system(f'cp simhei.ttf {target_font_path}')
font_list_cache = os.path.join(matplotlib.get_cachedir(), 'fontlist-*.json')
os.system(f'rm -f {font_list_cache}')
```

#### Code Executable Rate
```Shell
python inference_and_execute.py --task {task_name} --model {model_name}
```

{task_name}:
- `general`: General problem-solving task


#### Code Correctness Rate
```Shell
python inference_and_execute.py --task {task_name} --model {model_name}
```

{task_name}:
- `visualization`: Visualization task
- `gsm8k`: Math task


## Configuration
The `inference_and_execute.py` script accepts the following options:

- `--model`: The model to test, which can be one of `qwen-72b-chat`, `qwen-14b-chat`, `qwen-7b-chat`, `qwen-1.8b-chat`, `llama-2-7b-chat`, `llama-2-13b-chat`, `codellama-7b-instruct`, `codellama-13b-instruct`, `internlm-7b-chat-1.1`, `internlm-20b-chat`.
- `--task`: The test task which can be one of `all`, `visualization`, `general`, `gsm8k`.
- `--output-path`: The path for saving evaluation result.
- `--input-path`: The path for placing evaluation data.
- `--output-fname`: The file name for evaluation result.
- `--input-fname`: The file name for evaluation data.
- `--force`: Force generation and will overwrite the cached results.
- `--eval-only`: Only calculate evaluation metrics without re-inference.
- `--eval-code-exec-only`: Only evaluate the code executable rate.
- `--gen-exec-only`: Only generate and execute code without calculating evaluation metrics.
- `--gen-only`: Only generate code, without executing it or calculating evaluation metrics.
- `--vis-judger`: The model used to judge result correctness for the `Visualization` task, which can be one of `gpt-4-vision-preview`, `qwen-vl-chat`, `qwen-vl-plus`. It is set to `gpt-4-vision-preview` by default in version 20231206, and `Qwen-vl-chat` has been deprecated.
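
To show how a few of these options combine on the command line, here is a minimal `argparse` sketch. It is not the parser from `inference_and_execute.py`: only a subset of the documented flags is reproduced, and the defaults shown are assumptions made for the illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a subset of the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description='Sketch of the documented benchmark options')
    parser.add_argument('--model', type=str, default='qwen-14b-chat')  # default is an assumption
    parser.add_argument('--task', type=str, default='all',  # default is an assumption
                        choices=['all', 'visualization', 'general', 'gsm8k'])
    parser.add_argument('--force', action='store_true')
    parser.add_argument('--eval-only', action='store_true')
    parser.add_argument('--gen-only', action='store_true')
    return parser


# Equivalent to passing: --task gsm8k --model qwen-7b-chat --eval-only
args = build_parser().parse_args(['--task', 'gsm8k', '--model', 'qwen-7b-chat', '--eval-only'])
print(args.task, args.model, args.eval_only, args.force)
```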


================================================
FILE: benchmark/code_interpreter/code_interpreter.py
================================================
import base64
import io
import json
import logging
import os
import queue
import re
import subprocess
import sys
import time
import traceback
import uuid

import matplotlib
import PIL.Image
from jupyter_client import BlockingKernelClient
from utils.code_utils import extract_code

WORK_DIR = os.getenv('CODE_INTERPRETER_WORK_DIR', '/tmp/workspace')

LAUNCH_KERNEL_PY = """
from ipykernel import kernelapp as app
app.launch_new_instance()
"""

_KERNEL_CLIENTS = {}


# Run this fix before jupyter starts if matplotlib cannot render CJK fonts.
# And we need to additionally run the following lines in the jupyter notebook.
#   ```python
#   import matplotlib.pyplot as plt
#   plt.rcParams['font.sans-serif'] = ['SimHei']
#   plt.rcParams['axes.unicode_minus'] = False
#   ```
def fix_matplotlib_cjk_font_issue():
    local_ttf = os.path.join(os.path.abspath(os.path.join(matplotlib.matplotlib_fname(), os.path.pardir)), 'fonts',
                             'ttf', 'simhei.ttf')
    if not os.path.exists(local_ttf):
        logging.warning(
            f'Missing font file `{local_ttf}` for matplotlib. It may cause some error when using matplotlib.')


def start_kernel(pid):
    """Launch an IPython kernel subprocess for `pid` and return a connected BlockingKernelClient."""
    fix_matplotlib_cjk_font_issue()

    connection_file = os.path.join(WORK_DIR, f'kernel_connection_file_{pid}.json')
    launch_kernel_script = os.path.join(WORK_DIR, f'launch_kernel_{pid}.py')
    for f in [connection_file, launch_kernel_script]:
        if os.path.exists(f):
            logging.warning(f'{f} already exists')
            os.remove(f)

    os.makedirs(WORK_DIR, exist_ok=True)

    with open(launch_kernel_script, 'w') as fout:
        fout.write(LAUNCH_KERNEL_PY)

    kernel_process = subprocess.Popen([
        sys.executable,
        launch_kernel_script,
        '--IPKernelApp.connection_file',
        connection_file,
        '--matplotlib=inline',
        '--quiet',
    ],
                                      cwd=WORK_DIR)
    logging.info(f"INFO: kernel process's PID = {kernel_process.pid}")

    # Wait for kernel connection file to be written
    while True:
        if not os.path.isfile(connection_file):
            time.sleep(0.1)
        else:
            # Keep looping if JSON parsing fails, file may be partially written
            try:
                with open(connection_file, 'r') as fp:
                    json.load(fp)
                break
            except json.JSONDecodeError:
                time.sleep(0.1)  # Avoid busy-waiting while the file is still being written.

    # Client
    kc = BlockingKernelClient(connection_file=connection_file)
    kc.load_connection_file()
    kc.start_channels()
    kc.wait_for_ready()
    return kc


def escape_ansi(line):
    """Remove ANSI escape sequences (such as terminal color codes) from `line`."""
    ansi_escape = re.compile(r'(?:\x1B[@-_]|[\x80-\x9F])[0-?]*[ -/]*[@-~]')
    return ansi_escape.sub('', line)


def publish_image_to_local(image_base64: str):
    """Decode a base64-encoded image, save it as a PNG under WORK_DIR, and return the file path."""
    image_file = str(uuid.uuid4()) + '.png'
    local_image_file = os.path.join(WORK_DIR, image_file)

    png_bytes = base64.b64decode(image_base64)
    assert isinstance(png_bytes, bytes)
    bytes_io = io.BytesIO(png_bytes)
    PIL.Image.open(bytes_io).save(local_image_file, 'png')

    return local_image_file


START_CODE = """
import signal
def _m6_code_interpreter_timeout_handler(signum, frame):
    raise TimeoutError("M6_CODE_INTERPRETER_TIMEOUT")
signal.signal(signal.SIGALRM, _m6_code_interpreter_timeout_handler)

def input(*args, **kwargs):
    raise NotImplementedError('Python input() function is disabled.')

import os
if 'upload_file' not in os.getcwd():
    os.chdir("./upload_file/")

import math
import re
import json

import seaborn as sns
sns.set_theme()

import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

import numpy as np
import pandas as pd

from sympy import Eq, symbols, solve
"""


def code_interpreter(action_input_list: list, timeout=30, clear=False):
    code = ''
    for action_input in action_input_list:
        code += (extract_code(action_input) + '\n')
    fixed_code = []
    for line in code.split('\n'):
        fixed_code.append(line)
        if line.startswith('sns.set_theme('):
            fixed_code.append('plt.rcParams["font.sans-serif"] = ["SimHei"]')
            fixed_code.append('plt.rcParams["axes.unicode_minus"] = False')
    fixed_code = '\n'.join(fixed_code)
    if 'def solution()' in fixed_code:
        fixed_code += '\nsolution()'

    return _code_interpreter(fixed_code, timeout, clear)


def _code_interpreter(code: str, timeout, clear=False):
    if not code.strip():
        return ''
    if timeout:
        code = f'signal.alarm({timeout})\n{code}'
    if clear:
        code = "get_ipython().run_line_magic('reset', '-f')\n" + START_CODE + code

    pid = os.getpid()
    if pid not in _KERNEL_CLIENTS:
        _KERNEL_CLIENTS[pid] = start_kernel(pid)
        _code_interpreter(START_CODE, timeout=None)
    kc = _KERNEL_CLIENTS[pid]
    kc.wait_for_ready()
    kc.execute(code)
    result = ''
    image_idx = 0
    while True:
        text = ''
        image = ''
        finished = False
        msg_type = 'error'
        try:
            msg = kc.get_iopub_msg()
            msg_type = msg['msg_type']
            if msg_type == 'status':
                if msg['content'].get('execution_state') == 'idle':
                    finished = True
            elif msg_type == 'execute_result':
                text = msg['content']['data'].get('text/plain', '')
                if 'image/png' in msg['content']['data']:
                    image_b64 = msg['content']['data']['image/png']
                    image_url = publish_image_to_local(image_b64)
                    image_idx += 1
                    image = '![fig-%03d](%s)' % (image_idx, image_url)
            elif msg_type == 'display_data':
                if 'image/png' in msg['content']['data']:
                    image_b64 = msg['content']['data']['image/png']
                    image_url = publish_image_to_local(image_b64)
                    image_idx += 1
                    image = '![fig-%03d](%s)' % (image_idx, image_url)
                else:
                    text = msg['content']['data'].get('text/plain', '')
            elif msg_type == 'stream':
                msg_type = msg['content']['name']  # stdout, stderr
                text = msg['content']['text']
            elif msg_type == 'error':
                text = escape_ansi('\n'.join(msg['content']['traceback']))
                if 'M6_CODE_INTERPRETER_TIMEOUT' in text:
                    text = f'Timeout. No response after {timeout} seconds.'
        except queue.Empty:
            text = f'Timeout. No response after {timeout} seconds.'
            finished = True
        except Exception:
            text = 'The code interpreter encountered an unexpected error.'
            logging.warning(''.join(traceback.format_exception(*sys.exc_info())))
            finished = True
        if text:
            result += f'\n\n{msg_type}:\n\n```\n{text}\n```'
        if image:
            result += f'\n\n{image}'
        if finished:
            break
    result = result.lstrip('\n')
    if timeout:
        _code_interpreter('signal.alarm(0)', timeout=None)
    return result
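

# --- Illustrative sketch (not part of the original module) -------------------
# The timeout handling above relies on SIGALRM: START_CODE installs a handler
# inside the kernel, and _code_interpreter prepends `signal.alarm(timeout)` to
# the submitted code. The helper below is a hypothetical, standalone version of
# that idea for a callable running in the current process. It assumes a Unix
# platform and the main thread, since SIGALRM is not available elsewhere. The
# names `_sketch_run_with_alarm` and 'SKETCH_TIMEOUT' are made up for this sketch.
import signal
import time


def _sketch_run_with_alarm(func, seconds):
    """Run func() and return its result, or 'timeout' if it exceeds `seconds` (an int)."""

    def _handler(signum, frame):
        raise TimeoutError('SKETCH_TIMEOUT')

    previous = signal.signal(signal.SIGALRM, _handler)
    signal.alarm(seconds)
    try:
        return func()
    except TimeoutError:
        return 'timeout'
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the previous handler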


def get_multiline_input(hint):
    print(hint)
    print('// Press ENTER to make a new line. Press CTRL-D to end input.')
    lines = []
    while True:
        try:
            line = input()
        except EOFError:  # CTRL-D
            break
        lines.append(line)
    print('// Input received.')
    if lines:
        return '\n'.join(lines)
    else:
        return ''


if __name__ == '__main__':
    while True:
        print(code_interpreter([get_multiline_input('Enter python code:')]))


================================================
FILE: benchmark/code_interpreter/config.py
================================================
from parser import InternLMReActParser, ReActParser

from models import LLM, Qwen, QwenDashscopeVLModel, QwenVL
from prompt import InternLMReAct, LlamaReAct, QwenReAct

react_prompt_map = {
    'qwen': QwenReAct,
    'llama': LlamaReAct,
    'internlm': InternLMReAct,
}

react_parser_map = {
    'qwen': ReActParser,
    'llama': ReActParser,
    'internlm': InternLMReActParser,
}

model_map = {'qwen': Qwen, 'llama': LLM, 'internlm': LLM, 'qwen-vl-chat': QwenVL}

model_type_map = {
    'qwen-72b-chat': 'qwen',
    'qwen-14b-chat': 'qwen',
    'qwen-1.8b-chat': 'qwen',
    'qwen-7b-chat': 'qwen',
    'llama-2-7b-chat': 'llama',
    'llama-2-13b-chat': 'llama',
    'codellama-7b-instruct': 'llama',
    'codellama-13b-instruct': 'llama',
    'internlm-7b-chat-1.1': 'internlm',
    'internlm-20b-chat': 'internlm',
    'qwen-vl-chat': 'qwen-vl-chat',
}

model_path_map = {
    'qwen-72b-chat': 'Qwen/Qwen-72B-Chat',
    'qwen-14b-chat': 'Qwen/Qwen-14B-Chat',
    'qwen-7b-chat': 'Qwen/Qwen-7B-Chat',
    'qwen-1.8b-chat': 'Qwen/Qwen-1_8B-Chat',
    'llama-2-7b-chat': 'meta-llama/Llama-2-7b-chat-hf',
    'llama-2-13b-chat': 'meta-llama/Llama-2-13b-chat-hf',
    'codellama-7b-instruct': 'codellama/CodeLlama-7b-Instruct-hf',
    'codellama-13b-instruct': 'codellama/CodeLlama-13b-Instruct-hf',
    'internlm-7b-chat-1.1': 'internlm/internlm-chat-7b-v1_1',
    'internlm-20b-chat': 'internlm/internlm-chat-20b',
    'qwen-vl-chat': 'Qwen/Qwen-VL-Chat',
}


def get_react_prompt(model_name, query, lang, upload_fname_list):
    react_prompt_cls = react_prompt_map.get(model_type_map[model_name], QwenReAct)
    return react_prompt_cls(query, lang, upload_fname_list)


def get_react_parser(model_name):
    react_parser_cls = react_parser_map.get(model_type_map[model_name], ReActParser)
    return react_parser_cls()


def get_model(model_name):
    if model_name in ['qwen-vl-plus']:
        return QwenDashscopeVLModel(model=model_name)
    model_path = model_path_map.get(model_name, None)
    model_cls = model_map.get(model_type_map[model_name], LLM)
    return model_cls(model_path)


================================================
FILE: benchmark/code_interpreter/inference_and_execute.py
================================================
import argparse
import json
import logging
import os
from parser import ReActParser

import prettytable
import tqdm
from code_interpreter import code_interpreter
from config import get_model, get_react_parser, get_react_prompt, model_path_map
from datasets import load_dataset
from metrics.code_execution import eval_code_execution_rate
from metrics.gsm8k import eval_gsm8k_acc, is_correct
from metrics.visualization import eval_visualization_acc
from utils.code_utils import replace_upload_fname
from utils.data_utils import load_jsonl

logging.basicConfig(
    format='%(asctime)s - %(levelname)s - %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
    level=logging.INFO,
)

WORK_DIR = os.getenv('CODE_INTERPRETER_WORK_DIR', '/tmp/workspace')
os.makedirs(WORK_DIR, exist_ok=True)
os.system(f'cp -r upload_file_clean {WORK_DIR}/upload_file')
os.system('cp -r upload_file_clean ./upload_file')

global_eval_result = {
    'code_executability': {
        'math': None,
        'visualization': None,
        'general': None,
    },
    'code_correctness': {
        'math': None,
        'visualization-hard': None,
        'visualization-easy': None,
    }
}


def llm_with_plugin(args, query, item=None, exec_limit=3):
    exec_count = 0

    # Build ReAct prompt
    upload_fname_list = item['input_file_path'] if item and 'input_file_path' in item else []
    lang = item['lang'] if item and 'lang' in item else 'en'
    react_prompt_obj = get_react_prompt(args.model, query, lang, upload_fname_list)
    planning_prompt = react_prompt_obj.build_prompt()

    # Execute the code when providing the first action in the query
    if '<|im_start|>' in query:
        _, prepend_code, __ = ReActParser().parse_latest_plugin_call(query)
        prepend_code = replace_upload_fname(prepend_code, upload_fname_list)
        call_tool(_, [prepend_code], clear=(exec_count == 0))
        exec_count += 1
        exec_limit += 1

    # Inference and execute
    text = ''
    while exec_count < exec_limit:
        stop_words_list = react_prompt_obj.get_stop_words_list()
        output = text_completion(args.llm, planning_prompt + text, stop_words=stop_words_list)

        if args.gen_only:
            text += output
            break

        react_parser = get_react_parser(args.model)
        action, action_input, output = react_parser.parse_latest_plugin_call(output)
        if action:
            action_input = replace_upload_fname(action_input, upload_fname_list)
            observation = call_tool(action, [action_input], clear=(exec_count == 0))
            output += react_prompt_obj.build_observation(observation)
            text += output
            exec_count += 1
            if 'error:' in observation or 'Traceback' in observation:
                break
        else:
            text += output
            break
    return text


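def text_completion(llm, input_text, stop_words=None):
    # Avoid a mutable default argument; fall back to an empty stop-word list.
    stop_words = stop_words if stop_words is not None else []
    logging.info('Generating'.center(60, '='))
    logging.info('Input'.center(60, '-'))
    logging.info(input_text)

    output = llm.generate(input_text, stop_words)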

    logging.info('Output'.center(60, '-'))
    logging.info(output)
    return output


def call_tool(plugin_name, plugin_args_list, clear=False):
    # Relax constraints on plugin name.
    logging.info('Call code interpreter'.center(60, '='))
    obs = code_interpreter(plugin_args_list, clear=clear)
    logging.info(obs)
    return obs


def process_code_interpreter(item, writer):
    query = item['query']
    exec_limit = 3 if 'visualization' in item['tags'] else 1
    response = llm_with_plugin(args=args, query=query, item=item, exec_limit=exec_limit)
    item['gen'] = response

    writer.write(json.dumps(item, ensure_ascii=False) + '\n')
    writer.flush()


def process_gsm8k(doc, writer):
    context = doc['question']
    completion = llm_with_plugin(args=args, query=context)
    acc = is_correct(completion, doc['answer'])
    doc['completion'] = completion
    doc['acc'] = acc

    writer.write(json.dumps(doc, ensure_ascii=False) + '\n')
    writer.flush()


def sequential_processing(args, data_list, process_func, writer):
    for item in tqdm.tqdm(data_list):
        process_func(item, writer)


process_func_map = {'gsm8k': process_gsm8k, 'visualization': process_code_interpreter}


def gather_eval_result(model_name):
    for metric in global_eval_result:
        logging.info(metric)
        table = prettytable.PrettyTable()
        table.field_names = ['model'] + list(global_eval_result[metric].keys())
        row_data = [model_name]
        for item in global_eval_result[metric].values():
            item = str(item) if not item else str(round(item, 2))
            row_data.append(item)
        table.add_row(row_data)
        logging.info('\n' + str(table))


def eval_metrics(args, test_set, full_output_fname):
    # metrics
    assert os.path.exists(full_output_fname), f'Not Found File {full_output_fname}.'
    inference_res = load_jsonl(full_output_fname)
    assert len(inference_res) == len(test_set), f'There are still {len(test_set)-len(inference_res)} cases left.'

    abs_output_fname = os.path.join(os.path.dirname(os.path.abspath(__file__)), full_output_fname)
    if args.task == 'gsm8k':
        math_code_correctness = eval_gsm8k_acc(abs_output_fname)
        global_eval_result['code_correctness'].update(math_code_correctness)
    else:
        code_executability = eval_code_execution_rate(abs_output_fname, args.task, args.model)
        global_eval_result['code_executability'].update(code_executability)
        if args.task in ['all_ci', 'visualization'] and not args.eval_code_exec_only:
            visualization_code_correctness = eval_visualization_acc(abs_output_fname, args.model, args.vis_judger)
            global_eval_result['code_correctness'].update(visualization_code_correctness)


def main(args):
    current_dir = os.getcwd()
    os.makedirs(args.output_path, exist_ok=True)
    full_output_fname = os.path.join(args.output_path, (args.output_fname or f'{args.task}_{args.model}_res.jsonl'))

    if not os.path.exists(full_output_fname):
        with open(full_output_fname, 'w'):
            logging.info(f'Create file {full_output_fname} done.')

    # build data
    if args.task == 'gsm8k':
        dataset = load_dataset('gsm8k', 'main')
        test_set = dataset['test']
    else:
        eval_data_path = os.path.join(args.input_path, args.input_fname)
        test_set = [item for item in load_jsonl(eval_data_path) if args.task in item['tags']]
    logging.info(f'Test set: {len(test_set)}')

    if args.eval_only:
        eval_metrics(args, test_set, full_output_fname)
    else:
        key = 'question' if args.task == 'gsm8k' else 'query'
        cache_question = [item[key] for item in load_jsonl(full_output_fname)] if not args.force else []
        data_list = [item for item in test_set if item[key] not in cache_question]
        logging.info(f'Left cases: {len(data_list)}')

        # inference
        writer_mode = 'w' if args.force else 'a'
        # Use a context manager so the output file is closed even if inference fails
        with open(full_output_fname, writer_mode, encoding='utf-8') as f_output:
            process_func = process_func_map.get(args.task, process_code_interpreter)
            sequential_processing(args, data_list, process_func, f_output)

        # evaluate
        if not args.gen_exec_only:
            eval_metrics(args, test_set, full_output_fname)

    os.chdir(current_dir)
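

# --- Illustrative sketch (not part of the original script) -------------------
# main() resumes an interrupted run by skipping every item whose question is
# already present in the output file. The hypothetical helper below isolates
# that filtering step. It takes raw JSONL lines instead of calling load_jsonl,
# so it can be tried without the benchmark's utilities or data files. The name
# `_sketch_pending_items` is an assumption made for illustration only.
import json


def _sketch_pending_items(test_set, cached_lines, key='query'):
    """Return the items of test_set whose `key` value is absent from the cached JSONL lines."""
    done = {json.loads(line)[key] for line in cached_lines if line.strip()}
    return [item for item in test_set if item[key] not in done]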


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--model', type=str, default='qwen-14b-chat', choices=list(model_path_map.keys()))
    parser.add_argument('--task', type=str, default='all', choices=['all', 'gsm8k', 'visualization', 'general'])
    parser.add_argument('--output-path', type=str, default='output_data')
    parser.add_argument('--input-path', type=str, default='eval_data')
    parser.add_argument('-o', '--output-fname', type=str, default='')
    parser.add_argument('-i', '--input-fname', type=str, default='eval_code_interpreter_v1.jsonl')
    parser.add_argument('-f', '--force', action='store_true', default=False)
    parser.add_argument('--eval-only', action='store_true', default=False)
    parser.add_argument('--eval-code-exec-only', action='store_true', default=False)
    parser.add_argument('--gen-exec-only', action='store_true', default=False)
    parser.add_argument('--gen-only', action='store_true', default=False)
    parser.add_argument('--vis-judger',
                        type=str,
                        default='gpt-4-vision-preview',
                        choices=['gpt-4-vision-preview', 'qwen-vl-chat', 'qwen-vl-plus'])
    args = parser.parse_args()
    return args


if __name__ == '__main__':
    args = parse_args()
    if not args.eval_only:
        args.llm = get_model(args.model)
        logging.info(f'Init {args.model} done.')

    if args.task == 'all':
        for key in ['gsm8k', 'visualization', 'general']:
            args.task = key
            main(args)
    else:
        main(args)
    gather_eval_result(args.model)


================================================
FILE: benchmark/code_interpreter/metrics/__init__.py
================================================


================================================
FILE: benchmark/code_interpreter/metrics/code_execution.py
================================================
import logging
import os

import func_timeout
from config import get_react_parser
from func_timeout import func_set_timeout
from utils.code_utils import extract_code, replace_upload_fname
from utils.data_utils import load_jsonl, save_jsonl

pre_load = """
import os
if 'upload_file' not in os.getcwd():
    os.chdir("./upload_file/")

import seaborn as sns

import matplotlib
# matplotlib.use('Agg')
import matplotlib.pyplot as plt
plt.ion()

import numpy as np
import pandas as pd
from sympy import Eq, symbols, solve
import re
import json
import math
"""

tags_config = {
    'visualization': {
        'timelimit': True,
        'extract_first_code': True,
    },
    'math': {
        'timelimit': True,
        'extract_first_code': False,
    },
    'general': {
        'timelimit': False,
        'extract_first_code': True,
    }
}

code_executability = {'math': None, 'visualization': None, 'general': None}


@func_set_timeout(10)
def exec_limit_time(text):
    exec(text, locals())


def exec_code(text, timelimit=False):
    if timelimit:
        exec_limit_time(text)
    else:
        exec(text, locals())


def postprocess_code(gen_code, line):
    if '<|im_start|>' in line['query']:
        first_action_code = get_action_input_code(line['query'])
        gen_code = first_action_code + gen_code

    upload_fname_list = line['input_file_path'] if line and 'input_file_path' in line else []
    gen_code = replace_upload_fname(gen_code, upload_fname_list)

    if 'def solution()' in gen_code:
        gen_code += '\nsolution()\n'

    if 'plt.show()' in gen_code:
        gen_code += "\nplt.pause(1)\nplt.close('all')\n"

    if 'sns.' in gen_code and 'plot' in gen_code:
        gen_code += "\nplt.close('all')\n"

    gen_code = pre_load + gen_code
    return gen_code


def get_action_input_code(text, model_name='qwen-14b-chat', extract_first_code=False):
    action_input_list = []
    tmp = text
    react_parser = get_react_parser(model_name)
    while True:
        action_input = react_parser.get_first_action_input(tmp)
        if not action_input:
            break
        action_input_list.append(action_input)
        tmp = tmp.split(action_input)[1]
        if not tmp or extract_first_code:
            break

    code = ''
    for action_input in action_input_list:
        code = code + '# concat\n' + extract_code(action_input) + '\n'
    return code


def eval_code_execution_rate(output_fname,
                             tag='all_ci',
                             model_name='qwen-14b-chat',
                             timelimit=False,
                             extract_first_code=False):
    data_list = load_jsonl(output_fname)
    pip_package = []

    for line_id, line in enumerate(data_list):
        line['idx'] = line_id
        tags_list = line['tags'].split(',')
        if tag not in tags_list:
            continue

        # update args
        for cur_tag in tags_list:
            if cur_tag != 'all_ci':
                timelimit = tags_config[cur_tag]['timelimit']
                extract_first_code = tags_config[cur_tag]['extract_first_code']

        line['executable_code'] = False
        line['missing_code'] = False
        line['code_error_info'] = ''

        # get Action Input code from response
        gen_code = get_action_input_code(line['gen'], model_name=model_name, extract_first_code=extract_first_code)

        if not gen_code:
            line['missing_code'] = True
            line['code'] = ''
            line['code_error_info'] = 'missing code'
            continue

        line['code'] = gen_code
        gen_code = postprocess_code(gen_code, line)

        while True:
            try:
                exec_code(gen_code, timelimit=timelimit)
                line['executable_code'] = True
                break
            except func_timeout.exceptions.FunctionTimedOut as ex:
                line['code_error_info'] = str(ex)
                break
            except (ImportError, ModuleNotFoundError) as ex:
                # Try to recover the missing package name from the error message
                try:
                    package = str(ex).split("'")[1].strip()
                except Exception:
                    package = ''
                if package and package not in pip_package:  # install package, then retry
                    pip_package.append(package)
                    os.system('pip install ' + package)
                    logging.info(f'Automatic installation: {package}')
                else:
                    line['code_error_info'] = str(ex)
                    break
            except Exception as ex:
                line['code_error_info'] = str(ex)
                break

        # double check
        observation = get_react_parser(model_name).get_first_observation(line['gen'])
        if line['executable_code'] and ('error:' in observation):
            logging.warning('The code executes correctly, but it has an error in IPython!')
            logging.warning(f'Code:\n{gen_code}')
            logging.warning(f'IPython error info:\n{observation}')
            logging.info('=' * 60)
        elif not line['executable_code'] and not ('error:' in observation):
            logging.warning('The code has an execution error, but it runs correctly in IPython!')
            logging.warning(f'Code:\n{gen_code}')
            logging.warning(f"Exec error info:\n{line['code_error_info']}")
            logging.warning(f'IPython observation:\n{observation}')
            logging.info('=' * 60)

    # save error data
    error_data_list = [item for item in data_list if not item['executable_code'] or item['missing_code']]
    error_data_output_fname = os.path.splitext(output_fname)[0] + '_exec_error.jsonl'
    save_jsonl(error_data_list, error_data_output_fname)

    log_result(data_list)

    return code_executability


def log_result(data_list, verbose=True):
    if verbose:
        logging.info('*' * 60)
        logging.info('{:^60}'.format('Detail'))
        logging.info('*' * 60)
        for line_id, line in enumerate(data_list):
            logging.info(f'Question {line_id}'.center(60, '='))
            logging.info(line['query'])

            logging.info(f'Generated {line_id}'.center(60, '-'))
            logging.info('\n' + line['gen'])

            logging.info(f'Code {line_id}'.center(60, '-'))
            logging.info('\n' + line['code'])

            logging.info(f'Exec Result {line_id}'.center(60, '-'))
            prefix_info = 'Exec Success' if line['executable_code'] else 'Exec Error: '
            exec_info = prefix_info + line['code_error_info']
            logging.info(exec_info)

    logging.info('=' * 60)
    logging.info('{:^60}'.format('Code Execution Rate'))
    logging.info('=' * 60)
    involved_tags = []
    for line in data_list:
        involved_tags += line['tags'].split(',')
    involved_tags = list(set(involved_tags))

    for key in involved_tags:
        logging.info(f'task: {key}'.center(60, '='))
        key_item_list = [item for item in data_list if key in item['tags']]
        all_count = len(key_item_list)
        missing_code_count = len([item for item in key_item_list if item['missing_code']])
        executable_code_count = len([item for item in key_item_list if item['executable_code']])

        logging.info(f'All Test: {all_count}')
        logging.info(f'Missing Code: {missing_code_count}')
        logging.info(f'Predict Exec Success: {executable_code_count}')
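        # Guard against division by zero when every response is missing code
        available_count = all_count - missing_code_count
        logging.info('Codes available && Execution Rate: {:.2f}'.format(
            executable_code_count / available_count * 100 if available_count else 0.0))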
        logging.info('Execution Rate: {:.2f}'.format(executable_code_count / all_count * 100))
        logging.info('Non-executable rate: {:.2f}'.format(
            (all_count - missing_code_count - executable_code_count) / all_count * 100))
        logging.info('Missing code rate: {:.2f}'.format(missing_code_count / all_count * 100))

        if key != 'all_ci':
            code_executability[key] = executable_code_count / all_count * 100

        if verbose:
            logging.info('Error List: ')
            error_list = [(item['idx'], item['code_error_info']) for item in key_item_list if item['code_error_info']]
            error_list.sort(key=lambda x: x[1])
            for x in error_list:
                logging.info(x)


================================================
FILE: benchmark/code_interpreter/metrics/gsm8k.py
================================================
import logging
import os
import re

import numpy as np
from utils.data_utils import load_jsonl, save_jsonl

INVALID_ANS = '[invalid]'


def extract_answer(completion):

    def _get_last_digit(s):
        _PAT_LAST_DIGIT = re.compile(
            r'(?<=(\s|[\$%#{]))([+-])?(?=(\S))(0|([1-9](\d*|\d{0,2}(,\d{3})*)))?(\.\d*[1-9])?(?=(\s|[.,}]|$))')
        match = list(_PAT_LAST_DIGIT.finditer(s))
        if match:
            last_digit = match[-1].group().replace(',', '').replace('+', '')
        else:
            last_digit = None
            logging.warning(f'No digits found in {s!r}')
        return last_digit

    job_gen = completion.strip('.').replace('\n', '\\n')
    last_digit = _get_last_digit(job_gen)
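    if last_digit:
        # Parse the matched number directly rather than calling eval() on model output
        try:
            return float(last_digit)
        except ValueError:
            return INVALID_ANS
    else:
        return INVALID_ANS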


def is_correct(completion, answer):
    gold = extract_answer(answer)
    assert gold != INVALID_ANS, 'No ground truth answer found in the document.'
    return extract_answer(completion) == gold


def eval_gsm8k_acc(output_fname):
    data_list = load_jsonl(output_fname)
    acc_res = [item['acc'] for item in data_list]
    logging.info('=' * 60)
    logging.info('{:^60}'.format('Math Acc.'))
    logging.info('=' * 60)
    logging.info('Total num={:.2f}'.format(len(acc_res)))
    logging.info('Right num={:.2f}'.format(np.sum(acc_res)))
    logging.info('Zero-shot Acc={:.2f}'.format(np.mean(acc_res) * 100))

    error_data_list = [item for item in data_list if not item['acc']]
    error_data_output_fname = os.path.splitext(output_fname)[0] + '_gsm8k_error.jsonl'
    save_jsonl(error_data_list, error_data_output_fname)

    return {'math': np.mean(acc_res) * 100}


================================================
FILE: benchmark/code_interpreter/metrics/visualization.py
================================================
import base64
import logging
import os
import re

import torch
from config import get_model, get_react_parser
from utils.data_utils import load_jsonl, save_jsonl

torch.manual_seed(1234)

EVAL_VISUAL_PROMPT_ZH = """请判断图片是否与下面的[问题]一致，如果一致则回复“right”，不一致则回复“wrong”。
[问题]：{query}
"""

EVAL_VISUAL_PROMPT_EN = """Please judge whether the image is consistent with the [Question] below, if it is consistent then reply "right", if not then reply "wrong".
[Question]: {query}
"""

visualization_code_correctness = {
    'visualization-hard': None,
    'visualization-easy': None,
}


def encode_image(image_path):
    with open(image_path, 'rb') as image_file:
        a = base64.b64encode(image_file.read()).decode('utf-8')
    return a


def judger_model_inference(judger_model_name, judger_model, imgs=[], prompt=''):
    output = ''
    if judger_model_name == 'gpt-4-vision-preview':
        logging.warning('This is an example of `gpt-4-vision-preview`. '
                        'Please set the API key and use according to your actual situation.')
        from openai import OpenAI
        client = OpenAI()
        content_list = []
        content_list.append({'type': 'text', 'text': prompt})
        input_images = []
        for img in imgs:
            if 'http' not in img:
                base64_image = encode_image(img)
                img = f'data:image/jpeg;base64,{base64_image}'
            # The chat completions API expects the image URL wrapped in an object
            input_images.append({'type': 'image_url', 'image_url': {'url': img}})
        content_list.extend(input_images)
        response = client.chat.completions.create(
            model='gpt-4-vision-preview',
            messages=[{
                'role': 'user',
                'content': content_list,
            }],
            max_tokens=300,
        )
        # Return the reply text; the caller runs a string check on it
        output = response.choices[0].message.content or ''
    elif judger_model_name in ['qwen-vl-plus', 'qwen-vl-chat']:
        inputs = []
        for img in imgs:
            if 'http' not in img and judger_model_name == 'qwen-vl-plus':
                img = 'file://' + img
            inputs.append({'image': img})
        inputs.append({'text': prompt})

        logging.info('Eval'.center(60, '-'))
        logging.info(inputs)
        output = judger_model.generate(inputs)
    logging.info(output)
    logging.info('=' * 60)
    return output


def extract_images(text):
    regex = re.compile(r'!\[fig-(.+)\]\((.+)\)')
    results = re.findall(regex, text)
    images = []
    for res in results:
        assert len(res) == 2
        if os.path.exists(res[1]):
            images.append(res[1])
    return images
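

# --- Illustrative sketch (not part of the original module) -------------------
# extract_images() relies on the markdown image tags that the code interpreter
# appends to its observations, e.g. `![fig-001](/tmp/workspace/<uuid>.png)`.
# The hypothetical helper below applies the same regex to a sample string but
# skips the os.path.exists() filter, so it can be tried without image files on
# disk. The name `_sketch_find_image_paths` is made up for this sketch.
import re


def _sketch_find_image_paths(text):
    """Return every path found in `![fig-...](path)` tags, in order of appearance."""
    return [path for _idx, path in re.findall(r'!\[fig-(.+)\]\((.+)\)', text)]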


def check_images_observation(text, images, model_name):
    start_flag = get_react_parser(model_name).observation
    for image in images:
        logging.info('Image'.center(60, '-'))
        logging.info(image)

        end_idx = text.find(image)
        tmp_text = text[:end_idx + len(image)]
        start_idx = tmp_text.rfind(start_flag)
        check_text = tmp_text[start_idx + len(start_flag):]

        logging.info('Observation'.center(60, '-'))
        logging.info(check_text)

        # As long as there exists correctly executed observation, we consider `True`
        if 'error:' not in check_text and 'Traceback' not in check_text:
            return True
    return False


eval_visual_prompt = {'zh': EVAL_VISUAL_PROMPT_ZH, 'en': EVAL_VISUAL_PROMPT_EN}


def eval_visualization_acc(output_fname, model_name, judger_model_name='gpt-4-vision-preview'):
    if judger_model_name == 'gpt-4-vision-preview':
        judger_model = None
    elif judger_model_name in ['qwen-vl-chat', 'qwen-vl-plus']:
        if judger_model_name == 'qwen-vl-chat':
            logging.warning('In this benchmark of version 20231206, `Qwen-vl-chat` is no longer used as the '
                            'evaluation model for the `Visualization` task. If you insist on using it, '
                            'the evaluation results might differ from the official results.')
        judger_model = get_model(judger_model_name)
    else:
        raise Exception('Not supported judger model.')

    one_action, one_action_right = 0, 0
    zero_action, zero_action_right = 0, 0

    data_list = load_jsonl(output_fname)
    for item in data_list:
        if 'visualization' not in item['tags']:
            continue

        item['vis_acc'] = False
        if '<|im_end|>' in item['query']:
            one_action += 1
            prompt = item['query'].split('<|im_end|>')[0]
        else:
            zero_action += 1
            prompt = item['query']

        images = extract_images(item['gen'])

        if images and check_images_observation(item['gen'], images, model_name):
            input_prompt = eval_visual_prompt[item.get('lang', 'en')]
            format_prompt = input_prompt.format(query=prompt)
            output = judger_model_inference(judger_model_name, judger_model, images, format_prompt)
            if 'right' in output.lower():
                item['vis_acc'] = True
                if '<|im_end|>' in item['query']:
                    one_action_right += 1
                else:
                    zero_action_right += 1

    # Report 0.0 instead of raising ZeroDivisionError when a subset is empty
    hard_acc = zero_action_right / zero_action * 100 if zero_action else 0.0
    easy_acc = one_action_right / one_action * 100 if one_action else 0.0
    all_count = zero_action + one_action
    all_right = zero_action_right + one_action_right
    all_acc = all_right / all_count * 100 if all_count else 0.0

    logging.info('*' * 60)
    logging.info('{:^60}'.format('Visualization Acc.'))
    logging.info('*' * 60)
    logging.info('Visualization-Hard count={}, Visualization-Hard right count={}, Visualization-Hard acc={:.2f}'.format(
        zero_action, zero_action_right, hard_acc))
    logging.info('Visualization-Easy count={}, Visualization-Easy right count={}, Visualization-Easy acc={:.2f}'.format(
        one_action, one_action_right, easy_acc))
    logging.info('all count={}, all right={}, all acc={:.2f}'.format(all_count, all_right, all_acc))

    visualization_code_correctness['visualization-hard'] = hard_acc
    visualization_code_correctness['visualization-easy'] = easy_acc

    error_data_list = [item for item in data_list if 'visualization' in item['tags'] and not item['vis_acc']]
    error_data_output_fname = os.path.splitext(output_fname)[0] + '_vis_error.jsonl'
    save_jsonl(error_data_list, error_data_output_fname)

    return visualization_code_correctness


================================================
FILE: benchmark/code_interpreter/models/__init__.py
================================================
from models.base import HFModel  # noqa
from models.dashscope import QwenDashscopeVLModel  # noqa
from models.llm import LLM  # noqa
from models.qwen import Qwen, QwenVL  # noqa


================================================
FILE: benchmark/code_interpreter/models/base.py
================================================
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig


class HFModel(object):

    def __init__(self, model_path):
        self.tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
        self.model = AutoModelForCausalLM.from_pretrained(model_path,
                                                          trust_remote_code=True,
                                                          device_map='auto',
                                                          low_cpu_mem_usage=True).eval()
        self.model.generation_config = GenerationConfig.from_pretrained(model_path, trust_remote_code=True)
        self.model.generation_config.do_sample = False


================================================
FILE: benchmark/code_interpreter/models/dashscope.py
================================================
import logging
import os
import time
from http import HTTPStatus

import dashscope


class QwenDashscopeVLModel(object):

    def __init__(self, model, api_key):
        self.model = model
        dashscope.api_key = api_key.strip() or os.getenv('DASHSCOPE_API_KEY', default='')
        assert dashscope.api_key, 'DASHSCOPE_API_KEY is required.'

    def generate(self, prompt, stop_words=[]):
        if isinstance(prompt, str):
            prompt = [{'text': prompt}]

        MAX_TRY = 3
        count = 0
        while count < MAX_TRY:
            response = dashscope.MultiModalConversation.call(
                self.model,
                messages=[{
                    'role': 'user',
                    'content': prompt
                }],
                top_p=0.01,
                top_k=1,
            )
            if response.status_code == HTTPStatus.OK:
                output = response.output.choices[0].message.content[0]['text']
                for stop_str in stop_words:
                    idx = output.find(stop_str)
                    if idx != -1:
                        output = output[:idx + len(stop_str)]
                return output
            else:
                err = 'Error code: %s, error message: %s' % (
                    response.code,
                    response.message,
                )
                logging.error(err)
                count += 1
            time.sleep(1)


================================================
FILE: benchmark/code_interpreter/models/llm.py
================================================
import torch
from models.base import HFModel


class LLM(HFModel):

    def __init__(self, model_path):
        super().__init__(model_path)

    def generate(self, input_text, stop_words=[], max_new_tokens=512):
        if isinstance(input_text, str):
            input_text = [input_text]

        input_ids = self.tokenizer(input_text)['input_ids']
        input_ids = torch.tensor(input_ids, device=self.model.device)
        gen_kwargs = {'max_new_tokens': max_new_tokens, 'do_sample': False}
        outputs = self.model.generate(input_ids, **gen_kwargs)
        s = outputs[0][input_ids.shape[1]:]
        output = self.tokenizer.decode(s, skip_special_tokens=True)

        for stop_str in stop_words:
            idx = output.find(stop_str)
            if idx != -1:
                output = output[:idx + len(stop_str)]

        return output


================================================
FILE: benchmark/code_interpreter/models/qwen.py
================================================
import torch
from models.base import HFModel


class Qwen(HFModel):

    def __init__(self, model_path):
        super().__init__(model_path)

    def generate(self, input_text, stop_words=[]):
        im_end = '<|im_end|>'
        if im_end not in stop_words:
            stop_words = stop_words + [im_end]
        stop_words_ids = [self.tokenizer.encode(w) for w in stop_words]

        input_ids = torch.tensor([self.tokenizer.encode(input_text)]).to(self.model.device)
        output = self.model.generate(input_ids, stop_words_ids=stop_words_ids)
        output = output.tolist()[0]
        output = self.tokenizer.decode(output, errors='ignore')
        assert output.startswith(input_text)
        output = output[len(input_text):].replace('<|endoftext|>', '').replace(im_end, '')

        return output


class QwenVL(HFModel):

    def __init__(self, model_path):
        super().__init__(model_path)

    def generate(self, inputs: list):
        query = self.tokenizer.from_list_format(inputs)
        response, _ = self.model.chat(self.tokenizer, query=query, history=None)

        return response


================================================
FILE: benchmark/code_interpreter/parser/__init__.py
================================================
from parser.internlm_parser import InternLMReActParser  # noqa
from parser.react_parser import ReActParser  # noqa


================================================
FILE: benchmark/code_interpreter/parser/internlm_parser.py
================================================
from parser.react_parser import ReActParser


class InternLMReActParser(ReActParser):

    def __init__(self):
        self.action = '\nAction:'
        self.action_input = '\nActionInput:'
        self.action_input_stop = '<eoa>'
        self.observation = '<|System|>:Response:'
        self.observation_stop = '<TOKENS_UNUSED_2>\n<|Bot|>:'


================================================
FILE: benchmark/code_interpreter/parser/react_parser.py
================================================
class ReActParser(object):

    def __init__(self):
        self.action = '\nAction:'
        self.action_input = '\nAction Input:'
        self.action_input_stop = '\nObservation:'
        self.observation = '\nObservation:'
        self.observation_stop = '\nThought:'

    def parse_latest_plugin_call(self, text):
        action = self.action
        action_input = self.action_input
        observation = self.action_input_stop
        plugin_name, plugin_args = '', ''
        i = text.rfind(action)
        j = text.rfind(action_input)
        k = text.rfind(observation)
        if 0 <= i < j:  # If the text has `Action` and `Action input`,
            if k < j:  # but does not contain `Observation`,
                # then it is likely that `Observation` is omitted by the LLM,
                # because the output text may have discarded the stop word.
                text = text.rstrip() + observation  # Add it back.
            k = text.rfind(observation)
            plugin_name = text[i + len(action):j].strip()
            plugin_args = text[j + len(action_input):k].strip()
            text = text[:k]
        return plugin_name, plugin_args, text

    def _extract_first_target(self, text, start_flag, end_flag):
        target = ''
        i = text.find(start_flag)
        if i != -1:
            j = text.find(end_flag, i)
            if j != -1:
                target = text[i + len(start_flag):j].strip()
            else:
                target = text[i + len(start_flag):].strip()
        return target

    def get_first_observation(self, text):
        return self._extract_first_target(text, self.observation, self.observation_stop)

    def get_first_action_input(self, text):
        return self._extract_first_target(text, self.action_input, self.action_input_stop)


================================================
FILE: benchmark/code_interpreter/prompt/__init__.py
================================================
from prompt.internlm_react import InternLMReAct  # noqa
from prompt.llama_react import LlamaReAct  # noqa
from prompt.qwen_react import QwenReAct  # noqa
from prompt.react import ReAct  # noqa


================================================
FILE: benchmark/code_interpreter/prompt/internlm_react.py
================================================
from prompt.react import ReAct

INTERNLM_TOOL_DESCRIPTION = """用来执行Python代码。代码必须是一个函数，
函数名必须得是 'solution'，代码对应你的思考过程。代码实例格式如下：
```python
# import 依赖包
import xxx
def solution():
    # 初始化一些变量
    variable_names_with_real_meaning = xxx
    # 步骤一
    mid_variable = func(variable_names_with_real_meaning)
    # 步骤 x
    mid_variable = func(mid_variable)
    # 最后结果
    final_answer =  func(mid_variable)
    return final_answer
```"""

INTERNLM_TOOL = {'PythonInterpreter': INTERNLM_TOOL_DESCRIPTION}

INTERNLM_REACT_PROMPT_ZH = """<|System|>:你是一个可以调用外部工具的助手，可以使用的工具包括：
{tools_text}
如果使用工具请遵循以下格式回复：
```
Thought:思考你当前步骤需要解决什么问题，是否需要使用工具
Action:工具名称，你的工具必须从 [{tools_name_text}] 选择
ActionInput:工具输入参数
```
工具返回按照以下格式回复：
```
Response:调用工具后的结果
```
如果你已经知道了答案，或者你不需要工具，请遵循以下格式回复
```
Thought:给出最终答案的思考过程
FinalAnswer:最终答案
```
开始!<TOKENS_UNUSED_2>
<|User|>:{query}<eoh>
<|Bot|>:"""

INTERNLM_REACT_PROMPT_EN = """<|System|>:You are a assistant who can utilize external tools.
{tools_text}
To use a tool, please use the following format:
```
Thought: Think what you need to solve, do you need to use tools?
Action: the tool name, should be one of [{tools_name_text}]
ActionInput: the input to the action
```
The response after utilizing tools should using the following format:
```
Response: the results after call the tool.
``
If you already know the answer, or you do not need to use tools,
please using the following format to reply:
```
Thought: the thought process to get the final answer
FinalAnswer: final answer
```
Begin!<TOKENS_UNUSED_2>
<|User|>:{query}<eoh>
<|Bot|>:"""


class InternLMReAct(ReAct):

    def __init__(self, query, lang='en', upload_file_paths=[]):
        super().__init__(query, lang, upload_file_paths)
        self.react_template = INTERNLM_REACT_PROMPT_ZH if self.lang == 'zh' else INTERNLM_REACT_PROMPT_EN

    def build_prompt(self):
        planning_prompt = super().build_prompt()
        if '<|im_end|>' in self.query and planning_prompt.endswith('<eoh>\n<|Bot|>:'):
            planning_prompt = planning_prompt[:-len('<eoh>\n<|Bot|>:')]

        if '<|im_end|>' in self.query:
            planning_prompt = planning_prompt.replace('<|im_end|>\n<|im_start|>assistant\n', '<eoh>\n<|Bot|>:').replace(
                'Observation:',
                '<eoa>\n<|System|>:Response:').replace('\nAction Input',
                                                       '\nActionInput').replace('code_interpreter', 'PythonInterpreter')
            assert planning_prompt.endswith('Thought:')
            planning_prompt = planning_prompt[:-len('Thought:')] + '<TOKENS_UNUSED_2>\n<|Bot|>:'

        self.prompt = planning_prompt
        return planning_prompt

    def _build_tools_text(self):
        return INTERNLM_TOOL

    def _build_tools_name_text(self):
        return list(INTERNLM_TOOL.keys())

    def build_observation(self, observation):
        return f'<eoa>\n<|System|>:Response:{observation}\n<TOKENS_UNUSED_2>\n<|Bot|>:'

    def get_stop_words_list(self):
        return ['<eoa>']


================================================
FILE: benchmark/code_interpreter/prompt/llama_react.py
================================================
from prompt.react import ReAct


class LlamaReAct(ReAct):

    def __init__(self, query, lang='en', upload_file_paths=[]):
        super().__init__(query, lang, upload_file_paths)

    def build_prompt(self):
        planning_prompt = super().build_prompt()
        planning_prompt = '[INST] ' + planning_prompt + ' [/INST]'

        if '<|im_end|>' in self.query:
            planning_prompt = planning_prompt.replace('<|im_end|>\n<|im_start|>assistant', ' [/INST] ')
            assert planning_prompt.endswith(' [/INST]')
            planning_prompt = planning_prompt[:-len(' [/INST]')]

        self.prompt = planning_prompt
        return planning_prompt


================================================
FILE: benchmark/code_interpreter/prompt/qwen_react.py
================================================
import json
import os

from prompt.react import ReAct

QWEN_TOOLS_LIST = [
    {
        'name_for_human': '代码解释器',
        'name_for_model': 'code_interpreter',
        'description_for_model': '代码解释器，可用于执行Python代码。',
        'parameters': [{
            'name': 'code',
            'type': 'string',
            'description': '待执行的代码'
        }],
        'args_format': 'code'
    },
]

TOOL_DESC = """{name_for_model}: Call this tool to interact with the {name_for_human} API. What is the {name_for_human} API useful for? {description_for_model} Parameters: {parameters}"""


class QwenReAct(ReAct):

    def __init__(self, query, lang='en', upload_file_paths=[]):
        super().__init__(query, lang, upload_file_paths)

        self.upload_file_paths = [f'{os.path.basename(fname)}' for fname in upload_file_paths]
        self.list_of_plugin_info = QWEN_TOOLS_LIST
        self.fname_template = {
            'zh': '[上传文件{fname_str}]',
            'en': '[Upload file {fname_str}]',
            'en_multi': '[Upload file {fname_str}]'
        }

    def build_prompt(self):
        im_start = '<|im_start|>'
        im_end = '<|im_end|>'
        prompt = f'{im_start}system\nYou are a helpful assistant.{im_end}'

        query = super().build_prompt()

        query = query.lstrip('\n').rstrip()
        prompt += f'\n{im_start}user\n{query}{im_end}'
        if f'{im_start}assistant' not in query:
            prompt += f'\n{im_start}assistant\n{im_end}'
            assert prompt.endswith(f'\n{im_start}assistant\n{im_end}')

        prompt = prompt[:-len(f'{im_end}')]
        self.prompt = prompt
        return prompt

    def _build_tools_text(self):
        # tool info
        tools_text = []
        for plugin_info in self.list_of_plugin_info:
            tool = TOOL_DESC.format(
                name_for_model=plugin_info['name_for_model'],
                name_for_human=plugin_info['name_for_human'],
                description_for_model=plugin_info['description_for_model'],
                parameters=json.dumps(plugin_info['parameters'], ensure_ascii=False),
            )
            if plugin_info.get('args_format', 'json') == 'json':
                tool += ' Format the arguments as a JSON object.'
            elif plugin_info['args_format'] == 'code':
                tool += ' Enclose the code within triple backticks (`) at the beginning and end of the code.'
            else:
                raise NotImplementedError
            tools_text.append(tool)
        tools_text = '\n\n'.join(tools_text)
        return tools_text

    def _build_tools_name_text(self):
        return ', '.join([plugin_info['name_for_model'] for plugin_info in self.list_of_plugin_info])


================================================
FILE: benchmark/code_interpreter/prompt/react.py
================================================
import os

tools_text = """code_interpreter: Call this tool to interact with the Code Interpreter API.
What is the Code Interpreter API useful for?
Code Interpreter is used to execute Python code to deal with the following tasks:
1. Solving mathematical problems, both quantitative and qualitative
2. Doing data analysis and visualization
3. Converting files between formats
Parameters:
```py
code
```
Enclose the code within triple backticks (```) at the beginning and end of the code.
"""

REACT_PROMPT = """Answer the following questions as best you can. You have access to the following tools:

{tools_text}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tools_name_text}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can be repeated zero or more times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {query}"""

fname_template = {
    'zh': '文件{fname_str}，',
    'en_multi': 'Files {fname_str}. ',
    'en': 'File {fname_str}. ',
}


class ReAct(object):

    def __init__(self, query, lang='en', upload_file_paths=[]):
        self.query = query
        self.lang = lang
        self.upload_file_paths = [f'`{os.path.basename(fname)}`' for fname in upload_file_paths]

        self.fname_template = fname_template
        self.react_template = REACT_PROMPT
        self.prompt = ''

    def build_prompt(self):
        query = self._format_upload_fname() + self.query
        tools_text = self._build_tools_text()
        tools_name_text = self._build_tools_name_text()
        planning_prompt = self.react_template.format(query=query,
                                                     tools_text=tools_text,
                                                     tools_name_text=tools_name_text)

        self.prompt = planning_prompt
        return planning_prompt

    def _format_upload_fname(self):
        prefix = ''
        if self.upload_file_paths:
            fname_str = ', '.join(self.upload_file_paths)
            lang_key = 'en_multi' if self.lang == 'en' and len(self.upload_file_paths) > 1 else self.lang
            fname_template = self.fname_template[lang_key]
            prefix = fname_template.format(fname_str=fname_str)
        return prefix

    def _build_tools_text(self):
        return tools_text

    def _build_tools_name_text(self):
        return 'code_interpreter'

    def build_observation(self, observation):
        return f'\nObservation: {observation}\nThought:'

    def get_stop_words_list(self):
        return ['Observation:', 'Observation:\n']


================================================
FILE: benchmark/code_interpreter/requirements.txt
================================================
accelerate>=0.20.3
func_timeout
json5
matplotlib
numpy
openai
pandas
PrettyTable
scipy
seaborn
sympy
transformers==4.33.1
transformers_stream_generator


================================================
FILE: benchmark/code_interpreter/utils/__init__.py
================================================


================================================
FILE: benchmark/code_interpreter/utils/code_utils.py
================================================
import os
import re

import json5


def replace_upload_fname(text, upload_fname_list):
    for full_input_fname in upload_fname_list:
        if full_input_fname not in text and os.path.basename(full_input_fname) in text:
            text = text.replace(os.path.basename(full_input_fname), full_input_fname)
    return text


def extract_code(text):
    # Match triple backtick blocks first
    triple_match = re.search(r'```[^\n]*\n(.+?)```', text, re.DOTALL)
    # Match single backtick blocks second
    single_match = re.search(r'`([^`]*)`', text, re.DOTALL)
    if triple_match:
        text = triple_match.group(1)
    elif single_match:
        text = single_match.group(1)
    else:
        try:
            text = json5.loads(text)['code']
        except Exception:
            pass
    # If no code blocks found, return original text
    return text


================================================
FILE: benchmark/code_interpreter/utils/data_utils.py
================================================
import json
import logging

from tqdm import tqdm


def load_jsonl(path):
    data = []
    with open(path, 'r', encoding='utf8') as f:
        for idx, line in enumerate(f, start=1):
            try:
                data.append(json.loads(line))
            except Exception as e:
                logging.info(line)
                logging.warning(f'Error at line {idx}: {e}')
                continue
    return data


def save_jsonl(data, path, progress=False, enabled=True):
    if not enabled:
        return
    with open(path, 'w', encoding='utf-8') as f:
        if progress:
            data = tqdm(data)
        for item in data:
            line = json.dumps(item, ensure_ascii=False)
            print(line, file=f)


================================================
FILE: benchmark/deepplanning/README.md
================================================
# DeepPlanning Benchmark

A comprehensive benchmark for evaluating AI agents' planning capabilities across multiple domains.

## 📋 Overview

This benchmark evaluates AI agents on complex planning tasks across two domains:

- **Travel Planning**: Evaluate agents on travel itinerary planning tasks
- **Shopping Planning**: Evaluate agents on e-commerce shopping tasks

**Flexible Execution:**
- **Unified Run (Recommended)**: You can run both domains together using the unified orchestrator. This documentation focuses on this unified workflow to help you reproduce the experimental results reported in our paper.
- **Independent Run**: Each domain can also be run independently. For domain-specific details, please refer to their respective documentation:
  - [`travelplanning/readme.md`](travelplanning/readme.md) - Travel domain details
  - [`shoppingplanning/README.md`](shoppingplanning/README.md) - Shopping domain details

## 🚀 Quick Start

### Step 1: Install Dependencies

```bash
# Create and activate conda environment
conda create -n deepplanning python=3.10 -y
conda activate deepplanning
pip install -r requirements.txt
```

### Step 2: Download Data Files
First, download the required data files from [HuggingFace Dataset](https://huggingface.co/datasets/Qwen/DeepPlanning) and place them in the project:

**Shopping Planning:**
- `shoppingplanning/database_zip/database_level1.tar.gz` - Level 1 shopping database
- `shoppingplanning/database_zip/database_level2.tar.gz` - Level 2 shopping database
- `shoppingplanning/database_zip/database_level3.tar.gz` - Level 3 shopping database

**Travel Planning:**
- `travelplanning/database/database_zh.zip` - Chinese database 
- `travelplanning/database/database_en.zip` - English database



### Step 3: Extract Database Files

After downloading, extract the compressed databases:

```bash
# Extract shopping databases
cd shoppingplanning/database_zip
tar -xzf database_level1.tar.gz -C ..
tar -xzf database_level2.tar.gz -C ..
tar -xzf database_level3.tar.gz -C ..
cd ../..

# Extract travel databases
cd travelplanning/database
unzip database_zh.zip    # Chinese database (flights, hotels, restaurants, attractions)
unzip database_en.zip    # English database
cd ../..
```
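
If `tar` or `unzip` is not available, the same extraction can be done from Python. The following is a minimal sketch that mirrors the shell commands above, assuming the archives sit in the locations listed in Step 2. The helper names `extract_archive` and `extract_all` are illustrative and are not part of the benchmark code.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_archive(archive: Path, dest: Path) -> None:
    """Extract a .tar.gz or .zip archive into dest (hypothetical helper)."""
    dest.mkdir(parents=True, exist_ok=True)
    if archive.name.endswith(".tar.gz"):
        with tarfile.open(archive, "r:gz") as tar:
            # Use the safer "data" extraction filter where this Python supports it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(dest, filter="data")
            else:
                tar.extractall(dest)
    elif archive.name.endswith(".zip"):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
    else:
        raise ValueError(f"Unsupported archive type: {archive}")


def extract_all(root: Path) -> None:
    """Mirror the shell steps: shopping archives unpack one level up, travel archives in place."""
    shopping = root / "shoppingplanning"
    for level in (1, 2, 3):
        extract_archive(shopping / "database_zip" / f"database_level{level}.tar.gz", shopping)

    travel_db = root / "travelplanning" / "database"
    for lang in ("zh", "en"):
        extract_archive(travel_db / f"database_{lang}.zip", travel_db)
```

Calling `extract_all(Path("."))` from the project root should be equivalent to running the shell block above.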

### Step 4: Configure Models

Edit `models_config.json` in the project root to add your model configurations:

```json
{
  "models": {
    "qwen-plus": {
      "model_name": "qwen-plus",
      "model_type": "openai",
      "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
      "api_key_env": "DASHSCOPE_API_KEY",
      "temperature": 0.0
    },
    "gpt-4o-2024-11-20": {
      "model_name": "gpt-4o-2024-11-20",
      "model_type": "openai",
      "base_url": "https://api.openai.com/v1",
      "api_key_env": "OPENAI_API_KEY",
      "temperature": 0.0
    }
  }
}
```
**Important Note about `qwen-plus`:**
- The `qwen-plus` configuration is **required**: the travel domain's conversion stage (`travelplanning/evaluation/convert_report.py`) uses it by default to parse and format agent-generated travel plans.
- If you want to use a different model for conversion, you can modify the `conversion_model` variable in `travelplanning/evaluation/convert_report.py`.
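
To illustrate how an entry in `models_config.json` might be consumed, here is a minimal sketch that looks up a model by name and reads its API key from the environment variable named in `api_key_env`. The helper `resolve_model_config` is hypothetical and is not the benchmark's own loader. Only the field names are taken from the example config above.

```python
import json
import os


def resolve_model_config(config_text: str, model_key: str) -> dict:
    """Return one model entry with its API key resolved from the environment.

    Hypothetical helper for illustration; field names follow the example config.
    """
    models = json.loads(config_text)["models"]
    if model_key not in models:
        raise KeyError(f"Model '{model_key}' is not defined in the config")
    entry = dict(models[model_key])  # copy so the parsed config is not mutated
    # `api_key_env` holds the *name* of the environment variable, not the key itself.
    entry["api_key"] = os.getenv(entry["api_key_env"], "")
    return entry


SAMPLE_CONFIG = """
{
  "models": {
    "qwen-plus": {
      "model_name": "qwen-plus",
      "model_type": "openai",
      "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
      "api_key_env": "DASHSCOPE_API_KEY",
      "temperature": 0.0
    }
  }
}
"""

if __name__ == "__main__":
    cfg = resolve_model_config(SAMPLE_CONFIG, "qwen-plus")
    # Print whether a key was found rather than the key itself.
    print(cfg["model_name"], cfg["base_url"], bool(cfg["api_key"]))
```

Keeping only the variable name in the config file means the JSON can be committed without exposing credentials. The keys themselves live in `.env`, as described in the next step.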

### Step 5: Set API Keys

Create a `.env` file in the project root, using `env.example` as a template:

```bash
cp env.example .env
# Edit .env and add your API keys
```

### Step 6: Run the Unified Benchmark

Edit `run_all.sh` to configure your run:

```bash
# Configuration in run_all.sh
DOMAINS="travel shopping"          # Domains to run
BENCHMARK_MODEL="qwen-plus"        # Default model for all domains

# Shopping domain configuration
SHOPPING_MODEL="${BENCHMARK_MODEL}"  # Model(s) for shopping
SHOPPING_LEVELS="1 2 3"             # Levels to run
SHOPPING_WORKERS=50                 # Parallel workers
SHOPPING_MAX_LLM_CALLS=400          # Max LLM calls per sample

# Travel domain configuration
TRAVEL_MODEL="${BENCHMARK_MODEL}"    # Model(s) for travel
TRAVEL_LANGUAGE=""                   # Language (zh/en/empty for both)
TRAVEL_WORKERS=50                    # Parallel workers
TRAVEL_MAX_LLM_CALLS=400             # Max LLM calls per sample
TRAVEL_START_FROM="inference"        # Start point: inference, conversion, evaluation
TRAVEL_OUTPUT_DIR=""                 # Output directory (optional)
TRAVEL_VERBOSE="false"               # Verbose output
TRAVEL_DEBUG="false"                 # Debug mode
```

Then run:

```bash
bash run_all.sh
```

**What it does:**
1. Runs each model on all specified domains sequentially
2. For **Travel domain**: runs both language versions (Chinese and English)
3. For **Shopping domain**: runs all difficulty levels (1 → 2 → 3)
4. Generates per-domain statistics in domain-specific result folders
5. Aggregates results across domains and calculates overall scores
6. Saves aggregated results in `aggregated_results/{model_name}_aggregated.json`

## 📊 Understanding Results

### Result File Locations

**Travel Domain:**
- Evaluation results: `travelplanning/results/{model}_{language}/evaluation/evaluation_summary.json`
- Converted plans: `travelplanning/results/{model}_{language}/converted_plans/`
- Trajectories: `travelplanning/results/{model}_{language}/trajectories/`

**Shopping Domain:**
- Per-level results: `shoppingplanning/result_report/summary_report_{model}_{level}_{timestamp}.json`
- Overall statistics: `shoppingplanning/result_report/{model}_statistics.json`
- Inference outputs: `shoppingplanning/database_infered/`



**Aggregated Results (Both Domains):**
- Cross-domain aggregation: `aggregated_results/{model}_aggregated.json`

**For detailed domain-specific metrics and result interpretation:**
- **Shopping Domain**: See [Shopping Results Documentation](shoppingplanning/README.md#step-7-view-results) for detailed explanation of match_rate, weighted_average_case_score, and per-level statistics
- **Travel Domain**: See [Travel Results Documentation](travelplanning/readme.md#step-7-view-results) for detailed explanation of composite_score, case_acc, commonsense_score, and personalized_score
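
The result locations listed above can be assembled programmatically, for example to check that a run produced every expected file. This is a minimal sketch assuming the default layout, that is, `TRAVEL_OUTPUT_DIR` left empty. The function names are illustrative and are not part of the benchmark code.

```python
from pathlib import Path
from typing import List, Sequence


def travel_summary_path(root: Path, model: str, language: str) -> Path:
    """Evaluation summary for one travel run (one model, one language)."""
    return root / "travelplanning" / "results" / f"{model}_{language}" / "evaluation" / "evaluation_summary.json"


def shopping_statistics_path(root: Path, model: str) -> Path:
    """Overall shopping statistics for one model."""
    return root / "shoppingplanning" / "result_report" / f"{model}_statistics.json"


def aggregated_path(root: Path, model: str) -> Path:
    """Cross-domain aggregation file for one model."""
    return root / "aggregated_results" / f"{model}_aggregated.json"


def missing_result_files(root: Path, model: str, languages: Sequence[str] = ("zh", "en")) -> List[Path]:
    """Return the expected result files that do not exist yet."""
    expected = [travel_summary_path(root, model, lang) for lang in languages]
    expected.append(shopping_statistics_path(root, model))
    expected.append(aggregated_path(root, model))
    return [path for path in expected if not path.exists()]
```

An empty list from `missing_result_files(Path("."), "qwen-plus")` suggests that both domains and the aggregation step finished for that model.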

### Aggregated Results Format

After running all benchmarks, view the aggregated results:

```bash
cat aggregated_results/{MODEL}_aggregated.json
```

**Example Output:**
```json
{
  "model_name": "qwen-plus",
  "aggregation_time": "2026-01-05T15:30:00.000000",
  "domains": {
    "shopping": {
      "total_cases": 120,
      "successful_cases": 17,
      "successful_rate": 0.1417,
      "match_rate": 0.6209,
      "weighted_average_case_score": 0.1417,
      "valid": true,
      "levels_completed": [1, 2, 3]
    },
    "travel": {
      "total_cases": 240,
      "successful_cases": 238,
      "successful_rate": 0.9917,
      "composite_score": 0.2813,
      "case_acc": 0.0,
      "commonsense_score": 0.4292,
      "personalized_score": 0.1333,
      "valid": true,
      "languages_completed": ["zh", "en"],
      "language_details": {
        "zh": {
          "composite_score": 0.2813,
          "case_acc": 0.0,
          "commonsense_score": 0.4292,
          "personalized_score": 0.1333
        },
        "en": {
          "composite_score": 0.2850,
          "case_acc": 0.0,
          "commonsense_score": 0.4300,
          "personalized_score": 0.1350
        }
      }
    }
  },
  "overall": {
    "total_cases": 360,
    "successful_cases": 255,
    "successful_rate": 0.5667,
    "valid": true,
    "domains_completed": ["shopping", "travel"],
    "num_domains": 2,
    "shopping_match_rate": 0.6209,
    "shopping_weighted_average_case_score": 0.1417,
    "travel_composite_score": 0.2813,
    "travel_case_acc": 0.0,
    "travel_commonsense_score": 0.4292,
    "travel_personalized_score": 0.1333,
    "avg_acc": 0.0708
  }
}
```

**Key Metrics Overview:**

**Shopping Domain:**
- **`match_rate`** ⭐: Percentage of expected items correctly matched (main paper metric)
- **`weighted_average_case_score`** ⭐: Average case completion score (main paper metric)

**Travel Domain:**
- **`composite_score`** ⭐: Weighted combination of commonsense and personalized scores (main paper metric)
- **`case_acc`** ⭐: Percentage of cases passing all constraints (main paper metric)
- `commonsense_score`: Score for commonsense constraint satisfaction
- `personalized_score`: Score for personalized requirement satisfaction

**Cross-Domain:**
- **`avg_acc`** ⭐: Average of shopping `weighted_average_case_score` and travel `case_acc` - **Primary cross-domain metric**
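
As a rough check of the cross-domain metric, `avg_acc` can be recomputed from an aggregated result. This sketch assumes a plain unweighted mean of the two per-domain values, as the description above suggests. The benchmark's own aggregation code may round or weight differently, and `cross_domain_avg_acc` is a hypothetical helper.

```python
def cross_domain_avg_acc(aggregated: dict) -> float:
    """Unweighted mean of shopping case score and travel case accuracy (assumed formula)."""
    shopping_score = aggregated["domains"]["shopping"]["weighted_average_case_score"]
    travel_case_acc = aggregated["domains"]["travel"]["case_acc"]
    return (shopping_score + travel_case_acc) / 2


# Values copied from the example output above; only the fields needed here are kept.
SAMPLE_AGGREGATED = {
    "domains": {
        "shopping": {"weighted_average_case_score": 0.1417},
        "travel": {"case_acc": 0.0},
    }
}

if __name__ == "__main__":
    print(cross_domain_avg_acc(SAMPLE_AGGREGATED))
```

With the example values this comes out close to the `avg_acc` reported in the sample output. Small differences in the last digit can arise because the sample fields are already rounded.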

---


## 📄 License

Please refer to individual domain directories for license information.



================================================
FILE: benchmark/deepplanning/aggregate_results.py
================================================
#!/usr/bin/env python3
"""
Aggregate results across Shopping and Travel Planning benchmarks
Calculates overall scores by averaging across domains
"""

import json
import sys
from pathlib import Path
from typing import Dict, Any, Optional
from datetime import datetime


def load_shopping_statistics(domain_dir: Path, model_name: str) -> Optional[Dict[str, Any]]:
    """
    Load statistics for shopping domain
    
    Args:
        domain_dir: Path to shoppingplanning directory
        model_name: Model name
        
    Returns:
        Statistics dictionary with match_rate and weighted_average_case_score
    """
    stats_file = domain_dir / "result_report" / f"{model_name}_statistics.json"
    
    if not stats_file.exists():
        print(f"⚠️  Warning: Shopping statistics file not found: {stats_file}")
        return None
    
    try:
        with open(stats_file, 'r', encoding='utf-8') as f:
            data = json.load(f)
        
        # Extract the metrics we need
        total = data.get("total", {})
        return {
            "total": {
                "total_cases": total.get("total_cases", 0),
                "successful_cases": total.get("successful_cases", 0),
                "successful_rate": total.get("successful_rate", 0.0),
                "match_rate": total.get("match_rate", 0.0),  # Main metric for shopping
                "weighted_average_case_score": total.get("weighted_average_case_score", 0.0),  # Main metric for shopping
                "valid": total.get("valid", False),
                "levels_completed": total.get("levels_completed", [])
            }
        }
    except Exception as e:
        print(f"❌ Error loading shopping statistics {stats_file}: {e}")
        return None


def load_travel_statistics(domain_dir: Path, model_name: str, output_dir: Optional[str] = None) -> Optional[Dict[str, Any]]:
    """
    Load statistics for travel domain
    
    Reads evaluation_summary.json for both zh and en languages,
    then calculates average scores.
    
    Args:
        domain_dir: Path to travelplanning directory
        model_name: Model name
        output_dir: Optional custom output directory for travel results
        
    Returns:
        Statistics dictionary with composite_score (as match_rate) and case_acc (as weighted_average_case_score)
    """
    languages = ["zh", "en"]
    language_results = {}
    
    # Determine the results directory
    if output_dir:
        results_base = Path(output_dir)
    else:
        results_base = domain_dir / "results"
    
    for lang in languages:
        summary_file = results_base / f"{model_name}_{lang}" / "evaluation" / "evaluation_summary.json"
        
        if not summary_file.exists():
            print(f"⚠️  Warning: Travel evaluation summary not found for {lang}: {summary_file}")
            continue
        
        try:
            with open(summary_file, 'r', encoding='utf-8') as f:
                data = json.load(f)
            
            metrics = data.get("metrics", {})
            language_results[lang] = {
                "composite_score": metrics.get("composite_score", 0.0),
                "case_acc": metrics.get("case_acc", 0.0),
                "commonsense_score": metrics.get("commonsense_score", 0.0),
                "personalized_score": metrics.get("personalized_score", 0.0),
                "total_test_samples": data.get("total_test_samples", 0),
                "evaluation_success_count": data.get("evaluation_success_count", 0)
            }
        except Exception as e:
            print(f"⚠️  Warning: Error loading travel statistics for {lang}: {e}")
            continue
    
    if not language_results:
        print(f"⚠️  Warning: No travel statistics found for model {model_name}")
        return None
    
    # Calculate average across languages
    num_languages = len(language_results)
    avg_composite_score = sum(r["composite_score"] for r in language_results.values()) / num_languages
    avg_case_acc = sum(r["case_acc"] for r in language_results.values()) / num_languages
    avg_commonsense_score = sum(r["commonsense_score"] for r in language_results.values()) / num_languages
    avg_personalized_score = sum(r["personalized_score"] for r in language_results.values()) / num_languages
    
    # Calculate total cases
    total_cases = sum(r["total_test_samples"] for r in language_results.values())
    successful_cases = sum(r["evaluation_success_count"] for r in language_results.values())
    successful_rate = successful_cases / total_cases if total_cases > 0 else 0.0
    
    return {
        "total": {
            "total_cases": total_cases,
            "successful_cases": successful_cases,
            "successful_rate": successful_rate,
            "match_rate": avg_composite_score,  # composite_score as match_rate
            "weighted_average_case_score": avg_case_acc,  # case_acc as weighted_average_case_score
            "commonsense_score": avg_commonsense_score,  # Average commonsense_score
            "personalized_score": avg_personalized_score,  # Average personalized_score
            "valid": True,  # Assume valid if we have results
            "levels_completed": list(language_results.keys())  # Languages completed
        },
        "language_details": language_results  # Keep language-specific details
    }


def aggregate_model_results(model_name: str, project_root: Path, travel_output_dir: Optional[str] = None) -> Optional[Dict[str, Any]]:
    """
    Aggregate results for a model across all domains
    
    Args:
        model_name: Model name
        project_root: Project root directory
        travel_output_dir: Optional custom output directory for travel results
        
    Returns:
        Aggregated results dictionary
    """
    # Load statistics from each domain
    shopping_stats = load_shopping_statistics(project_root / "shoppingplanning", model_name)
    travel_stats = load_travel_statistics(project_root / "travelplanning", model_name, travel_output_dir)
    
    # Check if we have at least one domain's results
    domains_found = []
    if shopping_stats:
        domains_found.append("shopping")
    if travel_stats:
        domains_found.append("travel")
    
    if not domains_found:
        print(f"❌ Error: No statistics found for model {model_name}")
        return None
    
    print(f"✓ Found statistics for domains: {', '.join(domains_found)}")
    
    # Prepare aggregated results
    aggregated = {
        "model_name": model_name,
        "aggregation_time": datetime.now().isoformat(),
        "domains": {},
        "overall": {}
    }
    
    # Add domain-specific results
    if shopping_stats:
        aggregated["domains"]["shopping"] = {
            "total_cases": shopping_stats["total"]["total_cases"],
            "successful_cases": shopping_stats["total"]["successful_cases"],
            "successful_rate": shopping_stats["total"]["successful_rate"],
            "match_rate": shopping_stats["total"]["match_rate"],
            "weighted_average_case_score": shopping_stats["total"]["weighted_average_case_score"],
            "valid": shopping_stats["total"]["valid"],
            "levels_completed": shopping_stats["total"]["levels_completed"]
        }
    
    if travel_stats:
        travel_domain_data = {
            "total_cases": travel_stats["total"]["total_cases"],
            "successful_cases": travel_stats["total"]["successful_cases"],
            "successful_rate": travel_stats["total"]["successful_rate"],
            "composite_score": travel_stats["total"]["match_rate"],  # Average composite_score across zh and en
            "case_acc": travel_stats["total"]["weighted_average_case_score"],  # Average case_acc across zh and en
            "commonsense_score": travel_stats["total"]["commonsense_score"],  # Average commonsense_score across zh and en
            "personalized_score": travel_stats["total"]["personalized_score"],  # Average personalized_score across zh and en
            "valid": travel_stats["total"]["valid"],
            "languages_completed": travel_stats["total"]["levels_completed"]  # Languages: ["zh", "en"]
        }
        # Add language-specific details if available
        if "language_details" in travel_stats:
            travel_domain_data["language_details"] = travel_stats["language_details"]
        aggregated["domains"]["travel"] = travel_domain_data
    
    # Calculate overall averages across domains
    num_domains = len(domains_found)
    
    # Collect metrics for averaging
    total_cases = 0
    successful_cases = 0
    successful_rates = []
    
    # Domain-specific metrics
    shopping_match_rate = None
    shopping_weighted_score = None
    travel_composite_score = None
    travel_case_acc = None
    travel_commonsense_score = None
    travel_personalized_score = None
    
    all_valid = True
    
    for domain in domains_found:
        domain_stats = shopping_stats if domain == "shopping" else travel_stats
        total_cases += domain_stats["total"]["total_cases"]
        successful_cases += domain_stats["total"]["successful_cases"]
        successful_rates.append(domain_stats["total"]["successful_rate"])
        all_valid = all_valid and domain_stats["total"]["valid"]
        
        # Store domain-specific metrics
        if domain == "shopping":
            shopping_match_rate = domain_stats["total"]["match_rate"]
            shopping_weighted_score = domain_stats["total"]["weighted_average_case_score"]
        elif domain == "travel":
            travel_composite_score = domain_stats["total"]["match_rate"]  # This is avg composite_score
            travel_case_acc = domain_stats["total"]["weighted_average_case_score"]  # This is avg case_acc
            travel_commonsense_score = domain_stats["total"]["commonsense_score"]  # This is avg commonsense_score
            travel_personalized_score = domain_stats["total"]["personalized_score"]  # This is avg personalized_score
    
    # Calculate overall metrics
    aggregated["overall"] = {
        "total_cases": total_cases,
        "successful_cases": successful_cases,
        "successful_rate": sum(successful_rates) / num_domains,
        "valid": all_valid,
        "domains_completed": domains_found,
        "num_domains": num_domains
    }
    
    # Add domain-specific metrics to overall
    if shopping_match_rate is not None:
        aggregated["overall"]["shopping_match_rate"] = shopping_match_rate
        aggregated["overall"]["shopping_weighted_average_case_score"] = shopping_weighted_score
    
    if travel_composite_score is not None:
        aggregated["overall"]["travel_composite_score"] = travel_composite_score
        aggregated["overall"]["travel_case_acc"] = travel_case_acc
        aggregated["overall"]["travel_commonsense_score"] = travel_commonsense_score
        aggregated["overall"]["travel_personalized_score"] = travel_personalized_score
    
    # Calculate cross-domain averages (if both domains exist)
    if shopping_match_rate is not None and travel_composite_score is not None:
        # avg_acc: average of shopping weighted_average_case_score and travel case_acc
        aggregated["overall"]["avg_acc"] = (shopping_weighted_score + travel_case_acc) / 2
    
    return aggregated


def main():
    import argparse
    
    parser = argparse.ArgumentParser(description="Aggregate results across domains")
    parser.add_argument(
        "--model_name",
        type=str,
        required=True,
        help="Model name to aggregate results for"
    )
    parser.add_argument(
        "--travel-output-dir",
        type=str,
        default=None,
        help="Custom output directory for travel domain results (optional)"
    )
    
    args = parser.parse_args()
    
    # Get project root directory
    project_root = Path(__file__).resolve().parent
    
    print(f"\n{'='*80}")
    print(f"📊 Aggregating Results for Model: {args.model_name}")
    print(f"{'='*80}\n")
    
    # Aggregate results
    aggregated = aggregate_model_results(args.model_name, project_root, args.travel_output_dir)
    
    if aggregated is None:
        print(f"❌ Failed to aggregate results for model {args.model_name}")
        sys.exit(1)
    
    # Save aggregated results
    output_dir = project_root / "aggregated_results"
    output_dir.mkdir(exist_ok=True)
    
    output_file = output_dir / f"{args.model_name}_aggregated.json"
    
    try:
        with open(output_file, 'w', encoding='utf-8') as f:
            json.dump(aggregated, f, indent=2, ensure_ascii=False)
        
        print(f"✅ Aggregated results saved to: {output_file}\n")
        
        # Print summary
        print(f"{'='*80}")
        print(f"📊 Summary for {args.model_name}")
        print(f"{'='*80}")
        print(f"\nDomains completed: {', '.join(aggregated['overall']['domains_completed'])}")
        print(f"\nOverall Metrics:")
        print(f"  Total cases: {aggregated['overall']['total_cases']}")
        print(f"  Successful cases: {aggregated['overall']['successful_cases']}")
        print(f"  Successful rate: {aggregated['overall']['successful_rate']:.4f} ({aggregated['overall']['successful_rate']:.2%})")
        
        # Show cross-domain average if both domains exist
        if 'avg_acc' in aggregated['overall']:
            print(f"\nCross-Domain Metric ⭐:")
            print(f"  avg_acc (shopping weighted_score + travel case_acc) / 2: {aggregated['overall']['avg_acc']:.4f} ({aggregated['overall']['avg_acc']:.2%})")
        
        # Show individual domain metrics
        if 'shopping_match_rate' in aggregated['overall']:
            print(f"\nShopping Domain Metrics:")
            print(f"  Match rate ⭐: {aggregated['overall']['shopping_match_rate']:.4f} ({aggregated['overall']['shopping_match_rate']:.2%})")
            print(f"  Weighted average case score ⭐: {aggregated['overall']['shopping_weighted_average_case_score']:.4f} ({aggregated['overall']['shopping_weighted_average_case_score']:.2%})")
        
        if 'travel_composite_score' in aggregated['overall']:
            print(f"\nTravel Domain Metrics (averaged across zh and en):")
            print(f"  Composite score ⭐: {aggregated['overall']['travel_composite_score']:.4f} ({aggregated['overall']['travel_composite_score']:.2%})")
            print(f"  Case accuracy ⭐: {aggregated['overall']['travel_case_acc']:.4f} ({aggregated['overall']['travel_case_acc']:.2%})")
            print(f"  Commonsense score: {aggregated['overall']['travel_commonsense_score']:.4f} ({aggregated['overall']['travel_commonsense_score']:.2%})")
            print(f"  Personalized score: {aggregated['overall']['travel_personalized_score']:.4f} ({aggregated['overall']['travel_personalized_score']:.2%})")
        
        print(f"\nModel valid: {aggregated['overall']['valid']} {'✅' if aggregated['overall']['valid'] else '❌'}")
        
        print(f"\nPer-Domain Breakdown:")
        for domain, stats in aggregated['domains'].items():
            if domain == "shopping":
                print(f"  Shopping:")
                print(f"    Total cases: {stats['total_cases']}")
                print(f"    Successful rate: {stats['successful_rate']:.4f} ({stats['successful_rate']:.2%})")
                print(f"    Match rate ⭐: {stats['match_rate']:.4f} ({stats['match_rate']:.2%})")
                print(f"    Weighted average case score ⭐: {stats['weighted_average_case_score']:.4f} ({stats['weighted_average_case_score']:.2%})")
                if "levels_completed" in stats:
                    print(f"    Levels: {', '.join(map(str, stats['levels_completed']))}")
            
            elif domain == "travel":
                print(f"  Travel:")
                print(f"    Total cases: {stats['total_cases']}")
                print(f"    Successful rate: {stats['successful_rate']:.4f} ({stats['successful_rate']:.2%})")
                print(f"    Composite score (avg) ⭐: {stats['composite_score']:.4f} ({stats['composite_score']:.2%})")
                print(f"    Case accuracy (avg) ⭐: {stats['case_acc']:.4f} ({stats['case_acc']:.2%})")
                print(f"    Commonsense score (avg): {stats['commonsense_score']:.4f} ({stats['commonsense_score']:.2%})")
                print(f"    Personalized score (avg): {stats['personalized_score']:.4f} ({stats['personalized_score']:.2%})")
                
                # Show language details for travel domain
                if "language_details" in stats:
                    print(f"    Languages: {', '.join(stats['languages_completed'])}")
                    for lang, lang_stats in stats['language_details'].items():
                        print(f"      {lang.upper()}:")
                        print(f"        Composite score: {lang_stats['composite_score']:.4f}")
                        print(f"        Case accuracy: {lang_stats['case_acc']:.4f}")
                        print(f"        Commonsense score: {lang_stats['commonsense_score']:.4f}")
                        print(f"        Personalized score: {lang_stats['personalized_score']:.4f}")
        
        print(f"{'='*80}\n")
        
    except Exception as e:
        print(f"❌ Failed to save aggregated results: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()



================================================
FILE: benchmark/deepplanning/env.example
================================================
# API Keys for different model providers
# Copy this file to .env and fill in your API keys

# For Qwen models (via DashScope)
DASHSCOPE_API_KEY="your_dashscope_api_key_here"

# For OpenAI models
OPENAI_API_KEY="your_openai_api_key_here"



================================================
FILE: benchmark/deepplanning/models_config.json
================================================
{
  "models": {
    "qwen-plus": {
      "model_name": "qwen-plus",
      "model_type": "openai",
      "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
      "api_key_env": "DASHSCOPE_API_KEY",
      "temperature": 0.0
    },
    "qwen3-max": {
      "model_name": "qwen3-max",
      "model_type": "openai",
      "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
      "api_key_env": "DASHSCOPE_API_KEY",
      "temperature": 0.0
    },
    "gpt-4o-2024-11-20": {
      "model_name": "gpt-4o-2024-11-20",
      "model_type": "openai",
      "base_url": "https://api.openai.com/v1",
      "api_key_env": "OPENAI_API_KEY",
      "temperature": 0.0
    },
    "gpt-5-2025-08-07-high": {
      "model_name": "gpt-5-2025-08-07",
      "model_type": "openai",
      "base_url": "https://api.openai.com/v1",
      "api_key_env": "OPENAI_API_KEY",
      "temperature": 0.0,
      "extra_body": {
        "reasoning_effort": "high"
      }
    }
  }
}



================================================
FILE: benchmark/deepplanning/requirements.txt
================================================
# ========================================
# Unified Benchmark Requirements
# For both Shopping Planning and Travel Planning domains
# ========================================

# Core agent package (Travel domain)
qwen-agent>=0.0.10

# LLM API clients
openai>=1.0.0                    # OpenAI SDK (supports OpenAI, DashScope, and compatible APIs)
dashscope>=1.11.0                # Alibaba DashScope API client

# Data processing
pandas>=1.5.0                    # CSV database loading and querying
numpy>=1.24.0                    # Numerical operations

# Search algorithm (Shopping domain)
rank-bm25>=0.2.2                 # BM25 algorithm for product search

# HTTP requests
requests>=2.28.0                 # HTTP client for API calls

# Configuration and environment
python-dotenv>=1.0.0             # Load environment variables from .env file

# JSON handling
json5>=0.9.0                     # JSON5 support (Travel domain)
jsonlines>=3.0.0                 # JSON Lines format support
jsonschema>=4.0.0                # JSON schema validation (Travel domain)

# Text processing
pydantic>=2.3.0                  # Data validation and settings management
tiktoken>=0.5.0                  # Token counting for OpenAI models

# Utilities
eval-type-backport               # Type evaluation backport (Travel domain)
pillow>=9.0.0                    # Image processing (if needed)
tabulate>=0.9.0                  # Pretty-print tabular data



================================================
FILE: benchmark/deepplanning/run_all.sh
================================================
#!/bin/bash

# ============================================
# Unified Benchmark Runner
# Runs both Shopping and Travel Planning benchmarks
# Usage: bash run_all.sh
# ============================================

set -e  # Exit immediately if a command exits with a non-zero status

# Get the absolute path of the script directory
BASE_DIR="$(cd "$(dirname "$0")" && pwd)"
cd "$BASE_DIR"

# ============================================
# Configuration
# ============================================

# Domains to run (space-separated): "shopping travel" or just "shopping"
DOMAINS="travel shopping"

# Model configuration (applies to all domains unless overridden)
# For a single model: BENCHMARK_MODEL="qwen-plus"
# For multiple models: BENCHMARK_MODEL="qwen-plus qwen3-max gpt-4o-2024-11-20"
BENCHMARK_MODEL="qwen-plus"

# ============================================
# Shopping Domain Configuration
# ============================================

# Test levels for shopping domain (space-separated)
SHOPPING_LEVELS="1 2 3"

# Number of parallel workers for shopping
SHOPPING_WORKERS=50

# Maximum LLM calls per sample for shopping
SHOPPING_MAX_LLM_CALLS=400

# Model for shopping domain (optional, defaults to BENCHMARK_MODEL)
SHOPPING_MODEL="${BENCHMARK_MODEL}"

# ============================================
# Travel Domain Configuration
# ============================================

# Model for travel domain (optional, defaults to BENCHMARK_MODEL)
TRAVEL_MODEL="${BENCHMARK_MODEL}"

# Language for travel domain: zh, en, or empty for both
TRAVEL_LANGUAGE=""

# Number of parallel workers for travel
TRAVEL_WORKERS=50

# Maximum LLM calls per sample for travel
TRAVEL_MAX_LLM_CALLS=400

# Start point for travel: inference, conversion, evaluation
TRAVEL_START_FROM="inference"

# Output directory for travel (optional, default: results/ in travelplanning directory)
TRAVEL_OUTPUT_DIR=""

# Verbose output for travel
TRAVEL_VERBOSE="false"

# Debug mode for travel
TRAVEL_DEBUG="false"

# ============================================
# Validate Configuration
# ============================================

# Check if models_config.json exists
if [ ! -f "$BASE_DIR/models_config.json" ]; then
    echo "❌ Error: models_config.json not found in $BASE_DIR"
    echo "   Please create models_config.json in the project root directory."
    exit 1
fi

# Load environment variables from .env if the file exists
if [ -f "$BASE_DIR/.env" ]; then
    echo "📝 Loading environment variables from .env"
    set -a
    source "$BASE_DIR/.env"
    set +a
fi

echo ""
echo "╔════════════════════════════════════════════════════════════════════╗"
echo "║              Unified Agent Benchmark Runner                        ║"
echo "╚════════════════════════════════════════════════════════════════════╝"
echo ""
echo "Configuration:"
echo "  Domains:              $DOMAINS"
echo "  Default Model:        $BENCHMARK_MODEL"
echo ""
echo "Shopping Domain:"
echo "  Model:                ${SHOPPING_MODEL}"
echo "  Levels:               ${SHOPPING_LEVELS}"
echo "  Workers:              ${SHOPPING_WORKERS}"
echo "  Max LLM calls:        ${SHOPPING_MAX_LLM_CALLS}"
echo ""
echo "Travel Domain:"
echo "  Model:                ${TRAVEL_MODEL}"
echo "  Language:             ${TRAVEL_LANGUAGE}"
echo "  Workers:              ${TRAVEL_WORKERS}"
echo "  Max LLM calls:        ${TRAVEL_MAX_LLM_CALLS}"
echo "  Start from:           ${TRAVEL_START_FROM}"
echo ""

# ============================================
# Run Benchmarks
# ============================================

DOMAIN_LIST=($DOMAINS)
START_TIME=$(date +%s)

# Build list of unique models from both domains
MODELS_LIST=()
if [[ " ${DOMAIN_LIST[@]} " =~ " shopping " ]]; then
    for model in $SHOPPING_MODEL; do
        if [[ ! " ${MODELS_LIST[@]} " =~ " ${model} " ]]; then
            MODELS_LIST+=("$model")
        fi
    done
fi
if [[ " ${DOMAIN_LIST[@]} " =~ " travel " ]]; then
    for model in $TRAVEL_MODEL; do
        if [[ ! " ${MODELS_LIST[@]} " =~ " ${model} " ]]; then
            MODELS_LIST+=("$model")
        fi
    done
fi

for MODEL in "${MODELS_LIST[@]}"; do
    echo ""
    echo "════════════════════════════════════════════════════════════════════"
    echo "🚀 Starting Benchmark for Model: ${MODEL}"
    echo "════════════════════════════════════════════════════════════════════"
    echo ""
    
    for DOMAIN in "${DOMAIN_LIST[@]}"; do
        if [ "$DOMAIN" = "shopping" ]; then
            DOMAIN_DIR="$BASE_DIR/shoppingplanning"
            DOMAIN_NAME="Shopping Planning"
            # Check if this model should run for shopping domain
            if [[ ! " ${SHOPPING_MODEL} " =~ " ${MODEL} " ]]; then
                echo "⚠️  Skipping ${DOMAIN_NAME} for model ${MODEL} (not in SHOPPING_MODEL list)"
                continue
            fi
            # Set shopping-specific parameters
            DOMAIN_MODEL="$MODEL"
            DOMAIN_LEVELS="$SHOPPING_LEVELS"
            DOMAIN_WORKERS="$SHOPPING_WORKERS"
            DOMAIN_MAX_LLM_CALLS="$SHOPPING_MAX_LLM_CALLS"
        elif [ "$DOMAIN" = "travel" ]; then
            DOMAIN_DIR="$BASE_DIR/travelplanning"
            DOMAIN_NAME="Travel Planning"
            # Check if this model should run for travel domain
            if [[ ! " ${TRAVEL_MODEL} " =~ " ${MODEL} " ]]; then
                echo "⚠️  Skipping ${DOMAIN_NAME} for model ${MODEL} (not in TRAVEL_MODEL list)"
                continue
            fi
            # Set travel-specific parameters
            DOMAIN_MODEL="$MODEL"
            DOMAIN_WORKERS="$TRAVEL_WORKERS"
            DOMAIN_MAX_LLM_CALLS="$TRAVEL_MAX_LLM_CALLS"
        else
            echo "⚠️  Warning: Unknown domain '$DOMAIN', skipping..."
            continue
        fi
        
        if [ ! -d "$DOMAIN_DIR" ]; then
            echo "⚠️  Warning: Domain directory not found: $DOMAIN_DIR, skipping..."
            continue
        fi
        
        echo ""
        echo "────────────────────────────────────────────────────────────────────"
        echo "🔹 Running ${DOMAIN_NAME} Benchmark"
        echo "────────────────────────────────────────────────────────────────────"
        if [ "$DOMAIN" = "shopping" ]; then
            echo "    Model:          ${DOMAIN_MODEL}"
            echo "    Levels:         ${DOMAIN_LEVELS}"
            echo "    Workers:        ${DOMAIN_WORKERS}"
            echo "    Max LLM calls:  ${DOMAIN_MAX_LLM_CALLS}"
        elif [ "$DOMAIN" = "travel" ]; then
            echo "    Model:          ${DOMAIN_MODEL}"
            echo "    Language:       ${TRAVEL_LANGUAGE}"
            echo "    Workers:        ${DOMAIN_WORKERS}"
            echo "    Max LLM calls:  ${DOMAIN_MAX_LLM_CALLS}"
            echo "    Start from:     ${TRAVEL_START_FROM}"
        fi
        echo "────────────────────────────────────────────────────────────────────"
        echo ""
        
        cd "$DOMAIN_DIR"
        
        # Note: models_config.json is automatically loaded from project root
        # Each domain's code will automatically find ../models_config.json
        
        # Export domain-specific environment variables
        export BENCHMARK_MODEL="$DOMAIN_MODEL"
        export BENCHMARK_WORKERS="$DOMAIN_WORKERS"
        export BENCHMARK_MAX_LLM_CALLS="$DOMAIN_MAX_LLM_CALLS"
        
        if [ "$DOMAIN" = "shopping" ]; then
            export BENCHMARK_LEVELS="$DOMAIN_LEVELS"
            export SHOPPING_AGENT_MODEL="$DOMAIN_MODEL"
        elif [ "$DOMAIN" = "travel" ]; then
            # Export BENCHMARK_LANGUAGE (including empty string for "both languages")
            export BENCHMARK_LANGUAGE="$TRAVEL_LANGUAGE"
            export BENCHMARK_START_FROM="$TRAVEL_START_FROM"
            export BENCHMARK_OUTPUT_DIR="$TRAVEL_OUTPUT_DIR"
            export BENCHMARK_VERBOSE="$TRAVEL_VERBOSE"
            export BENCHMARK_DEBUG="$TRAVEL_DEBUG"
            export TRAVEL_AGENT_MODEL="$DOMAIN_MODEL"
        fi
        
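        # Run the domain-specific benchmark script.
        # Capture the exit code with "||" so that "set -e" does not abort before the check below.
        EXIT_CODE=0
        bash run.sh || EXIT_CODE=$?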
        
        if [ $EXIT_CODE -ne 0 ]; then
            echo "❌ ${DOMAIN_NAME} benchmark failed for model ${MODEL}"
            exit 1
        fi
        
        echo ""
        echo "✅ ${DOMAIN_NAME} benchmark completed for ${MODEL}"
        echo ""
        
        cd "$BASE_DIR"
    done
    
    # Aggregate results across domains for this model
    echo ""
    echo "────────────────────────────────────────────────────────────────────"
    echo "📊 Aggregating Results for ${MODEL}"
    echo "────────────────────────────────────────────────────────────────────"
    echo ""
    
    # Pass travel output directory if specified.
    # "|| EXIT_CODE=$?" keeps "set -e" from aborting, so a failed aggregation only warns.
    EXIT_CODE=0
    if [ -n "$TRAVEL_OUTPUT_DIR" ]; then
        python aggregate_results.py --model_name "${MODEL}" --travel-output-dir "$TRAVEL_OUTPUT_DIR" || EXIT_CODE=$?
    else
        python aggregate_results.py --model_name "${MODEL}" || EXIT_CODE=$?
    fi
    
    if [ $EXIT_CODE -ne 0 ]; then
        echo "⚠️  Warning: Result aggregation failed for model ${MODEL}, continuing..."
    else
        echo "✅ Results aggregated for ${MODEL}"
    fi
    
    echo ""
    echo "════════════════════════════════════════════════════════════════════"
    echo "✅ Model ${MODEL} completed all benchmarks"
    echo "════════════════════════════════════════════════════════════════════"
    echo ""
    
    # Sleep between model runs except for the last one
    if [ "${MODEL}" != "${MODELS_LIST[-1]}" ]; then
        echo "⏳ Sleeping 60s before next model..."
        sleep 60
    fi
done

# ============================================
# Final Summary
# ============================================

END_TIME=$(date +%s)
ELAPSED=$((END_TIME - START_TIME))
ELAPSED_MIN=$((ELAPSED / 60))
ELAPSED_SEC=$((ELAPSED % 60))

echo ""
echo "╔════════════════════════════════════════════════════════════════════╗"
echo "║                    Benchmark Completed                             ║"
echo "╚════════════════════════════════════════════════════════════════════╝"
echo ""
echo "Total time: ${ELAPSED}s (${ELAPSED_MIN}m ${ELAPSED_SEC}s)"
echo "Results saved in:"
echo "  - shoppingplanning/result_report/"
echo "  - travelplanning/results/ (or TRAVEL_OUTPUT_DIR, if set)"
echo "  - aggregated_results/"
echo ""
echo "✅ All benchmarks completed successfully!"
echo ""

exit 0



================================================
FILE: benchmark/deepplanning/shoppingplanning/README.md
================================================
## 🛠️ Quick Start

This domain can be run as part of the unified benchmark or independently.

### Step 1: Install Dependencies

**Note:** The unified environment is set up in the project root directory.

```bash
# Navigate to project root (if you're in shoppingplanning/)
cd ..

# Create a new conda environment (recommended Python 3.10)
conda create -n deepplanning python=3.10 -y

# Activate the environment
conda activate deepplanning

# Install all required packages from the unified requirements.txt
pip install -r requirements.txt

# Return to shoppingplanning directory
cd shoppingplanning
```

### Step 2: Download Data Files

**Required Files:**
- `database_zip/database_level1.tar.gz` - Level 1 shopping database
- `database_zip/database_level2.tar.gz` - Level 2 shopping database
- `database_zip/database_level3.tar.gz` - Level 3 shopping database

**Download from:** [HuggingFace Dataset](https://huggingface.co/datasets/Qwen/DeepPlanning)

Download the three archives from HuggingFace and place them in `shoppingplanning/database_zip/`.

### Step 3: Extract Database Files

After downloading, extract the compressed shopping databases:

```bash
# Extract database files for all levels
cd database_zip
tar -xzf database_level1.tar.gz -C ..
tar -xzf database_level2.tar.gz -C ..
tar -xzf database_level3.tar.gz -C ..
cd ..
```
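
If `tar` is not available, the same extraction can be sketched in Python with the standard-library `tarfile` module. The helper name `extract_level_archives` is hypothetical and not part of the benchmark code; it assumes the archive names listed in Step 2 and extracts into the parent directory, as the `-C ..` flag does above.

```python
import tarfile
from pathlib import Path
from typing import Iterable, List


def extract_level_archives(zip_dir: Path, dest_dir: Path,
                           levels: Iterable[int] = (1, 2, 3)) -> List[Path]:
    """Extract database_level{N}.tar.gz for each level into dest_dir."""
    extracted = []
    for level in levels:
        archive = zip_dir / f"database_level{level}.tar.gz"
        if not archive.exists():
            raise FileNotFoundError(f"Missing archive: {archive}")
        # "r:gz" opens a gzip-compressed tar archive for reading
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(path=dest_dir)
        extracted.append(archive)
    return extracted
```

Run from `shoppingplanning/`, a call such as `extract_level_archives(Path("database_zip"), Path("."))` would correspond to the three `tar` commands above. Only extract archives from a source you trust.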

### Step 4: Configure Model Settings

**Note:** Model configuration is shared across all domains and located in the project root.

Edit `models_config.json` in the **project root directory** (one level up from shoppingplanning/):

```json
{
  "models": {
    "qwen-plus": {
      "model_name": "qwen-plus",
      "model_type": "openai",
      "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
      "api_key_env": "DASHSCOPE_API_KEY",
      "temperature": 0.0
    },
    "gpt-4o-2024-11-20": {
      "model_name": "gpt-4o-2024-11-20",
      "model_type": "openai",
      "base_url": "https://api.openai.com/v1",
      "api_key_env": "OPENAI_API_KEY",
      "temperature": 0.0
    }
  }
}
```

**Supported Model Types:**
- `openai`: OpenAI and compatible models (GPT-4, Qwen, DeepSeek, etc.)
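
To illustrate how an entry in this file might be consumed, here is a minimal sketch that looks up one model and reads its API key from the environment variable named by `api_key_env`. The helper `resolve_model_config` is hypothetical; the benchmark's own loader may differ.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict


def resolve_model_config(config_path: Path, model_key: str) -> Dict[str, Any]:
    """Look up one entry under "models" and attach the API key from the environment."""
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    models = config.get("models", {})
    if model_key not in models:
        raise KeyError(f"Model '{model_key}' not found in {config_path}")
    entry = dict(models[model_key])  # copy so the loaded config is not mutated
    env_name = entry["api_key_env"]
    api_key = os.environ.get(env_name)
    if not api_key:
        raise RuntimeError(f"Environment variable {env_name} is not set")
    entry["api_key"] = api_key
    return entry
```

For example, `resolve_model_config(Path("../models_config.json"), "qwen-plus")` would fail early with a clear message if `DASHSCOPE_API_KEY` has not been exported.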

### Step 5: Set API Keys

**Note:** API keys are configured in the project root directory.

Create a `.env` file in the **project root directory** or set environment variables:

```bash
# Option 1: Create .env file in project root
# Navigate to project root
cd ..
cp env.example .env
# Edit .env and add your API keys

# Option 2: Set environment variables directly
export DASHSCOPE_API_KEY="your_dashscope_api_key"
export OPENAI_API_KEY="your_openai_api_key"
```

### Step 6: Run the Benchmark

#### Using Shell Script with Environment Variables (Recommended)

Set environment variables to configure the run:

```bash
SHOPPING_AGENT_MODEL="qwen-plus" \
SHOPPING_LEVELS="1 2 3" \
SHOPPING_WORKERS=50 \
SHOPPING_MAX_LLM_CALLS=400 \
bash run.sh
```

**Available Environment Variables:**
- `SHOPPING_AGENT_MODEL`: Model name(s) from models_config.json (space-separated for multiple models)
- `SHOPPING_LEVELS`: Levels to run (space-separated, e.g., "1 2 3")
- `SHOPPING_WORKERS`: Number of parallel workers
- `SHOPPING_MAX_LLM_CALLS`: Maximum LLM calls per sample

**Or edit default values in `run.sh` for permanent changes:**

Find and modify these lines in `run.sh` (change the values after the last `:-`):

```bash
TEST_LEVELS="${BENCHMARK_LEVELS:-${SHOPPING_LEVELS:-1 2 3}}"           # Change levels
WORKERS="${BENCHMARK_WORKERS:-${SHOPPING_WORKERS:-50}}"                # Change workers  
MAX_LLM_CALLS="${BENCHMARK_MAX_LLM_CALLS:-${SHOPPING_MAX_LLM_CALLS:-400}}"  # Change max LLM calls
SHOPPING_AGENT_MODEL="${BENCHMARK_MODEL:-${SHOPPING_AGENT_MODEL:-qwen-plus}}"  # Change model
```
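
The nested `${A:-${B:-default}}` expansions above resolve in a fixed order: the `BENCHMARK_*` variable wins, then the `SHOPPING_*` variable, then the literal default. The sketch below mirrors that lookup in Python for illustration only; `resolve_setting` is a hypothetical helper and is not used by `run.sh`.

```python
import os


def resolve_setting(primary, fallback, default, environ=None):
    """Mimic the shell expansion ${primary:-${fallback:-default}}."""
    env = os.environ if environ is None else environ
    for name in (primary, fallback):
        value = env.get(name)
        if value:  # ':-' treats an unset variable and an empty one the same way
            return value
    return default


if __name__ == "__main__":
    print(resolve_setting("BENCHMARK_WORKERS", "SHOPPING_WORKERS", "50"))
```

This is why changing the value after the last `:-` only affects runs where neither environment variable is set.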

Then simply run:

```bash
bash run.sh
```

**How it works:**
1. Creates an **isolated database copy** with unique timestamp for each run (e.g., `database_run_qwen-plus_level1_20250105143022_12345/`). This allows multiple concurrent runs without interference.
2. Runs agent inference for all specified models across all levels (sequentially: level 1 → 2 → 3)
3. Moves inference results to `database_infered/` after completion
4. Runs evaluation pipeline for each level
5. Generates evaluation reports in `result_report/` for each level (reports are **always saved**, even if the run is flagged as invalid)
6. Calculates overall statistics across all levels for each model and saves to `result_report/{model_name}_statistics.json`

**Note on concurrent runs:** Each run uses an isolated database directory, so you can safely run multiple benchmarks simultaneously (e.g., testing different models in parallel).
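
As a rough illustration of the isolation step, the sketch below builds a directory name in the same shape as the example above and copies a database into it. It is an assumption-laden sketch rather than the logic in `run.sh`: the function names are hypothetical, and the trailing number is assumed to be a process ID.

```python
import os
import shutil
from datetime import datetime
from pathlib import Path


def make_run_dir_name(model, level, now=None, suffix=None):
    """Build a name like database_run_qwen-plus_level1_20250105143022_12345."""
    now = now or datetime.now()
    # Assumption: the trailing number is the process ID of the run.
    suffix = os.getpid() if suffix is None else suffix
    return f"database_run_{model}_level{level}_{now:%Y%m%d%H%M%S}_{suffix}"


def copy_database_for_run(source, dest_parent, model, level):
    """Copy the extracted database into a fresh, uniquely named directory."""
    target = Path(dest_parent) / make_run_dir_name(model, level)
    shutil.copytree(source, target)  # raises if the target already exists
    return target
```

Because each run writes only inside its own copy, two runs started in the same second still get distinct directories as long as the suffix differs.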


## 🔄 Understanding the Pipeline

The benchmark runs in two main stages:

#### Stage 1: Inference (Agent Planning)
**What it does:** 
- Loads shopping planning tasks from `data/level_{level}_query_meta.json`
- Calls the LLM agent to generate shopping plans
- Agent uses tools to query database (search products, filter, add to cart, etc.)
- Saves agent trajectories and execution logs in `database/case_{id}/`

**Output:**
```
database/
├── case_0/
│   ├── messages.json          # Agent execution traces
│   ├── cart.json              # Final shopping cart
│   └── validation_cases.json  # Ground truth
├── case_1/
│   └── ...
└── ...
```

#### Stage 2: Evaluation
**What it does:**
- Compares agent-generated carts with ground truth
- Calculates accuracy scores (product matching, coupon matching)
- Validates case completion
- Generates evaluation reports

**Output:**
```
result_report/database_{MODEL}_level{LEVEL}_{TIMESTAMP}/
├── summary_report.json        # Overall metrics and statistics
├── case_0_report.json         # Individual case detailed reports
├── case_1_report.json
└── ...                        # One report file per case
```

## 📊 Viewing Results

#### Cross-Level Statistics (Overall Score)

After running all levels for a model, the script automatically calculates overall statistics across all levels. This provides a comprehensive view of model performance across different difficulty levels.

```bash
# View overall statistics for a model
cat result_report/{MODEL}_statistics.json
```

**Example Output:**
```json
{
  "model_name": "qwen-plus",
  "statistics_time": "2026-01-05T12:30:45.123456",
  "levels": {
    "level_1": {
      "folder_name": "database_qwen-plus_level1_202601051200",
      "total_cases": 50,
      "successful_cases": 45,
      "failed_cases": 5,
      "total_matched_products": 200,
      "total_expected_products": 210,
      "total_extra_products": 10,
      "average_case_score": 0.90,
      "overall_match_rate": 0.952,
      "incomplete_cases": 0,
      "incomplete_rate": 0.0,
      "valid": true
    },
    "level_2": {
      "folder_name": "database_qwen-plus_level2_202601051300",
      "total_cases": 50,
      "successful_cases": 30,
      "failed_cases": 20,
      "total_matched_products": 150,
      "total_expected_products": 180,
      "total_extra_products": 25,
      "average_case_score": 0.60,
      "overall_match_rate": 0.833,
      "incomplete_cases": 2,
      "incomplete_rate": 0.04,
      "valid": true
    },
    "level_3": {
      "folder_name": "database_qwen-plus_level3_202601051400",
      "total_cases": 50,
      "successful_cases": 20,
      "failed_cases": 30,
      "total_matched_products": 100,
      "total_expected_products": 200,
      "total_extra_products": 40,
      "average_case_score": 0.40,
      "overall_match_rate": 0.500,
      "incomplete_cases": 5,
      "incomplete_rate": 0.10,
      "valid": true
    }
  },
  "total": {
    "total_cases": 150,
    "successful_cases": 95,
    "failed_cases": 55,
    "total_matched_products": 450,
    "total_expected_products": 590,
    "total_extra_products": 75,
    "successful_rate": 0.6333,
    "match_rate": 0.7627,
    "weighted_average_case_score": 0.6333,
    "incomplete_cases": 7,
    "incomplete_rate": 0.0467,
    "valid": true,
    "levels_completed": [1, 2, 3]
  }
}
```

**Key Metrics Explained:**
- **`successful_rate`**: Fraction (0 to 1) of all cases that achieved perfect scores (all products and coupons matched)
- **`match_rate`** ⭐: Fraction (0 to 1) of expected products that were correctly matched, pooled over all levels. **This is the main metric reported in the paper.**
- **`weighted_average_case_score`** ⭐: Average case score weighted by the number of cases in each level. **This is the main metric reported in the paper.**
- **`levels_completed`**: List of levels included in the statistics
- **`valid`**: Whether the model is considered valid (incomplete_rate ≤ 10% for all levels)

**Note:** Evaluation reports are **always saved** regardless of the `valid` status. This allows for debugging and analysis even when a model has high incomplete rates (e.g., due to early termination or errors). The `valid` flag in the report indicates whether the results should be considered reliable for benchmarking.
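
The `total` block in the example output can be reproduced from the per-level entries. The sketch below shows one way to do that under the definitions given above; `aggregate_levels` is a hypothetical helper, not the function used by the benchmark's statistics script, and the 10% threshold is taken from the description of `valid`.

```python
def aggregate_levels(levels, max_incomplete_rate=0.10):
    """Combine per-level statistics into cross-level totals."""
    stats = list(levels.values())
    total_cases = sum(s["total_cases"] for s in stats)
    successful = sum(s["successful_cases"] for s in stats)
    matched = sum(s["total_matched_products"] for s in stats)
    expected = sum(s["total_expected_products"] for s in stats)
    incomplete = sum(s["incomplete_cases"] for s in stats)
    # Weight each level's average case score by its number of cases.
    weighted = sum(s["average_case_score"] * s["total_cases"] for s in stats)
    return {
        "total_cases": total_cases,
        "successful_rate": round(successful / total_cases, 4) if total_cases else 0.0,
        "match_rate": round(matched / expected, 4) if expected else 0.0,
        "weighted_average_case_score": round(weighted / total_cases, 4) if total_cases else 0.0,
        "incomplete_rate": round(incomplete / total_cases, 4) if total_cases else 0.0,
        # Valid only if every level stays at or below the incomplete-rate threshold.
        "valid": all(s["incomplete_rate"] <= max_incomplete_rate for s in stats),
    }
```

With the three levels from the example (95 of 150 cases successful, 450 of 590 products matched), this yields a `successful_rate` of 0.6333 and a `match_rate` of 0.7627, matching the `total` block.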

#### Level Statistics

```bash
cat result_report/database_{MODEL}_level{LEVEL}_{TIMESTAMP}/summary_report.json
```

**Example Output:**
```json
{
  "evaluation_time": "2026-01-04T12:09:18.522300",
  "overall_statistics": {
    "total_cases": 50,
    "successful_cases": 11,
    "failed_cases": 39,
    "average_score": 0.22,
    "average_case_score": 0.22,
    "max_score": 1.0,
    "min_score": 0.0,
    "total_matched_products": 152,
    "total_expected_products": 215,
    "total_extra_products": 54,
    "overall_match_rate": 0.707,
    "incomplete_cases": 0,
    "incomplete_rate": 0.0,
    "valid": true
  },
  "case_results": [
    {
      "case_name": "case_1",
      "success": false,
      "score": 0.8,
      "matched_count": 4,
      "expected_count": 5,
      "extra_products_count": 1,
      "case_score": 0.0,
      "is_completed": true
    }
  ],
  "detailed_results": [...]
}
```

#### Per-Case Details

```bash
# View detailed report for a specific case
cat result_report/database_{MODEL}_level{LEVEL}_{TIMESTAMP}/case_0_report.json
```

**Example Case Report:**
```json
{
  "case_name": "case_1",
  "evaluation_time": "2026-01-04T12:09:18.174467",
  "summary": {
    "score": 0.8,
    "matched_count": 4,
    "expected_count": 5,
    "extra_products_count": 1,
    "coupon_score": 0.0
  },
  "query": "User shopping query...",
  "matched_products": ["706395e1", "3b5b2e0e", ...],
  "matched_coupons": [],
  "ground_truth_coupons": [],
  "unmatched_ground_truth_products": [...],
  "extra_products": [...],
  "ground_truth_products": [...]
}
```
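
The product fields in the case report can be understood as a set comparison between the cart and the ground truth. The sketch below is inferred from the example reports only, not from the evaluation source: `compare_cart` is a hypothetical helper, it treats products as unique IDs (ignoring quantities), and it leaves out coupon scoring entirely.

```python
def compare_cart(cart_ids, expected_ids):
    """Compare cart product IDs with ground-truth IDs (quantities ignored)."""
    cart, expected = set(cart_ids), set(expected_ids)
    matched = cart & expected
    extra = cart - expected      # in the cart but not expected
    missing = expected - cart    # expected but not in the cart
    score = len(matched) / len(expected) if expected else 0.0
    return {
        "matched_count": len(matched),
        "expected_count": len(expected),
        "extra_products_count": len(extra),
        "score": round(score, 4),
        "matched_products": sorted(matched),
        "unmatched_ground_truth_products": sorted(missing),
        "extra_products": sorted(extra),
    }
```

For a cart that contains four of five expected products plus one unrelated item, this gives a `score` of 0.8 with one extra product, the same shape as the example report.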

## 📝 Notes

- The benchmark automatically manages database initialization per run
- Results are backed up to `database_infered/` after each model inference
- Evaluation reports are saved to `result_report/`
- The script supports running multiple models sequentially with automatic delays between runs



================================================
FILE: benchmark/deepplanning/shoppingplanning/agent/call_llm.py
================================================
"""
Universal LLM calling module
Supports OpenAI-compatible APIs
"""
import json
import os
import time
from pathlib import Path
from typing import List, Dict, Any, Optional

import openai


def load_model_config(model_name: str) -> Dict[str, Any]:
    """
    Load model configuration from models_config.json
    
    Searches for models_config.json in the following order:
    1. Current domain directory (shoppingplanning/)
    2. Parent directory (project root)
    
    Args:
        model_name: Name of the model
        
    Returns:
        Model configuration dict
        
    Raises:
        FileNotFoundError: If config file not found
        ValueError: If model not found in config
    """
    # Try domain directory first
    domain_config_path = Path(__file__).parent.parent / 'models_config.json'
    # Try project root (parent of domain directory)
    root_config_path = Path(__file__).parent.parent.parent / 'models_config.json'
    
    config_path = None
    if domain_config_path.exists():
        config_path = domain_config_path
    elif root_config_path.exists():
        config_path = root_config_path
    else:
        raise FileNotFoundError(
            f"models_config.json not found in:\n"
            f"  - Domain directory: {domain_config_path}\n"
            f"  - Project root: {root_config_path}\n"
            f"Please create models_config.json in the project root or domain directory."
        )
    
    with open(config_path, 'r', encoding='utf-8') as f:
        config = json.load(f)
    
    models = config.get('models', {})
    if model_name not in models:
        available = ', '.join(models.keys())
        raise ValueError(
            f"Model '{model_name}' not found in models_config.json\n"
            f"Available models: {available}"
        )
    
    return models[model_name]


def create_client(model_name: str, model_config: Optional[Dict[str, Any]] = None):
    """
    Create OpenAI client based on model configuration
    
    Args:
        model_name: Name of the model
        model_config: Model configuration (if None, will load from config file)
        
    Returns:
        Initialized OpenAI client instance
    """
    if model_config is None:
        model_config = load_model_config(model_name)
    
    model_type = model_config.get('model_type', 'openai')
    base_url = model_config['base_url']
    api_key_env = model_config.get('api_key_env')
    api_key = os.getenv(api_key_env) if api_key_env else None
    
    if not api_key:
        raise RuntimeError(
            f"API key not found for model '{model_name}'\n"
            f"Please set environment variable: {api_key_env}"
        )
    
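    if model_type == 'openai':
        # OpenAI and OpenAI-compatible APIs (Qwen, DeepSeek, etc.)
        return openai.OpenAI(api_key=api_key, base_url=base_url)

    # Fail loudly instead of implicitly returning None for unknown types
    raise ValueError(f"Unsupported model_type '{model_type}' for model '{model_name}'")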


def call_llm(
    config_name: str,
    messages: List[Dict[str, Any]],
    tools: Optional[List[Dict[str, Any]]] = None
):
    """
    Universal LLM call with automatic client creation and retry logic
    
    Args:
        config_name: Configuration name from models_config.json (display name)
        messages: Message list
        tools: Tool definitions (optional)
    
    Returns:
        API response object
        
    Note:
        All parameters (model_name, temperature, extra_body, etc.) are loaded
        from models_config.json based on the config_name.
    """
    # Load model config and create client
    model_config = load_model_config(config_name)
    client = create_client(config_name, model_config)
    
    # Get actual model name for API call (fallback to config_name if not specified)
    actual_model_name = model_config.get('model_name', config_name)
    
    # Get parameters from config or use defaults
    temperature = model_config.get('temperature', None)
    max_retries = model_config.get('max_retries', 30)
    backoff = model_config.get('backoff', 1.5)
    extra_body = model_config.get('extra_body')  # Get from config
    
    # Detect reasoning models (don't support temperature)
    is_reasoning_model = any(x in actual_model_name.lower() for x in ['o1', 'o3', 'o4-mini', 'reasoner'])
    
    last_err = None
    
    for attempt in range(max_retries):
        try:
            params = {
                "model": actual_model_name,
                "messages": messages,
            }
            
            if tools:
                params["tools"] = tools
            
            # Compare against None so an explicit temperature of 0.0 is still sent
            if not is_reasoning_model and temperature is not None:
                params["temperature"] = temperature
            
            if extra_body:
                params["extra_body"] = extra_body
            response = client.chat.completions.create(**params)
            
            # Validate response
            msg = response.choices[0].message
            has_content = msg.content and msg.content.strip()
            has_tool_calls = hasattr(msg, 'tool_calls') and msg.tool_calls
            
            if not has_content and not has_tool_calls:
                raise ValueError("Model returned an empty response without tool calls")
            
            return response
            
        except Exception as e:
            last_err = e
            
            if attempt == max_retries - 1:
                raise
            
            wait_time = backoff
            print(f"  ⚠️  LLM API error (attempt {attempt + 1}/{max_retries}): {e}")
            print(f"     Retrying in {wait_time:.1f}s...")
            time.sleep(wait_time)
    
    raise last_err if last_err else RuntimeError("LLM API call failed")


================================================
FILE: benchmark/deepplanning/shoppingplanning/agent/prompts.py
================================================

SYSTEM_PROMPT_level1 = """
You are an expert and highly strategic AI Shopping Assistant. Your mission is to understand a user's shopping request and assemble the combination of products that results in the **absolute lowest final price** for the user.

**Core Mission:**
Analyze the user's request, leverage any provided contextual data (about the user and products), and construct the most cost-effective shopping cart. The best strategy is always the one that results in the lowest total cost, period.

**Guiding Principles & Reasoning Workflow:**

1.  **Determine User's Exact Shopping Requirements:** Begin by clearly identifying the user's essential purchase goals. This means establishing the **precise types and quantities of products** they must have. If important details like size or gender are missing from the request, actively reference the user's profile to select the appropriate product variants. Your first priority is to ensure all core product needs are fully satisfied.

2.  **The Ultimate Goal: Absolute Minimum Price via Product Selection:** Your primary objective is to minimize the final bill by finding the most economical products. To achieve this, you must:
    *   **Actively Search for Alternatives:** Scour the available products to find all items that meet the user's core requirements (e.g., "a pair of running shoes, size 42").
    *   **Compare and Select the Cheapest Option:** From all the suitable alternatives you find, your strategy must be to select the product or combination of products that carries the **lowest price tag**.
    *   **Your recommendation must always be the cheapest possible combination of items** that fulfills the user's stated needs. If there are multiple products that serve the same purpose, you must choose the one with the lowest cost to build the final cart.

3.  **Cart as the Single Source of Truth:** All purchases are finalized based on the shopping cart's state. The cart contains the definitive list of products the user will buy, and the final price is calculated solely from the items within it.
    *   **Always verify the current cart status using the `get_cart_info` tool** before making any decisions or providing your final answer.
    *   Your entire strategy and all calculations must be based strictly on the cart's final state. The final combination of items in the cart is what determines the outcome.

4.  **Final Output Requirements:** Provide a comprehensive summary including:
    *   **Final Cart Contents:** An itemized breakdown of all products in the cart.
    *   **Final Calculated Price:** The total cost based on the items in the cart.
    *   **Clear Explanation:** A justification for why this specific combination of products was chosen and how it achieves the lowest possible price while meeting all of the user's requirements.
"""


SYSTEM_PROMPT_level2 = """
You are an expert and highly strategic AI Shopping Assistant. Your mission is to understand a user's shopping request and assemble the combination of products that results in the **absolute lowest final price for the user, while strictly adhering to their specified budget.**

**Core Mission:**
Analyze the user's request, leverage any provided contextual data (about the user, products, and **budget**), and construct the most cost-effective shopping cart. The best strategy is always the one that results in the lowest total cost **within the user's budget**. **Meeting the budget is the primary constraint; minimizing the price is the secondary objective.**

**Guiding Principles & Reasoning Workflow:**

1.  **Determine User's Exact Requirements & Constraints:** Begin by clearly identifying the user's essential goals. This means establishing:
    *   The **precise types and quantities of products** they must have. If important details like size or gender are missing, actively reference the user's profile to select appropriate variants.
    *   **The user's maximum budget.** This budget is a hard limit and your final recommended cart total **must not** exceed it. Your first priority is to find a solution that respects this financial boundary.

2.  **The Ultimate Goal: Cost Optimization Under Budget Constraints:** Your primary objective is to find the most economical combination of products that fulfills all requirements *and* fits within the budget. To achieve this, you must follow this strategic sequence:
    *   **Step A: Explore Feasible Combinations:** Scour available products to find all possible combinations that meet the user's core product requirements (e.g., "a pair of running shoes, size 42" and "a t-shirt, size L").
    *   **Step B: Filter by Budget:** Calculate the total price for each potential combination. Immediately discard any combination whose total price exceeds the user's specified budget.
    *   **Step C: Select the Optimal Solution:** From the remaining combinations that are **within the budget**, your strategy must be to select the one that has the **absolute lowest total price**. This is your final recommendation.
    *   **Step D: Handle Insufficient Budget Scenarios:** If, after exploring all possible combinations, **none** of them meet the budget requirement, you must clearly state this to the user. In this scenario, your recommendation should be the combination with the lowest possible price (even if it's over budget), and you must explicitly explain that the user's budget is insufficient for their requested items and state what the minimum required cost would be.

3.  **Cart as the Single Source of Truth:** All purchases are finalized based on the shopping cart's state. The cart contains the definitive list of products the user will buy, and the final price is calculated solely from the items within it.
    *   **Always verify the current cart status using the `get_cart_info` tool** before making any decisions or providing your final answer.
    *   Your entire strategy and all calculations must be based strictly on the cart's final state. The final combination of items in the cart is what determines the outcome.

4.  **Final Output Requirements:** Provide a comprehensive summary including:
    *   **Final Cart Contents:** An itemized breakdown of all products in the cart.
    *   **Final Calculated Price:** The total cost based on the items in the cart.
    *   **Clear Explanation:** A justification for your choice, explaining:
        *   How this specific combination meets all of the user's product requirements.
        *   How it achieves the lowest possible price **while respecting the given budget**.
        *   **If the budget could not be met, a clear explanation of why, and what the minimum cost would be.**
"""


SYSTEM_PROMPT_level3 = """
You are an expert and highly strategic AI Shopping Assistant. Your mission is to understand a user's shopping request and assemble the combination of **products and coupons** that results in the **absolute lowest final price for the user,** while also adhering to any specified budget.

**Core Mission:**
Analyze the user's request, leverage any provided contextual data (about the user, products, coupons, and budget), and construct the most cost-effective shopping cart. The best strategy is always the one that results in the lowest total cost. **Minimizing the price is the primary objective; meeting the budget is a secondary constraint.**

**Guiding Principles & Reasoning Workflow:**

**1. Determine User's Exact Requirements & Constraints:**
Begin by clearly identifying the user's essential goals. This means establishing:
*   The **precise types and quantities of products** they must have. If important details like size or gender are missing, actively reference the user's profile to select appropriate variants.
*   The **user's maximum budget,** if provided. This budget is a hard limit that should be respected.
*   The **user's available coupons** by reviewing their profile information. This is critical for calculating potential discounts.

**2. The Ultimate Goal: Absolute Minimum Price**
Your primary objective is to find the single most economical path to fulfilling the user's needs. This requires a holistic evaluation of all possible scenarios involving both products and coupons.

*   **Step A: Explore Feasible Combinations:** Scour available products to find all possible combinations that meet the user's core product requirements. This includes strategically selecting different versions of required products (e.g., choosing a slightly more expensive item) if it enables the use of a more valuable coupon that results in a lower overall final price. 

*   **Step B: Apply Coupon Logic & Calculate Scenarios:** For each potential product combination, calculate the final price by testing various coupon strategies to find the maximum possible discount. You must follow these rules strictly:

    *   **Coupon Application Logic:**
        *   **Prerequisites:** Before applying any coupon, verify that the user owns it and has a sufficient quantity.
        *   **Scope:** Each coupon applies to a specific price scope. Crucially, **`Cross-store` coupons apply to the entire cart's total price**, regardless of the brands involved, as long as the total meets the threshold. `Same-brand` coupons apply *only* to the subtotal of items from a single, matching brand.
        *   **Threshold:** A coupon can only be used if its relevant price scope (e.g., cart total for a cross-store coupon) meets or exceeds the coupon's threshold.
        *   **Stacking:** Multiple different coupons can be applied together, provided the relevant price scope for **each coupon individually** meets its own threshold after prior discounts are considered. When a same-brand coupon is applied, its discounted amount is deducted from the overall cart total before evaluating cross-store coupons.

    *   **Coupon Application Examples:**
        *   **Example 1: Comparing Different Strategies**
            *   Imagine a cart totals ¥1300 (¥1000 from Brand A, ¥300 from Brand B). The user owns one "Cross-store: ¥200 off every ¥1,200" coupon and two "Same-brand: ¥60 off every ¥400" coupons.
            *   *Evaluation:*
                *   **Strategy A (Use Cross-store):** The total cart price (¥1300) meets the ¥1200 threshold. Applying this gives a **¥200 discount**.
                *   **Strategy B (Use Same-brand only):** The Brand A subtotal (¥1000) meets the ¥400 threshold twice (¥1000 > ¥800). Applying two same-brand coupons gives 2 × ¥60 = **¥120 discount**.
            *   *Conclusion:* The ¥200 discount is greater. The optimal strategy is to use only the cross-store coupon.

        *   **Example 2: Stacking Coupons**
            *   Imagine a cart totals ¥1610 (¥1200 from Brand A, ¥410 from Brand B). The user has the same coupons.
            *   *Evaluation:* The total cart price (¥1610) exceeds the cross-store coupon threshold (¥1200), allowing a **¥200 discount**. After applying this to ¥1200 worth of items, ¥410 remains in the cart (from Brand B). This remaining amount exceeds the same-brand coupon threshold (¥410 > ¥400), so one "Same-brand: ¥60 off every ¥400" coupon can be applied for an additional **¥60 discount**.
            *   *Conclusion:* The optimal strategy is to stack both. Total discount: ¥200 + ¥60 = **¥260**.

        *   **Example 3: Same-brand Scope Limitations**
            *   Imagine a cart totals ¥500 (¥250 from Brand A, ¥250 from Brand B) and the user owns two "Same-brand: ¥25 off every ¥200" coupons.
            *   *Evaluation:* Brand A's subtotal (¥250) meets the ¥200 threshold once, and Brand B's subtotal (¥250) also meets it once. One coupon can be used on each brand's items. Total discount: ¥25 + ¥25 = **¥50**.

*   **Step C: Select the Optimal Solution:**
    *   From the remaining combinations that are **within the budget**, select the one with the **absolute lowest total price**. This is your final recommendation.
    *   **If no combination meets the budget**, you must clearly state this. Your recommendation should then be the combination with the absolute lowest possible price (even if it's over budget), and you must explain that the user's budget is insufficient and state what the minimum required cost would be.

**3. Cart as the Single Source of Truth:**
All purchases are finalized based on the shopping cart's state. The cart contains the definitive list of products and coupons the user will use, and the final price is calculated solely from its contents.
*   **Always verify the current cart status using the `get_cart_info` tool** before making a final decision.
*   Your entire strategy must be based strictly on the cart's final state. This includes ensuring that **any coupons you intend to use are added to the cart** for the calculations to be valid. The final combination of items and coupon usage in the cart determines the outcome.

**4. Final Output Requirements:**
Provide a comprehensive summary including:
*   **Final Cart Contents:** An itemized breakdown of all products in the cart.
*   **Optimal Coupon Usage Plan:** A clear list of coupons used and detailed calculations showing how the discount was derived.
*   **Final Calculated Price:** The total cost after all discounts have been applied.
*   **Clear Explanation:** A justification for your choice, explaining:
    *   How this combination meets all of the user's product requirements.
    *   How it achieves the lowest possible price through strategic product selection and coupon application.
"""

# Create a namespace object to hold all prompts for easy access
class PromptLib:
    """Namespace for all system prompts"""
    pass

prompt_lib = PromptLib()
prompt_lib.SYSTEM_PROMPT_level1 = SYSTEM_PROMPT_level1
prompt_lib.SYSTEM_PROMPT_level2 = SYSTEM_PROMPT_level2
prompt_lib.SYSTEM_PROMPT_level3 = SYSTEM_PROMPT_level3

================================================
FILE: benchmark/deepplanning/shoppingplanning/agent/shopping_agent.py
================================================
"""
Custom Agent implementation - Framework-independent

Uses universal LLM calling for multiple providers
"""

import json
import os
import sys
import time
import uuid
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List, Optional
from concurrent.futures import ThreadPoolExecutor, as_completed
import threading
from threading import Lock

try:
    from .call_llm import call_llm
except ImportError:
    from call_llm import call_llm




class ShoppingFnAgent:
    """
    Lightweight function-calling Agent (shopping scenario):
    - Loads shopping_tool_schema.json as OpenAI Chat Completions tools
    - Dynamically loads tool classes (BaseShoppingTool subclasses) from shopping_tools directory
    - Iteratively calls LLM and executes tool_calls until final answer
    """

    def __init__(self,
                 model: str | None = None,
                 tool_schema_path: str | None = None,
                 base_url: str | None = None,
                 api_key: str | None = None,
                 sample_id: str | None = None,
                 database_base_path: str | None = None) -> None:
        """
        Initialize Agent
        
        Args:
            model: Model name (must exist in models_config.json)
            tool_schema_path: Path to tool schema JSON file
            base_url: Base URL for API (deprecated, loaded from models_config.json)
            api_key: API key (deprecated, loaded from models_config.json)
            sample_id: Sample ID for database path resolution
            database_base_path: Base path to database directory
        """
        self._load_env_from_dotenv()

        self.model = model or os.getenv("TOOLS_AGENT_MODEL", "qwen-plus")
        # The tools directory sits next to agent/, one level above this file's directory
        default_schema = Path(__file__).resolve().parent.parent / 'tools' / 'shopping_tool_schema.json'
        self.tool_schema_path = tool_schema_path or os.getenv("SHOPPING_SCHEMA_PATH", str(default_schema))

        self.sample_id = sample_id
        if database_base_path:
            self.database_base_path = Path(database_base_path)
        else:
            # Default path: shoppingplanning/database (one level above agent/)
            project_root = Path(__file__).resolve().parent.parent
            self.database_base_path = project_root / 'database'

        # Fail fast if the schema file is missing, before anything tries to read it
        if not Path(self.tool_schema_path).exists():
            raise FileNotFoundError(f"Tool schema not found: {self.tool_schema_path}")

        self.tool_config = self._build_tool_config()
        self.tools_schema = self._load_tool_schemas()
        self.openai_tools = self._build_openai_tools(self.tools_schema)
        self.tool_instances = self._load_tool_instances()

    def _build_tool_config(self) -> Dict[str, Any]:
        """
        Build tool configuration with database path.
        All shopping tools use the same products.jsonl file, simplifying the logic.
        """
        cfg = {}
        if self.sample_id is not None:
            # Shopping scenario database path structure: database/case_{sample_id}/products.jsonl
            db_path = self.database_base_path / f'case_{self.sample_id}'
            
            if db_path.exists():
                cfg['database_path'] = str(db_path)
            else:
                if os.getenv('DEBUG_TOOLS') == '1':
                    print(f"[ShoppingFnAgent] WARN: Database not found for case {self.sample_id}: {db_path}")
        return cfg
    
    def _load_tool_instances(self) -> Dict[str, Any]:
        """
        Dynamically load tool instances from TOOL_REGISTRY.
        
        Tool registration mechanism:
        1. Tool classes use the @register_tool('tool_name') decorator
        2. The decorator executes at class definition time, registering the tool class to base_shopping_tool.TOOL_REGISTRY
        3. When importing the tools package, __init__.py imports all tool modules, triggering decorator execution
        4. Retrieve registered tool classes from TOOL_REGISTRY and instantiate them
        """
        instances: Dict[str, Any] = {}

        tools_dir = Path(__file__).resolve().parent.parent / 'tools'
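        # Put tools_dir on sys.path so tool files can do 'from base_shopping_tool import ...',
        # and its parent so 'import tools' resolves. Skip entries that are already present,
        # because one agent is created per sample and sys.path would otherwise keep growing.
        for path_entry in (str(tools_dir), str(tools_dir.parent)):
            if path_entry not in sys.path:
                sys.path.insert(0, path_entry)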

        # Import tools package to trigger @register_tool decorator execution for all tool modules
        # tools/__init__.py imports all tool modules, and decorators register tool classes to TOOL_REGISTRY
        try:
            import tools  # noqa: F401
        except Exception as e:
            if os.getenv('DEBUG_TOOLS') == '1':
                print(f"[ShoppingFnAgent] WARN: import tools failed: {e}")
            return instances

        # Get TOOL_REGISTRY from base_shopping_tool module
        try:
            import base_shopping_tool  # type: ignore
            tool_registry = getattr(base_shopping_tool, 'TOOL_REGISTRY', None)
            if tool_registry is None:
                if os.getenv('DEBUG_TOOLS') == '1':
                    print("[ShoppingFnAgent] WARN: TOOL_REGISTRY not found in base_shopping_tool")
                return instances
        except Exception as e:
            if os.getenv('DEBUG_TOOLS') == '1':
                print(f"[ShoppingFnAgent] WARN: import base_shopping_tool failed: {e}")
            return instances

        if not tool_registry:
            print("[ShoppingFnAgent] WARN: TOOL_REGISTRY is empty. No tools were registered.")
            return instances

        # Create tool instances from TOOL_REGISTRY
        tool_cfg = self.tool_config
        for tool_name, tool_cls in tool_registry.items():
            try:
                inst = tool_cls(cfg=tool_cfg)
                instances[tool_name] = inst
            except Exception as e:
                if os.getenv('DEBUG_TOOLS') == '1':
                    print(f"[ShoppingFnAgent] WARN: Failed to instantiate tool '{tool_name}': {e}")
                continue

        return instances

    def _load_env_from_dotenv(self) -> None:
        """
        Load environment variables from a .env file.

        Lookup order (only the first file found is read):
        1. Project root (parent of the domain directory)
        2. Domain directory (shoppingplanning/)

        Variables that already exist in os.environ are never overwritten.
        """
        try:
            # Candidate in the domain directory
            domain_root = Path(__file__).resolve().parent.parent
            domain_dotenv = domain_root / '.env'

            # Candidate in the project root
            project_root = domain_root.parent
            project_dotenv = project_root / '.env'

            # Prefer the project root .env if it exists, otherwise fall back to the domain .env
            dotenv_path = project_dotenv if project_dotenv.exists() else domain_dotenv
            
            if not dotenv_path.exists():
                return
            
            for line in dotenv_path.read_text(encoding='utf-8').splitlines():
                line = line.strip()
                if not line or line.startswith('#') or '=' not in line:
                    continue
                key, val = line.split('=', 1)
                key = key.strip()
                val = val.strip().strip('"').strip("'")
                if key and (key not in os.environ):
                    os.environ[key] = val
        except Exception:
            pass

    def _load_tool_schemas(self) -> List[Dict[str, Any]]:
        """Load tool schemas from JSON file"""
        with open(self.tool_schema_path, 'r', encoding='utf-8') as f:
            return json.load(f)

    def _build_openai_tools(self, schemas: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """
        Build OpenAI tools format
        - If schema is already {type:function, function:{...}}, use as-is
        - Otherwise wrap as function definition
        """
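        tools: List[Dict[str, Any]] = []
        for s in schemas:
            if not isinstance(s, dict):
                continue
            if s.get('type') == 'function' and isinstance(s.get('function'), dict):
                tools.append(s)
            else:
                # Bare function definition: wrap it in the OpenAI tool envelope
                tools.append({'type': 'function', 'function': s})
        return tools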

    def _exec_tool(self, name: str, arguments_json: str) -> str:
        """Execute tool call"""
        inst = self.tool_instances.get(name)
        if not inst:
            return json.dumps({"error": f"tool '{name}' not found"}, ensure_ascii=False)
        try:
            res = inst.call(arguments_json)  # Pass raw JSON string
            return res if isinstance(res, str) else json.dumps(res, ensure_ascii=False)
        except Exception as e:
            return json.dumps({"error": str(e)}, ensure_ascii=False)

    def _call_llm(self, messages: List[Dict[str, Any]], tools: Optional[List[Dict[str, Any]]] = None):
        """Call LLM with unified handling for all models"""
        return call_llm(
            config_name=self.model,
            messages=messages,
            tools=tools
        )

    def _detect_tool_calls(self, assistant_message) -> List[Dict[str, Any]]:
        """Detect and normalize tool calls"""
        tool_calls = getattr(assistant_message, 'tool_calls', None)
        calls: List[Dict[str, Any]] = []
        if not tool_calls:
            return calls
        
        for tc in tool_calls:
            try:
                # Generate unique ID if not provided by the model
                tool_call_id = tc.id or f"call_{uuid.uuid4().hex[:24]}"
                
                calls.append({
                    'id': tool_call_id,
                    'name': tc.function.name,
                    'arguments': tc.function.arguments,
                })
            except Exception:
                continue
        
        return calls

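    def _add_to_cart(self, history_messages: List[Any]) -> List[Any]:
        """
        Return a copy of the history with a follow-up user message appended.

        The message asks the model to verify the cart and add any missing items.
        This method does not modify the cart itself.
        """
        history_messages = list(history_messages)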
        history_messages.append({
            "role": "user",
            "content": (
                "Check whether the items in the shopping cart meet the requirements. "
                "If not, add the required items to the cart. If there are multiple possible solutions, "
                "choose the optimal one. The final result should be based on the items in the cart. "
                "If the task is already complete, then stop."
            )
        })
        return history_messages

    def run(
        self,
        user_query: str,
        system_prompt: Optional[str] = None,
        max_llm_calls: int = 100,
        save_messages: bool = True,
        messages_output_dir: Optional[str] = None,
        sample_id: Optional[str] = None,
    ) -> List[Any]:
        """
        Agent main loop: Call LLM → Execute tools → Repeat until final answer
        
        Args:
            user_query: User query
            system_prompt: System prompt
            max_llm_calls: Maximum LLM calls
            save_messages: Whether to save messages to file
            messages_output_dir: Output directory for messages (if sample_id not provided)
            sample_id: Sample ID for database path resolution
            
        Returns:
            Complete message history
        """
        if save_messages:
            # If sample_id exists, save to {database_base_path}/case_{sample_id}/messages.json
            # Use self.database_base_path for proper isolation when running concurrent instances
            if sample_id:
                db_case_dir = self.database_base_path / f'case_{sample_id}'
                db_case_dir.mkdir(parents=True, exist_ok=True)
                messages_file = db_case_dir / 'messages.json'
            else:
                # Otherwise fallback to result/messages
                msg_dir = Path(messages_output_dir or (Path(__file__).resolve().parent.parent / 'result' / 'messages'))
                msg_dir.mkdir(parents=True, exist_ok=True)
                ts = datetime.now().strftime("%Y%m%d_%H%M%S")
                messages_file = msg_dir / f'messages_{ts}.json'

        messages: List[Any] = ([{"role": "system", "content": system_prompt}] if system_prompt else []) + [{"role": "user", "content": user_query}]
        if save_messages:
            self._save_messages(messages, messages_file, 0, "Initial messages")

        def run_tool_loop(history: List[Any]) -> None:
            """Call the LLM and execute requested tools until it answers without tool calls or the call budget is spent."""
            for step_count in range(1, max_llm_calls + 1):
                resp = self._call_llm(messages=history, tools=self.openai_tools)
                msg = resp.choices[0].message

                # Convert message object to serializable dict
                msg_dict = {
                    "role": "assistant",
                    "content": msg.content or '',
                }

                # Preserve reasoning_content if present
                if hasattr(msg, 'reasoning_content') and msg.reasoning_content:
                    msg_dict['reasoning_content'] = msg.reasoning_content

                calls = self._detect_tool_calls(msg)
                if calls:
                    msg_dict["tool_calls"] = [
                        {
                            'id': call['id'],
                            'type': 'function',
                            'function': {
                                'name': call['name'],
                                'arguments': call['arguments']
                            }
                        }
                        for call in calls
                    ]

                history.append(msg_dict)
                if save_messages:
                    self._save_messages(history, messages_file, step_count, f"LLM response - {len(calls)} tool calls")

                if not calls:
                    return

                for call in calls:
                    tool_result = self._exec_tool(call['name'], call['arguments'])
                    history.append({"role": "tool", "tool_call_id": call['id'], "content": tool_result})
                if save_messages:
                    self._save_messages(history, messages_file, step_count, f"Tool execution completed - {len(calls)} tools")

        # Phase 1: let the model work on the user query
        run_tool_loop(messages)

        # Phase 2: ask the model to verify the cart, then run the same loop again
        messages = self._add_to_cart(messages)
        run_tool_loop(messages)

        return messages
    
    def _save_messages(self, messages: List[Any], filepath: Path, step: int, description: str):
        """Save messages to file"""
        serializable_messages = [m.model_dump() if hasattr(m, 'model_dump') else m for m in messages]
        save_data = {"step": step, "description": description, "messages": serializable_messages}
        try:
            with open(filepath, 'w', encoding='utf-8') as f:
                json.dump(save_data, f, ensure_ascii=False, indent=2)
            thread_info = threading.current_thread().name
            print(f"  💾 [{thread_info}] Step {step}: {description} - Saved {len(messages)} messages")
        except Exception as e:
            thread_info = threading.current_thread().name
            print(f"  ⚠️  [{thread_info}] Failed to save messages: {e}")
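

# --- Illustrative sketch (not part of the original agent) --------------------
# The registration mechanism described in ShoppingFnAgent._load_tool_instances
# can be pictured roughly as below. This is a minimal sketch under assumptions:
# the real decorator and registry live in tools/base_shopping_tool.py, which is
# not shown here, so every `_example_*` / `_Example*` name is hypothetical.
_EXAMPLE_TOOL_REGISTRY: dict = {}


def _example_register_tool(name: str):
    """Hypothetical stand-in for @register_tool: record the class under `name`."""
    def decorator(cls):
        _EXAMPLE_TOOL_REGISTRY[name] = cls  # runs once, at class definition time
        return cls
    return decorator


@_example_register_tool('example_echo')
class _ExampleEchoTool:
    """Hypothetical tool mirroring the cfg= / call() shape the agent relies on."""

    def __init__(self, cfg=None):
        self.cfg = cfg or {}

    def call(self, arguments_json: str) -> str:
        # Echo the raw JSON string back, just to show the call contract
        return arguments_json


# Instantiate every registered class, as the agent does with its own registry
_example_instances = {n: c(cfg={}) for n, c in _EXAMPLE_TOOL_REGISTRY.items()}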


def run_agent_inference(
    model: str,
    test_data_path: Path,
    database_dir: Path,
    tool_schema_path: Path,
    system_prompt: str,
    workers: int = 10,
    max_llm_calls: int = 100,
    rerun_ids: Optional[List[int]] = None,
) -> Dict[str, Any]:
    """
    Run agent inference (batch processing)
    
    Args:
        model: Configuration name from models_config.json
        test_data_path: Path to test data JSON file
        database_dir: Base path to database directory
        tool_schema_path: Path to tool schema JSON file
        system_prompt: System prompt for the agent
        workers: Number of parallel workers
        max_llm_calls: Maximum LLM calls per sample
        rerun_ids: Optional list of specific IDs to rerun. If None, run all samples.
    
    Returns:
        Results summary dict
    """
    with open(test_data_path, 'r', encoding='utf-8') as f:
        test_data = json.load(f)
    
    # Filter samples if rerun_ids is specified
    if rerun_ids is not None:
        rerun_ids_set = {str(rerun_id) for rerun_id in rerun_ids}  # Compare IDs as strings
        original_count = len(test_data)
        test_data = [s for s in test_data if str(s.get('id')) in rerun_ids_set]
        print(f"  🔄 Filtered {original_count} samples to {len(test_data)} samples for rerun")
        
        if len(test_data) == 0:
            print("  ⚠️  Warning: No samples found matching the specified IDs")
            return {
                'total': 0,
                'success': 0,
                'failed': 0,
                'elapsed_time': 0,
                'results': []
            }
    
    print(f"\n{'='*80}")
    print(f"Agent Inference")
    print(f"{'='*80}")
    print(f"Model: {model}")
    print(f"Samples: {len(test_data)}")
    print(f"Workers: {workers}")
    print(f"{'='*80}\n")
    
    print_lock = Lock()
    results = []
    
    def process_sample(sample):
        sample_id = sample.get('id', 'unknown')
        query = sample.get('query', '')
        
        try:
            
            agent = ShoppingFnAgent(
                model=model,
                sample_id=str(sample_id),
                database_base_path=str(database_dir),
                tool_schema_path=str(tool_schema_path)
            )
            
            start_time = time.time()
            
            messages = agent.run(
                user_query=query,
                system_prompt=system_prompt,
                save_messages=True,
                sample_id=str(sample_id),
                max_llm_calls=max_llm_calls
            )
            
            elapsed = time.time() - start_time
            
            result = {
                'id': sample_id,
                'query': query,
                'model': model,
                'messages': messages,
                'elapsed_time': elapsed,
                'success': True,
            }
            
            with print_lock:
                print(f"✅ Sample {sample_id} completed in {elapsed:.2f}s")
            
            return result
            
        except Exception as e:
            with print_lock:
                print(f"❌ Sample {sample_id} failed: {e}")
                import traceback
                traceback.print_exc()
            
            return {
                'id': sample_id,
                'query': query,
                'success': False,
                'error': str(e),
            }
    
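    batch_start = time.time()
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [executor.submit(process_sample, sample) for sample in test_data]
        for future in as_completed(futures):
            # process_sample catches its own exceptions, so result() returns a dict either way
            result = future.result()
            results.append(result)
    
    success_count = sum(1 for r in results if r['success'])
    
    return {
        'total': len(results),
        'success': success_count,
        'failed': len(results) - success_count,
        # Wall-clock time for the whole batch; matches the key in the early-return summary
        'elapsed_time': time.time() - batch_start,
        'results': results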
    }
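

# --- Illustrative sketch (not part of the original module) -------------------
# run_agent_inference fans samples out to a thread pool and collects results in
# completion order. Below is a minimal, self-contained version of that pattern.
# `_example_work` and `_example_run_batch` are hypothetical names; the worker
# stands in for process_sample and does no LLM calls.
from concurrent.futures import ThreadPoolExecutor, as_completed


def _example_work(sample: dict) -> dict:
    """Hypothetical worker: never raises, reports failure in the result dict."""
    try:
        return {'id': sample['id'], 'success': True, 'length': len(sample['query'])}
    except Exception as e:
        return {'id': sample.get('id', 'unknown'), 'success': False, 'error': str(e)}


def _example_run_batch(samples: list, workers: int = 2) -> dict:
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(_example_work, s) for s in samples]
        for future in as_completed(futures):
            results.append(future.result())  # completion order, not input order
    ok = sum(1 for r in results if r['success'])
    return {'total': len(results), 'success': ok, 'failed': len(results) - ok, 'results': results}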


if __name__ == '__main__':
    # Simple smoke test: run the agent over one difficulty level
    import argparse
    
    parser = argparse.ArgumentParser()
    parser.add_argument('--model', default='qwen-plus', help='Configuration name from models_config.json')
    parser.add_argument('--level', type=int, default=1, choices=[1, 2, 3], help='Shopping level: 1, 2, or 3')
    args = parser.parse_args()
    
    base_dir = Path(__file__).resolve().parent.parent
    test_output_dir = base_dir / 'results' / 'test'
    
    # Get system prompt for the specified level
    try:
        from .prompts import prompt_lib
    except ImportError:
        from prompts import prompt_lib
    
    system_prompt = getattr(prompt_lib, f'SYSTEM_PROMPT_level{args.level}', None)
    if system_prompt is None:
        raise ValueError(f"System prompt for level {args.level} not found")
    
    result = run_agent_inference(
        model=args.model,
        test_data_path=base_dir / 'data' / f'level_{args.level}_query_meta.json',
        database_dir=base_dir / 'database',
        tool_schema_path=base_dir / 'tools' / 'shopping_tool_schema.json',
        system_prompt=system_prompt,
        workers=2,
        max_llm_calls=100,
    )
    print(f"\nTest completed: {result['success']}/{result['total']} succeeded")
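

# --- Illustrative sketch (not part of the original module) -------------------
# The line-parsing rule used by ShoppingFnAgent._load_env_from_dotenv, pulled
# out as a pure function so it can be checked without touching os.environ.
# `_example_parse_dotenv` is a hypothetical helper name, not an existing API.
# Unlike the agent method, it returns a dict (the last duplicate key wins)
# instead of writing only the keys that are missing from os.environ.
def _example_parse_dotenv(text: str) -> dict:
    parsed = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue  # skip blanks, comments, and lines without key=value
        key, val = line.split('=', 1)  # split on the first '=' only
        key = key.strip()
        if key:
            # Trim whitespace, then surrounding double and single quotes
            parsed[key] = val.strip().strip('"').strip("'")
    return parsed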



================================================
FILE: benchmark/deepplanning/shoppingplanning/data/level_1_query_meta.json
================================================
[
    {
        "id": "1",
        "query": "I'm putting together a complete footwear collection and need to order several specific items online. First, I'm looking for something from Nike in orange that has strong customer satisfaction - it needs fewer than 10 one-star reviews and more than 300 four-star reviews to ensure quality. Next, I need the Men's Puma RS-X Reinvention Classic White Sneakers from Puma, and since I need them quickly, the transport time must be less than 2 days. This item should have more than 3000 total reviews but fewer than 30 two-star reviews to confirm it's well-received. I also need an all-seasons product that can arrive within 1 day and has fewer than 30 two-star reviews for reliability. Additionally, I'm specifically looking for the Men's Aerios FL 2 GTX Trail Shoe in gold, which must have more than 200 five-star reviews and fewer than 5 one-star reviews to guarantee excellent quality. Finally, I need a summer item from Vans that's highly rated with more than 250 five-star reviews and fewer than 5 one-star reviews."
    },
    {
        "id": "2",
        "query": "I'm getting all my gear ready for an upcoming outdoor trip. First, I'm looking for a very specific item: the 'Men's Atom LT Insulated Crew Neck Pullover' from Arc'teryx in Navy Blue. I'm only interested if it's highly rated, with an average score over 4.5, more than 50 four-star ratings, and fewer than 5 one-star ratings. Next, I need something from New Balance

SYMBOL INDEX (1125 symbols across 206 files)

FILE: benchmark/code_interpreter/code_interpreter.py
  function fix_matplotlib_cjk_font_issue (line 36) | def fix_matplotlib_cjk_font_issue():
  function start_kernel (line 44) | def start_kernel(pid):
  function escape_ansi (line 91) | def escape_ansi(line):
  function publish_image_to_local (line 96) | def publish_image_to_local(image_base64: str):
  function code_interpreter (line 140) | def code_interpreter(action_input_list: list, timeout=30, clear=False):
  function _code_interpreter (line 157) | def _code_interpreter(code: str, timeout, clear=False):
  function get_multiline_input (line 226) | def get_multiline_input(hint):

FILE: benchmark/code_interpreter/config.py
  function get_react_prompt (line 49) | def get_react_prompt(model_name, query, lang, upload_fname_list):
  function get_react_parser (line 54) | def get_react_parser(model_name):
  function get_model (line 59) | def get_model(model_name):

FILE: benchmark/code_interpreter/inference_and_execute.py
  function llm_with_plugin (line 43) | def llm_with_plugin(args, query, item=None, exec_limit=3):
  function text_completion (line 86) | def text_completion(llm, input_text, stop_words=[]):
  function call_tool (line 98) | def call_tool(plugin_name, plugin_args_list, clear=False):
  function process_code_interpreter (line 106) | def process_code_interpreter(item, writer):
  function process_gsm8k (line 116) | def process_gsm8k(doc, writer):
  function sequential_processing (line 127) | def sequential_processing(args, data_list, process_func, writer):
  function gather_eval_result (line 135) | def gather_eval_result(model_name):
  function eval_metrics (line 148) | def eval_metrics(args, test_set, full_output_fname):
  function main (line 166) | def main(args):
  function parse_args (line 206) | def parse_args():

FILE: benchmark/code_interpreter/metrics/code_execution.py
  function exec_limit_time (line 49) | def exec_limit_time(text):
  function exec_code (line 53) | def exec_code(text, timelimit=False):
  function postprocess_code (line 60) | def postprocess_code(gen_code, line):
  function get_action_input_code (line 81) | def get_action_input_code(text, model_name='qwen-14b-chat', extract_firs...
  function eval_code_execution_rate (line 100) | def eval_code_execution_rate(output_fname,
  function log_result (line 184) | def log_result(data_list, verbose=True):

FILE: benchmark/code_interpreter/metrics/gsm8k.py
  function extract_answer (line 11) | def extract_answer(completion):
  function is_correct (line 32) | def is_correct(completion, answer):
  function eval_gsm8k_acc (line 38) | def eval_gsm8k_acc(output_fname):

FILE: benchmark/code_interpreter/metrics/visualization.py
  function encode_image (line 26) | def encode_image(image_path):
  function judger_model_inference (line 32) | def judger_model_inference(judger_model_name, judger_model, imgs=[], pro...
  function extract_images (line 73) | def extract_images(text):
  function check_images_observation (line 84) | def check_images_observation(text, images, model_name):
  function eval_visualization_acc (line 107) | def eval_visualization_acc(output_fname, model_name, judger_model_name='...

FILE: benchmark/code_interpreter/models/base.py
  class HFModel (line 5) | class HFModel(object):
    method __init__ (line 7) | def __init__(self, model_path):

FILE: benchmark/code_interpreter/models/dashscope.py
  class QwenDashscopeVLModel (line 9) | class QwenDashscopeVLModel(object):
    method __init__ (line 11) | def __init__(self, model, api_key):
    method generate (line 16) | def generate(self, prompt, stop_words=[]):

FILE: benchmark/code_interpreter/models/llm.py
  class LLM (line 5) | class LLM(HFModel):
    method __init__ (line 7) | def __init__(self, model_path):
    method generate (line 10) | def generate(self, input_text, stop_words=[], max_new_tokens=512):

FILE: benchmark/code_interpreter/models/qwen.py
  class Qwen (line 5) | class Qwen(HFModel):
    method __init__ (line 7) | def __init__(self, model_path):
    method generate (line 10) | def generate(self, input_text, stop_words=[]):
  class QwenVL (line 26) | class QwenVL(HFModel):
    method __init__ (line 28) | def __init__(self, model_path):
    method generate (line 31) | def generate(self, inputs: list):

FILE: benchmark/code_interpreter/parser/internlm_parser.py
  class InternLMReActParser (line 4) | class InternLMReActParser(ReActParser):
    method __init__ (line 6) | def __init__(self):

FILE: benchmark/code_interpreter/parser/react_parser.py
  class ReActParser (line 1) | class ReActParser(object):
    method __init__ (line 3) | def __init__(self):
    method parse_latest_plugin_call (line 10) | def parse_latest_plugin_call(self, text):
    method _extract_first_target (line 29) | def _extract_first_target(self, text, start_flag, end_flag):
    method get_first_observation (line 40) | def get_first_observation(self, text):
    method get_first_action_input (line 43) | def get_first_action_input(self, text):

FILE: benchmark/code_interpreter/prompt/internlm_react.py
  class InternLMReAct (line 66) | class InternLMReAct(ReAct):
    method __init__ (line 68) | def __init__(self, query, lang='en', upload_file_paths=[]):
    method build_prompt (line 72) | def build_prompt(self):
    method _build_tools_text (line 88) | def _build_tools_text(self):
    method _build_tools_name_text (line 91) | def _build_tools_name_text(self):
    method build_observation (line 94) | def build_observation(self, observation):
    method get_stop_words_list (line 97) | def get_stop_words_list(self):

FILE: benchmark/code_interpreter/prompt/llama_react.py
  class LlamaReAct (line 4) | class LlamaReAct(ReAct):
    method __init__ (line 6) | def __init__(self, query, lang='en', upload_file_paths=[]):
    method build_prompt (line 9) | def build_prompt(self):

FILE: benchmark/code_interpreter/prompt/qwen_react.py
  class QwenReAct (line 23) | class QwenReAct(ReAct):
    method __init__ (line 25) | def __init__(self, query, lang='en', upload_file_paths=[]):
    method build_prompt (line 36) | def build_prompt(self):
    method _build_tools_text (line 53) | def _build_tools_text(self):
    method _build_tools_name_text (line 73) | def _build_tools_name_text(self):

FILE: benchmark/code_interpreter/prompt/react.py
  class ReAct (line 42) | class ReAct(object):
    method __init__ (line 44) | def __init__(self, query, lang='en', upload_file_paths=[]):
    method build_prompt (line 53) | def build_prompt(self):
    method _format_upload_fname (line 64) | def _format_upload_fname(self):
    method _build_tools_text (line 73) | def _build_tools_text(self):
    method _build_tools_name_text (line 76) | def _build_tools_name_text(self):
    method build_observation (line 79) | def build_observation(self, observation):
    method get_stop_words_list (line 82) | def get_stop_words_list(self):

FILE: benchmark/code_interpreter/utils/code_utils.py
  function replace_upload_fname (line 7) | def replace_upload_fname(text, upload_fname_list):
  function extract_code (line 14) | def extract_code(text):

FILE: benchmark/code_interpreter/utils/data_utils.py
  function load_jsonl (line 7) | def load_jsonl(path):
  function save_jsonl (line 20) | def save_jsonl(data, path, progress=False, enabled=True):

FILE: benchmark/deepplanning/aggregate_results.py
  function load_shopping_statistics (line 14) | def load_shopping_statistics(domain_dir: Path, model_name: str) -> Optio...
  function load_travel_statistics (line 53) | def load_travel_statistics(domain_dir: Path, model_name: str, output_dir...
  function aggregate_model_results (line 133) | def aggregate_model_results(model_name: str, project_root: Path, travel_...
  function main (line 263) | def main():

FILE: benchmark/deepplanning/shoppingplanning/agent/call_llm.py
  function load_model_config (line 14) | def load_model_config(model_name: str) -> Dict[str, Any]:
  function create_client (line 64) | def create_client(model_name: str, model_config: Optional[Dict[str, Any]...
  function call_llm (line 94) | def call_llm(

FILE: benchmark/deepplanning/shoppingplanning/agent/prompts.py
  class PromptLib (line 124) | class PromptLib:

FILE: benchmark/deepplanning/shoppingplanning/agent/shopping_agent.py
  class ShoppingFnAgent (line 27) | class ShoppingFnAgent:
    method __init__ (line 35) | def __init__(self,
    method _build_tool_config (line 75) | def _build_tool_config(self) -> Dict[str, Any]:
    method _load_tool_instances (line 92) | def _load_tool_instances(self) -> Dict[str, Any]:
    method _load_env_from_dotenv (line 148) | def _load_env_from_dotenv(self) -> None:
    method _load_tool_schemas (line 183) | def _load_tool_schemas(self) -> List[Dict[str, Any]]:
    method _build_openai_tools (line 188) | def _build_openai_tools(self, schemas: List[Dict[str, Any]]) -> List[D...
    method _exec_tool (line 200) | def _exec_tool(self, name: str, arguments_json: str) -> str:
    method _call_llm (line 211) | def _call_llm(self, messages: List[Dict[str, Any]], tools: Optional[Li...
    method _detect_tool_calls (line 219) | def _detect_tool_calls(self, assistant_message) -> List[Dict[str, Any]]:
    method _add_to_cart (line 243) | def _add_to_cart(self, history_messages: List[Any]) -> List[Any]:
    method run (line 256) | def run(self, user_query: str, system_prompt: str | None = None, max_l...
    method _save_messages (line 374) | def _save_messages(self, messages: List[Any], filepath: Path, step: in...
  function run_agent_inference (line 388) | def run_agent_inference(

FILE: benchmark/deepplanning/shoppingplanning/evaluation/evaluation_pipeline.py
  function load_validation_cases (line 11) | def load_validation_cases(json_path: Path) -> Dict[str, Any]:
  function load_cart (line 27) | def load_cart(cart_path: Path) -> Dict[str, Any]:
  function check_case_completion (line 40) | def check_case_completion(messages_path: Path) -> bool:
  function evaluate_single_case (line 85) | def evaluate_single_case(case_dir: Path) -> Dict[str, Any]:
  function generate_case_report (line 243) | def generate_case_report(evaluation_result: Dict[str, Any], output_dir: ...
  function generate_summary_report (line 279) | def generate_summary_report(all_results: List[Dict[str, Any]], output_di...
  function main (line 365) | def main():

FILE: benchmark/deepplanning/shoppingplanning/evaluation/score_statistics.py
  function parse_folder_name (line 18) | def parse_folder_name(folder_name: str) -> Optional[tuple]:
  function read_summary_report (line 35) | def read_summary_report(report_path: Path) -> Optional[Dict[str, Any]]:
  function calculate_model_statistics (line 64) | def calculate_model_statistics(model_name: str, result_report_dir: Path)...
  function main (line 203) | def main():

FILE: benchmark/deepplanning/shoppingplanning/run.py
  function parse_args (line 19) | def parse_args():
  function setup_paths (line 51) | def setup_paths(args):
  function print_config (line 82) | def print_config(args):
  function run_step_inference (line 96) | def run_step_inference(args):
  function main (line 142) | def main():

FILE: benchmark/deepplanning/shoppingplanning/tools/add_coupon_to_cart.py
  class AddCouponToCartTool (line 21) | class AddCouponToCartTool(BaseShoppingTool):
    method __init__ (line 26) | def __init__(self, cfg: Dict = None):
    method _load_cart (line 46) | def _load_cart(self, path: Path):
    method _load_user (line 80) | def _load_user(self, path: Path):
    method _save_cart (line 97) | def _save_cart(self):
    method _parse_coupon (line 102) | def _parse_coupon(self, coupon_name: str) -> Tuple[float, float]:
    method _calculate_base_total (line 122) | def _calculate_base_total(self) -> float:
    method _calculate_max_coupon_usage (line 132) | def _calculate_max_coupon_usage(self, coupon_name: str, base_total: fl...
    method _calculate_total_discount (line 146) | def _calculate_total_discount(self, used_coupons: List[Dict]) -> float:
    method _validate_coupon_combination (line 157) | def _validate_coupon_combination(self, base_total: float, used_coupons...
    method _update_summary (line 188) | def _update_summary(self):
    method call (line 203) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: benchmark/deepplanning/shoppingplanning/tools/add_product_to_cart.py
  class AddProductToCartTool (line 7) | class AddProductToCartTool(BaseShoppingTool):
    method __init__ (line 12) | def __init__(self, cfg: Dict = None):
    method _load_cart (line 32) | def _load_cart(self, path: Path):
    method _load_products (line 63) | def _load_products(self, path: Path):
    method _save_cart (line 74) | def _save_cart(self):
    method _update_summary (line 79) | def _update_summary(self):
    method call (line 91) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: benchmark/deepplanning/shoppingplanning/tools/base_shopping_tool.py
  function load_tool_schemas (line 19) | def load_tool_schemas(schema_file: str = 'shopping_tool_schema.json') ->...
  function get_tool_schema (line 53) | def get_tool_schema(tool_name: str, schemas: Optional[Dict[str, dict]] =...
  function get_cached_tool_schemas (line 79) | def get_cached_tool_schemas() -> Dict[str, dict]:
  function register_tool (line 92) | def register_tool(name: str, allow_overwrite: bool = False):
  class BaseShoppingTool (line 145) | class BaseShoppingTool(ABC):
    method __init__ (line 174) | def __init__(self, cfg: Optional[Dict] = None):
    method _load_schema_from_json (line 212) | def _load_schema_from_json(self):
    method _is_valid_schema (line 237) | def _is_valid_schema(schema: dict) -> bool:
    method call (line 252) | def call(self, params: Union[str, dict], **kwargs) -> str:
    method _verify_json_format_args (line 270) | def _verify_json_format_args(self, params: Union[str, dict], strict_js...
    method load_json_database (line 306) | def load_json_database(self, path: str) -> dict:
    method load_csv_database (line 325) | def load_csv_database(self, path: str):
    method format_result_as_json (line 372) | def format_result_as_json(self, result: Union[dict, list]) -> str:
    method openai_schema (line 387) | def openai_schema(self) -> Dict:
    method function (line 404) | def function(self) -> Dict:
    method get_schema (line 417) | def get_schema(self, format: str = "openai") -> Dict:
    method get_openai_schema_from_class (line 444) | def get_openai_schema_from_class(cls) -> Dict:

FILE: benchmark/deepplanning/shoppingplanning/tools/calculate_transport_time_tool.py
  class CalculateTransportTimeTool (line 88) | class CalculateTransportTimeTool(BaseShoppingTool):
    method __init__ (line 96) | def __init__(self, cfg: Dict = None):
    method _load_database (line 114) | def _load_database(self, path: str):
    method _normalize_province (line 126) | def _normalize_province(self, address_str: str) -> str:
    method call (line 143) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: benchmark/deepplanning/shoppingplanning/tools/delete_coupon_from_cart.py
  class DeleteCouponFromCartTool (line 23) | class DeleteCouponFromCartTool(BaseShoppingTool):
    method __init__ (line 27) | def __init__(self, cfg: Dict = None):
    method _load_cart (line 41) | def _load_cart(self, path: Path):
    method _save_cart (line 74) | def _save_cart(self):
    method _parse_coupon (line 79) | def _parse_coupon(self, coupon_name: str):
    method _calculate_base_total (line 96) | def _calculate_base_total(self) -> float:
    method _calculate_total_discount (line 107) | def _calculate_total_discount(self, used_coupons: List[Dict]) -> float:
    method _update_summary (line 131) | def _update_summary(self):
    method _cleanup_zero_quantity_coupons (line 144) | def _cleanup_zero_quantity_coupons(self):
    method call (line 163) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: benchmark/deepplanning/shoppingplanning/tools/delete_product_from_cart.py
  class DeleteProductFromCartTool (line 7) | class DeleteProductFromCartTool(BaseShoppingTool):
    method __init__ (line 12) | def __init__(self, cfg: Dict = None):
    method _load_cart (line 31) | def _load_cart(self, path: Path):
    method _load_products (line 61) | def _load_products(self, path: Path):
    method _save_cart (line 72) | def _save_cart(self):
    method _update_summary (line 77) | def _update_summary(self):
    method _cleanup_zero_quantity_items (line 89) | def _cleanup_zero_quantity_items(self):
    method call (line 94) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: benchmark/deepplanning/shoppingplanning/tools/filter_by_applicable_coupons_tool.py
  class FilterByApplicableCouponsTool (line 22) | class FilterByApplicableCouponsTool(BaseShoppingTool):
    method __init__ (line 29) | def __init__(self, cfg: Dict = None):
    method _load_database (line 45) | def _load_database(self, path: str):
    method call (line 59) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: benchmark/deepplanning/shoppingplanning/tools/filter_by_brand_tool.py
  class FilterByBrandTool (line 12) | class FilterByBrandTool(BaseShoppingTool):
    method __init__ (line 20) | def __init__(self, cfg: Dict = None):
    method _load_database (line 40) | def _load_database(self, path: str):
    method call (line 54) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: benchmark/deepplanning/shoppingplanning/tools/filter_by_color_tool.py
  class FilterByColorTool (line 8) | class FilterByColorTool(BaseShoppingTool):
    method __init__ (line 16) | def __init__(self, cfg: Dict = None):
    method _load_database (line 32) | def _load_database(self, path: str):
    method call (line 46) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: benchmark/deepplanning/shoppingplanning/tools/filter_by_range_tool.py
  class FilterByRangeTool (line 10) | class FilterByRangeTool(BaseShoppingTool):
    method __init__ (line 17) | def __init__(self, cfg: Dict = None):
    method _load_database (line 33) | def _load_database(self, path: str):
    method _get_nested_value (line 47) | def _get_nested_value(self, obj: Dict, key_path: str) -> Any:
    method call (line 60) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: benchmark/deepplanning/shoppingplanning/tools/filter_by_size_tool.py
  class FilterBySizeTool (line 8) | class FilterBySizeTool(BaseShoppingTool):
    method __init__ (line 16) | def __init__(self, cfg: Dict = None):
    method _load_database (line 32) | def _load_database(self, path: str):
    method call (line 46) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: benchmark/deepplanning/shoppingplanning/tools/get_cart_info.py
  class GetCartInfoTool (line 8) | class GetCartInfoTool(BaseShoppingTool):
    method __init__ (line 13) | def __init__(self, cfg: Dict = None):
    method _load_database (line 26) | def _load_database(self, path: Path):
    method call (line 84) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: benchmark/deepplanning/shoppingplanning/tools/get_product_details_tool.py
  class GetProductDetailsTool (line 8) | class GetProductDetailsTool(BaseShoppingTool):
    method __init__ (line 13) | def __init__(self, cfg: Dict = None):
    method _load_database (line 29) | def _load_database(self, path: str):
    method call (line 43) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: benchmark/deepplanning/shoppingplanning/tools/get_user_info.py
  class GetUserInfoTool (line 7) | class GetUserInfoTool(BaseShoppingTool):
    method __init__ (line 12) | def __init__(self, cfg: Dict = None):
    method _load_database (line 25) | def _load_database(self, path: Path):
    method call (line 56) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: benchmark/deepplanning/shoppingplanning/tools/search_products_tool.py
  class SearchProductsTool (line 17) | class SearchProductsTool(BaseShoppingTool):
    method __init__ (line 23) | def __init__(self, cfg: Dict = None):
    method _load_and_prepare_database (line 43) | def _load_and_prepare_database(self, path: str):
    method call (line 77) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: benchmark/deepplanning/shoppingplanning/tools/sort_product_tool.py
  class SortProductsTool (line 9) | class SortProductsTool(BaseShoppingTool):
    method __init__ (line 17) | def __init__(self, cfg: Dict = None):
    method _load_database (line 33) | def _load_database(self, path: str):
    method _get_nested_value (line 45) | def _get_nested_value(self, obj: Dict, key_path: str) -> Any:
    method call (line 52) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: benchmark/deepplanning/travelplanning/agent/call_llm.py
  function load_model_config (line 14) | def load_model_config(model_name: str) -> Dict[str, Any]:
  function create_client (line 64) | def create_client(model_name: str, model_config: Optional[Dict[str, Any]...
  function call_llm (line 99) | def call_llm(

FILE: benchmark/deepplanning/travelplanning/agent/prompts.py
  function get_system_prompt (line 923) | def get_system_prompt(language: str = 'zh') -> str:
  function get_format_convert_prompt (line 933) | def get_format_convert_prompt(language: str = 'zh') -> str:

FILE: benchmark/deepplanning/travelplanning/agent/tools_fn_agent.py
  class ToolsFnAgent (line 21) | class ToolsFnAgent:
    method __init__ (line 29) | def __init__(self,
    method _load_env_from_dotenv (line 67) | def _load_env_from_dotenv(self) -> None:
    method _load_tool_schemas (line 102) | def _load_tool_schemas(self) -> List[Dict[str, Any]]:
    method _build_openai_tools (line 113) | def _build_openai_tools(self, schemas: List[Dict[str, Any]]) -> List[D...
    method _build_tool_config (line 135) | def _build_tool_config(self, tool_cls) -> Dict[str, Any]:
    method _load_tool_instances (line 166) | def _load_tool_instances(self) -> Dict[str, Any]:
    method _exec_tool (line 201) | def _exec_tool(self, name: str, arguments_json: str) -> str:
    method _call_llm (line 218) | def _call_llm(self, messages: List[Any], tools: Optional[List[Dict[str...
    method _detect_tool_calls (line 227) | def _detect_tool_calls(self, assistant_message) -> List[Dict[str, Any]]:
    method _extract_plan_content (line 253) | def _extract_plan_content(self, text: str) -> str:
    method _message_to_dict (line 272) | def _message_to_dict(self, msg) -> Dict[str, Any]:
    method _serialize_messages (line 322) | def _serialize_messages(self, messages: List[Any]) -> List[Dict[str, A...
    method run (line 329) | def run(self,
  function run_agent_inference (line 379) | def run_agent_inference(

FILE: benchmark/deepplanning/travelplanning/evaluation/constraints_commonsense.py
  function check_valid_days (line 153) | def check_valid_days(daily_plans: List[Dict[str, Any]], meta: Dict[str, ...
  function check_route_closed_loop (line 160) | def check_route_closed_loop(daily_plans: List[Dict[str, Any]], meta: Dic...
  function check_intercity_transportation_consistency (line 177) | def check_intercity_transportation_consistency(daily_plans: List[Dict[st...
  function check_hotels_from_search (line 289) | def check_hotels_from_search(daily_plans: List[Dict[str, Any]], hotels_i...
  function check_attractions_from_search (line 342) | def check_attractions_from_search(daily_plans: List[Dict[str, Any]], att...
  function check_meals_from_search (line 372) | def check_meals_from_search(daily_plans: List[Dict[str, Any]], restauran...
  function check_intercity_public_from_search (line 409) | def check_intercity_public_from_search(
  function check_accommodation_traceable (line 504) | def check_accommodation_traceable(daily_plans: List[Dict[str, Any]]) -> ...
  function check_last_activity_is_hotel (line 538) | def check_last_activity_is_hotel(daily_plans: List[Dict[str, Any]]) -> T...
  function check_meal_necessity (line 556) | def check_meal_necessity(daily_plans: List[Dict[str, Any]], meta: Dict[s...
  function check_attraction_necessity (line 706) | def check_attraction_necessity(daily_plans: List[Dict[str, Any]], meta: ...
  function check_time_no_overlap (line 877) | def check_time_no_overlap(daily_plans: List[Dict[str, Any]]) -> Tuple[bo...
  function check_transfer_time_reasonable (line 901) | def check_transfer_time_reasonable(daily_plans: List[Dict[str, Any]], lo...
  function check_attractions_in_opening_hours (line 1057) | def check_attractions_in_opening_hours(daily_plans: List[Dict[str, Any]]...
  function check_meals_in_business_hours (line 1088) | def check_meals_in_business_hours(daily_plans: List[Dict[str, Any]], res...
  function check_attractions_not_closed (line 1127) | def check_attractions_not_closed(
  function check_attractions_duration_reasonable (line 1202) | def check_attractions_duration_reasonable(daily_plans: List[Dict[str, An...
  function check_meal_duration_reasonable (line 1230) | def check_meal_duration_reasonable(daily_plans: List[Dict[str, Any]]) ->...
  function check_budget_accuracy (line 1283) | def check_budget_accuracy(plan: Dict[str, Any], daily_plans: List[Dict[s...
  function check_diverse_restaurants (line 1423) | def check_diverse_restaurants(daily_plans: List[Dict[str, Any]]) -> Tupl...
  function check_diverse_attractions (line 1439) | def check_diverse_attractions(daily_plans: List[Dict[str, Any]]) -> Tupl...
  function calculate_dimension_scores (line 1458) | def calculate_dimension_scores(check_results: Dict[str, Tuple[bool, Opti...
  function get_dimension_summary (line 1530) | def get_dimension_summary(dimension_result: Dict[str, Any]) -> str:
  function eval_commonsense (line 1572) | def eval_commonsense(plan: Dict[str, Any], meta: Dict[str, Any], databas...
  function eval_commonsense_with_dimensions (line 1673) | def eval_commonsense_with_dimensions(
  function get_all_check_names (line 1719) | def get_all_check_names() -> List[str]:

FILE: benchmark/deepplanning/travelplanning/evaluation/constraints_hard.py
  function eval_hard (line 10) | def eval_hard(plan: Dict[str, Any], meta: Dict[str, Any]) -> Dict[str, T...
  function _eval_flight_constraint (line 61) | def _eval_flight_constraint(constraint_key: str, constraint_data: Dict, ...
  function _eval_train_constraint (line 110) | def _eval_train_constraint(constraint_key: str, constraint_data: Dict, p...
  function _eval_hotel_constraint (line 157) | def _eval_hotel_constraint(constraint_key: str, constraint_data: Dict, p...
  function _eval_restaurant_constraint (line 223) | def _eval_restaurant_constraint(constraint_key: str, constraint_data: Di...
  function _eval_attraction_constraint (line 283) | def _eval_attraction_constraint(constraint_key: str, constraint_data: Di...
  function _extract_flights_from_plan (line 343) | def _extract_flights_from_plan(plan: Dict) -> List[Dict]:
  function _extract_trains_from_plan (line 368) | def _extract_trains_from_plan(plan: Dict) -> List[Dict]:
  function _extract_hotels_from_plan (line 392) | def _extract_hotels_from_plan(plan: Dict) -> List[Dict]:
  function _extract_restaurants_from_plan (line 409) | def _extract_restaurants_from_plan(plan: Dict) -> List[Dict]:
  function _extract_attractions_from_plan (line 430) | def _extract_attractions_from_plan(plan: Dict) -> List[Dict]:
  function _eval_budget_constraint (line 455) | def _eval_budget_constraint(constraint_data: Dict, plan: Dict, meta: Dic...

FILE: benchmark/deepplanning/travelplanning/evaluation/convert_report.py
  function _load_env_from_dotenv (line 27) | def _load_env_from_dotenv() -> None:
  function extract_json_from_response (line 65) | def extract_json_from_response(text: str) -> Optional[str]:
  function process_single_report (line 79) | def process_single_report(
  function convert_reports (line 208) | def convert_reports(

FILE: benchmark/deepplanning/travelplanning/evaluation/eval_converted.py
  function calculate_weighted_score (line 19) | def calculate_weighted_score(commonsense_results: Dict[str, Tuple[bool, ...
  function calculate_hard_score (line 84) | def calculate_hard_score(hard_results: Dict[str, Tuple[bool, Optional[st...
  function process_single_evaluation (line 124) | def process_single_evaluation(
  function evaluate_plans (line 247) | def evaluate_plans(

FILE: benchmark/deepplanning/travelplanning/evaluation/utils.py
  function extract_from_to (line 19) | def extract_from_to(text: str) -> Tuple[Optional[str], Optional[str]]:
  function normalize_city (line 27) | def normalize_city(text: Optional[str]) -> Optional[str]:
  function parse_lonlat_string (line 34) | def parse_lonlat_string(text: Optional[str]) -> Tuple[Optional[float], O...
  function parse_time_hhmm (line 54) | def parse_time_hhmm(t: Optional[str]) -> Optional[time]:
  function parse_time_slot (line 72) | def parse_time_slot(slot: Optional[str]) -> Tuple[Optional[time], Option...
  function is_within_business_hours (line 84) | def is_within_business_hours(slot_start: time, slot_end: time, open_t: t...
  function slot_to_minutes (line 106) | def slot_to_minutes(slot: Optional[str]) -> Tuple[Optional[int], Optiona...
  function haversine_km (line 122) | def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> ...
  function get_base_dir (line 137) | def get_base_dir() -> Path:
  function get_database_dir (line 180) | def get_database_dir(database_dir: Optional[Path] = None) -> Path:
  function load_restaurant_index (line 211) | def load_restaurant_index(csv_path: str) -> Dict[str, Dict[str, Any]]:
  function load_hotel_index (line 236) | def load_hotel_index(csv_path: str) -> Dict[str, Dict[str, Any]]:
  function load_attraction_index (line 258) | def load_attraction_index(csv_path: str) -> Dict[str, Dict[str, Any]]:
  function load_locations_index (line 286) | def load_locations_index(csv_path: str) -> Dict[str, Dict[str, Any]]:
  function load_flights_index (line 314) | def load_flights_index(csv_path: str) -> Dict[str, List[Dict[str, Any]]]:
  function load_trains_index (line 353) | def load_trains_index(csv_path: str) -> Dict[str, List[Dict[str, Any]]]:
  function load_station_to_city_mapping (line 400) | def load_station_to_city_mapping(database_dir: Optional[Path] = None) ->...
  function get_station_to_city_map (line 463) | def get_station_to_city_map(database_dir: Optional[Path] = None) -> Dict...
  function extract_city_from_location (line 483) | def extract_city_from_location(location: str, database_dir: Optional[Pat...
  function get_location_coords (line 521) | def get_location_coords(name: str, locations_index: Dict[str, Dict[str, ...
  function resolve_name_coords (line 538) | def resolve_name_coords(name: str, locations_index: Optional[Dict[str, D...
  function parse_duration_hours (line 565) | def parse_duration_hours(val: Any) -> Optional[float]:
  function is_all_day (line 575) | def is_all_day(opening: Optional[str], closing: Optional[str]) -> bool:
  function calculate_day_of_week (line 588) | def calculate_day_of_week(depart_weekday: int, day_index: int) -> int:
  function parse_closing_dates (line 610) | def parse_closing_dates(closing_dates_str: Optional[str]) -> List[int]:
  function is_attraction_closed_on_day (line 668) | def is_attraction_closed_on_day(closing_dates: Optional[str], weekday: i...
  function day_cities (line 687) | def day_cities(current_city: str) -> List[str]:
  function iter_meal_acts (line 695) | def iter_meal_acts(daily_plans: List[Dict[str, Any]]):
  function iter_hotel_acts (line 708) | def iter_hotel_acts(daily_plans: List[Dict[str, Any]]):
  function iter_attraction_acts (line 721) | def iter_attraction_acts(daily_plans: List[Dict[str, Any]]):
  function iter_intercity_public_acts (line 734) | def iter_intercity_public_acts(daily_plans: List[Dict[str, Any]]):
  function end_city_of_day (line 746) | def end_city_of_day(current_city: str) -> Optional[str]:
  function get_day_accommodation_city (line 754) | def get_day_accommodation_city(day: Dict[str, Any], hotels_index: Option...
  function iter_accommodation_entries (line 772) | def iter_accommodation_entries(daily_plans: List[Dict[str, Any]]):
  function get_intercity_arrival_time (line 795) | def get_intercity_arrival_time(day: Dict[str, Any]) -> Optional[float]:
  function get_intercity_departure_time (line 815) | def get_intercity_departure_time(day: Dict[str, Any]) -> Optional[float]:

FILE: benchmark/deepplanning/travelplanning/run.py
  function detect_missing_ids (line 27) | def detect_missing_ids(directory: Path, file_pattern: str, total_ids: in...
  function parse_id_list (line 59) | def parse_id_list(id_str: str) -> list:
  function get_agent_inference_function (line 101) | def get_agent_inference_function(model: str):
  function parse_args (line 116) | def parse_args():
  function setup_paths (line 161) | def setup_paths(args):
  function print_config (line 199) | def print_config(args):
  function run_step_inference (line 228) | def run_step_inference(args):
  function run_step_conversion (line 295) | def run_step_conversion(args):
  function run_step_evaluation (line 344) | def run_step_evaluation(args):
  function print_final_summary (line 383) | def print_final_summary(args, inference_results, conversion_results, eva...
  function run_single_language (line 403) | def run_single_language(args, language):
  function main (line 446) | def main():

FILE: benchmark/deepplanning/travelplanning/tools/attraction_query_tool.py
  class AttractionDetailsQueryTool (line 11) | class AttractionDetailsQueryTool(BaseTravelTool):
    method __init__ (line 68) | def __init__(self, cfg: Optional[Dict] = None):
    method call (line 83) | def call(self, params: Union[str, dict], **kwargs) -> str:
  class AttractionRecommendTool (line 172) | class AttractionRecommendTool(BaseTravelTool):
    method __init__ (line 195) | def __init__(self, cfg: Optional[Dict] = None):
    method call (line 210) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: benchmark/deepplanning/travelplanning/tools/base_travel_tool.py
  function load_tool_schemas (line 17) | def load_tool_schemas(schema_file: str = 'tool_schema.json', language: s...
  function get_tool_schema (line 64) | def get_tool_schema(tool_name: str, schemas: Dict[str, dict] = None) -> ...
  function get_cached_tool_schemas (line 87) | def get_cached_tool_schemas(language: str = 'en') -> Dict[str, dict]:
  class BaseTravelTool (line 101) | class BaseTravelTool(BaseTool):
    method __init__ (line 123) | def __init__(self, cfg: Optional[Dict] = None):
    method _load_schema_from_json (line 150) | def _load_schema_from_json(self):
    method load_json_database (line 176) | def load_json_database(self, path: str) -> dict:
    method load_csv_database (line 195) | def load_csv_database(self, path: str):
    method format_result_as_json (line 245) | def format_result_as_json(self, result: Union[dict, list]) -> str:
    method openai_schema (line 260) | def openai_schema(self) -> Dict:
    method get_schema (line 289) | def get_schema(self, format: str = "openai") -> Dict:
    method get_openai_schema_from_class (line 320) | def get_openai_schema_from_class(cls) -> Dict:

FILE: benchmark/deepplanning/travelplanning/tools/flight_query_tool.py
  class FlightQueryTool (line 11) | class FlightQueryTool(BaseTravelTool):
    method __init__ (line 36) | def __init__(self, cfg: Optional[Dict] = None):
    method call (line 51) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: benchmark/deepplanning/travelplanning/tools/hotel_query_tool.py
  class HotelQueryTool (line 11) | class HotelQueryTool(BaseTravelTool):
    method __init__ (line 30) | def __init__(self, cfg: Optional[Dict] = None):
    method call (line 45) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: benchmark/deepplanning/travelplanning/tools/location_search_tool.py
  class LocationSearchTool (line 11) | class LocationSearchTool(BaseTravelTool):
    method __init__ (line 30) | def __init__(self, cfg: Optional[Dict] = None):
    method call (line 45) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: benchmark/deepplanning/travelplanning/tools/restaurant_query_tool.py
  class RestaurantRecommendTool (line 11) | class RestaurantRecommendTool(BaseTravelTool):
    method __init__ (line 30) | def __init__(self, cfg: Optional[Dict] = None):
    method call (line 45) | def call(self, params: Union[str, dict], **kwargs) -> str:
  class RestaurantDetailsQueryTool (line 105) | class RestaurantDetailsQueryTool(BaseTravelTool):
    method __init__ (line 124) | def __init__(self, cfg: Optional[Dict] = None):
    method call (line 139) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: benchmark/deepplanning/travelplanning/tools/roadroute_query_tool.py
  class RoadRouteInfoQueryTool (line 11) | class RoadRouteInfoQueryTool(BaseTravelTool):
    method __init__ (line 32) | def __init__(self, cfg: Optional[Dict] = None):
    method call (line 47) | def call(self, params: Union[str, dict], **kwargs) -> str:
    method _check_coordinate_existence (line 94) | def _check_coordinate_existence(self, origin: str, destination: str) -...

FILE: benchmark/deepplanning/travelplanning/tools/train_query_tool.py
  class TrainQueryTool (line 11) | class TrainQueryTool(BaseTravelTool):
    method __init__ (line 38) | def __init__(self, cfg: Optional[Dict] = None):
    method call (line 54) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: browser_qwen/background.js
  function send_data (line 19) | function send_data(msg){

FILE: browser_qwen/src/content.js
  function getPageTextContent (line 18) | function getPageTextContent() {
  function cache_browser (line 23) | function cache_browser(){

FILE: examples/assistant_add_custom_tool.py
  class MyImageGen (line 32) | class MyImageGen(BaseTool):
    method call (line 41) | def call(self, params: str, **kwargs) -> str:
  function init_agent_service (line 50) | def init_agent_service():
  function test (line 72) | def test(query: str = 'draw a dog'):
  function app_tui (line 82) | def app_tui():
  function app_gui (line 97) | def app_gui():

FILE: examples/assistant_audio.py
  function test (line 19) | def test():
  function app_gui (line 34) | def app_gui():

FILE: examples/assistant_mcp_sqlite_bot.py
  function init_agent_service (line 27) | def init_agent_service():
  function test (line 53) | def test(query='数据库里有几张表', file: Optional[str] = os.path.join(ROOT_RESOU...
  function app_tui (line 69) | def app_tui():
  function app_gui (line 94) | def app_gui():

FILE: examples/assistant_omni.py
  function test (line 19) | def test():
  function app_gui (line 51) | def app_gui():

FILE: examples/assistant_qwen3.5.py
  function init_agent_service (line 22) | def init_agent_service():
  function test (line 77) | def test(query: str = 'What time is it?'):
  function app_tui (line 88) | def app_tui():
  function app_gui (line 104) | def app_gui():

FILE: examples/assistant_qwen3.py
  function init_agent_service (line 22) | def init_agent_service():
  function test (line 87) | def test(query: str = 'What time is it?'):
  function app_tui (line 98) | def app_tui():
  function app_gui (line 114) | def app_gui():

FILE: examples/assistant_qwen3_coder.py
  function init_agent_service (line 22) | def init_agent_service():
  function test (line 77) | def test(query: str = 'What time is it?'):
  function app_tui (line 88) | def app_tui():
  function app_gui (line 104) | def app_gui():

FILE: examples/assistant_qwen3vl.py
  function init_agent_service (line 21) | def init_agent_service():
  function test (line 43) | def test(pic_url: str, query: str):

FILE: examples/assistant_qwq.py
  function init_agent_service (line 26) | def init_agent_service():
  function test (line 53) | def test(query: str = '画一只猫，再画一只狗，最后画他们一起玩的画面，给我三张图'):
  function app_tui (line 64) | def app_tui():
  function app_gui (line 80) | def app_gui():

FILE: examples/assistant_rag.py
  function test (line 19) | def test():
  function app_gui (line 26) | def app_gui():

FILE: examples/assistant_weather_bot.py
  function init_agent_service (line 26) | def init_agent_service():
  function test (line 43) | def test(query='海淀区天气', file: Optional[str] = os.path.join(ROOT_RESOURCE...
  function app_tui (line 59) | def app_tui():
  function app_gui (line 84) | def app_gui():

FILE: examples/function_calling.py
  function get_current_weather (line 24) | def get_current_weather(location, unit='fahrenheit'):
  function test (line 36) | def test(fncall_prompt_type: str = 'qwen'):

FILE: examples/function_calling_in_parallel.py
  function get_current_weather (line 25) | def get_current_weather(location, unit='fahrenheit'):
  function test (line 37) | def test():

FILE: examples/gpt_mentions.py
  function init_agent_service (line 22) | def init_agent_service():
  function app_gui (line 44) | def app_gui():

FILE: examples/group_chat_chess.py
  function test (line 55) | def test(query: str = '<1,1>'):
  function app_tui (line 63) | def app_tui():
  function app_gui (line 77) | def app_gui():

FILE: examples/group_chat_demo.py
  function init_agent_service (line 26) | def init_agent_service(cfgs):
  function init_agent_service_create (line 32) | def init_agent_service_create():
  function app (line 87) | def app(cfgs):
  function test (line 162) | def test():
  function app_create (line 166) | def app_create(history, now_cfgs):
  function _get_display_history_from_message (line 215) | def _get_display_history_from_message():
  function get_name_of_current_user (line 231) | def get_name_of_current_user(cfgs):
  function add_text (line 238) | def add_text(text, cfgs):
  function chat_clear (line 250) | def chat_clear():
  function chat_clear_create (line 255) | def chat_clear_create():
  function add_file (line 260) | def add_file(file):
  function add_text_create (line 266) | def add_text_create(history, text):

FILE: examples/llm_quick_chat_oai.py
  function test (line 19) | def test():

FILE: examples/llm_riddles.py
  class LLMRiddles (line 26) | class LLMRiddles(Agent):
    method __init__ (line 29) | def __init__(self, llm: Optional[Union[Dict, BaseChatModel]] = None):
    method _run (line 41) | def _run(self, messages: List[Message], lang: str = 'en', **kwargs) ->...
  function test (line 45) | def test():
  function app_tui (line 65) | def app_tui():

FILE: examples/llm_vl_mix_text.py
  function test (line 20) | def test():

FILE: examples/long_dialogue.py
  function test (line 19) | def test():
  function app_tui (line 31) | def app_tui():
  function app_gui (line 45) | def app_gui():

FILE: examples/multi_agent_router.py
  function init_agent_service (line 26) | def init_agent_service():
  function test (line 51) | def test(
  function app_tui (line 75) | def app_tui():
  function app_gui (line 105) | def app_gui():

FILE: examples/parallel_doc_qa.py
  function test (line 19) | def test():
  function app_gui (line 38) | def app_gui():

FILE: examples/qwen2vl_assistant_tooluse.py
  class ExpressTracking (line 36) | class ExpressTracking(BaseToolWithFileAccess):
    method call (line 54) | def call(self, params: Union[str, dict], files: List[str] = None, **kw...
  class Area2Weather (line 101) | class Area2Weather(BaseToolWithFileAccess):
    method call (line 143) | def call(self, params: Union[str, dict], files: List[str] = None, **kw...
  class WeatherHour24 (line 172) | class WeatherHour24(BaseToolWithFileAccess):
    method call (line 184) | def call(self, params: Union[str, dict], files: List[str] = None, **kw...
  class CropResize (line 208) | class CropResize(BaseToolWithFileAccess):
    method _extract_coordinates (line 225) | def _extract_coordinates(self, text):
    method _expand_box (line 240) | def _expand_box(self, x1, y1, x2, y2, factor=1):
    method call (line 249) | def call(self, params: Union[str, dict], files: List[str] = None, **kw...
  function init_agent_service (line 288) | def init_agent_service():
  function test (line 346) | def test():
  function app_gui (line 368) | def app_gui():

FILE: examples/qwen2vl_assistant_video.py
  function test (line 18) | def test():

FILE: examples/qwen2vl_function_calling.py
  function image_gen (line 24) | def image_gen(prompt: str) -> str:
  function test (line 31) | def test():

FILE: examples/react_data_analysis.py
  function init_agent_service (line 26) | def init_agent_service():
  function test (line 43) | def test(query: str = 'pd.head the file first and then help me draw a li...
  function app_tui (line 60) | def app_tui():
  function app_gui (line 85) | def app_gui():

FILE: examples/tir_math.py
  function init_agent_service (line 33) | def init_agent_service():
  function test (line 40) | def test(query: str = '斐波那契数列前10个数字'):
  function app_tui (line 50) | def app_tui():
  function app_gui (line 66) | def app_gui():

FILE: examples/virtual_memory_qa.py
  function init_agent_service (line 25) | def init_agent_service():
  function test (line 36) | def test(query='简单列出这篇文章的贡献https://qianwen-res.oss-cn-beijing.aliyuncs.c...
  function app_tui (line 47) | def app_tui():
  function app_gui (line 72) | def app_gui():

FILE: examples/visual_storytelling.py
  class VisualStorytelling (line 27) | class VisualStorytelling(Agent):
    method __init__ (line 30) | def __init__(self,
    method _run (line 45) | def _run(self, messages: List[Message], lang: str = 'zh', **kwargs) ->...
  function test (line 66) | def test(query: Optional[str] = '看图说话',
  function app_tui (line 80) | def app_tui():
  function app_gui (line 104) | def app_gui():

FILE: qwen-agent-docs/website/app/[lang]/[[...mdxPath]]/page.jsx
  function generateStaticParams (line 8) | async function generateStaticParams() {
  function generateMetadata (line 28) | async function generateMetadata(props) {

FILE: qwen-agent-docs/website/app/[lang]/layout.tsx
  type LayoutProps (line 8) | type LayoutProps = Readonly<{
  constant SUPPORTED_LOCALES (line 16) | const SUPPORTED_LOCALES = ['en'];
  function generateStaticParams (line 19) | async function generateStaticParams() {

FILE: qwen-agent-docs/website/app/layout.tsx
  constant SITE_NAME (line 9) | const SITE_NAME = "Qwen Agent";
  constant DEFAULT_TITLE (line 10) | const DEFAULT_TITLE = "Qwen Agent: AI Agent Framework Documentation";
  constant DESCRIPTION (line 11) | const DESCRIPTION =
  constant KEYWORDS (line 14) | const KEYWORDS = [
  function getSiteUrl (line 33) | function getSiteUrl(): string {
  type LayoutProps (line 100) | type LayoutProps = Readonly<{

FILE: qwen-agent-docs/website/app/page.tsx
  function HomePage (line 3) | function HomePage() {

FILE: qwen-agent-docs/website/app/robots.ts
  function getSiteUrl (line 3) | function getSiteUrl(): string {
  function robots (line 16) | function robots(): MetadataRoute.Robots {

FILE: qwen-agent-docs/website/app/sitemap.ts
  constant LOCALES (line 5) | const LOCALES = ["en", "zh"] as const;
  function getSiteUrl (line 7) | function getSiteUrl(): string {
  function walkDir (line 20) | function walkDir(dir: string): string[] {
  function toDocPath (line 42) | function toDocPath(locale: string, markdownFile: string): string {
  function safeExists (line 57) | function safeExists(p: string): boolean {
  function sitemap (line 66) | function sitemap(): MetadataRoute.Sitemap {

FILE: qwen-agent-docs/website/src/components/leaderboard.tsx
  type ModelScore (line 5) | interface ModelScore {
  type VersionKey (line 23) | type VersionKey = "v1.1" | "v1.0";
  function RankBadge (line 86) | function RankBadge({ rank }: { rank: number }) {
  function ModelIcon (line 119) | function ModelIcon({ icon }: { icon: string }) {
  function sortByScore (line 137) | function sortByScore(models: ModelScore[]): ModelScore[] {
  function findBestValues (line 141) | function findBestValues(models: ModelScore[]) {
  function ScoreCell (line 159) | function ScoreCell({ value, isBest }: { value: number | null; isBest: bo...
  function Leaderboard (line 174) | function Leaderboard() {

FILE: qwen-agent-docs/website/src/components/locale-anchor.tsx
  constant LOCALES (line 8) | const LOCALES = ["en", "zh", "de", "fr", "ru", "ja", "pt-BR"] as const;
  type Locale (line 9) | type Locale = (typeof LOCALES)[number];
  function LinkArrowIcon (line 11) | function LinkArrowIcon(props: React.SVGProps<SVGSVGElement>) {
  function isExternalUrl (line 31) | function isExternalUrl(href: string) {
  function getLocaleFromPathname (line 41) | function getLocaleFromPathname(pathname: string | null): Locale | null {
  function hasLocalePrefix (line 50) | function hasLocalePrefix(path: string) {
  function LocaleAnchor (line 56) | function LocaleAnchor(

FILE: qwen_agent/agent.py
  class Agent (line 31) | class Agent(ABC):
    method __init__ (line 38) | def __init__(self,
    method run_nonstream (line 71) | def run_nonstream(self, messages: List[Union[Dict, Message]], **kwargs...
    method run (line 78) | def run(self, messages: List[Union[Dict, Message]],
    method _run (line 134) | def _run(self, messages: List[Message], lang: str = 'en', **kwargs) ->...
    method _call_llm (line 150) | def _call_llm(
    method _call_tool (line 178) | def _call_tool(self, tool_name: str, tool_args: Union[str, dict] = '{}...
    method _init_tool (line 212) | def _init_tool(self, tool: Union[str, Dict, BaseTool]):
    method _detect_tool (line 239) | def _detect_tool(self, message: Message) -> Tuple[bool, str, str, str]:
  class BasicAgent (line 263) | class BasicAgent(Agent):
    method _run (line 265) | def _run(self, messages: List[Message], lang: str = 'en', **kwargs) ->...

FILE: qwen_agent/agents/article_agent.py
  class ArticleAgent (line 23) | class ArticleAgent(Assistant):
    method _run (line 29) | def _run(self,

FILE: qwen_agent/agents/assistant.py
  function format_knowledge_to_source_and_content (line 52) | def format_knowledge_to_source_and_content(result: Union[str, List[dict]...
  class Assistant (line 81) | class Assistant(FnCallAgent):
    method __init__ (line 84) | def __init__(self,
    method _run (line 100) | def _run(self,
    method _prepend_knowledge_prompt (line 116) | def _prepend_knowledge_prompt(self,
  function get_current_date_str (line 152) | def get_current_date_str(

FILE: qwen_agent/agents/dialogue_retrieval_agent.py
  class DialogueRetrievalAgent (line 40) | class DialogueRetrievalAgent(Assistant):
    method _run (line 43) | def _run(self,

FILE: qwen_agent/agents/dialogue_simulator.py
  class DialogueSimulator (line 23) | class DialogueSimulator(Agent):
    method __init__ (line 25) | def __init__(self, user_agent: HumanSimulator, assistant_agent: Agent,...
    method _run (line 31) | def _run(self, messages: List[Message] = None, **kwargs) -> Iterator[L...
  function _swap_roles (line 55) | def _swap_roles(messages: List[Message]) -> List[Message]:

FILE: qwen_agent/agents/doc_qa/basic_doc_qa.py
  class BasicDocQA (line 40) | class BasicDocQA(Assistant):
    method __init__ (line 43) | def __init__(self,
    method _run (line 59) | def _run(self, messages: List[Message], lang: str = 'en', **kwargs) ->...

FILE: qwen_agent/agents/doc_qa/parallel_doc_qa.py
  class ParallelDocQA (line 48) | class ParallelDocQA(Assistant):
    method __init__ (line 50) | def __init__(self,
    method _get_files (line 76) | def _get_files(self, messages: List[Message]):
    method _parse_and_chunk_files (line 85) | def _parse_and_chunk_files(self, messages: List[Message]):
    method _retrieve_according_to_member_responses (line 97) | def _retrieve_according_to_member_responses(
    method _is_none_response (line 163) | def _is_none_response(self, text: str) -> bool:
    method _extract_text_from_output (line 170) | def _extract_text_from_output(self, output):
    method _parser_json (line 177) | def _parser_json(self, content):
    method _run (line 189) | def _run(self, messages: List[Message], lang: str = 'en', **kwargs) ->...
    method _ask_member_agent (line 254) | def _ask_member_agent(self,

FILE: qwen_agent/agents/doc_qa/parallel_doc_qa_member.py
  class ParallelDocQAMember (line 116) | class ParallelDocQAMember(Agent):
    method __init__ (line 118) | def __init__(self,
    method _run (line 129) | def _run(self,

FILE: qwen_agent/agents/doc_qa/parallel_doc_qa_summary.py
  class ParallelDocQASummary (line 69) | class ParallelDocQASummary(Agent):
    method _run (line 71) | def _run(self, messages: List[Message], knowledge: str = '', lang: str...

FILE: qwen_agent/agents/fncall_agent.py
  class FnCallAgent (line 27) | class FnCallAgent(Agent):
    method __init__ (line 30) | def __init__(self,
    method _run (line 73) | def _run(self, messages: List[Message], lang: Literal['en', 'zh'] = 'e...
    method _call_tool (line 110) | def _call_tool(self, tool_name: str, tool_args: Union[str, dict] = '{}...

FILE: qwen_agent/agents/group_chat.py
  class GroupChat (line 29) | class GroupChat(Agent, MultiAgentHub):
    method __init__ (line 37) | def __init__(self,
    method _run (line 81) | def _run(self,
    method _gen_batch_response (line 110) | def _gen_batch_response(self,
    method _gen_one_response (line 153) | def _gen_one_response(self,
    method _select_agent (line 168) | def _select_agent(self,
    method _manage_messages (line 214) | def _manage_messages(self, messages: List[Message], name: str) -> List...
    method _init_agents_from_config (line 265) | def _init_agents_from_config(self, cfgs: Dict, llm: Optional[Union[Dic...

FILE: qwen_agent/agents/group_chat_auto_router.py
  class GroupChatAutoRouter (line 25) | class GroupChatAutoRouter(Agent):
    method __init__ (line 50) | def __init__(self,
    method _run (line 72) | def _run(self, messages: List[Message], lang: str = 'en', **kwargs) ->...

FILE: qwen_agent/agents/group_chat_creator.py
  class GroupChatCreator (line 69) | class GroupChatCreator(Agent):
    method __init__ (line 71) | def __init__(self,
    method _run (line 84) | def _run(self,
    method _preprocess_messages (line 95) | def _preprocess_messages(self, messages: List[Message]) -> List[Message]:
    method _postprocess_messages (line 113) | def _postprocess_messages(self, messages: List[Message]) -> List[Messa...
    method _extract_role_config_and_answer (line 127) | def _extract_role_config_and_answer(self, text: str) -> Tuple[str, Lis...

FILE: qwen_agent/agents/human_simulator.py
  class HumanSimulator (line 36) | class HumanSimulator(Agent):
    method __init__ (line 38) | def __init__(self,
    method _run (line 54) | def _run(self, messages: List[Message], lang: str = 'en', **kwargs) ->...

FILE: qwen_agent/agents/keygen_strategies/gen_keyword.py
  class GenKeyword (line 25) | class GenKeyword(Agent):
    method __init__ (line 69) | def __init__(self,
    method _run (line 80) | def _run(self, messages: List[Message], lang: str = 'en', **kwargs) ->...

FILE: qwen_agent/agents/keygen_strategies/gen_keyword_with_knowledge.py
  class GenKeywordWithKnowledge (line 26) | class GenKeywordWithKnowledge(GenKeyword):
    method __init__ (line 58) | def __init__(self,
    method _run (line 65) | def _run(self,

FILE: qwen_agent/agents/keygen_strategies/split_query.py
  class SplitQuery (line 25) | class SplitQuery(GenKeyword):
    method __init__ (line 79) | def __init__(self,
    method _run (line 92) | def _run(self, messages: List[Message], lang: str = 'en', **kwargs) ->...

FILE: qwen_agent/agents/keygen_strategies/split_query_then_gen_keyword.py
  class SplitQueryThenGenKeyword (line 28) | class SplitQueryThenGenKeyword(Agent):
    method __init__ (line 30) | def __init__(self,
    method _run (line 39) | def _run(self, messages: List[Message], lang: str = 'en', **kwargs) ->...

FILE: qwen_agent/agents/keygen_strategies/split_query_then_gen_keyword_with_knowledge.py
  class SplitQueryThenGenKeywordWithKnowledge (line 24) | class SplitQueryThenGenKeywordWithKnowledge(SplitQueryThenGenKeyword):
    method __init__ (line 26) | def __init__(self,

FILE: qwen_agent/agents/memo_assistant.py
  class MemoAssistant (line 41) | class MemoAssistant(Assistant):
    method __init__ (line 43) | def __init__(self,
    method _run (line 58) | def _run(self, messages: List[Message], lang: str = 'zh', knowledge: s...
    method _prepend_storage_info_to_sys (line 65) | def _prepend_storage_info_to_sys(self, messages: List[Message]) -> Lis...
    method _truncate_dialogue_history (line 93) | def _truncate_dialogue_history(self, messages: List[Message]) -> List[...

FILE: qwen_agent/agents/react_chat.py
  class ReActChat (line 50) | class ReActChat(FnCallAgent):
    method __init__ (line 53) | def __init__(self,
    method _run (line 73) | def _run(self, messages: List[Message], lang: Literal['en', 'zh'] = 'e...
    method _prepend_react_prompt (line 109) | def _prepend_react_prompt(self, messages: List[Message], lang: Literal...
    method _detect_tool (line 134) | def _detect_tool(self, text: str) -> Tuple[bool, str, str, str]:

FILE: qwen_agent/agents/router.py
  class Router (line 36) | class Router(Assistant, MultiAgentHub):
    method __init__ (line 38) | def __init__(self,
    method _run (line 61) | def _run(self, messages: List[Message], lang: str = 'en', **kwargs) ->...
    method supplement_name_special_token (line 93) | def supplement_name_special_token(message: Message) -> Message:

FILE: qwen_agent/agents/tir_agent.py
  function extract_program (line 33) | def extract_program(result: str, last_only=True):
  class TIRMathAgent (line 56) | class TIRMathAgent(FnCallAgent):
    method __init__ (line 59) | def __init__(self,
    method _run (line 76) | def _run(self, messages: List[Message], lang: Literal['en', 'zh'] = 'e...
    method _detect_tool (line 130) | def _detect_tool(self, text: str) -> Tuple[bool, str, str, str]:

FILE: qwen_agent/agents/user_agent.py
  class UserAgent (line 23) | class UserAgent(Agent):
    method _run (line 25) | def _run(self, messages: List[Message], **kwargs) -> Iterator[List[Mes...

FILE: qwen_agent/agents/virtual_memory_agent.py
  class VirtualMemoryAgent (line 28) | class VirtualMemoryAgent(Assistant):
    method __init__ (line 30) | def __init__(self,
    method _run (line 48) | def _run(self, messages: List[Message], lang: str = 'en', **kwargs) ->...
    method _format_file (line 80) | def _format_file(self, messages: List[Message], lang: str = 'en') -> L...

FILE: qwen_agent/agents/write_from_scratch.py
  function is_roman_numeral (line 28) | def is_roman_numeral(s):
  class WriteFromScratch (line 34) | class WriteFromScratch(Agent):
    method _run (line 36) | def _run(self, messages: List[Message], knowledge: str = '', lang: str...

FILE: qwen_agent/agents/writing/continue_writing.py
  class ContinueWriting (line 46) | class ContinueWriting(Agent):
    method _run (line 48) | def _run(self, messages: List[Message], knowledge: str = '', lang: str...

FILE: qwen_agent/agents/writing/expand_writing.py
  class ExpandWriting (line 54) | class ExpandWriting(Agent):
    method _run (line 56) | def _run(self,

FILE: qwen_agent/agents/writing/outline_writing.py
  class OutlineWriting (line 48) | class OutlineWriting(Agent):
    method _run (line 50) | def _run(self, messages: List[Message], knowledge: str = '', lang: str...

FILE: qwen_agent/gui/gradio_utils.py
  function covert_image_to_base64 (line 18) | def covert_image_to_base64(image_path):
  function format_cover_html (line 34) | def format_cover_html(bot_name, bot_description, bot_avatar):

FILE: qwen_agent/gui/utils.py
  function get_avatar_image (line 43) | def get_avatar_image(name: str = 'user') -> str:
  function convert_history_to_chatbot (line 50) | def convert_history_to_chatbot(messages):
  function convert_fncall_to_text (line 67) | def convert_fncall_to_text(messages: List[Dict]) -> List[Dict]:

FILE: qwen_agent/gui/web_ui.py
  class WebUI (line 29) | class WebUI:
    method __init__ (line 32) | def __init__(self, agent: Union[Agent, MultiAgentHub, List[Agent]], ch...
    method run (line 83) | def run(self,
    method change_agent (line 214) | def change_agent(self, agent_selector):
    method add_text (line 218) | def add_text(self, _input, _audio_input, _chatbot, _history):
    method add_mention (line 252) | def add_mention(self, _chatbot, _agent_selector):
    method agent_run (line 268) | def agent_run(self, _chatbot, _history, _agent_selector=None):
    method flushed (line 325) | def flushed(self):
    method _get_agent_index_by_name (line 330) | def _get_agent_index_by_name(self, agent_name):
    method _create_agent_info_block (line 344) | def _create_agent_info_block(self, agent_index=0):
    method _create_agent_plugins_block (line 356) | def _create_agent_plugins_block(self, agent_index=0):

FILE: qwen_agent/llm/__init__.py
  function get_chat_model (line 31) | def get_chat_model(cfg: Union[dict, str] = 'qwen-plus') -> BaseChatModel:

FILE: qwen_agent/llm/azure.py
  class TextChatAtAzure (line 25) | class TextChatAtAzure(TextChatAtOAI):
    method __init__ (line 27) | def __init__(self, cfg: Optional[Dict] = None):

FILE: qwen_agent/llm/base.py
  function register_llm (line 35) | def register_llm(model_type):
  class ModelServiceError (line 44) | class ModelServiceError(Exception):
    method __init__ (line 46) | def __init__(self,
  class BaseChatModel (line 61) | class BaseChatModel(ABC):
    method support_multimodal_input (line 65) | def support_multimodal_input(self) -> bool:
    method support_multimodal_output (line 70) | def support_multimodal_output(self) -> bool:
    method support_audio_input (line 75) | def support_audio_input(self) -> bool:
    method __init__ (line 78) | def __init__(self, cfg: Optional[Dict] = None):
    method quick_chat (line 111) | def quick_chat(self, prompt: str) -> str:
    method chat (line 118) | def chat(
    method _chat (line 292) | def _chat(
    method _chat_with_functions (line 305) | def _chat_with_functions(
    method _continue_assistant_response (line 316) | def _continue_assistant_response(
    method _chat_stream (line 325) | def _chat_stream(
    method _chat_no_stream (line 334) | def _chat_no_stream(
    method _preprocess_messages (line 341) | def _preprocess_messages(
    method _postprocess_messages (line 364) | def _postprocess_messages(
    method _postprocess_messages_iterator (line 381) | def _postprocess_messages_iterator(
    method _convert_messages_to_target_type (line 392) | def _convert_messages_to_target_type(self, messages: List[Message],
    method _convert_messages_iterator_to_target_type (line 401) | def _convert_messages_iterator_to_target_type(
    method raw_chat (line 407) | def raw_chat(
    method _conv_qwen_agent_messages_to_oai (line 422) | def _conv_qwen_agent_messages_to_oai(messages: List[Union[Message, Dic...
    method quick_chat_oai (line 452) | def quick_chat_oai(self, messages: List[dict], tools: Optional[list] =...
  function _format_as_text_messages (line 536) | def _format_as_text_messages(messages: List[Message]) -> List[Message]:
  function _postprocess_stop_words (line 547) | def _postprocess_stop_words(messages: List[Message], stop: List[str]) ->...
  function _truncate_at_stop_word (line 592) | def _truncate_at_stop_word(text: str, stop: List[str]):
  function _truncate_input_messages_roughly (line 602) | def _truncate_input_messages_roughly(messages: List[Message], max_tokens...
  function retry_model_service (line 807) | def retry_model_service(
  function retry_model_service_iterator (line 822) | def retry_model_service_iterator(
  function _raise_or_delay (line 839) | def _raise_or_delay(
  function _rm_think (line 878) | def _rm_think(text: str) -> str:

FILE: qwen_agent/llm/fncall_prompts/base_fncall_prompt.py
  class BaseFnCallPrompt (line 21) | class BaseFnCallPrompt(object):
    method preprocess_fncall_messages (line 24) | def preprocess_fncall_messages(messages: List[Message],
    method postprocess_fncall_messages (line 38) | def postprocess_fncall_messages(messages: List[Message],
    method format_plaintext_train_samples (line 48) | def format_plaintext_train_samples(

FILE: qwen_agent/llm/fncall_prompts/nous_fncall_prompt.py
  class NousFnCallPrompt (line 27) | class NousFnCallPrompt(BaseFnCallPrompt):
    method preprocess_fncall_messages (line 29) | def preprocess_fncall_messages(self,
    method postprocess_fncall_messages (line 103) | def postprocess_fncall_messages(
  function remove_incomplete_special_tokens (line 294) | def remove_incomplete_special_tokens(text: str) -> str:
  function extract_fn (line 300) | def extract_fn(text: str):

FILE: qwen_agent/llm/fncall_prompts/qwen_fncall_prompt.py
  class QwenFnCallPrompt (line 24) | class QwenFnCallPrompt(BaseFnCallPrompt):
    method preprocess_fncall_messages (line 27) | def preprocess_fncall_messages(messages: List[Message],
    method postprocess_fncall_messages (line 113) | def postprocess_fncall_messages(messages: List[Message],
  function get_function_description (line 335) | def get_function_description(function: Dict, lang: Literal['en', 'zh']) ...
  function remove_incomplete_special_tokens (line 369) | def remove_incomplete_special_tokens(text: str) -> str:
  function remove_trailing_comment_of_fn_args (line 389) | def remove_trailing_comment_of_fn_args(fn_args: str):

FILE: qwen_agent/llm/function_calling.py
  class BaseFnCallModel (line 23) | class BaseFnCallModel(BaseChatModel, ABC):
    method __init__ (line 25) | def __init__(self, cfg: Optional[Dict] = None):
    method _preprocess_messages (line 41) | def _preprocess_messages(
    method _postprocess_messages (line 68) | def _postprocess_messages(
    method _remove_fncall_messages (line 84) | def _remove_fncall_messages(self, messages: List[Message], lang: Liter...
    method _chat_with_functions (line 120) | def _chat_with_functions(
    method _continue_assistant_response (line 138) | def _continue_assistant_response(
  function simulate_response_completion_with_chat (line 148) | def simulate_response_completion_with_chat(messages: List[Message]) -> L...
  function validate_num_fncall_results (line 167) | def validate_num_fncall_results(messages: List[Message], support_multimo...

FILE: qwen_agent/llm/oai.py
  class TextChatAtOAI (line 37) | class TextChatAtOAI(BaseFnCallModel):
    method __init__ (line 39) | def __init__(self, cfg: Optional[Dict] = None):
    method _chat_stream (line 98) | def _chat_stream(
    method _chat_no_stream (line 161) | def _chat_no_stream(
    method convert_messages_to_dicts (line 180) | def convert_messages_to_dicts(self, messages: List[Message]) -> List[d...

FILE: qwen_agent/llm/openvino.py
  class OpenVINO (line 28) | class OpenVINO(BaseFnCallModel):
    method __init__ (line 56) | def __init__(self, cfg: Optional[Dict] = None):
    method _get_stopping_criteria (line 82) | def _get_stopping_criteria(self, generate_cfg: dict):
    method _chat_stream (line 108) | def _chat_stream(
    method _chat_no_stream (line 142) | def _chat_no_stream(

FILE: qwen_agent/llm/qwen_dashscope.py
  class QwenChatAtDS (line 30) | class QwenChatAtDS(BaseFnCallModel):
    method __init__ (line 32) | def __init__(self, cfg: Optional[Dict] = None):
    method _chat_stream (line 37) | def _chat_stream(
    method _chat_no_stream (line 60) | def _chat_no_stream(
    method _continue_assistant_response (line 88) | def _continue_assistant_response(
    method _delta_stream_output (line 97) | def _delta_stream_output(response) -> Iterator[List[Message]]:
    method _full_stream_output (line 110) | def _full_stream_output(response) -> Iterator[List[Message]]:
  function initialize_dashscope (line 162) | def initialize_dashscope(cfg: Optional[Dict] = None) -> None:

FILE: qwen_agent/llm/qwenaudio_dashscope.py
  class QwenAudioChatAtDS (line 22) | class QwenAudioChatAtDS(QwenVLChatAtDS):
    method support_multimodal_input (line 25) | def support_multimodal_input(self) -> bool:
    method __init__ (line 28) | def __init__(self, cfg: Optional[Dict] = None):

FILE: qwen_agent/llm/qwenomni_oai.py
  class QwenOmniChatAtOAI (line 22) | class QwenOmniChatAtOAI(QwenVLChatAtOAI):
    method support_audio_input (line 25) | def support_audio_input(self) -> bool:
    method __init__ (line 28) | def __init__(self, cfg: Optional[Dict] = None):

FILE: qwen_agent/llm/qwenvl_dashscope.py
  class QwenVLChatAtDS (line 35) | class QwenVLChatAtDS(BaseFnCallModel):
    method support_multimodal_input (line 38) | def support_multimodal_input(self) -> bool:
    method __init__ (line 41) | def __init__(self, cfg: Optional[Dict] = None):
    method _chat_stream (line 46) | def _chat_stream(
    method _chat_no_stream (line 182) | def _chat_no_stream(
    method _continue_assistant_response (line 221) | def _continue_assistant_response(
  function _format_local_files (line 233) | def _format_local_files(messages: List[Message]) -> List[Message]:
  function _conv_fname (line 254) | def _conv_fname(fname: str) -> str:
  function rm_unsupported_modality (line 274) | def rm_unsupported_modality(messages: List[Message]) -> List[Message]:

FILE: qwen_agent/llm/qwenvl_oai.py
  class QwenVLChatAtOAI (line 30) | class QwenVLChatAtOAI(TextChatAtOAI):
    method support_multimodal_input (line 33) | def support_multimodal_input(self) -> bool:
    method convert_messages_to_dicts (line 36) | def convert_messages_to_dicts(self, messages: List[Message]) -> List[d...
  function conv_multimodel_value (line 103) | def conv_multimodel_value(t, v):

FILE: qwen_agent/llm/qwenvlo_dashscope.py
  class QwenVLoChatAtDS (line 8) | class QwenVLoChatAtDS(QwenVLChatAtDS):
    method support_multimodal_output (line 11) | def support_multimodal_output(self) -> bool:
    method __init__ (line 14) | def __init__(self, cfg: Optional[Dict] = None):

FILE: qwen_agent/llm/schema.py
  class BaseModelCompatibleDict (line 37) | class BaseModelCompatibleDict(BaseModel):
    method __getitem__ (line 39) | def __getitem__(self, item):
    method __setitem__ (line 42) | def __setitem__(self, key, value):
    method model_dump (line 45) | def model_dump(self, **kwargs):
    method model_dump_json (line 50) | def model_dump_json(self, **kwargs):
    method get (line 55) | def get(self, key, default=None):
    method __str__ (line 65) | def __str__(self):
  class FunctionCall (line 69) | class FunctionCall(BaseModelCompatibleDict):
    method __init__ (line 73) | def __init__(self, name: str, arguments: str):
    method __repr__ (line 76) | def __repr__(self):
  class ContentItem (line 80) | class ContentItem(BaseModelCompatibleDict):
    method __init__ (line 87) | def __init__(self,
    method check_exclusivity (line 96) | def check_exclusivity(self):
    method __repr__ (line 113) | def __repr__(self):
    method get_type_and_value (line 116) | def get_type_and_value(self) -> Tuple[Literal['text', 'image', 'file',...
    method type (line 122) | def type(self) -> Literal['text', 'image', 'file', 'audio', 'video']:
    method value (line 127) | def value(self) -> str:
  class Message (line 132) | class Message(BaseModelCompatibleDict):
    method __init__ (line 140) | def __init__(self,
    method __repr__ (line 157) | def __repr__(self):
    method role_checker (line 161) | def role_checker(cls, value: str) -> str:

FILE: qwen_agent/llm/transformers_llm.py
  class Transformers (line 28) | class Transformers(BaseFnCallModel):
    method __init__ (line 40) | def __init__(self, cfg: Optional[Dict] = None):
    method support_multimodal_input (line 74) | def support_multimodal_input(self) -> bool:
    method support_audio_input (line 78) | def support_audio_input(self) -> bool:
    method _get_streamer (line 81) | def _get_streamer(self):
    method _get_inputs (line 86) | def _get_inputs(self, messages: List[Message]):
    method _chat_stream (line 135) | def _chat_stream(
    method _chat_no_stream (line 169) | def _chat_no_stream(

FILE: qwen_agent/log.py
  function setup_logger (line 19) | def setup_logger(level=None):

FILE: qwen_agent/memory/memory.py
  class Memory (line 32) | class Memory(Agent):
    method __init__ (line 38) | def __init__(self,
    method _run (line 81) | def _run(self, messages: List[Message], lang: str = 'en', **kwargs) ->...
    method get_rag_files (line 146) | def get_rag_files(self, messages: List[Message]):

FILE: qwen_agent/multi_agent_hub.py
  class MultiAgentHub (line 22) | class MultiAgentHub(ABC):
    method agents (line 25) | def agents(self) -> List[Agent]:
    method agent_names (line 44) | def agent_names(self) -> List[str]:
    method nonuser_agents (line 48) | def nonuser_agents(self):

FILE: qwen_agent/tools/amap_weather.py
  class AmapWeather (line 24) | class AmapWeather(BaseTool):
    method __init__ (line 37) | def __init__(self, cfg: Optional[Dict] = None):
    method get_city_adcode (line 51) | def get_city_adcode(self, city_name):
    method call (line 59) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: qwen_agent/tools/base.py
  class ToolServiceError (line 27) | class ToolServiceError(Exception):
    method __init__ (line 29) | def __init__(self,
  function register_tool (line 44) | def register_tool(name, allow_overwrite=False):
  function is_tool_schema (line 62) | def is_tool_schema(obj: dict) -> bool:
  class BaseTool (line 109) | class BaseTool(ABC):
    method __init__ (line 114) | def __init__(self, cfg: Optional[dict] = None):
    method call (line 126) | def call(self, params: Union[str, dict], **kwargs) -> Union[str, list,...
    method _verify_json_format_args (line 140) | def _verify_json_format_args(self, params: Union[str, dict], strict_js...
    method function (line 165) | def function(self) -> dict:  # Bad naming. It should be `function_info`.
    method name_for_human (line 175) | def name_for_human(self) -> str:
    method args_format (line 179) | def args_format(self) -> str:
    method file_access (line 189) | def file_access(self) -> bool:
  class BaseToolWithFileAccess (line 193) | class BaseToolWithFileAccess(BaseTool, ABC):
    method __init__ (line 195) | def __init__(self, cfg: Optional[Dict] = None):
    method file_access (line 202) | def file_access(self) -> bool:
    method call (line 205) | def call(self, params: Union[str, dict], files: List[str] = None, **kw...

FILE: qwen_agent/tools/code_interpreter.py
  function _kill_kernels_and_containers (line 55) | def _kill_kernels_and_containers(_sig_num=None, _frame=None):
  class CodeInterpreter (line 81) | class CodeInterpreter(BaseToolWithFileAccess):
    method __init__ (line 94) | def __init__(self, cfg: Optional[Dict] = None):
    method args_format (line 105) | def args_format(self) -> str:
    method call (line 114) | def call(self, params: Union[str, dict], files: List[str] = None, time...
    method __del__ (line 157) | def __del__(self):
    method _build_docker_image (line 172) | def _build_docker_image(self):
    method _get_free_ports (line 203) | def _get_free_ports(self, n=5):
    method _start_kernel (line 215) | def _start_kernel(self, kernel_id: str):
    method _execute_code (line 342) | def _execute_code(self, kc, code: str) -> str:
    method _serve_image (line 396) | def _serve_image(self, image_base64: str) -> str:
  function _check_docker_availability (line 413) | def _check_docker_availability():
  function _check_host_deps (line 446) | def _check_host_deps():
  function _escape_ansi (line 457) | def _escape_ansi(line: str) -> str:
  class AnyThreadEventLoopPolicy (line 473) | class AnyThreadEventLoopPolicy(_BasePolicy):  # type: ignore
    method get_event_loop (line 486) | def get_event_loop(self) -> asyncio.AbstractEventLoop:

FILE: qwen_agent/tools/doc_parser.py
  class Chunk (line 32) | class Chunk(BaseModel):
    method __init__ (line 37) | def __init__(self, content: str, metadata: dict, token: int):
    method to_dict (line 40) | def to_dict(self) -> dict:
  class Record (line 44) | class Record(BaseModel):
    method __init__ (line 49) | def __init__(self, url: str, raw: List[Chunk], title: str):
    method to_dict (line 52) | def to_dict(self) -> dict:
  class DocParser (line 57) | class DocParser(BaseTool):
    method __init__ (line 70) | def __init__(self, cfg: Optional[Dict] = None):
    method call (line 80) | def call(self, params: Union[str, dict], **kwargs) -> dict:
    method split_doc_to_chunk (line 152) | def split_doc_to_chunk(self,
    method _get_last_part (line 275) | def _get_last_part(self, chunk: list) -> str:

FILE: qwen_agent/tools/extract_doc_vocabulary.py
  class ExtractDocVocabulary (line 29) | class ExtractDocVocabulary(BaseTool):
    method __init__ (line 45) | def __init__(self, cfg: Optional[Dict] = None):
    method call (line 52) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: qwen_agent/tools/image_gen.py
  class ImageGen (line 24) | class ImageGen(BaseTool):
    method __init__ (line 39) | def __init__(self, cfg: Optional[Dict] = None):
    method call (line 47) | def call(self, params: Union[str, dict], **kwargs) -> List[ContentItem]:

FILE: qwen_agent/tools/image_search.py
  function _new_getaddrinfo (line 38) | def _new_getaddrinfo(*args, **kwargs):
  class ImageResult (line 44) | class ImageResult(BaseModel):
    method __str__ (line 56) | def __str__(self):
    method __getitem__ (line 69) | def __getitem__(self, item):
    method __setitem__ (line 72) | def __setitem__(self, key, value):
  function serper_search (line 76) | def serper_search(image_url: str, check_accessibility: bool = True, max_...
  class ImageSearch (line 139) | class ImageSearch(BaseTool):
    method call (line 153) | def call(self, params: Union[str, dict], **kwargs) -> str:
  function check_image_url_accessibility (line 181) | def check_image_url_accessibility(url: str, timeout: int = 10) -> Tuple[...

FILE: qwen_agent/tools/image_zoom_in_qwen3vl.py
  class ImageZoomInToolQwen3VL (line 32) | class ImageZoomInToolQwen3VL(BaseToolWithFileAccess):
    method round_by_factor (line 64) | def round_by_factor(self, number: int, factor: int) -> int:
    method ceil_by_factor (line 68) | def ceil_by_factor(self, number: int, factor: int) -> int:
    method floor_by_factor (line 72) | def floor_by_factor(self, number: int, factor: int) -> int:
    method smart_resize (line 76) | def smart_resize(self,
    method maybe_resize_bbox (line 95) | def maybe_resize_bbox(self, left, top, right, bottom, img_width, img_h...
    method call (line 128) | def call(self, params: Union[str, dict], **kwargs) -> List[ContentItem]:

FILE: qwen_agent/tools/mcp_manager.py
  class MCPManager (line 31) | class MCPManager:
    method __new__ (line 34) | def __new__(cls, *args, **kwargs):
    method __init__ (line 39) | def __init__(self):
    method monkey_patch_mcp_create_platform_compatible_process (line 57) | def monkey_patch_mcp_create_platform_compatible_process(self):
    method start_loop (line 73) | def start_loop(self):
    method is_valid_mcp_servers (line 94) | def is_valid_mcp_servers(self, config: dict):
    method initConfig (line 139) | def initConfig(self, config: Dict):
    method init_config_async (line 152) | async def init_config_async(self, config: Dict):
    method create_tool_class (line 265) | def create_tool_class(self, register_name, register_client_id, tool_na...
    method shutdown (line 289) | def shutdown(self):
  class MCPClient (line 313) | class MCPClient:
    method __init__ (line 315) | def __init__(self):
    method connection_server (line 325) | async def connection_server(self, mcp_server_name, mcp_server):
    method reconnect (line 390) | async def reconnect(self):
    method execute_function (line 401) | async def execute_function(self, tool_name, tool_args: dict):
    method cleanup (line 463) | async def cleanup(self):
  function _cleanup_mcp (line 467) | def _cleanup_mcp(_sig_num=None, _frame=None):

FILE: qwen_agent/tools/python_executor.py
  class GenericRuntime (line 34) | class GenericRuntime:
    method __init__ (line 39) | def __init__(self):
    method exec_code (line 46) | def exec_code(self, code_piece: str) -> None:
    method eval_code (line 51) | def eval_code(self, expr: str) -> Any:
    method inject (line 54) | def inject(self, var_dict: Dict[str, Any]) -> None:
    method answer (line 59) | def answer(self):
  class DateRuntime (line 63) | class DateRuntime(GenericRuntime):
  class CustomDict (line 72) | class CustomDict(dict):
    method __iter__ (line 74) | def __iter__(self):
  class ColorObjectRuntime (line 78) | class ColorObjectRuntime(GenericRuntime):
  function _check_deps_for_python_executor (line 82) | def _check_deps_for_python_executor():
  class PythonExecutor (line 96) | class PythonExecutor(BaseTool):
    method __init__ (line 110) | def __init__(self, cfg: Optional[Dict] = None):
    method call (line 129) | def call(self, params: Union[str, dict], **kwargs) -> list:
    method apply (line 142) | def apply(self, code: str) -> list:
    method process_generation_to_code (line 145) | def process_generation_to_code(self, gens: str):
    method execute (line 149) | def execute(
    method truncate (line 183) | def truncate(s, max_length=256):
    method batch_apply (line 189) | def batch_apply(self, batch_code: List[str]) -> list:
  function _test (line 240) | def _test():

FILE: qwen_agent/tools/resource/code_interpreter_init_kernel.py
  function input (line 30) | def input(*args, **kwargs):  # noqa
  function _m6_timout_handler (line 34) | def _m6_timout_handler(_signum=None, _frame=None):
  class _M6CountdownTimer (line 44) | class _M6CountdownTimer:
    method start (line 47) | def start(cls, timeout: int):
    method cancel (line 54) | def cancel(cls):

FILE: qwen_agent/tools/retrieval.py
  function _check_deps_for_rag (line 25) | def _check_deps_for_rag():
  class Retrieval (line 42) | class Retrieval(BaseTool):
    method __init__ (line 66) | def __init__(self, cfg: Optional[Dict] = None):
    method call (line 79) | def call(self, params: Union[str, dict], **kwargs) -> list:

FILE: qwen_agent/tools/search_tools/base_search.py
  class RefMaterialOutput (line 27) | class RefMaterialOutput(BaseModel):
    method to_dict (line 32) | def to_dict(self) -> dict:
  class BaseSearch (line 39) | class BaseSearch(BaseTool):
    method __init__ (line 52) | def __init__(self, cfg: Optional[Dict] = None):
    method call (line 56) | def call(self, params: Union[str, dict], docs: List[Union[Record, str,...
    method search (line 89) | def search(self, query: str, docs: List[Record], max_ref_token: int = ...
    method sort_by_scores (line 94) | def sort_by_scores(self, query: str, docs: List[Record], **kwargs) -> ...
    method get_topk (line 107) | def get_topk(self,
    method format_docs (line 139) | def format_docs(self, docs: List[Union[Record, str, List[str]]]):
    method _get_the_front_part (line 166) | def _get_the_front_part(docs: List[Record], max_ref_token: int = DEFAU...

FILE: qwen_agent/tools/search_tools/front_page_search.py
  class FrontPageSearch (line 28) | class FrontPageSearch(BaseSearch):
    method sort_by_scores (line 30) | def sort_by_scores(self,

FILE: qwen_agent/tools/search_tools/hybrid_search.py
  class HybridSearch (line 25) | class HybridSearch(BaseSearch):
    method __init__ (line 27) | def __init__(self, cfg: Optional[Dict] = None):
    method sort_by_scores (line 35) | def sort_by_scores(self, query: str, docs: List[Record], **kwargs) -> ...

FILE: qwen_agent/tools/search_tools/keyword_search.py
  class KeywordSearch (line 30) | class KeywordSearch(BaseSearch):
    method search (line 32) | def search(self, query: str, docs: List[Record], max_ref_token: int = ...
    method sort_by_scores (line 44) | def sort_by_scores(self, query: str, docs: List[Record], **kwargs) -> ...
  function clean_en_token (line 95) | def clean_en_token(token: str) -> str:
  function tokenize_and_filter (line 111) | def tokenize_and_filter(input_text: str) -> str:
  function string_tokenizer (line 132) | def string_tokenizer(text: str) -> List[str]:
  function split_text_into_keywords (line 159) | def split_text_into_keywords(text: str) -> List[str]:
  function parse_keyword (line 169) | def parse_keyword(text):

FILE: qwen_agent/tools/search_tools/vector_search.py
  class VectorSearch (line 25) | class VectorSearch(BaseSearch):
    method sort_by_scores (line 28) | def sort_by_scores(self, query: str, docs: List[Record], **kwargs) -> ...

FILE: qwen_agent/tools/simple_doc_parser.py
  function clean_paragraph (line 32) | def clean_paragraph(text):
  class DocParserError (line 39) | class DocParserError(Exception):
    method __init__ (line 41) | def __init__(self,
  function parse_word (line 59) | def parse_word(docx_path: str, extract_image: bool = False):
  function parse_ppt (line 80) | def parse_ppt(path: str, extract_image: bool = False):
  function parse_txt (line 116) | def parse_txt(path: str):
  function df_to_md (line 127) | def df_to_md(df) -> str:
  function parse_excel (line 150) | def parse_excel(file_path: str, extract_image: bool = False) -> List[dict]:
  function parse_csv (line 166) | def parse_csv(file_path: str, extract_image: bool = False) -> List[dict]:
  function parse_tsv (line 184) | def parse_tsv(file_path: str, extract_image: bool = False) -> List[dict]:
  function parse_html_bs (line 202) | def parse_html_bs(path: str, extract_image: bool = False):
  function parse_pdf (line 240) | def parse_pdf(pdf_path: str, extract_image: bool = False) -> List[dict]:
  function postprocess_page_content (line 292) | def postprocess_page_content(page_content: list) -> list:
  function get_font (line 330) | def get_font(element):
  function extract_tables (line 349) | def extract_tables(pdf, page_num):
  function table_converter (line 355) | def table_converter(table):
  function get_plain_doc (line 371) | def get_plain_doc(doc: list):
  class SimpleDocParser (line 382) | class SimpleDocParser(BaseTool):
    method __init__ (line 395) | def __init__(self, cfg: Optional[Dict] = None):
    method call (line 403) | def call(self, params: Union[str, dict], **kwargs) -> Union[str, list]:

FILE: qwen_agent/tools/storage.py
  class KeyNotExistsError (line 23) | class KeyNotExistsError(ValueError):
  class Storage (line 28) | class Storage(BaseTool):
    method __init__ (line 53) | def __init__(self, cfg: Optional[Dict] = None):
    method call (line 58) | def call(self, params: Union[str, dict], **kwargs) -> str:
    method put (line 75) | def put(self, key: str, value: str, path: Optional[str] = None) -> str:
    method get (line 88) | def get(self, key: str, path: Optional[str] = None) -> str:
    method delete (line 94) | def delete(self, key, path: Optional[str] = None) -> str:
    method scan (line 103) | def scan(self, key: str, path: Optional[str] = None) -> str:

FILE: qwen_agent/tools/web_extractor.py
  class WebExtractor (line 22) | class WebExtractor(BaseTool):
    method call (line 35) | def call(self, params: Union[str, dict], **kwargs) -> str:

FILE: qwen_agent/tools/web_search.py
  class WebSearch (line 27) | class WebSearch(BaseTool):
    method call (line 40) | def call(self, params: Union[str, dict], **kwargs) -> str:
    method search (line 49) | def search(query: str) -> List[Any]:
    method _format_results (line 62) | def _format_results(search_results: List[Any]) -> str:

FILE: qwen_agent/utils/output_beautify.py
  function typewriter_print (line 28) | def typewriter_print(messages: List[dict], text: str) -> str:
  function multimodal_typewriter_print (line 51) | def multimodal_typewriter_print(messages: List[dict], text: str = '') ->...

FILE: qwen_agent/utils/parallel_executor.py
  function parallel_exec (line 21) | def parallel_exec(
  function serial_exec (line 58) | def serial_exec(fn: Callable, list_of_kwargs: List[dict]) -> List[Any]:

FILE: qwen_agent/utils/str_processing.py
  function rm_newlines (line 20) | def rm_newlines(text):
  function rm_cid (line 31) | def rm_cid(text):
  function rm_hexadecimal (line 36) | def rm_hexadecimal(text):
  function rm_continuous_placeholders (line 41) | def rm_continuous_placeholders(text):

FILE: qwen_agent/utils/tokenization_qwen.py
  function _load_tiktoken_bpe (line 49) | def _load_tiktoken_bpe(tiktoken_bpe_file: str) -> Dict[bytes, int]:
  class QWenTokenizer (line 57) | class QWenTokenizer:
    method __init__ (line 62) | def __init__(
    method __getstate__ (line 112) | def __getstate__(self):
    method __setstate__ (line 118) | def __setstate__(self, state):
    method __len__ (line 129) | def __len__(self) -> int:
    method get_vocab (line 132) | def get_vocab(self) -> Dict[bytes, int]:
    method convert_tokens_to_ids (line 135) | def convert_tokens_to_ids(self, tokens: Union[bytes, str, List[Union[b...
    method tokenize (line 149) | def tokenize(
    method convert_tokens_to_string (line 179) | def convert_tokens_to_string(self, tokens: List[Union[bytes, str]]) ->...
    method vocab_size (line 200) | def vocab_size(self):
    method _decode (line 203) | def _decode(
    method encode (line 215) | def encode(self, text: str) -> List[int]:
    method count_tokens (line 218) | def count_tokens(self, text: str) -> int:
    method truncate (line 221) | def truncate(self, text: str, max_token: int, start_token: int = 0, ke...
  function count_tokens (line 245) | def count_tokens(text: str) -> int:

FILE: qwen_agent/utils/utils.py
  function append_signal_handler (line 41) | def append_signal_handler(sig, handler):
  function get_local_ip (line 67) | def get_local_ip() -> str:
  function hash_sha256 (line 80) | def hash_sha256(text: str) -> str:
  function print_traceback (line 86) | def print_traceback(is_error: bool = True):
  function has_chinese_chars (line 97) | def has_chinese_chars(data: Any) -> bool:
  function has_chinese_messages (line 102) | def has_chinese_messages(messages: List[Union[Message, dict]], check_rol...
  function get_basename_from_url (line 110) | def get_basename_from_url(path_or_url: str) -> str:
  function is_http_url (line 130) | def is_http_url(path_or_url: str) -> bool:
  function is_image (line 136) | def is_image(path_or_url: str) -> bool:
  function sanitize_chrome_file_path (line 144) | def sanitize_chrome_file_path(file_path: str) -> str:
  function sanitize_windows_file_path (line 158) | def sanitize_windows_file_path(file_path: str) -> str:
  function save_url_to_local_work_dir (line 184) | def save_url_to_local_work_dir(url: str, save_dir: str, save_filename: s...
  function save_text_to_file (line 211) | def save_text_to_file(path: str, text: str) -> None:
  function read_text_from_file (line 216) | def read_text_from_file(path: str) -> str:
  function contains_html_tags (line 228) | def contains_html_tags(text: str) -> bool:
  function get_content_type_by_head_request (line 233) | def get_content_type_by_head_request(path: str) -> str:
  function get_file_type (line 242) | def get_file_type(path: str) -> Literal['pdf', 'docx', 'pptx', 'txt', 'h...
  function extract_urls (line 274) | def extract_urls(text: str) -> List[str]:
  function extract_markdown_urls (line 280) | def extract_markdown_urls(md_text: str) -> List[str]:
  function extract_code (line 286) | def extract_code(text: str) -> str:
  function json_loads (line 300) | def json_loads(text: str) -> dict:
  class PydanticJSONEncoder (line 313) | class PydanticJSONEncoder(json.JSONEncoder):
    method default (line 315) | def default(self, obj):
  function json_dumps_pretty (line 321) | def json_dumps_pretty(obj: dict, ensure_ascii=False, indent=2, **kwargs)...
  function json_dumps_compact (line 325) | def json_dumps_compact(obj: dict, ensure_ascii=False, indent=None, **kwa...
  function format_as_multimodal_message (line 329) | def format_as_multimodal_message(
  function format_as_text_message (line 427) | def format_as_text_message(
  function save_audio_to_file (line 445) | def save_audio_to_file(base_64: str, file_name: str):
  function extract_text_from_message (line 451) | def extract_text_from_message(
  function extract_files_from_messages (line 465) | def extract_files_from_messages(messages: List[Message], include_images:...
  function extract_images_from_messages (line 477) | def extract_images_from_messages(messages: List[Message]) -> List[str]:
  function merge_generate_cfgs (line 487) | def merge_generate_cfgs(base_generate_cfg: Optional[dict], new_generate_...
  function build_text_completion_prompt (line 500) | def build_text_completion_prompt(
  function encode_image_as_base64 (line 553) | def encode_image_as_base64(path: str, max_short_side_length: int = -1) -...
  function encode_audio_as_base64 (line 568) | def encode_audio_as_base64(path: str) -> str:
  function encode_video_as_base64 (line 573) | def encode_video_as_base64(path: str) -> str:
  function load_image_from_base64 (line 578) | def load_image_from_base64(image_base64: Union[bytes, str]):
  function resize_image (line 585) | def resize_image(img, short_side_length: int = 1080):
  function get_last_usr_msg_idx (line 602) | def get_last_usr_msg_idx(messages: List[Union[dict, Message]]) -> int:
  function rm_default_system (line 611) | def rm_default_system(messages: List[Message]) -> List[Message]:

FILE: qwen_server/assistant_server.py
  function add_text (line 60) | def add_text(history, text):
  function rm_text (line 65) | def rm_text(history):
  function set_url (line 75) | def set_url():
  function bot (line 87) | def bot(history):
  function init_chatbot (line 109) | def init_chatbot():
  function clear_session (line 122) | def clear_session():

FILE: qwen_server/database_server.py
  function update_pop_url (line 71) | def update_pop_url(url: str):
  function change_checkbox_state (line 82) | def change_checkbox_state(key):
  function cache_page (line 91) | def cache_page(**kwargs):
  function web_listening (line 116) | async def web_listening(request: Request):

FILE: qwen_server/js/main.js
  function autoTriggerFunction (line 22) | function autoTriggerFunction() {
  function scrollTextboxToBottom (line 38) | function scrollTextboxToBottom() {

FILE: qwen_server/output_beautify.py
  function extract_obs (line 26) | def extract_obs(text):
  function format_answer (line 33) | def format_answer(text):

FILE: qwen_server/schema.py
  class PathConfig (line 18) | class PathConfig(BaseModel):
  class ServerConfig (line 24) | class ServerConfig(BaseModel):
    class Config (line 35) | class Config:
  class GlobalConfig (line 39) | class GlobalConfig(BaseModel):

FILE: qwen_server/utils.py
  function save_browsing_meta_data (line 22) | def save_browsing_meta_data(url: str, title: str, meta_file: str):
  function rm_browsing_meta_data (line 40) | def rm_browsing_meta_data(url: str, meta_file: str):
  function read_meta_data_by_condition (line 53) | def read_meta_data_by_condition(meta_file: str, **kwargs):
  function save_history (line 85) | def save_history(history, url, history_dir):
  function read_history (line 94) | def read_history(url, history_dir):

FILE: qwen_server/workstation_server.py
  function add_text (line 73) | def add_text(history, text):
  function pure_add_text (line 79) | def pure_add_text(history, text):
  function rm_text (line 85) | def rm_text(history):
  function chat_clear (line 95) | def chat_clear():
  function chat_clear_pure (line 100) | def chat_clear_pure():
  function chat_clear_last (line 105) | def chat_clear_last():
  function pure_chat_clear_last (line 111) | def pure_chat_clear_last():
  function add_file (line 117) | def add_file(file, chosen_plug):
  function update_app_global_para (line 142) | def update_app_global_para(date1, date2):
  function refresh_date (line 147) | def refresh_date():
  function update_browser_list (line 153) | def update_browser_list():
  function layout_to_right (line 173) | def layout_to_right(text):
  function download_text (line 177) | def download_text(text):
  function choose_plugin (line 189) | def choose_plugin(chosen_plugin):
  function pure_bot (line 198) | def pure_bot(history):
  function keep_only_files_for_name (line 226) | def keep_only_files_for_name(messages, name):
  function bot (line 245) | def bot(history, chosen_plug):
  function get_last_one_line_context (line 312) | def get_last_one_line_context(text):
  function generate (line 323) | def generate(context):
  function format_generate (line 387) | def format_generate(edit, context):

FILE: run_server.py
  function parse_args (line 26) | def parse_args():
  function update_config (line 80) | def update_config(server_config, args, server_config_path):
  function main (line 97) | def main():

FILE: setup.py
  function get_version (line 20) | def get_version() -> str:
  function read_description (line 30) | def read_description() -> str:

FILE: tests/agents/test_article_agent.py
  function test_article_agent_full_article (line 21) | def test_article_agent_full_article():

FILE: tests/agents/test_assistant.py
  function test_assistant_system_and_tool (line 19) | def test_assistant_system_and_tool():
  function test_assistant_files (line 36) | def test_assistant_files():
  function test_assistant_empty_query (line 53) | def test_assistant_empty_query():
  function test_assistant_vl (line 69) | def test_assistant_vl():

FILE: tests/agents/test_custom_tool_object.py
  class MyImageGen (line 24) | class MyImageGen(BaseTool):
    method call (line 34) | def call(self, params: str, **kwargs) -> str:
  function init_agent_service (line 40) | def init_agent_service():
  function test_custom_tool_object (line 50) | def test_custom_tool_object():

FILE: tests/agents/test_doc_qa.py
  function test_doc_qa (line 18) | def test_doc_qa():

FILE: tests/agents/test_parallel_qa.py
  function test_parallel_qa (line 18) | def test_parallel_qa():

FILE: tests/agents/test_react_chat.py
  function test_react_chat (line 23) | def test_react_chat():
  function test_react_chat_with_file (line 39) | def test_react_chat_with_file():

FILE: tests/agents/test_router.py
  function test_router (line 19) | def test_router():

FILE: tests/examples/test_examples.py
  function test_assistant_add_custom_tool (line 40) | def test_assistant_add_custom_tool(query):
  function test_assistant_weather_bot (line 46) | def test_assistant_weather_bot(query, file):
  function test_llm_vl_mix_text (line 50) | def test_llm_vl_mix_text():
  function test_visual_storytelling (line 56) | def test_visual_storytelling(query, image):
  function test_function_calling (line 60) | def test_function_calling():
  function test_parallel_function_calling (line 64) | def test_parallel_function_calling():
  function test_react_data_analysis (line 78) | def test_react_data_analysis(query, file):
  function test_llm_riddles (line 82) | def test_llm_riddles():
  function test_multi_agent_router (line 89) | def test_multi_agent_router(query, image, file):
  function test_group_chat_chess (line 94) | def test_group_chat_chess(query):
  function test_group_chat_demo (line 98) | def test_group_chat_demo():
  function test_qwen2vl_assistant_tooluse (line 102) | def test_qwen2vl_assistant_tooluse():
  function test_video_understanding (line 106) | def test_video_understanding():

FILE: tests/examples/test_long_dialogue.py
  function test_long_dialogue (line 24) | def test_long_dialogue():

FILE: tests/examples/test_vm_qa.py
  function test_vm (line 24) | def test_vm():

FILE: tests/llm/test_continue.py
  function test_continue (line 33) | def test_continue(stream, delta_stream, llm_cfg):

FILE: tests/llm/test_dashscope.py
  function test_vl_mix_text (line 41) | def test_vl_mix_text(functions, stream, delta_stream):
  function test_llm_dashscope (line 69) | def test_llm_dashscope(functions, stream, delta_stream):
  function test_llm_retry_failure (line 94) | def test_llm_retry_failure(stream, delta_stream):

FILE: tests/llm/test_function_content.py
  function test_function_content (line 35) | def test_function_content(cfg, gen_cfg1, gen_cfg2):

FILE: tests/llm/test_oai.py
  function test_llm_oai (line 41) | def test_llm_oai(functions, stream, delta_stream):

FILE: tests/memory/test_memory.py
  function test_memory (line 25) | def test_memory():

FILE: tests/qwen_server/test_database_server.py
  function test_database_server (line 25) | def test_database_server():

FILE: tests/tools/test_doc_parser.py
  function test_doc_parser (line 18) | def test_doc_parser():

FILE: tests/tools/test_hybrid_search.py
  function test_hybrid_search (line 18) | def test_hybrid_search():

FILE: tests/tools/test_keyword_search.py
  function test_keyword_search (line 18) | def test_keyword_search():

FILE: tests/tools/test_simple_doc_parser.py
  function test_simple_doc_parser (line 18) | def test_simple_doc_parser():

FILE: tests/tools/test_tools.py
  function test_amap_weather (line 24) | def test_amap_weather(params):
  function test_code_interpreter (line 29) | def test_code_interpreter():
  function test_image_gen (line 34) | def test_image_gen():
  function test_retrieval (line 39) | def test_retrieval():
  function test_storage_put (line 48) | def test_storage_put(operate):
  function test_storage_scan (line 56) | def test_storage_scan(operate):
  function test_storage_get_delete (line 64) | def test_storage_get_delete(operate):

FILE: tests/tools/test_vector_search.py
  function test_vector_search (line 18) | def test_vector_search():

Condensed preview of 304 files, each showing path, character count, and a content snippet (full structured content: 5,255K chars).

[
  {
    "path": ".github/workflows/deploy-docs.yml",
    "chars": 1239,
    "preview": "name: Deploy to GitHub Pages\n\non:\n  push:\n    branches:\n      - main  # 或者你的主分支名称\n    paths:\n      - 'qwen-agent-docs/we"
  },
  {
    "path": ".gitignore",
    "chars": 1274,
    "preview": "env\n*.pyc\n__pycache__\n\n.idea\n.vscode\n.DS_Store\n*.ipynb_checkpoints\n\nqwen_agent/llm/gpt.py\nqwen_agent/llm/tools.py\nworksp"
  },
  {
    "path": ".pre-commit-config.yaml",
    "chars": 936,
    "preview": "repos:\n  - repo: https://github.com/pycqa/flake8.git\n    rev: 5.0.4\n    hooks:\n      - id: flake8\n        args: [\"--max-"
  },
  {
    "path": "LICENSE",
    "chars": 11358,
    "preview": "\n                                 Apache License\n                           Version 2.0, January 2004\n                  "
  },
  {
    "path": "MANIFEST.in",
    "chars": 85,
    "preview": "include qwen_agent/utils/qwen.tiktoken\nrecursive-include qwen_agent/tools/resource *\n"
  },
  {
    "path": "README.md",
    "chars": 15110,
    "preview": "<!---\nCopyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 "
  },
  {
    "path": "README_CN.md",
    "chars": 10071,
    "preview": "<!---\nCopyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 "
  },
  {
    "path": "benchmark/code_interpreter/README.md",
    "chars": 8383,
    "preview": "# Code Interpreter Benchmark\n\n## Introduction\nTo assess LLM's ability to use the Python Code Interpreter for tasks such "
  },
  {
    "path": "benchmark/code_interpreter/code_interpreter.py",
    "chars": 7686,
    "preview": "import base64\nimport io\nimport json\nimport logging\nimport os\nimport queue\nimport re\nimport subprocess\nimport sys\nimport "
  },
  {
    "path": "benchmark/code_interpreter/config.py",
    "chars": 2095,
    "preview": "from parser import InternLMReActParser, ReActParser\n\nfrom models import LLM, Qwen, QwenDashscopeVLModel, QwenVL\nfrom pro"
  },
  {
    "path": "benchmark/code_interpreter/inference_and_execute.py",
    "chars": 9033,
    "preview": "import argparse\nimport json\nimport logging\nimport os\nfrom parser import ReActParser\n\nimport prettytable\nimport tqdm\nfrom"
  },
  {
    "path": "benchmark/code_interpreter/metrics/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "benchmark/code_interpreter/metrics/code_execution.py",
    "chars": 8337,
    "preview": "import logging\nimport os\n\nimport func_timeout\nfrom config import get_react_parser\nfrom func_timeout import func_set_time"
  },
  {
    "path": "benchmark/code_interpreter/metrics/gsm8k.py",
    "chars": 1684,
    "preview": "import logging\nimport os\nimport re\n\nimport numpy as np\nfrom utils.data_utils import load_jsonl, save_jsonl\n\nINVALID_ANS "
  },
  {
    "path": "benchmark/code_interpreter/metrics/visualization.py",
    "chars": 6322,
    "preview": "import base64\nimport logging\nimport os\nimport re\n\nimport torch\nfrom config import get_model, get_react_parser\nfrom utils"
  },
  {
    "path": "benchmark/code_interpreter/models/__init__.py",
    "chars": 178,
    "preview": "from models.base import HFModel  # noqa\nfrom models.dashscope import QwenDashscopeVLModel  # noqa\nfrom models.llm import"
  },
  {
    "path": "benchmark/code_interpreter/models/base.py",
    "chars": 748,
    "preview": "from transformers import AutoModelForCausalLM, AutoTokenizer\nfrom transformers.generation import GenerationConfig\n\n\nclas"
  },
  {
    "path": "benchmark/code_interpreter/models/dashscope.py",
    "chars": 1429,
    "preview": "import logging\nimport os\nimport time\nfrom http import HTTPStatus\n\nimport dashscope\n\n\nclass QwenDashscopeVLModel(object):"
  },
  {
    "path": "benchmark/code_interpreter/models/llm.py",
    "chars": 853,
    "preview": "import torch\nfrom models.base import HFModel\n\n\nclass LLM(HFModel):\n\n    def __init__(self, model_path):\n        super()."
  },
  {
    "path": "benchmark/code_interpreter/models/qwen.py",
    "chars": 1111,
    "preview": "import torch\nfrom models.base import HFModel\n\n\nclass Qwen(HFModel):\n\n    def __init__(self, model_path):\n        super()"
  },
  {
    "path": "benchmark/code_interpreter/parser/__init__.py",
    "chars": 115,
    "preview": "from parser.internlm_parser import InternLMReActParser  # noqa\nfrom parser.react_parser import ReActParser  # noqa\n"
  },
  {
    "path": "benchmark/code_interpreter/parser/internlm_parser.py",
    "chars": 343,
    "preview": "from parser.react_parser import ReActParser\n\n\nclass InternLMReActParser(ReActParser):\n\n    def __init__(self):\n        s"
  },
  {
    "path": "benchmark/code_interpreter/parser/react_parser.py",
    "chars": 1801,
    "preview": "class ReActParser(object):\n\n    def __init__(self):\n        self.action = '\\nAction:'\n        self.action_input = '\\nAct"
  },
  {
    "path": "benchmark/code_interpreter/prompt/__init__.py",
    "chars": 193,
    "preview": "from prompt.internlm_react import InternLMReAct  # noqa\nfrom prompt.llama_react import LlamaReAct  # noqa\nfrom prompt.qw"
  },
  {
    "path": "benchmark/code_interpreter/prompt/internlm_react.py",
    "chars": 3014,
    "preview": "from prompt.react import ReAct\n\nINTERNLM_TOOL_DESCRIPTION = \"\"\"用来执行Python代码。代码必须是一个函数，\n函数名必须得是 'solution'，代码对应你的思考过程。代码实"
  },
  {
    "path": "benchmark/code_interpreter/prompt/llama_react.py",
    "chars": 660,
    "preview": "from prompt.react import ReAct\n\n\nclass LlamaReAct(ReAct):\n\n    def __init__(self, query, lang='en', upload_file_paths=[]"
  },
  {
    "path": "benchmark/code_interpreter/prompt/qwen_react.py",
    "chars": 2705,
    "preview": "import json\nimport os\n\nfrom prompt.react import ReAct\n\nQWEN_TOOLS_LIST = [\n    {\n        'name_for_human': '代码解释器',\n    "
  },
  {
    "path": "benchmark/code_interpreter/prompt/react.py",
    "chars": 2773,
    "preview": "import os\n\ntools_text = \"\"\"code_interpreter: Call this tool to interact with the Code Interpreter API.\nWhat is the Code "
  },
  {
    "path": "benchmark/code_interpreter/requirements.txt",
    "chars": 152,
    "preview": "accelerate>=0.20.3\nfunc_timeout\njson5\nmatplotlib\nnumpy\nopenai\npandas\nPrettyTable\nscipy\nseaborn\nsympy\ntransformers==4.33."
  },
  {
    "path": "benchmark/code_interpreter/utils/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "benchmark/code_interpreter/utils/code_utils.py",
    "chars": 860,
    "preview": "import os\nimport re\n\nimport json5\n\n\ndef replace_upload_fname(text, upload_fname_list):\n    for full_input_fname in uploa"
  },
  {
    "path": "benchmark/code_interpreter/utils/data_utils.py",
    "chars": 728,
    "preview": "import json\nimport logging\n\nfrom tqdm import tqdm\n\n\ndef load_jsonl(path):\n    data = []\n    with open(path, 'r', encodin"
  },
  {
    "path": "benchmark/deepplanning/README.md",
    "chars": 8765,
    "preview": "# DeepPlanning Benchmark\n\nA comprehensive benchmark for evaluating AI agents' planning capabilities across multiple doma"
  },
  {
    "path": "benchmark/deepplanning/aggregate_results.py",
    "chars": 17286,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nAggregate results across Shopping and Travel Planning benchmarks\nCalculates overall scores by"
  },
  {
    "path": "benchmark/deepplanning/env.example",
    "chars": 239,
    "preview": "# API Keys for different model providers\n# Copy this file to .env and fill in your API keys\n\n# For Qwen models (via Dash"
  },
  {
    "path": "benchmark/deepplanning/models_config.json",
    "chars": 991,
    "preview": "{\n  \"models\": {\n    \"qwen-plus\": {\n      \"model_name\": \"qwen-plus\",\n      \"model_type\": \"openai\",\n      \"base_url\": \"htt"
  },
  {
    "path": "benchmark/deepplanning/requirements.txt",
    "chars": 1445,
    "preview": "# ========================================\n# Unified Benchmark Requirements\n# For both Shopping Planning and Travel Plan"
  },
  {
    "path": "benchmark/deepplanning/run_all.sh",
    "chars": 10285,
    "preview": "#!/bin/bash\n\n# ============================================\n# Unified Benchmark Runner\n# Runs both Shopping and Travel P"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/README.md",
    "chars": 10881,
    "preview": "## 🛠️ Quick Start\n\nThis domain can be run as part of the unified benchmark or independently.\n\n### Step 1: Install Depend"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/agent/call_llm.py",
    "chars": 5587,
    "preview": "\"\"\"\nUniversal LLM calling module\nSupports OpenAI-compatible APIs\n\"\"\"\nimport json\nimport os\nimport time\nfrom pathlib impo"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/agent/prompts.py",
    "chars": 13809,
    "preview": "\nSYSTEM_PROMPT_level1 = \"\"\"\nYou are an expert and highly strategic AI Shopping Assistant. Your mission is to understand "
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/agent/shopping_agent.py",
    "chars": 21610,
    "preview": "\"\"\"\nCustom Agent implementation - Framework-independent\n\nUses universal LLM calling for multiple providers\n\"\"\"\n\nimport j"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/data/level_1_query_meta.json",
    "chars": 42021,
    "preview": "[\n    {\n        \"id\": \"1\",\n        \"query\": \"I'm putting together a complete footwear collection and need to order sever"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/data/level_2_query_meta.json",
    "chars": 47576,
    "preview": "[\n    {\n        \"id\": \"1\",\n        \"query\": \"I'm updating my wardrobe for an upcoming trip and need to order a few speci"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/data/level_3_query_meta.json",
    "chars": 15930,
    "preview": "[\n    {\n        \"id\": \"1\",\n        \"query\": \"I'm preparing for a weekend getaway and need to order several items with fa"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/evaluation/evaluation_pipeline.py",
    "chars": 18451,
    "preview": "import json\nimport sys\nfrom pathlib import Path\nfrom typing import Any, Dict, List\nfrom datetime import datetime\n\n# Add "
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/evaluation/score_statistics.py",
    "chars": 11523,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nScore statistics script\nCalculate total scores for a model across all levels\n\"\"\"\n\nimport json"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/run.py",
    "chars": 5796,
    "preview": "\"\"\"\nShoppingBench Integrated Runner\n\nThis script runs shopping agent inference for different levels.\n\nUsage:\n    python "
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/run.sh",
    "chars": 4962,
    "preview": "#!/bin/bash\n\n# ============================================\n# Shopping Benchmark Runner\n# Usage: bash run.sh\n# \n# Key Fe"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/tools/__init__.py",
    "chars": 1300,
    "preview": "\"\"\"\nShoppingBench Tools Package\n\"\"\"\n\nfrom .filter_by_brand_tool import FilterByBrandTool\nfrom .filter_by_color_tool impo"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/tools/add_coupon_to_cart.py",
    "chars": 12007,
    "preview": "import json\nimport re\nfrom typing import Union, Dict, Tuple, List\nfrom base_shopping_tool import BaseShoppingTool, regis"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/tools/add_product_to_cart.py",
    "chars": 6271,
    "preview": "import json\nfrom typing import Union, Dict\nfrom base_shopping_tool import BaseShoppingTool, register_tool\nfrom pathlib i"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/tools/base_shopping_tool.py",
    "chars": 14756,
    "preview": "\"\"\"\nBase Shopping Tool - Independent Base Tool Class\n\nFramework-agnostic, designed to be compatible with the qwen-agent "
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/tools/calculate_transport_time_tool.py",
    "chars": 8414,
    "preview": "import json\nimport os\nfrom typing import Union, Dict\nfrom pathlib import Path\nfrom base_shopping_tool import BaseShoppin"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/tools/delete_coupon_from_cart.py",
    "chars": 10094,
    "preview": "import json\nimport re\nfrom typing import Union, Dict, List\nfrom pathlib import Path\nfrom base_shopping_tool import BaseS"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/tools/delete_product_from_cart.py",
    "chars": 6029,
    "preview": "import json\nfrom typing import Union, Dict\nfrom base_shopping_tool import BaseShoppingTool, register_tool\nfrom pathlib i"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/tools/filter_by_applicable_coupons_tool.py",
    "chars": 3477,
    "preview": "import json\nimport os\nfrom typing import Union, Dict, List\nfrom base_shopping_tool import BaseShoppingTool, register_too"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/tools/filter_by_brand_tool.py",
    "chars": 2843,
    "preview": "\"\"\"\nFilter products by brand name.\n\"\"\"\n\nimport json\nimport os\nfrom typing import Union, Dict, List\nfrom base_shopping_to"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/tools/filter_by_color_tool.py",
    "chars": 2686,
    "preview": "import json\nimport os\nfrom typing import Union, Dict, List\nfrom base_shopping_tool import BaseShoppingTool, register_too"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/tools/filter_by_range_tool.py",
    "chars": 4441,
    "preview": "import json\nimport os\nfrom typing import Union, Dict, List, Any\nfrom functools import reduce\nfrom base_shopping_tool imp"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/tools/filter_by_size_tool.py",
    "chars": 2670,
    "preview": "import json\nimport os\nfrom typing import Union, Dict, List\nfrom base_shopping_tool import BaseShoppingTool, register_too"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/tools/get_cart_info.py",
    "chars": 3286,
    "preview": "import json\nimport os\nfrom typing import Union, Dict, List, Optional\nfrom base_shopping_tool import BaseShoppingTool, re"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/tools/get_product_details_tool.py",
    "chars": 2117,
    "preview": "import json\nimport os\nfrom typing import Union, Dict, List\nfrom base_shopping_tool import BaseShoppingTool, register_too"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/tools/get_user_info.py",
    "chars": 2799,
    "preview": "import json\nfrom typing import Union, Dict, List, Optional\nfrom base_shopping_tool import BaseShoppingTool, register_too"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/tools/search_products_tool.py",
    "chars": 4393,
    "preview": "import json\nimport os\nfrom pathlib import Path\nfrom typing import Union, Dict, List\n\ntry:\n    from rank_bm25 import BM25"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/tools/shopping_tool_schema.json",
    "chars": 19130,
    "preview": "[\n    {\n        \"type\": \"function\",\n        \"function\": {\n            \"name\": \"search_products\",\n            \"descriptio"
  },
  {
    "path": "benchmark/deepplanning/shoppingplanning/tools/sort_product_tool.py",
    "chars": 4153,
    "preview": "import json\nimport os\nfrom typing import Union, Dict, List, Any\nfrom functools import reduce\nfrom base_shopping_tool imp"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/README.md",
    "chars": 8042,
    "preview": "## 🛠️ Quick Start\n\nThis domain can be run as part of the unified benchmark or independently.\n\n### Step 1: Install Depend"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/agent/__init__.py",
    "chars": 189,
    "preview": "\"\"\"\nAgent module for TravelBench\n\nThis module contains the agent implementation for travel planning.\n\"\"\"\n\nfrom .tools_fn"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/agent/call_llm.py",
    "chars": 5852,
    "preview": "\"\"\"\nUniversal LLM calling module\nSupports multiple providers: OpenAI, Anthropic (Claude), Google (Gemini), etc.\n\"\"\"\nimpo"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/agent/prompts.py",
    "chars": 38535,
    "preview": "\"\"\"\nPrompts for Travel Planning Agent\nIncludes both Chinese and English versions\n\"\"\"\n\n# Chinese Version (from TravelBenc"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/agent/tools_fn_agent.py",
    "chars": 21082,
    "preview": "\"\"\"\nCustom Agent implementation - Framework-independent\nUses universal LLM calling for multiple providers\n\"\"\"\nimport jso"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/data/travelplanning_query_en.json",
    "chars": 462195,
    "preview": "[\n  {\n    \"id\": \"0\",\n    \"query\": \"I'm planning a two-day trip from Hefei to Nanjing on November 12, 2025, returning in "
  },
  {
    "path": "benchmark/deepplanning/travelplanning/data/travelplanning_query_zh.json",
    "chars": 277826,
    "preview": "[\n  {\n    \"id\": \"0\",\n    \"query\": \"我打算2025年11月12号从合肥去南京玩两天，13号晚上就回来了，这次旅行的总开销希望控制在3000元以内。我们一共三个人，交通的话就坐火车吧，应该挺方便的，你帮我挑个"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/evaluation/__init__.py",
    "chars": 306,
    "preview": "\"\"\"\nEvaluation module for TravelBench\n\nThis module contains tools for converting agent outputs to structured format\nand "
  },
  {
    "path": "benchmark/deepplanning/travelplanning/evaluation/constraints_commonsense.py",
    "chars": 74434,
    "preview": "\"\"\"\nCommonsense constraints evaluation for travel plans.\nContains all validation checks for travel plan feasibility and "
  },
  {
    "path": "benchmark/deepplanning/travelplanning/evaluation/constraints_hard.py",
    "chars": 25905,
    "preview": "\"\"\"\nHard Constraint Evaluation Module\nImplements evaluation strategies for each hard constraint type\n\"\"\"\n\nfrom typing im"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/evaluation/convert_report.py",
    "chars": 13621,
    "preview": "\"\"\"\nConvert agent reports to structured JSON format for evaluation\n\nThis module uses an LLM to parse the agent's natural"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/evaluation/eval_converted.py",
    "chars": 23567,
    "preview": "\"\"\"\nEvaluation Script for Converted Travel Plans\nEvaluates both commonsense and hard constraints with parallel processin"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/evaluation/utils.py",
    "chars": 30655,
    "preview": "\"\"\"\nUtility functions for travel planning evaluation.\nContains common helper functions for parsing, validation, and data"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/run.py",
    "chars": 17133,
    "preview": "\"\"\"\nTravelBench Integrated Runner\n\nThis script integrates three steps into a single pipeline:\n1. Agent inference (genera"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/run.sh",
    "chars": 13925,
    "preview": "#!/bin/bash\n\nSCRIPT_DIR=\"$(cd \"$(dirname \"${BASH_SOURCE[0]}\")\" && pwd)\"\ncd \"$SCRIPT_DIR\"\n\n# Model from models_config.jso"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/tools/__init__.py",
    "chars": 725,
    "preview": "\"\"\"\nTravelBench Tools Package\n\"\"\"\n\nfrom .train_query_tool import TrainQueryTool\nfrom .flight_query_tool import FlightQue"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/tools/attraction_query_tool.py",
    "chars": 10724,
    "preview": "\"\"\"\nAttraction Query Tool - Query and recommend attractions (Multilingual)\n\"\"\"\nimport os\nfrom typing import Dict, Option"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/tools/base_travel_tool.py",
    "chars": 11368,
    "preview": "\"\"\"\nBase Travel Tool - Extension of qwen-agent BaseTool for travel planning\n\"\"\"\nimport json\nimport os\nfrom typing import"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/tools/flight_query_tool.py",
    "chars": 5346,
    "preview": "\"\"\"\nFlight Query Tool - Query flight information (Multilingual)\n\"\"\"\nimport os\nfrom typing import Dict, Optional, Union\n\n"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/tools/hotel_query_tool.py",
    "chars": 5081,
    "preview": "\"\"\"\nHotel Query Tool - Query hotel information (Multilingual)\n\"\"\"\nimport os\nfrom typing import Dict, Optional, Union\n\nfr"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/tools/location_search_tool.py",
    "chars": 3164,
    "preview": "\"\"\"\nLocation Search Tool - Query location coordinates (Multilingual)\n\"\"\"\nimport os\nfrom typing import Dict, Optional, Un"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/tools/restaurant_query_tool.py",
    "chars": 7855,
    "preview": "\"\"\"\nRestaurant Query Tool - Recommend and query restaurant information (Multilingual)\n\"\"\"\nimport os\nfrom typing import D"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/tools/roadroute_query_tool.py",
    "chars": 4947,
    "preview": "\"\"\"\nRoad Route Query Tool - Query distance and duration between locations (Multilingual)\n\"\"\"\nimport os\nfrom typing impor"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/tools/tool_schema.json",
    "chars": 6197,
    "preview": "[\n    {\n    \"type\": \"function\",\n    \"function\": {\n      \"name\": \"query_train_info\",\n      \"description\": \"查询火车票信息，支持按出发地"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/tools/tool_schema_en.json",
    "chars": 9684,
    "preview": "[\n    {\n    \"type\": \"function\",\n    \"function\": {\n      \"name\": \"query_train_info\",\n      \"description\": \"Query train ti"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/tools/tool_schema_zh.json",
    "chars": 6197,
    "preview": "[\n    {\n    \"type\": \"function\",\n    \"function\": {\n      \"name\": \"query_train_info\",\n      \"description\": \"查询火车票信息，支持按出发地"
  },
  {
    "path": "benchmark/deepplanning/travelplanning/tools/train_query_tool.py",
    "chars": 5413,
    "preview": "\"\"\"\nTrain Query Tool - Query train ticket information (Multilingual)\n\"\"\"\nimport os\nfrom typing import Dict, Optional, Un"
  },
  {
    "path": "browser_qwen/background.js",
    "chars": 2464,
    "preview": "/* \nCopyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (t"
  },
  {
    "path": "browser_qwen/manifest.json",
    "chars": 952,
    "preview": "{\n    \"name\": \"BrowserQwen\",\n    \"description\" : \"An Extension Driven by LLM\",\n    \"version\": \"1.0\",\n    \"manifest_versi"
  },
  {
    "path": "browser_qwen/src/content.js",
    "chars": 2945,
    "preview": "/* \nCopyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (t"
  },
  {
    "path": "browser_qwen/src/popup.html",
    "chars": 2656,
    "preview": "<!DOCTYPE html>\n<html>\n<head>\n  <meta charset=\"UTF-8\">\n    <meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">\n    <me"
  },
  {
    "path": "browser_qwen/src/popup.js",
    "chars": 3056,
    "preview": "/* \nCopyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (t"
  },
  {
    "path": "browser_qwen.md",
    "chars": 5077,
    "preview": "<!---\nCopyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 "
  },
  {
    "path": "browser_qwen_cn.md",
    "chars": 3257,
    "preview": "<!---\nCopyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/__init__.py",
    "chars": 617,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/assistant_add_custom_tool.py",
    "chars": 3321,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/assistant_audio.py",
    "chars": 1262,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/assistant_mcp_sqlite_bot.py",
    "chars": 2975,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/assistant_omni.py",
    "chars": 1954,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/assistant_qwen3.5.py",
    "chars": 3942,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 ("
  },
  {
    "path": "examples/assistant_qwen3.py",
    "chars": 4531,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 ("
  },
  {
    "path": "examples/assistant_qwen3_coder.py",
    "chars": 3976,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 ("
  },
  {
    "path": "examples/assistant_qwen3vl.py",
    "chars": 1830,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 ("
  },
  {
    "path": "examples/assistant_qwq.py",
    "chars": 3122,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/assistant_rag.py",
    "chars": 1442,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/assistant_weather_bot.py",
    "chars": 2801,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/function_calling.py",
    "chars": 5958,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/function_calling_in_parallel.py",
    "chars": 5731,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/gpt_mentions.py",
    "chars": 1957,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/group_chat_chess.py",
    "chars": 2954,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/group_chat_demo.py",
    "chars": 12197,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/llm_quick_chat_oai.py",
    "chars": 2791,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/llm_riddles.py",
    "chars": 2999,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/llm_vl_mix_text.py",
    "chars": 3180,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/long_dialogue.py",
    "chars": 1614,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/multi_agent_router.py",
    "chars": 3453,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/parallel_doc_qa.py",
    "chars": 1655,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/qwen2vl_assistant_tooluse.py",
    "chars": 13044,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/qwen2vl_assistant_video.py",
    "chars": 1840,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/qwen2vl_function_calling.py",
    "chars": 3861,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/react_data_analysis.py",
    "chars": 3266,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/resource/stock_prices.csv",
    "chars": 333,
    "preview": ",Date,Open,High,Low,Close,Adj,Close,Volume\n0,2020/1/3,74.13,74.31,73.6,73.91,73.91,17423000,36237000\n1,2020/1/4,73.91,7"
  },
  {
    "path": "examples/tir_math.py",
    "chars": 2876,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/virtual_memory_qa.py",
    "chars": 2525,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/visual_storytelling.py",
    "chars": 4017,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen-agent-docs/website/.gitignore",
    "chars": 492,
    "preview": "# Dependencies\nnode_modules/\n\n# Next.js build output\n.next/\nout/\n\n# Production build\nbuild/\ndist/\n\n# Debug logs\nnpm-debu"
  },
  {
    "path": "qwen-agent-docs/website/app/[lang]/[[...mdxPath]]/index.css",
    "chars": 710,
    "preview": "@import \"tailwindcss\";\n@import \"tw-animate-css\";\n@import \"nextra-theme-docs/style.css\";\n@source \"../**/*.{ts,tsx}\";\n\n.x\\"
  },
  {
    "path": "qwen-agent-docs/website/app/[lang]/[[...mdxPath]]/page.jsx",
    "chars": 1806,
    "preview": "import { generateStaticParamsFor, importPage } from \"nextra/pages\";\nimport { useMDXComponents as getMDXComponents } from"
  },
  {
    "path": "qwen-agent-docs/website/app/[lang]/layout.tsx",
    "chars": 2166,
    "preview": "/* eslint-env node */\n\nimport { Layout, Navbar } from \"nextra-theme-docs\";\nimport { getPageMap } from \"nextra/page-map\";"
  },
  {
    "path": "qwen-agent-docs/website/app/layout.tsx",
    "chars": 2841,
    "preview": "/* eslint-env node */\n\nimport type { Metadata } from \"next\";\nimport { Head } from \"nextra/components\";\nimport type { FC,"
  },
  {
    "path": "qwen-agent-docs/website/app/page.tsx",
    "chars": 122,
    "preview": "import { redirect } from 'next/navigation';\n\nexport default function HomePage() {\n  // 直接重定向到英文文档首页\n  redirect('/en/');\n"
  },
  {
    "path": "qwen-agent-docs/website/app/robots.ts",
    "chars": 686,
    "preview": "import type { MetadataRoute } from \"next\";\n\nfunction getSiteUrl(): string {\n  const explicit = process.env.NEXT_PUBLIC_S"
  },
  {
    "path": "qwen-agent-docs/website/app/sitemap.ts",
    "chars": 2620,
    "preview": "import type { MetadataRoute } from \"next\";\nimport fs from \"node:fs\";\nimport path from \"node:path\";\n\nconst LOCALES = [\"en"
  },
  {
    "path": "qwen-agent-docs/website/content/en/_meta.ts",
    "chars": 195,
    "preview": "export default {\n  index: {\n    type: 'page',\n    display: 'hidden',\n  },\n  guide: {\n    type: 'page',\n    title: 'Guide"
  },
  {
    "path": "qwen-agent-docs/website/content/en/benchmarks/_meta.ts",
    "chars": 75,
    "preview": "export default {\n  index: 'Overview',\n  deepplanning: 'DeepPlanning',\n};\n\n\n"
  },
  {
    "path": "qwen-agent-docs/website/content/en/benchmarks/deepplanning/index.mdx",
    "chars": 9282,
    "preview": "---\ntoc: false\nsidebar: false\ntypesetting: article\n---\n\n<style>{`\n  aside.nextra-toc {\n    display: none !important;\n  }"
  },
  {
    "path": "qwen-agent-docs/website/content/en/benchmarks/index.md",
    "chars": 543,
    "preview": "# Benchmark Overview\n\nWe provide a benchmark to evaluate the planning capabilities of state-of-the-art agentic models.\n\n"
  },
  {
    "path": "qwen-agent-docs/website/content/en/guide/_meta.ts",
    "chars": 101,
    "preview": "export default {\n  index: 'Overview',\n  get_started: 'Get Started',\n  core_moduls: 'Core Moduls',\n};\n"
  },
  {
    "path": "qwen-agent-docs/website/content/en/guide/core_moduls/_meta.ts",
    "chars": 214,
    "preview": "export default {\n//   index: 'Overview',\n  'schema': 'Schema',\n  'agent': 'Agent',\n  'llm': 'Model',\n  'tool': 'Tool',\n "
  },
  {
    "path": "qwen-agent-docs/website/content/en/guide/core_moduls/agent.md",
    "chars": 10995,
    "preview": "# Agent Introduction\n\nThis document introduces the usage and development process of the Agent class.\n\n## 1. Agent Usage\n"
  },
  {
    "path": "qwen-agent-docs/website/content/en/guide/core_moduls/context.md",
    "chars": 1921,
    "preview": "# Context Management\n\nThe context management logic of Qwen Agent aims to dynamically truncate input messages while maint"
  },
  {
    "path": "qwen-agent-docs/website/content/en/guide/core_moduls/llm.md",
    "chars": 4108,
    "preview": "# LLM Introduction\n\nThis document introduces the usage and development process of LLM classes.\n\n## 1. LLM Usage\n\nCurrent"
  },
  {
    "path": "qwen-agent-docs/website/content/en/guide/core_moduls/mcp.md",
    "chars": 5698,
    "preview": "# MCP (Model Context Protocol)\n\nMCP (Model Context Protocol) is a standardized protocol that enables large language mode"
  },
  {
    "path": "qwen-agent-docs/website/content/en/guide/core_moduls/rag.md",
    "chars": 2588,
    "preview": "# RAG (Retrieval-Augmented Generation)\n\nQwen-Agent provides built-in RAG (Retrieval-Augmented Generation) capabilities t"
  },
  {
    "path": "qwen-agent-docs/website/content/en/guide/core_moduls/schema.md",
    "chars": 6188,
    "preview": "# Qwen-Agent Schema Documentation\n\n\n## Overview\n\nThe `qwen-agent` schema provides a structured, type-safe messaging syst"
  },
  {
    "path": "qwen-agent-docs/website/content/en/guide/core_moduls/tool.md",
    "chars": 4158,
    "preview": "# Tool Introduction\n\nThis document introduces the usage and development process of the Tool class.\nPlease refer to the ["
  },
  {
    "path": "qwen-agent-docs/website/content/en/guide/get_started/_meta.ts",
    "chars": 165,
    "preview": "export default {\n//   index: 'Overview',\n  'install': 'Installation',\n  'quickstart': 'QuickStart',\n  'features': 'Featu"
  },
  {
    "path": "qwen-agent-docs/website/content/en/guide/get_started/configuration.md",
    "chars": 12210,
    "preview": "# Configuration\n\nThis document explains all configuration parameters of Agent.\n\n## LLM configuration\nThis part explains "
  },
  {
    "path": "qwen-agent-docs/website/content/en/guide/get_started/features.md",
    "chars": 2566,
    "preview": "# Qwen-Agent Features\n\nQwen-Agent is a powerful and flexible framework for building intelligent LLM-powered applications"
  },
  {
    "path": "qwen-agent-docs/website/content/en/guide/get_started/install.md",
    "chars": 664,
    "preview": "# Installation\n\n- Install the stable version from PyPI:\n```bash\npip install -U \"qwen-agent[gui,rag,code_interpreter,mcp]"
  },
  {
    "path": "qwen-agent-docs/website/content/en/guide/get_started/quickstart.md",
    "chars": 6638,
    "preview": "# QuickStart\n\nThis quickstart will guide you to implement an agent using a few lines of code in just a few minutes.\n\n## "
  },
  {
    "path": "qwen-agent-docs/website/content/en/guide/index.md",
    "chars": 1687,
    "preview": "# Qwen-Agent Overview\n\nQwen-Agent is a framework for developing LLM applications based on the instruction following, too"
  },
  {
    "path": "qwen-agent-docs/website/content/en/index.md",
    "chars": 724,
    "preview": "# Qwen Agent Documentation\n\nWelcome to the Qwen Agent documentation. Qwen Agent is an open-source AI agent framework tha"
  },
  {
    "path": "qwen-agent-docs/website/mdx-components.tsx",
    "chars": 1355,
    "preview": "import React from \"react\";\nimport { useMDXComponents as getDocsMDXComponents } from \"nextra-theme-docs\";\nimport { Pre, w"
  },
  {
    "path": "qwen-agent-docs/website/next-env.d.ts",
    "chars": 228,
    "preview": "/// <reference types=\"next\" />\n/// <reference types=\"next/image-types/global\" />\n\n// NOTE: This file should not be edite"
  },
  {
    "path": "qwen-agent-docs/website/next.config.mjs",
    "chars": 854,
    "preview": "import nextra from \"nextra\";\n\nconst withNextra = nextra({\n  latex: true,\n  search: {\n    codeblocks: false,\n  },\n  conte"
  },
  {
    "path": "qwen-agent-docs/website/package.json",
    "chars": 1893,
    "preview": "{\n  \"name\": \"qwen-agent-docs\",\n  \"version\": \"1.0.0\",\n  \"description\": \"Documentation for Qwen-Agent\",\n  \"scripts\": {\n   "
  },
  {
    "path": "qwen-agent-docs/website/postcss.config.js",
    "chars": 94,
    "preview": "module.exports = {\n  plugins: {\n    \"@tailwindcss/postcss\": {},\n    autoprefixer: {},\n  },\n};\n"
  },
  {
    "path": "qwen-agent-docs/website/public/.nojekyll",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "qwen-agent-docs/website/public/fonts/Monoton/OFL.txt",
    "chars": 4496,
    "preview": "Copyright (c) 2011 by vernon adams (vern@newtypography.co.uk),\r\nwith Reserved Font Names \"Monoton\"\r\n\r\nThis Font Software"
  },
  {
    "path": "qwen-agent-docs/website/public/fonts/Orbitron/OFL.txt",
    "chars": 4520,
    "preview": "Copyright 2018 The Orbitron Project Authors (https://github.com/theleagueof/orbitron), with Reserved Font Name: \"Orbitro"
  },
  {
    "path": "qwen-agent-docs/website/public/fonts/Orbitron/README.txt",
    "chars": 2194,
    "preview": "Orbitron Variable Font\n======================\n\nThis download contains Orbitron as both a variable font and static fonts."
  },
  {
    "path": "qwen-agent-docs/website/public/site.webmanifest",
    "chars": 405,
    "preview": "{\n  \"name\": \"Qwen Agent Docs\",\n  \"short_name\": \"Qwen Agent\",\n  \"description\": \"Documentation for Qwen Agent: an open-sou"
  },
  {
    "path": "qwen-agent-docs/website/src/components/font-loader.tsx",
    "chars": 2708,
    "preview": "\"use client\";\nimport { useEffect } from \"react\";\n\nexport const FontLoader = () => {\n  useEffect(() => {\n    // 根据环境设置字体路"
  },
  {
    "path": "qwen-agent-docs/website/src/components/leaderboard.tsx",
    "chars": 20952,
    "preview": "\"use client\";\n\nimport React, { useState } from \"react\";\n\ninterface ModelScore {\n  model: string;\n  icon: string;\n  isThi"
  },
  {
    "path": "qwen-agent-docs/website/src/components/locale-anchor.tsx",
    "chars": 3903,
    "preview": "\"use client\";\n\nimport cn from \"clsx\";\nimport Link from \"next/link\";\nimport { usePathname } from \"next/navigation\";\nimpor"
  },
  {
    "path": "qwen-agent-docs/website/tsconfig.json",
    "chars": 599,
    "preview": "{\n  \"compilerOptions\": {\n    \"lib\": [\"dom\", \"dom.iterable\", \"esnext\"],\n    \"allowJs\": true,\n    \"skipLibCheck\": true,\n  "
  },
  {
    "path": "qwen_agent/__init__.py",
    "chars": 754,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 ("
  },
  {
    "path": "qwen_agent/agent.py",
    "chars": 11230,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/__init__.py",
    "chars": 2008,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/article_agent.py",
    "chars": 1964,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/assistant.py",
    "chars": 6327,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/dialogue_retrieval_agent.py",
    "chars": 3862,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/dialogue_simulator.py",
    "chars": 2668,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/doc_qa/__init__.py",
    "chars": 751,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/doc_qa/basic_doc_qa.py",
    "chars": 2921,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/doc_qa/parallel_doc_qa.py",
    "chars": 10956,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/doc_qa/parallel_doc_qa_member.py",
    "chars": 5017,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/doc_qa/parallel_doc_qa_summary.py",
    "chars": 3200,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/fncall_agent.py",
    "chars": 5812,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 ("
  },
  {
    "path": "qwen_agent/agents/group_chat.py",
    "chars": 13770,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/group_chat_auto_router.py",
    "chars": 3896,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/group_chat_creator.py",
    "chars": 5931,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/human_simulator.py",
    "chars": 2940,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/keygen_strategies/__init__.py",
    "chars": 1020,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/keygen_strategies/gen_keyword.py",
    "chars": 3402,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/keygen_strategies/gen_keyword_with_knowledge.py",
    "chars": 3509,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/keygen_strategies/split_query.py",
    "chars": 4464,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/keygen_strategies/split_query_then_gen_keyword.py",
    "chars": 2812,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/keygen_strategies/split_query_then_gen_keyword_with_knowledge.py",
    "chars": 1481,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/memo_assistant.py",
    "chars": 4405,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 ("
  },
  {
    "path": "qwen_agent/agents/react_chat.py",
    "chars": 6981,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "qwen_agent/agents/router.py",
    "chars": 4744,
    "preview": "# Copyright 2023 The Qwen team, Alibaba Group. All rights reserved.\n# \n# Licensed under the Apache License, Version 2.0 "
  }
]

// ... and 104 more files (download for full content)

About this extraction

This page contains the full source code of the QwenLM/Qwen-Agent GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 304 files (4.7 MB), approximately 1.3M tokens, and a symbol index with 1125 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
