Repository: lsdefine/pc-agent-loop
Branch: main
Commit: a6c010d25ad9
Files: 89
Total size: 814.0 KB

Directory structure:
gitextract_dpkv5wrs/
├── .gitignore
├── CONTRIBUTING.md
├── GETTING_STARTED.md
├── LICENSE
├── README.md
├── TMWebDriver.py
├── agent_loop.py
├── agentmain.py
├── assets/
│   ├── SETUP_FEISHU.md
│   ├── agent_bbs.py
│   ├── code_run_header.py
│   ├── configure_mykey.py
│   ├── global_mem_insight_template.txt
│   ├── global_mem_insight_template_en.txt
│   ├── insight_fixed_structure.txt
│   ├── insight_fixed_structure_en.txt
│   ├── install-macos-app.sh
│   ├── install_python_windows.bat
│   ├── sys_prompt.txt
│   ├── sys_prompt_en.txt
│   ├── tmwd_cdp_bridge/
│   │   ├── background.js
│   │   ├── content.js
│   │   ├── disable_dialogs.js
│   │   ├── manifest.json
│   │   ├── popup.html
│   │   └── popup.js
│   ├── tool_usable_history.json
│   ├── tools_schema.json
│   └── tools_schema_cn.json
├── frontends/
│   ├── DESKTOP_PET_README.md
│   ├── btw_cmd.py
│   ├── chatapp_common.py
│   ├── continue_cmd.py
│   ├── dcapp.py
│   ├── desktop_pet.pyw
│   ├── desktop_pet_v2.pyw
│   ├── dingtalkapp.py
│   ├── fsapp.py
│   ├── genericagent_acp_bridge.py
│   ├── qqapp.py
│   ├── qtapp.py
│   ├── skins/
│   │   ├── boy/
│   │   │   └── skin.json
│   │   ├── dinosaur/
│   │   │   └── skin.json
│   │   ├── doux/
│   │   │   └── skin.json
│   │   ├── glube/
│   │   │   └── skin.json
│   │   ├── line/
│   │   │   ├── License.txt
│   │   │   └── skin.json
│   │   ├── mort/
│   │   │   └── skin.json
│   │   ├── tard/
│   │   │   └── skin.json
│   │   └── vita/
│   │       └── skin.json
│   ├── stapp.py
│   ├── stapp2.py
│   ├── tgapp.py
│   ├── tuiapp.py
│   ├── wechatapp.py
│   └── wecomapp.py
├── ga.py
├── hub.pyw
├── launch.pyw
├── llmcore.py
├── memory/
│   ├── adb_ui.py
│   ├── autonomous_operation_sop.md
│   ├── goal_mode_sop.md
│   ├── keychain.py
│   ├── ljqCtrl.py
│   ├── ljqCtrl_sop.md
│   ├── memory_cleanup_sop.md
│   ├── memory_management_sop.md
│   ├── ocr_utils.py
│   ├── plan_sop.md
│   ├── procmem_scanner.py
│   ├── procmem_scanner_sop.md
│   ├── scheduled_task_sop.md
│   ├── supervisor_sop.md
│   ├── tmwebdriver_sop.md
│   ├── ui_detect.py
│   ├── vision_api.template.py
│   ├── vision_sop.md
│   ├── vue3_component_sop.md
│   └── web_setup_sop.md
├── mykey_template.py
├── mykey_template_en.py
├── plugins/
│   └── langfuse_tracing.py
├── pyproject.toml
├── reflect/
│   ├── agent_team_worker.py
│   ├── autonomous.py
│   ├── goal_mode.py
│   └── scheduler.py
└── simphtml.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
temp/
tmp/
__pycache__/
*.py[cod]
*$py.class
.venv/
venv/
env/
build/
dist/
*.egg-info/
.streamlit/
.vscode/
.idea/
*.swp
*.swo
.DS_Store
Thumbs.db
*.log
.env
auth.json
model_responses.txt
# Sensitive files (API keys, credentials)
mykey.py
tasks/
*.zip
memory/*
!memory/memory_management_sop.md
# Allow tracking of specific SOPs
!memory/web_setup_sop.md
!memory/autonomous_operation_sop.md
!memory/autonomous_operation_sop/
!memory/autonomous_operation_sop/**
!memory/scheduled_task_sop.md
# L4 session archiver (only the script, not archives)
!memory/L4_raw_sessions/
memory/L4_raw_sessions/*
!memory/L4_raw_sessions/compress_session.py
# ljqCtrl related tools
!memory/ljqCtrl.py
!memory/ljqCtrl_sop.md
# procmem_scanner related tools
!memory/procmem_scanner.py
!memory/procmem_scanner_sop.md
# TMWebDriver SOP
!memory/tmwebdriver_sop.md
# Vue3 Component SOP
!memory/vue3_component_sop.md
# Subagent SOP
!memory/subagent_sop.md
# Supervisor SOP
!memory/supervisor_sop.md
# Plan SOP
!memory/plan_sop.md
# Goal Mode SOP
!memory/goal_mode_sop.md
# Skill Search SOP
!memory/skill_search/
!memory/skill_search/**
# ADB UI tool
!memory/adb_ui.py
# Keychain
!memory/keychain.py
# Vision / OCR / UI detection tools
!memory/ocr_utils.py
!memory/vision_sop.md
!memory/ui_detect.py
!memory/vision_api.template.py
# Memory management
!memory/memory_cleanup_sop.md
# Visual Studio
.vs/
restore_commit.txt
sche_tasks/
# CDP Bridge key config (auto-generated on first run)
assets/tmwd_cdp_bridge/config.js
assets/copilot_proxy.pyw
**log.*
# Reflect (ignore new files, whitelist existing)
reflect/*
!reflect/autonomous.py
!reflect/scheduler.py
!reflect/agent_team_worker.py
!reflect/goal_mode.py
# Universal: never track __pycache__ anywhere
**/__pycache__/
.claude/

================================================
FILE: CONTRIBUTING.md
================================================
# Contributing to GenericAgent

## Why This File Is Short

GenericAgent's core is ~3K lines. Every file in this repo will be read by AI agents, potentially thousands of times. Extra words cost real tokens and push useful context out of the window, increasing hallucinations.

This document practices what it preaches: **say only what matters.**

## Before You Contribute

1. **Read the codebase first.** It's small enough to read in one sitting. Understand the philosophy before proposing changes.
2. **Open an Issue first** for anything non-trivial. Discuss before coding.

## Code Standards

All PRs go through a strict automated code review skill. Key expectations:

- **Self-documenting code, minimal comments.** If code needs a paragraph to explain, rewrite it.
- **Compact and visually uniform.** Fewer lines, consistent line lengths, no fluff.
- **Small change radius.** Changing A shouldn't ripple through B, C, D.
- **More features → less code.** Good abstractions make the codebase shrink, not grow.
- **Let it crash by failure radius.** Critical errors fail loud; trivial ones pass silently. No blanket try-catch.

> ⚠️ This review is deliberately strict: most AI-generated code (e.g. Claude Code output) will not pass as-is. Read the full principles before submitting.

## Skill Contributions

GenericAgent evolves through skills.
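Not all skills belong in the core repo:

| Type | Where it goes | Example |
|---|---|---|
| **Fundamental / universal** | Core repo (`memory/`) | File search, clipboard, basic web ops |
| **Domain-specific / niche** | Skill Marketplace *(coming soon)* | Stock screening, food delivery, specific API integrations |

If your skill only makes sense for a specific workflow, it's a marketplace candidate, not a core PR.

## PR Checklist

- [ ] Issue linked or context explained in ≤3 sentences
- [ ] Code passes the [review principles] self-check:
  1. Can I safely modify this locally without reading the whole codebase?
  2. Is there a clear core abstraction, so that new features add implementations rather than modify old logic?
  3. Are change points converging at boundaries, not scattered everywhere?
  4. On failure, can I quickly locate the responsible module?
- [ ] Net line count: ideally negative or zero for refactors
- [ ] No unnecessary dependencies added

================================================
FILE: GETTING_STARTED.md
================================================
# 🚀 Getting Started Guide

> No programming experience? No problem, just follow along. Works on both Mac and Windows.
>
> If you already have a Python environment, skip to [Step 2](#2-configure-api-key).

---

## 1. Install Python

### Mac

Open Terminal (search for "Terminal" in Launchpad), paste this command, and press Enter:

```bash
brew install python
```

If you see `brew: command not found`, Homebrew is not installed yet. Paste this first:

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

Once it finishes, run `brew install python` again.

### Windows

1. Open [python.org/downloads](https://www.python.org/downloads/) and click the big yellow button to download
2. Run the installer and **make sure to check "Add Python to PATH" at the bottom**
3. Click "Install Now"

### Verify

In Terminal / Command Prompt, type:

```bash
python3 --version
```

If you see `Python 3.x.x`, you are set. On Windows you can also try `python --version`.

> ⚠️ **Version note**: **Python 3.11 or 3.12** is recommended. Do not use 3.14 (it is incompatible with dependencies such as pywebview).

---

## 2. Configure API Key

### Download the project

1. Open the [GitHub repository page](https://github.com/lsdefine/GenericAgent)
2. Click the green **Code** button → **Download ZIP**
3. Unzip it wherever you like

### Create the config file

Go into the project folder, copy `mykey_template.py`, and rename the copy to `mykey.py`.

Open `mykey.py` in any text editor and fill in your API details. **Fill in just one option**; unused configs can be deleted or left alone.

> 💡 You can also run the interactive wizard `python assets/configure_mykey.py`: choose a vendor and enter your key when prompted, and `mykey.py` is generated automatically.

### Config examples

**Most common usage:**

```python
# Variable name contains 'oai' → OpenAI-compatible format (/chat/completions)
oai_config = {
    'apikey': 'sk-your-key',
    'apibase': 'http://your-api-host:port',
    'model': 'model-name',
}
```

```python
# Variable name contains 'claude' (without 'native') → Claude-compatible format (/messages)
claude_config = {
    'apikey': 'sk-your-key',
    'apibase': 'http://your-api-host:port',
    'model': 'claude-sonnet-4-20250514',
}
```

```python
# MiniMax uses the OpenAI-compatible format; a variable name containing 'oai' is enough
# Temperature is auto-corrected into (0, 1]; supports the full M2.7 / M2.5 series, 204K context
oai_minimax_config = {
    'apikey': 'eyJh...',
    'apibase': 'https://api.minimax.io/v1',
    'model': 'MiniMax-M2.7',
}
```

**Standard tool-calling format (suited to weaker models):**

```python
# Variable name contains both 'native' and 'claude' → Claude standard tool-calling format
native_claude_config = {
    'apikey': 'sk-ant-your-key',
    'apibase': 'https://api.anthropic.com',
    'model': 'claude-sonnet-4-20250514',
}
```

> 💡 `native_oai_config` (OpenAI standard tool calling), `sider_cookie` (Sider), and others are also supported; see the comments in `mykey_template.py`.

### Key rules

**The variable name decides the API format** (the model name does not):

| Variable name contains | Session triggered | Use case |
|-----------|---------------|---------|
| `oai` | OpenAI-compatible | Most API services, official OpenAI |
| `claude` (without `native`) | Claude-compatible | Claude API services |
| `native` + `claude` | Claude standard tool calling | Recommended for weaker models; tool calls are better formed |
| `native` + `oai` | OpenAI standard tool calling | Recommended for weaker models; tool calls are better formed |

> Example: you use a Claude model, but the API service exposes an OpenAI-compatible interface → name the variable `oai_xxx`.
> Example: you use a MiniMax model → name the variable `oai_minimax_config`; MiniMax goes through the OpenAI-compatible interface.

**`apibase` rules** (the endpoint path is appended automatically):

| What you enter | What the system does |
|-----------|---------|
| `http://host:2001` | Appends `/v1/chat/completions` |
| `http://host:2001/v1` | Appends `/chat/completions` |
| `http://host:2001/v1/chat/completions` | Used as-is, nothing appended |

---

## 3. First Launch

In the terminal, go to the project folder and run:

```bash
cd your-unzip-path
python3 agentmain.py
```

This is **command-line mode**, and it is already usable. You will see an input prompt; type a task and send it.

Try your first task:

```
Create a hello.txt on my desktop containing Hello World
```

> 💡 On Windows, if `python3` is not recognized, use `python agentmain.py` instead.

---

## 4. Let the Agent Install Its Own Dependencies

Once the Agent is running, one sentence is enough for it to sort out all dependencies:

```
Please review your code and install all the Python dependencies you may need
```

The Agent reads the code, works out which packages are needed, and installs them all.

> ⚠️ If a network problem keeps the Agent from calling the API, you may need to install one package manually first:
> ```bash
> pip install requests
> ```

### Upgrade to the graphical interface

Once dependencies are installed, you can use GUI mode:

```bash
python3 launch.pyw
```

A floating desktop window appears after launch; type task instructions directly into it.

### Optional: things the Agent can do for you

```
Please set up a git connection for me so I can update the code later
```

The Agent configures it automatically. If Git is not installed on your computer, it also downloads a portable version for you.

```
Please create a desktop shortcut to launch.pyw
```

From then on you can double-click the desktop icon to launch, with no terminal needed.

---

## 5. Unlocking Capabilities

Once the environment is running, you can unlock more capabilities step by step. Each one takes just **one sentence to the Agent**:

### Basic capabilities

| Capability | Say to the Agent | Notes |
|------|-----------|------|
| **PowerShell script execution** | `Unlock PowerShell ps1 execution permission for the current user` | Windows blocks .ps1 scripts by default |
| **Global file search** | `Install the Everything command-line tool and add it to PATH` | Millisecond-level full-disk file search |

### Browser automation

| Capability | Say to the Agent | Notes |
|------|-----------|------|
| **Web tools unlock** | `Run the web setup sop to unlock the web tools` | Injects a browser extension so the Agent can drive web pages directly |

After unlocking, the Agent can operate in your real browser **with your login sessions preserved**:

```
Open Taobao, search for iPhone 16, and sort by price
Go to Bilibili and show the videos in my recent watch history
```

### Advanced capabilities

| Capability | Say to the Agent | Notes |
|------|-----------|------|
| **OCR** | `Set up your OCR capability with rapidocr and save it to memory` | Lets the Agent "see" on-screen text |
| **Screen vision** | `Modeled on your llmcore, write a capability that calls vision and save it to memory` | Lets the Agent "see" screen content |
| **Mobile control** | `Set up the ADB environment and get ready to connect an Android device` | Controls an Android phone over USB/WiFi |

### Chat platform integration (optional)

Once connected, you can send instructions from your phone to the Agent on your computer, anytime and anywhere.

Say to the Agent: `Look at your code and help me set up bot integration for platform XX`

Supported platforms: **personal WeChat bot** / QQ / Feishu / WeCom / DingTalk / Telegram

> The Agent reads the code automatically and guides you through the setup.

### Advanced modes

All of the modes below are **self-documenting**: no manual needed, just ask the Agent.

| Mode | Say to the Agent |
|------|------------|
| **Reflect** | `Look at your code and tell me how to enable your reflect mode` |
| **Scheduled tasks** | `Look at your code and tell me how to enable your scheduled task mode` |
| **Plan** | `Look at your code and tell me how to enable your plan mode` |
| **SubAgent** | `Look at your code and tell me how to enable your subagent mode` |
| **Autonomous exploration** | `Look at your code and tell me how to enable your autonomous exploration mode` |

> 💡 This is GenericAgent's core design idea: **code as documentation**. The Agent can read its own source code, so you can ask it about any feature directly.

---

## 💡 The Longer You Use It, the Stronger It Gets

GenericAgent does not preload skills; it **evolves through use**. Each time it completes a new task, it automatically crystallizes the execution path into a Skill and calls it directly the next time a similar task comes up.

You do not need to manage these Skills; the Agent handles them automatically. The longer you use it, the more skills accumulate, eventually forming a skill tree that is entirely your own.

> 💡 If you feel the Agent has not remembered something important, tell it directly: `Save this to your memory`, and it will actively remember it.

**Skills from other Claws can be reused directly:**

- Have the Agent search: `Find me a skill for doing XXX` → once done → `Add it to your memory`
- Point to a source directly: `Go to folder/URL XXX and do XXX following this skill`

**Stay up to date:**

Say to the Agent: `Update your code with git, then check the commits for new features`

> The Agent automatically pulls the latest code and interprets the commit log to tell you which capabilities were added.

> For more details, see [README.md](README.md) or the [detailed illustrated tutorial](https://my.feishu.cn/wiki/CGrDw0T76iNFuskmwxdcWrpinPb).

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2025 lsdefine

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

================================================
FILE: README.md
================================================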

English | 中文 | 📄 Technical Report: arXiv · PDF · Experiments & Reproduction Repo | 📘 Tutorial | Sophub

> 📌 **Official channel**: This GitHub repository is the sole official source for GenericAgent. We have no affiliation with any third-party website using the GenericAgent name.

---

## 🌟 Overview

**GenericAgent** is a minimal, self-evolving autonomous agent framework. Its core is just **~3K lines of code**. Through **9 atomic tools + a ~100-line Agent Loop**, it grants any LLM system-level control over a local computer, covering browser, terminal, filesystem, keyboard/mouse input, screen vision, and mobile devices (ADB).

Its design philosophy: **don't preload skills; evolve them.**

Every time GenericAgent solves a new task, it automatically crystallizes the execution path into a skill for direct reuse later. The longer you use it, the more skills accumulate, forming a skill tree that belongs entirely to you, grown from 3K lines of seed code.

> **🤖 Self-Bootstrap Proof**: Everything in this repository, from installing Git and running `git init` to every commit message, was completed autonomously by GenericAgent. The author never opened a terminal once.

## 📋 Core Features

- **Self-Evolving**: Automatically crystallizes each task into a skill. Capabilities grow with every use, forming your personal skill tree.
- **Minimal Architecture**: ~3K lines of core code. Agent Loop is ~100 lines. No complex dependencies, zero deployment overhead.
- **Strong Execution**: Injects into a real browser (preserving login sessions). 9 atomic tools take direct control of the system.
- **High Compatibility**: Supports Claude / Gemini / Kimi / MiniMax and other major models. Cross-platform.
- **Token Efficient**: <30K context window, a fraction of the 200K–1M other agents consume. Layered memory ensures the right knowledge is always in scope. Less noise, fewer hallucinations, higher success rate, at a fraction of the cost.

## 🧬 Self-Evolution Mechanism

This is what fundamentally distinguishes GenericAgent from every other agent framework.
```
[New Task] --> [Autonomous Exploration] (install deps, write scripts, debug & verify) -->
[Crystallize Execution Path into skill] --> [Write to Memory Layer] --> [Direct Recall on Next Similar Task]
```

| What you say | What the agent does the first time | Every time after |
|---|---|---|
| *"Read my WeChat messages"* | Install deps → reverse DB → write read script → save skill | **one-line invoke** |
| *"Monitor stocks and alert me"* | Install mootdx → build selection flow → configure cron → save skill | **one-line start** |
| *"Send this file via Gmail"* | Configure OAuth → write send script → save skill | **ready to use** |

After a few weeks, your agent instance will have a skill tree no one else in the world has, all grown from 3K lines of seed code.

#### 🎯 Demo Showcase

| 🧋 Food Delivery Order | 📈 Quantitative Stock Screening |
|:---:|:---:|
| *"Order me a milk tea"*: navigates the delivery app, selects items, and completes checkout automatically. | *"Find GEM stocks with EXPMA golden cross, turnover > 5%"*: screens stocks with quantitative conditions. |

| 🌐 Autonomous Web Exploration | 💰 Expense Tracking | 💬 Batch Messaging |
|:---:|:---:|:---:|
| Autonomously browses and periodically summarizes web content. | *"Find expenses over ¥2K in the last 3 months"*: drives Alipay via ADB. | Sends bulk WeChat messages, fully driving the WeChat client. |

## 📅 Latest News

- **2026-04-21:** 📄 [Technical Report released on arXiv](https://arxiv.org/abs/2604.17091): *GenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization*
- **2026-04-11:** Introduced **L4 session archive memory** and scheduler cron integration
- **2026-03-23:** Support personal WeChat as a bot frontend
- **2026-03-10:** [Released million-scale Skill Library](https://mp.weixin.qq.com/s/q2gQ7YvWoiAcwxzaiwpuiQ?scene=1&click_id=7)
- **2026-03-08:** [Released "Dintal Claw", a GenericAgent-powered government affairs bot](https://mp.weixin.qq.com/s/eiEhwo-j6S-WpLxgBnNxBg)
- **2026-03-01:** [GenericAgent featured by Jiqizhixin (机器之心)](https://mp.weixin.qq.com/s/uVWpTTF5I1yzAENV_qm7yg)
- **2026-01-16:** GenericAgent V1.0 public release

---

## 🚀 Quick Start

#### Method 1: Standard Installation

```bash
# 1. Clone the repo
git clone https://github.com/lsdefine/GenericAgent.git
cd GenericAgent

# 2. Install dependencies
pip install requests streamlit pywebview   # Desktop GUI (launch.pyw)
pip install requests textual               # Terminal UI (tuiapp.py)

# 3. Configure API Key
cp mykey_template.py mykey.py
# Edit mykey.py and fill in your LLM API Key

# 4. Launch
python launch.pyw
```

#### Method 2: uv (for experienced Python users)

If you prefer a modern Python workflow, GenericAgent also provides a minimal `pyproject.toml`:

```bash
git clone https://github.com/lsdefine/GenericAgent.git
cd GenericAgent
uv venv
uv pip install -e ".[ui]"   # Core + GUI dependencies
cp mykey_template.py mykey.py
python launch.pyw
```

> GenericAgent is meant to grow its environment through the Agent itself, not by pre-installing every possible package.

Full guide: [GETTING_STARTED.md](GETTING_STARTED.md)

---

## 🖥️ Desktop Frontends

### Terminal UI

A lightweight, keyboard-driven interface built on [Textual](https://github.com/Textualize/textual). Supports multiple concurrent sessions and real-time streaming, and runs anywhere a terminal does, with no browser needed.

```bash
python frontends/tuiapp.py
```

### Other Desktop Frontends

```bash
python frontends/qtapp.py            # Qt-based desktop app
streamlit run frontends/stapp2.py    # Alternative Streamlit UI
```

### Codeg

[Codeg](https://github.com/yiqi-017/codeg) (`feat/genericagent-integration` branch) is a desktop/web UI that connects GenericAgent alongside other agents (Claude Code, Gemini, Codex, etc.) in a unified interface with a polished, modern UI.

> This integration is usable now. Some features are still being refined, and feedback is welcome.

Place your GenericAgent directory alongside the codeg project. Codeg will auto-detect `frontends/genericagent_acp_bridge.py` and launch GenericAgent as a local ACP agent.

---

## 💬 Bot Interface (IM)

### Telegram Bot

```python
# mykey.py
tg_bot_token = 'YOUR_BOT_TOKEN'
tg_allowed_users = [YOUR_USER_ID]
```

```bash
python frontends/tgapp.py
```

### Common Chat Commands

The default Streamlit desktop UI started by `python launch.pyw`, plus the QQ / Telegram / Feishu / WeCom / DingTalk frontends, support these chat commands:

- `/new` - start a fresh conversation and clear the current context
- `/continue` - list recoverable conversation snapshots
- `/continue N` - restore the `N`th recoverable conversation

## 📊 Comparison with Similar Tools

| Feature | GenericAgent | OpenClaw | Claude Code |
|------|:---:|:---:|:---:|
| **Codebase** | ~3K lines | ~530,000 lines | Open-sourced (large) |
| **Deployment** | `pip install` + API Key | Multi-service orchestration | CLI + subscription |
| **Browser Control** | Real browser (session preserved) | Sandbox / headless browser | Via MCP plugin |
| **OS Control** | Mouse/kbd, vision, ADB | Multi-agent delegation | File + terminal |
| **Self-Evolution** | Autonomous skill growth | Plugin ecosystem | Stateless between sessions |
| **Out of the Box** | A few core files + starter skills | Hundreds of modules | Rich CLI toolset |

## 🧠 How It Works

GenericAgent accomplishes complex tasks through **Layered Memory × Minimal Toolset × Autonomous Execution Loop**, continuously accumulating experience during execution.

1️⃣ **Layered Memory System**

> _Memory crystallizes throughout task execution, letting the agent build stable, efficient working patterns over time._

- **L0 - Meta Rules**: Core behavioral rules and system constraints of the agent
- **L1 - Insight Index**: Minimal memory index for fast routing and recall
- **L2 - Global Facts**: Stable knowledge accumulated over long-term operation
- **L3 - Task Skills / SOPs**: Reusable workflows for completing specific task types
- **L4 - Session Archive**: Archived task records distilled from finished sessions for long-horizon recall

2️⃣ **Autonomous Execution Loop**

> _Perceive environment state → Task reasoning → Execute tools → Write experience to memory → Loop_

The entire core loop is just **~100 lines of code** (`agent_loop.py`).

3️⃣ **Minimal Toolset**

> _GenericAgent provides only **9 atomic tools**, forming the foundational capabilities for interacting with the outside world._

| Tool | Function |
|------|------|
| `code_run` | Execute arbitrary code |
| `file_read` | Read files |
| `file_write` | Write files |
| `file_patch` | Patch / modify files |
| `web_scan` | Perceive web content |
| `web_execute_js` | Control browser behavior |
| `ask_user` | Human-in-the-loop confirmation |

> Additionally, 2 **memory management tools** (`update_working_checkpoint`, `start_long_term_update`) allow the agent to persist context and accumulate experience across sessions.

4️⃣ **Capability Extension Mechanism**

> _Capable of dynamically creating new tools._

Via `code_run`, GenericAgent can dynamically install Python packages, write new scripts, call external APIs, or control hardware at runtime, crystallizing temporary abilities into permanent tools.
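
The perceive → reason → execute → record cycle and the tool table above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in `agent_loop.py`: the `Step` shape, the `run_loop` function, and the scripted `make_fake_llm` stand-in for a model are all hypothetical; only the tool names `file_write` and `file_read` come from the table.

```python
import os
import tempfile
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One model decision (hypothetical shape): call a tool, or finish with an answer."""
    tool: Optional[str] = None
    args: dict = field(default_factory=dict)
    answer: Optional[str] = None


def file_write(path: str, content: str) -> str:
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return f"wrote {len(content)} chars"


def file_read(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()


# Tool registry: the loop only knows tool names, never their implementations.
TOOLS: Dict[str, Callable[..., str]] = {"file_write": file_write, "file_read": file_read}


def run_loop(llm: Callable[[List[dict]], Step], task: str, max_turns: int = 8) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        step = llm(history)                      # reason over the current context
        if step.answer is not None:              # the model decided the task is done
            return step.answer
        try:
            result = TOOLS[step.tool](**step.args)   # execute the chosen tool
        except Exception as e:                   # feed failures back instead of crashing
            result = f"error: {e}"
        history.append({"role": "tool", "name": step.tool, "content": result})
    return "stopped: turn limit reached"


def make_fake_llm(path: str) -> Callable[[List[dict]], Step]:
    """Scripted stand-in for a real model: write the file, read it back, then answer."""
    def fake_llm(history: List[dict]) -> Step:
        tool_msgs = [m for m in history if m["role"] == "tool"]
        if len(tool_msgs) == 0:
            return Step(tool="file_write", args={"path": path, "content": "Hello World"})
        if len(tool_msgs) == 1:
            return Step(tool="file_read", args={"path": path})
        return Step(answer=f"file contains: {tool_msgs[-1]['content']}")
    return fake_llm


demo_path = os.path.join(tempfile.mkdtemp(), "hello.txt")
final = run_loop(make_fake_llm(demo_path), "create hello.txt with Hello World")
print(final)
```

In this sketch a new capability is just another entry in `TOOLS`, which is one way to read the claim that a small, fixed loop can serve a growing set of skills.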

*Figure: GenericAgent workflow diagram.*

## ⭐ Support

If this project helped you, please consider leaving a **Star!** 🙏

You're also welcome to join our **GenericAgent Community Group** for discussion, feedback, and co-building 👏
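
## 🚩 Friendly Links

Thanks for the support from the LinuxDo community!

[![LinuxDo](https://img.shields.io/badge/社区-LinuxDo-blue?style=for-the-badge)](https://linux.do/)

## 📄 License

MIT License, see [LICENSE](LICENSE)

*Disclaimer: This project does not build or operate any commercial website. Apart from DintalClaw, no institution, organization, or individual is currently officially authorized to conduct commercial activities under the GenericAgent name.*

---

## 🌟 Overview

**GenericAgent** is a minimal, self-evolving autonomous agent framework. Its core is only **~3K lines of code**. Through **9 atomic tools + a ~100-line Agent Loop**, it gives any LLM system-level control over a local computer, covering browser, terminal, filesystem, keyboard/mouse input, screen vision, and mobile devices.

Its design philosophy is: **don't preload skills; gain capabilities through evolution.**

Each time it solves a new task, GenericAgent automatically crystallizes the execution path into a Skill for direct reuse later. The longer you use it, the more skills accumulate, forming a skill tree that is entirely your own, grown from 3K lines of seed code.

> **🤖 Self-Bootstrap Proof**: Everything in this repository, from installing Git and `git init` to every commit message, was completed autonomously by GenericAgent. The author never opened a terminal once.

## 📋 Core Features

- **Self-Evolving**: Every task is automatically distilled into a Skill; capabilities keep growing with use, forming a personal skill tree
- **Minimal Architecture**: ~3K lines of core code, an Agent Loop of about a hundred lines, no complex dependencies, zero deployment burden
- **Strong Execution**: Injects into a real browser (login sessions preserved); 9 atomic tools take direct control of the system
- **High Compatibility**: Supports Claude / Gemini / Kimi / MiniMax and other mainstream models; runs cross-platform
- **Extremely Token Efficient**: A context window under 30K, a fraction of what other agents use (200K–1M). Layered memory keeps key information in scope: less noise, fewer hallucinations, a higher success rate, and costs an order of magnitude lower.

## 🧬 Self-Evolution Mechanism

This is what fundamentally sets GenericAgent apart from other agent frameworks.

```
[New Task] --> [Autonomous Exploration] (install deps, write scripts, debug & verify) -->
[Crystallize Execution Path into Skill] --> [Write to Memory Layer] --> [Direct Recall on Next Similar Task]
```

| What you say | What the Agent does the first time | Every time after |
|---|---|---|
| *"Monitor stocks and alert me"* | Install mootdx → build selection flow → configure scheduled task → save Skill | **one-line start** |
| *"Send this file via Gmail"* | Configure OAuth → write send script → save Skill | **ready to use** |

After a few weeks of use, your Agent instance will have a skill tree that nobody else has, all grown from 3K lines of seed code.

#### 🎯 Demo Showcase

| 🧋 Food Delivery Order | 📈 Quantitative Stock Screening |
|:---:|:---:|
| *"Order me a milk tea"*: navigates the delivery app, selects items, and completes checkout automatically | *"Find GEM stocks with EXPMA golden cross, turnover > 5%"*: screens stocks with quantitative conditions |

| 🌐 Autonomous Web Exploration | 💰 Expense Tracking | 💬 Batch Messaging |
|:---:|:---:|:---:|
| Autonomously browses and periodically summarizes web content | *"Find expenses over ¥2K in the last 3 months"*: drives Alipay via ADB | Sends bulk WeChat messages, fully driving the WeChat client |

## 📅 Latest News

- **2026-04-21:** 📄 [Technical Report released on arXiv](https://arxiv.org/abs/2604.17091): *GenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization*
- **2026-04-11:** Introduced **L4 session archive memory** and scheduler cron integration
- **2026-03-23:** Support for personal WeChat as a bot frontend
- **2026-03-10:** [Released million-scale Skill Library](https://mp.weixin.qq.com/s/q2gQ7YvWoiAcwxzaiwpuiQ?scene=1&click_id=7)
- **2026-03-08:** [Released Dintal Claw, a government affairs "lobster" bot built on GenericAgent](https://mp.weixin.qq.com/s/eiEhwo-j6S-WpLxgBnNxBg)
- **2026-03-01:** [GenericAgent featured by Jiqizhixin (机器之心)](https://mp.weixin.qq.com/s/uVWpTTF5I1yzAENV_qm7yg)
- **2026-01-16:** GenericAgent V1.0 public release

---

## 🚀 Quick Start

#### Method 1: Standard Installation

```bash
# 1. Clone the repo
git clone https://github.com/lsdefine/GenericAgent.git
cd GenericAgent

# 2. Install dependencies
pip install requests streamlit pywebview   # Desktop GUI (launch.pyw)
pip install requests textual               # Terminal UI (tuiapp.py)

# 3. Configure API Key
cp mykey_template.py mykey.py
# Edit mykey.py and fill in your LLM API Key
# Or use the interactive wizard: python assets/configure_mykey.py

# 4. Launch
python launch.pyw
```

#### Method 2: Quick install with uv (for users familiar with Python)

If you are used to a modern Python workflow, GenericAgent also provides a minimal `pyproject.toml`:

```bash
git clone https://github.com/lsdefine/GenericAgent.git
cd GenericAgent
uv pip install -e ".[ui]"   # Core + GUI dependencies
cp mykey_template.py mykey.py
python launch.pyw
```

> GenericAgent prefers to let the Agent bootstrap its environment during use, rather than installing the full set of dependencies by hand up front.

For the full onboarding flow, see [GETTING_STARTED.md](GETTING_STARTED.md).

📖 Beginner's guide (illustrated): [Feishu doc](https://my.feishu.cn/wiki/CGrDw0T76iNFuskmwxdcWrpinPb)

📘 Full introductory tutorial (by Datawhale): [Hello GenericAgent](https://datawhalechina.github.io/hello-generic-agent/) · [GitHub](https://github.com/datawhalechina/hello-generic-agent)

---

## 🖥️ Desktop Frontends

### Terminal UI

A lightweight, keyboard-driven interface built on [Textual](https://github.com/Textualize/textual). It supports multiple concurrent sessions and real-time streaming output, and runs anywhere a terminal does, with no browser needed.

```bash
python frontends/tuiapp.py
```

### Other Desktop Frontends

```bash
python frontends/qtapp.py            # Qt-based desktop app
streamlit run frontends/stapp2.py    # Alternative Streamlit-style UI
```

### Codeg Frontend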
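
[Codeg](https://github.com/yiqi-017/codeg) (`feat/genericagent-integration` branch) is a desktop/web UI that lets you use GenericAgent side by side with other agents (Claude Code, Gemini, Codex, etc.) in a unified interface with a more polished UI.

> This integration is already usable. Some features are still being refined, and feedback is welcome.

Place the GenericAgent directory alongside the codeg project. Codeg will auto-detect `frontends/genericagent_acp_bridge.py` and launch GenericAgent as a local ACP agent.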
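
---

## 💬 Bot Interface (IM)

### WeChat Bot (personal WeChat)

No extra configuration is needed; just scan the QR code to log in:

```bash
pip install pycryptodome qrcode requests
python frontends/wechatapp.py
```

> On first launch a QR code pops up; scan it with WeChat to complete the binding. After that, you interact with the Agent through WeChat messages.

### QQ Bot

Uses a `qq-botpy` WebSocket long connection, so **no public webhook is required**:

```bash
pip install qq-botpy
```

Add the following to `mykey.py`:

```python
qq_app_id = "YOUR_APP_ID"
qq_app_secret = "YOUR_APP_SECRET"
qq_allowed_users = ["YOUR_USER_OPENID"]  # or ['*'] for public access
```

```bash
python frontends/qqapp.py
```

> Create a bot on the [QQ Open Platform](https://q.qq.com) to obtain the AppID / AppSecret. After the first message, the user's openid is recorded in `temp/qqapp.log`.

### Feishu (Lark)

```bash
pip install lark-oapi
python frontends/fsapp.py
```

```python
fs_app_id = "cli_xxx"
fs_app_secret = "xxx"
fs_allowed_users = ["ou_xxx"]  # or ['*']
```

**Inbound support**: text, rich-text post, images, files, audio, media, interactive cards / share cards

**Outbound support**: streaming progress cards, image replies, file / media replies

**Vision models**: on the first turn, images are sent as true multimodal input to backends compatible with OpenAI Vision

See [assets/SETUP_FEISHU.md](assets/SETUP_FEISHU.md) for detailed configuration.

### WeCom (Enterprise WeChat)

```bash
pip install wecom_aibot_sdk
python frontends/wecomapp.py
```

```python
wecom_bot_id = "your_bot_id"
wecom_secret = "your_bot_secret"
wecom_allowed_users = ["your_user_id"]
wecom_welcome_message = "Hello, I'm online."
```

### DingTalk

```bash
pip install dingtalk-stream
python frontends/dingtalkapp.py
```

```python
dingtalk_client_id = "your_app_key"
dingtalk_client_secret = "your_app_secret"
dingtalk_allowed_users = ["your_staff_id"]  # or ['*']
```

### Common Chat Commands

The default Streamlit desktop UI started by `python launch.pyw`, as well as the QQ / Telegram / Feishu / WeCom / DingTalk frontends, all support the following commands:

- `/new` - start a new conversation and clear the current context
- `/continue` - list recoverable conversation snapshots
- `/continue N` - restore the `N`th recoverable conversation

## 📊 Comparison with Similar Products

| Feature | GenericAgent | OpenClaw | Claude Code |
|------|:---:|:---:|:---:|
| **Codebase** | ~3K lines | ~530,000 lines | Open-sourced (large) |
| **Deployment** | `pip install` + API Key | Multi-service orchestration | CLI + subscription |
| **Browser Control** | Injects into a real browser (login sessions preserved) | Sandbox / headless browser | Via MCP plugin |
| **OS Control** | Keyboard/mouse, vision, ADB | Multi-agent delegation | File + terminal |
| **Self-Evolution** | Autonomously grows Skills and tools | Plugin ecosystem | Stateless between sessions |
| **Out of the Box** | A few core files + a handful of starter Skills | Hundreds of modules | Rich CLI toolset |

## 🧠 How It Works

GenericAgent accomplishes complex tasks through **Layered Memory × Minimal Toolset × Autonomous Execution Loop**, and keeps accumulating experience as it executes.

1️⃣ **Layered Memory System**

> Memory keeps settling during task execution, so the Agent gradually develops a stable and efficient way of working

- **L0 - Meta Rules**: The Agent's basic behavioral rules and system constraints
- **L1 - Insight Index**: A minimal index layer for fast routing and recall
- **L2 - Global Facts**: Stable knowledge accumulated over long-term operation
- **L3 - Task Skills / SOPs**: Reusable workflows for completing specific task types
- **L4 - Session Archive**: Archived records distilled from completed tasks, used for long-horizon recall

2️⃣ **Autonomous Execution Loop**

> Perceive environment state → Task reasoning → Call tools to execute → Write experience to memory → Loop

The entire core loop is only **about a hundred lines of code** (`agent_loop.py`).

3️⃣ **Minimal Toolset**

> GenericAgent provides only **9 atomic tools**, which form the basic capabilities for interacting with the outside world

| Tool | Function |
|------|------|
| `code_run` | Execute arbitrary code |
| `file_read` | Read files |
| `file_write` | Write files |
| `file_patch` | Modify files |
| `web_scan` | Perceive web content |
| `web_execute_js` | Control browser behavior |
| `ask_user` | Human-in-the-loop confirmation |

> In addition, 2 **memory management tools** (`update_working_checkpoint`, `start_long_term_update`) let the Agent accumulate experience across sessions and maintain persistent context.

4️⃣ **Capability Extension Mechanism**

> Capable of dynamically creating new tools
>
> Via `code_run`, GenericAgent can dynamically install Python packages, write new scripts, call external APIs, or control hardware at runtime, crystallizing temporary abilities into permanent tools.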
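
The layered memory (L0 to L4) listed above might be modeled roughly as follows. This is a sketch under stated assumptions rather than the project's storage format: `MemoryStore`, `crystallize`, `build_context`, the keyword-based index, and the sample entries are all hypothetical and chosen only to show how a small L1 index can route a task to the few L3 skills it needs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MemoryStore:
    meta_rules: List[str] = field(default_factory=list)          # L0: always in context
    insight_index: Dict[str, str] = field(default_factory=dict)  # L1: keyword -> skill name
    global_facts: List[str] = field(default_factory=list)        # L2: stable knowledge
    skills: Dict[str, str] = field(default_factory=dict)         # L3: skill name -> SOP text
    archive: List[str] = field(default_factory=list)             # L4: finished-session records

    def crystallize(self, name: str, keywords: List[str], sop: str) -> None:
        """Save a finished task as a skill and index it by keyword (assumed scheme)."""
        self.skills[name] = sop
        for kw in keywords:
            self.insight_index[kw.lower()] = name

    def build_context(self, task: str) -> List[str]:
        """Load L0 and L2 in full, but only the L3 skills the L1 index routes to."""
        hits: List[str] = []
        for word in task.lower().split():
            name = self.insight_index.get(word)
            if name and name not in hits:
                hits.append(name)
        return self.meta_rules + self.global_facts + [self.skills[n] for n in hits]


mem = MemoryStore(meta_rules=["Ask before destructive actions."])
mem.crystallize("gmail_send", ["gmail", "email"], "SOP: run send_mail.py with the file path")
ctx = mem.build_context("Send report via Gmail")
print(ctx)
```

The design point the sketch tries to capture is that unrelated skills never enter the prompt, which is one plausible way to keep the context small as the skill tree grows.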

*Figure: GenericAgent workflow diagram.*
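
## ⭐ Support

If this project has helped you, please consider giving it a **Star!** 🙏

You are also welcome to join our **GenericAgent user community group** to discuss, give feedback, and build together 👏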

## 🚩 Friendly Links

Thanks to the **LinuxDo** community for its support!

[![LinuxDo](https://img.shields.io/badge/社区-LinuxDo-blue?style=for-the-badge)](https://linux.do/)

## 📄 License

MIT License, see [LICENSE](LICENSE)

*Disclaimer: This project has not built any commercial website. Apart from DintalClaw, no institution, organization, or individual is currently officially authorized to conduct commercial activities under the GenericAgent name.*

## 📈 Star History

*Figure: Star History chart.*

================================================
FILE: TMWebDriver.py
================================================
import json, threading, time, uuid, queue, socket, requests, traceback
from typing import Any
from simple_websocket_server import WebSocketServer, WebSocket
import bottle
from bottle import request

class Session:
    """One browser tab, connected over WebSocket ('ws' / 'ext_ws') or HTTP long-poll ('http')."""
    def __init__(self, session_id, info, client=None):
        self.id = session_id
        self.info = info
        self.connect_at = time.time()
        self.disconnect_at = None
        self.type = info.get('type', 'ws')
        self.ws_client = client if self.type in ('ws', 'ext_ws') else None
        self.http_queue = client if self.type == 'http' else None

    @property
    def url(self): return self.info.get('url', '')

    def is_active(self):
        # HTTP sessions have no socket; treat them as gone after 60s without a poll
        if self.type == 'http' and time.time() - self.connect_at > 60: self.mark_disconnected()
        return self.disconnect_at is None

    def reconnect(self, client, info):
        self.info = info
        self.type = info.get('type', 'ws')
        if self.type in ('ws', 'ext_ws'):
            self.ws_client = client
            self.http_queue = None
        elif self.type == 'http': self.http_queue = client
        self.connect_at = time.time()
        self.disconnect_at = None

    def mark_disconnected(self):
        # Test the flag directly: calling is_active() here would recurse without end
        # for a stale HTTP session (is_active -> mark_disconnected -> is_active ...)
        if self.disconnect_at is None: print(f"Tab disconnected: {self.url} (Session: {self.id})")
        self.disconnect_at = time.time()

class TMWebDriver:
    def __init__(self, host: str = '127.0.0.1', port: int = 18765):
        self.host, self.port = host, port
        self.sessions, self.results, self.acks = {}, {}, {}
        self.default_session_id = None
        self.latest_session_id = None
        # If port+1 already accepts connections, another instance owns the servers; act as its client
        self.is_remote = socket.socket().connect_ex((host, port+1)) == 0
        if not self.is_remote:
            self.start_ws_server()
            self.start_http_server()
        else:
            self.remote = f'http://{self.host}:{self.port+1}/link'

    def start_http_server(self):
        self.app = app = bottle.Bottle()

        @app.route('/api/longpoll', method=['GET', 'POST'])
        def long_poll():
            data = request.json
            session_id = data.get('sessionId')
            session_info = {'url': data.get('url'), 'title': data.get('title', ''), 'type': 'http'}
            if session_id not in self.sessions:
                session = Session(session_id, session_info, queue.Queue())
                print(f"Browser http connected: {session.url} (Session: {session_id})")
                self.sessions[session_id] = session
            session = self.sessions[session_id]
            if session.disconnect_at is not None and session.type != 'http':
                session.reconnect(queue.Queue(), session_info)
            session.disconnect_at = None
            if session.type == 'http': msgQ = session.http_queue
            else: return json.dumps({"id": "", "ret": "use ws"})
            session.connect_at = start_time = time.time()
            # Hold the request for up to 5s waiting for a command to deliver
            while time.time() - start_time < 5:
                try:
                    msg = msgQ.get(timeout=0.2)
                    try: self.acks[json.loads(msg).get('id','')] = True
                    except Exception: traceback.print_exc()
                    return msg
                except queue.Empty: continue
            return json.dumps({"id": "", "ret": "next long-poll"})

        @app.route('/api/result', method=['GET','POST'])
        def result():
            data = request.json
            if data.get('type') == 'result':
                self.results[data.get('id')] = {'success': True, 'data': data.get('result'), 'newTabs': data.get('newTabs', [])}
            elif data.get('type') == 'error':
                self.results[data.get('id')] = {'success': False, 'data': data.get('error'), 'newTabs': data.get('newTabs', [])}
            return 'ok'

        @app.route('/link', method=['GET','POST'])
        def link():
            # Command endpoint used by remote TMWebDriver instances
            data = request.json
            if data.get('cmd') == 'get_all_sessions':
                return json.dumps({'r': self.get_all_sessions()}, ensure_ascii=False)
            if data.get('cmd') == 'find_session':
                url_pattern = data.get('url_pattern', '')
                return json.dumps({'r': self.find_session(url_pattern)}, ensure_ascii=False)
            if data.get('cmd') == 'execute_js':
                session_id = data.get('sessionId')
                code = data.get('code')
                timeout = float(data.get('timeout', 10.0))
                try:
                    result = self.execute_js(code, timeout=timeout, session_id=session_id)
                    print('[remote result]', (str(code)[:50] + ' RESULT:' +str(result)[:50]).replace('\n', ' '))
                    return json.dumps({'r': result}, ensure_ascii=False)
                except Exception as e:
                    return json.dumps({'r': {'error': str(e)}}, ensure_ascii=False)
            return 'ok'

        def run():
            from wsgiref.simple_server import make_server, WSGIServer, WSGIRequestHandler
            from socketserver import ThreadingMixIn
            class _T(ThreadingMixIn, WSGIServer): pass
            class _H(WSGIRequestHandler):
                def log_request(self, *a): pass
            make_server(self.host, self.port+1, app, server_class=_T, handler_class=_H).serve_forever()
        http_thread = threading.Thread(target=run, daemon=True)
        http_thread.start()

    def clean_sessions(self):
        # Drop sessions that have been disconnected for more than 10 minutes
        sids = list(self.sessions.keys())
        for sid in sids:
            session = self.sessions[sid]
            if not session.is_active() and time.time() - session.disconnect_at > 600:
                del self.sessions[sid]

    def start_ws_server(self) -> None:
        driver = self
        class JSExecutor(WebSocket):
            def handle(self) -> None:
                try:
                    data = json.loads(self.data)
                    if data.get('type') == 'ready':
                        session_id = data.get('sessionId')
                        session_info = {'url': data.get('url'), 'title': data.get('title', ''), 'connected_at': time.time(), 'type': 'ws'}
                        driver._register_client(session_id, self, session_info)
                    elif data.get('type') in ['ext_ready', 'tabs_update']:
                        tabs = data.get('tabs', [])
                        current_tab_ids = {str(tab['id']) for tab in tabs}
                        print(f"Received tabs update: {current_tab_ids}")
                        # Extension tabs missing from the update have been closed
                        for sid in list(driver.sessions.keys()):
                            sess = driver.sessions[sid]
                            if sess.type == 'ext_ws' and sid not in current_tab_ids: sess.mark_disconnected()
                        for tab in tabs:
                            session_id = str(tab['id'])
                            session_info = {'url': tab.get('url'), 'title': tab.get('title', ''), 'connected_at': time.time(), 'type': 'ext_ws'}
                            sess = driver.sessions.get(session_id)
                            if sess and sess.is_active(): sess.info = session_info
                            else: driver._register_client(session_id, self, session_info)
                    elif data.get('type') == 'ack': driver.acks[data.get('id','')] = True
                    elif data.get('type') == 'result':
                        driver.results[data.get('id')]
= {'success': True, 'data': data.get('result'), 'newTabs': data.get('newTabs', [])} elif data.get('type') == 'error': driver.results[data.get('id')] = {'success': False, 'data': data.get('error'), 'newTabs': data.get('newTabs', [])} except Exception as e: print(f"Error handling message: {e}") if hasattr(self, 'data'): print(self.data) def connected(self): (f"New connection from {self.address}") def handle_close(self): print(f"WS Connection closed: {self.address}") driver._unregister_client(self) self.server = WebSocketServer(self.host, self.port, JSExecutor) server_thread = threading.Thread(target=self.server.serve_forever) server_thread.daemon = True server_thread.start() print(f"WebSocket server running on ws://{self.host}:{self.port}") def _register_client(self, session_id: str, client: WebSocket, session_info) -> None: is_new_session = session_id not in self.sessions if is_new_session: session = Session(session_id, session_info, client) self.sessions[session_id] = session print(f"New tab connected: {session.url} (Session: {session_id})") else: session = self.sessions[session_id] session.reconnect(client, session_info) print(f"Tab reconnected: {session.url} (Session: {session_id})") self.latest_session_id = session_id if self.default_session_id is None: self.default_session_id = session_id def _unregister_client(self, client: WebSocket) -> None: for session in self.sessions.values(): if session.ws_client == client: session.mark_disconnected() def execute_js(self, code, timeout=15, session_id=None) -> Any: if session_id is None: session_id = self.default_session_id if self.is_remote: print('remote_execute_js') response = self._remote_cmd({"cmd": "execute_js", "sessionId": session_id, "code": code, "timeout": str(timeout)}).get('r', {}) if response.get('error'): raise Exception(response['error']) return response session = self.sessions.get(session_id) if not session or not session.is_active(): time.sleep(3) session = self.sessions.get(session_id) if not session or 
not session.is_active(): alive_sessions = [s for s in self.sessions.values() if s.is_active()] if alive_sessions: session = alive_sessions[0] print(f"会话 {session_id} 未连接,自动切换到最新活动会话: {session.id}") session_id = self.default_session_id = session.id if not session or not session.is_active(): raise ValueError(f"会话ID {session_id} 未连接") tp = session.type if tp not in ('ws', 'http', 'ext_ws'): raise ValueError(f"Unsupported session type: {tp}") exec_id = str(uuid.uuid4()) payload_dict = {'id': exec_id, 'code': code} if tp == 'ext_ws': payload_dict['tabId'] = int(session.id) payload = json.dumps(payload_dict) if tp in ['ws', 'ext_ws']: session.ws_client.send_message(payload) elif tp == 'http': session.http_queue.put(payload) start_time = time.time() self.clean_sessions() hasjump = acked = False while exec_id not in self.results: time.sleep(0.2) if not acked and exec_id in self.acks: acked = True; start_time = time.time() if tp in ['ws', 'ext_ws']: if not session.is_active(): hasjump = True if hasjump and session.is_active(): return {'result': f"Session {session_id} reloaded.", "closed":1} if time.time() - start_time > timeout: if tp in ['ws', 'ext_ws']: if hasjump: return {'result': f"Session {session_id} reloaded and new page is loading...", 'closed':1} if acked: return {"result": f"No response data in {timeout}s (ACK received, script may still be running)"} return {"result": f"No response data in {timeout}s (no ACK, script may not have been delivered)"} elif tp == 'http': if acked: return {"result": f"Session {session_id} no response in {timeout}s (delivered but no result)"} return {"result": f"Session {session_id} no response in {timeout}s (script not polled)"} result = self.results.pop(exec_id) if exec_id in self.acks: self.acks.pop(exec_id) if not result['success']: raise Exception(result['data']) rr = {'data': result['data']} newtabs = result.get('newTabs', []); [x.pop('ts', None) for x in newtabs] if newtabs: rr['newTabs'] = newtabs return rr def _remote_cmd(self, 
cmd): try: return requests.post(self.remote, headers={"Content-Type": "application/json"}, json=cmd, timeout=30).json() except (ConnectionError, requests.exceptions.ConnectionError): raise ConnectionError("TMWebDriver master未运行,看tmwebdriver_sop启动master") def get_all_sessions(self): if self.is_remote: return self._remote_cmd({"cmd": "get_all_sessions"}).get('r', []) return [{'id': session.id, **session.info} for session in self.sessions.values() if session.is_active()] def get_session_dict(self): return {session['id']: session['url'] for session in self.get_all_sessions()} def find_session(self, url_pattern: str): if url_pattern == '': session = self.sessions.get(self.latest_session_id) return [(session.id, session.info)] if session else [] matching_sessions = [] for session in self.sessions.values(): if not session.is_active(): continue if 'url' in session.info and url_pattern in session.info['url']: matching_sessions.append((session.id, session.info)) return matching_sessions def set_session(self, url_pattern: str) -> bool: if self.is_remote: matched = self._remote_cmd({"cmd": "find_session", "url_pattern": url_pattern}).get('r', []) else: matched = self.find_session(url_pattern) if not matched: return print(f"警告: 未找到URL包含 '{url_pattern}' 的会话") if len(matched) > 1: print(f"警告: 找到多个URL包含 '{url_pattern}' 的会话,选择第一个") self.default_session_id, info = matched[0] print(f"成功设置默认会话: {self.default_session_id}: {info['url']}") return self.default_session_id def jump(self, url, timeout=10): self.execute_js(f"window.location.href='{url}'", timeout=timeout) if __name__ == "__main__": driver = TMWebDriver(host='127.0.0.1', port=18765) ================================================ FILE: agent_loop.py ================================================ import json, re, os from dataclasses import dataclass from typing import Any, Optional @dataclass class StepOutcome: data: Any next_prompt: Optional[str] = None should_exit: bool = False def try_call_generator(func, *args, 
**kwargs): ret = func(*args, **kwargs) if hasattr(ret, '__iter__') and not isinstance(ret, (str, bytes, dict, list)): ret = yield from ret return ret class BaseHandler: def tool_before_callback(self, tool_name, args, response): pass def tool_after_callback(self, tool_name, args, response, ret): pass def turn_end_callback(self, response, tool_calls, tool_results, turn, next_prompt, exit_reason): return next_prompt def dispatch(self, tool_name, args, response, index=0): method_name = f"do_{tool_name}" if hasattr(self, method_name): args['_index'] = index prer = yield from try_call_generator(self.tool_before_callback, tool_name, args, response) ret = yield from try_call_generator(getattr(self, method_name), args, response) _ = yield from try_call_generator(self.tool_after_callback, tool_name, args, response, ret) return ret elif tool_name == 'bad_json': return StepOutcome(None, next_prompt=args.get('msg', 'bad_json'), should_exit=False) else: yield f"未知工具: {tool_name}\n" return StepOutcome(None, next_prompt=f"未知工具 {tool_name}", should_exit=False) def json_default(o): return list(o) if isinstance(o, set) else str(o) def exhaust(g): try: while True: next(g) except StopIteration as e: return e.value def get_pretty_json(data): if isinstance(data, dict) and "script" in data: data = data.copy(); data["script"] = data["script"].replace("; ", ";\n ") return json.dumps(data, indent=2, ensure_ascii=False).replace('\\n', '\n') def agent_runner_loop(client, system_prompt, user_input, handler, tools_schema, max_turns=40, verbose=True, initial_user_content=None): messages = [ {"role": "system", "content": system_prompt}, {"role": "user", "content": initial_user_content if initial_user_content is not None else user_input} ] turn = 0; handler.max_turns = max_turns while turn < handler.max_turns: turn += 1; turnstr = f'LLM Running (Turn {turn}) ...' if handler.parent.task_dir: turnstr = f'Turn {turn} ...' 
if verbose: turnstr = f'**{turnstr}**' yield f"\n\n{turnstr}\n\n" if turn%10 == 0: client.last_tools = '' # 每10轮重置一次工具描述,避免上下文过大导致的模型性能下降 response_gen = client.chat(messages=messages, tools=tools_schema) if verbose: response = yield from response_gen yield '\n\n' else: response = exhaust(response_gen) cleaned = _clean_content(response.content) if cleaned: yield cleaned + '\n' if not response.tool_calls: tool_calls = [{'tool_name': 'no_tool', 'args': {}}] else: tool_calls = [{'tool_name': tc.function.name, 'args': json.loads(tc.function.arguments), 'id': tc.id} for tc in response.tool_calls] tool_results = []; next_prompts = set(); exit_reason = {} for ii, tc in enumerate(tool_calls): tool_name, args, tid = tc['tool_name'], tc['args'], tc.get('id', '') if tool_name == 'no_tool': pass else: if verbose: yield f"🛠️ Tool: `{tool_name}` 📥 args:\n````text\n{get_pretty_json(args)}\n````\n" else: yield f"🛠️ {tool_name}({_compact_tool_args(tool_name, args)})\n\n\n" handler.current_turn = turn gen = handler.dispatch(tool_name, args, response, index=ii) try: v = next(gen) def proxy(): yield v; return (yield from gen) if verbose: yield '`````\n' outcome = (yield from proxy()) if verbose else exhaust(proxy()) if verbose: yield '`````\n' except StopIteration as e: outcome = e.value if outcome.should_exit: exit_reason = {'result': 'EXITED', 'data': outcome.data}; break if not outcome.next_prompt: exit_reason = {'result': 'CURRENT_TASK_DONE', 'data': outcome.data}; break if outcome.next_prompt.startswith('未知工具'): client.last_tools = '' if outcome.data is not None and tool_name != 'no_tool': datastr = json.dumps(outcome.data, ensure_ascii=False, default=json_default) if type(outcome.data) in [dict, list] else str(outcome.data) tool_results.append({'tool_use_id': tid, 'content': datastr}) next_prompts.add(outcome.next_prompt) if len(next_prompts) == 0 or exit_reason: if len(handler._done_hooks) == 0 or exit_reason.get('result', '') == 'EXITED': break 
next_prompts.add(handler._done_hooks.pop(0)) next_prompt = handler.turn_end_callback(response, tool_calls, tool_results, turn, '\n'.join(next_prompts), exit_reason) messages = [{"role": "user", "content": next_prompt, "tool_results": tool_results}] # just new message, history is kept in *Session if exit_reason: handler.turn_end_callback(response, tool_calls, tool_results, turn, '', exit_reason) return exit_reason or {'result': 'MAX_TURNS_EXCEEDED'} def _clean_content(text): if not text: return '' def _shrink_code(m): lines = m.group(0).split('\n') lang = lines[0].replace('```','').strip() body = [l for l in lines[1:-1] if l.strip()] if len(body) <= 6: return m.group(0) preview = '\n'.join(body[:5]) return f'```{lang}\n{preview}\n ... ({len(body)} lines)\n```' text = re.sub(r'```[\s\S]*?```', _shrink_code, text) for p in [r'[\s\S]*?', r'[\s\S]*?', r'(\r?\n){3,}']: text = re.sub(p, '\n\n' if '\\n' in p else '', text) return text.strip() def _compact_tool_args(name, args): a = {k: v for k, v in args.items() if k != '_index'} for k in ('path',): if k in a: a[k] = os.path.basename(a[k]) if name == 'update_working_checkpoint': s = a.get('key_info', ''); return (s[:60]+'...') if len(s)>60 else s if name == 'ask_user': q = str(a.get('question', '')) cs = a.get('candidates') or [] if cs: q += '\ncandidates:\n' + '\n'.join(f'- {c}' for c in cs) return q s = json.dumps(a, ensure_ascii=False); return (s[:120]+'...') if len(s)>120 else s ================================================ FILE: agentmain.py ================================================ import os, sys, threading, queue, time, json, re, random, locale os.environ.setdefault('GA_LANG', 'zh' if any(k in (locale.getlocale()[0] or '').lower() for k in ('zh', 'chinese')) else 'en') if sys.stdout is None: sys.stdout = open(os.devnull, "w") elif hasattr(sys.stdout, 'reconfigure'): sys.stdout.reconfigure(errors='replace') if sys.stderr is None: sys.stderr = open(os.devnull, "w") elif hasattr(sys.stderr, 'reconfigure'): 
sys.stderr.reconfigure(errors='replace') sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) from llmcore import reload_mykeys, LLMSession, ToolClient, ClaudeSession, MixinSession, NativeToolClient, NativeClaudeSession, NativeOAISession, resolve_client from agent_loop import agent_runner_loop from ga import GenericAgentHandler, smart_format, get_global_memory, format_error, consume_file script_dir = os.path.dirname(os.path.abspath(__file__)) def load_tool_schema(suffix=''): global TOOLS_SCHEMA TS = open(os.path.join(script_dir, f'assets/tools_schema{suffix}.json'), 'r', encoding='utf-8').read() TOOLS_SCHEMA = json.loads(TS if os.name == 'nt' else TS.replace('powershell', 'bash')) load_tool_schema() lang_suffix = '_en' if os.environ.get('GA_LANG', '') == 'en' else '' mem_dir = os.path.join(script_dir, 'memory') if not os.path.exists(mem_dir): os.makedirs(mem_dir) mem_txt = os.path.join(mem_dir, 'global_mem.txt') if not os.path.exists(mem_txt): open(mem_txt, 'w', encoding='utf-8').write('# [Global Memory - L2]\n') mem_insight = os.path.join(mem_dir, 'global_mem_insight.txt') if not os.path.exists(mem_insight): t = os.path.join(script_dir, f'assets/global_mem_insight_template{lang_suffix}.txt') open(mem_insight, 'w', encoding='utf-8').write(open(t, encoding='utf-8').read() if os.path.exists(t) else '') cdp_cfg = os.path.join(script_dir, 'assets/tmwd_cdp_bridge/config.js') if not os.path.exists(cdp_cfg): try: os.makedirs(os.path.dirname(cdp_cfg), exist_ok=True) open(cdp_cfg, 'w', encoding='utf-8').write(f"const TID = '__ljq_{hex(random.randint(0, 99999999))[2:8]}';") except Exception as e: print(f'[WARN] CDP config init failed: {e} — advanced web features (tmwebdriver) will be unavailable.') def get_system_prompt(): with open(os.path.join(script_dir, f'assets/sys_prompt{lang_suffix}.txt'), 'r', encoding='utf-8') as f: prompt = f.read() prompt += f"\nToday: {time.strftime('%Y-%m-%d %a')}\n" prompt += get_global_memory() return prompt class 
GenericAgent: def __init__(self): os.makedirs(os.path.join(script_dir, 'temp'), exist_ok=True) self.lock = threading.Lock() self.task_dir = None self.history = []; self.handler = None; self.task_queue = queue.Queue() self.is_running = False; self.stop_sig = False self.llm_no = 0; self.inc_out = False; self.verbose = True self.peer_hint = True self.log_path = os.path.join(script_dir, f'temp/model_responses/model_responses_{int(time.time()*1e6)%1000000:06d}.txt') self.load_llm_sessions() def load_llm_sessions(self): mykeys, changed = reload_mykeys() if not changed and hasattr(self, 'llmclients'): return try: oldhistory = self.llmclient.backend.history except: oldhistory = None llm_sessions = [] for k, cfg in mykeys.items(): if not any(x in k for x in ['api', 'config', 'cookie']): continue try: if 'mixin' in k: llm_sessions += [{'mixin_cfg': cfg}] elif c := resolve_client(k): llm_sessions += [c] except: pass for i, s in enumerate(llm_sessions): if isinstance(s, dict) and 'mixin_cfg' in s: try: mixin = MixinSession(llm_sessions, s['mixin_cfg']) if isinstance(mixin._sessions[0], (NativeClaudeSession, NativeOAISession)): llm_sessions[i] = NativeToolClient(mixin) else: llm_sessions[i] = ToolClient(mixin) except Exception as e: print(f'\n\n\n[ERROR] Failed to init MixinSession with cfg {s["mixin_cfg"]}: {e}!!!\n\n') self.llmclients = llm_sessions self.llmclient = self.llmclients[self.llm_no%len(self.llmclients)] if oldhistory: self.llmclient.backend.history = oldhistory def next_llm(self, n=-1): self.load_llm_sessions() self.llm_no = ((self.llm_no + 1) if n < 0 else n) % len(self.llmclients) lastc = self.llmclient self.llmclient = self.llmclients[self.llm_no] try: self.llmclient.backend.history = lastc.backend.history except: raise Exception('[ERROR] BAD Mixin config: Check your mykey.py') self.llmclient.last_tools = '' name = self.get_llm_name(model=True) if 'glm' in name or 'minimax' in name or 'kimi' in name: load_tool_schema('_cn') else: load_tool_schema() def 
list_llms(self): self.load_llm_sessions() return [(i, self.get_llm_name(b), i == self.llm_no) for i, b in enumerate(self.llmclients)] def get_llm_name(self, b=None, model=False): b = self.llmclient if b is None else b if isinstance(b, dict): return 'BADCONFIG_MIXIN' if model: return b.backend.model.lower() return f"{type(b.backend).__name__}/{b.backend.name}" def abort(self): if not self.is_running: return print('Abort current task...') self.stop_sig = True if self.handler is not None: self.handler.code_stop_signal.append(1) def put_task(self, query, source="user", images=None): display_queue = queue.Queue() self.task_queue.put({"query": query, "source": source, "images": images or [], "output": display_queue}) return display_queue # i know it is dangerous, but raw_query is dangerous enough it doesn't enlarge def _handle_slash_cmd(self, raw_query, display_queue): if not raw_query.startswith('/'): return raw_query if _sm := re.match(r'/session\.(\w+)=(.*)', raw_query.strip()): k, v = _sm.group(1), _sm.group(2) vfile = os.path.join(script_dir, 'temp', v) if os.path.isfile(vfile): v = open(vfile, encoding='utf-8').read().strip() try: v = json.loads(v) # cover number parsing except (json.JSONDecodeError, ValueError): pass setattr(self.llmclient.backend, k, v) display_queue.put({'done': smart_format(f"✅ session.{k} = {repr(v)}", max_str_len=500), 'source': 'system'}) return None if raw_query.strip() == '/resume': return r'帮我看看最近有哪些会话可以恢复。读model_responses/目录,按修改时间取最近10个文件,从每个文件里找最后一个...块,用一句话总结每个会话在聊什么,列表给我选。注意读文件后要把字面的\n替换成真换行才能正确匹配。' return raw_query def run(self): while True: task = self.task_queue.get() raw_query, source, display_queue = task["query"], task["source"], task["output"] raw_query = self._handle_slash_cmd(raw_query, display_queue) if raw_query is None: self.task_queue.task_done(); continue self.is_running = True rquery = smart_format(raw_query.replace('\n', ' '), max_str_len=200) self.history.append(f"[USER]: {rquery}") sys_prompt = get_system_prompt() + 
getattr(self.llmclient.backend, 'extra_sys_prompt', '') if self.peer_hint: sys_prompt += f"\n[Peer] 用户提及其他会话/后台任务状态时: temp/model_responses/ (只找近期修改的文件尾部)\n" handler = GenericAgentHandler(self, self.history, os.path.join(script_dir, 'temp')) if self.handler and 'key_info' in self.handler.working: ki = re.sub(r'\n\[SYSTEM\] 此为.*?工作记忆[。\n]*', '', self.handler.working['key_info']) # 去旧 handler.working['key_info'] = ki handler.working['passed_sessions'] = ps = self.handler.working.get('passed_sessions', 0) + 1 if ps > 0: handler.working['key_info'] += f'\n[SYSTEM] 此为 {ps} 个对话前设置的key_info,若已在新任务,先更新或清除工作记忆。\n' self.handler = handler # although new handler, the **full** history is in llmclient, so it is full history! self.llmclient.log_path = self.log_path gen = agent_runner_loop(self.llmclient, sys_prompt, raw_query, handler, TOOLS_SCHEMA, max_turns=70, verbose=self.verbose) try: full_resp = ""; last_pos = 0 for chunk in gen: if consume_file(self.task_dir, '_stop'): self.abort() if self.stop_sig: break full_resp += chunk if len(full_resp) - last_pos > 50 or 'LLM Running' in chunk: display_queue.put({'next': full_resp[last_pos:] if self.inc_out else full_resp, 'source': source}) last_pos = len(full_resp) if self.inc_out and last_pos < len(full_resp): display_queue.put({'next': full_resp[last_pos:], 'source': source}) if '' in full_resp: full_resp = full_resp.replace('', '\n\n') if '' in full_resp: full_resp = re.sub(r'\s*(.*?)\s*', r'\n````\n\n\1\n\n````', full_resp, flags=re.DOTALL) display_queue.put({'done': full_resp, 'source': source}) self.history = handler.history_info except Exception as e: print(f"Backend Error: {format_error(e)}") display_queue.put({'done': full_resp + f'\n```\n{format_error(e)}\n```', 'source': source}) finally: if self.stop_sig: print('User aborted the task.') self.is_running = self.stop_sig = False self.task_queue.task_done() if self.handler is not None: self.handler.code_stop_signal.append(1) GeneraticAgent = GenericAgent if __name__ == 
'__main__': import argparse from datetime import datetime parser = argparse.ArgumentParser() parser.add_argument('--task', metavar='IODIR', help='一次性任务模式(文件IO)') parser.add_argument('--reflect', metavar='SCRIPT', help='反射模式:加载监控脚本,check()触发时发任务') parser.add_argument('--input', help='prompt') parser.add_argument('--llm_no', type=int, default=0) parser.add_argument('--verbose', action='store_true') parser.add_argument('--nobg', action='store_true') args = parser.parse_args() if args.task and not args.nobg: import subprocess, platform cmd = [sys.executable, os.path.abspath(__file__)] + [a for a in sys.argv[1:]] + ['--nobg'] d = os.path.join(script_dir, f'temp/{args.task}'); os.makedirs(d, exist_ok=True) p = subprocess.Popen(cmd, cwd=script_dir, creationflags=0x08000000 if platform.system() == 'Windows' else 0, stdout=open(os.path.join(d, 'stdout.log'), 'w', encoding='utf-8'), stderr=open(os.path.join(d, 'stderr.log'), 'w', encoding='utf-8')) print(p.pid); sys.exit(0) agent = GeneraticAgent() agent.next_llm(args.llm_no) agent.verbose = args.verbose threading.Thread(target=agent.run, daemon=True).start() if args.task: agent.peer_hint = False agent.task_dir = d = os.path.join(script_dir, f'temp/{args.task}'); nround = '' infile = os.path.join(d, 'input.txt') if args.input: os.makedirs(d, exist_ok=True) import glob; [os.remove(f) for f in glob.glob(os.path.join(d, 'output*.txt'))] with open(infile, 'w', encoding='utf-8') as f: f.write(args.input) if (fh := consume_file(d, '_history.json')): agent.llmclient.backend.history = json.loads(fh) with open(infile, encoding='utf-8') as f: raw = f.read() while True: dq = agent.put_task(raw, source='task') while 'done' not in (item := dq.get(timeout=300)): if 'next' in item and random.random() < 0.95: # 概率写一次中间结果 with open(f'{d}/output{nround}.txt', 'w', encoding='utf-8') as f: f.write(item.get('next', '')) with open(f'{d}/output{nround}.txt', 'w', encoding='utf-8') as f: f.write(item['done'] + '\n\n[ROUND END]\n') consume_file(d, 
'_stop') # 已经成功停下来了,避免打断下次reply for _ in range(300): # 等reply.txt,10分钟超时 time.sleep(2) if (raw := consume_file(d, 'reply.txt')): break else: break nround = nround + 1 if isinstance(nround, int) else 1 elif args.reflect: agent.peer_hint = False import importlib.util spec = importlib.util.spec_from_file_location('reflect_script', args.reflect) mod = importlib.util.module_from_spec(spec); spec.loader.exec_module(mod) _mt = os.path.getmtime(args.reflect) print(f'[Reflect] loaded {args.reflect}') while True: if os.path.getmtime(args.reflect) != _mt: try: spec.loader.exec_module(mod); _mt = os.path.getmtime(args.reflect); print('[Reflect] reloaded') except Exception as e: print(f'[Reflect] reload error: {e}') time.sleep(getattr(mod, 'INTERVAL', 5)) try: task = mod.check() except Exception as e: print(f'[Reflect] check() error: {e}'); continue if task and task == '/exit': break if task is None: continue print(f'[Reflect] triggered: {task[:80]}') dq = agent.put_task(task, source='reflect') try: while 'done' not in (item := dq.get(timeout=180)): pass result = item['done'] print(result) except Exception as e: if getattr(mod, 'ONCE', False): raise print(f'[Reflect] drain error: {e}'); result = f'[ERROR] {e}' log_dir = os.path.join(script_dir, 'temp/reflect_logs'); os.makedirs(log_dir, exist_ok=True) script_name = os.path.splitext(os.path.basename(args.reflect))[0] open(os.path.join(log_dir, f'{script_name}_{datetime.now():%Y-%m-%d}.log'), 'a', encoding='utf-8').write(f'[{datetime.now():%m-%d %H:%M}]\n{result}\n\n') if (on_done := getattr(mod, 'on_done', None)): try: on_done(result) except Exception as e: print(f'[Reflect] on_done error: {e}') if getattr(mod, 'ONCE', False): print('[Reflect] ONCE=True, exiting.'); break else: try: import readline except Exception: pass agent.inc_out = True while True: q = input('> ').strip() if not q: continue try: dq = agent.put_task(q, source='user') while True: item = dq.get() if 'next' in item: print(item['next'], end='', flush=True) if 
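'done' in item: print(); break except KeyboardInterrupt: agent.abort() print('\n[Interrupted]')

================================================
FILE: assets/SETUP_FEISHU.md
================================================
# Feishu Agent Setup Guide

> Turn your personal computer into the brain of a Feishu bot, so you can control your computer through Feishu chat anytime, anywhere.

---

## 📋 Table of Contents

1. [Prerequisites](#prerequisites)
2. [Choosing a Setup Path](#choosing-a-setup-path)
3. [Enterprise User Setup](#enterprise-user-setup)
4. [Personal User Setup](#personal-user-setup)
5. [Project Configuration](#project-configuration)
6. [Running and Testing](#running-and-testing)
7. [FAQ](#faq)

---

## Prerequisites

### Required Environment

- Python 3.8+
- The complete code of this project
- An LLM API key (Claude/OpenAI, etc., already configured in `llmcore/mykeys`)

### Install Dependencies

```bash
pip install lark-oapi
```

---

## Choosing a Setup Path

| Your situation | Recommended path | Estimated time |
| ------------------ | -------------------------- | --------- |
| Your company already uses Feishu Enterprise | [Enterprise User Setup](#enterprise-user-setup) | 5-10 minutes |
| Individual user / learning and testing | [Personal User Setup](#personal-user-setup) | 10-15 minutes |

---

## Enterprise User Setup

> Applies when: your company uses Feishu, and you either have permission to create apps or can ask an administrator to approve them.

### Step 1: Create the App

1. Visit the [Feishu Open Platform](https://open.feishu.cn/)
2. Log in with your enterprise Feishu account
3. Click "Create App" (创建应用) in the upper-right corner → "Custom Enterprise App" (企业自建应用)
4. Fill in the app information:
   - App name: `My Agent Assistant` (customizable)
   - App description: `Personal AI assistant`
   - App icon: optional

### Step 2: Add the Bot Capability

1. Open the app details page
2. In the left menu, select "Add App Capability" (添加应用能力)
3. Find "Bot" (机器人) and click "Add" (添加)
4. Configure the bot information (the defaults are fine)

### Step 3: Configure Permissions

1. In the left menu, go to "Permission Management" (权限管理) → "API Permissions" (API 权限)
2. Search for and enable the following permissions:
   - `im:message` - read and send direct and group messages
   - `im:message:send_as_bot` - send messages as the app
   - `contact:user.id:readonly` - read user IDs

### Step 4: Get the Credentials

1. In the left menu, open "Credentials & Basic Info" (凭证与基础信息)
2. Record the following:
   - **App ID**: `cli_xxxxxxxx`
   - **App Secret**: `xxxxxxxxxxxxxxxx`

### Step 5: Publish the App

1. In the left menu, open "Version Management & Release" (版本管理与发布)
2. Click "Create Version" (创建版本)
3. Fill in the version information and submit it for review
4. **Ask your enterprise administrator to approve it** (or approve it yourself if you are the administrator)

### Step 6: Get Your Open ID

1. Once the app is approved, search for your bot in Feishu
2. Send the bot any message
3. Run the following code to get your Open ID:

```python
# Run once, temporarily, to get the open_id
import lark_oapi as lark
from lark_oapi.api.im.v1 import *

client = lark.Client.builder().app_id("YOUR_APP_ID").app_secret("YOUR_APP_SECRET").build()

# Listen for messages and print the sender's open_id
def handle(data):
    print(f"Your Open ID: {data.event.sender.sender_id.open_id}")

# ... or check the log output of frontends/fsapp.py while it is running
```

---

## Personal User Setup

> Applies when: you have no enterprise Feishu account and want to test on your own.

### Step 1: Create a Test Enterprise

1. Visit the [Feishu Open Platform](https://open.feishu.cn/)
2. Register or log in with your personal phone number
3. Click your avatar in the upper-right corner → "Create Test Enterprise" (创建测试企业)
4. Fill in an enterprise name (for example `My Test Workspace`)
5. Once it is created, you are the **administrator** of this test enterprise

### Step 2: Create the App

> Same as the enterprise user steps

1. Click "Create App" → "Custom Enterprise App"
2. Fill in the app information

### Step 3: Add the Bot Capability

1. Open the app details page
2. "Add App Capability" → "Bot" → "Add"

### Step 4: Configure Permissions

1. "Permission Management" → "API Permissions"
2. Enable the permissions:
   - `im:message`
   - `im:message:send_as_bot`
   - `contact:user.id:readonly`

### Step 5: Get the Credentials

1. "Credentials & Basic Info"
2. Copy the **App ID** and **App Secret**

### Step 6: Publish the App (test enterprises can self-approve)

1. "Version Management & Release" → "Create Version"
2. After submitting, go to the [Feishu Admin Console](https://feishu.cn/admin)
3. "Workplace" (工作台) → "App Review" (应用审核) → approve your app

### Step 7: Use It in the Feishu Client

1. Download the [Feishu client](https://www.feishu.cn/download)
2. Log in with your test enterprise account
3. Search for the name of the bot you created
4. Start chatting!

---

## Project Configuration

### Configure the Feishu Credentials

Edit `mykey.py` in the project root and add:

```python
# Feishu app credentials
fs_app_id = "cli_xxxxxxxxxxxxxxxx"  # replace with your App ID
fs_app_secret = "xxxxxxxxxxxxxxxx"  # replace with your App Secret

# Open IDs of users allowed to use the bot (optional; leave empty to allow everyone)
fs_allowed_users = [
    "ou_xxxxxxxxxxxxxxxxxxxxxxxx",  # your Open ID
]
```

### Confirm the LLM Configuration

Make sure an LLM API key is configured in `llmcore/mykeys`:

```python
# Example: Claude API
claude_config = {
    'apikey': 'sk-ant-xxxxx',
    'apibase': 'https://api.anthropic.com',
    'model': 'claude-sonnet-4-20250514'
}
```

---

## Running and Testing

### Start the Service

```bash
cd /path/to/pc-agent-loop
python frontends/fsapp.py
```

### Expected Output

```
==================================================
飞书 Agent 已启动(长连接模式)
App ID: cli_xxxxxxxxxxxxxxxx
等待消息...
==================================================
```

The banner reads "Feishu Agent started (long-connection mode)" followed by "Waiting for messages...".

### Test the Conversation

1. Open the Feishu client
2. Find your bot
3. Send: `你好` ("hello")
4. Wait for the reply (the first one may take a few seconds)

---

## Available Commands

The following special commands are available when chatting with the bot:

| Command | Description |
| ---- | ---- |
| `/new` | Start a new conversation and clear the current context |
| `/stop` | Abort the task that is currently running |
| `/restore <keyword>` | Restore an earlier conversation context (searches the history by keyword) |

### Command Examples

```
/new                       # Clear the conversation and start over
/stop                      # Stop the running task
/restore yesterday's task  # Restore a past conversation containing the keyword "yesterday's task"
```

### Message Display Notes

- ⏳ means the task is still running
- Messages update in real time, so there is no need to wait for completion
- Very long replies are automatically split into several messages

---

## FAQ

### Q: I see "App not published" or "No permission"

**A:** Make sure the app has been published and approved by an administrator. Test enterprise users need to approve it manually in the admin console.

### Q: No reply after sending a message

**A:** Check:
1. Whether `frontends/fsapp.py` is running
2. Whether the terminal shows any error logs
3. Whether the LLM API key is configured correctly

### Q: I see "invalid app_id"

**A:** Check that `fs_app_id` in `mykey.py` was copied correctly (including the `cli_` prefix)

### Q: How do I get my own Open ID?

**A:** Run `frontends/fsapp.py`, send the bot a message, and look for `open_id` in the terminal log

### Q: Can several people use it at the same time?

**A:** No. An app can hold only one long connection, and that connection goes to one computer. Each person needs to create their own app.

---

## Architecture

```
Your Feishu ←→ Feishu Cloud ←→ Long connection ←→ frontends/fsapp.py ←→ Agent ←→ Your computer
                                                          ↑
                                                Runs on your computer
```

- Messages are relayed through Feishu Cloud to `frontends/fsapp.py` running on your computer
- After the Agent handles a request, it replies through the Feishu API
- **Your computer must keep `frontends/fsapp.py` running** in order to respond to messages

---

## Next Steps

- Customize the Agent's behavior: edit `assets/sys_prompt.txt`
- Add new tools: edit `assets/tools_schema.json`
- View logs: watch the terminal output while it runs

---

*Document version: v1.1 | Updated: 2026-03-07*

**Changes in v1.1:**
- Added the "Available Commands" section (/new, /stop, /restore)
- Added the message display notes (⏳ in-progress marker, real-time updates, etc.)

================================================
FILE: assets/agent_bbs.py
================================================
# agent_bbs.py: minimal agent bulletin board (multi-board version)
# Start: uvicorn agent_bbs:app --host 0.0.0.0 --port 58800
# Or: python agent_bbs.py
import sqlite3, uuid, time, json, os
from threading import Lock
from fastapi import FastAPI, HTTPException, Query, Body, UploadFile, File
from fastapi.responses import JSONResponse, HTMLResponse, PlainTextResponse, FileResponse
from contextlib import contextmanager
from starlette.requests import Request
from starlette.responses import Response
from starlette.middleware.base import BaseHTTPMiddleware

# key → board config; edit boards.json to hot-reload newly added boards
BOARDS_FILE = "boards.json"
DEFAULT_BOARDS = {"agent-bbs-test": {"name": "default", "db": "agent_bbs.db"}}
BOARDS, BOARDS_MTIME_NS, BOARDS_LOCK = DEFAULT_BOARDS, None, Lock()

def load_boards_if_changed():
    global BOARDS, BOARDS_MTIME_NS
    with BOARDS_LOCK:
        if not os.path.exists(BOARDS_FILE):
            json.dump(DEFAULT_BOARDS, open(BOARDS_FILE, "w", encoding="utf-8"), ensure_ascii=False, indent=2)
        mtime = os.stat(BOARDS_FILE).st_mtime_ns
        if mtime == BOARDS_MTIME_NS: return BOARDS
        try:
            new = json.load(open(BOARDS_FILE, "r", encoding="utf-8"))
            assert isinstance(new, dict) and all(isinstance(v, dict) and "db" in v and "name" in v for v in new.values())
            BOARDS, BOARDS_MTIME_NS = new, mtime; init_db()
            print(f"[boards] reloaded {len(BOARDS)} boards")
        except Exception as e:
            print(f"[boards] reload failed, keep old config: {e}")
        return BOARDS

UPLOAD_DIR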
= "bbs_files" os.makedirs(UPLOAD_DIR, exist_ok=True) app = FastAPI(title="Agent BBS", docs_url=None, redoc_url=None, openapi_url=None) class ApiKeyMiddleware(BaseHTTPMiddleware): async def dispatch(self, request: Request, call_next): key = request.headers.get("x-api-key") or request.query_params.get("key") board = load_boards_if_changed().get(key) if not board: return Response("Not Found", status_code=404) request.state.board = board return await call_next(request) app.add_middleware(ApiKeyMiddleware) HTML_PAGE = """ Agent BBS

Agent BBS

""" README_TEXT = "Agent BBS API\tAuth: ALL requests require header X-API-Key: or pass ?key= as query parameter.\t1. Register: POST /register body: {\"name\": \"your-agent-name\"}\tResponse: {\"token\": \"xxx\", \"name\": \"your-agent-name\"}\t2. Post: POST /post body: {\"token\": \"xxx\", \"content\": \"your message\"}\tResponse: {\"id\": 1, \"author\": \"your-agent-name\"}\t3. Poll new: GET /poll?since_id=0&limit=50\tReturns posts with id > since_id, ordered by id asc. Keep track of the last id you received, use it as since_id next time.\t4. Query: GET /posts?author=xxx&limit=50\tauthor is optional. Returns posts ordered by id desc. 5. Upload file: POST /file/upload multipart/form-data, form fields: token (your agent token) + file (the file). Requires X-API-Key. Response: {\"ref\": \"a1b2c3/filename.ext\"}. Paste ref into post content to reference the file. 6. Download file: GET /file/{rand_id}/{filename} Requires X-API-Key. e.g. /file/a1b2c3/filename.ext" @app.get("/readme") def readme(): return PlainTextResponse(README_TEXT) @app.get("/", response_class=HTMLResponse) def index(): return HTML_PAGE @contextmanager def get_db(db_path): conn = sqlite3.connect(db_path) conn.row_factory = sqlite3.Row try: yield conn conn.commit() finally: conn.close() def _db(request): return request.state.board["db"] def init_db(): for board in BOARDS.values(): with get_db(board["db"]) as db: db.execute("""CREATE TABLE IF NOT EXISTS users ( token TEXT PRIMARY KEY, name TEXT UNIQUE NOT NULL, created_at REAL)""") db.execute("""CREATE TABLE IF NOT EXISTS posts ( id INTEGER PRIMARY KEY AUTOINCREMENT, author TEXT NOT NULL, content TEXT NOT NULL, created_at REAL, FOREIGN KEY(author) REFERENCES users(name))""") db.execute("CREATE INDEX IF NOT EXISTS idx_posts_id ON posts(id)") def verify_token(token, db_path): with get_db(db_path) as db: row = db.execute("SELECT name FROM users WHERE token=?", (token,)).fetchone() if not row: raise HTTPException(401, "invalid token") return row["name"] 
@app.on_event("startup") def startup(): load_boards_if_changed() @app.post("/register") def register(request: Request, name=Body(..., embed=True)): token = uuid.uuid4().hex[:16] try: with get_db(_db(request)) as db: db.execute("INSERT INTO users VALUES(?,?,?)", (token, name, time.time())) except sqlite3.IntegrityError: with get_db(_db(request)) as db: row = db.execute("SELECT token FROM users WHERE name=?", (name,)).fetchone() return {"token": row["token"], "name": name} return {"token": token, "name": name} @app.post("/post") def create_post(request: Request, token=Body(...), content=Body(...)): author = verify_token(token, _db(request)) with get_db(_db(request)) as db: cur = db.execute("INSERT INTO posts(author,content,created_at) VALUES(?,?,?)", (author, content, time.time())) post_id = cur.lastrowid return {"id": post_id, "author": author} @app.get("/poll") def poll(request: Request, since_id=Query(0), limit=Query(50)): with get_db(_db(request)) as db: rows = db.execute("SELECT id,author,content,created_at FROM posts WHERE id>? ORDER BY id LIMIT ?", (since_id, limit)).fetchall() return [dict(r) for r in rows] @app.get("/count") def count_posts(request: Request, author=Query(None)): with get_db(_db(request)) as db: q, p = ("SELECT COUNT(*) c FROM posts WHERE author=?", (author,)) if author else ("SELECT COUNT(*) c FROM posts", ()) return {"total": db.execute(q, p).fetchone()["c"]} @app.get("/authors") def get_authors(request: Request): with get_db(_db(request)) as db: return [r["author"] for r in db.execute("SELECT DISTINCT author FROM posts ORDER BY author").fetchall()] @app.get("/posts") def get_posts(request: Request, author=Query(None), limit=Query(50), offset=Query(0)): with get_db(_db(request)) as db: if author: rows = db.execute("SELECT id,author,content,created_at FROM posts WHERE author=? ORDER BY id DESC LIMIT ? 
OFFSET ?", (author, limit, offset)).fetchall() else: rows = db.execute("SELECT id,author,content,created_at FROM posts ORDER BY id DESC LIMIT ? OFFSET ?", (limit, offset)).fetchall() return [dict(r) for r in rows] @app.post("/file/upload") def upload_file(request: Request, token=Body(...), file: UploadFile = File(...)): verify_token(token, _db(request)) rand_id = uuid.uuid4().hex[:6] safe_name = os.path.basename(file.filename) dest = os.path.join(UPLOAD_DIR, rand_id) os.makedirs(dest, exist_ok=True) with open(os.path.join(dest, safe_name), "wb") as f: f.write(file.file.read()) return {"ref": f"{rand_id}/{safe_name}"} @app.get("/file/{rand_id}/{filename}") def download_file(rand_id: str, filename: str): path = os.path.join(UPLOAD_DIR, rand_id, os.path.basename(filename)) if not os.path.exists(path): raise HTTPException(404, "not found") return FileResponse(path, filename=filename) if __name__ == "__main__": import uvicorn uvicorn.run(app, host="0.0.0.0", port=58800) ================================================ FILE: assets/code_run_header.py ================================================ import sys, os, json, re, time, subprocess sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', 'memory')) _r = subprocess.run def _d(b): if not b: return '' if isinstance(b, str): return b try: return b.decode() except: return b.decode('gbk', 'replace') def _run(*a, **k): t = k.pop('text', 0) | k.pop('universal_newlines', 0) enc = k.pop('encoding', None) k.pop('errors', None) if enc: t = 1 if t and isinstance(k.get('input'), str): k['input'] = k['input'].encode() r = _r(*a, **k) if t: if r.stdout is not None: r.stdout = _d(r.stdout) if r.stderr is not None: r.stderr = _d(r.stderr) return r subprocess.run = _run _Pi = subprocess.Popen.__init__ def _pinit(self, *a, **k): if os.name == 'nt': k['creationflags'] = (k.get('creationflags') or 0) | 0x08000000 _Pi(self, *a, **k) subprocess.Popen.__init__ = _pinit sys.excepthook = lambda t, v, tb: 
(sys.__excepthook__(t, v, tb), print(f"\n[Agent Hint]: NO GUESSING! You MUST probe first. If missing common package, pip.")) if issubclass(t, (ImportError, AttributeError)) else sys.__excepthook__(t, v, tb) ================================================ FILE: assets/configure_mykey.py ================================================ #!/usr/bin/env python3 """ GenericAgent — 交互式初始化向导 (configure.py) 一键配置 LLM 模型 + 消息平台,自动生成 mykey.py 用法: python configure.py """ import os import sys import shutil import json import urllib.request import time from datetime import datetime # ── ANSI 颜色 ────────────────────────────────────────────────────────────── C = { 'reset': '\033[0m', 'bold': '\033[1m', 'dim': '\033[2m', 'red': '\033[91m', 'green': '\033[92m', 'yellow': '\033[93m', 'blue': '\033[94m', 'magenta': '\033[95m', 'cyan': '\033[96m', 'white': '\033[97m', } PROJECT_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) MYKPY_PATH = os.path.join(PROJECT_ROOT, 'mykey.py') # ── 模型厂商定义 ─────────────────────────────────────────────────────────── LLM_PROVIDERS = [ { 'id': 'deepseek', 'name': 'DeepSeek V4 Flash (推荐首选)', 'desc': '国产开源模型,速度快、性价比高,原生 OAI 协议', 'type': 'native_oai', 'template': { 'name': 'deepseek-flash', 'apikey': 'sk-', 'apibase': 'https://api.deepseek.com', 'model': 'deepseek-v4-flash', 'api_mode': 'chat_completions', 'reasoning_effort': 'high', }, 'key_hint': '在 https://platform.deepseek.com/api_keys 获取', 'model_choices': ['deepseek-v4-flash', 'deepseek-v3-premium'], }, { 'id': 'openai', 'name': 'OpenAI GPT-5 / o 系列', 'desc': 'OpenAI 官方,支持 GPT-5、o 系列推理模型', 'type': 'native_oai', 'template': { 'name': 'gpt-native', 'apikey': 'sk-', 'apibase': 'https://api.openai.com/v1', 'model': 'gpt-5.4', 'api_mode': 'chat_completions', 'reasoning_effort': 'high', 'max_retries': 3, 'connect_timeout': 10, 'read_timeout': 120, }, 'key_hint': '在 https://platform.openai.com/api-keys 获取', 'model_choices': ['gpt-5.4', 'o4-mini-high', 'o4-mini'], }, { 'id': 'anthropic', 
'name': 'Anthropic Claude 官方直连', 'desc': 'Claude 官方 API,sk-ant- 开头,原生 tool 协议', 'type': 'native_claude', 'template': { 'name': 'anthropic-direct', 'apikey': 'sk-ant-', 'apibase': 'https://api.anthropic.com', 'model': 'claude-opus-4-7', 'thinking_type': 'adaptive', 'max_tokens': 32768, 'temperature': 1, }, 'key_hint': '在 https://console.anthropic.com/ 获取', 'model_choices': ['claude-opus-4-7', 'claude-sonnet-4-6'], }, { 'id': 'cc_relay', 'name': 'CC Switch 透传 (社区常用)', 'desc': '社区 Claude Code 透传渠道,需要 fake_cc_system_prompt=True', 'type': 'native_claude', 'template': { 'name': 'cc-relay', 'apikey': 'sk-user-', 'apibase': 'https:///claude/office', 'model': 'claude-opus-4-7', 'fake_cc_system_prompt': True, 'thinking_type': 'adaptive', }, 'key_hint': '从你的 CC Switch 服务商获取 apikey 和 apibase', 'model_choices': ['claude-opus-4-7', 'claude-sonnet-4-6'], 'extra_fields': [ {'key': 'apibase', 'label': 'API 地址 (apibase)', 'default': 'https://your-host/claude/office'}, {'key': 'fake_cc_system_prompt', 'label': 'fake_cc_system_prompt', 'type': 'bool', 'default': True}, ], }, { 'id': 'zhipu', 'name': '智谱 GLM (Anthropic 兼容)', 'desc': '智谱 GLM-5.1,走 Anthropic 兼容协议', 'type': 'native_claude', 'template': { 'name': 'zhipu-glm', 'apikey': 'sk-', 'apibase': 'https://open.bigmodel.cn/api/anthropic', 'model': 'GLM-5.1-Cloud', 'fake_cc_system_prompt': False, 'thinking_type': 'adaptive', 'max_retries': 3, 'connect_timeout': 10, 'read_timeout': 180, }, 'key_hint': '在 https://open.bigmodel.cn/usercenter/apikeys 获取', 'model_choices': ['GLM-5.1-Cloud', 'GLM-5.1-Edge'], }, { 'id': 'minimax', 'name': 'MiniMax (推荐 Anthropic 路径)', 'desc': 'MiniMax M2.7,Anthropic 路径无 标签', 'type': 'native_claude', 'template': { 'name': 'minimax-anthropic', 'apikey': 'eyJh...', 'apibase': 'https://api.minimaxi.com/anthropic', 'model': 'MiniMax-M2.7', 'max_retries': 3, }, 'key_hint': '在 https://platform.minimaxi.com/user-center/basic-information 获取', 'model_choices': ['MiniMax-M2.7', 'MiniMax-M2.5'], }, { 'id': 'minimax_oai', 
'name': 'MiniMax (OpenAI 兼容路径)', 'desc': 'MiniMax M2.7,走 /v1/chat/completions', 'type': 'native_oai', 'template': { 'name': 'minimax-oai', 'apikey': 'eyJh...', 'apibase': 'https://api.minimaxi.com/v1', 'model': 'MiniMax-M2.7', 'context_win': 50000, }, 'key_hint': '在 https://platform.minimaxi.com/user-center/basic-information 获取', 'model_choices': ['MiniMax-M2.7', 'MiniMax-M2.5'], }, { 'id': 'kimi', 'name': 'Kimi for Coding (Anthropic 兼容)', 'desc': 'Kimi 官方 CC 兼容端点,kimi-for-coding 模型', 'type': 'native_claude', 'template': { 'name': 'kimi-coding', 'apikey': 'sk-kimi-', 'apibase': 'https://api.kimi.com/coding', 'model': 'kimi-for-coding', 'fake_cc_system_prompt': True, 'thinking_type': 'adaptive', }, 'key_hint': '在 https://kimi.com/code 获取 API Key', 'model_choices': ['kimi-for-coding', 'kimi-thinking-plus'], }, { 'id': 'moonshot_oai', 'name': 'Kimi / Moonshot (OAI 兼容)', 'desc': 'Moonshot OAI 端点,kimi-k2 系列,温度强制 1.0', 'type': 'native_oai', 'template': { 'name': 'kimi-k2', 'apikey': 'sk-', 'apibase': 'https://api.moonshot.cn/v1', 'model': 'kimi-k2-turbo-preview', }, 'key_hint': '在 https://platform.moonshot.cn/ 获取', 'model_choices': ['kimi-k2-turbo-preview', 'kimi-k2'], }, { 'id': 'openrouter', 'name': 'OpenRouter (多模型中继)', 'desc': '一个 Key 用所有模型,支持 Claude/GPT/Gemini 等', 'type': 'native_oai', 'template': { 'name': 'openrouter', 'apikey': 'sk-or-', 'apibase': 'https://openrouter.ai/api/v1', 'model': 'anthropic/claude-opus-4-7', 'max_retries': 3, 'connect_timeout': 10, 'read_timeout': 120, }, 'key_hint': '在 https://openrouter.ai/keys 获取', 'model_choices': ['anthropic/claude-opus-4-7', 'openai/gpt-5.4'], }, { 'id': 'crs', 'name': 'CRS 反代 Claude Max', 'desc': 'CRS 协议的反代 Claude,需要 fake_cc_system_prompt=True', 'type': 'native_claude', 'template': { 'name': 'crs-claude-max', 'apikey': 'cr_', 'apibase': 'https:///api', 'model': 'claude-opus-4-7[1m]', 'fake_cc_system_prompt': True, 'thinking_type': 'adaptive', 'max_tokens': 32768, 'max_retries': 3, 'read_timeout': 180, }, 
'key_hint': '从你的 CRS 服务商获取 key 和 host', 'model_choices': ['claude-opus-4-7[1m]', 'claude-sonnet-4-6'], 'extra_fields': [ {'key': 'apibase', 'label': 'API 地址 (apibase)', 'default': 'https://your-crs-host/api'}, ], }, { 'id': 'crs_gemini', 'name': 'CRS Gemini Ultra (Antigravity 通道)', 'desc': 'CRS 包装的 Google Antigravity,不支持 SSE 流式,必须 stream=False', 'type': 'native_claude', 'template': { 'name': 'crs-gemini-ultra', 'apikey': 'cr_', 'apibase': 'https:///antigravity/api', 'model': 'claude-opus-4-7-thinking', 'stream': False, 'max_tokens': 32768, 'max_retries': 3, 'read_timeout': 180, }, 'key_hint': '从你的 CRS 服务商获取 Gemini Ultra key 和 host', 'model_choices': ['claude-opus-4-7-thinking', 'claude-opus-4-7[1m]', 'claude-opus-4-7'], 'extra_fields': [ {'key': 'apibase', 'label': 'API 地址 (apibase)', 'default': 'https://your-crs-gemini-host/antigravity/api'}, ], }, ] # ── 消息平台定义 ──────────────────────────────────────────────────────────── PLATFORMS = [ { 'id': 'none', 'name': '不使用消息平台(纯终端 REPL)', 'desc': '直接用 python agentmain.py 在终端交互', 'deps': [], }, { 'id': 'telegram', 'name': 'Telegram 机器人', 'desc': '通过 Telegram Bot 与 Agent 对话', 'file': 'frontends/tgapp.py', 'deps': ['python-telegram-bot'], 'env_vars': [ {'key': 'tg_bot_token', 'label': 'Bot Token', 'hint': '从 @BotFather 获取'}, {'key': 'tg_allowed_users', 'label': '允许的用户 ID(逗号分隔, 留空=所有人)', 'default': '[]', 'is_list': True}, ], }, { 'id': 'qq', 'name': 'QQ 机器人', 'desc': '通过 QQ 官方机器人 API 接入', 'file': 'frontends/qqapp.py', 'deps': ['qq-botpy'], 'env_vars': [ {'key': 'qq_app_id', 'label': 'App ID', 'hint': 'QQ 开放平台获取'}, {'key': 'qq_app_secret', 'label': 'App Secret'}, {'key': 'qq_allowed_users', 'label': '允许的用户 OpenID(逗号分隔, 留空=所有人)', 'default': '[]', 'is_list': True}, ], }, { 'id': 'feishu', 'name': '飞书机器人', 'desc': '通过飞书应用与 Agent 对话', 'file': 'frontends/fsapp.py', 'deps': ['lark-oapi'], 'env_vars': [ {'key': 'fs_app_id', 'label': 'App ID', 'hint': '飞书开放平台获取'}, {'key': 'fs_app_secret', 'label': 'App Secret'}, {'key': 
'fs_allowed_users', 'label': '允许的用户(逗号分隔, 留空=所有人)', 'default': '[]', 'is_list': True}, ], }, { 'id': 'wecom', 'name': '企业微信机器人', 'desc': '通过企业微信 Bot 接入', 'file': 'frontends/wecomapp.py', 'deps': ['wecombot'], 'env_vars': [ {'key': 'wecom_bot_id', 'label': 'Bot ID'}, {'key': 'wecom_secret', 'label': 'Bot Secret'}, {'key': 'wecom_allowed_users', 'label': '允许的用户(逗号分隔, 留空=所有人)', 'default': '[]', 'is_list': True}, ], }, { 'id': 'dingtalk', 'name': '钉钉机器人', 'desc': '通过钉钉应用接入', 'file': 'frontends/dingtalkapp.py', 'deps': ['dingtalk-sdk'], 'env_vars': [ {'key': 'dingtalk_client_id', 'label': 'Client ID (App Key)'}, {'key': 'dingtalk_client_secret', 'label': 'Client Secret (App Secret)'}, {'key': 'dingtalk_allowed_users', 'label': '允许的用户 StaffID(逗号分隔, 留空=所有人)', 'default': '[]', 'is_list': True}, ], }, { 'id': 'discord', 'name': 'Discord 机器人', 'desc': '通过 Discord Bot 接入', 'file': 'frontends/dcapp.py', 'deps': ['discord.py'], 'env_vars': [ {'key': 'dc_bot_token', 'label': 'Bot Token', 'hint': 'Discord Developer Portal 获取'}, {'key': 'dc_allowed_users', 'label': '允许的用户 ID(逗号分隔, 留空=所有人)', 'default': '[]', 'is_list': True}, ], }, ] def _read_char(): """跨平台读取单个字符(Windows 用 getwch 避免 CRLF 拆字节问题)。""" if os.name == 'nt': import msvcrt return msvcrt.getwch() else: import tty import termios fd = sys.stdin.fileno() old = termios.tcgetattr(fd) try: tty.setraw(fd) return sys.stdin.read(1) finally: termios.tcsetattr(fd, termios.TCSADRAIN, old) def _masked(v, reveal, tail): """生成脱敏字符串:前 reveal 位明文 + * + 后 tail 位明文""" if len(v) > reveal + tail: return v[:reveal] + '*' * min(len(v) - reveal - tail, 8) + v[-tail:] elif len(v) > reveal: return v[:reveal] + '*' * (len(v) - reveal) return v def masked_input(prompt, reveal=6, tail=4): """密文输入:每输入一个字符实时显示脱敏版本,支持逐字输入和粘贴。 prompt 必须为单行(不含 \\n)。 """ sys.stdout.write(prompt) sys.stdout.flush() chars = [] def _repaint(): m = _masked(''.join(chars), reveal, tail) # \r → 行首;写 prompt+m;多余空格覆盖前次更长渲染的残留字符 sys.stdout.write(f'\r{prompt}{m} \r{prompt}{m}') 
sys.stdout.flush() while True: c = _read_char() if c in ('\r', '\n'): break if c in ('\x03', '\x04'): raise KeyboardInterrupt if c in ('\x08', '\x7f'): if chars: chars.pop() _repaint() elif c.isprintable() or c == ' ': chars.append(c) _repaint() value = ''.join(chars) _repaint() sys.stdout.write('\n') sys.stdout.flush() return value # ═══════════════════════════════════════════════════════════════════════════ # UI Helpers # ═══════════════════════════════════════════════════════════════════════════ def cprint(text, color=None, bold=False, end='\n'): parts = [] if color: parts.append(C.get(color, '')) if bold: parts.append(C['bold']) parts.append(text) parts.append(C['reset']) print(''.join(parts), end=end) def banner(): print('\033[2J\033[H', end='') # ANSI 清屏,跨平台 print(f"{C['cyan']}{C['bold']}") print(" ╔═══════════════════════════════════════════════════════════╗") print(" ║ GenericAgent — 交互式初始化向导 v1.1 ║") print(" ║ 一键配置 LLM 模型 + 消息平台,自动生成 mykey.py ║") print(" ╚═══════════════════════════════════════════════════════════╝") print(f"{C['reset']}") print(f"{C['dim']} 项目目录: {PROJECT_ROOT}{C['reset']}") print() def _check_python(): """检查 Python 版本,返回 (ok, msg)""" vi = sys.version_info if vi < (3, 10): return False, f"Python {vi.major}.{vi.minor} 不满足最低要求 (≥ 3.10)" if vi >= (3, 14): return True, f"⚠ Python {vi.major}.{vi.minor} 可能与 pywebview 等依赖不兼容,推荐 3.11/3.12" return True, f"✓ Python {vi.major}.{vi.minor}.{vi.micro}" def ask_choice(prompt, choices, allow_multi=False, default=None): """交互式选择,返回 selected_id 或 [selected_ids]""" print(f"\n{C['bold']}{prompt}{C['reset']}") if allow_multi: print(f"{C['dim']} (可多选,输入序号用逗号分隔,如: 1,3,5;输入 a 全选;回车跳过){C['reset']}") else: print(f"{C['dim']} (输入序号,如: 1){C['reset']}") for i, c in enumerate(choices, 1): desc = c.get('desc', '') print(f" {C['green']}{i}.{C['reset']} {C['bold']}{c['name']}{C['reset']} {C['dim']}{desc}{C['reset']}") while True: raw = input(f"\n {C['yellow']}►{C['reset']} ").strip() if not raw and default is not None: 
return default if allow_multi: if raw.lower() == 'a': return [c['id'] for c in choices] parts = [p.strip() for p in raw.split(',') if p.strip()] selected = [] for p in parts: try: idx = int(p) - 1 if 0 <= idx < len(choices): selected.append(choices[idx]['id']) except ValueError: pass if selected: return selected else: try: idx = int(raw) - 1 if 0 <= idx < len(choices): return choices[idx]['id'] except ValueError: pass print(f" {C['red']}✗ 请输入有效序号{C['reset']}") def ask_input(prompt, default=None, secret=False, hint=None): """交互式输入。secret=True 时使用脱敏输入。""" # 提示信息先打印(不放进 prompt,保证 prompt 单行) if hint: cprint(f" {hint}", 'dim') if default is not None: cprint(f" [默认: {default}]", 'dim') # 单行 prompt,\r 能正确回行首 prompt_line = f" {C['yellow']}►{C['reset']} {prompt}: " while True: if secret: val = masked_input(prompt_line).strip() else: val = input(prompt_line).strip() if not val and default is not None: return default if val: return val cprint("✗ 此项不能为空", 'red') def ask_yesno(prompt, default=True): hint = "Y/N" raw = input(f"\n {C['yellow']}►{C['reset']} {prompt} ({hint}): ").strip().lower() if not raw: return default return raw.startswith('y') # ═══════════════════════════════════════════════════════════════════════════ # LLM 配置逻辑 # ═══════════════════════════════════════════════════════════════════════════ def _get_proxy_handler(): """从环境变量读取代理配置,返回 ProxyHandler 或 None""" for var in ('HTTPS_PROXY', 'https_proxy', 'HTTP_PROXY', 'http_proxy'): url = os.environ.get(var) if url: return urllib.request.ProxyHandler({'https': url, 'http': url}) return None def probe_models(provider, apikey, apibase=None): """调用 API 探测可用模型列表,返回模型 ID 列表或 None""" ptype = provider.get('type', 'native_oai') base = (apibase or provider['template'].get('apibase', '')).rstrip('/') if ptype == 'native_claude': # Anthropic 协议: 尝试 /v1/models (多数中继兼容此路径) url = f"{base}/v1/models" headers = {'x-api-key': apikey, 'anthropic-version': '2023-06-01'} else: url = f"{base}/models" headers = {'Authorization': f'Bearer 
{apikey}'} print(f"\n {C['dim']}🔍 正在探测可用模型 ({url})...{C['reset']}", end='', flush=True) time.sleep(0.3) opener = urllib.request.build_opener() ph = _get_proxy_handler() if ph: opener = urllib.request.build_opener(ph) print(f" {C['dim']}(via proxy){C['reset']}", end='', flush=True) try: req = urllib.request.Request(url, headers=headers, method='GET') with opener.open(req, timeout=8) as resp: data = json.loads(resp.read().decode()) # 兼容两种响应格式: {data: [{id: ...}]} 与 {object: 'list', data: [...]} models = data.get('data', []) ids = sorted(set(m['id'] for m in models if isinstance(m, dict) and m.get('id'))) if ids: print(f" {C['green']}✓ 发现 {len(ids)} 个模型{C['reset']}") return ids print(f" {C['yellow']}⚠ 返回为空{C['reset']}") return None except Exception as e: print(f" {C['yellow']}⚠ 探测失败: {type(e).__name__}(将使用预设列表){C['reset']}") return None def _normalize_model_choices(choices): """统一 model_choices 格式为 [{'id': str, 'name': str}]""" if not choices: return [] result = [] for item in choices: if isinstance(item, str): result.append({'id': item, 'name': item}) elif isinstance(item, dict): result.append(item) elif isinstance(item, (tuple, list)) and len(item) >= 1: result.append({'id': item[0], 'name': item[1] if len(item) > 1 else item[0]}) return result def _configure_advanced(provider, cfg): """配置高级可选字段: proxy, context_win, stream, user_agent, thinking_budget_tokens""" print(f"\n {C['dim']}── 高级选项(回车跳过,使用默认值){C['reset']}") proxy = ask_input("HTTP 代理地址 (proxy)", default='', hint='如 http://127.0.0.1:2082,留空跳过') if proxy: cfg['proxy'] = proxy cw = ask_input("上下文窗口阈值 (context_win)", default='', hint='NativeClaude 默认 28000,其他默认 24000') if cw: cfg['context_win'] = int(cw) if cfg.get('thinking_type') == 'enabled': tbt = ask_input("thinking_budget_tokens", default='', hint='low≈4096, medium≈10240, high≈32768') if tbt: cfg['thinking_budget_tokens'] = int(tbt) if provider['type'] == 'native_claude': ua = ask_input("User-Agent 版本号", default='', hint='某些中转按 UA 白名单校验,pin 老版本用') if ua: 
            cfg['user_agent'] = ua
    stream_default = cfg.get('stream', True)
    cfg['stream'] = ask_yesno("启用 SSE 流式 (stream)", default=stream_default)


def configure_llm(provider):
    """Walk the user through configuring a single model."""
    print(f"\n{C['cyan']}{'─'*60}{C['reset']}")
    print(f"{C['bold']} 配置: {provider['name']}{C['reset']}")
    print(f" {C['dim']}{provider['desc']}{C['reset']}")
    print(f"{C['cyan']}{'─'*60}{C['reset']}")

    cfg = dict(provider['template'])
    # The template itself has no 'type' key; copy it from the provider so that generate_mykey()
    # can choose the right variable prefix. _write_config_fields() never writes this key out.
    cfg['type'] = provider['type']

    # API key (masked input)
    cfg['apikey'] = ask_input(
        "API Key",
        hint=provider.get('key_hint', ''),
        secret=True,
    )

    # Extra fields
    for field in provider.get('extra_fields', []):
        if field['key'] == 'apibase':
            cfg['apibase'] = ask_input(
                field['label'],
                default=field.get('default', cfg.get('apibase', '')),
            )
        elif field.get('type') == 'bool':
            cfg[field['key']] = ask_yesno(
                field['label'],
                default=field.get('default', True)
            )

    # Model selection
    model_list = probe_models(provider, cfg['apikey'], cfg.get('apibase'))
    if model_list:
        refresh_choice = {'id': '__refresh__', 'name': '🔄 重新探测模型列表'}
        choices = [refresh_choice] + [{'id': m, 'name': m} for m in model_list]
        while True:
            picked = ask_choice("API 探测到以下可用模型,请选择:", choices)
            if picked == '__refresh__':
                print(f" {C['dim']}再次探测...{C['reset']}")
                model_list = probe_models(provider, cfg['apikey'], cfg.get('apibase'))
                if not model_list:
                    print(f" {C['yellow']}⚠ 再次探测失败,回退到预设列表{C['reset']}")
                    picked = _fallback_model(provider)
                    break
                choices = [refresh_choice] + [{'id': m, 'name': m} for m in model_list]
            else:
                break
        cfg['model'] = picked
    else:
        cfg['model'] = _fallback_model(provider)

    # Alias
    default_name = cfg.get('name', provider['id'])
    name = ask_input("此配置的别名 (name,Mixin 引用用)", default=default_name)
    if name:
        cfg['name'] = name

    # Advanced options
    if ask_yesno("配置高级选项(proxy / context_win / stream 等)?", default=False):
        _configure_advanced(provider, cfg)

    return cfg


def _fallback_model(provider):
    """Let the user pick from the preset model list."""
    normalized = _normalize_model_choices(provider.get('model_choices', []))
    if normalized:
        return ask_choice("选择模型:", normalized)
return ask_input("请输入模型名称", default=provider['template'].get('model', '')) def configure_llms(): """配置 LLM 模型""" print(f"\n{C['bold']}{C['magenta']}╔══════════════════════════════════════╗") print(f"║ 第一步: 配置 LLM 模型 ║") print(f"╚══════════════════════════════════════╝{C['reset']}") print(f"\n{C['dim']} 你可以配置最多 2 个模型组成故障转移 (Mixin) 列表。{C['reset']}") all_cfgs = [] provider_id = ask_choice("选择模型厂商 (配置第 1 个模型):", LLM_PROVIDERS) provider = next(p for p in LLM_PROVIDERS if p['id'] == provider_id) cfg = configure_llm(provider) all_cfgs.append(cfg) if ask_yesno("再添加一个模型做故障转移?", default=False): providers_ext = [{'id': '__stop__', 'name': '✓ 不需要备选了', 'desc': ''}] + LLM_PROVIDERS provider_id = ask_choice( "选择模型厂商 (配置第 2 个模型 — 或选「不需要备选了」跳过):", providers_ext ) if provider_id != '__stop__': provider = next(p for p in LLM_PROVIDERS if p['id'] == provider_id) cfg = configure_llm(provider) all_cfgs.append(cfg) return all_cfgs # ═══════════════════════════════════════════════════════════════════════════ # 消息平台配置逻辑 # ═══════════════════════════════════════════════════════════════════════════ def configure_platforms(): """配置消息平台,返回 (platform_configs, pip_hints)""" print(f"\n{C['bold']}{C['magenta']}╔══════════════════════════════════════╗") print(f"║ 第二步: 配置消息平台 ║") print(f"╚══════════════════════════════════════╝{C['reset']}") print(f"\n{C['dim']} 消息平台用于从聊天软件与 Agent 交互。{C['reset']}") print(f"{C['dim']} 你也可以跳过此步,直接用终端 REPL。{C['reset']}") platform_ids = ask_choice( "选择消息平台 (可多选,选 '不使用' 则跳过):", PLATFORMS, allow_multi=True, default=['none'] ) if 'none' in platform_ids: return [], set() selected_platforms = [] pip_hints = set() for pid in platform_ids: platform = next(p for p in PLATFORMS if p['id'] == pid) pip_hints.update(platform.get('deps', [])) print(f"\n{C['cyan']}{'─'*60}{C['reset']}") print(f"{C['bold']} 配置: {platform['name']}{C['reset']}") print(f"{C['cyan']}{'─'*60}{C['reset']}") env_vals = {} # 飞书扫码创建 if pid == 'feishu' and ask_yesno("使用一键扫码创建应用?(推荐)", default=True): env_vals = 
_feishu_scan(platform) # 补充扫码未获取的字段(或扫码失败时全手动填写) for var in platform['env_vars']: if var['key'] not in env_vals: env_vals.update(_manual_platform_var(var)) # 企业微信专属:欢迎消息 if pid == 'wecom' and ask_yesno("设置欢迎消息?", default=False): env_vals['wecom_welcome_message'] = ask_input("欢迎消息内容", default='你好,我在线上。') selected_platforms.append({'platform': platform, 'config': env_vals}) return selected_platforms, pip_hints def _manual_platform_var(var): """手动填写单个平台变量""" val = ask_input(var['label'], hint=var.get('hint', ''), default=var.get('default')) if var.get('is_list'): if val == '[]' or not val: return {var['key']: []} return {var['key']: [x.strip() for x in val.split(',') if x.strip()]} return {var['key']: val} def _feishu_scan(platform): """飞书一键扫码创建应用,返回 env_vals 或空 dict""" try: import lark_oapi as lark import qrcode, threading from io import StringIO except ImportError: print(f"\n {C['yellow']}⚠ lark-oapi 未安装,降级为手动配置{C['reset']}") return {} print(f"\n {C['cyan']}📱 正在启动一键创建...{C['reset']}") print(f" {C['dim']} 请用飞书 App 扫描终端二维码,完成授权后自动获取凭据。{C['reset']}\n") qr_printed = threading.Event() result_holder = {'data': None} def handle_qr(info): url = info['url'] expire = info['expire_in'] qr = qrcode.QRCode(border=1, box_size=1) qr.add_data(url) buf = StringIO() qr.print_ascii(out=buf) qr_art = buf.getvalue() print(f"\n {C['bold']}请用飞书扫描下方二维码,或复制链接在浏览器打开:{C['reset']}") print(f" {C['green']}{qr_art.replace(chr(27), '')}{C['reset']}") print(f" {C['dim']} 链接: {url}{C['reset']}") print(f" {C['dim']} 有效期 {expire} 秒{C['reset']}") qr_printed.set() def handle_status(info): status = info['status'] if status == 'polling': print(f" {C['yellow']}⏳ 等待扫码...{C['reset']}") elif status == 'slow_down': print(f" {C['yellow']}⏳ 等待中... 
(间隔 {info.get('interval', '?')}s){C['reset']}") elif status == 'domain_switched': print(f" {C['cyan']}🌐 已切换认证域名{C['reset']}") def run_register(): try: result = lark.register_app( on_qr_code=handle_qr, on_status_change=handle_status, ) result_holder['data'] = result except Exception as e: print(f"\n {C['red']}✗ 创建失败: {e}{C['reset']}") thread = threading.Thread(target=run_register, daemon=True) thread.start() qr_printed.wait(timeout=15) thread.join(timeout=300) if result_holder['data']: result = result_holder['data'] print(f"\n {C['green']}✅ 应用创建成功!{C['reset']}") print(f" App ID: {C['bold']}{result['client_id']}{C['reset']}") print(f" App Secret: {C['bold']}{result['client_secret']}{C['reset']}") return { 'fs_app_id': result['client_id'], 'fs_app_secret': result['client_secret'], } else: print(f"\n {C['yellow']}⚠ 扫码创建未完成,降级为手动填写...{C['reset']}") return {} # ═══════════════════════════════════════════════════════════════════════════ # 生成 mykey.py # ═══════════════════════════════════════════════════════════════════════════ def _var_type_info(cfg): """根据配置类型返回 (var_prefix, session_type)""" cfg_type = cfg.get('type', 'native_oai') if cfg_type == 'native_claude': return 'native_claude_config', 'NativeClaudeSession' elif cfg_type == 'claude': return 'claude_config', 'ClaudeSession' elif cfg_type == 'oai': return 'oai_config', 'LLMSession' else: return 'native_oai_config', 'NativeOAISession' def generate_mykey(llm_cfgs, platform_configs): """生成 mykey.py 内容""" lines = [] lines.append("# ══════════════════════════════════════════════════════════════════════════════") lines.append(f"# GenericAgent — mykey.py (由 configure.py 自动生成 @ {datetime.now().strftime('%Y-%m-%d %H:%M')})") lines.append("# ══════════════════════════════════════════════════════════════════════════════") lines.append("") lines.append("# ── 停止符 ──────────────────────────────────────────────────────────────────") lines.append("_SETUP_DONE = 'configure.py' # 删除此行可重新触发配置向导") lines.append("") # Mixin 配置 names = 
[c['name'] for c in llm_cfgs] lines.append("# ── Mixin 故障转移 ──────────────────────────────────────────────────────────") lines.append("mixin_config = {") lines.append(f" 'llm_nos': {names},") lines.append(" 'max_retries': 10,") lines.append(" 'base_delay': 0.5,") lines.append("}") lines.append("") # 各模型配置 # 同类型多实例时加上数字后缀 type_counts = {} for cfg in llm_cfgs: cfg_type = cfg.get('type', 'native_oai') type_counts[cfg_type] = type_counts.get(cfg_type, 0) + 1 type_indices = {} for i, cfg in enumerate(llm_cfgs): cfg_type = cfg.get('type', 'native_oai') var_prefix, session_type = _var_type_info(cfg) idx = type_indices.get(cfg_type, 0) type_indices[cfg_type] = idx + 1 # 同类型只有一个时不加后缀;多个时加数字后缀 if type_counts[cfg_type] > 1: var_name = f"{var_prefix}_{idx}" else: var_name = var_prefix lines.append(f"# ── {cfg['name']} ({session_type}) ─────────────────────────────────────────────") lines.append(f"{var_name} = {{") _write_config_fields(lines, cfg) lines.append("}") lines.append("") # 平台配置 if platform_configs: lines.append("# ══════════════════════════════════════════════════════════════════════════════") lines.append("# 聊天平台集成") lines.append("# ══════════════════════════════════════════════════════════════════════════════") lines.append("") for pc in platform_configs: for key, val in pc['config'].items(): _write_platform_value(lines, key, val) lines.append("") # 尾部 lines.append("# ══════════════════════════════════════════════════════════════════════════════") lines.append("# 配置完毕!运行: python agentmain.py (终端 REPL)") if platform_configs: for pc in platform_configs: p = pc['platform'] lines.append(f"# 或: python {p['file']} ({p['name']})") lines.append("# ══════════════════════════════════════════════════════════════════════════════") return '\n'.join(lines) def _write_config_fields(lines, cfg): """写入配置字典的键值对(缩进的 'key': value, 格式)""" for key in ['name', 'apikey', 'apibase', 'model', 'api_mode', 'fake_cc_system_prompt', 'thinking_type', 'thinking_budget_tokens', 'reasoning_effort', 
                'max_tokens', 'max_retries', 'connect_timeout', 'read_timeout', 'temperature', 'context_win', 'proxy', 'user_agent', 'stream']:
        if key not in cfg:
            continue
        val = cfg[key]
        if isinstance(val, bool):
            lines.append(f" '{key}': {str(val)},")
        elif isinstance(val, (int, float)):
            lines.append(f" '{key}': {val},")
        elif isinstance(val, str):
            # repr() escapes quotes and backslashes, so the generated file stays valid Python
            lines.append(f" '{key}': {val!r},")
        else:
            lines.append(f" '{key}': {repr(val)},")


def _write_platform_value(lines, key, val):
    """Write a top-level variable (platform settings, etc.)."""
    if isinstance(val, list):
        if val:
            lines.append(f"{key} = {repr(val)}")
        else:
            lines.append(f"{key} = [] # 允许所有用户")
    elif isinstance(val, str):
        # repr() rather than hand-written quotes: user-entered text may contain ' or \
        lines.append(f"{key} = {val!r}")
    else:
        lines.append(f"{key} = {repr(val)}")


# ═══════════════════════════════════════════════════════════════════════════
# Main
# ═══════════════════════════════════════════════════════════════════════════

def main():
    banner()

    # Python version check
    ok, msg = _check_python()
    if not ok:
        print(f" {C['red']}✗ {msg}{C['reset']}")
        sys.exit(1)
    color = 'yellow' if '⚠' in msg else 'green'
    print(f" {C[color]}{msg}{C['reset']}\n")

    # Detect an existing config
    if os.path.exists(MYKPY_PATH):
        print(f" {C['yellow']}⚠ 检测到已有 mykey.py{C['reset']}")
        if not ask_yesno("是否重新配置?", default=False):
            print(f"\n {C['dim']} 退出。如需重新配置请删除 mykey.py 后重试。{C['reset']}\n")
            sys.exit(0)

    # ── Top-level menu ──
    scope = ask_choice(
        "你想配置什么?",
        [
            {'id': 'llm', 'name': 'LLM 模型', 'desc': '选择厂商、填写 API Key、探测模型列表'},
            {'id': 'platform', 'name': '消息平台 (Telegram/QQ/飞书等)', 'desc': '配置聊天机器人接入'},
            {'id': 'both', 'name': '两项都配置 (推荐)', 'desc': 'LLM + 平台,完整初始化'},
        ],
        default='both',
    )

    llm_cfgs = []
    platform_configs = []
    platform_deps = set()

    # ── Run the selected steps ──
    if scope in ('llm', 'both'):
        llm_cfgs = _do_llm()
    if scope == 'llm':
        if ask_yesno("是否继续配置消息平台?", default=True):
            platform_configs, platform_deps = configure_platforms()
    if scope == 'both':
        platform_configs, platform_deps = configure_platforms()
    if scope == 'platform':
        platform_configs, platform_deps = configure_platforms()
        if ask_yesno("是否继续配置 LLM 模型?", default=True):
llm_cfgs = _do_llm() # ── 生成 mykey.py ── if not llm_cfgs and not platform_configs: print(f"\n {C['yellow']}⚠ 没有配置任何内容,退出。{C['reset']}") sys.exit(0) content = generate_mykey(llm_cfgs, platform_configs) # 备份旧文件 if os.path.exists(MYKPY_PATH): backup = os.path.join(PROJECT_ROOT, f'mykey.py.bak.{datetime.now().strftime("%Y%m%d_%H%M%S")}') shutil.copy2(MYKPY_PATH, backup) print(f"\n {C['green']}✓ 旧配置已备份至:{C['reset']} {C['dim']}{backup}{C['reset']}") # 写入 with open(MYKPY_PATH, 'w', encoding='utf-8') as f: f.write(content) print(f"\n {C['green']}✓ mykey.py 已生成!{C['reset']}") # ── 完成提示 ── print(f"\n{C['bold']}{C['green']}╔══════════════════════════════════════╗") print(f"║ 配置完成! ║") print(f"╚══════════════════════════════════════╝{C['reset']}") print() if llm_cfgs: print(f" {C['cyan']} 终端 REPL:{C['reset']} python agentmain.py") if platform_configs: for i, pc in enumerate(platform_configs, 1): p = pc['platform'] print(f" {C['cyan']} 平台 {i} ({p['name']}):{C['reset']} python {p['file']}") print() # pip 依赖提示 all_deps = sorted(platform_deps) if all_deps: print(f" {C['yellow']}💡 提示:你需要安装以下依赖以使消息平台正常工作:{C['reset']}") print(f" {C['cyan']}pip install {' '.join(all_deps)}{C['reset']}") print() # ── 入门示例 ── print(f" {C['bold']}试试这些命令:{C['reset']}") examples = [ "帮我在桌面创建一个 hello.txt,内容是 Hello World", "请查看你的代码,安装所有用得上的 python 依赖", "执行 web setup sop,解锁 web 工具", "打开淘宝,搜索 iPhone 16,按价格排序", "用rapidocr配置你的ocr能力并存入记忆", "git 更新你的代码,然后看看 commit 有什么新功能", "把这个记到你的记忆里", ] for ex in examples: print(f" {C['dim']}{ex}{C['reset']}") print() print(f" {C['green']}{C['bold']}合抱之木,生于毫末{C['reset']}\n") def _do_llm(): """配置 LLM 模型,失败则 exit。""" cfgs = configure_llms() if not cfgs: print(f"\n {C['red']}✗ 至少需要配置一个模型才能使用。退出。{C['reset']}") sys.exit(1) return cfgs if __name__ == '__main__': try: main() except KeyboardInterrupt: print(f"\n\n {C['yellow']}⚠ 用户中断{C['reset']}") sys.exit(0) ================================================ FILE: assets/global_mem_insight_template.txt 
================================================ # [Global Memory Insight] 需要时read L2 或 ls ../memory/ 查L3 L0(META-SOP): memory_management_sop L2: 现空 L3: memory_cleanup_sop(记忆整理) | skill_search | ui_detect.py | ocr_utils.py | subagent | web_setup_sop | plan_sop | procmem_scanner | keychain | ljqCtrl_sop+.py | tmwebdriver_sop | autonomous_operation_sop | scheduled_task_sop | vision_sop | adb_ui.py L4: L4_raw_sessions/ 历史会话 浏览器特殊操作: tmwebdriver_sop(文件上传/图搜/PDF blob/物理坐标/HttpOnly Cookie/autofill突破/跨域iframe/CDP/跨tab) 键鼠: ljqCtrl_sop(禁pyautogui/先activate) 截图/视觉: ocr/vision_sop | 禁全屏截图,优先窗口 定时:scheduled_task_sop | 自主:autonomous_operation_sop | watchdog/反射:agentmain --reflect 手机:adb_ui.py [RULES] 1. 搜索先行: 搜文件名严禁不用es(禁PS递归/禁dir遍历), 搜索一定优先使用web工具的google(严禁duckduckgo等), 优先看cwd,禁猜路径 2. 交叉验证: 禁信摘要, 数值进详情页核实 3. 编码安全: 禁PS cat/type用file_read; 改前必读; memory模块直接import(已在PATH,禁加虚假前缀) 4. 闭环: 物理模拟后确认; 3次失败请求干预; Git完整闭环 5. 进程: 禁无条件杀python(杀自己), 精确PID, 禁os.kill判活 6. 窗口: GUI状态优先win32gui枚举标题 7. web JS: 输入用原生setter+事件链, 点击前检disabled, 注意引号转义; scan空/不全先稍等再scan, 禁首扫定论 8. 
SOP: 读SOP禁凭印象,有utils必用 | 复杂超长程任务/用户明确提及规划模式→读plan_sop ================================================ FILE: assets/global_mem_insight_template_en.txt ================================================ # [Global Memory Insight] Read L2 or ls ../memory/ for L3 when needed L0(META-SOP): memory_management_sop L2: currently empty L3: memory_cleanup_sop(memory cleanup) | skill_search | ui_detect.py | ocr_utils.py | subagent | web_setup_sop | plan_sop | procmem_scanner | keychain | ljqCtrl_sop+.py | tmwebdriver_sop | autonomous_operation_sop | scheduled_task_sop | vision_sop | adb_ui.py L4: L4_raw_sessions/ historical sessions Browser special ops: tmwebdriver_sop(file upload/image search/PDF blob/physical coords/HttpOnly Cookie/autofill bypass/cross-origin iframe/CDP/cross-tab) Keyboard & Mouse: ljqCtrl_sop(no pyautogui/activate first) Screenshot/Vision: ocr/vision_sop | No fullscreen capture, prefer window Scheduling: scheduled_task_sop | Autonomous: autonomous_operation_sop | watchdog/reflect: agentmain --reflect Mobile: adb_ui.py [RULES] 1. Search first: must use es for filename search (no PS recursion/no dir traversal), always prefer Google for web search (no duckduckgo etc), check cwd first, no guessing paths 2. Cross-verify: never trust summaries, verify numbers on detail pages 3. Encoding safety: use file_read not PS cat/type; read before modify; import memory modules directly (already in PATH, no fake prefixes) 4. Close the loop: confirm after physical simulation; request intervention after 3 failures; complete Git workflow 5. Processes: never kill python unconditionally (kills self), use exact PID, no os.kill for liveness check 6. Windows: prefer win32gui title enumeration for GUI state 7. Web JS: use native setter + event chain for input, check disabled before click, mind quote escaping; if scan empty/incomplete wait then rescan, no conclusions from first scan 8. 
SOP: read SOPs not from memory, must use utils if available | complex long-running/user mentions planning -> read plan_sop ================================================ FILE: assets/insight_fixed_structure.txt ================================================ Facts(L2): ../memory/global_mem.txt | GA CodeRoot: ../ | SOPs(L3): ../memory/*.md or *.py | META-SOP(L0): ../memory/memory_management_sop.md L1 Insight是极简索引,L2/L3变更时同步L1,索引必须极简。写记忆前先读META-SOP(L0)。 [CONSTITUTION] 1. 改自身源码先请示;./内可自主实验,允许装包和portable工具 2. 决策前查记忆,有SOP/utils必用;多次失败回看SOP;未查证不断言 3. 分步执行,控制粒度,限制失败半径;3次失败请求干预 4. 密钥文件仅引用,不读取/移动 5. 写任何记忆前读META-SOP核验,memory下文件只能patch修改(除非新建) ================================================ FILE: assets/insight_fixed_structure_en.txt ================================================ Facts(L2): ../memory/global_mem.txt | CodeRoot: ../ | SOPs(L3): ../memory/*.md or *.py | META-SOP(L0): ../memory/memory_management_sop.md L1 Insight is a minimal index; sync L1 when L2/L3 changes; keep index minimal. Read META-SOP(L0) before writing any memory. [CONSTITUTION] 1. Ask before modifying own source code; free to experiment within ./; installing packages and portable tools allowed 2. Check memory before decisions; always use existing SOPs/utils; revisit SOPs on repeated failures; never assert without evidence 3. Execute step by step, control granularity, limit blast radius; request intervention after 3 failures 4. Key/secret files: reference only, never read or move 5. Read META-SOP to verify before writing any memory; files under memory/ must be patched only (unless creating new) ================================================ FILE: assets/install-macos-app.sh ================================================ #!/bin/bash # GenericAgent macOS Desktop App Installation Script # # Usage: # bash assets/install-macos-app.sh [--auto] # # This installer creates a small .app bundle that opens Terminal and runs # `python3 launch.pyw` from the current GenericAgent checkout. 
if [ -z "${BASH_VERSION}" ]; then if command -v bash >/dev/null 2>&1; then exec bash -- "${0}" "$@" else echo "Error: This script requires bash." exit 1 fi fi set -euo pipefail RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; BLUE='\033[0;34m'; CYAN='\033[0;36m'; NC='\033[0m' log_info() { echo -e "${BLUE}ℹ️ $1${NC}"; } log_success() { echo -e "${GREEN}✅ $1${NC}"; } log_warning() { echo -e "${YELLOW}⚠️ $1${NC}"; } log_error() { echo -e "${RED}❌ $1${NC}"; } AUTO_MODE=false for arg in "$@"; do case "$arg" in --auto) AUTO_MODE=true ;; esac done APP_NAME="GenericAgent" PRIMARY_INSTALL_DIR="/Applications" FALLBACK_INSTALL_DIR="${HOME}/Applications" SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" PROJECT_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)" ICON_PATH="${PROJECT_ROOT}/assets/images/logo.jpg" LAUNCH_SCRIPT="${PROJECT_ROOT}/launch.pyw" echo -e "${CYAN}" echo "╔═══════════════════════════════════════════════════════════╗" echo "║ GenericAgent — macOS Desktop App Installer ║" echo "╚═══════════════════════════════════════════════════════════╝" echo -e "${NC}" if [[ "$(uname)" != "Darwin" ]]; then log_error "This script only supports macOS." exit 1 fi if ! command -v python3 >/dev/null 2>&1; then log_error "python3 is not installed." exit 1 fi if [ ! 
-f "${LAUNCH_SCRIPT}" ]; then log_error "launch.pyw not found at ${LAUNCH_SCRIPT}" exit 1 fi project_path_for_applescript="${PROJECT_ROOT}/" project_path_for_applescript="${project_path_for_applescript//\\/\\\\}" project_path_for_applescript="${project_path_for_applescript//\"/\\\"}" detect_existing_app() { if [ -d "${PRIMARY_INSTALL_DIR}/${APP_NAME}.app" ]; then echo "${PRIMARY_INSTALL_DIR}/${APP_NAME}.app" return fi if [ -d "${FALLBACK_INSTALL_DIR}/${APP_NAME}.app" ]; then echo "${FALLBACK_INSTALL_DIR}/${APP_NAME}.app" return fi } existing_app_path="$(detect_existing_app || true)" if [ -n "${existing_app_path}" ]; then log_warning "${APP_NAME}.app already exists at ${existing_app_path}" fi if [ "${AUTO_MODE}" = false ]; then echo "" echo "This will install a desktop app that launches GenericAgent" echo "from Spotlight, Launchpad, or the Applications folder." echo "" if [ -n "${existing_app_path}" ]; then read -p "Reinstall ${APP_NAME}.app? (y/N) " -n 1 -r else read -p "Continue? (Y/n) " -n 1 -r fi echo if [ -n "${existing_app_path}" ]; then [[ ! ${REPLY:-} =~ ^[Yy]$ ]] && { echo "Aborted."; exit 0; } else [[ ${REPLY:-} =~ ^[Nn]$ ]] && { echo "Aborted."; exit 0; } fi fi TMP_DIR="$(mktemp -d)" trap 'rm -rf "${TMP_DIR}"' EXIT log_info "Building ${APP_NAME}.app..." 
cat > "${TMP_DIR}/${APP_NAME}.applescript" </dev/null 2>&1 sips -z 32 32 "${ICON_PATH}" --out "${ICONSET_DIR}/icon_16x16@2x.png" >/dev/null 2>&1 sips -z 32 32 "${ICON_PATH}" --out "${ICONSET_DIR}/icon_32x32.png" >/dev/null 2>&1 sips -z 64 64 "${ICON_PATH}" --out "${ICONSET_DIR}/icon_32x32@2x.png" >/dev/null 2>&1 sips -z 128 128 "${ICON_PATH}" --out "${ICONSET_DIR}/icon_128x128.png" >/dev/null 2>&1 sips -z 256 256 "${ICON_PATH}" --out "${ICONSET_DIR}/icon_128x128@2x.png" >/dev/null 2>&1 sips -z 256 256 "${ICON_PATH}" --out "${ICONSET_DIR}/icon_256x256.png" >/dev/null 2>&1 sips -z 512 512 "${ICON_PATH}" --out "${ICONSET_DIR}/icon_256x256@2x.png" >/dev/null 2>&1 sips -z 512 512 "${ICON_PATH}" --out "${ICONSET_DIR}/icon_512x512.png" >/dev/null 2>&1 cp "${ICON_PATH}" "${ICONSET_DIR}/icon_512x512@2x.png" iconutil -c icns "${ICONSET_DIR}" -o "${TMP_DIR}/ga-icon.icns" cp "${TMP_DIR}/ga-icon.icns" "${TMP_DIR}/${APP_NAME}.app/Contents/Resources/applet.icns" log_success "Icon applied from assets/images/logo.jpg" else log_warning "Logo not found at ${ICON_PATH}, using default icon." fi install_bundle() { local install_dir="$1" local destination="${install_dir}/${APP_NAME}.app" mkdir -p "${install_dir}" rm -rf "${destination}" cp -R "${TMP_DIR}/${APP_NAME}.app" "${destination}" } install_path="" if install_bundle "${PRIMARY_INSTALL_DIR}" 2>/dev/null; then install_path="${PRIMARY_INSTALL_DIR}/${APP_NAME}.app" else log_warning "Could not write to ${PRIMARY_INSTALL_DIR}; falling back to ${FALLBACK_INSTALL_DIR}" install_bundle "${FALLBACK_INSTALL_DIR}" install_path="${FALLBACK_INSTALL_DIR}/${APP_NAME}.app" fi log_success "Installed to: ${install_path}" echo "" echo -e "${CYAN}╔═══════════════════════════════════════════════════════════╗${NC}" echo -e "${CYAN}║${NC} ✨ ${APP_NAME} Desktop App installed successfully! 
${CYAN}║${NC}" echo -e "${CYAN}╚═══════════════════════════════════════════════════════════╝${NC}" echo "" echo -e "${BLUE}Launch methods:${NC}" echo " • Spotlight: Cmd + Space → type '${APP_NAME}' → Enter" echo " • Launchpad: Find the '${APP_NAME}' icon" echo " • Finder: Open ${install_path}" echo "" echo -e "${BLUE}Runtime behavior:${NC}" echo " The app uses the current checkout path embedded at install time:" echo " ${PROJECT_ROOT}" echo " If you move the repo later, re-run this installer." echo "" echo -e "${BLUE}Uninstall:${NC}" echo " rm -rf '${install_path}'" echo "" ================================================ FILE: assets/install_python_windows.bat ================================================ @echo off setlocal enabledelayedexpansion title Python One-Click Installer color 0A echo. echo ======================================== echo Python One-Click Installer (Windows) echo ======================================== echo. net session >nul 2>&1 if %errorlevel% neq 0 ( echo [!] Administrator privileges required. Restarting with elevation... powershell -Command "Start-Process '%~f0' -Verb RunAs" exit /b ) echo [OK] Administrator privileges confirmed echo. python --version >nul 2>&1 if %errorlevel% equ 0 ( echo [OK] Python already installed: python --version echo. choice /C YN /M "Install latest version anyway? (Y=Yes / N=Exit)" if errorlevel 2 goto :end ) set PYTHON_VERSION=3.12.9 set MIRROR_URL=https://npmmirror.com/mirrors/python/3.12.9/python-3.12.9-amd64.exe set OFFICIAL_URL=https://www.python.org/ftp/python/3.12.9/python-3.12.9-amd64.exe set INSTALLER=%TEMP%\python_installer.exe echo [*] Preparing to download Python %PYTHON_VERSION% echo [*] Trying mirror source first... echo. 
powershell -NoProfile -Command "[Net.ServicePointManager]::SecurityProtocol=[Net.SecurityProtocolType]::Tls12; $ProgressPreference='SilentlyContinue'; Invoke-WebRequest -Uri '%MIRROR_URL%' -OutFile '%INSTALLER%' -UseBasicParsing" if not exist "%INSTALLER%" goto :official for %%A in ("%INSTALLER%") do if %%~zA lss 1000000 goto :official echo [OK] Mirror download complete goto :install :official echo [!] Mirror failed, switching to official source... powershell -NoProfile -Command "[Net.ServicePointManager]::SecurityProtocol=[Net.SecurityProtocolType]::Tls12; $ProgressPreference='SilentlyContinue'; Invoke-WebRequest -Uri '%OFFICIAL_URL%' -OutFile '%INSTALLER%' -UseBasicParsing" if not exist "%INSTALLER%" ( echo [x] Download failed. Please check your network connection and retry. pause goto :end ) for %%A in ("%INSTALLER%") do if %%~zA lss 1000000 ( echo [x] Downloaded file is incomplete. Please check your network and retry. pause goto :end ) echo [OK] Official source download complete :install echo. echo [*] Installing Python %PYTHON_VERSION% (this may take 2-5 minutes^)... echo. start /wait "" "%INSTALLER%" /passive InstallAllUsers=1 PrependPath=1 Include_test=0 Include_pip=1 set INSTALL_CODE=%errorlevel% del /f /q "%INSTALLER%" >nul 2>&1 if %INSTALL_CODE% neq 0 ( echo [x] Installation failed with error code: %INSTALL_CODE% pause goto :end ) echo [+] Installation complete! echo. timeout /t 3 /nobreak >nul set "PATH=C:\Program Files\Python312;C:\Program Files\Python312\Scripts;%PATH%" python --version >nul 2>&1 if %errorlevel% equ 0 ( echo [OK] Python installed successfully: python --version echo. echo [OK] pip version: pip --version echo. echo [*] Configuring pip mirror (Tsinghua^)... pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple pip config set global.trusted-host pypi.tuna.tsinghua.edu.cn echo. echo [*] Installing requests... pip install requests echo. echo ======================================== echo All done! 
Open a new terminal to use echo python and pip commands. echo ======================================== ) else ( echo [!] PATH not yet refreshed. Please close this window and open a new terminal. ) :end echo. pause ================================================ FILE: assets/sys_prompt.txt ================================================ # Role: 物理级全能执行者 你拥有文件读写、脚本执行、用户浏览器JS注入、系统级干预的物理操作权限。禁止推诿"无法操作"——不空想,用工具探测。 ## 行动原则 调用工具前先推演:当前阶段、上步结果是否符合预期、下步策略,必须在回复文本中用输出极简总结。 - 探测优先:失败时先充分获取信息(日志/状态/上下文),关键信息存入工作记忆,再决定重试或换方案。不可逆操作先询问用户。 - 失败升级:1次→读错误理解原因,2次→探测环境状态,3次→深度分析后换方案或问用户。禁止无新信息的重复操作。 ================================================ FILE: assets/sys_prompt_en.txt ================================================ # Role: Physical-Level Omnipotent Executor You have full physical access: file I/O, script execution, browser JS injection, and system-level intervention. Never deflect with "can't do it"; don't speculate, use tools to probe. Summarize and reply in the user's language, or follow the user's prompt. ## Action Principles Before each tool call, reason about the current phase, whether the last result met expectations, and the next strategy, and give a minimal summary of this in the reply text of each turn. - Probe first: on failure, gather sufficient info (logs/status/context), store key findings in working memory, then decide to retry or pivot. Ask the user before irreversible operations. - Failure escalation: 1st fail → read error and understand cause; 2nd → probe environment state; 3rd → deep analysis then switch approach or ask user. Never repeat an action without new information. 
================================================ FILE: assets/tmwd_cdp_bridge/background.js ================================================ // background.js - Cookie + CDP Bridge chrome.runtime.onInstalled.addListener(() => { console.log('CDP Bridge installed'); // Strip CSP headers to allow eval/inline scripts chrome.declarativeNetRequest.updateDynamicRules({ removeRuleIds: [9999], addRules: [{ id: 9999, priority: 1, action: { type: 'modifyHeaders', responseHeaders: [ { header: 'content-security-policy', operation: 'remove' }, { header: 'content-security-policy-report-only', operation: 'remove' } ]}, condition: { urlFilter: '*', resourceTypes: ['main_frame', 'sub_frame'] } }] }); }); async function handleExtMessage(msg, sender) { if (msg.cmd === 'cookies') return await handleCookies(msg, sender); if (msg.cmd === 'cdp') return await handleCDP(msg, sender); if (msg.cmd === 'batch') return await handleBatch(msg, sender); if (msg.cmd === 'tabs') { try { if (msg.method === 'switch') { const tab = await chrome.tabs.update(msg.tabId, { active: true }); await chrome.windows.update(tab.windowId, { focused: true }); return { ok: true }; } else { const tabs = (await chrome.tabs.query({})).filter(t => isScriptable(t.url)); const data = tabs.map(t => ({ id: t.id, url: t.url, title: t.title, active: t.active, windowId: t.windowId })); return { ok: true, data }; } } catch (e) { return { ok: false, error: e.message }; } } if (msg.cmd === 'management') { try { if (msg.method === 'list') { const all = await chrome.management.getAll(); return { ok: true, data: all.map(e => ({ id: e.id, name: e.name, enabled: e.enabled, type: e.type, version: e.version })) }; } if (msg.method === 'reload') { chrome.alarms.create('tmwd-self-reload', { when: Date.now() + 200 }); return { ok: true }; } if (msg.method === 'disable') { await chrome.management.setEnabled(msg.extId, false); return { ok: true }; } if (msg.method === 'enable') { await chrome.management.setEnabled(msg.extId, true); return { 
ok: true }; } return { ok: false, error: 'Unknown method: ' + msg.method }; } catch (e) { return { ok: false, error: e.message }; } } if (msg.cmd === 'contentSettings') { try { const type = msg.type || 'automaticDownloads'; const setting = msg.setting || 'allow'; const pattern = msg.pattern || ''; await chrome.contentSettings[type].set({ primaryPattern: pattern, setting: setting }); return { ok: true }; } catch (e) { return { ok: false, error: e.message }; } } return { ok: false, error: 'Unknown cmd: ' + msg.cmd }; } chrome.runtime.onMessage.addListener((msg, sender, sendResponse) => { handleExtMessage(msg, sender).then(sendResponse); return true; }); async function handleCookies(msg, sender) { try { let url = msg.url || sender.tab?.url; if (!url && msg.tabId) { const tab = await chrome.tabs.get(msg.tabId); url = tab.url; } const origin = url.match(/^https?:\/\/[^\/]+/)[0]; const all = await chrome.cookies.getAll({ url }); const part = await chrome.cookies.getAll({ url, partitionKey: { topLevelSite: origin } }).catch(() => []); const merged = [...all]; for (const c of part) { if (!merged.some(x => x.name === c.name && x.domain === c.domain)) merged.push(c); } return { ok: true, data: merged }; } catch (e) { return { ok: false, error: e.message }; } } async function handleBatch(msg, sender) { const R = []; let attached = null; const resolve$N = (params) => JSON.parse(JSON.stringify(params || {}).replace(/"\$(\d+)\.([^"]+)"/g, (_, i, path) => { let v = R[+i]; for (const k of path.split('.')) v = v[k]; return JSON.stringify(v); })); try { for (const c of msg.commands) { if (c.tabId === undefined && msg.tabId !== undefined) c.tabId = msg.tabId; if (c.cmd === 'cookies') { R.push(await handleCookies(c, sender)); } else if (c.cmd === 'tabs') { const tabs = (await chrome.tabs.query({})).filter(t => isScriptable(t.url)); R.push({ ok: true, data: tabs.map(t => ({ id: t.id, url: t.url, title: t.title, active: t.active, windowId: t.windowId })) }); } else if (c.cmd === 'cdp') 
{ const tabId = c.tabId || msg.tabId || sender.tab?.id; if (attached !== tabId) { if (attached) { await chrome.debugger.detach({ tabId: attached }); attached = null; } await chrome.debugger.attach({ tabId }, '1.3'); attached = tabId; } R.push(await chrome.debugger.sendCommand({ tabId }, c.method, resolve$N(c.params))); } else { R.push({ ok: false, error: 'unknown cmd: ' + c.cmd }); } } if (attached) await chrome.debugger.detach({ tabId: attached }); return { ok: true, results: R }; } catch (e) { if (attached) try { await chrome.debugger.detach({ tabId: attached }); } catch (_) {} return { ok: false, error: e.message, results: R }; } } async function handleCDP(msg, sender) { const tabId = msg.tabId || sender.tab?.id; if (!tabId) return { ok: false, error: 'no tabId' }; try { await chrome.debugger.attach({ tabId }, '1.3'); const result = await chrome.debugger.sendCommand({ tabId }, msg.method, msg.params || {}); await chrome.debugger.detach({ tabId }); return { ok: true, data: result }; } catch (e) { try { await chrome.debugger.detach({ tabId }); } catch (_) {} return { ok: false, error: e.message }; } } // Filter out chrome:// and other internal tabs that can't be scripted const isScriptable = url => url && /^https?:/.test(url); // --- Shared page/CDP script builder core --- function buildExecScript(code, errorHandler) { return `(async () => { function smartProcessResult(result) { if (result === null || result === undefined || typeof result !== 'object') return result; try { if (result.window === result && result.document) return '[Window: ' + (result.location?.href || 'about:blank') + ']'; } catch(_){} if (typeof jQuery !== 'undefined' && result instanceof jQuery) { const elements = []; for (let i = 0; i < result.length; i++) { if (result[i] && result[i].nodeType === 1) elements.push(result[i].outerHTML); } return elements; } if (result instanceof NodeList || result instanceof HTMLCollection) { const elements = []; for (let i = 0; i < result.length; i++) { if 
(result[i] && result[i].nodeType === 1) elements.push(result[i].outerHTML); } return elements; } if (result.nodeType === 1) return result.outerHTML; if (!Array.isArray(result) && typeof result === 'object' && 'length' in result && typeof result.length === 'number') { const firstElement = result[0]; if (firstElement && firstElement.nodeType === 1) { const elements = []; const length = Math.min(result.length, 100); for (let i = 0; i < length; i++) { const elem = result[i]; if (elem && elem.nodeType === 1) elements.push(elem.outerHTML); } return elements; } } try { return JSON.parse(JSON.stringify(result, function(key, value) { if (typeof value === 'object' && value !== null) { if (value.nodeType === 1) return value.outerHTML; if (value === window || value === document) return '[Object]'; try { if (value.window === value && value.document) return '[Window]'; } catch(_){} } return value; })); } catch (e) { return '[无法序列化: ' + e.message + ']'; } } try { const jsCode = ${JSON.stringify(code)}.trim(); const lines = jsCode.split(/\\r?\\n/).filter(l => l.trim()); const lastLine = lines.length > 0 ? 
lines[lines.length - 1].trim() : ''; const AsyncFunction = Object.getPrototypeOf(async function(){}).constructor; let r; function _air(c) { const ls = c.split(/\\r?\\n/); let i = ls.length - 1; while (i >= 0 && !ls[i].trim()) i--; if (i < 0) return c; const t = ls[i].trim(); if (/^(return |return;|return$|let |const |var |if |if\\(|for |for\\(|while |while\\(|switch|try |throw |class |function |async |import |export |\\/\\/|})/.test(t)) return c; ls[i] = ls[i].match(/^(\\s*)/)[1] + 'return ' + t; return ls.join('\\n'); } if (lastLine.startsWith('return')) { r = await (new AsyncFunction(jsCode))(); } else { try { r = eval(jsCode); if (r instanceof Promise) r = await r; } catch (e) { if (e instanceof SyntaxError && (/return/i.test(e.message) || /await/i.test(e.message))) { r = await (new AsyncFunction(_air(jsCode)))(); } else throw e; } } return { ok: true, data: smartProcessResult(r) }; } catch (e) { ${errorHandler} } })()`; } function buildPageScript(code) { return buildExecScript(code, ` const errMsg = e.message || String(e); return { ok: false, error: { name: e.name || 'Error', message: errMsg, stack: e.stack || '' }, csp: errMsg.includes('Refused to evaluate') || errMsg.includes('unsafe-eval') || errMsg.includes('Content Security Policy') }; `); } function buildCdpScript(code) { return buildExecScript(code, ` return { ok: false, error: { name: e.name || 'Error', message: e.message || String(e), stack: e.stack || '' } }; `); } // --- WebSocket Client for TMWebDriver --- let ws = null; const WS_URL = 'ws://127.0.0.1:18765'; function scheduleProbe() { // Use chrome.alarms to survive MV3 service worker suspension chrome.alarms.create('tmwd-ws-probe', { delayInMinutes: 0.083 }); // ~5s } function scheduleKeepalive() { // Keep SW alive while WS is connected (~25s, under 30s SW timeout) chrome.alarms.create('tmwd-ws-keepalive', { delayInMinutes: 0.4 }); // ~24s } async function isServerAlive() { try { const ctrl = new AbortController(); setTimeout(() => ctrl.abort(), 
2000); await fetch('http://127.0.0.1:18765', { signal: ctrl.signal }); return true; // Got HTTP response → port is listening } catch (e) { return false; // Network error (connection refused) or timeout → server not alive } } chrome.alarms.onAlarm.addListener(async (alarm) => { if (alarm.name === 'tmwd-self-reload') { chrome.runtime.reload(); return; } if (alarm.name === 'tmwd-ws-keepalive') { // Keepalive: ping to keep SW alive + detect dead connections if (ws && ws.readyState === WebSocket.OPEN) { try { ws.send('{"type":"ping"}'); } catch (_) {} scheduleKeepalive(); } else { // Connection lost, switch to probe mode ws = null; scheduleProbe(); } } if (alarm.name === 'tmwd-ws-probe') { if (ws && ws.readyState <= 1) return; // Already connected/connecting if (await isServerAlive()) { console.log('[TMWD-WS] Server detected, connecting...'); connectWS(); } else { scheduleProbe(); // Server not up, keep probing } } }); async function handleWsExec(data) { const tabId = data.tabId; console.log('[TMWD-WS] Exec request', data.id, 'on tab', tabId); ws.send(JSON.stringify({ type: 'ack', id: data.id })); if (!tabId) { ws.send(JSON.stringify({ type: 'error', id: data.id, error: 'No tabId provided' })); return; } // Use onCreated listener to reliably capture new tabs (avoids race condition with query-diff) const newTabIds = new Set(); const onCreated = (tab) => { newTabIds.add(tab.id); }; chrome.tabs.onCreated.addListener(onCreated); try { let res; try { const result = await chrome.scripting.executeScript({ target: { tabId }, world: 'MAIN', func: async (s) => await eval(s), args: [buildPageScript(data.code)] }); res = result[0]?.result; if (res === null || res === undefined) { console.log('[TMWD-WS] executeScript returned null/undefined, treating as CSP issue'); res = { ok: false, error: { name: 'Error', message: 'executeScript returned null (possible CSP or context issue)', stack: '' }, csp: true }; } } catch (e) { console.log('[TMWD-WS] scripting.executeScript failed:', 
e.message); res = { ok: false, error: { name: e.name || 'Error', message: e.message || String(e), stack: e.stack || '' }, csp: true }; } // CDP fallback for CSP-restricted pages if (res && !res.ok && res.csp) { console.log('[TMWD-WS] CDP fallback for tab', tabId); const wrappedCode = buildCdpScript(data.code); try { await chrome.debugger.attach({ tabId }, '1.3'); const cdpRes = await chrome.debugger.sendCommand({ tabId }, 'Runtime.evaluate', { expression: wrappedCode, awaitPromise: true, returnByValue: true }); await chrome.debugger.detach({ tabId }); if (cdpRes.exceptionDetails) { const desc = cdpRes.exceptionDetails.exception?.description || 'CDP Error'; res = { ok: false, error: { name: 'Error', message: desc, stack: desc } }; } else { res = cdpRes.result.value; } } catch (cdpErr) { try { await chrome.debugger.detach({ tabId }); } catch (_) {} res = { ok: false, error: { name: 'Error', message: 'CDP fallback failed: ' + cdpErr.message, stack: '' } }; } } // Grace period for async tab creation (e.g. 
link click with target=_blank) if (newTabIds.size === 0) await new Promise(r => setTimeout(r, 200)); chrome.tabs.onCreated.removeListener(onCreated); // Get full info for captured new tabs const newTabs = []; for (const id of newTabIds) { try { const t = await chrome.tabs.get(id); newTabs.push({id: t.id, url: t.url, title: t.title}); } catch (_) {} } if (res?.ok) { ws.send(JSON.stringify({ type: 'result', id: data.id, result: res.data, newTabs })); } else { console.log(res); ws.send(JSON.stringify({ type: 'error', id: data.id, error: res?.error || 'Unknown error', newTabs })); } } catch (e) { ws.send(JSON.stringify({ type: 'error', id: data.id, error: { name: e.name || 'Error', message: e.message || String(e), stack: e.stack || '' } })); } finally { chrome.tabs.onCreated.removeListener(onCreated); } } function connectWS() { if (ws && ws.readyState <= 1) return; // CONNECTING or OPEN ws = null; console.log('[TMWD-WS] Connecting to', WS_URL); try { ws = new WebSocket(WS_URL); } catch (e) { console.error('[TMWD-WS] Constructor error:', e); ws = null; scheduleProbe(); return; } ws.onopen = async () => { console.log('[TMWD-WS] Connected!'); scheduleKeepalive(); // Keep SW alive while connected const tabs = (await chrome.tabs.query({})).filter(t => isScriptable(t.url)); ws.send(JSON.stringify({ type: 'ext_ready', tabs: tabs.map(t => ({ id: t.id, url: t.url, title: t.title })) })); console.log('[TMWD-WS] Sent ext_ready with', tabs.length, 'tabs'); }; ws.onmessage = async (event) => { try { const data = JSON.parse(event.data); if (data.id && data.code) { let code = data.code; // If code is a JSON string representing an object, parse it if (typeof code === 'string') { try { const p = JSON.parse(code); if (p && typeof p === 'object') code = p; } catch (_) {} } if (typeof code === 'object' && code !== null && code.cmd) { // Custom protocol message → route to handleExtMessage if (code.tabId === undefined && data.tabId !== undefined) code.tabId = data.tabId; const res = await 
handleExtMessage(code, {}); ws.send(JSON.stringify({ type: res.ok ? 'result' : 'error', id: data.id, result: res.data ?? res.results ?? res, error: res.error })); } else if (typeof code === 'string') { // Plain JS code await handleWsExec(data); } else if (typeof code === 'object' && code !== null) { // Object without cmd → legacy extension message const msg = code.tabId === undefined && data.tabId !== undefined ? { ...code, tabId: data.tabId } : code; const res = await handleExtMessage(msg, {}); ws.send(JSON.stringify({ type: res.ok ? 'result' : 'error', id: data.id, result: res.data ?? res.results ?? res, error: res.error })); } } } catch (e) { console.error('[TMWD-WS] message parse error', e); } }; ws.onclose = () => { console.log('[TMWD-WS] Disconnected'); ws = null; scheduleProbe(); }; ws.onerror = (e) => { console.error('[TMWD-WS] Error:', e); // onclose will fire after this, which triggers reconnect }; } // Initial connect + wake-up hooks connectWS(); chrome.runtime.onStartup.addListener(() => connectWS()); chrome.runtime.onInstalled.addListener(() => connectWS()); // Sync tab list on changes async function sendTabsUpdate() { if (!ws || ws.readyState !== WebSocket.OPEN) return; const tabs = (await chrome.tabs.query({})).filter(t => isScriptable(t.url) && !/streamlit/i.test(t.title)); ws.send(JSON.stringify({ type: 'tabs_update', tabs: tabs.map(t => ({ id: t.id, url: t.url, title: t.title })) })); } chrome.tabs.onUpdated.addListener((_, changeInfo) => { if (changeInfo.status === 'complete') sendTabsUpdate(); }); chrome.tabs.onRemoved.addListener(() => sendTabsUpdate()); chrome.tabs.onCreated.addListener(() => sendTabsUpdate()); ================================================ FILE: assets/tmwd_cdp_bridge/content.js ================================================ ;(function(){ if (/streamlit/i.test(document.title)) return; // Remove meta CSP tags document.querySelectorAll('meta[http-equiv="Content-Security-Policy"]').forEach(e => e.remove()); // Indicator 
badge at bottom-right (userscript style) (function(){ if(window.self!==window.top)return; const d=document.createElement('div'); d.id='ljq-ind'; d.innerText='ljq_driver: 已连接'; d.style.cssText='position:fixed;bottom:8px;right:8px;background:#4CAF50;color:white;padding:4px 7px;border-radius:4px;font-size:11px;font-weight:bold;z-index:99999;cursor:pointer;box-shadow:0 2px 4px rgba(0,0,0,0.2);opacity:0.5;'; d.addEventListener('click',()=>alert('会话活跃\nURL: '+location.href)); (document.body||document.documentElement).appendChild(d); })(); new MutationObserver(muts => { for (const m of muts) for (const n of m.addedNodes) { if (n.id === TID || (n.querySelector && n.querySelector('#' + TID))) { const el = n.id === TID ? n : n.querySelector('#' + TID); handle(el); } } }).observe(document.documentElement, { childList: true, subtree: true }); async function handle(el) { try { const req = el.textContent.trim() ? JSON.parse(el.textContent) : { cmd: 'cookies' }; const cmd = req.cmd || 'cookies'; let resp; if (cmd === 'cookies') { resp = await chrome.runtime.sendMessage({ cmd: 'cookies', url: req.url || location.href }); } else if (cmd === 'cdp') { resp = await chrome.runtime.sendMessage({ cmd: 'cdp', method: req.method, params: req.params || {}, tabId: req.tabId }); } else if (cmd === 'batch') { resp = await chrome.runtime.sendMessage({ cmd: 'batch', commands: req.commands, tabId: req.tabId }); } else if (cmd === 'tabs') { resp = await chrome.runtime.sendMessage({ cmd: 'tabs', method: req.method, tabId: req.tabId }); } else { resp = { ok: false, error: 'unknown cmd: ' + cmd }; } el.textContent = JSON.stringify(resp); } catch (e) { el.textContent = JSON.stringify({ ok: false, error: e.message }); } } })(); ================================================ FILE: assets/tmwd_cdp_bridge/disable_dialogs.js ================================================ // Disable alert/confirm/prompt to prevent page JS from blocking extension (function() { const _log = console.log.bind(console); 
  function toast(type, msg) {
    _log('[TMWD] ' + type + ' suppressed:', msg);
    try {
      const d = document.createElement('div');
      d.textContent = '[' + type + '] ' + msg;
      Object.assign(d.style, {
        position:'fixed', top:'12px', right:'12px', zIndex:'2147483647',
        background:'#222', color:'#fff', padding:'10px 18px', borderRadius:'8px',
        fontSize:'14px', maxWidth:'420px', wordBreak:'break-all',
        boxShadow:'0 4px 16px rgba(0,0,0,.3)', opacity:'1',
        transition:'opacity .5s', pointerEvents:'none'
      });
      (document.body || document.documentElement).appendChild(d);
      setTimeout(() => { d.style.opacity = '0'; }, 3000);
      setTimeout(() => { d.remove(); }, 3600);
    } catch(e) {}
  }
  window.alert = function(msg) { toast('alert', msg); };
  window.confirm = function(msg) { toast('confirm', msg); return true; };
  window.prompt = function(msg, def) { toast('prompt', msg); return def || null; };
})();

================================================ FILE: assets/tmwd_cdp_bridge/manifest.json ================================================
{
  "manifest_version": 3,
  "name": "TMWD CDP Bridge",
  "version": "2.0",
  "description": "Cookie viewer + CDP bridge",
  "permissions": [
    "cookies", "tabs", "activeTab", "debugger", "scripting", "alarms",
    "declarativeNetRequest", "management", "contentSettings"
  ],
  "host_permissions": ["<all_urls>"],
  "background": { "service_worker": "background.js" },
  "content_scripts": [
    {
      "matches": ["<all_urls>"],
      "js": ["disable_dialogs.js"],
      "run_at": "document_start",
      "all_frames": true,
      "world": "MAIN"
    },
    {
      "matches": ["<all_urls>"],
      "js": ["config.js", "content.js"],
      "run_at": "document_idle",
      "all_frames": true
    }
  ],
  "action": { "default_popup": "popup.html", "default_title": "TMWD CDP Bridge" }
}

================================================ FILE: assets/tmwd_cdp_bridge/popup.html ================================================

🍪 Cookies

点击刷新获取 cookies...
================================================ FILE: assets/tmwd_cdp_bridge/popup.js ================================================ document.addEventListener('DOMContentLoaded', () => { const out = document.getElementById('out'); const btn = document.getElementById('refresh'); btn.addEventListener('click', fetchCookies); fetchCookies(); }); async function fetchCookies() { const out = document.getElementById('out'); try { const [tab] = await chrome.tabs.query({ active: true, currentWindow: true }); if (!tab?.url) { out.textContent = 'No active tab'; return; } const resp = await chrome.runtime.sendMessage({ cmd: 'cookies', url: tab.url }); if (!resp?.ok) { out.textContent = 'Error: ' + (resp?.error || 'unknown'); return; } if (!resp.data.length) { out.textContent = '(no cookies)'; return; } // 展示带标记 out.textContent = resp.data.map(c => `${c.name}=${c.value}` + (c.httpOnly ? ' [H]' : '') + (c.secure ? ' [S]' : '') + (c.partitionKey ? ' [P]' : '') ).join('\n'); // 自动复制 name=value; 格式到剪贴板 const str = resp.data.map(c => `${c.name}=${c.value}`).join('; '); await navigator.clipboard.writeText(str); } catch (e) { out.textContent = 'Error: ' + e.message; } } ================================================ FILE: assets/tool_usable_history.json ================================================ [{"role":"user","content":[{"type":"text","text":"code_run个helloworld,根据结果简单评价你的工具配置"}]},{"role":"assistant","content":[{"type":"text","text":"\n\n用户要求code_run helloworld并评价工具配置\n\n```python\nprint(\"Hello, World!\")\n```"},{"type":"tool_use","id":"tooluse_GlmTNw190WGgK4sBBOiDHK","name":"code_run","input":{"script":"print(\"Hello, World!\")"}}]},{"role":"user","content":[{"type":"tool_result","tool_use_id":"tooluse_GlmTNw190WGgK4sBBOiDHK","content":"{\"status\": \"success\", \"stdout\": \"Hello, World!\\r\\n\", \"exit_code\": 0}"},{"type":"text","text":"\n### [WORKING MEMORY]\n\n\n[USER]: code_run个helloworld,根据结果简单评价你的工具配置\n\nCurrent turn: 
1\n"}]},{"role":"assistant","content":[{"type":"text","text":"\n\ncode_run helloworld成功,评价工具配置\n\n工具配置没问题。`code_run` 返回了结构化的 JSON(`status`、`stdout`、`exit_code`),执行链路通畅,cwd 默认落在 temp 目录。\n\n刚才还验证了 `inline_eval` 能正确访问 agent 内部状态并导出 history,`file_patch` 也能精准改文件。整套工具链从代码执行、文件读写、到 agent 内省都是通的。"}]}] ================================================ FILE: assets/tools_schema.json ================================================ [ {"type": "function", "function": { "name": "code_run", "description": "Code executor. Prefer python. Multi-call OK, use script param. Reply code block is executed if no script arg; prefer for single call to avoid escaping. No hardcoding bulk data", "parameters": {"type": "object", "properties": { "script": {"type": "string", "description": "[Mutually exclusive] NEVER use this param when use reply code block."}, "type": {"type": "string", "enum": ["python", "powershell"], "description": "Code type", "default": "python"}, "timeout": {"type": "integer", "description": "in seconds", "default": 60}, "cwd": {"type": "string", "description": "Working directory, defaults to cwd"}, "inline_eval": {"type": "boolean", "description": "DO NOT USE except explicitly specified."}}} }}, {"type": "function", "function": { "name": "file_read", "description": "Read file. Read before modify for latest context and line numbers", "parameters": {"type": "object", "properties": { "path": {"type": "string", "description": "Relative or absolute"}, "start": {"type": "integer", "description": "Start line number (1-based)"}, "count": {"type": "integer", "description": "Number of lines to read", "default": 200}, "keyword": {"type": "string", "description": "[Optional] If provided, returns first match (case-insensitive) with context"}, "show_linenos": {"type": "boolean", "description": "Show line numbers", "default": true}}} }}, {"type": "function", "function": { "name": "file_patch", "description": "Replace unique old_content with new_content. 
Exact match required (whitespace/indentation). On failure, file_read to recheck", "parameters": {"type": "object", "properties": { "path": {"type": "string", "description": "File path"}, "old_content": {"type": "string", "description": "Original text block to replace (must be unique)"}, "new_content": {"type": "string", "description": "New content. Supports {{file:path:startLine:endLine}} to ref file lines, auto-expanded"}}} }}, {"type": "function", "function": { "name": "file_write", "description": "Create/overwrite/append files. HUGE edits ONLY. Supports {{file:path:startLine:endLine}}, auto-expanded", "parameters": {"type": "object", "properties": { "path": {"type": "string", "description": "File path"}, "content": {"type": "string"}, "mode": {"type": "string", "enum": ["overwrite", "append", "prepend"], "description": "Write mode", "default": "overwrite"}}} }}, {"type": "function", "function": { "name": "web_scan", "description": "Get simplified HTML and tab list. Removes hidden/floating/covered elements. Call after switching pages", "parameters": {"type": "object", "properties": { "tabs_only": {"type": "boolean", "description": "Show tab list only, no HTML"}, "switch_tab_id": {"type": "string", "description": "[Optional] Tab ID to switch to"}, "text_only": {"type": "boolean", "description": "Plain text only, no HTML"}}} }}, {"type": "function", "function": { "name": "web_execute_js", "description": "Execute JS. Multi-call OK with different switch_tab_id. No guessing. Act accurately to reduce web_scan calls. Execute JS in ```javascript blocks if no script arg, prefer to avoid escaping", "parameters": {"type": "object", "properties": { "script": {"type": "string", "description": "[Mutually exclusive] JS code or script path. NEVER use this param when use reply code block"}, "save_to_file": {"type": "string", "description": "file path; **only** for long result"}, "no_monitor": {"type": "boolean", "description": "Skip page change monitoring, saves 2-3s. 
Only for reads, not for page actions"}, "switch_tab_id": {"type": "string", "description": "[Optional] Tab ID to switch to before executing"}}} }}, {"type": "function", "function": { "name": "update_working_checkpoint", "description": "Short-term working notepad, auto-injected each turn to prevent info loss in long tasks. Call during early/mid stages, not at end. When: (1) after reading SOP, store user needs & key constraints (skip for simple 1-2 step tasks); (2) before subtask switch or context flush; (3) after repeated failures, re-read SOP and must store new findings; (4) on new task, update content, clear old progress but keep valid constraints.\n\nDon't call: simple tasks (1-2 steps), task completed (use long-term memory tool)", "parameters": {"type": "object", "properties": { "key_info": {"type": "string", "description": "Replaces current notepad (<200 tokens). Incremental update: review existing, keep valid, add/remove/modify. Store: pitfalls, user requirements, key params/findings, file paths, progress, next steps. Don't store: ephemeral info, obvious context, old task info when user switched tasks. Prefer over-updating over losing key info"}, "related_sop": {"type": "string", "description": "Related SOP names, tips for further re-read"}}} }}, {"type": "function", "function": { "name": "ask_user", "description": "Interrupt task to ask user when needing decisions, extra info, or facing unresolvable blockers", "parameters": {"type": "object", "properties": { "question": {"type": "string", "description": "Question for the user"}, "candidates": {"type": "array", "items": {"type": "string"}, "description": "Optional quick-select choices for the user"}}} }}, {"type": "function", "function": { "name": "start_long_term_update", "description": "Start distilling long-term memory. Call when discovering info worth remembering (env facts/user prefs/lessons learned). Skip if memory already updated or in autonomous flow. 
Must call when a task that took 15+ turns is completed", "parameters": {"type": "object", "properties": {}}} } ] ================================================ FILE: assets/tools_schema_cn.json ================================================ [ {"type": "function", "function": { "name": "code_run", "description": "代码执行器。优先使用python。支持Multi-call,并行时用script参数。无script参数时正文代码块会被执行,单次调用优先使用以免转义。禁硬编码大量数据", "parameters": {"type": "object", "properties": { "script": {"type": "string", "description": "[Optional] 要执行的代码。为免转义建议留空,改用正文代码块(与此参数互斥)"}, "type": {"type": "string", "enum": ["python", "powershell"], "description": "代码类型", "default": "python"}, "timeout": {"type": "integer", "description": "执行超时时间(秒)", "default": 60}, "cwd": {"type": "string", "description": "工作目录,默认为当前工作目录"}, "inline_eval": {"type": "boolean", "description": "不允许使用除非明确要求"}}} }}, {"type": "function", "function": { "name": "file_read", "description": "读取文件内容。建议在修改文件前先读取,以确保获取最新的上下文和行号。支持分页读取或关键字搜索", "parameters": {"type": "object", "properties": { "path": {"type": "string", "description": "文件相对或绝对路径"}, "start": {"type": "integer", "description": "起始行号(从 1 开始)"}, "count": {"type": "integer", "description": "读取的行数", "default": 200}, "keyword": {"type": "string", "description": "可选搜索关键字。如果提供,将返回第一个匹配项(忽略大小写)及其周边的内容"}, "show_linenos": {"type": "boolean", "description": "是否显示行号,建议开启以辅助 file_patch 定位", "default": true}}} }}, {"type": "function", "function": { "name": "file_patch", "description": "精细化局部文件修改。在文件中寻找唯一的 old_content 块并替换为 new_content。要求 old_content 必须在文件中唯一存在,且空格、缩进、换行必须与原文件完全一致。如果匹配失败,请使用 file_read 重新确认文件内容", "parameters": {"type": "object", "properties": { "path": {"type": "string", "description": "文件路径"}, "old_content": {"type": "string", "description": "文件中需要被替换的原始文本块(需确保唯一性)"}, "new_content": {"type": "string", "description": "替换后的新文本内容。支持 {{file:路径:起始行:结束行}} 语法引用文件内容,写入前自动展开"}}} }}, {"type": "function", "function": { "name": "file_write", "description": "用于文件的新建、全量覆盖或追加写入。对于精细的代码修改,应优先使用 
file_patch。写入内容支持 {{file:路径:起始行:结束行}} 语法引用文件片段,写入前自动展开", "parameters": {"type": "object", "properties": { "path": {"type": "string", "description": "文件路径"}, "content": {"type": "string"}, "mode": {"type": "string", "enum": ["overwrite", "append", "prepend"], "description": "写入模式覆盖、追加或在开头追加", "default": "overwrite"}}} }}, {"type": "function", "function": { "name": "web_scan", "description": "获取当前页面的简化HTML内容和标签页列表。会移除隐藏/浮动/被遮盖的元素。切换页面后一般应先调用查看", "parameters": {"type": "object", "properties": { "tabs_only": {"type": "boolean", "description": "仅返回标签页列表和当前标签信息,不获取HTML内容"}, "switch_tab_id": {"type": "string", "description": "可选的标签页 ID。如果提供,系统将在扫描前切换到该标签页"}, "text_only": {"type": "boolean", "description": "只要纯文本不要HTML"}}} }}, {"type": "function", "function": { "name": "web_execute_js", "description": "执行JS。支持Multi-call,用不同switch_tab_id并行操作多标签页。禁止猜测,准确操作以减少 web_scan 调用。无script参数时执行正文 ```javascript 块,以免转义", "parameters": {"type": "object", "properties": { "script": {"type": "string", "description": "[Optional] JS代码或路径。为免转义建议留空,改用正文代码块(与此参数互斥)"}, "save_to_file": {"type": "string", "description": "结果存文件,适合返回值较长时"}, "no_monitor": {"type": "boolean", "description": "跳过页面变更监控,省2-3秒。仅在纯读取信息时设置,页面操作时不要设置"}, "switch_tab_id": {"type": "string", "description": "可选的标签页 ID,切换到该标签页执行"}}} }}, {"type": "function", "function": { "name": "update_working_checkpoint", "description": "短期工作便签,每轮自动注入上下文,防长任务信息丢失。前中期调用,非结束时。何时调用:(1)任务开始读SOP后,存用户需求和关键约束/参数(简单1-2步任务除外);(2)子任务切换或上下文即将被冲刷前;(3)多次重试失败后,重读SOP并必须调用存储新发现;(4)切换新任务时更新内容,清旧进度但保留仍有效的约束。\n\n何时不调用:简单任务(1-2步且无严重约束)、任务已完成时(应当用长期结算工具)", "parameters": {"type": "object", "properties": { "key_info": {"type": "string", "description": "替换当前便签(<200 tokens)。增量更新:先回顾现有内容,保留仍有效的,再增删改。存:要避的坑、用户原始需求、关键参数/发现、文件路径、当前进度、下一步计划。不存:马上要用用完即丢的、上下文中显而易见的、用户已换全新任务时的旧任务信息。宁多更新不丢关键"}, "related_sop": {"type": "string", "description": "相关sop名称,可以多个,必要时需要再读"}}} }}, {"type": "function", "function": { "name": "ask_user", "description": 
"当需要用户决策、提供额外信息或遇到无法自动解决的阻碍时,调用此工具中断任务并提问", "parameters": {"type": "object", "properties": { "question": {"type": "string", "description": "向用户提出的明确问题"}, "candidates": {"type": "array", "items": {"type": "string"}, "description": "提供给用户的可选快捷选项列表"}}} }}, {"type": "function", "function": { "name": "start_long_term_update", "description": "准备开始提炼记忆。发现值得长期记忆的信息(环境事实/用户偏好/避坑经验)时调用此工具。已记忆更新或在自主流程内时无需调用。超15轮完成的任务必须调用以沉淀经验", "parameters": {"type": "object", "properties": {}}} } ] ================================================ FILE: frontends/DESKTOP_PET_README.md ================================================ # Desktop Pet Skin System ## 快速开始 运行桌面宠物: ```bash python3 desktop_pet_v2.pyw ``` ## 功能特性 ### 1. 多皮肤支持 - 自动发现 `skins/` 目录下的所有皮肤 - 右键菜单切换皮肤 - 支持 sprite sheet 和 GIF 两种格式 ### 2. 多动画状态 - **idle** - 待机动画 - **walk** - 行走动画 - **run** - 跑步动画 - **sprint** - 冲刺动画 右键菜单可切换动画状态 ### 3. 交互功能 - **单击** - 拖动宠物 - **双击** - 关闭程序 - **右键** - 打开菜单(切换皮肤/动画) ### 4. HTTP 远程控制 ```bash # 显示消息 curl "http://127.0.0.1:51983/?msg=Hello" # 切换动画状态 curl "http://127.0.0.1:51983/?state=run" # POST 消息 curl -X POST -d "任务完成" http://127.0.0.1:51983/ ``` ## 添加新皮肤 ### 目录结构 ``` skins/ └── your-skin-name/ ├── skin.json # 配置文件(必需) ├── idle.png # 动画资源 ├── walk.png ├── run.png └── sprint.png ``` ### skin.json 配置示例 #### Sprite Sheet 格式(推荐) ```json { "name": "My Pet", "version": "1.0.0", "author": "Your Name", "description": "描述", "format": "sprite", "animations": { "idle": { "file": "idle.png", "loop": true, "sprite": { "frameWidth": 44, "frameHeight": 31, "frameCount": 6, "columns": 6, "fps": 6, "startFrame": 0 } }, "walk": { "file": "walk.png", "loop": true, "sprite": { "frameWidth": 65, "frameHeight": 32, "frameCount": 8, "columns": 8, "fps": 8, "startFrame": 0 } } } } ``` #### GIF 格式 ```json { "name": "My Pet", "format": "gif", "animations": { "idle": { "file": "idle.gif", "loop": true }, "walk": { "file": "walk.gif", "loop": true } } } ``` ### 配置说明 - **frameWidth/frameHeight**: 单帧尺寸(像素) - **frameCount**: 帧数 
- **columns**: number of columns in the sprite sheet
- **fps**: playback frame rate
- **startFrame**: index of the first frame (0-based)

### Sprite sheet layout

```
+---------+---------+---------+---------+
| frame 0 | frame 1 | frame 2 | frame 3 |  ← row 1
+---------+---------+---------+---------+
| frame 4 | frame 5 | frame 6 | frame 7 |  ← row 2
+---------+---------+---------+---------+
```

With `columns=4, startFrame=2, frameCount=3`, frames 2, 3, and 4 are read.

## Bundled Skins

1. **Glube** - pixel-art little monster (multi-file sprite)
2. **Vita** - pixel-art little dinosaur (single-file sprite)
3. **Doux** - pixel-art little dinosaur (single-file sprite)

## Importing More Skins from ai-bubu

The ai-bubu project ships more skin assets, which can be copied over directly:

```bash
# Copy skins
cp -r ai-bubu-main/packages/app/public/skins/boy frontends/skins/
cp -r ai-bubu-main/packages/app/public/skins/dinosaur frontends/skins/
cp -r ai-bubu-main/packages/app/public/skins/line frontends/skins/
cp -r ai-bubu-main/packages/app/public/skins/mort frontends/skins/
cp -r ai-bubu-main/packages/app/public/skins/tard frontends/skins/
```

## Integration with stapp.py

Clicking the "🐱 桌面宠物" (Desktop Pet) button in `stapp.py` launches the desktop pet automatically and sends a notification at the end of every turn.

## Troubleshooting

### Skin does not show up

1. Check that `skin.json` is well-formed
2. Confirm that the image files exist
3. Check that the sprite parameters match the image dimensions

### Animation is not smooth

- Adjust the `fps` parameter
- Check that the frame count is correct

### Transparent background problems

- Make sure the PNG files include an alpha channel
- Use images in RGBA mode

## Technical Details

- Built on Tkinter + PIL/Pillow
- Supports a transparent background (#01FF01 color key)
- Always-on-top, borderless window
- HTTP server port: 51983

================================================ FILE: frontends/btw_cmd.py ================================================
"""`/btw` command: side question, a temporary sub-agent Q&A that does not interrupt the main agent.

- Lock, then deepcopy backend.history → a background thread calls backend.raw_ask once for the answer
- Nothing is written to the main agent's backend.history; nothing enters task_queue
- The answer goes to display_queue 'done' (install path) or is returned synchronously (frontend path)

Reuses backend.raw_ask + make_messages; no new LLM instance is created.
"""
from __future__ import annotations
import copy, os, threading, time
from typing import Optional

_WRAPPER_ZH = """
这是用户的临时插问 (side question)。主 agent 仍在后台运行,**不会被打断**。

身份与边界:
- 你是一个独立的轻量 sub-agent
- 上下文里能看到主 agent 与用户的完整对话、最近的工具调用与结果
- 用户在问当前进展或顺便确认某事——基于已有信息**一次性**作答
- 没有任何工具可用:不要"让我查一下" / "我去试试" / 任何承诺动作
- 信息不足就坦白说"基于目前对话我不知道"

侧问内容如下:
{question}"""

_WRAPPER_EN = """
This is a side question from the user.
The main agent is NOT interrupted — it continues in the background. Identity & boundaries: - You are an independent lightweight sub-agent - You can see the full conversation between the main agent and the user, plus recent tool calls/results - The user is asking about current progress or a quick aside — answer in **one shot** from existing info - You have NO tools — never say "let me check" / "I'll try" / any action promise - If info is missing, just say "based on the conversation I don't know" Question: {question}""" _TIMEOUT_SEC = 120 def _wrapper(): return _WRAPPER_EN if os.environ.get('GA_LANG') == 'en' else _WRAPPER_ZH def _strip_cmd(query): s = (query or '').strip() return s[len('/btw'):].strip() if s.startswith('/btw') else s def _help_text(): return ('**/btw 用法**:side question — 临时问主 agent 当前进展,不打断主线\n\n' '`/btw <你的问题>`\n\n' '行为:抓取当前对话上下文 → 单轮纯文本作答(无工具)→ 主 agent 历史不变。') def _snapshot_history(backend): """Lock + deepcopy: defends against concurrent compress_history_tags mutating inner blocks.""" with backend.lock: return copy.deepcopy(list(backend.history)) def _build_wire(backend, history, sidequest_msg): """history + sidequest → wire-format. 
Dispatches: BaseSession subclasses → make_messages, Native* → raw pairs (raw_ask runs _fix/_drop/_ensure transforms itself).""" msgs = history + [sidequest_msg] if hasattr(backend, 'make_messages'): return backend.make_messages(msgs) return [{"role": m["role"], "content": list(m.get("content", []))} for m in msgs] def _ask(agent, question, deadline): """One-shot raw_ask against current backend; never mutates backend.history.""" backend = agent.llmclient.backend user_msg = {"role": "user", "content": [{"type": "text", "text": _wrapper().format(question=question)}]} wire = _build_wire(backend, _snapshot_history(backend), user_msg) text = '' for chunk in backend.raw_ask(wire): text += chunk if time.time() > deadline: return text + '\n\n⚠️ /btw 超时,仅返回部分回复。' return text def _format(question, body, took): head = f'> 🟡 /btw {question}\n\n' return head + (body.strip() or '*(空回复)*') + f'\n\n*({took:.1f}s)*' def _run(agent, question, deadline): """Catches errors at the boundary so neither caller path needs its own try/except.""" try: return _ask(agent, question, deadline) except Exception as e: return f'❌ /btw 失败: {type(e).__name__}: {e}' def handle(agent, query, display_queue) -> Optional[str]: """Slash-cmd entry (server-side, install path). 
Spawn worker; return None to consume.""" question = _strip_cmd(query) if not question or question in ('help', '?', '-h', '--help'): display_queue.put({'done': _help_text(), 'source': 'system'}) return None started = time.time() deadline = started + _TIMEOUT_SEC def worker(): body = _run(agent, question, deadline) display_queue.put({'done': _format(question, body, time.time() - started), 'source': 'system'}) threading.Thread(target=worker, daemon=True, name='btw-sidequest').start() return None def handle_frontend_command(agent, query) -> str: """Sync entry for frontends wanting a string back (tg/wx/stapp/...).""" question = _strip_cmd(query) if not question or question in ('help', '?', '-h', '--help'): return _help_text() started = time.time() body = _run(agent, question, started + _TIMEOUT_SEC) return _format(question, body, time.time() - started) def install(cls): """Idempotent monkey-patch: intercept /btw before original dispatch.""" orig = cls._handle_slash_cmd if getattr(orig, '_btw_patched', False): return def patched(self, raw_query, display_queue): s = (raw_query or '').strip() if s == '/btw' or s.startswith('/btw ') or s.startswith('/btw\t'): r = handle(self, raw_query, display_queue) if r is None: return None return r return orig(self, raw_query, display_queue) patched._btw_patched = True cls._handle_slash_cmd = patched ================================================ FILE: frontends/chatapp_common.py ================================================ import ast, asyncio, glob, json, os, queue as Q, re, socket, sys, time HELP_COMMANDS = ( ("/help", "显示帮助"), ("/status", "查看状态"), ("/stop", "停止当前任务"), ("/new", "开启新对话并清空当前上下文"), ("/restore", "恢复上次对话历史"), ("/continue", "列出可恢复会话"), ("/continue [n]", "恢复第 n 个会话"), ("/btw ", "side question — 临时插问主 agent 进展,不打断主线"), ("/llm", "查看当前模型列表"), ("/llm [n]", "切换到第 n 个模型"), ) TELEGRAM_MENU_COMMANDS = ( ("help", "显示帮助"), ("status", "查看状态"), ("stop", "停止当前任务"), ("new", "开启新对话并清空当前上下文"), ("restore", "恢复上次对话历史"), ("continue", 
"列出可恢复会话;/continue n 恢复第 n 个"), ("llm", "查看模型列表;/llm n 切换到指定模型"), ) def build_help_text(commands=HELP_COMMANDS): return "📖 命令列表:\n" + "\n".join(f"{cmd} - {desc}" for cmd, desc in commands) HELP_TEXT = build_help_text() FILE_HINT = "If you need to show files to user, use [FILE:filepath] in your response." TAG_PATS = [r"<" + t + r">.*?" for t in ("thinking", "summary", "tool_use", "file_content")] PROJECT_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) RESTORE_GLOBS = ( os.path.join(PROJECT_ROOT, "temp", "model_responses", "model_responses_*.txt"), os.path.join(PROJECT_ROOT, "temp", "model_responses_*.txt"), ) RESTORE_BLOCK_RE = re.compile( r"^=== (Prompt|Response) ===.*?\n(.*?)(?=^=== (?:Prompt|Response) ===|\Z)", re.DOTALL | re.MULTILINE, ) HISTORY_RE = re.compile(r"\s*(.*?)\s*", re.DOTALL) SUMMARY_RE = re.compile(r"\s*(.*?)\s*", re.DOTALL) def clean_reply(text): for pat in TAG_PATS: text = re.sub(pat, "", text or "", flags=re.DOTALL) return re.sub(r"\n{3,}", "\n\n", text).strip() or "..." 
def extract_files(text):
    return re.findall(r"\[FILE:([^\]]+)\]", text or "")


def strip_files(text):
    return re.sub(r"\[FILE:[^\]]+\]", "", text or "").strip()


def split_text(text, limit):
    text, parts = (text or "").strip() or "...", []
    while len(text) > limit:
        cut = text.rfind("\n", 0, limit)
        if cut < limit * 0.6:
            cut = limit
        parts.append(text[:cut].rstrip())
        text = text[cut:].lstrip()
    return parts + ([text] if text else []) or ["..."]


def _restore_log_files():
    files = []
    for pattern in RESTORE_GLOBS:
        files.extend(glob.glob(pattern))
    return sorted(set(files))


def _restore_text_pairs(content):
    users = re.findall(r"=== USER ===\n(.+?)(?==== |$)", content, re.DOTALL)
    resps = re.findall(r"=== Response ===.*?\n(.+?)(?==== Prompt|$)", content, re.DOTALL)
    restored = []
    for u, r in zip(users, resps):
        u, r = u.strip(), r.strip()[:500]
        if u and r:
            restored.extend([f"[USER]: {u}", f"[Agent] {r}"])
    return restored


def _native_prompt_obj(prompt_body):
    try:
        prompt = json.loads(prompt_body)
    except Exception:
        return None
    if not isinstance(prompt, dict) or prompt.get("role") != "user":
        return None
    if not isinstance(prompt.get("content"), list):
        return None
    return prompt


def _native_prompt_text(prompt):
    texts = []
    for block in prompt.get("content", []):
        if isinstance(block, dict) and block.get("type") == "text":
            text = block.get("text", "")
            if isinstance(text, str) and text.strip():
                texts.append(text)
    return "\n".join(texts).strip()


def _native_history_lines(prompt_text):
    match = HISTORY_RE.search(prompt_text or "")
    if not match:
        return []
    restored = []
    for line in match.group(1).splitlines():
        line = line.strip()
        if line.startswith("[USER]: ") or line.startswith("[Agent] "):
            restored.append(line)
    return restored


def _native_first_user_line(prompt_text):
    text = (prompt_text or "").strip()
    # Skip prompts that already carry injected history or working memory.
    if not text or "<history>" in text or text.startswith("### [WORKING MEMORY]"):
        return ""
    if text.startswith(FILE_HINT):
        text = text[len(FILE_HINT):].lstrip()
    if "### 用户当前消息" in text:
        text = text.split("### 
用户当前消息", 1)[-1].strip() return text def _native_response_summary(response_body): try: blocks = ast.literal_eval((response_body or "").strip()) except Exception: return "" if not isinstance(blocks, list): return "" text_parts = [] for block in blocks: if isinstance(block, dict) and block.get("type") == "text": text = block.get("text", "") if isinstance(text, str) and text: text_parts.append(text) match = SUMMARY_RE.search("\n".join(text_parts)) return (match.group(1).strip() if match else "")[:500] def _restore_native_history(content): blocks = RESTORE_BLOCK_RE.findall(content or "") if not blocks: return [] pairs = [] pending_prompt = None for label, body in blocks: if label == "Prompt": pending_prompt = body elif pending_prompt is not None: pairs.append((pending_prompt, body)) pending_prompt = None for prompt_body, response_body in reversed(pairs): prompt = _native_prompt_obj(prompt_body) if prompt is None: continue prompt_text = _native_prompt_text(prompt) restored = list(_native_history_lines(prompt_text)) if restored: summary = _native_response_summary(response_body) summary_line = f"[Agent] {summary}" if summary else "" if summary_line and (not restored or restored[-1] != summary_line): restored.append(summary_line) return restored user_text = _native_first_user_line(prompt_text) summary = _native_response_summary(response_body) if user_text and summary: return [f"[USER]: {user_text}", f"[Agent] {summary}"] return [] def format_restore(): files = _restore_log_files() if not files: return None, "❌ 没有找到历史记录" latest = max(files, key=os.path.getmtime) with open(latest, "r", encoding="utf-8") as f: content = f.read() restored = _restore_text_pairs(content) or _restore_native_history(content) if not restored: return None, "❌ 历史记录里没有可恢复内容" count = sum(1 for line in restored if line.startswith("[USER]: ")) return (restored, os.path.basename(latest), count), None def build_done_text(raw_text): files = [p for p in extract_files(raw_text) if os.path.exists(p)] body = 
strip_files(clean_reply(raw_text)) if files: body = (body + "\n\n" if body else "") + "\n".join(f"生成文件: {p}" for p in files) return body or "..." def public_access(allowed): return not allowed or "*" in allowed def to_allowed_set(value): if value is None: return set() if isinstance(value, str): value = [value] return {str(x).strip() for x in value if str(x).strip()} def allowed_label(allowed): return "public" if public_access(allowed) else sorted(allowed) def ensure_single_instance(port, label): try: lock_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) lock_sock.bind(("127.0.0.1", port)) return lock_sock except OSError: print(f"[{label}] Another instance is already running, skipping...") sys.exit(1) def require_runtime(agent, label, **required): missing = [k for k, v in required.items() if not v] if missing: print(f"[{label}] ERROR: please set {', '.join(missing)} in mykey.py or mykey.json") sys.exit(1) if agent.llmclient is None: print(f"[{label}] ERROR: no usable LLM backend found in mykey.py or mykey.json") sys.exit(1) def redirect_log(script_file, log_name, label, allowed): log_dir = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(script_file))), "temp") os.makedirs(log_dir, exist_ok=True) logf = open(os.path.join(log_dir, log_name), "a", encoding="utf-8", buffering=1) sys.stdout = sys.stderr = logf print(f"[NEW] {label} process starting, the above are history infos ...") print(f"[{label}] allow list: {allowed_label(allowed)}") class AgentChatMixin: label = "Chat" source = "chat" split_limit = 1500 ping_interval = 20 def __init__(self, agent, user_tasks): self.agent, self.user_tasks = agent, user_tasks async def send_text(self, chat_id, content, **ctx): raise NotImplementedError async def send_done(self, chat_id, raw_text, **ctx): await self.send_text(chat_id, build_done_text(raw_text), **ctx) async def handle_command(self, chat_id, cmd, **ctx): parts = (cmd or "").split() op = (parts[0] if parts else "").lower() if op == "/help": return 
await self.send_text(chat_id, HELP_TEXT, **ctx) if op == "/stop": state = self.user_tasks.get(chat_id) if state: state["running"] = False self.agent.abort() return await self.send_text(chat_id, "⏹️ 正在停止...", **ctx) if op == "/status": llm = self.agent.get_llm_name() if self.agent.llmclient else "未配置" return await self.send_text(chat_id, f"状态: {'🔴 运行中' if self.agent.is_running else '🟢 空闲'}\nLLM: [{self.agent.llm_no}] {llm}", **ctx) if op == "/llm": if not self.agent.llmclient: return await self.send_text(chat_id, "❌ 当前没有可用的 LLM 配置", **ctx) if len(parts) > 1: try: self.agent.next_llm(int(parts[1])) return await self.send_text(chat_id, f"✅ 已切换到 [{self.agent.llm_no}] {self.agent.get_llm_name()}", **ctx) except Exception: return await self.send_text(chat_id, f"用法: /llm <0-{len(self.agent.list_llms()) - 1}>", **ctx) lines = [f"{'→' if cur else ' '} [{i}] {name}" for i, name, cur in self.agent.list_llms()] return await self.send_text(chat_id, "LLMs:\n" + "\n".join(lines), **ctx) if op == "/restore": try: restored_info, err = format_restore() if err: return await self.send_text(chat_id, err, **ctx) restored, fname, count = restored_info self.agent.abort() self.agent.history.extend(restored) return await self.send_text(chat_id, f"✅ 已恢复 {count} 轮对话\n来源: {fname}\n(仅恢复上下文,请输入新问题继续)", **ctx) except Exception as e: return await self.send_text(chat_id, f"❌ 恢复失败: {e}", **ctx) if op == "/continue": return await self.send_text(chat_id, _handle_continue_frontend(self.agent, cmd), **ctx) if op == "/new": return await self.send_text(chat_id, _reset_conversation(self.agent), **ctx) if op == "/btw": answer = await asyncio.to_thread(_handle_btw_frontend, self.agent, cmd) return await self.send_text(chat_id, answer, **ctx) return await self.send_text(chat_id, HELP_TEXT, **ctx) async def run_agent(self, chat_id, text, **ctx): state = {"running": True} self.user_tasks[chat_id] = state try: await self.send_text(chat_id, "思考中...", **ctx) dq = self.agent.put_task(f"{FILE_HINT}\n\n{text}", 
source=self.source) last_ping = time.time() while state["running"]: try: item = await asyncio.to_thread(dq.get, True, 3) except Q.Empty: if self.agent.is_running and time.time() - last_ping > self.ping_interval: await self.send_text(chat_id, "⏳ 还在处理中,请稍等...", **ctx) last_ping = time.time() continue if "done" in item: await self.send_done(chat_id, item.get("done", ""), **ctx) break if not state["running"]: await self.send_text(chat_id, "⏹️ 已停止", **ctx) except Exception as e: import traceback print(f"[{self.label}] run_agent error: {e}") traceback.print_exc() await self.send_text(chat_id, f"❌ 错误: {e}", **ctx) finally: self.user_tasks.pop(chat_id, None) from agentmain import GeneraticAgent as _GA from continue_cmd import handle_frontend_command as _handle_continue_frontend, install as _install_continue, reset_conversation as _reset_conversation _install_continue(_GA) from btw_cmd import handle_frontend_command as _handle_btw_frontend, install as _install_btw; _install_btw(_GA) ================================================ FILE: frontends/continue_cmd.py ================================================ """`/continue` command: list & restore past model_responses sessions. Pure functions + one `install(cls)` monkey-patch entry. No side effects at import.
""" import ast, glob, json, os, re, time _LOG_DIR = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), 'temp', 'model_responses') _LOG_GLOB = os.path.join(_LOG_DIR, 'model_responses_*.txt') _BLOCK_RE = re.compile(r'^=== (Prompt|Response) ===.*?\n(.*?)(?=^=== (?:Prompt|Response) ===|\Z)', re.DOTALL | re.MULTILINE) _SUMMARY_RE = re.compile(r'<summary>\s*(.*?)\s*</summary>', re.DOTALL) def _rel_time(mtime): d = int(time.time() - mtime) if d < 60: return f'{d}秒前' if d < 3600: return f'{d // 60}分前' if d < 86400: return f'{d // 3600}小时前' return f'{d // 86400}天前' def _pairs(content): blocks, pairs, pending = _BLOCK_RE.findall(content or ''), [], None for label, body in blocks: if label == 'Prompt': pending = body.strip() elif pending is not None: pairs.append((pending, body.strip())); pending = None return pairs def _first_user(pairs): for p, _ in pairs: try: msg = json.loads(p) except Exception: continue if not isinstance(msg, dict): continue for blk in msg.get('content', []) or []: if isinstance(blk, dict) and blk.get('type') == 'text': t = (blk.get('text') or '').strip() if t and '' not in t and not t.startswith('### [WORKING MEMORY]'): return t for p, _ in pairs[:1]: for line in p.splitlines(): s = line.strip() if s and not s.startswith('###'): return s return '' def _last_summary(pairs): for _, response_body in reversed(pairs): try: blocks = ast.literal_eval(response_body) except Exception: continue if not isinstance(blocks, list): continue text_parts = [] for block in blocks: if isinstance(block, dict) and block.get('type') == 'text': text = block.get('text', '') if isinstance(text, str) and text: text_parts.append(text) match = _SUMMARY_RE.search('\n'.join(text_parts)) if match: summary = match.group(1).strip() if summary: return summary return '' def _preview_text(pairs): return _last_summary(pairs) or _first_user(pairs) def _recent_context(my_pid, n=5): """Scan the n most recent model_response files (excluding this process's own) and extract lastQ / lastA.""" out = [] for f in sorted(glob.glob(_LOG_GLOB),
key=os.path.getmtime, reverse=True): m = re.search(r'model_responses_(\d+)', os.path.basename(f)) if not m or m.group(1) == str(my_pid): continue try: c = open(f, encoding='utf-8', errors='ignore').read() except Exception: continue q = s = "" for hm in re.finditer(r'(.*?)', c, re.DOTALL): u = re.search(r'\[USER\]:\s*(.+?)(?:\\n|<)', hm.group(1)) if u: q = u.group(1) sm = _SUMMARY_RE.search(c) if sm: s = sm.group(1).strip() q, s = q[:60].strip(), s[:60].replace('\n', ' ').strip() out.append(f'· {m.group(1)} | lastQ: {q or "-"} | lastA: {s or "-"}') if len(out) >= n: break return ('[RecentContext] 近期并行会话(非当前):\n' + '\n'.join(out) + '\n[/RecentContext]') if out else "" def _parse_native_history(pairs): history = [] for p, r in pairs: try: user_msg = json.loads(p) except Exception: return None try: blocks = ast.literal_eval(r) except Exception: return None if not (isinstance(user_msg, dict) and user_msg.get('role') == 'user'): return None if not isinstance(blocks, list): return None history.append(user_msg) history.append({'role': 'assistant', 'content': blocks}) return history def list_sessions(exclude_pid=None): """Newest-first list of (path, mtime, first_user_text, n_rounds).""" files = glob.glob(_LOG_GLOB) if exclude_pid is not None: tag = f'model_responses_{exclude_pid}.txt' files = [f for f in files if not f.endswith(tag)] out = [] for f in files: try: with open(f, encoding='utf-8', errors='replace') as fh: content = fh.read() except Exception: continue pairs = _pairs(content) if not pairs: continue out.append((f, os.path.getmtime(f), _preview_text(pairs), len(pairs))) out.sort(key=lambda x: x[1], reverse=True) return out _MD_ESCAPE_RE = re.compile(r'([\\`*_\[\]])') def _escape_md(s): return _MD_ESCAPE_RE.sub(r'\\\1', s) def _agent_clients(agent): clients = [] for client in getattr(agent, 'llmclients', []) or []: if client not in clients: clients.append(client) current = getattr(agent, 'llmclient', None) if current is not None and current not in clients: 
clients.insert(0, current) return clients def _replace_backend_history(agent, history): backend = getattr(getattr(agent, 'llmclient', None), 'backend', None) if backend is not None and hasattr(backend, 'history'): backend.history = list(history or []) def _current_log_path(pid=None): pid = os.getpid() if pid is None else pid return os.path.join(_LOG_DIR, f'model_responses_{pid}.txt') def _snapshot_current_log(pid=None): """Persist current PID log as a standalone recoverable snapshot, then clear it.""" path = _current_log_path(pid) if not os.path.isfile(path): return None try: with open(path, encoding='utf-8', errors='replace') as fh: content = fh.read() except Exception: return None if not _pairs(content): return None os.makedirs(_LOG_DIR, exist_ok=True) pid = os.getpid() if pid is None else pid stamp = time.strftime('%Y%m%d_%H%M%S') snapshot = os.path.join(_LOG_DIR, f'model_responses_snapshot_{pid}_{stamp}_{time.time_ns() % 1_000_000_000:09d}.txt') with open(snapshot, 'w', encoding='utf-8', errors='replace') as fh: fh.write(content) with open(path, 'w', encoding='utf-8', errors='replace'): pass return snapshot def reset_conversation(agent, message='🆕 已开启新对话,当前上下文已清空'): """Abort current work and clear all known frontend-visible conversation state.""" try: agent.abort() except Exception: pass _snapshot_current_log() if hasattr(agent, 'history'): agent.history = [] for client in _agent_clients(agent): backend = getattr(client, 'backend', None) if backend is not None and hasattr(backend, 'history'): backend.history = [] if hasattr(client, 'last_tools'): client.last_tools = '' if hasattr(agent, 'handler'): agent.handler = None return message def format_list(sessions, limit=20): if not sessions: return '❌ 没有可恢复的历史会话' lines = ['**可恢复会话**(输入 `/continue N` 恢复第 N 个):', ''] for i, (_, mtime, first, n) in enumerate(sessions[:limit], 1): preview = _escape_md((first or '(无法预览)').replace('\n', ' ')[:60]) lines.append(f'{i}. 
`{_rel_time(mtime)}` · **{n} 轮** · {preview}') return '\n'.join(lines) def restore(agent, path): """Restore session at path. Returns (msg, is_full).""" try: with open(path, encoding='utf-8', errors='replace') as fh: content = fh.read() except Exception as e: return f'❌ 读取失败: {e}', False pairs = _pairs(content) if not pairs: return f'❌ {os.path.basename(path)} 为空或格式不符', False history = _parse_native_history(pairs) name = os.path.basename(path) if history is not None: agent.abort() _replace_backend_history(agent, history) return f'✅ 已恢复 {len(pairs)} 轮完整对话({name})\n(已写入 backend.history,可直接继续)', True from chatapp_common import _restore_native_history, _restore_text_pairs summary = _restore_text_pairs(content) or _restore_native_history(content) if not summary: return f'❌ {name} 无法解析(非 native 且无摘要可提取)', False agent.abort() agent.history.extend(summary) n = sum(1 for l in summary if l.startswith('[USER]: ')) return f'⚠️ 非 native 格式,已降级恢复 {n} 轮摘要({name})\n(请输入新问题继续)', False def handle(agent, query, display_queue): """Dispatch /continue or /continue N. 
Returns None if consumed else original query.""" s = (query or '').strip() if s == '/continue': display_queue.put({'done': format_list(list_sessions(exclude_pid=os.getpid())), 'source': 'system'}) return None m = re.match(r'/continue\s+(\d+)\s*$', s) if m: sessions = list_sessions(exclude_pid=os.getpid()) idx = int(m.group(1)) - 1 if not (0 <= idx < len(sessions)): display_queue.put({'done': f'❌ 索引越界(有效范围 1-{len(sessions)})', 'source': 'system'}) return None reset_conversation(agent, message=None) msg, _ = restore(agent, sessions[idx][0]) display_queue.put({'done': msg, 'source': 'system'}) return None return query def _user_text(prompt_body): """User-typed text from a prompt JSON; '' if this is an agent auto-continuation.""" try: msg = json.loads(prompt_body) except Exception: return '' if not isinstance(msg, dict): return '' for blk in msg.get('content', []) or []: if isinstance(blk, dict) and blk.get('type') == 'text': t = (blk.get('text') or '').strip() if t and not t.startswith('### [WORKING MEMORY]'): return t return '' def _assistant_text(response_body): """Joined text from a response blocks repr; '' on parse failure.""" try: blocks = ast.literal_eval(response_body) except Exception: return '' if not isinstance(blocks, list): return '' return '\n'.join(b['text'] for b in blocks if isinstance(b, dict) and b.get('type') == 'text' and isinstance(b.get('text'), str) and b['text'].strip()) _TURN_MARK = '**LLM Running (Turn {}) ...**\n\n' def extract_ui_messages(path): """Parse a model_responses log into [{role, content}, ...] for UI replay. Auto-continuation turns are folded into one assistant bubble with Turn markers, matching live chat rendering via fold_turns(). """ try: with open(path, encoding='utf-8', errors='replace') as f: content = f.read() except Exception: return [] rounds = [] # [(user_text, [turn_text, ...]), ...] 
for prompt, response in _pairs(content): user = _user_text(prompt) if user or not rounds: rounds.append((user, [])) rounds[-1][1].append(_assistant_text(response)) out = [] for user, turns in rounds: if not user or not any(turns): continue body = '\n\n'.join(t if i == 0 else _TURN_MARK.format(i + 1) + t for i, t in enumerate(turns)) out += [{'role': 'user', 'content': user}, {'role': 'assistant', 'content': body}] return out def handle_frontend_command(agent, query, exclude_pid=None): """Frontend-friendly /continue entry that returns text directly.""" s = (query or '').strip() exclude_pid = os.getpid() if exclude_pid is None else exclude_pid if s == '/continue': return format_list(list_sessions(exclude_pid=exclude_pid)) m = re.match(r'/continue\s+(\d+)\s*$', s) if not m: return '用法: /continue 或 /continue N' sessions = list_sessions(exclude_pid=exclude_pid) idx = int(m.group(1)) - 1 if not (0 <= idx < len(sessions)): return f'❌ 索引越界(有效范围 1-{len(sessions)})' reset_conversation(agent, message=None) msg, _ = restore(agent, sessions[idx][0]) return msg def install(cls): """Wrap cls._handle_slash_cmd so /continue is handled before original dispatch.""" orig = cls._handle_slash_cmd if getattr(orig, '_continue_patched', False): return def patched(self, raw_query, display_queue): if (raw_query or '').startswith('/continue'): r = handle(self, raw_query, display_queue) if r is None: return None return orig(self, raw_query, display_queue) patched._continue_patched = True cls._handle_slash_cmd = patched ================================================ FILE: frontends/dcapp.py ================================================ # Discord Bot Frontend for GenericAgent # ⚠️ Requires enabling "Message Content Intent" in the Discord Developer Portal # Bot → Privileged Gateway Intents → MESSAGE CONTENT INTENT → enable # pip install discord.py import asyncio, json, os, queue as Q, re, sys, threading, time from collections import OrderedDict sys.path.insert(0,
os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) from agentmain import GeneraticAgent from chatapp_common import ( AgentChatMixin, build_done_text, ensure_single_instance, extract_files, public_access, redirect_log, require_runtime, split_text, strip_files, clean_reply, HELP_TEXT, FILE_HINT, format_restore, _handle_continue_frontend, _reset_conversation, ) from llmcore import mykeys try: import discord except Exception: print("Please install discord.py to use Discord: pip install discord.py") sys.exit(1) agent = GeneraticAgent(); agent.verbose = False BOT_TOKEN = str(mykeys.get("discord_bot_token", "") or "").strip() ALLOWED = {str(x).strip() for x in mykeys.get("discord_allowed_users", []) if str(x).strip()} USER_TASKS = {} PROJECT_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) TEMP_DIR = os.path.join(PROJECT_ROOT, "temp") MEDIA_DIR = os.path.join(TEMP_DIR, "discord_media") ACTIVE_FILE = os.path.join(TEMP_DIR, "discord_active_channels.json") ACTIVE_TTL_SECONDS = 30 * 24 * 3600 EXIT_CHANNEL_TEXTS = {"退出该频道", "退出此频道", "退出频道"} EXIT_THREAD_TEXTS = {"退出该子区", "退出此子区", "退出子区"} os.makedirs(MEDIA_DIR, exist_ok=True) def _extract_discord_progress(text): """Return the newest concise <summary> from a streaming transcript.""" matches = re.findall(r"<summary>\s*(.*?)\s*</summary>", text or "", flags=re.DOTALL) if not matches: return "" summary = re.sub(r"\s+", " ", matches[-1]).strip() return summary[:120] def _strip_discord_transcript(text): """Hide LLM/tool transcript noise while preserving the final natural reply.""" text = text or "" text = re.sub(r"^\s*\*?\*?LLM Running \(Turn \d+\) \.\.\.\*?\*?\s*$", "", text, flags=re.M) text = re.sub(r"^\s*🛠️\s+.*?(?=^\s*(?:\*?\*?LLM Running||$))", "", text, flags=re.M | re.DOTALL) text = re.sub(r"^\s*(?:✅|❌|ERR|STDOUT|PAT\b|RC\b).*?$", "", text, flags=re.M) text = re.sub(r".*?", "", text, flags=re.DOTALL) text = clean_reply(text) return strip_files(text).strip() def _display_done_text(text): body = _strip_discord_transcript(text) if
body and body != "...": return body summaries = re.findall(r"<summary>\s*(.*?)\s*</summary>", text or "", flags=re.DOTALL) if summaries: return re.sub(r"\s+", " ", summaries[-1]).strip() or "..." return "..." class DiscordApp(AgentChatMixin): label, source, split_limit = "Discord", "discord", 1900 def __init__(self): super().__init__(agent, USER_TASKS) intents = discord.Intents.default() intents.message_content = True intents.guilds = True intents.dm_messages = True proxy = str(mykeys.get("proxy", "") or "").strip() or None self.client = discord.Client(intents=intents, proxy=proxy) self.background_tasks = set() self._channel_cache = OrderedDict() # chat_id -> channel/user object (LRU, max 500) self._active_channels = self._load_active_channels() # guild chat_id -> {last_seen: float} self._active_lock = threading.Lock() self._agents = OrderedDict() # chat_id -> GeneraticAgent, each chat has isolated history self._agent_lock = threading.Lock() @self.client.event async def on_ready(): print(f"[Discord] bot ready: {self.client.user} ({self.client.user.id})") @self.client.event async def on_message(message): await self._handle_message(message) def _chat_id(self, message): """Return a string chat_id: 'dm:<user_id>' or 'ch:<channel_id>'.""" if isinstance(message.channel, discord.DMChannel): return f"dm:{message.author.id}" return f"ch:{message.channel.id}" def _load_active_channels(self): try: with open(ACTIVE_FILE, "r", encoding="utf-8") as f: data = json.load(f) if not isinstance(data, dict): return {} now = time.time() active = {} for chat_id, item in data.items(): if not str(chat_id).startswith("ch:") or not isinstance(item, dict): continue last_seen = float(item.get("last_seen") or 0) if now - last_seen <= ACTIVE_TTL_SECONDS: active[str(chat_id)] = {"last_seen": last_seen} return active except FileNotFoundError: return {} except Exception as e: print(f"[Discord] failed to load active channels: {e}") return {} def _save_active_channels(self): try: os.makedirs(os.path.dirname(ACTIVE_FILE), exist_ok=True) tmp
= ACTIVE_FILE + ".tmp" with open(tmp, "w", encoding="utf-8") as f: json.dump(self._active_channels, f, ensure_ascii=False, indent=2, sort_keys=True) os.replace(tmp, ACTIVE_FILE) except Exception as e: print(f"[Discord] failed to save active channels: {e}") def _is_active_channel(self, chat_id, now=None): now = now or time.time() with self._active_lock: item = self._active_channels.get(chat_id) if not item: return False if now - float(item.get("last_seen") or 0) > ACTIVE_TTL_SECONDS: self._active_channels.pop(chat_id, None) self._save_active_channels() print(f"[Discord] channel expired: {chat_id}") return False return True def _touch_active_channel(self, chat_id, now=None): if not chat_id.startswith("ch:"): return with self._active_lock: self._active_channels[chat_id] = {"last_seen": float(now or time.time())} self._save_active_channels() def _deactivate_channel(self, chat_id): with self._active_lock: changed = self._active_channels.pop(chat_id, None) is not None self._save_active_channels() state = self.user_tasks.get(chat_id) if state: state["running"] = False try: self._get_agent(chat_id).abort() except Exception as e: print(f"[Discord] deactivate abort failed for {chat_id}: {e}") return changed def _get_agent(self, chat_id): with self._agent_lock: ga = self._agents.get(chat_id) if ga is None: ga = GeneraticAgent() ga.verbose = False self._agents[chat_id] = ga threading.Thread(target=ga.run, daemon=True, name=f"discord-agent-{chat_id}").start() if len(self._agents) > 200: old_chat_id, _old_agent = self._agents.popitem(last=False) print(f"[Discord] dropped agent cache entry: {old_chat_id}") else: self._agents.move_to_end(chat_id) return ga async def _download_attachments(self, message): """Download attachments/images to MEDIA_DIR, return list of local paths.""" paths = [] for att in message.attachments: safe_name = re.sub(r'[<>:"/\\|?*]', '_', att.filename or f"file_{att.id}") local_path = os.path.join(MEDIA_DIR, f"{att.id}_{safe_name}") try: await 
att.save(local_path) paths.append(local_path) print(f"[Discord] saved attachment: {local_path}") except Exception as e: print(f"[Discord] failed to save attachment {att.filename}: {e}") return paths async def send_text(self, chat_id, content, **ctx): """Send text (and optionally files) to a chat_id.""" channel = self._channel_cache.get(chat_id) if channel is None: try: if chat_id.startswith("dm:"): user = await self.client.fetch_user(int(chat_id[3:])) channel = await user.create_dm() else: channel = await self.client.fetch_channel(int(chat_id[3:])) self._channel_cache[chat_id] = channel if len(self._channel_cache) > 500: self._channel_cache.popitem(last=False) except Exception as e: print(f"[Discord] cannot resolve channel for {chat_id}: {e}") return for part in split_text(content, self.split_limit): try: await channel.send(part) except Exception as e: print(f"[Discord] send error: {e}") async def send_done(self, chat_id, raw_text, **ctx): """Send final reply: text parts + file attachments.""" files = [p for p in extract_files(raw_text) if os.path.exists(p)] body = _display_done_text(raw_text) # Send text (send_text handles splitting internally) if body and body != "...": await self.send_text(chat_id, body, **ctx) # Send files as Discord attachments if files: channel = self._channel_cache.get(chat_id) if channel: for fpath in files: try: await channel.send(file=discord.File(fpath)) except Exception as e: print(f"[Discord] failed to send file {fpath}: {e}") await self.send_text(chat_id, f"⚠️ 文件发送失败: {os.path.basename(fpath)}", **ctx) if not body and not files: await self.send_text(chat_id, "...", **ctx) async def handle_command(self, chat_id, cmd, **ctx): """Handle slash commands against the per-chat agent, keeping Discord chats isolated.""" ga = self._get_agent(chat_id) parts = (cmd or "").split() op = (parts[0] if parts else "").lower() if op == "/help": return await self.send_text(chat_id, HELP_TEXT, **ctx) if op == "/stop": state = self.user_tasks.get(chat_id) 
if state: state["running"] = False ga.abort() return await self.send_text(chat_id, "⏹️ 正在停止...", **ctx) if op == "/status": llm = ga.get_llm_name() if ga.llmclient else "未配置" return await self.send_text(chat_id, f"状态: {'🔴 运行中' if ga.is_running else '🟢 空闲'}\nLLM: [{ga.llm_no}] {llm}", **ctx) if op == "/llm": if not ga.llmclient: return await self.send_text(chat_id, "❌ 当前没有可用的 LLM 配置", **ctx) if len(parts) > 1: try: ga.next_llm(int(parts[1])) return await self.send_text(chat_id, f"✅ 已切换到 [{ga.llm_no}] {ga.get_llm_name()}", **ctx) except Exception: return await self.send_text(chat_id, f"用法: /llm <0-{len(ga.list_llms()) - 1}>", **ctx) lines = [f"{'→' if cur else ' '} [{i}] {name}" for i, name, cur in ga.list_llms()] return await self.send_text(chat_id, "LLMs:\n" + "\n".join(lines), **ctx) if op == "/restore": try: restored_info, err = format_restore() if err: return await self.send_text(chat_id, err, **ctx) restored, fname, count = restored_info ga.abort() ga.history.extend(restored) return await self.send_text(chat_id, f"✅ 已恢复 {count} 轮对话\n来源: {fname}\n(仅恢复上下文,请输入新问题继续)", **ctx) except Exception as e: return await self.send_text(chat_id, f"❌ 恢复失败: {e}", **ctx) if op == "/continue": return await self.send_text(chat_id, _handle_continue_frontend(ga, cmd), **ctx) if op == "/new": return await self.send_text(chat_id, _reset_conversation(ga), **ctx) return await self.send_text(chat_id, HELP_TEXT, **ctx) async def run_agent(self, chat_id, text, **ctx): """Run the isolated per-chat Discord agent.""" ga = self._get_agent(chat_id) state = {"running": True} self.user_tasks[chat_id] = state try: await self.send_text(chat_id, "思考中...", **ctx) dq = ga.put_task(f"{FILE_HINT}\n\n{text}", source=self.source) last_ping = time.time() last_step = "" step_no = 0 while state["running"]: try: item = await asyncio.to_thread(dq.get, True, 3) except Q.Empty: if ga.is_running and time.time() - last_ping > self.ping_interval: await self.send_text(chat_id, "⏳ 还在处理中,请稍等...", **ctx) last_ping = 
time.time() continue if "next" in item: step = _extract_discord_progress(item.get("next", "")) if step and step != last_step: step_no += 1 await self.send_text(chat_id, f"步骤{step_no}:{step}", **ctx) last_step = step last_ping = time.time() continue if "done" in item: await self.send_done(chat_id, item.get("done", ""), **ctx) break if not state["running"]: await self.send_text(chat_id, "⏹️ 已停止", **ctx) except Exception as e: import traceback print(f"[{self.label}] run_agent error: {e}") traceback.print_exc() await self.send_text(chat_id, f"❌ 错误: {e}", **ctx) finally: self.user_tasks.pop(chat_id, None) async def _handle_message(self, message): # Ignore self if message.author == self.client.user or message.author.bot: return is_dm = isinstance(message.channel, discord.DMChannel) is_guild = message.guild is not None chat_id = self._chat_id(message) now = time.time() mentioned = bool(is_guild and self.client.user and self.client.user.mentioned_in(message)) self._channel_cache[chat_id] = message.channel if len(self._channel_cache) > 500: self._channel_cache.popitem(last=False) user_id = str(message.author.id) user_name = str(message.author) if not public_access(ALLOWED) and user_id not in ALLOWED: print(f"[Discord] unauthorized user: {user_name} ({user_id})") return if is_guild: active = self._is_active_channel(chat_id, now) if not mentioned and not active: return if mentioned or active: self._touch_active_channel(chat_id, now) # Strip bot mention from content content = message.content or "" if is_guild and self.client.user: content = re.sub(rf"<@!?{self.client.user.id}>", "", content).strip() else: content = content.strip() normalized = re.sub(r"\s+", "", content) if is_guild and normalized in EXIT_CHANNEL_TEXTS | EXIT_THREAD_TEXTS: self._deactivate_channel(chat_id) label = "子区" if normalized in EXIT_THREAD_TEXTS else "频道" await self.send_text(chat_id, f"✅ 已退出该{label},之后除非重新 @ 我,否则不会主动响应。") print(f"[Discord] manually deactivated {chat_id} by {user_name} ({user_id})") 
return # Download attachments attachment_paths = await self._download_attachments(message) # Build message text with attachment paths if attachment_paths: paths_text = "\n".join(f"[附件: {p}]" for p in attachment_paths) content = f"{content}\n{paths_text}" if content else paths_text if not content: return print(f"[Discord] message from {user_name} ({user_id}, {'dm' if is_dm else 'guild'}): {content[:200]}") if content.startswith("/"): return await self.handle_command(chat_id, content) task = asyncio.create_task(self.run_agent(chat_id, content)) self.background_tasks.add(task) task.add_done_callback(self.background_tasks.discard) async def start(self): print("[Discord] bot starting...") delay, max_delay = 5, 300 while True: started_at = time.monotonic() try: await self.client.start(BOT_TOKEN) except Exception as e: print(f"[Discord] error: {e}") if time.monotonic() - started_at >= 60: delay = 5 print(f"[Discord] reconnect in {delay}s...") await asyncio.sleep(delay) delay = min(delay * 2, max_delay) if __name__ == "__main__": _LOCK_SOCK = ensure_single_instance(19532, "Discord") require_runtime(agent, "Discord", discord_bot_token=BOT_TOKEN) redirect_log(__file__, "dcapp.log", "Discord", ALLOWED) asyncio.run(DiscordApp().start()) ================================================ FILE: frontends/desktop_pet.pyw ================================================ """Desktop Pet with HTTP Toast — ~90 lines""" import tkinter as tk, threading, random, os, sys from http.server import HTTPServer, BaseHTTPRequestHandler from urllib.parse import urlparse, parse_qs PORT = 41983 GIF = os.path.join(os.path.dirname(os.path.abspath(sys.argv[0])), 'pet.gif') class Pet: def __init__(self): self.root = tk.Tk() self.root.overrideredirect(True) self.root.wm_attributes('-topmost', True) self.root.wm_attributes('-transparentcolor', '#01FF01') self.root.config(bg='#01FF01') self.root.after(50, lambda: self.root.geometry('+300+500')) # load GIF frames self.frames, i = [], 0 while True: try: 
self.frames.append(tk.PhotoImage(file=GIF, format=f'gif -index {i}')); i += 1 except: break if not self.frames: raise FileNotFoundError(f'No GIF: {GIF}') self.idx = 0 self.label = tk.Label(self.root, image=self.frames[0], bg='#01FF01', bd=0) self.label.pack() # drag self.label.bind('', lambda e: setattr(self, '_d', (e.x, e.y))) self.label.bind('', self._drag) self.label.bind('', lambda e: (self.root.destroy(), os._exit(0))) # start loops self._animate() self._wander() self._start_server() self.root.mainloop() def _drag(self, e): x, y = self.root.winfo_x() + e.x - self._d[0], self.root.winfo_y() + e.y - self._d[1] self.root.geometry(f'+{x}+{y}') def _animate(self): self.idx = (self.idx + 1) % len(self.frames) self.label.config(image=self.frames[self.idx]) self.root.after(150, self._animate) def _wander(self): if random.random() < 0.25: x = self.root.winfo_x() + random.randint(-15, 15) y = self.root.winfo_y() + random.randint(-5, 5) self.root.geometry(f'+{x}+{y}') self.root.after(4000, self._wander) def show_toast(self, msg): """Show a speech bubble near the pet that auto-dismisses.""" tw = tk.Toplevel(self.root) tw.overrideredirect(True) tw.wm_attributes('-topmost', True) tw.config(bg='#FFFDE7') px, py = self.root.winfo_x(), self.root.winfo_y() tw.geometry(f'+{px + 30}+{py - 50}') # bubble content f = tk.Frame(tw, bg='#FFFDE7', highlightbackground='#888', highlightthickness=1, padx=8, pady=4) f.pack() tk.Label(f, text=msg, bg='#FFFDE7', fg='#333', font=('Segoe UI', 10), wraplength=220, justify='left').pack() # auto dismiss tw.after(3000, tw.destroy) def _start_server(self): pet = self class H(BaseHTTPRequestHandler): def do_GET(self): qs = parse_qs(urlparse(self.path).query) msg = qs.get('msg', [''])[0] if msg: pet.root.after(0, pet.show_toast, msg) self.send_response(200); self.end_headers(); self.wfile.write(b'ok') else: self.send_response(400); self.end_headers(); self.wfile.write(b'?msg=xxx') def do_POST(self): body = 
self.rfile.read(int(self.headers.get('Content-Length', 0))).decode() if body: pet.root.after(0, pet.show_toast, body) self.send_response(200); self.end_headers(); self.wfile.write(b'ok') else: self.send_response(400); self.end_headers(); self.wfile.write(b'empty body') def log_message(self, *a): pass HTTPServer.allow_reuse_address = False srv = HTTPServer(('127.0.0.1', PORT), H) t = threading.Thread(target=srv.serve_forever, daemon=True) t.start() print(f'Toast server: http://127.0.0.1:{PORT}/?msg=hello') if __name__ == '__main__': Pet() ================================================ FILE: frontends/desktop_pet_v2.pyw ================================================ """Desktop Pet with Skin System — Cross-platform with True Transparency""" import os, re, sys, json, threading, io from http.server import HTTPServer, BaseHTTPRequestHandler from urllib.parse import urlparse, parse_qs from PIL import Image, ImageDraw, ImageFont, ImageOps PORT = 41983 SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__)) PROJECT_DIR = os.path.dirname(SCRIPT_DIR) SKINS_DIR = os.path.join(SCRIPT_DIR, 'skins') class SkinLoader: """Load and parse skin configuration""" @staticmethod def load_skin(skin_path): """Load skin.json and return skin config""" config_file = os.path.join(skin_path, 'skin.json') if not os.path.exists(config_file): raise FileNotFoundError(f"skin.json not found in {skin_path}") with open(config_file, 'r', encoding='utf-8') as f: config = json.load(f) if 'animations' not in config: raise ValueError("skin.json must contain 'animations' field") config['path'] = skin_path return config @staticmethod def list_skins(): """List all available skins""" if not os.path.exists(SKINS_DIR): return [] skins = [] for item in os.listdir(SKINS_DIR): skin_path = os.path.join(SKINS_DIR, item) if os.path.isdir(skin_path): config_file = os.path.join(skin_path, 'skin.json') if os.path.exists(config_file): skins.append(item) return skins class AnimationLoader: """Load animation frames from 
sprite sheet""" @staticmethod def load_sprite_frames(skin_path, anim_config): """Load frames from sprite sheet""" file_path = os.path.join(skin_path, anim_config['file']) sprite_config = anim_config['sprite'] img = Image.open(file_path) frames = [] frame_width = sprite_config['frameWidth'] frame_height = sprite_config['frameHeight'] frame_count = sprite_config['frameCount'] columns = sprite_config['columns'] start_frame = sprite_config.get('startFrame', 0) for i in range(frame_count): frame_idx = start_frame + i row = frame_idx // columns col = frame_idx % columns x = col * frame_width y = row * frame_height frame = img.crop((x, y, x + frame_width, y + frame_height)) frames.append(frame) return frames def _load_default_font(size): """Load a usable font for bubble text.""" font_candidates = [ '/System/Library/Fonts/Supplemental/Arial Unicode.ttf', '/System/Library/Fonts/PingFang.ttc', '/System/Library/Fonts/STHeiti Light.ttc', 'C:/Windows/Fonts/msyh.ttc', 'C:/Windows/Fonts/simhei.ttf', 'C:/Windows/Fonts/arial.ttf', '/usr/share/fonts/truetype/noto/NotoSansCJK-Regular.ttc', '/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc', '/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc', '/usr/share/fonts/truetype/droid/DroidSansFallbackFull.ttf', '/usr/share/fonts/noto-cjk/NotoSansCJK-Regular.ttc', ] for font_path in font_candidates: if os.path.exists(font_path): try: return ImageFont.truetype(font_path, size=size) except Exception: pass return ImageFont.load_default() def _normalize_bubble_text(text): """Normalize text for fonts that cannot render some symbols.""" text = (text or '').strip() lines = text.replace('\r\n', '\n').replace('\r', '\n').split('\n') if lines: turn_match = re.match(r'^\s*🔄?\s*Turn\s+(\d+)\s*$', lines[0], flags=re.IGNORECASE) if turn_match: rest = '\n'.join(line.strip() for line in lines[1:] if line.strip()) return f"Turn {turn_match.group(1)}: {rest}" if rest else f"Turn {turn_match.group(1)}:" return text.replace('🔄 Turn', 'Turn').replace('🔄', 
'').strip() def _wrap_text_for_width(draw, text, font, max_width): """Wrap text to fit inside max_width.""" text = _normalize_bubble_text(text) if not text: return [''] paragraphs = text.replace('\r\n', '\n').replace('\r', '\n').split('\n') lines = [] for paragraph in paragraphs: if not paragraph: lines.append('') continue current = '' for ch in paragraph: candidate = current + ch bbox = draw.textbbox((0, 0), candidate, font=font) width = bbox[2] - bbox[0] if current and width > max_width: lines.append(current) current = ch else: current = candidate if current: lines.append(current) return lines or [''] def build_bubble_image(message, max_width=220): """Build a PIL image for the toast bubble using the user asset when available.""" message = (message or '').strip() bubble_path = next((p for p in [os.path.join(SCRIPT_DIR, 'chat_bubble.png'), os.path.join(SCRIPT_DIR, 'bubble.png')] if os.path.exists(p)), None) if bubble_path: bubble = Image.open(bubble_path).convert('RGBA') else: bubble = Image.new('RGBA', (256, 128), (255, 255, 255, 0)) draw = ImageDraw.Draw(bubble) draw.rounded_rectangle((8, 8, 247, 87), radius=12, fill=(255, 255, 255, 255), outline=(0, 0, 0, 255), width=3) draw.polygon([(48, 87), (72, 87), (56, 112)], fill=(255, 255, 255, 255), outline=(0, 0, 0, 255)) bubble = ImageOps.contain(bubble, (max_width, max(64, int(max_width * bubble.height / bubble.width))), Image.NEAREST) # Detect the actual opaque bubble region to position text correctly alpha = bubble.getchannel('A') content_box = alpha.getbbox() # (left, top, right, bottom) of opaque area if content_box: cb_left, cb_top, cb_right, cb_bottom = content_box else: cb_left, cb_top, cb_right, cb_bottom = 0, 0, bubble.width, bubble.height content_w = cb_right - cb_left content_h = cb_bottom - cb_top font_size = max(12, content_h // 6) font = _load_default_font(font_size) draw = ImageDraw.Draw(bubble) # Padding relative to the opaque bubble region, not the full image inner_pad_x = max(6, content_w // 14) 
inner_pad_top = max(4, content_h // 12) inner_pad_bottom = max(12, content_h // 4) text_area_width = max(36, content_w - inner_pad_x * 2) lines = _wrap_text_for_width(draw, message, font, text_area_width) ascent, descent = font.getmetrics() if hasattr(font, 'getmetrics') else (font_size, font_size // 4) line_height = max(font_size, ascent + descent) usable_h = content_h - inner_pad_top - inner_pad_bottom max_lines = max(1, usable_h // line_height) if len(lines) > max_lines: lines = lines[:max_lines] if lines: last = lines[-1] while last and draw.textbbox((0, 0), last + '…', font=font)[2] > text_area_width: last = last[:-1] lines[-1] = (last + '…') if last else '…' total_text_height = len(lines) * line_height y = cb_top + inner_pad_top + max(0, (usable_h - total_text_height) // 2) - 3 for line in lines: bbox = draw.textbbox((0, 0), line, font=font) text_width = bbox[2] - bbox[0] x = cb_left + inner_pad_x + (text_area_width - text_width) / 2 draw.text((x, y), line, font=font, fill=(32, 32, 32, 255)) y += line_height alpha = bubble.getchannel('A') bbox = alpha.getbbox() if bbox: bubble = bubble.crop(bbox) width, height = bubble.size alpha = bubble.getchannel('A') bottom_y = height - 1 tail_x = width // 2 for y in range(height - 1, -1, -1): xs = [x for x in range(width) if alpha.getpixel((x, y)) > 0] if xs: bottom_y = y tail_x = xs[len(xs) // 2] break return { 'image': bubble, 'size': bubble.size, 'tail_tip': (tail_x, bottom_y), } # ============================================================================ # Shared Base Class # ============================================================================ class PetBase: """Shared logic for Mac and Windows pet implementations.""" def _schedule_main(self, fn): """Schedule fn on the GUI main thread. 
Subclasses must override.""" raise NotImplementedError def set_state_safe(self, state): """Thread-safe wrapper for set_state.""" self._schedule_main(lambda: self.set_state(state)) def show_toast_safe(self, message): """Thread-safe wrapper for show_toast.""" self._schedule_main(lambda m=message: self.show_toast(m)) def _start_server(self): """Start HTTP control server.""" pet = self class Handler(BaseHTTPRequestHandler): def do_GET(self): parsed = urlparse(self.path) params = parse_qs(parsed.query) if 'state' in params: state = params['state'][0] pet.set_state_safe(state) self.send_response(200) self.end_headers() self.wfile.write(b'ok') elif 'msg' in params: msg = params['msg'][0] pet.show_toast_safe(msg) self.send_response(200) self.end_headers() self.wfile.write(b'ok') else: self.send_response(400) self.end_headers() self.wfile.write(b'?state=idle/walk/run/sprint or ?msg=hello') def do_POST(self): body = self.rfile.read(int(self.headers.get('Content-Length', 0))).decode() if body: pet.show_toast_safe(body) self.send_response(200) self.end_headers() self.wfile.write(b'ok') else: self.send_response(400) self.end_headers() self.wfile.write(b'empty body') def log_message(self, *a): pass try: HTTPServer.allow_reuse_address = True srv = HTTPServer(('127.0.0.1', PORT), Handler) threading.Thread(target=srv.serve_forever, daemon=True).start() print(f'✓ Server: http://127.0.0.1:{PORT}/?state=walk') except OSError as e: # EADDRINUSE is 48 on macOS, 98 on Linux, 10048 on Windows if e.errno in (48, 98, 10048): print(f'⚠ Port {PORT} already in use') else: raise # ============================================================================ # macOS Implementation - Pure Cocoa with True Transparency # ============================================================================ if sys.platform == 'darwin': from Cocoa import ( NSApplication, NSWindow, NSImageView, NSImage, NSData, NSTimer, NSMenu, NSMenuItem, NSApp, NSFloatingWindowLevel, NSColor, NSBackingStoreBuffered, NSWindowStyleMaskBorderless, NSApplicationActivationPolicyAccessory ) from 
Foundation import NSMakeRect, NSMakePoint, NSMakeSize from PyObjCTools import AppHelper import objc class MacPet(PetBase): def __init__(self, skin_name=None): self.app = NSApplication.sharedApplication() self.app.setActivationPolicy_(NSApplicationActivationPolicyAccessory) # Load skin self.load_skin(skin_name) self.available_skins = SkinLoader.list_skins() # Get screen size from AppKit import NSScreen, NSWindowCollectionBehaviorCanJoinAllSpaces, NSWindowCollectionBehaviorStationary screen = NSScreen.mainScreen() screen_frame = screen.frame() screen_width = screen_frame.size.width screen_height = screen_frame.size.height # Position at right side x_pos = screen_width - 200 y_pos = 300 # Create transparent window self.window = NSWindow.alloc().initWithContentRect_styleMask_backing_defer_( NSMakeRect(x_pos, y_pos, self.display_width, self.display_height), NSWindowStyleMaskBorderless, NSBackingStoreBuffered, False ) self.window.setOpaque_(False) self.window.setBackgroundColor_(NSColor.clearColor()) self.window.setLevel_(NSFloatingWindowLevel) self.window.setMovableByWindowBackground_(True) self.window.setAcceptsMouseMovedEvents_(True) # Make window sticky across spaces (stays in fixed screen position) self.window.setCollectionBehavior_( NSWindowCollectionBehaviorCanJoinAllSpaces | NSWindowCollectionBehaviorStationary ) # Create custom view for handling mouse events from AppKit import NSView from objc import super as objc_super class DraggableImageView(NSView): """Custom view that handles dragging and double-click""" def initWithFrame_(self, frame): self = objc_super(DraggableImageView, self).initWithFrame_(frame) if self is None: return None self.image_view = NSImageView.alloc().initWithFrame_(self.bounds()) self.image_view.setImageScaling_(1) # NSImageScaleProportionallyUpOrDown self.addSubview_(self.image_view) # Create overlay view for toast (always on top) # Make it non-opaque so it doesn't block the image self.overlay_view = 
NSView.alloc().initWithFrame_(self.bounds()) self.overlay_view.setWantsLayer_(True) self.addSubview_(self.overlay_view) self.drag_start = None return self def mouseDown_(self, event): """Handle mouse down for dragging""" if event.clickCount() == 2: # Double-click to quit from AppKit import NSApp NSApp.terminate_(None) else: # Start dragging self.drag_start = event.locationInWindow() def mouseDragged_(self, event): """Handle mouse drag""" if self.drag_start: current_location = event.locationInWindow() window_frame = self.window().frame() dx = current_location.x - self.drag_start.x dy = current_location.y - self.drag_start.y new_origin = NSMakePoint( window_frame.origin.x + dx, window_frame.origin.y + dy ) self.window().setFrameOrigin_(new_origin) def acceptsFirstMouse_(self, event): """Accept first mouse click""" return True def rightMouseDown_(self, event): from AppKit import NSMenu, NSMenuItem, NSApp menu = NSMenu.alloc().init() pet = getattr(self, 'mac_pet', None) or self.window().delegate() if not pet: return for skin_name in pet.available_skins: # preload this in MacPet.__init__ item = NSMenuItem.alloc().initWithTitle_action_keyEquivalent_( skin_name, 'changeSkin:', '' ) item.setTarget_(pet) item.setRepresentedObject_(skin_name) menu.addItem_(item) menu.addItem_(NSMenuItem.separatorItem()) quit_item = NSMenuItem.alloc().initWithTitle_action_keyEquivalent_('Quit', 'terminate:', '') menu.addItem_(quit_item) NSApp.activateIgnoringOtherApps_(True) NSMenu.popUpContextMenu_withEvent_forView_(menu, event, self) # Create draggable view self.content_view = DraggableImageView.alloc().initWithFrame_( NSMakeRect(0, 0, self.display_width, self.display_height) ) self.content_view.mac_pet = self self.image_view = self.content_view.image_view self.overlay_view = self.content_view.overlay_view self.window.setContentView_(self.content_view) # Animation state self.current_state = 'idle' self.frame_idx = 0 # Toast state self.toast_label = None self.toast_timer = None 
self.toast_image = None self.toast_window = None # Start animation timer self.timer = NSTimer.scheduledTimerWithTimeInterval_target_selector_userInfo_repeats_( 1.0 / self.animations[self.current_state]['fps'], self, 'animate:', None, True ) # Show window self.window.makeKeyAndOrderFront_(None) # Start HTTP server self._start_server() print(f"✓ macOS Pet started at ({x_pos}, {y_pos})") print(f" Animations: {', '.join(self.animations.keys())}") def load_skin(self, skin_name=None): """Load skin configuration and animations""" available_skins = SkinLoader.list_skins() if not available_skins: raise FileNotFoundError(f"No skins found in {SKINS_DIR}") if skin_name is None or skin_name not in available_skins: skin_name = available_skins[0] skin_path = os.path.join(SKINS_DIR, skin_name) self.skin_config = SkinLoader.load_skin(skin_path) # Get display size display_size = self.skin_config.get('size', {}) self.display_width = display_size.get('width', 128) self.display_height = display_size.get('height', 128) # Load animations self.animations = {} for anim_name, anim_config in self.skin_config['animations'].items(): pil_frames = AnimationLoader.load_sprite_frames(skin_path, anim_config) # Scale frames scaled_frames = [] for frame in pil_frames: if frame.mode != 'RGBA': frame = frame.convert('RGBA') scaled = frame.resize((self.display_width, self.display_height), Image.NEAREST) scaled_frames.append(scaled) # Convert to NSImage with proper alpha handling ns_images = [] for pil_img in scaled_frames: # Convert PIL to PNG bytes (PNG preserves alpha channel) png_buffer = io.BytesIO() pil_img.save(png_buffer, format='PNG') png_data = png_buffer.getvalue() # Create NSImage from PNG data ns_data = NSData.dataWithBytes_length_(png_data, len(png_data)) ns_image = NSImage.alloc().initWithData_(ns_data) ns_images.append(ns_image) self.animations[anim_name] = { 'frames': ns_images, 'fps': anim_config.get('sprite', {}).get('fps', 6) } def animate_(self, timer): """Animation callback""" anim 
= self.animations[self.current_state] frames = anim['frames'] if frames: self.image_view.setImage_(frames[self.frame_idx]) self.frame_idx = (self.frame_idx + 1) % len(frames) def set_state(self, state): """Change animation state (must be called on main thread)""" if state in self.animations and state != self.current_state: self.current_state = state self.frame_idx = 0 # Update timer interval self.timer.invalidate() self.timer = NSTimer.scheduledTimerWithTimeInterval_target_selector_userInfo_repeats_( 1.0 / self.animations[self.current_state]['fps'], self, 'animate:', None, True ) print(f"→ State: {state}") def _schedule_main(self, fn): AppHelper.callAfter(fn) def show_toast(self, message): """Show toast message above pet""" from AppKit import NSImageView if self.toast_window: self.toast_window.orderOut_(None) self.toast_window = None self.toast_label = None if self.toast_timer: self.toast_timer.invalidate() self.toast_timer = None bubble_info = build_bubble_image(message, max_width=max(180, min(260, self.display_width * 2))) bubble_pil = bubble_info['image'] bubble_width, bubble_height = bubble_info['size'] tail_x, tail_y = bubble_info['tail_tip'] png_buffer = io.BytesIO() bubble_pil.save(png_buffer, format='PNG') png_data = png_buffer.getvalue() ns_data = NSData.dataWithBytes_length_(png_data, len(png_data)) self.toast_image = NSImage.alloc().initWithData_(ns_data) pet_frame = self.window.frame() anchor_x = pet_frame.origin.x + self.display_width * 0.75 anchor_y = pet_frame.origin.y + self.display_height * 1.65 toast_x = anchor_x - tail_x toast_y = anchor_y - tail_y self.toast_window = NSWindow.alloc().initWithContentRect_styleMask_backing_defer_( NSMakeRect(toast_x, toast_y, bubble_width, bubble_height), NSWindowStyleMaskBorderless, NSBackingStoreBuffered, False ) self.toast_window.setOpaque_(False) self.toast_window.setBackgroundColor_(NSColor.clearColor()) self.toast_window.setLevel_(NSFloatingWindowLevel) self.toast_window.setIgnoresMouseEvents_(True) 
self.toast_window.setHasShadow_(False) self.toast_label = NSImageView.alloc().initWithFrame_( NSMakeRect(0, 0, bubble_width, bubble_height) ) self.toast_label.setImage_(self.toast_image) self.toast_label.setImageScaling_(0) self.toast_window.setContentView_(self.toast_label) self.toast_window.orderFrontRegardless() self.toast_timer = NSTimer.scheduledTimerWithTimeInterval_target_selector_userInfo_repeats_( 3.0, self, 'hideToast:', None, False ) print(f"Toast: {message}") def hideToast_(self, timer): """Hide toast message""" if self.toast_window: self.toast_window.orderOut_(None) self.toast_window = None self.toast_label = None self.toast_image = None self.toast_timer = None def run(self): """Run the application""" AppHelper.runEventLoop() def changeSkin_(self, sender): skin_name = sender.representedObject() print(f"Changing skin to: {skin_name}") self.load_skin(skin_name) self.current_state = 'idle' self.frame_idx = 0 # ============================================================================ # Windows/Linux Implementations # ============================================================================ else: if sys.platform.startswith('win'): import tkinter as tk from PIL import ImageTk class WinPet(PetBase): def __init__(self, skin_name=None): self.root = tk.Tk() self.root.wm_attributes('-topmost', True) self.is_windows = sys.platform.startswith('win') self.platform_name = 'Windows' if self.is_windows else 'Linux' self.pet_bg_color = '#F0F0F0' if self.is_windows else 'black' self.toast_bg_color = '#00ff01' if self.is_windows else 'black' # Load skin self.load_skin(skin_name) # Setup window screen_width = self.root.winfo_screenwidth() screen_height = self.root.winfo_screenheight() x_pos = screen_width - 200 y_pos = screen_height - 300 self.root.geometry(f'{self.display_width}x{self.display_height}+{x_pos}+{y_pos}') self.root.overrideredirect(True) self.root.wm_attributes('-topmost', True) # Transparent background if self.is_windows: 
self.root.wm_attributes('-transparentcolor', self.pet_bg_color) self.root.config(bg=self.pet_bg_color) # Create label self.label = tk.Label(self.root, bg=self.pet_bg_color, bd=0) self.label.pack() # Bind events self.label.bind('<Button-1>', lambda e: setattr(self, '_d', (e.x, e.y))) self.label.bind('<B1-Motion>', self._drag) self.label.bind('<Double-Button-1>', lambda e: (self.root.destroy(), os._exit(0))) self.label.bind('<Button-3>', self._on_right_click) # Animation state self.current_state = 'idle' self.frame_idx = 0 # Toast state self.toast_window = None self.toast_photo = None # Start animation self._animate() self._start_server() print(f"✓ {self.platform_name} Pet started at ({x_pos}, {y_pos})") print(f" Animations: {', '.join(self.animations.keys())}") self.root.mainloop() def load_skin(self, skin_name=None): """Load skin configuration and animations""" available_skins = SkinLoader.list_skins() if not available_skins: raise FileNotFoundError(f"No skins found in {SKINS_DIR}") if skin_name is None or skin_name not in available_skins: skin_name = available_skins[0] skin_path = os.path.join(SKINS_DIR, skin_name) self.skin_config = SkinLoader.load_skin(skin_path) # Get display size display_size = self.skin_config.get('size', {}) self.display_width = display_size.get('width', 128) self.display_height = display_size.get('height', 128) # Load animations self.animations = {} for anim_name, anim_config in self.skin_config['animations'].items(): pil_frames = AnimationLoader.load_sprite_frames(skin_path, anim_config) # Scale and convert frames tk_frames = [] for frame in pil_frames: if frame.mode != 'RGBA': frame = frame.convert('RGBA') scaled = frame.resize((self.display_width, self.display_height), Image.NEAREST) tk_frames.append(ImageTk.PhotoImage(scaled)) self.animations[anim_name] = { 'frames': tk_frames, 'fps': anim_config.get('sprite', {}).get('fps', 6) } def set_state(self, state): """Change animation state""" if state in self.animations and state != self.current_state: self.current_state = state 
self.frame_idx = 0 print(f"→ State: {state}") def _drag(self, e): x = self.root.winfo_x() + e.x - self._d[0] y = self.root.winfo_y() + e.y - self._d[1] self.root.geometry(f'+{x}+{y}') def _animate(self): """Animate current state""" if self.current_state not in self.animations: self.root.after(100, self._animate) return anim = self.animations[self.current_state] frames = anim['frames'] if frames: self.label.config(image=frames[self.frame_idx]) self.frame_idx = (self.frame_idx + 1) % len(frames) delay = int(1000 / anim['fps']) self.root.after(delay, self._animate) def show_toast(self, message): """Show toast message above pet""" if self.toast_window: try: self.toast_window.destroy() except: pass self.toast_window = None bubble_info = build_bubble_image(message, max_width=max(180, min(260, self.display_width * 2))) bubble_pil = bubble_info['image'] bubble_width, bubble_height = bubble_info['size'] tail_x, tail_y = bubble_info['tail_tip'] self.toast_photo = ImageTk.PhotoImage(bubble_pil) self.toast_window = tk.Toplevel(self.root) self.toast_window.overrideredirect(True) self.toast_window.wm_attributes('-topmost', True) if self.is_windows: self.toast_window.wm_attributes('-transparentcolor', self.toast_bg_color) self.toast_window.config(bg=self.toast_bg_color) toast_label = tk.Label( self.toast_window, image=self.toast_photo, bg=self.toast_bg_color, bd=0, highlightthickness=0 ) toast_label.pack() pet_x = self.root.winfo_x() pet_y = self.root.winfo_y() anchor_x = pet_x + int(self.display_width * 0.75) anchor_y = pet_y toast_x = anchor_x - tail_x toast_y = anchor_y - bubble_height self.toast_window.geometry(f'{bubble_width}x{bubble_height}+{toast_x}+{toast_y}') self.root.after(3000, self._hide_toast) print(f"Toast: {message}") def _hide_toast(self): """Hide toast message""" if self.toast_window: try: self.toast_window.destroy() self.toast_window = None except: pass def _schedule_main(self, fn): self.root.after(0, fn) def run(self): """Run the application (already in 
mainloop)""" pass def _on_right_click(self, event): # Build a dynamic menu of all available skins menu = tk.Menu(self.root, tearoff=0) for skin_name in SkinLoader.list_skins(): menu.add_command( label=skin_name, command=lambda name=skin_name: self._change_skin(name) ) menu.add_separator() menu.add_command(label="Quit", command=lambda: (self.root.destroy(), os._exit(0))) menu.tk_popup(event.x_root, event.y_root) def _change_skin(self, skin_name): print(f"Changing skin to: {skin_name}") self.load_skin(skin_name) self.current_state = 'idle' self.frame_idx = 0 else: from PySide6.QtCore import Qt, QTimer, QPoint from PySide6.QtGui import QAction, QCursor, QImage, QPixmap from PySide6.QtWidgets import QApplication, QLabel, QMenu, QWidget class _LinuxPetLabel(QLabel): def __init__(self, pet): super().__init__() self.pet = pet self.drag_offset = None def mousePressEvent(self, event): if event.button() == Qt.LeftButton: self.drag_offset = event.globalPosition().toPoint() - self.pet.window.frameGeometry().topLeft() event.accept() return if event.button() == Qt.RightButton: self.pet._show_context_menu(event.globalPosition().toPoint()) event.accept() return super().mousePressEvent(event) def mouseMoveEvent(self, event): if self.drag_offset is not None and (event.buttons() & Qt.LeftButton): self.pet.window.move(event.globalPosition().toPoint() - self.drag_offset) self.pet._reposition_toast() event.accept() return super().mouseMoveEvent(event) def mouseReleaseEvent(self, event): if event.button() == Qt.LeftButton: self.drag_offset = None super().mouseReleaseEvent(event) def mouseDoubleClickEvent(self, event): if event.button() == Qt.LeftButton: QApplication.instance().quit() event.accept() return super().mouseDoubleClickEvent(event) class LinuxPet(PetBase): def __init__(self, skin_name=None): self.app = QApplication.instance() or QApplication(sys.argv) self.available_skins = SkinLoader.list_skins() self.load_skin(skin_name) screen = self.app.primaryScreen() screen_geo = 
screen.availableGeometry() if screen else None if screen_geo: x_pos = screen_geo.right() - self.display_width - 72 y_pos = screen_geo.bottom() - self.display_height - 120 else: x_pos, y_pos = 1200, 700 self.window = QWidget() self.window.setWindowFlags( Qt.FramelessWindowHint | Qt.WindowStaysOnTopHint | Qt.Tool ) self.window.setAttribute(Qt.WA_TranslucentBackground, True) self.window.setAttribute(Qt.WA_ShowWithoutActivating, True) self.window.resize(self.display_width, self.display_height) self.window.move(x_pos, y_pos) self.label = _LinuxPetLabel(self) self.label.setParent(self.window) self.label.setGeometry(0, 0, self.display_width, self.display_height) self.label.setAttribute(Qt.WA_TranslucentBackground, True) self.label.setStyleSheet('background: transparent;') self.label.setScaledContents(True) self.current_state = 'idle' self.frame_idx = 0 self.toast_window = None self.toast_label = None self.toast_pixmap = None self.anim_timer = QTimer() self.anim_timer.timeout.connect(self._animate) self._restart_animation_timer() self.window.show() self._start_server() print(f"✓ Linux PySide6 Pet started at ({x_pos}, {y_pos})") print(f" Animations: {', '.join(self.animations.keys())}") def _pil_to_qpixmap(self, pil_img): buffer = io.BytesIO() pil_img.save(buffer, format='PNG') qimage = QImage.fromData(buffer.getvalue(), 'PNG') return QPixmap.fromImage(qimage) def load_skin(self, skin_name=None): available_skins = SkinLoader.list_skins() if not available_skins: raise FileNotFoundError(f"No skins found in {SKINS_DIR}") if skin_name is None or skin_name not in available_skins: skin_name = available_skins[0] skin_path = os.path.join(SKINS_DIR, skin_name) self.skin_config = SkinLoader.load_skin(skin_path) display_size = self.skin_config.get('size', {}) self.display_width = display_size.get('width', 128) self.display_height = display_size.get('height', 128) self.animations = {} for anim_name, anim_config in self.skin_config['animations'].items(): pil_frames = 
AnimationLoader.load_sprite_frames(skin_path, anim_config) qt_frames = [] for frame in pil_frames: if frame.mode != 'RGBA': frame = frame.convert('RGBA') scaled = frame.resize((self.display_width, self.display_height), Image.NEAREST) qt_frames.append(self._pil_to_qpixmap(scaled)) self.animations[anim_name] = { 'frames': qt_frames, 'fps': anim_config.get('sprite', {}).get('fps', 6) } if hasattr(self, 'window'): self.window.resize(self.display_width, self.display_height) self.label.setGeometry(0, 0, self.display_width, self.display_height) self._animate(force=True) self._reposition_toast() def _restart_animation_timer(self): anim = self.animations.get(self.current_state) or next(iter(self.animations.values())) fps = max(1, anim.get('fps', 6)) self.anim_timer.start(int(1000 / fps)) def _animate(self, force=False): if self.current_state not in self.animations: return anim = self.animations[self.current_state] frames = anim['frames'] if not frames: return if force: self.frame_idx = 0 self.label.setPixmap(frames[self.frame_idx]) self.frame_idx = (self.frame_idx + 1) % len(frames) def set_state(self, state): if state in self.animations and state != self.current_state: self.current_state = state self.frame_idx = 0 self._restart_animation_timer() print(f"→ State: {state}") def _show_context_menu(self, global_pos): menu = QMenu(self.window) for skin_name in SkinLoader.list_skins(): action = QAction(skin_name, menu) action.triggered.connect(lambda checked=False, name=skin_name: self._change_skin(name)) menu.addAction(action) menu.addSeparator() quit_action = QAction('Quit', menu) quit_action.triggered.connect(QApplication.instance().quit) menu.addAction(quit_action) menu.popup(global_pos) def _compute_toast_geometry(self, bubble_width, bubble_height, tail_x, tail_y): pet_pos = self.window.frameGeometry().topLeft() anchor_x = pet_pos.x() + int(self.display_width * 0.75) anchor_y = pet_pos.y() + int(self.display_height * 0.15) return anchor_x - tail_x, anchor_y - tail_y - 
bubble_height // 2 def show_toast(self, message): if self.toast_window: self.toast_window.close() self.toast_window = None self.toast_label = None self.toast_pixmap = None bubble_info = build_bubble_image(message, max_width=max(180, min(260, self.display_width * 2))) bubble_pil = bubble_info['image'] bubble_width, bubble_height = bubble_info['size'] tail_x, tail_y = bubble_info['tail_tip'] self.toast_pixmap = self._pil_to_qpixmap(bubble_pil) self.toast_window = QWidget() self.toast_window.setWindowFlags( Qt.FramelessWindowHint | Qt.WindowStaysOnTopHint | Qt.Tool | Qt.WindowTransparentForInput ) self.toast_window.setAttribute(Qt.WA_TranslucentBackground, True) self.toast_window.setAttribute(Qt.WA_ShowWithoutActivating, True) self.toast_window.resize(bubble_width, bubble_height) self.toast_label = QLabel(self.toast_window) self.toast_label.setGeometry(0, 0, bubble_width, bubble_height) self.toast_label.setPixmap(self.toast_pixmap) self.toast_label.setAttribute(Qt.WA_TranslucentBackground, True) self.toast_label.setStyleSheet('background: transparent;') toast_x, toast_y = self._compute_toast_geometry(bubble_width, bubble_height, tail_x, tail_y) self.toast_window.move(toast_x, toast_y) self.toast_window.show() QTimer.singleShot(3000, self._hide_toast) print(f"Toast: {message}") def _reposition_toast(self): if not self.toast_window: return label_pixmap = self.toast_label.pixmap() if self.toast_label else None if label_pixmap is None: return bubble_width = label_pixmap.width() bubble_height = label_pixmap.height() toast_x, toast_y = self._compute_toast_geometry( bubble_width, bubble_height, bubble_width // 2, bubble_height ) self.toast_window.move(toast_x, toast_y) def _hide_toast(self): if self.toast_window: self.toast_window.close() self.toast_window = None self.toast_label = None self.toast_pixmap = None def _schedule_main(self, fn): QTimer.singleShot(0, fn) def _change_skin(self, skin_name): print(f"Changing skin to: {skin_name}") self.load_skin(skin_name) 
self.current_state = 'idle' self.frame_idx = 0 self._restart_animation_timer() def run(self): self.app.exec() if __name__ == '__main__': # Singleton: if port already in use, another instance is running import socket _s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) try: _s.connect(('127.0.0.1', PORT)) _s.close() print(f'⚠ Pet already running on port {PORT}, exiting.') sys.exit(0) except ConnectionRefusedError: pass if sys.platform == 'darwin': pet = MacPet('vita') pet.run() elif sys.platform.startswith('win'): pet = WinPet('vita') else: pet = LinuxPet('vita') pet.run() ================================================ FILE: frontends/dingtalkapp.py ================================================ import asyncio, json, os, sys, threading, time import requests sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) from agentmain import GeneraticAgent from chatapp_common import AgentChatMixin, ensure_single_instance, public_access, redirect_log, require_runtime, split_text from llmcore import mykeys try: from dingtalk_stream import AckMessage, CallbackHandler, Credential, DingTalkStreamClient from dingtalk_stream.chatbot import ChatbotMessage except Exception: print("Please install dingtalk-stream to use DingTalk: pip install dingtalk-stream") sys.exit(1) agent = GeneraticAgent(); agent.verbose = False CLIENT_ID = str(mykeys.get("dingtalk_client_id", "") or "").strip() CLIENT_SECRET = str(mykeys.get("dingtalk_client_secret", "") or "").strip() ALLOWED = {str(x).strip() for x in mykeys.get("dingtalk_allowed_users", []) if str(x).strip()} USER_TASKS = {} class DingTalkApp(AgentChatMixin): label, source, split_limit = "DingTalk", "dingtalk", 1800 def __init__(self): super().__init__(agent, USER_TASKS) self.client, self.access_token, self.token_expiry, self.background_tasks = None, None, 0, set() async def _get_access_token(self): if self.access_token and time.time() < self.token_expiry: return self.access_token def _fetch(): resp = 
requests.post("https://api.dingtalk.com/v1.0/oauth2/accessToken", json={"appKey": CLIENT_ID, "appSecret": CLIENT_SECRET}, timeout=20) resp.raise_for_status() return resp.json() last_err = None for attempt in range(2): try: data = await asyncio.to_thread(_fetch) self.access_token = data.get("accessToken") self.token_expiry = time.time() + int(data.get("expireIn", 7200)) - 60 return self.access_token except Exception as e: last_err = e if attempt == 0: await asyncio.sleep(1) print(f"[DingTalk] token error after retry: {last_err}") return None async def _send_batch_message(self, chat_id, msg_key, msg_param): token = await self._get_access_token() if not token: return False headers = {"x-acs-dingtalk-access-token": token} if chat_id.startswith("group:"): url = "https://api.dingtalk.com/v1.0/robot/groupMessages/send" payload = {"robotCode": CLIENT_ID, "openConversationId": chat_id[6:], "msgKey": msg_key, "msgParam": json.dumps(msg_param, ensure_ascii=False)} else: url = "https://api.dingtalk.com/v1.0/robot/oToMessages/batchSend" payload = {"robotCode": CLIENT_ID, "userIds": [chat_id], "msgKey": msg_key, "msgParam": json.dumps(msg_param, ensure_ascii=False)} def _post(): resp = requests.post(url, json=payload, headers=headers, timeout=20) body = resp.text if resp.status_code != 200: raise RuntimeError(f"HTTP {resp.status_code}: {body[:300]}") result = resp.json() if "json" in resp.headers.get("content-type", "") else {} errcode = result.get("errcode") if errcode not in (None, 0): raise RuntimeError(f"API errcode={errcode}: {body[:300]}") return True try: return await asyncio.to_thread(_post) except Exception as e: print(f"[DingTalk] send error: {e}") return False async def send_text(self, chat_id, content): for part in split_text(content, self.split_limit): await self._send_batch_message(chat_id, "sampleMarkdown", {"text": part, "title": "Agent Reply"}) async def on_message(self, content, sender_id, sender_name, conversation_type=None, conversation_id=None): try: if not 
content: return if not public_access(ALLOWED) and sender_id not in ALLOWED: print(f"[DingTalk] unauthorized user: {sender_id}") return is_group = conversation_type == "2" and conversation_id chat_id = f"group:{conversation_id}" if is_group else sender_id print(f"[DingTalk] message from {sender_name} ({sender_id}): {content}") if content.startswith("/"): return await self.handle_command(chat_id, content) task = asyncio.create_task(self.run_agent(chat_id, content)) self.background_tasks.add(task) task.add_done_callback(self.background_tasks.discard) except Exception: import traceback print("[DingTalk] handle_message error") traceback.print_exc() async def start(self): self.client = DingTalkStreamClient(Credential(CLIENT_ID, CLIENT_SECRET)) self.client.register_callback_handler(ChatbotMessage.TOPIC, _DingTalkHandler(self)) print("[DingTalk] bot starting...") delay, max_delay = 5, 300 while True: started_at = time.monotonic() try: await self.client.start() except Exception as e: print(f"[DingTalk] stream error: {e}") # any session that lived >=60s is treated as healthy -> reset backoff if time.monotonic() - started_at >= 60: delay = 5 print(f"[DingTalk] reconnect in {delay}s...") await asyncio.sleep(delay) delay = min(delay * 2, max_delay) class _DingTalkHandler(CallbackHandler): def __init__(self, app): super().__init__() self.app = app async def process(self, message): try: chatbot_msg = ChatbotMessage.from_dict(message.data) text = getattr(getattr(chatbot_msg, "text", None), "content", "") or "" extensions = getattr(chatbot_msg, "extensions", None) or {} recognition = ((extensions.get("content") or {}).get("recognition") or "").strip() if isinstance(extensions, dict) else "" if not (text := text.strip()): text = recognition or str((message.data.get("text", {}) or {}).get("content", "") or "").strip() sender_id = str(getattr(chatbot_msg, "sender_staff_id", None) or getattr(chatbot_msg, "sender_id", None) or "unknown") sender_name = getattr(chatbot_msg, "sender_nick", 
None) or "Unknown" await self.app.on_message(text, sender_id, sender_name, message.data.get("conversationType"), message.data.get("conversationId") or message.data.get("openConversationId")) except Exception as e: print(f"[DingTalk] callback error: {e}") return AckMessage.STATUS_OK, "OK" if __name__ == "__main__": _LOCK_SOCK = ensure_single_instance(19530, "DingTalk") require_runtime(agent, "DingTalk", dingtalk_client_id=CLIENT_ID, dingtalk_client_secret=CLIENT_SECRET) redirect_log(__file__, "dingtalkapp.log", "DingTalk", ALLOWED) threading.Thread(target=agent.run, daemon=True).start() asyncio.run(DingTalkApp().start()) ================================================ FILE: frontends/fsapp.py ================================================ import glob, json, os, queue as Q, re, sys, threading, time PROJECT_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) sys.path.insert(0, PROJECT_ROOT) os.chdir(PROJECT_ROOT) from agentmain import GeneraticAgent from frontends.chatapp_common import format_restore from frontends.continue_cmd import handle_frontend_command as handle_continue_frontend, reset_conversation from llmcore import mykeys import traceback import lark_oapi as lark from lark_oapi.api.im.v1 import * _TAG_PATS = [r"<" + t + r">.*?" 
+ r"</" + t + r">" for t in ("thinking", "summary", "tool_use", "file_content")] _IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp", ".ico", ".tiff", ".tif"} _AUDIO_EXTS = {".opus", ".mp3", ".wav", ".m4a", ".aac"} _VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".webm"} _FILE_TYPE_MAP = { ".opus": "opus", ".mp4": "mp4", ".pdf": "pdf", ".doc": "doc", ".docx": "doc", ".xls": "xls", ".xlsx": "xls", ".ppt": "ppt", ".pptx": "ppt", } _MSG_TYPE_MAP = {"image": "[image]", "audio": "[audio]", "file": "[file]", "media": "[media]", "sticker": "[sticker]"} TEMP_DIR = os.path.join(PROJECT_ROOT, "temp") MEDIA_DIR = os.path.join(TEMP_DIR, "feishu_media") os.makedirs(MEDIA_DIR, exist_ok=True) _TRUNC_TAIL = 300 # number of trailing characters of the raw text kept by the truncation fallback def _clean(text): for pat in _TAG_PATS: text = re.sub(pat, "", text or "", flags=re.DOTALL) return re.sub(r"\n{3,}", "\n\n", text).strip() def _extract_files(text): return re.findall(r"\[FILE:([^\]]+)\]", text or "") def _strip_files(text): return re.sub(r"\[FILE:[^\]]+\]", "", text or "").strip() def _display_text(text): cleaned = _strip_files(_clean(text)) if cleaned: return cleaned tail = (text or "").strip()[-_TRUNC_TAIL:] return "⚠️ Model output was truncated or empty" + (f"\n…{tail}" if tail else "") def _to_allowed_set(value): if value is None: return set() if isinstance(value, str): value = [value] return {str(x).strip() for x in value if str(x).strip()} def _parse_json(raw): if not raw: return {} try: return json.loads(raw) except Exception: return {} def _extract_share_card_content(content_json, msg_type): parts = [] if msg_type == "share_chat": parts.append(f"[shared chat: {content_json.get('chat_id', '')}]") elif msg_type == "share_user": parts.append(f"[shared user: {content_json.get('user_id', '')}]") elif msg_type == "interactive": parts.extend(_extract_interactive_content(content_json)) elif msg_type == "share_calendar_event": parts.append(f"[shared calendar event: {content_json.get('event_key', '')}]") elif msg_type == "system": parts.append("[system message]") elif 
msg_type == "merge_forward": parts.append("[merged forward messages]") return "\n".join([p for p in parts if p]).strip() or f"[{msg_type}]" def _extract_interactive_content(content): parts = [] if isinstance(content, str): try: content = json.loads(content) except Exception: return [content] if content.strip() else [] if not isinstance(content, dict): return parts title = content.get("title") if isinstance(title, dict): title_text = title.get("content", "") or title.get("text", "") if title_text: parts.append(f"title: {title_text}") elif isinstance(title, str) and title: parts.append(f"title: {title}") elements = content.get("elements", []) if isinstance(elements, list): for row in elements: if isinstance(row, dict): parts.extend(_extract_element_content(row)) elif isinstance(row, list): for el in row: parts.extend(_extract_element_content(el)) card = content.get("card", {}) if card: parts.extend(_extract_interactive_content(card)) header = content.get("header", {}) if isinstance(header, dict): header_title = header.get("title", {}) if isinstance(header_title, dict): header_text = header_title.get("content", "") or header_title.get("text", "") if header_text: parts.append(f"title: {header_text}") return [p for p in parts if p] def _extract_element_content(element): parts = [] if not isinstance(element, dict): return parts tag = element.get("tag", "") if tag in ("markdown", "lark_md"): content = element.get("content", "") if content: parts.append(content) elif tag == "div": text = element.get("text", {}) if isinstance(text, dict): text_content = text.get("content", "") or text.get("text", "") if text_content: parts.append(text_content) elif isinstance(text, str) and text: parts.append(text) for field in element.get("fields", []) or []: if isinstance(field, dict): field_text = field.get("text", {}) if isinstance(field_text, dict): content = field_text.get("content", "") or field_text.get("text", "") if content: parts.append(content) elif tag == "a": href = 
element.get("href", "") text = element.get("text", "") if href: parts.append(f"link: {href}") if text: parts.append(text) elif tag == "button": text = element.get("text", {}) if isinstance(text, dict): content = text.get("content", "") or text.get("text", "") if content: parts.append(content) url = element.get("url", "") or (element.get("multi_url", {}) or {}).get("url", "") if url: parts.append(f"link: {url}") elif tag == "img": alt = element.get("alt", {}) if isinstance(alt, dict): parts.append(alt.get("content", "[image]") or "[image]") else: parts.append("[image]") for child in element.get("elements", []) or []: parts.extend(_extract_element_content(child)) for col in element.get("columns", []) or []: for child in (col.get("elements", []) if isinstance(col, dict) else []): parts.extend(_extract_element_content(child)) return parts def _extract_post_content(content_json): def _parse_block(block): if not isinstance(block, dict) or not isinstance(block.get("content"), list): return None, [] texts, images = [], [] if block.get("title"): texts.append(block.get("title")) for row in block["content"]: if not isinstance(row, list): continue for el in row: if not isinstance(el, dict): continue tag = el.get("tag") if tag in ("text", "a"): texts.append(el.get("text", "")) elif tag == "at": texts.append(f"@{el.get('user_name', 'user')}") elif tag == "img" and el.get("image_key"): images.append(el["image_key"]) text = " ".join([t for t in texts if t]).strip() return text or None, images root = content_json if isinstance(root, dict) and isinstance(root.get("post"), dict): root = root["post"] if not isinstance(root, dict): return "", [] if "content" in root: text, imgs = _parse_block(root) if text or imgs: return text or "", imgs for key in ("zh_cn", "en_us", "ja_jp"): if key in root: text, imgs = _parse_block(root[key]) if text or imgs: return text or "", imgs for val in root.values(): if isinstance(val, dict): text, imgs = _parse_block(val) if text or imgs: return text or 
"", imgs return "", [] APP_ID = str(mykeys.get("fs_app_id", "") or "").strip() APP_SECRET = str(mykeys.get("fs_app_secret", "") or "").strip() ALLOWED_USERS = _to_allowed_set(mykeys.get("fs_allowed_users", [])) PUBLIC_ACCESS = not ALLOWED_USERS or "*" in ALLOWED_USERS AGENT_TIMEOUT_SEC = 900 agent = GeneraticAgent() threading.Thread(target=agent.run, daemon=True).start() client, user_tasks = None, {} def create_client(): return lark.Client.builder().app_id(APP_ID).app_secret(APP_SECRET).log_level(lark.LogLevel.INFO).build() def _card_raw(elements): return json.dumps({ "schema": "2.0", "config": {"streaming_mode": False, "width_mode": "fill"}, "body": {"elements": elements}, }, ensure_ascii=False) def _card(text): return _card_raw([{"tag": "markdown", "content": text}]) def _send_raw(receive_id, payload, msg_type, rtype): try: body = CreateMessageRequest.builder().receive_id_type(rtype).request_body( CreateMessageRequestBody.builder().receive_id(receive_id).msg_type(msg_type).content(payload).build() ).build() r = client.im.v1.message.create(body) if r.success(): return r.data.message_id if r.data else None print(f"send failed: {r.code}, {r.msg}") except Exception as e: print(f"[ERROR] _send_raw network error: {e}") return None def _patch_card(message_id, card_json): return _patch_card_result(message_id, card_json)[0] def _patch_card_result(message_id, card_json): try: body = PatchMessageRequest.builder().message_id(message_id).request_body( PatchMessageRequestBody.builder().content(card_json).build() ).build() r = client.im.v1.message.patch(body) if not r.success(): print(f"[ERROR] patch_card failed: {r.code}, {r.msg}") msg = f"{getattr(r, 'code', '')} {getattr(r, 'msg', '')}".lower() return r.success(), ("230099" in msg or "11310" in msg or "element exceeds the limit" in msg) except Exception as e: print(f"[ERROR] _patch_card network error: {e}") return False, False def send_message(receive_id, content, msg_type="text", use_card=False, receive_id_type="open_id"): if use_card: return 
_send_raw(receive_id, _card(content), "interactive", receive_id_type) if msg_type == "text": return _send_raw(receive_id, json.dumps({"text": content}, ensure_ascii=False), "text", receive_id_type) return _send_raw(receive_id, content, msg_type, receive_id_type) def update_message(message_id, content): return _patch_card(message_id, _card(content)) def _upload_image_sync(file_path): try: with open(file_path, "rb") as f: request = CreateImageRequest.builder().request_body( CreateImageRequestBody.builder().image_type("message").image(f).build() ).build() response = client.im.v1.image.create(request) if response.success(): return response.data.image_key print(f"[ERROR] upload image failed: {response.code}, {response.msg}") except Exception as e: print(f"[ERROR] upload image failed {file_path}: {e}") return None def _upload_file_sync(file_path): ext = os.path.splitext(file_path)[1].lower() file_type = _FILE_TYPE_MAP.get(ext, "stream") file_name = os.path.basename(file_path) try: with open(file_path, "rb") as f: request = CreateFileRequest.builder().request_body( CreateFileRequestBody.builder().file_type(file_type).file_name(file_name).file(f).build() ).build() response = client.im.v1.file.create(request) if response.success(): return response.data.file_key print(f"[ERROR] upload file failed: {response.code}, {response.msg}") except Exception as e: print(f"[ERROR] upload file failed {file_path}: {e}") return None def _download_image_sync(message_id, image_key): try: request = GetMessageResourceRequest.builder().message_id(message_id).file_key(image_key).type("image").build() response = client.im.v1.message_resource.get(request) if response.success(): data = response.file.read() if hasattr(response.file, "read") else response.file return data, response.file_name print(f"[ERROR] download image failed: {response.code}, {response.msg}") except Exception as e: print(f"[ERROR] download image failed {image_key}: {e}") return None, None def _download_file_sync(message_id, 
file_key, resource_type="file"): if resource_type == "audio": resource_type = "file" try: request = GetMessageResourceRequest.builder().message_id(message_id).file_key(file_key).type(resource_type).build() response = client.im.v1.message_resource.get(request) if response.success(): data = response.file.read() if hasattr(response.file, "read") else response.file return data, response.file_name print(f"[ERROR] download {resource_type} failed: {response.code}, {response.msg}") except Exception as e: print(f"[ERROR] download {resource_type} failed {file_key}: {e}") return None, None def _download_and_save_media(msg_type, content_json, message_id): data, filename = None, None if msg_type == "image": image_key = content_json.get("image_key") if image_key and message_id: data, filename = _download_image_sync(message_id, image_key) if not filename: filename = f"{image_key[:16]}.jpg" elif msg_type in ("audio", "file", "media"): file_key = content_json.get("file_key") if file_key and message_id: data, filename = _download_file_sync(message_id, file_key, msg_type) if not filename: filename = file_key[:16] if msg_type == "audio" and filename and not filename.endswith(".opus"): filename = f"{filename}.opus" if data and filename: file_path = os.path.join(MEDIA_DIR, os.path.basename(filename)) with open(file_path, "wb") as f: f.write(data) return file_path, filename return None, None def _describe_media(msg_type, file_path, filename): if msg_type == "image": return f"[image: {filename}]\n[Image: source: {file_path}]" if msg_type == "audio": return f"[audio: {filename}]\n[File: source: {file_path}]" if msg_type in ("file", "media"): return f"[{msg_type}: {filename}]\n[File: source: {file_path}]" return f"[{msg_type}]\n[File: source: {file_path}]" def _send_local_file(receive_id, file_path, receive_id_type="open_id"): if not os.path.isfile(file_path): send_message(receive_id, f"⚠️ File not found: {file_path}", receive_id_type=receive_id_type) return False ext = 
os.path.splitext(file_path)[1].lower() if ext in _IMAGE_EXTS: image_key = _upload_image_sync(file_path) if image_key: send_message(receive_id, json.dumps({"image_key": image_key}, ensure_ascii=False), msg_type="image", receive_id_type=receive_id_type) return True else: file_key = _upload_file_sync(file_path) if file_key: msg_type = "media" if ext in _AUDIO_EXTS or ext in _VIDEO_EXTS else "file" send_message(receive_id, json.dumps({"file_key": file_key}, ensure_ascii=False), msg_type=msg_type, receive_id_type=receive_id_type) return True send_message(receive_id, f"⚠️ Failed to send file: {os.path.basename(file_path)}", receive_id_type=receive_id_type) return False def _send_generated_files(receive_id, raw_text, receive_id_type="open_id"): for file_path in _extract_files(raw_text): _send_local_file(receive_id, file_path, receive_id_type) def _build_user_message(message): msg_type = message.message_type message_id = message.message_id content_json = _parse_json(message.content) parts, image_paths = [], [] if msg_type == "text": text = str(content_json.get("text", "") or "").strip() if text: parts.append(text) elif msg_type == "post": text, image_keys = _extract_post_content(content_json) if text: parts.append(text) for image_key in image_keys: file_path, filename = _download_and_save_media("image", {"image_key": image_key}, message_id) if file_path and filename: parts.append(_describe_media("image", file_path, filename)) image_paths.append(file_path) else: parts.append("[image: download failed]") elif msg_type in ("image", "audio", "file", "media"): file_path, filename = _download_and_save_media(msg_type, content_json, message_id) if file_path and filename: parts.append(_describe_media(msg_type, file_path, filename)) if msg_type == "image": image_paths.append(file_path) else: parts.append(f"[{msg_type}: download failed]") elif msg_type in ("share_chat", "share_user", "interactive", "share_calendar_event", "system", "merge_forward"): 
parts.append(_extract_share_card_content(content_json, msg_type)) else: parts.append(_MSG_TYPE_MAP.get(msg_type, f"[{msg_type}]")) return "\n".join([p for p in parts if p]).strip(), image_paths def _fmt_tool_call(tc): name = tc.get('tool_name', '?') args = {k: v for k, v in (tc.get('args') or {}).items() if not k.startswith('_')} return f"- `{name}`({json.dumps(args, ensure_ascii=False)[:200]})" def _build_step_detail(resp, tool_calls): """Build the expandable per-step detail from the LLM response and tool_calls (pure function).""" parts = [] thinking = (getattr(resp, 'thinking', '') or '').strip() if resp else '' if thinking: parts.append(f"### 💭 Thinking\n{thinking}") if tool_calls: parts.append("### 🛠 Tool Calls\n" + "\n".join(_fmt_tool_call(tc) for tc in tool_calls)) content = _display_text((getattr(resp, 'content', '') or '')).strip() if resp else '' if content and content != '...': parts.append(f"### 📝 Output\n{content}") return "\n\n".join(parts) class _TaskCard: """Feishu task card: a single card that is patched continuously; each step gets its own collapsible panel (the header shows the summary, expand it for details).""" _DETAIL_LIMIT = 8000 _FINAL_LIMIT = 6000 def __init__(self, receive_id, rid_type): self.rid, self.rtype = receive_id, rid_type self.steps = [] # [(summary, detail), ...] self.status = "🤔 Thinking..." 
self.final = None self.msg_id = None self.page_no = 1 self.turn_no = 0 self.turn_base = 1 self.note = None def _step_panel(self, idx, summary, detail): detail = detail or "_(no output)_" if len(detail) > self._DETAIL_LIMIT: detail = detail[:self._DETAIL_LIMIT] + f"\n\n…(truncated, {len(detail)} characters in total)" return { "tag": "collapsible_panel", "expanded": False, "header": {"title": {"tag": "plain_text", "content": f"Turn {idx} · {summary}"}}, "elements": [{"tag": "markdown", "content": detail}], } def _build(self): header = f"**{self.status}**" if self.page_no > 1: header += f"\n\n📄 Work card {self.page_no}" els = [{"tag": "markdown", "content": header}] if self.note: els.append({"tag": "markdown", "content": self.note}) for i, (s, d) in enumerate(self.steps, self.turn_base): els.append(self._step_panel(i, s, d)) if self.final: els += [{"tag": "hr"}, {"tag": "markdown", "content": self.final}] return _card_raw(els) def _push(self): card = self._build() if self.msg_id: return _patch_card_result(self.msg_id, card) else: self.msg_id = _send_raw(self.rid, card, "interactive", self.rtype) return bool(self.msg_id), False def _rollover(self): self.page_no += 1 self.msg_id = None self.final = None self.note = "⚠️ The previous work card hit the Feishu size limit; this card continues with later progress." # ── Public API ── def start(self): self._push() def step(self, summary, detail=""): self.turn_no += 1 step = (summary, detail) self.steps.append(step) self.status = f"⏳ Working · Turn {self.turn_no}" ok, limit = self._push() if limit: self.steps.pop() self._rollover() self.turn_base = self.turn_no self.steps = [step] self._push() def done(self, text): self.status = "✅ Done" self.final = (text or "_(no text output)_")[:self._FINAL_LIMIT] ok, limit = self._push() if limit: self._rollover() self.steps = [] self.turn_base = self.turn_no + 1 self.final = (text or "_(no text output)_")[:self._FINAL_LIMIT] self._push() def fail(self, msg): self.status = f"❌ {msg}" self._push() def _make_task_hook(card, done_event, on_final): """Feishu task hook: patches the card status on every turn; on exit it calls on_final(raw) to handle attachments.""" def hook(ctx): 
try: if ctx.get('exit_reason'): resp = ctx.get('response') raw = resp.content if hasattr(resp, 'content') else str(resp) card.done(_display_text(raw)) on_final(raw) done_event.set() elif ctx.get('summary'): detail = _build_step_detail(ctx.get('response'), ctx.get('tool_calls') or []) card.step(ctx['summary'], detail) except Exception as e: print(f"[fs hook] error: {e}") return hook def handle_message(data): event, message, sender = data.event, data.event.message, data.event.sender open_id = sender.sender_id.open_id chat_id = message.chat_id if not PUBLIC_ACCESS and open_id not in ALLOWED_USERS: print(f"unauthorized user: {open_id}") return user_input, image_paths = _build_user_message(message) if not user_input: if chat_id: send_message(chat_id, f"⚠️ This type of Feishu message is not supported yet: {message.message_type}", receive_id_type="chat_id") else: send_message(open_id, f"⚠️ This type of Feishu message is not supported yet: {message.message_type}") return print(f"received message [{open_id}] ({message.message_type}, {len(image_paths)} images): {user_input[:200]}") if message.message_type == "text" and user_input.startswith("/"): return handle_command(open_id, user_input, chat_id) def run_agent(): user_tasks[open_id] = {"running": True} receive_id = chat_id or open_id rid_type = "chat_id" if chat_id else "open_id" done_event = threading.Event() hook_key = f"fs_{open_id}" card = _TaskCard(receive_id, rid_type) card.start() on_final = lambda raw: _send_generated_files(receive_id, raw, receive_id_type=rid_type) if not hasattr(agent, '_turn_end_hooks'): agent._turn_end_hooks = {} agent._turn_end_hooks[hook_key] = _make_task_hook(card, done_event, on_final) try: agent.put_task(user_input, source="feishu", images=image_paths) start = time.time() while not done_event.wait(timeout=3): if not user_tasks.get(open_id, {}).get("running", True): agent.abort() card.fail("Stopped") break if time.time() - start > AGENT_TIMEOUT_SEC: agent.abort() card.fail("Task timed out") break except Exception as e: traceback.print_exc() card.fail(f"Error: {e}") finally: 
agent._turn_end_hooks.pop(hook_key, None) user_tasks.pop(open_id, None) threading.Thread(target=run_agent, daemon=True).start() def handle_command(open_id, cmd, chat_id=None): def _send_cmd_response(content): if chat_id: send_message(chat_id, content, receive_id_type="chat_id") else: send_message(open_id, content) parts = (cmd or "").split() op = (parts[0] if parts else "").lower() if op == "/stop": if open_id in user_tasks: user_tasks[open_id]["running"] = False agent.abort() _send_cmd_response("Stopping...") elif op == "/new": _send_cmd_response(reset_conversation(agent)) elif op == "/help": _send_cmd_response("Commands:\n/stop - stop the current task\n/status - show status\n/llm - list available models\n/llm [n] - switch to model n\n/restore - restore the previous conversation history\n/continue - list resumable sessions\n/continue [n] - resume session n\n/new - start a new conversation and clear the current context\n/help - show this help") elif op == "/status": llm = agent.get_llm_name() if agent.llmclient else "not configured" _send_cmd_response(f"Status: {'🔴 running' if agent.is_running else '🟢 idle'}\nLLM: [{agent.llm_no}] {llm}") elif op == "/llm": if not agent.llmclient: return _send_cmd_response("❌ No LLM configuration is available") if len(parts) > 1: try: agent.next_llm(int(parts[1])) return _send_cmd_response(f"✅ Switched to [{agent.llm_no}] {agent.get_llm_name()}") except Exception: return _send_cmd_response(f"Usage: /llm <0-{len(agent.list_llms()) - 1}>") lines = [f"{'→' if cur else ' '} [{i}] {name}" for i, name, cur in agent.list_llms()] _send_cmd_response("LLMs:\n" + "\n".join(lines)) elif op == "/restore": try: restored_info, err = format_restore() if err: return _send_cmd_response(err.replace("❌ ", "")) restored, fname, count = restored_info agent.history.extend(restored) agent.abort() _send_cmd_response(f"Restored {count} conversation turns\nSource: {fname}\n(Only the context was restored; send a new question to continue)") except Exception as e: _send_cmd_response(f"Restore failed: {e}") elif op == "/continue" or cmd.startswith("/continue"): _send_cmd_response(handle_continue_frontend(agent, cmd)) else: _send_cmd_response(f"Unknown command: {cmd}") def main(): global client if not APP_ID or not APP_SECRET: print("Error: please configure fs_app_id and fs_app_secret in mykey.py or mykey.json") sys.exit(1) client = create_client() handler = lark.EventDispatcherHandler.builder("", "").register_p2_im_message_receive_v1(handle_message).build() print("=" * 50 + "\nFeishu Agent started (long-connection mode)\n" + f"App ID: {APP_ID}\nWaiting for messages...\n" + "=" * 50) retry_delay = 5 while True: try: cli = lark.ws.Client(APP_ID, APP_SECRET, event_handler=handler, log_level=lark.LogLevel.INFO) cli.start() except Exception as e: print(f"[WARN] Feishu long connection dropped or failed to start: {e}") print(f"[INFO] reconnecting in {retry_delay}s...") time.sleep(retry_delay) retry_delay = min(retry_delay * 2, 120) # refresh the client on reconnect try: client = create_client() except Exception: pass if __name__ == "__main__": main() 
================================================ FILE: frontends/genericagent_acp_bridge.py ================================================ import io import json import os import sys sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) # Must run BEFORE importing agentmain — it reconfigures stdout at import time, # and its submodules may print() during init. We capture the raw binary stdout # for ACP JSON-RPC, then redirect the text-mode stdout to stderr so any stray # prints from agentmain/llmcore don't pollute the ACP channel. if sys.platform == "win32": import msvcrt _stdout_fd = os.dup(sys.__stdout__.fileno()) msvcrt.setmode(_stdout_fd, os.O_BINARY) _acp_stdout = os.fdopen(_stdout_fd, "wb", buffering=0) msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY) # Mark the ACP fd as non-inheritable so child processes can't write to it. os.set_inheritable(_stdout_fd, False) # Redirect the original stdout fd to stderr so child processes # (tool calls) don't write into the ACP JSON-RPC channel. 
os.dup2(sys.stderr.fileno(), sys.__stdout__.fileno()) else: _stdout_fd = os.dup(sys.__stdout__.fileno()) os.set_inheritable(_stdout_fd, False) _acp_stdout = os.fdopen(_stdout_fd, "wb", buffering=0) os.dup2(sys.stderr.fileno(), sys.__stdout__.fileno()) class _StdoutToStderrRouter(io.TextIOBase): """Redirect text-mode stdout to stderr so agentmain prints don't leak.""" def writable(self): return True def write(self, s): if s: sys.stderr.write(s) sys.stderr.flush() return len(s) if s else 0 def flush(self): sys.stderr.flush() sys.stdout = _StdoutToStderrRouter() import argparse import queue import threading import traceback import uuid from dataclasses import dataclass, field from typing import Any, Dict, List, Optional from agentmain import GeneraticAgent JSONRPC_VERSION = "2.0" ACP_PROTOCOL_VERSION = 1 def eprint(*args: Any) -> None: print(*args, file=sys.stderr, flush=True) def make_text_block(text: str) -> Dict[str, Any]: return {"type": "text", "text": text} def make_session_update(session_id: str, update: Dict[str, Any]) -> Dict[str, Any]: return { "jsonrpc": JSONRPC_VERSION, "method": "session/update", "params": {"sessionId": session_id, "update": update}, } def compact_json(obj: Dict[str, Any]) -> str: return json.dumps(obj, ensure_ascii=False, separators=(",", ":")) def parse_jsonrpc_line(line: str) -> Optional[Dict[str, Any]]: stripped = line.strip() if not stripped: return None try: obj = json.loads(stripped) except json.JSONDecodeError: return None return obj if isinstance(obj, dict) else None def content_blocks_to_text(blocks: List[Dict[str, Any]]) -> str: parts: List[str] = [] for block in blocks: if not isinstance(block, dict): continue block_type = block.get("type") if block_type == "text": text = block.get("text") if isinstance(text, str) and text: parts.append(text) elif block_type == "resource_link": name = block.get("name") or "resource" uri = block.get("uri") or "" desc = block.get("description") or "" parts.append(f"[ResourceLink] {name}: 
{uri}\n{desc}".strip()) elif block_type == "resource": uri = block.get("uri") or "resource" text = block.get("text") if isinstance(text, str) and text: parts.append(f"[Resource] {uri}\n{text}") else: parts.append(f"[Resource] {uri}") elif block_type == "image": uri = block.get("uri") or "inline-image" parts.append(f"[Image omitted] {uri}") else: parts.append(f"[Unsupported content block: {block_type}]") return "\n\n".join(p for p in parts if p).strip() def jsonrpc_error(code: int, message: str, req_id: Any = None, data: Any = None) -> Dict[str, Any]: err: Dict[str, Any] = {"code": code, "message": message} if data is not None: err["data"] = data return {"jsonrpc": JSONRPC_VERSION, "id": req_id, "error": err} def jsonrpc_result(req_id: Any, result: Any) -> Dict[str, Any]: return {"jsonrpc": JSONRPC_VERSION, "id": req_id, "result": result} @dataclass class SessionState: session_id: str cwd: str agent: GeneraticAgent current_prompt_id: Any = None prompt_lock: threading.Lock = field(default_factory=threading.Lock) class GenericAgentAcpBridge: def __init__(self, llm_no: int = 0): self.llm_no = llm_no self._json_out = _acp_stdout self._write_lock = threading.Lock() self._sessions: Dict[str, SessionState] = {} self._shutdown = False def write_message(self, msg: Dict[str, Any]) -> None: payload = compact_json(msg) raw = (payload + "\n").encode("utf-8") method = msg.get("method", msg.get("id", "?")) eprint(f"[ACP-BRIDGE] >>> {payload[:500]}") try: with self._write_lock: self._json_out.write(raw) self._json_out.flush() except Exception as e: eprint(f"[ACP-BRIDGE] WRITE FAILED: {type(e).__name__}: {e}") def new_agent(self) -> GeneraticAgent: agent = GeneraticAgent() agent.next_llm(self.llm_no) agent.verbose = True agent.inc_out = True threading.Thread(target=agent.run, daemon=True).start() return agent def handle_initialize(self, req_id: Any, params: Dict[str, Any]) -> None: requested_version = params.get("protocolVersion", ACP_PROTOCOL_VERSION) version = ACP_PROTOCOL_VERSION 
if requested_version == ACP_PROTOCOL_VERSION else ACP_PROTOCOL_VERSION result = { "protocolVersion": version, "agentCapabilities": { "loadSession": False, "mcpCapabilities": {"http": False, "sse": False}, "promptCapabilities": { "image": False, "audio": False, "embeddedContext": False, }, "sessionCapabilities": {}, }, "agentInfo": { "name": "genericagent-acp", "title": "GenericAgent", "version": "0.1.0", }, "authMethods": [], } self.write_message(jsonrpc_result(req_id, result)) def handle_session_new(self, req_id: Any, params: Dict[str, Any]) -> None: cwd = params.get("cwd") if not isinstance(cwd, str) or not cwd: self.write_message(jsonrpc_error(-32602, "cwd is required", req_id)) return if not os.path.isabs(cwd): cwd = os.path.abspath(cwd) session_id = f"ga_{uuid.uuid4().hex}" agent = self.new_agent() session = SessionState(session_id=session_id, cwd=cwd, agent=agent) self._sessions[session_id] = session self.write_message( jsonrpc_result( req_id, { "sessionId": session_id, "modes": None, "configOptions": None, }, ) ) def handle_session_prompt(self, req_id: Any, params: Dict[str, Any]) -> None: session_id = params.get("sessionId") prompt_blocks = params.get("prompt") session = self._sessions.get(session_id) if session is None: self.write_message(jsonrpc_error(-32602, "unknown sessionId", req_id)) return if not isinstance(prompt_blocks, list): self.write_message(jsonrpc_error(-32602, "prompt must be an array", req_id)) return prompt_text = content_blocks_to_text(prompt_blocks) if not prompt_text: self.write_message(jsonrpc_error(-32602, "prompt must contain text or supported content", req_id)) return with session.prompt_lock: if session.current_prompt_id is not None: self.write_message( jsonrpc_error(-32603, "session already has an active prompt", req_id) ) return session.current_prompt_id = req_id def run_prompt() -> None: stop_reason = "end_turn" try: dq = session.agent.put_task(prompt_text, source="acp") self._drain_agent_queue(session, dq) except Exception as 
exc: stop_reason = "end_turn" self.write_message( make_session_update( session.session_id, { "sessionUpdate": "agent_message_chunk", "content": make_text_block( f"[Bridge error] {type(exc).__name__}: {exc}" ), }, ) ) eprint("[GenericAgent ACP] prompt thread failed:", traceback.format_exc()) finally: with session.prompt_lock: finished_req_id = session.current_prompt_id session.current_prompt_id = None if finished_req_id is not None: import time time.sleep(0.1) self.write_message( jsonrpc_result(finished_req_id, {"stopReason": stop_reason}) ) threading.Thread(target=run_prompt, daemon=True).start() def _drain_agent_queue(self, session: SessionState, dq: "queue.Queue[Dict[str, Any]]") -> None: sent_any = False while True: item = dq.get() if not isinstance(item, dict): continue # With inc_out=True, "next" items are already incremental deltas. if "next" in item and "done" not in item: delta = item["next"] if isinstance(delta, str) and delta: sent_any = True try: self.write_message( make_session_update( session.session_id, { "sessionUpdate": "agent_message_chunk", "content": make_text_block(delta), }, ) ) except Exception as e: eprint(f"[ACP-BRIDGE] ERROR writing update: {e}") if "done" in item: # "done" text has post-processing (\n\n insertion) # that shifts offsets — cannot safely compute a tail delta. # Only use "done" content if nothing was streamed (error case). 
if not sent_any: done_text = item["done"] if isinstance(done_text, str) and done_text: try: self.write_message( make_session_update( session.session_id, { "sessionUpdate": "agent_message_chunk", "content": make_text_block(done_text), }, ) ) except Exception as e: eprint(f"[ACP-BRIDGE] ERROR writing done: {e}") break def handle_session_cancel(self, params: Dict[str, Any]) -> None: session_id = params.get("sessionId") session = self._sessions.get(session_id) if session is None: return if session.current_prompt_id is not None: session.agent.abort() def handle_message(self, msg: Dict[str, Any]) -> None: method = msg.get("method") req_id = msg.get("id") params = msg.get("params") or {} try: if method == "initialize": self.handle_initialize(req_id, params) elif method == "session/new": self.handle_session_new(req_id, params) elif method == "session/prompt": self.handle_session_prompt(req_id, params) elif method == "session/cancel": self.handle_session_cancel(params) elif method == "session/load": self.write_message(jsonrpc_error(-32601, "session/load not supported", req_id)) elif method == "session/list": self.write_message(jsonrpc_error(-32601, "session/list not supported", req_id)) elif method == "session/close": self.write_message(jsonrpc_result(req_id, {})) elif method is None: if req_id is not None: self.write_message(jsonrpc_error(-32600, "invalid request", req_id)) else: if req_id is not None: self.write_message(jsonrpc_error(-32601, f"method not found: {method}", req_id)) except Exception as exc: eprint("[GenericAgent ACP] request handler failed:", traceback.format_exc()) if req_id is not None: self.write_message( jsonrpc_error(-32603, f"internal error: {type(exc).__name__}: {exc}", req_id) ) def serve(self) -> None: eprint("[GenericAgent ACP] bridge started") stdin = io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8", errors="replace") if hasattr(sys.stdin, 'buffer') else sys.stdin for raw_line in stdin: msg = parse_jsonrpc_line(raw_line) if msg is None: 
continue self.handle_message(msg) if self._shutdown: break eprint("[GenericAgent ACP] bridge stopped") def main() -> int: parser = argparse.ArgumentParser(description="GenericAgent ACP bridge over stdio") parser.add_argument("--llm-no", type=int, default=0, help="LLM index for GenericAgent") args = parser.parse_args() bridge = GenericAgentAcpBridge(llm_no=args.llm_no) bridge.serve() return 0 if __name__ == "__main__": raise SystemExit(main()) ================================================ FILE: frontends/qqapp.py ================================================ import asyncio, os, sys, threading, time from collections import deque sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) from agentmain import GeneraticAgent from chatapp_common import AgentChatMixin, ensure_single_instance, public_access, redirect_log, require_runtime, split_text from llmcore import mykeys try: import botpy from botpy.message import C2CMessage, GroupMessage except Exception: print("Please install qq-botpy to use QQ module: pip install qq-botpy") sys.exit(1) agent = GeneraticAgent(); agent.verbose = False APP_ID = str(mykeys.get("qq_app_id", "") or "").strip() APP_SECRET = str(mykeys.get("qq_app_secret", "") or "").strip() ALLOWED = {str(x).strip() for x in mykeys.get("qq_allowed_users", []) if str(x).strip()} PROCESSED_IDS, USER_TASKS = deque(maxlen=1000), {} SEQ_LOCK, MSG_SEQ = threading.Lock(), 1 def _next_msg_seq(): global MSG_SEQ with SEQ_LOCK: MSG_SEQ += 1 return MSG_SEQ def _build_intents(): try: return botpy.Intents(public_messages=True, direct_message=True) except Exception: intents = botpy.Intents.none() if hasattr(botpy.Intents, "none") else botpy.Intents() for attr in ("public_messages", "public_guild_messages", "direct_message", "direct_messages", "c2c_message", "c2c_messages", "group_at_message", "group_at_messages"): if hasattr(intents, attr): try: setattr(intents, attr, True) except Exception: pass return intents def _make_bot_class(app): class 
QQBot(botpy.Client): def __init__(self): super().__init__(intents=_build_intents(), ext_handlers=False) async def on_ready(self): print(f"[QQ] bot ready: {getattr(getattr(self, 'robot', None), 'name', 'QQBot')}") async def on_c2c_message_create(self, message: C2CMessage): await app.on_message(message, is_group=False) async def on_group_at_message_create(self, message: GroupMessage): await app.on_message(message, is_group=True) async def on_direct_message_create(self, message): await app.on_message(message, is_group=False) return QQBot class QQApp(AgentChatMixin): label, source, split_limit = "QQ", "qq", 1500 def __init__(self): super().__init__(agent, USER_TASKS) self.client = None async def send_text(self, chat_id, content, *, msg_id=None, is_group=False): if not self.client: return api = self.client.api.post_group_message if is_group else self.client.api.post_c2c_message key = "group_openid" if is_group else "openid" for part in split_text(content, self.split_limit): await api(**{key: chat_id, "msg_type": 0, "content": part, "msg_id": msg_id, "msg_seq": _next_msg_seq()}) async def on_message(self, data, is_group=False): try: msg_id = getattr(data, "id", None) if msg_id in PROCESSED_IDS: return PROCESSED_IDS.append(msg_id) content = (getattr(data, "content", "") or "").strip() if not content: return author = getattr(data, "author", None) user_id = str(getattr(author, "member_openid" if is_group else "user_openid", "") or getattr(author, "id", "") or "unknown") chat_id = str(getattr(data, "group_openid", "") or user_id) if is_group else user_id if not public_access(ALLOWED) and user_id not in ALLOWED: print(f"[QQ] unauthorized user: {user_id}") return print(f"[QQ] message from {user_id} ({'group' if is_group else 'c2c'}): {content}") if content.startswith("/"): return await self.handle_command(chat_id, content, msg_id=msg_id, is_group=is_group) asyncio.create_task(self.run_agent(chat_id, content, msg_id=msg_id, is_group=is_group)) except Exception: import traceback 
print("[QQ] handle_message error") traceback.print_exc() async def start(self): self.client = _make_bot_class(self)() delay, max_delay = 5, 300 while True: started_at = time.monotonic() try: print(f"[QQ] bot starting... {time.strftime('%m-%d %H:%M')}") await self.client.start(appid=APP_ID, secret=APP_SECRET) except Exception as e: print(f"[QQ] bot error: {e}") if time.monotonic() - started_at >= 60: delay = 5 print(f"[QQ] reconnect in {delay}s...") await asyncio.sleep(delay) delay = min(delay * 2, max_delay) if __name__ == "__main__": _LOCK_SOCK = ensure_single_instance(19528, "QQ") require_runtime(agent, "QQ", qq_app_id=APP_ID, qq_app_secret=APP_SECRET) redirect_log(__file__, "qqapp.log", "QQ", ALLOWED) threading.Thread(target=agent.run, daemon=True).start() asyncio.run(QQApp().start()) ================================================ FILE: frontends/qtapp.py ================================================ """ 桌面前端单文件版 – PySide6 聊天面板 + 悬浮按钮 thanks to GaoZhiCheng 依赖: pip install PySide6 可选: pip install markdown (Markdown 渲染) 用法: python frontends/qtapp.py """ from __future__ import annotations import math, os, sys, json, glob, re, base64, time, threading import queue as _queue from datetime import datetime from typing import Optional from PySide6.QtWidgets import ( QWidget, QVBoxLayout, QHBoxLayout, QLabel, QPushButton, QScrollArea, QFrame, QTextEdit, QStackedWidget, QListWidget, QListWidgetItem, QSizePolicy, QFileDialog, QSplitter, QTextBrowser, QApplication, QMessageBox, QMenu, QLineEdit, ) from PySide6.QtCore import ( Qt, QTimer, QPoint, QPointF, QByteArray, QSize, Signal, QMetaObject, Q_ARG, QObject, QDateTime, QEvent, ) from PySide6.QtGui import ( QPainter, QColor, QLinearGradient, QRadialGradient, QPen, QPainterPath, QCursor, QFont, QIcon, QPixmap, QRegion, ) sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) from agentmain import GeneraticAgent from chatapp_common import FILE_HINT, HELP_TEXT, clean_reply, build_done_text, 
format_restore # ══════════════════════════════════════════════════════════════════════ # FloatingButton # ══════════════════════════════════════════════════════════════════════ class FloatingButton(QWidget): SIZE = 60 # circle diameter MARGIN = 14 # extra space for glow TOTAL = SIZE + MARGIN * 2 def __init__(self, chat_panel: QWidget): super().__init__() self.chat_panel = chat_panel self._drag_origin_global: QPoint | None = None self._drag_origin_win: QPoint | None = None self._dragged = False self._glow = 0.5 self._glow_dir = 1 self._hovering = False self._hover_clock = 0.0 self._hover_strength = 0.0 self._flow_phase = 0.0 self._running = False self._last_toggle_ms = 0 # debounce timestamp # Window flags: frameless, always on top, no taskbar entry self.setWindowFlags( Qt.FramelessWindowHint | Qt.WindowStaysOnTopHint | Qt.Tool ) self.setAttribute(Qt.WA_TranslucentBackground) self.setFixedSize(self.TOTAL, self.TOTAL) self.setCursor(QCursor(Qt.PointingHandCursor)) # Smooth animation (~30 fps) self._timer = QTimer(self) self._timer.timeout.connect(self._tick) self._timer.start(33) # Default position: bottom-right of the work area scr = QApplication.primaryScreen().availableGeometry() self.move(scr.right() - self.TOTAL - 20, scr.bottom() - self.TOTAL - 20) # ── Animation ──────────────────────────────────────── def _tick(self): # running status: green when model is actively responding self._running = bool( getattr(self.chat_panel, "_is_streaming", False) or getattr(getattr(self.chat_panel, "agent", None), "is_running", False) ) self._glow += self._glow_dir * 0.04 if self._glow >= 1.0: self._glow, self._glow_dir = 1.0, -1 elif self._glow <= 0.0: self._glow, self._glow_dir = 0.0, 1 target = 1.0 if self._hovering else 0.0 self._hover_strength += (target - self._hover_strength) * 0.20 self._hover_clock += 0.033 self._flow_phase += 0.16 + (0.06 if self._running else 0.0) + (0.05 if self._hovering else 0.0) self.update() # ── Painting 
────────────────────────────────────────── def paintEvent(self, _event): p = QPainter(self) p.setRenderHint(QPainter.Antialiasing) m = self.MARGIN r = self.SIZE // 2 cx = m + r # Rhythmic spring bounce: one main hop + one lighter rebound per beat. beat_t = self._hover_clock % 1.18 spring = 0.0 if beat_t < 0.70: spring += max(0.0, math.exp(-5.2 * beat_t) * math.sin(15.5 * beat_t)) if beat_t > 0.20: rt = beat_t - 0.20 spring += 0.52 * max(0.0, math.exp(-7.0 * rt) * math.sin(21.0 * rt)) idle_sway = 0.20 * math.sin(self._hover_clock * 2.1) bounce = int(round((spring * 7.2 + idle_sway) * self._hover_strength)) cy = m + r - bounce if self._running: # running: #2DFFF5 -> #FFF878 g0 = QColor(45, 255, 245, 195) g1 = QColor(255, 248, 120, 195) glow_rgb = (96, 255, 216) else: # idle: #103CE7 -> #64E9FF g0 = QColor(16, 60, 231, 195) g1 = QColor(100, 233, 255, 195) glow_rgb = (74, 170, 255) # --- Outer glow rings (3 layers) --- base_alpha = int(45 + 25 * self._glow) for i, gr in enumerate([r + 10, r + 6, r + 2]): g = QRadialGradient(QPointF(cx, cy), gr) g.setColorAt(0.0, QColor(glow_rgb[0], glow_rgb[1], glow_rgb[2], max(0, base_alpha - i * 14))) g.setColorAt(1.0, QColor(glow_rgb[0], glow_rgb[1], glow_rgb[2], 0)) p.setBrush(g) p.setPen(Qt.NoPen) p.drawEllipse(int(cx - gr), int(cy - gr), int(gr * 2), int(gr * 2)) # --- Frosted glass disc behind main circle --- frost = QRadialGradient(QPointF(cx, cy), r) frost.setColorAt(0.0, QColor(30, 30, 45, 140)) frost.setColorAt(0.85, QColor(20, 20, 32, 160)) frost.setColorAt(1.0, QColor(14, 14, 20, 100)) p.setBrush(frost) p.setPen(Qt.NoPen) p.drawEllipse(cx - r, cy - r, r * 2, r * 2) # --- Main circle (flowing state gradient) --- spin = self._flow_phase dx = math.cos(spin) * r dy = math.sin(spin) * r grad = QLinearGradient(cx - dx, cy - dy, cx + dx, cy + dy) grad.setColorAt(0.0, g0) grad.setColorAt(1.0, g1) p.setBrush(grad) p.setPen(QPen(QColor(255, 255, 255, 50), 1.5)) p.drawEllipse(cx - r, cy - r, r * 2, r * 2) # --- Flowing glass streaks 
--- clip = QPainterPath() clip.addEllipse(float(cx - r), float(cy - r), float(r * 2), float(r * 2)) p.setClipPath(clip) flow_shift = math.sin(self._flow_phase * 0.85) * (r * 0.7) streak1 = QLinearGradient(cx - r + flow_shift, cy - r, cx + r + flow_shift, cy + r) streak1.setColorAt(0.00, QColor(255, 255, 255, 0)) streak1.setColorAt(0.45, QColor(255, 255, 255, 42)) streak1.setColorAt(0.52, QColor(255, 255, 255, 78)) streak1.setColorAt(0.60, QColor(255, 255, 255, 24)) streak1.setColorAt(1.00, QColor(255, 255, 255, 0)) p.setBrush(streak1) p.setPen(Qt.NoPen) p.drawEllipse(cx - r, cy - r, r * 2, r * 2) flow_shift_2 = math.cos(self._flow_phase * 1.2) * (r * 0.5) streak2 = QLinearGradient(cx - r, cy + flow_shift_2, cx + r, cy - flow_shift_2) streak2.setColorAt(0.00, QColor(255, 255, 255, 0)) streak2.setColorAt(0.35, QColor(255, 255, 255, 16)) streak2.setColorAt(0.50, QColor(255, 255, 255, 46)) streak2.setColorAt(0.65, QColor(255, 255, 255, 16)) streak2.setColorAt(1.00, QColor(255, 255, 255, 0)) p.setBrush(streak2) p.drawEllipse(cx - r, cy - r, r * 2, r * 2) # --- Top highlight --- hl = QLinearGradient(cx, cy - r, cx, cy) hl.setColorAt(0.0, QColor(255, 255, 255, 72)) hl.setColorAt(1.0, QColor(255, 255, 255, 0)) p.setBrush(hl) p.drawRect(cx - r, cy - r, r * 2, r) p.setClipping(False) # --- Bot icon --- p.setPen(QPen(QColor(255, 255, 255, 220), 1.8)) p.setBrush(Qt.NoBrush) # Head p.drawRoundedRect(cx - 9, cy - 6, 18, 12, 2, 2) # Eyes p.setBrush(QColor(255, 255, 255, 220)) p.setPen(Qt.NoPen) p.drawEllipse(cx - 6, cy - 3, 4, 4) p.drawEllipse(cx + 2, cy - 3, 4, 4) # Antenna stem p.setPen(QPen(QColor(255, 255, 255, 220), 1.8)) p.drawLine(cx, cy - 6, cx, cy - 10) # Antenna tip p.setBrush(QColor(255, 255, 255, 190)) p.setPen(Qt.NoPen) p.drawEllipse(cx - 2, cy - 13, 4, 4) def enterEvent(self, event): self._hovering = True self.update() super().enterEvent(event) def leaveEvent(self, event): self._hovering = False self.update() super().leaveEvent(event) # ── Mouse events (drag + 
click) ─────────────────────── def mousePressEvent(self, event): if event.button() == Qt.LeftButton: self._drag_origin_global = event.globalPosition().toPoint() self._drag_origin_win = self.pos() self._dragged = False def mouseMoveEvent(self, event): if event.buttons() == Qt.LeftButton and self._drag_origin_global: delta = event.globalPosition().toPoint() - self._drag_origin_global if abs(delta.x()) > 5 or abs(delta.y()) > 5: self._dragged = True if self._dragged: new = self._drag_origin_win + delta scr = QApplication.primaryScreen().availableGeometry() new.setX(max(scr.left(), min(new.x(), scr.right() - self.width()))) new.setY(max(scr.top(), min(new.y(), scr.bottom() - self.height()))) self.move(new) def mouseDoubleClickEvent(self, event): # Qt sends Press→Release→DoubleClick→Release on double-click. # The first Release already toggled the panel; swallow the DoubleClick # so the second Release does NOT trigger a second toggle. self._dragged = True # mark as "dragged" → Release will be ignored event.accept() def mouseReleaseEvent(self, event): if event.button() == Qt.LeftButton: if not self._dragged: self._toggle() self._dragged = False self._drag_origin_global = None # ── Toggle panel ────────────────────────────────────── def _toggle(self): now = QDateTime.currentMSecsSinceEpoch() if now - self._last_toggle_ms < 500: # 500 ms debounce return self._last_toggle_ms = now if self.chat_panel.isVisible(): self.chat_panel.hide() else: self._position_panel() self.chat_panel.show() self.chat_panel.raise_() self.chat_panel.activateWindow() def _position_panel(self): scr = QApplication.primaryScreen().availableGeometry() btn = self.geometry() pw = self.chat_panel.width() ph = self.chat_panel.height() # Prefer left of button, bottom-aligned x = btn.left() - pw - 12 y = btn.bottom() - ph x = max(scr.left() + 10, min(x, scr.right() - pw - 10)) y = max(scr.top() + 10, min(y, scr.bottom() - ph - 10)) self.chat_panel.move(x, y) # 
══════════════════════════════════════════════════════════════════════ # ChatPanel # ══════════════════════════════════════════════════════════════════════ # ── constants ───────────────────────────────────────────────────────────────── HISTORY_FILE = "memory/chat_history.json" TEXT_FILE_EXTS = { ".txt", ".md", ".py", ".json", ".csv", ".yaml", ".yml", ".log", ".ini", ".toml", ".xml", ".html", ".js", ".ts", ".sql", } MAX_INLINE_CHARS = 6000 MAX_UPLOAD_BYTES = 10 * 1024 * 1024 # 10 MB AUTO_IDLE_THRESHOLD = 1800 # seconds before autonomous trigger AUTO_COOLDOWN = 120 # seconds between triggers C = { "bg": QColor(14, 14, 18), "panel": QColor(20, 20, 24, 248), "border": QColor(45, 45, 50), "accent": "#7c3aed", "text": "#e4e4e7", "muted": "#71717a", "user_g0": QColor(79, 70, 229), "user_g1": QColor(124, 58, 237), "asst_bg": QColor(39, 39, 42, 210), "asst_bdr": QColor(63, 63, 70), "send_g0": QColor(220, 38, 38), "send_g1": QColor(239, 68, 68), "green": "#22c55e", "hover_bg": "rgba(63,63,70,0.6)", "accent_bg":"rgba(124,58,237,0.25)", "accent_bdr":"rgba(124,58,237,0.5)", } SCROLLBAR_STYLE = """ QScrollBar:vertical { width: 5px; background: transparent; border: none; } QScrollBar::handle:vertical { background: rgba(255,255,255,0.12); border-radius: 2px; min-height: 20px; } QScrollBar::add-line:vertical, QScrollBar::sub-line:vertical { height: 0; } QScrollBar::add-page:vertical, QScrollBar::sub-page:vertical { background: none; } """ _SVG_COPY = '' _SVG_REGEN = '' _SVG_CHAT = '' _SVG_CLOCK = '' _SVG_SEARCH = '' _SVG_BOOK = '' _SVG_GEAR = '' _SVG_PLUS = '' _SVG_CLIP = _SVG_PLUS _SVG_STOP = '' _SVG_RESET = _SVG_REGEN _SVG_SAVE = '' _SVG_TRASH = '' _SVG_BOLT = '' _SVG_PLAY = '' _SVG_FILE = '' _SVG_USER = '' _SVG_BOT = '' _SVG_SEND = '' _MD_CSS = """ body { color: #e4e4e7; font-family: "Arial", "Microsoft YaHei", sans-serif; font-size: 13px; line-height: 1.6; font-weight: 400; } h1 { color: #f4f4f5; font-size: 20px; font-weight: 700; border-bottom: 1px solid #3f3f46; 
padding-bottom: 4px; margin-top: 16px; }
h2 { color: #f4f4f5; font-size: 17px; font-weight: 700; border-bottom: 1px solid #3f3f46; padding-bottom: 3px; margin-top: 14px; }
h3 { color: #f4f4f5; font-size: 15px; font-weight: 600; margin-top: 12px; }
h4,h5,h6 { color: #d4d4d8; font-size: 13px; font-weight: 600; margin-top: 10px; }
code { background: rgba(63,63,70,0.6); color: #c4b5fd; padding: 1px 4px; border-radius: 3px; font-family: Consolas, "Courier New", monospace; font-size: 12px; }
pre { background: rgba(24,24,30,0.95); border: 1px solid #3f3f46; border-radius: 6px; padding: 10px 12px; margin: 8px 0; }
pre code { background: transparent; padding: 0; color: #d4d4d8; }
a { color: #818cf8; text-decoration: none; }
a:hover { text-decoration: underline; }
blockquote { border-left: 3px solid #7c3aed; margin: 8px 0 8px 0; padding: 4px 0 4px 12px; color: #a1a1aa; }
table { border-collapse: collapse; margin: 8px 0; }
th, td { border: 1px solid #3f3f46; padding: 5px 10px; }
th { background: rgba(63,63,70,0.35); color: #d4d4d8; font-weight: 700; }
hr { border: none; border-top: 1px solid #3f3f46; margin: 12px 0; }
ul, ol { padding-left: 22px; margin: 4px 0; }
li { margin: 2px 0; }
p { margin: 6px 0; }
"""

def _md_to_html(text: str) -> str:
    # Prefer the optional `markdown` package; fall back to a minimal converter.
    try:
        import markdown
        return markdown.markdown(
            text, extensions=["fenced_code", "tables", "nl2br", "sane_lists"]
        )
    except ImportError:
        pass
    html, in_code, in_ul = [], False, False
    for raw in text.split("\n"):
        # A ``` fence toggles code-block mode.
        if raw.strip().startswith("```"):
            if in_code:
                html.append("</code></pre>")
            else:
                html.append("<pre><code>")
            in_code = not in_code
            continue
        if in_code:
            # Escape HTML metacharacters so code is shown literally.
            html.append(raw.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;"))
            continue
        line = raw
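        # Inline markup: code spans, bold, italic, then links.
        line = re.sub(r"`([^`]+)`", r"<code>\1</code>", line)
        line = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", line)
        line = re.sub(r"\*(.+?)\*", r"<em>\1</em>", line)
        line = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r'<a href="\2">\1</a>', line)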
        if re.match(r"^#{1,6}\s", line):
            lvl = len(line.split()[0])
            line = f"<h{lvl}>{line[lvl:].strip()}</h{lvl}>"
        elif re.match(r"^-{3,}$|^_{3,}$|^\*{3,}$", line.strip()):
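            line = "<hr>"
        elif re.match(r"^\s*[-*+]\s", line):
            # Bullet item: open the list on its first item.
            content = re.sub(r"^\s*[-*+]\s", "", line)
            if not in_ul:
                html.append("<ul>")
                in_ul = True
            line = f"<li>{content}</li>"
        else:
            # Any other line ends an open list and becomes a paragraph.
            if in_ul:
                html.append("</ul>")
                in_ul = False
            line = f"<p>{line}</p>" if line.strip() else ""
        html.append(line)
    # Close a code block left open at end of input.
    if in_code:
        html.append("</code></pre>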
") if in_ul: html.append("") return "\n".join(html) _icon_cache: dict[str, QIcon] = {} def _svg_icon(key: str, svg_template: str, color: str = "#a1a1aa", size: int = 16) -> QIcon: cache_key = f"{key}_{color}_{size}" if cache_key not in _icon_cache: try: from PySide6.QtSvg import QSvgRenderer except ImportError: return QIcon() data = QByteArray(svg_template.format(c=color).encode("utf-8")) renderer = QSvgRenderer(data) pixmap = QPixmap(size, size) pixmap.fill(Qt.transparent) painter = QPainter(pixmap) renderer.render(painter) painter.end() _icon_cache[cache_key] = QIcon(pixmap) return _icon_cache[cache_key] # ── utilities ───────────────────────────────────────────────────────────────── def _make_session_id() -> str: return datetime.now().strftime("%Y%m%d_%H%M%S_%f") def _load_history() -> list: if os.path.exists(HISTORY_FILE): try: with open(HISTORY_FILE, "r", encoding="utf-8") as f: return json.load(f) except Exception: pass return [] def _save_history(history: list): os.makedirs(os.path.dirname(HISTORY_FILE), exist_ok=True) with open(HISTORY_FILE, "w", encoding="utf-8") as f: json.dump(history, f, ensure_ascii=False, indent=2) def _build_prompt_with_uploads(prompt: str, files: list) -> tuple: """ files: list of {'name': str, 'type': str, 'raw': bytes} returns (full_prompt, display_prompt, display_attachments) """ if not files: return prompt, prompt, [] os.makedirs("temp/uploaded", exist_ok=True) attachment_chunks = ["\n\n[用户上传附件 — 文件已保存到本地磁盘,可用 file_read 工具读取]"] display_attachments = [] img_count, file_names = 0, [] for f in files: raw, name, mime = f["raw"], f["name"], f.get("type", "") size = len(raw) ext = os.path.splitext(name)[1].lower() safe = re.sub(r"[^A-Za-z0-9._\-]", "_", name) saved = os.path.join( "temp", "uploaded", f"{datetime.now().strftime('%Y%m%d_%H%M%S_%f')}_{safe}", ) try: with open(saved, "wb") as out: out.write(raw) except Exception: saved = "(保存失败)" if mime.startswith("image/"): b64 = base64.b64encode(raw).decode() attachment_chunks.append( 
f"\n- [图片附件] {name} ({size} bytes)\n 磁盘路径: {saved}" f"\n data:{mime};base64,{b64}" ) display_attachments.append({"type": "image", "name": name}) img_count += 1 elif ext in TEXT_FILE_EXTS: text = raw.decode("utf-8", errors="replace") attachment_chunks.append( f"\n--- 文本文件: {name} ({size} bytes) ---\n磁盘路径: {saved}\n{text[:MAX_INLINE_CHARS]}" + ("\n[内容已截断,请用 file_read 读取完整内容]" if len(text) > MAX_INLINE_CHARS else "") ) display_attachments.append({"type": "file", "name": name}) file_names.append(name) else: attachment_chunks.append( f"\n- 文件: {name} ({size} bytes)\n 磁盘路径: {saved}" ) display_attachments.append({"type": "file", "name": name}) file_names.append(name) parts = [] if img_count: parts.append(f"{img_count} 张图片") if file_names: parts.append(f"{len(file_names)} 个文件({'、'.join(file_names)})") display_prompt = f"{prompt}\n\n📎 已附带:{','.join(parts)}" if parts else prompt return prompt + "\n".join(attachment_chunks), display_prompt, display_attachments # ── small reusable widgets ──────────────────────────────────────────────────── class _Separator(QFrame): def __init__(self, parent=None): super().__init__(parent) self.setFixedHeight(1) self.setStyleSheet(f"background: {C['border'].name()};") class _Badge(QLabel): def __init__(self, text: str, parent=None): super().__init__(text, parent) self.setStyleSheet( "QLabel { background: rgba(63,63,70,0.9); color: #a1a1aa;" " border: 1px solid #3f3f46; border-radius: 9px;" " padding: 1px 8px; font-size: 11px; }" ) class _StreamingBadge(QLabel): def __init__(self, parent=None): super().__init__("处理中…", parent) self.setStyleSheet( "QLabel { background: rgba(124,58,237,0.18); color: #c4b5fd;" " border: 1px solid rgba(124,58,237,0.35); border-radius: 9px;" " padding: 1px 8px; font-size: 11px; }" ) self.hide() class _FoldableTextBrowser(QTextBrowser): """QTextBrowser subclass that reliably detects clicks on fold anchors.""" def __init__(self, parent=None): super().__init__(parent) self.viewport().installEventFilter(self) def 
eventFilter(self, obj, event): from PySide6.QtCore import QEvent if obj is self.viewport() and event.type() == QEvent.MouseButtonRelease: href = self.anchorAt(event.pos()) if href and href.startswith("#fold_"): from urllib.parse import unquote title = unquote(href[6:]) p = self.parent() while p and not isinstance(p, _MsgRow): p = p.parent() if p and hasattr(p, '_toggle_fold'): p._toggle_fold(title) return True return super().eventFilter(obj, event) class _MsgRow(QWidget): """A single message row – flat layout with avatar, inspired by ChatGPT / Qwen.""" _ACTION_BTN = """ QPushButton { background: transparent; border: none; border-radius: 4px; padding: 3px; } QPushButton:hover { background: %s; } """ % C["hover_bg"] def __init__(self, text: str, role: str, parent=None, on_resend=None, on_delete=None, on_rewrite=None, created_at: str = None): super().__init__(parent) self._text = text self._role = role self._on_resend = on_resend self._on_delete = on_delete self._on_rewrite = on_rewrite self._created_at = created_at self._action_row = None self._finished = True is_user = role == "user" self.setStyleSheet("background: transparent;") outer = QHBoxLayout(self) outer.setContentsMargins(12, 10, 12, 10) outer.setSpacing(10) outer.setAlignment(Qt.AlignTop) # ── avatar ── avatar = QLabel() avatar.setFixedSize(30, 30) avatar.setAlignment(Qt.AlignCenter) svg_data = _SVG_USER if is_user else _SVG_BOT avatar_color = "#c8c8d0" if is_user else "#9eb4d0" pm = QPixmap(30, 30) pm.fill(QColor(0, 0, 0, 0)) from PySide6.QtSvg import QSvgRenderer renderer = QSvgRenderer(QByteArray(svg_data.replace("{c}", avatar_color).encode())) p = QPainter(pm) renderer.render(p) p.end() avatar.setPixmap(pm) avatar.setStyleSheet( "QLabel { background: rgba(255,255,255,0.04); border: 1px solid rgba(255,255,255,0.10);" " border-radius: 15px; }" ) # ── content column ── content_col = QVBoxLayout() content_col.setContentsMargins(0, 0, 0, 0) content_col.setSpacing(2) role_lbl = QLabel("你" if is_user else 
"助手") role_lbl.setStyleSheet( "color: #d4d4d8; font-size: 12px; font-weight: 700; background: transparent;" ) if is_user: role_lbl.setAlignment(Qt.AlignRight) content_col.addWidget(role_lbl) if is_user: # ── user: right-aligned bubble ── bubble = QWidget() bubble.setStyleSheet( "background: rgba(63,63,70,0.4); border-radius: 12px;" ) bubble_ly = QVBoxLayout(bubble) bubble_ly.setContentsMargins(12, 8, 12, 8) bubble_ly.setSpacing(0) label = QLabel(text) label.setWordWrap(True) label.setTextInteractionFlags(Qt.TextSelectableByMouse) label.setSizePolicy(QSizePolicy.Preferred, QSizePolicy.Minimum) label.setStyleSheet( "QLabel { background: transparent; color: #e4e4e7;" " padding: 0; font-size: 14px; line-height: 1.6; }" ) bubble_ly.addWidget(label) self._label = label # Size bubble to text: measure longest line, cap at 420 fm = label.fontMetrics() text_w = max((fm.horizontalAdvance(ln) for ln in text.split('\n')), default=0) bubble.setMinimumWidth(min(text_w + 24, 420)) bubble.setMaximumWidth(420) content_col.addWidget(bubble, 0, Qt.AlignRight) # ── user message action row ── self._action_row = QWidget() self._action_row.setStyleSheet("background: transparent;") alayout = QHBoxLayout(self._action_row) alayout.setContentsMargins(0, 4, 0, 0) alayout.setSpacing(4) alayout.setAlignment(Qt.AlignRight) icon_sz = QSize(15, 15) copy_btn = QPushButton() copy_btn.setIcon(_svg_icon("copy", _SVG_COPY)) copy_btn.setIconSize(icon_sz) copy_btn.setFixedSize(26, 24) copy_btn.setStyleSheet(self._ACTION_BTN) copy_btn.setToolTip("复制") copy_btn.setCursor(QCursor(Qt.PointingHandCursor)) copy_btn.clicked.connect(self._copy_text) alayout.addWidget(copy_btn) if on_delete: delete_btn = QPushButton() delete_btn.setIcon(_svg_icon("delete", _SVG_TRASH)) delete_btn.setIconSize(icon_sz) delete_btn.setFixedSize(26, 24) delete_btn.setStyleSheet(self._ACTION_BTN) delete_btn.setToolTip("删除") delete_btn.setCursor(QCursor(Qt.PointingHandCursor)) delete_btn.clicked.connect(self._do_delete) 
alayout.addWidget(delete_btn) if on_rewrite: rewrite_btn = QPushButton() rewrite_btn.setIcon(_svg_icon("rewrite", _SVG_RESET)) rewrite_btn.setIconSize(icon_sz) rewrite_btn.setFixedSize(26, 24) rewrite_btn.setStyleSheet(self._ACTION_BTN) rewrite_btn.setToolTip("重写") rewrite_btn.setCursor(QCursor(Qt.PointingHandCursor)) rewrite_btn.clicked.connect(self._do_rewrite) alayout.addWidget(rewrite_btn) alayout.addStretch() if created_at: from datetime import datetime try: dt = datetime.fromisoformat(created_at) time_lbl = QLabel(dt.strftime("%Y-%m-%d %H:%M")) time_lbl.setStyleSheet("color: #a1a1aa; font-size: 11px; background: transparent;") alayout.addWidget(time_lbl) except: pass self._action_row.hide() content_col.addWidget(self._action_row, 0, Qt.AlignRight) else: # ── assistant: left-aligned, no bubble ── browser = _FoldableTextBrowser() browser.setReadOnly(True) browser.setOpenExternalLinks(True) browser.setVerticalScrollBarPolicy(Qt.ScrollBarAlwaysOff) browser.setHorizontalScrollBarPolicy(Qt.ScrollBarAlwaysOff) browser.setSizePolicy(QSizePolicy.Expanding, QSizePolicy.Minimum) browser.document().setDefaultStyleSheet(_MD_CSS) browser.setStyleSheet( "QTextBrowser { background: transparent; color: #e4e4e7;" " border: none; padding: 0; font-size: 14px; }" ) self._folded_ids = set() # 记录被折叠的块 self._auto_fold_new_blocks(text) browser.setHtml(self._render_with_folds(text)) self._label = browser content_col.addWidget(browser) self._adjust_browser_height() self._action_row = QWidget() self._action_row.setStyleSheet("background: transparent;") alayout = QHBoxLayout(self._action_row) alayout.setContentsMargins(0, 4, 0, 0) alayout.setSpacing(4) icon_sz = QSize(15, 15) copy_btn = QPushButton() copy_btn.setIcon(_svg_icon("copy", _SVG_COPY)) copy_btn.setIconSize(icon_sz) copy_btn.setFixedSize(26, 24) copy_btn.setStyleSheet(self._ACTION_BTN) copy_btn.setToolTip("复制") copy_btn.setCursor(QCursor(Qt.PointingHandCursor)) copy_btn.clicked.connect(self._copy_text) 
alayout.addWidget(copy_btn) if on_delete: delete_btn = QPushButton() delete_btn.setIcon(_svg_icon("delete", _SVG_TRASH)) delete_btn.setIconSize(icon_sz) delete_btn.setFixedSize(26, 24) delete_btn.setStyleSheet(self._ACTION_BTN) delete_btn.setToolTip("删除") delete_btn.setCursor(QCursor(Qt.PointingHandCursor)) delete_btn.clicked.connect(self._do_delete) alayout.addWidget(delete_btn) if on_resend: regen_btn = QPushButton() regen_btn.setIcon(_svg_icon("regen", _SVG_REGEN)) regen_btn.setIconSize(icon_sz) regen_btn.setFixedSize(26, 24) regen_btn.setStyleSheet(self._ACTION_BTN) regen_btn.setToolTip("重新生成") regen_btn.setCursor(QCursor(Qt.PointingHandCursor)) regen_btn.clicked.connect(self._do_resend) alayout.addWidget(regen_btn) export_btn = QPushButton() export_btn.setIcon(_svg_icon("save", _SVG_SAVE)) export_btn.setIconSize(icon_sz) export_btn.setFixedSize(26, 24) export_btn.setStyleSheet(self._ACTION_BTN) export_btn.setToolTip("导出为md") export_btn.setCursor(QCursor(Qt.PointingHandCursor)) export_btn.clicked.connect(self._export_as_md) alayout.addWidget(export_btn) alayout.addStretch() if created_at: from datetime import datetime try: dt = datetime.fromisoformat(created_at) time_lbl = QLabel(dt.strftime("%Y-%m-%d %H:%M")) time_lbl.setStyleSheet("color: #a1a1aa; font-size: 11px; background: transparent;") alayout.addWidget(time_lbl) except: pass self._action_row.hide() content_col.addWidget(self._action_row) # ── assemble: assistant left, user right ── if is_user: outer.addStretch(1) outer.addLayout(content_col, 0) outer.addWidget(avatar, 0, Qt.AlignTop) else: outer.addWidget(avatar, 0, Qt.AlignTop) outer.addLayout(content_col, 1) def _copy_text(self): QApplication.clipboard().setText(self._text) def _do_resend(self): if self._on_resend: self._on_resend() def _do_delete(self): if self._on_delete: self._on_delete() def _do_rewrite(self): if self._on_rewrite: self._on_rewrite() def _export_as_md(self): from PySide6.QtWidgets import QFileDialog import os from datetime import 
datetime default_name = f"msg_{datetime.now().strftime('%Y%m%d_%H%M%S')}.md" file_path, _ = QFileDialog.getSaveFileName( self, "导出为 Markdown", default_name, "Markdown 文件 (*.md);;所有文件 (*)" ) if file_path: try: with open(file_path, "w", encoding="utf-8") as f: f.write(self._text) except Exception as e: import traceback traceback.print_exc() def enterEvent(self, event): if self._action_row and self._finished: self._action_row.show() super().enterEvent(event) def leaveEvent(self, event): if self._action_row: self._action_row.hide() super().leaveEvent(event) def resizeEvent(self, event): super().resizeEvent(event) if self._role != "user" and hasattr(self, '_label'): self._adjust_browser_height() def set_finished(self, done: bool): self._finished = done if not done and self._action_row: self._action_row.hide() def _adjust_browser_height(self): doc = self._label.document() w = self._label.width() if w < 50: w = 460 doc.setTextWidth(w - 6) self._label.setFixedHeight(int(doc.size().height() + 8)) def set_text(self, text: str): self._text = text if self._role == "user": self._label.setText(text) self._label.adjustSize() else: self._auto_fold_new_blocks(text) self._label.setHtml(self._render_with_folds(text)) self._adjust_browser_height() def highlight(self, keyword: str): """Apply highlight and return keyword's y position in document, or None.""" if not keyword or not self._text: return None kw_lower = keyword.lower() text_lower = self._text.lower() if kw_lower not in text_lower: return None if self._role == "user": escaped = self._text.replace("&", "&").replace("<", "<").replace(">", ">") kw_esc = keyword.replace("&", "&").replace("<", "<").replace(">", ">") highlighted = escaped.replace(kw_esc, f'{kw_esc}') self._label.setText(highlighted) self._label.adjustSize() return 0 # plain text, keyword at top else: from PySide6.QtGui import QTextDocument, QTextCursor, QTextCharFormat doc = self._label.document() cursor = QTextCursor(doc) flags = QTextDocument.FindFlags(0) fmt = 
QTextCharFormat() fmt.setBackground(QColor(251, 191, 36, 90)) fmt.setForeground(QColor(251, 191, 36)) keyword_y = None while True: cursor = doc.find(keyword, cursor, flags) if cursor.isNull(): break cursor.mergeCharFormat(fmt) if keyword_y is None: keyword_y = self._label.cursorRect(cursor).y() self._adjust_browser_height() return keyword_y def clear_highlight(self): if self._role == "user": self._label.setText(self._text) self._label.adjustSize() else: self._label.setHtml(self._render_with_folds(self._text)) self._adjust_browser_height() def _parse_foldable_blocks(self, text: str): """解析文本为可折叠块,返回 [(type, title_or_None, content), ...]""" import re lines = text.split('\n') blocks = [] current_type = "normal" current_title = None current_lines = [] for line in lines: # 检查是否是折叠块开始 llm_match = re.match(r'^\s*\*\*LLM Running \(Turn \d+\) \.\.\.\*\*\s*$', line) tool_match = re.match(r'^\s*🛠️\s*Tool:', line) tool_compact_match = re.match(r'^\s*🛠️\s+\w+\(', line) is_foldable_start = llm_match or tool_match or tool_compact_match if is_foldable_start: if current_lines: blocks.append((current_type, current_title, '\n'.join(current_lines))) title = line.strip() if llm_match: title = line.strip().replace('**', '') current_type = "foldable" current_title = title current_lines = [line] else: current_lines.append(line) if current_lines: blocks.append((current_type, current_title, '\n'.join(current_lines))) return blocks def _auto_fold_new_blocks(self, text: str): """将新出现的折叠块加入 _folded_ids(仅在此处修改集合)""" for _, title, _ in self._parse_foldable_blocks(text): if title is not None and title not in self._folded_ids: self._folded_ids.add(title) def _render_with_folds(self, text: str) -> str: """渲染文本为带折叠的 HTML(纯渲染,不修改 _folded_ids)""" from urllib.parse import quote blocks = self._parse_foldable_blocks(text) html_parts = [] for i, (block_type, title, content) in enumerate(blocks): if block_type == "normal": html_parts.append(f'
{_md_to_html(content)}
') else: safe_title = quote(title, safe='') display_title = title.replace('**', '') if title in self._folded_ids: # 折叠状态:只显示标题 + 展开链接 html_parts.append( f'' ) else: # 展开状态:显示标题 + 折叠链接 + 内容 html_parts.append( f'' ) return '\n'.join(html_parts) def _toggle_fold(self, title): """折叠/展开切换""" if title in self._folded_ids: self._folded_ids.remove(title) else: self._folded_ids.add(title) self._label.setHtml(self._render_with_folds(self._text)) self._adjust_browser_height() class _TabButton(QPushButton): _STYLE = """ QPushButton {{ background: transparent; color: {muted}; border: none; border-radius: 8px; padding: 0 14px; font-size: 12px; font-weight: 700; }} QPushButton:hover {{ background: {hover_bg}; color: {text}; }} QPushButton:checked {{ background: {accent}; color: white; }} """.format(muted=C["muted"], text=C["text"], hover_bg=C["hover_bg"], accent=C["accent"]) def __init__(self, text: str, parent=None): super().__init__(text, parent) self.setCheckable(True) self.setFixedHeight(30) self.setStyleSheet(self._STYLE) def _action_btn(label: str, color: str, icon: QIcon | None = None) -> QPushButton: btn = QPushButton(label) if icon and not icon.isNull(): btn.setIcon(icon) btn.setIconSize(QSize(16, 16)) btn.setFixedHeight(36) btn.setStyleSheet(f""" QPushButton {{ background: rgba(35,35,40,0.8); color: {C['text']}; border: 1px solid {C['border'].name()}; border-left: 3px solid {color}; border-radius: 8px; padding: 0 14px; font-size: 13px; font-weight: 700; text-align: left; }} QPushButton:hover {{ background: rgba(55,55,62,0.9); }} QPushButton:checked {{ color: {color}; background: rgba(35,35,40,0.95); }} """) return btn # ── Main panel ──────────────────────────────────────────────────────────────── class ChatPanel(QWidget): """Frameless always-on-top chat window.""" def __init__(self, agent): super().__init__() self.agent = agent # session state self._messages: list[dict] = [] self._session = {"id": _make_session_id(), "title": "新对话", "messages": []} self._history: 
list[dict] = _load_history() self._pending_files: list[dict] = [] # {'name','type','raw'} self._settings_health_checked = False # streaming state self._display_queue: Optional[_queue.Queue] = None self._streaming_row: Optional[_MsgRow] = None self._streaming_text = "" self._user_scrolled_up = False self._poll_timer = QTimer(self) self._poll_timer.timeout.connect(self._poll_queue) # autonomous mode self.autonomous_enabled = False self.last_reply_time = time.time() self.setWindowFlags( Qt.FramelessWindowHint | Qt.WindowStaysOnTopHint | Qt.Tool ) self.setAttribute(Qt.WA_TranslucentBackground) self.resize(530, 700) # drag state (title bar) self._drag_pos: Optional[QPoint] = None self._build_ui() def paintEvent(self, _event): p = QPainter(self) p.setRenderHint(QPainter.Antialiasing) path = QPainterPath() path.addRect(0.5, 0.5, self.width() - 1.0, self.height() - 1.0) grad = QLinearGradient(0, 0, 0, self.height()) grad.setColorAt(0.0, QColor(20, 20, 28, 255)) grad.setColorAt(1.0, QColor(10, 10, 14, 255)) p.fillPath(path, grad) def resizeEvent(self, event): path = QPainterPath() path.addRect(0, 0, float(self.width()), float(self.height())) self.setMask(QRegion(path.toFillPolygon().toPolygon())) super().resizeEvent(event) # ── UI construction ─────────────────────────────────────────────────────── def _build_ui(self): root = QVBoxLayout(self) root.setContentsMargins(0, 0, 0, 0) root.setSpacing(0) root.addWidget(self._build_titlebar()) root.addWidget(_Separator()) root.addWidget(self._build_tabbar()) root.addWidget(_Separator()) self._stack = QStackedWidget() self._stack.setStyleSheet("background: transparent;") self._stack.addWidget(self._build_chat_page()) # 0 self._stack.addWidget(self._build_history_page()) # 1 self._stack.addWidget(self._build_sop_page()) # 2 self._stack.addWidget(self._build_settings_page())# 3 root.addWidget(self._stack) root.addWidget(self._build_statusbar()) # Now that _stack exists, activate the first tab self._switch_tab(0) # ── title bar 
───────────────────────────────────────────────────────────── def _build_titlebar(self) -> QWidget: bar = QWidget() bar.setFixedHeight(48) bar.setStyleSheet("background: transparent;") bar.setCursor(QCursor(Qt.SizeAllCursor)) ly = QHBoxLayout(bar) ly.setContentsMargins(16, 0, 10, 0) ly.setSpacing(8) # Search button search_btn = QPushButton() search_btn.setIcon(_svg_icon("search", _SVG_SEARCH, "#a1a1aa")) search_btn.setIconSize(QSize(16, 16)) search_btn.setFixedSize(26, 26) search_btn.setCursor(QCursor(Qt.PointingHandCursor)) search_btn.setStyleSheet(""" QPushButton { background: transparent; border: none; border-radius: 13px; } QPushButton:hover { background: rgba(63,63,70,0.6); } """) search_btn.clicked.connect(self._toggle_search) self._search_btn = search_btn ly.addWidget(search_btn) # Search widget (hidden by default) self._search_widget = QWidget() self._search_widget.hide() sw_ly = QHBoxLayout(self._search_widget) sw_ly.setContentsMargins(0, 0, 0, 0) sw_ly.setSpacing(6) self._search_input = QLineEdit() self._search_input.setPlaceholderText("搜索当前对话和历史...") self._search_input.setFixedHeight(26) self._search_input.setStyleSheet(f""" QLineEdit {{ background: rgba(32,32,38,0.9); border: 1px solid {C['border'].name()}; border-radius: 13px; color: {C['text']}; font-size: 13px; padding: 0 10px; }} QLineEdit::placeholder {{ color: {C['muted']}; }} """) self._search_input.setFixedWidth(200) self._search_input.textChanged.connect(self._on_search_changed) self._search_input.installEventFilter(self) sw_ly.addWidget(self._search_input) close_search = QPushButton("×") close_search.setFixedSize(26, 26) close_search.setCursor(QCursor(Qt.PointingHandCursor)) close_search.setStyleSheet(""" QPushButton { background: transparent; color: #71717a; border: none; font-size: 16px; } QPushButton:hover { color: #a1a1aa; } """) close_search.clicked.connect(self._hide_search) sw_ly.addWidget(close_search) ly.addWidget(self._search_widget) ly.addStretch() # Minimize button mini = 
QPushButton("\uE949") mini.setFixedSize(26, 26) mini.setCursor(QCursor(Qt.PointingHandCursor)) mini.setStyleSheet(""" QPushButton { background: rgba(63,63,70,0.6); color: #a1a1aa; border: none; border-radius: 13px; font-family: "Segoe MDL2 Assets"; font-size: 9px; } QPushButton:hover { background: rgba(63,63,70,0.9); color: white; } """) mini.clicked.connect(self.hide) ly.addWidget(mini) # Maximize button maxi = QPushButton("\uE739") maxi.setFixedSize(26, 26) maxi.setCursor(QCursor(Qt.PointingHandCursor)) maxi.setStyleSheet(""" QPushButton { background: rgba(63,63,70,0.6); color: #a1a1aa; border: none; border-radius: 13px; font-family: "Segoe MDL2 Assets"; font-size: 9px; } QPushButton:hover { background: rgba(63,63,70,0.9); color: white; } """) maxi.clicked.connect(self._toggle_maximize) self._maxi_btn = maxi ly.addWidget(maxi) # Close button close = QPushButton("\uE8BB") close.setFixedSize(26, 26) close.setCursor(QCursor(Qt.PointingHandCursor)) close.setStyleSheet(""" QPushButton { background: rgba(63,63,70,0.6); color: #a1a1aa; border: none; border-radius: 13px; font-family: "Segoe MDL2 Assets"; font-size: 9px; } QPushButton:hover { background: rgba(220,38,38,0.85); color: white; } """) close.clicked.connect(lambda: (self.close(), QApplication.instance().quit())) ly.addWidget(close) # Drag bar.mousePressEvent = self._tb_press bar.mouseMoveEvent = self._tb_move bar.mouseReleaseEvent = self._tb_release return bar def _toggle_search(self): if hasattr(self, "_search_visible") and self._search_visible: self._hide_search() else: self._show_search() def _show_search(self): self._search_visible = True self._search_btn.setFixedSize(0, 0) self._search_widget.show() self._search_input.setFocus() self._search_input.selectAll() def _hide_search(self): self._search_visible = False self._search_btn.setFixedSize(26, 26) self._search_widget.hide() self._search_input.clear() self._clear_all_highlights() if self._stack.currentIndex() == 1: self._reset_history_items_style() def 
_hide_search_if_no_focus(self): if not self._search_input.hasFocus(): self._hide_search() def _on_search_changed(self, text): if not text.strip(): self._clear_all_highlights() return keyword = text.strip() current_tab = self._stack.currentIndex() if current_tab == 0: self._search_current_chat(keyword) elif current_tab == 1: self._search_history(keyword) def _clear_all_highlights(self): for i in range(self._msg_layout.count() - 1): w = self._msg_layout.itemAt(i).widget() if isinstance(w, _MsgRow): w.clear_highlight() def _search_current_chat(self, keyword: str): first_found = None first_keyword_y = None for i in range(self._msg_layout.count() - 1): w = self._msg_layout.itemAt(i).widget() if isinstance(w, _MsgRow): if keyword.lower() in w._text.lower(): kw_y = w.highlight(keyword) if first_found is None: first_found = w first_keyword_y = kw_y else: w.clear_highlight() # 滚动到第一个匹配项(使用关键词在文档内的实际位置) if first_found: self._scroll_to_widget(first_found, first_keyword_y or 0) def _scroll_to_widget(self, w, keyword_y=0): self._user_scrolled_up = True self._msg_container.layout().activate() QApplication.processEvents() sb = self._scroll.verticalScrollBar() vp_h = self._scroll.viewport().height() keyword_screen_y = w.y() + keyword_y target = keyword_screen_y - vp_h // 3 target = max(0, min(target, sb.maximum())) sb.setValue(target) QApplication.processEvents() self._scroll.viewport().repaint() def _search_history(self, keyword: str): kw_lower = keyword.lower() for i in range(self._hist_list.count()): item = self._hist_list.item(i) session = item.data(Qt.UserRole) messages = session.get("messages", []) if session else [] content_text = " ".join([m.get("content", "") for m in messages if isinstance(m.get("content"), str)]) match = kw_lower in content_text.lower() item.setHidden(not match) if match: item.setBackground(QColor(251, 191, 36, 50)) item.setForeground(QColor(251, 191, 36)) else: item.setBackground(QColor(0, 0, 0, 0)) item.setForeground(QColor(255, 255, 255)) def 
_reset_history_items_style(self): for i in range(self._hist_list.count()): item = self._hist_list.item(i) item.setHidden(False) item.setBackground(QColor(0, 0, 0, 0)) item.setForeground(QColor(255, 255, 255)) w = self._hist_list.itemWidget(item) if w: w.setStyleSheet( f"background: rgba(35,35,42,0.6); color: {C['text']};" " border: 1px solid #3f3f46; border-radius: 8px;" " padding: 8px 12px; margin: 2px 0;" ) def _tb_press(self, e): if e.button() == Qt.LeftButton: self._drag_pos = e.globalPosition().toPoint() - self.pos() def _tb_move(self, e): if e.buttons() == Qt.LeftButton and self._drag_pos is not None: self.move(e.globalPosition().toPoint() - self._drag_pos) def _tb_release(self, _e): self._drag_pos = None def _toggle_maximize(self): if self.isMaximized(): self.showNormal() self._maxi_btn.setText("☐") else: self.showMaximized() self._maxi_btn.setText("❐") # ── status bar ───────────────────────────────────────────────────────────── def _build_statusbar(self) -> QWidget: bar = QWidget() bar.setFixedHeight(24) bar.setStyleSheet("background: transparent;") ly = QHBoxLayout(bar) ly.setContentsMargins(16, 0, 10, 0) ly.setSpacing(8) # Status dot dot = QLabel("●") dot.setStyleSheet(f"color: {C['green']}; font-size: 9px;") dot.setFixedWidth(12) ly.addWidget(dot) # Model name (clickable to show model list) self._model_badge = QLabel(self._model_name()) self._model_badge.setStyleSheet("color: #a1a1aa; font-size: 11px;") self._model_badge.setCursor(QCursor(Qt.PointingHandCursor)) self._model_badge.mousePressEvent = lambda e: self._show_model_menu(e) ly.addWidget(self._model_badge) self._streaming_badge = _StreamingBadge() ly.addWidget(self._streaming_badge) ly.addStretch() return bar def _show_model_menu(self, _e): menu = QMenu(self._model_badge) menu.setStyleSheet(f""" QMenu {{ background: {C['panel'].name()}; border: 1px solid {C['border'].name()}; padding: 4px 0; }} QMenu::item {{ color: {C['text']}; padding: 6px 20px 6px 12px; font-size: 12px; }} QMenu::item:selected 
{{ background: {C['hover_bg']}; }} """) for i, client in enumerate(self.agent.llmclients): name = getattr(client, 'name', None) or "未知" act = menu.addAction(f"{name} #{i + 1}") act.triggered.connect(lambda _, idx=i: self._do_switch_to(idx)) menu.exec(QCursor.pos()) # ── tab bar ─────────────────────────────────────────────────────────────── def _build_tabbar(self) -> QWidget: bar = QWidget() bar.setFixedHeight(40) bar.setStyleSheet("background: rgba(10,10,14,0.6);") ly = QHBoxLayout(bar) ly.setContentsMargins(12, 5, 12, 5) ly.setSpacing(4) self._tabs: list[_TabButton] = [] tab_defs = [ (_SVG_CHAT, "对话"), (_SVG_CLOCK, "历史"), (_SVG_BOOK, "SOP"), (_SVG_GEAR, "设置"), ] for i, (svg, text) in enumerate(tab_defs): btn = _TabButton(text) btn.setIcon(_svg_icon(text, svg, "#b0b0b8")) btn.setIconSize(QSize(14, 14)) btn.clicked.connect(lambda _checked, idx=i: self._switch_tab(idx)) ly.addWidget(btn) self._tabs.append(btn) ly.addStretch() new_btn = QPushButton("新对话") new_btn.setIcon(_svg_icon("plus", _SVG_PLUS, "#a78bfa")) new_btn.setIconSize(QSize(12, 12)) new_btn.setFixedHeight(27) new_btn.setStyleSheet(f""" QPushButton {{ background: rgba(124,58,237,0.18); color: #a78bfa; border: 1px solid rgba(124,58,237,0.3); border-radius: 7px; padding: 0 10px; font-size: 12px; font-weight: 700; }} QPushButton:hover {{ background: rgba(124,58,237,0.35); color: white; }} """) new_btn.clicked.connect(self._new_session) ly.addWidget(new_btn) # NOTE: _switch_tab(0) is called in _build_ui() after _stack is created return bar def _switch_tab(self, idx: int): self._stack.setCurrentIndex(idx) for i, btn in enumerate(self._tabs): btn.setChecked(i == idx) # 切换标签时关闭搜索框 if hasattr(self, '_search_visible') and self._search_visible: self._hide_search() if idx == 1: self._refresh_history() if idx == 2: self._refresh_sop() if idx == 3: self._refresh_model_rows_style() if not self._settings_health_checked: self._start_health_checks() self._settings_health_checked = True # ── chat page 
───────────────────────────────────────────────────────────── def _build_chat_page(self) -> QWidget: page = QWidget() page.setStyleSheet("background: transparent;") ly = QVBoxLayout(page) ly.setContentsMargins(0, 0, 0, 0) ly.setSpacing(0) # ── message scroll area ── self._scroll = QScrollArea() self._scroll.setWidgetResizable(True) self._scroll.setFrameShape(QFrame.NoFrame) self._scroll.setHorizontalScrollBarPolicy(Qt.ScrollBarAlwaysOff) self._scroll.setStyleSheet(f"QScrollArea {{ background: transparent; border: none; }} {SCROLLBAR_STYLE}") self._msg_container = QWidget() self._msg_container.setStyleSheet("background: transparent;") self._msg_layout = QVBoxLayout(self._msg_container) self._msg_layout.setContentsMargins(0, 12, 0, 12) self._msg_layout.setSpacing(4) self._msg_layout.addStretch() self._scroll.setWidget(self._msg_container) self._scroll.verticalScrollBar().valueChanged.connect(self._on_scroll) # ── scroll navigation buttons (centered at bottom of message area) ── scroll_wrapper = QWidget() scroll_wrapper.setStyleSheet("background: transparent;") wrap_ly = QVBoxLayout(scroll_wrapper) wrap_ly.setContentsMargins(0, 0, 0, 0) wrap_ly.setSpacing(0) wrap_ly.addWidget(self._scroll) self._nav_widget = QWidget() self._nav_widget.setFixedSize(68, 28) self._nav_widget.setStyleSheet("background: transparent; border: none;") nav_ly = QHBoxLayout(self._nav_widget) nav_ly.setContentsMargins(6, 2, 6, 2) nav_ly.setSpacing(8) self._nav_up = QPushButton("∧") self._nav_up.setFixedWidth(26) self._nav_up.setCursor(QCursor(Qt.PointingHandCursor)) self._nav_up.setStyleSheet(""" QPushButton { background: transparent; color: #71717a; border: none; font-size: 14px; } QPushButton:hover { color: #a1a1aa; } QPushButton:disabled { color: #27272a; } """) self._nav_up.clicked.connect(self._scroll_to_top) self._nav_down = QPushButton("∨") self._nav_down.setFixedWidth(26) self._nav_down.setCursor(QCursor(Qt.PointingHandCursor)) self._nav_down.setStyleSheet(""" QPushButton { background: 
transparent; color: #71717a; border: none; font-size: 14px; } QPushButton:hover { color: #a1a1aa; } QPushButton:disabled { color: #27272a; } """) self._nav_down.clicked.connect(self._scroll_to_bottom) nav_ly.addWidget(self._nav_up) nav_ly.addWidget(self._nav_down) wrap_ly.addWidget(self._nav_widget, 0, Qt.AlignHCenter | Qt.AlignBottom) self._nav_widget.setContentsMargins(0, 0, 0, 8) self._nav_widget.hide() ly.addWidget(scroll_wrapper, 1) ly.addWidget(_Separator()) # ── input area ── ly.addWidget(self._build_input_area()) QTimer.singleShot(200, self._update_nav_visibility) return page def _build_input_area(self) -> QWidget: wrap = QWidget() wrap.setStyleSheet("background: transparent;") ly = QVBoxLayout(wrap) ly.setContentsMargins(20, 6, 20, 0) ly.setSpacing(0) self._chips_row = QWidget() self._chips_row.setStyleSheet("background: transparent;") self._chips_ly = QHBoxLayout(self._chips_row) self._chips_ly.setContentsMargins(0, 0, 0, 6) self._chips_ly.setSpacing(6) self._chips_row.hide() ly.addWidget(self._chips_row) card = QWidget() card.setStyleSheet(f""" QWidget#inputCard {{ background: rgba(32,32,38,0.85); border: 1px solid {C['border'].name()}; border-radius: 16px; }} QWidget#inputCard:focus-within {{ border-color: rgba(124,58,237,0.55); }} """) card.setObjectName("inputCard") card_ly = QVBoxLayout(card) card_ly.setContentsMargins(14, 10, 10, 10) card_ly.setSpacing(6) class _PlainTextEdit(QTextEdit): def insertFromMimeData(self, source): text = source.text() or source.data("text/plain") if text: self.insertPlainText(text) self._input = _PlainTextEdit() self._input.setAutoFormatting(QTextEdit.AutoNone) self._input.setFixedHeight(64) self._input.setPlaceholderText("给助手发送消息... 
Enter发送,Shift+Enter换行") self._input.setStyleSheet(f""" QTextEdit {{ background: transparent; color: {C['text']}; border: none; padding: 0; font-size: 14px; selection-background-color: rgba(124,58,237,0.4); }} """) self._input.installEventFilter(self) self._input.textChanged.connect(self._on_text_changed) card_ly.addWidget(self._input) bottom = QHBoxLayout() bottom.setSpacing(6) attach = QPushButton() attach.setIcon(_svg_icon("clip", _SVG_CLIP, "#a1a1aa")) attach.setIconSize(QSize(17, 17)) attach.setFixedSize(30, 30) attach.setToolTip("上传附件") attach.setCursor(QCursor(Qt.PointingHandCursor)) attach.setStyleSheet(""" QPushButton { background: transparent; border: none; border-radius: 15px; } QPushButton:hover { background: rgba(63,63,70,0.6); } """) attach.clicked.connect(self._attach_files) bottom.addWidget(attach) self._char_lbl = QLabel("0 / 2000") self._char_lbl.setStyleSheet(f"color: {C['muted']}; font-size: 11px;") bottom.addWidget(self._char_lbl) self._token_lbl = QLabel("") self._token_lbl.setStyleSheet(f"color: {C['muted']}; font-size: 11px; margin-left: 10px;") bottom.addWidget(self._token_lbl) bottom.addStretch() self._is_streaming = False self._send_btn = QPushButton() self._send_btn.setFixedSize(34, 34) self._send_btn.setCursor(QCursor(Qt.PointingHandCursor)) self._send_btn.clicked.connect(self._on_send_btn_click) self._set_send_mode() bottom.addWidget(self._send_btn) card_ly.addLayout(bottom) ly.addWidget(card) return wrap # ── history page ────────────────────────────────────────────────────────── def _build_history_page(self) -> QWidget: page = QWidget() page.setStyleSheet("background: transparent;") ly = QVBoxLayout(page) ly.setContentsMargins(12, 12, 12, 12) ly.setSpacing(8) header = QHBoxLayout() lbl = QLabel("历史记录") lbl.setStyleSheet("color: #f4f4f5; font-weight: 600; font-size: 14px;") header.addWidget(lbl) header.addStretch() restore_btn = QPushButton("恢复会话") restore_btn.setStyleSheet(self._small_btn_style(C["accent"])) 
restore_btn.clicked.connect(self._restore_selected) header.addWidget(restore_btn) del_btn = QPushButton("删除") del_btn.setStyleSheet(self._small_btn_style("#dc2626")) del_btn.clicked.connect(self._delete_selected) header.addWidget(del_btn) ly.addLayout(header) self._hist_list = QListWidget() self._hist_list.setStyleSheet(f""" QListWidget {{ background: transparent; border: none; outline: none; }} QListWidget::item {{ background: rgba(35,35,42,0.6); color: {C['text']}; border: 1px solid {C['border'].name()}; border-radius: 8px; padding: 8px 12px; margin: 2px 0; }} QListWidget::item:hover {{ background: rgba(55,55,65,0.8); border-color: rgba(124,58,237,0.4); }} QListWidget::item:selected {{ background: {C["accent_bg"]}; border-color: rgba(124,58,237,0.6); }} {SCROLLBAR_STYLE} """) self._hist_list.itemDoubleClicked.connect(self._restore_selected) ly.addWidget(self._hist_list) return page # ── SOP page ────────────────────────────────────────────────────────────── def _build_sop_page(self) -> QWidget: page = QWidget() page.setStyleSheet("background: transparent;") ly = QVBoxLayout(page) ly.setContentsMargins(0, 0, 0, 0) splitter = QSplitter(Qt.Horizontal) self._sop_list = QListWidget() self._sop_list.setMaximumWidth(175) self._sop_list.setStyleSheet(f""" QListWidget {{ background: rgba(10,10,14,0.7); border: none; border-right: 1px solid {C['border'].name()}; outline: none; }} QListWidget::item {{ color: {C['muted']}; padding: 7px 10px; border-radius: 4px; margin: 1px 4px; }} QListWidget::item:hover {{ background: rgba(55,55,65,0.7); color: {C['text']}; }} QListWidget::item:selected {{ background: rgba(124,58,237,0.28); color: white; }} {SCROLLBAR_STYLE} """) self._sop_list.currentItemChanged.connect(self._load_sop) splitter.addWidget(self._sop_list) self._sop_viewer = QTextBrowser() self._sop_viewer.setOpenExternalLinks(True) self._sop_viewer.document().setDefaultStyleSheet(_MD_CSS) self._sop_viewer.setStyleSheet(f""" QTextBrowser {{ background: transparent; color: 
{C['text']}; border: none; padding: 10px 14px; font-family: "Arial", "Microsoft YaHei", sans-serif; font-size: 13px; }} {SCROLLBAR_STYLE} """) splitter.addWidget(self._sop_viewer) splitter.setSizes([165, 340]) ly.addWidget(splitter) return page # ── settings page ───────────────────────────────────────────────────────── def _build_settings_page(self) -> QWidget: page = QWidget() page.setStyleSheet("background: transparent;") ly = QVBoxLayout(page) ly.setContentsMargins(16, 16, 16, 16) ly.setSpacing(8) lbl = QLabel("控制面板") lbl.setStyleSheet("color: #f4f4f5; font-weight: 600; font-size: 14px;") ly.addWidget(lbl) self._model_info = QLabel(f"当前模型:{self._model_name()} (#{self.agent.llm_no})") self._model_info.setStyleSheet(f"color: {C['muted']}; font-size: 12px;") ly.addWidget(self._model_info) ly.addSpacing(4) model_hdr = QLabel("模型列表") model_hdr.setStyleSheet("color: #d4d4d8; font-weight: 600; font-size: 13px;") ly.addWidget(model_hdr) self._model_rows_container = QWidget() self._model_rows_container.setStyleSheet("background: transparent;") self._model_rows_layout = QVBoxLayout(self._model_rows_container) self._model_rows_layout.setContentsMargins(0, 0, 0, 0) self._model_rows_layout.setSpacing(3) ly.addWidget(self._model_rows_container) self._model_row_widgets: list[dict] = [] self._health_results: dict[int, bool | None] = {} self._build_model_rows() ly.addSpacing(6) for (lbl_text, color, handler, svg) in [ ("重置提示词", "#059669", self._do_reset_prompt, _SVG_RESET), ("保存当前会话","#0ea5e9", self._do_save, _SVG_SAVE), ("清空对话", "#78716c", self._do_clear, _SVG_TRASH), ]: b = _action_btn(lbl_text, color, _svg_icon(lbl_text, svg)) b.clicked.connect(handler) ly.addWidget(b) ly.addSpacing(10) sep = QLabel("自主行动") sep.setStyleSheet("color: #f4f4f5; font-weight: 600; font-size: 13px;") ly.addWidget(sep) self._auto_btn = _action_btn(f"开启自主行动 (idle > {AUTO_IDLE_THRESHOLD // 60} min 自动触发)", "#f59e0b", _svg_icon("bolt", _SVG_BOLT)) self._auto_btn.setCheckable(True) 
self._auto_btn.clicked.connect(self._do_toggle_auto) ly.addWidget(self._auto_btn) trigger_btn = _action_btn("立即触发一次", "#f59e0b", _svg_icon("play", _SVG_PLAY)) trigger_btn.clicked.connect(self._do_trigger_auto) ly.addWidget(trigger_btn) ly.addStretch() return page # ── model list ──────────────────────────────────────────────────────────── _MODEL_ROW_STYLE = ( "QPushButton { background: rgba(39,39,42,0.7); color: #e4e4e7;" " border: 1px solid #3f3f46; border-radius: 8px;" " padding: 6px 10px; font-size: 12px; font-weight: 700; text-align: left; }" " QPushButton:hover { background: rgba(63,63,70,0.8); }" ) _MODEL_ROW_ACTIVE = ( "QPushButton { background: rgba(124,58,237,0.25); color: #c4b5fd;" " border: 1px solid rgba(124,58,237,0.5); border-radius: 8px;" " padding: 6px 10px; font-size: 12px; font-weight: 700; text-align: left; }" " QPushButton:hover { background: rgba(124,58,237,0.35); }" ) def _build_model_rows(self): while self._model_rows_layout.count(): w = self._model_rows_layout.takeAt(0).widget() if w: w.deleteLater() self._model_row_widgets.clear() for idx, tc in enumerate(self.agent.llmclients): b = tc.backend name = f"{type(b).__name__}/{b.model}" is_current = idx == self.agent.llm_no row = QWidget() row.setStyleSheet("background: transparent;") rlay = QHBoxLayout(row) rlay.setContentsMargins(0, 0, 0, 0) rlay.setSpacing(6) dot = QLabel("●") dot.setFixedWidth(14) dot.setAlignment(Qt.AlignCenter) dot.setStyleSheet("color: #71717a; font-size: 11px;") rlay.addWidget(dot) btn = QPushButton(f" #{idx} {name}") btn.setCursor(QCursor(Qt.PointingHandCursor)) btn.setStyleSheet(self._MODEL_ROW_ACTIVE if is_current else self._MODEL_ROW_STYLE) btn.clicked.connect(lambda checked, i=idx: self._do_switch_to(i)) rlay.addWidget(btn, 1) self._model_rows_layout.addWidget(row) self._model_row_widgets.append({"dot": dot, "btn": btn, "idx": idx}) def _refresh_model_rows_style(self): for entry in self._model_row_widgets: is_current = entry["idx"] == self.agent.llm_no 
entry["btn"].setStyleSheet( self._MODEL_ROW_ACTIVE if is_current else self._MODEL_ROW_STYLE ) status = self._health_results.get(entry["idx"]) if status is True: entry["dot"].setStyleSheet("color: #22c55e; font-size: 11px;") elif status is False: entry["dot"].setStyleSheet("color: #ef4444; font-size: 11px;") else: entry["dot"].setStyleSheet("color: #71717a; font-size: 11px;") def _do_switch_to(self, idx: int): if idx == self.agent.llm_no: return self.agent.next_llm(n=idx) name = self._model_name() self._model_badge.setText(name) self._model_info.setText(f"当前模型:{name} (#{self.agent.llm_no})") self._add_system_notice(f"已切换至 {name},对话上下文已保留") self._refresh_model_rows_style() def _start_health_checks(self): self._health_results.clear() self._health_pending = 0 self._health_result_queue = _queue.Queue() for entry in self._model_row_widgets: entry["dot"].setStyleSheet("color: #71717a; font-size: 11px;") entry["dot"].setText("◌") for idx, tc in enumerate(self.agent.llmclients): self._health_pending += 1 t = threading.Thread(target=self._check_backend, args=(idx, tc.backend), daemon=True) t.start() if not hasattr(self, '_health_poll_timer'): self._health_poll_timer = QTimer(self) self._health_poll_timer.timeout.connect(self._poll_health_results) self._health_poll_timer.start(500) def _poll_health_results(self): while True: try: idx, ok = self._health_result_queue.get_nowait() self._health_results[idx] = ok except _queue.Empty: break self._refresh_model_rows_style() if len(self._health_results) >= self._health_pending: self._health_poll_timer.stop() def _check_backend(self, idx: int, backend): ok = False try: reply = backend.ask("你好") # 兼容生成器函数(NativeClaudeSession.ask是生成器) if hasattr(reply, '__iter__') and not isinstance(reply, str): reply = ''.join(str(b) for b in reply if isinstance(b, str)) text = str(reply).strip() if reply else "" ok = len(text) > 0 and not text.startswith("Error") and not text.startswith("[") print(f"[HealthCheck] Backend #{idx} 
{type(backend).__name__}/{backend.model}: {'OK' if ok else 'FAIL'} -> {text[:60]}") except Exception as e: print(f"[HealthCheck] Backend #{idx} {type(backend).__name__}/{backend.model}: ERROR -> {e}") ok = False if hasattr(backend, 'raw_msgs') and backend.raw_msgs: backend.raw_msgs = [m for m in backend.raw_msgs if m.get("prompt") != "你好"] self._health_result_queue.put((idx, ok)) # ── event filter (Enter key in text edit, Escape to close search) ────────── def eventFilter(self, obj, event): if event.type() == QEvent.KeyPress: if obj is self._search_input and event.key() == Qt.Key_Escape: self._hide_search() return True if obj is self._input and event.key() in (Qt.Key_Return, Qt.Key_Enter): if not (event.modifiers() & Qt.ShiftModifier): self._handle_send() return True # 搜索框失焦时关闭搜索 if event.type() == QEvent.FocusOut and obj is self._search_input: # 延迟关闭,等待点击事件处理完毕 QTimer.singleShot(50, self._hide_search_if_no_focus) return super().eventFilter(obj, event) def _on_text_changed(self): n = len(self._input.toPlainText()) self._char_lbl.setText(f"{n} / 2000") # ── file attachment ──────────────────────────────────────────────────────── def _attach_files(self): paths, _ = QFileDialog.getOpenFileNames( self, "选择附件", "", "All Files (*);;" "Images (*.png *.jpg *.jpeg *.gif *.webp *.bmp);;" "Text (*.txt *.md *.py *.json *.csv *.yaml *.yml *.log *.js *.ts *.sql)", ) for path in paths: name = os.path.basename(path) if any(f["name"] == name for f in self._pending_files): continue ext = os.path.splitext(path)[1].lower() img_exts = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp"} mime = (f"image/{ext[1:]}" if ext in img_exts else "text/plain" if ext in TEXT_FILE_EXTS else "application/octet-stream") try: with open(path, "rb") as fh: raw = fh.read() if len(raw) > MAX_UPLOAD_BYTES: print(f"[Attach] 文件过大,已跳过: {name} ({len(raw)} bytes)") continue self._pending_files.append({"name": name, "type": mime, "raw": raw}) except Exception as e: print(f"[Attach] Failed to read {path}: {e}") 
self._refresh_chips() def _refresh_chips(self): while self._chips_ly.count(): item = self._chips_ly.takeAt(0) if item.widget(): item.widget().deleteLater() if not self._pending_files: self._chips_row.hide() return for f in self._pending_files: chip = QLabel(f['name']) chip.setStyleSheet(f""" QLabel {{ background: rgba(55,55,65,0.7); color: {C['text']}; border: 1px solid {C['border'].name()}; border-radius: 6px; padding: 3px 8px; font-size: 11px; }} """) self._chips_ly.addWidget(chip) self._chips_ly.addStretch() self._chips_row.show() # ── send / streaming ─────────────────────────────────────────────────────── _SEND_BTN_STYLE = """ QPushButton { background: #e4e4e7; border: none; border-radius: 17px; } QPushButton:hover { background: #f4f4f5; } QPushButton:pressed { background: #d4d4d8; } """ _STOP_BTN_STYLE = """ QPushButton { background: rgba(239,68,68,0.85); border: none; border-radius: 17px; } QPushButton:hover { background: rgba(248,113,113,0.9); } QPushButton:pressed { background: rgba(220,38,38,0.9); } """ def _set_send_mode(self): self._is_streaming = False self._send_btn.setText("") self._send_btn.setIcon(_svg_icon("send_arrow", _SVG_SEND, "#18181b")) self._send_btn.setIconSize(QSize(18, 18)) self._send_btn.setStyleSheet(self._SEND_BTN_STYLE) def _set_stop_mode(self): self._is_streaming = True self._send_btn.setText("") self._send_btn.setIcon(_svg_icon("stop_circle", _SVG_STOP, "#ffffff")) self._send_btn.setIconSize(QSize(16, 16)) self._send_btn.setStyleSheet(self._STOP_BTN_STYLE) def _on_send_btn_click(self): if self._is_streaming: self._do_stop() else: self._handle_send() def _handle_send(self): text = self._input.toPlainText().strip() files = self._pending_files.copy() if not text and not files: return if text.startswith("/"): self._input.clear() self._pending_files.clear() self._refresh_chips() self._handle_command(text) return prompt = text or "请分析我上传的附件。" full_prompt, display_prompt, _ = _build_prompt_with_uploads(prompt, files) # Clear input state 
self._input.clear() self._pending_files.clear() self._refresh_chips() # Update session title if self._session["title"] == "新对话" and prompt: self._session["title"] = prompt[:20] + ("..." if len(prompt) > 20 else "") from datetime import datetime now_iso = datetime.now().isoformat() user_idx = len(self._messages) self._messages.append({"role": "user", "content": display_prompt, "created_at": now_iso}) self._add_msg_row( "user", display_prompt, created_at=now_iso, on_delete=lambda idx=user_idx: self._delete_message(idx), on_rewrite=lambda idx=user_idx: self._rewrite_message(idx) ) self._update_token_usage() # Start streaming — reset scroll lock so new output auto-scrolls self._user_scrolled_up = False self._streaming_text = "" # The streaming row will be replaced when done, it doesn't need deletion/export self._streaming_row = self._add_msg_row("assistant", "▌") self._streaming_row.set_finished(False) self._set_stop_mode() self._streaming_badge.show() self._display_queue = self.agent.put_task(f"{FILE_HINT}\n\n{full_prompt}", source="user") self._poll_timer.start(40) def _handle_command(self, cmd: str): parts = cmd.split() op = parts[0].lower() if parts else "" if op == "/help": self._add_system_notice(HELP_TEXT) elif op == "/stop": self._do_stop() self._add_system_notice("⏹️ 已停止") elif op == "/status": llm = self._model_name() state = "🔴 运行中" if self.agent.is_running else "🟢 空闲" self._add_system_notice(f"状态: {state}\nLLM: [{self.agent.llm_no}] {llm}") elif op == "/llm": if not self.agent.llmclient: self._add_system_notice("❌ 当前没有可用的 LLM 配置") elif len(parts) > 1: try: idx = int(parts[1]) self._do_switch_to(idx) except Exception: self._add_system_notice(f"用法: /llm <0-{len(self.agent.llmclients) - 1}>") else: lines = [f"{'→' if i == self.agent.llm_no else ' '} [{i}] {getattr(c, 'name', type(c.backend).__name__ + '/' + c.backend.model)}" for i, c in enumerate(self.agent.llmclients)] self._add_system_notice("LLMs:\n" + "\n".join(lines)) elif op == "/restore": 
restored_info, err = format_restore() if err: self._add_system_notice(err) else: restored, fname, count = restored_info self.agent.abort() self.agent.history.extend(restored) self._add_system_notice(f"✅ 已恢复 {count} 轮对话\n来源: {fname}") elif op == "/new": self._do_clear() self._add_system_notice("✅ 已开启新对话") else: self._add_system_notice(f"未知命令: {cmd}\n{HELP_TEXT}") def _poll_queue(self): if not self._display_queue: return try: while True: item = self._display_queue.get_nowait() if not isinstance(item, dict) or ("next" not in item and "done" not in item): print(f"[Queue] 跳过异常项: {item}") continue if "next" in item: self._streaming_text = item["next"] if self._streaming_row: self._streaming_row.set_text(self._streaming_text + " ▌") self._update_token_usage() self._scroll_bottom() if "done" in item: final = item["done"] from datetime import datetime now_iso = datetime.now().isoformat() # Remove the temporary streaming row; idx stays -1 when there is none, so the fallback position below is used idx = -1 if self._streaming_row: # Find its position in the layout to replace it idx = self._msg_layout.indexOf(self._streaming_row) self._streaming_row.deleteLater() self._streaming_row = None # Add the final message with proper buttons assist_idx = len(self._messages) self._messages.append({"role": "assistant", "content": final, "created_at": now_iso}) # Insert at the same position where the streaming row was, or before the stretch insert_pos = idx if idx >= 0 else self._msg_layout.count() - 1 row = _MsgRow( final, "assistant", on_resend=self._regenerate_response, on_delete=lambda idx=assist_idx: self._delete_message(idx), on_rewrite=None, created_at=now_iso ) # Auto-expand the last LLM Running block so the user sees the result directly for _, title, _ in reversed(row._parse_foldable_blocks(final)): if title is not None and title in row._folded_ids and 'LLM Running' in title: row._folded_ids.remove(title) row._label.setHtml(row._render_with_folds(final)) row._adjust_browser_height() break self._msg_layout.insertWidget(insert_pos, row) self._poll_timer.stop() self._set_send_mode() 
self._streaming_badge.hide() self.last_reply_time = time.time() self._update_token_usage() self._scroll_bottom() self._auto_save() break except _queue.Empty: pass def _add_msg_row(self, role: str, text: str, created_at: str = None, on_delete=None, on_rewrite=None) -> _MsgRow: row = _MsgRow( text, role, on_resend=self._regenerate_response if role != "user" else None, on_delete=on_delete, on_rewrite=on_rewrite, created_at=created_at ) self._msg_layout.insertWidget(self._msg_layout.count() - 1, row) self._scroll_bottom() return row def _regenerate_response(self): """Resend the last user message to regenerate the assistant response.""" if self._is_streaming: return for msg in reversed(self._messages): if msg["role"] == "user": self._input.setPlainText(msg["content"]) self._handle_send() break def _delete_message(self, index: int): """Delete the message at the given index.""" if index < 0 or index >= len(self._messages): return # Remove from data self._messages.pop(index) # Rebuild all rows to ensure on_delete indices are correct self._rebuild_messages() # Update self._update_token_usage() self._auto_save() def _rewrite_message(self, index: int): """Rewrite the user message at the given index.""" if index < 0 or index >= len(self._messages): return if self._messages[index]["role"] != "user": return # Get the content and fill it into the input content = self._messages[index]["content"] self._input.setPlainText(content) # Remove this message and everything after it self._messages = self._messages[:index] # Rebuild UI self._rebuild_messages() self._update_token_usage() self._auto_save() def _on_scroll(self, value): sb = self._scroll.verticalScrollBar() self._user_scrolled_up = value < sb.maximum() - 30 self._update_nav_visibility() def _update_nav_visibility(self): sb = self._scroll.verticalScrollBar() max_val = sb.maximum() vp_h = self._scroll.viewport().height() total_h = max_val + vp_h show_nav = max_val > 0 and total_h >= vp_h * 1.5 if show_nav: self._nav_widget.show() 
self._nav_up.setEnabled(sb.value() > 2) self._nav_down.setEnabled(max_val > 0 and sb.value() < max_val - 2) else: self._nav_widget.hide() def _scroll_to_top(self): self._user_scrolled_up = True self._scroll.verticalScrollBar().setValue(0) def _scroll_to_bottom(self): self._user_scrolled_up = False QTimer.singleShot(60, lambda: self._scroll.verticalScrollBar().setValue( self._scroll.verticalScrollBar().maximum() )) def _scroll_bottom(self): if self._user_scrolled_up: return QTimer.singleShot(60, lambda: self._scroll.verticalScrollBar().setValue( self._scroll.verticalScrollBar().maximum() )) # ── inject (autonomous mode) ─────────────────────────────────────────────── def inject_message(self, text: str): """Programmatically send a message (called by idle monitor).""" self._input.setPlainText(text) self._handle_send() # ── history ──────────────────────────────────────────────────────────────── def _refresh_history(self): self._history = _load_history() self._hist_list.clear() for s in reversed(self._history[-20:]): n = len(s.get("messages", [])) item = QListWidgetItem(f" {s.get('title','未命名')} ({n} 条)") item.setData(Qt.UserRole, s) self._hist_list.addItem(item) def _restore_selected(self, item=None): item = item or self._hist_list.currentItem() if not item: return s = item.data(Qt.UserRole) if s: self._session = s.copy() self._messages = s.get("messages", []).copy() self._rebuild_messages() self._switch_tab(0) self._update_token_usage() search_text = self._search_input.text().strip() if search_text: QTimer.singleShot(50, lambda: self._search_current_chat(search_text)) def _delete_selected(self): item = self._hist_list.currentItem() if not item: return s = item.data(Qt.UserRole) if s: self._history = [h for h in self._history if h.get("id") != s.get("id")] _save_history(self._history) self._refresh_history() def _rebuild_messages(self): while self._msg_layout.count() > 1: it = self._msg_layout.takeAt(0) if it.widget(): it.widget().deleteLater() for i, m in 
enumerate(self._messages): rewrite_cb = (lambda idx=i: self._rewrite_message(idx)) if m["role"] == "user" else None self._add_msg_row( m["role"], m["content"], created_at=m.get("created_at"), on_delete=lambda idx=i: self._delete_message(idx), on_rewrite=rewrite_cb ) self._update_token_usage() def _update_token_usage(self): in_chars = sum(len(m.get("content", "")) for m in self._messages if m.get("role") == "user") out_chars = sum(len(m.get("content", "")) for m in self._messages if m.get("role") == "assistant") if getattr(self, "_is_streaming", False) and getattr(self, "_streaming_text", ""): out_chars += len(self._streaming_text) in_tokens = int(in_chars / 2.5) out_tokens = int(out_chars / 2.5) if in_tokens == 0 and out_tokens == 0: self._token_lbl.setText("") else: self._token_lbl.setText(f"| 会话上下文消耗: 入 {in_tokens} 出 {out_tokens} tokens") # ── SOP ──────────────────────────────────────────────────────────────────── def _refresh_sop(self): self._sop_list.clear() file_icon = _svg_icon("sop_file_item", _SVG_FILE, C["muted"]) for path in sorted(glob.glob(os.path.join(os.path.dirname(os.path.dirname(__file__)), "memory", "*.md"))): name = os.path.basename(path) size = os.path.getsize(path) it = QListWidgetItem(name) it.setIcon(file_icon) it.setData(Qt.UserRole, path) it.setToolTip(f"{size:,} 字节") self._sop_list.addItem(it) def _load_sop(self, item): if not item: return path = item.data(Qt.UserRole) try: with open(path, "r", encoding="utf-8") as f: self._sop_viewer.setHtml(_md_to_html(f.read())) except Exception as e: self._sop_viewer.setPlainText(f"读取失败: {e}") # ── settings actions ─────────────────────────────────────────────────────── def _model_name(self) -> str: if self.agent.llmclient is None: return "未配置" try: return self.agent.get_llm_name() except Exception: return "未知" def _add_system_notice(self, text: str): """Insert a small centered notice label (not tracked as a message).""" lbl = QLabel(text) lbl.setWordWrap(True) lbl.setAlignment(Qt.AlignCenter) 
lbl.setStyleSheet( "QLabel { background: transparent; color: #71717a;" " border: none; padding: 6px 20px; font-size: 12px; }" ) self._msg_layout.insertWidget(self._msg_layout.count() - 1, lbl) self._scroll_bottom() def _do_stop(self): self.agent.abort() self._poll_timer.stop() self._set_send_mode() self._streaming_badge.hide() if self._streaming_row: self._streaming_row.set_text(self._streaming_text or "(已停止)") self._streaming_row.set_finished(True) self._streaming_row = None self._update_token_usage() def _do_reset_prompt(self): if self.agent.llmclient and hasattr(self.agent.llmclient, 'last_tools'): self.agent.llmclient.last_tools = "" def _auto_save(self): if not self._messages: return if self._session.get("title") == "新对话": first_user = next( (m["content"] for m in self._messages if m["role"] == "user"), "" ) if first_user: self._session["title"] = first_user[:30].replace("\n", " ") self._do_save() def _do_save(self): if not self._messages: return self._session["messages"] = self._messages.copy() self._session["updatedAt"] = datetime.now().isoformat() self._history = _load_history() for i, s in enumerate(self._history): if s.get("id") == self._session["id"]: self._history[i] = self._session.copy() break else: self._history.append(self._session.copy()) _save_history(self._history) def _do_clear(self): self._messages.clear() self._session = {"id": _make_session_id(), "title": "新对话", "messages": []} self._rebuild_messages() self._switch_tab(0) self._update_token_usage() def _new_session(self): if self._messages: self._do_save() self._do_clear() def _do_toggle_auto(self): self.autonomous_enabled = not self.autonomous_enabled self._auto_btn.setChecked(self.autonomous_enabled) lbl = "暂停自主行动" if self.autonomous_enabled else "开启自主行动 (idle > 30 min 自动触发)" self._auto_btn.setText(lbl) def _do_trigger_auto(self): self.inject_message( "[AUTO]🤖 用户触发了自主行动,请阅读自动化sop,选择并执行一项有价值的任务。" ) # ── helpers ──────────────────────────────────────────────────────────────── @staticmethod 
def _small_btn_style(color: str) -> str: return ( f"QPushButton {{ background: {color}; color: white; border: none;" f" border-radius: 7px; padding: 4px 12px; font-size: 12px; font-weight: 600; }}" f"QPushButton:hover {{ opacity: 0.85; }}" ) # ══════════════════════════════════════════════════════════════════════ # Entry Point # ══════════════════════════════════════════════════════════════════════ def main(): # High-DPI support QApplication.setHighDpiScaleFactorRoundingPolicy( Qt.HighDpiScaleFactorRoundingPolicy.PassThrough ) app = QApplication(sys.argv) app.setQuitOnLastWindowClosed(False) app.setApplicationName("GenericAgent") # Font font = QFont() # Keep English glyphs in Arial; Chinese falls back to Microsoft YaHei. try: font.setFamilies(["Arial", "Microsoft YaHei"]) except Exception: font.setFamily("Microsoft YaHei") font.setPointSize(10) app.setFont(font) # ── Agent initialisation ────────────────────────────── agent = GeneraticAgent() if agent.llmclient is None: QMessageBox.critical( None, "未配置 LLM", "未在 mykey.py 中发现任何可用的 LLM 接口配置,\n程序将在无 LLM 模式下运行。", ) else: threading.Thread(target=agent.run, daemon=True).start() # ── Windows ─────────────────────────────────────────── panel = ChatPanel(agent) button = FloatingButton(panel) button.show() # Position panel next to button and show it on first launch button._position_panel() panel.show() scr = QApplication.primaryScreen().availableGeometry() print(f"[GenericAgent] 启动成功") print(f" 屏幕分辨率: {scr.width()}x{scr.height()}") print(f" 悬浮按钮: ({button.x()}, {button.y()})") print(f" 聊天面板: ({panel.x()}, {panel.y()})") print(f" 关闭面板后可点击右下角发光按钮重新打开") # ── Idle monitor (autonomous mode) ──────────────────── _last_trigger = 0.0 def idle_check(): nonlocal _last_trigger if not panel.autonomous_enabled: return now = time.time() if now - _last_trigger < AUTO_COOLDOWN: return idle = now - panel.last_reply_time if idle > AUTO_IDLE_THRESHOLD: _last_trigger = now panel.inject_message( "[AUTO]🤖 用户已经离开超过30分钟,作为自主智能体,请阅读自动化sop,执行自动任务。" ) 
idle_timer = QTimer() idle_timer.timeout.connect(idle_check) idle_timer.start(5000) # check every 5 s sys.exit(app.exec()) if __name__ == "__main__": main() ================================================ FILE: frontends/skins/boy/skin.json ================================================ { "name": "Boy", "version": "1.0.0", "author": "pzuh", "source": "https://pzuh.itch.io/temple-run-game-sprites", "description": "Boy 角色皮肤", "style": "pixel", "format": "sprite", "size": { "width": 80, "height": 122 }, "animations": { "idle": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 64, "frameHeight": 98, "frameCount": 10, "columns": 40, "fps": 6, "startFrame": 0 } }, "walk": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 64, "frameHeight": 98, "frameCount": 10, "columns": 40, "fps": 3, "startFrame": 20 } }, "run": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 64, "frameHeight": 98, "frameCount": 10, "columns": 40, "fps": 10, "startFrame": 20 } }, "sprint": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 64, "frameHeight": 98, "frameCount": 10, "columns": 40, "fps": 24, "startFrame": 20 } } } } ================================================ FILE: frontends/skins/dinosaur/skin.json ================================================ { "name": "Dinosaur", "version": "1.0.0", "author": "voidcord54", "source": "https://voidcord54.itch.io/", "description": "像素风小恐龙 Dinosaur", "style": "pixel", "format": "sprite", "size": { "width": 128, "height": 128 }, "animations": { "idle": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 128, "frameHeight": 128, "frameCount": 2, "columns": 5, "fps": 6, "startFrame": 0 } }, "walk": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 128, "frameHeight": 128, "frameCount": 2, "columns": 5, "fps": 4, "startFrame": 2 } }, "run": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 128, "frameHeight": 128, "frameCount": 2, "columns": 5, "fps": 8, "startFrame": 
2 } }, "sprint": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 128, "frameHeight": 128, "frameCount": 2, "columns": 5, "fps": 16, "startFrame": 2 } } } } ================================================ FILE: frontends/skins/doux/skin.json ================================================ { "name": "Doux", "version": "1.0.0", "author": "arks", "source": "https://arks.itch.io/dino-characters", "license": "CC0", "description": "像素风小恐龙 Doux", "style": "pixel", "format": "sprite", "size": { "width": 128, "height": 128 }, "animations": { "idle": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 24, "frameHeight": 24, "frameCount": 4, "columns": 24, "fps": 6, "startFrame": 0 } }, "walk": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 24, "frameHeight": 24, "frameCount": 6, "columns": 24, "fps": 6, "startFrame": 5 } }, "run": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 24, "frameHeight": 24, "frameCount": 8, "columns": 24, "fps": 16, "startFrame": 6 } }, "sprint": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 24, "frameHeight": 24, "frameCount": 6, "columns": 24, "fps": 16, "startFrame": 17 } } } } ================================================ FILE: frontends/skins/glube/skin.json ================================================ { "name": "Glube", "version": "1.0.0", "author": "SketchesWithKevin", "source": "https://sketcheswithkevin.itch.io/glube-platformer", "description": "像素风小怪兽 Glube", "style": "pixel", "format": "sprite", "size": { "width": 65, "height": 38 }, "animations": { "idle": { "file": "idle.png", "loop": true, "sprite": { "frameWidth": 44, "frameHeight": 31, "frameCount": 6, "columns": 6, "fps": 6, "startFrame": 0 } }, "walk": { "file": "walk.png", "loop": true, "sprite": { "frameWidth": 65, "frameHeight": 32, "frameCount": 8, "columns": 8, "fps": 6, "startFrame": 0 } }, "run": { "file": "run.png", "loop": true, "sprite": { "frameWidth": 65, "frameHeight": 32, "frameCount": 8, 
"columns": 8, "fps": 12, "startFrame": 0 } }, "sprint": { "file": "run.png", "loop": true, "sprite": { "frameWidth": 65, "frameHeight": 32, "frameCount": 8, "columns": 8, "fps": 24, "startFrame": 0 } } } } ================================================ FILE: frontends/skins/line/License.txt ================================================ License is CC0 - https://creativecommons.org/public-domain/cc0/ YOU CAN: -> You can do whatever you want with this asset, including modifying it for commercial use. -> Credit is not required, but is greatly appreciated! ================================================ FILE: frontends/skins/line/skin.json ================================================ { "name": "Line", "version": "1.0.0", "author": "itch.io", "source": "https://itch.io", "description": "Line 角色皮肤", "style": "pixel", "format": "sprite", "size": { "width": 128, "height": 128 }, "animations": { "idle": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 156, "frameHeight": 185, "frameCount": 4, "columns": 28, "fps": 6, "startFrame": 0 } }, "walk": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 156, "frameHeight": 185, "frameCount": 8, "columns": 28, "fps": 6, "startFrame": 4 } }, "run": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 156, "frameHeight": 185, "frameCount": 8, "columns": 28, "fps": 10, "startFrame": 12 } }, "sprint": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 156, "frameHeight": 185, "frameCount": 8, "columns": 28, "fps": 24, "startFrame": 12 } } } } ================================================ FILE: frontends/skins/mort/skin.json ================================================ { "name": "Mort", "version": "1.0.0", "author": "arks", "source": "https://arks.itch.io/dino-characters", "license": "CC0", "description": "像素风小恐龙 Mort", "style": "pixel", "format": "sprite", "size": { "width": 128, "height": 128 }, "animations": { "idle": { "file": "skin.png", "loop": true, "sprite": { 
"frameWidth": 24, "frameHeight": 24, "frameCount": 4, "columns": 24, "fps": 6, "startFrame": 0 } }, "walk": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 24, "frameHeight": 24, "frameCount": 6, "columns": 24, "fps": 6, "startFrame": 5 } }, "run": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 24, "frameHeight": 24, "frameCount": 8, "columns": 24, "fps": 16, "startFrame": 6 } }, "sprint": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 24, "frameHeight": 24, "frameCount": 6, "columns": 24, "fps": 16, "startFrame": 17 } } } } ================================================ FILE: frontends/skins/tard/skin.json ================================================ { "name": "Tard", "version": "1.0.0", "author": "arks", "source": "https://arks.itch.io/dino-characters", "license": "CC0", "description": "像素风小恐龙 Tard", "style": "pixel", "format": "sprite", "size": { "width": 128, "height": 128 }, "animations": { "idle": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 24, "frameHeight": 24, "frameCount": 4, "columns": 24, "fps": 6, "startFrame": 0 } }, "walk": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 24, "frameHeight": 24, "frameCount": 6, "columns": 24, "fps": 6, "startFrame": 5 } }, "run": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 24, "frameHeight": 24, "frameCount": 8, "columns": 24, "fps": 16, "startFrame": 6 } }, "sprint": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 24, "frameHeight": 24, "frameCount": 6, "columns": 24, "fps": 16, "startFrame": 17 } } } } ================================================ FILE: frontends/skins/vita/skin.json ================================================ { "name": "Vita", "version": "1.0.0", "author": "arks", "source": "https://arks.itch.io/dino-characters", "license": "CC0", "description": "像素风小恐龙 Vita", "style": "pixel", "format": "sprite", "size": { "width": 128, "height": 128 }, "animations": { "idle": { "file": 
"skin.png", "loop": true, "sprite": { "frameWidth": 24, "frameHeight": 24, "frameCount": 4, "columns": 24, "fps": 6, "startFrame": 0 } }, "walk": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 24, "frameHeight": 24, "frameCount": 6, "columns": 24, "fps": 6, "startFrame": 5 } }, "run": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 24, "frameHeight": 24, "frameCount": 8, "columns": 24, "fps": 16, "startFrame": 6 } }, "sprint": { "file": "skin.png", "loop": true, "sprite": { "frameWidth": 24, "frameHeight": 24, "frameCount": 6, "columns": 24, "fps": 16, "startFrame": 17 } } } } ================================================ FILE: frontends/stapp.py ================================================ import os, sys, subprocess from urllib.request import urlopen from urllib.parse import quote if sys.stdout is None: sys.stdout = open(os.devnull, "w") if sys.stderr is None: sys.stderr = open(os.devnull, "w") try: sys.stdout.reconfigure(errors='replace') except: pass try: sys.stderr.reconfigure(errors='replace') except: pass script_dir = os.path.dirname(__file__) sys.path.append(os.path.abspath(os.path.join(script_dir, '..'))) sys.path.append(os.path.abspath(script_dir)) import streamlit as st import time, json, re, threading, queue from agentmain import GeneraticAgent import chatapp_common # activate /continue command (monkey patches GeneraticAgent) from continue_cmd import handle_frontend_command, reset_conversation, list_sessions, extract_ui_messages from btw_cmd import handle_frontend_command as btw_handle_frontend st.set_page_config(page_title="Cowork", layout="wide") LANG = os.environ.get('GA_LANG', 'zh') if LANG not in ('zh', 'en'): LANG = 'zh' I18N = { 'zh': { 'force_stop': '强行停止任务', 'reinject_tools': '重新注入工具', 'desktop_pet': '🐱 桌面宠物', }, 'en': { 'force_stop': 'Force Stop', 'reinject_tools': 'Reinject Tools', 'desktop_pet': '🐱 Desktop Pet', }, } def T(key): return I18N.get(LANG, I18N['zh']).get(key, key) @st.cache_resource def init(): 
agent = GeneraticAgent() if agent.llmclient is None: st.error("⚠️ Please set mykey.py!") st.stop() else: threading.Thread(target=agent.run, daemon=True).start() return agent agent = init() st.title("🖥️ Cowork") st.session_state.setdefault('autonomous_enabled', False) @st.fragment def render_sidebar(): st.session_state.setdefault('autonomous_enabled', False) llm_options = agent.list_llms() current_idx = agent.llm_no llm_labels = {idx: f"{idx}: {(name or '').strip()}" for idx, name, _ in llm_options} st.caption(f"LLM Core: {llm_labels.get(current_idx, str(current_idx))}") selected_idx = st.selectbox("LLM", [idx for idx, _, _ in llm_options], index=next((i for i, (idx, _, _) in enumerate(llm_options) if idx == current_idx), 0), format_func=llm_labels.get, label_visibility="collapsed", key="sidebar_llm_select") if selected_idx != current_idx: agent.next_llm(selected_idx); st.rerun(scope="fragment") if st.button(T('force_stop')): agent.abort(); st.toast("Stop signal sent"); st.rerun() if st.button(T('reinject_tools')): agent.llmclient.last_tools = '' try: hist_path = os.path.join(script_dir, '..', 'assets', 'tool_usable_history.json') with open(hist_path, 'r', encoding='utf-8') as f: tool_hist = json.load(f) agent.llmclient.backend.history.extend(tool_hist) st.toast("Tools injected") except Exception as e: st.toast(f"Tool injection failed: {e}") if st.button(T('desktop_pet')): kwargs = {'creationflags': 0x08} if sys.platform == 'win32' else {} pet_script = os.path.join(script_dir, 'desktop_pet_v2.pyw') if not os.path.exists(pet_script): pet_script = os.path.join(script_dir, 'desktop_pet.pyw') subprocess.Popen([sys.executable, pet_script], **kwargs) def _pet_req(q): def _do(): try: urlopen(f'http://127.0.0.1:41983/?{q}', timeout=2) except Exception: pass threading.Thread(target=_do, daemon=True).start() agent._pet_req = _pet_req if not hasattr(agent, '_turn_end_hooks'): agent._turn_end_hooks = {} def _pet_hook(ctx): parts = [f"Turn {ctx.get('turn','?')}"] if 
ctx.get('summary'): parts.append(ctx['summary']) if ctx.get('exit_reason'): parts.append('DONE') _pet_req(f'msg={quote(chr(10).join(parts))}') if ctx.get('exit_reason'): _pet_req('state=idle') agent._turn_end_hooks['pet'] = _pet_hook st.toast("Desktop pet started") if LANG == 'zh': st.divider() if st.button("开始空闲自主行动"): st.session_state.last_reply_time = int(time.time()) - 1800 st.toast("已将上次回复时间设为1800秒前"); st.rerun() if st.session_state.autonomous_enabled: if st.button("⏸️ 禁止自主行动"): st.session_state.autonomous_enabled = False st.toast("⏸️ 已禁止自主行动"); st.rerun() st.caption("🟢 自主行动运行中,会在你离开它30分钟后自动进行") else: if st.button("▶️ 允许自主行动", type="primary"): st.session_state.autonomous_enabled = True st.toast("✅ 已允许自主行动"); st.rerun() st.caption("🔴 自主行动已停止") with st.sidebar: render_sidebar() def fold_turns(text): """Return list of segments: [{'type':'text','content':...}, {'type':'fold','title':...,'content':...}]""" # First replace blocks fenced with 4+ backticks with placeholders, so nested sub-agent 'LLM Running' markers are not split by mistake _ph = [] safe = re.sub(r'`{4,}.*?`{4,}', lambda m: (_ph.append(m.group(0)), f'\x00PH{len(_ph)-1}\x00')[1], text, flags=re.DOTALL) # Mid-stream state: the tail may hold an unclosed 4+ backtick block, which also needs protection safe = re.sub(r'`{4,}[^`].*$', lambda m: (_ph.append(m.group(0)), f'\x00PH{len(_ph)-1}\x00')[1], safe, flags=re.DOTALL) parts = re.split(r'(\**LLM Running \(Turn \d+\) \.\.\.\*\**)', safe) parts = [re.sub(r'\x00PH(\d+)\x00', lambda m: _ph[int(m.group(1))], p) for p in parts] if len(parts) < 4: return [{'type': 'text', 'content': text}] segments = [] if parts[0].strip(): segments.append({'type': 'text', 'content': parts[0]}) turns = [] for i in range(1, len(parts), 2): marker = parts[i] content = parts[i+1] if i+1 < len(parts) else '' turns.append((marker, content)) for idx, (marker, content) in enumerate(turns): if idx < len(turns) - 1: _c = re.sub(r'`{3,}.*?`{3,}|.*?', '', content, flags=re.DOTALL) matches = re.findall(r'<summary>\s*((?:(?!<summary>).)*?)\s*</summary>', _c, re.DOTALL) if matches: title = matches[0].strip() title = title.split('\n')[0] if len(title) > 50: title = title[:50] + 
'...' else: _plain = _c.strip().split('\n', 1)[0] title = (_plain[:50] + '...') if len(_plain) > 50 else (_plain or marker.strip('*')) segments.append({'type': 'fold', 'title': title, 'content': content}) else: segments.append({'type': 'text', 'content': marker + content}) return segments _SUMMARY_TAG_RE = re.compile(r'<summary>.*?</summary>\s*', re.DOTALL) def render_segments(segments, suffix=''): # Redraw the whole block: the caller wraps this in slot.container() so the DOM path stays stable and aligned across reruns (eliminates the "gray ghosting"). # When the heartbeat idles, segments are unchanged → the Streamlit backend diff is empty → no frontend flicker; # but container/markdown are still API calls, so StopException is still raised (abort works as usual). for seg in segments: if seg['type'] == 'fold': with st.expander(seg['title'], expanded=False): st.markdown(seg['content']) else: # Strip <summary> meta tags from text segments: folded turns already # promote them to expander titles; for the first/last segments # they'd otherwise leak into the chat as raw text (esp. after /continue # restores a multi-turn body). st.markdown(_SUMMARY_TAG_RE.sub('', seg['content']) + suffix) def agent_backend_stream(prompt=None): """Drain main task display_queue. - prompt given: start a fresh task; new dq is kept in session_state. - prompt is None: resume a dq left in session_state by a prior run (e.g. after /btw). Per-chunk progress is mirrored to session_state.partial_response so the rendered bubble survives reruns. No implicit agent.abort(); explicit stop is on the Stop button.""" if prompt is not None: st.session_state.display_queue = agent.put_task(prompt, source="user") st.session_state.partial_response = '' dq = st.session_state.get('display_queue') if dq is None: return # Drop a dangling 'LLM Running (Turn N) ...' marker if the captured partial # ended right at a turn boundary with no content yet; otherwise the resume # bubble flashes as a marker-only gray line. The marker reappears with # content on the next chunk (raw_resp is cumulative). 
response = re.sub(r'\**LLM Running \(Turn \d+\) \.\.\.\**\s*$', '', st.session_state.get('partial_response', '')).rstrip() while True: try: item = dq.get(timeout=1) except queue.Empty: yield response # heartbeat: let outer st.markdown() run → Streamlit checks StopException continue if 'next' in item: response = item['next'] st.session_state.partial_response = response yield response if 'done' in item: st.session_state.display_queue = None st.session_state.partial_response = '' yield item['done']; break def render_main_stream(prompt=None): """Render the assistant bubble for the main task (new or resumed). Saves final to messages.""" with st.chat_message("assistant"): frozen = 0; live = st.empty(); response = '' CURSOR = ' ▌' for response in agent_backend_stream(prompt): segs = fold_turns(response) n_done = max(0, len(segs) - 1) while frozen < n_done: with live.container(): render_segments([segs[frozen]]) live = st.empty(); frozen += 1 with live.container(): render_segments([segs[-1]], suffix=CURSOR) # live 区域 segs = fold_turns(response) for i in range(frozen, len(segs)): with live.container(): render_segments([segs[i]]) if i < len(segs) - 1: live = st.empty() if response: st.session_state.messages.append({"role": "assistant", "content": response}) st.session_state.last_reply_time = int(time.time()) if "messages" not in st.session_state: st.session_state.messages = [] for msg in st.session_state.messages: with st.chat_message(msg["role"]): # 用 slot=st.empty() + with slot.container(): ... 的外壳,DOM 路径和流式渲染完全一致,跨 rerun 对齐 slot = st.empty() with slot.container(): if msg["role"] == "assistant": render_segments(fold_turns(msg["content"])) else: st.markdown(msg["content"]) # Scroll-height ghost fix: during streaming, expander open/close mid-animation can leave # phantom height → scrollbar long but can't scroll to bottom. Periodically detect & reflow. 
try: from streamlit import iframe as _st_iframe # 1.56+ _embed_html = lambda html, **kw: _st_iframe(html, **{k: max(v, 1) if isinstance(v, int) else v for k, v in kw.items()}) except (ImportError, AttributeError): from streamlit.components.v1 import html as _embed_html # ≤1.55 _js_scroll_fix = ( "!function(){var p=window.parent;if(p.__sfx2)return;p.__sfx2=1;var d=p.document;" "function f(){var m=d.querySelector('section.main');if(!m)return;" "var s=m.scrollTop;m.style.minHeight=m.scrollHeight+1+'px';void m.offsetHeight;" "m.style.minHeight='';void m.offsetHeight;m.scrollTop=s}" "d.addEventListener('transitionend',function(e){" "e.target.closest&&e.target.closest('details')&&setTimeout(f,60)},!0);" "new MutationObserver(function(){setTimeout(f,80)})" ".observe(d.body,{subtree:1,attributes:1,attributeFilter:['open']});" "setInterval(f,5000)}()" ) # IME composition fix (macOS only) - prevents Enter from submitting during CJK input _js_ime_fix = ("" if os.name == 'nt' else "!function(){if(window.parent.__imeFix)return;window.parent.__imeFix=1;" "var d=window.parent.document,c=0;" "d.addEventListener('compositionstart',()=>c=1,!0);" "d.addEventListener('compositionend',()=>c=0,!0);" "function f(){d.querySelectorAll('textarea[data-testid=stChatInputTextArea]')" ".forEach(t=>{t.__imeFix||(t.__imeFix=1,t.addEventListener('keydown',e=>{" "e.key==='Enter'&&!e.shiftKey&&(e.isComposing||c||e.keyCode===229)&&" "(e.stopImmediatePropagation(),e.preventDefault())},!0))})}" "f();new MutationObserver(f).observe(d.body,{childList:1,subtree:1})}()") _embed_html(f'', height=0) if prompt := st.chat_input("any task?"): ts = time.strftime("%Y-%m-%d %H:%M:%S") cmd = (prompt or "").strip() def _reset_and_rerun(): st.session_state.streaming = False st.session_state.stopping = False st.session_state.display_queue = None st.session_state.partial_response = "" st.session_state.reply_ts = "" st.session_state.current_prompt = "" st.session_state.last_reply_time = int(time.time()) st.rerun() if 
cmd == "/new": st.session_state.messages = [{"role": "assistant", "content": reset_conversation(agent), "time": ts}] _reset_and_rerun() if cmd.startswith("/continue"): m = re.match(r'/continue\s+(\d+)\s*$', cmd.strip()) sessions = list_sessions(exclude_pid=os.getpid()) if m else [] idx = int(m.group(1)) - 1 if m else -1 # Resolve target path BEFORE handle (which snapshots current log, shifting indices). target = sessions[idx][0] if 0 <= idx < len(sessions) else None result = handle_frontend_command(agent, cmd) history = extract_ui_messages(target) if target and result.startswith('✅') else None tail = [{"role": "assistant", "content": result, "time": ts}] if history: st.session_state.messages = history + tail else: st.session_state.messages = list(st.session_state.messages) + \ [{"role": "user", "content": cmd, "time": ts}] + tail _reset_and_rerun() if cmd.startswith("/btw"): answer = btw_handle_frontend(agent, cmd) # sync; bypasses put_task → main agent.run() untouched st.session_state.messages = list(st.session_state.messages) + [ {"role": "user", "content": prompt, "time": ts}, {"role": "assistant", "content": answer, "time": ts}, ] st.rerun() # preserve display_queue/partial_response so resume path drains the running main task # Regular prompt: cancel any in-flight task to match original "submit cancels" UX. # (/btw branch above is the only path that intentionally lets the prior task keep streaming.) if st.session_state.get('display_queue') is not None: agent.abort() st.session_state.display_queue = None st.session_state.partial_response = '' st.session_state.messages.append({"role": "user", "content": prompt}) if hasattr(agent, '_pet_req') and not prompt.startswith('/'): agent._pet_req('state=walk') with st.chat_message("user"): st.markdown(prompt) render_main_stream(prompt) elif st.session_state.get('display_queue') is not None: # No new prompt but a task is mid-flight (typically a /btw rerun) — resume drain. 
render_main_stream() if st.session_state.autonomous_enabled: st.markdown(f"""""", unsafe_allow_html=True) ================================================ FILE: frontends/stapp2.py ================================================ import os, sys import html if sys.stdout is None: sys.stdout = open(os.devnull, "w") if sys.stderr is None: sys.stderr = open(os.devnull, "w") try: sys.stdout.reconfigure(errors='replace') except Exception: pass try: sys.stderr.reconfigure(errors='replace') except Exception: pass sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) import streamlit as st try: from streamlit import iframe as _st_iframe # 1.56+ _embed_html = lambda html, **kw: _st_iframe(html, **{k: max(v, 1) if isinstance(v, int) else v for k, v in kw.items()}) except (ImportError, AttributeError): from streamlit.components.v1 import html as _embed_html # ≤1.55 import time, json, re, threading, queue from datetime import datetime from agentmain import GeneraticAgent st.set_page_config(page_title="Cowork", layout="wide") # ─── Anthropic Light Theme CSS ─── ANTHROPIC_CSS = """ """ ANTHROPIC_SELECTBOX_SCRIPT = """
""" @st.cache_resource def init(): agent = GeneraticAgent() if agent.llmclient is None: st.error("⚠️ 未配置任何可用的 LLM 接口,请在 mykey.py 中添加 sider_cookie 或 oai_apikey+oai_apibase 等信息后重启。") st.stop() else: threading.Thread(target=agent.run, daemon=True).start() return agent def build_dynamic_font_css(scale_percent: float) -> str: root_percent = max(100.0, min(200.0, float(scale_percent))) rem_scale = root_percent / 100.0 return f""" """ def build_dynamic_font_update_script(scale_percent: float) -> str: css = json.dumps(build_dynamic_font_css(scale_percent)) return f""" """ def build_header_agent_badge_script() -> str: return """ """ agent = init() def init_session_state(): for key, value in { 'agent_name': 'GenericAgent', 'streaming': False, 'stopping': False, 'display_queue': None, 'partial_response': '', 'reply_ts': '', 'current_prompt': '', 'selected_llm_idx': agent.llm_no, 'autonomous_enabled': False, 'messages': [], }.items(): st.session_state.setdefault(key, value) init_session_state() # Inject Anthropic theme st.markdown(ANTHROPIC_CSS, unsafe_allow_html=True) st.markdown(build_dynamic_font_css(110.0), unsafe_allow_html=True) _embed_html(ANTHROPIC_SELECTBOX_SCRIPT, height=0, width=0) _embed_html(build_header_agent_badge_script(), height=0, width=0) st.session_state.agent_name = 'Generic Agent' with st.chat_message("assistant"): st.markdown(f'
{datetime.now().strftime("%Y-%m-%d %H:%M:%S")}
', unsafe_allow_html=True) st.write("欢迎使用GenericAgent~") @st.fragment def render_sidebar(): llm_options, current_idx = agent.list_llms(), agent.llm_no st.session_state.selected_llm_idx = current_idx llm_labels = {idx: f"{idx}: {(name or '').strip()}" for idx, name, _ in llm_options} st.caption(f"当前使用的LLM为:{current_idx}: {agent.get_llm_name()}", help="可在下方选择链路") st.markdown(f'
{html.escape(max(llm_labels.values(), key=len, default=""))}
', unsafe_allow_html=True) selected_idx = st.selectbox("选择链路:", [idx for idx, _, _ in llm_options], index=next((i for i, (idx, _, _) in enumerate(llm_options) if idx == current_idx), 0), format_func=llm_labels.get, key="sidebar_llm_select") if selected_idx != current_idx: agent.next_llm(selected_idx) st.session_state.selected_llm_idx = selected_idx st.toast(f"已切换到备用链路:{llm_labels[selected_idx]}") st.rerun() st.divider() if st.button("重新注入System Prompt"): agent.llmclient.last_tools = '' st.toast("下次将重新注入System Prompt") with st.sidebar: render_sidebar() def start_agent_task(prompt): st.session_state.display_queue = agent.put_task(prompt, source="user") st.session_state.streaming, st.session_state.stopping, st.session_state.partial_response = True, False, '' st.session_state.reply_ts = datetime.now().strftime("%Y-%m-%d %H:%M:%S") st.session_state.current_prompt = prompt def poll_agent_output(max_items=20): q = st.session_state.display_queue if q is None: st.session_state.streaming = False return False done = False for _ in range(max_items): try: item = q.get_nowait() except queue.Empty: break if 'next' in item: st.session_state.partial_response = item['next'] if 'done' in item: st.session_state.partial_response = item['done'] done = True break if done: st.session_state.streaming = st.session_state.stopping = False; st.session_state.display_queue = None return done def _get_response_segments(text): return [p for p in re.split(r'(?=\*\*LLM Running \(Turn \d+\) \.\.\.\*\*)', text) if p.strip()] or [text] def render_message(role, content, ts='', unsafe_allow_html=True): with st.chat_message(role): if ts: st.markdown(f'
{ts}
', unsafe_allow_html=True) st.markdown(content, unsafe_allow_html=unsafe_allow_html) def finish_streaming_message(): reply_ts = st.session_state.reply_ts st.session_state.messages.extend({"role": "assistant", "content": seg, "time": reply_ts} for seg in _get_response_segments(st.session_state.partial_response)) st.session_state.last_reply_time = int(time.time()) st.session_state.partial_response = st.session_state.reply_ts = st.session_state.current_prompt = '' def render_streaming_area(): if not st.session_state.streaming: return with st.container(): st.markdown('', unsafe_allow_html=True) if st.button("⏹️ 停止生成", type="primary"): agent.abort(); st.session_state.stopping = True; st.toast("已发送停止信号"); st.rerun() reply_ts = st.session_state.reply_ts with st.empty().container(): segments = _get_response_segments(st.session_state.partial_response) for i, seg in enumerate(segments): render_message("assistant", seg + ("" if i < len(segments) - 1 else "▌"), ts=reply_ts, unsafe_allow_html=False) if poll_agent_output(): finish_streaming_message() else: time.sleep(0.2) st.rerun() for msg in st.session_state.messages: render_message(msg["role"], msg["content"], ts=msg.get("time", ""), unsafe_allow_html=True) if st.session_state.streaming: render_streaming_area() if prompt := st.chat_input("请输入指令", disabled=st.session_state.streaming): st.session_state.messages.append({"role": "user", "content": prompt, "time": datetime.now().strftime("%Y-%m-%d %H:%M:%S")}) start_agent_task(prompt) st.rerun() ================================================ FILE: frontends/tgapp.py ================================================ import os, sys, re, threading, asyncio, queue as Q, time, random, uuid sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) _TEMP_DIR = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), 'temp') from agentmain import GeneraticAgent try: from telegram import BotCommand, InlineKeyboardButton, InlineKeyboardMarkup from 
telegram.constants import ChatType, MessageLimit, ParseMode from telegram.error import RetryAfter from telegram.ext import ApplicationBuilder, CallbackQueryHandler, MessageHandler, filters, ContextTypes from telegram.helpers import escape_markdown from telegram.request import HTTPXRequest except ImportError: print("Please ask the agent to install python-telegram-bot to use the Telegram module.") sys.exit(1) from chatapp_common import ( FILE_HINT, HELP_TEXT, TELEGRAM_MENU_COMMANDS, clean_reply, ensure_single_instance, extract_files, format_restore, redirect_log, require_runtime, split_text, ) from continue_cmd import handle_frontend_command, reset_conversation from llmcore import mykeys agent = GeneraticAgent() agent.verbose = False agent.inc_out = True ALLOWED = set(mykeys.get('tg_allowed_users', [])) _DRAFT_HINT = "thinking..." _STREAM_SUFFIX = " ⏳" _STREAM_SEGMENT_LIMIT = max(1200, MessageLimit.MAX_TEXT_LENGTH - 256) _STREAM_UPDATE_INTERVAL_SECONDS = 2.0 _STREAM_MIN_UPDATE_CHARS = 400 _RETRY_AFTER_MARGIN_SECONDS = 1.0 _QUEUE_WAIT_SECONDS = 1 _ASK_USER_HOOK_KEY = "telegram_ask_user_menu" _ASK_CALLBACK_PREFIX = "ask:" _ASK_CANCEL_ACTION = "none" _ASK_CANCEL_LABEL = "none of the above" _ASK_CANCEL_PROMPT = "已取消选择,请直接发送下一步操作。" _ask_menu_events = Q.Queue() _ask_menu_store = {} _QUOTE_OPEN_TAG = "<_quote_>" _QUOTE_CLOSE_TAG = "" _QUOTE_TOKEN_PATTERN = re.escape(_QUOTE_OPEN_TAG) + r"([\s\S]*?)" + re.escape(_QUOTE_CLOSE_TAG) _MD_TOKEN_RE = re.compile( ( r"(`{3,})([A-Za-z0-9_+-]*)\n([\s\S]*?)\1" r"|" + _QUOTE_TOKEN_PATTERN + r"|\[([^\]]+)\]\(([^)\n]+)\)" r"|`([^`\n]+)`" r"|\*\*([^\n]+?)\*\*" r"|__([^\n]+?)__" r"|~~([^\n]+?)~~" r"|(?\s*(.*?)\s*
", re.DOTALL) _TURN_SUMMARY_SEARCH_STRIP_RE = re.compile(r"`{3,}[\s\S]*?`{3,}|[\s\S]*?", re.DOTALL) def _make_draft_id(): return random.randint(1, 2**31 - 1) def _visible_segments(text): text = (text or "").strip() if not text: return [] segments = [] for part in split_text(text, _STREAM_SEGMENT_LIMIT): segments.extend(_markdown_safe_segments(part)) return segments def _markdown_safe_segments(text, limit=None): limit = limit or MessageLimit.MAX_TEXT_LENGTH text = (text or "").strip() if not text: return [] if len(_to_markdown_v2(text)) <= limit: return [text] parts = [] remaining = text while remaining: if len(_to_markdown_v2(remaining)) <= limit: parts.append(remaining) break low, high, best = 1, len(remaining), 1 while low <= high: mid = (low + high) // 2 if len(_to_markdown_v2(remaining[:mid].rstrip() or remaining[:mid])) <= limit: best = mid low = mid + 1 else: high = mid - 1 cut = remaining.rfind("\n", 0, best) if cut < max(1, best * 0.6): cut = best chunk = remaining[:cut].rstrip() or remaining[:best] parts.append(chunk) remaining = remaining[len(chunk):].lstrip() return parts def _line_complete(line): return (line or "").endswith(("\n", "\r")) def _turn_marker_number(line): match = _TURN_MARKER_RE.fullmatch((line or "").strip()) return int(match.group(1)) if match else None def _maybe_partial_turn_marker(line): text = (line or "").strip().lstrip("*") if not text: return False marker_head = "LLM Running (Turn " return marker_head.startswith(text) or text.startswith(marker_head) def _maybe_partial_code_fence(line): return bool(re.match(r"^\s*`{1,}[^`\r\n]*$", line or "")) def _extract_turn_summary(raw_text): search_text = _TURN_SUMMARY_SEARCH_STRIP_RE.sub("", raw_text or "") match = _TURN_SUMMARY_RE.search(search_text) if not match: return "" summary = re.sub(r"\s+", " ", match.group(1)).strip() if len(summary) > _TURN_SUMMARY_LIMIT: summary = summary[:_TURN_SUMMARY_LIMIT - 3].rstrip() + "..." 
return summary def _quote_tag(text): safe_text = (text or "").strip().replace(_QUOTE_OPEN_TAG, "").replace(_QUOTE_CLOSE_TAG, "") return f"{_QUOTE_OPEN_TAG}{safe_text}{_QUOTE_CLOSE_TAG}" def _inject_turn_summary(body, summary): if not (body or "").strip() or not (summary or "").strip(): return body lines = (body or "").splitlines() if not lines or _turn_marker_number(lines[0]) is None: return body title = lines[0].strip() rest = "\n".join(lines[1:]).strip() summary_line = _quote_tag(summary) if rest: return f"{title}\n\n{summary_line}\n\n{rest}" return f"{title}\n\n{summary_line}" def _resolve_files(paths): files, seen = [], set() for fpath in paths: if not os.path.isabs(fpath): fpath = os.path.join(_TEMP_DIR, fpath) if fpath in seen or not os.path.exists(fpath): continue files.append(fpath) seen.add(fpath) return files def _render_file_markers(text): def repl(match): return os.path.basename(match.group(1)) return re.sub(r"\[FILE:([^\]]+)\]", repl, text or "").strip() def _files_from_text(text): cleaned = clean_reply(text) if (text or "").strip() else "" return _resolve_files(extract_files(cleaned)) async def _send_files(root_msg, files): for fpath in files: if fpath.lower().endswith((".png", ".jpg", ".jpeg", ".gif", ".webp")): try: with open(fpath, "rb") as fp: await root_msg.reply_photo(fp) except Exception: pass else: try: with open(fpath, "rb") as fp: await root_msg.reply_document(fp) except Exception: pass async def _send_files_from_text(root_msg, text): await _send_files(root_msg, _files_from_text(text)) def _escape_pre(text): return escape_markdown(text or "", version=2, entity_type="pre") def _escape_code(text): return escape_markdown(text or "", version=2, entity_type="code") def _escape_link_target(text): return escape_markdown(text or "", version=2, entity_type="text_link") def _quote_to_markdown_v2(text): lines = (text or "").strip().splitlines() or [""] return "\n".join(f"> {escape_markdown(line, version=2)}" for line in lines) def 
_to_markdown_v2(text): if not text: return "" parts, pos = [], 0 for match in _MD_TOKEN_RE.finditer(text): parts.append(escape_markdown(text[pos:match.start()], version=2)) if match.group(1): lang = re.sub(r"[^A-Za-z0-9_+-]", "", match.group(2) or "") code = _escape_pre(match.group(3) or "") header = f"```{lang}\n" if lang else "```\n" parts.append(f"{header}{code}\n```") elif match.group(4) is not None: parts.append(_quote_to_markdown_v2(match.group(4))) elif match.group(5) is not None: label = escape_markdown(match.group(5), version=2) target = _escape_link_target(match.group(6)) parts.append(f"[{label}]({target})") elif match.group(7) is not None: parts.append(f"`{_escape_code(match.group(7))}`") elif match.group(8) is not None: parts.append(f"*{escape_markdown(match.group(8), version=2)}*") elif match.group(9) is not None: parts.append(f"*{escape_markdown(match.group(9), version=2)}*") elif match.group(10) is not None: parts.append(f"~{escape_markdown(match.group(10), version=2)}~") elif match.group(11) is not None: parts.append(f"_{escape_markdown(match.group(11), version=2)}_") pos = match.end() parts.append(escape_markdown(text[pos:], version=2)) return "".join(parts) def _is_not_modified_error(exc): return "not modified" in str(exc).lower() def _extract_ask_user_event(ctx): exit_reason = (ctx or {}).get("exit_reason") or {} if exit_reason.get("result") != "EXITED": return None payload = exit_reason.get("data") if not isinstance(payload, dict): return None if payload.get("status") != "INTERRUPT" or payload.get("intent") != "HUMAN_INTERVENTION": return None data = payload.get("data") if not isinstance(data, dict): return None raw_candidates = data.get("candidates") or [] if not isinstance(raw_candidates, (list, tuple)): return None candidates = [] for candidate in raw_candidates: if candidate is None: continue text = str(candidate).strip() if text: candidates.append(text) if not candidates: return None question = str(data.get("question") or 
"请选择下一步操作:").strip() or "请选择下一步操作:" return {"question": question, "candidates": candidates} def _register_ask_user_hook(): if not hasattr(agent, "_turn_end_hooks"): agent._turn_end_hooks = {} def _hook(ctx): event = _extract_ask_user_event(ctx) if event: _ask_menu_events.put(event) agent._turn_end_hooks[_ASK_USER_HOOK_KEY] = _hook def _drain_latest_ask_user_event(): latest = None while True: try: latest = _ask_menu_events.get_nowait() except Q.Empty: break return latest def _build_ask_user_markup(menu_id, candidates): rows = [ [InlineKeyboardButton(candidate, callback_data=f"{_ASK_CALLBACK_PREFIX}{menu_id}:{idx}")] for idx, candidate in enumerate(candidates) ] rows.append([ InlineKeyboardButton(_ASK_CANCEL_LABEL, callback_data=f"{_ASK_CALLBACK_PREFIX}{menu_id}:{_ASK_CANCEL_ACTION}") ]) return InlineKeyboardMarkup(rows) def _parse_ask_callback_data(data): if not (data or "").startswith(_ASK_CALLBACK_PREFIX): return None, None payload = data[len(_ASK_CALLBACK_PREFIX):] menu_id, sep, action = payload.partition(":") if not sep or not menu_id or not action: return None, None return menu_id, action def _build_text_prompt(text): return f"{FILE_HINT}\n\n{text}" def _normalize_ask_menu_event(stored): if isinstance(stored, dict): candidates = stored.get("candidates") or [] return { "question": str(stored.get("question") or "请选择下一步操作:").strip() or "请选择下一步操作:", "candidates": [str(candidate).strip() for candidate in candidates if str(candidate).strip()], } if isinstance(stored, (list, tuple)): return { "question": "请选择下一步操作:", "candidates": [str(candidate).strip() for candidate in stored if str(candidate).strip()], } return None def _render_ask_user_result(event, selected=None, cancelled=False): question = str(event.get("question") or "请选择下一步操作:").strip() or "请选择下一步操作:" candidates = event.get("candidates") or [] lines = [question, "", "选项:"] for idx, candidate in enumerate(candidates, start=1): lines.append(f"{idx}. {candidate}") lines.append(f"{len(candidates) + 1}. 
{_ASK_CANCEL_LABEL}") lines.append("") if cancelled: lines.append(f"已取消:{_ASK_CANCEL_LABEL}") elif selected: lines.append(f"已选择:{selected}") text = "\n".join(lines) if len(text) > MessageLimit.MAX_TEXT_LENGTH: text = text[:MessageLimit.MAX_TEXT_LENGTH - 18].rstrip() + "\n...[truncated]" return text async def _clear_ask_reply_markup(query): try: await query.edit_message_reply_markup(reply_markup=None) except Exception as exc: print(f"[TG ask_user menu cleanup] {type(exc).__name__}: {exc}", flush=True) async def _edit_ask_user_result(query, event, selected=None, cancelled=False): try: await query.edit_message_text( _render_ask_user_result(event, selected=selected, cancelled=cancelled), reply_markup=None, ) except Exception as exc: print(f"[TG ask_user menu edit] {type(exc).__name__}: {exc}", flush=True) await _clear_ask_reply_markup(query) async def _send_ask_user_menu(root_msg, event): menu_id = uuid.uuid4().hex[:16] candidates = event["candidates"] _ask_menu_store[menu_id] = {"question": event["question"], "candidates": list(candidates)} try: await root_msg.reply_text( event["question"], reply_markup=_build_ask_user_markup(menu_id, candidates), ) except Exception as exc: _ask_menu_store.pop(menu_id, None) print(f"[TG ask_user menu error] {type(exc).__name__}: {exc}", flush=True) fallback = event["question"] + "\n" + "\n".join(f"- {candidate}" for candidate in candidates) await root_msg.reply_text(fallback) class _TelegramStreamSession: def __init__(self, root_msg): self.root_msg = root_msg self.private_chat = getattr(getattr(root_msg, "chat", None), "type", "") == ChatType.PRIVATE self.can_use_draft = self.private_chat # update tg client! 
self.draft_id = _make_draft_id() self.live_msg = None self.raw_text = "" self.files = [] self.sent_segments = 0 self.active_display = "" self.pending_display = "" self.retry_until = 0.0 self.last_update_at = 0.0 self.last_update_raw_len = 0 def _now(self): return time.monotonic() def _retry_after_seconds(self, exc): retry_after = getattr(exc, "_retry_after", None) if retry_after is None: retry_after = getattr(exc, "retry_after", 0) or 0 if hasattr(retry_after, "total_seconds"): retry_after = retry_after.total_seconds() try: return max(0.0, float(retry_after)) except (TypeError, ValueError): return 0.0 def _set_retry_after(self, exc): wait_seconds = self._retry_after_seconds(exc) + _RETRY_AFTER_MARGIN_SECONDS self.retry_until = max(self.retry_until, self._now() + wait_seconds) def _is_retrying(self): return self._now() < self.retry_until async def _wait_for_retry(self): remaining = self.retry_until - self._now() if remaining > 0: await asyncio.sleep(remaining) def _should_stream_update(self, display): if display == self.active_display: return False if self.last_update_at <= 0: return True elapsed = self._now() - self.last_update_at raw_delta = len(self.raw_text) - self.last_update_raw_len return elapsed >= _STREAM_UPDATE_INTERVAL_SECONDS or raw_delta >= _STREAM_MIN_UPDATE_CHARS def _mark_stream_update(self, display): self.active_display = display self.pending_display = "" self.last_update_at = self._now() self.last_update_raw_len = len(self.raw_text) def _stream_display(self, text): base = (text or _DRAFT_HINT).strip() or _DRAFT_HINT safe_parts = _markdown_safe_segments(base) base = safe_parts[-1] if safe_parts else _DRAFT_HINT if base == _DRAFT_HINT: return base display = base + _STREAM_SUFFIX if len(_to_markdown_v2(display)) <= MessageLimit.MAX_TEXT_LENGTH: return display return base async def prime(self): if self.can_use_draft: draft_result = await self._send_draft(_DRAFT_HINT) if draft_result is True: self.active_display = _DRAFT_HINT return if draft_result is 
None: self.active_display = _DRAFT_HINT return try: await self._upsert_live_message(_DRAFT_HINT, wait_retry=False) except RetryAfter: self.active_display = _DRAFT_HINT return self.active_display = _DRAFT_HINT async def add_chunk(self, chunk): if not chunk: return self.raw_text += chunk await self._refresh(done=False, send_files=False) async def finalize(self, full_text=None, send_files=True): if full_text is not None: self.raw_text = full_text await self._refresh(done=True, send_files=send_files) async def finish_with_notice(self, notice): if self.raw_text.strip(): await self.finalize(send_files=False) await self._reply_text(notice) return if self.live_msg is not None: await self._edit_text(self.live_msg, notice) self.live_msg = None self.active_display = "" return await self._reply_text(notice) self.active_display = "" async def _refresh(self, done, send_files): summary = _extract_turn_summary(self.raw_text) cleaned = clean_reply(self.raw_text) if self.raw_text.strip() else "" self.files = _files_from_text(cleaned) body = _inject_turn_summary(_render_file_markers(cleaned), summary) if done and not body and self.files: body = "已生成附件" elif done and not body: body = "..." 
segments = _visible_segments(body) finalized_target = len(segments) if done else max(len(segments) - 1, 0) while self.sent_segments < finalized_target: await self._finalize_segment(segments[self.sent_segments]) self.sent_segments += 1 if done: if send_files: await self._send_files() return active_text = segments[-1] if segments else _DRAFT_HINT await self._stream_active(active_text) async def _stream_active(self, text): display = self._stream_display(text) if display == self.active_display: return self.pending_display = display if self._is_retrying() or not self._should_stream_update(display): return try: if self.can_use_draft: draft_result = await self._send_draft(display) if draft_result is True: self._mark_stream_update(display) return if draft_result is None: return await self._upsert_live_message(display, wait_retry=False) self._mark_stream_update(display) except RetryAfter: return async def _finalize_segment(self, text): final_text = (text or "").strip() or "..." if self.live_msg is not None: await self._edit_text(self.live_msg, final_text) self.live_msg = None else: await self._reply_text(final_text) self.active_display = "" if self.can_use_draft: self.draft_id = _make_draft_id() async def _send_files(self): await _send_files(self.root_msg, self.files) async def _send_draft(self, text): try: await self.root_msg.reply_text_draft( self.draft_id, _to_markdown_v2(text), parse_mode=ParseMode.MARKDOWN_V2, ) return True except RetryAfter as exc: self._set_retry_after(exc) return None except Exception as exc: if _is_not_modified_error(exc): return True print(f"[TG draft fallback] {type(exc).__name__}: {exc}", flush=True) self.can_use_draft = False self.draft_id = _make_draft_id() return False async def _retry_call(self, func, *args): while True: await self._wait_for_retry() try: return await func(*args) except RetryAfter as exc: self._set_retry_after(exc) async def _reply_text_once(self, text): markdown = _to_markdown_v2(text) try: return await 
self.root_msg.reply_text(markdown, parse_mode=ParseMode.MARKDOWN_V2) except RetryAfter as exc: self._set_retry_after(exc) raise except Exception as exc: if _is_not_modified_error(exc): return None try: return await self.root_msg.reply_text(text) except RetryAfter as retry_exc: self._set_retry_after(retry_exc) raise async def _reply_text(self, text, wait_retry=True): last_msg = None for segment in _markdown_safe_segments(text) or ["..."]: if wait_retry: last_msg = await self._retry_call(self._reply_text_once, segment) else: last_msg = await self._reply_text_once(segment) return last_msg async def _edit_text_once(self, msg, text): markdown = _to_markdown_v2(text) try: updated = await msg.edit_text(markdown, parse_mode=ParseMode.MARKDOWN_V2) except RetryAfter as exc: self._set_retry_after(exc) raise except Exception as exc: if _is_not_modified_error(exc): return msg try: updated = await msg.edit_text(text) except RetryAfter as retry_exc: self._set_retry_after(retry_exc) raise return updated if hasattr(updated, "edit_text") else msg async def _edit_text(self, msg, text, wait_retry=True): segments = _markdown_safe_segments(text) or ["..."] if wait_retry: updated = await self._retry_call(self._edit_text_once, msg, segments[0]) else: updated = await self._edit_text_once(msg, segments[0]) for segment in segments[1:]: updated = await self._reply_text(segment, wait_retry=wait_retry) return updated if hasattr(updated, "edit_text") else msg async def _upsert_live_message(self, text, wait_retry=True): if self.live_msg is None: self.live_msg = await self._reply_text(text, wait_retry=wait_retry) else: self.live_msg = await self._edit_text(self.live_msg, text, wait_retry=wait_retry) class _TelegramTurnStreamCoordinator: def __init__(self, root_msg): self.root_msg = root_msg self.session = None self.pending_line = "" self.code_fence_len = 0 self.last_turn = 0 async def prime(self): await self._ensure_session() async def add_chunk(self, chunk): if not chunk: return text = 
self.pending_line + chunk self.pending_line = "" for line in text.splitlines(keepends=True): if _line_complete(line): await self._process_line(line) elif _maybe_partial_turn_marker(line) or _maybe_partial_code_fence(line): self.pending_line = line else: await self._process_line(line) async def finalize(self, done_text="", send_files=True): await self._flush_pending_line() if self.session is None: if done_text: await self._add_to_current(done_text) elif not self.session.raw_text.strip() and done_text: await self.session.finalize(done_text, send_files=False) if send_files: await _send_files_from_text(self.root_msg, done_text) return if self.session is not None: await self.session.finalize(send_files=False) if send_files: await _send_files_from_text(self.root_msg, done_text) async def finish_with_notice(self, notice): await self._flush_pending_line() await self._ensure_session() await self.session.finish_with_notice(notice) async def _ensure_session(self): if self.session is None: self.session = _TelegramStreamSession(self.root_msg) await self.session.prime() async def _start_turn(self, marker): if self.session is not None and self.session.raw_text.strip(): await self.session.finalize(send_files=False) self.session = None await self._ensure_session() await self.session.add_chunk(marker) async def _add_to_current(self, text): if not text: return await self._ensure_session() await self.session.add_chunk(text) async def _process_line(self, line): turn_no = _turn_marker_number(line) if self.code_fence_len == 0 and turn_no == self.last_turn + 1: self.last_turn = turn_no await self._start_turn(line) return await self._add_to_current(line) self._update_code_fence(line) async def _flush_pending_line(self): if not self.pending_line: return line = self.pending_line self.pending_line = "" await self._add_to_current(line) def _update_code_fence(self, line): match = _CODE_FENCE_RE.match(line or "") if not match: return fence_len = len(match.group(1)) if self.code_fence_len: if 
fence_len >= self.code_fence_len: self.code_fence_len = 0 return self.code_fence_len = fence_len async def _stream(dq, msg): stream = _TelegramTurnStreamCoordinator(msg) await stream.prime() try: while True: try: first = await asyncio.to_thread(dq.get, True, _QUEUE_WAIT_SECONDS) except Q.Empty: continue items = [first] try: while True: items.append(dq.get_nowait()) except Q.Empty: pass done_item = None for item in items: chunk = item.get("next", "") if chunk: await stream.add_chunk(chunk) if "done" in item: done_item = item break if done_item is not None: await stream.finalize(done_item.get("done", "")) event = _drain_latest_ask_user_event() if event: await _send_ask_user_menu(msg, event) break except asyncio.CancelledError: await stream.finish_with_notice("⏹️ 已停止") except RetryAfter as exc: print(f"[TG stream retry_after] {type(exc).__name__}: {exc}", flush=True) if stream.session is not None: stream.session._set_retry_after(exc) except Exception as exc: print(f"[TG stream error] {type(exc).__name__}: {exc}", flush=True) if stream.session is not None and stream.session._is_retrying(): return try: await stream.finish_with_notice(f"❌ 输出失败: {exc}") except RetryAfter as retry_exc: print(f"[TG stream error notice retry_after] {type(retry_exc).__name__}: {retry_exc}", flush=True) def _normalized_command(text): parts = (text or "").strip().split(None, 1) if not parts: return '' head = parts[0].lower() if head.startswith('/'): head = '/' + head[1:].split('@', 1)[0] return head + (f" {parts[1].strip()}" if len(parts) > 1 and parts[1].strip() else '') def _cancel_stream_task(ctx): task = ctx.user_data.pop('stream_task', None) if task and not task.done(): task.cancel() async def _sync_commands(application): await application.bot.set_my_commands([BotCommand(command, description) for command, description in TELEGRAM_MENU_COMMANDS]) async def handle_msg(update, ctx): uid = update.effective_user.id if ALLOWED and uid not in ALLOWED: return await update.message.reply_text("no") 
prompt = _build_text_prompt(update.message.text) dq = agent.put_task(prompt, source="telegram") task = asyncio.create_task(_stream(dq, update.message)) ctx.user_data['stream_task'] = task async def handle_ask_callback(update, ctx): query = update.callback_query if query is None: return uid = update.effective_user.id if update.effective_user else None if ALLOWED and uid not in ALLOWED: return await query.answer("no", show_alert=True) menu_id, action = _parse_ask_callback_data(query.data) if not menu_id: return await query.answer("菜单无效") event = _normalize_ask_menu_event(_ask_menu_store.get(menu_id)) if event is None: await query.answer("菜单已过期") return await _clear_ask_reply_markup(query) candidates = event["candidates"] if action == _ASK_CANCEL_ACTION: _ask_menu_store.pop(menu_id, None) await query.answer() await _edit_ask_user_result(query, event, cancelled=True) if query.message is not None: await query.message.reply_text(_ASK_CANCEL_PROMPT) return try: selected = candidates[int(action)] except (ValueError, IndexError): return await query.answer("菜单无效") _ask_menu_store.pop(menu_id, None) await query.answer() await _edit_ask_user_result(query, event, selected=selected) if query.message is None: return dq = agent.put_task(_build_text_prompt(selected), source="telegram") task = asyncio.create_task(_stream(dq, query.message)) ctx.user_data['stream_task'] = task async def cmd_abort(update, ctx): _cancel_stream_task(ctx) agent.abort() await update.message.reply_text("⏹️ 正在停止...") async def cmd_llm(update, ctx): args = (update.message.text or '').split() if len(args) > 1: try: n = int(args[1]) agent.next_llm(n) await update.message.reply_text(f"✅ 已切换到 [{agent.llm_no}] {agent.get_llm_name()}") except (ValueError, IndexError): await update.message.reply_text(f"用法: /llm <0-{len(agent.list_llms())-1}>") else: lines = [f"{'→' if cur else ' '} [{i}] {name}" for i, name, cur in agent.list_llms()] await update.message.reply_text("LLMs:\n" + "\n".join(lines)) async def 
handle_photo(update, ctx): uid = update.effective_user.id if ALLOWED and uid not in ALLOWED: return await update.message.reply_text("no") if update.message.photo: photo = update.message.photo[-1] file = await photo.get_file() fpath = f"tg_{photo.file_unique_id}.jpg" kind = "图片" elif update.message.document: doc = update.message.document file = await doc.get_file() ext = os.path.splitext(doc.file_name or '')[1] or '' fpath = f"tg_{doc.file_unique_id}{ext}" kind = "文件" else: return await file.download_to_drive(os.path.join(_TEMP_DIR, fpath)) caption = update.message.caption prompt = f"[TIPS] 收到{kind}temp/{fpath}\n{caption}" if caption else f"[TIPS] 收到{kind}temp/{fpath},请等待下一步指令" dq = agent.put_task(prompt, source="telegram") task = asyncio.create_task(_stream(dq, update.message)) ctx.user_data['stream_task'] = task async def handle_command(update, ctx): uid = update.effective_user.id if ALLOWED and uid not in ALLOWED: return await update.message.reply_text("no") cmd = _normalized_command(update.message.text) op = cmd.split()[0] if cmd else '' if op == '/help': return await update.message.reply_text(HELP_TEXT) if op == '/status': llm = agent.get_llm_name() if agent.llmclient else '未配置' return await update.message.reply_text(f"状态: {'🔴 运行中' if agent.is_running else '🟢 空闲'}\nLLM: [{agent.llm_no}] {llm}") if op == '/stop': return await cmd_abort(update, ctx) if op == '/llm': return await cmd_llm(update, ctx) if op == '/new': _cancel_stream_task(ctx) return await update.message.reply_text(reset_conversation(agent)) if op == '/restore': _cancel_stream_task(ctx) try: restored_info, err = format_restore() if err: return await update.message.reply_text(err) restored, fname, count = restored_info agent.abort() agent.history.extend(restored) return await update.message.reply_text(f"✅ 已恢复 {count} 轮对话\n来源: {fname}\n(仅恢复上下文,请输入新问题继续)") except Exception as e: return await update.message.reply_text(f"❌ 恢复失败: {e}") if op == '/continue': if cmd != '/continue': _cancel_stream_task(ctx) 
return await update.message.reply_text(handle_frontend_command(agent, cmd)) return await update.message.reply_text(HELP_TEXT) if __name__ == '__main__': _LOCK_SOCK = ensure_single_instance(19527, "Telegram") if not ALLOWED: print('[Telegram] ERROR: tg_allowed_users in mykey.py is empty or missing. Set it to avoid unauthorized access.') sys.exit(1) require_runtime(agent, "Telegram", tg_bot_token=mykeys.get("tg_bot_token")) redirect_log(__file__, "tgapp.log", "Telegram", ALLOWED) _register_ask_user_hook() threading.Thread(target=agent.run, daemon=True).start() proxy = mykeys.get('proxy') if proxy: print('proxy:', proxy) else: print('proxy: ') async def _error_handler(update, context: ContextTypes.DEFAULT_TYPE): print(f"[{time.strftime('%m-%d %H:%M')}] TG error: {context.error}", flush=True) while True: try: print(f"TG bot starting... {time.strftime('%m-%d %H:%M')}") # Recreate request and app objects on each restart to avoid stale connections request_kwargs = dict(read_timeout=30, write_timeout=30, connect_timeout=30, pool_timeout=30) if proxy: request_kwargs['proxy'] = proxy request = HTTPXRequest(**request_kwargs) app = (ApplicationBuilder().token(mykeys['tg_bot_token']) .request(request).get_updates_request(request).post_init(_sync_commands).build()) app.add_handler(CallbackQueryHandler(handle_ask_callback, pattern=r"^ask:")) app.add_handler(MessageHandler(filters.COMMAND, handle_command)) app.add_handler(MessageHandler(filters.PHOTO, handle_photo)) app.add_handler(MessageHandler(filters.Document.ALL, handle_photo)) app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle_msg)) app.add_error_handler(_error_handler) app.run_polling(drop_pending_updates=True, poll_interval=1.0, timeout=30) except Exception as e: print(f"[{time.strftime('%m-%d %H:%M')}] polling crashed: {e}", flush=True) time.sleep(10) asyncio.set_event_loop(asyncio.new_event_loop()) ================================================ FILE: frontends/tuiapp.py 
================================================ """Textual terminal UI for GenericAgent. Run from the project root: python frontends/tuiapp.py Useful options: python frontends/tuiapp.py --help MVP design notes: - One TUI manages multiple GenericAgent instances. - GenericAgent.put_task() returns a per-task display_queue; the TUI records a task_id for every submit. - Agent.run() and display_queue.get() run in daemon threads; UI updates are posted via App.call_from_thread(). - Multiple sessions may run concurrently, but GenericAgent still shares project temp/memory/tool globals. """ from __future__ import annotations import argparse import os import queue import re import sys import threading import time from dataclasses import dataclass, field from itertools import count from typing import Any, Callable, Optional try: from rich.markdown import Markdown from rich.panel import Panel from rich.text import Text from textual.app import App, ComposeResult from textual.binding import Binding from textual.containers import Horizontal, Vertical from textual.widgets import Footer, Header, Input, RichLog, Static except ModuleNotFoundError as exc: # pragma: no cover - exercised by manual missing-dep path if exc.name == "textual": print("Textual is required. 
Install with: pip install textual", file=sys.stderr) else: print(f"Missing dependency: {exc.name}", file=sys.stderr) raise SystemExit(2) from exc ROOT_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__), "..")) if ROOT_DIR not in sys.path: sys.path.insert(0, ROOT_DIR) AgentFactory = Callable[[], Any] @dataclass class ChatMessage: role: str content: str task_id: Optional[int] = None done: bool = True _rendered_panel: Any = field(default=None, repr=False) @dataclass class AgentSession: agent_id: int name: str agent: Any thread: Optional[threading.Thread] = None status: str = "idle" messages: list[ChatMessage] = field(default_factory=list) task_seq: int = 0 current_task_id: Optional[int] = None current_display_queue: Optional[queue.Queue] = None buffer: str = "" def fold_turns(text: str) -> list[dict[str, str]]: """Split GenericAgent turn output into text/fold segments. Completed turns become ``{'type': 'fold', 'title': ..., 'content': ...}``. The latest/incomplete turn remains ``type='text'`` for streaming refresh. 
""" placeholders: list[str] = [] def stash(match: re.Match[str]) -> str: placeholders.append(match.group(0)) return f"\x00PH{len(placeholders) - 1}\x00" safe = re.sub(r"`{4,}.*?`{4,}", stash, text, flags=re.DOTALL) safe = re.sub(r"`{4,}[^`].*$", stash, safe, flags=re.DOTALL) parts = re.split(r"(\**LLM Running \(Turn \d+\) \.\.\.\**)", safe) def restore(part: str) -> str: return re.sub(r"\x00PH(\d+)\x00", lambda m: placeholders[int(m.group(1))], part) parts = [restore(p) for p in parts] if len(parts) < 4: return [{"type": "text", "content": text}] segments: list[dict[str, str]] = [] if parts[0].strip(): segments.append({"type": "text", "content": parts[0]}) turns: list[tuple[str, str]] = [] for i in range(1, len(parts), 2): marker = parts[i] content = parts[i + 1] if i + 1 < len(parts) else "" turns.append((marker, content)) for idx, (marker, content) in enumerate(turns): if idx < len(turns) - 1: cleaned = re.sub(r"`{3,}.*?`{3,}|.*?", "", content, flags=re.DOTALL) matches = re.findall(r"\s*((?:(?!).)*?)\s*", cleaned, re.DOTALL) if matches: title = matches[0].strip().split("\n", 1)[0] else: title = cleaned.strip().split("\n", 1)[0] or marker.strip("*") # Strip trailing args portion from tool-call lines title = re.sub(r",?\s*args:.*$", "", title) if len(title) > 72: title = title[:72] + "..." segments.append({"type": "fold", "title": title, "content": content}) else: segments.append({"type": "text", "content": marker + content}) return segments def render_folded_text(text: str) -> str: """Render fold segments as terminal-friendly Markdown text. Textual's interactive Collapsible widgets are best for static layouts; the MVP uses a RichLog and re-renders compact summaries for completed turns to keep streaming cheap. 
""" rendered: list[str] = [] for seg in fold_turns(text): if seg["type"] == "fold": rendered.append(f"\n▸ {seg.get('title') or 'completed turn'}\n\n") else: rendered.append(seg.get("content", "")) return "".join(rendered) def parse_local_command(raw: str) -> tuple[str, list[str]] | None: """Return (command, args) for TUI-owned slash commands; unknown slash is passthrough.""" text = (raw or "").strip() if not text.startswith("/"): return None name, *rest = text.split(maxsplit=1) cmd = name[1:].lower() args = rest[0].split() if rest else [] if cmd in {"help", "status", "new", "switch", "sessions", "stop", "llm", "branch", "rewind", "clear", "close", "quit", "exit"}: return cmd, args return None def default_agent_factory() -> Any: from agentmain import GenericAgent agent = GenericAgent() agent.inc_out = True return agent class GenericAgentTUI(App[None]): """Textual app that manages multiple GenericAgent sessions.""" CSS = """ Screen { layout: vertical; } #body { height: 1fr; } #sidebar { width: 30; min-width: 24; border: solid $accent; padding: 0 1; overflow-x: hidden; } #main { width: 1fr; } #status { height: 3; border: solid $primary; padding: 0 1; } #log { height: 1fr; border: solid $primary; padding: 0 1; } #prompt { dock: bottom; } .hint { color: $text-muted; } """ BINDINGS = [ ("ctrl+n", "new_session", "New session"), ("ctrl+s", "stop_current", "Stop"), ("ctrl+f", "toggle_fold", "Fold/Unfold"), ("ctrl+q", "quit", "Quit"), Binding("ctrl+left", "prev_session", "←Prev", show=True, priority=True), Binding("ctrl+right", "next_session", "Next→", show=True, priority=True), ] def __init__(self, agent_factory: Optional[AgentFactory] = None) -> None: super().__init__() self.agent_factory: AgentFactory = agent_factory or default_agent_factory self.sessions: dict[int, AgentSession] = {} self.current_id: Optional[int] = None self._ids = count(1) self.fold_mode: bool = True self._last_stream_refresh: float = 0.0 self._stream_throttle_ms: float = 0.15 # seconds between 
streaming UI refreshes def compose(self) -> ComposeResult: yield Header(show_clock=True) with Horizontal(id="body"): yield Static("", id="sidebar") with Vertical(id="main"): yield Static("", id="status") yield RichLog(id="log", wrap=True, highlight=True, markup=True) yield Input(placeholder="Message, or /help /new /branch /rewind /switch /clear /close /stop /llm /resume", id="prompt") yield Footer() def on_mount(self) -> None: self.add_session("main") self._system("Welcome to GenericAgent TUI. Type /help for commands.") self.query_one("#prompt", Input).focus() def on_resize(self, event) -> None: narrow = self.size.width < 70 self.query_one("#sidebar").styles.display = "none" if narrow else "block" @property def current(self) -> AgentSession: if self.current_id is None: raise RuntimeError("no active session") return self.sessions[self.current_id] def add_session(self, name: Optional[str] = None) -> AgentSession: agent_id = next(self._ids) agent = self.agent_factory() try: agent.inc_out = True except Exception: pass session = AgentSession(agent_id=agent_id, name=name or f"agent-{agent_id}", agent=agent) thread = threading.Thread(target=agent.run, name=f"ga-tui-agent-{agent_id}", daemon=True) thread.start() session.thread = thread self.sessions[agent_id] = session self.current_id = agent_id self._refresh_all() return session def action_prev_session(self) -> None: """Switch to previous session.""" ids = sorted(self.sessions.keys()) if len(ids) <= 1: return idx = ids.index(self.current_id) self.current_id = ids[(idx - 1) % len(ids)] self._refresh_all() def action_next_session(self) -> None: """Switch to next session.""" ids = sorted(self.sessions.keys()) if len(ids) <= 1: return idx = ids.index(self.current_id) self.current_id = ids[(idx + 1) % len(ids)] self._refresh_all() def action_switch_session(self, n: int) -> None: """Switch to session by id (used by /switch command).""" if n in self.sessions: self.current_id = n self._refresh_all() else: self.notify(f"Session 
#{n} does not exist.", severity="warning") def action_new_session(self) -> None: self.add_session() self._system(f"Created and switched to session #{self.current_id}.") def action_stop_current(self) -> None: self._cmd_stop([]) def on_input_submitted(self, event: Input.Submitted) -> None: value = event.value.rstrip() event.input.value = "" if not value: self._system("Empty input ignored. Type /help for commands.") return parsed = parse_local_command(value) if parsed: cmd, args = parsed self._dispatch_command(cmd, args) return self.submit_user_message(value) def _dispatch_command(self, cmd: str, args: list[str]) -> None: handlers = { "help": self._cmd_help, "status": self._cmd_status, "new": self._cmd_new, "switch": self._cmd_switch, "sessions": self._cmd_sessions, "stop": self._cmd_stop, "llm": self._cmd_llm, "branch": self._cmd_branch, "rewind": self._cmd_rewind, "clear": self._cmd_clear, "close": self._cmd_close, "quit": lambda _args: self.exit(), "exit": lambda _args: self.exit(), } handlers[cmd](args) def submit_user_message(self, text: str) -> int: session = self.current if session.status == "running": self._system(f"Session #{session.agent_id} is already running; wait or /stop before submitting another task.") return -1 session.task_seq += 1 task_id = session.task_seq session.current_task_id = task_id session.buffer = "" session.status = "running" session.messages.append(ChatMessage("user", text)) session.messages.append(ChatMessage("assistant", "", task_id=task_id, done=False)) self._refresh_all() try: display_queue = session.agent.put_task(text, source="user") except Exception as exc: session.status = "error" self._set_assistant_message(session.agent_id, task_id, f"[ERROR] put_task failed: {exc}", done=True) return task_id session.current_display_queue = display_queue threading.Thread( target=self._consume_display_queue, args=(session.agent_id, task_id, display_queue), name=f"ga-tui-consumer-{session.agent_id}-{task_id}", daemon=True, ).start() return 
task_id def _consume_display_queue(self, agent_id: int, task_id: int, display_queue: queue.Queue) -> None: buffer = "" while True: try: item = display_queue.get(timeout=0.25) except queue.Empty: continue if "next" in item: buffer += str(item.get("next") or "") self.call_from_thread(self._on_stream_update, agent_id, task_id, buffer, False) if "done" in item: done_text = str(item.get("done") or buffer) self.call_from_thread(self._on_stream_update, agent_id, task_id, done_text, True) return def _on_stream_update(self, agent_id: int, task_id: int, text: str, done: bool) -> None: session = self.sessions.get(agent_id) if not session: return if session.current_task_id != task_id: session.messages.append(ChatMessage("system", f"Stale update ignored for task {task_id}.", done=True)) return session.buffer = text if done: session.status = "idle" session.current_display_queue = None self._set_assistant_message(agent_id, task_id, text, done=done) def _set_assistant_message(self, agent_id: int, task_id: int, text: str, *, done: bool) -> None: session = self.sessions.get(agent_id) if not session: return for msg in reversed(session.messages): if msg.role == "assistant" and msg.task_id == task_id: msg.content = text msg.done = done break else: session.messages.append(ChatMessage("assistant", text, task_id=task_id, done=done)) if agent_id == self.current_id: self._refresh_all() else: self._refresh_sidebar() def _cmd_help(self, args: list[str]) -> None: self._system( "Commands:\n" "/help - show this help\n" "/new [name] - create and switch to a new agent session\n" "/branch [name] - fork current session (copies LLM history + display)\n" "/rewind - list rewindable turns; /rewind to truncate history\n" "/switch - switch active session\n" "/sessions - list sessions\n" "/status - show current/all status\n" "/stop - abort current session task\n" "/clear - clear chat display (keeps LLM history)\n" "/close - close current session (cannot close last)\n" "/llm - list models for current 
session\n" "/llm - switch model for current session\n" "/quit - exit TUI\n\n" "Unknown slash commands (for example /session.x=... or /resume) are sent to GenericAgent." ) def _cmd_new(self, args: list[str]) -> None: name = " ".join(args).strip() or None session = self.add_session(name) self._system(f"Created session #{session.agent_id} {session.name!r}. Shared temp/memory are not isolated.") def _cmd_branch(self, args: list[str]) -> None: import copy old_session = self.current name = " ".join(args).strip() or f"{old_session.name}-branch" new_session = self.add_session(name) # Copy LLM backend history try: new_session.agent.llmclient.backend.history = copy.deepcopy( old_session.agent.llmclient.backend.history ) except Exception as e: self._system(f"Branch warning: failed to copy history: {e}") return # Copy TUI display messages new_session.messages = copy.deepcopy(old_session.messages) new_session.task_seq = old_session.task_seq n = len(new_session.agent.llmclient.backend.history) self._system(f"Branched from #{old_session.agent_id} → #{new_session.agent_id} ({n} messages inherited).") def _cmd_rewind(self, args: list[str]) -> None: session = self.current if session.status == "running": self._system("Cannot rewind while running. 
/stop first.") return history = session.agent.llmclient.backend.history # Find real user turn boundaries — skip tool_result messages turns = [] # list of (index_in_history, preview_text) for i, msg in enumerate(history): if msg.get("role") != "user": continue content = msg.get("content") # Pure string content is always a real user message if isinstance(content, str): turns.append((i, content[:60])) continue if isinstance(content, list): # Skip if content is purely tool_result blocks has_tool_result = any(b.get("type") == "tool_result" for b in content if isinstance(b, dict)) if has_tool_result: continue texts = [b.get("text", "") for b in content if isinstance(b, dict) and b.get("type") == "text"] if texts and any(t.strip() for t in texts): turns.append((i, (texts[0] or "")[:60])) if not turns: self._system("No rewindable turns in history.") return # Reverse numbering: 1 = most recent turn, 2 = second most recent, etc. # /rewind without args: show list if not args: lines = [f"Rewindable turns ({len(turns)} total, showing last 10):"] show = turns[-10:] for offset, (_, preview) in enumerate(reversed(show), 1): lines.append(f" {offset}) {preview!r}") lines.append("/rewind to rewind n turns (1 = undo last turn).") self._system("\n".join(lines)) return # /rewind : truncate last n turns try: n = int(args[0]) except ValueError: self._system("Usage: /rewind (1 = undo last turn)") return if n < 1 or n > len(turns): self._system(f"Invalid: range is 1-{len(turns)}") return # cut_at = index of the n-th turn from the end cut_at = turns[-n][0] removed = len(history) - cut_at history[:] = history[:cut_at] # Sync TUI messages: keep only messages before the corresponding user message real_user_indices = [i for i, msg in enumerate(session.messages) if msg.role == "user"] if n <= len(real_user_indices): cut_msg = real_user_indices[-n] session.messages = session.messages[:cut_msg] # Mark rewind in agentmain's working memory history try: session.agent.history.append(f"[USER]: /rewind 
{n}") except Exception: pass self._system(f"Rewound {n} turn(s). Removed {removed} history entries.") def _cmd_clear(self, args: list[str]) -> None: self.current.messages.clear(); self._refresh_all() def _cmd_close(self, args: list[str]) -> None: if len(self.sessions) <= 1: self._system("Cannot close the last session."); return del self.sessions[self.current_id] self.current_id = next(iter(self.sessions)) self._refresh_all() def _cmd_switch(self, args: list[str]) -> None: if not args: self._system("Usage: /switch ") return key = " ".join(args) target: Optional[int] = None if key.isdigit() and int(key) in self.sessions: target = int(key) else: for sid, session in self.sessions.items(): if session.name == key: target = sid break if target is None: self._system(f"No session found for {key!r}.") return self.current_id = target self._refresh_all() self._system(f"Switched to session #{target}.") def _cmd_sessions(self, args: list[str]) -> None: lines = [] for sid, session in self.sessions.items(): mark = "*" if sid == self.current_id else " " lines.append(f"{mark} #{sid} {session.name} [{session.status}] messages={len(session.messages)} task={session.current_task_id}") self._system("Sessions:\n" + "\n".join(lines)) def _cmd_status(self, args: list[str]) -> None: self._cmd_sessions(args) def _cmd_stop(self, args: list[str]) -> None: session = self.current try: session.agent.abort() session.status = "stopping" if session.status == "running" else session.status self._system(f"Stop signal sent to session #{session.agent_id}.") except Exception as exc: self._system(f"Stop failed: {exc}") self._refresh_all() def _cmd_llm(self, args: list[str]) -> None: session = self.current if args: try: session.agent.next_llm(int(args[0])) self._system(f"Switched model to #{int(args[0])}.") except Exception as exc: self._system(f"Model switch failed: {exc}") return try: rows = session.agent.list_llms() self._system("Models:\n" + "\n".join(f"{'*' if cur else ' '} {i}: {name}" for i, name, cur 
in rows)) except Exception as exc: self._system(f"Listing models failed: {exc}") def _system(self, text: str) -> None: if self.current_id is not None and self.current_id in self.sessions: self.current.messages.append(ChatMessage("system", text)) self._refresh_all() def _refresh_all(self) -> None: if not self.is_mounted: return self._refresh_sidebar() self._refresh_status() self._refresh_log() def _session_last_user_query(self, session: AgentSession) -> str: """Return the last user message content, truncated for sidebar display.""" for msg in reversed(session.messages): if msg.role == "user": text = msg.content.strip().replace("\n", " ") return self._truncate_display(text, 20) return "" def _session_last_summary(self, session: AgentSession) -> str: """Extract the last from the most recent assistant message.""" for msg in reversed(session.messages): if msg.role == "assistant" and msg.content: matches = re.findall(r"\s*(.*?)\s*", msg.content, re.DOTALL) if matches: text = matches[-1].strip().split("\n", 1)[0].replace("\n", " ") return self._truncate_display(text, 20) return "" @staticmethod def _truncate_display(text: str, max_width: int) -> str: """Truncate text by display width (CJK chars count as 2).""" import unicodedata width = 0 result = [] for ch in text: w = 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1 if width + w > max_width: result.append("…") break result.append(ch) width += w return "".join(result) def _refresh_sidebar(self) -> None: sidebar = self.query_one("#sidebar", Static) max_w = 26 # 30 - 2(border) - 2(padding) lines: list[str] = ["[b]Sessions[/b]", ""] for sid, session in self.sessions.items(): mark = "▶" if sid == self.current_id else " " last_q = self._session_last_user_query(session) last_s = self._session_last_summary(session) status_style = "green" if session.status == "running" else "dim" # Header line: "▶ #1 name status" — truncate name if needed prefix = f"{mark} #{sid} " suffix = f" {session.status}" name_max = max_w - 
len(prefix) - len(suffix) name_disp = self._truncate_display(session.name, max(name_max, 4)) lines.append(f"{prefix}{name_disp} [{status_style}]{session.status}[/{status_style}]") if last_q: lines.append(f" [dim]Q:{last_q}[/dim]") if last_s: lines.append(f" [dim]S:{last_s}[/dim]") lines.append("") lines.append("[dim]/new, /switch, Ctrl+N[/dim]") lines.append("[dim]I have memory, just say what you want[/dim]") sidebar.update("\n".join(lines)) def _refresh_status(self) -> None: status = self.query_one("#status", Static) if self.current_id is None: status.update("No session") return session = self.current try: model = session.agent.get_llm_name(model=True) except Exception: model = "unknown" status.update( f"[b]#{session.agent_id} {session.name}[/b] status={session.status} task={session.current_task_id} model={model}\n" "Enter message or /help. Per-task queue streaming is enabled (inc_out=True)." ) def action_toggle_fold(self) -> None: self.fold_mode = not self.fold_mode # Invalidate cached panels for assistant messages since fold state changed if self.current_id is not None: for msg in self.current.messages: if msg.role == "assistant": msg._rendered_panel = None self._refresh_log() mode_label = "folded" if self.fold_mode else "expanded" self.notify(f"Display mode: {mode_label} (Ctrl+F to toggle)") def _refresh_log(self) -> None: log = self.query_one("#log", RichLog) log.clear() if self.current_id is None: return # Collect recent task_ids to only expand the latest 3 tasks recent_task_ids: set[int] = set() if not self.fold_mode: seen: list[int] = [] for msg in reversed(self.current.messages): if msg.role == "assistant" and msg.task_id not in seen: seen.append(msg.task_id) if len(seen) == 5: break recent_task_ids = set(seen) for msg in self.current.messages: if msg.role == "user": if msg._rendered_panel is None: msg._rendered_panel = Panel(Markdown(msg.content), title="You", border_style="blue") log.write(msg._rendered_panel) elif msg.role == "assistant": if msg.done 
and msg._rendered_panel is not None: log.write(msg._rendered_panel) else: suffix = "" if msg.done else "\n▌" # Fold older tasks even in unfold mode to reduce render cost should_fold = self.fold_mode or (msg.task_id not in recent_task_ids) content = render_folded_text(msg.content) if should_fold else msg.content panel = Panel(Markdown(content + suffix), title=f"Agent task {msg.task_id}", border_style="green") if msg.done: msg._rendered_panel = panel log.write(panel) else: if msg._rendered_panel is None: msg._rendered_panel = Panel(Text(msg.content), title="System", border_style="yellow") log.write(msg._rendered_panel) def build_arg_parser() -> argparse.ArgumentParser: parser = argparse.ArgumentParser(description="Textual TUI for GenericAgent") return parser def main(argv: Optional[list[str]] = None) -> int: args = build_arg_parser().parse_args(argv) app = GenericAgentTUI() app.run() return 0 if __name__ == "__main__": raise SystemExit(main()) ================================================ FILE: frontends/wechatapp.py ================================================ import os, sys, re, threading, queue, time, socket, json, struct, base64, uuid, webbrowser, hashlib, math from pathlib import Path from urllib.parse import quote import requests, qrcode from Crypto.Cipher import AES sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) _TEMP_DIR = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), 'temp') from agentmain import GeneraticAgent # ── WxBotClient (inline from wx_bot_client.py) ── for _k in ('HTTPS_PROXY', 'https_proxy'): os.environ.pop(_k, None) # avoid inherited proxy breaking WeChat long-poll SSL API = 'https://ilinkai.weixin.qq.com' TOKEN_FILE = Path.home() / '.wxbot' / 'token.json' TOKEN_FILE.parent.mkdir(exist_ok=True) VER, MSG_USER, MSG_BOT, ITEM_TEXT, STATE_FINISH = '2.1.10', 1, 2, 1, 2 ILINK_APP_ID = 'bot' ILINK_APP_CLIENT_VERSION = (2 << 16) | (1 << 8) | 10 UA = f'openclaw-weixin/{VER}' ITEM_IMAGE, 
ITEM_FILE, ITEM_VIDEO = 2, 4, 5 CDN_BASE = 'https://novac2c.cdn.weixin.qq.com/c2c' def _uin(): return base64.b64encode(str(struct.unpack('>I', os.urandom(4))[0]).encode()).decode() class WxBotClient: def __init__(self, token=None, token_file=None): self._tf = Path(token_file) if token_file else TOKEN_FILE self.token = token self.bot_id = None self._buf = '' if not self.token: self._load() def _load(self): if self._tf.exists(): d = json.loads(self._tf.read_text('utf-8')) self.token, self.bot_id, self._buf = d.get('bot_token',''), d.get('ilink_bot_id',''), d.get('updates_buf','') def _save(self, **kw): d = {'bot_token': self.token or '', 'ilink_bot_id': self.bot_id or '', 'updates_buf': self._buf or '', **kw} self._tf.write_text(json.dumps(d, ensure_ascii=False, indent=2), 'utf-8') def _post(self, ep, body, timeout=15): data = json.dumps(body, ensure_ascii=False, separators=(',', ':')).encode('utf-8') h = {'Content-Type': 'application/json', 'AuthorizationType': 'ilink_bot_token', 'Content-Length': str(len(data)), 'X-WECHAT-UIN': _uin(), 'iLink-App-Id': ILINK_APP_ID, 'iLink-App-ClientVersion': str(ILINK_APP_CLIENT_VERSION), 'User-Agent': UA} tok = (self.token or '').strip() if tok: h['Authorization'] = f'Bearer {tok}' r = requests.post(f'{API}/{ep}', data=data, headers=h, timeout=timeout) r.raise_for_status() return r.json() def login_qr(self, poll_interval=2): r = requests.get(f'{API}/ilink/bot/get_bot_qrcode', params={'bot_type': 3}, headers={'User-Agent': UA}, timeout=10) r.raise_for_status() d = r.json() qr_id, url = d['qrcode'], d.get('qrcode_img_content', '') print(f'[QR登录] ID: {qr_id}') if url: img = self._tf.parent / 'wx_qr.png' qrcode.make(url).save(str(img)); webbrowser.open(str(img)) qr = qrcode.QRCode(border=1); qr.add_data(url); qr.make(fit=True); qr.print_ascii(invert=True) last = '' while True: time.sleep(poll_interval) try: s = requests.get(f'{API}/ilink/bot/get_qrcode_status', params={'qrcode': qr_id}, headers={'User-Agent': UA}, timeout=60).json() 
except requests.exceptions.ReadTimeout: continue st = s.get('status', '') if st != last: print(f' 状态: {st}'); last = st if st == 'confirmed': self.token, self.bot_id = s.get('bot_token', ''), s.get('ilink_bot_id', '') self._save(login_time=time.strftime('%Y-%m-%d %H:%M:%S')) print(f'[QR登录] 成功! bot_id={self.bot_id}') return s if st == 'expired': raise RuntimeError('二维码过期') def get_updates(self, timeout=30): try: resp = self._post('ilink/bot/getupdates', {'get_updates_buf': self._buf or '', 'base_info': {'channel_version': VER}}, timeout=timeout + 5) except requests.exceptions.ReadTimeout: return [] if resp.get('errcode'): print(f'[getUpdates] err: {resp.get("errcode")} {resp.get("errmsg","")}') if resp['errcode'] == -14: self._buf = ''; self._save() return [] nb = resp.get('get_updates_buf', '') if nb: self._buf = nb; self._save() return resp.get('msgs') or [] def send_text(self, to_user_id, text, context_token=''): msg = {'from_user_id': '', 'to_user_id': to_user_id, 'client_id': f'pyclient-{uuid.uuid4().hex[:16]}', 'message_type': MSG_BOT, 'message_state': STATE_FINISH, 'item_list': [{'type': ITEM_TEXT, 'text_item': {'text': text}}]} if context_token: msg['context_token'] = context_token return self._post('ilink/bot/sendmessage', {'msg': msg, 'base_info': {'channel_version': VER}}) def send_typing(self, to_user_id, typing_ticket='', cancel=False): return self._post('ilink/bot/sendtyping', { 'ilink_user_id': to_user_id, 'typing_ticket': typing_ticket, 'status': 2 if cancel else 1, 'base_info': {'channel_version': VER}}) def get_typing_ticket(self, to_user_id, context_token=''): payload = {'ilink_user_id': to_user_id} if context_token: payload['context_token'] = context_token return self._post('ilink/bot/getconfig', payload).get('typing_ticket', '') def _enc(self, raw, aes_key): pad = 16 - (len(raw) % 16) return AES.new(aes_key, AES.MODE_ECB).encrypt(raw + bytes([pad] * pad)) def _upload(self, filekey, upload_param, raw, aes_key, timeout=120, upload_url=''): url = 
upload_url.strip() if upload_url else f'{CDN_BASE}/upload?encrypted_query_param={quote(upload_param)}&filekey={filekey}' data = self._enc(raw, aes_key) last_err = None for attempt in range(1, 4): try: r = requests.post(url, data=data, headers={'Content-Type': 'application/octet-stream', 'User-Agent': UA}, timeout=timeout) if 400 <= r.status_code < 500: msg = r.headers.get('x-error-message') or r.text[:300] raise RuntimeError(f'CDN upload client error {r.status_code}: {msg}') if r.status_code != 200: msg = r.headers.get('x-error-message') or f'status {r.status_code}' raise RuntimeError(f'CDN upload server error: {msg}') eq = r.headers.get('x-encrypted-param', '') if not eq: raise RuntimeError('CDN upload response missing x-encrypted-param header') return {'encrypt_query_param': eq, 'aes_key': base64.b64encode(aes_key.hex().encode()).decode(), 'encrypt_type': 1} except Exception as e: last_err = e if 'client error' in str(e) or attempt >= 3: break print(f'[WX] CDN upload retry {attempt}: {e}', file=sys.__stdout__) raise last_err def _send_media(self, to_user_id, file_path, media_type, item_type, item_key, context_token=''): fp = Path(file_path) raw = fp.read_bytes() filekey = uuid.uuid4().hex aes_key = os.urandom(16) ciphertext_size = ((len(raw) // 16) + 1) * 16 thumb_raw = b''; thumb_w = thumb_h = 0; thumb_ciphertext_size = 0 if item_key == 'image_item': from io import BytesIO from PIL import Image im = Image.open(fp); im.thumbnail((240, 240)) thumb_w, thumb_h = im.size if im.mode not in ('RGB', 'L'): im = im.convert('RGB') bio = BytesIO(); im.save(bio, format='JPEG', quality=85) thumb_raw = bio.getvalue() thumb_ciphertext_size = ((len(thumb_raw) // 16) + 1) * 16 body = { 'filekey': filekey, 'media_type': media_type, 'to_user_id': to_user_id, 'rawsize': len(raw), 'rawfilemd5': hashlib.md5(raw).hexdigest(), 'filesize': ciphertext_size, 'no_need_thumb': item_key not in ('image_item', 'video_item'), 'aeskey': aes_key.hex(), 'base_info': {'channel_version': VER}} if 
thumb_raw: body.update({'thumb_rawsize': len(thumb_raw), 'thumb_rawfilemd5': hashlib.md5(thumb_raw).hexdigest(), 'thumb_filesize': thumb_ciphertext_size}) resp = self._post('ilink/bot/getuploadurl', body) upload_param = resp.get('upload_param', '') upload_url = resp.get('upload_full_url', '') if not (upload_param or upload_url): raise RuntimeError(f'getuploadurl failed: {resp}') media = self._upload(filekey, upload_param, raw, aes_key=aes_key, upload_url=upload_url) item = {'media': media} if item_key == 'file_item': item.update({'file_name': fp.name, 'len': str(len(raw))}) elif item_key == 'image_item': thumb_param = resp.get('thumb_upload_param', '') thumb_url = resp.get('thumb_upload_full_url', '') if thumb_param or thumb_url: thumb_media = self._upload(filekey, thumb_param, thumb_raw, aes_key=aes_key, upload_url=thumb_url) thumb_size = thumb_ciphertext_size else: # Some getuploadurl responses only return a single upload_full_url for IMAGE. # Keep ImageItem structurally complete by reusing the original CDN media as thumb_media. 
thumb_media = media thumb_size = ciphertext_size item.update({'mid_size': ciphertext_size, 'thumb_media': thumb_media, 'thumb_size': thumb_size, 'thumb_width': thumb_w, 'thumb_height': thumb_h}) elif item_key == 'video_item': item.update({'video_size': ciphertext_size}) msg = {'from_user_id': '', 'to_user_id': to_user_id, 'client_id': f'pyclient-{uuid.uuid4().hex[:16]}', 'message_type': MSG_BOT, 'message_state': STATE_FINISH, 'item_list': [{'type': item_type, item_key: item}]} if context_token: msg['context_token'] = context_token return self._post('ilink/bot/sendmessage', {'msg': msg, 'base_info': {'channel_version': VER}}) def send_file(self, to_user_id, file_path, context_token=''): return self._send_media(to_user_id, file_path, 3, ITEM_FILE, 'file_item', context_token) def send_image(self, to_user_id, file_path, context_token=''): return self._send_media(to_user_id, file_path, 1, ITEM_IMAGE, 'image_item', context_token) def send_video(self, to_user_id, file_path, context_token=''): return self._send_media(to_user_id, file_path, 2, ITEM_VIDEO, 'video_item', context_token) @staticmethod def extract_text(msg): return '\n'.join(it['text_item'].get('text', '') for it in msg.get('item_list', []) if it.get('type') == ITEM_TEXT and it.get('text_item')) @staticmethod def is_user_msg(msg): return msg.get('message_type') == MSG_USER def run_loop(self, on_message, poll_timeout=30): print(f'[Bot] 监听中... 
(bot_id={self.bot_id})')
        # Dict keys keep insertion order, so trimming keeps the newest IDs (a set has no order).
        seen = {}
        while True:
            try:
                for msg in self.get_updates(poll_timeout):
                    mid = msg.get('message_id', 0)
                    if not self.is_user_msg(msg) or mid in seen: continue
                    seen[mid] = None
                    if len(seen) > 5000: seen = dict.fromkeys(list(seen)[-2000:])
                    try: on_message(self, msg)
                    except Exception as e: print(f'[Bot] callback error: {e}')
            except KeyboardInterrupt:
                print('[Bot] exiting'); break
            except Exception as e:
                print(f'[Bot] error: {e}, retrying in 5s'); time.sleep(5)

# ── Unified media download (IMAGE/VIDEO/FILE/VOICE) ──
_MEDIA_KEYS = {'image_item': '.jpg', 'video_item': '.mp4', 'file_item': '', 'voice_item': '.silk'}

def _dl_media(items):
    """Download & decrypt all media items → list of local file paths."""
    paths = []
    for item in items:
        for key, ext in _MEDIA_KEYS.items():
            sub = item.get(key)
            if not sub: continue
            eq = (sub.get('media') or {}).get('encrypt_query_param')
            if not eq: continue
            ak = (sub.get('media') or {}).get('aes_key', '') or sub.get('aeskey', '')
            if not ak: continue
            try:
                aes_key = (bytes.fromhex(base64.b64decode(ak).decode())
                           if sub.get('media', {}).get('aes_key') else bytes.fromhex(ak))
                ct = requests.get(f'{CDN_BASE}/download?encrypted_query_param={quote(eq)}',
                                  headers={'User-Agent': UA}, timeout=60).content
                pt = AES.new(aes_key, AES.MODE_ECB).decrypt(ct); pt = pt[:-pt[-1]]  # strip PKCS#7 padding
                # basename() keeps a sender-supplied file name from escaping _TEMP_DIR
                fname = os.path.basename(sub.get('file_name') or '') or f'{uuid.uuid4().hex[:8]}{ext or ".bin"}'
                p = os.path.join(_TEMP_DIR, fname)
                with open(p, 'wb') as f: f.write(pt)
                paths.append(p); print(f'[WX] media saved: {fname}', file=sys.__stdout__)
            except Exception as e:
                print(f'[WX] media dl err ({key}): {e}', file=sys.__stdout__)
            break  # one media per item
    return paths

agent = GeneraticAgent()
agent.verbose = False
_TAG_PATS = [r'<' + t + r'>.*?</' + t + r'>' for t in ('thinking', 'tool_use')]
_TAG_PATS.append(r'.*?')

def _strip_md(t):
    """Filter markdown for WeChat rich-text rendering.
    WeChat natively renders: code fences, inline code, bold, italic, H1-H4 headings, horizontal rules, tables.
We only strip unsupported syntax.""" def _trunc_code(m): full = m.group() fence = re.match(r'`{3,}', full).group() rest = full[len(fence):-len(fence)] if '\n' not in rest: return full # single-line, keep as-is lang_line, _, body = rest.partition('\n') lines = body.split('\n') if len(lines) > 10: return f'{fence}{lang_line}\n' + '\n'.join(lines[:10]) + '\n...\n' + fence return full # keep intact t = re.sub(r'(`{3,})[\s\S]*?\1', _trunc_code, t) # inline code: keep (WeChat renders it) # bold/italic (*/**/***): keep (WeChat renders it) t = re.sub(r'!\[.*?\]\(.*?\)', '', t) # images: remove t = re.sub(r'\[([^\]]+)\]\([^\)]+\)', r'\1', t) # links: text only t = re.sub(r'^#{5,6}\s+', '', t, flags=re.M) # H5-H6: strip (H1-H4 kept) t = re.sub(r'^\s*[-*+]\s+', '• ', t, flags=re.M) # unordered list: bullet t = re.sub(r'^\s*\d+\.\s+', '', t, flags=re.M) # ordered list: strip num t = re.sub(r'^\s*>\s?', '', t, flags=re.M) # blockquote: strip # horizontal rules (---): keep (WeChat renders it) return re.sub(r'\n{3,}', '\n\n', t).strip() def _clean(t): t = re.sub(r'^\s*LLM Running \(Turn \d+\) \.{3}\s*$', '', t, flags=re.M) t = re.sub(r'^\s*🛠️\s*[A-Za-z_][A-Za-z0-9_]*\(.*$', '', t, flags=re.M) for p in _TAG_PATS: t = re.sub(p, '', t, flags=re.DOTALL) t = re.sub(r'', '', t) return re.sub(r'\n{3,}', '\n\n', _strip_md(t)).strip() def _turn_parts(t): _ph = [] safe = re.sub(r'`{4,}.*?`{4,}', lambda m: (_ph.append(m.group(0)), f'\x00PH{len(_ph)-1}\x00')[1], t, flags=re.DOTALL) parts = re.split(r'(\**LLM Running \(Turn \d+\) \.\.\.\**)', safe) parts = [re.sub(r'\x00PH(\d+)\x00', lambda m: _ph[int(m.group(1))], p) for p in parts] if len(parts) < 4: return [], t turns = [parts[i] + (parts[i+1] if i+1 < len(parts) else '') for i in range(1, len(parts), 2)] return (([parts[0]] if parts[0].strip() else []) + turns[:-1], turns[-1]) def on_message(bot, msg): text = bot.extract_text(msg).strip() uid = msg.get('from_user_id', '') ctx = msg.get('context_token', '') media_paths = 
_dl_media(msg.get('item_list', [])) if not text and not media_paths: return if media_paths: text = (text + '\n' if text else '') + '\n'.join(f'[用户发送文件: {p}]' for p in media_paths) print(f'[WX] 收到: {text[:80]}', file=sys.__stdout__) # Commands if text in ('/stop', '/abort'): agent.abort() bot.send_text(uid, '已停止', context_token=ctx) return if text.startswith('/llm'): args = text.split() if len(args) > 1: try: n = int(args[1]); agent.next_llm(n) bot.send_text(uid, f'切换到 [{agent.llm_no}] {agent.get_llm_name()}', context_token=ctx) except (ValueError, IndexError): bot.send_text(uid, f'用法: /llm <0-{len(agent.list_llms())-1}>', context_token=ctx) else: lines = [f"{'→' if cur else ' '} [{i}] {name}" for i, name, cur in agent.list_llms()] bot.send_text(uid, 'LLMs:\n' + '\n'.join(lines), context_token=ctx) return def _handle(): prompt = text if text.startswith('/') else f"If you need to show files to user, use [FILE:filepath] in your response.\n\n{text}" dq = agent.put_task(prompt, source="wechat") _typing_stop = threading.Event() def _keep_typing(): ticket = bot.get_typing_ticket(uid, ctx) if not ticket: return while not _typing_stop.is_set(): try: bot.send_typing(uid, ticket) except: pass _typing_stop.wait(2.0) threading.Thread(target=_keep_typing, daemon=True).start() result = ''; sent = 0; mi = 0; last_send = 0 def _wx_send(text): s = text.strip(); t0 = time.time() try: bot.send_text(uid, s, context_token=ctx) print(f'[WX] send ok len={len(s)} dt={time.time()-t0:.1f}s', file=sys.__stdout__) return True except Exception as e: print(f'[WX] send err len={len(s)} dt={time.time()-t0:.1f}s {type(e).__name__}: {e}', file=sys.__stdout__) return False def _send(show): nonlocal mi, last_send now = time.time() if mi >= 9 or not show.strip(): return False if mi and now - last_send < 6 * mi: return None if _wx_send(show[:2000]): mi += 1; last_send = time.time(); return True return False try: while True: item = dq.get(timeout=300) if 'done' in item: result = item['done']; break raw = 
item.get('next', '')
                done, partial = _turn_parts(raw)
                if len(done) > sent:
                    merged = _clean('\n\n'.join(done[sent:]))
                    print(f'[WX] turns={len(done)}/{len(done)+1} sent={sent} sending={len(done)-sent}', file=sys.__stdout__)
                    if _send(merged): sent = len(done)
        except queue.Empty: result = '[Timed out]'
        _typing_stop.set()
        done, partial = _turn_parts(result)
        rest = '\n\n'.join(done[sent:] + [partial] + ['\n\n[Task completed]'])
        if rest.strip(): _wx_send((_clean(rest))[-2000:])
        files = re.findall(r'\[FILE:([^\]]+)\]', result)
        # Placeholder names the model may echo back from the [FILE:filepath] hint instead of a real path
        bad = {'filepath', '<filepath>', 'path', '<path>', 'file_path', '<file_path>', '...'}
        files = [f for f in files if f.strip().lower() not in bad
                 and (f if os.path.isabs(f) else os.path.join(_TEMP_DIR, f)) not in media_paths]
        for fpath in set(files):
            if not os.path.isabs(fpath): fpath = os.path.join(_TEMP_DIR, fpath)
            try:
                if not os.path.exists(fpath): raise FileNotFoundError(f"File not found: {fpath}")
                ext = os.path.splitext(fpath)[1].lower()
                # Pick the sender by extension: video, image, otherwise generic file
                sender = bot.send_video if ext in {'.mp4', '.mov', '.m4v', '.webm'} else \
                         bot.send_image if ext in {'.jpg', '.jpeg', '.png', '.gif', '.webp', '.bmp'} else bot.send_file
                sender(uid, fpath, context_token=ctx)
                print(f'[WX] sent media: {fpath}', file=sys.__stdout__)
            except Exception as e:
                print(f'[WX] send media err: {e}', file=sys.__stdout__)
    threading.Thread(target=_handle, daemon=True).start()

if __name__ == '__main__':
    # Binding a fixed local port acts as a single-instance lock
    try:
        _lock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        _lock.bind(('127.0.0.1', 19531))
    except OSError:
        print('[WeChat] Another instance running, exiting.'); sys.exit(1)
    _logf = open(os.path.join(os.path.dirname(os.path.dirname(__file__)), 'temp', 'wechatapp.log'), 'a', encoding='utf-8', buffering=1)
    sys.stdout = sys.stderr = _logf
    print(f'[NEW] Process starting {time.strftime("%m-%d %H:%M")}')
    bot = WxBotClient()
    if not bot.token:
        sys.stdout = sys.stderr = sys.__stdout__  # restore for QR display
        bot.login_qr()
        sys.stdout = sys.stderr = _logf
    threading.Thread(target=agent.run, daemon=True).start()
    print(f'WeChat Bot started 
(bot_id={bot.bot_id})', file=sys.__stdout__) bot.run_loop(on_message) ================================================ FILE: frontends/wecomapp.py ================================================ import asyncio, os, select, sys, threading, time, traceback from collections import deque from datetime import datetime from typing import Any, Callable, Dict, Optional, TypedDict class TurnContext(TypedDict, total=False): """Hook callback receives agent locals() — these are the keys we rely on.""" exit_reason: Optional[str] response: Any summary: Optional[str] tool_calls: Optional[list] turn: int TurnHookFn = Callable[[TurnContext], None] sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) from agentmain import GeneraticAgent from chatapp_common import (AgentChatMixin, FILE_HINT, build_done_text, clean_reply, ensure_single_instance, extract_files, public_access, redirect_log, require_runtime, split_text, strip_files) from llmcore import mykeys try: from wecom_aibot_sdk import WSClient, generate_req_id except Exception: print("Please install wecom_aibot_sdk: pip install wecom_aibot_sdk") sys.exit(1) # ── Config ────────────────────────────────────────────────────────── BOT_ID = str(mykeys.get("wecom_bot_id", "") or "").strip() SECRET = str(mykeys.get("wecom_secret", "") or "").strip() WELCOME = str(mykeys.get("wecom_welcome_message", "") or "").strip() ALLOWED = {str(x).strip() for x in mykeys.get("wecom_allowed_users", []) if str(x).strip()} PORT = 19531 # single-instance lock port TEMP_DIR = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "temp") MEDIA_DIR = os.path.join(TEMP_DIR, "media") IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp", ".svg"} # ── Helpers ───────────────────────────────────────────────────────── def _ts(): return datetime.now().strftime("%H:%M:%S") def _tprint(*a, **kw): kw.setdefault("file", sys.__stdout__) print(*a, **kw) if hasattr(sys.__stdout__, "flush"): sys.__stdout__.flush() 
def _fmt_tool(tc): name = tc.get("tool_name", "?") args = {k: v for k, v in (tc.get("args") or {}).items() if not k.startswith("_")} return f"{name}({str(args)[:120]})" # ── WeComApp ──────────────────────────────────────────────────────── class WeComApp(AgentChatMixin): label, source, split_limit = "WeCom", "wecom", 1200 # split_limit: wecom single-msg char cap def __init__(self, agent): self.agent = agent if not hasattr(agent, '_turn_end_hooks'): agent._turn_end_hooks = {} super().__init__(agent, {}) self._allowed = ALLOWED self.client = None self.chat_frames = {} # chat_id → latest frame (for reply) self._seen = deque(maxlen=1000) self._stats = {"received": 0, "completed": 0} # ── hook management ────────────────────────────────────────────── def _register_hook(self, key: str, fn: TurnHookFn) -> None: """Register a turn-end callback on the agent.""" self.agent._turn_end_hooks[key] = fn def _unregister_hook(self, key: str) -> None: """Remove a turn-end callback.""" self.agent._turn_end_hooks.pop(key, None) # ── frame accept: dedup → auth → register ─────────────────────── def _accept(self, frame): """Parse incoming frame. 
Returns (body, sender_id, chat_id) or None.""" body = frame.body if hasattr(frame, "body") else frame.get("body", frame) if isinstance(frame, dict) else {} msg_id = body.get("msgid") or f"{body.get('chatid', '')}_{body.get('sendertime', '')}_{id(frame)}" if msg_id in self._seen: return None self._seen.append(msg_id) sender_id = str((body.get("from") or {}).get("userid", "") or "unknown") chat_id = str(body.get("chatid", "") or sender_id) if not public_access(ALLOWED) and sender_id not in ALLOWED: print(f"[WeCom] unauthorized: {sender_id}") return None self.chat_frames[chat_id] = frame self._stats["received"] += 1 return body, sender_id, chat_id async def _save_media(self, url, aes_key, default_name): """Download encrypted media → save to MEDIA_DIR → return local path.""" os.makedirs(MEDIA_DIR, exist_ok=True) result = await self.client.download_file(url, aes_key or None) buf = result["buffer"] fname = result.get("filename") or default_name path = os.path.join(MEDIA_DIR, fname) with open(path, "wb") as f: f.write(buf) _tprint(f"[{_ts()}] 💾 Saved: {path} ({len(buf)} bytes)") return path # ── send ──────────────────────────────────────────────────────── async def send_text(self, chat_id, content, **_): if not self.client or chat_id not in self.chat_frames: return frame = self.chat_frames[chat_id] for part in split_text(content, self.split_limit): await self.client.reply_stream(frame, generate_req_id("stream"), part, finish=True) async def send_media(self, chat_id, file_path): if not self.client or not os.path.isfile(file_path): return ext = os.path.splitext(file_path)[1].lower() media_type = "image" if ext in IMAGE_EXTS else "file" with open(file_path, "rb") as f: data = f.read() try: result = await self.client.upload_media(data, type=media_type, filename=os.path.basename(file_path)) frame = self.chat_frames.get(chat_id) if frame: await self.client.reply_media(frame, media_type, result["media_id"]) else: await self.client.send_media_message(chat_id, media_type, 
result["media_id"]) _tprint(f"[{_ts()}] 📤 Sent {media_type}: {os.path.basename(file_path)}") except Exception as e: print(f"[WeCom] send_media error: {e}") await self.send_text(chat_id, f"📎 {os.path.basename(file_path)}(发送失败: {e})") async def send_done(self, chat_id, raw_text): """Send final result: text + extracted file attachments.""" files = extract_files(raw_text) if not files: return await self.send_text(chat_id, build_done_text(raw_text)) clean = clean_reply(strip_files(raw_text)) if clean and clean != "...": await self.send_text(chat_id, clean) for fp in files: if not os.path.isabs(fp) and not os.path.isfile(fp): resolved = os.path.join(TEMP_DIR, fp) if os.path.isfile(resolved): fp = resolved await self.send_media(chat_id, fp) # ── agent execution (single-channel via turn hook) ────────────── async def run_agent(self, chat_id, text, **_): state = {"running": True} self.user_tasks[chat_id] = state done_event = threading.Event() result = {} loop = asyncio.get_running_loop() hook_key = f"wecom_{chat_id}" # namespace: wecom_ + chat_id, matches _turn_end_hooks convention def _on_turn(ctx): """Turn-end callback injected into agent. 
ctx = locals() from ga.py.""" try: if ctx.get("exit_reason"): resp = ctx.get("response") result["raw"] = resp.content if hasattr(resp, "content") else str(resp) result["summary"] = ctx.get("summary") done_event.set() return summary = ctx.get("summary") if not summary: return turn = ctx.get("turn", "?") tools = ctx.get("tool_calls") or [] parts = [f"⏳ Turn {turn}: {summary}"] if tools: parts.append(f"🛠 {', '.join(_fmt_tool(tc) for tc in tools[:3])}") _tprint(f"[{_ts()}] {parts[0]}") asyncio.run_coroutine_threadsafe(self.send_text(chat_id, "\n".join(parts)), loop) except Exception as e: print(f"[WeCom hook] {e}") traceback.print_exc() try: await self.send_text(chat_id, "🤔 思考中...") self._register_hook(hook_key, _on_turn) self.agent.put_task(f"{FILE_HINT}\n\n{text}", source=self.source) # Wait for: hook signals done / user stops / agent crashes t0 = time.time() while state["running"] and not done_event.is_set(): await asyncio.sleep(1) elapsed = time.time() - t0 if elapsed > 10 and not self.agent.is_running: await asyncio.sleep(3) # grace period for hook delivery if not done_event.is_set(): break if result.get("raw") is not None: self._stats["completed"] += 1 await self.send_done(chat_id, result["raw"]) label = result.get("summary") or f'{len(result["raw"])} 字' _tprint(f"[{_ts()}] ✅ Done ({chat_id}) — {label}") elif not state["running"]: _tprint(f"[{_ts()}] ⏹️ 停止 ({chat_id})") await self.send_text(chat_id, "⏹️ 已停止") else: _tprint(f"[{_ts()}] ⚠️ 异常退出 ({chat_id})") await self.send_text(chat_id, "⚠️ Agent 异常退出,请重试") except Exception as e: traceback.print_exc() await self.send_text(chat_id, f"❌ 错误: {e}") finally: self._unregister_hook(hook_key) self.user_tasks.pop(chat_id, None) # ── message handlers ──────────────────────────────────────────── async def on_text(self, frame): parsed = self._accept(frame) if not parsed: return body, sender_id, chat_id = parsed content = str((body.get("text", {}) or {}).get("content", "") or "").strip() if not content: return 
_tprint(f"[{_ts()}] 📩 {sender_id}: {content}") if content.startswith("/"): _tprint(f"[{_ts()}] 🔧 命令 {content} from {sender_id}") return await self.handle_command(chat_id, content) asyncio.create_task(self.run_agent(chat_id, content)) async def _on_media(self, frame, key, icon): """Common handler for image/file messages.""" parsed = self._accept(frame) if not parsed: return body, sender_id, chat_id = parsed info = body.get(key) or {} url = info.get("url", "") if not url: return fname = info.get("file_name") or info.get("filename") or "" msgid = body.get("msgid", "x")[:16] default = f"img_{msgid}.jpg" if key == "image" else (fname or f"file_{msgid}") try: _tprint(f"[{_ts()}] {icon} {key.title()} from {sender_id}" + (f": {fname}" if fname else "")) path = await self._save_media(url, info.get("aeskey", ""), default) label = "一张图片" if key == "image" else f"文件 {os.path.basename(path)}" asyncio.create_task(self.run_agent(chat_id, f"[用户发送了{label},已保存到: {path}]")) except Exception as e: print(f"[WeCom] on_{key} error: {e}") await self.send_text(chat_id, f"❌ {key}处理失败: {e}") async def on_image(self, frame): await self._on_media(frame, "image", "🖼️") async def on_file(self, frame): await self._on_media(frame, "file", "📎") # ── lifecycle ─────────────────────────────────────────────────── async def on_enter_chat(self, frame): if WELCOME and self.client: try: await self.client.reply_welcome(frame, {"msgtype": "text", "text": {"content": WELCOME}}) except Exception as e: print(f"[WeCom] welcome error: {e}") async def on_connected(self, *_): _tprint("[WeCom] connected") async def on_authenticated(self, *_): _tprint("[WeCom] authenticated, 等待消息中...\n") async def on_disconnected(self, *_): _tprint("[WeCom] disconnected") async def on_error(self, frame): _tprint(f"[WeCom] error: {frame}") # ── Terminal CLI (runs in background thread) ───────────────────── def _terminal_loop(self): """Blocking CLI loop — run in a daemon thread.""" while True: try: if not select.select([sys.stdin], 
[], [], 1.0)[0]: continue
                line = sys.stdin.readline().strip()
            except Exception: break
            cmd = line.lower()  # commands are case-insensitive; `line` keeps the user ID's original case
            if not cmd: continue
            if cmd == "help":
                _tprint("  status       - show status")
                _tprint("  stop [user]  - stop a task (specify user when several are running)")
                _tprint("  exit         - quit the process")
            elif cmd == "status":
                _tprint(f"[{_ts()}] 📊 received {self._stats['received']} | completed {self._stats['completed']} | active {len(self.user_tasks)}")
                for uid, st in self.user_tasks.items():
                    _tprint(f"  ├ {uid}: running={st.get('running')}")
                _tprint(f"  Agent running: {self.agent.is_running} | allowed: {self._allowed or 'all'}")
            elif cmd.startswith("stop"):
                parts = line.split(None, 1)  # split the un-lowercased line so the ID matches the task key
                tasks = self.user_tasks
                if not tasks:
                    _tprint("  no active tasks")
                elif len(parts) > 1:
                    uid = parts[1]
                    if uid in tasks:
                        tasks[uid]["running"] = False
                        _tprint(f"  ⏹️ stopped {uid}")
                    else:
                        _tprint(f"  not found: {uid}")
                elif len(tasks) == 1:
                    uid = next(iter(tasks))
                    tasks[uid]["running"] = False
                    _tprint(f"  ⏹️ stopped {uid}")
                else:
                    _tprint("  multiple tasks, specify one: stop <user>")
                    for uid in tasks: _tprint(f"    ├ {uid}")
            elif cmd == "exit":
                _tprint(f"[{_ts()}] 👋 exiting...")
                os._exit(0)
            else:
                _tprint("  available commands: help | status | stop | exit")

    async def start(self, client=None):
        self.client = client or WSClient(BOT_ID, SECRET, reconnect_interval=1000, max_reconnect_attempts=-1, heartbeat_interval=30000)
        for ev, fn in {
            "connected": self.on_connected, "authenticated": self.on_authenticated,
            "disconnected": self.on_disconnected, "error": self.on_error,
            "message.text": self.on_text, "message.image": self.on_image,
            "message.file": self.on_file, "event.enter_chat": self.on_enter_chat,
        }.items():
            self.client.on(ev, fn)
        _tprint("[WeCom] starting ...")
        await self.client.connect()
        while True: await asyncio.sleep(1)

# ── Main ────────────────────────────────────────────────────────────
if __name__ == "__main__":
    agent = GeneraticAgent(); agent.verbose = False
    _LOCK = ensure_single_instance(PORT, "WeCom")
    require_runtime(agent, "WeCom", wecom_bot_id=BOT_ID, wecom_secret=SECRET)
    redirect_log(__file__, "wecomapp.log", "WeCom", ALLOWED)
    _tprint("\n═══════════════════════════════════════════")
    _tprint("  WeCom Agent (long-connection mode)")
    _tprint(f"  Port lock: {PORT} | allowed users: {ALLOWED or 'all'}")
    _tprint("═══════════════════════════════════════════")
    _tprint("  Terminal commands: help | status | stop | exit")
    app = WeComApp(agent)
    threading.Thread(target=agent.run, daemon=True).start()
    threading.Thread(target=app._terminal_loop, daemon=True).start()
    asyncio.run(app.start())

================================================
FILE: ga.py
================================================
import sys, os, re, json, time, threading, importlib
from datetime import datetime
from pathlib import Path
import tempfile, traceback, subprocess, itertools, collections, difflib
if sys.stdout is None: sys.stdout = open(os.devnull, "w")
if sys.stderr is None: sys.stderr = open(os.devnull, "w")
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
from agent_loop import BaseHandler, StepOutcome, json_default

script_dir = os.path.dirname(os.path.abspath(__file__))

def code_run(code, code_type="python", timeout=60, cwd=None, code_cwd=None, stop_signal=None):
    """Code executor.
    python: run a full .py script (file mode)
    powershell/bash: run a one-line command (command mode)
    Prefer python; use powershell only for system operations that require it."""
    preview = (code[:60].replace('\n', ' ') + '...') if len(code) > 60 else code.strip()
    # Resolve the default cwd before the status line uses it; os.path.basename(None) raises TypeError.
    cwd = cwd or os.path.join(script_dir, 'temp'); tmp_path = None
    yield f"[Action] Running {code_type} in {os.path.basename(cwd)}: {preview}\n"
    if code_type in ["python", "py"]:
        # File mode: write the code (plus the optional shared header) to a temp script and run it
        tmp_file = tempfile.NamedTemporaryFile(suffix=".ai.py", delete=False, mode='w', encoding='utf-8', dir=code_cwd)
        cr_header = os.path.join(script_dir, 'assets', 'code_run_header.py')
        if os.path.exists(cr_header): tmp_file.write(open(cr_header, encoding='utf-8').read())
        tmp_file.write(code)
        tmp_path = tmp_file.name
        tmp_file.close()
        cmd = [sys.executable, "-X", "utf8", "-u", tmp_path]
    elif code_type in ["powershell", "bash", "sh", "shell", "ps1", "pwsh"]:
        if os.name == 'nt': cmd = ["powershell", "-NoProfile", "-NonInteractive",
"-Command", code] else: cmd = ["bash", "-c", code] else: return {"status": "error", "msg": f"不支持的类型: {code_type}"} print("code run output:") startupinfo = None if os.name == 'nt': startupinfo = subprocess.STARTUPINFO() startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW startupinfo.wShowWindow = 0 # SW_HIDE full_stdout = [] def stream_reader(proc, logs): try: for line_bytes in iter(proc.stdout.readline, b''): try: line = line_bytes.decode('utf-8') except UnicodeDecodeError: line = line_bytes.decode('gbk', errors='ignore') logs.append(line) try: print(line, end="") except: pass except: pass try: process = subprocess.Popen( cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, bufsize=0, cwd=cwd, startupinfo=startupinfo ) start_t = time.time() t = threading.Thread(target=stream_reader, args=(process, full_stdout), daemon=True) t.start() while t.is_alive(): istimeout = time.time() - start_t > timeout if istimeout or stop_signal: process.kill() print("[Debug] Process killed due to timeout or stop signal.") if istimeout: full_stdout.append("\n[Timeout Error] 超时强制终止") else: full_stdout.append("\n[Stopped] 用户强制终止") break time.sleep(1) t.join(timeout=1) exit_code = process.poll() stdout_str = "".join(full_stdout) status = "success" if exit_code == 0 else "error" status_icon = "✅" if exit_code == 0 else "❌" if exit_code is None: status_icon = "⏳" output_snippet = smart_format(stdout_str, max_str_len=600, omit_str='\n\n[omitted long output]\n\n') output_snippet = re.sub(r'`{4,}', lambda m: m.group(0)[:3] + '\u200b' + m.group(0)[3:], output_snippet) yield f"[Status] {status_icon} Exit Code: {exit_code}\n[Stdout]\n{output_snippet}\n" if process.stdout: threading.Thread(target=process.stdout.close, daemon=True).start() return { "status": status, "stdout": smart_format(stdout_str, max_str_len=10000, omit_str='\n\n[omitted long output]\n\n'), "exit_code": exit_code } except Exception as e: if 'process' in locals(): process.kill() return {"status": "error", "msg": str(e)} 
    finally:
        # tmp_path is only set in the python/py branch, so checking it alone also cleans up when code_type == "py".
        if tmp_path and os.path.exists(tmp_path): os.remove(tmp_path)

def ask_user(question, candidates=None):
    """question: the question to put to the user. candidates: optional list of candidate answers."""
    return {"status": "INTERRUPT", "intent": "HUMAN_INTERVENTION",
            "data": {"question": question, "candidates": candidates or []}}

import simphtml
driver = None

def first_init_driver():
    global driver
    from TMWebDriver import TMWebDriver
    driver = TMWebDriver()
    # Wait up to ~20s for at least one browser session to connect
    for i in range(20):
        time.sleep(1)
        sess = driver.get_all_sessions()
        if len(sess) > 0: break
    if len(sess) == 0: return
    if len(sess) == 1:
        #driver.newtab()
        time.sleep(3)

def web_scan(tabs_only=False, switch_tab_id=None, text_only=False):
    """Get the simplified HTML of the current page and the list of tabs.
    Note: simplification filters out non-body content such as sidebars and floating elements.
    tabs_only: return only the tab list without fetching HTML (saves tokens).
    switch_tab_id: optional; if provided, switch to that tab before scanning.
    Prefer execute_js over observing the full HTML."""
    global driver
    try:
        if driver is None: first_init_driver()
        if len(driver.get_all_sessions()) == 0:
            return {"status": "error", "msg": "No browser tabs available; check L3 memory to analyze the cause."}
        tabs = []
        for sess in driver.get_all_sessions():
            sess.pop('connected_at', None)
            sess.pop('type', None)
            sess['url'] = sess.get('url', '')[:50] + ("..." 
if len(sess.get('url', '')) > 50 else "") tabs.append(sess) if switch_tab_id: driver.default_session_id = switch_tab_id result = { "status": "success", "metadata": { "tabs_count": len(tabs), "tabs": tabs, "active_tab": driver.default_session_id } } if not tabs_only: importlib.reload(simphtml); result["content"] = simphtml.get_html(driver, cutlist=True, maxchars=35000, text_only=text_only) if text_only: result['content'] = smart_format(result['content'], max_str_len=10000, omit_str='\n\n[omitted long content]\n\n') return result except Exception as e: return {"status": "error", "msg": format_error(e)} def format_error(e): exc_type, exc_value, exc_traceback = sys.exc_info() tb = traceback.extract_tb(exc_traceback) if tb: f = tb[-1] fname = os.path.basename(f.filename) return f"{exc_type.__name__}: {str(e)} @ {fname}:{f.lineno}, {f.name} -> `{f.line}`" return f"{exc_type.__name__}: {str(e)}" def log_memory_access(path): if 'memory' not in path: return stats_file = os.path.join(script_dir, 'memory/file_access_stats.json') try: with open(stats_file, 'r', encoding='utf-8') as f: stats = json.load(f) except: stats = {} fname = os.path.basename(path) stats[fname] = {'count': stats.get(fname, {}).get('count', 0) + 1, 'last': datetime.now().strftime('%Y-%m-%d')} with open(stats_file, 'w', encoding='utf-8') as f: json.dump(stats, f, indent=2, ensure_ascii=False) def web_execute_js(script, switch_tab_id=None, no_monitor=False): """执行 JS 脚本来控制浏览器,并捕获结果和页面变化""" global driver try: if driver is None: first_init_driver() if len(driver.get_all_sessions()) == 0: return {"status": "error", "msg": "没有可用的浏览器标签页,查L3记忆分析原因。"} if switch_tab_id: driver.default_session_id = switch_tab_id result = simphtml.execute_js_rich(script, driver, no_monitor=no_monitor) return result except Exception as e: return {"status": "error", "msg": format_error(e)} def expand_file_refs(text, base_dir=None): """展开文本中的 {{file:路径:起始行:结束行}} 引用为实际文件内容。 可与普通文本混排。展开失败抛 ValueError。 base_dir: 相对路径的基准目录,默认为进程 cwd""" 
pattern = r'\{\{file:(.+?):(\d+):(\d+)\}\}' def replacer(match): path, start, end = match.group(1), int(match.group(2)), int(match.group(3)) path = os.path.abspath(os.path.join(base_dir or '.', path)) if not os.path.isfile(path): raise ValueError(f"引用文件不存在: {path}") with open(path, 'r', encoding='utf-8') as f: lines = f.readlines() if start < 1 or end > len(lines) or start > end: raise ValueError(f"行号越界: {path} 共{len(lines)}行, 请求{start}-{end}") return ''.join(lines[start-1:end]) return re.sub(pattern, replacer, text) def file_patch(path: str, old_content: str, new_content: str): """在文件中寻找唯一的 old_content 块并替换为 new_content""" path = str(Path(path).resolve()) try: if not os.path.exists(path): return {"status": "error", "msg": "文件不存在"} with open(path, 'r', encoding='utf-8') as f: full_text = f.read() if not old_content: return {"status": "error", "msg": "old_content 为空,请确认 arguments"} count = full_text.count(old_content) if count == 0: return {"status": "error", "msg": "未找到匹配的旧文本块,建议:先用 file_read 确认当前内容,再分小段进行 patch。若多次失败则询问用户,严禁自行使用 overwrite 或代码替换。"} if count > 1: return {"status": "error", "msg": f"找到 {count} 处匹配,无法确定唯一位置。请提供更长、更具体的旧文本块以确保唯一性。建议:包含上下文行来增强特征,或分小段逐个修改。"} updated_text = full_text.replace(old_content, new_content) with open(path, 'w', encoding='utf-8') as f: f.write(updated_text) return {"status": "success", "msg": "文件局部修改成功"} except Exception as e: return {"status": "error", "msg": str(e)} _read_dirs = set() def _scan_files(base, depth=2): try: for e in os.scandir(base): if e.is_file(): yield (e.name, e.path) elif depth > 0 and e.is_dir(follow_symlinks=False): yield from _scan_files(e.path, depth - 1) except (PermissionError, OSError): pass def file_read(path, start=1, keyword=None, count=200, show_linenos=True): try: with open(path, 'r', encoding='utf-8', errors='replace') as f: stream = ((i, l.rstrip('\r\n')) for i, l in enumerate(f, 1)) stream = itertools.dropwhile(lambda x: x[0] < start, stream) if keyword: before = 
collections.deque(maxlen=count//3) for i, l in stream: if keyword.lower() in l.lower(): res = list(before) + [(i, l)] + list(itertools.islice(stream, count - len(before) - 1)) break before.append((i, l)) else: return f"Keyword '{keyword}' not found after line {start}. Falling back to content from line {start}:\n\n" \ + file_read(path, start, None, count, show_linenos) else: res = list(itertools.islice(stream, count)) realcnt = len(res); L_MAX = min(max(100, 256000//max(realcnt,1)), 8000); TAG = " ... [TRUNCATED]" remaining = sum(1 for _ in itertools.islice(stream, 5000)) total_lines = (res[0][0] - 1 if res else start - 1) + realcnt + remaining tl_str = f"{total_lines}+" if remaining >= 5000 else str(total_lines) partial = total_lines > realcnt total_tag = f"[FILE] {tl_str} lines" + (f" | PARTIAL showing {realcnt}; assess need for more" if partial else "") + "\n" res = [(i, l if len(l) <= L_MAX else l[:L_MAX] + TAG) for i, l in res] result = "\n".join(f"{i}|{l}" if show_linenos else l for i, l in res) if show_linenos: result = total_tag + result elif partial: result += f"\n\n[FILE PARTIAL: showing {realcnt}/{tl_str} lines; assess need for more]" _read_dirs.add(os.path.dirname(os.path.abspath(path))) return result except FileNotFoundError: msg = f"Error: File not found: {path}" try: tgt = os.path.basename(path); scan = os.path.dirname(os.path.dirname(os.path.abspath(path))) roots = [scan] + [d for d in _read_dirs if not d.startswith(scan)] cands = list(itertools.islice((c for base in roots for c in _scan_files(base)), 2000)) top = sorted([(difflib.SequenceMatcher(None, tgt.lower(), c[0].lower()).ratio(), c) for c in cands[:2000]], key=lambda x: -x[0])[:5] top = [(s, c) for s, c in top if s > 0.3] if top: msg += "\n\nDid you mean:\n" + "\n".join(f" {c[1]} ({s:.0%})" for s, c in top) except Exception: pass return msg except Exception as e: return f"Error: {str(e)}" def smart_format(data, max_str_len=100, omit_str=' ... 
'): if not isinstance(data, str): data = str(data) if len(data) < max_str_len + len(omit_str)*2: return data return f"{data[:max_str_len//2]}{omit_str}{data[-max_str_len//2:]}" def consume_file(dr, file): if dr and os.path.exists(os.path.join(dr, file)): with open(os.path.join(dr, file), encoding='utf-8', errors='replace') as f: content = f.read() os.remove(os.path.join(dr, file)) return content class GenericAgentHandler(BaseHandler): '''Generic Agent 工具库,包含多种工具的实现。工具函数自动加上了 do_ 前缀。实际工具名没有前缀。''' def __init__(self, parent, last_history=None, cwd='./temp'): self.parent = parent self.working = {} self.cwd = cwd; self.current_turn = 0 self.history_info = last_history if last_history else [] self.code_stop_signal = [] self._done_hooks = [] def _get_abs_path(self, path): if not path: return "" return os.path.abspath(os.path.join(self.cwd, path)) def _extract_code_block(self, response, code_type): code_type = {'python':'python|py', 'powershell':'powershell|ps1|pwsh', 'bash':'bash|sh|shell'}.get(code_type, re.escape(code_type)) matches = re.findall(rf"```(?:{code_type})\n(.*?)\n```", response.content, re.DOTALL) return matches[-1].strip() if matches else None def do_code_run(self, args, response): '''执行代码片段,有长度限制,不允许代码中放大量数据,如有需要应当通过文件读取进行。''' code_type = args.get("type", "python") code = args.get("code") or args.get("script") if not code: code = self._extract_code_block(response, code_type) if not code: return StepOutcome("[Error] Code missing. 
Must use reply code block or 'script' arg.", next_prompt="\n") try: timeout = int(args.get("timeout", 60)) except: timeout = 60 raw_path = os.path.join(self.cwd, args.get("cwd", './')) cwd = os.path.normpath(os.path.abspath(raw_path)) code_cwd = os.path.normpath(self.cwd) if code_type == 'python' and args.get("inline_eval"): ns = {'handler':self, 'parent':self.parent, 'history':json.dumps(self.parent.llmclient.backend.history)} old_cwd = os.getcwd() try: os.chdir(cwd) try: try: result = repr(eval(code, ns)) except SyntaxError: exec(code, ns); result = ns.get('_r', 'OK') except Exception as e: result = f'Error: {e}' finally: os.chdir(old_cwd) else: result = yield from code_run(code, code_type, timeout, cwd, code_cwd=code_cwd, stop_signal=self.code_stop_signal) next_prompt = self._get_anchor_prompt(skip=args.get('_index', 0) > 0) return StepOutcome(result, next_prompt=next_prompt) def do_ask_user(self, args, response): question = args.get("question", "请提供输入:") candidates = args.get("candidates", []) result = ask_user(question, candidates) yield f"Waiting for your answer ...\n" return StepOutcome(result, next_prompt="", should_exit=True) def do_web_scan(self, args, response): '''获取当前页面内容和标签页列表。也可用于切换标签页。 注意:HTML经过简化,边栏/浮动元素等可能被过滤。如需查看被过滤的内容请用execute_js。 tabs_only=true时仅返回标签页列表,不获取HTML(省token)''' tabs_only = args.get("tabs_only", False) switch_tab_id = args.get("switch_tab_id", None) text_only = args.get("text_only", False) result = web_scan(tabs_only=tabs_only, switch_tab_id=switch_tab_id, text_only=text_only) content = result.pop("content", None) yield f'[Info] {str(result)}\n' if content: result = json.dumps(result, ensure_ascii=False, default=json_default) + f"\n```html\n{content}\n```" next_prompt = "\n" return StepOutcome(result, next_prompt=next_prompt) def do_web_execute_js(self, args, response): '''web情况下的优先使用工具,执行任何js达成对浏览器的*完全*控制。支持将结果保存到文件供后续读取分析。''' script = args.get("script", "") or self._extract_code_block(response, "javascript") if not script: return 
StepOutcome("[Error] Script missing. Use ```javascript block or 'script' arg.", next_prompt="\n") abs_path = self._get_abs_path(script.strip()) if os.path.isfile(abs_path): with open(abs_path, 'r', encoding='utf-8') as f: script = f.read() save_to_file = args.get("save_to_file", "") switch_tab_id = args.get("switch_tab_id") or args.get("tab_id") no_monitor = args.get("no_monitor", False) result = web_execute_js(script, switch_tab_id=switch_tab_id, no_monitor=no_monitor) if save_to_file and "js_return" in result: content = str(result["js_return"] or '') abs_path = self._get_abs_path(save_to_file) result["js_return"] = smart_format(content, max_str_len=170) try: with open(abs_path, 'w', encoding='utf-8') as f: f.write(str(content)) result["js_return"] += f"\n\n[已保存完整内容到 {abs_path}]" except: result['js_return'] += f"\n\n[保存失败,无法写入文件 {abs_path}]" show = smart_format(json.dumps(result, ensure_ascii=False, indent=2, default=json_default), max_str_len=300) try: print("Web Execute JS Result:", show) except: pass yield f"JS 执行结果:\n{show}\n" next_prompt = self._get_anchor_prompt(skip=args.get('_index', 0) > 0) result = json.dumps(result, ensure_ascii=False, default=json_default) return StepOutcome(smart_format(result, max_str_len=8000), next_prompt=next_prompt) def do_file_patch(self, args, response): path = self._get_abs_path(args.get("path", "")) yield f"[Action] Patching file: {path}\n" old_content = args.get("old_content", "") new_content = args.get("new_content", "") try: new_content = expand_file_refs(new_content, base_dir=self.cwd) except ValueError as e: yield f"[Status] ❌ 引用展开失败: {e}\n" return StepOutcome({"status": "error", "msg": str(e)}, next_prompt="\n") result = file_patch(path, old_content, new_content) yield f"\n{str(result)}\n" next_prompt = self._get_anchor_prompt(skip=args.get('_index', 0) > 0) return StepOutcome(result, next_prompt=next_prompt) def do_file_write(self, args, response): '''用于对整个文件的大量处理,精细修改要用file_patch。 需要将要写入的内容放在标签内,或者放在代码块中''' path = 
self._get_abs_path(args.get("path", "")) mode = args.get("mode", "overwrite") # overwrite/append/prepend action_str = {"prepend": "Prepending to", "append": "Appending to"}.get(mode, "Overwriting") yield f"[Action] {action_str} file: {os.path.basename(path)}\n" def extract_robust_content(text): tags = re.findall(r"]*>(.*?)", text, re.DOTALL) if tags: return tags[-1].strip() blocks = re.findall(r"```[^\n]*\n([\s\S]*?)```", text) if blocks: return blocks[-1].strip() return None content = args.get('content') or extract_robust_content(response.content) if not content: yield f"[Status] ❌ 失败: 未在回复中找到代码块内容\n" return StepOutcome({"status": "error", "msg": "No content found. Blank is not supported. Put content inside ... tags in your reply body before call file_write."}, next_prompt="\n") try: new_content = expand_file_refs(content, base_dir=self.cwd) if mode == "prepend": old = open(path, 'r', encoding="utf-8").read() if os.path.exists(path) else "" open(path, 'w', encoding="utf-8").write(new_content + old) else: with open(path, 'a' if mode == "append" else 'w', encoding="utf-8") as f: f.write(new_content) yield f"[Status] ✅ {mode.capitalize()} 成功 ({len(new_content)} bytes)\n" next_prompt = self._get_anchor_prompt(skip=args.get('_index', 0) > 0) return StepOutcome({"status": "success", 'writed_bytes': len(new_content)}, next_prompt=next_prompt) except Exception as e: yield f"[Status] ❌ 写入异常: {str(e)}\n" return StepOutcome({"status": "error", "msg": str(e)}, next_prompt="\n") def do_file_read(self, args, response): '''读取文件内容。从第start行开始读取。如有keyword则返回第一个keyword(忽略大小写)周边内容''' path = self._get_abs_path(args.get("path", "")) yield f"\n[Action] Reading file: {path}\n" start = args.get("start", 1) count = args.get("count", 200) keyword = args.get("keyword") show_linenos = args.get("show_linenos", True) result = file_read(path, start=start, keyword=keyword, count=count, show_linenos=show_linenos) if show_linenos and not result.startswith("Error:"): result = 
'由于设置了show_linenos,以下返回信息为:(行号|)内容 。\n' + result if ' ... [TRUNCATED]' in result: result += '\n\n(某些行被截断,如需完整内容可改用 code_run 读取)' result = smart_format(result, max_str_len=20000, omit_str='\n\n[omitted long content]\n\n') next_prompt = self._get_anchor_prompt(skip=args.get('_index', 0) > 0) log_memory_access(path) if 'memory' in path or 'sop' in path: next_prompt += "\n[SYSTEM TIPS] 正在读取记忆或SOP文件,若决定按sop执行请提取sop中的关键点(特别是靠后的)update working memory." return StepOutcome(result, next_prompt=next_prompt) def _in_plan_mode(self): return self.working.get('in_plan_mode') def _exit_plan_mode(self): self.working.pop('in_plan_mode', None) def enter_plan_mode(self, plan_path): self.working['in_plan_mode'] = plan_path; self.max_turns = 100 print(f"[Info] Entered plan mode with plan file: {plan_path}"); return plan_path def _check_plan_completion(self): if not os.path.isfile(p:=self._in_plan_mode() or ''): return None try: return len(re.findall(r'\[ \]', open(p, encoding='utf-8', errors='replace').read())) except: return None def do_update_working_checkpoint(self, args, response): '''为整个任务设定后续需要临时记忆的重点。''' key_info = args.get("key_info", "") related_sop = args.get("related_sop", "") if "key_info" in args: self.working['key_info'] = key_info if "related_sop" in args: self.working['related_sop'] = related_sop self.working['passed_sessions'] = 0 yield f"[Info] Updated key_info and related_sop.\n" next_prompt = self._get_anchor_prompt(skip=args.get('_index', 0) > 0) #next_prompt += '\n[SYSTEM TIPS] 此函数一般在任务开始或中间时调用,如果任务已成功完成应该是start_long_term_update用于结算长期记忆。\n' return StepOutcome({"result": "working key_info updated"}, next_prompt=next_prompt) def _retry_or_exit(self, prompt): self._empty_ct = getattr(self, '_empty_ct', 0) + 1 if self._empty_ct >= 3: return StepOutcome({}, should_exit=True) return StepOutcome({}, next_prompt=prompt) def do_no_tool(self, args, response): '''这是一个特殊工具,由引擎自主调用,不要包含在TOOLS_SCHEMA里。 当模型在一轮中未显式调用任何工具时,由引擎自动触发。 二次确认仅在回复几乎只包含/和一段大代码块时触发。''' content = 
getattr(response, 'content', '') or "" thinking = getattr(response, 'thinking', '') or "" if not response or (not content.strip() and not thinking.strip()): yield "[Warn] LLM returned an empty response. Retrying...\n" return self._retry_or_exit("[System] Blank response, regenerate and tooluse") if '[!!! 流异常中断' in content[-100:] or '!!!Error:' in content[-100:]: return self._retry_or_exit("[System] Incomplete response. Regenerate and tooluse.") if 'max_tokens !!!]' in content[-100:]: return self._retry_or_exit("[System] max_tokens limit reached. Use multi small steps to do it.") if self._in_plan_mode() and any(kw in content for kw in ['任务完成', '全部完成', '已完成所有', '🏁']): if 'VERDICT' not in content and '[VERIFY]' not in content and '验证subagent' not in content: yield "[Warn] Plan模式完成声明拦截。\n" return StepOutcome({}, next_prompt="⛔ [验证拦截] 检测到你在plan模式下声称完成,但未执行[VERIFY]验证步骤。请先按plan_sop §四启动验证subagent,获得VERDICT后才能声称完成。") # 2. 检测"包含较大代码块但未调用工具"的情况 # 关键特征:恰好1个大代码块 + 代码块直接结尾(后面只有空白) code_block_pattern = r"```[a-zA-Z0-9_]*\n[\s\S]{50,}?```" blocks = re.findall(code_block_pattern, content) if len(blocks) == 1: m = re.search(code_block_pattern, content) after_block = content[m.end():] if not after_block.strip(): residual = content.replace(m.group(0), "") residual = re.sub(r"[\s\S]*?", "", residual, flags=re.IGNORECASE) residual = re.sub(r"[\s\S]*?", "", residual, flags=re.IGNORECASE) clean_residual = re.sub(r"\s+", "", residual) if len(clean_residual) <= 30: yield "[Info] Detected large code block without tool call and no extra natural language. 
Requesting clarification.\n" next_prompt = ( "[System] 检测到你在上一轮回复中主要内容是较大代码块,且本轮未调用任何工具。\n" "如果这些代码需要执行、写入文件或进一步分析,请重新组织回复并显式调用相应工具" "(例如:code_run、file_write、file_patch 等);\n" "如果只是向用户展示或讲解代码片段,请在回复中补充自然语言说明," "并明确是否还需要额外的实际操作。" ) return StepOutcome({}, next_prompt=next_prompt) if self._in_plan_mode(): remaining = self._check_plan_completion() if remaining == 0: self._exit_plan_mode(); yield "[Info] Plan完成:plan.md中0个[ ]残留,退出plan模式。\n" yield "[Info] Final response to user.\n" return StepOutcome(response, next_prompt=None) def do_start_long_term_update(self, args, response): '''Agent觉得当前任务完成后有重要信息需要记忆时调用此工具。''' prompt = '''### [总结提炼经验] 既然你觉得当前任务有重要信息需要记忆,请提取最近一次任务中【事实验证成功且长期有效】的环境事实、用户偏好、重要步骤,更新记忆。 本工具是标记开启结算过程,若已在更新记忆过程或没有值得记忆的点,忽略本次调用。 **如果没有经验证的,未来能用上的信息,忽略本次调用!** **只能提取行动验证成功的信息**: - **环境事实**(路径/凭证/配置)→ `file_patch` 更新 L2,同步 L1 - **复杂任务经验**(关键坑点/前置条件/重要步骤)→ L3 精简 SOP(只记你被坑得多次重试的核心要点) **禁止**:临时变量、具体推理过程、未验证信息、通用常识、你可以轻松复现的细节、只是做了但没有验证的信息 **操作**:严格遵循提供的L0的记忆更新SOP。先 `file_read` 看现有 → 判断类型 → 最小化更新 → 无新内容跳过,保证对记忆库最小局部修改。\n ''' + get_global_memory() yield "[Info] Start distilling good memory for long-term storage.\n" path = './memory/memory_management_sop.md' if os.path.exists(path): result = 'This is L0:\n' + file_read(path, show_linenos=False) else: result = "Memory Management SOP not found. Do not update memory." 
return StepOutcome(result, next_prompt=prompt) def _fold_earlier(self, lines): FALLBACK = '直接回答了用户问题' parts, cnt, last = [], 0, '' def flush(): if cnt: if FALLBACK in last: parts.append(f'[Agent]({cnt} turns)') else: parts.append(f'{last}({cnt} turns)') for line in lines: if line.startswith('[USER]'): flush(); parts.append(line); cnt = 0; last = '' else: cnt += 1; last = line flush() return "\n".join(parts[-150:]) def _get_anchor_prompt(self, skip=False): if skip: return "\n" h = self.history_info; W = 30 earlier = f'\n{self._fold_earlier(h[:-W])}\n\n' if len(h) > W else "" h_str = "\n".join(h[-W:]) prompt = f"\n### [WORKING MEMORY]\n{earlier}\n{h_str}\n" prompt += f"\nCurrent turn: {self.current_turn}\n" if self.working.get('key_info'): prompt += f"\n{self.working.get('key_info')}" if self.working.get('related_sop'): prompt += f"\n有不清晰的地方请再次读取{self.working.get('related_sop')}" if getattr(self.parent, 'verbose', False): try: print(prompt) except: pass return prompt def turn_end_callback(self, response, tool_calls, tool_results, turn, next_prompt, exit_reason): _c = re.sub(r'```.*?```|.*?', '', response.content, flags=re.DOTALL) rsumm = re.search(r"(.*?)", _c, re.DOTALL) if rsumm: summary = rsumm.group(1).strip() else: tc = tool_calls[0]; tool_name, args = tc['tool_name'], tc['args'] # at least one because no_tool clean_args = {k: v for k, v in args.items() if not k.startswith('_')} summary = f"调用工具{tool_name}, args: {clean_args}" if tool_name == 'no_tool': summary = "直接回答了用户问题" next_prompt += "\n\n\n[SYSTEM] 必须在回复文本中包含!\n\n" summary = smart_format(summary.replace('\n', ''), max_str_len=80) self.history_info.append(f'[Agent] {summary}') _plan = self._in_plan_mode() if turn % 65 == 0 and (not _plan): next_prompt += f"\n\n[DANGER] 已连续执行第 {turn} 轮。必须总结情况进行ask_user,不允许继续重试。" elif turn % 7 == 0: next_prompt += f"\n\n[DANGER] 已连续执行第 {turn} 轮。禁止无效重试。若无有效进展,必须切换策略:1. 探测物理边界 2. 
请求用户协助。如有需要,可调用 update_working_checkpoint 保存关键上下文。" elif turn % 10 == 0: next_prompt += get_global_memory() if _plan and turn >= 10 and turn % 5 == 0: next_prompt = f"[Plan Hint] 正在计划模式。必须 file_read({_plan}) 确认当前步骤,回复开头引用:📌 当前步骤:...\n\n" + next_prompt if _plan and turn >= 90: next_prompt += f"\n\n[DANGER] Plan模式已运行 {turn} 轮,已达上限。必须 ask_user 汇报进度并确认是否继续。" injkeyinfo = consume_file(self.parent.task_dir, '_keyinfo') injprompt = consume_file(self.parent.task_dir, '_intervene') if injkeyinfo: self.working['key_info'] = self.working.get('key_info', '') + f"\n[MASTER] {injkeyinfo}" if injprompt: next_prompt += f"\n\n[MASTER] {injprompt}\n" for hook in getattr(self.parent, '_turn_end_hooks', {}).values(): hook(locals()) # current readonly return next_prompt def get_global_memory(): prompt = "\n" try: suffix = '_en' if os.environ.get('GA_LANG', '') == 'en' else '' with open(os.path.join(script_dir, 'memory/global_mem_insight.txt'), 'r', encoding='utf-8', errors='replace') as f: insight = f.read() with open(os.path.join(script_dir, f'assets/insight_fixed_structure{suffix}.txt'), 'r', encoding='utf-8') as f: structure = f.read() prompt += f'cwd = {os.path.join(script_dir, "temp")} (./)\n' prompt += f"\n[Memory] (../memory)\n" prompt += structure + '\n../memory/global_mem_insight.txt:\n' prompt += insight + "\n" except FileNotFoundError: pass return prompt ================================================ FILE: hub.pyw ================================================ # launcher.pyw - GenericAgent 服务启动器 # 纯 tkinter + 标准库,零第三方依赖,跨平台 import os, sys, socket, subprocess, threading import tkinter as tk from tkinter import ttk from collections import deque LOCK_PORT = 19735 BASE_DIR = os.path.dirname(os.path.abspath(__file__)) def acquire_singleton(): s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) try: s.bind(('127.0.0.1', LOCK_PORT)); s.listen(1);return s except OSError: return None def discover_services(): services = [] EXCLUDES = {'goal_mode.py', 'chatapp_common.py', 
                'tuiapp.py'}
    reflect_dir = os.path.join(BASE_DIR, 'reflect')
    if os.path.isdir(reflect_dir):
        for f in sorted(os.listdir(reflect_dir)):
            if f.endswith('.py') and not f.startswith('_') and f not in EXCLUDES:
                services.append({
                    'name': 'reflect/' + f,
                    'cmd': [sys.executable, 'agentmain.py', '--reflect', 'reflect/' + f],
                })
    frontends_dir = os.path.join(BASE_DIR, 'frontends')
    if os.path.isdir(frontends_dir):
        for f in sorted(os.listdir(frontends_dir)):
            if 'app' in f and f.endswith('.py') and f not in EXCLUDES:
                if 'stapp' in f:
                    cmd = [sys.executable, '-m', 'streamlit', 'run', 'frontends/' + f, '--server.headless=true']
                else:
                    cmd = [sys.executable, 'frontends/' + f]
                services.append({'name': 'frontends/' + f, 'cmd': cmd})
    return services

class ServiceManager:
    def __init__(self):
        self.procs = {}
        self.buffers = {}

    def start(self, name, cmd):
        if name in self.procs and self.procs[name].poll() is None:
            return
        self.buffers[name] = deque(maxlen=500)
        env = os.environ.copy()
        env['PYTHONUNBUFFERED'] = '1'
        kw = dict(cwd=BASE_DIR, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                  text=True, bufsize=1, env=env)
        if sys.platform == 'win32':
            kw['creationflags'] = subprocess.CREATE_NO_WINDOW
        proc = subprocess.Popen(cmd, **kw)
        self.procs[name] = proc
        threading.Thread(target=self._reader, args=(name, proc), daemon=True).start()

    def _reader(self, name, proc):
        try:
            for line in proc.stdout:
                self.buffers[name].append(line)
        except Exception:
            pass

    def stop(self, name):
        proc = self.procs.get(name)
        if proc and proc.poll() is None:
            proc.terminate()
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()

    def is_running(self, name):
        proc = self.procs.get(name)
        return proc is not None and proc.poll() is None

    def stop_all(self):
        for name in list(self.procs):
            self.stop(name)

    def get_output(self, name):
        buf = self.buffers.get(name)
        return list(buf) if buf else []

class LauncherApp:
    def __init__(self, root):
        self.root = root
        self.root.title('GenericAgent Launcher')
        self.root.geometry('720x740')
        self.root.protocol('WM_DELETE_WINDOW', self.on_close)
        self.mgr = ServiceManager()
        self.services = discover_services()
        self.check_vars = {}
        self.selected = None
        self._build_ui()
        self._poll()

    def _build_ui(self):
        # Header row: label on the left, Rescan button on the right
        header = ttk.Frame(self.root)
        header.pack(fill='x', padx=8, pady=(8, 0))
        ttk.Label(header, text='Services', font=('', 10, 'bold')).pack(side='left')
        ttk.Button(header, text='\u27f3 Rescan', width=10, command=self._rescan).pack(side='right')
        svc_frame = ttk.LabelFrame(self.root, padding=5)
        svc_frame.pack(fill='x', padx=8, pady=(2, 4))
        self.svc_container = ttk.Frame(svc_frame)
        self.svc_container.pack(fill='x')
        self.status_labels = {}
        self.row_frames = {}
        self.name_labels = {}
        self._build_service_rows()
        self.output_frame = ttk.LabelFrame(self.root, text='Output', padding=5)
        self.output_frame.pack(fill='both', expand=True, padx=8, pady=(4, 8))
        self.output_text = tk.Text(
            self.output_frame, wrap='word', state='disabled',
            bg='#1e1e1e', fg='#d4d4d4', font=('Consolas', 9),
            insertbackground='white')
        sb = ttk.Scrollbar(self.output_frame, command=self.output_text.yview)
        self.output_text.configure(yscrollcommand=sb.set)
        sb.pack(side='right', fill='y')
        self.output_text.pack(fill='both', expand=True)

    def _build_service_rows(self):
        for svc in self.services:
            name = svc['name']
            row = tk.Frame(self.svc_container, cursor='hand2', padx=4, pady=2)
            row.pack(fill='x', pady=1)
            self.row_frames[name] = row
            running = self.mgr.is_running(name)
            var = self.check_vars.get(name, tk.BooleanVar(value=running))
            if running:
                var.set(True)
            self.check_vars[name] = var
            cb = ttk.Checkbutton(
                row, variable=var,
                command=lambda n=name, v=var, s=svc: self._toggle(n, v, s))
            cb.pack(side='left')
            name_lbl = tk.Label(row, text=name, anchor='w', cursor='hand2', bg=row.cget('bg'))
            name_lbl.pack(side='left', fill='x', expand=True)
            self.name_labels[name] = name_lbl
            st = 'running' if running else 'stopped'
            fg = 'green' if running else 'gray'
            lbl = ttk.Label(row, text=st, foreground=fg,
                            width=10)
            lbl.pack(side='right')
            self.status_labels[name] = lbl
            name_lbl.bind('', lambda e, n=name: self._select(n))
            row.bind('', lambda e, n=name: self._select(n))

    def _rescan(self):
        # Remember which services are currently running
        running_names = {n for n in self.mgr.procs if self.mgr.is_running(n)}
        # Remove the old rows
        for w in self.svc_container.winfo_children():
            w.destroy()
        self.status_labels.clear()
        self.row_frames.clear()
        self.name_labels.clear()
        # Drop check_vars for services that are no longer running
        old_vars = {k: v for k, v in self.check_vars.items() if k in running_names}
        self.check_vars.clear()
        self.check_vars.update(old_vars)
        # Rescan
        self.services = discover_services()
        self._build_service_rows()
        # If the selected service is no longer in the list, clear the selection
        svc_names = {s['name'] for s in self.services}
        if self.selected and self.selected not in svc_names:
            self.selected = None
            self.output_frame.configure(text='Output')

    def _toggle(self, name, var, svc):
        if var.get():
            self.mgr.start(name, svc['cmd'])
            self._select(name)
        else:
            self.mgr.stop(name)

    def _select(self, name):
        self.selected = name
        # Highlight the selected row
        for n, row in self.row_frames.items():
            if n == name:
                row.configure(bg='#cce5ff')
                self.name_labels[n].configure(bg='#cce5ff')
            else:
                row.configure(bg='SystemButtonFace')
                self.name_labels[n].configure(bg='SystemButtonFace')
        self.output_frame.configure(text=f'Output - {name}')
        self.root.after(50, self._refresh_output)

    def _refresh_output(self):
        if not self.selected:
            return
        lines = self.mgr.get_output(self.selected)
        new_text = ''.join(lines[-200:])
        # Skip refreshes with no change to avoid needless flicker and scroll jumps
        current = self.output_text.get('1.0', 'end-1c')
        if new_text.rstrip('\n') == current.rstrip('\n'):
            return
        # Record the scroll state
        _top, bot = self.output_text.yview()
        at_bottom = bot >= 0.99
        # Record the line offset from the bottom so the position can be restored exactly when not at the bottom
        if not at_bottom:
            old_total = int(self.output_text.index('end-1c').split('.')[0])
            first_vis = int(float(self.output_text.index('@0,0')))
            offset_from_end = old_total - first_vis
        self.output_text.configure(state='normal')
        self.output_text.delete('1.0', 'end')
        self.output_text.insert('end', new_text)
        self.output_text.configure(state='disabled')
        if at_bottom:
            self.output_text.see('end')
        else:
            # Restore from the offset to the bottom, which is unaffected by changes in the total line count
            new_total = int(self.output_text.index('end-1c').split('.')[0])
            target = max(1, new_total - offset_from_end)
            self.output_text.yview_moveto(0)  # reset to the top first to avoid leftover state
            self.output_text.see(f'{target}.0')

    def _poll(self):
        for svc in self.services:
            name = svc['name']
            running = self.mgr.is_running(name)
            lbl = self.status_labels[name]
            if running:
                lbl.configure(text='running', foreground='green')
            else:
                lbl.configure(text='stopped', foreground='gray')
                if self.check_vars[name].get():
                    self.check_vars[name].set(False)
        self._refresh_output()
        self.root.after(1000, self._poll)

    def on_close(self):
        self.mgr.stop_all()
        self.root.destroy()

if __name__ == '__main__':
    lock = acquire_singleton()
    if lock is None:
        try:
            import tkinter.messagebox as mb
            r = tk.Tk()
            r.withdraw()
            mb.showinfo('Launcher', 'Already running.')
            r.destroy()
        except Exception:
            pass
        sys.exit(0)
    root = tk.Tk()
    app = LauncherApp(root)
    root.mainloop()
    lock.close()

================================================
FILE: launch.pyw
================================================
import webview, threading, subprocess, sys, time, os, ctypes, atexit, socket, random

WINDOW_WIDTH, WINDOW_HEIGHT, RIGHT_PADDING, TOP_PADDING = 600, 900, 0, 100
script_dir = os.path.dirname(os.path.abspath(__file__))
frontends_dir = os.path.join(script_dir, "frontends")

def find_free_port(lo=18501, hi=18599):
    ports = list(range(lo, hi+1)); random.shuffle(ports)
    for p in ports:
        try: s = socket.socket(); s.bind(('127.0.0.1', p)); s.close(); return p
        except OSError: continue
    raise RuntimeError(f'No free port in {lo}-{hi}')

def get_screen_width():
    try: return ctypes.windll.user32.GetSystemMetrics(0)
    except: return 1920

def start_streamlit(port):
    global proc
    cmd = [sys.executable, "-m", "streamlit", "run", os.path.join(frontends_dir, "stapp.py"),
           "--server.port", str(port), "--server.address", "localhost", "--server.headless", "true",
           "--client.toolbarMode", "viewer"]
    proc = subprocess.Popen(cmd)
    atexit.register(proc.kill)

def inject(text):
    window.evaluate_js(f"""
        const textarea = document.querySelector('textarea[data-testid="stChatInputTextArea"]');
        if (textarea) {{
            // 1. Set the value through the native setter (bypasses React)
            const nativeTextAreaValueSetter = Object.getOwnPropertyDescriptor(window.HTMLTextAreaElement.prototype, 'value').set;
            nativeTextAreaValueSetter.call(textarea, {repr(text)});
            // 2. Fire the input event that React listens for
            textarea.dispatchEvent(new Event('input', {{ bubbles: true }}));
            // 3. Fire a change event (some components need it)
            textarea.dispatchEvent(new Event('change', {{ bubbles: true }}));
            // 4. Submit after a short delay
            setTimeout(() => {{
                const btn = document.querySelector('[data-testid="stChatInputSubmitButton"]');
                if (btn) {{btn.click();console.log('Submitted:', {repr(text)});}}
            }}, 200);
        }}""")

def get_last_reply_time():
    last = window.evaluate_js("""
        const el = document.getElementById('last-reply-time');
        el ? parseInt(el.textContent) : 0;
    """) or 0
    return last or int(time.time())

PASTE_HOOK_JS = """if (!window._pasteHooked) {
    window._pasteHooked = true;
    document.addEventListener('paste', e => {
        const items = e.clipboardData?.items;
        if (!items) return;
        let t = null;
        for (const item of items) {
            if (item.kind === 'file') {
                t = item.type.startsWith('image/') ?
                    'image in clipboard, ' : 'file in clipboard, ';
                break;
            }
        }
        if (!t) return;
        e.preventDefault(); e.stopImmediatePropagation();
        const el = document.querySelector('textarea[data-testid="stChatInputTextArea"]') || document.activeElement;
        if (el && (el.tagName === 'TEXTAREA' || el.tagName === 'INPUT')) {
            const s = Object.getOwnPropertyDescriptor(HTMLTextAreaElement.prototype, 'value')?.set
                   || Object.getOwnPropertyDescriptor(HTMLInputElement.prototype, 'value')?.set;
            s.call(el, el.value + t);
            el.dispatchEvent(new Event('input', { bubbles: true }));
        }
    }, true);
}"""

def idle_monitor():
    last_trigger_time = 0
    while True:
        time.sleep(5)
        try:
            window.evaluate_js(PASTE_HOOK_JS)
            now = time.time()
            if now - last_trigger_time < 120: continue
            last_reply = get_last_reply_time()
            if now - last_reply > 1800:
                print('[Idle Monitor] Detected idle state, injecting task...')
                inject("[AUTO]🤖 The user has been away for more than 30 minutes. As an autonomous agent, read the automation SOP and carry out the automated tasks.")
                last_trigger_time = now
        except Exception as e:
            print(f'[Idle Monitor] Error: {e}')

if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('port', nargs='?', default='0')
    parser.add_argument('--tg', action='store_true', help='Start the Telegram Bot')
    parser.add_argument('--qq', action='store_true', help='Start the QQ Bot')
    parser.add_argument('--feishu', '--fs', dest='feishu', action='store_true', help='Start the Feishu Bot')
    parser.add_argument('--wechat', '--wx', dest='wechat', action='store_true', help='Start the WeChat Bot')
    parser.add_argument('--wecom', action='store_true', help='Start the WeCom Bot')
    parser.add_argument('--dingtalk', '--dt', dest='dingtalk', action='store_true', help='Start the DingTalk Bot')
    parser.add_argument('--sched', action='store_true', help='Start the scheduled-task scheduler')
    parser.add_argument('--llm_no', type=int, default=0, help='LLM index')
    args = parser.parse_args()
    port = str(find_free_port()) if args.port == '0' else args.port
    print(f'[Launch] Using port {port}')
    threading.Thread(target=start_streamlit, args=(port,), daemon=True).start()

    if args.tg:
        tgproc = subprocess.Popen([sys.executable, os.path.join(frontends_dir, "tgapp.py")], creationflags=subprocess.CREATE_NO_WINDOW if os.name=='nt' else 0)
        atexit.register(tgproc.kill)
        print('[Launch] Telegram Bot started')
    else: print('[Launch] Telegram Bot not enabled (use --tg to start)')

    if args.qq:
        qqproc = subprocess.Popen([sys.executable, os.path.join(frontends_dir, "qqapp.py")], creationflags=subprocess.CREATE_NO_WINDOW if os.name=='nt' else 0)
        atexit.register(qqproc.kill)
        print('[Launch] QQ Bot started')
    else: print('[Launch] QQ Bot not enabled (use --qq to start)')

    if args.feishu:
        fsproc = subprocess.Popen([sys.executable, os.path.join(frontends_dir, "fsapp.py")], creationflags=subprocess.CREATE_NO_WINDOW if os.name=='nt' else 0)
        atexit.register(fsproc.kill)
        print('[Launch] Feishu Bot started')
    else: print('[Launch] Feishu Bot not enabled (use --feishu to start)')

    if args.wechat:
        wxproc = subprocess.Popen([sys.executable, os.path.join(frontends_dir, 'wechatapp.py')], creationflags=subprocess.CREATE_NO_WINDOW if os.name=='nt' else 0)
        atexit.register(wxproc.kill)
        print('[Launch] WeChat Bot started')
    else: print('[Launch] WeChat Bot not enabled (use --wechat to start)')

    if args.wecom:
        wcproc = subprocess.Popen([sys.executable, os.path.join(frontends_dir, "wecomapp.py")], creationflags=subprocess.CREATE_NO_WINDOW if os.name=='nt' else 0)
        atexit.register(wcproc.kill)
        print('[Launch] WeCom Bot started')
    else: print('[Launch] WeCom Bot not enabled (use --wecom to start)')

    if args.dingtalk:
        dtproc = subprocess.Popen([sys.executable, os.path.join(frontends_dir, "dingtalkapp.py")], creationflags=subprocess.CREATE_NO_WINDOW if os.name=='nt' else 0)
        atexit.register(dtproc.kill)
        print('[Launch] DingTalk Bot started')
    else: print('[Launch] DingTalk Bot not enabled (use --dingtalk to start)')

    if args.sched:
        scheduler_proc = subprocess.Popen([sys.executable, os.path.join(script_dir, "agentmain.py"), "--reflect", os.path.join(script_dir, "reflect", "scheduler.py"), "--llm_no", str(args.llm_no)], creationflags=subprocess.CREATE_NO_WINDOW if os.name=='nt' else 0)
        atexit.register(scheduler_proc.kill)
        print('[Launch] Task Scheduler started (duplicate prevented by scheduler port lock)')
    else: print('[Launch] Task Scheduler not enabled (--sched)')

    monitor_thread = threading.Thread(target=idle_monitor, daemon=True)
    monitor_thread.start()
    if os.name == 'nt':
        screen_width = get_screen_width()
        x_pos = screen_width - WINDOW_WIDTH - RIGHT_PADDING
    else: x_pos = 100
    time.sleep(2)
    window = webview.create_window(
        title='GenericAgent', url=f'http://localhost:{port}',
        width=WINDOW_WIDTH, height=WINDOW_HEIGHT, x=x_pos, y=TOP_PADDING,
        resizable=True, text_select=True)
    webview.start()

================================================
FILE: llmcore.py
================================================
import os, json, re, time, requests, sys, threading, urllib3, base64, importlib, uuid
from datetime import datetime
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
_RESP_CACHE_KEY = str(uuid.uuid4())

def _load_mykeys():
    global _mykey_path
    try:
        import mykey; importlib.reload(mykey); _mykey_path = mykey.__file__
        return {k: v for k, v in vars(mykey).items() if not k.startswith('_')}
    except ImportError: pass
    _mykey_path = p = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'mykey.json')
    if not os.path.exists(p): raise Exception('[ERROR] mykey.py or mykey.json not found, please create one from mykey_template.')
    with open(p, encoding='utf-8') as f: return json.load(f)

_mykey_path = _mykey_mtime = None
def reload_mykeys():
    global _mykey_mtime
    mt = os.stat(_mykey_path).st_mtime_ns if _mykey_path else -1
    if mt == _mykey_mtime: return globals().get('mykeys', {}), False
    mk = _load_mykeys(); _mykey_mtime = os.stat(_mykey_path).st_mtime_ns
    print(f'[Info] Load mykeys from {_mykey_path}')
    globals().update(mykeys=mk)
    if mk.get('langfuse_config'):
        try: from plugins import langfuse_tracing
        except Exception: pass
    return mk, True

def
__getattr__(name): # once guard in PEP 562 if name == 'mykeys': return reload_mykeys()[0] raise AttributeError(f"module 'llmcore' has no attribute {name}") def compress_history_tags(messages, keep_recent=10, max_len=800, force=False): """Compress // tags in older messages to save tokens.""" compress_history_tags._cd = getattr(compress_history_tags, '_cd', 0) + 1 if force: compress_history_tags._cd = 0 if compress_history_tags._cd % 5 != 0: return messages _before = sum(len(json.dumps(m, ensure_ascii=False)) for m in messages) _pats = {tag: re.compile(rf'(<{tag}>)([\s\S]*?)()') for tag in ('thinking', 'think', 'tool_use', 'tool_result')} _hist_pat = re.compile(r'<(history|key_info|earlier_context)>[\s\S]*?') def _trunc_str(s): return s[:max_len//2] + '\n...[Truncated]...\n' + s[-max_len//2:] if isinstance(s, str) and len(s) > max_len else s def _trunc(text): text = _hist_pat.sub(lambda m: f'<{m.group(1)}>[...]', text) for pat in _pats.values(): text = pat.sub(lambda m: m.group(1) + _trunc_str(m.group(2)) + m.group(3), text) return text for i, msg in enumerate(messages): if i >= len(messages) - keep_recent: break c = msg['content'] if isinstance(c, str): msg['content'] = _trunc(c) elif isinstance(c, list): for b in c: if not isinstance(b, dict): continue t = b.get('type') if t == 'text' and isinstance(b.get('text'), str): b['text'] = _trunc(b['text']) elif t == 'tool_result': tc = b.get('content') if isinstance(tc, str): b['content'] = _trunc_str(tc) elif isinstance(tc, list): for sub in tc: if isinstance(sub, dict) and sub.get('type') == 'text': sub['text'] = _trunc_str(sub.get('text')) elif t == 'tool_use' and isinstance(b.get('input'), dict): for k, v in b['input'].items(): b['input'][k] = _trunc_str(v) print(f"[Cut] {_before} -> {sum(len(json.dumps(m, ensure_ascii=False)) for m in messages)}") return messages def _sanitize_leading_user_msg(msg): """把 user 消息里的 tool_result 块改写成纯文本,避免孤立引用。 history 统一使用 Claude content-block 格式:content 是 list of blocks。""" msg = 
dict(msg)  # shallow copy of the outer dict
    content = msg.get('content')
    if not isinstance(content, list): return msg
    texts = []
    for block in content:
        if not isinstance(block, dict): continue
        if block.get('type') == 'tool_result':
            c = block.get('content', '')
            if isinstance(c, list):  # content itself may also be a list[{type:text,text:...}]
                texts.extend(b.get('text', '') for b in c if isinstance(b, dict))
            else: texts.append(str(c))
        elif block.get('type') == 'text': texts.append(block.get('text', ''))
    msg['content'] = [{"type": "text", "text": '\n'.join(t for t in texts if t)}]
    return msg

_oldprint = print
def safeprint(*argv):
    try: _oldprint(*argv)
    except OSError: pass
print = safeprint

def trim_messages_history(history, context_win):
    compress_history_tags(history)
    cost = sum(len(json.dumps(m, ensure_ascii=False)) for m in history)
    print(f'[Debug] Current context: {cost} chars, {len(history)} messages.')
    if cost > context_win * 3:
        compress_history_tags(history, keep_recent=4, force=True)  # trimming breaks the cache anyway, so compress harder
        target = context_win * 3 * 0.6
        while len(history) > 5 and cost > target:
            history.pop(0)
            while history and history[0].get('role') != 'user': history.pop(0)
            if history and history[0].get('role') == 'user': history[0] = _sanitize_leading_user_msg(history[0])
            cost = sum(len(json.dumps(m, ensure_ascii=False)) for m in history)
        print(f'[Debug] Trimmed context, current: {cost} chars, {len(history)} messages.')

def auto_make_url(base, path):
    b, p = base.rstrip('/'), path.strip('/')
    if b.endswith('$'): return b[:-1].rstrip('/')
    if b.endswith(p): return b
    return f"{b}/{p}" if re.search(r'/v\d+(/|$)', b) else f"{b}/v1/{p}"

def _parse_claude_json(data):
    content_blocks = data.get("content", [])
    _record_usage(data.get("usage", {}), "messages")
    for b in content_blocks:
        if b.get("type") == "text": yield b.get("text", "")
        elif b.get("type") == "thinking": yield ""
    return content_blocks

def _parse_claude_sse(resp_lines):
    """Parse Anthropic SSE stream.
Yields text chunks, returns list[content_block].""" content_blocks = []; current_block = None; tool_json_buf = "" stop_reason = None; got_message_stop = False; warn = None for line in resp_lines: if not line: continue line = line.decode('utf-8') if isinstance(line, bytes) else line if not line.startswith("data:"): continue data_str = line[5:].lstrip() if data_str == "[DONE]": break try: evt = json.loads(data_str) except Exception as e: print(f"[SSE] JSON parse error: {e}, line: {data_str[:200]}") continue evt_type = evt.get("type", "") if evt_type == "message_start": usage = evt.get("message", {}).get("usage", {}) _record_usage(usage, "messages") elif evt_type == "content_block_start": block = evt.get("content_block", {}) if block.get("type") == "text": current_block = {"type": "text", "text": ""} elif block.get("type") == "thinking": current_block = {"type": "thinking", "thinking": "", "signature": ""} elif block.get("type") == "tool_use": current_block = {"type": "tool_use", "id": block.get("id", ""), "name": block.get("name", ""), "input": {}} tool_json_buf = "" elif evt_type == "content_block_delta": delta = evt.get("delta", {}) if delta.get("type") == "text_delta": text = delta.get("text", "") if current_block and current_block.get("type") == "text": current_block["text"] += text if text: yield text elif delta.get("type") == "thinking_delta": if current_block and current_block.get("type") == "thinking": current_block["thinking"] += delta.get("thinking", "") elif delta.get("type") == "signature_delta": if current_block and current_block.get("type") == "thinking": current_block["signature"] = current_block.get("signature", "") + delta.get("signature", "") elif delta.get("type") == "input_json_delta": tool_json_buf += delta.get("partial_json", "") elif evt_type == "content_block_stop": if current_block: if current_block["type"] == "tool_use": try: current_block["input"] = json.loads(tool_json_buf) if tool_json_buf else {} except: current_block["input"] = {"_raw": 
tool_json_buf} content_blocks.append(current_block) current_block = None elif evt_type == "message_delta": delta = evt.get("delta", {}) stop_reason = delta.get("stop_reason", stop_reason) out_usage = evt.get("usage", {}) out_tokens = out_usage.get("output_tokens", 0) if out_tokens: print(f"[Output] tokens={out_tokens} stop_reason={stop_reason}") elif evt_type == "message_stop": got_message_stop = True elif evt_type == "error": err = evt.get("error", {}) emsg = err.get("message", str(err)) if isinstance(err, dict) else str(err) warn = f"\n\n!!!Error: SSE {emsg}"; break if not warn: if not got_message_stop and not stop_reason: warn = "\n\n[!!! 流异常中断,未收到完整响应 !!!]" elif stop_reason == "max_tokens": warn = "\n\n[!!! Response truncated: max_tokens !!!]" if current_block: if current_block["type"] == "tool_use": try: current_block["input"] = json.loads(tool_json_buf) if tool_json_buf else {} except: current_block["input"] = {"_raw": tool_json_buf} content_blocks.append(current_block); current_block = None if warn: print(f"[WARN] {warn.strip()}") content_blocks.append({"type": "text", "text": warn}); yield warn return content_blocks def _try_parse_tool_args(raw): """Parse tool args string; split concatenated JSON objects like {..}{..} if needed. Returns list of parsed dicts.""" if not raw: return [{}] try: return [json.loads(raw)] except: pass parts = re.split(r'(?<=\})(?=\{)', raw) if len(parts) > 1: parsed = [] for p in parts: try: parsed.append(json.loads(p)) except: return [{"_raw": raw}] return parsed return [{"_raw": raw}] def _parse_openai_sse(resp_lines, api_mode="chat_completions"): """Parse OpenAI SSE stream (chat_completions or responses API). Yields text chunks, returns list[content_block]. 
content_block: {type:'text', text:str} | {type:'tool_use', id:str, name:str, input:dict} """ content_text = "" if api_mode == "responses": seen_delta = False; fc_buf = {}; current_fc_idx = None for line in resp_lines: if not line: continue line = line.decode('utf-8', errors='replace') if isinstance(line, bytes) else line if not line.startswith("data:"): continue data_str = line[5:].lstrip() if data_str == "[DONE]": break try: evt = json.loads(data_str) except: continue etype = evt.get("type", "") if etype == "response.output_text.delta": delta = evt.get("delta", "") if delta: seen_delta = True; content_text += delta; yield delta elif etype == "response.output_text.done" and not seen_delta: text = evt.get("text", "") if text: content_text += text; yield text elif etype == "response.output_item.added": item = evt.get("item", {}) if item.get("type") == "function_call": idx = evt.get("output_index", 0) fc_buf[idx] = {"id": item.get("call_id", item.get("id", "")), "name": item.get("name", ""), "args": ""} current_fc_idx = idx elif etype == "response.function_call_arguments.delta": idx = evt.get("output_index", current_fc_idx or 0) if idx in fc_buf: fc_buf[idx]["args"] += evt.get("delta", "") elif etype == "response.function_call_arguments.done": idx = evt.get("output_index", current_fc_idx or 0) if idx in fc_buf: fc_buf[idx]["args"] = evt.get("arguments", fc_buf[idx]["args"]) elif etype == "error": err = evt.get("error", {}) emsg = err.get("message", str(err)) if isinstance(err, dict) else str(err) if emsg: content_text += f"!!!Error: {emsg}"; yield f"!!!Error: {emsg}" break elif etype == "response.completed": usage = evt.get("response", {}).get("usage", {}) _record_usage(usage, api_mode) break blocks = [] if content_text: blocks.append({"type": "text", "text": content_text}) for idx in sorted(fc_buf): fc = fc_buf[idx] inps = _try_parse_tool_args(fc["args"]) for i, inp in enumerate(inps): bid = fc["id"] or '' if len(inps) > 1: bid = f"{bid}_{i}" if bid else f"split_{i}" 
blocks.append({"type": "tool_use", "id": bid, "name": fc["name"], "input": inp}) return blocks else: tc_buf = {} # index -> {id, name, args} reasoning_text = "" for line in resp_lines: if not line: continue line = line.decode('utf-8', errors='replace') if isinstance(line, bytes) else line if not line.startswith("data:"): continue data_str = line[5:].lstrip() if data_str == "[DONE]": break try: evt = json.loads(data_str) except: continue ch = (evt.get("choices") or [{}])[0] delta = ch.get("delta") or {} if delta.get("reasoning_content"): reasoning_text += delta["reasoning_content"] if delta.get("content"): text = delta["content"]; content_text += text; yield text for tc in (delta.get("tool_calls") or []): idx = tc.get("index", 0) has_name = bool(tc.get("function", {}).get("name")) if idx not in tc_buf: if has_name or not tc_buf: tc_buf[idx] = {"id": tc.get("id") or '', "name": "", "args": ""} else: idx = max(tc_buf) if has_name: tc_buf[idx]["name"] = tc["function"]["name"] if tc.get("function", {}).get("arguments"): tc_buf[idx]["args"] += tc["function"]["arguments"] if tc.get("id") and not tc_buf[idx]["id"]: tc_buf[idx]["id"] = tc["id"] usage = evt.get("usage") if usage: _record_usage(usage, api_mode) blocks = [] if reasoning_text: blocks.append({"type": "thinking", "thinking": reasoning_text}) if content_text: blocks.append({"type": "text", "text": content_text}) for idx in sorted(tc_buf): tc = tc_buf[idx] inps = _try_parse_tool_args(tc["args"]) for i, inp in enumerate(inps): bid = tc["id"] or '' if len(inps) > 1: bid = f"{bid}_{i}" if bid else f"split_{i}" blocks.append({"type": "tool_use", "id": bid, "name": tc["name"], "input": inp}) return blocks def _record_usage(usage, api_mode): if not usage: return if api_mode == 'responses': cached = (usage.get("input_tokens_details") or {}).get("cached_tokens", 0) inp = usage.get("input_tokens", 0) print(f"[Cache] input={inp} cached={cached}") elif api_mode == 'chat_completions': cached = 
(usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0) inp = usage.get("prompt_tokens", 0) print(f"[Cache] input={inp} cached={cached}") elif api_mode == 'messages': ci, cr, inp = usage.get("cache_creation_input_tokens", 0), usage.get("cache_read_input_tokens", 0), usage.get("input_tokens", 0) print(f"[Cache] input={inp} creation={ci} read={cr}") def _parse_openai_json(data, api_mode="chat_completions"): blocks = [] if api_mode == "responses": _record_usage(data.get("usage") or {}, api_mode) for item in (data.get("output") or []): if item.get("type") == "message": for p in (item.get("content") or []): if p.get("type") in ("output_text", "text") and p.get("text"): blocks.append({"type": "text", "text": p["text"]}); yield p["text"] elif item.get("type") == "function_call": try: args = json.loads(item.get("arguments", "")) if item.get("arguments") else {} except: args = {"_raw": item.get("arguments", "")} blocks.append({"type": "tool_use", "id": item.get("call_id", item.get("id", "")), "name": item.get("name", ""), "input": args}) else: _record_usage(data.get("usage") or {}, api_mode) msg = (data.get("choices") or [{}])[0].get("message", {}) reasoning = msg.get("reasoning_content", "") if reasoning: blocks.append({"type": "thinking", "thinking": reasoning}) content = msg.get("content", "") if content: blocks.append({"type": "text", "text": content}); yield content for tc in (msg.get("tool_calls") or []): fn = tc.get("function", {}) try: args = json.loads(fn.get("arguments", "")) if fn.get("arguments") else {} except: args = {"_raw": fn.get("arguments", "")} blocks.append({"type": "tool_use", "id": tc.get("id", ""), "name": fn.get("name", ""), "input": args}) return blocks def _stamp_oai_cache_markers(messages, model): """Add cache_control to last 2 user messages for Anthropic models via OAI-compatible relay.""" ml = model.lower() if not any(k in ml for k in ('claude', 'anthropic')): return user_idxs = [i for i, m in enumerate(messages) if m.get('role') == 
'user'] for idx in user_idxs[-2:]: c = messages[idx].get('content') if isinstance(c, str): messages[idx] = {**messages[idx], 'content': [{'type': 'text', 'text': c, 'cache_control': {'type': 'ephemeral'}}]} elif isinstance(c, list) and c: c = list(c); c[-1] = dict(c[-1], cache_control={'type': 'ephemeral'}) messages[idx] = {**messages[idx], 'content': c} def _stream_with_retry(sess, url, headers, payload, parse_fn): _RETRYABLE = {408, 409, 425, 429, 500, 502, 503, 504, 529} def _delay(resp, attempt): try: ra = float((resp.headers or {}).get("retry-after")) except: ra = None return max(0.5, ra if ra is not None else min(30.0, 1.5 * (2 ** attempt))) for attempt in range(sess.max_retries + 1): streamed = False try: with requests.post(url, headers=headers, json=payload, stream=sess.stream, timeout=(sess.connect_timeout, sess.read_timeout), proxies=sess.proxies, verify=sess.verify) as r: if r.status_code >= 400: if r.status_code in _RETRYABLE and attempt < sess.max_retries: d = _delay(r, attempt) print(f"[LLM Retry] HTTP {r.status_code}, retry in {d:.1f}s ({attempt+1}/{sess.max_retries+1})") time.sleep(d); continue try: body = r.text.strip()[:500] except: body = "" err = f"!!!Error: HTTP {r.status_code}" + (f": {body}" if body else "") yield err; return [{"type": "text", "text": err}] gen = parse_fn(r) try: while True: streamed = True; yield next(gen) except StopIteration as e: return e.value or [] except (requests.Timeout, requests.ConnectionError) as e: err = f"!!!Error: {type(e).__name__}" if attempt < sess.max_retries: d = _delay(None, attempt) print(f"[LLM Retry] {type(e).__name__}, retry in {d:.1f}s ({attempt+1}/{sess.max_retries+1})") yield err; time.sleep(d); continue yield err; return [{"type": "text", "text": err}] except Exception as e: err = f"\n\n[!!! 
流异常中断 {type(e).__name__}: {e} !!!]" if streamed else f"!!!Error: {type(e).__name__}: {e}" yield err; return [{"type": "text", "text": err}] def _openai_stream(sess, messages): model, api_mode = sess.model, sess.api_mode ml = model.lower() temperature = sess.temperature if 'kimi' in ml or 'moonshot' in ml: temperature = 1 elif 'minimax' in ml: temperature = max(0.01, min(temperature, 1.0)) # MiniMax requires temp in (0, 1] headers = {"Authorization": f"Bearer {sess.api_key}", "Content-Type": "application/json", "Accept": "text/event-stream"} if api_mode == "responses": url = auto_make_url(sess.api_base, "responses") payload = {"model": model, "input": _to_responses_input(messages), "stream": sess.stream, "prompt_cache_key": _RESP_CACHE_KEY, "instructions": sess.system or "You are an Omnipotent Executor."} if sess.reasoning_effort: payload["reasoning"] = {"effort": sess.reasoning_effort} if sess.max_tokens: payload["max_output_tokens"] = sess.max_tokens else: url = auto_make_url(sess.api_base, "chat/completions") if sess.system: messages = [{"role": "system", "content": sess.system}] + messages _stamp_oai_cache_markers(messages, model) payload = {"model": model, "messages": messages, "stream": sess.stream} if sess.stream: payload["stream_options"] = {"include_usage": True} if temperature != 1: payload["temperature"] = temperature if sess.max_tokens: payload["max_completion_tokens" if ml.startswith(("gpt-5", "o1", "o2", "o3", "o4")) else "max_tokens"] = sess.max_tokens if sess.reasoning_effort: payload["reasoning_effort"] = sess.reasoning_effort tools = getattr(sess, 'tools', None) if tools: payload["tools"] = _prepare_oai_tools(tools, api_mode) if sess.service_tier: payload["service_tier"] = sess.service_tier parse_fn = (lambda r: _parse_openai_sse(r.iter_lines(), api_mode)) if sess.stream else (lambda r: _parse_openai_json(r.json(), api_mode)) return (yield from _stream_with_retry(sess, url, headers, payload, parse_fn)) def _prepare_oai_tools(tools, 
api_mode="chat_completions"): if api_mode == "responses": resp_tools = [] for t in tools: if t.get("type") == "function" and "function" in t: rt = {"type": "function"}; rt.update(t["function"]) resp_tools.append(rt) else: resp_tools.append(t) return resp_tools return tools def _to_responses_input(messages): result, pending = [], [] for msg in messages: role = str(msg.get("role", "user")).lower() if role == "tool": cid = msg.get("tool_call_id") or (pending.pop(0) if pending else f"call_{uuid.uuid4().hex[:8]}") result.append({"type": "function_call_output", "call_id": cid, "output": msg.get("content", "")}) continue if role not in ["user", "assistant", "system", "developer"]: role = "user" if role == "system": role = "developer" # Responses API uses 'developer' instead of 'system' content = msg.get("content", "") text_type = "output_text" if role == "assistant" else "input_text" parts = [] if isinstance(content, str): if content: parts.append({"type": text_type, "text": content}) elif isinstance(content, list): for part in content: if not isinstance(part, dict): continue ptype = part.get("type") if ptype == "text": text = part.get("text", "") if text: parts.append({"type": text_type, "text": text}) elif ptype == "image_url": url = (part.get("image_url") or {}).get("url", "") if url and role != "assistant": parts.append({"type": "input_image", "image_url": url}) if len(parts) == 0: parts = [{"type": text_type, "text": str(content) if not isinstance(content, list) else '[empty]'}] result.append({"role": role, "content": parts}) pending = [] for tc in (msg.get("tool_calls") or []): f = tc.get("function", {}) cid = tc.get("id") or f"call_{uuid.uuid4().hex[:8]}" pending.append(cid) result.append({"type": "function_call", "call_id": cid, "name": f.get("name", ""), "arguments": f.get("arguments", "")}) return result def _msgs_claude2oai(messages): result = [] for msg in messages: role = msg.get("role", "user") content = msg.get("content", "") blocks = content if 
isinstance(content, list) else [{"type": "text", "text": str(content)}] if role == "assistant": text_parts, tool_calls, reasoning = [], [], "" for b in blocks: if not isinstance(b, dict): continue if b.get("type") == "thinking" and b.get("thinking"): reasoning = b["thinking"] elif b.get("type") == "text" and b.get("text"): text_parts.append({"type": "text", "text": b.get("text", "")}) elif b.get("type") == "tool_use": tool_calls.append({ "id": b.get("id") or '', "type": "function", "function": {"name": b.get("name", ""), "arguments": json.dumps(b.get("input", {}), ensure_ascii=False)} }) m = {"role": "assistant"} if reasoning: m["reasoning_content"] = reasoning if text_parts: m["content"] = text_parts elif not tool_calls: m["content"] = "." if tool_calls: m["tool_calls"] = tool_calls result.append(m) elif role == "user": text_parts = [] for b in blocks: if not isinstance(b, dict): continue if b.get("type") == "tool_result": if text_parts: result.append({"role": "user", "content": text_parts}) text_parts = [] tr = b.get("content", "") if isinstance(tr, list): tr = "\n".join(x.get("text", "") for x in tr if isinstance(x, dict) and x.get("type") == "text") result.append({"role": "tool", "tool_call_id": b.get("tool_use_id") or '', "content": tr if isinstance(tr, str) else str(tr)}) elif b.get("type") == "image": src = b.get("source") or {} if src.get("type") == "base64" and src.get("data"): text_parts.append({"type": "image_url", "image_url": {"url": f"data:{src.get('media_type', 'image/png')};base64,{src.get('data', '')}"}}) elif b.get("type") == "image_url": text_parts.append(b) elif b.get("type") == "text" and b.get("text"): text_parts.append({"type": "text", "text": b.get("text", "")}) if text_parts: result.append({"role": "user", "content": text_parts}) else: result.append(msg) return result class BaseSession: def __init__(self, cfg): self.api_key = cfg['apikey'] self.api_base = cfg['apibase'].rstrip('/') self.model = cfg.get('model', '') self.context_win = 
cfg.get('context_win', 28000) self.history = [] self.lock = threading.Lock() self.system = "" self.name = cfg.get('name', self.model) proxy = cfg.get('proxy') self.proxies = {"http": proxy, "https": proxy} if proxy else None self.max_retries = max(0, int(cfg.get('max_retries', 4))) self.verify = cfg.get('verify', True) self.stream = cfg.get('stream', True) default_ct, default_rt = (5, 30) if self.stream else (10, 240) self.connect_timeout = max(1, int(cfg.get('timeout', default_ct))) self.read_timeout = max(5, int(cfg.get('read_timeout', default_rt))) def _enum(key, valid): v = cfg.get(key); v = None if v is None else str(v).strip().lower() return v if not v or v in valid else print(f"[WARN] Invalid {key} {v!r}, ignored.") self.reasoning_effort = _enum('reasoning_effort', {'none', 'minimal', 'low', 'medium', 'high', 'xhigh'}) self.service_tier = _enum('service_tier', {'auto', 'default', 'priority', 'flex'}) self.thinking_type = _enum('thinking_type', {'adaptive', 'enabled', 'disabled'}) self.thinking_budget_tokens = cfg.get('thinking_budget_tokens') mode = str(cfg.get('api_mode', 'chat_completions')).strip().lower().replace('-', '_') self.api_mode = 'responses' if mode in ('responses', 'response') else 'chat_completions' self.temperature = cfg.get('temperature', 1) self.max_tokens = cfg.get('max_tokens') def _apply_claude_thinking(self, payload): if self.thinking_type: thinking = {"type": self.thinking_type} if self.thinking_type == 'enabled': if self.thinking_budget_tokens is None: print("[WARN] thinking_type='enabled' requires thinking_budget_tokens, ignored.") else: thinking["budget_tokens"] = self.thinking_budget_tokens; payload["thinking"] = thinking else: payload["thinking"] = thinking if self.reasoning_effort: effort = {'low': 'low', 'medium': 'medium', 'high': 'high', 'xhigh': 'max'}.get(self.reasoning_effort) if effort: payload["output_config"] = {"effort": effort} else: print(f"[WARN] reasoning_effort {self.reasoning_effort!r} is unsupported for Claude 
output_config.effort, ignored.") def ask(self, prompt): def _ask_gen(): with self.lock: self.history.append({"role": "user", "content": [{"type": "text", "text": prompt}]}) trim_messages_history(self.history, self.context_win) messages = self.make_messages(self.history) content_blocks = None; content = '' gen = self.raw_ask(messages) try: while True: chunk = next(gen); content += chunk; yield chunk except StopIteration as e: content_blocks = e.value or [] if len(content_blocks) > 1: print(f"[DEBUG BaseSession.ask] content_blocks: {content_blocks}") for block in (content_blocks or []): if block.get('type', '') == 'tool_use': tu = {'name': block.get('name', ''), 'arguments': block.get('input', {})} yield f'{json.dumps(tu, ensure_ascii=False)}' if content.strip() and not content.startswith("!!!Error:"): self.history.append({"role": "assistant", "content": [{"type": "text", "text": content}]}) return _ask_gen() if self.stream else ''.join(list(_ask_gen())) def _keep_claude_block(b): return not isinstance(b, dict) or b.get("type") != "thinking" or b.get("signature") def _drop_unsigned_thinking(messages): for m in messages: c = m.get("content") if isinstance(c, list): m["content"] = [b for b in c if _keep_claude_block(b)] return messages def _ensure_thinking_blocks(messages, model): """deepseek needs thinking in history!""" if 'deepseek' not in model.lower(): return messages for m in messages: if m.get("role") != "assistant": continue c = m.get("content") if not isinstance(c, list): continue has_thinking = any(isinstance(b, dict) and b.get("type") == "thinking" for b in c) if not has_thinking: m["content"] = [{"type": "thinking", "thinking": "...", "signature": "placeholder"}, *c] return messages class ClaudeSession(BaseSession): def raw_ask(self, messages): messages = _fix_messages(messages) if self.max_tokens is None: self.max_tokens = 8192 headers = {"x-api-key": self.api_key, "Content-Type": "application/json", "anthropic-version": "2023-06-01", "anthropic-beta": 
"prompt-caching-2024-07-31"} payload = {"model": self.model, "messages": messages, "max_tokens": self.max_tokens, "stream": self.stream} if self.temperature != 1: payload["temperature"] = self.temperature self._apply_claude_thinking(payload) if self.system: payload["system"] = [{"type": "text", "text": self.system, "cache_control": {"type": "persistent"}}] url = auto_make_url(self.api_base, "messages") parse_fn = (lambda r: _parse_claude_sse(r.iter_lines())) if self.stream else (lambda r: _parse_claude_json(r.json())) return (yield from _stream_with_retry(self, url, headers, payload, parse_fn)) def make_messages(self, raw_list): msgs = _drop_unsigned_thinking([{"role": m['role'], "content": list(m['content'])} for m in raw_list]) user_idxs = [i for i, m in enumerate(msgs) if m['role'] == 'user'] for idx in user_idxs[-2:]: msgs[idx]["content"][-1] = dict(msgs[idx]["content"][-1], cache_control={"type": "ephemeral"}) return msgs class LLMSession(BaseSession): def raw_ask(self, messages): return (yield from _openai_stream(self, messages)) def make_messages(self, raw_list): return _msgs_claude2oai(_fix_messages(raw_list)) def _fix_messages(messages): """修复 messages 符合 Claude API:交替、tool_use/tool_result 配对""" if not messages: return messages _wrap = lambda c: c if isinstance(c, list) else [{"type": "text", "text": str(c)}] fixed = [] for m in messages: if fixed and m['role'] == fixed[-1]['role']: fixed[-1] = {**fixed[-1], 'content': _wrap(fixed[-1]['content']) + [{"type": "text", "text": "\n"}] + _wrap(m['content'])}; continue if fixed and fixed[-1]['role'] == 'assistant' and m['role'] == 'user': uses = [b.get('id') for b in fixed[-1].get('content', []) if isinstance(b, dict) and b.get('type') == 'tool_use' and b.get('id')] has = {b.get('tool_use_id') for b in _wrap(m['content']) if isinstance(b, dict) and b.get('type') == 'tool_result'} miss = [uid for uid in uses if uid not in has] if miss: m = {**m, 'content': [{"type": "tool_result", "tool_use_id": uid, "content": 
"(error)"} for uid in miss] + _wrap(m['content'])} orphan = has - set(uses) if orphan: m = {**m, 'content': [{"type":"text","text":str(b.get('content',''))} if isinstance(b,dict) and b.get('type')=='tool_result' and b.get('tool_use_id') in orphan else b for b in _wrap(m['content'])]} fixed.append(m) while fixed and fixed[0]['role'] != 'user': fixed.pop(0) return fixed class NativeClaudeSession(BaseSession): def __init__(self, cfg): super().__init__(cfg) self.fake_cc_system_prompt = cfg.get("fake_cc_system_prompt", False) self.user_agent = cfg.get("user_agent", "claude-cli/2.1.113 (external, cli)") self._session_id = str(uuid.uuid4()) self._account_uuid = str(uuid.uuid4()) self._device_id = uuid.uuid4().hex + uuid.uuid4().hex[:32] self.tools = None def raw_ask(self, messages): messages = _ensure_thinking_blocks(_drop_unsigned_thinking(_fix_messages(messages)), self.model) if self.max_tokens is None: self.max_tokens = 8192 model = self.model beta_parts = ["claude-code-20250219", "interleaved-thinking-2025-05-14", "redact-thinking-2026-02-12", "prompt-caching-scope-2026-01-05"] if "[1m]" in model.lower(): beta_parts.insert(1, "context-1m-2025-08-07"); model = model.replace("[1m]", "").replace("[1M]", "") headers = {"Content-Type": "application/json", "anthropic-version": "2023-06-01", "anthropic-beta": ",".join(beta_parts), "anthropic-dangerous-direct-browser-access": "true", "user-agent": self.user_agent, "x-app": "cli"} if self.api_key.startswith("sk-ant-"): headers["x-api-key"] = self.api_key else: headers["authorization"] = f"Bearer {self.api_key}" payload = {"model": model, "messages": messages, "max_tokens": self.max_tokens, "stream": self.stream} if self.temperature != 1: payload["temperature"] = self.temperature self._apply_claude_thinking(payload) payload["metadata"] = {"user_id": json.dumps({"device_id": self._device_id, "account_uuid": self._account_uuid, "session_id": self._session_id}, separators=(',', ':'))} if self.tools: claude_tools = 
openai_tools_to_claude(self.tools) tools = [dict(t) for t in claude_tools]; tools[-1]["cache_control"] = {"type": "ephemeral"} payload["tools"] = tools else: print("[ERROR] No tools provided for this session.") payload['system'] = [{"type": "text", "text": "You are Claude Code, Anthropic's official CLI for Claude.", "cache_control": {"type": "ephemeral"}}] if self.system: if self.fake_cc_system_prompt: messages[0]["content"].insert(0, {"type": "text", "text": self.system}) else: payload["system"] = [{"type": "text", "text": self.system}] user_idxs = [i for i, m in enumerate(messages) if m['role'] == 'user'] for idx in user_idxs[-2:]: messages[idx] = {**messages[idx], "content": list(messages[idx]["content"])} messages[idx]["content"][-1] = dict(messages[idx]["content"][-1], cache_control={"type": "ephemeral"}) url = auto_make_url(self.api_base, "messages") + '?beta=true' parse_fn = (lambda r: _parse_claude_sse(r.iter_lines())) if self.stream else (lambda r: _parse_claude_json(r.json())) return (yield from _stream_with_retry(self, url, headers, payload, parse_fn)) def ask(self, msg): assert type(msg) is dict with self.lock: self.history.append(msg) trim_messages_history(self.history, self.context_win) messages = [{"role": m["role"], "content": list(m["content"])} for m in self.history] content_blocks = None gen = self.raw_ask(messages) try: while True: yield next(gen) except StopIteration as e: content_blocks = e.value or [] if content_blocks and (_injected := _ensure_text_block(content_blocks)): yield _injected if content_blocks and not (len(content_blocks) == 1 and content_blocks[0].get("text", "").startswith("!!!Error:")): self.history.append({"role": "assistant", "content": content_blocks}) text_parts = [b["text"] for b in content_blocks if b.get("type") == "text"] content = "\n".join(text_parts).strip() tool_calls = [MockToolCall(b["name"], b.get("input", {}), id=b.get("id", "")) for b in content_blocks if b.get("type") == "tool_use"] if not tool_calls: 
tool_calls, content = _parse_text_tool_calls(content) thinking_parts = [b["thinking"] for b in content_blocks if b.get("type") == "thinking"] thinking = "\n".join(thinking_parts).strip() if not thinking: think_pattern = r"(.*?)" think_match = re.search(think_pattern, content, re.DOTALL) if think_match: thinking = think_match.group(1).strip() content = re.sub(think_pattern, "", content, flags=re.DOTALL) return MockResponse(thinking, content, tool_calls, str(content_blocks)) class NativeOAISession(NativeClaudeSession): def raw_ask(self, messages): messages = _fix_messages(messages) messages = _ensure_thinking_blocks(messages, self.model) return (yield from _openai_stream(self, _msgs_claude2oai(messages))) def openai_tools_to_claude(tools): """[{type:'function', function:{name,description,parameters}}] → [{name,description,input_schema}].""" result = [] for t in tools: if 'input_schema' in t: result.append(t); continue # 已是claude格式 fn = t.get('function', t) result.append({'name': fn['name'], 'description': fn.get('description', ''), 'input_schema': fn.get('parameters', {'type': 'object', 'properties': {}})}) return result class MockFunction: def __init__(self, name, arguments): self.name, self.arguments = name, arguments class MockToolCall: def __init__(self, name, args, id=''): arg_str = json.dumps(args, ensure_ascii=False) if isinstance(args, (dict, list)) else (args or '{}') self.function = MockFunction(name, arg_str); self.id = id class MockResponse: def __init__(self, thinking, content, tool_calls, raw, stop_reason='end_turn'): self.thinking = thinking; self.content = content self.tool_calls = tool_calls; self.raw = raw self.stop_reason = 'tool_use' if tool_calls else stop_reason def __repr__(self): return f"" class ToolClient: def __init__(self, backend, auto_save_tokens=True): self.backend = backend self.auto_save_tokens = auto_save_tokens self.last_tools = '' self.name = self.backend.name self.total_cd_tokens = 0 self.log_path = None def chat(self, messages, 
tools=None): tools = json.loads(json.dumps(tools, ensure_ascii=False)) if tools else tools for t in tools or []: f = t.get('function', {}) if f.get('name') == 'file_write': props = f.get('parameters', {}).get('properties', {}) props.pop('content', None) extra = '. Content must be placed in tags in reply body, not in args' if extra not in f.get('description', ''): f['description'] = f.get('description', '') + extra break full_prompt = self._build_protocol_prompt(messages, tools) print("Full prompt length:", len(full_prompt), 'chars') gen = self.backend.ask(full_prompt) _write_llm_log('Prompt', full_prompt, self.log_path) raw_text = '' for chunk in gen: raw_text += chunk; yield chunk _write_llm_log('Response', raw_text, self.log_path) return self._parse_mixed_response(raw_text) def _prepare_tool_instruction(self, tools): tool_instruction = "" if not tools: return tool_instruction tools_json = json.dumps(tools, ensure_ascii=False, separators=(',', ':')) _en = os.environ.get('GA_LANG') == 'en' if _en: tool_instruction = f""" ### Interaction Protocol (must follow strictly, always in effect) Follow these steps to think and act: 1. **Think**: Analyze the current situation and strategy inside `` tags. 2. **Summarize**: Output a minimal one-line (<30 words) physical snapshot in ``: new info from last tool result + current tool call intent. This goes into long-term working memory. Must contain real information, no filler. 3. **Act**: If you need to call tools, output one or more ** blocks** after your reply, then stop. """ else: tool_instruction = f""" ### 交互协议 (必须严格遵守,持续有效) 请按照以下步骤思考并行动: 1. **思考**: 在 `` 标签中先进行思考,分析现状和策略。 2. **总结**: 在 `` 中输出*极为简短*的高度概括的单行(<30字)物理快照,包括上次工具调用结果产生的新信息+本次工具调用意图。此内容将进入长期工作记忆,记录关键信息,严禁输出无实际信息增量的描述。 3. 
**行动**: 如需调用工具,请在回复正文之后输出一个(或多个)**块**,然后结束。 """ tool_instruction += f'\nFormat: ```{{"name": "tool_name", "arguments": {{...}}}}```\n\n### Tools (mounted, always in effect):\n{tools_json}\n' if self.auto_save_tokens and self.last_tools == tools_json: tool_instruction = "\n### Tools: still active, **ready to call**. Protocol unchanged.\n" if _en else "\n### 工具库状态:持续有效(code_run/file_read等),**可正常调用**。调用协议沿用。\n" else: self.total_cd_tokens = 0 self.last_tools = tools_json return tool_instruction def _build_protocol_prompt(self, messages, tools): system_content = next((m['content'] for m in messages if m['role'].lower() == 'system'), "") history_msgs = [m for m in messages if m['role'].lower() != 'system'] tool_instruction = self._prepare_tool_instruction(tools) system = ""; user = "" if system_content: system += f"{system_content}\n" system += f"{tool_instruction}" for m in history_msgs: role = "USER" if m['role'] == 'user' else "ASSISTANT" user += f"=== {role} ===\n" for tr in m.get('tool_results', []): user += f'{tr["content"]}\n' user += str(m['content']) + "\n" self.total_cd_tokens += len(user) // 3 if self.total_cd_tokens > 9000: self.last_tools = '' user += "=== ASSISTANT ===\n" return system + user def _parse_mixed_response(self, text): remaining_text = text; thinking = '' think_match = re.search(r"(.*?)", text, re.DOTALL) if think_match: thinking = think_match.group(1).strip() remaining_text = re.sub(r"(.*?)", "", remaining_text, flags=re.DOTALL) tool_calls, remaining_text = _parse_text_tool_calls(remaining_text) if not tool_calls: json_strs = []; errors = [] if '' in remaining_text: weaktoolstr = remaining_text.split('')[-1].strip().strip('><') json_str = weaktoolstr if weaktoolstr.endswith('}') else '' if json_str == '' and '```' in weaktoolstr and weaktoolstr.split('```')[0].strip().endswith('}'): json_str = weaktoolstr.split('```')[0].strip() if json_str: json_strs.append(json_str) remaining_text = remaining_text.replace(''+weaktoolstr, "") elif '"name":' in 
remaining_text and '"arguments":' in remaining_text: json_match = re.search(r'\{.*"name":.*\}', remaining_text, re.DOTALL) if json_match: json_strs.append(json_match.group(0).strip()) remaining_text = remaining_text.replace(json_match.group(0), "").strip() for json_str in json_strs: try: data = tryparse(json_str) func_name = data.get('name') or data.get('function') or data.get('tool') args = data.get('arguments') or data.get('args') or data.get('params') or data.get('parameters') if args is None: args = data if func_name: tool_calls.append(MockToolCall(func_name, args)) except json.JSONDecodeError: errors.append(f'Failed to parse tool_use JSON: {json_str[:200]}') self.last_tools = '' except: pass if not tool_calls: for e in errors: print(f"[Warn] {e}"); tool_calls.append(MockToolCall('bad_json', {'msg': e})) return MockResponse(thinking, remaining_text.strip(), tool_calls, text) def _parse_text_tool_calls(content): """Fallback: extract tool calls from text when model doesn't use native tool_use blocks.""" tcs = [] # try JSON array: [{"type":"tool_use", "name":..., "input":...}] _jp = next((p for p in ['[{"type":"tool_use"', '[{"type": "tool_use"'] if p in content), None) if _jp and content.endswith('}]'): try: idx = content.index(_jp); raw = json.loads(content[idx:]) tcs = [MockToolCall(b["name"], b.get("input", {}), id=b.get("id", "")) for b in raw if b.get("type") == "tool_use"] return tcs, content[:idx].strip() except: pass # try XML tags: {"name":..., "arguments":...} _xp = r"<(?:tool_use|tool_call)>((?:(?!<(?:tool_use|tool_call)>).){15,}?)" for s in re.findall(_xp, content, re.DOTALL): try: d = tryparse(s.strip()); name = d.get('name') args = d.get('arguments') or d.get('args') or d.get('input') or {} if name: tcs.append(MockToolCall(name, args)) except: pass if tcs: content = re.sub(_xp, "", content, flags=re.DOTALL).strip() return tcs, content def _ensure_text_block(blocks): """If response has thinking but no text block, inject a synthetic summary from 
thinking's first line.""" if any(b.get("type") == "text" for b in blocks): return None th = next((b.get("thinking", "") for b in blocks if b.get("type") == "thinking"), "") if not th: return None line = th.strip().split('\n', 1)[0] txt = "" + (line[:60] + '...' if len(line) > 60 else line) + "" blocks.insert(1, {"type": "text", "text": txt}) return txt def _write_llm_log(label, content, log_path=None): if not log_path: log_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), f'temp/model_responses/model_responses_{os.getpid()}.txt') os.makedirs(os.path.dirname(os.path.abspath(log_path)), exist_ok=True) ts = datetime.now().strftime('%Y-%m-%d %H:%M:%S') with open(log_path, 'a', encoding='utf-8', errors='replace') as f: f.write(f"=== {label} === {ts}\n{content}\n\n") def tryparse(json_str): try: return json.loads(json_str) except: pass json_str = json_str.strip().strip('`').replace('json\n', '', 1).strip() try: return json.loads(json_str) except: pass try: return json.loads(json_str[:-1]) except: pass if '}' in json_str: json_str = json_str[:json_str.rfind('}') + 1] return json.loads(json_str) class MixinSession: """Multi-session fallback with spring-back to primary.""" def __init__(self, all_sessions, cfg): self._retries, self._base_delay = cfg.get('max_retries', 3), cfg.get('base_delay', 1.5) self._spring_sec = cfg.get('spring_back', 300) self._sessions = [all_sessions[i].backend if isinstance(i, int) else next(s.backend for s in all_sessions if type(s) is not dict and s.backend.name == i) for i in cfg.get('llm_nos', [])] is_native = lambda s: 'Native' in s.__class__.__name__ groups = {is_native(s) for s in self._sessions} assert len(groups) == 1, f"MixinSession: sessions must be in same group (Native or non-Native), got {[type(s).__name__ for s in self._sessions]}" self.name = '|'.join(s.name for s in self._sessions) import copy; self._sessions = [copy.copy(s) for s in self._sessions] for s in self._sessions: s.max_retries = 0 self._orig_raw_asks = 
[s.raw_ask for s in self._sessions] self._sessions[0].raw_ask = self._raw_ask self.model = getattr(self._sessions[0], 'model', None) self._cur_idx, self._switched_at = 0, 0.0 def __getattr__(self, name): return getattr(self._sessions[0], name) _BROADCAST_ATTRS = frozenset({'system', 'tools', 'temperature', 'max_tokens', 'reasoning_effort', 'history'}) def __setattr__(self, name, value): if name in self._BROADCAST_ATTRS: for s in self._sessions: v = openai_tools_to_claude(value) if name == 'tools' and type(s) is NativeClaudeSession else value setattr(s, name, v) else: object.__setattr__(self, name, value) @property def primary(self): return self._sessions[0] def _pick(self): if self._cur_idx and time.time() - self._switched_at > self._spring_sec: self._cur_idx = 0 return self._cur_idx def _raw_ask(self, *args, **kwargs): base, n = self._pick(), len(self._sessions) test_error = lambda x: isinstance(x, str) and x.lstrip().startswith(('!!!Error:', '[Error:')) for attempt in range(self._retries + 1): idx = (base + attempt) % n gen = self._orig_raw_asks[idx](*args, **kwargs) print(f'[MixinSession] Using session ({self._sessions[idx].name})') last_chunk, return_val, yielded = None, [], False try: while True: chunk = next(gen); last_chunk = chunk if not yielded and test_error(chunk): continue yield chunk; yielded = True except StopIteration as e: return_val = e.value or [] is_err = test_error(last_chunk) if not is_err: if attempt > 0: self._cur_idx = idx; self._switched_at = time.time() elif isinstance(last_chunk, str) and '[!!! 
流异常中断' in last_chunk and n > 1: self._cur_idx = (idx + 1) % n; self._switched_at = time.time() print(f'[MixinSession] Partial failure, next call → s{self._cur_idx} ({self._sessions[self._cur_idx].name})') return return_val if attempt >= self._retries: yield last_chunk; return return_val nxt = (base + attempt + 1) % n if nxt == base: # full round failed, delay before next rnd = (attempt + 1) // n delay = min(30, self._base_delay * (1.5 ** rnd)) print(f'[MixinSession] {last_chunk[:80]}, round {rnd} exhausted, retry in {delay:.1f}s') time.sleep(delay) else: print(f'[MixinSession] {last_chunk[:80]}, retry {attempt+1}/{self._retries} (s{idx}→s{nxt})') THINKING_PROMPT_ZH = """ ### 行动规范(持续有效) 每次回复(含工具调用轮)都先在回复文字中包含一个 中输出极简单行(<30字)物理快照:上次结果新信息+本次意图。此内容进入长期工作记忆。 \n**若用户需求未完成,必须进行工具调用!** """.strip() THINKING_PROMPT_EN = """ ### Action Protocol (always in effect) The reply body should first include a minimal one-line (<30 words) physical snapshot in : new info from last result + current intent. This goes into long-term working memory. 
\n**If the user's request is not yet complete, tool calls are required!** """.strip() class NativeToolClient: @staticmethod def _thinking_prompt(): return THINKING_PROMPT_EN if os.environ.get('GA_LANG') == 'en' else THINKING_PROMPT_ZH def __init__(self, backend): self.backend = backend self.backend.system = self._thinking_prompt() self.name = self.backend.name self._pending_tool_ids = [] self.log_path = None def set_system(self, extra_system): combined = f"{extra_system}\n\n{self._thinking_prompt()}" if extra_system else self._thinking_prompt() if combined != self.backend.system: print(f"[Debug] Updated system prompt, length {len(combined)} chars.") self.backend.system = combined def chat(self, messages, tools=None): if tools: self.backend.tools = tools if not self.backend.history: self._pending_tool_ids = [] combined_content = []; resp = None; tool_results = [] for msg in messages: c = msg.get('content', '') if msg['role'] == 'system': self.set_system(c); continue if isinstance(c, str): combined_content.append({"type": "text", "text": c}) elif isinstance(c, list): combined_content.extend(c) if msg['role'] == 'user' and msg.get('tool_results'): tool_results.extend(msg['tool_results']) tr_id_set = set(); tool_result_blocks = [] for tr in tool_results: tool_use_id, content = tr.get("tool_use_id", ""), tr.get("content", "") tr_id_set.add(tool_use_id) if tool_use_id: tool_result_blocks.append({"type": "tool_result", "tool_use_id": tool_use_id, "content": tr.get("content", "")}) else: combined_content = [{"type": "text", "text": f'{content}'}] + combined_content for tid in self._pending_tool_ids: if tid not in tr_id_set: tool_result_blocks.append({"type": "tool_result", "tool_use_id": tid, "content": ""}) self._pending_tool_ids = [] # Filter whitespace-only text blocks that cause 400 on strict API proxies filtered_content = [c for c in combined_content if c.get("text", "").strip()] final_content = tool_result_blocks + filtered_content if not final_content: final_content 
= [{"type": "text", "text": "."}] merged = {"role": "user", "content": final_content} _write_llm_log('Prompt', json.dumps(merged, ensure_ascii=False, indent=2), self.log_path) gen = self.backend.ask(merged) try: while True: chunk = next(gen); yield chunk except StopIteration as e: resp = e.value if resp: _write_llm_log('Response', resp.raw, self.log_path) if resp and hasattr(resp, 'tool_calls') and resp.tool_calls: self._pending_tool_ids = [tc.id for tc in resp.tool_calls] return resp def resolve_session(cfg_name): cfg = reload_mykeys()[0].get(cfg_name) if not cfg: raise ValueError(f"Config '{cfg_name}' not in mykey") if 'native' in cfg_name: return (NativeClaudeSession if 'claude' in cfg_name else NativeOAISession)(cfg=cfg) if 'claude' in cfg_name: return ClaudeSession(cfg=cfg) return LLMSession(cfg=cfg) if 'oai' in cfg_name else None def resolve_client(cfg_name): s = resolve_session(cfg_name) return (NativeToolClient(s) if isinstance(s, (NativeClaudeSession, NativeOAISession)) else ToolClient(s)) if s else None def fast_ask(prompt, cfg_name): sess = resolve_session(cfg_name) if not sess: raise ValueError(f"fast_ask: '{cfg_name}' unsupported") return "".join(sess.raw_ask([{"role": "user", "content": prompt}])) ================================================ FILE: memory/adb_ui.py ================================================ # adb_ui.py - 一键dump+解析Android UI (u2优先,原生fallback) # u2 (uiautomator2) 不受idle限制,适合动画密集app(美团等) # 弹窗检测: ui(clickable_only=True, raw=True) 找全屏FrameLayout+底部小ImageView(关闭X) # 已知包名: 美团外卖=com.sankuai.meituan.takeoutnew 淘宝=com.taobao.taobao import subprocess, xml.etree.ElementTree as ET, os, re, shutil ADB = shutil.which("adb") or "adb" LOCAL_XML = os.path.join(os.path.dirname(os.path.abspath(__file__)), "ui_mt.xml") def _dump_u2(): """用uiautomator2 dump,不受idle限制""" try: import uiautomator2 as u2 d = u2.connect() xml_str = d.dump_hierarchy() if xml_str and len(xml_str) > 100: return xml_str except Exception as e: print(f"[u2 fallback] {e}") 
return None def _dump_native(): """原生uiautomator dump(需idle状态)""" subprocess.run([ADB, "shell", "rm", "-f", "/sdcard/ui.xml"], capture_output=True) r = subprocess.run([ADB, "shell", "uiautomator", "dump", "--compressed", "/sdcard/ui.xml"], capture_output=True, text=True, timeout=15) if "dumped" not in r.stdout.lower() and "dumped" not in r.stderr.lower(): print(f"dump failed: {r.stdout}{r.stderr}"); return None subprocess.run([ADB, "pull", "/sdcard/ui.xml", LOCAL_XML], capture_output=True, timeout=10) with open(LOCAL_XML, "r", encoding="utf-8") as f: return f.read() def _parse_xml(xml_str, keyword=None, clickable_only=False, raw=False): """解析XML字符串为节点列表""" root = ET.fromstring(xml_str) nodes = [] for n in root.iter("node"): pkg = n.get("package", "") if "termux" in pkg.lower(): continue text = n.get("text", "") desc = n.get("content-desc", "") bounds = n.get("bounds", "") click = n.get("clickable") == "true" cls = n.get("class", "").split(".")[-1] rid = n.get("resource-id", "") label = text or desc if not label and not click and not raw: continue if clickable_only and not click: continue if keyword and keyword.lower() not in label.lower(): continue cx, cy = 0, 0 if bounds: m = re.findall(r'\[(\d+),(\d+)\]', bounds) if len(m) == 2: cx = (int(m[0][0]) + int(m[1][0])) // 2 cy = (int(m[0][1]) + int(m[1][1])) // 2 edit = cls == "EditText" nodes.append({"text": text or desc, "click": click, "edit": edit, "cx": cx, "cy": cy, "cls": cls, "rid": rid}) return nodes def ui(keyword=None, clickable_only=False, raw=False): """一键dump+解析Android UI (u2优先) keyword: 过滤含关键词的节点 clickable_only: 只显示可点击节点 raw: 返回原始节点列表而非打印 """ xml_str = _dump_u2() or _dump_native() if not xml_str: print("dump failed (both u2 and native)"); return [] nodes = _parse_xml(xml_str, keyword, clickable_only, raw) if not raw: for n in nodes: flag = "E" if n.get("edit") else ("Y" if n["click"] else " ") coord = f"({n['cx']},{n['cy']})" if n['cx'] else "" display_text = n['text'] if not display_text: hint = 
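n.get('rid', '').split('/')[-1] or n.get('cls', 'icon')
                display_text = f"<{hint}>"
            print(f"[{flag}] {display_text} {coord}")
        print(f"\ntotal: {len(nodes)} nodes")
    return nodes

def tap(x, y):
    subprocess.run([ADB, "shell", "input", "tap", str(x), str(y)], capture_output=True)
    print(f"tap({x},{y}) ok")

if __name__ == "__main__":
    ui()

================================================
FILE: memory/autonomous_operation_sop.md
================================================
# Autonomous Operation SOP

⚠️ **Path warning**: `autonomous_reports` lives under `temp/`; access it as `./autonomous_reports/`, **not** `../memory/autonomous_reports/` or `../autonomous_reports/`! The TODO file is in the cwd.

Reports are stored in `./autonomous_reports/` with file names of the form `RXX_short_description.md` (XX is the next number in sequence, inferred from history.txt).

You are authorized to act autonomously; anything that causes no side effects on the environment is allowed.

## Startup (step one)
- update_working_checkpoint: `Autonomous operation | re-read the SOP at wrap-up | from autonomous_operation_sop.helper import *; set_todo()/complete_task(tasktitle, historyline, report_path)`

Step two:
```python
from autonomous_operation_sop.helper import *
print(get_history(40))  # review history to avoid repeating work
print(get_todo())       # list pending TODO items
```

## Task selection
- Unfinished TODO items exist → take **one** and go straight to execution; the others wait for the next run
- No TODO → read `autonomous_operation_sop/task_planning.md` and plan; execute on the next run
- Do not pick the same subtask twice in a row
- Value formula: **"not covered by AI training data" × "lasting benefit to future collaboration"**

## Execution
- After picking a task, call update_working_checkpoint and append the chosen TODO item and the execution notes to the checkpoint
- Call code_run to register the wrap-up callback: the script is handler._done_hooks.append("Re-read the autonomous task SOP and check whether the wrap-up you just did is correct; fix it if not"), with inline_eval=True (hidden parameter)
- ≤30 turns; move in small, fast steps, probing and experimenting as you go
- Verify hypotheses with throwaway scripts; do not draw conclusions from reading alone; verify fully before writing the report
- Record the experiment process and results even on failure; a failure report is just as valuable
- The user is offline; when a decision is needed, write the question into the report for review instead of getting stuck

**Wrap-up (all 4 items are mandatory)**:
0. Re-read this SOP
1. Write the report in the cwd (any file name); if you have memory update suggestions, append them to the end of the report
2. `from/import helper; complete_task(tasktitle, historyline, report_path)` → auto-numbers the report, moves it to autonomous_reports/, and prepends to history (historyline format: `type | topic | conclusion`, strictly one line)
3. `set_todo()` returns the TODO path → mark the finished item as `[x]` (mind the prefix)
4. 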
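Finish; leave the remaining TODO items for the next run

## Permission boundaries
- No approval needed: read-only probing; writes and script experiments inside the cwd
- Must be written into the report for review: modifying global_mem or the SOPs under memory, installing software, external API calls, deleting non-temporary files
- Strictly forbidden: reading secrets, modifying the core codebase, irreversible dangerous operations

## Awaiting user review
- When the user returns, they review the report and decide whether to approve, modify, or reject the proposal

================================================
FILE: memory/goal_mode_sop.md
================================================
# Goal Mode SOP

## When to use
The user gives an open-ended goal plus a time budget (e.g. "spend 3 hours continuously optimizing X", "find something to do even when idle"), and the task is not a one-shot closed loop.

## Setup
Write `temp/goal_state.json` (or a custom path):
```json
{
  "objective": "the goal in the user's own words",
  "budget_seconds": 10800,
  "start_time": ,
  "turns_used": 0,
  "max_turns": 200,
  "status": "running"
}
```
- `budget_seconds`: at least 3 hours (10800); adjust to the user's request
- `max_turns`: upper bound that guards against idle spinning; 200 is usually enough
- `status`: must be `"running"`

## Start
Must be started in the background (it runs for a long time and should not occupy the foreground terminal):
```bash
# default path temp/goal_state.json
start /b python agentmain.py --reflect reflect/goal_mode.py
# custom path (multiple instances)
set GOAL_STATE=temp/goal_xxx.json && start /b python agentmain.py --reflect reflect/goal_mode.py
# run with another model (--llm_no selects the N-th configured LLM, counting from 0)
set GOAL_STATE=temp/goal_xxx.json && start /b python agentmain.py --reflect reflect/goal_mode.py --llm_no 1
```

## Stop
- When the budget is exhausted, the run enters a wrap-up round automatically and then stops
- Manual stop: kill the process

## Monitoring progress
- Status: read `turns_used` / `status` from goal_state.json
- Details: look at the tail of the most recently modified file under `temp/model_responses/`

================================================
FILE: memory/keychain.py
================================================
"""Keychain: save key to a file, then keys.set("name", file="path"); keys.name.use() to retrieve (use but no print)."""
import json, os, hashlib, pathlib, getpass

_PATH = pathlib.Path.home() / "ga_keychain.enc"
try: _user = os.getlogin()
except OSError: _user = getpass.getuser()
_MASK = hashlib.sha256(f"{_user}@ga_keychain".encode()).digest()

def _xor(data: bytes) -> bytes:
    return bytes(b ^ _MASK[i % len(_MASK)] for i, b in enumerate(data))

class SecretStr:
    def __init__(self, name: str, val: str): self._name, self._val = name, val
    def use(self) -> str: return self._val
    def __repr__(self):
        n = len(self._val)
        if n <= 4: preview = '***'
        elif n <= 16: preview = f"{self._val[:3]}···{self._val[-3:]}"
        elif n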
<= 40: preview = f"{self._val[:6]}···{self._val[-6:]} len={n}" else: preview = f"{self._val[:10]}···{self._val[-6:]} len={n}" return f"SecretStr({self._name}={preview}) # .use() to get raw, do not print raw value" __str__ = __repr__ class _Keys: def __init__(self): self._d = {} if _PATH.exists(): try: self._d = json.loads(_xor(_PATH.read_bytes())) except Exception as e: print(f"[keychain] WARNING: failed to load {_PATH}: {e}") print(f"[keychain] Starting with empty keychain. Old file kept as .bak") _PATH.rename(_PATH.with_suffix('.enc.bak')) def __getattr__(self, k): if k.startswith('_'): raise AttributeError(k) if k not in self._d: raise KeyError(f"No secret: {k}") return SecretStr(k, self._d[k]) def set(self, k, v=None, *, file=None): if file: v = pathlib.Path(file).read_text().strip() self._d[k] = v _PATH.write_bytes(_xor(json.dumps(self._d).encode())) def ls(self): return list(self._d.keys()) keys = _Keys() def __getattr__(name): return getattr(keys, name) ================================================ FILE: memory/ljqCtrl.py ================================================ """ CRITICAL: 严禁在此工具链中 import pyautogui (会污染 win32api 导致逻辑冲突)。 ljqCtrl Quick Reference: - dpi_scale: float (Logical = Physical * dpi_scale) - Click(x, y): Use Physical Coordinates (from screenshots) - SetCursorPos(z): Use Physical Coordinates z=(x, y) - Press(cmd, staytime=0): Keyboard shortcuts (e.g. 
'ctrl+v') - FindBlock(fn, wrect=None, threshold=0.8) -> (obj_center_phys, is_found) - MouseDClick(staytime=0.05), MouseClick(staytime=0.05) - GrabWindow(hwnd) -> PIL Image: DPI-safe window screenshot """ import os, sys, time, random, math, win32api, win32con, ctypes import numpy as np dpi_scale = 1 try: from PIL import ImageGrab, Image, ImageEnhance, ImageFilter, ImageDraw import cv2 except: pass _hdc = ctypes.windll.user32.GetDC(0) swidth = ctypes.windll.gdi32.GetDeviceCaps(_hdc, 118) # DESKTOPHORZRES (物理) sheight = ctypes.windll.gdi32.GetDeviceCaps(_hdc, 117) # DESKTOPVERTRES ctypes.windll.user32.ReleaseDC(0, _hdc) cwidth = win32api.GetSystemMetrics(win32con.SM_CXSCREEN) # 逻辑 cheight = win32api.GetSystemMetrics(win32con.SM_CYSCREEN) dpi_scale = cwidth / swidth print('Screen width & height:', swidth, sheight) print('dpi_scale:', dpi_scale) def MouseDown(): win32api.mouse_event(win32con.MOUSEEVENTF_LEFTDOWN,0,0) def MouseUp(): win32api.mouse_event(win32con.MOUSEEVENTF_LEFTUP,0,0) def MouseClick(staytime=0.05): MouseDown(); time.sleep(staytime) MouseUp(); time.sleep(0.05) def MouseDClick(staytime=0.05): MouseDown(); MouseUp() MouseDown(); MouseUp() time.sleep(0.05) def SetCursorPos(z): z = tuple(map(lambda v:int(v*dpi_scale), z)) win32api.SetCursorPos(z) time.sleep(0.05) def Click(x, y=None): if type(x) is type(tuple()): x, y = int(x[0]), int(x[1]) SetCursorPos( (x, y) ) MouseClick() click = Click def Press(cmd, staytime=0): if type(cmd) is list: cmds = [x.lower() for x in cmd] else: cmds = cmd.lower().split('+') for z in cmds: win32api.keybd_event(VK_CODE[z], 0, 0, 0) time.sleep(staytime) for z in reversed(cmds): time.sleep(staytime) win32api.keybd_event(VK_CODE[z], 0, win32con.KEYEVENTF_KEYUP, 0) press = Press VK_CODE = {'backspace':0x08, 'tab':0x09, 'clear':0x0C, 'enter':0x0D, 'shift':0x10, 'ctrl':0x11, 'alt':0x12, 'pause':0x13, 'caps_lock':0x14, 'esc':0x1B, 'escape':0x1B, 'space':0x20, 'page_up':0x21, 'page_down':0x22, 'end':0x23, 'home':0x24, 'left_arrow':0x25, 
'up_arrow':0x26, 'right_arrow':0x27, 'down_arrow':0x28, 'select':0x29, 'print':0x2A, 'execute':0x2B,
 'print_screen':0x2C, 'ins':0x2D, 'del':0x2E, 'help':0x2F,
 '0':0x30, '1':0x31, '2':0x32, '3':0x33, '4':0x34, '5':0x35, '6':0x36, '7':0x37, '8':0x38, '9':0x39,
 'a':0x41, 'b':0x42, 'c':0x43, 'd':0x44, 'e':0x45, 'f':0x46, 'g':0x47, 'h':0x48, 'i':0x49, 'j':0x4A,
 'k':0x4B, 'l':0x4C, 'm':0x4D, 'n':0x4E, 'o':0x4F, 'p':0x50, 'q':0x51, 'r':0x52, 's':0x53, 't':0x54,
 'u':0x55, 'v':0x56, 'w':0x57, 'x':0x58, 'y':0x59, 'z':0x5A,
 'numpad_0':0x60, 'numpad_1':0x61, 'numpad_2':0x62, 'numpad_3':0x63, 'numpad_4':0x64,
 'numpad_5':0x65, 'numpad_6':0x66, 'numpad_7':0x67, 'numpad_8':0x68, 'numpad_9':0x69,
 'multiply_key':0x6A, 'add_key':0x6B, 'separator_key':0x6C, 'subtract_key':0x6D, 'decimal_key':0x6E, 'divide_key':0x6F,
 'F1':0x70, 'F2':0x71, 'F3':0x72, 'F4':0x73, 'F5':0x74, 'F6':0x75, 'F7':0x76, 'F8':0x77,
 'F9':0x78, 'F10':0x79, 'F11':0x7A, 'F12':0x7B, 'F13':0x7C, 'F14':0x7D, 'F15':0x7E, 'F16':0x7F,
 'F17':0x80, 'F18':0x81, 'F19':0x82, 'F20':0x83, 'F21':0x84, 'F22':0x85, 'F23':0x86, 'F24':0x87,
 'num_lock':0x90, 'scroll_lock':0x91,
 # fixed: this key was 'right_shift ' (trailing space), so Press('right_shift') raised KeyError
 'left_shift':0xA0, 'right_shift':0xA1, 'left_control':0xA2, 'right_control':0xA3, 'left_menu':0xA4, 'right_menu':0xA5,
 'browser_back':0xA6, 'browser_forward':0xA7, 'browser_refresh':0xA8, 'browser_stop':0xA9,
 'browser_search':0xAA, 'browser_favorites':0xAB, 'browser_start_and_home':0xAC,
 'volume_mute':0xAD, 'volume_down':0xAE, 'volume_up':0xAF,
 'next_track':0xB0, 'previous_track':0xB1, 'stop_media':0xB2, 'play/pause_media':0xB3,
 'start_mail':0xB4, 'select_media':0xB5, 'start_application_1':0xB6, 'start_application_2':0xB7,
 'attn_key':0xF6, 'crsel_key':0xF7, 'exsel_key':0xF8, 'play_key':0xFA, 'zoom_key':0xFB, 'clear_key':0xFE,
 '+':0xBB, ',':0xBC, '-':0xBD, '.':0xBE, '/':0xBF, '`':0xC0, ';':0xBA, '[':0xDB, '\\':0xDC, ']':0xDD, "'":0xDE}
# Normalize every key name to lower case so lookups such as VK_CODE['f5'] work
VK_CODE = {k.lower():v for k,v in VK_CODE.items()}

def GrabWindow(hwnd):
    import win32gui; win32gui.SetForegroundWindow(hwnd);
    time.sleep(0.3)
    bbox = tuple(int(v / dpi_scale) for v in win32gui.GetWindowRect(hwnd))
    return ImageGrab.grab(bbox)

def imshow(mt, sec=0):
    cv2.imshow('cc', mt)
    cv2.waitKey(sec)

def GetWRect(sr):
    num = int(sr[-1])
    l, u, r, b = 0, 0, swidth, sheight
    if 'left' in sr: r = swidth // num
    if 'right' in sr: l = swidth * (num-1) // num
    if 'top' in sr: b = sheight // num
    if 'bottom' in sr: u = sheight * (num-1) // num
    return [l, u, r, b]

def FindBlock(fn, wrect=None, verbose=0, threshold=0.8):
    tic = time.process_time()
    if wrect is not None and isinstance(wrect, Image.Image): scr, wrect = wrect, None
    else:
        if isinstance(wrect, str): wrect = GetWRect(wrect)
        scr = ImageGrab.grab(wrect)
    blc = Image.open(fn) if isinstance(fn, str) else fn
    T = cv2.cvtColor(np.array(blc), cv2.COLOR_RGB2BGR)
    B = cv2.cvtColor(np.array(scr), cv2.COLOR_RGB2BGR)
    tsh, tsw = T.shape[:2]
    if verbose: print('T.shape:', T.shape, '\t', 'B.shape:', B.shape)
    res = cv2.matchTemplate(B, T, cv2.TM_CCOEFF_NORMED)
    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
    oj, oi = max_loc
    if wrect is None: wrect = [0, 0, scr.size[0], scr.size[1]]
    obj = (oj + wrect[0] + tsw//2, oi + wrect[1] + tsh//2)
    if verbose:
        print(f'Max match: {max_val:.4f} at ({oj}, {oi}) cost: {time.process_time() - tic:.3f}s')
        sscr = scr.crop([oj, oi, oj+tsw, oi+tsh])
        sscr.show()
    return obj, max_val > threshold

if __name__ == '__main__':
    #time.sleep(3)
    #SetCursorPos( (1640, 131) )
    #MouseClick()
    #print(FindBlock('z:/z.png', [1638, 214, 5838, 414], verbose=1))
    print('completed %.3f' % time.process_time())

================================================
FILE: memory/ljqCtrl_sop.md
================================================
# ljqCtrl Usage and Coordinate Conversion SOP

> **must call update working ckp**: `ljqCtrl always uses physical coordinates | no pyautogui | activate the window with gw before acting`

## 0. 
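API Quick Reference (Signatures)
- `ljqCtrl.dpi_scale`: float (scale factor = logical width / physical width)
- `ljqCtrl.SetCursorPos(z)`: move the mouse to physical coordinates z=(x, y); the function multiplies by `dpi_scale` internally
- `ljqCtrl.Click(x, y=None)`: simulate a click. Accepts `Click((x, y))` or `Click(x, y)`
- `ljqCtrl.Press(cmd, staytime=0)`: simulate a key press, e.g. `Press('ctrl+c')`
- `ljqCtrl.FindBlock(fn, wrect=None, threshold=0.8)`: template matching. Returns `((center_x, center_y), is_found)`
- `ljqCtrl.MouseDClick(staytime=0.05)`: double click

## 1. Loading the environment
`../memory` must be added to the path before the tool module can be imported:
```python
import sys, os, pygetwindow as gw
sys.path.append("../memory")
import ljqCtrl
```

## 2. Core: High-DPI physical coordinate conversion
The `Click/SetCursorPos` interfaces of `ljqCtrl` take **physical pixel coordinates**.
When a window position is obtained from a tool such as `pygetwindow` (logical coordinates), it must be divided by the scale factor.
- **Conversion formula**: `physical = logical / ljqCtrl.dpi_scale`
- **Note**: 3840 (4K) is only the current development machine's value; the real physical bounds depend on the system, so code should always compute them dynamically through `dpi_scale`.

## 3. Window handling and click flow
1. **Activate the window**: get it with `gw.getWindowsWithTitle('title')`, then call `restore()` and `activate()`.
2. **Compute coordinates**:
```python
win = gw.getWindowsWithTitle('微信')[0]  # window title of the WeChat client
# Logical coordinates (lx, ly) of a point inside the window;
# the +100/+50 offset is only an illustrative value
lx, ly = win.left + 100, win.top + 50
# Convert to physical coordinates and click
px, py = lx / ljqCtrl.dpi_scale, ly / ljqCtrl.dpi_scale
ljqCtrl.Click(px, py)
```

## 4. 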
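Pitfalls
- **⚠️ Always use physical coordinates**: coordinates passed to ljqCtrl.Click/SetCursorPos must be physical coordinates (= screenshot pixel coordinates). Logical coordinates obtained from pygetwindow must first be converted with `/ dpi_scale`. Never pass logical coordinates.
- **Physical verification**: before simulating input, make sure the window has been brought to the foreground with `activate()`.
- **Offsets**: every relative pixel offset added to logical coordinates (e.g. "move 10 pixels to the right") also has to be divided by `dpi_scale`.
- **Coordinate alignment**: physical coordinates = screenshot coordinates; ljqCtrl handles the DPI conversion itself, so do not apply it a second time by hand.
- **⚠️ Window coordinate conversion trap**: the rectangle returned by `win32gui.GetWindowRect(hwnd)` includes the title bar and borders, whereas the screenshot content is the client area. To click an element inside a screenshot, get the screen coordinates of the client-area origin with `win32gui.ClientToScreen(hwnd, (0, 0))` and add the in-screenshot coordinates. Never use the GetWindowRect top-left corner + screenshot coordinates directly.
- **⚠️ win32 DPI coordinate trap**: without a call to `SetProcessDPIAware()`, the window/client coordinates returned by `GetWindowRect/ClientToScreen/GetClientRect` are usually **logical coordinates**; if later screenshots or `ljqCtrl` calls work in physical pixels, apply `coordinate / ljqCtrl.dpi_scale` consistently. Equivalent alternative: call `SetProcessDPIAware()` first and use raw physical coordinates throughout. Never mix logical and physical coordinates.
- **Text input**: ljqCtrl has no TypeText/SendKeys. To type into an input box, first click (or triple-click to select the field), then `pyperclip.copy('text'); ljqCtrl.Press('ctrl+v')`.

================================================
FILE: memory/memory_cleanup_sop.md
================================================
# Memory Cleanup SOP

## Core principle: existence encoding
The LLM is itself a compressor + decoder. L1 only has to make it **aware that a class of knowledge exists**; it can then fetch the deeper content on its own through tool calls.

**The essence of L1: state in the fewest words which memory is available in which scenario (existence).**

L1 holds two kinds of content, both judged by the same ROI:
- **Existence pointers**: the shortest trigger words that point to L2/L3 knowledge
- **Behavior rules**: mistakes that will be made unless a reminder is present (fatal or high-frequency, as long as the ROI clears the threshold)

ROI = (probability of error without these words × cost) / per-turn word cost

## Quick judgment
**Keep**: counterintuitive trigger words, i.e. scenario words without which you would not think of consulting the SOP. Example: `tmwebdriver_sop(httponly cookie)`. Without the words `httponly cookie` you would not realize that fetching cookies means consulting tmwebdriver.

**Delete**:
- Name translations: `proxy-pool/(proxy pool)` → the name is self-explanatory and the parenthetical is wasted words; plain `proxy-pool` is enough
- Content descriptions: `opencli_sop(66-site CLI, reuses Chrome session)` → implementation details belong inside the SOP; they are not trigger scenarios
- Intuitive abilities: things you would think of without a reminder → zero benefit, and the per-turn cost is paid for nothing
- Redundancy: rules already covered in L3 / fragments already contained in other L1 lines

## Four compression principles
1. **Self-explanatory naming > added description**: if the SOP name already says it, L1 adds no annotation; renaming often has a higher ROI than editing L1
2. **Minimal description of an existence set**: when several similar entries are covered by one broader scenario, express that class of capability with a set name instead of listing the members. Example: `qq ops/feishu ops/wecom ops` → `im ops:*_im_sop`; if the member names are self-explanatory, list the names only, without translation
3. **Entry = scenario↔solution existence**: e.g. `video understanding: yt-dlp fetches subtitles`, `fofa(asset mapping)`. The scenario name is the trigger word and the solution name encodes existence; parentheses hold **only counterintuitive trigger words**, and anything else (pure translation / content description / implementation detail) is waste
4. **Put entries in the right layer**: entries that carry behavior rules or are high-frequency and high-ROI go in the upper scenario lines; pure existence pointers go into the flat L2/L3 list

## Cleanup procedure
1. Read L1 line by line, split each line into fragments at `|`, and classify first: existence pointer / RULES / translation / content description / implementation detail / redundancy
2. Clean RULES first. For each rule ask: "Is this a global high-ROI rule, or a scenario-specific, low-danger rule?"
   - Global high-ROI → keep
   - Scenario-specific / low danger → demote to L3 or delete
3. 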
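Then clean the existence pointers: check that each one expresses **scenario↔solution existence**; add a scenario trigger word only when it is **counterintuitive**; delete translations / content descriptions / implementation details
4. Check whether the L3 file names are self-explanatory; whatever renaming can solve should not be solved by adding a description to L1; finally verify that the total line count is ≤ 30

**Red line**: memory edits do lasting damage, and errors compound every turn. L1 may only be patched at word level; overwrite is forbidden. If an entry misleads, promptly correct L1 or rename the memory.

================================================
FILE: memory/memory_management_sop.md
================================================
## 0. Core Axioms (highest priority)

1. **Action-Verified Only**
   * **Definition**: any information written to L1/L2/L3 must come from a **successful tool call result** (e.g. `shell` ran successfully, `file_read` confirmed the content exists, the code ran cleanly).
   * **Forbidden**: never write the model's "built-in knowledge", "inferred guesses", "unexecuted plans", or "unverified assumptions" as facts.
   * **Slogan**: **No Execution, No Memory.**
2. **Sanctity of Verified Data**
   * **Definition**: any action-verified working configuration, pitfall guide, or key path **must never be discarded** during refactoring (Refactoring/GC).
   * **Operation**: text may be compressed and entries may move between layers (from L2 to L3), but accuracy and traceability must never be lost.
   * Be extremely careful when modifying memory and avoid overwrite or code run wherever possible. Only small patches are allowed; if a patch cannot be applied cleanly, prefer to change nothing.
3. **No Volatile State**
   * **Definition**: never store data that changes frequently with time or session.
   * **Examples**: the current timestamp, a temporary session ID, the PID of a running process, one specific absolute path, information about the connected device
4. **Minimum Sufficient Pointer**
   * The upper layer keeps only the shortest identifier that can locate the lower layer; one extra word is redundancy.

---

## Memory layer architecture
```
L1: global_mem_insight.txt (minimal index layer - strictly ≤30 lines)
 ↓ navigation pointer (Pointer)
L2: global_mem.txt (fact base layer - short now, but it will grow)
 ↓ detailed reference (Reference)
L3: ../memory/ (record library layer - .md/.py and other files)
L4: ../memory/L4_raw_sessions/ (session history layer - collected automatically by the scheduler reflection; can locate past context)
```

---

## Responsibilities and principles of each layer

### L1: Global memory index (global_mem_insight.txt)
**Responsibility**: provide a minimal navigation index for L2 and L3 so that key capabilities can be discovered.

**Characteristics**:
- Size limit: ≤ 30 lines (hard constraint), < 1k tokens (target). No details (except for extremely high-frequency tasks)
- Content: a two-tier "scenario keyword → memory location" mapping + RULES (red-line rules + frequent mistakes)
  - Tier 1: high-frequency scenarios as key→value (giving the sop/py/L2 section name directly); a self-contained name is written as one word and not re-translated
  - Tier 2: low-frequency scenarios list keywords only; read L2 or ls L3 to locate them when needed
- Core: scenario trigger words are critical (a capability that is not indexed is not known to exist), but How-to details are strictly forbidden
- RULES: compressed pitfall rules, consisting of:
  - Red-line rules (fatal): violating them terminates the process or crashes the system (e.g. `never kill python unconditionally (it kills yourself)`)
  - Red-line rules (silent): violating them raises no error but produces wrong results (e.g. `search with google, not baidu`)
  - Frequent mistakes: key constraints that are easy to forget (e.g. `es(on PATH)` prevents a path hunt)
- Updates: when L2/L3 gain or lose entries, judge the frequency and place the change in the matching tier. Be extremely careful; overwrite and code run are not allowed. Only small patches are allowed; if a patch cannot be applied cleanly, prefer to change nothing.

**Forbidden**: never write passwords or API keys. Inlining non-sensitive trigger parameters (such as a proxy port) is allowed. No "How to" text or detailed explanations. No task-specific technical details (those belong in L3). Above all, never write log records!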
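The L1 constraints above (at most 30 lines, under roughly 1k tokens, no passwords or API keys) are mechanical enough to check before a patch is saved. The following is a minimal sketch, not part of the repository: `check_l1`, `estimate_tokens`, and the secret patterns are hypothetical names and heuristics chosen for illustration, blank lines are assumed not to count toward the limit, and the token estimate borrows the rough `len(text) // 3` approximation that `ToolClient._build_protocol_prompt` uses elsewhere in this codebase.

```python
import re

MAX_LINES = 30      # hard limit stated in the SOP
MAX_TOKENS = 1000   # soft target stated in the SOP

# Hypothetical patterns for obvious credentials; a real check would need tuning.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(api[_-]?key|password|passwd|secret)\b\s*[:=]\s*\S+"),
    re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
]

def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 3 characters per token (an assumption)."""
    return len(text) // 3

def check_l1(text: str) -> list:
    """Return human-readable problems found in an L1 index text (empty list = OK)."""
    problems = []
    # Assumption: only non-blank lines count toward the 30-line limit.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) > MAX_LINES:
        problems.append(f"too many lines: {len(lines)} > {MAX_LINES}")
    tokens = estimate_tokens(text)
    if tokens >= MAX_TOKENS:
        problems.append(f"too many tokens (estimated): {tokens} >= {MAX_TOKENS}")
    for number, line in enumerate(lines, 1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            problems.append(f"line {number}: looks like a credential")
    return problems
```

A check of this kind only catches the measurable limits; whether a trigger word is counterintuitive enough to earn its place in L1 still needs the ROI judgment described in the cleanup SOP.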
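---

### L2: Global fact base (global_mem.txt)
**Responsibility**: store global facts about the environment (paths, credentials, configuration, constants, and so on).

**Characteristics**:
- Trend: grows as the environment expands (acceptable)
- Content: fact entries organized under `## [SECTION]` headings
- Sync: on change, update the matching TOPIC navigation line in L1; that line may only navigate

**Forbidden**: no volatile state, no guesses, and no general knowledge that the model can infer on its own

---

### L3: Task-level concise record library (../memory/)
Responsibility: hold the small amount of detailed information that does not fit in L1/L2 but is essential for reusing a **specific task** in the future. Content must be **as short as possible** while still meeting the reuse need.

Principles:
- Record only: points that still matter across sessions and are hard to rebuild quickly with a few file_read / web_scan calls or a simple script.
- Write first: hidden preconditions and typical pitfalls specific to the task, i.e. information whose loss would cause expensive retries.
- Do not record: ordinary operating steps, or paths and state that a few probes can recover.

Forms:
- SOP (*_sop.md): a minimal "key preconditions + typical pitfalls" list for a single task or a small class of tasks; avoid long tutorials.
- Tool script (*.py): wrap only highly reusable, relatively complex processing that should not be re-derived every time.

---

## L1 ↔ L2/L3 sync rules

| Operation | L1 sync |
|---------|--------|
| New scenario in L2/L3 | New entries default to low frequency → add the file name to the L3 list (no description when self-explanatory; a parenthesized trigger word only for counterintuitive scenarios) |
| Scenario deleted from L2/L3 | Delete the keyword/mapping line in the matching tier |
| Value changed in L2/L3 | Leave L1 untouched if scenario lookup is unaffected |
| General pitfall rule discovered | Compress it into one sentence and add it to RULES |

> **Sync red line**: L1 holds only keywords/names; never copy details into it. Parentheses hold only counterintuitive scenario trigger words (a few words at most), never mechanisms/methods/steps. Weigh the token cost in L1 against the indexing benefit.
> Counterexample: ❌ sop_name(scenario A: method 1 + method 2 + method 3) → ✅ sop_name(scenario A)
> Counterexample, when the name is already self-explanatory: ❌ discord_slate_sop(Slate input box) → ✅ discord_slate_sop

---

## Quick decision tree for classifying information
```
"Which layer does this information belong in?"

Is it an "environment-specific fact"?
(IPs, non-standard paths, credentials, IDs, API keys, etc.; the model cannot generate them accurately zero-shot)
├─ YES → L2 (global_mem.txt)
│        then → by frequency, place it in L1 tier 1 (key→value) or tier 2 (keyword only)
│
└─ NO
   ↓
Is it a "general operating rule"?
(global pitfall guidance, troubleshooting methods, general rules not tied to a specific task)
├─ YES → L1 [RULES] (one compressed sentence only)
│
└─ NO
   ↓
Is it a "task-specific technique"?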
(a task that only succeeded after hard attempts and will be needed again, e.g. WeChat parsing parameters, coordinates in a specific game, temporary tool configuration)
├─ YES → L3 (dedicated SOP or script under ../memory/)
│
└─ NO → classified as "general knowledge" or "redundant information": never store it, discard it
```

================================================
FILE: memory/ocr_utils.py
================================================
"""
Local OCR utilities
- OCR engine: rapidocr-onnxruntime (~1s per call, high accuracy for Chinese and English, returns bboxes)
- Pitfall (rapid): result[i][2] conf is a str, not a float
- Pitfall (rapid): when no text is found, result is None rather than an empty list
- Pitfall: enhance (upscaling + high contrast) hurts text that is already clear; off by default
- Pitfall (remote desktop): ImageGrab/mss screenshots are all black after RDP disconnects; use ocr_window(hwnd) instead
"""
import re
from PIL import ImageGrab, Image, ImageEnhance

_LANG = 'zh-Hans-CN'
_rapid_engine = None

def _get_rapid():
    global _rapid_engine
    if _rapid_engine is None:
        from rapidocr_onnxruntime import RapidOCR
        _rapid_engine = RapidOCR()
    return _rapid_engine

def _preprocess(img, scale=3, contrast=3.0):
    img = ImageEnhance.Contrast(img).enhance(contrast)
    img = img.resize((img.width * scale, img.height * scale))
    return img

def _strip_cjk_spaces(t):
    # Remove whitespace that OCR inserts between adjacent CJK characters
    return re.sub(r'(?<=[\u4e00-\u9fff])\s+(?=[\u4e00-\u9fff])', '', t)

def _ocr_rapid(img):
    import numpy as np
    engine = _get_rapid()
    arr = np.array(img)
    result, elapse = engine(arr)
    if not result: return {'text': '', 'lines': [], 'details': []}
    lines = [r[1] for r in result]
    details = [{'bbox': r[0], 'text': r[1], 'conf': float(r[2])} for r in result]
    text = _strip_cjk_spaces('\n'.join(lines))
    return {'text': text, 'lines': [_strip_cjk_spaces(l) for l in lines], 'details': details}

def ocr_image(image_input, lang=_LANG, enhance=False, engine=None):
    """
    Run OCR on an image
    :param image_input: PIL Image object or file path (str)
    :param lang: reserved, currently unused
    :param enhance: apply preprocessing
    :param engine: reserved, currently only rapid/None is supported
    :return: dict {'text': full text, 'lines': [line texts], 'details': [bbox+conf]}
    """
    if isinstance(image_input, str): image_input = Image.open(image_input)
    if enhance: image_input = _preprocess(image_input)
    if engine not in (None, 'rapid'): raise ValueError("Only rapid OCR is supported")
    return _ocr_rapid(image_input)

def ocr_screen(bbox=None, lang=_LANG,
enhance=False, engine=None): """ 截取屏幕区域并 OCR :param bbox: (x1, y1, x2, y2) 像素坐标,None=全屏 :return: dict {'text': 全文, 'lines': [行文本], 'details': [bbox+conf](仅rapid)} """ img = ImageGrab.grab(bbox=bbox) return ocr_image(img, lang, enhance, engine) def ocr_window(hwnd, lang=_LANG, enhance=False, engine=None): """ 截取窗口并 OCR (使用 PrintWindow API,支持远程桌面断开场景) :param hwnd: 窗口句柄(int) :return: dict {'text': 全文, 'lines': [行文本], 'details': [bbox+conf](仅rapid)} """ import win32gui, win32ui from ctypes import windll l, t, r, b = win32gui.GetWindowRect(hwnd) w, h = r - l, b - t hwndDC = win32gui.GetWindowDC(hwnd) mfcDC = win32ui.CreateDCFromHandle(hwndDC) saveDC = mfcDC.CreateCompatibleDC() saveBitMap = win32ui.CreateBitmap() saveBitMap.CreateCompatibleBitmap(mfcDC, w, h) saveDC.SelectObject(saveBitMap) windll.user32.PrintWindow(hwnd, saveDC.GetSafeHdc(), 3) bmpinfo = saveBitMap.GetInfo() bmpstr = saveBitMap.GetBitmapBits(True) img = Image.frombuffer('RGB', (bmpinfo['bmWidth'], bmpinfo['bmHeight']), bmpstr, 'raw', 'BGRX', 0, 1) win32gui.DeleteObject(saveBitMap.GetHandle()) saveDC.DeleteDC() mfcDC.DeleteDC() win32gui.ReleaseDC(hwnd, hwndDC) return ocr_image(img, lang, enhance, engine) if __name__ == "__main__": r = ocr_screen((0, 0, 400, 100)) print(f"识别结果: {r['text']}") for line in r['lines']: print(f" 行: {line}") if 'details' in r: for d in r['details']: print(f" [{d['conf']:.3f}] {d['text']}") ================================================ FILE: memory/plan_sop.md ================================================ # Plan Mode SOP **触发**:3步以上有依赖/多文件协同/条件分支/需并行 | **禁用**:1-2步简单任务直接做 任务开始前必须先创建工作目录 `./plan_XXX/`(XXX=任务英文短名) 单独使用一个code_run({'inline_eval':True, 'script':'handler.enter_plan_mode("./plan_XXX/plan.md")'})进入plan模式 --- ## 一、探索态(规划前置,必须执行) ⛔ **硬性规则(先读再做)**: - **主agent禁止直接执行环境探测**(必须委托subagent,无例外) - 主agent只做:创建目录、匹配SOP、启动subagent、读取结论 - subagent只读探测,禁止修改任何文件、执行有副作用的操作 - **探索subagent启动失败时:排查原因→重试,最多2次。禁止主agent回退为自己探测** **目标**:在写任何计划之前,搞清3件事: ① 环境现状(有什么、缺什么) ② 可用SOP ③ 关键不确定点 
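
The working-directory requirement stated at the top of this SOP (`./plan_XXX/`, where XXX is a short English task name) can be sketched as follows. This is only an illustration: `make_plan_dir` and its slug rule are hypothetical and not part of the repository, and the SOP itself only requires that the directory exist before plan mode is entered.

```python
import re
from pathlib import Path


def make_plan_dir(task_name: str, root: str = ".") -> Path:
    """Create <root>/plan_<slug>/ and return its path.

    Hypothetical helper: the slug rule below is an assumption, chosen so
    that the directory name stays a short ASCII identifier.
    """
    # Lowercase, then collapse every run of non-alphanumerics into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", task_name.lower()).strip("_")
    if not slug:
        raise ValueError("task_name needs at least one ASCII letter or digit")
    plan_dir = Path(root) / f"plan_{slug}"
    # exist_ok=True makes the call safe to repeat for the same task.
    plan_dir.mkdir(parents=True, exist_ok=True)
    return plan_dir
```

The returned path is where `plan.md` and `exploration_findings.md` would then live; rejecting names with no ASCII letters keeps the "short English name" convention explicit rather than silently producing `plan_`.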
**为什么必须用subagent**:主agent上下文是最稀缺资源,探测长输出会挤占规划执行空间。 ### 步骤1:创建目录(必做) + SOP匹配 + 设置plan标志(主agent直接做) 1. 创建工作目录 `mkdir plan_XXX/` 2. 从上下文中的 L1 Insight 索引匹配可用领域SOP 3. 更新checkpoint:`[任务] XXX | [需求] 一句话 | [约束] 关键限制 | [匹配SOP] ... | [进度] 探索态` ### 步骤2:启动探索subagent(监察模式) 按 subagent.md 启动探索subagent,**加 `--verbose`** 开启监察模式,input要点: - **任务**:探测环境信息,写入 `plan_XXX/exploration_findings.md` - **探测项**(按任务类型选做,不是全做): - 代码类 → 关键文件结构、依赖、入口点 - 浏览器类 → 目标页面当前状态、可交互元素 - 自动化类 → 环境检查(which/pip/路径/权限) - 数据类 → 抽样数据(首5行+尾5行+总量) - **输出格式**:`## 环境现状` / `## 关键发现` / `## 风险/不确定点` - **约束**:只读探测,禁止修改文件,≤10次工具调用 - **复杂度评估**:探测时注意记录数据规模(文件数、行数、页面数),写入findings供规划时判断委托 ### 步骤3:监察等待 + 读取结论 主agent主动观察output.txt进度(`--verbose`输出含原始工具结果),而非无脑sleep轮询: 1. **观察**:读output.txt,审查subagent的探测方向和原始数据 2. **纠偏**(按需): - 方向偏了 → 写 `_intervene` 追加指令纠正 - 缺少关键上下文 → 写 `_keyinfo` 注入信息 - 已获取足够信息 → 写 `_stop` 提前终止,节省轮次 3. **收取**:等待 `[ROUND END]`,读取 `exploration_findings.md` **产出**:`exploration_findings.md`(结构化发现报告),主agent基于此进入规划态,写入plan.md头部的「探索发现」段。主agent在监察过程中获得的一手认知也可直接用于规划。 --- ## 二、规划态(含审查门) ### 步骤4:读领域SOP → 写plan.md 先读探索态匹配到的SOP,然后写plan骨架。允许"⚠待确认",禁止以"没调研清楚"推迟。 **[D] 委托标注规则**:写每个步骤时,结合探索发现评估操作量,符合以下任一条件则标 `[D]`: - 需要读取大量代码/文件(预估 >3个文件或 >100行) - 需要浏览网页并提取信息 - 需要执行 3 次以上重复性操作 - 需要运行测试/构建并分析输出 不标 `[D]` 的情况:读/更新 plan.md、单文件小幅修改、ask_user、简单一次性命令 **plan.md格式**: ```markdown # 任务标题 需求:一句话 | 约束:关键限制 ## 探索发现 - 发现1:XXX(来源:file_read/web_scan/code_run) - 发现2:YYY - 不确定点:ZZZ ## 执行计划 1. [ ] 步骤1简述 SOP: xxx_sop.md 2. [D] 步骤2简述(委托subagent执行) SOP: yyy_sop.md 依赖:1 3. [P] 步骤3简述(并行,读subagent.md执行Map模式) SOP: yyy_sop.md 4. [?] 步骤4(条件分支) SOP: (无) ← 高风险 条件:X成功→4.1,否则→4.2 --- ## 验证检查点 N+1. [ ] **[VERIFY] 启动独立验证subagent** SOP: verify_sop.md plan_sop.md 操作:读plan_sop.md第四章内容 → 准备verify_context.json → 启动验证subagent → 读取VERDICT → 按结果处理 ⚠ 不可跳过,不可在未启动subagent的情况下标记[✓] --- ``` ### 步骤5:自检清单(主agent逐项检查) - □ 探索发现是否都反映在plan中?(没遗漏关键约束) - □ 每步的SOP标注是否合理?(SOP真的能解决该步?) - □ 步骤间依赖是否正确?(有没有隐含依赖没写出来) - □ 高风险步骤(SOP:无/不可逆)有没有清晰的执行思路? 
- □ 步骤粒度是否合适?(禁止"处理所有文件",必须展开具体条目) - □ **复杂/繁琐步骤是否标注了[D]?**(读大量代码/网页/重复操作必须委托subagent) - □ **是否包含"验证检查点"section,且有[VERIFY]步骤?(必须有,这是强制步骤)** ### 步骤6:用户确认 ask_user 确认plan后才能转入执行态。**⛔ 用户未确认不得执行。** ### 步骤7:转入执行态 更新checkpoint:`[执行] plan.md | 当前:步骤1 | ⚡有[P]标记必须读subagent.md执行Map模式` --- ## 三、执行态循环 > **核心原则:连续执行,不停顿汇报。** 做完一步立即 file_read(plan.md) 找下一个 `[ ]`,直到全部完成。 ### 每轮流程 1. **读plan** — `file_read(plan.md)` 定位第一个 `[ ]` 项 2. **读SOP** — 该步标注了SOP → 先 file_read 该SOP 3. **检查标记** — `[D]`标记 → 必须委托subagent执行,主agent只收结果摘要;`[P]`标记 → 读 subagent_sop.md 执行Map模式;`[?]`条件 → 评估条件选分支,未选标[SKIP] 4. **执行** — 无特殊标记的步骤由主agent自己执行 5. **Mini验证** — 快速确认产出存在且合理(file_read确认非空、检查exit code等) 6. **标记完成** — `file_patch` 标记 `[ ]` → `[✓ 简要结果]`(进度写入plan.md) 7. **继续** — 立即回到步骤1,file_read(plan.md) 执行下一个 `[ ]` ### 终止检查(最后一步标记后,不可跳过) file_read(plan.md) 全文扫描,确认所有步骤(含[VERIFY])均为 `[✓]`/`[✗]`,0个 `[ ]` 残留。 输出:`🏁 终止检查:[总步数]步全部完成,0个[ ]残留 → 任务结束` 若发现遗漏 → 继续执行,禁止声称完成。 ### ⚠ 执行态禁令 - **禁止凭记忆执行**:每次做新步骤前必须 `file_read(plan.md)`,不可"我记得下一步是..." - **禁止跳过验证步骤**:[VERIFY]步骤是强制的,不可以"任务都做完了"为由跳过 - **禁止未经终止检查就结束**:最后一步标记后必须 file_read 全文扫描确认0个[ ]残留,输出🏁终止确认行 - **禁止停下来输出纯文字汇报**:做完一步后必须立即 file_read(plan.md) 继续,不要输出进度总结 ### 💡 动态委托原则 即使步骤未标 `[D]`,执行中发现以下情况时,主动委托 subagent 处理: - 需要读取大量代码/文件才能理解上下文(>3个文件或预估 >100行) - 需要反复试错调试 - 需要浏览网页提取信息 做法:起 subagent 完成具体操作,要求返回精简摘要,主 agent 基于摘要继续决策。保持主 agent 上下文干净是第一优先级。 --- ## 四、验证态(subagent独立验证) > 全部步骤[✓]后进入。**强制**启动独立subagent做对抗性验证,避免上下文污染。 ### 触发条件 - 所有执行步骤标记为 `[✓]` - **所有plan模式任务必须经subagent验证**(主agent有确认偏误,易被表面成功迷惑) ### 步骤8:准备验证上下文 在 `./plan_XXX/` 下创建 `verify_context.json`,包含: - task_description:原始任务描述(用户原话) - plan_file:plan.md绝对路径 - task_type:code|data|browser|file|system - deliverables:交付物列表(type/path/expected) - required_checks:必做检查列表(check/tool) **传什么**:任务描述、plan路径、交付物清单、必做检查。**不传**:执行过程、调试记录。 ### 步骤9:启动验证subagent 按 subagent.md 标准流程启动验证subagent,input要点: - **角色**:你是独立验证者,工作是对抗性验证(证明交付物不能用) - **第一步强制**:file_read verify_sop.md 完整阅读验证SOP - **按 verify_sop.md 第3节**选择对应task_type的验证策略执行 - 
**每个检查必须有工具调用证据**(实际执行,不是叙述) - **任务描述**:(填入原始任务描述) - **交付物清单**:(填入deliverables列表) - **输出**:在 result.md 中按 verify_sop.md 第6节格式输出,最后一行 `VERDICT: PASS / FAIL / PARTIAL` - **约束**:3轮内完成,每轮至少1个实际工具调用 同时传入 verify_context.json 的路径,让subagent自行读取详细上下文。 ### 步骤10:收集验证结果 轮询 output.txt 等待 `[ROUND END]`,然后读取 result.md: 1. **找VERDICT行**:读取result.md最后几行,提取 `VERDICT: PASS/FAIL/PARTIAL` 2. **检查有效性**:如果所有PASS项都没有工具调用输出(只有叙述),视为验证无效,按FAIL处理 3. **按结果处理**: - **PASS** → 进入任务完成收尾 - **FAIL** → 进入修复循环 - **PARTIAL** → 主agent判断可接受则完成,否则修复 - **无VERDICT行** → 从output.txt提取关键信息,主agent自行判断PASS/FAIL **任务完成收尾**(验证PASS后执行): 1. 标记plan.md中 `[VERIFY]` 步骤为 `[✓]` 2. 更新checkpoint:`[完成] XXX任务 | [产出] ... | [经验] ...` 3. 向用户确认任务完成 **重要**:只有在验证PASS后,才能标记[VERIFY]为[✓]并声称任务完成。如果验证FAIL,需要进入修复循环。 **Fallback**:若subagent未产出result.md(turn耗尽),从output.txt提取VERDICT关键信息。 ### 修复循环(FAIL后) FAIL → 提取具体失败项 → 回执行态修复(不重新规划) → 修复完成 → 再次启动验证subagent → 最多2轮FAIL-重试,超过 ask_user 介入 修复时: 1. 将FAIL项作为新步骤追加到plan.md(标记为 `[FIX]`) 2. 只修复失败项,不重做已PASS的部分 3. 修复完成后重新准备verify_context.json(只含失败项) ### 特殊场景处理 浏览器/键鼠/定时任务等场景:主agent执行操作并导出证据(截图/录屏/日志)→ subagent验证证据文件。**禁止主agent自行判断PASS/FAIL**。 --- ## 五、失败处理 1. **记录**:checkpoint中 `step_X: [FAILED] 原因 (retry: N/3)` 2. **重试**:网络超时→自动重试3次(2s/4s/8s) | 配置错误→询问用户 | 其他→标[✗]跳过 3. **subagent失败**:查stderr.log→明确错误主agent修正重启 | 未知错误重试1次 | 最多重启2次 4. **依赖传播**:步骤失败后,后续依赖项标[SKIP] 5. 
**plan有误**:回退到规划态修正plan.md,重新过审查门 ## 强制约束 - 每项必须有独立完成判据 - 禁止"处理所有文件",必须展开具体条目 - 一次只做一项;计划有误回规划态修正 - 不可逆操作前多验证一步 ================================================ FILE: memory/procmem_scanner.py ================================================ import ctypes import ctypes.wintypes import argparse import yara import sys import os import json # Define WinAPI Types for 64-bit compatibility PHANDLE = ctypes.wintypes.HANDLE LPCVOID = ctypes.c_void_p LPVOID = ctypes.c_void_p SIZE_T = ctypes.c_size_t class MEMORY_BASIC_INFORMATION(ctypes.Structure): _fields_ = [ ("BaseAddress", LPVOID), ("AllocationBase", LPVOID), ("AllocationProtect", ctypes.wintypes.DWORD), ("RegionSize", SIZE_T), ("State", ctypes.wintypes.DWORD), ("Protect", ctypes.wintypes.DWORD), ("Type", ctypes.wintypes.DWORD), ] # Explicitly setup kernel32 functions with precise types k32 = ctypes.windll.kernel32 k32.OpenProcess.argtypes = [ctypes.wintypes.DWORD, ctypes.wintypes.BOOL, ctypes.wintypes.DWORD] k32.OpenProcess.restype = PHANDLE k32.VirtualQueryEx.argtypes = [PHANDLE, LPCVOID, ctypes.POINTER(MEMORY_BASIC_INFORMATION), SIZE_T] k32.VirtualQueryEx.restype = SIZE_T k32.ReadProcessMemory.argtypes = [PHANDLE, LPCVOID, LPVOID, SIZE_T, ctypes.POINTER(SIZE_T)] k32.ReadProcessMemory.restype = ctypes.wintypes.BOOL def is_hex_pattern(pattern): clean = pattern.replace(" ", "").replace("??", "") return all(c in "0123456789abcdefABCDEF" for c in clean) and (len(clean) % 2 == 0 or "??" 
in pattern) def build_rules(pattern, mode='auto'): use_hex = (mode == 'hex') or (mode == 'auto' and is_hex_pattern(pattern)) if use_hex: rule_text = f'rule CustomSearch {{ strings: $h = {{ {pattern.strip()} }} condition: $h }}' else: escaped = pattern.replace('\\', '\\\\').replace('"', '\\"') rule_text = f'rule CustomSearch {{ strings: $s = "{escaped}" ascii wide condition: $s }}' return yara.compile(source=rule_text) def format_llm_context(data, offset, base_addr, length=64): start = max(0, offset - length) end = min(len(data), offset + length + 16) chunk = data[start:end] abs_addr = (base_addr if base_addr else 0) + offset return { "address": hex(abs_addr), "offset": hex(offset), "hex": chunk.hex(), "ascii": "".join(chr(b) if 32 <= b <= 126 else "." for b in chunk), "hit_pos": offset - start } def scan_memory(pid, pattern, context_size=256, mode='auto', llm_mode=False): rules = build_rules(pattern, mode) h_proc = k32.OpenProcess(0x0400 | 0x0010, False, pid) if not h_proc: # OpenProcess failed: might be system process or higher integrity level return [f"Error: Cannot open process {pid}. 
(ErrorCode: {k32.GetLastError()})"] results = [] curr_addr = 0 mbi = MEMORY_BASIC_INFORMATION() # Range for 64-bit user space max_addr = 0x7FFFFFFFFFFF while curr_addr < max_addr: # Use cast to ensure pointer type is correct for 64-bit res = k32.VirtualQueryEx(h_proc, ctypes.cast(curr_addr, LPCVOID), ctypes.byref(mbi), ctypes.sizeof(mbi)) if res == 0: break # MEM_COMMIT = 0x1000, PAGE_READABLE bitmask if mbi.State == 0x1000 and (mbi.Protect & 0xEE): # 0xEE covers common readable flags buf = ctypes.create_string_buffer(mbi.RegionSize) read = SIZE_T(0) if k32.ReadProcessMemory(h_proc, ctypes.cast(mbi.BaseAddress, LPCVOID), buf, mbi.RegionSize, ctypes.byref(read)): data = buf.raw[:read.value] for match in rules.match(data=data): for inst in match.strings: offset = inst.instances[0].offset matched_data = inst.instances[0].matched_data base = mbi.BaseAddress if mbi.BaseAddress else 0 if llm_mode: results.append(format_llm_context(data, offset, base, length=context_size)) else: # Expand context based on context_size to capture full KEY+SALT start = max(0, offset - context_size) end = min(len(data), offset + len(matched_data) + context_size) results.append(f"Addr: {hex(base+offset)}\nHex: {data[start:end].hex()}") # Update address using the region size next_addr = (mbi.BaseAddress if mbi.BaseAddress else 0) + mbi.RegionSize if next_addr <= curr_addr: break curr_addr = next_addr k32.CloseHandle(h_proc) return results if __name__ == "__main__": parser = argparse.ArgumentParser() parser.add_argument("pid", type=int) parser.add_argument("pattern", type=str) parser.add_argument("--mode", default='auto') parser.add_argument("--llm", action="store_true") args = parser.parse_args() try: res = scan_memory(args.pid, args.pattern, mode=args.mode, llm_mode=args.llm) print(json.dumps(res, indent=2) if args.llm else f"Matches: {len(res)}") except Exception as e: print(f"Error: {e}") ================================================ FILE: memory/procmem_scanner_sop.md 
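================================================
# Memory Scanner SOP

## 1. Quick start
A memory signature search tool that supports hex patterns (CE-style) and string matching. It also offers an LLM mode that returns surrounding context so a model can analyze the memory around each hit.

**Calling from Python:**
```python
import sys
sys.path.append('../memory')  # add the tool directory to the import path
from procmem_scanner import scan_memory

# Example: search for a hex signature with llm_mode enabled to get context
results = scan_memory(pid, "48 8b ?? ?? 00", mode="hex", llm_mode=True)
```

**CLI:**
```powershell
# Basic search (prints only the match count); pid is a required positional argument
python ../memory/procmem_scanner.py <pid> "pattern" --mode string
# LLM mode (prints JSON that includes context; recommended)
python ../memory/procmem_scanner.py <pid> "pattern" --llm
```

## 2. Typical scenario: locating a struct or key data
1. Identify a leading signature or known constant for the target data (such as a specific header or magic number).
2. Search the target process for that signature: `scan_memory(pid, "4D 5A 90 00", mode="hex", llm_mode=True)`
3. Inspect the `hex` and `ascii` fields of each returned entry to see the raw bytes and ASCII preview around the target address; `hit_pos` gives the position of the match within that chunk.

## 3. Notes
- **Permissions**: administrator rights are not strictly required, but `PROCESS_QUERY_INFORMATION` and `PROCESS_VM_READ` access to the target process is.
- **Efficiency**: when searching large memory regions, supply as unique a signature as possible to reduce false positives.

## 4. CE-style differential scan for dynamic fields
Locates memory fields that change with user actions in self-drawn UIs such as WeChat (e.g. the current conversation title). Core idea: one full scan, followed by repeated ReadProcessMemory filtering.

================================================
FILE: memory/scheduled_task_sop.md
================================================
# Scheduled Task SOP

Directories: `../sche_tasks/` holds task-definition JSON files; `../sche_tasks/done/` holds execution reports.

## Task JSON format (*.json)
```json
{"schedule":"08:00", "repeat":"daily", "enabled":true, "prompt":"...", "max_delay_hours":6}
```
repeat options: daily | weekday | weekly | monthly | once | every_Nh (every N hours) | every_Nd (every N days)
max_delay_hours (optional, default 6): once this many hours have passed since schedule, the task no longer fires, so a late boot does not run a stale task

## Trigger flow
1. scheduler.py (in reflect/) polls sche_tasks/*.json every 60 seconds
2. A task fires only when all conditions hold: enabled=true + current time ≥ schedule + cooldown elapsed (based on the timestamp of the latest report in done/)
3. On firing, the scheduler assembles the prompt, including the report path `../sche_tasks/done/YYYY-MM-DD_<task_name>.md`
4. **First thing after receiving the task**: record the report target path with update_working_checkpoint so it is not forgotten during a long task
5.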
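When execution finishes, write the report to that path (the scheduler uses this file to decide the task has already run today)

## Logging and monitoring
- The scheduler writes its log to `sche_tasks/scheduler.log` automatically (triggered/skipped/errors)
- `scheduler.health_check()` returns a status list for all tasks (HEALTHY/OVERDUE/DISABLED/NEVER_RUN/ERROR)
- JSON parse errors, malformed schedule values, and unknown repeat types are all logged

## Notes
- once type: after one run the cooldown is 100 years (in effect, skipped forever)
- The task file only says what to do; the report path is generated by the scheduler and injected into the prompt
- The sche_tasks directory is at ../, i.e. under the code root

================================================
FILE: memory/supervisor_sop.md
================================================
# Supervisor Mode SOP

> You are a nitpicking overseer, not a worker doing the job. Your only task: make sure the working agent completes the task to a high standard. If an SOP exists, enforce it; if not, gatekeep with common sense and experience.

## Red lines
- **Never do the work yourself**: do not operate the browser, write code, or execute task steps. You only read, judge, and intervene
- **Reading the environment is allowed**: use file_read/web_scan/web_execute_js/code_run (read-only commands) to gather information that helps judge the working agent's progress and state

## Startup
1. **Prepare the checks** (pick the case that applies):
   - **With an SOP**: read the SOP text, extract every constraint (⚠️ / forbidden / must / format requirements), and store them step by step as a **constraint list** in working memory
   - **Without an SOP**: based on the nature and progress of the task, anticipate the key risk points ahead
2. **Start the subagent** (cwd = code root):
```
python agentmain.py --task {name} --bg --verbose
```
input.txt: `Use {SOP name} to complete {user task}` (give the goal only; do not restate the steps)

## Monitoring loop
Keep polling `temp/{task_name}/output.txt` for new content (read at sleep intervals). On each new output:
1. Work out which step the working agent is on and check it against the constraint list (if a constraint is unclear, reread the SOP text; never rely on memory)
2. Read environment information (files/web pages/processes) as needed to support the judgment
3.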
工作agent ask_user时给予回复 | 发现 | 干预 | |------|------| | 跳步 | `_intervene`:你跳过了StepN,先做 | | 细节遗漏 | `_intervene`:你漏了XX约束,重做/补上 | | 光说不做 | `_intervene`:别说了,直接做 | | 断言无据 | `_intervene`:你怎么确认的?验证一下 | | 连续失败 | `_intervene`:停,先读错误日志再决定 | | 感觉要偏 | `_intervene`:去重读SOP的StepN再继续 | | 即将进入中后期步骤 | `_keyinfo`:提前注入该步骤的⚠️细节(趁还没到,先塞进working memory) | ## 干预原则 - **沉默为主**:没问题不说话 - **一句话**:像用户一样直接说,禁长篇解释 - **`_keyinfo`只用于提前预注入**:在工作agent到达该步之前塞细节。已经犯错的一律用`_intervene`纠正 ================================================ FILE: memory/tmwebdriver_sop.md ================================================ # TMWebDriver SOP - 直接用web_scan/web_execute_js工具。本文件只记录特性和坑。 - 底层:`../TMWebDriver.py`通过Chrome扩展接管用户浏览器(保留登录态/Cookie) - 非Selenium/Playwright,保留用户浏览器登录态 ## 通用特性 - ⚠web_execute_js里使用`await`时需**显式`return`**才能拿到返回值(底层async包裹,不写return则返回null) - ✅web_scan自动穿透同源iframe;跨域iframe需CDP或postMessage(见下方章节) ## 限制(isTrusted) - JS事件`isTrusted=false`,敏感操作(如文件上传/部分按钮)可能被拦截;这类场景首选**CDP桥** - ⚠JS点击按钮打不开新tab→可能是浏览器弹窗拦截,换CDP点击试试 - Vue3自定义组件(Select/Dropdown):⭐优先vnode实例调用(无视口限制)→见**vue3_component_sop**;CDP坐标点击仅适合选项少且可见的场景 - 文件上传:⭐首选**DataTransfer API**(纯JS,无CDP依赖):`new File([content],name,{type}) → new DataTransfer().items.add(file) → input.files=dt.files → dispatch input+change`;CDP `DOM.setFileInputFiles` 在tmwd桥环境nodeId跨调用失效,不推荐;备选ljqCtrl物理点击 - 需转物理坐标时:`physX = (screenX + rect中心x) * dpr`,`physY = (screenY + chromeH + rect中心y) * dpr`;其中 `chromeH = outerHeight - innerHeight` ## 导航 - `web_scan` 仅读当前页不导航,切换网站用 `web_execute_js` + `location.href='url'` ## Google图搜 - class名混淆禁硬编码,点击结果用 `[role=button]` div - web_scan过滤边栏,弹出后用JS:文本`document.body.innerText`,大图遍历img按`naturalWidth`最大取src - "访问"链接:遍历a找`textContent.includes('访问')`的href - 缩略图:`img[src^="data:image"]`直接提取;大图src可能截断用`return img.src` ## Chrome下载PDF 场景:PDF链接在浏览器内预览而非下载 ```js fetch('PDF_URL').then(r=>r.blob()).then(b=>{ const a=document.createElement('a'); a.href=URL.createObjectURL(b); a.download='filename.pdf'; a.click(); }); ``` 注意:需同源或CORS允许,跨域先导航到目标域再执行 ## Chrome后台标签节流 - 
后台标签中`setTimeout`被Chrome intensive throttling延迟到≥1min/次,扩展脚本中避免依赖setTimeout轮询 - 某些SPA页面需CDP `Page.bringToFront`切到前台才会加载数据 ## CDP桥(tmwd_cdp_bridge扩展) ⭐首选 扩展路径:`assets/tmwd_cdp_bridge/`(需安装,含debugger权限) ⚠TID约定标识:首次运行自动生成到`assets/tmwd_cdp_bridge/config.js`(已gitignore),扩展通过manifest引用 调用:`web_execute_js` script直传JSON字符串(工具层自动识别对象格式,走WS→background.js cmd路由) ```js // 直接传JSON字符串作为script参数,无需DOM操作 web_execute_js script='{"cmd": "cookies"}' web_execute_js script='{"cmd": "tabs"}' web_execute_js script='{"cmd": "cdp", "tabId": N, "method": "...", "params": {...}}' web_execute_js script='{"cmd": "batch", "commands": [...]}' // 返回值直接是JSON结果 ``` 通信方式:⭐JSON字符串直传(首选) | TID DOM方式(TID元素+MutationObserver,web_scan/execute_js底层依赖) 单命令:`{cmd:'tabs'}` | `{cmd:'cookies'}` | `{cmd:'cdp', tabId:N, method:'...', params:{...}}` | `{cmd:'management', method:'list|reload|disable|enable', extId:'...'}` - management:list返回所有扩展信息;reload/disable/enable需传extId - contentSettings:`{cmd:'contentSettings', type:'automaticDownloads', pattern:'https://*/*', setting:'allow'}` - 绕过Chrome"下载多个文件"对话框(该对话框会阻塞整个浏览器JS执行) - type可选:automaticDownloads/popups/notifications等;setting:allow/block/ask - ⚠CDP的Browser.setDownloadBehavior在扩展中不可用(chrome.debugger仅tab级),此为替代方案 - ⭐batch混合:`{cmd:'batch', commands:[{cmd:'cookies'},{cmd:'tabs'},{cmd:'cdp',...},...]}` - 返回`{ok:true, results:[...]}`,一次请求多命令,CDP懒attach复用session - 子命令会自动继承外层batch的tabId(如cookies命令可正确获取当前页面URL) - `$N.path`引用第N个结果字段(0-indexed),如`"nodeId":"$2.root.nodeId"` - ⚠batch前序命令失败时,后续`$N`引用会静默变成undefined;要检查results数组中每项的ok状态 - 典型文件上传:getDocument(**depth:1**) → querySelector(`input[type=file]`) → setFileInputFiles - 思想: - 同一链路内保持nodeId来源一致,不混用querySelector路径与performSearch路径 - 上传后前端框架可能不感知,必要时JS补发`input`/`change`事件 - 上传前检查`input.accept`;多input时用accept/父容器语义区分 - 等待元素优先用`DOM.performSearch('input[type=file]')`做轻量轮询 - 瞬态input的核心是**缩短发现→setFileInputFiles时间窗**:优先同batch完成;再不行用DOM事件监听;猴子补丁仅作兜底思路 - ⚠tabId:CDP默认sender.tab.id(当前注入页),跨tab需显式tabId或先batch内tabs查 - 
⭐跨tab无需前台:指定tabId即可操作后台标签页 ## CDP点击完整生命周期(✅已验证) - 通用点击需**三事件序列**:mouseMoved → mousePressed → mouseReleased(间隔50-100ms) - 省略mouseMoved会导致MUI Tooltip/Ant Design Dropdown等hover依赖组件失效 - ⚠autofill释放是特例,只需mousePressed即可(见下方autofill章节) - ⭐**坐标系结论**:稳定状态下 CDP坐标 = `getBoundingClientRect()` 坐标,**无需修正** - ⚠**首次attach陷阱**:CDP debugger首次attach时Chrome弹出infobar("正在受自动化控制",~20px高),页面内容被推下 - 如果在attach前测量坐标、attach后发送点击 → 坐标偏移!(之前Currency下拉失败的根因) - ✅**解决**:确保测量坐标在CDP已attach稳定之后(即infobar已出现后再getBoundingClientRect) - 实践:首次CDP操作前先发一个无害的`mouseMoved(0,0)`预热,之后坐标系就稳定了 - ⭐**下拉框(Vue3 oxd-select等)CDP操作流程**: 1. 获取select元素rect → CDP点击打开下拉 2. 获取option元素rect → CDP点击选中(option是动态DOM,打开后才能测量) - 已验证:CDP点击对自定义下拉框有效,无isTrusted问题 - ⚠**限制**:选项多时底部option超出视口,CDP坐标够不着→此时应优先vnode方案(见vue3_component_sop) - 坐标修正(页面有transform:scale/zoom时): ```js var scale = window.visualViewport ? window.visualViewport.scale : 1; var zoom = parseFloat(getComputedStyle(document.documentElement).zoom) || 1; var realX = x * zoom; var realY = y * zoom; ``` - iframe内元素CDP点击:坐标需合成 `finalX = iframeRect.x + elRect.x` - 跨域iframe拿不到contentDocument: - ⚠`Target.getTargets`/`Target.attachToTarget`在CDP桥中返回"Not allowed"(chrome.debugger权限限制) - ⭐**已验证方案**:`Page.getFrameTree`找iframe frameId → `Page.createIsolatedWorld({frameId})`获取contextId → `Runtime.evaluate({expression, contextId})`在iframe中执行JS - batch链式引用:`$0.frameTree.childFrames`遍历找url匹配的frame,`$1.executionContextId`传给evaluate - postMessage中继方案仅在content script已注入iframe时有效,第三方支付iframe通常无注入 ## CDP文本输入(未验证,BBS#23) - `insertText`快但无key事件;受控组件需补dispatch `input`事件 - 需完整键盘模拟时用`dispatchKeyEvent`逐键派发 ## CDP DOM域穿透 closed Shadow DOM(未验证,BBS#24/#25) - `DOM.getDocument({depth:-1, pierce:true})` 穿透所有Shadow边界(含closed) - `DOM.querySelector({nodeId, selector})` 定位 → `DOM.getBoxModel({nodeId})` 取坐标 - getBoxModel返回content八值[x1,y1,...x4,y4],中心用**四点平均**:centerX=sum(x)/4, centerY=sum(y)/4 - ⚠不能简化为对角线平均——元素有transform:rotate/skew时四点非矩形 - querySelector**不能跨Shadow边界写组合选择器**,需分步:先找host再在其shadow内找子元素 - 
⚠nodeId在DOM变更后失效 → 用`backendNodeId`更稳定,或重新getDocument刷新 ## autofill获取与登录 检测:web_scan输出input带`data-autofilled="true"`,value显示为受保护提示(非真实值,Chrome安全保护需点击释放) - ⚠**前置条件:必须先CDP `Page.bringToFront` 切tab到前台**,Chrome仅在前台tab释放autofill保护值,后台tab物理点击无效 - ⭐**一键释放与登录**:bringToFront → mousePressed点任一字段(无需Released,一个释放全页) → 等500ms → 补input/change事件 → 点登录 ## 验证码/页面视觉截图 - ⭐首选CDP截图:`Page.captureScreenshot`(format:'png')→返回base64,无需前台/后台tab也行,全页高清 - 验证码canvas/img:JS `canvas.toDataURL()` 直接拿base64最干净 ## simphtml与TMWebDriver调试 - simphtml调试必须通过`code_run`注入JS到真实浏览器(Python端无法模拟DOM) - `d=TMWebDriver()`, `d.set_session('url_pattern')`, `d.execute_js(code)` → 返回`{'data': value}` - simphtml:`str(simphtml.optimize_html_for_tokens(html))` — 返回BS4 Tag需str() ## 连不上排查 web_scan失败时按序排查(自动检测优先,用户参与放最后): ①浏览器没开?→检查浏览器进程是否在跑(tasklist/ps),没有则启动并打开正常URL(⚠about:blank等内部页不加载扩展) ②WS后台挂了?→本机18766端口没监听即dead→手动**后台持续运行**`from TMWebDriver import TMWebDriver; TMWebDriver()`起master ③扩展没装?→读Chrome用户目录下`Secure Preferences`→`extensions.settings`中找`path`含`tmwd_cdp_bridge`的条目 找到→扩展已装,排查其他原因;没找到→走web_setup_sop ④以上都正常仍连不上→请求用户协助 ================================================ FILE: memory/ui_detect.py ================================================ #!/usr/bin/env python3 """ 极简UI元素检测脚本 - 基于OmniParser的YOLO模型 依赖: ultralytics, rapidocr-onnxruntime, pillow, numpy """ import sys from pathlib import Path from ultralytics import YOLO from PIL import Image, ImageDraw import numpy as np DEFAULT_MODEL = str(Path(__file__).resolve().parent.parent / 'temp' / 'weights' / 'icon_detect' / 'model.pt') # 可选:使用rapidocr做OCR try: from rapidocr_onnxruntime import RapidOCR ocr_engine = RapidOCR() HAS_OCR = True except ImportError: HAS_OCR = False print("警告: rapidocr未安装,跳过OCR功能") def detect_ui_elements(image_path, model_path=None, conf_threshold=0.25): model_path = model_path or DEFAULT_MODEL """检测UI元素并返回边界框""" # 加载模型 model = YOLO(model_path) # 推理 results = model(image_path, conf=conf_threshold, verbose=False) # 提取检测结果 detections = [] for 
result in results:
        boxes = result.boxes
        for box in boxes:
            x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
            conf = float(box.conf[0])
            cls = int(box.cls[0])
            detections.append({
                'bbox': [int(x1), int(y1), int(x2), int(y2)],
                'confidence': conf,
                'class': cls
            })
    return detections


def ocr_text(image_path):
    """Run OCR on the image and return the recognized text regions."""
    if not HAS_OCR:
        return []
    result, _ = ocr_engine(image_path)
    if not result:
        return []
    texts = []
    for item in result:
        bbox, text, conf = item
        texts.append({
            'text': text,
            'bbox': bbox,
            # ocr_utils.py notes that rapidocr returns conf as str, not float;
            # convert here so the ':.3f' formatting in main() does not fail
            'confidence': float(conf)
        })
    return texts


def visualize(image_path, detections, ocr_results=None, output_path=None):
    """Draw the detection results on the image."""
    img = Image.open(image_path)
    draw = ImageDraw.Draw(img)
    # Draw UI element boxes (red)
    for det in detections:
        x1, y1, x2, y2 = det['bbox']
        draw.rectangle([x1, y1, x2, y2], outline='red', width=2)
        draw.text((x1, y1-10), f"{det['confidence']:.2f}", fill='red')
    # Draw OCR text boxes (blue)
    if ocr_results:
        for ocr in ocr_results:
            bbox = ocr['bbox']
            points = [(bbox[i][0], bbox[i][1]) for i in range(4)]
            draw.polygon(points, outline='blue')
            draw.text((points[0][0], points[0][1]-10), ocr['text'][:10], fill='blue')
    if output_path:
        img.save(output_path)
    return img


def main():
    if len(sys.argv) < 2:
        # The model path is optional and falls back to DEFAULT_MODEL
        print("Usage: python ui_detect.py <image path> [model path] [output path]")
        print("Example: python ui_detect.py screenshot.png weights/icon_detect/model.pt output.png")
        sys.exit(1)
    image_path = sys.argv[1]
    model_path = sys.argv[2] if len(sys.argv) > 2 else DEFAULT_MODEL
    output_path = sys.argv[3] if len(sys.argv) > 3 else "output.png"
    print(f"Image: {image_path}")
    print(f"Model: {model_path}")
    # UI element detection
    print("\n[1/2] Detecting UI elements with YOLO...")
    detections = detect_ui_elements(image_path, model_path)
    print(f"Detected {len(detections)} UI elements")
    for i, det in enumerate(detections, 1):
        print(f"  {i}. bbox={det['bbox']}, conf={det['confidence']:.3f}")
    # OCR text recognition
    ocr_results = None
    if HAS_OCR:
        print("\n[2/2] Recognizing text with OCR...")
        ocr_results = ocr_text(image_path)
        print(f"Recognized {len(ocr_results)} text regions")
        for i, ocr in enumerate(ocr_results, 1):
            print(f"  {i}.
text='{ocr['text']}', conf={ocr['confidence']:.3f}") # 可视化 print(f"\n保存结果到: {output_path}") visualize(image_path, detections, ocr_results, output_path) # 输出JSON格式结果 import json result = { 'ui_elements': detections, 'ocr_texts': ocr_results or [] } json_path = output_path.replace('.png', '.json') with open(json_path, 'w', encoding='utf-8') as f: json.dump(result, f, ensure_ascii=False, indent=2) print(f"JSON结果: {json_path}") if __name__ == "__main__": main() ================================================ FILE: memory/vision_api.template.py ================================================ import base64, requests, sys, os from io import BytesIO from pathlib import Path # ============ 用户配置区(从 template 拷贝后只需改这里)============ CLAUDE_CONFIG_KEY = 'claude_config141' # mykey.py 中 Claude 配置的变量名 OPENAI_CONFIG_KEY = 'oai_config1' # mykey.py 中 OpenAI 配置的变量名 MODELSCOPE_API_KEY = '' # 直接填你的 ModelScope token DEFAULT_BACKEND = 'claude' # 默认后端: 'claude' / 'openai' / 'modelscope' # ================================================================= MODELSCOPE_API_BASE = 'https://api-inference.modelscope.cn' MODELSCOPE_MODEL = 'Qwen/Qwen3-VL-235B-A22B-Instruct' _DIR = os.path.dirname(os.path.abspath(__file__)) for _p in [os.path.join(_DIR, '..'), os.path.join(_DIR, '../..')]: if _p not in sys.path: sys.path.insert(0, _p) def ask_vision(image_input, prompt="详细描述这张图片的内容", timeout=60, max_pixels=1440000, backend=DEFAULT_BACKEND): try: b64 = _prepare_image(image_input, max_pixels) except Exception as e: return f"Error: 图片处理失败 - {type(e).__name__}: {e}" try: if backend == 'claude': return _call_claude(b64, prompt, timeout) elif backend == 'openai': mk = _load_config() cfg = getattr(mk, OPENAI_CONFIG_KEY) return _call_openai_compat( b64, prompt, timeout, apibase=cfg['apibase'], apikey=cfg['apikey'], model=cfg['model'], proxy=cfg.get('proxy') ) elif backend == 'modelscope': return _call_openai_compat( b64, prompt, timeout, apibase=MODELSCOPE_API_BASE, apikey=MODELSCOPE_API_KEY, 
model=MODELSCOPE_MODEL, proxy=None ) else: return f"Error: 未知backend '{backend}',可选: claude, openai, modelscope" except requests.exceptions.Timeout: return f"Error: 请求超时 (>{timeout}s)" except requests.exceptions.RequestException as e: return f"Error: API请求失败 - {type(e).__name__}: {e}" except (KeyError, ValueError) as e: return f"Error: 响应解析失败 - {e}" # ===================== 以下为内部实现 ===================== def _prepare_image(image_input, max_pixels=1440000): """加载+缩放+base64编码,返回b64字符串""" from PIL import Image if isinstance(image_input, Image.Image): img = image_input elif isinstance(image_input, (str, Path)): img = Image.open(image_input) else: raise TypeError(f"image_input 必须是文件路径或PIL Image,实际: {type(image_input).__name__}") w, h = img.size if w * h > max_pixels: scale = (max_pixels / (w * h)) ** 0.5 new_w, new_h = int(w * scale), int(h * scale) img = img.resize((new_w, new_h), Image.Resampling.LANCZOS) print(f" 📐 缩放: {w}×{h} → {new_w}×{new_h}") if img.mode in ('RGBA', 'LA', 'P'): rgb = Image.new('RGB', img.size, (255, 255, 255)) rgb.paste(img, mask=img.split()[-1] if img.mode == 'RGBA' else None) img = rgb buf = BytesIO() img.save(buf, format='JPEG', quality=80, optimize=True) b64 = base64.b64encode(buf.getvalue()).decode('utf-8') print(f" 📦 Base64: {len(buf.getvalue())/1024:.1f}KB") return b64 def _load_config(): import mykey return mykey def _call_claude(b64, prompt, timeout, max_tokens=1024): mk = _load_config() cfg = getattr(mk, CLAUDE_CONFIG_KEY) resp = requests.post( cfg['apibase'] + '/v1/messages', json={'model': cfg['model'], 'max_tokens': max_tokens, 'messages': [{ 'role': 'user', 'content': [ {'type': 'image', 'source': {'type': 'base64', 'media_type': 'image/jpeg', 'data': b64}}, {'type': 'text', 'text': prompt} ] }]}, headers={'x-api-key': cfg['apikey'], 'anthropic-version': '2023-06-01', 'content-type': 'application/json'}, timeout=timeout ) resp.raise_for_status() return resp.json()['content'][0]['text'] def _call_openai_compat(b64, prompt, timeout, *, 
apibase, apikey, model, proxy=None): proxies = {'https': proxy, 'http': proxy} if proxy else None resp = requests.post( apibase.rstrip('/') + '/v1/chat/completions', json={'model': model, 'messages': [{ 'role': 'user', 'content': [ {'type': 'text', 'text': prompt}, {'type': 'image_url', 'image_url': {'url': f'data:image/jpeg;base64,{b64}'}} ] }]}, headers={'Authorization': f"Bearer {apikey}", 'Content-Type': 'application/json'}, proxies=proxies, timeout=timeout ) resp.raise_for_status() return resp.json()['choices'][0]['message']['content'] if __name__ == '__main__': pass ================================================ FILE: memory/vision_sop.md ================================================ # Vision API SOP ## ⚠️ 前置规则(必须遵守) 1. **先枚举窗口**:调用 vision 前必须先用 `pygetwindow` 枚举窗口标题,确认目标窗口存在且已激活到前台。窗口不存在就不要截图。 2. **🚫 禁止全屏截图**:必须先利用ljqCtrl截取窗口区域。能截局部(如标题栏)就不截整窗口,能截窗口就绝不全屏。全屏截图在任何场景下都不允许。 3. **能不用 vision 就不用**:如果窗口标题/本地 OCR(`ocr_utils.py`)能获取所需信息,就不要调用 vision API,省 token 且更可靠。Vision 是最后手段。 ## 快速用法 ```python from vision_api import ask_vision result = ask_vision(image, prompt="描述图片内容", backend="claude", timeout=60, max_pixels=1_440_000) # image: 文件路径(str/Path) 或 PIL Image # backend: 'claude'(默认) | 'openai' | 'modelscope' # 返回 str:成功为模型回复,失败为 'Error: ...' ``` ## 如果没有 `vision_api.py`,初次构建vision能力 1. 复制 `memory/vision_api.template.py` → `memory/vision_api.py` 2. 只改头部"用户配置区":去 `mykey.py` 里扫描变量名(⚠️ 只看名字,禁止输出 apikey 值),尝试找能用配置名填入 `CLAUDE_CONFIG_KEY` / `OPENAI_CONFIG_KEY`,`DEFAULT_BACKEND` 选后端,并测试 3. 
Fallback: when no usable config exists, request a token at `https://modelscope.cn/my/myaccesstoken` and put it in `MODELSCOPE_API_KEY`

================================================
FILE: memory/vue3_component_sop.md
================================================
# Vue 3 Custom Component JS Operation SOP

## Problem
Vue 3 custom components (e.g. OxdSelect) bind events via `addEventListener`. Events produced by JS `dispatchEvent` have `isTrusted: false`, and the component does not respond.
- `element.click()` has no effect (the component may bind mousedown rather than click)
- `dispatchEvent(new MouseEvent('mousedown'))` has no effect (isTrusted:false)
- `element.focus()` does not trigger the focus handler bound by Vue

## Solution: operate on the Vue component instance directly

### 1. Get the Vue 3 root entry
```javascript
const rootVnode = document.getElementById('app')._vnode;
```

### 2. Walk the vnode tree to match a DOM element
```javascript
function findCompByEl(vnode, targetEl, depth = 0) {
  if (depth > 50 || !vnode) return null;
  const comp = vnode.component;
  if (comp) {
    if (comp.vnode?.el === targetEl || comp.subTree?.el === targetEl) return comp;
    if (comp.vnode?.el?.contains?.(targetEl)) {
      const result = findCompByEl(comp.subTree, targetEl, depth + 1);
      if (result) return result;
      return comp;
    }
    const subResult = findCompByEl(comp.subTree, targetEl, depth + 1);
    if (subResult) return subResult;
  }
  if (vnode.children && Array.isArray(vnode.children)) {
    for (const child of vnode.children) {
      const result = findCompByEl(child, targetEl, depth + 1);
      if (result) return result;
    }
  }
  if (vnode.dynamicChildren) {
    for (const child of vnode.dynamicChildren) {
      const result = findCompByEl(child, targetEl, depth + 1);
      if (result) return result;
    }
  }
  return null;
}
```

### 3. Call component methods
```javascript
// The target DOM node's parentElement is usually the component root element
const comp = findCompByEl(rootVnode, targetElement.parentElement);
const ctx = comp.proxy;
// List the available methods
Object.keys(ctx).filter(k => !k.startsWith('_') && !k.startsWith('$'));
// Select-type components: call onSelect directly
ctx.onSelect({id: 'USD', label: 'United States Dollar'});
// Get the option list
ctx.computedOptions; // [{id, label, _selected}, ...]
``` ## 组件层级注意 - **展示层**(如 OxdSelectText):只有 onToggle/onFocus/onBlur,调用无实际效果 - **逻辑层**(如 OxdSelectInput,是展示层的父组件):有 openDropdown/onSelect/computedOptions/onCloseDropdown - 定位逻辑层:用 `targetElement.parentElement` 而非 targetElement 本身 ### 弹窗内 Select 同样纯 JS 优先(已验证) - 弹窗(`.oxd-dialog-sheet`)内的 `.oxd-select-text` 用循环向上查找同样能命中 `OxdSelectInput`,`onSelect` 正常工作。 - 不需要 CDP 兜底。仅当循环 8 层仍找不到组件时才考虑 CDP 打开+JS 点 option。 ### 循环向上查找模式(推荐) 单层 `parentElement` 可能不够,用循环更健壮: ```javascript function findSelectComp(selectTextEl) { for (let el = selectTextEl, up = 0; el && up < 8; el = el.parentElement, up++) { const comp = findCompByEl(rootVnode, el); if (comp?.proxy?.onSelect && comp.proxy.computedOptions?.length) return comp; } return null; // 找不到再考虑CDP兜底 } ``` ## 普通 Input/Textarea 操作(nativeSetter) Vue 3 的 `v-model` 监听 input 事件,直接 `el.value = x` 不触发响应式。需用原型 setter: ```javascript // Input const setter = Object.getOwnPropertyDescriptor(HTMLInputElement.prototype, 'value').set; setter.call(inputEl, '新值'); inputEl.dispatchEvent(new Event('input', {bubbles: true})); inputEl.dispatchEvent(new Event('change', {bubbles: true})); // Textarea const taSetter = Object.getOwnPropertyDescriptor(HTMLTextAreaElement.prototype, 'value').set; taSetter.call(textareaEl, '内容'); textareaEl.dispatchEvent(new Event('input', {bubbles: true})); ``` ### Date Input 特殊处理 日期组件通常有 blur 校验,需要 focus→赋值→blur 完整链: ```javascript dateInput.focus(); setter.call(dateInput, '2026-08-05'); dateInput.dispatchEvent(new Event('input', {bubbles: true})); dateInput.dispatchEvent(new Event('change', {bubbles: true})); dateInput.dispatchEvent(new Event('blur', {bubbles: true})); ``` ### Button 普通 `.click()` 即可,Vue 3 不检查 button click 的 isTrusted。 ### File Upload (input[type="file"]) 浏览器安全模型禁止JS直接 `input.value='path'`,但可用 DataTransfer API 构造 FileList: ```javascript const fileInput = document.querySelector('input[type="file"]'); const content = '文件内容'; const file = new File([content], 'filename.txt', { type: 'text/plain', lastModified: 
Date.now() }); const dt = new DataTransfer(); dt.items.add(file); fileInput.files = dt.files; // Chrome 62+ 支持 fileInput.dispatchEvent(new Event('input', { bubbles: true })); fileInput.dispatchEvent(new Event('change', { bubbles: true })); ``` - 适用于任何框架(非Vue3特有),纯浏览器API - 可构造任意类型文件(Blob/ArrayBuffer均可传入File构造器) - ⚠ CDP `DOM.setFileInputFiles` 只设files属性不触发事件(Chrome通用行为),DataTransfer+dispatch是唯一纯JS方案 - ⚠ 确保弹窗/容器已打开再querySelector,否则input不在DOM中 ## 泛化到其他 Vue3 站点(未逐一验证,思路层面) 本 SOP 的核心方法(根 vnode → findCompByEl → proxy)是 Vue3 通用的,但具体方法名/属性名因 UI 库而异。 面对陌生 Vue3 站点的探测思路: 1. **确认是 Vue3** — `document.getElementById('app')?.__vue_app__` 存在即可 2. **定位目标 DOM** — 用选择器找到要操作的元素(如某个 select wrapper) 3. **从 DOM 反查组件** — 用 findCompByEl 从目标元素及其父级向上找,拿到 component 4. **探测组件能力** — 拿到 comp 后查看: - `Object.keys(comp.proxy.$options.methods || {})` → 组件方法名 - `Object.keys(comp.props || {})` → props - `Object.keys(comp.setupState || {})` → setup 暴露的响应式数据和函数 - 重点找类似 onSelect/handleSelect/select/setValue 的方法,以及 options/items/computedOptions 之类的选项列表 5. **试调** — 找到疑似选中方法后,传入选项对象试调,观察 DOM 是否更新 6. **选项格式** — 不同库的 option 结构不同(可能是 `{id, label}` 也可能是 `{value, text}` 或纯字符串),从选项列表数据中取一个完整对象传入即可 注意事项: - 有些库用 `emits` 而非 methods,选中逻辑可能在父组件而非子组件 - 有些库 prod build 会 minify 方法名,此时 setupState 里的 key 可能是短名,需结合行为猜测 - Composition API 组件的逻辑主要在 setupState 而非 $options.methods - 如果 proxy 上找不到方法,试试 `comp.exposed`(`