Repository: zai-org/Open-AutoGLM
Branch: main
Commit: 86f55382982f
Files: 56
Total size: 2.7 MB

Directory structure:
gitextract_6ew0oeqv/

├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.yaml
│   │   └── feature-request.yaml
│   └── PULL_REQUEST_TEMPLATE.md
├── .gitignore
├── .pre-commit-config.yaml
├── LICENSE
├── README.md
├── README_coding_agent.md
├── README_en.md
├── docs/
│   └── ios_setup/
│       └── ios_setup.md
├── examples/
│   ├── basic_usage.py
│   └── demo_thinking.py
├── ios.py
├── main.py
├── phone_agent/
│   ├── __init__.py
│   ├── actions/
│   │   ├── __init__.py
│   │   ├── handler.py
│   │   └── handler_ios.py
│   ├── adb/
│   │   ├── __init__.py
│   │   ├── connection.py
│   │   ├── device.py
│   │   ├── input.py
│   │   └── screenshot.py
│   ├── agent.py
│   ├── agent_ios.py
│   ├── config/
│   │   ├── __init__.py
│   │   ├── apps.py
│   │   ├── apps_harmonyos.py
│   │   ├── apps_ios.py
│   │   ├── i18n.py
│   │   ├── prompts.py
│   │   ├── prompts_en.py
│   │   ├── prompts_zh.py
│   │   └── timing.py
│   ├── device_factory.py
│   ├── hdc/
│   │   ├── __init__.py
│   │   ├── connection.py
│   │   ├── device.py
│   │   ├── input.py
│   │   └── screenshot.py
│   ├── model/
│   │   ├── __init__.py
│   │   └── client.py
│   └── xctest/
│       ├── __init__.py
│       ├── connection.py
│       ├── device.py
│       ├── input.py
│       └── screenshot.py
├── requirements.txt
├── resources/
│   ├── WECHAT.md
│   ├── privacy_policy.txt
│   └── privacy_policy_en.txt
├── scripts/
│   ├── check_deployment_cn.py
│   ├── check_deployment_en.py
│   ├── sample_messages.json
│   └── sample_messages_en.json
└── setup.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.yaml
================================================
name: "\U0001F41B Bug Report"
description: Submit a bug report to help us improve Open-AutoGLM / 提交一个 Bug 问题报告来帮助我们改进 Open-AutoGLM
body:
  - type: textarea
    id: system-info
    attributes:
      label: System Info / 系統信息
      description: Your operating environment / 您的运行环境信息
      placeholder: Includes Cuda version, Transformers version, Python version, operating system, hardware information (if you suspect a hardware problem)... / 包括Cuda版本，Transformers版本，Python版本，操作系统，硬件信息(如果您怀疑是硬件方面的问题)...
    validations:
      required: true

  - type: textarea
    id: who-can-help
    attributes:
      label: Who can help? / 谁可以帮助到您？
      description: |
        Your issue will be replied to more quickly if you can figure out the right person to tag with @
        All issues are read by one of the maintainers, so if you don't know who to tag, just leave this blank and our maintainer will ping the right person.

        Please tag fewer than 3 people.

        如果您能找到合适的标签 @，您的问题会更快得到回复。
        所有问题都会由我们的维护者阅读，如果您不知道该标记谁，只需留空，我们的维护人员会找到合适的开发组成员来解决问题。

        标记的人数应该不超过 3 个人。

        If it's not a bug in these three subsections, you may not specify the helper. Our maintainer will find the right person in the development group to solve the problem.

        如果不是这三个子版块的bug，您可以不指明帮助者，我们的维护人员会找到合适的开发组成员来解决问题。

      placeholder: "@Username ..."

  - type: checkboxes
    id: information-scripts-examples
    attributes:
      label: Information / 问题信息
      description: 'The problem arises when using: / 问题出现在'
      options:
        - label: "The official example scripts / 官方的示例脚本"
        - label: "My own modified scripts / 我自己修改的脚本和任务"

  - type: textarea
    id: reproduction
    validations:
      required: true
    attributes:
      label: Reproduction / 复现过程
      description: |
        Please provide a code example that reproduces the problem you encountered, preferably with a minimal reproduction unit.
        If you have code snippets, error messages, stack traces, please provide them here as well.
        Please format your code correctly using code tags. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
        Do not use screenshots, as they are difficult to read and (more importantly) do not allow others to copy and paste your code.

        请提供能重现您遇到的问题的代码示例,最好是最小复现单元。
        如果您有代码片段、错误信息、堆栈跟踪，也请在此提供。
        请使用代码标签正确格式化您的代码。请参见 https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
        请勿使用截图，因为截图难以阅读，而且（更重要的是）不允许他人复制粘贴您的代码。
      placeholder: |
        Steps to reproduce the behavior/复现Bug的步骤:

          1.
          2.
          3.

  - type: textarea
    id: expected-behavior
    validations:
      required: true
    attributes:
      label: Expected behavior / 期待表现
      description: "A clear and concise description of what you would expect to happen. /简单描述您期望发生的事情。"


================================================
FILE: .github/ISSUE_TEMPLATE/feature-request.yaml
================================================
name: "\U0001F680 Feature request"
description: Submit a request for a new Open-AutoGLM feature / 提交一个新的 Open-AutoGLM 的功能建议
labels: [ "feature" ]
body:
  - type: textarea
    id: feature-request
    validations:
      required: true
    attributes:
      label: Feature request  / 功能建议
      description: |
        A brief description of the functional proposal. Links to corresponding papers and code are desirable.
        对功能建议的简述。最好提供对应的论文和代码链接

  - type: textarea
    id: motivation
    validations:
      required: true
    attributes:
      label: Motivation / 动机
      description: |
        Your motivation for making the suggestion. If that motivation is related to another GitHub issue, link to it here.
        您提出建议的动机。如果该动机与另一个 GitHub 问题有关，请在此处提供对应的链接。

  - type: textarea
    id: contribution
    validations:
      required: true
    attributes:
      label: Your contribution / 您的贡献
      description: |

        Your PR link or any other link you can help with.
        您的PR链接或者其他您能提供帮助的链接。


================================================
FILE: .github/PULL_REQUEST_TEMPLATE.md
================================================
# Contribution Guide

We welcome your contributions to this repository. To ensure elegant code style and better code quality, we have prepared
the following contribution guidelines.

## What We Accept

+ This PR fixes a typo or improves the documentation (if this is the case, you may skip the other checks).
+ This PR fixes a specific issue — please reference the issue number in the PR description. Make sure your code strictly
  follows the coding standards below.
+ This PR introduces a new feature — please clearly explain the necessity and implementation of the feature. Make sure
  your code strictly follows the coding standards below.

## Code Style Guide

Good code style is an art. We have prepared a `pre-commit` hook to enforce consistent code
formatting across the project. You can clean up your code following the steps below:

```shell
pre-commit run --all-files
```

If your code complies with the standards, you should not see any errors.

## Naming Conventions

+ Please use **English** for naming; do not use Pinyin or other languages. All comments should also be in English.
+ Follow **PEP8** naming conventions strictly, and use underscores to separate words. Avoid meaningless names such as
  `a`, `b`, `c`.

## For glmv-reward Contributors

Before opening a PR, please run:

```bash
cd glmv-reward/
uv sync
uv run poe lint
uv run poe typecheck
```


================================================
FILE: .gitignore
================================================
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
ENV/
env/
.venv/

# IDE
.idea/
.vscode/
*.swp
*.swo
*~

# Testing
.pytest_cache/
.coverage
htmlcov/
.tox/
.nox/

# Type checking
.mypy_cache/

# Jupyter
.ipynb_checkpoints/

# OS
.DS_Store
Thumbs.db

# Project specific
*.log
/tmp/
screenshots/

# Keep old files during transition
call_model.py
app_package_name.py

.claude/
.venv

================================================
FILE: .pre-commit-config.yaml
================================================
default_install_hook_types:
  - pre-commit
  - commit-msg
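# A YAML mapping may define `exclude` only once (with duplicate keys the last one wins),
# so both paths are combined into a single regex.
exclude: '^(phone_agent/config/apps\.py|README_en\.md)$'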
default_stages:
  - pre-commit # Run locally
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
  rev: v0.11.7
  hooks:
  - id: ruff
    args: [--output-format, github, --fix, --select, I]
  - id: ruff-format
- repo: https://github.com/crate-ci/typos
  rev: v1.32.0
  hooks:
  - id: typos
- repo: https://github.com/jackdewinter/pymarkdown
  rev: v0.9.29
  hooks:
  - id: pymarkdown
    args: [fix]


================================================
FILE: LICENSE
================================================
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to the Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright 2025 Zhipu AI

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: README.md
================================================
# Open-AutoGLM

[Readme in English](README_en.md)

<div align="center">
<img src=resources/logo.svg width="20%"/>
</div>
<p align="center">
    👋 Join our <a href="resources/WECHAT.md" target="_blank">WeChat</a> community
</p>
<p align="center">
    👋 Follow the Zhipu AI Input Method account on <a href="https://x.com/Autotyper_Agent?s=20" target="_blank">X</a>
</p>
<p align="center">
    🎤 Go further and try "issuing commands by voice" in our product, <a href="https://autoglm.zhipuai.cn/autotyper/" target="_blank">Zhipu AI Input Method</a>
</p>
<p align="center">
    The <a href="https://mp.weixin.qq.com/s/wRp22dmRVF23ySEiATiWIQ" target="_blank">AutoGLM 实战派</a> developer incentive campaign is under way: get the project running or build something on top of it to share a cash prize pool worth tens of thousands of yuan! Submit your work 👉 <a href="https://zhipu-ai.feishu.cn/share/base/form/shrcnE3ZuPD5tlOyVJ7d5Wtir8c?from=navigation" target="_blank">here</a>
</p>

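## Quick Install (the Lazy Way)

With Claude Code configured for the [GLM Coding Plan](https://bigmodel.cn/glm-coding), you can enter the following prompt to deploy this project quickly.

```
Read the documentation and install AutoGLM for me
https://raw.githubusercontent.com/zai-org/Open-AutoGLM/refs/heads/main/README.md
```

## Project Overview

Phone Agent is a mobile intelligent-assistant framework built on AutoGLM. It understands phone screen content multimodally and completes tasks for the user through automated operations. The system controls the device through
ADB (Android Debug Bridge), perceives the screen with a vision-language model, and uses planning to generate and execute a sequence of actions. The user only needs to describe the request in natural language, such as "打开小红书搜索美食" ("Open Xiaohongshu and search for food"); Phone
Agent then parses the intent, interprets the current screen, plans the next action, and carries the task through to completion. The system also has a built-in confirmation mechanism for sensitive operations and supports human takeover on login or CAPTCHA screens. In addition, it offers remote
ADB debugging: devices can be connected over WiFi or the network for flexible remote control and development.

> ⚠️
> This project is for research and learning only. Using it to obtain information illegally, to interfere with systems, or for any unlawful activity is strictly prohibited. Please review the [Terms of Use](resources/privacy_policy.txt) carefully.

## Integration with Other Automation Tools

### Midscene.js

[Midscene.js](https://midscenejs.com/zh/index.html) is an open-source UI automation SDK driven by vision models. It supports multi-platform automation through flow syntax written in JavaScript or YAML.

Midscene.js has already been adapted to the AutoGLM model. The [Midscene.js integration guide](https://midscenejs.com/zh/model-common-config.html#auto-glm) lets you quickly try AutoGLM automation on iOS and Android devices.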

## Model Downloads

| Model                         | Download Links                                                                                                                                                         |
|-------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| AutoGLM-Phone-9B              | [🤗 Hugging Face](https://huggingface.co/zai-org/AutoGLM-Phone-9B)<br>[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/AutoGLM-Phone-9B)                           |
| AutoGLM-Phone-9B-Multilingual | [🤗 Hugging Face](https://huggingface.co/zai-org/AutoGLM-Phone-9B-Multilingual)<br>[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/AutoGLM-Phone-9B-Multilingual) |

`AutoGLM-Phone-9B` is optimized for Chinese mobile apps, while `AutoGLM-Phone-9B-Multilingual` supports English scenarios and suits apps that contain English or other non-Chinese content.

## Android Environment Setup

### 1. Python Environment

Python 3.10 or later is recommended.

### 2. Command-Line Debugging Tools

Choose the tool that matches your device type:

#### Android Devices: ADB

1. Download the official ADB [package](https://developer.android.com/tools/releases/platform-tools?hl=zh-cn) and extract it to a path of your choice
2. Configure the environment variable

- macOS: in `Terminal` or any other command-line tool

  ```bash
  # Assumes the extracted directory is ~/Downloads/platform-tools. Adjust the command if yours differs.
  export PATH=${PATH}:~/Downloads/platform-tools
  ```

- Windows: follow this [third-party tutorial](https://blog.csdn.net/x2584179909/article/details/108319973).

#### HarmonyOS Devices (HarmonyOS NEXT or later): HDC

1. Download the HDC tool:
   - From the [HarmonyOS SDK](https://developer.huawei.com/consumer/cn/download/)
2. Configure the environment variable

- macOS/Linux:

  ```bash
  # Assumes the extracted directory is ~/Downloads/harmonyos-sdk/toolchains. Adjust to your actual path.
  export PATH=${PATH}:~/Downloads/harmonyos-sdk/toolchains
  ```

- Windows: add the directory containing the HDC tool to the system PATH environment variable

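### 3. An Android 7.0+ or HarmonyOS Device with `Developer Mode` and `USB Debugging` Enabled

1. Enable Developer Mode: usually you go to `Settings > About phone > Build number` and tap it rapidly about 10
   times until a pop-up says "Developer mode enabled". The steps vary slightly between phones; if you can't find the option, search online for a guide.
2. Enable USB debugging: once Developer Mode is on, `Settings > Developer options > USB debugging` appears. Turn it on.
3. On some models the developer options only take effect after the device is rebooted. To test, connect the phone to the computer with a USB cable and run `adb devices`
   to see whether device information is listed. If nothing is listed, the connection failed.

**Be sure to check the relevant permissions carefully**

![Permissions](resources/screenshot-20251209-181423.png)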

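### 4. Install ADB Keyboard (Android Only, Used for Text Input)

**Note: HarmonyOS devices use the native input method and do not need ADB Keyboard.**

If you are using an Android device:

Download the [APK](https://github.com/senzhk/ADBKeyBoard/blob/master/ADBKeyboard.apk) and install it on the Android device.
After installation you also need to enable `ADB Keyboard` under `Settings > Input method` or `Settings > Keyboard list` for it to take effect (or run `adb shell ime enable com.android.adbkeyboard/.AdbIME`; see [How-to-use](https://github.com/senzhk/ADBKeyBoard/blob/master/README.md#how-to-use))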
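
The enable command above can also be scripted. The following is a minimal sketch, not part of this repository: `build_ime_commands` and `enable_adb_keyboard` are hypothetical helper names, and the extra `ime set` step (which switches the active keyboard) is an assumption taken from Android's standard `ime` shell tool rather than something this README requires.

```python
import subprocess

ADB_KEYBOARD_IME = "com.android.adbkeyboard/.AdbIME"


def build_ime_commands(device_id=None):
    """Return the adb argument lists that enable and then select ADB Keyboard."""
    base = ["adb"]
    if device_id:
        # `-s <serial>` targets one device when several are connected.
        base += ["-s", device_id]
    return [
        base + ["shell", "ime", "enable", ADB_KEYBOARD_IME],
        base + ["shell", "ime", "set", ADB_KEYBOARD_IME],  # assumption: also make it active
    ]


def enable_adb_keyboard(device_id=None):
    """Run the commands; raises CalledProcessError if adb reports a failure."""
    for cmd in build_ime_commands(device_id):
        subprocess.run(cmd, check=True)
```

Calling `enable_adb_keyboard()` needs `adb` on the PATH and a connected device; `build_ime_commands()` alone can be used to inspect the commands first.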

## iPhone Environment Setup

If you are using an iPhone, see the dedicated iOS setup document:

📱 [iOS Setup Guide](docs/ios_setup/ios_setup.md)

It explains in detail how to configure WebDriverAgent and the iPhone so that AutoGLM can be used on iOS.

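## Deployment Preparation

### 1. Install Dependencies

```bash
pip install -r requirements.txt
pip install -e .
```

### 2. Configure ADB or HDC

#### Android Devices

Make sure the **USB cable supports data transfer** and is not charge-only.

Make sure ADB is installed and the device is connected with a **USB cable**:

```bash
# List connected devices
adb devices

# The output should show your device, for example:
# List of devices attached
# emulator-5554   device
```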
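
If you want to run this check from a script, the output format shown above is easy to parse. The sketch below is independent of the repository's own `phone_agent.adb` helpers; `parse_adb_devices` is a hypothetical name, and the sample text simply mirrors the example output above.

```python
def parse_adb_devices(output):
    """Parse the text printed by `adb devices` into (serial, state) pairs."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header line, and adb daemon notices ("* daemon ...").
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


sample = """List of devices attached
emulator-5554   device
"""
print(parse_adb_devices(sample))  # [('emulator-5554', 'device')]
```

A state other than `device` (for example `unauthorized` or `offline`) means adb can see the phone but cannot use it yet, which usually points back to the USB debugging permission prompt.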

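#### HarmonyOS Devices

Make sure the **USB cable supports data transfer** and is not charge-only.

Make sure HDC is installed and the device is connected with a **USB cable**:

```bash
# List connected devices
hdc list targets

# The output should show your device, for example:
# 7001005458323933328a01bce01c2500
```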

### 3. Start the Model Service

You can deploy the model service yourself or use a third-party model provider.

#### Option A: Use a Third-Party Model Service

If you don't want to deploy the model yourself, the following third-party services already host our model:

**1. Zhipu BigModel**

- Documentation: https://docs.bigmodel.cn/cn/api/introduction
- `--base-url`: `https://open.bigmodel.cn/api/paas/v4`
- `--model`: `autoglm-phone`
- `--apikey`: request your API key on the Zhipu platform

**2. ModelScope**

- Documentation: https://modelscope.cn/models/ZhipuAI/AutoGLM-Phone-9B
- `--base-url`: `https://api-inference.modelscope.cn/v1`
- `--model`: `ZhipuAI/AutoGLM-Phone-9B`
- `--apikey`: request your API key on the ModelScope platform

Example usage with a third-party service:

```bash
# Using Zhipu BigModel (task: "Open Meituan and search for nearby hotpot restaurants")
python main.py --base-url https://open.bigmodel.cn/api/paas/v4 --model "autoglm-phone" --apikey "your-bigmodel-api-key" "打开美团搜索附近的火锅店"

# Using ModelScope
python main.py --base-url https://api-inference.modelscope.cn/v1 --model "ZhipuAI/AutoGLM-Phone-9B" --apikey "your-modelscope-api-key" "打开美团搜索附近的火锅店"
```
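
When switching between providers often, it can help to keep the two sets of flags in one place. This is only a sketch under stated assumptions: `PROVIDERS` and `build_main_argv` are hypothetical names, while the URLs, model names, and `main.py` flags are the ones listed above.

```python
import sys

# Settings copied from the provider list above.
PROVIDERS = {
    "bigmodel": {
        "base_url": "https://open.bigmodel.cn/api/paas/v4",
        "model": "autoglm-phone",
    },
    "modelscope": {
        "base_url": "https://api-inference.modelscope.cn/v1",
        "model": "ZhipuAI/AutoGLM-Phone-9B",
    },
}


def build_main_argv(provider, api_key, task):
    """Build the argument list for `python main.py ...` for one provider."""
    cfg = PROVIDERS[provider]  # KeyError for an unknown provider name
    return [
        sys.executable, "main.py",
        "--base-url", cfg["base_url"],
        "--model", cfg["model"],
        "--apikey", api_key,
        task,
    ]
```

The returned list can be passed to `subprocess.run(...)` from the repository root; passing arguments as a list avoids shell quoting problems with non-ASCII task text.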

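#### Option B: Deploy the Model Yourself

If you want to deploy the model locally or on your own server:

1. Install an inference engine by following the `For Model Deployment` section of `requirements.txt`.

For SGLang, besides installing with pip, you can also use the official Docker image:
>
> ```shell
> docker pull lmsysorg/sglang:v0.5.6.post1
> ```
>
> Enter the container and run:
>
> ```
> pip install nvidia-cudnn-cu12==9.16.0.29
> ```

For vLLM, besides installing with pip, you can also use the official Docker image:
>
> ```shell
> docker pull vllm/vllm-openai:v0.12.0
> ```
>
> Enter the container and run:
>
> ```
> pip install -U transformers --pre
> ```

**Note**: dependency conflicts involving transformers that are reported during the steps above can be ignored.

2. In the container, or on the physical machine for a non-container install, download the model and start it with SGLang or vLLM to get an OpenAI-format service. Launch commands for vLLM and SGLang follow; please use exactly the launch parameters given: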

- vLLM:

```shell
python3 -m vllm.entrypoints.openai.api_server \
 --served-model-name autoglm-phone-9b \
 --allowed-local-media-path /   \
 --mm-encoder-tp-mode data \
 --mm_processor_cache_type shm \
 --mm_processor_kwargs "{\"max_pixels\":5000000}" \
 --max-model-len 25480  \
 --chat-template-content-format string \
 --limit-mm-per-prompt "{\"image\":10}" \
 --model zai-org/AutoGLM-Phone-9B \
 --port 8000
```

- SGLang:

```shell
python3 -m sglang.launch_server --model-path  zai-org/AutoGLM-Phone-9B \
        --served-model-name autoglm-phone-9b  \
        --context-length 25480  \
        --mm-enable-dp-encoder   \
        --mm-process-config '{"image":{"max_pixels":5000000}}'  \
        --port 8000
```

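- The model architecture is the same as `GLM-4.1V-9B-Thinking`. For more on deployment, you can also see [GLM-V](https://github.com/zai-org/GLM-V)
  for the model deployment and usage guide.

- Once the server is running, the model service is available at `http://localhost:8000/v1`. If you deployed the model on a remote server, use that server's IP address instead.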
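
Because the service speaks the OpenAI chat format, a request can be assembled with the standard library alone. The sketch below only builds the request and does not send it. `build_chat_request` is a hypothetical helper; the message layout (a text part plus an optional base64 `image_url` part) and the PNG MIME type are assumptions based on the general OpenAI-compatible format, and the repository's own client in `phone_agent/model/client.py` may structure its messages differently.

```python
import base64
import json
import urllib.request


def build_chat_request(base_url, model, prompt, image_bytes=None, api_key="EMPTY"):
    """Build (but do not send) a POST request for <base_url>/chat/completions."""
    content = [{"type": "text", "text": prompt}]
    if image_bytes is not None:
        # Assumption: the screenshot is PNG and is sent inline as a data URL.
        b64 = base64.b64encode(image_bytes).decode("ascii")
        content.append(
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}
        )
    body = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_chat_request("http://localhost:8000/v1", "autoglm-phone-9b", "Hello")
print(req.full_url)
```

To actually query a running server, pass `req` to `urllib.request.urlopen` and decode the JSON response. For a full check, the deployment script described in the next step is the better tool.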

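### 4. Check the Model Deployment

Once the model service is running, use the check script to verify that the deployment succeeded:

```bash
python scripts/check_deployment_cn.py --base-url http://YOUR_IP:YOUR_PORT/v1 --model MODEL_NAME
```

The script sends a test request and prints the model's inference result, from which you can judge whether the deployment works.

For the given task, the expected output is shown below. **If the chain of thought is very short, or the output is garbled, the deployment has most likely failed**; check the configuration and dependencies required by this document carefully.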

```
<think>用户想要比较这个洗发水在京东和淘宝上的价格，然后选择最便宜的平台下单。当前在小红书app上，显示的是一个关于LUMMI MOOD洗发水的帖子。

我需要：
1. 先启动京东app，搜索这个洗发水
2. 查看京东的价格
3. 再启动淘宝app，搜索这个洗发水
4. 查看淘宝的价格
5. 比较价格后，选择最便宜的京东或淘宝下单

首先，我需要从当前的小红书界面退出，然后启动京东app。</think>
<answer>do(action="Launch", app="京东")
```

**Parameters:**
- `--base-url`: model service URL (adjust to your actual deployment)
- `--model`: model name
- `--messages-file`: optional; a custom test-message file (defaults to `scripts/sample_messages.json`)
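
The expected output shown above has two parts: the reasoning inside `<think>...</think>` and an action call after `<answer>`. The sketch below shows one way such text could be split and the action read without `eval`. `split_response` and `parse_action` are hypothetical names, not the repository's parser, and the sample string is a shortened stand-in for the real output.

```python
import ast
import re


def split_response(text):
    """Split raw model text into (thinking, answer); a missing </answer> is tolerated."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)(?:</answer>|$)", text, re.DOTALL)
    return (
        think.group(1).strip() if think else "",
        answer.group(1).strip() if answer else "",
    )


def parse_action(answer):
    """Parse a call such as do(action="Launch", app="...") into (name, kwargs)."""
    call = ast.parse(answer, mode="eval").body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError(f"not a simple call expression: {answer!r}")
    # literal_eval accepts only literals, so no code from the model is executed.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return call.func.id, kwargs


raw = '<think>Need to start the JD app first.</think>\n<answer>do(action="Launch", app="京东")'
thinking, answer = split_response(raw)
print(parse_action(answer))  # ('do', {'action': 'Launch', 'app': '京东'})
```

An empty or very short `thinking` string from a real deployment is the symptom the paragraph above warns about.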

## Using AutoGLM

### Command Line

Set `--base-url` and `--model` according to the model you deployed, and set `--device-type` to choose between an Android and a HarmonyOS device (the default `adb` means Android; `hdc` means HarmonyOS). For example:

```bash
# Android device, interactive mode
python main.py --base-url http://localhost:8000/v1 --model "autoglm-phone-9b"

# Android device, run a given task ("Open Meituan and search for nearby hotpot restaurants")
python main.py --base-url http://localhost:8000/v1 "打开美团搜索附近的火锅店"

# HarmonyOS device, interactive mode
python main.py --device-type hdc --base-url http://localhost:8000/v1 --model "autoglm-phone-9b"

# HarmonyOS device, run a given task
python main.py --device-type hdc --base-url http://localhost:8000/v1 "打开美团搜索附近的火锅店"

# Authenticate with an API key
python main.py --apikey sk-xxxxx

# Use the English system prompt
python main.py --lang en --base-url http://localhost:8000/v1 "Open Chrome browser"

# List supported apps (Android)
python main.py --list-apps

# List supported apps (HarmonyOS)
python main.py --device-type hdc --list-apps
```

### Python API

```python
from phone_agent import PhoneAgent
from phone_agent.model import ModelConfig

# Configure model
model_config = ModelConfig(
    base_url="http://localhost:8000/v1",
    model_name="autoglm-phone-9b",
)

# Create the agent
agent = PhoneAgent(model_config=model_config)

# Run a task ("Open Taobao and search for wireless earbuds")
result = agent.run("打开淘宝搜索无线耳机")
print(result)
```

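## Remote Debugging

Phone Agent supports remote ADB/HDC debugging over WiFi or the network, so a device can be controlled without a USB connection.

### Set Up Remote Debugging

#### Enable Wireless Debugging on the Phone

##### Android Devices

Make sure the phone and the computer are on the same WiFi network, as shown below:

![Enable wireless debugging](resources/setting.png)

##### HarmonyOS Devices

Make sure the phone and the computer are on the same WiFi network:
1. Go to `Settings > System & updates > Developer options`
2. Turn on `USB debugging` and `Wireless debugging`
3. Note the IP address and port that are displayed

#### Use Standard ADB/HDC Commands on the Computer

```bash
# Android device: connect over WiFi; replace with the IP address and port shown on the phone
adb connect 192.168.1.100:5555

# Verify the connection
adb devices
# Should show: 192.168.1.100:5555    device

# HarmonyOS device: connect over WiFi
hdc tconn 192.168.1.100:5555

# Verify the connection
hdc list targets
# Should show: 192.168.1.100:5555
```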

### Device Management Commands

#### Android Devices (ADB)

```bash
# List all connected devices
adb devices

# Connect to a remote device
adb connect 192.168.1.100:5555

# Disconnect a specific device
adb disconnect 192.168.1.100:5555

# Run a task on a specific device ("Open Douyin and browse videos")
python main.py --device-id 192.168.1.100:5555 --base-url http://localhost:8000/v1 --model "autoglm-phone-9b" "打开抖音刷视频"
```

#### HarmonyOS Devices (HDC)

```bash
# List all connected devices
hdc list targets

# Connect to a remote device
hdc tconn 192.168.1.100:5555

# Disconnect a specific device
hdc tdisconn 192.168.1.100:5555

# Run a task on a specific device ("Open Douyin and browse videos")
python main.py --device-type hdc --device-id 192.168.1.100:5555 --base-url http://localhost:8000/v1 --model "autoglm-phone-9b" "打开抖音刷视频"
```

### Remote Connection via the Python API

#### Android Devices (ADB)

```python
from phone_agent.adb import ADBConnection, list_devices

# Create the connection manager
conn = ADBConnection()

# Connect to a remote device
success, message = conn.connect("192.168.1.100:5555")
print(f"Connection status: {message}")

# List connected devices
devices = list_devices()
for device in devices:
    print(f"{device.device_id} - {device.connection_type.value}")

# Enable TCP/IP on a USB-connected device
success, message = conn.enable_tcpip(5555)
ip = conn.get_device_ip()
print(f"Device IP: {ip}")

# Disconnect
conn.disconnect("192.168.1.100:5555")
```

#### HarmonyOS Devices (HDC)

```python
from phone_agent.hdc import HDCConnection, list_devices

# Create the connection manager
conn = HDCConnection()

# Connect to a remote device
success, message = conn.connect("192.168.1.100:5555")
print(f"Connection status: {message}")

# List connected devices
devices = list_devices()
for device in devices:
    print(f"{device.device_id} - {device.connection_type.value}")

# Disconnect
conn.disconnect("192.168.1.100:5555")
```

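### Troubleshooting Remote Connections

**Connection refused:**

- Make sure the device and the computer are on the same network
- Check whether a firewall is blocking port 5555
- Confirm that TCP/IP mode is enabled: `adb tcpip 5555`

**Connection dropped:**

- The WiFi connection may have dropped; reconnect with `--connect`
- Some devices disable TCP/IP after a reboot, so it has to be re-enabled over USB

**Multiple devices:**

- Use `--device-id` to select the device to use
- Or use `--list-devices` to see all connected devices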
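
For the "connection refused" case, a plain TCP probe can tell a network or firewall problem apart from an adb problem before any adb command is involved. This is a minimal sketch using only the standard library; `can_reach` is a hypothetical helper and is not part of this repository.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `can_reach("192.168.1.100", 5555)` returning `False` suggests that the port is blocked or that TCP/IP mode is not enabled on the device, while `True` means the port is open and any remaining problem is on the adb side.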

## Configuration

### Custom System Prompt

The system ships with a Chinese and an English prompt, selected with the `--lang` flag:

- `--lang cn`: Chinese prompt (default); config file: `phone_agent/config/prompts_zh.py`
- `--lang en`: English prompt; config file: `phone_agent/config/prompts_en.py`

You can edit the corresponding file directly to strengthen the model in a specific domain, or inject app names to disable certain apps.

### Environment Variables

| Variable                    | Description                      | Default                    |
|-----------------------------|----------------------------------|----------------------------|
| `PHONE_AGENT_BASE_URL`      | Model API URL                    | `http://localhost:8000/v1` |
| `PHONE_AGENT_MODEL`         | Model name                       | `autoglm-phone-9b`         |
| `PHONE_AGENT_API_KEY`       | API key for model authentication | `EMPTY`                    |
| `PHONE_AGENT_MAX_STEPS`     | Maximum steps per task           | `100`                      |
| `PHONE_AGENT_DEVICE_ID`     | ADB/HDC device ID                | (auto-detect)              |
| `PHONE_AGENT_DEVICE_TYPE`   | Device type (`adb` or `hdc`)     | `adb`                      |
| `PHONE_AGENT_LANG`          | Language (`cn` or `en`)          | `cn`                       |

### 模型配置

```python
from phone_agent.model import ModelConfig

config = ModelConfig(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # API 密钥(如需要)
    model_name="autoglm-phone-9b",  # 模型名称
    max_tokens=3000,  # 最大输出 token 数
    temperature=0.1,  # 采样温度
    frequency_penalty=0.2,  # 频率惩罚
)
```

### Agent 配置

```python
from phone_agent.agent import AgentConfig

config = AgentConfig(
    max_steps=100,  # 每个任务最大步数
    device_id=None,  # ADB 设备 ID(None 为自动检测)
    lang="cn",  # 语言选择：cn(中文)或 en(英文)
    verbose=True,  # 打印调试信息(包括思考过程和执行动作)
)
```

### Verbose 模式输出

当 `verbose=True` 时，Agent 会在每一步输出详细信息：

```
==================================================
💭 思考过程:
--------------------------------------------------
当前在系统桌面，需要先启动小红书应用
--------------------------------------------------
🎯 执行动作:
{
  "_metadata": "do",
  "action": "Launch",
  "app": "小红书"
}
==================================================

... (执行动作后继续下一步)

==================================================
💭 思考过程:
--------------------------------------------------
小红书已打开，现在需要点击搜索框
--------------------------------------------------
🎯 执行动作:
{
  "_metadata": "do",
  "action": "Tap",
  "element": [500, 100]
}
==================================================

🎉 ================================================
✅ 任务完成: 已成功搜索美食攻略
==================================================
```

这样可以清楚地看到 AI 的推理过程和每一步的具体操作。

## 支持的应用

### Android 应用

Phone Agent 支持 50+ 款主流中文应用：

| 分类   | 应用              |
|------|-----------------|
| 社交通讯 | 微信、QQ、微博        |
| 电商购物 | 淘宝、京东、拼多多       |
| 美食外卖 | 美团、饿了么、肯德基      |
| 出行旅游 | 携程、12306、滴滴出行   |
| 视频娱乐 | bilibili、抖音、爱奇艺 |
| 音乐音频 | 网易云音乐、QQ音乐、喜马拉雅 |
| 生活服务 | 大众点评、高德地图、百度地图  |
| 内容社区 | 小红书、知乎、豆瓣       |

运行 `python main.py --list-apps` 查看完整列表。

### 鸿蒙应用

Phone Agent 支持 60+ 款鸿蒙原生应用和系统应用：

| 分类      | 应用                                       |
|---------|------------------------------------------|
| 社交通讯    | 微信、QQ、微博、飞书、企业微信                        |
| 电商购物    | 淘宝、京东、拼多多、唯品会、得物、闲鱼                     |
| 美食外卖    | 美团、美团外卖、大众点评、海底捞                        |
| 出行旅游    | 12306、滴滴出行、同程旅行、高德地图、百度地图               |
| 视频娱乐    | bilibili、抖音、快手、腾讯视频、爱奇艺、芒果TV            |
| 音乐音频    | QQ音乐、汽水音乐、喜马拉雅                           |
| 生活服务    | 小红书、知乎、今日头条、58同城、中国移动                   |
| AI与工具   | 豆包、WPS、UC浏览器、扫描全能王、美图秀秀                 |
| 系统应用    | 浏览器、日历、相机、时钟、云空间、文件管理器、相册、联系人、短信、设置等   |
| 华为服务    | 应用市场、音乐、视频、阅读、主题、天气                     |

运行 `python main.py --device-type hdc --list-apps` 查看完整列表。

## 可用操作

Agent 可以执行以下操作：

| 操作           | 描述              |
|--------------|-----------------|
| `Launch`     | 启动应用            |  
| `Tap`        | 点击指定坐标          |
| `Type`       | 输入文本            |
| `Swipe`      | 滑动屏幕            |
| `Back`       | 返回上一页           |
| `Home`       | 返回桌面            |
| `Long Press` | 长按              |
| `Double Tap` | 双击              |
| `Wait`       | 等待页面加载          |
| `Take_over`  | 请求人工接管(登录/验证码等) |

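## Custom Callbacks

Handle confirmation of sensitive operations and manual takeover:

```python
from phone_agent import PhoneAgent


def my_confirmation(message: str) -> bool:
    """Callback that asks the user to confirm a sensitive operation."""
    return input(f"Confirm: {message}? (y/n): ").strip().lower() == "y"


def my_takeover(message: str) -> None:
    """Callback invoked when the agent requests manual takeover."""
    print(f"Please complete manually: {message}")
    input("Press Enter to continue when done...")


agent = PhoneAgent(
    confirmation_callback=my_confirmation,
    takeover_callback=my_takeover,
)
```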

## 示例

查看 `examples/` 目录获取更多使用示例：

- `basic_usage.py` - 基础任务执行
- 单步调试模式
- 批量任务执行
- 自定义回调

## 二次开发

### 配置开发环境

二次开发需要使用开发依赖：

```bash
pip install -e ".[dev]"
```

### 运行测试

```bash
pytest tests/
```

### 完整项目结构

```
phone_agent/
├── __init__.py          # 包导出
├── agent.py             # PhoneAgent 主类
├── adb/                 # ADB 工具
│   ├── connection.py    # 远程/本地连接管理
│   ├── screenshot.py    # 屏幕截图
│   ├── input.py         # 文本输入 (ADB Keyboard)
│   └── device.py        # 设备控制 (点击、滑动等)
├── actions/             # 操作处理
│   └── handler.py       # 操作执行器
├── config/              # 配置
│   ├── apps.py          # 支持的应用映射
│   ├── prompts_zh.py    # 中文系统提示词
│   └── prompts_en.py    # 英文系统提示词
└── model/               # AI 模型客户端
    └── client.py        # OpenAI 兼容客户端
```

## 常见问题

我们列举了一些常见的问题，以及对应的解决方案：

### 设备未找到

尝试通过重启 ADB 服务来解决：

```bash
adb kill-server
adb start-server
adb devices
```

如果仍然无法识别，请检查：

1. USB 调试是否已开启
2. 数据线是否支持数据传输(部分数据线仅支持充电)
3. 手机上弹出的授权框是否已点击「允许」
4. 尝试更换 USB 接口或数据线

### 能打开应用，但无法点击

部分机型需要同时开启两个调试选项才能正常使用：

- **USB 调试**
- **USB 调试(安全设置)**

请在 `设置 → 开发者选项` 中检查这两个选项是否都已启用。

### 文本输入不工作

1. 确保设备已安装 ADB Keyboard
2. 在设置 > 系统 > 语言和输入法 > 虚拟键盘 中启用
3. Agent 会在需要输入时自动切换到 ADB Keyboard

### 截图失败(黑屏)

这通常意味着应用正在显示敏感页面(支付、密码、银行类应用)。Agent 会自动检测并请求人工接管。

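### Windows encoding errors

The error message contains `UnicodeEncodeError` and mentions the `gbk` codec.

Fix: set the environment variable `PYTHONIOENCODING=utf-8` before running the command. In `cmd`, run `set PYTHONIOENCODING=utf-8` first; in PowerShell, run `$env:PYTHONIOENCODING = "utf-8"` first; in a POSIX-style shell such as Git Bash, prefix the command with `PYTHONIOENCODING=utf-8`.

### Interactive mode fails in a non-TTY environment

The error looks like `EOF when reading a line`.

Fix: pass the task directly on the command line (non-interactive mode), or switch to a terminal application that provides a TTY.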

## Citation

If you find our work helpful, please cite the following papers:

```bibtex
@article{liu2024autoglm,
  title={Autoglm: Autonomous foundation agents for guis},
  author={Liu, Xiao and Qin, Bo and Liang, Dongzhu and Dong, Guang and Lai, Hanyu and Zhang, Hanchen and Zhao, Hanlin and Iong, Iat Long and Sun, Jiadai and Wang, Jiaqi and others},
  journal={arXiv preprint arXiv:2411.00820},
  year={2024}
}
@article{xu2025mobilerl,
  title={MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents},
  author={Xu, Yifan and Liu, Xiao and Liu, Xinghan and Fu, Jiaqi and Zhang, Hanchen and Jing, Bohao and Zhang, Shudan and Wang, Yuting and Zhao, Wenyi and Dong, Yuxiao},
  journal={arXiv preprint arXiv:2509.18119},
  year={2025}
}
```

---

## 自动化部署指南(面向 AI)

> **本章节专为 AI 助手(如 Claude Code)设计，用于自动化部署 Open-AutoGLM。**
>
> 如果你是人类读者，可以跳过本章节，按照上面的文档操作即可。

---

### 项目概述

Open-AutoGLM 是一个手机 Agent 框架：
- **输入**：用户的自然语言指令(如"打开微信发消息给张三")
- **输出**：自动操作用户的安卓手机完成任务
- **原理**：截图 → 视觉模型理解界面 → 输出点击坐标 → ADB 执行操作 → 循环

架构分为两部分：
1. **Agent 代码**(本仓库)：运行在用户电脑上，负责调用模型、解析动作、控制手机
2. **视觉模型服务**：可以是远程 API，也可以本地部署

---

### 部署前置检查

在开始部署前，请逐项向用户确认以下内容：

#### 硬件环境
- [ ] 用户有一台安卓手机(Android 7.0+)
- [ ] 用户有一根支持数据传输的 USB 数据线(不是仅充电线)
- [ ] 手机和电脑可以通过数据线连接

#### 手机端配置
- [ ] 手机已开启「开发者模式」(设置 → 关于手机 → 连续点击版本号 7 次)
- [ ] 手机已开启「USB 调试」(设置 → 开发者选项 → USB 调试)
- [ ] 部分机型需要同时开启「USB 调试(安全设置)」
- [ ] 手机已安装 ADB Keyboard 应用(下载地址：https://github.com/senzhk/ADBKeyBoard/blob/master/ADBKeyboard.apk)
- [ ] ADB Keyboard 已在系统设置中启用(设置 → 语言和输入法 → 启用 ADB Keyboard)

#### 模型服务确认(二选一)

**请明确询问用户：你是否已有可用的 AutoGLM 模型服务？**

- **选项 A：使用已部署的模型服务(推荐)**
  - 用户提供模型服务的 URL(如 `http://xxx.xxx.xxx.xxx:8000/v1`)
  - 无需本地 GPU，无需下载模型
  - 直接使用该 URL 作为 `--base-url` 参数

- **选项 B：本地部署模型(高配置要求)**
  - 需要 NVIDIA GPU(建议 24GB+ 显存)
  - 需要安装 vLLM 或 SGLang
  - 需要下载约 20GB 的模型文件
  - **如果用户是新手或不确定，强烈建议选择选项 A**

---

### 部署流程

#### 阶段一：环境准备

```bash
# 1. 安装 ADB 工具
# MacOS:
brew install android-platform-tools
# 或手动下载：https://developer.android.com/tools/releases/platform-tools

# Windows: 下载后解压，添加到 PATH 环境变量

# 2. 验证 ADB 安装
adb version
# 应输出版本信息

# 3. 连接手机并验证
# 用数据线连接手机，手机上点击「允许 USB 调试」
adb devices
# 应输出设备列表，如：
# List of devices attached
# XXXXXXXX    device
```

**如果 `adb devices` 显示空列表或 unauthorized：**
1. 检查手机上是否弹出授权框，点击「允许」
2. 检查 USB 调试是否开启
3. 尝试更换数据线或 USB 接口
4. 执行 `adb kill-server && adb start-server` 后重试

#### 阶段二：安装 Agent

```bash
# 1. 克隆仓库(如果还没有克隆)
git clone https://github.com/zai-org/Open-AutoGLM.git
cd Open-AutoGLM

# 2. 创建虚拟环境(推荐)
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 3. 安装依赖
pip install -r requirements.txt
pip install -e .
```

**注意：不需要 clone 模型仓库，模型通过 API 调用。**

#### 阶段三：配置模型服务

**如果用户选择选项 A(使用已部署的模型)：**

你可以使用以下第三方模型服务：

1. **智谱 BigModel**
   - 文档：https://docs.bigmodel.cn/cn/api/introduction
   - `--base-url`：`https://open.bigmodel.cn/api/paas/v4`
   - `--model`：`autoglm-phone`
   - `--apikey`：在智谱平台申请你的 API Key

2. **ModelScope(魔搭社区)**
   - 文档：https://modelscope.cn/models/ZhipuAI/AutoGLM-Phone-9B
   - `--base-url`：`https://api-inference.modelscope.cn/v1`
   - `--model`：`ZhipuAI/AutoGLM-Phone-9B`
   - `--apikey`：在 ModelScope 平台申请你的 API Key

使用示例：

```bash
# 使用智谱 BigModel
python main.py --base-url https://open.bigmodel.cn/api/paas/v4 --model "autoglm-phone" --apikey "your-bigmodel-api-key" "打开美团搜索附近的火锅店"

# 使用 ModelScope
python main.py --base-url https://api-inference.modelscope.cn/v1 --model "ZhipuAI/AutoGLM-Phone-9B" --apikey "your-modelscope-api-key" "打开美团搜索附近的火锅店"
```

或者直接使用用户提供的其他模型服务 URL，跳过本地模型部署步骤。

**如果用户选择选项 B(本地部署模型)：**

```bash
# 1. 安装 vLLM
pip install vllm

# 2. 启动模型服务(会自动下载模型，约 20GB)
python3 -m vllm.entrypoints.openai.api_server \
  --served-model-name autoglm-phone-9b \
  --allowed-local-media-path / \
  --mm-encoder-tp-mode data \
  --mm_processor_cache_type shm \
  --mm_processor_kwargs "{\"max_pixels\":5000000}" \
  --max-model-len 25480 \
  --chat-template-content-format string \
  --limit-mm-per-prompt "{\"image\":10}" \
  --model zai-org/AutoGLM-Phone-9B \
  --port 8000

# 模型服务 URL 为：http://localhost:8000/v1
```

#### 阶段四：验证部署

```bash
# 在 Open-AutoGLM 目录下执行
# 将 {MODEL_URL} 替换为实际的模型服务地址

python main.py --base-url {MODEL_URL} --model "autoglm-phone-9b" "打开微信，对文件传输助手发送消息：部署成功"
```

**预期结果：**
- 手机自动打开微信
- 自动搜索「文件传输助手」
- 自动发送消息「部署成功」

---

### 异常处理

| 错误现象 | 可能原因 | 解决方案 |
|---------|---------|---------|
| `adb devices` 无输出 | USB 调试未开启或数据线问题 | 检查开发者选项，更换数据线 |
| `adb devices` 显示 unauthorized | 手机未授权 | 手机上点击「允许 USB 调试」|
| 能打开应用但无法点击 | 缺少安全调试权限 | 开启「USB 调试(安全设置)」|
| 中文输入变成乱码或无输入 | ADB Keyboard 未启用 | 在系统设置中启用 ADB Keyboard |
| 截图返回黑屏 | 敏感页面(支付/银行) | 正常现象，系统会自动处理 |
| 连接模型服务失败 | URL 错误或服务未启动 | 检查 URL，确认服务正在运行 |
| `ModuleNotFoundError` | 依赖未安装 | 执行 `pip install -r requirements.txt` |

---

### 部署要点

1. **优先确认手机连接**：在安装任何代码之前，先确保 `adb devices` 能看到设备
2. **不要跳过 ADB Keyboard**：没有它，中文输入会失败
3. **模型服务是外部依赖**：Agent 代码本身不包含模型，需要单独的模型服务
4. **遇到权限问题先检查手机设置**：大部分问题都是手机端配置不完整
5. **部署完成后用简单任务测试**：建议用「打开微信发消息给文件传输助手」作为验收标准

---

### 命令速查

```bash
# 检查 ADB 连接
adb devices

# 重启 ADB 服务
adb kill-server && adb start-server

# 安装依赖
pip install -r requirements.txt && pip install -e .

# 运行 Agent(交互模式)
python main.py --base-url {MODEL_URL} --model "autoglm-phone-9b"

# 运行 Agent(单次任务)
python main.py --base-url {MODEL_URL} --model "autoglm-phone-9b" "你的任务描述"

# 查看支持的应用列表
python main.py --list-apps
```

---

**部署完成的标志：手机能自动执行用户的自然语言指令。**


================================================
FILE: README_coding_agent.md
================================================
# Open-AutoGLM Quick Start for Coding Agent

<div align="center">
<img src=resources/logo.svg width="20%"/>
</div>

> **本文专为 AI 助手（如 Claude Code）阅读，用于自动化部署 Open-AutoGLM。**
>
> **This document is designed for AI assistants (such as Claude Code) to automate the deployment of Open-AutoGLM.**
>
> 如果你是人类读者，可以跳过本文，按照 README.md 文档操作即可。
>
> If you are a human reader, you can skip this document and follow the README.md instructions instead.

---

## Table of Contents / 目录

- [English](#english)
- [中文](#中文)

---

# English

## Prerequisites

### 1. Python Environment

Python 3.10 or higher is required.

### 2. ADB (Android Debug Bridge)

1. Download the official ADB [installation package](https://developer.android.com/tools/releases/platform-tools)
2. Extract and configure environment variables:

**macOS:**

```bash
# Assuming extracted to ~/Downloads/platform-tools
export PATH=${PATH}:~/Downloads/platform-tools
```

**Windows:** Add the extracted folder path to your system PATH. Refer to [this tutorial](https://blog.csdn.net/x2584179909/article/details/108319973) if needed.

### 3. Android Device Setup

Requirements:
- Android 7.0+ device or emulator
- Developer Mode enabled
- USB Debugging enabled

**Enable Developer Mode:**
1. Go to `Settings > About Phone > Build Number`
2. Tap rapidly about 10 times until "Developer mode enabled" appears

**Enable USB Debugging:**
1. Go to `Settings > Developer Options > USB Debugging`
2. Enable the toggle
3. Some devices may require a restart

**Important permissions to check:**

![Permissions](resources/screenshot-20251210-120416.png)

### 4. Install ADB Keyboard

Download and install [ADB Keyboard APK](https://github.com/senzhk/ADBKeyBoard/blob/master/ADBKeyboard.apk) on your device.

After installation, enable it in `Settings > Input Method` or `Settings > Keyboard List`.

---

## Installation

```bash
# Install dependencies
pip install -r requirements.txt

# Install package
pip install -e .
```

---

## ADB Configuration

**Ensure your USB cable supports data transfer (not charging only).**

### Verify Connection

```bash
# Check connected devices
adb devices

# Expected output:
# List of devices attached
# emulator-5554   device
```
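
The same check can be scripted. The following is a minimal sketch, not code from this repository: `parse_adb_devices` and `ready_devices` are hypothetical helpers that turn the text printed by `adb devices` into (serial, state) pairs, so a deployment script can tell a ready device (`device`) from one that still needs authorization (`unauthorized`).

```python
import subprocess


def parse_adb_devices(output: str) -> list[tuple[str, str]]:
    """Parse `adb devices` text into (serial, state) pairs.

    Hypothetical helper for illustration; not part of phone_agent.
    """
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and adb daemon notices ("* daemon ...").
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


def ready_devices() -> list[str]:
    """Run `adb devices` and return the serials whose state is `device`."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return [
        serial
        for serial, state in parse_adb_devices(result.stdout)
        if state == "device"
    ]
```

An empty list from `ready_devices()` is the cue to go back through the USB debugging and authorization steps before installing anything else.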

### Remote Debugging (WiFi)

Ensure your phone and computer are on the same WiFi network.

![Enable Wireless Debugging](resources/screenshot-20251210-120630.png)

```bash
# Connect via WiFi (replace with your phone's IP and port)
adb connect 192.168.1.100:5555

# Verify connection
adb devices
```

### Device Management

```bash
# List all devices
adb devices

# Connect remote device
adb connect <ip>:<port>

# Disconnect device
adb disconnect <ip>:<port>
```

---

## Usage

### Command Line

```bash
# Interactive mode
python main.py --base-url <MODEL_API_URL> --model <MODEL_NAME>

# Execute specific task
python main.py --base-url <MODEL_API_URL> "Open Chrome browser"

# Use API key authentication
python main.py --apikey sk-xxxxx

# English system prompt
python main.py --lang en --base-url <MODEL_API_URL> "Open Chrome browser"

# List supported apps
python main.py --list-apps

# Specify device
python main.py --device-id 192.168.1.100:5555 --base-url <MODEL_API_URL> "Open TikTok"
```

### Python API

```python
from phone_agent import PhoneAgent
from phone_agent.model import ModelConfig

# Configure model
model_config = ModelConfig(
    base_url="<MODEL_API_URL>",
    model_name="<MODEL_NAME>",
)

# Create Agent
agent = PhoneAgent(model_config=model_config)

# Execute task
result = agent.run("Open eBay and search for wireless earbuds")
print(result)
```

---

## Environment Variables

| Variable                  | Description               | Default                      |
|---------------------------|---------------------------|------------------------------|
| `PHONE_AGENT_BASE_URL`    | Model API URL             | `http://localhost:8000/v1`   |
| `PHONE_AGENT_MODEL`       | Model name                | `autoglm-phone-9b`           |
| `PHONE_AGENT_API_KEY`     | API key                   | `EMPTY`                      |
| `PHONE_AGENT_MAX_STEPS`   | Max steps per task        | `100`                        |
| `PHONE_AGENT_DEVICE_ID`   | ADB device ID             | (auto-detect)                |
| `PHONE_AGENT_LANG`        | Language (`cn`/`en`)      | `cn`                         |

---

## Troubleshooting

### Device Not Found

```bash
adb kill-server
adb start-server
adb devices
```

Check:
1. USB debugging enabled
2. USB cable supports data transfer
3. Authorization popup approved on phone
4. Try different USB port/cable

### Can Open Apps but Cannot Tap

Enable both in `Settings > Developer Options`:
- **USB Debugging**
- **USB Debugging (Security Settings)**

### Text Input Not Working

1. Ensure ADB Keyboard is installed
2. Enable in `Settings > System > Language & Input > Virtual Keyboard`

### Windows Encoding Issues

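Set the `PYTHONIOENCODING` environment variable before running. The inline form below works in POSIX-style shells such as Git Bash; in `cmd`, run `set PYTHONIOENCODING=utf-8` first, and in PowerShell run `$env:PYTHONIOENCODING = "utf-8"` first.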

```bash
PYTHONIOENCODING=utf-8 python main.py ...
```
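
When changing the environment is inconvenient, a script that drives the agent can switch its own output streams to UTF-8 instead. This is a sketch of that alternative, not something the repository does: `force_utf8_output` is a hypothetical helper, and it relies on `reconfigure()`, which text streams provide since Python 3.7.

```python
import sys


def force_utf8_output() -> list[str]:
    """Switch stdout/stderr to UTF-8 where the stream supports it.

    Returns the names of the streams that were reconfigured.
    """
    changed = []
    for name in ("stdout", "stderr"):
        stream = getattr(sys, name)
        # Replaced or captured streams may lack reconfigure(), so check first.
        reconfigure = getattr(stream, "reconfigure", None)
        if reconfigure is not None:
            reconfigure(encoding="utf-8", errors="replace")
            changed.append(name)
    return changed
```

Call it once at the top of the script, before any output that contains emoji or other characters the console code page cannot encode.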

---

# 中文

## 环境要求

### 1. Python 环境

需要 Python 3.10 及以上版本。

### 2. ADB (Android Debug Bridge)

1. 下载官方 ADB [安装包](https://developer.android.com/tools/releases/platform-tools?hl=zh-cn)
2. 解压并配置环境变量：

**macOS:**

```bash
# 假设解压到 ~/Downloads/platform-tools
export PATH=${PATH}:~/Downloads/platform-tools
```

**Windows:** 将解压后的文件夹路径添加到系统 PATH。可参考[此教程](https://blog.csdn.net/x2584179909/article/details/108319973)。

### 3. 安卓设备配置

要求：
- Android 7.0+ 设备或模拟器
- 开发者模式已启用
- USB 调试已启用

**启用开发者模式：**
1. 进入 `设置 > 关于手机 > 版本号`
2. 连续快速点击约 10 次，直到提示"开发者模式已启用"

**启用 USB 调试：**
1. 进入 `设置 > 开发者选项 > USB 调试`
2. 开启开关
3. 部分设备可能需要重启

**请务必检查以下权限：**

![权限](resources/screenshot-20251209-181423.png)

### 4. 安装 ADB Keyboard

在设备上下载并安装 [ADB Keyboard APK](https://github.com/senzhk/ADBKeyBoard/blob/master/ADBKeyboard.apk)。

安装后，在 `设置 > 输入法` 或 `设置 > 键盘列表` 中启用。

---

## 安装

```bash
# 安装依赖
pip install -r requirements.txt

# 安装包
pip install -e .
```

---

## ADB 配置

**请确保 USB 数据线支持数据传输（而非仅充电）。**

### 验证连接

```bash
# 检查已连接设备
adb devices

# 预期输出：
# List of devices attached
# emulator-5554   device
```

### 远程调试（WiFi）

确保手机和电脑在同一 WiFi 网络中。

![开启无线调试](resources/setting.png)

```bash
# 通过 WiFi 连接（替换为手机显示的 IP 和端口）
adb connect 192.168.1.100:5555

# 验证连接
adb devices
```

### 设备管理

```bash
# 列出所有设备
adb devices

# 连接远程设备
adb connect <ip>:<port>

# 断开设备
adb disconnect <ip>:<port>
```

---

## 使用方法

### 命令行

```bash
# 交互模式
python main.py --base-url <模型API地址> --model <模型名称>

# 执行指定任务
python main.py --base-url <模型API地址> "打开美团搜索附近的火锅店"

# 使用 API Key 认证
python main.py --apikey sk-xxxxx

# 使用英文系统提示词
python main.py --lang en --base-url <模型API地址> "Open Chrome browser"

# 列出支持的应用
python main.py --list-apps

# 指定设备
python main.py --device-id 192.168.1.100:5555 --base-url <模型API地址> "打开抖音刷视频"
```

### Python API

```python
from phone_agent import PhoneAgent
from phone_agent.model import ModelConfig

# 配置模型
model_config = ModelConfig(
    base_url="<模型API地址>",
    model_name="<模型名称>",
)

# 创建 Agent
agent = PhoneAgent(model_config=model_config)

# 执行任务
result = agent.run("打开淘宝搜索无线耳机")
print(result)
```

---

## 环境变量

| 变量                        | 描述               | 默认值                        |
|---------------------------|------------------|----------------------------|
| `PHONE_AGENT_BASE_URL`    | 模型 API 地址        | `http://localhost:8000/v1` |
| `PHONE_AGENT_MODEL`       | 模型名称             | `autoglm-phone-9b`         |
| `PHONE_AGENT_API_KEY`     | API Key          | `EMPTY`                    |
| `PHONE_AGENT_MAX_STEPS`   | 每个任务最大步数         | `100`                      |
| `PHONE_AGENT_DEVICE_ID`   | ADB 设备 ID        | (自动检测)                     |
| `PHONE_AGENT_LANG`        | 语言 (`cn`/`en`)   | `cn`                       |

---

## 常见问题

### 设备未找到

```bash
adb kill-server
adb start-server
adb devices
```

检查：
1. USB 调试是否已开启
2. 数据线是否支持数据传输
3. 手机上的授权弹窗是否已点击「允许」
4. 尝试更换 USB 接口或数据线

### 能打开应用但无法点击

在 `设置 > 开发者选项` 中同时启用：
- **USB 调试**
- **USB 调试（安全设置）**

### 文本输入不工作

1. 确保已安装 ADB Keyboard
2. 在 `设置 > 系统 > 语言和输入法 > 虚拟键盘` 中启用

### Windows 编码异常

运行代码前添加环境变量：

```bash
PYTHONIOENCODING=utf-8 python main.py ...
```

---

## License

This project is for research and learning purposes only. See [Terms of Use](resources/privacy_policy_en.txt) / [使用条款](resources/privacy_policy.txt).


================================================
FILE: README_en.md
================================================
# Open-AutoGLM

[中文阅读.](./README.md)

<div align="center">
<img src=resources/logo.svg width="20%"/>
</div>
<p align="center">
    👋 Join our <a href="resources/WECHAT.md" target="_blank">WeChat</a> or <a href="https://discord.gg/HvT5BaPg3H" target="_blank">Discord</a> community.
</p>
<p align="center">
    👋 Follow AutoGLM Autotyper <a href="https://x.com/Autotyper_Agent?s=20" target="_blank">X</a> account
</p>

## Quick Start

You can use Claude Code with [GLM Coding Plan](https://z.ai/subscribe) and enter the following prompt to quickly deploy this project:

```
Access the documentation and install AutoGLM for me
https://raw.githubusercontent.com/zai-org/Open-AutoGLM/refs/heads/main/README_en.md
```

## Project Introduction

Phone Agent is a mobile assistant framework built on AutoGLM. It interprets the phone screen multimodally and completes tasks for the user through automated operations. The system controls the device via ADB (Android Debug Bridge), perceives the screen with a vision-language model, and plans and executes a sequence of operations. Users describe what they need in natural language, for example "Open eBay and search for wireless earphones", and Phone Agent parses the intent, reads the current interface, plans the next action, and carries the workflow through to completion. The system also includes a confirmation mechanism for sensitive operations and supports manual takeover for login or verification-code screens. In addition, it supports remote ADB debugging, so devices can be connected over WiFi or the network for remote control and development.
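
The perceive, plan, act cycle described above can be pictured with a short sketch. This is an illustration under stated assumptions, not the implementation in `phone_agent/agent.py`: the three callables are hypothetical stand-ins for screen capture, the vision-language model call, and device control. The action dictionary follows the shape printed in the project's verbose output (`"_metadata": "do"`), while the completion marker and the `message` key are assumptions.

```python
from typing import Callable

Action = dict  # e.g. {"_metadata": "do", "action": "Tap", "element": [500, 100]}


def run_task(
    task: str,
    take_screenshot: Callable[[], bytes],
    plan_next_action: Callable[[str, bytes, list], Action],
    execute: Callable[[Action], None],
    max_steps: int = 100,
) -> str:
    """Perceive -> plan -> act loop, bounded by max_steps.

    All three callables are hypothetical stand-ins: a screen capture, a
    vision-language model call, and a device-control step (e.g. via ADB).
    """
    history: list = []
    for _ in range(max_steps):
        screen = take_screenshot()
        action = plan_next_action(task, screen, history)
        history.append(action)
        # Assumption: any marker other than "do" means the task is finished.
        if action.get("_metadata") != "do":
            return action.get("message", "done")
        execute(action)
    return "max steps reached"
```

Bounding the loop with `max_steps` mirrors the documented per-task step limit, so a confused model cannot keep tapping indefinitely.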

> ⚠️ This project is for research and learning purposes only. Using it for illegal information acquisition, system interference, or any other illegal activity is strictly prohibited. Please review the [Terms of Use](resources/privacy_policy_en.txt) carefully.

## Integration with Other Automation Tools

### Midscene.js

[Midscene.js](https://midscenejs.com/en/index.html) is an open-source, vision-model-driven UI automation SDK that supports JavaScript or YAML flow syntax for cross-platform automation.

Midscene.js already supports AutoGLM; see the [Midscene.js integration guide](https://midscenejs.com/model-common-config.html#auto-glm) to quickly try AutoGLM automation on both iOS and Android devices.

## Model Download Links

| Model             | Download Links                                                                                                                                             |
|-------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
| AutoGLM-Phone-9B  | [🤗 Hugging Face](https://huggingface.co/zai-org/AutoGLM-Phone-9B)<br>[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/AutoGLM-Phone-9B)               |
| AutoGLM-Phone-9B-Multilingual | [🤗 Hugging Face](https://huggingface.co/zai-org/AutoGLM-Phone-9B-Multilingual)<br>[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/AutoGLM-Phone-9B-Multilingual) |

`AutoGLM-Phone-9B` is optimized for Chinese mobile applications, while `AutoGLM-Phone-9B-Multilingual` supports English scenarios and is suitable for applications containing English or other language content.

## Environment Setup

### 1. Python Environment

Python 3.10 or higher is recommended.

### 2. Device Debug Tools

Choose the appropriate tool based on your device type:

#### For Android Devices - Using ADB

1. Download the official ADB [installation package](https://developer.android.com/tools/releases/platform-tools) and extract it to a custom path
2. Configure environment variables

- MacOS configuration: In `Terminal` or any command line tool

  ```bash
  # Assuming the extracted directory is ~/Downloads/platform-tools. Adjust the command if different.
  export PATH=${PATH}:~/Downloads/platform-tools
  ```

- Windows configuration: Refer to [third-party tutorials](https://blog.csdn.net/x2584179909/article/details/108319973) for configuration.

#### For HarmonyOS Devices - Using HDC

1. Download HDC tool:
   - From [HarmonyOS SDK](https://developer.huawei.com/consumer/en/download/)
2. Configure environment variables

- MacOS/Linux configuration:

  ```bash
  # Assuming the extracted directory is ~/Downloads/harmonyos-sdk/toolchains. Adjust according to actual path.
  export PATH=${PATH}:~/Downloads/harmonyos-sdk/toolchains
  ```

- Windows configuration: Add the HDC tool directory to the system PATH environment variable

### 3. Android 7.0+ or HarmonyOS Device with `Developer Mode` and `USB Debugging` Enabled

1. Enable Developer Mode: The typical method is to find `Settings > About Phone > Build Number` and tap it rapidly about 10 times until a popup shows "Developer mode has been enabled." This may vary slightly between phones; search online for tutorials if you can't find it.
2. Enable USB Debugging: After enabling Developer Mode, go to `Settings > Developer Options > USB Debugging` and enable it
3. Some devices may require a restart after setting developer options for them to take effect. You can test by connecting your phone to your computer via USB cable and running `adb devices` to see if device information appears. If not, the connection has failed.

**Please carefully check the relevant permissions**

![Permissions](resources/screenshot-20251210-120416.png)

### 4. Install ADB Keyboard (Required for Android Devices Only, for Text Input)

**Note: HarmonyOS devices use native input methods and do not require ADB Keyboard.**

If you are using an Android device:

Download the [installation package](https://github.com/senzhk/ADBKeyBoard/blob/master/ADBKeyboard.apk) and install it on the Android device.

Note: After installation, enable `ADB Keyboard` in `Settings > Input Method` or `Settings > Keyboard List`, or run `adb shell ime enable com.android.adbkeyboard/.AdbIME` (see [How to use](https://github.com/senzhk/ADBKeyBoard/blob/master/README.md#how-to-use)). The keyboard does not work until it is enabled.
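
To confirm that the keyboard is actually enabled, a script can inspect the device's input-method list. The sketch below assumes that `adb shell ime list -s` prints one enabled input-method ID per line, as it does on stock Android. `is_ime_listed` and `adb_keyboard_enabled` are hypothetical helpers, and the IME ID is the one used in the enable command above.

```python
import subprocess

ADB_KEYBOARD_IME = "com.android.adbkeyboard/.AdbIME"  # ID from the enable command


def is_ime_listed(ime_list_output: str, ime_id: str = ADB_KEYBOARD_IME) -> bool:
    """Return True if ime_id appears as a line of `ime list -s` output."""
    return ime_id in (line.strip() for line in ime_list_output.splitlines())


def adb_keyboard_enabled() -> bool:
    """Ask the connected device which input methods are enabled."""
    result = subprocess.run(
        ["adb", "shell", "ime", "list", "-s"],
        capture_output=True,
        text=True,
        check=True,
    )
    return is_ime_listed(result.stdout)
```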

## Deployment Preparation

### 1. Install Dependencies

```bash
pip install -r requirements.txt 
pip install -e .
```

### 2. Configure ADB or HDC

#### For Android Devices

Make sure your **USB cable supports data transfer**, not just charging.

Ensure ADB is installed and connect the device via **USB cable**:

```bash
# Check connected devices
adb devices

# Output should show your device, e.g.:
# List of devices attached
# emulator-5554   device
```

#### For HarmonyOS Devices

Make sure your **USB cable supports data transfer**, not just charging.

Ensure HDC is installed and connect the device via **USB cable**:

```bash
# Check connected devices
hdc list targets

# Output should show your device, e.g.:
# 7001005458323933328a01bce01c2500
```
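
For scripted checks, the output can be parsed in a few lines. This is a sketch with a hypothetical helper, `parse_hdc_targets`. It assumes that `hdc list targets` prints one identifier per line and prints the placeholder `[Empty]` when no device is connected; verify both against your HDC version.

```python
def parse_hdc_targets(output: str) -> list[str]:
    """Parse `hdc list targets` text into a list of device identifiers.

    Assumption: one identifier per line, and the placeholder "[Empty]"
    when nothing is connected.
    """
    targets = []
    for line in output.splitlines():
        line = line.strip()
        if line and line != "[Empty]":
            targets.append(line)
    return targets
```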

### 3. Start Model Service

You can choose to deploy the model service yourself or use a third-party model service provider.

#### Option A: Use Third-Party Model Services

If you don't want to deploy the model yourself, you can use the following third-party services that have already deployed our model:

**1. z.ai**

- Documentation: https://docs.z.ai/api-reference/introduction
- `--base-url`: `https://api.z.ai/api/paas/v4`
- `--model`: `autoglm-phone-multilingual`
- `--apikey`: Apply for your own API key on the z.ai platform

**2. Novita AI**

- Documentation: https://novita.ai/models/model-detail/zai-org-autoglm-phone-9b-multilingual
- `--base-url`: `https://api.novita.ai/openai`
- `--model`: `zai-org/autoglm-phone-9b-multilingual`
- `--apikey`: Apply for your own API key on the Novita AI platform

**3. Parasail**

- Documentation: https://www.saas.parasail.io/serverless?name=auto-glm-9b-multilingual
- `--base-url`: `https://api.parasail.io/v1`
- `--model`: `parasail-auto-glm-9b-multilingual`
- `--apikey`: Apply for your own API key on the Parasail platform

Example usage with third-party services:

```bash
# Using z.ai
python main.py --base-url https://api.z.ai/api/paas/v4 --model "autoglm-phone-multilingual" --apikey "your-z-ai-api-key" "Open Chrome browser"

# Using Novita AI
python main.py --base-url https://api.novita.ai/openai --model "zai-org/autoglm-phone-9b-multilingual" --apikey "your-novita-api-key" "Open Chrome browser"

# Using Parasail
python main.py --base-url https://api.parasail.io/v1 --model "parasail-auto-glm-9b-multilingual" --apikey "your-parasail-api-key" "Open Chrome browser"
```
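
If you switch between providers often, the three settings can be kept in one place. The following is a small sketch rather than a feature of the project: `PROVIDERS` and `build_command` are hypothetical names, and the URLs and model names are copied from the list above.

```python
import shlex

# Values copied from the provider list above.
PROVIDERS = {
    "z.ai": ("https://api.z.ai/api/paas/v4", "autoglm-phone-multilingual"),
    "novita": ("https://api.novita.ai/openai", "zai-org/autoglm-phone-9b-multilingual"),
    "parasail": ("https://api.parasail.io/v1", "parasail-auto-glm-9b-multilingual"),
}


def build_command(provider: str, api_key: str, task: str) -> list[str]:
    """Build the main.py argument list for a named provider (hypothetical helper)."""
    base_url, model = PROVIDERS[provider]
    return [
        "python", "main.py",
        "--base-url", base_url,
        "--model", model,
        "--apikey", api_key,
        task,
    ]


cmd = build_command("novita", "your-novita-api-key", "Open Chrome browser")
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Passing the list directly to `subprocess.run(cmd)` avoids shell quoting problems when the task text contains spaces or quotes.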

#### Option B: Deploy Model Yourself

If you prefer to deploy the model locally or on your own server:

1. Download the model and install the inference engine framework according to the `For Model Deployment` section in `requirements.txt`.
2. Start the model with SGLang or vLLM to get an OpenAI-compatible service. The vLLM command below is one option; please follow the startup parameters exactly:

- vLLM:

```shell
python3 -m vllm.entrypoints.openai.api_server \
 --served-model-name autoglm-phone-9b-multilingual \
 --allowed-local-media-path /   \
 --mm-encoder-tp-mode data \
 --mm_processor_cache_type shm \
 --mm_processor_kwargs "{\"max_pixels\":5000000}" \
 --max-model-len 25480  \
 --chat-template-content-format string \
 --limit-mm-per-prompt "{\"image\":10}" \
 --model zai-org/AutoGLM-Phone-9B-Multilingual \
 --port 8000
```

- This model has the same architecture as `GLM-4.1V-9B-Thinking`. For detailed information about model deployment, you can also check [GLM-V](https://github.com/zai-org/GLM-V) for model deployment and usage guides.

- After successful startup, the model service will be accessible at `http://localhost:8000/v1`. If you deploy the model on a remote server, access it using that server's IP address.

### 4. Check Model Deployment

After starting the model service, you can use the following command to verify the deployment:

```bash
python scripts/check_deployment_en.py --base-url http://localhost:8000/v1 --model autoglm-phone-9b-multilingual
```

If using a third-party model service:

```bash
# Novita AI
python scripts/check_deployment_en.py --base-url https://api.novita.ai/openai --model zai-org/autoglm-phone-9b-multilingual --apikey your-novita-api-key

# Parasail
python scripts/check_deployment_en.py --base-url https://api.parasail.io/v1 --model parasail-auto-glm-9b-multilingual --apikey your-parasail-api-key
```

Upon successful execution, the script will display the model's inference result and token statistics, helping you confirm whether the model deployment is working correctly.
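
A lighter check, which does not run inference, is to ask the server which models it serves. This sketch is separate from `scripts/check_deployment_en.py`: `models_url` and `list_served_models` are hypothetical helpers, and they assume the service implements the OpenAI-style `GET /models` route, as vLLM's OpenAI-compatible server does. Hosted providers may not expose it.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Return the model-listing URL for an OpenAI-compatible base URL."""
    return base_url.rstrip("/") + "/models"


def list_served_models(
    base_url: str, api_key: str = "EMPTY", timeout: float = 10.0
) -> list[str]:
    """Fetch the model IDs the server reports.

    Assumes an OpenAI-style `GET /models` route returning {"data": [{"id": ...}]}.
    """
    request = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return [entry["id"] for entry in payload.get("data", [])]
```

With the vLLM command shown earlier, the name passed to `--served-model-name` should appear in the returned list; if it does not, the `--model` argument given to `main.py` will not match.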

## Using AutoGLM

### Command Line

Set the `--base-url` and `--model` parameters according to your deployed model. For example:

```bash
# Android device - Interactive mode
python main.py --base-url http://localhost:8000/v1 --model "autoglm-phone-9b-multilingual"

# Android device - Specify task
python main.py --base-url http://localhost:8000/v1 "Open Maps and search for nearby coffee shops"

# HarmonyOS device - Interactive mode
python main.py --device-type hdc --base-url http://localhost:8000/v1 --model "autoglm-phone-9b-multilingual"

# HarmonyOS device - Specify task
python main.py --device-type hdc --base-url http://localhost:8000/v1 "Open Maps and search for nearby coffee shops"

# Use API key for authentication
python main.py --apikey sk-xxxxx

# Use English system prompt
python main.py --lang en --base-url http://localhost:8000/v1 "Open Chrome browser"

# List supported apps (Android)
python main.py --list-apps

# List supported apps (HarmonyOS)
python main.py --device-type hdc --list-apps
```

### Python API

```python
from phone_agent import PhoneAgent
from phone_agent.model import ModelConfig

# Configure model
model_config = ModelConfig(
    base_url="http://localhost:8000/v1",
    model_name="autoglm-phone-9b-multilingual",
)

# Create Agent
agent = PhoneAgent(model_config=model_config)

# Execute task
result = agent.run("Open eBay and search for wireless earphones")
print(result)
```

## Remote Debugging

Phone Agent supports remote ADB/HDC debugging via WiFi/network, allowing device control without a USB connection.

### Configure Remote Debugging

#### Enable Wireless Debugging on Phone

##### Android Devices

Ensure the phone and computer are on the same WiFi network, as shown below:

![Enable Wireless Debugging](resources/screenshot-20251210-120630.png)

##### HarmonyOS Devices

Ensure the phone and computer are on the same WiFi network:
1. Go to `Settings > System & Updates > Developer Options`
2. Enable `USB Debugging` and `Wireless Debugging`
3. Note the displayed IP address and port number

#### Use Standard ADB/HDC Commands on Computer

```bash
# Android device - Connect via WiFi, replace with the IP address and port shown on your phone
adb connect 192.168.1.100:5555

# Verify connection
adb devices
# Should show: 192.168.1.100:5555    device

# HarmonyOS device - Connect via WiFi
hdc tconn 192.168.1.100:5555

# Verify connection
hdc list targets
# Should show: 192.168.1.100:5555
```

### Device Management Commands

#### Android Devices (ADB)

```bash
# List all connected devices
adb devices

# Connect to remote device
adb connect 192.168.1.100:5555

# Disconnect specific device
adb disconnect 192.168.1.100:5555

# Execute task on specific device
python main.py --device-id 192.168.1.100:5555 --base-url http://localhost:8000/v1 --model "autoglm-phone-9b-multilingual" "Open TikTok and browse videos"
```

#### HarmonyOS Devices (HDC)

```bash
# List all connected devices
hdc list targets

# Connect to remote device
hdc tconn 192.168.1.100:5555

# Disconnect specific device
hdc tdisconn 192.168.1.100:5555

# Execute task on specific device
python main.py --device-type hdc --device-id 192.168.1.100:5555 --base-url http://localhost:8000/v1 --model "autoglm-phone-9b-multilingual" "Open TikTok and browse videos"
```

### Python API Remote Connection

#### Android Devices (ADB)

```python
from phone_agent.adb import ADBConnection, list_devices

# Create connection manager
conn = ADBConnection()

# Connect to remote device
success, message = conn.connect("192.168.1.100:5555")
print(f"Connection status: {message}")

# List connected devices
devices = list_devices()
for device in devices:
    print(f"{device.device_id} - {device.connection_type.value}")

# Enable TCP/IP on USB device
success, message = conn.enable_tcpip(5555)
ip = conn.get_device_ip()
print(f"Device IP: {ip}")

# Disconnect
conn.disconnect("192.168.1.100:5555")
```

#### HarmonyOS Devices (HDC)

```python
from phone_agent.hdc import HDCConnection, list_devices

# Create connection manager
conn = HDCConnection()

# Connect to remote device
success, message = conn.connect("192.168.1.100:5555")
print(f"Connection status: {message}")

# List connected devices
devices = list_devices()
for device in devices:
    print(f"{device.device_id} - {device.connection_type.value}")

# Disconnect
conn.disconnect("192.168.1.100:5555")
```

### Remote Connection Troubleshooting

**Connection Refused:**

- Ensure the device and computer are on the same network
- Check if the firewall is blocking port 5555
- Confirm TCP/IP mode is enabled: `adb tcpip 5555`
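
To tell a firewall or network problem apart from an ADB problem, it can help to test whether the port accepts TCP connections at all. The following is a minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper and is not part of `phone_agent`.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `port_is_open("192.168.1.100", 5555)` uses the placeholder address from this README. A `False` result suggests checking the network, the firewall, or whether TCP/IP mode is enabled before investigating ADB itself.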

**Connection Dropped:**

- WiFi may have disconnected; use `--connect` to reconnect
- Some devices disable TCP/IP after restart; re-enable via USB

**Multiple Devices:**

- Use `--device-id` to specify which device to use
- Or use `--list-devices` to view all connected devices

## Configuration

### Custom SYSTEM PROMPT

The system provides both Chinese and English prompts, switchable via the `--lang` parameter:

- `--lang cn` - Chinese prompt (default), config file: `phone_agent/config/prompts_zh.py`
- `--lang en` - English prompt, config file: `phone_agent/config/prompts_en.py`

To strengthen the model in a specific domain, or to disable certain apps by listing their names in the prompt, edit the corresponding config file directly.

### Environment Variables

| Variable                    | Description               | Default Value              |
|-----------------------------|---------------------------|----------------------------|
| `PHONE_AGENT_BASE_URL`      | Model API URL             | `http://localhost:8000/v1` |
| `PHONE_AGENT_MODEL`         | Model name                | `autoglm-phone-9b`         |
| `PHONE_AGENT_API_KEY`       | API key for authentication| `EMPTY`                    |
| `PHONE_AGENT_MAX_STEPS`     | Maximum steps per task    | `100`                      |
| `PHONE_AGENT_DEVICE_ID`     | ADB/HDC device ID         | (auto-detect)              |
| `PHONE_AGENT_DEVICE_TYPE`   | Device type (`adb` or `hdc`)| `adb`                    |
| `PHONE_AGENT_LANG`          | Language (`cn` or `en`)   | `cn`                       |
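
As an illustration of how variables like these are typically consumed, here is a small sketch that merges the environment over some of the documented defaults. `load_settings` and `DEFAULTS` are hypothetical names; the actual handling in `main.py` may differ, and only a subset of the variables is covered.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the table above (a subset of the variables).
DEFAULTS = {
    "PHONE_AGENT_BASE_URL": "http://localhost:8000/v1",
    "PHONE_AGENT_MODEL": "autoglm-phone-9b",
    "PHONE_AGENT_API_KEY": "EMPTY",
    "PHONE_AGENT_MAX_STEPS": "100",
    "PHONE_AGENT_DEVICE_TYPE": "adb",
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: merge environment variables over the documented defaults."""
    env = os.environ if env is None else env
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    # An unset or empty device ID is treated as "auto-detect" and stored as None.
    settings["PHONE_AGENT_DEVICE_ID"] = env.get("PHONE_AGENT_DEVICE_ID") or None
    # The step limit is numeric, so convert it once here.
    settings["PHONE_AGENT_MAX_STEPS"] = int(settings["PHONE_AGENT_MAX_STEPS"])
    return settings
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try out without changing the real environment.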

### Model Configuration

```python
from phone_agent.model import ModelConfig

config = ModelConfig(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # API key (if required)
    model_name="autoglm-phone-9b-multilingual",  # Model name
    max_tokens=3000,  # Maximum output tokens
    temperature=0.1,  # Sampling temperature
    frequency_penalty=0.2,  # Frequency penalty
)
```

### Agent Configuration

```python
from phone_agent.agent import AgentConfig

config = AgentConfig(
    max_steps=100,  # Maximum steps per task
    device_id=None,  # ADB device ID (None for auto-detect)
    lang="en",  # Language: cn (Chinese) or en (English)
    verbose=True,  # Print debug info (including thinking process and actions)
)
```

### Verbose Mode Output

When `verbose=True`, the Agent outputs detailed information at each step:

```
==================================================
💭 Thinking Process:
--------------------------------------------------
Currently on the system desktop, need to launch eBay app first
--------------------------------------------------
🎯 Executing Action:
{
  "_metadata": "do",
  "action": "Launch",
  "app": "eBay"
}
==================================================

... (continues to next step after executing action)

==================================================
💭 Thinking Process:
--------------------------------------------------
eBay is now open, need to tap the search box
--------------------------------------------------
🎯 Executing Action:
{
  "_metadata": "do",
  "action": "Tap",
  "element": [499, 182]
}
==================================================

🎉 ================================================
✅ Task Completed: Successfully opened eBay and searched for 'wireless earphones'
==================================================
```

This allows you to clearly see the AI's reasoning process and specific operations at each step.

## Supported Apps

### Android Apps

Phone Agent supports 50+ mainstream applications:

| Category                 | Apps                                                                                   |
|--------------------------|----------------------------------------------------------------------------------------|
| Social & Messaging       | X, Tiktok, WhatsApp, Telegram, FacebookMessenger, GoogleChat, Quora, Reddit, Instagram |
| Productivity & Office    | Gmail, GoogleCalendar, GoogleDrive, GoogleDocs, GoogleTasks, Joplin                    |
| Life, Shopping & Finance | Amazon shopping, Temu, Bluecoins, Duolingo, GoogleFit, ebay                            |
| Utilities & Media        | GoogleClock, Chrome, GooglePlayStore, GooglePlayBooks, FilesbyGoogle                   |
| Travel & Navigation      | GoogleMaps, Booking.com, Trip.com, Expedia, OpenTracks                                 |

Run `python main.py --list-apps` to see the complete list.

### HarmonyOS Apps

Phone Agent supports 60+ HarmonyOS native apps and system apps:

| Category                 | Apps                                                                                   |
|--------------------------|----------------------------------------------------------------------------------------|
| Social & Messaging       | WeChat, QQ, Weibo, Feishu, Enterprise WeChat                                          |
| E-commerce & Shopping    | Taobao, JD.com, Pinduoduo, Vipshop, Dewu, Xianyu                                      |
| Food & Delivery          | Meituan, Meituan Waimai, Dianping, Haidilao                                           |
| Travel & Navigation      | 12306, Didi, Tongcheng, Amap, Baidu Maps                                              |
| Video & Entertainment    | Bilibili, Douyin, Kuaishou, Tencent Video, iQIYI, Mango TV                            |
| Music & Audio            | QQ Music, Qishui Music, Ximalaya                                                       |
| Lifestyle & Social       | Xiaohongshu, Zhihu, Toutiao, 58.com, China Mobile                                     |
| AI & Tools               | Doubao, WPS, UC Browser, CamScanner, Meitu                                            |
| System Apps              | Browser, Calendar, Camera, Clock, Cloud, File Manager, Gallery, Contacts, SMS, Settings |
| Huawei Services          | AppGallery, Music, Video, Books, Themes, Weather                                       |

Run `python main.py --device-type hdc --list-apps` to see the complete list.

## Available Actions

The Agent can perform the following actions:

| Action         | Description                              |
|----------------|------------------------------------------|
| `Launch`       | Launch an app                            |
| `Tap`          | Tap at specified coordinates             |
| `Type`         | Input text                               |
| `Swipe`        | Swipe the screen                         |
| `Back`         | Go back to previous page                 |
| `Home`         | Return to home screen                    |
| `Long Press`   | Long press                               |
| `Double Tap`   | Double tap                               |
| `Wait`         | Wait for page to load                    |
| `Take_over`    | Request manual takeover (login/captcha)  |
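
To make the action format more concrete, the sketch below routes an action dictionary, shaped like the ones in the verbose output of this README, to a handler chosen by its `action` field. The handler functions and the `HANDLERS` registry are hypothetical and only describe the action as text; the real executor lives in `phone_agent/actions/handler.py` and may be organized differently.

```python
from typing import Callable, Dict


def describe_launch(action: dict) -> str:
    """Describe a Launch action (illustrative only; nothing is executed)."""
    return f"launch app {action['app']}"


def describe_tap(action: dict) -> str:
    """Describe a Tap action using its element coordinates."""
    x, y = action["element"]
    return f"tap at ({x}, {y})"


# Hypothetical registry; only two of the listed actions are filled in here.
HANDLERS: Dict[str, Callable[[dict], str]] = {
    "Launch": describe_launch,
    "Tap": describe_tap,
}


def dispatch(action: dict) -> str:
    """Look up the handler by action name and run it."""
    name = action.get("action")
    handler = HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"unsupported action: {name!r}")
    return handler(action)


print(dispatch({"_metadata": "do", "action": "Launch", "app": "eBay"}))
print(dispatch({"_metadata": "do", "action": "Tap", "element": [499, 182]}))
```

A registry keyed by action name keeps each action's logic isolated, so supporting another action only requires adding one entry.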

## Custom Callbacks

Handle sensitive operation confirmation and manual takeover:

```python
def my_confirmation(message: str) -> bool:
    """Sensitive operation confirmation callback"""
    return input(f"Confirm execution of {message}? (y/n): ").lower() == "y"


def my_takeover(message: str) -> None:
    """Manual takeover callback"""
    print(f"Please complete manually: {message}")
    input("Press Enter after completion...")


agent = PhoneAgent(
    confirmation_callback=my_confirmation,
    takeover_callback=my_takeover,
)
```
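
For unattended runs where nobody can answer a prompt, the callbacks can be replaced with non-interactive variants. The sketch below keeps the same signatures as the callbacks above; the function names and the logger name are hypothetical, and how `PhoneAgent` reacts to an exception raised inside a takeover callback is an assumption that should be checked against the source.

```python
import logging

logger = logging.getLogger("phone_agent_callbacks")  # hypothetical logger name


def deny_sensitive_operations(message: str) -> bool:
    """Confirmation callback that never approves; it only records the request."""
    logger.warning("Sensitive operation declined: %s", message)
    return False


def fail_on_takeover(message: str) -> None:
    """Takeover callback for unattended runs: stop instead of waiting for Enter."""
    raise RuntimeError(f"Manual takeover requested: {message}")
```

These would be passed as `confirmation_callback=deny_sensitive_operations` and `takeover_callback=fail_on_takeover`, in the same way as the interactive versions.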

## Examples

Check the `examples/` directory for more usage examples:

- `basic_usage.py` - Basic task execution
- Single-step debugging mode
- Batch task execution
- Custom callbacks

## Development

### Set Up Development Environment

Development requires dev dependencies:

```bash
pip install -e ".[dev]"
```

### Run Tests

```bash
pytest tests/
```

### Complete Project Structure

```
phone_agent/
├── __init__.py          # Package exports
├── agent.py             # PhoneAgent main class
├── adb/                 # ADB utilities
│   ├── connection.py    # Remote/local connection management
│   ├── screenshot.py    # Screen capture
│   ├── input.py         # Text input (ADB Keyboard)
│   └── device.py        # Device control (tap, swipe, etc.)
├── actions/             # Action handling
│   └── handler.py       # Action executor
├── config/              # Configuration
│   ├── apps.py          # Supported app mappings
│   ├── prompts_zh.py    # Chinese system prompts
│   └── prompts_en.py    # English system prompts
└── model/               # AI model client
    └── client.py        # OpenAI-compatible client
```

## FAQ

Here are some common issues and their solutions:

### Device Not Found

Try resolving by restarting the ADB service:

```bash
adb kill-server
adb start-server
adb devices
```

If the device is still not recognized, please check:
1. Whether USB debugging is enabled
2. Whether the USB cable supports data transfer (some cables only support charging)
3. Whether you have tapped "Allow" on the authorization popup on your phone
4. Try a different USB port or cable

### Can Open Apps but Cannot Tap

Some devices require both debugging options to be enabled:
- **USB Debugging**
- **USB Debugging (Security Settings)**

Please check in `Settings → Developer Options` that both options are enabled.

### Text Input Not Working

1. Ensure ADB Keyboard is installed on the device
2. Enable it in Settings > System > Language & Input > Virtual Keyboard
3. The Agent will automatically switch to ADB Keyboard when input is needed

### Screenshot Failed (Black Screen)

This usually means the app is displaying a sensitive page (payment, password, banking apps). The Agent will automatically detect this and request manual takeover.

### Windows Encoding Issues
Symptom: an error message such as `UnicodeEncodeError: 'gbk' codec can't encode character ...`.

Solution: set the environment variable `PYTHONIOENCODING=utf-8` before running the code.
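
As an alternative to the environment variable, a script can switch its own output streams to UTF-8 at startup. This is a sketch of that different technique, not something the project is known to do; `force_utf8_output` is a hypothetical helper that relies on `TextIOWrapper.reconfigure`, available since Python 3.7.

```python
import sys


def force_utf8_output() -> bool:
    """Switch stdout and stderr to UTF-8 when the streams support reconfiguration."""
    changed = False
    for stream in (sys.stdout, sys.stderr):
        reconfigure = getattr(stream, "reconfigure", None)
        if callable(reconfigure):
            # Characters that still cannot be encoded are replaced instead of raising.
            reconfigure(encoding="utf-8", errors="replace")
            changed = True
    return changed
```

Calling it once before any output is printed is enough; it returns `False` when the streams have been replaced by objects that cannot be reconfigured.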

### Interactive Mode Not Working in Non-TTY Environment
Symptom: an error such as `EOF when reading a line`.

Solution: pass the task directly on the command line (non-interactive mode), or run the program from a terminal that provides a TTY.
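
A wrapper script can avoid this error by prompting only when a terminal is attached. The sketch below shows one way to do that; `read_task` is a hypothetical helper and does not reflect how `main.py` is implemented.

```python
import sys
from typing import Optional, TextIO


def read_task(cli_task: Optional[str], stdin: Optional[TextIO] = None) -> Optional[str]:
    """Return the task given on the command line, or prompt only if a TTY is attached."""
    if cli_task:
        return cli_task
    stream = sys.stdin if stdin is None else stdin
    if not stream.isatty():
        # Without a terminal there is nobody to answer the prompt, so do not ask.
        return None
    print("Enter your task: ", end="", flush=True)
    return stream.readline().strip()
```

A `None` result signals the caller to exit with a clear message instead of failing with an `EOFError`.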

## Citation

If you find our work helpful, please cite the following papers:

```bibtex
@article{liu2024autoglm,
  title={Autoglm: Autonomous foundation agents for guis},
  author={Liu, Xiao and Qin, Bo and Liang, Dongzhu and Dong, Guang and Lai, Hanyu and Zhang, Hanchen and Zhao, Hanlin and Iong, Iat Long and Sun, Jiadai and Wang, Jiaqi and others},
  journal={arXiv preprint arXiv:2411.00820},
  year={2024}
}
@article{xu2025mobilerl,
  title={MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents},
  author={Xu, Yifan and Liu, Xiao and Liu, Xinghan and Fu, Jiaqi and Zhang, Hanchen and Jing, Bohao and Zhang, Shudan and Wang, Yuting and Zhao, Wenyi and Dong, Yuxiao},
  journal={arXiv preprint arXiv:2509.18119},
  year={2025}
}
```

---

## Automated Deployment Guide (For AI Assistants)

> **This section is specifically designed for AI assistants (such as Claude Code) to automate the deployment of Open-AutoGLM.**
>
> If you are a human reader, you can skip this section and follow the documentation above.

---

### Project Overview

Open-AutoGLM is a phone agent framework:
- **Input**: User's natural language instructions (e.g., "Open WhatsApp and send a message to John")
- **Output**: Automatically operates the user's Android phone to complete tasks
- **Mechanism**: Screenshot → Vision model understands interface → Outputs tap coordinates → ADB executes actions → Loop

The architecture consists of two parts:
1. **Agent Code** (this repository): Runs on the user's computer, responsible for calling models, parsing actions, and controlling the phone
2. **Vision Model Service**: Can be a remote API or deployed locally
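
The mechanism above can be sketched as a simple loop. This is an illustrative outline rather than the project's actual implementation: `run_loop` and its three callables are hypothetical, and the `"finish"` marker with a `message` field is an assumption about how the model signals completion.

```python
from typing import Callable


def run_loop(
    take_screenshot: Callable[[], bytes],
    ask_model: Callable[[str, bytes], dict],
    execute: Callable[[dict], None],
    task: str,
    max_steps: int = 100,
) -> str:
    """Repeat screenshot -> model -> action until the model reports it is finished."""
    for _ in range(max_steps):
        screenshot = take_screenshot()
        action = ask_model(task, screenshot)
        # Assumed convention: a "finish" action ends the task and carries a message.
        if action.get("_metadata") == "finish":
            return action.get("message", "")
        execute(action)
    return "Max steps reached"
```

Passing the screenshot, model, and execution steps in as callables keeps the loop independent of ADB and of any particular model service, which also makes it easy to exercise with stubs.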

---

### Pre-Deployment Checklist

Before starting deployment, confirm the following items with the user:

#### Hardware Requirements
- [ ] User has an Android phone (Android 7.0+)
- [ ] User has a USB cable that supports data transfer (not just charging)
- [ ] Phone and computer can be connected via USB cable

#### Phone Configuration
- [ ] Phone has Developer Mode enabled (Settings → About Phone → Tap Build Number 7 times)
- [ ] Phone has USB Debugging enabled (Settings → Developer Options → USB Debugging)
- [ ] Some models require enabling "USB Debugging (Security Settings)" as well
- [ ] ADB Keyboard app is installed (Download: https://github.com/senzhk/ADBKeyBoard/blob/master/ADBKeyboard.apk)
- [ ] ADB Keyboard is enabled in system settings (Settings → Language & Input → Enable ADB Keyboard)

#### Model Service Confirmation (Choose One)

**Ask the user explicitly: Do you already have access to an AutoGLM model service?**

- **Option A: Use an already-deployed model service (Recommended)**
  - User provides the model service URL (e.g., `http://xxx.xxx.xxx.xxx:8000/v1`)
  - No local GPU required, no model download needed
  - Use this URL directly as the `--base-url` parameter

- **Option B: Deploy model locally (High system requirements)**
  - Requires NVIDIA GPU (24GB+ VRAM recommended)
  - Requires installation of vLLM or SGLang
  - Requires downloading approximately 20GB of model files
  - **If the user is a beginner or unsure, strongly recommend Option A**

---

### Deployment Process

#### Phase 1: Environment Setup

```bash
# 1. Install ADB tools
# macOS:
brew install android-platform-tools
# Or download manually: https://developer.android.com/tools/releases/platform-tools

# Windows: Download, extract, and add to PATH environment variable

# 2. Verify ADB installation
adb version
# Should output version information

# 3. Connect phone and verify
# Connect phone via USB cable, tap "Allow USB debugging" on phone
adb devices
# Should output device list, e.g.:
# List of devices attached
# XXXXXXXX    device
```

**If `adb devices` shows empty list or unauthorized:**
1. Check if authorization popup appeared on phone, tap "Allow"
2. Check if USB debugging is enabled
3. Try a different cable or USB port
4. Run `adb kill-server && adb start-server` and retry
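
When automating this check, the device state can be read from the `adb devices` output instead of being inspected by eye. The following sketch parses that output into a mapping from serial to state; both function names are hypothetical, and `connected_devices` assumes `adb` is on `PATH`.

```python
import subprocess
from typing import Dict


def parse_adb_devices(output: str) -> Dict[str, str]:
    """Map each serial in `adb devices` output to its state, e.g. device or unauthorized."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and adb daemon status lines.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def connected_devices() -> Dict[str, str]:
    """Run `adb devices` and parse the result; requires adb on PATH."""
    result = subprocess.run(["adb", "devices"], capture_output=True, text=True, timeout=10)
    return parse_adb_devices(result.stdout)
```

An empty dictionary corresponds to the "empty list" case, while a state of `unauthorized` points to the authorization popup on the phone.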

#### Phase 2: Install Agent

```bash
# 1. Clone repository (if not already cloned)
git clone https://github.com/zai-org/Open-AutoGLM.git
cd Open-AutoGLM

# 2. Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt
pip install -e .
```

**Note: No need to clone model repository; models are called via API.**

#### Phase 3: Configure Model Service

**If user chooses Option A (using already-deployed model):**

You can use the following third-party model services:

1. **z.ai**
   - Documentation: https://docs.z.ai/api-reference/introduction
   - `--base-url`: `https://api.z.ai/api/paas/v4`
   - `--model`: `autoglm-phone-multilingual`
   - `--apikey`: Apply for your own API key on the z.ai platform

2. **Novita AI**
   - Documentation: https://novita.ai/models/model-detail/zai-org-autoglm-phone-9b-multilingual
   - `--base-url`: `https://api.novita.ai/openai`
   - `--model`: `zai-org/autoglm-phone-9b-multilingual`
   - `--apikey`: Apply for your own API key on the Novita AI platform

3. **Parasail**
   - Documentation: https://www.saas.parasail.io/serverless?name=auto-glm-9b-multilingual
   - `--base-url`: `https://api.parasail.io/v1`
   - `--model`: `parasail-auto-glm-9b-multilingual`
   - `--apikey`: Apply for your own API key on the Parasail platform

Example usage:

```bash
# Using z.ai
python main.py --base-url https://api.z.ai/api/paas/v4 --model "autoglm-phone-multilingual" --apikey "your-z-ai-api-key" "Open Chrome browser"

# Using Novita AI
python main.py --base-url https://api.novita.ai/openai --model "zai-org/autoglm-phone-9b-multilingual" --apikey "your-novita-api-key" "Open Chrome browser"

# Using Parasail
python main.py --base-url https://api.parasail.io/v1 --model "parasail-auto-glm-9b-multilingual" --apikey "your-parasail-api-key" "Open Chrome browser"
```

Or use the URL provided by the user directly and skip local model deployment steps.

**If user chooses Option B (deploy model locally):**

```bash
# 1. Install vLLM
pip install vllm

# 2. Start model service (will auto-download model, ~20GB)
python3 -m vllm.entrypoints.openai.api_server \
  --served-model-name autoglm-phone-9b-multilingual \
  --allowed-local-media-path / \
  --mm-encoder-tp-mode data \
  --mm_processor_cache_type shm \
  --mm_processor_kwargs "{\"max_pixels\":5000000}" \
  --max-model-len 25480 \
  --chat-template-content-format string \
  --limit-mm-per-prompt "{\"image\":10}" \
  --model zai-org/AutoGLM-Phone-9B-Multilingual \
  --port 8000

# Model service URL: http://localhost:8000/v1
```
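
Before running the agent, it may be useful to confirm that the model service answers at all. The sketch below assumes the service is OpenAI-compatible and lists its models under `<base-url>/models`, which is an assumption to verify for your deployment; `models_url` and `list_served_models` are hypothetical helpers.

```python
import json
import urllib.error
import urllib.request
from typing import List


def models_url(base_url: str) -> str:
    """Build the model-listing URL from a base URL such as http://localhost:8000/v1."""
    return base_url.rstrip("/") + "/models"


def list_served_models(base_url: str, timeout: float = 5.0) -> List[str]:
    """Return the model IDs reported by the service, or an empty list if unreachable."""
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as response:
            payload = json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Network errors, malformed URLs, and invalid JSON all count as "not available".
        return []
    data = payload.get("data", []) if isinstance(payload, dict) else []
    return [entry["id"] for entry in data if isinstance(entry, dict) and "id" in entry]
```

If the returned list is empty, check the URL and whether the service is running; if it is not empty, the names in it are the values the service expects for `--model`.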

#### Phase 4: Verify Deployment

```bash
# Execute in the Open-AutoGLM directory
# Replace {MODEL_URL} with the actual model service address

python main.py --base-url {MODEL_URL} --model "autoglm-phone-9b-multilingual" "Open Gmail and send an email to File Transfer Assistant: Deployment successful"
```

**Expected Result:**
- Phone automatically opens Gmail
- Automatically searches for recipient
- Automatically sends the message "Deployment successful"

---

### Troubleshooting

| Error Symptom | Possible Cause | Solution |
|---------------|----------------|----------|
| `adb devices` shows nothing | USB debugging not enabled or cable issue | Check developer options, replace cable |
| `adb devices` shows unauthorized | Phone not authorized | Tap "Allow USB debugging" on phone |
| Can open apps but cannot tap | Missing security debugging permission | Enable "USB Debugging (Security Settings)" |
| Chinese/text input corrupted or missing | ADB Keyboard not enabled | Enable ADB Keyboard in system settings |
| Screenshot returns black screen | Sensitive page (payment/banking) | Normal behavior, system will handle automatically |
| Cannot connect to model service | Wrong URL or service not running | Check URL, confirm service is running |
| `ModuleNotFoundError` | Dependencies not installed | Run `pip install -r requirements.txt` |

---

### Deployment Key Points

1. **Prioritize confirming phone connection**: Before installing any code, ensure `adb devices` can see the device
2. **Don't skip ADB Keyboard**: Without it, text input will fail
3. **Model service is an external dependency**: Agent code doesn't include the model; a separate model service is required
4. **Check phone settings first for permission issues**: Most problems are due to incomplete phone-side configuration
5. **Test with simple tasks after deployment**: Recommend using "Open Gmail and send message to File Transfer Assistant" as acceptance criteria

---

### Command Quick Reference

```bash
# Check ADB connection
adb devices

# Restart ADB service
adb kill-server && adb start-server

# Install dependencies
pip install -r requirements.txt && pip install -e .

# Run Agent (interactive mode)
python main.py --base-url {MODEL_URL} --model "autoglm-phone-9b-multilingual"

# Run Agent (single task)
python main.py --base-url {MODEL_URL} --model "autoglm-phone-9b-multilingual" "your task description"

# View supported apps list
python main.py --list-apps
```

---

**Deployment success indicator: The phone can automatically execute user's natural language instructions.**


================================================
FILE: docs/ios_setup/ios_setup.md
================================================
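# iOS Environment Setup Guide

This document describes how to set up an iOS device environment for Open-AutoGLM.

## Requirements

- macOS
- Xcode (latest version, downloaded from the App Store)
- Apple developer account (a free account is sufficient; no paid membership is needed)
- iOS device (iPhone/iPad)
- USB data cable, or the same WiFi network


## WebDriverAgent Setup

WebDriverAgent is the core component for iOS automation and must run on the iOS device.

### 1. Clone WebDriverAgent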

```bash
git clone https://github.com/appium/WebDriverAgent.git
cd WebDriverAgent
```

Click `WebDriverAgent.xcodeproj` to open it in Xcode.

### 2. Configure Signing & Capabilities

1. In Xcode, select `WebDriverAgent`; tabs such as General and Signing & Capabilities appear.
2. Open the `Signing & Capabilities` tab.
3. Check `Automatically manage signing` and select your developer account under Team.
4. Change the Bundle ID to a unique identifier, for example `com.yourname.WebDriverAgentRunner`.
![Configure signing 1](resources/ios0_WebDriverAgent0.png)

5. Under TARGETS, it is recommended to configure `Signing & Capabilities` the same way for WebDriverAgentLib, WebDriverAgentRunner, and IntegrationApp.
![Configure signing 2](resources/ios0_WebDriverAgent1.png)

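### 3. Test Xcode GUI Mode and UI Automation Settings

Before continuing, first confirm that WebDriverAgent can be installed successfully in GUI mode.
A Mac can connect to an iPhone over USB or WiFi; USB is recommended because it succeeds more often.

#### Connecting via WiFi

The following conditions must be met:
1. Connect via USB first. In Finder, select the connected iPhone and, under "General", check "Show this iPhone when on WiFi".
2. The Mac and iPhone are on the same WiFi network.

#### Steps
1. Select `WebDriverAgentRunner` as the project target.
2. Select your device.

![Select device](resources/select-your-iphone-device.png)

3. Long-press the "▶️" run button and choose "Test" to build and deploy to your iPhone.

![Start testing](resources/start-wda-testing.png)

Signs of a successful deployment: 1. Xcode reports no errors. 2. An app named WebDriverAgentRunner appears on your iPhone.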

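#### Device Trust Configuration

On the first run, complete the following settings on the iPhone, then rebuild and redeploy:

1. **Enter the unlock passcode**
2. **Trust the developer app**
   - Go to: Settings → General → VPN & Device Management
   - Under "Developer App", select the corresponding developer
   - Tap Trust "XXX"

   ![Trust device](resources/trust-dev-app.jpg)

3. **Enable UI Automation**
   - Go to: Settings → Developer
   - Turn on the UI Automation setting

   ![Enable UI Automation](resources/enable-ui-automation.jpg)

### 4. Deploy from the Xcode Command Line

1. Install libimobiledevice, which is used to connect to and communicate with the iPhone / iPad.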

```
brew install libimobiledevice
# Check connected devices
idevice_id -ln
```
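2. Use xcodebuild to install WebDriverAgent. Command-line deployment also requires the "Device Trust Configuration" steps; follow the method described for GUI mode.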

```
cd WebDriverAgent

xcodebuild -project WebDriverAgent.xcodeproj \
           -scheme WebDriverAgentRunner \
           -destination 'platform=iOS,name=YOUR_PHONE_NAME' \
           test
```
Here, YOUR_PHONE_NAME is the device name shown in the Xcode GUI.
Once WebDriverAgent is running, the Xcode console prints a message similar to the following:

```
ServerURLHere->http://[device IP]:8100<-ServerURLHere
```

At the same time, WebDriverAgentRunner is installed on the phone and the screen shows the text Automation Running.
Here, **http://[device IP]:8100** is the WDA_URL needed for WiFi connections.
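
When the deployment is scripted, the URL can be extracted from the captured console output rather than copied by hand. The following sketch assumes the marker line has exactly the `ServerURLHere->...<-ServerURLHere` shape quoted above; `extract_wda_url` is a hypothetical helper and the IP address in the example is made up.

```python
import re
from typing import Optional

# Pattern for the marker line printed by WebDriverAgent, as quoted above.
SERVER_URL_PATTERN = re.compile(r"ServerURLHere->(?P<url>https?://\S+?)<-ServerURLHere")


def extract_wda_url(log_text: str) -> Optional[str]:
    """Return the first WebDriverAgent URL found in the log text, or None."""
    match = SERVER_URL_PATTERN.search(log_text)
    return match.group("url") if match else None


# Example with a made-up device address.
print(extract_wda_url("ServerURLHere->http://192.168.1.50:8100<-ServerURLHere"))
```

The extracted value can then be passed to `ios.py` through `--wda-url`.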

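## Using AutoGLM

After completing the setup above, open a new terminal and start port forwarding in the background (not needed when connecting over WiFi):

```bash
iproxy 8100 8100
```

Then open another terminal and run AutoGLM with the following command (over WiFi, use the WDA_URL obtained above):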

```bash
python ios.py --base-url "YOUR_BASE_URL" \
    --model  "autoglm-phone" \
    --api-key "YOUR_API_KEY" \
    --wda-url http://localhost:8100 \
    "TASK"
```

## References

- [WebDriverAgent official repository](https://github.com/appium/WebDriverAgent)
- [PR141](https://github.com/zai-org/Open-AutoGLM/pull/141)
- [iOS solution provided by Gekowa](https://github.com/gekowa/Open-AutoGLM/tree/ios-support)

---

If you have other questions, refer to the main project README or open an Issue.


================================================
FILE: examples/basic_usage.py
================================================
#!/usr/bin/env python3
"""
Phone Agent Usage Examples / Phone Agent 使用示例

Demonstrates how to use Phone Agent for phone automation tasks via Python API.
演示如何通过 Python API 使用 Phone Agent 进行手机自动化任务。
"""

from phone_agent import PhoneAgent
from phone_agent.agent import AgentConfig
from phone_agent.config import get_messages
from phone_agent.model import ModelConfig


def example_basic_task(lang: str = "cn"):
    """Basic task example / 基础任务示例"""
    msgs = get_messages(lang)

    # Configure model endpoint
    model_config = ModelConfig(
        base_url="http://localhost:8000/v1",
        model_name="autoglm-phone-9b",
        temperature=0.1,
    )

    # Configure Agent behavior
    agent_config = AgentConfig(
        max_steps=50,
        verbose=True,
        lang=lang,
    )

    # Create Agent
    agent = PhoneAgent(
        model_config=model_config,
        agent_config=agent_config,
    )

    # Execute task
    result = agent.run("打开小红书搜索美食攻略")
    print(f"{msgs['task_result']}: {result}")


def example_with_callbacks(lang: str = "cn"):
    """Task example with callbacks / 带回调的任务示例"""
    msgs = get_messages(lang)

    def my_confirmation(message: str) -> bool:
        """Sensitive operation confirmation callback / 敏感操作确认回调"""
        print(f"\n[{msgs['confirmation_required']}] {message}")
        response = input(f"{msgs['continue_prompt']}: ")
        return response.lower() in ("yes", "y", "是")

    def my_takeover(message: str) -> None:
        """Manual takeover callback / 人工接管回调"""
        print(f"\n[{msgs['manual_operation_required']}] {message}")
        print(msgs["manual_operation_hint"])
        input(f"{msgs['press_enter_when_done']}: ")

    # Create Agent with custom callbacks
    agent_config = AgentConfig(lang=lang)
    agent = PhoneAgent(
        agent_config=agent_config,
        confirmation_callback=my_confirmation,
        takeover_callback=my_takeover,
    )

    # Execute task that may require confirmation
    result = agent.run("打开淘宝搜索无线耳机并加入购物车")
    print(f"{msgs['task_result']}: {result}")


def example_step_by_step(lang: str = "cn"):
    """Step-by-step execution example (for debugging) / 单步执行示例（用于调试）"""
    msgs = get_messages(lang)

    agent_config = AgentConfig(lang=lang)
    agent = PhoneAgent(agent_config=agent_config)

    # Initialize task
    result = agent.step("打开美团搜索附近的火锅店")
    print(f"{msgs['step']} 1: {result.action}")

    # Continue if not finished
    while not result.finished and agent.step_count < 10:
        result = agent.step()
        print(f"{msgs['step']} {agent.step_count}: {result.action}")
        print(f"  {msgs['thinking']}: {result.thinking[:100]}...")

    print(f"\n{msgs['final_result']}: {result.message}")


def example_multiple_tasks(lang: str = "cn"):
    """Batch task example / 批量任务示例"""
    msgs = get_messages(lang)

    agent_config = AgentConfig(lang=lang)
    agent = PhoneAgent(agent_config=agent_config)

    tasks = [
        "打开高德地图查看实时路况",
        "打开大众点评搜索附近的咖啡店",
        "打开bilibili搜索Python教程",
    ]

    for task in tasks:
        print(f"\n{'=' * 50}")
        print(f"{msgs['task']}: {task}")
        print("=" * 50)

        result = agent.run(task)
        print(f"{msgs['result']}: {result}")

        # Reset Agent state
        agent.reset()


def example_remote_device(lang: str = "cn"):
    """Remote device example / 远程设备示例"""
    from phone_agent.adb import ADBConnection

    msgs = get_messages(lang)

    # Create connection manager
    conn = ADBConnection()

    # Connect to remote device
    success, message = conn.connect("192.168.1.100:5555")
    if not success:
        print(f"{msgs['connection_failed']}: {message}")
        return

    print(f"{msgs['connection_successful']}: {message}")

    # Create Agent with device specified
    agent_config = AgentConfig(
        device_id="192.168.1.100:5555",
        verbose=True,
        lang=lang,
    )

    agent = PhoneAgent(agent_config=agent_config)

    # Execute task
    result = agent.run("打开微信查看消息")
    print(f"{msgs['task_result']}: {result}")

    # Disconnect
    conn.disconnect("192.168.1.100:5555")


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Phone Agent Usage Examples")
    parser.add_argument(
        "--lang",
        type=str,
        default="cn",
        choices=["cn", "en"],
        help="Language for UI messages (cn=Chinese, en=English)",
    )
    args = parser.parse_args()

    msgs = get_messages(args.lang)

    print("Phone Agent Usage Examples")
    print("=" * 50)

    # Run basic example
    print("\n1. Basic Task Example")
    print("-" * 30)
    example_basic_task(args.lang)

    # Uncomment to run other examples
    # print(f"\n2. Task Example with Callbacks")
    # print("-" * 30)
    # example_with_callbacks(args.lang)

    # print(f"\n3. Step-by-step Example")
    # print("-" * 30)
    # example_step_by_step(args.lang)

    # print(f"\n4. Batch Task Example")
    # print("-" * 30)
    # example_multiple_tasks(args.lang)

    # print(f"\n5. Remote Device Example")
    # print("-" * 30)
    # example_remote_device(args.lang)


================================================
FILE: examples/demo_thinking.py
================================================
#!/usr/bin/env python3
"""
Thinking Output Demo / 演示 thinking 输出的示例

This script demonstrates how the Agent outputs both thinking process and actions in verbose mode.
这个脚本展示了在 verbose 模式下，Agent 会同时输出思考过程和执行动作。
"""

from phone_agent import PhoneAgent
from phone_agent.agent import AgentConfig
from phone_agent.config import get_messages
from phone_agent.model import ModelConfig


def main(lang: str = "cn"):
    msgs = get_messages(lang)

    print("=" * 60)
    print("Phone Agent - Thinking Demo")
    print("=" * 60)

    # Configure model
    model_config = ModelConfig(
        base_url="http://localhost:8000/v1",
        model_name="autoglm-phone-9b",
        temperature=0.1,
    )

    # Configure Agent (verbose=True enables detailed output)
    agent_config = AgentConfig(
        max_steps=10,
        verbose=True,
        lang=lang,
    )

    # Create Agent
    agent = PhoneAgent(
        model_config=model_config,
        agent_config=agent_config,
    )

    # Execute task
    print(f"\n📱 {msgs['starting_task']}...\n")
    result = agent.run("打开小红书搜索美食攻略")

    print("\n" + "=" * 60)
    print(f"📊 {msgs['final_result']}: {result}")
    print("=" * 60)


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Phone Agent Thinking Demo")
    parser.add_argument(
        "--lang",
        type=str,
        default="cn",
        choices=["cn", "en"],
        help="Language for UI messages (cn=Chinese, en=English)",
    )
    args = parser.parse_args()

    main(lang=args.lang)


================================================
FILE: ios.py
================================================
#!/usr/bin/env python3
"""
Phone Agent iOS CLI - AI-powered iOS phone automation.

Usage:
    python ios.py [OPTIONS]

Environment Variables:
    PHONE_AGENT_BASE_URL: Model API base URL (default: http://localhost:8000/v1)
    PHONE_AGENT_MODEL: Model name (default: autoglm-phone-9b)
    PHONE_AGENT_MAX_STEPS: Maximum steps per task (default: 100)
    PHONE_AGENT_WDA_URL: WebDriverAgent URL (default: http://localhost:8100)
    PHONE_AGENT_DEVICE_ID: iOS device UDID for multi-device setups
    PHONE_AGENT_LANG: Language for system prompt, cn or en (default: cn)
"""

import argparse
import os
import shutil
import subprocess
import sys
from urllib.parse import urlparse

from openai import OpenAI

from phone_agent.agent_ios import IOSAgentConfig, IOSPhoneAgent
from phone_agent.config.apps_ios import list_supported_apps
from phone_agent.model import ModelConfig
from phone_agent.xctest import XCTestConnection, list_devices


def check_system_requirements(wda_url: str = "http://localhost:8100") -> bool:
    """
    Check system requirements before running the agent.

    Checks:
    1. libimobiledevice tools installed
    2. At least one iOS device connected
    3. WebDriverAgent is running

    Args:
        wda_url: WebDriverAgent URL to check.

    Returns:
        True if all checks pass, False otherwise.
    """
    print("🔍 Checking system requirements...")
    print("-" * 50)

    all_passed = True

    # Check 1: libimobiledevice installed
    print("1. Checking libimobiledevice installation...", end=" ")
    if shutil.which("idevice_id") is None:
        print("❌ FAILED")
        print("   Error: libimobiledevice is not installed or not in PATH.")
        print("   Solution: Install libimobiledevice:")
        print("     - macOS: brew install libimobiledevice")
        print("     - Linux: sudo apt-get install libimobiledevice-utils")
        all_passed = False
    else:
        # Double check by running idevice_id
        try:
            result = subprocess.run(
                ["idevice_id", "-ln"], capture_output=True, text=True, timeout=10
            )
            if result.returncode == 0:
                print("✅ OK")
            else:
                print("❌ FAILED")
                print("   Error: idevice_id command failed to run.")
                all_passed = False
        except FileNotFoundError:
            print("❌ FAILED")
            print("   Error: idevice_id command not found.")
            all_passed = False
        except subprocess.TimeoutExpired:
            print("❌ FAILED")
            print("   Error: idevice_id command timed out.")
            all_passed = False

    # If libimobiledevice is not installed, skip remaining checks
    if not all_passed:
        print("-" * 50)
        print("❌ System check failed. Please fix the issues above.")
        return False

    # Check 2: iOS Device connected
    print("2. Checking connected iOS devices...", end=" ")
    try:
        devices = list_devices()

        if not devices:
            print("❌ FAILED")
            print("   Error: No iOS devices connected.")
            print("   Solution:")
            print("     1. Connect your iOS device via USB")
            print("     2. Unlock the device and tap 'Trust This Computer'")
            print("     3. Verify connection: idevice_id -l")
            print("     4. Or connect via WiFi using device IP")
            all_passed = False
        else:
            device_names = [
                d.device_name or d.device_id[:8] + "..." for d in devices
            ]
            print(f"✅ OK ({len(devices)} device(s): {', '.join(device_names)})")
    except Exception as e:
        print("❌ FAILED")
        print(f"   Error: {e}")
        all_passed = False

    # If no device connected, skip WebDriverAgent check
    if not all_passed:
        print("-" * 50)
        print("❌ System check failed. Please fix the issues above.")
        return False

    # Check 3: WebDriverAgent running
    print(f"3. Checking WebDriverAgent ({wda_url})...", end=" ")
    try:
        conn = XCTestConnection(wda_url=wda_url)

        if conn.is_wda_ready():
            print("✅ OK")
            # Get WDA status for additional info
            status = conn.get_wda_status()
            if status:
                session_id = status.get("sessionId", "N/A")
                print(f"   Session ID: {session_id}")
        else:
            print("❌ FAILED")
            print("   Error: WebDriverAgent is not running or not accessible.")
            print("   Solution:")
            print("     1. Run WebDriverAgent on your iOS device via Xcode")
            print("     2. For USB: Set up port forwarding: iproxy 8100 8100")
            print(
                "     3. For WiFi: Use device IP, e.g., --wda-url http://192.168.1.100:8100"
            )
            print("     4. Verify in browser: open http://localhost:8100/status")
            print("\n   Quick setup guide:")
            print(
                "     git clone https://github.com/appium/WebDriverAgent.git && cd WebDriverAgent"
            )
            print("     ./Scripts/bootstrap.sh")
            print("     open WebDriverAgent.xcodeproj")
            print("     # Configure signing, then Product > Test (Cmd+U)")
            all_passed = False
    except Exception as e:
        print("❌ FAILED")
        print(f"   Error: {e}")
        all_passed = False

    print("-" * 50)

    if all_passed:
        print("✅ All system checks passed!\n")
    else:
        print("❌ System check failed. Please fix the issues above.")

    return all_passed


def check_model_api(base_url: str, api_key: str, model_name: str) -> bool:
    """
    Check if the model API is accessible and the specified model exists.

    Checks:
    1. Network connectivity to the API endpoint
    2. Model exists in the available models list

    Args:
        base_url: The API base URL
        api_key: The API key for authentication
        model_name: The model name to check

    Returns:
        True if all checks pass, False otherwise.
    """
    print("🔍 Checking model API...")
    print("-" * 50)

    all_passed = True

    # Check 1: Network connectivity
    print(f"1. Checking API connectivity ({base_url})...", end=" ")
    try:
        # Create OpenAI client
        client = OpenAI(base_url=base_url, api_key=api_key, timeout=10.0)

        # Try to list models (this tests connectivity)
        models_response = client.models.list()
        available_models = [model.id for model in models_response.data]

        print("✅ OK")

        # Check 2: Model exists
        print(f"2. Checking model '{model_name}'...", end=" ")
        if model_name in available_models:
            print("✅ OK")
        else:
            print("❌ FAILED")
            print(f"   Error: Model '{model_name}' not found.")
            print("   Available models:")
            for m in available_models[:10]:  # Show first 10 models
                print(f"     - {m}")
            if len(available_models) > 10:
                print(f"     ... and {len(available_models) - 10} more")
            all_passed = False

    except Exception as e:
        print("❌ FAILED")
        error_msg = str(e)

        # Provide more specific error messages
        if "Connection refused" in error_msg or "Connection error" in error_msg:
            print(f"   Error: Cannot connect to {base_url}")
            print("   Solution:")
            print("     1. Check if the model server is running")
            print("     2. Verify the base URL is correct")
            print(f"     3. Try: curl {base_url}/models")
        elif "timed out" in error_msg.lower() or "timeout" in error_msg.lower():
            print(f"   Error: Connection to {base_url} timed out")
            print("   Solution:")
            print("     1. Check your network connection")
            print("     2. Verify the server is responding")
        elif (
            "Name or service not known" in error_msg
            or "nodename nor servname" in error_msg
        ):
            print("   Error: Cannot resolve hostname")
            print("   Solution:")
            print("     1. Check the URL is correct")
            print("     2. Verify DNS settings")
        else:
            print(f"   Error: {error_msg}")

        all_passed = False

    print("-" * 50)

    if all_passed:
        print("✅ Model API checks passed!\n")
    else:
        print("❌ Model API check failed. Please fix the issues above.")

    return all_passed


def parse_args() -> argparse.Namespace:
    """Parse command line arguments."""
    parser = argparse.ArgumentParser(
        description="Phone Agent iOS - AI-powered iOS phone automation",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
    # Run with default settings
    python ios.py

    # Specify model endpoint
    python ios.py --base-url http://localhost:8000/v1

    # Run with specific device
    python ios.py --device-id <UDID>

    # Use WiFi connection
    python ios.py --wda-url http://192.168.1.100:8100

    # List connected devices
    python ios.py --list-devices

    # Check device pairing status
    python ios.py --pair

    # List supported apps
    python ios.py --list-apps

    # Run a specific task
    python ios.py "Open Safari and search for iPhone tips"
        """,
    )

    # Model options
    parser.add_argument(
        "--base-url",
        type=str,
        default=os.getenv("PHONE_AGENT_BASE_URL", "http://localhost:8000/v1"),
        help="Model API base URL",
    )

    parser.add_argument(
        "--api-key",
        type=str,
        default="EMPTY",
        help="Model API KEY",
    )

    parser.add_argument(
        "--model",
        type=str,
        default=os.getenv("PHONE_AGENT_MODEL", "autoglm-phone-9b"),
        help="Model name",
    )

    parser.add_argument(
        "--max-steps",
        type=int,
        default=int(os.getenv("PHONE_AGENT_MAX_STEPS", "100")),
        help="Maximum steps per task",
    )

    # iOS Device options
    parser.add_argument(
        "--device-id",
        "-d",
        type=str,
        default=os.getenv("PHONE_AGENT_DEVICE_ID"),
        help="iOS device UDID",
    )

    parser.add_argument(
        "--wda-url",
        type=str,
        default=os.getenv("PHONE_AGENT_WDA_URL", "http://localhost:8100"),
        help="WebDriverAgent URL (default: http://localhost:8100)",
    )

    parser.add_argument(
        "--list-devices", action="store_true", help="List connected iOS devices and exit"
    )

    parser.add_argument(
        "--pair",
        action="store_true",
        help="Pair with iOS device (required for some operations)",
    )

    parser.add_argument(
        "--wda-status",
        action="store_true",
        help="Show WebDriverAgent status and exit",
    )

    # Other options
    parser.add_argument(
        "--quiet", "-q", action="store_true", help="Suppress verbose output"
    )

    parser.add_argument(
        "--list-apps", action="store_true", help="List supported apps and exit"
    )

    parser.add_argument(
        "--lang",
        type=str,
        choices=["cn", "en"],
        default=os.getenv("PHONE_AGENT_LANG", "cn"),
        help="Language for system prompt (cn or en, default: cn)",
    )

    parser.add_argument(
        "task",
        nargs="?",
        type=str,
        help="Task to execute (interactive mode if not provided)",
    )

    return parser.parse_args()


def handle_device_commands(args) -> bool:
    """
    Handle iOS device-related commands.

    Returns:
        True if a device command was handled (should exit), False otherwise.
    """
    conn = XCTestConnection(wda_url=args.wda_url)

    # Handle --list-devices
    if args.list_devices:
        devices = list_devices()
        if not devices:
            print("No iOS devices connected.")
            print("\nTroubleshooting:")
            print("  1. Connect device via USB")
            print("  2. Unlock device and trust this computer")
            print("  3. Run: idevice_id -l")
        else:
            print("Connected iOS devices:")
            print("-" * 70)
            for device in devices:
                conn_type = device.connection_type.value
                model_info = f"{device.model}" if device.model else "Unknown"
                ios_info = f"iOS {device.ios_version}" if device.ios_version else ""
                name_info = device.device_name or "Unnamed"

                print(f"  ✓ {name_info}")
                print(f"    UDID: {device.device_id}")
                print(f"    Model: {model_info}")
                print(f"    OS: {ios_info}")
                print(f"    Connection: {conn_type}")
                print("-" * 70)
        return True

    # Handle --pair
    if args.pair:
        print("Pairing with iOS device...")
        success, message = conn.pair_device(args.device_id)
        print(f"{'✓' if success else '✗'} {message}")
        return True

    # Handle --wda-status
    if args.wda_status:
        print(f"Checking WebDriverAgent status at {args.wda_url}...")
        print("-" * 50)

        if conn.is_wda_ready():
            print("✓ WebDriverAgent is running")

            status = conn.get_wda_status()
            if status:
                print("\nStatus details:")
                value = status.get("value", {})
                print(f"  Session ID: {status.get('sessionId', 'N/A')}")
                print(f"  Build: {value.get('build', {}).get('time', 'N/A')}")

                current_app = value.get("currentApp", {})
                if current_app:
                    print("\nCurrent App:")
                    print(f"  Bundle ID: {current_app.get('bundleId', 'N/A')}")
                    print(f"  Process ID: {current_app.get('pid', 'N/A')}")
        else:
            print("✗ WebDriverAgent is not running")
            print("\nPlease start WebDriverAgent on your iOS device:")
            print("  1. Open WebDriverAgent.xcodeproj in Xcode")
            print("  2. Select your device")
            print("  3. Run WebDriverAgentRunner (Product > Test or Cmd+U)")
            print("  4. For USB: Run port forwarding: iproxy 8100 8100")

        return True

    return False


def main():
    """Main entry point."""
    args = parse_args()

    # Handle --list-apps (no system check needed)
    if args.list_apps:
        print("Supported iOS apps:")
        print("\nNote: For iOS apps, Bundle IDs are configured in:")
        print("  phone_agent/config/apps_ios.py")
        print("\nCurrently configured apps:")
        for app in sorted(list_supported_apps()):
            print(f"  - {app}")
        print(
            "\nTo add iOS apps, find the Bundle ID and add to APP_PACKAGES_IOS dictionary."
        )
        return

    # Handle device commands (these may need partial system checks)
    if handle_device_commands(args):
        return

    # Run system requirements check before proceeding
    if not check_system_requirements(wda_url=args.wda_url):
        sys.exit(1)

    # Check model API connectivity and model availability
    # if not check_model_api(args.base_url, args.api_key, args.model):
    #     sys.exit(1)

    # Create configurations
    model_config = ModelConfig(
        base_url=args.base_url,
        model_name=args.model,
        api_key=args.api_key,
    )

    agent_config = IOSAgentConfig(
        max_steps=args.max_steps,
        wda_url=args.wda_url,
        device_id=args.device_id,
        verbose=not args.quiet,
        lang=args.lang,
    )

    # Create iOS agent
    agent = IOSPhoneAgent(
        model_config=model_config,
        agent_config=agent_config,
    )

    # Print header
    print("=" * 50)
    print("Phone Agent iOS - AI-powered iOS automation")
    print("=" * 50)
    print(f"Model: {model_config.model_name}")
    print(f"Base URL: {model_config.base_url}")
    print(f"WDA URL: {args.wda_url}")
    print(f"Max Steps: {agent_config.max_steps}")
    print(f"Language: {agent_config.lang}")

    # Show device info
    devices = list_devices()
    if agent_config.device_id:
        print(f"Device: {agent_config.device_id}")
    elif devices:
        device = devices[0]
        print(f"Device: {device.device_name or device.device_id[:16]}")
        print(f"        {device.model}, iOS {device.ios_version}")

    print("=" * 50)

    # Run with provided task or enter interactive mode
    if args.task:
        print(f"\nTask: {args.task}\n")
        result = agent.run(args.task)
        print(f"\nResult: {result}")
    else:
        # Interactive mode
        print("\nEntering interactive mode. Type 'quit' to exit.\n")

        while True:
            try:
                task = input("Enter your task: ").strip()

                if task.lower() in ("quit", "exit", "q"):
                    print("Goodbye!")
                    break

                if not task:
                    continue

                print()
                result = agent.run(task)
                print(f"\nResult: {result}\n")
                agent.reset()

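            # EOFError (closed stdin) must also end the loop; otherwise the
            # generic handler below would catch it and prompt again forever.
            except (KeyboardInterrupt, EOFError):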
                print("\n\nInterrupted. Goodbye!")
                break
            except Exception as e:
                print(f"\nError: {e}\n")


if __name__ == "__main__":
    main()


================================================
FILE: main.py
================================================
#!/usr/bin/env python3
"""
Phone Agent CLI - AI-powered phone automation.

Usage:
    python main.py [OPTIONS]

Environment Variables:
    PHONE_AGENT_BASE_URL: Model API base URL (default: http://localhost:8000/v1)
    PHONE_AGENT_MODEL: Model name (default: autoglm-phone-9b)
    PHONE_AGENT_API_KEY: API key for model authentication (default: EMPTY)
    PHONE_AGENT_MAX_STEPS: Maximum steps per task (default: 100)
    PHONE_AGENT_DEVICE_ID: ADB device ID for multi-device setups
"""

import argparse
import os
import shutil
import subprocess
import sys
from urllib.parse import urlparse

from openai import OpenAI

from phone_agent import PhoneAgent
from phone_agent.agent import AgentConfig
from phone_agent.agent_ios import IOSAgentConfig, IOSPhoneAgent
from phone_agent.config.apps import list_supported_apps
from phone_agent.config.apps_harmonyos import list_supported_apps as list_harmonyos_apps
from phone_agent.config.apps_ios import list_supported_apps as list_ios_apps
from phone_agent.device_factory import DeviceType, get_device_factory, set_device_type
from phone_agent.model import ModelConfig
from phone_agent.xctest import XCTestConnection
from phone_agent.xctest import list_devices as list_ios_devices


def check_system_requirements(
    device_type: DeviceType = DeviceType.ADB, wda_url: str = "http://localhost:8100"
) -> bool:
    """
    Check system requirements before running the agent.

    Checks:
    1. ADB/HDC/iOS tools installed
    2. At least one device connected
    3. ADB Keyboard installed on the device (for ADB only)
    4. WebDriverAgent running (for iOS only)

    Args:
        device_type: Type of device tool (ADB, HDC, or IOS).
        wda_url: WebDriverAgent URL (for iOS only).

    Returns:
        True if all checks pass, False otherwise.
    """
    print("🔍 Checking system requirements...")
    print("-" * 50)

    all_passed = True

    # Determine tool name and command
    if device_type == DeviceType.IOS:
        tool_name = "libimobiledevice"
        tool_cmd = "idevice_id"
    else:
        tool_name = "ADB" if device_type == DeviceType.ADB else "HDC"
        tool_cmd = "adb" if device_type == DeviceType.ADB else "hdc"

    # Check 1: Tool installed
    print(f"1. Checking {tool_name} installation...", end=" ")
    if shutil.which(tool_cmd) is None:
        print("❌ FAILED")
        print(f"   Error: {tool_name} is not installed or not in PATH.")
        print(f"   Solution: Install {tool_name}:")
        if device_type == DeviceType.ADB:
            print("     - macOS: brew install android-platform-tools")
            print("     - Linux: sudo apt install android-tools-adb")
            print(
                "     - Windows: Download from https://developer.android.com/studio/releases/platform-tools"
            )
        elif device_type == DeviceType.HDC:
            print(
                "     - Download from HarmonyOS SDK or https://gitee.com/openharmony/docs"
            )
            print("     - Add to PATH environment variable")
        else:  # IOS
            print("     - macOS: brew install libimobiledevice")
            print("     - Linux: sudo apt-get install libimobiledevice-utils")
        all_passed = False
    else:
        # Double check by running the tool (version command, or device listing for iOS)
        try:
            if device_type == DeviceType.ADB:
                version_cmd = [tool_cmd, "version"]
            elif device_type == DeviceType.HDC:
                version_cmd = [tool_cmd, "-v"]
            else:  # IOS
                version_cmd = [tool_cmd, "-ln"]

            result = subprocess.run(
                version_cmd, capture_output=True, text=True, timeout=10
            )
            if result.returncode == 0:
                version_line = result.stdout.strip().split("\n")[0]
                print(f"✅ OK ({version_line if version_line else 'installed'})")
            else:
                print("❌ FAILED")
                print(f"   Error: {tool_name} command failed to run.")
                all_passed = False
        except FileNotFoundError:
            print("❌ FAILED")
            print(f"   Error: {tool_name} command not found.")
            all_passed = False
        except subprocess.TimeoutExpired:
            print("❌ FAILED")
            print(f"   Error: {tool_name} command timed out.")
            all_passed = False

    # If the required tool is not installed, skip remaining checks
    if not all_passed:
        print("-" * 50)
        print("❌ System check failed. Please fix the issues above.")
        return False

    # Check 2: Device connected
    print("2. Checking connected devices...", end=" ")
    try:
        if device_type == DeviceType.ADB:
            result = subprocess.run(
                ["adb", "devices"], capture_output=True, text=True, timeout=10
            )
            lines = result.stdout.strip().split("\n")
            # Filter out header and empty lines, look for 'device' status
            devices = [
                line for line in lines[1:] if line.strip() and "\tdevice" in line
            ]
        elif device_type == DeviceType.HDC:
            result = subprocess.run(
                ["hdc", "list", "targets"], capture_output=True, text=True, timeout=10
            )
            lines = result.stdout.strip().split("\n")
            devices = [line for line in lines if line.strip()]
        else:  # IOS
            ios_devices = list_ios_devices()
            devices = [d.device_id for d in ios_devices]

        if not devices:
            print("❌ FAILED")
            print("   Error: No devices connected.")
            print("   Solution:")
            if device_type == DeviceType.ADB:
                print("     1. Enable USB debugging on your Android device")
                print("     2. Connect via USB and authorize the connection")
                print(
                    "     3. Or connect remotely: python main.py --connect <ip>:<port>"
                )
            elif device_type == DeviceType.HDC:
                print("     1. Enable USB debugging on your HarmonyOS device")
                print("     2. Connect via USB and authorize the connection")
                print(
                    "     3. Or connect remotely: python main.py --device-type hdc --connect <ip>:<port>"
                )
            else:  # IOS
                print("     1. Connect your iOS device via USB")
                print("     2. Unlock device and tap 'Trust This Computer'")
                print("     3. Verify: idevice_id -l")
                print("     4. Or connect via WiFi using device IP")
            all_passed = False
        else:
            if device_type == DeviceType.ADB:
                device_ids = [d.split("\t")[0] for d in devices]
            elif device_type == DeviceType.HDC:
                device_ids = [d.strip() for d in devices]
            else:  # IOS
                device_ids = devices
            print(
                f"✅ OK ({len(devices)} device(s): {', '.join(device_ids[:2])}{'...' if len(device_ids) > 2 else ''})"
            )
    except subprocess.TimeoutExpired:
        print("❌ FAILED")
        print(f"   Error: {tool_name} command timed out.")
        all_passed = False
    except Exception as e:
        print("❌ FAILED")
        print(f"   Error: {e}")
        all_passed = False

    # If no device is connected, skip the platform-specific check below
    if not all_passed:
        print("-" * 50)
        print("❌ System check failed. Please fix the issues above.")
        return False

    # Check 3: ADB Keyboard installed (only for ADB) or WebDriverAgent (for iOS)
    if device_type == DeviceType.ADB:
        print("3. Checking ADB Keyboard...", end=" ")
        try:
            result = subprocess.run(
                ["adb", "shell", "ime", "list", "-s"],
                capture_output=True,
                text=True,
                timeout=10,
            )
            ime_list = result.stdout.strip()

            if "com.android.adbkeyboard/.AdbIME" in ime_list:
                print("✅ OK")
            else:
                print("❌ FAILED")
                print("   Error: ADB Keyboard is not installed on the device.")
                print("   Solution:")
                print("     1. Download ADB Keyboard APK from:")
                print(
                    "        https://github.com/senzhk/ADBKeyBoard/blob/master/ADBKeyboard.apk"
                )
                print("     2. Install it on your device: adb install ADBKeyboard.apk")
                print(
                    "     3. Enable it in Settings > System > Languages & Input > Virtual Keyboard"
                )
                all_passed = False
        except subprocess.TimeoutExpired:
            print("❌ FAILED")
            print("   Error: ADB command timed out.")
            all_passed = False
        except Exception as e:
            print("❌ FAILED")
            print(f"   Error: {e}")
            all_passed = False
    elif device_type == DeviceType.HDC:
        # For HDC, skip keyboard check as it uses different input method
        print("3. Skipping keyboard check for HarmonyOS...", end=" ")
        print("✅ OK (using native input)")
    else:  # IOS
        # Check WebDriverAgent
        print(f"3. Checking WebDriverAgent ({wda_url})...", end=" ")
        try:
            conn = XCTestConnection(wda_url=wda_url)

            if conn.is_wda_ready():
                print("✅ OK")
                # Get WDA status for additional info
                status = conn.get_wda_status()
                if status:
                    session_id = status.get("sessionId", "N/A")
                    print(f"   Session ID: {session_id}")
            else:
                print("❌ FAILED")
                print("   Error: WebDriverAgent is not running or not accessible.")
                print("   Solution:")
                print("     1. Run WebDriverAgent on your iOS device via Xcode")
                print("     2. For USB: Set up port forwarding: iproxy 8100 8100")
                print(
                    "     3. For WiFi: Use device IP, e.g., --wda-url http://192.168.1.100:8100"
                )
                print("     4. Verify in browser: open http://localhost:8100/status")
                all_passed = False
        except Exception as e:
            print("❌ FAILED")
            print(f"   Error: {e}")
            all_passed = False

    print("-" * 50)

    if all_passed:
        print("✅ All system checks passed!\n")
    else:
        print("❌ System check failed. Please fix the issues above.")

    return all_passed


def check_model_api(base_url: str, model_name: str, api_key: str = "EMPTY") -> bool:
    """
    Check if the model API is accessible and the specified model exists.

    Checks:
    1. Network connectivity to the API endpoint
    2. The model returns a non-empty response to a minimal chat completion

    Args:
        base_url: The API base URL
        model_name: The model name to check
        api_key: The API key for authentication

    Returns:
        True if all checks pass, False otherwise.
    """
    print("🔍 Checking model API...")
    print("-" * 50)

    all_passed = True

    # Check 1: Network connectivity using chat API
    print(f"1. Checking API connectivity ({base_url})...", end=" ")
    try:
        # Create OpenAI client
        client = OpenAI(base_url=base_url, api_key=api_key, timeout=30.0)

        # Use chat completion to test connectivity (more universally supported than /models)
        response = client.chat.completions.create(
            model=model_name,
            messages=[{"role": "user", "content": "Hi"}],
            max_tokens=5,
            temperature=0.0,
            stream=False,
        )

        # Check if we got a valid response
        if response.choices:
            print("✅ OK")
        else:
            print("❌ FAILED")
            print("   Error: Received empty response from API")
            all_passed = False

    except Exception as e:
        print("❌ FAILED")
        error_msg = str(e)

        # Provide more specific error messages
        if "Connection refused" in error_msg or "Connection error" in error_msg:
            print(f"   Error: Cannot connect to {base_url}")
            print("   Solution:")
            print("     1. Check if the model server is running")
            print("     2. Verify the base URL is correct")
            print(f"     3. Try: curl {base_url}/chat/completions")
        elif "timed out" in error_msg.lower() or "timeout" in error_msg.lower():
            print(f"   Error: Connection to {base_url} timed out")
            print("   Solution:")
            print("     1. Check your network connection")
            print("     2. Verify the server is responding")
        elif (
            "Name or service not known" in error_msg
            or "nodename nor servname" in error_msg
        ):
            print("   Error: Cannot resolve hostname")
            print("   Solution:")
            print("     1. Check the URL is correct")
            print("     2. Verify DNS settings")
        else:
            print(f"   Error: {error_msg}")

        all_passed = False

    print("-" * 50)

    if all_passed:
        print("✅ Model API checks passed!\n")
    else:
        print("❌ Model API check failed. Please fix the issues above.")

    return all_passed


def parse_args() -> argparse.Namespace:
    """Parse command line arguments."""
    parser = argparse.ArgumentParser(
        description="Phone Agent - AI-powered phone automation",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
    # Run with default settings (Android)
    python main.py

    # Specify model endpoint
    python main.py --base-url http://localhost:8000/v1

    # Use API key for authentication
    python main.py --apikey sk-xxxxx

    # Run with specific device
    python main.py --device-id emulator-5554

    # Connect to remote device
    python main.py --connect 192.168.1.100:5555

    # List connected devices
    python main.py --list-devices

    # Enable TCP/IP on USB device and get connection info
    python main.py --enable-tcpip

    # List supported apps
    python main.py --list-apps

    # iOS specific examples
    # Run with iOS device
    python main.py --device-type ios "Open Safari and search for iPhone tips"

    # Use WiFi connection for iOS
    python main.py --device-type ios --wda-url http://192.168.1.100:8100

    # List connected iOS devices
    python main.py --device-type ios --list-devices

    # Check WebDriverAgent status
    python main.py --device-type ios --wda-status

    # Pair with iOS device
    python main.py --device-type ios --pair
        """,
    )

    # Model options
    parser.add_argument(
        "--base-url",
        type=str,
        default=os.getenv("PHONE_AGENT_BASE_URL", "http://localhost:8000/v1"),
        help="Model API base URL",
    )

    parser.add_argument(
        "--model",
        type=str,
        default=os.getenv("PHONE_AGENT_MODEL", "autoglm-phone-9b"),
        help="Model name",
    )

    parser.add_argument(
        "--apikey",
        type=str,
        default=os.getenv("PHONE_AGENT_API_KEY", "EMPTY"),
        help="API key for model authentication",
    )

    parser.add_argument(
        "--max-steps",
        type=int,
        default=int(os.getenv("PHONE_AGENT_MAX_STEPS", "100")),
        help="Maximum steps per task",
    )

    # Device options
    parser.add_argument(
        "--device-id",
        "-d",
        type=str,
        default=os.getenv("PHONE_AGENT_DEVICE_ID"),
        help="Device ID (ADB serial, HDC target, or iOS UDID)",
    )

    parser.add_argument(
        "--connect",
        "-c",
        type=str,
        metavar="ADDRESS",
        help="Connect to remote device (e.g., 192.168.1.100:5555)",
    )

    parser.add_argument(
        "--disconnect",
        type=str,
        nargs="?",
        const="all",
        metavar="ADDRESS",
        help="Disconnect from remote device (omit ADDRESS or pass 'all' to disconnect all)",
    )

    parser.add_argument(
        "--list-devices", action="store_true", help="List connected devices and exit"
    )

    parser.add_argument(
        "--enable-tcpip",
        type=int,
        nargs="?",
        const=5555,
        metavar="PORT",
        help="Enable TCP/IP debugging on USB device (default port: 5555)",
    )

    # iOS specific options
    parser.add_argument(
        "--wda-url",
        type=str,
        default=os.getenv("PHONE_AGENT_WDA_URL", "http://localhost:8100"),
        help="WebDriverAgent URL for iOS (default: http://localhost:8100)",
    )

    parser.add_argument(
        "--pair",
        action="store_true",
        help="Pair with iOS device (required for some operations)",
    )

    parser.add_argument(
        "--wda-status",
        action="store_true",
        help="Show WebDriverAgent status and exit (iOS only)",
    )

    # Other options
    parser.add_argument(
        "--quiet", "-q", action="store_true", help="Suppress verbose output"
    )

    parser.add_argument(
        "--list-apps", action="store_true", help="List supported apps and exit"
    )

    parser.add_argument(
        "--lang",
        type=str,
        choices=["cn", "en"],
        default=os.getenv("PHONE_AGENT_LANG", "cn"),
        help="Language for system prompt (cn or en, default: cn)",
    )

    parser.add_argument(
        "--device-type",
        type=str,
        choices=["adb", "hdc", "ios"],
        default=os.getenv("PHONE_AGENT_DEVICE_TYPE", "adb"),
        help="Device type: adb for Android, hdc for HarmonyOS, ios for iPhone (default: adb)",
    )

    parser.add_argument(
        "task",
        nargs="?",
        type=str,
        help="Task to execute (interactive mode if not provided)",
    )

    return parser.parse_args()
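

# Illustrative sketch (not part of the original CLI): how the `nargs="?"` plus
# `const` pattern used by --enable-tcpip and --disconnect above behaves. The
# helper name `_sketch_optional_value_flag` is hypothetical and exists only for
# this example; it relies on nothing beyond the standard-library argparse module.
import argparse
from typing import Optional


def _sketch_optional_value_flag(argv: list) -> Optional[int]:
    """Return the parsed --enable-tcpip value for a given argument list."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument(
        "--enable-tcpip", type=int, nargs="?", const=5555, metavar="PORT"
    )
    # Flag absent -> None; bare flag -> const (5555); flag with value -> that value.
    return parser.parse_args(argv).enable_tcpip


# Usage, under the same assumptions:
#   _sketch_optional_value_flag([])                          # None
#   _sketch_optional_value_flag(["--enable-tcpip"])          # 5555
#   _sketch_optional_value_flag(["--enable-tcpip", "5556"])  # 5556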


def handle_ios_device_commands(args) -> bool:
    """
    Handle iOS device-related commands.

    Returns:
        True if a device command was handled (should exit), False otherwise.
    """
    conn = XCTestConnection(wda_url=args.wda_url)

    # Handle --list-devices
    if args.list_devices:
        devices = list_ios_devices()
        if not devices:
            print("No iOS devices connected.")
            print("\nTroubleshooting:")
            print("  1. Connect device via USB")
            print("  2. Unlock device and trust this computer")
            print("  3. Run: idevice_id -l")
        else:
            print("Connected iOS devices:")
            print("-" * 70)
            for device in devices:
                conn_type = device.connection_type.value
                model_info = device.model or "Unknown"
                ios_info = (
                    f"iOS {device.ios_version}" if device.ios_version else "Unknown"
                )
                name_info = device.device_name or "Unnamed"

                print(f"  ✓ {name_info}")
                print(f"    UDID: {device.device_id}")
                print(f"    Model: {model_info}")
                print(f"    OS: {ios_info}")
                print(f"    Connection: {conn_type}")
                print("-" * 70)
        return True

    # Handle --pair
    if args.pair:
        print("Pairing with iOS device...")
        success, message = conn.pair_device(args.device_id)
        print(f"{'✓' if success else '✗'} {message}")
        return True

    # Handle --wda-status
    if args.wda_status:
        print(f"Checking WebDriverAgent status at {args.wda_url}...")
        print("-" * 50)

        if conn.is_wda_ready():
            print("✓ WebDriverAgent is running")

            status = conn.get_wda_status()
            if status:
                print("\nStatus details:")
                value = status.get("value", {})
                print(f"  Session ID: {status.get('sessionId', 'N/A')}")
                print(f"  Build time: {value.get('build', {}).get('time', 'N/A')}")

                current_app = value.get("currentApp", {})
                if current_app:
                    print("\nCurrent App:")
                    print(f"  Bundle ID: {current_app.get('bundleId', 'N/A')}")
                    print(f"  Process ID: {current_app.get('pid', 'N/A')}")
        else:
            print("✗ WebDriverAgent is not running")
            print("\nPlease start WebDriverAgent on your iOS device:")
            print("  1. Open WebDriverAgent.xcodeproj in Xcode")
            print("  2. Select your device")
            print("  3. Run WebDriverAgentRunner (Product > Test or Cmd+U)")
            print("  4. For USB: Run port forwarding: iproxy 8100 8100")

        return True

    return False


def handle_device_commands(args) -> bool:
    """
    Handle device-related commands.

    Returns:
        True if a device command was handled (should exit), False otherwise.
    """
    device_type = (
        DeviceType.ADB
        if args.device_type == "adb"
        else (DeviceType.HDC if args.device_type == "hdc" else DeviceType.IOS)
    )

    # Handle iOS-specific commands
    if device_type == DeviceType.IOS:
        return handle_ios_device_commands(args)

    device_factory = get_device_factory()
    ConnectionClass = device_factory.get_connection_class()
    conn = ConnectionClass()

    # Handle --list-devices
    if args.list_devices:
        devices = device_factory.list_devices()
        if not devices:
            print("No devices connected.")
        else:
            print("Connected devices:")
            print("-" * 60)
            for device in devices:
                status_icon = "✓" if device.status == "device" else "✗"
                conn_type = device.connection_type.value
                model_info = f" ({device.model})" if device.model else ""
                print(
                    f"  {status_icon} {device.device_id:<30} [{conn_type}]{model_info}"
                )
        return True

    # Handle --connect
    if args.connect:
        print(f"Connecting to {args.connect}...")
        success, message = conn.connect(args.connect)
        print(f"{'✓' if success else '✗'} {message}")
        if success:
            # Set as default device
            args.device_id = args.connect
        return not success  # Continue if connection succeeded

    # Handle --disconnect
    if args.disconnect:
        if args.disconnect == "all":
            print("Disconnecting all remote devices...")
            success, message = conn.disconnect()
        else:
            print(f"Disconnecting from {args.disconnect}...")
            success, message = conn.disconnect(args.disconnect)
        print(f"{'✓' if success else '✗'} {message}")
        return True

    # Handle --enable-tcpip
    if args.enable_tcpip is not None:
        port = args.enable_tcpip
        print(f"Enabling TCP/IP debugging on port {port}...")

        success, message = conn.enable_tcpip(port, args.device_id)
        print(f"{'✓' if success else '✗'} {message}")

        if success:
            # Try to get device IP
            ip = conn.get_device_ip(args.device_id)
            if ip:
                print("\nYou can now connect remotely using:")
                print(f"  python main.py --connect {ip}:{port}")
                print("\nOr via ADB directly:")
                print(f"  adb connect {ip}:{port}")
            else:
                print("\nCould not determine device IP. Check device WiFi settings.")
        return True

    return False


def main():
    """Main entry point."""
    args = parse_args()

    # Resolve the device type from the command-line arguments
    if args.device_type == "adb":
        device_type = DeviceType.ADB
    elif args.device_type == "hdc":
        device_type = DeviceType.HDC
    else:  # ios
        device_type = DeviceType.IOS

    # Set device type globally for non-iOS devices
    if device_type != DeviceType.IOS:
        set_device_type(device_type)

    # Enable HDC verbose mode if using HDC
    if device_type == DeviceType.HDC:
        from phone_agent.hdc import set_hdc_verbose

        set_hdc_verbose(True)

    # Handle --list-apps (no system check needed)
    if args.list_apps:
        if device_type == DeviceType.HDC:
            print("Supported HarmonyOS apps:")
            apps = list_harmonyos_apps()
        elif device_type == DeviceType.IOS:
            print("Supported iOS apps:")
            print("\nNote: For iOS apps, Bundle IDs are configured in:")
            print("  phone_agent/config/apps_ios.py")
            print("\nCurrently configured apps:")
            apps = list_ios_apps()
        else:
            print("Supported Android apps:")
            apps = list_supported_apps()

        for app in sorted(apps):
            print(f"  - {app}")

        if device_type == DeviceType.IOS:
            print(
                "\nTo add iOS apps, find the Bundle ID and add it to the APP_PACKAGES_IOS dictionary."
            )
        return

    # Handle device commands (these may need partial system checks)
    if handle_device_commands(args):
        return

    # Run system requirements check before proceeding
    if not check_system_requirements(
        device_type,
        wda_url=args.wda_url
        if device_type == DeviceType.IOS
        else "http://localhost:8100",
    ):
        sys.exit(1)

    # Check model API connectivity and model availability
    if not check_model_api(args.base_url, args.model, args.apikey):
        sys.exit(1)

    # Create configurations and agent based on device type
    model_config = ModelConfig(
        base_url=args.base_url,
        model_name=args.model,
        api_key=args.apikey,
        lang=args.lang,
    )

    if device_type == DeviceType.IOS:
        # Create iOS agent
        agent_config = IOSAgentConfig(
            max_steps=args.max_steps,
            wda_url=args.wda_url,
            device_id=args.device_id,
            verbose=not args.quiet,
            lang=args.lang,
        )

        agent = IOSPhoneAgent(
            model_config=model_config,
            agent_config=agent_config,
        )
    else:
        # Create Android/HarmonyOS agent
        agent_config = AgentConfig(
            max_steps=args.max_steps,
            device_id=args.device_id,
            verbose=not args.quiet,
            lang=args.lang,
        )

        agent = PhoneAgent(
            model_config=model_config,
            agent_config=agent_config,
        )

    # Print header
    print("=" * 50)
    if device_type == DeviceType.IOS:
        print("Phone Agent iOS - AI-powered iOS automation")
    else:
        print("Phone Agent - AI-powered phone automation")
    print("=" * 50)
    print(f"Model: {model_config.model_name}")
    print(f"Base URL: {model_config.base_url}")
    print(f"Max Steps: {agent_config.max_steps}")
    print(f"Language: {agent_config.lang}")
    print(f"Device Type: {args.device_type.upper()}")

    # Show iOS-specific config
    if device_type == DeviceType.IOS:
        print(f"WDA URL: {args.wda_url}")

    # Show device info
    if device_type == DeviceType.IOS:
        devices = list_ios_devices()
        if agent_config.device_id:
            print(f"Device: {agent_config.device_id}")
        elif devices:
            device = devices[0]
            print(f"Device: {device.device_name or device.device_id[:16]}")
            if device.model and device.ios_version:
                print(f"        {device.model}, iOS {device.ios_version}")
    else:
        device_factory = get_device_factory()
        devices = device_factory.list_devices()
        if agent_config.device_id:
            print(f"Device: {agent_config.device_id}")
        elif devices:
            print(f"Device: {devices[0].device_id} (auto-detected)")

    print("=" * 50)

    # Run with provided task or enter interactive mode
    if args.task:
        print(f"\nTask: {args.task}\n")
        result = agent.run(args.task)
        print(f"\nResult: {result}")
    else:
        # Interactive mode
        print("\nEntering interactive mode. Type 'quit' to exit.\n")

        while True:
            try:
                task = input("Enter your task: ").strip()

                if task.lower() in ("quit", "exit", "q"):
                    print("Goodbye!")
                    break

                if not task:
                    continue

                print()
                result = agent.run(task)
                print(f"\nResult: {result}\n")
                agent.reset()

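            except (KeyboardInterrupt, EOFError):
                # EOFError (Ctrl-D or closed stdin) must also end the loop;
                # otherwise the generic handler below would retry input() forever.
                print("\n\nInterrupted. Goodbye!")
                break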
            except Exception as e:
                print(f"\nError: {e}\n")


if __name__ == "__main__":
    main()


================================================
FILE: phone_agent/__init__.py
================================================
"""
Phone Agent - An AI-powered phone automation framework.

This package provides tools for automating Android, HarmonyOS, and iOS phone
interactions using AI models for visual understanding and decision making.
"""

from phone_agent.agent import PhoneAgent
from phone_agent.agent_ios import IOSPhoneAgent

__version__ = "0.1.0"
__all__ = ["PhoneAgent", "IOSPhoneAgent"]


================================================
FILE: phone_agent/actions/__init__.py
================================================
"""Action handling module for Phone Agent."""

from phone_agent.actions.handler import ActionHandler, ActionResult

__all__ = ["ActionHandler", "ActionResult"]


================================================
FILE: phone_agent/actions/handler.py
================================================
"""Action handler for processing AI model outputs."""

import ast
import re
import subprocess
import time
from dataclasses import dataclass
from typing import Any, Callable

from phone_agent.config.timing import TIMING_CONFIG
from phone_agent.device_factory import get_device_factory


@dataclass
class ActionResult:
    """Result of an action execution."""

    success: bool
    should_finish: bool
    message: str | None = None
    requires_confirmation: bool = False


class ActionHandler:
    """
    Handles execution of actions from AI model output.

    Args:
        device_id: Optional device ID (ADB or HDC) for multi-device setups.
        confirmation_callback: Optional callback for sensitive action confirmation.
            Should return True to proceed, False to cancel.
        takeover_callback: Optional callback for takeover requests (login, captcha).
    """

    def __init__(
        self,
        device_id: str | None = None,
        confirmation_callback: Callable[[str], bool] | None = None,
        takeover_callback: Callable[[str], None] | None = None,
    ):
        self.device_id = device_id
        self.confirmation_callback = confirmation_callback or self._default_confirmation
        self.takeover_callback = takeover_callback or self._default_takeover

    def execute(
        self, action: dict[str, Any], screen_width: int, screen_height: int
    ) -> ActionResult:
        """
        Execute an action from the AI model.

        Args:
            action: The action dictionary from the model.
            screen_width: Current screen width in pixels.
            screen_height: Current screen height in pixels.

        Returns:
            ActionResult indicating success and whether to finish.
        """
        action_type = action.get("_metadata")

        if action_type == "finish":
            return ActionResult(
                success=True, should_finish=True, message=action.get("message")
            )

        if action_type != "do":
            return ActionResult(
                success=False,
                should_finish=True,
                message=f"Unknown action type: {action_type}",
            )

        action_name = action.get("action")
        handler_method = self._get_handler(action_name)

        if handler_method is None:
            return ActionResult(
                success=False,
                should_finish=False,
                message=f"Unknown action: {action_name}",
            )

        try:
            return handler_method(action, screen_width, screen_height)
        except Exception as e:
            return ActionResult(
                success=False, should_finish=False, message=f"Action failed: {e}"
            )

    def _get_handler(self, action_name: str) -> Callable | None:
        """Get the handler method for an action."""
        handlers = {
            "Launch": self._handle_launch,
            "Tap": self._handle_tap,
            "Type": self._handle_type,
            "Type_Name": self._handle_type,
            "Swipe": self._handle_swipe,
            "Back": self._handle_back,
            "Home": self._handle_home,
            "Double Tap": self._handle_double_tap,
            "Long Press": self._handle_long_press,
            "Wait": self._handle_wait,
            "Take_over": self._handle_takeover,
            "Note": self._handle_note,
            "Call_API": self._handle_call_api,
            "Interact": self._handle_interact,
        }
        return handlers.get(action_name)

    def _convert_relative_to_absolute(
        self, element: list[int], screen_width: int, screen_height: int
    ) -> tuple[int, int]:
        """Convert relative coordinates (0-1000) to absolute pixels."""
        x = int(element[0] / 1000 * screen_width)
        y = int(element[1] / 1000 * screen_height)
        return x, y

    def _handle_launch(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle app launch action."""
        app_name = action.get("app")
        if not app_name:
            return ActionResult(False, False, "No app name specified")

        device_factory = get_device_factory()
        success = device_factory.launch_app(app_name, self.device_id)
        if success:
            return ActionResult(True, False)
        return ActionResult(False, False, f"App not found: {app_name}")

    def _handle_tap(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle tap action."""
        element = action.get("element")
        if not element:
            return ActionResult(False, False, "No element coordinates")

        x, y = self._convert_relative_to_absolute(element, width, height)

        # Check for sensitive operation
        if "message" in action:
            if not self.confirmation_callback(action["message"]):
                return ActionResult(
                    success=False,
                    should_finish=True,
                    message="User cancelled sensitive operation",
                )

        device_factory = get_device_factory()
        device_factory.tap(x, y, self.device_id)
        return ActionResult(True, False)

    def _handle_type(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle text input action."""
        text = action.get("text", "")

        device_factory = get_device_factory()

        # Switch to ADB keyboard
        original_ime = device_factory.detect_and_set_adb_keyboard(self.device_id)
        time.sleep(TIMING_CONFIG.action.keyboard_switch_delay)

        try:
            # Clear existing text, then type the new text
            device_factory.clear_text(self.device_id)
            time.sleep(TIMING_CONFIG.action.text_clear_delay)

            device_factory.type_text(text, self.device_id)
            time.sleep(TIMING_CONFIG.action.text_input_delay)
        finally:
            # Restore the original keyboard even if clearing or typing fails
            device_factory.restore_keyboard(original_ime, self.device_id)
            time.sleep(TIMING_CONFIG.action.keyboard_restore_delay)

        return ActionResult(True, False)

    def _handle_swipe(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle swipe action."""
        start = action.get("start")
        end = action.get("end")

        if not start or not end:
            return ActionResult(False, False, "Missing swipe coordinates")

        start_x, start_y = self._convert_relative_to_absolute(start, width, height)
        end_x, end_y = self._convert_relative_to_absolute(end, width, height)

        device_factory = get_device_factory()
        device_factory.swipe(start_x, start_y, end_x, end_y, device_id=self.device_id)
        return ActionResult(True, False)

    def _handle_back(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle back button action."""
        device_factory = get_device_factory()
        device_factory.back(self.device_id)
        return ActionResult(True, False)

    def _handle_home(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle home button action."""
        device_factory = get_device_factory()
        device_factory.home(self.device_id)
        return ActionResult(True, False)

    def _handle_double_tap(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle double tap action."""
        element = action.get("element")
        if not element:
            return ActionResult(False, False, "No element coordinates")

        x, y = self._convert_relative_to_absolute(element, width, height)
        device_factory = get_device_factory()
        device_factory.double_tap(x, y, self.device_id)
        return ActionResult(True, False)

    def _handle_long_press(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle long press action."""
        element = action.get("element")
        if not element:
            return ActionResult(False, False, "No element coordinates")

        x, y = self._convert_relative_to_absolute(element, width, height)
        device_factory = get_device_factory()
        device_factory.long_press(x, y, device_id=self.device_id)
        return ActionResult(True, False)

    def _handle_wait(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle wait action."""
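        # Accept "2 seconds", "1 second", "0.5", or a bare number; the value is
        # converted to str first so a numeric duration does not raise.
        duration_str = str(action.get("duration", "1 seconds"))
        match = re.search(r"\d+(?:\.\d+)?", duration_str)
        duration = float(match.group()) if match else 1.0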

        time.sleep(duration)
        return ActionResult(True, False)

    def _handle_takeover(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle takeover request (login, captcha, etc.)."""
        message = action.get("message", "User intervention required")
        self.takeover_callback(message)
        return ActionResult(True, False)

    def _handle_note(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle note action (placeholder for content recording)."""
        # This action is typically used for recording page content
        # Implementation depends on specific requirements
        return ActionResult(True, False)

    def _handle_call_api(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle API call action (placeholder for summarization)."""
        # This action is typically used for content summarization
        # Implementation depends on specific requirements
        return ActionResult(True, False)

    def _handle_interact(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle interaction request (user choice needed)."""
        # This action signals that user input is needed
        return ActionResult(True, False, message="User interaction required")

    def _send_keyevent(self, keycode: str) -> None:
        """Send a keyevent to the device."""
        from phone_agent.device_factory import DeviceType

        device_factory = get_device_factory()
        keycode = str(keycode)

        if device_factory.device_type != DeviceType.HDC:
            # ADB devices use the standard `input keyevent` command
            cmd_prefix = ["adb", "-s", self.device_id] if self.device_id else ["adb"]
            subprocess.run(
                cmd_prefix + ["shell", "input", "keyevent", keycode],
                capture_output=True,
                text=True,
            )
            return

        from phone_agent.hdc.connection import _run_hdc_command

        hdc_prefix = ["hdc", "-t", self.device_id] if self.device_id else ["hdc"]
        fallback_cmd = hdc_prefix + ["shell", "input", "keyevent", keycode]

        # HarmonyOS uses `uitest uiInput keyEvent` with its own key codes.
        # Only Enter is mapped so far: KEYCODE_ENTER (66) -> 2054.
        is_enter = keycode == "66" or (
            keycode.startswith("KEYCODE_") and "ENTER" in keycode
        )
        if is_enter:
            hdc_code = "2054"
        elif keycode.startswith("KEYCODE_"):
            # Unmapped symbolic keycode: fall back to the ADB-style command
            subprocess.run(fallback_cmd, capture_output=True, text=True)
            return
        else:
            # Anything else is assumed to be a numeric HarmonyOS key code
            hdc_code = keycode

        try:
            _run_hdc_command(
                hdc_prefix + ["shell", "uitest", "uiInput", "keyEvent", hdc_code],
                capture_output=True,
                text=True,
            )
        except Exception:
            # Fall back to the ADB-style command if the uitest call fails
            subprocess.run(fallback_cmd, capture_output=True, text=True)

    @staticmethod
    def _default_confirmation(message: str) -> bool:
        """Default confirmation callback using console input."""
        response = input(f"Sensitive operation: {message}\nConfirm? (Y/N): ")
        # Ignore surrounding whitespace so " y " still counts as confirmation
        return response.strip().upper() == "Y"

    @staticmethod
    def _default_takeover(message: str) -> None:
        """Default takeover callback using console input."""
        input(f"{message}\nPress Enter after completing manual operation...")
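

# Illustrative sketch (not part of the original handler): a standalone version
# of the 0-1000 relative-coordinate mapping performed by
# ActionHandler._convert_relative_to_absolute. The function name and the
# 1080x2400 screen size used below are assumptions chosen for the example.
def _sketch_relative_to_absolute(element, screen_width, screen_height):
    """Map model coordinates on a 0-1000 grid to absolute pixels."""
    x = int(element[0] / 1000 * screen_width)
    y = int(element[1] / 1000 * screen_height)
    return x, y


# Usage: the point [500, 250] on a 1080x2400 screen maps to pixel (540, 600).
#   _sketch_relative_to_absolute([500, 250], 1080, 2400)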


def parse_action(response: str) -> dict[str, Any]:
    """
    Parse action from model response.

    Args:
        response: Raw response string from the model.

    Returns:
        Parsed action dictionary.

    Raises:
        ValueError: If the response cannot be parsed.
    """
    print(f"Parsing action: {response}")
    try:
        response = response.strip()
        if response.startswith('do(action="Type"') or response.startswith(
            'do(action="Type_Name"'
        ):
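            # Slice the text out directly instead of parsing it: take everything
            # after `text=`, then drop the opening quote and the trailing `")`.
            text = response.split("text=", 1)[1][1:-2]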
            action = {"_metadata": "do", "action": "Type", "text": text}
            return action
        elif response.startswith("do"):
            # Use AST parsing instead of eval for safety
            try:
                # Escape special characters (newlines, tabs, etc.) for valid Python syntax
                response = response.replace('\n', '\\n')
                response = response.replace('\r', '\\r')
                response = response.replace('\t', '\\t')

                tree = ast.parse(response, mode="eval")
                if not isinstance(tree.body, ast.Call):
                    raise ValueError("Expected a function call")

                call = tree.body
                # Extract keyword arguments safely
                action = {"_metadata": "do"}
                for keyword in call.keywords:
                    key = keyword.arg
                    value = ast.literal_eval(keyword.value)
                    action[key] = value

                return action
            except (SyntaxError, ValueError) as e:
                raise ValueError(f"Failed to parse do() action: {e}")

        elif response.startswith("finish"):
            action = {
                "_metadata": "finish",
                "message": response.replace("finish(message=", "")[1:-2],
            }
        else:
            raise ValueError(f"Failed to parse action: {response}")
        return action
    except ValueError:
        # Already a descriptive parse error; do not wrap it a second time
        raise
    except Exception as e:
        raise ValueError(f"Failed to parse action: {e}") from e


def do(**kwargs) -> dict[str, Any]:
    """Helper function for creating 'do' actions."""
    kwargs["_metadata"] = "do"
    return kwargs


def finish(**kwargs) -> dict[str, Any]:
    """Helper function for creating 'finish' actions."""
    kwargs["_metadata"] = "finish"
    return kwargs
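

# Illustrative sketch (not part of the original module): a minimal, standalone
# version of the AST-based approach parse_action takes for `do(...)` responses.
# The name `_sketch_parse_do_call` is hypothetical. Keyword arguments are read
# with ast.literal_eval, so only Python literals are accepted and nothing in
# the model output is ever executed.
import ast


def _sketch_parse_do_call(response: str) -> dict:
    """Parse a `do(key=literal, ...)` string into an action dictionary."""
    tree = ast.parse(response.strip(), mode="eval")
    call = tree.body
    if not isinstance(call, ast.Call):
        raise ValueError("Expected a function call")

    action = {"_metadata": "do"}
    for keyword in call.keywords:
        # literal_eval raises ValueError for anything that is not a literal
        action[keyword.arg] = ast.literal_eval(keyword.value)
    return action


# Usage, under the same assumptions:
#   _sketch_parse_do_call('do(action="Tap", element=[500, 250])')
#   returns {"_metadata": "do", "action": "Tap", "element": [500, 250]}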


================================================
FILE: phone_agent/actions/handler_ios.py
================================================
"""Action handler for iOS automation using WebDriverAgent."""

import time
from dataclasses import dataclass
from typing import Any, Callable

from phone_agent.xctest import (
    back,
    double_tap,
    home,
    launch_app,
    long_press,
    swipe,
    tap,
)
from phone_agent.xctest.input import clear_text, hide_keyboard, type_text


@dataclass
class ActionResult:
    """Result of an action execution."""

    success: bool
    should_finish: bool
    message: str | None = None
    requires_confirmation: bool = False


class IOSActionHandler:
    """
    Handles execution of actions from AI model output for iOS devices.

    Args:
        wda_url: WebDriverAgent URL.
        session_id: Optional WDA session ID.
        confirmation_callback: Optional callback for sensitive action confirmation.
            Should return True to proceed, False to cancel.
        takeover_callback: Optional callback for takeover requests (login, captcha).
    """

    def __init__(
        self,
        wda_url: str = "http://localhost:8100",
        session_id: str | None = None,
        confirmation_callback: Callable[[str], bool] | None = None,
        takeover_callback: Callable[[str], None] | None = None,
    ):
        self.wda_url = wda_url
        self.session_id = session_id
        self.confirmation_callback = confirmation_callback or self._default_confirmation
        self.takeover_callback = takeover_callback or self._default_takeover

    def execute(
        self, action: dict[str, Any], screen_width: int, screen_height: int
    ) -> ActionResult:
        """
        Execute an action from the AI model.

        Args:
            action: The action dictionary from the model.
            screen_width: Current screen width in pixels.
            screen_height: Current screen height in pixels.

        Returns:
            ActionResult indicating success and whether to finish.
        """
        action_type = action.get("_metadata")

        if action_type == "finish":
            return ActionResult(
                success=True, should_finish=True, message=action.get("message")
            )

        if action_type != "do":
            return ActionResult(
                success=False,
                should_finish=True,
                message=f"Unknown action type: {action_type}",
            )

        action_name = action.get("action")
        handler_method = self._get_handler(action_name)

        if handler_method is None:
            return ActionResult(
                success=False,
                should_finish=False,
                message=f"Unknown action: {action_name}",
            )

        try:
            return handler_method(action, screen_width, screen_height)
        except Exception as e:
            return ActionResult(
                success=False, should_finish=False, message=f"Action failed: {e}"
            )

    def _get_handler(self, action_name: str | None) -> Callable | None:
        """Get the handler method for an action."""
        handlers = {
            "Launch": self._handle_launch,
            "Tap": self._handle_tap,
            "Type": self._handle_type,
            "Type_Name": self._handle_type,
            "Swipe": self._handle_swipe,
            "Back": self._handle_back,
            "Home": self._handle_home,
            "Double Tap": self._handle_double_tap,
            "Long Press": self._handle_long_press,
            "Wait": self._handle_wait,
            "Take_over": self._handle_takeover,
            "Note": self._handle_note,
            "Call_API": self._handle_call_api,
            "Interact": self._handle_interact,
        }
        return handlers.get(action_name)

    def _convert_relative_to_absolute(
        self, element: list[int], screen_width: int, screen_height: int
    ) -> tuple[int, int]:
        """Convert relative coordinates (0-1000) to absolute pixels."""
        x = int(element[0] / 1000 * screen_width)
        y = int(element[1] / 1000 * screen_height)
        return x, y

    def _handle_launch(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle app launch action."""
        app_name = action.get("app")
        if not app_name:
            return ActionResult(False, False, "No app name specified")

        success = launch_app(
            app_name, wda_url=self.wda_url, session_id=self.session_id
        )
        if success:
            return ActionResult(True, False)
        return ActionResult(False, False, f"App not found: {app_name}")

    def _handle_tap(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle tap action."""
        element = action.get("element")
        if not element:
            return ActionResult(False, False, "No element coordinates")

        x, y = self._convert_relative_to_absolute(element, width, height)

        print(f"Physically tap on ({x}, {y})")

        # Check for sensitive operation
        if "message" in action:
            if not self.confirmation_callback(action["message"]):
                return ActionResult(
                    success=False,
                    should_finish=True,
                    message="User cancelled sensitive operation",
                )

        tap(x, y, wda_url=self.wda_url, session_id=self.session_id)
        return ActionResult(True, False)

    def _handle_type(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle text input action."""
        text = action.get("text", "")

        # Clear existing text and type new text
        clear_text(wda_url=self.wda_url, session_id=self.session_id)
        time.sleep(0.5)

        type_text(text, wda_url=self.wda_url, session_id=self.session_id)
        time.sleep(0.5)

        # Hide keyboard after typing
        hide_keyboard(wda_url=self.wda_url, session_id=self.session_id)
        time.sleep(0.5)

        return ActionResult(True, False)

    def _handle_swipe(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle swipe action."""
        start = action.get("start")
        end = action.get("end")

        if not start or not end:
            return ActionResult(False, False, "Missing swipe coordinates")

        start_x, start_y = self._convert_relative_to_absolute(start, width, height)
        end_x, end_y = self._convert_relative_to_absolute(end, width, height)

        print(f"Physically scroll from ({start_x}, {start_y}) to ({end_x}, {end_y})")

        swipe(
            start_x,
            start_y,
            end_x,
            end_y,
            wda_url=self.wda_url,
            session_id=self.session_id,
        )
        return ActionResult(True, False)

    def _handle_back(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle back gesture (swipe from left edge)."""
        back(wda_url=self.wda_url, session_id=self.session_id)
        return ActionResult(True, False)

    def _handle_home(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle home button action."""
        home(wda_url=self.wda_url, session_id=self.session_id)
        return ActionResult(True, False)

    def _handle_double_tap(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle double tap action."""
        element = action.get("element")
        if not element:
            return ActionResult(False, False, "No element coordinates")

        x, y = self._convert_relative_to_absolute(element, width, height)
        double_tap(x, y, wda_url=self.wda_url, session_id=self.session_id)
        return ActionResult(True, False)

    def _handle_long_press(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle long press action."""
        element = action.get("element")
        if not element:
            return ActionResult(False, False, "No element coordinates")

        x, y = self._convert_relative_to_absolute(element, width, height)
        long_press(
            x,
            y,
            duration=3.0,
            wda_url=self.wda_url,
            session_id=self.session_id,
        )
        return ActionResult(True, False)

    def _handle_wait(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle wait action."""
        duration_str = action.get("duration", "1 seconds")
        try:
            # Accept numeric values as well as strings such as "2 seconds"
            duration = float(str(duration_str).replace("seconds", "").strip())
        except ValueError:
            duration = 1.0

        time.sleep(duration)
        return ActionResult(True, False)

    def _handle_takeover(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle takeover request (login, captcha, etc.)."""
        message = action.get("message", "User intervention required")
        self.takeover_callback(message)
        return ActionResult(True, False)

    def _handle_note(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle note action (placeholder for content recording)."""
        # This action is typically used for recording page content
        # Implementation depends on specific requirements
        return ActionResult(True, False)

    def _handle_call_api(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle API call action (placeholder for summarization)."""
        # This action is typically used for content summarization
        # Implementation depends on specific requirements
        return ActionResult(True, False)

    def _handle_interact(self, action: dict, width: int, height: int) -> ActionResult:
        """Handle interaction request (user choice needed)."""
        # This action signals that user input is needed
        return ActionResult(True, False, message="User interaction required")

    @staticmethod
    def _default_confirmation(message: str) -> bool:
        """Default confirmation callback using console input."""
        response = input(f"Sensitive operation: {message}\nConfirm? (Y/N): ")
        return response.strip().upper() == "Y"

    @staticmethod
    def _default_takeover(message: str) -> None:
        """Default takeover callback using console input."""
        input(f"{message}\nPress Enter after completing manual operation...")


================================================
FILE: phone_agent/adb/__init__.py
================================================
"""ADB utilities for Android device interaction."""

from phone_agent.adb.connection import (
    ADBConnection,
    ConnectionType,
    DeviceInfo,
    list_devices,
    quick_connect,
)
from phone_agent.adb.device import (
    back,
    double_tap,
    get_current_app,
    home,
    launch_app,
    long_press,
    swipe,
    tap,
)
from phone_agent.adb.input import (
    clear_text,
    detect_and_set_adb_keyboard,
    restore_keyboard,
    type_text,
)
from phone_agent.adb.screenshot import get_screenshot

__all__ = [
    # Screenshot
    "get_screenshot",
    # Input
    "type_text",
    "clear_text",
    "detect_and_set_adb_keyboard",
    "restore_keyboard",
    # Device control
    "get_current_app",
    "tap",
    "swipe",
    "back",
    "home",
    "double_tap",
    "long_press",
    "launch_app",
    # Connection management
    "ADBConnection",
    "DeviceInfo",
    "ConnectionType",
    "quick_connect",
    "list_devices",
]


================================================
FILE: phone_agent/adb/connection.py
================================================
"""ADB connection management for local and remote devices."""

import subprocess
import time
from dataclasses import dataclass
from enum import Enum

from phone_agent.config.timing import TIMING_CONFIG


class ConnectionType(Enum):
    """Type of ADB connection."""

    USB = "usb"
    WIFI = "wifi"
    REMOTE = "remote"


@dataclass
class DeviceInfo:
    """Information about a connected device."""

    device_id: str
    status: str
    connection_type: ConnectionType
    model: str | None = None
    android_version: str | None = None


class ADBConnection:
    """
    Manages ADB connections to Android devices.

    Supports USB, WiFi, and remote TCP/IP connections.

    Example:
        >>> conn = ADBConnection()
        >>> # Connect to remote device
        >>> conn.connect("192.168.1.100:5555")
        >>> # List devices
        >>> devices = conn.list_devices()
        >>> # Disconnect
        >>> conn.disconnect("192.168.1.100:5555")
    """

    def __init__(self, adb_path: str = "adb"):
        """
        Initialize ADB connection manager.

        Args:
            adb_path: Path to ADB executable.
        """
        self.adb_path = adb_path

    def connect(self, address: str, timeout: int = 10) -> tuple[bool, str]:
        """
        Connect to a remote device via TCP/IP.

        Args:
            address: Device address in format "host:port" (e.g., "192.168.1.100:5555").
            timeout: Connection timeout in seconds.

        Returns:
            Tuple of (success, message).

        Note:
            The remote device must have TCP/IP debugging enabled.
            With the device attached over USB, run on the host: adb tcpip 5555
        """
        # Validate address format
        if ":" not in address:
            address = f"{address}:5555"  # Default ADB port

        try:
            result = subprocess.run(
                [self.adb_path, "connect", address],
                capture_output=True,
                text=True,
                timeout=timeout,
            )

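            output = result.stdout + result.stderr

            # Check the more specific message first: "already connected"
            # also contains the substring "connected".
            if "already connected" in output.lower():
                return True, f"Already connected to {address}"
            elif "connected" in output.lower():
                return True, f"Connected to {address}"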
            else:
                return False, output.strip()

        except subprocess.TimeoutExpired:
            return False, f"Connection timeout after {timeout}s"
        except Exception as e:
            return False, f"Connection error: {e}"

    def disconnect(self, address: str | None = None) -> tuple[bool, str]:
        """
        Disconnect from a remote device.

        Args:
            address: Device address to disconnect. If None, disconnects all.

        Returns:
            Tuple of (success, message).
        """
        try:
            cmd = [self.adb_path, "disconnect"]
            if address:
                cmd.append(address)

            result = subprocess.run(cmd, capture_output=True, text=True, encoding="utf-8", timeout=5)

            output = result.stdout + result.stderr
            return True, output.strip() or "Disconnected"

        except Exception as e:
            return False, f"Disconnect error: {e}"

    def list_devices(self) -> list[DeviceInfo]:
        """
        List all connected devices.

        Returns:
            List of DeviceInfo objects.
        """
        try:
            result = subprocess.run(
                [self.adb_path, "devices", "-l"],
                capture_output=True,
                text=True,
                timeout=5,
            )

            devices = []
            for line in result.stdout.strip().split("\n")[1:]:  # Skip header
                if not line.strip():
                    continue

                parts = line.split()
                if len(parts) >= 2:
                    device_id = parts[0]
                    status = parts[1]

                    # Determine connection type
                    if ":" in device_id:
                        conn_type = ConnectionType.REMOTE
                    elif "emulator" in device_id:
                        conn_type = ConnectionType.USB  # Emulator via USB
                    else:
                        conn_type = ConnectionType.USB

                    # Parse additional info
                    model = None
                    for part in parts[2:]:
                        if part.startswith("model:"):
                            model = part.split(":", 1)[1]
                            break

                    devices.append(
                        DeviceInfo(
                            device_id=device_id,
                            status=status,
                            connection_type=conn_type,
                            model=model,
                        )
                    )

            return devices

        except Exception as e:
            print(f"Error listing devices: {e}")
            return []

    def get_device_info(self, device_id: str | None = None) -> DeviceInfo | None:
        """
        Get detailed information about a device.

        Args:
            device_id: Device ID. If None, uses first available device.

        Returns:
            DeviceInfo or None if not found.
        """
        devices = self.list_devices()

        if not devices:
            return None

        if device_id is None:
            return devices[0]

        for device in devices:
            if device.device_id == device_id:
                return device

        return None

    def is_connected(self, device_id: str | None = None) -> bool:
        """
        Check if a device is connected.

        Args:
            device_id: Device ID to check. If None, checks if any device is connected.

        Returns:
            True if connected, False otherwise.
        """
        devices = self.list_devices()

        if not devices:
            return False

        if device_id is None:
            return any(d.status == "device" for d in devices)

        return any(d.device_id == device_id and d.status == "device" for d in devices)

    def enable_tcpip(
        self, port: int = 5555, device_id: str | None = None
    ) -> tuple[bool, str]:
        """
        Enable TCP/IP debugging on a USB-connected device.

        This allows subsequent wireless connections to the device.

        Args:
            port: TCP port for ADB (default: 5555).
            device_id: Device ID. If None, uses first available device.

        Returns:
            Tuple of (success, message).

        Note:
            The device must be connected via USB first.
            After this, you can disconnect USB and connect via WiFi.
        """
        try:
            cmd = [self.adb_path]
            if device_id:
                cmd.extend(["-s", device_id])
            cmd.extend(["tcpip", str(port)])

            result = subprocess.run(cmd, capture_output=True, text=True, encoding="utf-8", timeout=10)

            output = result.stdout + result.stderr

            if "restarting" in output.lower() or result.returncode == 0:
                time.sleep(TIMING_CONFIG.connection.adb_restart_delay)
                return True, f"TCP/IP mode enabled on port {port}"
            else:
                return False, output.strip()

        except Exception as e:
            return False, f"Error enabling TCP/IP: {e}"

    def get_device_ip(self, device_id: str | None = None) -> str | None:
        """
        Get the IP address of a connected device.

        Args:
            device_id: Device ID. If None, uses first available device.

        Returns:
            IP address string or None if not found.
        """
        try:
            cmd = [self.adb_path]
            if device_id:
                cmd.extend(["-s", device_id])
            cmd.extend(["shell", "ip", "route"])

            result = subprocess.run(cmd, capture_output=True, text=True, encoding="utf-8", timeout=5)

            # Parse IP from route output
            for line in result.stdout.split("\n"):
                if "src" in line:
                    parts = line.split()
                    for i, part in enumerate(parts):
                        if part == "src" and i + 1 < len(parts):
                            return parts[i + 1]

            # Alternative: try wlan0 interface.
            # cmd ends with ["shell", "ip", "route"]; keep everything up to
            # and including "shell" and swap in the new ip subcommand.
            result = subprocess.run(
                cmd[:-2] + ["ip", "addr", "show", "wlan0"],
                capture_output=True,
                text=True,
                encoding="utf-8",
                timeout=5,
            )

            for line in result.stdout.split("\n"):
                if "inet " in line:
                    parts = line.strip().split()
                    if len(parts) >= 2:
                        return parts[1].split("/")[0]

            return None

        except Exception as e:
            print(f"Error getting device IP: {e}")
            return None

    def restart_server(self) -> tuple[bool, str]:
        """
        Restart the ADB server.

        Returns:
            Tuple of (success, message).
        """
        try:
            # Kill server
            subprocess.run(
                [self.adb_path, "kill-server"], capture_output=True, timeout=5
            )

            time.sleep(TIMING_CONFIG.connection.server_restart_delay)

            # Start server
            subprocess.run(
                [self.adb_path, "start-server"], capture_output=True, timeout=5
            )

            return True, "ADB server restarted"

        except Exception as e:
            return False, f"Error restarting server: {e}"


def quick_connect(address: str) -> tuple[bool, str]:
    """
    Quick helper to connect to a remote device.

    Args:
        address: Device address (e.g., "192.168.1.100" or "192.168.1.100:5555").

    Returns:
        Tuple of (success, message).
    """
    conn = ADBConnection()
    return conn.connect(address)


def list_devices() -> list[DeviceInfo]:
    """
    Quick helper to list connected devices.

    Returns:
        List of DeviceInfo objects.
    """
    conn = ADBConnection()
    return conn.list_devices()


================================================
FILE: phone_agent/adb/device.py
================================================
"""Device control utilities for Android automation."""

import subprocess
import time

from phone_agent.config.apps import APP_PACKAGES
from phone_agent.config.timing import TIMING_CONFIG


def get_current_app(device_id: str | None = None) -> str:
    """
    Get the currently focused app name.

    Args:
        device_id: Optional ADB device ID for multi-device setups.

    Returns:
        The app name if recognized, otherwise "System Home".

    Raises:
        ValueError: If `dumpsys window` produces no output.
    """
    adb_prefix = _get_adb_prefix(device_id)

    result = subprocess.run(
        adb_prefix + ["shell", "dumpsys", "window"], capture_output=True, text=True, encoding="utf-8"
    )
    output = result.stdout
    if not output:
        raise ValueError("No output from dumpsys window")

    # Parse window focus info
    for line in output.split("\n"):
        if "mCurrentFocus" in line or "mFocusedApp" in line:
            for app_name, package in APP_PACKAGES.items():
                if package in line:
                    return app_name

    return "System Home"


def tap(
    x: int, y: int, device_id: str | None = None, delay: float | None = None
) -> None:
    """
    Tap at the specified coordinates.

    Args:
        x: X coordinate.
        y: Y coordinate.
        device_id: Optional ADB device ID.
        delay: Delay in seconds after tap. If None, uses configured default.
    """
    if delay is None:
        delay = TIMING_CONFIG.device.default_tap_delay

    adb_prefix = _get_adb_prefix(device_id)

    subprocess.run(
        adb_prefix + ["shell", "input", "tap", str(x), str(y)], capture_output=True
    )
    time.sleep(delay)


def double_tap(
    x: int, y: int, device_id: str | None = None, delay: float | None = None
) -> None:
    """
    Double tap at the specified coordinates.

    Args:
        x: X coordinate.
        y: Y coordinate.
        device_id: Optional ADB device ID.
        delay: Delay in seconds after double tap. If None, uses configured default.
    """
    if delay is None:
        delay = TIMING_CONFIG.device.default_double_tap_delay

    adb_prefix = _get_adb_prefix(device_id)

    subprocess.run(
        adb_prefix + ["shell", "input", "tap", str(x), str(y)], capture_output=True
    )
    time.sleep(TIMING_CONFIG.device.double_tap_interval)
    subprocess.run(
        adb_prefix + ["shell", "input", "tap", str(x), str(y)], capture_output=True
    )
    time.sleep(delay)


def long_press(
    x: int,
    y: int,
    duration_ms: int = 3000,
    device_id: str | None = None,
    delay: float | None = None,
) -> None:
    """
    Long press at the specified coordinates.

    Args:
        x: X coordinate.
        y: Y coordinate.
        duration_ms: Duration of press in milliseconds.
        device_id: Optional ADB device ID.
        delay: Delay in seconds after long press. If None, uses configured default.
    """
    if delay is None:
        delay = TIMING_CONFIG.device.default_long_press_delay

    adb_prefix = _get_adb_prefix(device_id)

    subprocess.run(
        adb_prefix
        + ["shell", "input", "swipe", str(x), str(y), str(x), str(y), str(duration_ms)],
        capture_output=True,
    )
    time.sleep(delay)


def swipe(
    start_x: int,
    start_y: int,
    end_x: int,
    end_y: int,
    duration_ms: int | None = None,
    device_id: str | None = None,
    delay: float | None = None,
) -> None:
    """
    Swipe from start to end coordinates.

    Args:
        start_x: Starting X coordinate.
        start_y: Starting Y coordinate.
        end_x: Ending X coordinate.
        end_y: Ending Y coordinate.
        duration_ms: Duration of swipe in milliseconds (auto-calculated if None).
        device_id: Optional ADB device ID.
        delay: Delay in seconds after swipe. If None, uses configured default.
    """
    if delay is None:
        delay = TIMING_CONFIG.device.default_swipe_delay

    adb_prefix = _get_adb_prefix(device_id)

    if duration_ms is None:
        # Calculate duration based on distance
        dist_sq = (start_x - end_x) ** 2 + (start_y - end_y) ** 2
        duration_ms = int(dist_sq / 1000)
        duration_ms = max(1000, min(duration_ms, 2000))  # Clamp between 1000-2000ms

    subprocess.run(
        adb_prefix
        + [
            "shell",
            "input",
            "swipe",
            str(start_x),
            str(start_y),
            str(end_x),
            str(end_y),
            str(duration_ms),
        ],
        capture_output=True,
    )
    time.sleep(delay)


def back(device_id: str | None = None, delay: float | None = None) -> None:
    """
    Press the back button.

    Args:
        device_id: Optional ADB device ID.
        delay: Delay in seconds after pressing back. If None, uses configured default.
    """
    if delay is None:
        delay = TIMING_CONFIG.device.default_back_delay

    adb_prefix = _get_adb_prefix(device_id)

    subprocess.run(
        adb_prefix + ["shell", "input", "keyevent", "KEYCODE_BACK"], capture_output=True
    )
    time.sleep(delay)


def home(device_id: str | None = None, delay: float | None = None) -> None:
    """
    Press the home button.

    Args:
        device_id: Optional ADB device ID.
        delay: Delay in seconds after pressing home. If None, uses configured default.
    """
    if delay is None:
        delay = TIMING_CONFIG.device.default_home_delay

    adb_prefix = _get_adb_prefix(device_id)

    subprocess.run(
        adb_prefix + ["shell", "input", "keyevent", "KEYCODE_HOME"], capture_output=True
    )
    time.sleep(delay)


def launch_app(
    app_name: str, device_id: str | None = None, delay: float | None = None
) -> bool:
    """
    Launch an app by name.

    Args:
        app_name: The app name (must be in APP_PACKAGES).
        device_id: Optional ADB device ID.
        delay: Delay in seconds after launching. If None, uses configured default.

    Returns:
        True if app was launched, False if app not found.
    """
    if delay is None:
        delay = TIMING_CONFIG.device.default_launch_delay

    if app_name not in APP_PACKAGES:
        return False

    adb_prefix = _get_adb_prefix(device_id)
    package = APP_PACKAGES[app_name]

    subprocess.run(
        adb_prefix
        + [
            "shell",
            "monkey",
            "-p",
            package,
            "-c",
            "android.intent.category.LAUNCHER",
            "1",
        ],
        capture_output=True,
    )
    time.sleep(delay)
    return True


def _get_adb_prefix(device_id: str | None) -> list:
    """Get ADB command prefix with optional device specifier."""
    if device_id:
        return ["adb", "-s", device_id]
    return ["adb"]


================================================
FILE: phone_agent/adb/input.py
================================================
"""Input utilities for Android device text input."""

import base64
import subprocess


def type_text(text: str, device_id: str | None = None) -> None:
    """
    Type text into the currently focused input field using ADB Keyboard.

    Args:
        text: The text to type.
        device_id: Optional ADB device ID for multi-device setups.

    Note:
        Requires ADB Keyboard to be installed on the device.
        See: https://github.com/senzhk/ADBKeyBoard
    """
    adb_prefix = _get_adb_prefix(device_id)
    encoded_text = base64.b64encode(text.encode("utf-8")).decode("utf-8")

    subprocess.run(
        adb_prefix
        + [
            "shell",
            "am",
            "broadcast",
            "-a",
            "ADB_INPUT_B64",
            "--es",
            "msg",
            encoded_text,
        ],
        capture_output=True,
        text=True,
    )


def clear_text(device_id: str | None = None) -> None:
    """
    Clear text in the currently focused input field.

    Args:
        device_id: Optional ADB device ID for multi-device setups.
    """
    adb_prefix = _get_adb_prefix(device_id)

    subprocess.run(
        adb_prefix + ["shell", "am", "broadcast", "-a", "ADB_CLEAR_TEXT"],
        capture_output=True,
        text=True,
    )


def detect_and_set_adb_keyboard(device_id: str | None = None) -> str:
    """
    Detect current keyboard and switch to ADB Keyboard if needed.

    Args:
        device_id: Optional ADB device ID for multi-device setups.

    Returns:
        The original keyboard IME identifier for later restoration.
    """
    adb_prefix = _get_adb_prefix(device_id)

    # Get current IME
    result = subprocess.run(
        adb_prefix + ["shell", "settings", "get", "secure", "default_input_method"],
        capture_output=True,
        text=True,
    )
    current_ime = (result.stdout + result.stderr).strip()

    # Switch to ADB Keyboard if not already set
    if "com.android.adbkeyboard/.AdbIME" not in current_ime:
        subprocess.run(
            adb_prefix + ["shell", "ime", "set", "com.android.adbkeyboard/.AdbIME"],
            capture_output=True,
            text=True,
        )

    # Warm up the keyboard
    type_text("", device_id)

    return current_ime


def restore_keyboard(ime: str, device_id: str | None = None) -> None:
    """
    Restore the original keyboard IME.

    Args:
        ime: The IME identifier to restore.
        device_id: Optional ADB device ID for multi-device setups.
    """
    adb_prefix = _get_adb_prefix(device_id)

    subprocess.run(
        adb_prefix + ["shell", "ime", "set", ime], capture_output=True, text=True
    )


def _get_adb_prefix(device_id: str | None) -> list:
    """Get ADB command prefix with optional device specifier."""
    if device_id:
        return ["adb", "-s", device_id]
    return ["adb"]


================================================
FILE: phone_agent/adb/screenshot.py
================================================
"""Screenshot utilities for capturing Android device screen."""

import base64
import os
import subprocess
import tempfile
import uuid
from dataclasses import dataclass
from io import BytesIO

from PIL import Image


@dataclass
class Screenshot:
    """Represents a captured screenshot."""

    base64_data: str
    width: int
    height: int
    is_sensitive: bool = False


def get_screenshot(device_id: str | None = None, timeout: int = 10) -> Screenshot:
    """
    Capture a screenshot from the connected Android device.

    Args:
        device_id: Optional ADB device ID for multi-device setups.
        timeout: Timeout in seconds for screenshot operations.

    Returns:
        Screenshot object containing base64 data and dimensions.

    Note:
        If the screenshot fails (e.g., on sensitive screens like payment pages),
        a black fallback image is returned with is_sensitive=True.
    """
    temp_path = os.path.join(tempfile.gettempdir(), f"screenshot_{uuid.uuid4()}.png")
    adb_prefix = _get_adb_prefix(device_id)

    try:
        # Execute screenshot command
        result = subprocess.run(
            adb_prefix + ["shell", "screencap", "-p", "/sdcard/tmp.png"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )

        # Check for screenshot failure (sensitive screen)
        output = result.stdout + result.stderr
        if "Status: -1" in output or "Failed" in output:
            return _create_fallback_screenshot(is_sensitive=True)

        # Pull screenshot to local temp path
        subprocess.run(
            adb_prefix + ["pull", "/sdcard/tmp.png", temp_path],
            capture_output=True,
            text=True,
            timeout=5,
        )

        if not os.path.exists(temp_path):
            return _create_fallback_screenshot(is_sensitive=False)

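        # Read and encode image. The context manager closes the file handle
        # so the temp file can be removed afterwards (required on Windows).
        with Image.open(temp_path) as img:
            width, height = img.size

            buffered = BytesIO()
            img.save(buffered, format="PNG")
        base64_data = base64.b64encode(buffered.getvalue()).decode("utf-8")

        # Cleanup
        os.remove(temp_path)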

        return Screenshot(
            base64_data=base64_data, width=width, height=height, is_sensitive=False
        )

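    except Exception as e:
        print(f"Screenshot error: {e}")
        # Best-effort cleanup so a failed capture does not leak the temp file
        try:
            os.remove(temp_path)
        except OSError:
            pass
        return _create_fallback_screenshot(is_sensitive=False)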


def _get_adb_prefix(device_id: str | None) -> list:
    """Get ADB command prefix with optional device specifier."""
    if device_id:
        return ["adb", "-s", device_id]
    return ["adb"]


def _create_fallback_screenshot(is_sensitive: bool) -> Screenshot:
    """Create a black fallback image when screenshot fails."""
    default_width, default_height = 1080, 2400

    black_img = Image.new("RGB", (default_width, default_height), color="black")
    buffered = BytesIO()
    black_img.save(buffered, format="PNG")
    base64_data = base64.b64encode(buffered.getvalue()).decode("utf-8")

    return Screenshot(
        base64_data=base64_data,
        width=default_width,
        height=default_height,
        is_sensitive=is_sensitive,
    )
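
Usage note (not part of screenshot.py): a minimal sketch, using only the standard library, of how a caller might check that a Screenshot's width and height agree with its base64 PNG payload. The helpers `png_size`, `make_black_png`, and `dimensions_match` are hypothetical names introduced for illustration, and the `Screenshot` dataclass is re-declared locally so the snippet runs without a device or Pillow.

```python
import base64
import struct
import zlib
from dataclasses import dataclass

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


@dataclass
class Screenshot:
    """Local mirror of the dataclass in phone_agent/adb/screenshot.py."""

    base64_data: str
    width: int
    height: int
    is_sensitive: bool = False


def png_size(base64_data: str) -> tuple[int, int]:
    """Read (width, height) from the PNG IHDR chunk without decoding pixels."""
    raw = base64.b64decode(base64_data)
    if raw[:8] != PNG_SIGNATURE or raw[12:16] != b"IHDR":
        raise ValueError("payload is not a PNG")
    # IHDR data starts at byte 16: 4-byte big-endian width, then height.
    return struct.unpack(">II", raw[16:24])


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def make_black_png(width: int, height: int) -> bytes:
    """Build a small all-black 8-bit RGB PNG (hypothetical test helper)."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    row = b"\x00" + b"\x00\x00\x00" * width  # filter byte + RGB pixels
    idat = zlib.compress(row * height)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", idat)
        + _chunk(b"IEND", b"")
    )


def dimensions_match(shot: Screenshot) -> bool:
    """True when the declared size equals the size stored in the PNG header."""
    return png_size(shot.base64_data) == (shot.width, shot.height)


payload = base64.b64encode(make_black_png(4, 6)).decode("utf-8")
shot = Screenshot(payload, width=4, height=6)
print(dimensions_match(shot), shot.is_sensitive)
```

A check like this could catch a caller that pairs the fallback dimensions (1080 x 2400) with a payload captured at a different resolution, which would skew relative-to-absolute coordinate conversion.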


================================================
FILE: phone_agent/agent.py
================================================
"""Main PhoneAgent class for orchestrating phone automation."""

import json
import traceback
from dataclasses import dataclass
from typing import Any, Callable

from phone_agent.actions import ActionHandler
from phone_agent.actions.handler import do, finish, parse_action
from phone_agent.config import get_messages, get_system_prompt
from phone_agent.device_factory import get_device_factory
from phone_agent.model import ModelClient, ModelConfig
from phone_agent.model.client import MessageBuilder


@dataclass
class AgentConfig:
    """Configuration for the PhoneAgent."""

    max_steps: int = 100
    device_id: str | None = None
    lang: str = "cn"
    system_prompt: str | None = None
    verbose: bool = True

    def __post_init__(self):
        if self.system_prompt is None:
            self.system_prompt = get_system_prompt(self.lang)


@dataclass
class StepResult:
    """Result of a single agent step."""

    success: bool
    finished: bool
    action: dict[str, Any] | None
    thinking: str
    message: str | None = None


class PhoneAgent:
    """
    AI-powered agent for automating Android phone interactions.

    The agent uses a vision-language model to understand screen content
    and decide on actions to complete user tasks.

    Args:
        model_config: Configuration for the AI model.
        agent_config: Configuration for the agent behavior.
        confirmation_callback: Optional callback for sensitive action confirmation.
        takeover_callback: Optional callback for takeover requests.

    Example:
        >>> from phone_agent import PhoneAgent
        >>> from phone_agent.model import ModelConfig
        >>>
        >>> model_config = ModelConfig(base_url="http://localhost:8000/v1")
        >>> agent = PhoneAgent(model_config)
        >>> agent.run("Open WeChat and send a message to John")
    """

    def __init__(
        self,
        model_config: ModelConfig | None = None,
        agent_config: AgentConfig | None = None,
        confirmation_callback: Callable[[str], bool] | None = None,
        takeover_callback: Callable[[str], None] | None = None,
    ):
        self.model_config = model_config or ModelConfig()
        self.agent_config = agent_config or AgentConfig()

        self.model_client = ModelClient(self.model_config)
        self.action_handler = ActionHandler(
            device_id=self.agent_config.device_id,
            confirmation_callback=confirmation_callback,
            takeover_callback=takeover_callback,
        )

        self._context: list[dict[str, Any]] = []
        self._step_count = 0

    def run(self, task: str) -> str:
        """
        Run the agent to complete a task.

        Args:
            task: Natural language description of the task.

        Returns:
            Final message from the agent.
        """
        self._context = []
        self._step_count = 0

        # First step with user prompt
        result = self._execute_step(task, is_first=True)

        if result.finished:
            return result.message or "Task completed"

        # Continue until finished or max steps reached
        while self._step_count < self.agent_config.max_steps:
            result = self._execute_step(is_first=False)

            if result.finished:
                return result.message or "Task completed"

        return "Max steps reached"

    def step(self, task: str | None = None) -> StepResult:
        """
        Execute a single step of the agent.

        Useful for manual control or debugging.

        Args:
            task: Task description (only needed for first step).

        Returns:
            StepResult with step details.
        """
        is_first = len(self._context) == 0

        if is_first and not task:
            raise ValueError("Task is required for the first step")

        return self._execute_step(task, is_first)

    def reset(self) -> None:
        """Reset the agent state for a new task."""
        self._context = []
        self._step_count = 0

    def _execute_step(
        self, user_prompt: str | None = None, is_first: bool = False
    ) -> StepResult:
        """Execute a single step of the agent loop."""
        self._step_count += 1

        # Capture current screen state
        device_factory = get_device_factory()
        screenshot = device_factory.get_screenshot(self.agent_config.device_id)
        current_app = device_factory.get_current_app(self.agent_config.device_id)

        # Build messages
        if is_first:
            self._context.append(
                MessageBuilder.create_system_message(self.agent_config.system_prompt)
            )

            screen_info = MessageBuilder.build_screen_info(current_app)
            text_content = f"{user_prompt}\n\n{screen_info}"

            self._context.append(
                MessageBuilder.create_user_message(
                    text=text_content, image_base64=screenshot.base64_data
                )
            )
        else:
            screen_info = MessageBuilder.build_screen_info(current_app)
            text_content = f"** Screen Info **\n\n{screen_info}"

            self._context.append(
                MessageBuilder.create_user_message(
                    text=text_content, image_base64=screenshot.base64_data
                )
            )

        # Get model response
        try:
            msgs = get_messages(self.agent_config.lang)
            print("\n" + "=" * 50)
            print(f"💭 {msgs['thinking']}:")
            print("-" * 50)
            response = self.model_client.request(self._context)
        except Exception as e:
            if self.agent_config.verbose:
                traceback.print_exc()
            return StepResult(
                success=False,
                finished=True,
                action=None,
                thinking="",
                message=f"Model error: {e}",
            )

        # Parse action from response
        try:
            action = parse_action(response.action)
        except ValueError:
            if self.agent_config.verbose:
                traceback.print_exc()
            action = finish(message=response.action)

        if self.agent_config.verbose:
            # Print the parsed action
            print("-" * 50)
            print(f"🎯 {msgs['action']}:")
            print(json.dumps(action, ensure_ascii=False, indent=2))
            print("=" * 50 + "\n")

        # Remove image from context to save space
        self._context[-1] = MessageBuilder.remove_images_from_message(self._context[-1])

        # Execute action
        try:
            result = self.action_handler.execute(
                action, screenshot.width, screenshot.height
            )
        except Exception as e:
            if self.agent_config.verbose:
                traceback.print_exc()
            result = self.action_handler.execute(
                finish(message=str(e)), screenshot.width, screenshot.height
            )

        # Add assistant response to context
        self._context.append(
            MessageBuilder.create_assistant_message(
                f"<think>{response.thinking}</think><answer>{response.action}</answer>"
            )
        )

        # Check if finished
        finished = action.get("_metadata") == "finish" or result.should_finish

        if finished and self.agent_config.verbose:
            msgs = get_messages(self.agent_config.lang)
            print("\n" + "🎉 " + "=" * 48)
            print(
                f"✅ {msgs['task_completed']}: {result.message or action.get('message', msgs['done'])}"
            )
            print("=" * 50 + "\n")

        return StepResult(
            success=result.success,
            finished=finished,
            action=action,
            thinking=response.thinking,
            message=result.message or action.get("message"),
        )

    @property
    def context(self) -> list[dict[str, Any]]:
        """Get the current conversation context."""
        return self._context.copy()

    @property
    def step_count(self) -> int:
        """Get the current step count."""
        return self._step_count
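
Usage note (not part of agent.py): the `step()` method above is described as useful for manual control. A minimal sketch of such a driver loop follows, assuming only the `step(task)` contract and the `StepResult` fields shown in this file. `run_stepwise` and `ScriptedAgent` are hypothetical names; `ScriptedAgent` is a stand-in that needs no device or model, and `StepResult` is re-declared locally so the snippet is self-contained.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StepResult:
    """Local mirror of StepResult in phone_agent/agent.py."""

    success: bool
    finished: bool
    action: Optional[dict[str, Any]]
    thinking: str
    message: Optional[str] = None


class ScriptedAgent:
    """Hypothetical stand-in for PhoneAgent that finishes after N steps."""

    def __init__(self, finish_after: int):
        self._finish_after = finish_after
        self._calls = 0

    def step(self, task: Optional[str] = None) -> StepResult:
        # Mirror PhoneAgent.step(): the first call must carry the task.
        if self._calls == 0 and not task:
            raise ValueError("Task is required for the first step")
        self._calls += 1
        finished = self._calls >= self._finish_after
        return StepResult(
            success=True,
            finished=finished,
            action={"_metadata": "finish" if finished else "do"},
            thinking=f"step {self._calls}",
            message="done" if finished else None,
        )


def run_stepwise(agent: Any, task: str, max_steps: int = 10) -> list[StepResult]:
    """Drive an agent one step at a time, collecting every StepResult."""
    results = [agent.step(task)]  # first step carries the task
    while not results[-1].finished and len(results) < max_steps:
        results.append(agent.step())  # later steps reuse the agent's context
    return results


for result in run_stepwise(ScriptedAgent(finish_after=3), "open settings"):
    print(result.thinking, result.finished, result.message)
```

Collecting the results rather than returning only the final message makes it possible to inspect each step's `thinking` and `action` while debugging; the `max_steps` cap plays the same role as `AgentConfig.max_steps` in `run()`.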


================================================
FILE: phone_agent/agent_ios.py
===


SYMBOL INDEX (253 symbols across 28 files)

FILE: examples/basic_usage.py
  function example_basic_task (line 15) | def example_basic_task(lang: str = "cn"):
  function example_with_callbacks (line 44) | def example_with_callbacks(lang: str = "cn"):
  function example_step_by_step (line 73) | def example_step_by_step(lang: str = "cn"):
  function example_multiple_tasks (line 93) | def example_multiple_tasks(lang: str = "cn"):
  function example_remote_device (line 118) | def example_remote_device(lang: str = "cn"):

FILE: examples/demo_thinking.py
  function main (line 15) | def main(lang: str = "cn"):

FILE: ios.py
  function check_system_requirements (line 31) | def check_system_requirements(wda_url: str = "http://localhost:8100") ->...
  function check_model_api (line 162) | def check_model_api(base_url: str, api_key: str, model_name: str) -> bool:
  function parse_args (line 250) | def parse_args() -> argparse.Namespace:
  function handle_device_commands (line 371) | def handle_device_commands(args) -> bool:
  function main (line 446) | def main():

FILE: main.py
  function check_system_requirements (line 37) | def check_system_requirements(
  function check_model_api (line 272) | def check_model_api(base_url: str, model_name: str, api_key: str = "EMPT...
  function parse_args (line 355) | def parse_args() -> argparse.Namespace:
  function handle_ios_device_commands (line 527) | def handle_ios_device_commands(args) -> bool:
  function handle_device_commands (line 602) | def handle_device_commands(args) -> bool:
  function main (line 684) | def main():

FILE: phone_agent/actions/handler.py
  class ActionResult (line 15) | class ActionResult:
  class ActionHandler (line 24) | class ActionHandler:
    method __init__ (line 35) | def __init__(
    method execute (line 45) | def execute(
    method _get_handler (line 90) | def _get_handler(self, action_name: str) -> Callable | None:
    method _convert_relative_to_absolute (line 110) | def _convert_relative_to_absolute(
    method _handle_launch (line 118) | def _handle_launch(self, action: dict, width: int, height: int) -> Act...
    method _handle_tap (line 130) | def _handle_tap(self, action: dict, width: int, height: int) -> Action...
    method _handle_type (line 151) | def _handle_type(self, action: dict, width: int, height: int) -> Actio...
    method _handle_swipe (line 175) | def _handle_swipe(self, action: dict, width: int, height: int) -> Acti...
    method _handle_back (line 190) | def _handle_back(self, action: dict, width: int, height: int) -> Actio...
    method _handle_home (line 196) | def _handle_home(self, action: dict, width: int, height: int) -> Actio...
    method _handle_double_tap (line 202) | def _handle_double_tap(self, action: dict, width: int, height: int) ->...
    method _handle_long_press (line 213) | def _handle_long_press(self, action: dict, width: int, height: int) ->...
    method _handle_wait (line 224) | def _handle_wait(self, action: dict, width: int, height: int) -> Actio...
    method _handle_takeover (line 235) | def _handle_takeover(self, action: dict, width: int, height: int) -> A...
    method _handle_note (line 241) | def _handle_note(self, action: dict, width: int, height: int) -> Actio...
    method _handle_call_api (line 247) | def _handle_call_api(self, action: dict, width: int, height: int) -> A...
    method _handle_interact (line 253) | def _handle_interact(self, action: dict, width: int, height: int) -> A...
    method _send_keyevent (line 258) | def _send_keyevent(self, keycode: str) -> None:
    method _default_confirmation (line 321) | def _default_confirmation(message: str) -> bool:
    method _default_takeover (line 327) | def _default_takeover(message: str) -> None:
  function parse_action (line 332) | def parse_action(response: str) -> dict[str, Any]:
  function do (line 390) | def do(**kwargs) -> dict[str, Any]:
  function finish (line 396) | def finish(**kwargs) -> dict[str, Any]:

FILE: phone_agent/actions/handler_ios.py
  class ActionResult (line 20) | class ActionResult:
  class IOSActionHandler (line 29) | class IOSActionHandler:
    method __init__ (line 41) | def __init__(
    method execute (line 53) | def execute(
    method _get_handler (line 98) | def _get_handler(self, action_name: str) -> Callable | None:
    method _convert_relative_to_absolute (line 118) | def _convert_relative_to_absolute(
    method _handle_launch (line 126) | def _handle_launch(self, action: dict, width: int, height: int) -> Act...
    method _handle_tap (line 139) | def _handle_tap(self, action: dict, width: int, height: int) -> Action...
    method _handle_type (line 161) | def _handle_type(self, action: dict, width: int, height: int) -> Actio...
    method _handle_swipe (line 178) | def _handle_swipe(self, action: dict, width: int, height: int) -> Acti...
    method _handle_back (line 201) | def _handle_back(self, action: dict, width: int, height: int) -> Actio...
    method _handle_home (line 206) | def _handle_home(self, action: dict, width: int, height: int) -> Actio...
    method _handle_double_tap (line 211) | def _handle_double_tap(self, action: dict, width: int, height: int) ->...
    method _handle_long_press (line 221) | def _handle_long_press(self, action: dict, width: int, height: int) ->...
    method _handle_wait (line 237) | def _handle_wait(self, action: dict, width: int, height: int) -> Actio...
    method _handle_takeover (line 248) | def _handle_takeover(self, action: dict, width: int, height: int) -> A...
    method _handle_note (line 254) | def _handle_note(self, action: dict, width: int, height: int) -> Actio...
    method _handle_call_api (line 260) | def _handle_call_api(self, action: dict, width: int, height: int) -> A...
    method _handle_interact (line 266) | def _handle_interact(self, action: dict, width: int, height: int) -> A...
    method _default_confirmation (line 272) | def _default_confirmation(message: str) -> bool:
    method _default_takeover (line 278) | def _default_takeover(message: str) -> None:

FILE: phone_agent/adb/connection.py
  class ConnectionType (line 12) | class ConnectionType(Enum):
  class DeviceInfo (line 21) | class DeviceInfo:
  class ADBConnection (line 31) | class ADBConnection:
    method __init__ (line 47) | def __init__(self, adb_path: str = "adb"):
    method connect (line 56) | def connect(self, address: str, timeout: int = 10) -> tuple[bool, str]:
    method disconnect (line 97) | def disconnect(self, address: str | None = None) -> tuple[bool, str]:
    method list_devices (line 120) | def list_devices(self) -> list[DeviceInfo]:
    method get_device_info (line 175) | def get_device_info(self, device_id: str | None = None) -> DeviceInfo ...
    method is_connected (line 199) | def is_connected(self, device_id: str | None = None) -> bool:
    method enable_tcpip (line 219) | def enable_tcpip(
    method get_device_ip (line 257) | def get_device_ip(self, device_id: str | None = None) -> str | None:
    method restart_server (line 305) | def restart_server(self) -> tuple[bool, str]:
  function quick_connect (line 331) | def quick_connect(address: str) -> tuple[bool, str]:
  function list_devices (line 345) | def list_devices() -> list[DeviceInfo]:

FILE: phone_agent/adb/device.py
  function get_current_app (line 12) | def get_current_app(device_id: str | None = None) -> str:
  function tap (line 41) | def tap(
  function double_tap (line 64) | def double_tap(
  function long_press (line 91) | def long_press(
  function swipe (line 121) | def swipe(
  function back (line 170) | def back(device_id: str | None = None, delay: float | None = None) -> None:
  function home (line 189) | def home(device_id: str | None = None, delay: float | None = None) -> None:
  function launch_app (line 208) | def launch_app(
  function _get_adb_prefix (line 248) | def _get_adb_prefix(device_id: str | None) -> list:

FILE: phone_agent/adb/input.py
  function type_text (line 8) | def type_text(text: str, device_id: str | None = None) -> None:
  function clear_text (line 40) | def clear_text(device_id: str | None = None) -> None:
  function detect_and_set_adb_keyboard (line 56) | def detect_and_set_adb_keyboard(device_id: str | None = None) -> str:
  function restore_keyboard (line 90) | def restore_keyboard(ime: str, device_id: str | None = None) -> None:
  function _get_adb_prefix (line 105) | def _get_adb_prefix(device_id: str | None) -> list:

FILE: phone_agent/adb/screenshot.py
  class Screenshot (line 16) | class Screenshot:
  function get_screenshot (line 25) | def get_screenshot(device_id: str | None = None, timeout: int = 10) -> S...
  function _get_adb_prefix (line 88) | def _get_adb_prefix(device_id: str | None) -> list:
  function _create_fallback_screenshot (line 95) | def _create_fallback_screenshot(is_sensitive: bool) -> Screenshot:

FILE: phone_agent/agent.py
  class AgentConfig (line 17) | class AgentConfig:
    method __post_init__ (line 26) | def __post_init__(self):
  class StepResult (line 32) | class StepResult:
  class PhoneAgent (line 42) | class PhoneAgent:
    method __init__ (line 64) | def __init__(
    method run (line 84) | def run(self, task: str) -> str:
    method step (line 112) | def step(self, task: str | None = None) -> StepResult:
    method reset (line 131) | def reset(self) -> None:
    method _execute_step (line 136) | def _execute_step(
    method context (line 246) | def context(self) -> list[dict[str, Any]]:
    method step_count (line 251) | def step_count(self) -> int:

FILE: phone_agent/agent_ios.py
  class IOSAgentConfig (line 17) | class IOSAgentConfig:
    method __post_init__ (line 28) | def __post_init__(self):
  class StepResult (line 34) | class StepResult:
  class IOSPhoneAgent (line 44) | class IOSPhoneAgent:
    method __init__ (line 67) | def __init__(
    method run (line 102) | def run(self, task: str) -> str:
    method step (line 130) | def step(self, task: str | None = None) -> StepResult:
    method reset (line 149) | def reset(self) -> None:
    method _execute_step (line 154) | def _execute_step(
    method context (line 270) | def context(self) -> list[dict[str, Any]]:
    method step_count (line 275) | def step_count(self) -> int:

FILE: phone_agent/config/__init__.py
  function get_system_prompt (line 19) | def get_system_prompt(lang: str = "cn") -> str:

FILE: phone_agent/config/apps.py
  function get_package_name (line 191) | def get_package_name(app_name: str) -> str | None:
  function get_app_name (line 204) | def get_app_name(package_name: str) -> str | None:
  function list_supported_apps (line 220) | def list_supported_apps() -> list[str]:

FILE: phone_agent/config/apps_harmonyos.py
  function get_package_name (line 230) | def get_package_name(app_name: str) -> str | None:
  function get_app_name (line 243) | def get_app_name(package_name: str) -> str | None:
  function list_supported_apps (line 259) | def list_supported_apps() -> list[str]:

FILE: phone_agent/config/apps_ios.py
  function get_bundle_id (line 204) | def get_bundle_id(app_name: str) -> str | None:
  function get_app_name (line 217) | def get_app_name(bundle_id: str) -> str | None:
  function list_supported_apps (line 233) | def list_supported_apps() -> list[str]:
  function check_app_installed (line 243) | def check_app_installed(app_name: str, wda_url: str = "http://localhost:...
  function get_app_info_from_itunes (line 282) | def get_app_info_from_itunes(bundle_id: str) -> dict | None:
  function get_app_info_by_id (line 312) | def get_app_info_by_id(app_store_id: str) -> dict | None:

FILE: phone_agent/config/i18n.py
  function get_messages (line 54) | def get_messages(lang: str = "cn") -> dict:
  function get_message (line 69) | def get_message(key: str, lang: str = "cn") -> str:

FILE: phone_agent/config/timing.py
  class ActionTimingConfig (line 12) | class ActionTimingConfig:
    method __post_init__ (line 21) | def __post_init__(self):
  class DeviceTimingConfig (line 38) | class DeviceTimingConfig:
    method __post_init__ (line 51) | def __post_init__(self):
  class ConnectionTimingConfig (line 80) | class ConnectionTimingConfig:
    method __post_init__ (line 89) | def __post_init__(self):
  class TimingConfig (line 100) | class TimingConfig:
    method __init__ (line 107) | def __init__(self):
  function get_timing_config (line 119) | def get_timing_config() -> TimingConfig:
  function update_timing_config (line 129) | def update_timing_config(

FILE: phone_agent/device_factory.py
  class DeviceType (line 7) | class DeviceType(Enum):
  class DeviceFactory (line 15) | class DeviceFactory:
    method __init__ (line 22) | def __init__(self, device_type: DeviceType = DeviceType.ADB):
    method module (line 33) | def module(self):
    method get_screenshot (line 48) | def get_screenshot(self, device_id: str | None = None, timeout: int = ...
    method get_current_app (line 52) | def get_current_app(self, device_id: str | None = None) -> str:
    method tap (line 56) | def tap(
    method double_tap (line 62) | def double_tap(
    method long_press (line 68) | def long_press(
    method swipe (line 79) | def swipe(
    method back (line 94) | def back(self, device_id: str | None = None, delay: float | None = None):
    method home (line 98) | def home(self, device_id: str | None = None, delay: float | None = None):
    method launch_app (line 102) | def launch_app(
    method type_text (line 108) | def type_text(self, text: str, device_id: str | None = None):
    method clear_text (line 112) | def clear_text(self, device_id: str | None = None):
    method detect_and_set_adb_keyboard (line 116) | def detect_and_set_adb_keyboard(self, device_id: str | None = None) ->...
    method restore_keyboard (line 120) | def restore_keyboard(self, ime: str, device_id: str | None = None):
    method list_devices (line 124) | def list_devices(self):
    method get_connection_class (line 128) | def get_connection_class(self):
  function set_device_type (line 146) | def set_device_type(device_type: DeviceType):
  function get_device_factory (line 157) | def get_device_factory() -> DeviceFactory:

FILE: phone_agent/hdc/connection.py
  function _run_hdc_command (line 17) | def _run_hdc_command(cmd: list, **kwargs) -> subprocess.CompletedProcess:
  function set_hdc_verbose (line 41) | def set_hdc_verbose(verbose: bool):
  class ConnectionType (line 47) | class ConnectionType(Enum):
  class DeviceInfo (line 56) | class DeviceInfo:
  class HDCConnection (line 66) | class HDCConnection:
    method __init__ (line 82) | def __init__(self, hdc_path: str = "hdc"):
    method connect (line 91) | def connect(self, address: str, timeout: int = 10) -> tuple[bool, str]:
    method disconnect (line 131) | def disconnect(self, address: str | None = None) -> tuple[bool, str]:
    method list_devices (line 165) | def list_devices(self) -> list[DeviceInfo]:
    method get_device_info (line 212) | def get_device_info(self, device_id: str | None = None) -> DeviceInfo ...
    method is_connected (line 236) | def is_connected(self, device_id: str | None = None) -> bool:
    method enable_tcpip (line 256) | def enable_tcpip(
    method get_device_ip (line 294) | def get_device_ip(self, device_id: str | None = None) -> str | None:
    method restart_server (line 333) | def restart_server(self) -> tuple[bool, str]:
  function quick_connect (line 359) | def quick_connect(address: str) -> tuple[bool, str]:
  function list_devices (line 373) | def list_devices() -> list[DeviceInfo]:

FILE: phone_agent/hdc/device.py
  function get_current_app (line 13) | def get_current_app(device_id: str | None = None) -> str:
  function tap (line 80) | def tap(
  function double_tap (line 105) | def double_tap(
  function long_press (line 130) | def long_press(
  function swipe (line 161) | def swipe(
  function back (line 213) | def back(device_id: str | None = None, delay: float | None = None) -> None:
  function home (line 234) | def home(device_id: str | None = None, delay: float | None = None) -> None:
  function launch_app (line 255) | def launch_app(
  function _get_hdc_prefix (line 303) | def _get_hdc_prefix(device_id: str | None) -> list:

FILE: phone_agent/hdc/input.py
  function type_text (line 10) | def type_text(text: str, device_id: str | None = None) -> None:
  function clear_text (line 66) | def clear_text(device_id: str | None = None) -> None:
  function detect_and_set_adb_keyboard (line 92) | def detect_and_set_adb_keyboard(device_id: str | None = None) -> str:
  function restore_keyboard (line 124) | def restore_keyboard(ime: str, device_id: str | None = None) -> None:
  function _get_hdc_prefix (line 145) | def _get_hdc_prefix(device_id: str | None) -> list:

FILE: phone_agent/hdc/screenshot.py
  class Screenshot (line 17) | class Screenshot:
  function get_screenshot (line 26) | def get_screenshot(device_id: str | None = None, timeout: int = 10) -> S...
  function _get_hdc_prefix (line 104) | def _get_hdc_prefix(device_id: str | None) -> list:
  function _create_fallback_screenshot (line 111) | def _create_fallback_screenshot(is_sensitive: bool) -> Screenshot:

FILE: phone_agent/model/client.py
  class ModelConfig (line 14) | class ModelConfig:
  class ModelResponse (line 29) | class ModelResponse:
  class ModelClient (line 41) | class ModelClient:
    method __init__ (line 49) | def __init__(self, config: ModelConfig | None = None):
    method request (line 53) | def request(self, messages: list[dict[str, Any]]) -> ModelResponse:
    method _parse_response (line 176) | def _parse_response(self, content: str) -> tuple[str, str]:
  class MessageBuilder (line 219) | class MessageBuilder:
    method create_system_message (line 223) | def create_system_message(content: str) -> dict[str, Any]:
    method create_user_message (line 228) | def create_user_message(
    method create_assistant_message (line 256) | def create_assistant_message(content: str) -> dict[str, Any]:
    method remove_images_from_message (line 261) | def remove_images_from_message(message: dict[str, Any]) -> dict[str, A...
    method build_screen_info (line 278) | def build_screen_info(current_app: str, **extra_info) -> str:

FILE: phone_agent/xctest/connection.py
  class ConnectionType (line 9) | class ConnectionType(Enum):
  class DeviceInfo (line 17) | class DeviceInfo:
  class XCTestConnection (line 28) | class XCTestConnection:
    method __init__ (line 47) | def __init__(self, wda_url: str = "http://localhost:8100"):
    method list_devices (line 57) | def list_devices(self) -> list[DeviceInfo]:
    method _get_device_details (line 115) | def _get_device_details(self, udid: str) -> dict[str, str]:
    method get_device_info (line 152) | def get_device_info(self, device_id: str | None = None) -> DeviceInfo ...
    method is_connected (line 176) | def is_connected(self, device_id: str | None = None) -> bool:
    method is_wda_ready (line 196) | def is_wda_ready(self, timeout: int = 2) -> bool:
    method start_wda_session (line 221) | def start_wda_session(self) -> tuple[bool, str]:
    method get_wda_status (line 255) | def get_wda_status(self) -> dict | None:
    method pair_device (line 274) | def pair_device(self, device_id: str | None = None) -> tuple[bool, str]:
    method get_device_name (line 307) | def get_device_name(self, device_id: str | None = None) -> str | None:
    method restart_wda (line 331) | def restart_wda(self) -> tuple[bool, str]:
  function quick_connect (line 351) | def quick_connect(wda_url: str = "http://localhost:8100") -> tuple[bool,...
  function list_devices (line 374) | def list_devices() -> list[DeviceInfo]:

FILE: phone_agent/xctest/device.py
  function _get_wda_session_url (line 11) | def _get_wda_session_url(wda_url: str, session_id: str | None, endpoint:...
  function get_current_app (line 31) | def get_current_app(
  function tap (line 75) | def tap(
  function double_tap (line 124) | def double_tap(
  function long_press (line 177) | def long_press(
  function swipe (line 231) | def swipe(
  function back (line 284) | def back(
  function home (line 325) | def home(
  function launch_app (line 353) | def launch_app(
  function get_screen_size (line 395) | def get_screen_size(
  function press_button (line 431) | def press_button(

FILE: phone_agent/xctest/input.py
  function _get_wda_session_url (line 6) | def _get_wda_session_url(wda_url: str, session_id: str | None, endpoint:...
  function type_text (line 26) | def type_text(
  function clear_text (line 64) | def clear_text(
  function _clear_with_backspace (line 106) | def _clear_with_backspace(
  function send_keys (line 137) | def send_keys(
  function press_enter (line 167) | def press_enter(
  function hide_keyboard (line 184) | def hide_keyboard(
  function is_keyboard_shown (line 208) | def is_keyboard_shown(
  function set_pasteboard (line 241) | def set_pasteboard(
  function get_pasteboard (line 271) | def get_pasteboard(

FILE: phone_agent/xctest/screenshot.py
  class Screenshot (line 15) | class Screenshot:
  function get_screenshot (line 24) | def get_screenshot(
  function _get_screenshot_wda (line 60) | def _get_screenshot_wda(
  function _get_screenshot_idevice (line 106) | def _get_screenshot_idevice(
  function _create_fallback_screenshot (line 159) | def _create_fallback_screenshot(is_sensitive: bool) -> Screenshot:
  function save_screenshot (line 185) | def save_screenshot(
  function get_screenshot_png (line 209) | def get_screenshot_png(


Condensed preview: 56 files, each showing path, character count, and a content snippet. The full structured content is 2,834K chars.

[
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.yaml",
    "chars": 2949,
    "preview": "name: \"\\U0001F41B Bug Report\"\ndescription: Submit a bug report to help us improve Open-AutoGLM / 提交一个 Bug 问题报告来帮助我们改进 Op"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature-request.yaml",
    "chars": 1001,
    "preview": "name: \"\\U0001F680 Feature request\"\ndescription: Submit a request for a new Open-AutoGLM / 提交一个新的 Open-AutoGLM 的功能建议\nlabe"
  },
  {
    "path": ".github/PULL_REQUEST_TEMPLATE.md",
    "chars": 1362,
    "preview": "# Contribution Guide\n\nWe welcome your contributions to this repository. To ensure elegant code style and better code qua"
  },
  {
    "path": ".gitignore",
    "chars": 538,
    "preview": "# Python\n__pycache__/\n*.py[cod]\n*$py.class\n*.so\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\np"
  },
  {
    "path": ".pre-commit-config.yaml",
    "chars": 534,
    "preview": "default_install_hook_types:\n  - pre-commit\n  - commit-msg\nexclude: '^phone_agent/config/apps\\.py$'\nexclude: '^README_en\\"
  },
  {
    "path": "LICENSE",
    "chars": 11342,
    "preview": "                                 Apache License\n                           Version 2.0, January 2004\n                   "
  },
  {
    "path": "README.md",
    "chars": 22415,
    "preview": "# Open-AutoGLM\n\n[Readme in English](README_en.md)\n\n<div align=\"center\">\n<img src=resources/logo.svg width=\"20%\"/>\n</div>"
  },
  {
    "path": "README_coding_agent.md",
    "chars": 8239,
    "preview": "# Open-AutoGLM Quick Start for Coding Agent\n\n<div align=\"center\">\n<img src=resources/logo.svg width=\"20%\"/>\n</div>\n\n> **"
  },
  {
    "path": "README_en.md",
    "chars": 34353,
    "preview": "# Open-AutoGLM\n\n[中文阅读.](./README.md)\n\n<div align=\"center\">\n<img src=resources/logo.svg width=\"20%\"/>\n</div>\n<p align=\"ce"
  },
  {
    "path": "docs/ios_setup/ios_setup.md",
    "chars": 2807,
    "preview": "# iOS 环境配置指南\n\n本文档介绍如何为 Open-AutoGLM 配置 iOS 设备环境。\n\n## 环境要求\n\n- macOS 操作系统\n- Xcode（最新版本，在App store中下载）\n- 苹果开发者账号（免费账号即可，无需付"
  },
  {
    "path": "examples/basic_usage.py",
    "chars": 5145,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nPhone Agent Usage Examples / Phone Agent 使用示例\n\nDemonstrates how to use Phone Agent for phone "
  },
  {
    "path": "examples/demo_thinking.py",
    "chars": 1545,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nThinking Output Demo / 演示 thinking 输出的示例\n\nThis script demonstrates how the Agent outputs both"
  },
  {
    "path": "ios.py",
    "chars": 17346,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nPhone Agent iOS CLI - AI-powered iOS phone automation.\n\nUsage:\n    python ios.py [OPTIONS]\n\nE"
  },
  {
    "path": "main.py",
    "chars": 28685,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nPhone Agent CLI - AI-powered phone automation.\n\nUsage:\n    python main.py [OPTIONS]\n\nEnvironm"
  },
  {
    "path": "phone_agent/__init__.py",
    "chars": 360,
    "preview": "\"\"\"\nPhone Agent - An AI-powered phone automation framework.\n\nThis package provides tools for automating Android and iOS "
  },
  {
    "path": "phone_agent/actions/__init__.py",
    "chars": 160,
    "preview": "\"\"\"Action handling module for Phone Agent.\"\"\"\n\nfrom phone_agent.actions.handler import ActionHandler, ActionResult\n\n__al"
  },
  {
    "path": "phone_agent/actions/handler.py",
    "chars": 15602,
    "preview": "\"\"\"Action handler for processing AI model outputs.\"\"\"\n\nimport ast\nimport re\nimport subprocess\nimport time\nfrom dataclass"
  },
  {
    "path": "phone_agent/actions/handler_ios.py",
    "chars": 10245,
    "preview": "\"\"\"Action handler for iOS automation using WebDriverAgent.\"\"\"\n\nimport time\nfrom dataclasses import dataclass\nfrom typing"
  },
  {
    "path": "phone_agent/adb/__init__.py",
    "chars": 950,
    "preview": "\"\"\"ADB utilities for Android device interaction.\"\"\"\n\nfrom phone_agent.adb.connection import (\n    ADBConnection,\n    Con"
  },
  {
    "path": "phone_agent/adb/connection.py",
    "chars": 10282,
    "preview": "\"\"\"ADB connection management for local and remote devices.\"\"\"\n\nimport subprocess\nimport time\nfrom dataclasses import dat"
  },
  {
    "path": "phone_agent/adb/device.py",
    "chars": 6712,
    "preview": "\"\"\"Device control utilities for Android automation.\"\"\"\n\nimport os\nimport subprocess\nimport time\nfrom typing import List,"
  },
  {
    "path": "phone_agent/adb/input.py",
    "chars": 2885,
    "preview": "\"\"\"Input utilities for Android device text input.\"\"\"\n\nimport base64\nimport subprocess\nfrom typing import Optional\n\n\ndef "
  },
  {
    "path": "phone_agent/adb/screenshot.py",
    "chars": 3186,
    "preview": "\"\"\"Screenshot utilities for capturing Android device screen.\"\"\"\n\nimport base64\nimport os\nimport subprocess\nimport tempfi"
  },
  {
    "path": "phone_agent/agent.py",
    "chars": 8150,
    "preview": "\"\"\"Main PhoneAgent class for orchestrating phone automation.\"\"\"\n\nimport json\nimport traceback\nfrom dataclasses import da"
  },
  {
    "path": "phone_agent/agent_ios.py",
    "chars": 9326,
    "preview": "\"\"\"iOS PhoneAgent class for orchestrating iOS phone automation.\"\"\"\n\nimport json\nimport traceback\nfrom dataclasses import"
  },
  {
    "path": "phone_agent/config/__init__.py",
    "chars": 1322,
    "preview": "\"\"\"Configuration module for Phone Agent.\"\"\"\n\nfrom phone_agent.config.apps import APP_PACKAGES\nfrom phone_agent.config.ap"
  },
  {
    "path": "phone_agent/config/apps.py",
    "chars": 8463,
    "preview": "\"\"\"App name to package name mapping for supported applications.\"\"\"\n\nAPP_PACKAGES: dict[str, str] = {\n    # Social & Mess"
  },
  {
    "path": "phone_agent/config/apps_harmonyos.py",
    "chars": 9436,
    "preview": "\"\"\"HarmonyOS application package name mappings.\n\nMaps user-friendly app names to HarmonyOS bundle names.\nThese bundle na"
  },
  {
    "path": "phone_agent/config/apps_ios.py",
    "chars": 10463,
    "preview": "\"\"\"App name to iOS bundle ID mapping for supported applications.\n\nBased on iOS app bundle ID conventions and common iOS "
  },
  {
    "path": "phone_agent/config/i18n.py",
    "chars": 2321,
    "preview": "\"\"\"Internationalization (i18n) module for Phone Agent UI messages.\"\"\"\n\n# Chinese messages\nMESSAGES_ZH = {\n    \"thinking\""
  },
  {
    "path": "phone_agent/config/prompts.py",
    "chars": 3591,
    "preview": "\"\"\"System prompts for the AI agent.\"\"\"\n\nfrom datetime import datetime\n\ntoday = datetime.today()\nformatted_date = today.s"
  },
  {
    "path": "phone_agent/config/prompts_en.py",
    "chars": 2630,
    "preview": "\"\"\"System prompts for the AI agent.\"\"\"\n\nfrom datetime import datetime\n\ntoday = datetime.today()\nformatted_date = today.s"
  },
  {
    "path": "phone_agent/config/prompts_zh.py",
    "chars": 3714,
    "preview": "\"\"\"System prompts for the AI agent.\"\"\"\n\nfrom datetime import datetime\n\ntoday = datetime.today()\nweekday_names = [\"星期一\", "
  },
  {
    "path": "phone_agent/config/timing.py",
    "chars": 5793,
    "preview": "\"\"\"Timing configuration for Phone Agent.\n\nThis module defines all configurable waiting times used throughout the applica"
  },
  {
    "path": "phone_agent/device_factory.py",
    "chars": 5050,
    "preview": "\"\"\"Device factory for selecting ADB or HDC based on device type.\"\"\"\n\nfrom enum import Enum\nfrom typing import Any\n\n\nclas"
  },
  {
    "path": "phone_agent/hdc/__init__.py",
    "chars": 996,
    "preview": "\"\"\"HDC utilities for HarmonyOS device interaction.\"\"\"\n\nfrom phone_agent.hdc.connection import (\n    HDCConnection,\n    C"
  },
  {
    "path": "phone_agent/hdc/connection.py",
    "chars": 11293,
    "preview": "\"\"\"HDC connection management for HarmonyOS devices.\"\"\"\n\nimport os\nimport subprocess\nimport time\nfrom dataclasses import "
  },
  {
    "path": "phone_agent/hdc/device.py",
    "chars": 8961,
    "preview": "\"\"\"Device control utilities for HarmonyOS automation.\"\"\"\n\nimport os\nimport subprocess\nimport time\nfrom typing import Lis"
  },
  {
    "path": "phone_agent/hdc/input.py",
    "chars": 4946,
    "preview": "\"\"\"Input utilities for HarmonyOS device text input.\"\"\"\n\nimport base64\nimport subprocess\nfrom typing import Optional\n\nfro"
  },
  {
    "path": "phone_agent/hdc/screenshot.py",
    "chars": 4094,
    "preview": "\"\"\"Screenshot utilities for capturing HarmonyOS device screen.\"\"\"\n\nimport base64\nimport os\nimport subprocess\nimport temp"
  },
  {
    "path": "phone_agent/model/__init__.py",
    "chars": 149,
    "preview": "\"\"\"Model client module for AI inference.\"\"\"\n\nfrom phone_agent.model.client import ModelClient, ModelConfig\n\n__all__ = [\""
  },
  {
    "path": "phone_agent/model/client.py",
    "chars": 9703,
    "preview": "\"\"\"Model client for AI inference using OpenAI-compatible API.\"\"\"\n\nimport json\nimport time\nfrom dataclasses import datacl"
  },
  {
    "path": "phone_agent/xctest/__init__.py",
    "chars": 881,
    "preview": "\"\"\"XCTest utilities for iOS device interaction via WebDriverAgent/XCUITest.\"\"\"\n\nfrom phone_agent.xctest.connection impor"
  },
  {
    "path": "phone_agent/xctest/connection.py",
    "chars": 10875,
    "preview": "\"\"\"iOS device connection management via idevice tools and WebDriverAgent.\"\"\"\n\nimport subprocess\nimport time\nfrom datacla"
  },
  {
    "path": "phone_agent/xctest/device.py",
    "chars": 13283,
    "preview": "\"\"\"Device control utilities for iOS automation via WebDriverAgent.\"\"\"\n\nimport subprocess\nimport time\nfrom typing import "
  },
  {
    "path": "phone_agent/xctest/input.py",
    "chars": 7992,
    "preview": "\"\"\"Input utilities for iOS device text input via WebDriverAgent.\"\"\"\n\nimport time\n\n\ndef _get_wda_session_url(wda_url: str"
  },
  {
    "path": "phone_agent/xctest/screenshot.py",
    "chars": 6055,
    "preview": "\"\"\"Screenshot utilities for capturing iOS device screen.\"\"\"\n\nimport base64\nimport os\nimport subprocess\nimport tempfile\ni"
  },
  {
    "path": "requirements.txt",
    "chars": 421,
    "preview": "Pillow>=12.0.0\nopenai>=2.9.0\n\n# For iOS Support\nrequests>=2.31.0\n\n# For Model Deployment\n\n## After installing sglang or "
  },
  {
    "path": "resources/WECHAT.md",
    "chars": 198,
    "preview": "<div align=\"center\">\n<img src=wechat.jpeg width=\"60%\"/>\n\n<p> 扫码加入「Open-AutoGLM 交流群」 </p>\n<p> Scan the QR code to follow "
  },
  {
    "path": "resources/privacy_policy.txt",
    "chars": 4700,
    "preview": "第一部分：模型/技术的安全性说明\n\n1. AutoGLM 技术机制与部署灵活性\nAutoGLM 的核心功能是自动化操作执行。其工作原理如下：\n- 指令驱动： 基于用户或开发者发出的操作指令。\n- 屏幕理解： 获取当前操作环境的屏幕内容，将图"
  },
  {
    "path": "resources/privacy_policy_en.txt",
    "chars": 15763,
    "preview": "Part I: Safety Description of Model/Technology\n\n1. AutoGLM Technical Mechanism and Deployment Flexibility\nThe core funct"
  },
  {
    "path": "scripts/check_deployment_cn.py",
    "chars": 3163,
    "preview": "import argparse\nimport json\nimport os\n\nfrom openai import OpenAI\n\nif __name__ == \"__main__\":\n    parser = argparse.Argum"
  },
  {
    "path": "scripts/check_deployment_en.py",
    "chars": 3727,
    "preview": "import argparse\nimport json\nimport os\n\nfrom openai import OpenAI\n\nif __name__ == \"__main__\":\n    parser = argparse.Argum"
  },
  {
    "path": "scripts/sample_messages.json",
    "chars": 1222268,
    "preview": "[\n    {\n        \"role\": \"system\",\n        \"content\": \"今天的日期是: 2025年12月11日 星期四\\n你是一个智能体分析专家，可以根据操作历史和当前状态图执行一系列操作来完成任务。\\n"
  },
  {
    "path": "scripts/sample_messages_en.json",
    "chars": 1221240,
    "preview": "[\n    {\n        \"role\": \"system\",\n        \"content\": \"The current date: 2025-12-12, Friday\\n# Setup\\nYou are a professio"
  },
  {
    "path": "setup.py",
    "chars": 1460,
    "preview": "#!/usr/bin/env python3\n\"\"\"Setup script for Phone Agent.\"\"\"\n\nfrom setuptools import find_packages, setup\n\nwith open(\"READ"
  }
]

About this extraction

This page contains the full source code of the zai-org/Open-AutoGLM GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 56 files (2.7 MB), approximately 705.3k tokens, and a symbol index with 253 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
