[
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.yml",
    "content": "name: 🐛 Bug Report\ndescription: File an issue about a bug.\ntitle: \"[BUG] \"\nlabels: [bug]\nassignees: [dandansamax]\nbody:\n  - type: markdown\n    attributes:\n      value: |\n        Please do your best to make the issue as easy to act on as possible, and only submit here if there is clearly a problem with camel (ask in [Discussions](https://github.com/camel-ai/camel/discussions) first if unsure).\n\n  - type: input\n    id: version\n    attributes:\n      label: What version of camel are you using?\n      description: Run command `python3 -c 'print(__import__(\"camel\").__version__)'` in your shell and paste the output here.\n      placeholder: E.g., 0.1.0\n    validations:\n      required: true\n\n  - type: textarea\n    id: system-info\n    attributes:\n      label: System information\n      description: |\n        Describe the characteristic of your environment:\n\n        - Describe how the library was installed (pip, conda, source, ...)\n        - Python version\n        - Versions of any other relevant libraries\n\n        ```python\n        import sys, camel\n        print(sys.version, sys.platform)\n        print(camel.__version__)\n        ```\n    validations:\n      required: true\n\n  - type: textarea\n    id: description\n    attributes:\n      label: Problem description\n      description: >-\n        Provide a short description, state the expected behavior and what actually happens. Include\n        relevant information like what version of camel you are using, what system you are on,\n        and any useful commands / output.\n    validations:\n      required: true\n\n  - type: textarea\n    id: code\n    attributes:\n      label: Reproducible example code\n      description: >-\n        The code should be minimal, have minimal external dependencies, and isolate the functions\n        that cause breakage. 
Submit complete, runnable snippets that can be executed directly to diagnose\n        the issue.\n      value: |\n        The Python snippets:\n\n        ```python\n\n        ```\n\n        Command lines:\n\n        ```bash\n\n        ```\n\n        Extra dependencies:\n\n        ```text\n\n        ```\n\n        Steps to reproduce:\n\n        1.\n        2.\n        3.\n    validations:\n      required: true\n\n  - type: textarea\n    id: traceback\n    attributes:\n      label: Traceback\n      description: Put the Python traceback information here.\n      placeholder: |\n        Traceback (most recent call last):\n          File ...\n      render: pytb\n\n  - type: textarea\n    id: expected\n    attributes:\n      label: Expected behavior\n      description: Provide a clear and concise description of what you expected to happen.\n\n  - type: textarea\n    id: additional-context\n    attributes:\n      label: Additional context\n      description: >-\n        Add any other context about the problem here. Screenshots may also be helpful.\n\n        If you know or suspect the reason for this bug, paste the code lines and suggest modifications.\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.yml",
    "content": "name: ✨ Feature Request\ndescription: Suggest an idea for this project.\ntitle: \"[Feature Request] \"\nlabels: [enhancement]\nassignees: [dandansamax]\nbody:\n  - type: checkboxes\n    id: steps\n    attributes:\n      label: Required prerequisites\n      description: Make sure you've completed the following steps before submitting your issue -- thank you!\n      options:\n        - label: I have searched the [Issue Tracker](https://github.com/camel-ai/crab/issues) that this hasn't already been reported. (+1 or comment there if it has.)\n          required: true\n\n  - type: textarea\n    id: motivation\n    attributes:\n      label: Motivation\n      description: Outline the motivation for the proposal.\n      value: |\n        <!-- Please outline the motivation for the proposal.\n        Is your feature request related to a problem? E.g., \"I'm always frustrated when [...]\".\n        If this is related to another issue, please link here too. -->\n    validations:\n      required: true\n\n  - type: textarea\n    id: solution\n    attributes:\n      label: Solution\n      description: Provide a clear and concise description of what you want to happen.\n\n  - type: textarea\n    id: additional-context\n    attributes:\n      label: Additional context\n      description: Add any other context about the problem here. Screenshots may also be helpful.\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/questions.yml",
    "content": "name: 🤔 Questions / Help / Support\ndescription: Do you need support?\ntitle: \"[Question] \"\nlabels: [question]\nassignees: [dandansamax]\nbody:\n  - type: checkboxes\n    id: steps\n    attributes:\n      label: Required prerequisites\n      description: Make sure you've completed the following steps before submitting your issue -- thank you!\n      options:\n        # - label: I have read the documentation <https://camel-ai.github.io/camel/camel.html>.\n        #   required: true\n        - label: I have searched the [Issue Tracker](https://github.com/camel-ai/crab/issues) that this hasn't already been reported. (+1 or comment there if it has.)\n          required: true\n\n  - type: textarea\n    id: questions\n    attributes:\n      label: Questions\n      description: Describe your questions with relevant resources such as snippets, links, images, etc.\n    validations:\n      required: true\n"
  },
  {
    "path": ".github/actions/crab_install/action.yml",
    "content": "name: 'crab_install'\ndescription: 'Setup python environment and install dependencies for Crab by poetry.'\ninputs:\n  python-version:\n    description: 'Python version.'\n    required: true\n    default: '3.10'\nruns:\n  using: \"composite\"\n  steps:\n    - name: Set up Python\n      uses: actions/setup-python@v3\n      with:\n        python-version: '${{ inputs.python-version }}'\n    - name: Install poetry\n      uses: abatilo/actions-poetry@v2\n    - name: Setup poetry virtual environment\n      run: |\n        poetry config virtualenvs.create true --local\n        poetry config virtualenvs.in-project true --local\n      shell: bash\n    - uses: actions/cache/restore@v3\n      id: cache-restore\n      name: Restore caches for the virtual environment based on poetry.lock\n      with:\n        path: ./.venv\n        key: venv-${{ hashFiles('poetry.lock') }}\n    - name: Install the project dependencies\n      run: poetry install -E client -E server -E camel\n      shell: bash\n    - uses: actions/cache/save@v3\n      name: Save caches based on poetry.lock \n      if: ${{ !steps.cache-restore.outputs.cache-hit }}\n      with:\n        path: ./.venv\n        key: venv-${{ hashFiles('poetry.lock') }}\n"
  },
  {
    "path": ".github/workflows/documentation.yml",
    "content": "name: Build and deploy CRAB documents\non:\n  push:\n    branches: [ \"main\" ]\n  workflow_dispatch:\npermissions:\n    contents: write\njobs:\n  docs:\n    runs-on: ubuntu-latest\n\n    steps:\n    - uses: actions/checkout@v3\n    - name: Set up Python environment and install dependencies\n      uses: ./.github/actions/crab_install\n      with:\n        python-version: \"3.10\"\n    - name: Sphinx build\n      run: |\n        cd docs\n        poetry run make html\n    - name: Deploy\n      uses: peaceiris/actions-gh-pages@v3\n      if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main'}}\n      with:\n        publish_branch: gh-pages\n        github_token: ${{ secrets.GITHUB_TOKEN }}\n        publish_dir: docs/_build/html/\n        force_orphan: true\n"
  },
  {
    "path": ".github/workflows/publish_release.yml",
    "content": "name: Publish CRAB to PyPI / GitHub\n\non:\n  push:\n    tags:\n      - \"v*\"\n\n  workflow_dispatch:\n\njobs:\n  build-n-publish:\n    name: Build and publish to PyPI\n    runs-on: ubuntu-latest\n    permissions:\n      contents: write\n\n    steps:\n      - uses: actions/checkout@v3\n      - name: Build and publish to pypi\n        uses: JRubics/poetry-publish@v1.17\n        with:\n          pypi_token: ${{ secrets.PYPI_API_KEY }}\n          ignore_dev_requirements: \"yes\"\n\n      - name: Create GitHub Release\n        id: create_release\n        uses: actions/create-release@v1\n        env:\n          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # This token is provided by Actions, you do not need to create your own token\n        with:\n          tag_name: ${{ github.ref }}\n          release_name: ${{ github.ref }}\n          draft: false\n          prerelease: false\n\n      - name: Get Asset name\n        run: |\n          export PKG=$(ls dist/ | grep tar)\n          set -- $PKG\n          echo \"name=$1\" >> $GITHUB_ENV\n      - name: Upload Release Asset (sdist) to GitHub\n        id: upload-release-asset\n        uses: actions/upload-release-asset@v1\n        env:\n          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}\n        with:\n          upload_url: ${{ steps.create_release.outputs.upload_url }}\n          asset_path: dist/${{ env.name }}\n          asset_name: ${{ env.name }}\n          asset_content_type: application/zip\n"
  },
  {
    "path": ".github/workflows/pytest_package.yml",
    "content": "# This workflow will install Python dependencies, run tests\n# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python\n\nname: Pytest CRAB package\n\non: push\n\njobs:\n  pytest:\n    runs-on: ubuntu-latest\n    steps:\n    - uses: actions/checkout@v3\n    - name: Set up Python environment and install dependencies\n      uses: ./.github/actions/crab_install\n      with:\n        python-version: \"3.10\"\n    - name: Run pytest\n      run: poetry run pytest test/\n"
  },
  {
    "path": ".gitignore",
    "content": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packaging\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\nsdist/\nvar/\nwheels/\nshare/python-wheels/\n*.egg-info/\n.installed.cfg\n*.egg\nMANIFEST\n.vagrant/*\n\n# PyInstaller\n#  Usually these files are written by a python script from a template\n#  before PyInstaller builds the exe, so as to inject date/other infos into it.\n*.manifest\n*.spec\n\n# Installer logs\npip-log.txt\npip-delete-this-directory.txt\n\n# Unit test / coverage reports\nhtmlcov/\n.tox/\n.nox/\n.coverage\n.coverage.*\n.cache\nnosetests.xml\ncoverage.xml\n*.cover\n*.py,cover\n.hypothesis/\n.pytest_cache/\ncover/\n\n# Translations\n*.mo\n*.pot\n\n# Django stuff:\n*.log\nlocal_settings.py\ndb.sqlite3\ndb.sqlite3-journal\n\n# Flask stuff:\ninstance/\n.webassets-cache\n\n# Scrapy stuff:\n.scrapy\n\n# Sphinx documentation\n# docs/_build/\n\n# PyBuilder\n.pybuilder/\ntarget/\n\n# Jupyter Notebook\n.ipynb_checkpoints\n\n# IPython\nprofile_default/\nipython_config.py\n\n# pyenv\n#   For a library or package, you might want to ignore these files since the code is\n#   intended to run in multiple environments; otherwise, check them in:\n# .python-version\n\n# pipenv\n#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.\n#   However, in case of collaboration, if having platform-specific dependencies or dependencies\n#   having no cross-platform support, pipenv may install dependencies that don't work, or not\n#   install all needed dependencies.\n#Pipfile.lock\n\n# poetry\n#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.\n#   This is especially recommended for binary packages to ensure reproducibility, and is more\n#   commonly ignored for libraries.\n#   
https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control\n#poetry.lock\n\n# pdm\n#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.\n#pdm.lock\n#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it\n#   in version control.\n#   https://pdm.fming.dev/#use-with-ide\n.pdm.toml\n\n# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm\n__pypackages__/\n\n# Celery stuff\ncelerybeat-schedule\ncelerybeat.pid\n\n# SageMath parsed files\n*.sage.py\n\n# Environments\n.env\n.venv\nenv/\nvenv/\nENV/\nenv.bak/\nvenv.bak/\n\n# Spyder project settings\n.spyderproject\n.spyproject\n\n# Rope project settings\n.ropeproject\n\n# mkdocs documentation\n/site\n\n# mypy\n.mypy_cache/\n.dmypy.json\ndmypy.json\n\n# Pyre type checker\n.pyre/\n\n# pytype static type analyzer\n.pytype/\n\n# Cython debug symbols\ncython_debug/\n\n# PyCharm\n#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can\n#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore\n#  and can be added to the global gitignore or merged into this file.  For a more nuclear\n#  option (not recommended) you can uncomment the following to ignore the entire idea folder.\n.idea/\n\n.vscode/\n.python-version\n\n_build/\n\n# model parameter\n*.pth\n\nlogs/\n\n.DS_Store"
  },
  {
    "path": ".pre-commit-config.yaml",
    "content": "repos:\n  - repo: https://github.com/astral-sh/ruff-pre-commit\n    # Ruff version.\n    rev: v0.6.5\n    hooks:\n      # Run the linter.\n      - id: ruff\n      # Run the formatter.\n      - id: ruff-format\n  - repo: local\n    hooks:\n    - id: check-license\n      name: Check License\n      entry: python licenses/update_license.py . licenses/license_template.txt \n      language: system\n      types: [python]\n"
  },
  {
    "path": "README.md",
    "content": "# 🦀 CRAB: Cross-platform Agent Benchmark for Multimodal Embodied Language Model Agents\r\n\r\n[![arXiv][arxiv-image]][arxiv-url]\r\n[![Slack][slack-image]][slack-url]\r\n[![Discord][discord-image]][discord-url]\r\n[![Wechat][wechat-image]][wechat-url]\r\n[![Twitter][twitter-image]][twitter-url]\r\n\r\n<p align=\"center\">\r\n  <a href=\"https://camel-ai.github.io/crab/\">Documentation</a> |\r\n  <a href=\"https://crab.camel-ai.org/\">Website & Demos</a> |\r\n  <a href=\"https://www.camel-ai.org/post/crab\">Blog</a> |\r\n  <a href=\"https://dandansamax.github.io/posts/crab-paper/\">Chinese Blog</a> |\r\n  <a href=\"https://www.camel-ai.org/\">CAMEL-AI</a>\r\n</p>\r\n\r\n<p align=\"center\">\r\n  <img src='https://raw.githubusercontent.com/camel-ai/crab/main/assets/CRAB_logo1.png' width=800>\r\n</p>\r\n\r\n## Overview\r\n\r\nCRAB is a framework for building LLM agent benchmark environments in a Python-centric way.\r\n\r\n#### Key Features\r\n\r\n🌐 Cross-platform and Multi-environment\r\n* Create build agent environments that support various deployment options including in-memory, Docker-hosted, virtual machines, or distributed physical machines, provided they are accessible via Python functions.\r\n* Let the agent access all the environments in the same time through a unified interface.\r\n\r\n⚙ ️Easy-to-use Configuration\r\n* Add a new action by simply adding a `@action` decorator on a Python function.\r\n* Define the environment by integrating several actions together.\r\n\r\n📐 Novel Benchmarking Suite\r\n* Define tasks and the corresponding evaluators in an intuitive Python-native way.\r\n* Introduce a novel graph evaluator method providing fine-grained metrics.\r\n\r\n## Installation\r\n\r\n#### Prerequisites\r\n\r\n- Python 3.10 or newer\r\n\r\n```bash\r\npip install crab-framework[client]\r\n```\r\n\r\n## Experiment on CRAB-Benchmark-v0\r\n\r\nAll datasets and experiment code are in [crab-benchmark-v0](./crab-benchmark-v0/) directory. 
Please carefully read the [benchmark tutorial](./crab-benchmark-v0/README.md) before using our benchmark.\r\n\r\n## Examples\r\n\r\n#### Run the template environment with an OpenAI agent\r\n\r\n```bash\r\nexport OPENAI_API_KEY=<your api key>\r\npython examples/single_env.py\r\npython examples/multi_env.py\r\n```\r\n\r\n## Demo Video\r\n\r\n[![demo_video](https://i.ytimg.com/vi_webp/PNqrHNQlU6I/maxresdefault.webp)](https://www.youtube.com/watch?v=PNqrHNQlU6I&ab_channel=CamelAI)\r\n\r\n## Cite\r\nPlease cite [our paper](https://arxiv.org/abs/2407.01511) if you use anything related in your work:\r\n```\r\n@misc{xu2024crab,\r\n      title={CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents}, \r\n      author={Tianqi Xu and Linyao Chen and Dai-Jie Wu and Yanjun Chen and Zecheng Zhang and Xiang Yao and Zhiqiang Xie and Yongchao Chen and Shilong Liu and Bochen Qian and Philip Torr and Bernard Ghanem and Guohao Li},\r\n      year={2024},\r\n      eprint={2407.01511},\r\n      archivePrefix={arXiv},\r\n      primaryClass={cs.AI},\r\n      url={https://arxiv.org/abs/2407.01511}, \r\n}\r\n```\r\n\r\n## Community\r\nJoin us ([*Discord*](https://discord.camel-ai.org/) or [*WeChat*](https://ghli.org/camel/wechat.png)) in pushing the boundaries of finding the scaling laws of agents. 
\r\n\r\n- **WeChat Community:** Scan the QR code below to join our WeChat community.\r\n\r\n  <div align=\"center\">\r\n    <img src=\"assets/wechatgroup.jpeg\" alt=\"WeChat QR Code\" width=\"50%\">\r\n  </div>\r\n\r\n\r\n<br>\r\n\r\n[slack-url]: https://join.slack.com/t/camel-kwr1314/shared_invite/zt-1vy8u9lbo-ZQmhIAyWSEfSwLCl2r2eKA\r\n[slack-image]: https://img.shields.io/badge/Slack-CAMEL--AI-blueviolet?logo=slack\r\n[discord-url]: https://discord.gg/CNcNpquyDc\r\n[discord-image]: https://img.shields.io/badge/Discord-CAMEL--AI-7289da?logo=discord&logoColor=white&color=7289da\r\n[wechat-url]: https://ghli.org/camel/wechat.png\r\n[wechat-image]: https://img.shields.io/badge/WeChat-CamelAIOrg-brightgreen?logo=wechat&logoColor=white\r\n[twitter-url]: https://twitter.com/CamelAIOrg\r\n[twitter-image]: https://img.shields.io/twitter/follow/CamelAIOrg?style=social&color=brightgreen&logo=twitter\r\n[arxiv-image]: https://img.shields.io/badge/arXiv-2407.01511-b31b1b.svg\r\n[arxiv-url]: https://arxiv.org/abs/2407.01511\r\n"
  },
  {
    "path": "crab/__init__.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# ruff: noqa: F403\nfrom .core import *\n\n__version__ = \"0.1.2\"\n"
  },
  {
    "path": "crab/actions/android_actions.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport base64\nimport subprocess\nfrom enum import Enum\nfrom time import sleep\n\nfrom crab import action\n\nfrom .crab_actions import get_element_position\n\n\ndef execute_adb(adb_command: str, env=None):\n    if env.device is None:\n        adb_command = \"adb \" + adb_command\n    else:\n        adb_command = f\"adb -s {env.device} \" + adb_command\n    result = subprocess.run(\n        adb_command,\n        shell=True,\n        stdout=subprocess.PIPE,\n        stderr=subprocess.PIPE,\n        text=True,\n    )\n    if result.returncode == 0:\n        return result.stdout.strip()\n    print(f\"Command execution failed: {adb_command}\")\n    print(result.stderr)\n    return \"ERROR\"\n\n\ndef get_device_size(env):\n    adb_command = \"shell wm size\"\n    result = execute_adb(adb_command, env)\n    if result != \"ERROR\":\n        return map(int, result.split(\": \")[1].split(\"x\"))\n    return 0, 0\n\n\n_DURATION = 1.5\n\n\n@action\ndef setup(env) -> None:\n    env.width, env.height = get_device_size(env)\n\n\n@action\ndef screenshot(env) -> str:\n    \"\"\"\n    Get the current screenshot of phone screen.\n    \"\"\"\n    if env.device is not None:\n        command = f\"adb -s {env.device} exec-out screencap -p\"\n    else:\n        command = 
\"adb exec-out screencap -p\"\n    result = subprocess.run(\n        command,\n        shell=True,\n        stdout=subprocess.PIPE,\n        stderr=subprocess.STDOUT,\n    )\n    return base64.b64encode(result.stdout).decode(\"utf-8\")\n\n\n@action\ndef tap(element: int, env) -> None:\n    \"\"\"\n    Tap an UI element shown on the smartphone screen. A simple use case can be tap(5),\n    which taps the UI element labeled with the number 5.\n\n    Args:\n        element: A numeric tag assigned to an UI element shown on the smartphone screen.\n    \"\"\"\n    x, y = get_element_position(element, env)\n    execute_adb(f\"shell input tap {x} {y}\", env)\n    sleep(_DURATION)\n\n\n@action\ndef long_tap(element: int, env) -> None:\n    \"\"\"\n    Press and hold a UI element on the smartphone screen for 1 second, typically to\n    access additional menu options. For example, the command long_tap(5) simulates a\n    long press on the UI element labeled with the number 5.\n\n    Args:\n        element: A numeric tag assigned to an UI element shown on the smartphone screen.\n    \"\"\"\n    x, y = get_element_position(element, env)\n    adb_command = f\"shell input swipe {x} {y} {x} {y} 1000\"\n    execute_adb(adb_command, env)\n    sleep(_DURATION)\n\n\nclass SwipeDirection(str, Enum):\n    RIGHT = \"right\"\n    LEFT = \"left\"\n    UP = \"up\"\n    DOWN = \"down\"\n\n\nclass SwipeDist(str, Enum):\n    SHORT = \"short\"\n    MEDIUM = \"medium\"\n    LONG = \"long\"\n\n\n@action\ndef swipe(element: int, direction: SwipeDirection, dist: SwipeDist, env) -> None:\n    \"\"\"\n    This function is used to swipe an UI element shown on the smartphone screen, usually\n    a scroll view or a slide bar. You should choose the appropriate direction and\n    distance option according to your need. 
A simple use case can be swipe(21, \"up\",\n    \"medium\"), which swipes up the UI element labeled with the number 21 for a medium\n    distance.\n\n    Args:\n        element: A numeric tag assigned to a UI element shown on the smartphone\n            screen.\n        direction: A string that represents the swipe direction.\n        dist: Determines the distance of the swipe.\n    \"\"\"\n    x, y = get_element_position(element, env)\n    unit_dist = int(env.width / 10)\n    if dist == \"long\":\n        unit_dist *= 3\n    elif dist == \"medium\":\n        unit_dist *= 2\n    if direction == \"up\":\n        offset = 0, -2 * unit_dist\n    elif direction == \"down\":\n        offset = 0, 2 * unit_dist\n    elif direction == \"left\":\n        offset = -1 * unit_dist, 0\n    elif direction == \"right\":\n        offset = unit_dist, 0\n    else:\n        return \"ERROR\"\n    adb_command = f\"shell input swipe {x} {y} {x + offset[0]} {y + offset[1]} 200\"\n    execute_adb(adb_command, env)\n    sleep(_DURATION)\n\n\n@action\ndef open_app_drawer(env) -> None:\n    \"\"\"Open the app drawer to list all the applications installed on this phone. For\n    example, if you want to open the \"Messages\" application but you don't know where\n    to find it, you can call \"open_app_drawer()\" and you will see all the installed\n    applications through the screenshot.\n    \"\"\"\n    execute_adb(\"shell input keyevent KEYCODE_HOME\", env)\n    sleep(0.5)\n    execute_adb(\"shell input swipe 800 2000 800 100 500\", env)\n    sleep(_DURATION)\n\n\nclass AndroidKey(str, Enum):\n    HOME = \"home\"\n    BACK = \"back\"\n\n\n@action\ndef key_press(key: AndroidKey, env):\n    \"\"\"\n    Press Android keys. press(\"home\") to go back to the main screen. 
press(\"back\") to return\n    to the preivous page.\n\n    Args:\n        key (str): The pressed key.\n    \"\"\"\n    if key == AndroidKey.HOME:\n        adb_command = \"shell input keyevent KEYCODE_HOME\"\n    elif key == AndroidKey.BACK:\n        adb_command = \"shell input keyevent KEYCODE_BACK\"\n    else:\n        raise ValueError(\"Unsupported key\")\n    execute_adb(adb_command, env)\n    sleep(_DURATION)\n\n\n@action\ndef write_text(text: str, env) -> None:\n    \"\"\"\n    Typing the specified text.\n\n    Args:\n        text (str): The text to be typed.\n    \"\"\"\n    text = text.replace(\" \", \"%s\")\n    text = text.replace(\"'\", \"\")\n    adb_command = f\"shell input text {text}\"\n    execute_adb(adb_command, env)\n    sleep(_DURATION)\n\n\n@action\ndef stop_all_apps(env) -> None:\n    \"\"\"\n    Stop all running apps.\n    \"\"\"\n    execute_adb(\"shell input keyevent KEYCODE_HOME\", env)\n    execute_adb(\"shell input keyevent KEYCODE_APP_SWITCH\", env)\n    sleep(0.5)\n    command = (\n        f\"shell input swipe 100 {env.height / 2} {env.width - 100} {env.height / 2} 200\"\n    )\n    execute_adb(command, env)\n    sleep(0.5)\n    execute_adb(\"shell input tap 300 1400\", env)\n    sleep(_DURATION)\n"
  },
  {
    "path": "crab/actions/crab_actions.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom time import sleep\n\nfrom crab import action, evaluator\n\n\n@action(env_name=\"root\")\ndef submit(content: str) -> None:\n    \"\"\"Submit your answer through this action. For exmaple, if you are required to\n    submit a word \"apple\", you can use submit(content=\"apple\").\n\n    Args:\n        content: the content to submit\n    \"\"\"\n    pass\n\n\n@evaluator(env_name=\"root\")\ndef check_submit(text: str, env) -> bool:\n    if env.trajectory:\n        action_name, params, _ = env.trajectory[-1]\n        if action_name == \"submit\" and text in params[\"content\"]:\n            return True\n    return False\n\n\n@action(env_name=\"root\")\ndef complete() -> bool:\n    \"\"\"When you think the task is completed, use this action to notify the system. 
For\n    example, if you successfully complete the task, you can use complete().\n    \"\"\"\n    pass\n\n\n@action(env_name=\"root\")\ndef wait() -> bool:\n    \"\"\"If the environment is still processing your action and you have nothing to do in\n    this step, you can use wait().\n    \"\"\"\n    sleep(5)\n\n\ndef get_element_position(element_id, env):\n    \"\"\"Get the element position provided by the function `zs_object_detection`.\"\"\"\n    box = env.element_position_map[element_id]\n    x = (box[0] + box[2]) / 2\n    y = (box[1] + box[3]) / 2\n    return round(x), round(y)\n"
  },
  {
    "path": "crab/actions/desktop_actions.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport base64\nimport time\nfrom enum import Enum\n\nimport pyautogui\nfrom mss import mss, tools\n\nfrom crab import action\n\nfrom .crab_actions import get_element_position\n\nDURATION = 0.8\nDELAY = 1.0\n\n\n@action\ndef click_position(x: int, y: int) -> None:\n    \"\"\"\n    click on the current desktop screen.\n\n    Args:\n        x: The X coordinate, as a floating-point number in the range [0.0, 1.0].\n        y: The Y coordinate, as a floating-point number in the range [0.0, 1.0].\n    \"\"\"\n    pyautogui.click(x, y, duration=DURATION)\n    time.sleep(DELAY)\n\n\n@action(local=True)\ndef click(element: int, env) -> None:\n    \"\"\"\n    Click an UI element shown on the desktop screen. 
A simple use case can be\n    click(5), which clicks the UI element labeled with the number 5.\n\n    Args:\n        element: A numeric tag assigned to a UI element shown on the screenshot.\n    \"\"\"\n    x, y = get_element_position(element, env)\n    env._action_endpoint(click_position, {\"x\": x, \"y\": y})\n\n\n@action\ndef right_click_position(x: int, y: int) -> None:\n    \"\"\"\n    Right-click at the given position on the current desktop screen.\n\n    Args:\n        x: The X coordinate, in pixels.\n        y: The Y coordinate, in pixels.\n    \"\"\"\n    pyautogui.click(x, y, duration=DURATION, button=\"right\")\n\n\n@action(local=True)\ndef right_click(element: int, env) -> None:\n    \"\"\"\n    Right-click a UI element shown on the desktop screen using the mouse, which is\n    usually used for opening the menu of the element. A simple use case can be\n    right_click(5), which right-clicks the UI element labeled with the number 5 to open\n    up a menu on it.\n\n    Args:\n        element: A numeric tag assigned to a UI element shown on the screenshot.\n    \"\"\"\n    x, y = get_element_position(element, env)\n    env._action_endpoint(right_click_position, {\"x\": x, \"y\": y})\n    time.sleep(DELAY)\n\n\n@action\ndef double_click_position(x: int, y: int) -> None:\n    \"\"\"\n    Double-click at the given position on the current desktop screen.\n\n    Args:\n        x: The X coordinate, in pixels.\n        y: The Y coordinate, in pixels.\n    \"\"\"\n    pyautogui.click(x, y, duration=DURATION, clicks=2, interval=0.2)\n\n\n@action(local=True)\ndef double_click(element: int, env) -> None:\n    \"\"\"\n    Double-click a UI element shown on the desktop screen using the mouse, which is\n    usually used for opening a folder or a file. 
A simple use case can be\n    double_click(5), which double-clicks the UI element labeled with the number 5 to\n    open it.\n\n    Args:\n        element: A numeric tag assigned to a UI element shown on the screenshot.\n    \"\"\"\n    x, y = get_element_position(element, env)\n    env._action_endpoint(double_click_position, {\"x\": x, \"y\": y})\n    time.sleep(DELAY)\n\n\n@action\ndef mouse_scroll(click: int = 1) -> None:\n    \"\"\"\n    Performs a scroll of the mouse scroll wheel.\n\n    Args:\n        click (int): The number of scroll clicks. Defaults to 1.\n    \"\"\"\n    pyautogui.scroll(click)\n    time.sleep(DELAY)\n\n\nclass KeyEnum(str, Enum):\n    KEY_TAB = \"\\t\"\n    KEY_LB = \"\\n\"\n    KEY_RR = \"\\r\"\n    KEY_SPACE = \" \"\n    KEY_EXCLAMATION = \"!\"\n    KEY_DQUOTE = '\"'\n    KEY_SHARP = \"#\"\n    KEY_DOLLAR = \"$\"\n    KEY_PER = \"%\"\n    KEY_AND = \"&\"\n    KEY_SQUOTE = \"'\"\n    KEY_LPAR = \"(\"\n    KEY_RPAR = \")\"\n    KEY_MUL = \"*\"\n    KEY_ADD = \"+\"\n    KEY_COMMA = \",\"\n    KEY_MIN = \"-\"\n    KEY_DOT = \".\"\n    KEY_SLASH = \"/\"\n    KEY_0 = \"0\"\n    KEY_1 = \"1\"\n    KEY_2 = \"2\"\n    KEY_3 = \"3\"\n    KEY_4 = \"4\"\n    KEY_5 = \"5\"\n    KEY_6 = \"6\"\n    KEY_7 = \"7\"\n    KEY_8 = \"8\"\n    KEY_9 = \"9\"\n    KEY_COL = \":\"\n    KEY_SEMICOL = \";\"\n    KET_LT = \"<\"\n    KEY_EQUAL = \"=\"\n    KEY_GT = \">\"\n    KEY_QM = \"?\"\n    KEY_AT = \"@\"\n    KEY_LBRA = \"[\"\n    KEY_RSLASH = \"\\\\\"\n    KEY_RBRA = \"]\"\n    KEY_CARET = \"^\"\n    KEY_UNDERLINE = \"_\"\n    KEY_BACKTICK = \"`\"\n    KEY_LBRACE = \"{\"\n    KEY_RBRACE = \"}\"\n    KEY_PIPE = \"|\"\n    KEY_TLIDE = \"~\"\n    KEY_A = \"a\"\n    KEY_B = \"b\"\n    KEY_C = \"c\"\n    KEY_D = \"d\"\n    KEY_E = \"e\"\n    KEY_F = \"f\"\n    KEY_G = \"g\"\n    KEY_H = \"h\"\n    KEY_I = \"i\"\n    KEY_J = \"j\"\n    KEY_K = \"k\"\n    KEY_L = \"l\"\n    KEY_M = \"m\"\n    KEY_N = \"n\"\n    KEY_O = \"o\"\n    KEY_P = \"p\"\n    KEY_Q = \"q\"\n    
KEY_R = \"r\"\n    KEY_S = \"s\"\n    KEY_T = \"t\"\n    KEY_U = \"u\"\n    KEY_V = \"v\"\n    KEY_W = \"w\"\n    KEY_X = \"x\"\n    KEY_Y = \"y\"\n    KEY_Z = \"z\"\n    KEY_ALT = \"alt\"\n    KEY_SHIFT = \"shift\"\n    KEY_CTRL = \"ctrl\"\n    KEY_WIN = \"win\"\n    KEY_BACKSPACE = \"backspace\"\n    KEY_ENTER = \"enter\"\n    KEY_ESC = \"esc\"\n    KEY_F1 = \"f1\"\n    KEY_F2 = \"f2\"\n    KEY_F3 = \"f3\"\n    KEY_F4 = \"f4\"\n    KEY_F5 = \"f5\"\n    KEY_F6 = \"f6\"\n    KEY_F7 = \"f7\"\n    KEY_F8 = \"f8\"\n    KEY_F9 = \"f9\"\n    KEY_F10 = \"f10\"\n    KEY_F11 = \"f11\"\n    KEY_F12 = \"f12\"\n    KEY_LEFT = \"left\"\n    KEY_UP = \"up\"\n    KEY_RIGHT = \"right\"\n    KEY_DOWN = \"down\"\n\n\n@action\ndef key_press(key: KeyEnum) -> None:\n    \"\"\"\n    Performs a keyboard key press down, followed by a release.\n\n    Args:\n        key (KeyEnum | str): The key to be pressed.\n    \"\"\"\n    if isinstance(key, KeyEnum):\n        pyautogui.press(key.value)\n    else:\n        pyautogui.press(key)\n    time.sleep(DELAY)\n\n\n@action\ndef press_hotkey(keys: list[KeyEnum]) -> None:\n    \"\"\"\n    Press multiple keyboard keys at the same time. For example, if you want to use\n    the Ctrl-C hotkey to copy the selected text, you can call\n    press_hotkey(keys=[\"ctrl\", \"c\"]).\n\n    Args:\n        keys (list[KeyEnum]): The keys to be pressed at the same time.\n    \"\"\"\n    if isinstance(keys[0], KeyEnum):\n        keys = [key.value for key in keys]\n    pyautogui.hotkey(*keys)\n    time.sleep(DELAY)\n\n\n@action\ndef write_text(text: str) -> None:\n    \"\"\"\n    Type the specified text. Note: This function does not move the mouse cursor.\n    Ensure the cursor is focused on the correct text input field before calling this\n    function.\n\n    Args:\n        text (str): The text to be typed.\n    \"\"\"\n    pyautogui.write(text, interval=0.03)\n    time.sleep(DELAY)\n\n\n@action\ndef search_application(name: str) -> None:\n    \"\"\"\n    Search for an application by name. 
For example, if you want to open an application named\n    \"slack\", you can call search_application(name=\"slack\"). You MUST use this action to\n    search for applications.\n\n    Args:\n        name: The application name.\n    \"\"\"\n    pyautogui.press(\"esc\")\n    time.sleep(DELAY)\n    pyautogui.hotkey(\"win\", \"a\")\n    time.sleep(DELAY)\n    pyautogui.write(name)\n    time.sleep(DELAY)\n\n\n@action\ndef screenshot() -> str:\n    \"Get the current screenshot as a base64-encoded PNG.\"\n    with mss() as sct:\n        # Get raw pixels from the screen\n        sct_img = sct.grab(sct.monitors[1])\n        # Encode the raw pixels as a PNG image\n        png = tools.to_png(sct_img.rgb, sct_img.size)\n        base64_img = base64.b64encode(png).decode(\"utf-8\")\n    return base64_img\n"
  },
  {
    "path": "crab/actions/file_actions.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport base64\nfrom io import BytesIO\n\nfrom PIL import Image\n\nfrom crab.core import action\n\n\n@action\ndef save_base64_image(image: str, path: str = \"image.png\") -> None:\n    \"\"\"Save a base64-encoded image to a file.\n\n    Args:\n        image: The base64-encoded image data.\n        path: The file path to save the image to. Defaults to \"image.png\".\n    \"\"\"\n    decoded_image = Image.open(BytesIO(base64.b64decode(image)))\n    decoded_image.save(path)\n"
  },
  {
    "path": "crab/actions/system_actions.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport subprocess\nfrom time import sleep\n\nfrom crab.core.decorators import action\n\n\n@action\ndef delay(time: float) -> None:\n    \"\"\"Pause for the given number of seconds.\"\"\"\n    sleep(time)\n\n\n@action\ndef run_bash_command(command: str) -> str:\n    \"\"\"\n    Run a command using the bash shell. You can use this action to open any\n    application by its name.\n\n    Args:\n        command: The command to be run.\n\n    Returns:\n        stdout and stderr\n    \"\"\"\n    p = subprocess.run([\"bash\", \"-c\", command], capture_output=True, text=True)\n    return f'stdout: \"{p.stdout}\"\\nstderr: \"{p.stderr}\"'\n"
  },
  {
    "path": "crab/actions/visual_prompt_actions.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport logging\nfrom functools import cache\nfrom typing import Literal\n\nfrom PIL import Image, ImageDraw, ImageFont\n\nfrom crab import action\nfrom crab.utils.common import base64_to_image, image_to_base64\n\nlogger = logging.getLogger(__name__)\n\ntry:\n    import easyocr\n    import numpy as np\n    import torch\n    from transformers import (\n        AutoProcessor,\n        GroundingDinoForObjectDetection,\n        GroundingDinoProcessor,\n    )\n\n    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n\n    TRANSFORMERS_ENABLE = True\nexcept ImportError:\n    TRANSFORMERS_ENABLE = False\n\nBoxType = tuple[int, int, int, int]\nAnnotatedBoxType = tuple[BoxType, str | None]\n\n\ndef check_transformers_import() -> None:\n    if not TRANSFORMERS_ENABLE:\n        raise ImportError(\n            \"Please install the required dependencies to use this function by running\"\n            \" `pip install crab-framework[client]`\"\n        )\n\n\ndef _calculate_iou(box1: BoxType, box2: BoxType) -> float:\n    xA = max(box1[0], box2[0])\n    yA = max(box1[1], box2[1])\n    xB = min(box1[2], box2[2])\n    yB = min(box1[3], box2[3])\n\n    interArea = max(0, xB - xA) * max(0, yB - yA)\n    box1Area = (box1[2] - box1[0]) * (box1[3] - box1[1])\n    
box2Area = (box2[2] - box2[0]) * (box2[3] - box2[1])\n    unionArea = box1Area + box2Area - interArea\n    iou = interArea / unionArea\n\n    return iou\n\n\ndef _calculate_center(box: BoxType) -> tuple[float, float]:\n    return (box[0] + box[2]) / 2, (box[1] + box[3]) / 2\n\n\ndef _remove_invalid_boxes(\n    boxes_with_label: list[AnnotatedBoxType], width: int, height: int\n) -> list[AnnotatedBoxType]:\n    boxes = [box[0] for box in boxes_with_label]\n    boxes_to_remove = set()\n    for idx, box in enumerate(boxes):\n        if box[0] < 0 or box[1] < 0 or box[2] > width or box[3] > height:\n            boxes_to_remove.add(idx)\n            continue\n        if box[0] >= box[2] or box[1] >= box[3]:\n            boxes_to_remove.add(idx)\n            continue\n\n    boxes_filt = [\n        box for idx, box in enumerate(boxes_with_label) if idx not in boxes_to_remove\n    ]\n    return boxes_filt\n\n\ndef _filter_boxes_by_center(\n    boxes_with_label: list[AnnotatedBoxType], center_dis_thresh: float\n) -> list[AnnotatedBoxType]:\n    boxes = [box[0] for box in boxes_with_label]\n    boxes_to_remove = set()\n    for i in range(len(boxes)):\n        if i in boxes_to_remove:\n            continue\n        center_i = _calculate_center(boxes[i])\n        for j in range(i + 1, len(boxes)):\n            center_j = _calculate_center(boxes[j])\n            # fmt: off\n            center_close = ((center_i[0] - center_j[0]) ** 2 + \n                            (center_i[1] - center_j[1]) ** 2 < \n                            center_dis_thresh**2)\n            # fmt: on\n            if center_close:\n                boxes_to_remove.add(j)\n\n    boxes_filt = [\n        box for idx, box in enumerate(boxes_with_label) if idx not in boxes_to_remove\n    ]\n    return boxes_filt\n\n\ndef _box_a_in_b(a: BoxType, b: BoxType) -> bool:\n    return a[0] >= b[0] and a[1] >= b[1] and a[2] <= b[2] and a[3] <= b[3]\n\n\ndef _filter_boxes_by_overlap(\n    boxes_with_label: list[AnnotatedBoxType],\n) -> 
list[AnnotatedBoxType]:\n    boxes = [box[0] for box in boxes_with_label]\n    boxes_to_remove = set()\n    for i in range(len(boxes)):\n        if i in boxes_to_remove:\n            continue\n        for j in range(len(boxes)):\n            if i != j and _box_a_in_b(boxes[i], boxes[j]):\n                boxes_to_remove.add(j)\n\n    boxes_filt = [\n        box for idx, box in enumerate(boxes_with_label) if idx not in boxes_to_remove\n    ]\n    return boxes_filt\n\n\ndef _filter_boxes_by_iou(\n    boxes_with_label: list[AnnotatedBoxType], iou_threshold=0.5\n) -> list[AnnotatedBoxType]:\n    boxes = [box[0] for box in boxes_with_label]\n    boxes_to_remove = set()\n    for i in range(len(boxes)):\n        if i in boxes_to_remove:\n            continue\n        for j in range(i + 1, len(boxes)):\n            iou = _calculate_iou(boxes[i], boxes[j])\n            if iou >= iou_threshold:\n                boxes_to_remove.add(j)\n\n    boxes_filt = [\n        box for idx, box in enumerate(boxes_with_label) if idx not in boxes_to_remove\n    ]\n    return boxes_filt\n\n\ndef _draw_boxes(\n    image: Image.Image,\n    boxes: list[BoxType],\n    font_size: int = 30,\n) -> None:\n    draw = ImageDraw.Draw(image)\n    for idx, box in enumerate(boxes):\n        color = tuple(np.random.randint(64, 191, size=3).tolist())\n        font = ImageFont.load_default(font_size)\n        center = _calculate_center(box)\n\n        draw.rectangle([box[0], box[1], box[2], box[3]], outline=color, width=2)\n\n        if hasattr(font, \"getbbox\"):\n            _, _, w, h = draw.textbbox((0, 0), str(idx), font)\n        else:\n            w, h = draw.textsize(str(idx), font)\n        if box[0] >= w:\n            bbox = (\n                round(box[0] - w),\n                round(center[1] - h / 2),\n                round(box[0]),\n                round(center[1] + h / 2),\n            )\n        else:\n            bbox = (\n                round(box[2]),\n                round(center[1] - h / 
2),\n                round(box[2] + w),\n                round(center[1] + h / 2),\n            )\n\n        draw.rectangle(bbox, fill=color)\n        draw.text((bbox[0], bbox[1]), str(idx), fill=\"white\", font=font)\n\n\n@cache\ndef _get_grounding_dino_model(\n    type: Literal[\"tiny\", \"base\"] = \"tiny\",\n) -> tuple[GroundingDinoProcessor, GroundingDinoForObjectDetection]:\n    \"\"\"Get the grounding dino model.\n\n    Args:\n        type: The version of the Grounding Dino model.\n\n    Returns:\n        A tuple (processor, model).\n    \"\"\"\n    model_name = f\"IDEA-Research/grounding-dino-{type}\"\n    processor = AutoProcessor.from_pretrained(model_name)\n    model = GroundingDinoForObjectDetection.from_pretrained(model_name).to(device)\n    return processor, model\n\n\n@cache\ndef _get_easyocr_model() -> easyocr.Reader:\n    return easyocr.Reader([\"en\"])\n\n\ndef get_groundingdino_boxes(\n    images: Image.Image | list[Image.Image],\n    text_prompt: str,\n    box_threshold: float = 0.05,\n    text_threshold: float = 0.5,\n) -> list[list[AnnotatedBoxType]]:\n    \"\"\"Get the bounding boxes of the objects in the image using GroundingDino.\n\n    Args:\n        images: The image or list of images.\n        text_prompt: The text prompt to use for all the images.\n        box_threshold: The box threshold.\n        text_threshold: The text threshold.\n\n    Returns:\n        The first-level list has one entry per image; each second-level list contains\n        tuples (detected box, its semantic label) as the result for that image.\n    \"\"\"\n    processor, model = _get_grounding_dino_model()\n    if isinstance(images, Image.Image):\n        images = [images]\n    image_number = len(images)\n    images = [image.convert(\"RGB\") for image in images]\n    inputs = processor(\n        images=images,\n        text=[text_prompt] * image_number,\n        return_tensors=\"pt\",\n    ).to(device)\n    with torch.no_grad():\n        outputs = 
model(**inputs)\n\n    target_sizes = [image.size[::-1] for image in images]\n    detection_results = processor.post_process_grounded_object_detection(\n        outputs,\n        inputs.input_ids,\n        box_threshold=box_threshold,\n        text_threshold=text_threshold,\n        target_sizes=target_sizes,\n    )\n    final_output = []\n    for result in detection_results:\n        boxes = result[\"boxes\"].cpu().int().tolist()\n        labels = result[\"labels\"]\n        final_output.append(list(zip(boxes, labels)))\n    return final_output\n\n\ndef get_easyocr_boxes(\n    image: Image.Image,\n) -> list[AnnotatedBoxType]:\n    \"\"\"Get the bounding boxes of the text in the image using EasyOCR.\n\n    Args:\n        image: The target image.\n\n    Returns:\n        The list of tuples of bounding boxes and their corresponding text.\n    \"\"\"\n    reader = _get_easyocr_model()\n    result = reader.readtext(np.array(image), text_threshold=0.9)\n    boxes = []\n    for detect in result:\n        boxes.append(\n            (\n                (\n                    detect[0][0][0],\n                    detect[0][0][1],\n                    detect[0][2][0],\n                    detect[0][2][1],\n                ),\n                detect[1],\n            )\n        )\n    return boxes\n\n\n@action(local=True)\ndef groundingdino_easyocr(\n    input_base64_image: str,\n    font_size: int,\n    env,\n) -> tuple[str, list[AnnotatedBoxType]]:\n    \"\"\"Get the interactive elements in the image.\n\n    Use GroundingDino and EasyOCR to detect the interactive elements in the image.\n    Mark the detected elements with bounding boxes and labels. 
Store the labels and\n    boxes in the environment to be used in other actions.\n\n    Args:\n        input_base64_image: The base64 encoded image.\n        font_size: The font size of the label.\n\n    Returns:\n        A tuple (base64_image, boxes), where base64_image is the base64 encoded image\n        drawn with bounding boxes and labels, and boxes is the list of detected boxes\n        and labels.\n    \"\"\"\n    check_transformers_import()\n    image = base64_to_image(input_base64_image)\n    od_boxes = get_groundingdino_boxes(image, \"icon . logo .\", box_threshold=0.02)[0]\n    od_boxes = _filter_boxes_by_iou(od_boxes, iou_threshold=0.5)\n    ocr_boxes = get_easyocr_boxes(image)\n    boxes_with_label = ocr_boxes + od_boxes\n    filtered_boxes = _remove_invalid_boxes(boxes_with_label, image.width, image.height)\n    filtered_boxes = _filter_boxes_by_overlap(filtered_boxes)\n    center_dis = round(max(image.height, image.width) / 80.0)\n    filtered_boxes = _filter_boxes_by_center(filtered_boxes, center_dis)\n    env.element_label_map = [box[1] for box in filtered_boxes]\n    result_boxes = [box[0] for box in filtered_boxes]\n    _draw_boxes(image, result_boxes, font_size)\n    env.element_position_map = result_boxes\n    env.ocr_results = \"\".join([box[1] for box in ocr_boxes])\n    return image_to_base64(image), filtered_boxes\n\n\n@action(local=True)\ndef get_elements_prompt(\n    input: tuple[str, list[AnnotatedBoxType]], env\n) -> tuple[str, str]:\n    \"\"\"Get the text prompt passed to the agent for the image.\n\n    Args:\n        input: The base64 encoded image and the list of detected boxes and labels.\n\n    Returns:\n        A tuple (image, prompt) containing the base64 encoded image and the prompt.\n    \"\"\"\n    image, boxes = input\n    labels = \"\"\n    for id, box in enumerate(boxes):\n        if box[1] is not None:\n            labels += f\"[{id}|{box[1]}]\\n\"\n    prompt = (\n        \"Some elements in the current screenshot have 
labels. I will give you \"\n        \"these labels by [id|label].\\n\" + labels\n    )\n    return image, prompt\n"
  },
  {
    "path": "crab/agents/backend_models/__init__.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# ruff: noqa: F401\nfrom typing import Any, Literal\n\nfrom pydantic import BaseModel\n\nfrom crab.core.backend_model import BackendModel\n\nfrom .camel_model import CamelModel\nfrom .claude_model import ClaudeModel\nfrom .gemini_model import GeminiModel\nfrom .openai_model import OpenAIModel, OpenAIModelJSON, SGlangOpenAIModelJSON\n\n\nclass BackendModelConfig(BaseModel):\n    model_class: Literal[\"openai\", \"claude\", \"gemini\", \"camel\", \"sglang\"]\n    \"\"\"Specify the model class to be used. Different model classes use different\n    APIs.\n    \"\"\"\n\n    model_name: str\n    \"\"\"Specify the model name to be used. This value is passed directly to the API;\n    check the model provider's API documentation for more details.\n    \"\"\"\n\n    model_platform: str | None = None\n    \"\"\"Required for CamelModel. Otherwise, it is ignored. Please check CAMEL\n    documentation for more details.\n    \"\"\"\n\n    history_messages_len: int = 0\n    \"\"\"Number of rounds of previous messages to be used in the model input. 
0 means no\n    history.\n    \"\"\"\n\n    parameters: dict[str, Any] = {}\n    \"\"\"Additional parameters to be passed to the model.\"\"\"\n\n    json_structre_output: bool = False\n    \"\"\"If True, the model generates actions through JSON without using \"tool call\" or\n    \"function call\". The SGLang model only supports JSON output. The OpenAI model\n    supports both. Other models do not support JSON output.\n    \"\"\"\n\n    tool_call_required: bool = True\n    \"\"\"Specify whether the model is required to generate tool/function calls in each\n    round.\"\"\"\n\n    base_url: str | None = None\n    \"\"\"Specify the base URL of the API. Only used in OpenAI and SGLang currently.\"\"\"\n\n    api_key: str | None = None\n    \"\"\"Specify the API key to be used. Only used in OpenAI and SGLang currently.\"\"\"\n\n\ndef create_backend_model(model_config: BackendModelConfig) -> BackendModel:\n    match model_config.model_class:\n        case \"claude\":\n            if model_config.base_url is not None or model_config.api_key is not None:\n                raise Warning(\n                    \"base_url and api_key are not supported for ClaudeModel currently.\"\n                )\n            if model_config.json_structre_output:\n                raise Warning(\n                    \"json_structre_output is not supported for ClaudeModel currently.\"\n                )\n            return ClaudeModel(\n                model=model_config.model_name,\n                parameters=model_config.parameters,\n                history_messages_len=model_config.history_messages_len,\n                tool_call_required=model_config.tool_call_required,\n            )\n        case \"gemini\":\n            if model_config.base_url is not None or model_config.api_key is not None:\n                raise Warning(\n                    \"base_url and api_key are not supported for GeminiModel currently.\"\n                )\n            if model_config.json_structre_output:\n                raise 
Warning(\n                    \"json_structre_output is not supported for GeminiModel currently.\"\n                )\n            return GeminiModel(\n                model=model_config.model_name,\n                parameters=model_config.parameters,\n                history_messages_len=model_config.history_messages_len,\n                tool_call_required=model_config.tool_call_required,\n            )\n        case \"openai\":\n            if not model_config.json_structre_output:\n                return OpenAIModel(\n                    model=model_config.model_name,\n                    parameters=model_config.parameters,\n                    history_messages_len=model_config.history_messages_len,\n                    base_url=model_config.base_url,\n                    api_key=model_config.api_key,\n                    tool_call_required=model_config.tool_call_required,\n                )\n            else:\n                return OpenAIModelJSON(\n                    model=model_config.model_name,\n                    parameters=model_config.parameters,\n                    history_messages_len=model_config.history_messages_len,\n                    base_url=model_config.base_url,\n                    api_key=model_config.api_key,\n                )\n        case \"sglang\":\n            return SGlangOpenAIModelJSON(\n                model=model_config.model_name,\n                parameters=model_config.parameters,\n                history_messages_len=model_config.history_messages_len,\n                base_url=model_config.base_url,\n                api_key=model_config.api_key,\n            )\n        case \"camel\":\n            return CamelModel(\n                model=model_config.model_name,\n                model_platform=model_config.model_platform,\n                parameters=model_config.parameters,\n                history_messages_len=model_config.history_messages_len,\n                tool_call_required=model_config.tool_call_required,\n      
      )\n        case _:\n            raise ValueError(f\"Unsupported model class: {model_config.model_class}\")\n"
  },
  {
    "path": "crab/agents/backend_models/camel_model.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport json\nfrom typing import Any\n\nfrom openai.types.chat import ChatCompletionMessageToolCall\nfrom PIL import Image\n\nfrom crab import Action, ActionOutput, BackendModel, BackendOutput, MessageType\nfrom crab.utils.common import base64_to_image\n\ntry:\n    from camel.agents import ChatAgent\n    from camel.messages import BaseMessage\n    from camel.models import ModelFactory\n    from camel.toolkits import OpenAIFunction\n    from camel.types.enums import ModelPlatformType, ModelType\n\n    CAMEL_ENABLED = True\nexcept ImportError:\n    CAMEL_ENABLED = False\n\n\ndef _get_model_platform_type(model_platform_name: str) -> \"ModelPlatformType\":\n    try:\n        return ModelPlatformType(model_platform_name)\n    except ValueError:\n        all_models = [platform.value for platform in ModelPlatformType]\n        raise ValueError(\n            f\"Model {model_platform_name} not found. 
Supported platforms are {all_models}\"\n        )\n\n\ndef _get_model_type(model_name: str) -> \"str | ModelType\":\n    try:\n        return ModelType(model_name)\n    except ValueError:\n        return model_name\n\n\ndef _convert_action_to_schema(\n    action_space: list[Action] | None,\n) -> \"list[OpenAIFunction] | None\":\n    if action_space is None:\n        return None\n    schema_list = []\n    for action in action_space:\n        new_action = action.to_openai_json_schema()\n        schema = {\"type\": \"function\", \"function\": new_action}\n        schema_list.append(OpenAIFunction(action.entry, schema))\n    return schema_list\n\n\ndef _convert_tool_calls_to_action_list(\n    tool_calls: list[ChatCompletionMessageToolCall] | None,\n) -> list[ActionOutput] | None:\n    if tool_calls is None:\n        return None\n\n    return [\n        ActionOutput(\n            name=call.function.name,\n            arguments=json.loads(call.function.arguments),\n        )\n        for call in tool_calls\n    ]\n\n\nclass CamelModel(BackendModel):\n    def __init__(\n        self,\n        model: str,\n        model_platform: str,\n        parameters: dict[str, Any] | None = None,\n        history_messages_len: int = 0,\n        tool_call_required: bool = True,\n    ) -> None:\n        if not CAMEL_ENABLED:\n            raise ImportError(\"Please install camel-ai to use CamelModel\")\n        self.model = model\n        self.parameters = parameters if parameters is not None else {}\n        self.history_messages_len = history_messages_len\n\n        self.model_type = _get_model_type(model)\n        self.model_platform_type = _get_model_platform_type(model_platform)\n        self.client: ChatAgent | None = None\n        self.token_usage = 0\n        self.tool_call_required = tool_call_required\n\n    def get_token_usage(self) -> int:\n        return self.token_usage\n\n    def reset(self, system_message: str, 
action_space: list[Action] | None) -> None:\n        action_schema = _convert_action_to_schema(action_space)\n        config = self.parameters.copy()\n        if action_schema is not None:\n            config[\"tool_choice\"] = \"required\" if self.tool_call_required else \"auto\"\n            config[\"tools\"] = [\n                schema.get_openai_tool_schema() for schema in action_schema\n            ]\n\n        backend_model = ModelFactory.create(\n            self.model_platform_type,\n            self.model_type,\n            model_config_dict=config,\n        )\n        sysmsg = BaseMessage.make_assistant_message(\n            role_name=\"Assistant\",\n            content=system_message,\n        )\n        self.client = ChatAgent(\n            model=backend_model,\n            system_message=sysmsg,\n            external_tools=action_schema,\n            message_window_size=self.history_messages_len,\n        )\n        self.token_usage = 0\n\n    def chat(self, messages: list[tuple[str, MessageType]]) -> BackendOutput:\n        # TODO: handle multiple text messages after message refactoring\n        image_list: list[Image.Image] = []\n        content = \"\"\n        for message in messages:\n            if message[1] == MessageType.IMAGE_JPG_BASE64:\n                image = base64_to_image(message[0])\n                image_list.append(image)\n            else:\n                content = message[0]\n        usermsg = BaseMessage.make_user_message(\n            role_name=\"User\",\n            content=content,\n            image_list=image_list,\n        )\n        response = self.client.step(usermsg)\n        self.token_usage += response.info[\"usage\"][\"total_tokens\"]\n        tool_call_request = response.info.get(\"external_tool_request\")\n\n        return BackendOutput(\n            message=response.msg.content,\n            action_list=_convert_tool_calls_to_action_list(\n                [tool_call_request] if tool_call_request is not None else None\n            ),\n        )\n"
  },
  {
    "path": "crab/agents/backend_models/claude_model.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom copy import deepcopy\nfrom typing import Any\n\nfrom tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_fixed\n\nfrom crab import Action, ActionOutput, BackendModel, BackendOutput, Message, MessageType\n\ntry:\n    import anthropic\n    from anthropic.types import TextBlock, ToolUseBlock\n\n    anthropic_model_enable = True\nexcept ImportError:\n    anthropic_model_enable = False\n\n\nclass ClaudeModel(BackendModel):\n    def __init__(\n        self,\n        model: str,\n        parameters: dict[str, Any] | None = None,\n        history_messages_len: int = 0,\n        tool_call_required: bool = True,\n    ) -> None:\n        if anthropic_model_enable is False:\n            raise ImportError(\"Please install anthropic to use ClaudeModel\")\n        self.model = model\n        self.parameters = parameters if parameters is not None else {}\n        self.history_messages_len = history_messages_len\n\n        assert self.history_messages_len >= 0\n\n        self.client = anthropic.Anthropic()\n        self.tool_call_required: bool = tool_call_required\n        self.system_message: str = \"You are a helpful assistant.\"\n        self.action_space: list[Action] | None = None\n        self.action_schema: list[dict] | None = None\n 
       self.token_usage: int = 0\n        self.chat_history: list[list[dict]] = []\n        self.support_tool_call = True\n\n    def reset(self, system_message: str, action_space: list[Action] | None) -> None:\n        self.system_message = system_message\n        self.action_space = action_space\n        self.action_schema = _convert_action_to_schema(self.action_space)\n        self.token_usage = 0\n        self.chat_history = []\n\n    def chat(self, message: list[Message] | Message) -> BackendOutput:\n        if isinstance(message, tuple):\n            message = [message]\n        request = self._fetch_from_memory()\n        new_message = self._construct_new_message(message)\n        request.append(new_message)\n        response_message = self._call_api(request)\n        self._record_message(new_message, response_message)\n        return self._generate_backend_output(response_message)\n\n    def _construct_new_message(self, message: list[Message]) -> dict[str, Any]:\n        parts: list[dict] = []\n        for content, msg_type in message:\n            match msg_type:\n                case MessageType.TEXT:\n                    parts.append(\n                        {\n                            \"type\": \"text\",\n                            \"text\": content,\n                        }\n                    )\n                case MessageType.IMAGE_JPG_BASE64:\n                    parts.append(\n                        {\n                            \"type\": \"image\",\n                            \"source\": {\n                                \"data\": content,\n                                \"type\": \"base64\",\n                                \"media_type\": \"image/png\",\n                            },\n                        }\n                    )\n        return {\n            \"role\": \"user\",\n            \"content\": parts,\n        }\n\n    def _fetch_from_memory(self) -> list[dict]:\n        request: list[dict] = []\n        if 
self.history_messages_len > 0:\n            fetch_history_len = min(self.history_messages_len, len(self.chat_history))\n            for history_message in self.chat_history[-fetch_history_len:]:\n                request = request + history_message\n        return request\n\n    def get_token_usage(self):\n        return self.token_usage\n\n    def _record_message(\n        self, new_message: dict, response_message: anthropic.types.Message\n    ) -> None:\n        self.chat_history.append([new_message])\n        self.chat_history[-1].append(\n            {\"role\": response_message.role, \"content\": response_message.content}\n        )\n\n        if self.action_schema:\n            tool_calls = response_message.content\n            tool_content = []\n            for call in tool_calls:\n                if isinstance(call, ToolUseBlock):\n                    tool_content.append(\n                        {\n                            \"type\": \"tool_result\",\n                            \"tool_use_id\": call.id,\n                            \"content\": \"success\",\n                        }\n                    )\n            self.chat_history[-1].append(\n                {\n                    \"role\": \"user\",\n                    \"content\": tool_content,\n                }\n            )\n\n    @retry(\n        wait=wait_fixed(10),\n        stop=stop_after_attempt(7),\n        retry=retry_if_exception_type(\n            (\n                anthropic.APITimeoutError,\n                anthropic.APIConnectionError,\n                anthropic.InternalServerError,\n            )\n        ),\n    )\n    def _call_api(self, request_messages: list[dict]) -> anthropic.types.Message:\n        request_messages = _merge_request(request_messages)\n        if self.action_schema is not None:\n            response = self.client.messages.create(\n                system=self.system_message,  # <-- system prompt\n                messages=request_messages,  # type: ignore\n   
             model=self.model,\n                max_tokens=4096,\n                tools=self.action_schema,\n                tool_choice={\"type\": \"any\" if self.tool_call_required else \"auto\"},\n                **self.parameters,\n            )\n        else:\n            response = self.client.messages.create(\n                system=self.system_message,  # <-- system prompt\n                messages=request_messages,  # type: ignore\n                model=self.model,\n                max_tokens=4096,\n                **self.parameters,\n            )\n\n        self.token_usage += response.usage.input_tokens + response.usage.output_tokens\n        return response\n\n    def _generate_backend_output(\n        self, response_message: anthropic.types.Message\n    ) -> BackendOutput:\n        message = \"\"\n        action_list = []\n        for block in response_message.content:\n            if isinstance(block, TextBlock):\n                message += block.text\n            elif isinstance(block, ToolUseBlock):\n                action_list.append(\n                    ActionOutput(\n                        name=block.name,\n                        arguments=block.input,  # type: ignore\n                    )\n                )\n        if not action_list:\n            return BackendOutput(message=message, action_list=None)\n        else:\n            return BackendOutput(\n                message=message,\n                action_list=action_list,\n            )\n\n\ndef _merge_request(request: list[dict]) -> list[dict]:\n    merge_request = [deepcopy(request[0])]\n    for idx in range(1, len(request)):\n        if request[idx][\"role\"] == merge_request[-1][\"role\"]:\n            merge_request[-1][\"content\"].extend(request[idx][\"content\"])\n        else:\n            merge_request.append(deepcopy(request[idx]))\n\n    return merge_request\n\n\ndef _convert_action_to_schema(action_space):\n    if action_space is None:\n        return None\n    actions = 
[]\n    for action in action_space:\n        new_action = action.to_openai_json_schema()\n        new_action[\"input_schema\"] = new_action.pop(\"parameters\")\n        if \"returns\" in new_action:\n            new_action.pop(\"returns\")\n        if \"title\" in new_action:\n            new_action.pop(\"title\")\n        if \"type\" in new_action:\n            new_action[\"input_schema\"][\"type\"] = new_action.pop(\"type\")\n        if \"required\" in new_action:\n            new_action[\"input_schema\"][\"required\"] = new_action.pop(\"required\")\n\n        actions.append(new_action)\n    return actions\n"
  },
  {
    "path": "crab/agents/backend_models/gemini_model.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport os\nfrom typing import Any\n\nfrom PIL.Image import Image\nfrom tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_fixed\n\nfrom crab import Action, ActionOutput, BackendModel, BackendOutput, Message, MessageType\nfrom crab.utils.common import base64_to_image, json_expand_refs\n\ntry:\n    import google.generativeai as genai\n    from google.ai.generativelanguage_v1beta import (\n        Content,\n        FunctionDeclaration,\n        Part,\n        Tool,\n    )\n    from google.api_core.exceptions import ResourceExhausted\n    from google.generativeai.types import content_types\n\n    gemini_model_enable = True\nexcept ImportError:\n    gemini_model_enable = False\n\n\nclass GeminiModel(BackendModel):\n    def __init__(\n        self,\n        model: str,\n        parameters: dict[str, Any] | None = None,\n        history_messages_len: int = 0,\n        tool_call_required: bool = True,\n    ) -> None:\n        if gemini_model_enable is False:\n            raise ImportError(\"Please install google.generativeai to use GeminiModel\")\n\n        self.model = model\n        self.parameters = parameters if parameters is not None else {}\n        self.history_messages_len = history_messages_len\n        assert 
self.history_messages_len >= 0\n        genai.configure(api_key=os.environ[\"GEMINI_API_KEY\"])\n        self.client = genai\n        self.tool_call_required = tool_call_required\n        self.system_message: str = \"You are a helpful assistant.\"\n        self.action_space: list[Action] | None = None\n        self.action_schema: list[Tool] | None = None\n        self.token_usage: int = 0\n        self.chat_history: list[list[dict]] = []\n        self.support_tool_call = True\n\n    def reset(self, system_message: str, action_space: list[Action] | None) -> None:\n        self.system_message = system_message\n        self.action_space = action_space\n        self.action_schema = _convert_action_to_schema(self.action_space)\n        self.token_usage = 0\n        self.chat_history = []\n\n    def chat(self, message: list[Message] | Message) -> BackendOutput:\n        if isinstance(message, tuple):\n            message = [message]\n        request = self._fetch_from_memory()\n        new_message = self._construct_new_message(message)\n        request.append(new_message)\n        response_message = self._call_api(request)\n        self._record_message(new_message, response_message)\n        return self._generate_backend_output(response_message)\n\n    def _construct_new_message(self, message: list[Message]) -> dict[str, Any]:\n        parts: list[str | Image] = []\n        for content, msg_type in message:\n            match msg_type:\n                case MessageType.TEXT:\n                    parts.append(content)\n                case MessageType.IMAGE_JPG_BASE64:\n                    parts.append(base64_to_image(content))\n        return {\n            \"role\": \"user\",\n            \"parts\": parts,\n        }\n\n    def _generate_backend_output(self, response_message: Content) -> BackendOutput:\n        tool_calls: list[ActionOutput] = []\n        for part in response_message.parts:\n            if \"function_call\" in Part.to_dict(part):\n                call = 
Part.to_dict(part)[\"function_call\"]\n                tool_calls.append(\n                    ActionOutput(\n                        name=call[\"name\"],\n                        arguments=call[\"args\"],\n                    )\n                )\n\n        return BackendOutput(\n            message=response_message.parts[0].text or None,\n            action_list=tool_calls or None,\n        )\n\n    def _fetch_from_memory(self) -> list[dict]:\n        request: list[dict] = []\n        if self.history_messages_len > 0:\n            fetch_history_len = min(self.history_messages_len, len(self.chat_history))\n            for history_message in self.chat_history[-fetch_history_len:]:\n                request = request + history_message\n        return request\n\n    def get_token_usage(self):\n        return self.token_usage\n\n    def _record_message(\n        self, new_message: dict[str, Any], response_message: Content\n    ) -> None:\n        self.chat_history.append([new_message])\n        self.chat_history[-1].append(\n            {\"role\": response_message.role, \"parts\": response_message.parts}\n        )\n\n    @retry(\n        wait=wait_fixed(10),\n        stop=stop_after_attempt(7),\n        retry=retry_if_exception_type(ResourceExhausted),\n    )\n    def _call_api(self, request_messages: list) -> Content:\n        if self.action_schema is not None:\n            tool_config = content_types.to_tool_config(\n                {\n                    \"function_calling_config\": {\n                        \"mode\": \"ANY\" if self.tool_call_required else \"AUTO\"\n                    }\n                }\n            )\n            response = self.client.GenerativeModel(\n                self.model, system_instruction=self.system_message\n            ).generate_content(\n                contents=request_messages,\n                tools=self.action_schema,\n                tool_config=tool_config,\n                # **self.parameters, # TODO(Tianqi): Fix this 
line in the future\n            )\n        else:\n            response = self.client.GenerativeModel(\n                self.model, system_instruction=self.system_message\n            ).generate_content(\n                contents=request_messages,\n                # **self.parameters, # TODO(Tianqi): Fix this line in the future\n            )\n\n        self.token_usage += response.candidates[0].token_count\n        return response.candidates[0].content\n\n\ndef _convert_action_to_schema(action_space: list[Action] | None) -> list[Tool] | None:\n    if action_space is None:\n        return None\n    actions = [\n        Tool(\n            function_declarations=[\n                _action_to_func_dec(action) for action in action_space\n            ]\n        )\n    ]\n    return actions\n\n\ndef _clear_schema(schema_dict: dict) -> None:\n    schema_dict.pop(\"title\", None)\n    p_type = schema_dict.pop(\"type\", None)\n    for prop in schema_dict.get(\"properties\", {}).values():\n        _clear_schema(prop)\n    if p_type is not None:\n        schema_dict[\"type_\"] = p_type.upper()\n    if \"items\" in schema_dict:\n        _clear_schema(schema_dict[\"items\"])\n\n\ndef _action_to_func_dec(action: Action) -> FunctionDeclaration:\n    \"Converts crab Action to google FunctionDeclaration\"\n    p_schema = action.parameters.model_json_schema()\n    if \"$defs\" in p_schema:\n        p_schema = json_expand_refs(p_schema)\n    _clear_schema(p_schema)\n    if not p_schema[\"properties\"]:\n        return FunctionDeclaration(\n            name=action.name,\n            description=action.description,\n        )\n    return FunctionDeclaration(\n        name=action.name,\n        description=action.description,\n        parameters=p_schema,\n    )\n"
  },
  {
    "path": "crab/agents/backend_models/openai_model.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport json\nfrom typing import Any\n\nfrom crab import Action, ActionOutput, BackendModel, BackendOutput, Message, MessageType\nfrom crab.agents.utils import extract_text_and_code_prompts\n\ntry:\n    import openai\n    from openai.types.chat import ChatCompletionMessage\n\n    openai_model_enable = True\nexcept ImportError:\n    openai_model_enable = False\n\n\nclass OpenAIModel(BackendModel):\n    def __init__(\n        self,\n        model: str,\n        parameters: dict[str, Any] | None = None,\n        history_messages_len: int = 0,\n        tool_call_required: bool = True,\n        base_url: str | None = None,\n        api_key: str | None = None,\n    ) -> None:\n        if not openai_model_enable:\n            raise ImportError(\"Please install openai to use OpenAIModel\")\n\n        self.model = model\n        self.parameters = parameters if parameters is not None else {}\n        self.history_messages_len = history_messages_len\n\n        assert self.history_messages_len >= 0\n\n        self.client = openai.OpenAI(api_key=api_key, base_url=base_url)\n        self.tool_call_required: bool = tool_call_required\n        self.system_message: str = \"You are a helpful assistant.\"\n        self.openai_system_message = {\n            \"role\": 
\"system\",\n            \"content\": self.system_message,\n        }\n        self.action_space: list[Action] | None = None\n        self.action_schema: list[dict] | None = None\n        self.token_usage: int = 0\n        self.chat_history: list[list[ChatCompletionMessage | dict]] = []\n        self.support_tool_call = True\n\n    def reset(self, system_message: str, action_space: list[Action] | None) -> None:\n        self.system_message = system_message\n        self.openai_system_message = {\n            \"role\": \"system\",\n            \"content\": system_message,\n        }\n        self.action_space = action_space\n        self.action_schema = _convert_action_to_schema(self.action_space)\n        self.token_usage = 0\n        self.chat_history = []\n\n    def chat(self, message: list[Message] | Message) -> BackendOutput:\n        if isinstance(message, tuple):\n            message = [message]\n        request = self._fetch_from_memory()\n        new_message = self._construct_new_message(message)\n        request.append(new_message)\n        response_message = self._call_api(request)\n        self._record_message(new_message, response_message)\n        return self._generate_backend_output(response_message)\n\n    def get_token_usage(self):\n        return self.token_usage\n\n    def _record_message(\n        self, new_message: dict, response_message: ChatCompletionMessage\n    ) -> None:\n        self.chat_history.append([new_message])\n        self.chat_history[-1].append(response_message)\n\n        if self.action_schema and response_message.tool_calls is not None:\n            for tool_call in response_message.tool_calls:\n                self.chat_history[-1].append(\n                    {\n                        \"tool_call_id\": tool_call.id,\n                        \"role\": \"tool\",\n                        \"name\": tool_call.function.name,\n                        \"content\": \"success\",\n                    }\n                )  # extend 
conversation with function response\n\n    def _call_api(\n        self, request_messages: list[ChatCompletionMessage | dict]\n    ) -> ChatCompletionMessage:\n        if self.action_schema is not None:\n            response = self.client.chat.completions.create(\n                messages=request_messages,  # type: ignore\n                model=self.model,\n                tools=self.action_schema,\n                tool_choice=\"required\" if self.tool_call_required else \"auto\",\n                **self.parameters,\n            )\n        else:\n            response = self.client.chat.completions.create(\n                messages=request_messages,  # type: ignore\n                model=self.model,\n                **self.parameters,\n            )\n\n        self.token_usage += response.usage.total_tokens\n        return response.choices[0].message\n\n    def _fetch_from_memory(self) -> list[ChatCompletionMessage | dict]:\n        request: list[ChatCompletionMessage | dict] = [self.openai_system_message]\n        if self.history_messages_len > 0:\n            fetch_history_len = min(self.history_messages_len, len(self.chat_history))\n            for history_message in self.chat_history[-fetch_history_len:]:\n                request = request + history_message\n        return request\n\n    def _construct_new_message(self, message: list[Message]) -> dict[str, Any]:\n        new_message_content: list[dict[str, Any]] = []\n        for content, msg_type in message:\n            match msg_type:\n                case MessageType.TEXT:\n                    new_message_content.append(\n                        {\n                            \"type\": \"text\",\n                            \"text\": content,\n                        }\n                    )\n                case MessageType.IMAGE_JPG_BASE64:\n                    new_message_content.append(\n                        {\n                            \"type\": \"image_url\",\n                            
\"image_url\": {\n                                \"url\": f\"data:image/jpeg;base64,{content}\",\n                                \"detail\": \"high\",\n                            },\n                        }\n                    )\n\n        return {\"role\": \"user\", \"content\": new_message_content}\n\n    def _generate_backend_output(\n        self, response_message: ChatCompletionMessage\n    ) -> BackendOutput:\n        if response_message.tool_calls is None:\n            return BackendOutput(message=response_message.content, action_list=None)\n        action_list = [\n            ActionOutput(\n                name=call.function.name,\n                arguments=json.loads(call.function.arguments),\n            )\n            for call in response_message.tool_calls\n        ]\n        return BackendOutput(\n            message=response_message.content,\n            action_list=action_list,\n        )\n\n\ndef _convert_action_to_schema(\n    action_space: list[Action] | None,\n) -> list[dict] | None:\n    if action_space is None:\n        return None\n    actions = []\n    for action in action_space:\n        new_action = action.to_openai_json_schema()\n        actions.append({\"type\": \"function\", \"function\": new_action})\n    return actions\n\n\nclass OpenAIModelJSON(OpenAIModel):\n    def __init__(\n        self,\n        model: str,\n        parameters: dict[str, Any] = dict(),\n        history_messages_len: int = 0,\n        base_url: str | None = None,\n        api_key: str | None = None,\n    ) -> None:\n        super().__init__(\n            model,\n            parameters,\n            history_messages_len,\n            False,\n            base_url,\n            api_key,\n        )\n        self.support_tool_call = False\n\n    def reset(self, system_message: str, action_space: list[Action] | None) -> None:\n        super().reset(system_message, action_space)\n        self.action_schema = None\n\n    def _record_message(\n        self, 
new_message: dict, response_message: ChatCompletionMessage\n    ) -> None:\n        self.chat_history.append([new_message])\n        self.chat_history[-1].append(\n            {\"role\": \"assistant\", \"content\": response_message.content}\n        )\n\n    def _generate_backend_output(\n        self, response_message: ChatCompletionMessage\n    ) -> BackendOutput:\n        content = response_message.content\n        text_list, code_list = extract_text_and_code_prompts(content)\n\n        action_list = []\n        try:\n            for code_block in code_list:\n                action_object = json.loads(code_block)\n                action_list.append(\n                    ActionOutput(\n                        name=action_object[\"name\"], arguments=action_object[\"arguments\"]\n                    )\n                )\n        except json.JSONDecodeError as e:\n            raise RuntimeError(f\"Failed to parse code block: {code_block}\") from e\n        except KeyError as e:\n            raise RuntimeError(f\"Received invalid action format: {code_block}\") from e\n\n        return BackendOutput(\n            message=\"\".join(text_list),\n            action_list=action_list,\n        )\n\n\nclass SGlangOpenAIModelJSON(OpenAIModelJSON):\n    def _construct_new_message(self, message: list[Message]) -> dict[str, Any]:\n        new_message_content: list[dict[str, Any]] = []\n        image_count = 0\n        for _, msg_type in message:\n            if msg_type == MessageType.IMAGE_JPG_BASE64:\n                image_count += 1\n        for content, msg_type in message:\n            match msg_type:\n                case MessageType.TEXT:\n                    new_message_content.append(\n                        {\n                            \"type\": \"text\",\n                            \"text\": content,\n                        }\n                    )\n                case MessageType.IMAGE_JPG_BASE64:\n                    image_content = {\n                        
\"type\": \"image_url\",\n                        \"image_url\": {\n                            \"url\": f\"data:image/png;base64,{content}\",\n                            \"detail\": \"high\",\n                        },\n                    }\n                    if image_count > 1:\n                        image_content[\"modalities\"] = \"multi-images\"\n                    new_message_content.append(image_content)\n\n        return {\"role\": \"user\", \"content\": new_message_content}\n"
  },
  {
    "path": "crab/agents/policies/__init__.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# ruff: noqa: F401\nfrom .multi_agent_by_env import MultiAgentByEnvPolicy\nfrom .multi_agent_by_func import MultiAgentByFuncPolicy\nfrom .single_agent import SingleAgentPolicy\n"
  },
  {
    "path": "crab/agents/policies/multi_agent_by_env.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom crab import Action, ActionOutput\nfrom crab.agents.backend_models import BackendModelConfig, create_backend_model\nfrom crab.agents.utils import generate_action_prompt\nfrom crab.core.agent_policy import AgentPolicy\nfrom crab.core.backend_model import (\n    BackendModel,\n    MessageType,\n)\n\n\nclass MultiAgentByEnvPolicy(AgentPolicy):\n    _main_agent_prompt = \"\"\"You are a main agent, and your goal is to plan and\n    give instructions to sub-agents in each environment to complete the final task. Now\n    you have to do a task as described below: {task_description}.  The description of\n    each given environment: {env_description}.  For each step, you are required to\n    provide high-level instructions detailing the next actions to be taken.\n    Additionally, you must specify which sub-agent in the designated environment should\n    execute these instructions. If a sub-agent is not needed for a particular step, you\n    may instruct it to skip that step.\"\"\"\n\n    _env_agent_prompt = \"\"\"You are a sub-agent responsible for the {environment}\n    environment.  The description of the {environment} environment is:\n    {env_description}.  
Your goal is to assist the main agent in completing the final\n    task by performing actions in the {environment} environment according to the\n    instructions from the main agent. The final task is described below:\n    {task_description}. A unit operation you can perform is called action in a given\n    environment. You can only execute action in the {environment} environment. For the\n    {environment} environment, you are given a limited action space as function calls:\n    {action_descriptions}\n    The interactive UI elements on the screenshot are labeled with numeric tags starting\n    from 1. For each step, You will receive an instruction telling you what you need to\n    do next. After analyzing the instruction you received and the current {environment}\n    system, if you think you don't need to do anything in the current {environment}\n    system, you should choose SKIP action. Otherwise, you must state what actions to\n    take, what the parameters are, and you MUST provide in which environment to perform\n    these actions. Your answer must be function calls. Please do not output any other\n    information. You must make sure all function calls get their required parameters.\"\"\"\n\n    _root_agent_prompt = \"\"\"You are a sub-agent responsible for the crab benchmark root\n    environment. Your goal is to assist the main agent in completing the whole task:\n    \"{task_description}\". You can only complete the task or submit the result when the\n    main agent tells you the whole task has been completed. Otherwise, you can only call\n    SKIP.  
\"\"\"\n\n    def __init__(\n        self,\n        main_agent_model_backend: BackendModelConfig,\n        env_agent_model_backend: BackendModelConfig,\n    ):\n        self.main_agent_model_backend = create_backend_model(main_agent_model_backend)\n        self.env_agent_model_backend_config = env_agent_model_backend\n        self.reset(task_description=\"\", action_spaces={}, env_descriptions={})\n\n    def reset(\n        self,\n        task_description: str,\n        action_spaces: dict[str, list[Action]],\n        env_descriptions: dict[str, str],\n    ) -> list:\n        self.task_description = task_description\n        main_agent_system_message = self._main_agent_prompt.format(\n            task_description=task_description,\n            env_description=str(env_descriptions),\n        )\n        self.main_agent_model_backend.reset(main_agent_system_message, None)\n\n        root_agent_system_message = self._root_agent_prompt.format(\n            task_description=task_description\n        )\n        self.env_agent_model_backends: dict[str, BackendModel] = {}\n        for env in action_spaces:\n            backend = create_backend_model(self.env_agent_model_backend_config)\n            if env == \"root\":\n                backend.reset(root_agent_system_message, action_spaces[env])\n            else:\n                backend.require_tool = True\n                env_agent_system_message = self._env_agent_prompt.format(\n                    task_description=task_description,\n                    environment=env,\n                    env_description=env_descriptions[env],\n                    action_descriptions=generate_action_prompt(action_spaces[env]),\n                )\n                backend.reset(env_agent_system_message, action_spaces[env])\n            self.env_agent_model_backends[env] = backend\n\n    def get_token_usage(self):\n        result = 0\n        result += self.main_agent_model_backend.get_token_usage()\n        for env_agent in 
self.env_agent_model_backends.values():\n            result += env_agent.get_token_usage()\n        return result\n\n    def get_backend_model_name(self):\n        return (\n            self.main_agent_model_backend.__class__.__name__\n            + \"_\"\n            + self.main_agent_model_backend.model\n        )\n\n    def chat(\n        self,\n        observation: dict[str, list[tuple[str, MessageType]]],\n    ) -> list[ActionOutput]:\n        main_prompt = []\n        for env in observation:\n            main_prompt.extend(observation[env])\n        main_prompt.append(\n            (\n                (\n                    f\"Your target: {self.task_description}\\n\"\n                    \"Tell me the next step in each environment.\"\n                ),\n                MessageType.TEXT,\n            )\n        )\n        output = self.main_agent_model_backend.chat(main_prompt)\n        main_agent_message = (\n            f\"The instruction from main agent for this step: {output.message}\"\n        )\n\n        tool_calls = []\n        for env in self.env_agent_model_backends:\n            backend = self.env_agent_model_backends[env]\n            if env in observation:\n                output = backend.chat(\n                    observation[env] + [(main_agent_message, MessageType.TEXT)]\n                )\n            else:\n                output = backend.chat((main_agent_message, MessageType.TEXT))\n            for action in output.action_list:\n                action.env = env\n            tool_calls.extend(output.action_list)\n        return tool_calls\n"
  },
  {
    "path": "crab/agents/policies/multi_agent_by_func.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom crab.agents.backend_models import BackendModelConfig, create_backend_model\nfrom crab.agents.utils import (\n    combine_multi_env_action_space,\n    decode_combined_action,\n    generate_action_prompt,\n)\nfrom crab.core import Action, ActionOutput\nfrom crab.core.agent_policy import AgentPolicy\nfrom crab.core.backend_model import MessageType\n\n\nclass MultiAgentByFuncPolicy(AgentPolicy):\n    _system_prompt = \"\"\"You are a helpful assistant. Now you have to do a task as\n    described below: {task_description}. And this is the description of each given\n    environment: {env_description}. A unit operation you can perform is called action in\n    a given environment. For each environment, you are given a limited action space as\n    function calls:\n    {action_descriptions}\n    You may receive a screenshot of the current system. The interactive UI elements on\n    the screenshot are labeled with numeric tags starting from 1. For each step, You\n    must state what actions to take, what the parameters are, and you MUST provide in\n    which environment to perform these actions. \"\"\"\n\n    _tool_prompt = \"\"\"You are a helpful assistant in generating function calls. 
I will\n    give you a detailed description of what actions to take next; you should translate\n    it into function calls. Please do not output any other information.\n    \"\"\"\n\n    def __init__(\n        self,\n        main_agent_model_backend: BackendModelConfig,\n        tool_agent_model_backend: BackendModelConfig,\n    ):\n        self.main_agent_model_backend = create_backend_model(main_agent_model_backend)\n        self.tool_agent_model_backend = create_backend_model(tool_agent_model_backend)\n        self.reset(task_description=\"\", action_spaces=None, env_descriptions={})\n\n    def reset(\n        self,\n        task_description: str,\n        action_spaces: dict[str, list[Action]],\n        env_descriptions: dict[str, str],\n    ) -> None:\n        self.task_description = task_description\n        self.action_space = combine_multi_env_action_space(action_spaces)\n\n        main_agent_system_message = self._system_prompt.format(\n            task_description=task_description,\n            action_descriptions=generate_action_prompt(self.action_space),\n            env_description=str(env_descriptions),\n        )\n        self.main_agent_model_backend.reset(main_agent_system_message, None)\n        self.tool_agent_model_backend.reset(self._tool_prompt, self.action_space)\n\n    def get_token_usage(self):\n        return (\n            self.main_agent_model_backend.get_token_usage()\n            + self.tool_agent_model_backend.get_token_usage()\n        )\n\n    def get_backend_model_name(self):\n        return (\n            self.main_agent_model_backend.__class__.__name__\n            + \"_\"\n            + self.main_agent_model_backend.model\n        )\n\n    def chat(\n        self,\n        observation: dict[str, list[tuple[str, MessageType]]],\n    ) -> list[ActionOutput]:\n        prompt = []\n        for env in observation:\n            prompt.extend(observation[env])\n        prompt.append(\n            (\n                
f\"Your target: {self.task_description}\\nTell me the next action.\",\n                MessageType.TEXT,\n            )\n        )\n        output = self.main_agent_model_backend.chat(prompt)\n        # Wrap the message in a list to match the chat() signature.\n        tool_output = self.tool_agent_model_backend.chat(\n            [(output.message, MessageType.TEXT)]\n        )\n        return decode_combined_action(tool_output.action_list)\n"
  },
  {
    "path": "crab/agents/policies/single_agent.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport logging\n\nfrom crab import Action, ActionOutput\nfrom crab.agents.backend_models import BackendModelConfig, create_backend_model\nfrom crab.agents.utils import (\n    combine_multi_env_action_space,\n    decode_combined_action,\n    generate_action_prompt,\n)\nfrom crab.core.agent_policy import AgentPolicy\nfrom crab.core.backend_model import (\n    MessageType,\n)\nfrom crab.utils.measure import timed\n\nlogger = logging.getLogger(__name__)\n\n\nclass SingleAgentPolicy(AgentPolicy):\n    _system_prompt_with_function_call = \"\"\"\\\n    You are a helpful assistant. Now you have to do a task as described below: \n\n    **\"{task_description}.\"**\n\n    You should never forget this task and always perform actions to achieve this task. \n    And this is the description of each given environment: {env_description}. A\n    unit operation you can perform is called Action. You have a limited action space as\n    function calls:\n    {action_descriptions}\n    You may receive a screenshot of the current system. You may receive a screenshot of\n    a smartphone app. The interactive UI elements on the screenshot are labeled with\n    numeric tags starting from 1. 
\n\n    In each step, you MUST explain what you see from the current observation and your\n    plan for the next action, then use a provided action to achieve the task. You\n    should state what action to take and what the parameters should be. Your answer\n    MUST contain at least one function call. You SHOULD NEVER ask me to do anything for\n    you. Always do them by yourself using function calls.\n    \"\"\"\n\n    _system_prompt_no_function_call = \"\"\"\\\n    You are a helpful assistant. Now you have to do a task as described below: \n\n    **\"{task_description}.\"**\n\n    You should never forget this task and always perform actions to achieve this task. \n    And this is the description of each given environment: {env_description}. You will\n    receive screenshots of the environments. The interactive UI elements on the\n    screenshot are labeled with numeric tags starting from 1. \n\n    A unit operation you can perform is called Action. You have a limited action space\n    as function calls: {action_descriptions}. You should generate JSON code blocks to\n    execute the actions. Each code block MUST contain only one JSON object, i.e., one\n    action. You can output multiple code blocks to execute multiple actions in a single\n    step. You must follow the JSON format below to output the action. \n    ```json\n    {{\"name\": \"action_name\", \"arguments\": {{\"arg1\": \"value1\", \"arg2\": \"value2\"}}}}\n    ```\n    or, if no arguments are needed:\n    ```json\n    {{\"name\": \"action_name\", \"arguments\": {{}}}}\n    ```\n    You MUST use exactly the same \"action_name\" as I gave to you in the action space.\n    You SHOULDN'T add any comments in the code blocks.\n\n    In each step, you MUST explain what you see from the current observation and your\n    plan for the next action, then use a provided action to achieve the task. You\n    should state what action to take and what the parameters should be. 
Your\n    answer MUST contain at least one code block. You SHOULD NEVER ask me to do anything\n    for you. Always do them by yourself.\n    \"\"\"\n\n    def __init__(\n        self,\n        model_backend: BackendModelConfig,\n        function_call: bool = True,\n    ):\n        self.model_backend = create_backend_model(model_backend)\n        self.function_call = function_call\n        if not self.model_backend.support_tool_call and self.function_call:\n            logger.warning(\n                \"The backend model does not support tool calls: {}\".format(\n                    model_backend.model_name\n                )\n                + \"\\nFalling back to no function call mode.\"\n            )\n            self.function_call = False\n        if self.function_call:\n            self.system_prompt = self._system_prompt_with_function_call\n        else:\n            self.system_prompt = self._system_prompt_no_function_call\n        self.reset(task_description=\"\", action_spaces=None, env_descriptions={})\n\n    def reset(\n        self,\n        task_description: str,\n        action_spaces: dict[str, list[Action]],\n        env_descriptions: dict[str, str],\n    ) -> None:\n        self.task_description = task_description\n        self.action_space = combine_multi_env_action_space(action_spaces)\n        system_message = self.system_prompt.format(\n            task_description=task_description,\n            action_descriptions=generate_action_prompt(\n                self.action_space,\n                expand=not self.function_call,\n            ),\n            env_description=str(env_descriptions),\n        )\n        if self.function_call:\n            self.model_backend.reset(system_message, self.action_space)\n        else:\n            self.model_backend.reset(system_message, None)\n\n    def get_token_usage(self):\n        return self.model_backend.get_token_usage()\n\n    def get_backend_model_name(self):\n        return 
self.model_backend.__class__.__name__ + \"_\" + self.model_backend.model\n\n    @timed\n    def chat(\n        self,\n        observation: dict[str, list[tuple[str, MessageType]]],\n    ) -> list[ActionOutput]:\n        prompt = []\n        for env in observation:\n            prompt.extend(observation[env])\n        prompt.append(\n            (\n                f\"Your target: {self.task_description}\\nTell me the next action.\",\n                MessageType.TEXT,\n            )\n        )\n        output = self.model_backend.chat(prompt)\n        # print(\"Agent Message: \" + output.message, flush=True)\n        # print(\"Agent Action: \" + str(output.action_list), flush=True)\n        return decode_combined_action(output.action_list)\n"
  },
  {
    "path": "crab/agents/utils.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom crab.core import Action, ActionOutput\n\n\ndef combine_multi_env_action_space(\n    action_space: dict[str, list[Action]] | None,\n) -> list[Action]:\n    \"\"\"Combine multi-env action space together to fit in a single agent.\"\"\"\n    result = []\n    if action_space is None:\n        return result\n    for env in action_space:\n        for action in action_space[env]:\n            new_action = action.model_copy()\n            new_action.name = new_action.name + \"_in_\" + env\n            new_action.description = f\"In {env} environment, \" + new_action.description\n            result.append(new_action)\n    return result\n\n\ndef decode_combined_action(\n    output_actions: list[ActionOutput],\n) -> list[ActionOutput]:\n    \"\"\"Decode combined action output to action output with the corresponding\n    environment.\n    \"\"\"\n    result = []\n    for output in output_actions:\n        name_env = output.name.split(\"_in_\")\n        if len(name_env) != 2:\n            raise RuntimeError(\n                'The decoded action name should contain the splitter \"_in_\".'\n            )\n        new_output = output.model_copy()\n        new_output.name = name_env[0]\n        new_output.env = name_env[1]\n        result.append(new_output)\n    
return result\n\n\ndef generate_action_prompt(action_space: list[Action], expand: bool = False) -> str:\n    if expand:\n        return \"\".join(\n            [\n                f\"[**{action.name}**:\\n\"\n                f\"action description: {action.description}\\n\"\n                f\"action arguments json schema: {action.to_openai_json_schema()}\\n\"\n                \"]\\n\"\n                for action in action_space\n            ]\n        )\n    else:\n        return \"\".join(\n            [f\"[{action.name}: {action.description}]\\n\" for action in action_space]\n        )\n\n\ndef extract_text_and_code_prompts(content: str) -> tuple[list[str], list[str]]:\n    r\"\"\"Extract text and code prompts from the message content.\n\n    Returns:\n        A tuple (text_list, code_list), where text_list is a list of text segments and\n        code_list is a list of code blocks, both extracted from the content.\n    \"\"\"\n    text_prompts: list[str] = []\n    code_prompts: list[str] = []\n\n    lines = content.split(\"\\n\")\n    idx = 0\n    start_idx = 0\n    while idx < len(lines):\n        while idx < len(lines) and (not lines[idx].lstrip().startswith(\"```\")):\n            idx += 1\n        text = \"\\n\".join(lines[start_idx:idx]).strip()\n        text_prompts.append(text)\n\n        if idx >= len(lines):\n            break\n\n        # code_type = lines[idx].strip()[3:].strip()\n        idx += 1\n        start_idx = idx\n        # Check bounds before indexing to avoid an IndexError on an unterminated\n        # code block.\n        while idx < len(lines) and not lines[idx].lstrip().startswith(\"```\"):\n            idx += 1\n        if idx >= len(lines):\n            break\n        code = \"\\n\".join(lines[start_idx:idx]).strip()\n        code_prompts.append(code)\n\n        idx += 1\n        start_idx = idx\n\n    return text_prompts, code_prompts\n"
  },
  {
    "path": "crab/benchmarks/__init__.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n"
  },
  {
    "path": "crab/benchmarks/template.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport networkx as nx\n\nfrom crab import BenchmarkConfig, Task, action, evaluator\nfrom crab.environments.template import set_state, template_environment_config\n\n\n@evaluator\ndef is_system_state(env) -> bool:\n    return env.state\n\n\n@evaluator(env_name=\"root\")\ndef check_submit_true(env) -> bool:\n    if env.trajectory:\n        action_name, params, _ = env.trajectory[-1]\n        print(action_name, params)\n        if action_name == \"_submit\" and params[\"content\"]:\n            return True\n    return False\n\n\n@action(env_name=\"root\")\ndef _submit(content: bool) -> None:\n    \"\"\"Submit your answer through this function.\n\n    Args:\n        content: the content to submit\n    \"\"\"\n    pass\n\n\ntemplate_benchmark_config = BenchmarkConfig(\n    name=\"template_benchmark\",\n    environments=[template_environment_config],\n    tasks=[\n        Task(\n            id=\"0\",\n            description=\"Set the system state to True.\",\n            evaluator=is_system_state,\n            setup=set_state(False),\n        ),\n        Task(\n            id=\"1\",\n            description=\"Submit True.\",\n            evaluator=check_submit_true,\n            extra_action=[_submit],\n        ),\n    
],\n)\n\n\n@evaluator(env_name=\"testenv0\")\ndef check_sys0(env) -> bool:\n    return env.state\n\n\n@evaluator(env_name=\"testenv1\")\ndef check_sys1(env) -> bool:\n    return env.state\n\n\n@evaluator(env_name=\"testenv2\")\ndef check_sys2(env) -> bool:\n    return env.state\n\n\neval_g = nx.DiGraph()\neval_g.add_edge(check_sys0, check_submit_true)\neval_g.add_edge(check_sys1, check_submit_true)\neval_g.add_edge(check_sys2, check_submit_true)\n\nmultienv_template_benchmark_config = BenchmarkConfig(\n    name=\"multienv_template_benchmark\",\n    environments=[\n        template_environment_config.model_copy(update={\"name\": f\"testenv{idx}\"})\n        for idx in range(3)\n    ],\n    tasks=[\n        Task(\n            id=\"0\",\n            description=(\n                \"Set the system state to True in all three environments. \"\n                \"Then submit True to finish the project.\"\n            ),\n            evaluator=eval_g,\n            extra_action=[_submit],\n        )\n    ],\n    multienv=True,\n)\n"
  },
  {
    "path": "crab/core/__init__.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# ruff: noqa: F401, F403\nfrom .agent_policy import AgentPolicy\nfrom .backend_model import BackendModel\nfrom .benchmark import Benchmark, create_benchmark\nfrom .decorators import action, evaluator\nfrom .environment import Environment, create_environment\nfrom .experiment import Experiment\nfrom .graph_evaluator import Evaluator, GraphEvaluator\nfrom .models import *\nfrom .task_generator import TaskGenerator\n"
  },
  {
    "path": "crab/core/agent_policy.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom abc import ABC, abstractmethod\n\nfrom .models import Action, ActionOutput, Message\n\n\nclass AgentPolicy(ABC):\n    @abstractmethod\n    def chat(\n        self,\n        observation: dict[str, list[Message]],\n    ) -> list[ActionOutput]: ...\n\n    @abstractmethod\n    def reset(\n        self,\n        task_description: str,\n        action_spaces: dict[str, list[Action]],\n        env_descriptions: dict[str, str],\n    ) -> None: ...\n\n    @abstractmethod\n    def get_token_usage(self) -> int: ...\n\n    @abstractmethod\n    def get_backend_model_name(self) -> str: ...\n"
  },
  {
    "path": "crab/core/backend_model.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom abc import ABC, abstractmethod\n\nfrom .models import Action, BackendOutput, MessageType\n\n\nclass BackendModel(ABC):\n    @abstractmethod\n    def chat(self, contents: list[tuple[str, MessageType]]) -> BackendOutput: ...\n\n    @abstractmethod\n    def reset(\n        self,\n        system_message: str,\n        action_space: list[Action] | None,\n    ): ...\n\n    @abstractmethod\n    def get_token_usage(self): ...\n"
  },
  {
    "path": "crab/core/benchmark.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport traceback\nfrom time import sleep\nfrom typing import Any\n\nfrom crab.core.graph_evaluator import GraphEvaluator\nfrom crab.utils.measure import timed\n\nfrom .environment import Environment, create_environment\nfrom .exceptions import TaskNotFound\nfrom .models import Action, BenchmarkConfig, ClosedAction, MessageType, StepResult, Task\n\n\nclass Benchmark:\n    \"\"\"The crab benchmark controller managing environments and agent evaluation.\n\n    The class manages multiple environments together and provide the simple API by\n    :meth:`step`, :meth:`observe` and :meth:`reset` for language model agents to perform\n    tasks in multiple environments.\n\n    This class introduces a \"root\" environment with no action or observation\n    capabilities, intended as a utility for evaluations not directly tied to a specific\n    environment.\n\n    This class operates in two distinct modes: \"multi-environment\" and\n    \"single-environment\".  In multi-environment mode, observations and action results\n    are separated by environment, returned as a dictionary. 
While in single-environment\n    mode, all observations and action outcomes are merged under the \"root\" environment,\n    with actions being appropriately routed to their respective environments.\n\n    \"\"\"\n\n    def __init__(\n        self,\n        name: str,\n        tasks: list[Task],\n        environments: list[Environment],\n        default_env: str | None = None,\n        multienv: bool = False,\n        prompting_tools: dict[str, dict[str, Action]] = {},\n        root_action_space: list[Action] = [],\n        step_limit: int = 30,\n        common_setup: list[ClosedAction] = [],\n    ) -> None:\n        \"\"\"Initializes the instance.\n\n        Args:\n            name: Identifier for the benchmark.\n            tasks: Tasks to be executed within the benchmark.\n            environments: Environments in which the benchmark is conducted.\n            default_env: The default environment name, applied when actions do not\n                specify an environment. Defaults to \"root\" in the multi-environment mode\n                and to the environment in the single environment mode.\n            multienv: Indicates whether to enable multi-environment mode. Defaults to\n                :obj:`False`.\n            prompting_tools: Prompting tools applied in :meth:`observe_with_prompt`. The\n                first level keys are environment names, the second level keys are\n                observation action names. 
Defaults to empty.\n            root_action_space: The action space executed in the root environment.\n        \"\"\"\n        self.name = name\n        self.tasks = tasks\n        self.multienv = multienv\n        self.prompting_tools = prompting_tools\n        self.step_limit = step_limit\n        self.common_setup = common_setup\n\n        if isinstance(environments, Environment):\n            environments = [environments]\n        self.root_env = Environment(\n            name=\"root\",\n            action_space=root_action_space,\n            observation_space=[],\n            description=\"The crab benchmark root. You can submit your answer or \"\n            \"complete the task using this environment.\",\n        )\n        self.root_env.contained_envs = {env.name: env for env in environments}  # A hack\n        environments.append(self.root_env)\n        self.environment_map: dict[str, Environment] = {\n            env.name: env for env in environments\n        }\n\n        # if not multienv, combine all environments action space together\n        if not self.multienv:\n            # action_map is used only by \"agent\", specifically `step` and\n            # `export_action_space` functions\n            self._verify_spaces()\n            self._generate_action_map()\n\n        # default_env is used for predefined actions without env_name or like\n        # evaluators setups, teardowns, and so on.\n        if default_env is None:\n            if not multienv and len(environments) == 2:\n                self.default_env = environments[0].name\n            else:\n                self.default_env = self.root_env.name\n        else:\n            self.default_env = default_env\n\n        self.current_task: Task | None = None\n        self.current_evaluator: GraphEvaluator | None = None\n        self.step_cnt = 0\n\n    def start_task(self, task_id: str) -> tuple[Task, dict[str, list[Action]]]:\n        \"\"\"Initializes and starts a specified task.\n\n        
Args:\n            task_id: The ID of the task to start.\n\n        Returns:\n            A tuple (task, action_space), where task is the started task object, and\n            action_space is a dict mapping each environment name to the list of\n            actions available in that environment.\n        \"\"\"\n        if self.current_task is not None:\n            raise RuntimeError(\"Another task is running.\")\n        self.current_task = self._get_task_by_id(task_id)\n\n        # reset all environments\n        self._reset_environments()\n\n        for action in self.common_setup:\n            self._take_env_action(action)\n\n        # select environment by Action.env_name\n        for action in self.current_task.setup:\n            self._take_env_action(action)\n\n        for task_action in self.current_task.extra_action:\n            self._set_env_action(task_action)\n\n        # reset evaluator\n        self.current_evaluator = GraphEvaluator(self.current_task.evaluator)\n        # put the submit action into the corresponding env space\n        # For now, only the last node can be the submit task\n\n        self.step_cnt = 0\n        return self.current_task, self.export_action_space()\n\n    def close_task(self) -> None:\n        \"\"\"Cleans up after a task is completed.\"\"\"\n        if self.current_evaluator is None or self.current_task is None:\n            raise RuntimeError(\"There is no started task.\")\n        for action in self.current_task.teardown:\n            self._take_env_action(action)\n        self.current_task = None\n\n    def get_env_descriptions(self) -> dict[str, str]:\n        \"\"\"Get environment descriptions as a dict structure.\"\"\"\n        return {\n            name: self.environment_map[name].description\n            for name in self.environment_map\n        }\n\n    def observe(self) -> dict[str, dict[str, Any]]:\n        \"\"\"Collects observations from all environments.\n\n        Returns:\n            A dict-of-dict with observations from each environment. 
The first level keys\n            are environment names, the second level keys are observation action names.\n        \"\"\"\n        env_obs = {env.name: env.observe() for env in self.environment_map.values()}\n        if self.multienv:\n            return env_obs\n        return self._merge_dicts(env_obs)\n\n    @timed\n    def observe_with_prompt(\n        self,\n    ) -> tuple[dict[str, dict[str, Any]], dict[str, tuple[str, MessageType]]]:\n        \"\"\"Collects observations and applies prompting tools.\n\n        Returns:\n            A tuple (observations, prompts), where \"observations\" and \"prompts\" are\n            observations from each environment and the result of applying prompting\n            tools on them. The first level keys are environment names, the second level\n            keys are observation action names. Notice that some dicts can be empty if\n            its prompting tool wasn't set.\n        \"\"\"\n        observations = {}\n        prompts = {}\n        for env_name, env in self.environment_map.items():\n            if env_name in self.prompting_tools:\n                tools = self.prompting_tools[env_name]\n            else:\n                tools = {}\n            observations[env_name], prompts[env_name] = env.observe_with_prompt(tools)\n        if self.multienv:\n            return observations, prompts\n        return self._merge_dicts(observations), self._merge_dicts(prompts)\n\n    def evaluate(self):\n        self.current_evaluator.step(self.environment_map, self.default_env)\n        return self.current_evaluator.stat()\n\n    @timed\n    def step(\n        self,\n        action: str,\n        parameters: dict[str, Any] = {},\n        env_name: str | None = None,\n    ) -> StepResult:\n        \"\"\"Executes a step in the benchmark by performing an action.\n\n        Args:\n            action: The action to execute.\n            parameters: Parameters for the action.\n            env_name: The name of the environment.\n\n  
      Returns:\n            The result of the step including observations and evaluation metrics. Notice\n            that the `truncated` field in the result is not meaningful for now.\n        \"\"\"\n        terminated = False\n        info = {}\n        if self.current_evaluator is None or self.current_task is None:\n            raise RuntimeError(\"There is no started task.\")\n\n        if action == \"complete\":\n            terminated = True\n            info[\"terminate_reason\"] = \"agent_complete\"\n            return StepResult(\n                truncated=False,\n                terminated=True,\n                action_returns=None,\n                evaluation_results=self.current_evaluator.stat(),\n                info=info,\n            )\n\n        try:\n            environment = self._get_env(env_name=env_name, action_name=action)\n        except Exception:\n            print(traceback.format_exc())\n            terminated = True\n            info[\"terminate_reason\"] = \"action_format_error\"\n            info[\"exception_detail\"] = traceback.format_exc()\n            # No environment was resolved, so reset all of them instead of a\n            # single one.\n            self._reset_environments()\n            self.close_task()\n            return StepResult(\n                truncated=False,\n                terminated=True,\n                action_returns=None,\n                evaluation_results=self.current_evaluator.stat(),\n                info=info,\n            )\n        try:\n            action_returns = environment.step(action, parameters)\n        except Exception:\n            print(traceback.format_exc())\n            terminated = True\n            info[\"terminate_reason\"] = \"env_exception\"\n            info[\"exception_detail\"] = traceback.format_exc()\n            environment.reset()\n            self.close_task()\n            return StepResult(\n                truncated=False,\n                terminated=True,\n                action_returns=None,\n                evaluation_results=self.current_evaluator.stat(),\n                
info=info,\n            )\n\n        try:\n            evaluation_results = self.evaluate()\n        except Exception:\n            print(traceback.format_exc())\n            terminated = True\n            info[\"terminate_reason\"] = \"evaluator_exception\"\n            info[\"exception_detail\"] = traceback.format_exc()\n            environment.reset()\n            self.close_task()\n            return StepResult(\n                truncated=False,\n                terminated=True,\n                action_returns=action_returns,\n                evaluation_results=self.current_evaluator.stat(),\n                info=info,\n            )\n\n        self.step_cnt += 1\n        if self.current_evaluator.is_complete():\n            terminated = True\n            info[\"terminate_reason\"] = \"success\"\n        if self.step_cnt >= self.step_limit:\n            terminated = True\n            info[\"terminate_reason\"] = \"reach_max_step\"\n        if terminated:\n            environment.reset()\n            self.close_task()\n        return StepResult(\n            truncated=False,\n            terminated=terminated,\n            action_returns=action_returns,\n            evaluation_results=evaluation_results,\n            info=info,\n        )\n\n    def reset(self) -> None:\n        \"\"\"Resets all environments and the current task.\"\"\"\n        self.current_evaluator = None\n        self._reset_environments()\n\n    def human_evaluation(self, task_id: str) -> None:\n        task, _ = self.start_task(task_id)\n        print(task.description)\n\n        self.current_evaluator.human_mode = True\n\n        evaluation_results = self.evaluate()\n        print(evaluation_results, end=\"\")\n        while evaluation_results[\"completeness\"] != 1.0:\n            sleep(2)\n            evaluation_results = self.evaluate()\n            print(\"\\r\" + str(evaluation_results), end=\"\")\n        self.close_task()\n\n    def export_action_space(self) -> dict[str, 
list[Action]]:\n        \"\"\"Returns the action spaces from all environments.\n\n        Returns:\n            A dict of action lists for each environment, keyed by environment name.\n        \"\"\"\n        result = {env.name: env.action_space for env in self.environment_map.values()}\n        if self.multienv:\n            return result\n        return self._merge_lists(result)\n\n    def _verify_spaces(self) -> None:\n        \"\"\"Make sure all action and observation names are unique.\"\"\"\n        observation_name_set = set()\n        action_name_set = set()\n        for env in self.environment_map.values():\n            for action in env.action_space:\n                if action.name in action_name_set:\n                    raise ValueError(\n                        \"Duplicated action names are not allowed in a single \"\n                        \"environment benchmark.\"\n                    )\n                action_name_set.add(action.name)\n            for observation in env.observation_space:\n                if observation.name in observation_name_set:\n                    raise ValueError(\n                        \"Duplicated observation names are not allowed in a \"\n                        \"single environment benchmark.\"\n                    )\n                observation_name_set.add(observation.name)\n\n    def _generate_action_map(self) -> None:\n        self.action_map: dict[str, Environment] = {}\n        for env in self.environment_map.values():\n            for action in env.action_space:\n                self.action_map[action.name] = env\n\n    def _get_env(\n        self, env_name: str | None = None, action_name: str | None = None\n    ) -> Environment:\n        # if env_name is given, return that environment directly\n        if env_name is not None:\n            return self.environment_map[env_name]\n        # otherwise map the action name in single-env mode, or fall back to the\n        # default environment in multi-env mode\n        if action_name is not None and not self.multienv:\n            return self.action_map[action_name]\n        return self.environment_map[self.default_env]\n\n    def _take_env_action(self, action: Action) -> Any:\n        if action.env_name is None:\n            env = self.environment_map[self.default_env]\n        else:\n            env = self.environment_map[action.env_name]\n        return env.take_action(action)\n\n    def _set_env_action(self, action: Action) -> None:\n        if action.env_name is None:\n            env = self.environment_map[self.default_env]\n        else:\n            env = self.environment_map[action.env_name]\n        env.set_action(action)\n        if not self.multienv:\n            self.action_map[action.name] = env\n\n    def _reset_environments(self):\n        for env in self.environment_map.values():\n            env.reset()\n        if not self.multienv:\n            self._generate_action_map()\n\n    def _get_task_by_id(self, task_id: str) -> Task:\n        result = [task for task in self.tasks if task_id == task.id]\n        if len(result) == 0:  # no matching task found\n            raise TaskNotFound(f\"No such task: {task_id}\")\n        return result[0]\n\n    def _merge_dicts(\n        self, env_dict: dict[str, dict[str, Any]]\n    ) -> dict[str, dict[str, Any]]:\n        \"In single environment mode, merge action_space/observation_space into root.\"\n        result = {}\n        for dict_value in env_dict.values():\n            result.update(dict_value)\n        return {self.default_env: result}\n\n    def _merge_lists(self, env_dict: dict[str, list]) -> dict[str, list]:\n        \"In single environment mode, merge action_space/observation_space into root.\"\n        result = []\n        for dict_value in env_dict.values():\n            result.extend(dict_value)\n        return {self.default_env: result}\n\n\ndef create_benchmark(config: BenchmarkConfig) -> Benchmark:\n    \"\"\"Creates a benchmark from a BenchmarkConfig.\"\"\"\n    if isinstance(config, BenchmarkConfig):\n        environments = [\n            create_environment(env_config) for env_config in config.environments\n        ]\n        parameters = dict(config)\n        parameters[\"environments\"] = environments\n        return Benchmark(**parameters)\n    else:\n        raise ValueError(\"Unsupported benchmark config type.\")\n"
  },
  {
    "path": "crab/core/csv_log.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport csv\nfrom pathlib import Path\nfrom typing import Any\n\n\nclass CSVLog:\n    def __init__(self, csv_path: Path, headers: list[str]) -> None:\n        self.csv_path = csv_path\n        self.header = headers\n        if not csv_path.exists():\n            with open(csv_path, \"w\", newline=\"\") as file:\n                writer = csv.writer(file)\n                writer.writerow(headers)\n\n    def write_row(self, data: list[Any]):\n        assert len(data) == len(self.header)\n        with open(self.csv_path, \"a\", newline=\"\") as file:\n            writer = csv.writer(file)\n            writer.writerow(data)\n"
  },
  {
    "path": "crab/core/decorators.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom typing import Callable\n\nfrom .models import Action, Evaluator\n\n\ndef _decorator(func, cls: type[Action], options: dict | None = None) -> Action:\n    action = cls.from_function(func)\n    if options is not None:\n        for key in options:\n            setattr(action, key, options[key])\n\n    return action\n\n\ndef action(*args: Callable, env_name: str | None = None, local=False):\n    \"\"\"Use @action to convert a function into an Action.\"\"\"\n    if args and callable(args[0]):\n        return _decorator(args[0], Action)\n\n    return lambda func: _decorator(func, Action, {\"env_name\": env_name, \"local\": local})\n\n\ndef evaluator(\n    *args: Callable,\n    require_submit: bool = False,\n    env_name: str | None = None,\n    local=False,\n):\n    \"\"\"Use @evaluator to convert a function into an Evaluator.\"\"\"\n    if args and callable(args[0]):\n        return _decorator(args[0], Evaluator)\n\n    return lambda func: _decorator(\n        func,\n        Evaluator,\n        {\"require_submit\": require_submit, \"env_name\": env_name, \"local\": local},\n    )\n"
  },
  {
    "path": "crab/core/environment.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport json\nimport logging\nfrom typing import Any\n\nfrom httpx import Client\n\nfrom crab.utils import decrypt_message, encrypt_message, generate_key_from_env\nfrom crab.utils.measure import timed\n\nfrom .exceptions import ActionNotFound\nfrom .models import Action, ClosedAction, EnvironmentConfig\n\nlogger = logging.getLogger(\"crab-server\")\n\n\nclass Environment:\n    \"\"\"\n    A crab environment for language model agent interaction and evaluation.\n\n    This class supports action execution and observation within a simulated or actual\n    ecosystem. The environment is defined by customizable action and observation spaces,\n    comprising various crab actions. Actions should include comprehensive docstrings to\n    facilitate agent understanding and interaction.\n\n    Typically, users instantiate this class directly to perform actions within the local\n    execution context (i.e., the device running the crab framework). This class may also\n    serve as a base for specialized environments requiring unique action execution\n    processes, such as forwarding actions to remote systems for execution. 
This is\n    achieved by overriding the `take_action` method.\n\n    Actions defined in the `action_space`, `observation_space`, or `reset`, as well as\n    those invoked through the `take_action` method that include an `env` parameter, will\n    have this parameter automatically populated with the current environment instance.\n    This allows actions to access and manipulate environment states and variables.\n\n    Attributes:\n        name (str): The name of the environment.\n        description (str): A description of the environment.\n        trajectory (List[tuple[str, dict[str, Any], Any]]): A record of actions taken,\n            their parameters, and the results.\n\n    Args:\n        name (str): The name of the environment.\n        action_space (List[Action]): A list of actions that can be executed, defining\n            the possible interactions agents can undertake.\n        observation_space (List[ClosedAction]): A list of observations defining the\n            possible states agents can perceive.\n        description (str, optional): A textual description of the environment. Defaults\n            to an empty string.\n        reset (Action | None, optional): An action to reset the environment to its\n            initial state. Defaults to `None`.\n        remote_url (str | None, optional): If set, actions are executed on the remote\n            machine at this URL; otherwise they run locally. Example:\n            `http://192.168.1.1:8000`. 
Defaults to `None`.\n    \"\"\"\n\n    def __init__(\n        self,\n        name: str,\n        action_space: list[Action],\n        observation_space: list[ClosedAction],\n        description: str = \"\",\n        reset: Action | None = None,\n        remote_url: str | None = None,\n        extra_attributes: dict[str, Any] = {},\n    ) -> None:\n        self.name = name\n        self.description = description\n        self.trajectory: list[tuple[str, dict[str, Any], Any]] = []\n        self.observation_history: list[dict[str, Any]] = []\n\n        self._origin_action_space = action_space\n        self._observation_space = observation_space\n        self._reset = reset\n        self._action_map = {action.name: action for action in action_space}\n\n        self._client: Client | None = None\n        if remote_url is not None:\n            self._client = Client(base_url=remote_url, timeout=60)\n        for key, value in extra_attributes.items():\n            setattr(self, key, value)\n\n        self._enc_key = generate_key_from_env()\n\n    def step(\n        self,\n        action_name: str,\n        parameters: dict[str, Any] = {},\n    ):\n        \"\"\"\n        Executes an action that is in the action space and recorded to the trajectory.\n\n        Args:\n            action_name: Name of the action to execute. Must be in action space.\n            parameters (dict[str, Any], optional): Parameters for the action. 
Defaults\n                to an empty `dict`.\n\n        Returns:\n            Any: The result of the action execution.\n\n        Raises:\n            ActionNotFound: If the action is not found within the environment's action\n                space.\n        \"\"\"\n        if action_name not in self._action_map:\n            logger.error(f'Env \"{self.name}\": received unknown action \"{action_name}\"')\n            raise ActionNotFound(f\"Action {action_name} not found in the environment\")\n        action_handler = self._action_map[action_name]\n        result = self.take_action(action_handler, parameters)\n        self.trajectory.append((action_handler.name, parameters, result))\n        return result\n\n    def take_action(\n        self,\n        action: Action,\n        parameters: dict[str, Any] = {},\n    ) -> Any:\n        \"\"\"\n        Executes an action within the environment.\n\n        Args:\n            action (Action): The action to execute.\n            parameters (dict[str, Any], optional): Parameters for the action. Defaults\n                to an empty `dict`.\n\n        Returns:\n            Any: The result of the action execution.\n        \"\"\"\n        try:\n            result = self._action_endpoint(action, parameters)\n            logger.info(\n                f'Env \"{self.name}\": action: \"{action.name}\" succeeded. '\n                f\"result: {result}.\"\n            )\n            return result\n        except Exception:\n            logger.exception(\n                f'Env \"{self.name}\": action: \"{action}\" failed:', stack_info=True\n            )\n            raise\n\n    @timed\n    def observe(self) -> dict[str, Any]:\n        \"\"\"\n        Observes the current state.\n\n        Returns:\n            Dict[str, Any]: A dictionary containing the current observations. Keys\n                represent the names of the observation actions.\n        \"\"\"\n        result = {o.name: self.take_action(o) for o in self.observation_space}\n        self.observation_history.append(result)\n        return result\n\n    @timed\n    def observe_with_prompt(\n        self, prompt_tools: dict[str, Action]\n    ) -> tuple[dict[str, Any], dict[str, Any]]:\n        \"\"\"\n        Observes the current state along with prompts generated by the given tools.\n        \"\"\"\n        observations = self.observe()\n        prompts = {}\n        for ob_name, value in observations.items():\n            if ob_name in prompt_tools:\n                action = prompt_tools[ob_name]\n                key = next(iter(action.get_required_params()))\n                prompts[ob_name] = self._action_endpoint(action, {key: value})\n        return observations, prompts\n\n    def set_action(self, action: Action) -> None:\n        \"\"\"\n        Adds an action to the environment's action space, replacing any existing\n        action with the same name.\n\n        Args:\n            action (Action): The action to replace or add.\n        \"\"\"\n        self._action_map[action.name] = action\n\n    def start(self) -> None:\n        \"\"\"Starts the environment.\"\"\"\n        pass\n\n    def close(self) -> None:\n        \"\"\"Closes the environment, performing any necessary cleanup.\"\"\"\n        pass\n\n    def reset(self) -> None:\n        \"\"\"Resets the environment based on the provided reset action.\"\"\"\n        self._action_space = self._origin_action_space\n        self._action_map = {action.name: action for action in self._action_space}\n        if self._reset is not None:\n            self.take_action(self._reset)\n\n    @property\n    def action_space(self) -> list[Action]:\n        return list(self._action_map.values())\n\n    @property\n    def observation_space(self) -> list[ClosedAction]:\n        return self._observation_space\n\n    def _action_endpoint(self, action: Action, parameters: dict[str, Any]):\n        \"\"\"Override this method to support different environment types.\"\"\"\n        if self._client is not None and not action.local:\n            data = json.dumps(\n                {\n                    \"action\": action.to_raw_action(),\n                    \"parameters\": action.parameters(**parameters).model_dump(),\n                }\n            )\n            content_type = \"application/json\"\n            if self._enc_key is not None:\n                data = encrypt_message(data, self._enc_key)\n                content_type = \"text/plain\"\n\n            # send action to remote machine\n            response = self._client.post(\n                \"/raw_action\",\n                content=data,\n                headers={\"Content-Type\": content_type},\n            )\n\n            resp_content = response.content.decode(\"utf-8\")\n            if self._enc_key is not None:\n                resp_content = decrypt_message(resp_content, self._enc_key)\n\n            resp_json = json.loads(resp_content)\n            return resp_json[\"action_returns\"]\n        else:\n            # otherwise execute the action locally\n            action = action.set_kept_param(env=self)\n            return action.run(**parameters)\n\n\ndef create_environment(config):\n    if isinstance(config, EnvironmentConfig):\n        return Environment(**dict(config))\n    else:\n        raise ValueError(\"Unsupported environment config type.\")\n"
  },
  {
    "path": "crab/core/exceptions.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nclass ActionNotFound(ValueError):\n    pass\n\n\nclass TaskNotFound(ValueError):\n    pass\n"
  },
  {
    "path": "crab/core/experiment.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport json\nimport traceback\nfrom datetime import datetime\nfrom pathlib import Path\nfrom time import sleep\nfrom typing import Literal\n\nfrom crab.utils.common import base64_to_image\n\nfrom .agent_policy import AgentPolicy\nfrom .benchmark import Benchmark\nfrom .csv_log import CSVLog\nfrom .models import ActionOutput, MessageType\n\nCURRENT_EXPERIMENT_COLUMNS = [\n    \"step\",\n    \"action\",\n    \"total_nodes\",\n    \"complete_nodes\",\n    \"completeness\",\n    \"completeness_per_action\",\n    \"step_to_complete\",\n    \"longest_unfinished_path_length\",\n    \"token_usage\",\n]\n\n\nMAIN_LOG_COLUMNS = [\n    \"time\",\n    \"agent_policy\",\n    \"model\",\n    \"task_id\",\n    \"total_steps\",\n    \"terminate_reason\",\n    \"total_nodes\",\n    \"complete_nodes\",\n    \"completeness\",\n    \"completeness_per_action\",\n    \"step_to_complete\",\n    \"longest_unfinished_path_length\",\n    \"token_usage\",\n]\n\n\nclass Experiment:\n    def __init__(\n        self,\n        benchmark: Benchmark,\n        task_id: str,\n        agent_policy: AgentPolicy | Literal[\"human\"],\n        log_dir: Path | None = None,\n    ) -> None:\n        self.benchmark = benchmark\n        self.task_id = task_id\n        self.agent_policy = 
agent_policy\n        self.log_dir = log_dir\n\n    def write_message(self, message: str, step: int):\n        with open(self.message_path, \"a\") as file:\n            file.write(\"=\" * 20 + f\"Step: {step}\" + \"=\" * 20 + \"\\n\" + message + \"\\n\")\n\n    def write_task_info_json(self, task_info_path: Path):\n        envs_info = {}\n        for name, env in self.benchmark.environment_map.items():\n            actions = {\n                name: action.description for name, action in env._action_map.items()\n            }\n            observations = {\n                action.name: action.description for action in env._observation_space\n            }\n            envs_info[name] = {\n                \"description\": env.description,\n                \"actions\": actions,\n                \"observations\": observations,\n            }\n        task_info = {\n            \"benchmark_name\": self.benchmark.name,\n            \"task_id\": self.task_id,\n            \"task_description\": self.task.description,\n            \"envs\": envs_info,\n        }\n        with open(task_info_path, \"w\") as file:\n            json.dump(task_info, file, indent=4)\n\n    def init_log_dir(self):\n        if self.log_dir is not None:\n            self.log_dir.mkdir(exist_ok=True, parents=True)\n\n            self.main_log = CSVLog(self.log_dir / \"main_log.csv\", MAIN_LOG_COLUMNS)\n\n            self.task_info_dir = self.log_dir / self.task_id\n            self.task_info_dir.mkdir(exist_ok=True, parents=True)\n            self.write_task_info_json(self.task_info_dir / \"task_info.json\")\n\n            self.time_now = datetime.now().strftime(\"%Y-%m-%d_%H-%M-%S\")\n            self.current_experiment_dir = (\n                self.task_info_dir / f\"{self.agent_policy.__class__.__name__}\"\n                f\"({self.agent_policy.get_backend_model_name()})\" / self.time_now\n            )\n            self.current_experiment_dir.mkdir(parents=True)\n\n            
self.current_experiment_log = CSVLog(\n                self.current_experiment_dir / \"metrics.csv\", CURRENT_EXPERIMENT_COLUMNS\n            )\n\n            self.prompt_path = self.current_experiment_dir / \"prompt\"\n            self.image_path = self.current_experiment_dir / \"images\"\n            self.prompt_path.mkdir()\n            self.image_path.mkdir()\n\n            self.message_path = self.current_experiment_dir / \"messages.txt\"\n\n    def get_prompt(self) -> dict[str, list[tuple[str, MessageType]]]:\n        return self.benchmark.observe()\n\n    def execute_action(self, response: list[ActionOutput]) -> bool:\n        for action in response:\n            benchmark_result = self.benchmark.step(\n                action=action.name,\n                parameters=action.arguments,\n                env_name=action.env,\n            )\n            self.metrics = benchmark_result.evaluation_results\n            if benchmark_result.terminated:\n                print(\"\\033[92m\" f\"Task finished, result: {self.metrics}\" \"\\033[0m\")\n                self.write_current_log_row(action)\n                self.write_main_csv_row(benchmark_result.info[\"terminate_reason\"])\n                if \"exception_detail\" in benchmark_result.info:\n                    self.write_exception_detail(\n                        benchmark_result.info[\"exception_detail\"]\n                    )\n                return True\n            print(\n                \"\\033[92m\"\n                f'Action \"{action.name}\" in env \"{action.env}\" succeeded. '\n                f\"Current evaluation results: {self.metrics}\\n\"\n                \"\\033[0m\"\n            )\n            self.write_current_log_row(action)\n            self.step_cnt += 1\n        return False\n\n    def log_prompt(self, prompt):\n        for env in prompt:\n            with open(self.prompt_path / f\"{env}_prompt.md\", \"a\") as prompt_file:\n                prompt_file.write(f\"### Step {self.step_cnt}\\n\\n\")\n                for message, message_type in prompt[env]:\n                    if message_type == MessageType.IMAGE_JPG_BASE64:\n                        file_name = f\"{env}_{self.step_cnt}.png\"\n                        base64_to_image(message).save(self.image_path / file_name)\n                        prompt_file.write(f\"![](../images/{file_name})\\n\\n\")\n                    else:\n                        prompt_file.write(message + \"\\n\\n\")\n\n    def step(self, it) -> bool:\n        print(\"=\" * 40)\n        print(f\"Start agent step {self.step_cnt}:\")\n        prompt = self.get_prompt()\n        self.log_prompt(prompt)\n        try:\n            response = self.agent_policy.chat(prompt)\n        except Exception:\n            print(traceback.format_exc())\n            self.write_main_csv_row(\"agent_exception\")\n            self.write_exception_detail(traceback.format_exc())\n            return True\n        # content = response[\"content\"]\n        # self.write_message(str(content), it)\n        # print(\"\\033[94m\" f\"Agent Response: {content}\" \"\\033[0m\")\n        print(f\"Agent takes action: {response}\")\n        return self.execute_action(response)\n\n    def start_benchmark(self):\n        if self.agent_policy == \"human\":\n            self.benchmark.human_evaluation(self.task_id)\n            return\n\n        env_description = {}\n        for env in self.benchmark.environment_map:\n            env_description[env] = self.benchmark.environment_map[env].description\n\n        self.task, action_space = self.benchmark.start_task(self.task_id)\n        self.agent_policy.reset(\n            task_description=self.task.description,\n            action_spaces=action_space,\n            env_descriptions=env_description,\n        )\n        print(\n            f'Start benchmark \"{self.benchmark.name}\", task id \"{self.task.id}\": '\n            f'\"{self.task.description}\"'\n        )\n        self.init_log_dir()\n        self.step_cnt = 0\n        self.metrics = self.benchmark.evaluate()\n        if self.metrics[\"complete_nodes\"] != 0:\n            print(\"Graph evaluator starts with a non-zero value. Check the environment setup.\")\n            return\n        for it in range(50):\n            try:\n                terminated = self.step(it)\n            except KeyboardInterrupt:\n                self.write_main_csv_row(\"keyboard_interrupt\")\n                return\n            if terminated:\n                return\n            sleep(2)\n            # input(\"Press enter to do next step:\")\n\n    def write_exception_detail(self, exception_info: str):\n        if self.log_dir is None:\n            return\n        with open(self.current_experiment_dir / \"exception_detail.txt\", \"w\") as file:\n            file.write(exception_info)\n\n    def write_current_log_row(self, action):\n        if self.log_dir is None:\n            return\n        self.current_experiment_log.write_row(\n            [\n                self.step_cnt,\n                str(action),\n                self.metrics[\"total_nodes\"],\n                self.metrics[\"complete_nodes\"],\n                self.metrics[\"completeness\"],\n                self.metrics[\"completeness_per_action\"],\n                self.metrics[\"step_to_complete\"],\n                self.metrics[\"longest_unfinished_path_length\"],\n                self.agent_policy.get_token_usage(),\n            ]\n        )\n\n    def write_main_csv_row(self, terminate_reason):\n        if self.log_dir is None:\n            return\n        self.main_log.write_row(\n            [\n                self.time_now,\n                self.agent_policy.__class__.__name__,\n                self.agent_policy.get_backend_model_name(),\n                self.task_id,\n                self.step_cnt,\n                terminate_reason,\n                self.metrics[\"total_nodes\"],\n                self.metrics[\"complete_nodes\"],\n                self.metrics[\"completeness\"],\n                self.metrics[\"completeness_per_action\"],\n                self.metrics[\"step_to_complete\"],\n                self.metrics[\"longest_unfinished_path_length\"],\n                self.agent_policy.get_token_usage(),\n            ]\n        )\n"
  },
  {
    "path": "crab/core/graph_evaluator.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom collections import deque\nfrom typing import Any\n\nimport networkx as nx\n\nfrom .environment import Environment\nfrom .models import Evaluator\n\n\nclass GraphEvaluator:\n    def __init__(\n        self,\n        incoming_graph_data,\n        enable_shortcut: bool = False,\n    ) -> None:\n        self.G = nx.DiGraph(incoming_graph_data)\n        assert nx.is_directed_acyclic_graph(self.G)\n        self.count: int = 0\n        self.total_nodes: int = self.G.number_of_nodes()\n        assert self.total_nodes != 0\n        self.complete_nodes: int = 0\n        self.completeness: float = 0.0\n        self.completeness_per_action: float = 0.0\n        self.step_to_complete: int = self.G.number_of_edges()\n        self.longest_unfinished_path_length: int = nx.dag_longest_path_length(self.G)\n        self.enable_shortcut: bool = enable_shortcut\n\n        # Set the sink node for the DAG:\n        sink_nodes: list[Evaluator] = [\n            node for node, out_degree in self.G.out_degree() if out_degree == 0\n        ]\n        if len(sink_nodes) != 1:\n            raise ValueError(\"Graph should have exactly one sink node.\")\n        self.sink_node: Evaluator = sink_nodes[0]\n\n        self.human_mode = False\n\n        self.reset()\n\n    def 
reset(self):\n        self.count = 0\n        for node in self.G.nodes():\n            self.G.nodes[node][\"remaining_predecessors\"] = self.G.in_degree(node)\n            self.G.nodes[node][\"passing_count\"] = None\n\n    def step(\n        self,\n        envs: dict[str, Environment],\n        default_env: str = \"root\",\n    ):\n        if self.is_complete():\n            raise ValueError(\n                \"GraphEvaluator has already completed and \"\n                \"cannot perform another step.\"\n            )\n        run_evaluators = set()\n        evaluators = self.get_next_source_nodes()\n        while evaluators:\n            for evaluator in evaluators:\n                if evaluator.local and self.human_mode:\n                    result = True\n                else:\n                    environment = envs[evaluator.env_name or default_env]\n                    result = environment.take_action(evaluator)\n                if result:\n                    self.G.nodes[evaluator][\"passing_count\"] = self.count\n                    self.complete_nodes += 1\n                    for _, out_node in self.G.out_edges(evaluator):\n                        self.G.nodes[out_node][\"remaining_predecessors\"] -= 1\n            if self.is_complete():\n                self.complete_nodes = self.total_nodes\n                break\n            run_evaluators.update(evaluators)\n            evaluators = self.get_next_source_nodes() - run_evaluators\n\n        self.update()\n\n    def get_next_source_nodes(self) -> set[Evaluator]:\n        r\"\"\"Get next source nodes to evaluate.\"\"\"\n        if not self.enable_shortcut:\n            source_nodes: list[Evaluator] = []\n            for node in self.G.nodes(data=True):\n                if (\n                    node[1][\"passing_count\"] is None\n                    and node[1][\"remaining_predecessors\"] == 0\n                ):\n                    source_nodes.append(node[0])\n        else:\n            source_nodes = 
list(self.G.nodes())\n\n        return set(source_nodes)\n\n    def entry(self) -> bool:\n        return all(count is not None for _, count in self.G.nodes(data=\"passing_count\"))\n\n    def update(self):\n        self.count += 1\n        self.completeness = float(self.complete_nodes / self.total_nodes)\n        self.completeness_per_action = self.completeness / self.count\n        self.step_to_complete = self.calculate_step_to_complete()\n        self.longest_unfinished_path_length = (\n            self.calculate_longest_unfinished_path_length()\n        )\n\n    def calculate_longest_unfinished_path_length(self) -> int:\n        longest_path_length: int = 0\n        if self.G.nodes[self.sink_node][\"passing_count\"] is not None:\n            return longest_path_length\n\n        # Initialize set to keep track of visited nodes\n        visited = set()\n        # Initialize queue for BFS\n        queue = deque([[self.sink_node]])\n        # BFS traversal with path\n        while queue:\n            path = queue.popleft()\n            node = path[0]\n            # Mark the node as visited\n            visited.add(node)\n            # A path of n nodes has n - 1 edges\n            longest_path_length = max(len(path) - 1, longest_path_length)\n            # Explore predecessor of the current node\n            for predecessor in self.G.predecessors(node):\n                # If predecessor is complete, skip it\n                if self.G.nodes[predecessor][\"passing_count\"] is not None:\n                    continue\n                elif predecessor not in visited:\n                    # Add path with predecessor to queue\n                    queue.append([predecessor] + path)\n        return longest_path_length\n\n    def calculate_step_to_complete(self) -> int:\n        # Initialize count for incomplete edges\n        incomplete_edges: int = 0\n        if self.G.nodes[self.sink_node][\"passing_count\"] is not None:\n            return incomplete_edges\n\n        # Initialize set to keep track of visited nodes\n     
   visited = set()\n        # Initialize queue for BFS\n        queue = deque([self.sink_node])\n        # BFS traversal\n        while queue:\n            # Pop node from queue\n            node = queue.popleft()\n            # Mark the node as visited\n            visited.add(node)\n\n            incomplete_edges += len(list(self.G.predecessors(node)))\n            # Explore predecessor of the current node\n            for predecessor in self.G.predecessors(node):\n                # If predecessor is complete, skip it\n                if self.G.nodes[predecessor][\"passing_count\"] is not None:\n                    continue\n                elif predecessor not in visited:\n                    # Add predecessor to queue\n                    queue.append(predecessor)\n\n        return incomplete_edges\n\n    def is_complete(self) -> bool:\n        return self.G.nodes[self.sink_node][\"passing_count\"] is not None\n\n    def get_completeness(self) -> float:\n        return self.completeness\n\n    def get_completeness_per_action(self) -> float:\n        return self.completeness_per_action\n\n    def get_step_to_complete(self) -> int:\n        return self.step_to_complete\n\n    def get_longest_unfinished_path_length(self) -> int:\n        return self.longest_unfinished_path_length\n\n    def stat(self) -> dict[str, Any]:\n        return {\n            \"total_nodes\": self.total_nodes,\n            \"complete_nodes\": self.complete_nodes,\n            \"completeness\": self.completeness,\n            \"completeness_per_action\": self.completeness_per_action,\n            \"step_to_complete\": self.step_to_complete,\n            \"longest_unfinished_path_length\": self.longest_unfinished_path_length,\n        }\n\n    def _check_submit(self, environment: Environment) -> bool:\n        \"\"\"\n        Check if the last action is _submit. 
If yes, return its result; otherwise return\n        False.\n        \"\"\"\n        if not environment.trajectory:\n            return False\n        last_action = environment.trajectory[-1]\n        if last_action[0] != \"_submit\":\n            return False\n\n        return last_action[2]\n\n    def compute_radar_stats(self) -> dict[str, float]:\n        longest_path_length = nx.dag_longest_path_length(self.G)\n        return {\n            \"Completeness\": float(self.completeness),\n            \"Efficiency\": float(self.completeness_per_action),\n            \"Path Completeness Ratio\": (\n                longest_path_length - self.longest_unfinished_path_length\n            )\n            / longest_path_length,\n        }\n\n    @staticmethod\n    def visualize(evaluators: list[\"GraphEvaluator\"], path: str):\n        import plotly.graph_objects as go\n\n        fig = go.Figure()\n        for i, evaluator in enumerate(evaluators):\n            radar_stats = evaluator.compute_radar_stats()\n            fig.add_trace(\n                go.Scatterpolar(\n                    r=list(radar_stats.values()),\n                    theta=list(radar_stats.keys()),\n                    fill=\"toself\",\n                    name=f\"Graph Evaluator {i}\",\n                )\n            )\n\n        fig.update_layout(\n            polar=dict(radialaxis=dict(visible=True, range=[0, 1])),\n            showlegend=True,\n        )\n        fig.update_layout(\n            margin=dict(l=150, r=150, t=150, b=150),\n        )\n        fig.write_image(path, scale=12, width=600, height=600)\n"
  },
  {
    "path": "crab/core/models/__init__.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# ruff: noqa: F401\nfrom .action import Action, ClosedAction\nfrom .agent_interface import ActionOutput, BackendOutput, Message, MessageType\nfrom .benchmark_interface import StepResult\nfrom .config import BenchmarkConfig, EnvironmentConfig, VMEnvironmentConfig\nfrom .evaluator import Evaluator\nfrom .task import GeneratedTask, SubTask, SubTaskInstance, Task\n\n__all__ = [\n    \"Action\",\n    \"ClosedAction\",\n    \"MessageType\",\n    \"Message\",\n    \"ActionOutput\",\n    \"BackendOutput\",\n    \"StepResult\",\n    \"BenchmarkConfig\",\n    \"Task\",\n    \"SubTask\",\n    \"SubTaskInstance\",\n    \"GeneratedTask\",\n    \"Evaluator\",\n    \"EnvironmentConfig\",\n    \"VMEnvironmentConfig\",\n]\n"
  },
  {
    "path": "crab/core/models/action.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom functools import partial\nfrom inspect import Parameter, Signature, signature\nfrom types import NoneType\nfrom typing import Annotated, Any, Callable, TypeAlias\n\nfrom docstring_parser import parse\nfrom pydantic import (\n    AfterValidator,\n    BaseModel,\n    ValidationError,\n    create_model,\n    model_serializer,\n)\nfrom pydantic.fields import FieldInfo\n\nfrom crab.utils.common import callable_to_base64\n\ntry:\n    from typing import Self\nexcept ImportError:\n    from typing_extensions import Self\n\n\nKEPT_PARAMS = [\"env\"]\nEMPTY_MODEL = create_model(\"Empty\")\n\n\nclass Action(BaseModel):\n    \"\"\"\n    The core operational unit within the Crab system.\n\n    This class stores parameters and return type definitions and can be easily converted\n    into a JSON schema. It supports argument verification and includes a feature for\n    retaining specific parameters.\n\n    Attributes:\n        name (str): The name of the action.\n        entry (Callable): The actual entry function of the action.\n        parameters (type[BaseModel]): Definition of input parameters.\n        returns (type[BaseModel]): Definition of the return type. 
Note: The actual\n            return type is specified by the `returns` attribute in this model.\n        description (str | None): A clear and concise description of the function's\n            purpose and behavior. Defaults to None.\n        kept_params (list[str]): Parameters retained for internal use by the Crab\n            system, such as 'env' for storing the current environment. These parameters\n            do not appear in the `parameters` field and are automatically injected at\n            runtime. Defaults to an empty list.\n        env_name (Optional[str]): Specifies the environment the action is associated\n            with. Defaults to None.\n    \"\"\"\n\n    name: str\n    entry: Callable\n    parameters: type[BaseModel]\n    returns: type[BaseModel]\n    description: str | None = None\n    kept_params: list[str] = []\n    env_name: str | None = None\n    local: bool = False\n\n    def __eq__(self, other):\n        return super().__eq__(other)\n\n    def __hash__(self):\n        return hash(self.entry)\n\n    def __call__(self, *args: Any, **kwargs: Any) -> Self:\n        \"\"\"Sets default values for the action.\n\n        Calling the action directly does not actually call the function; instead it\n        sets default values for the action, so the agent only needs to provide part\n        of the parameters, or none at all.\n\n        This method has two modes, full setting and partial setting. Full setting mode\n        is applied when the user provides positional arguments, where all the required\n        parameters must be provided and the action parameters will be empty. 
If only\n        keyword arguments are provided, partial setting mode is applied: the parameter\n        model is not changed; only the default values of the given parameters are\n        updated.\n\n        Note:\n            Full setting mode is not stable.\n        \"\"\"\n        if args:\n            # Positional arguments create a closed action\n            result = self.model_copy(\n                update={\n                    \"entry\": partial(self.entry, *args, **kwargs),\n                    \"parameters\": EMPTY_MODEL,\n                }\n            )\n            if self.description is not None:\n                result.description = self.description + f\" Input: {args} {kwargs}\"\n            return result\n        else:\n            # Otherwise it should only contain kwargs\n            for key in kwargs:\n                # Verify the kwargs exist\n                if key not in self.parameters.model_fields:\n                    raise ValueError(\n                        f'\"{key}\" is not a parameter of action \"{self.name}\"'\n                    )\n\n            result = self.model_copy(\n                update={\n                    \"entry\": partial(self.entry, **kwargs),\n                }\n            )\n            if self.description is not None:\n                result.description = self.description + f\" Input: {args} {kwargs}\"\n            return result\n\n    @staticmethod\n    def _check_combinable(a: \"Action\", b: \"Action\") -> None:\n        if set(a.kept_params) != set(b.kept_params):\n            raise ValueError(\"Piped actions should have same kept parameters.\")\n        if a.env_name != b.env_name:\n            raise ValueError(\"Piped actions should have same env_name.\")\n        if a.local != b.local:\n            raise ValueError(\"Piped actions should have same `local` value.\")\n\n    def __rshift__(self, other_action: \"Action\") -> \"Action\":\n        \"\"\"Uses :obj:`>>` to pipe two actions together to form a new action.\n\n        
The returned action executes the actions from left to right. The output of the\n        left action becomes the input to the right action, provided their parameters and\n        return types are compatible.\n        \"\"\"\n        required = other_action.get_required_params()\n        if len(required) != 1:\n            raise ValueError(\n                \"Return type of the former action must match the parameter type \"\n                \"of the latter action.\"\n            )\n        Action._check_combinable(self, other_action)\n\n        a_entry = self.entry\n        b_entry = other_action.entry\n        kept_params = self.kept_params.copy()\n        entry = lambda *args, **kwargs: b_entry(\n            a_entry(*args, **kwargs),\n            **{key: kwargs[key] for key in kwargs if key in kept_params},\n        )\n        return Action(\n            name=f\"{self.name}_pipe_{other_action.name}\",\n            description=f\"First {self.description}. Then use the result of the \"\n            f\"former as input, {other_action.description}\",\n            parameters=self.parameters,\n            returns=other_action.returns,\n            entry=entry,\n            kept_params=self.kept_params,\n            env_name=self.env_name,\n            local=self.local,\n        )\n\n    def __add__(self, other_action: \"Action\") -> \"Action\":\n        \"\"\"Uses :obj:`+` to combine two actions sequentially to form a new action.\n\n        The returned action executes the actions from left to right. 
Its return value\n        will be the return value of the right action.\n\n        Note:\n            The \"+\" operator only supports actions with no required parameters.\n        \"\"\"\n        self_required = self.get_required_params()\n        other_required = other_action.get_required_params()\n        if len(other_required) > 1 or len(self_required) > 1:\n            raise ValueError(\n                '\"+\" operator only supports actions with no required parameters.'\n            )\n        Action._check_combinable(self, other_action)\n\n        a_entry = self.entry\n        b_entry = other_action.entry\n        entry = lambda **kwargs: (a_entry(**kwargs), b_entry(**kwargs))[1]\n        return Action(\n            name=f\"{self.name}_then_{other_action.name}\",\n            description=f\"{self.description} Then, {other_action.description}\",\n            parameters=EMPTY_MODEL,\n            returns=other_action.returns,\n            entry=entry,\n            kept_params=self.kept_params,\n            env_name=self.env_name,\n            local=self.local,\n        )\n\n    def run(self, **kwargs) -> Any:\n        \"\"\"Verifies the action parameters, then runs the action.\"\"\"\n        if self.kept_params:\n            raise RuntimeError(\"There are unassigned kept parameters.\")\n        try:\n            kwargs = self.parameters(**kwargs).model_dump()\n        except ValidationError:\n            pass  # TODO: exception handling\n        return self.entry(**kwargs)\n\n    def set_kept_param(self, **params) -> Self:\n        kept_params = {key: params[key] for key in params if key in self.kept_params}\n        result = self.model_copy()\n        result.kept_params = []\n        result.entry = partial(self.entry, **kept_params)\n        return result\n\n    def get_required_params(self) -> dict[str, FieldInfo]:\n        return {\n            name: info\n            for name, info in self.parameters.model_fields.items()\n            if info.is_required()\n   
     }\n\n    @model_serializer\n    def to_openai_json_schema(self) -> dict:\n        \"\"\"Gets the OpenAI JSON schema of the action.\"\"\"\n\n        return {\n            \"name\": self.name,\n            \"description\": self.description,\n            \"parameters\": self.parameters.model_json_schema(),\n            # \"returns\": self.returns.model_json_schema()[\"properties\"][\"returns\"],\n        }\n\n    def to_raw_action(self) -> dict[str, Any]:\n        \"\"\"Gets the serialized action for remote execution.\"\"\"\n        return {\n            \"name\": self.name,\n            \"dumped_entry\": callable_to_base64(self.entry),\n            \"kept_params\": list(self.kept_params),\n        }\n\n    @classmethod\n    def from_function(cls, func: Callable) -> Self:\n        \"\"\"Generates an action from a function annotated by @action.\"\"\"\n        if func.__doc__ is None:\n            # raise RuntimeError(\"The action must have a Google-style docstring.\")\n            parameters_descriptions = None\n            func_description = None\n            return_description = None\n        else:\n            docstring = parse(func.__doc__)\n            parameters_descriptions = {\n                param.arg_name: param.description for param in docstring.params\n            }\n            func_description = docstring.short_description or \"\"\n            if docstring.long_description:\n                func_description += \"\\n\" + docstring.long_description\n            if docstring.returns:\n                return_description = docstring.returns.description\n            else:\n                return_description = None\n\n        sign = signature(func)\n        params = sign.parameters\n        fields = {}\n        kept_params = []\n        for param_name, p in params.items():\n            # Don't add kept parameters to the parameters model\n            if param_name in KEPT_PARAMS:\n                kept_params.append(param_name)\n                continue\n            # 
Variable parameters are not supported\n            if p.kind in [Parameter.VAR_POSITIONAL, Parameter.VAR_KEYWORD]:\n                continue\n            # If the parameter type is not specified, it defaults to typing.Any\n            annotation = Any if p.annotation is Parameter.empty else p.annotation\n            # Check if the parameter has a description\n            param_description = None\n            if parameters_descriptions is not None:\n                param_description = parameters_descriptions.get(param_name, None)\n            # Check if the parameter has a default value\n            if p.default is Parameter.empty:\n                fields[param_name] = (\n                    annotation,\n                    FieldInfo(description=param_description),\n                )\n            else:\n                fields[param_name] = (annotation, FieldInfo(default=p.default))\n        model: type[BaseModel] = create_model(func.__name__, **fields)  # type: ignore\n\n        # insert return to parameters\n        return_annotation = (\n            Any if sign.return_annotation == Signature.empty else sign.return_annotation\n        )\n        return_model: type[BaseModel] = create_model(\n            func.__name__ + \"_return\",\n            returns=(\n                return_annotation or NoneType,\n                FieldInfo(description=return_description, init=False),  # type: ignore\n            ),\n        )\n\n        action = cls(\n            name=func.__name__,\n            entry=func,\n            parameters=model,\n            returns=return_model,\n            description=func_description,\n            kept_params=kept_params,\n        )\n        return action\n\n\ndef _check_no_param(action: Action) -> Action:\n    if len(action.get_required_params()) != 0:\n        raise ValueError(\"ClosedAction should not accept any parameter.\")\n    return action\n\n\nClosedAction: TypeAlias = Annotated[Action, AfterValidator(_check_no_param)]\n\"\"\"The action 
type alias with no required parameters\"\"\"\n"
  },
  {
    "path": "crab/core/models/agent_interface.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom enum import IntEnum\nfrom typing import Any\n\nfrom pydantic import BaseModel\n\nfrom .action import Action\n\n\nclass MessageType(IntEnum):\n    TEXT = 0\n    IMAGE_JPG_BASE64 = 1\n\n\nMessage = tuple[str, MessageType]\n\n\nclass ActionOutput(BaseModel):\n    name: str\n    arguments: dict[str, Any]\n    env: str | None = None\n\n\nclass BackendOutput(BaseModel):\n    message: str | None\n    action_list: list[ActionOutput] | None\n\n\nclass EnvironmentInfo(BaseModel):\n    description: str\n    action_space: list[Action]\n"
  },
  {
    "path": "crab/core/models/benchmark_interface.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom typing import Any\n\nfrom pydantic import BaseModel\n\n\nclass StepResult(BaseModel):\n    truncated: bool\n    terminated: bool\n    action_returns: Any\n    evaluation_results: dict[str, Any]\n    info: dict[str, Any]\n"
  },
  {
    "path": "crab/core/models/config.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom typing import Any\n\nfrom pydantic import BaseModel\n\nfrom .action import Action, ClosedAction\nfrom .task import Task\n\n\nclass EnvironmentConfig(BaseModel):\n    name: str\n    action_space: list[Action]\n    observation_space: list[ClosedAction]\n    description: str = \"\"\n    reset: Action | None = None\n    remote_url: str | None = None\n    extra_attributes: dict[str, Any] = {}\n\n\nclass VMEnvironmentConfig(BaseModel):\n    inside_environment: EnvironmentConfig\n    remote_url: str = \"http://192.168.0.0:8000\"\n\n\nclass BenchmarkConfig(BaseModel):\n    name: str\n    tasks: list[Task]\n    environments: list[EnvironmentConfig]\n    default_env: str | None = None\n    multienv: bool = False\n    prompting_tools: dict[str, dict[str, Action]] = {}\n    root_action_space: list[Action] = []\n    step_limit: int = 30\n    common_setup: list[ClosedAction] = []\n"
  },
  {
    "path": "crab/core/models/evaluator.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom pydantic import BaseModel, field_validator\n\nfrom .action import Action\n\n\nclass Evaluator(Action):\n    require_submit: bool = False\n\n    @field_validator(\"returns\", mode=\"after\")\n    @classmethod\n    def must_return_bool(cls, v: type[BaseModel]) -> type[BaseModel]:\n        if v.model_fields[\"returns\"].annotation is not bool:\n            raise ValueError(\"Evaluator must return bool.\")\n        return v\n\n    def __and__(self, other: \"Evaluator\") -> \"Evaluator\":\n        Action._check_combinable(self, other)\n        result = self.model_copy()\n        result.name = f\"{self.name}_and_{other.name}\"\n        result.description = f\"{self.description} At the same time, {other.description}\"\n        self_entry = self.entry\n        other_entry = other.entry\n        result.entry = lambda: self_entry() and other_entry()\n        return result\n\n    def __or__(self, other: \"Evaluator\") -> \"Evaluator\":\n        Action._check_combinable(self, other)\n        result = self.model_copy()\n        result.name = f\"{self.name}_or_{other.name}\"\n        result.description = (\n            f\"{self.description} If the previous one fails, {other.description}\"\n        )\n        self_entry = self.entry\n        other_entry = 
other.entry\n        result.entry = lambda: self_entry() or other_entry()\n        return result\n\n    def __invert__(self) -> \"Evaluator\":\n        result = self.model_copy()\n        result.name = f\"not_{self.name}\"\n        result.description = (\n            f\"Check if the following description is False. {self.description}\"\n        )\n        self_entry = self.entry\n        result.entry = lambda: not self_entry()\n        return result\n"
  },
  {
    "path": "crab/core/models/task.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom typing import Any, Callable, Literal\nfrom uuid import uuid4\n\nimport networkx as nx\nfrom pydantic import (\n    BaseModel,\n    ConfigDict,\n    Field,\n    field_validator,\n    model_serializer,\n)\n\nfrom .action import Action, ClosedAction\nfrom .evaluator import Evaluator\n\n\nclass Task(BaseModel):\n    model_config = ConfigDict(arbitrary_types_allowed=True)\n    id: str\n    description: str\n    evaluator: nx.DiGraph | Evaluator\n    setup: list[ClosedAction] | ClosedAction = []\n    teardown: list[ClosedAction] | ClosedAction = []\n    extra_action: list[Action] = []\n\n    @field_validator(\"evaluator\")\n    @classmethod\n    def change_evaluator_to_graph(cls, evaluator: nx.DiGraph | Evaluator) -> nx.DiGraph:\n        if isinstance(evaluator, Evaluator):\n            graph = nx.DiGraph()\n            graph.add_node(evaluator)\n            return graph\n        return evaluator\n\n    @field_validator(\"setup\", \"teardown\")\n    @classmethod\n    def to_list(cls, action: Action | list[Action]) -> list[Action]:\n        if isinstance(action, Action):\n            return [action]\n        return action\n\n\nclass SubTask(BaseModel):\n    id: str\n    description: str\n    attribute_dict: dict[str, list[str] | str]\n    output_type: str\n   
 output_generator: Callable[[Any], str] | Literal[\"manual\"] | None = None\n    evaluator_generator: Callable[[Any], nx.DiGraph] | None = None\n    setup: list[ClosedAction] | ClosedAction = []\n    teardown: list[ClosedAction] | ClosedAction = []\n    extra_action: list[Action] = []\n\n    def __hash__(self) -> int:\n        return hash(self.id)\n\n    @field_validator(\"attribute_dict\")\n    @classmethod\n    def expand_attribute_type(\n        cls,\n        attribute_dict: dict[str, list[str] | str],\n    ) -> dict[str, list[str]]:\n        attribute_dict = attribute_dict.copy()\n        for key in attribute_dict:\n            if isinstance(attribute_dict[key], str):\n                attribute_dict[key] = [attribute_dict[key]]\n        return attribute_dict\n\n\nclass SubTaskInstance(BaseModel):\n    task: SubTask\n    attribute: dict[str, Any]\n    output: str | None = None\n    id: str = Field(default_factory=lambda: str(uuid4()))\n\n    def __hash__(self) -> int:\n        return hash(self.id)\n\n    @model_serializer\n    def dump_model(self) -> dict[str, Any]:\n        return {\n            \"task\": self.task.id,\n            \"attribute\": self.attribute,\n            \"output\": self.output,\n        }\n\n\nclass GeneratedTask(BaseModel):\n    description: str\n    tasks: list[SubTaskInstance]\n    adjlist: str\n    id: str = Field(default_factory=lambda: str(uuid4()))\n"
  },
  {
    "path": "crab/core/task_generator.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# ruff: noqa: E501\nimport argparse\nimport importlib\nimport itertools\nimport json\nimport os\nimport random\nfrom pathlib import Path\n\nimport networkx as nx\nimport yaml\nfrom openai import OpenAI\nfrom termcolor import colored\n\nfrom .models import GeneratedTask, SubTask, SubTaskInstance, Task\n\nSYSTEM_PROMPT_SINGLE = \"\"\"\nYou are a wise operator who is familiar with both the Ubuntu and Android operating\nsystems. Our goal is to use the output of the source task as the input for the target\ntask. You should describe the task they form when combined, using several imperative\nsentences. You cannot provide any extra information such as detailed operation methods,\nbut only combine the task descriptions together in a reasonable way. 
You shouldn't fill\nin the input attribute wrapped by curly brackets.\n\nSource task:\nFind out the city located at coordinate (8.65759263086632, 7.520403498426244) via Google Maps.\n\nTarget task:\nSet the screen background as the first figure of {city_name} in Google.\n\nAnswer:\nUsing Google Maps, find the city located at coordinates (8.65759263086632,7.520403498426244), search Google for the first image of that city, and set this image as the desktop background on an Ubuntu system.\n\"\"\"\nUSER_PROMPT_SINGLE = \"\"\"\nSource task:\n{task1}\n\nTarget task:\n{task2}\n\nAnswer:\n\"\"\"\n\nSELECT_USER_START = \"\"\"\nSource attribute:\n{source_task}\nTarget tasks:\n{target_tasks}\nSelect a task from the target tasks.\nAnswer:\n\"\"\"\n\nSELECT_SYSTEM_PROMPT = \"\"\"\nYou are a wise operator who is familiar with both the Ubuntu and Android operating\nsystems. Our goal is to use the output of the source task as the input for the target\ntask. You should identify the most reasonable target task from the list, explain why you\nchose it, and output the description of the combined task using several\nimperative sentences. It is crucial to establish a connection between the source and\ntarget tasks and select the best one as the output. Remember, you must select at least\none and follow the crucial output format. You must include the provided value and every\ndetail of each task. 
You must use \"======\" to seperate each part (selected task number,\ncombined task description, and explanation) Here is an example:\n\nSource task:\nFind out the city located at coordinate (8.65759263086632, 7.520403498426244) via Google Maps.\n\nTarget tasks:\nTask 0: Set the screen background as the first figure of {input attribute} in Google.\nTask 1: Close the progress of {input attribute} app via task manager.\nTask 2: Download {input attribute} from the app store.\nTask 3: Create a PowerPoint with one page containing Mount Alps.jpg and named as {input attribute 2}.\nTask 4: Send message {input attribute 1} to +81 09074540472.\n\nAnswer:\n0\n======\nUsing Google Maps, find the city located at coordinates (8.65759263086632,7.520403498426244), search Google for the first image of that city, and set this image as the desktop background on an Ubuntu system.\n======\nThis task is the most relevant and directly utilizes the output of the source task.\nFinding the city provides us with a specific location which can easily lead to a visual\nrepresentation. 
Searching for an image of the city to set as a background is a practical\napplication that visually celebrates the discovery of the city's identity.\n\"\"\"\n\nSELECT_USER_PROMPT = \"\"\"\nSource task:\n{source_task}\nTarget tasks:\n{target_tasks}\n\nAnswer:\n\"\"\"\n\n\nclass TaskGenerator:\n    \"\"\"Class to generate tasks based on a directed graph of subtasks.\"\"\"\n\n    def __init__(\n        self,\n        attribute_pool: dict[str, list] | None = None,\n        subtasks: list[SubTask] | None = None,\n    ):\n        \"\"\"\n        Initializes the TaskGenerator object.\n\n        Parameters:\n            attribute_pool (dict): A dictionary mapping attribute types to lists of possible values.\n            subtasks (list): A list of SubTask objects to be included in the task generation graph.\n        \"\"\"\n        # Use None defaults to avoid shared mutable default arguments.\n        attribute_pool = attribute_pool or {}\n        subtasks = subtasks or []\n        self.G = nx.DiGraph()\n        self.attribute_pool = attribute_pool\n        self.graph_generation(subtasks)\n        self.task_mapping = {task.id: task for task in subtasks}\n        if not os.getenv(\"OPENAI_API_KEY\"):\n            os.environ[\"OPENAI_API_KEY\"] = \"EMPTY\"\n        self.client = OpenAI()\n\n    @classmethod\n    def from_config(cls, config_path: str) -> \"TaskGenerator\":\n        \"\"\"\n        Class method to create a TaskGenerator instance from a configuration file.\n\n        Parameters:\n            config_path (str): Path to the YAML configuration file.\n\n        Returns:\n            TaskGenerator: An instance of TaskGenerator.\n        \"\"\"\n        with open(config_path, \"r\") as f:\n            data = yaml.safe_load(f)\n        subtask_data = data[\"subtask\"]\n        attribute_pool = data[\"attribute_pool\"]\n        subtask_list = [\n            SubTask(\n                id=subtask[\"id\"],\n                description=subtask[\"description\"],\n                attribute_dict={\n                    key: subtask[\"attribute_dict\"][key].split(\"/\")\n                    for key in subtask[\"attribute_dict\"]\n                },\n                
output_type=subtask[\"output_type\"],\n            )\n            for subtask in subtask_data\n        ]\n        return cls(attribute_pool, subtask_list)\n\n    def graph_generation(self, subtask_list: list[SubTask]) -> None:\n        \"\"\"Generates a directed graph from a list of subtasks based on output and input types.\"\"\"\n        self.G.add_nodes_from(subtask_list)\n        for input_node in self.G.nodes:\n            for output_node in self.G.nodes:\n                for name, type_list in output_node.attribute_dict.items():\n                    for type in type_list:\n                        if type == input_node.output_type:\n                            self.G.add_edge(\n                                input_node, output_node, attribute_name=name\n                            )\n\n    def combine(self, current_description: str, target_description: str) -> str:\n        \"\"\"\n        Combines two task descriptions into a single task description using GPT model.\n\n        Parameters:\n            current_description (str): The current task description.\n            target_description (str): The target task description to combine.\n\n        Returns:\n            str: The combined task description.\n        \"\"\"\n        user_content = USER_PROMPT_SINGLE.format(\n            task1=current_description, task2=target_description\n        )\n        response = self.client.chat.completions.create(\n            messages=[\n                {\"role\": \"system\", \"content\": SYSTEM_PROMPT_SINGLE},\n                {\"role\": \"user\", \"content\": user_content},\n            ],\n            model=\"gpt-4-turbo-preview\",\n        )\n        return response.choices[0].message.content\n\n    def gpt_choice(\n        self,\n        current_description: str,\n        outgoing_edges: list[tuple[SubTask, SubTask, str]],\n    ) -> tuple[SubTask, dict[str, str], str, str]:\n        \"\"\"\n        Determines the best task choice from a list of possible target tasks 
using a GPT model.\n\n        Parameters:\n            current_description (str): Description of the current task.\n            outgoing_edges (list): List of possible outgoing edges representing target tasks.\n\n        Returns:\n            tuple: A tuple containing the chosen SubTask, attributes, new description, and combined description.\n        \"\"\"\n        target_neighbours = \"\"\n        selected_attributes = []\n        new_descriptions = []\n        for idx, edge in enumerate(outgoing_edges):\n            _, node, attribute_name = edge\n            attributes = self._fill_task_attributes(node, attribute_name)\n            selected_attributes.append(attributes)\n            kwargs = attributes.copy()\n            kwargs[attribute_name] = \"{\" + attribute_name + \"}\"\n            new_description = node.description.format(**kwargs)\n            new_descriptions.append(new_description)\n            target_neighbours += \"Task {0}: {1}\\n\".format(idx, new_description)\n        user_content = SELECT_USER_PROMPT.format(\n            source_task=current_description,\n            target_tasks=target_neighbours,\n        )\n        response = self.client.chat.completions.create(\n            messages=[\n                {\"role\": \"system\", \"content\": SELECT_SYSTEM_PROMPT},\n                {\"role\": \"user\", \"content\": user_content},\n            ],\n            model=\"gpt-4-turbo-preview\",\n        )\n        response_message = response.choices[0].message\n        answers = response_message.content.split(\"======\")\n        index = int(answers[0].strip())\n        combined_description = answers[1].strip()\n        return (\n            outgoing_edges[index][1],\n            selected_attributes[index],\n            new_descriptions[index],\n            combined_description,\n        )\n\n    def random_walk(\n        self, current_description: str, start_node: SubTask, random_number: int\n    ) -> tuple[SubTask, dict[str, str], str, str] | None:\n        
\"\"\"\n        Performs a random walk from the starting node to generate a task sequence.\n\n        Parameters:\n            current_description (str): The current task description.\n            start_node (SubTask): The starting subtask node.\n            random_number (int): Maximum number of edges to consider.\n\n        Returns:\n            tuple | None: A tuple containing the next SubTask, attributes if a next step is available, otherwise None.\n        \"\"\"\n        out_edges = list(self.G.out_edges(start_node, data=\"attribute_name\"))\n        if len(out_edges) == 0:\n            print(colored(\"\\n*** No neighbour points, generation stopped ***\\n\", \"red\"))\n            return None\n        if start_node.output_type == \"None\":\n            print(colored(\"\\n*** Output None, generation will stop ***\\n\", \"red\"))\n            return None\n\n        if random_number <= len(out_edges):\n            select_edge_list = random.sample(out_edges, random_number)\n        else:\n            select_edge_list = out_edges\n        return self.gpt_choice(current_description, select_edge_list)\n\n    def _fill_task_attributes(self, task: SubTask, kept_attribute: str):\n        \"\"\"\n        Fills the task attributes by randomly selecting values from the attribute pool, except the kept attribute.\n\n        Parameters:\n            task (SubTask): The task whose attributes need to be filled.\n            kept_attribute (str): The attribute to exclude from filling.\n\n        Returns:\n            dict: A dictionary of filled attributes.\n        \"\"\"\n        attribute_types = task.attribute_dict.copy()\n        attribute_types.pop(kept_attribute)\n        return self._select_random_attributes(attribute_types)\n\n    def _select_random_attributes(self, type_dict: dict[str, str]) -> dict[str, str]:\n        \"\"\"\n        Randomly selects attributes for a task from the attribute pool based on the type dictionary.\n\n        Parameters:\n            
type_dict (dict): A dictionary mapping attribute names to lists of attribute types.\n\n        Returns:\n            dict: A dictionary of selected attributes.\n        \"\"\"\n        result = {}\n        for attr_name, attr_type_list in type_dict.items():\n            pool = []\n            for attr_type in attr_type_list:\n                if attr_type not in self.attribute_pool:\n                    raise ValueError(f\"{attr_type} not in attribute pool.\")\n                pool.extend(self.attribute_pool[attr_type])\n            result[attr_name] = random.choice(pool)\n        return result\n\n    @staticmethod\n    def generate_single_node_task(subtask: SubTask):\n        \"\"\"\n        Generates a single node task based on a SubTask instance.\n\n        Parameters:\n            subtask (SubTask): The subtask to generate a task for.\n\n        Returns:\n            tuple: A tuple containing the task description and a directed graph of the task.\n        \"\"\"\n        print(colored(f\"Generating task: {subtask.description}\\n\", \"green\"))\n        attributes = {}\n        for name, type_name in subtask.attribute_dict.items():\n            value = input(\n                colored(f'Input attribute \"{name}\" ({type_name}): ', \"yellow\")\n            )\n            attributes[name] = value\n        description = subtask.description.format(**attributes)\n        result_graph = nx.DiGraph()\n        result_graph.add_node(SubTaskInstance(task=subtask, attribute=attributes))\n        return description, result_graph\n\n    def combine_subtask_list(self, subtask_list: list[SubTask]):\n        \"\"\"\n        Combines a list of subtasks into a single task sequence.\n\n        Parameters:\n            subtask_list (list): A list of SubTask instances to combine.\n\n        Returns:\n            tuple: A tuple containing the final task description and a directed graph of the task sequence.\n        \"\"\"\n        start_node = subtask_list[0]\n        attributes = 
self._select_random_attributes(start_node.attribute_dict)\n        result_graph = nx.DiGraph()\n        output = input(\n            colored(\n                f\"What is the output of {start_node.description.format(**attributes)}: \",\n                \"yellow\",\n            )\n        )\n        last_node = SubTaskInstance(\n            task=start_node, attribute=attributes, output=output or None\n        )\n        result_graph.add_node(last_node)\n        current_description = start_node.description.format(**attributes)\n        for task in subtask_list[1:]:\n            current_description = self.combine(current_description, task.description)\n            key = next(iter(task.attribute_dict.keys()))\n            attributes = {key: output}\n            output = input(\n                colored(\n                    f\"What is the output of {task.description.format(**attributes)}: \",\n                    \"yellow\",\n                )\n            )\n            current_node = SubTaskInstance(\n                task=task, attribute=attributes, output=output or None\n            )\n            result_graph.add_edge(last_node, current_node)\n            last_node = current_node\n        return current_description, result_graph\n\n    def combine_two_subtasks(\n        self, sub_task_id_1: str, sub_task_id_2: str\n    ) -> tuple[str, nx.DiGraph]:\n        \"\"\"\n        Combines two subtasks into a single task sequence based on user input.\n\n        Parameters:\n            sub_task_id_1 (str): ID of the first subtask.\n            sub_task_id_2 (str): ID of the second subtask.\n\n        Returns:\n            tuple: A tuple containing the combined task description and a directed graph of the task sequence.\n        \"\"\"\n        sub_task_1 = self.task_mapping[sub_task_id_1]\n        sub_task_2 = self.task_mapping[sub_task_id_2]\n        print(colored(f\"\\nTask 1: {sub_task_1.description}\", \"cyan\"))\n        print(colored(f\"Task 2: 
{sub_task_2.description}\\n\", \"cyan\"))\n        attributes_1 = {}\n        for name, types in sub_task_1.attribute_dict.items():\n            value = input(\n                colored(\n                    f'Input attribute \"{name}\" ({types}) for the first task: ', \"yellow\"\n                )\n            )\n            attributes_1[name] = value\n        description_1 = sub_task_1.description.format(**attributes_1)\n        output_1 = input(\n            colored(\n                f'What is the output of {description_1} (\"{sub_task_1.output_type}\"): ',\n                \"yellow\",\n            )\n        )\n\n        print(\n            colored(\n                f\"\\nThe output type of the first subtask is '{sub_task_1.output_type}'.\\n\",\n                \"cyan\",\n            )\n        )\n        attributes_2 = {}\n        for name, types in sub_task_2.attribute_dict.items():\n            if (\n                sub_task_1.output_type in types\n                or input(\n                    colored(\n                        f\"Can the output '{sub_task_1.output_type}' be used as the '{name}' ({types}) of the second task? 
(yes/no): \",\n                        \"yellow\",\n                    )\n                )\n                .strip()\n                .lower()\n                == \"yes\"\n            ):\n                attributes_2[name] = output_1\n            else:\n                value = input(\n                    colored(\n                        f'Input attribute \"{name}\" ({types}) for the second task: ',\n                        \"yellow\",\n                    )\n                )\n                attributes_2[name] = value\n\n        description_2 = sub_task_2.description.format(**attributes_2)\n\n        while True:\n            combined_description = self.combine(description_1, description_2)\n            print(\n                colored(f\"\\n*** Combined Task: {combined_description} ***\\n\", \"green\")\n            )\n            if (\n                input(\n                    colored(\n                        \"Do you want to re-generate the combined task? (yes/no): \",\n                        \"yellow\",\n                    )\n                )\n                .strip()\n                .lower()\n                != \"yes\"\n            ):\n                break\n        result_graph = nx.DiGraph()\n        node1 = SubTaskInstance(\n            task=sub_task_1, attribute=attributes_1, output=output_1\n        )\n        node2 = SubTaskInstance(task=sub_task_2, attribute=attributes_2)\n        result_graph.add_node(node1)\n        result_graph.add_node(node2)\n        result_graph.add_edge(node1, node2)\n\n        return combined_description, result_graph\n\n    def task_generation(\n        self,\n        start_id: int | None = None,\n        max_iter: int = 3,\n        random_number: int = 5,\n    ) -> tuple[str, list[SubTask]]:\n        \"\"\"\n        Generates a sequence of tasks starting from a given subtask ID or randomly.\n\n        Parameters:\n            start_id (int | None): The ID of the starting subtask or None to choose randomly.\n            
max_iter (int): The maximum number of iterations to perform in the generation process.\n            random_number (int): The maximum number of neighbors to consider for random walk.\n\n        Returns:\n            tuple: A tuple containing the final task description and a list of tuples describing the selected subtasks.\n        \"\"\"\n        description = \"\"\n        task_list = []\n\n        if start_id is None:\n            start_node: SubTask = random.choice(list(self.G.nodes))\n        else:\n            for node in self.G.nodes:\n                if node.id == start_id:\n                    start_node: SubTask = node\n                    break\n            else:\n                # Reached only when no subtask matches start_id.\n                raise ValueError(f\"Subtask with id {start_id} not found.\")\n        attributes = self._select_random_attributes(start_node.attribute_dict)\n        description = start_node.description.format(**attributes)\n        task_list.append((start_node, attributes, description))\n\n        current_node = start_node\n        for _ in range(max_iter - 1):\n            next_node = self.random_walk(\n                current_description=description,\n                start_node=current_node,\n                random_number=random_number,\n            )\n            if next_node is None:\n                break\n            task_list.append(next_node)\n            description = next_node[3]\n            current_node = next_node[0]\n        return description, task_list\n\n    @staticmethod\n    def generate_evaluator(\n        subtasks_graph: nx.DiGraph,\n    ):\n        \"\"\"\n        Generates an evaluator graph from a directed graph of subtask instances.\n\n        Parameters:\n            subtasks_graph (nx.DiGraph): A directed graph of subtask instances.\n\n        Returns:\n            nx.DiGraph: A directed graph representing the combined evaluator.\n        \"\"\"\n        evaluator_map = {}\n        for node in subtasks_graph.nodes:\n            evaluator_map[node.id] = node.task.evaluator_generator(**node.attribute)\n        combined_evaluator_graph = nx.union_all(list(evaluator_map.values()))\n        
for from_node, to_node in subtasks_graph.edges:\n            from_node_evaluator = evaluator_map[from_node.id]\n            sink_nodes = [\n                node\n                for node, out_degree in from_node_evaluator.out_degree()\n                if out_degree == 0\n            ]\n            to_node_evaluator = evaluator_map[to_node.id]\n            start_nodes = [\n                node\n                for node, in_degree in to_node_evaluator.in_degree()\n                if in_degree == 0\n            ]\n            combined_evaluator_graph.add_edges_from(\n                itertools.product(sink_nodes, start_nodes)\n            )\n        return combined_evaluator_graph\n\n    @staticmethod\n    def dump_generated_task(\n        description,\n        task_instance_graph,\n        dir_path=\".\",\n    ):\n        \"\"\"\n        Saves a generated task to a file.\n\n        Parameters:\n            description (str): The description of the generated task.\n            task_instance_graph (nx.DiGraph): The directed graph of the task instance.\n            dir_path (str): The directory path where the task file will be saved.\n        \"\"\"\n        mapping = {node: idx for idx, node in enumerate(task_instance_graph.nodes)}\n        id_graph = nx.relabel_nodes(task_instance_graph, mapping)\n\n        generated_task = GeneratedTask(\n            description=description,\n            tasks=list(task_instance_graph.nodes),\n            adjlist=\"\\n\".join(nx.generate_adjlist(id_graph)),\n        )\n        file_path = Path(dir_path) / f\"{generated_task.id}.json\"\n        with open(file_path, \"w\") as f:\n            f.write(generated_task.model_dump_json(indent=4))\n\n        print(\n            colored(\n                \"\\n====================================================================\\n\",\n                \"magenta\",\n            )\n        )\n        print(colored(f\"Task saved to: {file_path}\", \"magenta\"))\n\n    def get_task_from_file(self, 
file_name) -> Task:\n        \"\"\"\n        Loads a task from a file.\n\n        Parameters:\n            file_name (str): The file name containing the task data.\n\n        Returns:\n            Task: An instance of Task loaded from the file.\n        \"\"\"\n        with open(file_name, \"r\") as f:\n            config = json.load(f)\n        description = config[\"description\"]\n        graph_map = {}\n        for idx, task_config in enumerate(config[\"tasks\"]):\n            graph_map[idx] = SubTaskInstance(\n                task=self.task_mapping[task_config[\"task\"]],\n                attribute=task_config[\"attribute\"],\n                output=task_config[\"output\"],\n            )\n        lines = config[\"adjlist\"].split(\"\\n\")\n        # Parse as a DiGraph to preserve the edge directions of the dumped graph.\n        graph = nx.parse_adjlist(lines, nodetype=int, create_using=nx.DiGraph)\n        subtask_graph = nx.relabel_nodes(graph, graph_map)\n        evaluator = self.generate_evaluator(subtask_graph)\n\n        setup_set = set()\n        teardown_set = set()\n        extra_action_set = set()\n        for node in subtask_graph.nodes:\n            setup_set.update(node.task.setup)\n            teardown_set.update(node.task.teardown)\n            extra_action_set.update(node.task.extra_action)\n        return Task(\n            id=config[\"id\"],\n            description=description,\n            evaluator=evaluator,\n            setup=list(setup_set),\n            teardown=list(teardown_set),\n            extra_action=list(extra_action_set),\n        )\n\n\ndef load_subtasks(version):\n    \"\"\"\n    Loads subtasks from specified benchmark version modules.\n\n    Parameters:\n        version (str): The version of the benchmark to load subtasks from.\n\n    Returns:\n        tuple: A tuple containing two collections of subtasks.\n    \"\"\"\n    a_subtasks_module = importlib.import_module(\n        f\"benchmarks.crab-benchmark-{version}.subtasks.a_subtasks\"\n    )\n    u_subtasks_module = importlib.import_module(\n        
f\"benchmarks.crab-benchmark-{version}.subtasks.u_subtasks\"\n    )\n    return a_subtasks_module.collection, u_subtasks_module.collection\n\n\ndef generate_length1_all(\n    generator: TaskGenerator, dir_path: str, subtask_collection: list\n):\n    \"\"\"\n    Generates tasks for all subtasks in a collection and saves them.\n\n    Parameters:\n        generator (TaskGenerator): The task generator instance.\n        dir_path (str): The directory path where the tasks will be saved.\n        subtask_collection (list): The collection of subtasks to generate tasks for.\n    \"\"\"\n    for task in subtask_collection:\n        description, graph = generator.generate_single_node_task(task)\n        generator.dump_generated_task(description, graph, dir_path)\n        print(\n            colored(\n                \"\\n==================== Task Generation Completed ====================\\n\",\n                \"magenta\",\n            )\n        )\n\n\ndef generate_length1_by_id(generator: TaskGenerator, dir_path: str):\n    \"\"\"\n    Generates a single task for a specified subtask ID and saves it.\n\n    Parameters:\n        generator (TaskGenerator): The task generator instance.\n        dir_path (str): The directory path where the task will be saved.\n    \"\"\"\n    while True:\n        subtask_id = input(colored(\"Please input the subtask ID: \", \"yellow\"))\n        if subtask_id in generator.task_mapping:\n            task = generator.task_mapping[subtask_id]\n            print()\n            description, graph = generator.generate_single_node_task(task)\n            generator.dump_generated_task(description, graph, dir_path)\n            print(\n                colored(\n                    \"\\n==================== Task Generation Completed ====================\\n\",\n                    \"magenta\",\n                )\n            )\n        else:\n            print(colored(\"Invalid subtask ID. 
Please try again.\", \"red\"))\n\n\ndef generate_length2_manual(generator: TaskGenerator, dir_path: str):\n    \"\"\"\n    Manually generates a two-step task sequence from user-specified subtask IDs and saves it.\n\n    Parameters:\n        generator (TaskGenerator): The task generator instance.\n        dir_path (str): The directory path where the task sequence will be saved.\n    \"\"\"\n    while True:\n        sub_task_id_1 = input(\n            colored(\"Please input the id of the first subtask: \", \"yellow\")\n        )\n        sub_task_id_2 = input(\n            colored(\"Please input the id of the second subtask: \", \"yellow\")\n        )\n\n        if (\n            sub_task_id_1 in generator.task_mapping\n            and sub_task_id_2 in generator.task_mapping\n        ):\n            description, graph = generator.combine_two_subtasks(\n                sub_task_id_1=sub_task_id_1, sub_task_id_2=sub_task_id_2\n            )\n            generator.dump_generated_task(description, graph, dir_path)\n            print(\n                colored(\n                    \"\\n==================== Task Composition Completed ====================\\n\",\n                    \"magenta\",\n                )\n            )\n        else:\n            missing_ids = [\n                id\n                for id in [sub_task_id_1, sub_task_id_2]\n                if id not in generator.task_mapping\n            ]\n            print(\n                colored(\n                    f\"Invalid input: ID {', '.join(missing_ids)} not found. 
Please try again.\",\n                    \"red\",\n                )\n            )\n\n\ndef main():\n    parser = argparse.ArgumentParser(description=\"Task Generator for CRAB Benchmarks\")\n    parser.add_argument(\n        \"--version\", type=str, default=\"v0\", help=\"Benchmark version (e.g., v0, v1)\"\n    )\n    parser.add_argument(\n        \"--mode\",\n        type=str,\n        choices=[\n            \"generate_length1_all\",\n            \"generate_length2_manual\",\n            \"generate_length1_by_id\",\n        ],\n        help=\"Mode to run the task generator\",\n    )\n    parser.add_argument(\n        \"--dir_path\", type=str, help=\"Directory path to save the generated tasks\"\n    )\n    parser.add_argument(\n        \"--config_path\", type=str, help=\"Path to the task generation configuration file\"\n    )\n\n    args = parser.parse_args()\n\n    Path(args.dir_path).mkdir(parents=True, exist_ok=True)\n\n    a_collection, u_collection = load_subtasks(args.version)\n    all_collection = u_collection + a_collection\n\n    print(\n        colored(\n            \"\\n==================== Task Generation Starting ====================\\n\",\n            \"magenta\",\n        )\n    )\n    if args.mode == \"generate_length1_all\":\n        generator = TaskGenerator(subtasks=all_collection)\n        generate_length1_all(generator, args.dir_path, all_collection)\n    elif args.mode == \"generate_length2_manual\":\n        with open(args.config_path, \"r\") as f:\n            data = yaml.safe_load(f)\n        attribute_pool = data[\"attribute_pool\"]\n        generator = TaskGenerator(attribute_pool, all_collection)\n        generate_length2_manual(generator, args.dir_path)\n    elif args.mode == \"generate_length1_by_id\":\n        generator = TaskGenerator(subtasks=all_collection)\n        generate_length1_by_id(generator, args.dir_path)\n    else:\n        print(\n            colored(\n                \"Invalid mode selected. 
Please choose 'generate_length1_all', 'generate_length2_manual', or 'generate_length1_by_id'.\",\n                \"red\",\n            )\n        )\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "crab/environments/__init__.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n"
  },
  {
    "path": "crab/environments/template.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom crab.core import Environment, EnvironmentConfig, action\n\n\n@action\ndef set_state(value: bool, env: Environment) -> None:\n    \"\"\"\n    Set system state to the given value.\n\n    Args:\n        value (bool): The given value to set the system state.\n    \"\"\"\n    env.state = value\n\n\n@action\ndef current_state(env: Environment) -> bool:\n    \"\"\"\n    Get current system state.\n    \"\"\"\n    return env.state\n\n\ntemplate_environment_config = EnvironmentConfig(\n    name=\"template_env\",\n    action_space=[set_state],\n    observation_space=[current_state],\n    description=\"A test environment\",\n    info=None,\n    reset=set_state(False),\n)\n"
  },
  {
    "path": "crab/server/__init__.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n"
  },
  {
    "path": "crab/server/api.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport json\n\nfrom fastapi import APIRouter, Request\nfrom fastapi.responses import JSONResponse, PlainTextResponse\n\nfrom crab.utils import (\n    base64_to_callable,\n    decrypt_message,\n    encrypt_message,\n    generate_key_from_env,\n)\n\nfrom .logger import crab_logger as logger\n\napi_router = APIRouter()\n\n\n@api_router.post(\"/raw_action\")\nasync def raw_action(request: Request):\n    \"\"\"Perform the specified action with given parameters.\"\"\"\n    enc_key = generate_key_from_env()\n    # Extract query parameters as a dictionary\n    request_content = await request.body()\n    request_content = request_content.decode(\"utf-8\")\n    if enc_key is not None:\n        request_content = decrypt_message(request_content, enc_key)\n    request_json = json.loads(request_content)\n\n    action = request_json[\"action\"]\n    parameters = request_json[\"parameters\"]\n    entry = base64_to_callable(action[\"dumped_entry\"])\n    logger.info(f\"remote action: {action['name']} received. 
parameters: {parameters}\")\n    if \"env\" in action[\"kept_params\"]:\n        parameters[\"env\"] = request.app.environment\n\n    resp_data = {\"action_returns\": entry(**parameters)}\n    if enc_key is None:\n        return JSONResponse(content=resp_data)\n    else:\n        encrypted = encrypt_message(json.dumps(resp_data), enc_key)\n        return PlainTextResponse(content=encrypted)\n"
  },
  {
    "path": "crab/server/config.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport argparse\n\nfrom pydantic_settings import BaseSettings\n\n\nclass Settings(BaseSettings):\n    HOST: str = \"127.0.0.1\"\n    PORT: int = 8000\n    ENVIRONMENT: str = \"template_environment_config\"\n\n\nclass EnvSettings(BaseSettings):\n    DISPLAY: str = \":0\"\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(description=\"Application settings\")\n    parser.add_argument(\"--HOST\", type=str, help=\"Host of the application\")\n    parser.add_argument(\"--PORT\", type=int, help=\"Port of the application\")\n    parser.add_argument(\"--ENVIRONMENT\", type=str, help=\"Environment to be loaded\")\n    return parser.parse_args()\n"
  },
  {
    "path": "crab/server/exception_handlers.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport sys\n\nfrom fastapi import Request\nfrom fastapi.exception_handlers import (\n    request_validation_exception_handler as _request_validation_exception_handler,\n)\nfrom fastapi.exceptions import RequestValidationError\nfrom fastapi.responses import JSONResponse, PlainTextResponse\n\nfrom .logger import crab_logger as logger\n\n\nasync def request_validation_exception_handler(\n    request: Request, exc: RequestValidationError\n) -> JSONResponse:\n    \"\"\"\n    This is a wrapper to the default RequestValidationException handler of FastAPI.\n    This function will be called when client input is not valid.\n    \"\"\"\n    body = await request.body()\n    query_params = request.query_params._dict  # pylint: disable=protected-access\n    detail = {\n        \"errors\": exc.errors(),\n        \"body\": body.decode(),\n        \"query_params\": query_params,\n    }\n    logger.info(detail)\n    return await _request_validation_exception_handler(request, exc)\n\n\nasync def unhandled_exception_handler(\n    request: Request, exc: Exception\n) -> PlainTextResponse:\n    \"\"\"\n    This middleware will log all unhandled exceptions. 
Unhandled exceptions are\n    all exceptions that are not HTTPExceptions or RequestValidationErrors.\n    \"\"\"\n    host = getattr(getattr(request, \"client\", None), \"host\", None)\n    port = getattr(getattr(request, \"client\", None), \"port\", None)\n    url = (\n        f\"{request.url.path}?{request.query_params}\"\n        if request.query_params\n        else request.url.path\n    )\n    exception_type, exception_value, exception_traceback = sys.exc_info()\n    exception_name = getattr(exception_type, \"__name__\", None)\n    logger.error(\n        f'{host}:{port} - \"{request.method} {url}\" 500 Internal Server Error '\n        f\"<{exception_name}: {exception_value}>\"\n    )\n\n    return JSONResponse(\n        status_code=500,\n        content={\n            \"error\": \"Internal Server Error\",\n            \"message\": \"An unexpected error occurred.\",\n        },\n    )\n"
  },
  {
    "path": "crab/server/logger.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport logging\n\nuvicorn_logger = logging.getLogger(\"uvicorn\")\nuvicorn_logger.setLevel(logging.INFO)\n\ncrab_logger = logging.getLogger(\"crab-server\")\ncrab_logger.setLevel(logging.INFO)\n\nLOGGING_CONFIG = {\n    \"version\": 1,\n    \"disable_existing_loggers\": False,\n    \"formatters\": {\n        \"default\": {\n            \"()\": \"uvicorn.logging.DefaultFormatter\",\n            \"format\": \"[%(asctime)s %(process)d:%(threadName)s] %(name)s - \"\n            \"%(levelname)s - %(message)s | %(filename)s:%(lineno)d\",\n        },\n        \"logformat\": {\n            \"format\": \"[%(asctime)s %(process)d:%(threadName)s] %(name)s - \"\n            \"%(levelname)s - %(message)s | %(filename)s:%(lineno)d\"\n        },\n    },\n    \"handlers\": {\n        \"file_handler\": {\n            \"class\": \"logging.FileHandler\",\n            \"level\": \"INFO\",\n            \"formatter\": \"logformat\",\n            \"filename\": \"info.log\",\n            \"encoding\": \"utf8\",\n            \"mode\": \"a\",\n        },\n        \"default\": {\n            \"formatter\": \"default\",\n            \"class\": \"logging.StreamHandler\",\n            \"stream\": \"ext://sys.stderr\",\n        },\n    },\n    \"loggers\": {\n        
\"uvicorn.error\": {\n            \"level\": \"INFO\",\n            \"handlers\": [\"default\", \"file_handler\"],\n            \"propagate\": False,\n        }\n    },\n    \"root\": {\n        \"level\": \"INFO\",\n        \"handlers\": [\"default\", \"file_handler\"],\n        \"propagate\": False,\n    },\n}\n"
  },
  {
    "path": "crab/server/main.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport os\n\nimport uvicorn\nfrom fastapi import FastAPI\nfrom fastapi.exceptions import RequestValidationError\n\nfrom crab import EnvironmentConfig, create_environment\n\nfrom .api import api_router\nfrom .config import EnvSettings, Settings, parse_args\nfrom .exception_handlers import (\n    request_validation_exception_handler,\n    unhandled_exception_handler,\n)\nfrom .logger import LOGGING_CONFIG\nfrom .middleware import log_request_middleware\nfrom .utils import get_benchmarks_environments\n\n\ndef init(environment_config: EnvironmentConfig) -> FastAPI:\n    app = FastAPI(title=\"Desktop Agent Benchmark Environment Server\")\n\n    app.middleware(\"http\")(log_request_middleware)\n    app.add_exception_handler(\n        RequestValidationError, request_validation_exception_handler\n    )\n    app.add_exception_handler(Exception, unhandled_exception_handler)\n    app.include_router(api_router)\n\n    app.environment = create_environment(environment_config)\n    return app\n\n\nif __name__ == \"__main__\":\n    env_settings = EnvSettings()\n    for field in env_settings.model_fields.keys():\n        value = getattr(env_settings, field)\n        os.environ[field] = value\n\n    args = parse_args()\n    kwargs = {k: v for k, v in 
vars(args).items() if v is not None}\n    settings = Settings(**kwargs)\n\n    benchmarks, environments = get_benchmarks_environments()\n    app = init(environment_config=environments[settings.ENVIRONMENT])\n\n    app.server_settings = settings\n    uvicorn.run(\n        app,\n        host=settings.HOST,\n        port=settings.PORT,\n        access_log=False,\n        log_config=LOGGING_CONFIG,\n    )\n"
  },
  {
    "path": "crab/server/middleware.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport http\nimport time\n\nfrom fastapi import Request\n\nfrom .logger import uvicorn_logger as logger\n\n\nasync def log_request_middleware(request: Request, call_next):\n    \"\"\"\n    This middleware will log all requests and their processing time.\n    E.g. log:\n    0.0.0.0:1234 - GET /ping 200 OK 1.00ms\n    \"\"\"\n    url = (\n        f\"{request.url.path}?{request.query_params}\"\n        if request.query_params\n        else request.url.path\n    )\n    start_time = time.time()\n    response = await call_next(request)\n    process_time = (time.time() - start_time) * 1000\n    formatted_process_time = \"{0:.2f}\".format(process_time)\n    host = getattr(getattr(request, \"client\", None), \"host\", None)\n    port = getattr(getattr(request, \"client\", None), \"port\", None)\n    try:\n        status_phrase = http.HTTPStatus(response.status_code).phrase\n    except ValueError:\n        status_phrase = \"\"\n    logger.info(\n        f'{host}:{port} - \"{request.method} {url}\" {response.status_code} '\n        f\"{status_phrase} {formatted_process_time}ms\"\n    )\n    return response\n"
  },
  {
    "path": "crab/server/utils.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport importlib\nimport inspect\nimport pkgutil\n\n\ndef get_instances(package, class_type):\n    instance_dict = {}\n    # Iterate through all modules in the specified package\n    for _, name, ispkg in pkgutil.iter_modules(\n        package.__path__, package.__name__ + \".\"\n    ):\n        if ispkg:\n            continue  # Skip subpackages\n        module = importlib.import_module(name)\n        for name, obj in inspect.getmembers(module):\n            if isinstance(obj, class_type):\n                instance_dict[name] = obj\n    return instance_dict\n\n\ndef get_benchmarks_environments():\n    from crab import BenchmarkConfig, EnvironmentConfig, benchmarks, environments\n\n    benchmark_configs = get_instances(benchmarks, BenchmarkConfig)\n    environment_configs = get_instances(environments, EnvironmentConfig)\n\n    return benchmark_configs, environment_configs\n"
  },
  {
    "path": "crab/utils/__init__.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n\nfrom crab.utils.common import (\n    base64_to_callable,\n    base64_to_image,\n    callable_to_base64,\n    image_to_base64,\n)\nfrom crab.utils.encryption import (\n    decrypt_message,\n    encrypt_message,\n    generate_key_from_env,\n)\n\n__all__ = [\n    \"base64_to_image\",\n    \"image_to_base64\",\n    \"callable_to_base64\",\n    \"base64_to_callable\",\n    \"decrypt_message\",\n    \"encrypt_message\",\n    \"generate_key_from_env\",\n]\n"
  },
  {
    "path": "crab/utils/common.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport base64\nfrom io import BytesIO\nfrom typing import Callable\n\nimport dill\nfrom PIL import Image\n\n\ndef base64_to_image(encoded: str) -> Image.Image:\n    return Image.open(BytesIO(base64.b64decode(encoded)))\n\n\ndef image_to_base64(image: Image.Image) -> str:\n    img_byte_arr = BytesIO()\n    image.save(img_byte_arr, format=\"png\")\n    return base64.b64encode(img_byte_arr.getvalue()).decode(\"utf-8\")\n\n\ndef callable_to_base64(func: Callable) -> str:\n    return base64.b64encode(dill.dumps(func, recurse=True)).decode(\"utf-8\")\n\n\ndef base64_to_callable(encoded: str) -> Callable:\n    return dill.loads(base64.b64decode(encoded))\n\n\ndef json_expand_refs(schema: dict | list, defs: dict | None = None):\n    \"\"\"Recursively expand `$ref` and `allOf` in the JSON.\n\n    This function walks through the schema object, replacing any `$ref` with its\n    corresponding definition found in `$defs`. It also expands subschemas defined in\n    `allOf` by merging their resolved definitions into a single schema.\n\n    Args:\n        schema: The JSON schema (or sub-schema).\n        defs: The collection of definitions for `$ref` expansion. 
If None, it will look\n            for `$defs` at the root of the schema.\n\n    Returns:\n        The schema with all `$ref` and `allOf` expanded.\n\n    Raises:\n        ValueError: If a reference cannot be resolved with the provided `$defs`.\n    \"\"\"\n    # If defs is None, it means we're at the root of the schema\n    if defs is None:\n        defs = schema.pop(\"$defs\", {})\n\n    if isinstance(schema, dict):\n        # Process `$ref` by replacing it with the referenced definition\n        if \"$ref\" in schema:\n            ref_path = schema[\"$ref\"].split(\"/\")\n            ref_name = ref_path[-1]\n            if ref_name in defs:\n                return json_expand_refs(defs[ref_name], defs)\n            else:\n                raise ValueError(f\"Reference {schema['$ref']} not found in $defs.\")\n\n        # Process `allOf` by combining all subschemas\n        elif \"allOf\" in schema:\n            combined_schema = {}\n            for subschema in schema[\"allOf\"]:\n                expanded_subschema = json_expand_refs(subschema, defs)\n                # Merge the expanded subschema into the combined_schema\n                for key, value in expanded_subschema.items():\n                    combined_schema[key] = value\n            return combined_schema\n\n        # Recursively process all keys in the dictionary\n        else:\n            return {key: json_expand_refs(value, defs) for key, value in schema.items()}\n\n    elif isinstance(schema, list):\n        # Recursively process each item in the list\n        return [json_expand_refs(item, defs) for item in schema]\n\n    # If it's neither a dict nor a list, return it as is (e.g., int, str)\n    return schema\n"
  },
  {
    "path": "crab/utils/encryption.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport base64\nimport hashlib\nimport logging\nimport os\nfrom typing import Optional\n\nfrom cryptography.hazmat.backends import default_backend\nfrom cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes\n\nlogger = logging.getLogger(\"encryption\")\n\n\ndef encrypt_message(plaintext: str, key: bytes) -> str:\n    \"\"\"Encrypts a message using a key with AES 256 encryption.\n\n    Args:\n        plaintext (str): The message to encrypt.\n        key (bytes): The encryption key, should be 256 bits.\n\n    Returns:\n        str: The encrypted message encoded in base64.\n    \"\"\"\n    nounce = os.urandom(12)\n    cipher = Cipher(algorithms.AES(key), modes.GCM(nounce), backend=default_backend())\n    encryptor = cipher.encryptor()\n    ciphertext = encryptor.update(plaintext.encode()) + encryptor.finalize()\n    return base64.b64encode(nounce + ciphertext + encryptor.tag).decode(\"utf-8\")\n\n\ndef decrypt_message(encrypted: str, key: bytes) -> str:\n    \"\"\"Decrypts an encrypted message using a key with AES 256 encryption.\n\n    Args:\n        encrypted (str): The encrypted message encoded in base64.\n        key (bytes): The encryption key, should be 256 bits.\n\n    Returns:\n        str: The decrypted message.\n    \"\"\"\n   
 encrypted = base64.b64decode(encrypted)\n    nounce = encrypted[:12]\n    ciphertext = encrypted[12:-16]\n    tag = encrypted[-16:]\n    cipher = Cipher(\n        algorithms.AES(key), modes.GCM(nounce, tag), backend=default_backend()\n    )\n    decryptor = cipher.decryptor()\n    return (decryptor.update(ciphertext) + decryptor.finalize()).decode(\"utf-8\")\n\n\ndef generate_key_from_env() -> Optional[bytes]:\n    \"\"\"Generate the encryption key from the environment variable `CRAB_ENC_KEY`.\n\n    Returns:\n        Optional[bytes]: The encryption key. If the environment variable is not set or\n            empty, return None.\n    \"\"\"\n    enc_key = os.environ.get(\"CRAB_ENC_KEY\")\n    # don't encrypt as long as the key is an empty value\n    if not enc_key:\n        logger.warning(\"CRAB_ENC_KEY is not set, connection will not be encrypted.\")\n        return None\n\n    return hashlib.sha256(enc_key.encode(\"utf-8\")).digest()\n"
  },
  {
    "path": "crab/utils/measure.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport logging\nimport time\nfrom functools import wraps\n\nlogger = logging.getLogger(__name__)\n\n\n# Misc logger setup so a debug log statement gets printed on stdout.\nhandler = logging.StreamHandler()\nlog_format = \"%(asctime)s %(levelname)s -- %(message)s\"\nformatter = logging.Formatter(log_format)\nhandler.setFormatter(formatter)\nlogger.addHandler(handler)\n\n\ndef timed(func):\n    \"\"\"This decorator prints the execution time for the decorated function.\"\"\"\n\n    @wraps(func)\n    def wrapper(*args, **kwargs):\n        start = time.time()\n        result = func(*args, **kwargs)\n        end = time.time()\n        func_class = args[0].__class__.__name__ if args else \"\"\n        info = \"{}.{} ran in {}s\".format(\n            func_class,\n            func.__name__,\n            round(end - start, 2),\n        )\n        if hasattr(args[0], \"name\"):\n            info += f\" with name {args[0].name}\"\n        logger.info(info)\n        return result\n\n    return wrapper\n"
  },
  {
    "path": "crab-benchmark-v0/README.md",
    "content": "# Crab Benchmark v0\n\n## Overview\n\n`crab-benchmark-v0` is a benchmark released with the crab framework to provide a standard usage. It includes two virtual machine environments: an Android smartphone and an Ubuntu desktop computer, with 100 tasks and 59 different evaluator functions in the dataset. It effectively evaluates the MLM-based agents' performance on operating real-world tasks across multiple platforms.\n\n## Get Started\n\nOur benchmark contains two important parts: **Environments** and **Tasks**.\n\n#### Environments\n\nSince our Ubuntu environment is built upon KVM, setting it up locally requires you an experienced Linux user to deal with many small and miscellaneous issues. Therefore, we provide two environment setup methods:\n\n* [Local setup](./docs/environment_local_setup.md) provides you a step-by-step guideline to build environments on a Linux Machine with **at least one monitor and 32G memory**, but it doesn't cover details like how to install KVM on your machine because they are various on different Linux distros.\n* For those who want a quicker setup, we also provide a setup through [Google Clould Platform](./docs/environment_gcp_setup.md). Specifically, we publish a disk image contains all required software and configurations on google cloud, you can use your own google account to create a cloud computer through this disk image and use [google remote desktop](https://remotedesktop.google.com/access/) to connect to it. This method doesn't have any hardware limitations and when you set it up you can run the experiment immediately. As a tradeoff, the cloud computer that meets the minimum hardware requirement costs around $0.4 per hour (depend on the machine zone).\n\nWe connect to the Android environment via ADB, so any Android device, from an emulator to a physical smartphone, will work. You should ensure ADB is installed on your system and can be directly called through the command line. 
In our experiment, we used the built-in emulator of [Android Studio](https://developer.android.com/studio) to create a Google Pixel 8 Pro virtual device with the release name \\textit{R} and installed necessary extra Apps.\n\n#### Tasks\n\nWe manage our task dataset using a CRAB-recommended method. Sub-tasks are defined through Pydantic models written in Python code, and composed tasks are defined in JSON format, typically combining several sub-tasks. The sub-tasks are defined in [android_subtasks](./dataset/android_subtasks.py) and [ubuntu_subtasks](./dataset/ubuntu_subtasks.py). The JSON files storing composed tasks are categorized into [android](./dataset/android/), [ubuntu](./dataset/ubuntu/), and [cross-platform](./dataset/cross/). The tasks in android and ubuntu directories are single-environment task and those in cross directory are cross-environment tasks. Additionally, we create several tasks by hand instead of composing sub-tasks to provide semantically more meaningful tasks, which are found in [handmade tasks](./dataset/handmade_tasks.py).\n\n## Experiment\n\nAfter setting up the environment, you can start the experiment. A brief overview of the experiment is as follows:\n\n1. Open the Ubuntu environment virtual machine and the Android environment emulator.\n2. Start the CRAB server in the Ubuntu environment and get its IP address and port. Let's say they are `192.168.122.72` and `8000`.\n3. Choose a task. As an example, we take the task with ID `a3476778-e512-40ca-b1c0-d7aab0c7f18b` from [handmade_tasks](./dataset/handmade_tasks.py). The task is: \"Open the 'Tasks' app on Android, check the first incomplete task, then perform the task according to its description.\"\n4. Run [main.py](./main.py) with the command `poetry run python -m crab-benchmark-v0.main --model gpt4o --policy single --remote-url http://192.168.122.72:8000 --task-id a3476778-e512-40ca-b1c0-d7aab0c7f18b`. 
In this command, `--model gpt4o` and `--policy single` determine the agent system, `--remote-url` specifies the Ubuntu environment interface, and `--task-id` indicates the task to be performed.\n\n#### Model\n\nFor open source models, we use [VLLM](https://github.com/vllm-project/vllm) to host Pixtral model, check [here](https://docs.vllm.ai/en/latest/models/vlm.html#online-inference) for the setup commands; [SGLang](https://github.com/sgl-project/sglang) to host LLaVa-OneVision model, check [here](https://github.com/sgl-project/sglang?tab=readme-ov-file#supported-models) for the setup commands."
  },
  {
    "path": "crab-benchmark-v0/__init__.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n"
  },
  {
    "path": "crab-benchmark-v0/android_env.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom crab import EnvironmentConfig\nfrom crab.actions.android_actions import (\n    key_press,\n    long_tap,\n    open_app_drawer,\n    screenshot,\n    setup,\n    swipe,\n    tap,\n    write_text,\n)\n\nANDROID_ENV = EnvironmentConfig(\n    name=\"android\",\n    action_space=[tap, key_press, long_tap, write_text, swipe, open_app_drawer],\n    observation_space=[screenshot],\n    description=\"\"\"A Google Pixel smartphone runs on the Android operating system. \\\nThe interface displays a current screenshot at each step and primarily \\\nsupports interaction through tapping and typing. This device offers a suite \\\nof standard applications including Phone, Photos, Camera, Chrome, and \\\nCalendar, among others. Access the app drawer to view all installed \\\napplications on the device. The Google account is pre-logged in, synchronized \\\nwith the same account used in the Ubuntu environment.\"\"\",\n    extra_attributes={\"device\": None},\n    reset=setup,\n)\n"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/1005c437-50d1-465a-b3fc-833098b22bfc.json",
    "content": "{\n    \"description\": \"In the Android operating system, use the \\\"Google Map\\\" app to find the city name corresponding to the postal code \\\"63002\\\" in South Korea, then use the \\\"Calendar\\\" app to add a new all-day event for 1 January 2025 with the text of the found city name.\",\n    \"tasks\": [\n        {\n            \"task\": \"51b2463c-9904-4a32-81ba-507bfb89d61f\",\n            \"attribute\": {\n                \"number\": \"63002\",\n                \"country\": \"South Korea\"\n            },\n            \"output\": \"Jeju\"\n        },\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548ac\",\n            \"attribute\": {\n                \"content\": \"Jeju\",\n                \"date\": \"1 January 2025\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"1005c437-50d1-465a-b3fc-833098b22bfc\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/12333aa0-e76d-4a5c-8657-9f897f62f62d.json",
    "content": "{\n    \"description\": \"In Android, use the \\\"Google Map\\\" app to find the city name for the postal code \\\"2770885\\\" in Japan, and then, using the \\\"Keep Notes\\\" app, create a new note without a title to record the city name you found.\",\n    \"tasks\": [\n        {\n            \"task\": \"51b2463c-9904-4a32-81ba-507bfb89d61f\",\n            \"attribute\": {\n                \"number\": \"2770885\",\n                \"country\": \"Japan\"\n            },\n            \"output\": \"Chiba\"\n        },\n        {\n            \"task\": \"eb92a1e6-4c86-4d56-baac-95fc8397732e\",\n            \"attribute\": {\n                \"content\": \"Chiba\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"12333aa0-e76d-4a5c-8657-9f897f62f62d\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/22b04776-8eec-4303-b3f6-9c981f7f29b8.json",
    "content": "{\n    \"description\": \"In Android, Using \\\"Setting\\\" app, rename the device name of bluetooth as \\\"Sydney\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548an\",\n            \"attribute\": {\n                \"content\": \"Sydney\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"22b04776-8eec-4303-b3f6-9c981f7f29b8\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/2ade6a13-c7a6-4df7-8c62-77382687369e.json",
    "content": "{\n    \"description\": \"In Android, using the \\\"Contacts\\\" app, find the email of the contact named John Lauphin, then using the \\\"Gmail\\\" app, send an email to that contact with the subject \\\"Hello John.\\\"\",\n    \"tasks\": [\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548ap\",\n            \"attribute\": {\n                \"name\": \"John Lauphin\"\n            },\n            \"output\": \"crabbb@gmail.com\"\n        },\n        {\n            \"task\": \"0090f116-e02b-4562-a20d-b5df38be963a\",\n            \"attribute\": {\n                \"content\": \"Hello John\",\n                \"mail\": \"crabbb@gmail.com\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"2ade6a13-c7a6-4df7-8c62-77382687369e\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/346caf7c-dc74-4c38-962a-aaffb638e0c7.json",
    "content": "{\n    \"description\": \"In Android, Using \\\"Calendar\\\" app, add a new task with text \\\"meeting\\\" in date \\\"June 5th 2024\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548ac\",\n            \"attribute\": {\n                \"content\": \"meeting\",\n                \"date\": \"05 June 2024\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"346caf7c-dc74-4c38-962a-aaffb638e0c7\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/379b9c58-5125-41b3-9cc6-ea925c8b094d.json",
    "content": "{\n    \"description\": \"In Android, Using Google Map app, Find the city name of corresponding post code \\\"560049\\\" in the country \\\"India\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"51b2463c-9904-4a32-81ba-507bfb89d61f\",\n            \"attribute\": {\n                \"country\": \"India\",\n                \"number\": \"560049\"\n            },\n            \"output\": \"Bengaluru\"\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"379b9c58-5125-41b3-9cc6-ea925c8b094d\"\n}\n"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/4190c90c-b28c-4bb3-ab5c-af3c4fde0a3d.json",
    "content": "{\n    \"description\": \"In Android, Using Google Map app, Find the city name of corresponding post code \\\"1010021\\\" in the country \\\"Japan\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"51b2463c-9904-4a32-81ba-507bfb89d61f\",\n            \"attribute\": {\n                \"country\": \"Japan\",\n                \"number\": \"101-0021\"\n            },\n            \"output\": \"Tokyo\"\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"4190c90c-b28c-4bb3-ab5c-af3c4fde0a3d\"\n}\n"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/46d7ccdb-d2e4-4b8a-bead-f2641b5ac23c.json",
    "content": "{\n    \"description\": \"In Android, Using \\\"Contacts\\\" app, add a contact with a mail \\\"{mail}\\\" with a name \\\"{name}\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548ag\",\n            \"attribute\": {\n                \"mail\": \"abcdcly@qq.com\",\n                \"name\": \"John Haruhimiya\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"46d7ccdb-d2e4-4b8a-bead-f2641b5ac23c\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/483fbf9c-dc78-4ac2-9264-53c4f617f6cc.json",
    "content": "{\n    \"description\": \"Open the calendar app in the Android system and find the title of an event on the date \\\"17 August 2024,\\\" then using the \\\"Google Drive\\\" app on the same Android device, create a new folder with the founded name\",\n    \"tasks\": [\n        {\n            \"task\": \"2394b768-2ca7-45e9-b41e-2aa4e9573192\",\n            \"attribute\": {\n                \"date\": \"17 August 2024\"\n            },\n            \"output\": \"Travel to Paris\"\n        },\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548ar\",\n            \"attribute\": {\n                \"content\": \"Travel to Paris\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"483fbf9c-dc78-4ac2-9264-53c4f617f6cc\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/4893a9b0-6477-495d-a73c-32503326e24a.json",
    "content": "{\n    \"description\": \"In the Android system, use the calendar app to find the title of an event on the date \\\"16 July 2024,\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"2394b768-2ca7-45e9-b41e-2aa4e9573192\",\n            \"attribute\": {\n                \"date\": \"16 July 2024\"\n            },\n            \"output\": \"Japan\"\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"4893a9b0-6477-495d-a73c-32503326e24a\"\n}\n"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/53010c40-dce4-4d72-a856-842c21059e2b.json",
    "content": "{\n    \"description\": \"In the Android system, use the calendar app to find the title of an event on the date \\\"16 July 2024,\\\" then, using the Google Map app, find the city name of the corresponding post code \\\"113-8654\\\" in the country with same name as title.\",\n    \"tasks\": [\n        {\n            \"task\": \"2394b768-2ca7-45e9-b41e-2aa4e9573192\",\n            \"attribute\": {\n                \"date\": \"16 July 2024\"\n            },\n            \"output\": \"Japan\"\n        },\n        {\n            \"task\": \"51b2463c-9904-4a32-81ba-507bfb89d61f\",\n            \"attribute\": {\n                \"number\": \"113-8654\",\n                \"country\": \"Japan\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"53010c40-dce4-4d72-a856-842c21059e2b\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/6d9f6395-de79-4ad0-8a2a-2d674f93f293.json",
    "content": "{\n    \"description\": \"In Android, Using \\\"Clock\\\" app, set the time of \\\"London\\\" in the clock, check the time gap between the city and current city.\",\n    \"tasks\": [\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548ah\",\n            \"attribute\": {\n                \"place_name\": \"London\"\n            },\n            \"output\": \"7 hours behind\"\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"test_finished\":\"1\",\n    \"id\": \"6d9f6395-de79-4ad0-8a2a-2d674f93f293\"\n}\n"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/71ef7fd2-0ae3-49c8-8238-06b7aa985d25.json",
    "content": "{\n    \"description\": \"Using the \\\"Google Map\\\" app on Android, find the distance of the shortest route from \\\"National University of Singapore\\\" to \\\"Nanyang Technology University,\\\" then using the \\\"Calendar\\\" app, add a new event with the text representing the found distance on the date 21 June 2024 as an all-day event.\",\n    \"tasks\": [\n        {\n            \"task\": \"1a1b72d7-78c9-4027-8278-86083ae01045\",\n            \"attribute\": {\n                \"place_name_1\": \"National University of Singapore\",\n                \"place_name_2\": \"Nanyang Technology University\"\n            },\n            \"output\": \"13km\"\n        },\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548ac\",\n            \"attribute\": {\n                \"content\": \"13km\",\n                \"date\": \"21 June 2024\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"71ef7fd2-0ae3-49c8-8238-06b7aa985d25\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/73f78fc3-1ca5-442d-801f-bc175a0bfb89.json",
    "content": "{\n    \"description\": \"In Android, using \\\"Google Map\\\" App, find the distance of the shortest route from \\\"Southern University of Science and Technology\\\" to \\\"Lianhuashan Park\\\"\",\n    \"tasks\": [\n        {\n            \"task\": \"1a1b72d7-78c9-4027-8278-86083ae01045\",\n            \"attribute\": {\n                \"place_name_1\": \"Southern University of Science and Technology\",\n                \"place_name_2\": \"Lianhuashan Park\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"73f78fc3-1ca5-442d-801f-bc175a0bfb89\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/764838cc-9359-4130-9bb2-4a75900b2d89.json",
    "content": "{\n    \"description\": \"In Android, call \\\"123456789\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"955d8773-dd7a-4072-b87c-7e546be7de4e\",\n            \"attribute\": {\n                \"number\": \"123456789\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"764838cc-9359-4130-9bb2-4a75900b2d89\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/77289141-e52b-48c8-b3a7-1b29520f3e1e.json",
    "content": "{\n    \"description\": \"In Android, Using \\\"Contacts\\\" app, find out the mail of contact named \\\"John Haruhimiya\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548ap\",\n            \"attribute\": {\n                \"name\": \"John Haruhimiya\"\n            },\n            \"output\": \"abcdcly@qq.com\"  \n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"77289141-e52b-48c8-b3a7-1b29520f3e1e\"\n} "
  },
  {
    "path": "crab-benchmark-v0/dataset/android/7891ceab-7965-4ddb-a0fc-15740c9a4e44.json",
    "content": "{\n    \"description\": \"In Android, Using \\\"Google Map\\\" app, find the city name of corresponding post code \\\"560049\\\" in the country \\\"India\\\". Creat a folder with the city name in  \\\"Google Drive \\\" app\",\n    \"tasks\": [\n        {\n            \"task\": \"51b2463c-9904-4a32-81ba-507bfb89d61f\",\n            \"attribute\": {\n                \"country\": \"India\",\n                \"number\": \"560049\"\n            },\n            \"output\": \"Bengaluru\"\n        },\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548ar\",\n            \"attribute\": {\n                \"content\": \"Bengaluru\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"7891ceab-7965-4ddb-a0fc-15740c9a4e44\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/8bd51440-f959-4edc-baa5-cd03d32a5b0f.json",
    "content": "{\n    \"description\": \"In Android, use the \\\"Google Map\\\" app to find the address of the University of Sydney, then using the \\\"Gmail\\\" app, send a message to crabbb@gmail.com with the found address.\",\n    \"tasks\": [\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548aw\",\n            \"attribute\": {\n                \"content\": \"The University of Sydney\"\n            },\n            \"output\": \"Camperdown NSW 2050 Australia\"\n        },\n        {\n            \"task\": \"0090f116-e02b-4562-a20d-b5df38be963a\",\n            \"attribute\": {\n                \"content\": \"Camperdown NSW 2050 Australia\",\n                \"mail\": \"crabbb@gmail.com\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"8bd51440-f959-4edc-baa5-cd03d32a5b0f\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/94b1836b-3111-40ad-8d07-b8a57efe7438.json",
    "content": "{\n    \"description\": \"In an Android system, use the calendar app to find the title of an event on the date \\\"9 August 2024\\\", and then, using the Gmail app, send an email to crabbb@gmail.com with the event title as message.\",\n    \"tasks\": [\n        {\n            \"task\": \"2394b768-2ca7-45e9-b41e-2aa4e9573192\",\n            \"attribute\": {\n                \"date\": \"9 August 2024\"\n            },\n            \"output\": \"National Day of Singapore would be a public holiday\"\n        },\n        {\n            \"task\": \"0090f116-e02b-4562-a20d-b5df38be963a\",\n            \"attribute\": {\n                \"content\": \"National Day of Singapore would be a public holiday\",\n                \"mail\": \"crabbb@gmail.com\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"94b1836b-3111-40ad-8d07-b8a57efe7438\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/a225f7f8-6d03-4619-b57d-7a08610030d8.json",
    "content": "{\n    \"description\": \"In Android, Using \\\"Google Map\\\" app, Find the address of \\\"University of Oxford\\\" and send \\\"98801234\\\" the address using \\\"message\\\" App. \",\n    \"tasks\": [\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548aw\",\n            \"attribute\": {\n                \"content\": \"University of Oxford\"\n            },\n            \"output\": \"Wellington Square, Oxford OX1 2JD, United Kingdom\"\n        },\n        {\n            \"task\": \"caa29623-1811-402d-963a-19f7eecc63d8\",\n            \"attribute\": {\n                \"content\": \"Wellington Square, Oxford OX1 2JD, United Kingdom\",\n                \"number\": \"98801234\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"a225f7f8-6d03-4619-b57d-7a08610030d8\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/b077299d-1acb-40f5-89f3-cc08044345bf.json",
    "content": "{\n    \"description\": \"Using \\\"Tasks\\\" app, add a new task with text \\\"Watch camel tutorial video\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548af\",\n            \"attribute\": {\n                \"content\": \"Watch camel tutorial video\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"b077299d-1acb-40f5-89f3-cc08044345bf\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/b3965b07-4683-4445-9de1-a1dedf6c73ad.json",
    "content": "{\n    \"description\": \"In Android, Using \\\"Google Map\\\" app, Find the address of \\\"University of Oxford\\\" and send \\\"abcdcly@qq.com\\\" the address using \\\"Gmail\\\" App. \",\n    \"tasks\": [\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548aw\",\n            \"attribute\": {\n                \"content\": \"University of Oxford\"\n            },\n            \"output\": \"Wellington Square, Oxford OX1 2JD, United Kingdom\"\n        },\n        {\n            \"task\": \"0090f116-e02b-4562-a20d-b5df38be963a\",\n            \"attribute\": {\n                \"content\": \"Wellington Square, Oxford OX1 2JD, United Kingdom\",\n                \"mail\": \"abcdcly@qq.com\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"b3965b07-4683-4445-9de1-a1dedf6c73ad\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/c1b1cfeb-40e7-49a8-a3f5-b8c8ba723601.json",
    "content": "{\n    \"description\": \"In Android, Using \\\"Google Drive\\\" app, create a new folder named \\\"Journey\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548ar\",\n            \"attribute\": {\n                \"content\": \"Journey\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"c1b1cfeb-40e7-49a8-a3f5-b8c8ba723601\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/c85f03c9-83c4-417b-93d9-0d7b41022525.json",
    "content": "{\n    \"description\": \"In android system, use the calendar app, find the title of an event in the date \\\"15 June, 2024\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"2394b768-2ca7-45e9-b41e-2aa4e9573192\",\n            \"attribute\": {\n                \"date\": \"15 June 2024\"\n            },\n            \"output\": \"EMNLP ddl\"\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"c85f03c9-83c4-417b-93d9-0d7b41022525\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/cf4c496b-fbbd-4701-91ea-4590fe6a66e1.json",
    "content": "{\n    \"description\": \"In Android, use the \\\"Google Map\\\" app to find the city name corresponding to the postcode \\\"110151\\\" in Colombia, then use the \\\"Clock\\\" app to set the time of that city in the clock and check the time gap between that city and your current city.\",\n    \"tasks\": [\n        {\n            \"task\": \"51b2463c-9904-4a32-81ba-507bfb89d61f\",\n            \"attribute\": {\n                \"number\": \"110151\",\n                \"country\": \"Columbia\"\n            },\n            \"output\": \"Bogota\"\n        },\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548ah\",\n            \"attribute\": {\n                \"place_name\": \"Bogota\"\n            },\n            \"output\": \"-5h\"\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"cf4c496b-fbbd-4701-91ea-4590fe6a66e1\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/d0811e47-d75f-40ce-b34b-e1ee3c8bed3f.json",
    "content": "{\n    \"description\": \"In Android, first use the \\\"Files\\\" app to find the creation date of the file /Movies/movie_list.txt, then use the \\\"Calendar\\\" app to add a new event titled \\\"Public Talking\\\" scheduled for all day on the founded day.\",\n    \"tasks\": [\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548ak\",\n            \"attribute\": {\n                \"file_path\": \"/Movies/movie_list.txt\"\n            },\n            \"output\": \"4 June 2024\"\n        },\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548ac\",\n            \"attribute\": {\n                \"content\": \"Public Talking\",\n                \"date\": \"4 June 2024\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"d0811e47-d75f-40ce-b34b-e1ee3c8bed3f\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/d2d456bb-c7d1-46af-8263-78d8509fb320.json",
    "content": "{\n    \"description\": \"In Android, using \\\"Gmail\\\" App, send \\\"abcdcly@qq.com\\\" a message \\\"Hello, nice to meet you!\\\"\",\n    \"tasks\": [\n        {\n            \"task\": \"0090f116-e02b-4562-a20d-b5df38be963a\",\n            \"attribute\": {\n                \"content\": \"Hello, nice to meet you!\",\n                \"mail\": \"abcdcly@qq.com\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"d2d456bb-c7d1-46af-8263-78d8509fb320\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/d4e0f2b3-d0ff-4efd-856f-9f5e598cfd05.json",
    "content": "{\n    \"description\": \"In Android, Using \\\"Google Map\\\" app, Find the address of \\\"University of Oxford\\\"\",\n    \"tasks\": [\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548aw\",\n            \"attribute\": {\n                \"content\": \"University of Oxford\"\n            },\n            \"output\": \"Wellington Square, Oxford OX1 2JD, United Kingdom\"\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"d4e0f2b3-d0ff-4efd-856f-9f5e598cfd05\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/d7489d00-0046-4fb1-af5b-1fde7d87312c.json",
    "content": "{\n    \"description\": \"In Android, open the \\\"Contacts\\\" app to find the email address of the contact named Karoon Wei, then use the \\\"Tasks\\\" app to add a new task with the email address.\",\n    \"tasks\": [\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548ap\",\n            \"attribute\": {\n                \"name\": \"Karoon Wei\"\n            },\n            \"output\": \"karroonw@gmail.com\"\n        },\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548af\",\n            \"attribute\": {\n                \"content\": \"karroonw@gmail.com\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"d7489d00-0046-4fb1-af5b-1fde7d87312c\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/d92f6c33-e0a7-4101-957d-e7dd218d2565.json",
    "content": "{\n    \"description\": \"Using the \\\"Files\\\" app on an Android device, locate the file /Movies/movie_list.txt and determine its creation date, then use the Task app in the same Android system to find the title of an event scheduled for the days.\",\n    \"tasks\": [\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548ak\",\n            \"attribute\": {\n                \"file_path\": \"/Movies/movie_list.txt\"\n            },\n            \"output\": \"4 June 2024\"\n        },\n        {\n            \"task\": \"2394b768-2ca7-45e9-b41e-2aa4e9573192\",\n            \"attribute\": {\n                \"date\": \"4 June 2024\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"d92f6c33-e0a7-4101-957d-e7dd218d2565\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/de843952-df8f-4a26-bae9-d0a32ed9a7f5.json",
    "content": "{\n    \"description\": \"In Android, Using \\\"Files\\\" app, find the create date of \\\"Downloads/meow.jpg\\\" in the sdk system.\",\n    \"tasks\": [\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548ak\",\n            \"attribute\": {\n                \"file_path\": \"Download/meow.jpg.webp\"\n            },\n            \"output\": \"May 28\"\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"de843952-df8f-4a26-bae9-d0a32ed9a7f5\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/e20fd121-b981-42da-94de-efcd66889c11.json",
    "content": "{\n    \"description\": \"In Android, using \\\"Messages\\\", send \\\"The meeting starts from 10am today\\\" to \\\"123456789\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"caa29623-1811-402d-963a-19f7eecc63d8\",\n            \"attribute\": {\n                \"content\": \"The meeting starts from 10am today\",\n                \"number\": \"123456789\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"e20fd121-b981-42da-94de-efcd66889c11\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/e55d7a39-7b6b-4852-8711-844cebc88cb8.json",
    "content": "{\n    \"description\": \"In Android, use the \\\"Google Map\\\" app to find the city name corresponding to the postcode \\\"110151\\\" in Colombia.\",\n    \"tasks\": [\n        {\n            \"task\": \"51b2463c-9904-4a32-81ba-507bfb89d61f\",\n            \"attribute\": {\n                \"number\": \"110151\",\n                \"country\": \"Columbia\"\n            },\n            \"output\": \"Bogota\"\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"e55d7a39-7b6b-4852-8711-844cebc88cb8\"\n}\n"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/e9268070-91b7-4e8c-9976-1cf8126ba13b.json",
    "content": "{\n    \"description\": \"In the Android system, use the task app to find the title of an event on the date \\\"15 June 2024\\\", then using the \\\"Google Drive\\\" app, create a new folder named as the title we found.\",\n    \"tasks\": [\n        {\n            \"task\": \"2394b768-2ca7-45e9-b41e-2aa4e9573192\",\n            \"attribute\": {\n                \"date\": \"15 June 2024\"\n            },\n            \"output\": \"EMNLP24 DDL\"\n        },\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548ar\",\n            \"attribute\": {\n                \"content\": \"EMNLP24 DDL\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"e9268070-91b7-4e8c-9976-1cf8126ba13b\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/fbe6a1b1-63bb-4d4e-8a53-ff4f7839ef61.json",
    "content": "{\n    \"description\": \"In Android, open the \\\"Contacts\\\" app to find the email address of a contact named Luis Martin, then use the \\\"Messages\\\" app to send the found email address to the phone number \\\"04055891132\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548ap\",\n            \"attribute\": {\n                \"name\": \"Luis Martin\"\n            },\n            \"output\": \"lmartin0431@gmail.com\"\n        },\n        {\n            \"task\": \"caa29623-1811-402d-963a-19f7eecc63d8\",\n            \"attribute\": {\n                \"content\": \"lmartin0431@gmail.com\",\n                \"number\": \"04055891132\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"fbe6a1b1-63bb-4d4e-8a53-ff4f7839ef61\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android/fc642cb6-5321-4966-afbf-fb3348bb69ee.json",
    "content": "{\n    \"description\": \"In Android, using \\\"Keep Notes\\\" App, record \\\"Camel is the best agent framework in the world!\\\" in a new note without title.\",\n    \"tasks\": [\n        {\n            \"task\": \"eb92a1e6-4c86-4d56-baac-95fc8397732e\",\n            \"attribute\": {\n                \"content\": \"Camel is the best agent framework in the world!\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"fc642cb6-5321-4966-afbf-fb3348bb69ee\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/android_subtasks.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# ruff: noqa: E501\n\nimport re\n\nimport networkx as nx\nfrom lxml import etree\nfrom lxml.etree import _Element\nfrom networkx import DiGraph, path_graph\n\nfrom crab import SubTask, evaluator\nfrom crab.actions.android_actions import execute_adb\n\n\ndef get_xml_etree(env) -> _Element | None:\n    xml_str = execute_adb(\"exec-out uiautomator dump /dev/tty\", env)\n    if \"UI hierchary dumped to: /dev/tty\" not in xml_str:\n        return None\n    xml_str = xml_str.removesuffix(\"UI hierchary dumped to: /dev/tty\")\n    return etree.fromstring(xml_str.encode(\"utf-8\"))\n\n\n@evaluator(env_name=\"android\", local=True)\ndef check_contain_input_text(text: str, env) -> bool:\n    if env.trajectory:\n        action_name, params, _ = env.trajectory[-1]\n        if action_name == \"write_text\" and text.lower() in params[\"text\"].lower():\n            return True\n    return False\n\n\n@evaluator(env_name=\"android\", local=True)\ndef check_contain_input_text_multiple(text: str, env) -> bool:\n    if env.trajectory:\n        for action_name, params, _ in env.trajectory:\n            if action_name == \"write_text\" and text in params[\"text\"].lower():\n                return True\n    return False\n\n\n@evaluator(env_name=\"android\")\ndef 
check_contain_contact(name: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    title_node = root.xpath(\n        '//node[@resource-id=\"com.android.contacts:id/photo_touch_intercept_overlay\"]'\n    )\n    if not title_node:\n        return False\n    if title_node[0].get(\"content-desc\") != name:\n        return False\n    info_node = root.xpath('//*[@class=\"android.widget.RelativeLayout\"]')\n    if not info_node:\n        return False\n    print(\"info node checked\")\n    mail_node = None\n    for node in info_node:\n        desc = node.get(\"content-desc\")\n        if \"Email\" in desc:\n            mail_node = node\n    if mail_node is None:\n        return False\n    real_mail_node = mail_node.xpath(\n        '//*[@resource-id=\"com.android.contacts:id/header\"]'\n    )\n    if not real_mail_node:\n        return False\n    context = real_mail_node[0].get(\"text\")\n    print(\"context get\")\n    pattern = re.compile(r\"^\\w+@\\w+.com\")\n    if pattern.match(context):\n        return True\n    return False\n\n\n@evaluator(env_name=\"android\")\ndef check_current_package_name(name: str, env) -> bool:\n    result = execute_adb(\n        r'shell \"dumpsys activity activities | grep mResumedActivity\"', env\n    )\n    return name in result\n\n\n@evaluator(env_name=\"android\", local=True)\ndef check_ocr_results(text: str, env) -> bool:\n    return text in env.ocr_results\n\n\n@evaluator(env_name=\"android\")\ndef check_current_message_page(title: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    title_node = root.xpath(\n        '//node[@resource-id=\"com.google.android.apps.messaging:id/conversation_title\"]'\n    )\n    if title_node:\n        return title == title_node[0].get(\"text\")\n    else:\n        return False\n\n\n@evaluator(env_name=\"android\")\ndef check_message_text_box_contain(text: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root 
is None:\n        return False\n    text_box_node = root.xpath(\n        '//node[@resource-id=\"com.google.android.apps.messaging:id/compose_message_text\"]'\n    )\n    if text_box_node:\n        return text.lower() in text_box_node[0].get(\"text\").lower()\n    else:\n        return False\n\n\n@evaluator(env_name=\"android\")\ndef check_message_text_box_empty(env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    text_box_node = root.xpath(\n        '//node[@resource-id=\"com.google.android.apps.messaging:id/compose_message_text\"]'\n    )\n    if not text_box_node:\n        return False\n    if text_box_node[0].get(\"text\").strip() == \"Text message\":\n        return True\n    else:\n        return False\n\n\n@evaluator(env_name=\"android\")\ndef check_send_message(title: str, message: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    title_node = root.xpath(\n        '//node[@resource-id=\"com.google.android.apps.messaging:id/conversation_title\"]'\n    )\n    if not title_node or title != title_node[0].get(\"text\"):\n        return False\n    messages_node = root.xpath(\n        '//node[@resource-id=\"com.google.android.apps.messaging:id/message_text\"]'\n    )\n    for node in messages_node:\n        if message in node.get(\"text\"):\n            return True\n    return False\n\n\n@evaluator(env_name=\"android\")\ndef check_note_content(content: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    title_node = root.xpath(\n        '//node[@resource-id=\"com.google.android.keep:id/editable_title\"]'\n    )\n    if not title_node:\n        return False\n    if title_node[0].get(\"text\") != \"Title\":\n        return False\n    node = root.xpath(\n        '//node[@resource-id=\"com.google.android.keep:id/edit_note_text\"]'\n    )\n    if not node:\n        return False\n    if content in node[0].get(\"text\"):\n        return 
True\n    return False\n\n\n@evaluator(env_name=\"android\")\ndef check_bluetooth_name(content: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    bluetooth_node = root.xpath('//node[@resource-id=\"android:id/summary\"]')\n    if not bluetooth_node:\n        return False\n    if content in bluetooth_node[0].get(\"text\"):\n        return True\n    return False\n\n\n@evaluator(env_name=\"android\")\ndef check_map_direction_page(from_des: str, to_des: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    from_node = root.xpath(f'//node[@content-desc=\"Start location, {from_des}\"]')\n    if not from_node:\n        return False\n    to_node = root.xpath(f'//node[@content-desc=\"Destination, {to_des}\"]')\n    if not to_node:\n        return False\n    return True\n\n\n@evaluator(env_name=\"android\")\ndef check_dial_number(phone_number: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    dialer_node = root.xpath('//node[@resource-id=\"com.android.dialer:id/digits\"]')\n    if not dialer_node:\n        return False\n    number = dialer_node[0].get(\"text\")\n    number = re.sub(\"[^0-9]\", \"\", number)\n    target = re.sub(\"[^0-9]\", \"\", phone_number)\n    return number == target\n\n\n@evaluator(env_name=\"android\")\ndef check_calendar_registered(date: str, content: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    calendar_node = root.xpath(\n        '//node[@resource-id=\"com.google.android.calendar:id/alternate_timeline_fragment_container\"]'\n    )\n    if not calendar_node:\n        return False\n    itr_calendar_node = calendar_node[0].xpath(\n        '//node[@class=\"android.support.v7.widget.RecyclerView\"]'\n    )\n    if not itr_calendar_node:\n        return False\n    target_nodes = itr_calendar_node[0].xpath('//node[@content-desc=\"{content}\"]')\n    if not 
target_nodes:\n        return False\n    return True\n\n\n@evaluator(env_name=\"android\")\ndef check_drive_registered(content: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    entry_node = root.xpath(\n        '//node[@resource-id=\"com.google.android.apps.docs:id/entry_label\"]'\n    )\n    if not entry_node:\n        return False\n    for node in entry_node:\n        if content == node.get(\"text\") and f\"{content} Folder\" == node.get(\n            \"content-desc\"\n        ):\n            return True\n    return False\n\n\n@evaluator(env_name=\"android\")\ndef check_contact_registered(mail: str, name: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    name_node = root.xpath('//node[@resource-id=\"com.android.contacts:id/large_title\"]')\n    if not name_node:\n        return False\n    text = name_node[0].get(\"text\")\n    if text not in name:\n        return False\n\n    mail_node = root.xpath('//node[@resource-id=\"com.android.contacts:id/header\"]')\n    if not mail_node:\n        return False\n    text = mail_node[0].get(\"text\")\n    if text not in mail:\n        return False\n    return True\n\n\n@evaluator(env_name=\"android\")\ndef check_calling_number(phone_number: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    dialer_node = root.xpath(\n        '//node[@resource-id=\"com.android.dialer:id/contactgrid_contact_name\"]'\n    )\n    if not dialer_node:\n        return False\n    number = dialer_node[0].get(\"text\")\n    number = re.sub(\"[^0-9]\", \"\", number)\n    target = re.sub(\"[^0-9]\", \"\", phone_number)\n    return number == target\n\n\n@evaluator(env_name=\"android\")\ndef check_google_tasks_name(target: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    task_nodes = root.xpath(\n        '//node[@resource-id=\"com.google.android.apps.tasks:id/task_name\"]'\n    )\n    if not task_nodes:\n  
      return False\n    for node in task_nodes:\n        task_name = node.get(\"text\")\n        if target in task_name:\n            return True\n    return False\n\n\n@evaluator(env_name=\"android\")\ndef check_date(target: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    date_nodes = root.xpath(\n        '//node[@resource-id=\"com.google.android.apps.photos:id/datetime_item_layout\"]'\n    )\n    if not date_nodes:\n        return False\n    label_nodes = date_nodes[0].xpath(\n        '//node[@resource-id=\"com.google.android.apps.photos:id/label\"]'\n    )\n    if not label_nodes:\n        return False\n    time = label_nodes[0].get(\"text\")\n    pattern = re.compile(r\"^\\w{3},\\s\\w{3}\\s\\d{2},\\s\\d{4}\\s•\\s\\d{1,2}:\\d{2}\\s[AP]M$\")\n    if pattern.match(time):\n        return True\n    return False\n\n\n@evaluator(env_name=\"android\")\ndef check_city_clock(place_name: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    city_nodes = root.xpath(\n        '//node[@resource-id=\"com.google.android.deskclock:id/city_name\"]'\n    )\n    if not city_nodes:\n        return False\n    for city_node in city_nodes:\n        text = city_node.get(\"text\")\n        if place_name == text:\n            return True\n    return False\n\n\n@evaluator(env_name=\"android\")\ndef check_event(date: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    event_nodes = root.xpath('//node[@class=\"android.support.v7.widget.RecyclerView\"]')\n    if not event_nodes:\n        return False\n    for node in event_nodes[0]:\n        text = node.get(\"content-desc\")\n        if date in text:\n            return True\n    return False\n\n\n@evaluator(env_name=\"android\")\ndef check_event_registered(date: str, content: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    event_nodes = 
root.xpath('//node[@class=\"android.support.v7.widget.RecyclerView\"]')\n    if not event_nodes:\n        return False\n    time_reg = False\n    content_reg = False\n    for node in event_nodes[0]:\n        text = node.get(\"content-desc\")\n        if date.lower() in text.lower():\n            time_reg = True\n        if content.lower() in text.lower():\n            content_reg = True\n    if time_reg and content_reg:\n        return True\n    return False\n\n\n@evaluator(env_name=\"android\")\ndef check_location(content: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    checked_node = root.xpath(f'//node[@content-desc=\"{content}\"]')\n    if not checked_node:\n        return False\n    return True\n\n\n@evaluator(env_name=\"android\")\ndef check_contain_city(number: str, city: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    business_node = root.xpath(\n        '//node[@resource-id=\"com.google.android.apps.maps:id/search_omnibox_text_box\"]'\n    )\n    if not business_node:\n        return False\n    text = None\n    for node in business_node[0]:\n        text = node.get(\"text\")\n    if text is None:\n        return False\n    if city in text and str(number) in text:\n        return True\n    return False\n\n\n@evaluator(env_name=\"android\")\ndef check_file(content: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    name_source_node = root.xpath(\n        '//node[@resource-id=\"com.google.android.apps.photos:id/exif_item_layout\"]'\n    )\n    if not name_source_node:\n        return False\n    name_nodes = name_source_node[0].xpath(\n        '//node[@resource-id=\"com.google.android.apps.photos:id/label\"]'\n    )\n    if not name_nodes:\n        return False\n    target_node = None\n    for node in name_nodes:\n        text = node.get(\"text\")\n        if content in text:\n            target_node = node\n    
if target_node is None:\n        return False\n    time_source_node = root.xpath(\n        '//node[@resource-id=\"com.google.android.apps.photos:id/datetime_item_layout\"]'\n    )\n    if not time_source_node:\n        return False\n    time_nodes = time_source_node[0].xpath(\n        '//node[@resource-id=\"com.google.android.apps.photos:id/label\"]'\n    )\n    if not time_nodes:\n        return False\n    pattern = re.compile(\n        r\"(Tue|Mon|Wed|Thu|Fri|Sat|Sun),\\s(May|Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\\s\\d{2},\\s\\d{4} • \\d{2}:\\d{2}\\s(AM|PM)\"\n    )\n    # True if any datetime label matches the pattern\n    for node in time_nodes:\n        text = node.get(\"text\")\n        if pattern.match(text):\n            return True\n    return False\n\n\n@evaluator(env_name=\"android\")\ndef check_mail_sent(mail: str, content: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    to_node = root.xpath(\n        '//node[@resource-id=\"com.google.android.gm:id/peoplekit_chip\"]'\n    )\n    if not to_node:\n        return False\n    checked = False\n    for node in to_node:\n        text = node.get(\"content-desc\")\n        if mail in text:\n            checked = True\n    if not checked:\n        return False\n    # check the mail information-> Done\n\n    # check the content information\n    body_node = root.xpath(\n        '//node[@resource-id=\"com.google.android.gm:id/body_wrapper\"]'\n    )\n    if not body_node:\n        return False\n    text_node = body_node[0].xpath('//node[@class=\"android.widget.EditText\"]')\n    if not text_node:\n        return False\n    for node in text_node:\n        text = node.get(\"text\")\n        if content in text:\n            return True\n    return False\n\n\ndef distance_evaluator_generator(place_name_1: str, place_name_2: str):\n    result = nx.DiGraph()\n    a = check_current_package_name(\"com.google.android.apps.maps\")\n    b = check_contain_input_text(place_name_1)\n    
c = check_contain_input_text(place_name_2)\n    d = check_map_direction_page(place_name_1, place_name_2)\n    result.add_edges_from([(a, b), (a, c), (b, d), (c, d)])\n    return result\n\n\ndef mail_evaluator_generator(mail: str, content: str):\n    result = nx.DiGraph()\n    a = check_current_package_name(\"com.google.android.gm\")\n    b = check_contain_input_text(mail)\n    c = check_contain_input_text(content)\n    d = check_mail_sent(mail, content)\n    result.add_edges_from([(a, b), (a, c), (b, d), (c, d)])\n    return result\n\n\ndef contact_evaluator_generator(mail: str, name: str):\n    result = nx.DiGraph()\n    a = check_current_package_name(\"com.android.contacts\")\n    b = check_contain_input_text(mail)\n    c = check_contain_input_text(name)\n    d = check_contact_registered(mail, name)\n    result.add_edges_from([(a, b), (a, c), (b, d), (c, d)])\n    return result\n\n\nandroid_subtasks = [\n    SubTask(\n        id=\"1a1b72d7-78c9-4027-8278-86083ae01045\",\n        description='In Android, using \"Google Map\" App, find the distance of the shortest route from \"{place_name_1}\" to \"{place_name_2}\"',\n        attribute_dict={\"place_name_1\": \"place_name_1\", \"place_name_2\": \"place_name_2\"},\n        output_type=\"number\",\n        evaluator_generator=distance_evaluator_generator,\n    ),\n    SubTask(\n        id=\"eb92a1e6-4c86-4d56-baac-95fc8397732e\",\n        description='In Android, using \"Keep Notes\" App, record \"{content}\" in a new note without title.',\n        attribute_dict={\"content\": \"content\"},\n        output_type=\"None\",\n        evaluator_generator=lambda content: path_graph(\n            [\n                check_current_package_name(\"com.google.android.keep\"),\n                check_contain_input_text(content),\n                check_note_content(content),\n            ],\n            create_using=DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"caa29623-1811-402d-963a-19f7eecc63d8\",\n        
description='In Android, using \"Messages\", send \"{content}\" to \"{number}\".',\n        attribute_dict={\"content\": \"content\", \"number\": \"number\"},\n        output_type=\"None\",\n        evaluator_generator=lambda content, number: path_graph(\n            [\n                check_current_package_name(\"com.google.android.apps.messaging\"),\n                check_current_message_page(number),\n                check_contain_input_text(content),\n                check_send_message(number, content),\n            ],\n            create_using=DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"955d8773-dd7a-4072-b87c-7e546be7de4e\",\n        description='In Android, call \"{number}\".',\n        attribute_dict={\"number\": \"number\"},\n        output_type=\"None\",\n        evaluator_generator=lambda number: path_graph(\n            [\n                check_current_package_name(\"com.android.dialer\"),\n                check_dial_number(number),\n                check_calling_number(number),\n            ],\n            create_using=DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"a3d11574-2acf-4b26-a569-a5dbc9d548af\",\n        description='Using \"Tasks\" app, add a new task with text \"{content}\".',\n        attribute_dict={\"content\": \"content\"},\n        output_type=\"None\",\n        evaluator_generator=lambda content: path_graph(\n            [\n                check_current_package_name(\"com.google.android.apps.tasks\"),\n                check_contain_input_text(content),\n                check_google_tasks_name(content),\n            ],\n            create_using=DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"a3d11574-2acf-4b26-a569-a5dbc9d548ac\",\n        description='In Android, Using \"Calendar\" app, add a new event with text \"{content}\" in date \"{date}\" all day.',\n        attribute_dict={\"content\": \"content\", \"date\": \"date\"},\n        output_type=\"None\",\n        evaluator_generator=lambda content, 
date: path_graph(\n            [\n                check_current_package_name(\"com.google.android.calendar\"),\n                check_contain_input_text(content),\n                check_event_registered(date, content),\n            ],\n            create_using=DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"a3d11574-2acf-4b26-a569-a5dbc9d548ag\",\n        description='In Android, Using \"Contacts\" app, add a contact with mail \"{mail}\" and name \"{name}\".',\n        attribute_dict={\"mail\": \"mail\", \"name\": \"name\"},\n        output_type=\"None\",\n        evaluator_generator=contact_evaluator_generator,\n    ),\n    SubTask(\n        id=\"a3d11574-2acf-4b26-a569-a5dbc9d548ap\",\n        description='In Android, Using \"Contacts\" app, find out the mail of the contact named {name}.',\n        attribute_dict={\"name\": \"name\"},\n        output_type=\"mail\",\n        evaluator_generator=lambda name: path_graph(\n            [\n                check_current_package_name(\"com.android.contacts\"),\n                check_contain_contact(name),\n            ],\n            create_using=DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"0090f116-e02b-4562-a20d-b5df38be963a\",\n        description='In Android, Using \"Gmail\" app, send {mail} a message {content}.',\n        attribute_dict={\"content\": \"content\", \"mail\": \"mail\"},\n        output_type=\"None\",\n        evaluator_generator=mail_evaluator_generator,\n    ),\n    SubTask(\n        id=\"a3d11574-2acf-4b26-a569-a5dbc9d548ar\",\n        description='In Android, Using \"Google Drive\" app, create a new folder named {content}.',\n        attribute_dict={\"content\": \"content\"},\n        output_type=\"None\",\n        evaluator_generator=lambda content: path_graph(\n            [\n                check_current_package_name(\"com.google.android.apps.docs\"),\n                check_drive_registered(content),\n            ],\n            create_using=DiGraph,\n        ),\n    ),\n    
SubTask(\n        id=\"a3d11574-2acf-4b26-a569-a5dbc9d548ak\",\n        description='In Android, Using \"Files\" app, find the creation date of {file_path}.',\n        attribute_dict={\"file_path\": \"file_path\"},\n        output_type=\"Date\",\n        evaluator_generator=lambda file_path: path_graph(\n            [\n                check_current_package_name(\"com.google.android.apps.photos\"),\n                check_file(file_path),\n            ],\n            create_using=DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"a3d11574-2acf-4b26-a569-a5dbc9d548an\",\n        description='In Android, Using \"Settings\" app, rename the Bluetooth device name to {content}.',\n        attribute_dict={\"content\": \"content\"},\n        output_type=\"None\",\n        evaluator_generator=lambda content: path_graph(\n            [\n                check_current_package_name(\"com.android.settings\"),\n                check_contain_input_text(content),\n                check_bluetooth_name(content),\n            ],\n            create_using=DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"a3d11574-2acf-4b26-a569-a5dbc9d548ah\",\n        description='In Android, Using \"Clock\" app, add the clock of {place_name}, then check the time gap between that city and the current city.',\n        attribute_dict={\"place_name\": \"place_name\"},\n        output_type=\"content\",\n        evaluator_generator=lambda place_name: path_graph(\n            [\n                check_current_package_name(\"com.google.android.deskclock\"),\n                check_city_clock(place_name),\n            ],\n            create_using=DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"a3d11574-2acf-4b26-a569-a5dbc9d548aw\",\n        description='In Android, Using \"Google Map\" app, find the address of {content}.',\n        attribute_dict={\"content\": \"content\"},\n        output_type=\"content\",\n        evaluator_generator=lambda content: path_graph(\n            [\n              
  check_current_package_name(\"com.google.android.apps.maps\"),\n                check_location(content),\n            ],\n            create_using=DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"51b2463c-9904-4a32-81ba-507bfb89d61f\",\n        description='In Android, Using \"Google Map\" app, find the city name corresponding to the post code \"{number}\" in the country \"{country}\".',\n        attribute_dict={\"number\": \"number\", \"country\": \"country\"},\n        output_type=\"content\",\n        evaluator_generator=lambda number, country: path_graph(\n            [\n                check_current_package_name(\"com.google.android.apps.maps\"),\n                check_contain_input_text(country),\n                check_contain_input_text(number),\n                check_contain_city(number, country),\n            ],\n            create_using=DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"2394b768-2ca7-45e9-b41e-2aa4e9573192\",\n        description='In Android system, use the calendar app to find the title of an event on the date \"{date}\".',\n        attribute_dict={\"date\": \"date\"},\n        output_type=\"content\",\n        evaluator_generator=lambda date: path_graph(\n            [\n                check_current_package_name(\"com.google.android.calendar\"),\n                check_event(date),\n            ],\n            create_using=DiGraph,\n        ),\n    ),\n    # TODO: The phone number page cannot be accessed via XML; figure out another way.\n    # SubTask(\n    #     id=\"fa9c0b01-9835-4932-824d-0990cb20e5f7\",\n    #     description='Using Settings app, find the phone number of this phone in the \"About\" panel.',\n    #     attribute_dict={},\n    #     output_type=\"phone_number\",\n    #     evaluator=lambda: path_graph(\n    #         [\n    #             check_current_package_name(\"com.android.settings\"),\n    #         ],\n    #         create_using=DiGraph,\n    #     ),\n    # ),\n]\n
  },
  {
    "path": "crab-benchmark-v0/dataset/cross/05a7633d-b966-471c-8848-e18e69ad265f.json",
    "content": "{\n    \"description\": \"In Android, use the \\\"Google Map\\\" app to find the city name corresponding to the postal code \\\"44145\\\" in Germany, then paste the name into LibreOffice Writer on an Ubuntu system and save it as an ODT file at \\\"/home/crab/Desktop/target.odt\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"51b2463c-9904-4a32-81ba-507bfb89d61f\",\n            \"attribute\": {\n                \"number\": \"44145\",\n                \"country\": \"Germany\"\n            },\n            \"output\": \"Dortmund\"\n        },\n        {\n            \"task\": \"76de4bdb-c980-4b3a-9bd3-c87db467dffe\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/Desktop/target.odt\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"05a7633d-b966-471c-8848-e18e69ad265f\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/cross/1e92db38-501e-429b-ac31-453d1af10a25.json",
    "content": "{\n    \"description\": \"Open the terminal on Ubuntu, print the content of \\\"/home/crab/Desktop/kolakov.txt\\\" to the command line interface, and then, in the Android \\\"Keep Notes\\\" app, record the content in a new note without adding a title.\",\n    \"tasks\": [\n        {\n            \"task\": \"5b527839-0e58-426d-bab6-7160200b0d24\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/Desktop/kolakov.txt\"\n            },\n            \"output\": \"The flight to warsaw is from kolakov\"\n        },\n        {\n            \"task\": \"eb92a1e6-4c86-4d56-baac-95fc8397732e\",\n            \"attribute\": {\n                \"content\": \"The flight to warsaw is from kolakov\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"1e92db38-501e-429b-ac31-453d1af10a25\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/cross/43be6e8e-034d-4277-8346-c4ae7553bf68.json",
    "content": "{\n    \"description\": \"On an Android device, using the Google Map app, find the address of Dignity Health Sports Park, then use Firefox to search for a university around the address on Google Maps, and copy the Google Maps sharing URL of that university to the clipboard.\",\n    \"tasks\": [\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548aw\",\n            \"attribute\": {\n                \"content\": \"Dignity Health Sports Park\"\n            },\n            \"output\": \"18400 Avalon Blvd, Carson, CA 907, US\"\n        },\n        {\n            \"task\": \"2b189dc2-c77f-4fa3-8432-ba4355cc294c\",\n            \"attribute\": {\n                \"place_type\": \"University\",\n                \"place_name\": \"18400 Avalon Blvd, Carson, CA 907, US\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"43be6e8e-034d-4277-8346-c4ae7553bf68\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/cross/534be964-269a-4509-b2b8-28cc3ba8dfca.json",
    "content": "{\n    \"description\": \"On an Android system, use the calendar app to find the title of an event on the date \\\"18 September 2024\\\", then use Firefox to search for an image with the title and copy the URL of the image to the clipboard.\",\n    \"tasks\": [\n        {\n            \"task\": \"2394b768-2ca7-45e9-b41e-2aa4e9573192\",\n            \"attribute\": {\n                \"date\": \"18 September 2024\"\n            },\n            \"output\": \"Chile National Day\"\n        },\n        {\n            \"task\": \"017102b6-d2c3-466b-96f7-37c8bcddc41a\",\n            \"attribute\": {\n                \"keyword\": \"Chile National Day\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"534be964-269a-4509-b2b8-28cc3ba8dfca\"\n}\n"
  },
  {
    "path": "crab-benchmark-v0/dataset/cross/6f95cfa1-e7ae-4a82-912b-0180fc9622f2.json",
    "content": "{\n    \"description\": \"On an Android system, open the calendar app and find the title of an event scheduled for \\\"15 June 2024,\\\" copy this title, then paste the content into Visual Studio Code (VS Code) on an Ubuntu system and save it as a file named \\\"reminder.txt\\\" on the Desktop.\",\n    \"tasks\": [\n        {\n            \"task\": \"2394b768-2ca7-45e9-b41e-2aa4e9573192\",\n            \"attribute\": {\n                \"date\": \"15 June 2024\"\n            },\n            \"output\": \"EMNLP24 DDL\"\n        },\n        {\n            \"task\": \"8491e674-596b-452b-9e0e-58a44d90f947\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/Desktop/reminder.txt\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"6f95cfa1-e7ae-4a82-912b-0180fc9622f2\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/cross/760ed27e-b1bd-451f-8659-bdb9845fcb7f.json",
    "content": "{\n    \"description\": \"Open the \\\"~/Desktop/contact.txt\\\" file via the command line interface in Ubuntu to view its content, then use the Gmail app on Android to send a message to crabbb@gmail.com with the content.\",\n    \"tasks\": [\n        {\n            \"task\": \"5b527839-0e58-426d-bab6-7160200b0d24\",\n            \"attribute\": {\n                \"file_path\": \"~/Desktop/contact.txt\"\n            },\n            \"output\": \"crabbb@gmail.com\"\n        },\n        {\n            \"task\": \"0090f116-e02b-4562-a20d-b5df38be963a\",\n            \"attribute\": {\n                \"content\": \"Hello, please send me a message back\",\n                \"mail\": \"crabbb@gmail.com\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"760ed27e-b1bd-451f-8659-bdb9845fcb7f\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/cross/82596760-7d4d-457d-9ca9-9551ab85ec58.json",
    "content": "{\n    \"description\": \"Using the \\\"Google Map\\\" app on an Android device, find the city name corresponding to the postal code \\\"10179\\\" in Germany, and then submit the discovered city name.\",\n    \"tasks\": [\n        {\n            \"task\": \"51b2463c-9904-4a32-81ba-507bfb89d61f\",\n            \"attribute\": {\n                \"number\": \"10179\",\n                \"country\": \"Germany\"\n            },\n            \"output\": \"Berlin\"\n        },\n        {\n            \"task\": \"1c3bedc3-ea5a-453c-a15b-223d72ab756d\",\n            \"attribute\": {\n                \"content\": \"Berlin\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"82596760-7d4d-457d-9ca9-9551ab85ec58\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/cross/a956a091-8de4-42ee-b152-913308dfc24b.json",
    "content": "{\n    \"description\": \"In the \\\"Clock\\\" app on Android, add Jakarta's time, compare it with the current city's time to determine the time gap, and then submit the information.\",\n    \"tasks\": [\n        {\n            \"task\": \"a3d11574-2acf-4b26-a569-a5dbc9d548ah\",\n            \"attribute\": {\n                \"place_name\": \"Jakarta\"\n            },\n            \"output\": \"1 hour behind\"\n        },\n        {\n            \"task\": \"1c3bedc3-ea5a-453c-a15b-223d72ab756d\",\n            \"attribute\": {\n                \"content\": \"1 hour behind\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"a956a091-8de4-42ee-b152-913308dfc24b\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/cross/c5929ef3-ac27-4288-b02f-4f261d5871f9.json",
    "content": "{\n    \"description\": \"In Android, use the \\\"Google Map\\\" app to find the city name corresponding to the postal code \\\"1010021\\\" in Japan, then use Firefox to search for a code repository about that city on GitHub and copy the URL of the repository to the clipboard.\",\n    \"tasks\": [\n        {\n            \"task\": \"51b2463c-9904-4a32-81ba-507bfb89d61f\",\n            \"attribute\": {\n                \"number\": \"1010021\",\n                \"country\": \"Japan\"\n            },\n            \"output\": \"Tokyo\"\n        },\n        {\n            \"task\": \"bcd03c9f-62c9-4001-8d86-78358c59ce22\",\n            \"attribute\": {\n                \"keyword\": \"Tokyo\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"c5929ef3-ac27-4288-b02f-4f261d5871f9\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/cross/da5911e3-1a99-4735-ba3e-f08c5ca81fdd.json",
    "content": "{\n    \"description\": \"Open a terminal in Ubuntu, print the content of \\\"~/Desktop/contract_reminder.txt\\\", and then, on an Android device, use the Gmail app to send an email to crabbb@gmail.com, including the printed information.\",\n    \"tasks\": [\n        {\n            \"task\": \"5b527839-0e58-426d-bab6-7160200b0d24\",\n            \"attribute\": {\n                \"file_path\": \"~/Desktop/contract_reminder.txt\"\n            },\n            \"output\": \"uld be end in three days.\"\n        },\n        {\n            \"task\": \"0090f116-e02b-4562-a20d-b5df38be963a\",\n            \"attribute\": {\n                \"content\": \"uld be end in three days.\",\n                \"mail\": \"crabbb@gmail.com\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"da5911e3-1a99-4735-ba3e-f08c5ca81fdd\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/handmade_tasks.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# ruff: noqa: E501 F405\nimport os\nimport re\nimport subprocess\nimport time\nfrom datetime import datetime\n\nimport networkx as nx\n\nfrom crab import Task, action, evaluator\n\nfrom .android_subtasks import (\n    check_current_package_name,\n    check_google_tasks_name,\n    check_message_text_box_contain,\n    check_message_text_box_empty,\n    check_note_content,\n    get_xml_etree,\n)\nfrom .ubuntu_subtasks import *  # noqa: F403\n\n_item_count_cache = None\n\n\n@evaluator(env_name=\"android\")\ndef check_calendar_in_today(env) -> bool:\n    # Get today's date and format it as \"Weekday DD Month YYYY\"\n    today_date_str = datetime.now().strftime(\"%A %d %B %Y\")\n\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    # Construct the desired string with today's date\n    date_string = f\"{today_date_str}, Open Schedule View\"\n    date_node = root.xpath(f'//node[@content-desc=\"{date_string}\"]')\n    if not date_node or len(date_node) != 1:\n        return False\n    today_nodes = date_node[0].getparent().getchildren()\n    item_count = len(today_nodes) - 2\n    if item_count < 0:\n        return False\n    global _item_count_cache\n    _item_count_cache = item_count\n    return 
True\n\n\n@action(env_name=\"ubuntu\")\ndef get_file_bullet_points(file_path: str) -> list[str] | None:\n    # Check if the file exists\n    if not os.path.exists(file_path):\n        return None\n\n    # Read the markdown text from the file\n    try:\n        with open(file_path, \"r\") as file:\n            markdown_text = file.read()\n    except Exception:\n        return None\n\n    # Regex to match empty checkboxes in markdown\n    pattern = r\"- \\[ \\]\"\n    # Find all matches\n    matches = re.findall(pattern, markdown_text)\n    # Return the list of empty-checkbox matches\n    return matches\n\n\n@evaluator(env_name=\"ubuntu\", local=True)\ndef check_blluet_point_match_calendar(file_path: str, env) -> bool:\n    matches = env._action_endpoint(get_file_bullet_points, {\"file_path\": file_path})\n    global _item_count_cache\n    if _item_count_cache is None or matches is None:\n        return False\n    return _item_count_cache == len(matches)\n\n\n@evaluator(env_name=\"android\")\ndef check_node_exist(node_query: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    node = root.xpath(f\"//node[{node_query}]\")\n    if not node:\n        return False\n    return True\n\n\n@evaluator(env_name=\"ubuntu\")\ndef check_new_jpg_files_in_dir(directory: str) -> bool:\n    # Get the current time\n    current_time = time.time()\n    # Time limit set to 3 minutes ago\n    time_limit = current_time - 180\n\n    # Iterate over files in the specified directory\n    for file in os.listdir(directory):\n        file_path = os.path.join(directory, file)\n        # Check if the file is a .jpg and was modified within the last 3 minutes\n        if file.endswith(\".jpg\") and os.path.getmtime(file_path) > time_limit:\n            return True\n\n    return False\n\n\n@evaluator(env_name=\"ubuntu\")\ndef check_text_list_in_current_window_name(texts: list[str]) -> bool:\n    try:\n        out = subprocess.check_output(\n            [\"xdotool\", 
\"getwindowfocus\", \"getwindowname\"], text=True\n        ).strip()\n    except subprocess.CalledProcessError:\n        return False\n    for text in texts:\n        if text not in out:\n            return False\n    return True\n\n\n@evaluator(env_name=\"android\")\ndef check_keep_notes_content(text: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None:\n        return False\n    edit_node = root.xpath(\n        '//node[@resource-id=\"com.google.android.keep:id/editor_bottom_bar\"]'\n    )\n    if len(edit_node) != 1:\n        return False\n    content_node = root.xpath(\n        '//node[@resource-id=\"com.google.android.keep:id/browse_note_interior_content\"]'\n    )\n    if len(content_node) != 1:\n        return False\n    text_nodes = content_node[0].getchildren()\n    if len(text_nodes) != 1:\n        return False\n    return text_nodes[0].get(\"text\") == text\n\n\n@evaluator(env_name=\"android\")\ndef check_keep_notes_contain_fd(env) -> bool:\n    global RESULT_fd0576be\n    text = RESULT_fd0576be\n    root = get_xml_etree(env)\n    if root is None or text is None:\n        return False\n    edit_node = root.xpath(\n        '//node[@resource-id=\"com.google.android.keep:id/editor_bottom_bar\"]'\n    )\n    if len(edit_node) != 1:\n        return False\n    content_node = root.xpath(\n        '//node[@resource-id=\"com.google.android.keep:id/browse_note_interior_content\"]'\n    )\n    for node in content_node:\n        text_nodes = node.getchildren()\n        if len(text_nodes) != 1:\n            continue\n        if text in text_nodes[0].get(\"text\"):\n            return True\n    return False\n\n\n@evaluator(env_name=\"android\")\ndef check_alarm_contains(time: str, env) -> bool:\n    root = get_xml_etree(env)\n    if root is None or time is None:\n        return False\n    clock_node = root.xpath(\n        '//node[@resource-id=\"com.google.android.deskclock:id/digital_clock\"]'\n    )\n    for node in clock_node:\n        if time == 
node.get(\"text\"):\n            return True\n    return False\n\n\n@evaluator(env_name=\"android\", local=True)\ndef check_tap_text(text: str, env) -> bool:\n    if env.trajectory:\n        action_name, params, _ = env.trajectory[-1]\n        if action_name == \"tap\":\n            try:\n                element_id = int(params[\"element\"])\n                element_label = env.element_label_map[element_id]\n            except (KeyError, TypeError, ValueError):\n                return False\n            if element_label is None:\n                return False\n            return text.lower() in element_label.lower()\n    return False\n\n\ndef summarize_ubuntu_evaluator():\n    result = nx.DiGraph()\n    a = check_current_window_process(\"slack\")\n    b = check_current_package_name(\"com.google.android.apps.messaging\")\n    c = check_message_text_box_contain(\"agent\")\n    d = check_message_text_box_contain(\"github\")\n    e = check_message_text_box_empty()\n    result.add_edges_from([(a, c), (a, d), (b, c), (b, d), (c, e), (d, e)])\n    return result\n\n\ndef check_calendar_evaluator():\n    result = nx.DiGraph()\n    a = check_current_package_name(\"com.google.android.calendar\")\n    b = check_calendar_in_today()\n    c = check_file_exist(\"/home/crab/assets/plan.md\")\n    d = check_blluet_point_match_calendar(\"/home/crab/assets/plan.md\")\n    result.add_edges_from([(a, b), (b, d), (c, d)])\n    return result\n\n\ndef evaluator_97e6f333():\n    result = nx.DiGraph()\n    a = check_current_package_name(\"com.android.camera2\")\n    b = check_node_exist('@resource-id=\"com.android.camera2:id/rounded_thumbnail_view\"')\n    c = check_node_exist('@resource-id=\"com.android.camera2:id/filmstrip_layout\"')\n    d = check_current_package_name(\n        \"com.google.android.apps.photos/.upload.intent.UploadContentActivity\"\n    )\n    e = check_node_exist('@resource-id=\"com.android.camera2:id/filmstrip_layout\"')\n    f = check_current_window_process(\"firefox\")\n    g = 
check_text_in_current_window_name(\"Photos - Google Photos — Mozilla Firefox\")\n    h = check_new_jpg_files_in_dir(\"/home/crab/Downloads\")\n    i = check_file_exist(\"/home/crab/assets/photo.jpg\")\n    j = check_text_list_in_current_window_name([\"photo\", \"GIMP\"])\n    result.add_edges_from([(a, b), (b, c), (c, d), (d, e), (e, h)])\n    result.add_edges_from([(f, g), (g, h)])\n    result.add_edges_from([(h, i), (i, j)])\n    return result\n\n\ndef evaluator_82efbd82():\n    result = nx.DiGraph()\n    a = download_and_verify_file(\n        \"https://media.cntraveller.com/photos/642aa1ad770beda2d4f5cc22/4:3/w_2664,h_1998,c_limit/Fiji-march2023issue-JackJohns15.jpg\",\n        \"/home/crab/Downloads/raw.jpg\",\n    )\n    b = check_text_in_current_window_name(\"GNU Image Manipulation Program\")\n    c = check_file_exist(\"/home/crab/Pictures/edited.jpg\")\n    d = is_image_2_brighter(\n        \"/home/crab/Downloads/raw.jpg\", \"/home/crab/Pictures/edited.jpg\"\n    )\n    e = verify_background(\"/home/crab/Pictures/edited.jpg\")\n    result.add_edges_from([(a, b), (b, c), (c, d), (d, e)])\n    return result\n\n\ndef evaluator_515a5467():\n    result = nx.DiGraph()\n    a = download_and_verify_file(\n        \"https://media.cntraveller.com/photos/642aa1ad770beda2d4f5cc22/4:3/w_2664,h_1998,c_limit/Fiji-march2023issue-JackJohns15.jpg\",\n        \"/home/crab/Downloads/img_1.jpg\",\n    )\n    b = download_and_verify_file(\n        \"https://upload.wikimedia.org/wikipedia/commons/thumb/7/71/Flag_of_Ethiopia.svg/250px-Flag_of_Ethiopia.svg.png\",\n        \"/home/crab/Downloads/img_2.jpg\",\n    )\n    c = check_text_in_current_window_name(\"GNU Image Manipulation Program\")\n    d = check_file_exist(\"/home/crab/Downloads/combined_editing.jpg\")\n    e = verify_combined_image(\n        \"/home/crab/Downloads/img_1.jpg\",\n        \"/home/crab/Downloads/img_2.jpg\",\n        \"/home/crab/Downloads/combined_editing.jpg\",\n        \"right\",\n    )\n    f = 
check_directory_exists(\"/home/crab/jpg\")\n    g = verify_files_copied(\"/home/crab/Downloads\", \"/home/crab/jpg\", \"jpg\")\n    result.add_edges_from([(a, c), (b, c), (c, d), (d, e), (e, f), (f, g)])\n    return result\n\n\ndef evaluator_5a1eba49():\n    result = nx.DiGraph()\n    a = check_text_in_current_window_name(\"Firefox\")\n    b = check_contain_input_text(\"GPU\")\n    c = is_img_url_in_clipboard()\n    d = download_from_clipboard_and_verify_file(\"/home/crab/Pictures/GPU.png\")\n    e = check_directory_exists(\"/home/crab/Pictures/png_files\")\n    f = verify_files_copied(\n        \"/home/crab/Pictures\", \"/home/crab/Pictures/png_files\", \"png\"\n    )\n    result.add_edges_from([(a, b), (b, c), (c, d), (d, e), (e, f)])\n    return result\n\n\ndef evaluator_c347f78a():\n    file_path = \"/home/crab/assets/content.txt\"\n    content = \"An air quality health advisory is in effect Tuesday for New York City and the lower Hudson Valley, as well as western Connecticut and northern New Jersey, meaning it may not be safe for people with some conditions to be outside long.\"\n    result = nx.DiGraph()\n    a = check_current_window_process(\"gnome-terminal-server\")\n    b = is_process_open(\"vim\")\n    c = ~is_process_open(\"vim\")\n    d = check_file_content(file_path, content)\n    e = check_contain_input_text(\"cat \" + file_path)\n    f = check_submit(content)\n    result.add_edges_from([(a, b), (b, c), (c, d), (d, e), (e, f)])\n    return result\n\n\ndef evaluator_bf83c176():\n    result = nx.DiGraph()\n\n    file_path_1 = \"/home/crab/Desktop/waymo.jpg\"\n    file_path_2 = \"/home/crab/Desktop/tesla.png\"\n    output_path = \"/home/crab/Documents/self_driving.pdf\"\n    # Search for the first image and download it\n    a1 = check_text_in_current_window_name(\"Firefox\")\n    b1 = check_contain_input_text(\"Waymo\")\n    c1 = is_img_url_in_clipboard()\n    d1 = download_from_clipboard_and_verify_file(file_path_1)\n\n    # Search for the second image 
and download it\n    a2 = check_text_in_current_window_name(\"Firefox\")\n    b2 = check_contain_input_text(\"Tesla\")\n    c2 = is_img_url_in_clipboard()\n    d2 = download_from_clipboard_and_verify_file(file_path_2)\n\n    # Combine images into a PDF\n    e = check_text_in_current_window_name(\"LibreOffice Impress\")\n    f = check_file_exist(output_path)\n    g = verify_combined_image(file_path_1, file_path_2, output_path, \"left\")\n\n    # Add edges to form the branches and connections\n    result.add_edges_from([(a1, b1), (b1, c1), (c1, d1)])\n    result.add_edges_from([(d1, a2), (a2, b2), (b2, c2), (c2, d2)])\n    result.add_edges_from([(d2, e), (e, f), (f, g)])\n\n    return result\n\n\ndef evaluator_74bb11dd():\n    file_path_1 = \"/home/crab/Documents/FR.ods\"\n    file_path_2 = \"/home/crab/Documents/MX.ods\"\n    result = nx.DiGraph()\n\n    # Search for the first country and save information to an ODS file\n    a1 = check_text_in_current_window_name(\"Wikipedia — Mozilla Firefox\")\n    b1 = check_text_in_current_window_name(\"LibreOffice Calc\")\n    c1 = check_file_exist(file_path_1)\n    d1 = verify_country_data_in_ods(\"France\", file_path_1)\n\n    # Search for the second country and save information to an ODS file\n    a2 = check_text_in_current_window_name(\"Wikipedia — Mozilla Firefox\")\n    b2 = check_text_in_current_window_name(\"LibreOffice Calc\")\n    c2 = check_file_exist(file_path_2)\n    d2 = verify_country_data_in_ods(\"Mexico\", file_path_2)\n\n    # Create new directory and copy ODS files to it\n    e = check_directory_exists(\"/home/crab/Desktop/country_info\")\n    f = verify_files_copied(\n        \"/home/crab/Documents\", \"/home/crab/Desktop/country_info\", \"ods\"\n    )\n\n    # Add edges to form the branches and connections\n    result.add_edges_from([(a1, b1), (b1, c1), (c1, d1)])\n    result.add_edges_from([(a2, b2), (b2, c2), (c2, d2)])\n    result.add_edges_from([(d1, e), (d2, e), (e, f)])\n\n    return 
result\n\n\nTEXT_ca79febf = 'The rapid advancement of conversational and chat-based language models has led to remarkable progress in complex task-solving. However, their success heavily relies on human input to guide the conversation, which can be challenging and time-consuming. This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents and provide insight into their \"cognitive\" processes. To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing. Our approach involves using inception prompting to guide chat agents toward task completion while maintaining consistency with human intentions. We showcase how role-playing can be used to generate conversational data for studying the behaviors and capabilities of chat agents, providing a valuable resource for investigating conversational language models. Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems, and open-sourcing our library to support research on communicative agents and beyond. 
The GitHub repository of this project is made publicly available on: https://github.com/camel-ai/camel.'\n\n\ndef evaluator_ca79febf():\n    result = nx.DiGraph()\n    a = check_current_package_name(\"com.google.android.keep\")\n    b = check_keep_notes_content(TEXT_ca79febf)\n    c = check_tap_text(\"select\")\n    d = check_tap_text(\"copy\")\n    e = check_current_package_name(\n        \"com.google.android.apps.docs.editors.docs/com.google.android.apps.docs.editors.homescreen.HomescreenActivity\"\n    )\n    f = check_current_package_name(\n        \"com.google.android.apps.docs.editors.docs/com.google.android.apps.docs.editors.kix.KixEditorActivity\"\n    )\n    g = check_tap_text(\"paste\")\n    h = check_current_window_process(\"firefox\")\n    i = check_text_in_current_window_name(\"Google Docs — Mozilla Firefox\")\n    j = check_text_in_current_window_name(\n        \"Untitled document - Google Docs — Mozilla Firefox\"\n    )\n    result.add_edges_from([(a, b), (b, c), (c, d), (d, e), (e, f), (f, g), (g, j)])\n    result.add_edges_from([(h, i), (i, j)])\n    return result\n\n\ndef evaluator_dfabf84c():\n    result = nx.DiGraph()\n    keyword = \"kaust\"\n    a = check_text_in_current_window_name(\"Mozilla Firefox\")\n    b = check_contain_input_text(keyword)\n    c = is_img_url_in_clipboard()\n    d = download_from_clipboard_and_verify_file(\"/home/crab/Desktop/download.jpg\")\n    e = check_current_package_name(\"com.google.android.keep\")\n    f = check_contain_input_text(keyword)\n    g = check_note_content(keyword)\n    result.add_edges_from([(a, b), (b, c), (c, d), (d, g)])\n    result.add_edges_from([(b, e), (e, f), (f, g)])\n    return result\n\n\ndef evaluator_aab5555e():\n    result = nx.DiGraph()\n    a = check_current_window_process(\"gnome-terminal-server\")\n    b = check_contain_input_text(\"uname -a\")\n    d = check_current_package_name(\"com.google.android.apps.messaging\")\n    e = check_message_text_box_contain(\"ubuntu\")\n    f = 
check_message_text_box_contain(\"x86\")\n    g = check_message_text_box_contain(\"linux\")\n    h = check_message_text_box_contain(\"crab\")\n    sink = check_message_text_box_empty()\n    result.add_edges_from(\n        [\n            (a, b),\n            (b, sink),\n            (d, e),\n            (d, f),\n            (d, g),\n            (d, h),\n            (e, sink),\n            (f, sink),\n            (g, sink),\n            (h, sink),\n        ]\n    )\n    return result\n\n\nRESULT_fd0576be = None\n\n\n@action(env_name=\"ubuntu\")\ndef get_root_usage() -> str | None:\n    try:\n        output = subprocess.check_output([\"df\", \"/\"], text=True)\n        return output.split(\"\\n\")[1].split()[4][:-1]\n    except Exception:\n        return None\n\n\n@evaluator(env_name=\"ubuntu\", local=True)\ndef check_contain_input_text_and_get_df_result(text: str, env) -> bool:\n    global RESULT_fd0576be\n    RESULT_fd0576be = env._action_endpoint(get_root_usage, parameters={})\n    if env.trajectory:\n        inputs = [\n            params[\"text\"].lower()\n            for action_name, params, _ in env.trajectory\n            if action_name == \"write_text\"\n        ]\n        return any(text.lower() in input_text for input_text in inputs)\n\n    return False\n\n\ndef evaluator_fd0576be():\n    result = nx.DiGraph()\n    a = check_current_window_process(\"gnome-terminal-server\")\n    b = check_contain_input_text_and_get_df_result(\"df\")\n    c = check_current_package_name(\"com.google.android.keep\")\n    d = check_keep_notes_contain_fd()\n    result.add_edges_from([(a, b), (b, d), (c, d)])\n    return result\n\n\ndef evaluator_7e08f7d4():\n    result = nx.DiGraph()\n    a = check_text_in_current_window_name(\"Mozilla Firefox\")\n    b = check_contain_input_text(\n        \"https://farm9.staticflickr.com/8293/7591378270_76059bc1cf_z.jpg\"\n    )\n    c = check_current_package_name(\"com.android.deskclock.DeskClock\")\n    d = check_alarm_contains(\"7:00\\u200aAM\")\n    
result.add_edges_from([(a, b), (b, d), (c, d)])\n    return result\n\n\ndef evaluator_4957e964():\n    result = nx.DiGraph()\n    a = check_current_window_process(\"gnome-terminal-server\")\n    b = check_contain_input_text(\"wget\")\n    c = check_contain_input_text(\n        \"https://farm8.staticflickr.com/7451/10001676353_fd762e02f0_z.jpg\"\n    )\n    d = check_file_exist(\"/home/crab/Desktop/download.jpg\")\n    e = check_text_in_current_window_name(\"Image Viewer\")\n    f = check_current_package_name(\"com.google.android.apps.tasks\")\n    g = check_google_tasks_name(\"tennis\")\n    result.add_edges_from([(a, b), (b, c), (c, d), (d, e), (e, g), (f, g)])\n    return result\n\n\n# Hand-made environment setup guide:\n# Ubuntu\n# * Make sure Slack is logged in on Ubuntu, and the default channel has at least two messages\n\n# Android\n# * Make sure the first incomplete task in the Android \"Tasks\" application is an instruction to change the system to dark mode.\n# * Make sure the initial page of the \"Calendar\" app is the \"Day\" view. 
There should be at least one event today.\n\n\nubuntu_handmade_tasks = [\n    Task(\n        id=\"82efbd82-c941-4be9-9ac0-a495dc629e02\",\n        description='Download an image file from a given URL \"https://media.cntraveller.com/photos/642aa1ad770beda2d4f5cc22/4:3/w_2664,h_1998,c_limit/Fiji-march2023issue-JackJohns15.jpg\" to \"/home/crab/Downloads/raw.jpg\", then use GIMP (GNU Image Manipulation Program) to increase the brightness of the image from \"/home/crab/Downloads/raw.jpg\" and save the edited file to \"/home/crab/Pictures/edited.jpg\", and set the adjusted image \"/home/crab/Pictures/edited.jpg\" as the screen background of the system.',\n        evaluator=evaluator_82efbd82(),\n    ),\n    Task(\n        id=\"515a5467-b7ce-4cad-874d-da894361c1a3\",\n        description='Download two image files from given URLs \"https://media.cntraveller.com/photos/642aa1ad770beda2d4f5cc22/4:3/w_2664,h_1998,c_limit/Fiji-march2023issue-JackJohns15.jpg\" and \"https://upload.wikimedia.org/wikipedia/commons/thumb/7/71/Flag_of_Ethiopia.svg/250px-Flag_of_Ethiopia.svg.png\" to \"/home/crab/Downloads/img_1.jpg\" and \"/home/crab/Downloads/img_2.jpg\", combine the first image (\"/home/crab/Downloads/img_1.jpg\") with the second image (\"/home/crab/Downloads/img_2.jpg\") using GIMP (GNU Image Manipulation Program) by placing the first image on the right side of the second image, and save the resulting combined image to \"/home/crab/Downloads/combined_editing.jpg\". 
Then, create a new directory \"/home/crab/jpg\" and copy all files with the specified \"jpg\" extension from \"/home/crab/Downloads\" to the newly created directory \"/home/crab/jpg\".',\n        evaluator=evaluator_515a5467(),\n    ),\n    Task(\n        id=\"5a1eba49-ed2d-4955-a684-32472090a45b\",\n        description='Use Firefox to search for an image using the keyword \"GPU\", copy the URL of the found image to the clipboard, download the image file from the URL stored in the clipboard to \"/home/crab/Pictures/GPU.png\", and create a new directory \"/home/crab/Pictures/png_files\" and copy all files with the specified \"png\" extension from \"/home/crab/Pictures\" to the newly created directory \"/home/crab/Pictures/png_files\".',\n        evaluator=evaluator_5a1eba49(),\n    ),\n    Task(\n        id=\"c347f78a-4643-43c8-b41e-e437b70a2c5e\",\n        description='Open a file at \"/home/crab/assets/content.txt\" using vim in a terminal, write the specified content \"An air quality health advisory is in effect Tuesday for New York City and the lower Hudson Valley, as well as western Connecticut and northern New Jersey, meaning it may not be safe for people with some conditions to be outside long.\" to it, then save and exit vim. Print the content of the file to the command line interface through a terminal, and finally submit the printed content.',\n        evaluator=evaluator_c347f78a(),\n    ),\n    Task(\n        id=\"bf83c176-fa15-4057-996f-f75be4338c05\",\n        description='Use Firefox to search for an image using the keyword \"Waymo\" first, copy the URL of the image to the clipboard, and download the image to \"/home/crab/Desktop/waymo.jpg\". Then, search for another image using the keyword \"Tesla\", copy the URL of the image to the clipboard, and download the image to \"/home/crab/Desktop/tesla.png\". 
Finally, combine the two images using LibreOffice Impress, placing Image 1 from \"/home/crab/Desktop/waymo.jpg\" on the left side of Image 2 \"/home/crab/Desktop/tesla.png\", and save the resulting file in PDF format to \"/home/crab/Documents/self_driving.pdf\".',\n        evaluator=evaluator_bf83c176(),\n    ),\n    Task(\n        id=\"74bb11dd-89ca-43d0-8edf-fe7b5201ecf7\",\n        description='Use Firefox to search for information about the country \"France\" on Wikipedia. Extract the capital city and population, and save this information in an ODS file at \"/home/crab/Documents/FR.ods\" using LibreOffice Calc. Then, search for information about the country \"Mexico\" on Wikipedia, extract the capital city and population, and save this information in a separate ODS file at \"/home/crab/Documents/MX.ods\" using LibreOffice Calc. The format of the file is: the first column for the country name, the second for the capital city name, and the third for the population, without any header. Finally, create a new directory \"/home/crab/Desktop/country_info\" and copy all files with the specified \"ods\" extension from \"/home/crab/Documents\" to the newly created directory \"/home/crab/Desktop/country_info\".',\n        evaluator=evaluator_74bb11dd(),\n    ),\n]\n\ncorss_environment_tasks = [\n    Task(\n        id=\"79832e15-5fd3-43b8-b3e3-66249edfe1db\",\n        description='Open Slack on the Ubuntu desktop, summarize the last two messages in the current channel, then use the \"Messages\" app on the Android phone to send the summary to the first contact in the list.',\n        evaluator=summarize_ubuntu_evaluator(),\n    ),\n    Task(\n        id=\"a3476778-e512-40ca-b1c0-d7aab0c7f18b\",\n        # You must set the first incomplete task to \"In Ubuntu, switch the system to dark mode via the \"Settings\" application\"\n        description='Open the \"Tasks\" app on Android, check the first incomplete task, then perform the task according to its description.',\n        evaluator=nx.path_graph(\n    
        [\n                check_current_package_name(\"com.google.android.apps.tasks\"),\n                check_current_window_process(\"gnome-control-center\"),\n                check_color_scheme(\"prefer-dark\"),\n            ],\n            create_using=nx.DiGraph,\n        ),\n    ),\n    Task(\n        id=\"914e6a48-8430-4a68-8328-c4e01db8926e\",\n        # You must create several events in the Google Calendar today view.\n        description='Open the \"Calendar\" app on Android and summarize all schedules for today. Then, create a markdown file in Ubuntu at \"/home/crab/assets/plan.md\" with each event as a checkbox bullet point.',\n        evaluator=check_calendar_evaluator(),\n    ),\n    Task(\n        id=\"97e6f333-bedb-429b-8dd6-1855f99c312d\",\n        description=\"Take a photo with the Android Camera app, then upload it to Google Photos from inside the Camera app. Use Firefox on the Ubuntu desktop to download the photo to the local disk, move it to `/home/crab/assets/photo.jpg`, and finally open the photo in GIMP.\",\n        evaluator=evaluator_97e6f333(),\n    ),\n    Task(\n        id=\"ca79febf-cae7-4669-8812-d3ec85ee2868\",\n        description=\"Open the first note in the Keep Notes app on Android, copy its contents, and paste them into a new document in Google Docs. Then, open the newly created document in Firefox on Ubuntu.\",\n        evaluator=evaluator_ca79febf(),\n    ),\n    Task(\n        id=\"dfabf84c-d05f-4e25-9f21-ba0f08107bd5\",\n        description='Use Firefox to search for an image using the keyword \"kaust\" and copy the URL of the image to the clipboard. Download a file from the URL stored in the clipboard to \"/home/crab/Desktop/download.jpg\". 
Then describe this image and save the description in the Android Keep Notes app.',\n        evaluator=evaluator_dfabf84c(),\n    ),\n    Task(\n        id=\"aab5555e-4b72-4ebf-816a-59c1da2cec86\",\n        description=\"Check all the uname information of the system in Ubuntu, then explain the information to the first contact in the list of the Messages app on Android.\",\n        evaluator=evaluator_aab5555e(),\n    ),\n    Task(\n        id=\"fd0576be-8b2c-45ce-b4a2-78659740879b\",\n        description=\"Check the current disk usage through the command line in Ubuntu, find the root directory usage percentage, and save the information to a note in the Keep Notes app on Android.\",\n        evaluator=evaluator_fd0576be(),\n    ),\n    Task(\n        id=\"7e08f7d4-9b11-4aec-9b42-6cbde083fb4c\",\n        description='Use Firefox on Ubuntu to open the image \"https://farm9.staticflickr.com/8293/7591378270_76059bc1cf_z.jpg\", check the time of the clock in the image, then open the clock app on Android and set an alarm to the same time as in the image.',\n        evaluator=evaluator_7e08f7d4(),\n    ),\n    Task(\n        id=\"4957e964-5dd5-42f6-9d5d-f6a53a9a5d94\",\n        description='Use wget to download the image \"https://farm8.staticflickr.com/7451/10001676353_fd762e02f0_z.jpg\" to /home/crab/Desktop/download.jpg. What do the people in the image do? Create a task in the Tasks app on Android to remind you to do the same thing.',\n        evaluator=evaluator_4957e964(),\n    ),\n]\n\nhandmade_tasks = ubuntu_handmade_tasks + corss_environment_tasks\n"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/05d0e137-7d97-4021-9477-6490a2154c81.json",
    "content": "{\n    \"description\": \"Open \\\"/home/crab/poem\\\" using vim in a terminal, write \\\"If you shed tears when you miss the sun, you also miss the stars.\\\", then save and exit vim.\",\n    \"tasks\": [\n        {\n            \"task\": \"0f589bf9-9b26-4581-8b78-2961b115ab49\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/poem\",\n                \"content\": \"If you shed tears when you miss the sun, you also miss the stars.\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"05d0e137-7d97-4021-9477-6490a2154c81\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/0a893c2e-eec5-47cc-a930-eb01c5f17683.json",
    "content": "{\n    \"description\": \"Submit the following content \\\"If you shed tears when you miss the sun, you also miss the stars.\\\"\",\n    \"tasks\": [\n        {\n            \"task\": \"1c3bedc3-ea5a-453c-a15b-223d72ab756d\",\n            \"attribute\": {\n                \"content\": \"If you shed tears when you miss the sun, you also miss the stars.\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"0a893c2e-eec5-47cc-a930-eb01c5f17683\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/0d178388-8166-4b66-93c1-278861f9897c.json",
    "content": "{\n    \"description\": \"Use Firefox to find out a \\\"restaurant\\\" around \\\"kaust\\\" on Google Maps and copy the Google Maps sharing URL of that \\\"restaurant\\\" to the clipboard\",\n    \"tasks\": [\n        {\n            \"task\": \"2b189dc2-c77f-4fa3-8432-ba4355cc294c\",\n            \"attribute\": {\n                \"place_type\": \"restaurant\",\n                \"place_name\": \"kaust\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"0d178388-8166-4b66-93c1-278861f9897c\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/0d7c84d2-bbbd-46ab-80d1-52b3a44f3858.json",
    "content": "{\n    \"description\": \"Combine two images from Image 1 \\\"/home/crab/assets/campus.png\\\" and Image 2 \\\"/home/crab/assets/desert.jpg\\\" using LibreOffice Writer and save the resulting ODT file to \\\"/home/crab/assets/campus_desert.odt\\\". Image 1 should be placed above Image 2.\",\n    \"tasks\": [\n        {\n            \"task\": \"0111384f-38ca-41a2-9504-cb1c55002b3c\",\n            \"attribute\": {\n                \"image_path_1\": \"/home/crab/assets/campus.png\",\n                \"image_path_2\": \"/home/crab/assets/desert.jpg\",\n                \"output_path\": \"/home/crab/assets/campus_desert.odt\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"0d7c84d2-bbbd-46ab-80d1-52b3a44f3858\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/0deafe05-8db5-445f-9031-f6e884569d03.json",
    "content": "{\n    \"description\": \"Create a new directory \\\"/home/crab/jpg_folder\\\", copy all files with the \\\"jpg\\\" extension from \\\"/home/crab/Pictures\\\" to this newly created directory, then open LibreOffice Impress to combine the two images located at \\\"/home/crab/jpg_folder/dog.jpg\\\" (Image 1) and \\\"/home/crab/jpg_folder/Interstellar.jpg\\\" (Image 2), placing Image 1 on the right side of Image 2, and save the combined image in PDF format to \\\"/home/crab/Documents/combination.pdf\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"217ababc-ccc7-4b9f-af07-c239d92848fe\",\n            \"attribute\": {\n                \"file_extension\": \"jpg\",\n                \"source_dir\": \"/home/crab/Pictures\",\n                \"target_dir\": \"/home/crab/jpg_folder\"\n            },\n            \"output\": \"/home/crab/jpg_folder\"\n        },\n        {\n            \"task\": \"467f17a6-c42f-4eda-996f-a53385eb3efd\",\n            \"attribute\": {\n                \"image_path_1\": \"/home/crab/jpg_folder/dog.jpg\",\n                \"image_path_2\": \"/home/crab/jpg_folder/Interstellar.jpg\",\n                \"output_path\": \"/home/crab/Documents/combination.pdf\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"0deafe05-8db5-445f-9031-f6e884569d03\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/0e80fd90-0b23-454f-a629-7b6d7baa7542.json",
    "content": "{\n    \"description\": \"Use Firefox to search for the country \\\"Canada\\\" on Wikipedia, extract the capital city and population, and save this information in an ODS file at \\\"/home/crab/canada.ods\\\" with LibreOffice Calc. The first column will save the country name, the second will save the capital city name, and the third will save the population. No header is needed in the ODS file.\",\n    \"tasks\": [\n        {\n            \"task\": \"1cd6519a-9ee0-442b-ba5a-9238aeb00ff6\",\n            \"attribute\": {\n                \"country\": \"Canada\",\n                \"file_path\": \"/home/crab/canada.ods\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"0e80fd90-0b23-454f-a629-7b6d7baa7542\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/125f7bae-e931-4190-8737-5f1ea7227772.json",
    "content": "{\n    \"description\": \"Submit content \\\"OpenAI is an American artificial intelligence (AI) research organization founded in December 2015, researching artificial intelligence with the goal of developing \\\"safe and beneficial\\\" artificial general intelligence, which it defines as \\\"highly autonomous systems that outperform humans at most economically valuable work.\\\"\",\n    \"tasks\": [\n        {\n            \"task\": \"1c3bedc3-ea5a-453c-a15b-223d72ab756d\",\n            \"attribute\": {\n                \"content\": \"OpenAI is an American artificial intelligence (AI) research organization founded in December 2015, researching artificial intelligence with the goal of developing \\\"safe and beneficial\\\" artificial general intelligence, which it defines as \\\"highly autonomous systems that outperform humans at most economically valuable work.\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"125f7bae-e931-4190-8737-5f1ea7227772\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/15a150a8-899c-4753-8dc5-05248ccc3640.json",
    "content": "{\n    \"description\": \"Download the file from \\\"https://media.cntraveller.com/photos/642aa1ad770beda2d4f5cc22/4:3/w_2664,h_1998,c_limit/Fiji-march2023issue-JackJohns15.jpg\\\" to the location \\\"/home/crab/Downloads/fiji.png\\\", and then set \\\"/home/crab/Downloads/fiji.png\\\" as the desktop background on the system.\",\n    \"tasks\": [\n        {\n            \"task\": \"a313ea4d-e501-4971-b4fe-db2aad19eac1\",\n            \"attribute\": {\n                \"url\": \"https://media.cntraveller.com/photos/642aa1ad770beda2d4f5cc22/4:3/w_2664,h_1998,c_limit/Fiji-march2023issue-JackJohns15.jpg\",\n                \"file_path\": \"/home/crab/Downloads/fiji.png\"\n            },\n            \"output\": \"/home/crab/Downloads/fiji.png\"\n        },\n        {\n            \"task\": \"a207ef38-b3b2-4c6c-a1e3-75c38162f5ba\",\n            \"attribute\": {\n                \"photo_path\": \"/home/crab/Downloads/fiji.png\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"15a150a8-899c-4753-8dc5-05248ccc3640\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/1ebcd710-f73b-4022-832b-167c0d3f55a2.json",
    "content": "{\n    \"description\": \"Use Firefox to find out a \\\"University\\\" around \\\"Los Angeles\\\" on Google Maps and copy the Google Maps sharing URL of that \\\"University\\\" to the clipboard\",\n    \"tasks\": [\n        {\n            \"task\": \"2b189dc2-c77f-4fa3-8432-ba4355cc294c\",\n            \"attribute\": {\n                \"place_type\": \"University\",\n                \"place_name\": \"Los Angeles\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"1ebcd710-f73b-4022-832b-167c0d3f55a2\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/22787ecc-52b2-4791-aefb-c45800f51414.json",
    "content": "{\n    \"description\": \"Submit content \\\"Jensen Huang cofounded graphics-chip maker Nvidia in 1993, and has served as its CEO and president ever since. Huang owns approximately 3% of Nvidia, which went public in 1999.\\\"\",\n    \"tasks\": [\n        {\n            \"task\": \"1c3bedc3-ea5a-453c-a15b-223d72ab756d\",\n            \"attribute\": {\n                \"content\": \"Jensen Huang cofounded graphics-chip maker Nvidia in 1993, and has served as its CEO and president ever since. Huang owns approximately 3% of Nvidia, which went public in 1999.\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"22787ecc-52b2-4791-aefb-c45800f51414\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/22f05f6f-6aef-4786-958f-14f559eaf014.json",
    "content": "{\n    \"description\": \"Create a new directory \\\"/home/crab/example_code\\\" and copy all files with the specified \\\"py\\\" extension from \\\"/home/crab/crab/examples\\\" to the directory \\\"/home/crab/example_code\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"217ababc-ccc7-4b9f-af07-c239d92848fe\",\n            \"attribute\": {\n                \"file_extension\": \"py\",\n                \"source_dir\": \"/home/crab/crab/examples\",\n                \"target_dir\": \"/home/crab/example_code\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"22f05f6f-6aef-4786-958f-14f559eaf014\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/28963795-d694-4bb4-adaf-f7708a2c6fe5.json",
    "content": "{\n    \"description\": \"Use Firefox to search for an image using the keyword \\\"Elon Musk\\\" and copy the URL of the image.\",\n    \"tasks\": [\n        {\n            \"task\": \"017102b6-d2c3-466b-96f7-37c8bcddc41a\",\n            \"attribute\": {\n                \"keyword\": \"Elon Musk\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"28963795-d694-4bb4-adaf-f7708a2c6fe5\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/299db8f2-81eb-455f-9302-5c8cb30be691.json",
    "content": "{\n    \"description\": \"Combine two images, Image 1 \\\"/home/crab/Pictures/Interstellar.jpg\\\" and Image 2 \\\"/home/crab/Pictures/cat.png\\\", using GIMP (GNU Image Manipulation Program) with Image 1 placed on the left side of Image 2, and save the resulting image to \\\"/home/crab/Pictures/edited_background.png\\\". Then, set \\\"/home/crab/Pictures/edited_background.png\\\" as the desktop background on the system.\",\n    \"tasks\": [\n        {\n            \"task\": \"4cf246ea-0a7f-43da-84b6-61d74a2699af\",\n            \"attribute\": {\n                \"image_path_1\": \"/home/crab/Pictures/Interstellar.jpg\",\n                \"image_path_2\": \"/home/crab/Pictures/cat.png\",\n                \"output_path\": \"/home/crab/Pictures/edited_background.png\"\n            },\n            \"output\": \"/home/crab/Pictures/edited_background.png\"\n        },\n        {\n            \"task\": \"a207ef38-b3b2-4c6c-a1e3-75c38162f5ba\",\n            \"attribute\": {\n                \"photo_path\": \"/home/crab/Pictures/edited_background.png\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"299db8f2-81eb-455f-9302-5c8cb30be691\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/29f099b2-b3a5-463f-b10a-15363bf7e845.json",
    "content": "{\n    \"description\": \"Use Firefox to search for a \\\"garden\\\" around \\\"ETH Zurich\\\" on Google Maps, copy the sharing URL of that \\\"garden\\\" to the clipboard, then paste the content into Visual Studio Code (VS Code) and save the file at \\\"/home/crab/eth_garden.txt\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"2b189dc2-c77f-4fa3-8432-ba4355cc294c\",\n            \"attribute\": {\n                \"place_type\": \"garden\",\n                \"place_name\": \"ETH Zurich\"\n            },\n            \"output\": null\n        },\n        {\n            \"task\": \"8491e674-596b-452b-9e0e-58a44d90f947\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/eth_garden.txt\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"29f099b2-b3a5-463f-b10a-15363bf7e845\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/355e9660-a355-4b95-8881-ac9da578ea43.json",
    "content": "{\n    \"description\": \"Use Firefox to search for the country \\\"Italy\\\" on Wikipedia, extract the capital city and population, and save this information in an ODS file at \\\"/home/crab/country.ods\\\" with LibreOffice Calc. The first column will save the country name, the second will save the capital city name, and the third will save the population. No header is needed in the ODS file.\",\n    \"tasks\": [\n        {\n            \"task\": \"1cd6519a-9ee0-442b-ba5a-9238aeb00ff6\",\n            \"attribute\": {\n                \"country\": \"Italy\",\n                \"file_path\": \"/home/crab/country.ods\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"355e9660-a355-4b95-8881-ac9da578ea43\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/35bd7387-4735-4632-8474-e93382004c12.json",
    "content": "{\n    \"description\": \"Use GIMP (GNU Image Manipulation Program) to adjust the brightness of the image from \\\"/home/crab/assets/campus.png\\\" to a higher value (brighter) and save it to \\\"/home/crab/assets/campus_edited.png\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"cc1adae7-bef9-4c8a-865d-00d44486dd69\",\n            \"attribute\": {\n                \"image_path_before_edit\": \"/home/crab/assets/campus.png\",\n                \"image_path_after_edit\": \"/home/crab/assets/campus_edited.png\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"35bd7387-4735-4632-8474-e93382004c12\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/362c5711-3824-42ff-96a0-7801b03b5f1f.json",
    "content": "{\n    \"description\": \"Use Firefox to find a code repository about \\\"Open Source Computer Vision Library\\\" in GitHub and copy the URL of the repository to the clipboard.\",\n    \"tasks\": [\n        {\n            \"task\": \"bcd03c9f-62c9-4001-8d86-78358c59ce22\",\n            \"attribute\": {\n                \"keyword\": \"Open Source Computer Vision Library\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"362c5711-3824-42ff-96a0-7801b03b5f1f\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/4718df9c-97ec-4b54-86ca-bd34e65c5a43.json",
    "content": "{\n    \"description\": \"Download a file from \\\"https://arxiv.org/pdf/2303.05499\\\" to \\\"/home/crab/Documents/Grounding_DINO.pdf\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"a313ea4d-e501-4971-b4fe-db2aad19eac1\",\n            \"attribute\": {\n                \"url\": \"https://arxiv.org/pdf/2303.05499\",\n                \"file_path\": \"/home/crab/Documents/Grounding_DINO.pdf\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"4718df9c-97ec-4b54-86ca-bd34e65c5a43\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/47b75b21-99a2-461c-9d40-6dddc5c206d0.json",
    "content": "{\n    \"description\": \"Use Firefox to search for an image using the keyword \\\"LLM\\\" and copy the URL of the image to the clipboard.\",\n    \"tasks\": [\n        {\n            \"task\": \"017102b6-d2c3-466b-96f7-37c8bcddc41a\",\n            \"attribute\": {\n                \"keyword\": \"LLM\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"47b75b21-99a2-461c-9d40-6dddc5c206d0\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/4ae4e35f-d90a-48cc-8fb9-492ac7ae07ee.json",
    "content": "{\n    \"description\": \"Paste clipboard content into LibreOffice Writer and save it as an ODT file at \\\"/home/crab/Documents/clipboard_text.odt\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"76de4bdb-c980-4b3a-9bd3-c87db467dffe\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/Documents/clipboard_text.odt\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"4ae4e35f-d90a-48cc-8fb9-492ac7ae07ee\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/4bbedade-4d4e-43d5-b650-2702b350ad28.json",
    "content": "{\n    \"description\": \"Open \\\"/home/crab/assets/1.txt\\\" using vim in a terminal, write \\\"LinkedIn is a business and employment-focused social media platform that works through websites and mobile apps. It was launched on May 5, 2003 by Reid Hoffman and Eric Ly.\\\", then save and exit vim.\",\n    \"tasks\": [\n        {\n            \"task\": \"0f589bf9-9b26-4581-8b78-2961b115ab49\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/assets/1.txt\",\n                \"content\": \"LinkedIn is a business and employment-focused social media platform that works through websites and mobile apps. It was launched on May 5, 2003 by Reid Hoffman and Eric Ly.\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"4bbedade-4d4e-43d5-b650-2702b350ad28\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/51a288f9-cf2c-4e8e-a98c-596a505af77c.json",
    "content": "{\n    \"description\": \"Combine two images from Image 1 \\\"/home/crab/assets/desert.jpg\\\" and Image 2 \\\"/home/crab/assets/campus.png\\\" using LibreOffice Impress and save the resulting file in PDF format to \\\"/home/crab/assets/desert_campus.pdf\\\". Image 1 should be placed on the right side of Image 2.\",\n    \"tasks\": [\n        {\n            \"task\": \"467f17a6-c42f-4eda-996f-a53385eb3efd\",\n            \"attribute\": {\n                \"image_path_1\": \"/home/crab/assets/desert.jpg\",\n                \"image_path_2\": \"/home/crab/assets/campus.png\",\n                \"output_path\": \"/home/crab/assets/desert_campus.pdf\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"51a288f9-cf2c-4e8e-a98c-596a505af77c\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/51c91051-3efb-4e92-a967-739b18520714.json",
    "content": "{\n    \"description\": \"Open Firefox and search for the torch.matmul example provided by the official PyTorch version 1.13 documentation, copy all the lines of code from the example, open Visual Studio Code (VS Code), paste the clipboard content into a new file, and save it as \\\"/home/crab/example.py\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"49b614c5-c4bb-4c20-aab8-ab9dcc7de1b5\",\n            \"attribute\": {},\n            \"output\": null\n        },\n        {\n            \"task\": \"8491e674-596b-452b-9e0e-58a44d90f947\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/example.py\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"51c91051-3efb-4e92-a967-739b18520714\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/57b7e8a7-8c17-4cc4-9bb5-4385afde3ad8.json",
    "content": "{\n    \"description\": \"Create a new directory \\\"/home/crab/assets_for_edit\\\" and copy all files with the \\\"png\\\" extension from \\\"/home/crab/assets\\\" to this new directory. Then, combining Image 1 \\\"/home/crab/assets_for_edit/background.png\\\" and Image 2 \\\"/home/crab/assets_for_edit/campus.png\\\" with LibreOffice Writer, place Image 1 above Image 2, and save the file in the ODT format to \\\"/home/crab/assets_for_edit/back_n_campus.odt\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"217ababc-ccc7-4b9f-af07-c239d92848fe\",\n            \"attribute\": {\n                \"file_extension\": \"png\",\n                \"source_dir\": \"/home/crab/assets\",\n                \"target_dir\": \"/home/crab/assets_for_edit\"\n            },\n            \"output\": \"/home/crab/assets_for_edit\"\n        },\n        {\n            \"task\": \"0111384f-38ca-41a2-9504-cb1c55002b3c\",\n            \"attribute\": {\n                \"image_path_1\": \"/home/crab/assets_for_edit/background.png\",\n                \"image_path_2\": \"/home/crab/assets_for_edit/campus.png\",\n                \"output_path\": \"/home/crab/assets_for_edit/back_n_campus.odt\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"57b7e8a7-8c17-4cc4-9bb5-4385afde3ad8\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/58776443-ccf7-4db3-8c60-e188e4b5f90c.json",
    "content": "{\n    \"description\": \"Paste clipboard content into LibreOffice Writer and save it as an ODT file at \\\"/home/crab/paste.odt\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"76de4bdb-c980-4b3a-9bd3-c87db467dffe\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/paste.odt\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"58776443-ccf7-4db3-8c60-e188e4b5f90c\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/5ba74c6a-4513-448b-8b68-ff145ece0652.json",
    "content": "{\n    \"description\": \"Download the file from \\\"https://raw.githubusercontent.com/camel-ai/camel/master/README.md\\\" to \\\"/home/crab/Documents/README.md\\\", and then print the content of \\\"/home/crab/Documents/README.md\\\" to the command line interface through a terminal.\",\n    \"tasks\": [\n        {\n            \"task\": \"a313ea4d-e501-4971-b4fe-db2aad19eac1\",\n            \"attribute\": {\n                \"url\": \"https://raw.githubusercontent.com/camel-ai/camel/master/README.md\",\n                \"file_path\": \"/home/crab/Documents/README.md\"\n            },\n            \"output\": \"/home/crab/Documents/README.md\"\n        },\n        {\n            \"task\": \"5b527839-0e58-426d-bab6-7160200b0d24\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/Documents/README.md\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"5ba74c6a-4513-448b-8b68-ff145ece0652\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/6428f803-62de-40d2-a345-64e6cf955c9d.json",
    "content": "{\n    \"description\": \"First, use LibreOffice Impress to adjust the brightness of the image located at \\\"/home/crab/Pictures/cat.png\\\" to make it darker, and save the edited image as \\\"/home/crab/Pictures/cat_edited.png\\\". Then, using GIMP (GNU Image Manipulation Program), combine the image \\\"/home/crab/Pictures/dog.png\\\" with \\\"/home/crab/Pictures/cat_edited.png\\\" by placing the dog image on the left side of the cat image, and save the merged image to \\\"/home/crab/Pictures/dog_cat.png\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"434402f3-647a-4a9a-9d8f-10f5bb6c7cf0\",\n            \"attribute\": {\n                \"image_path_before_edit\": \"/home/crab/Pictures/cat.png\",\n                \"image_path_after_edit\": \"/home/crab/Pictures/cat_edited.png\"\n            },\n            \"output\": \"/home/crab/Pictures/cat_edited.png\"\n        },\n        {\n            \"task\": \"4cf246ea-0a7f-43da-84b6-61d74a2699af\",\n            \"attribute\": {\n                \"image_path_1\": \"/home/crab/Pictures/dog.png\",\n                \"image_path_2\": \"/home/crab/Pictures/cat_edited.png\",\n                \"output_path\": \"/home/crab/Pictures/dog_cat.png\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"6428f803-62de-40d2-a345-64e6cf955c9d\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/64a2c205-c85a-4e56-8edb-5df4f7724441.json",
    "content": "{\n    \"description\": \"Use Firefox to find the example of \\\"torch.matmul\\\" provided by the official PyTorch version 1.13 documentation and copy all the lines of code in the example to the clipboard.\",\n    \"tasks\": [\n        {\n            \"task\": \"49b614c5-c4bb-4c20-aab8-ab9dcc7de1b5\",\n            \"attribute\": {},\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"64a2c205-c85a-4e56-8edb-5df4f7724441\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/696ca9bb-89ea-4cd5-b693-f2d749d964b1.json",
    "content": "{\n    \"description\": \"Adjust the brightness of the image located at \\\"/home/crab/assets/campus.png\\\" using GIMP (GNU Image Manipulation Program) to make it brighter, save the adjusted image to \\\"/home/crab/Pictures/campus_brighter.png\\\", and then set this enhanced image as the desktop background on an Ubuntu system.\",\n    \"tasks\": [\n        {\n            \"task\": \"cc1adae7-bef9-4c8a-865d-00d44486dd69\",\n            \"attribute\": {\n                \"image_path_before_edit\": \"/home/crab/assets/campus.png\",\n                \"image_path_after_edit\": \"/home/crab/Pictures/campus_brighter.png\"\n            },\n            \"output\": \"/home/crab/Pictures/campus_brighter.png\"\n        },\n        {\n            \"task\": \"a207ef38-b3b2-4c6c-a1e3-75c38162f5ba\",\n            \"attribute\": {\n                \"photo_path\": \"/home/crab/Pictures/campus_brighter.png\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"696ca9bb-89ea-4cd5-b693-f2d749d964b1\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/6be49e77-e904-4eb0-a36a-7f0fd128ede3.json",
    "content": "{\n    \"description\": \"Use Firefox to find a code repository about \\\"pytorch\\\" in GitHub and copy the URL of the repository to the clipboard.\",\n    \"tasks\": [\n        {\n            \"task\": \"bcd03c9f-62c9-4001-8d86-78358c59ce22\",\n            \"attribute\": {\n                \"keyword\": \"pytorch\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"6be49e77-e904-4eb0-a36a-7f0fd128ede3\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/6c3105a2-328c-4190-823d-03d759be0b57.json",
    "content": "{\n    \"description\": \"Use Firefox to search for an image with the keyword \\\"reinforcement learning,\\\" copy the URL of the chosen image to the clipboard, and download the image from the URL in the clipboard to \\\"/home/crab/Downloads/RL.png\\\" on an Ubuntu system.\",\n    \"tasks\": [\n        {\n            \"task\": \"017102b6-d2c3-466b-96f7-37c8bcddc41a\",\n            \"attribute\": {\n                \"keyword\": \"reinforcement learning\"\n            },\n            \"output\": null\n        },\n        {\n            \"task\": \"a313ea4d-e501-4971-b4fe-db2aad19acsd\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/Downloads/RL.png\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"6c3105a2-328c-4190-823d-03d759be0b57\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/6c560516-ca14-4f97-b51d-16ad81fc29e4.json",
    "content": "{\n    \"description\": \"Open \\\"/home/crab/assets/a.txt\\\" using vim in a terminal, write \\\"The most recent COMPUTEX was held from 30 May to 2 June 2023 with sessions about such topics as high-performance computing, artificial intelligence, next-gen connectivity and sustainability.\\\", then save and exit vim, and print the content of \\\"/home/crab/assets/a.txt\\\" to the command line interface.\",\n    \"tasks\": [\n        {\n            \"task\": \"0f589bf9-9b26-4581-8b78-2961b115ab49\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/assets/a.txt\",\n                \"content\": \"The most recent COMPUTEX was held from 30 May to 2 June 2023 with sessions about such topics as high-performance computing, artificial intelligence, next-gen connectivity and sustainability.\"\n            },\n            \"output\": \"/home/crab/assets/a.txt\"\n        },\n        {\n            \"task\": \"5b527839-0e58-426d-bab6-7160200b0d24\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/assets/a.txt\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"6c560516-ca14-4f97-b51d-16ad81fc29e4\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/730172f5-894a-4d46-9102-ac7d985a479d.json",
    "content": "{\n    \"description\": \"Download the image of Jupiter from \\\"https://upload.wikimedia.org/wikipedia/commons/thumb/2/2b/Jupiter_and_its_shrunken_Great_Red_Spot.jpg/640px-Jupiter_and_its_shrunken_Great_Red_Spot.jpg\\\" to \\\"/home/crab/Pictures/jupiter.jpg\\\", then use LibreOffice Impress to adjust the brightness of this image to make it darker and save the edited version as \\\"/home/crab/Pictures/jupiter_edited.jpg\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"a313ea4d-e501-4971-b4fe-db2aad19eac1\",\n            \"attribute\": {\n                \"url\": \"https://upload.wikimedia.org/wikipedia/commons/thumb/2/2b/Jupiter_and_its_shrunken_Great_Red_Spot.jpg/640px-Jupiter_and_its_shrunken_Great_Red_Spot.jpg\",\n                \"file_path\": \"/home/crab/Pictures/jupiter.jpg\"\n            },\n            \"output\": \"/home/crab/Pictures/jupiter.jpg\"\n        },\n        {\n            \"task\": \"434402f3-647a-4a9a-9d8f-10f5bb6c7cf0\",\n            \"attribute\": {\n                \"image_path_before_edit\": \"/home/crab/Pictures/jupiter.jpg\",\n                \"image_path_after_edit\": \"/home/crab/Pictures/jupiter_edited.jpg\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"730172f5-894a-4d46-9102-ac7d985a479d\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/73038efb-ca0f-4d90-a947-fcfd097dd91b.json",
    "content": "{\n    \"description\": \"Open Firefox and navigate to the official PyTorch version 1.13 documentation to find an example of `torch.matmul`. Copy all the lines of code in the example to the clipboard. Then, paste the clipboard content into Visual Studio Code (VS Code) and save it as a file at \\\"/home/crab/example_code.txt\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"49b614c5-c4bb-4c20-aab8-ab9dcc7de1b5\",\n            \"attribute\": {},\n            \"output\": null\n        },\n        {\n            \"task\": \"8491e674-596b-452b-9e0e-58a44d90f947\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/example_code.txt\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"73038efb-ca0f-4d90-a947-fcfd097dd91b\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/73da97c9-f084-4cab-8697-1151737387ff.json",
    "content": "{\n    \"description\": \"Download the file from \\\"https://images.top1market.com/images/cms/uploads/20230928/4950e1db0038feb506fdcfa0c936fd8e.png\\\" to \\\"/home/crab/Desktop/meta.png\\\", then set this image, \\\"/home/crab/Desktop/meta.png\\\", as the desktop background on the system.\",\n    \"tasks\": [\n        {\n            \"task\": \"a313ea4d-e501-4971-b4fe-db2aad19eac1\",\n            \"attribute\": {\n                \"url\": \"https://images.top1market.com/images/cms/uploads/20230928/4950e1db0038feb506fdcfa0c936fd8e.png\",\n                \"file_path\": \"/home/crab/Desktop/meta.png\"\n            },\n            \"output\": \"/home/crab/Desktop/meta.png\"\n        },\n        {\n            \"task\": \"a207ef38-b3b2-4c6c-a1e3-75c38162f5ba\",\n            \"attribute\": {\n                \"photo_path\": \"/home/crab/Desktop/meta.png\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"73da97c9-f084-4cab-8697-1151737387ff\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/77aa4dd3-5a68-4686-9cac-26d0ab77c7b4.json",
    "content": "{\n    \"description\": \"Use Firefox to find a \\\"hiking trail\\\" around \\\"Munich\\\" on Google Maps and copy the Google Maps sharing URL of that \\\"hiking trail\\\" to the clipboard.\",\n    \"tasks\": [\n        {\n            \"task\": \"2b189dc2-c77f-4fa3-8432-ba4355cc294c\",\n            \"attribute\": {\n                \"place_type\": \"hiking trail\",\n                \"place_name\": \"Munich\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"77aa4dd3-5a68-4686-9cac-26d0ab77c7b4\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/78502f1c-879b-4932-a5fd-d85f7f6b0f81.json",
    "content": "{\n    \"description\": \"Download the file from \\\"https://cemse.kaust.edu.sa/sites/default/files/styles/large/public/2023-04/Web%20banner.jpg?itok=d1TvGUKY\\\" to \\\"/home/crab/Pictures/KAUST_AI.png\\\" and then set this image as the desktop background on the system.\",\n    \"tasks\": [\n        {\n            \"task\": \"a313ea4d-e501-4971-b4fe-db2aad19eac1\",\n            \"attribute\": {\n                \"url\": \"https://cemse.kaust.edu.sa/sites/default/files/styles/large/public/2023-04/Web%20banner.jpg?itok=d1TvGUKY\",\n                \"file_path\": \"/home/crab/Pictures/KAUST_AI.png\"\n            },\n            \"output\": \"/home/crab/Pictures/KAUST_AI.png\"\n        },\n        {\n            \"task\": \"a207ef38-b3b2-4c6c-a1e3-75c38162f5ba\",\n            \"attribute\": {\n                \"photo_path\": \"/home/crab/Pictures/KAUST_AI.png\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"78502f1c-879b-4932-a5fd-d85f7f6b0f81\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/7912f7a5-24b9-4dfe-a7b8-1effc1b7a212.json",
    "content": "{\n    \"description\": \"Combine two images from Image 1 \\\"/home/crab/assets/campus.png\\\" and Image 2 \\\"/home/crab/assets/desert.jpg\\\" using GIMP (GNU Image Manipulation Program) and save the resulting image to \\\"/home/crab/assets/campus_desert.png\\\". Image 1 should be placed on the left side of Image 2.\",\n    \"tasks\": [\n        {\n            \"task\": \"4cf246ea-0a7f-43da-84b6-61d74a2699af\",\n            \"attribute\": {\n                \"image_path_1\": \"/home/crab/assets/campus.png\",\n                \"image_path_2\": \"/home/crab/assets/desert.jpg\",\n                \"output_path\": \"/home/crab/assets/campus_desert.png\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"7912f7a5-24b9-4dfe-a7b8-1effc1b7a212\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/7d5613ec-9b67-4255-b766-d9c6e8466464.json",
    "content": "{\n    \"description\": \"Paste clipboard content into LibreOffice Writer and save it as an ODT file at \\\"/home/crab/assets/content.odt\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"76de4bdb-c980-4b3a-9bd3-c87db467dffe\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/assets/content.odt\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"7d5613ec-9b67-4255-b766-d9c6e8466464\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/7dda7e46-78be-4663-b882-6132dbbff335.json",
    "content": "{\n    \"description\": \"Adjust the brightness of the image located at \\\"/home/crab/Pictures/Interstellar.jpg\\\" to a higher value using GIMP (GNU Image Manipulation Program), save the edited image as \\\"/home/crab/edited_background.png\\\", and then set this edited image as the desktop background on the system.\",\n    \"tasks\": [\n        {\n            \"task\": \"cc1adae7-bef9-4c8a-865d-00d44486dd69\",\n            \"attribute\": {\n                \"image_path_before_edit\": \"/home/crab/Pictures/Interstellar.jpg\",\n                \"image_path_after_edit\": \"/home/crab/edited_background.png\"\n            },\n            \"output\": \"/home/crab/edited_background.png\"\n        },\n        {\n            \"task\": \"a207ef38-b3b2-4c6c-a1e3-75c38162f5ba\",\n            \"attribute\": {\n                \"photo_path\": \"/home/crab/edited_background.png\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"7dda7e46-78be-4663-b882-6132dbbff335\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/7e6c4927-2220-4522-9e3f-36f69adc3e71.json",
    "content": "{\n    \"description\": \"Paste clipboard content into Visual Studio Code (VS Code) and save it as a file at \\\"/home/crab/assets/clipboard.md\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"8491e674-596b-452b-9e0e-58a44d90f947\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/assets/clipboard.md\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"7e6c4927-2220-4522-9e3f-36f69adc3e71\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/82c49e12-3b2f-432e-9069-4b67bafebbf7.json",
    "content": "{\n    \"description\": \"Open Firefox to find a coffee shop around the hungarian parliament on Google Maps, copy the sharing URL of the coffee shop to the clipboard, then paste the clipboard content into Visual Studio Code (VS Code), and save the content as a file at \\\"/home/crab/Downloads/coffee\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"2b189dc2-c77f-4fa3-8432-ba4355cc294c\",\n            \"attribute\": {\n                \"place_type\": \"coffee shop\",\n                \"place_name\": \"hungarian parliament\"\n            },\n            \"output\": null\n        },\n        {\n            \"task\": \"8491e674-596b-452b-9e0e-58a44d90f947\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/Downloads/coffee\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"82c49e12-3b2f-432e-9069-4b67bafebbf7\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/87910f23-ab23-4ccc-b115-d71cff6f0162.json",
    "content": "{\n    \"description\": \"Use Firefox to search for an image with the keyword \\\"patagonia,\\\" copy the URL of the chosen image to the clipboard, and download the file from that URL to \\\"/home/crab/Desktop/brand.jpg\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"017102b6-d2c3-466b-96f7-37c8bcddc41a\",\n            \"attribute\": {\n                \"keyword\": \"patagonia\"\n            },\n            \"output\": null\n        },\n        {\n            \"task\": \"a313ea4d-e501-4971-b4fe-db2aad19acsd\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/Desktop/brand.jpg\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"87910f23-ab23-4ccc-b115-d71cff6f0162\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/8afc25eb-7a80-459f-acdc-5c79fc146c29.json",
    "content": "{\n    \"description\": \"Paste clipboard content into Visual Studio Code (VS Code) and save it as a file at \\\"/home/crab/assets/content_2.txt\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"8491e674-596b-452b-9e0e-58a44d90f947\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/assets/content_2.txt\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"8afc25eb-7a80-459f-acdc-5c79fc146c29\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/8cb5ab6d-a56e-43b9-aa83-00a46331e20f.json",
    "content": "{\n    \"description\": \"Download the image from \\\"https://res.cloudinary.com/simpleview/image/upload/v1648755098/clients/austin/Austin_Skyline_Credit_Christopher_Sherman_lifetime__4f60343d-9f69-450c-8ad3-fa636761786d.jpg\\\" to \\\"/home/crab/Downloads/Austin.jpg\\\", then use GIMP (GNU Image Manipulation Program) to adjust its brightness to a higher value and save the modified image as \\\"/home/crab/Downloads/brighter_austin.jpg\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"a313ea4d-e501-4971-b4fe-db2aad19eac1\",\n            \"attribute\": {\n                \"url\": \"https://res.cloudinary.com/simpleview/image/upload/v1648755098/clients/austin/Austin_Skyline_Credit_Christopher_Sherman_lifetime__4f60343d-9f69-450c-8ad3-fa636761786d.jpg\",\n                \"file_path\": \"/home/crab/Downloads/Austin.jpg\"\n            },\n            \"output\": \"/home/crab/Downloads/Austin.jpg\"\n        },\n        {\n            \"task\": \"cc1adae7-bef9-4c8a-865d-00d44486dd69\",\n            \"attribute\": {\n                \"image_path_before_edit\": \"/home/crab/Downloads/Austin.jpg\",\n                \"image_path_after_edit\": \"/home/crab/Downloads/brighter_austin.jpg\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"8cb5ab6d-a56e-43b9-aa83-00a46331e20f\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/90e09946-7b28-4102-b0ed-f683c01dbbd4.json",
    "content": "{\n    \"description\": \"Use Firefox to find a code repository about \\\"W&B\\\" in GitHub and copy the URL of the repository to the clipboard.\",\n    \"tasks\": [\n        {\n            \"task\": \"bcd03c9f-62c9-4001-8d86-78358c59ce22\",\n            \"attribute\": {\n                \"keyword\": \"W&B\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"90e09946-7b28-4102-b0ed-f683c01dbbd4\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/925a3607-2802-48aa-b339-13ebfcef43a2.json",
    "content": "{\n    \"description\": \"Use Firefox to find a code repository about \\\"segment-anything\\\" in GitHub and copy the URL of the repository to the clipboard.\",\n    \"tasks\": [\n        {\n            \"task\": \"bcd03c9f-62c9-4001-8d86-78358c59ce22\",\n            \"attribute\": {\n                \"keyword\": \"segment-anything\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"925a3607-2802-48aa-b339-13ebfcef43a2\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/9506dd30-f58d-4832-b336-8037e83e2689.json",
    "content": "{\n    \"description\": \"Get the content of \\\"/home/crab/Documents/nba.txt\\\" by printing it to the command line interface through a terminal\",\n    \"tasks\": [\n        {\n            \"task\": \"5b527839-0e58-426d-bab6-7160200b0d24\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/Documents/nba.txt\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"9506dd30-f58d-4832-b336-8037e83e2689\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/95e347aa-56ab-4d5d-a94c-350ddfddabf9.json",
    "content": "{\n    \"description\": \"Create a new directory \\\"/home/crab/png_folder\\\" and copy all files with the specified \\\"png\\\" extension from \\\"/home/crab/Pictures\\\" to the directory \\\"/home/crab/png_folder\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"217ababc-ccc7-4b9f-af07-c239d92848fe\",\n            \"attribute\": {\n                \"file_extension\": \"png\",\n                \"source_dir\": \"/home/crab/Pictures\",\n                \"target_dir\": \"/home/crab/png_folder\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"95e347aa-56ab-4d5d-a94c-350ddfddabf9\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/98a360d8-0f95-44cd-bb9d-442fca2918d4.json",
    "content": "{\n    \"description\": \"Download a file from \\\"https://github.com/open-mmlab/mmdetection/archive/refs/tags/v3.3.0.zip\\\" to \\\"/home/crab/mmdetection_v3.3.0.zip\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"a313ea4d-e501-4971-b4fe-db2aad19eac1\",\n            \"attribute\": {\n                \"url\": \"https://github.com/open-mmlab/mmdetection/archive/refs/tags/v3.3.0.zip\",\n                \"file_path\": \"/home/crab/mmdetection_v3.3.0.zip\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"98a360d8-0f95-44cd-bb9d-442fca2918d4\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/9c979fc5-8d60-41f1-a494-904a1d312187.json",
    "content": "{\n    \"description\": \"Use Firefox to search for the country \\\"United Kingdom\\\" on Wikipedia, extract the capital city and population, and save this information in an ODS file at \\\"/home/crab/assets/content.ods\\\" with LibreOffice Calc. The first column will save the country name, the second will save the capital city name, and the third will save the population. No header is needed in the ODS file.\",\n    \"tasks\": [\n        {\n            \"task\": \"1cd6519a-9ee0-442b-ba5a-9238aeb00ff6\",\n            \"attribute\": {\n                \"country\": \"United Kingdom\",\n                \"file_path\": \"/home/crab/assets/content.ods\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"9c979fc5-8d60-41f1-a494-904a1d312187\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/9e08971c-7f83-4853-952e-4c4a4a26333b.json",
    "content": "{\n    \"description\": \"Use Firefox to search for an image using the keyword \\\"Red Sea\\\" and copy the URL of the image to the clipboard.\",\n    \"tasks\": [\n        {\n            \"task\": \"017102b6-d2c3-466b-96f7-37c8bcddc41a\",\n            \"attribute\": {\n                \"keyword\": \"Red Sea\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"9e08971c-7f83-4853-952e-4c4a4a26333b\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/9fe4f541-61cf-48e0-a081-4371786659c7.json",
    "content": "{\n    \"description\": \"Set \\\"/home/crab/Pictures/Interstellar.jpg\\\" as the screen background of the system\",\n    \"tasks\": [\n        {\n            \"task\": \"a207ef38-b3b2-4c6c-a1e3-75c38162f5ba\",\n            \"attribute\": {\n                \"photo_path\": \"/home/crab/Pictures/Interstellar.jpg\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"9fe4f541-61cf-48e0-a081-4371786659c7\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/a0714ef7-bbdc-4f84-bd2e-c6e611d4db9e.json",
    "content": "{\n    \"description\": \"Get the content of \\\"/home/crab/ubuntu\\\" by printing it to the command line interface through a terminal\",\n    \"tasks\": [\n        {\n            \"task\": \"5b527839-0e58-426d-bab6-7160200b0d24\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/ubuntu\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"a0714ef7-bbdc-4f84-bd2e-c6e611d4db9e\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/a2a34580-cded-4bf8-81d9-b36a4d4402d0.json",
    "content": "{\n    \"description\": \"Set \\\"/home/crab/assets/background.png\\\" as the screen background of the system\",\n    \"tasks\": [\n        {\n            \"task\": \"a207ef38-b3b2-4c6c-a1e3-75c38162f5ba\",\n            \"attribute\": {\n                \"photo_path\": \"/home/crab/assets/background.png\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"a2a34580-cded-4bf8-81d9-b36a4d4402d0\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/a6b67c2d-d448-4e77-904e-dc7c5f21a5fe.json",
    "content": "{\n    \"description\": \"Get the content of \\\"/home/crab/crab/README.md\\\" by printing it to the command line interface through a terminal\",\n    \"tasks\": [\n        {\n            \"task\": \"5b527839-0e58-426d-bab6-7160200b0d24\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/crab/README.md\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"a6b67c2d-d448-4e77-904e-dc7c5f21a5fe\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/a70ab903-835f-48b7-8356-2321b8b869d8.json",
    "content": "{\n    \"description\": \"Using Firefox, find the example of torch.matmul provided by the official PyTorch version 1.13 documentation and copy all the lines of code in the example to the clipboard, then paste the clipboard content into LibreOffice Writer and save it as an ODT file at \\\"/home/crab/Desktop/doc_torch.odt\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"49b614c5-c4bb-4c20-aab8-ab9dcc7de1b5\",\n            \"attribute\": {},\n            \"output\": null\n        },\n        {\n            \"task\": \"76de4bdb-c980-4b3a-9bd3-c87db467dffe\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/Desktop/doc_torch.odt\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"a70ab903-835f-48b7-8356-2321b8b869d8\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/a78177f5-6cc6-48d7-8c6f-df53399d7759.json",
    "content": "{\n    \"description\": \"Use Firefox to search for an image using the keyword \\\"The Colosseum\\\" and copy the URL of the image to the clipboard.\",\n    \"tasks\": [\n        {\n            \"task\": \"017102b6-d2c3-466b-96f7-37c8bcddc41a\",\n            \"attribute\": {\n                \"keyword\": \"The Colosseum\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"a78177f5-6cc6-48d7-8c6f-df53399d7759\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/abb16512-27ae-49c0-b12b-7fbf0e95056b.json",
    "content": "{\n    \"description\": \"Paste the clipboard content into Visual Studio Code (VS Code) and save the file as \\\"/home/crab/Desktop/content.txt\\\", then open a terminal and print the content of \\\"/home/crab/Desktop/content.txt\\\" to the command line interface.\",\n    \"tasks\": [\n        {\n            \"task\": \"8491e674-596b-452b-9e0e-58a44d90f947\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/Desktop/content.txt\"\n            },\n            \"output\": \"/home/crab/Desktop/content.txt\"\n        },\n        {\n            \"task\": \"5b527839-0e58-426d-bab6-7160200b0d24\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/Desktop/content.txt\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"abb16512-27ae-49c0-b12b-7fbf0e95056b\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/b2ca21dc-dde9-49f5-bec7-321fbf769315.json",
    "content": "{\n    \"description\": \"Adjust the brightness of the image located at \\\"/home/crab/assets/desert.jpg\\\" to a darker value using LibreOffice Impress and save it as \\\"/home/crab/assets/darker_desert.jpg\\\", then use GIMP (GNU Image Manipulation Program) to combine this adjusted image with the original image at \\\"/home/crab/assets/desert.jpg\\\", placing the darker image on the left side and the original on the right, finally save the resulting comparison image to \\\"/home/crab/assets/desert_comparison.jpg\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"434402f3-647a-4a9a-9d8f-10f5bb6c7cf0\",\n            \"attribute\": {\n                \"image_path_before_edit\": \"/home/crab/assets/desert.jpg\",\n                \"image_path_after_edit\": \"/home/crab/assets/darker_desert.jpg\"\n            },\n            \"output\": \"/home/crab/assets/darker_desert.jpg\"\n        },\n        {\n            \"task\": \"4cf246ea-0a7f-43da-84b6-61d74a2699af\",\n            \"attribute\": {\n                \"image_path_1\": \"/home/crab/assets/darker_desert.jpg\",\n                \"image_path_2\": \"/home/crab/assets/desert.jpg\",\n                \"output_path\": \"/home/crab/assets/desert_comparison.jpg\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"b2ca21dc-dde9-49f5-bec7-321fbf769315\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/b57c96c1-071b-40f6-b33b-2a0459fc25bb.json",
    "content": "{\n    \"description\": \"Use GIMP (GNU Image Manipulation Program) to adjust the brightness of the image from \\\"/home/crab/assets/background.png\\\" to a higher value (brighter) and save it to \\\"/home/crab/Pictures/background_edited.jpg\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"cc1adae7-bef9-4c8a-865d-00d44486dd69\",\n            \"attribute\": {\n                \"image_path_before_edit\": \"/home/crab/assets/background.png\",\n                \"image_path_after_edit\": \"/home/crab/Pictures/background_edited.jpg\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"b57c96c1-071b-40f6-b33b-2a0459fc25bb\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/b73019e0-3ce8-4657-8b13-b3e0ab6cfac8.json",
    "content": "{\n    \"description\": \"Download a file from \\\"https://raw.githubusercontent.com/camel-ai/camel/master/misc/primary_logo.png\\\" to \\\"/home/crab/camel-logo.png\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"a313ea4d-e501-4971-b4fe-db2aad19eac1\",\n            \"attribute\": {\n                \"url\": \"https://raw.githubusercontent.com/camel-ai/camel/master/misc/primary_logo.png\",\n                \"file_path\": \"/home/crab/camel-logo.png\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"b73019e0-3ce8-4657-8b13-b3e0ab6cfac8\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/ba5aebcb-999d-44d4-b9bc-241f9884c6dd.json",
    "content": "{\n    \"description\": \"Use GIMP (GNU Image Manipulation Program) to adjust the brightness of the image from \\\"/home/crab/Pictures/Interstellar.jpg\\\" to a higher value (brighter) and save it to \\\"/home/crab/interstellar_brighter.jpg\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"cc1adae7-bef9-4c8a-865d-00d44486dd69\",\n            \"attribute\": {\n                \"image_path_before_edit\": \"/home/crab/Pictures/Interstellar.jpg\",\n                \"image_path_after_edit\": \"/home/crab/interstellar_brighter.jpg\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"ba5aebcb-999d-44d4-b9bc-241f9884c6dd\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/be6468be-2218-45c1-9b75-b56efec61eb4.json",
    "content": "{\n    \"description\": \"Paste clipboard content into Visual Studio Code (VS Code) and save it as a file at \\\"/home/crab/text_result\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"8491e674-596b-452b-9e0e-58a44d90f947\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/text_result\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"be6468be-2218-45c1-9b75-b56efec61eb4\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/c4106f9a-9348-4a55-9892-782e6f4b3081.json",
    "content": "{\n    \"description\": \"Use LibreOffice Impress to adjust the brightness of the image from \\\"/home/crab/assets/desert.jpg\\\" to a lower value (darker) and save it to \\\"/home/crab/assets/desert_edited.png\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"434402f3-647a-4a9a-9d8f-10f5bb6c7cf0\",\n            \"attribute\": {\n                \"image_path_before_edit\": \"/home/crab/assets/desert.jpg\",\n                \"image_path_after_edit\": \"/home/crab/assets/desert_edited.png\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"c4106f9a-9348-4a55-9892-782e6f4b3081\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/c8800e50-3ff4-4dd2-bc90-33688be99659.json",
    "content": "{\n    \"description\": \"Download a file from \\\"https://raw.githubusercontent.com/facebookresearch/detectron2/main/README.md\\\" to \\\"/home/crab/Documents/detectron2.txt\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"a313ea4d-e501-4971-b4fe-db2aad19eac1\",\n            \"attribute\": {\n                \"url\": \"https://raw.githubusercontent.com/facebookresearch/detectron2/main/README.md\",\n                \"file_path\": \"/home/crab/Documents/detectron2.txt\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"c8800e50-3ff4-4dd2-bc90-33688be99659\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/ccf31785-ec13-4981-93c5-ca6c242ac0c3.json",
    "content": "{\n    \"description\": \"Download the flag of Ethiopia image from \\\"https://upload.wikimedia.org/wikipedia/commons/thumb/7/71/Flag_of_Ethiopia.svg/250px-Flag_of_Ethiopia.svg.png\\\" to \\\"/home/crab/Pictures/flag.png\\\", create a new directory named \\\"/home/crab/Pictures/png_\\\", and copy all PNG files from \\\"/home/crab/Pictures\\\" to the newly created directory \\\"/home/crab/Pictures/png_\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"a313ea4d-e501-4971-b4fe-db2aad19eac1\",\n            \"attribute\": {\n                \"url\": \"https://upload.wikimedia.org/wikipedia/commons/thumb/7/71/Flag_of_Ethiopia.svg/250px-Flag_of_Ethiopia.svg.png\",\n                \"file_path\": \"/home/crab/Pictures/flag.png\"\n            },\n            \"output\": \"/home/crab/Pictures/flag.png\"\n        },\n        {\n            \"task\": \"217ababc-ccc7-4b9f-af07-c239d92848fe\",\n            \"attribute\": {\n                \"file_extension\": \"png\",\n                \"source_dir\": \"/home/crab/Pictures\",\n                \"target_dir\": \"/home/crab/Pictures/png_\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"ccf31785-ec13-4981-93c5-ca6c242ac0c3\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/d3478489-70f2-4a82-b7d2-0a47b75986eb.json",
    "content": "{\n    \"description\": \"Use Firefox to search for the country \\\"Ethiopia\\\" on Wikipedia, extract the capital city and population, save this information in an ODS file at \\\"/home/crab/Documents/africa.ods\\\" with LibreOffice Calc with the first column for the country name, the second for the capital city name, and the third for the population without any header, then create a new directory \\\"/home/crab/sheet\\\" and copy all ODS files from \\\"/home/crab/Documents\\\" to \\\"/home/crab/sheet\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"1cd6519a-9ee0-442b-ba5a-9238aeb00ff6\",\n            \"attribute\": {\n                \"country\": \"Ethiopia\",\n                \"file_path\": \"/home/crab/Documents/africa.ods\"\n            },\n            \"output\": \"/home/crab/Documents/africa.ods\"\n        },\n        {\n            \"task\": \"217ababc-ccc7-4b9f-af07-c239d92848fe\",\n            \"attribute\": {\n                \"file_extension\": \"ods\",\n                \"source_dir\": \"/home/crab/Documents\",\n                \"target_dir\": \"/home/crab/sheet\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"d3478489-70f2-4a82-b7d2-0a47b75986eb\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/d39d40b1-fc26-4169-9d6f-cdf81efe9a3e.json",
    "content": "{\n    \"description\": \"Use Firefox to search for the country \\\"Iceland\\\" on Wikipedia, extract the capital city and population, and save this information in an ODS file at \\\"/home/crab/country_iceland.ods\\\" with LibreOffice Calc. The first column will save the country name, the second will save the capital city name, and the third will save the population. No header is needed in the ODS file.\",\n    \"tasks\": [\n        {\n            \"task\": \"1cd6519a-9ee0-442b-ba5a-9238aeb00ff6\",\n            \"attribute\": {\n                \"country\": \"Iceland\",\n                \"file_path\": \"/home/crab/country_iceland.ods\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"d39d40b1-fc26-4169-9d6f-cdf81efe9a3e\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/d3c917ff-406f-447a-87f5-b8d835cba750.json",
    "content": "{\n    \"description\": \"Combine Image 1 \\\"/home/crab/Pictures/cat.png\\\" and Image 2 \\\"/home/crab/assets/campus.png\\\" using GIMP (GNU Image Manipulation Program), placing Image 1 on the left side of Image 2, and save the combined image to \\\"/home/crab/Desktop/background.png\\\". Then, set this combined image as the screen background of the system.\",\n    \"tasks\": [\n        {\n            \"task\": \"4cf246ea-0a7f-43da-84b6-61d74a2699af\",\n            \"attribute\": {\n                \"image_path_1\": \"/home/crab/Pictures/cat.png\",\n                \"image_path_2\": \"/home/crab/assets/campus.png\",\n                \"output_path\": \"/home/crab/Desktop/background.png\"\n            },\n            \"output\": \"/home/crab/Desktop/background.png\"\n        },\n        {\n            \"task\": \"a207ef38-b3b2-4c6c-a1e3-75c38162f5ba\",\n            \"attribute\": {\n                \"photo_path\": \"/home/crab/Desktop/background.png\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"d3c917ff-406f-447a-87f5-b8d835cba750\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/d6e460e4-c295-40ad-883c-11300d7832f0.json",
    "content": "{\n    \"description\": \"Using Firefox, locate the example provided of torch.matmul by the official PyTorch version 1.13 documentation and copy all the lines of code to the clipboard, then open LibreOffice Writer, paste the content from the clipboard, and save the document as an ODT file at \\\"/home/crab/Documents/torch_matmul.odt\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"49b614c5-c4bb-4c20-aab8-ab9dcc7de1b5\",\n            \"attribute\": {},\n            \"output\": null\n        },\n        {\n            \"task\": \"76de4bdb-c980-4b3a-9bd3-c87db467dffe\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/Documents/torch_matmul.odt\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"d6e460e4-c295-40ad-883c-11300d7832f0\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/d9e4e23c-2a2a-4b5c-b034-7deb6036572d.json",
    "content": "{\n    \"description\": \"Use Firefox to find out a \\\"amusement park\\\" around \\\"Sentosa\\\" on Google Maps and copy the Google Maps sharing URL of that \\\"amusement park\\\" to the clipboard\",\n    \"tasks\": [\n        {\n            \"task\": \"2b189dc2-c77f-4fa3-8432-ba4355cc294c\",\n            \"attribute\": {\n                \"place_type\": \"amusement park\",\n                \"place_name\": \"Sentosa\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"d9e4e23c-2a2a-4b5c-b034-7deb6036572d\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/e31d4e3b-b753-4deb-b9ad-a0add5d4790e.json",
    "content": "{\n    \"description\": \"Use Firefox to search for an image with the keyword \\\"Mission: Impossible\\\", copy the image's URL to the clipboard, and then download the file from the clipboard's URL to \\\"/home/crab/Pictures/movie.jpg\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"017102b6-d2c3-466b-96f7-37c8bcddc41a\",\n            \"attribute\": {\n                \"keyword\": \"Mission: Impossible\"\n            },\n            \"output\": \"\"\n        },\n        {\n            \"task\": \"a313ea4d-e501-4971-b4fe-db2aad19acsd\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/Pictures/movie.jpg\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"e31d4e3b-b753-4deb-b9ad-a0add5d4790e\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/f07a1f32-2f3f-40e7-b12f-8f1b128c41f6.json",
    "content": "{\n    \"description\": \"Create a new directory \\\"/home/crab/assets_copy\\\" and copy all files with the specified \\\"txt\\\" extension from \\\"/home/crab/assets\\\" to the directory \\\"/home/crab/assets_copy\\\".\",\n    \"tasks\": [\n        {\n            \"task\": \"217ababc-ccc7-4b9f-af07-c239d92848fe\",\n            \"attribute\": {\n                \"file_extension\": \"txt\",\n                \"source_dir\": \"/home/crab/assets\",\n                \"target_dir\": \"/home/crab/assets_copy\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"f07a1f32-2f3f-40e7-b12f-8f1b128c41f6\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/f5cce3a0-ba65-4317-95f8-1fc7d9776c78.json",
    "content": "{\n    \"description\": \"Set \\\"/home/crab/deepmind.png\\\" as the screen background of the system\",\n    \"tasks\": [\n        {\n            \"task\": \"a207ef38-b3b2-4c6c-a1e3-75c38162f5ba\",\n            \"attribute\": {\n                \"photo_path\": \"/home/crab/deepmind.png\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"f5cce3a0-ba65-4317-95f8-1fc7d9776c78\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/f67a26e4-58dd-4dc6-8859-affbf1d62f94.json",
    "content": "{\n    \"description\": \"Open \\\"/home/crab/poem\\\" using vim in a terminal, write \\\"Two roads diverged in a yellow wood, and sorry I could not travel both and be one traveler, long I stood and looked down one as far as I could to where it bent in the undergrowth.\\\", save and exit vim, and then print the content of \\\"/home/crab/poem\\\" to the command line interface through the terminal.\",\n    \"tasks\": [\n        {\n            \"task\": \"0f589bf9-9b26-4581-8b78-2961b115ab49\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/poem\",\n                \"content\": \"Two roads diverged in a yellow wood, and sorry I could not travel both and be one traveler, long I stood and looked down one as far as I could to where it bent in the undergrowth.\"\n            },\n            \"output\": \"/home/crab/poem\"\n        },\n        {\n            \"task\": \"5b527839-0e58-426d-bab6-7160200b0d24\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/poem\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0 1\\n1\",\n    \"id\": \"f67a26e4-58dd-4dc6-8859-affbf1d62f94\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu/f96d7c34-9543-4679-a6ea-89e0c2ef7b1c.json",
    "content": "{\n    \"description\": \"Open \\\"/home/crab/Documents/result\\\" using vim in a terminal, write \\\"Celtics vs. Mavericks odds, score prediction, time: 2024 NBA Finals picks, Game 1 best bets by proven model\\\", then save and exit vim.\",\n    \"tasks\": [\n        {\n            \"task\": \"0f589bf9-9b26-4581-8b78-2961b115ab49\",\n            \"attribute\": {\n                \"file_path\": \"/home/crab/Documents/result\",\n                \"content\": \"Celtics vs. Mavericks odds, score prediction, time: 2024 NBA Finals picks, Game 1 best bets by proven model\"\n            },\n            \"output\": null\n        }\n    ],\n    \"adjlist\": \"0\",\n    \"id\": \"f96d7c34-9543-4679-a6ea-89e0c2ef7b1c\"\n}"
  },
  {
    "path": "crab-benchmark-v0/dataset/ubuntu_subtasks.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# ruff: noqa: E501\nimport base64\nimport hashlib\nimport io\nimport os\nimport re\nimport subprocess\nimport time\nfrom collections import Counter\nfrom functools import cache\nfrom typing import Callable, List, Optional, Tuple\n\nimport cv2\nimport easyocr\nimport imageio as imio\nimport networkx as nx\nimport numpy as np\nimport psutil\nimport pyperclip\nimport requests\nimport torch\nfrom networkx import DiGraph, path_graph\nfrom numpy.linalg import norm\nfrom PIL import Image\n\nfrom crab import SubTask, TaskGenerator, action, evaluator\nfrom crab.actions.crab_actions import check_submit, submit\n\n\nclass ImageMatcher:\n    \"\"\"\n    A class to handle image matching, resizing, and cropping operations using accelerated feature matching.\n    See https://github.com/verlab/accelerated_features.\n    \"\"\"\n\n    def __init__(self, top_k: int = 4096):\n        \"\"\"\n        Initializes the ImageMatcher with a pretrained XFeat model.\n\n        Parameters:\n        top_k (int): The number of top features to use for matching.\n        \"\"\"\n        self.xfeat = torch.hub.load(\n            \"verlab/accelerated_features\", \"XFeat\", pretrained=True, top_k=top_k\n        )\n        self.top_k = top_k\n\n    def warp_corners_and_draw_matches(\n  
      self,\n        ref_points: np.ndarray,\n        dst_points: np.ndarray,\n        img1: np.ndarray,\n        img2: np.ndarray,\n    ) -> Tuple[np.ndarray, np.ndarray]:\n        \"\"\"\n        Calculates the homography matrix and warps the corners of the first image to the second image space.\n\n        Parameters:\n        ref_points (np.ndarray): Reference points from the first image.\n        dst_points (np.ndarray): Destination points from the second image.\n        img1 (np.ndarray): The first image.\n        img2 (np.ndarray): The second image.\n\n        Returns:\n        Tuple[np.ndarray, np.ndarray]: Image with warped corners and the warped corners coordinates.\n        \"\"\"\n        H, mask = cv2.findHomography(\n            ref_points,\n            dst_points,\n            cv2.USAC_MAGSAC,\n            3.5,\n            maxIters=1000,\n            confidence=0.999,\n        )\n        mask = mask.flatten()\n\n        h, w = img1.shape[:2]\n        corners_img1 = np.array(\n            [[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]], dtype=np.float32\n        ).reshape(-1, 1, 2)\n        warped_corners = cv2.perspectiveTransform(corners_img1, H)\n\n        img2_with_corners = img2.copy()\n        for i in range(len(warped_corners)):\n            start_point = tuple(warped_corners[i - 1][0].astype(int))\n            end_point = tuple(warped_corners[i][0].astype(int))\n            cv2.line(img2_with_corners, start_point, end_point, (0, 255, 0), 4)\n\n        keypoints1 = [cv2.KeyPoint(p[0], p[1], 5) for p in ref_points]\n        keypoints2 = [cv2.KeyPoint(p[0], p[1], 5) for p in dst_points]\n        matches = [cv2.DMatch(i, i, 0) for i in range(len(mask)) if mask[i]]\n\n        img_matches = cv2.drawMatches(\n            img1,\n            keypoints1,\n            img2_with_corners,\n            keypoints2,\n            matches,\n            None,\n            matchColor=(0, 255, 0),\n            flags=2,\n        )\n\n        return img_matches, 
warped_corners\n\n    def _get_bounding_box(\n        self, warped_corners: np.ndarray, img_shape: Tuple[int, int]\n    ) -> List[int]:\n        \"\"\"\n        Computes the bounding box around the warped corners.\n\n        Parameters:\n        warped_corners (np.ndarray): The warped corners coordinates.\n        img_shape (Tuple[int, int]): The shape of the image as (height, width).\n\n        Returns:\n        List[int]: Bounding box coordinates [x_min, x_max, y_min, y_max].\n        \"\"\"\n        h, w = img_shape\n\n        x_min = np.min(warped_corners[:, 0, 0])\n        x_max = np.max(warped_corners[:, 0, 0])\n        y_min = np.min(warped_corners[:, 0, 1])\n        y_max = np.max(warped_corners[:, 0, 1])\n\n        x_min = max(0, x_min)\n        x_max = min(w - 1, x_max)\n        y_min = max(0, y_min)\n        y_max = min(h - 1, y_max)\n\n        return [int(x_min), int(x_max), int(y_min), int(y_max)]\n\n    def _resize_image(\n        self, img1: np.ndarray, img2: np.ndarray, scale: float, match_dimension: str\n    ) -> Tuple[np.ndarray, np.ndarray]:\n        \"\"\"\n        Resizes img1 to match a scaled dimension of img2.\n\n        Parameters:\n        img1 (np.ndarray): The first image to be resized.\n        img2 (np.ndarray): The reference image.\n        scale (float): The scale factor (0.5 for half size).\n        match_dimension (str): The dimension to match ('height' or 'width').\n\n        Returns:\n        Tuple[np.ndarray, np.ndarray]: Resized img1 and original img2.\n        \"\"\"\n        h1, w1 = img1.shape[:2]\n        h2, w2 = img2.shape[:2]\n\n        if match_dimension == \"height\":\n            new_height = int(h2 * scale)\n            new_width = int(w1 * (new_height / h1))\n        elif match_dimension == \"width\":\n            new_width = int(w2 * scale)\n            new_height = int(h1 * (new_width / w1))\n        else:\n            raise ValueError(\"match_dimension must be either 'height' or 'width'.\")\n\n        
resized_img1 = cv2.resize(img1, (new_width, new_height))\n        return resized_img1, img2\n\n    def get_resizing_functions(\n        self,\n    ) -> List[Callable[[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray]]]:\n        \"\"\"\n        Provides a list of resizing functions.\n\n        Returns:\n        List[Callable[[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray]]]: List of resizing functions.\n        \"\"\"\n        return [\n            lambda x, y: (x, y),\n            lambda x, y: self._resize_image(x, y, 1.0, \"height\"),\n            lambda x, y: self._resize_image(x, y, 1.0, \"width\"),\n            lambda x, y: self._resize_image(x, y, 0.5, \"height\"),\n            lambda x, y: self._resize_image(x, y, 0.5, \"width\"),\n        ]\n\n    def match_images(\n        self,\n        im1_path: str,\n        im2_path: str,\n        top_k: int = 4096,\n        match_num_threshold: int = 80,\n    ) -> Tuple[Optional[List[int]], Optional[np.ndarray], int]:\n        \"\"\"\n        Matches two images and finds the bounding box around the matched area if sufficient matches are found.\n\n        Parameters:\n        im1_path (str): Path to the first image.\n        im2_path (str): Path to the second image.\n        top_k (int): The number of top features to use for matching.\n        match_num_threshold (int): The minimum number of matches required to consider the match valid.\n\n        Returns:\n        Tuple[Optional[List[int]], Optional[np.ndarray], int]: Bounding box, image with matched keypoints drawn, and the number of matches found.\n        \"\"\"\n        im1 = self.load_and_convert_image(im1_path)\n        im2 = self.load_and_convert_image(im2_path)\n\n        best_matches = {\n            \"count\": 0,\n            \"im1_resized\": None,\n            \"im2_resized\": None,\n            \"mkpts_0\": None,\n            \"mkpts_1\": None,\n        }\n\n        for resize_func in self.get_resizing_functions():\n            try:\n          
      im1_resized, im2_resized = resize_func(im1, im2)\n                mkpts_0, mkpts_1 = self.xfeat.match_xfeat_star(\n                    im1_resized, im2_resized, top_k=top_k\n                )\n\n                if len(mkpts_0) > best_matches[\"count\"]:\n                    best_matches.update(\n                        {\n                            \"count\": len(mkpts_0),\n                            \"im1_resized\": im1_resized,\n                            \"im2_resized\": im2_resized,\n                            \"mkpts_0\": mkpts_0,\n                            \"mkpts_1\": mkpts_1,\n                        }\n                    )\n            except Exception:\n                continue\n\n        if best_matches[\"count\"] >= match_num_threshold:\n            canvas, warped_corners = self.warp_corners_and_draw_matches(\n                best_matches[\"mkpts_0\"],\n                best_matches[\"mkpts_1\"],\n                best_matches[\"im1_resized\"],\n                best_matches[\"im2_resized\"],\n            )\n            # Use the resized image that produced the best match, not the loop\n            # variable left over from the last resizing attempt.\n            bbox = self._get_bounding_box(\n                warped_corners, best_matches[\"im2_resized\"].shape[:2]\n            )\n        else:\n            bbox, canvas = None, None\n\n        return bbox, canvas, best_matches[\"count\"]\n\n    def load_and_convert_image(self, filepath: str) -> np.ndarray:\n        \"\"\"\n        Loads an image from a file and converts it to JPEG format if necessary.\n\n        Parameters:\n        filepath (str): The path to the image file.\n\n        Returns:\n        np.ndarray: The loaded and converted image.\n        \"\"\"\n        image = Image.open(filepath)\n        if image.mode != \"RGB\":\n            image = image.convert(\"RGB\")\n        with io.BytesIO() as output:\n            image.save(output, format=\"JPEG\")\n            converted_image = np.copy(imio.v2.imread(output)[..., ::-1])\n        return converted_image\n\n\nimage_matcher = ImageMatcher()\n\n\ndef from_env_load_and_save_file(env, file_path, 
output_dir=\"/tmp/local_save\"):\n    \"\"\"\n    Load a file from the environment as a Base64 string and save it to a local directory with the same basename.\n\n    Args:\n        env: The environment object with the _action_endpoint method.\n        file_path (str): The path to the file to be loaded.\n        output_dir (str): The directory where the file should be saved (default is \"/tmp/local_save\").\n\n    Returns:\n        str: The path to the saved file.\n    \"\"\"\n\n    @action(env_name=\"ubuntu\")\n    def get_encoded_file(file_path: str) -> str | None:\n        try:\n            with open(file_path, \"rb\") as file:\n                file_bytes = file.read()\n                encoded_string = base64.b64encode(file_bytes).decode(\"utf-8\")\n        except Exception:\n            return None\n\n        return encoded_string\n\n    # Create output directory if it does not exist\n    os.makedirs(output_dir, exist_ok=True)\n\n    # Fetch the file from the environment as a Base64-encoded string\n    encoded_string = env._action_endpoint(get_encoded_file, {\"file_path\": file_path})\n\n    # Decode the Base64 string back to bytes\n    decoded_bytes = base64.b64decode(encoded_string.encode(\"utf-8\"))\n\n    # Create the output file path\n    file_name = os.path.basename(file_path)\n    output_file_path = os.path.join(output_dir, file_name)\n\n    # Save the decoded bytes to the output path\n    with open(output_file_path, \"wb\") as file:\n        file.write(decoded_bytes)\n\n    return output_file_path\n\n\ndef crop_image(img: np.ndarray, bbox: List[int]) -> np.ndarray:\n    \"\"\"\n    Crops the image based on the bounding box coordinates.\n\n    Parameters:\n    img (np.ndarray): The input image.\n    bbox (List[int]): Bounding box coordinates [x_min, x_max, y_min, y_max].\n\n    Returns:\n    np.ndarray: The cropped image.\n    \"\"\"\n    x_min, x_max, y_min, y_max = bbox\n    return img[y_min:y_max, x_min:x_max]\n\n\ndef calculate_bbox_center(bbox: List[int]) -> Tuple[int, int]:\n    \"\"\"\n    
Calculates the center of a bounding box.\n\n    Parameters:\n    bbox (List[int]): The bounding box coordinates [x_min, x_max, y_min, y_max].\n\n    Returns:\n    Tuple[int, int]: The center coordinates (x, y).\n    \"\"\"\n    x_min, x_max, y_min, y_max = bbox\n    x_center = (x_min + x_max) // 2\n    y_center = (y_min + y_max) // 2\n    return x_center, y_center\n\n\ndef is_bbox_in_direction(bbox_1: List[int], bbox_2: List[int], direction: str) -> bool:\n    \"\"\"\n    Check if the center of bbox_1 is in the specified direction relative to the center of bbox_2.\n\n    Args:\n        bbox_1 (List[int]): The bounding box coordinates [x_min, x_max, y_min, y_max] of the first bounding box.\n        bbox_2 (List[int]): The bounding box coordinates [x_min, x_max, y_min, y_max] of the second bounding box.\n        direction (str): The direction to check (\"left\", \"right\", \"above\", \"below\").\n\n    Returns:\n        bool: True if the center of bbox_1 is in the specified direction relative to bbox_2, False otherwise.\n    \"\"\"\n\n    center_1 = calculate_bbox_center(bbox_1)\n    center_2 = calculate_bbox_center(bbox_2)\n\n    if direction == \"left\":\n        return center_1[0] < center_2[0]\n    elif direction == \"right\":\n        return center_1[0] > center_2[0]\n    elif direction == \"above\":\n        return center_1[1] < center_2[1]\n    elif direction == \"below\":\n        return center_1[1] > center_2[1]\n    else:\n        raise ValueError(\"Invalid direction. 
Use 'left', 'right', 'above', or 'below'.\")\n\n\ndef ocr_text_matching(\n    image_path: str, text: str\n) -> Optional[Tuple[List[int], str, float]]:\n    \"\"\"\n    Performs OCR on an image to find a specific text string and returns the bounding box, matched text, and confidence level.\n\n    Parameters:\n    image_path (str): The path to the image file.\n    text (str): The text string to search for in the image.\n\n    Returns:\n    Optional[Tuple[List[int], str, float]]: The bounding box coordinates [x_min, x_max, y_min, y_max], the matched text, and the confidence level if found, otherwise None.\n    \"\"\"\n    reader = easyocr.Reader([\"en\"])\n    result = reader.readtext(image_path)\n\n    for entry in result:\n        bbox, detected_text, confidence = entry\n        if text in detected_text:\n            # Extract the axis-aligned bounding box from the four corner points\n            x_min = min(bbox[0][0], bbox[1][0], bbox[2][0], bbox[3][0])\n            x_max = max(bbox[0][0], bbox[1][0], bbox[2][0], bbox[3][0])\n            y_min = min(bbox[0][1], bbox[1][1], bbox[2][1], bbox[3][1])\n            y_max = max(bbox[0][1], bbox[1][1], bbox[2][1], bbox[3][1])\n            return (\n                [int(x_min), int(x_max), int(y_min), int(y_max)],\n                detected_text,\n                confidence,\n            )\n\n    return None\n\n\ndef convert_file_to_images(file_path: str) -> List[str]:\n    \"\"\"\n    Convert a file to JPG images using LibreOffice and return the list of image file paths.\n\n    Args:\n        file_path (str): The path to the file.\n\n    Returns:\n        List[str]: List of paths to the generated image files.\n    \"\"\"\n    output_format = \"jpg\"\n    output_dir = \"/tmp/converted_images\"\n    os.makedirs(output_dir, exist_ok=True)\n\n    # Run LibreOffice conversion command\n    result = subprocess.run(\n        [\n            \"libreoffice\",\n            \"--headless\",\n            \"--convert-to\",\n            output_format,\n            
\"--outdir\",\n            output_dir,\n            file_path,\n        ],\n        capture_output=True,\n        text=True,\n    )\n\n    # Check if the conversion was successful\n    if result.returncode != 0:\n        raise RuntimeError(f\"Conversion failed: {result.stderr}\")\n\n    # Collect the generated image file paths\n    image_files = [\n        os.path.join(output_dir, f)\n        for f in os.listdir(output_dir)\n        if f.endswith(f\".{output_format}\")\n    ]\n\n    # Verify if the files were successfully saved\n    if not image_files:\n        raise FileNotFoundError(\n            f\"No {output_format} files found in the output directory\"\n        )\n\n    # Get the basename of the original file (without extension)\n    file_basename = os.path.splitext(os.path.basename(file_path))[0]\n\n    # Check if any of the images match the basename of the original file\n    matching_images = [f for f in image_files if file_basename in os.path.basename(f)]\n    if not matching_images:\n        raise FileNotFoundError(\n            f\"No images found with basename matching the original file: {file_basename}\"\n        )\n\n    return matching_images\n\n\ndef cleanup_files(files: List[str]):\n    \"\"\"\n    Delete the list of files.\n\n    Args:\n        files (List[str]): List of paths to the files to be deleted.\n    \"\"\"\n    for file in files:\n        os.remove(file)\n\n\ndef is_valid_url(url):\n    # Regular expression to check if the string is a valid HTTP/HTTPS URL\n    url_pattern = re.compile(\n        r\"^(https?://)\"  # http:// or https://\n        r\"(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\\.)+[A-Z]{2,6}\\.?|\"  # domain\n        r\"localhost|\"  # localhost...\n        r\"\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\"  # ...or ip\n        r\"(?::\\d+)?\"  # optional port\n        r\"(?:/?|[/?]\\S+)$\",\n        re.IGNORECASE,\n    )\n    return bool(re.match(url_pattern, url))\n\n\ndef is_valid_image_data_uri(uri):\n    # Regular expression 
to check if the string is a valid Data URI for image formats\n    data_uri_pattern = re.compile(\n        r\"^data:image/(png|jpeg|gif|svg\\+xml|bmp|webp);base64,[A-Za-z0-9+/]+={0,2}$\",\n        re.IGNORECASE,\n    )\n    return bool(re.match(data_uri_pattern, uri))\n\n\ndef is_github_repo_url(url):\n    # Regular expression to check if the URL is a GitHub repository URL\n    github_repo_pattern = re.compile(\n        r\"^https?://\"  # Protocol\n        r\"github\\.com/\"  # Domain\n        r\"[^/]+/\"  # Username\n        r\"[^/]+/?$\",  # Repository name, optional trailing slash\n        re.IGNORECASE,\n    )\n    return bool(re.match(github_repo_pattern, url))\n\n\ndef get_rgb_values_outside_bbox(\n    img: np.ndarray, bbox: List[int], margin: int = 10\n) -> List[int]:\n    \"\"\"\n    Reads the pixel color values outside of the bounding box with an additional margin and finds the most frequent value.\n\n    Parameters:\n    img (np.ndarray): The input image.\n    bbox (List[int]): Bounding box coordinates [x_min, x_max, y_min, y_max].\n    margin (int): The margin to add outside the bounding box. 
Default is 10.\n\n    Returns:\n    List[int]: The most frequent pixel value outside the bounding box, converted from OpenCV's BGR order to RGB.\n    \"\"\"\n    x_min, x_max, y_min, y_max = bbox\n\n    # Ensure the coordinates with margin are within image dimensions\n    x_min_with_margin = max(0, x_min - margin)\n    x_max_with_margin = min(img.shape[1], x_max + margin)\n    y_min_with_margin = max(0, y_min - margin)\n    y_max_with_margin = min(img.shape[0], y_max + margin)\n\n    # Create a mask for the bounding box area with margin\n    mask = np.ones(img.shape[:2], dtype=bool)\n    mask[y_min_with_margin:y_max_with_margin, x_min_with_margin:x_max_with_margin] = (\n        False\n    )\n\n    # Extract the pixel values outside the bounding box with margin\n    rgb_values = img[mask]\n\n    # Find the most frequent pixel value\n    rgb_values_tuple = [tuple(rgb) for rgb in rgb_values]\n    most_common_rgb = Counter(rgb_values_tuple).most_common(1)[0][0]\n\n    # Reverse the channel order from BGR (OpenCV) to RGB\n    return list(most_common_rgb)[::-1]\n\n\ndef contains_required_strings(clipboard_content: str, required_strings: list) -> bool:\n    \"\"\"\n    Check if all required strings are present in the clipboard content.\n\n    Args:\n        clipboard_content (str): The content from the clipboard.\n        required_strings (list): A list of required strings to check.\n\n    Returns:\n        bool: True if all required strings are found in the clipboard content, False otherwise.\n    \"\"\"\n    for string in required_strings:\n        if string not in clipboard_content:\n            return False\n    return True\n\n\n@evaluator(env_name=\"ubuntu\")\ndef verify_file_content_with_clipboard(file_path: str) -> bool:\n    \"\"\"\n    Verify that the content of the file matches the clipboard content line by line.\n\n    Args:\n        file_path (str): The path to the file to verify.\n\n    Returns:\n        bool: True if the file content matches the clipboard content, False otherwise.\n    \"\"\"\n\n   
 def verify_content_with_clipboard(file_content: str) -> bool:\n        \"\"\"\n        Verify that the provided file content matches the clipboard content line by line.\n\n        Args:\n            file_content (str): The content of the file to verify.\n\n        Returns:\n            bool: True if the file content matches the clipboard content, False otherwise.\n        \"\"\"\n        clipboard_content = pyperclip.paste()\n        clipboard_lines = clipboard_content.split(\"\\n\")\n        file_lines = file_content.split(\"\\n\")\n\n        # Check if each line from the clipboard content is in the corresponding line in the file content\n        for clipboard_line, file_line in zip(clipboard_lines, file_lines):\n            if clipboard_line not in file_line:\n                return False\n\n        return True\n\n    with open(file_path, \"r\") as file:\n        file_content = file.read()\n\n    return verify_content_with_clipboard(file_content)\n\n\n@evaluator(env_name=\"ubuntu\")\ndef verify_odt_file_content_with_clipboard(file_path: str) -> bool:\n    \"\"\"\n    Verify that the content of the ODT file matches the clipboard content.\n\n    Args:\n        file_path (str): The path to the ODT file to verify.\n\n    Returns:\n        bool: True if the ODT file content matches the clipboard content, False otherwise.\n    \"\"\"\n    from odf import teletype, text\n    from odf.opendocument import load\n\n    def verify_content_with_clipboard(file_content: str) -> bool:\n        \"\"\"\n        Verify that the provided file content matches the clipboard content line by line.\n\n        Args:\n            file_content (str): The content of the file to verify.\n\n        Returns:\n            bool: True if the file content matches the clipboard content, False otherwise.\n        \"\"\"\n        clipboard_content = pyperclip.paste()\n        clipboard_lines = clipboard_content.split(\"\\n\")\n        file_lines = file_content.split(\"\\n\")\n\n        # Check if 
each line from the clipboard content is in the corresponding line in the file content\n        for clipboard_line, file_line in zip(clipboard_lines, file_lines):\n            if clipboard_line not in file_line:\n                return False\n\n        return True\n\n    textdoc = load(file_path)\n    allparas = textdoc.getElementsByType(text.P)\n    odt_content = \"\\n\".join([teletype.extractText(p) for p in allparas])\n\n    return verify_content_with_clipboard(odt_content)\n\n\n@evaluator(env_name=\"ubuntu\", local=True)\ndef verify_combined_image(\n    image_path_1: str, image_path_2: str, file_path: str, direction: str, env\n) -> bool:\n    \"\"\"\n    Check if the combined file contains both input images without overlay and in the specified direction.\n\n    Args:\n        image_path_1 (str): Path to the first image.\n        image_path_2 (str): Path to the second image.\n        file_path (str): Path to the combined file.\n        direction (str): The direction to check (\"left\", \"right\", \"above\", \"below\").\n\n    Returns:\n        bool: True if the combined file contains both input images in the specified direction without overlay, False otherwise.\n    \"\"\"\n\n    saved_image_path_1 = from_env_load_and_save_file(env, image_path_1)\n    saved_image_path_2 = from_env_load_and_save_file(env, image_path_2)\n    saved_file_path = from_env_load_and_save_file(env, file_path)\n\n    # Determine if file_path is already an image\n\n    if file_path.lower().endswith((\".jpg\", \".jpeg\", \".png\", \".bmp\", \".tiff\")):\n        combined_image_path = saved_file_path\n    else:\n        # Convert the file to images\n        combined_image_path = convert_file_to_images(saved_file_path)[0]\n\n    try:\n        # Match the first image within the combined image\n        bbox_1, _, _ = image_matcher.match_images(\n            saved_image_path_1, combined_image_path\n        )\n\n        # Match the second image within the combined image\n        bbox_2, _, _ = 
image_matcher.match_images(\n            saved_image_path_2, combined_image_path\n        )\n\n        # Check if both bounding boxes are found\n        if bbox_1 is None or bbox_2 is None:\n            return False\n\n        # Check if bbox_1 is in the specified direction relative to bbox_2\n        correct_direction = is_bbox_in_direction(bbox_1, bbox_2, direction)\n\n        return correct_direction\n    finally:\n        # Clean up intermediate files. Deduplicate first: combined_image_path\n        # equals saved_file_path when the input is already an image, and\n        # removing the same path twice would raise FileNotFoundError.\n        cleanup_files(\n            list(\n                {\n                    combined_image_path,\n                    saved_image_path_1,\n                    saved_image_path_2,\n                    saved_file_path,\n                }\n            )\n        )\n\n\n@evaluator(env_name=\"ubuntu\")\ndef is_image_2_brighter(image_path_1: str, image_path_2: str) -> bool:\n    \"\"\"\n    Check if the second image is brighter than the first image.\n\n    Args:\n        image_path_1(str): The path to the first image.\n        image_path_2(str): The path to the second image.\n\n    Returns:\n        bool: True if the second image is brighter than the first.\n    \"\"\"\n\n    def brightness(image_path: str) -> float:\n        # Load the image\n        img = cv2.imread(image_path)\n        if img is None:\n            raise FileNotFoundError(f\"Could not read image: {image_path}\")\n        if len(img.shape) == 3:\n            # Colored RGB or BGR (*do not* use HSV images with this function);\n            # compute brightness as the Euclidean norm of the color channels\n            return float(np.average(norm(img, axis=2)) / np.sqrt(3))\n        else:\n            # Grayscale\n            return float(np.average(img))\n\n    brightness_1 = brightness(image_path_1)\n    brightness_2 = brightness(image_path_2)\n\n    return brightness_2 > brightness_1\n\n\n@evaluator(env_name=\"ubuntu\")\ndef is_img_url_in_clipboard() -> bool:\n    \"\"\"\n    Check if the clipboard contains a valid URL or a Data URI that is specific to images.\n\n    Returns:\n        bool: True if a valid URL or Data URI specific to images is found in the clipboard, False 
otherwise.\n    \"\"\"\n    clipboard_content = pyperclip.paste()  # Simulate clipboard paste action\n    data_uri_pattern = re.compile(\n        r\"^data:image/(png|jpeg|gif|svg\\+xml|bmp|webp);base64,[A-Za-z0-9+/]+={0,2}$\",\n        re.IGNORECASE,\n    )\n    is_valid_image_data = bool(re.match(data_uri_pattern, clipboard_content))\n    url_pattern = re.compile(\n        r\"^(https?://)\"  # http:// or https://\n        r\"(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\\.)+[A-Z]{2,6}\\.?|\"  # domain\n        r\"localhost|\"  # localhost...\n        r\"\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\"  # ...or ip\n        r\"(?::\\d+)?\"  # optional port\n        r\"(?:/?|[/?]\\S+)$\",\n        re.IGNORECASE,\n    )\n    is_valid_url = bool(re.match(url_pattern, clipboard_content))\n    if is_valid_url or is_valid_image_data:\n        return True\n    return False\n\n\n@evaluator(env_name=\"ubuntu\")\ndef is_github_repo_url_in_clipboard(keyword: str) -> bool:\n    \"\"\"\n    Check if the clipboard contains a valid GitHub repository URL.\n\n    Returns:\n        bool: True if the clipboard content is a valid GitHub repository URL, False otherwise.\n    \"\"\"\n    clipboard_content = pyperclip.paste()  # Access the clipboard content\n    if keyword.lower() not in clipboard_content:\n        return False\n    github_repo_pattern = re.compile(\n        r\"^https?://\"  # Protocol\n        r\"github\\.com/\"  # Domain\n        r\"[^/]+/\"  # Username\n        r\"[^/]+/?$\",  # Repository name, optional trailing slash\n        re.IGNORECASE,\n    )\n    return bool(re.match(github_repo_pattern, clipboard_content))\n    # return is_github_repo_url(clipboard_content)\n\n\n@evaluator(env_name=\"ubuntu\")\ndef is_software_installed(package_name: str) -> bool:\n    try:\n        subprocess.check_call(\n            [\"dpkg\", \"-s\", package_name],\n            stdout=subprocess.DEVNULL,\n            stderr=subprocess.DEVNULL,\n        )\n        return True\n    except 
subprocess.CalledProcessError:\n        return False\n\n\n@cache\ndef get_file_url_hash(url):\n    response = requests.get(url)\n    response.raise_for_status()\n    return hashlib.sha256(response.content).hexdigest()\n\n\n@evaluator(env_name=\"ubuntu\")\ndef download_and_verify_file(url: str, file_path: str) -> bool:\n    # Check if the file was downloaded\n    if not os.path.isfile(file_path):\n        return False\n\n    # Calculate the hash of the downloaded file\n    with open(file_path, \"rb\") as f:\n        file_data = f.read()\n        downloaded_file_hash = hashlib.sha256(file_data).hexdigest()\n\n    # Get the file content directly from the URL\n    try:\n        original_file_hash = get_file_url_hash(url)\n    except requests.RequestException:\n        return False\n\n    # Compare the hashes\n    return downloaded_file_hash == original_file_hash\n\n\n@evaluator(env_name=\"ubuntu\")\ndef download_from_clipboard_and_verify_file(file_path: str) -> bool:\n    # Check if the file was downloaded\n    if not os.path.isfile(file_path):\n        return False\n\n    # Calculate the hash of the downloaded file\n    with open(file_path, \"rb\") as f:\n        file_data = f.read()\n        downloaded_file_hash = hashlib.sha256(file_data).hexdigest()\n\n    # Get the URL from the clipboard\n    content = pyperclip.paste()\n    \"\"\"\n    Problem:\n        1. There exist infinite possibilities of the downloadable format in the clipboard. 
Not sure if we need to verify the format.\n    \"\"\"\n    # Get the file content directly from the URL\n    try:\n        original_file_hash = get_file_url_hash(content)\n    except requests.RequestException:\n        return False\n\n    # Compare the hashes\n    return downloaded_file_hash == original_file_hash\n\n\n@evaluator(env_name=\"ubuntu\")\ndef check_color_scheme(assume: str) -> bool:\n    out = subprocess.check_output(\n        [\"gsettings\", \"get\", \"org.gnome.desktop.interface\", \"color-scheme\"],\n        text=True,\n    )\n    return assume in out\n\n\n@evaluator(env_name=\"ubuntu\")\ndef check_text_in_current_window_name(text: str) -> bool:\n    try:\n        out = subprocess.check_output(\n            [\"xdotool\", \"getwindowfocus\", \"getwindowname\"], text=True\n        ).strip()\n    except subprocess.CalledProcessError:\n        return False\n    return text in out\n\n\n@evaluator(env_name=\"ubuntu\")\ndef check_current_window_process(assume: str) -> bool:\n    try:\n        out = subprocess.check_output(\n            [\"xdotool\", \"getwindowfocus\", \"getwindowpid\"], text=True\n        ).strip()\n        if not out.isdigit():\n            return False\n        process = psutil.Process(int(out))\n    except (\n        psutil.NoSuchProcess,\n        psutil.AccessDenied,\n        psutil.ZombieProcess,\n        subprocess.CalledProcessError,\n    ):\n        return False\n    return assume.strip() == process.name()\n\n\n@evaluator(env_name=\"ubuntu\")\ndef check_file_exist(file_path: str) -> bool:\n    return os.path.isfile(file_path)\n\n\n@evaluator(env_name=\"ubuntu\")\ndef check_file_content(file_path: str, content: str) -> bool:\n    if not os.path.isfile(file_path):\n        return False\n    with open(file_path, \"r\") as f:\n        file_content = f.read()\n    return content in file_content\n\n\n@evaluator(env_name=\"ubuntu\")\ndef empty_evaluator() -> bool:\n    return False\n\n\n@evaluator(env_name=\"ubuntu\")\ndef 
is_process_open(process_name: str) -> bool:\n    \"\"\"\n    Check if the given process is currently running.\n\n    Args:\n        process_name(str): The process name to check.\n    \"\"\"\n    for process in psutil.process_iter([\"name\"]):\n        try:\n            if process_name.lower() in process.info[\"name\"].lower():  # type: ignore\n                return True\n        except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):\n            pass\n    return False\n\n\n@evaluator(env_name=\"ubuntu\")\ndef check_app_usage_history(app_name: str) -> bool:\n    \"\"\"\n    Check if the given application has been in the usage history.\n    Args:\n        app_name(str): The name of the application to check.\n    Returns:\n        bool: True if the app was recently used, False otherwise.\n    \"\"\"\n    for process in psutil.process_iter([\"name\", \"create_time\"]):\n        try:\n            if app_name.lower() in process.info[\"name\"].lower():\n                # Assuming 'recently used' implies a running process was started within the last hour\n                if time.time() - process.info[\"create_time\"] < 3600:\n                    return True\n        except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):\n            continue\n    return False\n\n\n@evaluator(env_name=\"ubuntu\")\ndef check_process_closed(app_name: str) -> bool:\n    \"\"\"\n    Verify that the specified process is not running.\n    Args:\n        app_name(str): The application name to check for its absence.\n    Returns:\n        bool: True if the app is not running, False otherwise.\n    \"\"\"\n    return not any(\n        app_name.lower() in proc.info[\"name\"].lower()\n        for proc in psutil.process_iter([\"name\"])\n        if proc.is_running()\n    )\n\n\n@evaluator(env_name=\"ubuntu\")\ndef verify_background(photo_path: str) -> bool:\n    \"\"\"\n    Verify that the specified photo is currently set as the desktop background.\n\n    Args:\n  
      photo_path (str): The path to the photo file.\n\n    Returns:\n        bool: True if the photo is the current background, False otherwise.\n    \"\"\"\n    out = subprocess.check_output(\n        [\"gsettings\", \"get\", \"org.gnome.desktop.background\", \"picture-uri\"],\n        universal_newlines=True,\n    )\n    current_background = (\n        out.strip().split(\"'\")[1].split(\"file:/\")[1]\n    )  # Extract the path\n\n    # Compute hashes to compare files\n    if os.path.exists(photo_path) and os.path.exists(current_background):\n        with open(photo_path, \"rb\") as f:\n            original_hash = hashlib.sha256(f.read()).hexdigest()\n        with open(current_background, \"rb\") as f:\n            current_hash = hashlib.sha256(f.read()).hexdigest()\n\n        return original_hash == current_hash\n\n    return False\n\n\n@evaluator(env_name=\"ubuntu\")\ndef is_torch_matmul_example_copied_correctly() -> bool:\n    \"\"\"\n    Verify if the clipboard contains the correct torch.matmul example snippets from PyTorch 1.13 documentation.\n    \"\"\"\n\n    def contains_required_strings(\n        clipboard_content: str, required_strings: list\n    ) -> bool:\n        \"\"\"\n        Check if all required strings are present in the clipboard content.\n\n        Args:\n            clipboard_content (str): The content from the clipboard.\n            required_strings (list): A list of required strings to check.\n\n        Returns:\n            bool: True if all required strings are found in the clipboard content, False otherwise.\n        \"\"\"\n        for string in required_strings:\n            if string not in clipboard_content:\n                return False\n        return True\n\n    required_strings = [\n        \"tensor1 = torch.randn\",\n        \"tensor2 = torch.randn\",\n        \"torch.matmul(tensor1, tensor2).size()\",\n    ]\n    clipboard_content = pyperclip.paste().strip()\n    if not clipboard_content:\n        return False\n\n    return 
contains_required_strings(clipboard_content, required_strings)\n\n\n@evaluator(env_name=\"ubuntu\")\ndef check_directory_exists(dir_path: str) -> bool:\n    \"\"\"Check if the specified directory exists.\"\"\"\n    return os.path.isdir(dir_path)\n\n\n@evaluator(env_name=\"ubuntu\")\ndef verify_files_copied(source_dir: str, target_dir: str, file_extension: str) -> bool:\n    \"\"\"Verify that files were copied correctly.\"\"\"\n    source_files = {\n        file for file in os.listdir(source_dir) if file.endswith(f\".{file_extension}\")\n    }\n    target_files = {\n        file for file in os.listdir(target_dir) if file.endswith(f\".{file_extension}\")\n    }\n    return source_files == target_files\n\n\n@evaluator(env_name=\"ubuntu\", local=True)\ndef check_contain_input_text_list(texts: list[str], env) -> bool:\n    \"\"\"\n    Check if all provided search terms were entered in the browser.\n\n    Args:\n        texts (list[str]): A list of strings, each representing a search term that needs to be verified.\n        env: The current testing environment, used to simulate browser interactions.\n\n    Returns:\n        bool: True if all search terms are found in the written text, False otherwise.\n    \"\"\"\n    if env.trajectory:\n        inputs = [\n            params[\"text\"].lower()\n            for action_name, params, _ in env.trajectory\n            if action_name == \"write_text\"\n        ]\n        return all(\n            any(term.lower() in input_text for input_text in inputs) for term in texts\n        )\n    return False\n\n\n@evaluator(env_name=\"ubuntu\")\ndef is_google_maps_url_in_clipboard() -> bool:\n    \"\"\"\n    Check if the clipboard contains a valid shortened Google Maps URL.\n    \"\"\"\n    clipboard_content = pyperclip.paste()\n    maps_url_pattern = re.compile(\n        r\"^https://maps\\.app\\.goo\\.gl/[A-Za-z0-9]+$\",\n        re.IGNORECASE,\n    )\n    return bool(re.match(maps_url_pattern, 
clipboard_content))\n\n\n@evaluator(env_name=\"ubuntu\", local=True)\ndef check_contain_input_text(text: str, env) -> bool:\n    \"\"\"\n    Check if the input text is contained in the written text action in a case-insensitive manner.\n\n    Args:\n        text (str): The text to check for.\n        env: The current testing environment, used to access the trajectory.\n\n    Returns:\n        bool: True if the input text is found in the written text action, False otherwise.\n    \"\"\"\n    if env.trajectory:\n        inputs = [\n            params[\"text\"].lower()\n            for action_name, params, _ in env.trajectory\n            if action_name == \"write_text\"\n        ]\n        return any(text.lower() in input_text for input_text in inputs)\n    return False\n\n\n@evaluator(env_name=\"ubuntu\")\ndef verify_country_data_in_ods(country: str, file_path: str) -> bool:\n    from bs4 import BeautifulSoup\n    from pyexcel_ods import get_data\n\n    def extract_population(text):\n        # Use regex to extract the first sequence of numbers which possibly contains commas\n        if text:\n            match = re.search(r\"\\d{1,3}(?:,\\d{3})*(?=\\[|$)\", text)\n            if match:\n                return match.group(0).replace(\",\", \"\")  # Remove commas\n        return \"0\"\n\n    def normalize_population(text):\n        # Ensure the input is treated as a string, whether it's originally an int or str\n        text = str(text)\n        # Normalize the population string by removing non-digit characters\n        return \"\".join(filter(str.isdigit, text))\n\n    def fetch_country_data(country):\n        country_norm = country.replace(\" \", \"_\")  # Replace spaces with underscores\n        url = f\"https://en.wikipedia.org/wiki/{country_norm}\"\n        response = requests.get(url)\n        soup = BeautifulSoup(response.content, \"html.parser\")\n\n        infobox = soup.find(\"table\", {\"class\": \"infobox\"})\n        capital_city = None\n        population 
= None\n\n        if infobox:\n            for row in infobox.find_all(\"tr\"):\n                header = row.find(\"th\")\n                if header:\n                    header_text = header.text.strip()\n                    if \"Capital\" in header_text:\n                        capital_city = row.find(\"td\").text.strip()\n                        capital_city = \" \".join(\n                            capital_city.split()\n                        )  # Normalize and clean up text\n                    if \"Population\" in header_text:\n                        population_text = None  # Avoid NameError if no data cell is found\n                        if row.find(\"td\"):\n                            population_text = row.find(\"td\").text.strip()\n                        else:\n                            next_row = row.find_next_sibling(\"tr\")\n                            if next_row and next_row.find(\"td\"):\n                                population_text = next_row.find(\"td\").text.strip()\n                        population = extract_population(population_text)\n\n        return capital_city, population\n\n    capital_city, population = fetch_country_data(country)\n\n    if not capital_city or not population:\n        return False\n\n    # Load data from ODS file\n    data = get_data(file_path)\n    sheet = data[list(data.keys())[0]]  # Assume data is in the first sheet\n\n    # Search for country and verify data\n    for row in sheet:\n        if len(row) >= 3 and str(row[0]).lower() == country.lower():\n            recorded_capital_city = row[1]\n            recorded_population = normalize_population(row[2])\n            # Check if the capital city and population in the sheet match Wikipedia\n            if (\n                recorded_capital_city in capital_city\n                and recorded_population == population\n            ):\n                return True\n            else:\n                return False\n\n    # The country was not found in the sheet, so the data cannot be verified\n    return False\n\n\nubuntu_subtasks = [\n    SubTask(\n        id=\"0f589bf9-9b26-4581-8b78-2961b115ab49\",\n        description='Open \"{file_path}\" using 
vim in a terminal, write \"{content}\", then save and exit vim.',\n        attribute_dict={\"file_path\": \"file_path\", \"content\": \"message\"},\n        output_type=\"file_path\",\n        output_generator=lambda file_path, content: file_path,\n        evaluator_generator=lambda file_path, content: nx.path_graph(\n            [\n                check_current_window_process(\"gnome-terminal-server\"),\n                is_process_open(\"vim\"),\n                ~is_process_open(\"vim\"),\n                check_file_content(file_path, content),\n            ],\n            create_using=nx.DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"5b527839-0e58-426d-bab6-7160200b0d24\",\n        description='Get the content of \"{file_path}\" by printing it to the command line interface through a terminal',\n        attribute_dict={\"file_path\": \"file_path\"},\n        output_type=\"message\",\n        output_generator=\"manual\",\n        evaluator_generator=lambda file_path: nx.path_graph(\n            [\n                check_current_window_process(\"gnome-terminal-server\"),\n                check_contain_input_text(\"cat \" + file_path),\n            ],\n            create_using=nx.DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"1c3bedc3-ea5a-453c-a15b-223d72ab756d\",\n        description='Submit content \"{content}\"',\n        attribute_dict={\"content\": \"message\"},\n        output_type=\"None\",\n        output_generator=\"manual\",\n        evaluator_generator=lambda content: nx.path_graph(\n            [\n                check_submit(content),\n            ],\n            create_using=nx.DiGraph,\n        ),\n        extra_action=[submit],\n    ),\n    SubTask(\n        id=\"a313ea4d-e501-4971-b4fe-db2aad19eac1\",\n        description='Download a file from \"{url}\" to \"{file_path}\".',\n        attribute_dict={\"url\": \"url\", \"file_path\": \"file_path\"},\n        output_type=\"file_path\",\n        output_generator=lambda url, file_path: file_path,\n        evaluator_generator=lambda url, file_path: nx.path_graph(\n            [\n                download_and_verify_file(url, file_path),\n            ],\n            create_using=nx.DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"a313ea4d-e501-4971-b4fe-db2aad19acsd\",\n        description='Download a file from the URL stored in the clipboard to \"{file_path}\".',\n        attribute_dict={\"file_path\": \"file_path\"},\n        output_type=\"file_path\",\n        output_generator=lambda file_path: file_path,\n        evaluator_generator=lambda file_path: nx.path_graph(\n            [\n                download_from_clipboard_and_verify_file(file_path),\n            ],\n            create_using=nx.DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"017102b6-d2c3-466b-96f7-37c8bcddc41a\",\n        description='Use Firefox to search for an image using the keyword \"{keyword}\" and copy the URL of the image to the clipboard.',\n        attribute_dict={\"keyword\": \"keyword\"},\n        output_type=\"None\",\n        evaluator_generator=lambda keyword: path_graph(\n            [\n                check_text_in_current_window_name(\"Mozilla Firefox\"),\n                check_contain_input_text(keyword),\n                is_img_url_in_clipboard(),\n            ],\n            create_using=DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"bcd03c9f-62c9-4001-8d86-78358c59ce22\",\n        description='Use Firefox to find a code repository about \"{keyword}\" on GitHub and copy the URL of the repository to the clipboard.',\n        attribute_dict={\"keyword\": \"keyword\"},\n        output_type=\"None\",\n        evaluator_generator=lambda keyword: path_graph(\n            [\n                check_text_in_current_window_name(\"GitHub — Mozilla Firefox\"),\n                check_contain_input_text(keyword),\n                is_github_repo_url_in_clipboard(keyword),\n            ],\n            create_using=DiGraph,\n        
),\n    ),\n    SubTask(\n        id=\"a207ef38-b3b2-4c6c-a1e3-75c38162f5ba\",\n        description='Set \"{photo_path}\" as the screen background of the system',\n        attribute_dict={\"photo_path\": \"photo_path\"},\n        output_type=\"None\",\n        evaluator_generator=lambda photo_path: path_graph(\n            [verify_background(photo_path)],\n            create_using=DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"217ababc-ccc7-4b9f-af07-c239d92848fe\",\n        description='Create a new directory \"{target_dir}\" and copy all files with the specified \"{file_extension}\" extension from \"{source_dir}\" to the directory \"{target_dir}\".',\n        attribute_dict={\n            \"file_extension\": \"file_extension\",\n            \"source_dir\": \"dir_path\",\n            \"target_dir\": \"dir_path\",\n        },\n        output_type=\"message\",\n        evaluator_generator=lambda file_extension,\n        source_dir,\n        target_dir: nx.path_graph(\n            [\n                check_directory_exists(target_dir),\n                verify_files_copied(source_dir, target_dir, file_extension),\n            ],\n            create_using=nx.DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"2b189dc2-c77f-4fa3-8432-ba4355cc294c\",\n        description='Use Firefox to find a \"{place_type}\" around \"{place_name}\" on Google Maps and copy the Google Maps sharing URL of that \"{place_type}\" to the clipboard',\n        attribute_dict={\"place_type\": \"place_type\", \"place_name\": \"place_name\"},\n        output_type=\"None\",\n        evaluator_generator=lambda place_type, place_name: path_graph(\n            [\n                # check_current_window_process(\"firefox\"),\n                check_text_in_current_window_name(\"Google Maps — Mozilla Firefox\"),\n                check_contain_input_text_list([place_name, place_type]),\n                is_google_maps_url_in_clipboard(),\n            ],\n            create_using=DiGraph,\n 
       ),\n    ),\n    SubTask(\n        id=\"cc1adae7-bef9-4c8a-865d-00d44486dd69\",\n        description='Use GIMP (GNU Image Manipulation Program) to adjust the brightness of the image from \"{image_path_before_edit}\" to a higher value (brighter) and save it to \"{image_path_after_edit}\".',\n        attribute_dict={\n            \"image_path_before_edit\": \"photo_path\",\n            \"image_path_after_edit\": \"photo_path\",\n        },\n        output_type=\"photo_path\",\n        evaluator_generator=lambda image_path_before_edit,\n        image_path_after_edit: nx.path_graph(\n            [\n                check_text_in_current_window_name(\"GNU Image Manipulation Program\"),\n                check_file_exist(image_path_after_edit),\n                is_image_2_brighter(image_path_before_edit, image_path_after_edit),\n            ],\n            create_using=nx.DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"434402f3-647a-4a9a-9d8f-10f5bb6c7cf0\",\n        description='Use LibreOffice Impress to adjust the brightness of the image from \"{image_path_before_edit}\" to a lower value (darker) and save it to \"{image_path_after_edit}\".',\n        attribute_dict={\n            \"image_path_before_edit\": \"photo_path\",\n            \"image_path_after_edit\": \"photo_path\",\n        },\n        output_type=\"photo_path\",\n        evaluator_generator=lambda image_path_before_edit,\n        image_path_after_edit: nx.path_graph(\n            [\n                check_text_in_current_window_name(\"LibreOffice Impress\"),\n                check_file_exist(image_path_after_edit),\n                ~is_image_2_brighter(image_path_before_edit, image_path_after_edit),\n            ],\n            create_using=nx.DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"4cf246ea-0a7f-43da-84b6-61d74a2699af\",\n        description='Combine two images from Image 1 \"{image_path_1}\" and Image 2 \"{image_path_2}\" using GIMP (GNU Image Manipulation Program) and save 
the resulting image to \"{output_path}\". Image 1 should be placed on the left side of Image 2.',\n        attribute_dict={\n            \"image_path_1\": \"photo_path_1\",\n            \"image_path_2\": \"photo_path_2\",\n            \"output_path\": \"photo_path_ouput\",\n        },\n        output_type=\"photo_path\",\n        evaluator_generator=lambda image_path_1,\n        image_path_2,\n        output_path: nx.path_graph(\n            [\n                check_text_in_current_window_name(\"GNU Image Manipulation Program\"),\n                check_file_exist(output_path),\n                verify_combined_image(image_path_1, image_path_2, output_path, \"left\"),\n            ],\n            create_using=nx.DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"0111384f-38ca-41a2-9504-cb1c55002b3c\",\n        description='Combine two images from Image 1 \"{image_path_1}\" and Image 2 \"{image_path_2}\" using LibreOffice Writer and save the resulting ODT file to \"{output_path}\". Image 1 should be placed above Image 2.',\n        attribute_dict={\n            \"image_path_1\": \"photo_path_1\",\n            \"image_path_2\": \"photo_path_2\",\n            \"output_path\": \"file_path\",\n        },\n        output_type=\"file_path\",\n        evaluator_generator=lambda image_path_1,\n        image_path_2,\n        output_path: nx.path_graph(\n            [\n                check_text_in_current_window_name(\"LibreOffice Writer\"),\n                check_file_exist(output_path),\n                verify_combined_image(image_path_1, image_path_2, output_path, \"above\"),\n            ],\n            create_using=nx.DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"467f17a6-c42f-4eda-996f-a53385eb3efd\",\n        description='Combine two images from Image 1 \"{image_path_1}\" and Image 2 \"{image_path_2}\" using LibreOffice Impress and save the resulting file in PDF format to \"{output_path}\". 
Image 1 should be placed on the right side of Image 2.',\n        attribute_dict={\n            \"image_path_1\": \"photo_path_1\",\n            \"image_path_2\": \"photo_path_2\",\n            \"output_path\": \"file_path\",\n        },\n        output_type=\"file_path\",\n        evaluator_generator=lambda image_path_1,\n        image_path_2,\n        output_path: nx.path_graph(\n            [\n                check_text_in_current_window_name(\"LibreOffice Impress\"),\n                check_file_exist(output_path),\n                verify_combined_image(image_path_1, image_path_2, output_path, \"right\"),\n            ],\n            create_using=nx.DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"49b614c5-c4bb-4c20-aab8-ab9dcc7de1b5\",\n        description=\"Find the example of torch.matmul provided by the official PyTorch 1.13 documentation using Firefox and copy all the lines of code in the example to the clipboard.\",\n        attribute_dict={},\n        output_type=\"None\",\n        evaluator_generator=lambda: nx.path_graph(\n            [\n                check_text_in_current_window_name(\n                    \"torch.matmul — PyTorch 1.13 documentation — Mozilla Firefox\"\n                ),\n                is_torch_matmul_example_copied_correctly(),\n            ],\n            create_using=DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"76de4bdb-c980-4b3a-9bd3-c87db467dffe\",\n        description='Paste clipboard content into LibreOffice Writer and save it as an ODT file at \"{file_path}\".',\n        attribute_dict={\"file_path\": \"file_path\"},\n        output_type=\"file_path\",\n        evaluator_generator=lambda file_path: path_graph(\n            [\n                check_text_in_current_window_name(\"LibreOffice Writer\"),\n                check_file_exist(file_path),\n                verify_odt_file_content_with_clipboard(file_path),\n            ],\n            create_using=DiGraph,\n        ),\n    ),\n    SubTask(\n     
   id=\"8491e674-596b-452b-9e0e-58a44d90f947\",\n        description='Paste clipboard content into Visual Studio Code (VS Code) and save it as a file at \"{file_path}\".',\n        attribute_dict={\"file_path\": \"file_path\"},\n        output_type=\"file_path\",\n        evaluator_generator=lambda file_path: path_graph(\n            [\n                check_text_in_current_window_name(\"Visual Studio Code\"),\n                check_file_exist(file_path),\n                verify_file_content_with_clipboard(file_path),\n            ],\n            create_using=DiGraph,\n        ),\n    ),\n    SubTask(\n        id=\"1cd6519a-9ee0-442b-ba5a-9238aeb00ff6\",\n        description='Use Firefox to search for the country \"{country}\" on Wikipedia, extract the capital city and population, and save this information in an ODS file at \"{file_path}\" with LibreOffice Calc. The first column will save the country name, the second will save the capital city name, and the third will save the population. No header is needed in the ODS file.',\n        attribute_dict={\"country\": \"country\", \"file_path\": \"file_path\"},\n        output_type=\"file_path\",\n        evaluator_generator=lambda country, file_path: nx.path_graph(\n            [\n                check_text_in_current_window_name(\"Wikipedia — Mozilla Firefox\"),\n                check_text_in_current_window_name(\"LibreOffice Calc\"),\n                check_file_exist(file_path),\n                verify_country_data_in_ods(country, file_path),\n            ],\n            create_using=nx.DiGraph,\n        ),\n    ),\n]\n\n\nif __name__ == \"__main__\":\n    generator = TaskGenerator(attribute_pool={})\n"
  },
  {
    "path": "crab-benchmark-v0/main.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport argparse\nimport logging\nimport warnings\nfrom pathlib import Path\nfrom typing import Literal\n\nfrom crab import (\n    BenchmarkConfig,\n    Experiment,\n    MessageType,\n    TaskGenerator,\n    create_benchmark,\n)\nfrom crab.actions.crab_actions import complete, wait\nfrom crab.actions.visual_prompt_actions import (\n    get_elements_prompt,\n    groundingdino_easyocr,\n)\nfrom crab.agents.backend_models import BackendModelConfig\nfrom crab.agents.policies import (\n    MultiAgentByEnvPolicy,\n    MultiAgentByFuncPolicy,\n    SingleAgentPolicy,\n)\nfrom crab.core.agent_policy import AgentPolicy\nfrom crab.core.benchmark import Benchmark\n\nfrom .android_env import ANDROID_ENV\nfrom .dataset.android_subtasks import android_subtasks\nfrom .dataset.handmade_tasks import handmade_tasks\nfrom .dataset.ubuntu_subtasks import ubuntu_subtasks\nfrom .ubuntu_env import UBUNTU_ENV\n\nwarnings.filterwarnings(\"ignore\")\n\n\nclass CrabBenchmarkV0(Experiment):\n    def __init__(\n        self,\n        benchmark: Benchmark,\n        task_id: str,\n        agent_policy: AgentPolicy | Literal[\"human\"],\n        log_dir: Path | None = None,\n    ) -> None:\n        super().__init__(benchmark, task_id, agent_policy, log_dir)\n\n    def 
get_prompt(self):\n        observation, ob_prompt = self.benchmark.observe_with_prompt()\n\n        # construct prompt\n        result_prompt = {}\n        for env in ob_prompt:\n            if env == \"root\":\n                continue\n            screenshot = observation[env][\"screenshot\"]\n            marked_screenshot, _ = ob_prompt[env][\"screenshot\"]\n            result_prompt[env] = [\n                (f\"Here is the current screenshot of {env}:\", MessageType.TEXT),\n                (screenshot, MessageType.IMAGE_JPG_BASE64),\n                (\n                    f\"Here is the screenshot with element labels of {env}:\",\n                    MessageType.TEXT,\n                ),\n                (marked_screenshot, MessageType.IMAGE_JPG_BASE64),\n            ]\n        return result_prompt\n\n\ndef get_benchmark(env: str, ubuntu_url: str):\n    ubuntu_env = UBUNTU_ENV.model_copy()\n    ubuntu_env.remote_url = ubuntu_url\n    ubuntu_tool = {\n        \"screenshot\": groundingdino_easyocr(font_size=16) >> get_elements_prompt\n    }\n    android_tool = {\n        \"screenshot\": groundingdino_easyocr(font_size=40) >> get_elements_prompt\n    }\n\n    if env == \"ubuntu\":\n        prompting_tools = {\"ubuntu\": ubuntu_tool}\n        benchmark_config = BenchmarkConfig(\n            name=\"ubuntu_benchmark\",\n            tasks=[],\n            environments=[ubuntu_env],\n            prompting_tools=prompting_tools,\n            root_action_space=[complete, wait],\n            multienv=True,\n        )\n    elif env == \"android\":\n        prompting_tools = {\"android\": android_tool}\n        benchmark_config = BenchmarkConfig(\n            name=\"android_benchmark\",\n            tasks=[],\n            environments=[ANDROID_ENV],\n            prompting_tools=prompting_tools,\n            root_action_space=[complete, wait],\n            multienv=True,\n        )\n    elif env == \"cross\":\n        prompting_tools = {\n            \"android\": 
android_tool,\n            \"ubuntu\": ubuntu_tool,\n        }\n        benchmark_config = BenchmarkConfig(\n            name=\"ubuntu_android_benchmark\",\n            tasks=[],\n            environments=[ubuntu_env, ANDROID_ENV],\n            prompting_tools=prompting_tools,\n            root_action_space=[complete, wait],\n            multienv=True,\n        )\n    else:\n        raise ValueError(f\"Unsupported environment: {env}\")\n\n    # Load from json config files by combining sub-tasks\n    generator = TaskGenerator(subtasks=android_subtasks + ubuntu_subtasks)\n    dir_path = (Path(__file__).parent / \"dataset\").resolve()\n    tasks = []\n    for task_json_file in dir_path.rglob(\"*.json\"):\n        task = generator.get_task_from_file(task_json_file)\n        tasks.append(task)\n    benchmark_config.tasks.extend(tasks)\n\n    # Load from handmade tasks\n    benchmark_config.tasks.extend(handmade_tasks)\n\n    benchmark_config.step_limit = 20\n    return create_benchmark(benchmark_config)\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser(\n        description=\"Script for running the benchmark with an agent.\"\n    )\n    parser.add_argument(\n        \"--model\",\n        type=str,\n        help=\"gpt4o, gpt4turbo, gemini, claude, pixtral, gpt4o-wofc, llava-ov72b, or human\",\n        default=\"gpt4o\",\n    )\n    parser.add_argument(\n        \"--policy\",\n        type=str,\n        help=\"single, multi-by-func, or multi-by-env\",\n        default=\"single\",\n    )\n    parser.add_argument(\n        \"--ubuntu-url\",\n        type=str,\n        help=\"remote URL of the Ubuntu environment\",\n        default=\"http://127.0.0.1:8000\",\n    )\n    parser.add_argument(\n        \"--env\",\n        type=str,\n        help=\"ubuntu, android, or cross\",\n        default=\"cross\",\n    )\n    parser.add_argument(\"--task-id\", type=str, help=\"task id\")\n    parser.add_argument(\n        \"--model-base-url\",\n        type=str,\n        help=\"URL of the model API\",\n        
default=\"http://127.0.0.1:8000/v1\",\n    )\n    parser.add_argument(\n        \"--model-api-key\",\n        type=str,\n        help=\"API key of the model API\",\n        default=\"EMPTY\",\n    )\n    parser.add_argument(\n        \"--loglevel\",\n        type=str,\n        help=\"logger level: debug, info, warning, or error\",\n        default=\"warning\",\n    )\n    parser.add_argument(\n        \"--history-messages-len\",\n        type=int,\n        help=\"The number of rounds of chat history to provide to the model\",\n        default=2,\n    )\n    args = parser.parse_args()\n    loglevel = args.loglevel\n    numeric_level = getattr(logging, loglevel.upper(), None)\n    if not isinstance(numeric_level, int):\n        raise ValueError(f\"Invalid log level: {loglevel}\")\n    logging.basicConfig(level=numeric_level)\n\n    benchmark = get_benchmark(args.env, args.ubuntu_url)\n\n    if args.model == \"human\":\n        experiment = CrabBenchmarkV0(\n            benchmark=benchmark,\n            task_id=args.task_id,\n            agent_policy=\"human\",\n        )\n        experiment.start_benchmark()\n        exit()\n\n    if args.model == \"gpt4o\":\n        model = BackendModelConfig(\n            model_class=\"openai\",\n            model_name=\"gpt-4o\",\n            history_messages_len=args.history_messages_len,\n        )\n    elif args.model == \"gpt4turbo\":\n        model = BackendModelConfig(\n            model_class=\"openai\",\n            model_name=\"gpt-4-turbo\",\n            history_messages_len=args.history_messages_len,\n        )\n    elif args.model == \"gemini\":\n        model = BackendModelConfig(\n            model_class=\"gemini\",\n            model_name=\"gemini-1.5-pro-latest\",\n            history_messages_len=args.history_messages_len,\n        )\n    elif args.model == \"claude\":\n        model = BackendModelConfig(\n            model_class=\"claude\",\n            model_name=\"claude-3-opus-20240229\",\n            
history_messages_len=args.history_messages_len,\n        )\n    elif args.model == \"pixtral\":\n        model = BackendModelConfig(\n            model_class=\"openai\",\n            model_name=\"mistralai/Pixtral-12B-2409\",\n            json_structre_output=True,\n            history_messages_len=args.history_messages_len,\n            base_url=args.model_base_url,\n            api_key=args.model_api_key,\n        )\n    elif args.model == \"gpt4o-wofc\":\n        model = BackendModelConfig(\n            model_class=\"openai\",\n            model_name=\"gpt-4o\",\n            json_structre_output=True,\n            history_messages_len=args.history_messages_len,\n        )\n    elif args.model == \"llava-ov72b\":\n        model = BackendModelConfig(\n            model_class=\"sglang\",\n            model_name=\"lmms-lab/llava-onevision-qwen2-72b-ov-chat\",\n            json_structre_output=True,\n            history_messages_len=args.history_messages_len,\n            base_url=args.model_base_url,\n            api_key=args.model_api_key,\n        )\n    else:\n        print(\"Unsupported model: \", args.model)\n        exit()\n\n    if args.policy == \"single\":\n        agent_policy = SingleAgentPolicy(model_backend=model)\n    elif args.policy == \"multi-by-func\":\n        agent_policy = MultiAgentByFuncPolicy(\n            main_agent_model_backend=model, tool_agent_model_backend=model\n        )\n    elif args.policy == \"multi-by-env\":\n        agent_policy = MultiAgentByEnvPolicy(\n            main_agent_model_backend=model, env_agent_model_backend=model\n        )\n    else:\n        print(\"Unsupported policy: \", args.policy)\n        exit()\n\n    log_dir = (Path(__file__).parent / \"tianqi_logs\").resolve()\n    experiment = CrabBenchmarkV0(\n        benchmark=benchmark,\n        task_id=args.task_id,\n        agent_policy=agent_policy,\n        log_dir=log_dir,\n    )\n    experiment.start_benchmark()\n"
  },
  {
    "path": "crab-benchmark-v0/scripts/ubuntu_env_init.sh",
    "content": "#!/bin/bash\n\n# Disable screen autolock\ngsettings set org.gnome.desktop.screensaver lock-enabled false\ngsettings set org.gnome.desktop.session idle-delay 0\n\n# Disable automatic updates\nsudo bash -c 'cat <<EOF > /etc/apt/apt.conf.d/20auto-upgrades\nAPT::Periodic::Update-Package-Lists \"0\";\nAPT::Periodic::Unattended-Upgrade \"0\";\nEOF'\n\n# Allow sudo without password for the current user\nCURRENT_USER=$(whoami)\nsudo bash -c \"echo \\\"$CURRENT_USER ALL=(ALL) NOPASSWD: ALL\\\" | tee /etc/sudoers.d/$CURRENT_USER\"\n\n# Install required packages\nsudo apt update\nsudo apt install -y openssh-server git vim python3-pip xdotool python3-tk python3.10-venv\n\n# Install pipx\npython3 -m pip install pipx\npython3 -m pipx ensurepath\n\n# Modify .bashrc to alias python to python3 for the current user\necho 'alias python=python3' >> /home/$CURRENT_USER/.bashrc\n\n# Reload .bashrc for the current user\nsource /home/$CURRENT_USER/.bashrc\n\n# Install poetry using pipx\npipx install poetry\n\n# Pull CRAB repo\nif [ ! 
-d \"/home/$CURRENT_USER/crab\" ]; then\n    git clone https://github.com/camel-ai/crab.git /home/$CURRENT_USER/crab/\nfi\n\n# Create poetry environment\ncd /home/$CURRENT_USER/crab\npoetry install -E server\n\n# Change to X11 from Wayland\nsudo sed -i 's/#WaylandEnable=false/WaylandEnable=false/g' /etc/gdm3/custom.conf\ntouch /home/$CURRENT_USER/.Xauthority\n\n# Create the crab.service file with dynamic user and group\nsudo bash -c \"cat <<EOF > /etc/systemd/system/crab.service\n[Unit]\nDescription=My Python Script Service\nAfter=network.target\n\n[Service]\nWorkingDirectory=/home/$CURRENT_USER/crab/\nExecStart=/home/$CURRENT_USER/.local/bin/poetry run python -m crab.server.main --HOST 0.0.0.0\nRestart=always\nUser=$CURRENT_USER\nGroup=$CURRENT_USER\n\n[Install]\nWantedBy=multi-user.target\nEOF\"\n\n# Reload systemd to recognize the new service\nsudo systemctl daemon-reload\n\n# Enable and start the crab service\nsudo systemctl enable crab.service\n\n# Reboot the system to apply changes for X11\necho \"System will reboot in 10 seconds to apply changes...\"\nsleep 10\nsudo reboot"
  },
  {
    "path": "crab-benchmark-v0/ubuntu_env.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom crab.actions.desktop_actions import (\n    click,\n    double_click,\n    key_press,\n    press_hotkey,\n    right_click,\n    screenshot,\n    search_application,\n    write_text,\n)\nfrom crab.core import EnvironmentConfig\n\nUBUNTU_ENV = EnvironmentConfig(\n    name=\"ubuntu\",\n    action_space=[\n        click,\n        key_press,\n        write_text,\n        press_hotkey,\n        search_application,\n        right_click,\n        double_click,\n    ],\n    observation_space=[screenshot],\n    description=\"\"\"An Ubuntu 22.04 Linux desktop operating system. The interface \\\ndisplays a current screenshot at each step and primarily supports interaction \\\nvia mouse and keyboard. You must use searching functionality to open any \\\napplication in the system. This device includes system-related applications \\\nincluding Terminal, Files, Text Editor, Vim, and Settings. It also features \\\nFirefox as the web browser, and the LibreOffice suite—Writer, Calc, and \\\nImpress. For communication, Slack is available. The Google account is \\\npre-logged in on Firefox, synchronized with the same account used in the \\\nAndroid environment.\"\"\",\n)\n"
  },
  {
    "path": "docs/Makefile",
    "content": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line, and also\n# from the environment for the first two.\nSPHINXOPTS    ?=\nSPHINXBUILD   ?= sphinx-build\nSOURCEDIR     = .\nBUILDDIR      = _build\n\n# Put it first so that \"make\" without argument is like \"make help\".\nhelp:\n\t@$(SPHINXBUILD) -M help \"$(SOURCEDIR)\" \"$(BUILDDIR)\" $(SPHINXOPTS) $(O)\n\n.PHONY: help Makefile\n\n# Catch-all target: route all unknown targets to Sphinx using the new\n# \"make mode\" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).\n%: Makefile\n\t@$(SPHINXBUILD) -M $@ \"$(SOURCEDIR)\" \"$(BUILDDIR)\" $(SPHINXOPTS) $(O)\n"
  },
  {
    "path": "docs/conf.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Configuration file for the Sphinx documentation builder.\n#\n# For the full list of built-in configuration values, see the documentation:\n# https://www.sphinx-doc.org/en/master/usage/configuration.html\n\n# -- Path setup --------------------------------------------------------------\n\n# If extensions (or modules to document with autodoc) are in another directory,\n# add these directories to sys.path here. 
If the directory is relative to the\n# documentation root, use os.path.abspath to make it absolute, like shown here.\n#\nimport os\nimport sys\nsys.path.insert(0, os.path.abspath('..'))\n\n# -- Project information -----------------------------------------------------\n# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information\n\nproject = 'CRAB'\ncopyright = '2024, CAMEL-AI.org'\nauthor = 'CAMEL-AI.org'\nversion = '0.1'\nrelease = '0.1.2'\n\n# -- General configuration ---------------------------------------------------\n# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration\n\nextensions = [\n    'sphinx.ext.autodoc',\n    'sphinx.ext.viewcode',\n    'sphinx.ext.napoleon',\n    'myst_parser',\n]\n\ntemplates_path = ['_templates']\nexclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']\n\n\n\n# -- Options for HTML output -------------------------------------------------\n# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output\n\nhtml_theme = 'sphinx_book_theme'\nhtml_favicon = '_static/favicon.png'\nhtml_static_path = ['_static']\nhtml_logo = \"_static/CRAB_logo1.png\"\nhtml_title = \"CRAB Documentation\"\nhtml_theme_options = {\n    \"repository_url\": \"https://github.com/camel-ai/crab\",\n    \"use_repository_button\": True,\n}\n"
  },
  {
    "path": "docs/crab.benchmarks.rst",
    "content": "crab.benchmarks package\n=======================\n\nSubmodules\n----------\n\ncrab.benchmarks.template module\n-------------------------------\n\n.. automodule:: crab.benchmarks.template\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\nModule contents\n---------------\n\n.. automodule:: crab.benchmarks\n   :members:\n   :undoc-members:\n   :show-inheritance:\n"
  },
  {
    "path": "docs/crab.client.rst",
    "content": "crab.client package\n===================\n\nSubmodules\n----------\n\ncrab.client.env module\n----------------------\n\n.. automodule:: crab.client.env\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.client.openai\\_interface module\n------------------------------------\n\n.. automodule:: crab.client.openai_interface\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\nModule contents\n---------------\n\n.. automodule:: crab.client\n   :members:\n   :undoc-members:\n   :show-inheritance:\n"
  },
  {
    "path": "docs/crab.core.models.rst",
    "content": "crab.core.models package\n========================\n\nSubmodules\n----------\n\ncrab.core.models.action module\n------------------------------\n\n.. automodule:: crab.core.models.action\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.core.models.benchmark\\_interface module\n--------------------------------------------\n\n.. automodule:: crab.core.models.benchmark_interface\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.core.models.config module\n------------------------------\n\n.. automodule:: crab.core.models.config\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.core.models.evaluator module\n---------------------------------\n\n.. automodule:: crab.core.models.evaluator\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.core.models.task module\n----------------------------\n\n.. automodule:: crab.core.models.task\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\nModule contents\n---------------\n\n.. automodule:: crab.core.models\n   :members:\n   :undoc-members:\n   :show-inheritance:\n"
  },
  {
    "path": "docs/crab.core.rst",
    "content": "crab.core package\n=================\n\nSubpackages\n-----------\n\n.. toctree::\n   :maxdepth: 4\n\n   crab.core.models\n\nSubmodules\n----------\n\ncrab.core.benchmark module\n--------------------------\n\n.. automodule:: crab.core.benchmark\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.core.decorators module\n---------------------------\n\n.. automodule:: crab.core.decorators\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.core.environment module\n----------------------------\n\n.. automodule:: crab.core.environment\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.core.exceptions module\n---------------------------\n\n.. automodule:: crab.core.exceptions\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.core.graph\\_evaluator module\n---------------------------------\n\n.. automodule:: crab.core.graph_evaluator\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.core.task\\_generator module\n--------------------------------\n\n.. automodule:: crab.core.task_generator\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.core.vagrant\\_manager module\n---------------------------------\n\n.. automodule:: crab.core.vagrant_manager\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\nModule contents\n---------------\n\n.. automodule:: crab.core\n   :members:\n   :undoc-members:\n   :show-inheritance:\n"
  },
  {
    "path": "docs/crab.environments.rst",
    "content": "crab.environments package\n=========================\n\nSubmodules\n----------\n\ncrab.environments.android module\n--------------------------------\n\n.. automodule:: crab.environments.android\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.environments.linux module\n------------------------------\n\n.. automodule:: crab.environments.linux\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.environments.template module\n---------------------------------\n\n.. automodule:: crab.environments.template\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\nModule contents\n---------------\n\n.. automodule:: crab.environments\n   :members:\n   :undoc-members:\n   :show-inheritance:\n"
  },
  {
    "path": "docs/crab.rst",
    "content": "crab package\n============\n\nSubpackages\n-----------\n\n.. toctree::\n   :maxdepth: 4\n\n   crab.benchmarks\n   crab.client\n   crab.core\n   crab.environments\n   crab.server\n\nModule contents\n---------------\n\n.. automodule:: crab\n   :members:\n   :undoc-members:\n   :show-inheritance:\n"
  },
  {
    "path": "docs/crab.server.controller.rst",
    "content": "crab.server.controller package\n==============================\n\nSubmodules\n----------\n\ncrab.server.controller.benchmark module\n---------------------------------------\n\n.. automodule:: crab.server.controller.benchmark\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.server.controller.environment module\n-----------------------------------------\n\n.. automodule:: crab.server.controller.environment\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\nModule contents\n---------------\n\n.. automodule:: crab.server.controller\n   :members:\n   :undoc-members:\n   :show-inheritance:\n"
  },
  {
    "path": "docs/crab.server.rst",
    "content": "crab.server package\n===================\n\nSubpackages\n-----------\n\n.. toctree::\n   :maxdepth: 4\n\n   crab.server.controller\n\nSubmodules\n----------\n\ncrab.server.api module\n----------------------\n\n.. automodule:: crab.server.api\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.server.config module\n-------------------------\n\n.. automodule:: crab.server.config\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.server.data module\n-----------------------\n\n.. automodule:: crab.server.data\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.server.exception\\_handlers module\n--------------------------------------\n\n.. automodule:: crab.server.exception_handlers\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.server.logger module\n-------------------------\n\n.. automodule:: crab.server.logger\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.server.main module\n-----------------------\n\n.. automodule:: crab.server.main\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.server.middleware module\n-----------------------------\n\n.. automodule:: crab.server.middleware\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\ncrab.server.utils module\n------------------------\n\n.. automodule:: crab.server.utils\n   :members:\n   :undoc-members:\n   :show-inheritance:\n\nModule contents\n---------------\n\n.. automodule:: crab.server\n   :members:\n   :undoc-members:\n   :show-inheritance:\n"
  },
  {
    "path": "docs/crab_benchmark_v0/environment_gcp_setup.md",
    "content": "# Google cloud platform setup\n\n## Setup and Start the VM Instance\n\nThe development image is hosted in the project `capable-vista-420022` with image name `crab-benchmark-v0-1`.\n\nYou can use [gcloud](https://cloud.google.com/sdk/docs/install) to create an instance from this image.\n\nFirst install [gcloud](https://cloud.google.com/sdk/docs/install), then create an instance using the following command:\n\n```bash\ngcloud compute instances create \\\ncrab-instance \\\n--zone=us-central1-a \\\n--machine-type=n2-standard-8 \\\n--image=https://www.googleapis.com/compute/v1/projects/capable-vista-420022/global/images/crab-benchmark-v0-1 \\\n--enable-nested-virtualization\n# You can change instance name, zone, machine type as you want.\n# Remember that the CPU must support nested virtualization and should have at least 32G memory.\n# This setting costs around 0.4$ per hour.\n```\n\nAfter creating the instance, you can connect it using SSH.\n\nUser account information:\n\n* user: `root`; password: `crab`\n* user: `crab`; password: `crab`\n\n**IMPORTANT: You must switch to user `crab` before setting up remote desktop.** Use `sudo su crab`.\n\n## Connect the Instance through a remote desktop service\n\nYou need to connect the server to a display to set up the experiment environment because the Ubuntu virtual machine and the Android emulator require GUI operations.\n\nThere are many possible remote desktop products you can use. Here, we provide instructions for [Google Remote Desktop](https://remotedesktop.google.com/access/), which was used to run our experiment.\n\n1. Go to [Google Remote Desktop Headless](https://remotedesktop.google.com/headless). Click **Begin** -> **Next** -> **Authorize**. On the resulting page, copy the command from the `Debian Linux` section.\n2. Connect to the VM instance through SSH, paste the copied command, and run it. You will be prompted to set a six-digit PIN.\n3. 
Go to [Google Remote Desktop Access](https://remotedesktop.google.com/access). You should see a remote device marked as online. Click it and enter the PIN. You will then see the desktop of the VM instance."
  },
  {
    "path": "docs/crab_benchmark_v0/environment_local_setup.md",
    "content": "# Local setup\n\n## Install CRAB\n\nFirst you should install `poetry`, a modern python dependency management tool.\n\nThen pull the crab repo and install:\n\n```bash\ngit clone https://github.com/camel-ai/crab\n\ncd crab\npoetry install -E client\n```\n\n## Install Ubuntu VM\n\n**IMPORTANT: If you are using an Ubuntu VM, the Python version in the VM must match the Python version on the host machine. If you follow this instruction to install Ubuntu, the Python version in the VM will be 3.10.12. Consider using `conda` or `pyenv` to install the same Python version on the host machine.**\n\nInstall `virt-manager`. If you are using Ubuntu or Debian, try `sudo apt install virt-manager`.\n\nDownload [Ubuntu 22.04 image](https://releases.ubuntu.com/jammy/ubuntu-22.04.4-desktop-amd64.iso), then create a new machine with at least 8G RAM and 30G disk in virt-manager using the image. Follow the instruction and complete the installation. (It's better to use `crab` as the main user name.)\n\nAfter install Ubuntu, you should install crab-server on it and do necessary initilization. In Ubuntu VM, run\n\n```bash\ngit clone https://github.com/camel-ai/crab.git ~/crab/\ncd ~/crab/crab-benchmark-v0/scripts\nchmod +x ubuntu_env_init.sh\n./ubuntu_env_init.sh\n```\n\nThe VM will reboot after initilization. After rebooting, remember its ip address.\n\n\n## Install ADB\n\nDownload and install ADB from its [official website](https://developer.android.com/tools/releases/platform-tools).\n\n## Install Android Emulator\n\nYou can use emulators in [Android Studio](https://developer.android.com/studio) to simulate an Android device if you\ndon't want to use a physical one.\n\nTo create a new virtual device, open Android Studio and use its built-in device manager to create a Pixel 8 Pro with\nsystem image release \"R\".\n\n> Note that the benchmark on our side runs on a Google Pixel 8 Pro with system image release \"R\". 
However, cases are\n> noticed that Google API Level 30 may not work properly when trying to enable USB debugging mode. If such issues are \n> encountered, you can try switch to releases of lower API levels (e.g. \"Q\").\n\n![](./assets/android_1.png)\n\n![](./assets/android_2.png)\n\nThen you can boot the device. To check if it's all set, run\n\n```shell\nadb devices\n```\n\nYou should see the device in the list.\n\n> Important: ADB won't work normally if you see an `unauthorized` tag after the device ID. To solve this, enable both\n> the developer mode and USB debugging mode in the device."
  },
  {
    "path": "docs/crab_benchmark_v0/get_started.md",
    "content": "# Get started\n\n`crab-benchmark-v0` is a benchmark released with the crab framework to provide a standard usage. It includes two virtual machine environments: an Android smartphone and an Ubuntu desktop computer, with 100 tasks and 59 different evaluator functions in the dataset. It effectively evaluates the MLM-based agents' performance on operating real-world tasks across multiple platforms.\n\n## Concept\n\nOur benchmark contains two important parts: **Environments** and **Tasks**.\n\n#### Environment\n\nSince our Ubuntu environment is built upon KVM, setting it up locally requires you an experienced Linux user to deal with many small and miscellaneous issues. Therefore, we provide two environment setup methods:\n\n* [Local setup](./environment_local_setup.md) provides you a step-by-step guideline to build environments on a Linux Machine with **at least one monitor and 32G memory**, but it doesn't cover details like how to install KVM on your machine because they are various on different Linux distros.\n* For those who want a quicker setup, we also provide a setup through [Google Clould Platform](./environment_gcp_setup.md). Specifically, we publish a disk image contains all required software and configurations on google cloud, you can use your own google account to create a cloud computer through this disk image and use [google remote desktop](https://remotedesktop.google.com/access/) to connect to it. This method doesn't have any hardware limitations and when you set it up you can run the experiment immediately. As a tradeoff, the cloud computer that meets the minimum hardware requirement costs around $0.4 per hour (depend on the machine zone).\n\nWe connect to the Android environment via ADB, so any Android device, from an emulator to a physical smartphone, will work. You should ensure ADB is installed on your system and can be directly called through the command line. 
In our experiment, we used the built-in emulator of [Android Studio](https://developer.android.com/studio) to create a Google Pixel 8 Pro virtual device with the release name \\textit{R} and installed necessary extra Apps.\n\n#### Task\n\nWe manage our task dataset using a CRAB-recommended method. Sub-tasks are defined through Pydantic models written in Python code, and composed tasks are defined in JSON format, typically combining several sub-tasks. The sub-tasks are defined in [android_subtasks](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/android_subtasks.py) and [ubuntu_subtasks](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/ubuntu_subtasks.py). The JSON files storing composed tasks are categorized into [android](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/android/), [ubuntu](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/ubuntu/), and [cross-platform](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/cross/). The tasks in android and ubuntu directories are single-environment task and those in cross directory are cross-environment tasks. Additionally, we create several tasks by hand instead of composing sub-tasks to provide semantically more meaningful tasks, which are found in [handmade tasks](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/handmade_tasks.py).\n\n## Experiment\n\nAfter setting up the environment, you can start the experiment. A brief overview of the experiment is as follows:\n\n1. Open the Ubuntu environment virtual machine and the Android environment emulator.\n2. Start the CRAB server in the Ubuntu environment and get its IP address and port. Let's say they are `192.168.122.72` and `8000`.\n3. Choose a task. As an example, we take the task with ID `a3476778-e512-40ca-b1c0-d7aab0c7f18b` from [handmade_tasks](https://github.com/camel-ai/crab/tree/main/crab-benchmark-v0/dataset/handmade_tasks.py). 
The task is: \"Open the 'Tasks' app on Android, check the first incomplete task, then perform the task according to its description.\"\n4. Run [main.py](./main.py) with the command `poetry run python -m crab-benchmark-v0.main --model gpt4o --policy single --remote-url http://192.168.122.72:8000 --task-id a3476778-e512-40ca-b1c0-d7aab0c7f18b`. In this command, `--model gpt4o` and `--policy single` determine the agent system, `--remote-url` specifies the Ubuntu environment interface, and `--task-id` indicates the task to be performed.\n\n"
  },
  {
    "path": "docs/get_started/build_your_own_benchmark.md",
    "content": "# Build your own benchmark\n\n## Overview\n\n![](../assets/benchmark_config.png)\n\nCrab benchmark system mainly consists of five types of component:\n\n* `Action`: The fundamental building block of Crab framework, which represents a unit operation that can be taken by an agent or as a fixed process that called multi times in a benchmark.\n* `Evaluator`: A specific type of `Action` that assess whether an agent has achieved its goal. Multiple evaluators can be combined together as a graph to enable complex evaluation.\n* `Environment` A abstraction of an environment that the agent can take action and obverse in a given action and observation space. An environment can be launched on the local machine, a physical remote machine, or a virtual machine.\n* `Task`: A task with a natural language description to instruct the agent to perform. It can include interaction with multiple environments. Notice that in the benchmark, a task should have an graph evaluator to judge if the task progress.\n* `Benchmark`: The main body of the crab system that contains all required component to build a benchmark, including environments, tasks, prompting method. It controls several \n\n## Actions\n\nActions are the fundamental building blocks of the Crab system's operations.  Each action is encapsulated as an instance of the `Action` class. 
An action can convert into a JSON schema for language model agents to use.\n\nAn action is characterized by the following attributes:\n\n- **Name**: A string identifier uniquely represents the action.\n- **Entry**: A callable entry point to the actual Python function that executes the action.\n- **Parameters**: A Pydantic model class that defines the input parameters the action accepts.\n- **Returns**: A Pydantic model class that defines the structure of the return type the action produces.\n- **Description**: An string providing a clear and concise description of what the action does and how it behaves.\n- **Kept Parameters**: A list of parameters retained for internal use by the Crab system, which do not appear in the action's parameter list but are injected automatically at runtime. For exmaple we use `env` to represent the current environment object that action are taken in.\n- **Environment Name**: An optional string that can specify the environment the action is associated with. Usually this attribute is only used by predifined actions like `setup` in an environment.\n\nHere is an example of creating an action through python function:\n\n```python\n@action\ndef click(x: float, y: float) -> None:\n    \"\"\"\n    click on the current desktop screen.\n\n    Args:\n        x (float): The X coordinate, as a floating-point number in the range [0.0, 1.0].\n        y (float): The Y coordinate, as a floating-point number in the range [0.0, 1.0].\n\n    \"\"\"\n    import pyautogui\n\n    pyautogui.click(x,y)\n```\n\nThe `@action` decorator transforms the `click` function into an `Action` with these mappings:\n\n- The function name `click` becomes the action **name**.\n- The parameters `x: float, y: float` with their type hints become the action **parameters**.\n- The return type hint `-> None` is used for the action's **returns** field, indicating no value returned.\n- The function's docstring provides a **description** for the action and its parameters, utilized in 
the JSON schema for the agent.\n- The function body defines the action's behavior, executed when the action is called.\n\n\nThe `Action` class allows for different combination operations such as:\n\n- **Pipe**: Using the `>>` operator, actions can be piped together, where the output of one action becomes the input to another, provided their parameters and return types are compatible.\n- **Sequential Combination**: The `+` operator allows for two actions to be combined sequentially, executing one after the other.\n\n## Evaluators\n\nEvaluators in the Crab system are a specific type of `Action` that assess whether an agent has achieved its goal. They should return a boolean value, indicating whether the task's objective has been met. Multiple evaluators can be connected into a graph using the `networkx` package, enabling multi-stage evaluation, where different conditions can be checked in sequence or in parallel.\n\nAn example evaluator `check_file_exist` confirms the presence of a file at a given path, using the `os.path.isfile` method to return `True` if the file exists or `False` otherwise:\n\n```python\n@evaluator\ndef check_file_exist(file_path: str) -> bool:\n    return os.path.isfile(file_path)\n```\n\nExtra attributes of evaluators:\n\n- **Require Submit**: Indicates if the evaluator awaits a specific submission to carry out its assessment.\n\nLogical operators allow for evaluator combinations:\n\n- **AND (&)**: Requires all evaluators to succeed for a task to pass.\n- **OR (|)**: Passes if any of the evaluators succeed.\n- **NOT (~)**: Reverses the evaluation outcome.\n\nThe combined evaluator is still considered as **one evaluator** rather than a graph evaluator.\n\n"
  },
  {
    "path": "docs/get_started/quickstart.md",
    "content": "# Quickstart\n\nThe `Benchmark` class is a comprehensive framework for evaluating language model agents across various tasks and environments. It provides a flexible structure to manage multiple environments and tasks, offering single and multi-environment execution modes.\n\nThe following image shows an overview of how `Benchmark` works.\n\n![](../assets/crab_overview.png)\n\n## Basic Usage\n\n### Step 1: Importing the Benchmark\n\nBegin by importing the predefined benchmark from the `crab.benchmarks` module. For exmple, here we import `template_benchmark_config`:\n\n```python\nfrom crab.benchmarks import template_benchmark_config\n```\n\n### Step 2: Creating the Benchmark\n\nUse the `create_benchmark` function to create an instance of a `Benchmark` class based on the imported benchmark configuration:\n\n```python\nfrom crab import create_benchmark\n\nbenchmark = create_benchmark(template_benchmark_config)\n```\n\n### Step 3: Starting a Task\n\nSelect a task to start within the benchmark. The task ID should correspond to one of the predefined tasks in the benchmark configuration. 
Use the `start_task` method to initialize and begin the task:\n\n```python\n# Starting the task with ID \"0\"\ntask, action_space = benchmark.start_task(\"0\")\n```\n\n### Step 4: Running the Benchmark Loop\n\nExecute actions and observe the results using the `step` and `observe` methods:\n\n```python\nfrom crab.client.openai_interface import OpenAIAgent\n\n# Initialize the agent by benchmark task and action_space\nagent = OpenAIAgent(task, action_space)\n\n# Define a function to run the benchmark\ndef run_benchmark(benchmark, agent):\n    for step in range(20):  # Define the number of steps as per your requirements\n        print(\"=\" * 40)\n        print(f\"Starting step {step}:\")\n\n        # Get the current observations and prompts\n        observation = benchmark.observe()\n\n        # Process the observations and determine the next action\n        action_result = agent.determine_next_action(observation)\n        \n        # Execute the action and get the result\n        step_result = benchmark.step(action_result.action, action_result.parameters)\n\n        # Check current evaluation result.\n        print(step_result.evaluation_results)\n\n        # Check if the task is terminated and break the loop if so\n        if step_result.terminated:\n            print(\"Task completed successfully.\")\n            print(step_result.evaluation_results)\n            break\n\nrun_benchmark(benchmark, agent)\n```\n\n### Step 5: Completing the Benchmark\n\nClean up and reset the benchmark after completion using the`reset`:\n\n```python\nbenchmark.reset()\n```\n"
  },
  {
    "path": "docs/index.rst",
    "content": ".. Crab documentation master file, created by\n   sphinx-quickstart on Thu May  2 10:58:47 2024.\n   You can adapt this file completely to your liking, but it should at least\n   contain the root `toctree` directive.\n\nWelcome to Crab's documentation!\n================================\n\n.. toctree::\n   :maxdepth: 1\n   :caption: Get Started with CRAB:\n   :name: get_started\n\n   get_started/quickstart.md\n   get_started/build_your_own_benchmark.md\n\n.. toctree::\n   :maxdepth: 1\n   :caption: CRAB Benchmark-v0:\n   :name: crab_benchmark_v0\n\n   crab_benchmark_v0/get_started.md\n   crab_benchmark_v0/environment_gcp_setup.md\n   crab_benchmark_v0/environment_local_setup.md\n\n.. toctree::\n   :maxdepth: 2\n   :caption: API Reference:\n\n   modules\n\n\n\nIndices and tables\n==================\n\n* :ref:`genindex`\n* :ref:`modindex`\n* :ref:`search`\n"
  },
  {
    "path": "docs/make.bat",
    "content": "@ECHO OFF\r\n\r\npushd %~dp0\r\n\r\nREM Command file for Sphinx documentation\r\n\r\nif \"%SPHINXBUILD%\" == \"\" (\r\n\tset SPHINXBUILD=sphinx-build\r\n)\r\nset SOURCEDIR=.\r\nset BUILDDIR=_build\r\n\r\n%SPHINXBUILD% >NUL 2>NUL\r\nif errorlevel 9009 (\r\n\techo.\r\n\techo.The 'sphinx-build' command was not found. Make sure you have Sphinx\r\n\techo.installed, then set the SPHINXBUILD environment variable to point\r\n\techo.to the full path of the 'sphinx-build' executable. Alternatively you\r\n\techo.may add the Sphinx directory to PATH.\r\n\techo.\r\n\techo.If you don't have Sphinx installed, grab it from\r\n\techo.https://www.sphinx-doc.org/\r\n\texit /b 1\r\n)\r\n\r\nif \"%1\" == \"\" goto help\r\n\r\n%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%\r\ngoto end\r\n\r\n:help\r\n%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%\r\n\r\n:end\r\npopd\r\n"
  },
  {
    "path": "docs/modules.rst",
    "content": "crab\n====\n\n.. toctree::\n   :maxdepth: 4\n\n   crab\n"
  },
  {
    "path": "examples/multi_env.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom termcolor import colored\n\nfrom crab import Benchmark, create_benchmark\nfrom crab.agents.backend_models import OpenAIModel\nfrom crab.agents.policies import SingleAgentPolicy\nfrom crab.benchmarks.template import multienv_template_benchmark_config\n\n\ndef start_benchmark(benchmark: Benchmark, agent: SingleAgentPolicy):\n    for step in range(20):\n        print(\"=\" * 40)\n        print(f\"Start agent step {step}:\")\n        observation = benchmark.observe()\n        print(f\"Current enviornment observation: {observation}\")\n        prompt = {}\n        for env, obs in observation.items():\n            if env == \"root\":\n                continue\n            state = obs[\"current_state\"]\n            prompt[env] = [(f\"The state of {env} is {state}\", 0)]\n        response = agent.chat(observation=prompt)\n        print(colored(f\"Agent take action: {response}\", \"blue\"))\n\n        for action in response:\n            response = benchmark.step(\n                action=action.name,\n                parameters=action.arguments,\n                env_name=action.env,\n            )\n            print(\n                colored(\n                    f'Action \"{action.name}\" success, stat: '\n                    
f\"{response.evaluation_results}\",\n                    \"green\",\n                )\n            )\n            if response.terminated:\n                print(\"=\" * 40)\n                print(\n                    colored(\n                        f\"Task finished, result: {response.evaluation_results}\", \"green\"\n                    )\n                )\n                return\n\n\nif __name__ == \"__main__\":\n    benchmark = create_benchmark(multienv_template_benchmark_config)\n    task, action_space = benchmark.start_task(\"0\")\n    env_descriptions = benchmark.get_env_descriptions()\n\n    agent = SingleAgentPolicy(model_backend=OpenAIModel(\"gpt-4o\"))\n    agent.reset(task.description, action_space, env_descriptions)\n    print(\"Start performing task: \" + colored(f'\"{task.description}\"', \"green\"))\n    start_benchmark(benchmark, agent)\n    benchmark.reset()\n"
  },
  {
    "path": "examples/single_env.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom termcolor import colored\n\nfrom crab import Benchmark, create_benchmark\nfrom crab.agents.backend_models import OpenAIModel\nfrom crab.agents.policies import SingleAgentPolicy\nfrom crab.benchmarks.template import template_benchmark_config\n\n\ndef start_benchmark(benchmark: Benchmark, agent: SingleAgentPolicy):\n    for step in range(20):\n        print(\"=\" * 40)\n        print(f\"Start agent step {step}:\")\n        observation = benchmark.observe()[\"template_env\"]\n        print(f\"Current enviornment observation: {observation}\")\n        response = agent.chat(\n            {\n                \"template_env\": [\n                    (f\"Current enviornment observation: {observation}\", 0),\n                ]\n            }\n        )\n        print(colored(f\"Agent take action: {response}\", \"blue\"))\n\n        for action in response:\n            response = benchmark.step(\n                action=action.name,\n                parameters=action.arguments,\n                env_name=action.env,\n            )\n            print(\n                colored(\n                    f'Action \"{action.name}\" success, stat: '\n                    f\"{response.evaluation_results}\",\n                    \"green\",\n                )\n            
)\n            if response.terminated:\n                print(\"=\" * 40)\n                print(\n                    colored(\n                        f\"Task finished, result: {response.evaluation_results}\", \"green\"\n                    )\n                )\n                return\n\n\nif __name__ == \"__main__\":\n    benchmark = create_benchmark(template_benchmark_config)\n    task, action_space = benchmark.start_task(\"0\")\n    env_descriptions = benchmark.get_env_descriptions()\n\n    agent = SingleAgentPolicy(model_backend=OpenAIModel(\"gpt-4o\"))\n    agent.reset(task.description, action_space, env_descriptions)\n    print(\"Start performing task: \" + colored(f'\"{task.description}\"', \"green\"))\n    start_benchmark(benchmark, agent)\n    benchmark.reset()\n"
  },
  {
    "path": "licenses/LICENSE",
    "content": "                                 Apache License\n                           Version 2.0, January 2004\n                        http://www.apache.org/licenses/\n\n   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION\n\n   1. Definitions.\n\n      \"License\" shall mean the terms and conditions for use, reproduction,\n      and distribution as defined by Sections 1 through 9 of this document.\n\n      \"Licensor\" shall mean the copyright owner or entity authorized by\n      the copyright owner that is granting the License.\n\n      \"Legal Entity\" shall mean the union of the acting entity and all\n      other entities that control, are controlled by, or are under common\n      control with that entity. For the purposes of this definition,\n      \"control\" means (i) the power, direct or indirect, to cause the\n      direction or management of such entity, whether by contract or\n      otherwise, or (ii) ownership of fifty percent (50%) or more of the\n      outstanding shares, or (iii) beneficial ownership of such entity.\n\n      \"You\" (or \"Your\") shall mean an individual or Legal Entity\n      exercising permissions granted by this License.\n\n      \"Source\" form shall mean the preferred form for making modifications,\n      including but not limited to software source code, documentation\n      source, and configuration files.\n\n      \"Object\" form shall mean any form resulting from mechanical\n      transformation or translation of a Source form, including but\n      not limited to compiled object code, generated documentation,\n      and conversions to other media types.\n\n      \"Work\" shall mean the work of authorship, whether in Source or\n      Object form, made available under the License, as indicated by a\n      copyright notice that is included in or attached to the work\n      (an example is provided in the Appendix below).\n\n      \"Derivative Works\" shall mean any work, whether in Source or Object\n      
form, that is based on (or derived from) the Work and for which the\n      editorial revisions, annotations, elaborations, or other modifications\n      represent, as a whole, an original work of authorship. For the purposes\n      of this License, Derivative Works shall not include works that remain\n      separable from, or merely link (or bind by name) to the interfaces of,\n      the Work and Derivative Works thereof.\n\n      \"Contribution\" shall mean any work of authorship, including\n      the original version of the Work and any modifications or additions\n      to that Work or Derivative Works thereof, that is intentionally\n      submitted to Licensor for inclusion in the Work by the copyright owner\n      or by an individual or Legal Entity authorized to submit on behalf of\n      the copyright owner. For the purposes of this definition, \"submitted\"\n      means any form of electronic, verbal, or written communication sent\n      to the Licensor or its representatives, including but not limited to\n      communication on electronic mailing lists, source code control systems,\n      and issue tracking systems that are managed by, or on behalf of, the\n      Licensor for the purpose of discussing and improving the Work, but\n      excluding communication that is conspicuously marked or otherwise\n      designated in writing by the copyright owner as \"Not a Contribution.\"\n\n      \"Contributor\" shall mean Licensor and any individual or Legal Entity\n      on behalf of whom a Contribution has been received by Licensor and\n      subsequently incorporated within the Work.\n\n   2. Grant of Copyright License. 
Subject to the terms and conditions of\n      this License, each Contributor hereby grants to You a perpetual,\n      worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n      copyright license to reproduce, prepare Derivative Works of,\n      publicly display, publicly perform, sublicense, and distribute the\n      Work and such Derivative Works in Source or Object form.\n\n   3. Grant of Patent License. Subject to the terms and conditions of\n      this License, each Contributor hereby grants to You a perpetual,\n      worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n      (except as stated in this section) patent license to make, have made,\n      use, offer to sell, sell, import, and otherwise transfer the Work,\n      where such license applies only to those patent claims licensable\n      by such Contributor that are necessarily infringed by their\n      Contribution(s) alone or by combination of their Contribution(s)\n      with the Work to which such Contribution(s) was submitted. If You\n      institute patent litigation against any entity (including a\n      cross-claim or counterclaim in a lawsuit) alleging that the Work\n      or a Contribution incorporated within the Work constitutes direct\n      or contributory patent infringement, then any patent licenses\n      granted to You under this License for that Work shall terminate\n      as of the date such litigation is filed.\n\n   4. Redistribution. 
You may reproduce and distribute copies of the\n      Work or Derivative Works thereof in any medium, with or without\n      modifications, and in Source or Object form, provided that You\n      meet the following conditions:\n\n      (a) You must give any other recipients of the Work or\n          Derivative Works a copy of this License; and\n\n      (b) You must cause any modified files to carry prominent notices\n          stating that You changed the files; and\n\n      (c) You must retain, in the Source form of any Derivative Works\n          that You distribute, all copyright, patent, trademark, and\n          attribution notices from the Source form of the Work,\n          excluding those notices that do not pertain to any part of\n          the Derivative Works; and\n\n      (d) If the Work includes a \"NOTICE\" text file as part of its\n          distribution, then any Derivative Works that You distribute must\n          include a readable copy of the attribution notices contained\n          within such NOTICE file, excluding those notices that do not\n          pertain to any part of the Derivative Works, in at least one\n          of the following places: within a NOTICE text file distributed\n          as part of the Derivative Works; within the Source form or\n          documentation, if provided along with the Derivative Works; or,\n          within a display generated by the Derivative Works, if and\n          wherever such third-party notices normally appear. The contents\n          of the NOTICE file are for informational purposes only and\n          do not modify the License. 
You may add Your own attribution\n          notices within Derivative Works that You distribute, alongside\n          or as an addendum to the NOTICE text from the Work, provided\n          that such additional attribution notices cannot be construed\n          as modifying the License.\n\n      You may add Your own copyright statement to Your modifications and\n      may provide additional or different license terms and conditions\n      for use, reproduction, or distribution of Your modifications, or\n      for any such Derivative Works as a whole, provided Your use,\n      reproduction, and distribution of the Work otherwise complies with\n      the conditions stated in this License.\n\n   5. Submission of Contributions. Unless You explicitly state otherwise,\n      any Contribution intentionally submitted for inclusion in the Work\n      by You to the Licensor shall be under the terms and conditions of\n      this License, without any additional terms or conditions.\n      Notwithstanding the above, nothing herein shall supersede or modify\n      the terms of any separate license agreement you may have executed\n      with Licensor regarding such Contributions.\n\n   6. Trademarks. This License does not grant permission to use the trade\n      names, trademarks, service marks, or product names of the Licensor,\n      except as required for reasonable and customary use in describing the\n      origin of the Work and reproducing the content of the NOTICE file.\n\n   7. Disclaimer of Warranty. Unless required by applicable law or\n      agreed to in writing, Licensor provides the Work (and each\n      Contributor provides its Contributions) on an \"AS IS\" BASIS,\n      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or\n      implied, including, without limitation, any warranties or conditions\n      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A\n      PARTICULAR PURPOSE. 
You are solely responsible for determining the\n      appropriateness of using or redistributing the Work and assume any\n      risks associated with Your exercise of permissions under this License.\n\n   8. Limitation of Liability. In no event and under no legal theory,\n      whether in tort (including negligence), contract, or otherwise,\n      unless required by applicable law (such as deliberate and grossly\n      negligent acts) or agreed to in writing, shall any Contributor be\n      liable to You for damages, including any direct, indirect, special,\n      incidental, or consequential damages of any character arising as a\n      result of this License or out of the use or inability to use the\n      Work (including but not limited to damages for loss of goodwill,\n      work stoppage, computer failure or malfunction, or any and all\n      other commercial damages or losses), even if such Contributor\n      has been advised of the possibility of such damages.\n\n   9. Accepting Warranty or Additional Liability. While redistributing\n      the Work or Derivative Works thereof, You may choose to offer,\n      and charge a fee for, acceptance of support, warranty, indemnity,\n      or other liability obligations and/or rights consistent with this\n      License. However, in accepting such obligations, You may act only\n      on Your own behalf and on Your sole responsibility, not on behalf\n      of any other Contributor, and only if You agree to indemnify,\n      defend, and hold each Contributor harmless for any liability\n      incurred by, or claims asserted against, such Contributor by reason\n      of your accepting any such warranty or additional liability.\n\n   END OF TERMS AND CONDITIONS\n\n   APPENDIX: How to apply the Apache License to your work.\n\n      To apply the Apache License to your work, attach the following\n      boilerplate notice, with the fields enclosed by brackets \"[]\"\n      replaced with your own identifying information. 
(Don't include\n      the brackets!)  The text should be enclosed in the appropriate\n      comment syntax for the file format. We also recommend that a\n      file or class name and description of purpose be included on the\n      same \"printed page\" as the copyright notice for easier\n      identification within third-party archives.\n\n   Copyright 2023 @ CAMEL-AI.org\n\n   Licensed under the Apache License, Version 2.0 (the \"License\");\n   you may not use this file except in compliance with the License.\n   You may obtain a copy of the License at\n\n       http://www.apache.org/licenses/LICENSE-2.0\n\n   Unless required by applicable law or agreed to in writing, software\n   distributed under the License is distributed on an \"AS IS\" BASIS,\n   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n   See the License for the specific language governing permissions and\n   limitations under the License."
  },
  {
    "path": "licenses/license_template.txt",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ==========="
  },
  {
    "path": "licenses/update_license.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport os\nimport re\nimport sys\nfrom pathlib import Path\nfrom typing import List\n\n\n# The license template file is hard-coded with specific start and end lines\ndef fine_license_start_line(lines: List[str], start_with: str) -> int:\n    for i in range(len(lines)):\n        if lines[i].startswith(start_with):\n            return i\n    return None\n\n\ndef find_license_end_line(lines: List[str], start_with: str) -> int:\n    for i in range(len(lines) - 1, -1, -1):\n        if lines[i].startswith(start_with):\n            return i\n    return None\n\n\ndef update_license_in_file(\n    file_path: str,\n    license_template_path: str,\n    start_line_start_with: str,\n    end_line_start_with: str,\n) -> bool:\n    with open(file_path, \"r\") as f:\n        content = f.read()\n\n    with open(license_template_path, \"r\") as f:\n        new_license = f.read().strip()\n\n    maybe_existing_licenses = re.findall(\n        r\"^#.*?(?=\\n)\", content, re.MULTILINE | re.DOTALL\n    )\n    start_index = fine_license_start_line(\n        maybe_existing_licenses, start_line_start_with\n    )\n    end_index = find_license_end_line(maybe_existing_licenses, end_line_start_with)\n    if start_index is not None and end_index is not None:\n        
maybe_existing_licenses = maybe_existing_licenses[start_index : end_index + 1]\n    else:\n        maybe_existing_licenses = None\n    if maybe_existing_licenses:\n        maybe_old_licenses = \"\\n\".join(maybe_existing_licenses)\n        if maybe_old_licenses.strip() != new_license.strip():\n            replaced_content = content.replace(maybe_old_licenses, new_license)\n            with open(file_path, \"w\") as f:\n                f.write(replaced_content)\n            print(f\"Replaced license in {file_path}\")\n            return True\n        else:\n            return False\n    else:\n        with open(file_path, \"w\") as f:\n            f.write(new_license + \"\\n\" + content)\n        print(f\"Added license to {file_path}\")\n        return True\n\n\ndef update_license_in_directory(\n    directory_path: str,\n    license_template_path: str,\n    start_line_start_with: str,\n    end_line_start_with: str,\n) -> None:\n    # Check if directory exists\n    if not os.path.isdir(directory_path):\n        raise NotADirectoryError(f\"{directory_path} is not a directory\")\n    # Check if license template exists\n    if not os.path.isfile(license_template_path):\n        raise FileNotFoundError(f\"{license_template_path} not found\")\n\n    file_count = 0\n    for py_files in Path(directory_path).rglob(\"*.py\"):\n        if py_files.name.startswith(\".\"):\n            continue\n        if any(part.startswith(\".\") for part in py_files.parts):\n            continue\n        if any(part == \"thirdparty\" for part in py_files.parts):\n            continue\n        if update_license_in_file(\n            py_files,\n            license_template_path,\n            start_line_start_with,\n            end_line_start_with,\n        ):\n            file_count += 1\n\n    print(f\"License updated in {file_count} files\")\n\n\nif __name__ == \"__main__\":\n    if len(sys.argv) < 3:\n        print(\n            \"Usage from command line: \"\n            \"python 
update_license.py <directory_path> <license_template_path>\"\n            \"No valid input arguments found, please enter manually.\"\n        )\n        directory_path = input(\"Enter directory path: \")\n        license_template_path = input(\"Enter license template path: \")\n    else:\n        directory_path = sys.argv[1]\n        license_template_path = sys.argv[2]\n\n    start_line_start_with = \"# =========== Copyright\"\n    end_line_start_with = \"# =========== Copyright\"\n    update_license_in_directory(\n        directory_path,\n        license_template_path,\n        start_line_start_with,\n        end_line_start_with,\n    )\n"
  },
  {
    "path": "pyproject.toml",
    "content": "[build-system]\nrequires = [\"poetry-core>=1.2.0\", \"wheel\"]\nbuild-backend = \"poetry.core.masonry.api\"\n\n[tool.poetry]\nname = \"crab-framework\"\nversion = \"0.1.2\"\ndescription = \"Cross-platform Agent Benchmark for Multimodal Embodied Language Model Agents.\"\n\nauthors = [\"CAMEL-AI.org\"]\nmaintainers = [\"Tianqi Xu <tianqi.xu@kaust.edu.sa>\"]\n\npackages = [{ include = \"crab\" }]\n\nreadme = \"README.md\"\nlicense = \"Apache License 2.0\"\nrepository = \"https://github.com/camel-ai/crab\"\n\n[tool.poetry.dependencies]\npython = \"^3.10, <3.12\"\n\n# core\ndocstring-parser = \"^0\"\nnetworkx = \"^3\"\ndill = \"^0.3.8\"\npydantic = \"^2.6\"\nlxml = \"^5.2.2\"\nopenai = \"^1.12.0\"\ncryptography = \"^43.0.0\"\nsetuptools = \"^73.0.1\"\ntenacity = \"^9.0.0\"\n\n# desktop actions\npillow = \"^10.2.0\"\nmss = \"^9.0.1\"\npsutil = \"^5.9.8\"\npyautogui = \"^0.9.3\"\npyperclip = \"^1.8.2\"\n\n# environment\npython-vagrant = \"^1.0.0\"\n\n# evaluation\npyexcel-ods = \"^0.6.0\"\nodfpy = \"^1.4.1\"\nbeautifulsoup4 = \"^4.12.3\"\ntermcolor = \"^2.4.0\"\nopencv-python = \"^4.9.0.80\"\n\n# client\nhttpx = { version = \"*\", optional = true }\n\n# agent\ngoogle-generativeai = { version = \"^0.6.0\", optional = true }\nanthropic = { version = \"^0.29.0\", optional = true }\ngroq = { version = \"^0.5.0\", optional = true }\nollama = { version = \"^0.2.0\", optional = true }\ncamel-ai = { version = \"^0.2\", extras = [\"all\"], optional = true }\n\n# text ocr\neasyocr = { version = \"^1.7.1\", optional = true }\n\n# visual prompt\ntransformers = { version = \"4.44.1\", optional = true }\ntorch = { version = \"^2.4.0\", optional = true }\n\n# server\nfastapi = { extras = [\"all\"], version = \"0.109.1\", optional = true }\npydantic-settings = { version = \"^2\", optional = true }\nuvicorn = { extras = [\"standard\"], version = \"^0.27.0.post1\", optional = true }\n\n# radar plot\nplotly = { version = \"^5.20.0\", optional = true }\n\n# 
types\ntypes-pyautogui = \"^0.9.3.20240106\"\ntypes-psutil = \"^5.9.5.20240205\"\ntypes-networkx = \"^3.2.1.20240210\"\n\n[tool.poetry.extras]\nserver = [\"fastapi\", \"pydantic-settings\", \"uvicorn\"]\nclient = [\n    \"httpx\",\n    \"openai\",\n    \"google-generativeai\",\n    \"anthropic\",\n    \"groq\",\n    \"ollama\",\n    \"easyocr\",\n    \"plotly\",\n    \"torch\",\n    \"torchvision\",\n    \"numpy\",\n    \"opencv-python\",\n    \"transformers\",\n    \"addict\",\n    \"yapf\",\n    \"matplotlib\",\n    \"pycocotools\",\n    \"timm\",\n]\ncamel = [\"camel-ai\"]\n\n[tool.poetry.group.dev.dependencies]\nmypy = \"^1.8.0\"\npytest = \"^8.0.0\"\nruff = \"^0.6.5\"\nipykernel = \"^6.29.3\"\npandas = \"^2.2.2\"\nsphinx = \"^7\"\nmyst-parser = \"^4\"\nsphinx-book-theme = \"*\"\npre-commit = \"^3.7.0\"\ncertifi = \"^2024.2.2\"\n\n[tool.ruff]\nlint.select = [\"E501\", \"E4\", \"E7\", \"E9\", \"F\", \"I\"]\nlint.ignore = [\"E731\"]\nexclude = [\"docs/\"]\n\n[[tool.mypy.overrides]]\nmodule = [\"dill\", \"easyocr\", \"google.generativeai.*\"]\nignore_missing_imports = true\n"
  },
  {
    "path": "test/actions/test_visual_prompt_actions.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom pathlib import Path\n\nimport pytest\nimport requests\nfrom PIL import Image\n\nfrom crab.actions.visual_prompt_actions import (\n    get_groundingdino_boxes,\n    groundingdino_easyocr,\n)\nfrom crab.utils import image_to_base64\n\n\n@pytest.mark.skip(reason=\"Too slow\")\ndef test_get_groundingdino_boxes_single_image():\n    url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n    image = Image.open(requests.get(url, stream=True).raw)\n    text = \"a cat.\"\n\n    box_threshold = 0.4\n    text_threshold = 0.3\n    result = get_groundingdino_boxes(image, text, box_threshold, text_threshold)\n    assert len(result) == 1\n    assert len(result[0]) > 0\n    assert len(result[0][0]) == 2\n\n\n@pytest.mark.skip(reason=\"Too slow\")\ndef test_get_groundingdino_boxes_multi_image():\n    url1 = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n    url2 = \"https://farm5.staticflickr.com/4005/4666183752_c5b79faa17_z.jpg\"\n    image1 = Image.open(requests.get(url1, stream=True).raw)\n    image2 = Image.open(requests.get(url2, stream=True).raw)\n    text = \"a cat. 
a car.\"\n\n    box_threshold = 0.4\n    text_threshold = 0.3\n    result = get_groundingdino_boxes(\n        [image1, image2], text, box_threshold, text_threshold\n    )\n    assert len(result) == 2\n    assert len(result[0]) > 0\n    assert len(result[1]) > 0\n    assert len(result[0][0]) == 2\n\n\n@pytest.mark.skip(reason=\"Too slow\")\n@pytest.mark.parametrize(\n    \"image_name\", [\"ubuntu_screenshot.png\", \"android_screenshot.png\"]\n)\ndef test_groundingdino_easy_ocr(image_name: str):\n    class A:\n        pass\n\n    temp = A()\n\n    test_dir = Path(__file__).parent.parent\n    image_path = test_dir / \"_assets\" / image_name\n    image = Image.open(image_path)\n    image_base64 = image_to_base64(image)\n    visual_prompt = groundingdino_easyocr(font_size=40).set_kept_param(env=temp)\n    result_image, boxes = visual_prompt.run(input_base64_image=image_base64)\n    assert result_image != image_base64\n    assert boxes\n"
  },
  {
    "path": "test/agents/backend_models/test_camel_model.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport pytest\n\nfrom crab import action\nfrom crab.agents.backend_models import BackendModelConfig, create_backend_model\n\n\n@pytest.fixture\ndef camel_model():\n    return create_backend_model(\n        BackendModelConfig(\n            model_class=\"camel\",\n            model_name=\"gpt-4o\",\n            model_platform=\"openai\",\n            parameters={\"max_tokens\": 3000},\n            history_messages_len=1,\n        )\n    )\n\n\n@action\ndef add(a: int, b: int):\n    \"\"\"Add up two integers.\n\n    Args:\n        a: An addend\n        b: Another addend\n    \"\"\"\n    return a + b\n\n\n@pytest.mark.skip(reason=\"Mock data to be added\")\ndef test_action_chat(camel_model):\n    camel_model.reset(\"You are a helpful assistant.\", [add])\n    message = (\n        \"I had 10 dollars. Miss Polaris gave me 15 dollars. \"\n        \"How many money do I have now.\",\n        0,\n    )\n    output = camel_model.chat([message])\n    assert not output.message\n    assert len(output.action_list) == 1\n    assert output.action_list[0].arguments == {\"a\": 10, \"b\": 15}\n    assert output.action_list[0].name == \"add\"\n    assert camel_model.token_usage > 0\n"
  },
  {
    "path": "test/agents/backend_models/test_claude_model.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport pytest\n\nfrom crab import MessageType, action\nfrom crab.agents.backend_models import BackendModelConfig, create_backend_model\n\n# TODO: Add mock data\n\n\n@pytest.fixture\ndef claude_model_text():\n    return create_backend_model(\n        BackendModelConfig(\n            model_class=\"claude\",\n            model_name=\"claude-3-opus-20240229\",\n            parameters={\"max_tokens\": 3000},\n            history_messages_len=1,\n        )\n    )\n\n\n@action\ndef add(a: int, b: int):\n    \"\"\"Add up two integers.\n\n    Args:\n        a: An addend\n        b: Another addend\n    \"\"\"\n    return a + b\n\n\n@pytest.mark.skip(reason=\"Mock data to be added\")\ndef test_text_chat(claude_model_text):\n    message = (\"Hello!\", MessageType.TEXT)\n    output = claude_model_text.chat(message)\n    assert output.message\n    assert output.action_list is None\n    assert claude_model_text.token_usage > 0\n\n    # Send another message to check accumulated tokens and history length\n    message2 = (\"Give me five!\", MessageType.TEXT)\n    output = claude_model_text.chat(message2)\n    assert claude_model_text.token_usage > 0\n    assert output.message\n    assert len(claude_model_text.chat_history) == 2\n\n    # Send another message to check 
accumulated tokens and chat history\n    output = claude_model_text.chat(message2)\n    assert output.message\n    assert len(claude_model_text.chat_history) == 3\n\n\n@pytest.mark.skip(reason=\"Mock data to be added\")\ndef test_action_chat(claude_model_text):\n    claude_model_text.reset(\"You are a helpful assistant.\", [add])\n    message = (\n        (\n            \"I had 10 dollars. Miss Polaris gave me 15 dollars.\"\n            \" How many money do I have now.\"\n        ),\n        0,\n    )\n    output = claude_model_text.chat(message)\n    assert len(output.action_list) == 1\n    args = output.action_list[0].arguments\n    assert args[\"a\"] + args[\"b\"] == 25\n    assert output.action_list[0].name == \"add\"\n    assert claude_model_text.token_usage > 0\n"
  },
  {
    "path": "test/agents/backend_models/test_gemini_model.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport pytest\n\nfrom crab import MessageType, action\nfrom crab.agents.backend_models import BackendModelConfig, create_backend_model\n\n# TODO: Add mock data\n\n\n@pytest.fixture\ndef gemini_model_text():\n    return create_backend_model(\n        BackendModelConfig(\n            model_class=\"gemini\",\n            model_name=\"gemini-1.5-pro-latest\",\n            parameters={\"max_tokens\": 3000},\n            history_messages_len=1,\n            tool_call_required=False,\n        )\n    )\n\n\n@action\ndef add(a: int, b: int):\n    \"\"\"Add up two integers.\n\n    Args:\n        a: An addend\n        b: Another addend\n    \"\"\"\n    return a + b\n\n\n@pytest.mark.skip(reason=\"Mock data to be added\")\ndef test_text_chat(gemini_model_text):\n    message = (\"Hello!\", MessageType.TEXT)\n    output = gemini_model_text.chat(message)\n    assert output.message\n    assert output.action_list is None\n    # assert gemini_model_text.token_usage > 0\n\n    # Send another message to check accumulated tokens and history length\n    message2 = (\"Give me five!\", MessageType.TEXT)\n    output = gemini_model_text.chat(message2)\n    # assert gemini_model_text.token_usage > 0\n    assert output.message\n    assert len(gemini_model_text.chat_history) == 
2\n\n    # Send another message to check accumulated tokens and chat history\n    output = gemini_model_text.chat(message2)\n    assert output.message\n    assert len(gemini_model_text.chat_history) == 3\n\n\n@pytest.mark.skip(reason=\"Mock data to be added\")\ndef test_action_chat(gemini_model_text):\n    gemini_model_text.reset(\"You are a helpful assistant.\", [add])\n    message = (\n        (\n            \"I had 10 dollars. Miss Polaris gave me 15 dollars. \"\n            \"How many money do I have now.\"\n        ),\n        0,\n    )\n    output = gemini_model_text.chat(message)\n    assert output.message is None\n    assert len(output.action_list) == 1\n    assert output.action_list[0].arguments == {\"a\": 10, \"b\": 15}\n    assert output.action_list[0].name == \"add\"\n"
  },
  {
    "path": "test/agents/backend_models/test_openai_model.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport os\nfrom unittest.mock import MagicMock, patch\n\nimport pytest\nfrom openai.types.chat.chat_completion_message_tool_call import Function\n\nfrom crab import action\nfrom crab.agents.backend_models import BackendModelConfig, create_backend_model\nfrom crab.agents.backend_models.openai_model import MessageType\n\n# Mock data for the OpenAI API response\nopenai_mock_response = MagicMock(\n    choices=[\n        MagicMock(\n            finish_reason=\"stop\",\n            index=0,\n            logprobs=None,\n            message=MagicMock(\n                content=\"Hi there! How can I assist you today?\",\n                role=\"assistant\",\n                function_call=None,\n                tool_calls=None,\n            ),\n        )\n    ],\n    model=\"gpt-4o-2024-05-13\",\n    object=\"chat.completion\",\n    usage=MagicMock(completion_tokens=10, prompt_tokens=19, total_tokens=29),\n)\n\nopenai_mock_response2 = MagicMock(\n    choices=[\n        MagicMock(\n            finish_reason=\"stop\",\n            index=0,\n            logprobs=None,\n            message=MagicMock(\n                content=\"Sure thing! 
✋ How can I help you today?\",\n                role=\"assistant\",\n                function_call=None,\n                tool_calls=None,\n            ),\n        )\n    ],\n    model=\"gpt-4o-2024-05-13\",\n    object=\"chat.completion\",\n    usage=MagicMock(completion_tokens=12, prompt_tokens=41, total_tokens=53),\n)\n\nopenai_mock_response3 = MagicMock(\n    choices=[\n        MagicMock(\n            finish_reason=\"stop\",\n            index=0,\n            logprobs=None,\n            message=MagicMock(\n                content=None,\n                role=\"assistant\",\n                function_call=None,\n                tool_calls=[\n                    MagicMock(\n                        id=\"call_ceE9IX1uYeRqGShYYlHYrCCF\",\n                        function=Function(arguments='{\"a\":10,\"b\":15}', name=\"add\"),\n                        type=\"function\",\n                    )\n                ],\n            ),\n        )\n    ],\n    model=\"gpt-4o-2024-05-13\",\n    object=\"chat.completion\",\n    usage=MagicMock(completion_tokens=15, prompt_tokens=93, total_tokens=108),\n)\n\n\n@pytest.fixture\ndef openai_model_text():\n    os.environ[\"OPENAI_API_KEY\"] = \"MOCK\"\n    return create_backend_model(\n        BackendModelConfig(\n            model_class=\"openai\",\n            model_name=\"gpt-4o\",\n            parameters={\"max_tokens\": 3000},\n            history_messages_len=1,\n            tool_call_required=False,\n        )\n    )\n\n\n@action\ndef add(a: int, b: int):\n    \"\"\"Add up two integers.\n\n    Args:\n        a: An addend\n        b: Another addend\n    \"\"\"\n    return a + b\n\n\n@patch(\n    \"openai.resources.chat.completions.Completions.create\",\n    return_value=openai_mock_response,\n)\ndef test_text_chat(mock_create, openai_model_text):\n    message = (\"Hello!\", MessageType.TEXT)\n    output = openai_model_text.chat(message)\n    assert len(mock_create.call_args.kwargs[\"messages\"]) == 2\n    assert output.message 
== \"Hi there! How can I assist you today?\"\n    assert output.action_list is None\n    assert openai_model_text.token_usage == 29\n\n    # Send another message to check accumulated tokens and history length\n    message2 = (\"Give me five!\", MessageType.TEXT)\n    mock_create.return_value = openai_mock_response2\n    output = openai_model_text.chat(message2)\n    assert len(mock_create.call_args.kwargs[\"messages\"]) == 4\n    assert openai_model_text.token_usage == 29 + 53\n    assert output.message == \"Sure thing! ✋ How can I help you today?\"\n    assert len(openai_model_text.chat_history) == 2\n\n    # Send another message to check accumulated tokens and chat history\n    output = openai_model_text.chat(message2)\n    assert len(mock_create.call_args.kwargs[\"messages\"]) == 4\n    assert openai_model_text.token_usage == 29 + 53 + 53\n    assert output.message == \"Sure thing! ✋ How can I help you today?\"\n    assert len(openai_model_text.chat_history) == 3\n\n\n@patch(\n    \"openai.resources.chat.completions.Completions.create\",\n    return_value=openai_mock_response3,\n)\ndef test_action_chat(mock_create, openai_model_text):\n    openai_model_text.reset(\"You are a helpful assistant.\", [add])\n    message = (\n        (\n            \"I had 10 dollars. Miss Polaris gave me 15 dollars. \"\n            \"How much money do I have now?\"\n        ),\n        0,\n    )\n    output = openai_model_text.chat(message)\n    assert output.message is None\n    assert len(output.action_list) == 1\n    assert output.action_list[0].arguments == {\"a\": 10, \"b\": 15}\n    assert output.action_list[0].name == \"add\"\n    assert openai_model_text.token_usage == 108\n"
  },
  {
    "path": "test/agents/policies/test_multi_agent_by_func.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport pytest\n\nfrom crab import create_benchmark\nfrom crab.agents.backend_models import BackendModelConfig\nfrom crab.agents.policies.multi_agent_by_func import MultiAgentByFuncPolicy\nfrom crab.benchmarks.template import multienv_template_benchmark_config\n\n\n@pytest.fixture\ndef policy_fixture():\n    model = BackendModelConfig(\n        model_class=\"openai\",\n        model_name=\"gpt-4o\",\n        parameters={\"max_tokens\": 3000},\n        history_messages_len=1,\n    )\n    benchmark_config = multienv_template_benchmark_config\n    benchmark = create_benchmark(benchmark_config)\n    task, action_spaces = benchmark.start_task(\"0\")\n    policy = MultiAgentByFuncPolicy(\n        main_agent_model_backend=model,\n        tool_agent_model_backend=model,\n    )\n    policy.reset(\n        task_description=task.description,\n        action_spaces=action_spaces,\n        env_descriptions=benchmark.get_env_descriptions(),\n    )\n    return policy, benchmark\n\n\n@pytest.mark.skip(reason=\"Mock data to be added\")\ndef test_policy(policy_fixture):\n    policy, benchmark = policy_fixture\n    observations = benchmark.observe()\n    agent_observation = {}\n    for env in observations:\n        if env == \"root\":\n            continue\n        
agent_observation[env] = [\n            (\n                f'The current state of \"{env}\" is '\n                + str(observations[env][\"current_state\"])\n                + \". \",\n                0,\n            )\n        ]\n    action_list = policy.chat(agent_observation)\n    assert action_list\n"
  },
  {
    "path": "test/agents/policies/test_mutli_agent_by_env.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport pytest\n\nfrom crab import create_benchmark\nfrom crab.agents.backend_models import BackendModelConfig\nfrom crab.agents.policies.multi_agent_by_env import MultiAgentByEnvPolicy\nfrom crab.benchmarks.template import multienv_template_benchmark_config\n\n\n@pytest.fixture\ndef policy_fixture():\n    model = BackendModelConfig(\n        model_class=\"openai\",\n        model_name=\"gpt-4o\",\n        parameters={\"max_tokens\": 3000},\n        history_messages_len=1,\n    )\n    benchmark_config = multienv_template_benchmark_config\n    benchmark = create_benchmark(benchmark_config)\n    task, action_spaces = benchmark.start_task(\"0\")\n    policy = MultiAgentByEnvPolicy(\n        main_agent_model_backend=model,\n        env_agent_model_backend=model,\n    )\n    policy.reset(\n        task_description=task.description,\n        action_spaces=action_spaces,\n        env_descriptions=benchmark.get_env_descriptions(),\n    )\n    return policy, benchmark\n\n\n@pytest.mark.skip(reason=\"Mock data to be added\")\ndef test_policy(policy_fixture):\n    policy, benchmark = policy_fixture\n    observations = benchmark.observe()\n    agent_observation = {}\n    for env in observations:\n        if env == \"root\":\n            continue\n        
agent_observation[env] = [\n            (\n                f'The current state of \"{env}\" is '\n                + str(observations[env][\"current_state\"])\n                + \". \",\n                0,\n            )\n        ]\n    action_list = policy.chat(agent_observation)\n    assert action_list\n"
  },
  {
    "path": "test/agents/policies/test_single_agent.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport os\nfrom unittest.mock import MagicMock, patch\n\nimport pytest\nfrom openai.types.chat.chat_completion import (\n    ChatCompletionMessage,\n    Choice,\n    CompletionUsage,\n)\nfrom openai.types.chat.chat_completion_message_tool_call import (\n    ChatCompletionMessageToolCall,\n    Function,\n)\n\nfrom crab import create_benchmark\nfrom crab.agents.backend_models import BackendModelConfig\nfrom crab.agents.policies.single_agent import SingleAgentPolicy\nfrom crab.benchmarks.template import multienv_template_benchmark_config\n\nopenai_mock_response = MagicMock(\n    choices=[\n        Choice(\n            finish_reason=\"stop\",\n            index=0,\n            logprobs=None,\n            message=ChatCompletionMessage(\n                content=None,\n                role=\"assistant\",\n                function_call=None,\n                tool_calls=[\n                    ChatCompletionMessageToolCall(\n                        id=\"call_3YIJZhrC5smSjAJKOeFcQxRf\",\n                        function=Function(\n                            arguments='{\"value\": true}', name=\"set_state__in__testenv0\"\n                        ),\n                        type=\"function\",\n                    ),\n                    
ChatCompletionMessageToolCall(\n                        id=\"call_mA9Z9HQfmYn2TbzeGsEVcCr7\",\n                        function=Function(\n                            arguments='{\"value\": true}', name=\"set_state__in__testenv1\"\n                        ),\n                        type=\"function\",\n                    ),\n                    ChatCompletionMessageToolCall(\n                        id=\"call_GgxbBTd6afj2iDyOewaNattB\",\n                        function=Function(\n                            arguments='{\"value\": true}', name=\"set_state__in__testenv2\"\n                        ),\n                        type=\"function\",\n                    ),\n                ],\n            ),\n        )\n    ],\n    model=\"gpt-4o-2024-05-13\",\n    object=\"chat.completion\",\n    usage=CompletionUsage(completion_tokens=74, prompt_tokens=648, total_tokens=722),\n)\n\n\n
@pytest.fixture\ndef policy_fixture():\n    os.environ[\"OPENAI_API_KEY\"] = \"MOCK\"\n    model = BackendModelConfig(\n        model_class=\"openai\",\n        model_name=\"gpt-4o\",\n        parameters={\"max_tokens\": 3000},\n        history_messages_len=1,\n    )\n    benchmark_config = multienv_template_benchmark_config\n    benchmark = create_benchmark(benchmark_config)\n    task, action_spaces = benchmark.start_task(\"0\")\n    policy = SingleAgentPolicy(model_backend=model)\n    policy.reset(\n        task_description=task.description,\n        action_spaces=action_spaces,\n        env_descriptions=benchmark.get_env_descriptions(),\n    )\n    return policy, benchmark\n\n\n
@patch(\n    \"openai.resources.chat.completions.Completions.create\",\n    return_value=openai_mock_response,\n)\ndef test_policy(mock_create: MagicMock, policy_fixture):\n    policy, benchmark = policy_fixture\n    observation = benchmark.observe()\n    for env in observation:\n        if env == \"root\":\n            continue\n        observation[env] = [\n            (\n                f'The current state of \"{env}\" is '\n                + str(observation[env][\"current_state\"])\n                + \". \",\n                0,\n            )\n        ]\n    action_list = policy.chat(observation)\n    mock_create.assert_called_once()\n    assert action_list\n"
  },
  {
    "path": "test/core/test_action.py",
"content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nfrom crab.core import Action, action\nfrom crab.core.models.action import _check_no_param\n\n\n
@action\ndef dummy_function(a: int, b: str = \"default\") -> int:\n    \"\"\"\n    This is a test function.\n\n    Args:\n        a (int): The first parameter.\n        b (str, optional): The second parameter. Defaults to \"default\".\n\n    Returns:\n        int: The result.\n    \"\"\"\n    return a + 1\n\n\n
@action\ndef dummy_env_action(a: int, env: int) -> int:\n    \"\"\"\n    This is a test function with a kept parameter.\n\n    Args:\n        a (int): The first parameter.\n        env (int): The current environment. It should not appear in the parameters.\n\n    Returns:\n        int: The result.\n    \"\"\"\n    return a + env\n\n\n
def test_action_to_openai_json_schema():\n    result = dummy_function.to_openai_json_schema()\n    assert result[\"name\"]\n    assert result[\"description\"]\n    assert result[\"parameters\"]\n\n    parameters = result[\"parameters\"]\n    assert \"properties\" in parameters\n    assert \"a\" in parameters[\"properties\"]\n    assert parameters[\"properties\"][\"a\"][\"type\"] == \"integer\"\n    assert \"b\" in parameters[\"properties\"]\n    assert parameters[\"properties\"][\"b\"][\"type\"] == \"string\"\n    assert parameters[\"properties\"][\"b\"][\"default\"] == \"default\"\n    assert \"required\" in parameters\n    assert \"a\" in parameters[\"required\"]\n\n\n
def test_from_function():\n    action_instance: Action = dummy_function\n    assert action_instance.description == \"This is a test function.\"\n    assert action_instance.name == \"dummy_function\"\n    assert \"a\" in action_instance.parameters.model_fields\n    assert \"b\" in action_instance.parameters.model_fields\n\n\n
def test_chaining():\n    dummy_x2 = dummy_function >> dummy_function\n    assert dummy_x2.entry(1) == 3\n\n\n@action\ndef add_a_to_b(a: int, b: int = 1) -> int:\n    return a + b\n\n\n@action\ndef multiply_a_to_b(a: int, b: int = 1) -> int:\n    return a * b\n\n\n
def test_closed_action():\n    action = add_a_to_b(5)\n    assert action.entry() == 6\n    assert _check_no_param(action)\n\n\ndef test_kwargs_action():\n    action = add_a_to_b(b=6)\n    assert action.entry(1) == 7\n\n\n
def test_chain_various_actions():\n    action = add_a_to_b(b=10) >> multiply_a_to_b(b=10) >> add_a_to_b()\n    assert action.entry(0) == 101\n    action = add_a_to_b(a=1, b=10) >> multiply_a_to_b(b=10) >> add_a_to_b()\n    assert action.entry() == 111\n    action = add_a_to_b(1, b=10) >> multiply_a_to_b(b=10) >> add_a_to_b()\n    assert action.entry() == 111\n\n\ndef test_kept_param():\n    action = dummy_env_action.set_kept_param(env=10)\n    assert action.run(a=10) == 20\n"
  },
  {
    "path": "test/core/test_benchmark.py",
"content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport pytest\nfrom fastapi.testclient import TestClient\n\nfrom crab import Benchmark, action, create_benchmark\nfrom crab.benchmarks.template import (\n    multienv_template_benchmark_config,\n    template_benchmark_config,\n    template_environment_config,\n)\nfrom crab.server.main import init\n\n\n
@pytest.fixture\ndef benchmark(request):\n    if request.param == \"multienv\":\n        yield create_benchmark(multienv_template_benchmark_config)\n    elif request.param == \"multienv-remote\":\n        # TODO: fix multienv remote mode, which currently uses the same env across different remote envs\n        app0 = init(environment_config=template_environment_config)\n        client0 = TestClient(app0)\n        app1 = init(environment_config=template_environment_config)\n        client1 = TestClient(app1)\n        app2 = init(environment_config=template_environment_config)\n        client2 = TestClient(app2)\n        proxy_config = multienv_template_benchmark_config.model_copy()\n        for env in proxy_config.environments:\n            env.remote_url = \"http://127.0.0.1:8000\"\n        benchmark = create_benchmark(proxy_config)\n        benchmark.environment_map[\"testenv0\"]._client = client0\n        benchmark.environment_map[\"testenv1\"]._client = client1\n        benchmark.environment_map[\"testenv2\"]._client = client2\n        yield benchmark\n    elif request.param == \"singleenv\":\n        yield create_benchmark(template_benchmark_config)\n\n\n
@pytest.mark.parametrize(\"benchmark\", [\"multienv\", \"multienv-remote\"], indirect=True)\ndef test_multi_env_benchmark_process(benchmark: Benchmark):\n    assert benchmark.multienv\n    task, actions = benchmark.start_task(task_id=\"0\")\n    assert benchmark.current_task == task\n    assert len(actions) == 4\n    assert len(actions[\"root\"]) == 1\n    assert actions[\"root\"][0].name == \"_submit\"\n\n    result = benchmark.step(\n        action=\"set_state\", parameters={\"value\": True}, env_name=\"testenv0\"\n    )\n    assert result.evaluation_results[\"completeness\"] == 0.25\n\n    result = benchmark.step(\n        action=\"set_state\", parameters={\"value\": True}, env_name=\"testenv1\"\n    )\n    assert result.evaluation_results[\"completeness\"] == 0.5\n\n    result = benchmark.step(\n        action=\"set_state\", parameters={\"value\": True}, env_name=\"testenv2\"\n    )\n    assert result.evaluation_results[\"completeness\"] == 0.75\n\n    result = benchmark.step(\n        action=\"_submit\", parameters={\"content\": True}, env_name=\"root\"\n    )\n    assert result.terminated\n    assert result.evaluation_results[\"completeness\"] == 1.0\n\n\n
@action\ndef to_str(input: bool) -> str:\n    return f\"The current state is {input}\"\n\n\n@pytest.mark.parametrize(\"benchmark\", [\"singleenv\"], indirect=True)\ndef test_prompting_tool(benchmark: Benchmark):\n    benchmark.prompting_tools = {\"template_env\": {\"current_state\": to_str}}\n    benchmark.start_task(\"0\")\n    observe, prompt = benchmark.observe_with_prompt()\n    assert observe[\"template_env\"][\"current_state\"] is False\n    assert prompt[\"template_env\"][\"current_state\"] == \"The current state is False\"\n    benchmark.close_task()\n"
  },
  {
    "path": "test/core/test_evaluator.py",
"content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\nimport networkx as nx\nimport pytest\n\nfrom crab.core import Environment, Evaluator, GraphEvaluator, evaluator\n\na = None\n\n\ndef set_a(value: int) -> None:\n    global a\n    a = value\n\n\n
@evaluator\ndef dummy_evaluator1() -> bool:\n    \"\"\"\n    This is a test evaluator. It reads the module-level global a.\n\n    Returns:\n        bool: True if a is greater than 0.\n    \"\"\"\n    return a > 0\n\n\n
@evaluator\ndef dummy_evaluator2() -> bool:\n    \"\"\"\n    This is a test evaluator. It reads the module-level global a.\n\n    Returns:\n        bool: True if a is less than 2.\n    \"\"\"\n    return a < 2\n\n\n
@evaluator\ndef dummy_evaluator3() -> bool:\n    \"\"\"\n    This is a test evaluator. It reads the module-level global a.\n\n    Returns:\n        bool: True if a is greater than 100.\n    \"\"\"\n    return a > 100\n\n\n
@evaluator\ndef no_param_evaluator() -> bool:\n    return True\n\n\n@pytest.fixture\ndef root_env() -> Environment:\n    return Environment(\n        name=\"root\",\n        action_space=[],\n        observation_space=[],\n        description=\"The crab root server\",\n    )\n\n\n
def test_evaluator_run():\n    assert isinstance(dummy_evaluator1, Evaluator)\n    set_a(3)\n    assert dummy_evaluator1.entry()\n    set_a(-1)\n    assert not dummy_evaluator1.entry()\n\n\n
def test_evaluator_and():\n    set_a(1)\n    assert (dummy_evaluator1 & dummy_evaluator2).entry()\n    set_a(-1)\n    assert not (dummy_evaluator1 & dummy_evaluator2).entry()\n    set_a(3)\n    assert not (dummy_evaluator1 & dummy_evaluator2).entry()\n\n\n
def test_evaluator_or():\n    set_a(1)\n    assert (dummy_evaluator1 | dummy_evaluator2).entry()\n    set_a(-1)\n    assert (dummy_evaluator1 | dummy_evaluator2).entry()\n    set_a(3)\n    assert (dummy_evaluator1 | dummy_evaluator2).entry()\n\n\n
def test_evaluator_not():\n    set_a(3)\n    assert not (~dummy_evaluator1).entry()\n    set_a(-1)\n    assert (~dummy_evaluator1).entry()\n\n\n
def test_chain_evaluator(root_env):\n    graph_evaluator = GraphEvaluator(\n        nx.path_graph(\n            [dummy_evaluator1, dummy_evaluator2, no_param_evaluator],\n            create_using=nx.DiGraph,\n        )\n    )\n    graph_evaluator.reset()\n    assert graph_evaluator.count == 0\n    assert graph_evaluator.G.nodes[dummy_evaluator1][\"remaining_predecessors\"] == 0\n    assert graph_evaluator.G.nodes[dummy_evaluator2][\"remaining_predecessors\"] == 1\n    assert graph_evaluator.G.nodes[no_param_evaluator][\"remaining_predecessors\"] == 1\n\n    set_a(3)\n    graph_evaluator.step({\"root\": root_env})\n    assert graph_evaluator.count == 1\n    assert graph_evaluator.G.nodes[dummy_evaluator1][\"passing_count\"] == 0\n    assert graph_evaluator.G.nodes[dummy_evaluator2][\"remaining_predecessors\"] == 0\n\n    set_a(3)\n    graph_evaluator.step({\"root\": root_env})\n    assert graph_evaluator.count == 2\n    assert graph_evaluator.G.nodes[dummy_evaluator2][\"remaining_predecessors\"] == 0\n    assert graph_evaluator.G.nodes[dummy_evaluator2][\"passing_count\"] is None\n\n    set_a(-1)\n    graph_evaluator.step({\"root\": root_env})\n    assert graph_evaluator.count == 3\n    assert graph_evaluator.G.nodes[dummy_evaluator2][\"passing_count\"] == 2\n    assert graph_evaluator.G.nodes[no_param_evaluator][\"remaining_predecessors\"] == 0\n"
  },
  {
    "path": "test/core/test_utils.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n\n\nimport os\n\nfrom crab.utils import decrypt_message, encrypt_message\n\n\ndef test_encrypt_decrypt():\n    message = \"Hello, World!\"\n    key = os.urandom(32)\n    encrypted_message = encrypt_message(message, key)\n    decrypted_message = decrypt_message(encrypted_message, key)\n    assert decrypted_message == message\n"
  },
  {
    "path": "test/server/test_api.py",
    "content": "# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n# Licensed under the Apache License, Version 2.0 (the “License”);\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an “AS IS” BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n# =========== Copyright 2024 @ CAMEL-AI.org. All Rights Reserved. ===========\n\nimport pytest\nfrom fastapi.testclient import TestClient\n\nfrom crab import create_environment\nfrom crab.environments.template import (\n    current_state,\n    set_state,\n    template_environment_config,\n)\nfrom crab.server.main import init\n\n\n@pytest.fixture\ndef mock_env():\n    mock_app = init(template_environment_config)\n    mock_cli = TestClient(mock_app)\n    mock_env = create_environment(template_environment_config)\n    mock_env._client = mock_cli\n    return mock_env\n\n\ndef test_raw_action_unencrypted(mock_env):\n    assert mock_env._action_endpoint(set_state, {\"value\": True}) is None\n    assert mock_env._action_endpoint(current_state, {}) is True\n    assert mock_env._action_endpoint(set_state(True), {}) is None\n    assert mock_env._action_endpoint(current_state >> set_state, {}) is None\n    assert mock_env._action_endpoint(set_state(True) + current_state, {}) is True\n\n\ndef test_raw_action_encrypted(mock_env, monkeypatch):\n    monkeypatch.setenv(\"ENCRYPTION_KEY\", \"the-cake-is-a-lie\")\n    assert mock_env._action_endpoint(set_state, {\"value\": True}) is None\n    assert mock_env._action_endpoint(current_state, {}) is True\n    assert mock_env._action_endpoint(set_state(True), {}) is None\n    assert 
mock_env._action_endpoint(current_state >> set_state, {}) is None\n    assert mock_env._action_endpoint(set_state(True) + current_state, {}) is True\n"
  }
]