[
  {
    "path": ".github/ISSUE_TEMPLATE/bug.md",
    "content": "---\nname: Bug Report\nabout: Submit a bug report\ntitle: \"[Bug Report] Bug title\"\n\n---\n\nIf you are submitting a bug report, please fill in the following details and use the tag [bug].\n\n**Describe the bug**\nA clear and concise description of what the bug is.\n\n**Code example**\nPlease try to provide a minimal example to reproduce the bug. Error messages and stack traces are also helpful.\n\n**System Info**\nDescribe the characteristic of your environment:\n * Describe how Gym was installed (pip, docker, source, ...)\n * What OS/version of Linux you're using. Note that while we will accept PRs to improve Window's support, we do not officially support it.\n * Python version\n\n**Additional context**\nAdd any other context about the problem here.\n\n### Checklist\n\n- [ ] I have checked that there is no similar [issue](https://github.com/openai/gym/issues) in the repo (**required**)\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/proposal.md",
    "content": "---\nname: Proposal\nabout: Propose changes that are not fixes bugs\ntitle: \"[Proposal] Proposal title\"\n---\n\n\n\n### Proposal \n\nA clear and concise description of the proposal.\n\n### Motivation\n\nPlease outline the motivation for the proposal.\nIs your feature request related to a problem? e.g.,\"I'm always frustrated when [...]\".\nIf this is related to another GitHub issue, please link here too.\n\n### Pitch\n\nA clear and concise description of what you want to happen.\n\n### Alternatives\n\nA clear and concise description of any alternative solutions or features you've considered, if any.\n\n### Additional context\n\nAdd any other context or screenshots about the feature request here.\n\n### Checklist\n\n- [ ] I have checked that there is no similar [issue](https://github.com/openai/gym/issues) in the repo (**required**)\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/question.md",
    "content": "---\nname: Question\nabout: Ask a question\ntitle: \"[Question] Question title\"\n---\n\n\n### Question\n\nIf you're a beginner and have basic questions, please ask on [r/reinforcementlearning](https://www.reddit.com/r/reinforcementlearning/) or in the [RL Discord](https://discord.com/invite/xhfNqQv) (if you're new please use the beginners channel). Basic questions that are not bugs or feature requests will be closed without reply, because GitHub issues are not an appropriate venue for these.\n\nAdvanced/nontrivial questions, especially in areas where documentation is lacking, are very much welcome.\n"
  },
  {
    "path": ".github/PULL_REQUEST_TEMPLATE.md",
    "content": "# Description\n\nPlease include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.\n\nFixes # (issue)\n\n## Type of change\n\nPlease delete options that are not relevant.\n\n- [ ] Bug fix (non-breaking change which fixes an issue)\n- [ ] New feature (non-breaking change which adds functionality)\n- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)\n- [ ] This change requires a documentation update\n\n### Screenshots\nPlease attach before and after screenshots of the change if applicable.\n\n<!--\nExample:\n\n| Before | After |\n| ------ | ----- |\n| _gif/png before_ | _gif/png after_ |\n\n\nTo upload images to a PR -- simply drag and drop an image while in edit mode and it should upload the image directly. You can then paste that source into the above before/after sections.\n-->\n\n# Checklist:\n\n- [ ] I have run the [`pre-commit` checks](https://pre-commit.com/) with `pre-commit run --all-files` (see `CONTRIBUTING.md` instructions to set it up)\n- [ ] I have commented my code, particularly in hard-to-understand areas\n- [ ] I have made corresponding changes to the documentation\n- [ ] My changes generate no new warnings\n- [ ] I have added tests that prove my fix is effective or that my feature works\n- [ ] New and existing unit tests pass locally with my changes\n\n<!--\nAs you go through the checklist above, you can mark something as done by putting an x character in it\n\nFor example,\n- [x] I have done this task\n- [ ] I have not done this task\n-->\n"
  },
  {
    "path": ".github/stale.yml",
    "content": "# Configuration for probot-stale - https://github.com/probot/stale\n\n# Number of days of inactivity before an Issue or Pull Request becomes stale\ndaysUntilStale: 60\n\n# Number of days of inactivity before an Issue or Pull Request with the stale label is closed.\n# Set to false to disable. If disabled, issues still need to be closed manually, but will remain marked as stale.\ndaysUntilClose: 14\n\n# Only issues or pull requests with all of these labels are check if stale. Defaults to `[]` (disabled)\nonlyLabels:\n  - more-information-needed\n\n# Issues or Pull Requests with these labels will never be considered stale. Set to `[]` to disable\nexemptLabels:\n  - pinned\n  - security\n  - \"[Status] Maybe Later\"\n\n# Set to true to ignore issues in a project (defaults to false)\nexemptProjects: true\n\n# Set to true to ignore issues in a milestone (defaults to false)\nexemptMilestones: true\n\n# Set to true to ignore issues with an assignee (defaults to false)\nexemptAssignees: true\n\n# Label to use when marking as stale\nstaleLabel: stale\n\n# Comment to post when marking as stale. Set to `false` to disable\nmarkComment: >\n  This issue has been automatically marked as stale because it has not had\n  recent activity. It will be closed if no further activity occurs. Thank you\n  for your contributions.\n\n# Comment to post when removing the stale label.\n# unmarkComment: >\n#   Your comment here.\n\n# Comment to post when closing a stale Issue or Pull Request.\n# closeComment: >\n#   Your comment here.\n\n# Limit the number of actions per hour, from 1-30. Default is 30\nlimitPerRun: 30\n\n# Limit to only `issues` or `pulls`\nonly: issues\n\n# Optionally, specify configuration settings that are specific to just 'issues' or 'pulls':\n# pulls:\n#   daysUntilStale: 30\n#   markComment: >\n#     This pull request has been automatically marked as stale because it has not had\n#     recent activity. It will be closed if no further activity occurs. 
Thank you\n#     for your contributions.\n\n# issues:\n#   exemptLabels:\n#     - confirmed"
  },
  {
    "path": ".github/workflows/build.yml",
    "content": "name: build\non: [pull_request, push]\n\npermissions:\n  contents: read # to fetch code (actions/checkout)\n\njobs:\n  build:\n    runs-on: ubuntu-latest\n    strategy:\n      matrix:\n        python-version: ['3.6', '3.7', '3.8', '3.9', '3.10']\n    steps:\n      - uses: actions/checkout@v2\n      - run: |\n           docker build -f py.Dockerfile \\\n             --build-arg PYTHON_VERSION=${{ matrix.python-version }} \\\n             --tag gym-docker .\n      - name: Run tests\n        run: docker run gym-docker pytest\n"
  },
  {
    "path": ".github/workflows/pre-commit.yml",
    "content": "# https://pre-commit.com\n# This GitHub Action assumes that the repo contains a valid .pre-commit-config.yaml file.\nname: pre-commit\non:\n  pull_request:\n  push:\n    branches: [master]\npermissions:\n  contents: read # to fetch code (actions/checkout)\njobs:\n  pre-commit:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v2\n      - uses: actions/setup-python@v2\n      - run: pip install pre-commit\n      - run: pre-commit --version\n      - run: pre-commit install\n      - run: pre-commit run --all-files\n"
  },
  {
    "path": ".gitignore",
    "content": "*.swp\n*.pyc\n*.py~\n.DS_Store\n.cache\n.pytest_cache/\n\n# Setuptools distribution and build folders.\n/dist/\n/build\n\n# Virtualenv\n/env\n\n# Python egg metadata, regenerated from source files by setuptools.\n/*.egg-info\n\n*.sublime-project\n*.sublime-workspace\n\nlogs/\n\n.ipynb_checkpoints\nghostdriver.log\n\njunk\nMUJOCO_LOG.txt\n\nrllab_mujoco\n\ntutorial/*.html\n\n# IDE files\n.eggs\n.tox\n\n# PyCharm project files\n.idea\nvizdoom.ini\n"
  },
  {
    "path": ".pre-commit-config.yaml",
    "content": "---\nrepos:\n  - repo: https://github.com/python/black\n    rev: 22.3.0\n    hooks:\n      - id: black\n  - repo: https://github.com/codespell-project/codespell\n    rev: v2.1.0\n    hooks:\n      - id: codespell\n        args:\n          - --ignore-words-list=nd,reacher,thist,ths, ure, referenc\n  - repo: https://gitlab.com/PyCQA/flake8\n    rev: 4.0.1\n    hooks:\n      - id: flake8\n        args:\n          - '--per-file-ignores=*/__init__.py:F401 gym/envs/registration.py:E704'\n          - --ignore=E203,W503,E741\n          - --max-complexity=30\n          - --max-line-length=456\n          - --show-source\n          - --statistics\n  - repo: https://github.com/PyCQA/isort\n    rev: 5.10.1\n    hooks:\n      - id: isort\n        args: [\"--profile\", \"black\"]\n  - repo: https://github.com/pycqa/pydocstyle\n    rev: 6.1.1  # pick a git hash / tag to point to\n    hooks:\n      - id: pydocstyle\n        exclude: ^(gym/version.py)|(gym/envs/)|(tests/)\n        args:\n          - --source\n          - --explain\n          - --convention=google\n        additional_dependencies: [\"toml\"]\n  - repo: https://github.com/asottile/pyupgrade\n    rev: v2.32.0\n    hooks:\n      - id: pyupgrade\n        # TODO: remove `--keep-runtime-typing` option\n        args: [\"--py36-plus\", \"--keep-runtime-typing\"]\n  - repo: local\n    hooks:\n      - id: pyright\n        name: pyright\n        entry: pyright\n        language: node\n        pass_filenames: false\n        types: [python]\n        additional_dependencies: [\"pyright\"]\n        args:\n          - --project=pyproject.toml\n"
  },
  {
    "path": "CODE_OF_CONDUCT.rst",
    "content": "OpenAI Gym is dedicated to providing a harassment-free experience for\neveryone, regardless of gender, gender identity and expression, sexual\norientation, disability, physical appearance, body size, age, race, or\nreligion. We do not tolerate harassment of participants in any form.\n\nThis code of conduct applies to all OpenAI Gym spaces (including Gist\ncomments) both online and off. Anyone who violates this code of\nconduct may be sanctioned or expelled from these spaces at the\ndiscretion of the OpenAI team.\n\nWe may add additional rules over time, which will be made clearly\navailable to participants. Participants are responsible for knowing\nand abiding by these rules.\n"
  },
  {
    "path": "CONTRIBUTING.md",
    "content": "# Gym Contribution Guidelines\n\nAt this time we are currently accepting the current forms of contributions:\n\n- Bug reports (keep in mind that changing environment behavior should be minimized as that requires releasing a new version of the environment and makes results hard to compare across versions)\n- Pull requests for bug fixes\n- Documentation improvements\n\nNotably, we are not accepting these forms of contributions:\n\n- New environments\n- New features\n\nThis may change in the future.\nIf you wish to make a Gym environment, follow the instructions in [Creating Environments](https://github.com/openai/gym/blob/master/docs/creating_environments.md).  When your environment works, you can make a PR to add it to the bottom of the [List of Environments](https://github.com/openai/gym/blob/master/docs/third_party_environments.md).\n\n\nEdit July 27, 2021: Please see https://github.com/openai/gym/issues/2259 for new contributing standards\n\n# Development\nThis section contains technical instructions & hints for the contributors.\n\n## Type checking\nThe project uses `pyright` to check types. \nTo type check locally, install `pyright` per official [instructions](https://github.com/microsoft/pyright#command-line). \nIt's configuration lives within `pyproject.toml`. It includes list of included and excluded files currently supporting type checks.\nTo run `pyright` for the project, run the pre-commit process (`pre-commit run --all-files`) or `pyright --project=pyproject.toml`\nAlternatively, pyright is a built-in feature of VSCode that will automatically provide type hinting.\n\n### Adding typing to more modules and packages\nIf you would like to add typing to a module in the project, \nthe list of included, excluded and strict files can be found in pyproject.toml (pyproject.toml -> [tool.pyright]). 
\nTo run `pyright` for the project, run the pre-commit process (`pre-commit run --all-files`) or `pyright`.\n\n## Git hooks\nThe CI will run several checks on the new code pushed to the Gym repository. These checks can also be run locally without waiting for the CI by following the steps below:\n1. [Install `pre-commit`](https://pre-commit.com/#install),\n2. Install the Git hooks by running `pre-commit install`.\n\nOnce those two steps are done, the Git hooks will be run automatically at every new commit. \nThe Git hooks can also be run manually with `pre-commit run --all-files`, and if needed they can be skipped (not recommended) with `git commit --no-verify`. \n**Note:** you may have to run `pre-commit run --all-files` manually a couple of times to make it pass when you commit, as each formatting tool will first format the code and fail the first time but should pass the second time.\n\nAdditionally, for pull requests, the project runs a number of tests for the whole project using [pytest](https://docs.pytest.org/en/latest/getting-started.html#install-pytest).\nThese tests can be run locally with `pytest` in the root folder. \n\n## Docstrings\nPydocstyle has been added to the pre-commit process such that all new functions follow the [google docstring style](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html).\nAll new functions require either a short docstring (a single line explaining the purpose of the function)\nor a multiline docstring that documents each argument and the return type (if there is one) of the function.\nIn addition, new files and classes require top docstrings that should outline the purpose of the file/class.\nFor classes, code block examples can be provided in the top docstring and not the constructor arguments.\n\nTo check that your docstrings are correct, run `pre-commit run --all-files` or `pydocstyle --source --explain --convention=google`.\nFor every docstring that fails, the source and the reason for the failure are provided. "
  },
  {
    "path": "LICENSE.md",
    "content": "The MIT License\n\nCopyright (c) 2016 OpenAI (https://openai.com)\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in\nall copies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN\nTHE SOFTWARE.\n\n# Mujoco models\nThis work is derived from [MuJuCo models](http://www.mujoco.org/forum/index.php?resources/) used under the following license:\n```\nThis file is part of MuJoCo.     \nCopyright 2009-2015 Roboti LLC.\t\nMujoco\t\t:: Advanced physics simulation engine\nSource\t\t: www.roboti.us\nVersion\t\t: 1.31\nReleased \t: 23Apr16\nAuthor\t\t:: Vikash Kumar\nContacts \t: kumar@roboti.us\n```\n"
  },
  {
    "path": "README.md",
    "content": "[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://pre-commit.com/) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n\n## Important Notice\n\n### The team that has been maintaining Gym since 2021 has moved all future development to [Gymnasium](https://github.com/Farama-Foundation/Gymnasium), a drop in replacement for Gym (import gymnasium as gym), and Gym will not be receiving any future updates. Please switch over to Gymnasium as soon as you're able to do so. If you'd like to read more about the story behind this switch, please check out [this blog post](https://farama.org/Announcing-The-Farama-Foundation).\n\n## Gym\n\nGym is an open source Python library for developing and comparing reinforcement learning algorithms by providing a standard API to communicate between learning algorithms and environments, as well as a standard set of environments compliant with that API. Since its release, Gym's API has become the field standard for doing this.\n\nGym documentation website is at [https://www.gymlibrary.dev/](https://www.gymlibrary.dev/), and you can propose fixes and changes to it [here](https://github.com/Farama-Foundation/gym-docs).\n\nGym also has a discord server for development purposes that you can join here: https://discord.gg/nHg2JRN489\n\n## Installation\n\nTo install the base Gym library, use `pip install gym`.\n\nThis does not include dependencies for all families of environments (there's a massive number, and some can be problematic to install on certain systems). You can install these dependencies for one family like `pip install gym[atari]` or use `pip install gym[all]` to install all dependencies.\n\nWe support Python 3.7, 3.8, 3.9 and 3.10 on Linux and macOS. 
We will accept PRs related to Windows, but do not officially support it.\n\n## API\n\nThe Gym API models environments as simple Python `env` classes. Creating environment instances and interacting with them is very simple. Here's an example using the \"CartPole-v1\" environment:\n\n```python\nimport gym\nenv = gym.make(\"CartPole-v1\")\nobservation, info = env.reset(seed=42)\n\nfor _ in range(1000):\n    action = env.action_space.sample()\n    observation, reward, terminated, truncated, info = env.step(action)\n\n    if terminated or truncated:\n        observation, info = env.reset()\nenv.close()\n```\n\n## Notable Related Libraries\n\nPlease note that this is an incomplete list, and just includes libraries that the maintainers most commonly point newcomers to when asked for recommendations.\n\n* [CleanRL](https://github.com/vwxyzjn/cleanrl) is a learning library based on the Gym API. It is designed to cater to newer people in the field and provides very good reference implementations.\n* [Tianshou](https://github.com/thu-ml/tianshou) is a learning library that's geared towards very experienced users and is designed to allow for ease in complex algorithm modifications.\n* [RLlib](https://docs.ray.io/en/latest/rllib/index.html) is a learning library that allows for distributed training and inference and supports an extraordinarily large number of features throughout the reinforcement learning space.\n* [PettingZoo](https://github.com/Farama-Foundation/PettingZoo) is like Gym, but for environments with multiple agents.\n\n## Environment Versioning\n\nGym keeps strict versioning for reproducibility reasons. All environments end in a suffix like \"\\_v0\".  When changes are made to environments that might impact learning results, the number is increased by one to prevent potential confusion.\n\n## MuJoCo Environments\n\nThe latest \"\\_v4\" and future versions of the MuJoCo environments will no longer depend on `mujoco-py`. 
Instead, `mujoco` will be the required dependency for future gym MuJoCo environment versions. Old gym MuJoCo environment versions that depend on `mujoco-py` will still be kept but unmaintained.\nTo install the dependencies for the latest gym MuJoCo environments use `pip install gym[mujoco]`. Dependencies for old MuJoCo environments can still be installed with `pip install gym[mujoco_py]`. \n\n## Citation\n\nA whitepaper from when Gym just came out is available at https://arxiv.org/pdf/1606.01540, and can be cited with the following bibtex entry:\n\n```\n@misc{1606.01540,\n  Author = {Greg Brockman and Vicki Cheung and Ludwig Pettersson and Jonas Schneider and John Schulman and Jie Tang and Wojciech Zaremba},\n  Title = {OpenAI Gym},\n  Year = {2016},\n  Eprint = {arXiv:1606.01540},\n}\n```\n\n## Release Notes\n\nThere used to be release notes for all the new Gym versions here. New release notes are being moved to the [releases page](https://github.com/openai/gym/releases) on GitHub, like most other libraries do. Old notes can be viewed [here](https://github.com/openai/gym/blob/31be35ecd460f670f0c4b653a14c9996b7facc6c/README.rst).\n"
  },
  {
    "path": "bin/docker_entrypoint",
    "content": "#!/bin/bash\n# This script is the entrypoint for our Docker image.\n\nset -ex\n\n# Set up display; otherwise rendering will fail\nXvfb -screen 0 1024x768x24 &\nexport DISPLAY=:0\n\n# Wait for the file to come up\ndisplay=0\nfile=\"/tmp/.X11-unix/X$display\"\nfor i in $(seq 1 10); do\n    if [ -e \"$file\" ]; then\n\tbreak\n    fi\n\n    echo \"Waiting for $file to be created (try $i/10)\"\n    sleep \"$i\"\ndone\nif ! [ -e \"$file\" ]; then\n    echo \"Timing out: $file was not created\"\n    exit 1\nfi\n\nexec \"$@\"\n"
  },
  {
    "path": "gym/__init__.py",
    "content": "\"\"\"Root __init__ of the gym module setting the __all__ of gym modules.\"\"\"\n# isort: skip_file\n\nfrom gym import error\nfrom gym.version import VERSION as __version__\n\nfrom gym.core import (\n    Env,\n    Wrapper,\n    ObservationWrapper,\n    ActionWrapper,\n    RewardWrapper,\n)\nfrom gym.spaces import Space\nfrom gym.envs import make, spec, register\nfrom gym import logger\nfrom gym import vector\nfrom gym import wrappers\nimport os\nimport sys\n\n__all__ = [\"Env\", \"Space\", \"Wrapper\", \"make\", \"spec\", \"register\"]\n\n# Initializing pygame initializes audio connections through SDL. SDL uses alsa by default on all Linux systems\n# SDL connecting to alsa frequently create these giant lists of warnings every time you import an environment using\n#   pygame\n# DSP is far more benign (and should probably be the default in SDL anyways)\n\nif sys.platform.startswith(\"linux\"):\n    os.environ[\"SDL_AUDIODRIVER\"] = \"dsp\"\n\nos.environ[\"PYGAME_HIDE_SUPPORT_PROMPT\"] = \"hide\"\n\ntry:\n    import gym_notices.notices as notices\n\n    # print version warning if necessary\n    notice = notices.notices.get(__version__)\n    if notice:\n        print(notice, file=sys.stderr)\n\nexcept Exception:  # nosec\n    pass\n"
  },
  {
    "path": "gym/core.py",
    "content": "\"\"\"Core API for Environment, Wrapper, ActionWrapper, RewardWrapper and ObservationWrapper.\"\"\"\nimport sys\nfrom typing import (\n    TYPE_CHECKING,\n    Any,\n    Dict,\n    Generic,\n    List,\n    Optional,\n    SupportsFloat,\n    Tuple,\n    TypeVar,\n    Union,\n)\n\nimport numpy as np\n\nfrom gym import spaces\nfrom gym.logger import warn\nfrom gym.utils import seeding\n\nif TYPE_CHECKING:\n    from gym.envs.registration import EnvSpec\n\nif sys.version_info[0:2] == (3, 6):\n    warn(\n        \"Gym minimally supports python 3.6 as the python foundation not longer supports the version, please update your version to 3.7+\"\n    )\n\nObsType = TypeVar(\"ObsType\")\nActType = TypeVar(\"ActType\")\nRenderFrame = TypeVar(\"RenderFrame\")\n\n\nclass Env(Generic[ObsType, ActType]):\n    r\"\"\"The main OpenAI Gym class.\n\n    It encapsulates an environment with arbitrary behind-the-scenes dynamics.\n    An environment can be partially or fully observed.\n\n    The main API methods that users of this class need to know are:\n\n    - :meth:`step` - Takes a step in the environment using an action returning the next observation, reward,\n      if the environment terminated and observation information.\n    - :meth:`reset` - Resets the environment to an initial state, returning the initial observation and observation information.\n    - :meth:`render` - Renders the environment observation with modes depending on the output\n    - :meth:`close` - Closes the environment, important for rendering where pygame is imported\n\n    And set the following attributes:\n\n    - :attr:`action_space` - The Space object corresponding to valid actions\n    - :attr:`observation_space` - The Space object corresponding to valid observations\n    - :attr:`reward_range` - A tuple corresponding to the minimum and maximum possible rewards\n    - :attr:`spec` - An environment spec that contains the information used to initialise the environment from `gym.make`\n    - 
:attr:`metadata` - The metadata of the environment, i.e. render modes\n    - :attr:`np_random` - The random number generator for the environment\n\n    Note: a default reward range set to :math:`(-\\infty,+\\infty)` already exists. Set it if you want a narrower range.\n    \"\"\"\n\n    # Set this in SOME subclasses\n    metadata: Dict[str, Any] = {\"render_modes\": []}\n    # define render_mode if your environment supports rendering\n    render_mode: Optional[str] = None\n    reward_range = (-float(\"inf\"), float(\"inf\"))\n    spec: \"EnvSpec\" = None\n\n    # Set these in ALL subclasses\n    action_space: spaces.Space[ActType]\n    observation_space: spaces.Space[ObsType]\n\n    # Created\n    _np_random: Optional[np.random.Generator] = None\n\n    @property\n    def np_random(self) -> np.random.Generator:\n        \"\"\"Returns the environment's internal :attr:`_np_random` that if not set will initialise with a random seed.\"\"\"\n        if self._np_random is None:\n            self._np_random, seed = seeding.np_random()\n        return self._np_random\n\n    @np_random.setter\n    def np_random(self, value: np.random.Generator):\n        self._np_random = value\n\n    def step(self, action: ActType) -> Tuple[ObsType, float, bool, bool, dict]:\n        \"\"\"Run one timestep of the environment's dynamics.\n\n        When end of episode is reached, you are responsible for calling :meth:`reset` to reset this environment's state.\n        Accepts an action and returns a tuple `(observation, reward, terminated, truncated, info)`.\n\n        Args:\n            action (ActType): an action provided by the agent\n\n        Returns:\n            observation (object): this will be an element of the environment's :attr:`observation_space`.\n                This may, for instance, be a numpy array containing the positions and velocities of certain objects.\n            reward (float): The amount of reward returned as a result of taking the action.\n            
terminated (bool): whether a `terminal state` (as defined under the MDP of the task) is reached.\n                In this case further step() calls could return undefined results.\n            truncated (bool): whether a truncation condition outside the scope of the MDP is satisfied.\n                Typically a timelimit, but could also be used to indicate agent physically going out of bounds.\n                Can be used to end the episode prematurely before a `terminal state` is reached.\n            info (dictionary): `info` contains auxiliary diagnostic information (helpful for debugging, learning, and logging).\n                This might, for instance, contain: metrics that describe the agent's performance state, variables that are\n                hidden from observations, or individual reward terms that are combined to produce the total reward.\n                It also can contain information that distinguishes truncation and termination, however this is deprecated in favour\n                of returning two booleans, and will be removed in a future version.\n\n            (deprecated)\n            done (bool): A boolean value for if the episode has ended, in which case further :meth:`step` calls will return undefined results.\n                A done signal may be emitted for different reasons: Maybe the task underlying the environment was solved successfully,\n                a certain timelimit was exceeded, or the physics simulation has entered an invalid state.\n        \"\"\"\n        raise NotImplementedError\n\n    def reset(\n        self,\n        *,\n        seed: Optional[int] = None,\n        options: Optional[dict] = None,\n    ) -> Tuple[ObsType, dict]:\n        \"\"\"Resets the environment to an initial state and returns the initial observation.\n\n        This method can reset the environment's random number generator(s) if ``seed`` is an integer or\n        if the environment has not yet initialized a random number generator.\n        If 
the environment already has a random number generator and :meth:`reset` is called with ``seed=None``,\n        the RNG should not be reset. Moreover, :meth:`reset` should (in the typical use case) be called with an\n        integer seed right after initialization and then never again.\n\n        Args:\n            seed (optional int): The seed that is used to initialize the environment's PRNG.\n                If the environment does not already have a PRNG and ``seed=None`` (the default option) is passed,\n                a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom).\n                However, if the environment already has a PRNG and ``seed=None`` is passed, the PRNG will *not* be reset.\n                If you pass an integer, the PRNG will be reset even if it already exists.\n                Usually, you want to pass an integer *right after the environment has been initialized and then never again*.\n                Please refer to the minimal example above to see this paradigm in action.\n            options (optional dict): Additional information to specify how the environment is reset (optional,\n                depending on the specific environment)\n\n\n        Returns:\n            observation (object): Observation of the initial state. This will be an element of :attr:`observation_space`\n                (typically a numpy array) and is analogous to the observation returned by :meth:`step`.\n            info (dictionary):  This dictionary contains auxiliary information complementing ``observation``. 
It should be analogous to\n                the ``info`` returned by :meth:`step`.\n        \"\"\"\n        # Initialize the RNG if the seed is manually passed\n        if seed is not None:\n            self._np_random, seed = seeding.np_random(seed)\n\n    def render(self) -> Optional[Union[RenderFrame, List[RenderFrame]]]:\n        \"\"\"Compute the render frames as specified by render_mode attribute during initialization of the environment.\n\n        The set of supported modes varies per environment. (And some\n        third-party environments may not support rendering at all.)\n        By convention, if render_mode is:\n\n        - None (default): no render is computed.\n        - human: render returns None.\n          The environment is continuously rendered in the current display or terminal. Usually for human consumption.\n        - rgb_array: return a single frame representing the current state of the environment.\n          A frame is a numpy.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.\n        - rgb_array_list: return a list of frames representing the states of the environment since the last reset.\n          Each frame is a numpy.ndarray with shape (x, y, 3), as with `rgb_array`.\n        - ansi: return a string (str) or StringIO.StringIO containing a\n          terminal-style text representation for each time step.\n          The text can include newlines and ANSI escape sequences (e.g. for colors).\n\n        Note:\n            Make sure that your class's metadata 'render_modes' key includes\n            the list of supported modes. 
It's recommended to call super()\n            in implementations to use the functionality of this method.\n        \"\"\"\n        raise NotImplementedError\n\n    def close(self):\n        \"\"\"Override close in your subclass to perform any necessary cleanup.\n\n        Environments will automatically :meth:`close()` themselves when\n        garbage collected or when the program exits.\n        \"\"\"\n        pass\n\n    @property\n    def unwrapped(self) -> \"Env\":\n        \"\"\"Returns the base non-wrapped environment.\n\n        Returns:\n            Env: The base non-wrapped gym.Env instance\n        \"\"\"\n        return self\n\n    def __str__(self):\n        \"\"\"Returns a string of the environment with the spec id if specified.\"\"\"\n        if self.spec is None:\n            return f\"<{type(self).__name__} instance>\"\n        else:\n            return f\"<{type(self).__name__}<{self.spec.id}>>\"\n\n    def __enter__(self):\n        \"\"\"Support with-statement for the environment.\"\"\"\n        return self\n\n    def __exit__(self, *args):\n        \"\"\"Support with-statement for the environment.\"\"\"\n        self.close()\n        # propagate exception\n        return False\n\n\nclass Wrapper(Env[ObsType, ActType]):\n    \"\"\"Wraps an environment to allow a modular transformation of the :meth:`step` and :meth:`reset` methods.\n\n    This class is the base class for all wrappers. 
The subclass could override\n    some methods to change the behavior of the original environment without touching the\n    original code.\n\n    Note:\n        Don't forget to call ``super().__init__(env)`` if the subclass overrides :meth:`__init__`.\n    \"\"\"\n\n    def __init__(self, env: Env):\n        \"\"\"Wraps an environment to allow a modular transformation of the :meth:`step` and :meth:`reset` methods.\n\n        Args:\n            env: The environment to wrap\n        \"\"\"\n        self.env = env\n\n        self._action_space: Optional[spaces.Space] = None\n        self._observation_space: Optional[spaces.Space] = None\n        self._reward_range: Optional[Tuple[SupportsFloat, SupportsFloat]] = None\n        self._metadata: Optional[dict] = None\n\n    def __getattr__(self, name):\n        \"\"\"Returns an attribute with ``name``, unless ``name`` starts with an underscore.\"\"\"\n        if name.startswith(\"_\"):\n            raise AttributeError(f\"accessing private attribute '{name}' is prohibited\")\n        return getattr(self.env, name)\n\n    @property\n    def spec(self):\n        \"\"\"Returns the environment specification.\"\"\"\n        return self.env.spec\n\n    @classmethod\n    def class_name(cls):\n        \"\"\"Returns the class name of the wrapper.\"\"\"\n        return cls.__name__\n\n    @property\n    def action_space(self) -> spaces.Space[ActType]:\n        \"\"\"Returns the action space of the environment.\"\"\"\n        if self._action_space is None:\n            return self.env.action_space\n        return self._action_space\n\n    @action_space.setter\n    def action_space(self, space: spaces.Space):\n        self._action_space = space\n\n    @property\n    def observation_space(self) -> spaces.Space:\n        \"\"\"Returns the observation space of the environment.\"\"\"\n        if self._observation_space is None:\n            return self.env.observation_space\n        return self._observation_space\n\n    
@observation_space.setter\n    def observation_space(self, space: spaces.Space):\n        self._observation_space = space\n\n    @property\n    def reward_range(self) -> Tuple[SupportsFloat, SupportsFloat]:\n        \"\"\"Return the reward range of the environment.\"\"\"\n        if self._reward_range is None:\n            return self.env.reward_range\n        return self._reward_range\n\n    @reward_range.setter\n    def reward_range(self, value: Tuple[SupportsFloat, SupportsFloat]):\n        self._reward_range = value\n\n    @property\n    def metadata(self) -> dict:\n        \"\"\"Returns the environment metadata.\"\"\"\n        if self._metadata is None:\n            return self.env.metadata\n        return self._metadata\n\n    @metadata.setter\n    def metadata(self, value):\n        self._metadata = value\n\n    @property\n    def render_mode(self) -> Optional[str]:\n        \"\"\"Returns the environment render_mode.\"\"\"\n        return self.env.render_mode\n\n    @property\n    def np_random(self) -> np.random.Generator:\n        \"\"\"Returns the environment np_random.\"\"\"\n        return self.env.np_random\n\n    @np_random.setter\n    def np_random(self, value):\n        self.env.np_random = value\n\n    @property\n    def _np_random(self):\n        raise AttributeError(\n            \"Can't access `_np_random` of a wrapper, use `.unwrapped._np_random` or `.np_random`.\"\n        )\n\n    def step(self, action: ActType) -> Tuple[ObsType, float, bool, bool, dict]:\n        \"\"\"Steps through the environment with action.\"\"\"\n        return self.env.step(action)\n\n    def reset(self, **kwargs) -> Tuple[ObsType, dict]:\n        \"\"\"Resets the environment with kwargs.\"\"\"\n        return self.env.reset(**kwargs)\n\n    def render(\n        self, *args, **kwargs\n    ) -> Optional[Union[RenderFrame, List[RenderFrame]]]:\n        \"\"\"Renders the environment.\"\"\"\n        return self.env.render(*args, **kwargs)\n\n    def close(self):\n        
\"\"\"Closes the environment.\"\"\"\n        return self.env.close()\n\n    def __str__(self):\n        \"\"\"Returns the wrapper name and the unwrapped environment string.\"\"\"\n        return f\"<{type(self).__name__}{self.env}>\"\n\n    def __repr__(self):\n        \"\"\"Returns the string representation of the wrapper.\"\"\"\n        return str(self)\n\n    @property\n    def unwrapped(self) -> Env:\n        \"\"\"Returns the base environment of the wrapper.\"\"\"\n        return self.env.unwrapped\n\n\nclass ObservationWrapper(Wrapper):\n    \"\"\"Superclass of wrappers that can modify observations using :meth:`observation` for :meth:`reset` and :meth:`step`.\n\n    If you would like to apply a function to the observation that is returned by the base environment before\n    passing it to learning code, you can simply inherit from :class:`ObservationWrapper` and overwrite the method\n    :meth:`observation` to implement that transformation. The transformation defined in that method must be\n    defined on the base environment’s observation space. However, it may take values in a different space.\n    In that case, you need to specify the new observation space of the wrapper by setting :attr:`self.observation_space`\n    in the :meth:`__init__` method of your wrapper.\n\n    For example, you might have a 2D navigation task where the environment returns dictionaries as observations with\n    keys ``\"agent_position\"`` and ``\"target_position\"``. A common thing to do might be to throw away some degrees of\n    freedom and only consider the position of the target relative to the agent, i.e.\n    ``observation[\"target_position\"] - observation[\"agent_position\"]``. 
For this, you could implement an\n    observation wrapper like this::\n\n        class RelativePosition(gym.ObservationWrapper):\n            def __init__(self, env):\n                super().__init__(env)\n                self.observation_space = Box(shape=(2,), low=-np.inf, high=np.inf)\n\n            def observation(self, obs):\n                return obs[\"target_position\"] - obs[\"agent_position\"]\n\n    Among others, Gym provides the observation wrapper :class:`TimeAwareObservation`, which adds information about the\n    index of the timestep to the observation.\n    \"\"\"\n\n    def reset(self, **kwargs):\n        \"\"\"Resets the environment, returning a modified observation using :meth:`self.observation`.\"\"\"\n        obs, info = self.env.reset(**kwargs)\n        return self.observation(obs), info\n\n    def step(self, action):\n        \"\"\"Returns a modified observation using :meth:`self.observation` after calling :meth:`env.step`.\"\"\"\n        observation, reward, terminated, truncated, info = self.env.step(action)\n        return self.observation(observation), reward, terminated, truncated, info\n\n    def observation(self, observation):\n        \"\"\"Returns a modified observation.\"\"\"\n        raise NotImplementedError\n\n\nclass RewardWrapper(Wrapper):\n    \"\"\"Superclass of wrappers that can modify the reward returned from a step.\n\n    If you would like to apply a function to the reward that is returned by the base environment before\n    passing it to learning code, you can simply inherit from :class:`RewardWrapper` and overwrite the method\n    :meth:`reward` to implement that transformation.\n    This transformation might change the reward range; to specify the reward range of your wrapper,\n    you can simply define :attr:`self.reward_range` in :meth:`__init__`.\n\n    Let us look at an example: Sometimes (especially when we do not have control over the reward\n    because it is intrinsic), we want to clip the reward to a range to gain some 
numerical stability.\n    To do that, we could, for instance, implement the following wrapper::\n\n        class ClipReward(gym.RewardWrapper):\n            def __init__(self, env, min_reward, max_reward):\n                super().__init__(env)\n                self.min_reward = min_reward\n                self.max_reward = max_reward\n                self.reward_range = (min_reward, max_reward)\n\n            def reward(self, reward):\n                return np.clip(reward, self.min_reward, self.max_reward)\n    \"\"\"\n\n    def step(self, action):\n        \"\"\"Modifies the reward using :meth:`self.reward` after the environment :meth:`env.step`.\"\"\"\n        observation, reward, terminated, truncated, info = self.env.step(action)\n        return observation, self.reward(reward), terminated, truncated, info\n\n    def reward(self, reward):\n        \"\"\"Returns a modified ``reward``.\"\"\"\n        raise NotImplementedError\n\n\nclass ActionWrapper(Wrapper):\n    \"\"\"Superclass of wrappers that can modify the action before :meth:`env.step`.\n\n    If you would like to apply a function to the action before passing it to the base environment,\n    you can simply inherit from :class:`ActionWrapper` and overwrite the method :meth:`action` to implement\n    that transformation. The transformation defined in that method must take values in the base environment’s\n    action space. However, its domain might differ from the original action space.\n    In that case, you need to specify the new action space of the wrapper by setting :attr:`self.action_space` in\n    the :meth:`__init__` method of your wrapper.\n\n    Let’s say you have an environment with action space of type :class:`gym.spaces.Box`, but you would only like\n    to use a finite subset of actions. 
Then, you might want to implement the following wrapper::\n\n        class DiscreteActions(gym.ActionWrapper):\n            def __init__(self, env, disc_to_cont):\n                super().__init__(env)\n                self.disc_to_cont = disc_to_cont\n                self.action_space = Discrete(len(disc_to_cont))\n\n            def action(self, act):\n                return self.disc_to_cont[act]\n\n        if __name__ == \"__main__\":\n            env = gym.make(\"LunarLanderContinuous-v2\")\n            wrapped_env = DiscreteActions(env, [np.array([1,0]), np.array([-1,0]),\n                                                np.array([0,1]), np.array([0,-1])])\n            print(wrapped_env.action_space)         #Discrete(4)\n\n\n    Among others, Gym provides the action wrappers :class:`ClipAction` and :class:`RescaleAction`.\n    \"\"\"\n\n    def step(self, action):\n        \"\"\"Runs the environment :meth:`env.step` using the modified ``action`` from :meth:`self.action`.\"\"\"\n        return self.env.step(self.action(action))\n\n    def action(self, action):\n        \"\"\"Returns a modified action before :meth:`env.step` is called.\"\"\"\n        raise NotImplementedError\n\n    def reverse_action(self, action):\n        \"\"\"Returns a reversed ``action``.\"\"\"\n        raise NotImplementedError\n"
  },
  {
    "path": "gym/envs/__init__.py",
    "content": "from gym.envs.registration import load_env_plugins as _load_env_plugins\nfrom gym.envs.registration import make, register, registry, spec\n\n# Hook to load plugins from entry points\n_load_env_plugins()\n\n\n# Classic\n# ----------------------------------------\n\nregister(\n    id=\"CartPole-v0\",\n    entry_point=\"gym.envs.classic_control.cartpole:CartPoleEnv\",\n    max_episode_steps=200,\n    reward_threshold=195.0,\n)\n\nregister(\n    id=\"CartPole-v1\",\n    entry_point=\"gym.envs.classic_control.cartpole:CartPoleEnv\",\n    max_episode_steps=500,\n    reward_threshold=475.0,\n)\n\nregister(\n    id=\"MountainCar-v0\",\n    entry_point=\"gym.envs.classic_control.mountain_car:MountainCarEnv\",\n    max_episode_steps=200,\n    reward_threshold=-110.0,\n)\n\nregister(\n    id=\"MountainCarContinuous-v0\",\n    entry_point=\"gym.envs.classic_control.continuous_mountain_car:Continuous_MountainCarEnv\",\n    max_episode_steps=999,\n    reward_threshold=90.0,\n)\n\nregister(\n    id=\"Pendulum-v1\",\n    entry_point=\"gym.envs.classic_control.pendulum:PendulumEnv\",\n    max_episode_steps=200,\n)\n\nregister(\n    id=\"Acrobot-v1\",\n    entry_point=\"gym.envs.classic_control.acrobot:AcrobotEnv\",\n    reward_threshold=-100.0,\n    max_episode_steps=500,\n)\n\n# Box2d\n# ----------------------------------------\n\nregister(\n    id=\"LunarLander-v2\",\n    entry_point=\"gym.envs.box2d.lunar_lander:LunarLander\",\n    max_episode_steps=1000,\n    reward_threshold=200,\n)\n\nregister(\n    id=\"LunarLanderContinuous-v2\",\n    entry_point=\"gym.envs.box2d.lunar_lander:LunarLander\",\n    kwargs={\"continuous\": True},\n    max_episode_steps=1000,\n    reward_threshold=200,\n)\n\nregister(\n    id=\"BipedalWalker-v3\",\n    entry_point=\"gym.envs.box2d.bipedal_walker:BipedalWalker\",\n    max_episode_steps=1600,\n    reward_threshold=300,\n)\n\nregister(\n    id=\"BipedalWalkerHardcore-v3\",\n    
entry_point=\"gym.envs.box2d.bipedal_walker:BipedalWalker\",\n    kwargs={\"hardcore\": True},\n    max_episode_steps=2000,\n    reward_threshold=300,\n)\n\nregister(\n    id=\"CarRacing-v2\",\n    entry_point=\"gym.envs.box2d.car_racing:CarRacing\",\n    max_episode_steps=1000,\n    reward_threshold=900,\n)\n\n# Toy Text\n# ----------------------------------------\n\nregister(\n    id=\"Blackjack-v1\",\n    entry_point=\"gym.envs.toy_text.blackjack:BlackjackEnv\",\n    kwargs={\"sab\": True, \"natural\": False},\n)\n\nregister(\n    id=\"FrozenLake-v1\",\n    entry_point=\"gym.envs.toy_text.frozen_lake:FrozenLakeEnv\",\n    kwargs={\"map_name\": \"4x4\"},\n    max_episode_steps=100,\n    reward_threshold=0.70,  # optimum = 0.74\n)\n\nregister(\n    id=\"FrozenLake8x8-v1\",\n    entry_point=\"gym.envs.toy_text.frozen_lake:FrozenLakeEnv\",\n    kwargs={\"map_name\": \"8x8\"},\n    max_episode_steps=200,\n    reward_threshold=0.85,  # optimum = 0.91\n)\n\nregister(\n    id=\"CliffWalking-v0\",\n    entry_point=\"gym.envs.toy_text.cliffwalking:CliffWalkingEnv\",\n)\n\nregister(\n    id=\"Taxi-v3\",\n    entry_point=\"gym.envs.toy_text.taxi:TaxiEnv\",\n    reward_threshold=8,  # optimum = 8.46\n    max_episode_steps=200,\n)\n\n# Mujoco\n# ----------------------------------------\n\n# 2D\n\nregister(\n    id=\"Reacher-v2\",\n    entry_point=\"gym.envs.mujoco:ReacherEnv\",\n    max_episode_steps=50,\n    reward_threshold=-3.75,\n)\n\nregister(\n    id=\"Reacher-v4\",\n    entry_point=\"gym.envs.mujoco.reacher_v4:ReacherEnv\",\n    max_episode_steps=50,\n    reward_threshold=-3.75,\n)\n\nregister(\n    id=\"Pusher-v2\",\n    entry_point=\"gym.envs.mujoco:PusherEnv\",\n    max_episode_steps=100,\n    reward_threshold=0.0,\n)\n\nregister(\n    id=\"Pusher-v4\",\n    entry_point=\"gym.envs.mujoco.pusher_v4:PusherEnv\",\n    max_episode_steps=100,\n    reward_threshold=0.0,\n)\n\nregister(\n    id=\"InvertedPendulum-v2\",\n    
entry_point=\"gym.envs.mujoco:InvertedPendulumEnv\",\n    max_episode_steps=1000,\n    reward_threshold=950.0,\n)\n\nregister(\n    id=\"InvertedPendulum-v4\",\n    entry_point=\"gym.envs.mujoco.inverted_pendulum_v4:InvertedPendulumEnv\",\n    max_episode_steps=1000,\n    reward_threshold=950.0,\n)\n\nregister(\n    id=\"InvertedDoublePendulum-v2\",\n    entry_point=\"gym.envs.mujoco:InvertedDoublePendulumEnv\",\n    max_episode_steps=1000,\n    reward_threshold=9100.0,\n)\n\nregister(\n    id=\"InvertedDoublePendulum-v4\",\n    entry_point=\"gym.envs.mujoco.inverted_double_pendulum_v4:InvertedDoublePendulumEnv\",\n    max_episode_steps=1000,\n    reward_threshold=9100.0,\n)\n\nregister(\n    id=\"HalfCheetah-v2\",\n    entry_point=\"gym.envs.mujoco:HalfCheetahEnv\",\n    max_episode_steps=1000,\n    reward_threshold=4800.0,\n)\n\nregister(\n    id=\"HalfCheetah-v3\",\n    entry_point=\"gym.envs.mujoco.half_cheetah_v3:HalfCheetahEnv\",\n    max_episode_steps=1000,\n    reward_threshold=4800.0,\n)\n\nregister(\n    id=\"HalfCheetah-v4\",\n    entry_point=\"gym.envs.mujoco.half_cheetah_v4:HalfCheetahEnv\",\n    max_episode_steps=1000,\n    reward_threshold=4800.0,\n)\n\nregister(\n    id=\"Hopper-v2\",\n    entry_point=\"gym.envs.mujoco:HopperEnv\",\n    max_episode_steps=1000,\n    reward_threshold=3800.0,\n)\n\nregister(\n    id=\"Hopper-v3\",\n    entry_point=\"gym.envs.mujoco.hopper_v3:HopperEnv\",\n    max_episode_steps=1000,\n    reward_threshold=3800.0,\n)\n\nregister(\n    id=\"Hopper-v4\",\n    entry_point=\"gym.envs.mujoco.hopper_v4:HopperEnv\",\n    max_episode_steps=1000,\n    reward_threshold=3800.0,\n)\n\nregister(\n    id=\"Swimmer-v2\",\n    entry_point=\"gym.envs.mujoco:SwimmerEnv\",\n    max_episode_steps=1000,\n    reward_threshold=360.0,\n)\n\nregister(\n    id=\"Swimmer-v3\",\n    entry_point=\"gym.envs.mujoco.swimmer_v3:SwimmerEnv\",\n    max_episode_steps=1000,\n    reward_threshold=360.0,\n)\n\nregister(\n    id=\"Swimmer-v4\",\n    
entry_point=\"gym.envs.mujoco.swimmer_v4:SwimmerEnv\",\n    max_episode_steps=1000,\n    reward_threshold=360.0,\n)\n\nregister(\n    id=\"Walker2d-v2\",\n    max_episode_steps=1000,\n    entry_point=\"gym.envs.mujoco:Walker2dEnv\",\n)\n\nregister(\n    id=\"Walker2d-v3\",\n    max_episode_steps=1000,\n    entry_point=\"gym.envs.mujoco.walker2d_v3:Walker2dEnv\",\n)\n\nregister(\n    id=\"Walker2d-v4\",\n    max_episode_steps=1000,\n    entry_point=\"gym.envs.mujoco.walker2d_v4:Walker2dEnv\",\n)\n\nregister(\n    id=\"Ant-v2\",\n    entry_point=\"gym.envs.mujoco:AntEnv\",\n    max_episode_steps=1000,\n    reward_threshold=6000.0,\n)\n\nregister(\n    id=\"Ant-v3\",\n    entry_point=\"gym.envs.mujoco.ant_v3:AntEnv\",\n    max_episode_steps=1000,\n    reward_threshold=6000.0,\n)\n\nregister(\n    id=\"Ant-v4\",\n    entry_point=\"gym.envs.mujoco.ant_v4:AntEnv\",\n    max_episode_steps=1000,\n    reward_threshold=6000.0,\n)\n\nregister(\n    id=\"Humanoid-v2\",\n    entry_point=\"gym.envs.mujoco:HumanoidEnv\",\n    max_episode_steps=1000,\n)\n\nregister(\n    id=\"Humanoid-v3\",\n    entry_point=\"gym.envs.mujoco.humanoid_v3:HumanoidEnv\",\n    max_episode_steps=1000,\n)\n\nregister(\n    id=\"Humanoid-v4\",\n    entry_point=\"gym.envs.mujoco.humanoid_v4:HumanoidEnv\",\n    max_episode_steps=1000,\n)\n\nregister(\n    id=\"HumanoidStandup-v2\",\n    entry_point=\"gym.envs.mujoco:HumanoidStandupEnv\",\n    max_episode_steps=1000,\n)\n\nregister(\n    id=\"HumanoidStandup-v4\",\n    entry_point=\"gym.envs.mujoco.humanoidstandup_v4:HumanoidStandupEnv\",\n    max_episode_steps=1000,\n)\n"
  },
  {
    "path": "gym/envs/box2d/__init__.py",
    "content": "from gym.envs.box2d.bipedal_walker import BipedalWalker, BipedalWalkerHardcore\nfrom gym.envs.box2d.car_racing import CarRacing\nfrom gym.envs.box2d.lunar_lander import LunarLander, LunarLanderContinuous\n"
  },
  {
    "path": "gym/envs/box2d/bipedal_walker.py",
    "content": "__credits__ = [\"Andrea PIERRÉ\"]\n\nimport math\nfrom typing import TYPE_CHECKING, List, Optional\n\nimport numpy as np\n\nimport gym\nfrom gym import error, spaces\nfrom gym.error import DependencyNotInstalled\nfrom gym.utils import EzPickle\n\ntry:\n    import Box2D\n    from Box2D.b2 import (\n        circleShape,\n        contactListener,\n        edgeShape,\n        fixtureDef,\n        polygonShape,\n        revoluteJointDef,\n    )\nexcept ImportError:\n    raise DependencyNotInstalled(\"box2D is not installed, run `pip install gym[box2d]`\")\n\n\nif TYPE_CHECKING:\n    import pygame\n\nFPS = 50\nSCALE = 30.0  # affects how fast-paced the game is, forces should be adjusted as well\n\nMOTORS_TORQUE = 80\nSPEED_HIP = 4\nSPEED_KNEE = 6\nLIDAR_RANGE = 160 / SCALE\n\nINITIAL_RANDOM = 5\n\nHULL_POLY = [(-30, +9), (+6, +9), (+34, +1), (+34, -8), (-30, -8)]\nLEG_DOWN = -8 / SCALE\nLEG_W, LEG_H = 8 / SCALE, 34 / SCALE\n\nVIEWPORT_W = 600\nVIEWPORT_H = 400\n\nTERRAIN_STEP = 14 / SCALE\nTERRAIN_LENGTH = 200  # in steps\nTERRAIN_HEIGHT = VIEWPORT_H / SCALE / 4\nTERRAIN_GRASS = 10  # low long are grass spots, in steps\nTERRAIN_STARTPAD = 20  # in steps\nFRICTION = 2.5\n\nHULL_FD = fixtureDef(\n    shape=polygonShape(vertices=[(x / SCALE, y / SCALE) for x, y in HULL_POLY]),\n    density=5.0,\n    friction=0.1,\n    categoryBits=0x0020,\n    maskBits=0x001,  # collide only with ground\n    restitution=0.0,\n)  # 0.99 bouncy\n\nLEG_FD = fixtureDef(\n    shape=polygonShape(box=(LEG_W / 2, LEG_H / 2)),\n    density=1.0,\n    restitution=0.0,\n    categoryBits=0x0020,\n    maskBits=0x001,\n)\n\nLOWER_FD = fixtureDef(\n    shape=polygonShape(box=(0.8 * LEG_W / 2, LEG_H / 2)),\n    density=1.0,\n    restitution=0.0,\n    categoryBits=0x0020,\n    maskBits=0x001,\n)\n\n\nclass ContactDetector(contactListener):\n    def __init__(self, env):\n        contactListener.__init__(self)\n        self.env = env\n\n    def BeginContact(self, contact):\n        if (\n       
     self.env.hull == contact.fixtureA.body\n            or self.env.hull == contact.fixtureB.body\n        ):\n            self.env.game_over = True\n        for leg in [self.env.legs[1], self.env.legs[3]]:\n            if leg in [contact.fixtureA.body, contact.fixtureB.body]:\n                leg.ground_contact = True\n\n    def EndContact(self, contact):\n        for leg in [self.env.legs[1], self.env.legs[3]]:\n            if leg in [contact.fixtureA.body, contact.fixtureB.body]:\n                leg.ground_contact = False\n\n\nclass BipedalWalker(gym.Env, EzPickle):\n    \"\"\"\n    ### Description\n    This is a simple 4-joint walker robot environment.\n    There are two versions:\n    - Normal, with slightly uneven terrain.\n    - Hardcore, with ladders, stumps, pitfalls.\n\n    To solve the normal version, you need to get 300 points in 1600 time steps.\n    To solve the hardcore version, you need 300 points in 2000 time steps.\n\n    A heuristic is provided for testing. It's also useful to get demonstrations\n    to learn from. To run the heuristic:\n    ```\n    python gym/envs/box2d/bipedal_walker.py\n    ```\n\n    ### Action Space\n    Actions are motor speed values in the [-1, 1] range for each of the\n    4 joints at both hips and knees.\n\n    ### Observation Space\n    State consists of hull angle speed, angular velocity, horizontal speed,\n    vertical speed, position of joints and joints angular speed, legs contact\n    with ground, and 10 lidar rangefinder measurements. There are no coordinates\n    in the state vector.\n\n    ### Rewards\n    Reward is given for moving forward, totaling 300+ points up to the far end.\n    If the robot falls, it gets -100. Applying motor torque costs a small\n    amount of points. 
A more optimal agent will get a better score.\n\n    ### Starting State\n    The walker starts standing at the left end of the terrain with the hull\n    horizontal, and both legs in the same position with a slight knee angle.\n\n    ### Episode Termination\n    The episode will terminate if the hull gets in contact with the ground or\n    if the walker exceeds the right end of the terrain length.\n\n    ### Arguments\n    To use the _hardcore_ environment, you need to specify the\n    `hardcore=True` argument like below:\n    ```python\n    import gym\n    env = gym.make(\"BipedalWalker-v3\", hardcore=True)\n    ```\n\n    ### Version History\n    - v3: returns closest lidar trace instead of furthest;\n        faster video recording\n    - v2: Count energy spent\n    - v1: Legs now report contact with ground; motors have higher torque and\n        speed; ground has higher friction; lidar rendered less nervously.\n    - v0: Initial version\n\n\n    <!-- ### References -->\n\n    ### Credits\n    Created by Oleg Klimov\n\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\"human\", \"rgb_array\"],\n        \"render_fps\": FPS,\n    }\n\n    def __init__(self, render_mode: Optional[str] = None, hardcore: bool = False):\n        EzPickle.__init__(self, render_mode, hardcore)\n        self.isopen = True\n\n        self.world = Box2D.b2World()\n        self.terrain: List[Box2D.b2Body] = []\n        self.hull: Optional[Box2D.b2Body] = None\n\n        self.prev_shaping = None\n\n        self.hardcore = hardcore\n\n        self.fd_polygon = fixtureDef(\n            shape=polygonShape(vertices=[(0, 0), (1, 0), (1, -1), (0, -1)]),\n            friction=FRICTION,\n        )\n\n        self.fd_edge = fixtureDef(\n            shape=edgeShape(vertices=[(0, 0), (1, 1)]),\n            friction=FRICTION,\n            categoryBits=0x0001,\n        )\n\n        # we use 5.0 to represent the joints moving at maximum\n        # 5 x the rated speed due to impulses from 
ground contact etc.\n        low = np.array(\n            [\n                -math.pi,\n                -5.0,\n                -5.0,\n                -5.0,\n                -math.pi,\n                -5.0,\n                -math.pi,\n                -5.0,\n                -0.0,\n                -math.pi,\n                -5.0,\n                -math.pi,\n                -5.0,\n                -0.0,\n            ]\n            + [-1.0] * 10\n        ).astype(np.float32)\n        high = np.array(\n            [\n                math.pi,\n                5.0,\n                5.0,\n                5.0,\n                math.pi,\n                5.0,\n                math.pi,\n                5.0,\n                5.0,\n                math.pi,\n                5.0,\n                math.pi,\n                5.0,\n                5.0,\n            ]\n            + [1.0] * 10\n        ).astype(np.float32)\n        self.action_space = spaces.Box(\n            np.array([-1, -1, -1, -1]).astype(np.float32),\n            np.array([1, 1, 1, 1]).astype(np.float32),\n        )\n        self.observation_space = spaces.Box(low, high)\n\n        # state = [\n        #     self.hull.angle,  # Normal angles up to 0.5 here, but sure more is possible.\n        #     2.0 * self.hull.angularVelocity / FPS,\n        #     0.3 * vel.x * (VIEWPORT_W / SCALE) / FPS,  # Normalized to get -1..1 range\n        #     0.3 * vel.y * (VIEWPORT_H / SCALE) / FPS,\n        #     self.joints[\n        #         0\n        #     ].angle,  # This will give 1.1 on high up, but it's still OK (and there should be spikes on hitting the ground, that's normal too)\n        #     self.joints[0].speed / SPEED_HIP,\n        #     self.joints[1].angle + 1.0,\n        #     self.joints[1].speed / SPEED_KNEE,\n        #     1.0 if self.legs[1].ground_contact else 0.0,\n        #     self.joints[2].angle,\n        #     self.joints[2].speed / SPEED_HIP,\n        #     self.joints[3].angle + 1.0,\n        #     
self.joints[3].speed / SPEED_KNEE,\n        #     1.0 if self.legs[3].ground_contact else 0.0,\n        # ]\n        # state += [l.fraction for l in self.lidar]\n\n        self.render_mode = render_mode\n        self.screen: Optional[pygame.Surface] = None\n        self.clock = None\n\n    def _destroy(self):\n        if not self.terrain:\n            return\n        self.world.contactListener = None\n        for t in self.terrain:\n            self.world.DestroyBody(t)\n        self.terrain = []\n        self.world.DestroyBody(self.hull)\n        self.hull = None\n        for leg in self.legs:\n            self.world.DestroyBody(leg)\n        self.legs = []\n        self.joints = []\n\n    def _generate_terrain(self, hardcore):\n        GRASS, STUMP, STAIRS, PIT, _STATES_ = range(5)\n        state = GRASS\n        velocity = 0.0\n        y = TERRAIN_HEIGHT\n        counter = TERRAIN_STARTPAD\n        oneshot = False\n        self.terrain = []\n        self.terrain_x = []\n        self.terrain_y = []\n\n        stair_steps, stair_width, stair_height = 0, 0, 0\n        original_y = 0\n        for i in range(TERRAIN_LENGTH):\n            x = i * TERRAIN_STEP\n            self.terrain_x.append(x)\n\n            if state == GRASS and not oneshot:\n                velocity = 0.8 * velocity + 0.01 * np.sign(TERRAIN_HEIGHT - y)\n                if i > TERRAIN_STARTPAD:\n                    velocity += self.np_random.uniform(-1, 1) / SCALE  # 1\n                y += velocity\n\n            elif state == PIT and oneshot:\n                counter = self.np_random.integers(3, 5)\n                poly = [\n                    (x, y),\n                    (x + TERRAIN_STEP, y),\n                    (x + TERRAIN_STEP, y - 4 * TERRAIN_STEP),\n                    (x, y - 4 * TERRAIN_STEP),\n                ]\n                self.fd_polygon.shape.vertices = poly\n                t = self.world.CreateStaticBody(fixtures=self.fd_polygon)\n                t.color1, t.color2 = (255, 
255, 255), (153, 153, 153)\n                self.terrain.append(t)\n\n                self.fd_polygon.shape.vertices = [\n                    (p[0] + TERRAIN_STEP * counter, p[1]) for p in poly\n                ]\n                t = self.world.CreateStaticBody(fixtures=self.fd_polygon)\n                t.color1, t.color2 = (255, 255, 255), (153, 153, 153)\n                self.terrain.append(t)\n                counter += 2\n                original_y = y\n\n            elif state == PIT and not oneshot:\n                y = original_y\n                if counter > 1:\n                    y -= 4 * TERRAIN_STEP\n\n            elif state == STUMP and oneshot:\n                counter = self.np_random.integers(1, 3)\n                poly = [\n                    (x, y),\n                    (x + counter * TERRAIN_STEP, y),\n                    (x + counter * TERRAIN_STEP, y + counter * TERRAIN_STEP),\n                    (x, y + counter * TERRAIN_STEP),\n                ]\n                self.fd_polygon.shape.vertices = poly\n                t = self.world.CreateStaticBody(fixtures=self.fd_polygon)\n                t.color1, t.color2 = (255, 255, 255), (153, 153, 153)\n                self.terrain.append(t)\n\n            elif state == STAIRS and oneshot:\n                stair_height = +1 if self.np_random.random() > 0.5 else -1\n                stair_width = self.np_random.integers(4, 5)\n                stair_steps = self.np_random.integers(3, 5)\n                original_y = y\n                for s in range(stair_steps):\n                    poly = [\n                        (\n                            x + (s * stair_width) * TERRAIN_STEP,\n                            y + (s * stair_height) * TERRAIN_STEP,\n                        ),\n                        (\n                            x + ((1 + s) * stair_width) * TERRAIN_STEP,\n                            y + (s * stair_height) * TERRAIN_STEP,\n                        ),\n                        (\n     
                       x + ((1 + s) * stair_width) * TERRAIN_STEP,\n                            y + (-1 + s * stair_height) * TERRAIN_STEP,\n                        ),\n                        (\n                            x + (s * stair_width) * TERRAIN_STEP,\n                            y + (-1 + s * stair_height) * TERRAIN_STEP,\n                        ),\n                    ]\n                    self.fd_polygon.shape.vertices = poly\n                    t = self.world.CreateStaticBody(fixtures=self.fd_polygon)\n                    t.color1, t.color2 = (255, 255, 255), (153, 153, 153)\n                    self.terrain.append(t)\n                counter = stair_steps * stair_width\n\n            elif state == STAIRS and not oneshot:\n                s = stair_steps * stair_width - counter - stair_height\n                n = s / stair_width\n                y = original_y + (n * stair_height) * TERRAIN_STEP\n\n            oneshot = False\n            self.terrain_y.append(y)\n            counter -= 1\n            if counter == 0:\n                counter = self.np_random.integers(TERRAIN_GRASS / 2, TERRAIN_GRASS)\n                if state == GRASS and hardcore:\n                    state = self.np_random.integers(1, _STATES_)\n                    oneshot = True\n                else:\n                    state = GRASS\n                    oneshot = True\n\n        self.terrain_poly = []\n        for i in range(TERRAIN_LENGTH - 1):\n            poly = [\n                (self.terrain_x[i], self.terrain_y[i]),\n                (self.terrain_x[i + 1], self.terrain_y[i + 1]),\n            ]\n            self.fd_edge.shape.vertices = poly\n            t = self.world.CreateStaticBody(fixtures=self.fd_edge)\n            color = (76, 255 if i % 2 == 0 else 204, 76)\n            t.color1 = color\n            t.color2 = color\n            self.terrain.append(t)\n            color = (102, 153, 76)\n            poly += [(poly[1][0], 0), (poly[0][0], 0)]\n            
self.terrain_poly.append((poly, color))\n        self.terrain.reverse()\n\n    def _generate_clouds(self):\n        # Sorry for the clouds, couldn't resist\n        self.cloud_poly = []\n        for i in range(TERRAIN_LENGTH // 20):\n            x = self.np_random.uniform(0, TERRAIN_LENGTH) * TERRAIN_STEP\n            y = VIEWPORT_H / SCALE * 3 / 4\n            poly = [\n                (\n                    x\n                    + 15 * TERRAIN_STEP * math.sin(3.14 * 2 * a / 5)\n                    + self.np_random.uniform(0, 5 * TERRAIN_STEP),\n                    y\n                    + 5 * TERRAIN_STEP * math.cos(3.14 * 2 * a / 5)\n                    + self.np_random.uniform(0, 5 * TERRAIN_STEP),\n                )\n                for a in range(5)\n            ]\n            x1 = min(p[0] for p in poly)\n            x2 = max(p[0] for p in poly)\n            self.cloud_poly.append((poly, x1, x2))\n\n    def reset(\n        self,\n        *,\n        seed: Optional[int] = None,\n        options: Optional[dict] = None,\n    ):\n        super().reset(seed=seed)\n        self._destroy()\n        self.world.contactListener_bug_workaround = ContactDetector(self)\n        self.world.contactListener = self.world.contactListener_bug_workaround\n        self.game_over = False\n        self.prev_shaping = None\n        self.scroll = 0.0\n        self.lidar_render = 0\n\n        self._generate_terrain(self.hardcore)\n        self._generate_clouds()\n\n        init_x = TERRAIN_STEP * TERRAIN_STARTPAD / 2\n        init_y = TERRAIN_HEIGHT + 2 * LEG_H\n        self.hull = self.world.CreateDynamicBody(\n            position=(init_x, init_y), fixtures=HULL_FD\n        )\n        self.hull.color1 = (127, 51, 229)\n        self.hull.color2 = (76, 76, 127)\n        self.hull.ApplyForceToCenter(\n            (self.np_random.uniform(-INITIAL_RANDOM, INITIAL_RANDOM), 0), True\n        )\n\n        self.legs: List[Box2D.b2Body] = []\n        self.joints: List[Box2D.b2RevoluteJoint] 
= []\n        for i in [-1, +1]:\n            leg = self.world.CreateDynamicBody(\n                position=(init_x, init_y - LEG_H / 2 - LEG_DOWN),\n                angle=(i * 0.05),\n                fixtures=LEG_FD,\n            )\n            leg.color1 = (153 - i * 25, 76 - i * 25, 127 - i * 25)\n            leg.color2 = (102 - i * 25, 51 - i * 25, 76 - i * 25)\n            rjd = revoluteJointDef(\n                bodyA=self.hull,\n                bodyB=leg,\n                localAnchorA=(0, LEG_DOWN),\n                localAnchorB=(0, LEG_H / 2),\n                enableMotor=True,\n                enableLimit=True,\n                maxMotorTorque=MOTORS_TORQUE,\n                motorSpeed=i,\n                lowerAngle=-0.8,\n                upperAngle=1.1,\n            )\n            self.legs.append(leg)\n            self.joints.append(self.world.CreateJoint(rjd))\n\n            lower = self.world.CreateDynamicBody(\n                position=(init_x, init_y - LEG_H * 3 / 2 - LEG_DOWN),\n                angle=(i * 0.05),\n                fixtures=LOWER_FD,\n            )\n            lower.color1 = (153 - i * 25, 76 - i * 25, 127 - i * 25)\n            lower.color2 = (102 - i * 25, 51 - i * 25, 76 - i * 25)\n            rjd = revoluteJointDef(\n                bodyA=leg,\n                bodyB=lower,\n                localAnchorA=(0, -LEG_H / 2),\n                localAnchorB=(0, LEG_H / 2),\n                enableMotor=True,\n                enableLimit=True,\n                maxMotorTorque=MOTORS_TORQUE,\n                motorSpeed=1,\n                lowerAngle=-1.6,\n                upperAngle=-0.1,\n            )\n            lower.ground_contact = False\n            self.legs.append(lower)\n            self.joints.append(self.world.CreateJoint(rjd))\n\n        self.drawlist = self.terrain + self.legs + [self.hull]\n\n        class LidarCallback(Box2D.b2.rayCastCallback):\n            def ReportFixture(self, fixture, point, normal, fraction):\n           
     if (fixture.filterData.categoryBits & 1) == 0:\n                    return -1\n                self.p2 = point\n                self.fraction = fraction\n                return fraction\n\n        self.lidar = [LidarCallback() for _ in range(10)]\n        if self.render_mode == \"human\":\n            self.render()\n        return self.step(np.array([0, 0, 0, 0]))[0], {}\n\n    def step(self, action: np.ndarray):\n        assert self.hull is not None\n\n        # self.hull.ApplyForceToCenter((0, 20), True) -- Uncomment this to receive a bit of stability help\n        control_speed = False  # Should be easier as well\n        if control_speed:\n            self.joints[0].motorSpeed = float(SPEED_HIP * np.clip(action[0], -1, 1))\n            self.joints[1].motorSpeed = float(SPEED_KNEE * np.clip(action[1], -1, 1))\n            self.joints[2].motorSpeed = float(SPEED_HIP * np.clip(action[2], -1, 1))\n            self.joints[3].motorSpeed = float(SPEED_KNEE * np.clip(action[3], -1, 1))\n        else:\n            self.joints[0].motorSpeed = float(SPEED_HIP * np.sign(action[0]))\n            self.joints[0].maxMotorTorque = float(\n                MOTORS_TORQUE * np.clip(np.abs(action[0]), 0, 1)\n            )\n            self.joints[1].motorSpeed = float(SPEED_KNEE * np.sign(action[1]))\n            self.joints[1].maxMotorTorque = float(\n                MOTORS_TORQUE * np.clip(np.abs(action[1]), 0, 1)\n            )\n            self.joints[2].motorSpeed = float(SPEED_HIP * np.sign(action[2]))\n            self.joints[2].maxMotorTorque = float(\n                MOTORS_TORQUE * np.clip(np.abs(action[2]), 0, 1)\n            )\n            self.joints[3].motorSpeed = float(SPEED_KNEE * np.sign(action[3]))\n            self.joints[3].maxMotorTorque = float(\n                MOTORS_TORQUE * np.clip(np.abs(action[3]), 0, 1)\n            )\n\n        self.world.Step(1.0 / FPS, 6 * 30, 2 * 30)\n\n        pos = self.hull.position\n        vel = 
self.hull.linearVelocity\n\n        for i in range(10):\n            self.lidar[i].fraction = 1.0\n            self.lidar[i].p1 = pos\n            self.lidar[i].p2 = (\n                pos[0] + math.sin(1.5 * i / 10.0) * LIDAR_RANGE,\n                pos[1] - math.cos(1.5 * i / 10.0) * LIDAR_RANGE,\n            )\n            self.world.RayCast(self.lidar[i], self.lidar[i].p1, self.lidar[i].p2)\n\n        state = [\n            self.hull.angle,  # Normal angles up to 0.5 here, but sure more is possible.\n            2.0 * self.hull.angularVelocity / FPS,\n            0.3 * vel.x * (VIEWPORT_W / SCALE) / FPS,  # Normalized to get -1..1 range\n            0.3 * vel.y * (VIEWPORT_H / SCALE) / FPS,\n            self.joints[0].angle,\n            # This will give 1.1 on high up, but it's still OK (and there should be spikes on hiting the ground, that's normal too)\n            self.joints[0].speed / SPEED_HIP,\n            self.joints[1].angle + 1.0,\n            self.joints[1].speed / SPEED_KNEE,\n            1.0 if self.legs[1].ground_contact else 0.0,\n            self.joints[2].angle,\n            self.joints[2].speed / SPEED_HIP,\n            self.joints[3].angle + 1.0,\n            self.joints[3].speed / SPEED_KNEE,\n            1.0 if self.legs[3].ground_contact else 0.0,\n        ]\n        state += [l.fraction for l in self.lidar]\n        assert len(state) == 24\n\n        self.scroll = pos.x - VIEWPORT_W / SCALE / 5\n\n        shaping = (\n            130 * pos[0] / SCALE\n        )  # moving forward is a way to receive reward (normalized to get 300 on completion)\n        shaping -= 5.0 * abs(\n            state[0]\n        )  # keep head straight, other than that and falling, any behavior is unpunished\n\n        reward = 0\n        if self.prev_shaping is not None:\n            reward = shaping - self.prev_shaping\n        self.prev_shaping = shaping\n\n        for a in action:\n            reward -= 0.00035 * MOTORS_TORQUE * np.clip(np.abs(a), 0, 1)\n     
       # normalized to about -50.0 using heuristic, more optimal agent should spend less\n\n        terminated = False\n        if self.game_over or pos[0] < 0:\n            reward = -100\n            terminated = True\n        if pos[0] > (TERRAIN_LENGTH - TERRAIN_GRASS) * TERRAIN_STEP:\n            terminated = True\n\n        if self.render_mode == \"human\":\n            self.render()\n        return np.array(state, dtype=np.float32), reward, terminated, False, {}\n\n    def render(self):\n        if self.render_mode is None:\n            gym.logger.warn(\n                \"You are calling render method without specifying any render mode. \"\n                \"You can specify the render_mode at initialization, \"\n                f'e.g. gym(\"{self.spec.id}\", render_mode=\"rgb_array\")'\n            )\n            return\n\n        try:\n            import pygame\n            from pygame import gfxdraw\n        except ImportError:\n            raise DependencyNotInstalled(\n                \"pygame is not installed, run `pip install gym[box2d]`\"\n            )\n\n        if self.screen is None and self.render_mode == \"human\":\n            pygame.init()\n            pygame.display.init()\n            self.screen = pygame.display.set_mode((VIEWPORT_W, VIEWPORT_H))\n        if self.clock is None:\n            self.clock = pygame.time.Clock()\n\n        self.surf = pygame.Surface(\n            (VIEWPORT_W + max(0.0, self.scroll) * SCALE, VIEWPORT_H)\n        )\n\n        pygame.transform.scale(self.surf, (SCALE, SCALE))\n\n        pygame.draw.polygon(\n            self.surf,\n            color=(215, 215, 255),\n            points=[\n                (self.scroll * SCALE, 0),\n                (self.scroll * SCALE + VIEWPORT_W, 0),\n                (self.scroll * SCALE + VIEWPORT_W, VIEWPORT_H),\n                (self.scroll * SCALE, VIEWPORT_H),\n            ],\n        )\n\n        for poly, x1, x2 in self.cloud_poly:\n            if x2 < self.scroll / 2:\n      
          continue\n            if x1 > self.scroll / 2 + VIEWPORT_W / SCALE:\n                continue\n            pygame.draw.polygon(\n                self.surf,\n                color=(255, 255, 255),\n                points=[\n                    (p[0] * SCALE + self.scroll * SCALE / 2, p[1] * SCALE) for p in poly\n                ],\n            )\n            gfxdraw.aapolygon(\n                self.surf,\n                [(p[0] * SCALE + self.scroll * SCALE / 2, p[1] * SCALE) for p in poly],\n                (255, 255, 255),\n            )\n        for poly, color in self.terrain_poly:\n            if poly[1][0] < self.scroll:\n                continue\n            if poly[0][0] > self.scroll + VIEWPORT_W / SCALE:\n                continue\n            scaled_poly = []\n            for coord in poly:\n                scaled_poly.append([coord[0] * SCALE, coord[1] * SCALE])\n            pygame.draw.polygon(self.surf, color=color, points=scaled_poly)\n            gfxdraw.aapolygon(self.surf, scaled_poly, color)\n\n        self.lidar_render = (self.lidar_render + 1) % 100\n        i = self.lidar_render\n        if i < 2 * len(self.lidar):\n            single_lidar = (\n                self.lidar[i]\n                if i < len(self.lidar)\n                else self.lidar[len(self.lidar) - i - 1]\n            )\n            if hasattr(single_lidar, \"p1\") and hasattr(single_lidar, \"p2\"):\n                pygame.draw.line(\n                    self.surf,\n                    color=(255, 0, 0),\n                    start_pos=(single_lidar.p1[0] * SCALE, single_lidar.p1[1] * SCALE),\n                    end_pos=(single_lidar.p2[0] * SCALE, single_lidar.p2[1] * SCALE),\n                    width=1,\n                )\n\n        for obj in self.drawlist:\n            for f in obj.fixtures:\n                trans = f.body.transform\n                if type(f.shape) is circleShape:\n                    pygame.draw.circle(\n                        self.surf,\n       
                 color=obj.color1,\n                        center=trans * f.shape.pos * SCALE,\n                        radius=f.shape.radius * SCALE,\n                    )\n                    pygame.draw.circle(\n                        self.surf,\n                        color=obj.color2,\n                        center=trans * f.shape.pos * SCALE,\n                        radius=f.shape.radius * SCALE,\n                    )\n                else:\n                    path = [trans * v * SCALE for v in f.shape.vertices]\n                    if len(path) > 2:\n                        pygame.draw.polygon(self.surf, color=obj.color1, points=path)\n                        gfxdraw.aapolygon(self.surf, path, obj.color1)\n                        path.append(path[0])\n                        pygame.draw.polygon(\n                            self.surf, color=obj.color2, points=path, width=1\n                        )\n                        gfxdraw.aapolygon(self.surf, path, obj.color2)\n                    else:\n                        pygame.draw.aaline(\n                            self.surf,\n                            start_pos=path[0],\n                            end_pos=path[1],\n                            color=obj.color1,\n                        )\n\n        flagy1 = TERRAIN_HEIGHT * SCALE\n        flagy2 = flagy1 + 50\n        x = TERRAIN_STEP * 3 * SCALE\n        pygame.draw.aaline(\n            self.surf, color=(0, 0, 0), start_pos=(x, flagy1), end_pos=(x, flagy2)\n        )\n        f = [\n            (x, flagy2),\n            (x, flagy2 - 10),\n            (x + 25, flagy2 - 5),\n        ]\n        pygame.draw.polygon(self.surf, color=(230, 51, 0), points=f)\n        pygame.draw.lines(\n            self.surf, color=(0, 0, 0), points=f + [f[0]], width=1, closed=False\n        )\n\n        self.surf = pygame.transform.flip(self.surf, False, True)\n\n        if self.render_mode == \"human\":\n            assert self.screen is not None\n            
self.screen.blit(self.surf, (-self.scroll * SCALE, 0))\n            pygame.event.pump()\n            self.clock.tick(self.metadata[\"render_fps\"])\n            pygame.display.flip()\n        elif self.render_mode == \"rgb_array\":\n            return np.transpose(\n                np.array(pygame.surfarray.pixels3d(self.surf)), axes=(1, 0, 2)\n            )[:, -VIEWPORT_W:]\n\n    def close(self):\n        if self.screen is not None:\n            import pygame\n\n            pygame.display.quit()\n            pygame.quit()\n            self.isopen = False\n\n\nclass BipedalWalkerHardcore:\n    def __init__(self):\n        raise error.Error(\n            \"Error initializing BipedalWalkerHardcore Environment.\\n\"\n            \"Currently, we do not support initializing this mode of environment by calling the class directly.\\n\"\n            \"To use this environment, instead create it by specifying the hardcore keyword in gym.make, i.e.\\n\"\n            'gym.make(\"BipedalWalker-v3\", hardcore=True)'\n        )\n\n\nif __name__ == \"__main__\":\n    # Heuristic: suboptimal, has no notion of balance.\n    env = BipedalWalker()\n    env.reset()\n    steps = 0\n    total_reward = 0\n    a = np.array([0.0, 0.0, 0.0, 0.0])\n    STAY_ON_ONE_LEG, PUT_OTHER_DOWN, PUSH_OFF = 1, 2, 3\n    SPEED = 0.29  # Will fall forward at higher speeds\n    state = STAY_ON_ONE_LEG\n    moving_leg = 0\n    supporting_leg = 1 - moving_leg\n    SUPPORT_KNEE_ANGLE = +0.1\n    supporting_knee_angle = SUPPORT_KNEE_ANGLE\n    while True:\n        s, r, terminated, truncated, info = env.step(a)\n        total_reward += r\n        if steps % 20 == 0 or terminated or truncated:\n            print(\"\\naction \" + str([f\"{x:+0.2f}\" for x in a]))\n            print(f\"step {steps} total_reward {total_reward:+0.2f}\")\n            print(\"hull \" + str([f\"{x:+0.2f}\" for x in s[0:4]]))\n            print(\"leg0 \" + str([f\"{x:+0.2f}\" for x in s[4:9]]))\n            print(\"leg1 \" + 
str([f\"{x:+0.2f}\" for x in s[9:14]]))\n        steps += 1\n\n        contact0 = s[8]\n        contact1 = s[13]\n        moving_s_base = 4 + 5 * moving_leg\n        supporting_s_base = 4 + 5 * supporting_leg\n\n        hip_targ = [None, None]  # -0.8 .. +1.1\n        knee_targ = [None, None]  # -0.6 .. +0.9\n        hip_todo = [0.0, 0.0]\n        knee_todo = [0.0, 0.0]\n\n        if state == STAY_ON_ONE_LEG:\n            hip_targ[moving_leg] = 1.1\n            knee_targ[moving_leg] = -0.6\n            supporting_knee_angle += 0.03\n            if s[2] > SPEED:\n                supporting_knee_angle += 0.03\n            supporting_knee_angle = min(supporting_knee_angle, SUPPORT_KNEE_ANGLE)\n            knee_targ[supporting_leg] = supporting_knee_angle\n            if s[supporting_s_base + 0] < 0.10:  # supporting leg is behind\n                state = PUT_OTHER_DOWN\n        if state == PUT_OTHER_DOWN:\n            hip_targ[moving_leg] = +0.1\n            knee_targ[moving_leg] = SUPPORT_KNEE_ANGLE\n            knee_targ[supporting_leg] = supporting_knee_angle\n            if s[moving_s_base + 4]:\n                state = PUSH_OFF\n                supporting_knee_angle = min(s[moving_s_base + 2], SUPPORT_KNEE_ANGLE)\n        if state == PUSH_OFF:\n            knee_targ[moving_leg] = supporting_knee_angle\n            knee_targ[supporting_leg] = +1.0\n            if s[supporting_s_base + 2] > 0.88 or s[2] > 1.2 * SPEED:\n                state = STAY_ON_ONE_LEG\n                moving_leg = 1 - moving_leg\n                supporting_leg = 1 - moving_leg\n\n        if hip_targ[0]:\n            hip_todo[0] = 0.9 * (hip_targ[0] - s[4]) - 0.25 * s[5]\n        if hip_targ[1]:\n            hip_todo[1] = 0.9 * (hip_targ[1] - s[9]) - 0.25 * s[10]\n        if knee_targ[0]:\n            knee_todo[0] = 4.0 * (knee_targ[0] - s[6]) - 0.25 * s[7]\n        if knee_targ[1]:\n            knee_todo[1] = 4.0 * (knee_targ[1] - s[11]) - 0.25 * s[12]\n\n        hip_todo[0] -= 0.9 * (0 - 
s[0]) - 1.5 * s[1]  # PID to keep head straight\n        hip_todo[1] -= 0.9 * (0 - s[0]) - 1.5 * s[1]\n        knee_todo[0] -= 15.0 * s[3]  # vertical speed, to damp oscillations\n        knee_todo[1] -= 15.0 * s[3]\n\n        a[0] = hip_todo[0]\n        a[1] = knee_todo[0]\n        a[2] = hip_todo[1]\n        a[3] = knee_todo[1]\n        a = np.clip(0.5 * a, -1.0, 1.0)\n\n        if terminated or truncated:\n            break\n"
  },
  {
    "path": "gym/envs/box2d/car_dynamics.py",
    "content": "\"\"\"\nTop-down car dynamics simulation.\n\nSome ideas are taken from this great tutorial http://www.iforce2d.net/b2dtut/top-down-car by Chris Campbell.\nThis simulation is a bit more detailed, with wheels rotation.\n\nCreated by Oleg Klimov\n\"\"\"\n\nimport math\n\nimport Box2D\nimport numpy as np\n\nfrom gym.error import DependencyNotInstalled\n\ntry:\n    from Box2D.b2 import fixtureDef, polygonShape, revoluteJointDef\nexcept ImportError:\n    raise DependencyNotInstalled(\"box2D is not installed, run `pip install gym[box2d]`\")\n\n\nSIZE = 0.02\nENGINE_POWER = 100000000 * SIZE * SIZE\nWHEEL_MOMENT_OF_INERTIA = 4000 * SIZE * SIZE\nFRICTION_LIMIT = (\n    1000000 * SIZE * SIZE\n)  # friction ~= mass ~= size^2 (calculated implicitly using density)\nWHEEL_R = 27\nWHEEL_W = 14\nWHEELPOS = [(-55, +80), (+55, +80), (-55, -82), (+55, -82)]\nHULL_POLY1 = [(-60, +130), (+60, +130), (+60, +110), (-60, +110)]\nHULL_POLY2 = [(-15, +120), (+15, +120), (+20, +20), (-20, 20)]\nHULL_POLY3 = [\n    (+25, +20),\n    (+50, -10),\n    (+50, -40),\n    (+20, -90),\n    (-20, -90),\n    (-50, -40),\n    (-50, -10),\n    (-25, +20),\n]\nHULL_POLY4 = [(-50, -120), (+50, -120), (+50, -90), (-50, -90)]\nWHEEL_COLOR = (0, 0, 0)\nWHEEL_WHITE = (77, 77, 77)\nMUD_COLOR = (102, 102, 0)\n\n\nclass Car:\n    def __init__(self, world, init_angle, init_x, init_y):\n        self.world: Box2D.b2World = world\n        self.hull: Box2D.b2Body = self.world.CreateDynamicBody(\n            position=(init_x, init_y),\n            angle=init_angle,\n            fixtures=[\n                fixtureDef(\n                    shape=polygonShape(\n                        vertices=[(x * SIZE, y * SIZE) for x, y in HULL_POLY1]\n                    ),\n                    density=1.0,\n                ),\n                fixtureDef(\n                    shape=polygonShape(\n                        vertices=[(x * SIZE, y * SIZE) for x, y in HULL_POLY2]\n                    ),\n                    
density=1.0,\n                ),\n                fixtureDef(\n                    shape=polygonShape(\n                        vertices=[(x * SIZE, y * SIZE) for x, y in HULL_POLY3]\n                    ),\n                    density=1.0,\n                ),\n                fixtureDef(\n                    shape=polygonShape(\n                        vertices=[(x * SIZE, y * SIZE) for x, y in HULL_POLY4]\n                    ),\n                    density=1.0,\n                ),\n            ],\n        )\n        self.hull.color = (0.8, 0.0, 0.0)\n        self.wheels = []\n        self.fuel_spent = 0.0\n        WHEEL_POLY = [\n            (-WHEEL_W, +WHEEL_R),\n            (+WHEEL_W, +WHEEL_R),\n            (+WHEEL_W, -WHEEL_R),\n            (-WHEEL_W, -WHEEL_R),\n        ]\n        for wx, wy in WHEELPOS:\n            front_k = 1.0 if wy > 0 else 1.0\n            w = self.world.CreateDynamicBody(\n                position=(init_x + wx * SIZE, init_y + wy * SIZE),\n                angle=init_angle,\n                fixtures=fixtureDef(\n                    shape=polygonShape(\n                        vertices=[\n                            (x * front_k * SIZE, y * front_k * SIZE)\n                            for x, y in WHEEL_POLY\n                        ]\n                    ),\n                    density=0.1,\n                    categoryBits=0x0020,\n                    maskBits=0x001,\n                    restitution=0.0,\n                ),\n            )\n            w.wheel_rad = front_k * WHEEL_R * SIZE\n            w.color = WHEEL_COLOR\n            w.gas = 0.0\n            w.brake = 0.0\n            w.steer = 0.0\n            w.phase = 0.0  # wheel angle\n            w.omega = 0.0  # angular velocity\n            w.skid_start = None\n            w.skid_particle = None\n            rjd = revoluteJointDef(\n                bodyA=self.hull,\n                bodyB=w,\n                localAnchorA=(wx * SIZE, wy * SIZE),\n                
localAnchorB=(0, 0),\n                enableMotor=True,\n                enableLimit=True,\n                maxMotorTorque=180 * 900 * SIZE * SIZE,\n                motorSpeed=0,\n                lowerAngle=-0.4,\n                upperAngle=+0.4,\n            )\n            w.joint = self.world.CreateJoint(rjd)\n            w.tiles = set()\n            w.userData = w\n            self.wheels.append(w)\n        self.drawlist = self.wheels + [self.hull]\n        self.particles = []\n\n    def gas(self, gas):\n        \"\"\"control: rear wheel drive\n\n        Args:\n            gas (float): How much gas gets applied. Gets clipped between 0 and 1.\n        \"\"\"\n        gas = np.clip(gas, 0, 1)\n        for w in self.wheels[2:4]:\n            diff = gas - w.gas\n            if diff > 0.1:\n                diff = 0.1  # gradually increase, but stop immediately\n            w.gas += diff\n\n    def brake(self, b):\n        \"\"\"control: brake\n\n        Args:\n            b (0..1): Degree to which the brakes are applied. 
More than 0.9 blocks the wheels to zero rotation\"\"\"\n        for w in self.wheels:\n            w.brake = b\n\n    def steer(self, s):\n        \"\"\"control: steer\n\n        Args:\n            s (-1..1): target position, it takes time to rotate steering wheel from side-to-side\"\"\"\n        self.wheels[0].steer = s\n        self.wheels[1].steer = s\n\n    def step(self, dt):\n        for w in self.wheels:\n            # Steer each wheel\n            dir = np.sign(w.steer - w.joint.angle)\n            val = abs(w.steer - w.joint.angle)\n            w.joint.motorSpeed = dir * min(50.0 * val, 3.0)\n\n            # Position => friction_limit\n            grass = True\n            friction_limit = FRICTION_LIMIT * 0.6  # Grass friction if no tile\n            for tile in w.tiles:\n                friction_limit = max(\n                    friction_limit, FRICTION_LIMIT * tile.road_friction\n                )\n                grass = False\n\n            # Force\n            forw = w.GetWorldVector((0, 1))\n            side = w.GetWorldVector((1, 0))\n            v = w.linearVelocity\n            vf = forw[0] * v[0] + forw[1] * v[1]  # forward speed\n            vs = side[0] * v[0] + side[1] * v[1]  # side speed\n\n            # WHEEL_MOMENT_OF_INERTIA*np.square(w.omega)/2 = E -- energy\n            # WHEEL_MOMENT_OF_INERTIA*w.omega * domega/dt = dE/dt = W -- power\n            # domega = dt*W/WHEEL_MOMENT_OF_INERTIA/w.omega\n\n            # add small coef not to divide by zero\n            w.omega += (\n                dt\n                * ENGINE_POWER\n                * w.gas\n                / WHEEL_MOMENT_OF_INERTIA\n                / (abs(w.omega) + 5.0)\n            )\n            self.fuel_spent += dt * ENGINE_POWER * w.gas\n\n            if w.brake >= 0.9:\n                w.omega = 0\n            elif w.brake > 0:\n                BRAKE_FORCE = 15  # radians per second\n                dir = -np.sign(w.omega)\n                val = BRAKE_FORCE * w.brake\n 
               if abs(val) > abs(w.omega):\n                    val = abs(w.omega)  # low speed => same as = 0\n                w.omega += dir * val\n            w.phase += w.omega * dt\n\n            vr = w.omega * w.wheel_rad  # rotating wheel speed\n            f_force = -vf + vr  # force direction is direction of speed difference\n            p_force = -vs\n\n            # Physically correct is to always apply friction_limit until speed is equal.\n            # But dt is finite, that will lead to oscillations if difference is already near zero.\n\n            # Random coefficient to cut oscillations in few steps (have no effect on friction_limit)\n            f_force *= 205000 * SIZE * SIZE\n            p_force *= 205000 * SIZE * SIZE\n            force = np.sqrt(np.square(f_force) + np.square(p_force))\n\n            # Skid trace\n            if abs(force) > 2.0 * friction_limit:\n                if (\n                    w.skid_particle\n                    and w.skid_particle.grass == grass\n                    and len(w.skid_particle.poly) < 30\n                ):\n                    w.skid_particle.poly.append((w.position[0], w.position[1]))\n                elif w.skid_start is None:\n                    w.skid_start = w.position\n                else:\n                    w.skid_particle = self._create_particle(\n                        w.skid_start, w.position, grass\n                    )\n                    w.skid_start = None\n            else:\n                w.skid_start = None\n                w.skid_particle = None\n\n            if abs(force) > friction_limit:\n                f_force /= force\n                p_force /= force\n                force = friction_limit  # Correct physics here\n                f_force *= force\n                p_force *= force\n\n            w.omega -= dt * f_force * w.wheel_rad / WHEEL_MOMENT_OF_INERTIA\n\n            w.ApplyForceToCenter(\n                (\n                    p_force * side[0] + f_force * 
forw[0],\n                    p_force * side[1] + f_force * forw[1],\n                ),\n                True,\n            )\n\n    def draw(self, surface, zoom, translation, angle, draw_particles=True):\n        import pygame.draw\n\n        if draw_particles:\n            for p in self.particles:\n                poly = [pygame.math.Vector2(c).rotate_rad(angle) for c in p.poly]\n                poly = [\n                    (\n                        coords[0] * zoom + translation[0],\n                        coords[1] * zoom + translation[1],\n                    )\n                    for coords in poly\n                ]\n                pygame.draw.lines(\n                    surface, color=p.color, points=poly, width=2, closed=False\n                )\n\n        for obj in self.drawlist:\n            for f in obj.fixtures:\n                trans = f.body.transform\n                path = [trans * v for v in f.shape.vertices]\n                path = [(coords[0], coords[1]) for coords in path]\n                path = [pygame.math.Vector2(c).rotate_rad(angle) for c in path]\n                path = [\n                    (\n                        coords[0] * zoom + translation[0],\n                        coords[1] * zoom + translation[1],\n                    )\n                    for coords in path\n                ]\n                color = [int(c * 255) for c in obj.color]\n\n                pygame.draw.polygon(surface, color=color, points=path)\n\n                if \"phase\" not in obj.__dict__:\n                    continue\n                a1 = obj.phase\n                a2 = obj.phase + 1.2  # radians\n                s1 = math.sin(a1)\n                s2 = math.sin(a2)\n                c1 = math.cos(a1)\n                c2 = math.cos(a2)\n                if s1 > 0 and s2 > 0:\n                    continue\n                if s1 > 0:\n                    c1 = np.sign(c1)\n                if s2 > 0:\n                    c2 = np.sign(c2)\n             
   white_poly = [\n                    (-WHEEL_W * SIZE, +WHEEL_R * c1 * SIZE),\n                    (+WHEEL_W * SIZE, +WHEEL_R * c1 * SIZE),\n                    (+WHEEL_W * SIZE, +WHEEL_R * c2 * SIZE),\n                    (-WHEEL_W * SIZE, +WHEEL_R * c2 * SIZE),\n                ]\n                white_poly = [trans * v for v in white_poly]\n\n                white_poly = [(coords[0], coords[1]) for coords in white_poly]\n                white_poly = [\n                    pygame.math.Vector2(c).rotate_rad(angle) for c in white_poly\n                ]\n                white_poly = [\n                    (\n                        coords[0] * zoom + translation[0],\n                        coords[1] * zoom + translation[1],\n                    )\n                    for coords in white_poly\n                ]\n                pygame.draw.polygon(surface, color=WHEEL_WHITE, points=white_poly)\n\n    def _create_particle(self, point1, point2, grass):\n        class Particle:\n            pass\n\n        p = Particle()\n        p.color = WHEEL_COLOR if not grass else MUD_COLOR\n        p.ttl = 1\n        p.poly = [(point1[0], point1[1]), (point2[0], point2[1])]\n        p.grass = grass\n        self.particles.append(p)\n        while len(self.particles) > 30:\n            self.particles.pop(0)\n        return p\n\n    def destroy(self):\n        self.world.DestroyBody(self.hull)\n        self.hull = None\n        for w in self.wheels:\n            self.world.DestroyBody(w)\n        self.wheels = []\n"
  },
  {
    "path": "gym/envs/box2d/car_racing.py",
    "content": "__credits__ = [\"Andrea PIERRÉ\"]\n\nimport math\nfrom typing import Optional, Union\n\nimport numpy as np\n\nimport gym\nfrom gym import spaces\nfrom gym.envs.box2d.car_dynamics import Car\nfrom gym.error import DependencyNotInstalled, InvalidAction\nfrom gym.utils import EzPickle\n\ntry:\n    import Box2D\n    from Box2D.b2 import contactListener, fixtureDef, polygonShape\nexcept ImportError:\n    raise DependencyNotInstalled(\"box2D is not installed, run `pip install gym[box2d]`\")\n\ntry:\n    # As pygame is necessary for using the environment (reset and step) even without a render mode\n    #   therefore, pygame is a necessary import for the environment.\n    import pygame\n    from pygame import gfxdraw\nexcept ImportError:\n    raise DependencyNotInstalled(\n        \"pygame is not installed, run `pip install gym[box2d]`\"\n    )\n\n\nSTATE_W = 96  # less than Atari 160x192\nSTATE_H = 96\nVIDEO_W = 600\nVIDEO_H = 400\nWINDOW_W = 1000\nWINDOW_H = 800\n\nSCALE = 6.0  # Track scale\nTRACK_RAD = 900 / SCALE  # Track is heavily morphed circle with this radius\nPLAYFIELD = 2000 / SCALE  # Game over boundary\nFPS = 50  # Frames per second\nZOOM = 2.7  # Camera zoom\nZOOM_FOLLOW = True  # Set to False for fixed view (don't use zoom)\n\n\nTRACK_DETAIL_STEP = 21 / SCALE\nTRACK_TURN_RATE = 0.31\nTRACK_WIDTH = 40 / SCALE\nBORDER = 8 / SCALE\nBORDER_MIN_COUNT = 4\nGRASS_DIM = PLAYFIELD / 20.0\nMAX_SHAPE_DIM = (\n    max(GRASS_DIM, TRACK_WIDTH, TRACK_DETAIL_STEP) * math.sqrt(2) * ZOOM * SCALE\n)\n\n\nclass FrictionDetector(contactListener):\n    def __init__(self, env, lap_complete_percent):\n        contactListener.__init__(self)\n        self.env = env\n        self.lap_complete_percent = lap_complete_percent\n\n    def BeginContact(self, contact):\n        self._contact(contact, True)\n\n    def EndContact(self, contact):\n        self._contact(contact, False)\n\n    def _contact(self, contact, begin):\n        tile = None\n        obj = None\n        
u1 = contact.fixtureA.body.userData\n        u2 = contact.fixtureB.body.userData\n        if u1 and \"road_friction\" in u1.__dict__:\n            tile = u1\n            obj = u2\n        if u2 and \"road_friction\" in u2.__dict__:\n            tile = u2\n            obj = u1\n        if not tile:\n            return\n\n        # inherit tile color from env\n        tile.color[:] = self.env.road_color\n        if not obj or \"tiles\" not in obj.__dict__:\n            return\n        if begin:\n            obj.tiles.add(tile)\n            if not tile.road_visited:\n                tile.road_visited = True\n                self.env.reward += 1000.0 / len(self.env.track)\n                self.env.tile_visited_count += 1\n\n                # Lap is considered completed if enough % of the track was covered\n                if (\n                    tile.idx == 0\n                    and self.env.tile_visited_count / len(self.env.track)\n                    > self.lap_complete_percent\n                ):\n                    self.env.new_lap = True\n        else:\n            obj.tiles.remove(tile)\n\n\nclass CarRacing(gym.Env, EzPickle):\n    \"\"\"\n    ### Description\n    The easiest control task to learn from pixels - a top-down\n    racing environment. The generated track is random every episode.\n\n    Some indicators are shown at the bottom of the window along with the\n    state RGB buffer. 
From left to right: true speed, four ABS sensors,\n    steering wheel position, and gyroscope.\n    To play yourself (it's rather fast for humans), type:\n    ```\n    python gym/envs/box2d/car_racing.py\n    ```\n    Remember: it's a powerful rear-wheel drive car - don't press the accelerator\n    and turn at the same time.\n\n    ### Action Space\n    If continuous:\n        There are 3 actions: steering (-1 is full left, +1 is full right), gas, and braking.\n    If discrete:\n        There are 5 actions: do nothing, steer left, steer right, gas, brake.\n\n    ### Observation Space\n    State consists of 96x96 RGB pixels.\n\n    ### Rewards\n    The reward is -0.1 every frame and +1000/N for every track tile visited,\n    where N is the total number of tiles in the track. For example,\n    if you have finished in 732 frames, your reward is\n    1000 - 0.1*732 = 926.8 points.\n\n    ### Starting State\n    The car starts at rest in the center of the road.\n\n    ### Episode Termination\n    The episode finishes when all of the tiles are visited. 
The car can also go\n    outside of the playfield - that is, far off the track, in which case it will\n    receive -100 reward and die.\n\n    ### Arguments\n    `lap_complete_percent` dictates the percentage of tiles that must be visited by\n    the agent before a lap is considered complete.\n\n    Passing `domain_randomize=True` enables the domain randomized variant of the environment.\n    In this scenario, the background and track colours are different on every reset.\n\n    Passing `continuous=False` converts the environment to use discrete action space.\n    The discrete action space has 5 actions: [do nothing, left, right, gas, brake].\n\n    ### Reset Arguments\n    Passing the option `options[\"randomize\"] = True` will change the current colour of the environment on demand.\n    Correspondingly, passing the option `options[\"randomize\"] = False` will not change the current colour of the environment.\n    `domain_randomize` must be `True` on init for this argument to work.\n    Example usage:\n    ```py\n        env = gym.make(\"CarRacing-v1\", domain_randomize=True)\n\n        # normal reset, this changes the colour scheme by default\n        env.reset()\n\n        # reset with colour scheme change\n        env.reset(options={\"randomize\": True})\n\n        # reset with no colour scheme change\n        env.reset(options={\"randomize\": False})\n    ```\n\n    ### Version History\n    - v1: Change track completion logic and add domain randomization (0.24.0)\n    - v0: Original version\n\n    ### References\n    - Chris Campbell (2014), http://www.iforce2d.net/b2dtut/top-down-car.\n\n    ### Credits\n    Created by Oleg Klimov\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"state_pixels\",\n        ],\n        \"render_fps\": FPS,\n    }\n\n    def __init__(\n        self,\n        render_mode: Optional[str] = None,\n        verbose: bool = False,\n        
lap_complete_percent: float = 0.95,\n        domain_randomize: bool = False,\n        continuous: bool = True,\n    ):\n        EzPickle.__init__(\n            self,\n            render_mode,\n            verbose,\n            lap_complete_percent,\n            domain_randomize,\n            continuous,\n        )\n        self.continuous = continuous\n        self.domain_randomize = domain_randomize\n        self.lap_complete_percent = lap_complete_percent\n        self._init_colors()\n\n        self.contactListener_keepref = FrictionDetector(self, self.lap_complete_percent)\n        self.world = Box2D.b2World((0, 0), contactListener=self.contactListener_keepref)\n        self.screen: Optional[pygame.Surface] = None\n        self.surf = None\n        self.clock = None\n        self.isopen = True\n        self.invisible_state_window = None\n        self.invisible_video_window = None\n        self.road = None\n        self.car: Optional[Car] = None\n        self.reward = 0.0\n        self.prev_reward = 0.0\n        self.verbose = verbose\n        self.new_lap = False\n        self.fd_tile = fixtureDef(\n            shape=polygonShape(vertices=[(0, 0), (1, 0), (1, -1), (0, -1)])\n        )\n\n        # This will throw a warning in tests/envs/test_envs in utils/env_checker.py as the space is not symmetric\n        #   or normalised however this is not possible here so ignore\n        if self.continuous:\n            self.action_space = spaces.Box(\n                np.array([-1, 0, 0]).astype(np.float32),\n                np.array([+1, +1, +1]).astype(np.float32),\n            )  # steer, gas, brake\n        else:\n            self.action_space = spaces.Discrete(5)\n            # do nothing, left, right, gas, brake\n\n        self.observation_space = spaces.Box(\n            low=0, high=255, shape=(STATE_H, STATE_W, 3), dtype=np.uint8\n        )\n\n        self.render_mode = render_mode\n\n    def _destroy(self):\n        if not self.road:\n            return\n        
for t in self.road:\n            self.world.DestroyBody(t)\n        self.road = []\n        assert self.car is not None\n        self.car.destroy()\n\n    def _init_colors(self):\n        if self.domain_randomize:\n            # domain randomize the bg and grass colour\n            self.road_color = self.np_random.uniform(0, 210, size=3)\n\n            self.bg_color = self.np_random.uniform(0, 210, size=3)\n\n            self.grass_color = np.copy(self.bg_color)\n            idx = self.np_random.integers(3)\n            self.grass_color[idx] += 20\n        else:\n            # default colours\n            self.road_color = np.array([102, 102, 102])\n            self.bg_color = np.array([102, 204, 102])\n            self.grass_color = np.array([102, 230, 102])\n\n    def _reinit_colors(self, randomize):\n        assert (\n            self.domain_randomize\n        ), \"domain_randomize must be True to use this function.\"\n\n        if randomize:\n            # domain randomize the bg and grass colour\n            self.road_color = self.np_random.uniform(0, 210, size=3)\n\n            self.bg_color = self.np_random.uniform(0, 210, size=3)\n\n            self.grass_color = np.copy(self.bg_color)\n            idx = self.np_random.integers(3)\n            self.grass_color[idx] += 20\n\n    def _create_track(self):\n        CHECKPOINTS = 12\n\n        # Create checkpoints\n        checkpoints = []\n        for c in range(CHECKPOINTS):\n            noise = self.np_random.uniform(0, 2 * math.pi * 1 / CHECKPOINTS)\n            alpha = 2 * math.pi * c / CHECKPOINTS + noise\n            rad = self.np_random.uniform(TRACK_RAD / 3, TRACK_RAD)\n\n            if c == 0:\n                alpha = 0\n                rad = 1.5 * TRACK_RAD\n            if c == CHECKPOINTS - 1:\n                alpha = 2 * math.pi * c / CHECKPOINTS\n                self.start_alpha = 2 * math.pi * (-0.5) / CHECKPOINTS\n                rad = 1.5 * TRACK_RAD\n\n            checkpoints.append((alpha, rad 
* math.cos(alpha), rad * math.sin(alpha)))\n        self.road = []\n\n        # Go from one checkpoint to another to create track\n        x, y, beta = 1.5 * TRACK_RAD, 0, 0\n        dest_i = 0\n        laps = 0\n        track = []\n        no_freeze = 2500\n        visited_other_side = False\n        while True:\n            alpha = math.atan2(y, x)\n            if visited_other_side and alpha > 0:\n                laps += 1\n                visited_other_side = False\n            if alpha < 0:\n                visited_other_side = True\n                alpha += 2 * math.pi\n\n            while True:  # Find destination from checkpoints\n                failed = True\n\n                while True:\n                    dest_alpha, dest_x, dest_y = checkpoints[dest_i % len(checkpoints)]\n                    if alpha <= dest_alpha:\n                        failed = False\n                        break\n                    dest_i += 1\n                    if dest_i % len(checkpoints) == 0:\n                        break\n\n                if not failed:\n                    break\n\n                alpha -= 2 * math.pi\n                continue\n\n            r1x = math.cos(beta)\n            r1y = math.sin(beta)\n            p1x = -r1y\n            p1y = r1x\n            dest_dx = dest_x - x  # vector towards destination\n            dest_dy = dest_y - y\n            # destination vector projected on rad:\n            proj = r1x * dest_dx + r1y * dest_dy\n            while beta - alpha > 1.5 * math.pi:\n                beta -= 2 * math.pi\n            while beta - alpha < -1.5 * math.pi:\n                beta += 2 * math.pi\n            prev_beta = beta\n            proj *= SCALE\n            if proj > 0.3:\n                beta -= min(TRACK_TURN_RATE, abs(0.001 * proj))\n            if proj < -0.3:\n                beta += min(TRACK_TURN_RATE, abs(0.001 * proj))\n            x += p1x * TRACK_DETAIL_STEP\n            y += p1y * TRACK_DETAIL_STEP\n            
track.append((alpha, prev_beta * 0.5 + beta * 0.5, x, y))\n            if laps > 4:\n                break\n            no_freeze -= 1\n            if no_freeze == 0:\n                break\n\n        # Find closed loop range i1..i2, first loop should be ignored, second is OK\n        i1, i2 = -1, -1\n        i = len(track)\n        while True:\n            i -= 1\n            if i == 0:\n                return False  # Failed\n            pass_through_start = (\n                track[i][0] > self.start_alpha and track[i - 1][0] <= self.start_alpha\n            )\n            if pass_through_start and i2 == -1:\n                i2 = i\n            elif pass_through_start and i1 == -1:\n                i1 = i\n                break\n        if self.verbose:\n            print(\"Track generation: %i..%i -> %i-tiles track\" % (i1, i2, i2 - i1))\n        assert i1 != -1\n        assert i2 != -1\n\n        track = track[i1 : i2 - 1]\n\n        first_beta = track[0][1]\n        first_perp_x = math.cos(first_beta)\n        first_perp_y = math.sin(first_beta)\n        # Length of perpendicular jump to put together head and tail\n        well_glued_together = np.sqrt(\n            np.square(first_perp_x * (track[0][2] - track[-1][2]))\n            + np.square(first_perp_y * (track[0][3] - track[-1][3]))\n        )\n        if well_glued_together > TRACK_DETAIL_STEP:\n            return False\n\n        # Red-white border on hard turns\n        border = [False] * len(track)\n        for i in range(len(track)):\n            good = True\n            oneside = 0\n            for neg in range(BORDER_MIN_COUNT):\n                beta1 = track[i - neg - 0][1]\n                beta2 = track[i - neg - 1][1]\n                good &= abs(beta1 - beta2) > TRACK_TURN_RATE * 0.2\n                oneside += np.sign(beta1 - beta2)\n            good &= abs(oneside) == BORDER_MIN_COUNT\n            border[i] = good\n        for i in range(len(track)):\n            for neg in 
range(BORDER_MIN_COUNT):\n                border[i - neg] |= border[i]\n\n        # Create tiles\n        for i in range(len(track)):\n            alpha1, beta1, x1, y1 = track[i]\n            alpha2, beta2, x2, y2 = track[i - 1]\n            road1_l = (\n                x1 - TRACK_WIDTH * math.cos(beta1),\n                y1 - TRACK_WIDTH * math.sin(beta1),\n            )\n            road1_r = (\n                x1 + TRACK_WIDTH * math.cos(beta1),\n                y1 + TRACK_WIDTH * math.sin(beta1),\n            )\n            road2_l = (\n                x2 - TRACK_WIDTH * math.cos(beta2),\n                y2 - TRACK_WIDTH * math.sin(beta2),\n            )\n            road2_r = (\n                x2 + TRACK_WIDTH * math.cos(beta2),\n                y2 + TRACK_WIDTH * math.sin(beta2),\n            )\n            vertices = [road1_l, road1_r, road2_r, road2_l]\n            self.fd_tile.shape.vertices = vertices\n            t = self.world.CreateStaticBody(fixtures=self.fd_tile)\n            t.userData = t\n            c = 0.01 * (i % 3) * 255\n            t.color = self.road_color + c\n            t.road_visited = False\n            t.road_friction = 1.0\n            t.idx = i\n            t.fixtures[0].sensor = True\n            self.road_poly.append(([road1_l, road1_r, road2_r, road2_l], t.color))\n            self.road.append(t)\n            if border[i]:\n                side = np.sign(beta2 - beta1)\n                b1_l = (\n                    x1 + side * TRACK_WIDTH * math.cos(beta1),\n                    y1 + side * TRACK_WIDTH * math.sin(beta1),\n                )\n                b1_r = (\n                    x1 + side * (TRACK_WIDTH + BORDER) * math.cos(beta1),\n                    y1 + side * (TRACK_WIDTH + BORDER) * math.sin(beta1),\n                )\n                b2_l = (\n                    x2 + side * TRACK_WIDTH * math.cos(beta2),\n                    y2 + side * TRACK_WIDTH * math.sin(beta2),\n                )\n                b2_r = (\n  
                  x2 + side * (TRACK_WIDTH + BORDER) * math.cos(beta2),\n                    y2 + side * (TRACK_WIDTH + BORDER) * math.sin(beta2),\n                )\n                self.road_poly.append(\n                    (\n                        [b1_l, b1_r, b2_r, b2_l],\n                        (255, 255, 255) if i % 2 == 0 else (255, 0, 0),\n                    )\n                )\n        self.track = track\n        return True\n\n    def reset(\n        self,\n        *,\n        seed: Optional[int] = None,\n        options: Optional[dict] = None,\n    ):\n        super().reset(seed=seed)\n        self._destroy()\n        self.world.contactListener_bug_workaround = FrictionDetector(\n            self, self.lap_complete_percent\n        )\n        self.world.contactListener = self.world.contactListener_bug_workaround\n        self.reward = 0.0\n        self.prev_reward = 0.0\n        self.tile_visited_count = 0\n        self.t = 0.0\n        self.new_lap = False\n        self.road_poly = []\n\n        if self.domain_randomize:\n            randomize = True\n            if isinstance(options, dict):\n                if \"randomize\" in options:\n                    randomize = options[\"randomize\"]\n\n            self._reinit_colors(randomize)\n\n        while True:\n            success = self._create_track()\n            if success:\n                break\n            if self.verbose:\n                print(\n                    \"retry to generate track (normal if there are not many\"\n                    \"instances of this message)\"\n                )\n        self.car = Car(self.world, *self.track[0][1:4])\n\n        if self.render_mode == \"human\":\n            self.render()\n        return self.step(None)[0], {}\n\n    def step(self, action: Union[np.ndarray, int]):\n        assert self.car is not None\n        if action is not None:\n            if self.continuous:\n                self.car.steer(-action[0])\n                
self.car.gas(action[1])\n                self.car.brake(action[2])\n            else:\n                if not self.action_space.contains(action):\n                    raise InvalidAction(\n                        f\"you passed the invalid action `{action}`. \"\n                        f\"The supported action_space is `{self.action_space}`\"\n                    )\n                self.car.steer(-0.6 * (action == 1) + 0.6 * (action == 2))\n                self.car.gas(0.2 * (action == 3))\n                self.car.brake(0.8 * (action == 4))\n\n        self.car.step(1.0 / FPS)\n        self.world.Step(1.0 / FPS, 6 * 30, 2 * 30)\n        self.t += 1.0 / FPS\n\n        self.state = self._render(\"state_pixels\")\n\n        step_reward = 0\n        terminated = False\n        truncated = False\n        if action is not None:  # First step without action, called from reset()\n            self.reward -= 0.1\n            # We actually don't want to count fuel spent, we want car to be faster.\n            # self.reward -=  10 * self.car.fuel_spent / ENGINE_POWER\n            self.car.fuel_spent = 0.0\n            step_reward = self.reward - self.prev_reward\n            self.prev_reward = self.reward\n            if self.tile_visited_count == len(self.track) or self.new_lap:\n                # Truncation due to finishing lap\n                # This should not be treated as a failure\n                # but like a timeout\n                truncated = True\n            x, y = self.car.hull.position\n            if abs(x) > PLAYFIELD or abs(y) > PLAYFIELD:\n                terminated = True\n                step_reward = -100\n\n        if self.render_mode == \"human\":\n            self.render()\n        return self.state, step_reward, terminated, truncated, {}\n\n    def render(self):\n        if self.render_mode is None:\n            gym.logger.warn(\n                \"You are calling render method without specifying any render mode. 
\"\n                \"You can specify the render_mode at initialization, \"\n                f'e.g. gym(\"{self.spec.id}\", render_mode=\"rgb_array\")'\n            )\n        else:\n            return self._render(self.render_mode)\n\n    def _render(self, mode: str):\n        assert mode in self.metadata[\"render_modes\"]\n\n        pygame.font.init()\n        if self.screen is None and mode == \"human\":\n            pygame.init()\n            pygame.display.init()\n            self.screen = pygame.display.set_mode((WINDOW_W, WINDOW_H))\n        if self.clock is None:\n            self.clock = pygame.time.Clock()\n\n        if \"t\" not in self.__dict__:\n            return  # reset() not called yet\n\n        self.surf = pygame.Surface((WINDOW_W, WINDOW_H))\n\n        assert self.car is not None\n        # computing transformations\n        angle = -self.car.hull.angle\n        # Animating first second zoom.\n        zoom = 0.1 * SCALE * max(1 - self.t, 0) + ZOOM * SCALE * min(self.t, 1)\n        scroll_x = -(self.car.hull.position[0]) * zoom\n        scroll_y = -(self.car.hull.position[1]) * zoom\n        trans = pygame.math.Vector2((scroll_x, scroll_y)).rotate_rad(angle)\n        trans = (WINDOW_W / 2 + trans[0], WINDOW_H / 4 + trans[1])\n\n        self._render_road(zoom, trans, angle)\n        self.car.draw(\n            self.surf,\n            zoom,\n            trans,\n            angle,\n            mode not in [\"state_pixels_list\", \"state_pixels\"],\n        )\n\n        self.surf = pygame.transform.flip(self.surf, False, True)\n\n        # showing stats\n        self._render_indicators(WINDOW_W, WINDOW_H)\n\n        font = pygame.font.Font(pygame.font.get_default_font(), 42)\n        text = font.render(\"%04i\" % self.reward, True, (255, 255, 255), (0, 0, 0))\n        text_rect = text.get_rect()\n        text_rect.center = (60, WINDOW_H - WINDOW_H * 2.5 / 40.0)\n        self.surf.blit(text, text_rect)\n\n        if mode == \"human\":\n            
pygame.event.pump()\n            self.clock.tick(self.metadata[\"render_fps\"])\n            assert self.screen is not None\n            self.screen.fill(0)\n            self.screen.blit(self.surf, (0, 0))\n            pygame.display.flip()\n\n        if mode == \"rgb_array\":\n            return self._create_image_array(self.surf, (VIDEO_W, VIDEO_H))\n        elif mode == \"state_pixels\":\n            return self._create_image_array(self.surf, (STATE_W, STATE_H))\n        else:\n            return self.isopen\n\n    def _render_road(self, zoom, translation, angle):\n        bounds = PLAYFIELD\n        field = [\n            (bounds, bounds),\n            (bounds, -bounds),\n            (-bounds, -bounds),\n            (-bounds, bounds),\n        ]\n\n        # draw background\n        self._draw_colored_polygon(\n            self.surf, field, self.bg_color, zoom, translation, angle, clip=False\n        )\n\n        # draw grass patches\n        grass = []\n        for x in range(-20, 20, 2):\n            for y in range(-20, 20, 2):\n                grass.append(\n                    [\n                        (GRASS_DIM * x + GRASS_DIM, GRASS_DIM * y + 0),\n                        (GRASS_DIM * x + 0, GRASS_DIM * y + 0),\n                        (GRASS_DIM * x + 0, GRASS_DIM * y + GRASS_DIM),\n                        (GRASS_DIM * x + GRASS_DIM, GRASS_DIM * y + GRASS_DIM),\n                    ]\n                )\n        for poly in grass:\n            self._draw_colored_polygon(\n                self.surf, poly, self.grass_color, zoom, translation, angle\n            )\n\n        # draw road\n        for poly, color in self.road_poly:\n            # converting to pixel coordinates\n            poly = [(p[0], p[1]) for p in poly]\n            color = [int(c) for c in color]\n            self._draw_colored_polygon(self.surf, poly, color, zoom, translation, angle)\n\n    def _render_indicators(self, W, H):\n        s = W / 40.0\n        h = H / 40.0\n        color 
= (0, 0, 0)\n        polygon = [(W, H), (W, H - 5 * h), (0, H - 5 * h), (0, H)]\n        pygame.draw.polygon(self.surf, color=color, points=polygon)\n\n        def vertical_ind(place, val):\n            return [\n                (place * s, H - (h + h * val)),\n                ((place + 1) * s, H - (h + h * val)),\n                ((place + 1) * s, H - h),\n                ((place + 0) * s, H - h),\n            ]\n\n        def horiz_ind(place, val):\n            return [\n                ((place + 0) * s, H - 4 * h),\n                ((place + val) * s, H - 4 * h),\n                ((place + val) * s, H - 2 * h),\n                ((place + 0) * s, H - 2 * h),\n            ]\n\n        assert self.car is not None\n        true_speed = np.sqrt(\n            np.square(self.car.hull.linearVelocity[0])\n            + np.square(self.car.hull.linearVelocity[1])\n        )\n\n        # simple wrapper to render if the indicator value is above a threshold\n        def render_if_min(value, points, color):\n            if abs(value) > 1e-4:\n                pygame.draw.polygon(self.surf, points=points, color=color)\n\n        render_if_min(true_speed, vertical_ind(5, 0.02 * true_speed), (255, 255, 255))\n        # ABS sensors\n        render_if_min(\n            self.car.wheels[0].omega,\n            vertical_ind(7, 0.01 * self.car.wheels[0].omega),\n            (0, 0, 255),\n        )\n        render_if_min(\n            self.car.wheels[1].omega,\n            vertical_ind(8, 0.01 * self.car.wheels[1].omega),\n            (0, 0, 255),\n        )\n        render_if_min(\n            self.car.wheels[2].omega,\n            vertical_ind(9, 0.01 * self.car.wheels[2].omega),\n            (51, 0, 255),\n        )\n        render_if_min(\n            self.car.wheels[3].omega,\n            vertical_ind(10, 0.01 * self.car.wheels[3].omega),\n            (51, 0, 255),\n        )\n\n        render_if_min(\n            self.car.wheels[0].joint.angle,\n            horiz_ind(20, -10.0 * 
self.car.wheels[0].joint.angle),\n            (0, 255, 0),\n        )\n        render_if_min(\n            self.car.hull.angularVelocity,\n            horiz_ind(30, -0.8 * self.car.hull.angularVelocity),\n            (255, 0, 0),\n        )\n\n    def _draw_colored_polygon(\n        self, surface, poly, color, zoom, translation, angle, clip=True\n    ):\n        poly = [pygame.math.Vector2(c).rotate_rad(angle) for c in poly]\n        poly = [\n            (c[0] * zoom + translation[0], c[1] * zoom + translation[1]) for c in poly\n        ]\n        # This checks if the polygon is out of bounds of the screen, and we skip drawing if so.\n        # Instead of calculating exactly if the polygon and screen overlap,\n        # we simply check if the polygon is in a larger bounding box whose dimension\n        # is greater than the screen by MAX_SHAPE_DIM, which is the maximum\n        # diagonal length of an environment object\n        if not clip or any(\n            (-MAX_SHAPE_DIM <= coord[0] <= WINDOW_W + MAX_SHAPE_DIM)\n            and (-MAX_SHAPE_DIM <= coord[1] <= WINDOW_H + MAX_SHAPE_DIM)\n            for coord in poly\n        ):\n            gfxdraw.aapolygon(self.surf, poly, color)\n            gfxdraw.filled_polygon(self.surf, poly, color)\n\n    def _create_image_array(self, screen, size):\n        scaled_screen = pygame.transform.smoothscale(screen, size)\n        return np.transpose(\n            np.array(pygame.surfarray.pixels3d(scaled_screen)), axes=(1, 0, 2)\n        )\n\n    def close(self):\n        if self.screen is not None:\n            pygame.display.quit()\n            self.isopen = False\n            pygame.quit()\n\n\nif __name__ == \"__main__\":\n    a = np.array([0.0, 0.0, 0.0])\n\n    def register_input():\n        global quit, restart\n        for event in pygame.event.get():\n            if event.type == pygame.KEYDOWN:\n                if event.key == pygame.K_LEFT:\n                    a[0] = -1.0\n                if event.key == 
pygame.K_RIGHT:\n                    a[0] = +1.0\n                if event.key == pygame.K_UP:\n                    a[1] = +1.0\n                if event.key == pygame.K_DOWN:\n                    a[2] = +0.8  # set 1.0 for wheels to block to zero rotation\n                if event.key == pygame.K_RETURN:\n                    restart = True\n                if event.key == pygame.K_ESCAPE:\n                    quit = True\n\n            if event.type == pygame.KEYUP:\n                if event.key == pygame.K_LEFT:\n                    a[0] = 0\n                if event.key == pygame.K_RIGHT:\n                    a[0] = 0\n                if event.key == pygame.K_UP:\n                    a[1] = 0\n                if event.key == pygame.K_DOWN:\n                    a[2] = 0\n\n            if event.type == pygame.QUIT:\n                quit = True\n\n    env = CarRacing(render_mode=\"human\")\n\n    quit = False\n    while not quit:\n        env.reset()\n        total_reward = 0.0\n        steps = 0\n        restart = False\n        while True:\n            register_input()\n            s, r, terminated, truncated, info = env.step(a)\n            total_reward += r\n            if steps % 200 == 0 or terminated or truncated:\n                print(\"\\naction \" + str([f\"{x:+0.2f}\" for x in a]))\n                print(f\"step {steps} total_reward {total_reward:+0.2f}\")\n            steps += 1\n            if terminated or truncated or restart or quit:\n                break\n    env.close()\n"
  },
  {
    "path": "gym/envs/box2d/lunar_lander.py",
    "content": "__credits__ = [\"Andrea PIERRÉ\"]\n\nimport math\nimport warnings\nfrom typing import TYPE_CHECKING, Optional\n\nimport numpy as np\n\nimport gym\nfrom gym import error, spaces\nfrom gym.error import DependencyNotInstalled\nfrom gym.utils import EzPickle, colorize\nfrom gym.utils.step_api_compatibility import step_api_compatibility\n\ntry:\n    import Box2D\n    from Box2D.b2 import (\n        circleShape,\n        contactListener,\n        edgeShape,\n        fixtureDef,\n        polygonShape,\n        revoluteJointDef,\n    )\nexcept ImportError:\n    raise DependencyNotInstalled(\"box2d is not installed, run `pip install gym[box2d]`\")\n\n\nif TYPE_CHECKING:\n    import pygame\n\n\nFPS = 50\nSCALE = 30.0  # affects how fast-paced the game is, forces should be adjusted as well\n\nMAIN_ENGINE_POWER = 13.0\nSIDE_ENGINE_POWER = 0.6\n\nINITIAL_RANDOM = 1000.0  # Set 1500 to make game harder\n\nLANDER_POLY = [(-14, +17), (-17, 0), (-17, -10), (+17, -10), (+17, 0), (+14, +17)]\nLEG_AWAY = 20\nLEG_DOWN = 18\nLEG_W, LEG_H = 2, 8\nLEG_SPRING_TORQUE = 40\n\nSIDE_ENGINE_HEIGHT = 14.0\nSIDE_ENGINE_AWAY = 12.0\n\nVIEWPORT_W = 600\nVIEWPORT_H = 400\n\n\nclass ContactDetector(contactListener):\n    def __init__(self, env):\n        contactListener.__init__(self)\n        self.env = env\n\n    def BeginContact(self, contact):\n        if (\n            self.env.lander == contact.fixtureA.body\n            or self.env.lander == contact.fixtureB.body\n        ):\n            self.env.game_over = True\n        for i in range(2):\n            if self.env.legs[i] in [contact.fixtureA.body, contact.fixtureB.body]:\n                self.env.legs[i].ground_contact = True\n\n    def EndContact(self, contact):\n        for i in range(2):\n            if self.env.legs[i] in [contact.fixtureA.body, contact.fixtureB.body]:\n                self.env.legs[i].ground_contact = False\n\n\nclass LunarLander(gym.Env, EzPickle):\n    \"\"\"\n    ### Description\n    This environment 
is a classic rocket trajectory optimization problem.\n    According to Pontryagin's maximum principle, it is optimal to fire the\n    engine at full throttle or turn it off. This is the reason why this\n    environment has discrete actions: engine on or off.\n\n    There are two environment versions: discrete or continuous.\n    The landing pad is always at coordinates (0,0). The coordinates are the\n    first two numbers in the state vector.\n    Landing outside of the landing pad is possible. Fuel is infinite, so an agent\n    can learn to fly and then land on its first attempt.\n\n    To see a heuristic landing, run:\n    ```\n    python gym/envs/box2d/lunar_lander.py\n    ```\n    <!-- To play yourself, run: -->\n    <!-- python examples/agents/keyboard_agent.py LunarLander-v2 -->\n\n    ### Action Space\n    There are four discrete actions available: do nothing, fire left\n    orientation engine, fire main engine, fire right orientation engine.\n\n    ### Observation Space\n    The state is an 8-dimensional vector: the coordinates of the lander in `x` & `y`, its linear\n    velocities in `x` & `y`, its angle, its angular velocity, and two booleans\n    that represent whether each leg is in contact with the ground or not.\n\n    ### Rewards\n    After every step a reward is granted. 
The total reward of an episode is the\n    sum of the rewards for all the steps within that episode.\n\n    For each step, the reward:\n    - is increased/decreased the closer/further the lander is to the landing pad.\n    - is increased/decreased the slower/faster the lander is moving.\n    - is decreased the more the lander is tilted (angle not horizontal).\n    - is increased by 10 points for each leg that is in contact with the ground.\n    - is decreased by 0.03 points each frame a side engine is firing.\n    - is decreased by 0.3 points each frame the main engine is firing.\n\n    The episode receives an additional reward of -100 or +100 points for crashing or landing safely, respectively.\n\n    An episode is considered a solution if it scores at least 200 points.\n\n    ### Starting State\n    The lander starts at the top center of the viewport with a random initial\n    force applied to its center of mass.\n\n    ### Episode Termination\n    The episode finishes if:\n    1) the lander crashes (the lander body gets in contact with the moon);\n    2) the lander gets outside of the viewport (`x` coordinate is greater than 1);\n    3) the lander is not awake. From the [Box2D docs](https://box2d.org/documentation/md__d_1__git_hub_box2d_docs_dynamics.html#autotoc_md61),\n        a body which is not awake is a body which doesn't move and doesn't\n        collide with any other body:\n    > When Box2D determines that a body (or group of bodies) has come to rest,\n    > the body enters a sleep state which has very little CPU overhead. If a\n    > body is awake and collides with a sleeping body, then the sleeping body\n    > wakes up. 
Bodies will also wake up if a joint or contact attached to\n    > them is destroyed.\n\n    ### Arguments\n    To use the _continuous_ environment, pass `continuous=True` to `gym.make`.\n    The call below lists every argument with its default value:\n    ```python\n    import gym\n    env = gym.make(\n        \"LunarLander-v2\",\n        continuous=False,\n        gravity=-10.0,\n        enable_wind=False,\n        wind_power=15.0,\n        turbulence_power=1.5,\n    )\n    ```\n    If `continuous=True` is passed, continuous actions (corresponding to the throttle of the engines) will be used and the\n    action space will be `Box(-1, +1, (2,), dtype=np.float32)`.\n    The first coordinate of an action determines the throttle of the main engine, while the second\n    coordinate specifies the throttle of the lateral boosters.\n    Given an action `np.array([main, lateral])`, the main engine will be turned off completely if\n    `main < 0` and the throttle scales affinely from 50% to 100% for `0 <= main <= 1` (in particular, the\n    main engine doesn't work with less than 50% power).\n    Similarly, if `-0.5 < lateral < 0.5`, the lateral boosters will not fire at all. If `lateral < -0.5`, the left\n    booster will fire, and if `lateral > 0.5`, the right booster will fire. Again, the throttle scales affinely\n    from 50% to 100% between -1 and -0.5 (and 0.5 and 1, respectively).\n\n    `gravity` dictates the gravitational constant; it must be strictly between -12 and 0.\n\n    If `enable_wind=True` is passed, there will be wind effects applied to the lander.\n    The wind is generated using the function `tanh(sin(2 k (t+C)) + sin(pi k (t+C)))`.\n    `k` is set to 0.01.\n    `C` is sampled randomly between -9999 and 9999.\n\n    `wind_power` dictates the maximum magnitude of linear wind applied to the craft. 
The recommended value for `wind_power` is between 0.0 and 20.0.\n    `turbulence_power` dictates the maximum magnitude of rotational wind applied to the craft. The recommended value for `turbulence_power` is between 0.0 and 2.0.\n\n    ### Version History\n    - v2: Count energy spent. In v0.24, added wind and turbulence, controlled by the `wind_power` and `turbulence_power` parameters\n    - v1: Leg contact with the ground added to the state vector; contact with the ground\n        gives +10 reward points, and -10 if contact is then lost; reward\n        renormalized to 200; harder initial random push.\n    - v0: Initial version\n\n    <!-- ### References -->\n\n    ### Credits\n    Created by Oleg Klimov\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\"human\", \"rgb_array\"],\n        \"render_fps\": FPS,\n    }\n\n    def __init__(\n        self,\n        render_mode: Optional[str] = None,\n        continuous: bool = False,\n        gravity: float = -10.0,\n        enable_wind: bool = False,\n        wind_power: float = 15.0,\n        turbulence_power: float = 1.5,\n    ):\n        EzPickle.__init__(\n            self,\n            render_mode,\n            continuous,\n            gravity,\n            enable_wind,\n            wind_power,\n            turbulence_power,\n        )\n\n        assert (\n            -12.0 < gravity and gravity < 0.0\n        ), f\"gravity (current value: {gravity}) must be between -12 and 0\"\n        self.gravity = gravity\n\n        if 0.0 > wind_power or wind_power > 20.0:\n            warnings.warn(\n                colorize(\n                    f\"WARN: wind_power value is recommended to be between 0.0 and 20.0, (current value: {wind_power})\",\n                    \"yellow\",\n                ),\n            )\n        self.wind_power = wind_power\n\n        if 0.0 > turbulence_power or turbulence_power > 2.0:\n            warnings.warn(\n                colorize(\n                    f\"WARN: turbulence_power value is recommended to be between 0.0 
and 2.0, (current value: {turbulence_power})\",\n                    \"yellow\",\n                ),\n            )\n        self.turbulence_power = turbulence_power\n\n        self.enable_wind = enable_wind\n        self.wind_idx = np.random.randint(-9999, 9999)\n        self.torque_idx = np.random.randint(-9999, 9999)\n\n        self.screen: pygame.Surface = None\n        self.clock = None\n        self.isopen = True\n        self.world = Box2D.b2World(gravity=(0, gravity))\n        self.moon = None\n        self.lander: Optional[Box2D.b2Body] = None\n        self.particles = []\n\n        self.prev_reward = None\n\n        self.continuous = continuous\n\n        low = np.array(\n            [\n                # these are bounds for position\n                # realistically the environment should have ended\n                # long before we reach more than 50% outside\n                -1.5,\n                -1.5,\n                # velocity bounds is 5x rated speed\n                -5.0,\n                -5.0,\n                -math.pi,\n                -5.0,\n                -0.0,\n                -0.0,\n            ]\n        ).astype(np.float32)\n        high = np.array(\n            [\n                # these are bounds for position\n                # realistically the environment should have ended\n                # long before we reach more than 50% outside\n                1.5,\n                1.5,\n                # velocity bounds is 5x rated speed\n                5.0,\n                5.0,\n                math.pi,\n                5.0,\n                1.0,\n                1.0,\n            ]\n        ).astype(np.float32)\n\n        # useful range is -1 .. +1, but spikes can be higher\n        self.observation_space = spaces.Box(low, high)\n\n        if self.continuous:\n            # Action is two floats [main engine, left-right engines].\n            # Main engine: -1..0 off, 0..+1 throttle from 50% to 100% power. 
Engine can't work with less than 50% power.\n            # Left-right:  -1.0..-0.5 fire left engine, +0.5..+1.0 fire right engine, -0.5..0.5 off\n            self.action_space = spaces.Box(-1, +1, (2,), dtype=np.float32)\n        else:\n            # Nop, fire left engine, main engine, right engine\n            self.action_space = spaces.Discrete(4)\n\n        self.render_mode = render_mode\n\n    def _destroy(self):\n        if not self.moon:\n            return\n        self.world.contactListener = None\n        self._clean_particles(True)\n        self.world.DestroyBody(self.moon)\n        self.moon = None\n        self.world.DestroyBody(self.lander)\n        self.lander = None\n        self.world.DestroyBody(self.legs[0])\n        self.world.DestroyBody(self.legs[1])\n\n    def reset(\n        self,\n        *,\n        seed: Optional[int] = None,\n        options: Optional[dict] = None,\n    ):\n        super().reset(seed=seed)\n        self._destroy()\n        self.world.contactListener_keepref = ContactDetector(self)\n        self.world.contactListener = self.world.contactListener_keepref\n        self.game_over = False\n        self.prev_shaping = None\n\n        W = VIEWPORT_W / SCALE\n        H = VIEWPORT_H / SCALE\n\n        # terrain\n        CHUNKS = 11\n        height = self.np_random.uniform(0, H / 2, size=(CHUNKS + 1,))\n        chunk_x = [W / (CHUNKS - 1) * i for i in range(CHUNKS)]\n        self.helipad_x1 = chunk_x[CHUNKS // 2 - 1]\n        self.helipad_x2 = chunk_x[CHUNKS // 2 + 1]\n        self.helipad_y = H / 4\n        height[CHUNKS // 2 - 2] = self.helipad_y\n        height[CHUNKS // 2 - 1] = self.helipad_y\n        height[CHUNKS // 2 + 0] = self.helipad_y\n        height[CHUNKS // 2 + 1] = self.helipad_y\n        height[CHUNKS // 2 + 2] = self.helipad_y\n        smooth_y = [\n            0.33 * (height[i - 1] + height[i + 0] + height[i + 1])\n            for i in range(CHUNKS)\n        ]\n\n        self.moon = self.world.CreateStaticBody(\n 
           shapes=edgeShape(vertices=[(0, 0), (W, 0)])\n        )\n        self.sky_polys = []\n        for i in range(CHUNKS - 1):\n            p1 = (chunk_x[i], smooth_y[i])\n            p2 = (chunk_x[i + 1], smooth_y[i + 1])\n            self.moon.CreateEdgeFixture(vertices=[p1, p2], density=0, friction=0.1)\n            self.sky_polys.append([p1, p2, (p2[0], H), (p1[0], H)])\n\n        self.moon.color1 = (0.0, 0.0, 0.0)\n        self.moon.color2 = (0.0, 0.0, 0.0)\n\n        initial_y = VIEWPORT_H / SCALE\n        self.lander: Box2D.b2Body = self.world.CreateDynamicBody(\n            position=(VIEWPORT_W / SCALE / 2, initial_y),\n            angle=0.0,\n            fixtures=fixtureDef(\n                shape=polygonShape(\n                    vertices=[(x / SCALE, y / SCALE) for x, y in LANDER_POLY]\n                ),\n                density=5.0,\n                friction=0.1,\n                categoryBits=0x0010,\n                maskBits=0x001,  # collide only with ground\n                restitution=0.0,\n            ),  # 0.99 bouncy\n        )\n        self.lander.color1 = (128, 102, 230)\n        self.lander.color2 = (77, 77, 128)\n        self.lander.ApplyForceToCenter(\n            (\n                self.np_random.uniform(-INITIAL_RANDOM, INITIAL_RANDOM),\n                self.np_random.uniform(-INITIAL_RANDOM, INITIAL_RANDOM),\n            ),\n            True,\n        )\n\n        self.legs = []\n        for i in [-1, +1]:\n            leg = self.world.CreateDynamicBody(\n                position=(VIEWPORT_W / SCALE / 2 - i * LEG_AWAY / SCALE, initial_y),\n                angle=(i * 0.05),\n                fixtures=fixtureDef(\n                    shape=polygonShape(box=(LEG_W / SCALE, LEG_H / SCALE)),\n                    density=1.0,\n                    restitution=0.0,\n                    categoryBits=0x0020,\n                    maskBits=0x001,\n                ),\n            )\n            leg.ground_contact = False\n            leg.color1 
= (128, 102, 230)\n            leg.color2 = (77, 77, 128)\n            rjd = revoluteJointDef(\n                bodyA=self.lander,\n                bodyB=leg,\n                localAnchorA=(0, 0),\n                localAnchorB=(i * LEG_AWAY / SCALE, LEG_DOWN / SCALE),\n                enableMotor=True,\n                enableLimit=True,\n                maxMotorTorque=LEG_SPRING_TORQUE,\n                motorSpeed=+0.3 * i,  # low enough not to jump back into the sky\n            )\n            if i == -1:\n                rjd.lowerAngle = (\n                    +0.9 - 0.5\n                )  # The most esoteric numbers here, angled legs have freedom to travel within\n                rjd.upperAngle = +0.9\n            else:\n                rjd.lowerAngle = -0.9\n                rjd.upperAngle = -0.9 + 0.5\n            leg.joint = self.world.CreateJoint(rjd)\n            self.legs.append(leg)\n\n        self.drawlist = [self.lander] + self.legs\n\n        if self.render_mode == \"human\":\n            self.render()\n        return self.step(np.array([0, 0]) if self.continuous else 0)[0], {}\n\n    def _create_particle(self, mass, x, y, ttl):\n        p = self.world.CreateDynamicBody(\n            position=(x, y),\n            angle=0.0,\n            fixtures=fixtureDef(\n                shape=circleShape(radius=2 / SCALE, pos=(0, 0)),\n                density=mass,\n                friction=0.1,\n                categoryBits=0x0100,\n                maskBits=0x001,  # collide only with ground\n                restitution=0.3,\n            ),\n        )\n        p.ttl = ttl\n        self.particles.append(p)\n        self._clean_particles(False)\n        return p\n\n    def _clean_particles(self, all):\n        while self.particles and (all or self.particles[0].ttl < 0):\n            self.world.DestroyBody(self.particles.pop(0))\n\n    def step(self, action):\n        assert self.lander is not None\n\n        # Update wind\n        assert self.lander is not None, 
\"You forgot to call reset()\"\n        if self.enable_wind and not (\n            self.legs[0].ground_contact or self.legs[1].ground_contact\n        ):\n            # the function used for wind is tanh(sin(2 k x) + sin(pi k x)),\n            # which is proven to never be periodic, k = 0.01\n            wind_mag = (\n                math.tanh(\n                    math.sin(0.02 * self.wind_idx)\n                    + (math.sin(math.pi * 0.01 * self.wind_idx))\n                )\n                * self.wind_power\n            )\n            self.wind_idx += 1\n            self.lander.ApplyForceToCenter(\n                (wind_mag, 0.0),\n                True,\n            )\n\n            # the function used for torque is tanh(sin(2 k x) + sin(pi k x)),\n            # which is proven to never be periodic, k = 0.01\n            torque_mag = math.tanh(\n                math.sin(0.02 * self.torque_idx)\n                + (math.sin(math.pi * 0.01 * self.torque_idx))\n            ) * (self.turbulence_power)\n            self.torque_idx += 1\n            self.lander.ApplyTorque(\n                (torque_mag),\n                True,\n            )\n\n        if self.continuous:\n            action = np.clip(action, -1, +1).astype(np.float32)\n        else:\n            assert self.action_space.contains(\n                action\n            ), f\"{action!r} ({type(action)}) invalid \"\n\n        # Engines\n        tip = (math.sin(self.lander.angle), math.cos(self.lander.angle))\n        side = (-tip[1], tip[0])\n        dispersion = [self.np_random.uniform(-1.0, +1.0) / SCALE for _ in range(2)]\n\n        m_power = 0.0\n        if (self.continuous and action[0] > 0.0) or (\n            not self.continuous and action == 2\n        ):\n            # Main engine\n            if self.continuous:\n                m_power = (np.clip(action[0], 0.0, 1.0) + 1.0) * 0.5  # 0.5..1.0\n                assert m_power >= 0.5 and m_power <= 1.0\n            else:\n                m_power 
= 1.0\n            # 4 is move a bit downwards, +-2 for randomness\n            ox = tip[0] * (4 / SCALE + 2 * dispersion[0]) + side[0] * dispersion[1]\n            oy = -tip[1] * (4 / SCALE + 2 * dispersion[0]) - side[1] * dispersion[1]\n            impulse_pos = (self.lander.position[0] + ox, self.lander.position[1] + oy)\n            p = self._create_particle(\n                3.5,  # 3.5 is here to make particle speed adequate\n                impulse_pos[0],\n                impulse_pos[1],\n                m_power,\n            )  # particles are just a decoration\n            p.ApplyLinearImpulse(\n                (ox * MAIN_ENGINE_POWER * m_power, oy * MAIN_ENGINE_POWER * m_power),\n                impulse_pos,\n                True,\n            )\n            self.lander.ApplyLinearImpulse(\n                (-ox * MAIN_ENGINE_POWER * m_power, -oy * MAIN_ENGINE_POWER * m_power),\n                impulse_pos,\n                True,\n            )\n\n        s_power = 0.0\n        if (self.continuous and np.abs(action[1]) > 0.5) or (\n            not self.continuous and action in [1, 3]\n        ):\n            # Orientation engines\n            if self.continuous:\n                direction = np.sign(action[1])\n                s_power = np.clip(np.abs(action[1]), 0.5, 1.0)\n                assert s_power >= 0.5 and s_power <= 1.0\n            else:\n                direction = action - 2\n                s_power = 1.0\n            ox = tip[0] * dispersion[0] + side[0] * (\n                3 * dispersion[1] + direction * SIDE_ENGINE_AWAY / SCALE\n            )\n            oy = -tip[1] * dispersion[0] - side[1] * (\n                3 * dispersion[1] + direction * SIDE_ENGINE_AWAY / SCALE\n            )\n            impulse_pos = (\n                self.lander.position[0] + ox - tip[0] * 17 / SCALE,\n                self.lander.position[1] + oy + tip[1] * SIDE_ENGINE_HEIGHT / SCALE,\n            )\n            p = self._create_particle(0.7, impulse_pos[0], 
impulse_pos[1], s_power)\n            p.ApplyLinearImpulse(\n                (ox * SIDE_ENGINE_POWER * s_power, oy * SIDE_ENGINE_POWER * s_power),\n                impulse_pos,\n                True,\n            )\n            self.lander.ApplyLinearImpulse(\n                (-ox * SIDE_ENGINE_POWER * s_power, -oy * SIDE_ENGINE_POWER * s_power),\n                impulse_pos,\n                True,\n            )\n\n        self.world.Step(1.0 / FPS, 6 * 30, 2 * 30)\n\n        pos = self.lander.position\n        vel = self.lander.linearVelocity\n        state = [\n            (pos.x - VIEWPORT_W / SCALE / 2) / (VIEWPORT_W / SCALE / 2),\n            (pos.y - (self.helipad_y + LEG_DOWN / SCALE)) / (VIEWPORT_H / SCALE / 2),\n            vel.x * (VIEWPORT_W / SCALE / 2) / FPS,\n            vel.y * (VIEWPORT_H / SCALE / 2) / FPS,\n            self.lander.angle,\n            20.0 * self.lander.angularVelocity / FPS,\n            1.0 if self.legs[0].ground_contact else 0.0,\n            1.0 if self.legs[1].ground_contact else 0.0,\n        ]\n        assert len(state) == 8\n\n        reward = 0\n        shaping = (\n            -100 * np.sqrt(state[0] * state[0] + state[1] * state[1])\n            - 100 * np.sqrt(state[2] * state[2] + state[3] * state[3])\n            - 100 * abs(state[4])\n            + 10 * state[6]\n            + 10 * state[7]\n        )  # And ten points for legs contact, the idea is if you\n        # lose contact again after landing, you get negative reward\n        if self.prev_shaping is not None:\n            reward = shaping - self.prev_shaping\n        self.prev_shaping = shaping\n\n        reward -= (\n            m_power * 0.30\n        )  # less fuel spent is better, about -30 for heuristic landing\n        reward -= s_power * 0.03\n\n        terminated = False\n        if self.game_over or abs(state[0]) >= 1.0:\n            terminated = True\n            reward = -100\n        if not self.lander.awake:\n            terminated = True\n        
    reward = +100\n\n        if self.render_mode == \"human\":\n            self.render()\n        return np.array(state, dtype=np.float32), reward, terminated, False, {}\n\n    def render(self):\n        if self.render_mode is None:\n            gym.logger.warn(\n                \"You are calling render method without specifying any render mode. \"\n                \"You can specify the render_mode at initialization, \"\n                f'e.g. gym(\"{self.spec.id}\", render_mode=\"rgb_array\")'\n            )\n            return\n\n        try:\n            import pygame\n            from pygame import gfxdraw\n        except ImportError:\n            raise DependencyNotInstalled(\n                \"pygame is not installed, run `pip install gym[box2d]`\"\n            )\n\n        if self.screen is None and self.render_mode == \"human\":\n            pygame.init()\n            pygame.display.init()\n            self.screen = pygame.display.set_mode((VIEWPORT_W, VIEWPORT_H))\n        if self.clock is None:\n            self.clock = pygame.time.Clock()\n\n        self.surf = pygame.Surface((VIEWPORT_W, VIEWPORT_H))\n\n        pygame.transform.scale(self.surf, (SCALE, SCALE))\n        pygame.draw.rect(self.surf, (255, 255, 255), self.surf.get_rect())\n\n        for obj in self.particles:\n            obj.ttl -= 0.15\n            obj.color1 = (\n                int(max(0.2, 0.15 + obj.ttl) * 255),\n                int(max(0.2, 0.5 * obj.ttl) * 255),\n                int(max(0.2, 0.5 * obj.ttl) * 255),\n            )\n            obj.color2 = (\n                int(max(0.2, 0.15 + obj.ttl) * 255),\n                int(max(0.2, 0.5 * obj.ttl) * 255),\n                int(max(0.2, 0.5 * obj.ttl) * 255),\n            )\n\n        self._clean_particles(False)\n\n        for p in self.sky_polys:\n            scaled_poly = []\n            for coord in p:\n                scaled_poly.append((coord[0] * SCALE, coord[1] * SCALE))\n            pygame.draw.polygon(self.surf, (0, 
0, 0), scaled_poly)\n            gfxdraw.aapolygon(self.surf, scaled_poly, (0, 0, 0))\n\n        for obj in self.particles + self.drawlist:\n            for f in obj.fixtures:\n                trans = f.body.transform\n                if type(f.shape) is circleShape:\n                    pygame.draw.circle(\n                        self.surf,\n                        color=obj.color1,\n                        center=trans * f.shape.pos * SCALE,\n                        radius=f.shape.radius * SCALE,\n                    )\n                    pygame.draw.circle(\n                        self.surf,\n                        color=obj.color2,\n                        center=trans * f.shape.pos * SCALE,\n                        radius=f.shape.radius * SCALE,\n                    )\n\n                else:\n                    path = [trans * v * SCALE for v in f.shape.vertices]\n                    pygame.draw.polygon(self.surf, color=obj.color1, points=path)\n                    gfxdraw.aapolygon(self.surf, path, obj.color1)\n                    pygame.draw.aalines(\n                        self.surf, color=obj.color2, points=path, closed=True\n                    )\n\n                for x in [self.helipad_x1, self.helipad_x2]:\n                    x = x * SCALE\n                    flagy1 = self.helipad_y * SCALE\n                    flagy2 = flagy1 + 50\n                    pygame.draw.line(\n                        self.surf,\n                        color=(255, 255, 255),\n                        start_pos=(x, flagy1),\n                        end_pos=(x, flagy2),\n                        width=1,\n                    )\n                    pygame.draw.polygon(\n                        self.surf,\n                        color=(204, 204, 0),\n                        points=[\n                            (x, flagy2),\n                            (x, flagy2 - 10),\n                            (x + 25, flagy2 - 5),\n                        ],\n                    
)\n                    gfxdraw.aapolygon(\n                        self.surf,\n                        [(x, flagy2), (x, flagy2 - 10), (x + 25, flagy2 - 5)],\n                        (204, 204, 0),\n                    )\n\n        self.surf = pygame.transform.flip(self.surf, False, True)\n\n        if self.render_mode == \"human\":\n            assert self.screen is not None\n            self.screen.blit(self.surf, (0, 0))\n            pygame.event.pump()\n            self.clock.tick(self.metadata[\"render_fps\"])\n            pygame.display.flip()\n        elif self.render_mode == \"rgb_array\":\n            return np.transpose(\n                np.array(pygame.surfarray.pixels3d(self.surf)), axes=(1, 0, 2)\n            )\n\n    def close(self):\n        if self.screen is not None:\n            import pygame\n\n            pygame.display.quit()\n            pygame.quit()\n            self.isopen = False\n\n\ndef heuristic(env, s):\n    \"\"\"\n    The heuristic for\n    1. Testing\n    2. Demonstration rollout.\n\n    Args:\n        env: The environment\n        s (list): The state. 
Attributes:\n            s[0] is the horizontal coordinate\n            s[1] is the vertical coordinate\n            s[2] is the horizontal speed\n            s[3] is the vertical speed\n            s[4] is the angle\n            s[5] is the angular speed\n            s[6] 1 if first leg has contact, else 0\n            s[7] 1 if second leg has contact, else 0\n\n    Returns:\n         a: The heuristic to be fed into the step function defined above to determine the next step and reward.\n    \"\"\"\n\n    angle_targ = s[0] * 0.5 + s[2] * 1.0  # angle should point towards center\n    if angle_targ > 0.4:\n        angle_targ = 0.4  # more than 0.4 radians (22 degrees) is bad\n    if angle_targ < -0.4:\n        angle_targ = -0.4\n    hover_targ = 0.55 * np.abs(\n        s[0]\n    )  # target y should be proportional to horizontal offset\n\n    angle_todo = (angle_targ - s[4]) * 0.5 - (s[5]) * 1.0\n    hover_todo = (hover_targ - s[1]) * 0.5 - (s[3]) * 0.5\n\n    if s[6] or s[7]:  # legs have contact\n        angle_todo = 0\n        hover_todo = (\n            -(s[3]) * 0.5\n        )  # override to reduce fall speed, that's all we need after contact\n\n    if env.continuous:\n        a = np.array([hover_todo * 20 - 1, -angle_todo * 20])\n        a = np.clip(a, -1, +1)\n    else:\n        a = 0\n        if hover_todo > np.abs(angle_todo) and hover_todo > 0.05:\n            a = 2\n        elif angle_todo < -0.05:\n            a = 3\n        elif angle_todo > +0.05:\n            a = 1\n    return a\n\n\ndef demo_heuristic_lander(env, seed=None, render=False):\n\n    total_reward = 0\n    steps = 0\n    s, info = env.reset(seed=seed)\n    while True:\n        a = heuristic(env, s)\n        s, r, terminated, truncated, info = step_api_compatibility(env.step(a), True)\n        total_reward += r\n\n        if render:\n            still_open = env.render()\n            if still_open is False:\n                break\n\n        if steps % 20 == 0 or terminated or truncated:\n    
        print(\"observations:\", \" \".join([f\"{x:+0.2f}\" for x in s]))\n            print(f\"step {steps} total_reward {total_reward:+0.2f}\")\n        steps += 1\n        if terminated or truncated:\n            break\n    if render:\n        env.close()\n    return total_reward\n\n\nclass LunarLanderContinuous:\n    def __init__(self):\n        raise error.Error(\n            \"Error initializing LunarLanderContinuous Environment.\\n\"\n            \"Currently, we do not support initializing this mode of environment by calling the class directly.\\n\"\n            \"To use this environment, instead create it by specifying the continuous keyword in gym.make, i.e.\\n\"\n            'gym.make(\"LunarLander-v2\", continuous=True)'\n        )\n\n\nif __name__ == \"__main__\":\n    demo_heuristic_lander(LunarLander(), render=True)\n"
  },
  {
    "path": "gym/envs/classic_control/__init__.py",
    "content": "from gym.envs.classic_control.acrobot import AcrobotEnv\nfrom gym.envs.classic_control.cartpole import CartPoleEnv\nfrom gym.envs.classic_control.continuous_mountain_car import Continuous_MountainCarEnv\nfrom gym.envs.classic_control.mountain_car import MountainCarEnv\nfrom gym.envs.classic_control.pendulum import PendulumEnv\n"
  },
  {
    "path": "gym/envs/classic_control/acrobot.py",
    "content": "\"\"\"classic Acrobot task\"\"\"\nfrom typing import Optional\n\nimport numpy as np\nfrom numpy import cos, pi, sin\n\nfrom gym import core, logger, spaces\nfrom gym.error import DependencyNotInstalled\n\n__copyright__ = \"Copyright 2013, RLPy http://acl.mit.edu/RLPy\"\n__credits__ = [\n    \"Alborz Geramifard\",\n    \"Robert H. Klein\",\n    \"Christoph Dann\",\n    \"William Dabney\",\n    \"Jonathan P. How\",\n]\n__license__ = \"BSD 3-Clause\"\n__author__ = \"Christoph Dann <cdann@cdann.de>\"\n\n# SOURCE:\n# https://github.com/rlpy/rlpy/blob/master/rlpy/Domains/Acrobot.py\nfrom gym.envs.classic_control import utils\n\n\nclass AcrobotEnv(core.Env):\n    \"\"\"\n    ### Description\n\n    The Acrobot environment is based on Sutton's work in\n    [\"Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding\"](https://papers.nips.cc/paper/1995/hash/8f1d43620bc6bb580df6e80b0dc05c48-Abstract.html)\n    and [Sutton and Barto's book](http://www.incompleteideas.net/book/the-book-2nd.html).\n    The system consists of two links connected linearly to form a chain, with one end of\n    the chain fixed. The joint between the two links is actuated. The goal is to apply\n    torques on the actuated joint to swing the free end of the linear chain above a\n    given height while starting from the initial state of hanging downwards.\n\n    As seen in the **Gif**: two blue links connected by two green joints. The joint in\n    between the two links is actuated. 
The goal is to swing the free end of the outer-link\n    to reach the target height (black horizontal line above system) by applying torque on\n    the actuator.\n\n    ### Action Space\n\n    The action is discrete, deterministic, and represents the torque applied on the actuated\n    joint between the two links.\n\n    | Num | Action                                | Unit         |\n    |-----|---------------------------------------|--------------|\n    | 0   | apply -1 torque to the actuated joint | torque (N m) |\n    | 1   | apply 0 torque to the actuated joint  | torque (N m) |\n    | 2   | apply 1 torque to the actuated joint  | torque (N m) |\n\n    ### Observation Space\n\n    The observation is a `ndarray` with shape `(6,)` that provides information about the\n    two rotational joint angles as well as their angular velocities:\n\n    | Num | Observation                  | Min                 | Max               |\n    |-----|------------------------------|---------------------|-------------------|\n    | 0   | Cosine of `theta1`           | -1                  | 1                 |\n    | 1   | Sine of `theta1`             | -1                  | 1                 |\n    | 2   | Cosine of `theta2`           | -1                  | 1                 |\n    | 3   | Sine of `theta2`             | -1                  | 1                 |\n    | 4   | Angular velocity of `theta1` | ~ -12.567 (-4 * pi) | ~ 12.567 (4 * pi) |\n    | 5   | Angular velocity of `theta2` | ~ -28.274 (-9 * pi) | ~ 28.274 (9 * pi) |\n\n    where\n    - `theta1` is the angle of the first joint, where an angle of 0 indicates the first link is pointing directly\n    downwards.\n    - `theta2` is ***relative to the angle of the first link.***\n        An angle of 0 corresponds to having the same angle between the two links.\n\n    The angular velocities of `theta1` and `theta2` are bounded at ±4π, and ±9π rad/s respectively.\n    A state of `[1, 0, 1, 0, ..., ...]` indicates that both 
links are pointing downwards.\n\n    ### Rewards\n\n    The goal is to have the free end reach a designated target height in as few steps as possible,\n    and as such all steps that do not reach the goal incur a reward of -1.\n    Achieving the target height results in termination with a reward of 0. The reward threshold is -100.\n\n    ### Starting State\n\n    Each parameter in the underlying state (`theta1`, `theta2`, and the two angular velocities) is initialized\n    uniformly between -0.1 and 0.1. This means both links are pointing downwards with some initial stochasticity.\n\n    ### Episode End\n\n    The episode ends if one of the following occurs:\n    1. Termination: The free end reaches the target height, which is defined as:\n    `-cos(theta1) - cos(theta2 + theta1) > 1.0`\n    2. Truncation: Episode length is greater than 500 (200 for v0)\n\n    ### Arguments\n\n    No additional arguments are currently supported.\n\n    ```\n    import gym\n    env = gym.make('Acrobot-v1')\n    ```\n\n    By default, the dynamics of the acrobot follow those described in Sutton and Barto's book\n    [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/11/node4.html).\n    However, a `book_or_nips` parameter can be modified to change the pendulum dynamics to those described\n    in the original [NeurIPS paper](https://papers.nips.cc/paper/1995/hash/8f1d43620bc6bb580df6e80b0dc05c48-Abstract.html).\n\n    ```\n    # To change the dynamics as described above, set the attribute on the\n    # underlying environment rather than on a wrapper\n    env.unwrapped.book_or_nips = 'nips'\n    ```\n\n    See the following note and\n    the [implementation](https://github.com/openai/gym/blob/master/gym/envs/classic_control/acrobot.py) for details:\n\n    > The dynamics equations were missing some terms in the NIPS paper which\n            are present in the book. R. 
Sutton confirmed in personal correspondence\n            that the experimental results shown in the paper and the book were\n            generated with the equations shown in the book.\n            However, there is the option to run the domain with the paper equations\n            by setting `book_or_nips = 'nips'`\n\n\n    ### Version History\n\n    - v1: Maximum number of steps increased from 200 to 500. The observation space for v0 provided direct readings of\n    `theta1` and `theta2` in radians, having a range of `[-pi, pi]`. The v1 observation space as described here provides the\n    sine and cosine of each angle instead.\n    - v0: Initial version release (1.0.0) (removed from gym for v1)\n\n    ### References\n    - Sutton, R. S. (1996). Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding.\n        In D. Touretzky, M. C. Mozer, & M. Hasselmo (Eds.), Advances in Neural Information Processing Systems (Vol. 8).\n        MIT Press. https://proceedings.neurips.cc/paper/1995/file/8f1d43620bc6bb580df6e80b0dc05c48-Paper.pdf\n    - Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction. 
The MIT Press.\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\"human\", \"rgb_array\"],\n        \"render_fps\": 15,\n    }\n\n    dt = 0.2\n\n    LINK_LENGTH_1 = 1.0  # [m]\n    LINK_LENGTH_2 = 1.0  # [m]\n    LINK_MASS_1 = 1.0  #: [kg] mass of link 1\n    LINK_MASS_2 = 1.0  #: [kg] mass of link 2\n    LINK_COM_POS_1 = 0.5  #: [m] position of the center of mass of link 1\n    LINK_COM_POS_2 = 0.5  #: [m] position of the center of mass of link 2\n    LINK_MOI = 1.0  #: moments of inertia for both links\n\n    MAX_VEL_1 = 4 * pi\n    MAX_VEL_2 = 9 * pi\n\n    AVAIL_TORQUE = [-1.0, 0.0, +1]\n\n    torque_noise_max = 0.0\n\n    SCREEN_DIM = 500\n\n    #: use dynamics equations from the nips paper or the book\n    book_or_nips = \"book\"\n    action_arrow = None\n    domain_fig = None\n    actions_num = 3\n\n    def __init__(self, render_mode: Optional[str] = None):\n        self.render_mode = render_mode\n        self.screen = None\n        self.clock = None\n        self.isopen = True\n        high = np.array(\n            [1.0, 1.0, 1.0, 1.0, self.MAX_VEL_1, self.MAX_VEL_2], dtype=np.float32\n        )\n        low = -high\n        self.observation_space = spaces.Box(low=low, high=high, dtype=np.float32)\n        self.action_space = spaces.Discrete(3)\n        self.state = None\n\n    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):\n        super().reset(seed=seed)\n        # Note that if you use custom reset bounds, it may lead to out-of-bound\n        # state/observations.\n        low, high = utils.maybe_parse_reset_bounds(\n            options, -0.1, 0.1  # default low\n        )  # default high\n        self.state = self.np_random.uniform(low=low, high=high, size=(4,)).astype(\n            np.float32\n        )\n\n        if self.render_mode == \"human\":\n            self.render()\n        return self._get_ob(), {}\n\n    def step(self, a):\n        s = self.state\n        assert s is not None, \"Call reset before 
using AcrobotEnv object.\"\n        torque = self.AVAIL_TORQUE[a]\n\n        # Add noise to the force action\n        if self.torque_noise_max > 0:\n            torque += self.np_random.uniform(\n                -self.torque_noise_max, self.torque_noise_max\n            )\n\n        # Now, augment the state with our force action so it can be passed to\n        # _dsdt\n        s_augmented = np.append(s, torque)\n\n        ns = rk4(self._dsdt, s_augmented, [0, self.dt])\n\n        ns[0] = wrap(ns[0], -pi, pi)\n        ns[1] = wrap(ns[1], -pi, pi)\n        ns[2] = bound(ns[2], -self.MAX_VEL_1, self.MAX_VEL_1)\n        ns[3] = bound(ns[3], -self.MAX_VEL_2, self.MAX_VEL_2)\n        self.state = ns\n        terminated = self._terminal()\n        reward = -1.0 if not terminated else 0.0\n\n        if self.render_mode == \"human\":\n            self.render()\n        return (self._get_ob(), reward, terminated, False, {})\n\n    def _get_ob(self):\n        s = self.state\n        assert s is not None, \"Call reset before using AcrobotEnv object.\"\n        return np.array(\n            [cos(s[0]), sin(s[0]), cos(s[1]), sin(s[1]), s[2], s[3]], dtype=np.float32\n        )\n\n    def _terminal(self):\n        s = self.state\n        assert s is not None, \"Call reset before using AcrobotEnv object.\"\n        return bool(-cos(s[0]) - cos(s[1] + s[0]) > 1.0)\n\n    def _dsdt(self, s_augmented):\n        m1 = self.LINK_MASS_1\n        m2 = self.LINK_MASS_2\n        l1 = self.LINK_LENGTH_1\n        lc1 = self.LINK_COM_POS_1\n        lc2 = self.LINK_COM_POS_2\n        I1 = self.LINK_MOI\n        I2 = self.LINK_MOI\n        g = 9.8\n        a = s_augmented[-1]\n        s = s_augmented[:-1]\n        theta1 = s[0]\n        theta2 = s[1]\n        dtheta1 = s[2]\n        dtheta2 = s[3]\n        d1 = (\n            m1 * lc1**2\n            + m2 * (l1**2 + lc2**2 + 2 * l1 * lc2 * cos(theta2))\n            + I1\n            + I2\n        )\n        d2 = m2 * (lc2**2 + l1 * lc2 * 
cos(theta2)) + I2\n        phi2 = m2 * lc2 * g * cos(theta1 + theta2 - pi / 2.0)\n        phi1 = (\n            -m2 * l1 * lc2 * dtheta2**2 * sin(theta2)\n            - 2 * m2 * l1 * lc2 * dtheta2 * dtheta1 * sin(theta2)\n            + (m1 * lc1 + m2 * l1) * g * cos(theta1 - pi / 2)\n            + phi2\n        )\n        if self.book_or_nips == \"nips\":\n            # the following line is consistent with the description in the\n            # paper\n            ddtheta2 = (a + d2 / d1 * phi1 - phi2) / (m2 * lc2**2 + I2 - d2**2 / d1)\n        else:\n            # the following line is consistent with the java implementation and the\n            # book\n            ddtheta2 = (\n                a + d2 / d1 * phi1 - m2 * l1 * lc2 * dtheta1**2 * sin(theta2) - phi2\n            ) / (m2 * lc2**2 + I2 - d2**2 / d1)\n        ddtheta1 = -(d2 * ddtheta2 + phi1) / d1\n        return dtheta1, dtheta2, ddtheta1, ddtheta2, 0.0\n\n    def render(self):\n        if self.render_mode is None:\n            logger.warn(\n                \"You are calling render method without specifying any render mode. \"\n                \"You can specify the render_mode at initialization, \"\n                f'e.g. 
gym.make(\"{self.spec.id}\", render_mode=\"rgb_array\")'\n            )\n            return\n\n        try:\n            import pygame\n            from pygame import gfxdraw\n        except ImportError:\n            raise DependencyNotInstalled(\n                \"pygame is not installed, run `pip install gym[classic_control]`\"\n            )\n\n        if self.screen is None:\n            pygame.init()\n            if self.render_mode == \"human\":\n                pygame.display.init()\n                self.screen = pygame.display.set_mode(\n                    (self.SCREEN_DIM, self.SCREEN_DIM)\n                )\n            else:  # mode in \"rgb_array\"\n                self.screen = pygame.Surface((self.SCREEN_DIM, self.SCREEN_DIM))\n        if self.clock is None:\n            self.clock = pygame.time.Clock()\n\n        surf = pygame.Surface((self.SCREEN_DIM, self.SCREEN_DIM))\n        surf.fill((255, 255, 255))\n        s = self.state\n\n        bound = self.LINK_LENGTH_1 + self.LINK_LENGTH_2 + 0.2  # 2.2 for default\n        scale = self.SCREEN_DIM / (bound * 2)\n        offset = self.SCREEN_DIM / 2\n\n        if s is None:\n            return None\n\n        p1 = [\n            -self.LINK_LENGTH_1 * cos(s[0]) * scale,\n            self.LINK_LENGTH_1 * sin(s[0]) * scale,\n        ]\n\n        p2 = [\n            p1[0] - self.LINK_LENGTH_2 * cos(s[0] + s[1]) * scale,\n            p1[1] + self.LINK_LENGTH_2 * sin(s[0] + s[1]) * scale,\n        ]\n\n        xys = np.array([[0, 0], p1, p2])[:, ::-1]\n        thetas = [s[0] - pi / 2, s[0] + s[1] - pi / 2]\n        link_lengths = [self.LINK_LENGTH_1 * scale, self.LINK_LENGTH_2 * scale]\n\n        pygame.draw.line(\n            surf,\n            start_pos=(-2.2 * scale + offset, 1 * scale + offset),\n            end_pos=(2.2 * scale + offset, 1 * scale + offset),\n            color=(0, 0, 0),\n        )\n\n        for ((x, y), th, llen) in zip(xys, thetas, link_lengths):\n            x = x + offset\n            y = 
y + offset\n            l, r, t, b = 0, llen, 0.1 * scale, -0.1 * scale\n            coords = [(l, b), (l, t), (r, t), (r, b)]\n            transformed_coords = []\n            for coord in coords:\n                coord = pygame.math.Vector2(coord).rotate_rad(th)\n                coord = (coord[0] + x, coord[1] + y)\n                transformed_coords.append(coord)\n            gfxdraw.aapolygon(surf, transformed_coords, (0, 204, 204))\n            gfxdraw.filled_polygon(surf, transformed_coords, (0, 204, 204))\n\n            gfxdraw.aacircle(surf, int(x), int(y), int(0.1 * scale), (204, 204, 0))\n            gfxdraw.filled_circle(surf, int(x), int(y), int(0.1 * scale), (204, 204, 0))\n\n        surf = pygame.transform.flip(surf, False, True)\n        self.screen.blit(surf, (0, 0))\n\n        if self.render_mode == \"human\":\n            pygame.event.pump()\n            self.clock.tick(self.metadata[\"render_fps\"])\n            pygame.display.flip()\n\n        elif self.render_mode == \"rgb_array\":\n            return np.transpose(\n                np.array(pygame.surfarray.pixels3d(self.screen)), axes=(1, 0, 2)\n            )\n\n    def close(self):\n        if self.screen is not None:\n            import pygame\n\n            pygame.display.quit()\n            pygame.quit()\n            self.isopen = False\n\n\ndef wrap(x, m, M):\n    \"\"\"Wraps ``x`` so m <= x <= M; but unlike ``bound()`` which\n    truncates, ``wrap()`` wraps x around the coordinate system defined by m,M.\\n\n    For example, m = -180, M = 180 (degrees), x = 360 --> returns 0.\n\n    Args:\n        x: a scalar\n        m: minimum possible value in range\n        M: maximum possible value in range\n\n    Returns:\n        x: a scalar, wrapped\n    \"\"\"\n    diff = M - m\n    while x > M:\n        x = x - diff\n    while x < m:\n        x = x + diff\n    return x\n\n\ndef bound(x, m, M=None):\n    \"\"\"Either have m as scalar, so bound(x,m,M) which returns m <= x <= M *OR*\n    have m as 
length 2 vector, bound(x,m, <IGNORED>) returns m[0] <= x <= m[1].\n\n    Args:\n        x: scalar\n        m: The lower bound\n        M: The upper bound\n\n    Returns:\n        x: scalar, bound between min (m) and Max (M)\n    \"\"\"\n    if M is None:\n        M = m[1]\n        m = m[0]\n    # bound x between min (m) and Max (M)\n    return min(max(x, m), M)\n\n\ndef rk4(derivs, y0, t):\n    \"\"\"\n    Integrate 1-D or N-D system of ODEs using 4-th order Runge-Kutta.\n\n    Example for 2D system:\n\n        >>> def derivs(x):\n        ...     d1 =  x[0] + 2*x[1]\n        ...     d2 =  -3*x[0] + 4*x[1]\n        ...     return d1, d2\n\n        >>> dt = 0.0005\n        >>> t = np.arange(0.0, 2.0, dt)\n        >>> y0 = (1,2)\n        >>> yout = rk4(derivs, y0, t)\n\n    Args:\n        derivs: the derivative of the system and has the signature ``dy = derivs(yi)``\n        y0: initial state vector\n        t: sample times\n\n    Returns:\n        yout: Runge-Kutta approximation of the ODE\n    \"\"\"\n\n    try:\n        Ny = len(y0)\n    except TypeError:\n        yout = np.zeros((len(t),), np.float_)\n    else:\n        yout = np.zeros((len(t), Ny), np.float_)\n\n    yout[0] = y0\n\n    for i in np.arange(len(t) - 1):\n\n        this = t[i]\n        dt = t[i + 1] - this\n        dt2 = dt / 2.0\n        y0 = yout[i]\n\n        k1 = np.asarray(derivs(y0))\n        k2 = np.asarray(derivs(y0 + dt2 * k1))\n        k3 = np.asarray(derivs(y0 + dt2 * k2))\n        k4 = np.asarray(derivs(y0 + dt * k3))\n        yout[i + 1] = y0 + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)\n    # We only care about the final timestep and we cleave off action value which will be zero\n    return yout[-1][:4]\n"
  },
  {
    "path": "gym/envs/classic_control/cartpole.py",
    "content": "\"\"\"\nClassic cart-pole system implemented by Rich Sutton et al.\nCopied from http://incompleteideas.net/sutton/book/code/pole.c\npermalink: https://perma.cc/C9ZM-652R\n\"\"\"\nimport math\nfrom typing import Optional, Union\n\nimport numpy as np\n\nimport gym\nfrom gym import logger, spaces\nfrom gym.envs.classic_control import utils\nfrom gym.error import DependencyNotInstalled\n\n\nclass CartPoleEnv(gym.Env[np.ndarray, Union[int, np.ndarray]]):\n    \"\"\"\n    ### Description\n\n    This environment corresponds to the version of the cart-pole problem described by Barto, Sutton, and Anderson in\n    [\"Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problem\"](https://ieeexplore.ieee.org/document/6313077).\n    A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track.\n    The pendulum is placed upright on the cart and the goal is to balance the pole by applying forces\n     in the left and right direction on the cart.\n\n    ### Action Space\n\n    The action is a `ndarray` with shape `(1,)` which can take values `{0, 1}` indicating the direction\n     of the fixed force the cart is pushed with.\n\n    | Num | Action                 |\n    |-----|------------------------|\n    | 0   | Push cart to the left  |\n    | 1   | Push cart to the right |\n\n    **Note**: The velocity that is reduced or increased by the applied force is not fixed and it depends on the angle\n     the pole is pointing. 
The center of gravity of the pole varies the amount of energy needed to move the cart underneath it.\n\n    ### Observation Space\n\n    The observation is an `ndarray` with shape `(4,)` with the values corresponding to the following positions and velocities:\n\n    | Num | Observation           | Min                 | Max               |\n    |-----|-----------------------|---------------------|-------------------|\n    | 0   | Cart Position         | -4.8                | 4.8               |\n    | 1   | Cart Velocity         | -Inf                | Inf               |\n    | 2   | Pole Angle            | ~ -0.418 rad (-24°) | ~ 0.418 rad (24°) |\n    | 3   | Pole Angular Velocity | -Inf                | Inf               |\n\n    **Note:** While the ranges above denote the possible values of each element of the observation space,\n        they do not reflect the allowed values of the state space in an unterminated episode. In particular:\n    -  The cart x-position (index 0) can take values between `(-4.8, 4.8)`, but the episode terminates\n       if the cart leaves the `(-2.4, 2.4)` range.\n    -  The pole angle can be observed between `(-.418, .418)` radians (or **±24°**), but the episode terminates\n       if the pole angle is not in the range `(-.2095, .2095)` (or **±12°**).\n\n    ### Rewards\n\n    Since the goal is to keep the pole upright for as long as possible, a reward of `+1` is given for every step taken,\n    including the termination step. The threshold for rewards is 475 for v1.\n\n    ### Starting State\n\n    All observations are assigned a uniformly random value in `(-0.05, 0.05)`.\n\n    ### Episode End\n\n    The episode ends if any one of the following occurs:\n\n    1. Termination: Pole Angle leaves the range ±12°\n    2. Termination: Cart Position leaves the range ±2.4 (center of the cart reaches the edge of the display)\n    3. 
Truncation: Episode length is greater than 500 (200 for v0)\n\n    ### Arguments\n\n    ```\n    gym.make('CartPole-v1')\n    ```\n\n    No additional arguments are currently supported.\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\"human\", \"rgb_array\"],\n        \"render_fps\": 50,\n    }\n\n    def __init__(self, render_mode: Optional[str] = None):\n        self.gravity = 9.8\n        self.masscart = 1.0\n        self.masspole = 0.1\n        self.total_mass = self.masspole + self.masscart\n        self.length = 0.5  # actually half the pole's length\n        self.polemass_length = self.masspole * self.length\n        self.force_mag = 10.0\n        self.tau = 0.02  # seconds between state updates\n        self.kinematics_integrator = \"euler\"\n\n        # Angle at which to fail the episode\n        self.theta_threshold_radians = 12 * 2 * math.pi / 360\n        self.x_threshold = 2.4\n\n        # Angle limit set to 2 * theta_threshold_radians so failing observation\n        # is still within bounds.\n        high = np.array(\n            [\n                self.x_threshold * 2,\n                np.finfo(np.float32).max,\n                self.theta_threshold_radians * 2,\n                np.finfo(np.float32).max,\n            ],\n            dtype=np.float32,\n        )\n\n        self.action_space = spaces.Discrete(2)\n        self.observation_space = spaces.Box(-high, high, dtype=np.float32)\n\n        self.render_mode = render_mode\n\n        self.screen_width = 600\n        self.screen_height = 400\n        self.screen = None\n        self.clock = None\n        self.isopen = True\n        self.state = None\n\n        self.steps_beyond_terminated = None\n\n    def step(self, action):\n        err_msg = f\"{action!r} ({type(action)}) invalid\"\n        assert self.action_space.contains(action), err_msg\n        assert self.state is not None, \"Call reset before using step method.\"\n        x, x_dot, theta, theta_dot = self.state\n        force 
= self.force_mag if action == 1 else -self.force_mag\n        costheta = math.cos(theta)\n        sintheta = math.sin(theta)\n\n        # For the interested reader:\n        # https://coneural.org/florian/papers/05_cart_pole.pdf\n        temp = (\n            force + self.polemass_length * theta_dot**2 * sintheta\n        ) / self.total_mass\n        thetaacc = (self.gravity * sintheta - costheta * temp) / (\n            self.length * (4.0 / 3.0 - self.masspole * costheta**2 / self.total_mass)\n        )\n        xacc = temp - self.polemass_length * thetaacc * costheta / self.total_mass\n\n        if self.kinematics_integrator == \"euler\":\n            x = x + self.tau * x_dot\n            x_dot = x_dot + self.tau * xacc\n            theta = theta + self.tau * theta_dot\n            theta_dot = theta_dot + self.tau * thetaacc\n        else:  # semi-implicit euler\n            x_dot = x_dot + self.tau * xacc\n            x = x + self.tau * x_dot\n            theta_dot = theta_dot + self.tau * thetaacc\n            theta = theta + self.tau * theta_dot\n\n        self.state = (x, x_dot, theta, theta_dot)\n\n        terminated = bool(\n            x < -self.x_threshold\n            or x > self.x_threshold\n            or theta < -self.theta_threshold_radians\n            or theta > self.theta_threshold_radians\n        )\n\n        if not terminated:\n            reward = 1.0\n        elif self.steps_beyond_terminated is None:\n            # Pole just fell!\n            self.steps_beyond_terminated = 0\n            reward = 1.0\n        else:\n            if self.steps_beyond_terminated == 0:\n                logger.warn(\n                    \"You are calling 'step()' even though this \"\n                    \"environment has already returned terminated = True. 
You \"\n                    \"should always call 'reset()' once you receive 'terminated = \"\n                    \"True' -- any further steps are undefined behavior.\"\n                )\n            self.steps_beyond_terminated += 1\n            reward = 0.0\n\n        if self.render_mode == \"human\":\n            self.render()\n        return np.array(self.state, dtype=np.float32), reward, terminated, False, {}\n\n    def reset(\n        self,\n        *,\n        seed: Optional[int] = None,\n        options: Optional[dict] = None,\n    ):\n        super().reset(seed=seed)\n        # Note that if you use custom reset bounds, it may lead to out-of-bound\n        # state/observations.\n        low, high = utils.maybe_parse_reset_bounds(\n            options, -0.05, 0.05  # default low\n        )  # default high\n        self.state = self.np_random.uniform(low=low, high=high, size=(4,))\n        self.steps_beyond_terminated = None\n\n        if self.render_mode == \"human\":\n            self.render()\n        return np.array(self.state, dtype=np.float32), {}\n\n    def render(self):\n        if self.render_mode is None:\n            gym.logger.warn(\n                \"You are calling render method without specifying any render mode. \"\n                \"You can specify the render_mode at initialization, \"\n                f'e.g. 
gym.make(\"{self.spec.id}\", render_mode=\"rgb_array\")'\n            )\n            return\n\n        try:\n            import pygame\n            from pygame import gfxdraw\n        except ImportError:\n            raise DependencyNotInstalled(\n                \"pygame is not installed, run `pip install gym[classic_control]`\"\n            )\n\n        if self.screen is None:\n            pygame.init()\n            if self.render_mode == \"human\":\n                pygame.display.init()\n                self.screen = pygame.display.set_mode(\n                    (self.screen_width, self.screen_height)\n                )\n            else:  # mode == \"rgb_array\"\n                self.screen = pygame.Surface((self.screen_width, self.screen_height))\n        if self.clock is None:\n            self.clock = pygame.time.Clock()\n\n        world_width = self.x_threshold * 2\n        scale = self.screen_width / world_width\n        polewidth = 10.0\n        polelen = scale * (2 * self.length)\n        cartwidth = 50.0\n        cartheight = 30.0\n\n        if self.state is None:\n            return None\n\n        x = self.state\n\n        self.surf = pygame.Surface((self.screen_width, self.screen_height))\n        self.surf.fill((255, 255, 255))\n\n        l, r, t, b = -cartwidth / 2, cartwidth / 2, cartheight / 2, -cartheight / 2\n        axleoffset = cartheight / 4.0\n        cartx = x[0] * scale + self.screen_width / 2.0  # MIDDLE OF CART\n        carty = 100  # TOP OF CART\n        cart_coords = [(l, b), (l, t), (r, t), (r, b)]\n        cart_coords = [(c[0] + cartx, c[1] + carty) for c in cart_coords]\n        gfxdraw.aapolygon(self.surf, cart_coords, (0, 0, 0))\n        gfxdraw.filled_polygon(self.surf, cart_coords, (0, 0, 0))\n\n        l, r, t, b = (\n            -polewidth / 2,\n            polewidth / 2,\n            polelen - polewidth / 2,\n            -polewidth / 2,\n        )\n\n        pole_coords = []\n        for coord in [(l, b), (l, t), (r, t), (r, 
b)]:\n            coord = pygame.math.Vector2(coord).rotate_rad(-x[2])\n            coord = (coord[0] + cartx, coord[1] + carty + axleoffset)\n            pole_coords.append(coord)\n        gfxdraw.aapolygon(self.surf, pole_coords, (202, 152, 101))\n        gfxdraw.filled_polygon(self.surf, pole_coords, (202, 152, 101))\n\n        gfxdraw.aacircle(\n            self.surf,\n            int(cartx),\n            int(carty + axleoffset),\n            int(polewidth / 2),\n            (129, 132, 203),\n        )\n        gfxdraw.filled_circle(\n            self.surf,\n            int(cartx),\n            int(carty + axleoffset),\n            int(polewidth / 2),\n            (129, 132, 203),\n        )\n\n        gfxdraw.hline(self.surf, 0, self.screen_width, carty, (0, 0, 0))\n\n        self.surf = pygame.transform.flip(self.surf, False, True)\n        self.screen.blit(self.surf, (0, 0))\n        if self.render_mode == \"human\":\n            pygame.event.pump()\n            self.clock.tick(self.metadata[\"render_fps\"])\n            pygame.display.flip()\n\n        elif self.render_mode == \"rgb_array\":\n            return np.transpose(\n                np.array(pygame.surfarray.pixels3d(self.screen)), axes=(1, 0, 2)\n            )\n\n    def close(self):\n        if self.screen is not None:\n            import pygame\n\n            pygame.display.quit()\n            pygame.quit()\n            self.isopen = False\n"
  },
  {
    "path": "gym/envs/classic_control/continuous_mountain_car.py",
    "content": "\"\"\"\n@author: Olivier Sigaud\n\nA merge between two sources:\n\n* Adaptation of the MountainCar Environment from the \"FAReinforcement\" library\nof Jose Antonio Martin H. (version 1.0), adapted by  'Tom Schaul, tom@idsia.ch'\nand then modified by Arnaud de Broissia\n\n* the gym MountainCar environment\nitself from\nhttp://incompleteideas.net/sutton/MountainCar/MountainCar1.cp\npermalink: https://perma.cc/6Z2N-PFWC\n\"\"\"\n\nimport math\nfrom typing import Optional\n\nimport numpy as np\n\nimport gym\nfrom gym import spaces\nfrom gym.envs.classic_control import utils\nfrom gym.error import DependencyNotInstalled\n\n\nclass Continuous_MountainCarEnv(gym.Env):\n    \"\"\"\n    ### Description\n\n    The Mountain Car MDP is a deterministic MDP that consists of a car placed stochastically\n    at the bottom of a sinusoidal valley, with the only possible actions being the accelerations\n    that can be applied to the car in either direction. The goal of the MDP is to strategically\n    accelerate the car to reach the goal state on top of the right hill. 
There are two versions\n    of the mountain car domain in gym: one with discrete actions and one with continuous.\n    This version is the one with continuous actions.\n\n    This MDP first appeared in [Andrew Moore's PhD Thesis (1990)](https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-209.pdf)\n\n    ```\n    @TECHREPORT{Moore90efficientmemory-based,\n        author = {Andrew William Moore},\n        title = {Efficient Memory-based Learning for Robot Control},\n        institution = {University of Cambridge},\n        year = {1990}\n    }\n    ```\n\n    ### Observation Space\n\n    The observation is an `ndarray` with shape `(2,)` where the elements correspond to the following:\n\n    | Num | Observation                          | Min   | Max  | Unit           |\n    |-----|--------------------------------------|-------|------|----------------|\n    | 0   | position of the car along the x-axis | -1.2  | 0.6  | position (m)   |\n    | 1   | velocity of the car                  | -0.07 | 0.07 | velocity (m/s) |\n\n    ### Action Space\n\n    The action is an `ndarray` with shape `(1,)`, representing the directional force applied on the car.\n    The action is clipped to the range `[-1,1]` and multiplied by a power of 0.0015.\n\n    ### Transition Dynamics:\n\n    Given an action, the mountain car obeys the following transition dynamics:\n\n    *velocity<sub>t+1</sub> = velocity<sub>t</sub> + force * self.power - 0.0025 * cos(3 * position<sub>t</sub>)*\n\n    *position<sub>t+1</sub> = position<sub>t</sub> + velocity<sub>t+1</sub>*\n\n    where force is the action clipped to the range `[-1,1]` and power is a constant 0.0015.\n    The collisions at either end are inelastic with the velocity set to 0 upon collision with the wall.\n    The position is clipped to the range [-1.2, 0.6] and velocity is clipped to the range [-0.07, 0.07].\n\n    ### Reward\n\n    A negative reward of *-0.1 * action<sup>2</sup>* is received at each timestep to penalise for\n    taking actions of large 
magnitude. If the mountain car reaches the goal then a positive reward of +100\n    is added to the negative reward for that timestep.\n\n    ### Starting State\n\n    The position of the car is assigned a uniform random value in `[-0.6, -0.4]`.\n    The starting velocity of the car is always set to 0.\n\n    ### Episode End\n\n    The episode ends if either of the following happens:\n    1. Termination: The position of the car is greater than or equal to 0.45 (the goal position on top of the right hill)\n    2. Truncation: The length of the episode is 999.\n\n    ### Arguments\n\n    ```\n    gym.make('MountainCarContinuous-v0')\n    ```\n\n    ### Version History\n\n    * v0: Initial version release (1.0.0)\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\"human\", \"rgb_array\"],\n        \"render_fps\": 30,\n    }\n\n    def __init__(self, render_mode: Optional[str] = None, goal_velocity=0):\n        self.min_action = -1.0\n        self.max_action = 1.0\n        self.min_position = -1.2\n        self.max_position = 0.6\n        self.max_speed = 0.07\n        self.goal_position = (\n            0.45  # was 0.5 in gym, 0.45 in Arnaud de Broissia's version\n        )\n        self.goal_velocity = goal_velocity\n        self.power = 0.0015\n\n        self.low_state = np.array(\n            [self.min_position, -self.max_speed], dtype=np.float32\n        )\n        self.high_state = np.array(\n            [self.max_position, self.max_speed], dtype=np.float32\n        )\n\n        self.render_mode = render_mode\n\n        self.screen_width = 600\n        self.screen_height = 400\n        self.screen = None\n        self.clock = None\n        self.isopen = True\n\n        self.action_space = spaces.Box(\n            low=self.min_action, high=self.max_action, shape=(1,), dtype=np.float32\n        )\n        self.observation_space = spaces.Box(\n            low=self.low_state, high=self.high_state, dtype=np.float32\n        )\n\n    def step(self, 
action: np.ndarray):\n\n        position = self.state[0]\n        velocity = self.state[1]\n        force = min(max(action[0], self.min_action), self.max_action)\n\n        velocity += force * self.power - 0.0025 * math.cos(3 * position)\n        if velocity > self.max_speed:\n            velocity = self.max_speed\n        if velocity < -self.max_speed:\n            velocity = -self.max_speed\n        position += velocity\n        if position > self.max_position:\n            position = self.max_position\n        if position < self.min_position:\n            position = self.min_position\n        if position == self.min_position and velocity < 0:\n            velocity = 0\n\n        # Convert a possible numpy bool to a Python bool.\n        terminated = bool(\n            position >= self.goal_position and velocity >= self.goal_velocity\n        )\n\n        reward = 0\n        if terminated:\n            reward = 100.0\n        reward -= math.pow(action[0], 2) * 0.1\n\n        self.state = np.array([position, velocity], dtype=np.float32)\n\n        if self.render_mode == \"human\":\n            self.render()\n        return self.state, reward, terminated, False, {}\n\n    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):\n        super().reset(seed=seed)\n        # Note that if you use custom reset bounds, it may lead to out-of-bound\n        # state/observations.\n        low, high = utils.maybe_parse_reset_bounds(options, -0.6, -0.4)\n        self.state = np.array([self.np_random.uniform(low=low, high=high), 0])\n\n        if self.render_mode == \"human\":\n            self.render()\n        return np.array(self.state, dtype=np.float32), {}\n\n    def _height(self, xs):\n        return np.sin(3 * xs) * 0.45 + 0.55\n\n    def render(self):\n        if self.render_mode is None:\n            gym.logger.warn(\n                \"You are calling render method without specifying any render mode. 
\"\n                \"You can specify the render_mode at initialization, \"\n                f'e.g. gym(\"{self.spec.id}\", render_mode=\"rgb_array\")'\n            )\n            return\n\n        try:\n            import pygame\n            from pygame import gfxdraw\n        except ImportError:\n            raise DependencyNotInstalled(\n                \"pygame is not installed, run `pip install gym[classic_control]`\"\n            )\n\n        if self.screen is None:\n            pygame.init()\n            if self.render_mode == \"human\":\n                pygame.display.init()\n                self.screen = pygame.display.set_mode(\n                    (self.screen_width, self.screen_height)\n                )\n            else:  # mode == \"rgb_array\":\n                self.screen = pygame.Surface((self.screen_width, self.screen_height))\n        if self.clock is None:\n            self.clock = pygame.time.Clock()\n\n        world_width = self.max_position - self.min_position\n        scale = self.screen_width / world_width\n        carwidth = 40\n        carheight = 20\n\n        self.surf = pygame.Surface((self.screen_width, self.screen_height))\n        self.surf.fill((255, 255, 255))\n\n        pos = self.state[0]\n\n        xs = np.linspace(self.min_position, self.max_position, 100)\n        ys = self._height(xs)\n        xys = list(zip((xs - self.min_position) * scale, ys * scale))\n\n        pygame.draw.aalines(self.surf, points=xys, closed=False, color=(0, 0, 0))\n\n        clearance = 10\n\n        l, r, t, b = -carwidth / 2, carwidth / 2, carheight, 0\n        coords = []\n        for c in [(l, b), (l, t), (r, t), (r, b)]:\n            c = pygame.math.Vector2(c).rotate_rad(math.cos(3 * pos))\n            coords.append(\n                (\n                    c[0] + (pos - self.min_position) * scale,\n                    c[1] + clearance + self._height(pos) * scale,\n                )\n            )\n\n        gfxdraw.aapolygon(self.surf, coords, 
(0, 0, 0))\n        gfxdraw.filled_polygon(self.surf, coords, (0, 0, 0))\n\n        for c in [(carwidth / 4, 0), (-carwidth / 4, 0)]:\n            c = pygame.math.Vector2(c).rotate_rad(math.cos(3 * pos))\n            wheel = (\n                int(c[0] + (pos - self.min_position) * scale),\n                int(c[1] + clearance + self._height(pos) * scale),\n            )\n\n            gfxdraw.aacircle(\n                self.surf, wheel[0], wheel[1], int(carheight / 2.5), (128, 128, 128)\n            )\n            gfxdraw.filled_circle(\n                self.surf, wheel[0], wheel[1], int(carheight / 2.5), (128, 128, 128)\n            )\n\n        flagx = int((self.goal_position - self.min_position) * scale)\n        flagy1 = int(self._height(self.goal_position) * scale)\n        flagy2 = flagy1 + 50\n        gfxdraw.vline(self.surf, flagx, flagy1, flagy2, (0, 0, 0))\n\n        gfxdraw.aapolygon(\n            self.surf,\n            [(flagx, flagy2), (flagx, flagy2 - 10), (flagx + 25, flagy2 - 5)],\n            (204, 204, 0),\n        )\n        gfxdraw.filled_polygon(\n            self.surf,\n            [(flagx, flagy2), (flagx, flagy2 - 10), (flagx + 25, flagy2 - 5)],\n            (204, 204, 0),\n        )\n\n        self.surf = pygame.transform.flip(self.surf, False, True)\n        self.screen.blit(self.surf, (0, 0))\n        if self.render_mode == \"human\":\n            pygame.event.pump()\n            self.clock.tick(self.metadata[\"render_fps\"])\n            pygame.display.flip()\n\n        elif self.render_mode == \"rgb_array\":\n            return np.transpose(\n                np.array(pygame.surfarray.pixels3d(self.screen)), axes=(1, 0, 2)\n            )\n\n    def close(self):\n        if self.screen is not None:\n            import pygame\n\n            pygame.display.quit()\n            pygame.quit()\n            self.isopen = False\n"
  },
  {
    "path": "gym/envs/classic_control/mountain_car.py",
    "content": "\"\"\"\nhttp://incompleteideas.net/MountainCar/MountainCar1.cp\npermalink: https://perma.cc/6Z2N-PFWC\n\"\"\"\nimport math\nfrom typing import Optional\n\nimport numpy as np\n\nimport gym\nfrom gym import spaces\nfrom gym.envs.classic_control import utils\nfrom gym.error import DependencyNotInstalled\n\n\nclass MountainCarEnv(gym.Env):\n    \"\"\"\n    ### Description\n\n    The Mountain Car MDP is a deterministic MDP that consists of a car placed stochastically\n    at the bottom of a sinusoidal valley, with the only possible actions being the accelerations\n    that can be applied to the car in either direction. The goal of the MDP is to strategically\n    accelerate the car to reach the goal state on top of the right hill. There are two versions\n    of the mountain car domain in gym: one with discrete actions and one with continuous.\n    This version is the one with discrete actions.\n\n    This MDP first appeared in [Andrew Moore's PhD Thesis (1990)](https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-209.pdf)\n\n    ```\n    @TECHREPORT{Moore90efficientmemory-based,\n        author = {Andrew William Moore},\n        title = {Efficient Memory-based Learning for Robot Control},\n        institution = {University of Cambridge},\n        year = {1990}\n    }\n    ```\n\n    ### Observation Space\n\n    The observation is a `ndarray` with shape `(2,)` where the elements correspond to the following:\n\n    | Num | Observation                          | Min  | Max | Unit         |\n    |-----|--------------------------------------|------|-----|--------------|\n    | 0   | position of the car along the x-axis | -Inf | Inf | position (m) |\n    | 1   | velocity of the car                  | -Inf | Inf | position (m) |\n\n    ### Action Space\n\n    There are 3 discrete deterministic actions:\n\n    | Num | Observation             | Value | Unit         |\n    |-----|-------------------------|-------|--------------|\n    | 0   | Accelerate to the left  | 
Inf   | position (m) |\n    | 1   | Don't accelerate        | Inf   | position (m) |\n    | 2   | Accelerate to the right | Inf   | position (m) |\n\n    ### Transition Dynamics:\n\n    Given an action, the mountain car follows the following transition dynamics:\n\n    *velocity<sub>t+1</sub> = velocity<sub>t</sub> + (action - 1) * force - cos(3 * position<sub>t</sub>) * gravity*\n\n    *position<sub>t+1</sub> = position<sub>t</sub> + velocity<sub>t+1</sub>*\n\n    where force = 0.001 and gravity = 0.0025. The collisions at either end are inelastic with the velocity set to 0\n    upon collision with the wall. The position is clipped to the range `[-1.2, 0.6]` and\n    velocity is clipped to the range `[-0.07, 0.07]`.\n\n\n    ### Reward:\n\n    The goal is to reach the flag placed on top of the right hill as quickly as possible, as such the agent is\n    penalised with a reward of -1 for each timestep.\n\n    ### Starting State\n\n    The position of the car is assigned a uniform random value in *[-0.6 , -0.4]*.\n    The starting velocity of the car is always assigned to 0.\n\n    ### Episode End\n\n    The episode ends if either of the following happens:\n    1. Termination: The position of the car is greater than or equal to 0.5 (the goal position on top of the right hill)\n    2. 
Truncation: The length of the episode is 200.\n\n\n    ### Arguments\n\n    ```\n    gym.make('MountainCar-v0')\n    ```\n\n    ### Version History\n\n    * v0: Initial versions release (1.0.0)\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\"human\", \"rgb_array\"],\n        \"render_fps\": 30,\n    }\n\n    def __init__(self, render_mode: Optional[str] = None, goal_velocity=0):\n        self.min_position = -1.2\n        self.max_position = 0.6\n        self.max_speed = 0.07\n        self.goal_position = 0.5\n        self.goal_velocity = goal_velocity\n\n        self.force = 0.001\n        self.gravity = 0.0025\n\n        self.low = np.array([self.min_position, -self.max_speed], dtype=np.float32)\n        self.high = np.array([self.max_position, self.max_speed], dtype=np.float32)\n\n        self.render_mode = render_mode\n\n        self.screen_width = 600\n        self.screen_height = 400\n        self.screen = None\n        self.clock = None\n        self.isopen = True\n\n        self.action_space = spaces.Discrete(3)\n        self.observation_space = spaces.Box(self.low, self.high, dtype=np.float32)\n\n    def step(self, action: int):\n        assert self.action_space.contains(\n            action\n        ), f\"{action!r} ({type(action)}) invalid\"\n\n        position, velocity = self.state\n        velocity += (action - 1) * self.force + math.cos(3 * position) * (-self.gravity)\n        velocity = np.clip(velocity, -self.max_speed, self.max_speed)\n        position += velocity\n        position = np.clip(position, self.min_position, self.max_position)\n        if position == self.min_position and velocity < 0:\n            velocity = 0\n\n        terminated = bool(\n            position >= self.goal_position and velocity >= self.goal_velocity\n        )\n        reward = -1.0\n\n        self.state = (position, velocity)\n        if self.render_mode == \"human\":\n            self.render()\n        return np.array(self.state, dtype=np.float32), 
reward, terminated, False, {}\n\n    def reset(\n        self,\n        *,\n        seed: Optional[int] = None,\n        options: Optional[dict] = None,\n    ):\n        super().reset(seed=seed)\n        # Note that if you use custom reset bounds, it may lead to out-of-bound\n        # state/observations.\n        low, high = utils.maybe_parse_reset_bounds(options, -0.6, -0.4)\n        self.state = np.array([self.np_random.uniform(low=low, high=high), 0])\n\n        if self.render_mode == \"human\":\n            self.render()\n        return np.array(self.state, dtype=np.float32), {}\n\n    def _height(self, xs):\n        return np.sin(3 * xs) * 0.45 + 0.55\n\n    def render(self):\n        if self.render_mode is None:\n            gym.logger.warn(\n                \"You are calling render method without specifying any render mode. \"\n                \"You can specify the render_mode at initialization, \"\n                f'e.g. gym.make(\"{self.spec.id}\", render_mode=\"rgb_array\")'\n            )\n            return\n\n        try:\n            import pygame\n            from pygame import gfxdraw\n        except ImportError:\n            raise DependencyNotInstalled(\n                \"pygame is not installed, run `pip install gym[classic_control]`\"\n            )\n\n        if self.screen is None:\n            pygame.init()\n            if self.render_mode == \"human\":\n                pygame.display.init()\n                self.screen = pygame.display.set_mode(\n                    (self.screen_width, self.screen_height)\n                )\n            else:  # mode in \"rgb_array\"\n                self.screen = pygame.Surface((self.screen_width, self.screen_height))\n        if self.clock is None:\n            self.clock = pygame.time.Clock()\n\n        world_width = self.max_position - self.min_position\n        scale = self.screen_width / world_width\n        carwidth = 40\n        carheight = 20\n\n        self.surf = pygame.Surface((self.screen_width, 
self.screen_height))\n        self.surf.fill((255, 255, 255))\n\n        pos = self.state[0]\n\n        xs = np.linspace(self.min_position, self.max_position, 100)\n        ys = self._height(xs)\n        xys = list(zip((xs - self.min_position) * scale, ys * scale))\n\n        pygame.draw.aalines(self.surf, points=xys, closed=False, color=(0, 0, 0))\n\n        clearance = 10\n\n        l, r, t, b = -carwidth / 2, carwidth / 2, carheight, 0\n        coords = []\n        for c in [(l, b), (l, t), (r, t), (r, b)]:\n            c = pygame.math.Vector2(c).rotate_rad(math.cos(3 * pos))\n            coords.append(\n                (\n                    c[0] + (pos - self.min_position) * scale,\n                    c[1] + clearance + self._height(pos) * scale,\n                )\n            )\n\n        gfxdraw.aapolygon(self.surf, coords, (0, 0, 0))\n        gfxdraw.filled_polygon(self.surf, coords, (0, 0, 0))\n\n        for c in [(carwidth / 4, 0), (-carwidth / 4, 0)]:\n            c = pygame.math.Vector2(c).rotate_rad(math.cos(3 * pos))\n            wheel = (\n                int(c[0] + (pos - self.min_position) * scale),\n                int(c[1] + clearance + self._height(pos) * scale),\n            )\n\n            gfxdraw.aacircle(\n                self.surf, wheel[0], wheel[1], int(carheight / 2.5), (128, 128, 128)\n            )\n            gfxdraw.filled_circle(\n                self.surf, wheel[0], wheel[1], int(carheight / 2.5), (128, 128, 128)\n            )\n\n        flagx = int((self.goal_position - self.min_position) * scale)\n        flagy1 = int(self._height(self.goal_position) * scale)\n        flagy2 = flagy1 + 50\n        gfxdraw.vline(self.surf, flagx, flagy1, flagy2, (0, 0, 0))\n\n        gfxdraw.aapolygon(\n            self.surf,\n            [(flagx, flagy2), (flagx, flagy2 - 10), (flagx + 25, flagy2 - 5)],\n            (204, 204, 0),\n        )\n        gfxdraw.filled_polygon(\n            self.surf,\n            [(flagx, flagy2), (flagx, 
flagy2 - 10), (flagx + 25, flagy2 - 5)],\n            (204, 204, 0),\n        )\n\n        self.surf = pygame.transform.flip(self.surf, False, True)\n        self.screen.blit(self.surf, (0, 0))\n        if self.render_mode == \"human\":\n            pygame.event.pump()\n            self.clock.tick(self.metadata[\"render_fps\"])\n            pygame.display.flip()\n\n        elif self.render_mode == \"rgb_array\":\n            return np.transpose(\n                np.array(pygame.surfarray.pixels3d(self.screen)), axes=(1, 0, 2)\n            )\n\n    def get_keys_to_action(self):\n        # Control with left and right arrow keys.\n        return {(): 1, (276,): 0, (275,): 2, (275, 276): 1}\n\n    def close(self):\n        if self.screen is not None:\n            import pygame\n\n            pygame.display.quit()\n            pygame.quit()\n            self.isopen = False\n"
  },
  {
    "path": "gym/envs/classic_control/pendulum.py",
    "content": "__credits__ = [\"Carlos Luis\"]\n\nfrom os import path\nfrom typing import Optional\n\nimport numpy as np\n\nimport gym\nfrom gym import spaces\nfrom gym.envs.classic_control import utils\nfrom gym.error import DependencyNotInstalled\n\nDEFAULT_X = np.pi\nDEFAULT_Y = 1.0\n\n\nclass PendulumEnv(gym.Env):\n    \"\"\"\n       ### Description\n\n    The inverted pendulum swingup problem is based on the classic problem in control theory.\n    The system consists of a pendulum attached at one end to a fixed point, and the other end being free.\n    The pendulum starts in a random position and the goal is to apply torque on the free end to swing it\n    into an upright position, with its center of gravity right above the fixed point.\n\n    The diagram below specifies the coordinate system used for the implementation of the pendulum's\n    dynamic equations.\n\n    ![Pendulum Coordinate System](./diagrams/pendulum.png)\n\n    -  `x-y`: cartesian coordinates of the pendulum's end in meters.\n    - `theta` : angle in radians.\n    - `tau`: torque in `N m`. 
Defined as positive _counter-clockwise_.\n\n    ### Action Space\n\n    The action is a `ndarray` with shape `(1,)` representing the torque applied to free end of the pendulum.\n\n    | Num | Action | Min  | Max |\n    |-----|--------|------|-----|\n    | 0   | Torque | -2.0 | 2.0 |\n\n\n    ### Observation Space\n\n    The observation is a `ndarray` with shape `(3,)` representing the x-y coordinates of the pendulum's free\n    end and its angular velocity.\n\n    | Num | Observation      | Min  | Max |\n    |-----|------------------|------|-----|\n    | 0   | x = cos(theta)   | -1.0 | 1.0 |\n    | 1   | y = sin(theta)   | -1.0 | 1.0 |\n    | 2   | Angular Velocity | -8.0 | 8.0 |\n\n    ### Rewards\n\n    The reward function is defined as:\n\n    *r = -(theta<sup>2</sup> + 0.1 * theta_dt<sup>2</sup> + 0.001 * torque<sup>2</sup>)*\n\n    where `$\\theta$` is the pendulum's angle normalized between *[-pi, pi]* (with 0 being in the upright position).\n    Based on the above equation, the minimum reward that can be obtained is\n    *-(pi<sup>2</sup> + 0.1 * 8<sup>2</sup> + 0.001 * 2<sup>2</sup>) = -16.2736044*,\n    while the maximum reward is zero (pendulum is upright with zero velocity and no torque applied).\n\n    ### Starting State\n\n    The starting state is a random angle in *[-pi, pi]* and a random angular velocity in *[-1,1]*.\n\n    ### Episode Truncation\n\n    The episode truncates at 200 time steps.\n\n    ### Arguments\n\n    - `g`: acceleration of gravity measured in *(m s<sup>-2</sup>)* used to calculate the pendulum dynamics.\n      The default value is g = 10.0 .\n\n    ```\n    gym.make('Pendulum-v1', g=9.81)\n    ```\n\n    ### Version History\n\n    * v1: Simplify the math equations, no difference in behavior.\n    * v0: Initial versions release (1.0.0)\n\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\"human\", \"rgb_array\"],\n        \"render_fps\": 30,\n    }\n\n    def __init__(self, render_mode: Optional[str] = None, g=10.0):\n  
      self.max_speed = 8\n        self.max_torque = 2.0\n        self.dt = 0.05\n        self.g = g\n        self.m = 1.0\n        self.l = 1.0\n\n        self.render_mode = render_mode\n\n        self.screen_dim = 500\n        self.screen = None\n        self.clock = None\n        self.isopen = True\n\n        high = np.array([1.0, 1.0, self.max_speed], dtype=np.float32)\n        # This will throw a warning in tests/envs/test_envs in utils/env_checker.py as the space is not symmetric\n        #   or normalised as max_torque == 2 by default. Ignoring the issue here as the default settings are too old\n        #   to update to follow the openai gym api\n        self.action_space = spaces.Box(\n            low=-self.max_torque, high=self.max_torque, shape=(1,), dtype=np.float32\n        )\n        self.observation_space = spaces.Box(low=-high, high=high, dtype=np.float32)\n\n    def step(self, u):\n        th, thdot = self.state  # th := theta\n\n        g = self.g\n        m = self.m\n        l = self.l\n        dt = self.dt\n\n        u = np.clip(u, -self.max_torque, self.max_torque)[0]\n        self.last_u = u  # for rendering\n        costs = angle_normalize(th) ** 2 + 0.1 * thdot**2 + 0.001 * (u**2)\n\n        newthdot = thdot + (3 * g / (2 * l) * np.sin(th) + 3.0 / (m * l**2) * u) * dt\n        newthdot = np.clip(newthdot, -self.max_speed, self.max_speed)\n        newth = th + newthdot * dt\n\n        self.state = np.array([newth, newthdot])\n\n        if self.render_mode == \"human\":\n            self.render()\n        return self._get_obs(), -costs, False, False, {}\n\n    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):\n        super().reset(seed=seed)\n        if options is None:\n            high = np.array([DEFAULT_X, DEFAULT_Y])\n        else:\n            # Note that if you use custom reset bounds, it may lead to out-of-bound\n            # state/observations.\n            x = options.get(\"x_init\") if \"x_init\" in 
options else DEFAULT_X\n            y = options.get(\"y_init\") if \"y_init\" in options else DEFAULT_Y\n            x = utils.verify_number_and_cast(x)\n            y = utils.verify_number_and_cast(y)\n            high = np.array([x, y])\n        low = -high  # We enforce symmetric limits.\n        self.state = self.np_random.uniform(low=low, high=high)\n        self.last_u = None\n\n        if self.render_mode == \"human\":\n            self.render()\n        return self._get_obs(), {}\n\n    def _get_obs(self):\n        theta, thetadot = self.state\n        return np.array([np.cos(theta), np.sin(theta), thetadot], dtype=np.float32)\n\n    def render(self):\n        if self.render_mode is None:\n            gym.logger.warn(\n                \"You are calling render method without specifying any render mode. \"\n                \"You can specify the render_mode at initialization, \"\n                f'e.g. gym.make(\"{self.spec.id}\", render_mode=\"rgb_array\")'\n            )\n            return\n\n        try:\n            import pygame\n            from pygame import gfxdraw\n        except ImportError:\n            raise DependencyNotInstalled(\n                \"pygame is not installed, run `pip install gym[classic_control]`\"\n            )\n\n        if self.screen is None:\n            pygame.init()\n            if self.render_mode == \"human\":\n                pygame.display.init()\n                self.screen = pygame.display.set_mode(\n                    (self.screen_dim, self.screen_dim)\n                )\n            else:  # mode in \"rgb_array\"\n                self.screen = pygame.Surface((self.screen_dim, self.screen_dim))\n        if self.clock is None:\n            self.clock = pygame.time.Clock()\n\n        self.surf = pygame.Surface((self.screen_dim, self.screen_dim))\n        self.surf.fill((255, 255, 255))\n\n        bound = 2.2\n        scale = self.screen_dim / (bound * 2)\n        offset = self.screen_dim // 2\n\n        rod_length = 1 * 
scale\n        rod_width = 0.2 * scale\n        l, r, t, b = 0, rod_length, rod_width / 2, -rod_width / 2\n        coords = [(l, b), (l, t), (r, t), (r, b)]\n        transformed_coords = []\n        for c in coords:\n            c = pygame.math.Vector2(c).rotate_rad(self.state[0] + np.pi / 2)\n            c = (c[0] + offset, c[1] + offset)\n            transformed_coords.append(c)\n        gfxdraw.aapolygon(self.surf, transformed_coords, (204, 77, 77))\n        gfxdraw.filled_polygon(self.surf, transformed_coords, (204, 77, 77))\n\n        gfxdraw.aacircle(self.surf, offset, offset, int(rod_width / 2), (204, 77, 77))\n        gfxdraw.filled_circle(\n            self.surf, offset, offset, int(rod_width / 2), (204, 77, 77)\n        )\n\n        rod_end = (rod_length, 0)\n        rod_end = pygame.math.Vector2(rod_end).rotate_rad(self.state[0] + np.pi / 2)\n        rod_end = (int(rod_end[0] + offset), int(rod_end[1] + offset))\n        gfxdraw.aacircle(\n            self.surf, rod_end[0], rod_end[1], int(rod_width / 2), (204, 77, 77)\n        )\n        gfxdraw.filled_circle(\n            self.surf, rod_end[0], rod_end[1], int(rod_width / 2), (204, 77, 77)\n        )\n\n        fname = path.join(path.dirname(__file__), \"assets/clockwise.png\")\n        img = pygame.image.load(fname)\n        if self.last_u is not None:\n            scale_img = pygame.transform.smoothscale(\n                img,\n                (scale * np.abs(self.last_u) / 2, scale * np.abs(self.last_u) / 2),\n            )\n            is_flip = bool(self.last_u > 0)\n            scale_img = pygame.transform.flip(scale_img, is_flip, True)\n            self.surf.blit(\n                scale_img,\n                (\n                    offset - scale_img.get_rect().centerx,\n                    offset - scale_img.get_rect().centery,\n                ),\n            )\n\n        # drawing axle\n        gfxdraw.aacircle(self.surf, offset, offset, int(0.05 * scale), (0, 0, 0))\n        
gfxdraw.filled_circle(self.surf, offset, offset, int(0.05 * scale), (0, 0, 0))\n\n        self.surf = pygame.transform.flip(self.surf, False, True)\n        self.screen.blit(self.surf, (0, 0))\n        if self.render_mode == \"human\":\n            pygame.event.pump()\n            self.clock.tick(self.metadata[\"render_fps\"])\n            pygame.display.flip()\n\n        else:  # mode == \"rgb_array\":\n            return np.transpose(\n                np.array(pygame.surfarray.pixels3d(self.screen)), axes=(1, 0, 2)\n            )\n\n    def close(self):\n        if self.screen is not None:\n            import pygame\n\n            pygame.display.quit()\n            pygame.quit()\n            self.isopen = False\n\n\ndef angle_normalize(x):\n    return ((x + np.pi) % (2 * np.pi)) - np.pi\n"
  },
  {
    "path": "gym/envs/classic_control/utils.py",
    "content": "\"\"\"\nUtility functions used for classic control environments.\n\"\"\"\n\nfrom typing import Optional, SupportsFloat, Tuple\n\n\ndef verify_number_and_cast(x: SupportsFloat) -> float:\n    \"\"\"Verify parameter is a single number and cast to a float.\"\"\"\n    try:\n        x = float(x)\n    except (ValueError, TypeError):\n        raise ValueError(f\"An option ({x}) could not be converted to a float.\")\n    return x\n\n\ndef maybe_parse_reset_bounds(\n    options: Optional[dict], default_low: float, default_high: float\n) -> Tuple[float, float]:\n    \"\"\"\n    This function can be called during a reset() to customize the sampling\n    ranges for setting the initial state distributions.\n\n    Args:\n      options: Options passed in to reset().\n      default_low: Default lower limit to use, if none specified in options.\n      default_high: Default upper limit to use, if none specified in options.\n\n    Returns:\n      Tuple of the lower and upper limits.\n    \"\"\"\n    if options is None:\n        return default_low, default_high\n\n    low = options.get(\"low\") if \"low\" in options else default_low\n    high = options.get(\"high\") if \"high\" in options else default_high\n\n    # We expect only numerical inputs.\n    low = verify_number_and_cast(low)\n    high = verify_number_and_cast(high)\n    if low > high:\n        raise ValueError(\n            f\"Lower bound ({low}) must be lower than higher bound ({high}).\"\n        )\n\n    return low, high\n"
  },
  {
    "path": "gym/envs/mujoco/__init__.py",
    "content": "from gym.envs.mujoco.mujoco_env import MujocoEnv, MuJocoPyEnv  # isort:skip\n\nfrom gym.envs.mujoco.ant import AntEnv\nfrom gym.envs.mujoco.half_cheetah import HalfCheetahEnv\nfrom gym.envs.mujoco.hopper import HopperEnv\nfrom gym.envs.mujoco.humanoid import HumanoidEnv\nfrom gym.envs.mujoco.humanoidstandup import HumanoidStandupEnv\nfrom gym.envs.mujoco.inverted_double_pendulum import InvertedDoublePendulumEnv\nfrom gym.envs.mujoco.inverted_pendulum import InvertedPendulumEnv\nfrom gym.envs.mujoco.pusher import PusherEnv\nfrom gym.envs.mujoco.reacher import ReacherEnv\nfrom gym.envs.mujoco.swimmer import SwimmerEnv\nfrom gym.envs.mujoco.walker2d import Walker2dEnv\n"
  },
  {
    "path": "gym/envs/mujoco/ant.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MuJocoPyEnv\nfrom gym.spaces import Box\n\n\nclass AntEnv(MuJocoPyEnv, utils.EzPickle):\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 20,\n    }\n\n    def __init__(self, **kwargs):\n        observation_space = Box(\n            low=-np.inf, high=np.inf, shape=(111,), dtype=np.float64\n        )\n        MuJocoPyEnv.__init__(\n            self, \"ant.xml\", 5, observation_space=observation_space, **kwargs\n        )\n        utils.EzPickle.__init__(self, **kwargs)\n\n    def step(self, a):\n        xposbefore = self.get_body_com(\"torso\")[0]\n        self.do_simulation(a, self.frame_skip)\n        xposafter = self.get_body_com(\"torso\")[0]\n\n        forward_reward = (xposafter - xposbefore) / self.dt\n        ctrl_cost = 0.5 * np.square(a).sum()\n        contact_cost = (\n            0.5 * 1e-3 * np.sum(np.square(np.clip(self.sim.data.cfrc_ext, -1, 1)))\n        )\n        survive_reward = 1.0\n        reward = forward_reward - ctrl_cost - contact_cost + survive_reward\n        state = self.state_vector()\n        not_terminated = (\n            np.isfinite(state).all() and state[2] >= 0.2 and state[2] <= 1.0\n        )\n        terminated = not not_terminated\n        ob = self._get_obs()\n\n        if self.render_mode == \"human\":\n            self.render()\n        return (\n            ob,\n            reward,\n            terminated,\n            False,\n            dict(\n                reward_forward=forward_reward,\n                reward_ctrl=-ctrl_cost,\n                reward_contact=-contact_cost,\n                reward_survive=survive_reward,\n            ),\n        )\n\n    def _get_obs(self):\n        return np.concatenate(\n            [\n                self.sim.data.qpos.flat[2:],\n                self.sim.data.qvel.flat,\n             
   np.clip(self.sim.data.cfrc_ext, -1, 1).flat,\n            ]\n        )\n\n    def reset_model(self):\n        qpos = self.init_qpos + self.np_random.uniform(\n            size=self.model.nq, low=-0.1, high=0.1\n        )\n        qvel = self.init_qvel + self.np_random.standard_normal(self.model.nv) * 0.1\n        self.set_state(qpos, qvel)\n        return self._get_obs()\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        self.viewer.cam.distance = self.model.stat.extent * 0.5\n"
  },
  {
    "path": "gym/envs/mujoco/ant_v3.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MuJocoPyEnv\nfrom gym.spaces import Box\n\nDEFAULT_CAMERA_CONFIG = {\n    \"distance\": 4.0,\n}\n\n\nclass AntEnv(MuJocoPyEnv, utils.EzPickle):\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 20,\n    }\n\n    def __init__(\n        self,\n        xml_file=\"ant.xml\",\n        ctrl_cost_weight=0.5,\n        contact_cost_weight=5e-4,\n        healthy_reward=1.0,\n        terminate_when_unhealthy=True,\n        healthy_z_range=(0.2, 1.0),\n        contact_force_range=(-1.0, 1.0),\n        reset_noise_scale=0.1,\n        exclude_current_positions_from_observation=True,\n        **kwargs\n    ):\n        utils.EzPickle.__init__(\n            self,\n            xml_file,\n            ctrl_cost_weight,\n            contact_cost_weight,\n            healthy_reward,\n            terminate_when_unhealthy,\n            healthy_z_range,\n            contact_force_range,\n            reset_noise_scale,\n            exclude_current_positions_from_observation,\n            **kwargs\n        )\n\n        self._ctrl_cost_weight = ctrl_cost_weight\n        self._contact_cost_weight = contact_cost_weight\n\n        self._healthy_reward = healthy_reward\n        self._terminate_when_unhealthy = terminate_when_unhealthy\n        self._healthy_z_range = healthy_z_range\n\n        self._contact_force_range = contact_force_range\n\n        self._reset_noise_scale = reset_noise_scale\n\n        self._exclude_current_positions_from_observation = (\n            exclude_current_positions_from_observation\n        )\n\n        if exclude_current_positions_from_observation:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(111,), dtype=np.float64\n            )\n        else:\n            observation_space = Box(\n                low=-np.inf, 
high=np.inf, shape=(113,), dtype=np.float64\n            )\n\n        MuJocoPyEnv.__init__(\n            self, xml_file, 5, observation_space=observation_space, **kwargs\n        )\n\n    @property\n    def healthy_reward(self):\n        return (\n            float(self.is_healthy or self._terminate_when_unhealthy)\n            * self._healthy_reward\n        )\n\n    def control_cost(self, action):\n        control_cost = self._ctrl_cost_weight * np.sum(np.square(action))\n        return control_cost\n\n    @property\n    def contact_forces(self):\n        raw_contact_forces = self.sim.data.cfrc_ext\n        min_value, max_value = self._contact_force_range\n        contact_forces = np.clip(raw_contact_forces, min_value, max_value)\n        return contact_forces\n\n    @property\n    def contact_cost(self):\n        contact_cost = self._contact_cost_weight * np.sum(\n            np.square(self.contact_forces)\n        )\n        return contact_cost\n\n    @property\n    def is_healthy(self):\n        state = self.state_vector()\n        min_z, max_z = self._healthy_z_range\n        is_healthy = np.isfinite(state).all() and min_z <= state[2] <= max_z\n        return is_healthy\n\n    @property\n    def terminated(self):\n        terminated = not self.is_healthy if self._terminate_when_unhealthy else False\n        return terminated\n\n    def step(self, action):\n        xy_position_before = self.get_body_com(\"torso\")[:2].copy()\n        self.do_simulation(action, self.frame_skip)\n        xy_position_after = self.get_body_com(\"torso\")[:2].copy()\n\n        xy_velocity = (xy_position_after - xy_position_before) / self.dt\n        x_velocity, y_velocity = xy_velocity\n\n        ctrl_cost = self.control_cost(action)\n        contact_cost = self.contact_cost\n\n        forward_reward = x_velocity\n        healthy_reward = self.healthy_reward\n\n        rewards = forward_reward + healthy_reward\n        costs = ctrl_cost + contact_cost\n\n        reward = rewards - 
costs\n        terminated = self.terminated\n        observation = self._get_obs()\n        info = {\n            \"reward_forward\": forward_reward,\n            \"reward_ctrl\": -ctrl_cost,\n            \"reward_contact\": -contact_cost,\n            \"reward_survive\": healthy_reward,\n            \"x_position\": xy_position_after[0],\n            \"y_position\": xy_position_after[1],\n            \"distance_from_origin\": np.linalg.norm(xy_position_after, ord=2),\n            \"x_velocity\": x_velocity,\n            \"y_velocity\": y_velocity,\n            \"forward_reward\": forward_reward,\n        }\n\n        if self.render_mode == \"human\":\n            self.render()\n        return observation, reward, terminated, False, info\n\n    def _get_obs(self):\n        position = self.sim.data.qpos.flat.copy()\n        velocity = self.sim.data.qvel.flat.copy()\n        contact_force = self.contact_forces.flat.copy()\n\n        if self._exclude_current_positions_from_observation:\n            position = position[2:]\n\n        observations = np.concatenate((position, velocity, contact_force))\n\n        return observations\n\n    def reset_model(self):\n        noise_low = -self._reset_noise_scale\n        noise_high = self._reset_noise_scale\n\n        qpos = self.init_qpos + self.np_random.uniform(\n            low=noise_low, high=noise_high, size=self.model.nq\n        )\n        qvel = (\n            self.init_qvel\n            + self._reset_noise_scale * self.np_random.standard_normal(self.model.nv)\n        )\n        self.set_state(qpos, qvel)\n\n        observation = self._get_obs()\n\n        return observation\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        for key, value in DEFAULT_CAMERA_CONFIG.items():\n            if isinstance(value, np.ndarray):\n                getattr(self.viewer.cam, key)[:] = value\n            else:\n                setattr(self.viewer.cam, key, value)\n"
  },
  {
    "path": "gym/envs/mujoco/ant_v4.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MujocoEnv\nfrom gym.spaces import Box\n\nDEFAULT_CAMERA_CONFIG = {\n    \"distance\": 4.0,\n}\n\n\nclass AntEnv(MujocoEnv, utils.EzPickle):\n    \"\"\"\n    ### Description\n\n    This environment is based on the environment introduced by Schulman,\n    Moritz, Levine, Jordan and Abbeel in [\"High-Dimensional Continuous Control\n    Using Generalized Advantage Estimation\"](https://arxiv.org/abs/1506.02438).\n    The ant is a 3D robot consisting of one torso (free rotational body) with\n    four legs attached to it with each leg having two links. The goal is to\n    coordinate the four legs to move in the forward (right) direction by applying\n    torques on the eight hinges connecting the two links of each leg and the torso\n    (nine parts and eight hinges).\n\n    ### Action Space\n    The action space is a `Box(-1, 1, (8,), float32)`. An action represents the torques applied at the hinge joints.\n\n    | Num | Action                                                            | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit         |\n    | --- | ----------------------------------------------------------------- | ----------- | ----------- | -------------------------------- | ----- | ------------ |\n    | 0   | Torque applied on the rotor between the torso and front left hip  | -1          | 1           | hip_1 (front_left_leg)           | hinge | torque (N m) |\n    | 1   | Torque applied on the rotor between the front left two links      | -1          | 1           | angle_1 (front_left_leg)         | hinge | torque (N m) |\n    | 2   | Torque applied on the rotor between the torso and front right hip | -1          | 1           | hip_2 (front_right_leg)          | hinge | torque (N m) |\n    | 3   | Torque applied on the rotor between the front right two links     | -1          | 1           | angle_2 (front_right_leg)        | hinge | torque 
(N m) |\n    | 4   | Torque applied on the rotor between the torso and back left hip   | -1          | 1           | hip_3 (back_leg)                 | hinge | torque (N m) |\n    | 5   | Torque applied on the rotor between the back left two links       | -1          | 1           | angle_3 (back_leg)               | hinge | torque (N m) |\n    | 6   | Torque applied on the rotor between the torso and back right hip  | -1          | 1           | hip_4 (right_back_leg)           | hinge | torque (N m) |\n    | 7   | Torque applied on the rotor between the back right two links      | -1          | 1           | angle_4 (right_back_leg)         | hinge | torque (N m) |\n\n    ### Observation Space\n\n    Observations consist of positional values of different body parts of the ant,\n    followed by the velocities of those individual parts (their derivatives) with all\n    the positions ordered before all the velocities.\n\n    By default, observations do not include the x- and y-coordinates of the ant's torso. 
These may\n    be included by passing `exclude_current_positions_from_observation=False` during construction.\n    In that case, the observation gains two dimensions (29 in total, or 113 when `use_contact_forces=True`), and the first two dimensions\n    represent the x- and y-coordinates of the ant's torso.\n    Regardless of whether `exclude_current_positions_from_observation` is set to true or false, the x- and y-coordinates\n    of the torso are returned in `info` with keys `\"x_position\"` and `\"y_position\"`, respectively.\n\n    By default, an observation is an `ndarray` with shape `(27,)` (or `(111,)` when `use_contact_forces=True`)\n    where the elements correspond to the following:\n\n    | Num | Observation                                                  | Min    | Max    | Name (in corresponding XML file)       | Joint | Unit                     |\n    |-----|--------------------------------------------------------------|--------|--------|----------------------------------------|-------|--------------------------|\n    | 0   | z-coordinate of the torso (centre)                           | -Inf   | Inf    | torso                                  | free  | position (m)             |\n    | 1   | x-orientation of the torso (centre)                          | -Inf   | Inf    | torso                                  | free  | angle (rad)              |\n    | 2   | y-orientation of the torso (centre)                          | -Inf   | Inf    | torso                                  | free  | angle (rad)              |\n    | 3   | z-orientation of the torso (centre)                          | -Inf   | Inf    | torso                                  | free  | angle (rad)              |\n    | 4   | w-orientation of the torso (centre)                          | -Inf   | Inf    | torso                                  | free  | angle (rad)              |\n    | 5   | angle between torso and first link on front left             | -Inf   | Inf    | hip_1 (front_left_leg)                 | hinge | angle (rad)              |\n    | 
6   | angle between the two links on the front left                | -Inf   | Inf    | ankle_1 (front_left_leg)               | hinge | angle (rad)              |\n    | 7   | angle between torso and first link on front right            | -Inf   | Inf    | hip_2 (front_right_leg)                | hinge | angle (rad)              |\n    | 8   | angle between the two links on the front right               | -Inf   | Inf    | ankle_2 (front_right_leg)              | hinge | angle (rad)              |\n    | 9   | angle between torso and first link on back left              | -Inf   | Inf    | hip_3 (back_leg)                       | hinge | angle (rad)              |\n    | 10  | angle between the two links on the back left                 | -Inf   | Inf    | ankle_3 (back_leg)                     | hinge | angle (rad)              |\n    | 11  | angle between torso and first link on back right             | -Inf   | Inf    | hip_4 (right_back_leg)                 | hinge | angle (rad)              |\n    | 12  | angle between the two links on the back right                | -Inf   | Inf    | ankle_4 (right_back_leg)               | hinge | angle (rad)              |\n    | 13  | x-coordinate velocity of the torso                           | -Inf   | Inf    | torso                                  | free  | velocity (m/s)           |\n    | 14  | y-coordinate velocity of the torso                           | -Inf   | Inf    | torso                                  | free  | velocity (m/s)           |\n    | 15  | z-coordinate velocity of the torso                           | -Inf   | Inf    | torso                                  | free  | velocity (m/s)           |\n    | 16  | x-coordinate angular velocity of the torso                   | -Inf   | Inf    | torso                                  | free  | angular velocity (rad/s) |\n    | 17  | y-coordinate angular velocity of the torso                   | -Inf   | Inf    | torso                                  | 
free  | angular velocity (rad/s) |\n    | 18  | z-coordinate angular velocity of the torso                   | -Inf   | Inf    | torso                                  | free  | angular velocity (rad/s) |\n    | 19  | angular velocity of angle between torso and front left link  | -Inf   | Inf    | hip_1 (front_left_leg)                 | hinge | angular velocity (rad/s) |\n    | 20  | angular velocity of the angle between front left links       | -Inf   | Inf    | ankle_1 (front_left_leg)               | hinge | angular velocity (rad/s) |\n    | 21  | angular velocity of angle between torso and front right link | -Inf   | Inf    | hip_2 (front_right_leg)                | hinge | angular velocity (rad/s) |\n    | 22  | angular velocity of the angle between front right links      | -Inf   | Inf    | ankle_2 (front_right_leg)              | hinge | angular velocity (rad/s) |\n    | 23  | angular velocity of angle between torso and back left link   | -Inf   | Inf    | hip_3 (back_leg)                       | hinge | angular velocity (rad/s) |\n    | 24  | angular velocity of the angle between back left links        | -Inf   | Inf    | ankle_3 (back_leg)                     | hinge | angular velocity (rad/s) |\n    | 25  | angular velocity of angle between torso and back right link  | -Inf   | Inf    | hip_4 (right_back_leg)                 | hinge | angular velocity (rad/s) |\n    | 26  | angular velocity of the angle between back right links       | -Inf   | Inf    | ankle_4 (right_back_leg)               | hinge | angular velocity (rad/s) |\n\n\n    When `use_contact_forces=True`, the remaining 14*6 = 84 elements of the observation are contact forces\n    (external forces - force x, y, z and torque x, y, z) applied to the\n    center of mass of each of the links. 
The 14 links are: the ground link,\n    the torso link, and 3 links for each leg (1 + 1 + 12) with the 6 external forces.\n\n    The (x,y,z) coordinates are translational DOFs while the orientations are rotational\n    DOFs expressed as quaternions. One can read more about free joints in the [MuJoCo documentation](https://mujoco.readthedocs.io/en/latest/XMLreference.html).\n\n\n    **Note:** The Ant-v4 environment no longer has the following contact forces issue.\n    If using Ant versions prior to v4, there have been reported issues that using a Mujoco-Py version > 2.0 results\n    in the contact forces always being 0. As such we recommend using a Mujoco-Py version < 2.0\n    when using the Ant environment if you would like to report results with contact forces (if\n    contact forces are not used in your experiments, you can use version > 2.0).\n\n    ### Rewards\n    The reward consists of four parts:\n    - *healthy_reward*: Every timestep that the ant is healthy (see definition in section \"Episode End\"), it gets a reward of fixed value `healthy_reward`\n    - *forward_reward*: A reward for moving forward, which is measured as\n    *(x-coordinate after action - x-coordinate before action)/dt*. *dt* is the time\n    between actions and is dependent on the `frame_skip` parameter (default is 5),\n    where the frametime is 0.01 - making the default *dt = 5 * 0.01 = 0.05*.\n    This reward is positive if the ant moves forward (in the positive x direction).\n    - *ctrl_cost*: A negative reward for penalising the ant if it takes actions\n    that are too large. It is measured as *`ctrl_cost_weight` * sum(action<sup>2</sup>)*\n    where *`ctrl_cost_weight`* is a parameter set for the control and has a default value of 0.5.\n    - *contact_cost*: A negative reward for penalising the ant if the external contact\n    force is too large. 
It is calculated as *`contact_cost_weight` * sum(clip(external contact\n    force to `contact_force_range`)<sup>2</sup>)* and is applied only when `use_contact_forces=True`.\n\n    The total reward returned is ***reward*** *=* *healthy_reward + forward_reward - ctrl_cost - contact_cost* (the *contact_cost* term is included only when `use_contact_forces=True`) and `info` will also contain the individual reward terms.\n\n    ### Starting State\n    All observations start in state\n    (0.0, 0.0,  0.75, 1.0, 0.0  ... 0.0) with uniform noise in the range\n    of [-`reset_noise_scale`, `reset_noise_scale`] added to the positional values and normal noise\n    with mean 0 and standard deviation `reset_noise_scale` added to the velocity values for\n    stochasticity. Note that the initial z coordinate is intentionally selected\n    to be slightly high, thereby indicating a standing up ant. The initial orientation\n    is designed to make it face forward as well.\n\n    ### Episode End\n    The ant is said to be unhealthy if any of the following happens:\n\n    1. Any of the state space values is no longer finite\n    2. The z-coordinate of the torso is **not** in the closed interval given by `healthy_z_range` (defaults to [0.2, 1.0])\n\n    If `terminate_when_unhealthy=True` is passed during construction (which is the default),\n    the episode ends when any of the following happens:\n\n    1. Truncation: The episode duration reaches 1000 timesteps\n    2. 
Termination: The ant is unhealthy\n\n    If `terminate_when_unhealthy=False` is passed, the episode ends only by truncation, once it reaches 1000 timesteps.\n\n    ### Arguments\n\n    No additional arguments are currently supported in v2 and lower.\n\n    ```\n    env = gym.make('Ant-v2')\n    ```\n\n    v3 and v4 take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.\n\n    ```\n    env = gym.make('Ant-v4', ctrl_cost_weight=0.1, ...)\n    ```\n\n    | Parameter               | Type       | Default      |Description                    |\n    |-------------------------|------------|--------------|-------------------------------|\n    | `xml_file`              | **str**    | `\"ant.xml\"`  | Path to a MuJoCo model |\n    | `ctrl_cost_weight`      | **float**  | `0.5`        | Weight for *ctrl_cost* term (see section on reward) |\n    | `use_contact_forces`    | **bool**   | `False`      | If true, contact forces are appended to the observation and *contact_cost* is included in the reward |\n    | `contact_cost_weight`   | **float**  | `5e-4`       | Weight for *contact_cost* term (see section on reward) |\n    | `healthy_reward`        | **float**  | `1`          | Constant reward given if the ant is \"healthy\" after timestep |\n    | `terminate_when_unhealthy` | **bool**| `True`       | If true, terminate the episode when the ant is unhealthy, e.g. when the z-coordinate of the torso is no longer in the `healthy_z_range` |\n    | `healthy_z_range`       | **tuple**  | `(0.2, 1)`   | The ant is considered healthy if the z-coordinate of the torso is in this range |\n    | `contact_force_range`   | **tuple**  | `(-1, 1)`    | Contact forces are clipped to this range in the computation of *contact_cost* |\n    | `reset_noise_scale`     | **float**  | `0.1`        | Scale of random perturbations of initial position and velocity (see section on Starting State) |\n    | `exclude_current_positions_from_observation`| **bool** | `True`| Whether or not to omit the x- and y-coordinates from observations. 
Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies |\n\n    ### Version History\n    * v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3\n    * v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)\n    * v2: All continuous control environments now use mujoco_py >= 1.50\n    * v1: max_time_steps raised to 1000 for robot based tasks. Added reward_threshold to environments.\n    * v0: Initial versions release (1.0.0)\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 20,\n    }\n\n    def __init__(\n        self,\n        xml_file=\"ant.xml\",\n        ctrl_cost_weight=0.5,\n        use_contact_forces=False,\n        contact_cost_weight=5e-4,\n        healthy_reward=1.0,\n        terminate_when_unhealthy=True,\n        healthy_z_range=(0.2, 1.0),\n        contact_force_range=(-1.0, 1.0),\n        reset_noise_scale=0.1,\n        exclude_current_positions_from_observation=True,\n        **kwargs\n    ):\n        utils.EzPickle.__init__(\n            self,\n            xml_file,\n            ctrl_cost_weight,\n            use_contact_forces,\n            contact_cost_weight,\n            healthy_reward,\n            terminate_when_unhealthy,\n            healthy_z_range,\n            contact_force_range,\n            reset_noise_scale,\n            exclude_current_positions_from_observation,\n            **kwargs\n        )\n\n        self._ctrl_cost_weight = ctrl_cost_weight\n        self._contact_cost_weight = contact_cost_weight\n\n        self._healthy_reward = healthy_reward\n        self._terminate_when_unhealthy = terminate_when_unhealthy\n        self._healthy_z_range = healthy_z_range\n\n        self._contact_force_range = 
contact_force_range\n\n        self._reset_noise_scale = reset_noise_scale\n\n        self._use_contact_forces = use_contact_forces\n\n        self._exclude_current_positions_from_observation = (\n            exclude_current_positions_from_observation\n        )\n\n        obs_shape = 27\n        if not exclude_current_positions_from_observation:\n            obs_shape += 2\n        if use_contact_forces:\n            obs_shape += 84\n\n        observation_space = Box(\n            low=-np.inf, high=np.inf, shape=(obs_shape,), dtype=np.float64\n        )\n\n        MujocoEnv.__init__(\n            self, xml_file, 5, observation_space=observation_space, **kwargs\n        )\n\n    @property\n    def healthy_reward(self):\n        return (\n            float(self.is_healthy or self._terminate_when_unhealthy)\n            * self._healthy_reward\n        )\n\n    def control_cost(self, action):\n        control_cost = self._ctrl_cost_weight * np.sum(np.square(action))\n        return control_cost\n\n    @property\n    def contact_forces(self):\n        raw_contact_forces = self.data.cfrc_ext\n        min_value, max_value = self._contact_force_range\n        contact_forces = np.clip(raw_contact_forces, min_value, max_value)\n        return contact_forces\n\n    @property\n    def contact_cost(self):\n        contact_cost = self._contact_cost_weight * np.sum(\n            np.square(self.contact_forces)\n        )\n        return contact_cost\n\n    @property\n    def is_healthy(self):\n        state = self.state_vector()\n        min_z, max_z = self._healthy_z_range\n        is_healthy = np.isfinite(state).all() and min_z <= state[2] <= max_z\n        return is_healthy\n\n    @property\n    def terminated(self):\n        terminated = not self.is_healthy if self._terminate_when_unhealthy else False\n        return terminated\n\n    def step(self, action):\n        xy_position_before = self.get_body_com(\"torso\")[:2].copy()\n        self.do_simulation(action, 
self.frame_skip)\n        xy_position_after = self.get_body_com(\"torso\")[:2].copy()\n\n        xy_velocity = (xy_position_after - xy_position_before) / self.dt\n        x_velocity, y_velocity = xy_velocity\n\n        forward_reward = x_velocity\n        healthy_reward = self.healthy_reward\n\n        rewards = forward_reward + healthy_reward\n\n        costs = ctrl_cost = self.control_cost(action)\n\n        terminated = self.terminated\n        observation = self._get_obs()\n        info = {\n            \"reward_forward\": forward_reward,\n            \"reward_ctrl\": -ctrl_cost,\n            \"reward_survive\": healthy_reward,\n            \"x_position\": xy_position_after[0],\n            \"y_position\": xy_position_after[1],\n            \"distance_from_origin\": np.linalg.norm(xy_position_after, ord=2),\n            \"x_velocity\": x_velocity,\n            \"y_velocity\": y_velocity,\n            \"forward_reward\": forward_reward,\n        }\n        if self._use_contact_forces:\n            contact_cost = self.contact_cost\n            costs += contact_cost\n            # Report the contact penalty under its own key so it does not overwrite \"reward_ctrl\".\n            info[\"reward_contact\"] = -contact_cost\n\n        reward = rewards - costs\n\n        if self.render_mode == \"human\":\n            self.render()\n        return observation, reward, terminated, False, info\n\n    def _get_obs(self):\n        position = self.data.qpos.flat.copy()\n        velocity = self.data.qvel.flat.copy()\n\n        if self._exclude_current_positions_from_observation:\n            position = position[2:]\n\n        if self._use_contact_forces:\n            contact_force = self.contact_forces.flat.copy()\n            return np.concatenate((position, velocity, contact_force))\n        else:\n            return np.concatenate((position, velocity))\n\n    def reset_model(self):\n        noise_low = -self._reset_noise_scale\n        noise_high = self._reset_noise_scale\n\n        qpos = self.init_qpos + self.np_random.uniform(\n            low=noise_low, high=noise_high, 
size=self.model.nq\n        )\n        qvel = (\n            self.init_qvel\n            + self._reset_noise_scale * self.np_random.standard_normal(self.model.nv)\n        )\n        self.set_state(qpos, qvel)\n\n        observation = self._get_obs()\n\n        return observation\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        for key, value in DEFAULT_CAMERA_CONFIG.items():\n            if isinstance(value, np.ndarray):\n                getattr(self.viewer.cam, key)[:] = value\n            else:\n                setattr(self.viewer.cam, key, value)\n"
  },
  {
    "path": "gym/envs/mujoco/assets/ant.xml",
    "content": "<mujoco model=\"ant\">\n  <compiler angle=\"degree\" coordinate=\"local\" inertiafromgeom=\"true\"/>\n  <option integrator=\"RK4\" timestep=\"0.01\"/>\n  <custom>\n    <numeric data=\"0.0 0.0 0.55 1.0 0.0 0.0 0.0 0.0 1.0 0.0 -1.0 0.0 -1.0 0.0 1.0\" name=\"init_qpos\"/>\n  </custom>\n  <default>\n    <joint armature=\"1\" damping=\"1\" limited=\"true\"/>\n    <geom conaffinity=\"0\" condim=\"3\" density=\"5.0\" friction=\"1 0.5 0.5\" margin=\"0.01\" rgba=\"0.8 0.6 0.4 1\"/>\n  </default>\n  <asset>\n    <texture builtin=\"gradient\" height=\"100\" rgb1=\"1 1 1\" rgb2=\"0 0 0\" type=\"skybox\" width=\"100\"/>\n    <texture builtin=\"flat\" height=\"1278\" mark=\"cross\" markrgb=\"1 1 1\" name=\"texgeom\" random=\"0.01\" rgb1=\"0.8 0.6 0.4\" rgb2=\"0.8 0.6 0.4\" type=\"cube\" width=\"127\"/>\n    <texture builtin=\"checker\" height=\"100\" name=\"texplane\" rgb1=\"0 0 0\" rgb2=\"0.8 0.8 0.8\" type=\"2d\" width=\"100\"/>\n    <material name=\"MatPlane\" reflectance=\"0.5\" shininess=\"1\" specular=\"1\" texrepeat=\"60 60\" texture=\"texplane\"/>\n    <material name=\"geom\" texture=\"texgeom\" texuniform=\"true\"/>\n  </asset>\n  <worldbody>\n    <light cutoff=\"100\" diffuse=\"1 1 1\" dir=\"-0 0 -1.3\" directional=\"true\" exponent=\"1\" pos=\"0 0 1.3\" specular=\".1 .1 .1\"/>\n    <geom conaffinity=\"1\" condim=\"3\" material=\"MatPlane\" name=\"floor\" pos=\"0 0 0\" rgba=\"0.8 0.9 0.8 1\" size=\"40 40 40\" type=\"plane\"/>\n    <body name=\"torso\" pos=\"0 0 0.75\">\n      <camera name=\"track\" mode=\"trackcom\" pos=\"0 -3 0.3\" xyaxes=\"1 0 0 0 0 1\"/>\n      <geom name=\"torso_geom\" pos=\"0 0 0\" size=\"0.25\" type=\"sphere\"/>\n      <joint armature=\"0\" damping=\"0\" limited=\"false\" margin=\"0.01\" name=\"root\" pos=\"0 0 0\" type=\"free\"/>\n      <body name=\"front_left_leg\" pos=\"0 0 0\">\n        <geom fromto=\"0.0 0.0 0.0 0.2 0.2 0.0\" name=\"aux_1_geom\" size=\"0.08\" type=\"capsule\"/>\n        <body name=\"aux_1\" pos=\"0.2 0.2 
0\">\n          <joint axis=\"0 0 1\" name=\"hip_1\" pos=\"0.0 0.0 0.0\" range=\"-30 30\" type=\"hinge\"/>\n          <geom fromto=\"0.0 0.0 0.0 0.2 0.2 0.0\" name=\"left_leg_geom\" size=\"0.08\" type=\"capsule\"/>\n          <body pos=\"0.2 0.2 0\">\n            <joint axis=\"-1 1 0\" name=\"ankle_1\" pos=\"0.0 0.0 0.0\" range=\"30 70\" type=\"hinge\"/>\n            <geom fromto=\"0.0 0.0 0.0 0.4 0.4 0.0\" name=\"left_ankle_geom\" size=\"0.08\" type=\"capsule\"/>\n          </body>\n        </body>\n      </body>\n      <body name=\"front_right_leg\" pos=\"0 0 0\">\n        <geom fromto=\"0.0 0.0 0.0 -0.2 0.2 0.0\" name=\"aux_2_geom\" size=\"0.08\" type=\"capsule\"/>\n        <body name=\"aux_2\" pos=\"-0.2 0.2 0\">\n          <joint axis=\"0 0 1\" name=\"hip_2\" pos=\"0.0 0.0 0.0\" range=\"-30 30\" type=\"hinge\"/>\n          <geom fromto=\"0.0 0.0 0.0 -0.2 0.2 0.0\" name=\"right_leg_geom\" size=\"0.08\" type=\"capsule\"/>\n          <body pos=\"-0.2 0.2 0\">\n            <joint axis=\"1 1 0\" name=\"ankle_2\" pos=\"0.0 0.0 0.0\" range=\"-70 -30\" type=\"hinge\"/>\n            <geom fromto=\"0.0 0.0 0.0 -0.4 0.4 0.0\" name=\"right_ankle_geom\" size=\"0.08\" type=\"capsule\"/>\n          </body>\n        </body>\n      </body>\n      <body name=\"back_leg\" pos=\"0 0 0\">\n        <geom fromto=\"0.0 0.0 0.0 -0.2 -0.2 0.0\" name=\"aux_3_geom\" size=\"0.08\" type=\"capsule\"/>\n        <body name=\"aux_3\" pos=\"-0.2 -0.2 0\">\n          <joint axis=\"0 0 1\" name=\"hip_3\" pos=\"0.0 0.0 0.0\" range=\"-30 30\" type=\"hinge\"/>\n          <geom fromto=\"0.0 0.0 0.0 -0.2 -0.2 0.0\" name=\"back_leg_geom\" size=\"0.08\" type=\"capsule\"/>\n          <body pos=\"-0.2 -0.2 0\">\n            <joint axis=\"-1 1 0\" name=\"ankle_3\" pos=\"0.0 0.0 0.0\" range=\"-70 -30\" type=\"hinge\"/>\n            <geom fromto=\"0.0 0.0 0.0 -0.4 -0.4 0.0\" name=\"third_ankle_geom\" size=\"0.08\" type=\"capsule\"/>\n          </body>\n        </body>\n      </body>\n      <body 
name=\"right_back_leg\" pos=\"0 0 0\">\n        <geom fromto=\"0.0 0.0 0.0 0.2 -0.2 0.0\" name=\"aux_4_geom\" size=\"0.08\" type=\"capsule\"/>\n        <body name=\"aux_4\" pos=\"0.2 -0.2 0\">\n          <joint axis=\"0 0 1\" name=\"hip_4\" pos=\"0.0 0.0 0.0\" range=\"-30 30\" type=\"hinge\"/>\n          <geom fromto=\"0.0 0.0 0.0 0.2 -0.2 0.0\" name=\"rightback_leg_geom\" size=\"0.08\" type=\"capsule\"/>\n          <body pos=\"0.2 -0.2 0\">\n            <joint axis=\"1 1 0\" name=\"ankle_4\" pos=\"0.0 0.0 0.0\" range=\"30 70\" type=\"hinge\"/>\n            <geom fromto=\"0.0 0.0 0.0 0.4 -0.4 0.0\" name=\"fourth_ankle_geom\" size=\"0.08\" type=\"capsule\"/>\n          </body>\n        </body>\n      </body>\n    </body>\n  </worldbody>\n  <actuator>\n    <motor ctrllimited=\"true\" ctrlrange=\"-1.0 1.0\" joint=\"hip_4\" gear=\"150\"/>\n    <motor ctrllimited=\"true\" ctrlrange=\"-1.0 1.0\" joint=\"ankle_4\" gear=\"150\"/>\n    <motor ctrllimited=\"true\" ctrlrange=\"-1.0 1.0\" joint=\"hip_1\" gear=\"150\"/>\n    <motor ctrllimited=\"true\" ctrlrange=\"-1.0 1.0\" joint=\"ankle_1\" gear=\"150\"/>\n    <motor ctrllimited=\"true\" ctrlrange=\"-1.0 1.0\" joint=\"hip_2\" gear=\"150\"/>\n    <motor ctrllimited=\"true\" ctrlrange=\"-1.0 1.0\" joint=\"ankle_2\" gear=\"150\"/>\n    <motor ctrllimited=\"true\" ctrlrange=\"-1.0 1.0\" joint=\"hip_3\" gear=\"150\"/>\n    <motor ctrllimited=\"true\" ctrlrange=\"-1.0 1.0\" joint=\"ankle_3\" gear=\"150\"/>\n  </actuator>\n</mujoco>\n"
  },
  {
    "path": "gym/envs/mujoco/assets/half_cheetah.xml",
    "content": "<!-- Cheetah Model\n\n    The state space is populated with joints in the order that they are\n    defined in this file. The actuators also operate on joints.\n\n    State-Space (name/joint/parameter):\n        - rootx     slider      position (m)\n        - rootz     slider      position (m)\n        - rooty     hinge       angle (rad)\n        - bthigh    hinge       angle (rad)\n        - bshin     hinge       angle (rad)\n        - bfoot     hinge       angle (rad)\n        - fthigh    hinge       angle (rad)\n        - fshin     hinge       angle (rad)\n        - ffoot     hinge       angle (rad)\n        - rootx     slider      velocity (m/s)\n        - rootz     slider      velocity (m/s)\n        - rooty     hinge       angular velocity (rad/s)\n        - bthigh    hinge       angular velocity (rad/s)\n        - bshin     hinge       angular velocity (rad/s)\n        - bfoot     hinge       angular velocity (rad/s)\n        - fthigh    hinge       angular velocity (rad/s)\n        - fshin     hinge       angular velocity (rad/s)\n        - ffoot     hinge       angular velocity (rad/s)\n\n    Actuators (name/actuator/parameter):\n        - bthigh    hinge       torque (N m)\n        - bshin     hinge       torque (N m)\n        - bfoot     hinge       torque (N m)\n        - fthigh    hinge       torque (N m)\n        - fshin     hinge       torque (N m)\n        - ffoot     hinge       torque (N m)\n\n-->\n<mujoco model=\"cheetah\">\n  <compiler angle=\"radian\" coordinate=\"local\" inertiafromgeom=\"true\" settotalmass=\"14\"/>\n  <default>\n    <joint armature=\".1\" damping=\".01\" limited=\"true\" solimplimit=\"0 .8 .03\" solreflimit=\".02 1\" stiffness=\"8\"/>\n    <geom conaffinity=\"0\" condim=\"3\" contype=\"1\" friction=\".4 .1 .1\" rgba=\"0.8 0.6 .4 1\" solimp=\"0.0 0.8 0.01\" solref=\"0.02 1\"/>\n    <motor ctrllimited=\"true\" ctrlrange=\"-1 1\"/>\n  </default>\n  <size nstack=\"300000\" nuser_geom=\"1\"/>\n  <option gravity=\"0 
0 -9.81\" timestep=\"0.01\"/>\n  <asset>\n    <texture builtin=\"gradient\" height=\"100\" rgb1=\"1 1 1\" rgb2=\"0 0 0\" type=\"skybox\" width=\"100\"/>\n    <texture builtin=\"flat\" height=\"1278\" mark=\"cross\" markrgb=\"1 1 1\" name=\"texgeom\" random=\"0.01\" rgb1=\"0.8 0.6 0.4\" rgb2=\"0.8 0.6 0.4\" type=\"cube\" width=\"127\"/>\n    <texture builtin=\"checker\" height=\"100\" name=\"texplane\" rgb1=\"0 0 0\" rgb2=\"0.8 0.8 0.8\" type=\"2d\" width=\"100\"/>\n    <material name=\"MatPlane\" reflectance=\"0.5\" shininess=\"1\" specular=\"1\" texrepeat=\"60 60\" texture=\"texplane\"/>\n    <material name=\"geom\" texture=\"texgeom\" texuniform=\"true\"/>\n  </asset>\n  <worldbody>\n    <light cutoff=\"100\" diffuse=\"1 1 1\" dir=\"-0 0 -1.3\" directional=\"true\" exponent=\"1\" pos=\"0 0 1.3\" specular=\".1 .1 .1\"/>\n    <geom conaffinity=\"1\" condim=\"3\" material=\"MatPlane\" name=\"floor\" pos=\"0 0 0\" rgba=\"0.8 0.9 0.8 1\" size=\"40 40 40\" type=\"plane\"/>\n    <body name=\"torso\" pos=\"0 0 .7\">\n      <camera name=\"track\" mode=\"trackcom\" pos=\"0 -3 0.3\" xyaxes=\"1 0 0 0 0 1\"/>\n      <joint armature=\"0\" axis=\"1 0 0\" damping=\"0\" limited=\"false\" name=\"rootx\" pos=\"0 0 0\" stiffness=\"0\" type=\"slide\"/>\n      <joint armature=\"0\" axis=\"0 0 1\" damping=\"0\" limited=\"false\" name=\"rootz\" pos=\"0 0 0\" stiffness=\"0\" type=\"slide\"/>\n      <joint armature=\"0\" axis=\"0 1 0\" damping=\"0\" limited=\"false\" name=\"rooty\" pos=\"0 0 0\" stiffness=\"0\" type=\"hinge\"/>\n      <geom fromto=\"-.5 0 0 .5 0 0\" name=\"torso\" size=\"0.046\" type=\"capsule\"/>\n      <geom axisangle=\"0 1 0 .87\" name=\"head\" pos=\".6 0 .1\" size=\"0.046 .15\" type=\"capsule\"/>\n      <!-- <site name='tip'  pos='.15 0 .11'/>-->\n      <body name=\"bthigh\" pos=\"-.5 0 0\">\n        <joint axis=\"0 1 0\" damping=\"6\" name=\"bthigh\" pos=\"0 0 0\" range=\"-.52 1.05\" stiffness=\"240\" type=\"hinge\"/>\n        <geom axisangle=\"0 1 0 -3.8\" 
name=\"bthigh\" pos=\".1 0 -.13\" size=\"0.046 .145\" type=\"capsule\"/>\n        <body name=\"bshin\" pos=\".16 0 -.25\">\n          <joint axis=\"0 1 0\" damping=\"4.5\" name=\"bshin\" pos=\"0 0 0\" range=\"-.785 .785\" stiffness=\"180\" type=\"hinge\"/>\n          <geom axisangle=\"0 1 0 -2.03\" name=\"bshin\" pos=\"-.14 0 -.07\" rgba=\"0.9 0.6 0.6 1\" size=\"0.046 .15\" type=\"capsule\"/>\n          <body name=\"bfoot\" pos=\"-.28 0 -.14\">\n            <joint axis=\"0 1 0\" damping=\"3\" name=\"bfoot\" pos=\"0 0 0\" range=\"-.4 .785\" stiffness=\"120\" type=\"hinge\"/>\n            <geom axisangle=\"0 1 0 -.27\" name=\"bfoot\" pos=\".03 0 -.097\" rgba=\"0.9 0.6 0.6 1\" size=\"0.046 .094\" type=\"capsule\"/>\n          </body>\n        </body>\n      </body>\n      <body name=\"fthigh\" pos=\".5 0 0\">\n        <joint axis=\"0 1 0\" damping=\"4.5\" name=\"fthigh\" pos=\"0 0 0\" range=\"-1 .7\" stiffness=\"180\" type=\"hinge\"/>\n        <geom axisangle=\"0 1 0 .52\" name=\"fthigh\" pos=\"-.07 0 -.12\" size=\"0.046 .133\" type=\"capsule\"/>\n        <body name=\"fshin\" pos=\"-.14 0 -.24\">\n          <joint axis=\"0 1 0\" damping=\"3\" name=\"fshin\" pos=\"0 0 0\" range=\"-1.2 .87\" stiffness=\"120\" type=\"hinge\"/>\n          <geom axisangle=\"0 1 0 -.6\" name=\"fshin\" pos=\".065 0 -.09\" rgba=\"0.9 0.6 0.6 1\" size=\"0.046 .106\" type=\"capsule\"/>\n          <body name=\"ffoot\" pos=\".13 0 -.18\">\n            <joint axis=\"0 1 0\" damping=\"1.5\" name=\"ffoot\" pos=\"0 0 0\" range=\"-.5 .5\" stiffness=\"60\" type=\"hinge\"/>\n            <geom axisangle=\"0 1 0 -.6\" name=\"ffoot\" pos=\".045 0 -.07\" rgba=\"0.9 0.6 0.6 1\" size=\"0.046 .07\" type=\"capsule\"/>\n          </body>\n        </body>\n      </body>\n    </body>\n  </worldbody>\n  <actuator>\n    <motor gear=\"120\" joint=\"bthigh\" name=\"bthigh\"/>\n    <motor gear=\"90\" joint=\"bshin\" name=\"bshin\"/>\n    <motor gear=\"60\" joint=\"bfoot\" name=\"bfoot\"/>\n    <motor gear=\"120\" 
joint=\"fthigh\" name=\"fthigh\"/>\n    <motor gear=\"60\" joint=\"fshin\" name=\"fshin\"/>\n    <motor gear=\"30\" joint=\"ffoot\" name=\"ffoot\"/>\n  </actuator>\n</mujoco>\n"
  },
  {
    "path": "gym/envs/mujoco/assets/hopper.xml",
    "content": "<mujoco model=\"hopper\">\n  <compiler angle=\"degree\" coordinate=\"global\" inertiafromgeom=\"true\"/>\n  <default>\n    <joint armature=\"1\" damping=\"1\" limited=\"true\"/>\n    <geom conaffinity=\"1\" condim=\"1\" contype=\"1\" margin=\"0.001\" material=\"geom\" rgba=\"0.8 0.6 .4 1\" solimp=\".8 .8 .01\" solref=\".02 1\"/>\n    <motor ctrllimited=\"true\" ctrlrange=\"-.4 .4\"/>\n  </default>\n  <option integrator=\"RK4\" timestep=\"0.002\"/>\n  <visual>\n    <map znear=\"0.02\"/>\n  </visual>\n  <worldbody>\n    <light cutoff=\"100\" diffuse=\"1 1 1\" dir=\"-0 0 -1.3\" directional=\"true\" exponent=\"1\" pos=\"0 0 1.3\" specular=\".1 .1 .1\"/>\n    <geom conaffinity=\"1\" condim=\"3\" name=\"floor\" pos=\"0 0 0\" rgba=\"0.8 0.9 0.8 1\" size=\"20 20 .125\" type=\"plane\" material=\"MatPlane\"/>\n    <body name=\"torso\" pos=\"0 0 1.25\">\n      <camera name=\"track\" mode=\"trackcom\" pos=\"0 -3 1\" xyaxes=\"1 0 0 0 0 1\"/>\n      <joint armature=\"0\" axis=\"1 0 0\" damping=\"0\" limited=\"false\" name=\"rootx\" pos=\"0 0 0\" stiffness=\"0\" type=\"slide\"/>\n      <joint armature=\"0\" axis=\"0 0 1\" damping=\"0\" limited=\"false\" name=\"rootz\" pos=\"0 0 0\" ref=\"1.25\" stiffness=\"0\" type=\"slide\"/>\n      <joint armature=\"0\" axis=\"0 1 0\" damping=\"0\" limited=\"false\" name=\"rooty\" pos=\"0 0 1.25\" stiffness=\"0\" type=\"hinge\"/>\n      <geom friction=\"0.9\" fromto=\"0 0 1.45 0 0 1.05\" name=\"torso_geom\" size=\"0.05\" type=\"capsule\"/>\n      <body name=\"thigh\" pos=\"0 0 1.05\">\n        <joint axis=\"0 -1 0\" name=\"thigh_joint\" pos=\"0 0 1.05\" range=\"-150 0\" type=\"hinge\"/>\n        <geom friction=\"0.9\" fromto=\"0 0 1.05 0 0 0.6\" name=\"thigh_geom\" size=\"0.05\" type=\"capsule\"/>\n        <body name=\"leg\" pos=\"0 0 0.35\">\n          <joint axis=\"0 -1 0\" name=\"leg_joint\" pos=\"0 0 0.6\" range=\"-150 0\" type=\"hinge\"/>\n          <geom friction=\"0.9\" fromto=\"0 0 0.6 0 0 0.1\" name=\"leg_geom\" 
size=\"0.04\" type=\"capsule\"/>\n          <body name=\"foot\" pos=\"0.13 0 0\">\n            <joint axis=\"0 -1 0\" name=\"foot_joint\" pos=\"0 0 0.1\" range=\"-45 45\" type=\"hinge\"/>\n            <geom friction=\"2.0\" fromto=\"-0.13 0 0.1 0.26 0 0.1\" name=\"foot_geom\" size=\"0.06\" type=\"capsule\"/>\n          </body>\n        </body>\n      </body>\n    </body>\n  </worldbody>\n  <actuator>\n    <motor ctrllimited=\"true\" ctrlrange=\"-1.0 1.0\" gear=\"200.0\" joint=\"thigh_joint\"/>\n    <motor ctrllimited=\"true\" ctrlrange=\"-1.0 1.0\" gear=\"200.0\" joint=\"leg_joint\"/>\n    <motor ctrllimited=\"true\" ctrlrange=\"-1.0 1.0\" gear=\"200.0\" joint=\"foot_joint\"/>\n  </actuator>\n    <asset>\n        <texture type=\"skybox\" builtin=\"gradient\" rgb1=\".4 .5 .6\" rgb2=\"0 0 0\"\n            width=\"100\" height=\"100\"/>\n        <texture builtin=\"flat\" height=\"1278\" mark=\"cross\" markrgb=\"1 1 1\" name=\"texgeom\" random=\"0.01\" rgb1=\"0.8 0.6 0.4\" rgb2=\"0.8 0.6 0.4\" type=\"cube\" width=\"127\"/>\n        <texture builtin=\"checker\" height=\"100\" name=\"texplane\" rgb1=\"0 0 0\" rgb2=\"0.8 0.8 0.8\" type=\"2d\" width=\"100\"/>\n        <material name=\"MatPlane\" reflectance=\"0.5\" shininess=\"1\" specular=\"1\" texrepeat=\"60 60\" texture=\"texplane\"/>\n        <material name=\"geom\" texture=\"texgeom\" texuniform=\"true\"/>\n    </asset>\n</mujoco>\n"
  },
  {
    "path": "gym/envs/mujoco/assets/humanoid.xml",
    "content": "<mujoco model=\"humanoid\">\r\n    <compiler angle=\"degree\" inertiafromgeom=\"true\"/>\r\n    <default>\r\n        <joint armature=\"1\" damping=\"1\" limited=\"true\"/>\r\n        <geom conaffinity=\"1\" condim=\"1\" contype=\"1\" margin=\"0.001\" material=\"geom\" rgba=\"0.8 0.6 .4 1\"/>\r\n        <motor ctrllimited=\"true\" ctrlrange=\"-.4 .4\"/>\r\n    </default>\r\n    <option integrator=\"RK4\" iterations=\"50\" solver=\"PGS\" timestep=\"0.003\">\r\n        <!-- <flags solverstat=\"enable\" energy=\"enable\"/>-->\r\n    </option>\r\n    <size nkey=\"5\" nuser_geom=\"1\"/>\r\n    <visual>\r\n        <map fogend=\"5\" fogstart=\"3\"/>\r\n    </visual>\r\n    <asset>\r\n        <texture builtin=\"gradient\" height=\"100\" rgb1=\".4 .5 .6\" rgb2=\"0 0 0\" type=\"skybox\" width=\"100\"/>\r\n        <!-- <texture builtin=\"gradient\" height=\"100\" rgb1=\"1 1 1\" rgb2=\"0 0 0\" type=\"skybox\" width=\"100\"/>-->\r\n        <texture builtin=\"flat\" height=\"1278\" mark=\"cross\" markrgb=\"1 1 1\" name=\"texgeom\" random=\"0.01\" rgb1=\"0.8 0.6 0.4\" rgb2=\"0.8 0.6 0.4\" type=\"cube\" width=\"127\"/>\r\n        <texture builtin=\"checker\" height=\"100\" name=\"texplane\" rgb1=\"0 0 0\" rgb2=\"0.8 0.8 0.8\" type=\"2d\" width=\"100\"/>\r\n        <material name=\"MatPlane\" reflectance=\"0.5\" shininess=\"1\" specular=\"1\" texrepeat=\"60 60\" texture=\"texplane\"/>\r\n        <material name=\"geom\" texture=\"texgeom\" texuniform=\"true\"/>\r\n    </asset>\r\n    <worldbody>\r\n        <light cutoff=\"100\" diffuse=\"1 1 1\" dir=\"-0 0 -1.3\" directional=\"true\" exponent=\"1\" pos=\"0 0 1.3\" specular=\".1 .1 .1\"/>\r\n        <geom condim=\"3\" friction=\"1 .1 .1\" material=\"MatPlane\" name=\"floor\" pos=\"0 0 0\" rgba=\"0.8 0.9 0.8 1\" size=\"20 20 0.125\" type=\"plane\"/>\r\n        <!-- <geom condim=\"3\" material=\"MatPlane\" name=\"floor\" pos=\"0 0 0\" size=\"10 10 0.125\" type=\"plane\"/>-->\r\n        <body name=\"torso\" pos=\"0 0 
1.4\">\r\n            <camera name=\"track\" mode=\"trackcom\" pos=\"0 -4 0\" xyaxes=\"1 0 0 0 0 1\"/>\r\n            <joint armature=\"0\" damping=\"0\" limited=\"false\" name=\"root\" pos=\"0 0 0\" stiffness=\"0\" type=\"free\"/>\r\n            <geom fromto=\"0 -.07 0 0 .07 0\" name=\"torso1\" size=\"0.07\" type=\"capsule\"/>\r\n            <geom name=\"head\" pos=\"0 0 .19\" size=\".09\" type=\"sphere\" user=\"258\"/>\r\n            <geom fromto=\"-.01 -.06 -.12 -.01 .06 -.12\" name=\"uwaist\" size=\"0.06\" type=\"capsule\"/>\r\n            <body name=\"lwaist\" pos=\"-.01 0 -0.260\" quat=\"1.000 0 -0.002 0\">\r\n                <geom fromto=\"0 -.06 0 0 .06 0\" name=\"lwaist\" size=\"0.06\" type=\"capsule\"/>\r\n                <joint armature=\"0.02\" axis=\"0 0 1\" damping=\"5\" name=\"abdomen_z\" pos=\"0 0 0.065\" range=\"-45 45\" stiffness=\"20\" type=\"hinge\"/>\r\n                <joint armature=\"0.02\" axis=\"0 1 0\" damping=\"5\" name=\"abdomen_y\" pos=\"0 0 0.065\" range=\"-75 30\" stiffness=\"10\" type=\"hinge\"/>\r\n                <body name=\"pelvis\" pos=\"0 0 -0.165\" quat=\"1.000 0 -0.002 0\">\r\n                    <joint armature=\"0.02\" axis=\"1 0 0\" damping=\"5\" name=\"abdomen_x\" pos=\"0 0 0.1\" range=\"-35 35\" stiffness=\"10\" type=\"hinge\"/>\r\n                    <geom fromto=\"-.02 -.07 0 -.02 .07 0\" name=\"butt\" size=\"0.09\" type=\"capsule\"/>\r\n                    <body name=\"right_thigh\" pos=\"0 -0.1 -0.04\">\r\n                        <joint armature=\"0.01\" axis=\"1 0 0\" damping=\"5\" name=\"right_hip_x\" pos=\"0 0 0\" range=\"-25 5\" stiffness=\"10\" type=\"hinge\"/>\r\n                        <joint armature=\"0.01\" axis=\"0 0 1\" damping=\"5\" name=\"right_hip_z\" pos=\"0 0 0\" range=\"-60 35\" stiffness=\"10\" type=\"hinge\"/>\r\n                        <joint armature=\"0.0080\" axis=\"0 1 0\" damping=\"5\" name=\"right_hip_y\" pos=\"0 0 0\" range=\"-110 20\" stiffness=\"20\" type=\"hinge\"/>\r\n                 
       <geom fromto=\"0 0 0 0 0.01 -.34\" name=\"right_thigh1\" size=\"0.06\" type=\"capsule\"/>\r\n                        <body name=\"right_shin\" pos=\"0 0.01 -0.403\">\r\n                            <joint armature=\"0.0060\" axis=\"0 -1 0\" name=\"right_knee\" pos=\"0 0 .02\" range=\"-160 -2\" type=\"hinge\"/>\r\n                            <geom fromto=\"0 0 0 0 0 -.3\" name=\"right_shin1\" size=\"0.049\" type=\"capsule\"/>\r\n                            <body name=\"right_foot\" pos=\"0 0 -0.45\">\r\n                                <geom name=\"right_foot\" pos=\"0 0 0.1\" size=\"0.075\" type=\"sphere\" user=\"0\"/>\r\n                            </body>\r\n                        </body>\r\n                    </body>\r\n                    <body name=\"left_thigh\" pos=\"0 0.1 -0.04\">\r\n                        <joint armature=\"0.01\" axis=\"-1 0 0\" damping=\"5\" name=\"left_hip_x\" pos=\"0 0 0\" range=\"-25 5\" stiffness=\"10\" type=\"hinge\"/>\r\n                        <joint armature=\"0.01\" axis=\"0 0 -1\" damping=\"5\" name=\"left_hip_z\" pos=\"0 0 0\" range=\"-60 35\" stiffness=\"10\" type=\"hinge\"/>\r\n                        <joint armature=\"0.01\" axis=\"0 1 0\" damping=\"5\" name=\"left_hip_y\" pos=\"0 0 0\" range=\"-110 20\" stiffness=\"20\" type=\"hinge\"/>\r\n                        <geom fromto=\"0 0 0 0 -0.01 -.34\" name=\"left_thigh1\" size=\"0.06\" type=\"capsule\"/>\r\n                        <body name=\"left_shin\" pos=\"0 -0.01 -0.403\">\r\n                            <joint armature=\"0.0060\" axis=\"0 -1 0\" name=\"left_knee\" pos=\"0 0 .02\" range=\"-160 -2\" stiffness=\"1\" type=\"hinge\"/>\r\n                            <geom fromto=\"0 0 0 0 0 -.3\" name=\"left_shin1\" size=\"0.049\" type=\"capsule\"/>\r\n                            <body name=\"left_foot\" pos=\"0 0 -0.45\">\r\n                                <geom name=\"left_foot\" type=\"sphere\" size=\"0.075\" pos=\"0 0 0.1\" user=\"0\" />\r\n                         
   </body>\r\n                        </body>\r\n                    </body>\r\n                </body>\r\n            </body>\r\n            <body name=\"right_upper_arm\" pos=\"0 -0.17 0.06\">\r\n                <joint armature=\"0.0068\" axis=\"2 1 1\" name=\"right_shoulder1\" pos=\"0 0 0\" range=\"-85 60\" stiffness=\"1\" type=\"hinge\"/>\r\n                <joint armature=\"0.0051\" axis=\"0 -1 1\" name=\"right_shoulder2\" pos=\"0 0 0\" range=\"-85 60\" stiffness=\"1\" type=\"hinge\"/>\r\n                <geom fromto=\"0 0 0 .16 -.16 -.16\" name=\"right_uarm1\" size=\"0.04 0.16\" type=\"capsule\"/>\r\n                <body name=\"right_lower_arm\" pos=\".18 -.18 -.18\">\r\n                    <joint armature=\"0.0028\" axis=\"0 -1 1\" name=\"right_elbow\" pos=\"0 0 0\" range=\"-90 50\" stiffness=\"0\" type=\"hinge\"/>\r\n                    <geom fromto=\"0.01 0.01 0.01 .17 .17 .17\" name=\"right_larm\" size=\"0.031\" type=\"capsule\"/>\r\n                    <geom name=\"right_hand\" pos=\".18 .18 .18\" size=\"0.04\" type=\"sphere\"/>\r\n                    <camera pos=\"0 0 0\"/>\r\n                </body>\r\n            </body>\r\n            <body name=\"left_upper_arm\" pos=\"0 0.17 0.06\">\r\n                <joint armature=\"0.0068\" axis=\"2 -1 1\" name=\"left_shoulder1\" pos=\"0 0 0\" range=\"-60 85\" stiffness=\"1\" type=\"hinge\"/>\r\n                <joint armature=\"0.0051\" axis=\"0 1 1\" name=\"left_shoulder2\" pos=\"0 0 0\" range=\"-60 85\" stiffness=\"1\" type=\"hinge\"/>\r\n                <geom fromto=\"0 0 0 .16 .16 -.16\" name=\"left_uarm1\" size=\"0.04 0.16\" type=\"capsule\"/>\r\n                <body name=\"left_lower_arm\" pos=\".18 .18 -.18\">\r\n                    <joint armature=\"0.0028\" axis=\"0 -1 -1\" name=\"left_elbow\" pos=\"0 0 0\" range=\"-90 50\" stiffness=\"0\" type=\"hinge\"/>\r\n                    <geom fromto=\"0.01 -0.01 0.01 .17 -.17 .17\" name=\"left_larm\" size=\"0.031\" type=\"capsule\"/>\r\n                    
<geom name=\"left_hand\" pos=\".18 -.18 .18\" size=\"0.04\" type=\"sphere\"/>\r\n                </body>\r\n            </body>\r\n        </body>\r\n    </worldbody>\r\n    <tendon>\r\n        <fixed name=\"left_hipknee\">\r\n            <joint coef=\"-1\" joint=\"left_hip_y\"/>\r\n            <joint coef=\"1\" joint=\"left_knee\"/>\r\n        </fixed>\r\n        <fixed name=\"right_hipknee\">\r\n            <joint coef=\"-1\" joint=\"right_hip_y\"/>\r\n            <joint coef=\"1\" joint=\"right_knee\"/>\r\n        </fixed>\r\n    </tendon>\r\n\r\n    <actuator>\r\n        <motor gear=\"100\" joint=\"abdomen_y\" name=\"abdomen_y\"/>\r\n        <motor gear=\"100\" joint=\"abdomen_z\" name=\"abdomen_z\"/>\r\n        <motor gear=\"100\" joint=\"abdomen_x\" name=\"abdomen_x\"/>\r\n        <motor gear=\"100\" joint=\"right_hip_x\" name=\"right_hip_x\"/>\r\n        <motor gear=\"100\" joint=\"right_hip_z\" name=\"right_hip_z\"/>\r\n        <motor gear=\"300\" joint=\"right_hip_y\" name=\"right_hip_y\"/>\r\n        <motor gear=\"200\" joint=\"right_knee\" name=\"right_knee\"/>\r\n        <motor gear=\"100\" joint=\"left_hip_x\" name=\"left_hip_x\"/>\r\n        <motor gear=\"100\" joint=\"left_hip_z\" name=\"left_hip_z\"/>\r\n        <motor gear=\"300\" joint=\"left_hip_y\" name=\"left_hip_y\"/>\r\n        <motor gear=\"200\" joint=\"left_knee\" name=\"left_knee\"/>\r\n        <motor gear=\"25\" joint=\"right_shoulder1\" name=\"right_shoulder1\"/>\r\n        <motor gear=\"25\" joint=\"right_shoulder2\" name=\"right_shoulder2\"/>\r\n        <motor gear=\"25\" joint=\"right_elbow\" name=\"right_elbow\"/>\r\n        <motor gear=\"25\" joint=\"left_shoulder1\" name=\"left_shoulder1\"/>\r\n        <motor gear=\"25\" joint=\"left_shoulder2\" name=\"left_shoulder2\"/>\r\n        <motor gear=\"25\" joint=\"left_elbow\" name=\"left_elbow\"/>\r\n    </actuator>\r\n</mujoco>\r\n"
  },
  {
    "path": "gym/envs/mujoco/assets/humanoidstandup.xml",
    "content": "<mujoco model=\"humanoidstandup\">\r\n    <compiler angle=\"degree\" inertiafromgeom=\"true\"/>\r\n    <default>\r\n        <joint armature=\"1\" damping=\"1\" limited=\"true\"/>\r\n        <geom conaffinity=\"1\" condim=\"1\" contype=\"1\" margin=\"0.001\" material=\"geom\" rgba=\"0.8 0.6 .4 1\"/>\r\n        <motor ctrllimited=\"true\" ctrlrange=\"-.4 .4\"/>\r\n    </default>\r\n    <option integrator=\"RK4\" iterations=\"50\" solver=\"PGS\" timestep=\"0.003\">\r\n        <!-- <flags solverstat=\"enable\" energy=\"enable\"/>-->\r\n    </option>\r\n    <size nkey=\"5\" nuser_geom=\"1\"/>\r\n    <visual>\r\n        <map fogend=\"5\" fogstart=\"3\"/>\r\n    </visual>\r\n    <asset>\r\n        <texture builtin=\"gradient\" height=\"100\" rgb1=\".4 .5 .6\" rgb2=\"0 0 0\" type=\"skybox\" width=\"100\"/>\r\n        <!-- <texture builtin=\"gradient\" height=\"100\" rgb1=\"1 1 1\" rgb2=\"0 0 0\" type=\"skybox\" width=\"100\"/>-->\r\n        <texture builtin=\"flat\" height=\"1278\" mark=\"cross\" markrgb=\"1 1 1\" name=\"texgeom\" random=\"0.01\" rgb1=\"0.8 0.6 0.4\" rgb2=\"0.8 0.6 0.4\" type=\"cube\" width=\"127\"/>\r\n        <texture builtin=\"checker\" height=\"100\" name=\"texplane\" rgb1=\"0 0 0\" rgb2=\"0.8 0.8 0.8\" type=\"2d\" width=\"100\"/>\r\n        <material name=\"MatPlane\" reflectance=\"0.5\" shininess=\"1\" specular=\"1\" texrepeat=\"60 60\" texture=\"texplane\"/>\r\n        <material name=\"geom\" texture=\"texgeom\" texuniform=\"true\"/>\r\n    </asset>\r\n    <worldbody>\r\n        <light cutoff=\"100\" diffuse=\"1 1 1\" dir=\"-0 0 -1.3\" directional=\"true\" exponent=\"1\" pos=\"0 0 1.3\" specular=\".1 .1 .1\"/>\r\n        <geom condim=\"3\" friction=\"1 .1 .1\" material=\"MatPlane\" name=\"floor\" pos=\"0 0 0\" rgba=\"0.8 0.9 0.8 1\" size=\"20 20 0.125\" type=\"plane\"/>\r\n        <!-- <geom condim=\"3\" material=\"MatPlane\" name=\"floor\" pos=\"0 0 0\" size=\"10 10 0.125\" type=\"plane\"/>-->\r\n        <body name=\"torso\" pos=\"0 
0 .105\">\r\n            <camera name=\"track\" mode=\"trackcom\" pos=\"0 -3 .5\" xyaxes=\"1 0 0 0 0 1\"/>\r\n            <joint armature=\"0\" damping=\"0\" limited=\"false\" name=\"root\" pos=\"0 0 0\" stiffness=\"0\" type=\"free\"/>\r\n            <geom fromto=\"0 -.07 0 0 .07 0\" name=\"torso1\" size=\"0.07\" type=\"capsule\"/>\r\n            <geom name=\"head\" pos=\"-.15 0 0\" size=\".09\" type=\"sphere\" user=\"258\"/>\r\n            <geom fromto=\".11 -.06 0 .11 .06 0\" name=\"uwaist\" size=\"0.06\" type=\"capsule\"/>\r\n            <body name=\"lwaist\" pos=\".21 0 0\" quat=\"1.000 0 -0.002 0\">\r\n                <geom fromto=\"0 -.06 0 0 .06 0\" name=\"lwaist\" size=\"0.06\" type=\"capsule\"/>\r\n                <joint armature=\"0.02\" axis=\"0 0 1\" damping=\"5\" name=\"abdomen_z\" pos=\"0 0 0.065\" range=\"-45 45\" stiffness=\"20\" type=\"hinge\"/>\r\n                <joint armature=\"0.02\" axis=\"0 1 0\" damping=\"5\" name=\"abdomen_y\" pos=\"0 0 0.065\" range=\"-75 30\" stiffness=\"10\" type=\"hinge\"/>\r\n                <body name=\"pelvis\" pos=\"0.165 0 0\" quat=\"1.000 0 -0.002 0\">\r\n                    <joint armature=\"0.02\" axis=\"1 0 0\" damping=\"5\" name=\"abdomen_x\" pos=\"0 0 0.1\" range=\"-35 35\" stiffness=\"10\" type=\"hinge\"/>\r\n                    <geom fromto=\"-.02 -.07 0 -.02 .07 0\" name=\"butt\" size=\"0.09\" type=\"capsule\"/>\r\n                    <body name=\"right_thigh\" pos=\"0 -0.1 0\">\r\n                        <joint armature=\"0.01\" axis=\"1 0 0\" damping=\"5\" name=\"right_hip_x\" pos=\"0 0 0\" range=\"-25 5\" stiffness=\"10\" type=\"hinge\"/>\r\n                        <joint armature=\"0.01\" axis=\"0 0 1\" damping=\"5\" name=\"right_hip_z\" pos=\"0 0 0\" range=\"-60 35\" stiffness=\"10\" type=\"hinge\"/>\r\n                        <joint armature=\"0.0080\" axis=\"0 1 0\" damping=\"5\" name=\"right_hip_y\" pos=\"0 0 0\" range=\"-110 20\" stiffness=\"20\" type=\"hinge\"/>\r\n                        <geom 
fromto=\"0 0 0 0.34 0.01 0\" name=\"right_thigh1\" size=\"0.06\" type=\"capsule\"/>\r\n                        <body name=\"right_shin\" pos=\"0.403 0.01 0\">\r\n                            <joint armature=\"0.0060\" axis=\"0 -1 0\" name=\"right_knee\" pos=\"0 0 .02\" range=\"-160 -2\" type=\"hinge\"/>\r\n                            <geom fromto=\"0 0 0 0.3 0 0\" name=\"right_shin1\" size=\"0.049\" type=\"capsule\"/>\r\n                            <body name=\"right_foot\" pos=\"0.35 0 -.10\">\r\n                                <geom name=\"right_foot\" pos=\"0 0 0.1\" size=\"0.075\" type=\"sphere\" user=\"0\"/>\r\n                            </body>\r\n                        </body>\r\n                    </body>\r\n                    <body name=\"left_thigh\" pos=\"0 0.1 0\">\r\n                        <joint armature=\"0.01\" axis=\"-1 0 0\" damping=\"5\" name=\"left_hip_x\" pos=\"0 0 0\" range=\"-25 5\" stiffness=\"10\" type=\"hinge\"/>\r\n                        <joint armature=\"0.01\" axis=\"0 0 -1\" damping=\"5\" name=\"left_hip_z\" pos=\"0 0 0\" range=\"-60 35\" stiffness=\"10\" type=\"hinge\"/>\r\n                        <joint armature=\"0.01\" axis=\"0 1 0\" damping=\"5\" name=\"left_hip_y\" pos=\"0 0 0\" range=\"-120 20\" stiffness=\"20\" type=\"hinge\"/>\r\n                        <geom fromto=\"0 0 0 0.34 -0.01 0\" name=\"left_thigh1\" size=\"0.06\" type=\"capsule\"/>\r\n                        <body name=\"left_shin\" pos=\"0.403 -0.01 0\">\r\n                            <joint armature=\"0.0060\" axis=\"0 -1 0\" name=\"left_knee\" pos=\"0 0 .02\" range=\"-160 -2\" stiffness=\"1\" type=\"hinge\"/>\r\n                            <geom fromto=\"0 0 0 0.3 0 0\" name=\"left_shin1\" size=\"0.049\" type=\"capsule\"/>\r\n                            <body name=\"left_foot\" pos=\"0.35 0 -.1\">\r\n                                <geom name=\"left_foot\" type=\"sphere\" size=\"0.075\" pos=\"0 0 0.1\" user=\"0\" />\r\n                            </body>\r\n  
                      </body>\r\n                    </body>\r\n                </body>\r\n            </body>\r\n            <body name=\"right_upper_arm\" pos=\"0 -0.17 0.06\">\r\n                <joint armature=\"0.0068\" axis=\"2 1 1\" name=\"right_shoulder1\" pos=\"0 0 0\" range=\"-85 60\" stiffness=\"1\" type=\"hinge\"/>\r\n                <joint armature=\"0.0051\" axis=\"0 -1 1\" name=\"right_shoulder2\" pos=\"0 0 0\" range=\"-85 60\" stiffness=\"1\" type=\"hinge\"/>\r\n                <geom fromto=\"0 0 0 .16 -.16 -.16\" name=\"right_uarm1\" size=\"0.04 0.16\" type=\"capsule\"/>\r\n                <body name=\"right_lower_arm\" pos=\".18 -.18 -.18\">\r\n                    <joint armature=\"0.0028\" axis=\"0 -1 1\" name=\"right_elbow\" pos=\"0 0 0\" range=\"-90 50\" stiffness=\"0\" type=\"hinge\"/>\r\n                    <geom fromto=\"0.01 0.01 0.01 .17 .17 .17\" name=\"right_larm\" size=\"0.031\" type=\"capsule\"/>\r\n                    <geom name=\"right_hand\" pos=\".18 .18 .18\" size=\"0.04\" type=\"sphere\"/>\r\n                    <camera pos=\"0 0 0\"/>\r\n                </body>\r\n            </body>\r\n            <body name=\"left_upper_arm\" pos=\"0 0.17 0.06\">\r\n                <joint armature=\"0.0068\" axis=\"2 -1 1\" name=\"left_shoulder1\" pos=\"0 0 0\" range=\"-60 85\" stiffness=\"1\" type=\"hinge\"/>\r\n                <joint armature=\"0.0051\" axis=\"0 1 1\" name=\"left_shoulder2\" pos=\"0 0 0\" range=\"-60 85\" stiffness=\"1\" type=\"hinge\"/>\r\n                <geom fromto=\"0 0 0 .16 .16 -.16\" name=\"left_uarm1\" size=\"0.04 0.16\" type=\"capsule\"/>\r\n                <body name=\"left_lower_arm\" pos=\".18 .18 -.18\">\r\n                    <joint armature=\"0.0028\" axis=\"0 -1 -1\" name=\"left_elbow\" pos=\"0 0 0\" range=\"-90 50\" stiffness=\"0\" type=\"hinge\"/>\r\n                    <geom fromto=\"0.01 -0.01 0.01 .17 -.17 .17\" name=\"left_larm\" size=\"0.031\" type=\"capsule\"/>\r\n                    <geom 
name=\"left_hand\" pos=\".18 -.18 .18\" size=\"0.04\" type=\"sphere\"/>\r\n                </body>\r\n            </body>\r\n        </body>\r\n    </worldbody>\r\n    <tendon>\r\n        <fixed name=\"left_hipknee\">\r\n            <joint coef=\"-1\" joint=\"left_hip_y\"/>\r\n            <joint coef=\"1\" joint=\"left_knee\"/>\r\n        </fixed>\r\n        <fixed name=\"right_hipknee\">\r\n            <joint coef=\"-1\" joint=\"right_hip_y\"/>\r\n            <joint coef=\"1\" joint=\"right_knee\"/>\r\n        </fixed>\r\n    </tendon>\r\n\r\n    <actuator>\r\n        <motor gear=\"100\" joint=\"abdomen_y\" name=\"abdomen_y\"/>\r\n        <motor gear=\"100\" joint=\"abdomen_z\" name=\"abdomen_z\"/>\r\n        <motor gear=\"100\" joint=\"abdomen_x\" name=\"abdomen_x\"/>\r\n        <motor gear=\"100\" joint=\"right_hip_x\" name=\"right_hip_x\"/>\r\n        <motor gear=\"100\" joint=\"right_hip_z\" name=\"right_hip_z\"/>\r\n        <motor gear=\"300\" joint=\"right_hip_y\" name=\"right_hip_y\"/>\r\n        <motor gear=\"200\" joint=\"right_knee\" name=\"right_knee\"/>\r\n        <motor gear=\"100\" joint=\"left_hip_x\" name=\"left_hip_x\"/>\r\n        <motor gear=\"100\" joint=\"left_hip_z\" name=\"left_hip_z\"/>\r\n        <motor gear=\"300\" joint=\"left_hip_y\" name=\"left_hip_y\"/>\r\n        <motor gear=\"200\" joint=\"left_knee\" name=\"left_knee\"/>\r\n        <motor gear=\"25\" joint=\"right_shoulder1\" name=\"right_shoulder1\"/>\r\n        <motor gear=\"25\" joint=\"right_shoulder2\" name=\"right_shoulder2\"/>\r\n        <motor gear=\"25\" joint=\"right_elbow\" name=\"right_elbow\"/>\r\n        <motor gear=\"25\" joint=\"left_shoulder1\" name=\"left_shoulder1\"/>\r\n        <motor gear=\"25\" joint=\"left_shoulder2\" name=\"left_shoulder2\"/>\r\n        <motor gear=\"25\" joint=\"left_elbow\" name=\"left_elbow\"/>\r\n    </actuator>\r\n</mujoco>\r\n"
  },
  {
    "path": "gym/envs/mujoco/assets/inverted_double_pendulum.xml",
    "content": "<!-- Cartpole Model\n\n    The state space is populated with joints in the order that they are\n    defined in this file. The actuators also operate on joints.\n\n    State-Space (name/joint/parameter):\n        - cart      slider      position (m)\n        - pole      hinge       angle (rad)\n        - cart      slider      velocity (m/s)\n        - pole      hinge       angular velocity (rad/s)\n\n    Actuators (name/actuator/parameter):\n        - cart      motor       force x (N)\n\n-->\n<mujoco model=\"cartpole\">\n  <compiler coordinate=\"local\" inertiafromgeom=\"true\"/>\n  <custom>\n    <numeric data=\"2\" name=\"frame_skip\"/>\n  </custom>\n  <default>\n    <joint damping=\"0.05\"/>\n    <geom contype=\"0\" friction=\"1 0.1 0.1\" rgba=\"0.7 0.7 0 1\"/>\n  </default>\n  <option gravity=\"1e-5 0 -9.81\" integrator=\"RK4\" timestep=\"0.01\"/>\n  <size nstack=\"3000\"/>\n  <worldbody>\n    <geom name=\"floor\" pos=\"0 0 -3.0\" rgba=\"0.8 0.9 0.8 1\" size=\"40 40 40\" type=\"plane\"/>\n    <geom name=\"rail\" pos=\"0 0 0\" quat=\"0.707 0 0.707 0\" rgba=\"0.3 0.3 0.7 1\" size=\"0.02 1\" type=\"capsule\"/>\n    <body name=\"cart\" pos=\"0 0 0\">\n      <joint axis=\"1 0 0\" limited=\"true\" margin=\"0.01\" name=\"slider\" pos=\"0 0 0\" range=\"-1 1\" type=\"slide\"/>\n      <geom name=\"cart\" pos=\"0 0 0\" quat=\"0.707 0 0.707 0\" size=\"0.1 0.1\" type=\"capsule\"/>\n      <body name=\"pole\" pos=\"0 0 0\">\n        <joint axis=\"0 1 0\" name=\"hinge\" pos=\"0 0 0\" type=\"hinge\"/>\n        <geom fromto=\"0 0 0 0 0 0.6\" name=\"cpole\" rgba=\"0 0.7 0.7 1\" size=\"0.045 0.3\" type=\"capsule\"/>\n        <body name=\"pole2\" pos=\"0 0 0.6\">\n          <joint axis=\"0 1 0\" name=\"hinge2\" pos=\"0 0 0\" type=\"hinge\"/>\n          <geom fromto=\"0 0 0 0 0 0.6\" name=\"cpole2\" rgba=\"0 0.7 0.7 1\" size=\"0.045 0.3\" type=\"capsule\"/>\n          <site name=\"tip\" pos=\"0 0 .6\" size=\"0.01 0.01\"/>\n        </body>\n      </body>\n    </body>\n  
</worldbody>\n  <actuator>\n    <motor ctrllimited=\"true\" ctrlrange=\"-1 1\" gear=\"500\" joint=\"slider\" name=\"slide\"/>\n  </actuator>\n</mujoco>"
  },
  {
    "path": "gym/envs/mujoco/assets/inverted_pendulum.xml",
    "content": "<mujoco model=\"inverted pendulum\">\r\n\t<compiler inertiafromgeom=\"true\"/>\r\n\t<default>\r\n\t\t<joint armature=\"0\" damping=\"1\" limited=\"true\"/>\r\n\t\t<geom contype=\"0\" friction=\"1 0.1 0.1\" rgba=\"0.7 0.7 0 1\"/>\r\n\t\t<tendon/>\r\n\t\t<motor ctrlrange=\"-3 3\"/>\r\n\t</default>\r\n\t<option gravity=\"0 0 -9.81\" integrator=\"RK4\" timestep=\"0.02\"/>\r\n\t<size nstack=\"3000\"/>\r\n\t<worldbody>\r\n\t\t<!--geom name=\"ground\" type=\"plane\" pos=\"0 0 0\" /-->\r\n\t\t<geom name=\"rail\" pos=\"0 0 0\" quat=\"0.707 0 0.707 0\" rgba=\"0.3 0.3 0.7 1\" size=\"0.02 1\" type=\"capsule\"/>\r\n\t\t<body name=\"cart\" pos=\"0 0 0\">\r\n\t\t\t<joint axis=\"1 0 0\" limited=\"true\" name=\"slider\" pos=\"0 0 0\" range=\"-1 1\" type=\"slide\"/>\r\n\t\t\t<geom name=\"cart\" pos=\"0 0 0\" quat=\"0.707 0 0.707 0\" size=\"0.1 0.1\" type=\"capsule\"/>\r\n\t\t\t<body name=\"pole\" pos=\"0 0 0\">\r\n\t\t\t\t<joint axis=\"0 1 0\" name=\"hinge\" pos=\"0 0 0\" range=\"-90 90\" type=\"hinge\"/>\r\n\t\t\t\t<geom fromto=\"0 0 0 0.001 0 0.6\" name=\"cpole\" rgba=\"0 0.7 0.7 1\" size=\"0.049 0.3\" type=\"capsule\"/>\r\n\t\t\t\t<!--                 <body name=\"pole2\" pos=\"0.001 0 0.6\"><joint name=\"hinge2\" type=\"hinge\" pos=\"0 0 0\" axis=\"0 1 0\"/><geom name=\"cpole2\" type=\"capsule\" fromto=\"0 0 0 0 0 0.6\" size=\"0.05 0.3\" rgba=\"0.7 0 0.7 1\"/><site name=\"tip2\" pos=\"0 0 .6\"/></body>-->\r\n\t\t\t</body>\r\n\t\t</body>\r\n\t</worldbody>\r\n\t<actuator>\r\n\t\t<motor ctrllimited=\"true\" ctrlrange=\"-3 3\" gear=\"100\" joint=\"slider\" name=\"slide\"/>\r\n\t</actuator>\r\n</mujoco>"
  },
  {
    "path": "gym/envs/mujoco/assets/point.xml",
    "content": "<mujoco>\n  <compiler angle=\"degree\" coordinate=\"local\" inertiafromgeom=\"true\"/>\n  <option integrator=\"RK4\" timestep=\"0.02\"/>\n  <default>\n    <joint armature=\"0\" damping=\"0\" limited=\"false\"/>\n    <geom conaffinity=\"0\" condim=\"3\" density=\"100\" friction=\"1 0.5 0.5\" margin=\"0\" rgba=\"0.8 0.6 0.4 1\"/>\n  </default>\n  <asset>\n    <texture builtin=\"gradient\" height=\"100\" rgb1=\"1 1 1\" rgb2=\"0 0 0\" type=\"skybox\" width=\"100\"/>\n    <texture builtin=\"flat\" height=\"1278\" mark=\"cross\" markrgb=\"1 1 1\" name=\"texgeom\" random=\"0.01\" rgb1=\"0.8 0.6 0.4\" rgb2=\"0.8 0.6 0.4\" type=\"cube\" width=\"127\"/>\n    <texture builtin=\"checker\" height=\"100\" name=\"texplane\" rgb1=\"0 0 0\" rgb2=\"0.8 0.8 0.8\" type=\"2d\" width=\"100\"/>\n    <material name=\"MatPlane\" reflectance=\"0.5\" shininess=\"1\" specular=\"1\" texrepeat=\"30 30\" texture=\"texplane\"/>\n    <material name=\"geom\" texture=\"texgeom\" texuniform=\"true\"/>\n  </asset>\n  <worldbody>\n    <light cutoff=\"100\" diffuse=\"1 1 1\" dir=\"-0 0 -1.3\" directional=\"true\" exponent=\"1\" pos=\"0 0 1.3\" specular=\".1 .1 .1\"/>\n    <geom conaffinity=\"1\" condim=\"3\" material=\"MatPlane\" name=\"floor\" pos=\"0 0 0\" rgba=\"0.8 0.9 0.8 1\" size=\"40 40 40\" type=\"plane\"/>\n    <body name=\"torso\" pos=\"0 0 0\">\n      <geom name=\"pointbody\" pos=\"0 0 0.5\" size=\"0.5\" type=\"sphere\"/>\n      <geom name=\"pointarrow\" pos=\"0.6 0 0.5\" size=\"0.5 0.1 0.1\" type=\"box\"/>\n      <joint axis=\"1 0 0\" name=\"ballx\" pos=\"0 0 0\" type=\"slide\"/>\n      <joint axis=\"0 1 0\" name=\"bally\" pos=\"0 0 0\" type=\"slide\"/>\n      <joint axis=\"0 0 1\" limited=\"false\" name=\"rot\" pos=\"0 0 0\" type=\"hinge\"/>\n    </body>\n  </worldbody>\n  <actuator>\n    <!-- Those are just dummy actuators for providing ranges -->\n    <motor ctrllimited=\"true\" ctrlrange=\"-1 1\" joint=\"ballx\"/>\n    <motor ctrllimited=\"true\" ctrlrange=\"-0.25 0.25\" 
joint=\"rot\"/>\n  </actuator>\n</mujoco>\n"
  },
  {
    "path": "gym/envs/mujoco/assets/pusher.xml",
    "content": "<mujoco model=\"arm3d\">\n  <compiler inertiafromgeom=\"true\" angle=\"radian\" coordinate=\"local\"/>\n  <option timestep=\"0.01\" gravity=\"0 0 0\" iterations=\"20\" integrator=\"Euler\" />\n  \n  <default>\n    <joint armature='0.04' damping=\"1\" limited=\"true\"/>\n    <geom friction=\".8 .1 .1\" density=\"300\" margin=\"0.002\" condim=\"1\" contype=\"0\" conaffinity=\"0\"/>\n  </default>\n  \n  <worldbody>\n    <light diffuse=\".5 .5 .5\" pos=\"0 0 3\" dir=\"0 0 -1\"/>\n    <geom name=\"table\" type=\"plane\" pos=\"0 0.5 -0.325\" size=\"1 1 0.1\" contype=\"1\" conaffinity=\"1\"/>\n\n    <body name=\"r_shoulder_pan_link\" pos=\"0 -0.6 0\">\n      <geom name=\"e1\" type=\"sphere\" rgba=\"0.6 0.6 0.6 1\" pos=\"-0.06 0.05 0.2\" size=\"0.05\" />\n      <geom name=\"e2\" type=\"sphere\" rgba=\"0.6 0.6 0.6 1\" pos=\" 0.06 0.05 0.2\" size=\"0.05\" />\n      <geom name=\"e1p\" type=\"sphere\" rgba=\"0.1 0.1 0.1 1\" pos=\"-0.06 0.09 0.2\" size=\"0.03\" />\n      <geom name=\"e2p\" type=\"sphere\" rgba=\"0.1 0.1 0.1 1\" pos=\" 0.06 0.09 0.2\" size=\"0.03\" />\n      <geom name=\"sp\" type=\"capsule\" fromto=\"0 0 -0.4 0 0 0.2\" size=\"0.1\" />\n      <joint name=\"r_shoulder_pan_joint\" type=\"hinge\" pos=\"0 0 0\" axis=\"0 0 1\" range=\"-2.2854 1.714602\" damping=\"1.0\" />\n\n      <body name=\"r_shoulder_lift_link\" pos=\"0.1 0 0\">\n        <geom name=\"sl\" type=\"capsule\" fromto=\"0 -0.1 0 0 0.1 0\" size=\"0.1\" />\n        <joint name=\"r_shoulder_lift_joint\" type=\"hinge\" pos=\"0 0 0\" axis=\"0 1 0\" range=\"-0.5236 1.3963\" damping=\"1.0\" />\n\n        <body name=\"r_upper_arm_roll_link\" pos=\"0 0 0\">\n          <geom name=\"uar\" type=\"capsule\" fromto=\"-0.1 0 0 0.1 0 0\" size=\"0.02\" />\n          <joint name=\"r_upper_arm_roll_joint\" type=\"hinge\" pos=\"0 0 0\" axis=\"1 0 0\" range=\"-1.5 1.7\" damping=\"0.1\" />\n\n          <body name=\"r_upper_arm_link\" pos=\"0 0 0\">\n            <geom name=\"ua\" type=\"capsule\" fromto=\"0 0 
0 0.4 0 0\" size=\"0.06\" />\n\n            <body name=\"r_elbow_flex_link\" pos=\"0.4 0 0\">\n              <geom name=\"ef\" type=\"capsule\" fromto=\"0 -0.02 0 0.0 0.02 0\" size=\"0.06\" />\n              <joint name=\"r_elbow_flex_joint\" type=\"hinge\" pos=\"0 0 0\" axis=\"0 1 0\" range=\"-2.3213 0\" damping=\"0.1\" />\n\n              <body name=\"r_forearm_roll_link\" pos=\"0 0 0\">\n                <geom name=\"fr\" type=\"capsule\" fromto=\"-0.1 0 0 0.1 0 0\" size=\"0.02\" />\n                <joint name=\"r_forearm_roll_joint\" type=\"hinge\" limited=\"true\" pos=\"0 0 0\" axis=\"1 0 0\" damping=\".1\" range=\"-1.5 1.5\"/>\n\n                <body name=\"r_forearm_link\" pos=\"0 0 0\">\n                  <geom name=\"fa\" type=\"capsule\" fromto=\"0 0 0 0.291 0 0\" size=\"0.05\" />\n\n                  <body name=\"r_wrist_flex_link\" pos=\"0.321 0 0\">\n                    <geom name=\"wf\" type=\"capsule\" fromto=\"0 -0.02 0 0 0.02 0\" size=\"0.01\" />\n                    <joint name=\"r_wrist_flex_joint\" type=\"hinge\" pos=\"0 0 0\" axis=\"0 1 0\" range=\"-1.094 0\" damping=\".1\" />\n\n                    <body name=\"r_wrist_roll_link\" pos=\"0 0 0\">\n                      <joint name=\"r_wrist_roll_joint\" type=\"hinge\" pos=\"0 0 0\" limited=\"true\" axis=\"1 0 0\" damping=\"0.1\" range=\"-1.5 1.5\"/>\n                      <body name=\"tips_arm\" pos=\"0 0 0\">\n                        <geom name=\"tip_arml\" type=\"sphere\" pos=\"0.1 -0.1 0.\" size=\"0.01\" />\n                        <geom name=\"tip_armr\" type=\"sphere\" pos=\"0.1 0.1 0.\" size=\"0.01\" />\n                      </body>\n                      <geom type=\"capsule\" fromto=\"0 -0.1 0. 0.0 +0.1 0\" size=\"0.02\" contype=\"1\" conaffinity=\"1\" />\n                      <geom type=\"capsule\" fromto=\"0 -0.1 0. 0.1 -0.1 0\" size=\"0.02\" contype=\"1\" conaffinity=\"1\" />\n                      <geom type=\"capsule\" fromto=\"0 +0.1 0. 
0.1 +0.1 0.\" size=\"0.02\" contype=\"1\" conaffinity=\"1\" />\n                    </body>\n                  </body>\n                </body>\n              </body>\n            </body>\n          </body>\n        </body>\n      </body>\n    </body>\n\n    <!--<body name=\"object\" pos=\"0.55 -0.3 -0.275\" >-->\n    <body name=\"object\" pos=\"0.45 -0.05 -0.275\" >\n      <geom rgba=\"1 1 1 0\" type=\"sphere\" size=\"0.05 0.05 0.05\" density=\"0.00001\" conaffinity=\"0\"/>\n      <geom rgba=\"1 1 1 1\" type=\"cylinder\" size=\"0.05 0.05 0.05\" density=\"0.00001\" contype=\"1\" conaffinity=\"0\"/>\n      <joint name=\"obj_slidey\" type=\"slide\" pos=\"0 0 0\" axis=\"0 1 0\" range=\"-10.3213 10.3\" damping=\"0.5\"/>\n      <joint name=\"obj_slidex\" type=\"slide\" pos=\"0 0 0\" axis=\"1 0 0\" range=\"-10.3213 10.3\" damping=\"0.5\"/>\n    </body>\n\n    <body name=\"goal\" pos=\"0.45 -0.05 -0.3230\">\n      <geom rgba=\"1 0 0 1\" type=\"cylinder\" size=\"0.08 0.001 0.1\" density='0.00001' contype=\"0\" conaffinity=\"0\"/>\n      <joint name=\"goal_slidey\" type=\"slide\" pos=\"0 0 0\" axis=\"0 1 0\" range=\"-10.3213 10.3\" damping=\"0.5\"/>\n      <joint name=\"goal_slidex\" type=\"slide\" pos=\"0 0 0\" axis=\"1 0 0\" range=\"-10.3213 10.3\" damping=\"0.5\"/> \n    </body>\n  </worldbody>\n\n  <actuator>\n    <motor joint=\"r_shoulder_pan_joint\" ctrlrange=\"-2.0 2.0\" ctrllimited=\"true\" />\n    <motor joint=\"r_shoulder_lift_joint\" ctrlrange=\"-2.0 2.0\" ctrllimited=\"true\" />\n    <motor joint=\"r_upper_arm_roll_joint\" ctrlrange=\"-2.0 2.0\" ctrllimited=\"true\" />\n    <motor joint=\"r_elbow_flex_joint\" ctrlrange=\"-2.0 2.0\" ctrllimited=\"true\" />\n    <motor joint=\"r_forearm_roll_joint\" ctrlrange=\"-2.0 2.0\" ctrllimited=\"true\" />\n    <motor joint=\"r_wrist_flex_joint\" ctrlrange=\"-2.0 2.0\" ctrllimited=\"true\" />\n    <motor joint=\"r_wrist_roll_joint\" ctrlrange=\"-2.0 2.0\" ctrllimited=\"true\"/>\n  </actuator>\n</mujoco>\n"
  },
  {
    "path": "gym/envs/mujoco/assets/reacher.xml",
    "content": "<mujoco model=\"reacher\">\r\n\t<compiler angle=\"radian\" inertiafromgeom=\"true\"/>\r\n\t<default>\r\n\t\t<joint armature=\"1\" damping=\"1\" limited=\"true\"/>\r\n\t\t<geom contype=\"0\" friction=\"1 0.1 0.1\" rgba=\"0.7 0.7 0 1\"/>\r\n\t</default>\r\n\t<option gravity=\"0 0 -9.81\" integrator=\"RK4\" timestep=\"0.01\"/>\r\n\t<worldbody>\r\n\t\t<!-- Arena -->\r\n\t\t<geom conaffinity=\"0\" contype=\"0\" name=\"ground\" pos=\"0 0 0\" rgba=\"0.9 0.9 0.9 1\" size=\"1 1 10\" type=\"plane\"/>\r\n\t\t<geom conaffinity=\"0\" fromto=\"-.3 -.3 .01 .3 -.3 .01\" name=\"sideS\" rgba=\"0.9 0.4 0.6 1\" size=\".02\" type=\"capsule\"/>\r\n\t\t<geom conaffinity=\"0\" fromto=\" .3 -.3 .01 .3  .3 .01\" name=\"sideE\" rgba=\"0.9 0.4 0.6 1\" size=\".02\" type=\"capsule\"/>\r\n\t\t<geom conaffinity=\"0\" fromto=\"-.3  .3 .01 .3  .3 .01\" name=\"sideN\" rgba=\"0.9 0.4 0.6 1\" size=\".02\" type=\"capsule\"/>\r\n\t\t<geom conaffinity=\"0\" fromto=\"-.3 -.3 .01 -.3 .3 .01\" name=\"sideW\" rgba=\"0.9 0.4 0.6 1\" size=\".02\" type=\"capsule\"/>\r\n\t\t<!-- Arm -->\r\n\t\t<geom conaffinity=\"0\" contype=\"0\" fromto=\"0 0 0 0 0 0.02\" name=\"root\" rgba=\"0.9 0.4 0.6 1\" size=\".011\" type=\"cylinder\"/>\r\n\t\t<body name=\"body0\" pos=\"0 0 .01\">\r\n\t\t\t<geom fromto=\"0 0 0 0.1 0 0\" name=\"link0\" rgba=\"0.0 0.4 0.6 1\" size=\".01\" type=\"capsule\"/>\r\n\t\t\t<joint axis=\"0 0 1\" limited=\"false\" name=\"joint0\" pos=\"0 0 0\" type=\"hinge\"/>\r\n\t\t\t<body name=\"body1\" pos=\"0.1 0 0\">\r\n\t\t\t\t<joint axis=\"0 0 1\" limited=\"true\" name=\"joint1\" pos=\"0 0 0\" range=\"-3.0 3.0\" type=\"hinge\"/>\r\n\t\t\t\t<geom fromto=\"0 0 0 0.1 0 0\" name=\"link1\" rgba=\"0.0 0.4 0.6 1\" size=\".01\" type=\"capsule\"/>\r\n\t\t\t\t<body name=\"fingertip\" pos=\"0.11 0 0\">\r\n\t\t\t\t\t<geom contype=\"0\" name=\"fingertip\" pos=\"0 0 0\" rgba=\"0.0 0.8 0.6 1\" size=\".01\" type=\"sphere\"/>\r\n\t\t\t\t</body>\r\n\t\t\t</body>\r\n\t\t</body>\r\n\t\t<!-- Target 
-->\r\n\t\t<body name=\"target\" pos=\".1 -.1 .01\">\r\n\t\t\t<joint armature=\"0\" axis=\"1 0 0\" damping=\"0\" limited=\"true\" name=\"target_x\" pos=\"0 0 0\" range=\"-.27 .27\" ref=\".1\" stiffness=\"0\" type=\"slide\"/>\r\n\t\t\t<joint armature=\"0\" axis=\"0 1 0\" damping=\"0\" limited=\"true\" name=\"target_y\" pos=\"0 0 0\" range=\"-.27 .27\" ref=\"-.1\" stiffness=\"0\" type=\"slide\"/>\r\n\t\t\t<geom conaffinity=\"0\" contype=\"0\" name=\"target\" pos=\"0 0 0\" rgba=\"0.9 0.2 0.2 1\" size=\".009\" type=\"sphere\"/>\r\n\t\t</body>\r\n\t</worldbody>\r\n\t<actuator>\r\n\t\t<motor ctrllimited=\"true\" ctrlrange=\"-1.0 1.0\" gear=\"200.0\" joint=\"joint0\"/>\r\n\t\t<motor ctrllimited=\"true\" ctrlrange=\"-1.0 1.0\" gear=\"200.0\" joint=\"joint1\"/>\r\n\t</actuator>\r\n</mujoco>\r\n"
  },
  {
    "path": "gym/envs/mujoco/assets/swimmer.xml",
    "content": "<mujoco model=\"swimmer\">\n  <compiler angle=\"degree\" coordinate=\"local\" inertiafromgeom=\"true\"/>\n  <option collision=\"predefined\" density=\"4000\" integrator=\"RK4\" timestep=\"0.01\" viscosity=\"0.1\"/>\n  <default>\n    <geom conaffinity=\"1\" condim=\"1\" contype=\"1\" material=\"geom\" rgba=\"0.8 0.6 .4 1\"/>\n    <joint armature='0.1'  />\n  </default>\n  <asset>\n    <texture builtin=\"gradient\" height=\"100\" rgb1=\"1 1 1\" rgb2=\"0 0 0\" type=\"skybox\" width=\"100\"/>\n    <texture builtin=\"flat\" height=\"1278\" mark=\"cross\" markrgb=\"1 1 1\" name=\"texgeom\" random=\"0.01\" rgb1=\"0.8 0.6 0.4\" rgb2=\"0.8 0.6 0.4\" type=\"cube\" width=\"127\"/>\n    <texture builtin=\"checker\" height=\"100\" name=\"texplane\" rgb1=\"0 0 0\" rgb2=\"0.8 0.8 0.8\" type=\"2d\" width=\"100\"/>\n    <material name=\"MatPlane\" reflectance=\"0.5\" shininess=\"1\" specular=\"1\" texrepeat=\"30 30\" texture=\"texplane\"/>\n    <material name=\"geom\" texture=\"texgeom\" texuniform=\"true\"/>\n  </asset>\n  <worldbody>\n    <light cutoff=\"100\" diffuse=\"1 1 1\" dir=\"-0 0 -1.3\" directional=\"true\" exponent=\"1\" pos=\"0 0 1.3\" specular=\".1 .1 .1\"/>\n    <geom conaffinity=\"1\" condim=\"3\" material=\"MatPlane\" name=\"floor\" pos=\"0 0 -0.1\" rgba=\"0.8 0.9 0.8 1\" size=\"40 40 0.1\" type=\"plane\"/>\n    <!--  ================= SWIMMER ================= /-->\n    <body name=\"torso\" pos=\"0 0 0\">\n      <camera name=\"track\" mode=\"trackcom\" pos=\"0 -3 3\" xyaxes=\"1 0 0 0 1 1\"/>\n      <geom density=\"1000\" fromto=\"1.5 0 0 0.5 0 0\" size=\"0.1\" type=\"capsule\"/>\n      <joint axis=\"1 0 0\" name=\"slider1\" pos=\"0 0 0\" type=\"slide\"/>\n      <joint axis=\"0 1 0\" name=\"slider2\" pos=\"0 0 0\" type=\"slide\"/>\n      <joint axis=\"0 0 1\" name=\"free_body_rot\" pos=\"0 0 0\" type=\"hinge\"/>\n      <body name=\"mid\" pos=\"0.5 0 0\">\n        <geom density=\"1000\" fromto=\"0 0 0 -1 0 0\" size=\"0.1\" type=\"capsule\"/>\n        
<joint axis=\"0 0 1\" limited=\"true\" name=\"motor1_rot\" pos=\"0 0 0\" range=\"-100 100\" type=\"hinge\"/>\n        <body name=\"back\" pos=\"-1 0 0\">\n          <geom density=\"1000\" fromto=\"0 0 0 -1 0 0\" size=\"0.1\" type=\"capsule\"/>\n          <joint axis=\"0 0 1\" limited=\"true\" name=\"motor2_rot\" pos=\"0 0 0\" range=\"-100 100\" type=\"hinge\"/>\n        </body>\n      </body>\n    </body>\n  </worldbody>\n  <actuator>\n    <motor ctrllimited=\"true\" ctrlrange=\"-1 1\" gear=\"150.0\" joint=\"motor1_rot\"/>\n    <motor ctrllimited=\"true\" ctrlrange=\"-1 1\" gear=\"150.0\" joint=\"motor2_rot\"/>\n  </actuator>\n</mujoco>\n"
  },
  {
    "path": "gym/envs/mujoco/assets/walker2d.xml",
    "content": "<mujoco model=\"walker2d\">\n  <compiler angle=\"degree\" coordinate=\"global\" inertiafromgeom=\"true\"/>\n  <default>\n    <joint armature=\"0.01\" damping=\".1\" limited=\"true\"/>\n    <geom conaffinity=\"0\" condim=\"3\" contype=\"1\" density=\"1000\" friction=\".7 .1 .1\" rgba=\"0.8 0.6 .4 1\"/>\n  </default>\n  <option integrator=\"RK4\" timestep=\"0.002\"/>\n  <worldbody>\n    <light cutoff=\"100\" diffuse=\"1 1 1\" dir=\"-0 0 -1.3\" directional=\"true\" exponent=\"1\" pos=\"0 0 1.3\" specular=\".1 .1 .1\"/>\n    <geom conaffinity=\"1\" condim=\"3\" name=\"floor\" pos=\"0 0 0\" rgba=\"0.8 0.9 0.8 1\" size=\"40 40 40\" type=\"plane\" material=\"MatPlane\"/>\n    <body name=\"torso\" pos=\"0 0 1.25\">\n      <camera name=\"track\" mode=\"trackcom\" pos=\"0 -3 1\" xyaxes=\"1 0 0 0 0 1\"/>\n      <joint armature=\"0\" axis=\"1 0 0\" damping=\"0\" limited=\"false\" name=\"rootx\" pos=\"0 0 0\" stiffness=\"0\" type=\"slide\"/>\n      <joint armature=\"0\" axis=\"0 0 1\" damping=\"0\" limited=\"false\" name=\"rootz\" pos=\"0 0 0\" ref=\"1.25\" stiffness=\"0\" type=\"slide\"/>\n      <joint armature=\"0\" axis=\"0 1 0\" damping=\"0\" limited=\"false\" name=\"rooty\" pos=\"0 0 1.25\" stiffness=\"0\" type=\"hinge\"/>\n      <geom friction=\"0.9\" fromto=\"0 0 1.45 0 0 1.05\" name=\"torso_geom\" size=\"0.05\" type=\"capsule\"/>\n      <body name=\"thigh\" pos=\"0 0 1.05\">\n        <joint axis=\"0 -1 0\" name=\"thigh_joint\" pos=\"0 0 1.05\" range=\"-150 0\" type=\"hinge\"/>\n        <geom friction=\"0.9\" fromto=\"0 0 1.05 0 0 0.6\" name=\"thigh_geom\" size=\"0.05\" type=\"capsule\"/>\n        <body name=\"leg\" pos=\"0 0 0.35\">\n          <joint axis=\"0 -1 0\" name=\"leg_joint\" pos=\"0 0 0.6\" range=\"-150 0\" type=\"hinge\"/>\n          <geom friction=\"0.9\" fromto=\"0 0 0.6 0 0 0.1\" name=\"leg_geom\" size=\"0.04\" type=\"capsule\"/>\n          <body name=\"foot\" pos=\"0.2 0 0\">\n            <joint axis=\"0 -1 0\" name=\"foot_joint\" pos=\"0 
0 0.1\" range=\"-45 45\" type=\"hinge\"/>\n            <geom friction=\"0.9\" fromto=\"-0.0 0 0.1 0.2 0 0.1\" name=\"foot_geom\" size=\"0.06\" type=\"capsule\"/>\n          </body>\n        </body>\n      </body>\n      <!-- copied, then renamed thigh->thigh_left, leg->leg_left, foot->foot_left -->\n      <body name=\"thigh_left\" pos=\"0 0 1.05\">\n        <joint axis=\"0 -1 0\" name=\"thigh_left_joint\" pos=\"0 0 1.05\" range=\"-150 0\" type=\"hinge\"/>\n        <geom friction=\"0.9\" fromto=\"0 0 1.05 0 0 0.6\" name=\"thigh_left_geom\" rgba=\".7 .3 .6 1\" size=\"0.05\" type=\"capsule\"/>\n        <body name=\"leg_left\" pos=\"0 0 0.35\">\n          <joint axis=\"0 -1 0\" name=\"leg_left_joint\" pos=\"0 0 0.6\" range=\"-150 0\" type=\"hinge\"/>\n          <geom friction=\"0.9\" fromto=\"0 0 0.6 0 0 0.1\" name=\"leg_left_geom\" rgba=\".7 .3 .6 1\" size=\"0.04\" type=\"capsule\"/>\n          <body name=\"foot_left\" pos=\"0.2 0 0\">\n            <joint axis=\"0 -1 0\" name=\"foot_left_joint\" pos=\"0 0 0.1\" range=\"-45 45\" type=\"hinge\"/>\n            <geom friction=\"1.9\" fromto=\"-0.0 0 0.1 0.2 0 0.1\" name=\"foot_left_geom\" rgba=\".7 .3 .6 1\" size=\"0.06\" type=\"capsule\"/>\n          </body>\n        </body>\n      </body>\n    </body>\n  </worldbody>\n  <actuator>\n    <!-- <motor joint=\"torso_joint\" ctrlrange=\"-100.0 100.0\" isctrllimited=\"true\"/>-->\n    <motor ctrllimited=\"true\" ctrlrange=\"-1.0 1.0\" gear=\"100\" joint=\"thigh_joint\"/>\n    <motor ctrllimited=\"true\" ctrlrange=\"-1.0 1.0\" gear=\"100\" joint=\"leg_joint\"/>\n    <motor ctrllimited=\"true\" ctrlrange=\"-1.0 1.0\" gear=\"100\" joint=\"foot_joint\"/>\n    <motor ctrllimited=\"true\" ctrlrange=\"-1.0 1.0\" gear=\"100\" joint=\"thigh_left_joint\"/>\n    <motor ctrllimited=\"true\" ctrlrange=\"-1.0 1.0\" gear=\"100\" joint=\"leg_left_joint\"/>\n    <motor ctrllimited=\"true\" ctrlrange=\"-1.0 1.0\" gear=\"100\" joint=\"foot_left_joint\"/>\n    <!-- <motor 
joint=\"finger2_rot\" ctrlrange=\"-20.0 20.0\" isctrllimited=\"true\"/>-->\n  </actuator>\n    <asset>\n        <texture type=\"skybox\" builtin=\"gradient\" rgb1=\".4 .5 .6\" rgb2=\"0 0 0\"\n            width=\"100\" height=\"100\"/>\n        <texture builtin=\"flat\" height=\"1278\" mark=\"cross\" markrgb=\"1 1 1\" name=\"texgeom\" random=\"0.01\" rgb1=\"0.8 0.6 0.4\" rgb2=\"0.8 0.6 0.4\" type=\"cube\" width=\"127\"/>\n        <texture builtin=\"checker\" height=\"100\" name=\"texplane\" rgb1=\"0 0 0\" rgb2=\"0.8 0.8 0.8\" type=\"2d\" width=\"100\"/>\n        <material name=\"MatPlane\" reflectance=\"0.5\" shininess=\"1\" specular=\"1\" texrepeat=\"60 60\" texture=\"texplane\"/>\n        <material name=\"geom\" texture=\"texgeom\" texuniform=\"true\"/>\n    </asset>\n</mujoco>\n"
  },
  {
    "path": "gym/envs/mujoco/half_cheetah.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MuJocoPyEnv\nfrom gym.spaces import Box\n\n\nclass HalfCheetahEnv(MuJocoPyEnv, utils.EzPickle):\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 20,\n    }\n\n    def __init__(self, **kwargs):\n        observation_space = Box(low=-np.inf, high=np.inf, shape=(17,), dtype=np.float64)\n        MuJocoPyEnv.__init__(\n            self, \"half_cheetah.xml\", 5, observation_space=observation_space, **kwargs\n        )\n        utils.EzPickle.__init__(self, **kwargs)\n\n    def step(self, action):\n        xposbefore = self.sim.data.qpos[0]\n        self.do_simulation(action, self.frame_skip)\n        xposafter = self.sim.data.qpos[0]\n\n        ob = self._get_obs()\n        reward_ctrl = -0.1 * np.square(action).sum()\n        reward_run = (xposafter - xposbefore) / self.dt\n        reward = reward_ctrl + reward_run\n        terminated = False\n\n        if self.render_mode == \"human\":\n            self.render()\n        return (\n            ob,\n            reward,\n            terminated,\n            False,\n            dict(reward_run=reward_run, reward_ctrl=reward_ctrl),\n        )\n\n    def _get_obs(self):\n        return np.concatenate(\n            [\n                self.sim.data.qpos.flat[1:],\n                self.sim.data.qvel.flat,\n            ]\n        )\n\n    def reset_model(self):\n        qpos = self.init_qpos + self.np_random.uniform(\n            low=-0.1, high=0.1, size=self.model.nq\n        )\n        qvel = self.init_qvel + self.np_random.standard_normal(self.model.nv) * 0.1\n        self.set_state(qpos, qvel)\n        return self._get_obs()\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        self.viewer.cam.distance = self.model.stat.extent * 0.5\n"
  },
  {
    "path": "gym/envs/mujoco/half_cheetah_v3.py",
    "content": "__credits__ = [\"Rushiv Arora\"]\n\nimport numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MuJocoPyEnv\nfrom gym.spaces import Box\n\nDEFAULT_CAMERA_CONFIG = {\n    \"distance\": 4.0,\n}\n\n\nclass HalfCheetahEnv(MuJocoPyEnv, utils.EzPickle):\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 20,\n    }\n\n    def __init__(\n        self,\n        xml_file=\"half_cheetah.xml\",\n        forward_reward_weight=1.0,\n        ctrl_cost_weight=0.1,\n        reset_noise_scale=0.1,\n        exclude_current_positions_from_observation=True,\n        **kwargs\n    ):\n        utils.EzPickle.__init__(\n            self,\n            xml_file,\n            forward_reward_weight,\n            ctrl_cost_weight,\n            reset_noise_scale,\n            exclude_current_positions_from_observation,\n            **kwargs\n        )\n\n        self._forward_reward_weight = forward_reward_weight\n\n        self._ctrl_cost_weight = ctrl_cost_weight\n\n        self._reset_noise_scale = reset_noise_scale\n\n        self._exclude_current_positions_from_observation = (\n            exclude_current_positions_from_observation\n        )\n\n        if exclude_current_positions_from_observation:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(17,), dtype=np.float64\n            )\n        else:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64\n            )\n\n        MuJocoPyEnv.__init__(\n            self, xml_file, 5, observation_space=observation_space, **kwargs\n        )\n\n    def control_cost(self, action):\n        control_cost = self._ctrl_cost_weight * np.sum(np.square(action))\n        return control_cost\n\n    def step(self, action):\n        x_position_before = self.sim.data.qpos[0]\n        self.do_simulation(action, 
self.frame_skip)\n        x_position_after = self.sim.data.qpos[0]\n        x_velocity = (x_position_after - x_position_before) / self.dt\n\n        ctrl_cost = self.control_cost(action)\n\n        forward_reward = self._forward_reward_weight * x_velocity\n\n        observation = self._get_obs()\n        reward = forward_reward - ctrl_cost\n        terminated = False\n        info = {\n            \"x_position\": x_position_after,\n            \"x_velocity\": x_velocity,\n            \"reward_run\": forward_reward,\n            \"reward_ctrl\": -ctrl_cost,\n        }\n\n        if self.render_mode == \"human\":\n            self.render()\n        return observation, reward, terminated, False, info\n\n    def _get_obs(self):\n        position = self.sim.data.qpos.flat.copy()\n        velocity = self.sim.data.qvel.flat.copy()\n\n        if self._exclude_current_positions_from_observation:\n            position = position[1:]\n\n        observation = np.concatenate((position, velocity)).ravel()\n        return observation\n\n    def reset_model(self):\n        noise_low = -self._reset_noise_scale\n        noise_high = self._reset_noise_scale\n\n        qpos = self.init_qpos + self.np_random.uniform(\n            low=noise_low, high=noise_high, size=self.model.nq\n        )\n        qvel = (\n            self.init_qvel\n            + self._reset_noise_scale * self.np_random.standard_normal(self.model.nv)\n        )\n\n        self.set_state(qpos, qvel)\n\n        observation = self._get_obs()\n        return observation\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        for key, value in DEFAULT_CAMERA_CONFIG.items():\n            if isinstance(value, np.ndarray):\n                getattr(self.viewer.cam, key)[:] = value\n            else:\n                setattr(self.viewer.cam, key, value)\n"
  },
  {
    "path": "gym/envs/mujoco/half_cheetah_v4.py",
    "content": "__credits__ = [\"Rushiv Arora\"]\n\nimport numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MujocoEnv\nfrom gym.spaces import Box\n\nDEFAULT_CAMERA_CONFIG = {\n    \"distance\": 4.0,\n}\n\n\nclass HalfCheetahEnv(MujocoEnv, utils.EzPickle):\n    \"\"\"\n    ### Description\n\n    This environment is based on the work by P. Wawrzyński in\n    [\"A Cat-Like Robot Real-Time Learning to Run\"](http://staff.elka.pw.edu.pl/~pwawrzyn/pub-s/0812_LSCLRR.pdf).\n    The HalfCheetah is a 2-dimensional robot consisting of 9 links and 8\n    joints connecting them (including two paws). The goal is to apply a torque\n    on the joints to make the cheetah run forward (right) as fast as possible,\n    with a positive reward allocated based on the distance moved forward and a\n    negative reward allocated for moving backward. The torso and head of the\n    cheetah are fixed, and the torque can only be applied on the other 6 joints\n    over the front and back thighs (connecting to the torso), shins\n    (connecting to the thighs) and feet (connecting to the shins).\n\n    ### Action Space\n    The action space is a `Box(-1, 1, (6,), float32)`. 
An action represents the torques applied between *links*.\n\n    | Num | Action                                  | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit         |\n    | --- | --------------------------------------- | ----------- | ----------- | -------------------------------- | ----- | ------------ |\n    | 0   | Torque applied on the back thigh rotor  | -1          | 1           | bthigh                           | hinge | torque (N m) |\n    | 1   | Torque applied on the back shin rotor   | -1          | 1           | bshin                            | hinge | torque (N m) |\n    | 2   | Torque applied on the back foot rotor   | -1          | 1           | bfoot                            | hinge | torque (N m) |\n    | 3   | Torque applied on the front thigh rotor | -1          | 1           | fthigh                           | hinge | torque (N m) |\n    | 4   | Torque applied on the front shin rotor  | -1          | 1           | fshin                            | hinge | torque (N m) |\n    | 5   | Torque applied on the front foot rotor  | -1          | 1           | ffoot                            | hinge | torque (N m) |\n\n\n    ### Observation Space\n\n    Observations consist of positional values of different body parts of the\n    cheetah, followed by the velocities of those individual parts (their derivatives) with all the positions ordered before all the velocities.\n\n    By default, observations do not include the x-coordinate of the cheetah's center of mass. 
It may\n    be included by passing `exclude_current_positions_from_observation=False` during construction.\n    In that case, the observation space will have 18 dimensions where the first dimension\n    represents the x-coordinate of the cheetah's center of mass.\n    Regardless of whether `exclude_current_positions_from_observation` was set to true or false, the x-coordinate\n    will be returned in `info` with key `\"x_position\"`.\n\n    By default, the observation is an `ndarray` with shape `(17,)` whose elements correspond to the following:\n\n\n    | Num | Observation                          | Min  | Max | Name (in corresponding XML file) | Joint | Unit                     |\n    | --- | ------------------------------------ | ---- | --- | -------------------------------- | ----- | ------------------------ |\n    | 0   | z-coordinate of the front tip        | -Inf | Inf | rootz                            | slide | position (m)             |\n    | 1   | angle of the front tip               | -Inf | Inf | rooty                            | hinge | angle (rad)              |\n    | 2   | angle of the back thigh              | -Inf | Inf | bthigh                           | hinge | angle (rad)              |\n    | 3   | angle of the back shin               | -Inf | Inf | bshin                            | hinge | angle (rad)              |\n    | 4   | angle of the back foot               | -Inf | Inf | bfoot                            | hinge | angle (rad)              |\n    | 5   | angle of the front thigh             | -Inf | Inf | fthigh                           | hinge | angle (rad)              |\n    | 6   | angle of the front shin              | -Inf | Inf | fshin                            | hinge | angle (rad)              |\n    | 7   | angle of the front foot              | -Inf | Inf | ffoot                            | hinge | angle (rad)              |\n    | 8   | velocity of the tip along the x-axis | -Inf | Inf | rootx                            | slide | velocity (m/s)           |\n    | 9   | velocity of the tip along the z-axis | -Inf | Inf | rootz                            | slide | velocity (m/s)           |\n    | 10  | angular velocity of the front tip    | -Inf | Inf | rooty                            | hinge | angular velocity (rad/s) |\n    | 11  | angular velocity of the back thigh   | -Inf | Inf | bthigh                           | hinge | angular velocity (rad/s) |\n    | 12  | angular velocity of the back shin    | -Inf | Inf | bshin                            | hinge | angular velocity (rad/s) |\n    | 13  | angular velocity of the back foot    | -Inf | Inf | bfoot                            | hinge | angular velocity (rad/s) |\n    | 14  | angular velocity of the front thigh  | -Inf | Inf | fthigh                           | hinge | angular velocity (rad/s) |\n    | 15  | angular velocity of the front shin   | -Inf | Inf | fshin                            | hinge | angular velocity (rad/s) |\n    | 16  | angular velocity of the front foot   | -Inf | Inf | ffoot                            | hinge | angular velocity (rad/s) |\n\n    ### Rewards\n    The reward consists of two parts:\n    - *forward_reward*: A reward for moving forward, measured\n    as *`forward_reward_weight` * (x-coordinate after action - x-coordinate before action)/dt*. *dt* is\n    the time between actions and is dependent on the frame_skip parameter\n    (fixed to 5), where the frametime is 0.01 - making the\n    default *dt = 5 * 0.01 = 0.05*. This reward would be positive if the cheetah\n    runs forward (right).\n    - *ctrl_cost*: A cost for penalising the cheetah if it takes\n    actions that are too large. 
It is measured as *`ctrl_cost_weight` *\n    sum(action<sup>2</sup>)* where *`ctrl_cost_weight`* is a parameter set for the\n    control and has a default value of 0.1\n\n    The total reward returned is ***reward*** *=* *forward_reward - ctrl_cost* and `info` will also contain the individual reward terms\n\n    ### Starting State\n    All observations start in state (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,\n    0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,) with a noise added to the\n    initial state for stochasticity. As seen before, the first 8 values in the\n    state are positional and the last 9 values are velocity. A uniform noise in\n    the range of [-`reset_noise_scale`, `reset_noise_scale`] is added to the positional values while a standard\n    normal noise with a mean of 0 and standard deviation of `reset_noise_scale` is added to the\n    initial velocity values of all zeros.\n\n    ### Episode End\n    The episode truncates when the episode length is greater than 1000.\n\n    ### Arguments\n\n    No additional arguments are currently supported in v2 and lower.\n\n    ```\n    env = gym.make('HalfCheetah-v2')\n    ```\n\n    v3 and v4 take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.\n\n    ```\n    env = gym.make('HalfCheetah-v4', ctrl_cost_weight=0.1, ....)\n    ```\n\n    | Parameter                                    | Type      | Default              | Description                                                                                                                                                       |\n    | -------------------------------------------- | --------- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n    | `xml_file`                                   | **str**   | `\"half_cheetah.xml\"` | Path to a MuJoCo model                                               
                                                                                             |\n    | `forward_reward_weight`                      | **float** | `1.0`                | Weight for _forward_reward_ term (see section on reward)                                                                                                          |\n    | `ctrl_cost_weight`                           | **float** | `0.1`                | Weight for _ctrl_cost_ term (see section on reward)                                                                                                               |\n    | `reset_noise_scale`                          | **float** | `0.1`                | Scale of random perturbations of initial position and velocity (see section on Starting State)                                                                    |\n    | `exclude_current_positions_from_observation` | **bool**  | `True`               | Whether or not to omit the x-coordinate from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies |\n\n    ### Version History\n\n    * v4: All MuJoCo environments now use the MuJoCo bindings in mujoco >= 2.1.3\n    * v3: Support for `gym.make` kwargs such as `xml_file`, `ctrl_cost_weight`, `reset_noise_scale`, etc. RGB rendering comes from the tracking camera (so the agent does not run away from the screen)\n    * v2: All continuous control environments now use mujoco_py >= 1.50\n    * v1: max_time_steps raised to 1000 for robot based tasks. 
Added reward_threshold to environments.\n    * v0: Initial versions release (1.0.0)\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 20,\n    }\n\n    def __init__(\n        self,\n        forward_reward_weight=1.0,\n        ctrl_cost_weight=0.1,\n        reset_noise_scale=0.1,\n        exclude_current_positions_from_observation=True,\n        **kwargs\n    ):\n        utils.EzPickle.__init__(\n            self,\n            forward_reward_weight,\n            ctrl_cost_weight,\n            reset_noise_scale,\n            exclude_current_positions_from_observation,\n            **kwargs\n        )\n\n        self._forward_reward_weight = forward_reward_weight\n\n        self._ctrl_cost_weight = ctrl_cost_weight\n\n        self._reset_noise_scale = reset_noise_scale\n\n        self._exclude_current_positions_from_observation = (\n            exclude_current_positions_from_observation\n        )\n\n        if exclude_current_positions_from_observation:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(17,), dtype=np.float64\n            )\n        else:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64\n            )\n\n        MujocoEnv.__init__(\n            self, \"half_cheetah.xml\", 5, observation_space=observation_space, **kwargs\n        )\n\n    def control_cost(self, action):\n        control_cost = self._ctrl_cost_weight * np.sum(np.square(action))\n        return control_cost\n\n    def step(self, action):\n        x_position_before = self.data.qpos[0]\n        self.do_simulation(action, self.frame_skip)\n        x_position_after = self.data.qpos[0]\n        x_velocity = (x_position_after - x_position_before) / self.dt\n\n        ctrl_cost = self.control_cost(action)\n\n        forward_reward = self._forward_reward_weight * 
x_velocity\n\n        observation = self._get_obs()\n        reward = forward_reward - ctrl_cost\n        terminated = False\n        info = {\n            \"x_position\": x_position_after,\n            \"x_velocity\": x_velocity,\n            \"reward_run\": forward_reward,\n            \"reward_ctrl\": -ctrl_cost,\n        }\n\n        if self.render_mode == \"human\":\n            self.render()\n        return observation, reward, terminated, False, info\n\n    def _get_obs(self):\n        position = self.data.qpos.flat.copy()\n        velocity = self.data.qvel.flat.copy()\n\n        if self._exclude_current_positions_from_observation:\n            position = position[1:]\n\n        observation = np.concatenate((position, velocity)).ravel()\n        return observation\n\n    def reset_model(self):\n        noise_low = -self._reset_noise_scale\n        noise_high = self._reset_noise_scale\n\n        qpos = self.init_qpos + self.np_random.uniform(\n            low=noise_low, high=noise_high, size=self.model.nq\n        )\n        qvel = (\n            self.init_qvel\n            + self._reset_noise_scale * self.np_random.standard_normal(self.model.nv)\n        )\n\n        self.set_state(qpos, qvel)\n\n        observation = self._get_obs()\n        return observation\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        for key, value in DEFAULT_CAMERA_CONFIG.items():\n            if isinstance(value, np.ndarray):\n                getattr(self.viewer.cam, key)[:] = value\n            else:\n                setattr(self.viewer.cam, key, value)\n"
  },
  {
    "path": "gym/envs/mujoco/hopper.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MuJocoPyEnv\nfrom gym.spaces import Box\n\n\nclass HopperEnv(MuJocoPyEnv, utils.EzPickle):\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 125,\n    }\n\n    def __init__(self, **kwargs):\n        observation_space = Box(low=-np.inf, high=np.inf, shape=(11,), dtype=np.float64)\n        MuJocoPyEnv.__init__(\n            self, \"hopper.xml\", 4, observation_space=observation_space, **kwargs\n        )\n        utils.EzPickle.__init__(self, **kwargs)\n\n    def step(self, a):\n        posbefore = self.sim.data.qpos[0]\n        self.do_simulation(a, self.frame_skip)\n        posafter, height, ang = self.sim.data.qpos[0:3]\n\n        alive_bonus = 1.0\n        reward = (posafter - posbefore) / self.dt\n        reward += alive_bonus\n        reward -= 1e-3 * np.square(a).sum()\n        s = self.state_vector()\n        terminated = not (\n            np.isfinite(s).all()\n            and (np.abs(s[2:]) < 100).all()\n            and (height > 0.7)\n            and (abs(ang) < 0.2)\n        )\n        ob = self._get_obs()\n\n        if self.render_mode == \"human\":\n            self.render()\n        return ob, reward, terminated, False, {}\n\n    def _get_obs(self):\n        return np.concatenate(\n            [self.sim.data.qpos.flat[1:], np.clip(self.sim.data.qvel.flat, -10, 10)]\n        )\n\n    def reset_model(self):\n        qpos = self.init_qpos + self.np_random.uniform(\n            low=-0.005, high=0.005, size=self.model.nq\n        )\n        qvel = self.init_qvel + self.np_random.uniform(\n            low=-0.005, high=0.005, size=self.model.nv\n        )\n        self.set_state(qpos, qvel)\n        return self._get_obs()\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        self.viewer.cam.trackbodyid = 2\n        
self.viewer.cam.distance = self.model.stat.extent * 0.75\n        self.viewer.cam.lookat[2] = 1.15\n        self.viewer.cam.elevation = -20\n"
  },
  {
    "path": "gym/envs/mujoco/hopper_v3.py",
    "content": "__credits__ = [\"Rushiv Arora\"]\n\nimport numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MuJocoPyEnv\nfrom gym.spaces import Box\n\nDEFAULT_CAMERA_CONFIG = {\n    \"trackbodyid\": 2,\n    \"distance\": 3.0,\n    \"lookat\": np.array((0.0, 0.0, 1.15)),\n    \"elevation\": -20.0,\n}\n\n\nclass HopperEnv(MuJocoPyEnv, utils.EzPickle):\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 125,\n    }\n\n    def __init__(\n        self,\n        xml_file=\"hopper.xml\",\n        forward_reward_weight=1.0,\n        ctrl_cost_weight=1e-3,\n        healthy_reward=1.0,\n        terminate_when_unhealthy=True,\n        healthy_state_range=(-100.0, 100.0),\n        healthy_z_range=(0.7, float(\"inf\")),\n        healthy_angle_range=(-0.2, 0.2),\n        reset_noise_scale=5e-3,\n        exclude_current_positions_from_observation=True,\n        **kwargs\n    ):\n        utils.EzPickle.__init__(\n            self,\n            xml_file,\n            forward_reward_weight,\n            ctrl_cost_weight,\n            healthy_reward,\n            terminate_when_unhealthy,\n            healthy_state_range,\n            healthy_z_range,\n            healthy_angle_range,\n            reset_noise_scale,\n            exclude_current_positions_from_observation,\n            **kwargs\n        )\n\n        self._forward_reward_weight = forward_reward_weight\n\n        self._ctrl_cost_weight = ctrl_cost_weight\n\n        self._healthy_reward = healthy_reward\n        self._terminate_when_unhealthy = terminate_when_unhealthy\n\n        self._healthy_state_range = healthy_state_range\n        self._healthy_z_range = healthy_z_range\n        self._healthy_angle_range = healthy_angle_range\n\n        self._reset_noise_scale = reset_noise_scale\n\n        self._exclude_current_positions_from_observation = (\n            
exclude_current_positions_from_observation\n        )\n\n        if exclude_current_positions_from_observation:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(11,), dtype=np.float64\n            )\n        else:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(12,), dtype=np.float64\n            )\n\n        MuJocoPyEnv.__init__(\n            self, xml_file, 4, observation_space=observation_space, **kwargs\n        )\n\n    @property\n    def healthy_reward(self):\n        return (\n            float(self.is_healthy or self._terminate_when_unhealthy)\n            * self._healthy_reward\n        )\n\n    def control_cost(self, action):\n        control_cost = self._ctrl_cost_weight * np.sum(np.square(action))\n        return control_cost\n\n    @property\n    def is_healthy(self):\n        z, angle = self.sim.data.qpos[1:3]\n        state = self.state_vector()[2:]\n\n        min_state, max_state = self._healthy_state_range\n        min_z, max_z = self._healthy_z_range\n        min_angle, max_angle = self._healthy_angle_range\n\n        healthy_state = np.all(np.logical_and(min_state < state, state < max_state))\n        healthy_z = min_z < z < max_z\n        healthy_angle = min_angle < angle < max_angle\n\n        is_healthy = all((healthy_state, healthy_z, healthy_angle))\n\n        return is_healthy\n\n    @property\n    def terminated(self):\n        terminated = not self.is_healthy if self._terminate_when_unhealthy else False\n        return terminated\n\n    def _get_obs(self):\n        position = self.sim.data.qpos.flat.copy()\n        velocity = np.clip(self.sim.data.qvel.flat.copy(), -10, 10)\n\n        if self._exclude_current_positions_from_observation:\n            position = position[1:]\n\n        observation = np.concatenate((position, velocity)).ravel()\n        return observation\n\n    def step(self, action):\n        x_position_before = self.sim.data.qpos[0]\n        
self.do_simulation(action, self.frame_skip)\n        x_position_after = self.sim.data.qpos[0]\n        x_velocity = (x_position_after - x_position_before) / self.dt\n\n        ctrl_cost = self.control_cost(action)\n\n        forward_reward = self._forward_reward_weight * x_velocity\n        healthy_reward = self.healthy_reward\n\n        rewards = forward_reward + healthy_reward\n        costs = ctrl_cost\n\n        observation = self._get_obs()\n        reward = rewards - costs\n        terminated = self.terminated\n        info = {\n            \"x_position\": x_position_after,\n            \"x_velocity\": x_velocity,\n        }\n\n        if self.render_mode == \"human\":\n            self.render()\n        return observation, reward, terminated, False, info\n\n    def reset_model(self):\n        noise_low = -self._reset_noise_scale\n        noise_high = self._reset_noise_scale\n\n        qpos = self.init_qpos + self.np_random.uniform(\n            low=noise_low, high=noise_high, size=self.model.nq\n        )\n        qvel = self.init_qvel + self.np_random.uniform(\n            low=noise_low, high=noise_high, size=self.model.nv\n        )\n\n        self.set_state(qpos, qvel)\n\n        observation = self._get_obs()\n        return observation\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        for key, value in DEFAULT_CAMERA_CONFIG.items():\n            if isinstance(value, np.ndarray):\n                getattr(self.viewer.cam, key)[:] = value\n            else:\n                setattr(self.viewer.cam, key, value)\n"
  },
  {
    "path": "gym/envs/mujoco/hopper_v4.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MujocoEnv\nfrom gym.spaces import Box\n\nDEFAULT_CAMERA_CONFIG = {\n    \"trackbodyid\": 2,\n    \"distance\": 3.0,\n    \"lookat\": np.array((0.0, 0.0, 1.15)),\n    \"elevation\": -20.0,\n}\n\n\nclass HopperEnv(MujocoEnv, utils.EzPickle):\n    \"\"\"\n    ### Description\n\n    This environment is based on the work done by Erez, Tassa, and Todorov in\n    [\"Infinite Horizon Model Predictive Control for Nonlinear Periodic Tasks\"](http://www.roboticsproceedings.org/rss07/p10.pdf). The environment aims to\n    increase the number of independent state and control variables as compared to\n    the classic control environments. The hopper is a two-dimensional\n    one-legged figure that consist of four main body parts - the torso at the\n    top, the thigh in the middle, the leg in the bottom, and a single foot on\n    which the entire body rests. The goal is to make hops that move in the\n    forward (right) direction by applying torques on the three hinges\n    connecting the four body parts.\n\n    ### Action Space\n    The action space is a `Box(-1, 1, (3,), float32)`. 
An action represents the torques applied between *links*.\n\n    | Num | Action                             | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit         |\n    |-----|------------------------------------|-------------|-------------|----------------------------------|-------|--------------|\n    | 0   | Torque applied on the thigh rotor  | -1          | 1           | thigh_joint                      | hinge | torque (N m) |\n    | 1   | Torque applied on the leg rotor    | -1          | 1           | leg_joint                        | hinge | torque (N m) |\n    | 2   | Torque applied on the foot rotor   | -1          | 1           | foot_joint                       | hinge | torque (N m) |\n\n    ### Observation Space\n\n    Observations consist of positional values of different body parts of the\n    hopper, followed by the velocities of those individual parts\n    (their derivatives) with all the positions ordered before all the velocities.\n\n    By default, observations do not include the x-coordinate of the hopper. 
It may\n    be included by passing `exclude_current_positions_from_observation=False` during construction.\n    In that case, the observation space will have 12 dimensions where the first dimension\n    represents the x-coordinate of the hopper.\n    Regardless of whether `exclude_current_positions_from_observation` was set to true or false, the x-coordinate\n    will be returned in `info` with key `\"x_position\"`.\n\n    However, by default, the observation is a `ndarray` with shape `(11,)` where the elements\n    correspond to the following:\n\n    | Num | Observation                                      | Min  | Max | Name (in corresponding XML file) | Joint | Unit                     |\n    | --- | ------------------------------------------------ | ---- | --- | -------------------------------- | ----- | ------------------------ |\n    | 0   | z-coordinate of the top (height of hopper)       | -Inf | Inf | rootz                            | slide | position (m)             |\n    | 1   | angle of the top                                 | -Inf | Inf | rooty                            | hinge | angle (rad)              |\n    | 2   | angle of the thigh joint                         | -Inf | Inf | thigh_joint                      | hinge | angle (rad)              |\n    | 3   | angle of the leg joint                           | -Inf | Inf | leg_joint                        | hinge | angle (rad)              |\n    | 4   | angle of the foot joint                          | -Inf | Inf | foot_joint                       | hinge | angle (rad)              |\n    | 5   | velocity of the x-coordinate of the top          | -Inf | Inf | rootx                            | slide | velocity (m/s)           |\n    | 6   | velocity of the z-coordinate (height) of the top | -Inf | Inf | rootz                            | slide | velocity (m/s)           |\n    | 7   | angular velocity of the angle of the top         | -Inf | Inf | rooty                            | hinge | 
angular velocity (rad/s) |\n    | 8   | angular velocity of the thigh hinge              | -Inf | Inf | thigh_joint                      | hinge | angular velocity (rad/s) |\n    | 9   | angular velocity of the leg hinge                | -Inf | Inf | leg_joint                        | hinge | angular velocity (rad/s) |\n    | 10  | angular velocity of the foot hinge               | -Inf | Inf | foot_joint                       | hinge | angular velocity (rad/s) |\n\n\n    ### Rewards\n    The reward consists of three parts:\n    - *healthy_reward*: Every timestep that the hopper is healthy (see definition in section \"Episode End\"), it gets a reward of fixed value `healthy_reward`.\n    - *forward_reward*: A reward of hopping forward which is measured\n    as *`forward_reward_weight` * (x-coordinate after action - x-coordinate before action)/dt*. *dt* is\n    the time between actions and is dependent on the frame_skip parameter\n    (fixed to 4), where the frametime is 0.002 - making the\n    default *dt = 4 * 0.002 = 0.008*. This reward would be positive if the hopper\n    hops forward (positive x direction).\n    - *ctrl_cost*: A cost for penalising the hopper if it takes\n    actions that are too large. It is measured as *`ctrl_cost_weight` *\n    sum(action<sup>2</sup>)* where *`ctrl_cost_weight`* is a parameter set for the\n    control and has a default value of 0.001.\n\n    The total reward returned is ***reward*** *=* *healthy_reward + forward_reward - ctrl_cost*, and `info` contains the keys `x_position` and `x_velocity`.\n\n    ### Starting State\n    All observations start in state\n    (0.0, 1.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) with a uniform noise\n     in the range of [-`reset_noise_scale`, `reset_noise_scale`] added to the values for stochasticity.\n\n    ### Episode End\n    The hopper is said to be unhealthy if any of the following happens:\n\n    1. 
An element of `observation[1:]` (if `exclude_current_positions_from_observation=True`, else `observation[2:]`) is no longer contained in the open interval specified by the argument `healthy_state_range`\n    2. The height of the hopper (`observation[0]` if `exclude_current_positions_from_observation=True`, else `observation[1]`) is no longer contained in the open interval specified by the argument `healthy_z_range` (usually meaning that it has fallen)\n    3. The angle (`observation[1]` if `exclude_current_positions_from_observation=True`, else `observation[2]`) is no longer contained in the open interval specified by the argument `healthy_angle_range`\n\n    If `terminate_when_unhealthy=True` is passed during construction (which is the default),\n    the episode ends when any of the following happens:\n\n    1. Truncation: The episode duration reaches 1000 timesteps\n    2. Termination: The hopper is unhealthy\n\n    If `terminate_when_unhealthy=False` is passed, the episode ends only when it is truncated at 1000 timesteps.\n\n    ### Arguments\n\n    No additional arguments are currently supported in v2 and lower.\n\n    ```\n    env = gym.make('Hopper-v2')\n    ```\n\n    v3 and v4 take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.\n\n    ```\n    env = gym.make('Hopper-v4', ctrl_cost_weight=0.1, ....)\n    ```\n\n    | Parameter                                    | Type      | Default               | Description                                                                                                                                                                     |\n    | -------------------------------------------- | --------- | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n    | `xml_file`                                   | **str**   | `\"hopper.xml\"`      
  | Path to a MuJoCo model                                                                                                                                                          |\n    | `forward_reward_weight`                      | **float** | `1.0`                 | Weight for _forward_reward_ term (see section on reward)                                                                                                                        |\n    | `ctrl_cost_weight`                           | **float** | `0.001`               | Weight for _ctrl_cost_ reward (see section on reward)                                                                                                                           |\n    | `healthy_reward`                             | **float** | `1`                   | Constant reward given if the hopper is \"healthy\" after timestep                                                                                                                 |\n    | `terminate_when_unhealthy`                   | **bool**  | `True`                | If true, issue a done signal if the hopper is no longer healthy                                                                                                                 |\n    | `healthy_state_range`                        | **tuple** | `(-100, 100)`         | The elements of `observation[1:]` (if `exclude_current_positions_from_observation=True`, else `observation[2:]`) must be in this range for the hopper to be considered healthy  |\n    | `healthy_z_range`                            | **tuple** | `(0.7, float(\"inf\"))` | The z-coordinate must be in this range for the hopper to be considered healthy                                                                                                  |\n    | `healthy_angle_range`                        | **tuple** | `(-0.2, 0.2)`         | The angle given by `observation[1]` (if `exclude_current_positions_from_observation=True`, else `observation[2]`) 
must be in this range for the hopper to be considered healthy |\n    | `reset_noise_scale`                          | **float** | `5e-3`                | Scale of random perturbations of initial position and velocity (see section on Starting State)                                                                                  |\n    | `exclude_current_positions_from_observation` | **bool**  | `True`                | Whether or not to omit the x-coordinate from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies               |\n\n    ### Version History\n\n    * v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3\n    * v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)\n    * v2: All continuous control environments now use mujoco_py >= 1.50\n    * v1: max_time_steps raised to 1000 for robot based tasks. 
Added reward_threshold to environments.\n    * v0: Initial versions release (1.0.0)\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 125,\n    }\n\n    def __init__(\n        self,\n        forward_reward_weight=1.0,\n        ctrl_cost_weight=1e-3,\n        healthy_reward=1.0,\n        terminate_when_unhealthy=True,\n        healthy_state_range=(-100.0, 100.0),\n        healthy_z_range=(0.7, float(\"inf\")),\n        healthy_angle_range=(-0.2, 0.2),\n        reset_noise_scale=5e-3,\n        exclude_current_positions_from_observation=True,\n        **kwargs\n    ):\n        utils.EzPickle.__init__(\n            self,\n            forward_reward_weight,\n            ctrl_cost_weight,\n            healthy_reward,\n            terminate_when_unhealthy,\n            healthy_state_range,\n            healthy_z_range,\n            healthy_angle_range,\n            reset_noise_scale,\n            exclude_current_positions_from_observation,\n            **kwargs\n        )\n\n        self._forward_reward_weight = forward_reward_weight\n\n        self._ctrl_cost_weight = ctrl_cost_weight\n\n        self._healthy_reward = healthy_reward\n        self._terminate_when_unhealthy = terminate_when_unhealthy\n\n        self._healthy_state_range = healthy_state_range\n        self._healthy_z_range = healthy_z_range\n        self._healthy_angle_range = healthy_angle_range\n\n        self._reset_noise_scale = reset_noise_scale\n\n        self._exclude_current_positions_from_observation = (\n            exclude_current_positions_from_observation\n        )\n\n        if exclude_current_positions_from_observation:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(11,), dtype=np.float64\n            )\n        else:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(12,), 
dtype=np.float64\n            )\n\n        MujocoEnv.__init__(\n            self, \"hopper.xml\", 4, observation_space=observation_space, **kwargs\n        )\n\n    @property\n    def healthy_reward(self):\n        return (\n            float(self.is_healthy or self._terminate_when_unhealthy)\n            * self._healthy_reward\n        )\n\n    def control_cost(self, action):\n        control_cost = self._ctrl_cost_weight * np.sum(np.square(action))\n        return control_cost\n\n    @property\n    def is_healthy(self):\n        z, angle = self.data.qpos[1:3]\n        state = self.state_vector()[2:]\n\n        min_state, max_state = self._healthy_state_range\n        min_z, max_z = self._healthy_z_range\n        min_angle, max_angle = self._healthy_angle_range\n\n        healthy_state = np.all(np.logical_and(min_state < state, state < max_state))\n        healthy_z = min_z < z < max_z\n        healthy_angle = min_angle < angle < max_angle\n\n        is_healthy = all((healthy_state, healthy_z, healthy_angle))\n\n        return is_healthy\n\n    @property\n    def terminated(self):\n        terminated = not self.is_healthy if self._terminate_when_unhealthy else False\n        return terminated\n\n    def _get_obs(self):\n        position = self.data.qpos.flat.copy()\n        velocity = np.clip(self.data.qvel.flat.copy(), -10, 10)\n\n        if self._exclude_current_positions_from_observation:\n            position = position[1:]\n\n        observation = np.concatenate((position, velocity)).ravel()\n        return observation\n\n    def step(self, action):\n        x_position_before = self.data.qpos[0]\n        self.do_simulation(action, self.frame_skip)\n        x_position_after = self.data.qpos[0]\n        x_velocity = (x_position_after - x_position_before) / self.dt\n\n        ctrl_cost = self.control_cost(action)\n\n        forward_reward = self._forward_reward_weight * x_velocity\n        healthy_reward = self.healthy_reward\n\n        rewards = forward_reward 
+ healthy_reward\n        costs = ctrl_cost\n\n        observation = self._get_obs()\n        reward = rewards - costs\n        terminated = self.terminated\n        info = {\n            \"x_position\": x_position_after,\n            \"x_velocity\": x_velocity,\n        }\n\n        if self.render_mode == \"human\":\n            self.render()\n        return observation, reward, terminated, False, info\n\n    def reset_model(self):\n        noise_low = -self._reset_noise_scale\n        noise_high = self._reset_noise_scale\n\n        qpos = self.init_qpos + self.np_random.uniform(\n            low=noise_low, high=noise_high, size=self.model.nq\n        )\n        qvel = self.init_qvel + self.np_random.uniform(\n            low=noise_low, high=noise_high, size=self.model.nv\n        )\n\n        self.set_state(qpos, qvel)\n\n        observation = self._get_obs()\n        return observation\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        for key, value in DEFAULT_CAMERA_CONFIG.items():\n            if isinstance(value, np.ndarray):\n                getattr(self.viewer.cam, key)[:] = value\n            else:\n                setattr(self.viewer.cam, key, value)\n"
  },
  {
    "path": "gym/envs/mujoco/humanoid.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MuJocoPyEnv\nfrom gym.spaces import Box\n\n\ndef mass_center(model, sim):\n    mass = np.expand_dims(model.body_mass, 1)\n    xpos = sim.data.xipos\n    return (np.sum(mass * xpos, 0) / np.sum(mass))[0]\n\n\nclass HumanoidEnv(MuJocoPyEnv, utils.EzPickle):\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 67,\n    }\n\n    def __init__(self, **kwargs):\n        observation_space = Box(\n            low=-np.inf, high=np.inf, shape=(376,), dtype=np.float64\n        )\n        MuJocoPyEnv.__init__(\n            self, \"humanoid.xml\", 5, observation_space=observation_space, **kwargs\n        )\n        utils.EzPickle.__init__(self, **kwargs)\n\n    def _get_obs(self):\n        data = self.sim.data\n        return np.concatenate(\n            [\n                data.qpos.flat[2:],\n                data.qvel.flat,\n                data.cinert.flat,\n                data.cvel.flat,\n                data.qfrc_actuator.flat,\n                data.cfrc_ext.flat,\n            ]\n        )\n\n    def step(self, a):\n        pos_before = mass_center(self.model, self.sim)\n        self.do_simulation(a, self.frame_skip)\n        pos_after = mass_center(self.model, self.sim)\n\n        alive_bonus = 5.0\n        data = self.sim.data\n        lin_vel_cost = 1.25 * (pos_after - pos_before) / self.dt\n        quad_ctrl_cost = 0.1 * np.square(data.ctrl).sum()\n        quad_impact_cost = 0.5e-6 * np.square(data.cfrc_ext).sum()\n        quad_impact_cost = min(quad_impact_cost, 10)\n        reward = lin_vel_cost - quad_ctrl_cost - quad_impact_cost + alive_bonus\n        qpos = self.sim.data.qpos\n        terminated = bool((qpos[2] < 1.0) or (qpos[2] > 2.0))\n\n        if self.render_mode == \"human\":\n            self.render()\n        return (\n            self._get_obs(),\n           
 reward,\n            terminated,\n            False,\n            dict(\n                reward_linvel=lin_vel_cost,\n                reward_quadctrl=-quad_ctrl_cost,\n                reward_alive=alive_bonus,\n                reward_impact=-quad_impact_cost,\n            ),\n        )\n\n    def reset_model(self):\n        c = 0.01\n        self.set_state(\n            self.init_qpos + self.np_random.uniform(low=-c, high=c, size=self.model.nq),\n            self.init_qvel\n            + self.np_random.uniform(\n                low=-c,\n                high=c,\n                size=self.model.nv,\n            ),\n        )\n        return self._get_obs()\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        self.viewer.cam.trackbodyid = 1\n        self.viewer.cam.distance = self.model.stat.extent * 1.0\n        self.viewer.cam.lookat[2] = 2.0\n        self.viewer.cam.elevation = -20\n"
  },
  {
    "path": "gym/envs/mujoco/humanoid_v3.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MuJocoPyEnv\nfrom gym.spaces import Box\n\nDEFAULT_CAMERA_CONFIG = {\n    \"trackbodyid\": 1,\n    \"distance\": 4.0,\n    \"lookat\": np.array((0.0, 0.0, 2.0)),\n    \"elevation\": -20.0,\n}\n\n\ndef mass_center(model, sim):\n    mass = np.expand_dims(model.body_mass, axis=1)\n    xpos = sim.data.xipos\n    return (np.sum(mass * xpos, axis=0) / np.sum(mass))[0:2].copy()\n\n\nclass HumanoidEnv(MuJocoPyEnv, utils.EzPickle):\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 67,\n    }\n\n    def __init__(\n        self,\n        xml_file=\"humanoid.xml\",\n        forward_reward_weight=1.25,\n        ctrl_cost_weight=0.1,\n        contact_cost_weight=5e-7,\n        contact_cost_range=(-np.inf, 10.0),\n        healthy_reward=5.0,\n        terminate_when_unhealthy=True,\n        healthy_z_range=(1.0, 2.0),\n        reset_noise_scale=1e-2,\n        exclude_current_positions_from_observation=True,\n        **kwargs\n    ):\n        utils.EzPickle.__init__(\n            self,\n            xml_file,\n            forward_reward_weight,\n            ctrl_cost_weight,\n            contact_cost_weight,\n            contact_cost_range,\n            healthy_reward,\n            terminate_when_unhealthy,\n            healthy_z_range,\n            reset_noise_scale,\n            exclude_current_positions_from_observation,\n            **kwargs\n        )\n\n        self._forward_reward_weight = forward_reward_weight\n        self._ctrl_cost_weight = ctrl_cost_weight\n        self._contact_cost_weight = contact_cost_weight\n        self._contact_cost_range = contact_cost_range\n        self._healthy_reward = healthy_reward\n        self._terminate_when_unhealthy = terminate_when_unhealthy\n        self._healthy_z_range = healthy_z_range\n\n        self._reset_noise_scale = 
reset_noise_scale\n\n        self._exclude_current_positions_from_observation = (\n            exclude_current_positions_from_observation\n        )\n        if exclude_current_positions_from_observation:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(376,), dtype=np.float64\n            )\n        else:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(378,), dtype=np.float64\n            )\n\n        MuJocoPyEnv.__init__(\n            self, xml_file, 5, observation_space=observation_space, **kwargs\n        )\n\n    @property\n    def healthy_reward(self):\n        return (\n            float(self.is_healthy or self._terminate_when_unhealthy)\n            * self._healthy_reward\n        )\n\n    def control_cost(self, action):\n        control_cost = self._ctrl_cost_weight * np.sum(np.square(self.sim.data.ctrl))\n        return control_cost\n\n    @property\n    def contact_cost(self):\n        contact_forces = self.sim.data.cfrc_ext\n        contact_cost = self._contact_cost_weight * np.sum(np.square(contact_forces))\n        min_cost, max_cost = self._contact_cost_range\n        contact_cost = np.clip(contact_cost, min_cost, max_cost)\n        return contact_cost\n\n    @property\n    def is_healthy(self):\n        min_z, max_z = self._healthy_z_range\n        is_healthy = min_z < self.sim.data.qpos[2] < max_z\n\n        return is_healthy\n\n    @property\n    def terminated(self):\n        terminated = (not self.is_healthy) if self._terminate_when_unhealthy else False\n        return terminated\n\n    def _get_obs(self):\n        position = self.sim.data.qpos.flat.copy()\n        velocity = self.sim.data.qvel.flat.copy()\n\n        com_inertia = self.sim.data.cinert.flat.copy()\n        com_velocity = self.sim.data.cvel.flat.copy()\n\n        actuator_forces = self.sim.data.qfrc_actuator.flat.copy()\n        external_contact_forces = self.sim.data.cfrc_ext.flat.copy()\n\n        if 
self._exclude_current_positions_from_observation:\n            position = position[2:]\n\n        return np.concatenate(\n            (\n                position,\n                velocity,\n                com_inertia,\n                com_velocity,\n                actuator_forces,\n                external_contact_forces,\n            )\n        )\n\n    def step(self, action):\n        xy_position_before = mass_center(self.model, self.sim)\n        self.do_simulation(action, self.frame_skip)\n        xy_position_after = mass_center(self.model, self.sim)\n\n        xy_velocity = (xy_position_after - xy_position_before) / self.dt\n        x_velocity, y_velocity = xy_velocity\n\n        ctrl_cost = self.control_cost(action)\n        contact_cost = self.contact_cost\n\n        forward_reward = self._forward_reward_weight * x_velocity\n        healthy_reward = self.healthy_reward\n\n        rewards = forward_reward + healthy_reward\n        costs = ctrl_cost + contact_cost\n\n        observation = self._get_obs()\n        reward = rewards - costs\n        terminated = self.terminated\n        info = {\n            \"reward_linvel\": forward_reward,\n            \"reward_quadctrl\": -ctrl_cost,\n            \"reward_alive\": healthy_reward,\n            \"reward_impact\": -contact_cost,\n            \"x_position\": xy_position_after[0],\n            \"y_position\": xy_position_after[1],\n            \"distance_from_origin\": np.linalg.norm(xy_position_after, ord=2),\n            \"x_velocity\": x_velocity,\n            \"y_velocity\": y_velocity,\n            \"forward_reward\": forward_reward,\n        }\n\n        if self.render_mode == \"human\":\n            self.render()\n        return observation, reward, terminated, False, info\n\n    def reset_model(self):\n        noise_low = -self._reset_noise_scale\n        noise_high = self._reset_noise_scale\n\n        qpos = self.init_qpos + self.np_random.uniform(\n            low=noise_low, high=noise_high, 
size=self.model.nq\n        )\n        qvel = self.init_qvel + self.np_random.uniform(\n            low=noise_low, high=noise_high, size=self.model.nv\n        )\n        self.set_state(qpos, qvel)\n\n        observation = self._get_obs()\n        return observation\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        for key, value in DEFAULT_CAMERA_CONFIG.items():\n            if isinstance(value, np.ndarray):\n                getattr(self.viewer.cam, key)[:] = value\n            else:\n                setattr(self.viewer.cam, key, value)\n"
  },
  {
    "path": "gym/envs/mujoco/humanoid_v4.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MujocoEnv\nfrom gym.spaces import Box\n\nDEFAULT_CAMERA_CONFIG = {\n    \"trackbodyid\": 1,\n    \"distance\": 4.0,\n    \"lookat\": np.array((0.0, 0.0, 2.0)),\n    \"elevation\": -20.0,\n}\n\n\ndef mass_center(model, data):\n    mass = np.expand_dims(model.body_mass, axis=1)\n    xpos = data.xipos\n    return (np.sum(mass * xpos, axis=0) / np.sum(mass))[0:2].copy()\n\n\nclass HumanoidEnv(MujocoEnv, utils.EzPickle):\n    \"\"\"\n    ### Description\n\n    This environment is based on the environment introduced by Tassa, Erez and Todorov\n    in [\"Synthesis and stabilization of complex behaviors through online trajectory optimization\"](https://ieeexplore.ieee.org/document/6386025).\n    The 3D bipedal robot is designed to simulate a human. It has a torso (abdomen) with a pair of\n    legs and arms. The legs each consist of two links, and so the arms (representing the knees and\n    elbows respectively). The goal of the environment is to walk forward as fast as possible without falling over.\n\n    ### Action Space\n    The action space is a `Box(-1, 1, (17,), float32)`. 
An action represents the torques applied at the hinge joints.\n\n    | Num | Action                    | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit |\n    |-----|----------------------|---------------|----------------|---------------------------------------|-------|------|\n    | 0   | Torque applied on the hinge in the y-coordinate of the abdomen                     | -0.4 | 0.4 | abdomen_y (lower_waist)     | hinge | torque (N m) |\n    | 1   | Torque applied on the hinge in the z-coordinate of the abdomen                     | -0.4 | 0.4 | abdomen_z (lower_waist)     | hinge | torque (N m) |\n    | 2   | Torque applied on the hinge in the x-coordinate of the abdomen                     | -0.4 | 0.4 | abdomen_x (pelvis)          | hinge | torque (N m) |\n    | 3   | Torque applied on the rotor between torso/abdomen and the right hip (x-coordinate) | -0.4 | 0.4 | right_hip_x (right_thigh)   | hinge | torque (N m) |\n    | 4   | Torque applied on the rotor between torso/abdomen and the right hip (z-coordinate) | -0.4 | 0.4 | right_hip_z (right_thigh)   | hinge | torque (N m) |\n    | 5   | Torque applied on the rotor between torso/abdomen and the right hip (y-coordinate) | -0.4 | 0.4 | right_hip_y (right_thigh)   | hinge | torque (N m) |\n    | 6   | Torque applied on the rotor between the right hip/thigh and the right shin         | -0.4 | 0.4 | right_knee                  | hinge | torque (N m) |\n    | 7   | Torque applied on the rotor between torso/abdomen and the left hip (x-coordinate)  | -0.4 | 0.4 | left_hip_x (left_thigh)     | hinge | torque (N m) |\n    | 8   | Torque applied on the rotor between torso/abdomen and the left hip (z-coordinate)  | -0.4 | 0.4 | left_hip_z (left_thigh)     | hinge | torque (N m) |\n    | 9   | Torque applied on the rotor between torso/abdomen and the left hip (y-coordinate)  | -0.4 | 0.4 | left_hip_y (left_thigh)     | hinge | torque (N m) |\n    | 10   | Torque applied on the rotor between 
the left hip/thigh and the left shin          | -0.4 | 0.4 | left_knee                   | hinge | torque (N m) |\n    | 11   | Torque applied on the rotor between the torso and right upper arm (coordinate -1) | -0.4 | 0.4 | right_shoulder1             | hinge | torque (N m) |\n    | 12   | Torque applied on the rotor between the torso and right upper arm (coordinate -2) | -0.4 | 0.4 | right_shoulder2             | hinge | torque (N m) |\n    | 13   | Torque applied on the rotor between the right upper arm and right lower arm       | -0.4 | 0.4 | right_elbow                 | hinge | torque (N m) |\n    | 14   | Torque applied on the rotor between the torso and left upper arm (coordinate -1)  | -0.4 | 0.4 | left_shoulder1              | hinge | torque (N m) |\n    | 15   | Torque applied on the rotor between the torso and left upper arm (coordinate -2)  | -0.4 | 0.4 | left_shoulder2              | hinge | torque (N m) |\n    | 16   | Torque applied on the rotor between the left upper arm and left lower arm         | -0.4 | 0.4 | left_elbow                  | hinge | torque (N m) |\n\n    ### Observation Space\n\n    Observations consist of positional values of different body parts of the Humanoid,\n     followed by the velocities of those individual parts (their derivatives) with all the\n     positions ordered before all the velocities.\n\n    By default, observations do not include the x- and y-coordinates of the torso. 
These may\n    be included by passing `exclude_current_positions_from_observation=False` during construction.\n    In that case, the observation space will have 378 dimensions where the first two dimensions\n    represent the x- and y-coordinates of the torso.\n    Regardless of whether `exclude_current_positions_from_observation` was set to true or false, the x- and y-coordinates\n    will be returned in `info` with keys `\"x_position\"` and `\"y_position\"`, respectively.\n\n    However, by default, the observation is a `ndarray` with shape `(376,)` where the elements correspond to the following:\n\n    | Num | Observation                                                                                                     | Min  | Max | Name (in corresponding XML file) | Joint | Unit                       |\n    | --- | --------------------------------------------------------------------------------------------------------------- | ---- | --- | -------------------------------- | ----- | -------------------------- |\n    | 0   | z-coordinate of the torso (centre)                                                                              | -Inf | Inf | root                             | free  | position (m)               |\n    | 1   | x-orientation of the torso (centre)                                                                             | -Inf | Inf | root                             | free  | angle (rad)                |\n    | 2   | y-orientation of the torso (centre)                                                                             | -Inf | Inf | root                             | free  | angle (rad)                |\n    | 3   | z-orientation of the torso (centre)                                                                             | -Inf | Inf | root                             | free  | angle (rad)                |\n    | 4   | w-orientation of the torso (centre)                                                                        
     | -Inf | Inf | root                             | free  | angle (rad)                |\n    | 5   | z-angle of the abdomen (in lower_waist)                                                                         | -Inf | Inf | abdomen_z                        | hinge | angle (rad)                |\n    | 6   | y-angle of the abdomen (in lower_waist)                                                                         | -Inf | Inf | abdomen_y                        | hinge | angle (rad)                |\n    | 7   | x-angle of the abdomen (in pelvis)                                                                              | -Inf | Inf | abdomen_x                        | hinge | angle (rad)                |\n    | 8   | x-coordinate of angle between pelvis and right hip (in right_thigh)                                             | -Inf | Inf | right_hip_x                      | hinge | angle (rad)                |\n    | 9   | z-coordinate of angle between pelvis and right hip (in right_thigh)                                             | -Inf | Inf | right_hip_z                      | hinge | angle (rad)                |\n    | 10  | y-coordinate of angle between pelvis and right hip (in right_thigh)                                             | -Inf | Inf | right_hip_y                      | hinge | angle (rad)                |\n    | 11  | angle between right hip and the right shin (in right_knee)                                                      | -Inf | Inf | right_knee                       | hinge | angle (rad)                |\n    | 12  | x-coordinate of angle between pelvis and left hip (in left_thigh)                                               | -Inf | Inf | left_hip_x                       | hinge | angle (rad)                |\n    | 13  | z-coordinate of angle between pelvis and left hip (in left_thigh)                                               | -Inf | Inf | left_hip_z                       | hinge | angle (rad)                
|\n    | 14  | y-coordinate of angle between pelvis and left hip (in left_thigh)                                               | -Inf | Inf | left_hip_y                       | hinge | angle (rad)                |\n    | 15  | angle between left hip and the left shin (in left_knee)                                                         | -Inf | Inf | left_knee                        | hinge | angle (rad)                |\n    | 16  | coordinate-1 (multi-axis) angle between torso and right arm (in right_upper_arm)                                | -Inf | Inf | right_shoulder1                  | hinge | angle (rad)                |\n    | 17  | coordinate-2 (multi-axis) angle between torso and right arm (in right_upper_arm)                                | -Inf | Inf | right_shoulder2                  | hinge | angle (rad)                |\n    | 18  | angle between right upper arm and right_lower_arm                                                               | -Inf | Inf | right_elbow                      | hinge | angle (rad)                |\n    | 19  | coordinate-1 (multi-axis) angle between torso and left arm (in left_upper_arm)                                  | -Inf | Inf | left_shoulder1                   | hinge | angle (rad)                |\n    | 20  | coordinate-2 (multi-axis) angle between torso and left arm (in left_upper_arm)                                  | -Inf | Inf | left_shoulder2                   | hinge | angle (rad)                |\n    | 21  | angle between left upper arm and left_lower_arm                                                                 | -Inf | Inf | left_elbow                       | hinge | angle (rad)                |\n    | 22  | x-coordinate velocity of the torso (centre)                                                                     | -Inf | Inf | root                             | free  | velocity (m/s)             |\n    | 23  | y-coordinate velocity of the torso (centre)                                  
                                   | -Inf | Inf | root                             | free  | velocity (m/s)             |\n    | 24  | z-coordinate velocity of the torso (centre)                                                                     | -Inf | Inf | root                             | free  | velocity (m/s)             |\n    | 25  | x-coordinate angular velocity of the torso (centre)                                                             | -Inf | Inf | root                             | free  | angular velocity (rad/s)   |\n    | 26  | y-coordinate angular velocity of the torso (centre)                                                             | -Inf | Inf | root                             | free  | angular velocity (rad/s)   |\n    | 27  | z-coordinate angular velocity of the torso (centre)                                                             | -Inf | Inf | root                             | free  | angular velocity (rad/s)   |\n    | 28  | z-coordinate of angular velocity of the abdomen (in lower_waist)                                                | -Inf | Inf | abdomen_z                        | hinge | angular velocity (rad/s)   |\n    | 29  | y-coordinate of angular velocity of the abdomen (in lower_waist)                                                | -Inf | Inf | abdomen_y                        | hinge | angular velocity (rad/s)   |\n    | 30  | x-coordinate of angular velocity of the abdomen (in pelvis)                                                     | -Inf | Inf | abdomen_x                        | hinge | angular velocity (rad/s)   |\n    | 31  | x-coordinate of the angular velocity of the angle between pelvis and right hip (in right_thigh)                 | -Inf | Inf | right_hip_x                      | hinge | angular velocity (rad/s)   |\n    | 32  | z-coordinate of the angular velocity of the angle between pelvis and right hip (in right_thigh)                 | -Inf | Inf | right_hip_z                      | hinge 
| angular velocity (rad/s)   |\n    | 33  | y-coordinate of the angular velocity of the angle between pelvis and right hip (in right_thigh)                 | -Inf | Inf | right_hip_y                      | hinge | angular velocity (rad/s)   |\n    | 34  | angular velocity of the angle between right hip and the right shin (in right_knee)                              | -Inf | Inf | right_knee                       | hinge | angular velocity (rad/s)   |\n    | 35  | x-coordinate of the angular velocity of the angle between pelvis and left hip (in left_thigh)                   | -Inf | Inf | left_hip_x                       | hinge | angular velocity (rad/s)   |\n    | 36  | z-coordinate of the angular velocity of the angle between pelvis and left hip (in left_thigh)                   | -Inf | Inf | left_hip_z                       | hinge | angular velocity (rad/s)   |\n    | 37  | y-coordinate of the angular velocity of the angle between pelvis and left hip (in left_thigh)                   | -Inf | Inf | left_hip_y                       | hinge | angular velocity (rad/s)   |\n    | 38  | angular velocity of the angle between left hip and the left shin (in left_knee)                                 | -Inf | Inf | left_knee                        | hinge | angular velocity (rad/s)   |\n    | 39  | coordinate-1 (multi-axis) of the angular velocity of the angle between torso and right arm (in right_upper_arm) | -Inf | Inf | right_shoulder1                  | hinge | angular velocity (rad/s)   |\n    | 40  | coordinate-2 (multi-axis) of the angular velocity of the angle between torso and right arm (in right_upper_arm) | -Inf | Inf | right_shoulder2                  | hinge | angular velocity (rad/s)   |\n    | 41  | angular velocity of the angle between right upper arm and right_lower_arm                                       | -Inf | Inf | right_elbow                      | hinge | angular velocity (rad/s)   |\n    | 42  | coordinate-1 (multi-axis) of the angular 
velocity of the angle between torso and left arm (in left_upper_arm)   | -Inf | Inf | left_shoulder1                   | hinge | angular velocity (rad/s)   |\n    | 43  | coordinate-2 (multi-axis) of the angular velocity of the angle between torso and left arm (in left_upper_arm)   | -Inf | Inf | left_shoulder2                   | hinge | angular velocity (rad/s)   |\n    | 44  | angular velocity of the angle between left upper arm and left_lower_arm                                         | -Inf | Inf | left_elbow                       | hinge | angular velocity (rad/s)   |\n\n    Additionally, after all the positional and velocity based values in the table,\n    the observation contains (in order):\n    - *cinert:* Mass and inertia of a single rigid body relative to the center of mass\n    (this is an intermediate result of transition). It has shape 14 * 10 (*nbody * 10*)\n    and hence adds another 140 elements to the state space.\n    - *cvel:* Center of mass based velocity. It has shape 14 * 6 (*nbody * 6*) and hence\n    adds another 84 elements to the state space.\n    - *qfrc_actuator:* Constraint force generated as the actuator force. This has shape\n    `(23,)`  *(nv * 1)* and hence adds another 23 elements to the state space.\n    - *cfrc_ext:* This is the center of mass based external force on the body.  It has shape\n    14 * 6 (*nbody * 6*) and hence adds another 84 elements to the state space.\n\n    Here *nbody* stands for the number of bodies in the robot and *nv* stands for the\n    number of degrees of freedom (*= dim(qvel)*).\n\n    The (x,y,z) coordinates are translational DOFs while the orientations are rotational\n    DOFs expressed as quaternions. 
One can read more about free joints in the\n    [MuJoCo documentation](https://mujoco.readthedocs.io/en/latest/XMLreference.html).\n\n    **Note:** The Humanoid-v4 environment no longer has the following contact forces issue.\n    For Humanoid versions prior to v4, there have been reports that using a Mujoco-Py version > 2.0\n    results in the contact forces always being 0. We therefore recommend using a Mujoco-Py\n    version < 2.0 with those versions if you would like to report results\n    with contact forces (if contact forces are not used in your experiments, you can use\n    version > 2.0).\n\n    ### Rewards\n    The reward consists of the following parts:\n    - *healthy_reward*: Every timestep that the humanoid is alive (see the Episode End section for the definition), it gets a reward of fixed value `healthy_reward`\n    - *forward_reward*: A reward for walking forward, which is measured as *`forward_reward_weight` *\n    (average center of mass after action - average center of mass before action)/dt*.\n    *dt* is the time between actions and is dependent on the frame_skip parameter\n    (default is 5), where the frametime is 0.003, making the default *dt = 5 * 0.003 = 0.015*.\n    This reward is positive if the humanoid walks forward (in the positive x-direction). The calculation\n    for the center of mass is defined in the `.py` file for the Humanoid.\n    - *ctrl_cost*: A negative reward for penalising the humanoid if the control\n    force is too large. If there are *nu* actuators/controls, then the control has\n    shape  `nu x 1`. It is measured as *`ctrl_cost_weight` * sum(control<sup>2</sup>)*.\n    - *contact_cost*: A negative reward for penalising the humanoid if the external\n    contact force is too large. 
It is calculated by clipping\n    *`contact_cost_weight` * sum(external contact force<sup>2</sup>)* to the interval specified by `contact_cost_range`.\n    Note that the v4 `step` method in this file does not subtract this term; it applies to earlier versions.\n\n    The total reward returned by v4 is ***reward*** *=* *healthy_reward + forward_reward - ctrl_cost*, and `info` also contains the individual reward terms.\n\n    ### Starting State\n    All observations start in state\n    (0.0, 0.0,  1.4, 1.0, 0.0  ... 0.0) with uniform noise in the range\n    of [-`reset_noise_scale`, `reset_noise_scale`] added to the positional and velocity values (values in the table)\n    for stochasticity. Note that the initial z coordinate is intentionally\n    selected to be high, thereby indicating a standing humanoid. The initial\n    orientation is designed to make it face forward as well.\n\n    ### Episode End\n    The humanoid is said to be unhealthy if the z-position of the torso is no longer strictly inside the\n    interval specified by the argument `healthy_z_range` (the check is `min_z < z < max_z`).\n\n    If `terminate_when_unhealthy=True` is passed during construction (which is the default),\n    the episode ends when any of the following happens:\n\n    1. Truncation: The episode duration reaches 1000 timesteps\n    2. 
Termination: The humanoid is unhealthy\n\n    If `terminate_when_unhealthy=False` is passed, the episode is ended only when 1000 timesteps are exceeded.\n\n    ### Arguments\n\n    No additional arguments are currently supported in v2 and lower.\n\n    ```\n    env = gym.make('Humanoid-v4')\n    ```\n\n    v3 and v4 take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.\n\n    ```\n    env = gym.make('Humanoid-v4', ctrl_cost_weight=0.1, ....)\n    ```\n\n    | Parameter                                    | Type      | Default          | Description                                                                                                                                                               |\n    | -------------------------------------------- | --------- | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n    | `xml_file`                                   | **str**   | `\"humanoid.xml\"` | Path to a MuJoCo model                                                                                                                                                    |\n    | `forward_reward_weight`                      | **float** | `1.25`           | Weight for _forward_reward_ term (see section on reward)                                                                                                                  |\n    | `ctrl_cost_weight`                           | **float** | `0.1`            | Weight for _ctrl_cost_ term (see section on reward)                                                                                                                       |\n    | `contact_cost_weight`                        | **float** | `5e-7`           | Weight for _contact_cost_ term (see section on reward)                                                                                           
                         |\n    | `healthy_reward`                             | **float** | `5.0`            | Constant reward given if the humanoid is \"healthy\" after timestep                                                                                                         |\n    | `terminate_when_unhealthy`                   | **bool**  | `True`           | If true, issue a done signal if the z-coordinate of the torso is no longer in the `healthy_z_range`                                                                       |\n    | `healthy_z_range`                            | **tuple** | `(1.0, 2.0)`     | The humanoid is considered healthy if the z-coordinate of the torso is in this range                                                                                      |\n    | `reset_noise_scale`                          | **float** | `1e-2`           | Scale of random perturbations of initial position and velocity (see section on Starting State)                                                                            |\n    | `exclude_current_positions_from_observation` | **bool**  | `True`           | Whether or not to omit the x- and y-coordinates from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies |\n\n    ### Version History\n\n    * v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3\n    * v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)\n    * v2: All continuous control environments now use mujoco_py >= 1.50\n    * v1: max_time_steps raised to 1000 for robot based tasks. 
Added reward_threshold to environments.\n    * v0: Initial versions release (1.0.0)\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 67,\n    }\n\n    def __init__(\n        self,\n        forward_reward_weight=1.25,\n        ctrl_cost_weight=0.1,\n        healthy_reward=5.0,\n        terminate_when_unhealthy=True,\n        healthy_z_range=(1.0, 2.0),\n        reset_noise_scale=1e-2,\n        exclude_current_positions_from_observation=True,\n        **kwargs\n    ):\n        utils.EzPickle.__init__(\n            self,\n            forward_reward_weight,\n            ctrl_cost_weight,\n            healthy_reward,\n            terminate_when_unhealthy,\n            healthy_z_range,\n            reset_noise_scale,\n            exclude_current_positions_from_observation,\n            **kwargs\n        )\n\n        self._forward_reward_weight = forward_reward_weight\n        self._ctrl_cost_weight = ctrl_cost_weight\n        self._healthy_reward = healthy_reward\n        self._terminate_when_unhealthy = terminate_when_unhealthy\n        self._healthy_z_range = healthy_z_range\n\n        self._reset_noise_scale = reset_noise_scale\n\n        self._exclude_current_positions_from_observation = (\n            exclude_current_positions_from_observation\n        )\n\n        if exclude_current_positions_from_observation:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(376,), dtype=np.float64\n            )\n        else:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(378,), dtype=np.float64\n            )\n\n        MujocoEnv.__init__(\n            self, \"humanoid.xml\", 5, observation_space=observation_space, **kwargs\n        )\n\n    @property\n    def healthy_reward(self):\n        return (\n            float(self.is_healthy or self._terminate_when_unhealthy)\n   
         * self._healthy_reward\n        )\n\n    def control_cost(self, action):\n        control_cost = self._ctrl_cost_weight * np.sum(np.square(self.data.ctrl))\n        return control_cost\n\n    @property\n    def is_healthy(self):\n        min_z, max_z = self._healthy_z_range\n        is_healthy = min_z < self.data.qpos[2] < max_z\n\n        return is_healthy\n\n    @property\n    def terminated(self):\n        terminated = (not self.is_healthy) if self._terminate_when_unhealthy else False\n        return terminated\n\n    def _get_obs(self):\n        position = self.data.qpos.flat.copy()\n        velocity = self.data.qvel.flat.copy()\n\n        com_inertia = self.data.cinert.flat.copy()\n        com_velocity = self.data.cvel.flat.copy()\n\n        actuator_forces = self.data.qfrc_actuator.flat.copy()\n        external_contact_forces = self.data.cfrc_ext.flat.copy()\n\n        if self._exclude_current_positions_from_observation:\n            position = position[2:]\n\n        return np.concatenate(\n            (\n                position,\n                velocity,\n                com_inertia,\n                com_velocity,\n                actuator_forces,\n                external_contact_forces,\n            )\n        )\n\n    def step(self, action):\n        xy_position_before = mass_center(self.model, self.data)\n        self.do_simulation(action, self.frame_skip)\n        xy_position_after = mass_center(self.model, self.data)\n\n        xy_velocity = (xy_position_after - xy_position_before) / self.dt\n        x_velocity, y_velocity = xy_velocity\n\n        ctrl_cost = self.control_cost(action)\n\n        forward_reward = self._forward_reward_weight * x_velocity\n        healthy_reward = self.healthy_reward\n\n        rewards = forward_reward + healthy_reward\n\n        observation = self._get_obs()\n        reward = rewards - ctrl_cost\n        terminated = self.terminated\n        info = {\n            \"reward_linvel\": forward_reward,\n           
 \"reward_quadctrl\": -ctrl_cost,\n            \"reward_alive\": healthy_reward,\n            \"x_position\": xy_position_after[0],\n            \"y_position\": xy_position_after[1],\n            \"distance_from_origin\": np.linalg.norm(xy_position_after, ord=2),\n            \"x_velocity\": x_velocity,\n            \"y_velocity\": y_velocity,\n            \"forward_reward\": forward_reward,\n        }\n\n        if self.render_mode == \"human\":\n            self.render()\n        return observation, reward, terminated, False, info\n\n    def reset_model(self):\n        noise_low = -self._reset_noise_scale\n        noise_high = self._reset_noise_scale\n\n        qpos = self.init_qpos + self.np_random.uniform(\n            low=noise_low, high=noise_high, size=self.model.nq\n        )\n        qvel = self.init_qvel + self.np_random.uniform(\n            low=noise_low, high=noise_high, size=self.model.nv\n        )\n        self.set_state(qpos, qvel)\n\n        observation = self._get_obs()\n        return observation\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        for key, value in DEFAULT_CAMERA_CONFIG.items():\n            if isinstance(value, np.ndarray):\n                getattr(self.viewer.cam, key)[:] = value\n            else:\n                setattr(self.viewer.cam, key, value)\n"
  },
  {
    "path": "gym/envs/mujoco/humanoidstandup.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MuJocoPyEnv\nfrom gym.spaces import Box\n\n\nclass HumanoidStandupEnv(MuJocoPyEnv, utils.EzPickle):\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 67,\n    }\n\n    def __init__(self, **kwargs):\n        observation_space = Box(\n            low=-np.inf, high=np.inf, shape=(376,), dtype=np.float64\n        )\n        MuJocoPyEnv.__init__(\n            self,\n            \"humanoidstandup.xml\",\n            5,\n            observation_space=observation_space,\n            **kwargs\n        )\n        utils.EzPickle.__init__(self, **kwargs)\n\n    def _get_obs(self):\n        data = self.sim.data\n        return np.concatenate(\n            [\n                data.qpos.flat[2:],\n                data.qvel.flat,\n                data.cinert.flat,\n                data.cvel.flat,\n                data.qfrc_actuator.flat,\n                data.cfrc_ext.flat,\n            ]\n        )\n\n    def step(self, a):\n        self.do_simulation(a, self.frame_skip)\n        pos_after = self.sim.data.qpos[2]\n        data = self.sim.data\n        uph_cost = (pos_after - 0) / self.model.opt.timestep\n\n        quad_ctrl_cost = 0.1 * np.square(data.ctrl).sum()\n        quad_impact_cost = 0.5e-6 * np.square(data.cfrc_ext).sum()\n        quad_impact_cost = min(quad_impact_cost, 10)\n        reward = uph_cost - quad_ctrl_cost - quad_impact_cost + 1\n\n        if self.render_mode == \"human\":\n            self.render()\n        return (\n            self._get_obs(),\n            reward,\n            False,\n            False,\n            dict(\n                reward_linup=uph_cost,\n                reward_quadctrl=-quad_ctrl_cost,\n                reward_impact=-quad_impact_cost,\n            ),\n        )\n\n    def reset_model(self):\n        c = 0.01\n        
self.set_state(\n            self.init_qpos + self.np_random.uniform(low=-c, high=c, size=self.model.nq),\n            self.init_qvel\n            + self.np_random.uniform(\n                low=-c,\n                high=c,\n                size=self.model.nv,\n            ),\n        )\n        return self._get_obs()\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        self.viewer.cam.trackbodyid = 1\n        self.viewer.cam.distance = self.model.stat.extent * 1.0\n        self.viewer.cam.lookat[2] = 0.8925\n        self.viewer.cam.elevation = -20\n"
  },
  {
    "path": "gym/envs/mujoco/humanoidstandup_v4.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MujocoEnv\nfrom gym.spaces import Box\n\n\nclass HumanoidStandupEnv(MujocoEnv, utils.EzPickle):\n    \"\"\"\n    ### Description\n\n    This environment is based on the environment introduced by Tassa, Erez and Todorov\n    in [\"Synthesis and stabilization of complex behaviors through online trajectory optimization\"](https://ieeexplore.ieee.org/document/6386025).\n    The 3D bipedal robot is designed to simulate a human. It has a torso (abdomen) with a\n    pair of legs and arms. The legs each consist of two links, and so the arms (representing the\n    knees and elbows respectively). The environment starts with the humanoid laying on the ground,\n    and then the goal of the environment is to make the humanoid standup and then keep it standing\n    by applying torques on the various hinges.\n\n    ### Action Space\n    The agent take a 17-element vector for actions.\n\n    The action space is a continuous `(action, ...)` all in `[-1, 1]`, where `action`\n    represents the numerical torques applied at the hinge joints.\n\n    | Num | Action                                                                             | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit         |\n    | --- | ---------------------------------------------------------------------------------- | ----------- | ----------- | -------------------------------- | ----- | ------------ |\n    | 0   | Torque applied on the hinge in the y-coordinate of the abdomen                     | -0.4        | 0.4         | hip_1 (front_left_leg)           | hinge | torque (N m) |\n    | 1   | Torque applied on the hinge in the z-coordinate of the abdomen                     | -0.4        | 0.4         | angle_1 (front_left_leg)         | hinge | torque (N m) |\n    | 2   | Torque applied on the hinge in the x-coordinate of the abdomen                     | -0.4        | 0.4         | hip_2 
(front_right_leg)          | hinge | torque (N m) |\n    | 3   | Torque applied on the rotor between torso/abdomen and the right hip (x-coordinate) | -0.4        | 0.4         | right_hip_x (right_thigh)        | hinge | torque (N m) |\n    | 4   | Torque applied on the rotor between torso/abdomen and the right hip (z-coordinate) | -0.4        | 0.4         | right_hip_z (right_thigh)        | hinge | torque (N m) |\n    | 5   | Torque applied on the rotor between torso/abdomen and the right hip (y-coordinate) | -0.4        | 0.4         | right_hip_y (right_thigh)        | hinge | torque (N m) |\n    | 6   | Torque applied on the rotor between the right hip/thigh and the right shin         | -0.4        | 0.4         | right_knee                       | hinge | torque (N m) |\n    | 7   | Torque applied on the rotor between torso/abdomen and the left hip (x-coordinate)  | -0.4        | 0.4         | left_hip_x (left_thigh)          | hinge | torque (N m) |\n    | 8   | Torque applied on the rotor between torso/abdomen and the left hip (z-coordinate)  | -0.4        | 0.4         | left_hip_z (left_thigh)          | hinge | torque (N m) |\n    | 9   | Torque applied on the rotor between torso/abdomen and the left hip (y-coordinate)  | -0.4        | 0.4         | left_hip_y (left_thigh)          | hinge | torque (N m) |\n    | 10  | Torque applied on the rotor between the left hip/thigh and the left shin           | -0.4        | 0.4         | left_knee                        | hinge | torque (N m) |\n    | 11  | Torque applied on the rotor between the torso and right upper arm (coordinate -1)  | -0.4        | 0.4         | right_shoulder1                  | hinge | torque (N m) |\n    | 12  | Torque applied on the rotor between the torso and right upper arm (coordinate -2)  | -0.4        | 0.4         | right_shoulder2                  | hinge | torque (N m) |\n    | 13  | Torque applied on the rotor between the right upper arm and right lower arm        | -0.4      
  | 0.4         | right_elbow                      | hinge | torque (N m) |\n    | 14  | Torque applied on the rotor between the torso and left upper arm (coordinate -1)   | -0.4        | 0.4         | left_shoulder1                   | hinge | torque (N m) |\n    | 15  | Torque applied on the rotor between the torso and left upper arm (coordinate -2)   | -0.4        | 0.4         | left_shoulder2                   | hinge | torque (N m) |\n    | 16  | Torque applied on the rotor between the left upper arm and left lower arm          | -0.4        | 0.4         | left_elbow                       | hinge | torque (N m) |\n\n    ### Observation Space\n\n    The state space consists of positional values of different body parts of the Humanoid,\n    followed by the velocities of those individual parts (their derivatives) with all the positions ordered before all the velocities.\n\n    **Note:** The x- and y-coordinates of the torso are being omitted to produce position-agnostic behavior in policies\n\n    The observation is a `ndarray` with shape `(376,)` where the elements correspond to the following:\n\n    | Num | Observation                                                                                                     | Min  | Max | Name (in corresponding XML file) | Joint | Unit                       |\n    | --- | --------------------------------------------------------------------------------------------------------------- | ---- | --- | -------------------------------- | ----- | -------------------------- |\n    | 0   | z-coordinate of the torso (centre)                                                                              | -Inf | Inf | root                             | free  | position (m)               |\n    | 1   | x-orientation of the torso (centre)                                                                             | -Inf | Inf | root                             | free  | angle (rad)                |\n    | 2   | y-orientation of the 
torso (centre)                                                                             | -Inf | Inf | root                             | free  | angle (rad)                |\n    | 3   | z-orientation of the torso (centre)                                                                             | -Inf | Inf | root                             | free  | angle (rad)                |\n    | 4   | w-orientation of the torso (centre)                                                                             | -Inf | Inf | root                             | free  | angle (rad)                |\n    | 5   | z-angle of the abdomen (in lower_waist)                                                                         | -Inf | Inf | abdomen_z                        | hinge | angle (rad)                |\n    | 6   | y-angle of the abdomen (in lower_waist)                                                                         | -Inf | Inf | abdomen_y                        | hinge | angle (rad)                |\n    | 7   | x-angle of the abdomen (in pelvis)                                                                              | -Inf | Inf | abdomen_x                        | hinge | angle (rad)                |\n    | 8   | x-coordinate of angle between pelvis and right hip (in right_thigh)                                             | -Inf | Inf | right_hip_x                      | hinge | angle (rad)                |\n    | 9   | z-coordinate of angle between pelvis and right hip (in right_thigh)                                             | -Inf | Inf | right_hip_z                      | hinge | angle (rad)                |\n    | 10  | y-coordinate of angle between pelvis and right hip (in right_thigh)                                             | -Inf | Inf | right_hip_y                      | hinge | angle (rad)                |\n    | 11  | angle between right hip and the right shin (in right_knee)                                                      
| -Inf | Inf | right_knee                       | hinge | angle (rad)                |\n    | 12  | x-coordinate of angle between pelvis and left hip (in left_thigh)                                               | -Inf | Inf | left_hip_x                       | hinge | angle (rad)                |\n    | 13  | z-coordinate of angle between pelvis and left hip (in left_thigh)                                               | -Inf | Inf | left_hip_z                       | hinge | angle (rad)                |\n    | 14  | y-coordinate of angle between pelvis and left hip (in left_thigh)                                               | -Inf | Inf | left_hip_y                       | hinge | angle (rad)                |\n    | 15  | angle between left hip and the left shin (in left_knee)                                                         | -Inf | Inf | left_knee                        | hinge | angle (rad)                |\n    | 16  | coordinate-1 (multi-axis) angle between torso and right arm (in right_upper_arm)                                | -Inf | Inf | right_shoulder1                  | hinge | angle (rad)                |\n    | 17  | coordinate-2 (multi-axis) angle between torso and right arm (in right_upper_arm)                                | -Inf | Inf | right_shoulder2                  | hinge | angle (rad)                |\n    | 18  | angle between right upper arm and right_lower_arm                                                               | -Inf | Inf | right_elbow                      | hinge | angle (rad)                |\n    | 19  | coordinate-1 (multi-axis) angle between torso and left arm (in left_upper_arm)                                  | -Inf | Inf | left_shoulder1                   | hinge | angle (rad)                |\n    | 20  | coordinate-2 (multi-axis) angle between torso and left arm (in left_upper_arm)                                  | -Inf | Inf | left_shoulder2                   | hinge | angle (rad)                |\n    
| 21  | angle between left upper arm and left_lower_arm                                                                 | -Inf | Inf | left_elbow                       | hinge | angle (rad)                |\n    | 22  | x-coordinate velocity of the torso (centre)                                                                     | -Inf | Inf | root                             | free  | velocity (m/s)             |\n    | 23  | y-coordinate velocity of the torso (centre)                                                                     | -Inf | Inf | root                             | free  | velocity (m/s)             |\n    | 24  | z-coordinate velocity of the torso (centre)                                                                     | -Inf | Inf | root                             | free  | velocity (m/s)             |\n    | 25  | x-coordinate angular velocity of the torso (centre)                                                             | -Inf | Inf | root                             | free  | angular velocity (rad/s)   |\n    | 26  | y-coordinate angular velocity of the torso (centre)                                                             | -Inf | Inf | root                             | free  | angular velocity (rad/s)   |\n    | 27  | z-coordinate angular velocity of the torso (centre)                                                             | -Inf | Inf | root                             | free  | angular velocity (rad/s)   |\n    | 28  | z-coordinate of angular velocity of the abdomen (in lower_waist)                                                | -Inf | Inf | abdomen_z                        | hinge | angular velocity (rad/s)   |\n    | 29  | y-coordinate of angular velocity of the abdomen (in lower_waist)                                                | -Inf | Inf | abdomen_y                        | hinge | angular velocity (rad/s)   |\n    | 30  | x-coordinate of angular velocity of the abdomen (in pelvis)                         
                            | -Inf | Inf | abdomen_x                        | hinge | angular velocity (rad/s)   |\n    | 31  | x-coordinate of the angular velocity of the angle between pelvis and right hip (in right_thigh)                 | -Inf | Inf | right_hip_x                      | hinge | angular velocity (rad/s)   |\n    | 32  | z-coordinate of the angular velocity of the angle between pelvis and right hip (in right_thigh)                 | -Inf | Inf | right_hip_z                      | hinge | angular velocity (rad/s)   |\n    | 33  | y-coordinate of the angular velocity of the angle between pelvis and right hip (in right_thigh)                 | -Inf | Inf | right_hip_y                      | hinge | angular velocity (rad/s)   |\n    | 34  | angular velocity of the angle between right hip and the right shin (in right_knee)                              | -Inf | Inf | right_knee                       | hinge | angular velocity (rad/s)   |\n    | 35  | x-coordinate of the angular velocity of the angle between pelvis and left hip (in left_thigh)                   | -Inf | Inf | left_hip_x                       | hinge | angular velocity (rad/s)   |\n    | 36  | z-coordinate of the angular velocity of the angle between pelvis and left hip (in left_thigh)                   | -Inf | Inf | left_hip_z                       | hinge | angular velocity (rad/s)   |\n    | 37  | y-coordinate of the angular velocity of the angle between pelvis and left hip (in left_thigh)                   | -Inf | Inf | left_hip_y                       | hinge | angular velocity (rad/s)   |\n    | 38  | angular velocity of the angle between left hip and the left shin (in left_knee)                                 | -Inf | Inf | left_knee                        | hinge | angular velocity (rad/s)   |\n    | 39  | coordinate-1 (multi-axis) of the angular velocity of the angle between torso and right arm (in right_upper_arm) | -Inf | Inf | right_shoulder1                  | hinge | 
angular velocity (rad/s)   |\n    | 40  | coordinate-2 (multi-axis) of the angular velocity of the angle between torso and right arm (in right_upper_arm) | -Inf | Inf | right_shoulder2                  | hinge | angular velocity (rad/s)   |\n    | 41  | angular velocity of the angle between right upper arm and right_lower_arm                                       | -Inf | Inf | right_elbow                      | hinge | angular velocity (rad/s)   |\n    | 42  | coordinate-1 (multi-axis) of the angular velocity of the angle between torso and left arm (in left_upper_arm)   | -Inf | Inf | left_shoulder1                   | hinge | angular velocity (rad/s)   |\n    | 43  | coordinate-2 (multi-axis) of the angular velocity of the angle between torso and left arm (in left_upper_arm)   | -Inf | Inf | left_shoulder2                   | hinge | angular velocity (rad/s)   |\n    | 44  | angular velocity of the angle between left upper arm and left_lower_arm                                         | -Inf | Inf | left_elbow                       | hinge | angular velocity (rad/s)   |\n\n\n    Additionally, after all the position- and velocity-based values in the table,\n    the state space consists of (in order):\n    - *cinert:* Mass and inertia of each rigid body relative to the center of mass\n    (this is an intermediate result of the transition). It has shape 14 * 10 (*nbody * 10*)\n    and hence adds another 140 elements to the state space.\n    - *cvel:* Center-of-mass-based velocity. It has shape 14 * 6 (*nbody * 6*) and hence\n    adds another 84 elements to the state space.\n    - *qfrc_actuator:* Constraint force generated as the actuator force. This has shape\n    `(23,)` *(nv * 1)* and hence adds another 23 elements to the state space.\n    - *cfrc_ext:* This is the center-of-mass-based external force on the body.  
It has shape\n    14 * 6 (*nbody * 6*) and hence adds another 84 elements to the state space.\n\n    Here *nbody* is the number of bodies in the robot and *nv* is the number\n    of degrees of freedom (*= dim(qvel)*).\n\n    The (x,y,z) coordinates are translational DOFs while the orientations are rotational\n    DOFs expressed as quaternions. One can read more about free joints in the\n    [Mujoco Documentation](https://mujoco.readthedocs.io/en/latest/XMLreference.html).\n\n    **Note:** The HumanoidStandup-v4 environment no longer has the following contact forces issue.\n    For HumanoidStandup versions prior to v4, there have been reported issues that using a Mujoco-Py version > 2.0 results\n    in the contact forces always being 0. As such we recommend using a Mujoco-Py version < 2.0\n    when using the HumanoidStandup environment if you would like to report results with contact forces\n    (if contact forces are not used in your experiments, you can use version > 2.0).\n\n    ### Rewards\n    The reward consists of three parts:\n    - *uph_cost*: A reward for moving upward (in an attempt to stand up). This is not a relative\n    reward measuring how far upward the Humanoid has moved since the last timestep, but an\n    absolute reward measuring how far upward it has moved overall. It is\n    measured as *(z coordinate after action - 0)/(atomic timestep)*, where *z coordinate after\n    action* is index 0 of the observation (index 2 of `qpos`), and *atomic timestep* is the time for\n    one frame of movement even though the simulation has a frame skip of 5 (done in order to inflate\n    rewards a little for faster learning).\n    - *quad_ctrl_cost*: A negative reward for penalising the humanoid if it has too large of\n    a control force. 
If there are *nu* actuators/controls, then the control has shape `nu x 1`.\n    It is measured as *0.1 * sum(control<sup>2</sup>)*.\n    - *quad_impact_cost*: A negative reward for penalising the humanoid if the external\n    contact force is too large. It is calculated as *min(0.5 * 0.000001 * sum(external\n    contact force<sup>2</sup>), 10)*.\n\n    The total reward returned is ***reward*** *=* *uph_cost + 1 - quad_ctrl_cost - quad_impact_cost*\n\n    ### Starting State\n    All observations start in state\n    (0.0, 0.0,  0.105, 1.0, 0.0  ... 0.0) with uniform noise in the range of\n    [-0.01, 0.01] added to the positional and velocity values (values in the table)\n    for stochasticity. Note that the initial z coordinate is intentionally selected\n    to be low, thereby indicating a lying-down humanoid. The initial orientation is\n    designed to make it face forward as well.\n\n    ### Episode End\n    The episode ends when any of the following happens:\n\n    1. Truncation: The episode duration reaches 1000 timesteps.\n    2. Termination: Any of the state space values is no longer finite.\n\n    ### Arguments\n\n    No additional arguments are currently supported.\n\n    ```\n    env = gym.make('HumanoidStandup-v4')\n    ```\n\n    There is no v3 for HumanoidStandup, unlike the robot environments where a v3 and\n    beyond take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.\n\n\n    ### Version History\n\n    * v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3\n    * v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)\n    * v2: All continuous control environments now use mujoco_py >= 1.50\n    * v1: max_time_steps raised to 1000 for robot based tasks. 
Added reward_threshold to environments.\n    * v0: Initial versions release (1.0.0)\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 67,\n    }\n\n    def __init__(self, **kwargs):\n        observation_space = Box(\n            low=-np.inf, high=np.inf, shape=(376,), dtype=np.float64\n        )\n        MujocoEnv.__init__(\n            self,\n            \"humanoidstandup.xml\",\n            5,\n            observation_space=observation_space,\n            **kwargs\n        )\n        utils.EzPickle.__init__(self, **kwargs)\n\n    def _get_obs(self):\n        data = self.data\n        return np.concatenate(\n            [\n                data.qpos.flat[2:],\n                data.qvel.flat,\n                data.cinert.flat,\n                data.cvel.flat,\n                data.qfrc_actuator.flat,\n                data.cfrc_ext.flat,\n            ]\n        )\n\n    def step(self, a):\n        self.do_simulation(a, self.frame_skip)\n        pos_after = self.data.qpos[2]\n        data = self.data\n        uph_cost = (pos_after - 0) / self.model.opt.timestep\n\n        quad_ctrl_cost = 0.1 * np.square(data.ctrl).sum()\n        quad_impact_cost = 0.5e-6 * np.square(data.cfrc_ext).sum()\n        quad_impact_cost = min(quad_impact_cost, 10)\n        reward = uph_cost - quad_ctrl_cost - quad_impact_cost + 1\n\n        if self.render_mode == \"human\":\n            self.render()\n        return (\n            self._get_obs(),\n            reward,\n            False,\n            False,\n            dict(\n                reward_linup=uph_cost,\n                reward_quadctrl=-quad_ctrl_cost,\n                reward_impact=-quad_impact_cost,\n            ),\n        )\n\n    def reset_model(self):\n        c = 0.01\n        self.set_state(\n            self.init_qpos + self.np_random.uniform(low=-c, high=c, size=self.model.nq),\n            
self.init_qvel\n            + self.np_random.uniform(\n                low=-c,\n                high=c,\n                size=self.model.nv,\n            ),\n        )\n        return self._get_obs()\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        self.viewer.cam.trackbodyid = 1\n        self.viewer.cam.distance = self.model.stat.extent * 1.0\n        self.viewer.cam.lookat[2] = 0.8925\n        self.viewer.cam.elevation = -20\n"
  },
  {
    "path": "gym/envs/mujoco/inverted_double_pendulum.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MuJocoPyEnv\nfrom gym.spaces import Box\n\n\nclass InvertedDoublePendulumEnv(MuJocoPyEnv, utils.EzPickle):\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 20,\n    }\n\n    def __init__(self, **kwargs):\n        observation_space = Box(low=-np.inf, high=np.inf, shape=(11,), dtype=np.float64)\n        MuJocoPyEnv.__init__(\n            self,\n            \"inverted_double_pendulum.xml\",\n            5,\n            observation_space=observation_space,\n            **kwargs\n        )\n        utils.EzPickle.__init__(self, **kwargs)\n\n    def step(self, action):\n        self.do_simulation(action, self.frame_skip)\n\n        ob = self._get_obs()\n        x, _, y = self.sim.data.site_xpos[0]\n        dist_penalty = 0.01 * x**2 + (y - 2) ** 2\n        v1, v2 = self.sim.data.qvel[1:3]\n        vel_penalty = 1e-3 * v1**2 + 5e-3 * v2**2\n        alive_bonus = 10\n        r = alive_bonus - dist_penalty - vel_penalty\n        terminated = bool(y <= 1)\n\n        if self.render_mode == \"human\":\n            self.render()\n        return ob, r, terminated, False, {}\n\n    def _get_obs(self):\n        return np.concatenate(\n            [\n                self.sim.data.qpos[:1],  # cart x pos\n                np.sin(self.sim.data.qpos[1:]),  # link angles\n                np.cos(self.sim.data.qpos[1:]),\n                np.clip(self.sim.data.qvel, -10, 10),\n                np.clip(self.sim.data.qfrc_constraint, -10, 10),\n            ]\n        ).ravel()\n\n    def reset_model(self):\n        self.set_state(\n            self.init_qpos\n            + self.np_random.uniform(low=-0.1, high=0.1, size=self.model.nq),\n            self.init_qvel + self.np_random.standard_normal(self.model.nv) * 0.1,\n        )\n        return self._get_obs()\n\n    def 
viewer_setup(self):\n        assert self.viewer is not None\n        v = self.viewer\n        v.cam.trackbodyid = 0\n        v.cam.distance = self.model.stat.extent * 0.5\n        v.cam.lookat[2] = 0.12250000000000005  # v.model.stat.center[2]\n"
  },
  {
    "path": "gym/envs/mujoco/inverted_double_pendulum_v4.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MujocoEnv\nfrom gym.spaces import Box\n\n\nclass InvertedDoublePendulumEnv(MujocoEnv, utils.EzPickle):\n    \"\"\"\n    ### Description\n\n    This environment originates from control theory and builds on the cartpole\n    environment based on the work done by Barto, Sutton, and Anderson in\n    [\"Neuronlike adaptive elements that can solve difficult learning control problems\"](https://ieeexplore.ieee.org/document/6313077),\n    powered by the Mujoco physics simulator - allowing for more complex experiments\n    (such as varying the effects of gravity or constraints). This environment involves a cart that can be\n    moved linearly, with a pole fixed on it and a second pole fixed on the other end of the first one\n    (leaving the second pole as the only one with one free end). The cart can be pushed left or right,\n    and the goal is to balance the second pole on top of the first pole, which is in turn on top of the\n    cart, by applying continuous forces on the cart.\n\n    ### Action Space\n    The agent takes a 1-element vector for actions.\n    The action space is a continuous `(action)` in `[-1, 1]`, where `action` represents the\n    numerical force applied to the cart (with magnitude representing the amount of force and\n    sign representing the direction).\n\n    | Num | Action                    | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit      |\n    |-----|---------------------------|-------------|-------------|----------------------------------|-------|-----------|\n    | 0   | Force applied on the cart | -1          | 1           | slider                           | slide | Force (N) |\n\n    ### Observation Space\n\n    The state space consists of positional values of different body parts of the pendulum system,\n    followed by the velocities of those individual parts (their derivatives) with all the\n    positions ordered before 
all the velocities.\n\n    The observation is a `ndarray` with shape `(11,)` where the elements correspond to the following:\n\n    | Num | Observation                                                       | Min  | Max | Name (in corresponding XML file) | Joint | Unit                     |\n    | --- | ----------------------------------------------------------------- | ---- | --- | -------------------------------- | ----- | ------------------------ |\n    | 0   | position of the cart along the linear surface                     | -Inf | Inf | slider                           | slide | position (m)             |\n    | 1   | sine of the angle between the cart and the first pole             | -Inf | Inf | sin(hinge)                       | hinge | unitless                 |\n    | 2   | sine of the angle between the two poles                           | -Inf | Inf | sin(hinge2)                      | hinge | unitless                 |\n    | 3   | cosine of the angle between the cart and the first pole           | -Inf | Inf | cos(hinge)                       | hinge | unitless                 |\n    | 4   | cosine of the angle between the two poles                         | -Inf | Inf | cos(hinge2)                      | hinge | unitless                 |\n    | 5   | velocity of the cart                                              | -Inf | Inf | slider                           | slide | velocity (m/s)           |\n    | 6   | angular velocity of the angle between the cart and the first pole | -Inf | Inf | hinge                            | hinge | angular velocity (rad/s) |\n    | 7   | angular velocity of the angle between the two poles               | -Inf | Inf | hinge2                           | hinge | angular velocity (rad/s) |\n    | 8   | constraint force - 1                                              | -Inf | Inf |                                  |       | Force (N)                |\n    | 9   | constraint force - 2                                    
          | -Inf | Inf |                                  |       | Force (N)                |\n    | 10  | constraint force - 3                                              | -Inf | Inf |                                  |       | Force (N)                |\n\n\n    There is physical contact between the robots and their environment - and Mujoco\n    attempts to simulate the possible physical contact dynamics realistically\n    by aiming for physical accuracy and computational efficiency.\n\n    There is one constraint force for contacts for each degree of freedom (3).\n    The approach and handling of constraints by Mujoco is unique to the simulator\n    and is based on their research. One can find more information in their\n    [*documentation*](https://mujoco.readthedocs.io/en/latest/computation.html)\n    or in their paper\n    [\"Analytically-invertible dynamics with contacts and constraints: Theory and implementation in MuJoCo\"](https://homes.cs.washington.edu/~todorov/papers/TodorovICRA14.pdf).\n\n\n    ### Rewards\n\n    The reward consists of three parts:\n    - *alive_bonus*: The goal is to make the second inverted pendulum stand upright\n    (within a certain angle limit) as long as possible - as such a reward of +10 is awarded\n    for each timestep that the second pole is upright.\n    - *distance_penalty*: This reward is a measure of how far the *tip* of the second pendulum\n    (the only free end) moves, and it is calculated as\n    *0.01 * x<sup>2</sup> + (y - 2)<sup>2</sup>*, where *x* is the x-coordinate of the tip\n    and *y* is the y-coordinate of the tip of the second pole.\n    - *velocity_penalty*: A negative reward for penalising the agent if it moves too\n    fast, calculated as *0.001 * v<sub>1</sub><sup>2</sup> + 0.005 * v<sub>2</sub><sup>2</sup>*.\n\n    The total reward returned is ***reward*** *=* *alive_bonus - distance_penalty - velocity_penalty*\n\n    ### Starting State\n    All observations start in state\n    (0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) with uniform noise in the range\n    of [-0.1, 0.1] added to the positional values (cart position and pole angles) and normally\n    distributed noise with a standard deviation of 0.1 added to the velocity values for stochasticity.\n\n    ### Episode End\n    The episode ends when any of the following happens:\n\n    1. Truncation: The episode duration reaches 1000 timesteps.\n    2. Termination: Any of the state space values is no longer finite.\n    3. Termination: The y_coordinate of the tip of the second pole *is less than or equal* to 1. (The maximum standing height of the system is 1.196 m when all the parts are perpendicularly vertical on top of each other.)\n\n    ### Arguments\n\n    No additional arguments are currently supported.\n\n    ```\n    env = gym.make('InvertedDoublePendulum-v4')\n    ```\n    There is no v3 for InvertedDoublePendulum, unlike the robot environments where a v3 and\n    beyond take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.\n\n\n    ### Version History\n\n    * v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3\n    * v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. 
rgb rendering comes from tracking camera (so agent does not run away from screen)\n    * v2: All continuous control environments now use mujoco_py >= 1.50\n    * v1: max_time_steps raised to 1000 for robot based tasks (including inverted pendulum)\n    * v0: Initial versions release (1.0.0)\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 20,\n    }\n\n    def __init__(self, **kwargs):\n        observation_space = Box(low=-np.inf, high=np.inf, shape=(11,), dtype=np.float64)\n        MujocoEnv.__init__(\n            self,\n            \"inverted_double_pendulum.xml\",\n            5,\n            observation_space=observation_space,\n            **kwargs\n        )\n        utils.EzPickle.__init__(self, **kwargs)\n\n    def step(self, action):\n        self.do_simulation(action, self.frame_skip)\n        ob = self._get_obs()\n        x, _, y = self.data.site_xpos[0]\n        dist_penalty = 0.01 * x**2 + (y - 2) ** 2\n        v1, v2 = self.data.qvel[1:3]\n        vel_penalty = 1e-3 * v1**2 + 5e-3 * v2**2\n        alive_bonus = 10\n        r = alive_bonus - dist_penalty - vel_penalty\n        terminated = bool(y <= 1)\n        if self.render_mode == \"human\":\n            self.render()\n        return ob, r, terminated, False, {}\n\n    def _get_obs(self):\n        return np.concatenate(\n            [\n                self.data.qpos[:1],  # cart x pos\n                np.sin(self.data.qpos[1:]),  # link angles\n                np.cos(self.data.qpos[1:]),\n                np.clip(self.data.qvel, -10, 10),\n                np.clip(self.data.qfrc_constraint, -10, 10),\n            ]\n        ).ravel()\n\n    def reset_model(self):\n        self.set_state(\n            self.init_qpos\n            + self.np_random.uniform(low=-0.1, high=0.1, size=self.model.nq),\n            self.init_qvel + self.np_random.standard_normal(self.model.nv) * 
0.1,\n        )\n        return self._get_obs()\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        v = self.viewer\n        v.cam.trackbodyid = 0\n        v.cam.distance = self.model.stat.extent * 0.5\n        v.cam.lookat[2] = 0.12250000000000005  # v.model.stat.center[2]\n"
  },
  {
    "path": "gym/envs/mujoco/inverted_pendulum.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MuJocoPyEnv\nfrom gym.spaces import Box\n\n\nclass InvertedPendulumEnv(MuJocoPyEnv, utils.EzPickle):\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 25,\n    }\n\n    def __init__(self, **kwargs):\n        utils.EzPickle.__init__(self, **kwargs)\n        observation_space = Box(low=-np.inf, high=np.inf, shape=(4,), dtype=np.float64)\n        MuJocoPyEnv.__init__(\n            self,\n            \"inverted_pendulum.xml\",\n            2,\n            observation_space=observation_space,\n            **kwargs\n        )\n\n    def step(self, a):\n        reward = 1.0\n        self.do_simulation(a, self.frame_skip)\n\n        ob = self._get_obs()\n        terminated = bool(not np.isfinite(ob).all() or (np.abs(ob[1]) > 0.2))\n\n        if self.render_mode == \"human\":\n            self.render()\n        return ob, reward, terminated, False, {}\n\n    def reset_model(self):\n        qpos = self.init_qpos + self.np_random.uniform(\n            size=self.model.nq, low=-0.01, high=0.01\n        )\n        qvel = self.init_qvel + self.np_random.uniform(\n            size=self.model.nv, low=-0.01, high=0.01\n        )\n        self.set_state(qpos, qvel)\n        return self._get_obs()\n\n    def _get_obs(self):\n        return np.concatenate([self.sim.data.qpos, self.sim.data.qvel]).ravel()\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        self.viewer.cam.trackbodyid = 0\n        self.viewer.cam.distance = self.model.stat.extent\n"
  },
  {
    "path": "gym/envs/mujoco/inverted_pendulum_v4.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MujocoEnv\nfrom gym.spaces import Box\n\n\nclass InvertedPendulumEnv(MujocoEnv, utils.EzPickle):\n    \"\"\"\n    ### Description\n\n    This environment is the cartpole environment based on the work done by\n    Barto, Sutton, and Anderson in [\"Neuronlike adaptive elements that can\n    solve difficult learning control problems\"](https://ieeexplore.ieee.org/document/6313077),\n    just like in the classic environments but now powered by the Mujoco physics simulator -\n    allowing for more complex experiments (such as varying the effects of gravity).\n    This environment involves a cart that can be moved linearly, with a pole fixed on it\n    at one end and having another end free. The cart can be pushed left or right, and the\n    goal is to balance the pole on the top of the cart by applying forces on the cart.\n\n    ### Action Space\n    The agent takes a 1-element vector for actions.\n\n    The action space is a continuous `(action)` in `[-3, 3]`, where `action` represents\n    the numerical force applied to the cart (with magnitude representing the amount of\n    force and sign representing the direction).\n\n    | Num | Action                    | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit      |\n    |-----|---------------------------|-------------|-------------|----------------------------------|-------|-----------|\n    | 0   | Force applied on the cart | -3          | 3           | slider                           | slide | Force (N) |\n\n    ### Observation Space\n\n    The state space consists of positional values of different body parts of\n    the pendulum system, followed by the velocities of those individual parts (their derivatives)\n    with all the positions ordered before all the velocities.\n\n    The observation is a `ndarray` with shape `(4,)` where the elements correspond to the following:\n\n    | Num | Observation       
                            | Min  | Max | Name (in corresponding XML file) | Joint | Unit                      |\n    | --- | --------------------------------------------- | ---- | --- | -------------------------------- | ----- | ------------------------- |\n    | 0   | position of the cart along the linear surface | -Inf | Inf | slider                           | slide | position (m)              |\n    | 1   | vertical angle of the pole on the cart        | -Inf | Inf | hinge                            | hinge | angle (rad)               |\n    | 2   | linear velocity of the cart                   | -Inf | Inf | slider                           | slide | velocity (m/s)            |\n    | 3   | angular velocity of the pole on the cart      | -Inf | Inf | hinge                            | hinge | angular velocity (rad/s)  |\n\n\n    ### Rewards\n\n    The goal is to make the inverted pendulum stand upright (within a certain angle limit)\n    as long as possible - as such a reward of +1 is awarded for each timestep that\n    the pole is upright.\n\n    ### Starting State\n    All observations start in state\n    (0.0, 0.0, 0.0, 0.0) with uniform noise in the range\n    of [-0.01, 0.01] added to the values for stochasticity.\n\n    ### Episode End\n    The episode ends when any of the following happens:\n\n    1. Truncation: The episode duration reaches 1000 timesteps.\n    2. Termination: Any of the state space values is no longer finite.\n    3. 
Termination: The absolute value of the vertical angle between the pole and the cart is greater than 0.2 radians.\n\n    ### Arguments\n\n    No additional arguments are currently supported.\n\n    ```\n    env = gym.make('InvertedPendulum-v4')\n    ```\n    There is no v3 for InvertedPendulum, unlike the robot environments where a\n    v3 and beyond take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.\n\n\n    ### Version History\n\n    * v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3\n    * v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)\n    * v2: All continuous control environments now use mujoco_py >= 1.50\n    * v1: max_time_steps raised to 1000 for robot based tasks (including inverted pendulum)\n    * v0: Initial versions release (1.0.0)\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 25,\n    }\n\n    def __init__(self, **kwargs):\n        utils.EzPickle.__init__(self, **kwargs)\n        observation_space = Box(low=-np.inf, high=np.inf, shape=(4,), dtype=np.float64)\n        MujocoEnv.__init__(\n            self,\n            \"inverted_pendulum.xml\",\n            2,\n            observation_space=observation_space,\n            **kwargs\n        )\n\n    def step(self, a):\n        reward = 1.0\n        self.do_simulation(a, self.frame_skip)\n        ob = self._get_obs()\n        terminated = bool(not np.isfinite(ob).all() or (np.abs(ob[1]) > 0.2))\n        if self.render_mode == \"human\":\n            self.render()\n        return ob, reward, terminated, False, {}\n\n    def reset_model(self):\n        qpos = self.init_qpos + self.np_random.uniform(\n            size=self.model.nq, low=-0.01, high=0.01\n        )\n        qvel = self.init_qvel + 
self.np_random.uniform(\n            size=self.model.nv, low=-0.01, high=0.01\n        )\n        self.set_state(qpos, qvel)\n        return self._get_obs()\n\n    def _get_obs(self):\n        return np.concatenate([self.data.qpos, self.data.qvel]).ravel()\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        v = self.viewer\n        v.cam.trackbodyid = 0\n        v.cam.distance = self.model.stat.extent\n"
  },
  {
    "path": "gym/envs/mujoco/mujoco_env.py",
    "content": "from os import path\nfrom typing import Optional, Union\n\nimport numpy as np\n\nimport gym\nfrom gym import error, logger, spaces\nfrom gym.spaces import Space\n\ntry:\n    import mujoco_py\nexcept ImportError as e:\n    MUJOCO_PY_IMPORT_ERROR = e\nelse:\n    MUJOCO_PY_IMPORT_ERROR = None\n\ntry:\n    import mujoco\nexcept ImportError as e:\n    MUJOCO_IMPORT_ERROR = e\nelse:\n    MUJOCO_IMPORT_ERROR = None\n\n\nDEFAULT_SIZE = 480\n\n\nclass BaseMujocoEnv(gym.Env):\n    \"\"\"Superclass for all MuJoCo environments.\"\"\"\n\n    def __init__(\n        self,\n        model_path,\n        frame_skip,\n        observation_space: Space,\n        render_mode: Optional[str] = None,\n        width: int = DEFAULT_SIZE,\n        height: int = DEFAULT_SIZE,\n        camera_id: Optional[int] = None,\n        camera_name: Optional[str] = None,\n    ):\n        if model_path.startswith(\"/\"):\n            self.fullpath = model_path\n        else:\n            self.fullpath = path.join(path.dirname(__file__), \"assets\", model_path)\n        if not path.exists(self.fullpath):\n            raise OSError(f\"File {self.fullpath} does not exist\")\n\n        self.width = width\n        self.height = height\n        self._initialize_simulation()  # may use width and height\n\n        self.init_qpos = self.data.qpos.ravel().copy()\n        self.init_qvel = self.data.qvel.ravel().copy()\n        self._viewers = {}\n\n        self.frame_skip = frame_skip\n\n        self.viewer = None\n\n        assert self.metadata[\"render_modes\"] == [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ], self.metadata[\"render_modes\"]\n        assert (\n            int(np.round(1.0 / self.dt)) == self.metadata[\"render_fps\"]\n        ), f'Expected value: {int(np.round(1.0 / self.dt))}, Actual value: {self.metadata[\"render_fps\"]}'\n\n        self.observation_space = observation_space\n        self._set_action_space()\n\n        
self.render_mode = render_mode\n        self.camera_name = camera_name\n        self.camera_id = camera_id\n\n    def _set_action_space(self):\n        bounds = self.model.actuator_ctrlrange.copy().astype(np.float32)\n        low, high = bounds.T\n        self.action_space = spaces.Box(low=low, high=high, dtype=np.float32)\n        return self.action_space\n\n    # methods to override:\n    # ----------------------------\n\n    def reset_model(self):\n        \"\"\"\n        Reset the robot degrees of freedom (qpos and qvel).\n        Implement this in each subclass.\n        \"\"\"\n        raise NotImplementedError\n\n    def viewer_setup(self):\n        \"\"\"\n        This method is called when the viewer is initialized.\n        Optionally implement this method, if you need to tinker with camera position and so forth.\n        \"\"\"\n\n    def _initialize_simulation(self):\n        \"\"\"\n        Initialize MuJoCo simulation data structures mjModel and mjData.\n        \"\"\"\n        raise NotImplementedError\n\n    def _reset_simulation(self):\n        \"\"\"\n        Reset MuJoCo simulation data structures, mjModel and mjData.\n        \"\"\"\n        raise NotImplementedError\n\n    def _step_mujoco_simulation(self, ctrl, n_frames):\n        \"\"\"\n        Step over the MuJoCo simulation.\n        \"\"\"\n        raise NotImplementedError\n\n    def render(self):\n        \"\"\"\n        Render a frame from the MuJoCo simulation as specified by the render_mode.\n        \"\"\"\n        raise NotImplementedError\n\n    # -----------------------------\n\n    def reset(\n        self,\n        *,\n        seed: Optional[int] = None,\n        options: Optional[dict] = None,\n    ):\n        super().reset(seed=seed)\n\n        self._reset_simulation()\n\n        ob = self.reset_model()\n        if self.render_mode == \"human\":\n            self.render()\n        return ob, {}\n\n    def set_state(self, qpos, qvel):\n        \"\"\"\n        Set the joints 
position qpos and velocity qvel of the model. Override this method depending on the MuJoCo bindings used.\n        \"\"\"\n        assert qpos.shape == (self.model.nq,) and qvel.shape == (self.model.nv,)\n\n    @property\n    def dt(self):\n        return self.model.opt.timestep * self.frame_skip\n\n    def do_simulation(self, ctrl, n_frames):\n        \"\"\"\n        Step the simulation for n_frames frames while applying the control action ctrl.\n        \"\"\"\n        # Check that the control input has the same shape as the action space\n        if np.array(ctrl).shape != self.action_space.shape:\n            raise ValueError(\"Action dimension mismatch\")\n        self._step_mujoco_simulation(ctrl, n_frames)\n\n    def close(self):\n        if self.viewer is not None:\n            self.viewer = None\n            self._viewers = {}\n\n    def get_body_com(self, body_name):\n        \"\"\"Return the Cartesian position of a body frame\"\"\"\n        raise NotImplementedError\n\n    def state_vector(self):\n        \"\"\"Return the position and velocity joint states of the model\"\"\"\n        return np.concatenate([self.data.qpos.flat, self.data.qvel.flat])\n\n\nclass MuJocoPyEnv(BaseMujocoEnv):\n    def __init__(\n        self,\n        model_path: str,\n        frame_skip: int,\n        observation_space: Space,\n        render_mode: Optional[str] = None,\n        width: int = DEFAULT_SIZE,\n        height: int = DEFAULT_SIZE,\n        camera_id: Optional[int] = None,\n        camera_name: Optional[str] = None,\n    ):\n        if MUJOCO_PY_IMPORT_ERROR is not None:\n            raise error.DependencyNotInstalled(\n                f\"{MUJOCO_PY_IMPORT_ERROR}. 
(HINT: you need to install mujoco_py, and also perform the setup instructions here: https://github.com/openai/mujoco-py/.)\"\n            )\n\n        logger.warn(\n            \"This version of the mujoco environments depends \"\n            \"on the mujoco-py bindings, which are no longer maintained \"\n            \"and may stop working. Please upgrade to the v4 versions of \"\n            \"the environments (which depend on the mujoco python bindings instead), unless \"\n            \"you are trying to precisely replicate previous works).\"\n        )\n\n        super().__init__(\n            model_path,\n            frame_skip,\n            observation_space,\n            render_mode,\n            width,\n            height,\n            camera_id,\n            camera_name,\n        )\n\n    def _initialize_simulation(self):\n        self.model = mujoco_py.load_model_from_path(self.fullpath)\n        self.sim = mujoco_py.MjSim(self.model)\n        self.data = self.sim.data\n\n    def _reset_simulation(self):\n        self.sim.reset()\n\n    def set_state(self, qpos, qvel):\n        super().set_state(qpos, qvel)\n        state = self.sim.get_state()\n        state = mujoco_py.MjSimState(state.time, qpos, qvel, state.act, state.udd_state)\n        self.sim.set_state(state)\n        self.sim.forward()\n\n    def _step_mujoco_simulation(self, ctrl, n_frames):\n        self.sim.data.ctrl[:] = ctrl\n\n        for _ in range(n_frames):\n            self.sim.step()\n\n    def render(self):\n        if self.render_mode is None:\n            gym.logger.warn(\n                \"You are calling render method without specifying any render mode. \"\n                \"You can specify the render_mode at initialization, \"\n                f'e.g. 
gym.make(\"{self.spec.id}\", render_mode=\"rgb_array\")'\n            )\n            return\n\n        width, height = self.width, self.height\n        camera_name, camera_id = self.camera_name, self.camera_id\n        if self.render_mode in {\"rgb_array\", \"depth_array\"}:\n            if camera_id is not None and camera_name is not None:\n                raise ValueError(\n                    \"Both `camera_id` and `camera_name` cannot be\"\n                    \" specified at the same time.\"\n                )\n\n            no_camera_specified = camera_name is None and camera_id is None\n            if no_camera_specified:\n                camera_name = \"track\"\n\n            if camera_id is None and camera_name in self.model._camera_name2id:\n                camera_id = self.model.camera_name2id(camera_name)\n\n                self._get_viewer(self.render_mode).render(\n                    width, height, camera_id=camera_id\n                )\n\n        if self.render_mode == \"rgb_array\":\n            data = self._get_viewer(self.render_mode).read_pixels(\n                width, height, depth=False\n            )\n            # original image is upside-down, so flip it\n            return data[::-1, :, :]\n        elif self.render_mode == \"depth_array\":\n            self._get_viewer(self.render_mode).render(width, height)\n            # Extract depth part of the read_pixels() tuple\n            data = self._get_viewer(self.render_mode).read_pixels(\n                width, height, depth=True\n            )[1]\n            # original image is upside-down, so flip it\n            return data[::-1, :]\n        elif self.render_mode == \"human\":\n            self._get_viewer(self.render_mode).render()\n\n    def _get_viewer(\n        self, mode\n    ) -> Union[\"mujoco_py.MjViewer\", \"mujoco_py.MjRenderContextOffscreen\"]:\n        self.viewer = self._viewers.get(mode)\n        if self.viewer 
is None:\n            if mode == \"human\":\n                self.viewer = mujoco_py.MjViewer(self.sim)\n\n            elif mode in {\"rgb_array\", \"depth_array\"}:\n                self.viewer = mujoco_py.MjRenderContextOffscreen(self.sim, -1)\n            else:\n                raise AttributeError(\n                    f\"Unknown mode: {mode}, expected modes: {self.metadata['render_modes']}\"\n                )\n\n            self.viewer_setup()\n            self._viewers[mode] = self.viewer\n\n        return self.viewer\n\n    def get_body_com(self, body_name):\n        return self.data.get_body_xpos(body_name)\n\n\nclass MujocoEnv(BaseMujocoEnv):\n    \"\"\"Superclass for MuJoCo environments.\"\"\"\n\n    def __init__(\n        self,\n        model_path,\n        frame_skip,\n        observation_space: Space,\n        render_mode: Optional[str] = None,\n        width: int = DEFAULT_SIZE,\n        height: int = DEFAULT_SIZE,\n        camera_id: Optional[int] = None,\n        camera_name: Optional[str] = None,\n    ):\n        if MUJOCO_IMPORT_ERROR is not None:\n            raise error.DependencyNotInstalled(\n                f\"{MUJOCO_IMPORT_ERROR}. 
(HINT: you need to install mujoco)\"\n            )\n        super().__init__(\n            model_path,\n            frame_skip,\n            observation_space,\n            render_mode,\n            width,\n            height,\n            camera_id,\n            camera_name,\n        )\n\n    def _initialize_simulation(self):\n        self.model = mujoco.MjModel.from_xml_path(self.fullpath)\n        # MjrContext will copy model.vis.global_.off* to con.off*\n        self.model.vis.global_.offwidth = self.width\n        self.model.vis.global_.offheight = self.height\n        self.data = mujoco.MjData(self.model)\n\n    def _reset_simulation(self):\n        mujoco.mj_resetData(self.model, self.data)\n\n    def set_state(self, qpos, qvel):\n        super().set_state(qpos, qvel)\n        self.data.qpos[:] = np.copy(qpos)\n        self.data.qvel[:] = np.copy(qvel)\n        if self.model.na == 0:\n            self.data.act[:] = None\n        mujoco.mj_forward(self.model, self.data)\n\n    def _step_mujoco_simulation(self, ctrl, n_frames):\n        self.data.ctrl[:] = ctrl\n\n        # Step by the requested number of frames rather than always by self.frame_skip\n        mujoco.mj_step(self.model, self.data, nstep=n_frames)\n\n        # As of MuJoCo 2.0, force-related quantities like cacc are not computed\n        # unless there's a force sensor in the model.\n        # See https://github.com/openai/gym/issues/1541\n        mujoco.mj_rnePostConstraint(self.model, self.data)\n\n    def render(self):\n        if self.render_mode is None:\n            gym.logger.warn(\n                \"You are calling render method without specifying any render mode. \"\n                \"You can specify the render_mode at initialization, \"\n                f'e.g. 
gym.make(\"{self.spec.id}\", render_mode=\"rgb_array\")'\n            )\n            return\n\n        if self.render_mode in {\n            \"rgb_array\",\n            \"depth_array\",\n        }:\n            camera_id = self.camera_id\n            camera_name = self.camera_name\n\n            if camera_id is not None and camera_name is not None:\n                raise ValueError(\n                    \"Both `camera_id` and `camera_name` cannot be\"\n                    \" specified at the same time.\"\n                )\n\n            no_camera_specified = camera_name is None and camera_id is None\n            if no_camera_specified:\n                camera_name = \"track\"\n\n            if camera_id is None:\n                camera_id = mujoco.mj_name2id(\n                    self.model,\n                    mujoco.mjtObj.mjOBJ_CAMERA,\n                    camera_name,\n                )\n\n                self._get_viewer(self.render_mode).render(camera_id=camera_id)\n\n        if self.render_mode == \"rgb_array\":\n            data = self._get_viewer(self.render_mode).read_pixels(depth=False)\n            # original image is upside-down, so flip it\n            return data[::-1, :, :]\n        elif self.render_mode == \"depth_array\":\n            self._get_viewer(self.render_mode).render()\n            # Extract depth part of the read_pixels() tuple\n            data = self._get_viewer(self.render_mode).read_pixels(depth=True)[1]\n            # original image is upside-down, so flip it\n            return data[::-1, :]\n        elif self.render_mode == \"human\":\n            self._get_viewer(self.render_mode).render()\n\n    def close(self):\n        if self.viewer is not None:\n            self.viewer.close()\n        super().close()\n\n    def _get_viewer(\n        self, mode\n    ) -> Union[\n        \"gym.envs.mujoco.mujoco_rendering.Viewer\",\n        \"gym.envs.mujoco.mujoco_rendering.RenderContextOffscreen\",\n    ]:\n        self.viewer = 
self._viewers.get(mode)\n        if self.viewer is None:\n            if mode == \"human\":\n                from gym.envs.mujoco.mujoco_rendering import Viewer\n\n                self.viewer = Viewer(self.model, self.data)\n            elif mode in {\"rgb_array\", \"depth_array\"}:\n                from gym.envs.mujoco.mujoco_rendering import RenderContextOffscreen\n\n                self.viewer = RenderContextOffscreen(self.model, self.data)\n            else:\n                raise AttributeError(\n                    f\"Unexpected mode: {mode}, expected modes: {self.metadata['render_modes']}\"\n                )\n\n            self.viewer_setup()\n            self._viewers[mode] = self.viewer\n        return self.viewer\n\n    def get_body_com(self, body_name):\n        return self.data.body(body_name).xpos\n"
  },
  {
    "path": "gym/envs/mujoco/mujoco_rendering.py",
    "content": "import collections\nimport os\nimport time\nfrom threading import Lock\n\nimport glfw\nimport imageio\nimport mujoco\nimport numpy as np\n\n\ndef _import_egl(width, height):\n    from mujoco.egl import GLContext\n\n    return GLContext(width, height)\n\n\ndef _import_glfw(width, height):\n    from mujoco.glfw import GLContext\n\n    return GLContext(width, height)\n\n\ndef _import_osmesa(width, height):\n    from mujoco.osmesa import GLContext\n\n    return GLContext(width, height)\n\n\n_ALL_RENDERERS = collections.OrderedDict(\n    [\n        (\"glfw\", _import_glfw),\n        (\"egl\", _import_egl),\n        (\"osmesa\", _import_osmesa),\n    ]\n)\n\n\nclass RenderContext:\n    \"\"\"Render context superclass for offscreen and window rendering.\"\"\"\n\n    def __init__(self, model, data, offscreen=True):\n\n        self.model = model\n        self.data = data\n        self.offscreen = offscreen\n        self.offwidth = model.vis.global_.offwidth\n        self.offheight = model.vis.global_.offheight\n        max_geom = 1000\n\n        mujoco.mj_forward(self.model, self.data)\n\n        self.scn = mujoco.MjvScene(self.model, max_geom)\n        self.cam = mujoco.MjvCamera()\n        self.vopt = mujoco.MjvOption()\n        self.pert = mujoco.MjvPerturb()\n        self.con = mujoco.MjrContext(self.model, mujoco.mjtFontScale.mjFONTSCALE_150)\n\n        self._markers = []\n        self._overlays = {}\n\n        self._init_camera()\n        self._set_mujoco_buffers()\n\n    def _set_mujoco_buffers(self):\n        if self.offscreen:\n            mujoco.mjr_setBuffer(mujoco.mjtFramebuffer.mjFB_OFFSCREEN, self.con)\n            if self.con.currentBuffer != mujoco.mjtFramebuffer.mjFB_OFFSCREEN:\n                raise RuntimeError(\"Offscreen rendering not supported\")\n        else:\n            mujoco.mjr_setBuffer(mujoco.mjtFramebuffer.mjFB_WINDOW, self.con)\n            if self.con.currentBuffer != mujoco.mjtFramebuffer.mjFB_WINDOW:\n                raise 
RuntimeError(\"Window rendering not supported\")\n\n    def render(self, camera_id=None, segmentation=False):\n        width, height = self.offwidth, self.offheight\n        rect = mujoco.MjrRect(left=0, bottom=0, width=width, height=height)\n\n        if camera_id is not None:\n            if camera_id == -1:\n                self.cam.type = mujoco.mjtCamera.mjCAMERA_FREE\n            else:\n                self.cam.type = mujoco.mjtCamera.mjCAMERA_FIXED\n            self.cam.fixedcamid = camera_id\n\n        mujoco.mjv_updateScene(\n            self.model,\n            self.data,\n            self.vopt,\n            self.pert,\n            self.cam,\n            mujoco.mjtCatBit.mjCAT_ALL,\n            self.scn,\n        )\n\n        if segmentation:\n            self.scn.flags[mujoco.mjtRndFlag.mjRND_SEGMENT] = 1\n            self.scn.flags[mujoco.mjtRndFlag.mjRND_IDCOLOR] = 1\n\n        for marker_params in self._markers:\n            self._add_marker_to_scene(marker_params)\n\n        mujoco.mjr_render(rect, self.scn, self.con)\n\n        for gridpos, (text1, text2) in self._overlays.items():\n            mujoco.mjr_overlay(\n                mujoco.mjtFontScale.mjFONTSCALE_150,\n                gridpos,\n                rect,\n                text1.encode(),\n                text2.encode(),\n                self.con,\n            )\n\n        if segmentation:\n            self.scn.flags[mujoco.mjtRndFlag.mjRND_SEGMENT] = 0\n            self.scn.flags[mujoco.mjtRndFlag.mjRND_IDCOLOR] = 0\n\n    def read_pixels(self, depth=True, segmentation=False):\n        width, height = self.offwidth, self.offheight\n        rect = mujoco.MjrRect(left=0, bottom=0, width=width, height=height)\n\n        rgb_arr = np.zeros(3 * rect.width * rect.height, dtype=np.uint8)\n        depth_arr = np.zeros(rect.width * rect.height, dtype=np.float32)\n\n        mujoco.mjr_readPixels(rgb_arr, depth_arr, rect, self.con)\n        rgb_img = rgb_arr.reshape(rect.height, rect.width, 3)\n\n    
    ret_img = rgb_img\n        if segmentation:\n            seg_img = (\n                rgb_img[:, :, 0]\n                + rgb_img[:, :, 1] * (2**8)\n                + rgb_img[:, :, 2] * (2**16)\n            )\n            seg_img[seg_img >= (self.scn.ngeom + 1)] = 0\n            seg_ids = np.full((self.scn.ngeom + 1, 2), fill_value=-1, dtype=np.int32)\n\n            for i in range(self.scn.ngeom):\n                geom = self.scn.geoms[i]\n                if geom.segid != -1:\n                    seg_ids[geom.segid + 1, 0] = geom.objtype\n                    seg_ids[geom.segid + 1, 1] = geom.objid\n            ret_img = seg_ids[seg_img]\n\n        if depth:\n            depth_img = depth_arr.reshape(rect.height, rect.width)\n            return (ret_img, depth_img)\n        else:\n            return ret_img\n\n    def _init_camera(self):\n        self.cam.type = mujoco.mjtCamera.mjCAMERA_FREE\n        self.cam.fixedcamid = -1\n        for i in range(3):\n            self.cam.lookat[i] = np.median(self.data.geom_xpos[:, i])\n        self.cam.distance = self.model.stat.extent\n\n    def add_overlay(self, gridpos: int, text1: str, text2: str):\n        \"\"\"Overlays text on the scene.\"\"\"\n        if gridpos not in self._overlays:\n            self._overlays[gridpos] = [\"\", \"\"]\n        self._overlays[gridpos][0] += text1 + \"\\n\"\n        self._overlays[gridpos][1] += text2 + \"\\n\"\n\n    def add_marker(self, **marker_params):\n        self._markers.append(marker_params)\n\n    def _add_marker_to_scene(self, marker):\n        if self.scn.ngeom >= self.scn.maxgeom:\n            raise RuntimeError(\"Ran out of geoms. 
maxgeom: %d\" % self.scn.maxgeom)\n\n        g = self.scn.geoms[self.scn.ngeom]\n        # default values.\n        g.dataid = -1\n        g.objtype = mujoco.mjtObj.mjOBJ_UNKNOWN\n        g.objid = -1\n        g.category = mujoco.mjtCatBit.mjCAT_DECOR\n        g.texid = -1\n        g.texuniform = 0\n        g.texrepeat[0] = 1\n        g.texrepeat[1] = 1\n        g.emission = 0\n        g.specular = 0.5\n        g.shininess = 0.5\n        g.reflectance = 0\n        g.type = mujoco.mjtGeom.mjGEOM_BOX\n        g.size[:] = np.ones(3) * 0.1\n        g.mat[:] = np.eye(3)\n        g.rgba[:] = np.ones(4)\n\n        for key, value in marker.items():\n            if isinstance(value, (int, float, mujoco._enums.mjtGeom)):\n                setattr(g, key, value)\n            elif isinstance(value, (tuple, list, np.ndarray)):\n                attr = getattr(g, key)\n                attr[:] = np.asarray(value).reshape(attr.shape)\n            elif isinstance(value, str):\n                assert key == \"label\", \"Only label is a string in mjtGeom.\"\n                if value is None:\n                    g.label[0] = 0\n                else:\n                    g.label = value\n            elif hasattr(g, key):\n                raise ValueError(\n                    \"mjtGeom has attr {} but type {} is invalid\".format(\n                        key, type(value)\n                    )\n                )\n            else:\n                raise ValueError(\"mjtGeom doesn't have field %s\" % key)\n\n        self.scn.ngeom += 1\n\n    def close(self):\n        \"\"\"Override close in your rendering subclass to perform any necessary cleanup\n        after env.close() is called.\n        \"\"\"\n        pass\n\n\nclass RenderContextOffscreen(RenderContext):\n    \"\"\"Offscreen rendering class with opengl context.\"\"\"\n\n    def __init__(self, model, data):\n        # We must make GLContext before MjrContext\n        width = model.vis.global_.offwidth\n        height = 
model.vis.global_.offheight\n        self._get_opengl_backend(width, height)\n        self.opengl_context.make_current()\n\n        super().__init__(model, data, offscreen=True)\n\n    def _get_opengl_backend(self, width, height):\n\n        backend = os.environ.get(\"MUJOCO_GL\")\n        if backend is not None:\n            try:\n                self.opengl_context = _ALL_RENDERERS[backend](width, height)\n            except KeyError:\n                raise RuntimeError(\n                    \"Environment variable {} must be one of {!r}: got {!r}.\".format(\n                        \"MUJOCO_GL\", _ALL_RENDERERS.keys(), backend\n                    )\n                )\n\n        else:\n            for name, _ in _ALL_RENDERERS.items():\n                try:\n                    self.opengl_context = _ALL_RENDERERS[name](width, height)\n                    backend = name\n                    break\n                except:  # noqa:E722\n                    pass\n            if backend is None:\n                raise RuntimeError(\n                    \"No OpenGL backend could be imported. 
Attempting to create a \"\n                    \"rendering context will result in a RuntimeError.\"\n                )\n\n\nclass Viewer(RenderContext):\n    \"\"\"Class for window rendering in all MuJoCo environments.\"\"\"\n\n    def __init__(self, model, data):\n        self._gui_lock = Lock()\n        self._button_left_pressed = False\n        self._button_right_pressed = False\n        self._last_mouse_x = 0\n        self._last_mouse_y = 0\n        self._paused = False\n        self._transparent = False\n        self._contacts = False\n        self._render_every_frame = True\n        self._image_idx = 0\n        self._image_path = \"/tmp/frame_%07d.png\"\n        self._time_per_render = 1 / 60.0\n        self._run_speed = 1.0\n        self._loop_count = 0\n        self._advance_by_one_step = False\n        self._hide_menu = False\n\n        # glfw init\n        glfw.init()\n        width, height = glfw.get_video_mode(glfw.get_primary_monitor()).size\n        self.window = glfw.create_window(width // 2, height // 2, \"mujoco\", None, None)\n        glfw.make_context_current(self.window)\n        glfw.swap_interval(1)\n\n        framebuffer_width, framebuffer_height = glfw.get_framebuffer_size(self.window)\n        window_width, _ = glfw.get_window_size(self.window)\n        self._scale = framebuffer_width * 1.0 / window_width\n\n        # set callbacks\n        glfw.set_cursor_pos_callback(self.window, self._cursor_pos_callback)\n        glfw.set_mouse_button_callback(self.window, self._mouse_button_callback)\n        glfw.set_scroll_callback(self.window, self._scroll_callback)\n        glfw.set_key_callback(self.window, self._key_callback)\n\n        # get viewport\n        self.viewport = mujoco.MjrRect(0, 0, framebuffer_width, framebuffer_height)\n\n        super().__init__(model, data, offscreen=False)\n\n    def _key_callback(self, window, key, scancode, action, mods):\n        if action != glfw.RELEASE:\n            return\n        # Switch cameras\n      
  elif key == glfw.KEY_TAB:\n            self.cam.fixedcamid += 1\n            self.cam.type = mujoco.mjtCamera.mjCAMERA_FIXED\n            if self.cam.fixedcamid >= self.model.ncam:\n                self.cam.fixedcamid = -1\n                self.cam.type = mujoco.mjtCamera.mjCAMERA_FREE\n        # Pause simulation\n        elif key == glfw.KEY_SPACE and self._paused is not None:\n            self._paused = not self._paused\n        # Advances simulation by one step.\n        elif key == glfw.KEY_RIGHT and self._paused is not None:\n            self._advance_by_one_step = True\n            self._paused = True\n        # Slows down simulation\n        elif key == glfw.KEY_S:\n            self._run_speed /= 2.0\n        # Speeds up simulation\n        elif key == glfw.KEY_F:\n            self._run_speed *= 2.0\n        # Turn off / turn on rendering every frame.\n        elif key == glfw.KEY_D:\n            self._render_every_frame = not self._render_every_frame\n        # Capture screenshot\n        elif key == glfw.KEY_T:\n            img = np.zeros(\n                (\n                    glfw.get_framebuffer_size(self.window)[1],\n                    glfw.get_framebuffer_size(self.window)[0],\n                    3,\n                ),\n                dtype=np.uint8,\n            )\n            mujoco.mjr_readPixels(img, None, self.viewport, self.con)\n            imageio.imwrite(self._image_path % self._image_idx, np.flipud(img))\n            self._image_idx += 1\n        # Display contact forces\n        elif key == glfw.KEY_C:\n            self._contacts = not self._contacts\n            self.vopt.flags[mujoco.mjtVisFlag.mjVIS_CONTACTPOINT] = self._contacts\n            self.vopt.flags[mujoco.mjtVisFlag.mjVIS_CONTACTFORCE] = self._contacts\n        # Display coordinate frames\n        elif key == glfw.KEY_E:\n            self.vopt.frame = 1 - self.vopt.frame\n        # Hide overlay menu\n        elif key == glfw.KEY_H:\n            self._hide_menu = not 
self._hide_menu\n        # Make transparent\n        elif key == glfw.KEY_R:\n            self._transparent = not self._transparent\n            if self._transparent:\n                self.model.geom_rgba[:, 3] /= 5.0\n            else:\n                self.model.geom_rgba[:, 3] *= 5.0\n        # Geom group visibility\n        elif key in (glfw.KEY_0, glfw.KEY_1, glfw.KEY_2, glfw.KEY_3, glfw.KEY_4):\n            self.vopt.geomgroup[key - glfw.KEY_0] ^= 1\n        # Quit\n        if key == glfw.KEY_ESCAPE:\n            print(\"Pressed ESC\")\n            print(\"Quitting.\")\n            glfw.destroy_window(self.window)\n            glfw.terminate()\n\n    def _cursor_pos_callback(self, window, xpos, ypos):\n        if not (self._button_left_pressed or self._button_right_pressed):\n            return\n\n        mod_shift = (\n            glfw.get_key(window, glfw.KEY_LEFT_SHIFT) == glfw.PRESS\n            or glfw.get_key(window, glfw.KEY_RIGHT_SHIFT) == glfw.PRESS\n        )\n        if self._button_right_pressed:\n            action = (\n                mujoco.mjtMouse.mjMOUSE_MOVE_H\n                if mod_shift\n                else mujoco.mjtMouse.mjMOUSE_MOVE_V\n            )\n        elif self._button_left_pressed:\n            action = (\n                mujoco.mjtMouse.mjMOUSE_ROTATE_H\n                if mod_shift\n                else mujoco.mjtMouse.mjMOUSE_ROTATE_V\n            )\n        else:\n            action = mujoco.mjtMouse.mjMOUSE_ZOOM\n\n        dx = int(self._scale * xpos) - self._last_mouse_x\n        dy = int(self._scale * ypos) - self._last_mouse_y\n        width, height = glfw.get_framebuffer_size(window)\n\n        with self._gui_lock:\n            mujoco.mjv_moveCamera(\n                self.model, action, dx / height, dy / height, self.scn, self.cam\n            )\n\n        self._last_mouse_x = int(self._scale * xpos)\n        self._last_mouse_y = int(self._scale * ypos)\n\n    def _mouse_button_callback(self, window, button, act, 
mods):\n        self._button_left_pressed = (\n            glfw.get_mouse_button(window, glfw.MOUSE_BUTTON_LEFT) == glfw.PRESS\n        )\n        self._button_right_pressed = (\n            glfw.get_mouse_button(window, glfw.MOUSE_BUTTON_RIGHT) == glfw.PRESS\n        )\n\n        x, y = glfw.get_cursor_pos(window)\n        self._last_mouse_x = int(self._scale * x)\n        self._last_mouse_y = int(self._scale * y)\n\n    def _scroll_callback(self, window, x_offset, y_offset):\n        with self._gui_lock:\n            mujoco.mjv_moveCamera(\n                self.model,\n                mujoco.mjtMouse.mjMOUSE_ZOOM,\n                0,\n                -0.05 * y_offset,\n                self.scn,\n                self.cam,\n            )\n\n    def _create_overlay(self):\n        topleft = mujoco.mjtGridPos.mjGRID_TOPLEFT\n        bottomleft = mujoco.mjtGridPos.mjGRID_BOTTOMLEFT\n\n        if self._render_every_frame:\n            self.add_overlay(topleft, \"\", \"\")\n        else:\n            self.add_overlay(\n                topleft,\n                \"Run speed = %.3f x real time\" % self._run_speed,\n                \"[S]lower, [F]aster\",\n            )\n        self.add_overlay(\n            topleft, \"Ren[d]er every frame\", \"On\" if self._render_every_frame else \"Off\"\n        )\n        self.add_overlay(\n            topleft,\n            \"Switch camera (#cams = %d)\" % (self.model.ncam + 1),\n            \"[Tab] (camera ID = %d)\" % self.cam.fixedcamid,\n        )\n        self.add_overlay(topleft, \"[C]ontact forces\", \"On\" if self._contacts else \"Off\")\n        self.add_overlay(topleft, \"T[r]ansparent\", \"On\" if self._transparent else \"Off\")\n        if self._paused is not None:\n            if not self._paused:\n                self.add_overlay(topleft, \"Stop\", \"[Space]\")\n            else:\n                self.add_overlay(topleft, \"Start\", \"[Space]\")\n                self.add_overlay(\n                    topleft, \"Advance 
simulation by one step\", \"[right arrow]\"\n                )\n        self.add_overlay(\n            topleft, \"Referenc[e] frames\", \"On\" if self.vopt.frame == 1 else \"Off\"\n        )\n        self.add_overlay(topleft, \"[H]ide Menu\", \"\")\n        if self._image_idx > 0:\n            fname = self._image_path % (self._image_idx - 1)\n            self.add_overlay(topleft, \"Cap[t]ure frame\", \"Saved as %s\" % fname)\n        else:\n            self.add_overlay(topleft, \"Cap[t]ure frame\", \"\")\n        self.add_overlay(topleft, \"Toggle geomgroup visibility\", \"0-4\")\n\n        self.add_overlay(bottomleft, \"FPS\", \"%d%s\" % (1 / self._time_per_render, \"\"))\n        self.add_overlay(\n            bottomleft, \"Solver iterations\", str(self.data.solver_iter + 1)\n        )\n        self.add_overlay(\n            bottomleft, \"Step\", str(round(self.data.time / self.model.opt.timestep))\n        )\n        self.add_overlay(bottomleft, \"timestep\", \"%.5f\" % self.model.opt.timestep)\n\n    def render(self):\n        # mjv_updateScene, mjr_render, mjr_overlay\n        def update():\n            # fill overlay items\n            self._create_overlay()\n\n            render_start = time.time()\n            if self.window is None:\n                return\n            elif glfw.window_should_close(self.window):\n                glfw.destroy_window(self.window)\n                glfw.terminate()\n            self.viewport.width, self.viewport.height = glfw.get_framebuffer_size(\n                self.window\n            )\n            with self._gui_lock:\n                # update scene\n                mujoco.mjv_updateScene(\n                    self.model,\n                    self.data,\n                    self.vopt,\n                    mujoco.MjvPerturb(),\n                    self.cam,\n                    mujoco.mjtCatBit.mjCAT_ALL.value,\n                    self.scn,\n                )\n                # marker items\n                for marker in 
self._markers:\n                    self._add_marker_to_scene(marker)\n                # render\n                mujoco.mjr_render(self.viewport, self.scn, self.con)\n                # overlay items\n                if not self._hide_menu:\n                    for gridpos, [t1, t2] in self._overlays.items():\n                        mujoco.mjr_overlay(\n                            mujoco.mjtFontScale.mjFONTSCALE_150,\n                            gridpos,\n                            self.viewport,\n                            t1,\n                            t2,\n                            self.con,\n                        )\n                glfw.swap_buffers(self.window)\n            glfw.poll_events()\n            self._time_per_render = 0.9 * self._time_per_render + 0.1 * (\n                time.time() - render_start\n            )\n\n            # clear overlay\n            self._overlays.clear()\n\n        if self._paused:\n            while self._paused:\n                update()\n                if self._advance_by_one_step:\n                    self._advance_by_one_step = False\n                    break\n        else:\n            self._loop_count += self.model.opt.timestep / (\n                self._time_per_render * self._run_speed\n            )\n            if self._render_every_frame:\n                self._loop_count = 1\n            while self._loop_count > 0:\n                update()\n                self._loop_count -= 1\n\n        # clear markers\n        self._markers[:] = []\n\n    def close(self):\n        glfw.destroy_window(self.window)\n        glfw.terminate()\n"
  },
  {
    "path": "gym/envs/mujoco/pusher.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MuJocoPyEnv\nfrom gym.spaces import Box\n\n\nclass PusherEnv(MuJocoPyEnv, utils.EzPickle):\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 20,\n    }\n\n    def __init__(self, **kwargs):\n        utils.EzPickle.__init__(self, **kwargs)\n        observation_space = Box(low=-np.inf, high=np.inf, shape=(23,), dtype=np.float64)\n        MuJocoPyEnv.__init__(\n            self, \"pusher.xml\", 5, observation_space=observation_space, **kwargs\n        )\n\n    def step(self, a):\n        vec_1 = self.get_body_com(\"object\") - self.get_body_com(\"tips_arm\")\n        vec_2 = self.get_body_com(\"object\") - self.get_body_com(\"goal\")\n\n        reward_near = -np.linalg.norm(vec_1)\n        reward_dist = -np.linalg.norm(vec_2)\n        reward_ctrl = -np.square(a).sum()\n        reward = reward_dist + 0.1 * reward_ctrl + 0.5 * reward_near\n\n        self.do_simulation(a, self.frame_skip)\n        if self.render_mode == \"human\":\n            self.render()\n\n        ob = self._get_obs()\n        return (\n            ob,\n            reward,\n            False,\n            False,\n            dict(reward_dist=reward_dist, reward_ctrl=reward_ctrl),\n        )\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        self.viewer.cam.trackbodyid = -1\n        self.viewer.cam.distance = 4.0\n\n    def reset_model(self):\n        qpos = self.init_qpos\n\n        self.goal_pos = np.asarray([0, 0])\n        while True:\n            self.cylinder_pos = np.concatenate(\n                [\n                    self.np_random.uniform(low=-0.3, high=0, size=1),\n                    self.np_random.uniform(low=-0.2, high=0.2, size=1),\n                ]\n            )\n            if np.linalg.norm(self.cylinder_pos - self.goal_pos) > 0.17:\n                
break\n\n        qpos[-4:-2] = self.cylinder_pos\n        qpos[-2:] = self.goal_pos\n        qvel = self.init_qvel + self.np_random.uniform(\n            low=-0.005, high=0.005, size=self.model.nv\n        )\n        qvel[-4:] = 0\n        self.set_state(qpos, qvel)\n        return self._get_obs()\n\n    def _get_obs(self):\n        return np.concatenate(\n            [\n                self.sim.data.qpos.flat[:7],\n                self.sim.data.qvel.flat[:7],\n                self.get_body_com(\"tips_arm\"),\n                self.get_body_com(\"object\"),\n                self.get_body_com(\"goal\"),\n            ]\n        )\n"
  },
  {
    "path": "gym/envs/mujoco/pusher_v4.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MujocoEnv\nfrom gym.spaces import Box\n\n\nclass PusherEnv(MujocoEnv, utils.EzPickle):\n    \"\"\"\n    ### Description\n    \"Pusher\" is a multi-jointed robot arm which is very similar to that of a human.\n     The goal is to move a target cylinder (called *object*) to a goal position using the robot's end effector (called *fingertip*).\n      The robot consists of shoulder, elbow, forearm, and wrist joints.\n\n    ### Action Space\n    The action space is a `Box(-2, 2, (7,), float32)`. An action `(a, b)` represents the torques applied at the hinge joints.\n\n    | Num | Action                                                             | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit         |\n    |-----|--------------------------------------------------------------------|-------------|-------------|----------------------------------|-------|--------------|\n    | 0    | Rotation of the panning the shoulder                              | -2          | 2           | r_shoulder_pan_joint             | hinge | torque (N m) |\n    | 1    | Rotation of the shoulder lifting joint                            | -2          | 2           | r_shoulder_lift_joint            | hinge | torque (N m) |\n    | 2    | Rotation of the shoulder rolling joint                            | -2          | 2           | r_upper_arm_roll_joint           | hinge | torque (N m) |\n    | 3    | Rotation of hinge joint that flexed the elbow                     | -2          | 2           | r_elbow_flex_joint               | hinge | torque (N m) |\n    | 4    | Rotation of hinge that rolls the forearm                          | -2          | 2           | r_forearm_roll_joint             | hinge | torque (N m) |\n    | 5    | Rotation of flexing the wrist                                     | -2          | 2           | r_wrist_flex_joint               | hinge | torque (N m) 
|\n    | 6    | Rotation of rolling the wrist                                     | -2          | 2           | r_wrist_roll_joint               | hinge | torque (N m) |\n\n    ### Observation Space\n\n    Observations consist of\n\n    - Angle of rotational joints on the pusher\n    - Angular velocities of rotational joints on the pusher\n    - The coordinates of the fingertip of the pusher\n    - The coordinates of the object to be moved\n    - The coordinates of the goal position\n\n    The observation is a `ndarray` with shape `(23,)` where the elements correspond to the table below.\n    An analogy can be drawn to a human arm in order to help understand the state space, with the words flex and roll meaning the\n    same as human joints.\n\n    | Num | Observation                                              | Min  | Max | Name (in corresponding XML file) | Joint    | Unit                     |\n    | --- | -------------------------------------------------------- | ---- | --- | -------------------------------- | -------- | ------------------------ |\n    | 0   | Rotation of the panning the shoulder                     | -Inf | Inf | r_shoulder_pan_joint             | hinge    | angle (rad)              |\n    | 1   | Rotation of the shoulder lifting joint                   | -Inf | Inf | r_shoulder_lift_joint            | hinge    | angle (rad)              |\n    | 2   | Rotation of the shoulder rolling joint                   | -Inf | Inf | r_upper_arm_roll_joint           | hinge    | angle (rad)              |\n    | 3   | Rotation of hinge joint that flexed the elbow            | -Inf | Inf | r_elbow_flex_joint               | hinge    | angle (rad)              |\n    | 4   | Rotation of hinge that rolls the forearm                 | -Inf | Inf | r_forearm_roll_joint             | hinge    | angle (rad)              |\n    | 5   | Rotation of flexing the wrist                            | -Inf | Inf | r_wrist_flex_joint               | hinge    | angle 
(rad)              |\n    | 6   | Rotation of rolling the wrist                            | -Inf | Inf | r_wrist_roll_joint               | hinge    | angle (rad)              |\n    | 7   | Rotational velocity of the panning the shoulder          | -Inf | Inf | r_shoulder_pan_joint             | hinge    | angular velocity (rad/s) |\n    | 8   | Rotational velocity of the shoulder lifting joint        | -Inf | Inf | r_shoulder_lift_joint            | hinge    | angular velocity (rad/s) |\n    | 9   | Rotational velocity of the shoulder rolling joint        | -Inf | Inf | r_upper_arm_roll_joint           | hinge    | angular velocity (rad/s) |\n    | 10  | Rotational velocity of hinge joint that flexed the elbow | -Inf | Inf | r_elbow_flex_joint               | hinge    | angular velocity (rad/s) |\n    | 11  | Rotational velocity of hinge that rolls the forearm      | -Inf | Inf | r_forearm_roll_joint             | hinge    | angular velocity (rad/s) |\n    | 12  | Rotational velocity of flexing the wrist                 | -Inf | Inf | r_wrist_flex_joint               | hinge    | angular velocity (rad/s) |\n    | 13  | Rotational velocity of rolling the wrist                 | -Inf | Inf | r_wrist_roll_joint               | hinge    | angular velocity (rad/s) |\n    | 14  | x-coordinate of the fingertip of the pusher              | -Inf | Inf | tips_arm                         | slide    | position (m)             |\n    | 15  | y-coordinate of the fingertip of the pusher              | -Inf | Inf | tips_arm                         | slide    | position (m)             |\n    | 16  | z-coordinate of the fingertip of the pusher              | -Inf | Inf | tips_arm                         | slide    | position (m)             |\n    | 17  | x-coordinate of the object to be moved                   | -Inf | Inf | object (obj_slidex)              | slide    | position (m)             |\n    | 18  | y-coordinate of the object to be moved                   | -Inf | Inf 
| object (obj_slidey)              | slide    | position (m)             |\n    | 19  | z-coordinate of the object to be moved                   | -Inf | Inf | object                           | cylinder | position (m)             |\n    | 20  | x-coordinate of the goal position of the object          | -Inf | Inf | goal (goal_slidex)               | slide    | position (m)             |\n    | 21  | y-coordinate of the goal position of the object          | -Inf | Inf | goal (goal_slidey)               | slide    | position (m)             |\n    | 22  | z-coordinate of the goal position of the object          | -Inf | Inf | goal                             | sphere   | position (m)             |\n\n\n    ### Rewards\n    The reward consists of three parts:\n    - *reward_near*: This reward is a measure of how far the *fingertip*\n    of the pusher (the unattached end) is from the object, with a more negative\n    value assigned when the pusher's *fingertip* is further away from the\n    object. It is calculated as the negative vector norm of (position of\n    the object - position of the fingertip), or *-norm(\"object\" - \"fingertip\")*.\n    - *reward_dist*: This reward is a measure of how far the object is from\n    the goal position, with a more negative value assigned when the object is\n    further away from the goal. It is calculated as the negative vector norm of\n    (position of the object - position of goal), or *-norm(\"object\" - \"goal\")*.\n    - *reward_ctrl*: A negative reward for penalising the pusher if\n    it takes actions that are too large. It is measured as the negative squared\n    Euclidean norm of the action, i.e. 
as *- sum(action<sup>2</sup>)*.\n\n    The total reward returned is ***reward*** *=* *reward_dist + 0.1 * reward_ctrl + 0.5 * reward_near*\n\n    Unlike other environments, Pusher does not allow you to specify weights for the individual reward terms.\n    However, `info` does contain the keys *reward_dist* and *reward_ctrl*. Thus, if you'd like to weight the terms,\n    you should create a wrapper that computes the weighted reward from `info`.\n\n\n    ### Starting State\n    All pusher (not including object and goal) states start in\n    (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0). Uniform noise in the range\n    [-0.005, 0.005] is added to the velocity attributes only. The velocities of\n    the object and goal are permanently set to 0. The object's x-position is selected uniformly\n    between [-0.3, 0] while the y-position is selected uniformly between [-0.2, 0.2], and this\n    process is repeated until the vector norm between the object's (x,y) position and origin is greater\n    than 0.17. The goal always has the same position of (0.45, -0.05, -0.323).\n\n    The default frame skip is 5 with each simulation step lasting for 0.01, giving rise to a *dt = 5 * 0.01 = 0.05*\n\n    ### Episode End\n\n    The episode ends when any of the following happens:\n\n    1. Truncation: The episode duration reaches 100 timesteps.\n    2. 
Termination: Any of the state space values is no longer finite.\n\n    ### Arguments\n\n    No additional arguments are currently supported (in v2 and lower),\n    but modifications can be made to the XML file in the assets folder\n    (or by changing the path to a modified XML file in another folder)..\n\n    ```\n    env = gym.make('Pusher-v4')\n    ```\n\n    There is no v3 for Pusher, unlike the robot environments where a v3 and\n    beyond take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.\n\n\n    ### Version History\n\n    * v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3\n    * v2: All continuous control environments now use mujoco_py >= 1.50\n    * v1: max_time_steps raised to 1000 for robot based tasks (not including reacher, which has a max_time_steps of 50). Added reward_threshold to environments.\n    * v0: Initial versions release (1.0.0)\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 20,\n    }\n\n    def __init__(self, **kwargs):\n        utils.EzPickle.__init__(self, **kwargs)\n        observation_space = Box(low=-np.inf, high=np.inf, shape=(23,), dtype=np.float64)\n        MujocoEnv.__init__(\n            self, \"pusher.xml\", 5, observation_space=observation_space, **kwargs\n        )\n\n    def step(self, a):\n        vec_1 = self.get_body_com(\"object\") - self.get_body_com(\"tips_arm\")\n        vec_2 = self.get_body_com(\"object\") - self.get_body_com(\"goal\")\n\n        reward_near = -np.linalg.norm(vec_1)\n        reward_dist = -np.linalg.norm(vec_2)\n        reward_ctrl = -np.square(a).sum()\n        reward = reward_dist + 0.1 * reward_ctrl + 0.5 * reward_near\n\n        self.do_simulation(a, self.frame_skip)\n        if self.render_mode == \"human\":\n            self.render()\n\n        ob = self._get_obs()\n        return (\n            ob,\n           
 reward,\n            False,\n            False,\n            dict(reward_dist=reward_dist, reward_ctrl=reward_ctrl),\n        )\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        self.viewer.cam.trackbodyid = -1\n        self.viewer.cam.distance = 4.0\n\n    def reset_model(self):\n        qpos = self.init_qpos\n\n        self.goal_pos = np.asarray([0, 0])\n        while True:\n            self.cylinder_pos = np.concatenate(\n                [\n                    self.np_random.uniform(low=-0.3, high=0, size=1),\n                    self.np_random.uniform(low=-0.2, high=0.2, size=1),\n                ]\n            )\n            if np.linalg.norm(self.cylinder_pos - self.goal_pos) > 0.17:\n                break\n\n        qpos[-4:-2] = self.cylinder_pos\n        qpos[-2:] = self.goal_pos\n        qvel = self.init_qvel + self.np_random.uniform(\n            low=-0.005, high=0.005, size=self.model.nv\n        )\n        qvel[-4:] = 0\n        self.set_state(qpos, qvel)\n        return self._get_obs()\n\n    def _get_obs(self):\n        return np.concatenate(\n            [\n                self.data.qpos.flat[:7],\n                self.data.qvel.flat[:7],\n                self.get_body_com(\"tips_arm\"),\n                self.get_body_com(\"object\"),\n                self.get_body_com(\"goal\"),\n            ]\n        )\n"
  },
  {
    "path": "gym/envs/mujoco/reacher.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MuJocoPyEnv\nfrom gym.spaces import Box\n\n\nclass ReacherEnv(MuJocoPyEnv, utils.EzPickle):\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 50,\n    }\n\n    def __init__(self, **kwargs):\n        utils.EzPickle.__init__(self, **kwargs)\n        observation_space = Box(low=-np.inf, high=np.inf, shape=(11,), dtype=np.float64)\n        MuJocoPyEnv.__init__(\n            self, \"reacher.xml\", 2, observation_space=observation_space, **kwargs\n        )\n\n    def step(self, a):\n        vec = self.get_body_com(\"fingertip\") - self.get_body_com(\"target\")\n        reward_dist = -np.linalg.norm(vec)\n        reward_ctrl = -np.square(a).sum()\n        reward = reward_dist + reward_ctrl\n\n        self.do_simulation(a, self.frame_skip)\n        if self.render_mode == \"human\":\n            self.render()\n\n        ob = self._get_obs()\n        return (\n            ob,\n            reward,\n            False,\n            False,\n            dict(reward_dist=reward_dist, reward_ctrl=reward_ctrl),\n        )\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        self.viewer.cam.trackbodyid = 0\n\n    def reset_model(self):\n        qpos = (\n            self.np_random.uniform(low=-0.1, high=0.1, size=self.model.nq)\n            + self.init_qpos\n        )\n        while True:\n            self.goal = self.np_random.uniform(low=-0.2, high=0.2, size=2)\n            if np.linalg.norm(self.goal) < 0.2:\n                break\n        qpos[-2:] = self.goal\n        qvel = self.init_qvel + self.np_random.uniform(\n            low=-0.005, high=0.005, size=self.model.nv\n        )\n        qvel[-2:] = 0\n        self.set_state(qpos, qvel)\n        return self._get_obs()\n\n    def _get_obs(self):\n        theta = self.sim.data.qpos.flat[:2]\n        
return np.concatenate(\n            [\n                np.cos(theta),\n                np.sin(theta),\n                self.sim.data.qpos.flat[2:],\n                self.sim.data.qvel.flat[:2],\n                self.get_body_com(\"fingertip\") - self.get_body_com(\"target\"),\n            ]\n        )\n"
  },
  {
    "path": "gym/envs/mujoco/reacher_v4.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MujocoEnv\nfrom gym.spaces import Box\n\n\nclass ReacherEnv(MujocoEnv, utils.EzPickle):\n    \"\"\"\n    ### Description\n    \"Reacher\" is a two-jointed robot arm. The goal is to move the robot's end effector (called *fingertip*) close to a\n    target that is spawned at a random position.\n\n    ### Action Space\n    The action space is a `Box(-1, 1, (2,), float32)`. An action `(a, b)` represents the torques applied at the hinge joints.\n\n    | Num | Action                                                                          | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit |\n    |-----|---------------------------------------------------------------------------------|-------------|-------------|--------------------------|-------|------|\n    | 0   | Torque applied at the first hinge (connecting the link to the point of fixture) | -1 | 1 | joint0  | hinge | torque (N m) |\n    | 1   |  Torque applied at the second hinge (connecting the two links)                  | -1 | 1 | joint1  | hinge | torque (N m) |\n\n    ### Observation Space\n\n    Observations consist of\n\n    - The cosine of the angles of the two arms\n    - The sine of the angles of the two arms\n    - The coordinates of the target\n    - The angular velocities of the arms\n    - The vector between the target and the reacher's fingertip (3 dimensional with the last element being 0)\n\n    The observation is a `ndarray` with shape `(11,)` where the elements correspond to the following:\n\n    | Num | Observation                                                                                    | Min  | Max | Name (in corresponding XML file) | Joint | Unit                     |\n    | --- | ---------------------------------------------------------------------------------------------- | ---- | --- | -------------------------------- | ----- | ------------------------ |\n    | 0   | 
cosine of the angle of the first arm                                                           | -Inf | Inf | cos(joint0)                      | hinge | unitless                 |\n    | 1   | cosine of the angle of the second arm                                                          | -Inf | Inf | cos(joint1)                      | hinge | unitless                 |\n    | 2   | sine of the angle of the first arm                                                             | -Inf | Inf | sin(joint0)                      | hinge | unitless                 |\n    | 3   | sine of the angle of the second arm                                                            | -Inf | Inf | sin(joint1)                      | hinge | unitless                 |\n    | 4   | x-coordinate of the target                                                                    | -Inf | Inf | target_x                         | slide | position (m)             |\n    | 5   | y-coordinate of the target                                                                    | -Inf | Inf | target_y                         | slide | position (m)             |\n    | 6   | angular velocity of the first arm                                                              | -Inf | Inf | joint0                           | hinge | angular velocity (rad/s) |\n    | 7   | angular velocity of the second arm                                                             | -Inf | Inf | joint1                           | hinge | angular velocity (rad/s) |\n    | 8   | x-value of position_fingertip - position_target                                                | -Inf | Inf | NA                               | slide | position (m)             |\n    | 9   | y-value of position_fingertip - position_target                                                | -Inf | Inf | NA                               | slide | position (m)             |\n    | 10  | z-value of position_fingertip - position_target (0 since reacher is 2d 
and z is same for both) | -Inf | Inf | NA                               | slide | position (m)             |\n\n\n    Most Gym environments just return the positions and velocity of the\n    joints in the `.xml` file as the state of the environment. However, in\n    reacher the state is created by combining only certain elements of the\n    position and velocity, and performing some function transformations on them.\n    If one is to read the `.xml` for reacher then they will find 4 joints:\n\n    | Num | Observation                 | Min      | Max      | Name (in corresponding XML file) | Joint | Unit               |\n    |-----|-----------------------------|----------|----------|----------------------------------|-------|--------------------|\n    | 0   | angle of the first arm      | -Inf     | Inf      | joint0                           | hinge | angle (rad)        |\n    | 1   | angle of the second arm     | -Inf     | Inf      | joint1                           | hinge | angle (rad)        |\n    | 2   | x-coordinate of the target  | -Inf     | Inf      | target_x                         | slide | position (m)       |\n    | 3   | y-coordinate of the target  | -Inf     | Inf      | target_y                         | slide | position (m)       |\n\n\n    ### Rewards\n    The reward consists of two parts:\n    - *reward_distance*: This reward is a measure of how far the *fingertip*\n    of the reacher (the unattached end) is from the target, with a more negative\n    value assigned when the reacher's *fingertip* is further away from the\n    target. It is calculated as the negative vector norm of (position of\n    the fingertip - position of target), or *-norm(\"fingertip\" - \"target\")*.\n    - *reward_control*: A negative reward for penalising the reacher if\n    it takes actions that are too large. It is measured as the negative squared\n    Euclidean norm of the action, i.e. 
as *- sum(action<sup>2</sup>)*.\n\n    The total reward returned is ***reward*** *=* *reward_distance + reward_control*\n\n    Unlike other environments, Reacher does not allow you to specify weights for the individual reward terms.\n    However, `info` does contain the keys *reward_dist* and *reward_ctrl*. Thus, if you'd like to weight the terms,\n    you should create a wrapper that computes the weighted reward from `info`.\n\n\n    ### Starting State\n    All observations start in state\n    (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)\n    with noise added for stochasticity. Uniform noise in the range\n    [-0.1, 0.1] is added to the positional attributes, while the target position\n    is selected uniformly at random in a disk of radius 0.2 around the origin.\n    Independent, uniform noise in the\n    range of [-0.005, 0.005] is added to the velocities, and the last\n    element (\"fingertip\" - \"target\") is calculated at the end once everything\n    is set. The default setting has a frame skip of 2 and a *dt = 2 * 0.01 = 0.02*\n\n    ### Episode End\n\n    The episode ends when any of the following happens:\n\n    1. Truncation: The episode duration reaches 50 timesteps (with a new random target popping up if the reacher's fingertip reaches it before 50 timesteps)\n    2. 
Termination: Any of the state space values is no longer finite.\n\n    ### Arguments\n\n    No additional arguments are currently supported (in v2 and lower),\n    but modifications can be made to the XML file in the assets folder\n    (or by changing the path to a modified XML file in another folder)..\n\n    ```\n    env = gym.make('Reacher-v4')\n    ```\n\n    There is no v3 for Reacher, unlike the robot environments where a v3 and\n    beyond take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.\n\n\n    ### Version History\n\n    * v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3\n    * v2: All continuous control environments now use mujoco_py >= 1.50\n    * v1: max_time_steps raised to 1000 for robot based tasks (not including reacher, which has a max_time_steps of 50). Added reward_threshold to environments.\n    * v0: Initial versions release (1.0.0)\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 50,\n    }\n\n    def __init__(self, **kwargs):\n        utils.EzPickle.__init__(self, **kwargs)\n        observation_space = Box(low=-np.inf, high=np.inf, shape=(11,), dtype=np.float64)\n        MujocoEnv.__init__(\n            self, \"reacher.xml\", 2, observation_space=observation_space, **kwargs\n        )\n\n    def step(self, a):\n        vec = self.get_body_com(\"fingertip\") - self.get_body_com(\"target\")\n        reward_dist = -np.linalg.norm(vec)\n        reward_ctrl = -np.square(a).sum()\n        reward = reward_dist + reward_ctrl\n\n        self.do_simulation(a, self.frame_skip)\n        if self.render_mode == \"human\":\n            self.render()\n\n        ob = self._get_obs()\n        return (\n            ob,\n            reward,\n            False,\n            False,\n            dict(reward_dist=reward_dist, reward_ctrl=reward_ctrl),\n        )\n\n    def 
viewer_setup(self):\n        assert self.viewer is not None\n        self.viewer.cam.trackbodyid = 0\n\n    def reset_model(self):\n        qpos = (\n            self.np_random.uniform(low=-0.1, high=0.1, size=self.model.nq)\n            + self.init_qpos\n        )\n        while True:\n            self.goal = self.np_random.uniform(low=-0.2, high=0.2, size=2)\n            if np.linalg.norm(self.goal) < 0.2:\n                break\n        qpos[-2:] = self.goal\n        qvel = self.init_qvel + self.np_random.uniform(\n            low=-0.005, high=0.005, size=self.model.nv\n        )\n        qvel[-2:] = 0\n        self.set_state(qpos, qvel)\n        return self._get_obs()\n\n    def _get_obs(self):\n        theta = self.data.qpos.flat[:2]\n        return np.concatenate(\n            [\n                np.cos(theta),\n                np.sin(theta),\n                self.data.qpos.flat[2:],\n                self.data.qvel.flat[:2],\n                self.get_body_com(\"fingertip\") - self.get_body_com(\"target\"),\n            ]\n        )\n"
  },
  {
    "path": "gym/envs/mujoco/swimmer.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MuJocoPyEnv\nfrom gym.spaces import Box\n\n\nclass SwimmerEnv(MuJocoPyEnv, utils.EzPickle):\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 25,\n    }\n\n    def __init__(self, **kwargs):\n        observation_space = Box(low=-np.inf, high=np.inf, shape=(8,), dtype=np.float64)\n        MuJocoPyEnv.__init__(\n            self, \"swimmer.xml\", 4, observation_space=observation_space, **kwargs\n        )\n        utils.EzPickle.__init__(self, **kwargs)\n\n    def step(self, a):\n        ctrl_cost_coeff = 0.0001\n        xposbefore = self.sim.data.qpos[0]\n        self.do_simulation(a, self.frame_skip)\n        xposafter = self.sim.data.qpos[0]\n\n        reward_fwd = (xposafter - xposbefore) / self.dt\n        reward_ctrl = -ctrl_cost_coeff * np.square(a).sum()\n        reward = reward_fwd + reward_ctrl\n        ob = self._get_obs()\n\n        if self.render_mode == \"human\":\n            self.render()\n\n        return (\n            ob,\n            reward,\n            False,\n            False,\n            dict(reward_fwd=reward_fwd, reward_ctrl=reward_ctrl),\n        )\n\n    def _get_obs(self):\n        qpos = self.sim.data.qpos\n        qvel = self.sim.data.qvel\n        return np.concatenate([qpos.flat[2:], qvel.flat])\n\n    def reset_model(self):\n        self.set_state(\n            self.init_qpos\n            + self.np_random.uniform(low=-0.1, high=0.1, size=self.model.nq),\n            self.init_qvel\n            + self.np_random.uniform(low=-0.1, high=0.1, size=self.model.nv),\n        )\n        return self._get_obs()\n"
  },
  {
    "path": "gym/envs/mujoco/swimmer_v3.py",
    "content": "__credits__ = [\"Rushiv Arora\"]\n\nimport numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MuJocoPyEnv\nfrom gym.spaces import Box\n\nDEFAULT_CAMERA_CONFIG = {}\n\n\nclass SwimmerEnv(MuJocoPyEnv, utils.EzPickle):\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 25,\n    }\n\n    def __init__(\n        self,\n        xml_file=\"swimmer.xml\",\n        forward_reward_weight=1.0,\n        ctrl_cost_weight=1e-4,\n        reset_noise_scale=0.1,\n        exclude_current_positions_from_observation=True,\n        **kwargs\n    ):\n        utils.EzPickle.__init__(\n            self,\n            xml_file,\n            forward_reward_weight,\n            ctrl_cost_weight,\n            reset_noise_scale,\n            exclude_current_positions_from_observation,\n            **kwargs\n        )\n\n        self._forward_reward_weight = forward_reward_weight\n        self._ctrl_cost_weight = ctrl_cost_weight\n\n        self._reset_noise_scale = reset_noise_scale\n\n        self._exclude_current_positions_from_observation = (\n            exclude_current_positions_from_observation\n        )\n\n        if exclude_current_positions_from_observation:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(8,), dtype=np.float64\n            )\n        else:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(10,), dtype=np.float64\n            )\n\n        MuJocoPyEnv.__init__(\n            self, xml_file, 4, observation_space=observation_space, **kwargs\n        )\n\n    def control_cost(self, action):\n        control_cost = self._ctrl_cost_weight * np.sum(np.square(action))\n        return control_cost\n\n    def step(self, action):\n        xy_position_before = self.sim.data.qpos[0:2].copy()\n        self.do_simulation(action, self.frame_skip)\n        
xy_position_after = self.sim.data.qpos[0:2].copy()\n\n        xy_velocity = (xy_position_after - xy_position_before) / self.dt\n        x_velocity, y_velocity = xy_velocity\n\n        forward_reward = self._forward_reward_weight * x_velocity\n        ctrl_cost = self.control_cost(action)\n\n        observation = self._get_obs()\n        reward = forward_reward - ctrl_cost\n        info = {\n            \"reward_fwd\": forward_reward,\n            \"reward_ctrl\": -ctrl_cost,\n            \"x_position\": xy_position_after[0],\n            \"y_position\": xy_position_after[1],\n            \"distance_from_origin\": np.linalg.norm(xy_position_after, ord=2),\n            \"x_velocity\": x_velocity,\n            \"y_velocity\": y_velocity,\n            \"forward_reward\": forward_reward,\n        }\n\n        if self.render_mode == \"human\":\n            self.render()\n\n        return observation, reward, False, False, info\n\n    def _get_obs(self):\n        position = self.sim.data.qpos.flat.copy()\n        velocity = self.sim.data.qvel.flat.copy()\n\n        if self._exclude_current_positions_from_observation:\n            position = position[2:]\n\n        observation = np.concatenate([position, velocity]).ravel()\n        return observation\n\n    def reset_model(self):\n        noise_low = -self._reset_noise_scale\n        noise_high = self._reset_noise_scale\n\n        qpos = self.init_qpos + self.np_random.uniform(\n            low=noise_low, high=noise_high, size=self.model.nq\n        )\n        qvel = self.init_qvel + self.np_random.uniform(\n            low=noise_low, high=noise_high, size=self.model.nv\n        )\n\n        self.set_state(qpos, qvel)\n\n        observation = self._get_obs()\n        return observation\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        for key, value in DEFAULT_CAMERA_CONFIG.items():\n            if isinstance(value, np.ndarray):\n                getattr(self.viewer.cam, key)[:] = value\n       
     else:\n                setattr(self.viewer.cam, key, value)\n"
  },
  {
    "path": "gym/envs/mujoco/swimmer_v4.py",
    "content": "__credits__ = [\"Rushiv Arora\"]\n\nimport numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MujocoEnv\nfrom gym.spaces import Box\n\nDEFAULT_CAMERA_CONFIG = {}\n\n\nclass SwimmerEnv(MujocoEnv, utils.EzPickle):\n    \"\"\"\n    ### Description\n\n    This environment corresponds to the Swimmer environment described in Rémi Coulom's PhD thesis\n    [\"Reinforcement Learning Using Neural Networks, with Applications to Motor Control\"](https://tel.archives-ouvertes.fr/tel-00003985/document).\n    The environment aims to increase the number of independent state and control\n    variables as compared to the classic control environments. The swimmers\n    consist of three or more segments ('***links***') and one less articulation\n    joints ('***rotors***') - one rotor joint connecting exactly two links to\n    form a linear chain. The swimmer is suspended in a two dimensional pool and\n    always starts in the same position (subject to some deviation drawn from an\n    uniform distribution), and the goal is to move as fast as possible towards\n    the right by applying torque on the rotors and using the fluids friction.\n\n    ### Notes\n\n    The problem parameters are:\n    Problem parameters:\n    * *n*: number of body parts\n    * *m<sub>i</sub>*: mass of part *i* (*i* ∈ {1...n})\n    * *l<sub>i</sub>*: length of part *i* (*i* ∈ {1...n})\n    * *k*: viscous-friction coefficient\n\n    While the default environment has *n* = 3, *l<sub>i</sub>* = 0.1,\n    and *k* = 0.1. It is possible to pass a custom MuJoCo XML file during construction to increase the\n    number of links, or to tweak any of the parameters.\n\n    ### Action Space\n    The action space is a `Box(-1, 1, (2,), float32)`. 
An action represents the torques applied between *links*.\n\n    | Num | Action                             | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit         |\n    |-----|------------------------------------|-------------|-------------|----------------------------------|-------|--------------|\n    | 0   | Torque applied on the first rotor  | -1          | 1           | motor1_rot                       | hinge | torque (N m) |\n    | 1   | Torque applied on the second rotor | -1          | 1           | motor2_rot                       | hinge | torque (N m) |\n\n    ### Observation Space\n\n    By default, observations consist of:\n    * θ<sub>i</sub>: angle of part *i* with respect to the *x* axis\n    * θ<sub>i</sub>': its derivative with respect to time (angular velocity)\n\n    In the default case, observations do not include the x- and y-coordinates of the front tip. These may\n    be included by passing `exclude_current_positions_from_observation=False` during construction.\n    Then, the observation space will have 10 dimensions where the first two dimensions\n    represent the x- and y-coordinates of the front tip.\n    Regardless of whether `exclude_current_positions_from_observation` was set to true or false, the x- and y-coordinates\n    will be returned in `info` with keys `\"x_position\"` and `\"y_position\"`, respectively.\n\n    By default, the observation is an `ndarray` with shape `(8,)` where the elements correspond to the following:\n\n    | Num | Observation                          | Min  | Max | Name (in corresponding XML file) | Joint | Unit                     |\n    | --- | ------------------------------------ | ---- | --- | -------------------------------- | ----- | ------------------------ |\n    | 0   | angle of the front tip               | -Inf | Inf | free_body_rot                    | hinge | angle (rad)              |\n    | 1   | angle of the first rotor             | -Inf | Inf | motor1_rot     
                  | hinge | angle (rad)              |\n    | 2   | angle of the second rotor            | -Inf | Inf | motor2_rot                       | hinge | angle (rad)              |\n    | 3   | velocity of the tip along the x-axis | -Inf | Inf | slider1                          | slide | velocity (m/s)           |\n    | 4   | velocity of the tip along the y-axis | -Inf | Inf | slider2                          | slide | velocity (m/s)           |\n    | 5   | angular velocity of front tip        | -Inf | Inf | free_body_rot                    | hinge | angular velocity (rad/s) |\n    | 6   | angular velocity of first rotor      | -Inf | Inf | motor1_rot                       | hinge | angular velocity (rad/s) |\n    | 7   | angular velocity of second rotor     | -Inf | Inf | motor2_rot                       | hinge | angular velocity (rad/s) |\n\n    ### Rewards\n    The reward consists of two parts:\n    - *forward_reward*: A reward for moving forward which is measured\n    as *`forward_reward_weight` * (x-coordinate after action - x-coordinate before action)/dt*. *dt* is\n    the time between actions and is dependent on the frame_skip parameter\n    (default is 4), where the frametime is 0.01 - making the\n    default *dt = 4 * 0.01 = 0.04*. This reward would be positive if the swimmer\n    swims right as desired.\n    - *ctrl_cost*: A cost for penalising the swimmer if it takes\n    actions that are too large. 
It is measured as *`ctrl_cost_weight` *\n    sum(action<sup>2</sup>)* where *`ctrl_cost_weight`* is a parameter set for the\n    control and has a default value of 1e-4\n\n    The total reward returned is ***reward*** *=* *forward_reward - ctrl_cost* and `info` will also contain the individual reward terms\n\n    ### Starting State\n    All observations start in state (0,0,0,0,0,0,0,0) with uniform noise in the range of [-`reset_noise_scale`, `reset_noise_scale`] added to the initial state for stochasticity.\n\n    ### Episode End\n    The episode truncates when the episode length is greater than 1000.\n\n    ### Arguments\n\n    No additional arguments are currently supported in v2 and lower.\n\n    ```\n    gym.make('Swimmer-v4')\n    ```\n\n    v3 and v4 take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.\n\n    ```\n    env = gym.make('Swimmer-v4', ctrl_cost_weight=0.1, ....)\n    ```\n\n    | Parameter                                    | Type      | Default         | Description                                                                                                                                                               |\n    | -------------------------------------------- | --------- | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n    | `xml_file`                                   | **str**   | `\"swimmer.xml\"` | Path to a MuJoCo model                                                                                                                                                    |\n    | `forward_reward_weight`                      | **float** | `1.0`           | Weight for _forward_reward_ term (see section on reward)                                                                                                                  |\n    | `ctrl_cost_weight`            
               | **float** | `1e-4`          | Weight for _ctrl_cost_ term (see section on reward)                                                                                                                       |\n    | `reset_noise_scale`                          | **float** | `0.1`           | Scale of random perturbations of initial position and velocity (see section on Starting State)                                                                            |\n    | `exclude_current_positions_from_observation` | **bool**  | `True`          | Whether or not to omit the x- and y-coordinates from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies |\n\n\n    ### Version History\n\n    * v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3\n    * v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)\n    * v2: All continuous control environments now use mujoco_py >= 1.50\n    * v1: max_time_steps raised to 1000 for robot based tasks. 
Added reward_threshold to environments.\n    * v0: Initial versions release (1.0.0)\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 25,\n    }\n\n    def __init__(\n        self,\n        forward_reward_weight=1.0,\n        ctrl_cost_weight=1e-4,\n        reset_noise_scale=0.1,\n        exclude_current_positions_from_observation=True,\n        **kwargs\n    ):\n        utils.EzPickle.__init__(\n            self,\n            forward_reward_weight,\n            ctrl_cost_weight,\n            reset_noise_scale,\n            exclude_current_positions_from_observation,\n            **kwargs\n        )\n\n        self._forward_reward_weight = forward_reward_weight\n        self._ctrl_cost_weight = ctrl_cost_weight\n\n        self._reset_noise_scale = reset_noise_scale\n\n        self._exclude_current_positions_from_observation = (\n            exclude_current_positions_from_observation\n        )\n        if exclude_current_positions_from_observation:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(8,), dtype=np.float64\n            )\n        else:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(10,), dtype=np.float64\n            )\n        MujocoEnv.__init__(\n            self, \"swimmer.xml\", 4, observation_space=observation_space, **kwargs\n        )\n\n    def control_cost(self, action):\n        control_cost = self._ctrl_cost_weight * np.sum(np.square(action))\n        return control_cost\n\n    def step(self, action):\n        xy_position_before = self.data.qpos[0:2].copy()\n        self.do_simulation(action, self.frame_skip)\n        xy_position_after = self.data.qpos[0:2].copy()\n\n        xy_velocity = (xy_position_after - xy_position_before) / self.dt\n        x_velocity, y_velocity = xy_velocity\n\n        forward_reward = 
self._forward_reward_weight * x_velocity\n\n        ctrl_cost = self.control_cost(action)\n\n        observation = self._get_obs()\n        reward = forward_reward - ctrl_cost\n        info = {\n            \"reward_fwd\": forward_reward,\n            \"reward_ctrl\": -ctrl_cost,\n            \"x_position\": xy_position_after[0],\n            \"y_position\": xy_position_after[1],\n            \"distance_from_origin\": np.linalg.norm(xy_position_after, ord=2),\n            \"x_velocity\": x_velocity,\n            \"y_velocity\": y_velocity,\n            \"forward_reward\": forward_reward,\n        }\n\n        if self.render_mode == \"human\":\n            self.render()\n\n        return observation, reward, False, False, info\n\n    def _get_obs(self):\n        position = self.data.qpos.flat.copy()\n        velocity = self.data.qvel.flat.copy()\n\n        if self._exclude_current_positions_from_observation:\n            position = position[2:]\n\n        observation = np.concatenate([position, velocity]).ravel()\n        return observation\n\n    def reset_model(self):\n        noise_low = -self._reset_noise_scale\n        noise_high = self._reset_noise_scale\n\n        qpos = self.init_qpos + self.np_random.uniform(\n            low=noise_low, high=noise_high, size=self.model.nq\n        )\n        qvel = self.init_qvel + self.np_random.uniform(\n            low=noise_low, high=noise_high, size=self.model.nv\n        )\n\n        self.set_state(qpos, qvel)\n\n        observation = self._get_obs()\n        return observation\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        for key, value in DEFAULT_CAMERA_CONFIG.items():\n            if isinstance(value, np.ndarray):\n                getattr(self.viewer.cam, key)[:] = value\n            else:\n                setattr(self.viewer.cam, key, value)\n"
  },
  {
    "path": "gym/envs/mujoco/walker2d.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MuJocoPyEnv\nfrom gym.spaces import Box\n\n\nclass Walker2dEnv(MuJocoPyEnv, utils.EzPickle):\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 125,\n    }\n\n    def __init__(self, **kwargs):\n        observation_space = Box(low=-np.inf, high=np.inf, shape=(17,), dtype=np.float64)\n        MuJocoPyEnv.__init__(\n            self, \"walker2d.xml\", 4, observation_space=observation_space, **kwargs\n        )\n        utils.EzPickle.__init__(self, **kwargs)\n\n    def step(self, a):\n        posbefore = self.sim.data.qpos[0]\n        self.do_simulation(a, self.frame_skip)\n        posafter, height, ang = self.sim.data.qpos[0:3]\n\n        alive_bonus = 1.0\n        reward = (posafter - posbefore) / self.dt\n        reward += alive_bonus\n        reward -= 1e-3 * np.square(a).sum()\n        terminated = not (height > 0.8 and height < 2.0 and ang > -1.0 and ang < 1.0)\n        ob = self._get_obs()\n\n        if self.render_mode == \"human\":\n            self.render()\n\n        return ob, reward, terminated, False, {}\n\n    def _get_obs(self):\n        qpos = self.sim.data.qpos\n        qvel = self.sim.data.qvel\n        return np.concatenate([qpos[1:], np.clip(qvel, -10, 10)]).ravel()\n\n    def reset_model(self):\n        self.set_state(\n            self.init_qpos\n            + self.np_random.uniform(low=-0.005, high=0.005, size=self.model.nq),\n            self.init_qvel\n            + self.np_random.uniform(low=-0.005, high=0.005, size=self.model.nv),\n        )\n        return self._get_obs()\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        self.viewer.cam.trackbodyid = 2\n        self.viewer.cam.distance = self.model.stat.extent * 0.5\n        self.viewer.cam.lookat[2] = 1.15\n        self.viewer.cam.elevation = -20\n"
  },
  {
    "path": "gym/envs/mujoco/walker2d_v3.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MuJocoPyEnv\nfrom gym.spaces import Box\n\nDEFAULT_CAMERA_CONFIG = {\n    \"trackbodyid\": 2,\n    \"distance\": 4.0,\n    \"lookat\": np.array((0.0, 0.0, 1.15)),\n    \"elevation\": -20.0,\n}\n\n\nclass Walker2dEnv(MuJocoPyEnv, utils.EzPickle):\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 125,\n    }\n\n    def __init__(\n        self,\n        xml_file=\"walker2d.xml\",\n        forward_reward_weight=1.0,\n        ctrl_cost_weight=1e-3,\n        healthy_reward=1.0,\n        terminate_when_unhealthy=True,\n        healthy_z_range=(0.8, 2.0),\n        healthy_angle_range=(-1.0, 1.0),\n        reset_noise_scale=5e-3,\n        exclude_current_positions_from_observation=True,\n        **kwargs\n    ):\n        utils.EzPickle.__init__(\n            self,\n            xml_file,\n            forward_reward_weight,\n            ctrl_cost_weight,\n            healthy_reward,\n            terminate_when_unhealthy,\n            healthy_z_range,\n            healthy_angle_range,\n            reset_noise_scale,\n            exclude_current_positions_from_observation,\n            **kwargs\n        )\n\n        self._forward_reward_weight = forward_reward_weight\n        self._ctrl_cost_weight = ctrl_cost_weight\n\n        self._healthy_reward = healthy_reward\n        self._terminate_when_unhealthy = terminate_when_unhealthy\n\n        self._healthy_z_range = healthy_z_range\n        self._healthy_angle_range = healthy_angle_range\n\n        self._reset_noise_scale = reset_noise_scale\n\n        self._exclude_current_positions_from_observation = (\n            exclude_current_positions_from_observation\n        )\n\n        if exclude_current_positions_from_observation:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(17,), 
dtype=np.float64\n            )\n        else:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64\n            )\n\n        MuJocoPyEnv.__init__(\n            self, xml_file, 4, observation_space=observation_space, **kwargs\n        )\n\n    @property\n    def healthy_reward(self):\n        return (\n            float(self.is_healthy or self._terminate_when_unhealthy)\n            * self._healthy_reward\n        )\n\n    def control_cost(self, action):\n        control_cost = self._ctrl_cost_weight * np.sum(np.square(action))\n        return control_cost\n\n    @property\n    def is_healthy(self):\n        z, angle = self.sim.data.qpos[1:3]\n\n        min_z, max_z = self._healthy_z_range\n        min_angle, max_angle = self._healthy_angle_range\n\n        healthy_z = min_z < z < max_z\n        healthy_angle = min_angle < angle < max_angle\n        is_healthy = healthy_z and healthy_angle\n\n        return is_healthy\n\n    @property\n    def terminated(self):\n        terminated = not self.is_healthy if self._terminate_when_unhealthy else False\n        return terminated\n\n    def _get_obs(self):\n        position = self.sim.data.qpos.flat.copy()\n        velocity = np.clip(self.sim.data.qvel.flat.copy(), -10, 10)\n\n        if self._exclude_current_positions_from_observation:\n            position = position[1:]\n\n        observation = np.concatenate((position, velocity)).ravel()\n        return observation\n\n    def step(self, action):\n        x_position_before = self.sim.data.qpos[0]\n        self.do_simulation(action, self.frame_skip)\n        x_position_after = self.sim.data.qpos[0]\n        x_velocity = (x_position_after - x_position_before) / self.dt\n\n        ctrl_cost = self.control_cost(action)\n        forward_reward = self._forward_reward_weight * x_velocity\n        healthy_reward = self.healthy_reward\n\n        rewards = forward_reward + healthy_reward\n        costs = ctrl_cost\n\n     
   observation = self._get_obs()\n        reward = rewards - costs\n        terminated = self.terminated\n        info = {\n            \"x_position\": x_position_after,\n            \"x_velocity\": x_velocity,\n        }\n\n        if self.render_mode == \"human\":\n            self.render()\n\n        return observation, reward, terminated, False, info\n\n    def reset_model(self):\n        noise_low = -self._reset_noise_scale\n        noise_high = self._reset_noise_scale\n\n        qpos = self.init_qpos + self.np_random.uniform(\n            low=noise_low, high=noise_high, size=self.model.nq\n        )\n        qvel = self.init_qvel + self.np_random.uniform(\n            low=noise_low, high=noise_high, size=self.model.nv\n        )\n\n        self.set_state(qpos, qvel)\n\n        observation = self._get_obs()\n        return observation\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        for key, value in DEFAULT_CAMERA_CONFIG.items():\n            if isinstance(value, np.ndarray):\n                getattr(self.viewer.cam, key)[:] = value\n            else:\n                setattr(self.viewer.cam, key, value)\n"
  },
  {
    "path": "gym/envs/mujoco/walker2d_v4.py",
    "content": "import numpy as np\n\nfrom gym import utils\nfrom gym.envs.mujoco import MujocoEnv\nfrom gym.spaces import Box\n\nDEFAULT_CAMERA_CONFIG = {\n    \"trackbodyid\": 2,\n    \"distance\": 4.0,\n    \"lookat\": np.array((0.0, 0.0, 1.15)),\n    \"elevation\": -20.0,\n}\n\n\nclass Walker2dEnv(MujocoEnv, utils.EzPickle):\n    \"\"\"\n    ### Description\n\n    This environment builds on the hopper environment based on the work done by Erez, Tassa, and Todorov\n    in [\"Infinite Horizon Model Predictive Control for Nonlinear Periodic Tasks\"](http://www.roboticsproceedings.org/rss07/p10.pdf)\n    by adding another set of legs making it possible for the robot to walker forward instead of\n    hop. Like other Mujoco environments, this environment aims to increase the number of independent state\n    and control variables as compared to the classic control environments. The walker is a\n    two-dimensional two-legged figure that consist of four main body parts - a single torso at the top\n    (with the two legs splitting after the torso), two thighs in the middle below the torso, two legs\n    in the bottom below the thighs, and two feet attached to the legs on which the entire body rests.\n    The goal is to make coordinate both sets of feet, legs, and thighs to move in the forward (right)\n    direction by applying torques on the six hinges connecting the six body parts.\n\n    ### Action Space\n    The action space is a `Box(-1, 1, (6,), float32)`. 
An action represents the torques applied at the hinge joints.\n\n    | Num | Action                                 | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit         |\n    |-----|----------------------------------------|-------------|-------------|----------------------------------|-------|--------------|\n    | 0   | Torque applied on the thigh rotor      | -1          | 1           | thigh_joint                      | hinge | torque (N m) |\n    | 1   | Torque applied on the leg rotor        | -1          | 1           | leg_joint                        | hinge | torque (N m) |\n    | 2   | Torque applied on the foot rotor       | -1          | 1           | foot_joint                       | hinge | torque (N m) |\n    | 3   | Torque applied on the left thigh rotor | -1          | 1           | thigh_left_joint                 | hinge | torque (N m) |\n    | 4   | Torque applied on the left leg rotor   | -1          | 1           | leg_left_joint                   | hinge | torque (N m) |\n    | 5   | Torque applied on the left foot rotor  | -1          | 1           | foot_left_joint                  | hinge | torque (N m) |\n\n    ### Observation Space\n\n    Observations consist of positional values of different body parts of the walker,\n    followed by the velocities of those individual parts (their derivatives) with all the positions ordered before all the velocities.\n\n    By default, observations do not include the x-coordinate of the top. 
It may\n    be included by passing `exclude_current_positions_from_observation=False` during construction.\n    In that case, the observation space will have 18 dimensions where the first dimension\n    represents the x-coordinate of the top of the walker.\n    Regardless of whether `exclude_current_positions_from_observation` was set to true or false, the x-coordinate\n    of the top will be returned in `info` with key `\"x_position\"`.\n\n    By default, the observation is an `ndarray` with shape `(17,)` where the elements correspond to the following:\n\n    | Num | Observation                                      | Min  | Max | Name (in corresponding XML file) | Joint | Unit                     |\n    | --- | ------------------------------------------------ | ---- | --- | -------------------------------- | ----- | ------------------------ |\n    | 0   | z-coordinate of the top (height of walker)       | -Inf | Inf | rootz (torso)                    | slide | position (m)             |\n    | 1   | angle of the top                                 | -Inf | Inf | rooty (torso)                    | hinge | angle (rad)              |\n    | 2   | angle of the thigh joint                         | -Inf | Inf | thigh_joint                      | hinge | angle (rad)              |\n    | 3   | angle of the leg joint                           | -Inf | Inf | leg_joint                        | hinge | angle (rad)              |\n    | 4   | angle of the foot joint                          | -Inf | Inf | foot_joint                       | hinge | angle (rad)              |\n    | 5   | angle of the left thigh joint                    | -Inf | Inf | thigh_left_joint                 | hinge | angle (rad)              |\n    | 6   | angle of the left leg joint                      | -Inf | Inf | leg_left_joint                   | hinge | angle (rad)              |\n    | 7   | angle of the left foot joint                     | -Inf | Inf | foot_left_joint                  | hinge | 
angle (rad)              |\n    | 8   | velocity of the x-coordinate of the top          | -Inf | Inf | rootx                            | slide | velocity (m/s)           |\n    | 9   | velocity of the z-coordinate (height) of the top | -Inf | Inf | rootz                            | slide | velocity (m/s)           |\n    | 10  | angular velocity of the angle of the top         | -Inf | Inf | rooty                            | hinge | angular velocity (rad/s) |\n    | 11  | angular velocity of the thigh hinge              | -Inf | Inf | thigh_joint                      | hinge | angular velocity (rad/s) |\n    | 12  | angular velocity of the leg hinge                | -Inf | Inf | leg_joint                        | hinge | angular velocity (rad/s) |\n    | 13  | angular velocity of the foot hinge               | -Inf | Inf | foot_joint                       | hinge | angular velocity (rad/s) |\n    | 14  | angular velocity of the left thigh hinge         | -Inf | Inf | thigh_left_joint                 | hinge | angular velocity (rad/s) |\n    | 15  | angular velocity of the left leg hinge           | -Inf | Inf | leg_left_joint                   | hinge | angular velocity (rad/s) |\n    | 16  | angular velocity of the left foot hinge          | -Inf | Inf | foot_left_joint                  | hinge | angular velocity (rad/s) |\n\n    ### Rewards\n    The reward consists of three parts:\n    - *healthy_reward*: Every timestep that the walker is alive, it receives a fixed reward of value `healthy_reward`,\n    - *forward_reward*: A reward for walking forward which is measured as\n    *`forward_reward_weight` * (x-coordinate after action - x-coordinate before action)/dt*.\n    *dt* is the time between actions and is dependent on the frame_skip parameter\n    (default is 4), where the frametime is 0.002 - making the default\n    *dt = 4 * 0.002 = 0.008*. 
This reward is positive when the walker moves forward (to the right), which is the desired direction.\n    - *ctrl_cost*: A cost for penalising the walker if it\n    takes actions that are too large. It is measured as\n    *`ctrl_cost_weight` * sum(action<sup>2</sup>)* where *`ctrl_cost_weight`* is\n    a parameter set for the control and has a default value of 0.001\n\n    The total reward returned is ***reward*** *=* *healthy_reward + forward_reward - ctrl_cost*, and `info` contains the keys `\"x_position\"` and `\"x_velocity\"`.\n\n    ### Starting State\n    All observations start in state\n    (0.0, 1.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)\n    with a uniform noise in the range of [-`reset_noise_scale`, `reset_noise_scale`] added to the values for stochasticity.\n\n    ### Episode End\n    The walker is said to be unhealthy if any of the following happens:\n\n    1. Any of the state space values is no longer finite\n    2. The height of the walker is ***not*** in the open interval specified by `healthy_z_range`\n    3. The angle of the top (`observation[1]` if `exclude_current_positions_from_observation=True`, else `observation[2]`) is ***not*** in the open interval specified by `healthy_angle_range`\n\n    If `terminate_when_unhealthy=True` is passed during construction (which is the default),\n    the episode ends when any of the following happens:\n\n    1. Truncation: The episode duration reaches 1000 timesteps\n    2. 
Termination: The walker is unhealthy\n\n    If `terminate_when_unhealthy=False` is passed, the episode ends only when the 1000-timestep limit is reached.\n\n    ### Arguments\n\n    No additional arguments are currently supported in v2 and lower.\n\n    ```\n    env = gym.make('Walker2d-v2')\n    ```\n\n    v3 and beyond take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.\n\n    ```\n    env = gym.make('Walker2d-v4', ctrl_cost_weight=0.1, ....)\n    ```\n\n    | Parameter                                    | Type      | Default          | Description                                                                                                                                                       |\n    | -------------------------------------------- | --------- | ---------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n    | `xml_file`                                   | **str**   | `\"walker2d.xml\"` | Path to a MuJoCo model                                                                                                                                            |\n    | `forward_reward_weight`                      | **float** | `1.0`            | Weight for _forward_reward_ term (see section on reward)                                                                                                          |\n    | `ctrl_cost_weight`                           | **float** | `1e-3`           | Weight for _ctrl_cost_ term (see section on reward)                                                                                                               |\n    | `healthy_reward`                             | **float** | `1.0`            | Constant reward given if the walker is \"healthy\" after timestep                                                                                                   |\n    | 
`terminate_when_unhealthy`                   | **bool**  | `True`           | If true, issue a done signal if the z-coordinate of the walker is no longer healthy                                                                               |\n    | `healthy_z_range`                            | **tuple** | `(0.8, 2)`       | The z-coordinate of the top of the walker must be in this range to be considered healthy                                                                          |\n    | `healthy_angle_range`                        | **tuple** | `(-1, 1)`        | The angle must be in this range to be considered healthy                                                                                                          |\n    | `reset_noise_scale`                          | **float** | `5e-3`           | Scale of random perturbations of initial position and velocity (see section on Starting State)                                                                    |\n    | `exclude_current_positions_from_observation` | **bool**  | `True`           | Whether or not to omit the x-coordinate from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies |\n\n\n    ### Version History\n\n    * v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3\n    * v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)\n    * v2: All continuous control environments now use mujoco_py >= 1.50\n    * v1: max_time_steps raised to 1000 for robot based tasks. 
Added reward_threshold to environments.\n    * v0: Initial versions release (1.0.0)\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\n            \"human\",\n            \"rgb_array\",\n            \"depth_array\",\n        ],\n        \"render_fps\": 125,\n    }\n\n    def __init__(\n        self,\n        forward_reward_weight=1.0,\n        ctrl_cost_weight=1e-3,\n        healthy_reward=1.0,\n        terminate_when_unhealthy=True,\n        healthy_z_range=(0.8, 2.0),\n        healthy_angle_range=(-1.0, 1.0),\n        reset_noise_scale=5e-3,\n        exclude_current_positions_from_observation=True,\n        **kwargs\n    ):\n        utils.EzPickle.__init__(\n            self,\n            forward_reward_weight,\n            ctrl_cost_weight,\n            healthy_reward,\n            terminate_when_unhealthy,\n            healthy_z_range,\n            healthy_angle_range,\n            reset_noise_scale,\n            exclude_current_positions_from_observation,\n            **kwargs\n        )\n\n        self._forward_reward_weight = forward_reward_weight\n        self._ctrl_cost_weight = ctrl_cost_weight\n\n        self._healthy_reward = healthy_reward\n        self._terminate_when_unhealthy = terminate_when_unhealthy\n\n        self._healthy_z_range = healthy_z_range\n        self._healthy_angle_range = healthy_angle_range\n\n        self._reset_noise_scale = reset_noise_scale\n\n        self._exclude_current_positions_from_observation = (\n            exclude_current_positions_from_observation\n        )\n\n        if exclude_current_positions_from_observation:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(17,), dtype=np.float64\n            )\n        else:\n            observation_space = Box(\n                low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64\n            )\n\n        MujocoEnv.__init__(\n            self, \"walker2d.xml\", 4, observation_space=observation_space, **kwargs\n        )\n\n 
   @property\n    def healthy_reward(self):\n        return (\n            float(self.is_healthy or self._terminate_when_unhealthy)\n            * self._healthy_reward\n        )\n\n    def control_cost(self, action):\n        control_cost = self._ctrl_cost_weight * np.sum(np.square(action))\n        return control_cost\n\n    @property\n    def is_healthy(self):\n        z, angle = self.data.qpos[1:3]\n\n        min_z, max_z = self._healthy_z_range\n        min_angle, max_angle = self._healthy_angle_range\n\n        healthy_z = min_z < z < max_z\n        healthy_angle = min_angle < angle < max_angle\n        is_healthy = healthy_z and healthy_angle\n\n        return is_healthy\n\n    @property\n    def terminated(self):\n        terminated = not self.is_healthy if self._terminate_when_unhealthy else False\n        return terminated\n\n    def _get_obs(self):\n        position = self.data.qpos.flat.copy()\n        velocity = np.clip(self.data.qvel.flat.copy(), -10, 10)\n\n        if self._exclude_current_positions_from_observation:\n            position = position[1:]\n\n        observation = np.concatenate((position, velocity)).ravel()\n        return observation\n\n    def step(self, action):\n        x_position_before = self.data.qpos[0]\n        self.do_simulation(action, self.frame_skip)\n        x_position_after = self.data.qpos[0]\n        x_velocity = (x_position_after - x_position_before) / self.dt\n\n        ctrl_cost = self.control_cost(action)\n\n        forward_reward = self._forward_reward_weight * x_velocity\n        healthy_reward = self.healthy_reward\n\n        rewards = forward_reward + healthy_reward\n        costs = ctrl_cost\n\n        observation = self._get_obs()\n        reward = rewards - costs\n        terminated = self.terminated\n        info = {\n            \"x_position\": x_position_after,\n            \"x_velocity\": x_velocity,\n        }\n\n        if self.render_mode == \"human\":\n            self.render()\n\n        return 
observation, reward, terminated, False, info\n\n    def reset_model(self):\n        noise_low = -self._reset_noise_scale\n        noise_high = self._reset_noise_scale\n\n        qpos = self.init_qpos + self.np_random.uniform(\n            low=noise_low, high=noise_high, size=self.model.nq\n        )\n        qvel = self.init_qvel + self.np_random.uniform(\n            low=noise_low, high=noise_high, size=self.model.nv\n        )\n\n        self.set_state(qpos, qvel)\n\n        observation = self._get_obs()\n        return observation\n\n    def viewer_setup(self):\n        assert self.viewer is not None\n        for key, value in DEFAULT_CAMERA_CONFIG.items():\n            if isinstance(value, np.ndarray):\n                getattr(self.viewer.cam, key)[:] = value\n            else:\n                setattr(self.viewer.cam, key, value)\n"
  },
  {
    "path": "gym/envs/registration.py",
    "content": "import contextlib\nimport copy\nimport difflib\nimport importlib\nimport importlib.util\nimport re\nimport sys\nimport warnings\nfrom dataclasses import dataclass, field\nfrom typing import (\n    Callable,\n    Dict,\n    List,\n    Optional,\n    Sequence,\n    SupportsFloat,\n    Tuple,\n    Union,\n    overload,\n)\n\nimport numpy as np\n\nfrom gym.wrappers import (\n    AutoResetWrapper,\n    HumanRendering,\n    OrderEnforcing,\n    RenderCollection,\n    TimeLimit,\n)\nfrom gym.wrappers.compatibility import EnvCompatibility\nfrom gym.wrappers.env_checker import PassiveEnvChecker\n\nif sys.version_info < (3, 10):\n    import importlib_metadata as metadata  # type: ignore\nelse:\n    import importlib.metadata as metadata\n\nif sys.version_info >= (3, 8):\n    from typing import Literal\nelse:\n    from typing_extensions import Literal\n\nfrom gym import Env, error, logger\n\nENV_ID_RE = re.compile(\n    r\"^(?:(?P<namespace>[\\w:-]+)\\/)?(?:(?P<name>[\\w:.-]+?))(?:-v(?P<version>\\d+))?$\"\n)\n\n\ndef load(name: str) -> callable:\n    \"\"\"Loads an environment with name and returns an environment creation function\n\n    Args:\n        name: The environment name\n\n    Returns:\n        Calls the environment constructor\n    \"\"\"\n    mod_name, attr_name = name.split(\":\")\n    mod = importlib.import_module(mod_name)\n    fn = getattr(mod, attr_name)\n    return fn\n\n\ndef parse_env_id(id: str) -> Tuple[Optional[str], str, Optional[int]]:\n    \"\"\"Parse environment ID string format.\n\n    This format is true today, but it's *not* an official spec.\n    [namespace/](env-name)-v(version)    env-name is group 1, version is group 2\n\n    2016-10-31: We're experimentally expanding the environment ID format\n    to include an optional namespace.\n\n    Args:\n        id: The environment id to parse\n\n    Returns:\n        A tuple of environment namespace, environment name and version number\n\n    Raises:\n        Error: If the environment 
id does not match the environment ID regex\n    \"\"\"\n    match = ENV_ID_RE.fullmatch(id)\n    if not match:\n        raise error.Error(\n            f\"Malformed environment ID: {id}.\"\n            f\"(Currently all IDs must be of the form [namespace/](env-name)-v(version). (namespace is optional))\"\n        )\n    namespace, name, version = match.group(\"namespace\", \"name\", \"version\")\n    if version is not None:\n        version = int(version)\n\n    return namespace, name, version\n\n\ndef get_env_id(ns: Optional[str], name: str, version: Optional[int]) -> str:\n    \"\"\"Get the full env ID given a name and (optional) version and namespace. Inverse of :meth:`parse_env_id`.\n\n    Args:\n        ns: The environment namespace\n        name: The environment name\n        version: The environment version\n\n    Returns:\n        The environment id\n    \"\"\"\n\n    full_name = name\n    if version is not None:\n        full_name += f\"-v{version}\"\n    if ns is not None:\n        full_name = ns + \"/\" + full_name\n    return full_name\n\n\n@dataclass\nclass EnvSpec:\n    \"\"\"A specification for creating environments with `gym.make`.\n\n    * id: The string used to create the environment with `gym.make`\n    * entry_point: The callable, or `module:attr` string, from which the environment is created\n    * reward_threshold: The reward threshold for completing the environment.\n    * nondeterministic: Whether the observations of an environment cannot be reproduced given the same initial state, random number generator state and actions.\n    * max_episode_steps: The max number of steps that the environment can take before truncation\n    * order_enforce: Whether to enforce the order of `reset` before `step` and `render` functions\n    * autoreset: Whether to automatically reset the environment on episode end\n    * disable_env_checker: Whether to disable the environment checker wrapper in `gym.make`, by default False (runs the environment checker)\n    * kwargs: Additional keyword arguments passed to the 
environments through `gym.make`\n    \"\"\"\n\n    id: str\n    entry_point: Union[Callable, str]\n\n    # Environment attributes\n    reward_threshold: Optional[float] = field(default=None)\n    nondeterministic: bool = field(default=False)\n\n    # Wrappers\n    max_episode_steps: Optional[int] = field(default=None)\n    order_enforce: bool = field(default=True)\n    autoreset: bool = field(default=False)\n    disable_env_checker: bool = field(default=False)\n    apply_api_compatibility: bool = field(default=False)\n\n    # Environment arguments\n    kwargs: dict = field(default_factory=dict)\n\n    # post-init attributes\n    namespace: Optional[str] = field(init=False)\n    name: str = field(init=False)\n    version: Optional[int] = field(init=False)\n\n    def __post_init__(self):\n        # Initialize namespace, name, version\n        self.namespace, self.name, self.version = parse_env_id(self.id)\n\n    def make(self, **kwargs) -> Env:\n        # For compatibility purposes\n        return make(self, **kwargs)\n\n\ndef _check_namespace_exists(ns: Optional[str]):\n    \"\"\"Check if a namespace exists. If it doesn't, print a helpful error message.\"\"\"\n    if ns is None:\n        return\n    namespaces = {\n        spec_.namespace for spec_ in registry.values() if spec_.namespace is not None\n    }\n    if ns in namespaces:\n        return\n\n    suggestion = (\n        difflib.get_close_matches(ns, namespaces, n=1) if len(namespaces) > 0 else None\n    )\n    suggestion_msg = (\n        f\"Did you mean: `{suggestion[0]}`?\"\n        if suggestion\n        else f\"Have you installed the proper package for {ns}?\"\n    )\n\n    raise error.NamespaceNotFound(f\"Namespace {ns} not found. {suggestion_msg}\")\n\n\ndef _check_name_exists(ns: Optional[str], name: str):\n    \"\"\"Check if an env exists in a namespace. 
If it doesn't, print a helpful error message.\"\"\"\n    _check_namespace_exists(ns)\n    names = {spec_.name for spec_ in registry.values() if spec_.namespace == ns}\n\n    if name in names:\n        return\n\n    suggestion = difflib.get_close_matches(name, names, n=1)\n    namespace_msg = f\" in namespace {ns}\" if ns else \"\"\n    suggestion_msg = f\"Did you mean: `{suggestion[0]}`?\" if suggestion else \"\"\n\n    raise error.NameNotFound(\n        f\"Environment {name} doesn't exist{namespace_msg}. {suggestion_msg}\"\n    )\n\n\ndef _check_version_exists(ns: Optional[str], name: str, version: Optional[int]):\n    \"\"\"Check if an env version exists in a namespace. If it doesn't, print a helpful error message.\n    This is a complete test whether an environment identifier is valid, and will provide the best available hints.\n\n    Args:\n        ns: The environment namespace\n        name: The environment space\n        version: The environment version\n\n    Raises:\n        DeprecatedEnv: The environment doesn't exist but a default version does\n        VersionNotFound: The ``version`` used doesn't exist\n        DeprecatedEnv: Environment version is deprecated\n    \"\"\"\n    if get_env_id(ns, name, version) in registry:\n        return\n\n    _check_name_exists(ns, name)\n    if version is None:\n        return\n\n    message = f\"Environment version `v{version}` for environment `{get_env_id(ns, name, None)}` doesn't exist.\"\n\n    env_specs = [\n        spec_\n        for spec_ in registry.values()\n        if spec_.namespace == ns and spec_.name == name\n    ]\n    env_specs = sorted(env_specs, key=lambda spec_: int(spec_.version or -1))\n\n    default_spec = [spec_ for spec_ in env_specs if spec_.version is None]\n\n    if default_spec:\n        message += f\" It provides the default version {default_spec[0].id}`.\"\n        if len(env_specs) == 1:\n            raise error.DeprecatedEnv(message)\n\n    # Process possible versioned environments\n\n   
 versioned_specs = [spec_ for spec_ in env_specs if spec_.version is not None]\n\n    latest_spec = max(versioned_specs, key=lambda spec: spec.version, default=None)  # type: ignore\n    if latest_spec is not None and version > latest_spec.version:\n        version_list_msg = \", \".join(f\"`v{spec_.version}`\" for spec_ in env_specs)\n        message += f\" It provides versioned environments: [ {version_list_msg} ].\"\n\n        raise error.VersionNotFound(message)\n\n    if latest_spec is not None and version < latest_spec.version:\n        raise error.DeprecatedEnv(\n            f\"Environment version v{version} for `{get_env_id(ns, name, None)}` is deprecated. \"\n            f\"Please use `{latest_spec.id}` instead.\"\n        )\n\n\ndef find_highest_version(ns: Optional[str], name: str) -> Optional[int]:\n    version: List[int] = [\n        spec_.version\n        for spec_ in registry.values()\n        if spec_.namespace == ns and spec_.name == name and spec_.version is not None\n    ]\n    return max(version, default=None)\n\n\ndef load_env_plugins(entry_point: str = \"gym.envs\") -> None:\n    # Load third-party environments\n    for plugin in metadata.entry_points(group=entry_point):\n        # Python 3.8 doesn't support plugin.module, plugin.attr\n        # So we'll have to try and parse this ourselves\n        module, attr = None, None\n        try:\n            module, attr = plugin.module, plugin.attr  # type: ignore  ## error: Cannot access member \"attr\" for type \"EntryPoint\"\n        except AttributeError:\n            if \":\" in plugin.value:\n                module, attr = plugin.value.split(\":\", maxsplit=1)\n            else:\n                module, attr = plugin.value, None\n        except Exception as e:\n            warnings.warn(\n                f\"While trying to load plugin `{plugin}` from {entry_point}, an exception occurred: {e}\"\n            )\n            module, attr = None, None\n        finally:\n            if attr is 
None:\n                raise error.Error(\n                    f\"Gym environment plugin `{module}` must specify a function to execute, not a root module\"\n                )\n\n        context = namespace(plugin.name)\n        if plugin.name.startswith(\"__\") and plugin.name.endswith(\"__\"):\n            # `__internal__` is an artifact of the plugin system when\n            # the root namespace had an allow-list. The allow-list is now\n            # removed and plugins can register environments in the root\n            # namespace with the `__root__` magic key.\n            if plugin.name == \"__root__\" or plugin.name == \"__internal__\":\n                context = contextlib.nullcontext()\n            else:\n                logger.warn(\n                    f\"The environment namespace magic key `{plugin.name}` is unsupported. \"\n                    \"To register an environment at the root namespace you should specify the `__root__` namespace.\"\n                )\n\n        with context:\n            fn = plugin.load()\n            try:\n                fn()\n            except Exception as e:\n                logger.warn(str(e))\n\n\n# fmt: off\n@overload\ndef make(id: str, **kwargs) -> Env: ...\n@overload\ndef make(id: EnvSpec, **kwargs) -> Env: ...\n\n\n# Classic control\n# ----------------------------------------\n@overload\ndef make(id: Literal[\"CartPole-v0\", \"CartPole-v1\"], **kwargs) -> Env[np.ndarray, Union[np.ndarray, int]]: ...\n@overload\ndef make(id: Literal[\"MountainCar-v0\"], **kwargs) -> Env[np.ndarray, Union[np.ndarray, int]]: ...\n@overload\ndef make(id: Literal[\"MountainCarContinuous-v0\"], **kwargs) -> Env[np.ndarray, Union[np.ndarray, Sequence[SupportsFloat]]]: ...\n@overload\ndef make(id: Literal[\"Pendulum-v1\"], **kwargs) -> Env[np.ndarray, Union[np.ndarray, Sequence[SupportsFloat]]]: ...\n@overload\ndef make(id: Literal[\"Acrobot-v1\"], **kwargs) -> Env[np.ndarray, Union[np.ndarray, int]]: ...\n\n\n# Box2d\n# 
----------------------------------------\n@overload\ndef make(id: Literal[\"LunarLander-v2\", \"LunarLanderContinuous-v2\"], **kwargs) -> Env[np.ndarray, Union[np.ndarray, int]]: ...\n@overload\ndef make(id: Literal[\"BipedalWalker-v3\", \"BipedalWalkerHardcore-v3\"], **kwargs) -> Env[np.ndarray, Union[np.ndarray, Sequence[SupportsFloat]]]: ...\n@overload\ndef make(id: Literal[\"CarRacing-v2\"], **kwargs) -> Env[np.ndarray, Union[np.ndarray, Sequence[SupportsFloat]]]: ...\n\n\n# Toy Text\n# ----------------------------------------\n@overload\ndef make(id: Literal[\"Blackjack-v1\"], **kwargs) -> Env[np.ndarray, Union[np.ndarray, int]]: ...\n@overload\ndef make(id: Literal[\"FrozenLake-v1\", \"FrozenLake8x8-v1\"], **kwargs) -> Env[np.ndarray, Union[np.ndarray, int]]: ...\n@overload\ndef make(id: Literal[\"CliffWalking-v0\"], **kwargs) -> Env[np.ndarray, Union[np.ndarray, int]]: ...\n@overload\ndef make(id: Literal[\"Taxi-v3\"], **kwargs) -> Env[np.ndarray, Union[np.ndarray, int]]: ...\n\n\n# Mujoco\n# ----------------------------------------\n@overload\ndef make(id: Literal[\n    \"Reacher-v2\", \"Reacher-v4\",\n    \"Pusher-v2\", \"Pusher-v4\",\n    \"InvertedPendulum-v2\", \"InvertedPendulum-v4\",\n    \"InvertedDoublePendulum-v2\", \"InvertedDoublePendulum-v4\",\n    \"HalfCheetah-v2\", \"HalfCheetah-v3\", \"HalfCheetah-v4\",\n    \"Hopper-v2\", \"Hopper-v3\", \"Hopper-v4\",\n    \"Swimmer-v2\", \"Swimmer-v3\", \"Swimmer-v4\",\n    \"Walker2d-v2\", \"Walker2d-v3\", \"Walker2d-v4\",\n    \"Ant-v2\", \"Ant-v3\", \"Ant-v4\",\n    \"HumanoidStandup-v2\", \"HumanoidStandup-v4\",\n    \"Humanoid-v2\", \"Humanoid-v3\", \"Humanoid-v4\",\n], **kwargs) -> Env[np.ndarray, np.ndarray]: ...\n# fmt: on\n\n\n# Global registry of environments. Meant to be accessed through `register` and `make`\nregistry: Dict[str, EnvSpec] = {}\ncurrent_namespace: Optional[str] = None\n\n\ndef _check_spec_register(spec: EnvSpec):\n    \"\"\"Checks whether the spec is valid to be registered. 
Helper function for `register`.\"\"\"\n    global registry\n    latest_versioned_spec = max(\n        (\n            spec_\n            for spec_ in registry.values()\n            if spec_.namespace == spec.namespace\n            and spec_.name == spec.name\n            and spec_.version is not None\n        ),\n        key=lambda spec_: int(spec_.version),  # type: ignore\n        default=None,\n    )\n\n    unversioned_spec = next(\n        (\n            spec_\n            for spec_ in registry.values()\n            if spec_.namespace == spec.namespace\n            and spec_.name == spec.name\n            and spec_.version is None\n        ),\n        None,\n    )\n\n    if unversioned_spec is not None and spec.version is not None:\n        raise error.RegistrationError(\n            \"Can't register the versioned environment \"\n            f\"`{spec.id}` when the unversioned environment \"\n            f\"`{unversioned_spec.id}` of the same name already exists.\"\n        )\n    elif latest_versioned_spec is not None and spec.version is None:\n        raise error.RegistrationError(\n            \"Can't register the unversioned environment \"\n            f\"`{spec.id}` when the versioned environment \"\n            f\"`{latest_versioned_spec.id}` of the same name \"\n            f\"already exists. 
Note: the default behavior is \"\n            f\"that `gym.make` with the unversioned environment \"\n            f\"will return the latest versioned environment\"\n        )\n\n\n# Public API\n\n\n@contextlib.contextmanager\ndef namespace(ns: str):\n    global current_namespace\n    old_namespace = current_namespace\n    current_namespace = ns\n    try:\n        yield\n    finally:\n        # Restore the previous namespace even if the `with` body raises\n        current_namespace = old_namespace\n\n\ndef register(\n    id: str,\n    entry_point: Union[Callable, str],\n    reward_threshold: Optional[float] = None,\n    nondeterministic: bool = False,\n    max_episode_steps: Optional[int] = None,\n    order_enforce: bool = True,\n    autoreset: bool = False,\n    disable_env_checker: bool = False,\n    apply_api_compatibility: bool = False,\n    **kwargs,\n):\n    \"\"\"Register an environment with gym.\n\n    The `id` parameter corresponds to the name of the environment, with the syntax as follows:\n    `(namespace)/(env_name)-v(version)` where `namespace` is optional.\n\n    It takes arbitrary keyword arguments, which are passed to the `EnvSpec` constructor.\n\n    Args:\n        id: The environment id\n        entry_point: The entry point for creating the environment\n        reward_threshold: The reward threshold considered to have learnt an environment\n        nondeterministic: If the environment is nondeterministic (even with knowledge of the initial seed and all actions)\n        max_episode_steps: The maximum number of episode steps before truncation. Used by the Time Limit wrapper.\n        order_enforce: Whether to enable the order enforcer wrapper to ensure users run functions in the correct order\n        autoreset: Whether to add the autoreset wrapper such that reset does not need to be called.\n        disable_env_checker: Whether to disable the environment checker for the environment. 
Recommended to False.\n        apply_api_compatibility: If to apply the `StepAPICompatibility` wrapper.\n        **kwargs: arbitrary keyword arguments which are passed to the environment constructor\n    \"\"\"\n    global registry, current_namespace\n    ns, name, version = parse_env_id(id)\n\n    if current_namespace is not None:\n        if (\n            kwargs.get(\"namespace\") is not None\n            and kwargs.get(\"namespace\") != current_namespace\n        ):\n            logger.warn(\n                f\"Custom namespace `{kwargs.get('namespace')}` is being overridden by namespace `{current_namespace}`. \"\n                f\"If you are developing a plugin you shouldn't specify a namespace in `register` calls. \"\n                \"The namespace is specified through the entry point package metadata.\"\n            )\n        ns_id = current_namespace\n    else:\n        ns_id = ns\n\n    full_id = get_env_id(ns_id, name, version)\n\n    new_spec = EnvSpec(\n        id=full_id,\n        entry_point=entry_point,\n        reward_threshold=reward_threshold,\n        nondeterministic=nondeterministic,\n        max_episode_steps=max_episode_steps,\n        order_enforce=order_enforce,\n        autoreset=autoreset,\n        disable_env_checker=disable_env_checker,\n        apply_api_compatibility=apply_api_compatibility,\n        **kwargs,\n    )\n    _check_spec_register(new_spec)\n    if new_spec.id in registry:\n        logger.warn(f\"Overriding environment {new_spec.id} already in registry.\")\n    registry[new_spec.id] = new_spec\n\n\ndef make(\n    id: Union[str, EnvSpec],\n    max_episode_steps: Optional[int] = None,\n    autoreset: bool = False,\n    apply_api_compatibility: Optional[bool] = None,\n    disable_env_checker: Optional[bool] = None,\n    **kwargs,\n) -> Env:\n    \"\"\"Create an environment according to the given ID.\n\n    To find all available environments use `gym.envs.registry.keys()` for all valid ids.\n\n    Args:\n        id: Name of 
the environment. Optionally, a module to import can be included, e.g. 'module:Env-v0'\n        max_episode_steps: Maximum length of an episode (TimeLimit wrapper).\n        autoreset: Whether to automatically reset the environment after each episode (AutoResetWrapper).\n        apply_api_compatibility: Whether to wrap the environment with the `StepAPICompatibility` wrapper that\n            converts the environment step from a done bool to return termination and truncation bools.\n            If None (the default), the `apply_api_compatibility` value of the environment specification is used,\n            which defaults to False. If `True`, the wrapper is applied; if `False`, it is not.\n        disable_env_checker: Whether to disable the env checker. If None (the default), the `disable_env_checker`\n            value of the environment specification is used (by default False, which runs the environment checker);\n            otherwise this parameter decides (`True` = not run, `False` = run).\n        kwargs: Additional arguments to pass to the environment constructor.\n\n    Returns:\n        An instance of the environment.\n\n    Raises:\n        Error: If the ``id`` doesn't exist then an error is raised\n    \"\"\"\n    if isinstance(id, EnvSpec):\n        spec_ = id\n    else:\n        module, id = (None, id) if \":\" not in id else id.split(\":\")\n        if module is not None:\n            try:\n                importlib.import_module(module)\n            except ModuleNotFoundError as e:\n                raise ModuleNotFoundError(\n                    f\"{e}. Environment registration via importing a module failed. 
\"\n                    f\"Check whether '{module}' contains env registration and can be imported.\"\n                )\n        spec_ = registry.get(id)\n\n        ns, name, version = parse_env_id(id)\n        latest_version = find_highest_version(ns, name)\n        if (\n            version is not None\n            and latest_version is not None\n            and latest_version > version\n        ):\n            logger.warn(\n                f\"The environment {id} is out of date. You should consider \"\n                f\"upgrading to version `v{latest_version}`.\"\n            )\n        if version is None and latest_version is not None:\n            version = latest_version\n            new_env_id = get_env_id(ns, name, version)\n            spec_ = registry.get(new_env_id)\n            logger.warn(\n                f\"Using the latest versioned environment `{new_env_id}` \"\n                f\"instead of the unversioned environment `{id}`.\"\n            )\n\n        if spec_ is None:\n            _check_version_exists(ns, name, version)\n            raise error.Error(f\"No registered env with id: {id}\")\n\n    _kwargs = spec_.kwargs.copy()\n    _kwargs.update(kwargs)\n\n    if spec_.entry_point is None:\n        raise error.Error(f\"{spec_.id} registered but entry_point is not specified\")\n    elif callable(spec_.entry_point):\n        env_creator = spec_.entry_point\n    else:\n        # Assume it's a string\n        env_creator = load(spec_.entry_point)\n\n    mode = _kwargs.get(\"render_mode\")\n    apply_human_rendering = False\n    apply_render_collection = False\n\n    # If we have access to metadata we check that \"render_mode\" is valid and see if the HumanRendering wrapper needs to be applied\n    if mode is not None and hasattr(env_creator, \"metadata\"):\n        assert isinstance(\n            env_creator.metadata, dict\n        ), f\"Expect the environment creator ({env_creator}) metadata to be dict, actual type: 
{type(env_creator.metadata)}\"\n\n        if \"render_modes\" in env_creator.metadata:\n            render_modes = env_creator.metadata[\"render_modes\"]\n            if not isinstance(render_modes, Sequence):\n                logger.warn(\n                    f\"Expects the environment metadata render_modes to be a Sequence (tuple or list), actual type: {type(render_modes)}\"\n                )\n\n            # Apply the `HumanRendering` wrapper, if the mode==\"human\" but \"human\" not in render_modes\n            if (\n                mode == \"human\"\n                and \"human\" not in render_modes\n                and (\"rgb_array\" in render_modes or \"rgb_array_list\" in render_modes)\n            ):\n                logger.warn(\n                    \"You are trying to use 'human' rendering for an environment that doesn't natively support it. \"\n                    \"The HumanRendering wrapper is being applied to your environment.\"\n                )\n                apply_human_rendering = True\n                if \"rgb_array\" in render_modes:\n                    _kwargs[\"render_mode\"] = \"rgb_array\"\n                else:\n                    _kwargs[\"render_mode\"] = \"rgb_array_list\"\n            elif (\n                mode not in render_modes\n                and mode.endswith(\"_list\")\n                and mode[: -len(\"_list\")] in render_modes\n            ):\n                _kwargs[\"render_mode\"] = mode[: -len(\"_list\")]\n                apply_render_collection = True\n            elif mode not in render_modes:\n                logger.warn(\n                    f\"The environment is being initialised with mode ({mode}) that is not in the possible render_modes ({render_modes}).\"\n                )\n        else:\n            logger.warn(\n                f\"The environment creator metadata doesn't include `render_modes`, contains: {list(env_creator.metadata.keys())}\"\n            )\n\n    if apply_api_compatibility is True or (\n 
       apply_api_compatibility is None and spec_.apply_api_compatibility is True\n    ):\n        # If we use the compatibility layer, we treat the render mode explicitly and don't pass it to the env creator\n        render_mode = _kwargs.pop(\"render_mode\", None)\n    else:\n        render_mode = None\n\n    try:\n        env = env_creator(**_kwargs)\n    except TypeError as e:\n        if (\n            str(e).find(\"got an unexpected keyword argument 'render_mode'\") >= 0\n            and apply_human_rendering\n        ):\n            raise error.Error(\n                f\"You passed render_mode='human' although {id} doesn't implement human-rendering natively. \"\n                \"Gym tried to apply the HumanRendering wrapper but it looks like your environment is using the old \"\n                \"rendering API, which is not supported by the HumanRendering wrapper.\"\n            )\n        else:\n            raise e\n\n    # Copies the environment creation specification and kwargs to add to the environment specification details\n    spec_ = copy.deepcopy(spec_)\n    spec_.kwargs = _kwargs\n    env.unwrapped.spec = spec_\n\n    # Add step API wrapper\n    if apply_api_compatibility is True or (\n        apply_api_compatibility is None and spec_.apply_api_compatibility is True\n    ):\n        env = EnvCompatibility(env, render_mode)\n\n    # Run the environment checker as the lowest level wrapper\n    if disable_env_checker is False or (\n        disable_env_checker is None and spec_.disable_env_checker is False\n    ):\n        env = PassiveEnvChecker(env)\n\n    # Add the order enforcing wrapper\n    if spec_.order_enforce:\n        env = OrderEnforcing(env)\n\n    # Add the time limit wrapper\n    if max_episode_steps is not None:\n        env = TimeLimit(env, max_episode_steps)\n    elif spec_.max_episode_steps is not None:\n        env = TimeLimit(env, spec_.max_episode_steps)\n\n    # Add the autoreset wrapper\n    if autoreset:\n        env = 
AutoResetWrapper(env)\n\n    # Add human rendering wrapper\n    if apply_human_rendering:\n        env = HumanRendering(env)\n    elif apply_render_collection:\n        env = RenderCollection(env)\n\n    return env\n\n\ndef spec(env_id: str) -> EnvSpec:\n    \"\"\"Retrieve the spec for the given environment from the global registry.\"\"\"\n    spec_ = registry.get(env_id)\n    if spec_ is None:\n        ns, name, version = parse_env_id(env_id)\n        _check_version_exists(ns, name, version)\n        raise error.Error(f\"No registered env with id: {env_id}\")\n    else:\n        assert isinstance(spec_, EnvSpec)\n        return spec_\n"
  },
  {
    "path": "gym/envs/toy_text/__init__.py",
    "content": "from gym.envs.toy_text.blackjack import BlackjackEnv\nfrom gym.envs.toy_text.cliffwalking import CliffWalkingEnv\nfrom gym.envs.toy_text.frozen_lake import FrozenLakeEnv\nfrom gym.envs.toy_text.taxi import TaxiEnv\n"
  },
  {
    "path": "gym/envs/toy_text/blackjack.py",
    "content": "import os\nfrom typing import Optional\n\nimport numpy as np\n\nimport gym\nfrom gym import spaces\nfrom gym.error import DependencyNotInstalled\n\n\ndef cmp(a, b):\n    return float(a > b) - float(a < b)\n\n\n# 1 = Ace, 2-10 = Number cards, Jack/Queen/King = 10\ndeck = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]\n\n\ndef draw_card(np_random):\n    return int(np_random.choice(deck))\n\n\ndef draw_hand(np_random):\n    return [draw_card(np_random), draw_card(np_random)]\n\n\ndef usable_ace(hand):  # Does this hand have a usable ace?\n    return 1 in hand and sum(hand) + 10 <= 21\n\n\ndef sum_hand(hand):  # Return current hand total\n    if usable_ace(hand):\n        return sum(hand) + 10\n    return sum(hand)\n\n\ndef is_bust(hand):  # Is this hand a bust?\n    return sum_hand(hand) > 21\n\n\ndef score(hand):  # What is the score of this hand (0 if bust)\n    return 0 if is_bust(hand) else sum_hand(hand)\n\n\ndef is_natural(hand):  # Is this hand a natural blackjack?\n    return sorted(hand) == [1, 10]\n\n\nclass BlackjackEnv(gym.Env):\n    \"\"\"\n    Blackjack is a card game where the goal is to beat the dealer by obtaining cards\n    that sum to closer to 21 (without going over 21) than the dealers cards.\n\n    ### Description\n    Card Values:\n\n    - Face cards (Jack, Queen, King) have a point value of 10.\n    - Aces can either count as 11 (called a 'usable ace') or 1.\n    - Numerical cards (2-9) have a value equal to their number.\n\n    This game is played with an infinite deck (or with replacement).\n    The game starts with the dealer having one face up and one face down card,\n    while the player has two face up cards.\n\n    The player can request additional cards (hit, action=1) until they decide to stop (stick, action=0)\n    or exceed 21 (bust, immediate loss).\n    After the player sticks, the dealer reveals their facedown card, and draws\n    until their sum is 17 or greater.  
If the dealer goes bust, the player wins.\n    If neither the player nor the dealer busts, the outcome (win, lose, draw) is\n    decided by whose sum is closer to 21.\n\n    ### Action Space\n    There are two actions: stick (0), and hit (1).\n\n    ### Observation Space\n    The observation consists of a 3-tuple containing: the player's current sum,\n    the value of the dealer's one showing card (1-10 where 1 is ace),\n    and whether the player holds a usable ace (0 or 1).\n\n    This environment corresponds to the version of the blackjack problem\n    described in Example 5.1 in Reinforcement Learning: An Introduction\n    by Sutton and Barto (http://incompleteideas.net/book/the-book-2nd.html).\n\n    ### Rewards\n    - win game: +1\n    - lose game: -1\n    - draw game: 0\n    - win game with natural blackjack:\n\n        +1.5 (if <a href=\"#nat\">natural</a> is True)\n\n        +1 (if <a href=\"#nat\">natural</a> is False)\n\n    ### Arguments\n\n    ```\n    gym.make('Blackjack-v1', natural=False, sab=False)\n    ```\n\n    <a id=\"nat\">`natural=False`</a>: Whether to give an additional reward for\n    starting with a natural blackjack, i.e. starting with an ace and ten (sum is 21).\n\n    <a id=\"sab\">`sab=False`</a>: Whether to follow the exact rules outlined in the book by\n    Sutton and Barto. If `sab` is `True`, the keyword argument `natural` will be ignored.\n    If the player achieves a natural blackjack and the dealer does not, the player\n    will win (i.e. get a reward of +1). The reverse rule does not apply.\n    If both the player and the dealer get a natural, it will be a draw (i.e. 
reward 0).\n\n    ### Version History\n    * v0: Initial version release (1.0.0)\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\"human\", \"rgb_array\"],\n        \"render_fps\": 4,\n    }\n\n    def __init__(self, render_mode: Optional[str] = None, natural=False, sab=False):\n        self.action_space = spaces.Discrete(2)\n        self.observation_space = spaces.Tuple(\n            (spaces.Discrete(32), spaces.Discrete(11), spaces.Discrete(2))\n        )\n\n        # Flag to pay out 1.5 on a \"natural\" blackjack win, like casino rules\n        # Ref: http://www.bicyclecards.com/how-to-play/blackjack/\n        self.natural = natural\n\n        # Flag for full agreement with the (Sutton and Barto, 2018) definition. Overrides self.natural\n        self.sab = sab\n\n        self.render_mode = render_mode\n\n    def step(self, action):\n        assert self.action_space.contains(action)\n        if action:  # hit: add a card to player's hand and return\n            self.player.append(draw_card(self.np_random))\n            if is_bust(self.player):\n                terminated = True\n                reward = -1.0\n            else:\n                terminated = False\n                reward = 0.0\n        else:  # stick: play out the dealer's hand, and score\n            terminated = True\n            while sum_hand(self.dealer) < 17:\n                self.dealer.append(draw_card(self.np_random))\n            reward = cmp(score(self.player), score(self.dealer))\n            if self.sab and is_natural(self.player) and not is_natural(self.dealer):\n                # Player automatically wins. Rules consistent with S&B\n                reward = 1.0\n            elif (\n                not self.sab\n                and self.natural\n                and is_natural(self.player)\n                and reward == 1.0\n            ):\n                # Natural gives extra points, but doesn't autowin. 
Legacy implementation\n                reward = 1.5\n\n        if self.render_mode == \"human\":\n            self.render()\n        return self._get_obs(), reward, terminated, False, {}\n\n    def _get_obs(self):\n        return (sum_hand(self.player), self.dealer[0], usable_ace(self.player))\n\n    def reset(\n        self,\n        seed: Optional[int] = None,\n        options: Optional[dict] = None,\n    ):\n        super().reset(seed=seed)\n        self.dealer = draw_hand(self.np_random)\n        self.player = draw_hand(self.np_random)\n\n        _, dealer_card_value, _ = self._get_obs()\n\n        suits = [\"C\", \"D\", \"H\", \"S\"]\n        self.dealer_top_card_suit = self.np_random.choice(suits)\n\n        if dealer_card_value == 1:\n            self.dealer_top_card_value_str = \"A\"\n        elif dealer_card_value == 10:\n            self.dealer_top_card_value_str = self.np_random.choice([\"J\", \"Q\", \"K\"])\n        else:\n            self.dealer_top_card_value_str = str(dealer_card_value)\n\n        if self.render_mode == \"human\":\n            self.render()\n        return self._get_obs(), {}\n\n    def render(self):\n        if self.render_mode is None:\n            gym.logger.warn(\n                \"You are calling render method without specifying any render mode. \"\n                \"You can specify the render_mode at initialization, \"\n                f'e.g. 
gym(\"{self.spec.id}\", render_mode=\"rgb_array\")'\n            )\n            return\n\n        try:\n            import pygame\n        except ImportError:\n            raise DependencyNotInstalled(\n                \"pygame is not installed, run `pip install gym[toy_text]`\"\n            )\n\n        player_sum, dealer_card_value, usable_ace = self._get_obs()\n        screen_width, screen_height = 600, 500\n        card_img_height = screen_height // 3\n        card_img_width = int(card_img_height * 142 / 197)\n        spacing = screen_height // 20\n\n        bg_color = (7, 99, 36)\n        white = (255, 255, 255)\n\n        if not hasattr(self, \"screen\"):\n            pygame.init()\n            if self.render_mode == \"human\":\n                pygame.display.init()\n                self.screen = pygame.display.set_mode((screen_width, screen_height))\n            else:\n                pygame.font.init()\n                self.screen = pygame.Surface((screen_width, screen_height))\n\n        if not hasattr(self, \"clock\"):\n            self.clock = pygame.time.Clock()\n\n        self.screen.fill(bg_color)\n\n        def get_image(path):\n            cwd = os.path.dirname(__file__)\n            image = pygame.image.load(os.path.join(cwd, path))\n            return image\n\n        def get_font(path, size):\n            cwd = os.path.dirname(__file__)\n            font = pygame.font.Font(os.path.join(cwd, path), size)\n            return font\n\n        small_font = get_font(\n            os.path.join(\"font\", \"Minecraft.ttf\"), screen_height // 15\n        )\n        dealer_text = small_font.render(\n            \"Dealer: \" + str(dealer_card_value), True, white\n        )\n        dealer_text_rect = self.screen.blit(dealer_text, (spacing, spacing))\n\n        def scale_card_img(card_img):\n            return pygame.transform.scale(card_img, (card_img_width, card_img_height))\n\n        dealer_card_img = scale_card_img(\n            get_image(\n              
  os.path.join(\n                    \"img\",\n                    f\"{self.dealer_top_card_suit}{self.dealer_top_card_value_str}.png\",\n                )\n            )\n        )\n        dealer_card_rect = self.screen.blit(\n            dealer_card_img,\n            (\n                screen_width // 2 - card_img_width - spacing // 2,\n                dealer_text_rect.bottom + spacing,\n            ),\n        )\n\n        hidden_card_img = scale_card_img(get_image(os.path.join(\"img\", \"Card.png\")))\n        self.screen.blit(\n            hidden_card_img,\n            (\n                screen_width // 2 + spacing // 2,\n                dealer_text_rect.bottom + spacing,\n            ),\n        )\n\n        player_text = small_font.render(\"Player\", True, white)\n        player_text_rect = self.screen.blit(\n            player_text, (spacing, dealer_card_rect.bottom + 1.5 * spacing)\n        )\n\n        large_font = get_font(os.path.join(\"font\", \"Minecraft.ttf\"), screen_height // 6)\n        player_sum_text = large_font.render(str(player_sum), True, white)\n        player_sum_text_rect = self.screen.blit(\n            player_sum_text,\n            (\n                screen_width // 2 - player_sum_text.get_width() // 2,\n                player_text_rect.bottom + spacing,\n            ),\n        )\n\n        if usable_ace:\n            usable_ace_text = small_font.render(\"usable ace\", True, white)\n            self.screen.blit(\n                usable_ace_text,\n                (\n                    screen_width // 2 - usable_ace_text.get_width() // 2,\n                    player_sum_text_rect.bottom + spacing // 2,\n                ),\n            )\n        if self.render_mode == \"human\":\n            pygame.event.pump()\n            pygame.display.update()\n            self.clock.tick(self.metadata[\"render_fps\"])\n        else:\n            return np.transpose(\n                np.array(pygame.surfarray.pixels3d(self.screen)), axes=(1, 0, 
2)\n            )\n\n    def close(self):\n        if hasattr(self, \"screen\"):\n            import pygame\n\n            pygame.display.quit()\n            pygame.quit()\n\n\n# Pixel art from Mariia Khmelnytska (https://www.123rf.com/photo_104453049_stock-vector-pixel-art-playing-cards-standart-deck-vector-set.html)\n"
  },
  {
    "path": "gym/envs/toy_text/cliffwalking.py",
    "content": "from contextlib import closing\nfrom io import StringIO\nfrom os import path\nfrom typing import Optional\n\nimport numpy as np\n\nfrom gym import Env, logger, spaces\nfrom gym.envs.toy_text.utils import categorical_sample\nfrom gym.error import DependencyNotInstalled\n\nUP = 0\nRIGHT = 1\nDOWN = 2\nLEFT = 3\n\n\nclass CliffWalkingEnv(Env):\n    \"\"\"\n    This is a simple implementation of the Gridworld Cliff\n    reinforcement learning task.\n\n    Adapted from Example 6.6 (page 106) from [Reinforcement Learning: An Introduction\n    by Sutton and Barto](http://incompleteideas.net/book/bookdraft2018jan1.pdf).\n\n    With inspiration from:\n    [https://github.com/dennybritz/reinforcement-learning/blob/master/lib/envs/cliff_walking.py]\n    (https://github.com/dennybritz/reinforcement-learning/blob/master/lib/envs/cliff_walking.py)\n\n    ### Description\n    The board is a 4x12 matrix, with (using NumPy matrix indexing):\n    - [3, 0] as the start at bottom-left\n    - [3, 11] as the goal at bottom-right\n    - [3, 1..10] as the cliff at bottom-center\n\n    If the agent steps on the cliff, it returns to the start.\n    An episode terminates when the agent reaches the goal.\n\n    ### Actions\n    There are 4 discrete deterministic actions:\n    - 0: move up\n    - 1: move right\n    - 2: move down\n    - 3: move left\n\n    ### Observations\n    There are 3x12 + 1 possible states. 
The agent can never be on the cliff, nor at the goal\n    (as reaching the goal ends the episode).\n    This leaves all the positions of the first 3 rows plus the bottom-left cell.\n    The observation is simply the current position encoded as [flattened index](https://numpy.org/doc/stable/reference/generated/numpy.unravel_index.html).\n\n    ### Reward\n    Each time step incurs -1 reward, and stepping into the cliff incurs -100 reward.\n\n    ### Arguments\n\n    ```\n    gym.make('CliffWalking-v0')\n    ```\n\n    ### Version History\n    - v0: Initial version release\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\"human\", \"rgb_array\", \"ansi\"],\n        \"render_fps\": 4,\n    }\n\n    def __init__(self, render_mode: Optional[str] = None):\n        self.shape = (4, 12)\n        self.start_state_index = np.ravel_multi_index((3, 0), self.shape)\n\n        self.nS = np.prod(self.shape)\n        self.nA = 4\n\n        # Cliff Location\n        self._cliff = np.zeros(self.shape, dtype=bool)\n        self._cliff[3, 1:-1] = True\n\n        # Calculate transition probabilities and rewards\n        self.P = {}\n        for s in range(self.nS):\n            position = np.unravel_index(s, self.shape)\n            self.P[s] = {a: [] for a in range(self.nA)}\n            self.P[s][UP] = self._calculate_transition_prob(position, [-1, 0])\n            self.P[s][RIGHT] = self._calculate_transition_prob(position, [0, 1])\n            self.P[s][DOWN] = self._calculate_transition_prob(position, [1, 0])\n            self.P[s][LEFT] = self._calculate_transition_prob(position, [0, -1])\n\n        # Calculate initial state distribution\n        # We always start in state (3, 0)\n        self.initial_state_distrib = np.zeros(self.nS)\n        self.initial_state_distrib[self.start_state_index] = 1.0\n\n        self.observation_space = spaces.Discrete(self.nS)\n        self.action_space = spaces.Discrete(self.nA)\n\n        self.render_mode = render_mode\n\n   
     # pygame utils\n        self.cell_size = (60, 60)\n        self.window_size = (\n            self.shape[1] * self.cell_size[1],\n            self.shape[0] * self.cell_size[0],\n        )\n        self.window_surface = None\n        self.clock = None\n        self.elf_images = None\n        self.start_img = None\n        self.goal_img = None\n        self.cliff_img = None\n        self.mountain_bg_img = None\n        self.near_cliff_img = None\n        self.tree_img = None\n\n    def _limit_coordinates(self, coord: np.ndarray) -> np.ndarray:\n        \"\"\"Prevent the agent from falling out of the grid world.\"\"\"\n        coord[0] = min(coord[0], self.shape[0] - 1)\n        coord[0] = max(coord[0], 0)\n        coord[1] = min(coord[1], self.shape[1] - 1)\n        coord[1] = max(coord[1], 0)\n        return coord\n\n    def _calculate_transition_prob(self, current, delta):\n        \"\"\"Determine the outcome for an action. Transition Prob is always 1.0.\n\n        Args:\n            current: Current position on the grid as (row, col)\n            delta: Change in position for transition\n\n        Returns:\n            Tuple of ``(1.0, new_state, reward, terminated)``\n        \"\"\"\n        new_position = np.array(current) + np.array(delta)\n        new_position = self._limit_coordinates(new_position).astype(int)\n        new_state = np.ravel_multi_index(tuple(new_position), self.shape)\n        if self._cliff[tuple(new_position)]:\n            return [(1.0, self.start_state_index, -100, False)]\n\n        terminal_state = (self.shape[0] - 1, self.shape[1] - 1)\n        is_terminated = tuple(new_position) == terminal_state\n        return [(1.0, new_state, -1, is_terminated)]\n\n    def step(self, a):\n        transitions = self.P[self.s][a]\n        i = categorical_sample([t[0] for t in transitions], self.np_random)\n        p, s, r, t = transitions[i]\n        self.s = s\n        self.lastaction = a\n\n        if self.render_mode == \"human\":\n            
self.render()\n        return (int(s), r, t, False, {\"prob\": p})\n\n    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):\n        super().reset(seed=seed)\n        self.s = categorical_sample(self.initial_state_distrib, self.np_random)\n        self.lastaction = None\n\n        if self.render_mode == \"human\":\n            self.render()\n        return int(self.s), {\"prob\": 1}\n\n    def render(self):\n        if self.render_mode is None:\n            logger.warn(\n                \"You are calling render method without specifying any render mode. \"\n                \"You can specify the render_mode at initialization, \"\n                f'e.g. gym(\"{self.spec.id}\", render_mode=\"rgb_array\")'\n            )\n        elif self.render_mode == \"ansi\":\n            return self._render_text()\n        else:\n            return self._render_gui(self.render_mode)\n\n    def _render_gui(self, mode):\n        try:\n            import pygame\n        except ImportError:\n            raise DependencyNotInstalled(\n                \"pygame is not installed, run `pip install gym[toy_text]`\"\n            )\n        if self.window_surface is None:\n            pygame.init()\n\n            if mode == \"human\":\n                pygame.display.init()\n                pygame.display.set_caption(\"CliffWalking\")\n                self.window_surface = pygame.display.set_mode(self.window_size)\n            else:  # rgb_array\n                self.window_surface = pygame.Surface(self.window_size)\n        if self.clock is None:\n            self.clock = pygame.time.Clock()\n        if self.elf_images is None:\n            hikers = [\n                path.join(path.dirname(__file__), \"img/elf_up.png\"),\n                path.join(path.dirname(__file__), \"img/elf_right.png\"),\n                path.join(path.dirname(__file__), \"img/elf_down.png\"),\n                path.join(path.dirname(__file__), \"img/elf_left.png\"),\n            ]\n      
      self.elf_images = [\n                pygame.transform.scale(pygame.image.load(f_name), self.cell_size)\n                for f_name in hikers\n            ]\n        if self.start_img is None:\n            file_name = path.join(path.dirname(__file__), \"img/stool.png\")\n            self.start_img = pygame.transform.scale(\n                pygame.image.load(file_name), self.cell_size\n            )\n        if self.goal_img is None:\n            file_name = path.join(path.dirname(__file__), \"img/cookie.png\")\n            self.goal_img = pygame.transform.scale(\n                pygame.image.load(file_name), self.cell_size\n            )\n        if self.mountain_bg_img is None:\n            bg_imgs = [\n                path.join(path.dirname(__file__), \"img/mountain_bg1.png\"),\n                path.join(path.dirname(__file__), \"img/mountain_bg2.png\"),\n            ]\n            self.mountain_bg_img = [\n                pygame.transform.scale(pygame.image.load(f_name), self.cell_size)\n                for f_name in bg_imgs\n            ]\n        if self.near_cliff_img is None:\n            near_cliff_imgs = [\n                path.join(path.dirname(__file__), \"img/mountain_near-cliff1.png\"),\n                path.join(path.dirname(__file__), \"img/mountain_near-cliff2.png\"),\n            ]\n            self.near_cliff_img = [\n                pygame.transform.scale(pygame.image.load(f_name), self.cell_size)\n                for f_name in near_cliff_imgs\n            ]\n        if self.cliff_img is None:\n            file_name = path.join(path.dirname(__file__), \"img/mountain_cliff.png\")\n            self.cliff_img = pygame.transform.scale(\n                pygame.image.load(file_name), self.cell_size\n            )\n\n        for s in range(self.nS):\n            row, col = np.unravel_index(s, self.shape)\n            pos = (col * self.cell_size[0], row * self.cell_size[1])\n            check_board_mask = row % 2 ^ col % 2\n            
self.window_surface.blit(self.mountain_bg_img[check_board_mask], pos)\n\n            if self._cliff[row, col]:\n                self.window_surface.blit(self.cliff_img, pos)\n            if row < self.shape[0] - 1 and self._cliff[row + 1, col]:\n                self.window_surface.blit(self.near_cliff_img[check_board_mask], pos)\n            if s == self.start_state_index:\n                self.window_surface.blit(self.start_img, pos)\n            if s == self.nS - 1:\n                self.window_surface.blit(self.goal_img, pos)\n            if s == self.s:\n                elf_pos = (pos[0], pos[1] - 0.1 * self.cell_size[1])\n                last_action = self.lastaction if self.lastaction is not None else 2\n                self.window_surface.blit(self.elf_images[last_action], elf_pos)\n\n        if mode == \"human\":\n            pygame.event.pump()\n            pygame.display.update()\n            self.clock.tick(self.metadata[\"render_fps\"])\n        else:  # rgb_array\n            return np.transpose(\n                np.array(pygame.surfarray.pixels3d(self.window_surface)), axes=(1, 0, 2)\n            )\n\n    def _render_text(self):\n        outfile = StringIO()\n\n        for s in range(self.nS):\n            position = np.unravel_index(s, self.shape)\n            if self.s == s:\n                output = \" x \"\n            # Print terminal state\n            elif position == (3, 11):\n                output = \" T \"\n            elif self._cliff[position]:\n                output = \" C \"\n            else:\n                output = \" o \"\n\n            if position[1] == 0:\n                output = output.lstrip()\n            if position[1] == self.shape[1] - 1:\n                output = output.rstrip()\n                output += \"\\n\"\n\n            outfile.write(output)\n        outfile.write(\"\\n\")\n\n        with closing(outfile):\n            return outfile.getvalue()\n"
  },
  {
    "path": "gym/envs/toy_text/frozen_lake.py",
    "content": "from contextlib import closing\nfrom io import StringIO\nfrom os import path\nfrom typing import List, Optional\n\nimport numpy as np\n\nfrom gym import Env, logger, spaces, utils\nfrom gym.envs.toy_text.utils import categorical_sample\nfrom gym.error import DependencyNotInstalled\n\nLEFT = 0\nDOWN = 1\nRIGHT = 2\nUP = 3\n\nMAPS = {\n    \"4x4\": [\"SFFF\", \"FHFH\", \"FFFH\", \"HFFG\"],\n    \"8x8\": [\n        \"SFFFFFFF\",\n        \"FFFFFFFF\",\n        \"FFFHFFFF\",\n        \"FFFFFHFF\",\n        \"FFFHFFFF\",\n        \"FHHFFFHF\",\n        \"FHFFHFHF\",\n        \"FFFHFFFG\",\n    ],\n}\n\n\n# DFS to check that it's a valid path.\ndef is_valid(board: List[List[str]], max_size: int) -> bool:\n    frontier, discovered = [], set()\n    frontier.append((0, 0))\n    while frontier:\n        r, c = frontier.pop()\n        if not (r, c) in discovered:\n            discovered.add((r, c))\n            directions = [(1, 0), (0, 1), (-1, 0), (0, -1)]\n            for x, y in directions:\n                r_new = r + x\n                c_new = c + y\n                if r_new < 0 or r_new >= max_size or c_new < 0 or c_new >= max_size:\n                    continue\n                if board[r_new][c_new] == \"G\":\n                    return True\n                if board[r_new][c_new] != \"H\":\n                    frontier.append((r_new, c_new))\n    return False\n\n\ndef generate_random_map(size: int = 8, p: float = 0.8) -> List[str]:\n    \"\"\"Generates a random valid map (one that has a path from start to goal)\n\n    Args:\n        size: size of each side of the grid\n        p: probability that a tile is frozen\n\n    Returns:\n        A random valid map\n    \"\"\"\n    valid = False\n    board = []  # initialize to make pyright happy\n\n    while not valid:\n        p = min(1, p)\n        board = np.random.choice([\"F\", \"H\"], (size, size), p=[p, 1 - p])\n        board[0][0] = \"S\"\n        board[-1][-1] = \"G\"\n        valid = 
is_valid(board, size)\n    return [\"\".join(x) for x in board]\n\n\nclass FrozenLakeEnv(Env):\n    \"\"\"\n    Frozen lake involves crossing a frozen lake from Start(S) to Goal(G) without falling into any Holes(H)\n    by walking over the Frozen(F) lake.\n    The agent may not always move in the intended direction due to the slippery nature of the frozen lake.\n\n\n    ### Action Space\n    The agent takes a 1-element vector for actions.\n    The action space is `(dir)`, where `dir` is the direction to move in, which can be:\n\n    - 0: LEFT\n    - 1: DOWN\n    - 2: RIGHT\n    - 3: UP\n\n    ### Observation Space\n    The observation is a value representing the agent's current position as\n    current_row * ncols + current_col (where both the row and col start at 0).\n    For example, the goal position in the 4x4 map can be calculated as follows: 3 * 4 + 3 = 15.\n    The number of possible observations is dependent on the size of the map.\n    For example, the 4x4 map has 16 possible observations.\n\n    ### Rewards\n\n    Reward schedule:\n    - Reach goal(G): +1\n    - Reach hole(H): 0\n    - Reach frozen(F): 0\n\n    ### Arguments\n\n    ```\n    gym.make('FrozenLake-v1', desc=None, map_name=\"4x4\", is_slippery=True)\n    ```\n\n    `desc`: Used to specify a custom map for frozen lake. For example,\n\n        desc=[\"SFFF\", \"FHFH\", \"FFFH\", \"HFFG\"].\n\n        A randomly generated map can be specified by calling the function `generate_random_map`. 
For example,\n\n        ```\n        from gym.envs.toy_text.frozen_lake import generate_random_map\n\n        gym.make('FrozenLake-v1', desc=generate_random_map(size=8))\n        ```\n\n    `map_name`: ID to use any of the preloaded maps.\n\n        \"4x4\":[\n            \"SFFF\",\n            \"FHFH\",\n            \"FFFH\",\n            \"HFFG\"\n            ]\n\n        \"8x8\": [\n            \"SFFFFFFF\",\n            \"FFFFFFFF\",\n            \"FFFHFFFF\",\n            \"FFFFFHFF\",\n            \"FFFHFFFF\",\n            \"FHHFFFHF\",\n            \"FHFFHFHF\",\n            \"FFFHFFFG\",\n        ]\n\n    `is_slippery`: True/False. If True will move in intended direction with\n    probability of 1/3 else will move in either perpendicular direction with\n    equal probability of 1/3 in both directions.\n\n        For example, if action is left and is_slippery is True, then:\n        - P(move left)=1/3\n        - P(move up)=1/3\n        - P(move down)=1/3\n\n    ### Version History\n    * v1: Bug fixes to rewards\n    * v0: Initial versions release (1.0.0)\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\"human\", \"ansi\", \"rgb_array\"],\n        \"render_fps\": 4,\n    }\n\n    def __init__(\n        self,\n        render_mode: Optional[str] = None,\n        desc=None,\n        map_name=\"4x4\",\n        is_slippery=True,\n    ):\n        if desc is None and map_name is None:\n            desc = generate_random_map()\n        elif desc is None:\n            desc = MAPS[map_name]\n        self.desc = desc = np.asarray(desc, dtype=\"c\")\n        self.nrow, self.ncol = nrow, ncol = desc.shape\n        self.reward_range = (0, 1)\n\n        nA = 4\n        nS = nrow * ncol\n\n        self.initial_state_distrib = np.array(desc == b\"S\").astype(\"float64\").ravel()\n        self.initial_state_distrib /= self.initial_state_distrib.sum()\n\n        self.P = {s: {a: [] for a in range(nA)} for s in range(nS)}\n\n        def to_s(row, col):\n            
return row * ncol + col\n\n        def inc(row, col, a):\n            if a == LEFT:\n                col = max(col - 1, 0)\n            elif a == DOWN:\n                row = min(row + 1, nrow - 1)\n            elif a == RIGHT:\n                col = min(col + 1, ncol - 1)\n            elif a == UP:\n                row = max(row - 1, 0)\n            return (row, col)\n\n        def update_probability_matrix(row, col, action):\n            newrow, newcol = inc(row, col, action)\n            newstate = to_s(newrow, newcol)\n            newletter = desc[newrow, newcol]\n            terminated = bytes(newletter) in b\"GH\"\n            reward = float(newletter == b\"G\")\n            return newstate, reward, terminated\n\n        for row in range(nrow):\n            for col in range(ncol):\n                s = to_s(row, col)\n                for a in range(4):\n                    li = self.P[s][a]\n                    letter = desc[row, col]\n                    if letter in b\"GH\":\n                        li.append((1.0, s, 0, True))\n                    else:\n                        if is_slippery:\n                            for b in [(a - 1) % 4, a, (a + 1) % 4]:\n                                li.append(\n                                    (1.0 / 3.0, *update_probability_matrix(row, col, b))\n                                )\n                        else:\n                            li.append((1.0, *update_probability_matrix(row, col, a)))\n\n        self.observation_space = spaces.Discrete(nS)\n        self.action_space = spaces.Discrete(nA)\n\n        self.render_mode = render_mode\n\n        # pygame utils\n        self.window_size = (min(64 * ncol, 512), min(64 * nrow, 512))\n        self.cell_size = (\n            self.window_size[0] // self.ncol,\n            self.window_size[1] // self.nrow,\n        )\n        self.window_surface = None\n        self.clock = None\n        self.hole_img = None\n        self.cracked_hole_img = None\n        
self.ice_img = None\n        self.elf_images = None\n        self.goal_img = None\n        self.start_img = None\n\n    def step(self, a):\n        transitions = self.P[self.s][a]\n        i = categorical_sample([t[0] for t in transitions], self.np_random)\n        p, s, r, t = transitions[i]\n        self.s = s\n        self.lastaction = a\n\n        if self.render_mode == \"human\":\n            self.render()\n        return (int(s), r, t, False, {\"prob\": p})\n\n    def reset(\n        self,\n        *,\n        seed: Optional[int] = None,\n        options: Optional[dict] = None,\n    ):\n        super().reset(seed=seed)\n        self.s = categorical_sample(self.initial_state_distrib, self.np_random)\n        self.lastaction = None\n\n        if self.render_mode == \"human\":\n            self.render()\n        return int(self.s), {\"prob\": 1}\n\n    def render(self):\n        if self.render_mode is None:\n            logger.warn(\n                \"You are calling render method without specifying any render mode. \"\n                \"You can specify the render_mode at initialization, \"\n                f'e.g. 
gym.make(\"{self.spec.id}\", render_mode=\"rgb_array\")'\n            )\n        elif self.render_mode == \"ansi\":\n            return self._render_text()\n        else:  # self.render_mode in {\"human\", \"rgb_array\"}:\n            return self._render_gui(self.render_mode)\n\n    def _render_gui(self, mode):\n        try:\n            import pygame\n        except ImportError:\n            raise DependencyNotInstalled(\n                \"pygame is not installed, run `pip install gym[toy_text]`\"\n            )\n\n        if self.window_surface is None:\n            pygame.init()\n\n            if mode == \"human\":\n                pygame.display.init()\n                pygame.display.set_caption(\"Frozen Lake\")\n                self.window_surface = pygame.display.set_mode(self.window_size)\n            elif mode == \"rgb_array\":\n                self.window_surface = pygame.Surface(self.window_size)\n\n        assert (\n            self.window_surface is not None\n        ), \"Something went wrong with pygame. 
This should never happen.\"\n\n        if self.clock is None:\n            self.clock = pygame.time.Clock()\n        if self.hole_img is None:\n            file_name = path.join(path.dirname(__file__), \"img/hole.png\")\n            self.hole_img = pygame.transform.scale(\n                pygame.image.load(file_name), self.cell_size\n            )\n        if self.cracked_hole_img is None:\n            file_name = path.join(path.dirname(__file__), \"img/cracked_hole.png\")\n            self.cracked_hole_img = pygame.transform.scale(\n                pygame.image.load(file_name), self.cell_size\n            )\n        if self.ice_img is None:\n            file_name = path.join(path.dirname(__file__), \"img/ice.png\")\n            self.ice_img = pygame.transform.scale(\n                pygame.image.load(file_name), self.cell_size\n            )\n        if self.goal_img is None:\n            file_name = path.join(path.dirname(__file__), \"img/goal.png\")\n            self.goal_img = pygame.transform.scale(\n                pygame.image.load(file_name), self.cell_size\n            )\n        if self.start_img is None:\n            file_name = path.join(path.dirname(__file__), \"img/stool.png\")\n            self.start_img = pygame.transform.scale(\n                pygame.image.load(file_name), self.cell_size\n            )\n        if self.elf_images is None:\n            elfs = [\n                path.join(path.dirname(__file__), \"img/elf_left.png\"),\n                path.join(path.dirname(__file__), \"img/elf_down.png\"),\n                path.join(path.dirname(__file__), \"img/elf_right.png\"),\n                path.join(path.dirname(__file__), \"img/elf_up.png\"),\n            ]\n            self.elf_images = [\n                pygame.transform.scale(pygame.image.load(f_name), self.cell_size)\n                for f_name in elfs\n            ]\n\n        desc = self.desc.tolist()\n        assert isinstance(desc, list), f\"desc should be a list or an array, got 
{desc}\"\n        for y in range(self.nrow):\n            for x in range(self.ncol):\n                pos = (x * self.cell_size[0], y * self.cell_size[1])\n                rect = (*pos, *self.cell_size)\n\n                self.window_surface.blit(self.ice_img, pos)\n                if desc[y][x] == b\"H\":\n                    self.window_surface.blit(self.hole_img, pos)\n                elif desc[y][x] == b\"G\":\n                    self.window_surface.blit(self.goal_img, pos)\n                elif desc[y][x] == b\"S\":\n                    self.window_surface.blit(self.start_img, pos)\n\n                pygame.draw.rect(self.window_surface, (180, 200, 230), rect, 1)\n\n        # paint the elf\n        bot_row, bot_col = self.s // self.ncol, self.s % self.ncol\n        cell_rect = (bot_col * self.cell_size[0], bot_row * self.cell_size[1])\n        last_action = self.lastaction if self.lastaction is not None else 1\n        elf_img = self.elf_images[last_action]\n\n        if desc[bot_row][bot_col] == b\"H\":\n            self.window_surface.blit(self.cracked_hole_img, cell_rect)\n        else:\n            self.window_surface.blit(elf_img, cell_rect)\n\n        if mode == \"human\":\n            pygame.event.pump()\n            pygame.display.update()\n            self.clock.tick(self.metadata[\"render_fps\"])\n        elif mode == \"rgb_array\":\n            return np.transpose(\n                np.array(pygame.surfarray.pixels3d(self.window_surface)), axes=(1, 0, 2)\n            )\n\n    @staticmethod\n    def _center_small_rect(big_rect, small_dims):\n        offset_w = (big_rect[2] - small_dims[0]) / 2\n        offset_h = (big_rect[3] - small_dims[1]) / 2\n        return (\n            big_rect[0] + offset_w,\n            big_rect[1] + offset_h,\n        )\n\n    def _render_text(self):\n        desc = self.desc.tolist()\n        outfile = StringIO()\n\n        row, col = self.s // self.ncol, self.s % self.ncol\n        desc = [[c.decode(\"utf-8\") for c in 
line] for line in desc]\n        desc[row][col] = utils.colorize(desc[row][col], \"red\", highlight=True)\n        if self.lastaction is not None:\n            outfile.write(f\"  ({['Left', 'Down', 'Right', 'Up'][self.lastaction]})\\n\")\n        else:\n            outfile.write(\"\\n\")\n        outfile.write(\"\\n\".join(\"\".join(line) for line in desc) + \"\\n\")\n\n        with closing(outfile):\n            return outfile.getvalue()\n\n    def close(self):\n        if self.window_surface is not None:\n            import pygame\n\n            pygame.display.quit()\n            pygame.quit()\n\n\n# Elf and stool from https://franuka.itch.io/rpg-snow-tileset\n# All other assets by Mel Tillery http://www.cyaneus.com/\n"
  },
  {
    "path": "gym/envs/toy_text/taxi.py",
    "content": "from contextlib import closing\nfrom io import StringIO\nfrom os import path\nfrom typing import Optional\n\nimport numpy as np\n\nfrom gym import Env, logger, spaces, utils\nfrom gym.envs.toy_text.utils import categorical_sample\nfrom gym.error import DependencyNotInstalled\n\nMAP = [\n    \"+---------+\",\n    \"|R: | : :G|\",\n    \"| : | : : |\",\n    \"| : : : : |\",\n    \"| | : | : |\",\n    \"|Y| : |B: |\",\n    \"+---------+\",\n]\nWINDOW_SIZE = (550, 350)\n\n\nclass TaxiEnv(Env):\n    \"\"\"\n\n    The Taxi Problem\n    from \"Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition\"\n    by Tom Dietterich\n\n    ### Description\n    There are four designated locations in the grid world indicated by R(ed),\n    G(reen), Y(ellow), and B(lue). When the episode starts, the taxi starts off\n    at a random square and the passenger is at a random location. The taxi\n    drives to the passenger's location, picks up the passenger, drives to the\n    passenger's destination (another one of the four specified locations), and\n    then drops off the passenger. Once the passenger is dropped off, the episode ends.\n\n    Map:\n\n        +---------+\n        |R: | : :G|\n        | : | : : |\n        | : : : : |\n        | | : | : |\n        |Y| : |B: |\n        +---------+\n\n    ### Actions\n    There are 6 discrete deterministic actions:\n    - 0: move south\n    - 1: move north\n    - 2: move east\n    - 3: move west\n    - 4: pickup passenger\n    - 5: drop off passenger\n\n    ### Observations\n    There are 500 discrete states since there are 25 taxi positions, 5 possible\n    locations of the passenger (including the case when the passenger is in the\n    taxi), and 4 destination locations.\n\n    Note that there are 400 states that can actually be reached during an\n    episode. 
The missing states correspond to situations in which the passenger\n    is at the same location as their destination, as this typically signals the\n    end of an episode. Four additional states can be observed right after a\n    successful episode, when both the passenger and the taxi are at the destination.\n    This gives a total of 404 reachable discrete states.\n\n    Each state is represented by the tuple:\n    (taxi_row, taxi_col, passenger_location, destination)\n\n    An observation is an integer that encodes the corresponding state.\n    The state tuple can then be decoded with the \"decode\" method.\n\n    Passenger locations:\n    - 0: R(ed)\n    - 1: G(reen)\n    - 2: Y(ellow)\n    - 3: B(lue)\n    - 4: in taxi\n\n    Destinations:\n    - 0: R(ed)\n    - 1: G(reen)\n    - 2: Y(ellow)\n    - 3: B(lue)\n\n    ### Info\n\n    ``step()`` and ``reset()`` return an info dictionary that contains \"prob\" and \"action_mask\":\n    the probability of the transition that was taken, and a mask of the actions that will change\n    the state, which can be used to speed up training.\n\n    Taxi's initial state is stochastic, so for ``reset()`` the \"prob\" key should hold the probability of the\n    sampled initial state; however, this value is currently bugged and is always 1.0. This will be fixed soon.\n    As the steps are deterministic, for ``step()`` \"prob\" is the probability of the transition, which is always 1.0.\n\n    In some cases, taking an action will have no effect on the state of the agent.\n    As of v0.25.0, ``info[\"action_mask\"]`` contains an np.ndarray with one entry per action, specifying\n    whether that action will change the state.\n\n    To sample a modifying action, use ``action = env.action_space.sample(info[\"action_mask\"])``.\n    Or, with a Q-value based algorithm, first select the valid actions with ``valid = np.where(info[\"action_mask\"] == 1)[0]``\n    and then take ``action = valid[np.argmax(q_values[obs, valid])]``.\n\n    ### Rewards\n    - -1 per step unless other reward is triggered.\n    - +20 for delivering the passenger.\n    - -10 for executing \"pickup\" and \"drop-off\" actions 
illegally.\n\n    ### Arguments\n\n    ```\n    gym.make('Taxi-v3')\n    ```\n\n    ### Version History\n    * v3: Map Correction + Cleaner Domain Description, v0.25.0 action masking added to the reset and step information\n    * v2: Disallow Taxi start location = goal location, Update Taxi observations in the rollout, Update Taxi reward threshold.\n    * v1: Remove (3,2) from locs, add passidx<4 check\n    * v0: Initial versions release\n    \"\"\"\n\n    metadata = {\n        \"render_modes\": [\"human\", \"ansi\", \"rgb_array\"],\n        \"render_fps\": 4,\n    }\n\n    def __init__(self, render_mode: Optional[str] = None):\n        self.desc = np.asarray(MAP, dtype=\"c\")\n\n        self.locs = locs = [(0, 0), (0, 4), (4, 0), (4, 3)]\n        self.locs_colors = [(255, 0, 0), (0, 255, 0), (255, 255, 0), (0, 0, 255)]\n\n        num_states = 500\n        num_rows = 5\n        num_columns = 5\n        max_row = num_rows - 1\n        max_col = num_columns - 1\n        self.initial_state_distrib = np.zeros(num_states)\n        num_actions = 6\n        self.P = {\n            state: {action: [] for action in range(num_actions)}\n            for state in range(num_states)\n        }\n        for row in range(num_rows):\n            for col in range(num_columns):\n                for pass_idx in range(len(locs) + 1):  # +1 for being inside taxi\n                    for dest_idx in range(len(locs)):\n                        state = self.encode(row, col, pass_idx, dest_idx)\n                        if pass_idx < 4 and pass_idx != dest_idx:\n                            self.initial_state_distrib[state] += 1\n                        for action in range(num_actions):\n                            # defaults\n                            new_row, new_col, new_pass_idx = row, col, pass_idx\n                            reward = (\n                                -1\n                            )  # default reward when there is no pickup/dropoff\n                            
terminated = False\n                            taxi_loc = (row, col)\n\n                            if action == 0:\n                                new_row = min(row + 1, max_row)\n                            elif action == 1:\n                                new_row = max(row - 1, 0)\n                            if action == 2 and self.desc[1 + row, 2 * col + 2] == b\":\":\n                                new_col = min(col + 1, max_col)\n                            elif action == 3 and self.desc[1 + row, 2 * col] == b\":\":\n                                new_col = max(col - 1, 0)\n                            elif action == 4:  # pickup\n                                if pass_idx < 4 and taxi_loc == locs[pass_idx]:\n                                    new_pass_idx = 4\n                                else:  # passenger not at location\n                                    reward = -10\n                            elif action == 5:  # dropoff\n                                if (taxi_loc == locs[dest_idx]) and pass_idx == 4:\n                                    new_pass_idx = dest_idx\n                                    terminated = True\n                                    reward = 20\n                                elif (taxi_loc in locs) and pass_idx == 4:\n                                    new_pass_idx = locs.index(taxi_loc)\n                                else:  # dropoff at wrong location\n                                    reward = -10\n                            new_state = self.encode(\n                                new_row, new_col, new_pass_idx, dest_idx\n                            )\n                            self.P[state][action].append(\n                                (1.0, new_state, reward, terminated)\n                            )\n        self.initial_state_distrib /= self.initial_state_distrib.sum()\n        self.action_space = spaces.Discrete(num_actions)\n        self.observation_space = spaces.Discrete(num_states)\n\n        
self.render_mode = render_mode\n\n        # pygame utils\n        self.window = None\n        self.clock = None\n        self.cell_size = (\n            WINDOW_SIZE[0] / self.desc.shape[1],\n            WINDOW_SIZE[1] / self.desc.shape[0],\n        )\n        self.taxi_imgs = None\n        self.taxi_orientation = 0\n        self.passenger_img = None\n        self.destination_img = None\n        self.median_horiz = None\n        self.median_vert = None\n        self.background_img = None\n\n    def encode(self, taxi_row, taxi_col, pass_loc, dest_idx):\n        # (5) 5, 5, 4\n        i = taxi_row\n        i *= 5\n        i += taxi_col\n        i *= 5\n        i += pass_loc\n        i *= 4\n        i += dest_idx\n        return i\n\n    def decode(self, i):\n        out = []\n        out.append(i % 4)\n        i = i // 4\n        out.append(i % 5)\n        i = i // 5\n        out.append(i % 5)\n        i = i // 5\n        out.append(i)\n        assert 0 <= i < 5\n        return reversed(out)\n\n    def action_mask(self, state: int):\n        \"\"\"Computes an action mask for the action space using the state information.\"\"\"\n        mask = np.zeros(6, dtype=np.int8)\n        taxi_row, taxi_col, pass_loc, dest_idx = self.decode(state)\n        if taxi_row < 4:\n            mask[0] = 1\n        if taxi_row > 0:\n            mask[1] = 1\n        if taxi_col < 4 and self.desc[taxi_row + 1, 2 * taxi_col + 2] == b\":\":\n            mask[2] = 1\n        if taxi_col > 0 and self.desc[taxi_row + 1, 2 * taxi_col] == b\":\":\n            mask[3] = 1\n        if pass_loc < 4 and (taxi_row, taxi_col) == self.locs[pass_loc]:\n            mask[4] = 1\n        if pass_loc == 4 and (\n            (taxi_row, taxi_col) == self.locs[dest_idx]\n            or (taxi_row, taxi_col) in self.locs\n        ):\n            mask[5] = 1\n        return mask\n\n    def step(self, a):\n        transitions = self.P[self.s][a]\n        i = categorical_sample([t[0] for t in transitions], 
self.np_random)\n        p, s, r, t = transitions[i]\n        self.s = s\n        self.lastaction = a\n\n        if self.render_mode == \"human\":\n            self.render()\n        return (int(s), r, t, False, {\"prob\": p, \"action_mask\": self.action_mask(s)})\n\n    def reset(\n        self,\n        *,\n        seed: Optional[int] = None,\n        options: Optional[dict] = None,\n    ):\n        super().reset(seed=seed)\n        self.s = categorical_sample(self.initial_state_distrib, self.np_random)\n        self.lastaction = None\n        self.taxi_orientation = 0\n\n        if self.render_mode == \"human\":\n            self.render()\n        return int(self.s), {\"prob\": 1.0, \"action_mask\": self.action_mask(self.s)}\n\n    def render(self):\n        if self.render_mode is None:\n            logger.warn(\n                \"You are calling render method without specifying any render mode. \"\n                \"You can specify the render_mode at initialization, \"\n                f'e.g. gym.make(\"{self.spec.id}\", render_mode=\"rgb_array\")'\n            )\n        elif self.render_mode == \"ansi\":\n            return self._render_text()\n        else:  # self.render_mode in {\"human\", \"rgb_array\"}:\n            return self._render_gui(self.render_mode)\n\n    def _render_gui(self, mode):\n        try:\n            import pygame  # pygame is only required for the \"human\" and \"rgb_array\" render modes\n        except ImportError:\n            raise DependencyNotInstalled(\n                \"pygame is not installed, run `pip install gym[toy_text]`\"\n            )\n\n        if self.window is None:\n            pygame.init()\n            pygame.display.set_caption(\"Taxi\")\n            if mode == \"human\":\n                self.window = pygame.display.set_mode(WINDOW_SIZE)\n            elif mode == \"rgb_array\":\n                self.window = pygame.Surface(WINDOW_SIZE)\n\n        assert (\n            self.window is not None\n        ), \"Something went wrong with pygame. 
This should never happen.\"\n        if self.clock is None:\n            self.clock = pygame.time.Clock()\n        if self.taxi_imgs is None:\n            file_names = [\n                path.join(path.dirname(__file__), \"img/cab_front.png\"),\n                path.join(path.dirname(__file__), \"img/cab_rear.png\"),\n                path.join(path.dirname(__file__), \"img/cab_right.png\"),\n                path.join(path.dirname(__file__), \"img/cab_left.png\"),\n            ]\n            self.taxi_imgs = [\n                pygame.transform.scale(pygame.image.load(file_name), self.cell_size)\n                for file_name in file_names\n            ]\n        if self.passenger_img is None:\n            file_name = path.join(path.dirname(__file__), \"img/passenger.png\")\n            self.passenger_img = pygame.transform.scale(\n                pygame.image.load(file_name), self.cell_size\n            )\n        if self.destination_img is None:\n            file_name = path.join(path.dirname(__file__), \"img/hotel.png\")\n            self.destination_img = pygame.transform.scale(\n                pygame.image.load(file_name), self.cell_size\n            )\n            self.destination_img.set_alpha(170)\n        if self.median_horiz is None:\n            file_names = [\n                path.join(path.dirname(__file__), \"img/gridworld_median_left.png\"),\n                path.join(path.dirname(__file__), \"img/gridworld_median_horiz.png\"),\n                path.join(path.dirname(__file__), \"img/gridworld_median_right.png\"),\n            ]\n            self.median_horiz = [\n                pygame.transform.scale(pygame.image.load(file_name), self.cell_size)\n                for file_name in file_names\n            ]\n        if self.median_vert is None:\n            file_names = [\n                path.join(path.dirname(__file__), \"img/gridworld_median_top.png\"),\n                path.join(path.dirname(__file__), \"img/gridworld_median_vert.png\"),\n          
      path.join(path.dirname(__file__), \"img/gridworld_median_bottom.png\"),\n            ]\n            self.median_vert = [\n                pygame.transform.scale(pygame.image.load(file_name), self.cell_size)\n                for file_name in file_names\n            ]\n        if self.background_img is None:\n            file_name = path.join(path.dirname(__file__), \"img/taxi_background.png\")\n            self.background_img = pygame.transform.scale(\n                pygame.image.load(file_name), self.cell_size\n            )\n\n        desc = self.desc\n\n        for y in range(0, desc.shape[0]):\n            for x in range(0, desc.shape[1]):\n                cell = (x * self.cell_size[0], y * self.cell_size[1])\n                self.window.blit(self.background_img, cell)\n                if desc[y][x] == b\"|\" and (y == 0 or desc[y - 1][x] != b\"|\"):\n                    self.window.blit(self.median_vert[0], cell)\n                elif desc[y][x] == b\"|\" and (\n                    y == desc.shape[0] - 1 or desc[y + 1][x] != b\"|\"\n                ):\n                    self.window.blit(self.median_vert[2], cell)\n                elif desc[y][x] == b\"|\":\n                    self.window.blit(self.median_vert[1], cell)\n                elif desc[y][x] == b\"-\" and (x == 0 or desc[y][x - 1] != b\"-\"):\n                    self.window.blit(self.median_horiz[0], cell)\n                elif desc[y][x] == b\"-\" and (\n                    x == desc.shape[1] - 1 or desc[y][x + 1] != b\"-\"\n                ):\n                    self.window.blit(self.median_horiz[2], cell)\n                elif desc[y][x] == b\"-\":\n                    self.window.blit(self.median_horiz[1], cell)\n\n        for cell, color in zip(self.locs, self.locs_colors):\n            color_cell = pygame.Surface(self.cell_size)\n            color_cell.set_alpha(128)\n            color_cell.fill(color)\n            loc = self.get_surf_loc(cell)\n            
self.window.blit(color_cell, (loc[0], loc[1] + 10))\n\n        taxi_row, taxi_col, pass_idx, dest_idx = self.decode(self.s)\n\n        if pass_idx < 4:\n            self.window.blit(self.passenger_img, self.get_surf_loc(self.locs[pass_idx]))\n\n        if self.lastaction in [0, 1, 2, 3]:\n            self.taxi_orientation = self.lastaction\n        dest_loc = self.get_surf_loc(self.locs[dest_idx])\n        taxi_location = self.get_surf_loc((taxi_row, taxi_col))\n\n        if dest_loc[1] <= taxi_location[1]:\n            self.window.blit(\n                self.destination_img,\n                (dest_loc[0], dest_loc[1] - self.cell_size[1] // 2),\n            )\n            self.window.blit(self.taxi_imgs[self.taxi_orientation], taxi_location)\n        else:  # change blit order for overlapping appearance\n            self.window.blit(self.taxi_imgs[self.taxi_orientation], taxi_location)\n            self.window.blit(\n                self.destination_img,\n                (dest_loc[0], dest_loc[1] - self.cell_size[1] // 2),\n            )\n\n        if mode == \"human\":\n            pygame.display.update()\n            self.clock.tick(self.metadata[\"render_fps\"])\n        elif mode == \"rgb_array\":\n            return np.transpose(\n                np.array(pygame.surfarray.pixels3d(self.window)), axes=(1, 0, 2)\n            )\n\n    def get_surf_loc(self, map_loc):\n        return (map_loc[1] * 2 + 1) * self.cell_size[0], (\n            map_loc[0] + 1\n        ) * self.cell_size[1]\n\n    def _render_text(self):\n        desc = self.desc.copy().tolist()\n        outfile = StringIO()\n\n        out = [[c.decode(\"utf-8\") for c in line] for line in desc]\n        taxi_row, taxi_col, pass_idx, dest_idx = self.decode(self.s)\n\n        def ul(x):\n            return \"_\" if x == \" \" else x\n\n        if pass_idx < 4:\n            out[1 + taxi_row][2 * taxi_col + 1] = utils.colorize(\n                out[1 + taxi_row][2 * taxi_col + 1], \"yellow\", 
highlight=True\n            )\n            pi, pj = self.locs[pass_idx]\n            out[1 + pi][2 * pj + 1] = utils.colorize(\n                out[1 + pi][2 * pj + 1], \"blue\", bold=True\n            )\n        else:  # passenger in taxi\n            out[1 + taxi_row][2 * taxi_col + 1] = utils.colorize(\n                ul(out[1 + taxi_row][2 * taxi_col + 1]), \"green\", highlight=True\n            )\n\n        di, dj = self.locs[dest_idx]\n        out[1 + di][2 * dj + 1] = utils.colorize(out[1 + di][2 * dj + 1], \"magenta\")\n        outfile.write(\"\\n\".join([\"\".join(row) for row in out]) + \"\\n\")\n        if self.lastaction is not None:\n            outfile.write(\n                f\"  ({['South', 'North', 'East', 'West', 'Pickup', 'Dropoff'][self.lastaction]})\\n\"\n            )\n        else:\n            outfile.write(\"\\n\")\n\n        with closing(outfile):\n            return outfile.getvalue()\n\n    def close(self):\n        if self.window is not None:\n            import pygame\n\n            pygame.display.quit()\n            pygame.quit()\n\n\n# Taxi rider from https://franuka.itch.io/rpg-asset-pack\n# All other assets by Mel Tillery http://www.cyaneus.com/\n"
  },
  {
    "path": "gym/envs/toy_text/utils.py",
    "content": "import numpy as np\n\n\ndef categorical_sample(prob_n, np_random: np.random.Generator):\n    \"\"\"Sample from categorical distribution where each row specifies class probabilities.\"\"\"\n    prob_n = np.asarray(prob_n)\n    csprob_n = np.cumsum(prob_n)\n    return np.argmax(csprob_n > np_random.random())\n"
  },
  {
    "path": "gym/error.py",
    "content": "\"\"\"Set of Error classes for gym.\"\"\"\nimport warnings\n\n\nclass Error(Exception):\n    \"\"\"Error superclass.\"\"\"\n\n\n# Local errors\n\n\nclass Unregistered(Error):\n    \"\"\"Raised when the user requests an item from the registry that does not actually exist.\"\"\"\n\n\nclass UnregisteredEnv(Unregistered):\n    \"\"\"Raised when the user requests an env from the registry that does not actually exist.\"\"\"\n\n\nclass NamespaceNotFound(UnregisteredEnv):\n    \"\"\"Raised when the user requests an env from the registry where the namespace doesn't exist.\"\"\"\n\n\nclass NameNotFound(UnregisteredEnv):\n    \"\"\"Raised when the user requests an env from the registry where the name doesn't exist.\"\"\"\n\n\nclass VersionNotFound(UnregisteredEnv):\n    \"\"\"Raised when the user requests an env from the registry where the version doesn't exist.\"\"\"\n\n\nclass UnregisteredBenchmark(Unregistered):\n    \"\"\"Raised when the user requests an env from the registry that does not actually exist.\"\"\"\n\n\nclass DeprecatedEnv(Error):\n    \"\"\"Raised when the user requests an env from the registry with an older version number than the latest env with the same name.\"\"\"\n\n\nclass RegistrationError(Error):\n    \"\"\"Raised when the user attempts to register an invalid env. For example, an unversioned env when a versioned env exists.\"\"\"\n\n\nclass UnseedableEnv(Error):\n    \"\"\"Raised when the user tries to seed an env that does not support seeding.\"\"\"\n\n\nclass DependencyNotInstalled(Error):\n    \"\"\"Raised when the user has not installed a dependency.\"\"\"\n\n\nclass UnsupportedMode(Error):\n    \"\"\"Raised when the user requests a rendering mode not supported by the environment.\"\"\"\n\n\nclass ResetNeeded(Error):\n    \"\"\"When the order enforcing is violated, i.e. 
step or render is called before reset.\"\"\"\n\n\nclass ResetNotAllowed(Error):\n    \"\"\"When the monitor is active, raised when the user tries to step an environment that's not yet terminated or truncated.\"\"\"\n\n\nclass InvalidAction(Error):\n    \"\"\"Raised when the user performs an action not contained within the action space.\"\"\"\n\n\n# API errors\n\n\nclass APIError(Error):\n    \"\"\"Deprecated, to be removed at gym 1.0.\"\"\"\n\n    def __init__(\n        self,\n        message=None,\n        http_body=None,\n        http_status=None,\n        json_body=None,\n        headers=None,\n    ):\n        \"\"\"Initialise API error.\"\"\"\n        super().__init__(message)\n\n        warnings.warn(\"APIError is deprecated and will be removed at gym 1.0\")\n\n        if http_body and hasattr(http_body, \"decode\"):\n            try:\n                http_body = http_body.decode(\"utf-8\")\n            except Exception:\n                http_body = \"<Could not decode body as utf-8.>\"\n\n        self._message = message\n        self.http_body = http_body\n        self.http_status = http_status\n        self.json_body = json_body\n        self.headers = headers or {}\n        self.request_id = self.headers.get(\"request-id\", None)\n\n    def __unicode__(self):\n        \"\"\"Returns a string, if request_id is not None then make message other use the _message.\"\"\"\n        if self.request_id is not None:\n            msg = self._message or \"<empty message>\"\n            return f\"Request {self.request_id}: {msg}\"\n        else:\n            return self._message\n\n    def __str__(self):\n        \"\"\"Returns the __unicode__.\"\"\"\n        return self.__unicode__()\n\n\nclass APIConnectionError(APIError):\n    \"\"\"Deprecated, to be removed at gym 1.0.\"\"\"\n\n\nclass InvalidRequestError(APIError):\n    \"\"\"Deprecated, to be removed at gym 1.0.\"\"\"\n\n    def __init__(\n        self,\n        message,\n        param,\n        http_body=None,\n     
   http_status=None,\n        json_body=None,\n        headers=None,\n    ):\n        \"\"\"Initialises the invalid request error.\"\"\"\n        super().__init__(message, http_body, http_status, json_body, headers)\n        self.param = param\n\n\nclass AuthenticationError(APIError):\n    \"\"\"Deprecated, to be removed at gym 1.0.\"\"\"\n\n\nclass RateLimitError(APIError):\n    \"\"\"Deprecated, to be removed at gym 1.0.\"\"\"\n\n\n# Video errors\n\n\nclass VideoRecorderError(Error):\n    \"\"\"Unused error.\"\"\"\n\n\nclass InvalidFrame(Error):\n    \"\"\"Error message when an invalid frame is captured.\"\"\"\n\n\n# Wrapper errors\n\n\nclass DoubleWrapperError(Error):\n    \"\"\"Error message for when using double wrappers.\"\"\"\n\n\nclass WrapAfterConfigureError(Error):\n    \"\"\"Error message for using wrap after configure.\"\"\"\n\n\nclass RetriesExceededError(Error):\n    \"\"\"Error message for retries exceeding set number.\"\"\"\n\n\n# Vectorized environments errors\n\n\nclass AlreadyPendingCallError(Exception):\n    \"\"\"Raised when `reset`, or `step` is called asynchronously (e.g. 
with `reset_async`, or `step_async` respectively), and `reset_async`, or `step_async` (respectively) is called again (without a complete call to `reset_wait`, or `step_wait` respectively).\"\"\"\n\n    def __init__(self, message: str, name: str):\n        \"\"\"Initialises the exception with name attributes.\"\"\"\n        super().__init__(message)\n        self.name = name\n\n\nclass NoAsyncCallError(Exception):\n    \"\"\"Raised when an asynchronous `reset`, or `step` is not running, but `reset_wait`, or `step_wait` (respectively) is called.\"\"\"\n\n    def __init__(self, message: str, name: str):\n        \"\"\"Initialises the exception with name attributes.\"\"\"\n        super().__init__(message)\n        self.name = name\n\n\nclass ClosedEnvironmentError(Exception):\n    \"\"\"Trying to call `reset`, or `step`, while the environment is closed.\"\"\"\n\n\nclass CustomSpaceError(Exception):\n    \"\"\"The space is a custom gym.Space instance, and is not supported by `AsyncVectorEnv` with `shared_memory=True`.\"\"\"\n"
  },
  {
    "path": "gym/logger.py",
    "content": "\"\"\"Set of functions for logging messages.\"\"\"\nimport sys\nimport warnings\nfrom typing import Optional, Type\n\nfrom gym.utils import colorize\n\nDEBUG = 10\nINFO = 20\nWARN = 30\nERROR = 40\nDISABLED = 50\n\nmin_level = 30\n\n\n# Ensure DeprecationWarning to be displayed (#2685, #3059)\nwarnings.filterwarnings(\"once\", \"\", DeprecationWarning, module=r\"^gym\\.\")\n\n\ndef set_level(level: int):\n    \"\"\"Set logging threshold on current logger.\"\"\"\n    global min_level\n    min_level = level\n\n\ndef debug(msg: str, *args: object):\n    \"\"\"Logs a debug message to the user.\"\"\"\n    if min_level <= DEBUG:\n        print(f\"DEBUG: {msg % args}\", file=sys.stderr)\n\n\ndef info(msg: str, *args: object):\n    \"\"\"Logs an info message to the user.\"\"\"\n    if min_level <= INFO:\n        print(f\"INFO: {msg % args}\", file=sys.stderr)\n\n\ndef warn(\n    msg: str,\n    *args: object,\n    category: Optional[Type[Warning]] = None,\n    stacklevel: int = 1,\n):\n    \"\"\"Raises a warning to the user if the min_level <= WARN.\n\n    Args:\n        msg: The message to warn the user\n        *args: Additional information to warn the user\n        category: The category of warning\n        stacklevel: The stack level to raise to\n    \"\"\"\n    if min_level <= WARN:\n        warnings.warn(\n            colorize(f\"WARN: {msg % args}\", \"yellow\"),\n            category=category,\n            stacklevel=stacklevel + 1,\n        )\n\n\ndef deprecation(msg: str, *args: object):\n    \"\"\"Logs a deprecation warning to users.\"\"\"\n    warn(msg, *args, category=DeprecationWarning, stacklevel=2)\n\n\ndef error(msg: str, *args: object):\n    \"\"\"Logs an error message if min_level <= ERROR in red on the sys.stderr.\"\"\"\n    if min_level <= ERROR:\n        print(colorize(f\"ERROR: {msg % args}\", \"red\"), file=sys.stderr)\n\n\n# DEPRECATED:\nsetLevel = set_level\n"
  },
  {
    "path": "gym/py.typed",
    "content": ""
  },
  {
    "path": "gym/spaces/__init__.py",
    "content": "\"\"\"This module implements various spaces.\n\nSpaces describe mathematical sets and are used in Gym to specify valid actions and observations.\nEvery Gym environment must have the attributes ``action_space`` and ``observation_space``.\nIf, for instance, three possible actions (0,1,2) can be performed in your environment and observations\nare vectors in the two-dimensional unit cube, the environment code may contain the following two lines::\n\n    self.action_space = spaces.Discrete(3)\n    self.observation_space = spaces.Box(0, 1, shape=(2,))\n\"\"\"\nfrom gym.spaces.box import Box\nfrom gym.spaces.dict import Dict\nfrom gym.spaces.discrete import Discrete\nfrom gym.spaces.graph import Graph, GraphInstance\nfrom gym.spaces.multi_binary import MultiBinary\nfrom gym.spaces.multi_discrete import MultiDiscrete\nfrom gym.spaces.sequence import Sequence\nfrom gym.spaces.space import Space\nfrom gym.spaces.text import Text\nfrom gym.spaces.tuple import Tuple\nfrom gym.spaces.utils import flatdim, flatten, flatten_space, unflatten\n\n__all__ = [\n    \"Space\",\n    \"Box\",\n    \"Discrete\",\n    \"Text\",\n    \"Graph\",\n    \"GraphInstance\",\n    \"MultiDiscrete\",\n    \"MultiBinary\",\n    \"Tuple\",\n    \"Sequence\",\n    \"Dict\",\n    \"flatdim\",\n    \"flatten_space\",\n    \"flatten\",\n    \"unflatten\",\n]\n"
  },
  {
    "path": "gym/spaces/box.py",
    "content": "\"\"\"Implementation of a space that represents closed boxes in euclidean space.\"\"\"\nfrom typing import Dict, List, Optional, Sequence, SupportsFloat, Tuple, Type, Union\n\nimport numpy as np\n\nimport gym.error\nfrom gym import logger\nfrom gym.spaces.space import Space\n\n\ndef _short_repr(arr: np.ndarray) -> str:\n    \"\"\"Create a shortened string representation of a numpy array.\n\n    If arr is a multiple of the all-ones vector, return a string representation of the multiplier.\n    Otherwise, return a string representation of the entire array.\n\n    Args:\n        arr: The array to represent\n\n    Returns:\n        A short representation of the array\n    \"\"\"\n    if arr.size != 0 and np.min(arr) == np.max(arr):\n        return str(np.min(arr))\n    return str(arr)\n\n\ndef is_float_integer(var) -> bool:\n    \"\"\"Checks if a variable is an integer or float.\"\"\"\n    return np.issubdtype(type(var), np.integer) or np.issubdtype(type(var), np.floating)\n\n\nclass Box(Space[np.ndarray]):\n    r\"\"\"A (possibly unbounded) box in :math:`\\mathbb{R}^n`.\n\n    Specifically, a Box represents the Cartesian product of n closed intervals.\n    Each interval has the form of one of :math:`[a, b]`, :math:`(-\\infty, b]`,\n    :math:`[a, \\infty)`, or :math:`(-\\infty, \\infty)`.\n\n    There are two common use cases:\n\n    * Identical bound for each dimension::\n\n        >>> Box(low=-1.0, high=2.0, shape=(3, 4), dtype=np.float32)\n        Box(3, 4)\n\n    * Independent bound for each dimension::\n\n        >>> Box(low=np.array([-1.0, -2.0]), high=np.array([2.0, 4.0]), dtype=np.float32)\n        Box(2,)\n    \"\"\"\n\n    def __init__(\n        self,\n        low: Union[SupportsFloat, np.ndarray],\n        high: Union[SupportsFloat, np.ndarray],\n        shape: Optional[Sequence[int]] = None,\n        dtype: Type = np.float32,\n        seed: Optional[Union[int, np.random.Generator]] = None,\n    ):\n        r\"\"\"Constructor of 
:class:`Box`.\n\n        The argument ``low`` specifies the lower bound of each dimension and ``high`` specifies the upper bounds.\n        I.e., the space that is constructed will be the product of the intervals :math:`[\\text{low}[i], \\text{high}[i]]`.\n\n        If ``low`` (or ``high``) is a scalar, the lower bound (or upper bound, respectively) will be assumed to be\n        this value across all dimensions.\n\n        Args:\n            low (Union[SupportsFloat, np.ndarray]): Lower bounds of the intervals.\n            high (Union[SupportsFloat, np.ndarray]): Upper bounds of the intervals.\n            shape (Optional[Sequence[int]]): The shape is inferred from the shape of `low` or `high` `np.ndarray`s with\n                `low` and `high` scalars defaulting to a shape of (1,)\n            dtype: The dtype of the elements of the space. If this is an integer type, the :class:`Box` is essentially a discrete space.\n            seed: Optionally, you can use this argument to seed the RNG that is used to sample from the space.\n\n        Raises:\n            ValueError: If no shape information is provided (shape is None, low is None and high is None) then a\n                value error is raised.\n        \"\"\"\n        assert (\n            dtype is not None\n        ), \"Box dtype must be explicitly provided, cannot be None.\"\n        self.dtype = np.dtype(dtype)\n\n        # determine shape if it isn't provided directly\n        if shape is not None:\n            assert all(\n                np.issubdtype(type(dim), np.integer) for dim in shape\n            ), f\"Expect all shape elements to be an integer, actual type: {tuple(type(dim) for dim in shape)}\"\n            shape = tuple(int(dim) for dim in shape)  # This changes any np types to int\n        elif isinstance(low, np.ndarray):\n            shape = low.shape\n        elif isinstance(high, np.ndarray):\n            shape = high.shape\n        elif is_float_integer(low) and is_float_integer(high):\n  
          shape = (1,)\n        else:\n            raise ValueError(\n                f\"Box shape is inferred from low and high, expect their types to be np.ndarray, an integer or a float, actual type low: {type(low)}, high: {type(high)}\"\n            )\n\n        # Capture the boundedness information before replacing np.inf with get_inf\n        _low = np.full(shape, low, dtype=float) if is_float_integer(low) else low\n        self.bounded_below = -np.inf < _low\n        _high = np.full(shape, high, dtype=float) if is_float_integer(high) else high\n        self.bounded_above = np.inf > _high\n\n        low = _broadcast(low, dtype, shape, inf_sign=\"-\")  # type: ignore\n        high = _broadcast(high, dtype, shape, inf_sign=\"+\")  # type: ignore\n\n        assert isinstance(low, np.ndarray)\n        assert (\n            low.shape == shape\n        ), f\"low.shape doesn't match provided shape, low.shape: {low.shape}, shape: {shape}\"\n        assert isinstance(high, np.ndarray)\n        assert (\n            high.shape == shape\n        ), f\"high.shape doesn't match provided shape, high.shape: {high.shape}, shape: {shape}\"\n\n        self._shape: Tuple[int, ...] 
= shape\n\n        low_precision = get_precision(low.dtype)\n        high_precision = get_precision(high.dtype)\n        dtype_precision = get_precision(self.dtype)\n        if min(low_precision, high_precision) > dtype_precision:  # type: ignore\n            logger.warn(f\"Box bound precision lowered by casting to {self.dtype}\")\n        self.low = low.astype(self.dtype)\n        self.high = high.astype(self.dtype)\n\n        self.low_repr = _short_repr(self.low)\n        self.high_repr = _short_repr(self.high)\n\n        super().__init__(self.shape, self.dtype, seed)\n\n    @property\n    def shape(self) -> Tuple[int, ...]:\n        \"\"\"Has stricter type than gym.Space - never None.\"\"\"\n        return self._shape\n\n    @property\n    def is_np_flattenable(self):\n        \"\"\"Checks whether this space can be flattened to a :class:`spaces.Box`.\"\"\"\n        return True\n\n    def is_bounded(self, manner: str = \"both\") -> bool:\n        \"\"\"Checks whether the box is bounded in some sense.\n\n        Args:\n            manner (str): One of ``\"both\"``, ``\"below\"``, ``\"above\"``.\n\n        Returns:\n            If the space is bounded\n\n        Raises:\n            ValueError: If `manner` is neither ``\"both\"`` nor ``\"below\"`` or ``\"above\"``\n        \"\"\"\n        below = bool(np.all(self.bounded_below))\n        above = bool(np.all(self.bounded_above))\n        if manner == \"both\":\n            return below and above\n        elif manner == \"below\":\n            return below\n        elif manner == \"above\":\n            return above\n        else:\n            raise ValueError(\n                f\"manner is not in {{'below', 'above', 'both'}}, actual value: {manner}\"\n            )\n\n    def sample(self, mask: None = None) -> np.ndarray:\n        r\"\"\"Generates a single random sample inside the Box.\n\n        In creating a sample of the box, each coordinate is sampled (independently) from a distribution\n        that is chosen 
according to the form of the interval:\n\n        * :math:`[a, b]` : uniform distribution\n        * :math:`[a, \\infty)` : shifted exponential distribution\n        * :math:`(-\\infty, b]` : shifted negative exponential distribution\n        * :math:`(-\\infty, \\infty)` : normal distribution\n\n        Args:\n            mask: A mask for sampling values from the Box space, currently unsupported.\n\n        Returns:\n            A sampled value from the Box\n        \"\"\"\n        if mask is not None:\n            raise gym.error.Error(\n                f\"Box.sample cannot be provided a mask, actual value: {mask}\"\n            )\n\n        high = self.high if self.dtype.kind == \"f\" else self.high.astype(\"int64\") + 1\n        sample = np.empty(self.shape)\n\n        # Masking arrays which classify the coordinates according to interval\n        # type\n        unbounded = ~self.bounded_below & ~self.bounded_above\n        upp_bounded = ~self.bounded_below & self.bounded_above\n        low_bounded = self.bounded_below & ~self.bounded_above\n        bounded = self.bounded_below & self.bounded_above\n\n        # Vectorized sampling by interval type\n        sample[unbounded] = self.np_random.normal(size=unbounded[unbounded].shape)\n\n        sample[low_bounded] = (\n            self.np_random.exponential(size=low_bounded[low_bounded].shape)\n            + self.low[low_bounded]\n        )\n\n        sample[upp_bounded] = (\n            -self.np_random.exponential(size=upp_bounded[upp_bounded].shape)\n            + self.high[upp_bounded]\n        )\n\n        sample[bounded] = self.np_random.uniform(\n            low=self.low[bounded], high=high[bounded], size=bounded[bounded].shape\n        )\n        if self.dtype.kind == \"i\":\n            sample = np.floor(sample)\n\n        return sample.astype(self.dtype)\n\n    def contains(self, x) -> bool:\n        \"\"\"Return boolean specifying if x is a valid member of this space.\"\"\"\n        if not isinstance(x, 
np.ndarray):\n            logger.warn(\"Casting input x to numpy array.\")\n            try:\n                x = np.asarray(x, dtype=self.dtype)\n            except (ValueError, TypeError):\n                return False\n\n        return bool(\n            np.can_cast(x.dtype, self.dtype)\n            and x.shape == self.shape\n            and np.all(x >= self.low)\n            and np.all(x <= self.high)\n        )\n\n    def to_jsonable(self, sample_n):\n        \"\"\"Convert a batch of samples from this space to a JSONable data type.\"\"\"\n        return np.array(sample_n).tolist()\n\n    def from_jsonable(self, sample_n: Sequence[Union[float, int]]) -> List[np.ndarray]:\n        \"\"\"Convert a JSONable data type to a batch of samples from this space.\"\"\"\n        return [np.asarray(sample) for sample in sample_n]\n\n    def __repr__(self) -> str:\n        \"\"\"A string representation of this space.\n\n        The representation will include bounds, shape and dtype.\n        If a bound is uniform, only the corresponding scalar will be given to avoid redundant and ugly strings.\n\n        Returns:\n            A representation of the space\n        \"\"\"\n        return f\"Box({self.low_repr}, {self.high_repr}, {self.shape}, {self.dtype})\"\n\n    def __eq__(self, other) -> bool:\n        \"\"\"Check whether `other` is equivalent to this instance. 
Doesn't check dtype equivalence.\"\"\"\n        return (\n            isinstance(other, Box)\n            and (self.shape == other.shape)\n            # and (self.dtype == other.dtype)\n            and np.allclose(self.low, other.low)\n            and np.allclose(self.high, other.high)\n        )\n\n    def __setstate__(self, state: Dict):\n        \"\"\"Sets the state of the box for unpickling a box with legacy support.\"\"\"\n        super().__setstate__(state)\n\n        # legacy support through re-adding \"low_repr\" and \"high_repr\" if missing from pickled state\n        if not hasattr(self, \"low_repr\"):\n            self.low_repr = _short_repr(self.low)\n\n        if not hasattr(self, \"high_repr\"):\n            self.high_repr = _short_repr(self.high)\n\n\ndef get_inf(dtype, sign: str) -> SupportsFloat:\n    \"\"\"Returns an infinite that doesn't break things.\n\n    Args:\n        dtype: An `np.dtype`\n        sign (str): must be either `\"+\"` or `\"-\"`\n\n    Returns:\n        Gets an infinite value with the sign and dtype\n\n    Raises:\n        TypeError: Unknown sign, use either '+' or '-'\n        ValueError: Unknown dtype for infinite bounds\n    \"\"\"\n    if np.dtype(dtype).kind == \"f\":\n        if sign == \"+\":\n            return np.inf\n        elif sign == \"-\":\n            return -np.inf\n        else:\n            raise TypeError(f\"Unknown sign {sign}, use either '+' or '-'\")\n    elif np.dtype(dtype).kind == \"i\":\n        if sign == \"+\":\n            return np.iinfo(dtype).max - 2\n        elif sign == \"-\":\n            return np.iinfo(dtype).min + 2\n        else:\n            raise TypeError(f\"Unknown sign {sign}, use either '+' or '-'\")\n    else:\n        raise ValueError(f\"Unknown dtype {dtype} for infinite bounds\")\n\n\ndef get_precision(dtype) -> SupportsFloat:\n    \"\"\"Get precision of a data type.\"\"\"\n    if np.issubdtype(dtype, np.floating):\n        return np.finfo(dtype).precision\n    else:\n        
return np.inf\n\n\ndef _broadcast(\n    value: Union[SupportsFloat, np.ndarray],\n    dtype,\n    shape: Tuple[int, ...],\n    inf_sign: str,\n) -> np.ndarray:\n    \"\"\"Handle infinite bounds and broadcast at the same time if needed.\"\"\"\n    if is_float_integer(value):\n        value = get_inf(dtype, inf_sign) if np.isinf(value) else value  # type: ignore\n        value = np.full(shape, value, dtype=dtype)\n    else:\n        assert isinstance(value, np.ndarray)\n        if np.any(np.isinf(value)):\n            # create new array with dtype, but maintain old one to preserve np.inf\n            temp = value.astype(dtype)\n            temp[np.isinf(value)] = get_inf(dtype, inf_sign)\n            value = temp\n    return value\n"
  },
  {
    "path": "gym/spaces/dict.py",
    "content": "\"\"\"Implementation of a space that represents the cartesian product of other spaces as a dictionary.\"\"\"\nfrom collections import OrderedDict\nfrom collections.abc import Mapping, Sequence\nfrom typing import Any\nfrom typing import Dict as TypingDict\nfrom typing import List, Optional\nfrom typing import Sequence as TypingSequence\nfrom typing import Tuple, Union\n\nimport numpy as np\n\nfrom gym.spaces.space import Space\n\n\nclass Dict(Space[TypingDict[str, Space]], Mapping):\n    \"\"\"A dictionary of :class:`Space` instances.\n\n    Elements of this space are (ordered) dictionaries of elements from the constituent spaces.\n\n    Example usage:\n\n        >>> from gym.spaces import Dict, Discrete\n        >>> observation_space = Dict({\"position\": Discrete(2), \"velocity\": Discrete(3)})\n        >>> observation_space.sample()\n        OrderedDict([('position', 1), ('velocity', 2)])\n\n    Example usage [nested]::\n\n        >>> from gym.spaces import Box, Dict, Discrete, MultiBinary, MultiDiscrete\n        >>> Dict(\n        ...     {\n        ...         \"ext_controller\": MultiDiscrete([5, 2, 2]),\n        ...         \"inner_state\": Dict(\n        ...             {\n        ...                 \"charge\": Discrete(100),\n        ...                 \"system_checks\": MultiBinary(10),\n        ...                 \"job_status\": Dict(\n        ...                     {\n        ...                         \"task\": Discrete(5),\n        ...                         \"progress\": Box(low=0, high=100, shape=()),\n        ...                     }\n        ...                 ),\n        ...             }\n        ...         ),\n        ...     }\n        ... )\n\n    It can be convenient to use :class:`Dict` spaces if you want to make complex observations or actions more human-readable.\n    Usually, it will not be possible to use elements of this space directly in learning code. 
However, you can easily\n    convert `Dict` observations to flat arrays by using a :class:`gym.wrappers.FlattenObservation` wrapper. Similar wrappers can be\n    implemented to deal with :class:`Dict` actions.\n    \"\"\"\n\n    def __init__(\n        self,\n        spaces: Optional[\n            Union[\n                TypingDict[str, Space],\n                TypingSequence[Tuple[str, Space]],\n            ]\n        ] = None,\n        seed: Optional[Union[dict, int, np.random.Generator]] = None,\n        **spaces_kwargs: Space,\n    ):\n        \"\"\"Constructor of :class:`Dict` space.\n\n        This space can be instantiated in one of two ways: either you pass a dictionary\n        of spaces to :meth:`__init__` via the ``spaces`` argument, or you pass the spaces as separate\n        keyword arguments (where you will need to avoid the keys ``spaces`` and ``seed``).\n\n        Example::\n\n            >>> from gym.spaces import Box, Discrete\n            >>> Dict({\"position\": Box(-1, 1, shape=(2,)), \"color\": Discrete(3)})\n            Dict('color': Discrete(3), 'position': Box(-1.0, 1.0, (2,), float32))\n            >>> Dict(position=Box(-1, 1, shape=(2,)), color=Discrete(3))\n            Dict('position': Box(-1.0, 1.0, (2,), float32), 'color': Discrete(3))\n\n        Args:\n            spaces: A dictionary of spaces. This specifies the structure of the :class:`Dict` space.\n            seed: Optionally, you can use this argument to seed the RNGs of the spaces that make up the :class:`Dict` space.\n            **spaces_kwargs: If ``spaces`` is ``None``, you need to pass the constituent spaces as keyword arguments, as described above.\n        \"\"\"\n        # Convert the spaces into an OrderedDict\n        if isinstance(spaces, Mapping) and not isinstance(spaces, OrderedDict):\n            try:\n                spaces = OrderedDict(sorted(spaces.items()))\n            except TypeError:\n                # Incomparable types (e.g. `int` vs. 
`str`, or user-defined types) found.\n                # The keys remain in the insertion order.\n                spaces = OrderedDict(spaces.items())\n        elif isinstance(spaces, Sequence):\n            spaces = OrderedDict(spaces)\n        elif spaces is None:\n            spaces = OrderedDict()\n        else:\n            assert isinstance(\n                spaces, OrderedDict\n            ), f\"Unexpected Dict space input, expecting dict, OrderedDict or Sequence, actual type: {type(spaces)}\"\n\n        # Add kwargs to spaces to allow both dictionary and keywords to be used\n        for key, space in spaces_kwargs.items():\n            if key not in spaces:\n                spaces[key] = space\n            else:\n                raise ValueError(\n                    f\"Dict space keyword '{key}' already exists in the spaces dictionary.\"\n                )\n\n        self.spaces = spaces\n        for key, space in self.spaces.items():\n            assert isinstance(\n                space, Space\n            ), f\"Dict space element is not an instance of Space: key='{key}', space={space}\"\n\n        super().__init__(\n            None, None, seed  # type: ignore\n        )  # None for shape and dtype, since it'll require special handling\n\n    @property\n    def is_np_flattenable(self):\n        \"\"\"Checks whether this space can be flattened to a :class:`spaces.Box`.\"\"\"\n        return all(space.is_np_flattenable for space in self.spaces.values())\n\n    def seed(self, seed: Optional[Union[dict, int]] = None) -> list:\n        \"\"\"Seed the PRNG of this space and all subspaces.\n\n        Depending on the type of seed, the subspaces will be seeded differently\n        * None - All the subspaces will use a random initial seed\n        * Int - The integer is used to seed the `Dict` space that is used to generate seed values for each of the subspaces. 
Warning, this does not guarantee unique seeds for all of the subspaces.\n        * Dict - Using all the keys in the seed dictionary, the values are used to seed the subspaces. This allows the seeding of multiple composite subspaces (`Dict[\"space\": Dict[...], ...]` with `{\"space\": {...}, ...}`).\n\n        Args:\n            seed: An optional dict or int used to seed the (sub-)spaces.\n        \"\"\"\n        seeds = []\n\n        if isinstance(seed, dict):\n            assert (\n                seed.keys() == self.spaces.keys()\n            ), f\"The seed keys: {seed.keys()} are not identical to space keys: {self.spaces.keys()}\"\n            for key in seed.keys():\n                seeds += self.spaces[key].seed(seed[key])\n        elif isinstance(seed, int):\n            seeds = super().seed(seed)\n            # Drawing subseeds from the `np.int32` range makes it extremely unlikely that two subspaces get the same seed, even for many subspaces\n            subseeds = self.np_random.integers(\n                np.iinfo(np.int32).max, size=len(self.spaces)\n            )\n            for subspace, subseed in zip(self.spaces.values(), subseeds):\n                seeds += subspace.seed(int(subseed))\n        elif seed is None:\n            for space in self.spaces.values():\n                seeds += space.seed(None)\n        else:\n            raise TypeError(\n                f\"Expected seed type: dict, int or None, actual type: {type(seed)}\"\n            )\n\n        return seeds\n\n    def sample(self, mask: Optional[TypingDict[str, Any]] = None) -> dict:\n        \"\"\"Generates a single random sample from this space.\n\n        The sample is an ordered dictionary of independent samples from the constituent spaces.\n\n        Args:\n            mask: An optional mask for each of the subspaces, expects the same keys as the space\n\n        Returns:\n            A dictionary with the same keys as :attr:`self.spaces` and a sampled value for each key\n        \"\"\"\n        if mask is not None:\n           
 assert isinstance(\n                mask, dict\n            ), f\"Expects mask to be a dict, actual type: {type(mask)}\"\n            assert (\n                mask.keys() == self.spaces.keys()\n            ), f\"Expect mask keys to be same as space keys, mask keys: {mask.keys()}, space keys: {self.spaces.keys()}\"\n            return OrderedDict(\n                [(k, space.sample(mask[k])) for k, space in self.spaces.items()]\n            )\n\n        return OrderedDict([(k, space.sample()) for k, space in self.spaces.items()])\n\n    def contains(self, x) -> bool:\n        \"\"\"Return boolean specifying if x is a valid member of this space.\"\"\"\n        if isinstance(x, dict) and x.keys() == self.spaces.keys():\n            return all(x[key] in self.spaces[key] for key in self.spaces.keys())\n        return False\n\n    def __getitem__(self, key: str) -> Space:\n        \"\"\"Get the space that is associated to `key`.\"\"\"\n        return self.spaces[key]\n\n    def __setitem__(self, key: str, value: Space):\n        \"\"\"Set the space that is associated to `key`.\"\"\"\n        assert isinstance(\n            value, Space\n        ), f\"Trying to set {key} to Dict space with value that is not a gym space, actual type: {type(value)}\"\n        self.spaces[key] = value\n\n    def __iter__(self):\n        \"\"\"Iterator through the keys of the subspaces.\"\"\"\n        yield from self.spaces\n\n    def __len__(self) -> int:\n        \"\"\"Gives the number of simpler spaces that make up the `Dict` space.\"\"\"\n        return len(self.spaces)\n\n    def __repr__(self) -> str:\n        \"\"\"Gives a string representation of this space.\"\"\"\n        return (\n            \"Dict(\" + \", \".join([f\"{k!r}: {s}\" for k, s in self.spaces.items()]) + \")\"\n        )\n\n    def __eq__(self, other) -> bool:\n        \"\"\"Check whether `other` is equivalent to this instance.\"\"\"\n        return (\n            isinstance(other, Dict)\n            # Comparison of 
`OrderedDict`s is order-sensitive\n            and self.spaces == other.spaces  # OrderedDict.__eq__\n        )\n\n    def to_jsonable(self, sample_n: list) -> dict:\n        \"\"\"Convert a batch of samples from this space to a JSONable data type.\"\"\"\n        # serialize as dict-repr of vectors\n        return {\n            key: space.to_jsonable([sample[key] for sample in sample_n])\n            for key, space in self.spaces.items()\n        }\n\n    def from_jsonable(self, sample_n: TypingDict[str, list]) -> List[dict]:\n        \"\"\"Convert a JSONable data type to a batch of samples from this space.\"\"\"\n        dict_of_list: TypingDict[str, list] = {\n            key: space.from_jsonable(sample_n[key])\n            for key, space in self.spaces.items()\n        }\n\n        n_elements = len(next(iter(dict_of_list.values())))\n        result = [\n            OrderedDict({key: value[n] for key, value in dict_of_list.items()})\n            for n in range(n_elements)\n        ]\n        return result\n"
  },
  {
    "path": "gym/spaces/discrete.py",
    "content": "\"\"\"Implementation of a space consisting of finitely many elements.\"\"\"\nfrom typing import Optional, Union\n\nimport numpy as np\n\nfrom gym.spaces.space import Space\n\n\nclass Discrete(Space[int]):\n    r\"\"\"A space consisting of finitely many elements.\n\n    This class represents a finite subset of integers, more specifically a set of the form :math:`\\{ a, a+1, \\dots, a+n-1 \\}`.\n\n    Example::\n\n        >>> Discrete(2)            # {0, 1}\n        >>> Discrete(3, start=-1)  # {-1, 0, 1}\n    \"\"\"\n\n    def __init__(\n        self,\n        n: int,\n        seed: Optional[Union[int, np.random.Generator]] = None,\n        start: int = 0,\n    ):\n        r\"\"\"Constructor of :class:`Discrete` space.\n\n        This will construct the space :math:`\\{\\text{start}, ..., \\text{start} + n - 1\\}`.\n\n        Args:\n            n (int): The number of elements of this space.\n            seed: Optionally, you can use this argument to seed the RNG that is used to sample from the ``Dict`` space.\n            start (int): The smallest element of this space.\n        \"\"\"\n        assert isinstance(n, (int, np.integer))\n        assert n > 0, \"n (counts) have to be positive\"\n        assert isinstance(start, (int, np.integer))\n        self.n = int(n)\n        self.start = int(start)\n        super().__init__((), np.int64, seed)\n\n    @property\n    def is_np_flattenable(self):\n        \"\"\"Checks whether this space can be flattened to a :class:`spaces.Box`.\"\"\"\n        return True\n\n    def sample(self, mask: Optional[np.ndarray] = None) -> int:\n        \"\"\"Generates a single random sample from this space.\n\n        A sample will be chosen uniformly at random with the mask if provided\n\n        Args:\n            mask: An optional mask for if an action can be selected.\n                Expected `np.ndarray` of shape `(n,)` and dtype `np.int8` where `1` represents valid actions and `0` invalid / infeasible actions.\n       
         If there are no possible actions (i.e. `np.all(mask == 0)`) then `space.start` will be returned.\n\n        Returns:\n            A sampled integer from the space\n        \"\"\"\n        if mask is not None:\n            assert isinstance(\n                mask, np.ndarray\n            ), f\"The expected type of the mask is np.ndarray, actual type: {type(mask)}\"\n            assert (\n                mask.dtype == np.int8\n            ), f\"The expected dtype of the mask is np.int8, actual dtype: {mask.dtype}\"\n            assert mask.shape == (\n                self.n,\n            ), f\"The expected shape of the mask is {(self.n,)}, actual shape: {mask.shape}\"\n            valid_action_mask = mask == 1\n            assert np.all(\n                np.logical_or(mask == 0, valid_action_mask)\n            ), f\"All values of a mask should be 0 or 1, actual values: {mask}\"\n            if np.any(valid_action_mask):\n                return int(\n                    self.start + self.np_random.choice(np.where(valid_action_mask)[0])\n                )\n            else:\n                return self.start\n\n        return int(self.start + self.np_random.integers(self.n))\n\n    def contains(self, x) -> bool:\n        \"\"\"Return boolean specifying if x is a valid member of this space.\"\"\"\n        if isinstance(x, int):\n            as_int = x\n        elif isinstance(x, (np.generic, np.ndarray)) and (\n            np.issubdtype(x.dtype, np.integer) and x.shape == ()\n        ):\n            as_int = int(x)  # type: ignore\n        else:\n            return False\n\n        return self.start <= as_int < self.start + self.n\n\n    def __repr__(self) -> str:\n        \"\"\"Gives a string representation of this space.\"\"\"\n        if self.start != 0:\n            return f\"Discrete({self.n}, start={self.start})\"\n        return f\"Discrete({self.n})\"\n\n    def __eq__(self, other) -> bool:\n        \"\"\"Check whether ``other`` is equivalent to this 
instance.\"\"\"\n        return (\n            isinstance(other, Discrete)\n            and self.n == other.n\n            and self.start == other.start\n        )\n\n    def __setstate__(self, state):\n        \"\"\"Used when loading a pickled space.\n\n        This method has to be implemented explicitly to allow for loading of legacy states.\n\n        Args:\n            state: The new state\n        \"\"\"\n        # Don't mutate the original state\n        state = dict(state)\n\n        # Allow for loading of legacy states.\n        # See https://github.com/openai/gym/pull/2470\n        if \"start\" not in state:\n            state[\"start\"] = 0\n\n        super().__setstate__(state)\n"
  },
  {
    "path": "gym/spaces/graph.py",
    "content": "\"\"\"Implementation of a space that represents graph information where nodes and edges can be represented with euclidean space.\"\"\"\nfrom typing import NamedTuple, Optional, Sequence, Tuple, Union\n\nimport numpy as np\n\nfrom gym.logger import warn\nfrom gym.spaces.box import Box\nfrom gym.spaces.discrete import Discrete\nfrom gym.spaces.multi_discrete import MultiDiscrete\nfrom gym.spaces.space import Space\n\n\nclass GraphInstance(NamedTuple):\n    \"\"\"A Graph space instance.\n\n    * nodes (np.ndarray): an (n x ...) sized array representing the features for n nodes, (...) must adhere to the shape of the node space.\n    * edges (Optional[np.ndarray]): an (m x ...) sized array representing the features for m edges, (...) must adhere to the shape of the edge space.\n    * edge_links (Optional[np.ndarray]): an (m x 2) sized array of ints representing the indices of the two nodes that each edge connects.\n    \"\"\"\n\n    nodes: np.ndarray\n    edges: Optional[np.ndarray]\n    edge_links: Optional[np.ndarray]\n\n\nclass Graph(Space):\n    r\"\"\"A space representing graph information as a series of `nodes` connected with `edges` according to an adjacency matrix represented as a series of `edge_links`.\n\n    Example usage::\n\n        self.observation_space = spaces.Graph(node_space=space.Box(low=-100, high=100, shape=(3,)), edge_space=spaces.Discrete(3))\n    \"\"\"\n\n    def __init__(\n        self,\n        node_space: Union[Box, Discrete],\n        edge_space: Union[None, Box, Discrete],\n        seed: Optional[Union[int, np.random.Generator]] = None,\n    ):\n        r\"\"\"Constructor of :class:`Graph`.\n\n        The argument ``node_space`` specifies the base space that each node feature will use.\n        This argument must be either a Box or Discrete instance.\n\n        The argument ``edge_space`` specifies the base space that each edge feature will use.\n        This argument must be either a None, Box or Discrete instance.\n\n     
   Args:\n            node_space (Union[Box, Discrete]): space of the node features.\n            edge_space (Union[None, Box, Discrete]): space of the edge features.\n            seed: Optionally, you can use this argument to seed the RNG that is used to sample from the space.\n        \"\"\"\n        assert isinstance(\n            node_space, (Box, Discrete)\n        ), f\"Values of the node_space should be instances of Box or Discrete, got {type(node_space)}\"\n        if edge_space is not None:\n            assert isinstance(\n                edge_space, (Box, Discrete)\n            ), f\"Values of the edge_space should be instances of None, Box or Discrete, got {type(edge_space)}\"\n\n        self.node_space = node_space\n        self.edge_space = edge_space\n\n        super().__init__(None, None, seed)\n\n    @property\n    def is_np_flattenable(self):\n        \"\"\"Checks whether this space can be flattened to a :class:`spaces.Box`.\"\"\"\n        return False\n\n    def _generate_sample_space(\n        self, base_space: Union[None, Box, Discrete], num: int\n    ) -> Optional[Union[Box, MultiDiscrete]]:\n        if num == 0 or base_space is None:\n            return None\n\n        if isinstance(base_space, Box):\n            return Box(\n                low=np.array(max(1, num) * [base_space.low]),\n                high=np.array(max(1, num) * [base_space.high]),\n                shape=(num,) + base_space.shape,\n                dtype=base_space.dtype,\n                seed=self.np_random,\n            )\n        elif isinstance(base_space, Discrete):\n            return MultiDiscrete(nvec=[base_space.n] * num, seed=self.np_random)\n        else:\n            raise TypeError(\n                f\"Expects base space to be Box or Discrete, actual space: {type(base_space)}.\"\n            )\n\n    def sample(\n        self,\n        mask: Optional[\n            Tuple[\n                Optional[Union[np.ndarray, tuple]],\n                
Optional[Union[np.ndarray, tuple]],\n            ]\n        ] = None,\n        num_nodes: int = 10,\n        num_edges: Optional[int] = None,\n    ) -> GraphInstance:\n        \"\"\"Generates a single sample graph with `num_nodes` nodes (10 by default) from the Graph space.\n\n        Args:\n            mask: An optional tuple of optional node and edge masks; masks are only supported for Discrete spaces\n                (Box spaces don't support sample masks).\n                If no `num_edges` is provided then the `edge_mask` is repeated once for each sampled edge\n            num_nodes: The number of nodes that will be sampled, the default is 10 nodes\n            num_edges: An optional number of edges, otherwise, a random number between 0 and `num_nodes * (num_nodes - 1)` (exclusive)\n\n        Returns:\n            A NamedTuple representing a graph with attributes .nodes, .edges, and .edge_links.\n        \"\"\"\n        assert (\n            num_nodes > 0\n        ), f\"The number of nodes is expected to be greater than 0, actual value: {num_nodes}\"\n\n        if mask is not None:\n            node_space_mask, edge_space_mask = mask\n        else:\n            node_space_mask, edge_space_mask = None, None\n\n        # we only have edges when we have at least 2 nodes\n        if num_edges is None:\n            if num_nodes > 1:\n                # the number of edges is drawn from `[0, n*(n-1))`, where `n*(n-1)` counts the ordered pairs of distinct nodes\n                num_edges = self.np_random.integers(num_nodes * (num_nodes - 1))\n            else:\n                num_edges = 0\n\n            if edge_space_mask is not None:\n                edge_space_mask = tuple(edge_space_mask for _ in range(num_edges))\n        else:\n            if self.edge_space is None:\n                warn(\n                    f\"The number of edges is set ({num_edges}) but the edge space is None.\"\n                )\n            assert (\n                num_edges >= 0\n            ), f\"Expects the number of edges to be greater than or equal to 0, 
actual value: {num_edges}\"\n        assert num_edges is not None\n\n        sampled_node_space = self._generate_sample_space(self.node_space, num_nodes)\n        sampled_edge_space = self._generate_sample_space(self.edge_space, num_edges)\n\n        assert sampled_node_space is not None\n        sampled_nodes = sampled_node_space.sample(node_space_mask)\n        sampled_edges = (\n            sampled_edge_space.sample(edge_space_mask)\n            if sampled_edge_space is not None\n            else None\n        )\n\n        sampled_edge_links = None\n        if sampled_edges is not None and num_edges > 0:\n            sampled_edge_links = self.np_random.integers(\n                low=0, high=num_nodes, size=(num_edges, 2)\n            )\n\n        return GraphInstance(sampled_nodes, sampled_edges, sampled_edge_links)\n\n    def contains(self, x: GraphInstance) -> bool:\n        \"\"\"Return boolean specifying if x is a valid member of this space.\"\"\"\n        if isinstance(x, GraphInstance):\n            # Checks the nodes\n            if isinstance(x.nodes, np.ndarray):\n                if all(node in self.node_space for node in x.nodes):\n                    # Check the edges and edge links which are optional\n                    if isinstance(x.edges, np.ndarray) and isinstance(\n                        x.edge_links, np.ndarray\n                    ):\n                        assert x.edges is not None\n                        assert x.edge_links is not None\n                        if self.edge_space is not None:\n                            if all(edge in self.edge_space for edge in x.edges):\n                                if np.issubdtype(x.edge_links.dtype, np.integer):\n                                    if x.edge_links.shape == (len(x.edges), 2):\n                                        if np.all(\n                                            np.logical_and(\n                                                x.edge_links >= 0,\n                         
                       x.edge_links < len(x.nodes),\n                                            )\n                                        ):\n                                            return True\n                    else:\n                        return x.edges is None and x.edge_links is None\n        return False\n\n    def __repr__(self) -> str:\n        \"\"\"A string representation of this space.\n\n        The representation will include node_space and edge_space\n\n        Returns:\n            A representation of the space\n        \"\"\"\n        return f\"Graph({self.node_space}, {self.edge_space})\"\n\n    def __eq__(self, other) -> bool:\n        \"\"\"Check whether `other` is equivalent to this instance.\"\"\"\n        return (\n            isinstance(other, Graph)\n            and (self.node_space == other.node_space)\n            and (self.edge_space == other.edge_space)\n        )\n\n    def to_jsonable(self, sample_n: NamedTuple) -> list:\n        \"\"\"Convert a batch of samples from this space to a JSONable data type.\"\"\"\n        # serialize as list of dicts\n        ret_n = []\n        for sample in sample_n:\n            ret = {}\n            ret[\"nodes\"] = sample.nodes.tolist()\n            if sample.edges is not None:\n                ret[\"edges\"] = sample.edges.tolist()\n                ret[\"edge_links\"] = sample.edge_links.tolist()\n            ret_n.append(ret)\n        return ret_n\n\n    def from_jsonable(self, sample_n: Sequence[dict]) -> list:\n        \"\"\"Convert a JSONable data type to a batch of samples from this space.\"\"\"\n        ret = []\n        for sample in sample_n:\n            if \"edges\" in sample:\n                ret_n = GraphInstance(\n                    np.asarray(sample[\"nodes\"]),\n                    np.asarray(sample[\"edges\"]),\n                    np.asarray(sample[\"edge_links\"]),\n                )\n            else:\n                ret_n = GraphInstance(\n                    
np.asarray(sample[\"nodes\"]),\n                    None,\n                    None,\n                )\n            ret.append(ret_n)\n        return ret\n"
  },
  {
    "path": "gym/spaces/multi_binary.py",
    "content": "\"\"\"Implementation of a space that consists of binary np.ndarrays of a fixed shape.\"\"\"\nfrom typing import Optional, Sequence, Tuple, Union\n\nimport numpy as np\n\nfrom gym.spaces.space import Space\n\n\nclass MultiBinary(Space[np.ndarray]):\n    \"\"\"An n-shape binary space.\n\n    Elements of this space are binary arrays of a shape that is fixed during construction.\n\n    Example Usage::\n\n        >>> observation_space = MultiBinary(5)\n        >>> observation_space.sample()\n            array([0, 1, 0, 1, 0], dtype=int8)\n        >>> observation_space = MultiBinary([3, 2])\n        >>> observation_space.sample()\n            array([[0, 0],\n                [0, 1],\n                [1, 1]], dtype=int8)\n    \"\"\"\n\n    def __init__(\n        self,\n        n: Union[np.ndarray, Sequence[int], int],\n        seed: Optional[Union[int, np.random.Generator]] = None,\n    ):\n        \"\"\"Constructor of :class:`MultiBinary` space.\n\n        Args:\n            n: This will fix the shape of elements of the space. 
It can either be an integer (if the space is flat)\n                or some sort of sequence (tuple, list or np.ndarray) if there are multiple axes.\n            seed: Optionally, you can use this argument to seed the RNG that is used to sample from the space.\n        \"\"\"\n        if isinstance(n, (Sequence, np.ndarray)):\n            self.n = input_n = tuple(int(i) for i in n)\n            assert (np.asarray(input_n) > 0).all()  # n (counts) have to be positive\n        else:\n            self.n = n = int(n)\n            input_n = (n,)\n            assert (np.asarray(input_n) > 0).all()  # n (counts) have to be positive\n\n        super().__init__(input_n, np.int8, seed)\n\n    @property\n    def shape(self) -> Tuple[int, ...]:\n        \"\"\"Has stricter type than gym.Space - never None.\"\"\"\n        return self._shape  # type: ignore\n\n    @property\n    def is_np_flattenable(self):\n        \"\"\"Checks whether this space can be flattened to a :class:`spaces.Box`.\"\"\"\n        return True\n\n    def sample(self, mask: Optional[np.ndarray] = None) -> np.ndarray:\n        \"\"\"Generates a single random sample from this space.\n\n        A sample is drawn by independent, fair coin tosses (one toss per binary variable of the space).\n\n        Args:\n            mask: An optional np.ndarray to mask samples with expected shape of ``space.shape``.\n                Where mask == 0 the sample will be 0, where mask == 1 the sample will be 1, and where mask == 2 a random value will be generated.\n                The expected mask shape is the space shape and mask dtype is `np.int8`.\n\n        Returns:\n            Sampled values from space\n        \"\"\"\n        if mask is not None:\n            assert isinstance(\n                mask, np.ndarray\n            ), f\"The expected type of the mask is np.ndarray, actual type: {type(mask)}\"\n            assert (\n                mask.dtype == np.int8\n            ), f\"The expected dtype of the mask is np.int8, actual dtype: {mask.dtype}\"\n      
      assert (\n                mask.shape == self.shape\n            ), f\"The expected shape of the mask is {self.shape}, actual shape: {mask.shape}\"\n            assert np.all(\n                (mask == 0) | (mask == 1) | (mask == 2)\n            ), f\"All values of a mask should be 0, 1 or 2, actual values: {mask}\"\n\n            return np.where(\n                mask == 2,\n                self.np_random.integers(low=0, high=2, size=self.n, dtype=self.dtype),\n                mask.astype(self.dtype),\n            )\n\n        return self.np_random.integers(low=0, high=2, size=self.n, dtype=self.dtype)\n\n    def contains(self, x) -> bool:\n        \"\"\"Return boolean specifying if x is a valid member of this space.\"\"\"\n        if isinstance(x, Sequence):\n            x = np.array(x)  # Promote list to array for contains check\n\n        return bool(\n            isinstance(x, np.ndarray)\n            and self.shape == x.shape\n            and np.all((x == 0) | (x == 1))\n        )\n\n    def to_jsonable(self, sample_n) -> list:\n        \"\"\"Convert a batch of samples from this space to a JSONable data type.\"\"\"\n        return np.array(sample_n).tolist()\n\n    def from_jsonable(self, sample_n) -> list:\n        \"\"\"Convert a JSONable data type to a batch of samples from this space.\"\"\"\n        return [np.asarray(sample, self.dtype) for sample in sample_n]\n\n    def __repr__(self) -> str:\n        \"\"\"Gives a string representation of this space.\"\"\"\n        return f\"MultiBinary({self.n})\"\n\n    def __eq__(self, other) -> bool:\n        \"\"\"Check whether `other` is equivalent to this instance.\"\"\"\n        return isinstance(other, MultiBinary) and self.n == other.n\n"
  },
  {
    "path": "gym/spaces/multi_discrete.py",
    "content": "\"\"\"Implementation of a space that represents the cartesian product of `Discrete` spaces.\"\"\"\nfrom typing import Iterable, List, Optional, Sequence, Tuple, Union\n\nimport numpy as np\n\nfrom gym import logger\nfrom gym.spaces.discrete import Discrete\nfrom gym.spaces.space import Space\n\n\nclass MultiDiscrete(Space[np.ndarray]):\n    \"\"\"This represents the cartesian product of arbitrary :class:`Discrete` spaces.\n\n    It is useful to represent game controllers or keyboards where each key can be represented as a discrete action space.\n\n    Note:\n        Some environment wrappers assume a value of 0 always represents the NOOP action.\n\n    e.g. Nintendo Game Controller - Can be conceptualized as 3 discrete action spaces:\n\n    1. Arrow Keys: Discrete 5  - NOOP[0], UP[1], RIGHT[2], DOWN[3], LEFT[4]  - params: min: 0, max: 4\n    2. Button A:   Discrete 2  - NOOP[0], Pressed[1] - params: min: 0, max: 1\n    3. Button B:   Discrete 2  - NOOP[0], Pressed[1] - params: min: 0, max: 1\n\n    It can be initialized as ``MultiDiscrete([ 5, 2, 2 ])`` such that a sample might be ``array([3, 1, 0])``.\n\n    Although this feature is rarely used, :class:`MultiDiscrete` spaces may also have several axes\n    if ``nvec`` has several axes:\n\n    Example::\n\n        >> d = MultiDiscrete(np.array([[1, 2], [3, 4]]))\n        >> d.sample()\n        array([[0, 0],\n               [2, 3]])\n    \"\"\"\n\n    def __init__(\n        self,\n        nvec: Union[np.ndarray, list],\n        dtype=np.int64,\n        seed: Optional[Union[int, np.random.Generator]] = None,\n    ):\n        \"\"\"Constructor of :class:`MultiDiscrete` space.\n\n        The argument ``nvec`` will determine the number of values each categorical variable can take.\n\n        Args:\n            nvec: vector of counts of each categorical variable. This will usually be a list of integers. 
However,\n                you may also pass a more complicated numpy array if you'd like the space to have several axes.\n            dtype: This should be some kind of integer type.\n            seed: Optionally, you can use this argument to seed the RNG that is used to sample from the space.\n        \"\"\"\n        self.nvec = np.array(nvec, dtype=dtype, copy=True)\n        assert (self.nvec > 0).all(), \"nvec (counts) have to be positive\"\n\n        super().__init__(self.nvec.shape, dtype, seed)\n\n    @property\n    def shape(self) -> Tuple[int, ...]:\n        \"\"\"Has stricter type than :class:`gym.Space` - never None.\"\"\"\n        return self._shape  # type: ignore\n\n    @property\n    def is_np_flattenable(self):\n        \"\"\"Checks whether this space can be flattened to a :class:`spaces.Box`.\"\"\"\n        return True\n\n    def sample(self, mask: Optional[tuple] = None) -> np.ndarray:\n        \"\"\"Generates a single random sample from this space.\n\n        Args:\n            mask: An optional mask for multi-discrete, expects tuples with a `np.ndarray` mask in the position of each\n                action with shape `(n,)` where `n` is the number of actions and `dtype=np.int8`.\n                Only actions whose mask value is 1 can be sampled; if all mask values for an action are 0, the default action 0 is sampled.\n\n        Returns:\n            An `np.ndarray` of shape `space.shape`\n        \"\"\"\n        if mask is not None:\n\n            def _apply_mask(\n                sub_mask: Union[np.ndarray, tuple],\n                sub_nvec: Union[np.ndarray, np.integer],\n            ) -> Union[int, List[int]]:\n                if isinstance(sub_nvec, np.ndarray):\n                    assert isinstance(\n                        sub_mask, tuple\n                    ), f\"Expects the mask to be a tuple for sub_nvec ({sub_nvec}), actual type: {type(sub_mask)}\"\n                    assert len(sub_mask) == len(\n                        sub_nvec\n       
             ), f\"Expects the mask length to be equal to the number of actions, mask length: {len(sub_mask)}, nvec length: {len(sub_nvec)}\"\n                    return [\n                        _apply_mask(new_mask, new_nvec)\n                        for new_mask, new_nvec in zip(sub_mask, sub_nvec)\n                    ]\n                else:\n                    assert np.issubdtype(\n                        type(sub_nvec), np.integer\n                    ), f\"Expects the sub_nvec to be an action, actually: {sub_nvec}, {type(sub_nvec)}\"\n                    assert isinstance(\n                        sub_mask, np.ndarray\n                    ), f\"Expects the sub mask to be np.ndarray, actual type: {type(sub_mask)}\"\n                    assert (\n                        len(sub_mask) == sub_nvec\n                    ), f\"Expects the mask length to be equal to the number of actions, mask length: {len(sub_mask)}, action: {sub_nvec}\"\n                    assert (\n                        sub_mask.dtype == np.int8\n                    ), f\"Expects the mask dtype to be np.int8, actual dtype: {sub_mask.dtype}\"\n\n                    valid_action_mask = sub_mask == 1\n                    assert np.all(\n                        np.logical_or(sub_mask == 0, valid_action_mask)\n                    ), f\"Expects all mask values to be 0 or 1, actual values: {sub_mask}\"\n\n                    if np.any(valid_action_mask):\n                        return self.np_random.choice(np.where(valid_action_mask)[0])\n                    else:\n                        return 0\n\n            return np.array(_apply_mask(mask, self.nvec), dtype=self.dtype)\n\n        return (self.np_random.random(self.nvec.shape) * self.nvec).astype(self.dtype)\n\n    def contains(self, x) -> bool:\n        \"\"\"Return boolean specifying if x is a valid member of this space.\"\"\"\n        if isinstance(x, Sequence):\n            x = np.array(x)  # Promote list to array for contains check\n\n   
     # if nvec is uint32 and space dtype is uint32, then 0 <= x < self.nvec guarantees that x\n        # is within correct bounds for space dtype (even though x does not have to be unsigned)\n        return bool(\n            isinstance(x, np.ndarray)\n            and x.shape == self.shape\n            and x.dtype != object\n            and np.all(0 <= x)\n            and np.all(x < self.nvec)\n        )\n\n    def to_jsonable(self, sample_n: Iterable[np.ndarray]):\n        \"\"\"Convert a batch of samples from this space to a JSONable data type.\"\"\"\n        return [sample.tolist() for sample in sample_n]\n\n    def from_jsonable(self, sample_n):\n        \"\"\"Convert a JSONable data type to a batch of samples from this space.\"\"\"\n        return np.array(sample_n)\n\n    def __repr__(self):\n        \"\"\"Gives a string representation of this space.\"\"\"\n        return f\"MultiDiscrete({self.nvec})\"\n\n    def __getitem__(self, index):\n        \"\"\"Extract a subspace from this ``MultiDiscrete`` space.\"\"\"\n        nvec = self.nvec[index]\n        if nvec.ndim == 0:\n            subspace = Discrete(nvec)\n        else:\n            subspace = MultiDiscrete(nvec, self.dtype)  # type: ignore\n\n        # you don't need to deepcopy as np random generator call replaces the state not the data\n        subspace.np_random.bit_generator.state = self.np_random.bit_generator.state\n\n        return subspace\n\n    def __len__(self):\n        \"\"\"Gives the ``len`` of samples from this space.\"\"\"\n        if self.nvec.ndim >= 2:\n            logger.warn(\n                \"Getting the length of a multi-dimensional MultiDiscrete space.\"\n            )\n        return len(self.nvec)\n\n    def __eq__(self, other):\n        \"\"\"Check whether ``other`` is equivalent to this instance.\"\"\"\n        return isinstance(other, MultiDiscrete) and np.all(self.nvec == other.nvec)\n"
  },
  {
    "path": "gym/spaces/sequence.py",
    "content": "\"\"\"Implementation of a space that represents finite-length sequences.\"\"\"\nfrom collections.abc import Sequence as CollectionSequence\nfrom typing import Any, List, Optional, Tuple, Union\n\nimport numpy as np\n\nimport gym\nfrom gym.spaces.space import Space\n\n\nclass Sequence(Space[Tuple]):\n    r\"\"\"This space represent sets of finite-length sequences.\n\n    This space represents the set of tuples of the form :math:`(a_0, \\dots, a_n)` where the :math:`a_i` belong\n    to some space that is specified during initialization and the integer :math:`n` is not fixed\n\n    Example::\n        >>> space = Sequence(Box(0, 1))\n        >>> space.sample()\n        (array([0.0259352], dtype=float32),)\n        >>> space.sample()\n        (array([0.80977976], dtype=float32), array([0.80066574], dtype=float32), array([0.77165383], dtype=float32))\n    \"\"\"\n\n    def __init__(\n        self,\n        space: Space,\n        seed: Optional[Union[int, np.random.Generator]] = None,\n    ):\n        \"\"\"Constructor of the :class:`Sequence` space.\n\n        Args:\n            space: Elements in the sequences this space represent must belong to this space.\n            seed: Optionally, you can use this argument to seed the RNG that is used to sample from the space.\n        \"\"\"\n        assert isinstance(\n            space, gym.Space\n        ), f\"Expects the feature space to be instance of a gym Space, actual type: {type(space)}\"\n        self.feature_space = space\n        super().__init__(\n            None, None, seed  # type: ignore\n        )  # None for shape and dtype, since it'll require special handling\n\n    def seed(self, seed: Optional[int] = None) -> list:\n        \"\"\"Seed the PRNG of this space and the feature space.\"\"\"\n        seeds = super().seed(seed)\n        seeds += self.feature_space.seed(seed)\n        return seeds\n\n    @property\n    def is_np_flattenable(self):\n        \"\"\"Checks whether this space can be 
flattened to a :class:`spaces.Box`.\"\"\"\n        return False\n\n    def sample(\n        self,\n        mask: Optional[Tuple[Optional[Union[np.ndarray, int]], Optional[Any]]] = None,\n    ) -> Tuple[Any, ...]:\n        \"\"\"Generates a single random sample from this space.\n\n        Args:\n            mask: An optional mask for (optionally) the length of the sequence and (optionally) the values in the sequence.\n                If you specify `mask`, it is expected to be a tuple of the form `(length_mask, sample_mask)` where `length_mask`\n                is\n                - `None`, in which case the length is randomly drawn from a geometric distribution\n                - `np.ndarray` of integers, in which case the length of the sampled sequence is randomly drawn from this array.\n                - `int`, for a fixed-length sample\n                The second element of the mask tuple, `sample_mask`, specifies a mask that is applied when\n                sampling elements from the base space. The mask is applied for each feature space sample.\n\n        Returns:\n            A tuple of random length with random samples of elements from the :attr:`feature_space`.\n        \"\"\"\n        if mask is not None:\n            length_mask, feature_mask = mask\n        else:\n            length_mask, feature_mask = None, None\n\n        if length_mask is not None:\n            if np.issubdtype(type(length_mask), np.integer):\n                assert (\n                    0 <= length_mask\n                ), f\"Expects the length mask to be greater than or equal to zero, actual value: {length_mask}\"\n                length = length_mask\n            elif isinstance(length_mask, np.ndarray):\n                assert (\n                    len(length_mask.shape) == 1\n                ), f\"Expects the shape of the length mask to be 1-dimensional, actual shape: {length_mask.shape}\"\n                assert np.all(\n                    0 <= length_mask\n                ), f\"Expects all 
values in the length_mask to be greater than or equal to zero, actual values: {length_mask}\"\n                length = self.np_random.choice(length_mask)\n            else:\n                raise TypeError(\n                    f\"Expects the type of length_mask to be an integer or a np.ndarray, actual type: {type(length_mask)}\"\n                )\n        else:\n            # The choice of 0.25 is arbitrary\n            length = self.np_random.geometric(0.25)\n\n        return tuple(\n            self.feature_space.sample(mask=feature_mask) for _ in range(length)\n        )\n\n    def contains(self, x) -> bool:\n        \"\"\"Return boolean specifying if x is a valid member of this space.\"\"\"\n        return isinstance(x, CollectionSequence) and all(\n            self.feature_space.contains(item) for item in x\n        )\n\n    def __repr__(self) -> str:\n        \"\"\"Gives a string representation of this space.\"\"\"\n        return f\"Sequence({self.feature_space})\"\n\n    def to_jsonable(self, sample_n: list) -> list:\n        \"\"\"Convert a batch of samples from this space to a JSONable data type.\"\"\"\n        # serialize each sequence via the feature space's own to_jsonable\n        return [self.feature_space.to_jsonable(list(sample)) for sample in sample_n]\n\n    def from_jsonable(self, sample_n: List[List[Any]]) -> list:\n        \"\"\"Convert a JSONable data type to a batch of samples from this space.\"\"\"\n        return [tuple(self.feature_space.from_jsonable(sample)) for sample in sample_n]\n\n    def __eq__(self, other) -> bool:\n        \"\"\"Check whether ``other`` is equivalent to this instance.\"\"\"\n        return isinstance(other, Sequence) and self.feature_space == other.feature_space\n"
  },
  {
    "path": "gym/spaces/space.py",
    "content": "\"\"\"Implementation of the `Space` metaclass.\"\"\"\n\nfrom typing import (\n    Any,\n    Generic,\n    Iterable,\n    List,\n    Mapping,\n    Optional,\n    Sequence,\n    Tuple,\n    Type,\n    TypeVar,\n    Union,\n)\n\nimport numpy as np\n\nfrom gym.utils import seeding\n\nT_cov = TypeVar(\"T_cov\", covariant=True)\n\n\nclass Space(Generic[T_cov]):\n    \"\"\"Superclass that is used to define observation and action spaces.\n\n    Spaces are crucially used in Gym to define the format of valid actions and observations.\n    They serve various purposes:\n\n    * They clearly define how to interact with environments, i.e. they specify what actions need to look like\n      and what observations will look like\n    * They allow us to work with highly structured data (e.g. in the form of elements of :class:`Dict` spaces)\n      and painlessly transform them into flat arrays that can be used in learning code\n    * They provide a method to sample random elements. This is especially useful for exploration and debugging.\n\n    Different spaces can be combined hierarchically via container spaces (:class:`Tuple` and :class:`Dict`) to build a\n    more expressive space\n\n    Warning:\n        Custom observation & action spaces can inherit from the ``Space``\n        class. However, most use-cases should be covered by the existing space\n        classes (e.g. :class:`Box`, :class:`Discrete`, etc...), and container classes (:class`Tuple` &\n        :class:`Dict`). Note that parametrized probability distributions (through the\n        :meth:`Space.sample()` method), and batching functions (in :class:`gym.vector.VectorEnv`), are\n        only well-defined for instances of spaces provided in gym by default.\n        Moreover, some implementations of Reinforcement Learning algorithms might\n        not handle custom spaces properly. 
Use custom spaces with care.\n    \"\"\"\n\n    def __init__(\n        self,\n        shape: Optional[Sequence[int]] = None,\n        dtype: Optional[Union[Type, str, np.dtype]] = None,\n        seed: Optional[Union[int, np.random.Generator]] = None,\n    ):\n        \"\"\"Constructor of :class:`Space`.\n\n        Args:\n            shape (Optional[Sequence[int]]): If elements of the space are numpy arrays, this should specify their shape.\n            dtype (Optional[Type | str]): If elements of the space are numpy arrays, this should specify their dtype.\n            seed: Optionally, you can use this argument to seed the RNG that is used to sample from the space\n        \"\"\"\n        self._shape = None if shape is None else tuple(shape)\n        self.dtype = None if dtype is None else np.dtype(dtype)\n        self._np_random = None\n        if seed is not None:\n            if isinstance(seed, np.random.Generator):\n                self._np_random = seed\n            else:\n                self.seed(seed)\n\n    @property\n    def np_random(self) -> np.random.Generator:\n        \"\"\"Lazily seed the PRNG since this is expensive and only needed if sampling from this space.\"\"\"\n        if self._np_random is None:\n            self.seed()\n\n        return self._np_random  # type: ignore  ## self.seed() call guarantees right type.\n\n    @property\n    def shape(self) -> Optional[Tuple[int, ...]]:\n        \"\"\"Return the shape of the space as an immutable property.\"\"\"\n        return self._shape\n\n    @property\n    def is_np_flattenable(self):\n        \"\"\"Checks whether this space can be flattened to a :class:`spaces.Box`.\"\"\"\n        raise NotImplementedError\n\n    def sample(self, mask: Optional[Any] = None) -> T_cov:\n        \"\"\"Randomly sample an element of this space.\n\n        Can be uniform or non-uniform sampling based on boundedness of space.\n\n        Args:\n            mask: A mask used for sampling, expected ``dtype=np.int8`` 
and see sample implementation for expected shape.\n\n        Returns:\n            A sampled element of the space\n        \"\"\"\n        raise NotImplementedError\n\n    def seed(self, seed: Optional[int] = None) -> list:\n        \"\"\"Seed the PRNG of this space and possibly the PRNGs of subspaces.\"\"\"\n        self._np_random, seed = seeding.np_random(seed)\n        return [seed]\n\n    def contains(self, x) -> bool:\n        \"\"\"Return boolean specifying if x is a valid member of this space.\"\"\"\n        raise NotImplementedError\n\n    def __contains__(self, x) -> bool:\n        \"\"\"Return boolean specifying if x is a valid member of this space.\"\"\"\n        return self.contains(x)\n\n    def __setstate__(self, state: Union[Iterable, Mapping]):\n        \"\"\"Used when loading a pickled space.\n\n        This method was implemented explicitly to allow for loading of legacy states.\n\n        Args:\n            state: The updated state value\n        \"\"\"\n        # Don't mutate the original state\n        state = dict(state)\n\n        # Allow for loading of legacy states.\n        # See:\n        #   https://github.com/openai/gym/pull/2397 -- shape\n        #   https://github.com/openai/gym/pull/1913 -- np_random\n        #\n        if \"shape\" in state:\n            state[\"_shape\"] = state[\"shape\"]\n            del state[\"shape\"]\n        if \"np_random\" in state:\n            state[\"_np_random\"] = state[\"np_random\"]\n            del state[\"np_random\"]\n\n        # Update our state\n        self.__dict__.update(state)\n\n    def to_jsonable(self, sample_n: Sequence[T_cov]) -> list:\n        \"\"\"Convert a batch of samples from this space to a JSONable data type.\"\"\"\n        # By default, assume identity is JSONable\n        return list(sample_n)\n\n    def from_jsonable(self, sample_n: list) -> List[T_cov]:\n        \"\"\"Convert a JSONable data type to a batch of samples from this space.\"\"\"\n        # By default, assume 
identity is JSONable\n        return sample_n\n"
  },
  {
    "path": "gym/spaces/text.py",
    "content": "\"\"\"Implementation of a space that represents textual strings.\"\"\"\nfrom typing import Any, Dict, FrozenSet, Optional, Set, Tuple, Union\n\nimport numpy as np\n\nfrom gym.spaces.space import Space\n\nalphanumeric: FrozenSet[str] = frozenset(\n    \"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789\"\n)\n\n\nclass Text(Space[str]):\n    r\"\"\"A space representing a string comprised of characters from a given charset.\n\n    Example::\n        >>> # {\"\", \"B5\", \"hello\", ...}\n        >>> Text(5)\n        >>> # {\"0\", \"42\", \"0123456789\", ...}\n        >>> import string\n        >>> Text(min_length = 1,\n        ...      max_length = 10,\n        ...      charset = string.digits)\n    \"\"\"\n\n    def __init__(\n        self,\n        max_length: int,\n        *,\n        min_length: int = 1,\n        charset: Union[Set[str], str] = alphanumeric,\n        seed: Optional[Union[int, np.random.Generator]] = None,\n    ):\n        r\"\"\"Constructor of :class:`Text` space.\n\n        Both bounds for text length are inclusive.\n\n        Args:\n            min_length (int): Minimum text length (in characters). 
Defaults to 1 to prevent empty strings.\n            max_length (int): Maximum text length (in characters).\n            charset (Union[Set[str], str]): Character set, defaults to the lowercase and uppercase English letters plus the digits 0-9.\n            seed: The seed for sampling from the space.\n        \"\"\"\n        assert np.issubdtype(\n            type(min_length), np.integer\n        ), f\"Expects the min_length to be an integer, actual type: {type(min_length)}\"\n        assert np.issubdtype(\n            type(max_length), np.integer\n        ), f\"Expects the max_length to be an integer, actual type: {type(max_length)}\"\n        assert (\n            0 <= min_length\n        ), f\"Minimum text length must be non-negative, actual value: {min_length}\"\n        assert (\n            min_length <= max_length\n        ), f\"The min_length must be less than or equal to the max_length, min_length: {min_length}, max_length: {max_length}\"\n\n        self.min_length: int = int(min_length)\n        self.max_length: int = int(max_length)\n\n        self._char_set: FrozenSet[str] = frozenset(charset)\n        self._char_list: Tuple[str, ...] 
= tuple(charset)\n        self._char_index: Dict[str, np.int32] = {\n            val: np.int32(i) for i, val in enumerate(tuple(charset))\n        }\n        self._char_str: str = \"\".join(sorted(tuple(charset)))\n\n        # As the shape is dynamic (between min_length and max_length) then None\n        super().__init__(dtype=str, seed=seed)\n\n    def sample(\n        self, mask: Optional[Tuple[Optional[int], Optional[np.ndarray]]] = None\n    ) -> str:\n        \"\"\"Generates a single random sample from this space, by default with a random length between `min_length` and `max_length` and characters sampled from the `charset`.\n\n        Args:\n            mask: An optional tuple of length and character mask for the text.\n                If the length is given, it must lie between `min_length` and `max_length` (inclusive); if it is ``None``, a random integer between `min_length` and `max_length` is selected.\n                For the mask, we expect a numpy array with one entry per character of the charset and `dtype == np.int8`.\n                If the character mask is all zero, an empty string is returned when `min_length` is 0; otherwise a ``ValueError`` is raised.\n\n        Returns:\n            A sampled string from the space\n        \"\"\"\n        if mask is not None:\n            assert isinstance(\n                mask, tuple\n            ), f\"Expects the mask type to be a tuple, actual type: {type(mask)}\"\n            assert (\n                len(mask) == 2\n            ), f\"Expects the mask length to be two, actual length: {len(mask)}\"\n            length, charlist_mask = mask\n\n            if length is not None:\n                assert np.issubdtype(\n                    type(length), np.integer\n                ), f\"Expects the Text sample length to be an integer, actual type: {type(length)}\"\n                assert (\n                    self.min_length <= length <= self.max_length\n                ), f\"Expects the Text sample length to be between {self.min_length} and {self.max_length}, actual length: {length}\"\n\n           
 if charlist_mask is not None:\n                assert isinstance(\n                    charlist_mask, np.ndarray\n                ), f\"Expects the Text sample mask to be an np.ndarray, actual type: {type(charlist_mask)}\"\n                assert (\n                    charlist_mask.dtype == np.int8\n                ), f\"Expects the Text sample mask to have dtype np.int8, actual dtype: {charlist_mask.dtype}\"\n                assert charlist_mask.shape == (\n                    len(self.character_set),\n                ), f\"Expects the Text sample mask shape to be {(len(self.character_set),)}, actual shape: {charlist_mask.shape}\"\n                assert np.all(\n                    np.logical_or(charlist_mask == 0, charlist_mask == 1)\n                ), f\"Expects all mask values to be 0 or 1, actual values: {charlist_mask}\"\n        else:\n            length, charlist_mask = None, None\n\n        if length is None:\n            length = self.np_random.integers(self.min_length, self.max_length + 1)\n\n        if charlist_mask is None:\n            string = self.np_random.choice(self.character_list, size=length)\n        else:\n            valid_mask = charlist_mask == 1\n            valid_indexes = np.where(valid_mask)[0]\n            if len(valid_indexes) == 0:\n                if self.min_length == 0:\n                    string = \"\"\n                else:\n                    # Otherwise the string will not be contained in the space\n                    raise ValueError(\n                        f\"Trying to sample with a minimum length > 0 ({self.min_length}) but the character mask is all zero meaning that no character could be sampled.\"\n                    )\n            else:\n                string = \"\".join(\n                    self.character_list[index]\n                    for index in self.np_random.choice(valid_indexes, size=length)\n                )\n\n        return \"\".join(string)\n\n    def contains(self, x: Any) -> bool:\n        \"\"\"Return 
boolean specifying if x is a valid member of this space.\"\"\"\n        if isinstance(x, str):\n            if self.min_length <= len(x) <= self.max_length:\n                return all(c in self.character_set for c in x)\n        return False\n\n    def __repr__(self) -> str:\n        \"\"\"Gives a string representation of this space.\"\"\"\n        return (\n            f\"Text({self.min_length}, {self.max_length}, characters={self.characters})\"\n        )\n\n    def __eq__(self, other) -> bool:\n        \"\"\"Check whether ``other`` is equivalent to this instance.\"\"\"\n        return (\n            isinstance(other, Text)\n            and self.min_length == other.min_length\n            and self.max_length == other.max_length\n            and self.character_set == other.character_set\n        )\n\n    @property\n    def character_set(self) -> FrozenSet[str]:\n        \"\"\"Returns the character set for the space.\"\"\"\n        return self._char_set\n\n    @property\n    def character_list(self) -> Tuple[str, ...]:\n        \"\"\"Returns a tuple of characters in the space.\"\"\"\n        return self._char_list\n\n    def character_index(self, char: str) -> np.int32:\n        \"\"\"Returns a unique index for each character in the space's character set.\"\"\"\n        return self._char_index[char]\n\n    @property\n    def characters(self) -> str:\n        \"\"\"Returns a string with all Text characters.\"\"\"\n        return self._char_str\n\n    @property\n    def is_np_flattenable(self) -> bool:\n        \"\"\"The flattened version is an integer array for each character, padded to the max character length.\"\"\"\n        return True\n"
  },
  {
    "path": "gym/spaces/tuple.py",
    "content": "\"\"\"Implementation of a space that represents the cartesian product of other spaces.\"\"\"\nfrom collections.abc import Sequence as CollectionSequence\nfrom typing import Iterable, Optional\nfrom typing import Sequence as TypingSequence\nfrom typing import Tuple as TypingTuple\nfrom typing import Union\n\nimport numpy as np\n\nfrom gym.spaces.space import Space\n\n\nclass Tuple(Space[tuple], CollectionSequence):\n    \"\"\"A tuple (more precisely: the cartesian product) of :class:`Space` instances.\n\n    Elements of this space are tuples of elements of the constituent spaces.\n\n    Example usage::\n\n        >>> from gym.spaces import Box, Discrete\n        >>> observation_space = Tuple((Discrete(2), Box(-1, 1, shape=(2,))))\n        >>> observation_space.sample()\n        (0, array([0.03633198, 0.42370757], dtype=float32))\n    \"\"\"\n\n    def __init__(\n        self,\n        spaces: Iterable[Space],\n        seed: Optional[Union[int, TypingSequence[int], np.random.Generator]] = None,\n    ):\n        r\"\"\"Constructor of :class:`Tuple` space.\n\n        The generated instance will represent the cartesian product :math:`\\text{spaces}[0] \\times ... 
\\times \\text{spaces}[-1]`.\n\n        Args:\n            spaces (Iterable[Space]): The spaces that are involved in the cartesian product.\n            seed: Optionally, you can use this argument to seed the RNGs of the ``spaces`` to ensure reproducible sampling.\n        \"\"\"\n        self.spaces = tuple(spaces)\n        for space in self.spaces:\n            assert isinstance(\n                space, Space\n            ), \"Elements of the tuple must be instances of gym.Space\"\n        super().__init__(None, None, seed)  # type: ignore\n\n    @property\n    def is_np_flattenable(self):\n        \"\"\"Checks whether this space can be flattened to a :class:`spaces.Box`.\"\"\"\n        return all(space.is_np_flattenable for space in self.spaces)\n\n    def seed(\n        self, seed: Optional[Union[int, TypingSequence[int]]] = None\n    ) -> TypingSequence[int]:\n        \"\"\"Seed the PRNG of this space and all subspaces.\n\n        Depending on the type of seed, the subspaces will be seeded differently\n        * None - All the subspaces will use a random initial seed\n        * Int - The integer is used to seed the `Tuple` space that is used to generate seed values for each of the subspaces. Warning, this does not guarantee unique seeds for all of the subspaces.\n        * List - Values used to seed the subspaces. This allows the seeding of multiple composite subspaces (`List(42, 54, ...)`).\n\n        Args:\n            seed: An optional list of ints or int to seed the (sub-)spaces.\n        \"\"\"\n        seeds = []\n\n        if isinstance(seed, CollectionSequence):\n            assert len(seed) == len(\n                self.spaces\n            ), f\"Expects that the subspaces of seeds equals the number of subspaces. 
Actual length of seeds: {len(seed)}, length of subspaces: {len(self.spaces)}\"\n            for subseed, space in zip(seed, self.spaces):\n                seeds += space.seed(subseed)\n        elif isinstance(seed, int):\n            seeds = super().seed(seed)\n            subseeds = self.np_random.integers(\n                np.iinfo(np.int32).max, size=len(self.spaces)\n            )\n            for subspace, subseed in zip(self.spaces, subseeds):\n                seeds += subspace.seed(int(subseed))\n        elif seed is None:\n            for space in self.spaces:\n                seeds += space.seed(seed)\n        else:\n            raise TypeError(\n                f\"Expected seed type: list, tuple, int or None, actual type: {type(seed)}\"\n            )\n\n        return seeds\n\n    def sample(\n        self, mask: Optional[TypingTuple[Optional[np.ndarray], ...]] = None\n    ) -> tuple:\n        \"\"\"Generates a single random sample inside this space.\n\n        This method draws independent samples from the subspaces.\n\n        Args:\n            mask: An optional tuple of optional masks for each of the subspace's samples,\n                expects the same number of masks as spaces\n\n        Returns:\n            Tuple of the subspace's samples\n        \"\"\"\n        if mask is not None:\n            assert isinstance(\n                mask, tuple\n            ), f\"Expected type of mask is tuple, actual type: {type(mask)}\"\n            assert len(mask) == len(\n                self.spaces\n            ), f\"Expected length of mask is {len(self.spaces)}, actual length: {len(mask)}\"\n\n            return tuple(\n                space.sample(mask=sub_mask)\n                for space, sub_mask in zip(self.spaces, mask)\n            )\n\n        return tuple(space.sample() for space in self.spaces)\n\n    def contains(self, x) -> bool:\n        \"\"\"Return boolean specifying if x is a valid member of this space.\"\"\"\n        if isinstance(x, (list, 
np.ndarray)):\n            x = tuple(x)  # Promote list and ndarray to tuple for contains check\n        return (\n            isinstance(x, tuple)\n            and len(x) == len(self.spaces)\n            and all(space.contains(part) for (space, part) in zip(self.spaces, x))\n        )\n\n    def __repr__(self) -> str:\n        \"\"\"Gives a string representation of this space.\"\"\"\n        return \"Tuple(\" + \", \".join([str(s) for s in self.spaces]) + \")\"\n\n    def to_jsonable(self, sample_n: CollectionSequence) -> list:\n        \"\"\"Convert a batch of samples from this space to a JSONable data type.\"\"\"\n        # serialize as list-repr of tuple of vectors\n        return [\n            space.to_jsonable([sample[i] for sample in sample_n])\n            for i, space in enumerate(self.spaces)\n        ]\n\n    def from_jsonable(self, sample_n) -> list:\n        \"\"\"Convert a JSONable data type to a batch of samples from this space.\"\"\"\n        return [\n            sample\n            for sample in zip(\n                *[\n                    space.from_jsonable(sample_n[i])\n                    for i, space in enumerate(self.spaces)\n                ]\n            )\n        ]\n\n    def __getitem__(self, index: int) -> Space:\n        \"\"\"Get the subspace at specific `index`.\"\"\"\n        return self.spaces[index]\n\n    def __len__(self) -> int:\n        \"\"\"Get the number of subspaces that are involved in the cartesian product.\"\"\"\n        return len(self.spaces)\n\n    def __eq__(self, other) -> bool:\n        \"\"\"Check whether ``other`` is equivalent to this instance.\"\"\"\n        return isinstance(other, Tuple) and self.spaces == other.spaces\n"
  },
  {
    "path": "gym/spaces/utils.py",
    "content": "\"\"\"Implementation of utility functions that can be applied to spaces.\n\nThese functions mostly take care of flattening and unflattening elements of spaces\n to facilitate their usage in learning code.\n\"\"\"\nimport operator as op\nfrom collections import OrderedDict\nfrom functools import reduce, singledispatch\nfrom typing import Dict as TypingDict\nfrom typing import TypeVar, Union, cast\n\nimport numpy as np\n\nfrom gym.spaces import (\n    Box,\n    Dict,\n    Discrete,\n    Graph,\n    GraphInstance,\n    MultiBinary,\n    MultiDiscrete,\n    Sequence,\n    Space,\n    Text,\n    Tuple,\n)\n\n\n@singledispatch\ndef flatdim(space: Space) -> int:\n    \"\"\"Return the number of dimensions a flattened equivalent of this space would have.\n\n    Example usage::\n\n        >>> from gym.spaces import Discrete\n        >>> space = Dict({\"position\": Discrete(2), \"velocity\": Discrete(3)})\n        >>> flatdim(space)\n        5\n\n    Args:\n        space: The space to return the number of dimensions of the flattened spaces\n\n    Returns:\n        The number of dimensions for the flattened spaces\n\n    Raises:\n         NotImplementedError: if the space is not defined in ``gym.spaces``.\n         ValueError: if the space cannot be flattened into a :class:`Box`\n    \"\"\"\n    if not space.is_np_flattenable:\n        raise ValueError(\n            f\"{space} cannot be flattened to a numpy array, probably because it contains a `Graph` or `Sequence` subspace\"\n        )\n\n    raise NotImplementedError(f\"Unknown space: `{space}`\")\n\n\n@flatdim.register(Box)\n@flatdim.register(MultiBinary)\ndef _flatdim_box_multibinary(space: Union[Box, MultiBinary]) -> int:\n    return reduce(op.mul, space.shape, 1)\n\n\n@flatdim.register(Discrete)\ndef _flatdim_discrete(space: Discrete) -> int:\n    return int(space.n)\n\n\n@flatdim.register(MultiDiscrete)\ndef _flatdim_multidiscrete(space: MultiDiscrete) -> int:\n    return 
int(np.sum(space.nvec))\n\n\n@flatdim.register(Tuple)\ndef _flatdim_tuple(space: Tuple) -> int:\n    if space.is_np_flattenable:\n        return sum(flatdim(s) for s in space.spaces)\n    raise ValueError(\n        f\"{space} cannot be flattened to a numpy array, probably because it contains a `Graph` or `Sequence` subspace\"\n    )\n\n\n@flatdim.register(Dict)\ndef _flatdim_dict(space: Dict) -> int:\n    if space.is_np_flattenable:\n        return sum(flatdim(s) for s in space.spaces.values())\n    raise ValueError(\n        f\"{space} cannot be flattened to a numpy array, probably because it contains a `Graph` or `Sequence` subspace\"\n    )\n\n\n@flatdim.register(Graph)\ndef _flatdim_graph(space: Graph):\n    raise ValueError(\n        \"Cannot get flattened size as the Graph Space in Gym has a dynamic size.\"\n    )\n\n\n@flatdim.register(Text)\ndef _flatdim_text(space: Text) -> int:\n    return space.max_length\n\n\nT = TypeVar(\"T\")\nFlatType = Union[np.ndarray, TypingDict, tuple, GraphInstance]\n\n\n@singledispatch\ndef flatten(space: Space[T], x: T) -> FlatType:\n    \"\"\"Flatten a data point from a space.\n\n    This is useful when e.g. 
points from spaces must be passed to a neural\n    network, which only understands flat arrays of floats.\n\n    Args:\n        space: The space that ``x`` is flattened by\n        x: The value to flatten\n\n    Returns:\n        - For ``Box`` and ``MultiBinary``, this is a flattened array\n        - For ``Discrete`` and ``MultiDiscrete``, this is a flattened one-hot array of the sample\n        - For ``Tuple`` and ``Dict``, this is a concatenated array of the flattened subspaces (does not support graph subspaces)\n        - For graph spaces, returns `GraphInstance` where:\n            - `nodes` are n x k arrays\n            - `edges` are either:\n                - m x k arrays\n                - None\n            - `edge_links` are either:\n                - m x 2 arrays\n                - None\n\n    Raises:\n        NotImplementedError: If the space is not defined in ``gym.spaces``.\n    \"\"\"\n    raise NotImplementedError(f\"Unknown space: `{space}`\")\n\n\n@flatten.register(Box)\n@flatten.register(MultiBinary)\ndef _flatten_box_multibinary(space, x) -> np.ndarray:\n    return np.asarray(x, dtype=space.dtype).flatten()\n\n\n@flatten.register(Discrete)\ndef _flatten_discrete(space, x) -> np.ndarray:\n    onehot = np.zeros(space.n, dtype=space.dtype)\n    onehot[x - space.start] = 1\n    return onehot\n\n\n@flatten.register(MultiDiscrete)\ndef _flatten_multidiscrete(space, x) -> np.ndarray:\n    offsets = np.zeros((space.nvec.size + 1,), dtype=space.dtype)\n    offsets[1:] = np.cumsum(space.nvec.flatten())\n\n    onehot = np.zeros((offsets[-1],), dtype=space.dtype)\n    onehot[offsets[:-1] + x.flatten()] = 1\n    return onehot\n\n\n@flatten.register(Tuple)\ndef _flatten_tuple(space, x) -> Union[tuple, np.ndarray]:\n    if space.is_np_flattenable:\n        return np.concatenate(\n            [flatten(s, x_part) for x_part, s in zip(x, space.spaces)]\n        )\n    return tuple(flatten(s, x_part) for x_part, s in zip(x, space.spaces))\n\n\n@flatten.register(Dict)\ndef 
_flatten_dict(space, x) -> Union[dict, np.ndarray]:\n    if space.is_np_flattenable:\n        return np.concatenate([flatten(s, x[key]) for key, s in space.spaces.items()])\n    return OrderedDict((key, flatten(s, x[key])) for key, s in space.spaces.items())\n\n\n@flatten.register(Graph)\ndef _flatten_graph(space, x) -> GraphInstance:\n    \"\"\"We're not using `.unflatten() for :class:`Box` and :class:`Discrete` because a graph is not a homogeneous space, see `.flatten` docstring.\"\"\"\n\n    def _graph_unflatten(unflatten_space, unflatten_x):\n        ret = None\n        if unflatten_space is not None and unflatten_x is not None:\n            if isinstance(unflatten_space, Box):\n                ret = unflatten_x.reshape(unflatten_x.shape[0], -1)\n            elif isinstance(unflatten_space, Discrete):\n                ret = np.zeros(\n                    (unflatten_x.shape[0], unflatten_space.n - unflatten_space.start),\n                    dtype=unflatten_space.dtype,\n                )\n                ret[\n                    np.arange(unflatten_x.shape[0]), unflatten_x - unflatten_space.start\n                ] = 1\n        return ret\n\n    nodes = _graph_unflatten(space.node_space, x.nodes)\n    edges = _graph_unflatten(space.edge_space, x.edges)\n\n    return GraphInstance(nodes, edges, x.edge_links)\n\n\n@flatten.register(Text)\ndef _flatten_text(space: Text, x: str) -> np.ndarray:\n    arr = np.full(\n        shape=(space.max_length,), fill_value=len(space.character_set), dtype=np.int32\n    )\n    for i, val in enumerate(x):\n        arr[i] = space.character_index(val)\n    return arr\n\n\n@flatten.register(Sequence)\ndef _flatten_sequence(space, x) -> tuple:\n    return tuple(flatten(space.feature_space, item) for item in x)\n\n\n@singledispatch\ndef unflatten(space: Space[T], x: FlatType) -> T:\n    \"\"\"Unflatten a data point from a space.\n\n    This reverses the transformation applied by :func:`flatten`. 
You must ensure\n    that the ``space`` argument is the same as for the :func:`flatten` call.\n\n    Args:\n        space: The space used to unflatten ``x``\n        x: The array to unflatten\n\n    Returns:\n        A point with a structure that matches the space.\n\n    Raises:\n        NotImplementedError: if the space is not defined in ``gym.spaces``.\n    \"\"\"\n    raise NotImplementedError(f\"Unknown space: `{space}`\")\n\n\n@unflatten.register(Box)\n@unflatten.register(MultiBinary)\ndef _unflatten_box_multibinary(\n    space: Union[Box, MultiBinary], x: np.ndarray\n) -> np.ndarray:\n    return np.asarray(x, dtype=space.dtype).reshape(space.shape)\n\n\n@unflatten.register(Discrete)\ndef _unflatten_discrete(space: Discrete, x: np.ndarray) -> int:\n    return int(space.start + np.nonzero(x)[0][0])\n\n\n@unflatten.register(MultiDiscrete)\ndef _unflatten_multidiscrete(space: MultiDiscrete, x: np.ndarray) -> np.ndarray:\n    offsets = np.zeros((space.nvec.size + 1,), dtype=space.dtype)\n    offsets[1:] = np.cumsum(space.nvec.flatten())\n\n    (indices,) = cast(type(offsets[:-1]), np.nonzero(x))\n    return np.asarray(indices - offsets[:-1], dtype=space.dtype).reshape(space.shape)\n\n\n@unflatten.register(Tuple)\ndef _unflatten_tuple(space: Tuple, x: Union[np.ndarray, tuple]) -> tuple:\n    if space.is_np_flattenable:\n        assert isinstance(\n            x, np.ndarray\n        ), f\"{space} is numpy-flattenable. Thus, you should only unflatten numpy arrays for this space. Got a {type(x)}\"\n        dims = np.asarray([flatdim(s) for s in space.spaces], dtype=np.int_)\n        list_flattened = np.split(x, np.cumsum(dims[:-1]))\n        return tuple(\n            unflatten(s, flattened)\n            for flattened, s in zip(list_flattened, space.spaces)\n        )\n    assert isinstance(\n        x, tuple\n    ), f\"{space} is not numpy-flattenable. Thus, you should only unflatten tuples for this space. 
Got a {type(x)}\"\n    return tuple(unflatten(s, flattened) for flattened, s in zip(x, space.spaces))\n\n\n@unflatten.register(Dict)\ndef _unflatten_dict(space: Dict, x: Union[np.ndarray, TypingDict]) -> dict:\n    if space.is_np_flattenable:\n        dims = np.asarray([flatdim(s) for s in space.spaces.values()], dtype=np.int_)\n        list_flattened = np.split(x, np.cumsum(dims[:-1]))\n        return OrderedDict(\n            [\n                (key, unflatten(s, flattened))\n                for flattened, (key, s) in zip(list_flattened, space.spaces.items())\n            ]\n        )\n    assert isinstance(\n        x, dict\n    ), f\"{space} is not numpy-flattenable. Thus, you should only unflatten dictionary for this space. Got a {type(x)}\"\n    return OrderedDict((key, unflatten(s, x[key])) for key, s in space.spaces.items())\n\n\n@unflatten.register(Graph)\ndef _unflatten_graph(space: Graph, x: GraphInstance) -> GraphInstance:\n    \"\"\"We're not using `.unflatten() for :class:`Box` and :class:`Discrete` because a graph is not a homogeneous space.\n\n    The size of the outcome is actually not fixed, but determined based on the number of\n    nodes and edges in the graph.\n    \"\"\"\n\n    def _graph_unflatten(space, x):\n        ret = None\n        if space is not None and x is not None:\n            if isinstance(space, Box):\n                ret = x.reshape(-1, *space.shape)\n            elif isinstance(space, Discrete):\n                ret = np.asarray(np.nonzero(x))[-1, :]\n        return ret\n\n    nodes = _graph_unflatten(space.node_space, x.nodes)\n    edges = _graph_unflatten(space.edge_space, x.edges)\n\n    return GraphInstance(nodes, edges, x.edge_links)\n\n\n@unflatten.register(Text)\ndef _unflatten_text(space: Text, x: np.ndarray) -> str:\n    return \"\".join(\n        [space.character_list[val] for val in x if val < len(space.character_set)]\n    )\n\n\n@unflatten.register(Sequence)\ndef _unflatten_sequence(space: Sequence, x: tuple) -> 
tuple:\n    return tuple(unflatten(space.feature_space, item) for item in x)\n\n\n@singledispatch\ndef flatten_space(space: Space) -> Union[Dict, Sequence, Tuple, Graph]:\n    \"\"\"Flatten a space into a space that is as flat as possible.\n\n    This function will attempt to flatten `space` into a single :class:`Box` space.\n    However, this might not be possible when `space` is an instance of :class:`Graph`,\n    :class:`Sequence` or a compound space that contains a :class:`Graph` or :class:`Sequence` space.\n    This is equivalent to :func:`flatten`, but operates on the space itself. The\n    result for non-graph spaces is always a `Box` with flat boundaries, while\n    the result for graph spaces is always a `Graph` with `node_space` being a `Box`\n    with flat boundaries and `edge_space` being a `Box` with flat boundaries or\n    `None`. The box has exactly :func:`flatdim` dimensions. Flattening a sample\n    of the original space has the same effect as taking a sample of the flattened\n    space.\n\n    Example::\n\n        >>> box = Box(0.0, 1.0, shape=(3, 4, 5))\n        >>> box\n        Box(3, 4, 5)\n        >>> flatten_space(box)\n        Box(60,)\n        >>> flatten(box, box.sample()) in flatten_space(box)\n        True\n\n    Example that flattens a discrete space::\n\n        >>> discrete = Discrete(5)\n        >>> flatten_space(discrete)\n        Box(5,)\n        >>> flatten(discrete, discrete.sample()) in flatten_space(discrete)\n        True\n\n    Example that recursively flattens a dict::\n\n        >>> space = Dict({\"position\": Discrete(2), \"velocity\": Box(0, 1, shape=(2, 2))})\n        >>> flatten_space(space)\n        Box(6,)\n        >>> flatten(space, space.sample()) in flatten_space(space)\n        True\n\n\n    Example that flattens a graph::\n\n        >>> space = Graph(node_space=Box(low=-100, high=100, shape=(3, 4)), edge_space=Discrete(5))\n        >>> flatten_space(space)\n        Graph(Box(-100.0, 100.0, (12,), float32), Box(0, 1, (5,), 
int64))\n        >>> flatten(space, space.sample()) in flatten_space(space)\n        True\n\n    Args:\n        space: The space to flatten\n\n    Returns:\n        A flattened Box\n\n    Raises:\n        NotImplementedError: if the space is not defined in ``gym.spaces``.\n    \"\"\"\n    raise NotImplementedError(f\"Unknown space: `{space}`\")\n\n\n@flatten_space.register(Box)\ndef _flatten_space_box(space: Box) -> Box:\n    return Box(space.low.flatten(), space.high.flatten(), dtype=space.dtype)\n\n\n@flatten_space.register(Discrete)\n@flatten_space.register(MultiBinary)\n@flatten_space.register(MultiDiscrete)\ndef _flatten_space_binary(space: Union[Discrete, MultiBinary, MultiDiscrete]) -> Box:\n    return Box(low=0, high=1, shape=(flatdim(space),), dtype=space.dtype)\n\n\n@flatten_space.register(Tuple)\ndef _flatten_space_tuple(space: Tuple) -> Union[Box, Tuple]:\n    if space.is_np_flattenable:\n        space_list = [flatten_space(s) for s in space.spaces]\n        return Box(\n            low=np.concatenate([s.low for s in space_list]),\n            high=np.concatenate([s.high for s in space_list]),\n            dtype=np.result_type(*[s.dtype for s in space_list]),\n        )\n    return Tuple(spaces=[flatten_space(s) for s in space.spaces])\n\n\n@flatten_space.register(Dict)\ndef _flatten_space_dict(space: Dict) -> Union[Box, Dict]:\n    if space.is_np_flattenable:\n        space_list = [flatten_space(s) for s in space.spaces.values()]\n        return Box(\n            low=np.concatenate([s.low for s in space_list]),\n            high=np.concatenate([s.high for s in space_list]),\n            dtype=np.result_type(*[s.dtype for s in space_list]),\n        )\n    return Dict(\n        spaces=OrderedDict(\n            (key, flatten_space(space)) for key, space in space.spaces.items()\n        )\n    )\n\n\n@flatten_space.register(Graph)\ndef _flatten_space_graph(space: Graph) -> Graph:\n    return Graph(\n        node_space=flatten_space(space.node_space),\n    
    edge_space=flatten_space(space.edge_space)\n        if space.edge_space is not None\n        else None,\n    )\n\n\n@flatten_space.register(Text)\ndef _flatten_space_text(space: Text) -> Box:\n    return Box(\n        low=0, high=len(space.character_set), shape=(space.max_length,), dtype=np.int32\n    )\n\n\n@flatten_space.register(Sequence)\ndef _flatten_space_sequence(space: Sequence) -> Sequence:\n    return Sequence(flatten_space(space.feature_space))\n"
  },
  {
    "path": "gym/utils/__init__.py",
    "content": "\"\"\"A set of common utilities used within the environments.\n\nThese are not intended as API functions, and will not remain stable over time.\n\"\"\"\n\n# These submodules should not have any import-time dependencies.\n# We want this since we use `utils` during our import-time sanity checks\n# that verify that our dependencies are actually present.\nfrom gym.utils.colorize import colorize\nfrom gym.utils.ezpickle import EzPickle\n"
  },
  {
    "path": "gym/utils/colorize.py",
    "content": "\"\"\"A set of common utilities used within the environments.\n\nThese are not intended as API functions, and will not remain stable over time.\n\"\"\"\n\ncolor2num = dict(\n    gray=30,\n    red=31,\n    green=32,\n    yellow=33,\n    blue=34,\n    magenta=35,\n    cyan=36,\n    white=37,\n    crimson=38,\n)\n\n\ndef colorize(\n    string: str, color: str, bold: bool = False, highlight: bool = False\n) -> str:\n    \"\"\"Returns string surrounded by appropriate terminal colour codes to print colourised text.\n\n    Args:\n        string: The message to colourise\n        color: Literal values are gray, red, green, yellow, blue, magenta, cyan, white, crimson\n        bold: If to bold the string\n        highlight: If to highlight the string\n\n    Returns:\n        Colourised string\n    \"\"\"\n    attr = []\n    num = color2num[color]\n    if highlight:\n        num += 10\n    attr.append(str(num))\n    if bold:\n        attr.append(\"1\")\n    attrs = \";\".join(attr)\n    return f\"\\x1b[{attrs}m{string}\\x1b[0m\"\n"
  },
  {
    "path": "gym/utils/env_checker.py",
    "content": "\"\"\"A set of functions for checking an environment details.\n\nThis file is originally from the Stable Baselines3 repository hosted on GitHub\n(https://github.com/DLR-RM/stable-baselines3/)\nOriginal Author: Antonin Raffin\n\nIt also uses some warnings/assertions from the PettingZoo repository hosted on GitHub\n(https://github.com/PettingZoo-Team/PettingZoo)\nOriginal Author: J K Terry\n\nThis was rewritten and split into \"env_checker.py\" and \"passive_env_checker.py\" for invasive and passive environment checking\nOriginal Author: Mark Towers\n\nThese projects are covered by the MIT License.\n\"\"\"\n\nimport inspect\nfrom copy import deepcopy\n\nimport numpy as np\n\nimport gym\nfrom gym import logger, spaces\nfrom gym.utils.passive_env_checker import (\n    check_action_space,\n    check_observation_space,\n    env_render_passive_checker,\n    env_reset_passive_checker,\n    env_step_passive_checker,\n)\n\n\ndef data_equivalence(data_1, data_2) -> bool:\n    \"\"\"Assert equality between data 1 and 2, i.e observations, actions, info.\n\n    Args:\n        data_1: data structure 1\n        data_2: data structure 2\n\n    Returns:\n        If observation 1 and 2 are equivalent\n    \"\"\"\n    if type(data_1) == type(data_2):\n        if isinstance(data_1, dict):\n            return data_1.keys() == data_2.keys() and all(\n                data_equivalence(data_1[k], data_2[k]) for k in data_1.keys()\n            )\n        elif isinstance(data_1, (tuple, list)):\n            return len(data_1) == len(data_2) and all(\n                data_equivalence(o_1, o_2) for o_1, o_2 in zip(data_1, data_2)\n            )\n        elif isinstance(data_1, np.ndarray):\n            return data_1.shape == data_2.shape and np.allclose(\n                data_1, data_2, atol=0.00001\n            )\n        else:\n            return data_1 == data_2\n    else:\n        return False\n\n\ndef check_reset_seed(env: gym.Env):\n    \"\"\"Check that the environment can 
be reset with a seed.\n\n    Args:\n        env: The environment to check\n\n    Raises:\n        AssertionError: The environment cannot be reset with a random seed,\n            even though `seed` or `kwargs` appear in the signature.\n    \"\"\"\n    signature = inspect.signature(env.reset)\n    if \"seed\" in signature.parameters or (\n        \"kwargs\" in signature.parameters\n        and signature.parameters[\"kwargs\"].kind is inspect.Parameter.VAR_KEYWORD\n    ):\n        try:\n            obs_1, info = env.reset(seed=123)\n            assert (\n                obs_1 in env.observation_space\n            ), \"The observation returned by `env.reset(seed=123)` is not within the observation space.\"\n            assert (\n                env.unwrapped._np_random  # pyright: ignore [reportPrivateUsage]\n                is not None\n            ), \"Expects the random number generator to have been generated given a seed was passed to reset. Most likely the environment reset function does not call `super().reset(seed=seed)`.\"\n            seed_123_rng = deepcopy(\n                env.unwrapped._np_random  # pyright: ignore [reportPrivateUsage]\n            )\n\n            obs_2, info = env.reset(seed=123)\n            assert (\n                obs_2 in env.observation_space\n            ), \"The observation returned by `env.reset(seed=123)` is not within the observation space.\"\n            if env.spec is not None and env.spec.nondeterministic is False:\n                assert data_equivalence(\n                    obs_1, obs_2\n                ), \"Using `env.reset(seed=123)` is non-deterministic as the observations are not equivalent.\"\n            assert (\n                env.unwrapped._np_random.bit_generator.state  # pyright: ignore [reportPrivateUsage]\n                == seed_123_rng.bit_generator.state\n            ), \"Most likely the environment reset function does not call `super().reset(seed=seed)` as the random number generators are not the same when the 
same seeds are passed to `env.reset`.\"\n\n            obs_3, info = env.reset(seed=456)\n            assert (\n                obs_3 in env.observation_space\n            ), \"The observation returned by `env.reset(seed=456)` is not within the observation space.\"\n            assert (\n                env.unwrapped._np_random.bit_generator.state  # pyright: ignore [reportPrivateUsage]\n                != seed_123_rng.bit_generator.state\n            ), \"Most likely the environment reset function does not call `super().reset(seed=seed)` as the random number generators are not different when different seeds are passed to `env.reset`.\"\n\n        except TypeError as e:\n            raise AssertionError(\n                \"The environment cannot be reset with a random seed, even though `seed` or `kwargs` appear in the signature. \"\n                f\"This should never happen, please report this issue. The error was: {e}\"\n            )\n\n        seed_param = signature.parameters.get(\"seed\")\n        # Check the default value is None\n        if seed_param is not None and seed_param.default is not None:\n            logger.warn(\n                \"The default seed argument in reset should be `None`, otherwise the environment will by default always be deterministic. 
\"\n                f\"Actual default: {seed_param.default}\"\n            )\n    else:\n        raise gym.error.Error(\n            \"The `reset` method does not provide a `seed` or `**kwargs` keyword argument.\"\n        )\n\n\ndef check_reset_options(env: gym.Env):\n    \"\"\"Check that the environment can be reset with options.\n\n    Args:\n        env: The environment to check\n\n    Raises:\n        AssertionError: The environment cannot be reset with options,\n            even though `options` or `kwargs` appear in the signature.\n    \"\"\"\n    signature = inspect.signature(env.reset)\n    if \"options\" in signature.parameters or (\n        \"kwargs\" in signature.parameters\n        and signature.parameters[\"kwargs\"].kind is inspect.Parameter.VAR_KEYWORD\n    ):\n        try:\n            env.reset(options={})\n        except TypeError as e:\n            raise AssertionError(\n                \"The environment cannot be reset with options, even though `options` or `**kwargs` appear in the signature. \"\n                f\"This should never happen, please report this issue. The error was: {e}\"\n            )\n    else:\n        raise gym.error.Error(\n            \"The `reset` method does not provide an `options` or `**kwargs` keyword argument.\"\n        )\n\n\ndef check_reset_return_info_deprecation(env: gym.Env):\n    \"\"\"Makes sure support for deprecated `return_info` argument is dropped.\n\n    Args:\n        env: The environment to check\n    Raises:\n        UserWarning\n    \"\"\"\n    signature = inspect.signature(env.reset)\n    if \"return_info\" in signature.parameters:\n        logger.warn(\n            \"`return_info` is deprecated as an optional argument to `reset`. 
`reset` \"\n            \"should now always return `obs, info` where `obs` is an observation, and `info` is a dictionary \"\n            \"containing additional information.\"\n        )\n\n\ndef check_seed_deprecation(env: gym.Env):\n    \"\"\"Makes sure support for deprecated function `seed` is dropped.\n\n    Args:\n        env: The environment to check\n    Raises:\n        UserWarning\n    \"\"\"\n    seed_fn = getattr(env, \"seed\", None)\n    if callable(seed_fn):\n        logger.warn(\n            \"Official support for the `seed` function is dropped. \"\n            \"Standard practice is to reset gym environments using `env.reset(seed=<desired seed>)`\"\n        )\n\n\ndef check_reset_return_type(env: gym.Env):\n    \"\"\"Checks that :meth:`reset` correctly returns a tuple of the form `(obs, info)`.\n\n    Args:\n        env: The environment to check\n    Raises:\n        AssertionError depending on spec violation\n    \"\"\"\n    result = env.reset()\n    assert isinstance(\n        result, tuple\n    ), f\"The result returned by `env.reset()` was not a tuple of the form `(obs, info)`, where `obs` is an observation and `info` is a dictionary containing additional information. 
Actual type: `{type(result)}`\"\n    assert (\n        len(result) == 2\n    ), f\"Calling the reset method did not return a 2-tuple, actual length: {len(result)}\"\n\n    obs, info = result\n    assert (\n        obs in env.observation_space\n    ), \"The first element returned by `env.reset()` is not within the observation space.\"\n    assert isinstance(\n        info, dict\n    ), f\"The second element returned by `env.reset()` was not a dictionary, actual type: {type(info)}\"\n\n\ndef check_space_limit(space, space_type: str):\n    \"\"\"Check the space limit for only the Box space as a test that only runs as part of `check_env`.\"\"\"\n    if isinstance(space, spaces.Box):\n        if np.any(np.equal(space.low, -np.inf)):\n            logger.warn(\n                f\"A Box {space_type} space minimum value is -infinity. This is probably too low.\"\n            )\n        if np.any(np.equal(space.high, np.inf)):\n            logger.warn(\n                f\"A Box {space_type} space maximum value is infinity. This is probably too high.\"\n            )\n\n        # Check that the Box space is normalized\n        if space_type == \"action\":\n            if len(space.shape) == 1:  # for vector boxes\n                if (\n                    np.any(\n                        np.logical_and(\n                            space.low != np.zeros_like(space.low),\n                            np.abs(space.low) != np.abs(space.high),\n                        )\n                    )\n                    or np.any(space.low < -1)\n                    or np.any(space.high > 1)\n                ):\n                    # todo - Add to gymlibrary.ml?\n                    logger.warn(\n                        \"For Box action spaces, we recommend using a symmetric and normalized space (range=[-1, 1] or [0, 1]). 
\"\n                        \"See https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html for more information.\"\n                    )\n    elif isinstance(space, spaces.Tuple):\n        for subspace in space.spaces:\n            check_space_limit(subspace, space_type)\n    elif isinstance(space, spaces.Dict):\n        for subspace in space.values():\n            check_space_limit(subspace, space_type)\n\n\ndef check_env(env: gym.Env, warn: bool = None, skip_render_check: bool = False):\n    \"\"\"Check that an environment follows the Gym API.\n\n    This is an invasive function that calls the environment's reset and step.\n\n    This is particularly useful when using a custom environment.\n    Please take a look at https://www.gymlibrary.dev/content/environment_creation/\n    for more information about the API.\n\n    Args:\n        env: The Gym environment that will be checked\n        warn: Ignored\n        skip_render_check: Whether to skip the checks for the render method. False by default (set to True to skip them, e.g. in CI)\n    \"\"\"\n    if warn is not None:\n        logger.warn(\"`check_env(warn=...)` parameter is now ignored.\")\n\n    assert isinstance(\n        env, gym.Env\n    ), \"The environment must inherit from the gym.Env class. See https://www.gymlibrary.dev/content/environment_creation/ for more info.\"\n\n    if env.unwrapped is not env:\n        logger.warn(\n            f\"The environment ({env}) is different from the unwrapped version ({env.unwrapped}). This could affect the environment checker as the environment most likely has a wrapper applied to it. We recommend using the raw environment for `check_env` using `env.unwrapped`.\"\n        )\n\n    # ============= Check the spaces (observation and action) ================\n    assert hasattr(\n        env, \"action_space\"\n    ), \"The environment must specify an action space. 
See https://www.gymlibrary.dev/content/environment_creation/ for more info.\"\n    check_action_space(env.action_space)\n    check_space_limit(env.action_space, \"action\")\n\n    assert hasattr(\n        env, \"observation_space\"\n    ), \"The environment must specify an observation space. See https://www.gymlibrary.dev/content/environment_creation/ for more info.\"\n    check_observation_space(env.observation_space)\n    check_space_limit(env.observation_space, \"observation\")\n\n    # ==== Check the reset method ====\n    check_seed_deprecation(env)\n    check_reset_return_info_deprecation(env)\n    check_reset_return_type(env)\n    check_reset_seed(env)\n    check_reset_options(env)\n\n    # ============ Check the returned values ===============\n    env_reset_passive_checker(env)\n    env_step_passive_checker(env, env.action_space.sample())\n\n    # ==== Check the render method and the declared render modes ====\n    if not skip_render_check:\n        if env.render_mode is not None:\n            env_render_passive_checker(env)\n\n        # todo: recreate the environment with a different render_mode for check that each work\n"
  },
  {
    "path": "gym/utils/ezpickle.py",
    "content": "\"\"\"Class for pickling and unpickling objects via their constructor arguments.\"\"\"\n\n\nclass EzPickle:\n    \"\"\"Objects that are pickled and unpickled via their constructor arguments.\n\n    Example::\n\n        >>> class Dog(Animal, EzPickle):\n        ...    def __init__(self, furcolor, tailkind=\"bushy\"):\n        ...        Animal.__init__(self)\n        ...        EzPickle.__init__(self, furcolor, tailkind)\n\n    When this object is unpickled, a new ``Dog`` will be constructed by passing the provided furcolor and tailkind into the constructor.\n    However, philosophers are still not sure whether it is still the same dog.\n\n    This is generally needed only for environments which wrap C/C++ code, such as MuJoCo and Atari.\n    \"\"\"\n\n    def __init__(self, *args, **kwargs):\n        \"\"\"Uses the ``args`` and ``kwargs`` from the object's constructor for pickling.\"\"\"\n        self._ezpickle_args = args\n        self._ezpickle_kwargs = kwargs\n\n    def __getstate__(self):\n        \"\"\"Returns the object pickle state with args and kwargs.\"\"\"\n        return {\n            \"_ezpickle_args\": self._ezpickle_args,\n            \"_ezpickle_kwargs\": self._ezpickle_kwargs,\n        }\n\n    def __setstate__(self, d):\n        \"\"\"Sets the object pickle state using d.\"\"\"\n        out = type(self)(*d[\"_ezpickle_args\"], **d[\"_ezpickle_kwargs\"])\n        self.__dict__.update(out.__dict__)\n"
  },
  {
    "path": "gym/utils/passive_env_checker.py",
    "content": "\"\"\"A set of functions for passively checking environment implementations.\"\"\"\nimport inspect\nfrom functools import partial\nfrom typing import Callable\n\nimport numpy as np\n\nfrom gym import Space, error, logger, spaces\n\n\ndef _check_box_observation_space(observation_space: spaces.Box):\n    \"\"\"Checks that a :class:`Box` observation space is defined in a sensible way.\n\n    Args:\n        observation_space: A box observation space\n    \"\"\"\n    # Check if the box is an image\n    if len(observation_space.shape) == 3:\n        if observation_space.dtype != np.uint8:\n            logger.warn(\n                f\"It seems a Box observation space is an image but the `dtype` is not `np.uint8`, actual type: {observation_space.dtype}. \"\n                \"If the Box observation space is not an image, we recommend flattening the observation to have only a 1D vector.\"\n            )\n        if np.any(observation_space.low != 0) or np.any(observation_space.high != 255):\n            logger.warn(\n                \"It seems a Box observation space is an image but the upper and lower bounds are not in [0, 255]. \"\n                \"Generally, CNN policies assume observations are within that range, so you may encounter an issue if the observation values are not.\"\n            )\n\n    if len(observation_space.shape) not in [1, 3]:\n        logger.warn(\n            \"A Box observation space has an unconventional shape (neither an image, nor a 1D vector). \"\n            \"We recommend flattening the observation to have only a 1D vector or use a custom policy to properly process the data. 
\"\n            f\"Actual observation shape: {observation_space.shape}\"\n        )\n\n    assert (\n        observation_space.low.shape == observation_space.shape\n    ), f\"The Box observation space shape and low shape have different shapes, low shape: {observation_space.low.shape}, box shape: {observation_space.shape}\"\n    assert (\n        observation_space.high.shape == observation_space.shape\n    ), f\"The Box observation space shape and high shape have have different shapes, high shape: {observation_space.high.shape}, box shape: {observation_space.shape}\"\n\n    if np.any(observation_space.low == observation_space.high):\n        logger.warn(\"A Box observation space maximum and minimum values are equal.\")\n    elif np.any(observation_space.high < observation_space.low):\n        logger.warn(\"A Box observation space low value is greater than a high value.\")\n\n\ndef _check_box_action_space(action_space: spaces.Box):\n    \"\"\"Checks that a :class:`Box` action space is defined in a sensible way.\n\n    Args:\n        action_space: A box action space\n    \"\"\"\n    assert (\n        action_space.low.shape == action_space.shape\n    ), f\"The Box action space shape and low shape have have different shapes, low shape: {action_space.low.shape}, box shape: {action_space.shape}\"\n    assert (\n        action_space.high.shape == action_space.shape\n    ), f\"The Box action space shape and high shape have different shapes, high shape: {action_space.high.shape}, box shape: {action_space.shape}\"\n\n    if np.any(action_space.low == action_space.high):\n        logger.warn(\"A Box action space maximum and minimum values are equal.\")\n    elif np.any(action_space.high < action_space.low):\n        logger.warn(\"A Box action space low value is greater than a high value.\")\n\n\ndef check_space(\n    space: Space, space_type: str, check_box_space_fn: Callable[[spaces.Box], None]\n):\n    \"\"\"A passive check of the environment action space that should not 
affect the environment.\"\"\"\n    if not isinstance(space, spaces.Space):\n        raise AssertionError(\n            f\"{space_type} space does not inherit from `gym.spaces.Space`, actual type: {type(space)}\"\n        )\n\n    elif isinstance(space, spaces.Box):\n        check_box_space_fn(space)\n    elif isinstance(space, spaces.Discrete):\n        assert (\n            0 < space.n\n        ), f\"Discrete {space_type} space's number of elements must be positive, actual number of elements: {space.n}\"\n        assert (\n            space.shape == ()\n        ), f\"Discrete {space_type} space's shape should be empty, actual shape: {space.shape}\"\n    elif isinstance(space, spaces.MultiDiscrete):\n        assert (\n            space.shape == space.nvec.shape\n        ), f\"Multi-discrete {space_type} space's shape must be equal to the nvec shape, space shape: {space.shape}, nvec shape: {space.nvec.shape}\"\n        assert np.all(\n            0 < space.nvec\n        ), f\"Multi-discrete {space_type} space's all nvec elements must be greater than 0, actual nvec: {space.nvec}\"\n    elif isinstance(space, spaces.MultiBinary):\n        assert np.all(\n            0 < np.asarray(space.shape)\n        ), f\"Multi-binary {space_type} space's all shape elements must be greater than 0, actual shape: {space.shape}\"\n    elif isinstance(space, spaces.Tuple):\n        assert 0 < len(\n            space.spaces\n        ), f\"An empty Tuple {space_type} space is not allowed.\"\n        for subspace in space.spaces:\n            check_space(subspace, space_type, check_box_space_fn)\n    elif isinstance(space, spaces.Dict):\n        assert 0 < len(\n            space.spaces.keys()\n        ), f\"An empty Dict {space_type} space is not allowed.\"\n        for subspace in space.values():\n            check_space(subspace, space_type, check_box_space_fn)\n\n\ncheck_observation_space = partial(\n    check_space,\n    space_type=\"observation\",\n    
check_box_space_fn=_check_box_observation_space,\n)\ncheck_action_space = partial(\n    check_space, space_type=\"action\", check_box_space_fn=_check_box_action_space\n)\n\n\ndef check_obs(obs, observation_space: spaces.Space, method_name: str):\n    \"\"\"Check that the observation returned by the environment correspond to the declared one.\n\n    Args:\n        obs: The observation to check\n        observation_space: The observation space of the observation\n        method_name: The method name that generated the observation\n    \"\"\"\n    pre = f\"The obs returned by the `{method_name}()` method\"\n    if isinstance(observation_space, spaces.Discrete):\n        if not isinstance(obs, (np.int64, int)):\n            logger.warn(f\"{pre} should be an int or np.int64, actual type: {type(obs)}\")\n    elif isinstance(observation_space, spaces.Box):\n        if observation_space.shape != ():\n            if not isinstance(obs, np.ndarray):\n                logger.warn(\n                    f\"{pre} was expecting a numpy array, actual type: {type(obs)}\"\n                )\n            elif obs.dtype != observation_space.dtype:\n                logger.warn(\n                    f\"{pre} was expecting numpy array dtype to be {observation_space.dtype}, actual type: {obs.dtype}\"\n                )\n    elif isinstance(observation_space, (spaces.MultiBinary, spaces.MultiDiscrete)):\n        if not isinstance(obs, np.ndarray):\n            logger.warn(f\"{pre} was expecting a numpy array, actual type: {type(obs)}\")\n    elif isinstance(observation_space, spaces.Tuple):\n        if not isinstance(obs, tuple):\n            logger.warn(f\"{pre} was expecting a tuple, actual type: {type(obs)}\")\n        assert len(obs) == len(\n            observation_space.spaces\n        ), f\"{pre} length is not same as the observation space length, obs length: {len(obs)}, space length: {len(observation_space.spaces)}\"\n        for sub_obs, sub_space in zip(obs, 
observation_space.spaces):\n            check_obs(sub_obs, sub_space, method_name)\n    elif isinstance(observation_space, spaces.Dict):\n        assert isinstance(obs, dict), f\"{pre} must be a dict, actual type: {type(obs)}\"\n        assert (\n            obs.keys() == observation_space.spaces.keys()\n        ), f\"{pre} observation keys are not the same as the observation space keys, obs keys: {list(obs.keys())}, space keys: {list(observation_space.spaces.keys())}\"\n        for space_key in observation_space.spaces.keys():\n            check_obs(obs[space_key], observation_space[space_key], method_name)\n\n    try:\n        if obs not in observation_space:\n            logger.warn(f\"{pre} is not within the observation space.\")\n    except Exception as e:\n        logger.warn(f\"{pre} is not within the observation space with exception: {e}\")\n\n\ndef env_reset_passive_checker(env, **kwargs):\n    \"\"\"A passive check of the `Env.reset` function investigating the returning reset information and returning the data unchanged.\"\"\"\n    signature = inspect.signature(env.reset)\n    if \"seed\" not in signature.parameters and \"kwargs\" not in signature.parameters:\n        logger.warn(\n            \"Future gym versions will require that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator.\"\n        )\n    else:\n        seed_param = signature.parameters.get(\"seed\")\n        # Check the default value is None\n        if seed_param is not None and seed_param.default is not None:\n            logger.warn(\n                \"The default seed argument in `Env.reset` should be `None`, otherwise the environment will by default always be deterministic. 
\"\n                f\"Actual default: {seed_param.default}\"\n            )\n\n    if \"options\" not in signature.parameters and \"kwargs\" not in signature.parameters:\n        logger.warn(\n            \"Future gym versions will require that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information.\"\n        )\n\n    # Checks the result of env.reset with kwargs\n    result = env.reset(**kwargs)\n\n    if not isinstance(result, tuple):\n        logger.warn(\n            f\"The result returned by `env.reset()` was not a tuple of the form `(obs, info)`, where `obs` is an observation and `info` is a dictionary containing additional information. Actual type: `{type(result)}`\"\n        )\n    elif len(result) != 2:\n        logger.warn(\n            \"The result returned by `env.reset()` should be `(obs, info)` by default, where `obs` is an observation and `info` is a dictionary containing additional information.\"\n        )\n    else:\n        obs, info = result\n        check_obs(obs, env.observation_space, \"reset\")\n        assert isinstance(\n            info, dict\n        ), f\"The second element returned by `env.reset()` was not a dictionary, actual type: {type(info)}\"\n    return result\n\n\ndef env_step_passive_checker(env, action):\n    \"\"\"A passive check for the environment step, investigating the returning data then returning the data unchanged.\"\"\"\n    # We don't check the action as for some environments then out-of-bounds values can be given\n    result = env.step(action)\n    assert isinstance(\n        result, tuple\n    ), f\"Expects step result to be a tuple, actual type: {type(result)}\"\n    if len(result) == 4:\n        logger.deprecation(\n            \"Core environment is written in old step API which returns one bool instead of two. \"\n            \"It is recommended to rewrite the environment with new step API. 
\"\n        )\n        obs, reward, done, info = result\n\n        if not isinstance(done, (bool, np.bool8)):\n            logger.warn(\n                f\"Expects `done` signal to be a boolean, actual type: {type(done)}\"\n            )\n    elif len(result) == 5:\n        obs, reward, terminated, truncated, info = result\n\n        # np.bool is actual python bool not np boolean type, therefore bool_ or bool8\n        if not isinstance(terminated, (bool, np.bool8)):\n            logger.warn(\n                f\"Expects `terminated` signal to be a boolean, actual type: {type(terminated)}\"\n            )\n        if not isinstance(truncated, (bool, np.bool8)):\n            logger.warn(\n                f\"Expects `truncated` signal to be a boolean, actual type: {type(truncated)}\"\n            )\n    else:\n        raise error.Error(\n            f\"Expected `Env.step` to return a four or five element tuple, actual number of elements returned: {len(result)}.\"\n        )\n\n    check_obs(obs, env.observation_space, \"step\")\n\n    if not (\n        np.issubdtype(type(reward), np.integer)\n        or np.issubdtype(type(reward), np.floating)\n    ):\n        logger.warn(\n            f\"The reward returned by `step()` must be a float, int, np.integer or np.floating, actual type: {type(reward)}\"\n        )\n    else:\n        if np.isnan(reward):\n            logger.warn(\"The reward is a NaN value.\")\n        if np.isinf(reward):\n            logger.warn(\"The reward is an inf value.\")\n\n    assert isinstance(\n        info, dict\n    ), f\"The `info` returned by `step()` must be a python dictionary, actual type: {type(info)}\"\n\n    return result\n\n\ndef env_render_passive_checker(env, *args, **kwargs):\n    \"\"\"A passive check of the `Env.render` that the declared render modes/fps in the metadata of the environment is declared.\"\"\"\n    render_modes = env.metadata.get(\"render_modes\")\n    if render_modes is None:\n        logger.warn(\n            \"No 
render modes were declared in the environment (env.metadata['render_modes'] is None or not defined), you may have trouble when calling `.render()`.\"\n        )\n    else:\n        if not isinstance(render_modes, (list, tuple)):\n            logger.warn(\n                f\"Expects the render_modes to be a sequence (e.g. list, tuple), actual type: {type(render_modes)}\"\n            )\n        elif not all(isinstance(mode, str) for mode in render_modes):\n            logger.warn(\n                f\"Expects all render modes to be strings, actual types: {[type(mode) for mode in render_modes]}\"\n            )\n\n        render_fps = env.metadata.get(\"render_fps\")\n        # We only require `render_fps` if rendering is actually implemented\n        if len(render_modes) > 0:\n            if render_fps is None:\n                logger.warn(\n                    \"No render fps was declared in the environment (env.metadata['render_fps'] is None or not defined), rendering may occur at inconsistent fps.\"\n                )\n            else:\n                if not (\n                    np.issubdtype(type(render_fps), np.integer)\n                    or np.issubdtype(type(render_fps), np.floating)\n                ):\n                    logger.warn(\n                        f\"Expects the `env.metadata['render_fps']` to be an integer or a float, actual type: {type(render_fps)}\"\n                    )\n                else:\n                    assert (\n                        render_fps > 0\n                    ), f\"Expects the `env.metadata['render_fps']` to be greater than zero, actual value: {render_fps}\"\n\n        # env.render_mode is now an attribute with default None\n        if len(render_modes) == 0:\n            assert (\n                env.render_mode is None\n            ), f\"With no render_modes, expects the Env.render_mode to be None, actual value: {env.render_mode}\"\n        else:\n            assert env.render_mode is None or env.render_mode in 
render_modes, (\n                \"The environment was initialized successfully however with an unsupported render mode. \"\n                f\"Render mode: {env.render_mode}, modes: {render_modes}\"\n            )\n\n    result = env.render(*args, **kwargs)\n\n    # TODO: Check that the result is correct\n\n    return result\n"
  },
  {
    "path": "gym/utils/play.py",
    "content": "\"\"\"Utilities of visualising an environment.\"\"\"\nfrom collections import deque\nfrom typing import Callable, Dict, List, Optional, Tuple, Union\n\nimport numpy as np\n\nimport gym.error\nfrom gym import Env, logger\nfrom gym.core import ActType, ObsType\nfrom gym.error import DependencyNotInstalled\nfrom gym.logger import deprecation\n\ntry:\n    import pygame\n    from pygame import Surface\n    from pygame.event import Event\n    from pygame.locals import VIDEORESIZE\nexcept ImportError:\n    raise gym.error.DependencyNotInstalled(\n        \"Pygame is not installed, run `pip install gym[classic_control]`\"\n    )\n\ntry:\n    import matplotlib\n\n    matplotlib.use(\"TkAgg\")\n    import matplotlib.pyplot as plt\nexcept ImportError:\n    logger.warn(\"Matplotlib is not installed, run `pip install gym[other]`\")\n    matplotlib, plt = None, None\n\n\nclass MissingKeysToAction(Exception):\n    \"\"\"Raised when the environment does not have a default ``keys_to_action`` mapping.\"\"\"\n\n\nclass PlayableGame:\n    \"\"\"Wraps an environment allowing keyboard inputs to interact with the environment.\"\"\"\n\n    def __init__(\n        self,\n        env: Env,\n        keys_to_action: Optional[Dict[Tuple[int, ...], int]] = None,\n        zoom: Optional[float] = None,\n    ):\n        \"\"\"Wraps an environment with a dictionary of keyboard buttons to action and if to zoom in on the environment.\n\n        Args:\n            env: The environment to play\n            keys_to_action: The dictionary of keyboard tuples and action value\n            zoom: If to zoom in on the environment render\n        \"\"\"\n        if env.render_mode not in {\"rgb_array\", \"rgb_array_list\"}:\n            logger.error(\n                \"PlayableGame wrapper works only with rgb_array and rgb_array_list render modes, \"\n                f\"but your environment render_mode = {env.render_mode}.\"\n            )\n\n        self.env = env\n        self.relevant_keys = 
self._get_relevant_keys(keys_to_action)\n        self.video_size = self._get_video_size(zoom)\n        self.screen = pygame.display.set_mode(self.video_size)\n        self.pressed_keys = []\n        self.running = True\n\n    def _get_relevant_keys(\n        self, keys_to_action: Optional[Dict[Tuple[int], int]] = None\n    ) -> set:\n        if keys_to_action is None:\n            if hasattr(self.env, \"get_keys_to_action\"):\n                keys_to_action = self.env.get_keys_to_action()\n            elif hasattr(self.env.unwrapped, \"get_keys_to_action\"):\n                keys_to_action = self.env.unwrapped.get_keys_to_action()\n            else:\n                raise MissingKeysToAction(\n                    f\"{self.env.spec.id} does not have explicit key to action mapping, \"\n                    \"please specify one manually\"\n                )\n        assert isinstance(keys_to_action, dict)\n        relevant_keys = set(sum((list(k) for k in keys_to_action.keys()), []))\n        return relevant_keys\n\n    def _get_video_size(self, zoom: Optional[float] = None) -> Tuple[int, int]:\n        rendered = self.env.render()\n        if isinstance(rendered, List):\n            rendered = rendered[-1]\n        assert rendered is not None and isinstance(rendered, np.ndarray)\n        video_size = (rendered.shape[1], rendered.shape[0])\n\n        if zoom is not None:\n            video_size = (int(video_size[0] * zoom), int(video_size[1] * zoom))\n\n        return video_size\n\n    def process_event(self, event: Event):\n        \"\"\"Processes a PyGame event.\n\n        In particular, this function is used to keep track of which buttons are currently pressed\n        and to exit the :func:`play` function when the PyGame window is closed.\n\n        Args:\n            event: The event to process\n        \"\"\"\n        if event.type == pygame.KEYDOWN:\n            if event.key in self.relevant_keys:\n                self.pressed_keys.append(event.key)\n            
elif event.key == pygame.K_ESCAPE:\n                self.running = False\n        elif event.type == pygame.KEYUP:\n            if event.key in self.relevant_keys:\n                self.pressed_keys.remove(event.key)\n        elif event.type == pygame.QUIT:\n            self.running = False\n        elif event.type == VIDEORESIZE:\n            self.video_size = event.size\n            self.screen = pygame.display.set_mode(self.video_size)\n\n\ndef display_arr(\n    screen: Surface, arr: np.ndarray, video_size: Tuple[int, int], transpose: bool\n):\n    \"\"\"Displays a numpy array on screen.\n\n    Args:\n        screen: The screen to show the array on\n        arr: The array to show\n        video_size: The video size of the screen\n        transpose: Whether to transpose the array on the screen\n    \"\"\"\n    arr_min, arr_max = np.min(arr), np.max(arr)\n    arr = 255.0 * (arr - arr_min) / (arr_max - arr_min)\n    pyg_img = pygame.surfarray.make_surface(arr.swapaxes(0, 1) if transpose else arr)\n    pyg_img = pygame.transform.scale(pyg_img, video_size)\n    screen.blit(pyg_img, (0, 0))\n\n\ndef play(\n    env: Env,\n    transpose: Optional[bool] = True,\n    fps: Optional[int] = None,\n    zoom: Optional[float] = None,\n    callback: Optional[Callable] = None,\n    keys_to_action: Optional[Dict[Union[Tuple[Union[str, int]], str], ActType]] = None,\n    seed: Optional[int] = None,\n    noop: ActType = 0,\n):\n    \"\"\"Allows one to play the game using the keyboard.\n\n    Example::\n\n        >>> import gym\n        >>> import numpy as np\n        >>> from gym.utils.play import play\n        >>> play(gym.make(\"CarRacing-v1\", render_mode=\"rgb_array\"), keys_to_action={\n        ...                                                \"w\": np.array([0, 0.7, 0]),\n        ...                                                \"a\": np.array([-1, 0, 0]),\n        ...                                                \"s\": np.array([0, 0, 1]),\n        ...                                                
\"d\": np.array([1, 0, 0]),\n        ...                                                \"wa\": np.array([-1, 0.7, 0]),\n        ...                                                \"dw\": np.array([1, 0.7, 0]),\n        ...                                                \"ds\": np.array([1, 0, 1]),\n        ...                                                \"as\": np.array([-1, 0, 1]),\n        ...                                               }, noop=np.array([0,0,0]))\n\n\n    The code above also works if the environment is wrapped, so it is particularly useful for\n    verifying that the frame-level preprocessing does not render the game\n    unplayable.\n\n    If you wish to plot real time statistics as you play, you can use\n    :class:`gym.utils.play.PlayPlot`. Here is sample code for plotting the reward\n    over the last 150 steps.\n\n        >>> def callback(obs_t, obs_tp1, action, rew, terminated, truncated, info):\n        ...        return [rew,]\n        >>> plotter = PlayPlot(callback, 150, [\"reward\"])\n        >>> play(gym.make(\"ALE/AirRaid-v5\"), callback=plotter.callback)\n\n\n    Args:\n        env: Environment to use for playing.\n        transpose: If this is ``True``, the output of observation is transposed. Defaults to ``True``.\n        fps: Maximum number of steps of the environment executed every second. If ``None`` (the default),\n            ``env.metadata[\"render_fps\"]`` (or 30, if the environment does not specify \"render_fps\") is used.\n        zoom: Zoom the observation in by a factor of ``zoom``, which should be a positive float\n        callback: If a callback is provided, it will be executed after every step. 
It takes the following inputs:\n                obs_t: observation before performing action\n                obs_tp1: observation after performing action\n                action: action that was executed\n                rew: reward that was received\n                terminated: whether the environment is terminated or not\n                truncated: whether the environment is truncated or not\n                info: debug info\n        keys_to_action: Mapping from keys pressed to action performed.\n            Different formats are supported: Key combinations can either be expressed as a tuple of unicode code\n            points of the keys, as a tuple of characters, or as a string where each character of the string represents\n            one key.\n            For example, if pressing 'w' and space at the same time is supposed\n            to trigger action number 2, then the ``keys_to_action`` dict could look like this:\n                >>> {\n                ...    # ...\n                ...    (ord('w'), ord(' ')): 2\n                ...    # ...\n                ... }\n            or like this:\n                >>> {\n                ...    # ...\n                ...    (\"w\", \" \"): 2\n                ...    # ...\n                ... }\n            or like this:\n                >>> {\n                ...    # ...\n                ...    \"w \": 2\n                ...    # ...\n                ... }\n            If ``None``, the default ``keys_to_action`` mapping for that environment is used, if provided.\n        seed: Random seed used when resetting the environment. 
If None, no seed is used.\n        noop: The action used when no key input has been entered, or the entered key combination is unknown.\n    \"\"\"\n    env.reset(seed=seed)\n\n    if keys_to_action is None:\n        if hasattr(env, \"get_keys_to_action\"):\n            keys_to_action = env.get_keys_to_action()\n        elif hasattr(env.unwrapped, \"get_keys_to_action\"):\n            keys_to_action = env.unwrapped.get_keys_to_action()\n        else:\n            raise MissingKeysToAction(\n                f\"{env.spec.id} does not have explicit key to action mapping, \"\n                \"please specify one manually\"\n            )\n    assert keys_to_action is not None\n\n    key_code_to_action = {}\n    for key_combination, action in keys_to_action.items():\n        key_code = tuple(\n            sorted(ord(key) if isinstance(key, str) else key for key in key_combination)\n        )\n        key_code_to_action[key_code] = action\n\n    game = PlayableGame(env, key_code_to_action, zoom)\n\n    if fps is None:\n        fps = env.metadata.get(\"render_fps\", 30)\n\n    done, obs = True, None\n    clock = pygame.time.Clock()\n\n    while game.running:\n        if done:\n            done = False\n            obs = env.reset(seed=seed)\n        else:\n            action = key_code_to_action.get(tuple(sorted(game.pressed_keys)), noop)\n            prev_obs = obs\n            obs, rew, terminated, truncated, info = env.step(action)\n            done = terminated or truncated\n            if callback is not None:\n                callback(prev_obs, obs, action, rew, terminated, truncated, info)\n        if obs is not None:\n            rendered = env.render()\n            if isinstance(rendered, List):\n                rendered = rendered[-1]\n            assert rendered is not None and isinstance(rendered, np.ndarray)\n            display_arr(\n                game.screen, rendered, transpose=transpose, video_size=game.video_size\n            )\n\n        # process 
pygame events\n        for event in pygame.event.get():\n            game.process_event(event)\n\n        pygame.display.flip()\n        clock.tick(fps)\n    pygame.quit()\n\n\nclass PlayPlot:\n    \"\"\"Provides a callback to create live plots of arbitrary metrics when using :func:`play`.\n\n    This class is instantiated with a function that accepts information about a single environment transition:\n        - obs_t: observation before performing action\n        - obs_tp1: observation after performing action\n        - action: action that was executed\n        - rew: reward that was received\n        - terminated: whether the environment is terminated or not\n        - truncated: whether the environment is truncated or not\n        - info: debug info\n\n    It should return a list of metrics that are computed from this data.\n    For instance, the function may look like this::\n\n        >>> def compute_metrics(obs_t, obs_tp, action, reward, terminated, truncated, info):\n        ...     return [reward, info[\"cumulative_reward\"], np.linalg.norm(action)]\n\n    :class:`PlayPlot` provides the method :meth:`callback` which will pass its arguments along to that function\n    and uses the returned values to update live plots of the metrics.\n\n    Typically, this :meth:`callback` will be used in conjunction with :func:`play` to see how the metrics evolve as you play::\n\n        >>> plotter = PlayPlot(compute_metrics, horizon_timesteps=200,\n        ...                    
plot_names=[\"Immediate Rew.\", \"Cumulative Rew.\", \"Action Magnitude\"])\n        >>> play(your_env, callback=plotter.callback)\n    \"\"\"\n\n    def __init__(\n        self, callback: callable, horizon_timesteps: int, plot_names: List[str]\n    ):\n        \"\"\"Constructor of :class:`PlayPlot`.\n\n        The function ``callback`` that is passed to this constructor should return\n        a list of metrics that is of length ``len(plot_names)``.\n\n        Args:\n            callback: Function that computes metrics from environment transitions\n            horizon_timesteps: The time horizon used for the live plots\n            plot_names: List of plot titles\n\n        Raises:\n            DependencyNotInstalled: If matplotlib is not installed\n        \"\"\"\n        deprecation(\n            \"`PlayPlot` is marked as deprecated and will be removed in the near future.\"\n        )\n        self.data_callback = callback\n        self.horizon_timesteps = horizon_timesteps\n        self.plot_names = plot_names\n\n        if plt is None:\n            raise DependencyNotInstalled(\n                \"matplotlib is not installed, run `pip install gym[other]`\"\n            )\n\n        num_plots = len(self.plot_names)\n        self.fig, self.ax = plt.subplots(num_plots)\n        if num_plots == 1:\n            self.ax = [self.ax]\n        for axis, name in zip(self.ax, plot_names):\n            axis.set_title(name)\n        self.t = 0\n        self.cur_plot: List[Optional[plt.Axes]] = [None for _ in range(num_plots)]\n        self.data = [deque(maxlen=horizon_timesteps) for _ in range(num_plots)]\n\n    def callback(\n        self,\n        obs_t: ObsType,\n        obs_tp1: ObsType,\n        action: ActType,\n        rew: float,\n        terminated: bool,\n        truncated: bool,\n        info: dict,\n    ):\n        \"\"\"The callback that calls the provided data callback and adds the data to the plots.\n\n        Args:\n            obs_t: The observation at time 
step t\n            obs_tp1: The observation at time step t+1\n            action: The action\n            rew: The reward\n            terminated: If the environment is terminated\n            truncated: If the environment is truncated\n            info: The information from the environment\n        \"\"\"\n        points = self.data_callback(\n            obs_t, obs_tp1, action, rew, terminated, truncated, info\n        )\n        for point, data_series in zip(points, self.data):\n            data_series.append(point)\n        self.t += 1\n\n        xmin, xmax = max(0, self.t - self.horizon_timesteps), self.t\n\n        for i, plot in enumerate(self.cur_plot):\n            if plot is not None:\n                plot.remove()\n            self.cur_plot[i] = self.ax[i].scatter(\n                range(xmin, xmax), list(self.data[i]), c=\"blue\"\n            )\n            self.ax[i].set_xlim(xmin, xmax)\n\n        if plt is None:\n            raise DependencyNotInstalled(\n                \"matplotlib is not installed, run `pip install gym[other]`\"\n            )\n        plt.pause(0.000001)\n"
  },
  {
    "path": "gym/utils/save_video.py",
    "content": "\"\"\"Utility functions to save rendering videos.\"\"\"\nimport os\nfrom typing import Callable, Optional\n\nimport gym\nfrom gym import logger\n\ntry:\n    from moviepy.video.io.ImageSequenceClip import ImageSequenceClip\nexcept ImportError:\n    raise gym.error.DependencyNotInstalled(\n        \"MoviePy is not installed, run `pip install moviepy`\"\n    )\n\n\ndef capped_cubic_video_schedule(episode_id: int) -> bool:\n    \"\"\"The default episode trigger.\n\n    This function will trigger recordings at the episode indices 0, 1, 8, 27, ..., :math:`k^3`, ..., 729, 1000, 2000, 3000, ...\n\n    Args:\n        episode_id: The episode number\n\n    Returns:\n        Whether a recording should be triggered for this episode\n    \"\"\"\n    if episode_id < 1000:\n        return int(round(episode_id ** (1.0 / 3))) ** 3 == episode_id\n    else:\n        return episode_id % 1000 == 0\n\n\ndef save_video(\n    frames: list,\n    video_folder: str,\n    episode_trigger: Optional[Callable[[int], bool]] = None,\n    step_trigger: Optional[Callable[[int], bool]] = None,\n    video_length: Optional[int] = None,\n    name_prefix: str = \"rl-video\",\n    episode_index: int = 0,\n    step_starting_index: int = 0,\n    **kwargs,\n):\n    \"\"\"Save videos from rendering frames.\n\n    This function extracts videos from a list of rendered frames.\n\n    Args:\n        frames (List[RenderFrame]): A list of frames to compose the video.\n        video_folder (str): The folder where the recordings will be stored\n        episode_trigger: Function that accepts an integer and returns ``True`` iff a recording should be started at this episode\n        step_trigger: Function that accepts an integer and returns ``True`` iff a recording should be started at this step\n        video_length (int): The length of recorded episodes. 
If it isn't specified, the entire episode is recorded.\n            Otherwise, snippets of the specified length are captured.\n        name_prefix (str): Will be prepended to the filename of the recordings.\n        episode_index (int): The index of the current episode.\n        step_starting_index (int): The step index of the first frame.\n        **kwargs: The kwargs that will be passed to moviepy's ImageSequenceClip.\n            You need to specify either fps or duration.\n\n    Example:\n        >>> import gym\n        >>> from gym.utils.save_video import save_video\n        >>> env = gym.make(\"FrozenLake-v1\", render_mode=\"rgb_array_list\")\n        >>> env.reset()\n        >>> step_starting_index = 0\n        >>> episode_index = 0\n        >>> for step_index in range(199):\n        ...    action = env.action_space.sample()\n        ...    _, _, terminated, truncated, _ = env.step(action)\n        ...    if terminated or truncated:\n        ...       save_video(\n        ...          env.render(),\n        ...          \"videos\",\n        ...          fps=env.metadata[\"render_fps\"],\n        ...          step_starting_index=step_starting_index,\n        ...          episode_index=episode_index\n        ...       )\n        ...       step_starting_index = step_index + 1\n        ...       episode_index += 1\n        ...       
env.reset()\n        >>> env.close()\n    \"\"\"\n    if not isinstance(frames, list):\n        logger.error(f\"Expected a list of frames, got a {type(frames)} instead.\")\n    if episode_trigger is None and step_trigger is None:\n        episode_trigger = capped_cubic_video_schedule\n\n    video_folder = os.path.abspath(video_folder)\n    os.makedirs(video_folder, exist_ok=True)\n    path_prefix = f\"{video_folder}/{name_prefix}\"\n\n    if episode_trigger is not None and episode_trigger(episode_index):\n        clip = ImageSequenceClip(frames[:video_length], **kwargs)\n        clip.write_videofile(f\"{path_prefix}-episode-{episode_index}.mp4\")\n\n    if step_trigger is not None:\n        # skip the first frame since it comes from reset\n        for step_index, frame_index in enumerate(\n            range(1, len(frames)), start=step_starting_index\n        ):\n            if step_trigger(step_index):\n                end_index = (\n                    frame_index + video_length if video_length is not None else None\n                )\n                clip = ImageSequenceClip(frames[frame_index:end_index], **kwargs)\n                clip.write_videofile(f\"{path_prefix}-step-{step_index}.mp4\")\n"
  },
  {
    "path": "gym/utils/seeding.py",
    "content": "\"\"\"Set of random number generator functions: seeding, generator, hashing seeds.\"\"\"\nfrom typing import Any, Optional, Tuple\n\nimport numpy as np\n\nfrom gym import error\n\n\ndef np_random(seed: Optional[int] = None) -> Tuple[np.random.Generator, Any]:\n    \"\"\"Generates a random number generator from the seed and returns the Generator and seed.\n\n    Args:\n        seed: The seed used to create the generator\n\n    Returns:\n        The generator and resulting seed\n\n    Raises:\n        Error: Seed must be a non-negative integer or omitted\n    \"\"\"\n    if seed is not None and not (isinstance(seed, int) and 0 <= seed):\n        raise error.Error(f\"Seed must be a non-negative integer or omitted, not {seed}\")\n\n    seed_seq = np.random.SeedSequence(seed)\n    np_seed = seed_seq.entropy\n    rng = RandomNumberGenerator(np.random.PCG64(seed_seq))\n    return rng, np_seed\n\n\nRNG = RandomNumberGenerator = np.random.Generator\n"
  },
  {
    "path": "gym/utils/step_api_compatibility.py",
    "content": "\"\"\"Contains methods for step compatibility, from old-to-new and new-to-old API.\"\"\"\nfrom typing import Tuple, Union\n\nimport numpy as np\n\nfrom gym.core import ObsType\n\nDoneStepType = Tuple[\n    Union[ObsType, np.ndarray],\n    Union[float, np.ndarray],\n    Union[bool, np.ndarray],\n    Union[dict, list],\n]\n\nTerminatedTruncatedStepType = Tuple[\n    Union[ObsType, np.ndarray],\n    Union[float, np.ndarray],\n    Union[bool, np.ndarray],\n    Union[bool, np.ndarray],\n    Union[dict, list],\n]\n\n\ndef convert_to_terminated_truncated_step_api(\n    step_returns: Union[DoneStepType, TerminatedTruncatedStepType], is_vector_env=False\n) -> TerminatedTruncatedStepType:\n    \"\"\"Function to transform step returns to new step API irrespective of input API.\n\n    Args:\n        step_returns (tuple): Items returned by step(). Can be (obs, rew, done, info) or (obs, rew, terminated, truncated, info)\n        is_vector_env (bool): Whether the step_returns are from a vector environment\n    \"\"\"\n    if len(step_returns) == 5:\n        return step_returns\n    else:\n        assert len(step_returns) == 4\n        observations, rewards, dones, infos = step_returns\n\n        # Cases to handle - info single env /  info vector env (list) / info vector env (dict)\n        if is_vector_env is False:\n            truncated = infos.pop(\"TimeLimit.truncated\", False)\n            return (\n                observations,\n                rewards,\n                dones and not truncated,\n                dones and truncated,\n                infos,\n            )\n        elif isinstance(infos, list):\n            truncated = np.array(\n                [info.pop(\"TimeLimit.truncated\", False) for info in infos]\n            )\n            return (\n                observations,\n                rewards,\n                np.logical_and(dones, np.logical_not(truncated)),\n                np.logical_and(dones, truncated),\n                infos,\n       
     )\n        elif isinstance(infos, dict):\n            num_envs = len(dones)\n            truncated = infos.pop(\"TimeLimit.truncated\", np.zeros(num_envs, dtype=bool))\n            return (\n                observations,\n                rewards,\n                np.logical_and(dones, np.logical_not(truncated)),\n                np.logical_and(dones, truncated),\n                infos,\n            )\n        else:\n            raise TypeError(\n                f\"Unexpected value of infos, as is_vector_env=True, expects `info` to be a list or dict, actual type: {type(infos)}\"\n            )\n\n\ndef convert_to_done_step_api(\n    step_returns: Union[TerminatedTruncatedStepType, DoneStepType],\n    is_vector_env: bool = False,\n) -> DoneStepType:\n    \"\"\"Function to transform step returns to old step API irrespective of input API.\n\n    Args:\n        step_returns (tuple): Items returned by step(). Can be (obs, rew, done, info) or (obs, rew, terminated, truncated, info)\n        is_vector_env (bool): Whether the step_returns are from a vector environment\n    \"\"\"\n    if len(step_returns) == 4:\n        return step_returns\n    else:\n        assert len(step_returns) == 5\n        observations, rewards, terminated, truncated, infos = step_returns\n\n        # Cases to handle - info single env /  info vector env (list) / info vector env (dict)\n        if is_vector_env is False:\n            if truncated or terminated:\n                infos[\"TimeLimit.truncated\"] = truncated and not terminated\n            return (\n                observations,\n                rewards,\n                terminated or truncated,\n                infos,\n            )\n        elif isinstance(infos, list):\n            for info, env_truncated, env_terminated in zip(\n                infos, truncated, terminated\n            ):\n                if env_truncated or env_terminated:\n                    info[\"TimeLimit.truncated\"] = env_truncated and not 
env_terminated\n            return (\n                observations,\n                rewards,\n                np.logical_or(terminated, truncated),\n                infos,\n            )\n        elif isinstance(infos, dict):\n            if np.logical_or(np.any(truncated), np.any(terminated)):\n                infos[\"TimeLimit.truncated\"] = np.logical_and(\n                    truncated, np.logical_not(terminated)\n                )\n            return (\n                observations,\n                rewards,\n                np.logical_or(terminated, truncated),\n                infos,\n            )\n        else:\n            raise TypeError(\n                f\"Unexpected value of infos, as is_vector_env=True, expects `info` to be a list or dict, actual type: {type(infos)}\"\n            )\n\n\ndef step_api_compatibility(\n    step_returns: Union[TerminatedTruncatedStepType, DoneStepType],\n    output_truncation_bool: bool = True,\n    is_vector_env: bool = False,\n) -> Union[TerminatedTruncatedStepType, DoneStepType]:\n    \"\"\"Function to transform step returns to the API specified by `output_truncation_bool` bool.\n\n    Done (old) step API refers to step() method returning (observation, reward, done, info)\n    Terminated Truncated (new) step API refers to step() method returning (observation, reward, terminated, truncated, info)\n    (Refer to docs for details on the API change)\n\n    Args:\n        step_returns (tuple): Items returned by step(). 
Can be (obs, rew, done, info) or (obs, rew, terminated, truncated, info)\n        output_truncation_bool (bool): Whether the output should return two booleans (new API) or one (old) (True by default)\n        is_vector_env (bool): Whether the step_returns are from a vector environment\n\n    Returns:\n        step_returns (tuple): Depending on `output_truncation_bool` bool, it can return (obs, rew, done, info) or (obs, rew, terminated, truncated, info)\n\n    Examples:\n        This function can be used to ensure compatibility between step interfaces with conflicting APIs, e.g. if the env is written in the old API,\n         the wrapper is written in the new API, and the final step output is desired to be in the old API.\n\n        >>> obs, rew, done, info = step_api_compatibility(env.step(action), output_truncation_bool=False)\n        >>> obs, rew, terminated, truncated, info = step_api_compatibility(env.step(action), output_truncation_bool=True)\n        >>> observations, rewards, dones, infos = step_api_compatibility(vec_env.step(action), output_truncation_bool=False, is_vector_env=True)\n    \"\"\"\n    if output_truncation_bool:\n        return convert_to_terminated_truncated_step_api(step_returns, is_vector_env)\n    else:\n        return convert_to_done_step_api(step_returns, is_vector_env)\n"
  },
  {
    "path": "gym/vector/__init__.py",
    "content": "\"\"\"Module for vector environments.\"\"\"\nfrom typing import Iterable, List, Optional, Union\n\nimport gym\nfrom gym.vector.async_vector_env import AsyncVectorEnv\nfrom gym.vector.sync_vector_env import SyncVectorEnv\nfrom gym.vector.vector_env import VectorEnv, VectorEnvWrapper\n\n__all__ = [\"AsyncVectorEnv\", \"SyncVectorEnv\", \"VectorEnv\", \"VectorEnvWrapper\", \"make\"]\n\n\ndef make(\n    id: str,\n    num_envs: int = 1,\n    asynchronous: bool = True,\n    wrappers: Optional[Union[callable, List[callable]]] = None,\n    disable_env_checker: Optional[bool] = None,\n    **kwargs,\n) -> VectorEnv:\n    \"\"\"Create a vectorized environment from multiple copies of an environment, from its id.\n\n    Example::\n\n        >>> import gym\n        >>> env = gym.vector.make('CartPole-v1', num_envs=3)\n        >>> env.reset()\n        array([[-0.04456399,  0.04653909,  0.01326909, -0.02099827],\n               [ 0.03073904,  0.00145001, -0.03088818, -0.03131252],\n               [ 0.03468829,  0.01500225,  0.01230312,  0.01825218]],\n              dtype=float32)\n\n    Args:\n        id: The environment ID. This must be a valid ID from the registry.\n        num_envs: Number of copies of the environment.\n        asynchronous: If `True`, wraps the environments in an :class:`AsyncVectorEnv` (which uses `multiprocessing`_ to run the environments in parallel). If ``False``, wraps the environments in a :class:`SyncVectorEnv`.\n        wrappers: If not ``None``, then apply the wrappers to each internal environment during creation.\n        disable_env_checker: If to run the env checker for the first environment only. 
``None`` will default to the environment spec `disable_env_checker` parameter\n            (that is by default False), otherwise the checker will run according to this argument (True = not run, False = run)\n        **kwargs: Keyword arguments passed to `gym.make`\n\n    Returns:\n        The vectorized environment.\n    \"\"\"\n\n    def create_env(env_num: int):\n        \"\"\"Creates an environment that can enable or disable the environment checker.\"\"\"\n        # If the env_num > 0 then disable the environment checker otherwise use the parameter\n        _disable_env_checker = True if env_num > 0 else disable_env_checker\n\n        def _make_env():\n            env = gym.envs.registration.make(\n                id,\n                disable_env_checker=_disable_env_checker,\n                **kwargs,\n            )\n            if wrappers is not None:\n                if callable(wrappers):\n                    env = wrappers(env)\n                elif isinstance(wrappers, Iterable) and all(\n                    [callable(w) for w in wrappers]\n                ):\n                    for wrapper in wrappers:\n                        env = wrapper(env)\n                else:\n                    raise NotImplementedError\n            return env\n\n        return _make_env\n\n    # Pass the integer index; create_env decides whether to disable the checker\n    env_fns = [create_env(env_num) for env_num in range(num_envs)]\n    return AsyncVectorEnv(env_fns) if asynchronous else SyncVectorEnv(env_fns)\n"
  },
  {
    "path": "gym/vector/async_vector_env.py",
    "content": "\"\"\"An async vector environment.\"\"\"\nimport multiprocessing as mp\nimport sys\nimport time\nfrom copy import deepcopy\nfrom enum import Enum\nfrom typing import List, Optional, Sequence, Tuple, Union\n\nimport numpy as np\n\nimport gym\nfrom gym import logger\nfrom gym.core import ObsType\nfrom gym.error import (\n    AlreadyPendingCallError,\n    ClosedEnvironmentError,\n    CustomSpaceError,\n    NoAsyncCallError,\n)\nfrom gym.vector.utils import (\n    CloudpickleWrapper,\n    clear_mpi_env_vars,\n    concatenate,\n    create_empty_array,\n    create_shared_memory,\n    iterate,\n    read_from_shared_memory,\n    write_to_shared_memory,\n)\nfrom gym.vector.vector_env import VectorEnv\n\n__all__ = [\"AsyncVectorEnv\"]\n\n\nclass AsyncState(Enum):\n    DEFAULT = \"default\"\n    WAITING_RESET = \"reset\"\n    WAITING_STEP = \"step\"\n    WAITING_CALL = \"call\"\n\n\nclass AsyncVectorEnv(VectorEnv):\n    \"\"\"Vectorized environment that runs multiple environments in parallel.\n\n    It uses ``multiprocessing`` processes, and pipes for communication.\n\n    Example::\n\n        >>> import gym\n        >>> env = gym.vector.AsyncVectorEnv([\n        ...     lambda: gym.make(\"Pendulum-v0\", g=9.81),\n        ...     lambda: gym.make(\"Pendulum-v0\", g=1.62)\n        ... 
])\n        >>> env.reset()\n        array([[-0.8286432 ,  0.5597771 ,  0.90249056],\n               [-0.85009176,  0.5266346 ,  0.60007906]], dtype=float32)\n    \"\"\"\n\n    def __init__(\n        self,\n        env_fns: Sequence[callable],\n        observation_space: Optional[gym.Space] = None,\n        action_space: Optional[gym.Space] = None,\n        shared_memory: bool = True,\n        copy: bool = True,\n        context: Optional[str] = None,\n        daemon: bool = True,\n        worker: Optional[callable] = None,\n    ):\n        \"\"\"Vectorized environment that runs multiple environments in parallel.\n\n        Args:\n            env_fns: Functions that create the environments.\n            observation_space: Observation space of a single environment. If ``None``,\n                then the observation space of the first environment is taken.\n            action_space: Action space of a single environment. If ``None``,\n                then the action space of the first environment is taken.\n            shared_memory: If ``True``, then the observations from the worker processes are communicated back through\n                shared variables. This can improve the efficiency if the observations are large (e.g. images).\n            copy: If ``True``, then the :meth:`~AsyncVectorEnv.reset` and :meth:`~AsyncVectorEnv.step` methods\n                return a copy of the observations.\n            context: Context for `multiprocessing`_. If ``None``, then the default context is used.\n            daemon: If ``True``, then subprocesses have ``daemon`` flag turned on; that is, they will quit if\n                the head process quits. 
However, ``daemon=True`` prevents subprocesses to spawn children,\n                so for some environments you may want to have it set to ``False``.\n            worker: If set, then use that worker in a subprocess instead of a default one.\n                Can be useful to override some inner vector env logic, for instance, how resets on termination or truncation are handled.\n\n        Warnings: worker is an advanced mode option. It provides a high degree of flexibility and a high chance\n            to shoot yourself in the foot; thus, if you are writing your own worker, it is recommended to start\n            from the code for ``_worker`` (or ``_worker_shared_memory``) method, and add changes.\n\n        Raises:\n            RuntimeError: If the observation space of some sub-environment does not match observation_space\n                (or, by default, the observation space of the first sub-environment).\n            ValueError: If observation_space is a custom space (i.e. not a default space in Gym,\n                such as gym.spaces.Box, gym.spaces.Discrete, or gym.spaces.Dict) and shared_memory is True.\n        \"\"\"\n        ctx = mp.get_context(context)\n        self.env_fns = env_fns\n        self.shared_memory = shared_memory\n        self.copy = copy\n        dummy_env = env_fns[0]()\n        self.metadata = dummy_env.metadata\n\n        if (observation_space is None) or (action_space is None):\n            observation_space = observation_space or dummy_env.observation_space\n            action_space = action_space or dummy_env.action_space\n        dummy_env.close()\n        del dummy_env\n        super().__init__(\n            num_envs=len(env_fns),\n            observation_space=observation_space,\n            action_space=action_space,\n        )\n\n        if self.shared_memory:\n            try:\n                _obs_buffer = create_shared_memory(\n                    self.single_observation_space, n=self.num_envs, ctx=ctx\n                )\n 
               self.observations = read_from_shared_memory(\n                    self.single_observation_space, _obs_buffer, n=self.num_envs\n                )\n            except CustomSpaceError:\n                raise ValueError(\n                    \"Using `shared_memory=True` in `AsyncVectorEnv` \"\n                    \"is incompatible with non-standard Gym observation spaces \"\n                    \"(i.e. custom spaces inheriting from `gym.Space`), and is \"\n                    \"only compatible with default Gym spaces (e.g. `Box`, \"\n                    \"`Tuple`, `Dict`) for batching. Set `shared_memory=False` \"\n                    \"if you use custom observation spaces.\"\n                )\n        else:\n            _obs_buffer = None\n            self.observations = create_empty_array(\n                self.single_observation_space, n=self.num_envs, fn=np.zeros\n            )\n\n        self.parent_pipes, self.processes = [], []\n        self.error_queue = ctx.Queue()\n        target = _worker_shared_memory if self.shared_memory else _worker\n        target = worker or target\n        with clear_mpi_env_vars():\n            for idx, env_fn in enumerate(self.env_fns):\n                parent_pipe, child_pipe = ctx.Pipe()\n                process = ctx.Process(\n                    target=target,\n                    name=f\"Worker<{type(self).__name__}>-{idx}\",\n                    args=(\n                        idx,\n                        CloudpickleWrapper(env_fn),\n                        child_pipe,\n                        parent_pipe,\n                        _obs_buffer,\n                        self.error_queue,\n                    ),\n                )\n\n                self.parent_pipes.append(parent_pipe)\n                self.processes.append(process)\n\n                process.daemon = daemon\n                process.start()\n                child_pipe.close()\n\n        self._state = AsyncState.DEFAULT\n        
self._check_spaces()\n\n    def reset_async(\n        self,\n        seed: Optional[Union[int, List[int]]] = None,\n        options: Optional[dict] = None,\n    ):\n        \"\"\"Send calls to the :obj:`reset` methods of the sub-environments.\n\n        To get the results of these calls, you may invoke :meth:`reset_wait`.\n\n        Args:\n            seed: List of seeds for each environment\n            options: The reset option\n\n        Raises:\n            ClosedEnvironmentError: If the environment was closed (if :meth:`close` was previously called).\n            AlreadyPendingCallError: If the environment is already waiting for a pending call to another\n                method (e.g. :meth:`step_async`). This can be caused by two consecutive\n                calls to :meth:`reset_async`, with no call to :meth:`reset_wait` in between.\n        \"\"\"\n        self._assert_is_running()\n\n        if seed is None:\n            seed = [None for _ in range(self.num_envs)]\n        if isinstance(seed, int):\n            seed = [seed + i for i in range(self.num_envs)]\n        assert len(seed) == self.num_envs\n\n        if self._state != AsyncState.DEFAULT:\n            raise AlreadyPendingCallError(\n                f\"Calling `reset_async` while waiting for a pending call to `{self._state.value}` to complete\",\n                self._state.value,\n            )\n\n        for pipe, single_seed in zip(self.parent_pipes, seed):\n            single_kwargs = {}\n            if single_seed is not None:\n                single_kwargs[\"seed\"] = single_seed\n            if options is not None:\n                single_kwargs[\"options\"] = options\n\n            pipe.send((\"reset\", single_kwargs))\n        self._state = AsyncState.WAITING_RESET\n\n    def reset_wait(\n        self,\n        timeout: Optional[Union[int, float]] = None,\n        seed: Optional[int] = None,\n        options: Optional[dict] = None,\n    ) -> Union[ObsType, Tuple[ObsType, List[dict]]]:\n    
    \"\"\"Waits for the calls triggered by :meth:`reset_async` to finish and returns the results.\n\n        Args:\n            timeout: Number of seconds before the call to `reset_wait` times out. If `None`, the call to `reset_wait` never times out.\n            seed: ignored\n            options: ignored\n\n        Returns:\n            A tuple of batched observations and list of dictionaries\n\n        Raises:\n            ClosedEnvironmentError: If the environment was closed (if :meth:`close` was previously called).\n            NoAsyncCallError: If :meth:`reset_wait` was called without any prior call to :meth:`reset_async`.\n            TimeoutError: If :meth:`reset_wait` timed out.\n        \"\"\"\n        self._assert_is_running()\n        if self._state != AsyncState.WAITING_RESET:\n            raise NoAsyncCallError(\n                \"Calling `reset_wait` without any prior \" \"call to `reset_async`.\",\n                AsyncState.WAITING_RESET.value,\n            )\n\n        if not self._poll(timeout):\n            self._state = AsyncState.DEFAULT\n            raise mp.TimeoutError(\n                f\"The call to `reset_wait` has timed out after {timeout} second(s).\"\n            )\n\n        results, successes = zip(*[pipe.recv() for pipe in self.parent_pipes])\n        self._raise_if_errors(successes)\n        self._state = AsyncState.DEFAULT\n\n        infos = {}\n        results, info_data = zip(*results)\n        for i, info in enumerate(info_data):\n            infos = self._add_info(infos, info, i)\n\n        if not self.shared_memory:\n            self.observations = concatenate(\n                self.single_observation_space, results, self.observations\n            )\n\n        return (deepcopy(self.observations) if self.copy else self.observations), infos\n\n    def step_async(self, actions: np.ndarray):\n        \"\"\"Send the calls to :obj:`step` to each sub-environment.\n\n        Args:\n            actions: Batch of actions. 
It must be an element of :attr:`~VectorEnv.action_space`.\n\n        Raises:\n            ClosedEnvironmentError: If the environment was closed (if :meth:`close` was previously called).\n            AlreadyPendingCallError: If the environment is already waiting for a pending call to another\n                method (e.g. :meth:`reset_async`). This can be caused by two consecutive\n                calls to :meth:`step_async`, with no call to :meth:`step_wait` in\n                between.\n        \"\"\"\n        self._assert_is_running()\n        if self._state != AsyncState.DEFAULT:\n            raise AlreadyPendingCallError(\n                f\"Calling `step_async` while waiting for a pending call to `{self._state.value}` to complete.\",\n                self._state.value,\n            )\n\n        actions = iterate(self.action_space, actions)\n        for pipe, action in zip(self.parent_pipes, actions):\n            pipe.send((\"step\", action))\n        self._state = AsyncState.WAITING_STEP\n\n    def step_wait(\n        self, timeout: Optional[Union[int, float]] = None\n    ) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, dict]:\n        \"\"\"Wait for the calls to :obj:`step` in each sub-environment to finish.\n\n        Args:\n            timeout: Number of seconds before the call to :meth:`step_wait` times out. 
If ``None``, the call to :meth:`step_wait` never times out.\n\n        Returns:\n             The batched environment step information, (obs, reward, terminated, truncated, info)\n\n        Raises:\n            ClosedEnvironmentError: If the environment was closed (if :meth:`close` was previously called).\n            NoAsyncCallError: If :meth:`step_wait` was called without any prior call to :meth:`step_async`.\n            TimeoutError: If :meth:`step_wait` timed out.\n        \"\"\"\n        self._assert_is_running()\n        if self._state != AsyncState.WAITING_STEP:\n            raise NoAsyncCallError(\n                \"Calling `step_wait` without any prior call \" \"to `step_async`.\",\n                AsyncState.WAITING_STEP.value,\n            )\n\n        if not self._poll(timeout):\n            self._state = AsyncState.DEFAULT\n            raise mp.TimeoutError(\n                f\"The call to `step_wait` has timed out after {timeout} second(s).\"\n            )\n\n        observations_list, rewards, terminateds, truncateds, infos = [], [], [], [], {}\n        successes = []\n        for i, pipe in enumerate(self.parent_pipes):\n            result, success = pipe.recv()\n            obs, rew, terminated, truncated, info = result\n\n            successes.append(success)\n            observations_list.append(obs)\n            rewards.append(rew)\n            terminateds.append(terminated)\n            truncateds.append(truncated)\n            infos = self._add_info(infos, info, i)\n\n        self._raise_if_errors(successes)\n        self._state = AsyncState.DEFAULT\n\n        if not self.shared_memory:\n            self.observations = concatenate(\n                self.single_observation_space,\n                observations_list,\n                self.observations,\n            )\n\n        return (\n            deepcopy(self.observations) if self.copy else self.observations,\n            np.array(rewards),\n            np.array(terminateds, 
dtype=np.bool_),\n            np.array(truncateds, dtype=np.bool_),\n            infos,\n        )\n\n    def call_async(self, name: str, *args, **kwargs):\n        \"\"\"Calls the method with name asynchronously and applies args and kwargs to the method.\n\n        Args:\n            name: Name of the method or property to call.\n            *args: Arguments to apply to the method call.\n            **kwargs: Keyword arguments to apply to the method call.\n\n        Raises:\n            ClosedEnvironmentError: If the environment was closed (if :meth:`close` was previously called).\n            AlreadyPendingCallError: If `call_async` is called while waiting for a pending call to complete.\n        \"\"\"\n        self._assert_is_running()\n        if self._state != AsyncState.DEFAULT:\n            raise AlreadyPendingCallError(\n                \"Calling `call_async` while waiting \"\n                f\"for a pending call to `{self._state.value}` to complete.\",\n                self._state.value,\n            )\n\n        for pipe in self.parent_pipes:\n            pipe.send((\"_call\", (name, args, kwargs)))\n        self._state = AsyncState.WAITING_CALL\n\n    def call_wait(self, timeout: Optional[Union[int, float]] = None) -> list:\n        \"\"\"Waits for the calls triggered by :meth:`call_async` to finish and returns the results.\n\n        Args:\n            timeout: Number of seconds before the call to `call_wait` times out.\n                If `None` (default), the call to `call_wait` never times out.\n\n        Returns:\n            List of the results of the individual calls to the method or property for each environment.\n\n        Raises:\n            NoAsyncCallError: If `call_wait` is called without any prior call to `call_async`.\n            TimeoutError: If the call to `call_wait` timed out after ``timeout`` second(s).\n        \"\"\"\n        self._assert_is_running()\n        if self._state != AsyncState.WAITING_CALL:\n            raise NoAsyncCallError(\n                \"Calling `call_wait` 
without any prior call to `call_async`.\",\n                AsyncState.WAITING_CALL.value,\n            )\n\n        if not self._poll(timeout):\n            self._state = AsyncState.DEFAULT\n            raise mp.TimeoutError(\n                f\"The call to `call_wait` has timed out after {timeout} second(s).\"\n            )\n\n        results, successes = zip(*[pipe.recv() for pipe in self.parent_pipes])\n        self._raise_if_errors(successes)\n        self._state = AsyncState.DEFAULT\n\n        return results\n\n    def set_attr(self, name: str, values: Union[list, tuple, object]):\n        \"\"\"Sets an attribute of the sub-environments.\n\n        Args:\n            name: Name of the property to be set in each individual environment.\n            values: Values of the property to be set to. If ``values`` is a list or\n                tuple, then it corresponds to the values for each individual\n                environment, otherwise a single value is set for all environments.\n\n        Raises:\n            ValueError: Values must be a list or tuple with length equal to the number of environments.\n            AlreadyPendingCallError: Calling `set_attr` while waiting for a pending call to complete.\n        \"\"\"\n        self._assert_is_running()\n        if not isinstance(values, (list, tuple)):\n            values = [values for _ in range(self.num_envs)]\n        if len(values) != self.num_envs:\n            raise ValueError(\n                \"Values must be a list or tuple with length equal to the \"\n                f\"number of environments. 
Got `{len(values)}` values for \"\n                f\"{self.num_envs} environments.\"\n            )\n\n        if self._state != AsyncState.DEFAULT:\n            raise AlreadyPendingCallError(\n                \"Calling `set_attr` while waiting \"\n                f\"for a pending call to `{self._state.value}` to complete.\",\n                self._state.value,\n            )\n\n        for pipe, value in zip(self.parent_pipes, values):\n            pipe.send((\"_setattr\", (name, value)))\n        _, successes = zip(*[pipe.recv() for pipe in self.parent_pipes])\n        self._raise_if_errors(successes)\n\n    def close_extras(\n        self, timeout: Optional[Union[int, float]] = None, terminate: bool = False\n    ):\n        \"\"\"Close the environments & clean up the extra resources (processes and pipes).\n\n        Args:\n            timeout: Number of seconds before the call to :meth:`close` times out. If ``None``,\n                the call to :meth:`close` never times out. If the call to :meth:`close`\n                times out, then all processes are terminated.\n            terminate: If ``True``, then the :meth:`close` operation is forced and all processes are terminated.\n\n        Raises:\n            TimeoutError: If :meth:`close` timed out.\n        \"\"\"\n        timeout = 0 if terminate else timeout\n        try:\n            if self._state != AsyncState.DEFAULT:\n                logger.warn(\n                    f\"Calling `close` while waiting for a pending call to `{self._state.value}` to complete.\"\n                )\n                function = getattr(self, f\"{self._state.value}_wait\")\n                function(timeout)\n        except mp.TimeoutError:\n            terminate = True\n\n        if terminate:\n            for process in self.processes:\n                if process.is_alive():\n                    process.terminate()\n        else:\n            for pipe in self.parent_pipes:\n                if (pipe is not None) and (not 
pipe.closed):\n                    pipe.send((\"close\", None))\n            for pipe in self.parent_pipes:\n                if (pipe is not None) and (not pipe.closed):\n                    pipe.recv()\n\n        for pipe in self.parent_pipes:\n            if pipe is not None:\n                pipe.close()\n        for process in self.processes:\n            process.join()\n\n    def _poll(self, timeout=None):\n        self._assert_is_running()\n        if timeout is None:\n            return True\n        end_time = time.perf_counter() + timeout\n        delta = None\n        for pipe in self.parent_pipes:\n            delta = max(end_time - time.perf_counter(), 0)\n            if pipe is None:\n                return False\n            if pipe.closed or (not pipe.poll(delta)):\n                return False\n        return True\n\n    def _check_spaces(self):\n        self._assert_is_running()\n        spaces = (self.single_observation_space, self.single_action_space)\n        for pipe in self.parent_pipes:\n            pipe.send((\"_check_spaces\", spaces))\n        results, successes = zip(*[pipe.recv() for pipe in self.parent_pipes])\n        self._raise_if_errors(successes)\n        same_observation_spaces, same_action_spaces = zip(*results)\n        if not all(same_observation_spaces):\n            raise RuntimeError(\n                \"Some environments have an observation space different from \"\n                f\"`{self.single_observation_space}`. In order to batch observations, \"\n                \"the observation spaces from all environments must be equal.\"\n            )\n        if not all(same_action_spaces):\n            raise RuntimeError(\n                \"Some environments have an action space different from \"\n                f\"`{self.single_action_space}`. 
In order to batch actions, the \"\n                \"action spaces from all environments must be equal.\"\n            )\n\n    def _assert_is_running(self):\n        if self.closed:\n            raise ClosedEnvironmentError(\n                f\"Trying to operate on `{type(self).__name__}`, after a call to `close()`.\"\n            )\n\n    def _raise_if_errors(self, successes):\n        if all(successes):\n            return\n\n        num_errors = self.num_envs - sum(successes)\n        assert num_errors > 0\n        for i in range(num_errors):\n            index, exctype, value = self.error_queue.get()\n            logger.error(\n                f\"Received the following error from Worker-{index}: {exctype.__name__}: {value}\"\n            )\n            logger.error(f\"Shutting down Worker-{index}.\")\n            self.parent_pipes[index].close()\n            self.parent_pipes[index] = None\n\n            if i == num_errors - 1:\n                logger.error(\"Raising the last exception back to the main process.\")\n                raise exctype(value)\n\n    def __del__(self):\n        \"\"\"On deleting the object, checks that the vector environment is closed.\"\"\"\n        if not getattr(self, \"closed\", True) and hasattr(self, \"_state\"):\n            self.close(terminate=True)\n\n\ndef _worker(index, env_fn, pipe, parent_pipe, shared_memory, error_queue):\n    assert shared_memory is None\n    env = env_fn()\n    parent_pipe.close()\n    try:\n        while True:\n            command, data = pipe.recv()\n            if command == \"reset\":\n                observation, info = env.reset(**data)\n                pipe.send(((observation, info), True))\n\n            elif command == \"step\":\n                (\n                    observation,\n                    reward,\n                    terminated,\n                    truncated,\n                    info,\n                ) = env.step(data)\n                if terminated or truncated:\n              
      old_observation, old_info = observation, info\n                    observation, info = env.reset()\n                    info[\"final_observation\"] = old_observation\n                    info[\"final_info\"] = old_info\n                pipe.send(((observation, reward, terminated, truncated, info), True))\n            elif command == \"seed\":\n                env.seed(data)\n                pipe.send((None, True))\n            elif command == \"close\":\n                pipe.send((None, True))\n                break\n            elif command == \"_call\":\n                name, args, kwargs = data\n                if name in [\"reset\", \"step\", \"seed\", \"close\"]:\n                    raise ValueError(\n                        f\"Trying to call function `{name}` with \"\n                        f\"`_call`. Use `{name}` directly instead.\"\n                    )\n                function = getattr(env, name)\n                if callable(function):\n                    pipe.send((function(*args, **kwargs), True))\n                else:\n                    pipe.send((function, True))\n            elif command == \"_setattr\":\n                name, value = data\n                setattr(env, name, value)\n                pipe.send((None, True))\n            elif command == \"_check_spaces\":\n                pipe.send(\n                    (\n                        (data[0] == env.observation_space, data[1] == env.action_space),\n                        True,\n                    )\n                )\n            else:\n                raise RuntimeError(\n                    f\"Received unknown command `{command}`. 
Must \"\n                    \"be one of {`reset`, `step`, `seed`, `close`, `_call`, \"\n                    \"`_setattr`, `_check_spaces`}.\"\n                )\n    except (KeyboardInterrupt, Exception):\n        error_queue.put((index,) + sys.exc_info()[:2])\n        pipe.send((None, False))\n    finally:\n        env.close()\n\n\ndef _worker_shared_memory(index, env_fn, pipe, parent_pipe, shared_memory, error_queue):\n    assert shared_memory is not None\n    env = env_fn()\n    observation_space = env.observation_space\n    parent_pipe.close()\n    try:\n        while True:\n            command, data = pipe.recv()\n            if command == \"reset\":\n                observation, info = env.reset(**data)\n                write_to_shared_memory(\n                    observation_space, index, observation, shared_memory\n                )\n                pipe.send(((None, info), True))\n\n            elif command == \"step\":\n                (\n                    observation,\n                    reward,\n                    terminated,\n                    truncated,\n                    info,\n                ) = env.step(data)\n                if terminated or truncated:\n                    old_observation, old_info = observation, info\n                    observation, info = env.reset()\n                    info[\"final_observation\"] = old_observation\n                    info[\"final_info\"] = old_info\n                write_to_shared_memory(\n                    observation_space, index, observation, shared_memory\n                )\n                pipe.send(((None, reward, terminated, truncated, info), True))\n            elif command == \"seed\":\n                env.seed(data)\n                pipe.send((None, True))\n            elif command == \"close\":\n                pipe.send((None, True))\n                break\n            elif command == \"_call\":\n                name, args, kwargs = data\n                if name in [\"reset\", 
\"step\", \"seed\", \"close\"]:\n                    raise ValueError(\n                        f\"Trying to call function `{name}` with \"\n                        f\"`_call`. Use `{name}` directly instead.\"\n                    )\n                function = getattr(env, name)\n                if callable(function):\n                    pipe.send((function(*args, **kwargs), True))\n                else:\n                    pipe.send((function, True))\n            elif command == \"_setattr\":\n                name, value = data\n                setattr(env, name, value)\n                pipe.send((None, True))\n            elif command == \"_check_spaces\":\n                pipe.send(\n                    ((data[0] == observation_space, data[1] == env.action_space), True)\n                )\n            else:\n                raise RuntimeError(\n                    f\"Received unknown command `{command}`. Must \"\n                    \"be one of {`reset`, `step`, `seed`, `close`, `_call`, \"\n                    \"`_setattr`, `_check_spaces`}.\"\n                )\n    except (KeyboardInterrupt, Exception):\n        error_queue.put((index,) + sys.exc_info()[:2])\n        pipe.send((None, False))\n    finally:\n        env.close()\n"
  },
  {
    "path": "gym/vector/sync_vector_env.py",
    "content": "\"\"\"A synchronous vector environment.\"\"\"\nfrom copy import deepcopy\nfrom typing import Any, Callable, Iterable, List, Optional, Sequence, Union\n\nimport numpy as np\n\nfrom gym import Env\nfrom gym.spaces import Space\nfrom gym.vector.utils import concatenate, create_empty_array, iterate\nfrom gym.vector.vector_env import VectorEnv\n\n__all__ = [\"SyncVectorEnv\"]\n\n\nclass SyncVectorEnv(VectorEnv):\n    \"\"\"Vectorized environment that serially runs multiple environments.\n\n    Example::\n\n        >>> import gym\n        >>> env = gym.vector.SyncVectorEnv([\n        ...     lambda: gym.make(\"Pendulum-v0\", g=9.81),\n        ...     lambda: gym.make(\"Pendulum-v0\", g=1.62)\n        ... ])\n        >>> env.reset()\n        array([[-0.8286432 ,  0.5597771 ,  0.90249056],\n               [-0.85009176,  0.5266346 ,  0.60007906]], dtype=float32)\n    \"\"\"\n\n    def __init__(\n        self,\n        env_fns: Iterable[Callable[[], Env]],\n        observation_space: Optional[Space] = None,\n        action_space: Optional[Space] = None,\n        copy: bool = True,\n    ):\n        \"\"\"Vectorized environment that serially runs multiple environments.\n\n        Args:\n            env_fns: Iterable of callables that create the environments.\n            observation_space: Observation space of a single environment. If ``None``,\n                then the observation space of the first environment is taken.\n            action_space: Action space of a single environment. 
If ``None``,\n                then the action space of the first environment is taken.\n            copy: If ``True``, then the :meth:`reset` and :meth:`step` methods return a copy of the observations.\n\n        Raises:\n            RuntimeError: If the observation space of some sub-environment does not match observation_space\n                (or, by default, the observation space of the first sub-environment).\n        \"\"\"\n        self.env_fns = env_fns\n        self.envs = [env_fn() for env_fn in env_fns]\n        self.copy = copy\n        self.metadata = self.envs[0].metadata\n\n        if (observation_space is None) or (action_space is None):\n            observation_space = observation_space or self.envs[0].observation_space\n            action_space = action_space or self.envs[0].action_space\n        super().__init__(\n            num_envs=len(self.envs),\n            observation_space=observation_space,\n            action_space=action_space,\n        )\n\n        self._check_spaces()\n        self.observations = create_empty_array(\n            self.single_observation_space, n=self.num_envs, fn=np.zeros\n        )\n        self._rewards = np.zeros((self.num_envs,), dtype=np.float64)\n        self._terminateds = np.zeros((self.num_envs,), dtype=np.bool_)\n        self._truncateds = np.zeros((self.num_envs,), dtype=np.bool_)\n        self._actions = None\n\n    def seed(self, seed: Optional[Union[int, Sequence[int]]] = None):\n        \"\"\"Sets the seed in all sub-environments.\n\n        Args:\n            seed: The seed\n        \"\"\"\n        super().seed(seed=seed)\n        if seed is None:\n            seed = [None for _ in range(self.num_envs)]\n        if isinstance(seed, int):\n            seed = [seed + i for i in range(self.num_envs)]\n        assert len(seed) == self.num_envs\n\n        for env, single_seed in zip(self.envs, seed):\n            env.seed(single_seed)\n\n    def reset_wait(\n        self,\n        seed: Optional[Union[int, 
List[int]]] = None,\n        options: Optional[dict] = None,\n    ):\n        \"\"\"Waits for the calls triggered by :meth:`reset_async` to finish and returns the results.\n\n        Args:\n            seed: The reset environment seed\n            options: Option information for the environment reset\n\n        Returns:\n            The reset observation of the environment and reset information\n        \"\"\"\n        if seed is None:\n            seed = [None for _ in range(self.num_envs)]\n        if isinstance(seed, int):\n            seed = [seed + i for i in range(self.num_envs)]\n        assert len(seed) == self.num_envs\n\n        self._terminateds[:] = False\n        self._truncateds[:] = False\n        observations = []\n        infos = {}\n        for i, (env, single_seed) in enumerate(zip(self.envs, seed)):\n\n            kwargs = {}\n            if single_seed is not None:\n                kwargs[\"seed\"] = single_seed\n            if options is not None:\n                kwargs[\"options\"] = options\n\n            observation, info = env.reset(**kwargs)\n            observations.append(observation)\n            infos = self._add_info(infos, info, i)\n\n        self.observations = concatenate(\n            self.single_observation_space, observations, self.observations\n        )\n        return (deepcopy(self.observations) if self.copy else self.observations), infos\n\n    def step_async(self, actions):\n        \"\"\"Sets :attr:`_actions` for use by the :meth:`step_wait` by converting the ``actions`` to an iterable version.\"\"\"\n        self._actions = iterate(self.action_space, actions)\n\n    def step_wait(self):\n        \"\"\"Steps through each of the environments returning the batched results.\n\n        Returns:\n            The batched environment step results\n        \"\"\"\n        observations, infos = [], {}\n        for i, (env, action) in enumerate(zip(self.envs, self._actions)):\n\n            (\n                observation,\n       
         self._rewards[i],\n                self._terminateds[i],\n                self._truncateds[i],\n                info,\n            ) = env.step(action)\n\n            if self._terminateds[i] or self._truncateds[i]:\n                old_observation, old_info = observation, info\n                observation, info = env.reset()\n                info[\"final_observation\"] = old_observation\n                info[\"final_info\"] = old_info\n            observations.append(observation)\n            infos = self._add_info(infos, info, i)\n        self.observations = concatenate(\n            self.single_observation_space, observations, self.observations\n        )\n\n        return (\n            deepcopy(self.observations) if self.copy else self.observations,\n            np.copy(self._rewards),\n            np.copy(self._terminateds),\n            np.copy(self._truncateds),\n            infos,\n        )\n\n    def call(self, name, *args, **kwargs) -> tuple:\n        \"\"\"Calls the method with name and applies args and kwargs.\n\n        Args:\n            name: The method name\n            *args: The method args\n            **kwargs: The method kwargs\n\n        Returns:\n            Tuple of results\n        \"\"\"\n        results = []\n        for env in self.envs:\n            function = getattr(env, name)\n            if callable(function):\n                results.append(function(*args, **kwargs))\n            else:\n                results.append(function)\n\n        return tuple(results)\n\n    def set_attr(self, name: str, values: Union[list, tuple, Any]):\n        \"\"\"Sets an attribute of the sub-environments.\n\n        Args:\n            name: The property name to change\n            values: Values of the property to be set to. 
If ``values`` is a list or\n                tuple, then it corresponds to the values for each individual\n                environment, otherwise, a single value is set for all environments.\n\n        Raises:\n            ValueError: Values must be a list or tuple with length equal to the number of environments.\n        \"\"\"\n        if not isinstance(values, (list, tuple)):\n            values = [values for _ in range(self.num_envs)]\n        if len(values) != self.num_envs:\n            raise ValueError(\n                \"Values must be a list or tuple with length equal to the \"\n                f\"number of environments. Got `{len(values)}` values for \"\n                f\"{self.num_envs} environments.\"\n            )\n\n        for env, value in zip(self.envs, values):\n            setattr(env, name, value)\n\n    def close_extras(self, **kwargs):\n        \"\"\"Close the environments.\"\"\"\n        for env in self.envs:\n            env.close()\n\n    def _check_spaces(self) -> bool:\n        for env in self.envs:\n            if not (env.observation_space == self.single_observation_space):\n                raise RuntimeError(\n                    \"Some environments have an observation space different from \"\n                    f\"`{self.single_observation_space}`. In order to batch observations, \"\n                    \"the observation spaces from all environments must be equal.\"\n                )\n\n            if not (env.action_space == self.single_action_space):\n                raise RuntimeError(\n                    \"Some environments have an action space different from \"\n                    f\"`{self.single_action_space}`. In order to batch actions, the \"\n                    \"action spaces from all environments must be equal.\"\n                )\n\n        return True\n"
  },
  {
    "path": "gym/vector/utils/__init__.py",
    "content": "\"\"\"Module for gym vector utils.\"\"\"\nfrom gym.vector.utils.misc import CloudpickleWrapper, clear_mpi_env_vars\nfrom gym.vector.utils.numpy_utils import concatenate, create_empty_array\nfrom gym.vector.utils.shared_memory import (\n    create_shared_memory,\n    read_from_shared_memory,\n    write_to_shared_memory,\n)\nfrom gym.vector.utils.spaces import _BaseGymSpaces  # pyright: reportPrivateUsage=false\nfrom gym.vector.utils.spaces import BaseGymSpaces, batch_space, iterate\n\n__all__ = [\n    \"CloudpickleWrapper\",\n    \"clear_mpi_env_vars\",\n    \"concatenate\",\n    \"create_empty_array\",\n    \"create_shared_memory\",\n    \"read_from_shared_memory\",\n    \"write_to_shared_memory\",\n    \"BaseGymSpaces\",\n    \"batch_space\",\n    \"iterate\",\n]\n"
  },
  {
    "path": "gym/vector/utils/misc.py",
    "content": "\"\"\"Miscellaneous utilities.\"\"\"\nimport contextlib\nimport os\n\n__all__ = [\"CloudpickleWrapper\", \"clear_mpi_env_vars\"]\n\n\nclass CloudpickleWrapper:\n    \"\"\"Wrapper that uses cloudpickle to pickle and unpickle the result.\"\"\"\n\n    def __init__(self, fn: callable):\n        \"\"\"Cloudpickle wrapper for a function.\"\"\"\n        self.fn = fn\n\n    def __getstate__(self):\n        \"\"\"Get the state using `cloudpickle.dumps(self.fn)`.\"\"\"\n        import cloudpickle\n\n        return cloudpickle.dumps(self.fn)\n\n    def __setstate__(self, ob):\n        \"\"\"Sets the state with obs.\"\"\"\n        import pickle\n\n        self.fn = pickle.loads(ob)\n\n    def __call__(self):\n        \"\"\"Calls the function `self.fn` with no arguments.\"\"\"\n        return self.fn()\n\n\n@contextlib.contextmanager\ndef clear_mpi_env_vars():\n    \"\"\"Clears the MPI of environment variables.\n\n    `from mpi4py import MPI` will call `MPI_Init` by default.\n    If the child process has MPI environment variables, MPI will think that the child process\n    is an MPI process just like the parent and do bad things such as hang.\n\n    This context manager is a hacky way to clear those environment variables\n    temporarily such as when we are starting multiprocessing Processes.\n\n    Yields:\n        Yields for the context manager\n    \"\"\"\n    removed_environment = {}\n    for k, v in list(os.environ.items()):\n        for prefix in [\"OMPI_\", \"PMI_\"]:\n            if k.startswith(prefix):\n                removed_environment[k] = v\n                del os.environ[k]\n    try:\n        yield\n    finally:\n        os.environ.update(removed_environment)\n"
  },
  {
    "path": "gym/vector/utils/numpy_utils.py",
    "content": "\"\"\"Numpy utility functions: concatenate space samples and create empty array.\"\"\"\nfrom collections import OrderedDict\nfrom functools import singledispatch\nfrom typing import Iterable, Union\n\nimport numpy as np\n\nfrom gym.spaces import Box, Dict, Discrete, MultiBinary, MultiDiscrete, Space, Tuple\n\n__all__ = [\"concatenate\", \"create_empty_array\"]\n\n\n@singledispatch\ndef concatenate(\n    space: Space, items: Iterable, out: Union[tuple, dict, np.ndarray]\n) -> Union[tuple, dict, np.ndarray]:\n    \"\"\"Concatenate multiple samples from space into a single object.\n\n    Example::\n\n        >>> from gym.spaces import Box\n        >>> space = Box(low=0, high=1, shape=(3,), dtype=np.float32)\n        >>> out = np.zeros((2, 3), dtype=np.float32)\n        >>> items = [space.sample() for _ in range(2)]\n        >>> concatenate(space, items, out)\n        array([[0.6348213 , 0.28607962, 0.60760117],\n               [0.87383074, 0.192658  , 0.2148103 ]], dtype=float32)\n\n    Args:\n        space: Observation space of a single environment in the vectorized environment.\n        items: Samples to be concatenated.\n        out: The output object. This object is a (possibly nested) numpy array.\n\n    Returns:\n        The output object. 
This object is a (possibly nested) numpy array.\n\n    Raises:\n        ValueError: Space is not a valid :class:`gym.Space` instance\n    \"\"\"\n    raise ValueError(\n        f\"Space of type `{type(space)}` is not a valid `gym.Space` instance.\"\n    )\n\n\n@concatenate.register(Box)\n@concatenate.register(Discrete)\n@concatenate.register(MultiDiscrete)\n@concatenate.register(MultiBinary)\ndef _concatenate_base(space, items, out):\n    return np.stack(items, axis=0, out=out)\n\n\n@concatenate.register(Tuple)\ndef _concatenate_tuple(space, items, out):\n    return tuple(\n        concatenate(subspace, [item[i] for item in items], out[i])\n        for (i, subspace) in enumerate(space.spaces)\n    )\n\n\n@concatenate.register(Dict)\ndef _concatenate_dict(space, items, out):\n    return OrderedDict(\n        [\n            (key, concatenate(subspace, [item[key] for item in items], out[key]))\n            for (key, subspace) in space.spaces.items()\n        ]\n    )\n\n\n@concatenate.register(Space)\ndef _concatenate_custom(space, items, out):\n    return tuple(items)\n\n\n@singledispatch\ndef create_empty_array(\n    space: Space, n: int = 1, fn: callable = np.zeros\n) -> Union[tuple, dict, np.ndarray]:\n    \"\"\"Create an empty (possibly nested) numpy array.\n\n    Example::\n\n        >>> from gym.spaces import Box, Dict\n        >>> space = Dict({\n        ... 'position': Box(low=0, high=1, shape=(3,), dtype=np.float32),\n        ... 
'velocity': Box(low=0, high=1, shape=(2,), dtype=np.float32)})\n        >>> create_empty_array(space, n=2, fn=np.zeros)\n        OrderedDict([('position', array([[0., 0., 0.],\n                                         [0., 0., 0.]], dtype=float32)),\n                     ('velocity', array([[0., 0.],\n                                         [0., 0.]], dtype=float32))])\n\n    Args:\n        space: Observation space of a single environment in the vectorized environment.\n        n: Number of environments in the vectorized environment. If `None`, creates an empty sample from `space`.\n        fn: Function to apply when creating the empty numpy array. Examples of such functions are `np.empty` or `np.zeros`.\n\n    Returns:\n        The output object. This object is a (possibly nested) numpy array.\n\n    Raises:\n        ValueError: Space is not a valid :class:`gym.Space` instance\n    \"\"\"\n    raise ValueError(\n        f\"Space of type `{type(space)}` is not a valid `gym.Space` instance.\"\n    )\n\n\n@create_empty_array.register(Box)\n@create_empty_array.register(Discrete)\n@create_empty_array.register(MultiDiscrete)\n@create_empty_array.register(MultiBinary)\ndef _create_empty_array_base(space, n=1, fn=np.zeros):\n    shape = space.shape if (n is None) else (n,) + space.shape\n    return fn(shape, dtype=space.dtype)\n\n\n@create_empty_array.register(Tuple)\ndef _create_empty_array_tuple(space, n=1, fn=np.zeros):\n    return tuple(create_empty_array(subspace, n=n, fn=fn) for subspace in space.spaces)\n\n\n@create_empty_array.register(Dict)\ndef _create_empty_array_dict(space, n=1, fn=np.zeros):\n    return OrderedDict(\n        [\n            (key, create_empty_array(subspace, n=n, fn=fn))\n            for (key, subspace) in space.spaces.items()\n        ]\n    )\n\n\n@create_empty_array.register(Space)\ndef _create_empty_array_custom(space, n=1, fn=np.zeros):\n    return None\n"
  },
  {
    "path": "gym/vector/utils/shared_memory.py",
    "content": "\"\"\"Utility functions for vector environments to share memory between processes.\"\"\"\nimport multiprocessing as mp\nfrom collections import OrderedDict\nfrom ctypes import c_bool\nfrom functools import singledispatch\nfrom typing import Union\n\nimport numpy as np\n\nfrom gym.error import CustomSpaceError\nfrom gym.spaces import Box, Dict, Discrete, MultiBinary, MultiDiscrete, Space, Tuple\n\n__all__ = [\"create_shared_memory\", \"read_from_shared_memory\", \"write_to_shared_memory\"]\n\n\n@singledispatch\ndef create_shared_memory(\n    space: Space, n: int = 1, ctx=mp\n) -> Union[dict, tuple, mp.Array]:\n    \"\"\"Create a shared memory object, to be shared across processes.\n\n    This eventually contains the observations from the vectorized environment.\n\n    Args:\n        space: Observation space of a single environment in the vectorized environment.\n        n: Number of environments in the vectorized environment (i.e. the number of processes).\n        ctx: The multiprocess module\n\n    Returns:\n        shared_memory for the shared object across processes.\n\n    Raises:\n        CustomSpaceError: Space is not a valid :class:`gym.Space` instance\n    \"\"\"\n    raise CustomSpaceError(\n        \"Cannot create a shared memory for space with \"\n        f\"type `{type(space)}`. Shared memory only supports \"\n        \"default Gym spaces (e.g. 
`Box`, `Tuple`, \"\n        \"`Dict`, etc...), and does not support custom \"\n        \"Gym spaces.\"\n    )\n\n\n@create_shared_memory.register(Box)\n@create_shared_memory.register(Discrete)\n@create_shared_memory.register(MultiDiscrete)\n@create_shared_memory.register(MultiBinary)\ndef _create_base_shared_memory(space, n: int = 1, ctx=mp):\n    dtype = space.dtype.char\n    if dtype in \"?\":\n        dtype = c_bool\n    return ctx.Array(dtype, n * int(np.prod(space.shape)))\n\n\n@create_shared_memory.register(Tuple)\ndef _create_tuple_shared_memory(space, n: int = 1, ctx=mp):\n    return tuple(\n        create_shared_memory(subspace, n=n, ctx=ctx) for subspace in space.spaces\n    )\n\n\n@create_shared_memory.register(Dict)\ndef _create_dict_shared_memory(space, n=1, ctx=mp):\n    return OrderedDict(\n        [\n            (key, create_shared_memory(subspace, n=n, ctx=ctx))\n            for (key, subspace) in space.spaces.items()\n        ]\n    )\n\n\n@singledispatch\ndef read_from_shared_memory(\n    space: Space, shared_memory: Union[dict, tuple, mp.Array], n: int = 1\n) -> Union[dict, tuple, np.ndarray]:\n    \"\"\"Read the batch of observations from shared memory as a numpy array.\n\n    Notes:\n        The numpy array objects returned by `read_from_shared_memory` share the\n        memory of `shared_memory`. Any changes to `shared_memory` are forwarded\n        to `observations`, and vice-versa. To avoid side effects, use `np.copy`.\n\n    Args:\n        space: Observation space of a single environment in the vectorized environment.\n        shared_memory: Shared object across processes. This contains the observations from the vectorized environment.\n            This object is created with `create_shared_memory`.\n        n: Number of environments in the vectorized environment (i.e. 
the number of processes).\n\n    Returns:\n        Batch of observations as a (possibly nested) numpy array.\n\n    Raises:\n        CustomSpaceError: Space is not a valid :class:`gym.Space` instance\n    \"\"\"\n    raise CustomSpaceError(\n        \"Cannot read from a shared memory for space with \"\n        f\"type `{type(space)}`. Shared memory only supports \"\n        \"default Gym spaces (e.g. `Box`, `Tuple`, \"\n        \"`Dict`, etc...), and does not support custom \"\n        \"Gym spaces.\"\n    )\n\n\n@read_from_shared_memory.register(Box)\n@read_from_shared_memory.register(Discrete)\n@read_from_shared_memory.register(MultiDiscrete)\n@read_from_shared_memory.register(MultiBinary)\ndef _read_base_from_shared_memory(space, shared_memory, n: int = 1):\n    return np.frombuffer(shared_memory.get_obj(), dtype=space.dtype).reshape(\n        (n,) + space.shape\n    )\n\n\n@read_from_shared_memory.register(Tuple)\ndef _read_tuple_from_shared_memory(space, shared_memory, n: int = 1):\n    return tuple(\n        read_from_shared_memory(subspace, memory, n=n)\n        for (memory, subspace) in zip(shared_memory, space.spaces)\n    )\n\n\n@read_from_shared_memory.register(Dict)\ndef _read_dict_from_shared_memory(space, shared_memory, n: int = 1):\n    return OrderedDict(\n        [\n            (key, read_from_shared_memory(subspace, shared_memory[key], n=n))\n            for (key, subspace) in space.spaces.items()\n        ]\n    )\n\n\n@singledispatch\ndef write_to_shared_memory(\n    space: Space,\n    index: int,\n    value: np.ndarray,\n    shared_memory: Union[dict, tuple, mp.Array],\n):\n    \"\"\"Write the observation of a single environment into shared memory.\n\n    Args:\n        space: Observation space of a single environment in the vectorized environment.\n        index: Index of the environment (must be in `[0, num_envs)`).\n        value: Observation of the single environment to write to shared memory.\n        shared_memory: Shared object across 
processes. This contains the observations from the vectorized environment.\n            This object is created with `create_shared_memory`.\n\n    Raises:\n        CustomSpaceError: Space is not a valid :class:`gym.Space` instance\n    \"\"\"\n    raise CustomSpaceError(\n        \"Cannot write to a shared memory for space with \"\n        f\"type `{type(space)}`. Shared memory only supports \"\n        \"default Gym spaces (e.g. `Box`, `Tuple`, \"\n        \"`Dict`, etc...), and does not support custom \"\n        \"Gym spaces.\"\n    )\n\n\n@write_to_shared_memory.register(Box)\n@write_to_shared_memory.register(Discrete)\n@write_to_shared_memory.register(MultiDiscrete)\n@write_to_shared_memory.register(MultiBinary)\ndef _write_base_to_shared_memory(space, index, value, shared_memory):\n    size = int(np.prod(space.shape))\n    destination = np.frombuffer(shared_memory.get_obj(), dtype=space.dtype)\n    np.copyto(\n        destination[index * size : (index + 1) * size],\n        np.asarray(value, dtype=space.dtype).flatten(),\n    )\n\n\n@write_to_shared_memory.register(Tuple)\ndef _write_tuple_to_shared_memory(space, index, values, shared_memory):\n    for value, memory, subspace in zip(values, shared_memory, space.spaces):\n        write_to_shared_memory(subspace, index, value, memory)\n\n\n@write_to_shared_memory.register(Dict)\ndef _write_dict_to_shared_memory(space, index, values, shared_memory):\n    for key, subspace in space.spaces.items():\n        write_to_shared_memory(subspace, index, values[key], shared_memory[key])\n"
  },
  {
    "path": "gym/vector/utils/spaces.py",
    "content": "\"\"\"Utility functions for gym spaces: batch space and iterator.\"\"\"\nfrom collections import OrderedDict\nfrom copy import deepcopy\nfrom functools import singledispatch\nfrom typing import Iterator\n\nimport numpy as np\n\nfrom gym.error import CustomSpaceError\nfrom gym.spaces import Box, Dict, Discrete, MultiBinary, MultiDiscrete, Space, Tuple\n\nBaseGymSpaces = (Box, Discrete, MultiDiscrete, MultiBinary)\n_BaseGymSpaces = BaseGymSpaces\n__all__ = [\"BaseGymSpaces\", \"_BaseGymSpaces\", \"batch_space\", \"iterate\"]\n\n\n@singledispatch\ndef batch_space(space: Space, n: int = 1) -> Space:\n    \"\"\"Create a (batched) space, containing multiple copies of a single space.\n\n    Example::\n\n        >>> from gym.spaces import Box, Dict\n        >>> space = Dict({\n        ...     'position': Box(low=0, high=1, shape=(3,), dtype=np.float32),\n        ...     'velocity': Box(low=0, high=1, shape=(2,), dtype=np.float32)\n        ... })\n        >>> batch_space(space, n=5)\n        Dict(position:Box(5, 3), velocity:Box(5, 2))\n\n    Args:\n        space: Space (e.g. the observation space) for a single environment in the vectorized environment.\n        n: Number of environments in the vectorized environment.\n\n    Returns:\n        Space (e.g. the observation space) for a batch of environments in the vectorized environment.\n\n    Raises:\n        ValueError: Cannot batch space that is not a valid :class:`gym.Space` instance\n    \"\"\"\n    raise ValueError(\n        f\"Cannot batch space with type `{type(space)}`. 
The space must be a valid `gym.Space` instance.\"\n    )\n\n\n@batch_space.register(Box)\ndef _batch_space_box(space, n=1):\n    repeats = tuple([n] + [1] * space.low.ndim)\n    low, high = np.tile(space.low, repeats), np.tile(space.high, repeats)\n    return Box(low=low, high=high, dtype=space.dtype, seed=deepcopy(space.np_random))\n\n\n@batch_space.register(Discrete)\ndef _batch_space_discrete(space, n=1):\n    if space.start == 0:\n        return MultiDiscrete(\n            np.full((n,), space.n, dtype=space.dtype),\n            dtype=space.dtype,\n            seed=deepcopy(space.np_random),\n        )\n    else:\n        return Box(\n            low=space.start,\n            high=space.start + space.n - 1,\n            shape=(n,),\n            dtype=space.dtype,\n            seed=deepcopy(space.np_random),\n        )\n\n\n@batch_space.register(MultiDiscrete)\ndef _batch_space_multidiscrete(space, n=1):\n    repeats = tuple([n] + [1] * space.nvec.ndim)\n    high = np.tile(space.nvec, repeats) - 1\n    return Box(\n        low=np.zeros_like(high),\n        high=high,\n        dtype=space.dtype,\n        seed=deepcopy(space.np_random),\n    )\n\n\n@batch_space.register(MultiBinary)\ndef _batch_space_multibinary(space, n=1):\n    return Box(\n        low=0,\n        high=1,\n        shape=(n,) + space.shape,\n        dtype=space.dtype,\n        seed=deepcopy(space.np_random),\n    )\n\n\n@batch_space.register(Tuple)\ndef _batch_space_tuple(space, n=1):\n    return Tuple(\n        tuple(batch_space(subspace, n=n) for subspace in space.spaces),\n        seed=deepcopy(space.np_random),\n    )\n\n\n@batch_space.register(Dict)\ndef _batch_space_dict(space, n=1):\n    return Dict(\n        OrderedDict(\n            [\n                (key, batch_space(subspace, n=n))\n                for (key, subspace) in space.spaces.items()\n            ]\n        ),\n        seed=deepcopy(space.np_random),\n    )\n\n\n@batch_space.register(Space)\ndef _batch_space_custom(space, 
n=1):\n    # Without deepcopy, then the space.np_random is batched_space.spaces[0].np_random\n    # Which is an issue if you are sampling actions of both the original space and the batched space\n    batched_space = Tuple(\n        tuple(deepcopy(space) for _ in range(n)), seed=deepcopy(space.np_random)\n    )\n    new_seeds = list(map(int, batched_space.np_random.integers(0, 1e8, n)))\n    batched_space.seed(new_seeds)\n    return batched_space\n\n\n@singledispatch\ndef iterate(space: Space, items) -> Iterator:\n    \"\"\"Iterate over the elements of a (batched) space.\n\n    Example::\n\n        >>> from gym.spaces import Box, Dict\n        >>> space = Dict({\n        ... 'position': Box(low=0, high=1, shape=(2, 3), dtype=np.float32),\n        ... 'velocity': Box(low=0, high=1, shape=(2, 2), dtype=np.float32)})\n        >>> items = space.sample()\n        >>> it = iterate(space, items)\n        >>> next(it)\n        {'position': array([-0.99644893, -0.08304597, -0.7238421 ], dtype=float32),\n        'velocity': array([0.35848552, 0.1533453 ], dtype=float32)}\n        >>> next(it)\n        {'position': array([-0.67958736, -0.49076623,  0.38661423], dtype=float32),\n        'velocity': array([0.7975036 , 0.93317133], dtype=float32)}\n        >>> next(it)\n        StopIteration\n\n    Args:\n        space: Space to which `items` belong to.\n        items: Items to be iterated over.\n\n    Returns:\n        Iterator over the elements in `items`.\n\n    Raises:\n        ValueError: Space is not an instance of :class:`gym.Space`\n    \"\"\"\n    raise ValueError(\n        f\"Space of type `{type(space)}` is not a valid `gym.Space` instance.\"\n    )\n\n\n@iterate.register(Discrete)\ndef _iterate_discrete(space, items):\n    raise TypeError(\"Unable to iterate over a space of type `Discrete`.\")\n\n\n@iterate.register(Box)\n@iterate.register(MultiDiscrete)\n@iterate.register(MultiBinary)\ndef _iterate_base(space, items):\n    try:\n        return iter(items)\n    except 
TypeError:\n        raise TypeError(f\"Unable to iterate over the following elements: {items}\")\n\n\n@iterate.register(Tuple)\ndef _iterate_tuple(space, items):\n    # If this is a tuple of custom subspaces only, then simply iterate over items\n    if all(\n        isinstance(subspace, Space)\n        and (not isinstance(subspace, BaseGymSpaces + (Tuple, Dict)))\n        for subspace in space.spaces\n    ):\n        return iter(items)\n\n    return zip(\n        *[iterate(subspace, items[i]) for i, subspace in enumerate(space.spaces)]\n    )\n\n\n@iterate.register(Dict)\ndef _iterate_dict(space, items):\n    keys, values = zip(\n        *[\n            (key, iterate(subspace, items[key]))\n            for key, subspace in space.spaces.items()\n        ]\n    )\n    for item in zip(*values):\n        yield OrderedDict([(key, value) for (key, value) in zip(keys, item)])\n\n\n@iterate.register(Space)\ndef _iterate_custom(space, items):\n    raise CustomSpaceError(\n        f\"Unable to iterate over {items}, since {space} \"\n        \"is a custom `gym.Space` instance (i.e. not one of \"\n        \"`Box`, `Dict`, etc...).\"\n    )\n"
  },
  {
    "path": "gym/vector/vector_env.py",
    "content": "\"\"\"Base class for vectorized environments.\"\"\"\nfrom typing import Any, List, Optional, Tuple, Union\n\nimport numpy as np\n\nimport gym\nfrom gym.vector.utils.spaces import batch_space\n\n__all__ = [\"VectorEnv\"]\n\n\nclass VectorEnv(gym.Env):\n    \"\"\"Base class for vectorized environments. Runs multiple independent copies of the same environment in parallel.\n\n    This is not the same as 1 environment that has multiple subcomponents, but it is many copies of the same base env.\n\n    Each observation returned from vectorized environment is a batch of observations for each parallel environment.\n    And :meth:`step` is also expected to receive a batch of actions for each parallel environment.\n\n    Notes:\n        All parallel environments should share the identical observation and action spaces.\n        In other words, a vector of multiple different environments is not supported.\n    \"\"\"\n\n    def __init__(\n        self,\n        num_envs: int,\n        observation_space: gym.Space,\n        action_space: gym.Space,\n    ):\n        \"\"\"Base class for vectorized environments.\n\n        Args:\n            num_envs: Number of environments in the vectorized environment.\n            observation_space: Observation space of a single environment.\n            action_space: Action space of a single environment.\n        \"\"\"\n        self.num_envs = num_envs\n        self.is_vector_env = True\n        self.observation_space = batch_space(observation_space, n=num_envs)\n        self.action_space = batch_space(action_space, n=num_envs)\n\n        self.closed = False\n        self.viewer = None\n\n        # The observation and action spaces of a single environment are\n        # kept in separate properties\n        self.single_observation_space = observation_space\n        self.single_action_space = action_space\n\n    def reset_async(\n        self,\n        seed: Optional[Union[int, List[int]]] = None,\n        options: 
Optional[dict] = None,\n    ):\n        \"\"\"Reset the sub-environments asynchronously.\n\n        This method will return ``None``. A call to :meth:`reset_async` should be followed\n        by a call to :meth:`reset_wait` to retrieve the results.\n\n        Args:\n            seed: The reset seed\n            options: Reset options\n        \"\"\"\n        pass\n\n    def reset_wait(\n        self,\n        seed: Optional[Union[int, List[int]]] = None,\n        options: Optional[dict] = None,\n    ):\n        \"\"\"Retrieves the results of a :meth:`reset_async` call.\n\n        A call to this method must always be preceded by a call to :meth:`reset_async`.\n\n        Args:\n            seed: The reset seed\n            options: Reset options\n\n        Returns:\n            The results from :meth:`reset_async`\n\n        Raises:\n            NotImplementedError: VectorEnv does not implement function\n        \"\"\"\n        raise NotImplementedError(\"VectorEnv does not implement function\")\n\n    def reset(\n        self,\n        *,\n        seed: Optional[Union[int, List[int]]] = None,\n        options: Optional[dict] = None,\n    ):\n        \"\"\"Reset all parallel environments and return a batch of initial observations.\n\n        Args:\n            seed: The environment reset seeds\n            options: Reset options\n\n        Returns:\n            A batch of observations and reset information from the vectorized environment.\n        \"\"\"\n        self.reset_async(seed=seed, options=options)\n        return self.reset_wait(seed=seed, options=options)\n\n    def step_async(self, actions):\n        \"\"\"Asynchronously performs steps in the sub-environments.\n\n        The results can be retrieved via a call to :meth:`step_wait`.\n\n        Args:\n            actions: The actions to take asynchronously\n        \"\"\"\n\n    def step_wait(self, **kwargs):\n        \"\"\"Retrieves the results of a :meth:`step_async` call.\n\n        A call to this method must 
always be preceded by a call to :meth:`step_async`.\n\n        Args:\n            **kwargs: Additional keywords for vector implementation\n\n        Returns:\n            The results from the :meth:`step_async` call\n        \"\"\"\n\n    def step(self, actions):\n        \"\"\"Take an action for each parallel environment.\n\n        Args:\n            actions: Batch of actions, an element of :attr:`action_space`.\n\n        Returns:\n            Batch of (observations, rewards, terminated, truncated, infos) or (observations, rewards, dones, infos)\n        \"\"\"\n        self.step_async(actions)\n        return self.step_wait()\n\n    def call_async(self, name, *args, **kwargs):\n        \"\"\"Calls the method ``name`` on each parallel environment asynchronously.\"\"\"\n\n    def call_wait(self, **kwargs) -> List[Any]:  # type: ignore\n        \"\"\"After calling a method in :meth:`call_async`, this function collects the results.\"\"\"\n\n    def call(self, name: str, *args, **kwargs) -> List[Any]:\n        \"\"\"Call a method, or get a property, from each parallel environment.\n\n        Args:\n            name (str): Name of the method or property to call.\n            *args: Arguments to apply to the method call.\n            **kwargs: Keyword arguments to apply to the method call.\n\n        Returns:\n            List of the results of the individual calls to the method or property for each environment.\n        \"\"\"\n        self.call_async(name, *args, **kwargs)\n        return self.call_wait()\n\n    def get_attr(self, name: str):\n        \"\"\"Get a property from each parallel environment.\n\n        Args:\n            name (str): Name of the property to get from each individual environment.\n\n        Returns:\n            The value of the property ``name`` from each environment\n        \"\"\"\n        return self.call(name)\n\n    def set_attr(self, name: str, values: Union[list, tuple, object]):\n        \"\"\"Set a property in each sub-environment.\n\n        Args:\n            name 
(str): Name of the property to be set in each individual environment.\n            values (list, tuple, or object): Values to set the property to. If `values` is a list or\n                tuple, then it corresponds to the values for each individual environment, otherwise a single value\n                is set for all environments.\n        \"\"\"\n\n    def close_extras(self, **kwargs):\n        \"\"\"Clean up extra resources, e.g. beyond what's in this base class.\"\"\"\n        pass\n\n    def close(self, **kwargs):\n        \"\"\"Close all parallel environments and release resources.\n\n        It also closes all the existing image viewers, then calls :meth:`close_extras` and sets\n        :attr:`closed` to ``True``.\n\n        Warnings:\n            This function itself does not close the environments; that should be handled\n            in :meth:`close_extras`. This is generic for both synchronous and asynchronous\n            vectorized environments.\n\n        Notes:\n            This will be automatically called when the object is garbage collected or the program exits.\n\n        Args:\n            **kwargs: Keyword arguments passed to :meth:`close_extras`\n        \"\"\"\n        if self.closed:\n            return\n        if self.viewer is not None:\n            self.viewer.close()\n        self.close_extras(**kwargs)\n        self.closed = True\n\n    def _add_info(self, infos: dict, info: dict, env_num: int) -> dict:\n        \"\"\"Add env info to the info dictionary of the vectorized environment.\n\n        Given the `info` of a single environment add it to the `infos` dictionary\n        which represents all the infos of the vectorized environment.\n        Every `key` of `info` is paired with a boolean mask `_key` representing\n        whether or not the i-indexed environment has this `info`.\n\n        Args:\n            infos (dict): the infos of the vectorized environment\n            info (dict): the info coming from the single environment\n            
env_num (int): the index of the single environment\n\n        Returns:\n            infos (dict): the (updated) infos of the vectorized environment\n\n        \"\"\"\n        for k in info.keys():\n            if k not in infos:\n                info_array, array_mask = self._init_info_arrays(type(info[k]))\n            else:\n                info_array, array_mask = infos[k], infos[f\"_{k}\"]\n\n            info_array[env_num], array_mask[env_num] = info[k], True\n            infos[k], infos[f\"_{k}\"] = info_array, array_mask\n        return infos\n\n    def _init_info_arrays(self, dtype: type) -> Tuple[np.ndarray, np.ndarray]:\n        \"\"\"Initialize the info array.\n\n        Initialize the info array. If the dtype is numeric\n        the info array will have the same dtype, otherwise\n        will be an array of `None`. Also, a boolean array\n        of the same length is returned. It will be used for\n        assessing which environment has info data.\n\n        Args:\n            dtype (type): data type of the info coming from the env.\n\n        Returns:\n            array (np.ndarray): the initialized info array.\n            array_mask (np.ndarray): the initialized boolean array.\n\n        \"\"\"\n        if dtype in [int, float, bool] or issubclass(dtype, np.number):\n            array = np.zeros(self.num_envs, dtype=dtype)\n        else:\n            array = np.zeros(self.num_envs, dtype=object)\n            array[:] = None\n        array_mask = np.zeros(self.num_envs, dtype=bool)\n        return array, array_mask\n\n    def __del__(self):\n        \"\"\"Closes the vector environment.\"\"\"\n        if not getattr(self, \"closed\", True):\n            self.close()\n\n    def __repr__(self) -> str:\n        \"\"\"Returns a string representation of the vector environment.\n\n        Returns:\n            A string containing the class name, number of environments and environment spec id\n        \"\"\"\n        if self.spec is None:\n            return 
f\"{self.__class__.__name__}({self.num_envs})\"\n        else:\n            return f\"{self.__class__.__name__}({self.spec.id}, {self.num_envs})\"\n\n\nclass VectorEnvWrapper(VectorEnv):\n    \"\"\"Wraps the vectorized environment to allow a modular transformation.\n\n    This class is the base class for all wrappers for vectorized environments. The subclass\n    could override some methods to change the behavior of the original vectorized environment\n    without touching the original code.\n\n    Notes:\n        Don't forget to call ``super().__init__(env)`` if the subclass overrides :meth:`__init__`.\n    \"\"\"\n\n    def __init__(self, env: VectorEnv):\n        assert isinstance(env, VectorEnv)\n        self.env = env\n\n    # explicitly forward the methods defined in VectorEnv\n    # to self.env (instead of the base class)\n    def reset_async(self, **kwargs):\n        return self.env.reset_async(**kwargs)\n\n    def reset_wait(self, **kwargs):\n        return self.env.reset_wait(**kwargs)\n\n    def step_async(self, actions):\n        return self.env.step_async(actions)\n\n    def step_wait(self):\n        return self.env.step_wait()\n\n    def close(self, **kwargs):\n        return self.env.close(**kwargs)\n\n    def close_extras(self, **kwargs):\n        return self.env.close_extras(**kwargs)\n\n    def call(self, name, *args, **kwargs):\n        return self.env.call(name, *args, **kwargs)\n\n    def set_attr(self, name, values):\n        return self.env.set_attr(name, values)\n\n    # implicitly forward all other methods and attributes to self.env\n    def __getattr__(self, name):\n        if name.startswith(\"_\"):\n            raise AttributeError(f\"attempted to get missing private attribute '{name}'\")\n        return getattr(self.env, name)\n\n    @property\n    def unwrapped(self):\n        return self.env.unwrapped\n\n    def __repr__(self):\n        return f\"<{self.__class__.__name__}, {self.env}>\"\n\n    def __del__(self):\n        
self.env.__del__()\n"
  },
  {
    "path": "gym/version.py",
    "content": "VERSION = \"0.26.2\"\n"
  },
  {
    "path": "gym/wrappers/README.md",
    "content": "# Wrappers\n\nWrappers are used to transform an environment in a modular way:\n\n```python\nenv = gym.make('Pong-v0')\nenv = MyWrapper(env)\n```\n\nNote that we may later restructure any of the files in this directory,\nbut will keep the wrappers available at the wrappers' top-level\nfolder. So for example, you should access `MyWrapper` as follows:\n\n```python\nfrom gym.wrappers import MyWrapper\n```\n\n## Quick tips for writing your own wrapper\n\n- Don't forget to call `super(class_name, self).__init__(env)` if you override the wrapper's `__init__` function\n- You can access the inner environment with `self.unwrapped`\n- You can access the previous layer using `self.env`\n- The variables `metadata`, `action_space`, `observation_space`, `reward_range`, and `spec` are copied to `self` from the previous layer\n- Create a wrapped function for at least one of the following: `__init__(self, env)`, `step`, `reset`, `render`, `close`, or `seed`\n- Your layered function should take its input from the previous layer (`self.env`) and/or the inner layer (`self.unwrapped`)\n"
  },
  {
    "path": "gym/wrappers/__init__.py",
    "content": "\"\"\"Module of wrapper classes.\"\"\"\nfrom gym import error\nfrom gym.wrappers.atari_preprocessing import AtariPreprocessing\nfrom gym.wrappers.autoreset import AutoResetWrapper\nfrom gym.wrappers.clip_action import ClipAction\nfrom gym.wrappers.filter_observation import FilterObservation\nfrom gym.wrappers.flatten_observation import FlattenObservation\nfrom gym.wrappers.frame_stack import FrameStack, LazyFrames\nfrom gym.wrappers.gray_scale_observation import GrayScaleObservation\nfrom gym.wrappers.human_rendering import HumanRendering\nfrom gym.wrappers.normalize import NormalizeObservation, NormalizeReward\nfrom gym.wrappers.order_enforcing import OrderEnforcing\nfrom gym.wrappers.record_episode_statistics import RecordEpisodeStatistics\nfrom gym.wrappers.record_video import RecordVideo, capped_cubic_video_schedule\nfrom gym.wrappers.render_collection import RenderCollection\nfrom gym.wrappers.rescale_action import RescaleAction\nfrom gym.wrappers.resize_observation import ResizeObservation\nfrom gym.wrappers.step_api_compatibility import StepAPICompatibility\nfrom gym.wrappers.time_aware_observation import TimeAwareObservation\nfrom gym.wrappers.time_limit import TimeLimit\nfrom gym.wrappers.transform_observation import TransformObservation\nfrom gym.wrappers.transform_reward import TransformReward\nfrom gym.wrappers.vector_list_info import VectorListInfo\n"
  },
  {
    "path": "gym/wrappers/atari_preprocessing.py",
    "content": "\"\"\"Implementation of Atari 2600 Preprocessing following the guidelines of Machado et al., 2018.\"\"\"\nimport numpy as np\n\nimport gym\nfrom gym.spaces import Box\n\ntry:\n    import cv2\nexcept ImportError:\n    cv2 = None\n\n\nclass AtariPreprocessing(gym.Wrapper):\n    \"\"\"Atari 2600 preprocessing wrapper.\n\n    This class follows the guidelines in Machado et al. (2018),\n    \"Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents\".\n\n    Specifically, the following preprocess stages applies to the atari environment:\n    - Noop Reset: Obtains the initial state by taking a random number of no-ops on reset, default max 30 no-ops.\n    - Frame skipping: The number of frames skipped between steps, 4 by default\n    - Max-pooling: Pools over the most recent two observations from the frame skips\n    - Termination signal when a life is lost: When the agent losses a life during the environment, then the environment is terminated.\n        Turned off by default. Not recommended by Machado et al. 
(2018).\n    - Resize to a square image: Resizes the Atari environment's original observation shape from 210x160 to 84x84 by default\n    - Grayscale observation: Whether the observation is returned in colour or greyscale; greyscale by default.\n    - Scale observation: Whether to scale the observation to [0, 1) or keep it in [0, 255); not scaled by default.\n    \"\"\"\n\n    def __init__(\n        self,\n        env: gym.Env,\n        noop_max: int = 30,\n        frame_skip: int = 4,\n        screen_size: int = 84,\n        terminal_on_life_loss: bool = False,\n        grayscale_obs: bool = True,\n        grayscale_newaxis: bool = False,\n        scale_obs: bool = False,\n    ):\n        \"\"\"Wrapper for Atari 2600 preprocessing.\n\n        Args:\n            env (Env): The environment to apply the preprocessing\n            noop_max (int): For No-op reset, the maximum number of no-op actions taken at reset; set to 0 to turn off.\n            frame_skip (int): The number of frames between new observations, which affects the frequency at which the agent experiences the game.\n            screen_size (int): resize Atari frame\n            terminal_on_life_loss (bool): `if True`, then :meth:`step()` returns `terminated=True` whenever a\n                life is lost.\n            grayscale_obs (bool): if True, then gray scale observation is returned, otherwise, RGB observation\n                is returned.\n            grayscale_newaxis (bool): `if True and grayscale_obs=True`, then a channel axis is added to\n                grayscale observations to make them 3-dimensional.\n            scale_obs (bool): if True, then observation normalized in range [0,1) is returned. 
It also limits memory\n                optimization benefits of FrameStack Wrapper.\n\n        Raises:\n            DependencyNotInstalled: opencv-python package not installed\n            ValueError: Disable frame-skipping in the original env\n        \"\"\"\n        super().__init__(env)\n        if cv2 is None:\n            raise gym.error.DependencyNotInstalled(\n                \"opencv-python package not installed, run `pip install gym[other]` to get dependencies for atari\"\n            )\n        assert frame_skip > 0\n        assert screen_size > 0\n        assert noop_max >= 0\n        if frame_skip > 1:\n            if (\n                \"NoFrameskip\" not in env.spec.id\n                and getattr(env.unwrapped, \"_frameskip\", None) != 1\n            ):\n                raise ValueError(\n                    \"Disable frame-skipping in the original env. Otherwise, more than one \"\n                    \"frame-skip will happen as through this wrapper\"\n                )\n        self.noop_max = noop_max\n        assert env.unwrapped.get_action_meanings()[0] == \"NOOP\"\n\n        self.frame_skip = frame_skip\n        self.screen_size = screen_size\n        self.terminal_on_life_loss = terminal_on_life_loss\n        self.grayscale_obs = grayscale_obs\n        self.grayscale_newaxis = grayscale_newaxis\n        self.scale_obs = scale_obs\n\n        # buffer of most recent two observations for max pooling\n        assert isinstance(env.observation_space, Box)\n        if grayscale_obs:\n            self.obs_buffer = [\n                np.empty(env.observation_space.shape[:2], dtype=np.uint8),\n                np.empty(env.observation_space.shape[:2], dtype=np.uint8),\n            ]\n        else:\n            self.obs_buffer = [\n                np.empty(env.observation_space.shape, dtype=np.uint8),\n                np.empty(env.observation_space.shape, dtype=np.uint8),\n            ]\n\n        self.lives = 0\n        self.game_over = False\n\n        
_low, _high, _obs_dtype = (\n            (0, 255, np.uint8) if not scale_obs else (0, 1, np.float32)\n        )\n        _shape = (screen_size, screen_size, 1 if grayscale_obs else 3)\n        if grayscale_obs and not grayscale_newaxis:\n            _shape = _shape[:-1]  # Remove channel axis\n        self.observation_space = Box(\n            low=_low, high=_high, shape=_shape, dtype=_obs_dtype\n        )\n\n    @property\n    def ale(self):\n        \"\"\"Make ale as a class property to avoid serialization error.\"\"\"\n        return self.env.unwrapped.ale\n\n    def step(self, action):\n        \"\"\"Applies the preprocessing for an :meth:`env.step`.\"\"\"\n        total_reward, terminated, truncated, info = 0.0, False, False, {}\n\n        for t in range(self.frame_skip):\n            _, reward, terminated, truncated, info = self.env.step(action)\n            total_reward += reward\n            self.game_over = terminated\n\n            if self.terminal_on_life_loss:\n                new_lives = self.ale.lives()\n                terminated = terminated or new_lives < self.lives\n                self.game_over = terminated\n                self.lives = new_lives\n\n            if terminated or truncated:\n                break\n            if t == self.frame_skip - 2:\n                if self.grayscale_obs:\n                    self.ale.getScreenGrayscale(self.obs_buffer[1])\n                else:\n                    self.ale.getScreenRGB(self.obs_buffer[1])\n            elif t == self.frame_skip - 1:\n                if self.grayscale_obs:\n                    self.ale.getScreenGrayscale(self.obs_buffer[0])\n                else:\n                    self.ale.getScreenRGB(self.obs_buffer[0])\n        return self._get_obs(), total_reward, terminated, truncated, info\n\n    def reset(self, **kwargs):\n        \"\"\"Resets the environment using preprocessing.\"\"\"\n        # NoopReset\n        _, reset_info = self.env.reset(**kwargs)\n\n        noops = (\n      
      self.env.unwrapped.np_random.integers(1, self.noop_max + 1)\n            if self.noop_max > 0\n            else 0\n        )\n        for _ in range(noops):\n            _, _, terminated, truncated, step_info = self.env.step(0)\n            reset_info.update(step_info)\n            if terminated or truncated:\n                _, reset_info = self.env.reset(**kwargs)\n\n        self.lives = self.ale.lives()\n        if self.grayscale_obs:\n            self.ale.getScreenGrayscale(self.obs_buffer[0])\n        else:\n            self.ale.getScreenRGB(self.obs_buffer[0])\n        self.obs_buffer[1].fill(0)\n\n        return self._get_obs(), reset_info\n\n    def _get_obs(self):\n        if self.frame_skip > 1:  # more efficient in-place pooling\n            np.maximum(self.obs_buffer[0], self.obs_buffer[1], out=self.obs_buffer[0])\n        assert cv2 is not None\n        obs = cv2.resize(\n            self.obs_buffer[0],\n            (self.screen_size, self.screen_size),\n            interpolation=cv2.INTER_AREA,\n        )\n\n        if self.scale_obs:\n            obs = np.asarray(obs, dtype=np.float32) / 255.0\n        else:\n            obs = np.asarray(obs, dtype=np.uint8)\n\n        if self.grayscale_obs and self.grayscale_newaxis:\n            obs = np.expand_dims(obs, axis=-1)  # Add a channel axis\n        return obs\n"
  },
  {
    "path": "gym/wrappers/autoreset.py",
    "content": "\"\"\"Wrapper that autoreset environments when `terminated=True` or `truncated=True`.\"\"\"\nimport gym\n\n\nclass AutoResetWrapper(gym.Wrapper):\n    \"\"\"A class for providing an automatic reset functionality for gym environments when calling :meth:`self.step`.\n\n    When calling step causes :meth:`Env.step` to return `terminated=True` or `truncated=True`, :meth:`Env.reset` is called,\n    and the return format of :meth:`self.step` is as follows: ``(new_obs, final_reward, final_terminated, final_truncated, info)``\n    with new step API and ``(new_obs, final_reward, final_done, info)`` with the old step API.\n     - ``new_obs`` is the first observation after calling :meth:`self.env.reset`\n     - ``final_reward`` is the reward after calling :meth:`self.env.step`, prior to calling :meth:`self.env.reset`.\n     - ``final_terminated`` is the terminated value before calling :meth:`self.env.reset`.\n     - ``final_truncated`` is the truncated value before calling :meth:`self.env.reset`. 
`final_terminated` and `final_truncated` cannot both be False.\n     - ``info`` is a dict containing all the keys from the info dict returned by the call to :meth:`self.env.reset`,\n       with an additional key \"final_observation\" containing the observation returned by the last call to :meth:`self.env.step`\n       and \"final_info\" containing the info dict returned by the last call to :meth:`self.env.step`.\n\n    Warning: When using this wrapper to collect rollouts, note that when :meth:`Env.step` returns `terminated` or `truncated`, a\n        new observation from after calling :meth:`Env.reset` is returned by :meth:`Env.step` alongside the\n        final reward, terminated and truncated state from the previous episode.\n        If you need the final state from the previous episode, you need to retrieve it via the\n        \"final_observation\" key in the info dict.\n        Make sure you know what you're doing if you use this wrapper!\n    \"\"\"\n\n    def __init__(self, env: gym.Env):\n        \"\"\"A class for providing an automatic reset functionality for gym environments when calling :meth:`self.step`.\n\n        Args:\n            env (gym.Env): The environment to apply the wrapper\n        \"\"\"\n        super().__init__(env)\n\n    def step(self, action):\n        \"\"\"Steps through the environment with action and resets the environment if a terminated or truncated signal is encountered.\n\n        Args:\n            action: The action to take\n\n        Returns:\n            The autoreset environment :meth:`step`\n        \"\"\"\n        obs, reward, terminated, truncated, info = self.env.step(action)\n        if terminated or truncated:\n\n            new_obs, new_info = self.env.reset()\n            assert (\n                \"final_observation\" not in new_info\n            ), 'info dict cannot contain key \"final_observation\" '\n            assert (\n                \"final_info\" not in new_info\n            ), 'info dict cannot contain key 
\"final_info\" '\n\n            new_info[\"final_observation\"] = obs\n            new_info[\"final_info\"] = info\n\n            obs = new_obs\n            info = new_info\n\n        return obs, reward, terminated, truncated, info\n"
  },
  {
    "path": "gym/wrappers/clip_action.py",
    "content": "\"\"\"Wrapper for clipping actions within a valid bound.\"\"\"\nimport numpy as np\n\nimport gym\nfrom gym import ActionWrapper\nfrom gym.spaces import Box\n\n\nclass ClipAction(ActionWrapper):\n    \"\"\"Clip the continuous action within the valid :class:`Box` observation space bound.\n\n    Example:\n        >>> import gym\n        >>> env = gym.make('Bipedal-Walker-v3')\n        >>> env = ClipAction(env)\n        >>> env.action_space\n        Box(-1.0, 1.0, (4,), float32)\n        >>> env.step(np.array([5.0, 2.0, -10.0, 0.0]))\n        # Executes the action np.array([1.0, 1.0, -1.0, 0]) in the base environment\n    \"\"\"\n\n    def __init__(self, env: gym.Env):\n        \"\"\"A wrapper for clipping continuous actions within the valid bound.\n\n        Args:\n            env: The environment to apply the wrapper\n        \"\"\"\n        assert isinstance(env.action_space, Box)\n        super().__init__(env)\n\n    def action(self, action):\n        \"\"\"Clips the action within the valid bounds.\n\n        Args:\n            action: The action to clip\n\n        Returns:\n            The clipped action\n        \"\"\"\n        return np.clip(action, self.action_space.low, self.action_space.high)\n"
  },
  {
    "path": "gym/wrappers/compatibility.py",
    "content": "\"\"\"A compatibility wrapper converting an old-style environment into a valid environment.\"\"\"\nimport sys\nfrom typing import Any, Dict, Optional, Tuple\n\nimport gym\nfrom gym.core import ObsType\nfrom gym.utils.step_api_compatibility import convert_to_terminated_truncated_step_api\n\nif sys.version_info >= (3, 8):\n    from typing import Protocol, runtime_checkable\nelif sys.version_info >= (3, 7):\n    from typing_extensions import Protocol, runtime_checkable\nelse:\n    Protocol = object\n    runtime_checkable = lambda x: x  # noqa: E731\n\n\n@runtime_checkable\nclass LegacyEnv(Protocol):\n    \"\"\"A protocol for environments using the old step API.\"\"\"\n\n    observation_space: gym.Space\n    action_space: gym.Space\n\n    def reset(self) -> Any:\n        \"\"\"Reset the environment and return the initial observation.\"\"\"\n        ...\n\n    def step(self, action: Any) -> Tuple[Any, float, bool, Dict]:\n        \"\"\"Run one timestep of the environment's dynamics.\"\"\"\n        ...\n\n    def render(self, mode: Optional[str] = \"human\") -> Any:\n        \"\"\"Render the environment.\"\"\"\n        ...\n\n    def close(self):\n        \"\"\"Close the environment.\"\"\"\n        ...\n\n    def seed(self, seed: Optional[int] = None):\n        \"\"\"Set the seed for this env's random number generator(s).\"\"\"\n        ...\n\n\nclass EnvCompatibility(gym.Env):\n    r\"\"\"A wrapper which can transform an environment from the old API to the new API.\n\n    Old step API refers to step() method returning (observation, reward, done, info), and reset() only retuning the observation.\n    New step API refers to step() method returning (observation, reward, terminated, truncated, info) and reset() returning (observation, info).\n    (Refer to docs for details on the API change)\n\n    Known limitations:\n    - Environments that use `self.np_random` might not work as expected.\n    \"\"\"\n\n    def __init__(self, old_env: LegacyEnv, 
render_mode: Optional[str] = None):\n        \"\"\"A wrapper which converts old-style envs to valid modern envs.\n\n        Some information may be lost in the conversion, so we recommend updating your environment.\n\n        Args:\n            old_env (LegacyEnv): the env to wrap, implemented with the old API\n            render_mode (str): the render mode to use when rendering the environment, passed automatically to env.render\n        \"\"\"\n        self.metadata = getattr(old_env, \"metadata\", {\"render_modes\": []})\n        self.render_mode = render_mode\n        self.reward_range = getattr(old_env, \"reward_range\", None)\n        self.spec = getattr(old_env, \"spec\", None)\n        self.env = old_env\n\n        self.observation_space = old_env.observation_space\n        self.action_space = old_env.action_space\n\n    def reset(\n        self, seed: Optional[int] = None, options: Optional[dict] = None\n    ) -> Tuple[ObsType, dict]:\n        \"\"\"Resets the environment.\n\n        Args:\n            seed: the seed to reset the environment with\n            options: the options to reset the environment with\n\n        Returns:\n            (observation, info)\n        \"\"\"\n        if seed is not None:\n            self.env.seed(seed)\n        # Options are ignored\n\n        if self.render_mode == \"human\":\n            self.render()\n\n        return self.env.reset(), {}\n\n    def step(self, action: Any) -> Tuple[Any, float, bool, bool, Dict]:\n        \"\"\"Steps through the environment.\n\n        Args:\n            action: action to step through the environment with\n\n        Returns:\n            (observation, reward, terminated, truncated, info)\n        \"\"\"\n        obs, reward, done, info = self.env.step(action)\n\n        if self.render_mode == \"human\":\n            self.render()\n\n        return convert_to_terminated_truncated_step_api((obs, reward, done, info))\n\n    def render(self) -> Any:\n        \"\"\"Renders the 
environment.\n\n        Returns:\n            The rendering of the environment, depending on the render mode\n        \"\"\"\n        return self.env.render(mode=self.render_mode)\n\n    def close(self):\n        \"\"\"Closes the environment.\"\"\"\n        self.env.close()\n\n    def __str__(self):\n        \"\"\"Returns the wrapper name and the unwrapped environment string.\"\"\"\n        return f\"<{type(self).__name__}{self.env}>\"\n\n    def __repr__(self):\n        \"\"\"Returns the string representation of the wrapper.\"\"\"\n        return str(self)\n"
  },
  {
    "path": "gym/wrappers/env_checker.py",
    "content": "\"\"\"A passive environment checker wrapper for an environment's observation and action space along with the reset, step and render functions.\"\"\"\nimport gym\nfrom gym.core import ActType\nfrom gym.utils.passive_env_checker import (\n    check_action_space,\n    check_observation_space,\n    env_render_passive_checker,\n    env_reset_passive_checker,\n    env_step_passive_checker,\n)\n\n\nclass PassiveEnvChecker(gym.Wrapper):\n    \"\"\"A passive environment checker wrapper that surrounds the step, reset and render functions to check they follow the gym API.\"\"\"\n\n    def __init__(self, env):\n        \"\"\"Initialises the wrapper with the environments, run the observation and action space tests.\"\"\"\n        super().__init__(env)\n\n        assert hasattr(\n            env, \"action_space\"\n        ), \"The environment must specify an action space. https://www.gymlibrary.dev/content/environment_creation/\"\n        check_action_space(env.action_space)\n        assert hasattr(\n            env, \"observation_space\"\n        ), \"The environment must specify an observation space. 
https://www.gymlibrary.dev/content/environment_creation/\"\n        check_observation_space(env.observation_space)\n\n        self.checked_reset = False\n        self.checked_step = False\n        self.checked_render = False\n\n    def step(self, action: ActType):\n        \"\"\"Steps through the environment that on the first call will run the `passive_env_step_check`.\"\"\"\n        if self.checked_step is False:\n            self.checked_step = True\n            return env_step_passive_checker(self.env, action)\n        else:\n            return self.env.step(action)\n\n    def reset(self, **kwargs):\n        \"\"\"Resets the environment that on the first call will run the `passive_env_reset_check`.\"\"\"\n        if self.checked_reset is False:\n            self.checked_reset = True\n            return env_reset_passive_checker(self.env, **kwargs)\n        else:\n            return self.env.reset(**kwargs)\n\n    def render(self, *args, **kwargs):\n        \"\"\"Renders the environment that on the first call will run the `passive_env_render_check`.\"\"\"\n        if self.checked_render is False:\n            self.checked_render = True\n            return env_render_passive_checker(self.env, *args, **kwargs)\n        else:\n            return self.env.render(*args, **kwargs)\n"
  },
  {
    "path": "gym/wrappers/filter_observation.py",
    "content": "\"\"\"A wrapper for filtering dictionary observations by their keys.\"\"\"\nimport copy\nfrom typing import Sequence\n\nimport gym\nfrom gym import spaces\n\n\nclass FilterObservation(gym.ObservationWrapper):\n    \"\"\"Filter Dict observation space by the keys.\n\n    Example:\n        >>> import gym\n        >>> env = gym.wrappers.TransformObservation(\n        ...     gym.make('CartPole-v1'), lambda obs: {'obs': obs, 'time': 0}\n        ... )\n        >>> env.observation_space = gym.spaces.Dict(obs=env.observation_space, time=gym.spaces.Discrete(1))\n        >>> env.reset()\n        {'obs': array([-0.00067088, -0.01860439,  0.04772898, -0.01911527], dtype=float32), 'time': 0}\n        >>> env = FilterObservation(env, filter_keys=['time'])\n        >>> env.reset()\n        {'obs': array([ 0.04560107,  0.04466959, -0.0328232 , -0.02367178], dtype=float32)}\n        >>> env.step(0)\n        ({'obs': array([ 0.04649447, -0.14996664, -0.03329664,  0.25847703], dtype=float32)}, 1.0, False, {})\n    \"\"\"\n\n    def __init__(self, env: gym.Env, filter_keys: Sequence[str] = None):\n        \"\"\"A wrapper that filters dictionary observations by their keys.\n\n        Args:\n            env: The environment to apply the wrapper\n            filter_keys: List of keys to be included in the observations. 
If ``None``, observations will not be filtered and this wrapper has no effect\n\n        Raises:\n            ValueError: If the environment's observation space is not :class:`spaces.Dict`\n            ValueError: If any of the `filter_keys` are not included in the original `env`'s observation space\n        \"\"\"\n        super().__init__(env)\n\n        wrapped_observation_space = env.observation_space\n        if not isinstance(wrapped_observation_space, spaces.Dict):\n            raise ValueError(\n                f\"FilterObservationWrapper is only usable with dict observations, \"\n                f\"environment observation space is {type(wrapped_observation_space)}\"\n            )\n\n        observation_keys = wrapped_observation_space.spaces.keys()\n        if filter_keys is None:\n            filter_keys = tuple(observation_keys)\n\n        missing_keys = {key for key in filter_keys if key not in observation_keys}\n        if missing_keys:\n            raise ValueError(\n                \"All the filter_keys must be included in the original observation space.\\n\"\n                f\"Filter keys: {filter_keys}\\n\"\n                f\"Observation keys: {observation_keys}\\n\"\n                f\"Missing keys: {missing_keys}\"\n            )\n\n        self.observation_space = type(wrapped_observation_space)(\n            [\n                (name, copy.deepcopy(space))\n                for name, space in wrapped_observation_space.spaces.items()\n                if name in filter_keys\n            ]\n        )\n\n        self._env = env\n        self._filter_keys = tuple(filter_keys)\n\n    def observation(self, observation):\n        \"\"\"Filters the observations.\n\n        Args:\n            observation: The observation to filter\n\n        Returns:\n            The filtered observations\n        \"\"\"\n        filter_observation = self._filter_observation(observation)\n        return filter_observation\n\n    def _filter_observation(self, 
observation):\n        observation = type(observation)(\n            [\n                (name, value)\n                for name, value in observation.items()\n                if name in self._filter_keys\n            ]\n        )\n        return observation\n"
  },
  {
    "path": "gym/wrappers/flatten_observation.py",
    "content": "\"\"\"Wrapper for flattening observations of an environment.\"\"\"\nimport gym\nimport gym.spaces as spaces\n\n\nclass FlattenObservation(gym.ObservationWrapper):\n    \"\"\"Observation wrapper that flattens the observation.\n\n    Example:\n        >>> import gym\n        >>> env = gym.make('CarRacing-v1')\n        >>> env.observation_space.shape\n        (96, 96, 3)\n        >>> env = FlattenObservation(env)\n        >>> env.observation_space.shape\n        (27648,)\n        >>> obs = env.reset()\n        >>> obs.shape\n        (27648,)\n    \"\"\"\n\n    def __init__(self, env: gym.Env):\n        \"\"\"Flattens the observations of an environment.\n\n        Args:\n            env: The environment to apply the wrapper\n        \"\"\"\n        super().__init__(env)\n        self.observation_space = spaces.flatten_space(env.observation_space)\n\n    def observation(self, observation):\n        \"\"\"Flattens an observation.\n\n        Args:\n            observation: The observation to flatten\n\n        Returns:\n            The flattened observation\n        \"\"\"\n        return spaces.flatten(self.env.observation_space, observation)\n"
  },
  {
    "path": "gym/wrappers/frame_stack.py",
    "content": "\"\"\"Wrapper that stacks frames.\"\"\"\nfrom collections import deque\nfrom typing import Union\n\nimport numpy as np\n\nimport gym\nfrom gym.error import DependencyNotInstalled\nfrom gym.spaces import Box\n\n\nclass LazyFrames:\n    \"\"\"Ensures common frames are only stored once to optimize memory use.\n\n    To further reduce the memory use, it is optionally to turn on lz4 to compress the observations.\n\n    Note:\n        This object should only be converted to numpy array just before forward pass.\n    \"\"\"\n\n    __slots__ = (\"frame_shape\", \"dtype\", \"shape\", \"lz4_compress\", \"_frames\")\n\n    def __init__(self, frames: list, lz4_compress: bool = False):\n        \"\"\"Lazyframe for a set of frames and if to apply lz4.\n\n        Args:\n            frames (list): The frames to convert to lazy frames\n            lz4_compress (bool): Use lz4 to compress the frames internally\n\n        Raises:\n            DependencyNotInstalled: lz4 is not installed\n        \"\"\"\n        self.frame_shape = tuple(frames[0].shape)\n        self.shape = (len(frames),) + self.frame_shape\n        self.dtype = frames[0].dtype\n        if lz4_compress:\n            try:\n                from lz4.block import compress\n            except ImportError:\n                raise DependencyNotInstalled(\n                    \"lz4 is not installed, run `pip install gym[other]`\"\n                )\n\n            frames = [compress(frame) for frame in frames]\n        self._frames = frames\n        self.lz4_compress = lz4_compress\n\n    def __array__(self, dtype=None):\n        \"\"\"Gets a numpy array of stacked frames with specific dtype.\n\n        Args:\n            dtype: The dtype of the stacked frames\n\n        Returns:\n            The array of stacked frames with dtype\n        \"\"\"\n        arr = self[:]\n        if dtype is not None:\n            return arr.astype(dtype)\n        return arr\n\n    def __len__(self):\n        \"\"\"Returns the 
number of frame stacks.\n\n        Returns:\n            The number of frame stacks\n        \"\"\"\n        return self.shape[0]\n\n    def __getitem__(self, int_or_slice: Union[int, slice]):\n        \"\"\"Gets the stacked frames for a particular index or slice.\n\n        Args:\n            int_or_slice: Index or slice to get items for\n\n        Returns:\n            np.stacked frames for the int or slice\n\n        \"\"\"\n        if isinstance(int_or_slice, int):\n            return self._check_decompress(self._frames[int_or_slice])  # single frame\n        return np.stack(\n            [self._check_decompress(f) for f in self._frames[int_or_slice]], axis=0\n        )\n\n    def __eq__(self, other):\n        \"\"\"Checks that the current frames are equal to the other object.\"\"\"\n        return self.__array__() == other\n\n    def _check_decompress(self, frame):\n        if self.lz4_compress:\n            from lz4.block import decompress\n\n            return np.frombuffer(decompress(frame), dtype=self.dtype).reshape(\n                self.frame_shape\n            )\n        return frame\n\n\nclass FrameStack(gym.ObservationWrapper):\n    \"\"\"Observation wrapper that stacks the observations in a rolling manner.\n\n    For example, if the number of stacks is 4, then the returned observation contains\n    the most recent 4 observations. For environment 'Pendulum-v1', the original observation\n    is an array with shape [3], so if we stack 4 observations, the processed observation\n    has shape [4, 3].\n\n    Note:\n        - To be memory efficient, the stacked observations are wrapped by :class:`LazyFrame`.\n        - The observation space must be :class:`Box` type. If one uses :class:`Dict`\n          as observation space, it should apply :class:`FlattenObservation` wrapper first.\n          - After :meth:`reset` is called, the frame buffer will be filled with the initial observation. I.e. 
the observation returned by :meth:`reset` will consist of ``num_stack``-many identical frames.\n\n    Example:\n        >>> import gym\n        >>> env = gym.make('CarRacing-v1')\n        >>> env = FrameStack(env, 4)\n        >>> env.observation_space\n        Box(4, 96, 96, 3)\n        >>> obs, info = env.reset()\n        >>> obs.shape\n        (4, 96, 96, 3)\n    \"\"\"\n\n    def __init__(\n        self,\n        env: gym.Env,\n        num_stack: int,\n        lz4_compress: bool = False,\n    ):\n        \"\"\"Observation wrapper that stacks the observations in a rolling manner.\n\n        Args:\n            env (Env): The environment to apply the wrapper\n            num_stack (int): The number of frames to stack\n            lz4_compress (bool): Use lz4 to compress the frames internally\n        \"\"\"\n        super().__init__(env)\n        self.num_stack = num_stack\n        self.lz4_compress = lz4_compress\n\n        self.frames = deque(maxlen=num_stack)\n\n        low = np.repeat(self.observation_space.low[np.newaxis, ...], num_stack, axis=0)\n        high = np.repeat(\n            self.observation_space.high[np.newaxis, ...], num_stack, axis=0\n        )\n        self.observation_space = Box(\n            low=low, high=high, dtype=self.observation_space.dtype\n        )\n\n    def observation(self, observation):\n        \"\"\"Converts the wrapper's current frames to lazy frames.\n\n        Args:\n            observation: Ignored\n\n        Returns:\n            :class:`LazyFrames` object for the wrapper's frame buffer, :attr:`self.frames`\n        \"\"\"\n        assert len(self.frames) == self.num_stack, (len(self.frames), self.num_stack)\n        return LazyFrames(list(self.frames), self.lz4_compress)\n\n    def step(self, action):\n        \"\"\"Steps through the environment, appending the observation to the frame buffer.\n\n        Args:\n            action: The action to step through the environment with\n\n        Returns:\n            Stacked 
observations, reward, terminated, truncated, and information from the environment\n        \"\"\"\n        observation, reward, terminated, truncated, info = self.env.step(action)\n        self.frames.append(observation)\n        return self.observation(None), reward, terminated, truncated, info\n\n    def reset(self, **kwargs):\n        \"\"\"Reset the environment with kwargs.\n\n        Args:\n            **kwargs: The kwargs for the environment reset\n\n        Returns:\n            The stacked observations and the info dict from the environment\n        \"\"\"\n        obs, info = self.env.reset(**kwargs)\n\n        # Fill the whole buffer with the initial observation\n        for _ in range(self.num_stack):\n            self.frames.append(obs)\n\n        return self.observation(None), info\n"
  },
  {
    "path": "gym/wrappers/gray_scale_observation.py",
    "content": "\"\"\"Wrapper that converts a color observation to grayscale.\"\"\"\nimport numpy as np\n\nimport gym\nfrom gym.spaces import Box\n\n\nclass GrayScaleObservation(gym.ObservationWrapper):\n    \"\"\"Convert the image observation from RGB to gray scale.\n\n    Example:\n        >>> env = gym.make('CarRacing-v1')\n        >>> env.observation_space\n        Box(0, 255, (96, 96, 3), uint8)\n        >>> env = GrayScaleObservation(gym.make('CarRacing-v1'))\n        >>> env.observation_space\n        Box(0, 255, (96, 96), uint8)\n        >>> env = GrayScaleObservation(gym.make('CarRacing-v1'), keep_dim=True)\n        >>> env.observation_space\n        Box(0, 255, (96, 96, 1), uint8)\n    \"\"\"\n\n    def __init__(self, env: gym.Env, keep_dim: bool = False):\n        \"\"\"Convert the image observation from RGB to gray scale.\n\n        Args:\n            env (Env): The environment to apply the wrapper\n            keep_dim (bool): If `True`, a singleton dimension will be added, i.e. 
observations are of the shape AxBx1.\n                Otherwise, they are of shape AxB.\n        \"\"\"\n        super().__init__(env)\n        self.keep_dim = keep_dim\n\n        assert (\n            isinstance(self.observation_space, Box)\n            and len(self.observation_space.shape) == 3\n            and self.observation_space.shape[-1] == 3\n        )\n\n        obs_shape = self.observation_space.shape[:2]\n        if self.keep_dim:\n            self.observation_space = Box(\n                low=0, high=255, shape=(obs_shape[0], obs_shape[1], 1), dtype=np.uint8\n            )\n        else:\n            self.observation_space = Box(\n                low=0, high=255, shape=obs_shape, dtype=np.uint8\n            )\n\n    def observation(self, observation):\n        \"\"\"Converts the colour observation to greyscale.\n\n        Args:\n            observation: Color observations\n\n        Returns:\n            Grayscale observations\n        \"\"\"\n        import cv2\n\n        observation = cv2.cvtColor(observation, cv2.COLOR_RGB2GRAY)\n        if self.keep_dim:\n            observation = np.expand_dims(observation, -1)\n        return observation\n"
  },
  {
    "path": "gym/wrappers/human_rendering.py",
    "content": "\"\"\"A wrapper that adds human-renering functionality to an environment.\"\"\"\nimport numpy as np\n\nimport gym\nfrom gym.error import DependencyNotInstalled\n\n\nclass HumanRendering(gym.Wrapper):\n    \"\"\"Performs human rendering for an environment that only supports \"rgb_array\"rendering.\n\n    This wrapper is particularly useful when you have implemented an environment that can produce\n    RGB images but haven't implemented any code to render the images to the screen.\n    If you want to use this wrapper with your environments, remember to specify ``\"render_fps\"``\n    in the metadata of your environment.\n\n    The ``render_mode`` of the wrapped environment must be either ``'rgb_array'`` or ``'rgb_array_list'``.\n\n    Example:\n        >>> env = gym.make(\"LunarLander-v2\", render_mode=\"rgb_array\")\n        >>> wrapped = HumanRendering(env)\n        >>> wrapped.reset()     # This will start rendering to the screen\n\n    The wrapper can also be applied directly when the environment is instantiated, simply by passing\n    ``render_mode=\"human\"`` to ``make``. The wrapper will only be applied if the environment does not\n    implement human-rendering natively (i.e. ``render_mode`` does not contain ``\"human\"``).\n\n    Example:\n        >>> env = gym.make(\"NoNativeRendering-v2\", render_mode=\"human\")      # NoNativeRendering-v0 doesn't implement human-rendering natively\n        >>> env.reset()     # This will start rendering to the screen\n\n    Warning: If the base environment uses ``render_mode=\"rgb_array_list\"``, its (i.e. 
the *base environment's*) render method\n        will always return an empty list:\n\n            >>> env = gym.make(\"LunarLander-v2\", render_mode=\"rgb_array_list\")\n            >>> wrapped = HumanRendering(env)\n            >>> wrapped.reset()\n            >>> env.render()\n            []          # env.render() will always return an empty list!\n\n    \"\"\"\n\n    def __init__(self, env):\n        \"\"\"Initialize a :class:`HumanRendering` instance.\n\n        Args:\n            env: The environment that is being wrapped\n        \"\"\"\n        super().__init__(env)\n        assert env.render_mode in [\n            \"rgb_array\",\n            \"rgb_array_list\",\n        ], f\"Expected env.render_mode to be one of 'rgb_array' or 'rgb_array_list' but got '{env.render_mode}'\"\n        assert (\n            \"render_fps\" in env.metadata\n        ), \"The base environment must specify 'render_fps' to be used with the HumanRendering wrapper\"\n\n        self.screen_size = None\n        self.window = None\n        self.clock = None\n\n    @property\n    def render_mode(self):\n        \"\"\"Always returns ``'human'``.\"\"\"\n        return \"human\"\n\n    def step(self, *args, **kwargs):\n        \"\"\"Perform a step in the base environment and render a frame to the screen.\"\"\"\n        result = self.env.step(*args, **kwargs)\n        self._render_frame()\n        return result\n\n    def reset(self, *args, **kwargs):\n        \"\"\"Reset the base environment and render a frame to the screen.\"\"\"\n        result = self.env.reset(*args, **kwargs)\n        self._render_frame()\n        return result\n\n    def render(self):\n        \"\"\"This method doesn't do much, actual rendering is performed in :meth:`step` and :meth:`reset`.\"\"\"\n        return None\n\n    def _render_frame(self):\n        \"\"\"Fetch the last frame from the base environment and render it to the screen.\"\"\"\n        try:\n            import pygame\n        except ImportError:\n     
       raise DependencyNotInstalled(\n                \"pygame is not installed, run `pip install gym[box2d]`\"\n            )\n        if self.env.render_mode == \"rgb_array_list\":\n            last_rgb_array = self.env.render()\n            assert isinstance(last_rgb_array, list)\n            last_rgb_array = last_rgb_array[-1]\n        elif self.env.render_mode == \"rgb_array\":\n            last_rgb_array = self.env.render()\n        else:\n            raise Exception(\n                f\"Wrapped environment must have mode 'rgb_array' or 'rgb_array_list', actual render mode: {self.env.render_mode}\"\n            )\n        assert isinstance(last_rgb_array, np.ndarray)\n\n        rgb_array = np.transpose(last_rgb_array, axes=(1, 0, 2))\n\n        if self.screen_size is None:\n            self.screen_size = rgb_array.shape[:2]\n\n        assert (\n            self.screen_size == rgb_array.shape[:2]\n        ), f\"The shape of the rgb array has changed from {self.screen_size} to {rgb_array.shape[:2]}\"\n\n        if self.window is None:\n            pygame.init()\n            pygame.display.init()\n            self.window = pygame.display.set_mode(self.screen_size)\n\n        if self.clock is None:\n            self.clock = pygame.time.Clock()\n\n        surf = pygame.surfarray.make_surface(rgb_array)\n        self.window.blit(surf, (0, 0))\n        pygame.event.pump()\n        self.clock.tick(self.metadata[\"render_fps\"])\n        pygame.display.flip()\n\n    def close(self):\n        \"\"\"Close the rendering window.\"\"\"\n        super().close()\n        if self.window is not None:\n            import pygame\n\n            pygame.display.quit()\n            pygame.quit()\n"
  },
  {
    "path": "gym/wrappers/monitoring/__init__.py",
    "content": "\"\"\"Module for monitoring.video_recorder.\"\"\"\n"
  },
  {
    "path": "gym/wrappers/monitoring/video_recorder.py",
    "content": "\"\"\"A wrapper for video recording environments by rolling it out, frame by frame.\"\"\"\nimport json\nimport os\nimport os.path\nimport tempfile\nfrom typing import List, Optional\n\nfrom gym import error, logger\n\n\nclass VideoRecorder:\n    \"\"\"VideoRecorder renders a nice movie of a rollout, frame by frame.\n\n    It comes with an ``enabled`` option, so you can still use the same code on episodes where you don't want to record video.\n\n    Note:\n        You are responsible for calling :meth:`close` on a created VideoRecorder, or else you may leak an encoder process.\n    \"\"\"\n\n    def __init__(\n        self,\n        env,\n        path: Optional[str] = None,\n        metadata: Optional[dict] = None,\n        enabled: bool = True,\n        base_path: Optional[str] = None,\n    ):\n        \"\"\"Video recorder renders a nice movie of a rollout, frame by frame.\n\n        Args:\n            env (Env): Environment to take video of.\n            path (Optional[str]): Path to the video file; will be randomly chosen if omitted.\n            metadata (Optional[dict]): Contents to save to the metadata file.\n            enabled (bool): Whether to actually record video, or just no-op (for convenience)\n            base_path (Optional[str]): Alternatively, path to the video file without extension, which will be added.\n\n        Raises:\n            Error: You can pass at most one of `path` or `base_path`\n            Error: Invalid path given that must have a particular file extension\n        \"\"\"\n        try:\n            # check that moviepy is now installed\n            import moviepy  # noqa: F401\n        except ImportError:\n            raise error.DependencyNotInstalled(\n                \"MoviePy is not installed, run `pip install moviepy`\"\n            )\n\n        self._async = env.metadata.get(\"semantics.async\")\n        self.enabled = enabled\n        self._closed = False\n\n        self.render_history = []\n        self.env 
= env\n\n        self.render_mode = env.render_mode\n\n        if \"rgb_array_list\" != self.render_mode and \"rgb_array\" != self.render_mode:\n            logger.warn(\n                f\"Disabling video recorder because environment {env} was not initialized with any compatible video \"\n                \"mode between `rgb_array` and `rgb_array_list`\"\n            )\n            # Disable since the environment has not been initialized with a compatible `render_mode`\n            self.enabled = False\n\n        # Don't bother setting anything else if not enabled\n        if not self.enabled:\n            return\n\n        if path is not None and base_path is not None:\n            raise error.Error(\"You can pass at most one of `path` or `base_path`.\")\n\n        required_ext = \".mp4\"\n        if path is None:\n            if base_path is not None:\n                # Base path given, append ext\n                path = base_path + required_ext\n            else:\n                # Otherwise, just generate a unique filename\n                with tempfile.NamedTemporaryFile(suffix=required_ext) as f:\n                    path = f.name\n        self.path = path\n\n        path_base, actual_ext = os.path.splitext(self.path)\n\n        if actual_ext != required_ext:\n            raise error.Error(\n                f\"Invalid path given: {self.path} -- must have file extension {required_ext}.\"\n            )\n\n        self.frames_per_sec = env.metadata.get(\"render_fps\", 30)\n\n        self.broken = False\n\n        # Dump metadata\n        self.metadata = metadata or {}\n        self.metadata[\"content_type\"] = \"video/mp4\"\n        self.metadata_path = f\"{path_base}.meta.json\"\n        self.write_metadata()\n\n        logger.info(f\"Starting new video recorder writing to {self.path}\")\n        self.recorded_frames = []\n\n    @property\n    def functional(self):\n        \"\"\"Returns if the video recorder is functional, is enabled and not broken.\"\"\"\n   
     return self.enabled and not self.broken\n\n    def capture_frame(self):\n        \"\"\"Render the given `env` and add the resulting frame to the video.\"\"\"\n        frame = self.env.render()\n        if isinstance(frame, List):\n            self.render_history += frame\n            frame = frame[-1]\n\n        if not self.functional:\n            return\n        if self._closed:\n            logger.warn(\n                \"The video recorder has been closed and no frames will be captured anymore.\"\n            )\n            return\n        logger.debug(\"Capturing video frame: path=%s\", self.path)\n\n        if frame is None:\n            if self._async:\n                return\n            else:\n                # Indicates a bug in the environment: don't want to raise\n                # an error here.\n                logger.warn(\n                    \"Env returned None on `render()`. Disabling further rendering for video recorder by marking as \"\n                    f\"disabled: path={self.path} metadata_path={self.metadata_path}\"\n                )\n                self.broken = True\n        else:\n            self.recorded_frames.append(frame)\n\n    def close(self):\n        \"\"\"Flush all data to disk and close any open frame encoders.\"\"\"\n        if not self.enabled or self._closed:\n            return\n\n        # First close the environment\n        self.env.close()\n\n        # Close the encoder\n        if len(self.recorded_frames) > 0:\n            try:\n                from moviepy.video.io.ImageSequenceClip import ImageSequenceClip\n            except ImportError:\n                raise error.DependencyNotInstalled(\n                    \"MoviePy is not installed, run `pip install moviepy`\"\n                )\n\n            logger.debug(f\"Closing video encoder: path={self.path}\")\n            clip = ImageSequenceClip(self.recorded_frames, fps=self.frames_per_sec)\n            clip.write_videofile(self.path)\n        else:\n       
     # No frames captured. Set metadata.\n            if self.metadata is None:\n                self.metadata = {}\n            self.metadata[\"empty\"] = True\n\n        self.write_metadata()\n\n        # Stop tracking this for autoclose\n        self._closed = True\n\n    def write_metadata(self):\n        \"\"\"Writes metadata to metadata path.\"\"\"\n        with open(self.metadata_path, \"w\") as f:\n            json.dump(self.metadata, f)\n\n    def __del__(self):\n        \"\"\"Closes the environment correctly when the recorder is deleted.\"\"\"\n        # Make sure we've closed up shop when garbage collecting\n        self.close()\n"
  },
  {
    "path": "gym/wrappers/normalize.py",
    "content": "\"\"\"Set of wrappers for normalizing actions and observations.\"\"\"\nimport numpy as np\n\nimport gym\n\n\n# taken from https://github.com/openai/baselines/blob/master/baselines/common/vec_env/vec_normalize.py\nclass RunningMeanStd:\n    \"\"\"Tracks the mean, variance and count of values.\"\"\"\n\n    # https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm\n    def __init__(self, epsilon=1e-4, shape=()):\n        \"\"\"Tracks the mean, variance and count of values.\"\"\"\n        self.mean = np.zeros(shape, \"float64\")\n        self.var = np.ones(shape, \"float64\")\n        self.count = epsilon\n\n    def update(self, x):\n        \"\"\"Updates the mean, var and count from a batch of samples.\"\"\"\n        batch_mean = np.mean(x, axis=0)\n        batch_var = np.var(x, axis=0)\n        batch_count = x.shape[0]\n        self.update_from_moments(batch_mean, batch_var, batch_count)\n\n    def update_from_moments(self, batch_mean, batch_var, batch_count):\n        \"\"\"Updates from batch mean, variance and count moments.\"\"\"\n        self.mean, self.var, self.count = update_mean_var_count_from_moments(\n            self.mean, self.var, self.count, batch_mean, batch_var, batch_count\n        )\n\n\ndef update_mean_var_count_from_moments(\n    mean, var, count, batch_mean, batch_var, batch_count\n):\n    \"\"\"Updates the mean, var and count using the previous mean, var, count and batch values.\"\"\"\n    delta = batch_mean - mean\n    tot_count = count + batch_count\n\n    new_mean = mean + delta * batch_count / tot_count\n    m_a = var * count\n    m_b = batch_var * batch_count\n    M2 = m_a + m_b + np.square(delta) * count * batch_count / tot_count\n    new_var = M2 / tot_count\n    new_count = tot_count\n\n    return new_mean, new_var, new_count\n\n\nclass NormalizeObservation(gym.core.Wrapper):\n    \"\"\"This wrapper will normalize observations s.t. 
each coordinate is centered with unit variance.\n\n    Note:\n        The normalization depends on past trajectories and observations will not be normalized correctly if the wrapper was\n        newly instantiated or the policy was changed recently.\n    \"\"\"\n\n    def __init__(self, env: gym.Env, epsilon: float = 1e-8):\n        \"\"\"This wrapper will normalize observations s.t. each coordinate is centered with unit variance.\n\n        Args:\n            env (Env): The environment to apply the wrapper\n            epsilon: A stability parameter that is used when scaling the observations.\n        \"\"\"\n        super().__init__(env)\n        self.num_envs = getattr(env, \"num_envs\", 1)\n        self.is_vector_env = getattr(env, \"is_vector_env\", False)\n        if self.is_vector_env:\n            self.obs_rms = RunningMeanStd(shape=self.single_observation_space.shape)\n        else:\n            self.obs_rms = RunningMeanStd(shape=self.observation_space.shape)\n        self.epsilon = epsilon\n\n    def step(self, action):\n        \"\"\"Steps through the environment and normalizes the observation.\"\"\"\n        obs, rews, terminateds, truncateds, infos = self.env.step(action)\n        if self.is_vector_env:\n            obs = self.normalize(obs)\n        else:\n            obs = self.normalize(np.array([obs]))[0]\n        return obs, rews, terminateds, truncateds, infos\n\n    def reset(self, **kwargs):\n        \"\"\"Resets the environment and normalizes the observation.\"\"\"\n        obs, info = self.env.reset(**kwargs)\n\n        if self.is_vector_env:\n            return self.normalize(obs), info\n        else:\n            return self.normalize(np.array([obs]))[0], info\n\n    def normalize(self, obs):\n        \"\"\"Normalises the observation using the running mean and variance of the observations.\"\"\"\n        self.obs_rms.update(obs)\n        return (obs - self.obs_rms.mean) / np.sqrt(self.obs_rms.var + self.epsilon)\n\n\nclass 
NormalizeReward(gym.core.Wrapper):\n    r\"\"\"This wrapper will normalize immediate rewards s.t. their exponential moving average has a fixed variance.\n\n    The exponential moving average will have variance :math:`(1 - \\gamma)^2`.\n\n    Note:\n        The scaling depends on past trajectories and rewards will not be scaled correctly if the wrapper was newly\n        instantiated or the policy was changed recently.\n    \"\"\"\n\n    def __init__(\n        self,\n        env: gym.Env,\n        gamma: float = 0.99,\n        epsilon: float = 1e-8,\n    ):\n        \"\"\"This wrapper will normalize immediate rewards s.t. their exponential moving average has a fixed variance.\n\n        Args:\n            env (Env): The environment to apply the wrapper\n            epsilon (float): A stability parameter\n            gamma (float): The discount factor that is used in the exponential moving average.\n        \"\"\"\n        super().__init__(env)\n        self.num_envs = getattr(env, \"num_envs\", 1)\n        self.is_vector_env = getattr(env, \"is_vector_env\", False)\n        self.return_rms = RunningMeanStd(shape=())\n        self.returns = np.zeros(self.num_envs)\n        self.gamma = gamma\n        self.epsilon = epsilon\n\n    def step(self, action):\n        \"\"\"Steps through the environment, normalizing the rewards returned.\"\"\"\n        obs, rews, terminateds, truncateds, infos = self.env.step(action)\n        if not self.is_vector_env:\n            rews = np.array([rews])\n        self.returns = self.returns * self.gamma + rews\n        rews = self.normalize(rews)\n        dones = np.logical_or(terminateds, truncateds)\n        self.returns[dones] = 0.0\n        if not self.is_vector_env:\n            rews = rews[0]\n        return obs, rews, terminateds, truncateds, infos\n\n    def normalize(self, rews):\n        \"\"\"Normalizes the rewards with the running variance of the discounted returns.\"\"\"\n        self.return_rms.update(self.returns)\n        
return rews / np.sqrt(self.return_rms.var + self.epsilon)\n"
  },
  {
    "path": "gym/wrappers/order_enforcing.py",
    "content": "\"\"\"Wrapper to enforce the proper ordering of environment operations.\"\"\"\nimport gym\nfrom gym.error import ResetNeeded\n\n\nclass OrderEnforcing(gym.Wrapper):\n    \"\"\"A wrapper that will produce an error if :meth:`step` is called before an initial :meth:`reset`.\n\n    Example:\n        >>> from gym.envs.classic_control import CartPoleEnv\n        >>> env = CartPoleEnv()\n        >>> env = OrderEnforcing(env)\n        >>> env.step(0)\n        ResetNeeded: Cannot call env.step() before calling env.reset()\n        >>> env.render()\n        ResetNeeded: Cannot call env.render() before calling env.reset()\n        >>> env.reset()\n        >>> env.render()\n        >>> env.step(0)\n    \"\"\"\n\n    def __init__(self, env: gym.Env, disable_render_order_enforcing: bool = False):\n        \"\"\"A wrapper that will produce an error if :meth:`step` is called before an initial :meth:`reset`.\n\n        Args:\n            env: The environment to wrap\n            disable_render_order_enforcing: If to disable render order enforcing\n        \"\"\"\n        super().__init__(env)\n        self._has_reset: bool = False\n        self._disable_render_order_enforcing: bool = disable_render_order_enforcing\n\n    def step(self, action):\n        \"\"\"Steps through the environment with `kwargs`.\"\"\"\n        if not self._has_reset:\n            raise ResetNeeded(\"Cannot call env.step() before calling env.reset()\")\n        return self.env.step(action)\n\n    def reset(self, **kwargs):\n        \"\"\"Resets the environment with `kwargs`.\"\"\"\n        self._has_reset = True\n        return self.env.reset(**kwargs)\n\n    def render(self, *args, **kwargs):\n        \"\"\"Renders the environment with `kwargs`.\"\"\"\n        if not self._disable_render_order_enforcing and not self._has_reset:\n            raise ResetNeeded(\n                \"Cannot call `env.render()` before calling `env.reset()`, if this is a intended action, \"\n                \"set 
`disable_render_order_enforcing=True` on the OrderEnforcing wrapper.\"\n            )\n        return self.env.render(*args, **kwargs)\n\n    @property\n    def has_reset(self):\n        \"\"\"Returns whether the environment has been reset before.\"\"\"\n        return self._has_reset\n"
  },
  {
    "path": "gym/wrappers/pixel_observation.py",
    "content": "\"\"\"Wrapper for augmenting observations by pixel values.\"\"\"\nimport collections\nimport copy\nfrom collections.abc import MutableMapping\nfrom typing import Any, Dict, List, Optional, Tuple\n\nimport numpy as np\n\nimport gym\nfrom gym import spaces\n\nSTATE_KEY = \"state\"\n\n\nclass PixelObservationWrapper(gym.ObservationWrapper):\n    \"\"\"Augment observations by pixel values.\n\n    Observations of this wrapper will be dictionaries of images.\n    You can also choose to add the observation of the base environment to this dictionary.\n    In that case, if the base environment has an observation space of type :class:`Dict`, the dictionary\n    of rendered images will be updated with the base environment's observation. If, however, the observation\n    space is of type :class:`Box`, the base environment's observation (which will be an element of the :class:`Box`\n    space) will be added to the dictionary under the key \"state\".\n\n    Example:\n        >>> import gym\n        >>> env = PixelObservationWrapper(gym.make('CarRacing-v1', render_mode=\"rgb_array\"))\n        >>> obs = env.reset()\n        >>> obs.keys()\n        odict_keys(['pixels'])\n        >>> obs['pixels'].shape\n        (400, 600, 3)\n        >>> env = PixelObservationWrapper(gym.make('CarRacing-v1', render_mode=\"rgb_array\"), pixels_only=False)\n        >>> obs = env.reset()\n        >>> obs.keys()\n        odict_keys(['state', 'pixels'])\n        >>> obs['state'].shape\n        (96, 96, 3)\n        >>> obs['pixels'].shape\n        (400, 600, 3)\n        >>> env = PixelObservationWrapper(gym.make('CarRacing-v1', render_mode=\"rgb_array\"), pixel_keys=('obs',))\n        >>> obs = env.reset()\n        >>> obs.keys()\n        odict_keys(['obs'])\n        >>> obs['obs'].shape\n        (400, 600, 3)\n    \"\"\"\n\n    def __init__(\n        self,\n        env: gym.Env,\n        pixels_only: bool = True,\n        render_kwargs: Optional[Dict[str, Dict[str, Any]]] = None,\n     
   pixel_keys: Tuple[str, ...] = (\"pixels\",),\n    ):\n        \"\"\"Initializes a new pixel wrapper.\n\n        Args:\n            env: The environment to wrap.\n            pixels_only (bool): If ``True`` (default), the original observation returned\n                by the wrapped environment will be discarded, and a dictionary\n                observation will only include pixels. If ``False``, the\n                observation dictionary will contain both the original\n                observations and the pixel observations.\n            render_kwargs (dict): Optional dictionary that maps elements of ``pixel_keys`` to\n                keyword arguments passed to the :meth:`self.render` method.\n            pixel_keys: Optional custom strings specifying the pixel\n                observations' keys in the ``OrderedDict`` of observations.\n                Defaults to ``(\"pixels\",)``.\n\n        Raises:\n            AssertionError: If any of the keys in ``render_kwargs`` do not show up in ``pixel_keys``.\n            AttributeError: If ``env.render_mode`` is not set.\n            ValueError: If ``env``'s observation space is not compatible with the\n                wrapper. Supported formats are a single array, or a dict of\n                arrays.\n            ValueError: If ``env``'s observation already contains any of the\n                specified ``pixel_keys``.\n            TypeError: When an unexpected pixel type is used\n        \"\"\"\n        super().__init__(env)\n\n        # Avoid side-effects that occur when render_kwargs is manipulated\n        render_kwargs = copy.deepcopy(render_kwargs)\n        self.render_history = []\n\n        if render_kwargs is None:\n            render_kwargs = {}\n\n        for key in render_kwargs:\n            assert key in pixel_keys, (\n                \"The argument render_kwargs should map elements of \"\n                \"pixel_keys to dictionaries of keyword arguments. 
\"\n                f\"Found key '{key}' in render_kwargs but not in pixel_keys.\"\n            )\n\n        default_render_kwargs = {}\n        if not env.render_mode:\n            raise AttributeError(\n                \"env.render_mode must be specified to use PixelObservationWrapper: \"\n                \"`gym.make(env_name, render_mode='rgb_array')`.\"\n            )\n\n        for key in pixel_keys:\n            render_kwargs.setdefault(key, default_render_kwargs)\n\n        wrapped_observation_space = env.observation_space\n\n        if isinstance(wrapped_observation_space, spaces.Box):\n            self._observation_is_dict = False\n            invalid_keys = {STATE_KEY}\n        elif isinstance(wrapped_observation_space, (spaces.Dict, MutableMapping)):\n            self._observation_is_dict = True\n            invalid_keys = set(wrapped_observation_space.spaces.keys())\n        else:\n            raise ValueError(\"Unsupported observation space structure.\")\n\n        if not pixels_only:\n            # Make sure that no keys in `pixel_keys` overlap with the\n            # keys of the wrapped observation\n            overlapping_keys = set(pixel_keys) & set(invalid_keys)\n            if overlapping_keys:\n                raise ValueError(\n                    f\"Duplicate or reserved pixel keys {overlapping_keys!r}.\"\n                )\n\n        if pixels_only:\n            self.observation_space = spaces.Dict()\n        elif self._observation_is_dict:\n            self.observation_space = copy.deepcopy(wrapped_observation_space)\n        else:\n            self.observation_space = spaces.Dict({STATE_KEY: wrapped_observation_space})\n\n        # Extend observation space with pixels.\n\n        self.env.reset()\n        pixels_spaces = {}\n        for pixel_key in pixel_keys:\n            pixels = self._render(**render_kwargs[pixel_key])\n            pixels: np.ndarray = pixels[-1] if isinstance(pixels, List) else pixels\n\n            if not hasattr(pixels, 
\"dtype\") or not hasattr(pixels, \"shape\"):\n                raise TypeError(\n                    f\"Render method returns a {pixels.__class__.__name__}, but an array with dtype and shape is expected. \"\n                    \"Be sure to specify the correct render_mode.\"\n                )\n\n            if np.issubdtype(pixels.dtype, np.integer):\n                low, high = (0, 255)\n            elif np.issubdtype(pixels.dtype, np.floating):\n                low, high = (-float(\"inf\"), float(\"inf\"))\n            else:\n                raise TypeError(pixels.dtype)\n\n            pixels_space = spaces.Box(\n                shape=pixels.shape, low=low, high=high, dtype=pixels.dtype\n            )\n            pixels_spaces[pixel_key] = pixels_space\n\n        self.observation_space.spaces.update(pixels_spaces)\n\n        self._pixels_only = pixels_only\n        self._render_kwargs = render_kwargs\n        self._pixel_keys = pixel_keys\n\n    def observation(self, observation):\n        \"\"\"Updates the observations with the pixel observations.\n\n        Args:\n            observation: The observation to add pixel observations for\n\n        Returns:\n            The updated pixel observations\n        \"\"\"\n        pixel_observation = self._add_pixel_observation(observation)\n        return pixel_observation\n\n    def _add_pixel_observation(self, wrapped_observation):\n        if self._pixels_only:\n            observation = collections.OrderedDict()\n        elif self._observation_is_dict:\n            observation = type(wrapped_observation)(wrapped_observation)\n        else:\n            observation = collections.OrderedDict()\n            observation[STATE_KEY] = wrapped_observation\n\n        pixel_observations = {\n            pixel_key: self._render(**self._render_kwargs[pixel_key])\n            for pixel_key in self._pixel_keys\n        }\n\n        observation.update(pixel_observations)\n\n        return observation\n\n    def render(self, *args, 
**kwargs):\n        \"\"\"Renders the environment.\"\"\"\n        render = self.env.render(*args, **kwargs)\n        if isinstance(render, list):\n            render = self.render_history + render\n            self.render_history = []\n        return render\n\n    def _render(self, *args, **kwargs):\n        render = self.env.render(*args, **kwargs)\n        if isinstance(render, list):\n            self.render_history += render\n        return render\n"
  },
  {
    "path": "gym/wrappers/record_episode_statistics.py",
    "content": "\"\"\"Wrapper that tracks the cumulative rewards and episode lengths.\"\"\"\nimport time\nfrom collections import deque\nfrom typing import Optional\n\nimport numpy as np\n\nimport gym\n\n\ndef add_vector_episode_statistics(\n    info: dict, episode_info: dict, num_envs: int, env_num: int\n):\n    \"\"\"Add episode statistics.\n\n    Add statistics coming from the vectorized environment.\n\n    Args:\n        info (dict): info dict of the environment.\n        episode_info (dict): episode statistics data.\n        num_envs (int): number of environments.\n        env_num (int): env number of the vectorized environments.\n\n    Returns:\n        info (dict): the input info dict with the episode statistics.\n    \"\"\"\n    info[\"episode\"] = info.get(\"episode\", {})\n\n    info[\"_episode\"] = info.get(\"_episode\", np.zeros(num_envs, dtype=bool))\n    info[\"_episode\"][env_num] = True\n\n    for k in episode_info.keys():\n        info_array = info[\"episode\"].get(k, np.zeros(num_envs))\n        info_array[env_num] = episode_info[k]\n        info[\"episode\"][k] = info_array\n\n    return info\n\n\nclass RecordEpisodeStatistics(gym.Wrapper):\n    \"\"\"This wrapper will keep track of cumulative rewards and episode lengths.\n\n    At the end of an episode, the statistics of the episode will be added to ``info``\n    using the key ``episode``. If using a vectorized environment also the key\n    ``_episode`` is used which indicates whether the env at the respective index has\n    the episode statistics.\n\n    After the completion of an episode, ``info`` will look like this::\n\n        >>> info = {\n        ...     ...\n        ...     \"episode\": {\n        ...         \"r\": \"<cumulative reward>\",\n        ...         \"l\": \"<episode length>\",\n        ...         \"t\": \"<elapsed time since instantiation of wrapper>\"\n        ...     },\n        ... 
}\n\n    For vectorized environments the output will be in the form of::\n\n        >>> infos = {\n        ...     ...\n        ...     \"episode\": {\n        ...         \"r\": \"<array of cumulative reward>\",\n        ...         \"l\": \"<array of episode length>\",\n        ...         \"t\": \"<array of elapsed time since instantiation of wrapper>\"\n        ...     },\n        ...     \"_episode\": \"<boolean array of length num-envs>\"\n        ... }\n\n    Moreover, the most recent rewards and episode lengths are stored in buffers that can be accessed via\n    :attr:`wrapped_env.return_queue` and :attr:`wrapped_env.length_queue` respectively.\n\n    Attributes:\n        return_queue: The cumulative rewards of the last ``deque_size``-many episodes\n        length_queue: The lengths of the last ``deque_size``-many episodes\n    \"\"\"\n\n    def __init__(self, env: gym.Env, deque_size: int = 100):\n        \"\"\"This wrapper will keep track of cumulative rewards and episode lengths.\n\n        Args:\n            env (Env): The environment to apply the wrapper\n            deque_size: The size of the buffers :attr:`return_queue` and :attr:`length_queue`\n        \"\"\"\n        super().__init__(env)\n        self.num_envs = getattr(env, \"num_envs\", 1)\n        self.t0 = time.perf_counter()\n        self.episode_count = 0\n        self.episode_returns: Optional[np.ndarray] = None\n        self.episode_lengths: Optional[np.ndarray] = None\n        self.return_queue = deque(maxlen=deque_size)\n        self.length_queue = deque(maxlen=deque_size)\n        self.is_vector_env = getattr(env, \"is_vector_env\", False)\n\n    def reset(self, **kwargs):\n        \"\"\"Resets the environment using kwargs and resets the episode returns and lengths.\"\"\"\n        observations = super().reset(**kwargs)\n        self.episode_returns = np.zeros(self.num_envs, dtype=np.float32)\n        self.episode_lengths = np.zeros(self.num_envs, dtype=np.int32)\n        return 
observations\n\n    def step(self, action):\n        \"\"\"Steps through the environment, recording the episode statistics.\"\"\"\n        (\n            observations,\n            rewards,\n            terminateds,\n            truncateds,\n            infos,\n        ) = self.env.step(action)\n        assert isinstance(\n            infos, dict\n        ), f\"`info` dtype is {type(infos)} while supported dtype is `dict`. This may be due to usage of other wrappers in the wrong order.\"\n        self.episode_returns += rewards\n        self.episode_lengths += 1\n        if not self.is_vector_env:\n            terminateds = [terminateds]\n            truncateds = [truncateds]\n        terminateds = list(terminateds)\n        truncateds = list(truncateds)\n\n        for i in range(len(terminateds)):\n            if terminateds[i] or truncateds[i]:\n                episode_return = self.episode_returns[i]\n                episode_length = self.episode_lengths[i]\n                episode_info = {\n                    \"episode\": {\n                        \"r\": episode_return,\n                        \"l\": episode_length,\n                        \"t\": round(time.perf_counter() - self.t0, 6),\n                    }\n                }\n                if self.is_vector_env:\n                    infos = add_vector_episode_statistics(\n                        infos, episode_info[\"episode\"], self.num_envs, i\n                    )\n                else:\n                    infos = {**infos, **episode_info}\n                self.return_queue.append(episode_return)\n                self.length_queue.append(episode_length)\n                self.episode_count += 1\n                self.episode_returns[i] = 0\n                self.episode_lengths[i] = 0\n        return (\n            observations,\n            rewards,\n            terminateds if self.is_vector_env else terminateds[0],\n            truncateds if self.is_vector_env else truncateds[0],\n            
infos,\n        )\n"
  },
  {
    "path": "gym/wrappers/record_video.py",
    "content": "\"\"\"Wrapper for recording videos.\"\"\"\nimport os\nfrom typing import Callable, Optional\n\nimport gym\nfrom gym import logger\nfrom gym.wrappers.monitoring import video_recorder\n\n\ndef capped_cubic_video_schedule(episode_id: int) -> bool:\n    \"\"\"The default episode trigger.\n\n    This function will trigger recordings at the episode indices 0, 1, 8, 27, ..., :math:`k^3`, ..., 729, 1000, 2000, 3000, ...\n\n    Args:\n        episode_id: The episode number\n\n    Returns:\n        ``True`` if a recording should be started at this episode, ``False`` otherwise\n    \"\"\"\n    if episode_id < 1000:\n        return int(round(episode_id ** (1.0 / 3))) ** 3 == episode_id\n    else:\n        return episode_id % 1000 == 0\n\n\nclass RecordVideo(gym.Wrapper):\n    \"\"\"This wrapper records videos of rollouts.\n\n    Usually, you only want to record episodes intermittently, say every hundredth episode.\n    To do this, you can specify **either** ``episode_trigger`` **or** ``step_trigger`` (not both).\n    They should be functions returning a boolean that indicates whether a recording should be started at the\n    current episode or step, respectively.\n    If neither :attr:`episode_trigger` nor ``step_trigger`` is passed, a default ``episode_trigger`` will be employed.\n    By default, the recording will be stopped once a `terminated` or `truncated` signal has been emitted by the environment. 
However, you can\n    also create recordings of fixed length (possibly spanning several episodes) by passing a strictly positive value for\n    ``video_length``.\n    \"\"\"\n\n    def __init__(\n        self,\n        env: gym.Env,\n        video_folder: str,\n        episode_trigger: Callable[[int], bool] = None,\n        step_trigger: Callable[[int], bool] = None,\n        video_length: int = 0,\n        name_prefix: str = \"rl-video\",\n    ):\n        \"\"\"Wrapper records videos of rollouts.\n\n        Args:\n            env: The environment that will be wrapped\n            video_folder (str): The folder where the recordings will be stored\n            episode_trigger: Function that accepts an integer and returns ``True`` iff a recording should be started at this episode\n            step_trigger: Function that accepts an integer and returns ``True`` iff a recording should be started at this step\n            video_length (int): The length of recorded episodes. If 0, entire episodes are recorded.\n                Otherwise, snippets of the specified length are captured\n            name_prefix (str): Will be prepended to the filename of the recordings\n        \"\"\"\n        super().__init__(env)\n\n        if episode_trigger is None and step_trigger is None:\n            episode_trigger = capped_cubic_video_schedule\n\n        trigger_count = sum(x is not None for x in [episode_trigger, step_trigger])\n        assert trigger_count == 1, \"Must specify exactly one trigger\"\n\n        self.episode_trigger = episode_trigger\n        self.step_trigger = step_trigger\n        self.video_recorder: Optional[video_recorder.VideoRecorder] = None\n\n        self.video_folder = os.path.abspath(video_folder)\n        # Create output folder if needed\n        if os.path.isdir(self.video_folder):\n            logger.warn(\n                f\"Overwriting existing videos at {self.video_folder} folder \"\n                f\"(try specifying a different `video_folder` for 
the `RecordVideo` wrapper if this is not desired)\"\n            )\n        os.makedirs(self.video_folder, exist_ok=True)\n\n        self.name_prefix = name_prefix\n        self.step_id = 0\n        self.video_length = video_length\n\n        self.recording = False\n        self.terminated = False\n        self.truncated = False\n        self.recorded_frames = 0\n        self.is_vector_env = getattr(env, \"is_vector_env\", False)\n        self.episode_id = 0\n\n    def reset(self, **kwargs):\n        \"\"\"Reset the environment using kwargs and then starts recording if video enabled.\"\"\"\n        observations = super().reset(**kwargs)\n        self.terminated = False\n        self.truncated = False\n        if self.recording:\n            assert self.video_recorder is not None\n            self.video_recorder.frames = []\n            self.video_recorder.capture_frame()\n            self.recorded_frames += 1\n            if self.video_length > 0:\n                if self.recorded_frames > self.video_length:\n                    self.close_video_recorder()\n        elif self._video_enabled():\n            self.start_video_recorder()\n        return observations\n\n    def start_video_recorder(self):\n        \"\"\"Starts video recorder using :class:`video_recorder.VideoRecorder`.\"\"\"\n        self.close_video_recorder()\n\n        video_name = f\"{self.name_prefix}-step-{self.step_id}\"\n        if self.episode_trigger:\n            video_name = f\"{self.name_prefix}-episode-{self.episode_id}\"\n\n        base_path = os.path.join(self.video_folder, video_name)\n        self.video_recorder = video_recorder.VideoRecorder(\n            env=self.env,\n            base_path=base_path,\n            metadata={\"step_id\": self.step_id, \"episode_id\": self.episode_id},\n        )\n\n        self.video_recorder.capture_frame()\n        self.recorded_frames = 1\n        self.recording = True\n\n    def _video_enabled(self):\n        if self.step_trigger:\n            
return self.step_trigger(self.step_id)\n        else:\n            return self.episode_trigger(self.episode_id)\n\n    def step(self, action):\n        \"\"\"Steps through the environment using action, recording observations if :attr:`self.recording`.\"\"\"\n        (\n            observations,\n            rewards,\n            terminateds,\n            truncateds,\n            infos,\n        ) = self.env.step(action)\n\n        if not (self.terminated or self.truncated):\n            # increment steps and episodes\n            self.step_id += 1\n            if not self.is_vector_env:\n                if terminateds or truncateds:\n                    self.episode_id += 1\n                    self.terminated = terminateds\n                    self.truncated = truncateds\n            elif terminateds[0] or truncateds[0]:\n                self.episode_id += 1\n                self.terminated = terminateds[0]\n                self.truncated = truncateds[0]\n\n            if self.recording:\n                assert self.video_recorder is not None\n                self.video_recorder.capture_frame()\n                self.recorded_frames += 1\n                if self.video_length > 0:\n                    if self.recorded_frames > self.video_length:\n                        self.close_video_recorder()\n                else:\n                    if not self.is_vector_env:\n                        if terminateds or truncateds:\n                            self.close_video_recorder()\n                    elif terminateds[0] or truncateds[0]:\n                        self.close_video_recorder()\n\n            elif self._video_enabled():\n                self.start_video_recorder()\n\n        return observations, rewards, terminateds, truncateds, infos\n\n    def close_video_recorder(self):\n        \"\"\"Closes the video recorder if currently recording.\"\"\"\n        if self.recording:\n            assert self.video_recorder is not None\n            
self.video_recorder.close()\n        self.recording = False\n        self.recorded_frames = 1\n\n    def render(self, *args, **kwargs):\n        \"\"\"Compute the render frames as specified by render_mode attribute during initialization of the environment or as specified in kwargs.\"\"\"\n        if self.video_recorder is None or not self.video_recorder.enabled:\n            return super().render(*args, **kwargs)\n\n        if len(self.video_recorder.render_history) > 0:\n            recorded_frames = [\n                self.video_recorder.render_history.pop()\n                for _ in range(len(self.video_recorder.render_history))\n            ]\n            if self.recording:\n                return recorded_frames\n            else:\n                return recorded_frames + super().render(*args, **kwargs)\n        else:\n            if self.recording:\n                return self.video_recorder.last_frame\n            else:\n                return super().render(*args, **kwargs)\n\n    def close(self):\n        \"\"\"Closes the wrapper then the video recorder.\"\"\"\n        super().close()\n        self.close_video_recorder()\n\n    def __del__(self):\n        \"\"\"Closes the video recorder.\"\"\"\n        self.close_video_recorder()\n"
  },
  {
    "path": "gym/wrappers/render_collection.py",
    "content": "\"\"\"A wrapper that adds render collection mode to an environment.\"\"\"\nimport gym\n\n\nclass RenderCollection(gym.Wrapper):\n    \"\"\"Save collection of render frames.\"\"\"\n\n    def __init__(self, env: gym.Env, pop_frames: bool = True, reset_clean: bool = True):\n        \"\"\"Initialize a :class:`RenderCollection` instance.\n\n        Args:\n            env: The environment that is being wrapped\n            pop_frames (bool): If true, clear the collection frames after .render() is called.\n            Default value is True.\n            reset_clean (bool): If true, clear the collection frames when .reset() is called.\n            Default value is True.\n        \"\"\"\n        super().__init__(env)\n        assert env.render_mode is not None\n        assert not env.render_mode.endswith(\"_list\")\n        self.frame_list = []\n        self.reset_clean = reset_clean\n        self.pop_frames = pop_frames\n\n    @property\n    def render_mode(self):\n        \"\"\"Returns the collection render_mode name.\"\"\"\n        return f\"{self.env.render_mode}_list\"\n\n    def step(self, *args, **kwargs):\n        \"\"\"Perform a step in the base environment and collect a frame.\"\"\"\n        output = self.env.step(*args, **kwargs)\n        self.frame_list.append(self.env.render())\n        return output\n\n    def reset(self, *args, **kwargs):\n        \"\"\"Reset the base environment, clear the frame_list if reset_clean = True, and collect a frame.\"\"\"\n        result = self.env.reset(*args, **kwargs)\n\n        if self.reset_clean:\n            self.frame_list = []\n        self.frame_list.append(self.env.render())\n\n        return result\n\n    def render(self):\n        \"\"\"Returns the collection of frames and, if pop_frames = True, clears it.\"\"\"\n        frames = self.frame_list\n        if self.pop_frames:\n            self.frame_list = []\n\n        return frames\n"
  },
  {
    "path": "gym/wrappers/rescale_action.py",
    "content": "\"\"\"Wrapper for rescaling actions to within a max and min action.\"\"\"\nfrom typing import Union\n\nimport numpy as np\n\nimport gym\nfrom gym import spaces\n\n\nclass RescaleAction(gym.ActionWrapper):\n    \"\"\"Affinely rescales the continuous action space of the environment to the range [min_action, max_action].\n\n    The base environment :attr:`env` must have an action space of type :class:`spaces.Box`. If :attr:`min_action`\n    or :attr:`max_action` are numpy arrays, the shape must match the shape of the environment's action space.\n\n    Example:\n        >>> import gym\n        >>> env = gym.make('BipedalWalker-v3')\n        >>> env.action_space\n        Box(-1.0, 1.0, (4,), float32)\n        >>> min_action = -0.5\n        >>> max_action = np.array([0.0, 0.5, 1.0, 0.75])\n        >>> env = RescaleAction(env, min_action=min_action, max_action=max_action)\n        >>> env.action_space\n        Box(-0.5, [0.   0.5  1.   0.75], (4,), float32)\n        >>> RescaleAction(env, min_action, max_action).action_space == gym.spaces.Box(min_action, max_action)\n        True\n    \"\"\"\n\n    def __init__(\n        self,\n        env: gym.Env,\n        min_action: Union[float, int, np.ndarray],\n        max_action: Union[float, int, np.ndarray],\n    ):\n        \"\"\"Initializes the :class:`RescaleAction` wrapper.\n\n        Args:\n            env (Env): The environment to apply the wrapper\n            min_action (float, int or np.ndarray): The min values for each action. This may be a numpy array or a scalar.\n            max_action (float, int or np.ndarray): The max values for each action. 
This may be a numpy array or a scalar.\n        \"\"\"\n        assert isinstance(\n            env.action_space, spaces.Box\n        ), f\"expected Box action space, got {type(env.action_space)}\"\n        assert np.less_equal(min_action, max_action).all(), (min_action, max_action)\n\n        super().__init__(env)\n        self.min_action = (\n            np.zeros(env.action_space.shape, dtype=env.action_space.dtype) + min_action\n        )\n        self.max_action = (\n            np.zeros(env.action_space.shape, dtype=env.action_space.dtype) + max_action\n        )\n        self.action_space = spaces.Box(\n            low=min_action,\n            high=max_action,\n            shape=env.action_space.shape,\n            dtype=env.action_space.dtype,\n        )\n\n    def action(self, action):\n        \"\"\"Rescales the action affinely from  [:attr:`min_action`, :attr:`max_action`] to the action space of the base environment, :attr:`env`.\n\n        Args:\n            action: The action to rescale\n\n        Returns:\n            The rescaled action\n        \"\"\"\n        assert np.all(np.greater_equal(action, self.min_action)), (\n            action,\n            self.min_action,\n        )\n        assert np.all(np.less_equal(action, self.max_action)), (action, self.max_action)\n        low = self.env.action_space.low\n        high = self.env.action_space.high\n        action = low + (high - low) * (\n            (action - self.min_action) / (self.max_action - self.min_action)\n        )\n        action = np.clip(action, low, high)\n        return action\n"
  },
  {
    "path": "gym/wrappers/resize_observation.py",
    "content": "\"\"\"Wrapper for resizing observations.\"\"\"\nfrom typing import Union\n\nimport numpy as np\n\nimport gym\nfrom gym.error import DependencyNotInstalled\nfrom gym.spaces import Box\n\n\nclass ResizeObservation(gym.ObservationWrapper):\n    \"\"\"Resize the image observation.\n\n    This wrapper works on environments with image observations (or more generally observations of shape AxBxC) and resizes\n    the observation to the shape given by the 2-tuple :attr:`shape`. The argument :attr:`shape` may also be an integer.\n    In that case, the observation is scaled to a square of side-length :attr:`shape`.\n\n    Example:\n        >>> import gym\n        >>> env = gym.make('CarRacing-v1')\n        >>> env.observation_space.shape\n        (96, 96, 3)\n        >>> env = ResizeObservation(env, 64)\n        >>> env.observation_space.shape\n        (64, 64, 3)\n    \"\"\"\n\n    def __init__(self, env: gym.Env, shape: Union[tuple, int]):\n        \"\"\"Resizes image observations to shape given by :attr:`shape`.\n\n        Args:\n            env: The environment to apply the wrapper\n            shape: The shape of the resized observations\n        \"\"\"\n        super().__init__(env)\n        if isinstance(shape, int):\n            shape = (shape, shape)\n        assert all(x > 0 for x in shape), shape\n\n        self.shape = tuple(shape)\n\n        assert isinstance(\n            env.observation_space, Box\n        ), f\"Expected the observation space to be Box, actual type: {type(env.observation_space)}\"\n        obs_shape = self.shape + env.observation_space.shape[2:]\n        self.observation_space = Box(low=0, high=255, shape=obs_shape, dtype=np.uint8)\n\n    def observation(self, observation):\n        \"\"\"Updates the observations by resizing the observation to shape given by :attr:`shape`.\n\n        Args:\n            observation: The observation to reshape\n\n        Returns:\n            The reshaped observations\n\n        Raises:\n          
  DependencyNotInstalled: opencv-python is not installed\n        \"\"\"\n        try:\n            import cv2\n        except ImportError:\n            raise DependencyNotInstalled(\n                \"opencv is not installed, run `pip install gym[other]`\"\n            )\n\n        observation = cv2.resize(\n            observation, self.shape[::-1], interpolation=cv2.INTER_AREA\n        )\n        if observation.ndim == 2:\n            observation = np.expand_dims(observation, -1)\n        return observation\n"
  },
  {
    "path": "gym/wrappers/step_api_compatibility.py",
    "content": "\"\"\"Implementation of StepAPICompatibility wrapper class for transforming envs between new and old step API.\"\"\"\nimport gym\nfrom gym.logger import deprecation\nfrom gym.utils.step_api_compatibility import (\n    convert_to_done_step_api,\n    convert_to_terminated_truncated_step_api,\n)\n\n\nclass StepAPICompatibility(gym.Wrapper):\n    r\"\"\"A wrapper which can transform an environment from new step API to old and vice-versa.\n\n    Old step API refers to step() method returning (observation, reward, done, info)\n    New step API refers to step() method returning (observation, reward, terminated, truncated, info)\n    (Refer to docs for details on the API change)\n\n    Args:\n        env (gym.Env): the env to wrap. Can be in old or new API\n        output_truncation_bool (bool): Whether the wrapper's step method outputs two booleans (new API) or one boolean (old API). (True by default)\n\n    Examples:\n        >>> env = gym.make(\"CartPole-v1\")\n        >>> env # wrapper not applied by default, set to new API\n        <TimeLimit<OrderEnforcing<PassiveEnvChecker<CartPoleEnv<CartPole-v1>>>>>\n        >>> env = gym.make(\"CartPole-v1\", apply_api_compatibility=True) # set to old API\n        <StepAPICompatibility<TimeLimit<OrderEnforcing<PassiveEnvChecker<CartPoleEnv<CartPole-v1>>>>>>\n        >>> env = StepAPICompatibility(CustomEnv(), output_truncation_bool=False) # manually using wrapper on unregistered envs\n\n    \"\"\"\n\n    def __init__(self, env: gym.Env, output_truncation_bool: bool = True):\n        \"\"\"A wrapper which can transform an environment from new step API to old and vice-versa.\n\n        Args:\n            env (gym.Env): the env to wrap. 
Can be in old or new API\n            output_truncation_bool (bool): Whether the wrapper's step method outputs two booleans (new API) or one boolean (old API)\n        \"\"\"\n        super().__init__(env)\n        self.output_truncation_bool = output_truncation_bool\n        if not self.output_truncation_bool:\n            deprecation(\n                \"Initializing environment in old step API which returns one bool instead of two.\"\n            )\n\n    def step(self, action):\n        \"\"\"Steps through the environment, returning 5 or 4 items depending on `output_truncation_bool`.\n\n        Args:\n            action: action to step through the environment with\n\n        Returns:\n            (observation, reward, terminated, truncated, info) or (observation, reward, done, info)\n        \"\"\"\n        step_returns = self.env.step(action)\n        if self.output_truncation_bool:\n            return convert_to_terminated_truncated_step_api(step_returns)\n        else:\n            return convert_to_done_step_api(step_returns)\n"
  },
  {
    "path": "gym/wrappers/time_aware_observation.py",
    "content": "\"\"\"Wrapper for adding time aware observations to environment observation.\"\"\"\nimport numpy as np\n\nimport gym\nfrom gym.spaces import Box\n\n\nclass TimeAwareObservation(gym.ObservationWrapper):\n    \"\"\"Augment the observation with the current time step in the episode.\n\n    The observation space of the wrapped environment is assumed to be a flat :class:`Box`.\n    In particular, pixel observations are not supported. This wrapper will append the current timestep within the current episode to the observation.\n\n    Example:\n        >>> import gym\n        >>> env = gym.make('CartPole-v1')\n        >>> env = TimeAwareObservation(env)\n        >>> env.reset()\n        array([ 0.03810719,  0.03522411,  0.02231044, -0.01088205,  0.        ])\n        >>> env.step(env.action_space.sample())[0]\n        array([ 0.03881167, -0.16021058,  0.0220928 ,  0.28875574,  1.        ])\n    \"\"\"\n\n    def __init__(self, env: gym.Env):\n        \"\"\"Initialize :class:`TimeAwareObservation` that requires an environment with a flat :class:`Box` observation space.\n\n        Args:\n            env: The environment to apply the wrapper\n        \"\"\"\n        super().__init__(env)\n        assert isinstance(env.observation_space, Box)\n        assert env.observation_space.dtype == np.float32\n        low = np.append(self.observation_space.low, 0.0)\n        high = np.append(self.observation_space.high, np.inf)\n        self.observation_space = Box(low, high, dtype=np.float32)\n        self.is_vector_env = getattr(env, \"is_vector_env\", False)\n\n    def observation(self, observation):\n        \"\"\"Adds to the observation with the current time step.\n\n        Args:\n            observation: The observation to add the time step to\n\n        Returns:\n            The observation with the time step appended to\n        \"\"\"\n        return np.append(observation, self.t)\n\n    def step(self, action):\n        \"\"\"Steps through the environment, 
incrementing the time step.\n\n        Args:\n            action: The action to take\n\n        Returns:\n            The environment's step using the action.\n        \"\"\"\n        self.t += 1\n        return super().step(action)\n\n    def reset(self, **kwargs):\n        \"\"\"Reset the environment setting the time to zero.\n\n        Args:\n            **kwargs: Kwargs to apply to env.reset()\n\n        Returns:\n            The reset environment\n        \"\"\"\n        self.t = 0\n        return super().reset(**kwargs)\n"
  },
  {
    "path": "gym/wrappers/time_limit.py",
    "content": "\"\"\"Wrapper for limiting the time steps of an environment.\"\"\"\nfrom typing import Optional\n\nimport gym\n\n\nclass TimeLimit(gym.Wrapper):\n    \"\"\"This wrapper will issue a `truncated` signal if a maximum number of timesteps is exceeded.\n\n    If a truncation is not defined inside the environment itself, this is the only place that the truncation signal is issued.\n    Critically, this is different from the `terminated` signal that originates from the underlying environment as part of the MDP.\n\n    Example:\n       >>> from gym.envs.classic_control import CartPoleEnv\n       >>> from gym.wrappers import TimeLimit\n       >>> env = CartPoleEnv()\n       >>> env = TimeLimit(env, max_episode_steps=1000)\n    \"\"\"\n\n    def __init__(\n        self,\n        env: gym.Env,\n        max_episode_steps: Optional[int] = None,\n    ):\n        \"\"\"Initializes the :class:`TimeLimit` wrapper with an environment and the number of steps after which truncation will occur.\n\n        Args:\n            env: The environment to apply the wrapper\n            max_episode_steps: An optional max episode steps (if ``None``, ``env.spec.max_episode_steps`` is used)\n        \"\"\"\n        super().__init__(env)\n        if max_episode_steps is None and self.env.spec is not None:\n            max_episode_steps = env.spec.max_episode_steps\n        if self.env.spec is not None:\n            self.env.spec.max_episode_steps = max_episode_steps\n        self._max_episode_steps = max_episode_steps\n        self._elapsed_steps = None\n\n    def step(self, action):\n        \"\"\"Steps through the environment and if the number of steps elapsed exceeds ``max_episode_steps`` then truncate.\n\n        Args:\n            action: The environment step action\n\n        Returns:\n            The environment step ``(observation, reward, terminated, truncated, info)`` with `truncated=True`\n            if the number of steps elapsed >= max episode steps\n\n        \"\"\"\n  
      observation, reward, terminated, truncated, info = self.env.step(action)\n        self._elapsed_steps += 1\n\n        if self._elapsed_steps >= self._max_episode_steps:\n            truncated = True\n\n        return observation, reward, terminated, truncated, info\n\n    def reset(self, **kwargs):\n        \"\"\"Resets the environment with :param:`**kwargs` and sets the number of steps elapsed to zero.\n\n        Args:\n            **kwargs: The kwargs to reset the environment with\n\n        Returns:\n            The reset environment\n        \"\"\"\n        self._elapsed_steps = 0\n        return self.env.reset(**kwargs)\n"
  },
  {
    "path": "gym/wrappers/transform_observation.py",
    "content": "\"\"\"Wrapper for transforming observations.\"\"\"\nfrom typing import Any, Callable\n\nimport gym\n\n\nclass TransformObservation(gym.ObservationWrapper):\n    \"\"\"Transform the observation via an arbitrary function :attr:`f`.\n\n    The function :attr:`f` should be defined on the observation space of the base environment, ``env``, and should, ideally, return values in the same space.\n\n    If the transformation you wish to apply to observations returns values in a *different* space, you should subclass :class:`ObservationWrapper`, implement the transformation, and set the new observation space accordingly. If you were to use this wrapper instead, the observation space would be set incorrectly.\n\n    Example:\n        >>> import gym\n        >>> import numpy as np\n        >>> env = gym.make('CartPole-v1')\n        >>> env = TransformObservation(env, lambda obs: obs + 0.1*np.random.randn(*obs.shape))\n        >>> env.reset()\n        array([-0.08319338,  0.04635121, -0.07394746,  0.20877492])\n    \"\"\"\n\n    def __init__(self, env: gym.Env, f: Callable[[Any], Any]):\n        \"\"\"Initialize the :class:`TransformObservation` wrapper with an environment and a transform function :param:`f`.\n\n        Args:\n            env: The environment to apply the wrapper\n            f: A function that transforms the observation\n        \"\"\"\n        super().__init__(env)\n        assert callable(f)\n        self.f = f\n\n    def observation(self, observation):\n        \"\"\"Transforms the observations with callable :attr:`f`.\n\n        Args:\n            observation: The observation to transform\n\n        Returns:\n            The transformed observation\n        \"\"\"\n        return self.f(observation)\n"
  },
  {
    "path": "gym/wrappers/transform_reward.py",
    "content": "\"\"\"Wrapper for transforming the reward.\"\"\"\nfrom typing import Callable\n\nimport gym\nfrom gym import RewardWrapper\n\n\nclass TransformReward(RewardWrapper):\n    \"\"\"Transform the reward via an arbitrary function.\n\n    Warning:\n        If the base environment specifies a reward range which is not invariant under :attr:`f`, the :attr:`reward_range` of the wrapped environment will be incorrect.\n\n    Example:\n        >>> import gym\n        >>> env = gym.make('CartPole-v1')\n        >>> env = TransformReward(env, lambda r: 0.01*r)\n        >>> env.reset()\n        >>> observation, reward, terminated, truncated, info = env.step(env.action_space.sample())\n        >>> reward\n        0.01\n    \"\"\"\n\n    def __init__(self, env: gym.Env, f: Callable[[float], float]):\n        \"\"\"Initialize the :class:`TransformReward` wrapper with an environment and reward transform function :param:`f`.\n\n        Args:\n            env: The environment to apply the wrapper\n            f: A function that transforms the reward\n        \"\"\"\n        super().__init__(env)\n        assert callable(f)\n        self.f = f\n\n    def reward(self, reward):\n        \"\"\"Transforms the reward using callable :attr:`f`.\n\n        Args:\n            reward: The reward to transform\n\n        Returns:\n            The transformed reward\n        \"\"\"\n        return self.f(reward)\n"
  },
  {
    "path": "gym/wrappers/vector_list_info.py",
    "content": "\"\"\"Wrapper that converts the info format for vec envs into the list format.\"\"\"\n\nfrom typing import List\n\nimport gym\n\n\nclass VectorListInfo(gym.Wrapper):\n    \"\"\"Converts infos of vectorized environments from dict to List[dict].\n\n    This wrapper converts the info format of a\n    vector environment from a dictionary to a list of dictionaries.\n    This wrapper is intended to be used around vectorized\n    environments. If using other wrappers that perform\n    operation on info like `RecordEpisodeStatistics` this\n    need to be the outermost wrapper.\n\n    i.e. VectorListInfo(RecordEpisodeStatistics(envs))\n\n    Example::\n\n    >>> # actual\n    >>> {\n    ...      \"k\": np.array[0., 0., 0.5, 0.3],\n    ...      \"_k\": np.array[False, False, True, True]\n    ...  }\n    >>> # classic\n    >>> [{}, {}, {k: 0.5}, {k: 0.3}]\n\n    \"\"\"\n\n    def __init__(self, env):\n        \"\"\"This wrapper will convert the info into the list format.\n\n        Args:\n            env (Env): The environment to apply the wrapper\n        \"\"\"\n        assert getattr(\n            env, \"is_vector_env\", False\n        ), \"This wrapper can only be used in vectorized environments.\"\n        super().__init__(env)\n\n    def step(self, action):\n        \"\"\"Steps through the environment, convert dict info to list.\"\"\"\n        observation, reward, terminated, truncated, infos = self.env.step(action)\n        list_info = self._convert_info_to_list(infos)\n\n        return observation, reward, terminated, truncated, list_info\n\n    def reset(self, **kwargs):\n        \"\"\"Resets the environment using kwargs.\"\"\"\n        obs, infos = self.env.reset(**kwargs)\n        list_info = self._convert_info_to_list(infos)\n        return obs, list_info\n\n    def _convert_info_to_list(self, infos: dict) -> List[dict]:\n        \"\"\"Convert the dict info to list.\n\n        Convert the dict info of the vectorized environment\n        into a list 
of dictionaries where the i-th dictionary\n        has the info of the i-th environment.\n\n        Args:\n            infos (dict): info dict coming from the env.\n\n        Returns:\n            list_info (list): converted info.\n\n        \"\"\"\n        list_info = [{} for _ in range(self.num_envs)]\n        list_info = self._process_episode_statistics(infos, list_info)\n        for k in infos:\n            if k.startswith(\"_\"):\n                continue\n            for i, has_info in enumerate(infos[f\"_{k}\"]):\n                if has_info:\n                    list_info[i][k] = infos[k][i]\n        return list_info\n\n    def _process_episode_statistics(self, infos: dict, list_info: list) -> List[dict]:\n        \"\"\"Process episode statistics.\n\n        `RecordEpisodeStatistics` wrapper add extra\n        information to the info. This information are in\n        the form of a dict of dict. This method process these\n        information and add them to the info.\n        `RecordEpisodeStatistics` info contains the keys\n        \"r\", \"l\", \"t\" which represents \"cumulative reward\",\n        \"episode length\", \"elapsed time since instantiation of wrapper\".\n\n        Args:\n            infos (dict): infos coming from `RecordEpisodeStatistics`.\n            list_info (list): info of the current vectorized environment.\n\n        Returns:\n            list_info (list): updated info.\n\n        \"\"\"\n        episode_statistics = infos.pop(\"episode\", False)\n        if not episode_statistics:\n            return list_info\n\n        episode_statistics_mask = infos.pop(\"_episode\")\n        for i, has_info in enumerate(episode_statistics_mask):\n            if has_info:\n                list_info[i][\"episode\"] = {}\n                list_info[i][\"episode\"][\"r\"] = episode_statistics[\"r\"][i]\n                list_info[i][\"episode\"][\"l\"] = episode_statistics[\"l\"][i]\n                list_info[i][\"episode\"][\"t\"] = 
episode_statistics[\"t\"][i]\n\n        return list_info\n"
  },
  {
    "path": "py.Dockerfile",
    "content": "# A Dockerfile that sets up a full Gym install with test dependencies\nARG PYTHON_VERSION\nFROM python:$PYTHON_VERSION\n\nSHELL [\"/bin/bash\", \"-o\", \"pipefail\", \"-c\"]\n\nRUN apt-get -y update \\\n    && apt-get install --no-install-recommends -y \\\n    unzip \\\n    libglu1-mesa-dev \\\n    libgl1-mesa-dev \\\n    libosmesa6-dev \\\n    xvfb \\\n    patchelf \\\n    ffmpeg cmake \\\n    && apt-get autoremove -y \\\n    && apt-get clean \\\n    && rm -rf /var/lib/apt/lists/* \\\n    # Download mujoco\n    && mkdir /root/.mujoco \\\n    && cd /root/.mujoco \\\n    && wget -qO- 'https://github.com/deepmind/mujoco/releases/download/2.1.0/mujoco210-linux-x86_64.tar.gz' | tar -xzvf -\n\nENV LD_LIBRARY_PATH=\"$LD_LIBRARY_PATH:/root/.mujoco/mujoco210/bin\"\n\nCOPY . /usr/local/gym/\nWORKDIR /usr/local/gym/\n\nRUN if [ \"python:${PYTHON_VERSION}\" = \"python:3.6.15\" ] ; then pip install .[box2d,classic_control,toy_text,other] pytest==\"7.0.1\" --no-cache-dir; else pip install .[testing] --no-cache-dir; fi\n\nENTRYPOINT [\"/usr/local/gym/bin/docker_entrypoint\"]\n"
  },
  {
    "path": "pyproject.toml",
    "content": "[tool.pyright]\n\ninclude = [\n    \"gym/**\",\n    \"tests/**\"\n]\n\nexclude = [\n    \"**/node_modules\",\n    \"**/__pycache__\",\n]\n\nstrict = [\n\n]\n\ntypeCheckingMode = \"basic\"\npythonVersion = \"3.6\"\npythonPlatform = \"All\"\ntypeshedPath = \"typeshed\"\nenableTypeIgnoreComments = true\n\n# This is required as the CI pre-commit does not download the module (i.e. numpy, pygame, box2d)\n#   Therefore, we have to ignore missing imports\nreportMissingImports = \"none\"\n# Some modules are missing type stubs, which is an issue when running pyright locally\nreportMissingTypeStubs = false\n# For warning and error, will raise an error when\nreportInvalidTypeVarUse = \"none\"\n\n# reportUnknownMemberType = \"warning\"  # -> raises 6035 warnings\n# reportUnknownParameterType = \"warning\"  # -> raises 1327 warnings\n# reportUnknownVariableType = \"warning\"  # -> raises 2585 warnings\n# reportUnknownArgumentType = \"warning\"  # -> raises 2104 warnings\nreportGeneralTypeIssues = \"none\"  # -> commented out raises 489 errors\nreportUntypedFunctionDecorator = \"none\"  # -> pytest.mark.parameterize issues\n\nreportPrivateUsage = \"warning\"\nreportUnboundVariable = \"warning\"\n\n[tool.pytest.ini_options]\nfilterwarnings = ['ignore:.*step API.*:DeprecationWarning'] # TODO: to be removed when old step API is removed\n"
  },
  {
    "path": "requirements.txt",
    "content": "numpy>=1.18.0\ncloudpickle>=1.2.0\nimportlib_metadata>=4.8.0; python_version < '3.10'\ngym_notices>=0.0.4\ndataclasses==0.8; python_version == '3.6'\ntyping_extensions==4.3.0; python_version == '3.7'\nopencv-python>=3.0\nlz4>=3.1.0\nmatplotlib>=3.0\nbox2d-py==2.3.5\npygame==2.1.0\nale-py~=0.8.0\nmujoco==2.2.0\nmujoco_py<2.2,>=2.1\nimageio>=2.14.1"
  },
  {
    "path": "setup.py",
    "content": "\"\"\"Setups the project.\"\"\"\nimport itertools\nimport re\n\nfrom setuptools import find_packages, setup\n\nwith open(\"gym/version.py\") as file:\n    full_version = file.read()\n    assert (\n        re.match(r'VERSION = \"\\d\\.\\d+\\.\\d+\"\\n', full_version).group(0) == full_version\n    ), f\"Unexpected version: {full_version}\"\n    VERSION = re.search(r\"\\d\\.\\d+\\.\\d+\", full_version).group(0)\n\n# Environment-specific dependencies.\nextras = {\n    \"atari\": [\"ale-py~=0.8.0\"],\n    \"accept-rom-license\": [\"autorom[accept-rom-license]~=0.4.2\"],\n    \"box2d\": [\"box2d-py==2.3.5\", \"pygame==2.1.0\", \"swig==4.*\"],\n    \"classic_control\": [\"pygame==2.1.0\"],\n    \"mujoco_py\": [\"mujoco_py<2.2,>=2.1\"],\n    \"mujoco\": [\"mujoco==2.2\", \"imageio>=2.14.1\"],\n    \"toy_text\": [\"pygame==2.1.0\"],\n    \"other\": [\"lz4>=3.1.0\", \"opencv-python>=3.0\", \"matplotlib>=3.0\", \"moviepy>=1.0.0\"],\n}\n\n# Testing dependency groups.\ntesting_group = set(extras.keys()) - {\"accept-rom-license\", \"atari\"}\nextras[\"testing\"] = list(\n    set(itertools.chain.from_iterable(map(lambda group: extras[group], testing_group)))\n) + [\"pytest==7.0.1\"]\n\n# All dependency groups - accept rom license as requires user to run\nall_groups = set(extras.keys()) - {\"accept-rom-license\"}\nextras[\"all\"] = list(\n    set(itertools.chain.from_iterable(map(lambda group: extras[group], all_groups)))\n)\n\n# Uses the readme as the description on PyPI\nwith open(\"README.md\") as fh:\n    long_description = \"\"\n    header_count = 0\n    for line in fh:\n        if line.startswith(\"##\"):\n            header_count += 1\n        if header_count < 2:\n            long_description += line\n        else:\n            break\n\nsetup(\n    author=\"Gym Community\",\n    author_email=\"jkterry@umd.edu\",\n    classifiers=[\n        # Python 3.6 is minimally supported (only with basic gym environments and API)\n        \"Programming Language :: 
Python :: 3\",\n        \"Programming Language :: Python :: 3.6\",\n        \"Programming Language :: Python :: 3.7\",\n        \"Programming Language :: Python :: 3.8\",\n        \"Programming Language :: Python :: 3.9\",\n        \"Programming Language :: Python :: 3.10\",\n    ],\n    description=\"Gym: A universal API for reinforcement learning environments\",\n    extras_require=extras,\n    install_requires=[\n        \"numpy >= 1.18.0\",\n        \"cloudpickle >= 1.2.0\",\n        \"importlib_metadata >= 4.8.0; python_version < '3.10'\",\n        \"gym_notices >= 0.0.4\",\n        \"dataclasses == 0.8; python_version == '3.6'\",\n    ],\n    license=\"MIT\",\n    long_description=long_description,\n    long_description_content_type=\"text/markdown\",\n    name=\"gym\",\n    packages=[package for package in find_packages() if package.startswith(\"gym\")],\n    package_data={\n        \"gym\": [\n            \"envs/mujoco/assets/*.xml\",\n            \"envs/classic_control/assets/*.png\",\n            \"envs/toy_text/font/*.ttf\",\n            \"envs/toy_text/img/*.png\",\n            \"py.typed\",\n        ]\n    },\n    python_requires=\">=3.6\",\n    tests_require=extras[\"testing\"],\n    url=\"https://www.gymlibrary.dev/\",\n    version=VERSION,\n    zip_safe=False,\n)\n"
  },
  {
    "path": "test_requirements.txt",
    "content": "box2d-py==2.3.5\nlz4>=3.1.0\nopencv-python>=3.0\nmujoco==2.2.0\nmatplotlib>=3.0\nimageio>=2.14.1\npygame==2.1.0\nmujoco_py<2.2,>=2.1\npytest==7.0.1\n"
  },
  {
    "path": "tests/__init__.py",
    "content": ""
  },
  {
    "path": "tests/envs/__init__.py",
    "content": ""
  },
  {
    "path": "tests/envs/test_action_dim_check.py",
    "content": "import numpy as np\nimport pytest\n\nimport gym\nfrom gym import spaces\nfrom gym.envs.registration import EnvSpec\nfrom tests.envs.utils import all_testing_initialised_envs, mujoco_testing_env_specs\n\n\n@pytest.mark.parametrize(\n    \"env_spec\",\n    mujoco_testing_env_specs,\n    ids=[env_spec.id for env_spec in mujoco_testing_env_specs],\n)\ndef test_mujoco_action_dimensions(env_spec: EnvSpec):\n    \"\"\"Test that for all mujoco environment, mis-dimensioned actions, an error is raised.\n\n    Types of mis-dimensioned actions:\n     * Too few actions\n     * Too many actions\n     * Too few dimensions\n     * Too many dimensions\n     * Incorrect shape\n    \"\"\"\n    env = env_spec.make(disable_env_checker=True)\n    env.reset()\n\n    # Too few actions\n    with pytest.raises(ValueError, match=\"Action dimension mismatch\"):\n        env.step(env.action_space.sample()[1:])\n\n    # Too many actions\n    with pytest.raises(ValueError, match=\"Action dimension mismatch\"):\n        env.step(np.append(env.action_space.sample(), 0))\n\n    # Too few dimensions\n    with pytest.raises(ValueError, match=\"Action dimension mismatch\"):\n        env.step(0.1)\n\n    # Too many dimensions\n    with pytest.raises(ValueError, match=\"Action dimension mismatch\"):\n        env.step(np.expand_dims(env.action_space.sample(), 0))\n\n    # Incorrect shape\n    with pytest.raises(ValueError, match=\"Action dimension mismatch\"):\n        env.step(np.expand_dims(env.action_space.sample(), 1))\n\n    env.close()\n\n\nDISCRETE_ENVS = list(\n    filter(\n        lambda env: isinstance(env.action_space, spaces.Discrete),\n        all_testing_initialised_envs,\n    )\n)\n\n\n@pytest.mark.parametrize(\n    \"env\", DISCRETE_ENVS, ids=[env.spec.id for env in DISCRETE_ENVS]\n)\ndef test_discrete_actions_out_of_bound(env: gym.Env):\n    \"\"\"Test out of bound actions in Discrete action_space.\n\n    In discrete action_space environments, `out-of-bound`\n    actions 
are not allowed and should raise an exception.\n\n    Args:\n        env (gym.Env): the gym environment\n    \"\"\"\n    assert isinstance(env.action_space, spaces.Discrete)\n    upper_bound = env.action_space.start + env.action_space.n - 1\n\n    env.reset()\n    with pytest.raises(Exception):\n        env.step(upper_bound + 1)\n\n    env.close()\n\n\nBOX_ENVS = list(\n    filter(\n        lambda env: isinstance(env.action_space, spaces.Box),\n        all_testing_initialised_envs,\n    )\n)\nOOB_VALUE = 100\n\n\n@pytest.mark.parametrize(\"env\", BOX_ENVS, ids=[env.spec.id for env in BOX_ENVS])\ndef test_box_actions_out_of_bound(env: gym.Env):\n    \"\"\"Test out of bound actions in Box action_space.\n\n    Environments with Box actions spaces perform clipping inside `step`.\n    The expected behaviour is that an action `out-of-bound` has the same effect\n    of an action with value exactly at the upper (or lower) bound.\n\n    Args:\n        env (gym.Env): the gym environment\n    \"\"\"\n    env.reset(seed=42)\n\n    oob_env = gym.make(env.spec.id, disable_env_checker=True)\n    oob_env.reset(seed=42)\n\n    assert isinstance(env.action_space, spaces.Box)\n    dtype = env.action_space.dtype\n    upper_bounds = env.action_space.high\n    lower_bounds = env.action_space.low\n\n    for i, (is_upper_bound, is_lower_bound) in enumerate(\n        zip(env.action_space.bounded_above, env.action_space.bounded_below)\n    ):\n        if is_upper_bound:\n            obs, _, _, _, _ = env.step(upper_bounds)\n            oob_action = upper_bounds.copy()\n            oob_action[i] += np.cast[dtype](OOB_VALUE)\n\n            assert oob_action[i] > upper_bounds[i]\n            oob_obs, _, _, _, _ = oob_env.step(oob_action)\n\n            assert np.alltrue(obs == oob_obs)\n\n        if is_lower_bound:\n            obs, _, _, _, _ = env.step(\n                lower_bounds\n            )  # `env` is unwrapped, and in new step API\n            oob_action = lower_bounds.copy()\n      
      oob_action[i] -= np.cast[dtype](OOB_VALUE)\n\n            assert oob_action[i] < lower_bounds[i]\n            oob_obs, _, _, _, _ = oob_env.step(oob_action)\n\n            assert np.alltrue(obs == oob_obs)\n\n    env.close()\n"
  },
  {
    "path": "tests/envs/test_compatibility.py",
    "content": "import sys\nfrom typing import Any, Dict, Optional, Tuple\n\nimport numpy as np\n\nimport gym\nfrom gym.spaces import Discrete\nfrom gym.wrappers.compatibility import EnvCompatibility, LegacyEnv\n\n\nclass LegacyEnvExplicit(LegacyEnv, gym.Env):\n    \"\"\"Legacy env that explicitly implements the old API.\"\"\"\n\n    observation_space = Discrete(1)\n    action_space = Discrete(1)\n    metadata = {\"render.modes\": [\"human\", \"rgb_array\"]}\n\n    def __init__(self):\n        pass\n\n    def reset(self):\n        return 0\n\n    def step(self, action):\n        return 0, 0, False, {}\n\n    def render(self, mode=\"human\"):\n        if mode == \"human\":\n            return\n        elif mode == \"rgb_array\":\n            return np.zeros((1, 1, 3), dtype=np.uint8)\n\n    def close(self):\n        pass\n\n    def seed(self, seed=None):\n        pass\n\n\nclass LegacyEnvImplicit(gym.Env):\n    \"\"\"Legacy env that implicitly implements the old API as a protocol.\"\"\"\n\n    observation_space = Discrete(1)\n    action_space = Discrete(1)\n    metadata = {\"render.modes\": [\"human\", \"rgb_array\"]}\n\n    def __init__(self):\n        pass\n\n    def reset(self):  # type: ignore\n        return 0  # type: ignore\n\n    def step(self, action: Any) -> Tuple[int, float, bool, Dict]:\n        return 0, 0.0, False, {}\n\n    def render(self, mode: Optional[str] = \"human\") -> Any:\n        if mode == \"human\":\n            return\n        elif mode == \"rgb_array\":\n            return np.zeros((1, 1, 3), dtype=np.uint8)\n\n    def close(self):\n        pass\n\n    def seed(self, seed: Optional[int] = None):\n        pass\n\n\ndef test_explicit():\n    old_env = LegacyEnvExplicit()\n    assert isinstance(old_env, LegacyEnv)\n    env = EnvCompatibility(old_env, render_mode=\"rgb_array\")\n    assert env.observation_space == Discrete(1)\n    assert env.action_space == Discrete(1)\n    assert env.reset() == (0, {})\n    assert env.reset(seed=0, 
options={\"some\": \"option\"}) == (0, {})\n    assert env.step(0) == (0, 0, False, False, {})\n    assert env.render().shape == (1, 1, 3)\n    env.close()\n\n\ndef test_implicit():\n    old_env = LegacyEnvImplicit()\n    if sys.version_info >= (3, 7):\n        # We need to give up on typing in Python 3.6\n        assert isinstance(old_env, LegacyEnv)\n    env = EnvCompatibility(old_env, render_mode=\"rgb_array\")\n    assert env.observation_space == Discrete(1)\n    assert env.action_space == Discrete(1)\n    assert env.reset() == (0, {})\n    assert env.reset(seed=0, options={\"some\": \"option\"}) == (0, {})\n    assert env.step(0) == (0, 0, False, False, {})\n    assert env.render().shape == (1, 1, 3)\n    env.close()\n\n\ndef test_make_compatibility_in_spec():\n    gym.register(\n        id=\"LegacyTestEnv-v0\",\n        entry_point=LegacyEnvExplicit,\n        apply_api_compatibility=True,\n    )\n    env = gym.make(\"LegacyTestEnv-v0\", render_mode=\"rgb_array\")\n    assert env.observation_space == Discrete(1)\n    assert env.action_space == Discrete(1)\n    assert env.reset() == (0, {})\n    assert env.reset(seed=0, options={\"some\": \"option\"}) == (0, {})\n    assert env.step(0) == (0, 0, False, False, {})\n    img = env.render()\n    assert isinstance(img, np.ndarray)\n    assert img.shape == (1, 1, 3)  # type: ignore\n    env.close()\n    del gym.envs.registration.registry[\"LegacyTestEnv-v0\"]\n\n\ndef test_make_compatibility_in_make():\n    gym.register(id=\"LegacyTestEnv-v0\", entry_point=LegacyEnvExplicit)\n    env = gym.make(\n        \"LegacyTestEnv-v0\", apply_api_compatibility=True, render_mode=\"rgb_array\"\n    )\n    assert env.observation_space == Discrete(1)\n    assert env.action_space == Discrete(1)\n    assert env.reset() == (0, {})\n    assert env.reset(seed=0, options={\"some\": \"option\"}) == (0, {})\n    assert env.step(0) == (0, 0, False, False, {})\n    img = env.render()\n    assert isinstance(img, np.ndarray)\n    assert 
img.shape == (1, 1, 3)  # type: ignore\n    env.close()\n    del gym.envs.registration.registry[\"LegacyTestEnv-v0\"]\n"
  },
  {
    "path": "tests/envs/test_env_implementation.py",
    "content": "from typing import Optional\n\nimport numpy as np\nimport pytest\n\nimport gym\nfrom gym.envs.box2d import BipedalWalker\nfrom gym.envs.box2d.lunar_lander import demo_heuristic_lander\nfrom gym.envs.toy_text import TaxiEnv\nfrom gym.envs.toy_text.frozen_lake import generate_random_map\n\n\ndef test_lunar_lander_heuristics():\n    \"\"\"Tests the LunarLander environment by checking if the heuristic lander works.\"\"\"\n    lunar_lander = gym.make(\"LunarLander-v2\", disable_env_checker=True)\n    total_reward = demo_heuristic_lander(lunar_lander, seed=1)\n    assert total_reward > 100\n\n\ndef test_carracing_domain_randomize():\n    \"\"\"Tests the CarRacing Environment domain randomization.\n\n    CarRacing DomainRandomize should have different colours at every reset.\n    However, it should have same colours when `options={\"randomize\": False}` is given to reset.\n    \"\"\"\n    env = gym.make(\"CarRacing-v2\", domain_randomize=True)\n\n    road_color = env.road_color\n    bg_color = env.bg_color\n    grass_color = env.grass_color\n\n    env.reset(options={\"randomize\": False})\n\n    assert (\n        road_color == env.road_color\n    ).all(), f\"Have different road color after reset with randomize turned off. Before: {road_color}, after: {env.road_color}.\"\n    assert (\n        bg_color == env.bg_color\n    ).all(), f\"Have different bg color after reset with randomize turned off. Before: {bg_color}, after: {env.bg_color}.\"\n    assert (\n        grass_color == env.grass_color\n    ).all(), f\"Have different grass color after reset with randomize turned off. Before: {grass_color}, after: {env.grass_color}.\"\n\n    env.reset()\n\n    assert (\n        road_color != env.road_color\n    ).all(), f\"Have same road color after reset. Before: {road_color}, after: {env.road_color}.\"\n    assert (\n        bg_color != env.bg_color\n    ).all(), (\n        f\"Have same bg color after reset. 
Before: {bg_color}, after: {env.bg_color}.\"\n    )\n    assert (\n        grass_color != env.grass_color\n    ).all(), f\"Have same grass color after reset. Before: {grass_color}, after: {env.grass_color}.\"\n\n\n@pytest.mark.parametrize(\"seed\", range(5))\ndef test_bipedal_walker_hardcore_creation(seed: int):\n    \"\"\"Test BipedalWalker hardcore creation.\n\n    BipedalWalker with `hardcore=True` should have ladders,\n    stumps and pitfalls. A convenient way to identify if ladders,\n    stumps and pitfalls are created is checking whether the terrain\n    has that particular terrain color.\n\n    Args:\n        seed (int): environment seed\n    \"\"\"\n    HC_TERRAINS_COLOR1 = (255, 255, 255)\n    HC_TERRAINS_COLOR2 = (153, 153, 153)\n\n    env = gym.make(\"BipedalWalker-v3\", disable_env_checker=True).unwrapped\n    hc_env = gym.make(\"BipedalWalkerHardcore-v3\", disable_env_checker=True).unwrapped\n    assert isinstance(env, BipedalWalker) and isinstance(hc_env, BipedalWalker)\n    assert env.hardcore is False and hc_env.hardcore is True\n\n    env.reset(seed=seed)\n    hc_env.reset(seed=seed)\n\n    for terrain in env.terrain:\n        assert terrain.color1 != HC_TERRAINS_COLOR1\n        assert terrain.color2 != HC_TERRAINS_COLOR2\n\n    hc_terrains_color1_count = 0\n    hc_terrains_color2_count = 0\n    for terrain in hc_env.terrain:\n        if terrain.color1 == HC_TERRAINS_COLOR1:\n            hc_terrains_color1_count += 1\n        if terrain.color2 == HC_TERRAINS_COLOR2:\n            hc_terrains_color2_count += 1\n\n    assert hc_terrains_color1_count > 0\n    assert hc_terrains_color2_count > 0\n\n\n@pytest.mark.parametrize(\"map_size\", [5, 10, 16])\ndef test_frozenlake_dfs_map_generation(map_size: int):\n    \"\"\"Frozenlake has the ability to generate random maps.\n\n    This function checks that the random maps will always be possible to solve for sizes 5, 10, 16.\n    Currently only 8x8 maps can be generated.\n    \"\"\"\n    new_frozenlake = 
generate_random_map(map_size)\n    assert len(new_frozenlake) == map_size\n    assert len(new_frozenlake[0]) == map_size\n\n    # Runs a depth first search through the map to find the path.\n    directions = [(1, 0), (0, 1), (-1, 0), (0, -1)]\n    frontier, discovered = [], set()\n    frontier.append((0, 0))\n    while frontier:\n        row, col = frontier.pop()\n        if (row, col) not in discovered:\n            discovered.add((row, col))\n\n            for row_direction, col_direction in directions:\n                new_row = row + row_direction\n                new_col = col + col_direction\n                if 0 <= new_row < map_size and 0 <= new_col < map_size:\n                    if new_frozenlake[new_row][new_col] == \"G\":\n                        return  # Successful, a route through the map was found\n                    if new_frozenlake[new_row][new_col] not in \"#H\":\n                        frontier.append((new_row, new_col))\n    raise AssertionError(\"No path through the frozenlake was found.\")\n\n\ndef test_taxi_action_mask():\n    env = TaxiEnv()\n\n    for state in env.P:\n        mask = env.action_mask(state)\n        for action, possible in enumerate(mask):\n            _, next_state, _, _ = env.P[state][action][0]\n            assert state != next_state if possible else state == next_state\n\n\ndef test_taxi_encode_decode():\n    env = TaxiEnv()\n\n    state, info = env.reset()\n    for _ in range(100):\n        assert (\n            env.encode(*env.decode(state)) == state\n        ), f\"state={state}, encode(decode(state))={env.encode(*env.decode(state))}\"\n        state, _, _, _, _ = env.step(env.action_space.sample())\n\n\n@pytest.mark.parametrize(\n    \"env_name\",\n    [\"Acrobot-v1\", \"CartPole-v1\", \"MountainCar-v0\", \"MountainCarContinuous-v0\"],\n)\n@pytest.mark.parametrize(\n    \"low_high\", [None, (-0.4, 0.4), (np.array(-0.4), np.array(0.4))]\n)\ndef test_customizable_resets(env_name: str, low_high: Optional[list]):\n    
env = gym.make(env_name)\n    env.action_space.seed(0)\n    # First ensure we can do a reset.\n    if low_high is None:\n        env.reset()\n    else:\n        low, high = low_high\n        env.reset(options={\"low\": low, \"high\": high})\n        assert np.all((env.state >= low) & (env.state <= high))\n    # Make sure we can take a step.\n    env.step(env.action_space.sample())\n\n\n# We test Pendulum separately, as the parameters are handled differently.\n@pytest.mark.parametrize(\n    \"low_high\",\n    [\n        None,\n        (1.2, 1.0),\n        (np.array(1.2), np.array(1.0)),\n    ],\n)\ndef test_customizable_pendulum_resets(low_high: Optional[list]):\n    env = gym.make(\"Pendulum-v1\")\n    env.action_space.seed(0)\n    # First ensure we can do a reset and the values are within expected ranges.\n    if low_high is None:\n        env.reset()\n    else:\n        low, high = low_high\n        # Pendulum is initialized a little differently than the other\n        # environments, where we specify the x and y values for the upper\n        # limit (and lower limit is just the negative of it).\n        env.reset(options={\"x_init\": low, \"y_init\": high})\n    # Make sure we can take a step.\n    env.step(env.action_space.sample())\n\n\n@pytest.mark.parametrize(\n    \"env_name\",\n    [\"Acrobot-v1\", \"CartPole-v1\", \"MountainCar-v0\", \"MountainCarContinuous-v0\"],\n)\n@pytest.mark.parametrize(\n    \"low_high\",\n    [\n        (\"x\", \"y\"),\n        (10.0, 8.0),\n        ([-1.0, -1.0], [1.0, 1.0]),\n        (np.array([-1.0, -1.0]), np.array([1.0, 1.0])),\n    ],\n)\ndef test_invalid_customizable_resets(env_name: str, low_high: list):\n    env = gym.make(env_name)\n    low, high = low_high\n    with pytest.raises(ValueError):\n        # match=re.escape(f\"Lower bound ({low}) must be lower than higher bound ({high}).\")\n        # match=f\"An option ({x}) could not be converted to a float.\"\n        env.reset(options={\"low\": low, \"high\": high})\n"
  },
  {
    "path": "tests/envs/test_envs.py",
    "content": "import pickle\nimport warnings\n\nimport numpy as np\nimport pytest\n\nimport gym\nfrom gym.envs.registration import EnvSpec\nfrom gym.logger import warn\nfrom gym.utils.env_checker import check_env, data_equivalence\nfrom tests.envs.utils import (\n    all_testing_env_specs,\n    all_testing_initialised_envs,\n    assert_equals,\n)\n\n# This runs a smoketest on each official registered env. We may want\n# to try also running environments which are not officially registered envs.\nPASSIVE_CHECK_IGNORE_WARNING = [\n    f\"\\x1b[33mWARN: {message}\\x1b[0m\"\n    for message in [\n        \"This version of the mujoco environments depends on the mujoco-py bindings, which are no longer maintained and may stop working. Please upgrade to the v4 versions of the environments (which depend on the mujoco python bindings instead), unless you are trying to precisely replicate previous works).\",\n        \"Initializing environment in done (old) step API which returns one bool instead of two.\",\n    ]\n]\n\nCHECK_ENV_IGNORE_WARNINGS = [\n    f\"\\x1b[33mWARN: {message}\\x1b[0m\"\n    for message in [\n        \"This version of the mujoco environments depends on the mujoco-py bindings, which are no longer maintained and may stop working. Please upgrade to the v4 versions of the environments (which depend on the mujoco python bindings instead), unless you are trying to precisely replicate previous works).\",\n        \"A Box observation space minimum value is -infinity. This is probably too low.\",\n        \"A Box observation space maximum value is -infinity. This is probably too high.\",\n        \"For Box action spaces, we recommend using a symmetric and normalized space (range=[-1, 1] or [0, 1]). 
See https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html for more information.\",\n    ]\n]\n\n\n@pytest.mark.parametrize(\n    \"spec\", all_testing_env_specs, ids=[spec.id for spec in all_testing_env_specs]\n)\ndef test_envs_pass_env_checker(spec):\n    \"\"\"Check that all environments pass the environment checker with no warnings other than the expected.\"\"\"\n    with warnings.catch_warnings(record=True) as caught_warnings:\n        env = spec.make(disable_env_checker=True).unwrapped\n        check_env(env)\n\n        env.close()\n\n    for warning in caught_warnings:\n        if warning.message.args[0] not in CHECK_ENV_IGNORE_WARNINGS:\n            raise gym.error.Error(f\"Unexpected warning: {warning.message}\")\n\n\n# Note that this precludes running this test in multiple threads.\n# However, we probably already can't do multithreading due to some environments.\nSEED = 0\nNUM_STEPS = 50\n\n\n@pytest.mark.parametrize(\n    \"env_spec\", all_testing_env_specs, ids=[env.id for env in all_testing_env_specs]\n)\ndef test_env_determinism_rollout(env_spec: EnvSpec):\n    \"\"\"Run a rollout with two environments and assert equality.\n\n    This test run a rollout of NUM_STEPS steps with two environments\n    initialized with the same seed and assert that:\n\n    - observation after first reset are the same\n    - same actions are sampled by the two envs\n    - observations are contained in the observation space\n    - obs, rew, done and info are equals between the two envs\n    \"\"\"\n    # Don't check rollout equality if it's a nondeterministic environment.\n    if env_spec.nondeterministic is True:\n        return\n\n    env_1 = env_spec.make(disable_env_checker=True)\n    env_2 = env_spec.make(disable_env_checker=True)\n\n    initial_obs_1, initial_info_1 = env_1.reset(seed=SEED)\n    initial_obs_2, initial_info_2 = env_2.reset(seed=SEED)\n    assert_equals(initial_obs_1, initial_obs_2)\n\n    env_1.action_space.seed(SEED)\n\n    for 
time_step in range(NUM_STEPS):\n        # We don't evaluate the determinism of actions\n        action = env_1.action_space.sample()\n\n        obs_1, rew_1, terminated_1, truncated_1, info_1 = env_1.step(action)\n        obs_2, rew_2, terminated_2, truncated_2, info_2 = env_2.step(action)\n\n        assert_equals(obs_1, obs_2, f\"[{time_step}] \")\n        assert env_1.observation_space.contains(\n            obs_1\n        )  # obs_2 verified by previous assertion\n\n        assert rew_1 == rew_2, f\"[{time_step}] reward 1={rew_1}, reward 2={rew_2}\"\n        assert (\n            terminated_1 == terminated_2\n        ), f\"[{time_step}] terminated 1={terminated_1}, terminated 2={terminated_2}\"\n        assert (\n            truncated_1 == truncated_2\n        ), f\"[{time_step}] truncated 1={truncated_1}, truncated 2={truncated_2}\"\n        assert_equals(info_1, info_2, f\"[{time_step}] \")\n\n        if (\n            terminated_1 or truncated_1\n        ):  # terminated_2, truncated_2 verified by previous assertion\n            env_1.reset(seed=SEED)\n            env_2.reset(seed=SEED)\n\n    env_1.close()\n    env_2.close()\n\n\ndef check_rendered(rendered_frame, mode: str):\n    \"\"\"Check that the rendered frame is as expected.\"\"\"\n    if mode == \"rgb_array_list\":\n        assert isinstance(rendered_frame, list)\n        for frame in rendered_frame:\n            check_rendered(frame, \"rgb_array\")\n    elif mode == \"rgb_array\":\n        assert isinstance(rendered_frame, np.ndarray)\n        assert len(rendered_frame.shape) == 3\n        assert rendered_frame.shape[2] == 3\n        assert np.all(rendered_frame >= 0) and np.all(rendered_frame <= 255)\n    elif mode == \"ansi\":\n        assert isinstance(rendered_frame, str)\n        assert len(rendered_frame) > 0\n    elif mode == \"state_pixels_list\":\n        assert isinstance(rendered_frame, list)\n        for frame in rendered_frame:\n            check_rendered(frame, \"rgb_array\")\n    elif mode == 
\"state_pixels\":\n        check_rendered(rendered_frame, \"rgb_array\")\n    elif mode == \"depth_array_list\":\n        assert isinstance(rendered_frame, list)\n        for frame in rendered_frame:\n            check_rendered(frame, \"depth_array\")\n    elif mode == \"depth_array\":\n        assert isinstance(rendered_frame, np.ndarray)\n        assert len(rendered_frame.shape) == 2\n    else:\n        warn(\n            f\"Unknown render mode: {mode}, cannot check that the rendered data is correct. Add case to `check_rendered`\"\n        )\n\n\nnon_mujoco_py_env_specs = [\n    spec\n    for spec in all_testing_env_specs\n    if \"mujoco\" not in spec.entry_point or \"v4\" in spec.id\n]\n\n\n@pytest.mark.parametrize(\n    \"spec\", non_mujoco_py_env_specs, ids=[spec.id for spec in non_mujoco_py_env_specs]\n)\ndef test_render_modes(spec):\n    \"\"\"There is a known issue where rendering a mujoco environment then mujoco-py will cause an error on non-mac based systems.\n\n    Therefore, we are only testing with mujoco environments.\n    \"\"\"\n    env = spec.make()\n\n    assert \"rgb_array\" in env.metadata[\"render_modes\"]\n    assert \"human\" in env.metadata[\"render_modes\"]\n\n    for mode in env.metadata[\"render_modes\"]:\n        if mode != \"human\":\n            new_env = spec.make(render_mode=mode)\n\n            new_env.reset()\n            rendered = new_env.render()\n            check_rendered(rendered, mode)\n\n            new_env.step(new_env.action_space.sample())\n            rendered = new_env.render()\n            check_rendered(rendered, mode)\n\n            new_env.close()\n    env.close()\n\n\n@pytest.mark.parametrize(\n    \"env\",\n    all_testing_initialised_envs,\n    ids=[env.spec.id for env in all_testing_initialised_envs],\n)\ndef test_pickle_env(env: gym.Env):\n    pickled_env = pickle.loads(pickle.dumps(env))\n\n    data_equivalence(env.reset(), pickled_env.reset())\n\n    action = env.action_space.sample()\n    
data_equivalence(env.step(action), pickled_env.step(action))\n    env.close()\n    pickled_env.close()\n"
  },
  {
    "path": "tests/envs/test_make.py",
    "content": "\"\"\"Tests that gym.make works as expected.\"\"\"\n\nimport re\nimport warnings\nfrom copy import deepcopy\n\nimport numpy as np\nimport pytest\n\nimport gym\nfrom gym.envs.classic_control import cartpole\nfrom gym.wrappers import AutoResetWrapper, HumanRendering, OrderEnforcing, TimeLimit\nfrom gym.wrappers.env_checker import PassiveEnvChecker\nfrom tests.envs.test_envs import PASSIVE_CHECK_IGNORE_WARNING\nfrom tests.envs.utils import all_testing_env_specs\nfrom tests.envs.utils_envs import ArgumentEnv, RegisterDuringMakeEnv\nfrom tests.testing_env import GenericTestEnv, old_step_fn\nfrom tests.wrappers.utils import has_wrapper\n\ngym.register(\n    \"RegisterDuringMakeEnv-v0\",\n    entry_point=\"tests.envs.utils_envs:RegisterDuringMakeEnv\",\n)\n\ngym.register(\n    id=\"test.ArgumentEnv-v0\",\n    entry_point=\"tests.envs.utils_envs:ArgumentEnv\",\n    kwargs={\n        \"arg1\": \"arg1\",\n        \"arg2\": \"arg2\",\n    },\n)\n\ngym.register(\n    id=\"test/NoHuman-v0\",\n    entry_point=\"tests.envs.utils_envs:NoHuman\",\n)\ngym.register(\n    id=\"test/NoHumanOldAPI-v0\",\n    entry_point=\"tests.envs.utils_envs:NoHumanOldAPI\",\n)\n\ngym.register(\n    id=\"test/NoHumanNoRGB-v0\",\n    entry_point=\"tests.envs.utils_envs:NoHumanNoRGB\",\n)\n\n\ndef test_make():\n    env = gym.make(\"CartPole-v1\", disable_env_checker=True)\n    assert env.spec.id == \"CartPole-v1\"\n    assert isinstance(env.unwrapped, cartpole.CartPoleEnv)\n    env.close()\n\n\ndef test_make_deprecated():\n    with warnings.catch_warnings(record=True):\n        with pytest.raises(\n            gym.error.Error,\n            match=re.escape(\n                \"Environment version v0 for `Humanoid` is deprecated. 
Please use `Humanoid-v4` instead.\"\n            ),\n        ):\n            gym.make(\"Humanoid-v0\", disable_env_checker=True)\n\n\ndef test_make_max_episode_steps():\n    # Default, uses the spec's\n    env = gym.make(\"CartPole-v1\", disable_env_checker=True)\n    assert has_wrapper(env, TimeLimit)\n    assert (\n        env.spec.max_episode_steps == gym.envs.registry[\"CartPole-v1\"].max_episode_steps\n    )\n    env.close()\n\n    # Custom max episode steps\n    env = gym.make(\"CartPole-v1\", max_episode_steps=100, disable_env_checker=True)\n    assert has_wrapper(env, TimeLimit)\n    assert env.spec.max_episode_steps == 100\n    env.close()\n\n    # Env spec has no max episode steps\n    assert gym.spec(\"test.ArgumentEnv-v0\").max_episode_steps is None\n    env = gym.make(\n        \"test.ArgumentEnv-v0\", arg1=None, arg2=None, arg3=None, disable_env_checker=True\n    )\n    assert has_wrapper(env, TimeLimit) is False\n    env.close()\n\n\ndef test_gym_make_autoreset():\n    \"\"\"Tests that `gym.make` autoreset wrapper is applied only when `gym.make(..., autoreset=True)`.\"\"\"\n    env = gym.make(\"CartPole-v1\", disable_env_checker=True)\n    assert has_wrapper(env, AutoResetWrapper) is False\n    env.close()\n\n    env = gym.make(\"CartPole-v1\", autoreset=False, disable_env_checker=True)\n    assert has_wrapper(env, AutoResetWrapper) is False\n    env.close()\n\n    env = gym.make(\"CartPole-v1\", autoreset=True)\n    assert has_wrapper(env, AutoResetWrapper)\n    env.close()\n\n\ndef test_make_disable_env_checker():\n    \"\"\"Tests that `gym.make` disable env checker is applied only when `gym.make(..., disable_env_checker=False)`.\"\"\"\n    spec = deepcopy(gym.spec(\"CartPole-v1\"))\n\n    # Test with spec disable env checker\n    spec.disable_env_checker = False\n    env = gym.make(spec)\n    assert has_wrapper(env, PassiveEnvChecker)\n    env.close()\n\n    # Test with overwritten spec using make disable env checker\n    assert 
spec.disable_env_checker is False\n    env = gym.make(spec, disable_env_checker=True)\n    assert has_wrapper(env, PassiveEnvChecker) is False\n    env.close()\n\n    # Test with spec enabled disable env checker\n    spec.disable_env_checker = True\n    env = gym.make(spec)\n    assert has_wrapper(env, PassiveEnvChecker) is False\n    env.close()\n\n    # Test with overwritten spec using make disable env checker\n    assert spec.disable_env_checker is True\n    env = gym.make(spec, disable_env_checker=False)\n    assert has_wrapper(env, PassiveEnvChecker)\n    env.close()\n\n\ndef test_apply_api_compatibility():\n    gym.register(\n        \"testing-old-env\",\n        lambda: GenericTestEnv(step_fn=old_step_fn),\n        apply_api_compatibility=True,\n        max_episode_steps=3,\n    )\n    env = gym.make(\"testing-old-env\")\n\n    env.reset()\n    assert len(env.step(env.action_space.sample())) == 5\n    env.step(env.action_space.sample())\n    _, _, termination, truncation, _ = env.step(env.action_space.sample())\n    assert termination is False and truncation is True\n\n    gym.spec(\"testing-old-env\").apply_api_compatibility = False\n    env = gym.make(\"testing-old-env\")\n    # Cannot run reset and step as will not work\n\n    env = gym.make(\"testing-old-env\", apply_api_compatibility=True)\n\n    env.reset()\n    assert len(env.step(env.action_space.sample())) == 5\n    env.step(env.action_space.sample())\n    _, _, termination, truncation, _ = env.step(env.action_space.sample())\n    assert termination is False and truncation is True\n\n    gym.envs.registry.pop(\"testing-old-env\")\n\n\n@pytest.mark.parametrize(\n    \"spec\", all_testing_env_specs, ids=[spec.id for spec in all_testing_env_specs]\n)\ndef test_passive_checker_wrapper_warnings(spec):\n    with warnings.catch_warnings(record=True) as caught_warnings:\n        env = gym.make(spec)  # disable_env_checker=False\n        env.reset()\n        env.step(env.action_space.sample())\n        # 
todo, add check for render, bugged due to mujoco v2/3 and v4 envs\n\n        env.close()\n\n    for warning in caught_warnings:\n        if warning.message.args[0] not in PASSIVE_CHECK_IGNORE_WARNING:\n            raise gym.error.Error(f\"Unexpected warning: {warning.message}\")\n\n\ndef test_make_order_enforcing():\n    \"\"\"Checks that gym.make wraps the environment with the OrderEnforcing wrapper.\"\"\"\n    assert all(spec.order_enforce is True for spec in all_testing_env_specs)\n\n    env = gym.make(\"CartPole-v1\", disable_env_checker=True)\n    assert has_wrapper(env, OrderEnforcing)\n    # We can assume that all other specs will also have the OrderEnforcing wrapper\n    env.close()\n\n    gym.register(\n        id=\"test.OrderlessArgumentEnv-v0\",\n        entry_point=\"tests.envs.utils_envs:ArgumentEnv\",\n        order_enforce=False,\n        kwargs={\"arg1\": None, \"arg2\": None, \"arg3\": None},\n    )\n\n    env = gym.make(\"test.OrderlessArgumentEnv-v0\", disable_env_checker=True)\n    assert has_wrapper(env, OrderEnforcing) is False\n    env.close()\n\n\ndef test_make_render_mode():\n    env = gym.make(\"CartPole-v1\", disable_env_checker=True)\n    assert env.render_mode is None\n    env.close()\n\n    # Make sure that render_mode is applied correctly\n    env = gym.make(\n        \"CartPole-v1\", render_mode=\"rgb_array_list\", disable_env_checker=True\n    )\n    assert env.render_mode == \"rgb_array_list\"\n    env.reset()\n    renders = env.render()\n    assert isinstance(\n        renders, list\n    )  # Make sure that the `render` method does what it is supposed to\n    assert isinstance(renders[0], np.ndarray)\n    env.close()\n\n    env = gym.make(\"CartPole-v1\", render_mode=None, disable_env_checker=True)\n    assert env.render_mode is None\n    valid_render_modes = env.metadata[\"render_modes\"]\n    env.close()\n\n    assert len(valid_render_modes) > 0\n    with warnings.catch_warnings(record=True) as caught_warnings:\n        env = 
gym.make(\n            \"CartPole-v1\", render_mode=valid_render_modes[0], disable_env_checker=True\n        )\n        assert env.render_mode == valid_render_modes[0]\n        env.close()\n\n    for warning in caught_warnings:\n        raise gym.error.Error(f\"Unexpected warning: {warning.message}\")\n\n    # Make sure that native rendering is used when possible\n    env = gym.make(\"CartPole-v1\", render_mode=\"human\", disable_env_checker=True)\n    assert not has_wrapper(env, HumanRendering)  # Should use native human-rendering\n    assert env.render_mode == \"human\"\n    env.close()\n\n    with pytest.warns(\n        UserWarning,\n        match=re.escape(\n            \"You are trying to use 'human' rendering for an environment that doesn't natively support it. The HumanRendering wrapper is being applied to your environment.\"\n        ),\n    ):\n        # Make sure that `HumanRendering` is applied here\n        env = gym.make(\n            \"test/NoHuman-v0\", render_mode=\"human\", disable_env_checker=True\n        )  # This environment doesn't use native rendering\n        assert has_wrapper(env, HumanRendering)\n        assert env.render_mode == \"human\"\n        env.close()\n\n    with pytest.raises(\n        TypeError, match=re.escape(\"got an unexpected keyword argument 'render_mode'\")\n    ):\n        gym.make(\n            \"test/NoHumanOldAPI-v0\",\n            render_mode=\"rgb_array_list\",\n            disable_env_checker=True,\n        )\n\n    # Make sure that an additional error is thrown when a user tries to use the wrapper on an environment with the old API\n    with warnings.catch_warnings(record=True):\n        with pytest.raises(\n            gym.error.Error,\n            match=re.escape(\n                \"You passed render_mode='human' although test/NoHumanOldAPI-v0 doesn't implement human-rendering natively.\"\n            ),\n        ):\n            gym.make(\n                \"test/NoHumanOldAPI-v0\", render_mode=\"human\", 
disable_env_checker=True\n            )\n\n    # This test ensures that the additional exception \"Gym tried to apply the HumanRendering wrapper but it looks like\n    # your environment is using the old rendering API\" is *not* triggered by a TypeError that originates from\n    # a keyword that is not `render_mode`\n    with pytest.raises(\n        TypeError,\n        match=re.escape(\"got an unexpected keyword argument 'render'\"),\n    ):\n        gym.make(\"CarRacing-v2\", render=\"human\")\n\n\ndef test_make_kwargs():\n    env = gym.make(\n        \"test.ArgumentEnv-v0\",\n        arg2=\"override_arg2\",\n        arg3=\"override_arg3\",\n        disable_env_checker=True,\n    )\n    assert env.spec.id == \"test.ArgumentEnv-v0\"\n    assert isinstance(env.unwrapped, ArgumentEnv)\n    assert env.arg1 == \"arg1\"\n    assert env.arg2 == \"override_arg2\"\n    assert env.arg3 == \"override_arg3\"\n    env.close()\n\n\ndef test_import_module_during_make():\n    # Test custom environment which is registered at make\n    env = gym.make(\n        \"tests.envs.utils:RegisterDuringMakeEnv-v0\",\n        disable_env_checker=True,\n    )\n    assert isinstance(env.unwrapped, RegisterDuringMakeEnv)\n    env.close()\n"
  },
  {
    "path": "tests/envs/test_mujoco.py",
    "content": "import numpy as np\nimport pytest\n\nimport gym\nfrom gym import envs\nfrom gym.envs.registration import EnvSpec\nfrom tests.envs.utils import mujoco_testing_env_specs\n\nEPS = 1e-6\n\n\ndef verify_environments_match(\n    old_env_id: str, new_env_id: str, seed: int = 1, num_actions: int = 1000\n):\n    \"\"\"Verifies with two environment ids (old and new) are identical in obs, reward and done\n    (except info where all old info must be contained in new info).\"\"\"\n    old_env = envs.make(old_env_id, disable_env_checker=True)\n    new_env = envs.make(new_env_id, disable_env_checker=True)\n\n    old_reset_obs, old_info = old_env.reset(seed=seed)\n    new_reset_obs, new_info = new_env.reset(seed=seed)\n\n    np.testing.assert_allclose(old_reset_obs, new_reset_obs)\n\n    for i in range(num_actions):\n        action = old_env.action_space.sample()\n        old_obs, old_reward, old_terminated, old_truncated, old_info = old_env.step(\n            action\n        )\n        new_obs, new_reward, new_terminated, new_truncated, new_info = new_env.step(\n            action\n        )\n\n        np.testing.assert_allclose(old_obs, new_obs, atol=EPS)\n        np.testing.assert_allclose(old_reward, new_reward, atol=EPS)\n        np.testing.assert_equal(old_terminated, new_terminated)\n        np.testing.assert_equal(old_truncated, new_truncated)\n\n        for key in old_info:\n            np.testing.assert_allclose(old_info[key], new_info[key], atol=EPS)\n\n        if old_terminated or old_truncated:\n            break\n\n\nEXCLUDE_POS_FROM_OBS = [\n    \"Ant\",\n    \"HalfCheetah\",\n    \"Hopper\",\n    \"Humanoid\",\n    \"Swimmer\",\n    \"Walker2d\",\n]\n\n\n@pytest.mark.parametrize(\n    \"env_spec\",\n    mujoco_testing_env_specs,\n    ids=[env_spec.id for env_spec in mujoco_testing_env_specs],\n)\ndef test_obs_space_mujoco_environments(env_spec: EnvSpec):\n    \"\"\"Check that the returned observations are contained in the observation space of the 
environment\"\"\"\n    env = env_spec.make(disable_env_checker=True)\n    reset_obs, info = env.reset()\n    assert env.observation_space.contains(\n        reset_obs\n    ), f\"Observation returned by reset() of {env_spec.id} is not contained in the default observation space {env.observation_space}.\"\n\n    action = env.action_space.sample()\n    step_obs, _, _, _, _ = env.step(action)\n    assert env.observation_space.contains(\n        step_obs\n    ), f\"Observation returned by step(action) of {env_spec.id} is not contained in the default observation space {env.observation_space}.\"\n\n    if env_spec.name in EXCLUDE_POS_FROM_OBS and (\n        env_spec.version == 4 or env_spec.version == 3\n    ):\n        env = env_spec.make(\n            disable_env_checker=True, exclude_current_positions_from_observation=False\n        )\n        reset_obs, info = env.reset()\n        assert env.observation_space.contains(\n            reset_obs\n        ), f\"Observation of {env_spec.id} is not contained in the observation space {env.observation_space} when including the current position in the observation.\"\n\n        step_obs, _, _, _, _ = env.step(action)\n        assert env.observation_space.contains(\n            step_obs\n        ), f\"Observation returned by step(action) of {env_spec.id} is not contained in the observation space {env.observation_space} when including the current position in the observation.\"\n\n    # Ant-v4 has the option of including contact forces in the observation space with the use_contact_forces argument\n    if env_spec.name == \"Ant\" and env_spec.version == 4:\n        env = env_spec.make(disable_env_checker=True, use_contact_forces=True)\n        reset_obs, info = env.reset()\n        assert env.observation_space.contains(\n            reset_obs\n        ), f\"Observation of {env_spec.id} is not contained in the default observation space {env.observation_space} when using contact forces.\"\n\n        step_obs, _, _, _, _ = 
env.step(action)\n        assert env.observation_space.contains(\n            step_obs\n        ), f\"Observation returned by step(action) of {env_spec.id} is not contained in the default observation space {env.observation_space} when using contact forces.\"\n\n\nMUJOCO_V2_V3_ENVS = [\n    spec.name\n    for spec in mujoco_testing_env_specs\n    if spec.version == 2 and f\"{spec.name}-v3\" in gym.envs.registry\n]\n\n\n@pytest.mark.parametrize(\"env_name\", MUJOCO_V2_V3_ENVS)\ndef test_mujoco_v2_to_v3_conversion(env_name: str):\n    \"\"\"Checks that all v2 mujoco environments are the same as v3 environments.\"\"\"\n    verify_environments_match(f\"{env_name}-v2\", f\"{env_name}-v3\")\n\n\n@pytest.mark.parametrize(\"env_name\", MUJOCO_V2_V3_ENVS)\ndef test_mujoco_incompatible_v3_to_v2(env_name: str):\n    \"\"\"Checks that the v3 environments are slightly different from v2 (v3 has additional info keys that v2 does not).\"\"\"\n    with pytest.raises(KeyError):\n        verify_environments_match(f\"{env_name}-v3\", f\"{env_name}-v2\")\n"
  },
  {
    "path": "tests/envs/test_register.py",
    "content": "\"\"\"Tests that `gym.register` works as expected.\"\"\"\nimport re\nfrom typing import Optional\n\nimport pytest\n\nimport gym\n\n\n@pytest.fixture(scope=\"function\")\ndef register_testing_envs():\n    \"\"\"Registers testing environments.\"\"\"\n    namespace = \"MyAwesomeNamespace\"\n    versioned_name = \"MyAwesomeVersionedEnv\"\n    unversioned_name = \"MyAwesomeUnversionedEnv\"\n    versions = [1, 3, 5]\n    for version in versions:\n        env_id = f\"{namespace}/{versioned_name}-v{version}\"\n        gym.register(\n            id=env_id,\n            entry_point=\"tests.envs.utils_envs:ArgumentEnv\",\n            kwargs={\n                \"arg1\": \"arg1\",\n                \"arg2\": \"arg2\",\n                \"arg3\": \"arg3\",\n            },\n        )\n    gym.register(\n        id=f\"{namespace}/{unversioned_name}\",\n        entry_point=\"tests.env.utils_envs:ArgumentEnv\",\n        kwargs={\n            \"arg1\": \"arg1\",\n            \"arg2\": \"arg2\",\n            \"arg3\": \"arg3\",\n        },\n    )\n\n    yield\n\n    for version in versions:\n        env_id = f\"{namespace}/{versioned_name}-v{version}\"\n        del gym.envs.registry[env_id]\n    del gym.envs.registry[f\"{namespace}/{unversioned_name}\"]\n\n\n@pytest.mark.parametrize(\n    \"env_id, namespace, name, version\",\n    [\n        (\n            \"MyAwesomeNamespace/MyAwesomeEnv-v0\",\n            \"MyAwesomeNamespace\",\n            \"MyAwesomeEnv\",\n            0,\n        ),\n        (\"MyAwesomeEnv-v0\", None, \"MyAwesomeEnv\", 0),\n        (\"MyAwesomeEnv\", None, \"MyAwesomeEnv\", None),\n        (\"MyAwesomeEnv-vfinal-v0\", None, \"MyAwesomeEnv-vfinal\", 0),\n        (\"MyAwesomeEnv-vfinal\", None, \"MyAwesomeEnv-vfinal\", None),\n        (\"MyAwesomeEnv--\", None, \"MyAwesomeEnv--\", None),\n        (\"MyAwesomeEnv-v\", None, \"MyAwesomeEnv-v\", None),\n    ],\n)\ndef test_register(\n    env_id: str, namespace: Optional[str], name: str, version: 
Optional[int]\n):\n    gym.register(env_id, \"no-entry-point\")\n    assert gym.spec(env_id).id == env_id\n\n    full_name = f\"{name}\"\n    if namespace:\n        full_name = f\"{namespace}/{full_name}\"\n    if version is not None:\n        full_name = f\"{full_name}-v{version}\"\n\n    assert full_name in gym.envs.registry.keys()\n\n    del gym.envs.registry[env_id]\n\n\n@pytest.mark.parametrize(\n    \"env_id\",\n    [\n        \"“Breakout-v0”\",\n        \"MyNotSoAwesomeEnv-vNone\\n\",\n        \"MyNamespace///MyNotSoAwesomeEnv-vNone\",\n    ],\n)\ndef test_register_error(env_id):\n    with pytest.raises(gym.error.Error, match=f\"^Malformed environment ID: {env_id}\"):\n        gym.register(env_id, \"no-entry-point\")\n\n\n@pytest.mark.parametrize(\n    \"env_id_input, env_id_suggested\",\n    [\n        (\"cartpole-v1\", \"CartPole\"),\n        (\"blackjack-v1\", \"Blackjack\"),\n        (\"Blackjock-v1\", \"Blackjack\"),\n        (\"mountaincarcontinuous-v0\", \"MountainCarContinuous\"),\n        (\"taxi-v3\", \"Taxi\"),\n        (\"taxi-v30\", \"Taxi\"),\n        (\"MyAwesomeNamspce/MyAwesomeVersionedEnv-v1\", \"MyAwesomeNamespace\"),\n        (\"MyAwesomeNamspce/MyAwesomeUnversionedEnv\", \"MyAwesomeNamespace\"),\n        (\"MyAwesomeNamespace/MyAwesomeUnversioneEnv\", \"MyAwesomeUnversionedEnv\"),\n        (\"MyAwesomeNamespace/MyAwesomeVersioneEnv\", \"MyAwesomeVersionedEnv\"),\n    ],\n)\ndef test_env_suggestions(register_testing_envs, env_id_input, env_id_suggested):\n    with pytest.raises(\n        gym.error.UnregisteredEnv, match=f\"Did you mean: `{env_id_suggested}`?\"\n    ):\n        gym.make(env_id_input, disable_env_checker=True)\n\n\n@pytest.mark.parametrize(\n    \"env_id_input, suggested_versions, default_version\",\n    [\n        (\"CartPole-v12\", \"`v0`, `v1`\", False),\n        (\"Blackjack-v10\", \"`v1`\", False),\n        (\"MountainCarContinuous-v100\", \"`v0`\", False),\n        (\"Taxi-v30\", \"`v3`\", False),\n        
(\"MyAwesomeNamespace/MyAwesomeVersionedEnv-v6\", \"`v1`, `v3`, `v5`\", False),\n        (\"MyAwesomeNamespace/MyAwesomeUnversionedEnv-v6\", \"\", True),\n    ],\n)\ndef test_env_version_suggestions(\n    register_testing_envs, env_id_input, suggested_versions, default_version\n):\n    if default_version:\n        with pytest.raises(\n            gym.error.DeprecatedEnv,\n            match=\"It provides the default version\",  # env name,\n        ):\n            gym.make(env_id_input, disable_env_checker=True)\n    else:\n        with pytest.raises(\n            gym.error.UnregisteredEnv,\n            match=f\"It provides versioned environments: \\\\[ {suggested_versions} \\\\]\",\n        ):\n            gym.make(env_id_input, disable_env_checker=True)\n\n\ndef test_register_versioned_unversioned():\n    # Register versioned then unversioned\n    versioned_env = \"Test/MyEnv-v0\"\n    gym.register(versioned_env, \"no-entry-point\")\n    assert gym.envs.spec(versioned_env).id == versioned_env\n\n    unversioned_env = \"Test/MyEnv\"\n    with pytest.raises(\n        gym.error.RegistrationError,\n        match=re.escape(\n            \"Can't register the unversioned environment `Test/MyEnv` when the versioned environment `Test/MyEnv-v0` of the same name already exists\"\n        ),\n    ):\n        gym.register(unversioned_env, \"no-entry-point\")\n\n    # Clean everything\n    del gym.envs.registry[versioned_env]\n\n    # Register unversioned then versioned\n    gym.register(unversioned_env, \"no-entry-point\")\n    assert gym.envs.spec(unversioned_env).id == unversioned_env\n    with pytest.raises(\n        gym.error.RegistrationError,\n        match=re.escape(\n            \"Can't register the versioned environment `Test/MyEnv-v0` when the unversioned environment `Test/MyEnv` of the same name already exists.\"\n        ),\n    ):\n        gym.register(versioned_env, \"no-entry-point\")\n\n    # Clean everything\n    del gym.envs.registry[unversioned_env]\n\n\ndef 
test_make_latest_versioned_env(register_testing_envs):\n    with pytest.warns(\n        UserWarning,\n        match=re.escape(\n            \"Using the latest versioned environment `MyAwesomeNamespace/MyAwesomeVersionedEnv-v5` instead of the unversioned environment `MyAwesomeNamespace/MyAwesomeVersionedEnv`.\"\n        ),\n    ):\n        env = gym.make(\n            \"MyAwesomeNamespace/MyAwesomeVersionedEnv\", disable_env_checker=True\n        )\n    assert env.spec.id == \"MyAwesomeNamespace/MyAwesomeVersionedEnv-v5\"\n\n\ndef test_namespace():\n    # Check if the namespace context manager works\n    with gym.envs.registration.namespace(\"MyDefaultNamespace\"):\n        gym.register(\"MyDefaultEnvironment-v0\", \"no-entry-point\")\n    gym.register(\"MyDefaultEnvironment-v1\", \"no-entry-point\")\n    assert \"MyDefaultNamespace/MyDefaultEnvironment-v0\" in gym.envs.registry\n    assert \"MyDefaultEnvironment-v1\" in gym.envs.registry\n\n    del gym.envs.registry[\"MyDefaultNamespace/MyDefaultEnvironment-v0\"]\n    del gym.envs.registry[\"MyDefaultEnvironment-v1\"]\n"
  },
  {
    "path": "tests/envs/test_spec.py",
    "content": "\"\"\"Tests that gym.spec works as expected.\"\"\"\n\nimport re\n\nimport pytest\n\nimport gym\n\n\ndef test_spec():\n    spec = gym.spec(\"CartPole-v1\")\n    assert spec.id == \"CartPole-v1\"\n    assert spec is gym.envs.registry[\"CartPole-v1\"]\n\n\ndef test_spec_kwargs():\n    map_name_value = \"8x8\"\n    env = gym.make(\"FrozenLake-v1\", map_name=map_name_value)\n    assert env.spec.kwargs[\"map_name\"] == map_name_value\n\n\ndef test_spec_missing_lookup():\n    gym.register(id=\"Test1-v0\", entry_point=\"no-entry-point\")\n    gym.register(id=\"Test1-v15\", entry_point=\"no-entry-point\")\n    gym.register(id=\"Test1-v9\", entry_point=\"no-entry-point\")\n    gym.register(id=\"Other1-v100\", entry_point=\"no-entry-point\")\n\n    with pytest.raises(\n        gym.error.DeprecatedEnv,\n        match=re.escape(\n            \"Environment version v1 for `Test1` is deprecated. Please use `Test1-v15` instead.\"\n        ),\n    ):\n        gym.spec(\"Test1-v1\")\n\n    with pytest.raises(\n        gym.error.UnregisteredEnv,\n        match=re.escape(\n            \"Environment version `v1000` for environment `Test1` doesn't exist. It provides versioned environments: [ `v0`, `v9`, `v15` ].\"\n        ),\n    ):\n        gym.spec(\"Test1-v1000\")\n\n    with pytest.raises(\n        gym.error.UnregisteredEnv,\n        match=re.escape(\"Environment Unknown1 doesn't exist. \"),\n    ):\n        gym.spec(\"Unknown1-v1\")\n\n\ndef test_spec_malformed_lookup():\n    with pytest.raises(\n        gym.error.Error,\n        match=f'^{re.escape(\"Malformed environment ID: “Breakout-v0”.(Currently all IDs must be of the form [namespace/](env-name)-v(version). 
(namespace is optional))\")}$',\n    ):\n        gym.spec(\"“Breakout-v0”\")\n\n\ndef test_spec_versioned_lookups():\n    gym.register(\"test/Test2-v5\", \"no-entry-point\")\n\n    with pytest.raises(\n        gym.error.VersionNotFound,\n        match=re.escape(\n            \"Environment version `v9` for environment `test/Test2` doesn't exist. It provides versioned environments: [ `v5` ].\"\n        ),\n    ):\n        gym.spec(\"test/Test2-v9\")\n\n    with pytest.raises(\n        gym.error.DeprecatedEnv,\n        match=re.escape(\n            \"Environment version v4 for `test/Test2` is deprecated. Please use `test/Test2-v5` instead.\"\n        ),\n    ):\n        gym.spec(\"test/Test2-v4\")\n\n    assert gym.spec(\"test/Test2-v5\") is not None\n\n\ndef test_spec_default_lookups():\n    gym.register(\"test/Test3\", \"no-entry-point\")\n\n    with pytest.raises(\n        gym.error.DeprecatedEnv,\n        match=re.escape(\n            \"Environment version `v0` for environment `test/Test3` doesn't exist. It provides the default version test/Test3`.\"\n        ),\n    ):\n        gym.spec(\"test/Test3-v0\")\n\n    assert gym.spec(\"test/Test3\") is not None\n"
  },
  {
    "path": "tests/envs/utils.py",
    "content": "\"\"\"Finds all the specs that we can test with\"\"\"\nfrom typing import List, Optional\n\nimport numpy as np\n\nimport gym\nfrom gym import error, logger\nfrom gym.envs.registration import EnvSpec\n\n\ndef try_make_env(env_spec: EnvSpec) -> Optional[gym.Env]:\n    \"\"\"Tries to make the environment showing if it is possible.\n\n    Warning the environments have no wrappers, including time limit and order enforcing.\n    \"\"\"\n    # To avoid issues with registered environments during testing, we check that the spec entry points are from gym.envs.\n    if \"gym.envs.\" in env_spec.entry_point:\n        try:\n            return env_spec.make(disable_env_checker=True).unwrapped\n        except (ImportError, error.DependencyNotInstalled) as e:\n            logger.warn(f\"Not testing {env_spec.id} due to error: {e}\")\n    return None\n\n\n# Tries to make all environment to test with\nall_testing_initialised_envs: List[Optional[gym.Env]] = [\n    try_make_env(env_spec) for env_spec in gym.envs.registry.values()\n]\nall_testing_initialised_envs: List[gym.Env] = [\n    env for env in all_testing_initialised_envs if env is not None\n]\n\n# All testing, mujoco and gym environment specs\nall_testing_env_specs: List[EnvSpec] = [\n    env.spec for env in all_testing_initialised_envs\n]\nmujoco_testing_env_specs: List[EnvSpec] = [\n    env_spec\n    for env_spec in all_testing_env_specs\n    if \"gym.envs.mujoco\" in env_spec.entry_point\n]\ngym_testing_env_specs: List[EnvSpec] = [\n    env_spec\n    for env_spec in all_testing_env_specs\n    if any(\n        f\"gym.envs.{ep}\" in env_spec.entry_point\n        for ep in [\"box2d\", \"classic_control\", \"toy_text\"]\n    )\n]\n# TODO, add minimum testing env spec in testing\nminimum_testing_env_specs = [\n    env_spec\n    for env_spec in [\n        \"CartPole-v1\",\n        \"MountainCarContinuous-v0\",\n        \"LunarLander-v2\",\n        \"LunarLanderContinuous-v2\",\n        \"CarRacing-v2\",\n        
\"Blackjack-v1\",\n        \"Reacher-v4\",\n    ]\n    if env_spec in all_testing_env_specs\n]\n\n\ndef assert_equals(a, b, prefix=None):\n    \"\"\"Assert equality of data structures `a` and `b`.\n\n    Args:\n        a: first data structure\n        b: second data structure\n        prefix: prefix for failed assertion message for types and dicts\n    \"\"\"\n    assert type(a) == type(b), f\"{prefix}Differing types: {a} and {b}\"\n    if isinstance(a, dict):\n        assert list(a.keys()) == list(b.keys()), f\"{prefix}Key sets differ: {a} and {b}\"\n\n        for k in a.keys():\n            v_a = a[k]\n            v_b = b[k]\n            assert_equals(v_a, v_b)\n    elif isinstance(a, np.ndarray):\n        np.testing.assert_array_equal(a, b)\n    elif isinstance(a, tuple):\n        for elem_from_a, elem_from_b in zip(a, b):\n            assert_equals(elem_from_a, elem_from_b)\n    else:\n        assert a == b\n"
  },
  {
    "path": "tests/envs/utils_envs.py",
    "content": "import gym\n\n\nclass RegisterDuringMakeEnv(gym.Env):\n    \"\"\"Used in `test_registration.py` to check if `env.make` can import and register an env\"\"\"\n\n    def __init__(self):\n        self.action_space = gym.spaces.Discrete(1)\n        self.observation_space = gym.spaces.Discrete(1)\n\n\nclass ArgumentEnv(gym.Env):\n    observation_space = gym.spaces.Box(low=-1, high=1, shape=(1,))\n    action_space = gym.spaces.Box(low=-1, high=1, shape=(1,))\n\n    def __init__(self, arg1, arg2, arg3):\n        self.arg1 = arg1\n        self.arg2 = arg2\n        self.arg3 = arg3\n\n\n# Environments to test render_mode\nclass NoHuman(gym.Env):\n    \"\"\"Environment that does not have human-rendering.\"\"\"\n\n    metadata = {\"render_modes\": [\"rgb_array_list\"], \"render_fps\": 4}\n\n    def __init__(self, render_mode=None):\n        assert render_mode in self.metadata[\"render_modes\"]\n        self.render_mode = render_mode\n\n\nclass NoHumanOldAPI(gym.Env):\n    \"\"\"Environment that does not have human-rendering.\"\"\"\n\n    metadata = {\"render_modes\": [\"rgb_array_list\"], \"render_fps\": 4}\n\n    def __init__(self):\n        pass\n\n\nclass NoHumanNoRGB(gym.Env):\n    \"\"\"Environment that has neither human- nor rgb-rendering\"\"\"\n\n    metadata = {\"render_modes\": [\"ascii\"], \"render_fps\": 4}\n\n    def __init__(self, render_mode=None):\n        assert render_mode in self.metadata[\"render_modes\"]\n        self.render_mode = render_mode\n"
  },
  {
    "path": "tests/spaces/__init__.py",
    "content": ""
  },
  {
    "path": "tests/spaces/test_box.py",
    "content": "import re\nimport warnings\n\nimport numpy as np\nimport pytest\n\nimport gym.error\nfrom gym.spaces import Box\nfrom gym.spaces.box import get_inf\n\n\n@pytest.mark.parametrize(\n    \"box,expected_shape\",\n    [\n        (  # Test with same 1-dim low and high shape\n            Box(low=np.zeros(2), high=np.ones(2), dtype=np.int32),\n            (2,),\n        ),\n        (  # Test with same multi-dim low and high shape\n            Box(low=np.zeros((2, 1)), high=np.ones((2, 1)), dtype=np.int32),\n            (2, 1),\n        ),\n        (  # Test with scalar low high and different shape\n            Box(low=0, high=1, shape=(5, 2)),\n            (5, 2),\n        ),\n        (Box(low=0, high=1), (1,)),  # Test with int and int\n        (Box(low=0.0, high=1.0), (1,)),  # Test with float and float\n        (Box(low=np.zeros(1)[0], high=np.ones(1)[0]), (1,)),\n        (Box(low=0.0, high=1), (1,)),  # Test with float and int\n        (Box(low=0, high=np.int32(1)), (1,)),  # Test with python int and numpy int32\n        (Box(low=0, high=np.ones(3)), (3,)),  # Test with array and scalar\n        (Box(low=np.zeros(3), high=1.0), (3,)),  # Test with array and scalar\n    ],\n)\ndef test_shape_inference(box, expected_shape):\n    \"\"\"Test that the shape inference is as expected.\"\"\"\n    assert box.shape == expected_shape\n    assert box.sample().shape == expected_shape\n\n\n@pytest.mark.parametrize(\n    \"value,valid\",\n    [\n        (1, True),\n        (1.0, True),\n        (np.int32(1), True),\n        (np.float32(1.0), True),\n        (np.zeros(2, dtype=np.float32), True),\n        (np.zeros((2, 2), dtype=np.float32), True),\n        (np.inf, True),\n        (np.nan, True),  # This is a weird case that we allow\n        (True, False),\n        (np.bool8(True), False),\n        (1 + 1j, False),\n        (np.complex128(1 + 1j), False),\n        (\"string\", False),\n    ],\n)\ndef test_low_high_values(value, valid: bool):\n    \"\"\"Test what 
`low` and `high` values are valid for `Box` space.\"\"\"\n    if valid:\n        with warnings.catch_warnings(record=True) as caught_warnings:\n            Box(low=value, high=value)\n        assert len(caught_warnings) == 0, tuple(\n            warning.message for warning in caught_warnings\n        )\n    else:\n        with pytest.raises(\n            ValueError,\n            match=re.escape(\n                \"expect their types to be np.ndarray, an integer or a float\"\n            ),\n        ):\n            Box(low=value, high=value)\n\n\n@pytest.mark.parametrize(\n    \"low,high,kwargs,error,message\",\n    [\n        (\n            0,\n            1,\n            {\"dtype\": None},\n            AssertionError,\n            \"Box dtype must be explicitly provided, cannot be None.\",\n        ),\n        (\n            0,\n            1,\n            {\"shape\": (None,)},\n            AssertionError,\n            \"Expect all shape elements to be an integer, actual type: (<class 'NoneType'>,)\",\n        ),\n        (\n            0,\n            1,\n            {\n                \"shape\": (\n                    1,\n                    None,\n                )\n            },\n            AssertionError,\n            \"Expect all shape elements to be an integer, actual type: (<class 'int'>, <class 'NoneType'>)\",\n        ),\n        (\n            0,\n            1,\n            {\n                \"shape\": (\n                    np.int64(1),\n                    None,\n                )\n            },\n            AssertionError,\n            \"Expect all shape elements to be an integer, actual type: (<class 'numpy.int64'>, <class 'NoneType'>)\",\n        ),\n        (\n            None,\n            None,\n            {},\n            ValueError,\n            \"Box shape is inferred from low and high, expect their types to be np.ndarray, an integer or a float, actual type low: <class 'NoneType'>, high: <class 'NoneType'>\",\n        ),\n        (\n    
        0,\n            None,\n            {},\n            ValueError,\n            \"Box shape is inferred from low and high, expect their types to be np.ndarray, an integer or a float, actual type low: <class 'int'>, high: <class 'NoneType'>\",\n        ),\n        (\n            np.zeros(3),\n            np.ones(2),\n            {},\n            AssertionError,\n            \"high.shape doesn't match provided shape, high.shape: (2,), shape: (3,)\",\n        ),\n    ],\n)\ndef test_init_errors(low, high, kwargs, error, message):\n    \"\"\"Test all constructor errors.\"\"\"\n    with pytest.raises(error, match=f\"^{re.escape(message)}$\"):\n        Box(low=low, high=high, **kwargs)\n\n\ndef test_dtype_check():\n    \"\"\"Tests the Box contains function with different dtypes.\"\"\"\n    # Related Issues:\n    # https://github.com/openai/gym/issues/2357\n    # https://github.com/openai/gym/issues/2298\n\n    space = Box(0, 1, (), dtype=np.float32)\n\n    # casting will match the correct type\n    assert np.array(0.5, dtype=np.float32) in space\n\n    # float16 is in float32 space\n    assert np.array(0.5, dtype=np.float16) in space\n\n    # float64 is not in float32 space\n    assert np.array(0.5, dtype=np.float64) not in space\n\n\n@pytest.mark.parametrize(\n    \"space\",\n    [\n        Box(low=0, high=np.inf, shape=(2,), dtype=np.int32),\n        Box(low=0, high=np.inf, shape=(2,), dtype=np.float32),\n        Box(low=0, high=np.inf, shape=(2,), dtype=np.int64),\n        Box(low=0, high=np.inf, shape=(2,), dtype=np.float64),\n        Box(low=-np.inf, high=0, shape=(2,), dtype=np.int32),\n        Box(low=-np.inf, high=0, shape=(2,), dtype=np.float32),\n        Box(low=-np.inf, high=0, shape=(2,), dtype=np.int64),\n        Box(low=-np.inf, high=0, shape=(2,), dtype=np.float64),\n        Box(low=-np.inf, high=np.inf, shape=(2,), dtype=np.int32),\n        Box(low=-np.inf, high=np.inf, shape=(2,), dtype=np.float32),\n        Box(low=-np.inf, high=np.inf, shape=(2,), 
dtype=np.int64),\n        Box(low=-np.inf, high=np.inf, shape=(2,), dtype=np.float64),\n        Box(low=0, high=np.inf, shape=(2, 3), dtype=np.int32),\n        Box(low=0, high=np.inf, shape=(2, 3), dtype=np.float32),\n        Box(low=0, high=np.inf, shape=(2, 3), dtype=np.int64),\n        Box(low=0, high=np.inf, shape=(2, 3), dtype=np.float64),\n        Box(low=-np.inf, high=0, shape=(2, 3), dtype=np.int32),\n        Box(low=-np.inf, high=0, shape=(2, 3), dtype=np.float32),\n        Box(low=-np.inf, high=0, shape=(2, 3), dtype=np.int64),\n        Box(low=-np.inf, high=0, shape=(2, 3), dtype=np.float64),\n        Box(low=-np.inf, high=np.inf, shape=(2, 3), dtype=np.int32),\n        Box(low=-np.inf, high=np.inf, shape=(2, 3), dtype=np.float32),\n        Box(low=-np.inf, high=np.inf, shape=(2, 3), dtype=np.int64),\n        Box(low=-np.inf, high=np.inf, shape=(2, 3), dtype=np.float64),\n        Box(low=np.array([-np.inf, 0]), high=np.array([0.0, np.inf]), dtype=np.int32),\n        Box(low=np.array([-np.inf, 0]), high=np.array([0.0, np.inf]), dtype=np.float32),\n        Box(low=np.array([-np.inf, 0]), high=np.array([0.0, np.inf]), dtype=np.int64),\n        Box(low=np.array([-np.inf, 0]), high=np.array([0.0, np.inf]), dtype=np.float64),\n    ],\n)\ndef test_infinite_space(space):\n    \"\"\"Test spaces whose bounds are either 0 or infinite.\n\n    Because `space.high` and `space.low` are both modified within the init,\n    a bound is treated as infinite whenever it is known not to be 0.\n    \"\"\"\n\n    assert np.all(\n        space.low < space.high\n    ), f\"Box low bound ({space.low}) is not lower than the high bound ({space.high})\"\n\n    space.seed(0)\n    sample = space.sample()\n\n    # check if space contains sample\n    assert (\n        sample in space\n    ), f\"Sample ({sample}) not inside space according to `space.contains()`\"\n\n    # manually check that the sign of the sample is within the bounds\n    assert np.all(\n        np.sign(sample) <= 
np.sign(space.high)\n    ), f\"Sign of sample ({sample}) is greater than the sign of the space upper bound ({space.high})\"\n    assert np.all(\n        np.sign(space.low) <= np.sign(sample)\n    ), f\"Sign of sample ({sample}) is less than the sign of the space lower bound ({space.low})\"\n\n    # check that int bounds are bounded for everything\n    # but floats are unbounded for infinite\n    if np.any(space.high != 0):\n        assert (\n            space.is_bounded(\"above\") is False\n        ), \"inf upper bound supposed to be unbounded\"\n    else:\n        assert (\n            space.is_bounded(\"above\") is True\n        ), \"non-inf upper bound supposed to be bounded\"\n\n    if np.any(space.low != 0):\n        assert (\n            space.is_bounded(\"below\") is False\n        ), \"inf lower bound supposed to be unbounded\"\n    else:\n        assert (\n            space.is_bounded(\"below\") is True\n        ), \"non-inf lower bound supposed to be bounded\"\n\n    if np.any(space.low != 0) or np.any(space.high != 0):\n        assert space.is_bounded(\"both\") is False\n    else:\n        assert space.is_bounded(\"both\") is True\n\n    # check for dtype\n    assert (\n        space.high.dtype == space.dtype\n    ), f\"High's dtype {space.high.dtype} doesn't match `space.dtype`\"\n    assert (\n        space.low.dtype == space.dtype\n    ), f\"Low's dtype {space.low.dtype} doesn't match `space.dtype`\"\n\n    with pytest.raises(\n        ValueError, match=\"manner is not in {'below', 'above', 'both'}, actual value:\"\n    ):\n        space.is_bounded(\"test\")\n\n\ndef test_legacy_state_pickling():\n    legacy_state = {\n        \"dtype\": np.dtype(\"float32\"),\n        \"_shape\": (5,),\n        \"low\": np.array([0.0, 0.0, 0.0, 0.0, 0.0], dtype=np.float32),\n        \"high\": np.array([1.0, 1.0, 1.0, 1.0, 1.0], dtype=np.float32),\n        \"bounded_below\": np.array([True, True, True, True, True]),\n        \"bounded_above\": np.array([True, True, True, True, True]),\n        
\"_np_random\": None,\n    }\n\n    b = Box(-1, 1, ())\n    assert \"low_repr\" in b.__dict__ and \"high_repr\" in b.__dict__\n    del b.__dict__[\"low_repr\"]\n    del b.__dict__[\"high_repr\"]\n    assert \"low_repr\" not in b.__dict__ and \"high_repr\" not in b.__dict__\n\n    b.__setstate__(legacy_state)\n    assert b.low_repr == \"0.0\"\n    assert b.high_repr == \"1.0\"\n\n\ndef test_get_inf():\n    \"\"\"Tests that get inf function works as expected, primarily for coverage.\"\"\"\n    assert get_inf(np.float32, \"+\") == np.inf\n    assert get_inf(np.float16, \"-\") == -np.inf\n    with pytest.raises(\n        TypeError, match=re.escape(\"Unknown sign *, use either '+' or '-'\")\n    ):\n        get_inf(np.float32, \"*\")\n\n    assert get_inf(np.int16, \"+\") == 32765\n    assert get_inf(np.int8, \"-\") == -126\n    with pytest.raises(\n        TypeError, match=re.escape(\"Unknown sign *, use either '+' or '-'\")\n    ):\n        get_inf(np.int32, \"*\")\n\n    with pytest.raises(\n        ValueError,\n        match=re.escape(\"Unknown dtype <class 'numpy.complex128'> for infinite bounds\"),\n    ):\n        get_inf(np.complex_, \"+\")\n\n\ndef test_sample_mask():\n    \"\"\"Box cannot have a mask applied.\"\"\"\n    space = Box(0, 1)\n    with pytest.raises(\n        gym.error.Error,\n        match=re.escape(\"Box.sample cannot be provided a mask, actual value: \"),\n    ):\n        space.sample(mask=np.array([0, 1, 0], dtype=np.int8))\n"
  },
  {
    "path": "tests/spaces/test_dict.py",
    "content": "from collections import OrderedDict\n\nimport numpy as np\nimport pytest\n\nfrom gym.spaces import Box, Dict, Discrete\n\n\ndef test_dict_init():\n    with pytest.raises(\n        AssertionError,\n        match=r\"^Unexpected Dict space input, expecting dict, OrderedDict or Sequence, actual type: \",\n    ):\n        Dict(Discrete(2))\n\n    with pytest.raises(\n        ValueError,\n        match=\"Dict space keyword 'a' already exists in the spaces dictionary\",\n    ):\n        Dict({\"a\": Discrete(3)}, a=Box(0, 1))\n\n    with pytest.raises(\n        AssertionError,\n        match=\"Dict space element is not an instance of Space: key='b', space=Box\",\n    ):\n        Dict(a=Discrete(2), b=\"Box\")\n\n    with pytest.warns(None) as warnings:\n        a = Dict({\"a\": Discrete(2), \"b\": Box(low=0.0, high=1.0)})\n        b = Dict(OrderedDict(a=Discrete(2), b=Box(low=0.0, high=1.0)))\n        c = Dict(((\"a\", Discrete(2)), (\"b\", Box(low=0.0, high=1.0))))\n        d = Dict(a=Discrete(2), b=Box(low=0.0, high=1.0))\n\n        assert a == b == c == d\n    assert len(warnings) == 0\n\n    with pytest.warns(None) as warnings:\n        Dict({1: Discrete(2), \"a\": Discrete(3)})\n    assert len(warnings) == 0\n\n\nDICT_SPACE = Dict(\n    {\n        \"a\": Box(low=0, high=1, shape=(3, 3)),\n        \"b\": Dict(\n            {\n                \"b_1\": Box(low=-100, high=100, shape=(2,)),\n                \"b_2\": Box(low=-1, high=1, shape=(2,)),\n            }\n        ),\n        \"c\": Discrete(5),\n    }\n)\n\n\ndef test_dict_seeding():\n    seeds = DICT_SPACE.seed(\n        {\n            \"a\": 0,\n            \"b\": {\n                \"b_1\": 1,\n                \"b_2\": 2,\n            },\n            \"c\": 3,\n        }\n    )\n    assert all(isinstance(seed, int) for seed in seeds)\n\n    # \"Unpack\" the dict sub-spaces into individual spaces\n    a = Box(low=0, high=1, shape=(3, 3), seed=0)\n    b_1 = Box(low=-100, high=100, shape=(2,), 
seed=1)\n    b_2 = Box(low=-1, high=1, shape=(2,), seed=2)\n    c = Discrete(5, seed=3)\n\n    for i in range(10):\n        dict_sample = DICT_SPACE.sample()\n        assert np.all(dict_sample[\"a\"] == a.sample())\n        assert np.all(dict_sample[\"b\"][\"b_1\"] == b_1.sample())\n        assert np.all(dict_sample[\"b\"][\"b_2\"] == b_2.sample())\n        assert dict_sample[\"c\"] == c.sample()\n\n\ndef test_int_seeding():\n    seeds = DICT_SPACE.seed(1)\n    assert all(isinstance(seed, int) for seed in seeds)\n\n    # rng, seeds = seeding.np_random(1)\n    # subseeds = rng.choice(np.iinfo(int).max, size=3, replace=False)\n    # b_rng, b_seeds = seeding.np_random(int(subseeds[1]))\n    # b_subseeds = b_rng.choice(np.iinfo(int).max, size=2, replace=False)\n\n    # \"Unpack\" the dict sub-spaces into individual spaces\n    a = Box(low=0, high=1, shape=(3, 3), seed=seeds[1])\n    b_1 = Box(low=-100, high=100, shape=(2,), seed=seeds[3])\n    b_2 = Box(low=-1, high=1, shape=(2,), seed=seeds[4])\n    c = Discrete(5, seed=seeds[5])\n\n    for i in range(10):\n        dict_sample = DICT_SPACE.sample()\n        assert np.all(dict_sample[\"a\"] == a.sample())\n        assert np.all(dict_sample[\"b\"][\"b_1\"] == b_1.sample())\n        assert np.all(dict_sample[\"b\"][\"b_2\"] == b_2.sample())\n        assert dict_sample[\"c\"] == c.sample()\n\n\ndef test_none_seeding():\n    seeds = DICT_SPACE.seed(None)\n    assert len(seeds) == 4 and all(isinstance(seed, int) for seed in seeds)\n\n\ndef test_bad_seed():\n    with pytest.raises(TypeError):\n        DICT_SPACE.seed(\"a\")\n\n\ndef test_mapping():\n    \"\"\"The Gym Dict space inherits from Mapping that allows it to appear like a standard python Dictionary.\"\"\"\n    assert len(DICT_SPACE) == 3\n\n    a = DICT_SPACE[\"a\"]\n    b = Discrete(5)\n    assert a != b\n    DICT_SPACE[\"a\"] = b\n    assert DICT_SPACE[\"a\"] == b\n\n    with pytest.raises(\n        AssertionError,\n        match=\"Trying to set a to Dict space 
with value that is not a gym space, actual type: <class 'int'>\",\n    ):\n        DICT_SPACE[\"a\"] = 5\n\n    DICT_SPACE[\"a\"] = a\n\n\ndef test_iterator():\n    \"\"\"Tests the Dict `__iter__` function correctly returns keys in the subspaces\"\"\"\n    for key in DICT_SPACE:\n        assert key in DICT_SPACE.spaces\n\n    assert {key for key in DICT_SPACE} == DICT_SPACE.spaces.keys()\n"
  },
  {
    "path": "tests/spaces/test_discrete.py",
    "content": "import numpy as np\n\nfrom gym.spaces import Discrete\n\n\ndef test_space_legacy_pickling():\n    \"\"\"Test the legacy pickle of Discrete that is missing the `start` parameter.\"\"\"\n    legacy_state = {\n        \"shape\": (\n            1,\n            2,\n            3,\n        ),\n        \"dtype\": np.int64,\n        \"np_random\": np.random.default_rng(),\n        \"n\": 3,\n    }\n    space = Discrete(1)\n    space.__setstate__(legacy_state)\n\n    assert space.shape == legacy_state[\"shape\"]\n    assert space.np_random == legacy_state[\"np_random\"]\n    assert space.n == 3\n    assert space.dtype == legacy_state[\"dtype\"]\n\n    # Test that start is missing\n    assert \"start\" in space.__dict__\n    del space.__dict__[\"start\"]  # legacy did not include start param\n    assert \"start\" not in space.__dict__\n\n    space.__setstate__(legacy_state)\n    assert space.start == 0\n\n\ndef test_sample_mask():\n    space = Discrete(4, start=2)\n    assert 2 <= space.sample() < 6\n    assert space.sample(mask=np.array([0, 1, 0, 0], dtype=np.int8)) == 3\n    assert space.sample(mask=np.array([0, 0, 0, 0], dtype=np.int8)) == 2\n    assert space.sample(mask=np.array([0, 1, 0, 1], dtype=np.int8)) in [3, 5]\n"
  },
  {
    "path": "tests/spaces/test_graph.py",
    "content": "import re\n\nimport numpy as np\nimport pytest\n\nfrom gym.spaces import Discrete, Graph, GraphInstance\n\n\ndef test_node_space_sample():\n    space = Graph(node_space=Discrete(3), edge_space=None)\n    space.seed(0)\n\n    sample = space.sample(\n        mask=(tuple(np.array([0, 1, 0], dtype=np.int8) for _ in range(5)), None),\n        num_nodes=5,\n    )\n    assert sample in space\n    assert np.all(sample.nodes == 1)\n\n    sample = space.sample(\n        (\n            (np.array([1, 0, 0], dtype=np.int8), np.array([0, 1, 0], dtype=np.int8)),\n            None,\n        ),\n        num_nodes=2,\n    )\n    assert sample in space\n    assert np.all(sample.nodes == np.array([0, 1]))\n\n    with pytest.warns(\n        UserWarning,\n        match=re.escape(\"The number of edges is set (5) but the edge space is None.\"),\n    ):\n        sample = space.sample(num_edges=5)\n        assert sample in space\n\n    # Change the node_space or edge_space to a non-Box or discrete space.\n    # This should not happen, test is primarily to increase coverage.\n    with pytest.raises(\n        TypeError,\n        match=re.escape(\n            \"Expects base space to be Box and Discrete, actual space: <class 'str'>\"\n        ),\n    ):\n        space.node_space = \"abc\"\n        space.sample()\n\n\ndef test_edge_space_sample():\n    space = Graph(node_space=Discrete(3), edge_space=Discrete(3))\n    space.seed(0)\n    # When num_nodes>1 then num_edges is set to 0\n    assert space.sample(num_nodes=1).edges is None\n    assert 0 <= len(space.sample(num_edges=3).edges) < 6\n\n    sample = space.sample(mask=(None, np.array([0, 1, 0], dtype=np.int8)))\n    assert np.all(sample.edges == 1) or sample.edges is None\n\n    sample = space.sample(\n        mask=(\n            None,\n            (\n                np.array([1, 0, 0], dtype=np.int8),\n                np.array([0, 1, 0], dtype=np.int8),\n                np.array([0, 0, 1], dtype=np.int8),\n            ),\n  
      ),\n        num_edges=3,\n    )\n    assert np.all(sample.edges == np.array([0, 1, 2]))\n\n    with pytest.raises(\n        AssertionError,\n        match=\"Expects the number of edges to be greater than 0, actual value: -1\",\n    ):\n        space.sample(num_edges=-1)\n\n    space = Graph(node_space=Discrete(3), edge_space=None)\n    with pytest.warns(\n        UserWarning,\n        match=re.escape(\n            \"\\x1b[33mWARN: The number of edges is set (5) but the edge space is None.\\x1b[0m\"\n        ),\n    ):\n        sample = space.sample(num_edges=5)\n    assert sample.edges is None\n\n\n@pytest.mark.parametrize(\n    \"sample\",\n    [\n        \"abc\",\n        GraphInstance(\n            nodes=None, edges=np.array([0, 1]), edge_links=np.array([[0, 1], [1, 0]])\n        ),\n        GraphInstance(\n            nodes=np.array([10, 1, 0]),\n            edges=np.array([0, 1]),\n            edge_links=np.array([[0, 1], [1, 0]]),\n        ),\n        GraphInstance(\n            nodes=np.array([0, 1]), edges=None, edge_links=np.array([[0, 1], [1, 0]])\n        ),\n        GraphInstance(nodes=np.array([0, 1]), edges=np.array([0, 1]), edge_links=None),\n        GraphInstance(\n            nodes=np.array([1, 2]),\n            edges=np.array([10, 1]),\n            edge_links=np.array([[0, 1], [1, 0]]),\n        ),\n        GraphInstance(\n            nodes=np.array([1, 2]),\n            edges=np.array([0, 1]),\n            edge_links=np.array([[0.5, 1.0], [2.0, 1.0]]),\n        ),\n        GraphInstance(\n            nodes=np.array([1, 2]), edges=np.array([10, 1]), edge_links=np.array([0, 1])\n        ),\n        GraphInstance(\n            nodes=np.array([1, 2]),\n            edges=np.array([0, 1]),\n            edge_links=np.array([[[0], [1]], [[0], [0]]]),\n        ),\n        GraphInstance(\n            nodes=np.array([1, 2]),\n            edges=np.array([0, 1]),\n            edge_links=np.array([[10, 1], [0, 0]]),\n        ),\n        GraphInstance(\n  
          nodes=np.array([1, 2]),\n            edges=np.array([0, 1]),\n            edge_links=np.array([[-10, 1], [0, 0]]),\n        ),\n    ],\n)\ndef test_not_contains(sample):\n    space = Graph(node_space=Discrete(2), edge_space=Discrete(2))\n    assert sample not in space\n"
  },
  {
    "path": "tests/spaces/test_multibinary.py",
    "content": "import numpy as np\n\nfrom gym.spaces import MultiBinary\n\n\ndef test_sample():\n    space = MultiBinary(4)\n\n    sample = space.sample(mask=np.array([0, 0, 1, 1], dtype=np.int8))\n    assert np.all(sample == [0, 0, 1, 1])\n\n    sample = space.sample(mask=np.array([0, 1, 2, 2], dtype=np.int8))\n    assert sample[0] == 0 and sample[1] == 1\n    assert sample[2] == 0 or sample[2] == 1\n    assert sample[3] == 0 or sample[3] == 1\n\n    space = MultiBinary(np.array([2, 3]))\n    sample = space.sample(mask=np.array([[0, 0, 0], [1, 1, 1]], dtype=np.int8))\n    assert np.all(sample == [[0, 0, 0], [1, 1, 1]]), sample\n"
  },
  {
    "path": "tests/spaces/test_multidiscrete.py",
    "content": "import pytest\n\nfrom gym.spaces import Discrete, MultiDiscrete\nfrom gym.utils.env_checker import data_equivalence\n\n\ndef test_multidiscrete_as_tuple():\n    # 1D multi-discrete\n    space = MultiDiscrete([3, 4, 5])\n\n    assert space.shape == (3,)\n    assert space[0] == Discrete(3)\n    assert space[0:1] == MultiDiscrete([3])\n    assert space[0:2] == MultiDiscrete([3, 4])\n    assert space[:] == space and space[:] is not space\n\n    # 2D multi-discrete\n    space = MultiDiscrete([[3, 4, 5], [6, 7, 8]])\n\n    assert space.shape == (2, 3)\n    assert space[0, 1] == Discrete(4)\n    assert space[0] == MultiDiscrete([3, 4, 5])\n    assert space[0:1] == MultiDiscrete([[3, 4, 5]])\n    assert space[0:2, :] == MultiDiscrete([[3, 4, 5], [6, 7, 8]])\n    assert space[:, 0:1] == MultiDiscrete([[3], [6]])\n    assert space[0:2, 0:2] == MultiDiscrete([[3, 4], [6, 7]])\n    assert space[:] == space and space[:] is not space\n    assert space[:, :] == space and space[:, :] is not space\n\n\ndef test_multidiscrete_subspace_reproducibility():\n    # 1D multi-discrete\n    space = MultiDiscrete([100, 200, 300])\n    space.seed()\n\n    assert data_equivalence(space[0].sample(), space[0].sample())\n    assert data_equivalence(space[0:1].sample(), space[0:1].sample())\n    assert data_equivalence(space[0:2].sample(), space[0:2].sample())\n    assert data_equivalence(space[:].sample(), space[:].sample())\n    assert data_equivalence(space[:].sample(), space.sample())\n\n    # 2D multi-discrete\n    space = MultiDiscrete([[300, 400, 500], [600, 700, 800]])\n    space.seed()\n\n    assert data_equivalence(space[0, 1].sample(), space[0, 1].sample())\n    assert data_equivalence(space[0].sample(), space[0].sample())\n    assert data_equivalence(space[0:1].sample(), space[0:1].sample())\n    assert data_equivalence(space[0:2, :].sample(), space[0:2, :].sample())\n    assert data_equivalence(space[:, 0:1].sample(), space[:, 0:1].sample())\n    assert 
data_equivalence(space[0:2, 0:2].sample(), space[0:2, 0:2].sample())\n    assert data_equivalence(space[:].sample(), space[:].sample())\n    assert data_equivalence(space[:, :].sample(), space[:, :].sample())\n    assert data_equivalence(space[:, :].sample(), space.sample())\n\n\ndef test_multidiscrete_length():\n    space = MultiDiscrete(nvec=[3, 2, 4])\n    assert len(space) == 3\n\n    space = MultiDiscrete(nvec=[[2, 3], [3, 2]])\n    with pytest.warns(\n        UserWarning,\n        match=\"Getting the length of a multi-dimensional MultiDiscrete space.\",\n    ):\n        assert len(space) == 2\n"
  },
  {
    "path": "tests/spaces/test_sequence.py",
    "content": "import re\n\nimport numpy as np\nimport pytest\n\nimport gym.spaces\n\n\ndef test_sample():\n    \"\"\"Tests the sequence sampling works as expects and the errors are correctly raised.\"\"\"\n    space = gym.spaces.Sequence(gym.spaces.Box(0, 1))\n\n    # Test integer mask length\n    for length in range(4):\n        sample = space.sample(mask=(length, None))\n        assert sample in space\n        assert len(sample) == length\n\n    with pytest.raises(\n        AssertionError,\n        match=re.escape(\n            \"Expects the length mask to be greater than or equal to zero, actual value: -1\"\n        ),\n    ):\n        space.sample(mask=(-1, None))\n\n    # Test np.array mask length\n    sample = space.sample(mask=(np.array([5]), None))\n    assert sample in space\n    assert len(sample) == 5\n\n    sample = space.sample(mask=(np.array([3, 4, 5]), None))\n    assert sample in space\n    assert len(sample) in [3, 4, 5]\n\n    with pytest.raises(\n        AssertionError,\n        match=re.escape(\n            \"Expects the shape of the length mask to be 1-dimensional, actual shape: (2, 2)\"\n        ),\n    ):\n        space.sample(mask=(np.array([[2, 2], [2, 2]]), None))\n\n    with pytest.raises(\n        AssertionError,\n        match=re.escape(\n            \"Expects all values in the length_mask to be greater than or equal to zero, actual values: [ 1  2 -1]\"\n        ),\n    ):\n        space.sample(mask=(np.array([1, 2, -1]), None))\n\n    # Test with an invalid length\n    with pytest.raises(\n        TypeError,\n        match=re.escape(\n            \"Expects the type of length_mask to an integer or a np.ndarray, actual type: <class 'str'>\"\n        ),\n    ):\n        space.sample(mask=(\"abc\", None))\n"
  },
  {
    "path": "tests/spaces/test_space.py",
    "content": "from functools import partial\n\nimport pytest\n\nfrom gym import Space\nfrom gym.spaces import utils\n\nTESTING_SPACE = Space()\n\n\n@pytest.mark.parametrize(\n    \"func\",\n    [\n        TESTING_SPACE.sample,\n        partial(TESTING_SPACE.contains, None),\n        partial(utils.flatdim, TESTING_SPACE),\n        partial(utils.flatten, TESTING_SPACE, None),\n        partial(utils.flatten_space, TESTING_SPACE),\n        partial(utils.unflatten, TESTING_SPACE, None),\n    ],\n)\ndef test_not_implemented_errors(func):\n    with pytest.raises(NotImplementedError):\n        func()\n"
  },
  {
    "path": "tests/spaces/test_spaces.py",
    "content": "import copy\nimport itertools\nimport json  # note: ujson fails this test due to float equality\nimport pickle\nimport tempfile\nfrom typing import List, Union\n\nimport numpy as np\nimport pytest\n\nfrom gym.spaces import Box, Discrete, MultiBinary, MultiDiscrete, Space, Text\nfrom gym.utils import seeding\nfrom gym.utils.env_checker import data_equivalence\nfrom tests.spaces.utils import (\n    TESTING_FUNDAMENTAL_SPACES,\n    TESTING_FUNDAMENTAL_SPACES_IDS,\n    TESTING_SPACES,\n    TESTING_SPACES_IDS,\n)\n\n# Due to this test taking a 1ms each then we don't mind generating so many tests\n# This generates all pairs of spaces of the same type in TESTING_SPACES\nTESTING_SPACES_PERMUTATIONS = list(\n    itertools.chain(\n        *[\n            list(itertools.permutations(list(group), r=2))\n            for key, group in itertools.groupby(\n                TESTING_SPACES, key=lambda space: type(space)\n            )\n        ]\n    )\n)\n\n\n@pytest.mark.parametrize(\"space\", TESTING_SPACES, ids=TESTING_SPACES_IDS)\ndef test_roundtripping(space: Space):\n    \"\"\"Tests if space samples passed to `to_jsonable` and `from_jsonable` produce the original samples.\"\"\"\n    sample_1 = space.sample()\n    sample_2 = space.sample()\n\n    # Convert the samples to json, dump + load json and convert back to python\n    sample_json = space.to_jsonable([sample_1, sample_2])\n    sample_roundtripped = json.loads(json.dumps(sample_json))\n    sample_1_prime, sample_2_prime = space.from_jsonable(sample_roundtripped)\n\n    # Check if the samples are equivalent\n    assert data_equivalence(\n        sample_1, sample_1_prime\n    ), f\"sample 1: {sample_1}, prime: {sample_1_prime}\"\n    assert data_equivalence(\n        sample_2, sample_2_prime\n    ), f\"sample 2: {sample_2}, prime: {sample_2_prime}\"\n\n\n@pytest.mark.parametrize(\n    \"space_1,space_2\",\n    TESTING_SPACES_PERMUTATIONS,\n    ids=[f\"({s1}, {s2})\" for s1, s2 in 
TESTING_SPACES_PERMUTATIONS],\n)\ndef test_space_equality(space_1, space_2):\n    \"\"\"Check that `space.__eq__` works.\n\n    Testing spaces permutations contains all combinations of testing spaces of the same type.\n    \"\"\"\n    assert space_1 == space_1\n    assert space_2 == space_2\n    assert space_1 != space_2\n\n\n# The expected sum of variance for an alpha of 0.05\n# CHI_SQUARED = [0] + [scipy.stats.chi2.isf(0.05, df=df) for df in range(1, 25)]\nCHI_SQUARED = np.array(\n    [\n        0.01,\n        3.8414588206941285,\n        5.991464547107983,\n        7.814727903251178,\n        9.487729036781158,\n        11.070497693516355,\n        12.59158724374398,\n        14.067140449340167,\n        15.507313055865454,\n        16.91897760462045,\n    ]\n)\n\n\n@pytest.mark.parametrize(\n    \"space\", TESTING_FUNDAMENTAL_SPACES, ids=TESTING_FUNDAMENTAL_SPACES_IDS\n)\ndef test_sample(space: Space, n_trials: int = 1_000):\n    \"\"\"Test the space sample has the expected distribution with the chi-squared test and KS test.\n\n    Example code with scipy.stats.chisquared that should have the same\n\n    >>> import scipy.stats\n    >>> variance = np.sum(np.square(observed_frequency - expected_frequency) / expected_frequency)\n    >>> f'X2 at alpha=0.05 = {scipy.stats.chi2.isf(0.05, df=4)}'\n    >>> f'p-value = {scipy.stats.chi2.sf(variance, df=4)}'\n    >>> scipy.stats.chisquare(f_obs=observed_frequency)\n    \"\"\"\n    space.seed(0)\n    samples = np.array([space.sample() for _ in range(n_trials)])\n    assert len(samples) == n_trials\n\n    if isinstance(space, Box):\n        # TODO: Add KS testing for continuous uniform distribution\n        pass\n    elif isinstance(space, Discrete):\n        expected_frequency = np.ones(space.n) * n_trials / space.n\n        observed_frequency = np.zeros(space.n)\n        for sample in samples:\n            observed_frequency[sample - space.start] += 1\n        degrees_of_freedom = space.n - 1\n\n        assert 
observed_frequency.shape == expected_frequency.shape\n        assert np.sum(observed_frequency) == n_trials\n\n        variance = np.sum(\n            np.square(expected_frequency - observed_frequency) / expected_frequency\n        )\n        assert variance < CHI_SQUARED[degrees_of_freedom]\n    elif isinstance(space, MultiBinary):\n        expected_frequency = n_trials / 2\n        observed_frequency = np.sum(samples, axis=0)\n        assert observed_frequency.shape == space.shape\n\n        # As this is a binary space, then we can be lazy in the variance as the np.square is symmetric for the 0 and 1 categories\n        variance = (\n            2 * np.square(observed_frequency - expected_frequency) / expected_frequency\n        )\n        assert variance.shape == space.shape\n        assert np.all(variance < CHI_SQUARED[1])\n    elif isinstance(space, MultiDiscrete):\n        # Due to the multi-axis capability of MultiDiscrete, these functions need to be recursive and that the expected / observed numpy are of non-regular shapes\n        def _generate_frequency(dim, func):\n            if isinstance(dim, np.ndarray):\n                return np.array(\n                    [_generate_frequency(sub_dim, func) for sub_dim in dim],\n                    dtype=object,\n                )\n            else:\n                return func(dim)\n\n        def _update_observed_frequency(obs_sample, obs_freq):\n            if isinstance(obs_sample, np.ndarray):\n                for sub_sample, sub_freq in zip(obs_sample, obs_freq):\n                    _update_observed_frequency(sub_sample, sub_freq)\n            else:\n                obs_freq[obs_sample] += 1\n\n        expected_frequency = _generate_frequency(\n            space.nvec, lambda dim: np.ones(dim) * n_trials / dim\n        )\n        observed_frequency = _generate_frequency(space.nvec, lambda dim: np.zeros(dim))\n        for sample in samples:\n            _update_observed_frequency(sample, 
observed_frequency)\n\n        def _chi_squared_test(dim, exp_freq, obs_freq):\n            if isinstance(dim, np.ndarray):\n                for sub_dim, sub_exp_freq, sub_obs_freq in zip(dim, exp_freq, obs_freq):\n                    _chi_squared_test(sub_dim, sub_exp_freq, sub_obs_freq)\n            else:\n                assert exp_freq.shape == (dim,) and obs_freq.shape == (dim,)\n                assert np.sum(obs_freq) == n_trials\n                assert np.sum(exp_freq) == n_trials\n                _variance = np.sum(np.square(exp_freq - obs_freq) / exp_freq)\n                _degrees_of_freedom = dim - 1\n                assert _variance < CHI_SQUARED[_degrees_of_freedom]\n\n        _chi_squared_test(space.nvec, expected_frequency, observed_frequency)\n    elif isinstance(space, Text):\n        expected_frequency = (\n            np.ones(len(space.character_set))\n            * n_trials\n            * (space.min_length + (space.max_length - space.min_length) / 2)\n            / len(space.character_set)\n        )\n        observed_frequency = np.zeros(len(space.character_set))\n        for sample in samples:\n            for x in sample:\n                observed_frequency[space.character_index(x)] += 1\n        degrees_of_freedom = len(space.character_set) - 1\n\n        assert observed_frequency.shape == expected_frequency.shape\n        assert np.sum(observed_frequency) == sum(len(sample) for sample in samples)\n\n        variance = np.sum(\n            np.square(expected_frequency - observed_frequency) / expected_frequency\n        )\n        if degrees_of_freedom == 61:\n            # scipy.stats.chi2.isf(0.05, df=61)\n            assert variance < 80.23209784876272\n        else:\n            assert variance < CHI_SQUARED[degrees_of_freedom]\n    else:\n        raise NotImplementedError(f\"Unknown sample testing for {type(space)}\")\n\n\nSAMPLE_MASK_RNG, _ = seeding.np_random(1)\n\n\n@pytest.mark.parametrize(\n    \"space,mask\",\n    
itertools.zip_longest(\n        TESTING_FUNDAMENTAL_SPACES,\n        [\n            # Discrete\n            np.array([1, 1, 0], dtype=np.int8),\n            np.array([0, 0, 0], dtype=np.int8),\n            # Box\n            None,\n            None,\n            None,\n            None,\n            None,\n            # Multi-discrete\n            (np.array([1, 1], dtype=np.int8), np.array([0, 0], dtype=np.int8)),\n            (\n                (np.array([1, 0], dtype=np.int8), np.array([0, 1, 1], dtype=np.int8)),\n                (np.array([1, 1, 0], dtype=np.int8), np.array([0, 1], dtype=np.int8)),\n            ),\n            # Multi-binary\n            np.array([0, 1, 0, 1, 0, 2, 1, 1], dtype=np.int8),\n            np.array([[0, 1, 2], [0, 2, 1]], dtype=np.int8),\n            # Text\n            (None, SAMPLE_MASK_RNG.integers(low=0, high=2, size=62, dtype=np.int8)),\n            (4, SAMPLE_MASK_RNG.integers(low=0, high=2, size=62, dtype=np.int8)),\n            (None, np.array([1, 1, 0, 1, 0, 0], dtype=np.int8)),\n        ],\n    ),\n    ids=TESTING_FUNDAMENTAL_SPACES_IDS,\n)\ndef test_space_sample_mask(space: Space, mask, n_trials: int = 100):\n    \"\"\"Tests that sampling a space with a mask has the expected distribution.\n\n    The implemented code is similar to `test_sample` but takes the applied mask into account.\n    \"\"\"\n    if isinstance(space, Box):\n        # The box space can't have a sample mask\n        assert mask is None\n        return\n    assert mask is not None\n\n    space.seed(1)\n    samples = np.array([space.sample(mask) for _ in range(n_trials)])\n\n    if isinstance(space, Discrete):\n        if np.any(mask == 1):\n            expected_frequency = np.ones(space.n) * (n_trials / np.sum(mask)) * mask\n        else:\n            expected_frequency = np.zeros(space.n)\n            expected_frequency[0] = n_trials\n        observed_frequency = np.zeros(space.n)\n        for sample in samples:\n            
observed_frequency[sample - space.start] += 1\n        degrees_of_freedom = max(np.sum(mask) - 1, 0)\n\n        assert observed_frequency.shape == expected_frequency.shape\n        assert np.sum(observed_frequency) == n_trials\n        assert np.sum(expected_frequency) == n_trials\n        variance = np.sum(\n            np.square(expected_frequency - observed_frequency)\n            / np.clip(expected_frequency, 1, None)\n        )\n        assert variance < CHI_SQUARED[degrees_of_freedom]\n    elif isinstance(space, MultiBinary):\n        expected_frequency = (\n            np.ones(space.shape) * np.where(mask == 2, 0.5, mask) * n_trials\n        )\n        print(expected_frequency)\n        observed_frequency = np.sum(samples, axis=0)\n        assert space.shape == expected_frequency.shape == observed_frequency.shape\n\n        variance = (\n            2\n            * np.square(observed_frequency - expected_frequency)\n            / np.clip(expected_frequency, 1, None)\n        )\n        assert variance.shape == space.shape\n        assert np.all(variance < CHI_SQUARED[1])\n    elif isinstance(space, MultiDiscrete):\n        # Due to the multi-axis capability of MultiDiscrete, these functions need to be recursive and that the expected / observed numpy are of non-regular shapes\n        def _generate_frequency(\n            _dim: Union[np.ndarray, int], _mask, func: callable\n        ) -> List:\n            if isinstance(_dim, np.ndarray):\n                return [\n                    _generate_frequency(sub_dim, sub_mask, func)\n                    for sub_dim, sub_mask in zip(_dim, _mask)\n                ]\n            else:\n                return func(_dim, _mask)\n\n        def _update_observed_frequency(obs_sample, obs_freq):\n            if isinstance(obs_sample, np.ndarray):\n                for sub_sample, sub_freq in zip(obs_sample, obs_freq):\n                    _update_observed_frequency(sub_sample, sub_freq)\n            else:\n                
obs_freq[obs_sample] += 1\n\n        def _exp_freq_fn(_dim: int, _mask: np.ndarray):\n            if np.any(_mask == 1):\n                assert _dim == len(_mask)\n                return np.ones(_dim) * (n_trials / np.sum(_mask)) * _mask\n            else:\n                freq = np.zeros(_dim)\n                freq[0] = n_trials\n                return freq\n\n        expected_frequency = _generate_frequency(\n            space.nvec, mask, lambda dim, _mask: _exp_freq_fn(dim, _mask)\n        )\n        observed_frequency = _generate_frequency(\n            space.nvec, mask, lambda dim, _: np.zeros(dim)\n        )\n        for sample in samples:\n            _update_observed_frequency(sample, observed_frequency)\n\n        def _chi_squared_test(dim, _mask, exp_freq, obs_freq):\n            if isinstance(dim, np.ndarray):\n                for sub_dim, sub_mask, sub_exp_freq, sub_obs_freq in zip(\n                    dim, _mask, exp_freq, obs_freq\n                ):\n                    _chi_squared_test(sub_dim, sub_mask, sub_exp_freq, sub_obs_freq)\n            else:\n                assert exp_freq.shape == (dim,) and obs_freq.shape == (dim,)\n                assert np.sum(obs_freq) == n_trials\n                assert np.sum(exp_freq) == n_trials\n                _variance = np.sum(\n                    np.square(exp_freq - obs_freq) / np.clip(exp_freq, 1, None)\n                )\n                _degrees_of_freedom = max(np.sum(_mask) - 1, 0)\n                assert _variance < CHI_SQUARED[_degrees_of_freedom]\n\n        _chi_squared_test(space.nvec, mask, expected_frequency, observed_frequency)\n    elif isinstance(space, Text):\n        length, charlist_mask = mask\n\n        if length is None:\n            expected_length = (\n                space.min_length + (space.max_length - space.min_length) / 2\n            )\n        else:\n            expected_length = length\n\n        if np.any(charlist_mask == 1):\n            expected_frequency = (\n           
     np.ones(len(space.character_set))\n                * n_trials\n                * expected_length\n                / np.sum(charlist_mask)\n                * charlist_mask\n            )\n        else:\n            expected_frequency = np.zeros(len(space.character_set))\n\n        observed_frequency = np.zeros(len(space.character_set))\n        for sample in samples:\n            for char in sample:\n                observed_frequency[space.character_index(char)] += 1\n\n        degrees_of_freedom = max(np.sum(charlist_mask) - 1, 0)\n\n        assert observed_frequency.shape == expected_frequency.shape\n        assert np.sum(observed_frequency) == sum(len(sample) for sample in samples)\n\n        variance = np.sum(\n            np.square(expected_frequency - observed_frequency)\n            / np.clip(expected_frequency, 1, None)\n        )\n        if degrees_of_freedom == 26:\n            # scipy.stats.chi2.isf(0.05, df=26)\n            assert variance < 38.88513865983007\n        elif degrees_of_freedom == 31:\n            # scipy.stats.chi2.isf(0.05, df=31)\n            assert variance < 44.985343280365136\n        else:\n            assert variance < CHI_SQUARED[degrees_of_freedom]\n    else:\n        raise NotImplementedError()\n\n\n@pytest.mark.parametrize(\"space\", TESTING_SPACES, ids=TESTING_SPACES_IDS)\ndef test_seed_reproducibility(space):\n    \"\"\"Test that setting the space seed reproduces the same samples.\"\"\"\n    space_1 = space\n    space_2 = copy.deepcopy(space)\n\n    for seed in range(5):\n        assert space_1.seed(seed) == space_2.seed(seed)\n        # With the same seed, the two spaces should be identical\n        assert all(\n            data_equivalence(space_1.sample(), space_2.sample()) for _ in range(10)\n        )\n\n    assert space_1.seed(123) != space_2.seed(456)\n    # Due to randomness, it is difficult to test that random seeds produce different answers\n    #   Therefore, taking 10 samples and checking that they are 
not all the same.\n    assert not all(\n        data_equivalence(space_1.sample(), space_2.sample()) for _ in range(10)\n    )\n\n\nSPACE_CLS = list(dict.fromkeys(type(space) for space in TESTING_SPACES))\nSPACE_KWARGS = [\n    {\"n\": 3},  # Discrete\n    {\"low\": 1, \"high\": 10},  # Box\n    {\"nvec\": [3, 2]},  # MultiDiscrete\n    {\"n\": 2},  # MultiBinary\n    {\"max_length\": 5},  # Text\n    {\"spaces\": (Discrete(3), Discrete(2))},  # Tuple\n    {\"spaces\": {\"a\": Discrete(3), \"b\": Discrete(2)}},  # Dict\n    {\"node_space\": Discrete(4), \"edge_space\": Discrete(3)},  # Graph\n    {\"space\": Discrete(4)},  # Sequence\n]\nassert len(SPACE_CLS) == len(SPACE_KWARGS)\n\n\n@pytest.mark.parametrize(\n    \"space_cls,kwarg\",\n    list(zip(SPACE_CLS, SPACE_KWARGS)),\n    ids=[f\"{space_cls}\" for space_cls in SPACE_CLS],\n)\ndef test_seed_np_random(space_cls, kwarg):\n    \"\"\"During initialisation of a space, a rng instance can be passed to the space.\n\n    Test that the space's `np_random` is the rng instance\n    \"\"\"\n    rng, _ = seeding.np_random(123)\n\n    space = space_cls(seed=rng, **kwarg)\n    assert space.np_random is rng\n\n\n@pytest.mark.parametrize(\"space\", TESTING_SPACES, ids=TESTING_SPACES_IDS)\ndef test_sample_contains(space):\n    \"\"\"Test that samples are contained within the space.\n\n    Then, for samples from every testing space, check that `contains` raises no error and returns a bool.\n    As samples of other spaces can be contained within this space, we cannot test that `contains` is always true or false.\n    \"\"\"\n    for _ in range(10):\n        sample = space.sample()\n        assert sample in space\n        assert space.contains(sample)\n\n    for other_space in TESTING_SPACES:\n        assert isinstance(space.contains(other_space.sample()), bool)\n\n\n@pytest.mark.parametrize(\"space\", TESTING_SPACES, ids=TESTING_SPACES_IDS)\ndef test_repr(space):\n    assert isinstance(str(space), 
str)\n\n\n@pytest.mark.parametrize(\"space\", TESTING_SPACES, ids=TESTING_SPACES_IDS)\ndef test_space_pickling(space):\n    \"\"\"Tests the spaces can be pickled with the unpickled version being equivalent to the original.\"\"\"\n    space.seed(0)\n\n    # Pickle and unpickle with a string\n    pickled_space = pickle.dumps(space)\n    unpickled_space = pickle.loads(pickled_space)\n    assert space == unpickled_space\n\n    # Pickle and unpickle with a file\n    with tempfile.TemporaryFile() as f:\n        pickle.dump(space, f)\n        f.seek(0)\n        file_unpickled_space = pickle.load(f)\n\n    assert space == file_unpickled_space\n\n    # Check that space samples are the same\n    space_sample = space.sample()\n    unpickled_sample = unpickled_space.sample()\n    file_unpickled_sample = file_unpickled_space.sample()\n    assert data_equivalence(space_sample, unpickled_sample)\n    assert data_equivalence(space_sample, file_unpickled_sample)\n"
  },
  {
    "path": "tests/spaces/test_text.py",
    "content": "import re\n\nimport numpy as np\nimport pytest\n\nfrom gym.spaces import Text\n\n\ndef test_sample_mask():\n    space = Text(min_length=1, max_length=5)\n\n    # Test the sample length\n    sample = space.sample(mask=(3, None))\n    assert sample in space\n    assert len(sample) == 3\n\n    sample = space.sample(mask=None)\n    assert sample in space\n    assert 1 <= len(sample) <= 5\n\n    with pytest.raises(\n        ValueError,\n        match=re.escape(\n            \"Trying to sample with a minimum length > 0 (1) but the character mask is all zero meaning that no character could be sampled.\"\n        ),\n    ):\n        space.sample(mask=(3, np.zeros(len(space.character_set), dtype=np.int8)))\n\n    space = Text(min_length=0, max_length=5)\n    sample = space.sample(\n        mask=(None, np.zeros(len(space.character_set), dtype=np.int8))\n    )\n    assert sample in space\n    assert sample == \"\"\n\n    # Test the sample characters\n    space = Text(max_length=5, charset=\"abcd\")\n\n    sample = space.sample(mask=(3, np.array([0, 1, 0, 0], dtype=np.int8)))\n    assert sample in space\n    assert sample == \"bbb\"\n"
  },
  {
    "path": "tests/spaces/test_tuple.py",
    "content": "import numpy as np\nimport pytest\n\nimport gym.spaces\nfrom gym.spaces import Box, Dict, Discrete, MultiBinary, Tuple\nfrom gym.utils.env_checker import data_equivalence\n\n\ndef test_sequence_inheritance():\n    \"\"\"The gym Tuple space inherits from abc.Sequences, this test checks all functions work\"\"\"\n    spaces = [Discrete(5), Discrete(10), Discrete(5)]\n    tuple_space = Tuple(spaces)\n\n    assert len(tuple_space) == len(spaces)\n    # Test indexing\n    for i in range(len(tuple_space)):\n        assert tuple_space[i] == spaces[i]\n\n    # Test iterable\n    for space in tuple_space:\n        assert space in spaces\n\n    # Test count\n    assert tuple_space.count(Discrete(5)) == 2\n    assert tuple_space.count(Discrete(6)) == 0\n    assert tuple_space.count(MultiBinary(2)) == 0\n\n    # Test index\n    assert tuple_space.index(Discrete(5)) == 0\n    assert tuple_space.index(Discrete(5), 1) == 2\n\n    # Test errors\n    with pytest.raises(ValueError):\n        tuple_space.index(Discrete(10), 0, 1)\n    with pytest.raises(IndexError):\n        assert tuple_space[4]\n\n\n@pytest.mark.parametrize(\n    \"space, seed, expected_len\",\n    [\n        (Tuple([Discrete(5), Discrete(4)]), None, 2),\n        (Tuple([Discrete(5), Discrete(4)]), 123, 3),\n        (Tuple([Discrete(5), Discrete(4)]), (123, 456), 2),\n        (\n            Tuple(\n                (Discrete(5), Tuple((Box(low=0.0, high=1.0, shape=(3,)), Discrete(2))))\n            ),\n            (123, (456, 789)),\n            3,\n        ),\n        (\n            Tuple(\n                (\n                    Discrete(3),\n                    Dict(position=Box(low=0.0, high=1.0), velocity=Discrete(2)),\n                )\n            ),\n            (123, {\"position\": 456, \"velocity\": 789}),\n            3,\n        ),\n    ],\n)\ndef test_seeds(space, seed, expected_len):\n    seeds = space.seed(seed)\n    assert isinstance(seeds, list) and all(isinstance(elem, int) for elem 
in seeds)\n    assert len(seeds) == expected_len\n\n    sample1 = space.sample()\n\n    seeds2 = space.seed(seed)\n    sample2 = space.sample()\n\n    data_equivalence(seeds, seeds2)\n    data_equivalence(sample1, sample2)\n\n\n@pytest.mark.parametrize(\n    \"space_fn\",\n    [\n        lambda: Tuple([\"abc\"]),\n        lambda: Tuple([gym.spaces.Box(0, 1), \"abc\"]),\n        lambda: Tuple(\"abc\"),\n    ],\n)\ndef test_bad_space_calls(space_fn):\n    with pytest.raises(AssertionError):\n        space_fn()\n\n\ndef test_contains_promotion():\n    space = gym.spaces.Tuple((gym.spaces.Box(0, 1), gym.spaces.Box(-1, 0, (2,))))\n\n    assert (\n        np.array([0.0], dtype=np.float32),\n        np.array([0.0, 0.0], dtype=np.float32),\n    ) in space\n\n    space = gym.spaces.Tuple((gym.spaces.Box(0, 1), gym.spaces.Box(-1, 0, (1,))))\n    assert np.array([[0.0], [0.0]], dtype=np.float32) in space\n\n\ndef test_bad_seed():\n    space = gym.spaces.Tuple((gym.spaces.Box(0, 1), gym.spaces.Box(0, 1)))\n    with pytest.raises(\n        TypeError,\n        match=\"Expected seed type: list, tuple, int or None, actual type: <class 'float'>\",\n    ):\n        space.seed(0.0)\n"
  },
  {
    "path": "tests/spaces/test_utils.py",
    "content": "from itertools import zip_longest\nfrom typing import Optional\n\nimport numpy as np\nimport pytest\n\nimport gym\nfrom gym.spaces import Box, Graph, utils\nfrom gym.utils.env_checker import data_equivalence\nfrom tests.spaces.utils import TESTING_SPACES, TESTING_SPACES_IDS\n\nTESTING_SPACES_EXPECTED_FLATDIMS = [\n    # Discrete\n    3,\n    3,\n    # Box\n    1,\n    4,\n    2,\n    2,\n    2,\n    # Multi-discrete\n    4,\n    10,\n    # Multi-binary\n    8,\n    6,\n    # Text\n    6,\n    6,\n    6,\n    # Tuple\n    9,\n    7,\n    10,\n    6,\n    None,\n    # Dict\n    7,\n    8,\n    17,\n    None,\n    # Graph\n    None,\n    None,\n    None,\n    # Sequence\n    None,\n    None,\n    None,\n]\n\n\n@pytest.mark.parametrize(\n    [\"space\", \"flatdim\"],\n    zip_longest(TESTING_SPACES, TESTING_SPACES_EXPECTED_FLATDIMS),\n    ids=TESTING_SPACES_IDS,\n)\ndef test_flatdim(space: gym.spaces.Space, flatdim: Optional[int]):\n    \"\"\"Checks that the flattened dims of the space is equal to an expected value.\"\"\"\n    if space.is_np_flattenable:\n        dim = utils.flatdim(space)\n        assert dim == flatdim, f\"Expected {dim} to equal {flatdim}\"\n    else:\n        with pytest.raises(\n            ValueError,\n        ):\n            utils.flatdim(space)\n\n\n@pytest.mark.parametrize(\"space\", TESTING_SPACES, ids=TESTING_SPACES_IDS)\ndef test_flatten_space(space):\n    \"\"\"Test that the flattened spaces are a box and have the `flatdim` shape.\"\"\"\n    flat_space = utils.flatten_space(space)\n\n    if space.is_np_flattenable:\n        assert isinstance(flat_space, Box)\n        (single_dim,) = flat_space.shape\n        flatdim = utils.flatdim(space)\n\n        assert single_dim == flatdim\n    elif isinstance(flat_space, Graph):\n        assert isinstance(space, Graph)\n\n        (node_single_dim,) = flat_space.node_space.shape\n        node_flatdim = utils.flatdim(space.node_space)\n        assert node_single_dim == node_flatdim\n\n   
     if flat_space.edge_space is not None:\n            (edge_single_dim,) = flat_space.edge_space.shape\n            edge_flatdim = utils.flatdim(space.edge_space)\n            assert edge_single_dim == edge_flatdim\n    else:\n        assert isinstance(\n            space, (gym.spaces.Tuple, gym.spaces.Dict, gym.spaces.Sequence)\n        )\n\n\n@pytest.mark.parametrize(\"space\", TESTING_SPACES, ids=TESTING_SPACES_IDS)\ndef test_flatten(space):\n    \"\"\"Test that a flattened sample has the `flatdim` shape.\"\"\"\n    flattened_sample = utils.flatten(space, space.sample())\n\n    if space.is_np_flattenable:\n        assert isinstance(flattened_sample, np.ndarray)\n        (single_dim,) = flattened_sample.shape\n        flatdim = utils.flatdim(space)\n\n        assert single_dim == flatdim\n    else:\n        assert isinstance(flattened_sample, (tuple, dict, Graph))\n\n\n@pytest.mark.parametrize(\"space\", TESTING_SPACES, ids=TESTING_SPACES_IDS)\ndef test_flat_space_contains_flat_points(space):\n    \"\"\"Test that the flattened samples are contained within the flattened space.\"\"\"\n    flattened_samples = [utils.flatten(space, space.sample()) for _ in range(10)]\n    flat_space = utils.flatten_space(space)\n\n    for flat_sample in flattened_samples:\n        assert flat_sample in flat_space\n\n\n@pytest.mark.parametrize(\"space\", TESTING_SPACES, ids=TESTING_SPACES_IDS)\ndef test_flatten_roundtripping(space):\n    \"\"\"Tests that flattening and then unflattening a sample returns the original sample.\"\"\"\n    samples = [space.sample() for _ in range(10)]\n\n    flattened_samples = [utils.flatten(space, sample) for sample in samples]\n    unflattened_samples = [\n        utils.unflatten(space, sample) for sample in flattened_samples\n    ]\n\n    for original, roundtripped in zip(samples, unflattened_samples):\n        assert data_equivalence(original, roundtripped)\n"
  },
  {
    "path": "tests/spaces/utils.py",
    "content": "from typing import List\n\nimport numpy as np\n\nfrom gym.spaces import (\n    Box,\n    Dict,\n    Discrete,\n    Graph,\n    MultiBinary,\n    MultiDiscrete,\n    Sequence,\n    Space,\n    Text,\n    Tuple,\n)\n\nTESTING_FUNDAMENTAL_SPACES = [\n    Discrete(3),\n    Discrete(3, start=-1),\n    Box(low=0.0, high=1.0),\n    Box(low=0.0, high=np.inf, shape=(2, 2)),\n    Box(low=np.array([-10.0, 0.0]), high=np.array([10.0, 10.0]), dtype=np.float64),\n    Box(low=-np.inf, high=0.0, shape=(2, 1)),\n    Box(low=0.0, high=np.inf, shape=(2, 1)),\n    MultiDiscrete([2, 2]),\n    MultiDiscrete([[2, 3], [3, 2]]),\n    MultiBinary(8),\n    MultiBinary([2, 3]),\n    Text(6),\n    Text(min_length=3, max_length=6),\n    Text(6, charset=\"abcdef\"),\n]\nTESTING_FUNDAMENTAL_SPACES_IDS = [f\"{space}\" for space in TESTING_FUNDAMENTAL_SPACES]\n\n\nTESTING_COMPOSITE_SPACES = [\n    # Tuple spaces\n    Tuple([Discrete(5), Discrete(4)]),\n    Tuple(\n        (\n            Discrete(5),\n            Box(\n                low=np.array([0.0, 0.0]),\n                high=np.array([1.0, 5.0]),\n                dtype=np.float64,\n            ),\n        )\n    ),\n    Tuple((Discrete(5), Tuple((Box(low=0.0, high=1.0, shape=(3,)), Discrete(2))))),\n    Tuple((Discrete(3), Dict(position=Box(low=0.0, high=1.0), velocity=Discrete(2)))),\n    Tuple((Graph(node_space=Box(-1, 1, shape=(2, 1)), edge_space=None), Discrete(2))),\n    # Dict spaces\n    Dict(\n        {\n            \"position\": Discrete(5),\n            \"velocity\": Box(\n                low=np.array([0.0, 0.0]),\n                high=np.array([1.0, 5.0]),\n                dtype=np.float64,\n            ),\n        }\n    ),\n    Dict(\n        position=Discrete(6),\n        velocity=Box(\n            low=np.array([0.0, 0.0]),\n            high=np.array([1.0, 5.0]),\n            dtype=np.float64,\n        ),\n    ),\n    Dict(\n        {\n            \"a\": Box(low=0, high=1, shape=(3, 3)),\n            \"b\": 
Dict(\n                {\n                    \"b_1\": Box(low=-100, high=100, shape=(2,)),\n                    \"b_2\": Box(low=-1, high=1, shape=(2,)),\n                }\n            ),\n            \"c\": Discrete(4),\n        }\n    ),\n    Dict(\n        a=Dict(\n            a=Graph(node_space=Box(-100, 100, shape=(2, 2)), edge_space=None),\n            b=Box(-100, 100, shape=(2, 2)),\n        ),\n        b=Tuple((Box(-100, 100, shape=(2,)), Box(-100, 100, shape=(2,)))),\n    ),\n    # Graph spaces\n    Graph(node_space=Box(low=-100, high=100, shape=(3, 4)), edge_space=Discrete(5)),\n    Graph(node_space=Discrete(5), edge_space=Box(low=-100, high=100, shape=(3, 4))),\n    Graph(node_space=Discrete(3), edge_space=Discrete(4)),\n    # Sequence spaces\n    Sequence(Discrete(4)),\n    Sequence(Dict({\"feature\": Box(0, 1, (3,))})),\n    Sequence(Graph(node_space=Box(-100, 100, shape=(2, 2)), edge_space=Discrete(4))),\n]\nTESTING_COMPOSITE_SPACES_IDS = [f\"{space}\" for space in TESTING_COMPOSITE_SPACES]\n\nTESTING_SPACES: List[Space] = TESTING_FUNDAMENTAL_SPACES + TESTING_COMPOSITE_SPACES\nTESTING_SPACES_IDS = TESTING_FUNDAMENTAL_SPACES_IDS + TESTING_COMPOSITE_SPACES_IDS\n"
  },
  {
    "path": "tests/test_core.py",
    "content": "from typing import Optional\n\nimport numpy as np\nimport pytest\n\nfrom gym import core, spaces\nfrom gym.wrappers import OrderEnforcing, TimeLimit\n\n\nclass ArgumentEnv(core.Env):\n    observation_space = spaces.Box(low=0, high=1, shape=(1,))\n    action_space = spaces.Box(low=0, high=1, shape=(1,))\n    calls = 0\n\n    def __init__(self, arg):\n        self.calls += 1\n        self.arg = arg\n\n\nclass UnittestEnv(core.Env):\n    observation_space = spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8)\n    action_space = spaces.Discrete(3)\n\n    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):\n        super().reset(seed=seed)\n        return self.observation_space.sample(), {\"info\": \"dummy\"}\n\n    def step(self, action):\n        observation = self.observation_space.sample()  # Dummy observation\n        return (observation, 0.0, False, {})\n\n\nclass UnknownSpacesEnv(core.Env):\n    \"\"\"This environment defines its observation & action spaces only\n    after the first call to reset. Although this pattern is sometimes\n    necessary when implementing a new environment (e.g. 
if it depends\n    on external resources), it is not encouraged.\n    \"\"\"\n\n    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):\n        super().reset(seed=seed)\n        self.observation_space = spaces.Box(\n            low=0, high=255, shape=(64, 64, 3), dtype=np.uint8\n        )\n        self.action_space = spaces.Discrete(3)\n        return self.observation_space.sample(), {}  # Dummy observation with info\n\n    def step(self, action):\n        observation = self.observation_space.sample()  # Dummy observation\n        return (observation, 0.0, False, {})\n\n\nclass OldStyleEnv(core.Env):\n    \"\"\"This environment doesn't accept any arguments in reset; ideally we want to support this too (for now)\"\"\"\n\n    def __init__(self):\n        pass\n\n    def reset(self):\n        super().reset()\n        return 0\n\n    def step(self, action):\n        return 0, 0, False, {}\n\n\nclass NewPropertyWrapper(core.Wrapper):\n    def __init__(\n        self,\n        env,\n        observation_space=None,\n        action_space=None,\n        reward_range=None,\n        metadata=None,\n    ):\n        super().__init__(env)\n        if observation_space is not None:\n            # Only set the observation space if not None to test property forwarding\n            self.observation_space = observation_space\n        if action_space is not None:\n            self.action_space = action_space\n        if reward_range is not None:\n            self.reward_range = reward_range\n        if metadata is not None:\n            self.metadata = metadata\n\n\ndef test_env_instantiation():\n    # This looks pretty trivial, but given our usage of\n    # __new__, it's worth having.\n    env = ArgumentEnv(\"arg\")\n    assert env.arg == \"arg\"\n    assert env.calls == 1\n\n\nproperties = [\n    {\n        \"observation_space\": spaces.Box(\n            low=0.0, high=1.0, shape=(64, 64, 3), dtype=np.float32\n        )\n    },\n    
{\"action_space\": spaces.Discrete(2)},\n    {\"reward_range\": (-1.0, 1.0)},\n    {\"metadata\": {\"render_modes\": [\"human\", \"rgb_array_list\"]}},\n    {\n        \"observation_space\": spaces.Box(\n            low=0.0, high=1.0, shape=(64, 64, 3), dtype=np.float32\n        ),\n        \"action_space\": spaces.Discrete(2),\n    },\n]\n\n\n@pytest.mark.parametrize(\"class_\", [UnittestEnv, UnknownSpacesEnv])\n@pytest.mark.parametrize(\"props\", properties)\ndef test_wrapper_property_forwarding(class_, props):\n    env = class_()\n    env = NewPropertyWrapper(env, **props)\n\n    # If UnknownSpacesEnv, then call reset to define the spaces\n    if isinstance(env.unwrapped, UnknownSpacesEnv):\n        _ = env.reset()\n\n    # Test the properties set by the wrapper\n    for key, value in props.items():\n        assert getattr(env, key) == value\n\n    # Otherwise, test if the properties are forwarded\n    all_properties = {\"observation_space\", \"action_space\", \"reward_range\", \"metadata\"}\n    for key in all_properties - props.keys():\n        assert getattr(env, key) == getattr(env.unwrapped, key)\n\n\ndef test_compatibility_with_old_style_env():\n    env = OldStyleEnv()\n    env = OrderEnforcing(env)\n    env = TimeLimit(env)\n    obs = env.reset()\n    assert obs == 0\n"
  },
  {
    "path": "tests/testing_env.py",
    "content": "\"\"\"Provides a generic testing environment for use in tests with custom reset, step and render functions.\"\"\"\nimport types\nfrom typing import Any, Dict, Optional, Tuple, Union\n\nimport gym\nfrom gym import spaces\nfrom gym.core import ActType, ObsType\nfrom gym.envs.registration import EnvSpec\n\n\ndef basic_reset_fn(\n    self,\n    *,\n    seed: Optional[int] = None,\n    options: Optional[dict] = None,\n) -> Union[ObsType, Tuple[ObsType, dict]]:\n    \"\"\"A basic reset function that will pass the environment check using random actions from the observation space.\"\"\"\n    super(GenericTestEnv, self).reset(seed=seed)\n    self.observation_space.seed(seed)\n    return self.observation_space.sample(), {\"options\": options}\n\n\ndef new_step_fn(self, action: ActType) -> Tuple[ObsType, float, bool, bool, dict]:\n    \"\"\"A step function that follows the new step api that will pass the environment check using random actions from the observation space.\"\"\"\n    return self.observation_space.sample(), 0, False, False, {}\n\n\ndef old_step_fn(self, action: ActType) -> Tuple[ObsType, float, bool, dict]:\n    \"\"\"A step function that follows the old step api that will pass the environment check using random actions from the observation space.\"\"\"\n    return self.observation_space.sample(), 0, False, {}\n\n\ndef basic_render_fn(self):\n    \"\"\"Basic render fn that does nothing.\"\"\"\n    pass\n\n\n# todo: change all testing environment to this generic class\nclass GenericTestEnv(gym.Env):\n    \"\"\"A generic testing environment for use in testing with modified environments are required.\"\"\"\n\n    def __init__(\n        self,\n        action_space: spaces.Space = spaces.Box(0, 1, (1,)),\n        observation_space: spaces.Space = spaces.Box(0, 1, (1,)),\n        reset_fn: callable = basic_reset_fn,\n        step_fn: callable = new_step_fn,\n        render_fn: callable = basic_render_fn,\n        metadata: Optional[Dict[str, Any]] = 
None,\n        render_mode: Optional[str] = None,\n        spec: EnvSpec = EnvSpec(\"TestingEnv-v0\", \"testing-env-no-entry-point\"),\n    ):\n        self.metadata = {} if metadata is None else metadata\n        self.render_mode = render_mode\n        self.spec = spec\n\n        if observation_space is not None:\n            self.observation_space = observation_space\n        if action_space is not None:\n            self.action_space = action_space\n\n        if reset_fn is not None:\n            self.reset = types.MethodType(reset_fn, self)\n        if step_fn is not None:\n            self.step = types.MethodType(step_fn, self)\n        if render_fn is not None:\n            self.render = types.MethodType(render_fn, self)\n\n    def reset(\n        self,\n        *,\n        seed: Optional[int] = None,\n        options: Optional[dict] = None,\n    ) -> Union[ObsType, Tuple[ObsType, dict]]:\n        # If you need a default working reset function, use `basic_reset_fn` above\n        raise NotImplementedError(\"TestingEnv reset_fn is not set.\")\n\n    def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]:\n        raise NotImplementedError(\"TestingEnv step_fn is not set.\")\n\n    def render(self):\n        raise NotImplementedError(\"TestingEnv render_fn is not set.\")\n"
  },
  {
    "path": "tests/utils/__init__.py",
    "content": ""
  },
  {
    "path": "tests/utils/test_env_checker.py",
    "content": "\"\"\"Tests that the `env_checker` runs as expects and all errors are possible.\"\"\"\nimport re\nimport warnings\nfrom typing import Tuple, Union\n\nimport numpy as np\nimport pytest\n\nimport gym\nfrom gym import spaces\nfrom gym.core import ObsType\nfrom gym.utils.env_checker import (\n    check_env,\n    check_reset_options,\n    check_reset_return_info_deprecation,\n    check_reset_return_type,\n    check_reset_seed,\n    check_seed_deprecation,\n)\nfrom tests.testing_env import GenericTestEnv\n\n\n@pytest.mark.parametrize(\n    \"env\",\n    [\n        gym.make(\"CartPole-v1\", disable_env_checker=True).unwrapped,\n        gym.make(\"MountainCar-v0\", disable_env_checker=True).unwrapped,\n        GenericTestEnv(\n            observation_space=spaces.Dict(\n                a=spaces.Discrete(10), b=spaces.Box(np.zeros(2), np.ones(2))\n            )\n        ),\n        GenericTestEnv(\n            observation_space=spaces.Tuple(\n                [spaces.Discrete(10), spaces.Box(np.zeros(2), np.ones(2))]\n            )\n        ),\n        GenericTestEnv(\n            observation_space=spaces.Dict(\n                a=spaces.Tuple(\n                    [spaces.Discrete(10), spaces.Box(np.zeros(2), np.ones(2))]\n                ),\n                b=spaces.Box(np.zeros(2), np.ones(2)),\n            )\n        ),\n    ],\n)\ndef test_no_error_warnings(env):\n    \"\"\"A full version of this test with all gym envs is run in tests/envs/test_envs.py.\"\"\"\n    with warnings.catch_warnings(record=True) as caught_warnings:\n        check_env(env)\n\n    assert len(caught_warnings) == 0, [warning.message for warning in caught_warnings]\n\n\ndef _no_super_reset(self, seed=None, options=None):\n    self.np_random.random()  # generates a new prng\n    # generate seed deterministic result\n    self.observation_space.seed(0)\n    return self.observation_space.sample(), {}\n\n\ndef _super_reset_fixed(self, seed=None, options=None):\n    # Call super that 
ignores the seed passed, use fixed seed\n    super(GenericTestEnv, self).reset(seed=1)\n    # deterministic output\n    self.observation_space._np_random = self.np_random\n    return self.observation_space.sample(), {}\n\n\ndef _reset_default_seed(self: GenericTestEnv, seed=\"Error\", options=None):\n    super(GenericTestEnv, self).reset(seed=seed)\n    self.observation_space._np_random = (  # pyright: ignore [reportPrivateUsage]\n        self.np_random\n    )\n    return self.observation_space.sample(), {}\n\n\n@pytest.mark.parametrize(\n    \"test,func,message\",\n    [\n        [\n            gym.error.Error,\n            lambda self: (self.observation_space.sample(), {}),\n            \"The `reset` method does not provide a `seed` or `**kwargs` keyword argument.\",\n        ],\n        [\n            AssertionError,\n            lambda self, seed, *_: (self.observation_space.sample(), {}),\n            \"Expects the random number generator to have been generated given a seed was passed to reset. Mostly likely the environment reset function does not call `super().reset(seed=seed)`.\",\n        ],\n        [\n            AssertionError,\n            _no_super_reset,\n            \"Mostly likely the environment reset function does not call `super().reset(seed=seed)` as the random generates are not same when the same seeds are passed to `env.reset`.\",\n        ],\n        [\n            AssertionError,\n            _super_reset_fixed,\n            \"Mostly likely the environment reset function does not call `super().reset(seed=seed)` as the random number generators are not different when different seeds are passed to `env.reset`.\",\n        ],\n        [\n            UserWarning,\n            _reset_default_seed,\n            \"The default seed argument in reset should be `None`, otherwise the environment will by default always be deterministic. 
Actual default: Error\",\n        ],\n    ],\n)\ndef test_check_reset_seed(test, func: callable, message: str):\n    \"\"\"Tests the check reset seed function works as expected.\"\"\"\n    if test is UserWarning:\n        with pytest.warns(\n            UserWarning, match=f\"^\\\\x1b\\\\[33mWARN: {re.escape(message)}\\\\x1b\\\\[0m$\"\n        ):\n            check_reset_seed(GenericTestEnv(reset_fn=func))\n    else:\n        with pytest.raises(test, match=f\"^{re.escape(message)}$\"):\n            check_reset_seed(GenericTestEnv(reset_fn=func))\n\n\ndef _deprecated_return_info(\n    self, return_info: bool = False\n) -> Union[Tuple[ObsType, dict], ObsType]:\n    \"\"\"function to simulate the signature and behavior of a `reset` function with the deprecated `return_info` optional argument\"\"\"\n    if return_info:\n        return self.observation_space.sample(), {}\n    else:\n        return self.observation_space.sample()\n\n\ndef _reset_var_keyword_kwargs(self, kwargs):\n    return self.observation_space.sample(), {}\n\n\ndef _reset_return_info_type(self, seed=None, options=None):\n    \"\"\"Returns a `list` instead of a `tuple`. 
This function is used to make sure `env_checker` correctly\n    checks that the return type of `env.reset()` is a `tuple`\"\"\"\n    return [self.observation_space.sample(), {}]\n\n\ndef _reset_return_info_length(self, seed=None, options=None):\n    return 1, 2, 3\n\n\ndef _return_info_obs_outside(self, seed=None, options=None):\n    return self.observation_space.sample() + self.observation_space.high, {}\n\n\ndef _return_info_not_dict(self, seed=None, options=None):\n    return self.observation_space.sample(), [\"key\", \"value\"]\n\n\n@pytest.mark.parametrize(\n    \"test,func,message\",\n    [\n        [\n            AssertionError,\n            _reset_return_info_type,\n            \"The result returned by `env.reset()` was not a tuple of the form `(obs, info)`, where `obs` is a observation and `info` is a dictionary containing additional information. Actual type: `<class 'list'>`\",\n        ],\n        [\n            AssertionError,\n            _reset_return_info_length,\n            \"Calling the reset method did not return a 2-tuple, actual length: 3\",\n        ],\n        [\n            AssertionError,\n            _return_info_obs_outside,\n            \"The first element returned by `env.reset()` is not within the observation space.\",\n        ],\n        [\n            AssertionError,\n            _return_info_not_dict,\n            \"The second element returned by `env.reset()` was not a dictionary, actual type: <class 'list'>\",\n        ],\n    ],\n)\ndef test_check_reset_return_type(test, func: callable, message: str):\n    \"\"\"Tests the check `env.reset()` function has a correct return type.\"\"\"\n\n    with pytest.raises(test, match=f\"^{re.escape(message)}$\"):\n        check_reset_return_type(GenericTestEnv(reset_fn=func))\n\n\n@pytest.mark.parametrize(\n    \"test,func,message\",\n    [\n        [\n            UserWarning,\n            _deprecated_return_info,\n            \"`return_info` is deprecated as an optional argument to `reset`. 
`reset`\"\n            \"should now always return `obs, info` where `obs` is an observation, and `info` is a dictionary\"\n            \"containing additional information.\",\n        ],\n    ],\n)\ndef test_check_reset_return_info_deprecation(test, func: callable, message: str):\n    \"\"\"Tests that return_info has been correct deprecated as an argument to `env.reset()`.\"\"\"\n\n    with pytest.warns(test, match=f\"^\\\\x1b\\\\[33mWARN: {re.escape(message)}\\\\x1b\\\\[0m$\"):\n        check_reset_return_info_deprecation(GenericTestEnv(reset_fn=func))\n\n\ndef test_check_seed_deprecation():\n    \"\"\"Tests that `check_seed_deprecation()` throws a warning if `env.seed()` has not been removed.\"\"\"\n\n    message = \"\"\"Official support for the `seed` function is dropped. Standard practice is to reset gym environments using `env.reset(seed=<desired seed>)`\"\"\"\n\n    env = GenericTestEnv()\n\n    def seed(seed):\n        return\n\n    with pytest.warns(\n        UserWarning, match=f\"^\\\\x1b\\\\[33mWARN: {re.escape(message)}\\\\x1b\\\\[0m$\"\n    ):\n        env.seed = seed\n        assert callable(env.seed)\n        check_seed_deprecation(env)\n\n    with warnings.catch_warnings(record=True) as caught_warnings:\n        env.seed = []\n        check_seed_deprecation(env)\n        env.seed = 123\n        check_seed_deprecation(env)\n        del env.seed\n        check_seed_deprecation(env)\n        assert len(caught_warnings) == 0\n\n\ndef test_check_reset_options():\n    \"\"\"Tests the check_reset_options function.\"\"\"\n    with pytest.raises(\n        gym.error.Error,\n        match=re.escape(\n            \"The `reset` method does not provide an `options` or `**kwargs` keyword argument\"\n        ),\n    ):\n        check_reset_options(GenericTestEnv(reset_fn=lambda self: (0, {})))\n\n\n@pytest.mark.parametrize(\n    \"env,message\",\n    [\n        [\n            \"Error\",\n            \"The environment must inherit from the gym.Env class. 
See https://www.gymlibrary.dev/content/environment_creation/ for more info.\",\n        ],\n        [\n            GenericTestEnv(action_space=None),\n            \"The environment must specify an action space. See https://www.gymlibrary.dev/content/environment_creation/ for more info.\",\n        ],\n        [\n            GenericTestEnv(observation_space=None),\n            \"The environment must specify an observation space. See https://www.gymlibrary.dev/content/environment_creation/ for more info.\",\n        ],\n    ],\n)\ndef test_check_env(env: gym.Env, message: str):\n    \"\"\"Tests the check_env function works as expected.\"\"\"\n    with pytest.raises(AssertionError, match=f\"^{re.escape(message)}$\"):\n        check_env(env)\n"
  },
  {
    "path": "tests/utils/test_passive_env_checker.py",
    "content": "import re\nimport warnings\nfrom typing import Dict, Union\n\nimport numpy as np\nimport pytest\n\nimport gym\nfrom gym import spaces\nfrom gym.utils.passive_env_checker import (\n    check_action_space,\n    check_obs,\n    check_observation_space,\n    env_render_passive_checker,\n    env_reset_passive_checker,\n    env_step_passive_checker,\n)\nfrom tests.testing_env import GenericTestEnv\n\n\ndef _modify_space(space: spaces.Space, attribute: str, value):\n    setattr(space, attribute, value)\n    return space\n\n\n@pytest.mark.parametrize(\n    \"test,space,message\",\n    [\n        [\n            AssertionError,\n            \"error\",\n            \"observation space does not inherit from `gym.spaces.Space`, actual type: <class 'str'>\",\n        ],\n        # ===== Check box observation space ====\n        [\n            UserWarning,\n            spaces.Box(np.zeros((5, 5, 1)), 255 * np.ones((5, 5, 1)), dtype=np.int32),\n            \"It seems a Box observation space is an image but the `dtype` is not `np.uint8`, actual type: int32. If the Box observation space is not an image, we recommend flattening the observation to have only a 1D vector.\",\n        ],\n        [\n            UserWarning,\n            spaces.Box(np.ones((2, 2, 1)), 255 * np.ones((2, 2, 1)), dtype=np.uint8),\n            \"It seems a Box observation space is an image but the upper and lower bounds are not in [0, 255]. Generally, CNN policies assume observations are within that range, so you may encounter an issue if the observation values are not.\",\n        ],\n        [\n            UserWarning,\n            spaces.Box(np.zeros((5, 5, 1)), np.ones((5, 5, 1)), dtype=np.uint8),\n            \"It seems a Box observation space is an image but the upper and lower bounds are not in [0, 255]. 
Generally, CNN policies assume observations are within that range, so you may encounter an issue if the observation values are not.\",\n        ],\n        [\n            UserWarning,\n            spaces.Box(np.zeros((5, 5)), np.ones((5, 5))),\n            \"A Box observation space has an unconventional shape (neither an image, nor a 1D vector). We recommend flattening the observation to have only a 1D vector or use a custom policy to properly process the data. Actual observation shape: (5, 5)\",\n        ],\n        [\n            UserWarning,\n            spaces.Box(np.zeros(5), np.zeros(5)),\n            \"A Box observation space maximum and minimum values are equal.\",\n        ],\n        [\n            UserWarning,\n            spaces.Box(np.ones(5), np.zeros(5)),\n            \"A Box observation space low value is greater than a high value.\",\n        ],\n        [\n            AssertionError,\n            _modify_space(spaces.Box(np.zeros(2), np.ones(2)), \"low\", np.zeros(3)),\n            \"The Box observation space shape and low shape have different shapes, low shape: (3,), box shape: (2,)\",\n        ],\n        [\n            AssertionError,\n            _modify_space(spaces.Box(np.zeros(2), np.ones(2)), \"high\", np.ones(3)),\n            \"The Box observation space shape and high shape have have different shapes, high shape: (3,), box shape: (2,)\",\n        ],\n        # ==== Other observation spaces (Discrete, MultiDiscrete, MultiBinary, Tuple, Dict)\n        [\n            AssertionError,\n            _modify_space(spaces.Discrete(5), \"n\", -1),\n            \"Discrete observation space's number of elements must be positive, actual number of elements: -1\",\n        ],\n        [\n            AssertionError,\n            _modify_space(spaces.MultiDiscrete([2, 2]), \"nvec\", np.array([2, -1])),\n            \"Multi-discrete observation space's all nvec elements must be greater than 0, actual nvec: [ 2 -1]\",\n        ],\n        [\n            
AssertionError,\n            _modify_space(spaces.MultiDiscrete([2, 2]), \"_shape\", (2, 1, 2)),\n            \"Multi-discrete observation space's shape must be equal to the nvec shape, space shape: (2, 1, 2), nvec shape: (2,)\",\n        ],\n        [\n            AssertionError,\n            _modify_space(spaces.MultiBinary((2, 2)), \"_shape\", (2, -1)),\n            \"Multi-binary observation space's all shape elements must be greater than 0, actual shape: (2, -1)\",\n        ],\n        [\n            AssertionError,\n            spaces.Tuple([]),\n            \"An empty Tuple observation space is not allowed.\",\n        ],\n        [\n            AssertionError,\n            spaces.Dict(),\n            \"An empty Dict observation space is not allowed.\",\n        ],\n    ],\n)\ndef test_check_observation_space(test, space, message: str):\n    \"\"\"Tests the check observation space.\"\"\"\n    if test is UserWarning:\n        with pytest.warns(\n            UserWarning, match=f\"^\\\\x1b\\\\[33mWARN: {re.escape(message)}\\\\x1b\\\\[0m$\"\n        ):\n            check_observation_space(space)\n    else:\n        with warnings.catch_warnings(record=True) as caught_warnings:\n            with pytest.raises(test, match=f\"^{re.escape(message)}$\"):\n                check_observation_space(space)\n        assert len(caught_warnings) == 0\n\n\n@pytest.mark.parametrize(\n    \"test,space,message\",\n    [\n        [\n            AssertionError,\n            \"error\",\n            \"action space does not inherit from `gym.spaces.Space`, actual type: <class 'str'>\",\n        ],\n        # ===== Check box observation space ====\n        [\n            UserWarning,\n            spaces.Box(np.zeros(5), np.zeros(5)),\n            \"A Box action space maximum and minimum values are equal.\",\n        ],\n        [\n            UserWarning,\n            spaces.Box(np.ones(5), np.zeros(5)),\n            \"A Box action space low value is greater than a high value.\",\n     
   ],\n        [\n            AssertionError,\n            _modify_space(spaces.Box(np.zeros(2), np.ones(2)), \"low\", np.zeros(3)),\n            \"The Box action space shape and low shape have have different shapes, low shape: (3,), box shape: (2,)\",\n        ],\n        [\n            AssertionError,\n            _modify_space(spaces.Box(np.zeros(2), np.ones(2)), \"high\", np.ones(3)),\n            \"The Box action space shape and high shape have different shapes, high shape: (3,), box shape: (2,)\",\n        ],\n        # ==== Other observation spaces (Discrete, MultiDiscrete, MultiBinary, Tuple, Dict)\n        [\n            AssertionError,\n            _modify_space(spaces.Discrete(5), \"n\", -1),\n            \"Discrete action space's number of elements must be positive, actual number of elements: -1\",\n        ],\n        [\n            AssertionError,\n            _modify_space(spaces.MultiDiscrete([2, 2]), \"_shape\", (2, -1)),\n            \"Multi-discrete action space's shape must be equal to the nvec shape, space shape: (2, -1), nvec shape: (2,)\",\n        ],\n        [\n            AssertionError,\n            _modify_space(spaces.MultiBinary((2, 2)), \"_shape\", (2, -1)),\n            \"Multi-binary action space's all shape elements must be greater than 0, actual shape: (2, -1)\",\n        ],\n        [\n            AssertionError,\n            spaces.Tuple([]),\n            \"An empty Tuple action space is not allowed.\",\n        ],\n        [AssertionError, spaces.Dict(), \"An empty Dict action space is not allowed.\"],\n    ],\n)\ndef test_check_action_space(\n    test: Union[UserWarning, type], space: spaces.Space, message: str\n):\n    \"\"\"Tests the check action space function.\"\"\"\n    if test is UserWarning:\n        with pytest.warns(\n            UserWarning, match=f\"^\\\\x1b\\\\[33mWARN: {re.escape(message)}\\\\x1b\\\\[0m$\"\n        ):\n            check_action_space(space)\n    else:\n        with 
warnings.catch_warnings(record=True) as caught_warnings:\n            with pytest.raises(test, match=f\"^{re.escape(message)}$\"):\n                check_action_space(space)\n        assert len(caught_warnings) == 0\n\n\n@pytest.mark.parametrize(\n    \"test,obs,obs_space,message\",\n    [\n        [\n            UserWarning,\n            3,\n            spaces.Discrete(2),\n            \"The obs returned by the `testing()` method is not within the observation space.\",\n        ],\n        [\n            UserWarning,\n            np.uint8(0),\n            spaces.Discrete(1),\n            \"The obs returned by the `testing()` method should be an int or np.int64, actual type: <class 'numpy.uint8'>\",\n        ],\n        [\n            UserWarning,\n            [0, 1],\n            spaces.Tuple([spaces.Discrete(1), spaces.Discrete(2)]),\n            \"The obs returned by the `testing()` method was expecting a tuple, actual type: <class 'list'>\",\n        ],\n        [\n            AssertionError,\n            (1, 2, 3),\n            spaces.Tuple([spaces.Discrete(1), spaces.Discrete(2)]),\n            \"The obs returned by the `testing()` method length is not same as the observation space length, obs length: 3, space length: 2\",\n        ],\n        [\n            AssertionError,\n            {1, 2, 3},\n            spaces.Dict(a=spaces.Discrete(1), b=spaces.Discrete(2)),\n            \"The obs returned by the `testing()` method must be a dict, actual type: <class 'set'>\",\n        ],\n        [\n            AssertionError,\n            {\"a\": 1, \"c\": 2},\n            spaces.Dict(a=spaces.Discrete(1), b=spaces.Discrete(2)),\n            \"The obs returned by the `testing()` method observation keys is not same as the observation space keys, obs keys: ['a', 'c'], space keys: ['a', 'b']\",\n        ],\n    ],\n)\ndef test_check_obs(test, obs, obs_space: spaces.Space, message: str):\n    \"\"\"Tests the check observations function.\"\"\"\n    if test is 
UserWarning:\n        with pytest.warns(\n            UserWarning, match=f\"^\\\\x1b\\\\[33mWARN: {re.escape(message)}\\\\x1b\\\\[0m$\"\n        ):\n            check_obs(obs, obs_space, \"testing\")\n    else:\n        with warnings.catch_warnings(record=True) as caught_warnings:\n            with pytest.raises(test, match=f\"^{re.escape(message)}$\"):\n                check_obs(obs, obs_space, \"testing\")\n        assert len(caught_warnings) == 0\n\n\ndef _reset_no_seed(self, options=None):\n    return self.observation_space.sample(), {}\n\n\ndef _reset_seed_default(self, seed=\"error\", options=None):\n    return self.observation_space.sample(), {}\n\n\ndef _reset_no_option(self, seed=None):\n    return self.observation_space.sample(), {}\n\n\ndef _make_reset_results(results):\n    def _reset_result(self, seed=None, options=None):\n        return results\n\n    return _reset_result\n\n\n@pytest.mark.parametrize(\n    \"test,func,message,kwargs\",\n    [\n        [\n            UserWarning,\n            _reset_no_seed,\n            \"Future gym versions will require that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator.\",\n            {},\n        ],\n        [\n            UserWarning,\n            _reset_seed_default,\n            \"The default seed argument in `Env.reset` should be `None`, otherwise the environment will by default always be deterministic. 
Actual default: seed='error'\",\n            {},\n        ],\n        [\n            UserWarning,\n            _reset_no_option,\n            \"Future gym versions will require that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information.\",\n            {},\n        ],\n        [\n            UserWarning,\n            _make_reset_results([0, {}]),\n            \"The result returned by `env.reset()` was not a tuple of the form `(obs, info)`, where `obs` is a observation and `info` is a dictionary containing additional information. Actual type: `<class 'list'>`\",\n            {},\n        ],\n        [\n            AssertionError,\n            _make_reset_results((np.array([0], dtype=np.float32), {1, 2})),\n            \"The second element returned by `env.reset()` was not a dictionary, actual type: <class 'set'>\",\n            {},\n        ],\n    ],\n)\ndef test_passive_env_reset_checker(test, func: callable, message: str, kwargs: Dict):\n    \"\"\"Tests the passive env reset check\"\"\"\n    if test is UserWarning:\n        with pytest.warns(\n            UserWarning, match=f\"^\\\\x1b\\\\[33mWARN: {re.escape(message)}\\\\x1b\\\\[0m$\"\n        ):\n            env_reset_passive_checker(GenericTestEnv(reset_fn=func), **kwargs)\n    else:\n        with warnings.catch_warnings(record=True) as caught_warnings:\n            with pytest.raises(test, match=f\"^{re.escape(message)}$\"):\n                env_reset_passive_checker(GenericTestEnv(reset_fn=func), **kwargs)\n        assert len(caught_warnings) == 0\n\n\ndef _modified_step(\n    self, obs=None, reward=0, terminated=False, truncated=False, info=None\n):\n    if obs is None:\n        obs = self.observation_space.sample()\n    if info is None:\n        info = {}\n\n    if truncated is None:\n        return obs, reward, terminated, info\n    else:\n        return obs, reward, terminated, truncated, info\n\n\n@pytest.mark.parametrize(\n    
\"test,func,message\",\n    [\n        [\n            AssertionError,\n            lambda self, _: \"error\",\n            \"Expects step result to be a tuple, actual type: <class 'str'>\",\n        ],\n        [\n            UserWarning,\n            lambda self, _: _modified_step(self, terminated=\"error\", truncated=None),\n            \"Expects `done` signal to be a boolean, actual type: <class 'str'>\",\n        ],\n        [\n            UserWarning,\n            lambda self, _: _modified_step(self, terminated=\"error\", truncated=False),\n            \"Expects `terminated` signal to be a boolean, actual type: <class 'str'>\",\n        ],\n        [\n            UserWarning,\n            lambda self, _: _modified_step(self, truncated=\"error\"),\n            \"Expects `truncated` signal to be a boolean, actual type: <class 'str'>\",\n        ],\n        [\n            gym.error.Error,\n            lambda self, _: (1, 2, 3),\n            \"Expected `Env.step` to return a four or five element tuple, actual number of elements returned: 3.\",\n        ],\n        [\n            UserWarning,\n            lambda self, _: _modified_step(self, reward=\"error\"),\n            \"The reward returned by `step()` must be a float, int, np.integer or np.floating, actual type: <class 'str'>\",\n        ],\n        [\n            UserWarning,\n            lambda self, _: _modified_step(self, reward=np.nan),\n            \"The reward is a NaN value.\",\n        ],\n        [\n            UserWarning,\n            lambda self, _: _modified_step(self, reward=np.inf),\n            \"The reward is an inf value.\",\n        ],\n        [\n            AssertionError,\n            lambda self, _: _modified_step(self, info=\"error\"),\n            \"The `info` returned by `step()` must be a python dictionary, actual type: <class 'str'>\",\n        ],\n    ],\n)\ndef test_passive_env_step_checker(\n    test: Union[UserWarning, type], func: callable, message: str\n):\n    \"\"\"Tests 
the passive env step checker.\"\"\"\n    if test is UserWarning:\n        with pytest.warns(\n            UserWarning, match=f\"^\\\\x1b\\\\[33mWARN: {re.escape(message)}\\\\x1b\\\\[0m$\"\n        ):\n            env_step_passive_checker(GenericTestEnv(step_fn=func), 0)\n    else:\n        with warnings.catch_warnings(record=True) as caught_warnings:\n            with pytest.raises(test, match=f\"^{re.escape(message)}$\"):\n                env_step_passive_checker(GenericTestEnv(step_fn=func), 0)\n        assert len(caught_warnings) == 0, caught_warnings\n\n\n@pytest.mark.parametrize(\n    \"test,env,message\",\n    [\n        [\n            UserWarning,\n            GenericTestEnv(metadata={\"render_modes\": None}),\n            \"No render modes was declared in the environment (env.metadata['render_modes'] is None or not defined), you may have trouble when calling `.render()`.\",\n        ],\n        [\n            UserWarning,\n            GenericTestEnv(metadata={\"render_modes\": \"Testing mode\"}),\n            \"Expects the render_modes to be a sequence (i.e. 
list, tuple), actual type: <class 'str'>\",\n        ],\n        [\n            UserWarning,\n            GenericTestEnv(\n                metadata={\"render_modes\": [\"Testing mode\", 1], \"render_fps\": 1},\n            ),\n            \"Expects all render modes to be strings, actual types: [<class 'str'>, <class 'int'>]\",\n        ],\n        [\n            UserWarning,\n            GenericTestEnv(\n                metadata={\"render_modes\": [\"Testing mode\"], \"render_fps\": None},\n                render_mode=\"Testing mode\",\n                render_fn=lambda self: 0,\n            ),\n            \"No render fps was declared in the environment (env.metadata['render_fps'] is None or not defined), rendering may occur at inconsistent fps.\",\n        ],\n        [\n            UserWarning,\n            GenericTestEnv(\n                metadata={\"render_modes\": [\"Testing mode\"], \"render_fps\": \"fps\"}\n            ),\n            \"Expects the `env.metadata['render_fps']` to be an integer or a float, actual type: <class 'str'>\",\n        ],\n        [\n            AssertionError,\n            GenericTestEnv(\n                metadata={\"render_modes\": [], \"render_fps\": 30}, render_mode=\"Test\"\n            ),\n            \"With no render_modes, expects the Env.render_mode to be None, actual value: Test\",\n        ],\n        [\n            AssertionError,\n            GenericTestEnv(\n                metadata={\"render_modes\": [\"Testing mode\"], \"render_fps\": 30},\n                render_mode=\"Non mode\",\n            ),\n            \"The environment was initialized successfully however with an unsupported render mode. 
Render mode: Non mode, modes: ['Testing mode']\",\n        ],\n    ],\n)\ndef test_passive_render_checker(test, env: GenericTestEnv, message: str):\n    \"\"\"Tests the passive render checker.\"\"\"\n    if test is UserWarning:\n        with pytest.warns(\n            UserWarning, match=f\"^\\\\x1b\\\\[33mWARN: {re.escape(message)}\\\\x1b\\\\[0m$\"\n        ):\n            env_render_passive_checker(env)\n    else:\n        with warnings.catch_warnings(record=True) as caught_warnings:\n            with pytest.raises(test, match=f\"^{re.escape(message)}$\"):\n                env_render_passive_checker(env)\n        assert len(caught_warnings) == 0\n"
  },
  {
    "path": "tests/utils/test_play.py",
    "content": "from functools import partial\nfrom itertools import product\nfrom typing import Callable\n\nimport numpy as np\nimport pygame\nimport pytest\nfrom pygame import KEYDOWN, KEYUP, QUIT, event\nfrom pygame.event import Event\n\nimport gym\nfrom gym.utils.play import MissingKeysToAction, PlayableGame, play\nfrom tests.testing_env import GenericTestEnv\n\nRELEVANT_KEY_1 = ord(\"a\")  # 97\nRELEVANT_KEY_2 = ord(\"d\")  # 100\nIRRELEVANT_KEY = 1\n\n\nPlayableEnv = partial(\n    GenericTestEnv,\n    metadata={\"render_modes\": [\"rgb_array\"]},\n    render_fn=lambda self: np.ones((10, 10, 3)),\n)\n\n\nclass KeysToActionWrapper(gym.Wrapper):\n    def __init__(self, env, keys_to_action):\n        super().__init__(env)\n        self.keys_to_action = keys_to_action\n\n    def get_keys_to_action(self):\n        return self.keys_to_action\n\n\nclass PlayStatus:\n    def __init__(self, callback: Callable):\n        self.data_callback = callback\n        self.cumulative_reward = 0\n        self.last_observation = None\n\n    def callback(self, obs_t, obs_tp1, action, rew, terminated, truncated, info):\n        _, obs_tp1, _, rew, _, _, _ = self.data_callback(\n            obs_t, obs_tp1, action, rew, terminated, truncated, info\n        )\n        self.cumulative_reward += rew\n        self.last_observation = obs_tp1\n\n\ndef dummy_keys_to_action():\n    return {(RELEVANT_KEY_1,): 0, (RELEVANT_KEY_2,): 1}\n\n\ndef dummy_keys_to_action_str():\n    \"\"\"{'a': 0, 'd': 1}\"\"\"\n    return {chr(RELEVANT_KEY_1): 0, chr(RELEVANT_KEY_2): 1}\n\n\n@pytest.fixture(autouse=True)\ndef close_pygame():\n    yield\n    pygame.quit()\n\n\ndef test_play_relevant_keys():\n    env = PlayableEnv(render_mode=\"rgb_array\")\n    game = PlayableGame(env, dummy_keys_to_action())\n    assert game.relevant_keys == {RELEVANT_KEY_1, RELEVANT_KEY_2}\n\n\ndef test_play_relevant_keys_no_mapping():\n    env = PlayableEnv(render_mode=\"rgb_array\")\n\n    with 
pytest.raises(MissingKeysToAction):\n        PlayableGame(env)\n\n\ndef test_play_relevant_keys_with_env_attribute():\n    \"\"\"Env has a keys_to_action attribute\"\"\"\n    env = PlayableEnv(render_mode=\"rgb_array\")\n    env.get_keys_to_action = dummy_keys_to_action\n    game = PlayableGame(env)\n    assert game.relevant_keys == {RELEVANT_KEY_1, RELEVANT_KEY_2}\n\n\ndef test_video_size_no_zoom():\n    env = PlayableEnv(render_mode=\"rgb_array\")\n    game = PlayableGame(env, dummy_keys_to_action())\n    assert game.video_size == env.render().shape[:2]\n\n\ndef test_video_size_zoom():\n    env = PlayableEnv(render_mode=\"rgb_array\")\n    zoom = 2.2\n    game = PlayableGame(env, dummy_keys_to_action(), zoom)\n    assert game.video_size == tuple(int(dim * zoom) for dim in env.render().shape[:2])\n\n\ndef test_keyboard_quit_event():\n    env = PlayableEnv(render_mode=\"rgb_array\")\n    game = PlayableGame(env, dummy_keys_to_action())\n    event = Event(pygame.KEYDOWN, {\"key\": pygame.K_ESCAPE})\n    assert game.running is True\n    game.process_event(event)\n    assert game.running is False\n\n\ndef test_pygame_quit_event():\n    env = PlayableEnv(render_mode=\"rgb_array\")\n    game = PlayableGame(env, dummy_keys_to_action())\n    event = Event(pygame.QUIT)\n    assert game.running is True\n    game.process_event(event)\n    assert game.running is False\n\n\ndef test_keyboard_relevant_keydown_event():\n    env = PlayableEnv(render_mode=\"rgb_array\")\n    game = PlayableGame(env, dummy_keys_to_action())\n    event = Event(pygame.KEYDOWN, {\"key\": RELEVANT_KEY_1})\n    game.process_event(event)\n    assert game.pressed_keys == [RELEVANT_KEY_1]\n\n\ndef test_keyboard_irrelevant_keydown_event():\n    env = PlayableEnv(render_mode=\"rgb_array\")\n    game = PlayableGame(env, dummy_keys_to_action())\n    event = Event(pygame.KEYDOWN, {\"key\": IRRELEVANT_KEY})\n    game.process_event(event)\n    assert game.pressed_keys == []\n\n\ndef test_keyboard_keyup_event():\n 
   env = PlayableEnv(render_mode=\"rgb_array\")\n    game = PlayableGame(env, dummy_keys_to_action())\n    event = Event(pygame.KEYDOWN, {\"key\": RELEVANT_KEY_1})\n    game.process_event(event)\n    event = Event(pygame.KEYUP, {\"key\": RELEVANT_KEY_1})\n    game.process_event(event)\n    assert game.pressed_keys == []\n\n\ndef test_play_loop_real_env():\n    SEED = 42\n    ENV = \"CartPole-v1\"\n\n    # If apply_wrapper is true, we provide keys_to_action through the environment. If str_keys is true, the\n    # keys_to_action dictionary will have strings as keys\n    for apply_wrapper, str_keys in product([False, True], [False, True]):\n        # set of key events to inject into the play loop as callback\n        callback_events = [\n            Event(KEYDOWN, {\"key\": RELEVANT_KEY_1}),\n            Event(KEYUP, {\"key\": RELEVANT_KEY_1}),\n            Event(KEYDOWN, {\"key\": RELEVANT_KEY_2}),\n            Event(KEYUP, {\"key\": RELEVANT_KEY_2}),\n            Event(KEYDOWN, {\"key\": RELEVANT_KEY_1}),\n            Event(KEYUP, {\"key\": RELEVANT_KEY_1}),\n            Event(KEYDOWN, {\"key\": RELEVANT_KEY_1}),\n            Event(KEYUP, {\"key\": RELEVANT_KEY_1}),\n            Event(KEYDOWN, {\"key\": RELEVANT_KEY_2}),\n            Event(KEYUP, {\"key\": RELEVANT_KEY_2}),\n            Event(QUIT),\n        ]\n        keydown_events = [k for k in callback_events if k.type == KEYDOWN]\n\n        def callback(obs_t, obs_tp1, action, rew, terminated, truncated, info):\n            pygame_event = callback_events.pop(0)\n            event.post(pygame_event)\n\n            # after releasing a key, post new events until\n            # we have one keydown\n            while pygame_event.type == KEYUP:\n                pygame_event = callback_events.pop(0)\n                event.post(pygame_event)\n\n            return obs_t, obs_tp1, action, rew, terminated, truncated, info\n\n        env = gym.make(ENV, render_mode=\"rgb_array\", disable_env_checker=True)\n        
env.reset(seed=SEED)\n        keys_to_action = (\n            dummy_keys_to_action_str() if str_keys else dummy_keys_to_action()\n        )\n\n        # first action is 0 because at the first iteration\n        # we cannot inject a callback event into play()\n        obs, _, _, _, _ = env.step(0)\n        for e in keydown_events:\n            action = keys_to_action[chr(e.key) if str_keys else (e.key,)]\n            obs, _, _, _, _ = env.step(action)\n\n        env_play = gym.make(ENV, render_mode=\"rgb_array\", disable_env_checker=True)\n        if apply_wrapper:\n            env_play = KeysToActionWrapper(env_play, keys_to_action=keys_to_action)\n            assert hasattr(env_play, \"get_keys_to_action\")\n\n        status = PlayStatus(callback)\n        play(\n            env_play,\n            callback=status.callback,\n            keys_to_action=None if apply_wrapper else keys_to_action,\n            seed=SEED,\n        )\n\n        assert (status.last_observation == obs).all()\n\n\ndef test_play_no_keys():\n    with pytest.raises(MissingKeysToAction):\n        play(gym.make(\"CartPole-v1\"))\n"
  },
  {
    "path": "tests/utils/test_save_video.py",
    "content": "import os\nimport shutil\n\nimport numpy as np\n\nimport gym\nfrom gym.utils.save_video import capped_cubic_video_schedule, save_video\n\n\ndef test_record_video_using_default_trigger():\n    env = gym.make(\n        \"CartPole-v1\", render_mode=\"rgb_array_list\", disable_env_checker=True\n    )\n\n    env.reset()\n    step_starting_index = 0\n    episode_index = 0\n    for step_index in range(199):\n        action = env.action_space.sample()\n        _, _, terminated, truncated, _ = env.step(action)\n        if terminated or truncated:\n            save_video(\n                env.render(),\n                \"videos\",\n                fps=env.metadata[\"render_fps\"],\n                step_starting_index=step_starting_index,\n                episode_index=episode_index,\n            )\n            step_starting_index = step_index + 1\n            episode_index += 1\n            env.reset()\n\n    env.close()\n    assert os.path.isdir(\"videos\")\n    mp4_files = [file for file in os.listdir(\"videos\") if file.endswith(\".mp4\")]\n    shutil.rmtree(\"videos\")\n    assert len(mp4_files) == sum(\n        capped_cubic_video_schedule(i) for i in range(episode_index)\n    )\n\n\ndef modulo_step_trigger(mod: int):\n    def step_trigger(step_index):\n        return step_index % mod == 0\n\n    return step_trigger\n\n\ndef test_record_video_step_trigger():\n    env = gym.make(\"CartPole-v1\", render_mode=\"rgb_array_list\")\n    env._max_episode_steps = 20\n\n    env.reset()\n    step_starting_index = 0\n    episode_index = 0\n    for step_index in range(199):\n        action = env.action_space.sample()\n        _, _, terminated, truncated, _ = env.step(action)\n        if terminated or truncated:\n            save_video(\n                env.render(),\n                \"videos\",\n                fps=env.metadata[\"render_fps\"],\n                step_trigger=modulo_step_trigger(100),\n                step_starting_index=step_starting_index,\n          
      episode_index=episode_index,\n            )\n            step_starting_index = step_index + 1\n            episode_index += 1\n            env.reset()\n    env.close()\n\n    assert os.path.isdir(\"videos\")\n    mp4_files = [file for file in os.listdir(\"videos\") if file.endswith(\".mp4\")]\n    shutil.rmtree(\"videos\")\n    assert len(mp4_files) == 2\n\n\ndef test_record_video_within_vector():\n    step_trigger = modulo_step_trigger(100)\n    n_steps = 199\n    expected_video = 2\n\n    envs = gym.vector.make(\n        \"CartPole-v1\", num_envs=2, asynchronous=True, render_mode=\"rgb_array_list\"\n    )\n    envs.reset()\n    episode_frames = []\n    step_starting_index = 0\n    episode_index = 0\n    for step_index in range(n_steps):\n        _, _, terminated, truncated, _ = envs.step(envs.action_space.sample())\n        episode_frames.extend(envs.call(\"render\")[0])\n\n        if np.any(np.logical_or(terminated, truncated)):\n            save_video(\n                episode_frames,\n                \"videos\",\n                fps=envs.metadata[\"render_fps\"],\n                step_trigger=step_trigger,\n                step_starting_index=step_starting_index,\n                episode_index=episode_index,\n            )\n            episode_frames = []\n            step_starting_index = step_index + 1\n            episode_index += 1\n\n            # TODO: fix this test (see https://github.com/openai/gym/issues/3054)\n            if step_trigger(step_index):\n                expected_video -= 1\n\n    envs.close()\n\n    assert os.path.isdir(\"videos\")\n    mp4_files = [file for file in os.listdir(\"videos\") if file.endswith(\".mp4\")]\n    shutil.rmtree(\"videos\")\n    assert len(mp4_files) == expected_video\n"
  },
  {
    "path": "tests/utils/test_seeding.py",
    "content": "import pickle\n\nfrom gym import error\nfrom gym.utils import seeding\n\n\ndef test_invalid_seeds():\n    for seed in [-1, \"test\"]:\n        try:\n            seeding.np_random(seed)\n        except error.Error:\n            pass\n        else:\n            assert False, f\"Invalid seed {seed} passed validation\"\n\n\ndef test_valid_seeds():\n    for seed in [0, 1]:\n        random, seed1 = seeding.np_random(seed)\n        assert seed == seed1\n\n\ndef test_rng_pickle():\n    rng, _ = seeding.np_random(seed=0)\n    pickled = pickle.dumps(rng)\n    rng2 = pickle.loads(pickled)\n    assert isinstance(\n        rng2, seeding.RandomNumberGenerator\n    ), \"Unpickled object is not a RandomNumberGenerator\"\n    assert rng.random() == rng2.random()\n"
  },
  {
    "path": "tests/utils/test_step_api_compatibility.py",
    "content": "import numpy as np\nimport pytest\n\nfrom gym.utils.env_checker import data_equivalence\nfrom gym.utils.step_api_compatibility import (\n    convert_to_done_step_api,\n    convert_to_terminated_truncated_step_api,\n)\n\n\n@pytest.mark.parametrize(\n    \"is_vector_env, done_returns, expected_terminated, expected_truncated\",\n    (\n        # Test each of the permutations for single environments with and without the old info\n        (False, (0, 0, False, {\"Test-info\": True}), False, False),\n        (False, (0, 0, False, {\"TimeLimit.truncated\": False}), False, False),\n        (False, (0, 0, True, {}), True, False),\n        (False, (0, 0, True, {\"TimeLimit.truncated\": True}), False, True),\n        (False, (0, 0, True, {\"Test-info\": True}), True, False),\n        # Test vectorise versions with both list and dict infos testing each permutation for sub-environments\n        (\n            True,\n            (\n                0,\n                0,\n                np.array([False, True, True]),\n                [{}, {}, {\"TimeLimit.truncated\": True}],\n            ),\n            np.array([False, True, False]),\n            np.array([False, False, True]),\n        ),\n        (\n            True,\n            (\n                0,\n                0,\n                np.array([False, True, True]),\n                {\"TimeLimit.truncated\": np.array([False, False, True])},\n            ),\n            np.array([False, True, False]),\n            np.array([False, False, True]),\n        ),\n        # empty truncated info\n        (\n            True,\n            (\n                0,\n                0,\n                np.array([False, True]),\n                {},\n            ),\n            np.array([False, True]),\n            np.array([False, False]),\n        ),\n    ),\n)\ndef test_to_done_step_api(\n    is_vector_env, done_returns, expected_terminated, expected_truncated\n):\n    _, _, terminated, truncated, info = 
convert_to_terminated_truncated_step_api(\n        done_returns, is_vector_env=is_vector_env\n    )\n    assert np.all(terminated == expected_terminated)\n    assert np.all(truncated == expected_truncated)\n\n    if is_vector_env is False:\n        assert \"TimeLimit.truncated\" not in info\n    elif isinstance(info, list):\n        assert all(\"TimeLimit.truncated\" not in sub_info for sub_info in info)\n    else:  # isinstance(info, dict)\n        assert \"TimeLimit.truncated\" not in info\n\n    roundtripped_returns = convert_to_done_step_api(\n        (0, 0, terminated, truncated, info), is_vector_env=is_vector_env\n    )\n    assert data_equivalence(done_returns, roundtripped_returns)\n\n\n@pytest.mark.parametrize(\n    \"is_vector_env, terminated_truncated_returns, expected_done, expected_truncated\",\n    (\n        (False, (0, 0, False, False, {\"Test-info\": True}), False, False),\n        (False, (0, 0, True, False, {}), True, False),\n        (False, (0, 0, False, True, {}), True, True),\n        # (False, (), True, True),  # Not possible to encode in the old step api\n        # Test vector dict info\n        (\n            True,\n            (0, 0, np.array([False, True, False]), np.array([False, False, True]), {}),\n            np.array([False, True, True]),\n            np.array([False, False, True]),\n        ),\n        # Test vector dict info with no truncation\n        (\n            True,\n            (0, 0, np.array([False, True]), np.array([False, False]), {}),\n            np.array([False, True]),\n            np.array([False, False]),\n        ),\n        # Test vector list info\n        (\n            True,\n            (\n                0,\n                0,\n                np.array([False, True, False]),\n                np.array([False, False, True]),\n                [{\"Test-Info\": True}, {}, {}],\n            ),\n            np.array([False, True, True]),\n            np.array([False, False, True]),\n        ),\n    ),\n)\ndef 
test_to_terminated_truncated_step_api(\n    is_vector_env, terminated_truncated_returns, expected_done, expected_truncated\n):\n    _, _, done, info = convert_to_done_step_api(\n        terminated_truncated_returns, is_vector_env=is_vector_env\n    )\n    assert np.all(done == expected_done)\n\n    if is_vector_env is False:\n        if expected_done:\n            assert info[\"TimeLimit.truncated\"] == expected_truncated\n        else:\n            assert \"TimeLimit.truncated\" not in info\n    elif isinstance(info, list):\n        for sub_info, env_done, env_truncated in zip(\n            info, expected_done, expected_truncated\n        ):\n            if env_done:\n                assert sub_info[\"TimeLimit.truncated\"] == env_truncated\n            else:\n                assert \"TimeLimit.truncated\" not in sub_info\n    else:  # isinstance(info, dict)\n        if np.any(expected_done):\n            assert np.all(info[\"TimeLimit.truncated\"] == expected_truncated)\n        else:\n            assert \"TimeLimit.truncated\" not in info\n\n    roundtripped_returns = convert_to_terminated_truncated_step_api(\n        (0, 0, done, info), is_vector_env=is_vector_env\n    )\n    assert data_equivalence(terminated_truncated_returns, roundtripped_returns)\n\n\ndef test_edge_case():\n    # Converting between the two step APIs cannot round-trip in exactly one case:\n    #   terminated=True and truncated=True -> done=True and info={\"TimeLimit.truncated\": False}\n    # We cannot test this in test_to_terminated_truncated_step_api as the roundtripping test will fail\n    _, _, done, info = convert_to_done_step_api((0, 0, True, True, {}))\n    assert done is True\n    assert info == {\"TimeLimit.truncated\": False}\n\n    # Test with vector dict info\n    _, _, done, info = convert_to_done_step_api(\n        (0, 0, np.array([True]), np.array([True]), {}), is_vector_env=True\n    )\n    assert np.all(done)\n    assert info == {\"TimeLimit.truncated\": np.array([False])}\n\n    # Test with vector 
list info\n    _, _, done, info = convert_to_done_step_api(\n        (0, 0, np.array([True]), np.array([True]), [{\"Test-Info\": True}]),\n        is_vector_env=True,\n    )\n    assert np.all(done)\n    assert info == [{\"Test-Info\": True, \"TimeLimit.truncated\": False}]\n"
  },
  {
    "path": "tests/vector/__init__.py",
    "content": ""
  },
  {
    "path": "tests/vector/test_async_vector_env.py",
    "content": "import re\nfrom multiprocessing import TimeoutError\n\nimport numpy as np\nimport pytest\n\nfrom gym.error import AlreadyPendingCallError, ClosedEnvironmentError, NoAsyncCallError\nfrom gym.spaces import Box, Discrete, MultiDiscrete, Tuple\nfrom gym.vector.async_vector_env import AsyncVectorEnv\nfrom tests.vector.utils import (\n    CustomSpace,\n    make_custom_space_env,\n    make_env,\n    make_slow_env,\n)\n\n\n@pytest.mark.parametrize(\"shared_memory\", [True, False])\ndef test_create_async_vector_env(shared_memory):\n    env_fns = [make_env(\"CartPole-v1\", i) for i in range(8)]\n\n    env = AsyncVectorEnv(env_fns, shared_memory=shared_memory)\n    assert env.num_envs == 8\n    env.close()\n\n\n@pytest.mark.parametrize(\"shared_memory\", [True, False])\ndef test_reset_async_vector_env(shared_memory):\n    env_fns = [make_env(\"CartPole-v1\", i) for i in range(8)]\n\n    env = AsyncVectorEnv(env_fns, shared_memory=shared_memory)\n    observations, infos = env.reset()\n\n    env.close()\n\n    assert isinstance(env.observation_space, Box)\n    assert isinstance(observations, np.ndarray)\n    assert observations.dtype == env.observation_space.dtype\n    assert observations.shape == (8,) + env.single_observation_space.shape\n    assert observations.shape == env.observation_space.shape\n\n    try:\n        env = AsyncVectorEnv(env_fns, shared_memory=shared_memory)\n        observations, infos = env.reset()\n    finally:\n        env.close()\n\n    assert isinstance(env.observation_space, Box)\n    assert isinstance(observations, np.ndarray)\n    assert observations.dtype == env.observation_space.dtype\n    assert observations.shape == (8,) + env.single_observation_space.shape\n    assert observations.shape == env.observation_space.shape\n    assert isinstance(infos, dict)\n    assert all([isinstance(info, dict) for info in infos])\n\n\n@pytest.mark.parametrize(\"shared_memory\", [True, False])\n@pytest.mark.parametrize(\"use_single_action_space\", 
[True, False])\ndef test_step_async_vector_env(shared_memory, use_single_action_space):\n    env_fns = [make_env(\"CartPole-v1\", i) for i in range(8)]\n\n    env = AsyncVectorEnv(env_fns, shared_memory=shared_memory)\n    observations = env.reset()\n\n    assert isinstance(env.single_action_space, Discrete)\n    assert isinstance(env.action_space, MultiDiscrete)\n\n    if use_single_action_space:\n        actions = [env.single_action_space.sample() for _ in range(8)]\n    else:\n        actions = env.action_space.sample()\n    observations, rewards, terminateds, truncateds, _ = env.step(actions)\n\n    env.close()\n\n    assert isinstance(env.observation_space, Box)\n    assert isinstance(observations, np.ndarray)\n    assert observations.dtype == env.observation_space.dtype\n    assert observations.shape == (8,) + env.single_observation_space.shape\n    assert observations.shape == env.observation_space.shape\n\n    assert isinstance(rewards, np.ndarray)\n    assert isinstance(rewards[0], (float, np.floating))\n    assert rewards.ndim == 1\n    assert rewards.size == 8\n\n    assert isinstance(terminateds, np.ndarray)\n    assert terminateds.dtype == np.bool_\n    assert terminateds.ndim == 1\n    assert terminateds.size == 8\n\n    assert isinstance(truncateds, np.ndarray)\n    assert truncateds.dtype == np.bool_\n    assert truncateds.ndim == 1\n    assert truncateds.size == 8\n\n\n@pytest.mark.parametrize(\"shared_memory\", [True, False])\ndef test_call_async_vector_env(shared_memory):\n    env_fns = [\n        make_env(\"CartPole-v1\", i, render_mode=\"rgb_array_list\") for i in range(4)\n    ]\n\n    env = AsyncVectorEnv(env_fns, shared_memory=shared_memory)\n    _ = env.reset()\n    images = env.call(\"render\")\n    gravity = env.call(\"gravity\")\n\n    env.close()\n\n    assert isinstance(images, tuple)\n    assert len(images) == 4\n    for i in range(4):\n        assert len(images[i]) == 1\n        assert isinstance(images[i][0], np.ndarray)\n\n    
assert isinstance(gravity, tuple)\n    assert len(gravity) == 4\n    for i in range(4):\n        assert isinstance(gravity[i], float)\n        assert gravity[i] == 9.8\n\n\n@pytest.mark.parametrize(\"shared_memory\", [True, False])\ndef test_set_attr_async_vector_env(shared_memory):\n    env_fns = [make_env(\"CartPole-v1\", i) for i in range(4)]\n\n    env = AsyncVectorEnv(env_fns, shared_memory=shared_memory)\n    env.set_attr(\"gravity\", [9.81, 3.72, 8.87, 1.62])\n    gravity = env.get_attr(\"gravity\")\n    assert gravity == (9.81, 3.72, 8.87, 1.62)\n\n    env.close()\n\n\n@pytest.mark.parametrize(\"shared_memory\", [True, False])\ndef test_copy_async_vector_env(shared_memory):\n    env_fns = [make_env(\"CartPole-v1\", i) for i in range(8)]\n\n    # TODO, these tests do nothing, understand the purpose of the tests and fix them\n    env = AsyncVectorEnv(env_fns, shared_memory=shared_memory, copy=True)\n    observations, infos = env.reset()\n    observations[0] = 0\n\n    env.close()\n\n\n@pytest.mark.parametrize(\"shared_memory\", [True, False])\ndef test_no_copy_async_vector_env(shared_memory):\n    env_fns = [make_env(\"CartPole-v1\", i) for i in range(8)]\n\n    # TODO, these tests do nothing, understand the purpose of the tests and fix them\n    env = AsyncVectorEnv(env_fns, shared_memory=shared_memory, copy=False)\n    observations, infos = env.reset()\n    observations[0] = 0\n\n    env.close()\n\n\n@pytest.mark.parametrize(\"shared_memory\", [True, False])\ndef test_reset_timeout_async_vector_env(shared_memory):\n    env_fns = [make_slow_env(0.3, i) for i in range(4)]\n\n    env = AsyncVectorEnv(env_fns, shared_memory=shared_memory)\n    with pytest.raises(TimeoutError):\n        env.reset_async()\n        env.reset_wait(timeout=0.1)\n\n    env.close(terminate=True)\n\n\n@pytest.mark.parametrize(\"shared_memory\", [True, False])\ndef test_step_timeout_async_vector_env(shared_memory):\n    env_fns = [make_slow_env(0.0, i) for i in range(4)]\n\n    env = 
AsyncVectorEnv(env_fns, shared_memory=shared_memory)\n    with pytest.raises(TimeoutError):\n        env.reset()\n        env.step_async(np.array([0.1, 0.1, 0.3, 0.1]))\n        observations, rewards, terminateds, truncateds, _ = env.step_wait(timeout=0.1)\n    env.close(terminate=True)\n\n\n@pytest.mark.parametrize(\"shared_memory\", [True, False])\ndef test_reset_out_of_order_async_vector_env(shared_memory):\n    env_fns = [make_env(\"CartPole-v1\", i) for i in range(4)]\n\n    env = AsyncVectorEnv(env_fns, shared_memory=shared_memory)\n    with pytest.raises(\n        NoAsyncCallError,\n        match=re.escape(\n            \"Calling `reset_wait` without any prior call to `reset_async`.\"\n        ),\n    ):\n        env.reset_wait()\n\n    env.close(terminate=True)\n\n    env = AsyncVectorEnv(env_fns, shared_memory=shared_memory)\n    with pytest.raises(\n        AlreadyPendingCallError,\n        match=re.escape(\n            \"Calling `reset_async` while waiting for a pending call to `step` to complete\"\n        ),\n    ):\n        actions = env.action_space.sample()\n        env.reset()\n        env.step_async(actions)\n        env.reset_async()\n\n    with pytest.warns(\n        UserWarning,\n        match=re.escape(\n            \"Calling `close` while waiting for a pending call to `step` to complete.\"\n        ),\n    ):\n        env.close(terminate=True)\n\n\n@pytest.mark.parametrize(\"shared_memory\", [True, False])\ndef test_step_out_of_order_async_vector_env(shared_memory):\n    env_fns = [make_env(\"CartPole-v1\", i) for i in range(4)]\n\n    env = AsyncVectorEnv(env_fns, shared_memory=shared_memory)\n    with pytest.raises(\n        NoAsyncCallError,\n        match=re.escape(\"Calling `step_wait` without any prior call to `step_async`.\"),\n    ):\n        env.action_space.sample()\n        env.reset()\n        env.step_wait()\n\n    env.close(terminate=True)\n\n    env = AsyncVectorEnv(env_fns, shared_memory=shared_memory)\n    with 
pytest.raises(\n        AlreadyPendingCallError,\n        match=re.escape(\n            \"Calling `step_async` while waiting for a pending call to `reset` to complete\"\n        ),\n    ):\n        actions = env.action_space.sample()\n        env.reset_async()\n        env.step_async(actions)\n\n    with pytest.warns(\n        UserWarning,\n        match=re.escape(\n            \"Calling `close` while waiting for a pending call to `reset` to complete.\"\n        ),\n    ):\n        env.close(terminate=True)\n\n\n@pytest.mark.parametrize(\"shared_memory\", [True, False])\ndef test_already_closed_async_vector_env(shared_memory):\n    env_fns = [make_env(\"CartPole-v1\", i) for i in range(4)]\n    with pytest.raises(ClosedEnvironmentError):\n        env = AsyncVectorEnv(env_fns, shared_memory=shared_memory)\n        env.close()\n        env.reset()\n\n\n@pytest.mark.parametrize(\"shared_memory\", [True, False])\ndef test_check_spaces_async_vector_env(shared_memory):\n    # CartPole-v1 - observation_space: Box(4,), action_space: Discrete(2)\n    env_fns = [make_env(\"CartPole-v1\", i) for i in range(8)]\n    # FrozenLake-v1 - Discrete(16), action_space: Discrete(4)\n    env_fns[1] = make_env(\"FrozenLake-v1\", 1)\n    with pytest.raises(RuntimeError):\n        env = AsyncVectorEnv(env_fns, shared_memory=shared_memory)\n        env.close(terminate=True)\n\n\ndef test_custom_space_async_vector_env():\n    env_fns = [make_custom_space_env(i) for i in range(4)]\n\n    env = AsyncVectorEnv(env_fns, shared_memory=False)\n    reset_observations, reset_infos = env.reset()\n\n    assert isinstance(env.single_action_space, CustomSpace)\n    assert isinstance(env.action_space, Tuple)\n\n    actions = (\"action-2\", \"action-3\", \"action-5\", \"action-7\")\n    step_observations, rewards, terminateds, truncateds, _ = env.step(actions)\n\n    env.close()\n\n    assert isinstance(env.single_observation_space, CustomSpace)\n    assert isinstance(env.observation_space, Tuple)\n\n    
assert isinstance(reset_observations, tuple)\n    assert reset_observations == (\"reset\", \"reset\", \"reset\", \"reset\")\n\n    assert isinstance(step_observations, tuple)\n    assert step_observations == (\n        \"step(action-2)\",\n        \"step(action-3)\",\n        \"step(action-5)\",\n        \"step(action-7)\",\n    )\n\n\ndef test_custom_space_async_vector_env_shared_memory():\n    env_fns = [make_custom_space_env(i) for i in range(4)]\n    with pytest.raises(ValueError):\n        env = AsyncVectorEnv(env_fns, shared_memory=True)\n        env.close(terminate=True)\n"
  },
  {
    "path": "tests/vector/test_numpy_utils.py",
    "content": "from collections import OrderedDict\n\nimport numpy as np\nimport pytest\n\nfrom gym.spaces import Dict, Tuple\nfrom gym.vector.utils.numpy_utils import concatenate, create_empty_array\nfrom gym.vector.utils.spaces import BaseGymSpaces\nfrom tests.vector.utils import spaces\n\n\n@pytest.mark.parametrize(\n    \"space\", spaces, ids=[space.__class__.__name__ for space in spaces]\n)\ndef test_concatenate(space):\n    def assert_type(lhs, rhs, n):\n        # Special case: if rhs is a list of scalars, lhs must be an np.ndarray\n        if np.isscalar(rhs[0]):\n            assert isinstance(lhs, np.ndarray)\n            assert all([np.isscalar(rhs[i]) for i in range(n)])\n        else:\n            assert all([isinstance(rhs[i], type(lhs)) for i in range(n)])\n\n    def assert_nested_equal(lhs, rhs, n):\n        assert isinstance(rhs, list)\n        assert (n > 0) and (len(rhs) == n)\n        assert_type(lhs, rhs, n)\n        if isinstance(lhs, np.ndarray):\n            assert lhs.shape[0] == n\n            for i in range(n):\n                assert np.all(lhs[i] == rhs[i])\n\n        elif isinstance(lhs, tuple):\n            for i in range(len(lhs)):\n                rhs_T_i = [rhs[j][i] for j in range(n)]\n                assert_nested_equal(lhs[i], rhs_T_i, n)\n\n        elif isinstance(lhs, OrderedDict):\n            for key in lhs.keys():\n                rhs_T_key = [rhs[j][key] for j in range(n)]\n                assert_nested_equal(lhs[key], rhs_T_key, n)\n\n        else:\n            raise TypeError(f\"Got unknown type `{type(lhs)}`.\")\n\n    samples = [space.sample() for _ in range(8)]\n    array = create_empty_array(space, n=8)\n    concatenated = concatenate(space, samples, array)\n\n    assert np.all(concatenated == array)\n    assert_nested_equal(array, samples, n=8)\n\n\n@pytest.mark.parametrize(\"n\", [1, 8])\n@pytest.mark.parametrize(\n    \"space\", spaces, ids=[space.__class__.__name__ for space in spaces]\n)\ndef 
test_create_empty_array(space, n):\n    def assert_nested_type(arr, space, n):\n        if isinstance(space, BaseGymSpaces):\n            assert isinstance(arr, np.ndarray)\n            assert arr.dtype == space.dtype\n            assert arr.shape == (n,) + space.shape\n\n        elif isinstance(space, Tuple):\n            assert isinstance(arr, tuple)\n            assert len(arr) == len(space.spaces)\n            for i in range(len(arr)):\n                assert_nested_type(arr[i], space.spaces[i], n)\n\n        elif isinstance(space, Dict):\n            assert isinstance(arr, OrderedDict)\n            assert set(arr.keys()) ^ set(space.spaces.keys()) == set()\n            for key in arr.keys():\n                assert_nested_type(arr[key], space.spaces[key], n)\n\n        else:\n            raise TypeError(f\"Got unknown type `{type(arr)}`.\")\n\n    array = create_empty_array(space, n=n, fn=np.empty)\n    assert_nested_type(array, space, n=n)\n\n\n@pytest.mark.parametrize(\"n\", [1, 8])\n@pytest.mark.parametrize(\n    \"space\", spaces, ids=[space.__class__.__name__ for space in spaces]\n)\ndef test_create_empty_array_zeros(space, n):\n    def assert_nested_type(arr, space, n):\n        if isinstance(space, BaseGymSpaces):\n            assert isinstance(arr, np.ndarray)\n            assert arr.dtype == space.dtype\n            assert arr.shape == (n,) + space.shape\n            assert np.all(arr == 0)\n\n        elif isinstance(space, Tuple):\n            assert isinstance(arr, tuple)\n            assert len(arr) == len(space.spaces)\n            for i in range(len(arr)):\n                assert_nested_type(arr[i], space.spaces[i], n)\n\n        elif isinstance(space, Dict):\n            assert isinstance(arr, OrderedDict)\n            assert set(arr.keys()) ^ set(space.spaces.keys()) == set()\n            for key in arr.keys():\n                assert_nested_type(arr[key], space.spaces[key], n)\n\n        else:\n            raise TypeError(f\"Got unknown type 
`{type(arr)}`.\")\n\n    array = create_empty_array(space, n=n, fn=np.zeros)\n    assert_nested_type(array, space, n=n)\n\n\n@pytest.mark.parametrize(\n    \"space\", spaces, ids=[space.__class__.__name__ for space in spaces]\n)\ndef test_create_empty_array_none_shape_ones(space):\n    def assert_nested_type(arr, space):\n        if isinstance(space, BaseGymSpaces):\n            assert isinstance(arr, np.ndarray)\n            assert arr.dtype == space.dtype\n            assert arr.shape == space.shape\n            assert np.all(arr == 1)\n\n        elif isinstance(space, Tuple):\n            assert isinstance(arr, tuple)\n            assert len(arr) == len(space.spaces)\n            for i in range(len(arr)):\n                assert_nested_type(arr[i], space.spaces[i])\n\n        elif isinstance(space, Dict):\n            assert isinstance(arr, OrderedDict)\n            assert set(arr.keys()) ^ set(space.spaces.keys()) == set()\n            for key in arr.keys():\n                assert_nested_type(arr[key], space.spaces[key])\n\n        else:\n            raise TypeError(f\"Got unknown type `{type(arr)}`.\")\n\n    array = create_empty_array(space, n=None, fn=np.ones)\n    assert_nested_type(array, space)\n"
  },
  {
    "path": "tests/vector/test_shared_memory.py",
    "content": "import multiprocessing as mp\nfrom collections import OrderedDict\nfrom multiprocessing import Array, Process\nfrom multiprocessing.sharedctypes import SynchronizedArray\n\nimport numpy as np\nimport pytest\n\nfrom gym.error import CustomSpaceError\nfrom gym.spaces import Dict, Tuple\nfrom gym.vector.utils.shared_memory import (\n    create_shared_memory,\n    read_from_shared_memory,\n    write_to_shared_memory,\n)\nfrom gym.vector.utils.spaces import BaseGymSpaces\nfrom tests.vector.utils import custom_spaces, spaces\n\nexpected_types = [\n    Array(\"d\", 1),\n    Array(\"f\", 1),\n    Array(\"f\", 3),\n    Array(\"f\", 4),\n    Array(\"B\", 1),\n    Array(\"B\", 32 * 32 * 3),\n    Array(\"i\", 1),\n    Array(\"i\", 1),\n    (Array(\"i\", 1), Array(\"i\", 1)),\n    (Array(\"i\", 1), Array(\"f\", 2)),\n    Array(\"B\", 3),\n    Array(\"B\", 19),\n    OrderedDict([(\"position\", Array(\"i\", 1)), (\"velocity\", Array(\"f\", 1))]),\n    OrderedDict(\n        [\n            (\"position\", OrderedDict([(\"x\", Array(\"i\", 1)), (\"y\", Array(\"i\", 1))])),\n            (\"velocity\", (Array(\"i\", 1), Array(\"B\", 1))),\n        ]\n    ),\n]\n\n\n@pytest.mark.parametrize(\"n\", [1, 8])\n@pytest.mark.parametrize(\n    \"space,expected_type\",\n    list(zip(spaces, expected_types)),\n    ids=[space.__class__.__name__ for space in spaces],\n)\n@pytest.mark.parametrize(\n    \"ctx\", [None, \"fork\", \"spawn\"], ids=[\"default\", \"fork\", \"spawn\"]\n)\ndef test_create_shared_memory(space, expected_type, n, ctx):\n    def assert_nested_type(lhs, rhs, n):\n        assert type(lhs) == type(rhs)\n        if isinstance(lhs, (list, tuple)):\n            assert len(lhs) == len(rhs)\n            for lhs_, rhs_ in zip(lhs, rhs):\n                assert_nested_type(lhs_, rhs_, n)\n\n        elif isinstance(lhs, (dict, OrderedDict)):\n            assert set(lhs.keys()) ^ set(rhs.keys()) == set()\n            for key in lhs.keys():\n                
assert_nested_type(lhs[key], rhs[key], n)\n\n        elif isinstance(lhs, SynchronizedArray):\n            # Assert the length of the array\n            assert len(lhs[:]) == n * len(rhs[:])\n            # Assert the data type\n            assert isinstance(lhs[0], type(rhs[0]))\n        else:\n            raise TypeError(f\"Got unknown type `{type(lhs)}`.\")\n\n    ctx = mp if (ctx is None) else mp.get_context(ctx)\n    shared_memory = create_shared_memory(space, n=n, ctx=ctx)\n    assert_nested_type(shared_memory, expected_type, n=n)\n\n\n@pytest.mark.parametrize(\"n\", [1, 8])\n@pytest.mark.parametrize(\n    \"ctx\", [None, \"fork\", \"spawn\"], ids=[\"default\", \"fork\", \"spawn\"]\n)\n@pytest.mark.parametrize(\"space\", custom_spaces)\ndef test_create_shared_memory_custom_space(n, ctx, space):\n    ctx = mp if (ctx is None) else mp.get_context(ctx)\n    with pytest.raises(CustomSpaceError):\n        create_shared_memory(space, n=n, ctx=ctx)\n\n\ndef _write_shared_memory(space, i, shared_memory, sample):\n    write_to_shared_memory(space, i, sample, shared_memory)\n\n\n@pytest.mark.parametrize(\n    \"space\", spaces, ids=[space.__class__.__name__ for space in spaces]\n)\ndef test_write_to_shared_memory(space):\n    def assert_nested_equal(lhs, rhs):\n        assert isinstance(rhs, list)\n        if isinstance(lhs, (list, tuple)):\n            for i in range(len(lhs)):\n                assert_nested_equal(lhs[i], [rhs_[i] for rhs_ in rhs])\n\n        elif isinstance(lhs, (dict, OrderedDict)):\n            for key in lhs.keys():\n                assert_nested_equal(lhs[key], [rhs_[key] for rhs_ in rhs])\n\n        elif isinstance(lhs, SynchronizedArray):\n            assert np.all(np.array(lhs[:]) == np.stack(rhs, axis=0).flatten())\n\n        else:\n            raise TypeError(f\"Got unknown type `{type(lhs)}`.\")\n\n    shared_memory_n8 = create_shared_memory(space, n=8)\n    samples = [space.sample() for _ in range(8)]\n\n    processes = [\n        
Process(\n            target=_write_shared_memory, args=(space, i, shared_memory_n8, samples[i])\n        )\n        for i in range(8)\n    ]\n\n    for process in processes:\n        process.start()\n    for process in processes:\n        process.join()\n\n    assert_nested_equal(shared_memory_n8, samples)\n\n\ndef _process_write(space, i, shared_memory, sample):\n    write_to_shared_memory(space, i, sample, shared_memory)\n\n\n@pytest.mark.parametrize(\n    \"space\", spaces, ids=[space.__class__.__name__ for space in spaces]\n)\ndef test_read_from_shared_memory(space):\n    def assert_nested_equal(lhs, rhs, space, n):\n        assert isinstance(rhs, list)\n        if isinstance(space, Tuple):\n            assert isinstance(lhs, tuple)\n            for i in range(len(lhs)):\n                assert_nested_equal(\n                    lhs[i], [rhs_[i] for rhs_ in rhs], space.spaces[i], n\n                )\n\n        elif isinstance(space, Dict):\n            assert isinstance(lhs, OrderedDict)\n            for key in lhs.keys():\n                assert_nested_equal(\n                    lhs[key], [rhs_[key] for rhs_ in rhs], space.spaces[key], n\n                )\n\n        elif isinstance(space, BaseGymSpaces):\n            assert isinstance(lhs, np.ndarray)\n            assert lhs.shape == ((n,) + space.shape)\n            assert lhs.dtype == space.dtype\n            assert np.all(lhs == np.stack(rhs, axis=0))\n\n        else:\n            raise TypeError(f\"Got unknown type `{type(space)}`\")\n\n    shared_memory_n8 = create_shared_memory(space, n=8)\n    memory_view_n8 = read_from_shared_memory(space, shared_memory_n8, n=8)\n    samples = [space.sample() for _ in range(8)]\n\n    processes = [\n        Process(target=_process_write, args=(space, i, shared_memory_n8, samples[i]))\n        for i in range(8)\n    ]\n\n    for process in processes:\n        process.start()\n    for process in processes:\n        process.join()\n\n    
assert_nested_equal(memory_view_n8, samples, space, n=8)\n"
  },
  {
    "path": "tests/vector/test_spaces.py",
    "content": "import copy\n\nimport numpy as np\nimport pytest\nfrom numpy.testing import assert_array_equal\n\nfrom gym.spaces import Box, Dict, MultiDiscrete, Space, Tuple\nfrom gym.vector.utils.spaces import batch_space, iterate\nfrom tests.vector.utils import CustomSpace, assert_rng_equal, custom_spaces, spaces\n\nexpected_batch_spaces_4 = [\n    Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float64),\n    Box(low=0.0, high=10.0, shape=(4, 1), dtype=np.float64),\n    Box(\n        low=np.array(\n            [[-1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]\n        ),\n        high=np.array(\n            [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]\n        ),\n        dtype=np.float64,\n    ),\n    Box(\n        low=np.array(\n            [\n                [[-1.0, 0.0], [0.0, -1.0]],\n                [[-1.0, 0.0], [0.0, -1.0]],\n                [[-1.0, 0.0], [0.0, -1]],\n                [[-1.0, 0.0], [0.0, -1.0]],\n            ]\n        ),\n        high=np.ones((4, 2, 2)),\n        dtype=np.float64,\n    ),\n    Box(low=0, high=255, shape=(4,), dtype=np.uint8),\n    Box(low=0, high=255, shape=(4, 32, 32, 3), dtype=np.uint8),\n    MultiDiscrete([2, 2, 2, 2]),\n    Box(low=-2, high=2, shape=(4,), dtype=np.int64),\n    Tuple((MultiDiscrete([3, 3, 3, 3]), MultiDiscrete([5, 5, 5, 5]))),\n    Tuple(\n        (\n            MultiDiscrete([7, 7, 7, 7]),\n            Box(\n                low=np.array([[0.0, -1.0], [0.0, -1.0], [0.0, -1.0], [0.0, -1]]),\n                high=np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]),\n                dtype=np.float64,\n            ),\n        )\n    ),\n    Box(\n        low=np.array([[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]),\n        high=np.array([[10, 12, 16], [10, 12, 16], [10, 12, 16], [10, 12, 16]]),\n        dtype=np.int64,\n    ),\n    Box(low=0, high=1, shape=(4, 19), dtype=np.int8),\n    Dict(\n        {\n            \"position\": MultiDiscrete([23, 23, 
23, 23]),\n            \"velocity\": Box(low=0.0, high=1.0, shape=(4, 1), dtype=np.float64),\n        }\n    ),\n    Dict(\n        {\n            \"position\": Dict(\n                {\n                    \"x\": MultiDiscrete([29, 29, 29, 29]),\n                    \"y\": MultiDiscrete([31, 31, 31, 31]),\n                }\n            ),\n            \"velocity\": Tuple(\n                (\n                    MultiDiscrete([37, 37, 37, 37]),\n                    Box(low=0, high=255, shape=(4,), dtype=np.uint8),\n                )\n            ),\n        }\n    ),\n]\n\nexpected_custom_batch_spaces_4 = [\n    Tuple((CustomSpace(), CustomSpace(), CustomSpace(), CustomSpace())),\n    Tuple(\n        (\n            Tuple((CustomSpace(), CustomSpace(), CustomSpace(), CustomSpace())),\n            Box(low=0, high=255, shape=(4,), dtype=np.uint8),\n        )\n    ),\n]\n\n\n@pytest.mark.parametrize(\n    \"space,expected_batch_space_4\",\n    list(zip(spaces, expected_batch_spaces_4)),\n    ids=[space.__class__.__name__ for space in spaces],\n)\ndef test_batch_space(space, expected_batch_space_4):\n    batch_space_4 = batch_space(space, n=4)\n    assert batch_space_4 == expected_batch_space_4\n\n\n@pytest.mark.parametrize(\n    \"space,expected_batch_space_4\",\n    list(zip(custom_spaces, expected_custom_batch_spaces_4)),\n    ids=[space.__class__.__name__ for space in custom_spaces],\n)\ndef test_batch_space_custom_space(space, expected_batch_space_4):\n    batch_space_4 = batch_space(space, n=4)\n    assert batch_space_4 == expected_batch_space_4\n\n\n@pytest.mark.parametrize(\n    \"space,batch_space\",\n    list(zip(spaces, expected_batch_spaces_4)),\n    ids=[space.__class__.__name__ for space in spaces],\n)\ndef test_iterate(space, batch_space):\n    items = batch_space.sample()\n    iterator = iterate(batch_space, items)\n    i = 0\n    for i, item in enumerate(iterator):\n        assert item in space\n    assert i == 3\n\n\n@pytest.mark.parametrize(\n    
\"space,batch_space\",\n    list(zip(custom_spaces, expected_custom_batch_spaces_4)),\n    ids=[space.__class__.__name__ for space in custom_spaces],\n)\ndef test_iterate_custom_space(space, batch_space):\n    items = batch_space.sample()\n    iterator = iterate(batch_space, items)\n    i = 0\n    for i, item in enumerate(iterator):\n        assert item in space\n    assert i == 3\n\n\n@pytest.mark.parametrize(\n    \"space\", spaces, ids=[space.__class__.__name__ for space in spaces]\n)\n@pytest.mark.parametrize(\"n\", [4, 5], ids=[f\"n={n}\" for n in [4, 5]])\n@pytest.mark.parametrize(\n    \"base_seed\", [123, 456], ids=[f\"seed={base_seed}\" for base_seed in [123, 456]]\n)\ndef test_rng_different_at_each_index(space: Space, n: int, base_seed: int):\n    \"\"\"\n    Tests that the rng values produced at each index are different\n    to prevent if the rng is copied for each subspace\n    \"\"\"\n    space.seed(base_seed)\n\n    batched_space = batch_space(space, n)\n    assert space.np_random is not batched_space.np_random\n    assert_rng_equal(space.np_random, batched_space.np_random)\n\n    batched_sample = batched_space.sample()\n    sample = list(iterate(batched_space, batched_sample))\n    assert not all(np.all(element == sample[0]) for element in sample), sample\n\n\n@pytest.mark.parametrize(\n    \"space\", spaces, ids=[space.__class__.__name__ for space in spaces]\n)\n@pytest.mark.parametrize(\"n\", [1, 2, 5], ids=[f\"n={n}\" for n in [1, 2, 5]])\n@pytest.mark.parametrize(\n    \"base_seed\", [123, 456], ids=[f\"seed={base_seed}\" for base_seed in [123, 456]]\n)\ndef test_deterministic(space: Space, n: int, base_seed: int):\n    \"\"\"Tests the batched spaces are deterministic by using a copied version\"\"\"\n    # Copy the spaces and check that the np_random are not reference equal\n    space_a = space\n    space_a.seed(base_seed)\n    space_b = copy.deepcopy(space_a)\n    assert_rng_equal(space_a.np_random, space_b.np_random)\n    assert 
space_a.np_random is not space_b.np_random\n\n    # Batch the spaces and check that the np_random are not reference equal\n    space_a_batched = batch_space(space_a, n)\n    space_b_batched = batch_space(space_b, n)\n    assert_rng_equal(space_a_batched.np_random, space_b_batched.np_random)\n    assert space_a_batched.np_random is not space_b_batched.np_random\n    # Check that the batched space's np_random is not reference equal to the original space's np_random\n    assert space_a.np_random is not space_a_batched.np_random\n\n    # Check that the random number generators of batched spaces a and b are not affected by sampling the original space\n    space_a.sample()\n    space_a_batched_sample = space_a_batched.sample()\n    space_b_batched_sample = space_b_batched.sample()\n    for a_sample, b_sample in zip(\n        iterate(space_a_batched, space_a_batched_sample),\n        iterate(space_b_batched, space_b_batched_sample),\n    ):\n        if isinstance(a_sample, tuple):\n            assert len(a_sample) == len(b_sample)\n            for a_subsample, b_subsample in zip(a_sample, b_sample):\n                assert_array_equal(a_subsample, b_subsample)\n        else:\n            assert_array_equal(a_sample, b_sample)\n"
  },
  {
    "path": "tests/vector/test_sync_vector_env.py",
    "content": "import numpy as np\nimport pytest\n\nfrom gym.envs.registration import EnvSpec\nfrom gym.spaces import Box, Discrete, MultiDiscrete, Tuple\nfrom gym.vector.sync_vector_env import SyncVectorEnv\nfrom tests.envs.utils import all_testing_env_specs\nfrom tests.vector.utils import (\n    CustomSpace,\n    assert_rng_equal,\n    make_custom_space_env,\n    make_env,\n)\n\n\ndef test_create_sync_vector_env():\n    env_fns = [make_env(\"FrozenLake-v1\", i) for i in range(8)]\n    env = SyncVectorEnv(env_fns)\n    env.close()\n\n    assert env.num_envs == 8\n\n\ndef test_reset_sync_vector_env():\n    env_fns = [make_env(\"CartPole-v1\", i) for i in range(8)]\n    env = SyncVectorEnv(env_fns)\n    observations, infos = env.reset()\n    env.close()\n\n    assert isinstance(env.observation_space, Box)\n    assert isinstance(observations, np.ndarray)\n    assert observations.dtype == env.observation_space.dtype\n    assert observations.shape == (8,) + env.single_observation_space.shape\n    assert observations.shape == env.observation_space.shape\n\n    del observations\n\n\n@pytest.mark.parametrize(\"use_single_action_space\", [True, False])\ndef test_step_sync_vector_env(use_single_action_space):\n    env_fns = [make_env(\"FrozenLake-v1\", i) for i in range(8)]\n\n    env = SyncVectorEnv(env_fns)\n    observations = env.reset()\n\n    assert isinstance(env.single_action_space, Discrete)\n    assert isinstance(env.action_space, MultiDiscrete)\n\n    if use_single_action_space:\n        actions = [env.single_action_space.sample() for _ in range(8)]\n    else:\n        actions = env.action_space.sample()\n    observations, rewards, terminateds, truncateds, _ = env.step(actions)\n\n    env.close()\n\n    assert isinstance(env.observation_space, MultiDiscrete)\n    assert isinstance(observations, np.ndarray)\n    assert observations.dtype == env.observation_space.dtype\n    assert observations.shape == (8,) + env.single_observation_space.shape\n    assert 
observations.shape == env.observation_space.shape\n\n    assert isinstance(rewards, np.ndarray)\n    assert isinstance(rewards[0], (float, np.floating))\n    assert rewards.ndim == 1\n    assert rewards.size == 8\n\n    assert isinstance(terminateds, np.ndarray)\n    assert terminateds.dtype == np.bool_\n    assert terminateds.ndim == 1\n    assert terminateds.size == 8\n\n    assert isinstance(truncateds, np.ndarray)\n    assert truncateds.dtype == np.bool_\n    assert truncateds.ndim == 1\n    assert truncateds.size == 8\n\n\ndef test_call_sync_vector_env():\n    env_fns = [\n        make_env(\"CartPole-v1\", i, render_mode=\"rgb_array_list\") for i in range(4)\n    ]\n\n    env = SyncVectorEnv(env_fns)\n    _ = env.reset()\n    images = env.call(\"render\")\n    gravity = env.call(\"gravity\")\n\n    env.close()\n\n    assert isinstance(images, tuple)\n    assert len(images) == 4\n    for i in range(4):\n        assert len(images[i]) == 1\n        assert isinstance(images[i][0], np.ndarray)\n\n    assert isinstance(gravity, tuple)\n    assert len(gravity) == 4\n    for i in range(4):\n        assert isinstance(gravity[i], float)\n        assert gravity[i] == 9.8\n\n\ndef test_set_attr_sync_vector_env():\n    env_fns = [make_env(\"CartPole-v1\", i) for i in range(4)]\n\n    env = SyncVectorEnv(env_fns)\n    env.set_attr(\"gravity\", [9.81, 3.72, 8.87, 1.62])\n    gravity = env.get_attr(\"gravity\")\n    assert gravity == (9.81, 3.72, 8.87, 1.62)\n\n    env.close()\n\n\ndef test_check_spaces_sync_vector_env():\n    # CartPole-v1 - observation_space: Box(4,), action_space: Discrete(2)\n    env_fns = [make_env(\"CartPole-v1\", i) for i in range(8)]\n    # FrozenLake-v1 - Discrete(16), action_space: Discrete(4)\n    env_fns[1] = make_env(\"FrozenLake-v1\", 1)\n    with pytest.raises(RuntimeError):\n        env = SyncVectorEnv(env_fns)\n        env.close()\n\n\ndef test_custom_space_sync_vector_env():\n    env_fns = [make_custom_space_env(i) for i in range(4)]\n\n    
env = SyncVectorEnv(env_fns)\n    reset_observations, infos = env.reset()\n\n    assert isinstance(env.single_action_space, CustomSpace)\n    assert isinstance(env.action_space, Tuple)\n\n    actions = (\"action-2\", \"action-3\", \"action-5\", \"action-7\")\n    step_observations, rewards, terminateds, truncateds, _ = env.step(actions)\n\n    env.close()\n\n    assert isinstance(env.single_observation_space, CustomSpace)\n    assert isinstance(env.observation_space, Tuple)\n\n    assert isinstance(reset_observations, tuple)\n    assert reset_observations == (\"reset\", \"reset\", \"reset\", \"reset\")\n\n    assert isinstance(step_observations, tuple)\n    assert step_observations == (\n        \"step(action-2)\",\n        \"step(action-3)\",\n        \"step(action-5)\",\n        \"step(action-7)\",\n    )\n\n\ndef test_sync_vector_env_seed():\n    env = make_env(\"BipedalWalker-v3\", seed=123)()\n    sync_vector_env = SyncVectorEnv([make_env(\"BipedalWalker-v3\", seed=123)])\n\n    assert_rng_equal(env.action_space.np_random, sync_vector_env.action_space.np_random)\n    for _ in range(100):\n        env_action = env.action_space.sample()\n        vector_action = sync_vector_env.action_space.sample()\n        assert np.all(env_action == vector_action)\n\n\n@pytest.mark.parametrize(\n    \"spec\", all_testing_env_specs, ids=[spec.id for spec in all_testing_env_specs]\n)\ndef test_sync_vector_determinism(spec: EnvSpec, seed: int = 123, n: int = 3):\n    \"\"\"Check that for all environments, the sync vector envs produce the same action samples using the same seeds\"\"\"\n    env_1 = SyncVectorEnv([make_env(spec.id, seed=seed) for _ in range(n)])\n    env_2 = SyncVectorEnv([make_env(spec.id, seed=seed) for _ in range(n)])\n    assert_rng_equal(env_1.action_space.np_random, env_2.action_space.np_random)\n\n    for _ in range(100):\n        env_1_samples = env_1.action_space.sample()\n        env_2_samples = env_2.action_space.sample()\n        assert 
np.all(env_1_samples == env_2_samples)\n"
  },
  {
    "path": "tests/vector/test_vector_env.py",
    "content": "from functools import partial\n\nimport numpy as np\nimport pytest\n\nfrom gym.spaces import Discrete, Tuple\nfrom gym.vector.async_vector_env import AsyncVectorEnv\nfrom gym.vector.sync_vector_env import SyncVectorEnv\nfrom gym.vector.vector_env import VectorEnv\nfrom tests.testing_env import GenericTestEnv\nfrom tests.vector.utils import CustomSpace, make_env\n\n\n@pytest.mark.parametrize(\"shared_memory\", [True, False])\ndef test_vector_env_equal(shared_memory):\n    env_fns = [make_env(\"CartPole-v1\", i) for i in range(4)]\n    num_steps = 100\n\n    async_env = AsyncVectorEnv(env_fns, shared_memory=shared_memory)\n    sync_env = SyncVectorEnv(env_fns)\n\n    assert async_env.num_envs == sync_env.num_envs\n    assert async_env.observation_space == sync_env.observation_space\n    assert async_env.single_observation_space == sync_env.single_observation_space\n    assert async_env.action_space == sync_env.action_space\n    assert async_env.single_action_space == sync_env.single_action_space\n\n    async_observations, async_infos = async_env.reset(seed=0)\n    sync_observations, sync_infos = sync_env.reset(seed=0)\n    assert np.all(async_observations == sync_observations)\n\n    for _ in range(num_steps):\n        actions = async_env.action_space.sample()\n        assert actions in sync_env.action_space\n\n        # fmt: off\n        async_observations, async_rewards, async_terminateds, async_truncateds, async_infos = async_env.step(actions)\n        sync_observations, sync_rewards, sync_terminateds, sync_truncateds, sync_infos = sync_env.step(actions)\n        # fmt: on\n\n        if any(sync_terminateds) or any(sync_truncateds):\n            assert \"final_observation\" in async_infos\n            assert \"_final_observation\" in async_infos\n            assert \"final_observation\" in sync_infos\n            assert \"_final_observation\" in sync_infos\n\n        assert np.all(async_observations == sync_observations)\n        assert 
np.all(async_rewards == sync_rewards)\n        assert np.all(async_terminateds == sync_terminateds)\n        assert np.all(async_truncateds == sync_truncateds)\n\n    async_env.close()\n    sync_env.close()\n\n\ndef test_custom_space_vector_env():\n    env = VectorEnv(4, CustomSpace(), CustomSpace())\n\n    assert isinstance(env.single_observation_space, CustomSpace)\n    assert isinstance(env.observation_space, Tuple)\n\n    assert isinstance(env.single_action_space, CustomSpace)\n    assert isinstance(env.action_space, Tuple)\n\n\n@pytest.mark.parametrize(\n    \"vectoriser\",\n    (\n        SyncVectorEnv,\n        partial(AsyncVectorEnv, shared_memory=True),\n        partial(AsyncVectorEnv, shared_memory=False),\n    ),\n    ids=[\"Sync\", \"Async with shared memory\", \"Async without shared memory\"],\n)\ndef test_final_obs_info(vectoriser):\n    \"\"\"Tests that the vector environments correctly return the final observation and info.\"\"\"\n\n    def reset_fn(self, seed=None, options=None):\n        return 0, {\"reset\": True}\n\n    def thunk():\n        return GenericTestEnv(\n            action_space=Discrete(4),\n            observation_space=Discrete(4),\n            reset_fn=reset_fn,\n            step_fn=lambda self, action: (\n                action if action < 3 else 0,\n                0,\n                action >= 3,\n                False,\n                {\"action\": action},\n            ),\n        )\n\n    env = vectoriser([thunk])\n    obs, info = env.reset()\n    assert obs == np.array([0]) and info == {\n        \"reset\": np.array([True]),\n        \"_reset\": np.array([True]),\n    }\n\n    obs, _, termination, _, info = env.step([1])\n    assert (\n        obs == np.array([1])\n        and termination == np.array([False])\n        and info == {\"action\": np.array([1]), \"_action\": np.array([True])}\n    )\n\n    obs, _, termination, _, info = env.step([2])\n    assert (\n        obs == np.array([2])\n        and termination == 
np.array([False])\n        and info == {\"action\": np.array([2]), \"_action\": np.array([True])}\n    )\n\n    obs, _, termination, _, info = env.step([3])\n    assert (\n        obs == np.array([0])\n        and termination == np.array([True])\n        and info[\"reset\"] == np.array([True])\n    )\n    assert \"final_observation\" in info and \"final_info\" in info\n    assert info[\"final_observation\"] == np.array([0]) and info[\"final_info\"] == {\n        \"action\": 3\n    }\n"
  },
  {
    "path": "tests/vector/test_vector_env_info.py",
    "content": "import numpy as np\nimport pytest\n\nimport gym\nfrom gym.vector.sync_vector_env import SyncVectorEnv\nfrom tests.vector.utils import make_env\n\nENV_ID = \"CartPole-v1\"\nNUM_ENVS = 3\nENV_STEPS = 50\nSEED = 42\n\n\n@pytest.mark.parametrize(\"asynchronous\", [True, False])\ndef test_vector_env_info(asynchronous):\n    env = gym.vector.make(\n        ENV_ID, num_envs=NUM_ENVS, asynchronous=asynchronous, disable_env_checker=True\n    )\n    env.reset(seed=SEED)\n    for _ in range(ENV_STEPS):\n        env.action_space.seed(SEED)\n        action = env.action_space.sample()\n        _, _, terminateds, truncateds, infos = env.step(action)\n        if any(terminateds) or any(truncateds):\n            assert len(infos[\"final_observation\"]) == NUM_ENVS\n            assert len(infos[\"_final_observation\"]) == NUM_ENVS\n\n            assert isinstance(infos[\"final_observation\"], np.ndarray)\n            assert isinstance(infos[\"_final_observation\"], np.ndarray)\n\n            for i, (terminated, truncated) in enumerate(zip(terminateds, truncateds)):\n                if terminated or truncated:\n                    assert infos[\"_final_observation\"][i]\n                else:\n                    assert not infos[\"_final_observation\"][i]\n                    assert infos[\"final_observation\"][i] is None\n\n\n@pytest.mark.parametrize(\"concurrent_ends\", [1, 2, 3])\ndef test_vector_env_info_concurrent_termination(concurrent_ends):\n    # envs that need to terminate together will have the same action\n    actions = [0] * concurrent_ends + [1] * (NUM_ENVS - concurrent_ends)\n    envs = [make_env(ENV_ID, SEED) for _ in range(NUM_ENVS)]\n    envs = SyncVectorEnv(envs)\n\n    for _ in range(ENV_STEPS):\n        _, _, terminateds, truncateds, infos = envs.step(actions)\n        if any(terminateds) or any(truncateds):\n            for i, (terminated, truncated) in enumerate(zip(terminateds, truncateds)):\n                if i < concurrent_ends:\n           
         assert terminated or truncated\n                    assert infos[\"_final_observation\"][i]\n                else:\n                    assert not infos[\"_final_observation\"][i]\n                    assert infos[\"final_observation\"][i] is None\n            return\n"
  },
  {
    "path": "tests/vector/test_vector_env_wrapper.py",
    "content": "import numpy as np\n\nfrom gym.vector import VectorEnvWrapper, make\n\n\nclass DummyWrapper(VectorEnvWrapper):\n    def __init__(self, env):\n        self.env = env\n        self.counter = 0\n\n    def reset_async(self, **kwargs):\n        super().reset_async()\n        self.counter += 1\n\n\ndef test_vector_env_wrapper_inheritance():\n    env = make(\"FrozenLake-v1\", asynchronous=False)\n    wrapped = DummyWrapper(env)\n    wrapped.reset()\n    assert wrapped.counter == 1\n\n\ndef test_vector_env_wrapper_attributes():\n    \"\"\"Test if `set_attr`, `call` methods for VecEnvWrapper get correctly forwarded to the vector env it is wrapping.\"\"\"\n    env = make(\"CartPole-v1\", num_envs=3)\n    wrapped = DummyWrapper(make(\"CartPole-v1\", num_envs=3))\n\n    assert np.allclose(wrapped.call(\"gravity\"), env.call(\"gravity\"))\n    env.set_attr(\"gravity\", [20.0, 20.0, 20.0])\n    wrapped.set_attr(\"gravity\", [20.0, 20.0, 20.0])\n    assert np.allclose(wrapped.get_attr(\"gravity\"), env.get_attr(\"gravity\"))\n"
  },
  {
    "path": "tests/vector/test_vector_make.py",
    "content": "import pytest\n\nimport gym\nfrom gym.vector import AsyncVectorEnv, SyncVectorEnv\nfrom gym.wrappers import OrderEnforcing, TimeLimit, TransformObservation\nfrom gym.wrappers.env_checker import PassiveEnvChecker\nfrom tests.wrappers.utils import has_wrapper\n\n\ndef test_vector_make_id():\n    env = gym.vector.make(\"CartPole-v1\")\n    assert isinstance(env, AsyncVectorEnv)\n    assert env.num_envs == 1\n    env.close()\n\n\n@pytest.mark.parametrize(\"num_envs\", [1, 3, 10])\ndef test_vector_make_num_envs(num_envs):\n    env = gym.vector.make(\"CartPole-v1\", num_envs=num_envs)\n    assert env.num_envs == num_envs\n    env.close()\n\n\ndef test_vector_make_asynchronous():\n    env = gym.vector.make(\"CartPole-v1\", asynchronous=True)\n    assert isinstance(env, AsyncVectorEnv)\n    env.close()\n\n    env = gym.vector.make(\"CartPole-v1\", asynchronous=False)\n    assert isinstance(env, SyncVectorEnv)\n    env.close()\n\n\ndef test_vector_make_wrappers():\n    env = gym.vector.make(\"CartPole-v1\", num_envs=2, asynchronous=False)\n    assert isinstance(env, SyncVectorEnv)\n    assert len(env.envs) == 2\n\n    sub_env = env.envs[0]\n    assert isinstance(sub_env, gym.Env)\n    if sub_env.spec.order_enforce:\n        assert has_wrapper(sub_env, OrderEnforcing)\n    if sub_env.spec.max_episode_steps is not None:\n        assert has_wrapper(sub_env, TimeLimit)\n\n    assert all(\n        has_wrapper(sub_env, TransformObservation) is False for sub_env in env.envs\n    )\n    env.close()\n\n    env = gym.vector.make(\n        \"CartPole-v1\",\n        num_envs=2,\n        asynchronous=False,\n        wrappers=lambda _env: TransformObservation(_env, lambda obs: obs * 2),\n    )\n    # As asynchronous environment are inaccessible, synchronous vector must be used\n    assert isinstance(env, SyncVectorEnv)\n    assert all(has_wrapper(sub_env, TransformObservation) for sub_env in env.envs)\n\n    env.close()\n\n\ndef test_vector_make_disable_env_checker():\n   
 # Sub-environments of an asynchronous vector env are inaccessible, so a synchronous vector env must be used\n    env = gym.vector.make(\"CartPole-v1\", num_envs=1, asynchronous=False)\n    assert isinstance(env, SyncVectorEnv)\n    assert has_wrapper(env.envs[0], PassiveEnvChecker)\n    env.close()\n\n    env = gym.vector.make(\"CartPole-v1\", num_envs=5, asynchronous=False)\n    assert isinstance(env, SyncVectorEnv)\n    assert has_wrapper(env.envs[0], PassiveEnvChecker)\n    assert all(\n        has_wrapper(env.envs[i], PassiveEnvChecker) is False for i in [1, 2, 3, 4]\n    )\n    env.close()\n\n    env = gym.vector.make(\n        \"CartPole-v1\", num_envs=3, asynchronous=False, disable_env_checker=True\n    )\n    assert isinstance(env, SyncVectorEnv)\n    assert all(has_wrapper(sub_env, PassiveEnvChecker) is False for sub_env in env.envs)\n    env.close()\n"
  },
  {
    "path": "tests/vector/utils.py",
    "content": "import time\nfrom typing import Optional\n\nimport numpy as np\n\nimport gym\nfrom gym.spaces import Box, Dict, Discrete, MultiBinary, MultiDiscrete, Tuple\nfrom gym.utils.seeding import RandomNumberGenerator\n\nspaces = [\n    Box(low=np.array(-1.0), high=np.array(1.0), dtype=np.float64),\n    Box(low=np.array([0.0]), high=np.array([10.0]), dtype=np.float64),\n    Box(\n        low=np.array([-1.0, 0.0, 0.0]), high=np.array([1.0, 1.0, 1.0]), dtype=np.float64\n    ),\n    Box(\n        low=np.array([[-1.0, 0.0], [0.0, -1.0]]), high=np.ones((2, 2)), dtype=np.float64\n    ),\n    Box(low=0, high=255, shape=(), dtype=np.uint8),\n    Box(low=0, high=255, shape=(32, 32, 3), dtype=np.uint8),\n    Discrete(2),\n    Discrete(5, start=-2),\n    Tuple((Discrete(3), Discrete(5))),\n    Tuple(\n        (\n            Discrete(7),\n            Box(low=np.array([0.0, -1.0]), high=np.array([1.0, 1.0]), dtype=np.float64),\n        )\n    ),\n    MultiDiscrete([11, 13, 17]),\n    MultiBinary(19),\n    Dict(\n        {\n            \"position\": Discrete(23),\n            \"velocity\": Box(\n                low=np.array([0.0]), high=np.array([1.0]), dtype=np.float64\n            ),\n        }\n    ),\n    Dict(\n        {\n            \"position\": Dict({\"x\": Discrete(29), \"y\": Discrete(31)}),\n            \"velocity\": Tuple(\n                (Discrete(37), Box(low=0, high=255, shape=(), dtype=np.uint8))\n            ),\n        }\n    ),\n]\n\nHEIGHT, WIDTH = 64, 64\n\n\nclass UnittestSlowEnv(gym.Env):\n    def __init__(self, slow_reset=0.3):\n        super().__init__()\n        self.slow_reset = slow_reset\n        self.observation_space = Box(\n            low=0, high=255, shape=(HEIGHT, WIDTH, 3), dtype=np.uint8\n        )\n        self.action_space = Box(low=0.0, high=1.0, shape=(), dtype=np.float32)\n\n    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):\n        super().reset(seed=seed)\n        if self.slow_reset > 0:\n      
      time.sleep(self.slow_reset)\n        return self.observation_space.sample(), {}\n\n    def step(self, action):\n        time.sleep(action)\n        observation = self.observation_space.sample()\n        reward, terminated, truncated = 0.0, False, False\n        return observation, reward, terminated, truncated, {}\n\n\nclass CustomSpace(gym.Space):\n    \"\"\"Minimal custom observation space.\"\"\"\n\n    def sample(self):\n        return self.np_random.integers(0, 10, ())\n\n    def contains(self, x):\n        return 0 <= x <= 10\n\n    def __eq__(self, other):\n        return isinstance(other, CustomSpace)\n\n\ncustom_spaces = [\n    CustomSpace(),\n    Tuple((CustomSpace(), Box(low=0, high=255, shape=(), dtype=np.uint8))),\n]\n\n\nclass CustomSpaceEnv(gym.Env):\n    def __init__(self):\n        super().__init__()\n        self.observation_space = CustomSpace()\n        self.action_space = CustomSpace()\n\n    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):\n        super().reset(seed=seed)\n        return \"reset\", {}\n\n    def step(self, action):\n        observation = f\"step({action:s})\"\n        reward, terminated, truncated = 0.0, False, False\n        return observation, reward, terminated, truncated, {}\n\n\ndef make_env(env_name, seed, **kwargs):\n    def _make():\n        env = gym.make(env_name, disable_env_checker=True, **kwargs)\n        env.action_space.seed(seed)\n        env.reset(seed=seed)\n        return env\n\n    return _make\n\n\ndef make_slow_env(slow_reset, seed):\n    def _make():\n        env = UnittestSlowEnv(slow_reset=slow_reset)\n        env.reset(seed=seed)\n        return env\n\n    return _make\n\n\ndef make_custom_space_env(seed):\n    def _make():\n        env = CustomSpaceEnv()\n        env.reset(seed=seed)\n        return env\n\n    return _make\n\n\ndef assert_rng_equal(rng_1: RandomNumberGenerator, rng_2: RandomNumberGenerator):\n    assert rng_1.bit_generator.state == 
rng_2.bit_generator.state\n"
  },
  {
    "path": "tests/wrappers/__init__.py",
    "content": ""
  },
  {
    "path": "tests/wrappers/test_atari_preprocessing.py",
    "content": "import numpy as np\nimport pytest\n\nfrom gym.spaces import Box, Discrete\nfrom gym.wrappers import AtariPreprocessing, StepAPICompatibility\nfrom tests.testing_env import GenericTestEnv, old_step_fn\n\n\nclass AleTesting:\n    \"\"\"A testing implementation for the ALE object in atari games.\"\"\"\n\n    grayscale_obs_space = Box(low=0, high=255, shape=(210, 160), dtype=np.uint8, seed=1)\n    rgb_obs_space = Box(low=0, high=255, shape=(210, 160, 3), dtype=np.uint8, seed=1)\n\n    def lives(self) -> int:\n        \"\"\"Returns the number of lives in the atari game.\"\"\"\n        return 1\n\n    def getScreenGrayscale(self, buffer: np.ndarray):\n        \"\"\"Updates the buffer with a random grayscale observation.\"\"\"\n        buffer[...] = self.grayscale_obs_space.sample()\n\n    def getScreenRGB(self, buffer: np.ndarray):\n        \"\"\"Updates the buffer with a random rgb observation.\"\"\"\n        buffer[...] = self.rgb_obs_space.sample()\n\n\nclass AtariTestingEnv(GenericTestEnv):\n    \"\"\"A testing environment to replicate the atari (ale-py) environments.\"\"\"\n\n    def __init__(self):\n        super().__init__(\n            observation_space=Box(\n                low=0, high=255, shape=(210, 160, 3), dtype=np.uint8, seed=1\n            ),\n            action_space=Discrete(3, seed=1),\n            step_fn=old_step_fn,\n        )\n        self.ale = AleTesting()\n\n    def get_action_meanings(self):\n        \"\"\"Returns the meanings of each of the actions available to the agent. 
First index must be 'NOOP'.\"\"\"\n        return [\"NOOP\", \"UP\", \"DOWN\"]\n\n\n@pytest.mark.parametrize(\n    \"env, obs_shape\",\n    [\n        (AtariTestingEnv(), (210, 160, 3)),\n        (\n            AtariPreprocessing(\n                StepAPICompatibility(AtariTestingEnv(), output_truncation_bool=True),\n                screen_size=84,\n                grayscale_obs=True,\n                frame_skip=1,\n                noop_max=0,\n            ),\n            (84, 84),\n        ),\n        (\n            AtariPreprocessing(\n                StepAPICompatibility(AtariTestingEnv(), output_truncation_bool=True),\n                screen_size=84,\n                grayscale_obs=False,\n                frame_skip=1,\n                noop_max=0,\n            ),\n            (84, 84, 3),\n        ),\n        (\n            AtariPreprocessing(\n                StepAPICompatibility(AtariTestingEnv(), output_truncation_bool=True),\n                screen_size=84,\n                grayscale_obs=True,\n                frame_skip=1,\n                noop_max=0,\n                grayscale_newaxis=True,\n            ),\n            (84, 84, 1),\n        ),\n    ],\n)\ndef test_atari_preprocessing_grayscale(env, obs_shape):\n    assert env.observation_space.shape == obs_shape\n\n    # It is not possible to test the outputs as we are not using actual observations.\n    # todo: update when ale-py is compatible with the ci\n\n    env = StepAPICompatibility(\n        env, output_truncation_bool=True\n    )  # using compatibility wrapper since ale-py uses old step API\n\n    obs, _ = env.reset(seed=0)\n    assert obs in env.observation_space\n\n    obs, _, _, _, _ = env.step(env.action_space.sample())\n    assert obs in env.observation_space\n\n    env.close()\n\n\n@pytest.mark.parametrize(\"grayscale\", [True, False])\n@pytest.mark.parametrize(\"scaled\", [True, False])\ndef test_atari_preprocessing_scale(grayscale, scaled, max_test_steps=10):\n    # arbitrarily chosen 
number for stepping into env. and ensuring all observations are in the required range\n    env = AtariPreprocessing(\n        StepAPICompatibility(AtariTestingEnv(), output_truncation_bool=True),\n        screen_size=84,\n        grayscale_obs=grayscale,\n        scale_obs=scaled,\n        frame_skip=1,\n        noop_max=0,\n    )\n\n    obs, _ = env.reset()\n\n    max_obs = 1 if scaled else 255\n    assert np.all(0 <= obs) and np.all(obs <= max_obs)\n\n    terminated, truncated, step_i = False, False, 0\n    while not (terminated or truncated) and step_i <= max_test_steps:\n        obs, _, terminated, truncated, _ = env.step(env.action_space.sample())\n        assert np.all(0 <= obs) and np.all(obs <= max_obs)\n\n        step_i += 1\n    env.close()\n"
  },
  {
    "path": "tests/wrappers/test_autoreset.py",
    "content": "\"\"\"Tests the gym.wrapper.AutoResetWrapper operates as expected.\"\"\"\nfrom typing import Generator, Optional\nfrom unittest.mock import MagicMock\n\nimport numpy as np\nimport pytest\n\nimport gym\nfrom gym.wrappers import AutoResetWrapper\nfrom tests.envs.utils import all_testing_env_specs\n\n\nclass DummyResetEnv(gym.Env):\n    \"\"\"A dummy environment which returns ascending numbers starting at `0` when :meth:`self.step()` is called.\n\n    After the second call to :meth:`self.step()` terminated is true.\n    Info dicts are also returned containing the same number returned as an observation, accessible via the key \"count\".\n    This environment is provided for the purpose of testing the autoreset wrapper.\n    \"\"\"\n\n    metadata = {}\n\n    def __init__(self):\n        \"\"\"Initialise the DummyResetEnv.\"\"\"\n        self.action_space = gym.spaces.Box(\n            low=np.array([0]), high=np.array([2]), dtype=np.int64\n        )\n        self.observation_space = gym.spaces.Discrete(2)\n        self.count = 0\n\n    def step(self, action: int):\n        \"\"\"Steps the DummyEnv with the incremented step, reward and terminated `if self.count > 1` and updated info.\"\"\"\n        self.count += 1\n        return (\n            np.array([self.count]),  # Obs\n            self.count > 2,  # Reward\n            self.count > 2,  # Terminated\n            False,  # Truncated\n            {\"count\": self.count},  # Info\n        )\n\n    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):\n        \"\"\"Resets the DummyEnv to return the count array and info with count.\"\"\"\n        self.count = 0\n        return np.array([self.count]), {\"count\": self.count}\n\n\ndef unwrap_env(env) -> Generator[gym.Wrapper, None, None]:\n    \"\"\"Unwraps an environment yielding all wrappers around environment.\"\"\"\n    while isinstance(env, gym.Wrapper):\n        yield type(env)\n        env = 
env.env\n\n\n@pytest.mark.parametrize(\n    \"spec\", all_testing_env_specs, ids=[spec.id for spec in all_testing_env_specs]\n)\ndef test_make_autoreset_true(spec):\n    \"\"\"Tests gym.make with `autoreset=True`, and check that the reset actually happens.\n\n    Note: This test assumes that the outermost wrapper is AutoResetWrapper so if that\n     is being changed in the future, this test will break and need to be updated.\n    Note: This test assumes that all first-party environments will terminate in a finite\n     amount of time with random actions, which is true as of the time of adding this test.\n    \"\"\"\n    env = gym.make(spec.id, autoreset=True, disable_env_checker=True)\n    assert AutoResetWrapper in unwrap_env(env)\n\n    env.reset(seed=0)\n    env.unwrapped.reset = MagicMock(side_effect=env.unwrapped.reset)\n\n    terminated, truncated = False, False\n    while not (terminated or truncated):\n        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())\n\n    assert env.unwrapped.reset.called\n    env.close()\n\n\n@pytest.mark.parametrize(\n    \"spec\", all_testing_env_specs, ids=[spec.id for spec in all_testing_env_specs]\n)\ndef test_gym_make_autoreset(spec):\n    \"\"\"Tests that `gym.make` autoreset wrapper is applied only when `gym.make(..., autoreset=True)`.\"\"\"\n    env = gym.make(spec.id, disable_env_checker=True)\n    assert AutoResetWrapper not in unwrap_env(env)\n    env.close()\n\n    env = gym.make(spec.id, autoreset=False, disable_env_checker=True)\n    assert AutoResetWrapper not in unwrap_env(env)\n    env.close()\n\n    env = gym.make(spec.id, autoreset=True, disable_env_checker=True)\n    assert AutoResetWrapper in unwrap_env(env)\n    env.close()\n\n\ndef test_autoreset_wrapper_autoreset():\n    \"\"\"Tests the autoreset wrapper actually automatically resets correctly.\"\"\"\n    env = DummyResetEnv()\n    env = AutoResetWrapper(env)\n\n    obs, info = env.reset()\n    assert obs == np.array([0])\n   
 assert info == {\"count\": 0}\n\n    action = 0\n    obs, reward, terminated, truncated, info = env.step(action)\n    assert obs == np.array([1])\n    assert reward == 0\n    assert (terminated or truncated) is False\n    assert info == {\"count\": 1}\n\n    obs, reward, terminated, truncated, info = env.step(action)\n    assert obs == np.array([2])\n    assert (terminated or truncated) is False\n    assert reward == 0\n    assert info == {\"count\": 2}\n\n    obs, reward, terminated, truncated, info = env.step(action)\n    assert obs == np.array([0])\n    assert (terminated or truncated) is True\n    assert reward == 1\n    assert info == {\n        \"count\": 0,\n        \"final_observation\": np.array([3]),\n        \"final_info\": {\"count\": 3},\n    }\n\n    obs, reward, terminated, truncated, info = env.step(action)\n    assert obs == np.array([1])\n    assert reward == 0\n    assert (terminated or truncated) is False\n    assert info == {\"count\": 1}\n\n    env.close()\n"
  },
  {
    "path": "tests/wrappers/test_clip_action.py",
    "content": "import numpy as np\n\nimport gym\nfrom gym.wrappers import ClipAction\n\n\ndef test_clip_action():\n    # mountaincar: action-based rewards\n    env = gym.make(\"MountainCarContinuous-v0\", disable_env_checker=True)\n    wrapped_env = ClipAction(\n        gym.make(\"MountainCarContinuous-v0\", disable_env_checker=True)\n    )\n\n    seed = 0\n\n    env.reset(seed=seed)\n    wrapped_env.reset(seed=seed)\n\n    actions = [[0.4], [1.2], [-0.3], [0.0], [-2.5]]\n    for action in actions:\n        obs1, r1, ter1, trunc1, _ = env.step(\n            np.clip(action, env.action_space.low, env.action_space.high)\n        )\n        obs2, r2, ter2, trunc2, _ = wrapped_env.step(action)\n        assert np.allclose(r1, r2)\n        assert np.allclose(obs1, obs2)\n        assert ter1 == ter2\n        assert trunc1 == trunc2\n"
  },
  {
    "path": "tests/wrappers/test_filter_observation.py",
    "content": "from typing import Optional, Tuple\n\nimport numpy as np\nimport pytest\n\nimport gym\nfrom gym import spaces\nfrom gym.wrappers.filter_observation import FilterObservation\n\n\nclass FakeEnvironment(gym.Env):\n    def __init__(\n        self, render_mode=None, observation_keys: Tuple[str, ...] = (\"state\",)\n    ):\n        self.observation_space = spaces.Dict(\n            {\n                name: spaces.Box(shape=(2,), low=-1, high=1, dtype=np.float32)\n                for name in observation_keys\n            }\n        )\n        self.action_space = spaces.Box(shape=(1,), low=-1, high=1, dtype=np.float32)\n        self.render_mode = render_mode\n\n    def render(self, mode=\"human\"):\n        image_shape = (32, 32, 3)\n        return np.zeros(image_shape, dtype=np.uint8)\n\n    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):\n        super().reset(seed=seed)\n        observation = self.observation_space.sample()\n        return observation, {}\n\n    def step(self, action):\n        del action\n        observation = self.observation_space.sample()\n        reward, terminal, info = 0.0, False, {}\n        return observation, reward, terminal, info\n\n\nFILTER_OBSERVATION_TEST_CASES = (\n    ((\"key1\", \"key2\"), (\"key1\",)),\n    ((\"key1\", \"key2\"), (\"key1\", \"key2\")),\n    ((\"key1\",), None),\n    ((\"key1\",), (\"key1\",)),\n)\n\nERROR_TEST_CASES = (\n    (\"key\", ValueError, \"All the filter_keys must be included..*\"),\n    (False, TypeError, \"'bool' object is not iterable\"),\n    (1, TypeError, \"'int' object is not iterable\"),\n)\n\n\nclass TestFilterObservation:\n    @pytest.mark.parametrize(\n        \"observation_keys,filter_keys\", FILTER_OBSERVATION_TEST_CASES\n    )\n    def test_filter_observation(self, observation_keys, filter_keys):\n        env = FakeEnvironment(observation_keys=observation_keys)\n\n        # Make sure we are testing the right environment for the test.\n        
observation_space = env.observation_space\n        assert isinstance(observation_space, spaces.Dict)\n\n        wrapped_env = FilterObservation(env, filter_keys=filter_keys)\n\n        assert isinstance(wrapped_env.observation_space, spaces.Dict)\n\n        if filter_keys is None:\n            filter_keys = tuple(observation_keys)\n\n        assert len(wrapped_env.observation_space.spaces) == len(filter_keys)\n        assert tuple(wrapped_env.observation_space.spaces.keys()) == tuple(filter_keys)\n\n        # Check that the added space item is consistent with the added observation.\n        observation, info = wrapped_env.reset()\n        assert len(observation) == len(filter_keys)\n        assert isinstance(info, dict)\n\n    @pytest.mark.parametrize(\"filter_keys,error_type,error_match\", ERROR_TEST_CASES)\n    def test_raises_with_incorrect_arguments(\n        self, filter_keys, error_type, error_match\n    ):\n        env = FakeEnvironment(observation_keys=(\"key1\", \"key2\"))\n\n        with pytest.raises(error_type, match=error_match):\n            FilterObservation(env, filter_keys=filter_keys)\n"
  },
  {
    "path": "tests/wrappers/test_flatten.py",
    "content": "\"\"\"Tests for the flatten observation wrapper.\"\"\"\n\nfrom collections import OrderedDict\nfrom typing import Optional\n\nimport numpy as np\nimport pytest\n\nimport gym\nfrom gym.spaces import Box, Dict, flatten, unflatten\nfrom gym.wrappers import FlattenObservation\n\n\nclass FakeEnvironment(gym.Env):\n    def __init__(self, observation_space):\n        self.observation_space = observation_space\n\n    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):\n        super().reset(seed=seed)\n        self.observation = self.observation_space.sample()\n        return self.observation, {}\n\n\nOBSERVATION_SPACES = (\n    (\n        Dict(\n            OrderedDict(\n                [\n                    (\"key1\", Box(shape=(2, 3), low=0, high=0, dtype=np.float32)),\n                    (\"key2\", Box(shape=(), low=1, high=1, dtype=np.float32)),\n                    (\"key3\", Box(shape=(2,), low=2, high=2, dtype=np.float32)),\n                ]\n            )\n        ),\n        True,\n    ),\n    (\n        Dict(\n            OrderedDict(\n                [\n                    (\"key2\", Box(shape=(), low=0, high=0, dtype=np.float32)),\n                    (\"key3\", Box(shape=(2,), low=1, high=1, dtype=np.float32)),\n                    (\"key1\", Box(shape=(2, 3), low=2, high=2, dtype=np.float32)),\n                ]\n            )\n        ),\n        True,\n    ),\n    (\n        Dict(\n            {\n                \"key1\": Box(shape=(2, 3), low=-1, high=1, dtype=np.float32),\n                \"key2\": Box(shape=(), low=-1, high=1, dtype=np.float32),\n                \"key3\": Box(shape=(2,), low=-1, high=1, dtype=np.float32),\n            }\n        ),\n        False,\n    ),\n)\n\n\nclass TestFlattenEnvironment:\n    @pytest.mark.parametrize(\"observation_space, ordered_values\", OBSERVATION_SPACES)\n    def test_flattened_environment(self, observation_space, ordered_values):\n        \"\"\"\n        make sure 
that flattened observations occur in the order expected\n        \"\"\"\n        env = FakeEnvironment(observation_space=observation_space)\n        wrapped_env = FlattenObservation(env)\n        flattened, info = wrapped_env.reset()\n\n        unflattened = unflatten(env.observation_space, flattened)\n        original = env.observation\n\n        self._check_observations(original, flattened, unflattened, ordered_values)\n\n    @pytest.mark.parametrize(\"observation_space, ordered_values\", OBSERVATION_SPACES)\n    def test_flatten_unflatten(self, observation_space, ordered_values):\n        \"\"\"\n        test flatten and unflatten functions directly\n        \"\"\"\n        original = observation_space.sample()\n\n        flattened = flatten(observation_space, original)\n        unflattened = unflatten(observation_space, flattened)\n\n        self._check_observations(original, flattened, unflattened, ordered_values)\n\n    def _check_observations(self, original, flattened, unflattened, ordered_values):\n        # make sure that unflatten(flatten(original)) == original\n        assert set(unflattened.keys()) == set(original.keys())\n        for k, v in original.items():\n            np.testing.assert_allclose(unflattened[k], v)\n\n        if ordered_values:\n            # make sure that the values were flattened in the order they appeared in the\n            # OrderedDict\n            np.testing.assert_allclose(sorted(flattened), flattened)\n"
  },
  {
    "path": "tests/wrappers/test_flatten_observation.py",
    "content": "import numpy as np\nimport pytest\n\nimport gym\nfrom gym import spaces\nfrom gym.wrappers import FlattenObservation\n\n\n@pytest.mark.parametrize(\"env_id\", [\"Blackjack-v1\"])\ndef test_flatten_observation(env_id):\n    env = gym.make(env_id, disable_env_checker=True)\n    wrapped_env = FlattenObservation(env)\n\n    obs, info = env.reset()\n    wrapped_obs, wrapped_obs_info = wrapped_env.reset()\n\n    space = spaces.Tuple((spaces.Discrete(32), spaces.Discrete(11), spaces.Discrete(2)))\n    wrapped_space = spaces.Box(0, 1, [32 + 11 + 2], dtype=np.int64)\n\n    assert space.contains(obs)\n    assert wrapped_space.contains(wrapped_obs)\n    assert isinstance(info, dict)\n    assert isinstance(wrapped_obs_info, dict)\n"
  },
  {
    "path": "tests/wrappers/test_frame_stack.py",
    "content": "import numpy as np\nimport pytest\n\nimport gym\nfrom gym.wrappers import FrameStack\n\ntry:\n    import lz4\nexcept ImportError:\n    lz4 = None\n\n\n@pytest.mark.parametrize(\"env_id\", [\"CartPole-v1\", \"Pendulum-v1\", \"CarRacing-v2\"])\n@pytest.mark.parametrize(\"num_stack\", [2, 3, 4])\n@pytest.mark.parametrize(\n    \"lz4_compress\",\n    [\n        pytest.param(\n            True,\n            marks=pytest.mark.skipif(\n                lz4 is None, reason=\"Need lz4 to run tests with compression\"\n            ),\n        ),\n        False,\n    ],\n)\ndef test_frame_stack(env_id, num_stack, lz4_compress):\n    env = gym.make(env_id, disable_env_checker=True)\n    shape = env.observation_space.shape\n    env = FrameStack(env, num_stack, lz4_compress)\n    assert env.observation_space.shape == (num_stack,) + shape\n    assert env.observation_space.dtype == env.env.observation_space.dtype\n\n    dup = gym.make(env_id, disable_env_checker=True)\n\n    obs, _ = env.reset(seed=0)\n    dup_obs, _ = dup.reset(seed=0)\n    assert np.allclose(obs[-1], dup_obs)\n\n    for _ in range(num_stack**2):\n        action = env.action_space.sample()\n        dup_obs, _, dup_terminated, dup_truncated, _ = dup.step(action)\n        obs, _, terminated, truncated, _ = env.step(action)\n\n        assert dup_terminated == terminated\n        assert dup_truncated == truncated\n        assert np.allclose(obs[-1], dup_obs)\n\n        if terminated or truncated:\n            break\n\n    assert len(obs) == num_stack\n"
  },
  {
    "path": "tests/wrappers/test_gray_scale_observation.py",
    "content": "import pytest\n\nimport gym\nfrom gym import spaces\nfrom gym.wrappers import GrayScaleObservation\n\n\n@pytest.mark.parametrize(\"env_id\", [\"CarRacing-v2\"])\n@pytest.mark.parametrize(\"keep_dim\", [True, False])\ndef test_gray_scale_observation(env_id, keep_dim):\n    rgb_env = gym.make(env_id, disable_env_checker=True)\n\n    assert isinstance(rgb_env.observation_space, spaces.Box)\n    assert len(rgb_env.observation_space.shape) == 3\n    assert rgb_env.observation_space.shape[-1] == 3\n\n    wrapped_env = GrayScaleObservation(rgb_env, keep_dim=keep_dim)\n    assert isinstance(wrapped_env.observation_space, spaces.Box)\n    if keep_dim:\n        assert len(wrapped_env.observation_space.shape) == 3\n        assert wrapped_env.observation_space.shape[-1] == 1\n    else:\n        assert len(wrapped_env.observation_space.shape) == 2\n\n    wrapped_obs, info = wrapped_env.reset()\n    assert wrapped_obs in wrapped_env.observation_space\n"
  },
  {
    "path": "tests/wrappers/test_human_rendering.py",
    "content": "import re\n\nimport pytest\n\nimport gym\nfrom gym.wrappers import HumanRendering\n\n\ndef test_human_rendering():\n    for mode in [\"rgb_array\", \"rgb_array_list\"]:\n        env = HumanRendering(\n            gym.make(\"CartPole-v1\", render_mode=mode, disable_env_checker=True)\n        )\n        assert env.render_mode == \"human\"\n        env.reset()\n\n        for _ in range(75):\n            _, _, terminated, truncated, _ = env.step(env.action_space.sample())\n            if terminated or truncated:\n                env.reset()\n\n        env.close()\n\n    env = gym.make(\"CartPole-v1\", render_mode=\"human\")\n    with pytest.raises(\n        AssertionError,\n        match=re.escape(\n            \"Expected env.render_mode to be one of 'rgb_array' or 'rgb_array_list' but got 'human'\"\n        ),\n    ):\n        HumanRendering(env)\n    env.close()\n"
  },
  {
    "path": "tests/wrappers/test_nested_dict.py",
    "content": "\"\"\"Tests for the filter observation wrapper.\"\"\"\nfrom typing import Optional\n\nimport numpy as np\nimport pytest\n\nimport gym\nfrom gym.spaces import Box, Dict, Tuple\nfrom gym.wrappers import FilterObservation, FlattenObservation\n\n\nclass FakeEnvironment(gym.Env):\n    def __init__(self, observation_space, render_mode=None):\n        self.observation_space = observation_space\n        self.obs_keys = self.observation_space.spaces.keys()\n        self.action_space = Box(shape=(1,), low=-1, high=1, dtype=np.float32)\n        self.render_mode = render_mode\n\n    def render(self, mode=\"human\"):\n        image_shape = (32, 32, 3)\n        return np.zeros(image_shape, dtype=np.uint8)\n\n    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):\n        super().reset(seed=seed)\n        observation = self.observation_space.sample()\n        return observation, {}\n\n    def step(self, action):\n        del action\n        observation = self.observation_space.sample()\n        reward, terminal, info = 0.0, False, {}\n        return observation, reward, terminal, info\n\n\nNESTED_DICT_TEST_CASES = (\n    (\n        Dict(\n            {\n                \"key1\": Box(shape=(2,), low=-1, high=1, dtype=np.float32),\n                \"key2\": Dict(\n                    {\n                        \"subkey1\": Box(shape=(2,), low=-1, high=1, dtype=np.float32),\n                        \"subkey2\": Box(shape=(2,), low=-1, high=1, dtype=np.float32),\n                    }\n                ),\n            }\n        ),\n        (6,),\n    ),\n    (\n        Dict(\n            {\n                \"key1\": Box(shape=(2, 3), low=-1, high=1, dtype=np.float32),\n                \"key2\": Box(shape=(), low=-1, high=1, dtype=np.float32),\n                \"key3\": Box(shape=(2,), low=-1, high=1, dtype=np.float32),\n            }\n        ),\n        (9,),\n    ),\n    (\n        Dict(\n            {\n                \"key1\": Tuple(\n 
                   (\n                        Box(shape=(2,), low=-1, high=1, dtype=np.float32),\n                        Box(shape=(2,), low=-1, high=1, dtype=np.float32),\n                    )\n                ),\n                \"key2\": Box(shape=(), low=-1, high=1, dtype=np.float32),\n                \"key3\": Box(shape=(2,), low=-1, high=1, dtype=np.float32),\n            }\n        ),\n        (7,),\n    ),\n    (\n        Dict(\n            {\n                \"key1\": Tuple((Box(shape=(2,), low=-1, high=1, dtype=np.float32),)),\n                \"key2\": Box(shape=(), low=-1, high=1, dtype=np.float32),\n                \"key3\": Box(shape=(2,), low=-1, high=1, dtype=np.float32),\n            }\n        ),\n        (5,),\n    ),\n    (\n        Dict(\n            {\n                \"key1\": Tuple(\n                    (Dict({\"key9\": Box(shape=(2,), low=-1, high=1, dtype=np.float32)}),)\n                ),\n                \"key2\": Box(shape=(), low=-1, high=1, dtype=np.float32),\n                \"key3\": Box(shape=(2,), low=-1, high=1, dtype=np.float32),\n            }\n        ),\n        (5,),\n    ),\n)\n\n\nclass TestNestedDictWrapper:\n    @pytest.mark.parametrize(\"observation_space, flat_shape\", NESTED_DICT_TEST_CASES)\n    def test_nested_dicts_size(self, observation_space, flat_shape):\n        env = FakeEnvironment(observation_space=observation_space)\n\n        # Make sure we are testing the right environment for the test.\n        observation_space = env.observation_space\n        assert isinstance(observation_space, Dict)\n\n        wrapped_env = FlattenObservation(FilterObservation(env, env.obs_keys))\n        assert wrapped_env.observation_space.shape == flat_shape\n\n        assert wrapped_env.observation_space.dtype == np.float32\n\n    @pytest.mark.parametrize(\"observation_space, flat_shape\", NESTED_DICT_TEST_CASES)\n    def test_nested_dicts_ravel(self, observation_space, flat_shape):\n        env = 
FakeEnvironment(observation_space=observation_space)\n        wrapped_env = FlattenObservation(FilterObservation(env, env.obs_keys))\n        obs, info = wrapped_env.reset()\n        assert obs.shape == wrapped_env.observation_space.shape\n        assert isinstance(info, dict)\n"
  },
  {
    "path": "tests/wrappers/test_normalize.py",
    "content": "from typing import Optional\n\nimport numpy as np\nfrom numpy.testing import assert_almost_equal\n\nimport gym\nfrom gym.wrappers.normalize import NormalizeObservation, NormalizeReward\n\n\nclass DummyRewardEnv(gym.Env):\n    metadata = {}\n\n    def __init__(self, return_reward_idx=0):\n        self.action_space = gym.spaces.Discrete(2)\n        self.observation_space = gym.spaces.Box(\n            low=np.array([-1.0]), high=np.array([1.0]), dtype=np.float64\n        )\n        self.returned_rewards = [0, 1, 2, 3, 4]\n        self.return_reward_idx = return_reward_idx\n        self.t = self.return_reward_idx\n\n    def step(self, action):\n        self.t += 1\n        return (\n            np.array([self.t]),\n            self.t,\n            self.t == len(self.returned_rewards),\n            False,\n            {},\n        )\n\n    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):\n        super().reset(seed=seed)\n        self.t = self.return_reward_idx\n        return np.array([self.t]), {}\n\n\ndef make_env(return_reward_idx):\n    def thunk():\n        env = DummyRewardEnv(return_reward_idx)\n        return env\n\n    return thunk\n\n\ndef test_normalize_observation():\n    env = DummyRewardEnv(return_reward_idx=0)\n    env = NormalizeObservation(env)\n    env.reset()\n    env.step(env.action_space.sample())\n    assert_almost_equal(env.obs_rms.mean, 0.5, decimal=4)\n    env.step(env.action_space.sample())\n    assert_almost_equal(env.obs_rms.mean, 1.0, decimal=4)\n\n\ndef test_normalize_reset_info():\n    env = DummyRewardEnv(return_reward_idx=0)\n    env = NormalizeObservation(env)\n    obs, info = env.reset()\n    assert isinstance(obs, np.ndarray)\n    assert isinstance(info, dict)\n\n\ndef test_normalize_return():\n    env = DummyRewardEnv(return_reward_idx=0)\n    env = NormalizeReward(env)\n    env.reset()\n    env.step(env.action_space.sample())\n    assert_almost_equal(\n        env.return_rms.mean,\n    
    np.mean([1]),  # [first return]\n        decimal=4,\n    )\n    env.step(env.action_space.sample())\n    assert_almost_equal(\n        env.return_rms.mean,\n        np.mean([2 + env.gamma * 1, 1]),  # [second return, first return]\n        decimal=4,\n    )\n\n\ndef test_normalize_observation_vector_env():\n    env_fns = [make_env(0), make_env(1)]\n    envs = gym.vector.SyncVectorEnv(env_fns)\n    envs.reset()\n    obs, reward, _, _, _ = envs.step(envs.action_space.sample())\n    np.testing.assert_almost_equal(obs, np.array([[1], [2]]), decimal=4)\n    np.testing.assert_almost_equal(reward, np.array([1, 2]), decimal=4)\n\n    env_fns = [make_env(0), make_env(1)]\n    envs = gym.vector.SyncVectorEnv(env_fns)\n    envs = NormalizeObservation(envs)\n    envs.reset()\n    assert_almost_equal(\n        envs.obs_rms.mean,\n        np.mean([0.5]),  # the mean of first observations [[0, 1]]\n        decimal=4,\n    )\n    obs, reward, _, _, _ = envs.step(envs.action_space.sample())\n    assert_almost_equal(\n        envs.obs_rms.mean,\n        np.mean([1.0]),  # the mean of first and second observations [[0, 1], [1, 2]]\n        decimal=4,\n    )\n\n\ndef test_normalize_return_vector_env():\n    env_fns = [make_env(0), make_env(1)]\n    envs = gym.vector.SyncVectorEnv(env_fns)\n    envs = NormalizeReward(envs)\n    obs = envs.reset()\n    obs, reward, _, _, _ = envs.step(envs.action_space.sample())\n    assert_almost_equal(\n        envs.return_rms.mean,\n        np.mean([1.5]),  # the mean of first returns [[1, 2]]\n        decimal=4,\n    )\n    obs, reward, _, _, _ = envs.step(envs.action_space.sample())\n    assert_almost_equal(\n        envs.return_rms.mean,\n        np.mean(\n            [[1, 2], [2 + envs.gamma * 1, 3 + envs.gamma * 2]]\n        ),  # the mean of first and second returns [[1, 2], [2 + envs.gamma * 1, 3 + envs.gamma * 2]]\n        decimal=4,\n    )\n"
  },
  {
    "path": "tests/wrappers/test_order_enforcing.py",
    "content": "import pytest\n\nimport gym\nfrom gym.envs.classic_control import CartPoleEnv\nfrom gym.error import ResetNeeded\nfrom gym.wrappers import OrderEnforcing\nfrom tests.envs.utils import all_testing_env_specs\nfrom tests.wrappers.utils import has_wrapper\n\n\n@pytest.mark.parametrize(\n    \"spec\", all_testing_env_specs, ids=[spec.id for spec in all_testing_env_specs]\n)\ndef test_gym_make_order_enforcing(spec):\n    \"\"\"Checks that gym.make wrappers the environment with the OrderEnforcing wrapper.\"\"\"\n    env = gym.make(spec.id, disable_env_checker=True)\n\n    assert has_wrapper(env, OrderEnforcing)\n\n\ndef test_order_enforcing():\n    \"\"\"Checks that the order enforcing works as expected, raising an error before reset is called and not after.\"\"\"\n    # The reason for not using gym.make is that all environments are by default wrapped in the order enforcing wrapper\n    env = CartPoleEnv(render_mode=\"rgb_array_list\")\n    assert not has_wrapper(env, OrderEnforcing)\n\n    # Assert that the order enforcing works for step and render before reset\n    order_enforced_env = OrderEnforcing(env)\n    assert order_enforced_env.has_reset is False\n    with pytest.raises(ResetNeeded):\n        order_enforced_env.step(0)\n    with pytest.raises(ResetNeeded):\n        order_enforced_env.render()\n    assert order_enforced_env.has_reset is False\n\n    # Assert that the Assertion errors are not raised after reset\n    order_enforced_env.reset()\n    assert order_enforced_env.has_reset is True\n    order_enforced_env.step(0)\n    order_enforced_env.render()\n\n    # Assert that with disable_render_order_enforcing works, the environment has already been reset\n    env = CartPoleEnv(render_mode=\"rgb_array_list\")\n    env = OrderEnforcing(env, disable_render_order_enforcing=True)\n    env.render()  # no assertion error\n"
  },
  {
    "path": "tests/wrappers/test_passive_env_checker.py",
    "content": "import re\nimport warnings\n\nimport numpy as np\nimport pytest\n\nimport gym\nfrom gym.wrappers.env_checker import PassiveEnvChecker\nfrom tests.envs.test_envs import PASSIVE_CHECK_IGNORE_WARNING\nfrom tests.envs.utils import all_testing_initialised_envs\nfrom tests.testing_env import GenericTestEnv\n\n\n@pytest.mark.parametrize(\n    \"env\",\n    all_testing_initialised_envs,\n    ids=[env.spec.id for env in all_testing_initialised_envs],\n)\ndef test_passive_checker_wrapper_warnings(env):\n    with warnings.catch_warnings(record=True) as caught_warnings:\n        checker_env = PassiveEnvChecker(env)\n        checker_env.reset()\n        checker_env.step(checker_env.action_space.sample())\n        # todo, add check for render, bugged due to mujoco v2/3 and v4 envs\n\n        checker_env.close()\n\n    for warning in caught_warnings:\n        if warning.message.args[0] not in PASSIVE_CHECK_IGNORE_WARNING:\n            raise gym.error.Error(f\"Unexpected warning: {warning.message}\")\n\n\n@pytest.mark.parametrize(\n    \"env, message\",\n    [\n        (\n            GenericTestEnv(action_space=None),\n            \"The environment must specify an action space. https://www.gymlibrary.dev/content/environment_creation/\",\n        ),\n        (\n            GenericTestEnv(action_space=\"error\"),\n            \"action space does not inherit from `gym.spaces.Space`, actual type: <class 'str'>\",\n        ),\n        (\n            GenericTestEnv(observation_space=None),\n            \"The environment must specify an observation space. 
https://www.gymlibrary.dev/content/environment_creation/\",\n        ),\n        (\n            GenericTestEnv(observation_space=\"error\"),\n            \"observation space does not inherit from `gym.spaces.Space`, actual type: <class 'str'>\",\n        ),\n    ],\n)\ndef test_initialise_failures(env, message):\n    with pytest.raises(AssertionError, match=f\"^{re.escape(message)}$\"):\n        PassiveEnvChecker(env)\n\n    env.close()\n\n\ndef _reset_failure(self, seed=None, options=None):\n    return np.array([-1.0], dtype=np.float32), {}\n\n\ndef _step_failure(self, action):\n    return \"error\"\n\n\ndef test_api_failures():\n    env = GenericTestEnv(\n        reset_fn=_reset_failure,\n        step_fn=_step_failure,\n        metadata={\"render_modes\": \"error\"},\n    )\n    env = PassiveEnvChecker(env)\n    assert env.checked_reset is False\n    assert env.checked_step is False\n    assert env.checked_render is False\n\n    with pytest.warns(\n        UserWarning,\n        match=re.escape(\n            \"The obs returned by the `reset()` method is not within the observation space\"\n        ),\n    ):\n        env.reset()\n    assert env.checked_reset\n\n    with pytest.raises(\n        AssertionError,\n        match=\"Expects step result to be a tuple, actual type: <class 'str'>\",\n    ):\n        env.step(env.action_space.sample())\n    assert env.checked_step\n\n    with pytest.warns(\n        UserWarning,\n        match=r\"Expects the render_modes to be a sequence \\(i\\.e\\. list, tuple\\), actual type: <class 'str'>\",\n    ):\n        env.render()\n    assert env.checked_render\n\n    env.close()\n"
  },
  {
    "path": "tests/wrappers/test_pixel_observation.py",
    "content": "\"\"\"Tests for the pixel observation wrapper.\"\"\"\nfrom typing import Optional\n\nimport numpy as np\nimport pytest\n\nimport gym\nfrom gym import spaces\nfrom gym.wrappers.pixel_observation import STATE_KEY, PixelObservationWrapper\n\n\nclass FakeEnvironment(gym.Env):\n    def __init__(self, render_mode=\"single_rgb_array\"):\n        self.action_space = spaces.Box(shape=(1,), low=-1, high=1, dtype=np.float32)\n        self.render_mode = render_mode\n\n    def render(self, mode=\"human\", width=32, height=32):\n        image_shape = (height, width, 3)\n        return np.zeros(image_shape, dtype=np.uint8)\n\n    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):\n        super().reset(seed=seed)\n        observation = self.observation_space.sample()\n        return observation, {}\n\n    def step(self, action):\n        del action\n        observation = self.observation_space.sample()\n        reward, terminal, info = 0.0, False, {}\n        return observation, reward, terminal, info\n\n\nclass FakeArrayObservationEnvironment(FakeEnvironment):\n    def __init__(self, *args, **kwargs):\n        self.observation_space = spaces.Box(\n            shape=(2,), low=-1, high=1, dtype=np.float32\n        )\n        super().__init__(*args, **kwargs)\n\n\nclass FakeDictObservationEnvironment(FakeEnvironment):\n    def __init__(self, *args, **kwargs):\n        self.observation_space = spaces.Dict(\n            {\n                \"state\": spaces.Box(shape=(2,), low=-1, high=1, dtype=np.float32),\n            }\n        )\n        super().__init__(*args, **kwargs)\n\n\n@pytest.mark.parametrize(\"pixels_only\", (True, False))\ndef test_dict_observation(pixels_only):\n    pixel_key = \"rgb\"\n\n    env = FakeDictObservationEnvironment()\n\n    # Make sure we are testing the right environment for the test.\n    observation_space = env.observation_space\n    assert isinstance(observation_space, spaces.Dict)\n\n    width, height = 
(320, 240)\n\n    # The wrapper should only add one observation.\n    wrapped_env = PixelObservationWrapper(\n        env,\n        pixel_keys=(pixel_key,),\n        pixels_only=pixels_only,\n        render_kwargs={pixel_key: {\"width\": width, \"height\": height}},\n    )\n\n    assert isinstance(wrapped_env.observation_space, spaces.Dict)\n\n    if pixels_only:\n        assert len(wrapped_env.observation_space.spaces) == 1\n        assert list(wrapped_env.observation_space.spaces.keys()) == [pixel_key]\n    else:\n        assert (\n            len(wrapped_env.observation_space.spaces)\n            == len(observation_space.spaces) + 1\n        )\n        expected_keys = list(observation_space.spaces.keys()) + [pixel_key]\n        assert list(wrapped_env.observation_space.spaces.keys()) == expected_keys\n\n    # Check that the added space item is consistent with the added observation.\n    observation, info = wrapped_env.reset()\n    rgb_observation = observation[pixel_key]\n\n    assert isinstance(info, dict)\n    assert rgb_observation.shape == (height, width, 3)\n    assert rgb_observation.dtype == np.uint8\n\n\n@pytest.mark.parametrize(\"pixels_only\", (True, False))\ndef test_single_array_observation(pixels_only):\n    pixel_key = \"depth\"\n\n    env = FakeArrayObservationEnvironment()\n    observation_space = env.observation_space\n    assert isinstance(observation_space, spaces.Box)\n\n    wrapped_env = PixelObservationWrapper(\n        env, pixel_keys=(pixel_key,), pixels_only=pixels_only\n    )\n    wrapped_env.observation_space = wrapped_env.observation_space\n    assert isinstance(wrapped_env.observation_space, spaces.Dict)\n\n    if pixels_only:\n        assert len(wrapped_env.observation_space.spaces) == 1\n        assert list(wrapped_env.observation_space.spaces.keys()) == [pixel_key]\n    else:\n        assert len(wrapped_env.observation_space.spaces) == 2\n        assert list(wrapped_env.observation_space.spaces.keys()) == [\n            
STATE_KEY,\n            pixel_key,\n        ]\n\n    observation, info = wrapped_env.reset()\n    depth_observation = observation[pixel_key]\n\n    assert isinstance(info, dict)\n    assert depth_observation.shape == (32, 32, 3)\n    assert depth_observation.dtype == np.uint8\n\n    if not pixels_only:\n        assert isinstance(observation[STATE_KEY], np.ndarray)\n"
  },
  {
    "path": "tests/wrappers/test_record_episode_statistics.py",
    "content": "import numpy as np\nimport pytest\n\nimport gym\nfrom gym.wrappers import RecordEpisodeStatistics, VectorListInfo\nfrom gym.wrappers.record_episode_statistics import add_vector_episode_statistics\n\n\n@pytest.mark.parametrize(\"env_id\", [\"CartPole-v1\", \"Pendulum-v1\"])\n@pytest.mark.parametrize(\"deque_size\", [2, 5])\ndef test_record_episode_statistics(env_id, deque_size):\n    env = gym.make(env_id, disable_env_checker=True)\n    env = RecordEpisodeStatistics(env, deque_size)\n\n    for n in range(5):\n        env.reset()\n        assert env.episode_returns is not None and env.episode_lengths is not None\n        assert env.episode_returns[0] == 0.0\n        assert env.episode_lengths[0] == 0\n        for t in range(env.spec.max_episode_steps):\n            _, _, terminated, truncated, info = env.step(env.action_space.sample())\n            if terminated or truncated:\n                assert \"episode\" in info\n                assert all([item in info[\"episode\"] for item in [\"r\", \"l\", \"t\"]])\n                break\n    assert len(env.return_queue) == deque_size\n    assert len(env.length_queue) == deque_size\n\n\ndef test_record_episode_statistics_reset_info():\n    env = gym.make(\"CartPole-v1\", disable_env_checker=True)\n    env = RecordEpisodeStatistics(env)\n    ob_space = env.observation_space\n    obs, info = env.reset()\n    assert ob_space.contains(obs)\n    assert isinstance(info, dict)\n\n\n@pytest.mark.parametrize(\n    (\"num_envs\", \"asynchronous\"), [(1, False), (1, True), (4, False), (4, True)]\n)\ndef test_record_episode_statistics_with_vectorenv(num_envs, asynchronous):\n    envs = gym.vector.make(\n        \"CartPole-v1\",\n        render_mode=None,\n        num_envs=num_envs,\n        asynchronous=asynchronous,\n        disable_env_checker=True,\n    )\n    envs = RecordEpisodeStatistics(envs)\n    max_episode_step = (\n        envs.env_fns[0]().spec.max_episode_steps\n        if asynchronous\n        else 
envs.env.envs[0].spec.max_episode_steps\n    )\n    envs.reset()\n    for _ in range(max_episode_step + 1):\n        _, _, terminateds, truncateds, infos = envs.step(envs.action_space.sample())\n        if any(terminateds) or any(truncateds):\n            assert \"episode\" in infos\n            assert \"_episode\" in infos\n            assert all(infos[\"_episode\"] == np.bitwise_or(terminateds, truncateds))\n            assert all([item in infos[\"episode\"] for item in [\"r\", \"l\", \"t\"]])\n            break\n        else:\n            assert \"episode\" not in infos\n            assert \"_episode\" not in infos\n\n\ndef test_wrong_wrapping_order():\n    envs = gym.vector.make(\"CartPole-v1\", num_envs=3, disable_env_checker=True)\n    wrapped_env = RecordEpisodeStatistics(VectorListInfo(envs))\n    wrapped_env.reset()\n\n    with pytest.raises(AssertionError):\n        wrapped_env.step(wrapped_env.action_space.sample())\n\n\ndef test_add_vector_episode_statistics():\n    NUM_ENVS = 5\n\n    info = {}\n    for i in range(NUM_ENVS):\n        episode_info = {\n            \"episode\": {\n                \"r\": i,\n                \"l\": i,\n                \"t\": i,\n            }\n        }\n        info = add_vector_episode_statistics(info, episode_info[\"episode\"], NUM_ENVS, i)\n        # np.all is used instead of the deprecated np.alltrue alias\n        assert np.all(info[\"_episode\"][: i + 1])\n\n        for j in range(NUM_ENVS):\n            if j <= i:\n                assert info[\"episode\"][\"r\"][j] == j\n                assert info[\"episode\"][\"l\"][j] == j\n                assert info[\"episode\"][\"t\"][j] == j\n            else:\n                assert info[\"episode\"][\"r\"][j] == 0\n                assert info[\"episode\"][\"l\"][j] == 0\n                assert info[\"episode\"][\"t\"][j] == 0\n"
  },
  {
    "path": "tests/wrappers/test_record_video.py",
    "content": "import os\nimport shutil\n\nimport gym\nfrom gym.wrappers import capped_cubic_video_schedule\n\n\ndef test_record_video_using_default_trigger():\n    env = gym.make(\n        \"CartPole-v1\", render_mode=\"rgb_array_list\", disable_env_checker=True\n    )\n    env = gym.wrappers.RecordVideo(env, \"videos\")\n    env.reset()\n    for _ in range(199):\n        action = env.action_space.sample()\n        _, _, terminated, truncated, _ = env.step(action)\n        if terminated or truncated:\n            env.reset()\n    env.close()\n    assert os.path.isdir(\"videos\")\n    mp4_files = [file for file in os.listdir(\"videos\") if file.endswith(\".mp4\")]\n    assert len(mp4_files) == sum(\n        capped_cubic_video_schedule(i) for i in range(env.episode_id + 1)\n    )\n    shutil.rmtree(\"videos\")\n\n\ndef test_record_video_reset():\n    env = gym.make(\"CartPole-v1\", render_mode=\"rgb_array\", disable_env_checker=True)\n    env = gym.wrappers.RecordVideo(env, \"videos\", step_trigger=lambda x: x % 100 == 0)\n    ob_space = env.observation_space\n    obs, info = env.reset()\n    env.close()\n    assert os.path.isdir(\"videos\")\n    shutil.rmtree(\"videos\")\n    assert ob_space.contains(obs)\n    assert isinstance(info, dict)\n\n\ndef test_record_video_step_trigger():\n    env = gym.make(\"CartPole-v1\", render_mode=\"rgb_array\", disable_env_checker=True)\n    env._max_episode_steps = 20\n    env = gym.wrappers.RecordVideo(env, \"videos\", step_trigger=lambda x: x % 100 == 0)\n    env.reset()\n    for _ in range(199):\n        action = env.action_space.sample()\n        _, _, terminated, truncated, _ = env.step(action)\n        if terminated or truncated:\n            env.reset()\n    env.close()\n    assert os.path.isdir(\"videos\")\n    mp4_files = [file for file in os.listdir(\"videos\") if file.endswith(\".mp4\")]\n    assert len(mp4_files) == 2\n    shutil.rmtree(\"videos\")\n\n\ndef make_env(gym_id, seed, **kwargs):\n    def thunk():\n        
env = gym.make(gym_id, disable_env_checker=True, **kwargs)\n        env._max_episode_steps = 20\n        if seed == 1:\n            env = gym.wrappers.RecordVideo(\n                env, \"videos\", step_trigger=lambda x: x % 100 == 0\n            )\n        return env\n\n    return thunk\n\n\ndef test_record_video_within_vector():\n    envs = gym.vector.SyncVectorEnv(\n        [make_env(\"CartPole-v1\", 1 + i, render_mode=\"rgb_array\") for i in range(2)]\n    )\n    envs = gym.wrappers.RecordEpisodeStatistics(envs)\n    envs.reset()\n    for _ in range(199):\n        _, _, _, _, infos = envs.step(envs.action_space.sample())\n\n        # print the episode returns on steps where every env finished an episode\n        if \"episode\" in infos and all(infos[\"_episode\"]):\n            print(f\"episode_reward={infos['episode']['r']}\")\n\n    assert os.path.isdir(\"videos\")\n    mp4_files = [file for file in os.listdir(\"videos\") if file.endswith(\".mp4\")]\n    assert len(mp4_files) == 2\n    shutil.rmtree(\"videos\")\n"
  },
  {
    "path": "tests/wrappers/test_rescale_action.py",
    "content": "import numpy as np\nimport pytest\n\nimport gym\nfrom gym.wrappers import RescaleAction\n\n\ndef test_rescale_action():\n    env = gym.make(\"CartPole-v1\", disable_env_checker=True)\n    with pytest.raises(AssertionError):\n        env = RescaleAction(env, -1, 1)\n    del env\n\n    env = gym.make(\"Pendulum-v1\", disable_env_checker=True)\n    wrapped_env = RescaleAction(\n        gym.make(\"Pendulum-v1\", disable_env_checker=True), -1, 1\n    )\n\n    seed = 0\n\n    obs, info = env.reset(seed=seed)\n    wrapped_obs, wrapped_obs_info = wrapped_env.reset(seed=seed)\n    assert np.allclose(obs, wrapped_obs)\n\n    obs, reward, _, _, _ = env.step([1.5])\n    with pytest.raises(AssertionError):\n        wrapped_env.step([1.5])\n    wrapped_obs, wrapped_reward, _, _, _ = wrapped_env.step([0.75])\n\n    assert np.allclose(obs, wrapped_obs)\n    assert np.allclose(reward, wrapped_reward)\n"
  },
  {
    "path": "tests/wrappers/test_resize_observation.py",
    "content": "import pytest\n\nimport gym\nfrom gym import spaces\nfrom gym.wrappers import ResizeObservation\n\n\n@pytest.mark.parametrize(\"env_id\", [\"CarRacing-v2\"])\n@pytest.mark.parametrize(\"shape\", [16, 32, (8, 5), [10, 7]])\ndef test_resize_observation(env_id, shape):\n    env = gym.make(env_id, disable_env_checker=True)\n    env = ResizeObservation(env, shape)\n\n    assert isinstance(env.observation_space, spaces.Box)\n    assert env.observation_space.shape[-1] == 3\n    obs, _ = env.reset()\n    if isinstance(shape, int):\n        assert env.observation_space.shape[:2] == (shape, shape)\n        assert obs.shape == (shape, shape, 3)\n    else:\n        assert env.observation_space.shape[:2] == tuple(shape)\n        assert obs.shape == tuple(shape) + (3,)\n"
  },
  {
    "path": "tests/wrappers/test_step_compatibility.py",
    "content": "import pytest\n\nimport gym\nfrom gym.spaces import Discrete\nfrom gym.wrappers import StepAPICompatibility\n\n\nclass OldStepEnv(gym.Env):\n    def __init__(self):\n        self.action_space = Discrete(2)\n        self.observation_space = Discrete(2)\n\n    def step(self, action):\n        obs = self.observation_space.sample()\n        rew = 0\n        done = False\n        info = {}\n        return obs, rew, done, info\n\n\nclass NewStepEnv(gym.Env):\n    def __init__(self):\n        self.action_space = Discrete(2)\n        self.observation_space = Discrete(2)\n\n    def step(self, action):\n        obs = self.observation_space.sample()\n        rew = 0\n        terminated = False\n        truncated = False\n        info = {}\n        return obs, rew, terminated, truncated, info\n\n\n@pytest.mark.parametrize(\"env\", [OldStepEnv, NewStepEnv])\n@pytest.mark.parametrize(\"output_truncation_bool\", [None, True])\ndef test_step_compatibility_to_new_api(env, output_truncation_bool):\n    if output_truncation_bool is None:\n        env = StepAPICompatibility(env())\n    else:\n        env = StepAPICompatibility(env(), output_truncation_bool)\n    step_returns = env.step(0)\n    _, _, terminated, truncated, _ = step_returns\n    assert isinstance(terminated, bool)\n    assert isinstance(truncated, bool)\n\n\n@pytest.mark.parametrize(\"env\", [OldStepEnv, NewStepEnv])\ndef test_step_compatibility_to_old_api(env):\n    env = StepAPICompatibility(env(), False)\n    step_returns = env.step(0)\n    assert len(step_returns) == 4\n    _, _, done, _ = step_returns\n    assert isinstance(done, bool)\n\n\n@pytest.mark.parametrize(\"apply_api_compatibility\", [None, True, False])\ndef test_step_compatibility_in_make(apply_api_compatibility):\n    gym.register(\"OldStepEnv-v0\", entry_point=OldStepEnv)\n\n    if apply_api_compatibility is not None:\n        env = gym.make(\n            \"OldStepEnv-v0\",\n            
apply_api_compatibility=apply_api_compatibility,\n            disable_env_checker=True,\n        )\n    else:\n        env = gym.make(\"OldStepEnv-v0\", disable_env_checker=True)\n\n    env.reset()\n    step_returns = env.step(0)\n    if apply_api_compatibility:\n        assert len(step_returns) == 5\n        _, _, terminated, truncated, _ = step_returns\n        assert isinstance(terminated, bool)\n        assert isinstance(truncated, bool)\n    else:\n        assert len(step_returns) == 4\n        _, _, done, _ = step_returns\n        assert isinstance(done, bool)\n\n    gym.envs.registry.pop(\"OldStepEnv-v0\")\n"
  },
  {
    "path": "tests/wrappers/test_time_aware_observation.py",
    "content": "import pytest\n\nimport gym\nfrom gym import spaces\nfrom gym.wrappers import TimeAwareObservation\n\n\n@pytest.mark.parametrize(\"env_id\", [\"CartPole-v1\", \"Pendulum-v1\"])\ndef test_time_aware_observation(env_id):\n    env = gym.make(env_id, disable_env_checker=True)\n    wrapped_env = TimeAwareObservation(env)\n\n    assert isinstance(env.observation_space, spaces.Box)\n    assert isinstance(wrapped_env.observation_space, spaces.Box)\n    assert wrapped_env.observation_space.shape[0] == env.observation_space.shape[0] + 1\n\n    obs, info = env.reset()\n    wrapped_obs, wrapped_obs_info = wrapped_env.reset()\n    assert wrapped_env.t == 0.0\n    assert wrapped_obs[-1] == 0.0\n    assert wrapped_obs.shape[0] == obs.shape[0] + 1\n\n    wrapped_obs, _, _, _, _ = wrapped_env.step(env.action_space.sample())\n    assert wrapped_env.t == 1.0\n    assert wrapped_obs[-1] == 1.0\n    assert wrapped_obs.shape[0] == obs.shape[0] + 1\n\n    wrapped_obs, _, _, _, _ = wrapped_env.step(env.action_space.sample())\n    assert wrapped_env.t == 2.0\n    assert wrapped_obs[-1] == 2.0\n    assert wrapped_obs.shape[0] == obs.shape[0] + 1\n\n    wrapped_obs, wrapped_obs_info = wrapped_env.reset()\n    assert wrapped_env.t == 0.0\n    assert wrapped_obs[-1] == 0.0\n    assert wrapped_obs.shape[0] == obs.shape[0] + 1\n"
  },
  {
    "path": "tests/wrappers/test_time_limit.py",
    "content": "import pytest\n\nimport gym\nfrom gym.envs.classic_control.pendulum import PendulumEnv\nfrom gym.wrappers import TimeLimit\n\n\ndef test_time_limit_reset_info():\n    env = gym.make(\"CartPole-v1\", disable_env_checker=True)\n    env = TimeLimit(env)\n    ob_space = env.observation_space\n    obs, info = env.reset()\n    assert ob_space.contains(obs)\n    assert isinstance(info, dict)\n\n\n@pytest.mark.parametrize(\"double_wrap\", [False, True])\ndef test_time_limit_wrapper(double_wrap):\n    # The pendulum env does not terminate by default\n    # so we are sure termination is only due to timeout\n    env = PendulumEnv()\n    max_episode_length = 20\n    env = TimeLimit(env, max_episode_length)\n    if double_wrap:\n        env = TimeLimit(env, max_episode_length)\n    env.reset()\n    terminated, truncated = False, False\n    n_steps = 0\n    info = {}\n    while not (terminated or truncated):\n        n_steps += 1\n        _, _, terminated, truncated, info = env.step(env.action_space.sample())\n\n    assert n_steps == max_episode_length\n    assert truncated\n\n\n@pytest.mark.parametrize(\"double_wrap\", [False, True])\ndef test_termination_on_last_step(double_wrap):\n    # Special case: termination at the last timestep\n    # Truncation due to timeout also happens at the same step\n\n    env = PendulumEnv()\n\n    def patched_step(_action):\n        return env.observation_space.sample(), 0.0, True, False, {}\n\n    env.step = patched_step\n\n    max_episode_length = 1\n    env = TimeLimit(env, max_episode_length)\n    if double_wrap:\n        env = TimeLimit(env, max_episode_length)\n    env.reset()\n    _, _, terminated, truncated, _ = env.step(env.action_space.sample())\n    assert terminated is True\n    assert truncated is True\n"
  },
  {
    "path": "tests/wrappers/test_transform_observation.py",
    "content": "import numpy as np\nimport pytest\n\nimport gym\nfrom gym.wrappers import TransformObservation\n\n\n@pytest.mark.parametrize(\"env_id\", [\"CartPole-v1\", \"Pendulum-v1\"])\ndef test_transform_observation(env_id):\n    def affine_transform(x):\n        return 3 * x + 2\n\n    env = gym.make(env_id, disable_env_checker=True)\n    wrapped_env = TransformObservation(\n        gym.make(env_id, disable_env_checker=True), lambda obs: affine_transform(obs)\n    )\n\n    obs, info = env.reset(seed=0)\n    wrapped_obs, wrapped_obs_info = wrapped_env.reset(seed=0)\n    assert np.allclose(wrapped_obs, affine_transform(obs))\n    assert isinstance(wrapped_obs_info, dict)\n\n    action = env.action_space.sample()\n    obs, reward, terminated, truncated, _ = env.step(action)\n    (\n        wrapped_obs,\n        wrapped_reward,\n        wrapped_terminated,\n        wrapped_truncated,\n        _,\n    ) = wrapped_env.step(action)\n    assert np.allclose(wrapped_obs, affine_transform(obs))\n    assert np.allclose(wrapped_reward, reward)\n    assert wrapped_terminated == terminated\n    assert wrapped_truncated == truncated\n"
  },
  {
    "path": "tests/wrappers/test_transform_reward.py",
    "content": "import numpy as np\nimport pytest\n\nimport gym\nfrom gym.wrappers import TransformReward\n\n\n@pytest.mark.parametrize(\"env_id\", [\"CartPole-v1\", \"Pendulum-v1\"])\ndef test_transform_reward(env_id):\n    # use case #1: scale\n    scales = [0.1, 200]\n    for scale in scales:\n        env = gym.make(env_id, disable_env_checker=True)\n        wrapped_env = TransformReward(\n            gym.make(env_id, disable_env_checker=True), lambda r: scale * r\n        )\n        action = env.action_space.sample()\n\n        env.reset(seed=0)\n        wrapped_env.reset(seed=0)\n\n        _, reward, _, _, _ = env.step(action)\n        _, wrapped_reward, _, _, _ = wrapped_env.step(action)\n\n        assert wrapped_reward == scale * reward\n        del env, wrapped_env\n\n    # use case #2: clip\n    min_r = -0.0005\n    max_r = 0.0002\n    env = gym.make(env_id, disable_env_checker=True)\n    wrapped_env = TransformReward(\n        gym.make(env_id, disable_env_checker=True), lambda r: np.clip(r, min_r, max_r)\n    )\n    action = env.action_space.sample()\n\n    env.reset(seed=0)\n    wrapped_env.reset(seed=0)\n\n    _, reward, _, _, _ = env.step(action)\n    _, wrapped_reward, _, _, _ = wrapped_env.step(action)\n\n    assert abs(wrapped_reward) < abs(reward)\n    assert wrapped_reward == -0.0005 or wrapped_reward == 0.0002\n    del env, wrapped_env\n\n    # use case #3: sign\n    env = gym.make(env_id, disable_env_checker=True)\n    wrapped_env = TransformReward(\n        gym.make(env_id, disable_env_checker=True), lambda r: np.sign(r)\n    )\n\n    env.reset(seed=0)\n    wrapped_env.reset(seed=0)\n\n    for _ in range(1000):\n        action = env.action_space.sample()\n        _, wrapped_reward, terminated, truncated, _ = wrapped_env.step(action)\n        assert wrapped_reward in [-1.0, 0.0, 1.0]\n        if terminated or truncated:\n            break\n    del env, wrapped_env\n"
  },
  {
    "path": "tests/wrappers/test_vector_list_info.py",
    "content": "import pytest\n\nimport gym\nfrom gym.wrappers import RecordEpisodeStatistics, VectorListInfo\n\nENV_ID = \"CartPole-v1\"\nNUM_ENVS = 3\nENV_STEPS = 50\nSEED = 42\n\n\ndef test_usage_in_vector_env():\n    env = gym.make(ENV_ID, disable_env_checker=True)\n    vector_env = gym.vector.make(ENV_ID, num_envs=NUM_ENVS, disable_env_checker=True)\n\n    VectorListInfo(vector_env)\n\n    with pytest.raises(AssertionError):\n        VectorListInfo(env)\n\n\ndef test_info_to_list():\n    env_to_wrap = gym.vector.make(ENV_ID, num_envs=NUM_ENVS, disable_env_checker=True)\n    wrapped_env = VectorListInfo(env_to_wrap)\n    wrapped_env.action_space.seed(SEED)\n    _, info = wrapped_env.reset(seed=SEED)\n    assert isinstance(info, list)\n    assert len(info) == NUM_ENVS\n\n    for _ in range(ENV_STEPS):\n        action = wrapped_env.action_space.sample()\n        _, _, terminateds, truncateds, list_info = wrapped_env.step(action)\n        for i, (terminated, truncated) in enumerate(zip(terminateds, truncateds)):\n            if terminated or truncated:\n                assert \"final_observation\" in list_info[i]\n            else:\n                assert \"final_observation\" not in list_info[i]\n\n\ndef test_info_to_list_statistics():\n    env_to_wrap = gym.vector.make(ENV_ID, num_envs=NUM_ENVS, disable_env_checker=True)\n    wrapped_env = VectorListInfo(RecordEpisodeStatistics(env_to_wrap))\n    _, info = wrapped_env.reset(seed=SEED)\n    wrapped_env.action_space.seed(SEED)\n    assert isinstance(info, list)\n    assert len(info) == NUM_ENVS\n\n    for _ in range(ENV_STEPS):\n        action = wrapped_env.action_space.sample()\n        _, _, terminateds, truncateds, list_info = wrapped_env.step(action)\n        for i, (terminated, truncated) in enumerate(zip(terminateds, truncateds)):\n            if terminated or truncated:\n                assert \"episode\" in list_info[i]\n                for stats in [\"r\", \"l\", \"t\"]:\n                    assert stats 
in list_info[i][\"episode\"]\n                    assert isinstance(list_info[i][\"episode\"][stats], float)\n            else:\n                assert \"episode\" not in list_info[i]\n"
  },
  {
    "path": "tests/wrappers/test_video_recorder.py",
    "content": "import gc\nimport os\nimport re\nimport time\n\nimport pytest\n\nimport gym\nfrom gym.wrappers.monitoring.video_recorder import VideoRecorder\n\n\nclass BrokenRecordableEnv(gym.Env):\n    metadata = {\"render_modes\": [\"rgb_array_list\"]}\n\n    def __init__(self, render_mode=\"rgb_array_list\"):\n        self.render_mode = render_mode\n\n    def render(self):\n        pass\n\n\nclass UnrecordableEnv(gym.Env):\n    metadata = {\"render_modes\": [None]}\n\n    def __init__(self, render_mode=None):\n        self.render_mode = render_mode\n\n    def render(self):\n        pass\n\n\ndef test_record_simple():\n    env = gym.make(\n        \"CartPole-v1\", render_mode=\"rgb_array_list\", disable_env_checker=True\n    )\n    rec = VideoRecorder(env)\n    env.reset()\n    rec.capture_frame()\n\n    rec.close()\n\n    assert not rec.broken\n    assert os.path.exists(rec.path)\n    # Check the file size without leaving an open file handle behind\n    assert os.path.getsize(rec.path) > 100\n\n\ndef test_autoclose():\n    def record():\n        env = gym.make(\n            \"CartPole-v1\", render_mode=\"rgb_array_list\", disable_env_checker=True\n        )\n        rec = VideoRecorder(env)\n        env.reset()\n        rec.capture_frame()\n\n        rec_path = rec.path\n\n        # The function ends without an explicit `rec.close()` call\n        # The Python interpreter will implicitly do `del rec` on garbage collection\n        return rec_path\n\n    rec_path = record()\n\n    gc.collect()  # do explicit garbage collection for test\n    time.sleep(5)  # wait for the subprocess to exit\n\n    assert os.path.exists(rec_path)\n    # Check the file size without leaving an open file handle behind\n    assert os.path.getsize(rec_path) > 100\n\n\ndef test_no_frames():\n    env = BrokenRecordableEnv()\n    rec = VideoRecorder(env)\n    rec.close()\n    assert rec.functional\n    assert not os.path.exists(rec.path)\n\n\ndef test_record_unrecordable_method():\n    with pytest.warns(\n        UserWarning,\n        match=re.escape(\n            
\"\\x1b[33mWARN: Disabling video recorder because environment <UnrecordableEnv instance> was not initialized with any compatible video mode between `rgb_array` and `rgb_array_list`\\x1b[0m\"\n        ),\n    ):\n        env = UnrecordableEnv()\n        rec = VideoRecorder(env)\n        assert not rec.enabled\n        rec.close()\n\n\ndef test_record_breaking_render_method():\n    with pytest.warns(\n        UserWarning,\n        match=re.escape(\n            \"Env returned None on `render()`. Disabling further rendering for video recorder by marking as disabled:\"\n        ),\n    ):\n        env = BrokenRecordableEnv()\n        rec = VideoRecorder(env)\n        rec.capture_frame()\n        rec.close()\n        assert rec.broken\n        assert not os.path.exists(rec.path)\n\n\ndef test_text_envs():\n    env = gym.make(\n        \"FrozenLake-v1\", render_mode=\"rgb_array_list\", disable_env_checker=True\n    )\n    video = VideoRecorder(env)\n    try:\n        env.reset()\n        video.capture_frame()\n        video.close()\n    finally:\n        os.remove(video.path)\n"
  },
  {
    "path": "tests/wrappers/utils.py",
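    "content": "import gym\n\n\ndef has_wrapper(wrapped_env: gym.Env, wrapper_type: type) -> bool:\n    \"\"\"Returns True if any wrapper around the environment is an instance of `wrapper_type`.\"\"\"\n    # Walk down the wrapper stack until the unwrapped base environment is reached\n    while isinstance(wrapped_env, gym.Wrapper):\n        if isinstance(wrapped_env, wrapper_type):\n            return True\n        wrapped_env = wrapped_env.env\n    return False\n"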
  }
]