[
  {
    "path": ".gitignore",
    "content": ".*\n!.gitignore\n\n__pycache__/\n*.egg-info\n\nsandbox/\nlog/"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2022 jurgisp\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "**Status:** Stable release\n\n[![PyPI](https://img.shields.io/pypi/v/memory-maze.svg)](https://pypi.python.org/pypi/memory-maze/#history)\n\n# Memory Maze\n\nMemory Maze is a 3D domain of randomized mazes designed for evaluating the long-term memory abilities of RL agents. Memory Maze isolates long-term memory from confounding challenges, such as exploration, and requires remembering several pieces of information: the positions of objects, the wall layout, and keeping track of agent’s own position.\n\n| Memory 9x9 | Memory 11x11 | Memory 13x13 | Memory 15x15 |\n|------------|--------------|--------------|--------------|\n| ![map-9x9](https://user-images.githubusercontent.com/3135115/177040204-fbf3b558-d063-49d3-9973-ae113137782f.png) | ![map-11x11](https://user-images.githubusercontent.com/3135115/177040184-16ccb614-b897-44db-ab2c-7ae66e14c007.png) | ![map-13x13](https://user-images.githubusercontent.com/3135115/177040164-d3edb11f-de6a-4c17-bce2-38e539639f40.png) | ![map-15x15](https://user-images.githubusercontent.com/3135115/177040126-b9a0f861-b15b-492c-9216-89502e8f8ae9.png) |\n\nKey features:\n- Online RL memory tasks (with baselines)\n- Offline dataset for representation learning (with baselines)\n- Verified that memory is the key challenge\n- Challenging but solvable by human baseline\n- Easy installation via a simple pip command\n- Available `gym` and `dm_env` interfaces\n- Supports headless and hardware rendering\n- Interactive GUI for human players\n- Hidden state information for probe evaluation\n\nAlso see the accompanying research paper: [Evaluating Long-Term Memory in 3D Mazes](https://arxiv.org/abs/2210.13383)\n\n```\n@article{pasukonis2022memmaze,\n  title={Evaluating Long-Term Memory in 3D Mazes},\n  author={Pasukonis, Jurgis and Lillicrap, Timothy and Hafner, Danijar},\n  journal={arXiv preprint arXiv:2210.13383},\n  year={2022}\n}\n```\n\n## Installation\n\nMemory Maze builds on the 
[`dm_control`](https://github.com/deepmind/dm_control) and [`mujoco`](https://github.com/deepmind/mujoco) packages, which are automatically installed as dependencies:\n\n```sh\npip install memory-maze\n```\n\n## Play Yourself\n\nMemory Maze allows you to play the levels in human mode. We used this mode for recording the human baseline scores. These are the instructions for launching the GUI:\n\n```sh\n# GUI dependencies\npip install gym pygame pillow imageio\n\n# Launch with standard 64x64 resolution\npython gui/run_gui.py\n\n# Launch with higher 256x256 resolution\npython gui/run_gui.py --env \"memory_maze:MemoryMaze-9x9-HD-v0\"\n```\n\n## Task Description\n\nThe task is based on a game known as scavenger hunt or treasure hunt:\n- The agent starts in a randomly generated maze, which contains several objects of different colors.\n- The agent is prompted to find the target object of a specific color, indicated by the border color in the observation image.\n- Once the agent successfully finds and touches the correct object, it gets a +1 reward and the next random object is chosen as a target.\n- If the agent touches an object of the wrong color, there is no effect.\n- Throughout the episode, the maze layout and the locations of the objects do not change.\n- The episode continues for a fixed amount of time, so the total episode reward equals the number of reached targets.\n\n<p align=\"center\"><img width=\"256\" src=\"https://user-images.githubusercontent.com/3135115/177040240-847f0f0d-b20b-4652-83c3-a486f6f22c22.gif\"></p>\n\nAn agent with long-term memory only has to explore each maze once (which is possible in a time much shorter than the length of an episode) and can afterwards follow the shortest path to each requested target, whereas an agent with no memory has to randomly wander through the maze to find each target.\n\nThere are 4 size variations of the maze. 
The largest maze, 15x15, is designed to be challenging but solvable for humans (see benchmark results below), yet out of reach for state-of-the-art RL methods. The smaller sizes are provided as stepping stones, with 9x9 being solvable with current RL methods.\n\n| Size | env_id | Objects | Episode steps | Mean human score | Mean max score |\n|:---------:|-----------------------|:---:|:-----:|:----:|:----:|\n| **9x9**   | `MemoryMaze-9x9-v0`   |  3  | 1000  | 26.4 | 34.8 |\n| **11x11** | `MemoryMaze-11x11-v0` |  4  | 2000  | 44.3 | 58.0 |\n| **13x13** | `MemoryMaze-13x13-v0` |  5  | 3000  | 55.5 | 74.5 |\n| **15x15** | `MemoryMaze-15x15-v0` |  6  | 4000  | 67.7 | 87.7 |\n\nThe mazes are generated with [labmaze](https://github.com/deepmind/labmaze), the same algorithm used by [DmLab-30](https://github.com/deepmind/lab/tree/master/game_scripts/levels/contributed/dmlab30). The 9x9 corresponds to the [small](https://github.com/deepmind/lab/tree/master/game_scripts/levels/contributed/dmlab30#goal-locations-small) variant and 15x15 corresponds to the [large](https://github.com/deepmind/lab/tree/master/game_scripts/levels/contributed/dmlab30#goal-locations-large) variant.\n\n## Gym Interface\n\nYou can create the environment using the [Gym](https://github.com/openai/gym) interface:\n\n```python\n# Requires gym: pip install gym\nimport gym\n\n# Set this if you are getting \"Unable to load EGL library\" error:\n#  import os\n#  os.environ['MUJOCO_GL'] = 'glfw'\n\nenv = gym.make('memory_maze:MemoryMaze-9x9-v0')\nenv = gym.make('memory_maze:MemoryMaze-11x11-v0')\nenv = gym.make('memory_maze:MemoryMaze-13x13-v0')\nenv = gym.make('memory_maze:MemoryMaze-15x15-v0')\n```\n\n**Troubleshooting:** If you get the \"Unable to load EGL library\" error, that is because we enable MuJoCo headless GPU rendering (`MUJOCO_GL=egl`) by default. If you are testing locally on your machine, you can enable windowed rendering instead (`MUJOCO_GL=glfw`). 
[Read here](https://github.com/deepmind/dm_control#rendering) about the different rendering options.\n\nThe default environment has 64x64 image observations:\n\n```python\n>>> env.observation_space\nBox(0, 255, (64, 64, 3), uint8)\n```\n\nThere are 6 discrete actions:\n\n```python\n>>> env.action_space\nDiscrete(6)  # (noop, forward, left, right, forward_left, forward_right)\n```\n\nTo create an environment with extra observations for debugging and probe analysis, add `ExtraObs` to the environment name, before the version suffix:\n\n```python\n>>> env = gym.make('memory_maze:MemoryMaze-9x9-ExtraObs-v0')\n>>> env.observation_space\nDict(\n    agent_dir: Box(-inf, inf, (2,), float64),\n    agent_pos: Box(-inf, inf, (2,), float64),\n    image: Box(0, 255, (64, 64, 3), uint8),\n    maze_layout: Box(0, 1, (9, 9), uint8),\n    target_color: Box(-inf, inf, (3,), float64),\n    target_pos: Box(-inf, inf, (2,), float64),\n    target_vec: Box(-inf, inf, (2,), float64),\n    targets_pos: Box(-inf, inf, (3, 2), float64),\n    targets_vec: Box(-inf, inf, (3, 2), float64)\n)\n```\n\nWe also register [additional variants](memory_maze/__init__.py) of the environment that can be useful in certain scenarios.\n\n## DeepMind Interface\n\nYou can create the environment using the [dm_env](https://github.com/deepmind/dm_env) interface:\n\n```python\nfrom memory_maze import tasks\n\nenv = tasks.memory_maze_9x9()\nenv = tasks.memory_maze_11x11()\nenv = tasks.memory_maze_13x13()\nenv = tasks.memory_maze_15x15()\n```\n\nEach observation is a dictionary that includes an `image` key:\n\n```python\n>>> env.observation_spec()\n{\n  'image': BoundedArray(shape=(64, 64, 3), ...)\n}\n```\n\nThe constructor accepts a number of arguments, which can be used to tweak the environment:\n\n```python\nenv = tasks.memory_maze_9x9(\n    global_observables=True,\n    image_only_obs=False,\n    top_camera=False,\n    camera_resolution=64,\n    control_freq=4.0,\n    discrete_actions=True,\n)\n```\n\n## Offline Dataset\n\n[**Dataset download 
here** (~100GB per dataset)](https://drive.google.com/drive/folders/1RcnkTZVwEHnAQeEuw7X8Y1RPSmrFLDFB)\n\nWe provide two datasets of experience collected from the Memory Maze environment: Memory Maze 9x9 (30M) and Memory Maze 15x15 (30M). Each dataset contains 30 thousand trajectories from Memory Maze 9x9 and 15x15 environments respectively, split into 29k trajectories for training and 1k for evaluation. All trajectories are 1000 steps long, so each dataset has 30M steps total.\n\nThe data is generated with a scripted policy that navigates to randomly chosen points in the maze under action noise. This choice of policy was made to generate diverse trajectories that explore the maze effectively and that form spatial loops, which can be important for learning long-term memory. We intentionally avoid recording data with a trained agent to ensure a diverse data distribution and to avoid dataset bias that could favor some methods over others. Because of this, the rewards are quite sparse in the data, occurring on average 1-2 times per trajectory.\n\nEach trajectory is saved as an NPZ file with the following entries available:\n\n| Key            | Shape              | Type   | Description                                   |\n|----------------|--------------------|--------|-----------------------------------------------|\n| `image`        | (64, 64, 3)        | uint8  | First-person view observation                 |\n| `action`       | (6)                | binary | Last action, one-hot encoded                  |\n| `reward`       | ()                 | float  | Last reward                                   |\n| `maze_layout`  | (9, 9) or (15, 15) | binary | Maze layout (wall / no wall)                  |\n| `agent_pos`    | (2)                | float  | Agent position in global coordinates          |\n| `agent_dir`    | (2)                | float  | Agent orientation as a unit vector            |\n| `targets_pos`  | (3, 2) or (6, 2)   | float  | Object locations in 
global coordinates        |\n| `targets_vec`  | (3, 2) or (6, 2)   | float  | Object locations in agent-centric coordinates |\n| `target_pos`   | (2)                | float  | Current target object location, global        |\n| `target_vec`   | (2)                | float  | Current target object location, agent-centric |\n| `target_color` | (3)                | float  | Current target object color RGB               |\n\nYou can load a trajectory using [`np.load()`](https://numpy.org/doc/stable/reference/generated/numpy.load.html) to obtain a dictionary of NumPy arrays as follows:\n\n```python\nimport numpy as np\n\nepisode = np.load('trajectory.npz')\nepisode = {key: episode[key] for key in episode.keys()}\n\nassert episode['image'].shape == (1001, 64, 64, 3)\nassert episode['image'].dtype == np.uint8\n```\n\nAll tensors have a leading time dimension, e.g. the `image` tensor has shape (1001, 64, 64, 3). The tensor length is 1001 because a trajectory has 1000 steps (actions): `image[0]` is the observation *before* the first action, and `image[-1]` is the observation *after* the last action.\n\n## Online RL Baselines\n\nIn our [research paper](https://arxiv.org/abs/2210.13383), we evaluate the model-free [IMPALA](https://github.com/google-research/seed_rl/tree/master/agents/vtrace) agent and the model-based [Dreamer](https://github.com/jurgisp/pydreamer) agent as baselines.\n\n<p align=\"center\">\n  <img width=\"650\" alt=\"baselines\" src=\"https://user-images.githubusercontent.com/3135115/197349778-74073613-bf6c-449b-b5c2-07adf21030ff.png\">\n  <br/>\n  <img width=\"650\" alt=\"training\" src=\"https://user-images.githubusercontent.com/3135115/197485498-60560934-2629-47b0-ada8-0484398800d0.png\">\n</p>\n\nHere are videos of the learned behaviors:\n\n**Memory 9x9 - Dreamer (TBTT)**\n\nhttps://user-images.githubusercontent.com/3135115/197378287-4e413440-7097-4d11-8627-3d7fac0845f1.mp4\n\n**Memory 9x9 - IMPALA 
(400M)**\n\nhttps://user-images.githubusercontent.com/3135115/197378929-7fe3f374-c11c-409a-8a95-03feeb489330.mp4\n\n**Memory 15x15 - Dreamer (TBTT)**\n\nhttps://user-images.githubusercontent.com/3135115/197378324-fb99b496-dba8-4b00-ad80-2d6e19ba8acd.mp4\n\n**Memory 15x15 - IMPALA (400M)**\n\nhttps://user-images.githubusercontent.com/3135115/197378936-939e7615-9dad-4765-b0ef-a49c5a38fe28.mp4\n\n## Offline Probing Baselines\n\nHere we visualize probe predictions alongside trajectories of the offline dataset, as explained in [the paper](https://arxiv.org/abs/2210.13383). In these trajectories the agent only navigates to random points in the maze; it does *not* try to collect rewards.\n\nBottom-left: Object location predictions (x) versus the actual locations (o).\n\nBottom-right: Wall layout predictions (dark green = true positive, light green = true negative, light red = false positive, dark red = false negative).\n\n**Memory 9x9 Walls Objects - RSSM (TBTT)**\n\nhttps://user-images.githubusercontent.com/3135115/197379227-775ec5bc-0780-4dcc-b7f1-660bc7cf95f1.mp4\n\n**Memory 9x9 Walls Objects - Supervised oracle**\n\nhttps://user-images.githubusercontent.com/3135115/197379235-a5ea0388-2718-4035-8bbc-064ecc9ea444.mp4\n\n**Memory 15x15 Walls Objects - RSSM (TBTT)**\n\nhttps://user-images.githubusercontent.com/3135115/197379245-fb96bd12-6ef5-481e-adc6-f119a39e8e43.mp4\n\n**Memory 15x15 Walls Objects - Supervised oracle**\n\nhttps://user-images.githubusercontent.com/3135115/197379248-26a8093e-8b54-443c-b154-e33e0383b5e4.mp4\n\n## Questions\n\nPlease [open an issue][issues] on GitHub.\n\n[issues]: https://github.com/jurgisp/memory-maze/issues\n"
  },
  {
    "path": "gui/recording.py",
    "content": "from datetime import datetime\nfrom pathlib import Path\n\nimport gym\nimport imageio\nimport numpy as np\n\nfrom PIL import Image\n\n\nclass SaveNpzWrapper(gym.Wrapper):\n\n    def __init__(self, env, log_dir, video_fps=30, video_size=256, video_format='mp4'):\n        env = ActionRewardResetWrapper(env)\n        env = CollectWrapper(env)\n        super().__init__(env)\n        self.log_dir = Path(log_dir)\n        self.log_dir.mkdir(parents=True, exist_ok=True)\n        self.video_fps = video_fps\n        self.video_size = video_size\n        self.video_format = video_format\n\n    def step(self, action):\n        obs, reward, done, info = self.env.step(action)  # type: ignore\n        data = info.get('episode')\n        if data:\n            ep_id = info['episode_id']\n            ep_reward = data['reward'].sum()\n            ep_steps = len(data['reward']) - 1\n            ep_name = f'{ep_id}-r{ep_reward:.0f}-{ep_steps:04}'\n            self._save_npz(data, self.log_dir / f'{ep_name}.npz')\n            if self.video_format:\n                self._save_video(data, self.log_dir / f'{ep_name}.{self.video_format}')\n        return obs, reward, done, info\n\n    def _save_npz(self, data, path):\n        with path.open('wb') as f:\n            np.savez_compressed(f, **data)\n        print(f'Saved {path}', {k: v.shape for k, v in data.items()})\n    \n    def _save_video(self, data, path):\n        writer = imageio.get_writer(path, fps=self.video_fps)\n        for frame in data['image']:\n            img = Image.fromarray(frame)\n            img = img.resize((self.video_size, self.video_size), resample=0)\n            writer.append_data(np.array(img))\n        writer.close()\n        print(f'Saved {path}')\n\n\nclass CollectWrapper(gym.Wrapper):\n    \"\"\"Copied from pydreamer.envs.wrappers.\"\"\"\n\n    def __init__(self, env):\n        super().__init__(env)\n        self.env = env\n        self.episode = []\n        self.episode_id = ''\n\n    def 
step(self, action):\n        obs, reward, done, info = self.env.step(action)\n        self.episode.append(obs.copy())\n        if done:\n            episode = {k: np.array([t[k] for t in self.episode]) for k in self.episode[0]}\n            info['episode'] = episode\n        info['episode_id'] = self.episode_id\n        return obs, reward, done, info\n\n    def reset(self):\n        obs = self.env.reset()\n        self.episode = [obs.copy()]\n        self.episode_id = datetime.now().strftime('%Y%m%dT%H%M%S')\n        return obs\n\n\nclass ActionRewardResetWrapper(gym.Wrapper):\n    \"\"\"Copied from pydreamer.envs.wrappers.\"\"\"\n\n    def __init__(self, env, no_terminal=False):\n        super().__init__(env)\n        self.env = env\n        self.no_terminal = no_terminal\n        # Handle environments with one-hot or discrete action, but collect always as one-hot\n        self.action_size = env.action_space.n if hasattr(env.action_space, 'n') else env.action_space.shape[0]\n\n    def step(self, action):\n        obs, reward, done, info = self.env.step(action)\n        # Accept NumPy integer scalars as well as Python ints for discrete actions\n        if isinstance(action, (int, np.integer)):\n            action_vec = np.zeros(self.action_size)\n            action_vec[action] = 1.0\n        else:\n            assert isinstance(action, np.ndarray) and action.shape == (self.action_size,), \"Wrong one-hot action shape\"\n            action_vec = action\n        obs['action'] = action_vec\n        obs['reward'] = np.array(reward)\n        obs['terminal'] = np.array(False if self.no_terminal or 'TimeLimit.truncated' in info or info.get('time_limit') else done)\n        obs['reset'] = np.array(False)\n        return obs, reward, done, info\n\n    def reset(self):\n        obs = self.env.reset()\n        obs['action'] = np.zeros(self.action_size)\n        obs['reward'] = np.array(0.0)\n        obs['terminal'] = np.array(False)\n        obs['reset'] = np.array(True)\n        return obs\n"
  },
  {
    "path": "gui/requirements.txt",
    "content": "gym\npygame\npillow\nimageio\nimageio-ffmpeg\n"
  },
  {
    "path": "gui/run_gui.py",
    "content": "import os, sys\n\nimport argparse\nfrom collections import defaultdict\n\nimport gym\nimport numpy as np\nimport pygame\nimport pygame.freetype\nfrom gym import spaces\nfrom PIL import Image\n\nfrom recording import SaveNpzWrapper\n\nif 'MUJOCO_GL' not in os.environ:\n    if \"linux\" in sys.platform:\n        os.environ['MUJOCO_GL'] = 'osmesa' # Software rendering to avoid rendering interference with pygame\n    else:\n        os.environ['MUJOCO_GL'] = 'glfw'  # Windowed rendering\n\nPANEL_LEFT = 250\nPANEL_RIGHT = 250\nFOCUS_HACK = False\nRECORD_DIR = './log'\nK_NONE = tuple()\n\n\ndef get_keymap(env):\n    return {\n        tuple(): 0,\n        (pygame.K_UP, ): 1,\n        (pygame.K_LEFT, ): 2,\n        (pygame.K_RIGHT, ): 3,\n        (pygame.K_UP, pygame.K_LEFT): 4,\n        (pygame.K_UP, pygame.K_RIGHT): 5,\n    }\n\n\ndef main():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--env', type=str, default='memory_maze:MemoryMaze-9x9-v0')\n    parser.add_argument('--size', type=int, nargs=2, default=(600, 600))\n    parser.add_argument('--fps', type=int, default=6)\n    parser.add_argument('--random', type=float, default=0.0)\n    parser.add_argument('--noreset', action='store_true')\n    parser.add_argument('--fullscreen', action='store_true')\n    parser.add_argument('--nonoop', action='store_true', help='Pause instead of noop')\n    parser.add_argument('--record', action='store_true')\n    parser.add_argument('--record_mp4', action='store_true')\n    parser.add_argument('--record_gif', action='store_true')\n    args = parser.parse_args()\n    render_size = args.size\n    window_size = (render_size[0] + PANEL_LEFT + PANEL_RIGHT, render_size[1])\n\n    print(f'Creating environment: {args.env}')\n    env = gym.make(args.env, disable_env_checker=True)\n\n    if isinstance(env.observation_space, spaces.Dict):\n        print('Observation space:')\n        for k, v in env.observation_space.spaces.items():  # type: ignore\n            
print(f'{k:>25}: {v}')\n    else:\n        print(f'Observation space:  {env.observation_space}')\n    print(f'Action space:  {env.action_space}')\n\n    if args.record:\n        env = SaveNpzWrapper(\n            env,\n            RECORD_DIR,\n            video_format='mp4' if args.record_mp4 else 'gif' if args.record_gif else None,\n            video_fps=args.fps * 2)\n\n    keymap = get_keymap(env)\n\n    steps = 0\n    return_ = 0.0\n    episode = 0\n    obs = env.reset()\n\n    pygame.init()\n    start_fullscreen = args.fullscreen or FOCUS_HACK\n    screen = pygame.display.set_mode(window_size, pygame.FULLSCREEN if start_fullscreen else 0)\n    if FOCUS_HACK and not args.fullscreen:\n        # Hack: for some reason app window doesn't get focus when launching, so\n        # we launch it as full screen and then exit full screen.\n        pygame.display.toggle_fullscreen()\n    clock = pygame.time.Clock()\n    font = pygame.freetype.SysFont('Mono', 16)\n    fontsmall = pygame.freetype.SysFont('Mono', 12)\n\n    running = True\n    paused = False\n    speedup = False\n\n    while running:\n\n        # Rendering\n\n        screen.fill((64, 64, 64))\n\n        # Render image observation\n        if isinstance(obs, dict):\n            assert 'image' in obs, 'Expecting dictionary observation with obs[\"image\"]'\n            image = obs['image']  # type: ignore\n        else:\n            assert isinstance(obs, np.ndarray) and len(obs.shape) == 3, 'Expecting image observation'\n            image = obs\n        image = Image.fromarray(image)\n        image = image.resize(render_size, resample=0)\n        image = np.array(image)\n        surface = pygame.surfarray.make_surface(image.transpose((1, 0, 2)))\n        screen.blit(surface, (PANEL_LEFT, 0))\n\n        # Render statistics\n        lines = obs_to_text(obs, env, steps, return_)\n        y = 5\n        for line in lines:\n            text_surface, rect = font.render(line, (255, 255, 255))\n            
screen.blit(text_surface, (16, y))\n            y += font.size + 2  # type: ignore\n\n        # Render keymap help\n        lines = keymap_to_text(keymap)\n        y = 5\n        for line in lines:\n            text_surface, rect = fontsmall.render(line, (255, 255, 255))\n            screen.blit(text_surface, (render_size[0] + PANEL_LEFT + 16, y))\n            y += fontsmall.size + 2  # type: ignore\n\n        pygame.display.flip()\n        clock.tick(args.fps if not speedup else 0)\n\n        # Keyboard input\n\n        pygame.event.pump()\n        keys_down = defaultdict(bool)\n        for event in pygame.event.get():\n            if event.type == pygame.QUIT:  # Close\n                running = False\n            if event.type == pygame.KEYDOWN:\n                keys_down[event.key] = True\n        keys_hold = pygame.key.get_pressed()\n\n        # Action keys\n        action = keymap[K_NONE]  # noop, if no keys pressed\n        for keys, act in keymap.items():\n            if all(keys_hold[key] or keys_down[key] for key in keys):\n                # The last keymap entry which has all keys pressed wins\n                action = act\n\n        # Special keys\n        force_reset = False\n        speedup = False\n        if keys_down[pygame.K_ESCAPE]:  # Quit\n            running = False\n        if keys_down[pygame.K_SPACE]:  # Pause\n            paused = not paused\n        else:\n            if action != keymap[K_NONE]:\n                paused = False  # unpause on action press\n        if keys_down[pygame.K_BACKSPACE]:  # Force reset\n            force_reset = True\n        if keys_hold[pygame.K_TAB]:\n            speedup = True\n\n        if paused:\n            continue\n        if action == keymap[K_NONE] and args.nonoop and not force_reset:\n            continue\n\n        # Environment step\n\n        if args.random:\n            if np.random.random() < args.random:\n                action = env.action_space.sample()\n\n        obs, reward, done, info = 
env.step(action)  # type: ignore\n        # print({k: v for k, v in obs.items() if k != 'image'})\n        steps += 1\n        return_ += reward\n\n        # Episode end\n\n        if reward:\n            print(f'reward: {reward}')\n        if done or force_reset:\n            print(f'Episode done - length: {steps}  return: {return_}')\n            obs = env.reset()\n            steps = 0\n            return_ = 0.0\n            episode += 1\n            if done and args.record:\n                # If recording, require relaunch for next episode\n                running = False\n\n    pygame.quit()\n\n\ndef obs_to_text(obs, env, steps, return_):\n    kvs = []\n    kvs.append(('## Stats ##', ''))\n    kvs.append(('', ''))\n    kvs.append(('step', steps))\n    kvs.append(('return', return_))\n    lines = [f'{k:<15} {v:>5}' for k, v in kvs]\n    return lines\n\n\ndef keymap_to_text(keymap, verbose=False):\n    kvs = []\n    kvs.append(('## Commands ##', ''))\n    kvs.append(('', ''))\n\n    # mapped actions\n    kvs.append(('forward', 'up arrow'))\n    kvs.append(('left', 'left arrow'))\n    kvs.append(('right', 'right arrow'))\n\n    # special actions\n    kvs.append(('', ''))\n    kvs.append(('reset', 'backspace'))\n    kvs.append(('pause', 'space'))\n    kvs.append(('speed up', 'tab'))\n    kvs.append(('quit', 'esc'))\n\n    lines = [f'{k:<15} {v}' for k, v in kvs]\n    return lines\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "memory_maze/__init__.py",
    "content": "import os\n\n# NOTE: Env MUJOCO_GL=egl is necessary for headless hardware rendering on GPU,\n# but breaks when running on a CPU machine. Alternatively set MUJOCO_GL=osmesa.\nif 'MUJOCO_GL' not in os.environ:\n    os.environ['MUJOCO_GL'] = 'egl'\n\nfrom . import tasks\n\ntry:\n    # Register gym environments, if gym is available\n\n    from typing import Callable\n    from functools import partial as f\n\n    import dm_env\n    import gym\n    from gym.envs.registration import register\n\n    from .gym_wrappers import GymWrapper\n\n    def _make_gym_env(dm_task: Callable[[], dm_env.Environment], **kwargs):\n        dmenv = dm_task(**kwargs)\n        return GymWrapper(dmenv)\n\n    sizes = {\n        '9x9': tasks.memory_maze_9x9,\n        '11x11': tasks.memory_maze_11x11,\n        '13x13': tasks.memory_maze_13x13,\n        '15x15': tasks.memory_maze_15x15,\n    }\n\n    for key, dm_task in sizes.items():\n        # Image-only obs space\n        register(id=f'MemoryMaze-{key}-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True))  # Standard\n        register(id=f'MemoryMaze-{key}-Vis-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, good_visibility=True))  # Easily visible targets\n        register(id=f'MemoryMaze-{key}-HD-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, camera_resolution=256))  # High-res camera\n        register(id=f'MemoryMaze-{key}-Top-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, camera_resolution=256, top_camera=True))  # Top-down camera\n        \n        # Extra global observables (dict obs space)\n        register(id=f'MemoryMaze-{key}-ExtraObs-v0', entry_point=f(_make_gym_env, dm_task, global_observables=True))\n        register(id=f'MemoryMaze-{key}-ExtraObs-Vis-v0', entry_point=f(_make_gym_env, dm_task, global_observables=True, good_visibility=True))\n        register(id=f'MemoryMaze-{key}-ExtraObs-Top-v0', entry_point=f(_make_gym_env, dm_task, 
global_observables=True, camera_resolution=256, top_camera=True))\n        \n        # Oracle observables with shortest path shown\n        register(id=f'MemoryMaze-{key}-Oracle-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, global_observables=True, show_path=True))\n        register(id=f'MemoryMaze-{key}-Oracle-Top-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, global_observables=True, show_path=True, camera_resolution=256, top_camera=True))\n        register(id=f'MemoryMaze-{key}-Oracle-ExtraObs-v0', entry_point=f(_make_gym_env, dm_task, global_observables=True, show_path=True))\n        \n        # High control frequency\n        register(id=f'MemoryMaze-{key}-HiFreq-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, control_freq=40))\n        register(id=f'MemoryMaze-{key}-HiFreq-Vis-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, control_freq=40, good_visibility=True))\n        register(id=f'MemoryMaze-{key}-HiFreq-HD-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, control_freq=40, camera_resolution=256))\n\n        # Six colors even for smaller mazes\n        register(id=f'MemoryMaze-{key}-6CL-v0', entry_point=f(_make_gym_env, dm_task, randomize_colors=True, image_only_obs=True))\n        register(id=f'MemoryMaze-{key}-6CL-Top-v0', entry_point=f(_make_gym_env, dm_task, randomize_colors=True, image_only_obs=True, camera_resolution=256, top_camera=True))\n        register(id=f'MemoryMaze-{key}-6CL-ExtraObs-v0', entry_point=f(_make_gym_env, dm_task, randomize_colors=True, global_observables=True))\n        \n\n\nexcept ImportError:\n    print('memory_maze: gym environments not registered.')\n    raise\n"
  },
  {
    "path": "memory_maze/gym_wrappers.py",
    "content": "from typing import Any, Tuple\nimport numpy as np\n\nimport dm_env\nimport gym\nfrom dm_env import specs\nfrom gym import spaces\n\n\nclass GymWrapper(gym.Env):\n\n    def __init__(self, env: dm_env.Environment):\n        self.env = env\n        self.action_space = _convert_to_space(env.action_spec())\n        self.observation_space = _convert_to_space(env.observation_spec())\n\n    def reset(self) -> Any:\n        ts = self.env.reset()\n        return ts.observation\n\n    def step(self, action) -> Tuple[Any, float, bool, dict]:\n        ts = self.env.step(action)\n        assert not ts.first(), \"dm_env.step() caused reset, reward will be undefined.\"\n        assert ts.reward is not None\n        done = ts.last()\n        terminal = ts.last() and ts.discount == 0.0\n        info = {}\n        if done and not terminal:\n            info['TimeLimit.truncated'] = True  # acme.GymWrapper understands this and converts back to dm_env.truncation()\n        return ts.observation, ts.reward, done, info\n\n\ndef _convert_to_space(spec: Any) -> gym.Space:\n    # Inverse of acme.gym_wrappers._convert_to_spec\n\n    if isinstance(spec, specs.DiscreteArray):\n        return spaces.Discrete(spec.num_values)\n\n    if isinstance(spec, specs.BoundedArray):\n        return spaces.Box(\n            shape=spec.shape,\n            dtype=spec.dtype,\n            low=spec.minimum.item() if len(spec.minimum.shape) == 0 else spec.minimum,\n            high=spec.maximum.item() if len(spec.maximum.shape) == 0 else spec.maximum)\n    \n    if isinstance(spec, specs.Array):\n        return spaces.Box(\n            shape=spec.shape,\n            dtype=spec.dtype,\n            low=-np.inf,\n            high=np.inf)\n\n    if isinstance(spec, tuple):\n        return spaces.Tuple(_convert_to_space(s) for s in spec)\n\n    if isinstance(spec, dict):\n        return spaces.Dict({key: _convert_to_space(value) for key, value in spec.items()})\n\n    raise ValueError(f'Unexpected 
spec: {spec}')\n"
  },
  {
    "path": "memory_maze/helpers.py",
    "content": "from dm_env.specs import BoundedArray, DiscreteArray\nimport numpy as np\n\n\ndef sample_spec(space: BoundedArray) -> np.ndarray:\n    \"\"\"Sample a random value conforming to a dm_env spec.\"\"\"\n    # Check DiscreteArray first, because it is a subclass of BoundedArray.\n    if isinstance(space, DiscreteArray):\n        return np.random.randint(space.num_values, size=space.shape)\n\n    if isinstance(space, BoundedArray):\n        return np.random.uniform(space.minimum, space.maximum, size=space.shape)\n\n    raise NotImplementedError(f'Unsupported spec: {space}')\n"
  },
  {
    "path": "memory_maze/maze.py",
    "content": "from typing import Optional\nimport functools\nimport string\n\nimport labmaze\nimport numpy as np\nfrom dm_control import mjcf\nfrom dm_control.composer.observation import observable as observable_lib\nfrom dm_control.locomotion.arenas import covering, labmaze_textures, mazes\nfrom dm_control.locomotion.props import target_sphere\nfrom dm_control.locomotion.tasks import random_goal_maze\nfrom dm_control.locomotion.walkers import jumping_ball\nfrom labmaze import assets as labmaze_assets\nfrom numpy.random import RandomState\n\nDEFAULT_CONTROL_TIMESTEP = 0.025\nDEFAULT_PHYSICS_TIMESTEP = 0.005\n\nTARGET_COLORS = [\n    np.array([170, 38, 30]) / 220,  # red\n    np.array([99, 170, 88]) / 220,  # green\n    np.array([39, 140, 217]) / 220,  # blue\n    np.array([93, 105, 199]) / 220,  # purple\n    np.array([220, 193, 59]) / 220,  # yellow\n    np.array([220, 128, 107]) / 220,  # salmon\n]\n\n\nclass RollingBallWithFriction(jumping_ball.RollingBallWithHead):\n\n    def _build(self, roll_damping=5.0, steer_damping=20.0, **kwargs):\n        super()._build(**kwargs)\n        # Increase friction to the joints, so the movement feels more like traditional\n        # first-person navigation control, without much acceleration/deceleration.\n        self._mjcf_root.find('joint', 'roll').damping = roll_damping\n        self._mjcf_root.find('joint', 'steer').damping = steer_damping\n\n\nclass MemoryMazeTask(random_goal_maze.NullGoalMaze):\n    # Adapted from dm_control.locomotion.tasks.RepeatSingleGoalMaze\n\n    def __init__(self,\n                 walker,\n                 maze_arena,\n                 n_targets=3,\n                 target_radius=0.3,\n                 target_height_above_ground=0.0,\n                 target_reward_scale=1.0,\n                 target_randomize_colors=False,\n                 enable_global_task_observables=False,\n                 camera_resolution=64,\n                 physics_timestep=DEFAULT_PHYSICS_TIMESTEP,\n                
 control_timestep=DEFAULT_CONTROL_TIMESTEP,\n                 ):\n        super().__init__(\n            walker=walker,\n            maze_arena=maze_arena,\n            randomize_spawn_position=True,\n            randomize_spawn_rotation=True,\n            contact_termination=False,\n            enable_global_task_observables=enable_global_task_observables,\n            physics_timestep=physics_timestep,\n            control_timestep=control_timestep\n        )\n        self.n_targets = n_targets\n        self._target_radius = target_radius\n        self._target_height_above_ground = target_height_above_ground\n        self._target_reward_scale = target_reward_scale\n        self._target_randomize_colors = target_randomize_colors\n\n        self._targets = []\n        self._target_colors = list(TARGET_COLORS)  # This contains all colors, not only n_targets\n        self._create_targets()\n        self._current_target_ix = 0\n        self._rewarded_this_step = False\n        self._targets_obtained = 0\n\n        if enable_global_task_observables:\n            # Add egocentric vectors to targets\n            xpos_origin_callable = lambda phys: phys.bind(walker.root_body).xpos\n\n            def _target_pos(physics, targets, index):\n                return physics.bind(targets[index].geom).xpos\n\n            for i in range(n_targets):\n                # Absolute target position\n                walker.observables.add_observable(\n                    f'target_abs_{i}',\n                    observable_lib.Generic(functools.partial(_target_pos, targets=self._targets, index=i)),\n                )\n                # Relative target position\n                walker.observables.add_egocentric_vector(\n                    f'target_rel_{i}',\n                    observable_lib.Generic(functools.partial(_target_pos, targets=self._targets, index=i)),\n                    origin_callable=xpos_origin_callable)\n\n        self._task_observables = super().task_observables\n\n      
  def _current_target_index(_):\n            return self._current_target_ix\n\n        def _current_target_color(_):\n            return self._target_colors[self._current_target_ix]\n\n        self._task_observables['target_index'] = observable_lib.Generic(_current_target_index)\n        self._task_observables['target_index'].enabled = True\n        self._task_observables['target_color'] = observable_lib.Generic(_current_target_color)\n        self._task_observables['target_color'].enabled = True\n\n        self._walker.observables.egocentric_camera.height = camera_resolution\n        self._walker.observables.egocentric_camera.width = camera_resolution\n        self._maze_arena.observables.top_camera.height = camera_resolution\n        self._maze_arena.observables.top_camera.width = camera_resolution\n\n    @property\n    def task_observables(self):\n        return self._task_observables\n\n    @property\n    def name(self):\n        return 'memory_maze'\n\n    def initialize_episode_mjcf(self, rng: RandomState):\n        self._maze_arena.regenerate(rng)  # Bypass super()._initialize_episode_mjcf(), because it ignores rng\n        while True:\n            if self._target_randomize_colors:\n                # Recreate target objects with new colors\n                self._create_targets(clear_existing=True, randomize_colors=True, rng=rng)\n            ok = self._place_targets(rng)\n            if not ok:\n                # Could not place targets - regenerate the maze\n                self._maze_arena.regenerate(rng)\n                continue\n            break\n        self._pick_new_target(rng)\n\n    def initialize_episode(self, physics, rng: RandomState):\n        super().initialize_episode(physics, rng)\n        self._rewarded_this_step = False\n        self._targets_obtained = 0\n\n    def after_step(self, physics, rng: RandomState):\n        super().after_step(physics, rng)\n        self._rewarded_this_step = False\n        for i, target in 
enumerate(self._targets):\n            if target.activated:\n                if i == self._current_target_ix:\n                    self._rewarded_this_step = True\n                    self._targets_obtained += 1\n                    self._pick_new_target(rng)\n                target.reset(physics)  # Resets activated=False\n\n    def should_terminate_episode(self, physics):\n        return super().should_terminate_episode(physics)\n\n    def get_reward(self, physics):\n        if self._rewarded_this_step:\n            return self._target_reward_scale\n        return 0.0\n\n    def _create_targets(self, clear_existing=False, randomize_colors=False, rng: Optional[RandomState] = None):\n        if clear_existing:\n            while self._targets:\n                target = self._targets.pop()\n                target.detach()  # Important to detach old targets, if creating new ones\n        else:\n            assert not self._targets, 'Targets already created.'\n\n        if randomize_colors:\n            assert rng is not None\n            rng.shuffle(self._target_colors)\n\n        for i in range(self.n_targets):\n            color = self._target_colors[i]\n            target = target_sphere.TargetSphere(\n                radius=self._target_radius,\n                height_above_ground=self._target_radius + self._target_height_above_ground,\n                rgb1=tuple(color * 1.0),\n                rgb2=tuple(color * 1.0),\n            )\n            self._targets.append(target)\n            self._maze_arena.attach(target)\n\n    def _place_targets(self, rng: RandomState) -> bool:\n        possible_positions = list(self._maze_arena.target_positions)\n        rng.shuffle(possible_positions)\n        if len(possible_positions) < len(self._targets):\n            # Too few rooms - need to regenerate the maze\n            return False\n        for target, pos in zip(self._targets, possible_positions):\n            mjcf.get_attachment_frame(target.mjcf_model).pos = pos\n    
    return True\n\n    def _pick_new_target(self, rng: RandomState):\n        while True:\n            ix = rng.randint(len(self._targets))\n            if self._targets[ix].activated:\n                continue  # Skip the target that the agent is touching\n            self._current_target_ix = ix\n            break\n\n\nclass FixedWallTexture(labmaze_textures.WallTextures):\n    \"\"\"Selects a single texture instead of a collection to sample from.\"\"\"\n\n    def _build(self, style, texture_name):\n        labmaze_textures = labmaze_assets.get_wall_texture_paths(style)\n        self._mjcf_root = mjcf.RootElement(model='labmaze_' + style)\n        self._textures = []\n        if texture_name not in labmaze_textures:\n            raise ValueError(f'`texture_name` should be one of {labmaze_textures.keys()}: got {texture_name}')\n        texture_path = labmaze_textures[texture_name]\n        self._textures.append(self._mjcf_root.asset.add(  # type: ignore\n            'texture', type='2d', name=texture_name,\n            file=texture_path.format(texture_name)))\n\n\nclass FixedFloorTexture(labmaze_textures.FloorTextures):\n    \"\"\"Selects a single texture instead of a collection to sample from.\"\"\"\n\n    def _build(self, style, texture_names):\n        labmaze_textures = labmaze_assets.get_floor_texture_paths(style)\n        self._mjcf_root = mjcf.RootElement(model='labmaze_' + style)\n        self._textures = []\n        if isinstance(texture_names, str):\n            texture_names = [texture_names]\n        for texture_name in texture_names:\n            if texture_name not in labmaze_textures:\n                raise ValueError(f'`texture_name` should be one of {labmaze_textures.keys()}: got {texture_name}')\n            texture_path = labmaze_textures[texture_name]\n            self._textures.append(self._mjcf_root.asset.add(  # type: ignore\n                'texture', type='2d', name=texture_name,\n                
file=texture_path.format(texture_name)))\n\n\nclass MazeWithTargetsArena(mazes.MazeWithTargets):\n    \"\"\"Fork of mazes.RandomMazeWithTargets.\"\"\"\n\n    def _build(self,\n               x_cells,\n               y_cells,\n               xy_scale=2.0,\n               z_height=2.0,\n               max_rooms=4,\n               room_min_size=3,\n               room_max_size=5,\n               spawns_per_room=0,\n               targets_per_room=0,\n               max_variations=26,\n               simplify=True,\n               skybox_texture=None,\n               wall_textures=None,\n               floor_textures=None,\n               aesthetic='default',\n               name='random_maze',\n               random_seed=None):\n        assert random_seed, \"Expected to be set by tasks._memory_maze()\"\n        super()._build(\n            maze=TextMazeVaryingWalls(\n                height=y_cells,\n                width=x_cells,\n                max_rooms=max_rooms,\n                room_min_size=room_min_size,\n                room_max_size=room_max_size,\n                max_variations=max_variations,\n                spawns_per_room=spawns_per_room,\n                objects_per_room=targets_per_room,\n                simplify=simplify,\n                random_seed=random_seed),\n            xy_scale=xy_scale,\n            z_height=z_height,\n            skybox_texture=skybox_texture,\n            wall_textures=wall_textures,\n            floor_textures=floor_textures,\n            aesthetic=aesthetic,\n            name=name)\n\n    def regenerate(self, random_state):\n        \"\"\"Generates a new maze layout.\n\n        Patch of MazeWithTargets.regenerate() which uses random_state.\n        \"\"\"\n        self._maze.regenerate()\n        # logging.debug('GENERATED MAZE:\\n%s', self._maze.entity_layer)\n        self._find_spawn_and_target_positions()\n\n        if self._text_maze_regenerated_hook:\n            self._text_maze_regenerated_hook()\n\n        # 
Remove old texturing planes.\n        for geom_name in self._texturing_geom_names:\n            del self._mjcf_root.worldbody.geom[geom_name]\n        self._texturing_geom_names = []\n\n        # Remove old texturing materials.\n        for material_name in self._texturing_material_names:\n            del self._mjcf_root.asset.material[material_name]\n        self._texturing_material_names = []\n\n        # Remove old actual-wall geoms.\n        self._maze_body.geom.clear()\n\n        self._current_wall_texture = {\n            wall_char: random_state.choice(wall_textures)  # PATCH: use random_state for wall textures\n            for wall_char, wall_textures in self._wall_textures.items()\n        }\n\n        for wall_char in self._wall_textures:\n            self._make_wall_geoms(wall_char)\n        self._make_floor_variations()\n\n    def _make_floor_variations(self, build_tile_geoms_fn=None):\n        \"\"\"Fork of mazes.MazeWithTargets._make_floor_variations().\n\n        Makes the room floors different if possible, instead of sampling randomly.\n        \"\"\"\n        _DEFAULT_FLOOR_CHAR = '.'\n\n        main_floor_texture = self._floor_textures[0]\n        if len(self._floor_textures) > 1:\n            room_floor_textures = self._floor_textures[1:]\n        else:\n            room_floor_textures = [main_floor_texture]\n\n        for i_var, variation in enumerate(_DEFAULT_FLOOR_CHAR + string.ascii_uppercase):\n            if variation not in self._maze.variations_layer:\n                break\n\n            if build_tile_geoms_fn is None:\n                # Break the floor variation down to odd-sized tiles.\n                tiles = covering.make_walls(self._maze.variations_layer,\n                                            wall_char=variation,\n                                            make_odd_sized_walls=True)\n            else:\n                tiles = build_tile_geoms_fn(wall_char=variation)\n\n            if variation == _DEFAULT_FLOOR_CHAR:\n        
        variation_texture = main_floor_texture\n            else:\n                variation_texture = room_floor_textures[i_var % len(room_floor_textures)]\n\n            for i, tile in enumerate(tiles):\n                tile_mid = covering.GridCoordinates(\n                    (tile.start.y + tile.end.y - 1) / 2,\n                    (tile.start.x + tile.end.x - 1) / 2)\n                tile_pos = np.array([(tile_mid.x - self._x_offset) * self._xy_scale,\n                                     -(tile_mid.y - self._y_offset) * self._xy_scale,\n                                     0.0])\n                tile_size = np.array([(tile.end.x - tile_mid.x - 0.5) * self._xy_scale,\n                                      (tile.end.y - tile_mid.y - 0.5) * self._xy_scale,\n                                      self._xy_scale])\n                if variation == _DEFAULT_FLOOR_CHAR:\n                    tile_name = 'floor_{}'.format(i)\n                else:\n                    tile_name = 'floor_{}_{}'.format(variation, i)\n                self._tile_geom_names[tile.start] = tile_name\n                self._texturing_material_names.append(tile_name)\n                self._texturing_geom_names.append(tile_name)\n                material = self._mjcf_root.asset.add(\n                    'material', name=tile_name, texture=variation_texture,\n                    texrepeat=(2 * tile_size[[0, 1]] / self._xy_scale))\n                self._mjcf_root.worldbody.add(\n                    'geom', name=tile_name, type='plane', material=material,\n                    pos=tile_pos, size=tile_size, contype=0, conaffinity=0)\n\n\nclass TextMazeVaryingWalls(labmaze.RandomMaze):\n    \"\"\"Augments standard generated labmaze with some walls marked with different chars.\"\"\"\n\n    def regenerate(self):\n        super().regenerate()\n        self._block_variations()\n\n    def _block_variations(self):\n        nblocks = 3\n        wall_chars = ['0', '1', '2', '3', '4', '5', '6', '7', '8', 
'9']\n\n        n = self.entity_layer.shape[0]\n        ivar = 0\n        for i in range(nblocks):\n            for j in range(nblocks):\n                i_from = i * n // nblocks\n                i_to = (i + 1) * n // nblocks\n                j_from = j * n // nblocks\n                j_to = (j + 1) * n // nblocks\n                self._change_block_char(i_from, i_to, j_from, j_to, wall_chars[ivar])\n                ivar += 1\n\n    def _change_block_char(self, i1, i2, j1, j2, char):\n        grid = self.entity_layer\n        i, j = np.where(grid[i1:i2, j1:j2] == '*')\n        grid[i + i1, j + j1] = char\n"
  },
  {
    "path": "memory_maze/oracle.py",
    "content": "from collections import deque\nfrom typing import List, Optional, Tuple\nimport numpy as np\n\nfrom memory_maze.wrappers import ObservationWrapper\n\n\nclass PathToTargetWrapper(ObservationWrapper):\n    \"\"\"Find shortest path to target and indicate it on maze_layout. Used for Oracle.\"\"\"\n\n    def observation_spec(self):\n        spec = self.env.observation_spec()\n        assert isinstance(spec, dict)\n        assert 'agent_pos' in spec\n        assert 'target_pos' in spec\n        assert 'maze_layout' in spec\n        return spec\n\n    def observation(self, obs):\n        assert isinstance(obs, dict)\n        # Find shortest path (in gridworld) from agent to target\n        maze = obs['maze_layout']\n        start = tuple(obs['agent_pos'].astype(int))\n        finish = tuple(obs['target_pos'].astype(int))\n        path = breadth_first_search(maze, start, finish)\n        if path:\n            for x, y in path:\n                maze[y, x] = 2  # Update maze_layout observation\n        return obs\n\n\nclass DrawMinimapWrapper(ObservationWrapper):\n    \"\"\"Show maze_layout as minimap in image observation. 
Used for Oracle.\"\"\"\n\n    def observation_spec(self):\n        spec = self.env.observation_spec()\n        assert isinstance(spec, dict)\n        assert 'maze_layout' in spec\n        assert 'image' in spec\n        assert 'agent_dir' in spec\n        return spec\n\n    def observation(self, obs):\n        from PIL import Image\n\n        assert isinstance(obs, dict)\n        maze = obs['maze_layout']\n        x, y = obs['agent_pos']\n        dx, dy = obs['agent_dir']\n        angle = np.arctan2(dx, dy)\n        N = maze.shape[0]\n        SIZE = N * 2\n\n        # Draw map\n        map = np.zeros((N, N, 3), np.uint8)  # walls in black\n        map[:, :] += (maze == 1)[..., None] * np.array([[[255, 255, 255]]], np.uint8)  # corridors in white\n        map[:, :] += (maze == 2)[..., None] * np.array([[[0, 255, 0]]], np.uint8)  # path in green\n        map[int(y), int(x)] = np.array([255, 0, 0], np.uint8)  # agent in red\n        map = np.flip(map, 0)\n\n        # Scale, rotate, translate\n        mapimg = Image.fromarray(map)\n        mapimg = mapimg.resize((SIZE, SIZE), resample=0)\n        tx = (x - N / 2) / N * SIZE\n        ty = - (y - N / 2) / N * SIZE\n        mapimg = mapimg.transform(mapimg.size, 0,\n                                  (1, 0, tx,\n                                   0, 1, ty),\n                                  resample=0)\n        mapimg = mapimg.rotate(angle / np.pi * 180, resample=0)\n\n        # Overlay minimap onto observation image top-right corner\n        img = obs['image']\n        img[:SIZE, -SIZE:] = img[:SIZE, -SIZE:] // 2 + np.array(mapimg) // 2\n        return obs\n\n\ndef breadth_first_search(maze: np.ndarray, start: Tuple[int, int], finish: Tuple[int, int]) -> Optional[List[Tuple[int, int]]]:\n    h, w = maze.shape\n\n    queue = deque()\n    visited = np.zeros(maze.shape, dtype=bool)\n    backtrace = np.zeros(maze.shape + (2,), dtype=int)\n\n    xs, ys = start\n    queue.append((xs, ys))\n    visited[ys, xs] = True\n\n    
while len(queue) > 0:\n        x, y = queue.popleft()\n        for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]:\n            x1 = x + dx\n            y1 = y + dy\n            if 0 <= x1 < w and 0 <= y1 < h and maze[y1, x1] and not visited[y1, x1]:\n                queue.append((x1, y1))\n                visited[y1, x1] = True\n                backtrace[y1, x1, :] = np.array([x, y])\n                if (x1, y1) == finish:\n                    break\n\n    xf, yf = finish\n    if not visited[yf, xf]:\n        return None\n\n    path = []\n    path.append((xf, yf))\n    while (xf, yf) != start:\n        xf, yf = backtrace[yf, xf]\n        path.append((xf, yf))\n    path.reverse()\n    return path\n"
  },
  {
    "path": "memory_maze/tasks.py",
    "content": "import numpy as np\nfrom dm_control import composer\nfrom dm_control.locomotion.arenas import labmaze_textures\n\nfrom memory_maze.maze import *\nfrom memory_maze.oracle import DrawMinimapWrapper, PathToTargetWrapper\nfrom memory_maze.wrappers import *\n\n# Slow control (4Hz), so that agent without HRL has a chance.\n# Native control would be ~20Hz, so this corresponds roughly to action_repeat=5.\nDEFAULT_CONTROL_FREQ = 4.0\n\n\ndef memory_maze_9x9(**kwargs):\n    \"\"\"\n    Maze based on DMLab30-explore_goal_locations_small\n    {\n        mazeHeight = 11,  # with outer walls\n        mazeWidth = 11,\n        roomCount = 4,\n        roomMaxSize = 5,\n        roomMinSize = 3,\n    }\n    \"\"\"\n    return _memory_maze(9, 3, 250, **kwargs)\n\n\ndef memory_maze_11x11(**kwargs):\n    return _memory_maze(11, 4, 500, **kwargs)\n\n\ndef memory_maze_13x13(**kwargs):\n    return _memory_maze(13, 5, 750, **kwargs)\n\n\ndef memory_maze_15x15(**kwargs):\n    \"\"\"\n    Maze based on DMLab30-explore_goal_locations_large\n    {\n        mazeHeight = 17,  # with outer walls\n        mazeWidth = 17,\n        roomCount = 9,\n        roomMaxSize = 3,\n        roomMinSize = 3,\n    }\n    \"\"\"\n    return _memory_maze(15, 6, 1000, max_rooms=9, room_max_size=3, **kwargs)\n\n\ndef _memory_maze(\n    maze_size,  # measured without exterior walls\n    n_targets,\n    time_limit,\n    max_rooms=6,\n    room_min_size=3,\n    room_max_size=5,\n    control_freq=DEFAULT_CONTROL_FREQ,\n    discrete_actions=True,\n    image_only_obs=False,\n    target_color_in_image=True,\n    global_observables=False,\n    top_camera=False,\n    good_visibility=False,\n    show_path=False,\n    camera_resolution=64,\n    seed=None,\n    randomize_colors=False,\n):\n    random_state = np.random.RandomState(seed)\n    walker = RollingBallWithFriction(camera_height=0.3, add_ears=top_camera)\n    arena = MazeWithTargetsArena(\n        x_cells=maze_size + 2,  # inner size => outer size\n       
 y_cells=maze_size + 2,\n        xy_scale=2.0,\n        z_height=1.5 if not good_visibility else 0.4,\n        max_rooms=max_rooms,\n        room_min_size=room_min_size,\n        room_max_size=room_max_size,\n        spawns_per_room=1,\n        targets_per_room=1,\n        floor_textures=FixedFloorTexture('style_01', ['blue', 'blue_bright']),\n        wall_textures=dict({\n            '*': FixedWallTexture('style_01', 'yellow'),  # default wall\n        }, **{str(i): labmaze_textures.WallTextures('style_01') for i in range(10)}  # variations\n        ),\n        skybox_texture=None,\n        random_seed=random_state.randint(2147483648),\n    )\n\n    task = MemoryMazeTask(\n        walker=walker,\n        maze_arena=arena,\n        n_targets=n_targets,\n        target_radius=0.6,\n        target_height_above_ground=0.5 if good_visibility else -0.6,\n        enable_global_task_observables=True,  # Always add to underlying env, but not always expose in RemapObservationWrapper\n        control_timestep=1.0 / control_freq,\n        camera_resolution=camera_resolution,\n        target_randomize_colors=randomize_colors,\n    )\n\n    if top_camera:\n        task.observables['top_camera'].enabled = True\n\n    env = composer.Environment(\n        time_limit=time_limit - 1e-3,  # subtract epsilon to make sure ep_length=time_limit*fps\n        task=task,\n        random_state=random_state,\n        strip_singleton_obs_buffer_dim=True)\n\n    obs_mapping = {\n        'image': 'walker/egocentric_camera' if not top_camera else 'top_camera',\n        'target_color': 'target_color',\n    }\n    if global_observables:\n        env = TargetsPositionWrapper(env, task._maze_arena.xy_scale, task._maze_arena.maze.width, task._maze_arena.maze.height)\n        env = AgentPositionWrapper(env, task._maze_arena.xy_scale, task._maze_arena.maze.width, task._maze_arena.maze.height)\n        env = MazeLayoutWrapper(env)\n        obs_mapping = dict(obs_mapping, **{\n            'agent_pos': 
'agent_pos',\n            'agent_dir': 'agent_dir',\n            'targets_vec': 'targets_vec',\n            'targets_pos': 'targets_pos',\n            'target_vec': 'target_vec',\n            'target_pos': 'target_pos',\n            'maze_layout': 'maze_layout',\n        })\n\n    env = RemapObservationWrapper(env, obs_mapping)\n\n    if target_color_in_image:\n        env = TargetColorAsBorderWrapper(env)\n\n    if show_path:\n        env = PathToTargetWrapper(env)\n        env = DrawMinimapWrapper(env)\n\n    if image_only_obs:\n        assert target_color_in_image, 'Image-only observation only makes sense with target_color_in_image'\n        env = ImageOnlyObservationWrapper(env)\n\n    if discrete_actions:\n        env = DiscreteActionSetWrapper(env, [\n            np.array([0.0, 0.0]),  # noop\n            np.array([-1.0, 0.0]),  # forward\n            np.array([0.0, -1.0]),  # left\n            np.array([0.0, +1.0]),  # right\n            np.array([-1.0, -1.0]),  # forward + left\n            np.array([-1.0, +1.0]),  # forward + right\n        ])\n\n    return env\n"
  },
  {
    "path": "memory_maze/wrappers.py",
    "content": "\n\nfrom typing import Any, Dict, List\n\nimport dm_env\nimport numpy as np\nfrom dm_env import specs\n\n\nclass Wrapper(dm_env.Environment):\n    \"\"\"Base class for dm_env.Environment wrapper.\"\"\"\n\n    def __init__(self, env: dm_env.Environment):\n        self.env = env\n\n    def __getattr__(self, name):\n        if name.startswith('__'):\n            raise AttributeError(f'Attempted to get missing private attribute {name}')\n        return getattr(self.env, name)\n\n    def step(self, action) -> dm_env.TimeStep:\n        return self.env.step(action)\n\n    def reset(self) -> dm_env.TimeStep:\n        return self.env.reset()\n\n    def action_spec(self) -> Any:\n        return self.env.action_spec()\n\n    def discount_spec(self) -> Any:\n        return self.env.discount_spec()\n\n    def observation_spec(self) -> Any:\n        return self.env.observation_spec()\n\n    def reward_spec(self) -> Any:\n        return self.env.reward_spec()\n\n    def close(self):\n        return self.env.close()\n\n\nclass ObservationWrapper(Wrapper):\n    \"\"\"Base class for observation wrapper.\"\"\"\n\n    def observation_spec(self):\n        raise NotImplementedError\n\n    def observation(self, obs: Any) -> Any:\n        raise NotImplementedError\n\n    def step(self, action) -> dm_env.TimeStep:\n        step_type, discount, reward, observation = self.env.step(action)\n        return dm_env.TimeStep(step_type, discount, reward, self.observation(observation))\n\n    def reset(self) -> dm_env.TimeStep:\n        step_type, discount, reward, observation = self.env.reset()\n        return dm_env.TimeStep(step_type, discount, reward, self.observation(observation))\n\n\nclass RemapObservationWrapper(ObservationWrapper):\n    \"\"\"Select a subset of dictionary observation keys and rename them.\"\"\"\n\n    def __init__(self, env: dm_env.Environment, mapping: Dict[str, str]):\n        super().__init__(env)\n        self.mapping = mapping\n\n    def 
observation_spec(self):\n        spec = self.env.observation_spec()\n        assert isinstance(spec, dict)\n        return {key: spec[key_orig] for key, key_orig in self.mapping.items()}\n\n    def observation(self, obs):\n        assert isinstance(obs, dict)\n        return {key: obs[key_orig] for key, key_orig in self.mapping.items()}\n\n\nclass TargetsPositionWrapper(ObservationWrapper):\n    \"\"\"Collects and postprocesses walker/target_rel_{i} relative position vectors into\n    targets_vec (n_targets,2) tensor, and walker/target_abs_{i} absolute positions\n    into targets_pos tensor.\"\"\"\n\n    def __init__(self, env: dm_env.Environment, maze_xy_scale, maze_width, maze_height):\n        super().__init__(env)\n        self.maze_xy_scale = maze_xy_scale\n        self.center_ji = np.array([maze_width - 2.0, maze_height - 2.0]) / 2.0\n\n        spec = self.env.observation_spec()\n        assert isinstance(spec, dict)\n        assert 'walker/target_rel_0' in spec\n        assert 'walker/target_abs_0' in spec\n        assert 'target_index' in spec\n\n        i = 0\n        while f'walker/target_rel_{i}' in spec:\n            assert f'walker/target_abs_{i}' in spec\n            i += 1\n\n        self.n_targets = i\n\n    def observation_spec(self):\n        spec = self.env.observation_spec()\n        assert isinstance(spec, dict)\n        # All targets\n        spec['targets_vec'] = specs.Array((self.n_targets, 2), float, 'targets_vec')\n        spec['targets_pos'] = specs.Array((self.n_targets, 2), float, 'targets_pos')\n        # Current target\n        spec['target_vec'] = specs.Array((2,), float, 'target_vec')\n        spec['target_pos'] = specs.Array((2,), float, 'target_pos')\n        return spec\n\n    def observation(self, obs):\n        assert isinstance(obs, dict)\n        # All targets\n        x_rel = np.zeros((self.n_targets, 2))\n        x_abs = np.zeros((self.n_targets, 2))\n        for i in range(self.n_targets):\n            x_rel[i] = 
obs[f'walker/target_rel_{i}'][:2] / self.maze_xy_scale\n            x_abs[i] = obs[f'walker/target_abs_{i}'][:2] / self.maze_xy_scale + self.center_ji\n        obs['targets_vec'] = x_rel\n        obs['targets_pos'] = x_abs\n        # Current target\n        target_ix = int(obs['target_index'])\n        obs['target_vec'] = x_rel[target_ix]\n        obs['target_pos'] = x_abs[target_ix]\n        return obs\n\n\nclass AgentPositionWrapper(ObservationWrapper):\n    \"\"\"Postprocesses absolute_position and absolute_orientation.\"\"\"\n\n    def __init__(self, env: dm_env.Environment, maze_xy_scale, maze_width, maze_height):\n        super().__init__(env)\n        self.maze_xy_scale = maze_xy_scale\n        self.center_ji = np.array([maze_width - 2.0, maze_height - 2.0]) / 2.0\n\n    def observation_spec(self):\n        spec = self.env.observation_spec()\n        # absolute_position and absolute_orientation should already be generated by the environment.\n        assert isinstance(spec, dict) and 'absolute_position' in spec and 'absolute_orientation' in spec\n        # Add agent_pos, measured in grid coordinates\n        spec['agent_pos'] = specs.Array((2, ), float, 'agent_pos')\n        # Add agent_dir as 2-vector\n        spec['agent_dir'] = specs.Array((2, ), float, 'agent_dir')\n        return spec\n\n    def observation(self, obs):\n        assert isinstance(obs, dict)\n        walker_xy = obs['absolute_position'][:2]\n        walker_ji = walker_xy / self.maze_xy_scale + self.center_ji\n        # agent_pos, measured in grid coordinates, where bottom-left coordinate is (0.1,0.1),\n        # and top-right coordinate for a 15x15 maze is (14.9,14.9)\n        obs['agent_pos'] = walker_ji\n        # Pick orientation vector such, that going forward increases agent_pos in the direction of agent_dir.\n        obs['agent_dir'] = obs['absolute_orientation'][:2, 1]\n        return obs\n\n\nclass MazeLayoutWrapper(ObservationWrapper):\n    \"\"\"Postprocesses maze_layout 
observation.\"\"\"\n\n    def observation_spec(self):\n        spec = self.env.observation_spec()\n        # maze_layout should already be generated by the environment\n        assert isinstance(spec, dict) and 'maze_layout' in spec\n        # Change char array to binary array, removing outer walls\n        n, m = spec['maze_layout'].shape\n        spec['maze_layout'] = specs.BoundedArray((n - 2, m - 2), np.uint8, 0, 1, 'maze_layout')\n        return spec\n\n    def observation(self, obs):\n        assert isinstance(obs, dict)\n        maze = obs['maze_layout']\n        maze = maze[1:-1, 1:-1]  # Remove outer walls\n        maze = np.flip(maze, 0)  # Flip vertical axis so that bottom-left is at maze[0,0]\n        nonwalls = (maze == ' ') | (maze == 'P') | (maze == 'G')\n        obs['maze_layout'] = nonwalls.astype(np.uint8)\n        return obs\n\n\nclass ImageOnlyObservationWrapper(ObservationWrapper):\n    \"\"\"Select one of the dictionary observation keys as observation.\"\"\"\n\n    def __init__(self, env: dm_env.Environment, key: str = 'image'):\n        super().__init__(env)\n        self.key = key\n\n    def observation_spec(self):\n        spec = self.env.observation_spec()\n        assert isinstance(spec, dict)\n        return spec[self.key]\n\n    def observation(self, obs):\n        assert isinstance(obs, dict)\n        return obs[self.key]\n\n\nclass DiscreteActionSetWrapper(Wrapper):\n    \"\"\"Change action space from continuous to discrete with given set of action vectors.\"\"\"\n\n    def __init__(self, env: dm_env.Environment, action_set: List[np.ndarray]):\n        super().__init__(env)\n        self.action_set = action_set\n\n    def action_spec(self):\n        return specs.DiscreteArray(len(self.action_set))\n\n    def step(self, action) -> dm_env.TimeStep:\n        return self.env.step(self.action_set[action])\n\n\nclass TargetColorAsBorderWrapper(ObservationWrapper):\n    \"\"\"MemoryMaze-specific wrapper, which draws target_color as border on 
the image.\"\"\"\n\n    def observation_spec(self):\n        spec = self.env.observation_spec()\n        assert isinstance(spec, dict)\n        assert 'target_color' in spec\n        return spec\n\n    def observation(self, obs):\n        assert isinstance(obs, dict)\n        assert 'target_color' in obs and 'image' in obs\n        target_color = obs['target_color']\n        img = obs['image']\n        B = int(2 * np.sqrt(img.shape[0] // 64))\n        img[:, :B] = target_color * 255 * 0.7\n        img[:, -B:] = target_color * 255 * 0.7\n        img[:B, :] = target_color * 255 * 0.7\n        img[-B:, :] = target_color * 255 * 0.7\n        return obs\n"
  },
  {
    "path": "setup.py",
    "content": "from setuptools import setup\nimport pathlib\n\n__version__ = \"1.0.3\"\n\nsetup(\n    name=\"memory-maze\",\n    version=__version__,\n    author=\"Jurgis Pasukonis\",\n    author_email=\"jurgisp@gmail.com\",\n    url=\"https://github.com/jurgisp/memory-maze\",\n    description=\"Memory Maze is an environment to benchmark memory abilities of RL agents\",\n    long_description=pathlib.Path('README.md').read_text(),\n    long_description_content_type='text/markdown',\n    zip_safe=False,\n    python_requires=\">=3\",\n    packages=[\"memory_maze\"],\n    install_requires=[\n        'dm_control'\n    ],\n)\n"
  }
]