Repository: jurgisp/memory-maze
Branch: main
Commit: 4030901cef3b
Files: 14
Total size: 63.2 KB

Directory structure:
gitextract_h395e807/
├── .gitignore
├── LICENSE
├── README.md
├── gui/
│   ├── recording.py
│   ├── requirements.txt
│   └── run_gui.py
├── memory_maze/
│   ├── __init__.py
│   ├── gym_wrappers.py
│   ├── helpers.py
│   ├── maze.py
│   ├── oracle.py
│   ├── tasks.py
│   └── wrappers.py
└── setup.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
.*
!.gitignore
__pycache__/
*.egg-info
sandbox/
log/

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2022 jurgisp

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

================================================
FILE: README.md
================================================
**Status:** Stable release

[![PyPI](https://img.shields.io/pypi/v/memory-maze.svg)](https://pypi.python.org/pypi/memory-maze/#history)

# Memory Maze

Memory Maze is a 3D domain of randomized mazes designed for evaluating the long-term memory abilities of RL agents. Memory Maze isolates long-term memory from confounding challenges, such as exploration, and requires remembering several pieces of information: the positions of objects, the wall layout, and the agent’s own position.

| Memory 9x9 | Memory 11x11 | Memory 13x13 | Memory 15x15 |
|------------|--------------|--------------|--------------|
| ![map-9x9](https://user-images.githubusercontent.com/3135115/177040204-fbf3b558-d063-49d3-9973-ae113137782f.png) | ![map-11x11](https://user-images.githubusercontent.com/3135115/177040184-16ccb614-b897-44db-ab2c-7ae66e14c007.png) | ![map-13x13](https://user-images.githubusercontent.com/3135115/177040164-d3edb11f-de6a-4c17-bce2-38e539639f40.png) | ![map-15x15](https://user-images.githubusercontent.com/3135115/177040126-b9a0f861-b15b-492c-9216-89502e8f8ae9.png) |

Key features:
- Online RL memory tasks (with baselines)
- Offline dataset for representation learning (with baselines)
- Verified that memory is the key challenge
- Challenging but solvable by human baseline
- Easy installation via a simple pip command
- Available `gym` and `dm_env` interfaces
- Supports headless and hardware rendering
- Interactive GUI for human players
- Hidden state information for probe evaluation

Also see the accompanying research paper: [Evaluating Long-Term Memory in 3D Mazes](https://arxiv.org/abs/2210.13383)

```
@article{pasukonis2022memmaze,
  title={Evaluating Long-Term Memory in 3D Mazes},
  author={Pasukonis, Jurgis and Lillicrap, Timothy and Hafner, Danijar},
  journal={arXiv preprint arXiv:2210.13383},
  year={2022}
}
```

## Installation

Memory Maze builds on the
[`dm_control`](https://github.com/deepmind/dm_control) and [`mujoco`](https://github.com/deepmind/mujoco) packages, which are automatically installed as dependencies:

```sh
pip install memory-maze
```

## Play Yourself

Memory Maze allows you to play the levels in human mode. We used this mode for recording the human baseline scores. These are the instructions for launching the GUI:

```sh
# GUI dependencies
pip install gym pygame pillow imageio

# Launch with standard 64x64 resolution
python gui/run_gui.py

# Launch with higher 256x256 resolution
python gui/run_gui.py --env "memory_maze:MemoryMaze-9x9-HD-v0"
```

## Task Description

The task is based on a game known as scavenger hunt or treasure hunt:

- The agent starts in a randomly generated maze, which contains several objects of different colors.
- The agent is prompted to find the target object of a specific color, indicated by the border color in the observation image.
- Once the agent successfully finds and touches the correct object, it gets a +1 reward and the next random object is chosen as a target.
- If the agent touches an object of the wrong color, there is no effect.
- Throughout the episode, the maze layout and the locations of the objects do not change.
- The episode continues for a fixed amount of time, so the total episode reward equals the number of reached targets.
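
The rules above can be sketched as a toy, physics-free simulation. This is only an illustration of the reward logic, not the environment's implementation: `ToyScavengerHunt` and its `touch` method are hypothetical names. The one detail borrowed from the repository is that the next target is never the object the agent is currently touching, mirroring `_pick_new_target` in `memory_maze/maze.py`.

```python
import random


class ToyScavengerHunt:
    """Hypothetical stand-in for the task's reward rules (no maze, no physics)."""

    def __init__(self, n_targets=3, seed=0):
        self.n_targets = n_targets
        self.rng = random.Random(seed)
        self.current_target = self.rng.randrange(n_targets)
        self.total_reward = 0.0

    def touch(self, index):
        """Return the reward for touching object `index`."""
        if index != self.current_target:
            # Touching an object of the wrong color has no effect.
            return 0.0
        # Correct object: +1 reward, then pick a new target at random,
        # skipping the object that was just touched.
        candidates = [i for i in range(self.n_targets) if i != index]
        self.current_target = self.rng.choice(candidates)
        self.total_reward += 1.0
        return 1.0


game = ToyScavengerHunt(n_targets=3, seed=0)
for _ in range(5):
    game.touch(game.current_target)  # a "perfect memory" agent goes straight to the target
print(game.total_reward)
```

Because the layout and object positions stay fixed within an episode, the total reward in this sketch simply counts how many times the agent reached the requested object, which matches the last bullet above.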

An agent with long-term memory only has to explore each maze once (which is possible in a time much shorter than the length of an episode) and can afterwards follow the shortest path to each requested target, whereas an agent with no memory has to randomly wander through the maze to find each target.

There are 4 size variations of the maze. The largest maze, 15x15, is designed to be challenging but solvable for humans (see benchmark results below), yet out of reach for state-of-the-art RL methods. The smaller sizes are provided as stepping stones, with 9x9 being solvable with current RL methods.

| Size | env_id | Objects | Episode steps | Mean human score | Mean max score |
|:---------:|-----------------------|:---:|:-----:|:----:|:----:|
| **9x9** | `MemoryMaze-9x9-v0` | 3 | 1000 | 26.4 | 34.8 |
| **11x11** | `MemoryMaze-11x11-v0` | 4 | 2000 | 44.3 | 58.0 |
| **13x13** | `MemoryMaze-13x13-v0` | 5 | 3000 | 55.5 | 74.5 |
| **15x15** | `MemoryMaze-15x15-v0` | 6 | 4000 | 67.7 | 87.7 |

The mazes are generated with [labmaze](https://github.com/deepmind/labmaze), the same algorithm as used by [DmLab-30](https://github.com/deepmind/lab/tree/master/game_scripts/levels/contributed/dmlab30). The 9x9 corresponds to the [small](https://github.com/deepmind/lab/tree/master/game_scripts/levels/contributed/dmlab30#goal-locations-small) variant and 15x15 corresponds to the [large](https://github.com/deepmind/lab/tree/master/game_scripts/levels/contributed/dmlab30#goal-locations-large) variant.
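
To put the numbers in the table in perspective, the short sketch below derives two quantities from it: the fraction of the maximum score that humans reach, and the average number of steps spent per collected target. The values are copied from the table; the `BENCHMARK` dictionary and the helper names are hypothetical and exist only for this illustration.

```python
# Values copied from the size table above.
BENCHMARK = {
    '9x9':   {'objects': 3, 'steps': 1000, 'human': 26.4, 'max': 34.8},
    '11x11': {'objects': 4, 'steps': 2000, 'human': 44.3, 'max': 58.0},
    '13x13': {'objects': 5, 'steps': 3000, 'human': 55.5, 'max': 74.5},
    '15x15': {'objects': 6, 'steps': 4000, 'human': 67.7, 'max': 87.7},
}


def human_fraction(size):
    """Mean human score as a fraction of the mean max score."""
    row = BENCHMARK[size]
    return row['human'] / row['max']


def steps_per_target(size, score_key='max'):
    """Average number of environment steps per collected target."""
    row = BENCHMARK[size]
    return row['steps'] / row[score_key]


for size in BENCHMARK:
    print(f'{size:>6}: human reaches {human_fraction(size):.0%} of max, '
          f'{steps_per_target(size, "human"):.1f} steps per target (human), '
          f'{steps_per_target(size, "max"):.1f} (max)')
```

Read this way, the human baseline sits at roughly three quarters of the maximum score for every maze size.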
## Gym Interface

You can create the environment using the [Gym](https://github.com/openai/gym) interface:

```python
# Requires gym (pip install gym)
import gym

# Set this if you are getting "Unable to load EGL library" error:
# import os
# os.environ['MUJOCO_GL'] = 'glfw'

env = gym.make('memory_maze:MemoryMaze-9x9-v0')
env = gym.make('memory_maze:MemoryMaze-11x11-v0')
env = gym.make('memory_maze:MemoryMaze-13x13-v0')
env = gym.make('memory_maze:MemoryMaze-15x15-v0')
```

**Troubleshooting:** if you are getting the "Unable to load EGL library" error, that is because we enable MuJoCo headless GPU rendering (`MUJOCO_GL=egl`) by default. If you are testing locally on your machine, you can enable windowed rendering instead (`MUJOCO_GL=glfw`). [Read here](https://github.com/deepmind/dm_control#rendering) about the different rendering options.

The default environment has 64x64 image observations:

```python
>>> env.observation_space
Box(0, 255, (64, 64, 3), uint8)
```

There are 6 discrete actions:

```python
>>> env.action_space
Discrete(6)  # (noop, forward, left, right, forward_left, forward_right)
```

To create an environment with extra observations for debugging and probe analysis, append `ExtraObs` to the names:

```python
>>> env = gym.make('memory_maze:MemoryMaze-9x9-ExtraObs-v0')
>>> env.observation_space
Dict(
    agent_dir: Box(-inf, inf, (2,), float64),
    agent_pos: Box(-inf, inf, (2,), float64),
    image: Box(0, 255, (64, 64, 3), uint8),
    maze_layout: Box(0, 1, (9, 9), uint8),
    target_color: Box(-inf, inf, (3,), float64),
    target_pos: Box(-inf, inf, (2,), float64),
    target_vec: Box(-inf, inf, (2,), float64),
    targets_pos: Box(-inf, inf, (3, 2), float64),
    targets_vec: Box(-inf, inf, (3, 2), float64)
)
```

We also register [additional variants](memory_maze/__init__.py) of the environment that can be useful in certain scenarios.
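
The registered IDs follow the pattern `MemoryMaze-{size}-{variant}-v0`. As a rough sketch, the helper below builds the full ID string from a size and a variant suffix. The suffix list is transcribed from the `register(...)` calls in `memory_maze/__init__.py`; the `env_id` function itself is hypothetical and not part of the package.

```python
SIZES = ('9x9', '11x11', '13x13', '15x15')

# Variant suffixes as they appear in memory_maze/__init__.py ('' is the standard env).
VARIANTS = (
    '', 'Vis', 'HD', 'Top',
    'ExtraObs', 'ExtraObs-Vis', 'ExtraObs-Top',
    'Oracle', 'Oracle-Top', 'Oracle-ExtraObs',
    'HiFreq', 'HiFreq-Vis', 'HiFreq-HD',
    '6CL', '6CL-Top', '6CL-ExtraObs',
)


def env_id(size, variant=''):
    """Hypothetical helper: build an ID string that can be passed to gym.make()."""
    if size not in SIZES:
        raise ValueError(f'Unknown size {size!r}, expected one of {SIZES}')
    if variant not in VARIANTS:
        raise ValueError(f'Unknown variant {variant!r}, expected one of {VARIANTS}')
    middle = f'-{variant}' if variant else ''
    return f'memory_maze:MemoryMaze-{size}{middle}-v0'


print(env_id('9x9'))
print(env_id('15x15', 'ExtraObs-Top'))
```

The resulting string can be passed directly to `gym.make(...)`, for example `gym.make(env_id('9x9', 'HD'))` for the high-resolution 9x9 maze used in the GUI instructions.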
## DeepMind Interface

You can create the environment using the [dm_env](https://github.com/deepmind/dm_env) interface:

```python
from memory_maze import tasks

env = tasks.memory_maze_9x9()
env = tasks.memory_maze_11x11()
env = tasks.memory_maze_13x13()
env = tasks.memory_maze_15x15()
```

Each observation is a dictionary that includes an `image` key:

```python
>>> env.observation_spec()
{
  'image': BoundedArray(shape=(64, 64, 3), ...)
}
```

The constructor accepts a number of arguments, which can be used to tweak the environment:

```python
env = tasks.memory_maze_9x9(
    global_observables=True,
    image_only_obs=False,
    top_camera=False,
    camera_resolution=64,
    control_freq=4.0,
    discrete_actions=True,
)
```

## Offline Dataset

[**Dataset download here** (~100GB per dataset)](https://drive.google.com/drive/folders/1RcnkTZVwEHnAQeEuw7X8Y1RPSmrFLDFB)

We provide two datasets of experience collected from the Memory Maze environment: Memory Maze 9x9 (30M) and Memory Maze 15x15 (30M). Each dataset contains 30 thousand trajectories from Memory Maze 9x9 and 15x15 environments respectively, split into 29k trajectories for training and 1k for evaluation. All trajectories are 1000 steps long, so each dataset has 30M steps total.

The data is generated with a scripted policy that navigates to randomly chosen points in the maze under action noise. This choice of policy was made to generate diverse trajectories that explore the maze effectively and that form spatial loops, which can be important for learning long-term memory. We intentionally avoid recording data with a trained agent to ensure a diverse data distribution and to avoid dataset bias that could favor some methods over others. Because of this, the rewards are quite sparse in the data, occurring on average 1-2 times per trajectory.

Each trajectory is saved as an NPZ file with the following entries available:

| Key            | Shape              | Type   | Description                                   |
|----------------|--------------------|--------|-----------------------------------------------|
| `image`        | (64, 64, 3)        | uint8  | First-person view observation                 |
| `action`       | (6)                | binary | Last action, one-hot encoded                  |
| `reward`       | ()                 | float  | Last reward                                   |
| `maze_layout`  | (9, 9) or (15, 15) | binary | Maze layout (wall / no wall)                  |
| `agent_pos`    | (2)                | float  | Agent position in global coordinates          |
| `agent_dir`    | (2)                | float  | Agent orientation as a unit vector            |
| `targets_pos`  | (3, 2) or (6, 2)   | float  | Object locations in global coordinates        |
| `targets_vec`  | (3, 2) or (6, 2)   | float  | Object locations in agent-centric coordinates |
| `target_pos`   | (2)                | float  | Current target object location, global        |
| `target_vec`   | (2)                | float  | Current target object location, agent-centric |
| `target_color` | (3)                | float  | Current target object color RGB               |

You can load a trajectory using [`np.load()`](https://numpy.org/doc/stable/reference/generated/numpy.load.html) to obtain a dictionary of Numpy arrays as follows:

```python
import numpy as np

episode = np.load('trajectory.npz')
# Materialize the lazy NpzFile into a plain dictionary of arrays
episode = {key: episode[key] for key in episode.keys()}

assert episode['image'].shape == (1001, 64, 64, 3)
assert episode['image'].dtype == np.uint8
```

All tensors have a leading time dimension, e.g. the `image` tensor has shape (1001, 64, 64, 3). The tensor length is 1001 because there are 1000 steps (actions) in a trajectory: `image[0]` is the observation *before* the first action, and `image[-1]` is the observation *after* the last action.

## Online RL Baselines

In our [research paper](https://arxiv.org/abs/2210.13383), we evaluate the model-free [IMPALA](https://github.com/google-research/seed_rl/tree/master/agents/vtrace) agent and the model-based [Dreamer](https://github.com/jurgisp/pydreamer) agent as baselines.

[Figure: baselines]

[Figure: training]

Here are videos of the learned behaviors:

**Memory 9x9 - Dreamer (TBTT)**

https://user-images.githubusercontent.com/3135115/197378287-4e413440-7097-4d11-8627-3d7fac0845f1.mp4

**Memory 9x9 - IMPALA (400M)**

https://user-images.githubusercontent.com/3135115/197378929-7fe3f374-c11c-409a-8a95-03feeb489330.mp4

**Memory 15x15 - Dreamer (TBTT)**

https://user-images.githubusercontent.com/3135115/197378324-fb99b496-dba8-4b00-ad80-2d6e19ba8acd.mp4

**Memory 15x15 - IMPALA (400M)**

https://user-images.githubusercontent.com/3135115/197378936-939e7615-9dad-4765-b0ef-a49c5a38fe28.mp4

## Offline Probing Baselines

Here we visualize probe predictions alongside trajectories of the offline dataset, as explained in [the paper](https://arxiv.org/abs/2210.13383). These trajectories are from the offline dataset, where the agent just navigates to random points in the maze; it does *not* try to collect rewards.

Bottom-left: Object location predictions (x) versus the actual locations (o).

Bottom-right: Wall layout predictions (dark green = true positive, light green = true negative, light red = false positive, dark red = false negative).

**Memory 9x9 Walls Objects - RSSM (TBTT)**

https://user-images.githubusercontent.com/3135115/197379227-775ec5bc-0780-4dcc-b7f1-660bc7cf95f1.mp4

**Memory 9x9 Walls Objects - Supervised oracle**

https://user-images.githubusercontent.com/3135115/197379235-a5ea0388-2718-4035-8bbc-064ecc9ea444.mp4

**Memory 15x15 Walls Objects - RSSM (TBTT)**

https://user-images.githubusercontent.com/3135115/197379245-fb96bd12-6ef5-481e-adc6-f119a39e8e43.mp4

**Memory 15x15 Walls Objects - Supervised oracle**

https://user-images.githubusercontent.com/3135115/197379248-26a8093e-8b54-443c-b154-e33e0383b5e4.mp4

## Questions

Please [open an issue][issues] on GitHub.

[issues]: https://github.com/jurgisp/memory-maze/issues ================================================ FILE: gui/recording.py ================================================ from datetime import datetime from pathlib import Path import gym import imageio import numpy as np from PIL import Image class SaveNpzWrapper(gym.Wrapper): def __init__(self, env, log_dir, video_fps=30, video_size=256, video_format='mp4'): env = ActionRewardResetWrapper(env) env = CollectWrapper(env) super().__init__(env) self.log_dir = Path(log_dir) self.log_dir.mkdir(parents=True, exist_ok=True) self.video_fps = video_fps self.video_size = video_size self.video_format = video_format def step(self, action): obs, reward, done, info = self.env.step(action) # type: ignore data = info.get('episode') if data: ep_id = info['episode_id'] ep_reward = data['reward'].sum() ep_steps = len(data['reward']) - 1 ep_name = f'{ep_id}-r{ep_reward:.0f}-{ep_steps:04}' self._save_npz(data, self.log_dir / f'{ep_name}.npz') if self.video_format: self._save_video(data, self.log_dir / f'{ep_name}.{self.video_format}') return obs, reward, done, info def _save_npz(self, data, path): with path.open('wb') as f: np.savez_compressed(f, **data) print(f'Saved {path}', {k: v.shape for k, v in data.items()}) def _save_video(self, data, path): writer = imageio.get_writer(path, fps=self.video_fps) for frame in data['image']: img = Image.fromarray(frame) img = img.resize((self.video_size, self.video_size), resample=0) writer.append_data(np.array(img)) writer.close() print(f'Saved {path}') class CollectWrapper(gym.Wrapper): """Copied from pydreamer.envs.wrappers.""" def __init__(self, env): super().__init__(env) self.env = env self.episode = [] self.episode_id = '' def step(self, action): obs, reward, done, info = self.env.step(action) self.episode.append(obs.copy()) if done: episode = {k: np.array([t[k] for t in self.episode]) for k in self.episode[0]} info['episode'] = episode info['episode_id'] = self.episode_id return obs, 
reward, done, info def reset(self): obs = self.env.reset() self.episode = [obs.copy()] self.episode_id = datetime.now().strftime('%Y%m%dT%H%M%S') return obs class ActionRewardResetWrapper(gym.Wrapper): """Copied from pydreamer.envs.wrappers.""" def __init__(self, env, no_terminal=False): super().__init__(env) self.env = env self.no_terminal = no_terminal # Handle environments with one-hot or discrete action, but collect always as one-hot self.action_size = env.action_space.n if hasattr(env.action_space, 'n') else env.action_space.shape[0] def step(self, action): obs, reward, done, info = self.env.step(action) if isinstance(action, int): action_vec = np.zeros(self.action_size) action_vec[action] = 1.0 else: assert isinstance(action, np.ndarray) and action.shape == (self.action_size,), "Wrong one-hot action shape" action_vec = action obs['action'] = action_vec obs['reward'] = np.array(reward) obs['terminal'] = np.array(False if self.no_terminal or 'TimeLimit.truncated' in info or info.get('time_limit') else done) obs['reset'] = np.array(False) return obs, reward, done, info def reset(self): obs = self.env.reset() obs['action'] = np.zeros(self.action_size) obs['reward'] = np.array(0.0) obs['terminal'] = np.array(False) obs['reset'] = np.array(True) return obs ================================================ FILE: gui/requirements.txt ================================================ gym pygame pillow imageio imageio-ffmpeg ================================================ FILE: gui/run_gui.py ================================================ import os, sys import argparse from collections import defaultdict import gym import numpy as np import pygame import pygame.freetype from gym import spaces from PIL import Image from recording import SaveNpzWrapper if 'MUJOCO_GL' not in os.environ: if "linux" in sys.platform: os.environ['MUJOCO_GL'] = 'osmesa' # Software rendering to avoid rendering interference with pygame else: os.environ['MUJOCO_GL'] = 'glfw' # Windowed rendering 
PANEL_LEFT = 250 PANEL_RIGHT = 250 FOCUS_HACK = False RECORD_DIR = './log' K_NONE = tuple() def get_keymap(env): return { tuple(): 0, (pygame.K_UP, ): 1, (pygame.K_LEFT, ): 2, (pygame.K_RIGHT, ): 3, (pygame.K_UP, pygame.K_LEFT): 4, (pygame.K_UP, pygame.K_RIGHT): 5, } def main(): parser = argparse.ArgumentParser() parser.add_argument('--env', type=str, default='memory_maze:MemoryMaze-9x9-v0') parser.add_argument('--size', type=int, nargs=2, default=(600, 600)) parser.add_argument('--fps', type=int, default=6) parser.add_argument('--random', type=float, default=0.0) parser.add_argument('--noreset', action='store_true') parser.add_argument('--fullscreen', action='store_true') parser.add_argument('--nonoop', action='store_true', help='Pause instead of noop') parser.add_argument('--record', action='store_true') parser.add_argument('--record_mp4', action='store_true') parser.add_argument('--record_gif', action='store_true') args = parser.parse_args() render_size = args.size window_size = (render_size[0] + PANEL_LEFT + PANEL_RIGHT, render_size[1]) print(f'Creating environment: {args.env}') env = gym.make(args.env, disable_env_checker=True) if isinstance(env.observation_space, spaces.Dict): print('Observation space:') for k, v in env.observation_space.spaces.items(): # type: ignore print(f'{k:>25}: {v}') else: print(f'Observation space: {env.observation_space}') print(f'Action space: {env.action_space}') if args.record: env = SaveNpzWrapper( env, RECORD_DIR, video_format='mp4' if args.record_mp4 else 'gif' if args.record_gif else None, video_fps=args.fps * 2) keymap = get_keymap(env) steps = 0 return_ = 0.0 episode = 0 obs = env.reset() pygame.init() start_fullscreen = args.fullscreen or FOCUS_HACK screen = pygame.display.set_mode(window_size, pygame.FULLSCREEN if start_fullscreen else 0) if FOCUS_HACK and not args.fullscreen: # Hack: for some reason app window doesn't get focus when launching, so # we launch it as full screen and then exit full screen. 
pygame.display.toggle_fullscreen() clock = pygame.time.Clock() font = pygame.freetype.SysFont('Mono', 16) fontsmall = pygame.freetype.SysFont('Mono', 12) running = True paused = False speedup = False while running: # Rendering screen.fill((64, 64, 64)) # Render image observation if isinstance(obs, dict): assert 'image' in obs, 'Expecting dictionary observation with obs["image"]' image = obs['image'] # type: ignore else: assert isinstance(obs, np.ndarray) and len(obs.shape) == 3, 'Expecting image observation' image = obs image = Image.fromarray(image) image = image.resize(render_size, resample=0) image = np.array(image) surface = pygame.surfarray.make_surface(image.transpose((1, 0, 2))) screen.blit(surface, (PANEL_LEFT, 0)) # Render statistics lines = obs_to_text(obs, env, steps, return_) y = 5 for line in lines: text_surface, rect = font.render(line, (255, 255, 255)) screen.blit(text_surface, (16, y)) y += font.size + 2 # type: ignore # Render keymap help lines = keymap_to_text(keymap) y = 5 for line in lines: text_surface, rect = fontsmall.render(line, (255, 255, 255)) screen.blit(text_surface, (render_size[0] + PANEL_LEFT + 16, y)) y += fontsmall.size + 2 # type: ignore pygame.display.flip() clock.tick(args.fps if not speedup else 0) # Keyboard input pygame.event.pump() keys_down = defaultdict(bool) for event in pygame.event.get(): if event.type == pygame.QUIT: # Close running = False if event.type == pygame.KEYDOWN: keys_down[event.key] = True keys_hold = pygame.key.get_pressed() # Action keys action = keymap[K_NONE] # noop, if no keys pressed for keys, act in keymap.items(): if all(keys_hold[key] or keys_down[key] for key in keys): # The last keymap entry which has all keys pressed wins action = act # Special keys force_reset = False speedup = False if keys_down[pygame.K_ESCAPE]: # Quit running = False if keys_down[pygame.K_SPACE]: # Pause paused = not paused else: if action != keymap[K_NONE]: paused = False # unpause on action press if 
keys_down[pygame.K_BACKSPACE]: # Force reset force_reset = True if keys_hold[pygame.K_TAB]: speedup = True if paused: continue if action == keymap[K_NONE] and args.nonoop and not force_reset: continue # Environment step if args.random: if np.random.random() < args.random: action = env.action_space.sample() obs, reward, done, info = env.step(action) # type: ignore # print({k: v for k, v in obs.items() if k != 'image'}) steps += 1 return_ += reward # Episode end if reward: print(f'reward: {reward}') if done or force_reset: print(f'Episode done - length: {steps} return: {return_}') obs = env.reset() steps = 0 return_ = 0.0 episode += 1 if done and args.record: # If recording, require relaunch for next episode running = False pygame.quit() def obs_to_text(obs, env, steps, return_): kvs = [] kvs.append(('## Stats ##', '')) kvs.append(('', '')) kvs.append(('step', steps)) kvs.append(('return', return_)) lines = [f'{k:<15} {v:>5}' for k, v in kvs] return lines def keymap_to_text(keymap, verbose=False): kvs = [] kvs.append(('## Commands ##', '')) kvs.append(('', '')) # mapped actions kvs.append(('forward', 'up arrow')) kvs.append(('left', 'left arrow')) kvs.append(('right', 'right arrow')) # special actions kvs.append(('', '')) kvs.append(('reset', 'backspace')) kvs.append(('pause', 'space')) kvs.append(('speed up', 'tab')) kvs.append(('quit', 'esc')) lines = [f'{k:<15} {v}' for k, v in kvs] return lines if __name__ == '__main__': main() ================================================ FILE: memory_maze/__init__.py ================================================ import os # NOTE: Env MUJOCO_GL=egl is necessary for headless hardware rendering on GPU, # but breaks when running on a CPU machine. Alternatively set MUJOCO_GL=osmesa. if 'MUJOCO_GL' not in os.environ: os.environ['MUJOCO_GL'] = 'egl' from . 
import tasks try: # Register gym environments, if gym is available from typing import Callable from functools import partial as f import dm_env import gym from gym.envs.registration import register from .gym_wrappers import GymWrapper def _make_gym_env(dm_task: Callable[[], dm_env.Environment], **kwargs): dmenv = dm_task(**kwargs) return GymWrapper(dmenv) sizes = { '9x9': tasks.memory_maze_9x9, '11x11': tasks.memory_maze_11x11, '13x13': tasks.memory_maze_13x13, '15x15': tasks.memory_maze_15x15, } for key, dm_task in sizes.items(): # Image-only obs space register(id=f'MemoryMaze-{key}-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True)) # Standard register(id=f'MemoryMaze-{key}-Vis-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, good_visibility=True)) # Easily visible targets register(id=f'MemoryMaze-{key}-HD-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, camera_resolution=256)) # High-res camera register(id=f'MemoryMaze-{key}-Top-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, camera_resolution=256, top_camera=True)) # Top-down camera # Extra global observables (dict obs space) register(id=f'MemoryMaze-{key}-ExtraObs-v0', entry_point=f(_make_gym_env, dm_task, global_observables=True)) register(id=f'MemoryMaze-{key}-ExtraObs-Vis-v0', entry_point=f(_make_gym_env, dm_task, global_observables=True, good_visibility=True)) register(id=f'MemoryMaze-{key}-ExtraObs-Top-v0', entry_point=f(_make_gym_env, dm_task, global_observables=True, camera_resolution=256, top_camera=True)) # Oracle observables with shortest path shown register(id=f'MemoryMaze-{key}-Oracle-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, global_observables=True, show_path=True)) register(id=f'MemoryMaze-{key}-Oracle-Top-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, global_observables=True, show_path=True, camera_resolution=256, top_camera=True)) register(id=f'MemoryMaze-{key}-Oracle-ExtraObs-v0', 
entry_point=f(_make_gym_env, dm_task, global_observables=True, show_path=True)) # High control frequency register(id=f'MemoryMaze-{key}-HiFreq-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, control_freq=40)) register(id=f'MemoryMaze-{key}-HiFreq-Vis-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, control_freq=40, good_visibility=True)) register(id=f'MemoryMaze-{key}-HiFreq-HD-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, control_freq=40, camera_resolution=256)) # Six colors even for smaller mazes register(id=f'MemoryMaze-{key}-6CL-v0', entry_point=f(_make_gym_env, dm_task, randomize_colors=True, image_only_obs=True)) register(id=f'MemoryMaze-{key}-6CL-Top-v0', entry_point=f(_make_gym_env, dm_task, randomize_colors=True, image_only_obs=True, camera_resolution=256, top_camera=True)) register(id=f'MemoryMaze-{key}-6CL-ExtraObs-v0', entry_point=f(_make_gym_env, dm_task, randomize_colors=True, global_observables=True)) except ImportError: print('memory_maze: gym environments not registered.') raise ================================================ FILE: memory_maze/gym_wrappers.py ================================================ from typing import Any, Tuple import numpy as np import dm_env import gym from dm_env import specs from gym import spaces class GymWrapper(gym.Env): def __init__(self, env: dm_env.Environment): self.env = env self.action_space = _convert_to_space(env.action_spec()) self.observation_space = _convert_to_space(env.observation_spec()) def reset(self) -> Any: ts = self.env.reset() return ts.observation def step(self, action) -> Tuple[Any, float, bool, dict]: ts = self.env.step(action) assert not ts.first(), "dm_env.step() caused reset, reward will be undefined." 
assert ts.reward is not None done = ts.last() terminal = ts.last() and ts.discount == 0.0 info = {} if done and not terminal: info['TimeLimit.truncated'] = True # acme.GymWrapper understands this and converts back to dm_env.truncation() return ts.observation, ts.reward, done, info def _convert_to_space(spec: Any) -> gym.Space: # Inverse of acme.gym_wrappers._convert_to_spec if isinstance(spec, specs.DiscreteArray): return spaces.Discrete(spec.num_values) if isinstance(spec, specs.BoundedArray): return spaces.Box( shape=spec.shape, dtype=spec.dtype, low=spec.minimum.item() if len(spec.minimum.shape) == 0 else spec.minimum, high=spec.maximum.item() if len(spec.maximum.shape) == 0 else spec.maximum) if isinstance(spec, specs.Array): return spaces.Box( shape=spec.shape, dtype=spec.dtype, low=-np.inf, high=np.inf) if isinstance(spec, tuple): return spaces.Tuple(_convert_to_space(s) for s in spec) if isinstance(spec, dict): return spaces.Dict({key: _convert_to_space(value) for key, value in spec.items()}) raise ValueError(f'Unexpected spec: {spec}') ================================================ FILE: memory_maze/helpers.py ================================================ from dm_env.specs import BoundedArray, DiscreteArray import numpy as np def sample_spec(space: BoundedArray) -> np.ndarray: if isinstance(space, DiscreteArray): return np.random.randint(space.num_values, size=space.shape) if isinstance(space, BoundedArray): return np.random.uniform(space.minimum, space.maximum, size=space.shape) raise NotImplementedError ================================================ FILE: memory_maze/maze.py ================================================ from typing import Optional import functools import string import labmaze import numpy as np from dm_control import mjcf from dm_control.composer.observation import observable as observable_lib from dm_control.locomotion.arenas import covering, labmaze_textures, mazes from dm_control.locomotion.props import target_sphere from 
dm_control.locomotion.tasks import random_goal_maze from dm_control.locomotion.walkers import jumping_ball from labmaze import assets as labmaze_assets from numpy.random import RandomState DEFAULT_CONTROL_TIMESTEP = 0.025 DEFAULT_PHYSICS_TIMESTEP = 0.005 TARGET_COLORS = [ np.array([170, 38, 30]) / 220, # red np.array([99, 170, 88]) / 220, # green np.array([39, 140, 217]) / 220, # blue np.array([93, 105, 199]) / 220, # purple np.array([220, 193, 59]) / 220, # yellow np.array([220, 128, 107]) / 220, # salmon ] class RollingBallWithFriction(jumping_ball.RollingBallWithHead): def _build(self, roll_damping=5.0, steer_damping=20.0, **kwargs): super()._build(**kwargs) # Increase friction to the joints, so the movement feels more like traditional # first-person navigation control, without much acceleration/deceleration. self._mjcf_root.find('joint', 'roll').damping = roll_damping self._mjcf_root.find('joint', 'steer').damping = steer_damping class MemoryMazeTask(random_goal_maze.NullGoalMaze): # Adapted from dm_control.locomotion.tasks.RepeatSingleGoalMaze def __init__(self, walker, maze_arena, n_targets=3, target_radius=0.3, target_height_above_ground=0.0, target_reward_scale=1.0, target_randomize_colors=False, enable_global_task_observables=False, camera_resolution=64, physics_timestep=DEFAULT_PHYSICS_TIMESTEP, control_timestep=DEFAULT_CONTROL_TIMESTEP, ): super().__init__( walker=walker, maze_arena=maze_arena, randomize_spawn_position=True, randomize_spawn_rotation=True, contact_termination=False, enable_global_task_observables=enable_global_task_observables, physics_timestep=physics_timestep, control_timestep=control_timestep ) self.n_targets = n_targets self._target_radius = target_radius self._target_height_above_ground = target_height_above_ground self._target_reward_scale = target_reward_scale self._target_randomize_colors = target_randomize_colors self._targets = [] self._target_colors = list(TARGET_COLORS) # This contains all colors, not only n_targets 
self._create_targets() self._current_target_ix = 0 self._rewarded_this_step = False self._targets_obtained = 0 if enable_global_task_observables: # Add egocentric vectors to targets xpos_origin_callable = lambda phys: phys.bind(walker.root_body).xpos def _target_pos(physics, targets, index): return physics.bind(targets[index].geom).xpos for i in range(n_targets): # Absolute target position walker.observables.add_observable( f'target_abs_{i}', observable_lib.Generic(functools.partial(_target_pos, targets=self._targets, index=i)), ) # Relative target position walker.observables.add_egocentric_vector( f'target_rel_{i}', observable_lib.Generic(functools.partial(_target_pos, targets=self._targets, index=i)), origin_callable=xpos_origin_callable) self._task_observables = super().task_observables def _current_target_index(_): return self._current_target_ix def _current_target_color(_): return self._target_colors[self._current_target_ix] self._task_observables['target_index'] = observable_lib.Generic(_current_target_index) self._task_observables['target_index'].enabled = True self._task_observables['target_color'] = observable_lib.Generic(_current_target_color) self._task_observables['target_color'].enabled = True self._walker.observables.egocentric_camera.height = camera_resolution self._walker.observables.egocentric_camera.width = camera_resolution self._maze_arena.observables.top_camera.height = camera_resolution self._maze_arena.observables.top_camera.width = camera_resolution @property def task_observables(self): return self._task_observables @property def name(self): return 'memory_maze' def initialize_episode_mjcf(self, rng: RandomState): self._maze_arena.regenerate(rng) # Bypass super()._initialize_episode_mjcf(), because it ignores rng while True: if self._target_randomize_colors: # Recreate target objects with new colors self._create_targets(clear_existing=True, randomize_colors=True, rng=rng) ok = self._place_targets(rng) if not ok: # Could not place targets - 
regenerate the maze self._maze_arena.regenerate(rng) continue break self._pick_new_target(rng) def initialize_episode(self, physics, rng: RandomState): super().initialize_episode(physics, rng) self._rewarded_this_step = False self._targets_obtained = 0 def after_step(self, physics, rng: RandomState): super().after_step(physics, rng) self._rewarded_this_step = False for i, target in enumerate(self._targets): if target.activated: if i == self._current_target_ix: self._rewarded_this_step = True self._targets_obtained += 1 self._pick_new_target(rng) target.reset(physics) # Resets activated=False def should_terminate_episode(self, physics): return super().should_terminate_episode(physics) def get_reward(self, physics): if self._rewarded_this_step: return self._target_reward_scale return 0.0 def _create_targets(self, clear_existing=False, randomize_colors=False, rng: Optional[RandomState] = None): if clear_existing: while self._targets: target = self._targets.pop() target.detach() # Important to detach old targets, if creating new ones else: assert not self._targets, 'Targets already created.' 
if randomize_colors: assert rng is not None rng.shuffle(self._target_colors) for i in range(self.n_targets): color = self._target_colors[i] target = target_sphere.TargetSphere( radius=self._target_radius, height_above_ground=self._target_radius + self._target_height_above_ground, rgb1=tuple(color * 1.0), rgb2=tuple(color * 1.0), ) self._targets.append(target) self._maze_arena.attach(target) def _place_targets(self, rng: RandomState) -> bool: possible_positions = list(self._maze_arena.target_positions) rng.shuffle(possible_positions) if len(possible_positions) < len(self._targets): # Too few rooms - need to regenerate the maze return False for target, pos in zip(self._targets, possible_positions): mjcf.get_attachment_frame(target.mjcf_model).pos = pos return True def _pick_new_target(self, rng: RandomState): while True: ix = rng.randint(len(self._targets)) if self._targets[ix].activated: continue # Skip the target that the agent is touching self._current_target_ix = ix break class FixedWallTexture(labmaze_textures.WallTextures): """Selects a single texture instead of a collection to sample from.""" def _build(self, style, texture_name): labmaze_textures = labmaze_assets.get_wall_texture_paths(style) self._mjcf_root = mjcf.RootElement(model='labmaze_' + style) self._textures = [] if texture_name not in labmaze_textures: raise ValueError(f'`texture_name` should be one of {labmaze_textures.keys()}: got {texture_name}') texture_path = labmaze_textures[texture_name] self._textures.append(self._mjcf_root.asset.add( # type: ignore 'texture', type='2d', name=texture_name, file=texture_path.format(texture_name))) class FixedFloorTexture(labmaze_textures.FloorTextures): """Selects a single texture instead of a collection to sample from.""" def _build(self, style, texture_names): labmaze_textures = labmaze_assets.get_floor_texture_paths(style) self._mjcf_root = mjcf.RootElement(model='labmaze_' + style) self._textures = [] if isinstance(texture_names, str): texture_names = 
[texture_names] for texture_name in texture_names: if texture_name not in labmaze_textures: raise ValueError(f'`texture_name` should be one of {labmaze_textures.keys()}: got {texture_name}') texture_path = labmaze_textures[texture_name] self._textures.append(self._mjcf_root.asset.add( # type: ignore 'texture', type='2d', name=texture_name, file=texture_path.format(texture_name))) class MazeWithTargetsArena(mazes.MazeWithTargets): """Fork of mazes.RandomMazeWithTargets.""" def _build(self, x_cells, y_cells, xy_scale=2.0, z_height=2.0, max_rooms=4, room_min_size=3, room_max_size=5, spawns_per_room=0, targets_per_room=0, max_variations=26, simplify=True, skybox_texture=None, wall_textures=None, floor_textures=None, aesthetic='default', name='random_maze', random_seed=None): assert random_seed is not None, "Expected to be set by tasks._memory_maze()" super()._build( maze=TextMazeVaryingWalls( height=y_cells, width=x_cells, max_rooms=max_rooms, room_min_size=room_min_size, room_max_size=room_max_size, max_variations=max_variations, spawns_per_room=spawns_per_room, objects_per_room=targets_per_room, simplify=simplify, random_seed=random_seed), xy_scale=xy_scale, z_height=z_height, skybox_texture=skybox_texture, wall_textures=wall_textures, floor_textures=floor_textures, aesthetic=aesthetic, name=name) def regenerate(self, random_state): """Generates a new maze layout. Patch of MazeWithTargets.regenerate() which uses random_state. """ self._maze.regenerate() # logging.debug('GENERATED MAZE:\n%s', self._maze.entity_layer) self._find_spawn_and_target_positions() if self._text_maze_regenerated_hook: self._text_maze_regenerated_hook() # Remove old texturing planes. for geom_name in self._texturing_geom_names: del self._mjcf_root.worldbody.geom[geom_name] self._texturing_geom_names = [] # Remove old texturing materials. for material_name in self._texturing_material_names: del self._mjcf_root.asset.material[material_name] self._texturing_material_names = [] # Remove old actual-wall geoms.
self._maze_body.geom.clear() self._current_wall_texture = { wall_char: random_state.choice(wall_textures) # PATCH: use random_state for wall textures for wall_char, wall_textures in self._wall_textures.items() } for wall_char in self._wall_textures: self._make_wall_geoms(wall_char) self._make_floor_variations() def _make_floor_variations(self, build_tile_geoms_fn=None): """Fork of mazes.MazeWithTargets._make_floor_variations(). Makes the room floors different if possible, instead of sampling randomly. """ _DEFAULT_FLOOR_CHAR = '.' main_floor_texture = self._floor_textures[0] if len(self._floor_textures) > 1: room_floor_textures = self._floor_textures[1:] else: room_floor_textures = [main_floor_texture] for i_var, variation in enumerate(_DEFAULT_FLOOR_CHAR + string.ascii_uppercase): if variation not in self._maze.variations_layer: break if build_tile_geoms_fn is None: # Break the floor variation down to odd-sized tiles. tiles = covering.make_walls(self._maze.variations_layer, wall_char=variation, make_odd_sized_walls=True) else: tiles = build_tile_geoms_fn(wall_char=variation) if variation == _DEFAULT_FLOOR_CHAR: variation_texture = main_floor_texture else: variation_texture = room_floor_textures[i_var % len(room_floor_textures)] for i, tile in enumerate(tiles): tile_mid = covering.GridCoordinates( (tile.start.y + tile.end.y - 1) / 2, (tile.start.x + tile.end.x - 1) / 2) tile_pos = np.array([(tile_mid.x - self._x_offset) * self._xy_scale, -(tile_mid.y - self._y_offset) * self._xy_scale, 0.0]) tile_size = np.array([(tile.end.x - tile_mid.x - 0.5) * self._xy_scale, (tile.end.y - tile_mid.y - 0.5) * self._xy_scale, self._xy_scale]) if variation == _DEFAULT_FLOOR_CHAR: tile_name = 'floor_{}'.format(i) else: tile_name = 'floor_{}_{}'.format(variation, i) self._tile_geom_names[tile.start] = tile_name self._texturing_material_names.append(tile_name) self._texturing_geom_names.append(tile_name) material = self._mjcf_root.asset.add( 'material', name=tile_name, 
texture=variation_texture, texrepeat=(2 * tile_size[[0, 1]] / self._xy_scale)) self._mjcf_root.worldbody.add( 'geom', name=tile_name, type='plane', material=material, pos=tile_pos, size=tile_size, contype=0, conaffinity=0) class TextMazeVaryingWalls(labmaze.RandomMaze): """Augments standard generated labmaze with some walls marked with different chars.""" def regenerate(self): super().regenerate() self._block_variations() def _block_variations(self): nblocks = 3 wall_chars = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'] n = self.entity_layer.shape[0] ivar = 0 for i in range(nblocks): for j in range(nblocks): i_from = i * n // nblocks i_to = (i + 1) * n // nblocks j_from = j * n // nblocks j_to = (j + 1) * n // nblocks self._change_block_char(i_from, i_to, j_from, j_to, wall_chars[ivar]) ivar += 1 def _change_block_char(self, i1, i2, j1, j2, char): grid = self.entity_layer i, j = np.where(grid[i1:i2, j1:j2] == '*') grid[i + i1, j + j1] = char ================================================ FILE: memory_maze/oracle.py ================================================ from collections import deque from typing import List, Optional, Tuple import numpy as np from memory_maze.wrappers import ObservationWrapper class PathToTargetWrapper(ObservationWrapper): """Find shortest path to target and indicate it on maze_layout. 
Used for Oracle.""" def observation_spec(self): spec = self.env.observation_spec() assert isinstance(spec, dict) assert 'agent_pos' in spec assert 'target_pos' in spec assert 'maze_layout' in spec return spec def observation(self, obs): assert isinstance(obs, dict) # Find shortest path (in gridworld) from agent to target maze = obs['maze_layout'] start = tuple(obs['agent_pos'].astype(int)) finish = tuple(obs['target_pos'].astype(int)) path = breadth_first_search(maze, start, finish) if path: for x, y in path: maze[y, x] = 2 # Update maze_layout observation return obs class DrawMinimapWrapper(ObservationWrapper): """Show maze_layout as minimap in image observation. Used for Oracle.""" def observation_spec(self): spec = self.env.observation_spec() assert isinstance(spec, dict) assert 'maze_layout' in spec assert 'image' in spec assert 'agent_pos' in spec assert 'agent_dir' in spec return spec def observation(self, obs): from PIL import Image assert isinstance(obs, dict) maze = obs['maze_layout'] x, y = obs['agent_pos'] dx, dy = obs['agent_dir'] angle = np.arctan2(dx, dy) N = maze.shape[0] SIZE = N * 2 # Draw map map = np.zeros((N, N, 3), np.uint8) # walls in black map[:, :] += (maze == 1)[..., None] * np.array([[[255, 255, 255]]], np.uint8) # corridors in white map[:, :] += (maze == 2)[..., None] * np.array([[[0, 255, 0]]], np.uint8) # path in green map[int(y), int(x)] = np.array([255, 0, 0], np.uint8) # agent in red map = np.flip(map, 0) # Scale, translate, rotate mapimg = Image.fromarray(map) mapimg = mapimg.resize((SIZE, SIZE), resample=0) tx = (x - N / 2) / N * SIZE ty = - (y - N / 2) / N * SIZE mapimg = mapimg.transform(mapimg.size, 0, (1, 0, tx, 0, 1, ty), resample=0) mapimg = mapimg.rotate(angle / np.pi * 180, resample=0) # Overlay minimap onto observation image top-right corner img = obs['image'] img[:SIZE, -SIZE:] = img[:SIZE, -SIZE:] // 2 + np.array(mapimg) // 2 return obs def breadth_first_search(maze: np.ndarray, start: Tuple[int, int], finish: Tuple[int, int]) ->
Optional[List[Tuple[int, int]]]: h, w = maze.shape queue = deque() visited = np.zeros(maze.shape, dtype=bool) backtrace = np.zeros(maze.shape + (2,), dtype=int) xs, ys = start queue.append((xs, ys)) visited[ys, xs] = True while len(queue) > 0: x, y = queue.popleft() for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]: x1 = x + dx y1 = y + dy if 0 <= x1 < w and 0 <= y1 < h and maze[y1, x1] and not visited[y1, x1]: queue.append((x1, y1)) visited[y1, x1] = True backtrace[y1, x1, :] = np.array([x, y]) if (x1, y1) == finish: break xf, yf = finish if not visited[yf, xf]: return None path = [] path.append((xf, yf)) while (xf, yf) != start: xf, yf = backtrace[yf, xf] path.append((xf, yf)) path.reverse() return path ================================================ FILE: memory_maze/tasks.py ================================================ import numpy as np from dm_control import composer from dm_control.locomotion.arenas import labmaze_textures from memory_maze.maze import * from memory_maze.oracle import DrawMinimapWrapper, PathToTargetWrapper from memory_maze.wrappers import * # Slow control (4Hz), so that agent without HRL has a chance. # Native control would be ~20Hz, so this corresponds roughly to action_repeat=5. 
DEFAULT_CONTROL_FREQ = 4.0 def memory_maze_9x9(**kwargs): """ Maze based on DMLab30-explore_goal_locations_small { mazeHeight = 11, # with outer walls mazeWidth = 11, roomCount = 4, roomMaxSize = 5, roomMinSize = 3, } """ return _memory_maze(9, 3, 250, **kwargs) def memory_maze_11x11(**kwargs): return _memory_maze(11, 4, 500, **kwargs) def memory_maze_13x13(**kwargs): return _memory_maze(13, 5, 750, **kwargs) def memory_maze_15x15(**kwargs): """ Maze based on DMLab30-explore_goal_locations_large { mazeHeight = 17, # with outer walls mazeWidth = 17, roomCount = 9, roomMaxSize = 3, roomMinSize = 3, } """ return _memory_maze(15, 6, 1000, max_rooms=9, room_max_size=3, **kwargs) def _memory_maze( maze_size, # measured without exterior walls n_targets, time_limit, max_rooms=6, room_min_size=3, room_max_size=5, control_freq=DEFAULT_CONTROL_FREQ, discrete_actions=True, image_only_obs=False, target_color_in_image=True, global_observables=False, top_camera=False, good_visibility=False, show_path=False, camera_resolution=64, seed=None, randomize_colors=False, ): random_state = np.random.RandomState(seed) walker = RollingBallWithFriction(camera_height=0.3, add_ears=top_camera) arena = MazeWithTargetsArena( x_cells=maze_size + 2, # inner size => outer size y_cells=maze_size + 2, xy_scale=2.0, z_height=1.5 if not good_visibility else 0.4, max_rooms=max_rooms, room_min_size=room_min_size, room_max_size=room_max_size, spawns_per_room=1, targets_per_room=1, floor_textures=FixedFloorTexture('style_01', ['blue', 'blue_bright']), wall_textures=dict({ '*': FixedWallTexture('style_01', 'yellow'), # default wall }, **{str(i): labmaze_textures.WallTextures('style_01') for i in range(10)} # variations ), skybox_texture=None, random_seed=random_state.randint(2147483648), ) task = MemoryMazeTask( walker=walker, maze_arena=arena, n_targets=n_targets, target_radius=0.6, target_height_above_ground=0.5 if good_visibility else -0.6, enable_global_task_observables=True, # Always add to underlying
env, but not always expose in RemapObservationWrapper control_timestep=1.0 / control_freq, camera_resolution=camera_resolution, target_randomize_colors=randomize_colors, ) if top_camera: task.observables['top_camera'].enabled = True env = composer.Environment( time_limit=time_limit - 1e-3, # subtract epsilon to make sure ep_length=time_limit*fps task=task, random_state=random_state, strip_singleton_obs_buffer_dim=True) obs_mapping = { 'image': 'walker/egocentric_camera' if not top_camera else 'top_camera', 'target_color': 'target_color', } if global_observables: env = TargetsPositionWrapper(env, task._maze_arena.xy_scale, task._maze_arena.maze.width, task._maze_arena.maze.height) env = AgentPositionWrapper(env, task._maze_arena.xy_scale, task._maze_arena.maze.width, task._maze_arena.maze.height) env = MazeLayoutWrapper(env) obs_mapping = dict(obs_mapping, **{ 'agent_pos': 'agent_pos', 'agent_dir': 'agent_dir', 'targets_vec': 'targets_vec', 'targets_pos': 'targets_pos', 'target_vec': 'target_vec', 'target_pos': 'target_pos', 'maze_layout': 'maze_layout', }) env = RemapObservationWrapper(env, obs_mapping) if target_color_in_image: env = TargetColorAsBorderWrapper(env) if show_path: env = PathToTargetWrapper(env) env = DrawMinimapWrapper(env) if image_only_obs: assert target_color_in_image, 'Image-only observation only makes sense with target_color_in_image' env = ImageOnlyObservationWrapper(env) if discrete_actions: env = DiscreteActionSetWrapper(env, [ np.array([0.0, 0.0]), # noop np.array([-1.0, 0.0]), # forward np.array([0.0, -1.0]), # left np.array([0.0, +1.0]), # right np.array([-1.0, -1.0]), # forward + left np.array([-1.0, +1.0]), # forward + right ]) return env ================================================ FILE: memory_maze/wrappers.py ================================================ from typing import Any, Dict, List import dm_env import numpy as np from dm_env import specs class Wrapper(dm_env.Environment): """Base class for dm_env.Environment 
wrapper.""" def __init__(self, env: dm_env.Environment): self.env = env def __getattr__(self, name): if name.startswith('__'): raise AttributeError(f'Attempted to get missing private attribute {name}') return getattr(self.env, name) def step(self, action) -> dm_env.TimeStep: return self.env.step(action) def reset(self) -> dm_env.TimeStep: return self.env.reset() def action_spec(self) -> Any: return self.env.action_spec() def discount_spec(self) -> Any: return self.env.discount_spec() def observation_spec(self) -> Any: return self.env.observation_spec() def reward_spec(self) -> Any: return self.env.reward_spec() def close(self): return self.env.close() class ObservationWrapper(Wrapper): """Base class for observation wrapper.""" def observation_spec(self): raise NotImplementedError def observation(self, obs: Any) -> Any: raise NotImplementedError def step(self, action) -> dm_env.TimeStep: step_type, reward, discount, observation = self.env.step(action) return dm_env.TimeStep(step_type, reward, discount, self.observation(observation)) def reset(self) -> dm_env.TimeStep: step_type, reward, discount, observation = self.env.reset() return dm_env.TimeStep(step_type, reward, discount, self.observation(observation)) class RemapObservationWrapper(ObservationWrapper): """Select a subset of dictionary observation keys and rename them.""" def __init__(self, env: dm_env.Environment, mapping: Dict[str, str]): super().__init__(env) self.mapping = mapping def observation_spec(self): spec = self.env.observation_spec() assert isinstance(spec, dict) return {key: spec[key_orig] for key, key_orig in self.mapping.items()} def observation(self, obs): assert isinstance(obs, dict) return {key: obs[key_orig] for key, key_orig in self.mapping.items()} class TargetsPositionWrapper(ObservationWrapper): """Collects and postprocesses walker/target_rel_{i} relative position vectors into targets_vec (n_targets,2) tensor, and walker/target_abs_{i} absolute positions into targets_pos tensor.""" def
__init__(self, env: dm_env.Environment, maze_xy_scale, maze_width, maze_height): super().__init__(env) self.maze_xy_scale = maze_xy_scale self.center_ji = np.array([maze_width - 2.0, maze_height - 2.0]) / 2.0 spec = self.env.observation_spec() assert isinstance(spec, dict) assert 'walker/target_rel_0' in spec assert 'walker/target_abs_0' in spec assert 'target_index' in spec i = 0 while f'walker/target_rel_{i}' in spec: assert f'walker/target_abs_{i}' in spec i += 1 self.n_targets = i def observation_spec(self): spec = self.env.observation_spec() assert isinstance(spec, dict) # All targets spec['targets_vec'] = specs.Array((self.n_targets, 2), float, 'targets_vec') spec['targets_pos'] = specs.Array((self.n_targets, 2), float, 'targets_pos') # Current target spec['target_vec'] = specs.Array((2,), float, 'target_vec') spec['target_pos'] = specs.Array((2,), float, 'target_pos') return spec def observation(self, obs): assert isinstance(obs, dict) # All targets x_rel = np.zeros((self.n_targets, 2)) x_abs = np.zeros((self.n_targets, 2)) for i in range(self.n_targets): x_rel[i] = obs[f'walker/target_rel_{i}'][:2] / self.maze_xy_scale x_abs[i] = obs[f'walker/target_abs_{i}'][:2] / self.maze_xy_scale + self.center_ji obs['targets_vec'] = x_rel obs['targets_pos'] = x_abs # Current target target_ix = int(obs['target_index']) obs['target_vec'] = x_rel[target_ix] obs['target_pos'] = x_abs[target_ix] return obs class AgentPositionWrapper(ObservationWrapper): """Postprocesses absolute_position and absolute_orientation.""" def __init__(self, env: dm_env.Environment, maze_xy_scale, maze_width, maze_height): super().__init__(env) self.maze_xy_scale = maze_xy_scale self.center_ji = np.array([maze_width - 2.0, maze_height - 2.0]) / 2.0 def observation_spec(self): spec = self.env.observation_spec() # absolute_position and absolute_orientation should already be generated by the environment. 
assert isinstance(spec, dict) and 'absolute_position' in spec and 'absolute_orientation' in spec # Add agent_pos, measured in grid coordinates spec['agent_pos'] = specs.Array((2, ), float, 'agent_pos') # Add agent_dir as 2-vector spec['agent_dir'] = specs.Array((2, ), float, 'agent_dir') return spec def observation(self, obs): assert isinstance(obs, dict) walker_xy = obs['absolute_position'][:2] walker_ji = walker_xy / self.maze_xy_scale + self.center_ji # agent_pos, measured in grid coordinates, where bottom-left coordinate is (0.1,0.1), # and top-right coordinate for a 15x15 maze is (14.9,14.9) obs['agent_pos'] = walker_ji # Pick the orientation vector such that going forward moves agent_pos in the direction of agent_dir. obs['agent_dir'] = obs['absolute_orientation'][:2, 1] return obs class MazeLayoutWrapper(ObservationWrapper): """Postprocesses maze_layout observation.""" def observation_spec(self): spec = self.env.observation_spec() # maze_layout should already be generated by the environment assert isinstance(spec, dict) and 'maze_layout' in spec # Change char array to binary array, removing outer walls n, m = spec['maze_layout'].shape spec['maze_layout'] = specs.BoundedArray((n - 2, m - 2), np.uint8, 0, 1, 'maze_layout') return spec def observation(self, obs): assert isinstance(obs, dict) maze = obs['maze_layout'] maze = maze[1:-1, 1:-1] # Remove outer walls maze = np.flip(maze, 0) # Flip vertical axis so that bottom-left is at maze[0,0] nonwalls = (maze == ' ') | (maze == 'P') | (maze == 'G') obs['maze_layout'] = nonwalls.astype(np.uint8) return obs class ImageOnlyObservationWrapper(ObservationWrapper): """Select one of the dictionary observation keys as observation.""" def __init__(self, env: dm_env.Environment, key: str = 'image'): super().__init__(env) self.key = key def observation_spec(self): spec = self.env.observation_spec() assert isinstance(spec, dict) return spec[self.key] def observation(self, obs): assert isinstance(obs, dict) return
obs[self.key] class DiscreteActionSetWrapper(Wrapper): """Change action space from continuous to discrete with given set of action vectors.""" def __init__(self, env: dm_env.Environment, action_set: List[np.ndarray]): super().__init__(env) self.action_set = action_set def action_spec(self): return specs.DiscreteArray(len(self.action_set)) def step(self, action) -> dm_env.TimeStep: return self.env.step(self.action_set[action]) class TargetColorAsBorderWrapper(ObservationWrapper): """MemoryMaze-specific wrapper, which draws target_color as border on the image.""" def observation_spec(self): spec = self.env.observation_spec() assert isinstance(spec, dict) assert 'target_color' in spec return spec def observation(self, obs): assert isinstance(obs, dict) assert 'target_color' in obs and 'image' in obs target_color = obs['target_color'] img = obs['image'] B = int(2 * np.sqrt(img.shape[0] // 64)) img[:, :B] = target_color * 255 * 0.7 img[:, -B:] = target_color * 255 * 0.7 img[:B, :] = target_color * 255 * 0.7 img[-B:, :] = target_color * 255 * 0.7 return obs ================================================ FILE: setup.py ================================================ from setuptools import setup import pathlib __version__ = "1.0.3" setup( name="memory-maze", version=__version__, author="Jurgis Pasukonis", author_email="jurgisp@gmail.com", url="https://github.com/jurgisp/memory-maze", description="Memory Maze is an environment to benchmark memory abilities of RL agents", long_description=pathlib.Path('README.md').read_text(), long_description_content_type='text/markdown', zip_safe=False, python_requires=">=3", packages=["memory_maze"], install_requires=[ 'dm_control' ], )