Repository: jurgisp/memory-maze
Branch: main
Commit: 4030901cef3b
Files: 14
Total size: 63.2 KB
Directory structure:
gitextract_h395e807/
├── .gitignore
├── LICENSE
├── README.md
├── gui/
│ ├── recording.py
│ ├── requirements.txt
│ └── run_gui.py
├── memory_maze/
│ ├── __init__.py
│ ├── gym_wrappers.py
│ ├── helpers.py
│ ├── maze.py
│ ├── oracle.py
│ ├── tasks.py
│ └── wrappers.py
└── setup.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
.*
!.gitignore
__pycache__/
*.egg-info
sandbox/
log/
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2022 jurgisp
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
**Status:** Stable release
[](https://pypi.python.org/pypi/memory-maze/#history)
# Memory Maze
Memory Maze is a 3D domain of randomized mazes designed for evaluating the long-term memory abilities of RL agents. Memory Maze isolates long-term memory from confounding challenges, such as exploration, and requires remembering several pieces of information: the positions of objects, the wall layout, and keeping track of agent’s own position.
| Memory 9x9 | Memory 11x11 | Memory 13x13 | Memory 15x15 |
|------------|--------------|--------------|--------------|
|  |  |  |  |
Key features:
- Online RL memory tasks (with baselines)
- Offline dataset for representation learning (with baselines)
- Verified that memory is the key challenge
- Challenging but solvable by human baseline
- Easy installation via a simple pip command
- Available `gym` and `dm_env` interfaces
- Supports headless and hardware rendering
- Interactive GUI for human players
- Hidden state information for probe evaluation
Also see the accompanying research paper: [Evaluating Long-Term Memory in 3D Mazes](https://arxiv.org/abs/2210.13383)
```
@article{pasukonis2022memmaze,
title={Evaluating Long-Term Memory in 3D Mazes},
author={Pasukonis, Jurgis and Lillicrap, Timothy and Hafner, Danijar},
journal={arXiv preprint arXiv:2210.13383},
year={2022}
}
```
## Installation
Memory Maze builds on the [`dm_control`](https://github.com/deepmind/dm_control) and [`mujoco`](https://github.com/deepmind/mujoco) packages, which are automatically installed as dependencies:
```sh
pip install memory-maze
```
## Play Yourself
Memory Maze allows you to play the levels in human mode. We used this mode for recording the human baseline scores. These are the instructions for launching the GUI:
```sh
# GUI dependencies
pip install gym pygame pillow imageio
# Launch with standard 64x64 resolution
python gui/run_gui.py
# Launch with higher 256x256 resolution
python gui/run_gui.py --env "memory_maze:MemoryMaze-9x9-HD-v0"
```
## Task Description
The task is based on a game known as scavenger hunt or treasure hunt:
- The agent starts in a randomly generated maze, which contains several objects of different colors.
- The agent is prompted to find the target object of a specific color, indicated by the border color in the observation image.
- Once the agent successfully finds and touches the correct object, it gets a +1 reward and the next random object is chosen as a target.
- If the agent touches an object of the wrong color, there is no effect.
- Throughout the episode, the maze layout and the locations of the objects do not change.
- The episode continues for a fixed amount of time, so the total episode reward equals the number of reached targets.

An agent with long-term memory only has to explore each maze once (which is possible in a time much shorter than the length of an episode) and can afterwards follow the shortest path to each requested target, whereas an agent with no memory has to randomly wander through the maze to find each target.
There are 4 size variations of the maze. The largest maze 15x15 is designed to be challenging but solvable for humans (see benchmark results below), but out of reach for the state-of-the-art RL methods. The smaller sizes are provided as stepping stones, with 9x9 being solvable with current RL methods.
| Size | env_id | Objects | Episode steps | Mean human score | Mean max score |
|:---------:|-----------------------|:---:|:-----:|:----:|:----:|
| **9x9** | `MemoryMaze-9x9-v0` | 3 | 1000 | 26.4 | 34.8 |
| **11x11** | `MemoryMaze-11x11-v0` | 4 | 2000 | 44.3 | 58.0 |
| **13x13** | `MemoryMaze-13x13-v0` | 5 | 3000 | 55.5 | 74.5 |
| **15x15** | `MemoryMaze-15x15-v0` | 6 | 4000 | 67.7 | 87.7 |
The mazes are generated with [labmaze](https://github.com/deepmind/labmaze), the same algorithm as used by [DmLab-30](https://github.com/deepmind/lab/tree/master/game_scripts/levels/contributed/dmlab30). The 9x9 corresponds to the [small](https://github.com/deepmind/lab/tree/master/game_scripts/levels/contributed/dmlab30#goal-locations-small) variant and 15x15 corresponds to the [large](https://github.com/deepmind/lab/tree/master/game_scripts/levels/contributed/dmlab30#goal-locations-large) variant.
## Gym Interface
You can create the environment using the [Gym](https://github.com/openai/gym) interface:
```python
!pip install gym
import gym
# Set this if you are getting "Unable to load EGL library" error:
# os.environ['MUJOCO_GL'] = 'glfw'
env = gym.make('memory_maze:MemoryMaze-9x9-v0')
env = gym.make('memory_maze:MemoryMaze-11x11-v0')
env = gym.make('memory_maze:MemoryMaze-13x13-v0')
env = gym.make('memory_maze:MemoryMaze-15x15-v0')
```
**Troubleshooting:** if you are getting "Unable to load EGL library error", that is because we enable MuJoCo headless GPU rendering (`MUJOCO_GL=egl`) by default. If you are testing locally on your machine, you can enable windowed rendering instead (`MUJOCO_GL=glfw`). [Read here](https://github.com/deepmind/dm_control#rendering) about the different rendering options.
The default environment has 64x64 image observations:
```python
>>> env.observation_space
Box(0, 255, (64, 64, 3), uint8)
```
There are 6 discrete actions:
```python
>>> env.action_space
Discrete(6) # (noop, forward, left, right, forward_left, forward_right)
```
To create an environment with extra observations for debugging and probe analysis, append `ExtraObs` to the names:
```python
>>> env = gym.make('memory_maze:MemoryMaze-9x9-ExtraObs-v0')
>>> env.observation_space
Dict(
agent_dir: Box(-inf, inf, (2,), float64),
agent_pos: Box(-inf, inf, (2,), float64),
image: Box(0, 255, (64, 64, 3), uint8),
maze_layout: Box(0, 1, (9, 9), uint8),
target_color: Box(-inf, inf, (3,), float64),
target_pos: Box(-inf, inf, (2,), float64),
target_vec: Box(-inf, inf, (2,), float64),
targets_pos: Box(-inf, inf, (3, 2), float64),
targets_vec: Box(-inf, inf, (3, 2), float64)
)
```
We also register [additional variants](memory_maze/__init__.py) of the environment that can be useful in certain scenarios.
## DeepMind Interface
You can create the environment using the [dm_env](https://github.com/deepmind/dm_env) interface:
```python
from memory_maze import tasks
env = tasks.memory_maze_9x9()
env = tasks.memory_maze_11x11()
env = tasks.memory_maze_13x13()
env = tasks.memory_maze_15x15()
```
Each observation is a dictionary that includes `image` key:
```python
>>> env.observation_spec()
{
'image': BoundedArray(shape=(64, 64, 3), ...)
}
```
The constructor accepts a number of arguments, which can be used to tweak the environment:
```python
env = tasks.memory_maze_9x9(
global_observables=True,
image_only_obs=False,
top_camera=False,
camera_resolution=64,
control_freq=4.0,
discrete_actions=True,
)
```
## Offline Dataset
[**Dataset download here** (~100GB per dataset)](https://drive.google.com/drive/folders/1RcnkTZVwEHnAQeEuw7X8Y1RPSmrFLDFB)
We provide two datasets of experience collected from the Memory Maze environment: Memory Maze 9x9 (30M) and Memory Maze 15x15 (30M). Each dataset contains 30 thousand trajectories from Memory Maze 9x9 and 15x15 environments respectively, split into 29k trajectories for training and 1k for evaluation. All trajectories are 1000 steps long, so each dataset has 30M steps total.
The data is generated with a scripted policy that navigates to randomly chosen points in the maze under action noise. This choice of policy was made to generate diverse trajectories that explore the maze effectively and that form spatial loops, which can be important for learning long-term memory. We intentionally avoid recording data with a trained agent to ensure a diverse data distribution and to avoid dataset bias that could favor some methods over others. Because of this, the rewards are quite sparse in the data, occurring on average 1-2 times per trajectory.
Each trajectory is saved as an NPZ file with the following entries available:
| Key | Shape | Type | Description |
|----------------|--------------------|--------|-----------------------------------------------|
| `image` | (64, 64, 3) | uint8 | First-person view observation |
| `action` | (6) | binary | Last action, one-hot encoded |
| `reward` | () | float | Last reward |
| `maze_layout` | (9, 9) or (15, 15) | binary | Maze layout (wall / no wall) |
| `agent_pos` | (2) | float | Agent position in global coordinates |
| `agent_dir` | (2) | float | Agent orientation as a unit vector |
| `targets_pos` | (3, 2) or (6, 2) | float | Object locations in global coordinates |
| `targets_vec` | (3, 2) or (6, 2) | float | Object locations in agent-centric coordinates |
| `target_pos` | (2) | float | Current target object location, global |
| `target_vec` | (2) | float | Current target object location, agent-centric |
| `target_color` | (3) | float | Current target object color RGB |
You can load a trajectory using [`np.load()`](https://numpy.org/doc/stable/reference/generated/numpy.load.html) to obtain a dictionary of Numpy arrays as follows:
```python
episode = np.load('trajectory.npz')
episode = {key: episode[key] for key in episode.keys()}
assert episode['image'].shape == (1001, 64, 64, 3)
assert episode['image'].dtype == np.uint8
```
All tensors have a leading time dimension, e.g. `image` tensor has shape (1001, 64, 64, 3). The tensor length is 1001 because there are 1000 steps (actions) in a trajectory, `image[0]` is the observation *before* the first action, and `image[-1]` is the observation *after* the last action.
## Online RL Baselines
In our [research paper](https://arxiv.org/abs/2210.13383), we evaluate the model-free [IMPALA](https://github.com/google-research/seed_rl/tree/master/agents/vtrace) agent and the model-based [Dreamer](https://github.com/jurgisp/pydreamer) agent as baselines.
Here are videos of the learned behaviors:
**Memory 9x9 - Dreamer (TBTT)**
https://user-images.githubusercontent.com/3135115/197378287-4e413440-7097-4d11-8627-3d7fac0845f1.mp4
**Memory 9x9 - IMPALA (400M)**
https://user-images.githubusercontent.com/3135115/197378929-7fe3f374-c11c-409a-8a95-03feeb489330.mp4
**Memory 15x15 - Dreamer (TBTT)**
https://user-images.githubusercontent.com/3135115/197378324-fb99b496-dba8-4b00-ad80-2d6e19ba8acd.mp4
**Memory 15x15 - IMPALA (400M)**
https://user-images.githubusercontent.com/3135115/197378936-939e7615-9dad-4765-b0ef-a49c5a38fe28.mp4
## Offline Probing Baselines
Here we visualize probe predictions alongside trajectories of the offline dataset, as explained in [the paper](https://arxiv.org/abs/2210.13383). These trajectories are from the offline dataset, where the agent just navigates to random points in the maze, it does *not* try to collect rewards.
Bottom-left: Object location predictions (x) versus the actual locations (o).
Bottom-right: Wall layout predictions (dark green = true positive, light green = true negative, light red = false positive, dark red = false negative).
**Memory 9x9 Walls Objects - RSSM (TBTT)**
https://user-images.githubusercontent.com/3135115/197379227-775ec5bc-0780-4dcc-b7f1-660bc7cf95f1.mp4
**Memory 9x9 Walls Objects - Supervised oracle**
https://user-images.githubusercontent.com/3135115/197379235-a5ea0388-2718-4035-8bbc-064ecc9ea444.mp4
**Memory 15x15 Walls Objects - RSSM (TBTT)**
https://user-images.githubusercontent.com/3135115/197379245-fb96bd12-6ef5-481e-adc6-f119a39e8e43.mp4
**Memory 15x15 Walls Objects - Supervised oracle**
https://user-images.githubusercontent.com/3135115/197379248-26a8093e-8b54-443c-b154-e33e0383b5e4.mp4
## Questions
Please [open an issue][issues] on Github.
[issues]: https://github.com/jurgisp/memory-maze/issues
================================================
FILE: gui/recording.py
================================================
from datetime import datetime
from pathlib import Path
import gym
import imageio
import numpy as np
from PIL import Image
class SaveNpzWrapper(gym.Wrapper):
def __init__(self, env, log_dir, video_fps=30, video_size=256, video_format='mp4'):
env = ActionRewardResetWrapper(env)
env = CollectWrapper(env)
super().__init__(env)
self.log_dir = Path(log_dir)
self.log_dir.mkdir(parents=True, exist_ok=True)
self.video_fps = video_fps
self.video_size = video_size
self.video_format = video_format
def step(self, action):
obs, reward, done, info = self.env.step(action) # type: ignore
data = info.get('episode')
if data:
ep_id = info['episode_id']
ep_reward = data['reward'].sum()
ep_steps = len(data['reward']) - 1
ep_name = f'{ep_id}-r{ep_reward:.0f}-{ep_steps:04}'
self._save_npz(data, self.log_dir / f'{ep_name}.npz')
if self.video_format:
self._save_video(data, self.log_dir / f'{ep_name}.{self.video_format}')
return obs, reward, done, info
def _save_npz(self, data, path):
with path.open('wb') as f:
np.savez_compressed(f, **data)
print(f'Saved {path}', {k: v.shape for k, v in data.items()})
def _save_video(self, data, path):
writer = imageio.get_writer(path, fps=self.video_fps)
for frame in data['image']:
img = Image.fromarray(frame)
img = img.resize((self.video_size, self.video_size), resample=0)
writer.append_data(np.array(img))
writer.close()
print(f'Saved {path}')
class CollectWrapper(gym.Wrapper):
"""Copied from pydreamer.envs.wrappers."""
def __init__(self, env):
super().__init__(env)
self.env = env
self.episode = []
self.episode_id = ''
def step(self, action):
obs, reward, done, info = self.env.step(action)
self.episode.append(obs.copy())
if done:
episode = {k: np.array([t[k] for t in self.episode]) for k in self.episode[0]}
info['episode'] = episode
info['episode_id'] = self.episode_id
return obs, reward, done, info
def reset(self):
obs = self.env.reset()
self.episode = [obs.copy()]
self.episode_id = datetime.now().strftime('%Y%m%dT%H%M%S')
return obs
class ActionRewardResetWrapper(gym.Wrapper):
"""Copied from pydreamer.envs.wrappers."""
def __init__(self, env, no_terminal=False):
super().__init__(env)
self.env = env
self.no_terminal = no_terminal
# Handle environments with one-hot or discrete action, but collect always as one-hot
self.action_size = env.action_space.n if hasattr(env.action_space, 'n') else env.action_space.shape[0]
def step(self, action):
obs, reward, done, info = self.env.step(action)
if isinstance(action, int):
action_vec = np.zeros(self.action_size)
action_vec[action] = 1.0
else:
assert isinstance(action, np.ndarray) and action.shape == (self.action_size,), "Wrong one-hot action shape"
action_vec = action
obs['action'] = action_vec
obs['reward'] = np.array(reward)
obs['terminal'] = np.array(False if self.no_terminal or 'TimeLimit.truncated' in info or info.get('time_limit') else done)
obs['reset'] = np.array(False)
return obs, reward, done, info
def reset(self):
obs = self.env.reset()
obs['action'] = np.zeros(self.action_size)
obs['reward'] = np.array(0.0)
obs['terminal'] = np.array(False)
obs['reset'] = np.array(True)
return obs
================================================
FILE: gui/requirements.txt
================================================
gym
pygame
pillow
imageio
imageio-ffmpeg
================================================
FILE: gui/run_gui.py
================================================
import os, sys
import argparse
from collections import defaultdict
import gym
import numpy as np
import pygame
import pygame.freetype
from gym import spaces
from PIL import Image
from recording import SaveNpzWrapper
if 'MUJOCO_GL' not in os.environ:
if "linux" in sys.platform:
os.environ['MUJOCO_GL'] = 'osmesa' # Software rendering to avoid rendering interference with pygame
else:
os.environ['MUJOCO_GL'] = 'glfw' # Windowed rendering
PANEL_LEFT = 250
PANEL_RIGHT = 250
FOCUS_HACK = False
RECORD_DIR = './log'
K_NONE = tuple()
def get_keymap(env):
return {
tuple(): 0,
(pygame.K_UP, ): 1,
(pygame.K_LEFT, ): 2,
(pygame.K_RIGHT, ): 3,
(pygame.K_UP, pygame.K_LEFT): 4,
(pygame.K_UP, pygame.K_RIGHT): 5,
}
def main():
parser = argparse.ArgumentParser()
parser.add_argument('--env', type=str, default='memory_maze:MemoryMaze-9x9-v0')
parser.add_argument('--size', type=int, nargs=2, default=(600, 600))
parser.add_argument('--fps', type=int, default=6)
parser.add_argument('--random', type=float, default=0.0)
parser.add_argument('--noreset', action='store_true')
parser.add_argument('--fullscreen', action='store_true')
parser.add_argument('--nonoop', action='store_true', help='Pause instead of noop')
parser.add_argument('--record', action='store_true')
parser.add_argument('--record_mp4', action='store_true')
parser.add_argument('--record_gif', action='store_true')
args = parser.parse_args()
render_size = args.size
window_size = (render_size[0] + PANEL_LEFT + PANEL_RIGHT, render_size[1])
print(f'Creating environment: {args.env}')
env = gym.make(args.env, disable_env_checker=True)
if isinstance(env.observation_space, spaces.Dict):
print('Observation space:')
for k, v in env.observation_space.spaces.items(): # type: ignore
print(f'{k:>25}: {v}')
else:
print(f'Observation space: {env.observation_space}')
print(f'Action space: {env.action_space}')
if args.record:
env = SaveNpzWrapper(
env,
RECORD_DIR,
video_format='mp4' if args.record_mp4 else 'gif' if args.record_gif else None,
video_fps=args.fps * 2)
keymap = get_keymap(env)
steps = 0
return_ = 0.0
episode = 0
obs = env.reset()
pygame.init()
start_fullscreen = args.fullscreen or FOCUS_HACK
screen = pygame.display.set_mode(window_size, pygame.FULLSCREEN if start_fullscreen else 0)
if FOCUS_HACK and not args.fullscreen:
# Hack: for some reason app window doesn't get focus when launching, so
# we launch it as full screen and then exit full screen.
pygame.display.toggle_fullscreen()
clock = pygame.time.Clock()
font = pygame.freetype.SysFont('Mono', 16)
fontsmall = pygame.freetype.SysFont('Mono', 12)
running = True
paused = False
speedup = False
while running:
# Rendering
screen.fill((64, 64, 64))
# Render image observation
if isinstance(obs, dict):
assert 'image' in obs, 'Expecting dictionary observation with obs["image"]'
image = obs['image'] # type: ignore
else:
assert isinstance(obs, np.ndarray) and len(obs.shape) == 3, 'Expecting image observation'
image = obs
image = Image.fromarray(image)
image = image.resize(render_size, resample=0)
image = np.array(image)
surface = pygame.surfarray.make_surface(image.transpose((1, 0, 2)))
screen.blit(surface, (PANEL_LEFT, 0))
# Render statistics
lines = obs_to_text(obs, env, steps, return_)
y = 5
for line in lines:
text_surface, rect = font.render(line, (255, 255, 255))
screen.blit(text_surface, (16, y))
y += font.size + 2 # type: ignore
# Render keymap help
lines = keymap_to_text(keymap)
y = 5
for line in lines:
text_surface, rect = fontsmall.render(line, (255, 255, 255))
screen.blit(text_surface, (render_size[0] + PANEL_LEFT + 16, y))
y += fontsmall.size + 2 # type: ignore
pygame.display.flip()
clock.tick(args.fps if not speedup else 0)
# Keyboard input
pygame.event.pump()
keys_down = defaultdict(bool)
for event in pygame.event.get():
if event.type == pygame.QUIT: # Close
running = False
if event.type == pygame.KEYDOWN:
keys_down[event.key] = True
keys_hold = pygame.key.get_pressed()
# Action keys
action = keymap[K_NONE] # noop, if no keys pressed
for keys, act in keymap.items():
if all(keys_hold[key] or keys_down[key] for key in keys):
# The last keymap entry which has all keys pressed wins
action = act
# Special keys
force_reset = False
speedup = False
if keys_down[pygame.K_ESCAPE]: # Quit
running = False
if keys_down[pygame.K_SPACE]: # Pause
paused = not paused
else:
if action != keymap[K_NONE]:
paused = False # unpause on action press
if keys_down[pygame.K_BACKSPACE]: # Force reset
force_reset = True
if keys_hold[pygame.K_TAB]:
speedup = True
if paused:
continue
if action == keymap[K_NONE] and args.nonoop and not force_reset:
continue
# Environment step
if args.random:
if np.random.random() < args.random:
action = env.action_space.sample()
obs, reward, done, info = env.step(action) # type: ignore
# print({k: v for k, v in obs.items() if k != 'image'})
steps += 1
return_ += reward
# Episode end
if reward:
print(f'reward: {reward}')
if done or force_reset:
print(f'Episode done - length: {steps} return: {return_}')
obs = env.reset()
steps = 0
return_ = 0.0
episode += 1
if done and args.record:
# If recording, require relaunch for next episode
running = False
pygame.quit()
def obs_to_text(obs, env, steps, return_):
kvs = []
kvs.append(('## Stats ##', ''))
kvs.append(('', ''))
kvs.append(('step', steps))
kvs.append(('return', return_))
lines = [f'{k:<15} {v:>5}' for k, v in kvs]
return lines
def keymap_to_text(keymap, verbose=False):
kvs = []
kvs.append(('## Commands ##', ''))
kvs.append(('', ''))
# mapped actions
kvs.append(('forward', 'up arrow'))
kvs.append(('left', 'left arrow'))
kvs.append(('right', 'right arrow'))
# special actions
kvs.append(('', ''))
kvs.append(('reset', 'backspace'))
kvs.append(('pause', 'space'))
kvs.append(('speed up', 'tab'))
kvs.append(('quit', 'esc'))
lines = [f'{k:<15} {v}' for k, v in kvs]
return lines
if __name__ == '__main__':
main()
================================================
FILE: memory_maze/__init__.py
================================================
import os
# NOTE: Env MUJOCO_GL=egl is necessary for headless hardware rendering on GPU,
# but breaks when running on a CPU machine. Alternatively set MUJOCO_GL=osmesa.
if 'MUJOCO_GL' not in os.environ:
os.environ['MUJOCO_GL'] = 'egl'
from . import tasks
try:
# Register gym environments, if gym is available
from typing import Callable
from functools import partial as f
import dm_env
import gym
from gym.envs.registration import register
from .gym_wrappers import GymWrapper
def _make_gym_env(dm_task: Callable[[], dm_env.Environment], **kwargs):
dmenv = dm_task(**kwargs)
return GymWrapper(dmenv)
sizes = {
'9x9': tasks.memory_maze_9x9,
'11x11': tasks.memory_maze_11x11,
'13x13': tasks.memory_maze_13x13,
'15x15': tasks.memory_maze_15x15,
}
for key, dm_task in sizes.items():
# Image-only obs space
register(id=f'MemoryMaze-{key}-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True)) # Standard
register(id=f'MemoryMaze-{key}-Vis-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, good_visibility=True)) # Easily visible targets
register(id=f'MemoryMaze-{key}-HD-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, camera_resolution=256)) # High-res camera
register(id=f'MemoryMaze-{key}-Top-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, camera_resolution=256, top_camera=True)) # Top-down camera
# Extra global observables (dict obs space)
register(id=f'MemoryMaze-{key}-ExtraObs-v0', entry_point=f(_make_gym_env, dm_task, global_observables=True))
register(id=f'MemoryMaze-{key}-ExtraObs-Vis-v0', entry_point=f(_make_gym_env, dm_task, global_observables=True, good_visibility=True))
register(id=f'MemoryMaze-{key}-ExtraObs-Top-v0', entry_point=f(_make_gym_env, dm_task, global_observables=True, camera_resolution=256, top_camera=True))
# Oracle observables with shortest path shown
register(id=f'MemoryMaze-{key}-Oracle-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, global_observables=True, show_path=True))
register(id=f'MemoryMaze-{key}-Oracle-Top-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, global_observables=True, show_path=True, camera_resolution=256, top_camera=True))
register(id=f'MemoryMaze-{key}-Oracle-ExtraObs-v0', entry_point=f(_make_gym_env, dm_task, global_observables=True, show_path=True))
# High control frequency
register(id=f'MemoryMaze-{key}-HiFreq-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, control_freq=40))
register(id=f'MemoryMaze-{key}-HiFreq-Vis-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, control_freq=40, good_visibility=True))
register(id=f'MemoryMaze-{key}-HiFreq-HD-v0', entry_point=f(_make_gym_env, dm_task, image_only_obs=True, control_freq=40, camera_resolution=256))
# Six colors even for smaller mazes
register(id=f'MemoryMaze-{key}-6CL-v0', entry_point=f(_make_gym_env, dm_task, randomize_colors=True, image_only_obs=True))
register(id=f'MemoryMaze-{key}-6CL-Top-v0', entry_point=f(_make_gym_env, dm_task, randomize_colors=True, image_only_obs=True, camera_resolution=256, top_camera=True))
register(id=f'MemoryMaze-{key}-6CL-ExtraObs-v0', entry_point=f(_make_gym_env, dm_task, randomize_colors=True, global_observables=True))
except ImportError:
print('memory_maze: gym environments not registered.')
raise
================================================
FILE: memory_maze/gym_wrappers.py
================================================
from typing import Any, Tuple
import numpy as np
import dm_env
import gym
from dm_env import specs
from gym import spaces
class GymWrapper(gym.Env):
def __init__(self, env: dm_env.Environment):
self.env = env
self.action_space = _convert_to_space(env.action_spec())
self.observation_space = _convert_to_space(env.observation_spec())
def reset(self) -> Any:
ts = self.env.reset()
return ts.observation
def step(self, action) -> Tuple[Any, float, bool, dict]:
ts = self.env.step(action)
assert not ts.first(), "dm_env.step() caused reset, reward will be undefined."
assert ts.reward is not None
done = ts.last()
terminal = ts.last() and ts.discount == 0.0
info = {}
if done and not terminal:
info['TimeLimit.truncated'] = True # acme.GymWrapper understands this and converts back to dm_env.truncation()
return ts.observation, ts.reward, done, info
def _convert_to_space(spec: Any) -> gym.Space:
# Inverse of acme.gym_wrappers._convert_to_spec
if isinstance(spec, specs.DiscreteArray):
return spaces.Discrete(spec.num_values)
if isinstance(spec, specs.BoundedArray):
return spaces.Box(
shape=spec.shape,
dtype=spec.dtype,
low=spec.minimum.item() if len(spec.minimum.shape) == 0 else spec.minimum,
high=spec.maximum.item() if len(spec.maximum.shape) == 0 else spec.maximum)
if isinstance(spec, specs.Array):
return spaces.Box(
shape=spec.shape,
dtype=spec.dtype,
low=-np.inf,
high=np.inf)
if isinstance(spec, tuple):
return spaces.Tuple(_convert_to_space(s) for s in spec)
if isinstance(spec, dict):
return spaces.Dict({key: _convert_to_space(value) for key, value in spec.items()})
raise ValueError(f'Unexpected spec: {spec}')
================================================
FILE: memory_maze/helpers.py
================================================
from dm_env.specs import BoundedArray, DiscreteArray
import numpy as np
def sample_spec(space: BoundedArray) -> np.ndarray:
if isinstance(space, DiscreteArray):
return np.random.randint(space.num_values, size=space.shape)
if isinstance(space, BoundedArray):
return np.random.uniform(space.minimum, space.maximum, size=space.shape)
raise NotImplementedError
================================================
FILE: memory_maze/maze.py
================================================
from typing import Optional
import functools
import string
import labmaze
import numpy as np
from dm_control import mjcf
from dm_control.composer.observation import observable as observable_lib
from dm_control.locomotion.arenas import covering, labmaze_textures, mazes
from dm_control.locomotion.props import target_sphere
from dm_control.locomotion.tasks import random_goal_maze
from dm_control.locomotion.walkers import jumping_ball
from labmaze import assets as labmaze_assets
from numpy.random import RandomState
DEFAULT_CONTROL_TIMESTEP = 0.025
DEFAULT_PHYSICS_TIMESTEP = 0.005
TARGET_COLORS = [
np.array([170, 38, 30]) / 220, # red
np.array([99, 170, 88]) / 220, # green
np.array([39, 140, 217]) / 220, # blue
np.array([93, 105, 199]) / 220, # purple
np.array([220, 193, 59]) / 220, # yellow
np.array([220, 128, 107]) / 220, # salmon
]
class RollingBallWithFriction(jumping_ball.RollingBallWithHead):
def _build(self, roll_damping=5.0, steer_damping=20.0, **kwargs):
super()._build(**kwargs)
# Increase friction to the joints, so the movement feels more like traditional
# first-person navigation control, without much acceleration/deceleration.
self._mjcf_root.find('joint', 'roll').damping = roll_damping
self._mjcf_root.find('joint', 'steer').damping = steer_damping
class MemoryMazeTask(random_goal_maze.NullGoalMaze):
# Adapted from dm_control.locomotion.tasks.RepeatSingleGoalMaze
def __init__(self,
walker,
maze_arena,
n_targets=3,
target_radius=0.3,
target_height_above_ground=0.0,
target_reward_scale=1.0,
target_randomize_colors=False,
enable_global_task_observables=False,
camera_resolution=64,
physics_timestep=DEFAULT_PHYSICS_TIMESTEP,
control_timestep=DEFAULT_CONTROL_TIMESTEP,
):
super().__init__(
walker=walker,
maze_arena=maze_arena,
randomize_spawn_position=True,
randomize_spawn_rotation=True,
contact_termination=False,
enable_global_task_observables=enable_global_task_observables,
physics_timestep=physics_timestep,
control_timestep=control_timestep
)
self.n_targets = n_targets
self._target_radius = target_radius
self._target_height_above_ground = target_height_above_ground
self._target_reward_scale = target_reward_scale
self._target_randomize_colors = target_randomize_colors
self._targets = []
self._target_colors = list(TARGET_COLORS) # This contains all colors, not only n_targets
self._create_targets()
self._current_target_ix = 0
self._rewarded_this_step = False
self._targets_obtained = 0
if enable_global_task_observables:
# Add egocentric vectors to targets
xpos_origin_callable = lambda phys: phys.bind(walker.root_body).xpos
def _target_pos(physics, targets, index):
return physics.bind(targets[index].geom).xpos
for i in range(n_targets):
# Absolute target position
walker.observables.add_observable(
f'target_abs_{i}',
observable_lib.Generic(functools.partial(_target_pos, targets=self._targets, index=i)),
)
# Relative target position
walker.observables.add_egocentric_vector(
f'target_rel_{i}',
observable_lib.Generic(functools.partial(_target_pos, targets=self._targets, index=i)),
origin_callable=xpos_origin_callable)
self._task_observables = super().task_observables
def _current_target_index(_):
return self._current_target_ix
def _current_target_color(_):
return self._target_colors[self._current_target_ix]
self._task_observables['target_index'] = observable_lib.Generic(_current_target_index)
self._task_observables['target_index'].enabled = True
self._task_observables['target_color'] = observable_lib.Generic(_current_target_color)
self._task_observables['target_color'].enabled = True
self._walker.observables.egocentric_camera.height = camera_resolution
self._walker.observables.egocentric_camera.width = camera_resolution
self._maze_arena.observables.top_camera.height = camera_resolution
self._maze_arena.observables.top_camera.width = camera_resolution
@property
def task_observables(self):
return self._task_observables
@property
def name(self):
return 'memory_maze'
def initialize_episode_mjcf(self, rng: RandomState):
self._maze_arena.regenerate(rng) # Bypass super()._initialize_episode_mjcf(), because it ignores rng
while True:
if self._target_randomize_colors:
# Recreate target objects with new colors
self._create_targets(clear_existing=True, randomize_colors=True, rng=rng)
ok = self._place_targets(rng)
if not ok:
# Could not place targets - regenerate the maze
self._maze_arena.regenerate(rng)
continue
break
self._pick_new_target(rng)
def initialize_episode(self, physics, rng: RandomState):
super().initialize_episode(physics, rng)
self._rewarded_this_step = False
self._targets_obtained = 0
def after_step(self, physics, rng: RandomState):
super().after_step(physics, rng)
self._rewarded_this_step = False
for i, target in enumerate(self._targets):
if target.activated:
if i == self._current_target_ix:
self._rewarded_this_step = True
self._targets_obtained += 1
self._pick_new_target(rng)
target.reset(physics) # Resets activated=False
def should_terminate_episode(self, physics):
return super().should_terminate_episode(physics)
def get_reward(self, physics):
if self._rewarded_this_step:
return self._target_reward_scale
return 0.0
def _create_targets(self, clear_existing=False, randomize_colors=False, rng: Optional[RandomState] = None):
if clear_existing:
while self._targets:
target = self._targets.pop()
target.detach() # Important to detach old targets, if creating new ones
else:
assert not self._targets, 'Targets already created.'
if randomize_colors:
assert rng is not None
rng.shuffle(self._target_colors)
for i in range(self.n_targets):
color = self._target_colors[i]
target = target_sphere.TargetSphere(
radius=self._target_radius,
height_above_ground=self._target_radius + self._target_height_above_ground,
rgb1=tuple(color * 1.0),
rgb2=tuple(color * 1.0),
)
self._targets.append(target)
self._maze_arena.attach(target)
def _place_targets(self, rng: RandomState) -> bool:
possible_positions = list(self._maze_arena.target_positions)
rng.shuffle(possible_positions)
if len(possible_positions) < len(self._targets):
# Too few rooms - need to regenerate the maze
return False
for target, pos in zip(self._targets, possible_positions):
mjcf.get_attachment_frame(target.mjcf_model).pos = pos
return True
def _pick_new_target(self, rng: RandomState):
while True:
ix = rng.randint(len(self._targets))
if self._targets[ix].activated:
continue # Skip the target that the agent is touching
self._current_target_ix = ix
break
class FixedWallTexture(labmaze_textures.WallTextures):
"""Selects a single texture instead of a collection to sample from."""
def _build(self, style, texture_name):
labmaze_textures = labmaze_assets.get_wall_texture_paths(style)
self._mjcf_root = mjcf.RootElement(model='labmaze_' + style)
self._textures = []
if texture_name not in labmaze_textures:
raise ValueError(f'`texture_name` should be one of {labmaze_textures.keys()}: got {texture_name}')
texture_path = labmaze_textures[texture_name]
self._textures.append(self._mjcf_root.asset.add( # type: ignore
'texture', type='2d', name=texture_name,
file=texture_path.format(texture_name)))
class FixedFloorTexture(labmaze_textures.FloorTextures):
"""Selects a single texture instead of a collection to sample from."""
def _build(self, style, texture_names):
labmaze_textures = labmaze_assets.get_floor_texture_paths(style)
self._mjcf_root = mjcf.RootElement(model='labmaze_' + style)
self._textures = []
if isinstance(texture_names, str):
texture_names = [texture_names]
for texture_name in texture_names:
if texture_name not in labmaze_textures:
raise ValueError(f'`texture_name` should be one of {labmaze_textures.keys()}: got {texture_name}')
texture_path = labmaze_textures[texture_name]
self._textures.append(self._mjcf_root.asset.add( # type: ignore
'texture', type='2d', name=texture_name,
file=texture_path.format(texture_name)))
class MazeWithTargetsArena(mazes.MazeWithTargets):
"""Fork of mazes.RandomMazeWithTargets."""
def _build(self,
x_cells,
y_cells,
xy_scale=2.0,
z_height=2.0,
max_rooms=4,
room_min_size=3,
room_max_size=5,
spawns_per_room=0,
targets_per_room=0,
max_variations=26,
simplify=True,
skybox_texture=None,
wall_textures=None,
floor_textures=None,
aesthetic='default',
name='random_maze',
random_seed=None):
assert random_seed, "Expected to be set by tasks._memory_maze()"
super()._build(
maze=TextMazeVaryingWalls(
height=y_cells,
width=x_cells,
max_rooms=max_rooms,
room_min_size=room_min_size,
room_max_size=room_max_size,
max_variations=max_variations,
spawns_per_room=spawns_per_room,
objects_per_room=targets_per_room,
simplify=simplify,
random_seed=random_seed),
xy_scale=xy_scale,
z_height=z_height,
skybox_texture=skybox_texture,
wall_textures=wall_textures,
floor_textures=floor_textures,
aesthetic=aesthetic,
name=name)
def regenerate(self, random_state):
"""Generates a new maze layout.
Patch of MazeWithTargets.regenerate() which uses random_state.
"""
self._maze.regenerate()
# logging.debug('GENERATED MAZE:\n%s', self._maze.entity_layer)
self._find_spawn_and_target_positions()
if self._text_maze_regenerated_hook:
self._text_maze_regenerated_hook()
# Remove old texturing planes.
for geom_name in self._texturing_geom_names:
del self._mjcf_root.worldbody.geom[geom_name]
self._texturing_geom_names = []
# Remove old texturing materials.
for material_name in self._texturing_material_names:
del self._mjcf_root.asset.material[material_name]
self._texturing_material_names = []
# Remove old actual-wall geoms.
self._maze_body.geom.clear()
self._current_wall_texture = {
wall_char: random_state.choice(wall_textures) # PATCH: use random_state for wall textures
for wall_char, wall_textures in self._wall_textures.items()
}
for wall_char in self._wall_textures:
self._make_wall_geoms(wall_char)
self._make_floor_variations()
def _make_floor_variations(self, build_tile_geoms_fn=None):
"""Fork of mazes.MazeWithTargets._make_floor_variations().
Makes the room floors different if possible, instead of sampling randomly.
"""
_DEFAULT_FLOOR_CHAR = '.'
main_floor_texture = self._floor_textures[0]
if len(self._floor_textures) > 1:
room_floor_textures = self._floor_textures[1:]
else:
room_floor_textures = [main_floor_texture]
for i_var, variation in enumerate(_DEFAULT_FLOOR_CHAR + string.ascii_uppercase):
if variation not in self._maze.variations_layer:
break
if build_tile_geoms_fn is None:
# Break the floor variation down to odd-sized tiles.
tiles = covering.make_walls(self._maze.variations_layer,
wall_char=variation,
make_odd_sized_walls=True)
else:
tiles = build_tile_geoms_fn(wall_char=variation)
if variation == _DEFAULT_FLOOR_CHAR:
variation_texture = main_floor_texture
else:
variation_texture = room_floor_textures[i_var % len(room_floor_textures)]
for i, tile in enumerate(tiles):
tile_mid = covering.GridCoordinates(
(tile.start.y + tile.end.y - 1) / 2,
(tile.start.x + tile.end.x - 1) / 2)
tile_pos = np.array([(tile_mid.x - self._x_offset) * self._xy_scale,
-(tile_mid.y - self._y_offset) * self._xy_scale,
0.0])
tile_size = np.array([(tile.end.x - tile_mid.x - 0.5) * self._xy_scale,
(tile.end.y - tile_mid.y - 0.5) * self._xy_scale,
self._xy_scale])
if variation == _DEFAULT_FLOOR_CHAR:
tile_name = 'floor_{}'.format(i)
else:
tile_name = 'floor_{}_{}'.format(variation, i)
self._tile_geom_names[tile.start] = tile_name
self._texturing_material_names.append(tile_name)
self._texturing_geom_names.append(tile_name)
material = self._mjcf_root.asset.add(
'material', name=tile_name, texture=variation_texture,
texrepeat=(2 * tile_size[[0, 1]] / self._xy_scale))
self._mjcf_root.worldbody.add(
'geom', name=tile_name, type='plane', material=material,
pos=tile_pos, size=tile_size, contype=0, conaffinity=0)
class TextMazeVaryingWalls(labmaze.RandomMaze):
"""Augments standard generated labmaze with some walls marked with different chars."""
def regenerate(self):
super().regenerate()
self._block_variations()
def _block_variations(self):
nblocks = 3
wall_chars = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
n = self.entity_layer.shape[0]
ivar = 0
for i in range(nblocks):
for j in range(nblocks):
i_from = i * n // nblocks
i_to = (i + 1) * n // nblocks
j_from = j * n // nblocks
j_to = (j + 1) * n // nblocks
self._change_block_char(i_from, i_to, j_from, j_to, wall_chars[ivar])
ivar += 1
def _change_block_char(self, i1, i2, j1, j2, char):
grid = self.entity_layer
i, j = np.where(grid[i1:i2, j1:j2] == '*')
grid[i + i1, j + j1] = char
================================================
FILE: memory_maze/oracle.py
================================================
from collections import deque
from typing import List, Optional, Tuple
import numpy as np
from memory_maze.wrappers import ObservationWrapper
class PathToTargetWrapper(ObservationWrapper):
"""Find shortest path to target and indicate it on maze_layout. Used for Oracle."""
def observation_spec(self):
spec = self.env.observation_spec()
assert isinstance(spec, dict)
assert 'agent_pos' in spec
assert 'target_pos' in spec
assert 'maze_layout' in spec
return spec
def observation(self, obs):
assert isinstance(obs, dict)
# Find shortest path (in gridworld) from agent to target
maze = obs['maze_layout']
start = tuple(obs['agent_pos'].astype(int))
finish = tuple(obs['target_pos'].astype(int))
path = breadth_first_search(maze, start, finish)
if path:
for x, y in path:
maze[y, x] = 2 # Update maze_layout observation
return obs
class DrawMinimapWrapper(ObservationWrapper):
"""Show maze_layout as minimap in image observation. Used for Oracle."""
def observation_spec(self):
spec = self.env.observation_spec()
assert isinstance(spec, dict)
assert 'maze_layout' in spec
assert 'image' in spec
assert 'agent_dir' in spec
return spec
def observation(self, obs):
from PIL import Image
assert isinstance(obs, dict)
maze = obs['maze_layout']
x, y = obs['agent_pos']
dx, dy = obs['agent_dir']
angle = np.arctan2(dx, dy)
N = maze.shape[0]
SIZE = N * 2
# Draw map
map = np.zeros((N, N, 3), np.uint8) # walls in black
map[:, :] += (maze == 1)[..., None] * np.array([[[255, 255, 255]]], np.uint8) # corridors in white
map[:, :] += (maze == 2)[..., None] * np.array([[[0, 255, 0]]], np.uint8) # path in green
map[int(y), int(x)] = np.array([255, 0, 0], np.uint8) # agent in red
map = np.flip(map, 0)
# Scale, rotate, translate
mapimg = Image.fromarray(map)
mapimg = mapimg.resize((SIZE, SIZE), resample=0)
tx = (x - N / 2) / N * SIZE
ty = - (y - N / 2) / N * SIZE
mapimg = mapimg.transform(mapimg.size, 0,
(1, 0, tx,
0, 1, ty),
resample=0)
mapimg = mapimg.rotate(angle / np.pi * 180, resample=0)
# Overlay minimap onto observation image top-right corner
img = obs['image']
img[:SIZE, -SIZE:] = img[:SIZE, -SIZE:] // 2 + np.array(mapimg) // 2
return obs
def breadth_first_search(maze: np.ndarray, start: Tuple[int, int], finish: Tuple[int, int]) -> Optional[List[Tuple[int, int]]]:
h, w = maze.shape
queue = deque()
visited = np.zeros(maze.shape, dtype=bool)
backtrace = np.zeros(maze.shape + (2,), dtype=int)
xs, ys = start
queue.append((xs, ys))
visited[ys, xs] = True
while len(queue) > 0:
x, y = queue.popleft()
for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
x1 = x + dx
y1 = y + dy
if 0 <= x1 < w and 0 <= y1 < h and maze[y1, x1] and not visited[y1, x1]:
queue.append((x1, y1))
visited[y1, x1] = True
backtrace[y1, x1, :] = np.array([x, y])
if (x1, y1) == finish:
break
xf, yf = finish
if not visited[yf, xf]:
return None
path = []
path.append((xf, yf))
while (xf, yf) != start:
xf, yf = backtrace[yf, xf]
path.append((xf, yf))
path.reverse()
return path
================================================
FILE: memory_maze/tasks.py
================================================
import numpy as np
from dm_control import composer
from dm_control.locomotion.arenas import labmaze_textures
from memory_maze.maze import *
from memory_maze.oracle import DrawMinimapWrapper, PathToTargetWrapper
from memory_maze.wrappers import *
# Slow control (4Hz), so that agent without HRL has a chance.
# Native control would be ~20Hz, so this corresponds roughly to action_repeat=5.
DEFAULT_CONTROL_FREQ = 4.0
def memory_maze_9x9(**kwargs):
"""
Maze based on DMLab30-explore_goal_locations_small
{
mazeHeight = 11, # with outer walls
mazeWidth = 11,
roomCount = 4,
roomMaxSize = 5,
roomMinSize = 3,
}
"""
return _memory_maze(9, 3, 250, **kwargs)
def memory_maze_11x11(**kwargs):
return _memory_maze(11, 4, 500, **kwargs)
def memory_maze_13x13(**kwargs):
return _memory_maze(13, 5, 750, **kwargs)
def memory_maze_15x15(**kwargs):
"""
Maze based on DMLab30-explore_goal_locations_large
{
mazeHeight = 17, # with outer walls
mazeWidth = 17,
roomCount = 9,
roomMaxSize = 3,
roomMaxSize = 3,
}
"""
return _memory_maze(15, 6, 1000, max_rooms=9, room_max_size=3, **kwargs)
def _memory_maze(
maze_size, # measured without exterior walls
n_targets,
time_limit,
max_rooms=6,
room_min_size=3,
room_max_size=5,
control_freq=DEFAULT_CONTROL_FREQ,
discrete_actions=True,
image_only_obs=False,
target_color_in_image=True,
global_observables=False,
top_camera=False,
good_visibility=False,
show_path=False,
camera_resolution=64,
seed=None,
randomize_colors=False,
):
random_state = np.random.RandomState(seed)
walker = RollingBallWithFriction(camera_height=0.3, add_ears=top_camera)
arena = MazeWithTargetsArena(
x_cells=maze_size + 2, # inner size => outer size
y_cells=maze_size + 2,
xy_scale=2.0,
z_height=1.5 if not good_visibility else 0.4,
max_rooms=max_rooms,
room_min_size=room_min_size,
room_max_size=room_max_size,
spawns_per_room=1,
targets_per_room=1,
floor_textures=FixedFloorTexture('style_01', ['blue', 'blue_bright']),
wall_textures=dict({
'*': FixedWallTexture('style_01', 'yellow'), # default wall
}, **{str(i): labmaze_textures.WallTextures('style_01') for i in range(10)} # variations
),
skybox_texture=None,
random_seed=random_state.randint(2147483648),
)
task = MemoryMazeTask(
walker=walker,
maze_arena=arena,
n_targets=n_targets,
target_radius=0.6,
target_height_above_ground=0.5 if good_visibility else -0.6,
enable_global_task_observables=True, # Always add to underlying env, but not always expose in RemapObservationWrapper
control_timestep=1.0 / control_freq,
camera_resolution=camera_resolution,
target_randomize_colors=randomize_colors,
)
if top_camera:
task.observables['top_camera'].enabled = True
env = composer.Environment(
time_limit=time_limit - 1e-3, # subtract epsilon to make sure ep_length=time_limit*fps
task=task,
random_state=random_state,
strip_singleton_obs_buffer_dim=True)
obs_mapping = {
'image': 'walker/egocentric_camera' if not top_camera else 'top_camera',
'target_color': 'target_color',
}
if global_observables:
env = TargetsPositionWrapper(env, task._maze_arena.xy_scale, task._maze_arena.maze.width, task._maze_arena.maze.height)
env = AgentPositionWrapper(env, task._maze_arena.xy_scale, task._maze_arena.maze.width, task._maze_arena.maze.height)
env = MazeLayoutWrapper(env)
obs_mapping = dict(obs_mapping, **{
'agent_pos': 'agent_pos',
'agent_dir': 'agent_dir',
'targets_vec': 'targets_vec',
'targets_pos': 'targets_pos',
'target_vec': 'target_vec',
'target_pos': 'target_pos',
'maze_layout': 'maze_layout',
})
env = RemapObservationWrapper(env, obs_mapping)
if target_color_in_image:
env = TargetColorAsBorderWrapper(env)
if show_path:
env = PathToTargetWrapper(env)
env = DrawMinimapWrapper(env)
if image_only_obs:
assert target_color_in_image, 'Image-only observation only makes sense with target_color_in_image'
env = ImageOnlyObservationWrapper(env)
if discrete_actions:
env = DiscreteActionSetWrapper(env, [
np.array([0.0, 0.0]), # noop
np.array([-1.0, 0.0]), # forward
np.array([0.0, -1.0]), # left
np.array([0.0, +1.0]), # right
np.array([-1.0, -1.0]), # forward + left
np.array([-1.0, +1.0]), # forward + right
])
return env
================================================
FILE: memory_maze/wrappers.py
================================================
from typing import Any, Dict, List
import dm_env
import numpy as np
from dm_env import specs
class Wrapper(dm_env.Environment):
"""Base class for dm_env.Environment wrapper."""
def __init__(self, env: dm_env.Environment):
self.env = env
def __getattr__(self, name):
if name.startswith('__'):
raise AttributeError(f'Attempted to get missing private attribute {name}')
return getattr(self.env, name)
def step(self, action) -> dm_env.TimeStep:
return self.env.step(action)
def reset(self) -> dm_env.TimeStep:
return self.env.reset()
def action_spec(self) -> Any:
return self.env.action_spec()
def discount_spec(self) -> Any:
return self.env.discount_spec()
def observation_spec(self) -> Any:
return self.env.observation_spec()
def reward_spec(self) -> Any:
return self.env.reward_spec()
def close(self):
return self.env.close()
class ObservationWrapper(Wrapper):
"""Base class for observation wrapper."""
def observation_spec(self):
raise NotImplementedError
def observation(self, obs: Any) -> Any:
raise NotImplementedError
def step(self, action) -> dm_env.TimeStep:
step_type, discount, reward, observation = self.env.step(action)
return dm_env.TimeStep(step_type, discount, reward, self.observation(observation))
def reset(self) -> dm_env.TimeStep:
step_type, discount, reward, observation = self.env.reset()
return dm_env.TimeStep(step_type, discount, reward, self.observation(observation))
class RemapObservationWrapper(ObservationWrapper):
"""Select a subset of dictionary observation keys and rename them."""
def __init__(self, env: dm_env.Environment, mapping: Dict[str, str]):
super().__init__(env)
self.mapping = mapping
def observation_spec(self):
spec = self.env.observation_spec()
assert isinstance(spec, dict)
return {key: spec[key_orig] for key, key_orig in self.mapping.items()}
def observation(self, obs):
assert isinstance(obs, dict)
return {key: obs[key_orig] for key, key_orig in self.mapping.items()}
class TargetsPositionWrapper(ObservationWrapper):
"""Collects and postporcesses walker/target_rel_{i} relative position vectors into
targets_vec (n_targets,2) tensor, and walker/targets_abs_{i} absolute positions
into targets_pos tensor."""
def __init__(self, env: dm_env.Environment, maze_xy_scale, maze_width, maze_height):
super().__init__(env)
self.maze_xy_scale = maze_xy_scale
self.center_ji = np.array([maze_width - 2.0, maze_height - 2.0]) / 2.0
spec = self.env.observation_spec()
assert isinstance(spec, dict)
assert 'walker/target_rel_0' in spec
assert 'walker/target_abs_0' in spec
assert 'target_index' in spec
i = 0
while f'walker/target_rel_{i}' in spec:
assert f'walker/target_abs_{i}' in spec
i += 1
self.n_targets = i
def observation_spec(self):
spec = self.env.observation_spec()
assert isinstance(spec, dict)
# All targets
spec['targets_vec'] = specs.Array((self.n_targets, 2), float, 'targets_vec')
spec['targets_pos'] = specs.Array((self.n_targets, 2), float, 'targets_pos')
# Current target
spec['target_vec'] = specs.Array((2,), float, 'target_vec')
spec['target_pos'] = specs.Array((2,), float, 'target_pos')
return spec
def observation(self, obs):
assert isinstance(obs, dict)
# All targets
x_rel = np.zeros((self.n_targets, 2))
x_abs = np.zeros((self.n_targets, 2))
for i in range(self.n_targets):
x_rel[i] = obs[f'walker/target_rel_{i}'][:2] / self.maze_xy_scale
x_abs[i] = obs[f'walker/target_abs_{i}'][:2] / self.maze_xy_scale + self.center_ji
obs['targets_vec'] = x_rel
obs['targets_pos'] = x_abs
# Current target
target_ix = int(obs['target_index'])
obs['target_vec'] = x_rel[target_ix]
obs['target_pos'] = x_abs[target_ix]
return obs
class AgentPositionWrapper(ObservationWrapper):
"""Postprocesses absolute_position and absolute_orientation."""
def __init__(self, env: dm_env.Environment, maze_xy_scale, maze_width, maze_height):
super().__init__(env)
self.maze_xy_scale = maze_xy_scale
self.center_ji = np.array([maze_width - 2.0, maze_height - 2.0]) / 2.0
def observation_spec(self):
spec = self.env.observation_spec()
# absolute_position and absolute_orientation should already be generated by the environment.
assert isinstance(spec, dict) and 'absolute_position' in spec and 'absolute_orientation' in spec
# Add agent_pos, measured in grid coordinates
spec['agent_pos'] = specs.Array((2, ), float, 'agent_pos')
# Add agent_dir as 2-vector
spec['agent_dir'] = specs.Array((2, ), float, 'agent_dir')
return spec
def observation(self, obs):
assert isinstance(obs, dict)
walker_xy = obs['absolute_position'][:2]
walker_ji = walker_xy / self.maze_xy_scale + self.center_ji
# agent_pos, measured in grid coordinates, where bottom-left coordinate is (0.1,0.1),
# and top-right coordinate for a 15x15 maze is (14.9,14.9)
obs['agent_pos'] = walker_ji
# Pick orientation vector such, that going forward increases agent_pos in the direction of agent_dir.
obs['agent_dir'] = obs['absolute_orientation'][:2, 1]
return obs
class MazeLayoutWrapper(ObservationWrapper):
"""Postprocesses maze_layout observation."""
def observation_spec(self):
spec = self.env.observation_spec()
# maze_layout should already be generated by the environment
assert isinstance(spec, dict) and 'maze_layout' in spec
# Change char array to binary array, removing outer walls
n, m = spec['maze_layout'].shape
spec['maze_layout'] = specs.BoundedArray((n - 2, m - 2), np.uint8, 0, 1, 'maze_layout')
return spec
def observation(self, obs):
assert isinstance(obs, dict)
maze = obs['maze_layout']
maze = maze[1:-1, 1:-1] # Remove outer walls
maze = np.flip(maze, 0) # Flip vertical axis so that bottom-left is at maze[0,0]
nonwalls = (maze == ' ') | (maze == 'P') | (maze == 'G')
obs['maze_layout'] = nonwalls.astype(np.uint8)
return obs
class ImageOnlyObservationWrapper(ObservationWrapper):
"""Select one of the dictionary observation keys as observation."""
def __init__(self, env: dm_env.Environment, key: str = 'image'):
super().__init__(env)
self.key = key
def observation_spec(self):
spec = self.env.observation_spec()
assert isinstance(spec, dict)
return spec[self.key]
def observation(self, obs):
assert isinstance(obs, dict)
return obs[self.key]
class DiscreteActionSetWrapper(Wrapper):
"""Change action space from continuous to discrete with given set of action vectors."""
def __init__(self, env: dm_env.Environment, action_set: List[np.ndarray]):
super().__init__(env)
self.action_set = action_set
def action_spec(self):
return specs.DiscreteArray(len(self.action_set))
def step(self, action) -> dm_env.TimeStep:
return self.env.step(self.action_set[action])
class TargetColorAsBorderWrapper(ObservationWrapper):
"""MemoryMaze-specific wrapper, which draws target_color as border on the image."""
def observation_spec(self):
spec = self.env.observation_spec()
assert isinstance(spec, dict)
assert 'target_color' in spec
return spec
def observation(self, obs):
assert isinstance(obs, dict)
assert 'target_color' in obs and 'image' in obs
target_color = obs['target_color']
img = obs['image']
B = int(2 * np.sqrt(img.shape[0] // 64))
img[:, :B] = target_color * 255 * 0.7
img[:, -B:] = target_color * 255 * 0.7
img[:B, :] = target_color * 255 * 0.7
img[-B:, :] = target_color * 255 * 0.7
return obs
================================================
FILE: setup.py
================================================
from setuptools import setup
import pathlib
__version__ = "1.0.3"
setup(
name="memory-maze",
version=__version__,
author="Jurgis Pasukonis",
author_email="jurgisp@gmail.com",
url="https://github.com/jurgisp/memory-maze",
description="Memory Maze is an environment to benchmark memory abilities of RL agents",
long_description=pathlib.Path('README.md').read_text(),
long_description_content_type='text/markdown',
zip_safe=False,
python_requires=">=3",
packages=["memory_maze"],
install_requires=[
'dm_control'
],
)