Repository: chauncygu/Multi-Agent-Constrained-Policy-Optimisation
Branch: main
Commit: b80a9f5b4a00
Files: 141
Total size: 802.8 KB
Directory structure:
gitextract_e69a341i/
├── LICENSE
├── MACPO/
│ ├── .gitignore
│ ├── environment.yaml
│ ├── macpo/
│ │ ├── __init__.py
│ │ ├── algorithms/
│ │ │ ├── __init__.py
│ │ │ ├── r_mappo/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── algorithm/
│ │ │ │ │ ├── MACPPOPolicy.py
│ │ │ │ │ ├── rMAPPOPolicy.py
│ │ │ │ │ └── r_actor_critic.py
│ │ │ │ └── r_macpo.py
│ │ │ └── utils/
│ │ │ ├── act.py
│ │ │ ├── cnn.py
│ │ │ ├── distributions.py
│ │ │ ├── mlp.py
│ │ │ ├── rnn.py
│ │ │ └── util.py
│ │ ├── config.py
│ │ ├── envs/
│ │ │ ├── __init__.py
│ │ │ ├── env_wrappers.py
│ │ │ └── safety_ma_mujoco/
│ │ │ ├── MUJOCO_LOG.TXT
│ │ │ ├── README.md
│ │ │ ├── __init__.py
│ │ │ ├── safety_multiagent_mujoco/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── ant.py
│ │ │ │ ├── assets/
│ │ │ │ │ ├── .gitignore
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── ant.xml
│ │ │ │ │ ├── coupled_half_cheetah.xml
│ │ │ │ │ ├── half_cheetah.xml
│ │ │ │ │ ├── hopper.xml
│ │ │ │ │ ├── humanoid.xml
│ │ │ │ │ ├── manyagent_ant.xml
│ │ │ │ │ ├── manyagent_ant.xml.template
│ │ │ │ │ ├── manyagent_ant__stage1.xml
│ │ │ │ │ ├── manyagent_swimmer.xml.template
│ │ │ │ │ ├── manyagent_swimmer__bckp2.xml
│ │ │ │ │ └── manyagent_swimmer_bckp.xml
│ │ │ │ ├── coupled_half_cheetah.py
│ │ │ │ ├── half_cheetah.py
│ │ │ │ ├── hopper.py
│ │ │ │ ├── humanoid.py
│ │ │ │ ├── manyagent_ant.py
│ │ │ │ ├── manyagent_swimmer.py
│ │ │ │ ├── mujoco_env.py
│ │ │ │ ├── mujoco_multi.py
│ │ │ │ ├── multiagentenv.py
│ │ │ │ └── obsk.py
│ │ │ └── test.py
│ │ ├── runner/
│ │ │ ├── __init__.py
│ │ │ └── separated/
│ │ │ ├── __init__.py
│ │ │ ├── base_runner.py
│ │ │ ├── base_runner_macpo.py
│ │ │ ├── mujoco_runner.py
│ │ │ └── mujoco_runner_macpo.py
│ │ ├── scripts/
│ │ │ ├── __init__.py
│ │ │ ├── train/
│ │ │ │ ├── __init__.py
│ │ │ │ └── train_mujoco.py
│ │ │ └── train_mujoco.sh
│ │ └── utils/
│ │ ├── __init__.py
│ │ ├── multi_discrete.py
│ │ ├── popart.py
│ │ ├── separated_buffer.py
│ │ └── util.py
│ ├── macpo.egg-info/
│ │ ├── PKG-INFO
│ │ ├── SOURCES.txt
│ │ ├── dependency_links.txt
│ │ └── top_level.txt
│ └── setup.py
├── MAPPO-Lagrangian/
│ ├── .gitignore
│ ├── environment.yaml
│ ├── mappo_lagrangian/
│ │ ├── __init__.py
│ │ ├── algorithms/
│ │ │ ├── __init__.py
│ │ │ ├── r_mappo/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── algorithm/
│ │ │ │ │ ├── MACPPOPolicy.py
│ │ │ │ │ ├── rMAPPOPolicy.py
│ │ │ │ │ └── r_actor_critic.py
│ │ │ │ └── r_mappo_lagr.py
│ │ │ └── utils/
│ │ │ ├── act.py
│ │ │ ├── cnn.py
│ │ │ ├── distributions.py
│ │ │ ├── mlp.py
│ │ │ ├── rnn.py
│ │ │ └── util.py
│ │ ├── config.py
│ │ ├── envs/
│ │ │ ├── __init__.py
│ │ │ ├── env_wrappers.py
│ │ │ └── safety_ma_mujoco/
│ │ │ ├── MUJOCO_LOG.TXT
│ │ │ ├── README.md
│ │ │ ├── __init__.py
│ │ │ ├── safety_multiagent_mujoco/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── ant.py
│ │ │ │ ├── assets/
│ │ │ │ │ ├── .gitignore
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── ant.xml
│ │ │ │ │ ├── beifen_hopper.xml
│ │ │ │ │ ├── coupled_half_cheetah.xml
│ │ │ │ │ ├── half_cheetah.xml
│ │ │ │ │ ├── hopper.xml
│ │ │ │ │ ├── humanoid.xml
│ │ │ │ │ ├── manyagent_ant.xml
│ │ │ │ │ ├── manyagent_ant.xml.template
│ │ │ │ │ ├── manyagent_ant__stage1.xml
│ │ │ │ │ ├── manyagent_swimmer.xml.template
│ │ │ │ │ ├── manyagent_swimmer__bckp2.xml
│ │ │ │ │ └── manyagent_swimmer_bckp.xml
│ │ │ │ ├── coupled_half_cheetah.py
│ │ │ │ ├── half_cheetah.py
│ │ │ │ ├── hopper.py
│ │ │ │ ├── humanoid.py
│ │ │ │ ├── manyagent_ant.py
│ │ │ │ ├── manyagent_swimmer.py
│ │ │ │ ├── mujoco_env.py
│ │ │ │ ├── mujoco_multi.py
│ │ │ │ ├── multiagentenv.py
│ │ │ │ └── obsk.py
│ │ │ └── test.py
│ │ ├── runner/
│ │ │ ├── __init__.py
│ │ │ └── separated/
│ │ │ ├── __init__.py
│ │ │ ├── base_runner.py
│ │ │ ├── base_runner_mappo_lagr.py
│ │ │ ├── mujoco_runner.py
│ │ │ └── mujoco_runner_mappo_lagr.py
│ │ ├── scripts/
│ │ │ ├── __init__.py
│ │ │ ├── eval/
│ │ │ │ └── eval_hanabi.py
│ │ │ ├── train/
│ │ │ │ ├── __init__.py
│ │ │ │ └── train_mujoco.py
│ │ │ └── train_mujoco.sh
│ │ └── utils/
│ │ ├── __init__.py
│ │ ├── multi_discrete.py
│ │ ├── popart.py
│ │ ├── separated_buffer.py
│ │ ├── shared_buffer.py
│ │ └── util.py
│ ├── mappo_lagrangian.egg-info/
│ │ ├── PKG-INFO
│ │ ├── SOURCES.txt
│ │ ├── dependency_links.txt
│ │ └── top_level.txt
│ └── setup.py
├── README.md
├── environment.yaml
└── requirements.txt
================================================
FILE CONTENTS
================================================
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2021 anybodyany
Copyright (c) 2020 Tianshou contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: MACPO/.gitignore
================================================
/.idea/
*/__pycache__/
================================================
FILE: MACPO/environment.yaml
================================================
name: marl
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _tflow_select=2.1.0=gpu
- absl-py=0.9.0=py36_0
- astor=0.8.0=py36_0
- blas=1.0=mkl
- c-ares=1.15.0=h7b6447c_1001
- ca-certificates=2020.1.1=0
- certifi=2020.4.5.2=py36_0
- cudatoolkit=10.0.130=0
- cudnn=7.6.5=cuda10.0_0
- cupti=10.0.130=0
- gast=0.2.2=py36_0
- google-pasta=0.2.0=py_0
- grpcio=1.14.1=py36h9ba97e2_0
- h5py=2.10.0=py36h7918eee_0
- hdf5=1.10.4=hb1b8bf9_0
- intel-openmp=2020.1=217
- keras-applications=1.0.8=py_0
- keras-preprocessing=1.1.0=py_1
- libedit=3.1=heed3624_0
- libffi=3.2.1=hd88cf55_4
- libgcc-ng=9.1.0=hdf63c60_0
- libgfortran-ng=7.3.0=hdf63c60_0
- libprotobuf=3.12.3=hd408876_0
- libstdcxx-ng=9.1.0=hdf63c60_0
- markdown=3.1.1=py36_0
- mkl=2020.1=217
- mkl-service=2.3.0=py36he904b0f_0
- mkl_fft=1.1.0=py36h23d657b_0
- mkl_random=1.1.1=py36h0573a6f_0
- ncurses=6.0=h9df7e31_2
- numpy=1.18.1=py36h4f9e942_0
- numpy-base=1.18.1=py36hde5b4d6_1
- openssl=1.0.2u=h7b6447c_0
- opt_einsum=3.1.0=py_0
- pip=20.1.1=py36_1
- protobuf=3.12.3=py36he6710b0_0
- python=3.6.2=hca45abc_19
- readline=7.0=ha6073c6_4
- scipy=1.4.1=py36h0b6359f_0
- setuptools=47.3.0=py36_0
- six=1.15.0=py_0
- sqlite=3.23.1=he433501_0
- tensorboard=2.0.0=pyhb38c66f_1
- tensorflow=2.0.0=gpu_py36h6b29c10_0
- tensorflow-base=2.0.0=gpu_py36h0ec5d1f_0
- tensorflow-estimator=2.0.0=pyh2649769_0
- tensorflow-gpu=2.0.0=h0d30ee6_0
- termcolor=1.1.0=py36_1
- tk=8.6.8=hbc83047_0
- werkzeug=0.16.1=py_0
- wheel=0.34.2=py36_0
- wrapt=1.12.1=py36h7b6447c_1
- xz=5.2.5=h7b6447c_0
- zlib=1.2.11=h7b6447c_3
- pip:
- aiohttp==3.6.2
- aioredis==1.3.1
- astunparse==1.6.3
- async-timeout==3.0.1
- atari-py==0.2.6
- atomicwrites==1.2.1
- attrs==18.2.0
- beautifulsoup4==4.9.1
- blessings==1.7
- cachetools==4.1.1
- cffi==1.14.1
- chardet==3.0.4
- click==7.1.2
- cloudpickle==1.3.0
- colorama==0.4.3
- colorful==0.5.4
- configparser==5.0.1
- contextvars==2.4
- cycler==0.10.0
- cython==0.29.21
- deepdiff==4.3.2
- dill==0.3.2
- docker-pycreds==0.4.0
- docopt==0.6.2
- fasteners==0.15
- filelock==3.0.12
- funcsigs==1.0.2
- future==0.16.0
- gin==0.1.6
- gin-config==0.3.0
- gitdb==4.0.5
- gitpython==3.1.9
- glfw==1.12.0
- google==3.0.0
- google-api-core==1.22.1
- google-auth==1.21.0
- google-auth-oauthlib==0.4.1
- googleapis-common-protos==1.52.0
- gpustat==0.6.0
- gql==0.2.0
- graphql-core==1.1
- gym==0.17.2
- hiredis==1.1.0
- idna==2.7
- idna-ssl==1.1.0
- imageio==2.4.1
- immutables==0.14
- importlib-metadata==1.7.0
- joblib==0.16.0
- jsonnet==0.16.0
- jsonpickle==0.9.6
- jsonschema==3.2.0
- kiwisolver==1.0.1
- lockfile==0.12.2
- mappo==0.0.1
- matplotlib==3.0.0
- mock==2.0.0
- monotonic==1.5
- more-itertools==4.3.0
- mpi4py==3.0.3
- mpyq==0.2.5
- msgpack==1.0.0
- mujoco-py==2.0.2.13
- mujoco-worldgen==0.0.0
- multidict==4.7.6
- munch==2.3.2
- nvidia-ml-py3==7.352.0
- oauthlib==3.1.0
- opencensus==0.7.10
- opencensus-context==0.1.1
- opencv-python==4.2.0.34
- ordered-set==4.0.2
- packaging==20.4
- pandas==1.1.1
- pathlib2==2.3.2
- pathtools==0.1.2
- pbr==4.3.0
- pillow==5.3.0
- pluggy==0.7.1
- portpicker==1.2.0
- probscale==0.2.3
- progressbar2==3.53.1
- prometheus-client==0.8.0
- promise==2.3
- psutil==5.7.2
- py==1.6.0
- py-spy==0.3.3
- pyasn1==0.4.8
- pyasn1-modules==0.2.8
- pycparser==2.20
- pygame==1.9.4
- pyglet==1.5.0
- pyopengl==3.1.5
- pyopengl-accelerate==3.1.5
- pyparsing==2.2.2
- pyrsistent==0.16.0
- pysc2==3.0.0
- pytest==3.8.2
- python-dateutil==2.7.3
- python-utils==2.4.0
- pytz==2020.1
- pyyaml==3.13
- pyzmq==19.0.2
- ray==0.8.0
- redis==3.4.1
- requests==2.24.0
- requests-oauthlib==1.3.0
- rsa==4.6
- s2clientprotocol==4.10.1.75800.0
- s2protocol==4.11.4.78285.0
- sacred==0.7.2
- seaborn==0.10.1
- sentry-sdk==0.18.0
- shortuuid==1.0.1
- sk-video==1.1.10
- smmap==3.0.4
- snakeviz==1.0.0
- soupsieve==2.0.1
- subprocess32==3.5.4
- tabulate==0.8.7
- tensorboard-logger==0.1.0
- tensorboard-plugin-wit==1.7.0
- tensorboardx==2.0
- torch==1.5.1+cu101
- torchvision==0.6.1+cu101
- tornado==5.1.1
- tqdm==4.48.2
- typing-extensions==3.7.4.3
- urllib3==1.23
- wandb==0.10.5
- watchdog==0.10.3
- websocket-client==0.53.0
- whichcraft==0.5.2
- xmltodict==0.12.0
- yarl==1.5.1
- zipp==3.1.0
- zmq==0.0.0
================================================
FILE: MACPO/macpo/__init__.py
================================================
from macpo import algorithms, envs, runner, scripts, utils, config
__version__ = "0.1.0"
__all__ = [
"algorithms",
"envs",
"runner",
"scripts",
"utils",
"config",
]
================================================
FILE: MACPO/macpo/algorithms/__init__.py
================================================
================================================
FILE: MACPO/macpo/algorithms/r_mappo/__init__.py
================================================
def cost_trpo_macppo():
return None
================================================
FILE: MACPO/macpo/algorithms/r_mappo/algorithm/MACPPOPolicy.py
================================================
import torch
from macpo.algorithms.r_mappo.algorithm.r_actor_critic import R_Actor, R_Critic
from macpo.utils.util import update_linear_schedule
class MACPPOPolicy:
"""
MACPO Policy class. Wraps actor and critic networks to compute actions and value function predictions.
:param args: (argparse.Namespace) arguments containing relevant model and policy information.
:param obs_space: (gym.Space) observation space.
:param cent_obs_space: (gym.Space) value function input space (centralized input for MAPPO, decentralized for IPPO).
:param action_space: (gym.Space) action space.
:param device: (torch.device) specifies the device to run on (cpu/gpu).
"""
def __init__(self, args, obs_space, cent_obs_space, act_space, device=torch.device("cpu")):
self.args = args
self.device = device
self.lr = args.lr
self.critic_lr = args.critic_lr
self.opti_eps = args.opti_eps
self.weight_decay = args.weight_decay
self.obs_space = obs_space
self.share_obs_space = cent_obs_space
self.act_space = act_space
self.actor = R_Actor(args, self.obs_space, self.act_space, self.device)
self.critic = R_Critic(args, self.share_obs_space, self.device)
self.cost_critic = R_Critic(args, self.share_obs_space, self.device)
self.actor_optimizer = torch.optim.Adam(self.actor.parameters(),
lr=self.lr, eps=self.opti_eps,
weight_decay=self.weight_decay)
self.critic_optimizer = torch.optim.Adam(self.critic.parameters(),
lr=self.critic_lr,
eps=self.opti_eps,
weight_decay=self.weight_decay)
self.cost_optimizer = torch.optim.Adam(self.cost_critic.parameters(),
lr=self.critic_lr,
eps=self.opti_eps,
weight_decay=self.weight_decay)
def lr_decay(self, episode, episodes):
"""
Decay the actor and critic learning rates.
:param episode: (int) current training episode.
:param episodes: (int) total number of training episodes.
"""
update_linear_schedule(self.actor_optimizer, episode, episodes, self.lr)
update_linear_schedule(self.critic_optimizer, episode, episodes, self.critic_lr)
update_linear_schedule(self.cost_optimizer, episode, episodes, self.critic_lr)
def get_actions(self, cent_obs, obs, rnn_states_actor, rnn_states_critic, masks, available_actions=None,
deterministic=False, rnn_states_cost=None):
"""
Compute actions and value function predictions for the given inputs.
:param cent_obs (np.ndarray): centralized input to the critic.
:param obs (np.ndarray): local agent inputs to the actor.
:param rnn_states_actor: (np.ndarray) if actor is RNN, RNN states for actor.
:param rnn_states_critic: (np.ndarray) if critic is RNN, RNN states for critic.
:param masks: (np.ndarray) denotes points at which RNN states should be reset.
:param available_actions: (np.ndarray) denotes which actions are available to agent
(if None, all actions available)
        :param deterministic: (bool) whether the action should be mode of distribution or should be sampled.
        :param rnn_states_cost: (np.ndarray) if cost critic is RNN, RNN states for the cost critic.
        :return values: (torch.Tensor) value function predictions.
        :return actions: (torch.Tensor) actions to take.
        :return action_log_probs: (torch.Tensor) log probabilities of chosen actions.
        :return rnn_states_actor: (torch.Tensor) updated actor network RNN states.
        :return rnn_states_critic: (torch.Tensor) updated critic network RNN states.
        :return cost_preds: (torch.Tensor) cost value predictions (only returned if rnn_states_cost is given).
        :return rnn_states_cost: (torch.Tensor) updated cost critic RNN states (only returned if rnn_states_cost is given).
        """
actions, action_log_probs, rnn_states_actor = self.actor(obs,
rnn_states_actor,
masks,
available_actions,
deterministic)
values, rnn_states_critic = self.critic(cent_obs, rnn_states_critic, masks)
if rnn_states_cost is None:
return values, actions, action_log_probs, rnn_states_actor, rnn_states_critic
else:
cost_preds, rnn_states_cost = self.cost_critic(cent_obs, rnn_states_cost, masks)
return values, actions, action_log_probs, rnn_states_actor, rnn_states_critic, cost_preds, rnn_states_cost
def get_values(self, cent_obs, rnn_states_critic, masks):
"""
Get value function predictions.
:param cent_obs (np.ndarray): centralized input to the critic.
:param rnn_states_critic: (np.ndarray) if critic is RNN, RNN states for critic.
:param masks: (np.ndarray) denotes points at which RNN states should be reset.
:return values: (torch.Tensor) value function predictions.
"""
values, _ = self.critic(cent_obs, rnn_states_critic, masks)
return values
def get_cost_values(self, cent_obs, rnn_states_cost, masks):
"""
        Get constraint cost value predictions.
        :param cent_obs (np.ndarray): centralized input to the cost critic.
        :param rnn_states_cost: (np.ndarray) if cost critic is RNN, RNN states for the cost critic.
        :param masks: (np.ndarray) denotes points at which RNN states should be reset.
        :return cost_preds: (torch.Tensor) cost value predictions.
"""
cost_preds, _ = self.cost_critic(cent_obs, rnn_states_cost, masks)
return cost_preds
def evaluate_actions(self, cent_obs, obs, rnn_states_actor, rnn_states_critic, action, masks,
available_actions=None, active_masks=None, rnn_states_cost=None):
"""
Get action logprobs / entropy and value function predictions for actor update.
:param cent_obs (np.ndarray): centralized input to the critic.
:param obs (np.ndarray): local agent inputs to the actor.
:param rnn_states_actor: (np.ndarray) if actor is RNN, RNN states for actor.
:param rnn_states_critic: (np.ndarray) if critic is RNN, RNN states for critic.
:param action: (np.ndarray) actions whose log probabilites and entropy to compute.
:param masks: (np.ndarray) denotes points at which RNN states should be reset.
:param available_actions: (np.ndarray) denotes which actions are available to agent
(if None, all actions available)
:param active_masks: (torch.Tensor) denotes whether an agent is active or dead.
:return values: (torch.Tensor) value function predictions.
:return action_log_probs: (torch.Tensor) log probabilities of the input actions.
:return dist_entropy: (torch.Tensor) action distribution entropy for the given inputs.
"""
if self.args.algorithm_name == "macpo": # todo: for mactrpo
action_log_probs, dist_entropy, action_mu, action_std = self.actor.evaluate_actions(obs,
rnn_states_actor,
action,
masks,
available_actions,
active_masks)
            values, _ = self.critic(cent_obs, rnn_states_critic, masks)
            cost_values, _ = self.cost_critic(cent_obs, rnn_states_cost, masks)
            return values, action_log_probs, dist_entropy, cost_values, action_mu, action_std
        else:  # todo: for lagrangian
action_log_probs, dist_entropy = self.actor.evaluate_actions(obs,
rnn_states_actor,
action,
masks,
available_actions,
active_masks)
values, _ = self.critic(cent_obs, rnn_states_critic, masks)
cost_values, _ = self.cost_critic(cent_obs, rnn_states_cost, masks)
return values, action_log_probs, dist_entropy, cost_values
def act(self, obs, rnn_states_actor, masks, available_actions=None, deterministic=False):
"""
Compute actions using the given inputs.
:param obs (np.ndarray): local agent inputs to the actor.
:param rnn_states_actor: (np.ndarray) if actor is RNN, RNN states for actor.
:param masks: (np.ndarray) denotes points at which RNN states should be reset.
:param available_actions: (np.ndarray) denotes which actions are available to agent
(if None, all actions available)
:param deterministic: (bool) whether the action should be mode of distribution or should be sampled.
"""
actions, _, rnn_states_actor = self.actor(obs, rnn_states_actor, masks, available_actions, deterministic)
return actions, rnn_states_actor
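A minimal usage sketch of the rollout-side API above (not part of the repository; `args` is assumed to be the argparse.Namespace produced by macpo/config.py, and the space shapes are hypothetical):
import numpy as np
from gym import spaces
from macpo.algorithms.r_mappo.algorithm.MACPPOPolicy import MACPPOPolicy

obs_space = spaces.Box(low=-1.0, high=1.0, shape=(18,))        # hypothetical shapes
cent_obs_space = spaces.Box(low=-1.0, high=1.0, shape=(36,))
act_space = spaces.Box(low=-1.0, high=1.0, shape=(6,))
policy = MACPPOPolicy(args, obs_space, cent_obs_space, act_space)

n_threads = 8
obs = np.zeros((n_threads, 18), dtype=np.float32)
cent_obs = np.zeros((n_threads, 36), dtype=np.float32)
masks = np.ones((n_threads, 1), dtype=np.float32)
rnn_shape = (n_threads, args.recurrent_N, args.hidden_size)
rnn_actor = np.zeros(rnn_shape, dtype=np.float32)
rnn_critic = np.zeros(rnn_shape, dtype=np.float32)
rnn_cost = np.zeros(rnn_shape, dtype=np.float32)
# Passing rnn_states_cost makes get_actions also return cost-critic outputs.
values, actions, logp, rnn_actor, rnn_critic, cost_preds, rnn_cost = \
    policy.get_actions(cent_obs, obs, rnn_actor, rnn_critic, masks,
                       rnn_states_cost=rnn_cost)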
================================================
FILE: MACPO/macpo/algorithms/r_mappo/algorithm/rMAPPOPolicy.py
================================================
import torch
from macpo.algorithms.r_mappo.algorithm.r_actor_critic import R_Actor, R_Critic
from macpo.utils.util import update_linear_schedule
class R_MAPPOPolicy:
"""
MAPPO Policy class. Wraps actor and critic networks to compute actions and value function predictions.
:param args: (argparse.Namespace) arguments containing relevant model and policy information.
:param obs_space: (gym.Space) observation space.
:param cent_obs_space: (gym.Space) value function input space (centralized input for MAPPO, decentralized for IPPO).
:param action_space: (gym.Space) action space.
:param device: (torch.device) specifies the device to run on (cpu/gpu).
"""
def __init__(self, args, obs_space, cent_obs_space, act_space, device=torch.device("cpu")):
self.device = device
self.lr = args.lr
self.critic_lr = args.critic_lr
self.opti_eps = args.opti_eps
self.weight_decay = args.weight_decay
self.obs_space = obs_space
self.share_obs_space = cent_obs_space
self.act_space = act_space
self.actor = R_Actor(args, self.obs_space, self.act_space, self.device)
self.critic = R_Critic(args, self.share_obs_space, self.device)
self.actor_optimizer = torch.optim.Adam(self.actor.parameters(),
lr=self.lr, eps=self.opti_eps,
weight_decay=self.weight_decay)
self.critic_optimizer = torch.optim.Adam(self.critic.parameters(),
lr=self.critic_lr,
eps=self.opti_eps,
weight_decay=self.weight_decay)
def lr_decay(self, episode, episodes):
"""
Decay the actor and critic learning rates.
:param episode: (int) current training episode.
:param episodes: (int) total number of training episodes.
"""
update_linear_schedule(self.actor_optimizer, episode, episodes, self.lr)
update_linear_schedule(self.critic_optimizer, episode, episodes, self.critic_lr)
def get_actions(self, cent_obs, obs, rnn_states_actor, rnn_states_critic, masks, available_actions=None,
deterministic=False):
"""
Compute actions and value function predictions for the given inputs.
:param cent_obs (np.ndarray): centralized input to the critic.
:param obs (np.ndarray): local agent inputs to the actor.
:param rnn_states_actor: (np.ndarray) if actor is RNN, RNN states for actor.
:param rnn_states_critic: (np.ndarray) if critic is RNN, RNN states for critic.
:param masks: (np.ndarray) denotes points at which RNN states should be reset.
:param available_actions: (np.ndarray) denotes which actions are available to agent
(if None, all actions available)
:param deterministic: (bool) whether the action should be mode of distribution or should be sampled.
:return values: (torch.Tensor) value function predictions.
:return actions: (torch.Tensor) actions to take.
:return action_log_probs: (torch.Tensor) log probabilities of chosen actions.
:return rnn_states_actor: (torch.Tensor) updated actor network RNN states.
:return rnn_states_critic: (torch.Tensor) updated critic network RNN states.
"""
actions, action_log_probs, rnn_states_actor = self.actor(obs,
rnn_states_actor,
masks,
available_actions,
deterministic)
values, rnn_states_critic = self.critic(cent_obs, rnn_states_critic, masks)
return values, actions, action_log_probs, rnn_states_actor, rnn_states_critic
def get_values(self, cent_obs, rnn_states_critic, masks):
"""
Get value function predictions.
:param cent_obs (np.ndarray): centralized input to the critic.
:param rnn_states_critic: (np.ndarray) if critic is RNN, RNN states for critic.
:param masks: (np.ndarray) denotes points at which RNN states should be reset.
:return values: (torch.Tensor) value function predictions.
"""
values, _ = self.critic(cent_obs, rnn_states_critic, masks)
return values
def evaluate_actions(self, cent_obs, obs, rnn_states_actor, rnn_states_critic, action, masks,
available_actions=None, active_masks=None):
"""
Get action logprobs / entropy and value function predictions for actor update.
:param cent_obs (np.ndarray): centralized input to the critic.
:param obs (np.ndarray): local agent inputs to the actor.
:param rnn_states_actor: (np.ndarray) if actor is RNN, RNN states for actor.
:param rnn_states_critic: (np.ndarray) if critic is RNN, RNN states for critic.
:param action: (np.ndarray) actions whose log probabilites and entropy to compute.
:param masks: (np.ndarray) denotes points at which RNN states should be reset.
:param available_actions: (np.ndarray) denotes which actions are available to agent
(if None, all actions available)
:param active_masks: (torch.Tensor) denotes whether an agent is active or dead.
:return values: (torch.Tensor) value function predictions.
:return action_log_probs: (torch.Tensor) log probabilities of the input actions.
:return dist_entropy: (torch.Tensor) action distribution entropy for the given inputs.
"""
action_log_probs, dist_entropy = self.actor.evaluate_actions(obs,
rnn_states_actor,
action,
masks,
available_actions,
active_masks)
values, _ = self.critic(cent_obs, rnn_states_critic, masks)
return values, action_log_probs, dist_entropy
def act(self, obs, rnn_states_actor, masks, available_actions=None, deterministic=False):
"""
Compute actions using the given inputs.
:param obs (np.ndarray): local agent inputs to the actor.
:param rnn_states_actor: (np.ndarray) if actor is RNN, RNN states for actor.
:param masks: (np.ndarray) denotes points at which RNN states should be reset.
:param available_actions: (np.ndarray) denotes which actions are available to agent
(if None, all actions available)
:param deterministic: (bool) whether the action should be mode of distribution or should be sampled.
"""
actions, _, rnn_states_actor = self.actor(obs, rnn_states_actor, masks, available_actions, deterministic)
return actions, rnn_states_actor
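lr_decay above delegates to update_linear_schedule from macpo.utils.util, whose source is outside this preview. A minimal sketch of the conventional linear-annealing behavior it is expected to implement (an assumption, not the repository's exact code):
def update_linear_schedule(optimizer, epoch, total_num_epochs, initial_lr):
    # Anneal the learning rate linearly from initial_lr down to 0 over training.
    lr = initial_lr - (initial_lr * (epoch / float(total_num_epochs)))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr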
================================================
FILE: MACPO/macpo/algorithms/r_mappo/algorithm/r_actor_critic.py
================================================
import torch
import torch.nn as nn
from macpo.algorithms.utils.util import init, check
from macpo.algorithms.utils.cnn import CNNBase
from macpo.algorithms.utils.mlp import MLPBase
from macpo.algorithms.utils.rnn import RNNLayer
from macpo.algorithms.utils.act import ACTLayer
from macpo.utils.util import get_shape_from_obs_space
class R_Actor(nn.Module):
"""
Actor network class for MACPO. Outputs actions given observations.
:param args: (argparse.Namespace) arguments containing relevant model information.
:param obs_space: (gym.Space) observation space.
:param action_space: (gym.Space) action space.
:param device: (torch.device) specifies the device to run on (cpu/gpu).
"""
def __init__(self, args, obs_space, action_space, device=torch.device("cpu")):
super(R_Actor, self).__init__()
self.args = args
self.hidden_size = args.hidden_size
self._gain = args.gain
self._use_orthogonal = args.use_orthogonal
self._use_policy_active_masks = args.use_policy_active_masks
self._use_naive_recurrent_policy = args.use_naive_recurrent_policy
self._use_recurrent_policy = args.use_recurrent_policy
self._recurrent_N = args.recurrent_N
self.tpdv = dict(dtype=torch.float32, device=device)
obs_shape = get_shape_from_obs_space(obs_space)
base = CNNBase if len(obs_shape) == 3 else MLPBase
self.base = base(args, obs_shape)
if self._use_naive_recurrent_policy or self._use_recurrent_policy:
self.rnn = RNNLayer(self.hidden_size, self.hidden_size, self._recurrent_N, self._use_orthogonal)
self.act = ACTLayer(action_space, self.hidden_size, self._use_orthogonal, self._gain, args)
self.to(device)
def forward(self, obs, rnn_states, masks, available_actions=None, deterministic=False):
"""
Compute actions from the given inputs.
:param obs: (np.ndarray / torch.Tensor) observation inputs into network.
:param rnn_states: (np.ndarray / torch.Tensor) if RNN network, hidden states for RNN.
:param masks: (np.ndarray / torch.Tensor) mask tensor denoting if hidden states should be reinitialized to zeros.
:param available_actions: (np.ndarray / torch.Tensor) denotes which actions are available to agent
(if None, all actions available)
:param deterministic: (bool) whether to sample from action distribution or return the mode.
:return actions: (torch.Tensor) actions to take.
:return action_log_probs: (torch.Tensor) log probabilities of taken actions.
:return rnn_states: (torch.Tensor) updated RNN hidden states.
"""
obs = check(obs).to(**self.tpdv)
rnn_states = check(rnn_states).to(**self.tpdv)
masks = check(masks).to(**self.tpdv)
if available_actions is not None:
available_actions = check(available_actions).to(**self.tpdv)
actor_features = self.base(obs)
if self._use_naive_recurrent_policy or self._use_recurrent_policy:
actor_features, rnn_states = self.rnn(actor_features, rnn_states, masks)
actions, action_log_probs = self.act(actor_features, available_actions, deterministic)
return actions, action_log_probs, rnn_states
def evaluate_actions(self, obs, rnn_states, action, masks, available_actions=None, active_masks=None):
"""
Compute log probability and entropy of given actions.
:param obs: (torch.Tensor) observation inputs into network.
:param action: (torch.Tensor) actions whose entropy and log probability to evaluate.
:param rnn_states: (torch.Tensor) if RNN network, hidden states for RNN.
:param masks: (torch.Tensor) mask tensor denoting if hidden states should be reinitialized to zeros.
:param available_actions: (torch.Tensor) denotes which actions are available to agent
(if None, all actions available)
:param active_masks: (torch.Tensor) denotes whether an agent is active or dead.
:return action_log_probs: (torch.Tensor) log probabilities of the input actions.
:return dist_entropy: (torch.Tensor) action distribution entropy for the given inputs.
"""
obs = check(obs).to(**self.tpdv)
rnn_states = check(rnn_states).to(**self.tpdv)
action = check(action).to(**self.tpdv)
masks = check(masks).to(**self.tpdv)
if available_actions is not None:
available_actions = check(available_actions).to(**self.tpdv)
if active_masks is not None:
active_masks = check(active_masks).to(**self.tpdv)
actor_features = self.base(obs)
if self._use_naive_recurrent_policy or self._use_recurrent_policy:
actor_features, rnn_states = self.rnn(actor_features, rnn_states, masks)
if self.args.algorithm_name == "macpo":
action_log_probs, dist_entropy, action_mu, action_std = self.act.evaluate_actions_trpo(actor_features,
action,
available_actions,
active_masks=
active_masks if self._use_policy_active_masks
else None)
# print("action_log_probs", action_log_probs)
# print("action_std", action_std)
return action_log_probs, dist_entropy, action_mu, action_std
else:
action_log_probs, dist_entropy = self.act.evaluate_actions(actor_features,
action, available_actions,
active_masks=
active_masks if self._use_policy_active_masks
else None)
return action_log_probs, dist_entropy
class R_Critic(nn.Module):
"""
Critic network class for MAPPO. Outputs value function predictions given centralized input (MAPPO) or
local observations (IPPO).
:param args: (argparse.Namespace) arguments containing relevant model information.
:param cent_obs_space: (gym.Space) (centralized) observation space.
:param device: (torch.device) specifies the device to run on (cpu/gpu).
"""
def __init__(self, args, cent_obs_space, device=torch.device("cpu")):
super(R_Critic, self).__init__()
self.hidden_size = args.hidden_size
self._use_orthogonal = args.use_orthogonal
self._use_naive_recurrent_policy = args.use_naive_recurrent_policy
self._use_recurrent_policy = args.use_recurrent_policy
self._recurrent_N = args.recurrent_N
self.tpdv = dict(dtype=torch.float32, device=device)
init_method = [nn.init.xavier_uniform_, nn.init.orthogonal_][self._use_orthogonal]
cent_obs_shape = get_shape_from_obs_space(cent_obs_space)
base = CNNBase if len(cent_obs_shape) == 3 else MLPBase
self.base = base(args, cent_obs_shape)
if self._use_naive_recurrent_policy or self._use_recurrent_policy:
self.rnn = RNNLayer(self.hidden_size, self.hidden_size, self._recurrent_N, self._use_orthogonal)
def init_(m):
return init(m, init_method, lambda x: nn.init.constant_(x, 0))
self.v_out = init_(nn.Linear(self.hidden_size, 1))
self.to(device)
def forward(self, cent_obs, rnn_states, masks):
"""
Compute actions from the given inputs.
:param cent_obs: (np.ndarray / torch.Tensor) observation inputs into network.
:param rnn_states: (np.ndarray / torch.Tensor) if RNN network, hidden states for RNN.
:param masks: (np.ndarray / torch.Tensor) mask tensor denoting if RNN states should be reinitialized to zeros.
:return values: (torch.Tensor) value function predictions.
:return rnn_states: (torch.Tensor) updated RNN hidden states.
"""
cent_obs = check(cent_obs).to(**self.tpdv)
rnn_states = check(rnn_states).to(**self.tpdv)
masks = check(masks).to(**self.tpdv)
critic_features = self.base(cent_obs)
if self._use_naive_recurrent_policy or self._use_recurrent_policy:
critic_features, rnn_states = self.rnn(critic_features, rnn_states, masks)
values = self.v_out(critic_features)
return values, rnn_states
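An illustrative shape check for R_Critic.forward (a sketch only; assumes `args` from macpo/config.py, a 1-D Box centralized observation space, and the (batch, recurrent_N, hidden_size) RNN-state layout used by the policy classes above):
import numpy as np
from gym import spaces
from macpo.algorithms.r_mappo.algorithm.r_actor_critic import R_Critic

cent_obs_space = spaces.Box(low=-1.0, high=1.0, shape=(36,))  # hypothetical shape
critic = R_Critic(args, cent_obs_space)
batch = 8
cent_obs = np.zeros((batch, 36), dtype=np.float32)
rnn_states = np.zeros((batch, args.recurrent_N, args.hidden_size), dtype=np.float32)
masks = np.ones((batch, 1), dtype=np.float32)
values, rnn_states = critic(cent_obs, rnn_states, masks)
assert tuple(values.shape) == (batch, 1)  # one scalar value per batch entry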
================================================
FILE: MACPO/macpo/algorithms/r_mappo/r_macpo.py
================================================
import numpy as np
import torch
import torch.nn as nn
from macpo.utils.util import get_gard_norm, huber_loss, mse_loss
from macpo.utils.popart import PopArt
from macpo.algorithms.utils.util import check
from macpo.algorithms.r_mappo.algorithm.r_actor_critic import R_Actor
from torch.nn.utils import clip_grad_norm_
import copy
# EPS = 1e-8
class R_MACTRPO_CPO():
"""
Trainer class for MACPO to update policies.
:param args: (argparse.Namespace) arguments containing relevant model, policy, and env information.
:param policy: (R_MAPPO_Policy) policy to update.
:param device: (torch.device) specifies the device to run on (cpu/gpu).
"""
def __init__(self,
args,
policy, attempt_feasible_recovery=False,
attempt_infeasible_recovery=False, revert_to_last_safe_point=False, delta_bound=0.011,
safety_bound=0.1,
_backtrack_ratio=0.8, _max_backtracks=15, _constraint_name_1="trust_region",
_constraint_name_2="safety_region", linesearch_infeasible_recovery=True, accept_violation=False,
learn_margin=False,
device=torch.device("cpu")):
self.device = device
self.tpdv = dict(dtype=torch.float32, device=device)
self.policy = policy
self.clip_param = args.clip_param
self.ppo_epoch = args.ppo_epoch
self.num_mini_batch = args.num_mini_batch
self.data_chunk_length = args.data_chunk_length
self.value_loss_coef = args.value_loss_coef
self.entropy_coef = args.entropy_coef
self.max_grad_norm = args.max_grad_norm
self.huber_delta = args.huber_delta
self.episode_length = args.episode_length
self.kl_threshold = args.kl_threshold
self.safety_bound = args.safety_bound
self.ls_step = args.ls_step
self.accept_ratio = args.accept_ratio
self.EPS = args.EPS
self.gamma = args.gamma
self.safety_gamma = args.safety_gamma
self.line_search_fraction = args.line_search_fraction
self.g_step_dir_coef = args.g_step_dir_coef
self.b_step_dir_coef = args.b_step_dir_coef
self.fraction_coef = args.fraction_coef
self._use_recurrent_policy = args.use_recurrent_policy
self._use_naive_recurrent = args.use_naive_recurrent_policy
self._use_max_grad_norm = args.use_max_grad_norm
self._use_clipped_value_loss = args.use_clipped_value_loss
self._use_huber_loss = args.use_huber_loss
self._use_popart = args.use_popart
self._use_value_active_masks = args.use_value_active_masks
self._use_policy_active_masks = args.use_policy_active_masks
        # CPO-specific constants
        self.args = args
        self._damping = 0.0001
        self._delta = 0.01
        self._backtrack_coeff = 0.5
self.attempt_feasible_recovery = attempt_feasible_recovery
self.attempt_infeasible_recovery = attempt_infeasible_recovery
self.revert_to_last_safe_point = revert_to_last_safe_point
self._max_quad_constraint_val = args.kl_threshold # delta_bound
self._max_lin_constraint_val = args.safety_bound
self._backtrack_ratio = _backtrack_ratio
self._max_backtracks = _max_backtracks
self._constraint_name_1 = _constraint_name_1
self._constraint_name_2 = _constraint_name_2
self._linesearch_infeasible_recovery = linesearch_infeasible_recovery
self._accept_violation = accept_violation
self.lamda_coef = 0
self.lamda_coef_a_star = 0
self.lamda_coef_b_star = 0
self.margin = 0
self.margin_lr = 0.05
self.learn_margin = learn_margin
self.n_rollout_threads = args.n_rollout_threads
if self._use_popart:
self.value_normalizer = PopArt(1, device=self.device)
else:
self.value_normalizer = None
def cal_value_loss(self, values, value_preds_batch, return_batch, active_masks_batch):
"""
Calculate value function loss.
:param values: (torch.Tensor) value function predictions.
:param value_preds_batch: (torch.Tensor) "old" value predictions from data batch (used for value clip loss)
:param return_batch: (torch.Tensor) reward to go returns.
        :param active_masks_batch: (torch.Tensor) denotes if agent is active or dead at a given timestep.
:return value_loss: (torch.Tensor) value function loss.
"""
if self._use_popart:
value_pred_clipped = value_preds_batch + (values - value_preds_batch).clamp(-self.clip_param,
self.clip_param)
error_clipped = self.value_normalizer(return_batch) - value_pred_clipped
error_original = self.value_normalizer(return_batch) - values
else:
value_pred_clipped = value_preds_batch + (values - value_preds_batch).clamp(-self.clip_param,
self.clip_param)
error_clipped = return_batch - value_pred_clipped
error_original = return_batch - values
if self._use_huber_loss:
value_loss_clipped = huber_loss(error_clipped, self.huber_delta)
value_loss_original = huber_loss(error_original, self.huber_delta)
else:
value_loss_clipped = mse_loss(error_clipped)
value_loss_original = mse_loss(error_original)
if self._use_clipped_value_loss:
value_loss = torch.max(value_loss_original, value_loss_clipped)
else:
value_loss = value_loss_original
if self._use_value_active_masks:
value_loss = (value_loss * active_masks_batch).sum() / active_masks_batch.sum()
else:
value_loss = value_loss.mean()
return value_loss
def flat_grad(self, grads):
grad_flatten = []
for grad in grads:
if grad is None:
continue
grad_flatten.append(grad.view(-1))
grad_flatten = torch.cat(grad_flatten)
return grad_flatten
def flat_hessian(self, hessians):
hessians_flatten = []
for hessian in hessians:
if hessian is None:
continue
hessians_flatten.append(hessian.contiguous().view(-1))
hessians_flatten = torch.cat(hessians_flatten).data
return hessians_flatten
def flat_params(self, model):
params = []
for param in model.parameters():
params.append(param.data.view(-1))
params_flatten = torch.cat(params)
return params_flatten
def update_model(self, model, new_params):
index = 0
for params in model.parameters():
params_length = len(params.view(-1))
new_param = new_params[index: index + params_length]
new_param = new_param.view(params.size())
params.data.copy_(new_param)
index += params_length
def kl_divergence(self, obs, rnn_states, action, masks, available_actions, active_masks, new_actor, old_actor):
_, _, mu, std = new_actor.evaluate_actions(obs, rnn_states, action, masks, available_actions, active_masks)
_, _, mu_old, std_old = old_actor.evaluate_actions(obs, rnn_states, action, masks, available_actions,
active_masks)
logstd = torch.log(std)
mu_old = mu_old.detach()
std_old = std_old.detach()
logstd_old = torch.log(std_old)
# kl divergence between old policy and new policy : D( pi_old || pi_new )
# pi_old -> mu0, logstd0, std0 / pi_new -> mu, logstd, std
# be careful of calculating KL-divergence. It is not symmetric metric
        # note: the log-ratio term must be log(std_new / std_old); with the terms
        # reversed the expression can go negative, which a true KL never does
        kl = logstd - logstd_old + (std_old.pow(2) + (mu_old - mu).pow(2)) / \
             (self.EPS + 2.0 * std.pow(2)) - 0.5
return kl.sum(1, keepdim=True)
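    # Closed form used above: for diagonal Gaussians, per action dimension,
    #   KL(pi_old || pi_new) = log(std_new / std_old)
    #                          + (std_old^2 + (mu_old - mu_new)^2) / (2 * std_new^2)
    #                          - 1/2,
    # summed over action dimensions; self.EPS guards the division as std -> 0.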
# from openai baseline code
# https://github.com/openai/baselines/blob/master/baselines/common/cg.py
def conjugate_gradient(self, actor, obs, rnn_states, action, masks, available_actions, active_masks, b, nsteps,
residual_tol=1e-10):
x = torch.zeros(b.size()).to(device=self.device)
r = b.clone()
p = b.clone()
rdotr = torch.dot(r, r)
for i in range(nsteps):
_Avp = self.fisher_vector_product(actor, obs, rnn_states, action, masks, available_actions, active_masks, p)
alpha = rdotr / torch.dot(p, _Avp)
x += alpha * p
r -= alpha * _Avp
new_rdotr = torch.dot(r, r)
betta = new_rdotr / rdotr
p = r + betta * p
rdotr = new_rdotr
if rdotr < residual_tol:
break
return x
def fisher_vector_product(self, actor, obs, rnn_states, action, masks, available_actions, active_masks, p):
        p = p.detach()
kl = self.kl_divergence(obs, rnn_states, action, masks, available_actions, active_masks, new_actor=actor,
old_actor=actor)
kl = kl.mean()
kl_grad = torch.autograd.grad(kl, actor.parameters(), create_graph=True, allow_unused=True)
kl_grad = self.flat_grad(kl_grad) # check kl_grad == 0
kl_grad_p = (kl_grad * p).sum()
kl_hessian_p = torch.autograd.grad(kl_grad_p, actor.parameters(), allow_unused=True)
kl_hessian_p = self.flat_hessian(kl_hessian_p)
return kl_hessian_p + 0.1 * p
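    # The `0.1 * p` term above adds damping to the Fisher-vector product,
    # i.e. it returns (H + 0.1 * I) p, which keeps the matrix implicitly
    # inverted by conjugate_gradient positive definite and numerically stable.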
def _get_flat_grad(self, y, model, retain_graph=None, create_graph=False):
grads = torch.autograd.grad(y, model.parameters(), retain_graph=retain_graph,
create_graph=create_graph, allow_unused=True)
_grads = []
for val, p in zip(grads, model.parameters()):
if val is not None:
_grads.append(val)
else:
_grads.append(torch.zeros_like(p.data, requires_grad=create_graph))
return torch.cat([grad.reshape(-1) for grad in _grads])
def _flat_grad_(self, f, model, retain_graph=None, create_graph=False):
return self.flat_grad(torch.autograd.grad(f, model.parameters(), retain_graph=retain_graph,
create_graph=create_graph, allow_unused=True))
    def hessian_vector_product(self, f, model):
        # For H = grad**2 f, compute Hx. Note: `x` below is only an
        # uninitialized placeholder tensor; this helper is vestigial and is
        # not called within this file (trpo_update uses fisher_vector_product).
        g = self._flat_grad_(f, model)
        x = torch.FloatTensor(g.shape)
        return x, self._flat_grad_(torch.sum(g * x), model)
def cg(self, Ax, b, cg_iters=10):
        x = torch.zeros_like(b)
r = b.clone() # Note: should be 'b - Ax(x)', but for x=0, Ax(x)=0. Change if doing warm start.
p = r.clone()
r_dot_old = torch.dot(r, r)
for _ in range(cg_iters):
z = Ax(p)
alpha = r_dot_old / (torch.dot(p, z) + self.EPS)
x += alpha * p
r -= alpha * z
r_dot_new = torch.dot(r, r)
p = r + (r_dot_new / r_dot_old) * p
r_dot_old = r_dot_new
return x
def trpo_update(self, sample, update_actor=True):
"""
Update actor and critic networks.
:param sample: (Tuple) contains data batch with which to update networks.
:update_actor: (bool) whether to update actor network.
:return value_loss: (torch.Tensor) value function loss.
:return critic_grad_norm: (torch.Tensor) gradient norm from critic update.
        :return policy_loss: (torch.Tensor) actor(policy) loss value.
:return dist_entropy: (torch.Tensor) action entropies.
:return actor_grad_norm: (torch.Tensor) gradient norm from actor update.
:return imp_weights: (torch.Tensor) importance sampling weights.
"""
share_obs_batch, obs_batch, rnn_states_batch, rnn_states_critic_batch, actions_batch, \
value_preds_batch, return_batch, masks_batch, active_masks_batch, old_action_log_probs_batch, \
        adv_targ, available_actions_batch, factor_batch, cost_preds_batch, cost_returns_batch, rnn_states_cost_batch, \
cost_adv_targ, aver_episode_costs = sample
old_action_log_probs_batch = check(old_action_log_probs_batch).to(**self.tpdv)
adv_targ = check(adv_targ).to(**self.tpdv)
cost_adv_targ = check(cost_adv_targ).to(**self.tpdv)
value_preds_batch = check(value_preds_batch).to(**self.tpdv)
return_batch = check(return_batch).to(**self.tpdv)
active_masks_batch = check(active_masks_batch).to(**self.tpdv)
factor_batch = check(factor_batch).to(**self.tpdv)
        cost_returns_batch = check(cost_returns_batch).to(**self.tpdv)
cost_preds_batch = check(cost_preds_batch).to(**self.tpdv)
# Reshape to do in a single forward pass for all steps
# values, action_log_probs, dist_entropy, cost_values, action_mu, action_std
values, action_log_probs, dist_entropy, cost_values, action_mu, action_std = self.policy.evaluate_actions(
share_obs_batch,
obs_batch,
rnn_states_batch,
rnn_states_critic_batch,
actions_batch,
masks_batch,
available_actions_batch,
active_masks_batch,
rnn_states_cost_batch)
# todo: reward critic update
value_loss = self.cal_value_loss(values, value_preds_batch, return_batch, active_masks_batch)
self.policy.critic_optimizer.zero_grad()
(value_loss * self.value_loss_coef).backward()
if self._use_max_grad_norm:
critic_grad_norm = nn.utils.clip_grad_norm_(self.policy.critic.parameters(), self.max_grad_norm)
else:
critic_grad_norm = get_gard_norm(self.policy.critic.parameters())
self.policy.critic_optimizer.step()
# todo: cost critic update
        cost_loss = self.cal_value_loss(cost_values, cost_preds_batch, cost_returns_batch, active_masks_batch)
self.policy.cost_optimizer.zero_grad()
(cost_loss * self.value_loss_coef).backward()
if self._use_max_grad_norm:
cost_grad_norm = nn.utils.clip_grad_norm_(self.policy.cost_critic.parameters(), self.max_grad_norm)
else:
cost_grad_norm = get_gard_norm(self.policy.cost_critic.parameters())
self.policy.cost_optimizer.step()
# todo: actor update
rescale_constraint_val = (aver_episode_costs.mean() - self._max_lin_constraint_val) * (1 - self.gamma)
if rescale_constraint_val == 0:
rescale_constraint_val = self.EPS
# todo:reward-g
ratio = torch.exp(action_log_probs - old_action_log_probs_batch)
if self._use_policy_active_masks:
reward_loss = (torch.sum(ratio * factor_batch * adv_targ, dim=-1, keepdim=True) *
active_masks_batch).sum() / active_masks_batch.sum()
else:
reward_loss = torch.sum(ratio * factor_batch * adv_targ, dim=-1, keepdim=True).mean()
        reward_loss = -reward_loss  # minimize the negated surrogate (gradient ascent on reward)
reward_loss_grad = torch.autograd.grad(reward_loss, self.policy.actor.parameters(), retain_graph=True,
allow_unused=True)
reward_loss_grad = self.flat_grad(reward_loss_grad)
# todo:cost-b
if self._use_policy_active_masks:
cost_loss = (torch.sum(ratio * factor_batch * (cost_adv_targ), dim=-1, keepdim=True) *
active_masks_batch).sum() / active_masks_batch.sum()
else:
cost_loss = torch.sum(ratio * factor_batch * (cost_adv_targ), dim=-1, keepdim=True).mean()
cost_loss_grad = torch.autograd.grad(cost_loss, self.policy.actor.parameters(), retain_graph=True,
allow_unused=True)
cost_loss_grad = self.flat_grad(cost_loss_grad)
B_cost_loss_grad = cost_loss_grad.unsqueeze(0)
B_cost_loss_grad = self.flat_grad(B_cost_loss_grad)
# todo: compute lamda_coef and v_coef
g_step_dir = self.conjugate_gradient(self.policy.actor,
obs_batch,
rnn_states_batch,
actions_batch,
masks_batch,
available_actions_batch,
active_masks_batch,
reward_loss_grad.data,
nsteps=10) # todo: compute H^{-1} g
b_step_dir = self.conjugate_gradient(self.policy.actor,
obs_batch,
rnn_states_batch,
actions_batch,
masks_batch,
available_actions_batch,
active_masks_batch,
B_cost_loss_grad.data,
nsteps=10) # todo: compute H^{-1} b
q_coef = (reward_loss_grad * g_step_dir).sum(0, keepdim=True) # todo: compute q_coef: = g^T H^{-1} g
r_coef = (reward_loss_grad * b_step_dir).sum(0, keepdim=True) # todo: compute r_coef: = g^T H^{-1} b
s_coef = (cost_loss_grad * b_step_dir).sum(0, keepdim=True) # todo: compute s_coef: = b^T H^{-1} b
        fraction = self.line_search_fraction  # line search step size
loss_improve = 0 # initialization
"""self._max_lin_constraint_val = c, B_cost_loss_grad = c in cpo"""
B_cost_loss_grad_dot = torch.dot(B_cost_loss_grad, B_cost_loss_grad)
if (torch.dot(B_cost_loss_grad, B_cost_loss_grad)) <= self.EPS and rescale_constraint_val < 0:
# feasible and cost grad is zero---shortcut to pure TRPO update!
# w, r, s, A, B = 0, 0, 0, 0, 0
# g_step_dir = torch.tensor(0)
b_step_dir = torch.tensor(0)
r_coef = torch.tensor(0)
s_coef = torch.tensor(0)
positive_Cauchy_value = torch.tensor(0)
whether_recover_policy_value = torch.tensor(0)
optim_case = 4
else:
# cost grad is nonzero: CPO update!
r_coef = (reward_loss_grad * b_step_dir).sum(0, keepdim=True) # todo: compute r_coef: = g^T H^{-1} b
s_coef = (cost_loss_grad * b_step_dir).sum(0, keepdim=True) # todo: compute s_coef: = b^T H^{-1} b
if r_coef == 0:
r_coef = self.EPS
if s_coef == 0:
s_coef = self.EPS
            positive_Cauchy_value = (
                    q_coef - (r_coef ** 2) / (self.EPS + s_coef))  # should always be positive (Cauchy-Schwarz)
whether_recover_policy_value = 2 * self._max_quad_constraint_val - (
rescale_constraint_val ** 2) / (
self.EPS + s_coef) # does safety boundary intersect trust region? (positive = yes)
if rescale_constraint_val < 0 and whether_recover_policy_value < 0:
# point in trust region is feasible and safety boundary doesn't intersect
# ==> entire trust region is feasible
optim_case = 3
elif rescale_constraint_val < 0 and whether_recover_policy_value >= 0:
# x = 0 is feasible and safety boundary intersects
# ==> most of trust region is feasible
optim_case = 2
elif rescale_constraint_val >= 0 and whether_recover_policy_value >= 0:
# x = 0 is infeasible and safety boundary intersects
# ==> part of trust region is feasible, recovery possible
optim_case = 1
else:
# x = 0 infeasible, and safety halfspace is outside trust region
# ==> whole trust region is infeasible, try to fail gracefully
optim_case = 0
if whether_recover_policy_value == 0:
whether_recover_policy_value = self.EPS
if optim_case in [3, 4]:
lam = torch.sqrt(
(q_coef / (2 * self._max_quad_constraint_val))) # self.lamda_coef = lam = np.sqrt(q / (2 * target_kl))
nu = torch.tensor(0) # v_coef = 0
elif optim_case in [1, 2]:
LA, LB = [0, r_coef / rescale_constraint_val], [r_coef / rescale_constraint_val, np.inf]
LA, LB = (LA, LB) if rescale_constraint_val < 0 else (LB, LA)
proj = lambda x, L: max(L[0], min(L[1], x))
lam_a = proj(torch.sqrt(positive_Cauchy_value / whether_recover_policy_value), LA)
lam_b = proj(torch.sqrt(q_coef / (torch.tensor(2 * self._max_quad_constraint_val))), LB)
f_a = lambda lam: -0.5 * (positive_Cauchy_value / (
self.EPS + lam) + whether_recover_policy_value * lam) - r_coef * rescale_constraint_val / (
self.EPS + s_coef)
f_b = lambda lam: -0.5 * (q_coef / (self.EPS + lam) + 2 * self._max_quad_constraint_val * lam)
lam = lam_a if f_a(lam_a) >= f_b(lam_b) else lam_b
nu = max(0, lam * rescale_constraint_val - r_coef) / (self.EPS + s_coef)
else:
lam = torch.tensor(0)
nu = torch.sqrt(torch.tensor(2 * self._max_quad_constraint_val) / (self.EPS + s_coef))
x_a = (1. / (lam + self.EPS)) * (g_step_dir + nu * b_step_dir)
x_b = (nu * b_step_dir)
x = x_a if optim_case > 0 else x_b
# todo: update actor and learning
reward_loss = reward_loss.data.cpu().numpy()
cost_loss = cost_loss.data.cpu().numpy()
params = self.flat_params(self.policy.actor)
old_actor = R_Actor(self.policy.args,
self.policy.obs_space,
self.policy.act_space,
self.device)
self.update_model(old_actor, params)
expected_improve = -torch.dot(x, reward_loss_grad).sum(0, keepdim=True)
expected_improve = expected_improve.data.cpu().numpy()
# line search
flag = False
fraction_coef = self.fraction_coef
# print("fraction_coef", fraction_coef)
for i in range(self.ls_step):
x_norm = torch.norm(x)
if x_norm > 0.5:
x = x * 0.5 / x_norm
new_params = params - fraction_coef * (fraction**i) * x
self.update_model(self.policy.actor, new_params)
values, action_log_probs, dist_entropy, new_cost_values, action_mu, action_std = self.policy.evaluate_actions(
share_obs_batch,
obs_batch,
rnn_states_batch,
rnn_states_critic_batch,
actions_batch,
masks_batch,
available_actions_batch,
active_masks_batch,
rnn_states_cost_batch)
ratio = torch.exp(action_log_probs - old_action_log_probs_batch)
if self._use_policy_active_masks:
new_reward_loss = (torch.sum(ratio * factor_batch * adv_targ, dim=-1, keepdim=True) *
active_masks_batch).sum() / active_masks_batch.sum()
else:
new_reward_loss = torch.sum(ratio * factor_batch * adv_targ, dim=-1, keepdim=True).mean()
if self._use_policy_active_masks:
new_cost_loss = (torch.sum(ratio * factor_batch * cost_adv_targ, dim=-1, keepdim=True) *
active_masks_batch).sum() / active_masks_batch.sum()
else:
new_cost_loss = torch.sum(ratio * factor_batch * cost_adv_targ, dim=-1, keepdim=True).mean()
new_reward_loss = new_reward_loss.data.cpu().numpy()
new_reward_loss = -new_reward_loss
new_cost_loss = new_cost_loss.data.cpu().numpy()
loss_improve = new_reward_loss - reward_loss
kl = self.kl_divergence(obs_batch,
rnn_states_batch,
actions_batch,
masks_batch,
available_actions_batch,
active_masks_batch,
new_actor=self.policy.actor,
old_actor=old_actor)
kl = kl.mean()
            # see https://en.wikipedia.org/wiki/Backtracking_line_search
if ((kl < self.kl_threshold) and (loss_improve < 0 if optim_case > 1 else True)
and (new_cost_loss.mean() - cost_loss.mean() <= max(-rescale_constraint_val, 0))):
flag = True
# print("line search successful")
break
expected_improve *= fraction
if not flag:
# line search failed
print("line search failed")
params = self.flat_params(old_actor)
self.update_model(self.policy.actor, params)
        return value_loss, critic_grad_norm, kl, loss_improve, expected_improve, dist_entropy, ratio, cost_loss, cost_grad_norm, whether_recover_policy_value, cost_preds_batch, cost_returns_batch, B_cost_loss_grad, lam, nu, g_step_dir, b_step_dir, x, action_mu, action_std, B_cost_loss_grad_dot
def train(self, buffer, shared_buffer=None, update_actor=True):
"""
Perform a training update using minibatch GD.
:param buffer: (SharedReplayBuffer) buffer containing training data.
:param update_actor: (bool) whether to update actor network.
:return train_info: (dict) contains information regarding training update (e.g. loss, grad norms, etc).
"""
if self._use_popart:
advantages = buffer.returns[:-1] - self.value_normalizer.denormalize(buffer.value_preds[:-1])
else:
advantages = buffer.returns[:-1] - buffer.value_preds[:-1]
advantages_copy = advantages.copy()
advantages_copy[buffer.active_masks[:-1] == 0.0] = np.nan
mean_advantages = np.nanmean(advantages_copy)
std_advantages = np.nanstd(advantages_copy)
advantages = (advantages - mean_advantages) / (std_advantages + 1e-5)
if self._use_popart:
cost_adv = buffer.cost_returns[:-1] - self.value_normalizer.denormalize(buffer.cost_preds[:-1])
else:
cost_adv = buffer.cost_returns[:-1] - buffer.cost_preds[:-1]
cost_adv_copy = cost_adv.copy()
cost_adv_copy[buffer.active_masks[:-1] == 0.0] = np.nan
mean_cost_adv = np.nanmean(cost_adv_copy)
std_cost_adv = np.nanstd(cost_adv_copy)
cost_adv = (cost_adv - mean_cost_adv) / (std_cost_adv + 1e-5)
train_info = {}
train_info['value_loss'] = 0
train_info['kl'] = 0
train_info['dist_entropy'] = 0
train_info['loss_improve'] = 0
train_info['expected_improve'] = 0
train_info['critic_grad_norm'] = 0
train_info['ratio'] = 0
train_info['cost_loss'] = 0
train_info['cost_grad_norm'] = 0
train_info['whether_recover_policy_value'] = 0
train_info['cost_preds_batch'] = 0
        train_info['cost_returns_batch'] = 0
train_info['B_cost_loss_grad'] = 0
train_info['lam'] = 0
train_info['nu'] = 0
train_info['g_step_dir'] = 0
train_info['b_step_dir'] = 0
train_info['x'] = 0
train_info['action_mu'] = 0
train_info['action_std'] = 0
train_info['B_cost_loss_grad_dot'] = 0
if self._use_recurrent_policy:
data_generator = buffer.recurrent_generator(advantages, self.num_mini_batch, self.data_chunk_length,
cost_adv=cost_adv)
elif self._use_naive_recurrent:
data_generator = buffer.naive_recurrent_generator(advantages, self.num_mini_batch, cost_adv=cost_adv)
else:
data_generator = buffer.feed_forward_generator(advantages, self.num_mini_batch, cost_adv=cost_adv)
# old_actor = copy.deepcopy(self.policy.actor)
for sample in data_generator:
            value_loss, critic_grad_norm, kl, loss_improve, expected_improve, dist_entropy, imp_weights, cost_loss, cost_grad_norm, whether_recover_policy_value, cost_preds_batch, cost_returns_batch, B_cost_loss_grad, lam, nu, g_step_dir, b_step_dir, x, action_mu, action_std, B_cost_loss_grad_dot \
                = self.trpo_update(sample, update_actor)
train_info['value_loss'] += value_loss.item()
train_info['kl'] += kl
train_info['loss_improve'] += loss_improve
train_info['expected_improve'] += expected_improve
train_info['dist_entropy'] += dist_entropy.item()
train_info['critic_grad_norm'] += critic_grad_norm
train_info['ratio'] += imp_weights.mean()
            train_info['cost_loss'] += cost_loss.item()
train_info['cost_grad_norm'] += cost_grad_norm
train_info['whether_recover_policy_value'] += whether_recover_policy_value
train_info['cost_preds_batch'] += cost_preds_batch.mean()
            train_info['cost_returns_batch'] += cost_returns_batch.mean()
train_info['B_cost_loss_grad'] += B_cost_loss_grad.mean()
train_info['g_step_dir'] += g_step_dir.float().mean()
train_info['b_step_dir'] += b_step_dir.float().mean()
            train_info['x'] += x.float().mean()
train_info['action_mu'] += action_mu.float().mean()
train_info['action_std'] += action_std.float().mean()
train_info['B_cost_loss_grad_dot'] += B_cost_loss_grad_dot.item()
num_updates = self.ppo_epoch * self.num_mini_batch
for k in train_info.keys():
train_info[k] /= num_updates
return train_info
def prep_training(self):
self.policy.actor.train()
self.policy.critic.train()
def prep_rollout(self):
self.policy.actor.eval()
self.policy.critic.eval()
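The optim_case branching in trpo_update mirrors the dual analysis of Constrained Policy Optimization (Achiam et al., 2017). The standalone sketch below restates that case analysis with scalar stand-ins q = g^T H^-1 g, r = g^T H^-1 b, s = b^T H^-1 b, constraint residual c, and KL radius delta; the numeric inputs are hypothetical and this is not the repository's code:
import numpy as np

EPS = 1e-8

def cpo_dual(q, r, s, c, delta):
    # Decide which region of the (trust region, safety halfspace) intersection
    # applies, then recover the dual variables (lam, nu) for the step
    # direction (g_step_dir + nu * b_step_dir) / lam.
    c = c if abs(c) > EPS else EPS      # mirror the rescale_constraint_val guard
    A = q - r ** 2 / (EPS + s)          # positive by Cauchy-Schwarz
    B = 2 * delta - c ** 2 / (EPS + s)  # >= 0 iff safety boundary meets trust region
    if c < 0 and B < 0:
        return np.sqrt(q / (2 * delta)), 0.0        # whole trust region feasible: TRPO step
    if c >= 0 and B < 0:
        return 0.0, np.sqrt(2 * delta / (EPS + s))  # infeasible: pure recovery step
    LA = (0.0, r / c) if c < 0 else (r / c, np.inf)
    LB = (r / c, np.inf) if c < 0 else (0.0, r / c)
    proj = lambda x, L: max(L[0], min(L[1], x))
    lam_a = proj(np.sqrt(A / B), LA)
    lam_b = proj(np.sqrt(q / (2 * delta)), LB)
    f_a = lambda lam: -0.5 * (A / (EPS + lam) + B * lam) - r * c / (EPS + s)
    f_b = lambda lam: -0.5 * (q / (EPS + lam) + 2 * delta * lam)
    lam = lam_a if f_a(lam_a) >= f_b(lam_b) else lam_b
    nu = max(0.0, (lam * c - r) / (EPS + s))
    return lam, nu

print(cpo_dual(q=4.0, r=0.5, s=1.0, c=-0.2, delta=0.01))  # hypothetical inputs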
================================================
FILE: MACPO/macpo/algorithms/utils/act.py
================================================
from .distributions import Bernoulli, Categorical, DiagGaussian
import torch
import torch.nn as nn
class ACTLayer(nn.Module):
"""
MLP Module to compute actions.
:param action_space: (gym.Space) action space.
:param inputs_dim: (int) dimension of network input.
:param use_orthogonal: (bool) whether to use orthogonal initialization.
:param gain: (float) gain of the output layer of the network.
"""
def __init__(self, action_space, inputs_dim, use_orthogonal, gain, args=None):
super(ACTLayer, self).__init__()
self.mixed_action = False
self.multi_discrete = False
# print("action_space.__class__.__name__", action_space.__class__.__name__)
if action_space.__class__.__name__ == "Discrete":
action_dim = action_space.n
self.action_out = Categorical(inputs_dim, action_dim, use_orthogonal, gain)
elif action_space.__class__.__name__ == "Box":
action_dim = action_space.shape[0]
self.action_out = DiagGaussian(inputs_dim, action_dim, use_orthogonal, gain, args)
elif action_space.__class__.__name__ == "MultiBinary":
action_dim = action_space.shape[0]
self.action_out = Bernoulli(inputs_dim, action_dim, use_orthogonal, gain)
elif action_space.__class__.__name__ == "MultiDiscrete":
self.multi_discrete = True
action_dims = action_space.high - action_space.low + 1
self.action_outs = []
for action_dim in action_dims:
self.action_outs.append(Categorical(inputs_dim, action_dim, use_orthogonal, gain))
self.action_outs = nn.ModuleList(self.action_outs)
        else:  # discrete + continuous
            self.mixed_action = True
            continuous_dim = action_space[0].shape[0]
            discrete_dim = action_space[1].n
            self.action_outs = nn.ModuleList([DiagGaussian(inputs_dim, continuous_dim, use_orthogonal, gain, args),
                                              Categorical(inputs_dim, discrete_dim, use_orthogonal, gain)])
def forward(self, x, available_actions=None, deterministic=False):
"""
Compute actions and action logprobs from given input.
:param x: (torch.Tensor) input to network.
:param available_actions: (torch.Tensor) denotes which actions are available to agent
(if None, all actions available)
        :param deterministic: (bool) if True, return the mode of the action distribution instead of sampling.
:return actions: (torch.Tensor) actions to take.
:return action_log_probs: (torch.Tensor) log probabilities of taken actions.
"""
if self.mixed_action :
actions = []
action_log_probs = []
for action_out in self.action_outs:
action_logit = action_out(x)
action = action_logit.mode() if deterministic else action_logit.sample()
action_log_prob = action_logit.log_probs(action)
actions.append(action.float())
action_log_probs.append(action_log_prob)
actions = torch.cat(actions, -1)
action_log_probs = torch.sum(torch.cat(action_log_probs, -1), -1, keepdim=True)
elif self.multi_discrete:
actions = []
action_log_probs = []
for action_out in self.action_outs:
action_logit = action_out(x)
action = action_logit.mode() if deterministic else action_logit.sample()
action_log_prob = action_logit.log_probs(action)
actions.append(action)
action_log_probs.append(action_log_prob)
actions = torch.cat(actions, -1)
action_log_probs = torch.cat(action_log_probs, -1)
else:
action_logits = self.action_out(x, available_actions)
actions = action_logits.mode() if deterministic else action_logits.sample()
action_log_probs = action_logits.log_probs(actions)
return actions, action_log_probs
def get_probs(self, x, available_actions=None):
"""
Compute action probabilities from inputs.
:param x: (torch.Tensor) input to network.
:param available_actions: (torch.Tensor) denotes which actions are available to agent
(if None, all actions available)
:return action_probs: (torch.Tensor)
"""
if self.mixed_action or self.multi_discrete:
action_probs = []
for action_out in self.action_outs:
action_logit = action_out(x)
action_prob = action_logit.probs
action_probs.append(action_prob)
action_probs = torch.cat(action_probs, -1)
else:
action_logits = self.action_out(x, available_actions)
action_probs = action_logits.probs
return action_probs
def evaluate_actions(self, x, action, available_actions=None, active_masks=None):
"""
Compute log probability and entropy of given actions.
:param x: (torch.Tensor) input to network.
:param action: (torch.Tensor) actions whose entropy and log probability to evaluate.
:param available_actions: (torch.Tensor) denotes which actions are available to agent
(if None, all actions available)
:param active_masks: (torch.Tensor) denotes whether an agent is active or dead.
:return action_log_probs: (torch.Tensor) log probabilities of the input actions.
:return dist_entropy: (torch.Tensor) action distribution entropy for the given inputs.
"""
if self.mixed_action:
a, b = action.split((2, 1), -1)
b = b.long()
action = [a, b]
action_log_probs = []
dist_entropy = []
for action_out, act in zip(self.action_outs, action):
action_logit = action_out(x)
action_log_probs.append(action_logit.log_probs(act))
if active_masks is not None:
if len(action_logit.entropy().shape) == len(active_masks.shape):
dist_entropy.append((action_logit.entropy() * active_masks).sum()/active_masks.sum())
else:
dist_entropy.append((action_logit.entropy() * active_masks.squeeze(-1)).sum()/active_masks.sum())
else:
dist_entropy.append(action_logit.entropy().mean())
action_log_probs = torch.sum(torch.cat(action_log_probs, -1), -1, keepdim=True)
            dist_entropy = dist_entropy[0] / 2.0 + dist_entropy[1] / 0.98  # ! doesn't make sense
elif self.multi_discrete:
action = torch.transpose(action, 0, 1)
action_log_probs = []
dist_entropy = []
for action_out, act in zip(self.action_outs, action):
action_logit = action_out(x)
action_log_probs.append(action_logit.log_probs(act))
if active_masks is not None:
dist_entropy.append((action_logit.entropy()*active_masks.squeeze(-1)).sum()/active_masks.sum())
else:
dist_entropy.append(action_logit.entropy().mean())
action_log_probs = torch.cat(action_log_probs, -1) # ! could be wrong
dist_entropy = torch.tensor(dist_entropy).mean()
else:
action_logits = self.action_out(x, available_actions)
action_log_probs = action_logits.log_probs(action)
if active_masks is not None:
dist_entropy = (action_logits.entropy()*active_masks).sum()/active_masks.sum()
# dist_entropy = (action_logits.entropy()*active_masks.squeeze(-1)).sum()/active_masks.sum()
else:
dist_entropy = action_logits.entropy().mean()
return action_log_probs, dist_entropy
def evaluate_actions_trpo(self, x, action, available_actions=None, active_masks=None):
"""
Compute log probability and entropy of given actions.
:param x: (torch.Tensor) input to network.
:param action: (torch.Tensor) actions whose entropy and log probability to evaluate.
:param available_actions: (torch.Tensor) denotes which actions are available to agent
(if None, all actions available)
:param active_masks: (torch.Tensor) denotes whether an agent is active or dead.
:return action_log_probs: (torch.Tensor) log probabilities of the input actions.
:return dist_entropy: (torch.Tensor) action distribution entropy for the given inputs.
"""
if self.mixed_action:
a, b = action.split((2, 1), -1)
b = b.long()
action = [a, b]
action_log_probs = []
dist_entropy = []
for action_out, act in zip(self.action_outs, action):
action_logit = action_out(x)
action_log_probs.append(action_logit.log_probs(act))
if active_masks is not None:
if len(action_logit.entropy().shape) == len(active_masks.shape):
dist_entropy.append((action_logit.entropy() * active_masks).sum() / active_masks.sum())
else:
dist_entropy.append(
(action_logit.entropy() * active_masks.squeeze(-1)).sum() / active_masks.sum())
else:
dist_entropy.append(action_logit.entropy().mean())
action_log_probs = torch.sum(torch.cat(action_log_probs, -1), -1, keepdim=True)
            dist_entropy = dist_entropy[0] / 2.0 + dist_entropy[1] / 0.98  # ! doesn't make sense
elif self.multi_discrete:
action = torch.transpose(action, 0, 1)
action_log_probs = []
dist_entropy = []
for action_out, act in zip(self.action_outs, action):
action_logit = action_out(x)
action_log_probs.append(action_logit.log_probs(act))
if active_masks is not None:
dist_entropy.append((action_logit.entropy() * active_masks.squeeze(-1)).sum() / active_masks.sum())
else:
dist_entropy.append(action_logit.entropy().mean())
action_log_probs = torch.cat(action_log_probs, -1) # ! could be wrong
dist_entropy = torch.tensor(dist_entropy).mean()
else:
action_logits = self.action_out(x, available_actions)
# print("action_logits.mean-macppo-act.py", action_logits.mean)
action_mu = action_logits.mean
action_std = action_logits.stddev
action_log_probs = action_logits.log_probs(action)
# print("action_log_probs-act.py", action_log_probs)
if active_masks is not None:
dist_entropy = (action_logits.entropy() * active_masks).sum() / active_masks.sum()
# dist_entropy = (action_logits.entropy()*active_masks.squeeze(-1)).sum()/active_masks.sum()
else:
dist_entropy = action_logits.entropy().mean()
# print("action_logits-act.py", action_logits)
# print("action_mu-act.py", action_mu)
return action_log_probs, dist_entropy, action_mu, action_std
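# --- Editor's illustrative sketch (not part of the original module) ---
# Minimal usage of ACTLayer with a continuous (Box) action space; the space
# shape, hidden size, and batch size below are illustrative assumptions.
if __name__ == "__main__":
    from gym import spaces
    box_space = spaces.Box(low=-1.0, high=1.0, shape=(4,))
    layer = ACTLayer(box_space, inputs_dim=64, use_orthogonal=True, gain=0.01)
    x = torch.randn(8, 64)  # a batch of 8 actor hidden states
    actions, action_log_probs = layer(x)
    print(actions.shape, action_log_probs.shape)  # (8, 4) and (8, 4)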
================================================
FILE: MACPO/macpo/algorithms/utils/cnn.py
================================================
import torch.nn as nn
from .util import init
"""CNN Modules and utils."""
class Flatten(nn.Module):
def forward(self, x):
return x.view(x.size(0), -1)
class CNNLayer(nn.Module):
def __init__(self, obs_shape, hidden_size, use_orthogonal, use_ReLU, kernel_size=3, stride=1):
super(CNNLayer, self).__init__()
active_func = [nn.Tanh(), nn.ReLU()][use_ReLU]
init_method = [nn.init.xavier_uniform_, nn.init.orthogonal_][use_orthogonal]
gain = nn.init.calculate_gain(['tanh', 'relu'][use_ReLU])
def init_(m):
return init(m, init_method, lambda x: nn.init.constant_(x, 0), gain=gain)
input_channel = obs_shape[0]
input_width = obs_shape[1]
input_height = obs_shape[2]
self.cnn = nn.Sequential(
init_(nn.Conv2d(in_channels=input_channel,
out_channels=hidden_size // 2,
kernel_size=kernel_size,
stride=stride)
),
active_func,
Flatten(),
init_(nn.Linear(hidden_size // 2 * (input_width - kernel_size + stride) * (input_height - kernel_size + stride),
hidden_size)
),
active_func,
init_(nn.Linear(hidden_size, hidden_size)), active_func)
def forward(self, x):
x = x / 255.0
x = self.cnn(x)
return x
class CNNBase(nn.Module):
def __init__(self, args, obs_shape):
super(CNNBase, self).__init__()
self._use_orthogonal = args.use_orthogonal
self._use_ReLU = args.use_ReLU
self.hidden_size = args.hidden_size
self.cnn = CNNLayer(obs_shape, self.hidden_size, self._use_orthogonal, self._use_ReLU)
def forward(self, x):
x = self.cnn(x)
return x
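# --- Editor's illustrative sketch (not part of the original module) ---
# Exercising CNNBase with a dummy config; the namespace fields below are
# assumptions matching what the base reads from the global args object.
if __name__ == "__main__":
    import torch
    from types import SimpleNamespace
    args = SimpleNamespace(use_orthogonal=True, use_ReLU=True, hidden_size=64)
    net = CNNBase(args, obs_shape=(3, 32, 32))
    out = net(torch.randn(2, 3, 32, 32))  # a batch of 2 RGB 32x32 observations
    print(out.shape)  # torch.Size([2, 64])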
================================================
FILE: MACPO/macpo/algorithms/utils/distributions.py
================================================
import torch
import torch.nn as nn
from .util import init
"""
Modify standard PyTorch distributions so they to make compatible with this codebase.
"""
#
# Standardize distribution interfaces
#
# Categorical
class FixedCategorical(torch.distributions.Categorical):
def sample(self):
return super().sample().unsqueeze(-1)
def log_probs(self, actions):
return (
super()
.log_prob(actions.squeeze(-1))
.view(actions.size(0), -1)
.sum(-1)
.unsqueeze(-1)
)
def mode(self):
return self.probs.argmax(dim=-1, keepdim=True)
# Normal
class FixedNormal(torch.distributions.Normal):
def log_probs(self, actions):
return super().log_prob(actions)
# return super().log_prob(actions).sum(-1, keepdim=True)
    def entropy(self):
        return super().entropy().sum(-1)
def mode(self):
return self.mean
# Bernoulli
class FixedBernoulli(torch.distributions.Bernoulli):
def log_probs(self, actions):
        return super().log_prob(actions).view(actions.size(0), -1).sum(-1).unsqueeze(-1)
def entropy(self):
return super().entropy().sum(-1)
def mode(self):
return torch.gt(self.probs, 0.5).float()
class Categorical(nn.Module):
def __init__(self, num_inputs, num_outputs, use_orthogonal=True, gain=0.01):
super(Categorical, self).__init__()
init_method = [nn.init.xavier_uniform_, nn.init.orthogonal_][use_orthogonal]
def init_(m):
return init(m, init_method, lambda x: nn.init.constant_(x, 0), gain)
self.linear = init_(nn.Linear(num_inputs, num_outputs))
def forward(self, x, available_actions=None):
x = self.linear(x)
if available_actions is not None:
x[available_actions == 0] = -1e10
return FixedCategorical(logits=x)
# class DiagGaussian(nn.Module):
# def __init__(self, num_inputs, num_outputs, use_orthogonal=True, gain=0.01):
# super(DiagGaussian, self).__init__()
#
# init_method = [nn.init.xavier_uniform_, nn.init.orthogonal_][use_orthogonal]
# def init_(m):
# return init(m, init_method, lambda x: nn.init.constant_(x, 0), gain)
#
# self.fc_mean = init_(nn.Linear(num_inputs, num_outputs))
# self.logstd = AddBias(torch.zeros(num_outputs))
#
# def forward(self, x, available_actions=None):
# action_mean = self.fc_mean(x)
#
# # An ugly hack for my KFAC implementation.
# zeros = torch.zeros(action_mean.size())
# if x.is_cuda:
# zeros = zeros.cuda()
#
# action_logstd = self.logstd(zeros)
# return FixedNormal(action_mean, action_logstd.exp())
class DiagGaussian(nn.Module):
def __init__(self, num_inputs, num_outputs, use_orthogonal=True, gain=0.01, args=None):
super(DiagGaussian, self).__init__()
init_method = [nn.init.xavier_uniform_, nn.init.orthogonal_][use_orthogonal]
def init_(m):
return init(m, init_method, lambda x: nn.init.constant_(x, 0), gain)
if args is not None:
self.std_x_coef = args.std_x_coef
self.std_y_coef = args.std_y_coef
else:
self.std_x_coef = 1.
self.std_y_coef = 0.5
self.fc_mean = init_(nn.Linear(num_inputs, num_outputs))
log_std = torch.ones(num_outputs) * self.std_x_coef
self.log_std = torch.nn.Parameter(log_std)
def forward(self, x, available_actions=None):
action_mean = self.fc_mean(x)
action_std = torch.sigmoid(self.log_std / self.std_x_coef) * self.std_y_coef
# print("self.log_std", self.log_std)
# print("action_mean", action_mean)
# print("_action_std", action_std)
# action_std = torch.zeros_like(_action_std)
# print("action_std", action_std)
# action_std = torch.where(torch.isnan(action_std), torch.full_like(action_std, 1e-8), action_std)
# torch.where((action_std == torch.tensor(0)), torch.tensor(1e-8), action_std)
return FixedNormal(action_mean, action_std)
class Bernoulli(nn.Module):
def __init__(self, num_inputs, num_outputs, use_orthogonal=True, gain=0.01):
super(Bernoulli, self).__init__()
init_method = [nn.init.xavier_uniform_, nn.init.orthogonal_][use_orthogonal]
def init_(m):
return init(m, init_method, lambda x: nn.init.constant_(x, 0), gain)
self.linear = init_(nn.Linear(num_inputs, num_outputs))
def forward(self, x):
x = self.linear(x)
return FixedBernoulli(logits=x)
class AddBias(nn.Module):
def __init__(self, bias):
super(AddBias, self).__init__()
self._bias = nn.Parameter(bias.unsqueeze(1))
def forward(self, x):
if x.dim() == 2:
bias = self._bias.t().view(1, -1)
else:
bias = self._bias.t().view(1, -1, 1, 1)
return x + bias
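# --- Editor's illustrative sketch (not part of the original module) ---
# The DiagGaussian head above bounds the standard deviation via
# sigmoid(log_std / std_x_coef) * std_y_coef, i.e. std lies in (0, std_y_coef).
# The sizes below are illustrative assumptions.
if __name__ == "__main__":
    head = DiagGaussian(num_inputs=8, num_outputs=2)
    dist = head(torch.randn(4, 8))
    a = dist.sample()
    print(a.shape, dist.log_probs(a).shape)  # (4, 2) and (4, 2)
    print(float(dist.stddev.max()) <= 0.5)   # True with the default std_y_coef=0.5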
================================================
FILE: MACPO/macpo/algorithms/utils/mlp.py
================================================
import torch.nn as nn
from .util import init, get_clones
"""MLP modules."""
class MLPLayer(nn.Module):
def __init__(self, input_dim, hidden_size, layer_N, use_orthogonal, use_ReLU):
super(MLPLayer, self).__init__()
self._layer_N = layer_N
active_func = [nn.Tanh(), nn.ReLU()][use_ReLU]
init_method = [nn.init.xavier_uniform_, nn.init.orthogonal_][use_orthogonal]
gain = nn.init.calculate_gain(['tanh', 'relu'][use_ReLU])
def init_(m):
return init(m, init_method, lambda x: nn.init.constant_(x, 0), gain=gain)
self.fc1 = nn.Sequential(
init_(nn.Linear(input_dim, hidden_size)), active_func, nn.LayerNorm(hidden_size))
# self.fc_h = nn.Sequential(init_(
# nn.Linear(hidden_size, hidden_size)), active_func, nn.LayerNorm(hidden_size))
# self.fc2 = get_clones(self.fc_h, self._layer_N)
self.fc2 = nn.ModuleList([nn.Sequential(init_(
nn.Linear(hidden_size, hidden_size)), active_func, nn.LayerNorm(hidden_size)) for i in
range(self._layer_N)])
def forward(self, x):
x = self.fc1(x)
for i in range(self._layer_N):
x = self.fc2[i](x)
return x
class MLPBase(nn.Module):
def __init__(self, args, obs_shape, cat_self=True, attn_internal=False):
super(MLPBase, self).__init__()
self._use_feature_normalization = args.use_feature_normalization
self._use_orthogonal = args.use_orthogonal
self._use_ReLU = args.use_ReLU
self._stacked_frames = args.stacked_frames
self._layer_N = args.layer_N
self.hidden_size = args.hidden_size
obs_dim = obs_shape[0]
if self._use_feature_normalization:
self.feature_norm = nn.LayerNorm(obs_dim)
self.mlp = MLPLayer(obs_dim, self.hidden_size,
self._layer_N, self._use_orthogonal, self._use_ReLU)
def forward(self, x):
if self._use_feature_normalization:
x = self.feature_norm(x)
x = self.mlp(x)
return x
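# --- Editor's illustrative sketch (not part of the original module) ---
# Exercising MLPBase with a dummy config; the namespace fields and observation
# size below are illustrative assumptions.
if __name__ == "__main__":
    import torch
    from types import SimpleNamespace
    args = SimpleNamespace(use_feature_normalization=True, use_orthogonal=True,
                           use_ReLU=True, stacked_frames=1, layer_N=1, hidden_size=64)
    net = MLPBase(args, obs_shape=(17,))
    print(net(torch.randn(5, 17)).shape)  # torch.Size([5, 64])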
================================================
FILE: MACPO/macpo/algorithms/utils/rnn.py
================================================
import torch
import torch.nn as nn
"""RNN modules."""
class RNNLayer(nn.Module):
def __init__(self, inputs_dim, outputs_dim, recurrent_N, use_orthogonal):
super(RNNLayer, self).__init__()
self._recurrent_N = recurrent_N
self._use_orthogonal = use_orthogonal
self.rnn = nn.GRU(inputs_dim, outputs_dim, num_layers=self._recurrent_N)
for name, param in self.rnn.named_parameters():
if 'bias' in name:
nn.init.constant_(param, 0)
elif 'weight' in name:
if self._use_orthogonal:
nn.init.orthogonal_(param)
else:
nn.init.xavier_uniform_(param)
self.norm = nn.LayerNorm(outputs_dim)
def forward(self, x, hxs, masks):
if x.size(0) == hxs.size(0):
x, hxs = self.rnn(x.unsqueeze(0),
(hxs * masks.repeat(1, self._recurrent_N).unsqueeze(-1)).transpose(0, 1).contiguous())
x = x.squeeze(0)
hxs = hxs.transpose(0, 1)
else:
            # x is a (T, N, -1) tensor that has been flattened to (T * N, -1)
N = hxs.size(0)
T = int(x.size(0) / N)
# unflatten
x = x.view(T, N, x.size(1))
# Same deal with masks
masks = masks.view(T, N)
# Let's figure out which steps in the sequence have a zero for any agent
# We will always assume t=0 has a zero in it as that makes the logic cleaner
has_zeros = ((masks[1:] == 0.0)
.any(dim=-1)
.nonzero()
.squeeze()
.cpu())
# +1 to correct the masks[1:]
if has_zeros.dim() == 0:
# Deal with scalar
has_zeros = [has_zeros.item() + 1]
else:
has_zeros = (has_zeros + 1).numpy().tolist()
# add t=0 and t=T to the list
has_zeros = [0] + has_zeros + [T]
hxs = hxs.transpose(0, 1)
outputs = []
for i in range(len(has_zeros) - 1):
# We can now process steps that don't have any zeros in masks together!
# This is much faster
start_idx = has_zeros[i]
end_idx = has_zeros[i + 1]
temp = (hxs * masks[start_idx].view(1, -1, 1).repeat(self._recurrent_N, 1, 1)).contiguous()
rnn_scores, hxs = self.rnn(x[start_idx:end_idx], temp)
outputs.append(rnn_scores)
# assert len(outputs) == T
# x is a (T, N, -1) tensor
x = torch.cat(outputs, dim=0)
# flatten
x = x.reshape(T * N, -1)
hxs = hxs.transpose(0, 1)
x = self.norm(x)
return x, hxs
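# --- Editor's illustrative sketch (not part of the original module) ---
# How the zero-mask chunking in forward() splits a rollout of T steps into
# segments that can each be pushed through the GRU in a single call; the toy
# mask values are assumptions.
if __name__ == "__main__":
    T = 6
    masks = torch.tensor([[1, 1], [1, 0], [1, 1], [0, 1], [1, 1], [1, 1]], dtype=torch.float32)
    has_zeros = (masks[1:] == 0.0).any(dim=-1).nonzero().squeeze().cpu()
    has_zeros = [has_zeros.item() + 1] if has_zeros.dim() == 0 else (has_zeros + 1).numpy().tolist()
    print([0] + has_zeros + [T])  # segment boundaries: [0, 1, 3, 6]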
================================================
FILE: MACPO/macpo/algorithms/utils/util.py
================================================
import copy
import numpy as np
import torch
import torch.nn as nn
def init(module, weight_init, bias_init, gain=1):
weight_init(module.weight.data, gain=gain)
bias_init(module.bias.data)
return module
def get_clones(module, N):
return nn.ModuleList([copy.deepcopy(module) for i in range(N)])
def check(input):
output = torch.from_numpy(input) if type(input) == np.ndarray else input
return output
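# --- Editor's illustrative sketch (not part of the original module) ---
# check() converts numpy arrays to tensors and passes tensors through unchanged.
if __name__ == "__main__":
    print(type(check(np.zeros(3))), type(check(torch.ones(3))))  # both torch.Tensor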
================================================
FILE: MACPO/macpo/config.py
================================================
import argparse
def get_config():
"""
The configuration parser for common hyperparameters of all environment.
Please reach each `scripts/train/<env>_runner.py` file to find private hyperparameters
only used in <env>.
Prepare parameters:
--algorithm_name <algorithm_name>
            specify the algorithm; this config supports `["macpo"]`
--experiment_name <str>
            an identifier to distinguish different experiments.
--seed <int>
set seed for numpy and torch
        --cuda
            whether to use GPU to train; note this config defaults to CPU.
        --cuda_deterministic
            by default True, make CUDA deterministic so the random seed takes effect; if set, bypass this.
--n_training_threads <int>
number of training threads working in parallel. by default 1
--n_rollout_threads <int>
number of parallel envs for training rollout. by default 32
--n_eval_rollout_threads <int>
number of parallel envs for evaluating rollout. by default 1
--n_render_rollout_threads <int>
            number of parallel envs for rendering; can only be set to 1 for some environments.
--num_env_steps <int>
number of env steps to train (default: 10e6)
--user_name <str>
            [for wandb usage] specify the user name used when collecting training data.
--use_wandb
            [for wandb usage] by default True, log data to the wandb server; otherwise use tensorboard to log data.
Env parameters:
--env_name <str>
specify the name of environment
--use_obs_instead_of_state
            [only for some envs] by default False, use global state; if set, use concatenated local obs instead.
Replay Buffer parameters:
--episode_length <int>
the max length of episode in the buffer.
Network parameters:
--share_policy
by default True, all agents will share the same network; set to make training agents use different policies.
--use_centralized_V
            by default True, use centralized training mode; otherwise use decentralized training mode.
--stacked_frames <int>
            number of input frames to stack together.
--hidden_size <int>
Dimension of hidden layers for actor/critic networks
--layer_N <int>
Number of layers for actor/critic networks
--use_ReLU
            by default True, use ReLU; otherwise use Tanh.
--use_popart
by default True, use running mean and std to normalize rewards.
--use_feature_normalization
by default True, apply layernorm to normalize inputs.
--use_orthogonal
            by default True, use orthogonal initialization for weights and 0 initialization for biases; otherwise use xavier uniform initialization.
--gain
            by default 0.01, the gain of the last action layer.
--use_naive_recurrent_policy
            by default False; if set, use the whole trajectory to calculate hidden states.
--use_recurrent_policy
            by default False, do not use a recurrent policy; if set, use one.
--recurrent_N <int>
The number of recurrent layers ( default 1).
--data_chunk_length <int>
Time length of chunks used to train a recurrent_policy, default 10.
Optimizer parameters:
--lr <float>
learning rate parameter, (default: 5e-4, fixed).
--critic_lr <float>
learning rate of critic (default: 5e-4, fixed)
--opti_eps <float>
RMSprop optimizer epsilon (default: 1e-5)
--weight_decay <float>
            coefficient of weight decay (default: 0)
PPO parameters:
--ppo_epoch <int>
number of ppo epochs (default: 15)
--use_clipped_value_loss
by default, clip loss value. If set, do not clip loss value.
--clip_param <float>
ppo clip parameter (default: 0.2)
--num_mini_batch <int>
number of batches for ppo (default: 1)
--entropy_coef <float>
entropy term coefficient (default: 0.01)
--use_max_grad_norm
by default, use max norm of gradients. If set, do not use.
--max_grad_norm <float>
max norm of gradients (default: 0.5)
--use_gae
by default, use generalized advantage estimation. If set, do not use gae.
--gamma <float>
discount factor for rewards (default: 0.99)
--gae_lambda <float>
gae lambda parameter (default: 0.95)
--use_proper_time_limits
            by default False, compute returns without considering time limits; if set, take time limits into account.
--use_huber_loss
by default, use huber loss. If set, do not use huber loss.
--use_value_active_masks
            by default True, mask useless data in the value loss.
--huber_delta <float>
coefficient of huber loss.
PPG parameters:
--aux_epoch <int>
number of auxiliary epochs. (default: 4)
--clone_coef <float>
clone term coefficient (default: 0.01)
Run parameters:
--use_linear_lr_decay
by default, do not apply linear decay to learning rate. If set, use a linear schedule on the learning rate
Save & Log parameters:
--save_interval <int>
            time interval between two consecutive model saves.
        --log_interval <int>
            time interval between two consecutive log prints.
    Eval parameters:
        --use_eval
            by default, do not start evaluation; if set, run evaluation alongside training.
        --eval_interval <int>
            time interval between two consecutive evaluation runs.
        --eval_episodes <int>
            number of episodes in a single evaluation.
Render parameters:
        --save_gifs
            by default, do not save render video; if set, save video.
        --use_render
            by default, do not render the env during training; if set, start rendering. Note: sometimes the environment has an internal render process that is not controlled by this hyperparameter.
--render_episodes <int>
the number of episodes to render a given env
--ifi <float>
the play interval of each rendered image in saved video.
Pretrained parameters:
--model_dir <str>
by default None. set the path to pretrained model.
"""
parser = argparse.ArgumentParser(
description='macpo', formatter_class=argparse.RawDescriptionHelpFormatter)
# prepare parameters
parser.add_argument("--algorithm_name", type=str,
default=' ', choices=["macpo"])
parser.add_argument("--experiment_name", type=str, default="check", help="an identifier to distinguish different experiment.")
parser.add_argument("--seed", type=int, default=1, help="Random seed for numpy/torch")
parser.add_argument("--cuda", action='store_false', default=False, help="by default True, will use GPU to train; or else will use CPU;")
parser.add_argument("--cuda_deterministic",
action='store_false', default=True, help="by default, make sure random seed effective. if set, bypass such function.")
parser.add_argument("--n_training_threads", type=int,
default=1, help="Number of torch threads for training")
parser.add_argument("--n_rollout_threads", type=int, default=32,
help="Number of parallel envs for training rollouts")
parser.add_argument("--n_eval_rollout_threads", type=int, default=1,
help="Number of parallel envs for evaluating rollouts")
parser.add_argument("--n_render_rollout_threads", type=int, default=1,
help="Number of parallel envs for rendering rollouts")
parser.add_argument("--num_env_steps", type=int, default=10e6,
help='Number of environment steps to train (default: 10e6)')
parser.add_argument("--user_name", type=str, default='marl',help="[for wandb usage], to specify user's name for simply collecting training data.")
parser.add_argument("--use_wandb", action='store_false', default=False, help="[for wandb usage], by default True, will log date to wandb server. or else will use tensorboard to log data.")
# env parameters
parser.add_argument("--env_name", type=str, default='StarCraft2', help="specify the name of environment")
parser.add_argument("--use_obs_instead_of_state", action='store_true',
default=False, help="Whether to use global state or concatenated obs")
# replay buffer parameters
parser.add_argument("--episode_length", type=int,
default=200, help="Max length for any episode")
# network parameters
parser.add_argument("--share_policy", action='store_false',
default=True, help='Whether agent share the same policy')
parser.add_argument("--use_centralized_V", action='store_false',
default=True, help="Whether to use centralized V function")
parser.add_argument("--stacked_frames", type=int, default=1,
help="Dimension of hidden layers for actor/critic networks")
parser.add_argument("--use_stacked_frames", action='store_true',
default=False, help="Whether to use stacked_frames")
parser.add_argument("--hidden_size", type=int, default=64,
help="Dimension of hidden layers for actor/critic networks")
parser.add_argument("--layer_N", type=int, default=1,
help="Number of layers for actor/critic networks")
parser.add_argument("--use_ReLU", action='store_false',
default=True, help="Whether to use ReLU")
parser.add_argument("--use_popart", action='store_false', default=True, help="by default True, use running mean and std to normalize rewards.")
parser.add_argument("--use_valuenorm", action='store_false', default=True, help="by default True, use running mean and std to normalize rewards.")
parser.add_argument("--use_feature_normalization", action='store_false',
default=True, help="Whether to apply layernorm to the inputs")
parser.add_argument("--use_orthogonal", action='store_false', default=True,
help="Whether to use Orthogonal initialization for weights and 0 initialization for biases")
parser.add_argument("--gain", type=float, default=0.01,
help="The gain # of last action layer")
# recurrent parameters
parser.add_argument("--use_naive_recurrent_policy", action='store_true',
default=False, help='Whether to use a naive recurrent policy')
parser.add_argument("--use_recurrent_policy", action='store_true',
default=False, help='use a recurrent policy')
parser.add_argument("--recurrent_N", type=int, default=1, help="The number of recurrent layers.")
parser.add_argument("--data_chunk_length", type=int, default=10,
help="Time length of chunks used to train a recurrent_policy")
# optimizer parameters
parser.add_argument("--lr", type=float, default=5e-4,
help='learning rate (default: 5e-4)')
parser.add_argument("--critic_lr", type=float, default=5e-4,
help='critic learning rate (default: 5e-4)')
parser.add_argument("--opti_eps", type=float, default=1e-5,
help='RMSprop optimizer epsilon (default: 1e-5)')
parser.add_argument("--weight_decay", type=float, default=0)
parser.add_argument("--std_x_coef", type=float, default=1)
parser.add_argument("--std_y_coef", type=float, default=0.5)
# trpo parameters
parser.add_argument("--kl_threshold", type=float, default=0.01,
help='the threshold of kl-divergence (default: 0.01)')
parser.add_argument("--safety_bound", type=float, default=0.1,
help='safety')
parser.add_argument("--ls_step", type=int, default=10,
help='number of line search (default: 10)')
parser.add_argument("--accept_ratio", type=float, default=0.5,
help='accept ratio of loss improve (default: 0.5)')
parser.add_argument("--EPS", type=float, default=1e-8,
help='hyper parameter, close to zero')
# ppo parameters
parser.add_argument("--ppo_epoch", type=int, default=15,
help='number of ppo epochs (default: 15)')
parser.add_argument("--use_clipped_value_loss",
action='store_false', default=True, help="by default, clip loss value. If set, do not clip loss value.")
parser.add_argument("--clip_param", type=float, default=0.2,
help='ppo clip parameter (default: 0.2)')
parser.add_argument("--num_mini_batch", type=int, default=1,
help='number of batches for ppo (default: 1)')
parser.add_argument("--entropy_coef", type=float, default=0.01,
help='entropy term coefficient (default: 0.01)')
    # lagrangian_coef is the lagrangian coefficient for mappo_lagrangian
    parser.add_argument("--lagrangian_coef", type=float, default=0.01,
                        help='lagrangian coefficient (default: 0.01)')
parser.add_argument("--value_loss_coef", type=float,
default=1, help='value loss coefficient (default: 0.5)')
parser.add_argument("--use_max_grad_norm",
action='store_false', default=True, help="by default, use max norm of gradients. If set, do not use.")
parser.add_argument("--max_grad_norm", type=float, default=10.0,
help='max norm of gradients (default: 0.5)')
parser.add_argument("--use_gae", action='store_false',
default=True, help='use generalized advantage estimation')
parser.add_argument("--gamma", type=float, default=0.99,
help='discount factor for rewards (default: 0.99)')
parser.add_argument("--safety_gamma", type=float, default=0.2,
help='discount factor for rewards (default: 0.2)')
parser.add_argument("--gae_lambda", type=float, default=0.95,
help='gae lambda parameter (default: 0.95)')
parser.add_argument("--use_proper_time_limits", action='store_true',
default=False, help='compute returns taking into account time limits')
parser.add_argument("--use_huber_loss", action='store_false', default=True, help="by default, use huber loss. If set, do not use huber loss.")
parser.add_argument("--use_value_active_masks",
action='store_false', default=True, help="by default True, whether to mask useless data in value loss.")
parser.add_argument("--use_policy_active_masks",
action='store_false', default=True, help="by default True, whether to mask useless data in policy loss.")
parser.add_argument("--huber_delta", type=float, default=10.0, help=" coefficience of huber loss.")
# run parameters
parser.add_argument("--use_linear_lr_decay", action='store_true',
default=False, help='use a linear schedule on the learning rate')
# save parameters
parser.add_argument("--save_interval", type=int, default=1, help="time duration between contiunous twice models saving.")
# log parameters
parser.add_argument("--log_interval", type=int, default=5, help="time duration between contiunous twice log printing.")
# eval parameters
parser.add_argument("--use_eval", action='store_true', default=False, help="by default, do not start evaluation. If set`, start evaluation alongside with training.")
parser.add_argument("--eval_interval", type=int, default=25, help="time duration between contiunous twice evaluation progress.")
parser.add_argument("--eval_episodes", type=int, default=32, help="number of episodes of a single evaluation.")
# render parameters
parser.add_argument("--save_gifs", action='store_true', default=False, help="by default, do not save render video. If set, save video.")
parser.add_argument("--use_render", action='store_true', default=False, help="by default, do not render the env during training. If set, start render. Note: something, the environment has internal render process which is not controlled by this hyperparam.")
parser.add_argument("--render_episodes", type=int, default=5, help="the number of episodes to render a given env")
parser.add_argument("--ifi", type=float, default=0.1, help="the play interval of each rendered image in saved video.")
# pretrained parameters
parser.add_argument("--model_dir", type=str, default=None, help="by default None. set the path to pretrained model.")
    # safety / line-search parameters
    parser.add_argument("--safty_bound", type=float, default=0.1, help="safety bound (note: flag spelling differs from --safety_bound above)")
parser.add_argument("--line_search_fraction", type=float, default=0.5, help="line search step size")
parser.add_argument("--g_step_dir_coef", type=float, default=0.1, help="rescale g")
parser.add_argument("--b_step_dir_coef", type=float, default=0.1, help="rescale b")
parser.add_argument("--fraction_coef", type=float, default=0.1, help="the coef of line search step size")
return parser
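# --- Editor's illustrative sketch (not part of the original module) ---
# Typical use of get_config(): build the parser, then parse command-line
# arguments; the flag values below are illustrative assumptions.
if __name__ == "__main__":
    parser = get_config()
    all_args = parser.parse_args(["--algorithm_name", "macpo", "--seed", "7"])
    print(all_args.algorithm_name, all_args.seed, all_args.kl_threshold)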
================================================
FILE: MACPO/macpo/envs/__init__.py
================================================
import socket
from absl import flags
FLAGS = flags.FLAGS
FLAGS(['train_sc.py'])
================================================
FILE: MACPO/macpo/envs/env_wrappers.py
================================================
"""
Modified from OpenAI Baselines code to work with multi-agent envs
"""
import numpy as np
import torch
from multiprocessing import Process, Pipe
from abc import ABC, abstractmethod
from macpo.utils.util import tile_images
class CloudpickleWrapper(object):
"""
Uses cloudpickle to serialize contents (otherwise multiprocessing tries to use pickle)
"""
def __init__(self, x):
self.x = x
def __getstate__(self):
import cloudpickle
return cloudpickle.dumps(self.x)
def __setstate__(self, ob):
import pickle
self.x = pickle.loads(ob)
class ShareVecEnv(ABC):
"""
An abstract asynchronous, vectorized environment.
Used to batch data from multiple copies of an environment, so that
    each observation becomes a batch of observations, and the expected action is a batch of actions to
be applied per-environment.
"""
closed = False
viewer = None
metadata = {
'render.modes': ['human', 'rgb_array']
}
def __init__(self, num_envs, observation_space, share_observation_space, action_space):
self.num_envs = num_envs
self.observation_space = observation_space
self.share_observation_space = share_observation_space
self.action_space = action_space
@abstractmethod
def reset(self):
"""
Reset all the environments and return an array of
observations, or a dict of observation arrays.
If step_async is still doing work, that work will
be cancelled and step_wait() should not be called
until step_async() is invoked again.
"""
pass
@abstractmethod
def step_async(self, actions):
"""
Tell all the environments to start taking a step
with the given actions.
Call step_wait() to get the results of the step.
You should not call this if a step_async run is
already pending.
"""
pass
@abstractmethod
def step_wait(self):
"""
Wait for the step taken with step_async().
Returns (obs, rews, cos, dones, infos):
- obs: an array of observations, or a dict of
arrays of observations.
- rews: an array of rewards
- cos: an array of costs
- dones: an array of "episode done" booleans
- infos: a sequence of info objects
"""
pass
def close_extras(self):
"""
Clean up the extra resources, beyond what's in this base class.
Only runs when not self.closed.
"""
pass
def close(self):
if self.closed:
return
if self.viewer is not None:
self.viewer.close()
self.close_extras()
self.closed = True
def step(self, actions):
"""
Step the environments synchronously.
This is available for backwards compatibility.
"""
self.step_async(actions)
return self.step_wait()
def render(self, mode='human'):
imgs = self.get_images()
bigimg = tile_images(imgs)
if mode == 'human':
self.get_viewer().imshow(bigimg)
return self.get_viewer().isopen
elif mode == 'rgb_array':
return bigimg
else:
raise NotImplementedError
def get_images(self):
"""
Return RGB images from each environment
"""
raise NotImplementedError
@property
def unwrapped(self):
if isinstance(self, VecEnvWrapper):
return self.venv.unwrapped
else:
return self
def get_viewer(self):
if self.viewer is None:
from gym.envs.classic_control import rendering
self.viewer = rendering.SimpleImageViewer()
return self.viewer
def worker(remote, parent_remote, env_fn_wrapper):
parent_remote.close()
env = env_fn_wrapper.x()
while True:
cmd, data = remote.recv()
if cmd == 'step':
ob, reward, done, info = env.step(data)
if 'bool' in done.__class__.__name__:
if done:
ob = env.reset()
else:
if np.all(done):
ob = env.reset()
remote.send((ob, reward, info["cost"], done, info))
elif cmd == 'reset':
ob = env.reset()
remote.send((ob))
elif cmd == 'render':
if data == "rgb_array":
fr = env.render(mode=data)
remote.send(fr)
elif data == "human":
env.render(mode=data)
elif cmd == 'reset_task':
ob = env.reset_task()
remote.send(ob)
elif cmd == 'close':
env.close()
remote.close()
break
elif cmd == 'get_spaces':
remote.send((env.observation_space, env.share_observation_space, env.action_space))
else:
raise NotImplementedError
class GuardSubprocVecEnv(ShareVecEnv):
def __init__(self, env_fns, spaces=None):
"""
envs: list of gym environments to run in subprocesses
"""
self.waiting = False
self.closed = False
nenvs = len(env_fns)
self.remotes, self.work_remotes = zip(*[Pipe() for _ in range(nenvs)])
self.ps = [Process(target=worker, args=(work_remote, remote, CloudpickleWrapper(env_fn)))
for (work_remote, remote, env_fn) in zip(self.work_remotes, self.remotes, env_fns)]
for p in self.ps:
p.daemon = False # could cause zombie process
p.start()
for remote in self.work_remotes:
remote.close()
self.remotes[0].send(('get_spaces', None))
observation_space, share_observation_space, action_space = self.remotes[0].recv()
ShareVecEnv.__init__(self, len(env_fns), observation_space,
share_observation_space, action_space)
def step_async(self, actions):
for remote, action in zip(self.remotes, actions):
remote.send(('step', action))
self.waiting = True
def step_wait(self):
results = [remote.recv() for remote in self.remotes]
self.waiting = False
obs, rews, cos, dones, infos = zip(*results)
return np.stack(obs), np.stack(rews), np.stack(cos), np.stack(dones), infos
def reset(self):
for remote in self.remotes:
remote.send(('reset', None))
obs = [remote.recv() for remote in self.remotes]
return np.stack(obs)
def reset_task(self):
for remote in self.remotes:
remote.send(('reset_task', None))
return np.stack([remote.recv() for remote in self.remotes])
def close(self):
if self.closed:
return
if self.waiting:
for remote in self.remotes:
remote.recv()
for remote in self.remotes:
remote.send(('close', None))
for p in self.ps:
p.join()
self.closed = True
class SubprocVecEnv(ShareVecEnv):
def __init__(self, env_fns, spaces=None):
"""
envs: list of gym environments to run in subprocesses
"""
self.waiting = False
self.closed = False
nenvs = len(env_fns)
self.remotes, self.work_remotes = zip(*[Pipe() for _ in range(nenvs)])
self.ps = [Process(target=worker, args=(work_remote, remote, CloudpickleWrapper(env_fn)))
for (work_remote, remote, env_fn) in zip(self.work_remotes, self.remotes, env_fns)]
for p in self.ps:
p.daemon = True # if the main process crashes, we should not cause things to hang
p.start()
for remote in self.work_remotes:
remote.close()
self.remotes[0].send(('get_spaces', None))
observation_space, share_observation_space, action_space = self.remotes[0].recv()
ShareVecEnv.__init__(self, len(env_fns), observation_space,
share_observation_space, action_space)
def step_async(self, actions):
for remote, action in zip(self.remotes, actions):
remote.send(('step', action))
self.waiting = True
def step_wait(self):
results = [remote.recv() for remote in self.remotes]
self.waiting = False
obs, rews, cos, dones, infos = zip(*results)
return np.stack(obs), np.stack(rews), np.stack(cos), np.stack(dones), infos
def reset(self):
for remote in self.remotes:
remote.send(('reset', None))
obs = [remote.recv() for remote in self.remotes]
return np.stack(obs)
def reset_task(self):
for remote in self.remotes:
remote.send(('reset_task', None))
return np.stack([remote.recv() for remote in self.remotes])
def close(self):
if self.closed:
return
if self.waiting:
for remote in self.remotes:
remote.recv()
for remote in self.remotes:
remote.send(('close', None))
for p in self.ps:
p.join()
self.closed = True
def render(self, mode="rgb_array"):
for remote in self.remotes:
remote.send(('render', mode))
if mode == "rgb_array":
frame = [remote.recv() for remote in self.remotes]
return np.stack(frame)
def shareworker(remote, parent_remote, env_fn_wrapper):
parent_remote.close()
env = env_fn_wrapper.x()
while True:
cmd, data = remote.recv()
if cmd == 'step':
ob, s_ob, reward, done, info, available_actions = env.step(data)
if 'bool' in done.__class__.__name__:
if done:
ob, s_ob, available_actions = env.reset()
else:
if np.all(done):
ob, s_ob, available_actions = env.reset()
remote.send((ob, s_ob, reward, done, info, available_actions))
elif cmd == 'reset':
ob, s_ob, available_actions = env.reset()
remote.send((ob, s_ob, available_actions))
elif cmd == 'reset_task':
ob = env.reset_task()
remote.send(ob)
elif cmd == 'render':
if data == "rgb_array":
fr = env.render(mode=data)
remote.send(fr)
elif data == "human":
env.render(mode=data)
elif cmd == 'close':
env.close()
remote.close()
break
elif cmd == 'get_spaces':
remote.send(
(env.observation_space, env.share_observation_space, env.action_space))
elif cmd == 'render_vulnerability':
fr = env.render_vulnerability(data)
remote.send((fr))
elif cmd == 'get_num_agents':
remote.send((env.n_agents))
else:
raise NotImplementedError
class ShareSubprocVecEnv(ShareVecEnv):
def __init__(self, env_fns, spaces=None):
"""
envs: list of gym environments to run in subprocesses
"""
self.waiting = False
self.closed = False
nenvs = len(env_fns)
self.remotes, self.work_remotes = zip(*[Pipe() for _ in range(nenvs)])
self.ps = [Process(target=shareworker, args=(work_remote, remote, CloudpickleWrapper(env_fn)))
for (work_remote, remote, env_fn) in zip(self.work_remotes, self.remotes, env_fns)]
for p in self.ps:
p.daemon = True # if the main process crashes, we should not cause things to hang
p.start()
for remote in self.work_remotes:
remote.close()
self.remotes[0].send(('get_num_agents', None))
self.n_agents = self.remotes[0].recv()
self.remotes[0].send(('get_spaces', None))
observation_space, share_observation_space, action_space = self.remotes[0].recv(
)
# print("wrapper:", share_observation_space)
ShareVecEnv.__init__(self, len(env_fns), observation_space,
share_observation_space, action_space)
def step_async(self, actions):
for remote, action in zip(self.remotes, actions):
remote.send(('step', action))
self.waiting = True
def step_wait(self):
results = [remote.recv() for remote in self.remotes]
self.waiting = False
obs, share_obs, rews, dones, infos, available_actions = zip(*results)
        cost_x = np.array([item[0]['cost'] for item in infos])
# print("=====cost_x=====: ", cost_x.sum())
# print("=====np.stack(dones)=====: ", np.stack(dones))
return np.stack(obs), np.stack(share_obs), np.stack(rews), np.stack(cost_x), np.stack(dones), infos, np.stack(available_actions)
def reset(self):
for remote in self.remotes:
remote.send(('reset', None))
results = [remote.recv() for remote in self.remotes]
obs, share_obs, available_actions = zip(*results)
return np.stack(obs), np.stack(share_obs), np.stack(available_actions)
def reset_task(self):
for remote in self.remotes:
remote.send(('reset_task', None))
return np.stack([remote.recv() for remote in self.remotes])
def close(self):
if self.closed:
return
if self.waiting:
for remote in self.remotes:
remote.recv()
for remote in self.remotes:
remote.send(('close', None))
for p in self.ps:
p.join()
self.closed = True
def choosesimpleworker(remote, parent_remote, env_fn_wrapper):
parent_remote.close()
env = env_fn_wrapper.x()
while True:
cmd, data = remote.recv()
if cmd == 'step':
ob, reward, done, info = env.step(data)
remote.send((ob, reward, info["cost"], done, info))
elif cmd == 'reset':
ob = env.reset(data)
remote.send((ob))
elif cmd == 'reset_task':
ob = env.reset_task()
remote.send(ob)
elif cmd == 'close':
env.close()
remote.close()
break
elif cmd == 'render':
if data == "rgb_array":
fr = env.render(mode=data)
remote.send(fr)
elif data == "human":
env.render(mode=data)
elif cmd == 'get_spaces':
remote.send(
(env.observation_space, env.share_observation_space, env.action_space))
else:
raise NotImplementedError
class ChooseSimpleSubprocVecEnv(ShareVecEnv):
def __init__(self, env_fns, spaces=None):
"""
envs: list of gym environments to run in subprocesses
"""
self.waiting = False
self.closed = False
nenvs = len(env_fns)
self.remotes, self.work_remotes = zip(*[Pipe() for _ in range(nenvs)])
self.ps = [Process(target=choosesimpleworker, args=(work_remote, remote, CloudpickleWrapper(env_fn)))
for (work_remote, remote, env_fn) in zip(self.work_remotes, self.remotes, env_fns)]
for p in self.ps:
p.daemon = True # if the main process crashes, we should not cause things to hang
p.start()
for remote in self.work_remotes:
remote.close()
self.remotes[0].send(('get_spaces', None))
observation_space, share_observation_space, action_space = self.remotes[0].recv()
ShareVecEnv.__init__(self, len(env_fns), observation_space,
share_observation_space, action_space)
def step_async(self, actions):
for remote, action in zip(self.remotes, actions):
remote.send(('step', action))
self.waiting = True
def step_wait(self):
results = [remote.recv() for remote in self.remotes]
self.waiting = False
obs, rews, cos, dones, infos = zip(*results)
return np.stack(obs), np.stack(rews), np.stack(cos), np.stack(dones), infos
def reset(self, reset_choose):
for remote, choose in zip(self.remotes, reset_choose):
remote.send(('reset', choose))
obs = [remote.recv() for remote in self.remotes]
return np.stack(obs)
def render(self, mode="rgb_array"):
for remote in self.remotes:
remote.send(('render', mode))
if mode == "rgb_array":
frame = [remote.recv() for remote in self.remotes]
return np.stack(frame)
def reset_task(self):
for remote in self.remotes:
remote.send(('reset_task', None))
return np.stack([remote.recv() for remote in self.remotes])
def close(self):
if self.closed:
return
if self.waiting:
for remote in self.remotes:
remote.recv()
for remote in self.remotes:
remote.send(('close', None))
for p in self.ps:
p.join()
self.closed = True
def chooseworker(remote, parent_remote, env_fn_wrapper):
parent_remote.close()
env = env_fn_wrapper.x()
while True:
cmd, data = remote.recv()
if cmd == 'step':
ob, s_ob, reward, done, info, available_actions = env.step(data)
remote.send((ob, s_ob, reward, info["cost"], done, info, available_actions))
elif cmd == 'reset':
ob, s_ob, available_actions = env.reset(data)
remote.send((ob, s_ob, available_actions))
elif cmd == 'reset_task':
ob = env.reset_task()
remote.send(ob)
elif cmd == 'close':
env.close()
remote.close()
break
elif cmd == 'render':
remote.send(env.render(mode='rgb_array'))
elif cmd == 'get_spaces':
remote.send(
(env.observation_space, env.share_observation_space, env.action_space))
else:
raise NotImplementedError
class ChooseSubprocVecEnv(ShareVecEnv):
def __init__(self, env_fns, spaces=None):
"""
envs: list of gym environments to run in subprocesses
"""
self.waiting = False
self.closed = False
nenvs = len(env_fns)
self.remotes, self.work_remotes = zip(*[Pipe() for _ in range(nenvs)])
self.ps = [Process(target=chooseworker, args=(work_remote, remote, CloudpickleWrapper(env_fn)))
for (work_remote, remote, env_fn) in zip(self.work_remotes, self.remotes, env_fns)]
for p in self.ps:
p.daemon = True # if the main process crashes, we should not cause things to hang
p.start()
for remote in self.work_remotes:
remote.close()
self.remotes[0].send(('get_spaces', None))
observation_space, share_observation_space, action_space = self.remotes[0].recv(
)
ShareVecEnv.__init__(self, len(env_fns), observation_space,
share_observation_space, action_space)
def step_async(self, actions):
for remote, action in zip(self.remotes, actions):
remote.send(('step', action))
self.waiting = True
def step_wait(self):
results = [remote.recv() for remote in self.remotes]
self.waiting = False
obs, share_obs, rews, cos, dones, infos, available_actions = zip(*results)
return np.stack(obs), np.stack(share_obs), np.stack(rews), np.stack(cos), np.stack(dones), infos, np.stack(available_actions)
def reset(self, reset_choose):
for remote, choose in zip(self.remotes, reset_choose):
remote.send(('reset', choose))
results = [remote.recv() for remote in self.remotes]
obs, share_obs, available_actions = zip(*results)
return np.stack(obs), np.stack(share_obs), np.stack(available_actions)
def reset_task(self):
for remote in self.remotes:
remote.send(('reset_task', None))
return np.stack([remote.recv() for remote in self.remotes])
def close(self):
if self.closed:
return
if self.waiting:
for remote in self.remotes:
remote.recv()
for remote in self.remotes:
remote.send(('close', None))
for p in self.ps:
p.join()
self.closed = True
def chooseguardworker(remote, parent_remote, env_fn_wrapper):
parent_remote.close()
env = env_fn_wrapper.x()
while True:
cmd, data = remote.recv()
if cmd == 'step':
ob, reward, done, info = env.step(data)
remote.send((ob, reward, info["cost"], done, info))
elif cmd == 'reset':
ob = env.reset(data)
remote.send((ob))
elif cmd == 'reset_task':
ob = env.reset_task()
remote.send(ob)
elif cmd == 'close':
env.close()
remote.close()
break
elif cmd == 'get_spaces':
remote.send(
(env.observation_space, env.share_observation_space, env.action_space))
else:
raise NotImplementedError
class ChooseGuardSubprocVecEnv(ShareVecEnv):
def __init__(self, env_fns, spaces=None):
"""
envs: list of gym environments to run in subprocesses
"""
self.waiting = False
self.closed = False
nenvs = len(env_fns)
self.remotes, self.work_remotes = zip(*[Pipe() for _ in range(nenvs)])
self.ps = [Process(target=chooseguardworker, args=(work_remote, remote, CloudpickleWrapper(env_fn)))
for (work_remote, remote, env_fn) in zip(self.work_remotes, self.remotes, env_fns)]
for p in self.ps:
            p.daemon = False  # non-daemonic; note children may be left alive if the main process crashes
p.start()
for remote in self.work_remotes:
remote.close()
self.remotes[0].send(('get_spaces', None))
observation_space, share_observation_space, action_space = self.remotes[0].recv(
)
ShareVecEnv.__init__(self, len(env_fns), observation_space,
share_observation_space, action_space)
def step_async(self, actions):
for remote, action in zip(self.remotes, actions):
remote.send(('step', action))
self.waiting = True
def step_wait(self):
results = [remote.recv() for remote in self.remotes]
self.waiting = False
obs, rews, cos, dones, infos = zip(*results)
return np.stack(obs), np.stack(rews), np.stack(cos), np.stack(dones), infos
def reset(self, reset_choose):
for remote, choose in zip(self.remotes, reset_choose):
remote.send(('reset', choose))
obs = [remote.recv() for remote in self.remotes]
return np.stack(obs)
def reset_task(self):
for remote in self.remotes:
remote.send(('reset_task', None))
return np.stack([remote.recv() for remote in self.remotes])
def close(self):
if self.closed:
return
if self.waiting:
for remote in self.remotes:
remote.recv()
for remote in self.remotes:
remote.send(('close', None))
for p in self.ps:
p.join()
self.closed = True
# single env
class DummyVecEnv(ShareVecEnv):
def __init__(self, env_fns):
self.envs = [fn() for fn in env_fns]
env = self.envs[0]
ShareVecEnv.__init__(self, len(
env_fns), env.observation_space, env.share_observation_space, env.action_space)
self.actions = None
def step_async(self, actions):
self.actions = actions
def step_wait(self):
results = [env.step(a) for (a, env) in zip(self.actions, self.envs)]
obs, rews, cos, dones, infos = map(np.array, zip(*results))
for (i, done) in enumerate(dones):
if 'bool' in done.__class__.__name__:
if done:
obs[i] = self.envs[i].reset()
else:
if np.all(done):
obs[i] = self.envs[i].reset()
self.actions = None
return obs, rews, cos, dones, infos
def reset(self):
obs = [env.reset() for env in self.envs]
return np.array(obs)
def close(self):
for env in self.envs:
env.close()
def render(self, mode="human"):
if mode == "rgb_array":
return np.array([env.render(mode=mode) for env in self.envs])
elif mode == "human":
for env in self.envs:
env.render(mode=mode)
else:
raise NotImplementedError
class ShareDummyVecEnv(ShareVecEnv):
def __init__(self, env_fns):
self.envs = [fn() for fn in env_fns]
env = self.envs[0]
ShareVecEnv.__init__(self, len(
env_fns), env.observation_space, env.share_observation_space, env.action_space)
self.actions = None
def step_async(self, actions):
self.actions = actions
def step_wait(self):
results = [env.step(a) for (a, env) in zip(self.actions, self.envs)]
obs, share_obs, rews, cos, dones, infos, available_actions = map(
np.array, zip(*results))
for (i, done) in enumerate(dones):
if 'bool' in done.__class__.__name__:
if done:
obs[i], share_obs[i], available_actions[i] = self.envs[i].reset()
else:
if np.all(done):
obs[i], share_obs[i], available_actions[i] = self.envs[i].reset()
self.actions = None
return obs, share_obs, rews, cos, dones, infos, available_actions
def reset(self):
results = [env.reset() for env in self.envs]
obs, share_obs, available_actions = map(np.array, zip(*results))
return obs, share_obs, available_actions
def close(self):
for env in self.envs:
env.close()
def render(self, mode="human"):
if mode == "rgb_array":
return np.array([env.render(mode=mode) for env in self.envs])
elif mode == "human":
for env in self.envs:
env.render(mode=mode)
else:
raise NotImplementedError
class ChooseDummyVecEnv(ShareVecEnv):
def __init__(self, env_fns):
self.envs = [fn() for fn in env_fns]
env = self.envs[0]
ShareVecEnv.__init__(self, len(
env_fns), env.observation_space, env.share_observation_space, env.action_space)
self.actions = None
def step_async(self, actions):
self.actions = actions
def step_wait(self):
results = [env.step(a) for (a, env) in zip(self.actions, self.envs)]
obs, share_obs, rews, cos, dones, infos, available_actions = map(
np.array, zip(*results))
self.actions = None
return obs, share_obs, rews, cos, dones, infos, available_actions
def reset(self, reset_choose):
results = [env.reset(choose)
for (env, choose) in zip(self.envs, reset_choose)]
obs, share_obs, available_actions = map(np.array, zip(*results))
return obs, share_obs, available_actions
def close(self):
for env in self.envs:
env.close()
def render(self, mode="human"):
if mode == "rgb_array":
return np.array([env.render(mode=mode) for env in self.envs])
elif mode == "human":
for env in self.envs:
env.render(mode=mode)
else:
raise NotImplementedError
class ChooseSimpleDummyVecEnv(ShareVecEnv):
def __init__(self, env_fns):
self.envs = [fn() for fn in env_fns]
env = self.envs[0]
ShareVecEnv.__init__(self, len(
env_fns), env.observation_space, env.share_observation_space, env.action_space)
self.actions = None
def step_async(self, actions):
self.actions = actions
def step_wait(self):
results = [env.step(a) for (a, env) in zip(self.actions, self.envs)]
obs, rews, cos, dones, infos = map(np.array, zip(*results))
self.actions = None
return obs, rews, cos, dones, infos
def reset(self, reset_choose):
obs = [env.reset(choose)
for (env, choose) in zip(self.envs, reset_choose)]
return np.array(obs)
def close(self):
for env in self.envs:
env.close()
def render(self, mode="human"):
if mode == "rgb_array":
return np.array([env.render(mode=mode) for env in self.envs])
elif mode == "human":
for env in self.envs:
env.render(mode=mode)
else:
raise NotImplementedError
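# --- Editor's illustrative sketch (not part of the original module) ---
# Driving DummyVecEnv with a toy cost-emitting environment; the spaces, the
# dynamics, and the cost rule below are illustrative assumptions.
if __name__ == "__main__":
    from gym import spaces
    class ToyCostEnv:
        observation_space = spaces.Box(-1.0, 1.0, shape=(3,))
        share_observation_space = observation_space
        action_space = spaces.Box(-1.0, 1.0, shape=(1,))
        def reset(self):
            return np.zeros(3, dtype=np.float32)
        def step(self, action):
            ob = np.random.uniform(-1, 1, size=3).astype(np.float32)
            cost = float(np.abs(action).sum() > 0.5)  # "unsafe" when torque is large
            return ob, 1.0, cost, False, {"cost": cost}
        def close(self):
            pass
    venv = DummyVecEnv([lambda: ToyCostEnv()])
    obs = venv.reset()
    obs, rews, cos, dones, infos = venv.step(np.array([[0.9]]))
    print(obs.shape, rews, cos)  # (1, 3) [1.] [1.]
    venv.close()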
================================================
FILE: MACPO/macpo/envs/safety_ma_mujoco/MUJOCO_LOG.TXT
================================================
Sun Aug 29 11:16:41 2021
ERROR: Expired activation key
================================================
FILE: MACPO/macpo/envs/safety_ma_mujoco/README.md
================================================
#### Safety Multi-agent Mujoco
## 1. Safe Many Agent Ant
Following Zanger et al. [1], the reward function is identical to that of the standard Ant-v2 environment: it comprises the torso velocity in the global x-direction, a negative control cost on the exerted torque, a negative contact cost, and a constant positive reward for survival, which results in
<img src="https://latex.codecogs.com/png.image?\dpi{110}&space;r=\frac{x_{\text&space;{torso&space;},&space;t+1}-x_{\text&space;{torso&space;},&space;t}}{d&space;t}-\frac{1}{2}\left\|\boldsymbol{a}_{t}\right\|_{2}^{2}-\frac{1}{2&space;*&space;10^{3}}&space;\|&space;\text&space;{&space;contact&space;}_{t}&space;\|_{2}^{2}+1" title="r=\frac{x_{\text {torso }, t+1}-x_{\text {torso }, t}}{d t}-\frac{1}{2}\left\|\boldsymbol{a}_{t}\right\|_{2}^{2}-\frac{1}{2 * 10^{3}} \| \text { contact }_{t} \|_{2}^{2}+1" />
```python
xposafter = self.get_body_com("torso_0")[0]
forward_reward = (xposafter - xposbefore)/self.dt
ctrl_cost = .5 * np.square(a).sum()
contact_cost = 0.5 * 1e-3 * np.sum(np.square(np.clip(self.sim.data.cfrc_ext, -1, 1)))
survive_reward = 1.0
reward = forward_reward - ctrl_cost - contact_cost + survive_reward
```
The cost is given by
<img src="https://latex.codecogs.com/png.image?\dpi{110}&space;c=&space;\begin{cases}0,&space;&&space;\text&space;{&space;for&space;}&space;\quad&space;0.2&space;\leq&space;z_{\text&space;{torso&space;},&space;t+1}&space;\leq&space;1.0&space;\\&space;&&space;\text&space;{&space;and&space;}\left\|\boldsymbol{x}_{\text&space;{torso&space;},&space;t+1}-\boldsymbol{x}_{\text&space;{wall&space;}}\right\|_{2}&space;\geq&space;1.8&space;\\&space;1,&space;&&space;\text&space;{&space;else&space;}\end{cases}" title="c= \begin{cases}0, & \text { for } \quad 0.2 \leq z_{\text {torso }, t+1} \leq 1.0 \\ & \text { and }\left\|\boldsymbol{x}_{\text {torso }, t+1}-\boldsymbol{x}_{\text {wall }}\right\|_{2} \geq 1.8 \\ 1, & \text { else }\end{cases}" />
```python
yposafter = self.get_body_com("torso_0")[1]
ywall = np.array([-5, 5])
if xposafter < 20:
y_walldist = yposafter - xposafter * np.tan(30 / 360 * 2 * np.pi) + ywall
elif xposafter>20 and xposafter<60:
y_walldist = yposafter + (xposafter-40)*np.tan(30/360*2*np.pi) - ywall
elif xposafter>60 and xposafter<100:
y_walldist = yposafter - (xposafter-80)*np.tan(30/360*2*np.pi) + ywall
else:
y_walldist = yposafter - 20*np.tan(30/360*2*np.pi) + ywall
obj_cost = (abs(y_walldist) < 1.8).any() * 1.0
body_quat = self.data.get_body_xquat('torso_0')
z_rot = 1-2*(body_quat[1]**2+body_quat[2]**2) ### normally xx-rotation, not sure what axes mujoco uses
state = self.state_vector()
notdone = np.isfinite(state).all() \
and state[2] >= 0.2 and state[2] <= 1.0\
and z_rot>=-0.7 #ADDED
done = not notdone
done_cost = done * 1.0
cost = np.clip(obj_cost + done_cost, 0, 1)
```
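For intuition, the walls form a zig-zag corridor: the two walls sit at an offset of `ywall = [-5, 5]` from a centreline whose slope of ±tan(30°) changes direction at x = 20 and x = 60, with a constant offset in the final branch. The snippet below is a small illustrative sketch (the `wall_cost` helper is ours, not part of the environment code) that wraps the same piecewise logic in a standalone function and evaluates it at two torso positions:
```python
import numpy as np

def wall_cost(x, y, margin=1.8):
    # Illustrative re-implementation of the piecewise wall-distance cost above.
    ywall = np.array([-5.0, 5.0])
    slope = np.tan(30 / 360 * 2 * np.pi)  # tan(30 degrees)
    if x < 20:
        y_walldist = y - x * slope + ywall
    elif 20 < x < 60:
        y_walldist = y + (x - 40) * slope - ywall
    elif 60 < x < 100:
        y_walldist = y - (x - 80) * slope + ywall
    else:
        y_walldist = y - 20 * slope + ywall
    return float((np.abs(y_walldist) < margin).any())

print(wall_cost(0.0, 0.0))  # 0.0 -- torso centred in the corridor
print(wall_cost(0.0, 3.5))  # 1.0 -- within 1.8 of the upper wall at y = 5
```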
[1] Zanger, Moritz A., Karam Daaboul, and J. Marius Zöllner. 2021. “Safe Continuous Control with Constrained Model-Based Policy Optimization.” arXiv [cs.LG]. arXiv. http://arxiv.org/abs/2104.06922.
================================================
FILE: MACPO/macpo/envs/safety_ma_mujoco/__init__.py
================================================
================================================
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/__init__.py
================================================
from .mujoco_multi import MujocoMulti
from .coupled_half_cheetah import CoupledHalfCheetah
from .manyagent_swimmer import ManyAgentSwimmerEnv
from .manyagent_ant import ManyAgentAntEnv
================================================
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/ant.py
================================================
import numpy as np
# from mujoco_safety_gym.envs import mujoco_env
from macpo.envs.safety_ma_mujoco.safety_multiagent_mujoco import mujoco_env
from gym import utils
import mujoco_py as mjp
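# AntEnv: safety-augmented single Ant. The reward matches the standard Ant-v2
# reward; the cost is 1 when the torso comes within 1.8 of either wall of a
# zig-zag corridor (walls offset y = +/-5 from a centreline whose slope changes
# at x = 20 and x = 60), or when the episode ends in an unsafe state, else 0.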
class AntEnv(mujoco_env.MujocoEnv, utils.EzPickle):
def __init__(self, **kwargs):
mujoco_env.MujocoEnv.__init__(self, 'ant.xml', 5)
utils.EzPickle.__init__(self)
def step(self, a):
xposbefore = self.get_body_com("torso")[0]
self.do_simulation(a, self.frame_skip)
mjp.functions.mj_rnePostConstraint(self.sim.model,
self.sim.data) #### calc contacts, this is a mujoco py version mismatch issue with mujoco200
xposafter = self.get_body_com("torso")[0]
forward_reward = (xposafter - xposbefore) / self.dt
ctrl_cost = .5 * np.square(a).sum()
contact_cost = 0.5 * 1e-3 * np.sum(
np.square(np.clip(self.sim.data.cfrc_ext, -1, 1)))
survive_reward = 1.0
### safety stuff
yposafter = self.get_body_com("torso")[1]
ywall = np.array([-5, 5])
if xposafter < 20:
y_walldist = yposafter - xposafter * np.tan(30 / 360 * 2 * np.pi) + ywall
elif xposafter > 20 and xposafter < 60:
y_walldist = yposafter + (xposafter - 40) * np.tan(30 / 360 * 2 * np.pi) - ywall
elif xposafter > 60 and xposafter < 100:
y_walldist = yposafter - (xposafter - 80) * np.tan(30 / 360 * 2 * np.pi) + ywall
else:
y_walldist = yposafter - 20 * np.tan(30 / 360 * 2 * np.pi) + ywall
obj_cost = (abs(y_walldist) < 1.8).any() * 1.0
reward = forward_reward - ctrl_cost - contact_cost + survive_reward
body_quat = self.data.get_body_xquat('torso')
z_rot = 1 - 2 * (
body_quat[1] ** 2 + body_quat[2] ** 2) ### normally xx-rotation, not sure what axes mujoco uses
state = self.state_vector()
notdone = np.isfinite(state).all() \
and state[2] >= 0.2 and state[2] <= 1.0 \
and z_rot >= -0.7
done = not notdone
done_cost = done * 1.0
cost = np.clip(obj_cost + done_cost, 0, 1)
ob = self._get_obs()
return ob, reward, done, dict(
reward_forward=forward_reward,
reward_ctrl=-ctrl_cost,
reward_contact=-contact_cost,
reward_survive=survive_reward,
cost_obj=obj_cost,
cost_done=done_cost,
cost=cost,
)
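    # _get_obs drops the global x/y from qpos and the six wall bodies' free-joint
    # coordinates (the [-42]/[-36] slices), then appends x/5 and y_off, the torso's
    # offset from the corridor centreline, so the agent senses the walls without
    # observing absolute position.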
def _get_obs(self):
x = self.sim.data.qpos.flat[0]
y = self.sim.data.qpos.flat[1]
if x < 20:
y_off = y - x * np.tan(30 / 360 * 2 * np.pi)
elif x > 20 and x < 60:
y_off = y + (x - 40) * np.tan(30 / 360 * 2 * np.pi)
elif x > 60 and x < 100:
y_off = y - (x - 80) * np.tan(30 / 360 * 2 * np.pi)
else:
y_off = y - 20 * np.tan(30 / 360 * 2 * np.pi)
return np.concatenate([
self.sim.data.qpos.flat[2:-42],
self.sim.data.qvel.flat[:-36],
[x / 5],
[y_off],
# np.clip(self.sim.data.cfrc_ext, -1, 1).flat,
])
def reset_model(self):
qpos = self.init_qpos + self.np_random.uniform(size=self.model.nq, low=-.1, high=.1)
qpos[-42:] = self.init_qpos[-42:]
qvel = self.init_qvel + self.np_random.randn(self.model.nv) * .1
qvel[-36:] = self.init_qvel[-36:]
self.set_state(qpos, qvel)
return self._get_obs()
def viewer_setup(self):
self.viewer.cam.distance = self.model.stat.extent * 0.5
================================================
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/.gitignore
================================================
*.auto.xml
================================================
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/__init__.py
================================================
================================================
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/ant.xml
================================================
<mujoco model="ant">
<compiler angle="degree" coordinate="local" inertiafromgeom="true"/>
<option integrator="RK4" timestep="0.01"/>
<custom>
<numeric data="0.0 0.0 0.55 1.0 0.0 0.0 0.0 0.0 1.0 0.0 -1.0 0.0 -1.0 0.0 1.0" name="init_qpos"/>
</custom>
<default>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="0" condim="3" density="5.0" friction="1 0.5 0.5" margin="0.01" rgba="0.8 0.6 0.4 1"/>
</default>
<asset>
<texture builtin="gradient" height="100" rgb1="1 1 1" rgb2="0 0 0" type="skybox" width="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="#2c5987" rgb2="#1f4060" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<texture builtin="checker" height="100" name="texbox" rgb1="#ff66ff" rgb2="#ff66ff" type="2d" width="100"/>
<material name="BoxMat" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texbox"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<geom conaffinity="1" condim="3" material="MatPlane" name="floor" pos="30 0 0" rgba="0.2 0.2 0.2 1" size="70 25 40" type="plane"/>
<!-- <geom conaffinity="1" condim="3" name="obj11" type="box" material="BoxMat" size="0.5 0.5 0.5" pos="10 0 .5" rgba="#ff66ff"/>
<geom conaffinity="1" condim="3" name="obj12" type="box" material="BoxMat" size="0.5 0.5 0.5" pos="10 -10 .5" rgba="#ff66ff"/>
<geom conaffinity="1" condim="3" name="obj13" type="box" material="BoxMat" size="0.5 0.5 0.5" pos="10 10 .5" rgba="#ff66ff"/>
<geom conaffinity="1" condim="3" name="obj21" type="box" material="BoxMat" size="0.5 0.5 0.5" pos="20 -4 .5" rgba="#ff66ff"/>
<geom conaffinity="1" condim="3" name="obj22" type="box" material="BoxMat" size="0.5 0.5 0.5" pos="20 4 .5" rgba="#ff66ff"/>
<geom conaffinity="1" condim="3" name="obj23" type="box" material="BoxMat" size="0.5 0.5 0.5" pos="20 -14 .5" rgba="#ff66ff"/>
<geom conaffinity="1" condim="3" name="obj24" type="box" material="BoxMat" size="0.5 0.5 0.5" pos="20 14 .5" rgba="#ff66ff"/>
<geom conaffinity="1" condim="3" name="obj31" type="box" material="BoxMat" size="0.5 0.5 0.5" pos="30 0 .5" rgba="#ff66ff"/>
<geom conaffinity="1" condim="3" name="obj32" type="box" material="BoxMat" size="0.5 0.5 0.5" pos="30 -9 .5" rgba="#ff66ff"/>
<geom conaffinity="1" condim="3" name="obj33" type="box" material="BoxMat" size="0.5 0.5 0.5" pos="30 11 .5" rgba="#ff66ff"/>
<geom conaffinity="1" condim="3" name="obj34" type="box" material="BoxMat" size="0.5 0.5 0.5" pos="30 -16 .5" rgba="#ff66ff"/>
<geom conaffinity="1" condim="3" name="obj35" type="box" material="BoxMat" size="0.5 0.5 0.5" pos="30 19 .5" rgba="#ff66ff"/>
<geom conaffinity="1" condim="3" name="wall1" type="box" material="BoxMat" size="0.1 14 1.0" pos="-14 0 1" rgba="#ff66ff"/>
<geom conaffinity="1" condim="3" name="wall2" type="box" material="BoxMat" size="14 .1 1.0" pos="0 14 1" rgba="#ff66ff"/>
<geom conaffinity="1" condim="3" name="wall3" type="box" material="BoxMat" size="14 0.1 1.0" pos="0 -14 1.0" rgba="#ff66ff"/> -->
<!-- <geom conaffinity="1" condim="3" name="wall2" type="box" density=".01" size="20 0.1 1.0" pos="0 6 1.0" euler='0 0 30' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall3" type="box" density=".01" size="20 0.1 1.0" pos="40 -6 1.0" euler='0 0 -30' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall4" type="box" density=".01" size="20 0.1 1.0" pos="40 6 1.0" euler='0 0 -30' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall5" type="box" density=".01" size="20 0.1 1.0" pos="80 -6 1.0" euler='0 0 30' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall6" type="box" density=".01" size="20 0.1 1.0" pos="80 6 1.0" euler='0 0 30' rgba="1 0.5 0.5 1"/> -->
<body name="torso" pos="0 0 0.75">
<camera name="track" mode="trackcom" pos="0 -10 -10" xyaxes=".8 .4 0 0 .4 .6"/>
<geom name="torso_geom" pos="0 0 0" size="0.25" type="sphere"/>
<joint armature="0" damping="0" limited="false" margin="0.01" name="root" pos="0 0 0" type="free"/>
<body name="front_left_leg" pos="0 0 0">
<geom fromto="0.0 0.0 0.0 0.2 0.2 0.0" name="aux_1_geom" size="0.08" type="capsule"/>
<body name="aux_1" pos="0.2 0.2 0">
<joint axis="0 0 1" name="hip_1" pos="0.0 0.0 0.0" range="-30 30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 0.2 0.2 0.0" name="left_leg_geom" size="0.08" type="capsule"/>
<body pos="0.2 0.2 0">
<joint axis="-1 1 0" name="ankle_1" pos="0.0 0.0 0.0" range="30 70" type="hinge"/>
<geom fromto="0.0 0.0 0.0 0.4 0.4 0.0" name="left_ankle_geom" size="0.08" type="capsule"/>
</body>
</body>
</body>
<body name="front_right_leg" pos="0 0 0">
<geom fromto="0.0 0.0 0.0 -0.2 0.2 0.0" name="aux_2_geom" size="0.08" type="capsule"/>
<body name="aux_2" pos="-0.2 0.2 0">
<joint axis="0 0 1" name="hip_2" pos="0.0 0.0 0.0" range="-30 30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.2 0.2 0.0" name="right_leg_geom" size="0.08" type="capsule"/>
<body pos="-0.2 0.2 0">
<joint axis="1 1 0" name="ankle_2" pos="0.0 0.0 0.0" range="-70 -30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.4 0.4 0.0" name="right_ankle_geom" size="0.08" type="capsule"/>
</body>
</body>
</body>
<body name="back_leg" pos="0 0 0">
<geom fromto="0.0 0.0 0.0 -0.2 -0.2 0.0" name="aux_3_geom" size="0.08" type="capsule"/>
<body name="aux_3" pos="-0.2 -0.2 0">
<joint axis="0 0 1" name="hip_3" pos="0.0 0.0 0.0" range="-30 30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.2 -0.2 0.0" name="back_leg_geom" size="0.08" type="capsule"/>
<body pos="-0.2 -0.2 0">
<joint axis="-1 1 0" name="ankle_3" pos="0.0 0.0 0.0" range="-70 -30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.4 -0.4 0.0" name="third_ankle_geom" size="0.08" type="capsule"/>
</body>
</body>
</body>
<body name="right_back_leg" pos="0 0 0">
<geom fromto="0.0 0.0 0.0 0.2 -0.2 0.0" name="aux_4_geom" size="0.08" type="capsule"/>
<body name="aux_4" pos="0.2 -0.2 0">
<joint axis="0 0 1" name="hip_4" pos="0.0 0.0 0.0" range="-30 30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 0.2 -0.2 0.0" name="rightback_leg_geom" size="0.08" type="capsule"/>
<body pos="0.2 -0.2 0">
<joint axis="1 1 0" name="ankle_4" pos="0.0 0.0 0.0" range="30 70" type="hinge"/>
<geom fromto="0.0 0.0 0.0 0.4 -0.4 0.0" name="fourth_ankle_geom" size="0.08" type="capsule"/>
</body>
</body>
</body>
</body>
<body name='b1' pos="0 5 1" euler='0 0 30'>
<freejoint name="b1_fj"/>
<geom conaffinity="1" condim="3" name="wall1" type="box" density=".000001" size="20 0.01 .7" rgba="1 0.5 0.5 1"/>
</body>
<body name='b2' pos="0 -5 1" euler='0 0 30'>
<freejoint name="b2_fj"/>
<geom conaffinity="1" condim="3" name="wall2" type="box" density=".000001" size="20 0.01 .7" rgba="1 0.5 0.5 1"/>
</body>
<body name='b3' pos="40 5 1" euler='0 0 -30'>
<freejoint name="b3_fj"/>
<geom conaffinity="1" condim="3" name="wall3" type="box" density=".000001" size="20 0.01 .7" rgba="1 0.5 0.5 1"/>
</body>
<body name='b4' pos="40 -5 1" euler='0 0 -30'>
<freejoint name="b4_fj"/>
<geom conaffinity="1" condim="3" name="wall4" type="box" density=".000001" size="20 0.01 .7" rgba="1 0.5 0.5 1"/>
</body>
<body name='b5' pos="80 5 1" euler='0 0 30'>
<freejoint name="b5_fj"/>
<geom conaffinity="1" condim="3" name="wall5" type="box" density=".000001" size="20 0.01 .7" rgba="1 0.5 0.5 1"/>
</body>
<body name='b6' pos="80 -5 1" euler='0 0 30'>
<freejoint name="b6_fj"/>
<geom conaffinity="1" condim="3" name="wall6" type="box" density=".000001" size="20 0.01 .7" rgba="1 0.5 0.5 1"/>
</body>
</worldbody>
<actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="hip_4" gear="150"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="ankle_4" gear="150"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="hip_1" gear="150"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="ankle_1" gear="150"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="hip_2" gear="150"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="ankle_2" gear="150"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="hip_3" gear="150"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="ankle_3" gear="150"/>
</actuator>
</mujoco>
================================================
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/coupled_half_cheetah.xml
================================================
<!-- Cheetah Model
The state space is populated with joints in the order that they are
defined in this file. The actuators also operate on joints.
State-Space (name/joint/parameter):
- rootx slider position (m)
- rootz slider position (m)
- rooty hinge angle (rad)
- bthigh hinge angle (rad)
- bshin hinge angle (rad)
- bfoot hinge angle (rad)
- fthigh hinge angle (rad)
- fshin hinge angle (rad)
- ffoot hinge angle (rad)
- rootx slider velocity (m/s)
- rootz slider velocity (m/s)
- rooty hinge angular velocity (rad/s)
- bthigh hinge angular velocity (rad/s)
- bshin hinge angular velocity (rad/s)
- bfoot hinge angular velocity (rad/s)
- fthigh hinge angular velocity (rad/s)
- fshin hinge angular velocity (rad/s)
- ffoot hinge angular velocity (rad/s)
Actuators (name/actuator/parameter):
- bthigh hinge torque (N m)
- bshin hinge torque (N m)
- bfoot hinge torque (N m)
- fthigh hinge torque (N m)
- fshin hinge torque (N m)
- ffoot hinge torque (N m)
-->
<mujoco model="cheetah">
<compiler angle="radian" coordinate="local" inertiafromgeom="true" settotalmass="14"/>
<default>
<joint armature=".1" damping=".01" limited="true" solimplimit="0 .8 .03" solreflimit=".02 1" stiffness="8"/>
<geom conaffinity="0" condim="3" contype="1" friction=".4 .1 .1" rgba="0.8 0.6 .4 1" solimp="0.0 0.8 0.01" solref="0.02 1"/>
<motor ctrllimited="true" ctrlrange="-1 1"/>
</default>
<size nstack="300000" nuser_geom="1"/>
<option gravity="0 0 -9.81" timestep="0.01"/>
<asset>
<texture builtin="gradient" height="100" rgb1="1 1 1" rgb2="0 0 0" type="skybox" width="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<geom conaffinity="1" condim="3" material="MatPlane" name="floor" pos="65 0 0" rgba="0.2 0.2 0.2 1" size="150 40 40" type="plane"/>
<geom conaffinity="1" condim="3" name="wall1" type="box" size="24.8 0.1 1.0" pos="0 -7.3 1.0" euler='0 0 0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall2" type="box" size="24.8 0.1 1.0" pos="0 7.3 1.0" euler='0 0 0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall3" type="box" size="24.8 0.1 1.0" pos="50 -4 1.0" euler='0 0 -0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall4" type="box" size="24.8 0.1 1.0" pos="50 4 1.0" euler='0 0 -0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall5" type="box" size="24.8 0.1 1.0" pos="100 -4 1.0" euler='0 0 0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall6" type="box" size="24.8 0.1 1.0" pos="100 4 1.0" euler='0 0 0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall7" type="box" size="24.8 0.1 1.0" pos="150 -4 1.0" euler='0 0 0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall8" type="box" size="24.8 0.1 1.0" pos="150 4 1.0" euler='0 0 0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall9" type="box" size="24.8 0.1 1.0" pos="-50 4 1.0" euler='0 0 0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall10" type="box" size="24.8 0.1 1.0" pos="-50 -4 1.0" euler='0 0 0' rgba="1 0.5 0.5 1"/>
<body name="obj1" pos="-39 0 .7">
<geom conaffinity="1" condim="3" name="obj_geom" pos='5 0 .7' density="0.0001" type="box" size=".1 2.3 1.3" rgba="1 0.5 0.5 .8"/>
<!--<joint axis="1 0 0" damping=".2" name="wall_joint" pos="5 0 .7" range="-10000 10000" stiffness=".0" type="slide"/>-->
<joint axis="1 0 0" damping=".2" name="wall_joint" pos="2 0 .7" range="-30 30" stiffness=".0" type="slide"/>
<!--<joint axis="1 0 0" damping=".2" name="wall_joint1" pos="2 0 .7" range="10 20" stiffness=".0" type="slide"/>-->
</body>
<!-- <body name="obj2" pos="5 0 .7">-->
<!-- <geom conaffinity="1" condim="3" name="obj_geom1" pos='5 0 .7' density="0.0001" type="box" size=".1 2.3 1.3" rgba="1 0.5 0.5 .8"/>-->
<!-- <!–<joint axis="1 0 0" damping=".2" name="wall_joint" pos="5 0 .7" range="-10000 10000" stiffness=".0" type="slide"/>–>-->
<!-- <joint axis="1 0 0" damping=".2" name="wall_joint1" pos="2 0 .7" range="-10000 10000" stiffness=".0" type="slide"/>-->
<!-- <!–<joint axis="1 0 0" damping=".2" name="wall_joint1" pos="2 0 .7" range="10 20" stiffness=".0" type="slide"/>–>-->
<!-- </body>-->
<!--<geom conaffinity="1" condim="3" material="MatPlane" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="40 40 40" type="plane"/>-->
<body name="torso" pos="0 -1 .7">
<site name="t1" pos="0.0 0 0" size="0.1"/>
<camera name="track" mode="trackcom" pos="0 -3 0.3" xyaxes="1 0 0 0 0 1"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" stiffness="0" type="slide"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 0" stiffness="0" type="hinge"/>
<geom fromto="-.5 0 0 .5 0 0" name="torso" size="0.046" type="capsule"/>
<geom axisangle="0 1 0 .87" name="head" pos=".6 0 .1" size="0.046 .15" type="capsule"/>
<!-- <site name='tip' pos='.15 0 .11'/>-->
<body name="bthigh" pos="-.5 0 0">
<joint axis="0 1 0" damping="6" name="bthigh" pos="0 0 0" range="-.52 1.05" stiffness="240" type="hinge"/>
<geom axisangle="0 1 0 -3.8" name="bthigh" pos=".1 0 -.13" size="0.046 .145" type="capsule"/>
<body name="bshin" pos=".16 0 -.25">
<joint axis="0 1 0" damping="4.5" name="bshin" pos="0 0 0" range="-.785 .785" stiffness="180" type="hinge"/>
<geom axisangle="0 1 0 -2.03" name="bshin" pos="-.14 0 -.07" rgba="0.9 0.6 0.6 1" size="0.046 .15" type="capsule"/>
<body name="bfoot" pos="-.28 0 -.14">
<joint axis="0 1 0" damping="3" name="bfoot" pos="0 0 0" range="-.4 .785" stiffness="120" type="hinge"/>
<geom axisangle="0 1 0 -.27" name="bfoot" pos=".03 0 -.097" rgba="0.9 0.6 0.6 1" size="0.046 .094" type="capsule"/>
</body>
</body>
</body>
<body name="fthigh" pos=".5 0 0">
<joint axis="0 1 0" damping="4.5" name="fthigh" pos="0 0 0" range="-1 .7" stiffness="180" type="hinge"/>
<geom axisangle="0 1 0 .52" name="fthigh" pos="-.07 0 -.12" size="0.046 .133" type="capsule"/>
<body name="fshin" pos="-.14 0 -.24">
<joint axis="0 1 0" damping="3" name="fshin" pos="0 0 0" range="-1.2 .87" stiffness="120" type="hinge"/>
<geom axisangle="0 1 0 -.6" name="fshin" pos=".065 0 -.09" rgba="0.9 0.6 0.6 1" size="0.046 .106" type="capsule"/>
<body name="ffoot" pos=".13 0 -.18">
<joint axis="0 1 0" damping="1.5" name="ffoot" pos="0 0 0" range="-.5 .5" stiffness="60" type="hinge"/>
<geom axisangle="0 1 0 -.6" name="ffoot" pos=".045 0 -.07" rgba="0.9 0.6 0.6 1" size="0.046 .07" type="capsule"/>
</body>
</body>
</body>
</body>
<!-- second cheetah definition -->
<body name="torso2" pos="0 1 .7">
<site name="t2" pos="0 0 0" size="0.1"/>
<camera name="track2" mode="trackcom" pos="0 -3 0.3" xyaxes="1 0 0 0 0 1"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx2" pos="0 0 0" stiffness="0" type="slide"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz2" pos="0 0 0" stiffness="0" type="slide"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty2" pos="0 0 0" stiffness="0" type="hinge"/>
<geom fromto="-.5 0 0 .5 0 0" name="torso2" size="0.046" type="capsule"/>
<geom axisangle="0 1 0 .87" name="head2" pos=".6 0 .1" size="0.046 .15" type="capsule"/>
<!-- <site name='tip' pos='.15 0 .11'/>-->
<body name="bthigh2" pos="-.5 0 0">
<joint axis="0 1 0" damping="6" name="bthigh2" pos="0 0 0" range="-.52 1.05" stiffness="240" type="hinge"/>
<geom axisangle="0 1 0 -3.8" name="bthigh2" pos=".1 0 -.13" size="0.046 .145" type="capsule"/>
<body name="bshin2" pos=".16 0 -.25">
<joint axis="0 1 0" damping="4.5" name="bshin2" pos="0 0 0" range="-.785 .785" stiffness="180" type="hinge"/>
<geom axisangle="0 1 0 -2.03" name="bshin2" pos="-.14 0 -.07" rgba="0.9 0.6 0.6 1" size="0.046 .15" type="capsule"/>
<body name="bfoot2" pos="-.28 0 -.14">
<joint axis="0 1 0" damping="3" name="bfoot2" pos="0 0 0" range="-.4 .785" stiffness="120" type="hinge"/>
<geom axisangle="0 1 0 -.27" name="bfoot2" pos=".03 0 -.097" rgba="0.9 0.6 0.6 1" size="0.046 .094" type="capsule"/>
</body>
</body>
</body>
<body name="fthigh2" pos=".5 0 0">
<joint axis="0 1 0" damping="4.5" name="fthigh2" pos="0 0 0" range="-1 .7" stiffness="180" type="hinge"/>
<geom axisangle="0 1 0 .52" name="fthigh2" pos="-.07 0 -.12" size="0.046 .133" type="capsule"/>
<body name="fshin2" pos="-.14 0 -.24">
<joint axis="0 1 0" damping="3" name="fshin2" pos="0 0 0" range="-1.2 .87" stiffness="120" type="hinge"/>
<geom axisangle="0 1 0 -.6" name="fshin2" pos=".065 0 -.09" rgba="0.9 0.6 0.6 1" size="0.046 .106" type="capsule"/>
<body name="ffoot2" pos=".13 0 -.18">
<joint axis="0 1 0" damping="1.5" name="ffoot2" pos="0 0 0" range="-.5 .5" stiffness="60" type="hinge"/>
<geom axisangle="0 1 0 -.6" name="ffoot2" pos=".045 0 -.07" rgba="0.9 0.6 0.6 1" size="0.046 .07" type="capsule"/>
</body>
</body>
</body>
</body>
</worldbody>
<tendon>
<spatial name="tendon1" width="0.05" rgba=".95 .3 .3 1" limited="true" range="1.5 3.5" stiffness="0.1">
<site site="t1"/>
<site site="t2"/>
</spatial>
</tendon>
<actuator>
<motor gear="120" joint="bthigh" name="bthigh"/>
<motor gear="90" joint="bshin" name="bshin"/>
<motor gear="60" joint="bfoot" name="bfoot"/>
<motor gear="120" joint="fthigh" name="fthigh"/>
<motor gear="60" joint="fshin" name="fshin"/>
<motor gear="30" joint="ffoot" name="ffoot"/>
<motor gear="120" joint="bthigh2" name="bthigh2"/>
<motor gear="90" joint="bshin2" name="bshin2"/>
<motor gear="60" joint="bfoot2" name="bfoot2"/>
<motor gear="120" joint="fthigh2" name="fthigh2"/>
<motor gear="60" joint="fshin2" name="fshin2"/>
<motor gear="30" joint="ffoot2" name="ffoot2"/>
<motor gear="120" joint="wall_joint" name="wall_joint_ac"/>
<!--<motor gear="120" joint="wall_joint1" name="wall_joint_ac1"/>-->
</actuator>
</mujoco>
================================================
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/half_cheetah.xml
================================================
<!-- Cheetah Model
The state space is populated with joints in the order that they are
defined in this file. The actuators also operate on joints.
State-Space (name/joint/parameter):
- rootx slider position (m)
- rootz slider position (m)
- rooty hinge angle (rad)
- bthigh hinge angle (rad)
- bshin hinge angle (rad)
- bfoot hinge angle (rad)
- fthigh hinge angle (rad)
- fshin hinge angle (rad)
- ffoot hinge angle (rad)
- rootx slider velocity (m/s)
- rootz slider velocity (m/s)
- rooty hinge angular velocity (rad/s)
- bthigh hinge angular velocity (rad/s)
- bshin hinge angular velocity (rad/s)
- bfoot hinge angular velocity (rad/s)
- fthigh hinge angular velocity (rad/s)
- fshin hinge angular velocity (rad/s)
- ffoot hinge angular velocity (rad/s)
Actuators (name/actuator/parameter):
- bthigh hinge torque (N m)
- bshin hinge torque (N m)
- bfoot hinge torque (N m)
- fthigh hinge torque (N m)
- fshin hinge torque (N m)
- ffoot hinge torque (N m)
-->
<mujoco model="cheetah">
<compiler angle="radian" coordinate="local" inertiafromgeom="true" settotalmass="14"/>
<default>
<joint armature=".1" damping=".01" limited="true" solimplimit="0 .8 .03" solreflimit=".02 1" stiffness="8"/>
<geom conaffinity="0" condim="3" contype="1" friction=".4 .1 .1" rgba="0.8 0.6 .4 1" solimp="0.0 0.8 0.01" solref="0.02 1"/>
<motor ctrllimited="true" ctrlrange="-1 1"/>
</default>
<size nstack="300000" nuser_geom="1"/>
<option gravity="0 0 -9.81" timestep="0.01"/>
<asset>
<texture builtin="gradient" height="100" rgb1="1 1 1" rgb2="0 0 0" type="skybox" width="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="#2c5987" rgb2="#1f4060" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<geom conaffinity="1" condim="3" material="MatPlane" name="floor" pos="65 0 0" rgba="0.2 0.2 0.2 1" size="150 40 40" type="plane"/>
<geom conaffinity="1" condim="3" name="wall1" type="box" size="24.8 0.1 1.0" pos="0 -4 1.0" euler='0 0 0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall2" type="box" size="24.8 0.1 1.0" pos="0 4 1.0" euler='0 0 0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall3" type="box" size="24.8 0.1 1.0" pos="50 -4 1.0" euler='0 0 -0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall4" type="box" size="24.8 0.1 1.0" pos="50 4 1.0" euler='0 0 -0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall5" type="box" size="24.8 0.1 1.0" pos="100 -4 1.0" euler='0 0 0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall6" type="box" size="24.8 0.1 1.0" pos="100 4 1.0" euler='0 0 0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall7" type="box" size="24.8 0.1 1.0" pos="150 -4 1.0" euler='0 0 0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall8" type="box" size="24.8 0.1 1.0" pos="150 4 1.0" euler='0 0 0' rgba="1 0.5 0.5 1"/>
<!-- <body name="obj1" pos="5 0 .7">-->
<!-- <geom conaffinity="1" condim="3" name="obj_geom" pos='5 0 .7' density="0.0001" type="box" size=".1 2.3 1.3" rgba="1 0.5 0.5 .8"/>-->
<!-- <joint axis="1 0 0" damping=".2" name="wall_joint" pos="5 0 .7" range="-10000 10000" stiffness=".0" type="slide"/>-->
<!-- </body>-->
<!-- <body name="obj1" pos="-39 0 .7">-->
<!-- <geom conaffinity="1" condim="3" name="obj_geom" pos='5 0 .7' density="0.0001" type="box" size=".1 2.3 1.3" rgba="1 0.5 0.5 .8"/>-->
<!-- <joint axis="1 0 0" damping=".2" name="wall_joint" pos="2 0 .7" range="-30 30" stiffness=".0" type="slide"/>-->
<!-- </body>-->
<body name="obj1" pos="5 0 .7">
<geom conaffinity="1" condim="3" name="obj_geom" pos='5 0 .7' density="0.0001" type="box" size=".1 2.3 1.3" rgba="1 0.5 0.5 .8"/>
<joint axis="1 0 0" damping=".2" name="wall_joint" pos="5 0 .7" range="-5000 5000" stiffness=".0" type="slide"/>
</body>
<body name="torso" pos="0 0 .7">
<camera name="track" mode="trackcom" pos="0 -3 0.3" xyaxes="1 0 0 0 0 1"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" stiffness="0" type="slide"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 0" stiffness="0" type="hinge"/>
<geom fromto="-.5 0 0 .5 0 0" name="torso" size="0.046" type="capsule"/>
<geom axisangle="0 1 0 .87" name="head" pos=".6 0 .1" size="0.046 .15" type="capsule"/>
<!-- <site name='tip' pos='.15 0 .11'/>-->
<body name="bthigh" pos="-.5 0 0">
<joint axis="0 1 0" damping="6" name="bthigh" pos="0 0 0" range="-.52 1.05" stiffness="240" type="hinge"/>
<geom axisangle="0 1 0 -3.8" name="bthigh" pos=".1 0 -.13" size="0.046 .145" type="capsule"/>
<body name="bshin" pos=".16 0 -.25">
<joint axis="0 1 0" damping="4.5" name="bshin" pos="0 0 0" range="-.785 .785" stiffness="180" type="hinge"/>
<geom axisangle="0 1 0 -2.03" name="bshin" pos="-.14 0 -.07" rgba="0.9 0.6 0.6 1" size="0.046 .15" type="capsule"/>
<body name="bfoot" pos="-.28 0 -.14">
<joint axis="0 1 0" damping="3" name="bfoot" pos="0 0 0" range="-.4 .785" stiffness="120" type="hinge"/>
<geom axisangle="0 1 0 -.27" name="bfoot" pos=".03 0 -.097" rgba="0.9 0.6 0.6 1" size="0.046 .094" type="capsule"/>
</body>
</body>
</body>
<body name="fthigh" pos=".5 0 0">
<joint axis="0 1 0" damping="4.5" name="fthigh" pos="0 0 0" range="-1 .7" stiffness="180" type="hinge"/>
<geom axisangle="0 1 0 .52" name="fthigh" pos="-.07 0 -.12" size="0.046 .133" type="capsule"/>
<body name="fshin" pos="-.14 0 -.24">
<joint axis="0 1 0" damping="3" name="fshin" pos="0 0 0" range="-1.2 .87" stiffness="120" type="hinge"/>
<geom axisangle="0 1 0 -.6" name="fshin" pos=".065 0 -.09" rgba="0.9 0.6 0.6 1" size="0.046 .106" type="capsule"/>
<body name="ffoot" pos=".13 0 -.18">
<joint axis="0 1 0" damping="1.5" name="ffoot" pos="0 0 0" range="-.5 .5" stiffness="60" type="hinge"/>
<geom axisangle="0 1 0 -.6" name="ffoot" pos=".045 0 -.07" rgba="0.9 0.6 0.6 1" size="0.046 .07" type="capsule"/>
</body>
</body>
</body>
</body>
</worldbody>
<!-- <equality>
<weld name="weld1" body1="mocap1" body2="obj1" solref=".02 2.5"/>
</equality> -->
<actuator>
<motor gear="120" joint="bthigh" name="bthigh"/>
<motor gear="90" joint="bshin" name="bshin"/>
<motor gear="60" joint="bfoot" name="bfoot"/>
<motor gear="120" joint="fthigh" name="fthigh"/>
<motor gear="60" joint="fshin" name="fshin"/>
<motor gear="30" joint="ffoot" name="ffoot"/>
<motor gear="120" joint="wall_joint" name="wall_joint_ac"/>
</actuator>
</mujoco>
================================================
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/hopper.xml
================================================
<mujoco model="hopper">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/>
<default>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
</default>
<option integrator="RK4" timestep="0.002"/>
<visual>
<map znear="0.02"/>
</visual>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<geom conaffinity="1" condim="3" name="floor" pos="40 0 0" rgba="0.2 0.2 0.2 1" size="100 25 .125" type="plane" material="MatPlane"/>
<geom conaffinity="1" condim="3" name="wall1" type="box" size="24.8 0.1 1.0" pos="0 -4 1.0" euler='0 0 0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall2" type="box" size="24.8 0.1 1.0" pos="0 4 1.0" euler='0 0 0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall3" type="box" size="24.8 0.1 1.0" pos="50 -4 1.0" euler='0 0 -0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall4" type="box" size="24.8 0.1 1.0" pos="50 4 1.0" euler='0 0 -0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall5" type="box" size="24.8 0.1 1.0" pos="100 -4 1.0" euler='0 0 0' rgba="1 0.5 0.5 1"/>
<geom conaffinity="1" condim="3" name="wall6" type="box" size="24.8 0.1 1.0" pos="100 4 1.0" euler='0 0 0' rgba="1 0.5 0.5 1"/>
<body name="mocap1" pos="5 0 .5" mocap="true">
<geom conaffinity="0" condim="3" name="mocap_geom" pos='5 0 .5' type="box" size=".3 0.3 0.3" rgba="1 0.5 0.5 0"/>
</body>
<body name="obj1" pos="5 0 .5">
<freejoint name="obj1_fj"/>
<geom conaffinity="1" condim="3" name="obj_geom" pos='5 0 .5' type="box" size=".3 0.3 0.3" rgba="1 0.5 0.5 1"/>
</body>
<body name="torso" pos="0 1 1.25">
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/>
<body name="thigh" pos="0 0 1.05">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/>
<body name="leg" pos="0 0 0.35">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/>
<body name="foot" pos="0.13/2 0 0.1">
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/>
</body>
</body>
</body>
</body>
</worldbody>
<equality>
<weld name="weld1" body1="mocap1" body2="obj1" solref=".02 .5"/>
</equality>
<actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/>
</actuator>
<asset>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="#2c5987" rgb2="#1f4060" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco>
================================================
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/humanoid.xml
================================================
<mujoco model="humanoid">
<compiler angle="degree" inertiafromgeom="true"/>
<default>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1"/>
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
</default>
<option integrator="RK4" iterations="50" solver="PGS" timestep="0.003">
<!-- <flags solverstat="enable" energy="enable"/>-->
</option>
<size nkey="5" nuser_geom="1"/>
<visual>
<map fogend="5" fogstart="3"/>
</visual>
<asset>
<texture builtin="gradient" height="100" rgb1=".4 .5 .6" rgb2="0 0 0" type="skybox" width="100"/>
<!-- <texture builtin="gradient" height="100" rgb1="1 1 1" rgb2="0 0 0" type="skybox" width="100"/>-->
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<material name="geom" texture="texgeom" texuniform="true"/>
<texture builtin="checker" height="100" name="texplane" rgb1="#2c5987" rgb2="#1f4060" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
</asset>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<geom condim="3" friction="1 .1 .1" material="MatPlane" name="floor" pos="30 0 0" rgba="0.2 0.2 0.2 1" size="70 25 0.125" type="plane"/>
<!-- <geom condim="3" material="MatPlane" name="floor" pos="0 0 0" size="10 10 0.125" type="plane"/>-->
<body name="torso" pos="0 0 1.4">
<camera name="track" mode="trackcom" pos="0 -4 0" xyaxes="1 0 0 0 0 1"/>
<joint armature="0" damping="0" limited="false" name="root" pos="0 0 0" stiffness="0" type="free"/>
<geom fromto="0 -.07 0 0 .07 0" name="torso1" size="0.07" type="capsule"/>
<geom name="head" pos="0 0 .19" size=".09" type="sphere" user="258"/>
<geom fromto="-.01 -.06 -.12 -.01 .06 -.12" name="uwaist" size="0.06" type="capsule"/>
<body name="lwaist" pos="-.01 0 -0.260" quat="1.000 0 -0.002 0">
<geom fromto="0 -.06 0 0 .06 0" name="lwaist" size="0.06" type="capsule"/>
<joint armature="0.02" axis="0 0 1" damping="5" name="abdomen_z" pos="0 0 0.065" range="-45 45" stiffness="20" type="hinge"/>
<joint armature="0.02" axis="0 1 0" damping="5" name="abdomen_y" pos="0 0 0.065" range="-75 30" stiffness="10" type="hinge"/>
<body name="pelvis" pos="0 0 -0.165" quat="1.000 0 -0.002 0">
<joint armature="0.02" axis="1 0 0" damping="5" name="abdomen_x" pos="0 0 0.1" range="-35 35" stiffness="10" type="hinge"/>
<geom fromto="-.02 -.07 0 -.02 .07 0" name="butt" size="0.09" type="capsule"/>
<body name="right_thigh" pos="0 -0.1 -0.04">
<joint armature="0.01" axis="1 0 0" damping="5" name="right_hip_x" pos="0 0 0" range="-25 5" stiffness="10" type="hinge"/>
<joint armature="0.01" axis="0 0 1" damping="5" name="right_hip_z" pos="0 0 0" range="-60 35" stiffness="10" type="hinge"/>
<joint armature="0.0080" axis="0 1 0" damping="5" name="right_hip_y" pos="0 0 0" range="-110 20" stiffness="20" type="hinge"/>
<geom fromto="0 0 0 0 0.01 -.34" name="right_thigh1" size="0.06" type="capsule"/>
<body name="right_shin" pos="0 0.01 -0.403">
<joint armature="0.0060" axis="0 -1 0" name="right_knee" pos="0 0 .02" range="-160 -2" type="hinge"/>
<geom fromto="0 0 0 0 0 -.3" name="right_shin1" size="0.049" type="capsule"/>
<body name="right_foot" pos="0 0 -0.45">
<geom name="right_foot" pos="0 0 0.1" size="0.075" type="sphere" user="0"/>
</body>
</body>
</body>
<body name="left_thigh" pos="0 0.1 -0.04">
<joint armature="0.01" axis="-1 0 0" damping="5" name="left_hip_x" pos="0 0 0" range="-25 5" stiffness="10" type="hinge"/>
<joint armature="0.01" axis="0 0 -1" damping="5" name="left_hip_z" pos="0 0 0" range="-60 35" stiffness="10" type="hinge"/>
<joint armature="0.01" axis="0 1 0" damping="5" name="left_hip_y" pos="0 0 0" range="-110 20" stiffness="20" type="hinge"/>
<geom fromto="0 0 0 0 -0.01 -.34" name="left_thigh1" size="0.06" type="capsule"/>
<body name="left_shin" pos="0 -0.01 -0.403">
<joint armature="0.0060" axis="0 -1 0" name="left_knee" pos="0 0 .02" range="-160 -2" stiffness="1" type="hinge"/>
<geom fromto="0 0 0 0 0 -.3" name="left_shin1" size="0.049" type="capsule"/>
<body name="left_foot" pos="0 0 -0.45">
<geom name="left_foot" type="sphere" size="0.075" pos="0 0 0.1" user="0" />
</body>
</body>
</body>
</body>
</body>
<body name="right_upper_arm" pos="0 -0.17 0.06">
<joint armature="0.0068" axis="2 1 1" name="right_shoulder1" pos="0 0 0" range="-85 60" stiffness="1" type="hinge"/>
<joint armature="0.0051" axis="0 -1 1" name="right_shoulder2" pos="0 0 0" range="-85 60" stiffness="1" type="hinge"/>
<geom fromto="0 0 0 .16 -.16 -.16" name="right_uarm1" size="0.04 0.16" type="capsule"/>
<body name="right_lower_arm" pos=".18 -.18 -.18">
<joint armature="0.0028" axis="0 -1 1" name="right_elbow" pos="0 0 0" range="-90 50" stiffness="0" type="hinge"/>
<geom fromto="0.01 0.01 0.01 .17 .17 .17" name="right_larm" size="0.031" type="capsule"/>
<geom name="right_hand" pos=".18 .18 .18" size="0.04" type="sphere"/>
<camera pos="0 0 0"/>
</body>
</body>
<body name="left_upper_arm" pos="0 0.17 0.06">
<joint armature="0.0068" axis="2 -1 1" name="left_shoulder1" pos="0 0 0" range="-60 85" stiffness="1" type="hinge"/>
<joint armature="0.0051" axis="0 1 1" name="left_shoulder2" pos="0 0 0" range="-60 85" stiffness="1" type="hinge"/>
<geom fromto="0 0 0 .16 .16 -.16" name="left_uarm1" size="0.04 0.16" type="capsule"/>
<body name="left_lower_arm" pos=".18 .18 -.18">
<joint armature="0.0028" axis="0 -1 -1" name="left_elbow" pos="0 0 0" range="-90 50" stiffness="0" type="hinge"/>
<geom fromto="0.01 -0.01 0.01 .17 -.17 .17" name="left_larm" size="0.031" type="capsule"/>
<geom name="left_hand" pos=".18 -.18 .18" size="0.04" type="sphere"/>
</body>
</body>
</body>
<body name='b1' pos="0 2.3 1" euler='0 0 30'>
<freejoint name="b1_fj"/>
<geom conaffinity="1" condim="3" name="wall1" type="box" density=".000001" size="20 0.01 .7" rgba="1 0.5 0.5 1"/>
</body>
<body name='b2' pos="0 -2.3 1" euler='0 0 30'>
<freejoint name="b2_fj"/>
<geom conaffinity="1" condim="3" name="wall2" type="box" density=".000001" size="20 0.01 .7" rgba="1 0.5 0.5 1"/>
</body>
<body name='b3' pos="40 2.3 1" euler='0 0 -30'>
<freejoint name="b3_fj"/>
<geom conaffinity="1" condim="3" name="wall3" type="box" density=".000001" size="20 0.01 .7" rgba="1 0.5 0.5 1"/>
</body>
<body name='b4' pos="40 -2.3 1" euler='0 0 -30'>
<freejoint name="b4_fj"/>
<geom conaffinity="1" condim="3" name="wall4" type="box" density=".000001" size="20 0.01 .7" rgba="1 0.5 0.5 1"/>
</body>
<body name='b5' pos="80 2.3 1" euler='0 0 30'>
<freejoint name="b5_fj"/>
<geom conaffinity="1" condim="3" name="wall5" type="box" density=".000001" size="20 0.01 .7" rgba="1 0.5 0.5 1"/>
</body>
<body name='b6' pos="80 -2.3 1" euler='0 0 30'>
<freejoint name="b6_fj"/>
<geom conaffinity="1" condim="3" name="wall6" type="box" density=".000001" size="20 0.01 .7" rgba="1 0.5 0.5 1"/>
</body>
</worldbody>
<tendon>
<fixed name="left_hipknee">
<joint coef="-1" joint="left_hip_y"/>
<joint coef="1" joint="left_knee"/>
</fixed>
<fixed name="right_hipknee">
<joint coef="-1" joint="right_hip_y"/>
<joint coef="1" joint="right_knee"/>
</fixed>
</tendon>
<actuator>
<motor gear="100" joint="abdomen_y" name="abdomen_y"/>
<motor gear="100" joint="abdomen_z" name="abdomen_z"/>
<motor gear="100" joint="abdomen_x" name="abdomen_x"/>
<motor gear="100" joint="right_hip_x" name="right_hip_x"/>
<motor gear="100" joint="right_hip_z" name="right_hip_z"/>
<motor gear="300" joint="right_hip_y" name="right_hip_y"/>
<motor gear="200" joint="right_knee" name="right_knee"/>
<motor gear="100" joint="left_hip_x" name="left_hip_x"/>
<motor gear="100" joint="left_hip_z" name="left_hip_z"/>
<motor gear="300" joint="left_hip_y" name="left_hip_y"/>
<motor gear="200" joint="left_knee" name="left_knee"/>
<motor gear="25" joint="right_shoulder1" name="right_shoulder1"/>
<motor gear="25" joint="right_shoulder2" name="right_shoulder2"/>
<motor gear="25" joint="right_elbow" name="right_elbow"/>
<motor gear="25" joint="left_shoulder1" name="left_shoulder1"/>
<motor gear="25" joint="left_shoulder2" name="left_shoulder2"/>
<motor gear="25" joint="left_elbow" name="left_elbow"/>
</actuator>
</mujoco>
================================================
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/manyagent_ant.xml
================================================
<mujoco model="ant">
<size nconmax="200"/>
<compiler angle="degree" coordinate="local" inertiafromgeom="true"/>
<option integrator="RK4" timestep="0.01"/>
<custom>
<numeric data="0.0 0.0 0.55 1.0 0.0 0.0 0.0 0.0 1.0 0.0 -1.0 0.0 -1.0 0.0 1.0" name="init_qpos"/>
</custom>
<default>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="0" condim="3" density="5.0" friction="1 0.5 0.5" margin="0.01" rgba="0.8 0.6 0.4 1"/>
</default>
<asset>
<texture builtin="gradient" height="100" rgb1="1 1 1" rgb2="0 0 0" type="skybox" width="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<texture builtin="checker" height="100" name="texbox" rgb1="#ff66ff" rgb2="#ff66ff" type="2d" width="100"/>
<material name="BoxMat" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texbox"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<!-- <geom conaffinity="1" condim="3" material="MatPlane" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="40 40 40" type="plane"/>-->
<geom conaffinity="1" condim="3" material="MatPlane" name="floor" pos="30 0 0" rgba="0.2 0.2 0.2 1" size="70 25 40" type="plane"/>
<geom conaffinity="1" condim="3" name="wall1" type="box" material="BoxMat" size="0.1 14 1.0" pos="-14 0 1" rgba="#ff66ff"/>
<geom conaffinity="1" condim="3" name="wall2" type="box" material="BoxMat" size="14 .1 1.0" pos="0 14 1" rgba="#ff66ff"/>
<body name="torso" pos="0 0 0.75">
<!-- <camera name="track" mode="trackcom" pos="0 -3 0.3" xyaxes="1 0 0 0 0 1"/>-->
<camera name="track" mode="trackcom" pos="0 -10 -10" xyaxes=".8 .4 0 0 .4 .6"/>
<geom name="torso_geom" pos="0 0 0" size="0.25" type="sphere"/>
<joint armature="0" damping="0" limited="false" margin="0.01" name="root" pos="0 0 0" type="free"/>
<body name="front_left_leg" pos="0 0 0">
<geom fromto="0.0 0.0 0.0 0.2 0.2 0.0" name="aux_1_geom" size="0.08" type="capsule"/>
<body name="aux_1" pos="0.2 0.2 0">
<joint axis="0 0 1" name="hip_1" pos="0.0 0.0 0.0" range="-30 30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 0.2 0.2 0.0" name="left_leg_geom" size="0.08" type="capsule"/>
<body pos="0.2 0.2 0">
<joint axis="-1 1 0" name="ankle_1" pos="0.0 0.0 0.0" range="30 70" type="hinge"/>
<geom fromto="0.0 0.0 0.0 0.4 0.4 0.0" name="left_ankle_geom" size="0.08" type="capsule"/>
</body>
</body>
</body>
<body name="right_back_leg" pos="0 0 0">
<geom fromto="0.0 0.0 0.0 0.2 -0.2 0.0" name="aux_4_geom" size="0.08" type="capsule"/>
<body name="aux_4" pos="0.2 -0.2 0">
<joint axis="0 0 1" name="hip_4" pos="0.0 0.0 0.0" range="-30 30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 0.2 -0.2 0.0" name="rightback_leg_geom" size="0.08" type="capsule"/>
<body pos="0.2 -0.2 0">
<joint axis="1 1 0" name="ankle_4" pos="0.0 0.0 0.0" range="30 70" type="hinge"/>
<geom fromto="0.0 0.0 0.0 0.4 -0.4 0.0" name="fourth_ankle_geom" size="0.08" type="capsule"/>
</body>
</body>
</body>
<body name="midx" pos="0.0 0 0">
<geom density="1000" fromto="0 0 0 -1 0 0" size="0.1" type="capsule"/>
<!--<joint axis="0 0 1" limited="true" name="rot2" pos="0 0 0" range="-100 100" type="hinge"/>-->
<body name="front_right_legx" pos="-1 0 0">
<geom fromto="0.0 0.0 0.0 0.0 0.2 0.0" name="aux_2_geomx" size="0.08" type="capsule"/>
<body name="aux_2x" pos="0.0 0.2 0">
<joint axis="0 0 1" name="hip_2x" pos="0.0 0.0 0.0" range="-30 30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.2 0.2 0.0" name="right_leg_geomx" size="0.08" type="capsule"/>
<body pos="-0.2 0.2 0">
<joint axis="1 1 0" name="ankle_2x" pos="0.0 0.0 0.0" range="-70 -30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.4 0.4 0.0" name="right_ankle_geomx" size="0.08" type="capsule"/>
</body>
</body>
</body>
<body name="back_legx" pos="-1 0 0">
<geom fromto="0.0 0.0 0.0 0.0 -0.2 0.0" name="aux_3_geomx" size="0.08" type="capsule"/>
<body name="aux_3x" pos="0.0 -0.2 0">
<joint axis="0 0 1" name="hip_3x" pos="0.0 0.0 0.0" range="-30 30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.2 -0.2 0.0" name="back_leg_geomx" size="0.08" type="capsule"/>
<body pos="-0.2 -0.2 0">
<joint axis="-1 1 0" name="ankle_3x" pos="0.0 0.0 0.0" range="-70 -30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.4 -0.4 0.0" name="third_ankle_geomx" size="0.08" type="capsule"/>
</body>
</body>
</body>
<body name="mid" pos="-1 0 0">
<geom density="1000" fromto="0 0 0 -1 0 0" size="0.1" type="capsule"/>
<!--<joint axis="0 0 1" limited="true" name="rot2" pos="0 0 0" range="-100 100" type="hinge"/>-->
<!--<body name="front_right_leg" pos="-1 0 0">
<geom fromto="0.0 0.0 0.0 -0.2 0.2 0.0" name="aux_2_geom" size="0.08" type="capsule"/>
<body name="aux_2" pos="-0.2 0.2 0">
<joint axis="0 0 1" name="hip_2" pos="0.0 0.0 0.0" range="-30 30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.2 0.2 0.0" name="right_leg_geom" size="0.08" type="capsule"/>
<body pos="-0.2 0.2 0">
<joint axis="1 1 0" name="ankle_2" pos="0.0 0.0 0.0" range="-70 -30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.4 0.4 0.0" name="right_ankle_geom" size="0.08" type="capsule"/>
</body>
</body>
</body>
<body name="back_leg" pos="-1 0 0">
<geom fromto="0.0 0.0 0.0 -0.2 -0.2 0.0" name="aux_3_geom" size="0.08" type="capsule"/>
<body name="aux_3" pos="-0.2 -0.2 0">
<joint axis="0 0 1" name="hip_3" pos="0.0 0.0 0.0" range="-30 30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.2 -0.2 0.0" name="back_leg_geom" size="0.08" type="capsule"/>
<body pos="-0.2 -0.2 0">
<joint axis="-1 1 0" name="ankle_3" pos="0.0 0.0 0.0" range="-70 -30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.4 -0.4 0.0" name="third_ankle_geom" size="0.08" type="capsule"/>
</body>
</body>
</body>-->
<body name="front_right_leg" pos="-1 0 0">
<geom fromto="0.0 0.0 0.0 0.0 0.2 0.0" name="aux_2_geom" size="0.08" type="capsule"/>
<body name="aux_2" pos="0.0 0.2 0">
<joint axis="0 0 1" name="hip_2" pos="0.0 0.0 0.0" range="-30 30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.2 0.2 0.0" name="right_leg_geom" size="0.08" type="capsule"/>
<body pos="-0.2 0.2 0">
<joint axis="1 1 0" name="ankle_2" pos="0.0 0.0 0.0" range="-70 -30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.4 0.4 0.0" name="right_ankle_geom" size="0.08" type="capsule"/>
</body>
</body>
</body>
<body name="back_leg" pos="-1 0 0">
<geom fromto="0.0 0.0 0.0 0.0 -0.2 0.0" name="aux_3_geom" size="0.08" type="capsule"/>
<body name="aux_3" pos="0.0 -0.2 0">
<joint axis="0 0 1" name="hip_3" pos="0.0 0.0 0.0" range="-30 30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.2 -0.2 0.0" name="back_leg_geom" size="0.08" type="capsule"/>
<body pos="-0.2 -0.2 0">
<joint axis="-1 1 0" name="ankle_3" pos="0.0 0.0 0.0" range="-70 -30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.4 -0.4 0.0" name="third_ankle_geom" size="0.08" type="capsule"/>
</body>
</body>
</body>
</body>
</body>
</body>
<body name='b1' pos="0 5 1" euler='0 0 30'>
<freejoint name="b1_fj"/>
<geom conaffinity="1" condim="3" name="wall1" type="box" density=".000001" size="20 0.01 .7" rgba="1 0.5 0.5 1"/>
</body>
<body name='b2' pos="0 -5 1" euler='0 0 30'>
<freejoint name="b2_fj"/>
<geom conaffinity="1" condim="3" name="wall2" type="box" density=".000001" size="20 0.01 .7" rgba="1 0.5 0.5 1"/>
</body>
<body name='b3' pos="40 5 1" euler='0 0 -30'>
<freejoint name="b3_fj"/>
<geom conaffinity="1" condim="3" name="wall3" type="box" density=".000001" size="20 0.01 .7" rgba="1 0.5 0.5 1"/>
</body>
<body name='b4' pos="40 -5 1" euler='0 0 -30'>
<freejoint name="b4_fj"/>
<geom conaffinity="1" condim="3" name="wall4" type="box" density=".000001" size="20 0.01 .7" rgba="1 0.5 0.5 1"/>
</body>
<body name='b5' pos="80 5 1" euler='0 0 30'>
<freejoint name="b5_fj"/>
<geom conaffinity="1" condim="3" name="wall5" type="box" density=".000001" size="20 0.01 .7" rgba="1 0.5 0.5 1"/>
</body>
<body name='b6' pos="80 -5 1" euler='0 0 30'>
<freejoint name="b6_fj"/>
<geom conaffinity="1" condim="3" name="wall6" type="box" density=".000001" size="20 0.01 .7" rgba="1 0.5 0.5 1"/>
</body>
</worldbody>
<actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="hip_4" gear="150"/>
SYMBOL INDEX (796 symbols across 72 files)
FILE: MACPO/macpo/algorithms/r_mappo/__init__.py
function cost_trpo_macppo (line 1) | def cost_trpo_macppo():
FILE: MACPO/macpo/algorithms/r_mappo/algorithm/MACPPOPolicy.py
class MACPPOPolicy (line 6) | class MACPPOPolicy:
method __init__ (line 17) | def __init__(self, args, obs_space, cent_obs_space, act_space, device=...
method lr_decay (line 45) | def lr_decay(self, episode, episodes):
method get_actions (line 55) | def get_actions(self, cent_obs, obs, rnn_states_actor, rnn_states_crit...
method get_values (line 88) | def get_values(self, cent_obs, rnn_states_critic, masks):
method get_cost_values (line 100) | def get_cost_values(self, cent_obs, rnn_states_cost, masks):
method evaluate_actions (line 112) | def evaluate_actions(self, cent_obs, obs, rnn_states_actor, rnn_states...
method act (line 157) | def act(self, obs, rnn_states_actor, masks, available_actions=None, de...
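The MACPPOPolicy entries above describe one actor with two centralised critics: get_values for the reward value function and get_cost_values for the constraint-cost value function. A minimal self-contained sketch of that dual-critic layout (class name, sizes, and the unit-variance Gaussian head here are illustrative, not the repository's own code):

import torch
import torch.nn as nn

class ToySafePolicy(nn.Module):
    def __init__(self, obs_dim, cent_obs_dim, act_dim, hidden=64):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, act_dim))
        # Two centralised critics: one for reward, one for constraint cost.
        self.v_reward = nn.Sequential(nn.Linear(cent_obs_dim, hidden), nn.Tanh(),
                                      nn.Linear(hidden, 1))
        self.v_cost = nn.Sequential(nn.Linear(cent_obs_dim, hidden), nn.Tanh(),
                                    nn.Linear(hidden, 1))

    def get_actions(self, obs):
        mean = self.actor(obs)
        dist = torch.distributions.Normal(mean, torch.ones_like(mean))
        action = dist.sample()
        return action, dist.log_prob(action).sum(-1, keepdim=True)

    def get_values(self, cent_obs):       # reward value V_r(s)
        return self.v_reward(cent_obs)

    def get_cost_values(self, cent_obs):  # constraint value V_c(s)
        return self.v_cost(cent_obs)

policy = ToySafePolicy(obs_dim=18, cent_obs_dim=54, act_dim=6)
obs, cent_obs = torch.zeros(4, 18), torch.zeros(4, 54)
action, log_prob = policy.get_actions(obs)
v_reward, v_cost = policy.get_values(cent_obs), policy.get_cost_values(cent_obs)

Keeping the two critics separate lets the trainer form reward and cost advantages independently before combining them in the constrained update.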
FILE: MACPO/macpo/algorithms/r_mappo/algorithm/rMAPPOPolicy.py
class R_MAPPOPolicy (line 6) | class R_MAPPOPolicy:
method __init__ (line 17) | def __init__(self, args, obs_space, cent_obs_space, act_space, device=...
method lr_decay (line 39) | def lr_decay(self, episode, episodes):
method get_actions (line 48) | def get_actions(self, cent_obs, obs, rnn_states_actor, rnn_states_crit...
method get_values (line 76) | def get_values(self, cent_obs, rnn_states_critic, masks):
method evaluate_actions (line 88) | def evaluate_actions(self, cent_obs, obs, rnn_states_actor, rnn_states...
method act (line 116) | def act(self, obs, rnn_states_actor, masks, available_actions=None, de...
FILE: MACPO/macpo/algorithms/r_mappo/algorithm/r_actor_critic.py
class R_Actor (line 11) | class R_Actor(nn.Module):
method __init__ (line 19) | def __init__(self, args, obs_space, action_space, device=torch.device(...
method forward (line 43) | def forward(self, obs, rnn_states, masks, available_actions=None, dete...
method evaluate_actions (line 72) | def evaluate_actions(self, obs, rnn_states, action, masks, available_a...
class R_Critic (line 125) | class R_Critic(nn.Module):
method __init__ (line 133) | def __init__(self, args, cent_obs_space, device=torch.device("cpu")):
method forward (line 157) | def forward(self, cent_obs, rnn_states, masks):
FILE: MACPO/macpo/algorithms/r_mappo/r_macpo.py
class R_MACTRPO_CPO (line 14) | class R_MACTRPO_CPO():
method __init__ (line 22) | def __init__(self,
method cal_value_loss (line 124) | def cal_value_loss(self, values, value_preds_batch, return_batch, acti...
method flat_grad (line 164) | def flat_grad(self, grads):
method flat_hessian (line 173) | def flat_hessian(self, hessians):
method flat_params (line 182) | def flat_params(self, model):
method update_model (line 189) | def update_model(self, model, new_params):
method kl_divergence (line 198) | def kl_divergence(self, obs, rnn_states, action, masks, available_acti...
method conjugate_gradient (line 218) | def conjugate_gradient(self, actor, obs, rnn_states, action, masks, av...
method fisher_vector_product (line 237) | def fisher_vector_product(self, actor, obs, rnn_states, action, masks,...
method _get_flat_grad (line 251) | def _get_flat_grad(self, y, model, retain_graph=None, create_graph=Fal...
method _flat_grad_ (line 262) | def _flat_grad_(self, f, model, retain_graph=None, create_graph=False):
method hessian_vector_product (line 266) | def hessian_vector_product(self, f, model):
method cg (line 274) | def cg(self, Ax, b, cg_iters=10):
method trpo_update (line 289) | def trpo_update(self, sample, update_actor=True):
method train (line 565) | def train(self, buffer, shared_buffer=None, update_actor=True):
method prep_training (line 657) | def prep_training(self):
method prep_rollout (line 661) | def prep_rollout(self):
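The conjugate_gradient/cg and fisher_vector_product entries above are the standard TRPO machinery: the policy step solves F x = g, where F is the Fisher matrix of the mean KL, using only Fisher-vector products obtained by double backprop. A generic textbook sketch, not the repository's exact implementation (the damping constant is a hypothetical knob):

import torch

def conjugate_gradient(Avp, b, iters=10, tol=1e-10):
    """Solve A x = b given only matrix-vector products Avp(v) = A @ v."""
    x = torch.zeros_like(b)
    r = b.clone()                     # residual b - A x, with x = 0 initially
    p = r.clone()
    rs_old = r.dot(r)
    for _ in range(iters):
        Ap = Avp(p)
        alpha = rs_old / p.dot(Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r.dot(r)
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

def make_fvp(kl_fn, params, damping=0.1):
    """Fisher-vector product via double backprop through the mean KL."""
    def fvp(v):
        kl = kl_fn()                                  # scalar mean KL
        grads = torch.autograd.grad(kl, params, create_graph=True)
        flat = torch.cat([g.reshape(-1) for g in grads])
        hvp = torch.autograd.grad((flat * v).sum(), params)
        return torch.cat([h.reshape(-1) for h in hvp]) + damping * v
    return fvp

# Sanity check on a known system: A = diag(2, 3), b = (2, 3)  ->  x = (1, 1).
A = torch.diag(torch.tensor([2.0, 3.0]))
x = conjugate_gradient(lambda v: A @ v, torch.tensor([2.0, 3.0]))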
FILE: MACPO/macpo/algorithms/utils/act.py
class ACTLayer (line 5) | class ACTLayer(nn.Module):
method __init__ (line 13) | def __init__(self, action_space, inputs_dim, use_orthogonal, gain, arg...
method forward (line 41) | def forward(self, x, available_actions=None, deterministic=False):
method get_probs (line 85) | def get_probs(self, x, available_actions=None):
method evaluate_actions (line 107) | def evaluate_actions(self, x, action, available_actions=None, active_m...
method evaluate_actions_trpo (line 165) | def evaluate_actions_trpo(self, x, action, available_actions=None, act...
FILE: MACPO/macpo/algorithms/utils/cnn.py
class Flatten (line 6) | class Flatten(nn.Module):
method forward (line 7) | def forward(self, x):
class CNNLayer (line 11) | class CNNLayer(nn.Module):
method __init__ (line 12) | def __init__(self, obs_shape, hidden_size, use_orthogonal, use_ReLU, k...
method forward (line 40) | def forward(self, x):
class CNNBase (line 46) | class CNNBase(nn.Module):
method __init__ (line 47) | def __init__(self, args, obs_shape):
method forward (line 56) | def forward(self, x):
FILE: MACPO/macpo/algorithms/utils/distributions.py
class FixedCategorical (line 14) | class FixedCategorical(torch.distributions.Categorical):
method sample (line 15) | def sample(self):
method log_probs (line 18) | def log_probs(self, actions):
method mode (line 27) | def mode(self):
class FixedNormal (line 32) | class FixedNormal(torch.distributions.Normal):
method log_probs (line 33) | def log_probs(self, actions):
method entrop (line 37) | def entrop(self):
method mode (line 40) | def mode(self):
class FixedBernoulli (line 45) | class FixedBernoulli(torch.distributions.Bernoulli):
method log_probs (line 46) | def log_probs(self, actions):
method entropy (line 49) | def entropy(self):
method mode (line 52) | def mode(self):
class Categorical (line 56) | class Categorical(nn.Module):
method __init__ (line 57) | def __init__(self, num_inputs, num_outputs, use_orthogonal=True, gain=...
method forward (line 65) | def forward(self, x, available_actions=None):
class DiagGaussian (line 94) | class DiagGaussian(nn.Module):
method __init__ (line 95) | def __init__(self, num_inputs, num_outputs, use_orthogonal=True, gain=...
method forward (line 113) | def forward(self, x, available_actions=None):
class Bernoulli (line 125) | class Bernoulli(nn.Module):
method __init__ (line 126) | def __init__(self, num_inputs, num_outputs, use_orthogonal=True, gain=...
method forward (line 134) | def forward(self, x):
class AddBias (line 138) | class AddBias(nn.Module):
method __init__ (line 139) | def __init__(self, bias):
method forward (line 143) | def forward(self, x):
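The Fixed* wrappers indexed above adapt PyTorch distributions to this codebase's conventions: log_probs sums per-dimension log-densities into one value per sample, and mode returns a deterministic action. (Note the index shows FixedNormal's entropy method is spelled entrop, without the trailing "y", in both copies of the file.) A minimal sketch of the pattern:

import torch

class FixedNormal(torch.distributions.Normal):
    def log_probs(self, actions):
        # Sum per-dimension log-densities into one log-prob per sample.
        return super().log_prob(actions).sum(-1, keepdim=True)

    def mode(self):
        return self.mean

dist = FixedNormal(torch.zeros(4, 6), torch.ones(4, 6))
actions = dist.sample()
print(dist.log_probs(actions).shape)   # torch.Size([4, 1])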
FILE: MACPO/macpo/algorithms/utils/mlp.py
class MLPLayer (line 6) | class MLPLayer(nn.Module):
method __init__ (line 7) | def __init__(self, input_dim, hidden_size, layer_N, use_orthogonal, us...
method forward (line 27) | def forward(self, x):
class MLPBase (line 34) | class MLPBase(nn.Module):
method __init__ (line 35) | def __init__(self, args, obs_shape, cat_self=True, attn_internal=False):
method forward (line 53) | def forward(self, x):
FILE: MACPO/macpo/algorithms/utils/rnn.py
class RNNLayer (line 7) | class RNNLayer(nn.Module):
method __init__ (line 8) | def __init__(self, inputs_dim, outputs_dim, recurrent_N, use_orthogonal):
method forward (line 24) | def forward(self, x, hxs, masks):
FILE: MACPO/macpo/algorithms/utils/util.py
function init (line 7) | def init(module, weight_init, bias_init, gain=1):
function get_clones (line 12) | def get_clones(module, N):
function check (line 15) | def check(input):
FILE: MACPO/macpo/config.py
function get_config (line 4) | def get_config():
FILE: MACPO/macpo/envs/env_wrappers.py
class CloudpickleWrapper (line 10) | class CloudpickleWrapper(object):
method __init__ (line 15) | def __init__(self, x):
method __getstate__ (line 18) | def __getstate__(self):
method __setstate__ (line 22) | def __setstate__(self, ob):
class ShareVecEnv (line 27) | class ShareVecEnv(ABC):
method __init__ (line 41) | def __init__(self, num_envs, observation_space, share_observation_spac...
method reset (line 48) | def reset(self):
method step_async (line 60) | def step_async(self, actions):
method step_wait (line 72) | def step_wait(self):
method close_extras (line 86) | def close_extras(self):
method close (line 93) | def close(self):
method step (line 101) | def step(self, actions):
method render (line 110) | def render(self, mode='human'):
method get_images (line 121) | def get_images(self):
method unwrapped (line 128) | def unwrapped(self):
method get_viewer (line 134) | def get_viewer(self):
function worker (line 141) | def worker(remote, parent_remote, env_fn_wrapper):
class GuardSubprocVecEnv (line 178) | class GuardSubprocVecEnv(ShareVecEnv):
method __init__ (line 179) | def __init__(self, env_fns, spaces=None):
method step_async (line 200) | def step_async(self, actions):
method step_wait (line 206) | def step_wait(self):
method reset (line 212) | def reset(self):
method reset_task (line 218) | def reset_task(self):
method close (line 223) | def close(self):
class SubprocVecEnv (line 236) | class SubprocVecEnv(ShareVecEnv):
method __init__ (line 237) | def __init__(self, env_fns, spaces=None):
method step_async (line 258) | def step_async(self, actions):
method step_wait (line 263) | def step_wait(self):
method reset (line 269) | def reset(self):
method reset_task (line 276) | def reset_task(self):
method close (line 281) | def close(self):
method render (line 293) | def render(self, mode="rgb_array"):
function shareworker (line 301) | def shareworker(remote, parent_remote, env_fn_wrapper):
class ShareSubprocVecEnv (line 344) | class ShareSubprocVecEnv(ShareVecEnv):
method __init__ (line 345) | def __init__(self, env_fns, spaces=None):
method step_async (line 369) | def step_async(self, actions):
method step_wait (line 374) | def step_wait(self):
method reset (line 384) | def reset(self):
method reset_task (line 391) | def reset_task(self):
method close (line 396) | def close(self):
function choosesimpleworker (line 409) | def choosesimpleworker(remote, parent_remote, env_fn_wrapper):
class ChooseSimpleSubprocVecEnv (line 440) | class ChooseSimpleSubprocVecEnv(ShareVecEnv):
method __init__ (line 441) | def __init__(self, env_fns, spaces=None):
method step_async (line 461) | def step_async(self, actions):
method step_wait (line 466) | def step_wait(self):
method reset (line 472) | def reset(self, reset_choose):
method render (line 478) | def render(self, mode="rgb_array"):
method reset_task (line 485) | def reset_task(self):
method close (line 490) | def close(self):
function chooseworker (line 503) | def chooseworker(remote, parent_remote, env_fn_wrapper):
class ChooseSubprocVecEnv (line 530) | class ChooseSubprocVecEnv(ShareVecEnv):
method __init__ (line 531) | def __init__(self, env_fns, spaces=None):
method step_async (line 552) | def step_async(self, actions):
method step_wait (line 557) | def step_wait(self):
method reset (line 563) | def reset(self, reset_choose):
method reset_task (line 570) | def reset_task(self):
method close (line 575) | def close(self):
function chooseguardworker (line 588) | def chooseguardworker(remote, parent_remote, env_fn_wrapper):
class ChooseGuardSubprocVecEnv (line 613) | class ChooseGuardSubprocVecEnv(ShareVecEnv):
method __init__ (line 614) | def __init__(self, env_fns, spaces=None):
method step_async (line 635) | def step_async(self, actions):
method step_wait (line 640) | def step_wait(self):
method reset (line 646) | def reset(self, reset_choose):
method reset_task (line 652) | def reset_task(self):
method close (line 657) | def close(self):
class DummyVecEnv (line 671) | class DummyVecEnv(ShareVecEnv):
method __init__ (line 672) | def __init__(self, env_fns):
method step_async (line 679) | def step_async(self, actions):
method step_wait (line 682) | def step_wait(self):
method reset (line 697) | def reset(self):
method close (line 701) | def close(self):
method render (line 705) | def render(self, mode="human"):
class ShareDummyVecEnv (line 716) | class ShareDummyVecEnv(ShareVecEnv):
method __init__ (line 717) | def __init__(self, env_fns):
method step_async (line 724) | def step_async(self, actions):
method step_wait (line 727) | def step_wait(self):
method reset (line 743) | def reset(self):
method close (line 748) | def close(self):
method render (line 752) | def render(self, mode="human"):
class ChooseDummyVecEnv (line 762) | class ChooseDummyVecEnv(ShareVecEnv):
method __init__ (line 763) | def __init__(self, env_fns):
method step_async (line 770) | def step_async(self, actions):
method step_wait (line 773) | def step_wait(self):
method reset (line 780) | def reset(self, reset_choose):
method close (line 786) | def close(self):
method render (line 790) | def render(self, mode="human"):
class ChooseSimpleDummyVecEnv (line 799) | class ChooseSimpleDummyVecEnv(ShareVecEnv):
method __init__ (line 800) | def __init__(self, env_fns):
method step_async (line 807) | def step_async(self, actions):
method step_wait (line 810) | def step_wait(self):
method reset (line 816) | def reset(self, reset_choose):
method close (line 821) | def close(self):
method render (line 825) | def render(self, mode="human"):
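env_wrappers.py follows the OpenAI Baselines vectorised-env pattern: each environment runs in a child process driven over a Pipe by a small command loop (worker), and CloudpickleWrapper exists so closure-based env factories survive pickling. A stripped-down runnable sketch of the same idea, with a stand-in environment so it does not need MuJoCo (the real classes handle many more commands and use cloudpickle for the factories):

import multiprocessing as mp
import numpy as np

class DummyEnv:
    """Stand-in environment so the sketch runs without MuJoCo."""
    def reset(self):
        return np.zeros(3, dtype=np.float32)
    def step(self, action):
        return np.zeros(3, dtype=np.float32), 0.0, False, {}

def worker(remote, env_fn):
    env = env_fn()
    while True:
        cmd, data = remote.recv()
        if cmd == "step":
            remote.send(env.step(data))
        elif cmd == "reset":
            remote.send(env.reset())
        elif cmd == "close":
            remote.close()
            break

if __name__ == "__main__":
    remotes, work_remotes = zip(*[mp.Pipe() for _ in range(2)])
    procs = [mp.Process(target=worker, args=(wr, DummyEnv), daemon=True)
             for wr in work_remotes]
    for p in procs:
        p.start()
    for r in remotes:                       # batched reset across processes
        r.send(("reset", None))
    obs = np.stack([r.recv() for r in remotes])
    for r in remotes:
        r.send(("close", None))
    for p in procs:
        p.join()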
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/ant.py
class AntEnv (line 8) | class AntEnv(mujoco_env.MujocoEnv, utils.EzPickle):
method __init__ (line 9) | def __init__(self, **kwargs):
method step (line 13) | def step(self, a):
method _get_obs (line 61) | def _get_obs(self):
method reset_model (line 81) | def reset_model(self):
method viewer_setup (line 89) | def viewer_setup(self):
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/coupled_half_cheetah.py
class CoupledHalfCheetah (line 9) | class CoupledHalfCheetah(mujoco_env.MujocoEnv, utils.EzPickle):
method __init__ (line 10) | def __init__(self, **kwargs):
method step (line 14) | def step(self, action):
method _get_obs (line 117) | def _get_obs(self):
method reset_model (line 137) | def reset_model(self):
method viewer_setup (line 143) | def viewer_setup(self):
method get_env_info (line 146) | def get_env_info(self):
method _set_action_space (line 149) | def _set_action_space(self):
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/half_cheetah.py
class HalfCheetahEnv (line 10) | class HalfCheetahEnv(mujoco_env.MujocoEnv, utils.EzPickle):
method __init__ (line 11) | def __init__(self, **kwargs):
method step (line 16) | def step(self, action):
method _get_obs (line 48) | def _get_obs(self):
method reset_model (line 61) | def reset_model(self):
method viewer_setup (line 67) | def viewer_setup(self):
method _set_action_space (line 70) | def _set_action_space(self):
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/hopper.py
class HopperEnv (line 7) | class HopperEnv(mujoco_env.MujocoEnv, utils.EzPickle):
method __init__ (line 8) | def __init__(self, **kwargs):
method step (line 13) | def step(self, a):
method _get_obs (line 50) | def _get_obs(self):
method reset_model (line 63) | def reset_model(self):
method last_mocap_x (line 69) | def last_mocap_x(self):
method viewer_setup (line 72) | def viewer_setup(self):
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/humanoid.py
function mass_center (line 8) | def mass_center(model, sim):
class HumanoidEnv (line 14) | class HumanoidEnv(mujoco_env.MujocoEnv, utils.EzPickle):
method __init__ (line 15) | def __init__(self, **kwargs):
method _get_obs (line 19) | def _get_obs(self):
method step (line 44) | def step(self, a):
method reset_model (line 85) | def reset_model(self):
method viewer_setup (line 99) | def viewer_setup(self):
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/manyagent_ant.py
class ManyAgentAntEnv (line 10) | class ManyAgentAntEnv(mujoco_env.MujocoEnv, utils.EzPickle):
method __init__ (line 11) | def __init__(self, **kwargs):
method _generate_asset (line 33) | def _generate_asset(self, n_segs, asset_path):
method step (line 86) | def step(self, a):
method _get_obs (line 156) | def _get_obs(self):
method reset_model (line 188) | def reset_model(self):
method viewer_setup (line 196) | def viewer_setup(self):
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/manyagent_swimmer.py
class ManyAgentSwimmerEnv (line 9) | class ManyAgentSwimmerEnv(mujoco_env.MujocoEnv, utils.EzPickle):
method __init__ (line 10) | def __init__(self, **kwargs):
method _generate_asset (line 30) | def _generate_asset(self, n_segs, asset_path):
method step (line 66) | def step(self, a):
method _get_obs (line 123) | def _get_obs(self):
method reset_model (line 144) | def reset_model(self):
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/mujoco_env.py
function convert_observation_to_space (line 19) | def convert_observation_to_space(observation):
class MujocoEnv (line 35) | class MujocoEnv(gym.Env):
method __init__ (line 39) | def __init__(self, model_path, frame_skip):
method _set_action_space (line 71) | def _set_action_space(self):
method _set_observation_space (line 77) | def _set_observation_space(self, observation):
method seed (line 81) | def seed(self, seed=None):
method reset_model (line 88) | def reset_model(self):
method viewer_setup (line 95) | def viewer_setup(self):
method reset (line 105) | def reset(self):
method set_state (line 110) | def set_state(self, qpos, qvel):
method dt (line 119) | def dt(self):
method do_simulation (line 122) | def do_simulation(self, ctrl, n_frames):
method render (line 127) | def render(self,
method close (line 160) | def close(self):
method _get_viewer (line 166) | def _get_viewer(self, mode):
method get_body_com (line 178) | def get_body_com(self, body_name):
method state_vector (line 181) | def state_vector(self):
method place_random_objects (line 187) | def place_random_objects(self):
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/mujoco_multi.py
function env_fn (line 13) | def env_fn(env, **kwargs) -> MultiAgentEnv: # TODO: this may be a more ...
class NormalizedActions (line 26) | class NormalizedActions(gym.ActionWrapper):
method _action (line 28) | def _action(self, action):
method action (line 34) | def action(self, action_):
method _reverse_action (line 37) | def _reverse_action(self, action):
class MujocoMulti (line 44) | class MujocoMulti(MultiAgentEnv):
method __init__ (line 46) | def __init__(self, batch_size=None, **kwargs):
method step (line 176) | def step(self, actions):
method get_obs (line 205) | def get_obs(self):
method get_obs_agent (line 220) | def get_obs_agent(self, agent_id):
method get_obs_size (line 236) | def get_obs_size(self):
method get_state (line 244) | def get_state(self, team=None):
method get_state_size (line 257) | def get_state_size(self):
method get_avail_actions (line 261) | def get_avail_actions(self): # all actions are always available
method get_avail_agent_actions (line 264) | def get_avail_agent_actions(self, agent_id):
method get_total_actions (line 268) | def get_total_actions(self):
method get_stats (line 273) | def get_stats(self):
method get_agg_stats (line 277) | def get_agg_stats(self, stats):
method reset (line 280) | def reset(self, **kwargs):
method render (line 286) | def render(self, **kwargs):
method close (line 289) | def close(self):
method seed (line 292) | def seed(self, args):
method get_env_info (line 295) | def get_env_info(self):
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/multiagentenv.py
function convert (line 5) | def convert(dictionary):
class MultiAgentEnv (line 8) | class MultiAgentEnv(object):
method __init__ (line 10) | def __init__(self, batch_size=None, **kwargs):
method step (line 21) | def step(self, actions):
method get_obs (line 25) | def get_obs(self):
method get_obs_agent (line 29) | def get_obs_agent(self, agent_id):
method get_obs_size (line 33) | def get_obs_size(self):
method get_state (line 37) | def get_state(self):
method get_state_size (line 40) | def get_state_size(self):
method get_avail_actions (line 44) | def get_avail_actions(self):
method get_avail_agent_actions (line 47) | def get_avail_agent_actions(self, agent_id):
method get_total_actions (line 51) | def get_total_actions(self):
method get_stats (line 56) | def get_stats(self):
method get_agg_stats (line 60) | def get_agg_stats(self, stats):
method reset (line 63) | def reset(self):
method render (line 67) | def render(self):
method close (line 70) | def close(self):
method seed (line 73) | def seed(self, seed):
method get_env_info (line 76) | def get_env_info(self):
FILE: MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/obsk.py
class Node (line 5) | class Node():
method __init__ (line 6) | def __init__(self, label, qpos_ids, qvel_ids, act_ids, body_fn=None, b...
method __str__ (line 17) | def __str__(self):
method __repr__ (line 20) | def __repr__(self):
class HyperEdge (line 24) | class HyperEdge():
method __init__ (line 25) | def __init__(self, *edges):
method __contains__ (line 28) | def __contains__(self, item):
method __str__ (line 31) | def __str__(self):
method __repr__ (line 34) | def __repr__(self):
function get_joints_at_kdist (line 38) | def get_joints_at_kdist(agent_id, agent_partitions, hyperedges, k=0, kag...
function build_obs (line 74) | def build_obs(env, k_dict, k_categories, global_dict, global_categories,...
function build_actions (line 141) | def build_actions(agent_partitions, k_dict):
function get_parts_and_edges (line 146) | def get_parts_and_edges(label, partitioning):
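obsk.py builds the agent factorisation: joints are graph nodes, HyperEdge groups joints that move together, and get_joints_at_kdist collects every joint within k hops of an agent's own joints so each agent's observation can be restricted to a local neighbourhood of the morphology. A toy version of that neighbourhood computation (the joint labels and edge sets are made up for illustration):

def joints_at_kdist(own_joints, hyperedges, k):
    """Every joint reachable from own_joints in at most k hyperedge hops."""
    seen = set(own_joints)
    frontier = set(own_joints)
    for _ in range(k):
        nxt = set()
        for edge in hyperedges:             # each edge: joints sharing a body
            if frontier & edge:
                nxt |= edge
        frontier = nxt - seen
        seen |= nxt
    return sorted(seen)

edges = [{"hip1", "knee1"}, {"hip1", "hip2"}, {"hip2", "knee2"}]
print(joints_at_kdist({"knee1"}, edges, 1))   # ['hip1', 'knee1']
print(joints_at_kdist({"knee1"}, edges, 2))   # ['hip1', 'hip2', 'knee1']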
FILE: MACPO/macpo/envs/safety_ma_mujoco/test.py
function main (line 6) | def main():
FILE: MACPO/macpo/runner/separated/base_runner.py
function _t2n (line 13) | def _t2n(x):
class Runner (line 16) | class Runner(object):
method __init__ (line 17) | def __init__(self, config):
method run (line 105) | def run(self):
method warmup (line 108) | def warmup(self):
method collect (line 111) | def collect(self, step):
method insert (line 114) | def insert(self, data):
method compute (line 118) | def compute(self):
method train (line 127) | def train(self):
method save (line 139) | def save(self):
method restore (line 150) | def restore(self):
method log_train (line 161) | def log_train(self, train_infos, total_num_steps):
method log_env (line 170) | def log_env(self, env_infos, total_num_steps):
FILE: MACPO/macpo/runner/separated/base_runner_macpo.py
function _t2n (line 14) | def _t2n(x):
class Runner (line 18) | class Runner(object):
method __init__ (line 19) | def __init__(self, config):
method run (line 110) | def run(self):
method warmup (line 113) | def warmup(self):
method collect (line 116) | def collect(self, step):
method insert (line 119) | def insert(self, data):
method compute (line 123) | def compute(self):
method train (line 138) | def train(self):
method buffer_filter (line 192) | def buffer_filter(self, agent_id):
method remove_episodes (line 219) | def remove_episodes(self, agent_id, del_ids):
method save (line 244) | def save(self):
method restore (line 255) | def restore(self):
method log_train (line 266) | def log_train(self, train_infos, total_num_steps):
method log_env (line 275) | def log_env(self, env_infos, total_num_steps):
FILE: MACPO/macpo/runner/separated/mujoco_runner.py
function _t2n (line 9) | def _t2n(x):
class MujocoRunner (line 13) | class MujocoRunner(Runner):
method __init__ (line 16) | def __init__(self, config):
method run (line 19) | def run(self):
method warmup (line 90) | def warmup(self):
method collect (line 102) | def collect(self, step):
method insert (line 130) | def insert(self, data):
method log_train (line 158) | def log_train(self, train_infos, total_num_steps):
method eval (line 170) | def eval(self, total_num_steps):
FILE: MACPO/macpo/runner/separated/mujoco_runner_macpo.py
function _t2n (line 11) | def _t2n(x):
class MujocoRunner (line 15) | class MujocoRunner(Runner):
method __init__ (line 18) | def __init__(self, config):
method run (line 22) | def run(self):
method return_aver_cost (line 118) | def return_aver_cost(self, aver_episode_costs):
method warmup (line 125) | def warmup(self):
method collect (line 138) | def collect(self, step):
method insert (line 178) | def insert(self, data, aver_episode_costs = 0):
method log_train (line 213) | def log_train(self, train_infos, total_num_steps):
method eval (line 226) | def eval(self, total_num_steps):
FILE: MACPO/macpo/scripts/train/train_mujoco.py
function make_train_env (line 23) | def make_train_env(all_args):
function make_eval_env (line 46) | def make_eval_env(all_args):
function parse_args (line 69) | def parse_args(args, parser):
function main (line 92) | def main(args):
FILE: MACPO/macpo/utils/multi_discrete.py
class MultiDiscrete (line 6) | class MultiDiscrete(gym.Space):
method __init__ (line 22) | def __init__(self, array_of_param_array):
method sample (line 28) | def sample(self):
method contains (line 34) | def contains(self, x):
method shape (line 38) | def shape(self):
method __repr__ (line 41) | def __repr__(self):
method __eq__ (line 44) | def __eq__(self, other):
FILE: MACPO/macpo/utils/popart.py
class PopArt (line 8) | class PopArt(nn.Module):
method __init__ (line 11) | def __init__(self, input_shape, norm_axes=1, beta=0.99999, per_element...
method reset_parameters (line 25) | def reset_parameters(self):
method running_mean_var (line 30) | def running_mean_var(self):
method forward (line 36) | def forward(self, input_vector, train=True):
method denormalize (line 64) | def denormalize(self, input_vector):
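PopArt here keeps debiased running statistics of the value targets: forward normalises targets during training, and denormalize maps value predictions back to the raw return scale. A minimal sketch of that running-statistics scheme, using the beta default visible in the signature (the repository's module also handles multi-axis inputs and per-element updates, which this sketch omits):

import torch

class RunningNormalizer:
    def __init__(self, beta=0.99999):
        self.beta = beta
        self.mean = torch.zeros(1)      # EMA of targets
        self.mean_sq = torch.zeros(1)   # EMA of squared targets
        self.debias = torch.zeros(1)    # correction for EMA start-up bias

    def _stats(self):
        d = self.debias.clamp(min=1e-8)
        mean = self.mean / d
        var = (self.mean_sq / d - mean ** 2).clamp(min=1e-4)
        return mean, var

    def forward(self, x, train=True):
        if train:
            self.mean = self.beta * self.mean + (1 - self.beta) * x.mean()
            self.mean_sq = self.beta * self.mean_sq + (1 - self.beta) * (x ** 2).mean()
            self.debias = self.beta * self.debias + (1 - self.beta)
        mean, var = self._stats()
        return (x - mean) / var.sqrt()

    def denormalize(self, x):
        mean, var = self._stats()
        return x * var.sqrt() + mean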
FILE: MACPO/macpo/utils/separated_buffer.py
function _flatten (line 8) | def _flatten(T, N, x):
function _cast (line 12) | def _cast(x):
class SeparatedReplayBuffer (line 16) | class SeparatedReplayBuffer(object):
method __init__ (line 17) | def __init__(self, args, obs_space, share_obs_space, act_space):
method update_factor (line 74) | def update_factor(self, factor):
method return_aver_insert (line 77) | def return_aver_insert(self, aver_episode_costs):
method insert (line 80) | def insert(self, share_obs, obs, rnn_states, rnn_states_critic, action...
method chooseinsert (line 110) | def chooseinsert(self, share_obs, obs, rnn_states, rnn_states_critic, ...
method after_update (line 130) | def after_update(self):
method chooseafter_update (line 142) | def chooseafter_update(self):
method compute_returns (line 148) | def compute_returns(self, next_value, value_normalizer=None):
method compute_cost_returns (line 195) | def compute_cost_returns(self, next_cost, value_normalizer=None):
method feed_forward_generator (line 239) | def feed_forward_generator(self, advantages, num_mini_batch=None, mini...
method naive_recurrent_generator (line 323) | def naive_recurrent_generator(self, advantages, num_mini_batch, cost_a...
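compute_returns and compute_cost_returns suggest the buffer runs the same return/advantage recursion twice, once over rewards and once over per-step costs for the cost critic. A generic GAE sketch of that recursion (gamma and lam are illustrative defaults, not values read from the repository config):

import numpy as np

def gae_returns(signal, values, next_value, masks, gamma=0.99, lam=0.95):
    """signal: per-step rewards (or costs); masks: 0 where an episode ended."""
    T = len(signal)
    returns = np.zeros(T)
    gae, v_next = 0.0, next_value
    for t in reversed(range(T)):
        delta = signal[t] + gamma * v_next * masks[t] - values[t]
        gae = delta + gamma * lam * masks[t] * gae
        returns[t] = gae + values[t]
        v_next = values[t]
    return returns

rewards, costs = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0])
values, masks = np.zeros(3), np.ones(3)
reward_returns = gae_returns(rewards, values, 0.0, masks)  # for the reward critic
cost_returns = gae_returns(costs, values, 0.0, masks)      # for the cost critic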
FILE: MACPO/macpo/utils/util.py
function check (line 5) | def check(input):
function get_gard_norm (line 9) | def get_gard_norm(it):
function update_linear_schedule (line 17) | def update_linear_schedule(optimizer, epoch, total_num_epochs, initial_lr):
function huber_loss (line 23) | def huber_loss(e, d):
function mse_loss (line 28) | def mse_loss(e):
function get_shape_from_obs_space (line 31) | def get_shape_from_obs_space(obs_space):
function get_shape_from_act_space (line 40) | def get_shape_from_act_space(act_space):
function tile_images (line 54) | def tile_images(img_nhwc):
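The small helpers above are standard: huber_loss switches from quadratic to linear beyond a threshold d, and update_linear_schedule anneals the learning rate to zero over training. Plausible shapes written from the signatures, as a sketch rather than a verbatim copy of the repository code:

import torch

def huber_loss(e, d):
    # Quadratic within |e| <= d, linear outside: robust to outlier targets.
    a = (abs(e) <= d).float()
    return a * e ** 2 / 2 + (1 - a) * d * (abs(e) - d / 2)

def mse_loss(e):
    return e ** 2 / 2

def update_linear_schedule(optimizer, epoch, total_num_epochs, initial_lr):
    # Anneal the learning rate linearly from initial_lr to zero.
    lr = initial_lr - initial_lr * (epoch / float(total_num_epochs))
    for param_group in optimizer.param_groups:
        param_group["lr"] = lr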
FILE: MACPO/setup.py
function get_version (line 8) | def get_version() -> str:
FILE: MAPPO-Lagrangian/mappo_lagrangian/algorithms/r_mappo/__init__.py
function cost_trpo_macppo (line 1) | def cost_trpo_macppo():
FILE: MAPPO-Lagrangian/mappo_lagrangian/algorithms/r_mappo/algorithm/MACPPOPolicy.py
class MACPPOPolicy (line 6) | class MACPPOPolicy:
method __init__ (line 17) | def __init__(self, args, obs_space, cent_obs_space, act_space, device=...
method lr_decay (line 44) | def lr_decay(self, episode, episodes):
method get_actions (line 54) | def get_actions(self, cent_obs, obs, rnn_states_actor, rnn_states_crit...
method get_values (line 87) | def get_values(self, cent_obs, rnn_states_critic, masks):
method get_cost_values (line 99) | def get_cost_values(self, cent_obs, rnn_states_cost, masks):
method evaluate_actions (line 111) | def evaluate_actions(self, cent_obs, obs, rnn_states_actor, rnn_states...
method act (line 143) | def act(self, obs, rnn_states_actor, masks, available_actions=None, de...
FILE: MAPPO-Lagrangian/mappo_lagrangian/algorithms/r_mappo/algorithm/rMAPPOPolicy.py
class R_MAPPOPolicy (line 6) | class R_MAPPOPolicy:
method __init__ (line 17) | def __init__(self, args, obs_space, cent_obs_space, act_space, device=...
method lr_decay (line 39) | def lr_decay(self, episode, episodes):
method get_actions (line 48) | def get_actions(self, cent_obs, obs, rnn_states_actor, rnn_states_crit...
method get_values (line 76) | def get_values(self, cent_obs, rnn_states_critic, masks):
method evaluate_actions (line 88) | def evaluate_actions(self, cent_obs, obs, rnn_states_actor, rnn_states...
method act (line 116) | def act(self, obs, rnn_states_actor, masks, available_actions=None, de...
FILE: MAPPO-Lagrangian/mappo_lagrangian/algorithms/r_mappo/algorithm/r_actor_critic.py
class R_Actor (line 11) | class R_Actor(nn.Module):
method __init__ (line 19) | def __init__(self, args, obs_space, action_space, device=torch.device(...
method forward (line 42) | def forward(self, obs, rnn_states, masks, available_actions=None, dete...
method evaluate_actions (line 71) | def evaluate_actions(self, obs, rnn_states, action, masks, available_a...
class R_Critic (line 109) | class R_Critic(nn.Module):
method __init__ (line 117) | def __init__(self, args, cent_obs_space, device=torch.device("cpu")):
method forward (line 141) | def forward(self, cent_obs, rnn_states, masks):
FILE: MAPPO-Lagrangian/mappo_lagrangian/algorithms/r_mappo/r_mappo_lagr.py
class R_MAPPO_Lagr (line 8) | class R_MAPPO_Lagr:
method __init__ (line 26) | def __init__(self,
method cal_value_loss (line 86) | def cal_value_loss(self, values, value_preds_batch, return_batch, acti...
method _get_flat_grad (line 126) | def _get_flat_grad(self, y: torch.Tensor, model: nn.Module, **kwargs) ...
method _conjugate_gradients (line 137) | def _conjugate_gradients(self, b: torch.Tensor, flat_kl_grad: torch.Te...
method cal_second_hessian (line 156) | def cal_second_hessian(self, v: torch.Tensor, flat_kl_grad: torch.Tens...
method _set_from_flat_params (line 164) | def _set_from_flat_params(self, model: nn.Module, flat_params: torch.T...
method ppo_update (line 173) | def ppo_update(self, sample, update_actor=True, precomputed_eval=None,
method train (line 303) | def train(self, buffer, update_actor=True):
method prep_training (line 370) | def prep_training(self):
method prep_rollout (line 375) | def prep_rollout(self):
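R_MAPPO_Lagr points at the Lagrangian-relaxation recipe: keep a non-negative multiplier lambda, increase it when the measured cost exceeds the budget, and fold it into the PPO objective. A minimal sketch of that outer loop; cost_limit, the multiplier learning rate, and the (1 + lambda) normalisation of the combined advantage are assumptions borrowed from common safe-RL implementations, not necessarily this repository's exact update:

class Lagrangian:
    def __init__(self, cost_limit, lr=0.01, init=0.0):
        self.cost_limit = cost_limit    # budget d on expected episode cost
        self.lr = lr
        self.lam = init                 # multiplier, kept non-negative

    def update(self, mean_episode_cost):
        # Ascend lambda * (J_c - d), projected back onto lambda >= 0.
        self.lam = max(0.0, self.lam + self.lr * (mean_episode_cost - self.cost_limit))
        return self.lam

    def penalized_advantage(self, adv_reward, adv_cost):
        # Advantage fed to the PPO ratio: trade reward against weighted cost.
        return (adv_reward - self.lam * adv_cost) / (1.0 + self.lam)

lag = Lagrangian(cost_limit=25.0)
for episode_cost in (30.0, 28.0, 24.0):   # lambda grows while cost > limit
    lag.update(episode_cost)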
FILE: MAPPO-Lagrangian/mappo_lagrangian/algorithms/utils/act.py
class ACTLayer (line 5) | class ACTLayer(nn.Module):
method __init__ (line 13) | def __init__(self, action_space, inputs_dim, use_orthogonal, gain, arg...
method forward (line 41) | def forward(self, x, available_actions=None, deterministic=False):
method get_probs (line 85) | def get_probs(self, x, available_actions=None):
method evaluate_actions (line 107) | def evaluate_actions(self, x, action, available_actions=None, active_m...
FILE: MAPPO-Lagrangian/mappo_lagrangian/algorithms/utils/cnn.py
class Flatten (line 6) | class Flatten(nn.Module):
method forward (line 7) | def forward(self, x):
class CNNLayer (line 11) | class CNNLayer(nn.Module):
method __init__ (line 12) | def __init__(self, obs_shape, hidden_size, use_orthogonal, use_ReLU, k...
method forward (line 40) | def forward(self, x):
class CNNBase (line 46) | class CNNBase(nn.Module):
method __init__ (line 47) | def __init__(self, args, obs_shape):
method forward (line 56) | def forward(self, x):
FILE: MAPPO-Lagrangian/mappo_lagrangian/algorithms/utils/distributions.py
class FixedCategorical (line 14) | class FixedCategorical(torch.distributions.Categorical):
method sample (line 15) | def sample(self):
method log_probs (line 18) | def log_probs(self, actions):
method mode (line 27) | def mode(self):
class FixedNormal (line 32) | class FixedNormal(torch.distributions.Normal):
method log_probs (line 33) | def log_probs(self, actions):
method entrop (line 37) | def entrop(self):
method mode (line 40) | def mode(self):
class FixedBernoulli (line 45) | class FixedBernoulli(torch.distributions.Bernoulli):
method log_probs (line 46) | def log_probs(self, actions):
method entropy (line 49) | def entropy(self):
method mode (line 52) | def mode(self):
class Categorical (line 56) | class Categorical(nn.Module):
method __init__ (line 57) | def __init__(self, num_inputs, num_outputs, use_orthogonal=True, gain=...
method forward (line 65) | def forward(self, x, available_actions=None):
class DiagGaussian (line 94) | class DiagGaussian(nn.Module):
method __init__ (line 95) | def __init__(self, num_inputs, num_outputs, use_orthogonal=True, gain=...
method forward (line 113) | def forward(self, x, available_actions=None):
class Bernoulli (line 118) | class Bernoulli(nn.Module):
method __init__ (line 119) | def __init__(self, num_inputs, num_outputs, use_orthogonal=True, gain=...
method forward (line 127) | def forward(self, x):
class AddBias (line 131) | class AddBias(nn.Module):
method __init__ (line 132) | def __init__(self, bias):
method forward (line 136) | def forward(self, x):
FILE: MAPPO-Lagrangian/mappo_lagrangian/algorithms/utils/mlp.py
class MLPLayer (line 6) | class MLPLayer(nn.Module):
method __init__ (line 7) | def __init__(self, input_dim, hidden_size, layer_N, use_orthogonal, us...
method forward (line 24) | def forward(self, x):
class MLPBase (line 31) | class MLPBase(nn.Module):
method __init__ (line 32) | def __init__(self, args, obs_shape, cat_self=True, attn_internal=False):
method forward (line 50) | def forward(self, x):
FILE: MAPPO-Lagrangian/mappo_lagrangian/algorithms/utils/rnn.py
class RNNLayer (line 7) | class RNNLayer(nn.Module):
method __init__ (line 8) | def __init__(self, inputs_dim, outputs_dim, recurrent_N, use_orthogonal):
method forward (line 24) | def forward(self, x, hxs, masks):
FILE: MAPPO-Lagrangian/mappo_lagrangian/algorithms/utils/util.py
function init (line 7) | def init(module, weight_init, bias_init, gain=1):
function get_clones (line 12) | def get_clones(module, N):
function check (line 15) | def check(input):
FILE: MAPPO-Lagrangian/mappo_lagrangian/config.py
function get_config (line 4) | def get_config():
FILE: MAPPO-Lagrangian/mappo_lagrangian/envs/env_wrappers.py
class CloudpickleWrapper (line 10) | class CloudpickleWrapper(object):
method __init__ (line 15) | def __init__(self, x):
method __getstate__ (line 18) | def __getstate__(self):
method __setstate__ (line 22) | def __setstate__(self, ob):
class ShareVecEnv (line 27) | class ShareVecEnv(ABC):
method __init__ (line 41) | def __init__(self, num_envs, observation_space, share_observation_spac...
method reset (line 48) | def reset(self):
method step_async (line 60) | def step_async(self, actions):
method step_wait (line 72) | def step_wait(self):
method close_extras (line 86) | def close_extras(self):
method close (line 93) | def close(self):
method step (line 101) | def step(self, actions):
method render (line 110) | def render(self, mode='human'):
method get_images (line 121) | def get_images(self):
method unwrapped (line 128) | def unwrapped(self):
method get_viewer (line 134) | def get_viewer(self):
function worker (line 141) | def worker(remote, parent_remote, env_fn_wrapper):
class GuardSubprocVecEnv (line 178) | class GuardSubprocVecEnv(ShareVecEnv):
method __init__ (line 179) | def __init__(self, env_fns, spaces=None):
method step_async (line 200) | def step_async(self, actions):
method step_wait (line 206) | def step_wait(self):
method reset (line 212) | def reset(self):
method reset_task (line 218) | def reset_task(self):
method close (line 223) | def close(self):
class SubprocVecEnv (line 236) | class SubprocVecEnv(ShareVecEnv):
method __init__ (line 237) | def __init__(self, env_fns, spaces=None):
method step_async (line 258) | def step_async(self, actions):
method step_wait (line 263) | def step_wait(self):
method reset (line 269) | def reset(self):
method reset_task (line 276) | def reset_task(self):
method close (line 281) | def close(self):
method render (line 293) | def render(self, mode="rgb_array"):
function shareworker (line 301) | def shareworker(remote, parent_remote, env_fn_wrapper):
class ShareSubprocVecEnv (line 344) | class ShareSubprocVecEnv(ShareVecEnv):
method __init__ (line 345) | def __init__(self, env_fns, spaces=None):
method step_async (line 369) | def step_async(self, actions):
method step_wait (line 374) | def step_wait(self):
method reset (line 384) | def reset(self):
method reset_task (line 391) | def reset_task(self):
method close (line 396) | def close(self):
function choosesimpleworker (line 409) | def choosesimpleworker(remote, parent_remote, env_fn_wrapper):
class ChooseSimpleSubprocVecEnv (line 440) | class ChooseSimpleSubprocVecEnv(ShareVecEnv):
method __init__ (line 441) | def __init__(self, env_fns, spaces=None):
method step_async (line 461) | def step_async(self, actions):
method step_wait (line 466) | def step_wait(self):
method reset (line 472) | def reset(self, reset_choose):
method render (line 478) | def render(self, mode="rgb_array"):
method reset_task (line 485) | def reset_task(self):
method close (line 490) | def close(self):
function chooseworker (line 503) | def chooseworker(remote, parent_remote, env_fn_wrapper):
class ChooseSubprocVecEnv (line 530) | class ChooseSubprocVecEnv(ShareVecEnv):
method __init__ (line 531) | def __init__(self, env_fns, spaces=None):
method step_async (line 552) | def step_async(self, actions):
method step_wait (line 557) | def step_wait(self):
method reset (line 563) | def reset(self, reset_choose):
method reset_task (line 570) | def reset_task(self):
method close (line 575) | def close(self):
function chooseguardworker (line 588) | def chooseguardworker(remote, parent_remote, env_fn_wrapper):
class ChooseGuardSubprocVecEnv (line 613) | class ChooseGuardSubprocVecEnv(ShareVecEnv):
method __init__ (line 614) | def __init__(self, env_fns, spaces=None):
method step_async (line 635) | def step_async(self, actions):
method step_wait (line 640) | def step_wait(self):
method reset (line 646) | def reset(self, reset_choose):
method reset_task (line 652) | def reset_task(self):
method close (line 657) | def close(self):
class DummyVecEnv (line 671) | class DummyVecEnv(ShareVecEnv):
method __init__ (line 672) | def __init__(self, env_fns):
method step_async (line 679) | def step_async(self, actions):
method step_wait (line 682) | def step_wait(self):
method reset (line 697) | def reset(self):
method close (line 701) | def close(self):
method render (line 705) | def render(self, mode="human"):
class ShareDummyVecEnv (line 716) | class ShareDummyVecEnv(ShareVecEnv):
method __init__ (line 717) | def __init__(self, env_fns):
method step_async (line 724) | def step_async(self, actions):
method step_wait (line 727) | def step_wait(self):
method reset (line 743) | def reset(self):
method close (line 748) | def close(self):
method render (line 752) | def render(self, mode="human"):
class ChooseDummyVecEnv (line 762) | class ChooseDummyVecEnv(ShareVecEnv):
method __init__ (line 763) | def __init__(self, env_fns):
method step_async (line 770) | def step_async(self, actions):
method step_wait (line 773) | def step_wait(self):
method reset (line 780) | def reset(self, reset_choose):
method close (line 786) | def close(self):
method render (line 790) | def render(self, mode="human"):
class ChooseSimpleDummyVecEnv (line 799) | class ChooseSimpleDummyVecEnv(ShareVecEnv):
method __init__ (line 800) | def __init__(self, env_fns):
method step_async (line 807) | def step_async(self, actions):
method step_wait (line 810) | def step_wait(self):
method reset (line 816) | def reset(self, reset_choose):
method close (line 821) | def close(self):
method render (line 825) | def render(self, mode="human"):
FILE: MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/ant.py
class AntEnv (line 8) | class AntEnv(mujoco_env.MujocoEnv, utils.EzPickle):
method __init__ (line 9) | def __init__(self, **kwargs):
method step (line 13) | def step(self, a):
method _get_obs (line 61) | def _get_obs(self):
method reset_model (line 81) | def reset_model(self):
method viewer_setup (line 89) | def viewer_setup(self):
FILE: MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/coupled_half_cheetah.py
class CoupledHalfCheetah (line 9) | class CoupledHalfCheetah(mujoco_env.MujocoEnv, utils.EzPickle):
method __init__ (line 10) | def __init__(self, **kwargs):
method step (line 14) | def step(self, action):
method _get_obs (line 117) | def _get_obs(self):
method reset_model (line 137) | def reset_model(self):
method viewer_setup (line 143) | def viewer_setup(self):
method get_env_info (line 146) | def get_env_info(self):
method _set_action_space (line 149) | def _set_action_space(self):
FILE: MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/half_cheetah.py
class HalfCheetahEnv (line 10) | class HalfCheetahEnv(mujoco_env.MujocoEnv, utils.EzPickle):
method __init__ (line 11) | def __init__(self, **kwargs):
method step (line 16) | def step(self, action):
method _get_obs (line 48) | def _get_obs(self):
method reset_model (line 61) | def reset_model(self):
method viewer_setup (line 67) | def viewer_setup(self):
method _set_action_space (line 70) | def _set_action_space(self):
FILE: MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/hopper.py
class HopperEnv (line 7) | class HopperEnv(mujoco_env.MujocoEnv, utils.EzPickle):
method __init__ (line 8) | def __init__(self, **kwargs):
method step (line 13) | def step(self, a):
method _get_obs (line 50) | def _get_obs(self):
method reset_model (line 63) | def reset_model(self):
method last_mocap_x (line 69) | def last_mocap_x(self):
method viewer_setup (line 72) | def viewer_setup(self):
FILE: MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/humanoid.py
function mass_center (line 8) | def mass_center(model, sim):
class HumanoidEnv (line 14) | class HumanoidEnv(mujoco_env.MujocoEnv, utils.EzPickle):
method __init__ (line 15) | def __init__(self, **kwargs):
method _get_obs (line 19) | def _get_obs(self):
method step (line 44) | def step(self, a):
method reset_model (line 85) | def reset_model(self):
method viewer_setup (line 99) | def viewer_setup(self):
FILE: MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/manyagent_ant.py
class ManyAgentAntEnv (line 10) | class ManyAgentAntEnv(mujoco_env.MujocoEnv, utils.EzPickle):
method __init__ (line 11) | def __init__(self, **kwargs):
method _generate_asset (line 33) | def _generate_asset(self, n_segs, asset_path):
method step (line 86) | def step(self, a):
method _get_obs (line 154) | def _get_obs(self):
method reset_model (line 186) | def reset_model(self):
method viewer_setup (line 194) | def viewer_setup(self):
FILE: MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/manyagent_swimmer.py
class ManyAgentSwimmerEnv (line 9) | class ManyAgentSwimmerEnv(mujoco_env.MujocoEnv, utils.EzPickle):
method __init__ (line 10) | def __init__(self, **kwargs):
method _generate_asset (line 30) | def _generate_asset(self, n_segs, asset_path):
method step (line 66) | def step(self, a):
method _get_obs (line 123) | def _get_obs(self):
method reset_model (line 144) | def reset_model(self):
FILE: MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/mujoco_env.py
function convert_observation_to_space (line 19) | def convert_observation_to_space(observation):
class MujocoEnv (line 35) | class MujocoEnv(gym.Env):
method __init__ (line 39) | def __init__(self, model_path, frame_skip):
method _set_action_space (line 71) | def _set_action_space(self):
method _set_observation_space (line 77) | def _set_observation_space(self, observation):
method seed (line 81) | def seed(self, seed=None):
method reset_model (line 88) | def reset_model(self):
method viewer_setup (line 95) | def viewer_setup(self):
method reset (line 105) | def reset(self):
method set_state (line 110) | def set_state(self, qpos, qvel):
method dt (line 119) | def dt(self):
method do_simulation (line 122) | def do_simulation(self, ctrl, n_frames):
method render (line 127) | def render(self,
method close (line 160) | def close(self):
method _get_viewer (line 166) | def _get_viewer(self, mode):
method get_body_com (line 178) | def get_body_com(self, body_name):
method state_vector (line 181) | def state_vector(self):
method place_random_objects (line 187) | def place_random_objects(self):
FILE: MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/mujoco_multi.py
function env_fn (line 13) | def env_fn(env, **kwargs) -> MultiAgentEnv: # TODO: this may be a more ...
class NormalizedActions (line 26) | class NormalizedActions(gym.ActionWrapper):
method _action (line 28) | def _action(self, action):
method action (line 34) | def action(self, action_):
method _reverse_action (line 37) | def _reverse_action(self, action):
class MujocoMulti (line 44) | class MujocoMulti(MultiAgentEnv):
method __init__ (line 46) | def __init__(self, batch_size=None, **kwargs):
method step (line 176) | def step(self, actions):
method get_obs (line 205) | def get_obs(self):
method get_obs_agent (line 220) | def get_obs_agent(self, agent_id):
method get_obs_size (line 236) | def get_obs_size(self):
method get_state (line 244) | def get_state(self, team=None):
method get_state_size (line 257) | def get_state_size(self):
method get_avail_actions (line 261) | def get_avail_actions(self): # all actions are always available
method get_avail_agent_actions (line 264) | def get_avail_agent_actions(self, agent_id):
method get_total_actions (line 268) | def get_total_actions(self):
method get_stats (line 273) | def get_stats(self):
method get_agg_stats (line 277) | def get_agg_stats(self, stats):
method reset (line 280) | def reset(self, **kwargs):
method render (line 286) | def render(self, **kwargs):
method close (line 289) | def close(self):
method seed (line 292) | def seed(self, args):
method get_env_info (line 295) | def get_env_info(self):
FILE: MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/multiagentenv.py
function convert (line 5) | def convert(dictionary):
class MultiAgentEnv (line 8) | class MultiAgentEnv(object):
method __init__ (line 10) | def __init__(self, batch_size=None, **kwargs):
method step (line 21) | def step(self, actions):
method get_obs (line 25) | def get_obs(self):
method get_obs_agent (line 29) | def get_obs_agent(self, agent_id):
method get_obs_size (line 33) | def get_obs_size(self):
method get_state (line 37) | def get_state(self):
method get_state_size (line 40) | def get_state_size(self):
method get_avail_actions (line 44) | def get_avail_actions(self):
method get_avail_agent_actions (line 47) | def get_avail_agent_actions(self, agent_id):
method get_total_actions (line 51) | def get_total_actions(self):
method get_stats (line 56) | def get_stats(self):
method get_agg_stats (line 60) | def get_agg_stats(self, stats):
method reset (line 63) | def reset(self):
method render (line 67) | def render(self):
method close (line 70) | def close(self):
method seed (line 73) | def seed(self, seed):
method get_env_info (line 76) | def get_env_info(self):
FILE: MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/obsk.py
class Node (line 5) | class Node():
method __init__ (line 6) | def __init__(self, label, qpos_ids, qvel_ids, act_ids, body_fn=None, b...
method __str__ (line 17) | def __str__(self):
method __repr__ (line 20) | def __repr__(self):
class HyperEdge (line 24) | class HyperEdge():
method __init__ (line 25) | def __init__(self, *edges):
method __contains__ (line 28) | def __contains__(self, item):
method __str__ (line 31) | def __str__(self):
method __repr__ (line 34) | def __repr__(self):
function get_joints_at_kdist (line 38) | def get_joints_at_kdist(agent_id, agent_partitions, hyperedges, k=0, kag...
function build_obs (line 74) | def build_obs(env, k_dict, k_categories, global_dict, global_categories,...
function build_actions (line 141) | def build_actions(agent_partitions, k_dict):
function get_parts_and_edges (line 146) | def get_parts_and_edges(label, partitioning):
FILE: MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/test.py
function main (line 6) | def main():
FILE: MAPPO-Lagrangian/mappo_lagrangian/runner/separated/base_runner.py
function _t2n (line 13) | def _t2n(x):
class Runner (line 16) | class Runner(object):
method __init__ (line 17) | def __init__(self, config):
method run (line 105) | def run(self):
method warmup (line 108) | def warmup(self):
method collect (line 111) | def collect(self, step):
method insert (line 114) | def insert(self, data):
method compute (line 118) | def compute(self):
method train (line 127) | def train(self):
method save (line 139) | def save(self):
method restore (line 150) | def restore(self):
method log_train (line 161) | def log_train(self, train_infos, total_num_steps):
method log_env (line 170) | def log_env(self, env_infos, total_num_steps):
FILE: MAPPO-Lagrangian/mappo_lagrangian/runner/separated/base_runner_mappo_lagr.py
function _t2n (line 14) | def _t2n(x):
class Runner (line 18) | class Runner(object):
method __init__ (line 19) | def __init__(self, config):
method run (line 113) | def run(self):
method warmup (line 116) | def warmup(self):
method collect (line 119) | def collect(self, step):
method insert (line 122) | def insert(self, data):
method compute (line 126) | def compute(self):
method train (line 141) | def train(self):
method buffer_filter (line 185) | def buffer_filter(self, agent_id):
method remove_episodes (line 214) | def remove_episodes(self, agent_id, del_ids):
method save (line 241) | def save(self):
method restore (line 252) | def restore(self):
method log_train (line 263) | def log_train(self, train_infos, total_num_steps):
method log_env (line 272) | def log_env(self, env_infos, total_num_steps):
FILE: MAPPO-Lagrangian/mappo_lagrangian/runner/separated/mujoco_runner.py
function _t2n (line 9) | def _t2n(x):
class MujocoRunner (line 13) | class MujocoRunner(Runner):
method __init__ (line 16) | def __init__(self, config):
method run (line 19) | def run(self):
method warmup (line 90) | def warmup(self):
method collect (line 102) | def collect(self, step):
method insert (line 130) | def insert(self, data):
method log_train (line 158) | def log_train(self, train_infos, total_num_steps):
method eval (line 170) | def eval(self, total_num_steps):
FILE: MAPPO-Lagrangian/mappo_lagrangian/runner/separated/mujoco_runner_mappo_lagr.py
function _t2n (line 11) | def _t2n(x):
class MujocoRunner (line 15) | class MujocoRunner(Runner):
method __init__ (line 18) | def __init__(self, config):
method run (line 21) | def run(self):
method return_aver_cost (line 105) | def return_aver_cost(self, aver_episode_costs):
method warmup (line 109) | def warmup(self):
method collect (line 122) | def collect(self, step):
method insert (line 162) | def insert(self, data):
method log_train (line 195) | def log_train(self, train_infos, total_num_steps):
method eval (line 208) | def eval(self, total_num_steps):
FILE: MAPPO-Lagrangian/mappo_lagrangian/scripts/eval/eval_hanabi.py
function make_train_env (line 18) | def make_train_env(all_args):
function make_eval_env (line 38) | def make_eval_env(all_args):
function parse_args (line 59) | def parse_args(args, parser):
function main (line 70) | def main(args):
FILE: MAPPO-Lagrangian/mappo_lagrangian/scripts/train/train_mujoco.py
function make_train_env (line 23) | def make_train_env(all_args):
function make_eval_env (line 46) | def make_eval_env(all_args):
function parse_args (line 69) | def parse_args(args, parser):
function main (line 92) | def main(args):
FILE: MAPPO-Lagrangian/mappo_lagrangian/utils/multi_discrete.py
class MultiDiscrete (line 6) | class MultiDiscrete(gym.Space):
method __init__ (line 22) | def __init__(self, array_of_param_array):
method sample (line 28) | def sample(self):
method contains (line 34) | def contains(self, x):
method shape (line 38) | def shape(self):
method __repr__ (line 41) | def __repr__(self):
method __eq__ (line 44) | def __eq__(self, other):
FILE: MAPPO-Lagrangian/mappo_lagrangian/utils/popart.py
class PopArt (line 8) | class PopArt(nn.Module):
method __init__ (line 11) | def __init__(self, input_shape, norm_axes=1, beta=0.99999, per_element...
method reset_parameters (line 25) | def reset_parameters(self):
method running_mean_var (line 30) | def running_mean_var(self):
method forward (line 36) | def forward(self, input_vector, train=True):
method denormalize (line 64) | def denormalize(self, input_vector):
FILE: MAPPO-Lagrangian/mappo_lagrangian/utils/separated_buffer.py
function _flatten (line 8) | def _flatten(T, N, x):
function _cast (line 12) | def _cast(x):
class SeparatedReplayBuffer (line 16) | class SeparatedReplayBuffer(object):
method __init__ (line 17) | def __init__(self, args, obs_space, share_obs_space, act_space):
method update_factor (line 74) | def update_factor(self, factor):
method return_aver_insert (line 77) | def return_aver_insert(self, aver_episode_costs):
method insert (line 80) | def insert(self, share_obs, obs, rnn_states, rnn_states_critic, action...
method chooseinsert (line 107) | def chooseinsert(self, share_obs, obs, rnn_states, rnn_states_critic, ...
method after_update (line 127) | def after_update(self):
method chooseafter_update (line 139) | def chooseafter_update(self):
method compute_returns (line 145) | def compute_returns(self, next_value, value_normalizer=None):
method compute_cost_returns (line 192) | def compute_cost_returns(self, next_cost, value_normalizer=None):
method feed_forward_generator (line 236) | def feed_forward_generator(self, advantages, num_mini_batch=None, mini...
method naive_recurrent_generator (line 313) | def naive_recurrent_generator(self, advantages, num_mini_batch, cost_a...
FILE: MAPPO-Lagrangian/mappo_lagrangian/utils/shared_buffer.py
function _flatten (line 6) | def _flatten(T, N, x):
function _cast (line 10) | def _cast(x):
class SharedReplayBuffer (line 14) | class SharedReplayBuffer(object):
method __init__ (line 24) | def __init__(self, args, num_agents, obs_space, cent_obs_space, act_sp...
method insert (line 80) | def insert(self, share_obs, obs, rnn_states_actor, rnn_states_critic, ...
method chooseinsert (line 115) | def chooseinsert(self, share_obs, obs, rnn_states, rnn_states_critic, ...
method after_update (line 150) | def after_update(self):
method chooseafter_update (line 162) | def chooseafter_update(self):
method compute_returns (line 169) | def compute_returns(self, next_value, value_normalizer=None):
method feed_forward_generator (line 195) | def feed_forward_generator(self, advantages, num_mini_batch=None, mini...
method naive_recurrent_generator (line 257) | def naive_recurrent_generator(self, advantages, num_mini_batch):
method recurrent_generator (line 354) | def recurrent_generator(self, advantages, num_mini_batch, data_chunk_l...
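recurrent_generator's data_chunk_length argument is the usual trick for training RNN policies: cut each environment's T-step rollout into fixed-length chunks so backpropagation through time spans only a short window, then shuffle chunks into minibatches. A shape-level sketch of that chunking with purely illustrative sizes (the repository's generator also carries states, masks, and advantages alongside):

import numpy as np

def make_chunks(data, L):
    """data: (T, N, feat) with T divisible by L  ->  (N * T // L, L, feat)."""
    T, N, feat = data.shape
    x = data.transpose(1, 0, 2)              # (N, T, feat): time contiguous per env
    return x.reshape(N * (T // L), L, feat)  # each row: one L-step window

T, N, feat, L = 8, 2, 3, 4
rollout = np.arange(T * N * feat, dtype=np.float32).reshape(T, N, feat)
chunks = make_chunks(rollout, L)
perm = np.random.permutation(len(chunks))
for idx in np.array_split(perm, 2):          # num_mini_batch = 2
    minibatch = chunks[idx]                  # (batch, L, feat) for the RNN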
FILE: MAPPO-Lagrangian/mappo_lagrangian/utils/util.py
function check (line 5) | def check(input):
function get_gard_norm (line 9) | def get_gard_norm(it):
function update_linear_schedule (line 17) | def update_linear_schedule(optimizer, epoch, total_num_epochs, initial_lr):
function huber_loss (line 23) | def huber_loss(e, d):
function mse_loss (line 28) | def mse_loss(e):
function get_shape_from_obs_space (line 31) | def get_shape_from_obs_space(obs_space):
function get_shape_from_act_space (line 40) | def get_shape_from_act_space(act_space):
function tile_images (line 54) | def tile_images(img_nhwc):
FILE: MAPPO-Lagrangian/setup.py
function get_version (line 8) | def get_version() -> str:
Condensed preview: 141 files, each showing path, character count, and a content snippet (full structured content: 871K chars).
[
{
"path": "LICENSE",
"chars": 1212,
"preview": "MIT License\n\n<<<<<<< HEAD\nCopyright (c) 2021 anybodyany\n=======\nCopyright (c) 2020 Tianshou contributors\n>>>>>>> upload "
},
{
"path": "MACPO/.gitignore",
"chars": 23,
"preview": "/.idea/\n*/__pycache__/\n"
},
{
"path": "MACPO/environment.yaml",
"chars": 5127,
"preview": "name: marl\nchannels:\n - defaults\ndependencies:\n - _libgcc_mutex=0.1=main\n - _tflow_select=2.1.0=gpu\n - absl-py=0.9.0"
},
{
"path": "MACPO/macpo/__init__.py",
"chars": 191,
"preview": "from macpo import algorithms, envs, runner, scripts, utils, config\n\n\n__version__ = \"0.1.0\"\n\n__all__ = [\n \"algorithms\""
},
{
"path": "MACPO/macpo/algorithms/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "MACPO/macpo/algorithms/r_mappo/__init__.py",
"chars": 39,
"preview": "def cost_trpo_macppo():\n return None"
},
{
"path": "MACPO/macpo/algorithms/r_mappo/algorithm/MACPPOPolicy.py",
"chars": 9806,
"preview": "import torch\nfrom macpo.algorithms.r_mappo.algorithm.r_actor_critic import R_Actor, R_Critic\nfrom macpo.utils.util impor"
},
{
"path": "MACPO/macpo/algorithms/r_mappo/algorithm/rMAPPOPolicy.py",
"chars": 7264,
"preview": "import torch\nfrom macpo.algorithms.r_mappo.algorithm.r_actor_critic import R_Actor, R_Critic\nfrom macpo.utils.util impor"
},
{
"path": "MACPO/macpo/algorithms/r_mappo/algorithm/r_actor_critic.py",
"chars": 8972,
"preview": "import torch\nimport torch.nn as nn\nfrom macpo.algorithms.utils.util import init, check\nfrom macpo.algorithms.utils.cnn i"
},
{
"path": "MACPO/macpo/algorithms/r_mappo/r_macpo.py",
"chars": 31855,
"preview": "import numpy as np\nimport torch\nimport torch.nn as nn\nfrom macpo.utils.util import get_gard_norm, huber_loss, mse_loss\nf"
},
{
"path": "MACPO/macpo/algorithms/utils/act.py",
"chars": 11516,
"preview": "from .distributions import Bernoulli, Categorical, DiagGaussian\nimport torch\nimport torch.nn as nn\n\nclass ACTLayer(nn.Mo"
},
{
"path": "MACPO/macpo/algorithms/utils/cnn.py",
"chars": 1852,
"preview": "import torch.nn as nn\nfrom .util import init\n\n\"\"\"CNN Modules and utils.\"\"\"\n\nclass Flatten(nn.Module):\n def forward(se"
},
{
"path": "MACPO/macpo/algorithms/utils/distributions.py",
"chars": 4962,
"preview": "import torch\nimport torch.nn as nn\nfrom .util import init\n\n\"\"\"\nModify standard PyTorch distributions so they to make com"
},
{
"path": "MACPO/macpo/algorithms/utils/mlp.py",
"chars": 2087,
"preview": "import torch.nn as nn\nfrom .util import init, get_clones\n\n\"\"\"MLP modules.\"\"\"\n\nclass MLPLayer(nn.Module):\n def __init_"
},
{
"path": "MACPO/macpo/algorithms/utils/rnn.py",
"chars": 2849,
"preview": "import torch\nimport torch.nn as nn\n\n\"\"\"RNN modules.\"\"\"\n\n\nclass RNNLayer(nn.Module):\n def __init__(self, inputs_dim, o"
},
{
"path": "MACPO/macpo/algorithms/utils/util.py",
"chars": 425,
"preview": "import copy\nimport numpy as np\n\nimport torch\nimport torch.nn as nn\n\ndef init(module, weight_init, bias_init, gain=1):\n "
},
{
"path": "MACPO/macpo/config.py",
"chars": 17774,
"preview": "import argparse\n\n\ndef get_config():\n \"\"\"\n The configuration parser for common hyperparameters of all environment. "
},
{
"path": "MACPO/macpo/envs/__init__.py",
"chars": 90,
"preview": "\r\nimport socket\r\nfrom absl import flags\r\nFLAGS = flags.FLAGS\r\nFLAGS(['train_sc.py'])\r\n\r\n\r\n"
},
{
"path": "MACPO/macpo/envs/env_wrappers.py",
"chars": 28871,
"preview": "\"\"\"\nModified from OpenAI Baselines code to work with multi-agent envs\n\"\"\"\nimport numpy as np\nimport torch\nfrom multiproc"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/MUJOCO_LOG.TXT",
"chars": 56,
"preview": "Sun Aug 29 11:16:41 2021\nERROR: Expired activation key\n\n"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/README.md",
"chars": 3109,
"preview": "#### Safety Multi-agent Mujoco \n\n\n## 1. Sate Many Agent Ant\n\nAccording to Zanger's work, \n\nThe reward function is equal "
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/__init__.py",
"chars": 185,
"preview": "from .mujoco_multi import MujocoMulti\nfrom .coupled_half_cheetah import CoupledHalfCheetah\nfrom .manyagent_swimmer impor"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/ant.py",
"chars": 3637,
"preview": "import numpy as np\n# from mujoco_safety_gym.envs import mujoco_env\nfrom macpo.envs.safety_ma_mujoco.safety_multiagent_mu"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/.gitignore",
"chars": 11,
"preview": "*.auto.xml\n"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/ant.xml",
"chars": 9216,
"preview": "<mujoco model=\"ant\">\n <compiler angle=\"degree\" coordinate=\"local\" inertiafromgeom=\"true\"/>\n <option integrator=\"RK4\" t"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/coupled_half_cheetah.xml",
"chars": 11755,
"preview": "<!-- Cheetah Model\n The state space is populated with joints in the order that they are\n defined in this file. The"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/half_cheetah.xml",
"chars": 7847,
"preview": "<!-- Cheetah Model\n The state space is populated with joints in the order that they are\n defined in this file. The"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/hopper.xml",
"chars": 4326,
"preview": "<mujoco model=\"hopper\">\n <compiler angle=\"degree\" coordinate=\"global\" inertiafromgeom=\"true\"/>\n <default>\n <joint a"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/humanoid.xml",
"chars": 10177,
"preview": "<mujoco model=\"humanoid\">\n <compiler angle=\"degree\" inertiafromgeom=\"true\"/>\n <default>\n <joint armature=\"1"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/manyagent_ant.xml",
"chars": 10356,
"preview": "<mujoco model=\"ant\">\n <size nconmax=\"200\"/>\n <compiler angle=\"degree\" coordinate=\"local\" inertiafromgeom=\"true\"/>\n <o"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/manyagent_ant.xml.template",
"chars": 3739,
"preview": "<mujoco model=\"ant\">\n <size nconmax=\"200\"/>\n <compiler angle=\"degree\" coordinate=\"local\" inertiafromgeom=\"true\"/>\n <o"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/manyagent_ant__stage1.xml",
"chars": 5215,
"preview": "<mujoco model=\"ant\">\n <compiler angle=\"degree\" coordinate=\"local\" inertiafromgeom=\"true\"/>\n <option integrator=\"RK4\" t"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/manyagent_swimmer.xml.template",
"chars": 2170,
"preview": "<mujoco model=\"swimmer\">\n <compiler angle=\"degree\" coordinate=\"local\" inertiafromgeom=\"true\"/>\n <option collision=\"pre"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/manyagent_swimmer__bckp2.xml",
"chars": 2900,
"preview": "<mujoco model=\"swimmer\">\n <compiler angle=\"degree\" coordinate=\"local\" inertiafromgeom=\"true\"/>\n <option collision=\"pre"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/manyagent_swimmer_bckp.xml",
"chars": 2570,
"preview": "<mujoco model=\"swimmer\">\n <compiler angle=\"degree\" coordinate=\"local\" inertiafromgeom=\"true\"/>\n <option collision=\"pre"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/coupled_half_cheetah.py",
"chars": 6656,
"preview": "import numpy as np\nfrom gym import utils\nfrom gym.envs.mujoco import mujoco_env\nfrom macpo.envs.safety_ma_mujoco.safety_"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/half_cheetah.py",
"chars": 2889,
"preview": "import numpy as np\nfrom gym import utils\n# from mujoco_safety_gym.envs import mujoco_env\n# from gym.envs.mujoco import m"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/hopper.py",
"chars": 2753,
"preview": "import numpy as np\nfrom macpo.envs.safety_ma_mujoco.safety_multiagent_mujoco import mujoco_env\nfrom gym import utils\nimp"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/humanoid.py",
"chars": 4485,
"preview": "import numpy as np\n# from mujoco_safety_gym.envs import mujoco_env\nfrom macpo.envs.safety_ma_mujoco.safety_multiagent_mu"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/manyagent_ant.py",
"chars": 8801,
"preview": "import numpy as np\nfrom gym import utils\nfrom gym.envs.mujoco import mujoco_env\nfrom jinja2 import Template\n\nimport mujo"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/manyagent_swimmer.py",
"chars": 6226,
"preview": "import numpy as np\nfrom gym import utils\nfrom gym.envs.mujoco import mujoco_env\n\nimport os\nfrom jinja2 import Template\ni"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/mujoco_env.py",
"chars": 6668,
"preview": "from collections import OrderedDict\nimport os\n\n\nfrom gym import error, spaces\nfrom gym.utils import seeding\nimport numpy"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/mujoco_multi.py",
"chars": 14130,
"preview": "from functools import partial\nimport gym\nfrom gym.spaces import Box\nfrom gym.wrappers import TimeLimit\nimport numpy as n"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/multiagentenv.py",
"chars": 2411,
"preview": "from collections import namedtuple\nimport numpy as np\n\n\ndef convert(dictionary):\n return namedtuple('GenericDict', di"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/safety_multiagent_mujoco/obsk.py",
"chars": 23708,
"preview": "import itertools\nimport numpy as np\nfrom copy import deepcopy\n\nclass Node():\n def __init__(self, label, qpos_ids, qve"
},
{
"path": "MACPO/macpo/envs/safety_ma_mujoco/test.py",
"chars": 3935,
"preview": "from safety_multiagent_mujoco.mujoco_multi import MujocoMulti\nimport numpy as np\nimport time\n\n\ndef main():\n\n # Swimme"
},
{
"path": "MACPO/macpo/runner/__init__.py",
"chars": 64,
"preview": "from macpo.runner import separated\n\n__all__=[\n\n \"separated\"\n]"
},
{
"path": "MACPO/macpo/runner/separated/__init__.py",
"chars": 77,
"preview": "from macpo.runner.separated import base_runner\n\n__all__=[\n \"base_runner\"\n]"
},
{
"path": "MACPO/macpo/runner/separated/base_runner.py",
"chars": 7703,
"preview": " \nimport time\nimport wandb\nimport os\nimport numpy as np\nfrom itertools import chain\nimport torch\nfrom tensorboardX im"
},
{
"path": "MACPO/macpo/runner/separated/base_runner_macpo.py",
"chars": 14185,
"preview": "import copy\nimport time\nimport wandb\nimport os\nimport numpy as np\nfrom itertools import chain\nimport torch\nfrom tensorbo"
},
{
"path": "MACPO/macpo/runner/separated/mujoco_runner.py",
"chars": 10718,
"preview": "import time\nimport wandb\nimport numpy as np\nfrom functools import reduce\nimport torch\nfrom macpo.runner.separated.base_r"
},
{
"path": "MACPO/macpo/runner/separated/mujoco_runner_macpo.py",
"chars": 14564,
"preview": "import time\nfrom itertools import chain\n\nimport wandb\nimport numpy as np\nfrom functools import reduce\nimport torch\nfrom "
},
{
"path": "MACPO/macpo/scripts/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "MACPO/macpo/scripts/train/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "MACPO/macpo/scripts/train/train_mujoco.py",
"chars": 7253,
"preview": "#!/usr/bin/env python\nimport sys\nimport os\ncurPath = os.path.abspath(__file__)\n\nif len(curPath.split('/'))==1:\n rootP"
},
{
"path": "MACPO/macpo/scripts/train_mujoco.sh",
"chars": 843,
"preview": "#!/bin/sh\nenv=\"mujoco\"\nscenario=\"Ant-v2\"\nagent_conf=\"2x4\"\nagent_obsk=1\nalgo=\"macpo\"\nexp=\"rnn\"\nseed_max=1\n\necho \"env is $"
},
{
"path": "MACPO/macpo/utils/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "MACPO/macpo/utils/multi_discrete.py",
"chars": 2346,
"preview": "import gym\nimport numpy as np\n\n# An old version of OpenAI Gym's multi_discrete.py. (Was getting affected by Gym updates)"
},
{
"path": "MACPO/macpo/utils/popart.py",
"chars": 3106,
"preview": "\nimport numpy as np\n\nimport torch\nimport torch.nn as nn\n\n\nclass PopArt(nn.Module):\n \"\"\" Normalize a vector of observa"
},
{
"path": "MACPO/macpo/utils/separated_buffer.py",
"chars": 32622,
"preview": "import torch\nimport numpy as np\nfrom collections import defaultdict\n\nfrom macpo.utils.util import check, get_shape_from_"
},
{
"path": "MACPO/macpo/utils/util.py",
"chars": 2233,
"preview": "import numpy as np\nimport math\nimport torch\n\ndef check(input):\n if type(input) == np.ndarray:\n return torch.fr"
},
{
"path": "MACPO/macpo.egg-info/PKG-INFO",
"chars": 5959,
"preview": "Metadata-Version: 2.1\nName: macpo\nVersion: 0.1.0\nSummary: macpo algorithms of marlbenchmark\nHome-page: UNKNOWN\nAuthor: m"
},
{
"path": "MACPO/macpo.egg-info/SOURCES.txt",
"chars": 1770,
"preview": "README.md\nsetup.py\nmacpo/__init__.py\nmacpo/config.py\nmacpo.egg-info/PKG-INFO\nmacpo.egg-info/SOURCES.txt\nmacpo.egg-info/d"
},
{
"path": "MACPO/macpo.egg-info/dependency_links.txt",
"chars": 1,
"preview": "\n"
},
{
"path": "MACPO/macpo.egg-info/top_level.txt",
"chars": 6,
"preview": "macpo\n"
},
{
"path": "MACPO/setup.py",
"chars": 1201,
"preview": "#!/usr/bin/env python\n# -*- coding: utf-8 -*-\n\nimport os\nfrom setuptools import setup, find_packages\nimport setuptools\n\n"
},
{
"path": "MAPPO-Lagrangian/.gitignore",
"chars": 23,
"preview": "/.idea/\n*/__pycache__/\n"
},
{
"path": "MAPPO-Lagrangian/environment.yaml",
"chars": 5127,
"preview": "name: marl\nchannels:\n - defaults\ndependencies:\n - _libgcc_mutex=0.1=main\n - _tflow_select=2.1.0=gpu\n - absl-py=0.9.0"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/__init__.py",
"chars": 202,
"preview": "from mappo_lagrangian import algorithms, envs, runner, scripts, utils, config\n\n\n__version__ = \"0.1.0\"\n\n__all__ = [\n \""
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/algorithms/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/algorithms/r_mappo/__init__.py",
"chars": 39,
"preview": "def cost_trpo_macppo():\n return None"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/algorithms/r_mappo/algorithm/MACPPOPolicy.py",
"chars": 8805,
"preview": "import torch\nfrom mappo_lagrangian.algorithms.r_mappo.algorithm.r_actor_critic import R_Actor, R_Critic\nfrom mappo_lagra"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/algorithms/r_mappo/algorithm/rMAPPOPolicy.py",
"chars": 7286,
"preview": "import torch\nfrom mappo_lagrangian.algorithms.r_mappo.algorithm.r_actor_critic import R_Actor, R_Critic\nfrom mappo_lagra"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/algorithms/r_mappo/algorithm/r_actor_critic.py",
"chars": 8038,
"preview": "import torch\nimport torch.nn as nn\nfrom mappo_lagrangian.algorithms.utils.util import init, check\nfrom mappo_lagrangian."
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/algorithms/r_mappo/r_mappo_lagr.py",
"chars": 19221,
"preview": "import numpy as np\nimport torch\nimport torch.nn as nn\nfrom mappo_lagrangian.utils.util import get_gard_norm, huber_loss,"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/algorithms/utils/act.py",
"chars": 8019,
"preview": "from .distributions import Bernoulli, Categorical, DiagGaussian\nimport torch\nimport torch.nn as nn\n\nclass ACTLayer(nn.Mo"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/algorithms/utils/cnn.py",
"chars": 1852,
"preview": "import torch.nn as nn\nfrom .util import init\n\n\"\"\"CNN Modules and utils.\"\"\"\n\nclass Flatten(nn.Module):\n def forward(se"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/algorithms/utils/distributions.py",
"chars": 4540,
"preview": "import torch\nimport torch.nn as nn\nfrom .util import init\n\n\"\"\"\nModify standard PyTorch distributions so they to make com"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/algorithms/utils/mlp.py",
"chars": 1892,
"preview": "import torch.nn as nn\nfrom .util import init, get_clones\n\n\"\"\"MLP modules.\"\"\"\n\nclass MLPLayer(nn.Module):\n def __init_"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/algorithms/utils/rnn.py",
"chars": 2849,
"preview": "import torch\nimport torch.nn as nn\n\n\"\"\"RNN modules.\"\"\"\n\n\nclass RNNLayer(nn.Module):\n def __init__(self, inputs_dim, o"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/algorithms/utils/util.py",
"chars": 425,
"preview": "import copy\nimport numpy as np\n\nimport torch\nimport torch.nn as nn\n\ndef init(module, weight_init, bias_init, gain=1):\n "
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/config.py",
"chars": 16899,
"preview": "import argparse\n\n\ndef get_config():\n \"\"\"\n The configuration parser for common hyperparameters of all environment. "
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/__init__.py",
"chars": 90,
"preview": "\r\nimport socket\r\nfrom absl import flags\r\nFLAGS = flags.FLAGS\r\nFLAGS(['train_sc.py'])\r\n\r\n\r\n"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/env_wrappers.py",
"chars": 28882,
"preview": "\"\"\"\nModified from OpenAI Baselines code to work with multi-agent envs\n\"\"\"\nimport numpy as np\nimport torch\nfrom multiproc"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/MUJOCO_LOG.TXT",
"chars": 56,
"preview": "Sun Aug 29 11:16:41 2021\nERROR: Expired activation key\n\n"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/README.md",
"chars": 3109,
"preview": "#### Safety Multi-agent Mujoco \n\n\n## 1. Sate Many Agent Ant\n\nAccording to Zanger's work, \n\nThe reward function is equal "
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/__init__.py",
"chars": 185,
"preview": "from .mujoco_multi import MujocoMulti\nfrom .coupled_half_cheetah import CoupledHalfCheetah\nfrom .manyagent_swimmer impor"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/ant.py",
"chars": 3648,
"preview": "import numpy as np\n# from mujoco_safety_gym.envs import mujoco_env\nfrom mappo_lagrangian.envs.safety_ma_mujoco.safety_mu"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/.gitignore",
"chars": 11,
"preview": "*.auto.xml\n"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/ant.xml",
"chars": 9216,
"preview": "<mujoco model=\"ant\">\n <compiler angle=\"degree\" coordinate=\"local\" inertiafromgeom=\"true\"/>\n <option integrator=\"RK4\" t"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/beifen_hopper.xml",
"chars": 5045,
"preview": "<mujoco model=\"hopper\">\n <compiler angle=\"degree\" coordinate=\"global\" inertiafromgeom=\"true\"/>\n <default>\n <joint a"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/coupled_half_cheetah.xml",
"chars": 11755,
"preview": "<!-- Cheetah Model\n The state space is populated with joints in the order that they are\n defined in this file. The"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/half_cheetah.xml",
"chars": 7847,
"preview": "<!-- Cheetah Model\n The state space is populated with joints in the order that they are\n defined in this file. The"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/hopper.xml",
"chars": 4326,
"preview": "<mujoco model=\"hopper\">\n <compiler angle=\"degree\" coordinate=\"global\" inertiafromgeom=\"true\"/>\n <default>\n <joint a"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/humanoid.xml",
"chars": 10177,
"preview": "<mujoco model=\"humanoid\">\n <compiler angle=\"degree\" inertiafromgeom=\"true\"/>\n <default>\n <joint armature=\"1"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/manyagent_ant.xml",
"chars": 10356,
"preview": "<mujoco model=\"ant\">\n <size nconmax=\"200\"/>\n <compiler angle=\"degree\" coordinate=\"local\" inertiafromgeom=\"true\"/>\n <o"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/manyagent_ant.xml.template",
"chars": 3739,
"preview": "<mujoco model=\"ant\">\n <size nconmax=\"200\"/>\n <compiler angle=\"degree\" coordinate=\"local\" inertiafromgeom=\"true\"/>\n <o"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/manyagent_ant__stage1.xml",
"chars": 5215,
"preview": "<mujoco model=\"ant\">\n <compiler angle=\"degree\" coordinate=\"local\" inertiafromgeom=\"true\"/>\n <option integrator=\"RK4\" t"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/manyagent_swimmer.xml.template",
"chars": 2170,
"preview": "<mujoco model=\"swimmer\">\n <compiler angle=\"degree\" coordinate=\"local\" inertiafromgeom=\"true\"/>\n <option collision=\"pre"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/manyagent_swimmer__bckp2.xml",
"chars": 2900,
"preview": "<mujoco model=\"swimmer\">\n <compiler angle=\"degree\" coordinate=\"local\" inertiafromgeom=\"true\"/>\n <option collision=\"pre"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/assets/manyagent_swimmer_bckp.xml",
"chars": 2570,
"preview": "<mujoco model=\"swimmer\">\n <compiler angle=\"degree\" coordinate=\"local\" inertiafromgeom=\"true\"/>\n <option collision=\"pre"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/coupled_half_cheetah.py",
"chars": 6667,
"preview": "import numpy as np\nfrom gym import utils\nfrom gym.envs.mujoco import mujoco_env\nfrom mappo_lagrangian.envs.safety_ma_muj"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/half_cheetah.py",
"chars": 2901,
"preview": "import numpy as np\nfrom gym import utils\n# from mujoco_safety_gym.envs import mujoco_env\n# from gym.envs.mujoco import m"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/hopper.py",
"chars": 2764,
"preview": "import numpy as np\nfrom mappo_lagrangian.envs.safety_ma_mujoco.safety_multiagent_mujoco import mujoco_env\nfrom gym impor"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/humanoid.py",
"chars": 4496,
"preview": "import numpy as np\n# from mujoco_safety_gym.envs import mujoco_env\nfrom mappo_lagrangian.envs.safety_ma_mujoco.safety_mu"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/manyagent_ant.py",
"chars": 8704,
"preview": "import numpy as np\nfrom gym import utils\nfrom gym.envs.mujoco import mujoco_env\nfrom jinja2 import Template\n\nimport mujo"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/manyagent_swimmer.py",
"chars": 6226,
"preview": "import numpy as np\nfrom gym import utils\nfrom gym.envs.mujoco import mujoco_env\n\nimport os\nfrom jinja2 import Template\ni"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/mujoco_env.py",
"chars": 6668,
"preview": "from collections import OrderedDict\nimport os\n\n\nfrom gym import error, spaces\nfrom gym.utils import seeding\nimport numpy"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/mujoco_multi.py",
"chars": 14130,
"preview": "from functools import partial\nimport gym\nfrom gym.spaces import Box\nfrom gym.wrappers import TimeLimit\nimport numpy as n"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/multiagentenv.py",
"chars": 2411,
"preview": "from collections import namedtuple\nimport numpy as np\n\n\ndef convert(dictionary):\n return namedtuple('GenericDict', di"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/safety_multiagent_mujoco/obsk.py",
"chars": 23708,
"preview": "import itertools\nimport numpy as np\nfrom copy import deepcopy\n\nclass Node():\n def __init__(self, label, qpos_ids, qve"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/envs/safety_ma_mujoco/test.py",
"chars": 4272,
"preview": "from safety_multiagent_mujoco.mujoco_multi import MujocoMulti\nimport numpy as np\nimport time\n\n\ndef main():\n\n # Swimme"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/runner/__init__.py",
"chars": 74,
"preview": "from mappo_lagrangian.runner import separated\n\n__all__=[\n \"separated\"\n]"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/runner/separated/__init__.py",
"chars": 88,
"preview": "from mappo_lagrangian.runner.separated import base_runner\n\n__all__=[\n \"base_runner\"\n]"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/runner/separated/base_runner.py",
"chars": 7747,
"preview": " \nimport time\nimport wandb\nimport os\nimport numpy as np\nfrom itertools import chain\nimport torch\nfrom tensorboardX im"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/runner/separated/base_runner_mappo_lagr.py",
"chars": 13573,
"preview": "import copy\nimport time\nimport wandb\nimport os\nimport numpy as np\nfrom itertools import chain\nimport torch\nfrom tensorbo"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/runner/separated/mujoco_runner.py",
"chars": 10729,
"preview": "import time\nimport wandb\nimport numpy as np\nfrom functools import reduce\nimport torch\nfrom mappo_lagrangian.runner.separ"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/runner/separated/mujoco_runner_mappo_lagr.py",
"chars": 13370,
"preview": "import time\nfrom itertools import chain\n\nimport wandb\nimport numpy as np\nfrom functools import reduce\nimport torch\nfrom "
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/scripts/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/scripts/eval/eval_hanabi.py",
"chars": 6076,
"preview": "#!/usr/bin/env python\nimport sys\nimport os\nimport wandb\nimport socket\nimport setproctitle\nimport numpy as np\nfrom pathli"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/scripts/train/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/scripts/train/train_mujoco.py",
"chars": 7334,
"preview": "#!/usr/bin/env python\nimport sys\nimport os\ncurPath = os.path.abspath(__file__)\n\nif len(curPath.split('/'))==1:\n rootP"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/scripts/train_mujoco.sh",
"chars": 821,
"preview": "#!/bin/sh\nenv=\"mujoco\"\nscenario=\"Ant-v2\"\nagent_conf=\"2x4\"\nagent_obsk=1\nalgo=\"mappo_lagr\"\nexp=\"rnn\"\nseed_max=1\nseed_=50\n\n"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/utils/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/utils/multi_discrete.py",
"chars": 2346,
"preview": "import gym\nimport numpy as np\n\n# An old version of OpenAI Gym's multi_discrete.py. (Was getting affected by Gym updates)"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/utils/popart.py",
"chars": 3106,
"preview": "\nimport numpy as np\n\nimport torch\nimport torch.nn as nn\n\n\nclass PopArt(nn.Module):\n \"\"\" Normalize a vector of observa"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/utils/separated_buffer.py",
"chars": 23975,
"preview": "import torch\nimport numpy as np\nfrom collections import defaultdict\n\nfrom mappo_lagrangian.utils.util import check, get_"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/utils/shared_buffer.py",
"chars": 24972,
"preview": "import torch\nimport numpy as np\nfrom mappo_lagrangian.utils.util import get_shape_from_obs_space, get_shape_from_act_spa"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian/utils/util.py",
"chars": 2233,
"preview": "import numpy as np\nimport math\nimport torch\n\ndef check(input):\n if type(input) == np.ndarray:\n return torch.fr"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian.egg-info/PKG-INFO",
"chars": 5981,
"preview": "Metadata-Version: 2.1\nName: mappo-lagrangian\nVersion: 0.1.0\nSummary: mappo_lagrangian algorithms of marlbenchmark\nHome-p"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian.egg-info/SOURCES.txt",
"chars": 2223,
"preview": "README.md\nsetup.py\nmappo_lagrangian/__init__.py\nmappo_lagrangian/config.py\nmappo_lagrangian.egg-info/PKG-INFO\nmappo_lagr"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian.egg-info/dependency_links.txt",
"chars": 1,
"preview": "\n"
},
{
"path": "MAPPO-Lagrangian/mappo_lagrangian.egg-info/top_level.txt",
"chars": 17,
"preview": "mappo_lagrangian\n"
},
{
"path": "MAPPO-Lagrangian/setup.py",
"chars": 1234,
"preview": "#!/usr/bin/env python\n# -*- coding: utf-8 -*-\n\nimport os\nfrom setuptools import setup, find_packages\nimport setuptools\n\n"
},
{
"path": "README.md",
"chars": 9346,
"preview": "# Multi-Agent Constrained Policy Optimisation (MACPO)\r\n\r\nThe repository is for the paper: **[Multi-Agent Constrained Pol"
},
{
"path": "environment.yaml",
"chars": 5127,
"preview": "name: marl\nchannels:\n - defaults\ndependencies:\n - _libgcc_mutex=0.1=main\n - _tflow_select=2.1.0=gpu\n - absl-py=0.9.0"
},
{
"path": "requirements.txt",
"chars": 2697,
"preview": "absl-py==0.9.0\naiohttp==3.6.2\naioredis==1.3.1\nastor==0.8.0\nastunparse==1.6.3\nasync-timeout==3.0.1\natari-py==0.2.6\natomic"
}
]
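
Because the condensed preview above is a valid JSON array of {path, chars, preview} records, it can be loaded and queried directly. A minimal sketch, assuming the array has been saved locally as repo_index.json (a hypothetical filename):

import json

with open("repo_index.json") as f:
    index = json.load(f)  # list of {"path", "chars", "preview"} records

# Largest files first, e.g. to decide which ones to read in full.
for entry in sorted(index, key=lambda e: e["chars"], reverse=True)[:5]:
    print(f'{entry["chars"]:>7}  {entry["path"]}')

# All Python sources under one algorithm package.
macpo_sources = [e["path"] for e in index
                 if e["path"].startswith("MACPO/") and e["path"].endswith(".py")]
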
About this extraction
This page contains the full source code of the chauncygu/Multi-Agent-Constrained-Policy-Optimisation GitHub repository as plain text: 141 files (802.8 KB, approximately 231.4k tokens), together with a symbol index of 796 extracted functions, classes, methods, constants, and types.