Repository: Shuijing725/CrowdNav_Prediction_AttnGraph
Branch: main
Commit: 390773137be0
Files: 111
Total size: 603.4 KB
Directory structure:
gitextract_a_pq4yd8/
├── .gitignore
├── CHANGELOG.md
├── LICENSE
├── README.md
├── arguments.py
├── collect_data.py
├── crowd_nav/
│ ├── __init__.py
│ ├── configs/
│ │ ├── __init__.py
│ │ └── config.py
│ └── policy/
│ ├── orca.py
│ ├── policy.py
│ ├── policy_factory.py
│ ├── social_force.py
│ └── srnn.py
├── crowd_sim/
│ ├── README.md
│ ├── __init__.py
│ └── envs/
│ ├── __init__.py
│ ├── crowd_sim.py
│ ├── crowd_sim_pred.py
│ ├── crowd_sim_pred_real_gst.py
│ ├── crowd_sim_var_num.py
│ ├── crowd_sim_var_num_collect.py
│ ├── ros_turtlebot2i_env.py
│ └── utils/
│ ├── __init__.py
│ ├── action.py
│ ├── agent.py
│ ├── human.py
│ ├── info.py
│ ├── recorder.py
│ ├── robot.py
│ └── state.py
├── gst_updated/
│ ├── .gitignore
│ ├── LICENSE
│ ├── README-old.md
│ ├── README.md
│ ├── __init__.py
│ ├── requirements.txt
│ ├── run/
│ │ ├── create_batch_datasets_eth_ucy.sh
│ │ ├── create_batch_datasets_self_eth_ucy.sh
│ │ ├── create_batch_datasets_synth.sh
│ │ ├── download_datasets_models.sh
│ │ ├── download_wrapper_data_model.sh
│ │ └── make_dirs.sh
│ ├── scripts/
│ │ ├── experiments/
│ │ │ ├── draft.py
│ │ │ ├── eval.py
│ │ │ ├── eval_trajnet.py
│ │ │ ├── load_trajnet_test_data.py
│ │ │ ├── pathhack.py
│ │ │ ├── test.py
│ │ │ ├── test_gst.py
│ │ │ └── train.py
│ │ └── wrapper/
│ │ ├── crowd_nav_interface_multi_env_parallel.py
│ │ ├── crowd_nav_interface_multi_env_visualization_test_single_batch.py
│ │ ├── crowd_nav_interface_parallel.py
│ │ └── pathhack.py
│ ├── src/
│ │ ├── gumbel_social_transformer/
│ │ │ ├── edge_selector_ghost.py
│ │ │ ├── edge_selector_no_ghost.py
│ │ │ ├── gumbel_social_transformer.py
│ │ │ ├── mha.py
│ │ │ ├── node_encoder_layer_ghost.py
│ │ │ ├── node_encoder_layer_no_ghost.py
│ │ │ ├── pathhack.py
│ │ │ ├── st_model.py
│ │ │ ├── temperature_scheduler.py
│ │ │ ├── temporal_convolution_net.py
│ │ │ └── utils.py
│ │ ├── mgnn/
│ │ │ ├── batch_trajectories.py
│ │ │ ├── trajectories.py
│ │ │ ├── trajectories_sdd.py
│ │ │ ├── trajectories_trajnet.py
│ │ │ ├── trajectories_trajnet_testset.py
│ │ │ └── utils.py
│ │ └── pec_net/
│ │ ├── config/
│ │ │ └── optimal.yaml
│ │ ├── sdd_trajectories.py
│ │ └── social_utils.py
│ └── tuning/
│ ├── 211130-train_shuijing.sh
│ ├── 211203-eval_shuijing.sh
│ ├── 211203-train_shuijing.sh
│ └── 211209-test_shuijing.sh
├── plot.py
├── requirements.txt
├── rl/
│ ├── .gitignore
│ ├── __init__.py
│ ├── evaluation.py
│ ├── networks/
│ │ ├── __init__.py
│ │ ├── distributions.py
│ │ ├── dummy_vec_env.py
│ │ ├── envs.py
│ │ ├── model.py
│ │ ├── network_utils.py
│ │ ├── selfAttn_srnn_temp_node.py
│ │ ├── shmem_vec_env.py
│ │ ├── srnn_model.py
│ │ └── storage.py
│ ├── ppo/
│ │ ├── __init__.py
│ │ └── ppo.py
│ └── vec_env/
│ ├── __init__.py
│ ├── dummy_vec_env.py
│ ├── envs.py
│ ├── logger.py
│ ├── running_mean_std.py
│ ├── shmem_vec_env.py
│ ├── tile_images.py
│ ├── util.py
│ ├── vec_env.py
│ ├── vec_frame_stack.py
│ ├── vec_normalize.py
│ ├── vec_pretext_normalize.py
│ └── vec_remove_dict_obs.py
├── test.py
└── train.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
gst_datasets/
gst_updated/scripts/data
trained_models/
.idea/
venv/
*.pyc
*__pycache__*
train1.py
test1.py
test2.py
check_env.py
================================================
FILE: CHANGELOG.md
================================================
# Changelog
All notable changes to this project will be documented in this file.
## 2023-03-28
### Changed
- Changed the instructions for PyTorch installation (moved PyTorch out of requirements.txt, and added a link to the official website in README.md)
- Fixed a bug with 'visible_masks' in the generate_ob function in crowd_sim_var_num.py and crowd_sim_pred_real_gst.py, so that the shape of the observation no longer throws errors when config.sim.human_num_range > 0
## 2023-01
Initial commit
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2023 Shuijing Liu
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
# CrowdNav++
This repository contains the code for our ICRA 2023 paper "Intention Aware Robot Crowd Navigation with Attention-Based Interaction Graph".
For more details, please refer to the [project website](https://sites.google.com/view/intention-aware-crowdnav/home) and [arXiv preprint](https://arxiv.org/abs/2203.01821).
For experiment demonstrations, please refer to the [YouTube video](https://www.youtube.com/watch?v=nxpxhF019VA).
**[News]**
- Please check out our most recent follow-up work below:
- [HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments](https://github.com/Shuijing725/CrowdNav_HEIGHT)
- Please check out our open-sourced sim2real tutorial [here](https://github.com/Shuijing725/CrowdNav_Sim2Real_Turtlebot)
- Please check out my curated paper list for robot social navigation [here](https://github.com/Shuijing725/awesome-robot-social-navigation) (It is under active development)
## Abstract
We study the problem of safe and intention-aware robot navigation in dense and interactive crowds.
Most previous reinforcement learning (RL) based methods fail to consider different types of interactions among all agents or ignore the intentions of people, which results in performance degradation.
In this paper, we propose a novel recurrent graph neural network with attention mechanisms to capture heterogeneous interactions among agents through space and time.
To encourage long-sighted robot behaviors, we infer the intentions of dynamic agents by predicting their future trajectories over several timesteps.
The predictions are incorporated into a model-free RL framework to prevent the robot from intruding into the intended paths of other agents.
We demonstrate that our method enables the robot to achieve good navigation performance and non-invasiveness in challenging crowd navigation scenarios. We successfully transfer the policy learned in simulation to a real-world TurtleBot 2i.
## Setup
1. In a conda environment or virtual environment with Python 3.x, install the required Python packages
```
pip install -r requirements.txt
```
2. Install Pytorch 1.12.1 following the instructions [here](https://pytorch.org/get-started/previous-versions/#v1121)
3. Install [OpenAI Baselines](https://github.com/openai/baselines#installation)
```
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
```
4. Install [Python-RVO2](https://github.com/sybrenstuvel/Python-RVO2) library
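5. (Optional) Verify that the core dependencies can be imported (a minimal sanity check, assuming the packages above were installed):
```
import torch, gym, rvo2, baselines
print(torch.__version__)  # expect 1.12.1
```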
## Overview
This repository is organized in five parts:
- `crowd_nav/` folder contains configurations and policies used in the simulator.
- `crowd_sim/` folder contains the simulation environment.
- `gst_updated/` folder contains the code for running inference of a human trajectory predictor, named Gumbel Social Transformer (GST) [2].
- `rl/` contains the code for the RL policy networks, wrappers for the prediction network, and the PPO algorithm.
- `trained_models/` contains some pretrained models provided by us.
Note that this repository does not include code for training a trajectory prediction network. Please refer to [this repo](https://github.com/tedhuang96/gst) instead.
## Run the code
### Training
- Modify the configurations.
  1. Environment configurations: modify `crowd_nav/configs/config.py`. In particular:
- Choice of human trajectory predictor:
- Set `sim.predict_method = 'inferred'` if a learning-based GST predictor is used [2]. Please also change `pred.model_dir` to be the directory of a trained GST model. We provide two pretrained models [here](https://github.com/Shuijing725/CrowdNav_Prediction_AttnGraph/tree/main/gst_updated/results/).
       - Set `sim.predict_method = 'const_vel'` if a constant velocity model is used.
       - Set `sim.predict_method = 'truth'` if the ground truth predictor is used.
       - Set `sim.predict_method = 'none'` if you do not want to use future trajectories to change the observation and reward.
- Randomization of human behaviors: If you want to randomize the ORCA humans,
- set `env.randomize_attributes` to True to randomize the preferred velocity and radius of humans;
- set `humans.random_goal_changing` to True to let humans randomly change goals before they arrive at their original goals.
2. PPO and network configurations: modify `arguments.py`
     - `env_name` (must be consistent with `sim.predict_method` in `crowd_nav/configs/config.py`; see the sanity-check sketch after this list):
- If you use the GST predictor, set to `CrowdSimPredRealGST-v0`.
- If you use the ground truth predictor or constant velocity predictor, set to `CrowdSimPred-v0`.
- If you don't want to use prediction, set to `CrowdSimVarNum-v0`.
     - `use_self_attn`: the human-human attention network will be included if set to True; otherwise there will be no human-human attention.
     - `use_hr_attn`: the robot-human attention network will be included if set to True; otherwise there will be no robot-human attention.
- After you change the configurations, run
```
python train.py
```
- The checkpoints and configuration files will be saved to the folder specified by `output_dir` in `arguments.py`.
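For reference, the consistency requirement between `sim.predict_method` and `env_name` can be sanity-checked with a short snippet (a minimal sketch that restates the mapping in the list above; it is not part of this repo):
```
# mapping restated from the list above
pred_to_env = {
    'inferred': 'CrowdSimPredRealGST-v0',
    'const_vel': 'CrowdSimPred-v0',
    'truth': 'CrowdSimPred-v0',
    'none': 'CrowdSimVarNum-v0',
}

from arguments import get_args
from crowd_nav.configs.config import Config

assert get_args().env_name == pred_to_env[Config.sim.predict_method], \
    'env_name in arguments.py must match sim.predict_method in config.py'
```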
### Testing
Please modify the test arguments in lines 20-33 of `test.py` (**Don't set the argument values in the terminal!**), and run
```
python test.py
```
Note that the `config.py` and `arguments.py` in the testing folder will be loaded, instead of those in the root directory.
The testing results are logged in the `trained_models/your_output_dir/test/` folder, and are also printed to the terminal.
If you set `visualize=True` in `test.py`, you will be able to see visualizations of the episodes.
#### Test pre-trained models provided by us
| Method | `--model_dir` in test.py | `--test_model` in test.py |
|----------------------------------------|----------------------------------------|---------------------------|
| Ours without randomized humans | `trained_models/GST_predictor_no_rand` | `41200.pt` |
| ORCA without randomized humans | `trained_models/ORCA_no_rand` | `00000.pt` |
| Social force without randomized humans | `trained_models/SF_no_rand` | `00000.pt` |
| Ours with randomized humans | `trained_models/GST_predictor_rand` | `41665.pt` |
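For example, to test `GST_predictor_rand`, the edits in `test.py` would look roughly like this (a hypothetical excerpt; the exact form of lines 20-33 may differ in your copy of `test.py`):
```
# hypothetical excerpt of the test arguments in test.py
parser.add_argument('--model_dir', type=str, default='trained_models/GST_predictor_rand')
parser.add_argument('--test_model', type=str, default='41665.pt')
```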
#### Plot predicted future human positions
To visualize episodes with predicted human trajectories, and to save the visualizations to disk, please refer to the [save_slides branch](https://github.com/Shuijing725/CrowdNav_Prediction_AttnGraph/tree/save_slides).
Note that the above visualization and file saving will slow down testing significantly!
- Set `save_slides=True` in `test.py`, and all rendered frames will be saved in a subfolder inside `trained_models/your_output_dir/social_eval/`.
### Plot the training curves
```
python plot.py
```
Here are example learning curves of our proposed network model with the GST predictor.
## Sim2Real
We are happy to announce that our sim2real tutorial and code are released [here](https://github.com/Shuijing725/CrowdNav_Sim2Real_Turtlebot)!
**Note:** This repo only serves as a reference point for the sim2real transfer of crowd navigation. Since there are many uncertainties in real-world experiments that may affect performance, we cannot guarantee that it is reproducible in all cases.
## Disclaimer
1. We only tested our code on Ubuntu with Python 3.6 and Python 3.8. The code may work on other OS or other versions of Python, but we make no guarantee.
2. The performance of our code can vary depending on the choice of hyperparameters and random seeds (see [this reddit post](https://www.reddit.com/r/MachineLearning/comments/rkewa3/d_what_are_your_machine_learning_superstitions/)).
Unfortunately, we do not have the time or resources for a thorough hyperparameter search. Thus, it is normal if your results are slightly worse than those claimed in the paper.
To achieve the best performance, we recommend some manual hyperparameter tuning.
## Citation
If you find the code or the paper useful for your research, please cite the following papers:
```
@inproceedings{liu2022intention,
title={Intention Aware Robot Crowd Navigation with Attention-Based Interaction Graph},
author={Liu, Shuijing and Chang, Peixin and Huang, Zhe and Chakraborty, Neeloy and Hong, Kaiwen and Liang, Weihang and Livingston McPherson, D. and Geng, Junyi and Driggs-Campbell, Katherine},
booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
year={2023},
pages={12015-12021}
}
@inproceedings{liu2020decentralized,
title={Decentralized Structural-RNN for Robot Crowd Navigation with Deep Reinforcement Learning},
author={Liu, Shuijing and Chang, Peixin and Liang, Weihang and Chakraborty, Neeloy and Driggs-Campbell, Katherine},
booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
year={2021},
pages={3517-3524}
}
```
## Credits
Other contributors:
[Peixin Chang](https://github.com/PeixinC)
[Zhe Huang](https://github.com/tedhuang96)
[Neeloy Chakraborty](https://github.com/TheNeeloy)
Part of the code is based on the following repositories:
[1] S. Liu, P. Chang, W. Liang, N. Chakraborty, and K. Driggs-Campbell, "Decentralized Structural-RNN for Robot Crowd Navigation with Deep Reinforcement Learning," in IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 3517-3524. (Github: https://github.com/Shuijing725/CrowdNav_DSRNN)
[2] Z. Huang, R. Li, K. Shin, and K. Driggs-Campbell. "Learning Sparse Interaction Graphs of Partially Detected Pedestrians for Trajectory Prediction," in IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 1198–1205, 2022. (Github: https://github.com/tedhuang96/gst)
## Contact
If you have any questions or find any bugs, please feel free to open an issue or a pull request.
================================================
FILE: arguments.py
================================================
import argparse
import torch
def get_args():
parser = argparse.ArgumentParser(description='RL')
# the saving directory for train.py
parser.add_argument(
'--output_dir', type=str, default='trained_models/my_model')
# resume training from an existing checkpoint or not
parser.add_argument(
'--resume', default=False, action='store_true')
# if resume = True, load from the following checkpoint
parser.add_argument(
'--load-path', default='trained_models/GST_predictor_non_rand/checkpoints/41200.pt',
help='path of weights for resume training')
parser.add_argument(
'--overwrite',
default=True,
action='store_true',
help = "whether to overwrite the output directory in training")
parser.add_argument(
'--num_threads',
type=int,
default=1,
help="number of threads used for intraop parallelism on CPU")
    # only used in testing
parser.add_argument(
'--phase', type=str, default='test')
parser.add_argument(
'--cuda-deterministic',
action='store_true',
default=False,
help="sets flags for determinism when using CUDA (potentially slow!)")
    # works on GPU only (although you can make it work on CPU after some minor fixes)
parser.add_argument(
'--no-cuda',
action='store_true',
default=False,
help='disables CUDA training')
parser.add_argument(
        '--seed', type=int, default=425, help='random seed (default: 425)')
parser.add_argument(
'--num-processes',
type=int,
default=16,
help='how many training processes to use (default: 16)')
parser.add_argument(
'--num-mini-batch',
type=int,
default=2,
        help='number of batches for ppo (default: 2)')
parser.add_argument(
'--num-steps',
type=int,
default=30,
        help='number of forward steps in A2C (default: 30)')
parser.add_argument(
'--recurrent-policy',
action='store_true',
default=True,
help='use a recurrent policy')
parser.add_argument(
'--ppo-epoch',
type=int,
default=5,
        help='number of ppo epochs (default: 5)')
parser.add_argument(
'--clip-param',
type=float,
default=0.2,
help='ppo clip parameter (default: 0.2)')
parser.add_argument(
'--value-loss-coef',
type=float,
default=0.5,
help='value loss coefficient (default: 0.5)')
parser.add_argument(
'--entropy-coef',
type=float,
default=0.0,
        help='entropy term coefficient (default: 0.0)')
parser.add_argument(
        '--lr', type=float, default=4e-5, help='learning rate (default: 4e-5)')
parser.add_argument(
'--eps',
type=float,
default=1e-5,
help='RMSprop optimizer epsilon (default: 1e-5)')
parser.add_argument(
'--alpha',
type=float,
default=0.99,
        help='RMSprop optimizer alpha (default: 0.99)')
parser.add_argument(
'--gamma',
type=float,
default=0.99,
help='discount factor for rewards (default: 0.99)')
parser.add_argument(
'--max-grad-norm',
type=float,
default=0.5,
help='max norm of gradients (default: 0.5)')
# 10e6 for holonomic, 20e6 for unicycle
parser.add_argument(
'--num-env-steps',
type=int,
default=20e6,
        help='number of environment steps to train (default: 20e6)')
# True for unicycle, False for holonomic
parser.add_argument(
'--use-linear-lr-decay',
action='store_true',
default=False,
help='use a linear schedule on the learning rate')
parser.add_argument(
'--algo', default='ppo', help='algorithm to use: a2c | ppo | acktr')
parser.add_argument(
'--save-interval',
type=int,
default=200,
        help='save interval, one save per n updates (default: 200)')
parser.add_argument(
'--use-gae',
action='store_true',
default=True,
help='use generalized advantage estimation')
parser.add_argument(
'--gae-lambda',
type=float,
default=0.95,
help='gae lambda parameter (default: 0.95)')
parser.add_argument(
'--log-interval',
type=int,
default=20,
        help='log interval, one log per n updates (default: 20)')
parser.add_argument(
'--use-proper-time-limits',
action='store_true',
default=False,
help='compute returns taking into account time limits')
# for srnn only
# RNN size
parser.add_argument('--human_node_rnn_size', type=int, default=128,
help='Size of Human Node RNN hidden state')
parser.add_argument('--human_human_edge_rnn_size', type=int, default=256,
help='Size of Human Human Edge RNN hidden state')
parser.add_argument(
'--aux-loss',
action='store_true',
default=False,
help='auxiliary loss on human nodes outputs')
# Input and output size
parser.add_argument('--human_node_input_size', type=int, default=3,
help='Dimension of the node features')
parser.add_argument('--human_human_edge_input_size', type=int, default=2,
help='Dimension of the edge features')
parser.add_argument('--human_node_output_size', type=int, default=256,
help='Dimension of the node output')
# Embedding size
parser.add_argument('--human_node_embedding_size', type=int, default=64,
help='Embedding size of node features')
parser.add_argument('--human_human_edge_embedding_size', type=int, default=64,
help='Embedding size of edge features')
# Attention vector dimension
parser.add_argument('--attention_size', type=int, default=64,
help='Attention size')
# Sequence length
parser.add_argument('--seq_length', type=int, default=30,
help='Sequence length')
    # use self attn in human states or not
    parser.add_argument('--use_self_attn', type=bool, default=True,
                        help='whether to use the human-human self-attention network')
    # use attn between humans and robot or not (todo: implement this in network models)
    parser.add_argument('--use_hr_attn', type=bool, default=True,
                        help='whether to use the robot-human attention network')
# No prediction: for orca, sf, old_srnn, selfAttn_srnn_noPred ablation: 'CrowdSimVarNum-v0',
# for constant velocity Pred, ground truth Pred: 'CrowdSimPred-v0'
# gst pred: 'CrowdSimPredRealGST-v0'
parser.add_argument(
'--env-name',
default='CrowdSimPredRealGST-v0',
help='name of the environment')
# sort all humans and squeeze them to the front or not
parser.add_argument('--sort_humans', type=bool, default=True)
args = parser.parse_args()
args.cuda = not args.no_cuda and torch.cuda.is_available()
assert args.algo in ['a2c', 'ppo', 'acktr']
if args.recurrent_policy:
assert args.algo in ['a2c', 'ppo'], \
'Recurrent policy is not implemented for ACKTR'
return args
================================================
FILE: collect_data.py
================================================
import matplotlib.pyplot as plt
import torch
import copy
import os
import numpy as np
from tqdm import trange
from crowd_nav.configs.config import Config
from rl.networks.envs import make_vec_envs
from crowd_sim.envs import *
# train_data: True if collecting training data, False if collecting testing data
def collectData(device, train_data, config):
# set robot policy to orca
config.robot.policy = 'orca'
env_name = 'CrowdSimVarNumCollect-v0'
# for render
env_num = 1 if config.data.render else config.data.num_processes
if config.data.render:
fig, ax = plt.subplots(figsize=(7, 7))
ax.set_xlim(-10, 10)
ax.set_ylim(-10, 10)
ax.set_xlabel('x(m)', fontsize=16)
ax.set_ylabel('y(m)', fontsize=16)
plt.ion()
plt.show()
else:
ax = None
# create parallel envs
seed = np.random.randint(0, np.iinfo(np.uint32).max)
envs = make_vec_envs(env_name, seed, env_num,
config.reward.gamma, None, device, allow_early_resets=True, config=config,
ax=ax, wrap_pytorch=False)
# collect data for pretext training
data = [] # list for all data collected
for i in range(env_num):
data.append([])
obs = envs.reset()
# 1 epoch -> 1 file
# todo: add pretext arguments to config.py
# make one prediction every "pred_interval" simulation steps
pred_interval = config.data.pred_timestep // config.env.time_step
tot_steps = int(config.data.tot_steps * pred_interval)
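    # e.g. with the default config (pred_timestep = 0.25, time_step = 0.25),
    # pred_interval is 1.0, so a data point is recorded at every simulation step
    # and tot_steps stays at config.data.tot_steps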
for step in trange(tot_steps):
if config.data.render:
envs.render()
if step % pred_interval == 0:
            # append each env's new data points one by one
for i in range(env_num):
non_empty_obs = obs['pred_info'][i][np.logical_not(np.isinf(obs['pred_info'][i, :, -1]))]
non_empty_obs = non_empty_obs.reshape((-1, 4)).tolist()
if non_empty_obs:
data[i].extend(copy.deepcopy(non_empty_obs))
action = np.zeros((env_num, 2))
        # the action is a dummy action!
obs, rew, done, info = envs.step(action)
    # save observations as tab-separated text files
filePath = os.path.join(config.data.data_save_dir, 'train') if train_data \
else os.path.join(config.data.data_save_dir, 'test')
if not os.path.isdir(filePath):
os.makedirs(filePath)
for i in range(env_num):
with open(os.path.join(filePath, str(i)+'.txt'), 'w') as f:
for item in data[i]:
item = str(item[0]) + '\t' + str(item[1]) + '\t' + str(item[2]) + '\t' + str(item[3])
f.write("%s\n" % item)
envs.close()
if __name__ == '__main__':
config = Config()
device = torch.device("cuda")
collectData(device, config.data.collect_train_data, config)
================================================
FILE: crowd_nav/__init__.py
================================================
================================================
FILE: crowd_nav/configs/__init__.py
================================================
================================================
FILE: crowd_nav/configs/config.py
================================================
import numpy as np
from arguments import get_args
class BaseConfig(object):
def __init__(self):
pass
class Config(object):
# for now, import all args from arguments.py
args = get_args()
training = BaseConfig()
training.device = "cuda:0" if args.cuda else "cpu"
# general configs for OpenAI gym env
env = BaseConfig()
env.time_limit = 50
env.time_step = 0.25
env.val_size = 100
env.test_size = 500
# if randomize human behaviors, set to True, else set to False
env.randomize_attributes = True
env.num_processes = args.num_processes
    # record robot states and actions of an episode for system identification in sim2real
env.record = False
env.load_act = False
# config for reward function
reward = BaseConfig()
reward.success_reward = 10
reward.collision_penalty = -20
# discomfort distance
reward.discomfort_dist = 0.25
reward.discomfort_penalty_factor = 10
reward.gamma = 0.99
# config for simulation
sim = BaseConfig()
sim.circle_radius = 6 * np.sqrt(2)
sim.arena_size = 6
sim.human_num = 20
# actual human num in each timestep, in [human_num-human_num_range, human_num+human_num_range]
sim.human_num_range = 0
sim.predict_steps = 5
# 'const_vel': constant velocity model,
# 'truth': ground truth future traj (with info in robot's fov)
# 'inferred': inferred future traj from GST network
# 'none': no prediction
sim.predict_method = 'inferred'
# render the simulation during training or not
sim.render = False
# for save_traj only
render_traj = False
save_slides = False
save_path = None
# whether wrap the vec env with VecPretextNormalize class
# = True only if we are using a network for human trajectory prediction (sim.predict_method = 'inferred')
if sim.predict_method == 'inferred':
env.use_wrapper = True
else:
env.use_wrapper = False
# human config
humans = BaseConfig()
humans.visible = True
# orca or social_force for now
humans.policy = "orca"
humans.radius = 0.3
humans.v_pref = 1
humans.sensor = "coordinates"
    # FOV = this value * PI
humans.FOV = 2.
# a human may change its goal before it reaches its old goal
# if randomize human behaviors, set to True, else set to False
humans.random_goal_changing = True
humans.goal_change_chance = 0.5
# a human may change its goal after it reaches its old goal
humans.end_goal_changing = True
humans.end_goal_change_chance = 1.0
# a human may change its radius and/or v_pref after it reaches its current goal
humans.random_radii = False
humans.random_v_pref = False
# one human may have a random chance to be blind to other agents at every time step
humans.random_unobservability = False
humans.unobservable_chance = 0.3
humans.random_policy_changing = False
# robot config
robot = BaseConfig()
# whether robot is visible to humans (whether humans respond to the robot's motion)
robot.visible = False
# For baseline: srnn; our method: selfAttn_merge_srnn
robot.policy = 'selfAttn_merge_srnn'
robot.radius = 0.3
robot.v_pref = 1
robot.sensor = "coordinates"
    # FOV = this value * PI
robot.FOV = 2
# radius of perception range
robot.sensor_range = 5
# action space of the robot
action_space = BaseConfig()
# holonomic or unicycle
action_space.kinematics = "holonomic"
# config for ORCA
orca = BaseConfig()
orca.neighbor_dist = 10
orca.safety_space = 0.15
orca.time_horizon = 5
orca.time_horizon_obst = 5
# config for social force
sf = BaseConfig()
sf.A = 2.
sf.B = 1
sf.KI = 1
# config for data collection for training the GST predictor
data = BaseConfig()
data.tot_steps = 40000
data.render = False
data.collect_train_data = False
data.num_processes = 5
data.data_save_dir = 'gst_updated/datasets/orca_20humans_no_rand'
# number of seconds between each position in traj pred model
data.pred_timestep = 0.25
# config for the GST predictor
pred = BaseConfig()
# see 'gst_updated/results/README.md' for how to set this variable
# If randomized humans: gst_updated/results/100-gumbel_social_transformer-faster_lstm-lr_0.001-init_temp_0.5-edge_head_0-ebd_64-snl_1-snh_8-seed_1000_rand/sj
# else: gst_updated/results/100-gumbel_social_transformer-faster_lstm-lr_0.001-init_temp_0.5-edge_head_0-ebd_64-snl_1-snh_8-seed_1000/sj
pred.model_dir = 'gst_updated/results/100-gumbel_social_transformer-faster_lstm-lr_0.001-init_temp_0.5-edge_head_0-ebd_64-snl_1-snh_8-seed_1000_rand/sj'
# LIDAR config
lidar = BaseConfig()
# angular resolution (offset angle between neighboring rays) in degrees
lidar.angular_res = 5
# range in meters
lidar.range = 10
# config for sim2real
sim2real = BaseConfig()
# use dummy robot and human states or not
sim2real.use_dummy_detect = True
sim2real.record = False
sim2real.load_act = False
sim2real.ROSStepInterval = 0.03
sim2real.fixed_time_interval = 0.1
sim2real.use_fixed_time_interval = True
if sim.predict_method == 'inferred' and env.use_wrapper == False:
raise ValueError("If using inferred prediction, you must wrap the envs!")
if sim.predict_method != 'inferred' and env.use_wrapper:
raise ValueError("If not using inferred prediction, you must NOT wrap the envs!")
================================================
FILE: crowd_nav/policy/orca.py
================================================
import numpy as np
import rvo2
from crowd_nav.policy.policy import Policy
from crowd_sim.envs.utils.action import ActionXY
class ORCA(Policy):
def __init__(self, config):
"""
timeStep The time step of the simulation.
Must be positive.
neighborDist The default maximum distance (center point
to center point) to other agents a new agent
takes into account in the navigation. The
larger this number, the longer the running
time of the simulation. If the number is too
low, the simulation will not be safe. Must be
non-negative.
maxNeighbors The default maximum number of other agents a
new agent takes into account in the
navigation. The larger this number, the
longer the running time of the simulation.
If the number is too low, the simulation
will not be safe.
timeHorizon The default minimal amount of time for which
a new agent's velocities that are computed
by the simulation are safe with respect to
other agents. The larger this number, the
sooner an agent will respond to the presence
of other agents, but the less freedom the
agent has in choosing its velocities.
Must be positive.
timeHorizonObst The default minimal amount of time for which
a new agent's velocities that are computed
by the simulation are safe with respect to
obstacles. The larger this number, the
sooner an agent will respond to the presence
of obstacles, but the less freedom the agent
has in choosing its velocities.
Must be positive.
radius The default radius of a new agent.
Must be non-negative.
maxSpeed The default maximum speed of a new agent.
Must be non-negative.
velocity The default initial two-dimensional linear
velocity of a new agent (optional).
ORCA first uses neighborDist and maxNeighbors to find neighbors that need to be taken into account.
        Here we set them large enough so that all agents will be considered as neighbors.
        timeHorizon should be set so that the computed velocity is safe for at least one time step.
        In this work, obstacles are not considered, so the value of timeHorizonObst does not matter.
"""
super().__init__(config)
self.name = 'ORCA'
self.max_neighbors = None
self.radius = None
self.max_speed = 1 # the ego agent assumes that all other agents have this max speed
self.sim = None
self.safety_space = self.config.orca.safety_space
def predict(self, state):
"""
Create a rvo2 simulation at each time step and run one step
Python-RVO2 API: https://github.com/sybrenstuvel/Python-RVO2/blob/master/src/rvo2.pyx
How simulation is done in RVO2: https://github.com/sybrenstuvel/Python-RVO2/blob/master/src/Agent.cpp
Agent doesn't stop moving after it reaches the goal, because once it stops moving, the reciprocal rule is broken
:param state:
:return:
"""
self_state = state.self_state
# max number of humans = current number of humans
self.max_neighbors = len(state.human_states)
self.radius = state.self_state.radius
params = self.config.orca.neighbor_dist, self.max_neighbors, self.config.orca.time_horizon, self.config.orca.time_horizon_obst
if self.sim is not None and self.sim.getNumAgents() != len(state.human_states) + 1:
del self.sim
self.sim = None
if self.sim is None:
self.sim = rvo2.PyRVOSimulator(self.time_step, *params, self.radius, self.max_speed)
self.sim.addAgent((self_state.px, self_state.py), *params, self_state.radius + 0.01 + self.safety_space,
self_state.v_pref, (self_state.vx, self_state.vy))
for human_state in state.human_states:
self.sim.addAgent((human_state.px, human_state.py), *params, human_state.radius + 0.01 + self.config.orca.safety_space,
self.max_speed, (human_state.vx, human_state.vy))
else:
self.sim.setAgentPosition(0, (self_state.px, self_state.py))
self.sim.setAgentVelocity(0, (self_state.vx, self_state.vy))
for i, human_state in enumerate(state.human_states):
self.sim.setAgentPosition(i + 1, (human_state.px, human_state.py))
self.sim.setAgentVelocity(i + 1, (human_state.vx, human_state.vy))
# Set the preferred velocity to be a vector of unit magnitude (speed) in the direction of the goal.
velocity = np.array((self_state.gx - self_state.px, self_state.gy - self_state.py))
speed = np.linalg.norm(velocity)
pref_vel = velocity / speed if speed > 1 else velocity
# Perturb a little to avoid deadlocks due to perfect symmetry.
# perturb_angle = np.random.random() * 2 * np.pi
# perturb_dist = np.random.random() * 0.01
# perturb_vel = np.array((np.cos(perturb_angle), np.sin(perturb_angle))) * perturb_dist
# pref_vel += perturb_vel
self.sim.setAgentPrefVelocity(0, tuple(pref_vel))
for i, human_state in enumerate(state.human_states):
# unknown goal position of other humans
self.sim.setAgentPrefVelocity(i + 1, (0, 0))
self.sim.doStep()
action = ActionXY(*self.sim.getAgentVelocity(0))
self.last_state = state
return action
================================================
FILE: crowd_nav/policy/policy.py
================================================
import abc
import numpy as np
class Policy(object):
def __init__(self, config):
"""
Base class for all policies, has an abstract method predict().
"""
self.trainable = False
self.phase = None
self.model = None
self.device = None
self.last_state = None
self.time_step = None
# if agent is assumed to know the dynamics of real world
self.env = None
self.config = config
@abc.abstractmethod
def predict(self, state):
"""
Policy takes state as input and output an action
"""
return
@staticmethod
def reach_destination(state):
self_state = state.self_state
if np.linalg.norm((self_state.py - self_state.gy, self_state.px - self_state.gx)) < self_state.radius:
return True
else:
return False
================================================
FILE: crowd_nav/policy/policy_factory.py
================================================
policy_factory = dict()
def none_policy():
return None
from crowd_nav.policy.orca import ORCA
from crowd_nav.policy.social_force import SOCIAL_FORCE
from crowd_nav.policy.srnn import SRNN, selfAttn_merge_SRNN
policy_factory['orca'] = ORCA
policy_factory['none'] = none_policy
policy_factory['social_force'] = SOCIAL_FORCE
policy_factory['srnn'] = SRNN
policy_factory['selfAttn_merge_srnn'] = selfAttn_merge_SRNN
================================================
FILE: crowd_nav/policy/social_force.py
================================================
import numpy as np
from crowd_nav.policy.policy import Policy
from crowd_sim.envs.utils.action import ActionXY
class SOCIAL_FORCE(Policy):
def __init__(self, config):
super().__init__(config)
self.name = 'social_force'
def predict(self, state):
"""
        Produce an action for the agent using the circular specification of the social force model.
"""
# Pull force to goal
delta_x = state.self_state.gx - state.self_state.px
delta_y = state.self_state.gy - state.self_state.py
dist_to_goal = np.sqrt(delta_x**2 + delta_y**2)
desired_vx = (delta_x / dist_to_goal) * state.self_state.v_pref
desired_vy = (delta_y / dist_to_goal) * state.self_state.v_pref
KI = self.config.sf.KI # Inverse of relocation time K_i
curr_delta_vx = KI * (desired_vx - state.self_state.vx)
curr_delta_vy = KI * (desired_vy - state.self_state.vy)
# Push force(s) from other agents
        A = self.config.sf.A  # other agents' interaction strength
        B = self.config.sf.B  # other agents' interaction range
interaction_vx = 0
interaction_vy = 0
for other_human_state in state.human_states:
delta_x = state.self_state.px - other_human_state.px
delta_y = state.self_state.py - other_human_state.py
dist_to_human = np.sqrt(delta_x**2 + delta_y**2)
interaction_vx += A * np.exp((state.self_state.radius + other_human_state.radius - dist_to_human) / B) * (delta_x / dist_to_human)
interaction_vy += A * np.exp((state.self_state.radius + other_human_state.radius - dist_to_human) / B) * (delta_y / dist_to_human)
# Sum of push & pull forces
total_delta_vx = (curr_delta_vx + interaction_vx) * self.config.env.time_step
total_delta_vy = (curr_delta_vy + interaction_vy) * self.config.env.time_step
# clip the speed so that sqrt(vx^2 + vy^2) <= v_pref
new_vx = state.self_state.vx + total_delta_vx
new_vy = state.self_state.vy + total_delta_vy
act_norm = np.linalg.norm([new_vx, new_vy])
if act_norm > state.self_state.v_pref:
return ActionXY(new_vx / act_norm * state.self_state.v_pref, new_vy / act_norm * state.self_state.v_pref)
else:
return ActionXY(new_vx, new_vy)
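# Worked numeric sketch of the push term above (hypothetical values, not part of the
# policy API): with A = 2, B = 1, the agent at the origin with radius 0.3, and a human
# at (1, 0.5) with radius 0.3, dist_to_human = sqrt(1.25) ~= 1.118, so the push
# magnitude is A * exp((0.3 + 0.3 - 1.118) / B) ~= 2 * exp(-0.518) ~= 1.19.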
================================================
FILE: crowd_nav/policy/srnn.py
================================================
from crowd_nav.policy.policy import Policy
import numpy as np
from crowd_sim.envs.utils.action import ActionRot, ActionXY
class SRNN(Policy):
def __init__(self, config):
super().__init__(config)
self.time_step = self.config.env.time_step # Todo: is this needed?
self.name = 'srnn'
self.trainable = True
self.multiagent_training = True
# clip the self.raw_action and return the clipped action
def clip_action(self, raw_action, v_pref):
"""
        Clip the raw action from the network into the robot's action space.
        For holonomic kinematics, rescale (vx, vy) so that the speed is at most v_pref;
        for unicycle kinematics, clip the changes of v and theta to fixed ranges.
"""
        # determine the robot's kinematics
holonomic = True if self.config.action_space.kinematics == 'holonomic' else False
# clip the action
if holonomic:
act_norm = np.linalg.norm(raw_action)
if act_norm > v_pref:
raw_action[0] = raw_action[0] / act_norm * v_pref
raw_action[1] = raw_action[1] / act_norm * v_pref
return ActionXY(raw_action[0], raw_action[1])
else:
# for sim2real
# old value: -0.1, 0.1
raw_action[0] = np.clip(raw_action[0], -0.1, 0.087) # action[0] is change of v
# raw[0, 1] = np.clip(raw[0, 1], -0.25, 0.25) # action[1] is change of w
# raw[0, 0] = np.clip(raw[0, 0], -state.self_state.v_pref, state.self_state.v_pref) # action[0] is v
# old value: -0.1, 0.1
# 0.073
raw_action[1] = np.clip(raw_action[1], -0.06, 0.06) # action[1] is change of theta
return ActionRot(raw_action[0], raw_action[1])
class selfAttn_merge_SRNN(SRNN):
def __init__(self, config):
super().__init__(config)
self.name = 'selfAttn_merge_srnn'
================================================
FILE: crowd_sim/README.md
================================================
# Simulation Framework
## Environment
The environment contains n+1 agents: n of them are humans controlled by a certain fixed
policy, and the other is the robot, which is controlled by a trainable policy.
The environment is built on top of the OpenAI gym library and implements two abstract methods.
* reset(): the environment resets the positions of all agents and returns the observation
for the robot. The observation for one agent is the observable states of all other agents.
* step(action): taking the robot's action as input, the environment computes the observation
for each agent and calls agent.act(observation) to get the actions of the agents. Then the environment detects
whether there is a collision between agents. If not, the states of the agents are updated. Finally,
observation, reward, done, and info are returned.
## Agent
Agent is a base class with two derived classes, Human and Robot. The Agent class holds
all the physical properties of an agent, including position, velocity, orientation, policy, etc.
* visibility: humans and robot can be set to be visible or invisible
* sensor: can be either visual input or coordinate input
* kinematics: can be either holonomic (move in any direction) or unicycle (has rotation constraints)
* act(observation): transform observation to state and pass it to policy
## Policy
Policy takes a state as input and outputs an action. Currently available policies:
* ORCA: models other agents as velocity obstacles to find optimal collision-free velocities under a reciprocal assumption
* Social force: models the interactions in crowds using attractive and repulsive forces
* DS-RNN: our method
## State
There are multiple definitions of states in different cases. The state of an agent representing all
its knowledge of the environment is defined as JointState, and it is different from the state of the whole environment.
* ObservableState: position, velocity, radius of one agent
* FullState: position, velocity, radius, goal position, preferred velocity, rotation of one agent
* JointState: concatenation of one agent's full state and all other agents' observable states
## Action
There are two types of actions depending on what kinematics constraint the agent has.
* ActionXY: (vx, vy) if kinematics == 'holonomic'
* ActionRot: (velocity, rotation angle) if kinematics == 'unicycle'
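## Example
The pieces above fit together as follows (an illustrative sketch; in this repository the environments are normally constructed and configured through `make_vec_envs` in `rl/networks/envs.py`, so treat the manual wiring below as an assumption):
```
import gym
import numpy as np
import crowd_sim  # registers the CrowdSim-* environment ids
from crowd_nav.configs.config import Config

env = gym.make('CrowdSimVarNum-v0').unwrapped  # bypass gym's wrappers for this sketch
env.configure(Config())        # read the config, set up the robot and reward terms
env.thisSeed, env.nenv = 0, 1  # seeding fields that reset() expects
env.phase = 'test'
ob = env.reset()
# holonomic robot: the action is (vx, vy)
ob, reward, done, info = env.step(np.array([0.5, 0.0]))
```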
================================================
FILE: crowd_sim/__init__.py
================================================
from gym.envs.registration import register
register(
id='CrowdSim-v0',
entry_point='crowd_sim.envs:CrowdSim',
)
register(
id='CrowdSimPred-v0',
entry_point='crowd_sim.envs:CrowdSimPred',
)
register(
id='CrowdSimVarNum-v0',
entry_point='crowd_sim.envs:CrowdSimVarNum',
)
register(
id='CrowdSimVarNumCollect-v0',
entry_point='crowd_sim.envs:CrowdSimVarNumCollect',
)
register(
id='CrowdSimPredRealGST-v0',
entry_point='crowd_sim.envs:CrowdSimPredRealGST',
)
register(
id='rosTurtlebot2iEnv-v0',
entry_point='crowd_sim.envs.ros_turtlebot2i_env:rosTurtlebot2iEnv',
)
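# In this repo, these ids are consumed by make_vec_envs in rl/networks/envs.py
# (see train.py and collect_data.py) rather than through bare gym.make() calls,
# since each env must also be configured with a Config object before use.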
================================================
FILE: crowd_sim/envs/__init__.py
================================================
from .crowd_sim import CrowdSim
from .crowd_sim_pred import CrowdSimPred
from .crowd_sim_var_num import CrowdSimVarNum
from .crowd_sim_var_num_collect import CrowdSimVarNumCollect
from .crowd_sim_pred_real_gst import CrowdSimPredRealGST
================================================
FILE: crowd_sim/envs/crowd_sim.py
================================================
import logging
import gym
import numpy as np
import rvo2
import random
import copy
from numpy.linalg import norm
from crowd_sim.envs.utils.human import Human
from crowd_sim.envs.utils.robot import Robot
from crowd_sim.envs.utils.info import *
from crowd_nav.policy.orca import ORCA
from crowd_sim.envs.utils.state import *
from crowd_sim.envs.utils.action import ActionRot, ActionXY
from crowd_sim.envs.utils.recorder import Recoder
class CrowdSim(gym.Env):
"""
A base environment
treat it as an abstract class, all other environments inherit from this one
"""
def __init__(self):
"""
Movement simulation for n+1 agents
        An agent can either be a human or the robot.
        Humans are controlled by an unknown and fixed policy.
        The robot is controlled by a known and learnable policy.
"""
self.time_limit = None
self.time_step = None
self.robot = None # a Robot instance representing the robot
self.humans = None # a list of Human instances, representing all humans in the environment
self.global_time = None
self.step_counter=0
# reward function
self.success_reward = None
self.collision_penalty = None
self.discomfort_dist = None
self.discomfort_penalty_factor = None
# simulation configuration
self.config = None
self.case_capacity = None
self.case_size = None
self.case_counter = None
self.randomize_attributes = None
self.circle_radius = None
self.human_num = None
self.action_space=None
self.observation_space=None
# limit FOV
self.robot_fov = None
self.human_fov = None
self.dummy_human = None
self.dummy_robot = None
#seed
self.thisSeed=None # the seed will be set when the env is created
#nenv
self.nenv=None # the number of env will be set when the env is created.
# Because the human crossing cases are controlled by random seed, we will calculate unique random seed for each
# parallel env.
self.phase=None # set the phase to be train, val or test
self.test_case=None # the test case ID, which will be used to calculate a seed to generate a human crossing case
# for render
self.render_axis=None
self.humans = []
self.potential = None
self.desiredVelocity = [0.0, 0.0]
self.last_left = 0.
self.last_right = 0.
def configure(self, config):
""" read the config to the environment variables """
self.config = config
self.time_limit = config.env.time_limit
self.time_step = config.env.time_step
self.randomize_attributes = config.env.randomize_attributes
self.success_reward = config.reward.success_reward
self.collision_penalty = config.reward.collision_penalty
self.discomfort_dist = config.reward.discomfort_dist
self.discomfort_penalty_factor = config.reward.discomfort_penalty_factor
self.case_capacity = {'train': np.iinfo(np.uint32).max - 2000, 'val': 1000, 'test': 1000}
self.case_size = {'train': np.iinfo(np.uint32).max - 2000, 'val': self.config.env.val_size,
'test': self.config.env.test_size}
self.circle_radius = config.sim.circle_radius
self.human_num = config.sim.human_num
self.arena_size = config.sim.arena_size
self.case_counter = {'train': 0, 'test': 0, 'val': 0}
logging.info('human number: {}'.format(self.human_num))
if self.randomize_attributes:
logging.info("Randomize human's radius and preferred speed")
else:
logging.info("Not randomize human's radius and preferred speed")
logging.info('Circle width: {}'.format(self.circle_radius))
self.robot_fov = np.pi * config.robot.FOV
self.human_fov = np.pi * config.humans.FOV
logging.info('robot FOV %f', self.robot_fov)
logging.info('humans FOV %f', self.human_fov)
# set dummy human and dummy robot
# dummy humans, used if any human is not in view of other agents
self.dummy_human = Human(self.config, 'humans')
        # if a human is not in view, it is replaced by this faraway dummy state
        self.dummy_human.set(7, 7, 7, 7, 0, 0, 0)  # (px, py, gx, gy, vx, vy, theta)
self.dummy_human.time_step = config.env.time_step
self.dummy_robot = Robot(self.config, 'robot')
self.dummy_robot.set(7, 7, 7, 7, 0, 0, 0)
self.dummy_robot.time_step = config.env.time_step
self.dummy_robot.kinematics = 'holonomic'
self.dummy_robot.policy = ORCA(config)
self.r = self.config.humans.radius
# configure randomized goal changing of humans midway through episode
self.random_goal_changing = config.humans.random_goal_changing
if self.random_goal_changing:
self.goal_change_chance = config.humans.goal_change_chance
# configure randomized goal changing of humans after reaching their respective goals
self.end_goal_changing = config.humans.end_goal_changing
if self.end_goal_changing:
self.end_goal_change_chance = config.humans.end_goal_change_chance
self.last_human_states = np.zeros((self.human_num, 5))
self.predict_steps = config.sim.predict_steps
self.human_num_range = config.sim.human_num_range
assert self.human_num > self.human_num_range
self.max_human_num = self.human_num + self.human_num_range
self.min_human_num = self.human_num - self.human_num_range
# for sim2real dynamics check
self.record=config.sim2real.record
self.load_act=config.sim2real.load_act
self.ROSStepInterval=config.sim2real.ROSStepInterval
self.fixed_time_interval=config.sim2real.fixed_time_interval
self.use_fixed_time_interval = config.sim2real.use_fixed_time_interval
if self.record:
self.episodeRecoder=Recoder()
self.load_act=config.sim2real.load_act
if self.load_act:
self.episodeRecoder.loadActions()
# use dummy robot and human states or use detected states from sensors
self.use_dummy_detect = config.sim2real.use_dummy_detect
# prediction period / control (or simulation) period
self.pred_interval = int(config.data.pred_timestep // config.env.time_step)
self.buffer_len = self.predict_steps * self.pred_interval
# set robot for this envs
rob_RL = Robot(config, 'robot')
self.set_robot(rob_RL)
def set_robot(self, robot):
raise NotImplementedError
def generate_random_human_position(self, human_num):
"""
Calls generate_circle_crossing_human function to generate a certain number of random humans
:param human_num: the total number of humans to be generated
:return: None
"""
# initial min separation distance to avoid danger penalty at beginning
for i in range(human_num):
self.humans.append(self.generate_circle_crossing_human())
def generate_circle_crossing_human(self):
"""Generate a human: generate start position on a circle, goal position is at the opposite side"""
human = Human(self.config, 'humans')
if self.randomize_attributes:
human.sample_random_attributes()
while True:
angle = np.random.random() * np.pi * 2
# add some noise to simulate all the possible cases robot could meet with human
v_pref = 1.0 if human.v_pref == 0 else human.v_pref
px_noise = (np.random.random() - 0.5) * v_pref
py_noise = (np.random.random() - 0.5) * v_pref
px = self.circle_radius * np.cos(angle) + px_noise
py = self.circle_radius * np.sin(angle) + py_noise
collide = False
for i, agent in enumerate([self.robot] + self.humans):
                # if the robot is a unicycle, keep humans at least circle_radius / 2 away from it
if self.robot.kinematics == 'unicycle' and i == 0:
min_dist = self.circle_radius / 2 # Todo: if circle_radius <= 4, it will get stuck here
else:
min_dist = human.radius + agent.radius + self.discomfort_dist
if norm((px - agent.px, py - agent.py)) < min_dist or \
norm((px - agent.gx, py - agent.gy)) < min_dist:
collide = True
break
if not collide:
break
# px = np.random.uniform(-6, 6)
# py = np.random.uniform(-3, 3.5)
# human.set(px, py, px, py, 0, 0, 0)
human.set(px, py, -px, -py, 0, 0, 0)
return human
# update the robot belief of human states
# if a human is visible, its state is updated to its current ground truth state
# else we assume it keeps going in a straight line with last observed velocity
def update_last_human_states(self, human_visibility, reset):
"""
update the self.last_human_states array
        human_visibility: list of booleans returned by get_num_human_in_fov (e.g. [T, F, F, T, F])
reset: True if this function is called by reset, False if called by step
:return:
"""
        # keep the order of the humans fixed at each timestep
for i in range(self.human_num):
if human_visibility[i]:
humanS = np.array(self.humans[i].get_observable_state_list())
self.last_human_states[i, :] = humanS
else:
if reset:
humanS = np.array([15., 15., 0., 0., 0.3])
self.last_human_states[i, :] = humanS
else:
# Plan A: linear approximation of human's next position
px, py, vx, vy, r = self.last_human_states[i, :]
px = px + vx * self.time_step
py = py + vy * self.time_step
self.last_human_states[i, :] = np.array([px, py, vx, vy, r])
# Plan B: assume the human doesn't move, use last observation
# self.last_human_states[i, :] = np.array([px, py, 0., 0., r])
# Plan C: put invisible humans in a dummy position
# humanS = np.array([15., 15., 0., 0., 0.3])
# self.last_human_states[i, :] = humanS
# return the ground truth locations of all humans
def get_true_human_states(self):
true_human_states = np.zeros((self.human_num, 2))
for i in range(self.human_num):
humanS = np.array(self.humans[i].get_observable_state_list())
true_human_states[i, :] = humanS[:2]
return true_human_states
# set robot initial state and generate all humans for reset function
# for crowd nav: human_num == self.human_num
# for leader follower: human_num = self.human_num - 1
def generate_robot_humans(self, phase, human_num=None):
if human_num is None:
human_num = self.human_num
if self.robot.kinematics == 'unicycle':
angle = np.random.uniform(0, np.pi * 2)
px = self.circle_radius * np.cos(angle)
py = self.circle_radius * np.sin(angle)
while True:
gx, gy = np.random.uniform(-self.circle_radius, self.circle_radius, 2)
if np.linalg.norm([px - gx, py - gy]) >= 6: # 1 was 6
break
self.robot.set(px, py, gx, gy, 0, 0, np.random.uniform(0, 2*np.pi)) # randomize init orientation
# randomize starting position and goal position
else:
while True:
px, py, gx, gy = np.random.uniform(-self.circle_radius, self.circle_radius, 4)
if np.linalg.norm([px - gx, py - gy]) >= 6:
break
self.robot.set(px, py, gx, gy, 0, 0, np.pi/2)
# generate humans
self.generate_random_human_position(human_num=human_num)
def smooth_action(self, action):
""" mimic the dynamics of Turtlebot2i for sim2real """
# if action.r is delta theta
w = action.r / self.time_step
# if action.r is w
# w = action.r
beta = 0.1
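        # differential-drive inverse kinematics: the constants below correspond to a
        # wheel separation of 0.23 m and a wheel radius of 0.035 m on the Turtlebot2i base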
left = (2 * action.v - 0.23 * w) / (2 * 0.035)
right = (2 * action.v + 0.23 * w) / (2 * 0.035)
left = np.clip(left, -17.5, 17.5)
right = np.clip(right, -17.5, 17.5)
# print('Before: left:', left, 'right:', right)
if self.phase == 'test':
left = (1. - beta) * self.last_left + beta * left
right = (1. - beta) * self.last_right + beta * right
self.last_left = copy.deepcopy(left)
self.last_right = copy.deepcopy(right)
# subtract a noisy amount of delay from wheel speeds to simulate the delay in tb2
# do this in the last step because this happens after we send action commands to tb2
if left > 0:
adjust_left = left - np.random.normal(loc=1.8, scale=0.15)
left = max(0., adjust_left)
else:
adjust_left = left + np.random.normal(loc=1.8, scale=0.15)
left = min(0., adjust_left)
if right > 0:
adjust_right = right - np.random.normal(loc=1.8, scale=0.15)
right = max(0., adjust_right)
else:
adjust_right = right + np.random.normal(loc=1.8, scale=0.15)
right = min(0., adjust_right)
if self.record:
self.episodeRecoder.wheelVelList.append([left, right])
# print('After: left:', left, 'right:', right)
v = 0.035 / 2 * (left + right)
r = 0.035 / 0.23 * (right - left) * self.time_step
return ActionRot(v, r)
def reset(self, phase='train', test_case=None):
"""
Reset the environment
:return:
"""
if self.phase is not None:
phase = self.phase
if self.test_case is not None:
test_case=self.test_case
if self.robot is None:
raise AttributeError('robot has to be set!')
assert phase in ['train', 'val', 'test']
if test_case is not None:
self.case_counter[phase] = test_case # test case is passed in to calculate specific seed to generate case
self.global_time = 0
self.step_counter=0
self.humans = []
# train, val, and test phase should start with different seed.
# case capacity: the maximum number for train(max possible int -2000), val(1000), and test(1000)
# val start from seed=0, test start from seed=case_capacity['val']=1000
# train start from self.case_capacity['val'] + self.case_capacity['test']=2000
counter_offset = {'train': self.case_capacity['val'] + self.case_capacity['test'],
'val': 0, 'test': self.case_capacity['val']}
np.random.seed(counter_offset[phase] + self.case_counter[phase] + self.thisSeed)
self.generate_robot_humans(phase)
for agent in [self.robot] + self.humans:
agent.time_step = self.time_step
agent.policy.time_step = self.time_step
# case size is used to make sure that the case_counter is always between 0 and case_size[phase]
self.case_counter[phase] = (self.case_counter[phase] + int(1*self.nenv)) % self.case_size[phase]
# get current observation
ob = self.generate_ob(reset=True)
# initialize potential
self.potential = -abs(np.linalg.norm(np.array([self.robot.px, self.robot.py]) - np.array([self.robot.gx, self.robot.gy])))
return ob
# Update the humans' end goals in the environment
# Produces valid end goals for each human
def update_human_goals_randomly(self):
# Update humans' goals randomly
for human in self.humans:
if human.isObstacle or human.v_pref == 0:
continue
if np.random.random() <= self.goal_change_chance:
humans_copy = []
for h in self.humans:
if h != human:
humans_copy.append(h)
# Produce valid goal for human in case of circle setting
while True:
angle = np.random.random() * np.pi * 2
# add some noise to simulate all the possible cases robot could meet with human
v_pref = 1.0 if human.v_pref == 0 else human.v_pref
gx_noise = (np.random.random() - 0.5) * v_pref
gy_noise = (np.random.random() - 0.5) * v_pref
gx = self.circle_radius * np.cos(angle) + gx_noise
gy = self.circle_radius * np.sin(angle) + gy_noise
collide = False
for agent in [self.robot] + humans_copy:
min_dist = human.radius + agent.radius + self.discomfort_dist
if norm((gx - agent.px, gy - agent.py)) < min_dist or \
norm((gx - agent.gx, gy - agent.gy)) < min_dist:
collide = True
break
if not collide:
break
# Give human new goal
human.gx = gx
human.gy = gy
return
# Update the specified human's end goals in the environment randomly
def update_human_goal(self, human):
# Update human's goals randomly
if np.random.random() <= self.end_goal_change_chance:
humans_copy = []
for h in self.humans:
if h != human:
humans_copy.append(h)
while True:
angle = np.random.random() * np.pi * 2
# add some noise to simulate all the possible cases robot could meet with human
v_pref = 1.0 if human.v_pref == 0 else human.v_pref
gx_noise = (np.random.random() - 0.5) * v_pref
gy_noise = (np.random.random() - 0.5) * v_pref
gx = self.circle_radius * np.cos(angle) + gx_noise
gy = self.circle_radius * np.sin(angle) + gy_noise
collide = False
for agent in [self.robot] + humans_copy:
min_dist = human.radius + agent.radius + self.discomfort_dist
if norm((gx - agent.px, gy - agent.py)) < min_dist or \
norm((gx - agent.gx, gy - agent.gy)) < min_dist:
collide = True
break
if not collide:
break
# Give human new goal
human.gx = gx
human.gy = gy
return
# calculate the angle between the direction vector of state1 and the vector from state1 to state2
def calc_offset_angle(self, state1, state2):
if self.robot.kinematics == 'holonomic':
real_theta = np.arctan2(state1.vy, state1.vx)
else:
real_theta = state1.theta
# unit vector along the center line of agent1's FOV
v_fov = [np.cos(real_theta), np.sin(real_theta)]
# vector from agent1 to agent2
v_12 = [state2.px - state1.px, state2.py - state1.py]
# angle between the FOV center line and the direction to agent2
v_fov = v_fov / np.linalg.norm(v_fov)
v_12 = v_12 / np.linalg.norm(v_12)
offset = np.arccos(np.clip(np.dot(v_fov, v_12), a_min=-1, a_max=1))
return offset
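# Minimal numeric check of the geometry above (assumed values): if agent1
# faces +x (real_theta = 0) and agent2 sits diagonally at 45 degrees, then
#   v_fov = [1, 0], v_12 = [0.7071, 0.7071]
#   offset = arccos(dot(v_fov, v_12)) = arccos(0.7071) = pi / 4
# The np.clip guards arccos against dot products that drift slightly outside
# [-1, 1] due to floating-point error.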
# Calculate whether agent2 is in agent1's FOV
# Note: this is not symmetric; whether agent1 is in agent2's FOV may differ!
# arguments:
# state1, state2: can be agent instance OR state instance
# robot1: is True if state1 is robot, else is False
# return value:
# return True if state2 is visible to state1, else return False
def detect_visible(self, state1, state2, robot1 = False, custom_fov=None, custom_sensor_range=None):
if self.robot.kinematics == 'holonomic':
real_theta = np.arctan2(state1.vy, state1.vx)
else:
real_theta = state1.theta
# unit vector along the center line of agent1's FOV
v_fov = [np.cos(real_theta), np.sin(real_theta)]
# vector from agent1 to agent2
v_12 = [state2.px - state1.px, state2.py - state1.py]
# angle between the FOV center line and the direction to agent2
v_fov = v_fov / np.linalg.norm(v_fov)
v_12 = v_12 / np.linalg.norm(v_12)
offset = np.arccos(np.clip(np.dot(v_fov, v_12), a_min=-1, a_max=1))
if custom_fov:
fov = custom_fov
else:
if robot1:
fov = self.robot_fov
else:
fov = self.human_fov
if np.abs(offset) <= fov / 2:
inFov = True
else:
inFov = False
# detect whether state2 is in state1's sensor_range
dist = np.linalg.norm([state1.px - state2.px, state1.py - state2.py]) - state1.radius - state2.radius
if custom_sensor_range:
inSensorRange = dist <= custom_sensor_range
else:
if robot1:
inSensorRange = dist <= self.robot.sensor_range
else:
inSensorRange = True
return (inFov and inSensorRange)
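# Example call (illustrative only): check whether human 0 is visible to the
# robot while overriding the configured sensing with a 90-degree FOV and a
# 5 m range:
#   visible = self.detect_visible(self.robot, self.humans[0], robot1=True,
#                                 custom_fov=np.pi / 2, custom_sensor_range=5.0)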
# for robot:
# return only visible humans to robot and number of visible humans
# and a list of True/False to indicate whether each human is visible
def get_num_human_in_fov(self):
human_ids = []
humans_in_view = []
num_humans_in_view = 0
for i in range(self.human_num):
visible = self.detect_visible(self.robot, self.humans[i], robot1=True)
if visible:
humans_in_view.append(self.humans[i])
num_humans_in_view = num_humans_in_view + 1
human_ids.append(True)
else:
human_ids.append(False)
return humans_in_view, num_humans_in_view, human_ids
def last_human_states_obj(self):
'''
convert self.last_human_states to a list of observable state objects for old algorithms to use
'''
humans = []
for i in range(self.human_num):
h = ObservableState(*self.last_human_states[i])
humans.append(h)
return humans
# calculate the reward at current timestep R(s, a)
def calc_reward(self, action):
# collision detection
dmin = float('inf')
danger_dists = []
collision = False
for i, human in enumerate(self.humans):
dx = human.px - self.robot.px
dy = human.py - self.robot.py
closest_dist = (dx ** 2 + dy ** 2) ** (1 / 2) - human.radius - self.robot.radius
if closest_dist < self.discomfort_dist:
danger_dists.append(closest_dist)
if closest_dist < 0:
collision = True
# logging.debug("Collision: distance between robot and p{} is {:.2E}".format(i, closest_dist))
break
elif closest_dist < dmin:
dmin = closest_dist
# check if reaching the goal
reaching_goal = norm(np.array(self.robot.get_position()) - np.array(self.robot.get_goal_position())) < self.robot.radius
if self.global_time >= self.time_limit - 1:
reward = 0
done = True
episode_info = Timeout()
elif collision:
reward = self.collision_penalty
done = True
episode_info = Collision()
elif reaching_goal:
reward = self.success_reward
done = True
episode_info = ReachGoal()
elif dmin < self.discomfort_dist:
# only penalize agent for getting too close if it's visible
# adjust the reward based on FPS
# print(dmin)
reward = (dmin - self.discomfort_dist) * self.discomfort_penalty_factor * self.time_step
done = False
episode_info = Danger(dmin)
else:
# potential reward
potential_cur = np.linalg.norm(
np.array([self.robot.px, self.robot.py]) - np.array(self.robot.get_goal_position()))
reward = 2 * (-abs(potential_cur) - self.potential)
self.potential = -abs(potential_cur)
done = False
episode_info = Nothing()
# if the robot is near collision/arrival, it should be able to turn a large angle
if self.robot.kinematics == 'unicycle':
# add a rotational penalty
# if action.r is w, factor = -0.02 if w in [-1.5, 1.5], factor = -0.045 if w in [-1, 1];
# if action.r is delta theta, factor = -2 if r in [-0.15, 0.15], factor = -4.5 if r in [-0.1, 0.1]
r_spin = -5 * action.r**2
# add a penalty for going backwards
if action.v < 0:
r_back = -2 * abs(action.v)
else:
r_back = 0.
reward = reward + r_spin + r_back
return reward, done, episode_info
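# Because self.potential stores the *negative* of the previous goal distance,
# the potential term simplifies to 2 * (prev_dist - cur_dist): the robot is
# rewarded for any decrease in distance-to-goal and penalized for moving away.
# Illustrative numbers (assumed): prev_dist = 5.0 m, cur_dist = 4.9 m gives
# reward = 2 * (5.0 - 4.9) = 0.2.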
# compute the observation
def generate_ob(self, reset):
visible_human_states, num_visible_humans, human_visibility = self.get_num_human_in_fov()
self.update_last_human_states(human_visibility, reset=reset)
if self.robot.policy.name in ['lstm_ppo', 'srnn']:
ob = [num_visible_humans]
# append robot's state
robotS = np.array(self.robot.get_full_state_list())
ob.extend(list(robotS))
ob.extend(list(np.ravel(self.last_human_states)))
ob = np.array(ob)
else: # for orca and sf
ob = self.last_human_states_obj()
return ob
def get_human_actions(self):
# step all humans
human_actions = [] # a list of all humans' actions
for i, human in enumerate(self.humans):
# observation for humans is always coordinates
ob = []
for other_human in self.humans:
if other_human != human:
# humans within each other's FOV are observable to each other; invisible ones are replaced by a dummy human
if self.detect_visible(human, other_human):
ob.append(other_human.get_observable_state())
else:
ob.append(self.dummy_human.get_observable_state())
if self.robot.visible:
if self.detect_visible(self.humans[i], self.robot):
ob += [self.robot.get_observable_state()]
else:
ob += [self.dummy_robot.get_observable_state()]
human_actions.append(human.act(ob))
return human_actions
def step(self, action, update=True):
"""
Compute actions for all agents, detect collision, update environment and return (ob, reward, done, info)
"""
# clip the action to obey robot's constraint
action = self.robot.policy.clip_action(action, self.robot.v_pref)
human_actions = self.get_human_actions()
# compute reward and episode info
reward, done, episode_info = self.calc_reward(action)
# apply action and update all agents
self.robot.step(action)
for i, human_action in enumerate(human_actions):
self.humans[i].step(human_action)
self.global_time += self.time_step # max episode length=time_limit/time_step
self.step_counter=self.step_counter+1
# compute the observation
ob = self.generate_ob(reset=False)
if self.robot.policy.name in ['srnn']:
info={'info':episode_info}
else: # for orca and sf
info=episode_info
# Update all humans' goals randomly midway through episode
if self.random_goal_changing:
if self.global_time % 5 == 0:
self.update_human_goals_randomly()
# Update a specific human's goal once it has reached its original goal
if self.end_goal_changing:
for human in self.humans:
if not human.isObstacle and human.v_pref != 0 and norm((human.gx - human.px, human.gy - human.py)) < human.radius:
self.update_human_goal(human)
return ob, reward, done, info
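# Illustrative rollout sketch (assumed setup; in this repo the environment is
# normally built and stepped through the vectorized wrappers under rl/):
#   env = gym.make('CrowdSim-v0')  # env id assumed; see crowd_sim/__init__.py
#   env.configure(config)
#   env.set_robot(robot)
#   ob, done = env.reset(phase='test', test_case=0), False
#   while not done:
#       action = policy(ob)  # hypothetical policy callable
#       ob, reward, done, info = env.step(action)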
def render(self, mode='human'):
""" Render the current status of the environment using matplotlib """
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
from matplotlib import patches
plt.rcParams['animation.ffmpeg_path'] = '/usr/bin/ffmpeg'
robot_color = 'yellow'
goal_color = 'red'
arrow_color = 'red'
arrow_style = patches.ArrowStyle("->", head_length=4, head_width=2)
def calcFOVLineEndPoint(ang, point, extendFactor):
# choose the extendFactor big enough
# so that the endPoints of the FOVLine is out of xlim and ylim of the figure
FOVLineRot = np.array([[np.cos(ang), -np.sin(ang), 0],
[np.sin(ang), np.cos(ang), 0],
[0, 0, 1]])
point.extend([1])
# apply rotation matrix
newPoint = np.matmul(FOVLineRot, np.reshape(point, [3, 1]))
# increase the distance between the line start point and the end point
newPoint = [extendFactor * newPoint[0, 0], extendFactor * newPoint[1, 0], 1]
return newPoint
ax=self.render_axis
artists=[]
# add goal
goal=mlines.Line2D([self.robot.gx], [self.robot.gy], color=goal_color, marker='*', linestyle='None', markersize=15, label='Goal')
ax.add_artist(goal)
artists.append(goal)
# add robot
robotX,robotY=self.robot.get_position()
robot=plt.Circle((robotX,robotY), self.robot.radius, fill=True, color=robot_color)
ax.add_artist(robot)
artists.append(robot)
plt.legend([robot, goal], ['Robot', 'Goal'], fontsize=16)
# compute orientation in each step and add arrow to show the direction
radius = self.robot.radius
arrowStartEnd=[]
robot_theta = self.robot.theta if self.robot.kinematics == 'unicycle' else np.arctan2(self.robot.vy, self.robot.vx)
arrowStartEnd.append(((robotX, robotY), (robotX + radius * np.cos(robot_theta), robotY + radius * np.sin(robot_theta))))
for i, human in enumerate(self.humans):
theta = np.arctan2(human.vy, human.vx)
arrowStartEnd.append(((human.px, human.py), (human.px + radius * np.cos(theta), human.py + radius * np.sin(theta))))
arrows = [patches.FancyArrowPatch(*arrow, color=arrow_color, arrowstyle=arrow_style)
for arrow in arrowStartEnd]
for arrow in arrows:
ax.add_artist(arrow)
artists.append(arrow)
# draw FOV for the robot
# add robot FOV
if self.robot.FOV < 2 * np.pi:
FOVAng = self.robot_fov / 2
FOVLine1 = mlines.Line2D([0, 0], [0, 0], linestyle='--')
FOVLine2 = mlines.Line2D([0, 0], [0, 0], linestyle='--')
startPointX = robotX
startPointY = robotY
endPointX = robotX + radius * np.cos(robot_theta)
endPointY = robotY + radius * np.sin(robot_theta)
# transform the vector back to world frame origin, apply rotation matrix, and get end point of FOVLine
# the start point of the FOVLine is the center of the robot
FOVEndPoint1 = calcFOVLineEndPoint(FOVAng, [endPointX - startPointX, endPointY - startPointY], 20. / self.robot.radius)
FOVLine1.set_xdata(np.array([startPointX, startPointX + FOVEndPoint1[0]]))
FOVLine1.set_ydata(np.array([startPointY, startPointY + FOVEndPoint1[1]]))
FOVEndPoint2 = calcFOVLineEndPoint(-FOVAng, [endPointX - startPointX, endPointY - startPointY], 20. / self.robot.radius)
FOVLine2.set_xdata(np.array([startPointX, startPointX + FOVEndPoint2[0]]))
FOVLine2.set_ydata(np.array([startPointY, startPointY + FOVEndPoint2[1]]))
ax.add_artist(FOVLine1)
ax.add_artist(FOVLine2)
artists.append(FOVLine1)
artists.append(FOVLine2)
# add an arc of robot's sensor range
sensor_range = plt.Circle(self.robot.get_position(), self.robot.sensor_range, fill=False, linestyle='--')
ax.add_artist(sensor_range)
artists.append(sensor_range)
# add humans and change the color of them based on visibility
human_circles = [plt.Circle(human.get_position(), human.radius, fill=False) for human in self.humans]
for i in range(len(self.humans)):
ax.add_artist(human_circles[i])
artists.append(human_circles[i])
# green: visible; red: invisible
if self.detect_visible(self.robot, self.humans[i], robot1=True):
human_circles[i].set_color(c='g')
else:
human_circles[i].set_color(c='r')
# label numbers on each human
# plt.text(self.humans[i].px - 0.1, self.humans[i].py - 0.1, str(self.humans[i].id), color='black', fontsize=12)
plt.text(self.humans[i].px - 0.1, self.humans[i].py - 0.1, i, color='black', fontsize=12)
plt.pause(0.1)
for item in artists:
item.remove() # there should be a better way to do this. For example,
# initially use add_artist and draw_artist later on
for t in ax.texts:
t.set_visible(False)
================================================
FILE: crowd_sim/envs/crowd_sim_pred.py
================================================
import gym
import numpy as np
from numpy.linalg import norm
import copy
from crowd_sim.envs.utils.action import ActionRot, ActionXY
from crowd_sim.envs.crowd_sim_var_num import CrowdSimVarNum
class CrowdSimPred(CrowdSimVarNum):
"""
The environment for our model with non-neural-network trajectory predictors, including a constant-velocity predictor and a ground-truth predictor
The number of humans at each timestep can change within a range
"""
def __init__(self):
"""
Movement simulation for n+1 agents
Agent can either be human or robot.
humans are controlled by an unknown and fixed policy.
robot is controlled by a known and learnable policy.
"""
super(CrowdSimPred, self).__init__()
self.pred_method = None
self.cur_human_states=None
def configure(self, config):
""" read the config to the environment variables """
super().configure(config)
# 'const_vel', 'truth', or 'inferred'
self.pred_method = config.sim.predict_method
def reset(self, phase='train', test_case=None):
ob = super().reset(phase=phase, test_case=test_case)
return ob
# set observation space and action space
def set_robot(self, robot):
self.robot = robot
# we set the max and min of action/observation space as inf
# clip the action and observation as you need
d = {}
# robot node: px, py, r, gx, gy, v_pref, theta (shape (1, 7))
d['robot_node'] = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(1, 7,), dtype=np.float32)
# only consider all temporal edges (human_num+1) and spatial edges pointing to robot (human_num)
d['temporal_edges'] = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(1, 2,), dtype=np.float32)
d['spatial_edges'] = gym.spaces.Box(low=-np.inf, high=np.inf,
shape=(self.config.sim.human_num + self.config.sim.human_num_range, int(2*(self.predict_steps+1))),
dtype=np.float32)
# number of humans detected at each timestep
d['detected_human_num'] = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(1,), dtype=np.float32)
self.observation_space = gym.spaces.Dict(d)
high = np.inf * np.ones([2, ])
self.action_space = gym.spaces.Box(-high, high, dtype=np.float32)
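# Shape sanity check for the spaces above (config values assumed): with
# config.sim.human_num = 20, human_num_range = 0 and predict_steps = 5,
# 'spatial_edges' has shape (20, 12): one row per possible human, holding
# (px, py) for the current position plus the 5 predicted positions, all
# relative to the robot.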
# reset = True: reset calls this function; reset = False: step calls this function
def generate_ob(self, reset, sort=True):
"""Generate observation for reset and step functions"""
ob = {}
# nodes
visible_humans, num_visibles, self.human_visibility = self.get_num_human_in_fov()
ob['robot_node'] = self.robot.get_full_state_list_noV()
self.prev_human_pos = copy.deepcopy(self.last_human_states)
self.update_last_human_states(self.human_visibility, reset=reset)
# edges
ob['temporal_edges'] = np.array([self.robot.vx, self.robot.vy])
# initialize storage space for max_human_num humans
ob['spatial_edges'] = np.ones((self.config.sim.human_num + self.config.sim.human_num_range, int(2*(self.predict_steps+1)))) * np.inf
# calculate the current and future positions of present humans (first self.human_num humans in ob['spatial_edges'])
predicted_states = self.calc_human_future_traj(method=self.pred_method) # [pred_steps + 1, human_num, 4]
# [human_num, pred_steps + 1, 2]
predicted_pos = np.transpose(predicted_states[:, :, :2], (1, 0, 2)) - np.array([self.robot.px, self.robot.py])
# [human_num, (pred_steps + 1)*2]
ob['spatial_edges'][:self.human_num][self.human_visibility] = predicted_pos.reshape((self.human_num, -1))[self.human_visibility]
# sort all humans by distance (invisible humans will be in the end automatically)
if self.config.args.sort_humans:
ob['spatial_edges'] = np.array(sorted(ob['spatial_edges'], key=lambda x: np.linalg.norm(x[:2])))
ob['spatial_edges'][np.isinf(ob['spatial_edges'])] = 15
ob['detected_human_num'] = num_visibles
# if no human is detected, assume there is one dummy human at (15, 15) to make the pack_padded_sequence work
if ob['detected_human_num'] == 0:
ob['detected_human_num'] = 1
return ob
def step(self, action, update=True):
"""
step function
Compute actions for all agents, detect collision, update environment and return (ob, reward, done, info)
"""
if self.robot.policy.name == 'ORCA':
# assemble observation for orca: px, py, vx, vy, r
# include all observable humans from t to t+t_pred
_, _, human_visibility = self.get_num_human_in_fov()
# [self.predict_steps + 1, self.human_num, 4]
human_states = copy.deepcopy(self.calc_human_future_traj(method='truth'))
# append the radius, convert it to [human_num*(self.predict_steps+1), 5] by treating each predicted pos as a new human
human_states = np.concatenate((human_states.reshape((-1, 4)),
np.tile(self.last_human_states[:, -1], self.predict_steps+1).reshape((-1, 1))),
axis=1)
# get orca action
action = self.robot.act(human_states.tolist())
else:
action = self.robot.policy.clip_action(action, self.robot.v_pref)
if self.robot.kinematics == 'unicycle':
self.desiredVelocity[0] = np.clip(self.desiredVelocity[0] + action.v, -self.robot.v_pref, self.robot.v_pref)
# action.r is the heading change (delta theta) for this step
action = ActionRot(self.desiredVelocity[0], action.r)
if self.record:
self.episodeRecoder.unsmoothed_actions.append(list(action))
action = self.smooth_action(action)
human_actions = self.get_human_actions()
# need to update self.human_future_traj in testing to calculate number of intrusions
if self.phase == 'test':
# use ground truth future positions of humans
self.calc_human_future_traj(method='truth')
# compute reward and episode info
reward, done, episode_info = self.calc_reward(action, danger_zone='future')
if self.record:
self.episodeRecoder.actionList.append(list(action))
self.episodeRecoder.positionList.append([self.robot.px, self.robot.py])
self.episodeRecoder.orientationList.append(self.robot.theta)
if done:
self.episodeRecoder.robot_goal.append([self.robot.gx, self.robot.gy])
self.episodeRecoder.saveEpisode(self.case_counter['test'])
# apply action and update all agents
self.robot.step(action)
for i, human_action in enumerate(human_actions):
self.humans[i].step(human_action)
self.cur_human_states[i] = np.array([self.humans[i].px, self.humans[i].py, self.humans[i].radius])
self.global_time += self.time_step # max episode length=time_limit/time_step
self.step_counter = self.step_counter+1
info={'info':episode_info}
# Add or remove at most self.human_num_range humans
# if self.human_num_range == 0 -> human_num is fixed at all times
if self.human_num_range > 0 and self.global_time % 5 == 0:
# remove humans
if np.random.rand() < 0.5:
# if no human is visible, anyone can be removed
if len(self.observed_human_ids) == 0:
max_remove_num = self.human_num - 1
else:
max_remove_num = (self.human_num - 1) - max(self.observed_human_ids)
remove_num = np.random.randint(low=0, high=min(self.human_num_range, max_remove_num) + 1)
for _ in range(remove_num):
self.humans.pop()
self.human_num = self.human_num - remove_num
self.last_human_states = self.last_human_states[:self.human_num]
# add humans
else:
add_num = np.random.randint(low=0, high=self.human_num_range + 1)
if add_num > 0:
# set human ids
true_add_num = 0
for i in range(self.human_num, self.human_num + add_num):
if i == self.config.sim.human_num + self.human_num_range:
break
self.generate_random_human_position(human_num=1)
self.humans[i].id = i
true_add_num = true_add_num + 1
self.human_num = self.human_num + true_add_num
if true_add_num > 0:
self.last_human_states = np.concatenate((self.last_human_states, np.array([[15, 15, 0, 0, 0.3]]*true_add_num)), axis=0)
# compute the observation
ob = self.generate_ob(reset=False)
# Update all humans' goals randomly midway through episode
if self.random_goal_changing:
if self.global_time % 5 == 0:
self.update_human_goals_randomly()
# Update a specific human's goal once it has reached its original goal
if self.end_goal_changing and not self.record:
for i, human in enumerate(self.humans):
if norm((human.gx - human.px, human.gy - human.py)) < human.radius:
self.humans[i] = self.generate_circle_crossing_human()
self.humans[i].id = i
return ob, reward, done, info
def calc_reward(self, action, danger_zone='future'):
"""Calculate the reward, which includes the functionality reward and social reward"""
# compute the functionality reward, same as the reward function with no prediction
reward, done, episode_info = super().calc_reward(action, danger_zone=danger_zone)
# compute the social reward
# if the robot collides with future states, give it a collision penalty
relative_pos = self.human_future_traj[1:, :, :2] - np.array([self.robot.px, self.robot.py])
# assumes constant radius of each personal zone circle
collision_idx = np.linalg.norm(relative_pos, axis=-1) < self.robot.radius + self.config.humans.radius # [predict_steps, human_num]
coefficients = 2. ** np.arange(2, self.predict_steps + 2).reshape((self.predict_steps, 1)) # e.g., 4, 8, 16, 32 for predict_steps = 4
collision_penalties = self.collision_penalty / coefficients
reward_future = collision_idx * collision_penalties # [predict_steps, human_num]
reward_future = np.min(reward_future)
return reward + reward_future, done, episode_info
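# The social penalty decays geometrically with prediction depth, so a
# predicted collision in the near future costs more than a distant one.
# Illustrative numbers (assumed): with collision_penalty = -20 and
# predict_steps = 4, coefficients = [4, 8, 16, 32] gives per-step penalties
# [-5.0, -2.5, -1.25, -0.625]. np.min then keeps the single worst (most
# negative) value over all predicted steps and humans; steps without a
# predicted collision contribute 0.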
def render(self, mode='human'):
""" Render the current status of the environment using matplotlib """
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
from matplotlib import patches
plt.rcParams['animation.ffmpeg_path'] = '/usr/bin/ffmpeg'
robot_color = 'gold'
goal_color = 'red'
arrow_color = 'red'
arrow_style = patches.ArrowStyle("->", head_length=4, head_width=2)
def calcFOVLineEndPoint(ang, point, extendFactor):
# choose the extendFactor big enough
# so that the endPoints of the FOVLine is out of xlim and ylim of the figure
FOVLineRot = np.array([[np.cos(ang), -np.sin(ang), 0],
[np.sin(ang), np.cos(ang), 0],
[0, 0, 1]])
point.extend([1])
# apply rotation matrix
newPoint = np.matmul(FOVLineRot, np.reshape(point, [3, 1]))
# increase the distance between the line start point and the end point
newPoint = [extendFactor * newPoint[0, 0], extendFactor * newPoint[1, 0], 1]
return newPoint
ax=self.render_axis
artists=[]
# add goal
goal=mlines.Line2D([self.robot.gx], [self.robot.gy], color=goal_color, marker='*', linestyle='None', markersize=15, label='Goal')
ax.add_artist(goal)
artists.append(goal)
# add robot
robotX,robotY=self.robot.get_position()
robot=plt.Circle((robotX,robotY), self.robot.radius, fill=True, color=robot_color)
ax.add_artist(robot)
artists.append(robot)
# plt.legend([robot, goal], ['Robot', 'Goal'], fontsize=16)
# compute orientation in each step and add arrow to show the direction
radius = self.robot.radius
arrowStartEnd=[]
robot_theta = self.robot.theta if self.robot.kinematics == 'unicycle' else np.arctan2(self.robot.vy, self.robot.vx)
arrowStartEnd.append(((robotX, robotY), (robotX + radius * np.cos(robot_theta), robotY + radius * np.sin(robot_theta))))
for i, human in enumerate(self.humans):
theta = np.arctan2(human.vy, human.vx)
arrowStartEnd.append(((human.px, human.py), (human.px + radius * np.cos(theta), human.py + radius * np.sin(theta))))
arrows = [patches.FancyArrowPatch(*arrow, color=arrow_color, arrowstyle=arrow_style)
for arrow in arrowStartEnd]
for arrow in arrows:
ax.add_artist(arrow)
artists.append(arrow)
# draw FOV for the robot
# add robot FOV
if self.robot.FOV < 2 * np.pi:
FOVAng = self.robot_fov / 2
FOVLine1 = mlines.Line2D([0, 0], [0, 0], linestyle='--')
FOVLine2 = mlines.Line2D([0, 0], [0, 0], linestyle='--')
startPointX = robotX
startPointY = robotY
endPointX = robotX + radius * np.cos(robot_theta)
endPointY = robotY + radius * np.sin(robot_theta)
# transform the vector back to world frame origin, apply rotation matrix, and get end point of FOVLine
# the start point of the FOVLine is the center of the robot
FOVEndPoint1 = calcFOVLineEndPoint(FOVAng, [endPointX - startPointX, endPointY - startPointY], 20. / self.robot.radius)
FOVLine1.set_xdata(np.array([startPointX, startPointX + FOVEndPoint1[0]]))
FOVLine1.set_ydata(np.array([startPointY, startPointY + FOVEndPoint1[1]]))
FOVEndPoint2 = calcFOVLineEndPoint(-FOVAng, [endPointX - startPointX, endPointY - startPointY], 20. / self.robot.radius)
FOVLine2.set_xdata(np.array([startPointX, startPointX + FOVEndPoint2[0]]))
FOVLine2.set_ydata(np.array([startPointY, startPointY + FOVEndPoint2[1]]))
ax.add_artist(FOVLine1)
ax.add_artist(FOVLine2)
artists.append(FOVLine1)
artists.append(FOVLine2)
# add an arc of robot's sensor range
sensor_range = plt.Circle(self.robot.get_position(), self.robot.sensor_range + self.robot.radius+self.config.humans.radius, fill=False, linestyle='--')
ax.add_artist(sensor_range)
artists.append(sensor_range)
# add humans and change the color of them based on visibility
human_circles = [plt.Circle(human.get_position(), human.radius, fill=False, linewidth=1.5) for human in self.humans]
# human_circles = [plt.Circle(human.get_position(), 2, fill=False) for human in self.humans]
# hardcoded for now
actual_arena_size = self.arena_size + 0.5
# plot current human states
for i in range(len(self.humans)):
ax.add_artist(human_circles[i])
artists.append(human_circles[i])
# blue: visible; red: invisible
if self.detect_visible(self.robot, self.humans[i], robot1=True):
human_circles[i].set_color(c='b')
else:
human_circles[i].set_color(c='r')
if -actual_arena_size <= self.humans[i].px <= actual_arena_size and -actual_arena_size <= self.humans[
i].py <= actual_arena_size:
# label numbers on each human
# plt.text(self.humans[i].px - 0.1, self.humans[i].py - 0.1, str(self.humans[i].id), color='black', fontsize=12)
plt.text(self.humans[i].px - 0.1, self.humans[i].py - 0.1, i, color='black', fontsize=12)
# save plot for slide show
if self.config.save_slides:
import os
folder_path = os.path.join(self.config.save_path, str(self.rand_seed)+'pred')
if not os.path.isdir(folder_path):
os.makedirs(folder_path, exist_ok = True)
plt.savefig(os.path.join(folder_path, str(self.step_counter)+'.png'), dpi=300)
plt.pause(0.1)
for item in artists:
item.remove() # there should be a better way to do this. For example,
# initially use add_artist and draw_artist later on
for t in ax.texts:
t.set_visible(False)
================================================
FILE: crowd_sim/envs/crowd_sim_pred_real_gst.py
================================================
import gym
import numpy as np
from crowd_sim.envs.crowd_sim_pred import CrowdSimPred
class CrowdSimPredRealGST(CrowdSimPred):
'''
Same as CrowdSimPred, except that the future human trajectories in
'spatial_edges' are dummy placeholders that will be replaced by the outputs
of a real GST prediction model in the wrapper function in vec_pretext_normalize.py
'''
def __init__(self):
"""
Movement simulation for n+1 agents
Agent can either be human or robot.
humans are controlled by an unknown and fixed policy.
robot is controlled by a known and learnable policy.
"""
super(CrowdSimPredRealGST, self).__init__()
self.pred_method = None
# to receive data from gst pred model
self.gst_out_traj = None
def set_robot(self, robot):
"""set observation space and action space"""
self.robot = robot
# we set the max and min of action/observation space as inf
# clip the action and observation as you need
d = {}
# robot node: px, py, r, gx, gy, v_pref, theta (shape (1, 7))
d['robot_node'] = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(1, 7,), dtype=np.float32)
# only consider all temporal edges (human_num+1) and spatial edges pointing to robot (human_num)
d['temporal_edges'] = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(1, 2,), dtype=np.float32)
'''
format of spatial_edges: [max_human_num, [state_t, state_(t+1), ..., state(t+self.pred_steps)]]
'''
# predictions only include mu_x, mu_y (or px, py)
self.spatial_edge_dim = int(2*(self.predict_steps+1))
d['spatial_edges'] = gym.spaces.Box(low=-np.inf, high=np.inf,
shape=(self.config.sim.human_num + self.config.sim.human_num_range, self.spatial_edge_dim),
dtype=np.float32)
# masks for gst pred model
# whether each human is visible to robot (ordered by human ID, should not be sorted)
d['visible_masks'] = gym.spaces.Box(low=-np.inf, high=np.inf,
shape=(self.config.sim.human_num + self.config.sim.human_num_range,),
dtype=bool)  # np.bool is removed in recent NumPy; use the builtin
# number of humans detected at each timestep
d['detected_human_num'] = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(1,), dtype=np.float32)
self.observation_space = gym.spaces.Dict(d)
high = np.inf * np.ones([2, ])
self.action_space = gym.spaces.Box(-high, high, dtype=np.float32)
def talk2Env(self, data):
"""
Call this function when you want extra information to send to/recv from the env
:param data: data sent from the gst_predictor network to the env; it has 2 parts:
the predicted trajectories and the output masks
:return: True means received
"""
self.gst_out_traj=data
return True
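# Illustrative caller-side sketch (assumed shapes; the real call happens in
# the wrapper in rl/vec_env/vec_pretext_normalize.py):
#   pred_traj = gst_model(obs_history)  # hypothetical predictor call
#   assert env.talk2Env(pred_traj)      # returns True once received
#   # afterwards self.gst_out_traj holds the robot-relative predicted
#   # positions consumed by render() below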
# reset = True: reset calls this function; reset = False: step calls this function
def generate_ob(self, reset, sort=False):
"""Generate observation for reset and step functions"""
# since gst pred model needs ID tracking, don't sort all humans
# call CrowdSimVarNum's generate_ob (skipping CrowdSimPred) to avoid computing the true predictions
# sort=False because we will sort in wrapper in vec_pretext_normalize.py later
parent_ob = super(CrowdSimPred, self).generate_ob(reset=reset, sort=False)
# add additional keys, remove unused keys
ob = {}
ob['visible_masks'] = parent_ob['visible_masks']
ob['robot_node'] = parent_ob['robot_node']
ob['temporal_edges'] = parent_ob['temporal_edges']
ob['spatial_edges'] = np.tile(parent_ob['spatial_edges'], self.predict_steps+1)
ob['detected_human_num'] = parent_ob['detected_human_num']
return ob
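# np.tile repeats each (px, py) row of parent_ob['spatial_edges'] so that a
# (max_human_num, 2) array becomes (max_human_num, 2*(predict_steps+1)).
# The repeats are placeholders only; the wrapper in vec_pretext_normalize.py
# overwrites the trailing entries with the GST predictions.
# Illustrative row (assumed predict_steps = 2): [1., 2.] -> [1., 2., 1., 2., 1., 2.]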
def calc_reward(self, action, danger_zone='future'):
# call CrowdSimVarNum's calc_reward (skipping CrowdSimPred) to bypass the social reward calculation
# since we don't know the GST predictions yet
reward, done, episode_info = super(CrowdSimPred, self).calc_reward(action, danger_zone=danger_zone)
return reward, done, episode_info
def render(self, mode='human'):
"""
Render function
Uses the data received via talk2Env to plot the predicted future trajectories of humans
"""
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
from matplotlib import patches
plt.rcParams['animation.ffmpeg_path'] = '/usr/bin/ffmpeg'
robot_color = 'gold'
goal_color = 'red'
arrow_color = 'red'
arrow_style = patches.ArrowStyle("->", head_length=4, head_width=2)
def calcFOVLineEndPoint(ang, point, extendFactor):
# choose the extendFactor big enough
# so that the endPoints of the FOVLine is out of xlim and ylim of the figure
FOVLineRot = np.array([[np.cos(ang), -np.sin(ang), 0],
[np.sin(ang), np.cos(ang), 0],
[0, 0, 1]])
point.extend([1])
# apply rotation matrix
newPoint = np.matmul(FOVLineRot, np.reshape(point, [3, 1]))
# increase the distance between the line start point and the end point
newPoint = [extendFactor * newPoint[0, 0], extendFactor * newPoint[1, 0], 1]
return newPoint
ax=self.render_axis
artists=[]
# add goal
goal=mlines.Line2D([self.robot.gx], [self.robot.gy], color=goal_color, marker='*', linestyle='None', markersize=15, label='Goal')
ax.add_artist(goal)
artists.append(goal)
# add robot
robotX,robotY=self.robot.get_position()
robot=plt.Circle((robotX,robotY), self.robot.radius, fill=True, color=robot_color)
ax.add_artist(robot)
artists.append(robot)
# compute orientation in each step and add arrow to show the direction
radius = self.robot.radius
arrowStartEnd=[]
robot_theta = self.robot.theta if self.robot.kinematics == 'unicycle' else np.arctan2(self.robot.vy, self.robot.vx)
arrowStartEnd.append(((robotX, robotY), (robotX + radius * np.cos(robot_theta), robotY + radius * np.sin(robot_theta))))
for i, human in enumerate(self.humans):
theta = np.arctan2(human.vy, human.vx)
arrowStartEnd.append(((human.px, human.py), (human.px + radius * np.cos(theta), human.py + radius * np.sin(theta))))
arrows = [patches.FancyArrowPatch(*arrow, color=arrow_color, arrowstyle=arrow_style)
for arrow in arrowStartEnd]
for arrow in arrows:
ax.add_artist(arrow)
artists.append(arrow)
# draw FOV for the robot
# add robot FOV
if self.robot.FOV < 2 * np.pi:
FOVAng = self.robot_fov / 2
FOVLine1 = mlines.Line2D([0, 0], [0, 0], linestyle='--')
FOVLine2 = mlines.Line2D([0, 0], [0, 0], linestyle='--')
startPointX = robotX
startPointY = robotY
endPointX = robotX + radius * np.cos(robot_theta)
endPointY = robotY + radius * np.sin(robot_theta)
# transform the vector back to world frame origin, apply rotation matrix, and get end point of FOVLine
# the start point of the FOVLine is the center of the robot
FOVEndPoint1 = calcFOVLineEndPoint(FOVAng, [endPointX - startPointX, endPointY - startPointY], 20. / self.robot.radius)
FOVLine1.set_xdata(np.array([startPointX, startPointX + FOVEndPoint1[0]]))
FOVLine1.set_ydata(np.array([startPointY, startPointY + FOVEndPoint1[1]]))
FOVEndPoint2 = calcFOVLineEndPoint(-FOVAng, [endPointX - startPointX, endPointY - startPointY], 20. / self.robot.radius)
FOVLine2.set_xdata(np.array([startPointX, startPointX + FOVEndPoint2[0]]))
FOVLine2.set_ydata(np.array([startPointY, startPointY + FOVEndPoint2[1]]))
ax.add_artist(FOVLine1)
ax.add_artist(FOVLine2)
artists.append(FOVLine1)
artists.append(FOVLine2)
# add an arc of robot's sensor range
sensor_range = plt.Circle(self.robot.get_position(), self.robot.sensor_range + self.robot.radius+self.config.humans.radius, fill=False, linestyle='--')
ax.add_artist(sensor_range)
artists.append(sensor_range)
# add humans and change the color of them based on visibility
human_circles = [plt.Circle(human.get_position(), human.radius, fill=False, linewidth=1.5) for human in self.humans]
# hardcoded for now
actual_arena_size = self.arena_size + 0.5
# plot the current human states
for i in range(len(self.humans)):
ax.add_artist(human_circles[i])
artists.append(human_circles[i])
# blue: visible; red: invisible
if self.human_visibility[i]:
human_circles[i].set_color(c='b')
else:
human_circles[i].set_color(c='r')
if -actual_arena_size <= self.humans[i].px <= actual_arena_size and -actual_arena_size <= self.humans[
i].py <= actual_arena_size:
# label numbers on each human
# plt.text(self.humans[i].px - 0.1, self.humans[i].py - 0.1, str(self.humans[i].id), color='black', fontsize=12)
plt.text(self.humans[i].px, self.humans[i].py, i, color='black', fontsize=12)
# plot predicted human positions
for i in range(len(self.humans)):
# add future predicted positions of each human
if self.gst_out_traj is not None:
for j in range(self.predict_steps):
circle = plt.Circle(self.gst_out_traj[i, (2 * j):(2 * j + 2)] + np.array([robotX, robotY]),
self.config.humans.radius, fill=False, color='tab:orange', linewidth=1.5)
# circle = plt.Circle(np.array([robotX, robotY]),
# self.humans[i].radius, fill=False)
ax.add_artist(circle)
artists.append(circle)
plt.pause(0.1)
for item in artists:
item.remove() # there should be a better way to do this. For example,
# initially use add_artist and draw_artist later on
for t in ax.texts:
t.set_visible(False)
================================================
FILE: crowd_sim/envs/crowd_sim_var_num.py
================================================
import gym
import numpy as np
from numpy.linalg import norm
import copy
from crowd_sim.envs.utils.action import ActionRot, ActionXY
from crowd_sim.envs import *
from crowd_sim.envs.utils.info import *
from crowd_sim.envs.utils.human import Human
from crowd_sim.envs.utils.state import JointState
class CrowdSimVarNum(CrowdSim):
"""
The environment for our model without trajectory prediction, or for the baseline models without prediction
The number of humans at each timestep can change within a range
"""
def __init__(self):
"""
Movement simulation for n+1 agents
Agent can either be human or robot.
humans are controlled by an unknown and fixed policy.
robot is controlled by a known and learnable policy.
"""
super().__init__()
self.id_counter = None
self.observed_human_ids = None
self.pred_method = None
def configure(self, config):
""" read the config to the environment variables """
super(CrowdSimVarNum, self).configure(config)
self.action_type=config.action_space.kinematics
# set observation space and action space
def set_robot(self, robot):
self.robot = robot
# we set the max and min of action/observation space as inf
# clip the action and observation as you need
d={}
# robot node: px, py, r, gx, gy, v_pref, theta
d['robot_node'] = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(1,7,), dtype = np.float32)
# only consider all temporal edges (human_num+1) and spatial edges pointing to robot (human_num)
d['temporal_edges'] = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(1, 2,), dtype=np.float32)
d['spatial_edges'] = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(self.max_human_num, 2), dtype=np.float32)
# number of humans detected at each timestep
d['detected_human_num'] = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(1, ), dtype=np.float32)
# whether each human is visible to robot (ordered by human ID, should not be sorted)
d['visible_masks'] = gym.spaces.Box(low=-np.inf, high=np.inf,
shape=(self.max_human_num,),
dtype=bool)  # np.bool is removed in recent NumPy; use the builtin
self.observation_space=gym.spaces.Dict(d)
high = np.inf * np.ones([2, ])
self.action_space = gym.spaces.Box(-high, high, dtype=np.float32)
# set robot initial state and generate all humans for reset function
# for crowd nav: human_num == self.human_num
# for leader follower: human_num = self.human_num - 1
def generate_robot_humans(self, phase, human_num=None):
if self.record:
px, py = 0, 0
gx, gy = 0, -1.5
self.robot.set(px, py, gx, gy, 0, 0, np.pi / 2)
# fill the scene with dummy humans (recording mode uses placeholders only)
for i in range(self.max_human_num):
human = Human(self.config, 'humans')
human.set(15, 15, 15, 15, 0, 0, 0)
human.isObstacle = True
self.humans.append(human)
else:
# for sim2real
if self.robot.kinematics == 'unicycle':
# generate robot
angle = np.random.uniform(0, np.pi * 2)
px = self.arena_size * np.cos(angle)
py = self.arena_size * np.sin(angle)
while True:
gx, gy = np.random.uniform(-self.arena_size, self.arena_size, 2)
if np.linalg.norm([px - gx, py - gy]) >= 4: # minimum start-goal distance (was 6)
break
self.robot.set(px, py, gx, gy, 0, 0, np.random.uniform(0, 2 * np.pi)) # randomize init orientation
# 1 to (config.sim.human_num + human_num_range) humans
self.human_num = np.random.randint(1, self.config.sim.human_num + self.human_num_range + 1)
# print('human_num:', self.human_num)
# self.human_num = 4
# for sim exp
else:
# generate robot
while True:
px, py, gx, gy = np.random.uniform(-self.arena_size, self.arena_size, 4)
if np.linalg.norm([px - gx, py - gy]) >= 8: # minimum start-goal distance (was 6)
break
self.robot.set(px, py, gx, gy, 0, 0, np.pi / 2)
# generate humans
self.human_num = np.random.randint(low=self.config.sim.human_num - self.human_num_range,
high=self.config.sim.human_num + self.human_num_range + 1)
self.generate_random_human_position(human_num=self.human_num)
self.last_human_states = np.zeros((self.human_num, 5))
# set human ids
for i in range(self.human_num):
self.humans[i].id = i
# generate a human that starts on a circle, and its goal is on the opposite side of the circle
def generate_circle_crossing_human(self):
human = Human(self.config, 'humans')
if self.randomize_attributes:
human.sample_random_attributes()
while True:
angle = np.random.random() * np.pi * 2
# add some noise to simulate all the possible cases robot could meet with human
noise_range = 2
px_noise = np.random.uniform(0, 1) * noise_range
py_noise = np.random.uniform(0, 1) * noise_range
px = self.circle_radius * np.cos(angle) + px_noise
py = self.circle_radius * np.sin(angle) + py_noise
collide = False
for i, agent in enumerate([self.robot] + self.humans):
# keep human at least 3 meters away from robot
if self.robot.kinematics == 'unicycle' and i == 0:
min_dist = self.circle_radius / 2 # Todo: if circle_radius <= 4, it will get stuck here
else:
min_dist = human.radius + agent.radius + self.discomfort_dist
if norm((px - agent.px, py - agent.py)) < min_dist or \
norm((px - agent.gx, py - agent.gy)) < min_dist:
collide = True
break
if not collide:
break
human.set(px, py, -px, -py, 0, 0, 0)
return human
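# The loop above is rejection sampling: candidate start positions on the
# circle are re-drawn until they keep at least
# human.radius + agent.radius + discomfort_dist of clearance from every other
# agent's current position and goal (with a larger margin, circle_radius / 2,
# from a unicycle robot); the human's goal is then the antipodal point (-px, -py).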
# calculate the ground truth future trajectory of humans
# if robot is visible: assume linear motion for robot
# ret val: [self.predict_steps + 1, self.human_num, 4]
# method: 'truth' or 'const_vel' or 'inferred'
def calc_human_future_traj(self, method):
# if the robot is invisible, it won't affect human motions
# else it will
human_num = self.human_num + 1 if self.robot.visible else self.human_num
# buffer to store predicted future traj of all humans [px, py, vx, vy]
# [time, human id, features]
if method == 'truth':
self.human_future_traj = np.zeros((self.buffer_len + 1, human_num, 4))
elif method == 'const_vel':
self.human_future_traj = np.zeros((self.predict_steps + 1, human_num, 4))
else:
raise NotImplementedError
# initialize the 0-th position with current states
for i in range(self.human_num):
# use true states for now, to account for invisible humans' influence on visible humans
# take px, py, vx, vy, remove radius
self.human_future_traj[0, i] = np.array(self.humans[i].get_observable_state_list()[:-1])
# if we are using constant velocity model, we need to use displacement to approximate velocity (pos_t - pos_t-1)
# we shouldn't use true velocity for fair comparison with GST inferred pred
if method == 'const_vel':
self.human_future_traj[0, :, 2:4] = self.prev_human_pos[:, 2:4]
# add robot to the end of the array
if self.robot.visible:
self.human_future_traj[0, -1] = np.array(self.robot.get_observable_state_list()[:-1])
if method == 'truth':
for i in range(1, self.buffer_len + 1):
for j in range(self.human_num):
# prepare joint state for all humans
full_state = np.concatenate(
(self.human_future_traj[i - 1, j], self.humans[j].get_full_state_list()[4:]))
observable_states = []
for k in range(self.human_num):
if j == k:
continue
observable_states.append(
np.concatenate((self.human_future_traj[i - 1, k], [self.humans[k].radius])))
# use joint states to get actions from the states in the last step (i-1)
action = self.humans[j].act_joint_state(JointState(full_state, observable_states))
# step all humans with action
self.human_future_traj[i, j] = self.humans[j].one_step_lookahead(
self.human_future_traj[i - 1, j, :2], action)
if self.robot.visible:
action = ActionXY(*self.human_future_traj[i - 1, -1, 2:])
# update px, py, vx, vy
self.human_future_traj[i, -1] = self.robot.one_step_lookahead(self.human_future_traj[i - 1, -1, :2],
action)
# only take predictions every self.pred_interval steps
self.human_future_traj = self.human_future_traj[::self.pred_interval]
# for const vel model
elif method == 'const_vel':
# [self.pred_steps+1, human_num, 4]
self.human_future_traj = np.tile(self.human_future_traj[0].reshape(1, human_num, 4), (self.predict_steps+1, 1, 1))
# [self.pred_steps+1, human_num, 2]
pred_timestep = np.tile(np.arange(0, self.predict_steps+1, dtype=float).reshape((self.predict_steps+1, 1, 1)) * self.time_step * self.pred_interval,
[1, human_num, 2])
pred_disp = pred_timestep * self.human_future_traj[:, :, 2:]
self.human_future_traj[:, :, :2] = self.human_future_traj[:, :, :2] + pred_disp
else:
raise NotImplementedError
# remove the robot if it is visible
if self.robot.visible:
self.human_future_traj = self.human_future_traj[:, :-1]
# remove invisible humans
self.human_future_traj[:, np.logical_not(self.human_visibility), :2] = 15
self.human_future_traj[:, np.logical_not(self.human_visibility), 2:] = 0
return self.human_future_traj
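# Constant-velocity sketch (assumed values): a human at (1.0, 0.0) whose last
# displacement implies vx = 0.5, vy = 0.0 m/s, with time_step = 0.25 s and
# pred_interval = 4, advances 0.5 * 0.25 * 4 = 0.5 m per prediction step, so
# the predicted positions are (1.5, 0.0), (2.0, 0.0), ... for steps 1, 2, ...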
# reset = True: reset calls this function; reset = False: step calls this function
# sort: sort all humans by distance to robot or not
def generate_ob(self, reset, sort=False):
"""Generate observation for reset and step functions"""
ob = {}
# nodes
visible_humans, num_visibles, self.human_visibility = self.get_num_human_in_fov()
ob['robot_node'] = self.robot.get_full_state_list_noV()
prev_human_pos = copy.deepcopy(self.last_human_states)
self.update_last_human_states(self.human_visibility, reset=reset)
# edges
ob['temporal_edges'] = np.array([self.robot.vx, self.robot.vy])
# one row of [relative px, relative py] per human, indexed by human id
all_spatial_edges = np.ones((self.max_human_num, 2)) * np.inf
for i in range(self.human_num):
if self.human_visibility[i]:
# vector pointing from human i to robot
relative_pos = np.array(
[self.last_human_states[i, 0] - self.robot.px, self.last_human_states[i, 1] - self.robot.py])
all_spatial_edges[self.humans[i].id, :2] = relative_pos
ob['visible_masks'] = np.zeros(self.max_human_num, dtype=bool)  # np.bool is removed in recent NumPy
# sort all humans by distance (invisible humans will be in the end automatically)
if sort:
ob['spatial_edges'] = np.array(sorted(all_spatial_edges, key=lambda x: np.linalg.norm(x)))
# after sorting, the visible humans must be in the front
if num_visibles > 0:
ob['visible_masks'][:num_visibles] = True
else:
ob['spatial_edges'] = all_spatial_edges
ob['visible_masks'][:self.human_num] = self.human_visibility
ob['spatial_edges'][np.isinf(ob['spatial_edges'])] = 15
ob['detected_human_num'] = num_visibles
# if no human is detected, assume there is one dummy human at (15, 15) to make the pack_padded_sequence work
if ob['detected_human_num'] == 0:
ob['detected_human_num'] = 1
# update self.observed_human_ids
self.observed_human_ids = np.where(self.human_visibility)[0]
self.ob = ob
return ob
# Update the specified human's end goals in the environment randomly
def update_human_pos_goal(self, human):
# collect all other agents so the new goal can be checked against them
humans_copy = [h for h in self.humans if h != human]
while True:
angle = np.random.random() * np.pi * 2
# add some noise to simulate all the possible cases robot could meet with human
v_pref = 1.0 if human.v_pref == 0 else human.v_pref
gx_noise = (np.random.random() - 0.5) * v_pref
gy_noise = (np.random.random() - 0.5) * v_pref
gx = self.circle_radius * np.cos(angle) + gx_noise
gy = self.circle_radius * np.sin(angle) + gy_noise
# reject goals too close to other agents' positions or goals
# (the original collide flag was never set, making the check a no-op)
collide = False
for agent in [self.robot] + humans_copy:
min_dist = human.radius + agent.radius + self.discomfort_dist
if norm((gx - agent.px, gy - agent.py)) < min_dist or \
norm((gx - agent.gx, gy - agent.gy)) < min_dist:
collide = True
break
if not collide:
break
# Give human new goal
human.gx = gx
human.gy = gy
def reset(self, phase='train', test_case=None):
"""
Reset the environment
:return:
"""
if self.phase is not None:
phase = self.phase
if self.test_case is not None:
test_case=self.test_case
if self.robot is None:
raise AttributeError('robot has to be set!')
assert phase in ['train', 'val', 'test']
if test_case is not None:
self.case_counter[phase] = test_case # test case is passed in to calculate specific seed to generate case
self.global_time = 0
self.step_counter = 0
self.id_counter = 0
self.humans = []
# self.human_num = self.config.sim.human_num
# initialize a list to store observed humans' IDs
self.observed_human_ids = []
# train, val, and test phase should start with different seed.
# case capacity: the maximum number for train(max possible int -2000), val(1000), and test(1000)
# val start from seed=0, test start from seed=case_capacity['val']=1000
# train start from self.case_capacity['val'] + self.case_capacity['test']=2000
counter_offset = {'train': self.case_capacity['val'] + self.case_capacity['test'],
'val': 0, 'test': self.case_capacity['val']}
# here we use a counter to calculate seed. The seed=counter_offset + case_counter
self.rand_seed = counter_offset[phase] + self.case_counter[phase] + self.thisSeed
np.random.seed(self.rand_seed)
self.generate_robot_humans(phase)
# record px, py, r of each human, used for crowd_sim_pc env
self.cur_human_states = np.zeros((self.max_human_num, 3))
for i in range(self.human_num):
self.cur_human_states[i] = np.array([self.humans[i].px, self.humans[i].py, self.humans[i].radius])
# case size is used to make sure that the case_counter is always between 0 and case_size[phase]
self.case_counter[phase] = (self.case_counter[phase] + int(1*self.nenv)) % self.case_size[phase]
# initialize potential and angular potential
rob_goal_vec = np.array([self.robot.gx, self.robot.gy]) - np.array([self.robot.px, self.robot.py])
self.potential = -abs(np.linalg.norm(rob_goal_vec))
self.angle = np.arctan2(rob_goal_vec[1], rob_goal_vec[0]) - self.robot.theta
if self.angle > np.pi:
# self.abs_angle = np.pi * 2 - self.abs_angle
self.angle = self.angle - 2 * np.pi
elif self.angle < -np.pi:
self.angle = self.angle + 2 * np.pi
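# The two branches above wrap self.angle into [-pi, pi]; e.g. (assumed
# value) a raw difference of 3*pi/2 becomes 3*pi/2 - 2*pi = -pi/2.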
# get robot observation
ob = self.generate_ob(reset=True, sort=self.config.args.sort_humans)
return ob
def step(self, action, update=True):
"""
Step the environment forward for one timestep
Compute actions for all agents, detect collision, update environment and return (ob, reward, done, info)
"""
if self.robot.policy.name in ['ORCA', 'social_force']:
# assemble observation for orca: px, py, vx, vy, r
human_states = copy.deepcopy(self.last_human_states)
# get orca action
action = self.robot.act(human_states.tolist())
else:
action = self.robot.policy.clip_action(action, self.robot.v_pref)
if self.robot.kinematics == 'unicycle':
self.desiredVelocity[0] = np.clip(self.desiredVelocity[0] + action.v, -self.robot.v_pref, self.robot.v_pref)
action = ActionRot(self.desiredVelocity[0], action.r)
human_actions = self.get_human_actions()
# need to update self.human_future_traj in testing to calculate number of intrusions
if self.phase == 'test':
# use ground truth future positions of humans
self.calc_human_future_traj(method='truth')
# compute reward and episode info
reward, done, episode_info = self.calc_reward(action, danger_zone='future')
# apply action and update all agents
self.robot.step(action)
for i, human_action in enumerate(human_actions):
self.humans[i].step(human_action)
self.global_time += self.time_step # max episode length=time_limit/time_step
self.step_counter =self.step_counter+1
info={'info':episode_info}
# Add or remove at most self.human_num_range humans
# if self.human_num_range == 0 -> human_num is fixed at all times
if self.human_num_range > 0 and self.global_time % 5 == 0:
# remove humans
if np.random.rand() < 0.5:
# print('before:', self.human_num,', self.min_human_num:', self.min_human_num)
# if no human is visible, anyone can be removed
if len(self.observed_human_ids) == 0:
max_remove_num = self.human_num - self.min_human_num
# print('max_remove_num, invisible', max_remove_num)
else:
max_remove_num = min(self.human_num - self.min_human_num, (self.human_num - 1) - max(self.observed_human_ids))
# print('max_remove_num, visible', max_remove_num)
remove_num = np.random.randint(low=0, high=max_remove_num + 1)
for _ in range(remove_num):
self.humans.pop()
self.human_num = self.human_num - remove_num
# print('after:', self.human_num)
self.last_human_states = self.last_human_states[:self.human_num]
# add humans
else:
add_num = np.random.randint(low=0, high=self.human_num_range + 1)
if add_num > 0:
# set human ids
true_add_num = 0
for i in range(self.human_num, self.human_num + add_num):
if i == self.config.sim.human_num + self.human_num_range:
break
self.generate_random_human_position(human_num=1)
self.humans[i].id = i
true_add_num = true_add_num + 1
self.human_num = self.human_num + true_add_num
if true_add_num > 0:
self.last_human_states = np.concatenate((self.last_human_states, np.array([[15, 15, 0, 0, 0.3]]*true_add_num)), axis=0)
assert self.min_human_num <= self.human_num <= self.max_human_num
# compute the observation
ob = self.generate_ob(reset=False, sort=self.config.args.sort_humans)
# Update all humans' goals randomly midway through episode
if self.random_goal_changing:
if self.global_time % 5 == 0:
self.update_human_goals_randomly()
# Update a specific human's goal once it has reached its original goal
if self.end_goal_changing:
for i, human in enumerate(self.humans):
if norm((human.gx - human.px, human.gy - human.py)) < human.radius:
if self.robot.kinematics == 'holonomic':
self.humans[i] = self.generate_circle_crossing_human()
self.humans[i].id = i
else:
self.update_human_goal(human)
return ob, reward, done, info
# find R(s, a)
# danger_zone: how to define the personal_zone (if the robot intrudes into this zone, the info will be Danger)
# circle (traditional) or future (based on true future traj of humans)
def calc_reward(self, action, danger_zone='circle'):
# collision detection
dmin = float('inf')
danger_dists = []
collision = False
# collision check with humans
for i, human in enumerate(self.humans):
dx = human.px - self.robot.px
dy = human.py - self.robot.py
closest_dist = (dx ** 2 + dy ** 2) ** (1 / 2) - human.radius - self.robot.radius
if closest_dist < self.discomfort_dist:
danger_dists.append(closest_dist)
if closest_dist < 0:
collision = True
break
elif closest_dist < dmin:
dmin = closest_dist
# check if reaching the goal
if self.robot.kinematics == 'unicycle':
goal_radius = 0.6
else:
goal_radius = self.robot.radius
reaching_goal = norm(
np.array(self.robot.get_position()) - np.array(self.robot.get_goal_position())) < goal_radius
# use danger_zone to determine the condition for Danger
if danger_zone == 'circle' or self.phase == 'train':
danger_cond = dmin < self.discomfort_dist
min_danger_dist = 0
else:
# if the robot collides with future states, give it a collision penalty
relative_pos = self.human_future_traj[1:, :, :2] - np.array([self.robot.px, self.robot.py])
relative_dist = np.linalg.norm(relative_pos, axis=-1)
collision_idx = relative_dist < self.robot.radius + self.config.humans.radius # [predict_steps, human_num]
danger_cond = np.any(collision_idx)
# if robot is dangerously close to any human, calculate the min distance between robot and its closest human
if danger_cond:
min_danger_dist = np.amin(relative_dist[collision_idx])
else:
min_danger_dist = 0
if self.global_time >= self.time_limit - 1:
reward = 0
done = True
episode_info = Timeout()
elif collision:
reward = self.collision_penalty
done = True
episode_info = Collision()
elif reaching_goal:
reward = self.success_reward
done = True
episode_info = ReachGoal()
elif danger_cond:
# only penalize agent for getting too close if it's visible
# adjust the reward based on FPS
# print(dmin)
reward = (dmin - self.discomfort_dist) * self.discomfort_penalty_factor * self.time_step
done = False
episode_info = Danger(min_danger_dist)
else:
# potential reward
if self.robot.kinematics == 'holonomic':
pot_factor = 2
else:
pot_factor = 3
potential_cur = np.linalg.norm(
np.array([self.robot.px, self.robot.py]) - np.array(self.robot.get_goal_position()))
reward = pot_factor * (-abs(potential_cur) - self.potential)
self.potential = -abs(potential_cur)
done = False
episode_info = Nothing()
# if the robot is near collision/arrival, it should be able to turn a large angle
if self.robot.kinematics == 'unicycle':
# add a rotational penalty
r_spin = -4.5 * action.r ** 2
# add a penalty for going backwards
if action.v < 0:
r_back = -2 * abs(action.v)
else:
r_back = 0.
reward = reward + r_spin + r_back
return reward, done, episode_info
def render(self, mode='human'):
""" Render the current status of the environment using matplotlib """
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
from matplotlib import patches
plt.rcParams['animation.ffmpeg_path'] = '/usr/bin/ffmpeg'
robot_color = 'gold'
goal_color = 'red'
arrow_color = 'red'
arrow_style = patches.ArrowStyle("->", head_length=4, head_width=2)
def calcFOVLineEndPoint(ang, point, extendFactor):
# choose the extendFactor big enough
# so that the endPoints of the FOVLine is out of xlim and ylim of the figure
FOVLineRot = np.array([[np.cos(ang), -np.sin(ang), 0],
[np.sin(ang), np.cos(ang), 0],
[0, 0, 1]])
point.extend([1])
# apply rotation matrix
newPoint = np.matmul(FOVLineRot, np.reshape(point, [3, 1]))
# increase the distance between the line start point and the end point
newPoint = [extendFactor * newPoint[0, 0], extendFactor * newPoint[1, 0], 1]
return newPoint
ax=self.render_axis
artists=[]
# add goal
goal=mlines.Line2D([self.robot.gx], [self.robot.gy], color=goal_color, marker='*', linestyle='None', markersize=15, label='Goal')
ax.add_artist(goal)
artists.append(goal)
# add robot
robotX,robotY=self.robot.get_position()
robot=plt.Circle((robotX,robotY), self.robot.radius, fill=True, color=robot_color)
ax.add_artist(robot)
artists.append(robot)
# plt.legend([robot, goal], ['Robot', 'Goal'], fontsize=16)
# compute orientation in each step and add arrow to show the direction
radius = self.robot.radius
arrowStartEnd=[]
robot_theta = self.robot.theta if self.robot.kinematics == 'unicycle' else np.arctan2(self.robot.vy, self.robot.vx)
arrowStartEnd.append(((robotX, robotY), (robotX + radius * np.cos(robot_theta), robotY + radius * np.sin(robot_theta))))
for i, human in enumerate(self.humans):
theta = np.arctan2(human.vy, human.vx)
arrowStartEnd.append(((human.px, human.py), (human.px + radius * np.cos(theta), human.py + radius * np.sin(theta))))
arrows = [patches.FancyArrowPatch(*arrow, color=arrow_color, arrowstyle=arrow_style)
for arrow in arrowStartEnd]
for arrow in arrows:
ax.add_artist(arrow)
artists.append(arrow)
# draw FOV for the robot
# add robot FOV
if self.robot.FOV < 2 * np.pi:
FOVAng = self.robot_fov / 2
FOVLine1 = mlines.Line2D([0, 0], [0, 0], linestyle='--')
FOVLine2 = mlines.Line2D([0, 0], [0, 0], linestyle='--')
startPointX = robotX
startPointY = robotY
endPointX = robotX + radius * np.cos(robot_theta)
endPointY = robotY + radius * np.sin(robot_theta)
# transform the vector back to world frame origin, apply rotation matrix, and get end point of FOVLine
# the start point of the FOVLine is the center of the robot
FOVEndPoint1 = calcFOVLineEndPoint(FOVAng, [endPointX - startPointX, endPointY - startPointY], 20. / self.robot.radius)
FOVLine1.set_xdata(np.array([startPointX, startPointX + FOVEndPoint1[0]]))
FOVLine1.set_ydata(np.array([startPointY, startPointY + FOVEndPoint1[1]]))
FOVEndPoint2 = calcFOVLineEndPoint(-FOVAng, [endPointX - startPointX, endPointY - startPointY], 20. / self.robot.radius)
FOVLine2.set_xdata(np.array([startPointX, startPointX + FOVEndPoint2[0]]))
FOVLine2.set_ydata(np.array([startPointY, startPointY + FOVEndPoint2[1]]))
ax.add_artist(FOVLine1)
ax.add_artist(FOVLine2)
artists.append(FOVLine1)
artists.append(FOVLine2)
# add an arc of robot's sensor range
sensor_range = plt.Circle(self.robot.get_position(), self.robot.sensor_range + self.robot.radius+self.config.humans.radius, fill=False, linestyle='--')
ax.add_artist(sensor_range)
artists.append(sensor_range)
# add humans and change the color of them based on visibility
human_circles = [plt.Circle(human.get_position(), human.radius, fill=False, linewidth=1.5) for human in self.humans]
# hardcoded for now
actual_arena_size = self.arena_size + 0.5
for i in range(len(self.humans)):
ax.add_artist(human_circles[i])
artists.append(human_circles[i])
# green: visible; red: invisible
# if self.detect_visible(self.robot, self.humans[i], robot1=True):
if self.human_visibility[i]:
human_circles[i].set_color(c='g')
else:
human_circles[i].set_color(c='r')
if self.humans[i].id in self.observed_human_ids:
human_circles[i].set_color(c='b')
plt.text(self.humans[i].px - 0.1, self.humans[i].py - 0.1, str(self.humans[i].id), color='black', fontsize=12)
plt.pause(0.01)
for item in artists:
item.remove() # TODO: removing all artists every frame is wasteful; a better way
# would be to add each artist once and only redraw it with draw_artist afterwards
for t in ax.texts:
t.set_visible(False)
================================================
FILE: crowd_sim/envs/crowd_sim_var_num_collect.py
================================================
import gym
import numpy as np
from numpy.linalg import norm
import copy
from crowd_sim.envs import *
from crowd_sim.envs.utils.info import *
class CrowdSimVarNumCollect(CrowdSimVarNum):
"""
An environment for collecting a dataset of simulated humans to train the GST predictor (used in collect_data.py)
The observation contains all detected humans
A key in ob indicates how many humans are detected
"""
def __init__(self):
"""
Movement simulation for n+1 agents
An agent can be either a human or the robot.
Humans are controlled by an unknown and fixed policy.
The robot is controlled by a known and learnable policy.
"""
super().__init__()
def set_robot(self, robot):
self.robot = robot
# set observation space and action space
# we set the max and min of action/observation space as inf
# clip the action and observation as you need
d={}
# frame id, human prediction id, px, py
d['pred_info'] = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(self.config.sim.human_num + self.config.sim.human_num_range, 4), dtype=np.float32)
self.observation_space=gym.spaces.Dict(d)
high = np.inf * np.ones([2, ])
self.action_space = gym.spaces.Box(-high, high, dtype=np.float32)
def reset(self, phase='train', test_case=None):
"""
Set px, py, gx, gy, vx, vy, theta for robot and humans
:return:
"""
if self.phase is not None:
phase = self.phase
if self.test_case is not None:
test_case=self.test_case
if self.robot is None:
raise AttributeError('robot has to be set!')
assert phase in ['train', 'val', 'test']
if test_case is not None:
self.case_counter[phase] = test_case # test case is passed in to calculate specific seed to generate case
self.global_time = 0
self.id_counter = 0
self.humans = []
# self.human_num = self.config.sim.human_num
# initialize a list to store observed humans' IDs
self.observed_human_ids = []
# train, val, and test phase should start with different seed.
# case capacity: the maximum number for train(max possible int -2000), val(1000), and test(1000)
# val start from seed=0, test start from seed=case_capacity['val']=1000
# train start from self.case_capacity['val'] + self.case_capacity['test']=2000
counter_offset = {'train': self.case_capacity['val'] + self.case_capacity['test'],
'val': 0, 'test': self.case_capacity['val']}
# here we use a counter to calculate seed. The seed=counter_offset + case_counter
np.random.seed(counter_offset[phase] + self.case_counter[phase] + self.thisSeed)
self.rand_seed = counter_offset[phase] + self.case_counter[phase] + self.thisSeed
# print(counter_offset[phase] + self.case_counter[phase] + self.thisSeed)
# np.random.seed(1038)
self.generate_robot_humans(phase)
self.last_human_observability = np.zeros(self.human_num, dtype=bool)
self.human_pred_id = np.arange(0, self.human_num)
self.max_human_id = self.human_num
# case size is used to make sure that the case_counter is always between 0 and case_size[phase]
self.case_counter[phase] = (self.case_counter[phase] + int(1*self.nenv)) % self.case_size[phase]
# get robot observation
ob = self.generate_ob(reset=True)
# initialize potential
self.potential = -abs(np.linalg.norm(np.array([self.robot.px, self.robot.py]) - np.array([self.robot.gx, self.robot.gy])))
return ob
# reset = True: reset calls this function; reset = False: step calls this function
def generate_ob(self, reset, sort=False):
# we should keep human ID tracking for traj pred!
#assert sort == False
ob = {}
# nodes
visible_humans, num_visibles, human_visibility = self.get_num_human_in_fov()
humans_out = np.logical_and(self.last_human_observability, np.logical_not(human_visibility))
num_humans_out = np.sum(humans_out)
self.human_pred_id[humans_out] = np.arange(self.max_human_id, self.max_human_id+num_humans_out)
self.max_human_id = self.max_human_id + num_humans_out
self.update_last_human_states(human_visibility, reset=reset)
# ([relative px, relative py], indexed by human id)
all_spatial_edges = np.ones((self.config.sim.human_num + self.config.sim.human_num_range, 2)) * np.inf
for i in range(self.human_num):
if human_visibility[i]:
all_spatial_edges[self.humans[i].id, :2] = self.last_human_states[i, :2]
frame_array = np.repeat(self.global_time/self.config.data.pred_timestep, self.human_num)
ob['pred_info'] = np.concatenate((frame_array.reshape((self.human_num, 1)),
self.human_pred_id.reshape((self.human_num, 1)), all_spatial_edges), axis=1)
# update self.observed_human_ids
self.observed_human_ids = np.where(human_visibility)[0]
self.ob = ob
self.last_human_observability = copy.deepcopy(human_visibility)
return ob
# find R(s, a)
# danger_zone: how to define the personal_zone (if the robot intrudes into this zone, the info will be Danger)
# circle (traditional) or future (based on true future traj of humans)
def calc_reward(self, action, danger_zone='circle'):
# collision detection
dmin = float('inf')
danger_dists = []
collision = False
for i, human in enumerate(self.humans):
dx = human.px - self.robot.px
dy = human.py - self.robot.py
closest_dist = (dx ** 2 + dy ** 2) ** (1 / 2) - human.radius - self.robot.radius
if closest_dist < self.discomfort_dist:
danger_dists.append(closest_dist)
if closest_dist < 0:
collision = True
# logging.debug("Collision: distance between robot and p{} is {:.2E}".format(i, closest_dist))
break
elif closest_dist < dmin:
dmin = closest_dist
# check if reaching the goal
reaching_goal = norm(
np.array(self.robot.get_position()) - np.array(self.robot.get_goal_position())) < self.robot.radius
if self.global_time >= 40000:
done = True
episode_info = Timeout()
elif collision:
done = False
episode_info = Collision()
elif reaching_goal:
done = False
episode_info = ReachGoal()
# the robot reached its goal; sample a new goal so data collection continues:
# with probability 0.5, set the goal to the median position of all humans
if np.random.uniform(0, 1) < 0.5:
human_pos = np.zeros((self.human_num, 2))
for i, human in enumerate(self.humans):
human_pos[i] = np.array([human.px, human.py])
self.robot.gx, self.robot.gy = np.median(human_pos, axis=0)
# random goal
else:
self.robot.gx, self.robot.gy = np.random.uniform(-self.arena_size, self.arena_size, size=2)
else:
done = False
episode_info = Nothing()
return 0, done, episode_info
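# --------------------------------------------------------------------------
# Minimal sketch (not part of the original file): how a consumer such as
# collect_data.py might turn the 'pred_info' rows emitted by generate_ob()
# into (frame, ped_id, px, py) records for GST training. Rows belonging to
# undetected humans carry np.inf coordinates and are filtered out. All names
# and values below are illustrative only.
if __name__ == '__main__':
    fake_pred_info = np.array([[0., 0., 1.0, 2.0],          # frame 0, ped 0, detected
                               [0., 1., np.inf, np.inf]])   # frame 0, ped 1, not detected
    detected = np.isfinite(fake_pred_info[:, 2])
    records = fake_pred_info[detected]  # columns: frame id, pred id, px, py
    print(records)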
================================================
FILE: crowd_sim/envs/ros_turtlebot2i_env.py
================================================
import gym
import numpy as np
from numpy.linalg import norm
import pandas as pd
import os
# prevent import error if other code is run in conda env
try:
# import ROS related packages
import rospy
import tf2_ros
from geometry_msgs.msg import Twist, TransformStamped, PoseArray
import tf
from sensor_msgs.msg import JointState
from threading import Lock
from message_filters import ApproximateTimeSynchronizer, TimeSynchronizer, Subscriber
except:
pass
import copy
import sys
from crowd_sim.envs.crowd_sim_pred_real_gst import CrowdSimPredRealGST
class rosTurtlebot2iEnv(CrowdSimPredRealGST):
'''
Environment for testing a simulated policy on a real Turtlebot2i
To use it, change the env_name in arguments.py in the tested model folder to 'rosTurtlebot2iEnv-v0'
'''
metadata = {'render.modes': ['human']}
def __init__(self):
super(CrowdSimPredRealGST, self).__init__()
# subscriber callback function will change these two variables
self.robotMsg=None # robot state message
self.humanMsg=None # human state message
self.jointMsg=None # joint state message
self.currentTime=0.0
self.lastTime=0.0 # store time for calculating time interval
self.human_visibility=None
self.current_human_states = None # (px,py)
self.detectedHumanNum=0
# goal positions will be set manually in self.reset()
self.goal_x = 0.0
self.goal_y = 0.0
self.last_left = 0.
self.last_right = 0.
self.last_w = 0.0
self.jointVel=None
# to calculate vx, vy
self.last_v = 0.0
self.desiredVelocity=[0.0,0.0]
self.mutex = Lock()
def configure(self, config):
super().configure(config)
# kalman filter for human tracking
if config.sim.predict_method == 'none':
self.add_pred = False
else:
self.add_pred = True
# define ob space and action space
self.set_ob_act_space()
# ROS
rospy.init_node('ros_turtlebot2i_env_node', anonymous=True)
self.actionPublisher = rospy.Publisher('/cmd_vel_mux/input/navi', Twist, queue_size=1)
self.tfBuffer = tf2_ros.Buffer()
self.transformListener = tf2_ros.TransformListener(self.tfBuffer)
# ROS subscriber
jointStateSub = Subscriber("/joint_states", JointState)
humanStatesSub = Subscriber('/dr_spaam_detections', PoseArray) # human px, py, visible
if self.use_dummy_detect:
subList = [jointStateSub]
else:
subList = [jointStateSub, humanStatesSub]
# synchronize the robot base joint states and human detections with at most 1 second of difference
self.ats = ApproximateTimeSynchronizer(subList, queue_size=1, slop=1)
# if ignore sensor inputs and use fake human detections
if self.use_dummy_detect:
self.ats.registerCallback(self.state_cb_dummy)
else:
self.ats.registerCallback(self.state_cb)
rospy.on_shutdown(self.shutdown)
def set_robot(self, robot):
self.robot = robot
def set_ob_act_space(self):
# set observation space and action space
# we set the max and min of action/observation space as inf
# clip the action and observation as you need
d = {}
# robot node: num_visible_humans, px, py, r, gx, gy, v_pref, theta
d['robot_node'] = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(1, 7,), dtype=np.float32)
# only consider all temporal edges (human_num+1) and spatial edges pointing to robot (human_num)
d['temporal_edges'] = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(1, 2,), dtype=np.float32)
# add prediction
if self.add_pred:
self.spatial_edge_dim = int(2 * (self.predict_steps + 1))
d['spatial_edges'] = gym.spaces.Box(low=-np.inf, high=np.inf,
shape=(self.config.sim.human_num + self.config.sim.human_num_range,
self.spatial_edge_dim),
dtype=np.float32)
# no prediction
else:
d['spatial_edges'] = gym.spaces.Box(low=-np.inf, high=np.inf,
shape=(self.config.sim.human_num + self.config.sim.human_num_range, 2),
dtype=np.float32)
# masks for gst pred model
# whether each human is visible to robot
d['visible_masks'] = gym.spaces.Box(low=-np.inf, high=np.inf,
shape=(self.config.sim.human_num + self.config.sim.human_num_range,),
dtype=np.bool)
# number of humans detected at each timestep
d['detected_human_num'] = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(1,), dtype=np.float32)
self.observation_space = gym.spaces.Dict(d)
high = np.inf * np.ones([2, ])
self.action_space = gym.spaces.Box(-high, high, dtype=np.float32)
# (used if self.use_dummy_detect is False)
# callback function to store the realtime messages from the robot to this env
def state_cb(self, jointStateMsg, humanArrayMsg):
self.humanMsg=humanArrayMsg.poses
self.jointMsg=jointStateMsg
# (used if self.use_dummy_detect is True)
# callback function to store the realtime messages from the robot to this env
# no real human message is needed
def state_cb_dummy(self, jointStateMsg):
self.jointMsg = jointStateMsg
def readMsg(self):
"""
read messages passed through ROS & prepare for generating observations
this function should be called right before generate_ob() is called
"""
self.mutex.acquire()
# get time
# print(self.jointMsg.header.stamp.secs, self.jointMsg.header.stamp.nsecs)
self.currentTime = self.jointMsg.header.stamp.secs + self.jointMsg.header.stamp.nsecs / 1e9
# get robot pose from T265 SLAM camera
try:
self.robotMsg = self.tfBuffer.lookup_transform('t265_odom_frame', 't265_pose_frame', rospy.Time.now(), rospy.Duration(1.0))
except:
print("problem in getting transform")
# get robot base velocity from the base
try:
self.jointVel=self.jointMsg.velocity
except:
print("problem in getting joint velocity")
# print(self.robotMsg, "ROBOT mSG")
# store the robot pose and robot base velocity in self variables
self.robot.px = -self.robotMsg.transform.translation.y
self.robot.py = self.robotMsg.transform.translation.x
print('robot pos:', self.robot.px, self.robot.py)
quaternion = (
self.robotMsg.transform.rotation.x,
self.robotMsg.transform.rotation.y,
self.robotMsg.transform.rotation.z,
self.robotMsg.transform.rotation.w
)
if self.use_dummy_detect:
self.detectedHumanNum = 1
self.human_visibility = np.zeros((self.max_human_num,), dtype=np.bool)
else:
# human states
self.detectedHumanNum=min(len(self.humanMsg), self.max_human_num)
self.current_human_states_raw = np.ones((self.detectedHumanNum, 2)) * 15
self.human_visibility=np.zeros((self.max_human_num, ), dtype=np.bool)
for i in range(self.detectedHumanNum):
# use hard coded map to filter out obstacles
global_x = self.robot.px + self.humanMsg[i].position.x
global_y = self.robot.py + self.humanMsg[i].position.y
# if -2.5 < global_x < 2.5 and -2.5 < global_y < 2.5:
if True:
self.current_human_states_raw[i,0]=self.humanMsg[i].position.x
self.current_human_states_raw[i,1] = self.humanMsg[i].position.y
self.mutex.release()
# robot orientation
self.robot.theta = tf.transformations.euler_from_quaternion(quaternion)[2] + np.pi / 2
if self.robot.theta < 0:
self.robot.theta = self.robot.theta + 2 * np.pi
# add 180 degrees because of the transform from lidar frame to t265 camera frame
hMatrix = np.array([[np.cos(self.robot.theta+np.pi), -np.sin(self.robot.theta+np.pi), 0, 0],
[np.sin(self.robot.theta+np.pi), np.cos(self.robot.theta+np.pi), 0, 0],
[0,0,1,0], [0,0,0,1]])
# if we detected at least one person
self.current_human_states = np.ones((self.max_human_num, 2)) * 15
if not self.use_dummy_detect:
for j in range(self.detectedHumanNum):
xy=np.matmul(hMatrix,np.array([[self.current_human_states_raw[j,0],
self.current_human_states_raw[j,1],
0,
1]]).T)
self.current_human_states[j]=xy[:2,0]
else:
self.current_human_states[0] = np.array([0, 1]) - np.array([self.robot.px, self.robot.py])
self.robot.vx = self.last_v * np.cos(self.robot.theta)
self.robot.vy = self.last_v * np.sin(self.robot.theta)
def reset(self):
"""
Reset function
"""
# stop the turtlebot
self.smoothStop()
self.step_counter=0
self.currentTime=0.0
self.lastTime=0.0
self.global_time = 0.
self.human_visibility = np.zeros((self.max_human_num,), dtype=np.bool)
self.detectedHumanNum=0
self.current_human_states = np.ones((self.max_human_num, 2)) * 15
self.desiredVelocity = [0.0, 0.0]
self.last_left = 0.
self.last_right = 0.
self.last_w = 0.0
self.last_v = 0.0
while True:
a = input("Press y for the next episode \t")
if a == "y":
self.robot.gx=float(input("Input goal location in x-axis\t"))
self.robot.gy=float(input("Input goal location in y-axis\t"))
break
else:
sys.exit()
if self.record:
self.episodeRecoder.robot_goal.append([self.robot.gx, self.robot.gy])
self.readMsg()
ob=self.generate_ob() # generate initial obs
return ob
# input: v, w
# output: v, w
def smooth(self, v, w):
beta = 0.1 # TODO: the simulator uses 0.2
left = (2. * v - 0.23 * w) / (2. * 0.035)
right = (2. * v + 0.23 * w) / (2. * 0.035)
# print('199:', left, right)
left = np.clip(left, -17.5, 17.5)
right = np.clip(right, -17.5, 17.5)
# print('202:', left, right)
left = (1.-beta) * self.last_left + beta * left
right = (1.-beta) * self.last_right + beta * right
# print('205:', left, right)
self.last_left = copy.deepcopy(left)
self.last_right = copy.deepcopy(right)
v_smooth = 0.035 / 2 * (left + right)
w_smooth = 0.035 / 0.23 * (right - left)
return v_smooth, w_smooth
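# Note on smooth() (comment added for clarity; the constants are from the code
# above, their physical meaning is our assumption): with a wheel radius of
# ~0.035 m and a wheel separation of ~0.23 m, (v, w) is converted to left/right
# wheel speeds via wheel = (2*v -/+ 0.23*w) / (2*0.035), each wheel speed is
# clipped to +/-17.5 rad/s, low-pass filtered with an exponential moving
# average (factor beta), and converted back with
#   v = 0.035/2 * (left + right)  and  w = 0.035/0.23 * (right - left).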
def generate_ob(self):
ob = {}
ob['robot_node'] = np.array([[self.robot.px, self.robot.py, self.robot.radius,
self.robot.gx, self.robot.gy, self.robot.v_pref, self.robot.theta]])
ob['temporal_edges']=np.array([[self.robot.vx, self.robot.vy]])
# print(self.current_human_states.shape)
spatial_edges=self.current_human_states
if self.add_pred:
# predicted steps will be filled in the vec_pretext_normalize wrapper
spatial_edges=np.concatenate([spatial_edges, np.zeros((self.max_human_num, 2 * self.predict_steps))], axis=1)
else:
# sort humans by distance to robot
spatial_edges = np.array(sorted(spatial_edges, key=lambda x: np.linalg.norm(x)))
print(spatial_edges)
ob['spatial_edges'] = spatial_edges
ob['visible_masks'] = self.human_visibility
ob['detected_human_num'] = self.detectedHumanNum
if ob['detected_human_num'] == 0:
ob['detected_human_num'] = 1
print(ob['detected_human_num'])
return ob
def step(self, action, update=True):
""" Step function """
print("Step", self.step_counter)
# process action
realAction = Twist()
if self.load_act: # load action from file for robot dynamics checking
v_unsmooth= self.episodeRecoder.v_list[self.step_counter]
# in the simulator we use and record delta theta. We convert it to omega by dividing it by the time interval
w_unsmooth = self.episodeRecoder.delta_theta_list[self.step_counter] / self.delta_t
# v_smooth, w_smooth = self.desiredVelocity[0], self.desiredVelocity[1]
v_smooth, w_smooth = self.smooth(v_unsmooth, w_unsmooth)
else:
action = self.robot.policy.clip_action(action, None)
self.desiredVelocity[0] = np.clip(self.desiredVelocity[0] + action.v, -self.robot.v_pref, self.robot.v_pref)
self.desiredVelocity[1] = action.r / self.fixed_time_interval # TODO: dynamic time step is not supported now
v_smooth, w_smooth = self.smooth(self.desiredVelocity[0], self.desiredVelocity[1])
self.last_v = v_smooth
realAction.linear.x = v_smooth
realAction.angular.z = w_smooth
self.actionPublisher.publish(realAction)
rospy.sleep(self.ROSStepInterval) # act as frame skip
# get the latest states
self.readMsg()
# update time
if self.step_counter==0: # if it is the first step of the episode
self.delta_t = np.inf
else:
# time interval between two steps
if self.use_fixed_time_interval:
self.delta_t=self.fixed_time_interval
else: self.delta_t = self.currentTime - self.lastTime
print('delta_t:', self.currentTime - self.lastTime)
#print('actual delta t:', currentTime - self.baseEnv.lastTime)
self.global_time = self.global_time + self.delta_t
self.step_counter=self.step_counter+1
self.lastTime = self.currentTime
# generate new observation
ob=self.generate_ob()
# calculate reward
reward = 0
# determine if the episode ends
done=False
reaching_goal = norm(np.array([self.robot.gx, self.robot.gy]) - np.array([self.robot.px, self.robot.py])) < 0.6
if self.global_time >= self.time_limit:
done = True
print("Timeout")
elif reaching_goal:
done = True
print("Goal Achieved")
elif self.load_act and self.record:
if self.step_counter >= len(self.episodeRecoder.v_list):
done = True
else:
done = False
info = {'info': None}
if self.record:
self.episodeRecoder.wheelVelList.append(self.jointVel) # this is the calculated wheel velocity, not the measured one
self.episodeRecoder.actionList.append([v_smooth, w_smooth])
self.episodeRecoder.positionList.append([self.robot.px, self.robot.py])
self.episodeRecoder.orientationList.append(self.robot.theta)
if done:
print('Done!')
if self.record:
self.episodeRecoder.saveEpisode(self.case_counter['test'])
return ob, reward, done, info
def render(self, mode='human'):
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
from matplotlib import patches
plt.rcParams['animation.ffmpeg_path'] = '/usr/bin/ffmpeg'
robot_color = 'yellow'
goal_color = 'red'
arrow_color = 'red'
arrow_style = patches.ArrowStyle("->", head_length=4, head_width=2)
def calcFOVLineEndPoint(ang, point, extendFactor):
# choose the extendFactor big enough
# so that the endPoints of the FOVLine is out of xlim and ylim of the figure
FOVLineRot = np.array([[np.cos(ang), -np.sin(ang), 0],
[np.sin(ang), np.cos(ang), 0],
[0, 0, 1]])
point.extend([1])
# apply rotation matrix
newPoint = np.matmul(FOVLineRot, np.reshape(point, [3, 1]))
# increase the distance between the line start point and the end point
newPoint = [extendFactor * newPoint[0, 0], extendFactor * newPoint[1, 0], 1]
return newPoint
ax = self.render_axis
artists = []
# add goal
goal = mlines.Line2D([self.robot.gx], [self.robot.gy], color=goal_color, marker='*', linestyle='None',
markersize=15, label='Goal')
ax.add_artist(goal)
artists.append(goal)
# add robot
robotX, robotY = self.robot.px, self.robot.py
robot = plt.Circle((robotX, robotY), self.robot.radius, fill=True, color=robot_color)
ax.add_artist(robot)
artists.append(robot)
plt.legend([robot, goal], ['Robot', 'Goal'], fontsize=16)
# compute orientation in each step and add arrow to show the direction
radius = 0.1
arrowStartEnd = []
robot_theta = self.robot.theta
arrowStartEnd.append(
((robotX, robotY), (robotX + radius * np.cos(robot_theta), robotY + radius * np.sin(robot_theta))))
arrows = [patches.FancyArrowPatch(*arrow, color=arrow_color, arrowstyle=arrow_style)
for arrow in arrowStartEnd]
for arrow in arrows:
ax.add_artist(arrow)
artists.append(arrow)
# add humans and change the color of them based on visibility
# print(self.current_human_states.shape)
# print(self.track_num)
# render tracked humans
human_circles = [plt.Circle(self.current_human_states[i] + np.array([self.robot.px, self.robot.py]), 0.2, fill=False) \
for i in range(self.max_human_num)]
# render untracked humans
# human_circles = [
# plt.Circle(self.current_human_states_raw[i] + np.array([self.robot.px, self.robot.py]), 0.2, fill=False) for i
# in range(self.detectedHumanNum)]
for i in range(self.max_human_num):
# for i in range(self.detectedHumanNum):
ax.add_artist(human_circles[i])
artists.append(human_circles[i])
# green: visible; red: invisible
if self.human_visibility[i]:
human_circles[i].set_color(c='g')
else:
human_circles[i].set_color(c='r')
plt.pause(0.1)
for item in artists:
item.remove() # TODO: removing all artists every frame is wasteful; a better way
# would be to add each artist once and only redraw it with draw_artist afterwards
for t in ax.texts:
t.set_visible(False)
def shutdown(self):
self.smoothStop()
print("You are stopping the robot!")
self.reset()
def smoothStop(self):
# publish a zero-velocity command to stop the robot
self.actionPublisher.publish(Twist())
================================================
FILE: crowd_sim/envs/utils/__init__.py
================================================
================================================
FILE: crowd_sim/envs/utils/action.py
================================================
from collections import namedtuple
ActionXY = namedtuple('ActionXY', ['vx', 'vy'])
ActionRot = namedtuple('ActionRot', ['v', 'r'])
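# Conventions used by the rest of the repo (documented here for clarity):
#   ActionXY  -- holonomic agents: vx, vy are velocities in the world frame.
#   ActionRot -- unicycle robot: v is the translational speed and r is the
#                change in heading (delta theta) over one time step; see
#                Agent.compute_position() and Agent.step() in agent.py.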
================================================
FILE: crowd_sim/envs/utils/agent.py
================================================
import numpy as np
from numpy.linalg import norm
import abc
import logging
from crowd_nav.policy.policy_factory import policy_factory
from crowd_sim.envs.utils.action import ActionXY, ActionRot
from crowd_sim.envs.utils.state import ObservableState, FullState
class Agent(object):
def __init__(self, config, section):
"""
Base class for robot and human. Have the physical attributes of an agent.
"""
subconfig = config.robot if section == 'robot' else config.humans
self.visible = subconfig.visible
self.v_pref = subconfig.v_pref
self.radius = subconfig.radius
# randomize neighbor_dist of ORCA
if config.env.randomize_attributes:
config.orca.neighbor_dist = np.random.uniform(5, 10)
self.policy = policy_factory[subconfig.policy](config)
self.sensor = subconfig.sensor
self.FOV = np.pi * subconfig.FOV
# for humans: we only have holonomic kinematics; for robot: depends on config
self.kinematics = 'holonomic' if section == 'humans' else config.action_space.kinematics
self.px = None
self.py = None
self.gx = None
self.gy = None
self.vx = None
self.vy = None
self.theta = None
self.time_step = config.env.time_step
self.policy.time_step = config.env.time_step
def print_info(self):
logging.info('Agent is {} and has {} kinematic constraint'.format(
'visible' if self.visible else 'invisible', self.kinematics))
def sample_random_attributes(self):
"""
Sample agent radius and v_pref attribute from certain distribution
:return:
"""
self.v_pref = np.random.uniform(0.5, 1.5)
self.radius = np.random.uniform(0.3, 0.5)
def set(self, px, py, gx, gy, vx, vy, theta, radius=None, v_pref=None):
self.px = px
self.py = py
self.gx = gx
self.gy = gy
self.vx = vx
self.vy = vy
self.theta = theta
if radius is not None:
self.radius = radius
if v_pref is not None:
self.v_pref = v_pref
# self.px, self.py, self.vx, self.vy, self.radius, self.gx, self.gy, self.v_pref, self.theta
def set_list(self, px, py, vx, vy, radius, gx, gy, v_pref, theta):
self.px = px
self.py = py
self.gx = gx
self.gy = gy
self.vx = vx
self.vy = vy
self.theta = theta
self.radius = radius
self.v_pref = v_pref
def get_observable_state(self):
return ObservableState(self.px, self.py, self.vx, self.vy, self.radius)
def get_observable_state_list(self):
return [self.px, self.py, self.vx, self.vy, self.radius]
def get_observable_state_list_noV(self):
return [self.px, self.py, self.radius]
def get_next_observable_state(self, action):
self.check_validity(action)
pos = self.compute_position(action, self.time_step)
next_px, next_py = pos
if self.kinematics == 'holonomic':
next_vx = action.vx
next_vy = action.vy
else:
next_theta = self.theta + action.r
next_vx = action.v * np.cos(next_theta)
next_vy = action.v * np.sin(next_theta)
return ObservableState(next_px, next_py, next_vx, next_vy, self.radius)
def get_full_state(self):
return FullState(self.px, self.py, self.vx, self.vy, self.radius, self.gx, self.gy, self.v_pref, self.theta)
def get_full_state_list(self):
return [self.px, self.py, self.vx, self.vy, self.radius, self.gx, self.gy, self.v_pref, self.theta]
def get_full_state_list_noV(self):
return [self.px, self.py, self.radius, self.gx, self.gy, self.v_pref, self.theta]
# return [self.px, self.py, self.radius, self.gx, self.gy, self.v_pref]
def get_position(self):
return self.px, self.py
def set_position(self, position):
self.px = position[0]
self.py = position[1]
def get_goal_position(self):
return self.gx, self.gy
def get_velocity(self):
return self.vx, self.vy
def set_velocity(self, velocity):
self.vx = velocity[0]
self.vy = velocity[1]
@abc.abstractmethod
def act(self, ob):
"""
Compute state using received observation and pass it to policy
"""
return
def check_validity(self, action):
if self.kinematics == 'holonomic':
assert isinstance(action, ActionXY)
else:
assert isinstance(action, ActionRot)
def compute_position(self, action, delta_t):
self.check_validity(action)
if self.kinematics == 'holonomic':
px = self.px + action.vx * delta_t
py = self.py + action.vy * delta_t
# unicycle
else:
# naive dynamics
# theta = self.theta + action.r * delta_t # if action.r is w
# # theta = self.theta + action.r # if action.r is delta theta
# px = self.px + np.cos(theta) * action.v * delta_t
# py = self.py + np.sin(theta) * action.v * delta_t
# differential drive
epsilon = 0.0001
if abs(action.r) < epsilon:
R = 0
else:
w = action.r/delta_t # action.r is delta theta
R = action.v/w
px = self.px - R * np.sin(self.theta) + R * np.sin(self.theta + action.r)
py = self.py + R * np.cos(self.theta) - R * np.cos(self.theta + action.r)
return px, py
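# Note on the differential-drive branch above: with w = action.r / delta_t and
# turning radius R = v / w, the px/py update is the exact arc integration
# around the instantaneous center of rotation at
# (px - R*sin(theta), py + R*cos(theta)). When |action.r| < epsilon, R is set
# to 0, so the update degenerates to zero displacement for (near-)straight
# motion; this mirrors the original implementation rather than the v*delta_t limit.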
def step(self, action):
"""
Perform an action and update the state
"""
self.check_validity(action)
pos = self.compute_position(action, self.time_step)
self.px, self.py = pos
if self.kinematics == 'holonomic':
self.vx = action.vx
self.vy = action.vy
else:
self.theta = (self.theta + action.r) % (2 * np.pi)
self.vx = action.v * np.cos(self.theta)
self.vy = action.v * np.sin(self.theta)
def one_step_lookahead(self, pos, action):
px, py = pos
self.check_validity(action)
new_px = px + action.vx * self.time_step
new_py = py + action.vy * self.time_step
new_vx = action.vx
new_vy = action.vy
return [new_px, new_py, new_vx, new_vy]
def reached_destination(self):
return norm(np.array(self.get_position()) - np.array(self.get_goal_position())) < self.radius
================================================
FILE: crowd_sim/envs/utils/human.py
================================================
from crowd_sim.envs.utils.agent import Agent
from crowd_sim.envs.utils.state import JointState
class Human(Agent):
# see Agent class in agent.py for details!!!
def __init__(self, config, section):
super().__init__(config, section)
self.isObstacle = False # whether the human is a static obstacle (part of wall) or a moving agent
self.id = None # the human's ID, used for collecting traj data
self.observed_id = -1 # if it is observed, give it a tracking ID
# ob: a list of observable states
def act(self, ob):
"""
The state for human is its full state and all other agents' observable states
:param ob:
:return:
"""
state = JointState(self.get_full_state(), ob)
action = self.policy.predict(state)
return action
# ob: a joint state (ego agent's full state + other agents' observable states)
def act_joint_state(self, ob):
"""
The state for human is its full state and all other agents' observable states
:param ob:
:return:
"""
action = self.policy.predict(ob)
return action
================================================
FILE: crowd_sim/envs/utils/info.py
================================================
class Timeout(object):
def __init__(self):
pass
def __str__(self):
return 'Timeout'
class ReachGoal(object):
def __init__(self):
pass
def __str__(self):
return 'Reaching goal'
class Danger(object):
def __init__(self, min_dist):
self.min_dist = min_dist
def __str__(self):
return 'Too close'
class Collision(object):
def __init__(self):
pass
def __str__(self):
return 'Collision'
class OutRoad(object):
def __init__(self):
pass
def __str__(self):
return 'Out of road'
class Nothing(object):
def __init__(self):
pass
def __str__(self):
return ''
================================================
FILE: crowd_sim/envs/utils/recorder.py
================================================
import pandas as pd
import os
import numpy as np
class Recoder(object):
def __init__(self):
self.unsmoothed_actions = []
self.actionList = [] # recorded in turtlebot_env.step. [trans,rot]
self.wheelVelList = [] # recorded in apply_action(). [left,right]
self.orientationList = [] # recorded in calc_state() [angle]
self.positionList = [] # recorded in calc_state() [x,y]
self.robot_goal = []
self.saveTo='data/unicycle/selfAttn_truePred_noiseActt/record/real_recordings/'
# self.loadFrom = 'data/unicycle/selfAttn_truePred_noiseActt/record/sim_recordings/ep0-1/action.csv'
self.loadFrom = 'data/unicycle/selfAttn_truePred_noiseActt/record/sim_recordings/ep0-1point5/unsmoothed_action.csv'
self.loadedAction=None
self.episodeInitNum=0
def saveEpisode(self,episodeCounter):
savePath=os.path.join(self.saveTo,'ep'+str(self.episodeInitNum+episodeCounter))
#savePath = os.path.join(self.saveTo, 'ep' + str(4))
if not os.path.exists(savePath):
#shutil.rmtree(savePath)
os.makedirs(savePath)
if len(self.actionList) > 0:
action = pd.DataFrame({'trans': np.array(self.actionList)[:,0], 'rot': np.array(self.actionList)[:,1]})
action.to_csv(os.path.join(savePath, 'action.csv'), index=False)
if len(self.wheelVelList) > 0:
wheelVel=pd.DataFrame({'left': np.array(self.wheelVelList)[:,0], 'right': np.array(self.wheelVelList)[:,1]})
wheelVel.to_csv(os.path.join(savePath, 'wheelVel.csv'), index=False)
if len(self.orientationList) > 0:
orientation = pd.DataFrame({'ori': np.array(self.orientationList)})
orientation.to_csv(os.path.join(savePath, 'orientation.csv'), index=False)
if len(self.positionList) > 0:
position = pd.DataFrame({'x': np.array(self.positionList)[:, 0], 'y': np.array(self.positionList)[:, 1]})
position.to_csv(os.path.join(savePath, 'position.csv'), index=False)
if len(self.unsmoothed_actions) > 0:
unsmooth_action = pd.DataFrame({'trans': np.array(self.unsmoothed_actions)[:,0], 'rot': np.array(self.unsmoothed_actions)[:,1]})
unsmooth_action.to_csv(os.path.join(savePath, 'unsmoothed_action.csv'), index=False)
if len(self.robot_goal) > 0:
robot_goal = pd.DataFrame({'x': np.array(self.robot_goal)[:, 0], 'y': np.array(self.robot_goal)[:, 1]})
robot_goal.to_csv(os.path.join(savePath, 'robot_goal.csv'), index=False)
print("csv written")
self.clear()
def loadActions(self):
self.loadedAction = pd.read_csv(self.loadFrom)
self.v_list = self.loadedAction['trans']
self.delta_theta_list = self.loadedAction['rot']
print("Reading actions from", self.loadFrom)
def clear(self):
self.actionList = [] # recorded in turtlebot_env.step. [trans,rot]
self.wheelVelList = [] # recorded in apply_action(). [left,right]
self.orientationList = [] # recorded in calc_state() [angle]
self.positionList = [] # recorded in calc_state() [x,y]
self.unsmoothed_actions = []
self.robot_goal = []
class humanRecoder(object):
def __init__(self, human_num):
self.human_num = human_num
# list of [px, py, radius, v_pref] with length = human_num
self.initPosList = [[] for i in range(self.human_num)]
# list of goal lists with length = human_num
# i-th element is a list of the i-th human's goals in the episode
self.goalPosList = [[] for i in range(self.human_num)]
# update self.initPosList
# human_id: 0~4
# pos: list of [px, py, radius, v_pref]
def addInitPos(self, human_id, px, py, radius, v_pref):
self.initPosList[human_id] = [px, py, radius, v_pref]
# append [px, py] at the end of i-th element of list
def addGoalPos(self, human_id, px, py):
self.goalPosList[human_id].append([px, py])
def loadLists(self, initPos, goalPos):
self.initPosList = initPos
self.goalPosList = goalPos
def getInitList(self):
return self.initPosList
def getGoalList(self):
return self.goalPosList
def getInitPos(self, human_id):
return self.initPosList[human_id]
# get next goal position for human i
# if all goals have already been retrieved, return None
def getNextGoalPos(self, human_id):
# if the list is empty
if not self.goalPosList[human_id]:
return None
return self.goalPosList[human_id].pop(0)
# check whether human i has used up its goals
def goalIsEmpty(self, human_id):
if not self.goalPosList[human_id]:
return True
else:
return False
def clear(self):
self.initPosList = [[] for i in range(self.human_num)]
self.goalPosList = [[] for i in range(self.human_num)]
class jointStateRecoder(object):
def __init__(self, human_num):
self.human_num = human_num
# px, py, r, gx, gy, v_pref, theta, vx, vy
self.robot_s = []
# nested list of [px1, ..., px5]
self.humans_px = []
# nested list of [py1, ..., py5]
self.humans_py = []
# record the random seed for this episode
self.episode_num = []
# data labels:
self.episode_num_label = []
self.traj_ratio = []
self.time_ratio = []
self.idle_time = []
# append a joint state to the lists
# ob should be the ob returned from env.step()
def add_traj(self, seed, ob, srnn):
# if ob is a dictionary from crowd_sim_dict.py
if srnn:
robot_s_noV = ob['robot_node'][0, 0].cpu().numpy()
robot_v = ob['edges'][0, 0].cpu().numpy()
self.robot_s.append(np.concatenate((robot_s_noV, robot_v)))
self.humans_px.append(ob['edges'][0, 1:, 0].cpu().numpy())
self.humans_py.append(ob['edges'][0, 1:, 1].cpu().numpy())
# if ob is a list from crowd_sim.py
else:
ob = ob[0].cpu().numpy()
# px, py, r, gx, gy, v_pref, theta, vx, vy
self.robot_s.append(np.concatenate((ob[1:3], ob[5:10], ob[3:5])))
px_list, py_list = [], []
# human num = (ob.shape[0] - 10) // 5
for i in range((ob.shape[0] - 10) // 5):
px_idx = 5 * i + 10
py_idx = 5 * i + 11
px_list.append(ob[px_idx])
py_list.append(ob[py_idx])
self.humans_px.append(px_list)
self.humans_py.append(py_list)
self.episode_num.append(seed)
# seed: the random seed of this episode from env
# respond_traj_len, respond_time: the total traj length and total timesteps when the user makes the last choice
# traj_len, tot_time: total traj length and timesteps of the whole episode
def add_label(self, seed, respond_traj_len, traj_len, respond_time, tot_time, idle_time):
self.traj_ratio.append(respond_traj_len/traj_len)
self.time_ratio.append(respond_time/tot_time)
self.idle_time.append(idle_time)
self.episode_num_label.append(seed)
def save_to_file(self, directory):
if not os.path.exists(os.path.join('trajectories', directory)):
os.makedirs(os.path.join('trajectories', directory))
# 1. save all trajectories
# add robot data
robot_s = np.array(self.robot_s)
data_dict = {'epi_num': np.array(self.episode_num), 'robot_px': robot_s[:, 0], 'robot_py': robot_s[:, 1],
'robot_r': robot_s[:, 2],
'robot_gx': robot_s[:, 3], 'robot_gy': robot_s[:, 4], 'robot_vpref': robot_s[:, 5],
'robot_theta': robot_s[:, 6], 'robot_vx': robot_s[:, 7], 'robot_vy': robot_s[:, 8]}
# add humans data
humans_px = np.array(self.humans_px)
humans_py = np.array(self.humans_py)
for i in range(self.human_num):
data_dict['human'+str(i)+'px'] = humans_px[:, i]
data_dict['human'+str(i)+'py'] = humans_py[:, i]
all_data = pd.DataFrame(data_dict)
all_data.to_csv(os.path.join('trajectories', directory, 'states.csv'), index=False)
# 2. save all labels
data_dict = {'epi_num': np.array(self.episode_num_label), 'traj_ratio': np.array(self.traj_ratio),
'time_ratio': np.array(self.time_ratio), 'idle_time': np.array(self.idle_time)}
all_data = pd.DataFrame(data_dict)
all_data.to_csv(os.path.join('trajectories', directory, 'labels.csv'), index=False)
self.clear()
def clear(self):
# px, py, r, gx, gy, v_pref, theta, vx, vy
self.robot_s = []
# nested list of [px1, ..., px5]
self.humans_px = []
# nested list of [py1, ..., py5]
self.humans_py = []
# record the random seed for this episode
self.episode_num = []
# data labels:
self.episode_num_label = []
self.traj_ratio = []
self.time_ratio = []
self.idle_time = []
================================================
FILE: crowd_sim/envs/utils/robot.py
================================================
from crowd_sim.envs.utils.agent import Agent
from crowd_sim.envs.utils.state import JointState
class Robot(Agent):
def __init__(self, config,section):
super().__init__(config,section)
self.sensor_range = config.robot.sensor_range
def act(self, ob):
if self.policy is None:
raise AttributeError('Policy attribute has to be set!')
state = JointState(self.get_full_state(), ob)
action = self.policy.predict(state)
return action
def actWithJointState(self,ob):
action = self.policy.predict(ob)
return action
================================================
FILE: crowd_sim/envs/utils/state.py
================================================
from collections import namedtuple
import numpy as np
FullState = namedtuple('FullState', ['px', 'py', 'vx', 'vy', 'radius', 'gx', 'gy', 'v_pref', 'theta'])
ObservableState = namedtuple('ObservableState', ['px', 'py', 'vx', 'vy', 'radius'])
# JointState has 2 attributes:
# self.self_state is a FullState
# self.human_states is a list of ObservableStates
class JointState(object):
# self_state: list of length 9
# human_states: list of length human_num*5 or nested list [human_num, 5]
def __init__(self, self_state, human_states):
assert len(self_state) == 9
human_states_namedtuple = []
# if human states is a nested list [human_num, 5]
if len(np.shape(human_states)) == 2:
for human_state in human_states:
assert len(human_state) == 5
human_states_namedtuple.append(ObservableState(*human_state))
# if human states is a flattened list of length human_num*5
else:
assert len(human_states) % 5 == 0
human_num = len(human_states) // 5
for i in range(human_num):
human_states_namedtuple.append(ObservableState(*human_states[int(i*5):(int((i+1)*5))]))
self.self_state = FullState(*self_state)
self.human_states = human_states_namedtuple
# convert a joint state to a flattened list of length 9 + 5 * human_num
def to_flatten_list(self):
flatten_list = list(self.self_state)
for human_state in self.human_states:
flatten_list.extend(list(human_state))
return flatten_list
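# Minimal usage sketch (illustrative, not part of the original file):
if __name__ == '__main__':
    robot_full = [0., 0., 0.5, 0., 0.3, 4., 4., 1., 0.]       # px, py, vx, vy, radius, gx, gy, v_pref, theta
    humans = [[1., 1., 0., 0., 0.3], [2., -1., 0., 0., 0.3]]  # px, py, vx, vy, radius per human
    js = JointState(robot_full, humans)
    flat = js.to_flatten_list()
    assert len(flat) == 9 + 5 * len(humans)
    # rebuild the same joint state from the flattened representation
    js2 = JointState(flat[:9], flat[9:])
    assert js2.human_states == js.human_states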
================================================
FILE: gst_updated/.gitignore
================================================
results/
datasets/
logs/
myenv/
*.pickle
*.png
*.pt
================================================
FILE: gst_updated/LICENSE
================================================
MIT License
Copyright (c) 2021 Zhe Huang
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: gst_updated/README-old.md
================================================
# gst
Gumbel Social Transformer.
This is the implementation for the paper
### Learning Sparse Interaction Graphs of Partially Observed Pedestrians for Trajectory Prediction
GST is the abbreviation of our model Gumbel Social Transformer. All code was developed and tested on Ubuntu 18.04 with CUDA 10.2, Python 3.6.9, and PyTorch 1.7.1.
### Setup
##### 1. Create a Virtual Environment. (Optional)
```
virtualenv -p /usr/bin/python3 myenv
source myenv/bin/activate
```
##### 2. Install Packages
You can run either
```
pip install -r requirements.txt
```
or
```
pip install numpy
pip install scipy
pip install matplotlib
pip install tensorboardX
pip install pandas
pip install pyyaml
pip install jupyterlab
pip install torch==1.7.1
pip install tensorflow
```
or, for machines with RTX 3090 GPUs,
```
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
```
##### 3. Create Folders and Dataset Files.
```
sh scripts/make_dirs.sh
sh scripts/data/create_batch_datasets_eth_ucy.sh
python scripts/data/create_batch_datasets_sdd.py
```
### Training and Evaluation on Various Configurations
To train and evaluate a model with n=1, i.e., the target pedestrian pays attention to at most one partially observed pedestrian, run
```
sh scripts/train_sparse.sh
sh scripts/eval_sparse.sh
```
To train and evaluate a model with n=1 and temporal component as a temporal convolution network, run
```
sh scripts/train_sparse_tcn.sh
sh scripts/eval_sparse_tcn.sh
```
To train and evaluate a model with full connection, i.e., the target pedestrian pays attention to all partially observed pedestrians in the scene, run
```
sh scripts/train_full_connection.sh
sh scripts/eval_full_connection.sh
```
To train and evaluate a model in which the target pedestrian pays attention to all fully observed pedestrians in the scene, run
```
sh scripts/train_full_connection_fully_observed.sh
sh scripts/eval_full_connection_fully_observed.sh
```
### Important Arguments for Building Customized Configurations
- `--spatial_num_heads_edges`: n, i.e., the upper bound on the number of pedestrians that the target pedestrian can pay attention to in the scene. When n=0, it is defined as full connection, i.e., the target pedestrian pays attention to all pedestrians in the scene. Default is 4.
- `--only_observe_full_period`: The target pedestrian only pays attention to fully observed pedestrians. Default is False.
- `--temporal`: Temporal component. `lstm` is Masked LSTM, and `temporal_convolution_net` is temporal convolution network. Default is `lstm`.
- `--decode_style`: Decoding style. It has to match the option `--temporal`. `recursive` matches `lstm`, and `readout` matches `temporal_convolution_net`. Default is `recursive`.
- `--ghost`: Add a ghost pedestrian to the scene to encourage sparsity. When `--spatial_num_heads_edges` is set to zero, i.e., the target pedestrian pays attention to all pedestrians in the scene, `--ghost` has to be set to False. Default is False.
================================================
FILE: gst_updated/README.md
================================================
# gst
Gumbel Social Transformer trained with data provided by Shuijing in November 2021.
## How to set up
##### 1. Create a Virtual Environment.
```
virtualenv -p /usr/bin/python3 myenv
source myenv/bin/activate
```
##### 2. Install Packages
You can run either
```
pip install -r requirements.txt
```
or
```
pip install numpy
pip install scipy
pip install matplotlib
pip install tensorboardX
pip install pandas
pip install pyyaml
pip install jupyterlab
pip install torch==1.7.1
pip install tensorflow
```
or, for machines with RTX 3090 GPUs,
```
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
```
##### 3. Create Folders and download datasets and pretrained models.
```
sh run/make_dirs.sh
sh run/download_datasets_models.sh
```
## How to run
##### 1. Test pretrained models.
```
source myenv/bin/activate
sh tuning/211209-test_shuijing.sh
```
You should see output similar to the following:
```
--------------------------------------------------
dataset: sj
No ghost version.
new gst
new st model
Loaded configuration: 100-gumbel_social_transformer-faster_lstm-lr_0.001-init_temp_0.5-edge_head_0-ebd_64-snl_1-snh_8-seed_1000
Test loss from loaded model: -4.0132044252296755
dataset: sj | test aoe: 0.1579 | test aoe std: 0.0175 | test foe: 0.3216 | test foe std: 0.0408 | min aoe: 0.1339, min foe: 0.2638
```
##### 2. Test wrapper.
Run
```
cd gst_updated
sh run/download_wrapper_data_model.sh
```
to download the model and demo data for the wrapper. Then run
```
python scripts/wrapper/crowd_nav_interface_parallel.py
```
You should see output similar to the following:
```
INPUT DATA
number of environments: 4
input_traj shape: torch.Size([4, 15, 5, 2])
input_binary_mask shape: torch.Size([4, 15, 5, 1])
No ghost version.
new gst
new st model
LOADED MODEL
device: cuda:0
OUTPUT DATA
output_traj shape: torch.Size([4, 15, 5, 5])
output_binary_mask shape: torch.Size([4, 15, 1])
0.png is created.
1.png is created.
2.png is created.
3.png is created.
```
## Observation dimensions for traj pred (by Shuijing)
As far as I know, the wrapper for the SRNN model does not need to handle multiple timesteps simultaneously, so the time dimension is not mentioned here.
### Output format
The output of the traj pred model (or the input of the planner model) at each timestep t is a 3D float array with size = [number_of_pedestrians, number of prediction steps, 5], where 5 includes [mu_x, mu_y, sigma_x, sigma_y, correlation coefficient].
If the number of parallel environments > 1, the array becomes 4D with size = [number of env, number_of_pedestrians, number of prediction steps, 5].
The order of the dimensions can be swapped, and the dimensions can be combined in row-major order if needed. For example, [number of env, number_of_pedestrians, number of prediction steps * 5] is fine as long as the last dimension is in row-major order.
### Input format
The input format of the traj pred model at each timestep t is a 2D float array with size = [number of pedestrians, 2], where 2 includes [px, py] of each pedestrian.
If needed, the displacements of all pedestrians can also be added. In this case, input size = [number of pedestrians, 4], where 4 includes [px, py, delta_px, delta_py] of each pedestrian.
In addition, there is a binary mask with size = [number of pedestrians, 1] that indicates the presence of each human. If a human with ID = i is present, then mask[i] = True, else mask[i] = False.
If the number of parallel environments > 1, then similarly, the number of environments is added as the first dimension of all the sizes above.
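To make the shapes above concrete, here is a minimal sketch (illustrative names and dummy values only, not code from this repo):
```
import numpy as np

num_envs, num_peds, pred_steps = 4, 5, 5

# input at timestep t: positions of each pedestrian plus a presence mask
input_traj = np.zeros((num_envs, num_peds, 2))              # [px, py]
input_mask = np.zeros((num_envs, num_peds, 1), dtype=bool)
input_mask[:, :3] = True  # e.g., only the first 3 pedestrians are present

# predictor output: one Gaussian per pedestrian per prediction step;
# last dim = [mu_x, mu_y, sigma_x, sigma_y, correlation coefficient]
output = np.zeros((num_envs, num_peds, pred_steps, 5))

# the last two dimensions may be combined in row-major order if needed
output_flat = output.reshape(num_envs, num_peds, pred_steps * 5)
```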
================================================
FILE: gst_updated/__init__.py
================================================
================================================
FILE: gst_updated/requirements.txt
================================================
absl-py==0.13.0
anyio==3.3.0
argon2-cffi==21.1.0
astunparse==1.6.3
async-generator==1.10
attrs==21.2.0
Babel==2.9.1
backcall==0.2.0
bleach==4.1.0
cached-property==1.5.2
cachetools==4.2.2
certifi==2021.5.30
cffi==1.14.6
charset-normalizer==2.0.4
clang==5.0
contextvars==2.4
cycler==0.10.0
dataclasses==0.8
decorator==5.0.9
defusedxml==0.7.1
entrypoints==0.3
flatbuffers==1.12
gast==0.4.0
google-auth==1.35.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.39.0
h5py==3.1.0
idna==3.2
immutables==0.16
importlib-metadata==4.8.1
ipykernel==5.5.5
ipython==7.16.1
ipython-genutils==0.2.0
jedi==0.18.0
Jinja2==3.0.1
json5==0.9.6
jsonschema==3.2.0
jupyter-client==7.0.2
jupyter-core==4.7.1
jupyter-server==1.10.2
jupyterlab==3.1.10
jupyterlab-pygments==0.1.2
jupyterlab-server==2.7.2
keras==2.6.0
Keras-Preprocessing==1.1.2
kiwisolver==1.3.1
Markdown==3.3.4
MarkupSafe==2.0.1
matplotlib==3.3.4
mistune==0.8.4
nbclassic==0.3.1
nbclient==0.5.4
nbconvert==6.0.7
nbformat==5.1.3
nest-asyncio==1.5.1
notebook==6.4.3
numpy==1.19.5
oauthlib==3.1.1
opt-einsum==3.3.0
packaging==21.0
pandas==1.1.5
pandocfilters==1.4.3
parso==0.8.2
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.3.1
prometheus-client==0.11.0
prompt-toolkit==3.0.20
protobuf==3.17.3
ptyprocess==0.7.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
Pygments==2.10.0
pyparsing==2.4.7
pyrsistent==0.18.0
python-dateutil==2.8.2
pytz==2021.1
PyYAML==5.4.1
pyzmq==22.2.1
requests==2.26.0
requests-oauthlib==1.3.0
requests-unixsocket==0.2.0
rsa==4.7.2
scipy==1.5.4
Send2Trash==1.8.0
six==1.15.0
sniffio==1.2.0
tensorboard==2.6.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.0
tensorboardX==2.4
tensorflow==2.6.0
tensorflow-estimator==2.6.0
termcolor==1.1.0
terminado==0.11.1
testpath==0.5.0
torch==1.7.1
tornado==6.1
traitlets==4.3.3
typing-extensions==3.7.4.3
urllib3==1.26.6
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.2.1
Werkzeug==2.0.1
wrapt==1.12.1
zipp==3.5.0
================================================
FILE: gst_updated/run/create_batch_datasets_eth_ucy.sh
================================================
for dataset in eth hotel univ zara1 zara2; do
python -u scripts/data/create_datasets_eth_ucy.py --dataset $dataset | tee -a logs/create_batch_datasets.txt
python -u scripts/data/create_batch_datasets_eth_ucy.py --dataset $dataset | tee -a logs/create_batch_datasets.txt
rm -rf datasets/eth_ucy/${dataset}/${dataset}_dset_train_trajectories.pt
rm -rf datasets/eth_ucy/${dataset}/${dataset}_dset_val_trajectories.pt
rm -rf datasets/eth_ucy/${dataset}/${dataset}_dset_test_trajectories.pt
done
================================================
FILE: gst_updated/run/create_batch_datasets_self_eth_ucy.sh
================================================
for dataset in eth hotel univ zara1 zara2; do
python -u scripts/data/create_datasets_self_eth_ucy.py --dataset $dataset | tee -a logs/create_batch_datasets.txt
python -u scripts/data/create_batch_datasets_self_eth_ucy.py --dataset $dataset | tee -a logs/create_batch_datasets.txt
rm -rf datasets/self_eth_ucy/${dataset}/${dataset}_dset_train_trajectories.pt
rm -rf datasets/self_eth_ucy/${dataset}/${dataset}_dset_val_trajectories.pt
rm -rf datasets/self_eth_ucy/${dataset}/${dataset}_dset_test_trajectories.pt
done
================================================
FILE: gst_updated/run/create_batch_datasets_synth.sh
================================================
python -u scripts/data/create_datasets_trajnet.py --dataset synth | tee -a logs/create_datasets.txt
python -u scripts/data/create_batch_datasets_trajnet.py --dataset synth | tee -a logs/create_batch_datasets.txt
================================================
FILE: gst_updated/run/download_datasets_models.sh
================================================
cd datasets/shuijing/orca_20humans_fov
cd results/visual
# use curl for large Google Drive files that require a redirect
curl -c /tmp/cookies "https://drive.google.com/uc?export=download&id=17QLGgTcCWatIrF4EE1Gai8BaMbqxz6kk" > /tmp/intermezzo.html
curl -L -b /tmp/cookies "https://drive.google.com$(cat /tmp/intermezzo.html | grep -Po 'uc-download-link" [^>]* href="\K[^"]*' | sed 's/\&amp;/\&/g')" > sj_dset_test_batch_trajectories.pt
cd ../../../results
# use wget for small-sized google drive files
wget -O 211203-model.zip "https://drive.google.com/uc?export=download&id=1ukON4cAwM63gKKlJJwgjcdLf7-4-f1KH"
unzip 211203-model.zip && rm -rf 211203-model.zip
================================================
FILE: gst_updated/run/download_wrapper_data_model.sh
================================================
mkdir -p datasets/wrapper_demo
cd datasets/wrapper_demo
# use wget for small-sized google drive files
wget -O wrapper_demo_data.pt "https://drive.google.com/uc?export=download&id=1Fjc4nFAeqgCTd9UEUsXhJkESqgErNtlA"
cd ../../results
# use wget for small-sized google drive files
wget -O 211222-model.zip "https://drive.google.com/uc?export=download&id=1mm364PGbp7hFHJ8uXN_uZI4Nq_3JIHPj"
unzip 211222-model.zip && rm -rf 211222-model.zip
================================================
FILE: gst_updated/run/make_dirs.sh
================================================
mkdir -p datasets/shuijing/orca_20humans_fov
mkdir results
mkdir logs
================================================
FILE: gst_updated/scripts/experiments/draft.py
================================================
import pathhack
import pickle
import time
from os.path import join, isdir
from os import makedirs
import torch
import numpy as np
from tensorboardX import SummaryWriter
from torch.optim.lr_scheduler import StepLR
from src.mgnn.utils import arg_parse, average_offset_error, final_offset_error, \
negative_log_likelihood, random_rotate_graph, args2writername
from src.gumbel_social_transformer.temperature_scheduler import Temp_Scheduler
from torch.utils.data import DataLoader
from datetime import datetime
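# Draft sanity check (comment added for clarity): the code below verifies that
# constructing a Temp_Scheduler with last_epoch=checkpoint_epoch-1 (as when
# resuming from a checkpoint) produces the same temperature sequence after
# epoch 50 as stepping a freshly constructed scheduler through all epochs and
# keeping only the taus from epoch 51 onward; the printed difference should
# therefore be (numerically) zero if resuming is implemented correctly.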
num_epochs = 100
init_temp = 1.
checkpoint_epoch = 50
temperature_scheduler = Temp_Scheduler(num_epochs, init_temp, init_temp, \
temp_min=0.03, last_epoch=checkpoint_epoch-1)
tau_list = []
for epoch in range(51, num_epochs+1):
tau = temperature_scheduler.step()
tau_list.append(tau)
temperature_scheduler_new = Temp_Scheduler(num_epochs, init_temp, init_temp, temp_min=0.03)
tau_list_new = []
for epoch in range(1, num_epochs+1):
tau = temperature_scheduler_new.step()
if epoch > 50:
tau_list_new.append(tau)
tau_list, tau_list_new = np.array(tau_list), np.array(tau_list_new)
print(tau_list-tau_list_new)
# print(tau_list_new)
# def load_batch_dataset(args, pkg_path, subfolder='train', num_workers=4, shuffle=None):
# result_filename = args.dataset+'_dset_'+subfolder+'_batch_trajectories.pt'
# if args.dataset == 'sdd':
# dataset_folderpath = join(pkg_path, 'datasets/sdd/social_pool_data')
# else:
# dataset_folderpath = join(pkg_path, 'datasets/eth_ucy', args.dataset)
# dset = torch.load(join(dataset_folderpath, result_filename))
# if shuffle is None:
# if subfolder == 'train':
# shuffle = True
# else:
# shuffle = False
# dloader = DataLoader(
# dset,
# batch_size=1,
# shuffle=shuffle,
# num_workers=num_workers)
# return dloader
# def main(args):
# print('\n\n')
# print('-'*50)
# print('arguments: ', args)
# torch.manual_seed(args.random_seed)
# np.random.seed(args.random_seed)
# if args.batch_size != 1:
# raise RuntimeError("Batch size must be 1 for BatchTrajectoriesDataset.")
# if args.dataset == 'sdd' and args.rotation_pattern is not None:
# raise RuntimeError("SDD should not allow rotation since it uses pixels.")
# device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# print('device: ', device)
# loader_train = load_batch_dataset(args, pathhack.pkg_path, subfolder='train')
# if args.dataset == 'sdd':
# loader_val = load_batch_dataset(args, pathhack.pkg_path, subfolder='test') # no val for sdd
# else:
# loader_val = load_batch_dataset(args, pathhack.pkg_path, subfolder='val')
# train_data_loaders = [loader_train, loader_val]
# print('dataset: ', args.dataset)
# writername = args2writername(args)
# print('Config: ', writername)
# logdir = join(pathhack.pkg_path, 'results', writername, args.dataset)
# if isdir(logdir) and not args.resume_training:
# print('Error: The result directory was already created and used.')
# print('-'*50)
# print('\n\n')
# return
# writer = SummaryWriter(logdir=logdir)
# print('-'*50)
# print('\n\n')
# train(args, train_data_loaders, writer, logdir, device=device)
# writer.close()
# def load_checkpoint(args, logdir, model, optimizer, lr_scheduler):
# # ! to be finished
# checkpoint = torch.load(checkpoint_filepath)
# model.load_state_dict(checkpoint['state_dict'])
# optimizer.load_state_dict(checkpoint['optimizer'])
# lr_scheduler.load_state_dict(checkpoint['scheduler'])
# return model, optimizer, lr_scheduler, temperature_scheduler
# def train(args, data_loaders, writer, logdir, device='cuda:0'):
# print('-'*50)
# print('Training Phase')
# print('-'*50, '\n')
# loader_train, loader_val = data_loaders
# model = st_model(args, device=device).to(device)
# optimizer = torch.optim.Adam(model.parameters(), lr=args.lr)
# lr_scheduler = StepLR(optimizer, step_size=50, gamma=0.3)
# if args.resume_training:
# temperature_scheduler = Temp_Scheduler(args.num_epochs, args.init_temp, args.init_temp, temp_min=0.03)
# else:
# model, optimizer, lr_scheduler, temperature_scheduler = \
# load_checkpoint(args, logdir, model, optimizer, lr_scheduler)
# # def load_ckp(checkpoint_fpath, model, optimizer):
# # checkpoint = torch.load(checkpoint_fpath)
# # model.load_state_dict(checkpoint['state_dict'])
# # optimizer.load_state_dict(checkpoint['optimizer'])
# # return model, optimizer, checkpoint['epoch']
# print('Model is initialized.')
# print('learning rate: ', args.lr)
# checkpoint_dir = join(logdir, 'checkpoint')
# if not isdir(checkpoint_dir):
# makedirs(checkpoint_dir)
# with open(join(checkpoint_dir, 'args.pickle'), 'wb') as f:
# pickle.dump(args, f)
# print('EPOCHS: ', args.num_epochs)
# print('Training started.\n')
# train_loss_task, train_aoe_task, train_foe_task = [], [], []
# val_loss_task, val_aoe_task, val_foe_task = [], [], []
# hist = {}
# for epoch in range(1, args.num_epochs+1):
# model.train()
# epoch_start_time = time.time()
# tau = temperature_scheduler.step()
# train_loss_epoch, train_aoe_epoch, train_foe_epoch, train_loss_mask_epoch = [], [], [], []
# for batch_idx, batch in enumerate(loader_train):
# obs_traj, pred_traj_gt, obs_traj_rel, pred_traj_rel_gt, loss_mask_rel, loss_mask, \
# v_obs, A_obs, v_pred_gt, A_pred_gt, attn_mask_obs, attn_mask_pred_gt = batch
# if args.rotation_pattern is not None:
# (v_obs, A_obs, v_pred_gt, A_pred_gt), _ = \
# random_rotate_graph(args, v_obs, A_obs, v_pred_gt, A_pred_gt)
# v_obs, A_obs, v_pred_gt, attn_mask_obs, loss_mask_rel = \
# v_obs.to(device), A_obs.to(device), v_pred_gt.to(device), \
# attn_mask_obs.to(device), loss_mask_rel.to(device)
# if args.deterministic:
# sampling = False
# else:
# sampling = True
# results = model(v_obs, A_obs, attn_mask_obs, loss_mask_rel, tau=tau, hard=False, sampling=sampling, device=device)
# gaussian_params_pred, x_sample_pred, info = results
# loss_mask_per_pedestrian = info['loss_mask_per_pedestrian']
# loss_mask_rel_full_partial = info['loss_mask_rel_full_partial']
# if args.deterministic:
# loss_mask_rel_pred = loss_mask_rel[:,:,-args.pred_seq_len:]
# offset_error_sq, eventual_loss_mask = offset_error_square_full_partial(x_sample_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
# loss = offset_error_sq.sum()/eventual_loss_mask.sum()
# else:
# loss = negative_log_likelihood(gaussian_params_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
# train_loss_epoch.append(loss.detach().to('cpu').item())
# loss = loss / args.batch_size
# loss.backward()
# aoe = average_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
# foe = final_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
# train_aoe_epoch.append(aoe.detach().to('cpu').numpy())
# train_foe_epoch.append(foe.detach().to('cpu').numpy())
# train_loss_mask_epoch.append(loss_mask_per_pedestrian[0].detach().to('cpu').numpy())
# if args.clip_grad is not None:
# torch.nn.utils.clip_grad_norm_(
# model.parameters(), args.clip_grad)
# optimizer.step()
# optimizer.zero_grad()
# lr_scheduler.step()
# train_aoe_epoch, train_foe_epoch, train_loss_mask_epoch = \
# np.concatenate(train_aoe_epoch, axis=0), \
# np.concatenate(train_foe_epoch, axis=0), \
# np.concatenate(train_loss_mask_epoch, axis=0)
# train_loss_epoch, train_aoe_epoch, train_foe_epoch = \
# np.mean(train_loss_epoch), \
# train_aoe_epoch.sum()/train_loss_mask_epoch.sum(), \
# train_foe_epoch.sum()/train_loss_mask_epoch.sum()
# train_loss_task.append(train_loss_epoch)
# train_aoe_task.append(train_aoe_epoch)
# train_foe_task.append(train_foe_epoch)
# training_epoch_period = time.time() - epoch_start_time
# training_epoch_period_per_sample = training_epoch_period/len(loader_train)
# val_loss_epoch, val_aoe_epoch, val_foe_epoch = inference(loader_val, model, args, mode='val', tau=tau, device=device)
# val_loss_task.append(val_loss_epoch)
# val_aoe_task.append(val_aoe_epoch)
# val_foe_task.append(val_foe_epoch)
# print('Epoch: {0} | train loss: {1:.4f} | val loss: {2:.4f} | train aoe: {3:.4f} | val aoe: {4:.4f} | train foe: {5:.4f} | val foe: {6:.4f} | period: {7:.2f} sec | time per sample: {8:.4f} sec'\
# .format(epoch, train_loss_epoch, val_loss_epoch,\
# train_aoe_epoch, val_aoe_epoch,\
# train_foe_epoch, val_foe_epoch,\
# training_epoch_period, training_epoch_period_per_sample))
# if epoch % 10 == 0:
# model_filename = join(checkpoint_dir, 'epoch_'+str(epoch)+'.pt')
# torch.save({
# 'epoch': epoch,
# 'model_state_dict': model.state_dict(),
# 'optimizer_state_dict': optimizer.state_dict(),
# 'train_loss_task': train_loss_task,
# 'val_loss_task': val_loss_task,
# 'train_aoe_task': train_aoe_task,
# 'val_aoe_task': val_aoe_task,
# 'train_foe_task': train_foe_task,
# 'val_foe_task': val_foe_task,
# 'training_date': datetime.today().strftime('%y%m%d'),
# }, model_filename)
# print('epoch_'+str(epoch)+'.pt is saved.')
# hist['epoch'] = epoch
# hist['train_loss'], hist['val_loss'] = train_loss_task, val_loss_task
# hist['train_aoe'], hist['val_aoe'] = train_aoe_task, val_aoe_task
# hist['train_foe'], hist['val_foe'] = train_foe_task, val_foe_task
# with open(join(checkpoint_dir, 'train_hist.pickle'), 'wb') as f:
# pickle.dump(hist, f)
# print(join(checkpoint_dir, 'train_hist.pickle')+' is saved.')
# writer.add_scalars('loss', {'train': train_loss_task[-1], 'val': val_loss_task[-1]}, epoch)
# writer.add_scalars('aoe', {'train': train_aoe_task[-1], 'val': val_aoe_task[-1]}, epoch)
# writer.add_scalars('foe', {'train': train_foe_task[-1], 'val': val_foe_task[-1]}, epoch)
# return
# def inference(loader, model, args, mode='val', tau=1., device='cuda:0'):
# with torch.no_grad():
# model.eval()
# epoch_start_time = time.time()
# loss_epoch, aoe_epoch, foe_epoch, loss_mask_epoch = [], [], [], []
# for batch_idx, batch in enumerate(loader):
# obs_traj, pred_traj_gt, obs_traj_rel, pred_traj_rel_gt, loss_mask_rel, loss_mask, \
# v_obs, A_obs, v_pred_gt, A_pred_gt, attn_mask_obs, attn_mask_pred_gt = batch
# v_obs, A_obs, v_pred_gt, attn_mask_obs, loss_mask_rel = \
# v_obs.to(device), A_obs.to(device), v_pred_gt.to(device), \
# attn_mask_obs.to(device), loss_mask_rel.to(device)
# if mode == 'val':
# if args.deterministic:
# sampling = False
# else:
# sampling = True
# results = model(v_obs, A_obs, attn_mask_obs, loss_mask_rel, tau=tau, hard=False, sampling=sampling, device=device)
# gaussian_params_pred, x_sample_pred, info = results
# loss_mask_per_pedestrian = info['loss_mask_per_pedestrian']
# loss_mask_rel_full_partial = info['loss_mask_rel_full_partial']
# if args.deterministic:
# loss_mask_rel_pred = loss_mask_rel[:,:,-args.pred_seq_len:]
# offset_error_sq, eventual_loss_mask = offset_error_square_full_partial(x_sample_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
# loss = offset_error_sq.sum()/eventual_loss_mask.sum()
# else:
# loss = negative_log_likelihood(gaussian_params_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
# aoe = average_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
# foe = final_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
# else:
# raise RuntimeError('We now only support val mode.')
# loss_epoch.append(loss.detach().to('cpu').item())
# aoe_epoch.append(aoe.detach().to('cpu').numpy())
# foe_epoch.append(foe.detach().to('cpu').numpy())
# loss_mask_epoch.append(loss_mask_per_pedestrian[0].detach().to('cpu').numpy())
# aoe_epoch, foe_epoch, loss_mask_epoch = \
# np.concatenate(aoe_epoch, axis=0), \
# np.concatenate(foe_epoch, axis=0), \
# np.concatenate(loss_mask_epoch, axis=0)
# loss_epoch, aoe_epoch, foe_epoch = \
# np.mean(loss_epoch), \
# aoe_epoch.sum()/loss_mask_epoch.sum(), \
# foe_epoch.sum()/loss_mask_epoch.sum()
# return loss_epoch, aoe_epoch, foe_epoch
# if __name__ == "__main__":
# args = arg_parse()
# if args.temporal == "lstm" or args.temporal == "faster_lstm":
# from src.gumbel_social_transformer.st_model import st_model, offset_error_square_full_partial
# elif args.temporal == "temporal_convolution_net":
# from src.gumbel_social_transformer.st_model_tcn import st_model, offset_error_square_full_partial
# else:
# raise RuntimeError('The temporal component must be lstm, faster_lstm, or temporal_convolution_net.')
# main(args)
================================================
FILE: gst_updated/scripts/experiments/eval.py
================================================
import pathhack
import pickle
from os.path import join, isdir
import torch
import csv
import numpy as np
from src.mgnn.utils import arg_parse, average_offset_error, final_offset_error, args2writername, load_batch_dataset
from src.gumbel_social_transformer.st_model import st_model, offset_error_square_full_partial,\
negative_log_likelihood_full_partial
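# Evaluate a trained checkpoint: rebuild the model from the args pickled in the
# run's checkpoint folder, sanity-check the stored validation loss, then report
# test AOE/FOE statistics aggregated over sampled predictions.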
def main(args):
print('\n\n')
print('-'*50)
torch.manual_seed(args.random_seed)
np.random.seed(args.random_seed)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
loader_test = load_batch_dataset(args, pathhack.pkg_path, subfolder='test')
if args.dataset == 'sdd':
loader_val = load_batch_dataset(args, pathhack.pkg_path, subfolder='test') # no val for sdd
else:
loader_val = load_batch_dataset(args, pathhack.pkg_path, subfolder='val')
print('dataset: ', args.dataset)
writername = args2writername(args)
logdir = join(pathhack.pkg_path, 'results', writername, args.dataset)
if not isdir(logdir):
raise RuntimeError('The result directory was not found.')
checkpoint_dir = join(logdir, 'checkpoint')
model_filename = 'epoch_'+str(args.num_epochs)+'.pt'
with open(join(checkpoint_dir, 'args.pickle'), 'rb') as f:
args_eval = pickle.load(f)
model = st_model(args_eval, device=device).to(device)
model_checkpoint = torch.load(join(checkpoint_dir, model_filename), map_location=device)
model.load_state_dict(model_checkpoint['model_state_dict'])
print('Loaded configuration: ', writername)
print('The best validation losses printed below should be the same.')
print('Validation loss in the checkpoint: ', model_checkpoint['val_loss_epoch'])
val_loss_epoch, val_aoe_epoch, val_foe_epoch = inference(loader_val, model, args, mode='val', tau=0.03, device=device)
print('Validation loss from loaded model: ', val_loss_epoch)
test_loss_epoch, test_aoe, test_foe, test_aoe_std, test_foe_std, test_aoe_min, test_foe_min = inference(loader_test, model, args, mode='test', tau=0.03, device=device)
print('Test loss from loaded model: ', test_loss_epoch)
print('dataset: {0} | test aoe: {1:.4f} | test aoe std: {2:.4f} | test foe: {3:.4f} | test foe std: {4:.4f} | min aoe: {5:.4f}, min foe: {6:.4f}'\
.format(args.dataset, test_aoe, test_aoe_std, test_foe, test_foe_std, test_aoe_min, test_foe_min))
# dataset_list = ['eth', 'hotel', 'univ', 'zara1', 'zara2']
# dataset_idx = dataset_list.index(args.dataset)
# csv_row_data = np.ones((4,5))*999999.
# csv_row_data[:,dataset_idx] = np.array([test_aoe, test_foe, test_aoe_std, test_foe_std])
# csv_row_data = csv_row_data.reshape(-1)
# csv_filename = join(checkpoint_dir, '../..', 'test_performance.csv')
# with open(csv_filename, mode='a') as test_file:
# test_writer = csv.writer(test_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
# test_writer.writerow(csv_row_data)
# print(csv_filename+' is written.')
def inference(loader, model, args, mode='val', tau=1., device='cuda:0'):
with torch.no_grad():
model.eval()
loss_epoch, aoe_epoch, foe_epoch, loss_mask_epoch = [], [], [], []
aoe_mean_epoch, aoe_std_epoch, foe_mean_epoch, foe_std_epoch = [], [], [], []
aoe_min_epoch, foe_min_epoch = [], []
for batch_idx, batch in enumerate(loader):
obs_traj, pred_traj_gt, obs_traj_rel, pred_traj_rel_gt, loss_mask_rel, loss_mask, \
v_obs, A_obs, v_pred_gt, A_pred_gt, attn_mask_obs, attn_mask_pred_gt = batch
v_obs, A_obs, v_pred_gt, attn_mask_obs, loss_mask_rel = \
v_obs.to(device), A_obs.to(device), v_pred_gt.to(device), \
attn_mask_obs.to(device), loss_mask_rel.to(device)
if mode == 'val':
results = model(v_obs, A_obs, attn_mask_obs, loss_mask_rel, tau=tau, hard=False, sampling=False, device=device)
gaussian_params_pred, x_sample_pred, info = results
loss_mask_per_pedestrian = info['loss_mask_per_pedestrian']
loss_mask_rel_full_partial = info['loss_mask_rel_full_partial']
loss_mask_rel_pred = loss_mask_rel[:,:,-args.pred_seq_len:]
if args.deterministic:
offset_error_sq, eventual_loss_mask = offset_error_square_full_partial(x_sample_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
loss = offset_error_sq.sum()/eventual_loss_mask.sum()
else:
prob_loss, eventual_loss_mask = negative_log_likelihood_full_partial(gaussian_params_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
loss = prob_loss.sum()/eventual_loss_mask.sum()
aoe = average_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
foe = final_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
elif mode == 'test':
if args.deterministic:
sampling = False
else:
sampling = True
best_aoe_mean = 999999.
aoe, foe, loss = [], [], []
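# draw 20 prediction samples per batch and report the mean, std, and best
# (min) displacement errors across samples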
for _ in range(20):
results = model(v_obs, A_obs, attn_mask_obs, loss_mask_rel, tau=tau, hard=True, sampling=sampling, device=device)
gaussian_params_pred, x_sample_pred, info = results
loss_mask_per_pedestrian = info['loss_mask_per_pedestrian']
loss_mask_rel_full_partial = info['loss_mask_rel_full_partial']
loss_mask_rel_pred = loss_mask_rel[:,:,-args.pred_seq_len:]
if args.deterministic:
offset_error_sq, eventual_loss_mask = offset_error_square_full_partial(x_sample_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
loss_tmp = offset_error_sq.sum()/eventual_loss_mask.sum()
else:
prob_loss, eventual_loss_mask = negative_log_likelihood_full_partial(gaussian_params_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
loss_tmp = prob_loss.sum()/eventual_loss_mask.sum()
aoe_tmp = average_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
foe_tmp = final_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
aoe.append(aoe_tmp)
foe.append(foe_tmp)
loss.append(loss_tmp)
aoe = torch.stack(aoe, dim=0)
foe = torch.stack(foe, dim=0)
aoe = aoe.sum(1)
foe = foe.sum(1)
aoe_mean = aoe.mean()
aoe_std = aoe.std()
foe_mean = foe.mean()
foe_std = foe.std()
aoe_min, foe_min = aoe.min(), foe.min()
loss = sum(loss)/len(loss)
else:
raise RuntimeError('We now only support val and test mode.')
if mode == 'val':
loss_epoch.append(loss.detach().to('cpu').item())
aoe_epoch.append(aoe.detach().to('cpu').numpy())
foe_epoch.append(foe.detach().to('cpu').numpy())
loss_mask_epoch.append(loss_mask_per_pedestrian[0].detach().to('cpu').numpy())
elif mode == 'test':
loss_epoch.append(loss.detach().to('cpu').item())
aoe_mean_epoch.append(aoe_mean.item())
foe_mean_epoch.append(foe_mean.item())
aoe_std_epoch.append(aoe_std.item())
foe_std_epoch.append(foe_std.item())
aoe_min_epoch.append(aoe_min.item())
foe_min_epoch.append(foe_min.item())
loss_mask_epoch.append(loss_mask_per_pedestrian[0].detach().to('cpu').numpy())
else:
raise RuntimeError('We only support val and test mode.')
if mode == 'val':
aoe_epoch, foe_epoch, loss_mask_epoch = \
np.concatenate(aoe_epoch, axis=0), \
np.concatenate(foe_epoch, axis=0), \
np.concatenate(loss_mask_epoch, axis=0)
loss_epoch, aoe_epoch, foe_epoch = \
np.mean(loss_epoch), \
aoe_epoch.sum()/loss_mask_epoch.sum(), \
foe_epoch.sum()/loss_mask_epoch.sum()
return loss_epoch, aoe_epoch, foe_epoch
elif mode == 'test':
loss_mask_epoch = np.concatenate(loss_mask_epoch, axis=0)
aoe = sum(aoe_mean_epoch)/loss_mask_epoch.sum()
foe = sum(foe_mean_epoch)/loss_mask_epoch.sum()
aoe_std = sum(aoe_std_epoch)/loss_mask_epoch.sum()
foe_std = sum(foe_std_epoch)/loss_mask_epoch.sum()
aoe_min = sum(aoe_min_epoch)/loss_mask_epoch.sum()
foe_min = sum(foe_min_epoch)/loss_mask_epoch.sum()
loss_epoch = np.mean(loss_epoch)
return loss_epoch, aoe, foe, aoe_std, foe_std, aoe_min, foe_min
else:
raise RuntimeError('We only support val and test modes.')
if __name__ == "__main__":
args = arg_parse()
main(args)
================================================
FILE: gst_updated/scripts/experiments/eval_trajnet.py
================================================
import pathhack
import pickle
from os.path import join, isdir
import torch
import csv
import numpy as np
from src.mgnn.utils import arg_parse, average_offset_error, final_offset_error, args2writername, load_batch_dataset
from src.gumbel_social_transformer.st_model import st_model, offset_error_square_full_partial,\
negative_log_likelihood_full_partial
def main(args):
print('\n\n')
print('-'*50)
torch.manual_seed(args.random_seed)
np.random.seed(args.random_seed)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
loader_test = load_batch_dataset(args, pathhack.pkg_path, subfolder='test')
if args.dataset == 'sdd':
loader_val = load_batch_dataset(args, pathhack.pkg_path, subfolder='test') # no val for sdd
else:
loader_val = load_batch_dataset(args, pathhack.pkg_path, subfolder='val')
print('dataset: ', args.dataset)
writername = args2writername(args)
logdir = join(pathhack.pkg_path, 'results', writername, args.dataset)
if not isdir(logdir):
raise RuntimeError('The result directory was not found.')
checkpoint_dir = join(logdir, 'checkpoint')
model_filename = 'epoch_'+str(args.num_epochs)+'.pt'
with open(join(checkpoint_dir, 'args.pickle'), 'rb') as f:
args_eval = pickle.load(f)
model = st_model(args_eval, device=device).to(device)
model_checkpoint = torch.load(join(checkpoint_dir, model_filename), map_location=device)
model.load_state_dict(model_checkpoint['model_state_dict'])
print('Loaded configuration: ', writername)
print('The best validation losses printed below should be the same.')
print('Validation loss in the checkpoint: ', model_checkpoint['val_loss_epoch'])
val_loss_epoch, val_aoe_epoch, val_foe_epoch = inference(loader_val, model, args, mode='val', tau=0.03, device=device)
print('Validation loss from loaded model: ', val_loss_epoch)
test_loss_epoch, test_aoe, test_foe, test_aoe_std, test_foe_std, test_aoe_min, test_foe_min = inference(loader_test, model, args, mode='test', tau=0.03, device=device)
print('Test loss from loaded model: ', test_loss_epoch)
print('dataset: {0} | test aoe: {1:.4f} | test aoe std: {2:.4f} | test foe: {3:.4f} | test foe std: {4:.4f} | min aoe: {5:.4f}, min foe: {6:.4f}'\
.format(args.dataset, test_aoe, test_aoe_std, test_foe, test_foe_std, test_aoe_min, test_foe_min))
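# Append results to a shared CSV: 4 metric rows (aoe, foe, aoe_std, foe_std)
# x 5 ETH/UCY dataset columns, flattened into one row; datasets that were not
# evaluated keep the 999999 sentinel.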
dataset_list = ['eth', 'hotel', 'univ', 'zara1', 'zara2']
dataset_idx = dataset_list.index(args.dataset)
csv_row_data = np.ones((4,5))*999999.
csv_row_data[:,dataset_idx] = np.array([test_aoe, test_foe, test_aoe_std, test_foe_std])
csv_row_data = csv_row_data.reshape(-1)
csv_filename = join(checkpoint_dir, '../..', 'test_performance.csv')
with open(csv_filename, mode='a') as test_file:
test_writer = csv.writer(test_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
test_writer.writerow(csv_row_data)
print(csv_filename+' is written.')
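# Debugging stub: run a single forward pass on the test loader and drop into
# pdb to inspect the model outputs.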
def test(loader, model, tau=0.03, device='cuda:0'):
with torch.no_grad():
model.eval()
for batch_idx, batch in enumerate(loader):
obs_traj, pred_traj_gt, obs_traj_rel, pred_traj_rel_gt, loss_mask_rel, loss_mask, \
v_obs, A_obs, v_pred_gt, A_pred_gt, attn_mask_obs, attn_mask_pred_gt, frame_id_seq, ped_id_list = batch
v_obs, A_obs, v_pred_gt, attn_mask_obs, loss_mask_rel = \
v_obs.to(device), A_obs.to(device), v_pred_gt.to(device), \
attn_mask_obs.to(device), loss_mask_rel.to(device)
results = model(v_obs, A_obs, attn_mask_obs, loss_mask_rel, tau=tau, hard=False, sampling=False, device=device)
gaussian_params_pred, x_sample_pred, info = results
import pdb; pdb.set_trace()
# loss_mask_per_pedestrian = info['loss_mask_per_pedestrian']
# loss_mask_rel_full_partial = info['loss_mask_rel_full_partial']
# pos_pred = torch.cumsum(x_sample_pred, dim=1)
# pos_target = torch.cumsum(x_target_m, dim=1)
# loss_mask_rel_full_partial = info['loss_mask_rel_full_partial']
# loss_mask_rel_pred = loss_mask_rel[:,:,-args.pred_seq_len:]
# gaussian_params_pred, x_sample_pred, info = results
# loss_mask_per_pedestrian = info['loss_mask_per_pedestrian']
# loss_mask_rel_full_partial = info['loss_mask_rel_full_partial']
break
def inference(loader, model, args, mode='val', tau=1., device='cuda:0'):
with torch.no_grad():
model.eval()
loss_epoch, aoe_epoch, foe_epoch, loss_mask_epoch = [], [], [], []
aoe_mean_epoch, aoe_std_epoch, foe_mean_epoch, foe_std_epoch = [], [], [], []
aoe_min_epoch, foe_min_epoch = [], []
for batch_idx, batch in enumerate(loader):
obs_traj, pred_traj_gt, obs_traj_rel, pred_traj_rel_gt, loss_mask_rel, loss_mask, \
v_obs, A_obs, v_pred_gt, A_pred_gt, attn_mask_obs, attn_mask_pred_gt = batch
v_obs, A_obs, v_pred_gt, attn_mask_obs, loss_mask_rel = \
v_obs.to(device), A_obs.to(device), v_pred_gt.to(device), \
attn_mask_obs.to(device), loss_mask_rel.to(device)
if mode == 'val':
results = model(v_obs, A_obs, attn_mask_obs, loss_mask_rel, tau=tau, hard=False, sampling=False, device=device)
gaussian_params_pred, x_sample_pred, info = results
loss_mask_per_pedestrian = info['loss_mask_per_pedestrian']
loss_mask_rel_full_partial = info['loss_mask_rel_full_partial']
loss_mask_rel_pred = loss_mask_rel[:,:,-args.pred_seq_len:]
if args.deterministic:
offset_error_sq, eventual_loss_mask = offset_error_square_full_partial(x_sample_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
loss = offset_error_sq.sum()/eventual_loss_mask.sum()
else:
prob_loss, eventual_loss_mask = negative_log_likelihood_full_partial(gaussian_params_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
loss = prob_loss.sum()/eventual_loss_mask.sum()
aoe = average_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
foe = final_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
elif mode == 'test':
if args.deterministic:
sampling = False
else:
sampling = True
best_aoe_mean = 999999.
aoe, foe, loss = [], [], []
for _ in range(20):
results = model(v_obs, A_obs, attn_mask_obs, loss_mask_rel, tau=tau, hard=True, sampling=sampling, device=device)
gaussian_params_pred, x_sample_pred, info = results
loss_mask_per_pedestrian = info['loss_mask_per_pedestrian']
loss_mask_rel_full_partial = info['loss_mask_rel_full_partial']
loss_mask_rel_pred = loss_mask_rel[:,:,-args.pred_seq_len:]
if args.deterministic:
offset_error_sq, eventual_loss_mask = offset_error_square_full_partial(x_sample_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
loss_tmp = offset_error_sq.sum()/eventual_loss_mask.sum()
else:
prob_loss, eventual_loss_mask = negative_log_likelihood_full_partial(gaussian_params_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
loss_tmp = prob_loss.sum()/eventual_loss_mask.sum()
aoe_tmp = average_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
foe_tmp = final_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
aoe.append(aoe_tmp)
foe.append(foe_tmp)
loss.append(loss_tmp)
aoe = torch.stack(aoe, dim=0)
foe = torch.stack(foe, dim=0)
aoe = aoe.sum(1)
foe = foe.sum(1)
aoe_mean = aoe.mean()
aoe_std = aoe.std()
foe_mean = foe.mean()
foe_std = foe.std()
aoe_min, foe_min = aoe.min(), foe.min()
loss = sum(loss)/len(loss)
else:
raise RuntimeError('We now only support val and test mode.')
if mode == 'val':
loss_epoch.append(loss.detach().to('cpu').item())
aoe_epoch.append(aoe.detach().to('cpu').numpy())
foe_epoch.append(foe.detach().to('cpu').numpy())
loss_mask_epoch.append(loss_mask_per_pedestrian[0].detach().to('cpu').numpy())
elif mode == 'test':
loss_epoch.append(loss.detach().to('cpu').item())
aoe_mean_epoch.append(aoe_mean.item())
foe_mean_epoch.append(foe_mean.item())
aoe_std_epoch.append(aoe_std.item())
foe_std_epoch.append(foe_std.item())
aoe_min_epoch.append(aoe_min.item())
foe_min_epoch.append(foe_min.item())
loss_mask_epoch.append(loss_mask_per_pedestrian[0].detach().to('cpu').numpy())
else:
raise RuntimeError('We only support val and test mode.')
if mode == 'val':
aoe_epoch, foe_epoch, loss_mask_epoch = \
np.concatenate(aoe_epoch, axis=0), \
np.concatenate(foe_epoch, axis=0), \
np.concatenate(loss_mask_epoch, axis=0)
loss_epoch, aoe_epoch, foe_epoch = \
np.mean(loss_epoch), \
aoe_epoch.sum()/loss_mask_epoch.sum(), \
foe_epoch.sum()/loss_mask_epoch.sum()
return loss_epoch, aoe_epoch, foe_epoch
elif mode == 'test':
loss_mask_epoch = np.concatenate(loss_mask_epoch, axis=0)
aoe = sum(aoe_mean_epoch)/loss_mask_epoch.sum()
foe = sum(foe_mean_epoch)/loss_mask_epoch.sum()
aoe_std = sum(aoe_std_epoch)/loss_mask_epoch.sum()
foe_std = sum(foe_std_epoch)/loss_mask_epoch.sum()
aoe_min = sum(aoe_min_epoch)/loss_mask_epoch.sum()
foe_min = sum(foe_min_epoch)/loss_mask_epoch.sum()
loss_epoch = np.mean(loss_epoch)
return loss_epoch, aoe, foe, aoe_std, foe_std, aoe_min, foe_min
else:
raise RuntimeError('We only support val and test modes.')
if __name__ == "__main__":
args = arg_parse()
main(args)
================================================
FILE: gst_updated/scripts/experiments/load_trajnet_test_data.py
================================================
import pathhack
import pickle
from os.path import join, isdir
import torch
import csv
import numpy as np
from src.mgnn.utils import arg_parse, average_offset_error, final_offset_error, args2writername, load_batch_dataset
from src.gumbel_social_transformer.st_model import st_model, offset_error_square_full_partial,\
negative_log_likelihood_full_partial
from torch.utils.data import DataLoader
import ndjson
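# Generate TrajNet++ submission files: load preprocessed test scenes, run the
# trained model on each scene, and write per-pedestrian predictions in ndjson
# format alongside the original scene records.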
# def load_single_batch_dataset_eth_ucy(args, pkg_path, subfolder='train', num_workers=4, shuffle=None):
shuffle = None
subfolder = 'test'
# dset_name = 'orca_synth'
# dset_name = 'collision_test'
writername = "100-gumbel_social_transformer-faster_lstm-lr_0.001-deterministic-init_temp_0.5-edge_head_4-ebd_64-snl_1-snh_8-ghost-seed_1000"
# "100-gumbel_social_transformer-faster_lstm-lr_0.001-deterministic-init_temp_0.5-edge_head_1-ebd_64-snl_1-snh_8-ghost-seed_1000"
# "100-gumbel_social_transformer-faster_lstm-lr_0.001-deterministic-init_temp_0.5-edge_head_8-ebd_64-snl_1-snh_8-ghost-seed_1000"
for dset_name in ['orca_synth', 'collision_test']:
result_filename = dset_name+'_dset_test_trajectories.pt'
# result_filename = 'orca_synth_dset_test_trajectories.pt'
# result_filename = 'collision_test_dset_test_trajectories.pt'
dataset_folderpath = join(pathhack.pkg_path, 'datasets/trajnet++/test')
dset = torch.load(join(dataset_folderpath, result_filename))
ndjson_filepath = join(dataset_folderpath, 'synth_data', dset_name+'.ndjson')
# ndjson_filepath = join(dataset_folderpath, 'synth_data', 'orca_synth.ndjson')
# ndjson_filepath = join(dataset_folderpath, 'synth_data', 'collision_test.ndjson')
scene_posts = []
with open(ndjson_filepath) as f:
reader = ndjson.reader(f)
for post in reader:
if "scene" in post.keys():
scene_posts.append(post)
if shuffle is None:
if subfolder == 'train':
shuffle = True
else:
shuffle = False
dloader = DataLoader(
dset,
batch_size=1,
shuffle=shuffle,
num_workers=4)
frame_diff = 1
for batch_idx, batch in enumerate(dloader):
obs_traj, pred_traj_gt, obs_traj_rel, pred_traj_rel_gt, loss_mask_rel, loss_mask, \
v_obs, A_obs, v_pred_gt, A_pred_gt, attn_mask_obs, attn_mask_pred_gt, frame_id_seq, ped_id_list = batch
print(v_obs)
print(frame_id_seq)
print(ped_id_list)
print(pred_traj_gt)
break
device = "cuda:0"
logdir = join(pathhack.pkg_path, 'results', writername, "synth")
checkpoint_dir = join(logdir, 'checkpoint')
model_filename = 'epoch_'+str(100)+'.pt'
with open(join(checkpoint_dir, 'args.pickle'), 'rb') as f:
args_eval = pickle.load(f)
model = st_model(args_eval, device=device).to(device)
model_checkpoint = torch.load(join(checkpoint_dir, model_filename), map_location=device)
model.load_state_dict(model_checkpoint['model_state_dict'])
print('Loaded configuration: ', writername)
print('The best validation losses printed below should be the same.')
print('Validation loss in the checkpoint: ', model_checkpoint['val_loss_epoch'])
ndjson_lines = []
with torch.no_grad():
model.eval()
for batch_idx, batch in enumerate(dloader):
if batch_idx % 100 == 0:
print(batch_idx)
obs_traj, pred_traj_gt, obs_traj_rel, pred_traj_rel_gt, loss_mask_rel, loss_mask, \
v_obs, A_obs, v_pred_gt, A_pred_gt, attn_mask_obs, attn_mask_pred_gt, frame_id_seq, ped_id_list = batch
v_obs, A_obs, attn_mask_obs, loss_mask_rel = \
v_obs.to(device), A_obs.to(device), \
attn_mask_obs.to(device), loss_mask_rel.to(device)
results = model(v_obs, A_obs, attn_mask_obs, loss_mask_rel, tau=0.03, hard=True, sampling=False, device=device)
gaussian_params_pred, x_sample_pred, info = results
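# Convert predicted relative offsets to absolute positions: prepend the last
# observed position, cumulative-sum along the time axis, then drop the
# observed frame so only the 12 predicted steps remain.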
obs_traj = obs_traj.to("cpu")
x_sample_pred = x_sample_pred.to("cpu").permute(0, 2, 3, 1) #[1, 6, 2, 12]
pred_traj = torch.cat((obs_traj[:,:,:,-1:], x_sample_pred), dim=3) # [1, 6, 2, 13]
pred_traj = torch.cumsum(pred_traj, dim=3)
start_frame_id = int(frame_id_seq.item())
pred_traj = pred_traj[:,:,:,1:] # [1, 6, 2, 12]
loss_mask_rel_full_partial = info['loss_mask_rel_full_partial'][0]
# {"track": {"f": 370, "p": 3, "x": -1.17, "y": 4.24, "prediction_number": 0, "scene_id": 0}}
for i in range(len(loss_mask_rel_full_partial)):
if loss_mask_rel_full_partial[i]:
for tt in range(9, 21): # 9-20
ndjson_line = {}
ndjson_line["track"] = {}
ndjson_line["track"]["f"] = start_frame_id + int(frame_diff)*tt
ndjson_line["track"]["p"] = int(ped_id_list[i].item())
ndjson_line["track"]["x"] = round(pred_traj[0,i,0,tt-9].item(), 2)
ndjson_line["track"]["y"] = round(pred_traj[0,i,1,tt-9].item(), 2)
ndjson_line["track"]["prediction_number"] = 0
ndjson_line["track"]["scene_id"] = batch_idx
ndjson_lines.append(ndjson_line)
print("final scene id: ", batch_idx)
# with open('./collision_test_predictions.ndjson', 'w') as f:
# with open('./orca_synth_predictions.ndjson', 'w') as f:
with open('./'+dset_name+'_predictions.ndjson', 'w') as f:
writer = ndjson.writer(f, ensure_ascii=False)
for post in scene_posts:
writer.writerow(post)
for post in ndjson_lines:
writer.writerow(post)
# import pdb; pdb.set_trace()
"""
# import pdb; pdb.set_trace()
# break
ndjson_lines = []
# ndjson_line = {}
# ndjson_line["track"] = {}
start_frame_id = int(frame_id_seq.item())
# ndjson_line["track"]["p"] = int(frame_id_seq.item()[0])
# {"track": {"f": 113, "p": 19973, "x": -0.95, "y": -1.86}}
pred_traj = pred_traj[:,:,:,1:] # [1, 6, 2, 12]
loss_mask_rel_full_partial = info['loss_mask_rel_full_partial'][0]
# {"track": {"f": 370, "p": 3, "x": -1.17, "y": 4.24, "prediction_number": 0, "scene_id": 0}}
for i in range(len(loss_mask_rel_full_partial)):
if loss_mask_rel_full_partial[i]:
for tt in range(9, 21): # 9-20
ndjson_line = {}
ndjson_line["track"] = {}
ndjson_line["track"]["f"] = start_frame_id + int(frame_diff)*tt
ndjson_line["track"]["p"] = int(ped_id_list[i].item())
ndjson_line["track"]["x"] = round(pred_traj[0,i,0,tt-9], 2)
ndjson_line["track"]["y"] = round(pred_traj[0,i,0,tt-9], 2)
ndjson_line["track"]["prediction_number"] = 0
ndjson_line["track"]["scene_id"] = batch_idx
"""
# import matplotlib.pyplot as plt
# (Pdb) fig, ax = plt.subplots()
# (Pdb) [ax.plot(pred_traj[0,i,0,:], pred_traj[0,i,1,:], '.-') for i in range(6)]
# [[], [], [], [], [], []]
# (Pdb) fig.savefig("tmp4.jpg")
# import ndjson
# # Streaming lines from ndjson file:
# with open('./posts.ndjson') as f:
# reader = ndjson.reader(f)
# for post in reader:
# print(post)
# # Writing items to a ndjson file
# with open('./posts.ndjson', 'w') as f:
# writer = ndjson.writer(f, ensure_ascii=False)
# for post in posts:
# writer.writerow(post)
# print(pred_traj.shape)
# obs_traj
# obs_traj # (1,6,2,8)
# x_sample_pred torch.Size([1, 12, 6, 2])
# pos_pred = torch.cumsum(x_sample_pred, dim=1)
# pos_target = torch.cumsum(x_target_m, dim=1)
# self.obs_traj[start:end, :], self.pred_traj[start:end, :],
# self.obs_traj_rel[start:end, :], self.pred_traj_rel[start:end, :],
# self.loss_mask_rel[start:end, :], self.loss_mask[start:end, :],
# self.v_obs[index], self.A_obs[index],
# self.v_pred[index], self.A_pred[index],
# self.attn_mask_obs[index], self.attn_mask_pred[index],
# ]
# for mode in ['test']:
# dataset_filepath = join(dataset_folderpath, 'biwi_eth.ndjson')
# dset = TrajectoriesDataset(
# dataset_filepath,
# obs_seq_len=args.obs_seq_len,
# pred_seq_len=args.pred_seq_len,
# skip=1,
# invalid_value=args.invalid_value,
# mode=mode,
# )
# result_filename = 'biwi_eth_dset_'+mode+'_trajectories.pt'
# torch.save(dset, join(dataset_folderpath, '..', result_filename))
# print(join(dataset_folderpath, '..', result_filename)+' is created.')
# def main(args):
# print('\n\n')
# print('-'*50)
# torch.manual_seed(args.random_seed)
# np.random.seed(args.random_seed)
# device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# loader_test = load_batch_dataset(args, pathhack.pkg_path, subfolder='test')
# if args.dataset == 'sdd':
# loader_val = load_batch_dataset(args, pathhack.pkg_path, subfolder='test') # no val for sdd
# else:
# loader_val = load_batch_dataset(args, pathhack.pkg_path, subfolder='val')
# print('dataset: ', args.dataset)
# writername = args2writername(args)
# logdir = join(pathhack.pkg_path, 'results', writername, args.dataset)
# if not isdir(logdir):
# raise RuntimeError('The result directory was not found.')
# checkpoint_dir = join(logdir, 'checkpoint')
# model_filename = 'epoch_'+str(args.num_epochs)+'.pt'
# with open(join(checkpoint_dir, 'args.pickle'), 'rb') as f:
# args_eval = pickle.load(f)
# model = st_model(args_eval, device=device).to(device)
# model_checkpoint = torch.load(join(checkpoint_dir, model_filename), map_location=device)
# model.load_state_dict(model_checkpoint['model_state_dict'])
# print('Loaded configuration: ', writername)
# print('The best validation losses printed below should be the same.')
# print('Validation loss in the checkpoint: ', model_checkpoint['val_loss_epoch'])
# val_loss_epoch, val_aoe_epoch, val_foe_epoch = inference(loader_val, model, args, mode='val', tau=0.03, device=device)
# print('Validation loss from loaded model: ', val_loss_epoch)
# test_loss_epoch, test_aoe, test_foe, test_aoe_std, test_foe_std, test_aoe_min, test_foe_min = inference(loader_test, model, args, mode='test', tau=0.03, device=device)
# print('Test loss from loaded model: ', test_loss_epoch)
# print('dataset: {0} | test aoe: {1:.4f} | test aoe std: {2:.4f} | test foe: {3:.4f} | test foe std: {4:.4f} | min aoe: {5:.4f}, min foe: {6:.4f}'\
# .format(args.dataset, test_aoe, test_aoe_std, test_foe, test_foe_std, test_aoe_min, test_foe_min))
# dataset_list = ['eth', 'hotel', 'univ', 'zara1', 'zara2']
# dataset_idx = dataset_list.index(args.dataset)
# csv_row_data = np.ones((4,5))*999999.
# csv_row_data[:,dataset_idx] = np.array([test_aoe, test_foe, test_aoe_std, test_foe_std])
# csv_row_data = csv_row_data.reshape(-1)
# csv_filename = join(checkpoint_dir, '../..', 'test_performance.csv')
# with open(csv_filename, mode='a') as test_file:
# test_writer = csv.writer(test_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
# test_writer.writerow(csv_row_data)
# print(csv_filename+' is written.')
# def inference(loader, model, args, mode='val', tau=1., device='cuda:0'):
# with torch.no_grad():
# model.eval()
# loss_epoch, aoe_epoch, foe_epoch, loss_mask_epoch = [], [], [], []
# aoe_mean_epoch, aoe_std_epoch, foe_mean_epoch, foe_std_epoch = [], [], [], []
# aoe_min_epoch, foe_min_epoch = [], []
# for batch_idx, batch in enumerate(loader):
# obs_traj, pred_traj_gt, obs_traj_rel, pred_traj_rel_gt, loss_mask_rel, loss_mask, \
# v_obs, A_obs, v_pred_gt, A_pred_gt, attn_mask_obs, attn_mask_pred_gt = batch
# v_obs, A_obs, v_pred_gt, attn_mask_obs, loss_mask_rel = \
# v_obs.to(device), A_obs.to(device), v_pred_gt.to(device), \
# attn_mask_obs.to(device), loss_mask_rel.to(device)
# if mode == 'val':
# results = model(v_obs, A_obs, attn_mask_obs, loss_mask_rel, tau=tau, hard=False, sampling=False, device=device)
# gaussian_params_pred, x_sample_pred, info = results
# loss_mask_per_pedestrian = info['loss_mask_per_pedestrian']
# loss_mask_rel_full_partial = info['loss_mask_rel_full_partial']
# loss_mask_rel_pred = loss_mask_rel[:,:,-args.pred_seq_len:]
# if args.deterministic:
# offset_error_sq, eventual_loss_mask = offset_error_square_full_partial(x_sample_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
# loss = offset_error_sq.sum()/eventual_loss_mask.sum()
# else:
# prob_loss, eventual_loss_mask = negative_log_likelihood_full_partial(gaussian_params_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
# loss = prob_loss.sum()/eventual_loss_mask.sum()
# aoe = average_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
# foe = final_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
# elif mode == 'test':
# if args.deterministic:
# sampling = False
# else:
# sampling = True
# best_aoe_mean = 999999.
# aoe, foe, loss = [], [], []
# for _ in range(20):
# results = model(v_obs, A_obs, attn_mask_obs, loss_mask_rel, tau=tau, hard=True, sampling=sampling, device=device)
# gaussian_params_pred, x_sample_pred, info = results
# loss_mask_per_pedestrian = info['loss_mask_per_pedestrian']
# loss_mask_rel_full_partial = info['loss_mask_rel_full_partial']
# loss_mask_rel_pred = loss_mask_rel[:,:,-args.pred_seq_len:]
# if args.deterministic:
# offset_error_sq, eventual_loss_mask = offset_error_square_full_partial(x_sample_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
# loss_tmp = offset_error_sq.sum()/eventual_loss_mask.sum()
# else:
# prob_loss, eventual_loss_mask = negative_log_likelihood_full_partial(gaussian_params_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
# loss_tmp = prob_loss.sum()/eventual_loss_mask.sum()
# aoe_tmp = average_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
# foe_tmp = final_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
# aoe.append(aoe_tmp)
# foe.append(foe_tmp)
# loss.append(loss_tmp)
# aoe = torch.stack(aoe, dim=0)
# foe = torch.stack(foe, dim=0)
# aoe = aoe.sum(1)
# foe = foe.sum(1)
# aoe_mean = aoe.mean()
# aoe_std = aoe.std()
# foe_mean = foe.mean()
# foe_std = foe.std()
# aoe_min, foe_min = aoe.min(), foe.min()
# loss = sum(loss)/len(loss)
# else:
# raise RuntimeError('We now only support val and test mode.')
# if mode == 'val':
# loss_epoch.append(loss.detach().to('cpu').item())
# aoe_epoch.append(aoe.detach().to('cpu').numpy())
# foe_epoch.append(foe.detach().to('cpu').numpy())
# loss_mask_epoch.append(loss_mask_per_pedestrian[0].detach().to('cpu').numpy())
# elif mode == 'test':
# loss_epoch.append(loss.detach().to('cpu').item())
# aoe_mean_epoch.append(aoe_mean.item())
# foe_mean_epoch.append(foe_mean.item())
# aoe_std_epoch.append(aoe_std.item())
# foe_std_epoch.append(foe_std.item())
# aoe_min_epoch.append(aoe_min.item())
# foe_min_epoch.append(foe_min.item())
# loss_mask_epoch.append(loss_mask_per_pedestrian[0].detach().to('cpu').numpy())
# else:
# raise RuntimeError('We only support val and test mode.')
# if mode == 'val':
# aoe_epoch, foe_epoch, loss_mask_epoch = \
# np.concatenate(aoe_epoch, axis=0), \
# np.concatenate(foe_epoch, axis=0), \
# np.concatenate(loss_mask_epoch, axis=0)
# loss_epoch, aoe_epoch, foe_epoch = \
# np.mean(loss_epoch), \
# aoe_epoch.sum()/loss_mask_epoch.sum(), \
# foe_epoch.sum()/loss_mask_epoch.sum()
# return loss_epoch, aoe_epoch, foe_epoch
# elif mode == 'test':
# loss_mask_epoch = np.concatenate(loss_mask_epoch, axis=0)
# aoe = sum(aoe_mean_epoch)/loss_mask_epoch.sum()
# foe = sum(foe_mean_epoch)/loss_mask_epoch.sum()
# aoe_std = sum(aoe_std_epoch)/loss_mask_epoch.sum()
# foe_std = sum(foe_std_epoch)/loss_mask_epoch.sum()
# aoe_min = sum(aoe_min_epoch)/loss_mask_epoch.sum()
# foe_min = sum(foe_min_epoch)/loss_mask_epoch.sum()
# loss_epoch = np.mean(loss_epoch)
# return loss_epoch, aoe, foe, aoe_std, foe_std, aoe_min, foe_min
# else:
# raise RuntimeError('We now only support val mode.')
# if __name__ == "__main__":
# args = arg_parse()
# main(args)
================================================
FILE: gst_updated/scripts/experiments/pathhack.py
================================================
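# Add the package root (two directories up from this script) to sys.path so
# experiment scripts can import src.* when run directly.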
import sys
from os.path import abspath, join, split
file_path = split(abspath(__file__))[0]
pkg_path = join(file_path, '../..')
sys.path.append(pkg_path)
================================================
FILE: gst_updated/scripts/experiments/test.py
================================================
from gst_updated.scripts.experiments import pathhack
import pickle
from os.path import join, isdir
import torch
import csv
import numpy as np
from gst_updated.src.mgnn.utils import arg_parse, average_offset_error, final_offset_error, args2writername, load_batch_dataset
from gst_updated.src.gumbel_social_transformer.st_model import st_model, offset_error_square_full_partial,\
negative_log_likelihood_full_partial
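# Test-only entry point: load a trained checkpoint and report test-set AOE/FOE
# statistics; driven by crowd_nav's Config().pred instead of argparse (see
# __main__ below).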
def main(args):
print('\n\n')
print('-'*50)
torch.manual_seed(args.random_seed)
np.random.seed(args.random_seed)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
loader_test = load_batch_dataset(args, pathhack.pkg_path, subfolder='test')
print('dataset: ', args.dataset)
writername = args2writername(args)
logdir = join(pathhack.pkg_path, 'results', writername, args.dataset)
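# NOTE: the logdir computed above is overridden by a hardcoded local path below.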
logdir = '/home/shuijing/Desktop/CrowdNav_Prediction/gst_updated/results/100-gumbel_social_transformer-faster_lstm-lr_0.001-init_temp_0.5-edge_head_0-ebd_64-snl_1-snh_8-seed_1000/sj'
if not isdir(logdir):
raise RuntimeError('The result directory was not found.')
checkpoint_dir = join(logdir, 'checkpoint')
model_filename = 'epoch_'+str(args.num_epochs)+'.pt'
with open(join(checkpoint_dir, 'args.pickle'), 'rb') as f:
args_eval = pickle.load(f)
model = st_model(args_eval, device=device).to(device)
model_checkpoint = torch.load(join(checkpoint_dir, model_filename), map_location=device)
model.load_state_dict(model_checkpoint['model_state_dict'])
print('Loaded configuration: ', writername)
test_loss_epoch, test_aoe, test_foe, test_aoe_std, test_foe_std, test_aoe_min, test_foe_min = inference(loader_test, model, args, mode='test', tau=0.03, device=device)
print('Test loss from loaded model: ', test_loss_epoch)
print('dataset: {0} | test aoe: {1:.4f} | test aoe std: {2:.4f} | test foe: {3:.4f} | test foe std: {4:.4f} | min aoe: {5:.4f}, min foe: {6:.4f}'\
.format(args.dataset, test_aoe, test_aoe_std, test_foe, test_foe_std, test_aoe_min, test_foe_min))
# dataset_list = ['eth', 'hotel', 'univ', 'zara1', 'zara2']
# dataset_idx = dataset_list.index(args.dataset)
# csv_row_data = np.ones((4,5))*999999.
# csv_row_data[:,dataset_idx] = np.array([test_aoe, test_foe, test_aoe_std, test_foe_std])
# csv_row_data = csv_row_data.reshape(-1)
# csv_filename = join(checkpoint_dir, '../..', 'test_performance.csv')
# with open(csv_filename, mode='a') as test_file:
# test_writer = csv.writer(test_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
# test_writer.writerow(csv_row_data)
# print(csv_filename+' is written.')
def inference(loader, model, args, mode='val', tau=1., device='cuda:0'):
with torch.no_grad():
model.eval()
loss_epoch, aoe_epoch, foe_epoch, loss_mask_epoch = [], [], [], []
aoe_mean_epoch, aoe_std_epoch, foe_mean_epoch, foe_std_epoch = [], [], [], []
aoe_min_epoch, foe_min_epoch = [], []
for batch_idx, batch in enumerate(loader):
obs_traj, pred_traj_gt, obs_traj_rel, pred_traj_rel_gt, loss_mask_rel, loss_mask, \
v_obs, A_obs, v_pred_gt, A_pred_gt, attn_mask_obs, attn_mask_pred_gt = batch
v_obs, A_obs, v_pred_gt, attn_mask_obs, loss_mask_rel = \
v_obs.to(device), A_obs.to(device), v_pred_gt.to(device), \
attn_mask_obs.to(device), loss_mask_rel.to(device)
if mode == 'val':
results = model(v_obs, A_obs, attn_mask_obs, loss_mask_rel, tau=tau, hard=False, sampling=False, device=device)
gaussian_params_pred, x_sample_pred, info = results
loss_mask_per_pedestrian = info['loss_mask_per_pedestrian']
loss_mask_rel_full_partial = info['loss_mask_rel_full_partial']
loss_mask_rel_pred = loss_mask_rel[:,:,-args.pred_seq_len:]
if args.deterministic:
offset_error_sq, eventual_loss_mask = offset_error_square_full_partial(x_sample_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
loss = offset_error_sq.sum()/eventual_loss_mask.sum()
else:
prob_loss, eventual_loss_mask = negative_log_likelihood_full_partial(gaussian_params_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
loss = prob_loss.sum()/eventual_loss_mask.sum()
aoe = average_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
foe = final_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
elif mode == 'test':
if args.deterministic:
sampling = False
else:
sampling = True
best_aoe_mean = 999999.
aoe, foe, loss = [], [], []
for _ in range(20):
results = model(v_obs, A_obs, attn_mask_obs, loss_mask_rel, tau=tau, hard=True, sampling=sampling, device=device)
gaussian_params_pred, x_sample_pred, info = results
loss_mask_per_pedestrian = info['loss_mask_per_pedestrian']
loss_mask_rel_full_partial = info['loss_mask_rel_full_partial']
loss_mask_rel_pred = loss_mask_rel[:,:,-args.pred_seq_len:]
if args.deterministic:
offset_error_sq, eventual_loss_mask = offset_error_square_full_partial(x_sample_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
loss_tmp = offset_error_sq.sum()/eventual_loss_mask.sum()
else:
prob_loss, eventual_loss_mask = negative_log_likelihood_full_partial(gaussian_params_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
loss_tmp = prob_loss.sum()/eventual_loss_mask.sum()
aoe_tmp = average_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
foe_tmp = final_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
aoe.append(aoe_tmp)
foe.append(foe_tmp)
loss.append(loss_tmp)
aoe = torch.stack(aoe, dim=0)
foe = torch.stack(foe, dim=0)
aoe = aoe.sum(1)
foe = foe.sum(1)
aoe_mean = aoe.mean()
aoe_std = aoe.std()
foe_mean = foe.mean()
foe_std = foe.std()
aoe_min, foe_min = aoe.min(), foe.min()
loss = sum(loss)/len(loss)
else:
raise RuntimeError('We now only support val and test mode.')
if mode == 'val':
loss_epoch.append(loss.detach().to('cpu').item())
aoe_epoch.append(aoe.detach().to('cpu').numpy())
foe_epoch.append(foe.detach().to('cpu').numpy())
loss_mask_epoch.append(loss_mask_per_pedestrian[0].detach().to('cpu').numpy())
elif mode == 'test':
loss_epoch.append(loss.detach().to('cpu').item())
aoe_mean_epoch.append(aoe_mean.item())
foe_mean_epoch.append(foe_mean.item())
aoe_std_epoch.append(aoe_std.item())
foe_std_epoch.append(foe_std.item())
aoe_min_epoch.append(aoe_min.item())
foe_min_epoch.append(foe_min.item())
loss_mask_epoch.append(loss_mask_per_pedestrian[0].detach().to('cpu').numpy())
else:
raise RuntimeError('We only support val and test mode.')
if mode == 'val':
aoe_epoch, foe_epoch, loss_mask_epoch = \
np.concatenate(aoe_epoch, axis=0), \
np.concatenate(foe_epoch, axis=0), \
np.concatenate(loss_mask_epoch, axis=0)
loss_epoch, aoe_epoch, foe_epoch = \
np.mean(loss_epoch), \
aoe_epoch.sum()/loss_mask_epoch.sum(), \
foe_epoch.sum()/loss_mask_epoch.sum()
return loss_epoch, aoe_epoch, foe_epoch
elif mode == 'test':
loss_mask_epoch = np.concatenate(loss_mask_epoch, axis=0)
aoe = sum(aoe_mean_epoch)/loss_mask_epoch.sum()
foe = sum(foe_mean_epoch)/loss_mask_epoch.sum()
aoe_std = sum(aoe_std_epoch)/loss_mask_epoch.sum()
foe_std = sum(foe_std_epoch)/loss_mask_epoch.sum()
aoe_min = sum(aoe_min_epoch)/loss_mask_epoch.sum()
foe_min = sum(foe_min_epoch)/loss_mask_epoch.sum()
loss_epoch = np.mean(loss_epoch)
return loss_epoch, aoe, foe, aoe_std, foe_std, aoe_min, foe_min
else:
raise RuntimeError('We only support val and test modes.')
if __name__ == "__main__":
# args = arg_parse()
# main(args)
from crowd_nav.configs.config import Config
config = Config()
main(config.pred)
================================================
FILE: gst_updated/scripts/experiments/test_gst.py
================================================
from gst_updated.scripts.experiments import pathhack
import pickle
from os.path import join, isdir
import torch
import csv
import numpy as np
from gst_updated.src.mgnn.utils import arg_parse, average_offset_error, final_offset_error, args2writername, load_batch_dataset
from gst_updated.src.gumbel_social_transformer.st_model import st_model, offset_error_square_full_partial,\
negative_log_likelihood_full_partial
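# Near-duplicate of test.py kept for GST testing: load a trained checkpoint and
# report test-set AOE/FOE statistics, driven by crowd_nav's Config().pred.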
def main(args):
print('\n\n')
print('-'*50)
torch.manual_seed(args.random_seed)
np.random.seed(args.random_seed)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
loader_test = load_batch_dataset(args, pathhack.pkg_path, subfolder='test')
print('dataset: ', args.dataset)
writername = args2writername(args)
logdir = join(pathhack.pkg_path, 'results', writername, args.dataset)
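# NOTE: the logdir computed above is overridden by a hardcoded local path below.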
logdir = '/home/shuijing/Desktop/CrowdNav_Prediction/gst_updated/results/100-gumbel_social_transformer-faster_lstm-lr_0.001-init_temp_0.5-edge_head_0-ebd_64-snl_1-snh_8-seed_1000/sj'
if not isdir(logdir):
raise RuntimeError('The result directory was not found.')
checkpoint_dir = join(logdir, 'checkpoint')
model_filename = 'epoch_'+str(args.num_epochs)+'.pt'
with open(join(checkpoint_dir, 'args.pickle'), 'rb') as f:
args_eval = pickle.load(f)
model = st_model(args_eval, device=device).to(device)
model_checkpoint = torch.load(join(checkpoint_dir, model_filename), map_location=device)
model.load_state_dict(model_checkpoint['model_state_dict'])
print('Loaded configuration: ', writername)
test_loss_epoch, test_aoe, test_foe, test_aoe_std, test_foe_std, test_aoe_min, test_foe_min = inference(loader_test, model, args, mode='test', tau=0.03, device=device)
print('Test loss from loaded model: ', test_loss_epoch)
print('dataset: {0} | test aoe: {1:.4f} | test aoe std: {2:.4f} | test foe: {3:.4f} | test foe std: {4:.4f} | min aoe: {5:.4f}, min foe: {6:.4f}'\
.format(args.dataset, test_aoe, test_aoe_std, test_foe, test_foe_std, test_aoe_min, test_foe_min))
# dataset_list = ['eth', 'hotel', 'univ', 'zara1', 'zara2']
# dataset_idx = dataset_list.index(args.dataset)
# csv_row_data = np.ones((4,5))*999999.
# csv_row_data[:,dataset_idx] = np.array([test_aoe, test_foe, test_aoe_std, test_foe_std])
# csv_row_data = csv_row_data.reshape(-1)
# csv_filename = join(checkpoint_dir, '../..', 'test_performance.csv')
# with open(csv_filename, mode='a') as test_file:
# test_writer = csv.writer(test_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
# test_writer.writerow(csv_row_data)
# print(csv_filename+' is written.')
def inference(loader, model, args, mode='val', tau=1., device='cuda:0'):
with torch.no_grad():
model.eval()
loss_epoch, aoe_epoch, foe_epoch, loss_mask_epoch = [], [], [], []
aoe_mean_epoch, aoe_std_epoch, foe_mean_epoch, foe_std_epoch = [], [], [], []
aoe_min_epoch, foe_min_epoch = [], []
for batch_idx, batch in enumerate(loader):
obs_traj, pred_traj_gt, obs_traj_rel, pred_traj_rel_gt, loss_mask_rel, loss_mask, \
v_obs, A_obs, v_pred_gt, A_pred_gt, attn_mask_obs, attn_mask_pred_gt = batch
v_obs, A_obs, v_pred_gt, attn_mask_obs, loss_mask_rel = \
v_obs.to(device), A_obs.to(device), v_pred_gt.to(device), \
attn_mask_obs.to(device), loss_mask_rel.to(device)
if mode == 'val':
results = model(v_obs, A_obs, attn_mask_obs, loss_mask_rel, tau=tau, hard=False, sampling=False, device=device)
gaussian_params_pred, x_sample_pred, info = results
loss_mask_per_pedestrian = info['loss_mask_per_pedestrian']
loss_mask_rel_full_partial = info['loss_mask_rel_full_partial']
loss_mask_rel_pred = loss_mask_rel[:,:,-args.pred_seq_len:]
if args.deterministic:
offset_error_sq, eventual_loss_mask = offset_error_square_full_partial(x_sample_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
loss = offset_error_sq.sum()/eventual_loss_mask.sum()
else:
prob_loss, eventual_loss_mask = negative_log_likelihood_full_partial(gaussian_params_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
loss = prob_loss.sum()/eventual_loss_mask.sum()
aoe = average_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
foe = final_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
elif mode == 'test':
if args.deterministic:
sampling = False
else:
sampling = True
best_aoe_mean = 999999.
aoe, foe, loss = [], [], []
for _ in range(20):
results = model(v_obs, A_obs, attn_mask_obs, loss_mask_rel, tau=tau, hard=True, sampling=sampling, device=device)
gaussian_params_pred, x_sample_pred, info = results
loss_mask_per_pedestrian = info['loss_mask_per_pedestrian']
loss_mask_rel_full_partial = info['loss_mask_rel_full_partial']
loss_mask_rel_pred = loss_mask_rel[:,:,-args.pred_seq_len:]
if args.deterministic:
offset_error_sq, eventual_loss_mask = offset_error_square_full_partial(x_sample_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
loss_tmp = offset_error_sq.sum()/eventual_loss_mask.sum()
else:
prob_loss, eventual_loss_mask = negative_log_likelihood_full_partial(gaussian_params_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
loss_tmp = prob_loss.sum()/eventual_loss_mask.sum()
aoe_tmp = average_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
foe_tmp = final_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
aoe.append(aoe_tmp)
foe.append(foe_tmp)
loss.append(loss_tmp)
aoe = torch.stack(aoe, dim=0)
foe = torch.stack(foe, dim=0)
aoe = aoe.sum(1)
foe = foe.sum(1)
aoe_mean = aoe.mean()
aoe_std = aoe.std()
foe_mean = foe.mean()
foe_std = foe.std()
aoe_min, foe_min = aoe.min(), foe.min()
loss = sum(loss)/len(loss)
else:
raise RuntimeError('We now only support val and test mode.')
if mode == 'val':
loss_epoch.append(loss.detach().to('cpu').item())
aoe_epoch.append(aoe.detach().to('cpu').numpy())
foe_epoch.append(foe.detach().to('cpu').numpy())
loss_mask_epoch.append(loss_mask_per_pedestrian[0].detach().to('cpu').numpy())
elif mode == 'test':
loss_epoch.append(loss.detach().to('cpu').item())
aoe_mean_epoch.append(aoe_mean.item())
foe_mean_epoch.append(foe_mean.item())
aoe_std_epoch.append(aoe_std.item())
foe_std_epoch.append(foe_std.item())
aoe_min_epoch.append(aoe_min.item())
foe_min_epoch.append(foe_min.item())
loss_mask_epoch.append(loss_mask_per_pedestrian[0].detach().to('cpu').numpy())
else:
raise RuntimeError('We only support val and test mode.')
if mode == 'val':
aoe_epoch, foe_epoch, loss_mask_epoch = \
np.concatenate(aoe_epoch, axis=0), \
np.concatenate(foe_epoch, axis=0), \
np.concatenate(loss_mask_epoch, axis=0)
loss_epoch, aoe_epoch, foe_epoch = \
np.mean(loss_epoch), \
aoe_epoch.sum()/loss_mask_epoch.sum(), \
foe_epoch.sum()/loss_mask_epoch.sum()
return loss_epoch, aoe_epoch, foe_epoch
elif mode == 'test':
loss_mask_epoch = np.concatenate(loss_mask_epoch, axis=0)
aoe = sum(aoe_mean_epoch)/loss_mask_epoch.sum()
foe = sum(foe_mean_epoch)/loss_mask_epoch.sum()
aoe_std = sum(aoe_std_epoch)/loss_mask_epoch.sum()
foe_std = sum(foe_std_epoch)/loss_mask_epoch.sum()
aoe_min = sum(aoe_min_epoch)/loss_mask_epoch.sum()
foe_min = sum(foe_min_epoch)/loss_mask_epoch.sum()
loss_epoch = np.mean(loss_epoch)
return loss_epoch, aoe, foe, aoe_std, foe_std, aoe_min, foe_min
else:
raise RuntimeError("Only 'val' and 'test' modes are supported.")
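# Return signatures of inference(), for reference by callers:
#   mode='val' : (loss_epoch, aoe_epoch, foe_epoch)
#   mode='test': (loss_epoch, aoe, foe, aoe_std, foe_std, aoe_min, foe_min)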
if __name__ == "__main__":
# args = arg_parse()
# main(args)
from crowd_nav.configs.config import Config
config = Config()
main(config.pred)
================================================
FILE: gst_updated/scripts/experiments/train.py
================================================
import pathhack
import pickle
import time
from os.path import join, isdir
from os import makedirs
import torch
import numpy as np
from tensorboardX import SummaryWriter
from torch.optim.lr_scheduler import StepLR
from src.mgnn.utils import arg_parse, average_offset_error, final_offset_error, random_rotate_graph, args2writername, load_batch_dataset
from src.gumbel_social_transformer.temperature_scheduler import Temp_Scheduler
from scripts.experiments.eval import inference
from datetime import datetime
def main(args):
print('\n\n')
print('-'*50)
print('arguments: ', args)
torch.manual_seed(args.random_seed)
np.random.seed(args.random_seed)
if args.batch_size != 1:
raise RuntimeError("Batch size must be 1 for BatchTrajectoriesDataset.")
if args.dataset == 'sdd' and args.rotation_pattern is not None:
raise RuntimeError("SDD should not allow rotation since it uses pixels.")
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print('device: ', device)
loader_train = load_batch_dataset(args, pathhack.pkg_path, subfolder='train')
if args.dataset == 'sdd':
loader_val = load_batch_dataset(args, pathhack.pkg_path, subfolder='test') # no val for sdd
else:
loader_val = load_batch_dataset(args, pathhack.pkg_path, subfolder='val')
train_data_loaders = [loader_train, loader_val]
print('dataset: ', args.dataset)
writername = args2writername(args)
print('Config: ', writername)
logdir = join(pathhack.pkg_path, 'results', writername, args.dataset)
if isdir(logdir) and not args.resume_training:
print('Error: The result directory was already created and used.')
print('-'*50)
print('\n\n')
return
writer = SummaryWriter(logdir=logdir)
print('-'*50)
print('\n\n')
train(args, train_data_loaders, writer, logdir, device=device)
writer.close()
def train(args, data_loaders, writer, logdir, device='cuda:0'):
print('-'*50)
print('Training Phase')
print('-'*50, '\n')
loader_train, loader_val = data_loaders
model = st_model(args, device=device).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr)
lr_scheduler = StepLR(optimizer, step_size=int(args.temp_epochs/4), gamma=0.3)
checkpoint_dir = join(logdir, 'checkpoint')
if args.resume_training:
if not isdir(checkpoint_dir):
raise RuntimeError("Checkpoint folder does not exist.")
with open(join(checkpoint_dir, 'train_hist.pickle'), 'rb') as f:
hist = pickle.load(f)
print(join(checkpoint_dir, 'train_hist.pickle')+' is loaded.')
if args.resume_epoch is None:
checkpoint_epoch = hist['epoch']
else:
checkpoint_epoch = args.resume_epoch
hist['train_loss_task'], hist['val_loss_task'] = \
hist['train_loss_task'][:checkpoint_epoch//args.save_epochs], hist['val_loss_task'][:checkpoint_epoch//args.save_epochs]
hist['train_aoe_task'], hist['val_aoe_task'] = \
hist['train_aoe_task'][:checkpoint_epoch//args.save_epochs], hist['val_aoe_task'][:checkpoint_epoch//args.save_epochs]
hist['train_foe_task'], hist['val_foe_task'] = \
hist['train_foe_task'][:checkpoint_epoch//args.save_epochs], hist['val_foe_task'][:checkpoint_epoch//args.save_epochs]
checkpoint = torch.load(join(checkpoint_dir, 'epoch_'+str(checkpoint_epoch)+'.pt'))
# load checkpoint
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
lr_scheduler.load_state_dict(checkpoint['lr_scheduler_state_dict'])
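# Rebuild the Gumbel-Softmax temperature annealing schedule at the checkpoint epoch;
# judging from the Temp_Scheduler arguments, it anneals from init_temp toward temp_min=0.03 over temp_epochs, matching a fresh run.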
temperature_scheduler = Temp_Scheduler(args.temp_epochs, args.init_temp, args.init_temp, \
temp_min=0.03, last_epoch=checkpoint_epoch-1)
start_epoch = checkpoint_epoch+1
print('Model, optimizer, lr_scheduler, and temperature scheduler are loaded.')
print('EPOCHS: '+str(start_epoch)+' to '+str(args.num_epochs))
else:
if not isdir(checkpoint_dir):
makedirs(checkpoint_dir)
with open(join(checkpoint_dir, 'args.pickle'), 'wb') as f:
pickle.dump(args, f)
temperature_scheduler = Temp_Scheduler(args.temp_epochs, args.init_temp, args.init_temp, temp_min=0.03)
hist = {}
hist['epoch'] = 0
hist['train_loss_task'], hist['val_loss_task'] = [], []
hist['train_aoe_task'], hist['val_aoe_task'] = [], []
hist['train_foe_task'], hist['val_foe_task'] = [], []
start_epoch = 1
print('Model, optimizer, lr_scheduler, and temperature scheduler are initialized.')
print('EPOCHS: '+str(start_epoch)+' to '+str(args.num_epochs))
print('Training started.\n')
for epoch in range(start_epoch, args.num_epochs+1):
model.train()
epoch_start_time = time.time()
tau = temperature_scheduler.step()
train_loss_epoch, train_aoe_epoch, train_foe_epoch, train_loss_mask_epoch = [], [], [], []
batch_len = 0
valid_batch_len = 0
for batch_idx, batch in enumerate(loader_train):
batch_len+=1
obs_traj, pred_traj_gt, obs_traj_rel, pred_traj_rel_gt, loss_mask_rel, loss_mask, \
v_obs, A_obs, v_pred_gt, A_pred_gt, attn_mask_obs, attn_mask_pred_gt = batch
# print(v_obs.shape)
if v_obs.shape[2]>128:
continue
valid_batch_len += 1
if args.rotation_pattern is not None:
(v_obs, A_obs, v_pred_gt, A_pred_gt), _ = \
random_rotate_graph(args, v_obs, A_obs, v_pred_gt, A_pred_gt)
v_obs, A_obs, v_pred_gt, attn_mask_obs, loss_mask_rel = \
v_obs.to(device), A_obs.to(device), v_pred_gt.to(device), \
attn_mask_obs.to(device), loss_mask_rel.to(device)
results = model(v_obs, A_obs, attn_mask_obs, loss_mask_rel, tau=tau, hard=False, sampling=False, device=device)
gaussian_params_pred, x_sample_pred, info = results
loss_mask_per_pedestrian = info['loss_mask_per_pedestrian']
loss_mask_rel_full_partial = info['loss_mask_rel_full_partial'] # value depends on only_observe_full_period
loss_mask_rel_pred = loss_mask_rel[:,:,-args.pred_seq_len:]
if args.deterministic:
offset_error_sq, eventual_loss_mask = offset_error_square_full_partial(x_sample_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
loss = offset_error_sq.sum()/eventual_loss_mask.sum()
else:
prob_loss, eventual_loss_mask = negative_log_likelihood_full_partial(gaussian_params_pred, v_pred_gt, loss_mask_rel_full_partial, loss_mask_rel_pred)
loss = prob_loss.sum()/eventual_loss_mask.sum()
train_loss_epoch.append(loss.detach().to('cpu').item())
loss = loss / args.batch_size
loss.backward()
# only consider the fully detected pedestrians
aoe = average_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
foe = final_offset_error(x_sample_pred, v_pred_gt, loss_mask=loss_mask_per_pedestrian)
train_aoe_epoch.append(aoe.detach().to('cpu').numpy())
train_foe_epoch.append(foe.detach().to('cpu').numpy())
train_loss_mask_epoch.append(loss_mask_per_pedestrian[0].detach().to('cpu').numpy())
if args.clip_grad is not None:
torch.nn.utils.clip_grad_norm_(
model.parameters(), args.clip_grad)
optimizer.step()
optimizer.zero_grad()
lr_scheduler.step()
print("valid batch ratio: ", valid_batch_len/batch_len)
train_aoe_epoch, train_foe_epoch, train_loss_mask_epoch = \
np.concatenate(train_aoe_epoch, axis=0), \
np.concatenate(train_foe_epoch, axis=0), \
np.concatenate(train_loss_mask_epoch, axis=0)
train_loss_epoch, train_aoe_epoch, train_foe_epoch = \
np.mean(train_loss_epoch), \
train_aoe_epoch.sum()/train_loss_mask_epoch.sum(), \
train_foe_epoch.sum()/train_loss_mask_epoch.sum()
hist['train_loss_task'].append(train_loss_epoch)
hist['train_aoe_task'].append(train_aoe_epoch)
hist['train_foe_task'].append(train_foe_epoch)
training_epoch_period = time.time() - epoch_start_time
training_epoch_period_per_sample = training_epoch_period/len(loader_train)
val_loss_epoch, val_aoe_epoch, val_foe_epoch = inference(loader_val, model, args, mode='val', tau=tau, device=device)
hist['val_loss_task'].append(val_loss_epoch)
hist['val_aoe_task'].append(val_aoe_epoch)
hist['val_foe_task'].append(val_foe_epoch)
hist['epoch'] = epoch
print('Epoch: {0} | train loss: {1:.4f} | val loss: {2:.4f} | train aoe: {3:.4f} | val aoe: {4:.4f} | train foe: {5:.4f} | val foe: {6:.4f} | period: {7:.2f} sec | time per sample: {8:.4f} sec'\
.format(epoch, train_loss_epoch, val_loss_epoch,\
train_aoe_epoch, val_aoe_epoch,\
train_foe_epoch, val_foe_epoch,\
training_epoch_period, training_epoch_period_per_sample))
if epoch % args.save_epochs == 0:
model_filename = join(checkpoint_dir, 'epoch_'+str(epoch)+'.pt')
torch.save({
'epoch': epoch,
'model_state_dict': model.state_dict(),
'optimizer_state_dict': optimizer.state_dict(),
'lr_scheduler_state_dict': lr_scheduler.state_dict(),
'train_loss_epoch': train_loss_epoch,
'val_loss_epoch': val_loss_epoch,
'train_aoe_epoch': train_aoe_epoch,
'val_aoe_epoch': val_aoe_epoch,
'train_foe_epoch': train_foe_epoch,
'val_foe_epoch': val_foe_epoch,
'training_date': datetime.today().strftime('%y%m%d'),
}, model_filename)
print('epoch_'+str(epoch)+'.pt is saved.')
with open(join(checkpoint_dir, 'train_hist.pickle'), 'wb') as f:
pickle.dump(hist, f)
print(join(checkpoint_dir, 'train_hist.pickle')+' is saved.')
writer.add_scalars('loss', {'train': hist['train_loss_task'][-1], 'val': hist['val_loss_task'][-1]}, epoch)
writer.add_scalars('aoe', {'train': hist['train_aoe_task'][-1], 'val': hist['val_aoe_task'][-1]}, epoch)
writer.add_scalars('foe', {'train': hist['train_foe_task'][-1], 'val': hist['val_foe_task'][-1]}, epoch)
return
if __name__ == "__main__":
args = arg_parse()
if args.temporal == "lstm" or args.temporal == "faster_lstm":
from src.gumbel_social_transformer.st_model import st_model, offset_error_square_full_partial, \
negative_log_likelihood_full_partial
else:
raise RuntimeError('The temporal component must be "lstm" or "faster_lstm".')
main(args)
================================================
FILE: gst_updated/scripts/wrapper/crowd_nav_interface_multi_env_parallel.py
================================================
from gst_updated.scripts.data import pathhack
import pickle
from os.path import join, isdir
import torch
import numpy as np
from gst_updated.src.mgnn.utils import seq_to_graph
from gst_updated.src.gumbel_social_transformer.st_model import st_model
class CrowdNavPredInterfaceMultiEnv(object):
def __init__(self, load_path, device, config, num_env):
# *** Load model
self.args = config
self.device = device
self.nenv = num_env
if not isdir(load_path):
raise RuntimeError('The result directory was not found.')
checkpoint_dir = join(load_path, 'checkpoint')
with open(join(checkpoint_dir, 'args.pickle'), 'rb') as f:
args_eval = pickle.load(f)
# Uncomment if you want a fixed random seed.
# torch.manual_seed(args_eval.random_seed)
# np.random.seed(args_eval.random_seed)
self.model = st_model(args_eval, device=device).to(device)
model_filename = 'epoch_'+str(args_eval.num_epochs)+'.pt'
model_checkpoint = torch.load(join(checkpoint_dir, model_filename), map_location=device)
self.model.load_state_dict(model_checkpoint['model_state_dict'])
self.model.eval()
print("LOADED MODEL")
print("device: ", device)
print()
def forward(self, input_traj, input_binary_mask, sampling=True):
"""
inputs:
- input_traj:
# numpy
# (n_env, num_peds, obs_seq_len, 2)
- input_binary_mask:
# numpy
# (n_env, num_peds, obs_seq_len, 1)
# Zhe: I think we should not just have a binary mask of shape (n_env, number of pedestrians, 1),
# because some agents are partially detected, and they should not be simply ignored.
- sampling:
# bool
# True means you sample from Gaussian.
# False means you choose to use the mean of Gaussian as output.
outputs:
- output_traj:
# torch "cpu"
# (n_env, num_peds, pred_seq_len, 5)
# where 5 includes [mu_x, mu_y, sigma_x, sigma_y, correlation coefficient]
- output_binary_mask:
# torch "cpu"
# (n_env, num_peds, 1)
# Zhe: this means that if an agent does not show up in the last and
# second-to-last observation time steps, the agent will not be predicted.
"""
input_traj = input_traj.cpu().numpy()
input_binary_mask = input_binary_mask.cpu().numpy()
obs_seq_len = 5
pred_seq_len = 5
invalid_value = -999.
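# invalid_value (-999.) is the sentinel for undetected agents / missing time steps; see the example input in __main__ below.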
# *** Process input data
num_peds_batch = 0
b_num_peds = []
b_obs_traj, b_obs_traj_rel, b_loss_mask_rel, b_loss_mask, b_v_obs, b_A_obs, b_attn_mask_obs = \
[],[],[],[],[],[],[]
for i in range(self.nenv):
input_traj_i = input_traj[i]
input_binary_mask_i = input_binary_mask[i]
obs_traj = np.transpose(input_traj_i, (0,2,1)) # (num_peds, 2, obs_seq_len)
n_peds = obs_traj.shape[0]
loss_mask_obs = input_binary_mask_i[:,:,0] # (num_peds, obs_seq_len)
loss_mask_rel_obs = loss_mask_obs[:,:-1] * loss_mask_obs[:,-1:]
loss_mask_rel_obs = np.concatenate((loss_mask_obs[:,0:1], loss_mask_rel_obs), axis=1) # (num_peds, obs_seq_len)
loss_mask_rel_pred = (np.ones((n_peds, pred_seq_len)) * loss_mask_rel_obs[:,-1:]).astype('bool') # (num_peds, pred_seq_len)
obs_traj_rel = obs_traj[:,:,1:] - obs_traj[:,:,:-1]
obs_traj_rel = np.concatenate((np.zeros((n_peds, 2, 1)), obs_traj_rel), axis=2)
obs_traj_rel = invalid_value*np.ones_like(obs_traj_rel)*(1-loss_mask_rel_obs[:,np.newaxis,:]) \
+ obs_traj_rel*loss_mask_rel_obs[:,np.newaxis,:]
obs_traj = torch.from_numpy(obs_traj)
obs_traj_rel = torch.from_numpy(obs_traj_rel)
# v_obs, A_obs = seq_to_graph(obs_traj, obs_traj_rel, attn_mech='rel_conv')
# v_obs, A_obs = v_obs.unsqueeze(0), A_obs.unsqueeze(0)
loss_mask_rel = np.concatenate((loss_mask_rel_obs, loss_mask_rel_pred), axis=1)[np.newaxis,:,:] # (1, num_peds, seq_len)
loss_mask_rel = torch.from_numpy(loss_mask_rel)
loss_mask_rel_obs = torch.from_numpy(loss_mask_rel_obs[np.newaxis,:,:])
attn_mask_obs = []
for tt in range(obs_seq_len):
loss_mask_rel_tt = loss_mask_rel_obs[0,:,tt] # (n_peds,)
attn_mask_obs.append(torch.outer(loss_mask_rel_tt, loss_mask_rel_tt).float()) # (n_peds, n_peds)
attn_mask_obs = torch.stack(attn_mask_obs, dim=0).unsqueeze(0) # (1,obs_seq_len, n_peds, n_peds)
obs_traj = obs_traj.unsqueeze(0)
obs_traj_rel = obs_traj_rel.unsqueeze(0)
loss_mask_pred = loss_mask_rel_pred
loss_mask = np.concatenate((loss_mask_obs, loss_mask_pred), axis=1)
loss_mask = torch.from_numpy(loss_mask).unsqueeze(0)
num_peds_batch += obs_traj.shape[1]
b_num_peds.append(obs_traj.shape[1])
b_obs_traj.append(obs_traj[0].float())
b_obs_traj_rel.append(obs_traj_rel[0].float())
b_loss_mask_rel.append(loss_mask_rel[0].float())
b_loss_mask.append(loss_mask[0].float())
# b_v_obs.append(v_obs[0].float())
# b_A_obs.append(A_obs[0].float())
b_attn_mask_obs.append(attn_mask_obs[0].float())
b_obs_traj = torch.cat(b_obs_traj, dim=0)
b_obs_traj_rel = torch.cat(b_obs_traj_rel, dim=0)
b_loss_mask_rel = torch.cat(b_loss_mask_rel, dim=0)
b_loss_mask = torch.cat(b_loss_mask, dim=0)
b_v_obs, b_A_obs = seq_to_graph(b_obs_traj, b_obs_traj_rel, attn_mech='rel_conv')
b_attn_mask_obs_tensor = torch.zeros(obs_seq_len, num_peds_batch, num_peds_batch).float()
curr_ped_idx = 0
for num_peds, attn_mask_obs_single in zip(b_num_peds, b_attn_mask_obs):
b_attn_mask_obs_tensor[:,curr_ped_idx:curr_ped_idx+num_peds, \
curr_ped_idx:curr_ped_idx+num_peds] = attn_mask_obs_single
curr_ped_idx += num_peds
b_obs_traj = b_obs_traj.unsqueeze(0)
b_obs_traj_rel = b_obs_traj_rel.unsqueeze(0)
b_v_obs = b_v_obs.unsqueeze(0)
b_A_obs = b_A_obs.unsqueeze(0)
b_loss_mask_rel = b_loss_mask_rel.unsqueeze(0)
b_attn_mask_obs_tensor = b_attn_mask_obs_tensor.unsqueeze(0)
b_loss_mask = b_loss_mask.unsqueeze(0)
# print("PROCESSED DATA")
# print("b_obs_traj shape: ", b_obs_traj.shape)
# print("b_obs_traj_rel shape: ", b_obs_traj_rel.shape)
# print("b_v_obs shape: ", b_v_obs.shape)
# print("b_A_obs shape: ", b_A_obs.shape)
# print("b_loss_mask_rel shape: ", b_loss_mask_rel.shape)
# print("b_attn_mask_obs_tensor shape: ", b_attn_mask_obs_tensor.shape)
# print("b_loss_mask shape: ", b_loss_mask.shape)
# print()
# *** Perform trajectory prediction
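# NOTE: sampling is overridden to False on the next line, so the Gaussian mean is used regardless of the sampling argument.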
sampling = False
with torch.no_grad():
b_v_obs, b_A_obs, b_attn_mask_obs_tensor, b_loss_mask_rel = \
b_v_obs.to(self.device), b_A_obs.to(self.device), \
b_attn_mask_obs_tensor.float().to(self.device), b_loss_mask_rel.float().to(self.device)
results = self.model(b_v_obs, b_A_obs, b_attn_mask_obs_tensor, b_loss_mask_rel, tau=0.03, hard=True, sampling=sampling, device=self.device)
gaussian_params_pred, x_sample_pred, info = results
mu, sx, sy, corr = gaussian_params_pred
# print(mu.shape) # torch.Size([1, 5, 48, 2])
# print(sx.shape)
# print(sy.shape)
# print(corr.shape)
# print(b_obs_traj.shape)
# print(b_loss_mask.shape)
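# Convert the per-step relative-displacement Gaussians into cumulative (absolute-position)
# Gaussians. Treating the per-step displacements as independent, the means and the
# covariance entries add over time: mu_cum = cumsum(mu), sigma_cum^2 = cumsum(sigma^2),
# and the cross term corr*sx*sy accumulates the same way, so
# corr_cum = cumsum(corr*sx*sy) / (sx_cum * sy_cum).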
mu = mu.cumsum(1)
sx_squared = sx**2.
sy_squared = sy**2.
corr_sx_sy = corr*sx*sy
sx_squared_cumsum = sx_squared.cumsum(1)
sy_squared_cumsum = sy_squared.cumsum(1)
corr_sx_sy_cumsum = corr_sx_sy.cumsum(1)
sx_cumsum = sx_squared_cumsum**(1./2)
sy_cumsum = sy_squared_cumsum**(1./2)
corr_cumsum = corr_sx_sy_cumsum/(sx_cumsum*sy_cumsum)
mu_cumsum = mu.detach().to("cpu") + np.transpose(b_obs_traj[:,:,:,-1:], (0,3,1,2)) # (batch, time, node, 2)
loss_mask_pred = b_loss_mask[:,:,obs_seq_len:].permute(0,2,1).unsqueeze(-1).bool()
mu_cumsum = mu_cumsum * loss_mask_pred + invalid_value*(~loss_mask_pred)
output_traj = torch.cat((mu_cumsum.detach().to("cpu"), sx_cumsum.detach().to("cpu"), sy_cumsum.detach().to("cpu"), corr_cumsum.detach().to("cpu")), dim=3)
output_traj = output_traj.permute(0, 2, 1, 3)[0] # (ped, time, 5)
output_binary_mask = loss_mask_pred[0,0,:,:].detach().to("cpu") # (ped, 1)
num_total_peds = output_traj.shape[0]
output_traj = output_traj.reshape(self.nenv, num_total_peds // self.nenv, output_traj.shape[1], output_traj.shape[2]) # (n_env, ped, time, 5)
output_binary_mask = output_binary_mask.reshape(self.nenv, num_total_peds // self.nenv, 1) # (n_env, ped, 1)
output_traj, output_binary_mask = output_traj.to(self.device), output_binary_mask.to(self.device)
# reshape has been tested
# print("PERFORMED PREDICTION")
# print("mu_cumsum shape: ", mu_cumsum.shape)
# print("sx_cumsum shape: ", sx_cumsum.shape)
# print("sy_cumsum shape: ", sy_cumsum.shape)
# print("corr_cumsum shape: ", corr_cumsum.shape)
# print("output_traj device: ", output_traj.device)
# print("output_binary_mask device: ", output_binary_mask.device)
# print("output_traj shape: ", output_traj.shape)
# print("output_binary_mask shape: ", output_binary_mask.shape)
# print()
return output_traj, output_binary_mask
if __name__ == '__main__':
# *** Create an input that aligns with the format of CrowdNav
obs_seq_len = 5
pred_seq_len = 5
invalid_value = -999.
ped_step = 0.56
x_init = 2.
ped_0 = np.stack((x_init+np.arange((obs_seq_len-1)*ped_step, -ped_step, -ped_step),np.zeros(obs_seq_len)), 0)
ped_1 = -ped_0 # (2, obs_seq_len)
ped_3 = np.stack((np.zeros(obs_seq_len), x_init+np.arange((obs_seq_len-1)*ped_step, -ped_step, -ped_step)), 0)
ped_2 = -ped_3
ped_4 = np.ones((2,obs_seq_len))*invalid_value
ped_5 = np.ones((2,obs_seq_len))*invalid_value
obs_traj_undamaged = np.stack((ped_0,ped_1,ped_2,ped_3,ped_4,ped_5), axis=0) # (num_peds, 2, obs_seq_len)
n_peds = obs_traj_undamaged.shape[0]
original_obs_traj = obs_traj_undamaged # (num_peds, 2, obs_seq_len)
# 0: right, 1: left, 2: up, 3: down, 4,5: invalid agent
original_obs_traj[2,:,:2] = invalid_value
original_obs_traj[3,:,:2] = invalid_value
# * input_traj and input_binary_mask are example inputs from CrowdNav
input_traj = np.transpose(original_obs_traj, (0,2,1)) # (num_peds, obs_seq_len, 2)
input_binary_mask = (input_traj!=invalid_value).prod(2).astype('bool')[:,:,np.newaxis] # (num_peds, obs_seq_len, 1)
n_env = 8
input_traj = np.stack([input_traj for i in range(n_env)], axis=0) # (n_env, num_peds, obs_seq_len, 2)
input_binary_mask = np.stack([input_binary_mask for i in range(n_env)],axis=0) # (n_env, num_peds, obs_seq_len, 1)
print()
print("INPUT DATA")
print("input_traj shape: ", input_traj.shape)
print("input_binary_mask shape:", input_binary_mask.shape)
print()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
args = "100-gumbel_social_transformer-faster_lstm-lr_0.001-init_temp_0.5-edge_head_0-ebd_64-snl_1-snh_8-seed_1000"
model = CrowdNavPredInterfaceMultiEnv(load_path='/home/shuijing/Desktop/CrowdNav_Prediction/gst_updated/results/100-gumbel_social_transformer-faster_lstm-lr_0.001-init_temp_0.5-edge_head_0-ebd_64-snl_1-snh_8-seed_1000/sj',
device=device, config = args, num_env=n_env)
output_traj, output_binary_mask = model.forward(
input_traj,
input_binary_mask,
sampling = True,
)
print()
print("OUTPUT DATA")
print("output_traj shape: ", output_traj.shape)
print("output_binary_mask shape:", output_binary_mask.shape)
print()
================================================
FILE: gst_updated/scripts/wrapper/crowd_nav_interface_multi_env_visualization_test_single_batch.py
================================================
import pathhack
import pickle
from os.path import join, isdir
import torch
import numpy as np
from src.mgnn.utils import seq_to_graph
from src.gumbel_social_transformer.st_model import st_model
import matplotlib.pyplot as plt
# from gst_updated.src.mgnn.utils import average_offset_error, final_offset_error, args2writername, load_batch_dataset
# from gst_updated.src.gumbel_social_transformer.st_model import st_model, offset_error_square_full_partial,\
# negative_log_likelihood_full_partial
from torch.utils.data import DataLoader
def load_batch_dataset(pkg_path, subfolder='test', num_workers=4, shuffle=None):
result_filename = 'sj_dset_'+subfolder+'_trajectories.pt'
dataset_folderpath = join(pkg_path, 'datasets/shuijing/orca_20humans_fov')
dset = torch.load(join(dataset_folderpath, result_filename))
if shuffle is None:
if subfolder == 'train':
shuffle = True
else:
shuffle = False
dloader = DataLoader(
dset,
batch_size=1,
shuffle=shuffle,
num_workers=num_workers)
return dloader
loader_test = load_batch_dataset(pathhack.pkg_path, subfolder='test')
print(len(loader_test))
test_batches = []
max_num_peds = 0
for batch_idx, batch in enumerate(loader_test):
if batch_idx % 1000 == 0 and len(test_batches) < 4 and batch_idx != 0:
obs_traj, pred_traj_gt, obs_traj_rel, pred_traj_rel_gt, loss_mask_rel, loss_mask, \
v_obs, A_obs, v_pred_gt, A_pred_gt, attn_mask_obs, attn_mask_pred_gt = batch
test_batches.append(batch)
print("obs traj shape: ", obs_traj.shape)
print("loss mask shape: ", loss_mask.shape)
if max_num_peds < obs_traj.shape[1]:
max_num_peds = obs_traj.shape[1]
print("max number of pedestrians: ", max_num_peds)
wrapper_input_traj = []
wrapper_input_binary_mask = []
wrapper_output_traj_gt = []
wrapper_output_binary_mask_gt = []
invalid_value = -999.
for batch in test_batches:
obs_traj, pred_traj_gt, obs_traj_rel, pred_traj_rel_gt, loss_mask_rel, loss_mask, \
v_obs, A_obs, v_pred_gt, A_pred_gt, attn_mask_obs, attn_mask_pred_gt = batch
input_traj = obs_traj[0].permute(0,2,1) # (n_peds, obs_seq_len, 2)
input_binary_mask = loss_mask[0,:,:input_traj.shape[1]].unsqueeze(2) # (n_peds, obs_seq_len, 1)
# print(input_binary_mask.shape)
output_traj_gt = pred_traj_gt[0].permute(0,2,1) # (num_peds, pred_seq_len, 2)
output_binary_mask_gt = loss_mask[0,:,input_traj.shape[1]:].unsqueeze(2) # (n_peds, pred_seq_len, 1)
n_peds, obs_seq_len, pred_seq_len = input_traj.shape[0], input_traj.shape[1], output_traj_gt.shape[1]
if n_peds < max_num_peds:
input_traj_complement = torch.ones(max_num_peds-n_peds, obs_seq_len, 2)*invalid_value
input_binary_mask_complement = torch.zeros(max_num_peds-n_peds, obs_seq_len, 1)
output_traj_gt_complement = torch.ones(max_num_peds-n_peds, pred_seq_len, 2)*invalid_value
output_binary_mask_gt_complement = torch.zeros(max_num_peds-n_peds, pred_seq_len, 1)
input_traj = torch.cat((input_traj, input_traj_complement), dim=0)
input_binary_mask = torch.cat((input_binary_mask, input_binary_mask_complement), dim=0)
output_traj_gt = torch.cat((output_traj_gt, output_traj_gt_complement), dim=0)
output_binary_mask_gt = torch.cat((output_binary_mask_gt, output_binary_mask_gt_complement), dim=0)
wrapper_input_traj.append(input_traj)
wrapper_input_binary_mask.append(input_binary_mask)
wrapper_output_traj_gt.append(output_traj_gt)
wrapper_output_binary_mask_gt.append(output_binary_mask_gt)
wrapper_input_traj = torch.stack(wrapper_input_traj, dim=0)
wrapper_input_binary_mask = torch.stack(wrapper_input_binary_mask, dim=0)
wrapper_output_traj_gt = torch.stack(wrapper_output_traj_gt, dim=0)
wrapper_output_binary_mask_gt = torch.stack(wrapper_output_binary_mask_gt, dim=0)
print("wrapper_input_traj shape: ", wrapper_input_traj.shape)
print("wrapper_input_binary_mask shape: ", wrapper_input_binary_mask.shape)
print("wrapper_output_traj_gt shape: ", wrapper_output_traj_gt.shape)
print("wrapper_output_binary_mask_gt shape: ", wrapper_output_binary_mask_gt.shape)
# wrapper_input_traj shape: torch.Size([4, 15, 5, 2])
# wrapper_input_binary_mask shape: torch.Size([4, 15, 5, 1])
# wrapper_output_traj_gt shape: torch.Size([4, 15, 5, 2])
# wrapper_output_binary_mask_gt shape: torch.Size([4, 15, 5, 1])
n_env = wrapper_input_traj.shape[0]
input_traj = wrapper_input_traj.numpy()
input_binary_mask = wrapper_input_binary_mask.numpy()
print()
print("INPUT DATA")
print("input_traj shape: ", input_traj.shape)
print("input_binary_mask shape:", input_binary_mask.shape)
print()
# *** Visualize input data
def visualize_trajectory(obs_traj, loss_mask, sample_index, obs_seq_len=5, pred_seq_len=5):
##### Print Visualization Started #####
n_peds, seq_len = obs_traj.shape[1], obs_seq_len+pred_seq_len
full_ped_idx = torch.where(loss_mask.sum(2)[0]==seq_len)[0] # loss_mask tensor: (1, num_peds, seq_len)
fig, ax = plt.subplots()
# ax.set_xlim((-8, 8))
# ax.set_ylim((-5, 5))
fig.set_tight_layout(True)
for ped_idx in range(n_peds):
if ped_idx in full_ped_idx:
ax.plot(obs_traj[0, ped_idx, 0, :obs_seq_len], obs_traj[0, ped_idx, 1, :obs_seq_len], '.-', c='k') # black for obs
else:
for t_idx in range(seq_len):
if loss_mask[0,ped_idx,t_idx] == 1:
if t_idx < obs_seq_len:
# obs blue for partially detected pedestrians
ax.plot(obs_traj[0, ped_idx, 0, t_idx], obs_traj[0, ped_idx, 1, t_idx], '.', c='b')
ax.set_aspect('equal', adjustable='box')
ax.plot()
fig.savefig("tmp_img_to_be_deleted_"+str(sample_index)+".png")
print("tmp_img_to_be_deleted_"+str(sample_index)+".png is created.")
return
# visualize_trajectory(obs_traj, loss_mask, sample_index, obs_seq_len=obs_seq_len, pred_seq_len=obs_seq_len)
def CrowdNavPredInterfaceMultiEnv(
input_traj,
input_binary_mask,
sampling = True,
):
"""
inputs:
- input_traj:
# numpy
# (n_env, num_peds, obs_seq_len, 2)
- input_binary_mask:
# numpy
# (n_env, num_peds, obs_seq_len, 1)
# Zhe: I think we should not just have a binary mask of shape (n_env, number of pedestrians, 1),
# because some agents are partially detected, and they should not be simply ignored.
- sampling:
# bool
# True means you sample from Gaussian.
# False means you choose to use the mean of Gaussian as output.
outputs:
- output_traj:
# torch "cpu"
# (n_env, num_peds, pred_seq_len, 5)
# where 5 includes [mu_x, mu_y, sigma_x, sigma_y, correlation coefficient]
- output_binary_mask:
# torch "cpu"
# (n_env, num_peds, 1)
# Zhe: this means that if an agent does not show up in the last and
# second-to-last observation time steps, the agent will not be predicted.
"""
# *** Process input data
num_peds_batch = 0
b_num_peds = []
b_obs_traj, b_obs_traj_rel, b_loss_mask_rel, b_loss_mask, b_v_obs, b_A_obs, b_attn_mask_obs = \
[],[],[],[],[],[],[]
n_env = 1
for i in range(n_env):
input_traj_i = input_traj[i]
input_binary_mask_i = input_binary_mask[i]
obs_traj = np.transpose(input_traj_i, (0,2,1)) # (num_peds, 2, obs_seq_len)
n_peds = obs_traj.shape[0]
loss_mask_obs = input_binary_mask_i[:,:,0] # (num_peds, obs_seq_len)
loss_mask_rel_obs = loss_mask_obs[:,:-1] * loss_mask_obs[:,-1:]
loss_mask_rel_obs = np.concatenate((loss_mask_obs[:,0:1], loss_mask_rel_obs), axis=1) # (num_peds, obs_seq_len)
loss_mask_rel_pred = (np.ones((n_peds, pred_seq_len)) * loss_mask_rel_obs[:,-1:]).astype('bool') # (num_peds, pred_seq_len)
obs_traj_rel = obs_traj[:,:,1:] - obs_traj[:,:,:-1]
obs_traj_rel = np.concatenate((np.zeros((n_peds, 2, 1)), obs_traj_rel), axis=2)
obs_traj_rel = invalid_value*np.ones_like(obs_traj_rel)*(1-loss_mask_rel_obs[:,np.newaxis,:]) \
+ obs_traj_rel*loss_mask_rel_obs[:,np.newaxis,:]
obs_traj = torch.from_numpy(obs_traj)
obs_traj_rel = torch.from_numpy(obs_traj_rel)
v_obs, A_obs = seq_to_graph(obs_traj, obs_traj_rel, attn_mech='rel_conv')
v_obs, A_obs = v_obs.unsqueeze(0), A_obs.unsqueeze(0)
loss_mask_rel = np.concatenate((loss_mask_rel_obs, loss_mask_rel_pred), axis=1)[np.newaxis,:,:] # (1, num_peds, seq_len)
loss_mask_rel = torch.from_numpy(loss_mask_rel)
loss_mask_rel_obs = torch.from_numpy(loss_mask_rel_obs[np.newaxis,:,:])
attn_mask_obs = []
for tt in range(obs_seq_len):
loss_mask_rel_tt = loss_mask_rel_obs[0,:,tt] # (n_peds,)
attn_mask_obs.append(torch.outer(loss_mask_rel_tt, loss_mask_rel_tt).float()) # (n_peds, n_peds)
attn_mask_obs = torch.stack(attn_mask_obs, dim=0).unsqueeze(0) # (1,obs_seq_len, n_peds, n_peds)
obs_traj = obs_traj.unsqueeze(0)
obs_traj_rel = obs_traj_rel.unsqueeze(0)
loss_mask_pred = loss_mask_rel_pred
loss_mask = np.concatenate((loss_mask_obs, loss_mask_pred), axis=1)
loss_mask = torch.from_numpy(loss_mask).unsqueeze(0)
visualize_trajectory(obs_traj, loss_mask, i, obs_seq_len=obs_seq_len, pred_seq_len=obs_seq_len)
num_peds_batch += obs_traj.shape[1]
b_num_peds.append(obs_traj.shape[1])
b_obs_traj.append(obs_traj[0].float())
b_obs_traj_rel.append(obs_traj_rel[0].float())
b_loss_mask_rel.append(loss_mask_rel[0].float())
b_loss_mask.append(loss_mask[0].float())
b_v_obs.append(v_obs[0].float())
b_A_obs.append(A_obs[0].float())
b_attn_mask_obs.append(attn_mask_obs[0].float())
b_obs_traj = torch.cat(b_obs_traj, dim=0)
b_obs_traj_rel = torch.cat(b_obs_traj_rel, dim=0)
b_loss_mask_rel = torch.cat(b_loss_mask_rel, dim=0)
b_loss_mask = torch.cat(b_loss_mask, dim=0)
b_v_obs, b_A_obs = seq_to_graph(b_obs_traj, b_obs_traj_rel, attn_mech='rel_conv')
b_attn_mask_obs_tensor = torch.zeros(obs_seq_len, num_peds_batch, num_peds_batch).float()
curr_ped_idx = 0
for num_peds, attn_mask_obs_single in zip(b_num_peds, b_attn_mask_obs):
b_attn_mask_obs_tensor[:,curr_ped_idx:curr_ped_idx+num_peds, \
curr_ped_idx:curr_ped_idx+num_peds] = attn_mask_obs_single
curr_ped_idx += num_peds
b_obs_traj = b_obs_traj.unsqueeze(0)
b_obs_traj_rel = b_obs_traj_rel.unsqueeze(0)
b_v_obs = b_v_obs.unsqueeze(0)
b_A_obs = b_A_obs.unsqueeze(0)
b_loss_mask_rel = b_loss_mask_rel.unsqueeze(0)
b_attn_mask_obs_tensor = b_attn_mask_obs_tensor.unsqueeze(0)
b_loss_mask = b_loss_mask.unsqueeze(0)
# print("PROCESSED DATA")
# print("b_obs_traj shape: ", b_obs_traj.shape)
# print("b_obs_traj_rel shape: ", b_obs_traj_rel.shape)
# print("b_v_obs shape: ", b_v_obs.shape)
# print("b_A_obs shape: ", b_A_obs.shape)
# print("b_loss_mask_rel shape: ", b_loss_mask_rel.shape)
# print("b_attn_mask_obs_tensor shape: ", b_attn_mask_obs_tensor.shape)
# print("b_loss_mask shape: ", b_loss_mask.shape)
# print()
# *** Load model
writername = "100-gumbel_social_transformer-faster_lstm-lr_0.001-init_temp_0.5-edge_head_0-ebd_64-snl_1-snh_8-seed_1000"
dataset = "sj" # Shui Jing
logdir = join(pathhack.pkg_path, 'results', writername, dataset)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
if not isdir(logdir):
raise RuntimeError('The result directory was not found.')
checkpoint_dir = join(logdir, 'checkpoint')
with open(join(checkpoint_dir, 'args.pickle'), 'rb') as f:
args_eval = pickle.load(f)
# Uncomment if you want a fixed random seed.
# torch.manual_seed(args_eval.random_seed)
# np.random.seed(args_eval.random_seed)
model = st_model(args_eval, device=device).to(device)
model_filename = 'epoch_'+str(args_eval.num_epochs)+'.pt'
model_checkpoint = torch.load(join(checkpoint_dir, model_filename), map_location=device)
model.load_state_dict(model_checkpoint['model_state_dict'])
# print("LOADED MODEL")
# print("device: ", device)
# print('Loaded configuration: ', writername)
# print()
# *** Perform trajectory prediction
sampling = False
with torch.no_grad():
model.eval()
b_v_obs, b_A_obs, b_attn_mask_obs_tensor, b_loss_mask_rel = \
b_v_obs.to(device), b_A_obs.to(device), \
b_attn_mask_obs_tensor.float().to(device), b_loss_mask_rel.float().to(device)
results = model(b_v_obs, b_A_obs, b_attn_mask_obs_tensor, b_loss_mask_rel, tau=0.03, hard=True, sampling=sampling, device=device)
gaussian_params_pred, x_sample_pred, info = results
mu, sx, sy, corr = gaussian_params_pred
# print(mu.shape) # torch.Size([1, 5, 48, 2])
# print(sx.shape)
# print(sy.shape)
# print(corr.shape)
# print(b_obs_traj.shape)
# print(b_loss_mask.shape)
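# Same cumulative-Gaussian conversion as in crowd_nav_interface_multi_env_parallel.py: variance terms add across prediction steps.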
mu = mu.cumsum(1)
sx_squared = sx**2.
sy_squared = sy**2.
corr_sx_sy = corr*sx*sy
sx_squared_cumsum = sx_squared.cumsum(1)
sy_squared_cumsum = sy_squared.cumsum(1)
corr_sx_sy_cumsum = corr_sx_sy.cumsum(1)
sx_cumsum = sx_squared_cumsum**(1./2)
sy_cumsum = sy_squared_cumsum**(1./2)
corr_cumsum = corr_sx_sy_cumsum/(sx_cumsum*sy_cumsum)
mu_cumsum = mu.detach().to("cpu") + np.transpose(b_obs_traj[:,:,:,-1:], (0,3,1,2)) # (batch, time, node, 2)
loss_mask_pred = b_loss_mask[:,:,obs_seq_len:].permute(0,2,1).unsqueeze(-1).bool()
mu_cumsum = mu_cumsum * loss_mask_pred + invalid_value*(~loss_mask_pred)
output_traj = torch.cat((mu_cumsum.detach().to("cpu"), sx_cumsum.detach().to("cpu"), sy_cumsum.detach().to("cpu"), corr_cumsum.detach().to("cpu")), dim=3)
output_traj = output_traj.permute(0, 2, 1, 3)[0] # (ped, time, 5)
output_binary_mask = loss_mask_pred[0,0,:,:].detach().to("cpu") # (ped, 1)
num_total_peds = output_traj.shape[0]
output_traj = output_traj.reshape(n_env, num_total_peds // n_env, output_traj.shape[1], output_traj.shape[2]) # (n_env, ped, time, 5)
output_binary_mask = output_binary_mask.reshape(n_env, num_total_peds // n_env, 1) # (n_env, ped, 1)
output_traj, output_binary_mask = output_traj.numpy(), output_binary_mask.numpy()
# reshape has been tested
# print("PERFORMED PREDICTION")
# print("mu_cumsum shape: ", mu_cumsum.shape)
# print("sx_cumsum shape: ", sx_cumsum.shape)
# print("sy_cumsum shape: ", sy_cumsum.shape)
# print("corr_cumsum shape: ", corr_cumsum.shape)
# print("output_traj device: ", output_traj.device)
# print("output_binary_mask device: ", output_binary_mask.device)
# print("output_traj shape: ", output_traj.shape)
# print("output_binary_mask shape: ", output_binary_mask.shape)
# print()
return output_traj, output_binary_mask
output_traj, output_binary_mask = CrowdNavPredInterfaceMultiEnv(
input_traj[:1],
input_binary_mask[:1],
sampling = True,
)
print()
print("OUTPUT DATA")
print("output_traj shape: ", output_traj.shape)
print("output_binary_mask shape:", output_binary_mask.shape)
print()
# wrapper_input_traj shape: torch.Size([4, 15, 5, 2])
# wrapper_input_binary_mask shape: torch.Size([4, 15, 5, 1])
# wrapper_output_traj_gt shape: torch.Size([4, 15, 5, 2])
# wrapper_output_binary_mask_gt shape: torch.Size([4, 15, 5, 1])
# OUTPUT DATA
# output_traj shape: torch.Size([4, 15, 5, 5])
# output_binary_mask shape: torch.Size([4, 15, 1])
# n_env = wrapper_input_traj.shape[0]
# input_traj = wrapper_input_traj.numpy()
# input_binary_mask = wrapper_input_binary_mask.numpy()
# print()
# print("INPUT DATA")
# print("input_traj shape: ", input_traj.shape)
# print("input_binary_mask shape:", input_binary_mask.shape)
# print()
def visualize_output_trajectory_deterministic(input_traj, input_binary_mask, output_traj, output_binary_mask, sample_index, obs_seq_len=5, pred_seq_len=5):
##### Print Visualization Started #####
input_traj_i = input_traj[sample_index] # (num_peds, obs_seq_len, 2), e.g. (15, 5, 2)
input_binary_mask_i = input_binary_mask[sample_index] # (num_peds, obs_seq_len, 1), e.g. (15, 5, 1)
output_traj_i = output_traj[sample_index] # (num_peds, pred_seq_len, 5), e.g. (15, 5, 5)
output_binary_mask_i = output_binary_mask[sample_index] # (num_peds, 1), e.g. (15, 1)
n_peds, seq_len = input_traj_i.shape[0], obs_seq_len+pred_seq_len
full_obs_ped_idx = np.where(input_binary_mask_i.sum(1)[:,0]==obs_seq_len)[0]
full_traj = np.concatenate((input_traj_i, output_traj_i[:,:,:2]), axis=1) # (15, 10, 2)
output_binary_mask_i_pred_len = np.stack([output_binary_mask_i for j in range(pred_seq_len)], axis=1) # (15, 5, 1)
loss_mask = np.concatenate((input_binary_mask_i, output_binary_mask_i_pred_len), axis=1) # (15, 10, 1)
fig, ax = plt.subplots()
# ax.set_xlim((-8, 8))
# ax.set_ylim((-5, 5))
fig.set_tight_layout(True)
for ped_idx in range(n_peds):
if ped_idx in full_obs_ped_idx:
ax.plot(full_traj[ped_idx, obs_seq_len:, 0], full_traj[ped_idx, obs_seq_len:, 1], '.-', c='r')
ax.plot(full_traj[ped_idx, :obs_seq_len, 0], full_traj[ped_idx, :obs_seq_len, 1], '.-', c='k') # black for obs
else:
for t_idx in range(seq_len):
if loss_mask[ped_idx,t_idx,0] == 1:
if t_idx < obs_seq_len:
# obs blue for partially detected pedestrians
ax.plot(full_traj[ped_idx, t_idx, 0], full_traj[ped_idx, t_idx, 1], '.', c='b')
else:
# pred orange for partially detected pedestrians
ax.plot(full_traj[ped_idx, t_idx, 0], full_traj[ped_idx, t_idx, 1], '.', c='C1', alpha=0.2)
ax.set_aspect('equal', adjustable='box')
ax.plot()
fig.savefig("SINGLE_pred_tmp_img_to_be_deleted_"+str(sample_index)+".png")
print("SINGLE_pred_tmp_img_to_be_deleted_"+str(sample_index)+".png is created.")
return
for sample_index in range(n_env):
visualize_output_trajectory_deterministic(input_traj, input_binary_mask, output_traj, output_binary_mask, sample_index, obs_seq_len=5, pred_seq_len=5)
wrapper_visualization_data = {}
wrapper_visualization_data['input_traj'] = input_traj
wrapper_visualization_data['input_mask'] = input_binary_mask
wrapper_visualization_data['output_traj'] = output_traj
wrapper_visualization_data['output_mask'] = output_binary_mask
wrapper_visualization_data['output_traj_gt'] = wrapper_output_traj_gt.numpy()
wrapper_visualization_data['output_mask_gt'] = wrapper_output_binary_mask_gt.numpy()
with open(join('wrapper_visualization_data.pickle'), 'wb') as f:
pickle.dump(wrapper_visualization_data, f)
print("wrapper_visualization_data.pickle is dumped.")
# output_traj, output_binary_mask
# wrapper_input_traj = torch.stack(wrapper_input_traj, dim=0)
# wrapper_input_binary_mask = torch.stack(wrapper_input_binary_mask, dim=0)
# wrapper_output_traj_gt = torch.stack(wrapper_output_traj_gt, dim=0)
# wrapper_output_binary_mask_gt = torch.stack(wrapper_output_binary_mask_gt, dim=0)
# print("wrapper_input_traj shape: ", wrapper_input_traj.shape)
# print("wrapper_input_binary_mask shape: ", wrapper_input_binary_mask.shape)
# print("wrapper_output_traj_gt shape: ", wrapper_output_traj_gt.shape)
# print("wrapper_output_binary_mask_gt shape: ", wrapper_output_binary_mask_gt.shape)
# wrapper_input_traj shape: torch.Size([4, 15, 5, 2])
# wrapper_input_binary_mask shape: torch.Size([4, 15, 5, 1])
# wrapper_output_traj_gt shape: torch.Size([4, 15, 5, 2])
# wrapper_output_binary_mask_gt shape: torch.Size([4, 15, 5, 1])
# n_env = wrapper_input_traj.shape[0]
# input_traj = wrapper_input_traj.numpy()
# input_binary_mask = wrapper_input_binary_mask.numpy()
# with open(join(checkpoint_dir, 'args.pickle'), 'wb') as f:
# pickle.dump(args, f)
# def visualize_trajectory(obs_traj, loss_mask, sample_index, obs_seq_len=5, pred_seq_len=5):
# ##### Print Visualization Started #####
# n_peds, seq_len = obs_traj.shape[1], obs_seq_len+pred_seq_len
# full_ped_idx = torch.where(loss_mask.sum(2)[0]==seq_len)[0] # loss_mask tensor: (1, num_peds, seq_len)
# fig, ax = plt.subplots()
# # ax.set_xlim((-8, 8))
# # ax.set_ylim((-5, 5))
# fig.set_tight_layout(True)
# for ped_idx in range(n_peds):
# if ped_idx in full_ped_idx:
# ax.plot(obs_traj[0, ped_idx, 0, :obs_seq_len], obs_traj[0, ped_idx, 1, :obs_seq_len], '.-', c='k') # black for obs
# else:
# for t_idx in range(seq_len):
# if loss_mask[0,ped_idx,t_idx] == 1:
# if t_idx < obs_seq_len:
# # obs blue for partially detected pedestrians
# ax.plot(obs_traj[0, ped_idx, 0, t_idx], obs_traj[0, ped_idx, 1, t_idx], '.', c='b')
# ax.set_aspect('equal', adjustable='box')
# ax.plot()
# fig.savefig("tmp_img_to_be_deleted_"+str(sample_index)+".png")
# print("tmp_img_to_be_deleted_"+str(sample_index)+".png is created.")
# return
# # lstm
# torch.manual_seed(args.random_seed)
# np.random.seed(args.random_seed)
# target_ped_idx_from_full = 1#2#0#1#2 # 0-2 for zara1
# # print(full_ped_idx[target_ped_idx_from_full])
# with torch.no_grad():
# model.eval()
# v_obs, A_obs, attn_mask_obs, loss_mask_rel = \
# v_obs.to(device), A_obs.to(device), \
# attn_mask_obs.float().to(device), loss_mask_rel.float().to(device)
# results = model(v_obs, A_obs, attn_mask_obs, loss_mask_rel, tau=tau, hard=True, sampling=False, device=device)
# # gaussian_params_pred, x_sample_pred, _, loss_mask_per_pedestrian = results
# gaussian_params_pred, x_sample_pred, info = results
# pred_traj_rel = torch.cat((obs_traj[:,:,:,-1:], x_sample_pred.permute(0,2,3,1)), dim=3) # (1, num_peds, 2, 1+pred_seq_len)
# pred_traj = pred_traj_rel.cumsum(dim=3).to('cpu')
# # full_traj = torch.cat((obs_traj, pred_traj_gt), dim=3) # (1, num_peds, 2, seq_len)
# full_traj = torch.cat((obs_traj[:,:,:,:-1], pred_traj), dim=3) # (1, num_peds, 2, seq_len)
# n_peds, seq_len = full_traj.shape[1], full_traj.shape[3]
# # full_ped_idx = torch.where(loss_mask.sum(2)[0]==seq_len)[0] # # loss_mask tensor: (1, num_peds, seq_len)
# # full_ped_idx = torch.where(info['loss_mask_rel_full_partial']==1)[1]
# # info['loss_mask_rel_full_partial']
# full_ped_idx = torch.where(info['loss_mask_per_pedestrian']==1)[1]
# partial_full_ped_idx = torch.where(info['loss_mask_rel_full_partial']==1)[1]
# fig, ax = plt.subplots()
# ax.set_xlim((-8, 8))
# ax.set_ylim((-5, 5))
# fig.set_tight_layout(True)
# for ped_idx in range(n_peds):
# if ped_idx in partial_full_ped_idx:
# if ped_idx in full_ped_idx:
# # ax.plot(full_traj[0, ped_idx, 0, 7:], full_traj[0, ped_idx, 1, 7:], '.-', c='C2') # green for pred gt
# ax.plot(full_traj[0, ped_idx, 0, :8], full_traj[0, ped_idx, 1, :8], '.-', c='k') # black for obs
# ax.plot(pred_traj[0, ped_idx, 0], pred_traj[0, ped_idx, 1], '.-', c='C1',alpha=0.2) # orange for pred
# if ped_idx not in full_ped_idx:
# for t_idx in range(seq_len):
# if loss_mask[0,ped_idx,t_idx] == 1:
# if t_idx < 8:
# ax.plot(full_traj[0, ped_idx, 0, t_idx], full_traj[0, ped_idx, 1, t_idx], '.', c='b')
# # obs blue
# else:
# ax.plot(full_traj[0, ped_idx, 0, t_idx], full_traj[0, ped_idx, 1, t_idx], '.', c='r')
# # pred red
# ax.set_aspect('equal', adjustable='box')
# ax.plot()
================================================
FILE: gst_updated/scripts/wrapper/crowd_nav_interface_parallel.py
================================================
from gst_updated.src.gumbel_social_transformer.st_model import st_model
from os.path import join, isdir
import pickle
import torch
import numpy as np
import matplotlib.pyplot as plt
def seq_to_graph(seq_, seq_rel, attn_mech='rel_conv'):
"""
inputs:
- seq_ # (n_env, num_peds, 2, obs_seq_len)
- seq_rel # (n_env, num_peds, 2, obs_seq_len)
outputs:
- V # (n_env, obs_seq_len, num_peds, 2)
- A # (n_env, obs_seq_len, num_peds, num_peds, 2)
"""
V = seq_rel.permute(0, 3, 1, 2) # (n_env, obs_seq_len, num_peds, 2)
seq_permute = seq_.permute(0, 3, 1, 2) # (n_env, obs_seq_len, num_peds, 2)
A = seq_permute.unsqueeze(3)-seq_permute.unsqueeze(2) # (n_env, obs_seq_len, num_peds, 1, 2) - (n_env, obs_seq_len, 1, num_peds, 2)
return V, A
class CrowdNavPredInterfaceMultiEnv(object):
def __init__(self, load_path, device, config, num_env):
# *** Load model
self.args = config
self.device = device
self.nenv = num_env
# Uncomment if you want a fixed random seed.
# torch.manual_seed(args_eval.random_seed)
# np.random.seed(args_eval.random_seed)
self.args_eval = config
checkpoint_dir = join(load_path, 'checkpoint')
self.model = st_model(self.args_eval, device=device).to(device)
model_filename = 'epoch_'+str(self.args_eval.num_epochs)+'.pt'
model_checkpoint = torch.load(join(checkpoint_dir, model_filename), map_location=device)
self.model.load_state_dict(model_checkpoint['model_state_dict'])
self.model.eval()
print("LOADED MODEL")
print("device: ", device)
print()
def forward(self, input_traj, input_binary_mask, sampling=True):
"""
inputs:
- input_traj:
# numpy
# (n_env, num_peds, obs_seq_len, 2)
- input_binary_mask:
# numpy
# (n_env, num_peds, obs_seq_len, 1)
# Zhe: I think we should not just have a binary mask of shape (n_env, number of pedestrians, 1),
# because some agents are partially detected, and they should not be simply ignored.
- sampling:
# bool
# True means you sample from Gaussian.
# False means you choose to use the mean of Gaussian as output.
outputs:
- output_traj:
# torch "cpu"
# (n_env, num_peds, pred_seq_len, 5)
# where 5 includes [mu_x, mu_y, sigma_x, sigma_y, correlation coefficient]
- output_binary_mask:
# torch "cpu"
# (n_env, num_peds, 1)
# Zhe: this means that if an agent does not show up in the last and
# second-to-last observation time steps, the agent will not be predicted.
"""
invalid_value = -999.
# *** Process input data
obs_traj = input_traj.permute(0,1,3,2) # (n_env, num_peds, 2, obs_seq_len)
n_env, num_peds = obs_traj.shape[:2]
loss_mask_obs = input_binary_mask[:,:,:,0] # (n_env, num_peds, obs_seq_len)
loss_mask_rel_obs = loss_mask_obs[:,:,:-1] * loss_mask_obs[:,:,-1:]
loss_mask_rel_obs = torch.cat((loss_mask_obs[:,:,:1], loss_mask_rel_obs), dim=2) # (n_env, num_peds, obs_seq_len)
loss_mask_rel_pred = (torch.ones((n_env, num_peds, self.args_eval.pred_seq_len), device=self.device) * loss_mask_rel_obs[:,:,-1:])
loss_mask_rel = torch.cat((loss_mask_rel_obs, loss_mask_rel_pred), dim=2) # (n_env, num_peds, seq_len)
loss_mask_pred = loss_mask_rel_pred
loss_mask_rel_obs_permute = loss_mask_rel_obs.permute(0,2,1).reshape(n_env*self.args_eval.obs_seq_len, num_peds) # (n_env*obs_seq_len, num_peds)
attn_mask_obs = torch.bmm(loss_mask_rel_obs_permute.unsqueeze(2), loss_mask_rel_obs_permute.unsqueeze(1)) # (n_env*obs_seq_len, num_peds, num_peds)
attn_mask_obs = attn_mask_obs.reshape(n_env, self.args_eval.obs_seq_len, num_peds, num_peds)
obs_traj_rel = obs_traj[:,:,:,1:] - obs_traj[:,:,:,:-1]
obs_traj_rel = torch.cat((torch.zeros(n_env, num_peds, 2, 1, device=self.device), obs_traj_rel), dim=3)
obs_traj_rel = invalid_value*torch.ones_like(obs_traj_rel)*(1-loss_mask_rel_obs.unsqueeze(2)) \
+ obs_traj_rel*loss_mask_rel_obs.unsqueeze(2)
v_obs, A_obs = seq_to_graph(obs_traj, obs_traj_rel, attn_mech='rel_conv')
# *** Perform trajectory prediction
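# NOTE: as in the multi-env wrapper, sampling is hardcoded to False below; the predicted Gaussian mean is returned.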
sampling = False
with torch.no_grad():
v_obs, A_obs, attn_mask_obs, loss_mask_rel = \
v_obs.to(self.device), A_obs.to(self.device), attn_mask_obs.to(self.device), loss_mask_rel.to(self.device)
results = self.model(v_obs, A_obs, attn_mask_obs, loss_mask_rel, tau=0.03, hard=True, sampling=sampling, device=self.device)
gaussian_params_pred, x_sample_pred, info = results
mu, sx, sy, corr = gaussian_params_pred
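# Accumulate the per-step displacement Gaussians into absolute ones, as in the multi-env wrapper:
# means and covariance terms sum over time (assuming independent per-step displacements).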
mu = mu.cumsum(1)
sx_squared = sx**2.
sy_squared = sy**2.
corr_sx_sy = corr*sx*sy
sx_squared_cumsum = sx_squared.cumsum(1)
sy_squared_cumsum = sy_squared.cumsum(1)
corr_sx_sy_cumsum = corr_sx_sy.cumsum(1)
sx_cumsum = sx_squared_cumsum**(1./2)
sy_cumsum = sy_squared_cumsum**(1./2)
corr_cumsum = corr_sx_sy_cumsum/(sx_cumsum*sy_cumsum)
mu_cumsum = mu.detach().to(self.device) + obs_traj.permute(0,3,1,2)[:,-1:] # (batch, time, node, 2)
mu_cumsum = mu_cumsum * loss_mask_pred.permute(0,2,1).unsqueeze(-1) + invalid_value*(1-loss_mask_pred.permute(0,2,1).unsqueeze(-1))
output_traj = torch.cat((mu_cumsum.detach().to(self.device), sx_cumsum.detach().to(self.device), sy_cumsum.detach().to(self.device), corr_cumsum.detach().to(self.device)), dim=3)
output_traj = output_traj.permute(0, 2, 1, 3) # (n_env, num_peds, pred_seq_len, 5)
output_binary_mask = loss_mask_pred[:,:,:1].detach().to(self.device) # (n_env, num_peds, 1); the first prediction step's mask equals all later steps' masks
return output_traj, output_binary_mask
def visualize_output_trajectory_deterministic(input_traj, input_binary_mask, output_traj, output_binary_mask, sample_index, obs_seq_len=5, pred_seq_len=5):
##### Print Visualization Started #####
input_traj_i = input_traj[sample_index]
input_binary_mask_i = input_binary_mask[sample_index]
output_traj_i = output_traj[sample_index]
output_binary_mask_i = output_binary_mask[sample_index]
num_peds, seq_len = input_traj_i.shape[0], obs_seq_len+pred_seq_len
full_obs_ped_idx = np.where(input_binary_mask_i.sum(1)[:,0]==obs_seq_len)[0]
full_traj = np.concatenate((input_traj_i, output_traj_i[:,:,:2]), axis=1)
output_binary_mask_i_pred_len = np.stack([output_binary_mask_i for j in range(pred_seq_len)], axis=1)
loss_mask = np.concatenate((input_binary_mask_i, output_binary_mask_i_pred_len), axis=1)
fig, ax = plt.subplots()
fig.set_tight_layout(True)
for ped_idx in range(num_peds):
if ped_idx in full_obs_ped_idx:
ax.plot(full_traj[ped_idx, obs_seq_len:, 0], full_traj[ped_idx, obs_seq_len:, 1], '.-', c='r')
ax.plot(full_traj[ped_idx, :obs_seq_len, 0], full_traj[ped_idx, :obs_seq_len, 1], '.-', c='k') # black for obs
else:
for t_idx in range(seq_len):
if loss_mask[ped_idx,t_idx,0] == 1:
if t_idx < obs_seq_len:
# obs blue for partially detected pedestrians
ax.plot(full_traj[ped_idx, t_idx, 0], full_traj[ped_idx, t_idx, 1], '.', c='b')
else:
# pred orange for partially detected pedestrians
ax.plot(full_traj[ped_idx, t_idx, 0], full_traj[ped_idx, t_idx, 1], '.', c='C1', alpha=0.2)
ax.set_aspect('equal', adjustable='box')
ax.plot()
fig.savefig(str(sample_index)+".png")
print(str(sample_index)+".png is created.")
return
if __name__ == '__main__':
# *** Create an input that aligns with the format of CrowdNav
obs_seq_len = 5
pred_seq_len = 5
invalid_value = -999.
wrapper_demo_data = torch.load('/home/shuijing/Desktop/CrowdNav_Prediction/gst_updated/datasets/wrapper_demo/wrapper_demo_data.pt')
print("wrapper_demo_data.pt is loaded.")
input_traj, input_binary_mask = wrapper_demo_data['input_traj'], wrapper_demo_data['input_mask']
n_env = input_traj.shape[0]
assert input_traj.shape[2] == obs_seq_len
"""
- input_traj:
# tensor
# (n_env, num_peds, obs_seq_len, 2)
- input_binary_mask:
# tensor
# (n_env, num_peds, obs_seq_len, 1)
"""
print()
print("INPUT DATA")
print("number of environments: ", n_env)
print("input_traj shape: ", input_traj.shape)
print("input_binary_mask shape:", input_binary_mask.shape)
print()
load_path = '/home/shuijing/Desktop/CrowdNav_Prediction/gst_updated/results/100-gumbel_social_transformer-faster_lstm-lr_0.001-init_temp_0.5-edge_head_0-ebd_64-snl_1-snh_8-seed_1000/sj'
# load_path = join(pathhack.pkg_path, 'results/100-gumbel_social_transformer-faster_lstm-lr_0.001-init_temp_0.5-edge_head_0-ebd_64-snl_1-snh_8-seed_1000/sj')
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
args = "100-gumbel_social_transformer-faster_lstm-lr_0.001-init_temp_0.5-edge_head_0-ebd_64-snl_1-snh_8-seed_1000"
model = CrowdNavPredInterfaceMultiEnv(load_path=load_path,
device=device, config = args, num_env=n_env)
input_traj = input_traj.cuda()
input_binary_mask = input_binary_mask.cuda()
output_traj, output_binary_mask = model.forward(
input_traj,
input_binary_mask,
sampling = True,
)
print()
print("OUTPUT DATA")
print("output_traj shape: ", output_traj.shape)
print("output_binary_mask shape:", output_binary_mask.shape)
print()
for sample_index in range(n_env):
visualize_output_trajectory_deterministic(input_traj, input_binary_mask, output_traj, output_binary_mask, sample_index, obs_seq_len=5, pred_seq_len=5)
================================================
FILE: gst_updated/scripts/wrapper/pathhack.py
================================================
import sys
from os.path import abspath, join, split
file_path = split(abspath(__file__))[0]
pkg_path = join(file_path, '../..')
sys.path.append(pkg_path)
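# Appends the gst_updated/ package root to sys.path so that 'src.*' and 'scripts.*' imports resolve when the wrapper scripts are run directly.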
================================================
FILE: gst_updated/src/gumbel_social_transformer/edge_selector_ghost.py
================================================
# import pathhack
import torch
import torch.nn as nn
from torch.nn.functional import softmax
from src.gumbel_social_transformer.mha import VanillaMultiheadAttention
from src.gumbel_social_transformer.utils import _get_activation_fn, gumbel_softmax
class EdgeSelector(nn.Module):
r"""Ghost version."""
def __init__(self, d_motion, d_model, nhead=4, dropout=0.1, activation="relu"):
super(EdgeSelector, self).__init__()
assert d_model % nhead == 0
self.nhead = nhead
self.head_dim = d_model // nhead
self.augmented_edge_embedding = nn.Linear(3*d_motion, d_model)
self.self_attn = VanillaMultiheadAttention(d_model, nhead, dropout=0.0)
self.norm_augmented_edge = nn.LayerNorm(d_model)
self.linear1 = nn.Linear(self.head_dim, self.head_dim)
self.linear2 = nn.Linear(self.head_dim, 1)
self.dropout1 = nn.Dropout(dropout)
self.activation = _get_activation_fn(activation)
self.d_model = d_model
self.d_motion = d_motion
print("new edge selector")
def forward(self, x, A, attn_mask, tau=1., hard=False, device='cuda:0'):
"""
Encode pedestrian edge with node information.
inputs:
# * done: x, A need to be masked first before processing.
- x: vertices representing pedestrians of one sample.
# * bsz is batch size corresponding to Transformer setting.
# * In pedestrian setting, bsz = batch_size*time_step
# (bsz, node, d_motion)
- A: edges representing relationships between pedestrians of one sample.
# (bsz, node, node, d_motion)
# row -> neighbor, col -> target
- attn_mask: attention mask provided in advance.
# (bsz, target_node, neighbor_node)
# 1. means yes, i.e. attention exists. 0. means no.
- tau: temperature hyperparameter of gumbel softmax.
# ! Needs annealing through training.
- hard: hard or soft sampling.
# True means one-hot sample for evaluation.
# False means soft sample for reparametrization.
- device: 'cuda:0' or 'cpu'.
outputs:
- edge_multinomial: The categorical distribution over the connections from targets to the neighbors
# (time_step, target_node, num_heads, neighbor_node)
# neighbor_node = nnode + 1 in ghost mode
- sampled_edges: The edges sampled from edge_multinomial
# (time_step, target_node, num_heads, neighbor_node)
# neighbor_node = nnode + 1 in ghost mode
"""
bsz, nnode, d_motion = x.shape
assert d_motion == self.d_motion
attn_mask = attn_mask.to("cpu")
attn_mask_ped = (attn_mask.sum(-1) > 0).float().unsqueeze(-1) # (bsz, nnode, 1)
x = x * attn_mask_ped.to(device)
x_ghost = torch.zeros(bsz, 1, d_motion).to(device)
x_neighbor = torch.cat((x, x_ghost), dim=1) # (bsz, nnode+1, d_motion)
x_neighbor = torch.ones(bsz,nnode+1,nnode,d_motion).to(device)*x_neighbor.view(bsz,nnode+1,1,d_motion) # row -> neighbor
x_target = torch.ones(bsz,nnode+1,nnode,d_motion).to(device)*x.view(bsz,1,nnode,d_motion) # col -> target
x_neighbor_target = torch.cat((x_neighbor, x_target), dim=-1) # (bsz,nnode+1,nnode,2*d_motion)
A = A * attn_mask.permute(0,2,1).unsqueeze(-1).to(device) # (bsz, neighbor_node, target_node, d_motion)
A_ghost = torch.zeros(bsz, 1, nnode, d_motion).to(device)
A = torch.cat((A, A_ghost), dim=1) # (bsz,nnode+1,nnode,d_motion)
A = torch.cat((x_neighbor_target, A), dim=-1) # (bsz,nnode+1,nnode,3*d_motion) # n_node==t_node+1==nnode+1
A = self.augmented_edge_embedding(A) # (bsz,nnode+1,nnode,d_model)
A = self.norm_augmented_edge(A)
A_perm = A.permute(0,2,1,3) # (bsz,nnode,nnode+1,d_model)
A_perm = A_perm.reshape(A_perm.shape[0]*A_perm.shape[1], A_perm.shape[2], A_perm.shape[3]) # (time_step*target_node, neighbor_node, d_model)
A_perm = A_perm.permute(1,0,2) # (neighbor_node, time_step*target_node, d_model)
attn_mask_ghost = torch.ones(bsz, nnode, 1) # ghost exists all the time, not missing
attn_mask = torch.cat((attn_mask, attn_mask_ghost), dim=2) #(bsz, nnode, nnode+1) # n_node==t_node+1==nnode+1
attn_mask_neighbors = attn_mask.view(bsz, nnode, nnode+1, 1) * attn_mask.view(bsz, nnode, 1, nnode+1) # (bsz, nnode, nnode+1, nnode+1)
attn_mask_neighbors = attn_mask_neighbors.view(bsz*nnode, nnode+1, nnode+1) # (time_step*target_node, neighbor_node, neighbor_node)
# attn_mask_neighbors = torch.cat([attn_mask_neighbors for _ in range(self.nhead)], dim=0) # (nhead*time_step*target_node, neighbor_node, neighbor_node) # ! bug fixed
attn_mask_neighbors = torch.stack([attn_mask_neighbors for _ in range(self.nhead)], dim=1) # (time_step*target_node, nhead, neighbor_node, neighbor_node)
attn_mask_neighbors = attn_mask_neighbors.view(attn_mask_neighbors.shape[0]*attn_mask_neighbors.shape[1], \
attn_mask_neighbors.shape[2], attn_mask_neighbors.shape[3]) # (time_step*target_node*nhead, neighbor_node, neighbor_node)
_, _, A2 = self.self_attn(A_perm, A_perm, A_perm, attn_mask=attn_mask_neighbors.to(device))
# inputs: (L, N, E), (S, N, E), (S, N, E), (N*nhead, L, S)
# A2: (bsz, num_heads, tgt_len, head_dim), i.e. (time_step*target_node, num_heads, neighbor_node, head_dim) with head_dim = d_model/nhead
A2 = A2.reshape(bsz, nnode, self.nhead, nnode+1, self.head_dim) # (time_step, target_node, num_heads, neighbor_node, head_dim)
A2 = self.linear2(self.dropout1(self.activation(self.linear1(A2)))).squeeze(-1) # (time_step, target_node, num_heads, neighbor_node)
edge_multinomial = softmax(A2, dim=-1) # (time_step, target_node, num_heads, neighbor_node)
edge_multinomial = edge_multinomial * attn_mask.unsqueeze(2).to(device) # (time_step, target_node, num_heads, neighbor_node)
edge_multinomial = edge_multinomial / (edge_multinomial.sum(-1).unsqueeze(-1)+1e-10)
sampled_edges = self.edge_sampler(edge_multinomial, tau=tau, hard=hard)
return edge_multinomial, sampled_edges
def edge_sampler(self, edge_multinomial, tau=1., hard=False):
r"""
Sample from edge_multinomial using gumbel softmax for differentiable search.
"""
logits = torch.log(edge_multinomial+1e-10) # (time_step, target_node, num_heads, neighbor_node)
sampled_edges = gumbel_softmax(logits, tau=tau, hard=hard, eps=1e-10) # (time_step, target_node, num_heads, neighbor_node)
return sampled_edges
if __name__ == "__main__":
device = "cuda:0"
edge_selector = EdgeSelector(2, 32, nhead=2, dropout=0.1).to(device)
x = torch.randn(8, 3, 2)
position = torch.randn(8,3,2)
A = position.unsqueeze(2)-position.unsqueeze(1) # (8,3,3,2)
loss_mask = torch.ones(3,8)
loss_mask[0,:3] = 0.
loss_mask[1,:5] = 0.
attn_mask = []
x, A = x.to(device), A.to(device)
for i in range(8):
attn_mask.append(torch.outer(loss_mask[:,i], loss_mask[:,i]))
attn_mask = torch.stack(attn_mask, dim=0)
attn_mask = attn_mask.to(device)
print(attn_mask[3])
edge_multinomial, sampled_edges = edge_selector(x, A, attn_mask, tau=1., hard=False, device=device)
print("hello world.")
================================================
FILE: gst_updated/src/gumbel_social_transformer/edge_selector_no_ghost.py
================================================
import torch
import torch.nn as nn
from torch.nn.functional import softmax
from gst_updated.src.gumbel_social_transformer.mha import VanillaMultiheadAttention
from gst_updated.src.gumbel_social_transformer.utils import _get_activation_fn, gumbel_softmax
class EdgeSelector(nn.Module):
r"""No ghost version."""
def __init__(self, d_motion, d_model, nhead=4, dropout=0.1, activation="relu"):
super(EdgeSelector, self).__init__()
assert d_model % nhead == 0
self.nhead = nhead
self.head_dim = d_model // nhead
self.augmented_edge_embedding = nn.Linear(3*d_motion, d_model)
self.self_attn = VanillaMultiheadAttention(d_model, nhead, dropout=0.0)
self.norm_augmented_edge = nn.LayerNorm(d_model)
self.linear1 = nn.Linear(self.head_dim, self.head_dim)
self.linear2 = nn.Linear(self.head_dim, 1)
self.dropout1 = nn.Dropout(dropout)
self.activation = _get_activation_fn(activation)
self.d_model = d_model
self.d_motion = d_motion
def forward(self, x, A, attn_mask, tau=1., hard=False, device='cuda:0'):
"""
Encode pedestrian edge with node information.
inputs:
# * done: x, A need to be masked first before processing.
- x: vertices representing pedestrians of one sample.
# * bsz is batch size corresponding to Transformer setting.
# * In pedestrian setting, bsz = batch_size*time_step
# (bsz, node, d_motion)
- A: edges representing relationships between pedestrians of one sample.
# (bsz, node, node, d_motion)
# row -> neighbor, col -> target
- attn_mask: attention mask provided in advance.
# (bsz, target_node, neighbor_node)
# 1. means yes, i.e. attention exists. 0. means no.
- tau: temperature hyperparameter of gumbel softmax.
# ! Need annealing through training.
- hard: hard or soft sampling.
# True means one-hot sample for evaluation.
# False means soft sample for reparametrization.
- device: 'cuda:0' or 'cpu'.
outputs:
- edge_multinomial: The categorical distribution over the connections from targets to the neighbors
# (time_step, target_node, num_heads, neighbor_node)
# neighbor_node = nnode in no ghost mode
- sampled_edges: The edges sampled from edge_multinomial
# (time_step, target_node, num_heads, neighbor_node)
# neighbor_node = nnode in no ghost mode
"""
bsz, nnode, d_motion = x.shape
assert d_motion == self.d_motion
attn_mask = attn_mask.to("cpu")
attn_mask_ped = (attn_mask.sum(-1) > 0).float().unsqueeze(-1) # (bsz, nnode, 1)
x = x * attn_mask_ped.to(device)
x_neighbor = torch.ones(bsz,nnode,nnode,d_motion).to(device)*x.view(bsz,nnode,1,d_motion) # row -> neighbor
x_target = torch.ones(bsz,nnode,nnode,d_motion).to(device)*x.view(bsz,1,nnode,d_motion) # col -> target
x_neighbor_target = torch.cat((x_neighbor, x_target), dim=-1) # (bsz,nnode,nnode,2*d_motion)
A = A * attn_mask.permute(0,2,1).unsqueeze(-1).to(device) # (bsz, neighbor_node, target_node, d_motion)
A = torch.cat((x_neighbor_target, A), dim=-1) # (bsz,n_node,t_node,3*d_motion)
A = self.augmented_edge_embedding(A) # (bsz,n_node,t_node,d_model)
A = self.norm_augmented_edge(A)
A_perm = A.permute(0,2,1,3) # (bsz,t_node,n_node,d_model)
A_perm = A_perm.reshape(A_perm.shape[0]*A_perm.shape[1], A_perm.shape[2], A_perm.shape[3]) # (time_step*target_node, neighbor_node, d_model)
A_perm = A_perm.permute(1,0,2) # (neighbor_node, time_step*target_node, d_model)
attn_mask_neighbors = attn_mask.view(bsz, nnode, nnode, 1) * attn_mask.view(bsz, nnode, 1, nnode) # (bsz, t_node, n_node, n_node)
attn_mask_neighbors = attn_mask_neighbors.reshape(bsz*nnode, nnode, nnode) # (time_step*target_node, neighbor_node, neighbor_node)
# attn_mask_neighbors = torch.cat([attn_mask_neighbors for _ in range(self.nhead)], dim=0) # (nhead*time_step*target_node, neighbor_node, neighbor_node) # ! bug fixed
attn_mask_neighbors = torch.stack([attn_mask_neighbors for _ in range(self.nhead)], dim=1) # (time_step*target_node, nhead, neighbor_node, neighbor_node)
attn_mask_neighbors = attn_mask_neighbors.view(attn_mask_neighbors.shape[0]*attn_mask_neighbors.shape[1], \
attn_mask_neighbors.shape[2], attn_mask_neighbors.shape[3]) # (time_step*target_node*nhead, neighbor_node, neighbor_node)
_, _, A2 = self.self_attn(A_perm, A_perm, A_perm, attn_mask=attn_mask_neighbors.to(device))
# A2 is attn_output_heads: (bsz, num_heads, tgt_len, head_dim) = (time_step*target_node, num_heads, neighbor_node, d_model/nhead)
A2 = A2.reshape(bsz, nnode, self.nhead, nnode, self.head_dim) # (time_step, target_node, num_heads, neighbor_node, head_dim)
A2 = self.linear2(self.dropout1(self.activation(self.linear1(A2)))).squeeze(-1) # (time_step, target_node, num_heads, neighbor_node)
edge_multinomial = softmax(A2, dim=-1) # (time_step, target_node, num_heads, neighbor_node)
edge_multinomial = edge_multinomial * attn_mask.unsqueeze(2).to(device) # (time_step, target_node, num_heads, neighbor_node)
edge_multinomial = edge_multinomial / (edge_multinomial.sum(-1).unsqueeze(-1)+1e-10)
sampled_edges = self.edge_sampler(edge_multinomial, tau=tau, hard=hard)
return edge_multinomial, sampled_edges
def edge_sampler(self, edge_multinomial, tau=1., hard=False):
r"""
Sample from edge_multinomial using gumbel softmax for differentiable search.
"""
logits = torch.log(edge_multinomial+1e-10) # (time_step, target_node, num_heads, neighbor_node)
sampled_edges = gumbel_softmax(logits, tau=tau, hard=hard, eps=1e-10) # (time_step, target_node, num_heads, neighbor_node)
return sampled_edges
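if __name__ == "__main__":
    # Minimal smoke test, mirroring the ghost version's __main__ block.
    # Shapes and values are illustrative; run on CPU so no GPU is required.
    device = "cpu"
    edge_selector = EdgeSelector(2, 32, nhead=2, dropout=0.1).to(device)
    x = torch.randn(8, 3, 2)                           # (bsz, node, d_motion)
    position = torch.randn(8, 3, 2)
    A = position.unsqueeze(2) - position.unsqueeze(1)  # (bsz, node, node, d_motion)
    attn_mask = torch.ones(8, 3, 3)                    # all pedestrians visible
    edge_multinomial, sampled_edges = edge_selector(
        x.to(device), A.to(device), attn_mask, tau=1., hard=False, device=device)
    print(edge_multinomial.shape, sampled_edges.shape)  # (8, 3, 2, 3) each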
================================================
FILE: gst_updated/src/gumbel_social_transformer/gumbel_social_transformer.py
================================================
import torch
import torch.nn as nn
from gst_updated.src.gumbel_social_transformer.utils import _get_clones
class GumbelSocialTransformer(nn.Module):
def __init__(self, d_motion, d_model, nhead_nodes, nhead_edges, nlayer, dim_feedforward=512, dim_hidden=32, \
dropout=0.1, activation="relu", attn_mech="vanilla", ghost=True):
super(GumbelSocialTransformer, self).__init__()
if ghost:
if nhead_edges == 0:
raise RuntimeError("Full connectivity conflicts with the Ghost setting.")
print("Ghost version.")
from gst_updated.src.gumbel_social_transformer.edge_selector_ghost import EdgeSelector
from gst_updated.src.gumbel_social_transformer.node_encoder_layer_ghost import NodeEncoderLayer
else:
print("No ghost version.")
from gst_updated.src.gumbel_social_transformer.edge_selector_no_ghost import EdgeSelector
from gst_updated.src.gumbel_social_transformer.node_encoder_layer_no_ghost import NodeEncoderLayer
if nhead_edges != 0:
# 0 means it is fully connected
self.edge_selector = EdgeSelector(
d_motion,
d_model,
nhead=nhead_edges,
dropout=dropout,
activation=activation,
)
self.node_embedding = nn.Linear(d_motion, d_model)
node_encoder_layer = NodeEncoderLayer(
d_model,
nhead_nodes,
dim_feedforward=dim_feedforward,
dropout=dropout,
activation=activation,
attn_mech=attn_mech,
)
self.node_encoder_layers = _get_clones(node_encoder_layer, nlayer)
self.nlayer = nlayer
self.nhead_nodes = nhead_nodes
self.nhead_edges = nhead_edges
print("new gst")
def forward(self, x, A, attn_mask, tau=1., hard=False, device='cuda:0'):
r"""
Pass the input through the encoder layers in turn.
inputs:
- x: vertices representing pedestrians of one sample.
# * bsz is batch size corresponding to Transformer setting.
# * In pedestrian setting, bsz = batch_size*time_step
# (bsz, nnode, d_motion)
- A: edges representing relationships between pedestrians of one sample.
# (bsz, nnode , nnode , d_motion)
# row -> neighbor, col -> target
- attn_mask: attention mask provided in advance.
# (bsz, nnode , nnode )
# row -> target, col -> neighbor
# 1. means yes, i.e. attention exists. 0. means no.
- tau: temperature hyperparameter of gumbel softmax.
# ! Need annealing through training. 1 is considered really soft at the beginning.
- hard: hard or soft sampling.
# True means one-hot sample for evaluation.
# False means soft sample for reparametrization.
- device: 'cuda:0' or 'cpu'.
outputs:
- x: encoded vertices representing pedestrians of one sample.
# (bsz, nnode, d_model) # same as input
- sampled_edges: sampled adjacency matrix at the last column.
# (bsz, nnode , nhead_edges, neighbor_node)
# * where neighbor_node = nnode+1 for ghost==True,
# * and neighbor_node = nnode for ghost==False.
- edge_multinomial: multinomial where sampled_edges are sampled.
# (bsz, nnode , nhead_edges, neighbor_node)
- attn_weights: attention weights during self-attention for nodes x.
# (nlayer, bsz, nhead, nnode , neighbor_node)
"""
if self.nhead_edges != 0:
# edge_multinomial # (bsz, nnode , nhead_edges, neighbor_node)
# sampled_edges # (bsz, nnode , nhead_edges, neighbor_node)
# ! Remember here edges does not include self edges automatically.
edge_multinomial, sampled_edges = \
self.edge_selector(x, A, attn_mask, tau=tau, hard=hard, device=device)
else:
# full connectivity
bsz, nnode = attn_mask.shape[0], attn_mask.shape[1]
sampled_edges = torch.ones(bsz, nnode, 1, nnode).to(device) * attn_mask.unsqueeze(2)
edge_multinomial = torch.ones(bsz, nnode, 1, nnode).to(device) * attn_mask.unsqueeze(2) # fake edge_multinomial, meaningless
attn_weights_list = []
x = self.node_embedding(x)
for i in range(self.nlayer):
x, attn_weights_layer = self.node_encoder_layers[i](x, sampled_edges, attn_mask, device=device)
attn_weights_list.append(attn_weights_layer)
# x (bsz, nnode, d_model)
# attn_weights_layer: (bsz, nhead, nnode , neighbor_node)
attn_weights = torch.stack(attn_weights_list, dim=0) # (nlayer, bsz, nhead, nnode , neighbor_node)
return x, sampled_edges, edge_multinomial, attn_weights
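if __name__ == "__main__":
    # Illustrative end-to-end call (a quick shape check, not part of the
    # module API): wire the no-ghost EdgeSelector and NodeEncoderLayer
    # together on CPU with assumed hyperparameters.
    device = "cpu"
    gst = GumbelSocialTransformer(2, 32, nhead_nodes=2, nhead_edges=2, nlayer=1,
                                  ghost=False).to(device)
    x = torch.randn(8, 3, 2)             # (bsz, nnode, d_motion)
    A = x.unsqueeze(2) - x.unsqueeze(1)  # (bsz, nnode, nnode, d_motion)
    attn_mask = torch.ones(8, 3, 3)
    x_enc, sampled_edges, edge_multinomial, attn_weights = \
        gst(x, A, attn_mask, tau=1., hard=False, device=device)
    print(x_enc.shape)                   # (8, 3, 32)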
================================================
FILE: gst_updated/src/gumbel_social_transformer/mha.py
================================================
import warnings
import torch
from torch.nn.functional import softmax, dropout, linear, pad
from torch import Tensor
from typing import Optional, Tuple
from torch.nn import Module, Parameter
# from torch.nn.modules.linear import _LinearWithBias
from torch.nn import Linear
from torch.nn.init import xavier_uniform_, xavier_normal_, constant_
from torch.overrides import has_torch_function, handle_torch_function
def multi_head_attention_forward(
query: Tensor,
key: Tensor,
value: Tensor,
embed_dim_to_check: int,
num_heads: int,
in_proj_weight: Tensor,
in_proj_bias: Optional[Tensor],
bias_k: Optional[Tensor],
bias_v: Optional[Tensor],
add_zero_attn: bool,
dropout_p: float,
out_proj_weight: Tensor,
out_proj_bias: Optional[Tensor],
training: bool = True,
key_padding_mask: Optional[Tensor] = None,
need_weights: bool = True,
attn_mask: Optional[Tensor] = None,
use_separate_proj_weight: bool = False,
q_proj_weight: Optional[Tensor] = None,
k_proj_weight: Optional[Tensor] = None,
v_proj_weight: Optional[Tensor] = None,
static_k: Optional[Tensor] = None,
static_v: Optional[Tensor] = None,
) -> Tuple[Tensor, Optional[Tensor]]:
tens_ops = (query, key, value, in_proj_weight, in_proj_bias, bias_k, bias_v, out_proj_weight, out_proj_bias)
if has_torch_function(tens_ops):
return handle_torch_function(
multi_head_attention_forward,
tens_ops,
query,
key,
value,
embed_dim_to_check,
num_heads,
in_proj_weight,
in_proj_bias,
bias_k,
bias_v,
add_zero_attn,
dropout_p,
out_proj_weight,
out_proj_bias,
training=training,
key_padding_mask=key_padding_mask,
need_weights=need_weights,
attn_mask=attn_mask,
use_separate_proj_weight=use_separate_proj_weight,
q_proj_weight=q_proj_weight,
k_proj_weight=k_proj_weight,
v_proj_weight=v_proj_weight,
static_k=static_k,
static_v=static_v,
)
tgt_len, bsz, embed_dim = query.size()
assert embed_dim == embed_dim_to_check
assert key.size(0) == value.size(0) and key.size(1) == value.size(1)
if isinstance(embed_dim, torch.Tensor):
head_dim = embed_dim.div(num_heads, rounding_mode='trunc')
else:
head_dim = embed_dim // num_heads
assert head_dim * num_heads == embed_dim, "embed_dim must be divisible by num_heads"
scaling = float(head_dim) ** -0.5
if not use_separate_proj_weight:
if (query is key or torch.equal(query, key)) and (key is value or torch.equal(key, value)):
q, k, v = linear(query, in_proj_weight, in_proj_bias).chunk(3, dim=-1)
elif key is value or torch.equal(key, value):
_b = in_proj_bias
_start = 0
_end = embed_dim
_w = in_proj_weight[_start:_end, :]
if _b is not None:
_b = _b[_start:_end]
q = linear(query, _w, _b)
if key is None:
assert value is None
k = None
v = None
else:
_b = in_proj_bias
_start = embed_dim
_end = None
_w = in_proj_weight[_start:, :]
if _b is not None:
_b = _b[_start:]
k, v = linear(key, _w, _b).chunk(2, dim=-1)
else:
_b = in_proj_bias
_start = 0
_end = embed_dim
_w = in_proj_weight[_start:_end, :]
if _b is not None:
_b = _b[_start:_end]
q = linear(query, _w, _b)
_b = in_proj_bias
_start = embed_dim
_end = embed_dim * 2
_w = in_proj_weight[_start:_end, :]
if _b is not None:
_b = _b[_start:_end]
k = linear(key, _w, _b)
_b = in_proj_bias
_start = embed_dim * 2
_end = None
_w = in_proj_weight[_start:, :]
if _b is not None:
_b = _b[_start:]
v = linear(value, _w, _b)
else:
q_proj_weight_non_opt = torch.jit._unwrap_optional(q_proj_weight)
len1, len2 = q_proj_weight_non_opt.size()
assert len1 == embed_dim and len2 == query.size(-1)
k_proj_weight_non_opt = torch.jit._unwrap_optional(k_proj_weight)
len1, len2 = k_proj_weight_non_opt.size()
assert len1 == embed_dim and len2 == key.size(-1)
v_proj_weight_non_opt = torch.jit._unwrap_optional(v_proj_weight)
len1, len2 = v_proj_weight_non_opt.size()
assert len1 == embed_dim and len2 == value.size(-1)
if in_proj_bias is not None:
q = linear(query, q_proj_weight_non_opt, in_proj_bias[0:embed_dim])
k = linear(key, k_proj_weight_non_opt, in_proj_bias[embed_dim : (embed_dim * 2)])
v = linear(value, v_proj_weight_non_opt, in_proj_bias[(embed_dim * 2) :])
else:
q = linear(query, q_proj_weight_non_opt, in_proj_bias)
k = linear(key, k_proj_weight_non_opt, in_proj_bias)
v = linear(value, v_proj_weight_non_opt, in_proj_bias)
q = q * scaling
if attn_mask is not None:
assert (
attn_mask.dtype == torch.float32
or attn_mask.dtype == torch.float64
or attn_mask.dtype == torch.float16
or attn_mask.dtype == torch.uint8
or attn_mask.dtype == torch.bool
), "Only float, byte, and bool types are supported for attn_mask, not {}".format(attn_mask.dtype)
if attn_mask.dtype == torch.uint8:
warnings.warn("Byte tensor for attn_mask in nn.MultiheadAttention is deprecated. Use bool tensor instead.")
attn_mask = attn_mask.to(torch.bool)
if attn_mask.dim() == 2:
attn_mask = attn_mask.unsqueeze(0)
if list(attn_mask.size()) != [1, query.size(0), key.size(0)]:
raise RuntimeError("The size of the 2D attn_mask is not correct.")
elif attn_mask.dim() == 3:
if list(attn_mask.size()) != [bsz * num_heads, query.size(0), key.size(0)]:
raise RuntimeError("The size of the 3D attn_mask is not correct.")
else:
raise RuntimeError("attn_mask's dimension {} is not supported".format(attn_mask.dim()))
if key_padding_mask is not None and key_padding_mask.dtype == torch.uint8:
warnings.warn(
"Byte tensor for key_padding_mask in nn.MultiheadAttention is deprecated. Use bool tensor instead."
)
key_padding_mask = key_padding_mask.to(torch.bool)
if bias_k is not None and bias_v is not None:
if static_k is None and static_v is None:
k = torch.cat([k, bias_k.repeat(1, bsz, 1)])
v = torch.cat([v, bias_v.repeat(1, bsz, 1)])
if attn_mask is not None:
attn_mask = pad(attn_mask, (0, 1))
if key_padding_mask is not None:
key_padding_mask = pad(key_padding_mask, (0, 1))
else:
assert static_k is None, "bias cannot be added to static key."
assert static_v is None, "bias cannot be added to static value."
else:
assert bias_k is None
assert bias_v is None
q = q.contiguous().view(tgt_len, bsz * num_heads, head_dim).transpose(0, 1)
if k is not None:
k = k.contiguous().view(-1, bsz * num_heads, head_dim).transpose(0, 1)
if v is not None:
v = v.contiguous().view(-1, bsz * num_heads, head_dim).transpose(0, 1)
if static_k is not None:
assert static_k.size(0) == bsz * num_heads
assert static_k.size(2) == head_dim
k = static_k
if static_v is not None:
assert static_v.size(0) == bsz * num_heads
assert static_v.size(2) == head_dim
v = static_v
src_len = k.size(1)
if key_padding_mask is not None:
assert key_padding_mask.size(0) == bsz
assert key_padding_mask.size(1) == src_len
if add_zero_attn:
src_len += 1
k = torch.cat([k, torch.zeros((k.size(0), 1) + k.size()[2:], dtype=k.dtype, device=k.device)], dim=1)
v = torch.cat([v, torch.zeros((v.size(0), 1) + v.size()[2:], dtype=v.dtype, device=v.device)], dim=1)
if attn_mask is not None:
attn_mask = pad(attn_mask, (0, 1))
if key_padding_mask is not None:
key_padding_mask = pad(key_padding_mask, (0, 1))
attn_output_weights = torch.bmm(q, k.transpose(1, 2))
assert list(attn_output_weights.size()) == [bsz * num_heads, tgt_len, src_len]
if attn_mask is not None:
if attn_mask.dtype == torch.bool:
attn_output_weights.masked_fill_(attn_mask, float("-inf"))
if key_padding_mask is not None:
attn_output_weights = attn_output_weights.view(bsz, num_heads, tgt_len, src_len)
attn_output_weights = attn_output_weights.masked_fill(
key_padding_mask.unsqueeze(1).unsqueeze(2),
float("-inf"),
)
attn_output_weights = attn_output_weights.view(bsz * num_heads, tgt_len, src_len)
attn_output_weights = softmax(attn_output_weights, dim=-1)
if attn_mask is not None:
if attn_mask.dtype == torch.float32 \
or attn_mask.dtype == torch.float64 \
or attn_mask.dtype == torch.float16:
attn_output_weights = attn_output_weights * attn_mask
attn_output_weights = attn_output_weights / (attn_output_weights.sum(-1).unsqueeze(-1)+1e-10)
attn_output_weights = dropout(attn_output_weights, p=dropout_p, training=training)
attn_output = torch.bmm(attn_output_weights, v)
assert list(attn_output.size()) == [bsz * num_heads, tgt_len, head_dim]
attn_output_heads = attn_output.reshape(bsz, num_heads, tgt_len, head_dim)
attn_output = attn_output.transpose(0, 1).contiguous().view(tgt_len, bsz, embed_dim)
attn_output = linear(attn_output, out_proj_weight, out_proj_bias)
if need_weights:
attn_output_weights = attn_output_weights.view(bsz, num_heads, tgt_len, src_len)
return attn_output, attn_output_weights, attn_output_heads
else:
return attn_output, None, attn_output_heads
class VanillaMultiheadAttention(Module):
bias_k: Optional[torch.Tensor]
bias_v: Optional[torch.Tensor]
def __init__(self, embed_dim, num_heads, dropout=0., bias=True, add_bias_kv=False, add_zero_attn=False, kdim=None, vdim=None):
super(VanillaMultiheadAttention, self).__init__()
self.embed_dim = embed_dim
self.kdim = kdim if kdim is not None else embed_dim
self.vdim = vdim if vdim is not None else embed_dim
self._qkv_same_embed_dim = self.kdim == embed_dim and self.vdim == embed_dim
self.num_heads = num_heads
self.dropout = dropout
self.head_dim = embed_dim // num_heads
assert self.head_dim * num_heads == self.embed_dim, "embed_dim must be divisible by num_heads"
if self._qkv_same_embed_dim is False:
self.q_proj_weight = Parameter(torch.Tensor(embed_dim, embed_dim))
self.k_proj_weight = Parameter(torch.Tensor(embed_dim, self.kdim))
self.v_proj_weight = Parameter(torch.Tensor(embed_dim, self.vdim))
self.register_parameter('in_proj_weight', None)
else:
self.in_proj_weight = Parameter(torch.empty(3 * embed_dim, embed_dim))
self.register_parameter('q_proj_weight', None)
self.register_parameter('k_proj_weight', None)
self.register_parameter('v_proj_weight', None)
if bias:
self.in_proj_bias = Parameter(torch.empty(3 * embed_dim))
else:
self.register_parameter('in_proj_bias', None)
# self.out_proj = _LinearWithBias(embed_dim, embed_dim)
self.out_proj = Linear(embed_dim, embed_dim)
if add_bias_kv:
self.bias_k = Parameter(torch.empty(1, 1, embed_dim))
self.bias_v = Parameter(torch.empty(1, 1, embed_dim))
else:
self.bias_k = self.bias_v = None
self.add_zero_attn = add_zero_attn
self._reset_parameters()
def _reset_parameters(self):
if self._qkv_same_embed_dim:
xavier_uniform_(self.in_proj_weight)
else:
xavier_uniform_(self.q_proj_weight)
xavier_uniform_(self.k_proj_weight)
xavier_uniform_(self.v_proj_weight)
if self.in_proj_bias is not None:
constant_(self.in_proj_bias, 0.)
constant_(self.out_proj.bias, 0.)
if self.bias_k is not None:
xavier_normal_(self.bias_k)
if self.bias_v is not None:
xavier_normal_(self.bias_v)
def __setstate__(self, state):
if '_qkv_same_embed_dim' not in state:
state['_qkv_same_embed_dim'] = True
super(VanillaMultiheadAttention, self).__setstate__(state)
def forward(self, query: Tensor, key: Tensor, value: Tensor, key_padding_mask: Optional[Tensor] = None,
need_weights: bool = True, attn_mask: Optional[Tensor] = None) -> Tuple[Tensor, Optional[Tensor]]:
if not self._qkv_same_embed_dim:
return multi_head_attention_forward(
query, key, value, self.embed_dim, self.num_heads,
self.in_proj_weight, self.in_proj_bias,
self.bias_k, self.bias_v, self.add_zero_attn,
self.dropout, self.out_proj.weight, self.out_proj.bias,
training=self.training,
key_padding_mask=key_padding_mask, need_weights=need_weights,
attn_mask=attn_mask, use_separate_proj_weight=True,
q_proj_weight=self.q_proj_weight, k_proj_weight=self.k_proj_weight,
v_proj_weight=self.v_proj_weight)
else:
return multi_head_attention_forward(
query, key, value, self.embed_dim, self.num_heads,
self.in_proj_weight, self.in_proj_bias,
self.bias_k, self.bias_v, self.add_zero_attn,
self.dropout, self.out_proj.weight, self.out_proj.bias,
training=self.training,
key_padding_mask=key_padding_mask, need_weights=need_weights,
attn_mask=attn_mask)
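if __name__ == "__main__":
    # Usage sketch: unlike torch.nn.MultiheadAttention, this variant also
    # returns the per-head outputs, and a float attn_mask is applied
    # multiplicatively after the softmax (then renormalized).
    mha = VanillaMultiheadAttention(embed_dim=32, num_heads=4)
    q = torch.randn(5, 2, 32)                    # (seq_len, batch, embed_dim)
    out, weights, heads = mha(q, q, q)
    print(out.shape, weights.shape, heads.shape) # (5,2,32) (2,4,5,5) (2,4,5,8)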
================================================
FILE: gst_updated/src/gumbel_social_transformer/node_encoder_layer_ghost.py
================================================
import torch
import torch.nn as nn
from gst_updated.src.gumbel_social_transformer.mha import VanillaMultiheadAttention
from gst_updated.src.gumbel_social_transformer.utils import _get_activation_fn
class NodeEncoderLayer(nn.Module):
r"""Ghost version"""
def __init__(self, d_model, nhead, dim_feedforward=512, dropout=0.1, activation="relu", attn_mech='vanilla'):
super(NodeEncoderLayer, self).__init__()
self.attn_mech = attn_mech
if self.attn_mech == 'vanilla':
self.self_attn = VanillaMultiheadAttention(d_model, nhead, dropout=dropout)
self.norm_node = nn.LayerNorm(d_model)
else:
raise RuntimeError('NodeEncoderLayer currently only supports vanilla mode.')
self.norm1_node = nn.LayerNorm(d_model)
self.linear1 = nn.Linear(d_model, dim_feedforward)
self.linear2 = nn.Linear(dim_feedforward, d_model)
self.dropout = nn.Dropout(dropout)
self.dropout1 = nn.Dropout(dropout)
self.dropout2 = nn.Dropout(dropout)
self.activation = _get_activation_fn(activation)
self.nhead = nhead
self.d_model = d_model
def forward(self, x, sampled_edges, attn_mask, device="cuda:0"):
"""
Encode pedestrian edge with node information.
inputs:
- x: vertices representing pedestrians of one sample.
# bsz is batch size corresponding to the Transformer setting; it corresponds to time steps in the pedestrian setting.
# (bsz, nnode, d_model)
- sampled_edges: sampled adjacency matrix with ghost at the last column.
# (time_step, nnode , nhead_edges, neighbor_node)
# where neighbor_node = nnode+1
- attn_mask: attention mask provided in advance.
# (bsz, nnode , nnode )
# row -> target, col -> neighbor
# 1. means yes, i.e. attention exists. 0. means no.
- device: 'cuda:0' or 'cpu'.
outputs:
- x: encoded vertices. # (bsz, nnode, d_model)
- attn_weights: attention weights. # (bsz, nhead, nnode , neighbor_node)
# where neighbor_node = nnode+1
"""
if self.attn_mech == 'vanilla':
bsz = x.shape[0]
attn_mask_ped = (attn_mask.sum(-1) > 0).float().unsqueeze(-1).to(device) # (bsz, nnode, 1)
x = self.norm_node(x)
x = x * attn_mask_ped
x_ghost = torch.zeros(bsz, 1, self.d_model).to(device)
# add zero vector for ghost at every node encoder layer.
x_w_ghost = torch.cat((x, x_ghost), dim=1) # (bsz, nnode+1, d_model) # x with ghost
x_w_ghost_perm = x_w_ghost.permute(1,0,2) # (nnode+1, bsz, d_model)
x_perm = x.permute(1, 0, 2) # (nnode, bsz, d_model)
adj_mat = sampled_edges.sum(2) # (bsz, nnode, nnode+1)
# adj_mat = torch.cat([adj_mat for _ in range(self.nhead)], dim=0) # (nhead*bsz, target_node, neighbor_node)
adj_mat = torch.stack([adj_mat for _ in range(self.nhead)], dim=1) # (bsz, nhead, target_node, neighbor_node)
adj_mat = adj_mat.view(bsz*self.nhead, adj_mat.shape[2], adj_mat.shape[3]) # (bsz*nhead, target_node, neighbor_node) # ! really important bug fix
x2, attn_weights, _ = self.self_attn(x_perm, x_w_ghost_perm, x_w_ghost_perm, attn_mask=adj_mat)
# inputs: (L, N, E), (S, N, E), (S, N, E), (N*nhead, L, S)
# x2: (nnode, bsz, d_model); attn_weights: (bsz, nhead, target_node, neighbor_node)
x2 = x2.permute(1, 0, 2) # (bsz, node, d_model)
x = x + self.dropout(x2)
else:
raise RuntimeError('NodeEncoderLayer currently only supports vanilla mode.')
x2 = self.norm1_node(x)
x2 = self.dropout1(self.activation(self.linear1(x2)))
x2 = self.dropout2(self.linear2(x2))
x = x + x2
return x, attn_weights
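if __name__ == "__main__":
    # Illustrative call with assumed shapes: with the ghost node,
    # sampled_edges has nnode+1 neighbor columns while the queries remain
    # the nnode real pedestrians.
    layer = NodeEncoderLayer(d_model=32, nhead=2)
    x = torch.randn(4, 3, 32)              # (bsz, nnode, d_model)
    sampled_edges = torch.ones(4, 3, 1, 4) # neighbor_node = nnode+1
    attn_mask = torch.ones(4, 3, 3)
    x_out, attn_weights = layer(x, sampled_edges, attn_mask, device="cpu")
    print(x_out.shape, attn_weights.shape) # (4, 3, 32) (4, 2, 3, 4)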
================================================
FILE: gst_updated/src/gumbel_social_transformer/node_encoder_layer_no_ghost.py
================================================
import torch
import torch.nn as nn
from gst_updated.src.gumbel_social_transformer.mha import VanillaMultiheadAttention
from gst_updated.src.gumbel_social_transformer.utils import _get_activation_fn
class NodeEncoderLayer(nn.Module):
r"""No ghost version"""
def __init__(self, d_model, nhead, dim_feedforward=512, dropout=0.1, activation="relu", attn_mech='vanilla'):
super(NodeEncoderLayer, self).__init__()
self.attn_mech = attn_mech
if self.attn_mech == 'vanilla':
self.self_attn = VanillaMultiheadAttention(d_model, nhead, dropout=dropout)
self.norm_node = nn.LayerNorm(d_model)
else:
raise RuntimeError('NodeEncoderLayer currently only supports vanilla mode.')
self.norm1_node = nn.LayerNorm(d_model)
self.linear1 = nn.Linear(d_model, dim_feedforward)
self.linear2 = nn.Linear(dim_feedforward, d_model)
self.dropout = nn.Dropout(dropout)
self.dropout1 = nn.Dropout(dropout)
self.dropout2 = nn.Dropout(dropout)
self.activation = _get_activation_fn(activation)
self.nhead = nhead
def forward(self, x, sampled_edges, attn_mask, device="cuda:0"):
"""
Encode pedestrian edge with node information.
inputs:
- x: vertices representing pedestrians of one sample.
# bsz is batch size corresponding to the Transformer setting; it corresponds to time steps in the pedestrian setting.
# (bsz, nnode, d_model)
- sampled_edges: sampled adjacency matrix (no ghost column in this version).
# (time_step, nnode , nhead_edges, neighbor_node)
# where neighbor_node = nnode
- attn_mask: attention mask provided in advance.
# (bsz, nnode , nnode )
# row -> target, col -> neighbor
# 1. means yes, i.e. attention exists. 0. means no.
- device: 'cuda:0' or 'cpu'.
outputs:
- x: encoded vertices. # (bsz, nnode, d_model)
- attn_weights: attention weights. # (bsz, nhead, nnode , neighbor_node)
# where neighbor_node = nnode
"""
if self.attn_mech == 'vanilla':
bsz = x.shape[0]
attn_mask_ped = (attn_mask.sum(-1) > 0).float().unsqueeze(-1).to(device) # (bsz, nnode, 1)
x = self.norm_node(x)
x = x * attn_mask_ped
x_perm = x.permute(1, 0, 2) # (nnode, bsz, d_model)
adj_mat = sampled_edges.sum(2) # (bsz, nnode, nnode)
# adj_mat = torch.cat([adj_mat for _ in range(self.nhead)], dim=0) # (nhead*bsz, target_node, neighbor_node)
adj_mat = torch.stack([adj_mat for _ in range(self.nhead)], dim=1) # (bsz, nhead, target_node, neighbor_node)
adj_mat = adj_mat.view(bsz*self.nhead, adj_mat.shape[2], adj_mat.shape[3]) # (bsz*nhead, target_node, neighbor_node) # ! really important bug fix
x2, attn_weights, _ = self.self_attn(x_perm, x_perm, x_perm, attn_mask=adj_mat)
# inputs: (L, N, E), (S, N, E), (S, N, E), (N*nhead, L, S)
# x2: (nnode, bsz, d_model); attn_weights: (bsz, nhead, target_node, neighbor_node)
x2 = x2.permute(1, 0, 2) # (bsz, node, d_model)
x = x + self.dropout(x2)
else:
raise RuntimeError('NodeEncoderLayer currently only supports vanilla mode.')
x2 = self.norm1_node(x)
x2 = self.dropout1(self.activation(self.linear1(x2)))
x2 = self.dropout2(self.linear2(x2))
x = x + x2
return x, attn_weights
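if __name__ == "__main__":
    # Illustrative call with assumed shapes: no ghost node, so
    # neighbor_node == nnode and queries and keys share the same set.
    layer = NodeEncoderLayer(d_model=32, nhead=2)
    x = torch.randn(4, 3, 32)              # (bsz, nnode, d_model)
    sampled_edges = torch.ones(4, 3, 1, 3) # (bsz, nnode, nhead_edges, nnode)
    attn_mask = torch.ones(4, 3, 3)
    x_out, attn_weights = layer(x, sampled_edges, attn_mask, device="cpu")
    print(x_out.shape, attn_weights.shape) # (4, 3, 32) (4, 2, 3, 3)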
================================================
FILE: gst_updated/src/gumbel_social_transformer/pathhack.py
================================================
import sys
from os.path import abspath, join, split
file_path = split(abspath(__file__))[0]
pkg_path = join(file_path, '../..')
sys.path.append(pkg_path)
================================================
FILE: gst_updated/src/gumbel_social_transformer/st_model.py
================================================
"""
st_model.py
A multi-pedestrian trajectory prediction model
that follows a spatial -> temporal encoding scheme.
"""
import torch
import torch.nn as nn
# from src.social_transformer.social_transformer import SpatialSocialTransformerEncoder
from gst_updated.src.gumbel_social_transformer.gumbel_social_transformer import GumbelSocialTransformer
from gst_updated.src.gumbel_social_transformer.temporal_convolution_net import TemporalConvolutionNet
def offset_error_square_full_partial(x_pred, x_target, loss_mask_ped, loss_mask_pred_seq):
"""
Offset Error Square between positions.
# * average_offset_error and final_offset_error in utils.py are computed for full pedestrians.
inputs:
- x_pred
# prediction on pedestrian displacements in prediction period.
# (batch, pred_seq_len, node, motion_dim)
# batch = 1
- x_target
# ground truth pedestrian displacements in prediction period.
# (batch, pred_seq_len, node, motion_dim)
- loss_mask_ped
# loss mask on each pedestrian. 1 means the pedestrian is valid, and 0 means not valid.
# * equivalent as loss_mask_rel_full_partial in st_model.
# * Used to filter out the ones we do not predict. (disappear early, not appear until prediction period.)
# (batch, node)
- loss_mask_pred_seq
# loss_mask_rel in prediction sequence. float32 tensor: (batch, num_peds, pred_seq_len)
outputs:
- offset_error_sq: offset error for each pedestrians.
# Already times eventual_loss_mask before output. shape: (pred_seq_len, node)
- eventual_loss_mask: eventual loss mask on each pedestrian and each prediction step.
# shape: (pred_seq_len, node)
"""
assert x_pred.shape[0] == loss_mask_ped.shape[0] == loss_mask_pred_seq.shape[0] == 1 # batch
assert x_pred.shape[1] == x_target.shape[1] == loss_mask_pred_seq.shape[2] # pred_seq_len
assert x_pred.shape[2] == x_target.shape[2] == loss_mask_ped.shape[1] == loss_mask_pred_seq.shape[1] # num_peds
assert x_pred.shape[3] == x_target.shape[3] == 2 # motion dim
# mask out invalid values
loss_mask_rel_pred = loss_mask_pred_seq.permute(0, 2, 1).unsqueeze(-1) # (batch, pred_seq_len, num_peds, 1)
x_pred_m = x_pred * loss_mask_rel_pred # (batch, pred_seq_len, node, motion_dim)
x_target_m = x_target * loss_mask_rel_pred
x_pred_m = x_pred_m * loss_mask_ped.unsqueeze(1).unsqueeze(-1) # (batch, pred_seq_len, node, motion_dim)
x_target_m = x_target_m * loss_mask_ped.unsqueeze(1).unsqueeze(-1) # (batch, pred_seq_len, node, motion_dim)
pos_pred = torch.cumsum(x_pred_m, dim=1)
pos_target = torch.cumsum(x_target_m, dim=1)
offset_error_sq = (((pos_pred-pos_target)**2.).sum(3))[0] # (pred_seq_len, node)
eventual_loss_mask = loss_mask_rel_pred[0,:,:,0] * loss_mask_ped[0] # (pred_seq_len, node)
offset_error_sq = offset_error_sq * eventual_loss_mask
return offset_error_sq, eventual_loss_mask
def negative_log_likelihood_full_partial(gaussian_params, x_target, loss_mask_ped, loss_mask_pred_seq):
"""
Compute negative log likelihood of gaussian parameters.
inputs:
- gaussian_params: tuple.
- mu: (batch, pred_seq_len, node, 2)
- sx: (batch, pred_seq_len, node, 1)
- sy: (batch, pred_seq_len, node, 1)
- corr: (batch, pred_seq_len, node, 1)
- x_target
# ground truth pedestrian displacements in prediction period.
# (batch, pred_seq_len, node, motion_dim)
- loss_mask_ped
# loss mask on each pedestrian. 1 means the pedestrian is valid, and 0 means not valid.
# * equivalent as loss_mask_rel_full_partial in st_model.
# * Used to filter out the ones we do not predict. (disappear early, not appear until prediction period.)
# (batch, node)
- loss_mask_pred_seq
# loss_mask_rel in prediction sequence. float32 tensor: (batch, num_peds, pred_seq_len)
outputs:
- prob_loss: (pred_seq_len, node)
- eventual_loss_mask: eventual loss mask on each pedestrian and each prediction step.
# shape: (pred_seq_len, node)
"""
mu, sx, sy, corr = gaussian_params
loss_mask_rel_pred = loss_mask_pred_seq.permute(0, 2, 1).unsqueeze(-1) # (batch, pred_seq_len, num_peds, 1)
mu = mu * loss_mask_rel_pred # (batch, pred_seq_len, node, 2)
corr = corr * loss_mask_rel_pred # (batch, pred_seq_len, node, 1)
x_target = x_target * loss_mask_rel_pred
mu = mu * loss_mask_ped.unsqueeze(1).unsqueeze(-1) # (batch, pred_seq_len, node, 2)
corr = corr * loss_mask_ped.unsqueeze(1).unsqueeze(-1) # (batch, pred_seq_len, node, 1)
x_target = x_target * loss_mask_ped.unsqueeze(1).unsqueeze(-1) # (batch, pred_seq_len, node, 2)
sx = sx * loss_mask_rel_pred + (1.-loss_mask_rel_pred) # sigma should not be zero, so mask with one
sy = sy * loss_mask_rel_pred + (1.-loss_mask_rel_pred) # sigma should not be zero, so mask with one
sx = sx * loss_mask_ped.unsqueeze(1).unsqueeze(-1) + (1.-loss_mask_ped.unsqueeze(1).unsqueeze(-1)) # (batch, pred_seq_len, node, 1)
sy = sy * loss_mask_ped.unsqueeze(1).unsqueeze(-1) + (1.-loss_mask_ped.unsqueeze(1).unsqueeze(-1)) # (batch, pred_seq_len, node, 1)
sigma = torch.cat((sx, sy), dim=3)
x_target_norm = (x_target-mu)/sigma
nx, ny = x_target_norm[:,:,:,0:1], x_target_norm[:,:,:,1:2]
loss_term_1 = torch.log(1.-corr**2.)/2.+torch.log(sx)+torch.log(sy)
loss_term_2 = (nx**2.-2.*corr*nx*ny+ny**2.)/(2.*(1.-corr**2.))
prob_loss = (loss_term_1+loss_term_2).squeeze(3).squeeze(0) # (pred_seq_len, node)
eventual_loss_mask = loss_mask_rel_pred[0,:,:,0] * loss_mask_ped[0] # (pred_seq_len, node)
prob_loss = prob_loss * eventual_loss_mask
return prob_loss, eventual_loss_mask
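# Sanity check on the NLL terms above (illustrative): for a unit bivariate
# Gaussian (mu=0, sx=sy=1, corr=0), loss_term_1 vanishes and loss_term_2
# reduces to (nx^2 + ny^2)/2, i.e. the exact negative log density up to the
# constant log(2*pi), which is dropped here since it does not affect gradients.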
class st_model(nn.Module):
def __init__(self, args, device='cuda:0'):
"""
Initialize spatial and temporal encoding components.
inputs:
- args: arguments from user input. Here only list arguments used in st_model.
(in __init__)
### * in function __init__() * ###
- spatial # spatial encoding methods. options: rel_conv.
- temporal # temporal encoding methods. options: lstm.
- motion_dim # pedestrian motion is 2D, so motion_dim is always 2.
- output_dim # 5 means probabilistic output (mu_x, mu_y, sigma_x, sigma_y, corr)
# 2 means deterministic output (x, y) # ! may not do output_dim=2 in our work
- embedding_size # size of pedestrian embeddings after spatial encoding.
- spatial_num_heads # number of heads for multi-head attention
# mechanism in spatial encoding.
- spatial_beta # beta used in skip connection as a percentage of original input.
# default can be None. If beta is not None, beta = 0.9 means
# out <- 0.9 * x + 0.1 * out
- lstm_hidden_size # hidden size of lstm.
- lstm_num_layers # number of layers of lstm.
- lstm_batch_first # batch first or not for lstm.
- lstm_dropout # dropout rate of lstm.
- decode_style # 'recursive' or 'readout'.
# 'recursive' means recursively encode and decode.
# 'readout' means encoding and decoding are separated.
- detach_sample # bool value on whether detach samples from gaussian_params or not.
# detach_sample=False is default. It means using reparametrization trick and enable gradient flow.
# detach_sample=True means to disable reparametrization trick.
# ! To add
# ! args.spatial_num_heads_edges
# ! args.ghost
### * in function forward() * ###
- pred_seq_len # length of prediction period: 12
- device: 'cuda:0' or 'cpu'.
"""
super(st_model, self).__init__()
## spatial
if args.spatial == 'gumbel_social_transformer':
# self.node_embedding = nn.Linear(args.motion_dim, args.embedding_size).to(device)
# self.edge_embedding = nn.Linear(args.motion_dim, 2 * args.embedding_size).to(device)
self.gumbel_social_transformer = GumbelSocialTransformer(
args.motion_dim,
args.embedding_size,
args.spatial_num_heads,
args.spatial_num_heads_edges,
args.spatial_num_layers,
dim_feedforward=128,
dim_hidden=32,
dropout=0.1,
activation="relu",
attn_mech="vanilla",
ghost=args.ghost,
).to(device)
# self.spatial_social_transformer = SpatialSocialTransformerEncoder(args.embedding_size, args.spatial_num_heads, args.spatial_num_layers, \
# dim_feedforward=256, dim_hidden=32, dropout=0.1, activation="relu", attn_mech="vanilla").to(device)
else:
raise RuntimeError('The spatial component is not found.')
## temporal
if args.temporal == 'lstm' or args.temporal == 'faster_lstm':
self.lstm = nn.LSTM(
input_size=args.embedding_size,
hidden_size=args.lstm_hidden_size,
num_layers=args.lstm_num_layers,
batch_first=False,
dropout=0.,
bidirectional=False,
).to(device)
self.hidden2pos = nn.Linear(args.lstm_num_layers*args.lstm_hidden_size, args.output_dim).to(device)
else:
raise RuntimeError('The temporal component is not lstm nor faster_lstm.')
## others
self.args = args
print("new st model")
def raw2gaussian(self, prob_raw):
"""
Turn raw values into gaussian parameters.
inputs:
- prob_raw: (batch, time, node, output_dim)
- device: 'cuda:0' or 'cpu'.
outputs:
- gaussian_params: tuple.
- mu: (batch, time, node, 2)
- sx: (batch, time, node, 1)
- sy: (batch, time, node, 1)
- corr: (batch, time, node, 1)
"""
mu = prob_raw[:,:,:,:2]
sx, sy = torch.exp(prob_raw[:,:,:,2:3]), torch.exp(prob_raw[:,:,:,3:4])
corr = torch.tanh(prob_raw[:,:,:,4:5])
gaussian_params = (mu, sx, sy, corr)
return gaussian_params
def sample_gaussian(self, gaussian_params, device='cuda:0', detach_sample=False, sampling=True):
"""
Generate a sample from Gaussian.
inputs:
- gaussian_params: tuple.
- mu: (batch, time, node, 2)
- sx: (batch, time, node, 1)
- sy: (batch, time, node, 1)
- corr: (batch, time, node, 1)
- device: 'cuda:0' or 'cpu'
- detach_sample: Bool. Default False.
# Detach is to cut the gradient flow between gaussian_params and the next sample.
# detach_sample=True means reparameterization trick is disabled.
# detach_sample=False means reparameterization trick is enabled.
# ! if it causes error, we need to manually turn detach_sample=False
# ! or we have to change args file for val_best before jan 4, 2021.
- sampling:
# True means sampling. # False means using mu.
outputs:
- sample: (batch, time, node, 2)
"""
mu, sx, sy, corr = gaussian_params
if detach_sample:
mu, sx, sy, corr = mu.detach(), sx.detach(), sy.detach(), corr.detach()
if sampling:
sample_unit = torch.empty(mu.shape).normal_().to(device) # N(0,1) with shape (batch, time, node, 2)
sample_unit_x, sample_unit_y = sample_unit[:,:,:,0:1], sample_unit[:,:,:,1:2] # (batch, time, node, 1)
sample_x = sx*sample_unit_x
sample_y = corr*sy*sample_unit_x+((1.-corr**2.)**0.5)*sy*sample_unit_y
sample = torch.cat((sample_x, sample_y), dim=3)+mu
else:
sample = mu
return sample
def edge_evolution(self, xt_plus, At, device='cuda:0'):
"""
Compute edges at the next time step (At_plus) based on
pedestrian displacements at the next time step (xt_plus)
and edges at the current time step (At).
inputs:
- xt_plus: vertices representing pedestrian displacement from t to t+1.
# (batch, unit_time, node, motion_dim)
- At: edges representing relative position between pedestrians at time t.
At(i, j) is the vector pos_i,t - pos_j,t. I.e. the vector from pedestrian j
to pedestrian i.
# (batch, unit_time, node, node, edge_feat)
# batch = unit_time = 1.
# edge_feat = 2.
- device: 'cuda:0' or 'cpu'.
outputs:
- At_plus: edges representing relative position between pedestrians at time t+1.
# (batch, unit_time, node, node, edge_feat)
"""
# xt_plus # (batch, unit_time, node, motion_dim)
# At # (batch, unit_time, node, node, edge_feat)
# (batch, unit_time, node, 1, motion_dim) - (batch, unit_time, 1, node, motion_dim)
At_plus = At + (xt_plus.unsqueeze(3) - xt_plus.unsqueeze(2))
return At_plus
def forward(self, x, A, attn_mask, loss_mask_rel, tau=1., hard=False, sampling=True, device='cuda:0'):
"""
Forward function.
inputs:
- x
# vertices representing pedestrians during observation period.
# (batch, obs_seq_len, node, in_feat)
# node: number of pedestrians
# in_feat: motion_dim, i.e. 2.
# Refer to V_obs in src.mgnn.utils.dataset_format().
- A
# edges representing relationships between pedestrians during observation period.
# (batch, obs_seq_len, node, node, edge_feat)
# edge_feat: feature dim of edges. if spatial encoding is rel_conv, edge_feat = 2.
# Refer to A_obs in src.mgnn.utils.dataset_format().
- attn_mask
# attention mask on pedestrian interactions in observation period.
# row -> neighbor, col -> target
# Should neighbor affect target?
# 1 means yes, i.e. attention exists. 0 means no.
# float32 tensor: (batch, obs_seq_len, neighbor_num_peds, target_num_peds)
- loss_mask_rel
# loss mask on displacement in the whole period
# float32 tensor: (batch, num_peds, seq_len)
# 1 means the displacement of pedestrian i at time t is valid. 0 means not valid.
# If the displacement of pedestrian i at time t is valid,
# then position of pedestrian i at time t and t-1 is valid.
# If t is zero, then it means position of pedestrian i at time t is valid.
- tau: temperature hyperparameter of gumbel softmax.
# ! Need annealing through training. 1 is considered really soft at the beginning.
- hard: hard or soft sampling.
# True means one-hot sample for evaluation.
# False means soft sample for reparametrization.
- sampling: sample gaussian (True) or use mean for prediction (False).
- device: 'cuda:0' or 'cpu'.
outputs:
# TODO
"""
# ! Start writing multiple batches
info = {}
# processing when missing pedestrians are included
batch_size, _, num_peds, _ = x.shape
loss_mask_per_pedestrian = (loss_mask_rel.sum(2)==loss_mask_rel.shape[2]).float() # (batch, num_peds)
if self.args.only_observe_full_period:
attn_mask_single_step = torch.bmm(loss_mask_per_pedestrian.unsqueeze(2), loss_mask_per_pedestrian.unsqueeze(1)) # (batch, num_peds, num_peds)
attn_mask = torch.ones(batch_size, self.args.obs_seq_len, num_peds, num_peds).to(device) * attn_mask_single_step.unsqueeze(1) # (batch, obs_seq_len, num_peds, num_peds)
## observation period: spatial
if self.args.spatial == 'gumbel_social_transformer':
# x_embedding = self.node_embedding(x)[0] # (obs_seq_len, node, d_model)
# A_embedding = self.edge_embedding(A)[0] # (obs_seq_len, nnode , nnode , 2*d_model)
attn_mask = attn_mask.permute(0, 1, 3, 2) # (batch, obs_seq_len, nnode , nnode )
attn_mask_reshaped = attn_mask.reshape(batch_size*self.args.obs_seq_len, num_peds, num_peds)
x_reshaped = x.reshape(batch_size*self.args.obs_seq_len, num_peds, -1)
A_reshaped = A.reshape(batch_size*self.args.obs_seq_len, num_peds, num_peds, -1) #(batch, obs_seq_len, node, node, edge_feat)
# ! attn_mask = attn_mask[0].permute(0,2,1) # (obs_seq_len, nnode , nnode )
# ! xs, sampled_edges, edge_multinomial, attn_weights = self.gumbel_social_transformer(x[0], A[0], attn_mask, tau=tau, hard=hard, device=device)
xs, sampled_edges, edge_multinomial, attn_weights = self.gumbel_social_transformer(x_reshaped, A_reshaped, attn_mask_reshaped, tau=tau, hard=hard, device=device)
xs = xs.reshape(batch_size, self.args.obs_seq_len, num_peds, -1) # (batch, obs_seq_len, nnode, embedding_size)
info['sampled_edges'], info['edge_multinomial'], info['attn_weights'] = [], [], []
sampled_edges = sampled_edges.reshape(batch_size, self.args.obs_seq_len,
sampled_edges.shape[1], sampled_edges.shape[2], sampled_edges.shape[3])
edge_multinomial = edge_multinomial.reshape(batch_size, self.args.obs_seq_len,
edge_multinomial.shape[1], edge_multinomial.shape[2], edge_multinomial.shape[3])
attn_weights = attn_weights.reshape(attn_weights.shape[0], batch_size, self.args.obs_seq_len,
attn_weights.shape[2], attn_weights.shape[3], attn_weights.shape[4])
info['sampled_edges'].append(sampled_edges.detach().to("cpu"))
info['edge_multinomial'].append(edge_multinomial.detach().to("cpu"))
info['attn_weights'].append(attn_weights.detach().to("cpu"))
else:
raise RuntimeError("The spatial component is not found.")
## observation period: temporal
ht = torch.zeros(self.args.lstm_num_layers, batch_size*num_peds, self.args.lstm_hidden_size).to(device)
ct = torch.zeros(self.args.lstm_num_layers, batch_size*num_peds, self.args.lstm_hidden_size).to(device)
if self.args.temporal == 'lstm':
for tt in range(self.args.obs_seq_len):
loss_mask_rel_tt = loss_mask_rel[:,:,tt:tt+1].reshape(-1,1) # (batch*num_peds, 1)
xs_tt = xs[:, tt].reshape(batch_size*num_peds, -1).unsqueeze(0)*loss_mask_rel_tt # (1, batch*num_peds, embedding_size)
_, (htp, ctp) = self.lstm(xs_tt, (ht, ct)) # tp == tplus
ht = htp * loss_mask_rel_tt + ht * (1.-loss_mask_rel_tt)
ct = ctp * loss_mask_rel_tt + ct * (1.-loss_mask_rel_tt)
elif self.args.temporal == 'faster_lstm':
obs_mask = loss_mask_rel[:,:,:self.args.obs_seq_len].permute(0,2,1).unsqueeze(-1) # (batch, obs_seq_len, num_peds,1)
xs_masked = xs*obs_mask # (batch, obs_seq_len, num_peds, embedding_size)
xs_masked = xs_masked.permute(1,0,2,3).reshape(self.args.obs_seq_len, batch_size*num_peds, -1) # (obs_seq_len, batch*num_peds, embedding_size)
_, (ht, ct) = self.lstm(xs_masked, (ht, ct))
else:
raise RuntimeError('The temporal component is not lstm nor faster_lstm.')
if self.args.only_observe_full_period:
loss_mask_rel_full_partial = loss_mask_per_pedestrian # (batch, num_peds)
else:
# We predict the motion of pedestrians that show up in the observation period and did not disappear after the last observed time step, i.e., we only predict pedestrians whose relative position is present at the last observed time step.
loss_mask_rel_full_partial = loss_mask_rel[:,:,self.args.obs_seq_len-1] # (batch, num_peds)
ht = ht * loss_mask_rel_full_partial.reshape(-1).unsqueeze(-1)
ct = ct * loss_mask_rel_full_partial.reshape(-1).unsqueeze(-1)
attn_mask_pred = torch.bmm(loss_mask_rel_full_partial.unsqueeze(2), loss_mask_rel_full_partial.unsqueeze(1)).permute(0,2,1) # (batch*unit_time, nnode , nnode )
## prediction period
if self.args.decode_style == 'recursive':
if self.args.temporal == 'lstm' or self.args.temporal == 'faster_lstm':
# ht # (num_layers, node, hidden_size)
# ht # (num_layers, batch_size*num_peds, hidden_size) # batch_size*num_peds, sth
prob_raw = self.hidden2pos(ht.permute(1,0,2).reshape(batch_size*num_peds, -1)).reshape(batch_size, num_peds, -1).unsqueeze(1) # (batch, unit_time, node, output_dim)
gaussian_params = self.raw2gaussian(prob_raw)
mu, sx, sy, corr = gaussian_params
x_sample = self.sample_gaussian(gaussian_params, device=device, detach_sample=self.args.detach_sample, sampling=sampling)
# x_sample: (batch, unit_time, node, motion_dim)
# loss_mask_rel_full_partial: (batch, num_peds)
x_sample = x_sample * loss_mask_rel_full_partial.unsqueeze(1).unsqueeze(-1) # mask x_sample here for edge evolution
# A: (batch, obs_seq_len, node, node, edge_feat)
A_sample = self.edge_evolution(x_sample, A[:,-1:], device=device) # (batch, unit_time, node, node, edge_feat)
# starts recursion
prob_raw_pred, x_sample_pred, A_sample_pred = [], [], []
mu_pred, sx_pred, sy_pred, corr_pred = [], [], [], []
prob_raw_pred.append(prob_raw)
x_sample_pred.append(x_sample)
A_sample_pred.append(A_sample)
mu_pred.append(mu)
sx_pred.append(sx)
sy_pred.append(sy)
corr_pred.append(corr)
for tt in range(1, self.args.pred_seq_len):
if self.args.spatial == 'gumbel_social_transformer':
# * spatial encoding at prediction step tt
# # attn_mask_pred # (batch*unit_time, nnode , nnode )
x_sample_reshaped = x_sample.reshape(batch_size, num_peds, -1) # (batch*unit_time, node, motion_dim)
A_sample_reshaped = A_sample.reshape(batch_size, num_peds, num_peds, -1) # (batch*unit_time, node, node, edge_feat)
xs_tt, sampled_edges, edge_multinomial, attn_weights = \
self.gumbel_social_transformer(x_sample_reshaped, A_sample_reshaped, attn_mask_pred, \
tau=tau, hard=hard, device=device)
sampled_edges = sampled_edges.reshape(batch_size, 1,
sampled_edges.shape[1], sampled_edges.shape[2], sampled_edges.shape[3])
edge_multinomial = edge_multinomial.reshape(batch_size, 1,
edge_multinomial.shape[1], edge_multinomial.shape[2], edge_multinomial.shape[3])
attn_weights = attn_weights.reshape(attn_weights.shape[0], batch_size, 1,
attn_weights.shape[2], attn_weights.shape[3], attn_weights.shape[4])
info['sampled_edges'].append(sampled_edges.detach().to("cpu"))
info['edge_multinomial'].append(edge_multinomial.detach().to("cpu"))
info['attn_weights'].append(attn_weights.detach().to("cpu"))
# xs_tt: # (batch*unit_time, node, d_model)
# * temporal encoding at prediction step tt
loss_mask_rel_tt = loss_mask_rel_full_partial.reshape(-1,1) # (batch*num_peds,1)
xs_tt = xs_tt.reshape(batch_size*num_peds, -1).unsqueeze(0)*loss_mask_rel_tt # (unit_time, batch*num_peds, embedding_size)
_, (htp, ctp) = self.lstm(xs_tt, (ht, ct)) # tp == tplus
ht = htp * loss_mask_rel_tt + ht * (1.-loss_mask_rel_tt) # (num_layers, batch_size*num_peds, hidden_size)
ct = ctp * loss_mask_rel_tt + ct * (1.-loss_mask_rel_tt) # (num_layers, batch_size*num_peds, hidden_size)
# * prediction at prediction step tt
prob_raw = self.hidden2pos(ht.permute(1,0,2).reshape(batch_size*num_peds, -1)).reshape(batch_size, num_peds, -1).unsqueeze(1) # (batch, unit_time, node, output_dim)
gaussian_params = self.raw2gaussian(prob_raw)
mu, sx, sy, corr = gaussian_params
x_sample = self.sample_gaussian(gaussian_params, device=device, detach_sample=self.args.detach_sample, sampling=sampling)
x_sample = x_sample * loss_mask_rel_full_partial.unsqueeze(1).unsqueeze(-1) # mask x_sample here for edge evolution
A_sample = self.edge_evolution(x_sample, A_sample, device=device) # (batch, unit_time, node, node, edge_feat)
# * append to results
prob_raw_pred.append(prob_raw)
x_sample_pred.append(x_sample)
A_sample_pred.append(A_sample)
mu_pred.append(mu)
sx_pred.append(sx)
sy_pred.append(sy)
corr_pred.append(corr)
else:
raise RuntimeError("The spatial component is not found.")
# concatenate predictions together
prob_raw_pred = torch.cat(prob_raw_pred, dim=1) # (batch, pred_seq_len, node, output_dim)
x_sample_pred = torch.cat(x_sample_pred, dim=1) # (batch, pred_seq_len, node, motion_dim)
A_sample_pred = torch.cat(A_sample_pred, dim=1) # (batch, pred_seq_len, node, node, edge_feat)
mu_pred = torch.cat(mu_pred, dim=1) # (batch, pred_seq_len, node, 2)
sx_pred = torch.cat(sx_pred, dim=1) # (batch, pred_seq_len, node, 1)
sy_pred = torch.cat(sy_pred, dim=1) # (batch, pred_seq_len, node, 1)
corr_pred = torch.cat(corr_pred, dim=1) # (batch, pred_seq_len, node, 1)
gaussian_params_pred = (mu_pred, sx_pred, sy_pred, corr_pred)
# gaussian_params_pred = self.raw2gaussian(prob_raw_pred)
info['sampled_edges'] = torch.cat(info['sampled_edges'], dim=1) # (batch_size, seq_len-1, nnode , nhead_edges, neighbor_node)
info['edge_multinomial'] = torch.cat(info['edge_multinomial'], dim=1) # (batch_size, seq_len-1, nnode , nhead_edges, neighbor_node)
info['attn_weights'] = torch.cat(info['attn_weights'], dim=2) # (batch_size, nlayer, seq_len-1, nhead, nnode , neighbor_node)
info['A_sample_pred'] = A_sample_pred # (batch, pred_seq_len, node, node, edge_feat)
info['loss_mask_rel_full_partial'] = loss_mask_rel_full_partial # (batch, num_peds)
info['loss_mask_per_pedestrian'] = loss_mask_per_pedestrian # (batch, num_peds)
# pack results
results = (gaussian_params_pred, x_sample_pred, info)
return results
else:
raise RuntimeError('The temporal component is not lstm nor faster_lstm.')
else:
raise RuntimeError("The decoder style is not recursive.")
================================================
FILE: gst_updated/src/gumbel_social_transformer/temperature_scheduler.py
================================================
class Temp_Scheduler(object):
def __init__(self, total_epochs, curr_temp, base_temp, temp_min=0.33, last_epoch=-1):
self.curr_temp = curr_temp
self.base_temp = base_temp
self.temp_min = temp_min
self.last_epoch = last_epoch
self.total_epochs = total_epochs
self.step(last_epoch + 1)
def step(self, epoch=None):
# forward the epoch argument so explicit step(epoch) calls take effect
return self.decay_whole_process(epoch=epoch)
def decay_whole_process(self, epoch=None):
if epoch is None:
epoch = self.last_epoch + 1
self.last_epoch = epoch
self.curr_temp = (1 - self.last_epoch / self.total_epochs) * (self.base_temp - self.temp_min) + self.temp_min
if self.curr_temp < self.temp_min:
self.curr_temp = self.temp_min
return self.curr_temp
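if __name__ == "__main__":
    # Illustrative schedule: the temperature decays linearly from base_temp
    # to temp_min over total_epochs, then stays clamped at temp_min.
    scheduler = Temp_Scheduler(total_epochs=100, curr_temp=1.0, base_temp=1.0,
                               temp_min=0.33)
    temps = [scheduler.step() for _ in range(100)]
    print(temps[0], temps[49], temps[-1])  # ~0.9933, ~0.665, 0.33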
================================================
FILE: gst_updated/src/gumbel_social_transformer/temporal_convolution_net.py
================================================
import torch.nn as nn
from gst_updated.src.gumbel_social_transformer.utils import _get_clones, _get_activation_fn
class TemporalConvolutionNet(nn.Module):
def __init__(
self,
in_channels,
out_channels,
dim_hidden,
nconv=2,
obs_seq_len=8,
pred_seq_len=12,
kernel_size=3,
stride=1,
dropout=0.1,
activation="relu",
):
super(TemporalConvolutionNet,self).__init__()
assert kernel_size % 2 == 1
assert nconv >= 2
padding = ((kernel_size - 1) // 2, 0)
norm_layer = nn.LayerNorm(in_channels)
timeconv_layer = nn.Conv2d(in_channels, in_channels, (kernel_size, 1), (stride, 1), padding)
self.nconv = nconv
self.norms = _get_clones(norm_layer, nconv)
self.timeconvs = _get_clones(timeconv_layer, nconv)
self.timelinear1 = nn.Linear(obs_seq_len, pred_seq_len)
self.timelinear2 = nn.Linear(pred_seq_len, pred_seq_len)
self.timedropout1 = nn.Dropout(dropout)
self.timedropout2 = nn.Dropout(dropout)
self.linear1 = nn.Linear(in_channels, dim_hidden)
self.linear2 = nn.Linear(dim_hidden, out_channels)
self.dropout = nn.Dropout(dropout)
self.activation = _get_activation_fn(activation)
def forward(self, x):
# x # (batch, obs_seq_len, node, embedding_size) # in_channels = embedding_size
# out # (batch, pred_seq_len, node, output_size) # output_size = 5
for i in range(self.nconv):
x_norm = self.norms[i](x)
x_perm = x_norm.permute(0, 3, 1, 2) # (batch, embedding_size, obs_seq_len, node)
x_perm = self.activation(self.timeconvs[i](x_perm)) # (N, C, H, W) # (N, channels, T_{in}, V)
x_perm = x_perm.permute(0, 2, 3, 1) # (batch, obs_seq_len, node, embedding_size)
x = x + x_perm
x = x.permute(0, 2, 3, 1) # (batch, node, embedding_size, obs_seq_len)
x = self.timedropout1(self.activation(self.timelinear1(x)))
x = self.timedropout2(self.activation(self.timelinear2(x))) # (batch, node, embedding_size, pred_seq_len)
x = x.permute(0, 3, 1, 2) # (batch, pred_seq_len, node, embedding_size)
x = self.dropout(self.activation(self.linear1(x)))
out = self.linear2(x) # (batch, pred_seq_len, node, out_channels)
return out
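if __name__ == "__main__":
    # Illustrative shape check with assumed sizes: maps observed embeddings
    # to raw output parameters over the prediction horizon in one readout pass.
    import torch
    tcn = TemporalConvolutionNet(in_channels=32, out_channels=5, dim_hidden=32)
    x = torch.randn(1, 8, 3, 32)  # (batch, obs_seq_len, node, embedding_size)
    print(tcn(x).shape)           # (1, 12, 3, 5)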
================================================
FILE: gst_updated/src/gumbel_social_transformer/utils.py
================================================
import copy
import torch
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.container import ModuleList
def _get_activation_fn(activation):
if activation == "relu":
return F.relu
elif activation == "gelu":
return F.gelu
raise RuntimeError("activation should be relu/gelu, not {}".format(activation))
def _get_clones(module, N):
return ModuleList([copy.deepcopy(module) for i in range(N)])
def gumbel_softmax(logits, tau=1, hard=False, eps=1e-10):
y_soft = gumbel_softmax_sample(logits, tau=tau, eps=eps)
if hard:
shape = logits.size()
_, k = y_soft.data.max(-1)
y_hard = torch.zeros(*shape)
if y_soft.is_cuda:
y_hard = y_hard.cuda()
y_hard = y_hard.zero_().scatter_(-1, k.view(shape[:-1] + (1,)), 1.0)
y = Variable(y_hard - y_soft.data) + y_soft
else:
y = y_soft
return y
def gumbel_softmax_sample(logits, tau=1, eps=1e-10):
gumbel_noise = sample_gumbel(logits.size(), eps=eps)
if logits.is_cuda:
gumbel_noise = gumbel_noise.cuda()
y = logits + Variable(gumbel_noise)
return F.softmax(y / tau, dim=-1)
def sample_gumbel(shape, eps=1e-10):
uniform = torch.rand(shape).float()
return - torch.log(eps - torch.log(uniform + eps))
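# --- Illustrative sketch (not part of the original file) ---
# With hard=True, gumbel_softmax is a straight-through estimator: the forward
# value is an exact one-hot sample, while gradients flow through the soft
# relaxation y_soft.
if __name__ == "__main__":
    logits = torch.randn(4, 3, requires_grad=True)
    y = gumbel_softmax(logits, tau=0.5, hard=True)
    assert ((y == 0) | (y == 1)).all() and (y.sum(-1) == 1).all()
    y.sum().backward()                  # gradient reaches logits via y_soft
    assert logits.grad is not None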
================================================
FILE: gst_updated/src/mgnn/batch_trajectories.py
================================================
from torch.utils.data import Dataset
class BatchTrajectoriesDataset(Dataset):
def __init__(
self,
):
super(BatchTrajectoriesDataset, self).__init__()
self.num_seq = 0
self.obs_traj = []
self.pred_traj = []
self.obs_traj_rel = []
self.pred_traj_rel = []
self.loss_mask_rel = []
self.loss_mask = []
self.v_obs = []
self.A_obs = []
self.v_pred = []
self.A_pred = []
self.attn_mask_obs = []
self.attn_mask_pred = []
def add_batch(
self,
obs_traj,
pred_traj_gt,
obs_traj_rel,
pred_traj_rel_gt,
loss_mask_rel,
loss_mask,
v_obs,
A_obs,
v_pred,
A_pred,
attn_mask_obs,
attn_mask_pred,
):
self.num_seq += 1
self.obs_traj.append(obs_traj.float())
self.pred_traj.append(pred_traj_gt.float())
self.obs_traj_rel.append(obs_traj_rel.float())
self.pred_traj_rel.append(pred_traj_rel_gt.float())
self.loss_mask_rel.append(loss_mask_rel.float())
self.loss_mask.append(loss_mask.float())
self.v_obs.append(v_obs.float())
self.A_obs.append(A_obs.float())
self.v_pred.append(v_pred.float())
self.A_pred.append(A_pred.float())
self.attn_mask_obs.append(attn_mask_obs.float())
self.attn_mask_pred.append(attn_mask_pred.float())
def __len__(self):
return self.num_seq
def __getitem__(self, index):
out = [
self.obs_traj[index], self.pred_traj[index],
self.obs_traj_rel[index], self.pred_traj_rel[index],
self.loss_mask_rel[index], self.loss_mask[index],
self.v_obs[index], self.A_obs[index],
self.v_pred[index], self.A_pred[index],
self.attn_mask_obs[index], self.attn_mask_pred[index],
]
return out
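# --- Illustrative sketch (not part of the original file; dummy shapes) ---
# BatchTrajectoriesDataset is an accumulator: each add_batch() stores one
# pre-collated scene and __getitem__ returns it unchanged, so a DataLoader
# with batch_size=1 simply replays the stored scenes.
if __name__ == "__main__":
    import torch
    n, t_obs, t_pred = 3, 8, 12          # pedestrians, observed and predicted steps
    dset = BatchTrajectoriesDataset()
    dset.add_batch(
        torch.zeros(n, 2, t_obs), torch.zeros(n, 2, t_pred),
        torch.zeros(n, 2, t_obs), torch.zeros(n, 2, t_pred),
        torch.zeros(n, t_obs + t_pred), torch.zeros(n, t_obs + t_pred),
        torch.zeros(t_obs, n, 2), torch.zeros(t_obs, n, n, 2),
        torch.zeros(t_pred, n, 2), torch.zeros(t_pred, n, n, 2),
        torch.zeros(t_obs, n, n), torch.zeros(t_pred, n, n),
    )
    assert len(dset) == 1 and len(dset[0]) == 12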
================================================
FILE: gst_updated/src/mgnn/trajectories.py
================================================
from torch.utils.data import Dataset
import os
import numpy as np
import math
import torch
from src.mgnn.utils import seq_to_graph
class TrajectoriesDataset(Dataset):
def __init__(
self,
data_dir,
obs_seq_len=8,
pred_seq_len=12,
skip=1,
delim='\t',
invalid_value=-999.,
mode=None,
frame_diff=10.,
):
super(TrajectoriesDataset, self).__init__()
self.data_dir = data_dir
self.obs_seq_len = obs_seq_len
self.pred_seq_len = pred_seq_len
self.skip = skip
self.seq_len = self.obs_seq_len + self.pred_seq_len
self.delim = delim
all_files = os.listdir(self.data_dir)
all_files = [os.path.join(self.data_dir, _path) for _path in all_files]
num_peds_in_seq = []
seq_list = []
seq_list_rel = []
loss_mask_list = []
loss_mask_rel_list = []
frame_id_seq = []
print("Files to be written into the dataset: ")
print(all_files)
for path in all_files:
print(path)
data = read_file(path, delim)
frames = np.unique(data[:, 0]).tolist()
frame_data = []
for frame in frames:
frame_data.append(data[data[:, 0]==frame, :])
num_sequences = math.floor((len(frames)-self.seq_len)/self.skip)+1
if mode is None:
idx_range = range(0, num_sequences * self.skip + 1, skip)
elif mode == 'train':
idx_range = range(0, int((num_sequences * self.skip + 1)*0.8), skip)
elif mode == 'val':
idx_range = range(int((num_sequences * self.skip + 1)*0.8), num_sequences * self.skip + 1, skip)
elif mode == 'test':
idx_range = range(int((num_sequences * self.skip + 1)*0.8), num_sequences * self.skip + 1, skip)
else:
raise RuntimeError("Wrong mode for TrajectoriesDataset.")
# idx_range = range(0, num_sequences * self.skip + 1, skip)
for idx in idx_range:
curr_seq_data = np.concatenate(frame_data[idx:idx + self.seq_len], axis=0)
start_frame_id = curr_seq_data[0, 0]
peds_in_curr_seq = np.unique(curr_seq_data[:, 1])
ped_survive_all_time = False
for _, ped_id in enumerate(peds_in_curr_seq):
curr_ped_seq = curr_seq_data[curr_seq_data[:, 1] == ped_id, :]
frames_curr_ped_seq = np.unique(curr_ped_seq[:,0])
if len(frames_curr_ped_seq) == self.seq_len and np.all(frames_curr_ped_seq[1:]-frames_curr_ped_seq[:-1]==frame_diff):
ped_survive_all_time = True
break
if not ped_survive_all_time:
continue
curr_seq = np.ones((len(peds_in_curr_seq), 2, self.seq_len)) * invalid_value
curr_seq_rel = np.ones((len(peds_in_curr_seq), 2, self.seq_len)) * invalid_value
curr_loss_mask = np.zeros((len(peds_in_curr_seq), self.seq_len))
curr_loss_mask_rel = np.zeros((len(peds_in_curr_seq), self.seq_len))
for ped_id_curr_seq, ped_id in enumerate(peds_in_curr_seq):
for tt in range(self.seq_len):
frame_id = start_frame_id + tt * frame_diff
frame_ped_id = (curr_seq_data[:,0]==frame_id) * (curr_seq_data[:,1]==ped_id)
if len(curr_seq_data[frame_ped_id]) == 0:
curr_loss_mask[ped_id_curr_seq, tt] = 0
curr_loss_mask_rel[ped_id_curr_seq, tt] = 0
elif len(curr_seq_data[frame_ped_id]) == 1:
curr_seq[ped_id_curr_seq,:,tt] = curr_seq_data[frame_ped_id, 2:]
curr_loss_mask[ped_id_curr_seq, tt] = 1
if tt == 0:
curr_seq_rel[ped_id_curr_seq,:,tt] = np.zeros((2,))
curr_loss_mask_rel[ped_id_curr_seq, tt] = 1
else:
if curr_loss_mask[ped_id_curr_seq, tt-1] == 1:
curr_seq_rel[ped_id_curr_seq,:,tt] = curr_seq[ped_id_curr_seq,:,tt] - curr_seq[ped_id_curr_seq,:,tt-1]
curr_loss_mask_rel[ped_id_curr_seq, tt] = 1
else:
curr_loss_mask_rel[ped_id_curr_seq, tt] = 0
else:
raise RuntimeError("The same pedestrian has multiple locations in the same frame.")
num_peds_in_seq.append(len(peds_in_curr_seq))
seq_list.append(curr_seq)
seq_list_rel.append(curr_seq_rel)
loss_mask_list.append(curr_loss_mask)
loss_mask_rel_list.append(curr_loss_mask_rel)
frame_id_seq.append(start_frame_id)
self.num_seq = len(seq_list)
seq_list = np.concatenate(seq_list, axis=0)
seq_list_rel = np.concatenate(seq_list_rel, axis=0)
loss_mask_list = np.concatenate(loss_mask_list, axis=0)
loss_mask_rel_list = np.concatenate(loss_mask_rel_list, axis=0)
self.obs_traj = torch.from_numpy(
seq_list[:, :, :self.obs_seq_len]).type(torch.float)
self.pred_traj = torch.from_numpy(
seq_list[:, :, self.obs_seq_len:]).type(torch.float)
self.obs_traj_rel = torch.from_numpy(
seq_list_rel[:, :, :self.obs_seq_len]).type(torch.float)
self.pred_traj_rel = torch.from_numpy(
seq_list_rel[:, :, self.obs_seq_len:]).type(torch.float)
self.loss_mask = torch.from_numpy(loss_mask_list).type(torch.float)
self.loss_mask_rel = torch.from_numpy(loss_mask_rel_list).type(torch.float)
cum_start_idx = [0] + np.cumsum(num_peds_in_seq).tolist()
self.seq_start_end = [(start, end) for start, end in zip(cum_start_idx[:-1], cum_start_idx[1:])]
self.frame_id_seq = frame_id_seq
self.v_obs = []
self.A_obs = []
self.attn_mask_obs = []
self.v_pred = []
self.A_pred = []
self.attn_mask_pred = []
for ss in range(len(self.seq_start_end)):
start, end = self.seq_start_end[ss]
v_, a_ = seq_to_graph(
self.obs_traj[start:end, :], self.obs_traj_rel[start:end, :], attn_mech='rel_conv')
self.v_obs.append(v_.clone())
self.A_obs.append(a_.clone())
v_, a_ = seq_to_graph(
self.pred_traj[start:end, :], self.pred_traj_rel[start:end, :], attn_mech='rel_conv')
self.v_pred.append(v_.clone())
self.A_pred.append(a_.clone())
attn_mask = []
for tt in range(self.seq_len):
loss_mask_rel_tt = self.loss_mask_rel[start:end, tt]
attn_mask.append(torch.outer(loss_mask_rel_tt, loss_mask_rel_tt).float())
attn_mask = torch.stack(attn_mask, dim=0)
self.attn_mask_obs.append(attn_mask[:self.obs_seq_len])
self.attn_mask_pred.append(attn_mask[self.obs_seq_len:])
def __len__(self):
return self.num_seq
def __getitem__(self, index):
start, end = self.seq_start_end[index]
out = [
self.obs_traj[start:end, :], self.pred_traj[start:end, :],
self.obs_traj_rel[start:end, :], self.pred_traj_rel[start:end, :],
self.loss_mask_rel[start:end, :], self.loss_mask[start:end, :],
self.v_obs[index], self.A_obs[index],
self.v_pred[index], self.A_pred[index],
self.attn_mask_obs[index], self.attn_mask_pred[index],
]
return out
def read_file(_path, delim='\t'):
data = []
if delim == 'tab':
delim = '\t'
elif delim == 'space':
delim = ' '
with open(_path, 'r') as f:
for line in f:
line = line.strip().split(delim)
line = [float(i) for i in line]
data.append(line)
return np.asarray(data)
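# --- Note (not part of the original file) ---
# read_file expects delimiter-separated rows of [frame_id, ped_id, x, y] (the
# ETH/UCY text convention). The loader above then groups rows by frame, keeps
# only windows in which at least one pedestrian is present for all seq_len
# frames spaced exactly frame_diff apart, and masks out missing observations
# via loss_mask / loss_mask_rel.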
================================================
FILE: gst_updated/src/mgnn/trajectories_sdd.py
================================================
from torch.utils.data import Dataset
import os
import numpy as np
import math
import torch
from src.mgnn.utils import seq_to_graph
class TrajectoriesDataset(Dataset):
def __init__(
self,
data_dir,
obs_seq_len=8,
pred_seq_len=12,
skip=1,
delim='\t',
invalid_value=-999.,
mode=None,
frame_diff=10.,
):
super(TrajectoriesDataset, self).__init__()
self.data_dir = data_dir
self.obs_seq_len = obs_seq_len
self.pred_seq_len = pred_seq_len
self.skip = skip
self.seq_len = self.obs_seq_len + self.pred_seq_len
self.delim = delim
self.frame_diff = frame_diff
all_files = os.listdir(self.data_dir)
all_files = [os.path.join(self.data_dir, _path) for _path in all_files]
num_peds_in_seq = []
seq_list = []
seq_list_rel = []
loss_mask_list = []
loss_mask_rel_list = []
frame_id_seq = []
print("Files to be written into the dataset: ")
print(all_files)
for path in all_files:
print(path)
print("current sequence list: ", len(seq_list))
data = read_sdd_file(path)
frames = np.unique(data[:, 0]).tolist()
frame_data = []
for frame in frames:
frame_data.append(data[data[:, 0]==frame, :])
num_sequences = math.floor((len(frames)-self.seq_len)/self.skip)+1
if mode is None:
idx_range = range(0, num_sequences * self.skip + 1, skip)
elif mode == 'train':
idx_range = range(0, int((num_sequences * self.skip + 1)*0.8), skip)
elif mode == 'val':
idx_range = range(int((num_sequences * self.skip + 1)*0.8), int((num_sequences * self.skip + 1)*0.9), skip)
elif mode == 'test':
idx_range = range(int((num_sequences * self.skip + 1)*0.9), num_sequences * self.skip + 1, skip)
else:
raise RuntimeError("Wrong mode for TrajectoriesDataset.")
for idx in idx_range:
curr_seq_data = np.concatenate(frame_data[idx:idx + self.seq_len], axis=0)
start_frame_id = curr_seq_data[0, 0]
peds_in_curr_seq = np.unique(curr_seq_data[:, 1])
ped_survive_all_time = False
for _, ped_id in enumerate(peds_in_curr_seq):
curr_ped_seq = curr_seq_data[curr_seq_data[:, 1] == ped_id, :]
frames_curr_ped_seq = np.unique(curr_ped_seq[:,0])
if len(frames_curr_ped_seq) == self.seq_len and np.all(frames_curr_ped_seq[1:]-frames_curr_ped_seq[:-1]==self.frame_diff):
ped_survive_all_time = True
break
if not ped_survive_all_time:
continue
curr_seq = np.ones((len(peds_in_curr_seq), 2, self.seq_len)) * invalid_value
curr_seq_rel = np.ones((len(peds_in_curr_seq), 2, self.seq_len)) * invalid_value
curr_loss_mask = np.zeros((len(peds_in_curr_seq), self.seq_len))
curr_loss_mask_rel = np.zeros((len(peds_in_curr_seq), self.seq_len))
for ped_id_curr_seq, ped_id in enumerate(peds_in_curr_seq):
for tt in range(self.seq_len):
frame_id = start_frame_id + tt * self.frame_diff
frame_ped_id = (curr_seq_data[:,0]==frame_id) * (curr_seq_data[:,1]==ped_id)
if len(curr_seq_data[frame_ped_id]) == 0:
curr_loss_mask[ped_id_curr_seq, tt] = 0
curr_loss_mask_rel[ped_id_curr_seq, tt] = 0
elif len(curr_seq_data[frame_ped_id]) == 1:
curr_seq[ped_id_curr_seq,:,tt] = curr_seq_data[frame_ped_id, 2:]
curr_loss_mask[ped_id_curr_seq, tt] = 1
if tt == 0:
curr_seq_rel[ped_id_curr_seq,:,tt] = np.zeros((2,))
curr_loss_mask_rel[ped_id_curr_seq, tt] = 1
else:
if curr_loss_mask[ped_id_curr_seq, tt-1] == 1:
curr_seq_rel[ped_id_curr_seq,:,tt] = curr_seq[ped_id_curr_seq,:,tt] - curr_seq[ped_id_curr_seq,:,tt-1]
curr_loss_mask_rel[ped_id_curr_seq, tt] = 1
else:
curr_loss_mask_rel[ped_id_curr_seq, tt] = 0
else:
raise RuntimeError("The same pedestrian has multiple locations in the same frame.")
num_peds_in_seq.append(len(peds_in_curr_seq))
seq_list.append(curr_seq)
seq_list_rel.append(curr_seq_rel)
loss_mask_list.append(curr_loss_mask)
loss_mask_rel_list.append(curr_loss_mask_rel)
frame_id_seq.append(start_frame_id)
self.num_seq = len(seq_list)
seq_list = np.concatenate(seq_list, axis=0)
seq_list_rel = np.concatenate(seq_list_rel, axis=0)
loss_mask_list = np.concatenate(loss_mask_list, axis=0)
loss_mask_rel_list = np.concatenate(loss_mask_rel_list, axis=0)
self.obs_traj = torch.from_numpy(
seq_list[:, :, :self.obs_seq_len]).type(torch.float)
self.pred_traj = torch.from_numpy(
seq_list[:, :, self.obs_seq_len:]).type(torch.float)
self.obs_traj_rel = torch.from_numpy(
seq_list_rel[:, :, :self.obs_seq_len]).type(torch.float)
self.pred_traj_rel = torch.from_numpy(
seq_list_rel[:, :, self.obs_seq_len:]).type(torch.float)
self.loss_mask = torch.from_numpy(loss_mask_list).type(torch.float)
self.loss_mask_rel = torch.from_numpy(loss_mask_rel_list).type(torch.float)
cum_start_idx = [0] + np.cumsum(num_peds_in_seq).tolist()
self.seq_start_end = [(start, end) for start, end in zip(cum_start_idx[:-1], cum_start_idx[1:])]
self.frame_id_seq = frame_id_seq
self.v_obs = []
self.A_obs = []
self.attn_mask_obs = []
self.v_pred = []
self.A_pred = []
self.attn_mask_pred = []
for ss in range(len(self.seq_start_end)):
start, end = self.seq_start_end[ss]
v_, a_ = seq_to_graph(
self.obs_traj[start:end, :], self.obs_traj_rel[start:end, :], attn_mech='rel_conv')
self.v_obs.append(v_.clone())
self.A_obs.append(a_.clone())
v_, a_ = seq_to_graph(
self.pred_traj[start:end, :], self.pred_traj_rel[start:end, :], attn_mech='rel_conv')
self.v_pred.append(v_.clone())
self.A_pred.append(a_.clone())
attn_mask = []
for tt in range(self.seq_len):
loss_mask_rel_tt = self.loss_mask_rel[start:end, tt]
attn_mask.append(torch.outer(loss_mask_rel_tt, loss_mask_rel_tt).float())
attn_mask = torch.stack(attn_mask, dim=0)
self.attn_mask_obs.append(attn_mask[:self.obs_seq_len])
self.attn_mask_pred.append(attn_mask[self.obs_seq_len:])
def __len__(self):
return self.num_seq
def __getitem__(self, index):
start, end = self.seq_start_end[index]
out = [
self.obs_traj[start:end, :], self.pred_traj[start:end, :],
self.obs_traj_rel[start:end, :], self.pred_traj_rel[start:end, :],
self.loss_mask_rel[start:end, :], self.loss_mask[start:end, :],
self.v_obs[index], self.A_obs[index],
self.v_pred[index], self.A_pred[index],
self.attn_mask_obs[index], self.attn_mask_pred[index],
]
return out
def read_file(_path, delim='\t'):
data = []
if delim == 'tab':
delim = '\t'
elif delim == 'space':
delim = ' '
with open(_path, 'r') as f:
for line in f:
line = line.strip().split(delim)
line = [float(i) for i in line]
data.append(line)
return np.asarray(data)
def read_sdd_file(_path):
data = []
with open(_path, 'r') as f:
for line in f:
line = line.rstrip()
line = line.split()
line[-1] = line[-1].strip('"')
if line[-1] == "Car":
continue
line = [float(i) for i in line[:-1]]
agent_id,xmin,ymin,xmax,ymax,frame,lost,occluded,generated = line
if lost == 1 or frame % 10 != 0:
# lost agent, and have to be selected every 10 frames. (30fps)
continue
f = frame
p = agent_id
x = (xmin+xmax)/2.
y = (ymin+ymax)/2.
line = [f,p,x,y]
data.append(line)
return np.asarray(data)
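# --- Illustrative sketch (not part of the original file) ---
# read_sdd_file parses Stanford Drone Dataset annotation rows,
#   track_id xmin ymin xmax ymax frame lost occluded generated "label"
# keeping non-Car, non-lost agents on every 10th frame as
# [frame, agent_id, x_center, y_center]:
if __name__ == "__main__":
    import tempfile
    rows = '3 100 200 120 240 30 0 0 0 "Pedestrian"\n' \
           '4 0 0 10 10 30 0 0 0 "Car"\n'              # Car rows are skipped
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
        f.write(rows)
        path = f.name
    data = read_sdd_file(path)
    os.remove(path)
    assert data.tolist() == [[30.0, 3.0, 110.0, 220.0]]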
================================================
FILE: gst_updated/src/mgnn/trajectories_trajnet.py
================================================
from torch.utils.data import Dataset
import os
import numpy as np
import math
import torch
from src.mgnn.utils import seq_to_graph
import ndjson
class TrajectoriesDataset(Dataset):
def __init__(
self,
data_dir,
obs_seq_len=8,
pred_seq_len=12,
skip=1,
delim='\t',
invalid_value=-999.,
mode=None,
):
super(TrajectoriesDataset, self).__init__()
self.data_dir = data_dir
self.obs_seq_len = obs_seq_len
self.pred_seq_len = pred_seq_len
self.skip = skip
self.seq_len = self.obs_seq_len + self.pred_seq_len
self.delim = delim
all_files = os.listdir(self.data_dir)
all_files = [os.path.join(self.data_dir, _path) for _path in all_files]
num_peds_in_seq = []
seq_list = []
seq_list_rel = []
loss_mask_list = []
loss_mask_rel_list = []
frame_id_seq = []
print("Files to be written into the dataset: ")
print(all_files)
for path in all_files:
print("current sequence length: ", len(seq_list))
print(path)
data, frame_diff, start_frames_considered, filename_str = read_trajnet_file(path)
if len(data) == 0:
continue
# data = read_file(path, delim)
frames_np = np.unique(data[:, 0])
frames = frames_np.tolist()
frame_data = []
for frame in frames:
frame_data.append(data[data[:, 0]==frame, :])
if filename_str[:3]=="cff": # too big
skip = 100
else:
skip = self.skip
print("skip: ", skip)
print("total length: ", len(start_frames_considered[::skip]))
if mode is None:
start_frames_np = start_frames_considered[::skip]
elif mode == 'train':
start_frames_np = start_frames_considered[:int(0.8*len(start_frames_considered)):skip]
# idx_range = range(0, int((num_sequences * self.skip + 1)*0.8), skip)
elif mode == 'val':
start_frames_np = start_frames_considered[int(0.8*len(start_frames_considered)):int(0.9*len(start_frames_considered)):skip]
# idx_range = range(int((num_sequences * self.skip + 1)*0.8), num_sequences * self.skip + 1, skip)
elif mode == 'test':
start_frames_np = start_frames_considered[int(0.9*len(start_frames_considered))::skip]
# idx_range = range(int((num_sequences * self.skip + 1)*0.8), num_sequences * self.skip + 1, skip)
else:
raise RuntimeError("Wrong mode for TrajectoriesDataset.")
# num_sequences = math.floor((len(frames)-self.seq_len)/self.skip)+1
# if mode is None:
# idx_range = range(0, num_sequences * self.skip + 1, skip)
# elif mode == 'train':
# idx_range = range(0, int((num_sequences * self.skip + 1)*0.8), skip)
# elif mode == 'val':
# idx_range = range(int((num_sequences * self.skip + 1)*0.8), num_sequences * self.skip + 1, skip)
# elif mode == 'test':
# idx_range = range(int((num_sequences * self.skip + 1)*0.8), num_sequences * self.skip + 1, skip)
# else:
# raise RuntimeError("Wrong mode for TrajectoriesDataset.")
# idx_range = range(0, num_sequences * self.skip + 1, skip)
# print(len(idx_range))
# for idx in idx_range:
for start_frame in start_frames_np:
idx, = np.where(frames_np==start_frame)
idx = idx[0]
curr_seq_data = np.concatenate(frame_data[idx:idx + self.seq_len], axis=0)
start_frame_id = curr_seq_data[0, 0]
peds_in_curr_seq = np.unique(curr_seq_data[:, 1])
ped_survive_all_time = False
for _, ped_id in enumerate(peds_in_curr_seq):
curr_ped_seq = curr_seq_data[curr_seq_data[:, 1] == ped_id, :]
frames_curr_ped_seq = np.unique(curr_ped_seq[:,0])
if len(frames_curr_ped_seq) == self.seq_len and \
np.all(frames_curr_ped_seq[1:]-frames_curr_ped_seq[:-1]==frame_diff):
ped_survive_all_time = True
break
# import pdb; pdb.set_trace()
if not ped_survive_all_time:
continue
curr_seq = np.ones((len(peds_in_curr_seq), 2, self.seq_len)) * invalid_value
curr_seq_rel = np.ones((len(peds_in_curr_seq), 2, self.seq_len)) * invalid_value
curr_loss_mask = np.zeros((len(peds_in_curr_seq), self.seq_len))
curr_loss_mask_rel = np.zeros((len(peds_in_curr_seq), self.seq_len))
for ped_id_curr_seq, ped_id in enumerate(peds_in_curr_seq):
for tt in range(self.seq_len):
frame_id = start_frame_id + tt * frame_diff
frame_ped_id = (curr_seq_data[:,0]==frame_id) * (curr_seq_data[:,1]==ped_id)
if len(curr_seq_data[frame_ped_id]) == 0:
curr_loss_mask[ped_id_curr_seq, tt] = 0
curr_loss_mask_rel[ped_id_curr_seq, tt] = 0
elif len(curr_seq_data[frame_ped_id]) == 1:
curr_seq[ped_id_curr_seq,:,tt] = curr_seq_data[frame_ped_id, 2:]
curr_loss_mask[ped_id_curr_seq, tt] = 1
if tt == 0:
curr_seq_rel[ped_id_curr_seq,:,tt] = np.zeros((2,))
curr_loss_mask_rel[ped_id_curr_seq, tt] = 1
else:
if curr_loss_mask[ped_id_curr_seq, tt-1] == 1:
curr_seq_rel[ped_id_curr_seq,:,tt] = curr_seq[ped_id_curr_seq,:,tt] - curr_seq[ped_id_curr_seq,:,tt-1]
curr_loss_mask_rel[ped_id_curr_seq, tt] = 1
else:
curr_loss_mask_rel[ped_id_curr_seq, tt] = 0
else:
raise RuntimeError("The same pedestrian has multiple locations in the same frame.")
num_peds_in_seq.append(len(peds_in_curr_seq))
seq_list.append(curr_seq)
seq_list_rel.append(curr_seq_rel)
loss_mask_list.append(curr_loss_mask)
loss_mask_rel_list.append(curr_loss_mask_rel)
frame_id_seq.append(start_frame_id)
self.num_seq = len(seq_list)
print("total sequence length: ", len(seq_list))
seq_list = np.concatenate(seq_list, axis=0)
seq_list_rel = np.concatenate(seq_list_rel, axis=0)
loss_mask_list = np.concatenate(loss_mask_list, axis=0)
loss_mask_rel_list = np.concatenate(loss_mask_rel_list, axis=0)
self.obs_traj = torch.from_numpy(
seq_list[:, :, :self.obs_seq_len]).type(torch.float)
self.pred_traj = torch.from_numpy(
seq_list[:, :, self.obs_seq_len:]).type(torch.float)
self.obs_traj_rel = torch.from_numpy(
seq_list_rel[:, :, :self.obs_seq_len]).type(torch.float)
self.pred_traj_rel = torch.from_numpy(
seq_list_rel[:, :, self.obs_seq_len:]).type(torch.float)
self.loss_mask = torch.from_numpy(loss_mask_list).type(torch.float)
self.loss_mask_rel = torch.from_numpy(loss_mask_rel_list).type(torch.float)
cum_start_idx = [0] + np.cumsum(num_peds_in_seq).tolist()
self.seq_start_end = [(start, end) for start, end in zip(cum_start_idx[:-1], cum_start_idx[1:])]
self.frame_id_seq = frame_id_seq
self.v_obs = []
self.A_obs = []
self.attn_mask_obs = []
self.v_pred = []
self.A_pred = []
self.attn_mask_pred = []
for ss in range(len(self.seq_start_end)):
start, end = self.seq_start_end[ss]
v_, a_ = seq_to_graph(
self.obs_traj[start:end, :], self.obs_traj_rel[start:end, :], attn_mech='rel_conv')
self.v_obs.append(v_.clone())
self.A_obs.append(a_.clone())
v_, a_ = seq_to_graph(
self.pred_traj[start:end, :], self.pred_traj_rel[start:end, :], attn_mech='rel_conv')
self.v_pred.append(v_.clone())
self.A_pred.append(a_.clone())
attn_mask = []
for tt in range(self.seq_len):
loss_mask_rel_tt = self.loss_mask_rel[start:end, tt]
attn_mask.append(torch.outer(loss_mask_rel_tt, loss_mask_rel_tt).float())
attn_mask = torch.stack(attn_mask, dim=0)
self.attn_mask_obs.append(attn_mask[:self.obs_seq_len])
self.attn_mask_pred.append(attn_mask[self.obs_seq_len:])
def __len__(self):
return self.num_seq
def __getitem__(self, index):
start, end = self.seq_start_end[index]
out = [
self.obs_traj[start:end, :], self.pred_traj[start:end, :],
self.obs_traj_rel[start:end, :], self.pred_traj_rel[start:end, :],
self.loss_mask_rel[start:end, :], self.loss_mask[start:end, :],
self.v_obs[index], self.A_obs[index],
self.v_pred[index], self.A_pred[index],
self.attn_mask_obs[index], self.attn_mask_pred[index],
]
return out
def read_file(_path, delim='\t'):
data = []
if delim == 'tab':
delim = '\t'
elif delim == 'space':
delim = ' '
with open(_path, 'r') as f:
for line in f:
line = line.strip().split(delim)
line = [float(i) for i in line]
data.append(line)
return np.asarray(data)
def read_trajnet_file(_path):
_, filename = os.path.split(_path)
print(filename)
filename_str, filename_ext = filename.split('.')
if filename_ext != "ndjson":
print("Not ndjson file for trajnet++.")
data = []
frame_diff = 0.
start_frames_considered = []
return data, frame_diff, start_frames_considered, filename_str
# raise RuntimeError("Not ndjson file for trajnet++.")
lines = []
frame_diff = 0.
start_frames_considered = []
with open(_path) as f:
reader = ndjson.reader(f)
for post in reader:
# print(type(post))
# print(post.keys())
# print(post['scene'])
if "scene" in post.keys():# and len(start_frames_considered) < 20:
start_frame = post["scene"]["s"]
if frame_diff==0.:
frame_diff = (post["scene"]["e"]-post["scene"]["s"])/20
start_frames_considered.append(start_frame)
# np.unique(data[:, 0]).tolist()
# {"scene": {"id": 0, "p": 24, "s": 500, "e": 700, "fps": 2.5, "tag": [3, [2]]}}
# {"scene": {"id": 1, "p": 24, "s": 520, "e": 720, "fps": 2.5, "tag": [4, []]}}
# {"scene": {"id": 2, "p": 24, "s": 540, "e": 740, "fps": 2.5, "tag": [4, []]}}
# {"scene": {"id": 3, "p": 24, "s": 560, "e": 760, "fps": 2.5, "tag": [4, []]}}
if "track" in post.keys():
f = post["track"]["f"]
p = post["track"]["p"]
x = post["track"]["x"]
y = post["track"]["y"]
line = [f,p,x,y] # str(f)+'\t'+str(p)+'\t'+str(x)+'\t'+str(y)+'\n'
lines.append(line)
# if len(lines)==3000:
# break
data = np.asarray(lines)
start_frames_considered = np.unique(start_frames_considered)
return data, frame_diff, start_frames_considered, filename_str
# import pathhack
# import ndjson
# import os
# raw_folder = 'real_data'
# output_folder = raw_folder+'_txt'
# data_dir = os.path.join(pathhack.pkg_path, "datasets/trajnet++/train", raw_folder)
# output_dir = os.path.join(pathhack.pkg_path, "datasets/trajnet++/train", output_folder)
# all_files = os.listdir(data_dir)
# all_files = [os.path.join(data_dir, _path) for _path in all_files]
# for path in all_files:
# print(path)
# _, filename = os.path.split(path)
# filename_str, filename_ext = filename.split('.')
# frame = None
# if filename_ext == "ndjson":
# lines = []
# with open(path) as f:
# reader = ndjson.reader(f)
# for post in reader:
# # print(type(post))
# # print(post.keys())
# # print(post['scene'])
# if "track" in post.keys():
# f = post["track"]["f"]
# if frame is None:
# frame = f
# else:
# print(f-frame)
# break
# p = post["track"]["p"]
# x = post["track"]["x"]
# y = post["track"]["y"]
# line = str(f)+'\t'+str(p)+'\t'+str(x)+'\t'+str(y)+'\n'
# lines.append(line)
================================================
FILE: gst_updated/src/mgnn/trajectories_trajnet_testset.py
================================================
from torch.utils.data import Dataset
import os
import numpy as np
import math
import torch
from src.mgnn.utils import seq_to_graph
import ndjson
class TrajectoriesDataset(Dataset):
def __init__(
self,
data_filepath,
obs_seq_len=8,
pred_seq_len=12,
skip=1,
delim='\t',
invalid_value=-999.,
mode=None,
):
super(TrajectoriesDataset, self).__init__()
self.data_filepath = data_filepath
self.obs_seq_len = obs_seq_len
self.pred_seq_len = pred_seq_len
self.skip = skip
self.seq_len = self.obs_seq_len + self.pred_seq_len
self.delim = delim
self.frame_diff = 0.
# ! all_files = os.listdir(self.data_dir)
# ! all_files = [os.path.join(self.data_dir, _path) for _path in all_files]
# ! Need to make it only one file!!!
all_files = [self.data_filepath]
num_peds_in_seq = []
seq_list = []
seq_list_rel = []
loss_mask_list = []
loss_mask_rel_list = []
frame_id_seq = []
ped_id_list = []
print("Files to be written into the dataset: ")
print(all_files)
for path in all_files:
print("current sequence length: ", len(seq_list))
print(path)
data, frame_diff, start_frames_considered, filename_str = read_trajnet_file(path)
self.frame_diff = frame_diff
if len(data) == 0:
continue
# data = read_file(path, delim)
frames_np = np.unique(data[:, 0])
frames = frames_np.tolist()
frame_data = []
for frame in frames:
frame_data.append(data[data[:, 0]==frame, :])
if filename_str[:3]=="cff": # too big
skip = 100
else:
skip = self.skip
print("skip: ", skip)
if mode == 'test':
start_frames_np = start_frames_considered
else:
raise RuntimeError("Wrong mode for TrajectoriesDataset.")
for start_frame in start_frames_np:
idx, = np.where(frames_np==start_frame)
idx = idx[0]
curr_seq_data = np.concatenate(frame_data[idx:idx + self.obs_seq_len+1], axis=0) # only observation is available in testset
# ! REALLY IMPORTANT: they have 21 frames. 9 obs, 12 pred. i.e. still 8 offset in obs, 12 offset in prediction
start_frame_id = curr_seq_data[0, 0]
peds_in_curr_seq = np.unique(curr_seq_data[:, 1])
ped_survive_all_time = False
for _, ped_id in enumerate(peds_in_curr_seq):
curr_ped_seq = curr_seq_data[curr_seq_data[:, 1] == ped_id, :]
frames_curr_ped_seq = np.unique(curr_ped_seq[:,0])
# if len(frames_curr_ped_seq) == self.seq_len and \
if len(frames_curr_ped_seq) == self.obs_seq_len+1 and \
np.all(frames_curr_ped_seq[1:]-frames_curr_ped_seq[:-1]==frame_diff):
ped_survive_all_time = True
break
if not ped_survive_all_time:
print("Never")
continue
curr_seq = np.ones((len(peds_in_curr_seq), 2, self.obs_seq_len)) * invalid_value
curr_seq_rel = np.ones((len(peds_in_curr_seq), 2, self.obs_seq_len)) * invalid_value
curr_loss_mask = np.zeros((len(peds_in_curr_seq), self.obs_seq_len))
curr_loss_mask_rel = np.zeros((len(peds_in_curr_seq), self.obs_seq_len))
ped_id_dict = {}
for ped_id_curr_seq, ped_id in enumerate(peds_in_curr_seq):
ped_id_dict[ped_id_curr_seq] = ped_id
for tt in range(self.obs_seq_len):
frame_id = start_frame_id + (tt+1) * frame_diff
frame_ped_id = (curr_seq_data[:,0]==frame_id) * (curr_seq_data[:,1]==ped_id)
if len(curr_seq_data[frame_ped_id]) == 0:
curr_loss_mask[ped_id_curr_seq, tt] = 0
curr_loss_mask_rel[ped_id_curr_seq, tt] = 0
elif len(curr_seq_data[frame_ped_id]) == 1:
curr_seq[ped_id_curr_seq,:,tt] = curr_seq_data[frame_ped_id, 2:]
curr_loss_mask[ped_id_curr_seq, tt] = 1
if tt == 0:
start_frame_ped_id = (curr_seq_data[:,0]==start_frame_id) * (curr_seq_data[:,1]==ped_id)
if len(curr_seq_data[start_frame_ped_id]) == 1:
curr_seq_rel[ped_id_curr_seq,:,0] = \
curr_seq_data[frame_ped_id, 2:] - curr_seq_data[start_frame_ped_id, 2:]
curr_loss_mask_rel[ped_id_curr_seq, 0] = 1
else:
curr_loss_mask_rel[ped_id_curr_seq, 0] = 0
else:
if curr_loss_mask[ped_id_curr_seq, tt-1] == 1:
curr_seq_rel[ped_id_curr_seq,:,tt] = curr_seq[ped_id_curr_seq,:,tt] - curr_seq[ped_id_curr_seq,:,tt-1]
curr_loss_mask_rel[ped_id_curr_seq, tt] = 1
else:
curr_loss_mask_rel[ped_id_curr_seq, tt] = 0
else:
raise RuntimeError("The same pedestrian has multiple locations in the same frame.")
num_peds_in_seq.append(len(peds_in_curr_seq))
seq_list.append(curr_seq)
seq_list_rel.append(curr_seq_rel)
loss_mask_list.append(curr_loss_mask)
loss_mask_rel_list.append(curr_loss_mask_rel)
frame_id_seq.append(start_frame_id)
ped_id_list.append(ped_id_dict)
self.num_seq = len(seq_list)
print("total sequence length: ", len(seq_list))
seq_list = np.concatenate(seq_list, axis=0)
seq_list_rel = np.concatenate(seq_list_rel, axis=0)
loss_mask_list = np.concatenate(loss_mask_list, axis=0)
loss_mask_rel_list = np.concatenate(loss_mask_rel_list, axis=0)
self.obs_traj = torch.from_numpy(
seq_list[:, :, :self.obs_seq_len]).type(torch.float)
# self.pred_traj = torch.from_numpy(
# seq_list[:, :, self.obs_seq_len:]).type(torch.float)
self.obs_traj_rel = torch.from_numpy(
seq_list_rel[:, :, :self.obs_seq_len]).type(torch.float)
# self.pred_traj_rel = torch.from_numpy(
# seq_list_rel[:, :, self.obs_seq_len:]).type(torch.float)
self.loss_mask = torch.from_numpy(loss_mask_list).type(torch.float)
self.loss_mask_rel = torch.from_numpy(loss_mask_rel_list).type(torch.float)
cum_start_idx = [0] + np.cumsum(num_peds_in_seq).tolist()
self.seq_start_end = [(start, end) for start, end in zip(cum_start_idx[:-1], cum_start_idx[1:])]
self.frame_id_seq = frame_id_seq
self.ped_id_list = ped_id_list
self.v_obs = []
self.A_obs = []
self.attn_mask_obs = []
# self.v_pred = []
# self.A_pred = []
# self.attn_mask_pred = []
for ss in range(len(self.seq_start_end)):
start, end = self.seq_start_end[ss]
v_, a_ = seq_to_graph(
self.obs_traj[start:end, :], self.obs_traj_rel[start:end, :], attn_mech='rel_conv')
self.v_obs.append(v_.clone())
self.A_obs.append(a_.clone())
# v_, a_ = seq_to_graph(
# self.pred_traj[start:end, :], self.pred_traj_rel[start:end, :], attn_mech='rel_conv')
# self.v_pred.append(v_.clone())
# self.A_pred.append(a_.clone())
attn_mask = []
# for tt in range(self.seq_len):
for tt in range(self.obs_seq_len):
loss_mask_rel_tt = self.loss_mask_rel[start:end, tt]
attn_mask.append(torch.outer(loss_mask_rel_tt, loss_mask_rel_tt).float())
attn_mask = torch.stack(attn_mask, dim=0)
self.attn_mask_obs.append(attn_mask[:self.obs_seq_len])
# self.attn_mask_pred.append(attn_mask[self.obs_seq_len:])
def __len__(self):
return self.num_seq
def __getitem__(self, index):
start, end = self.seq_start_end[index]
# out = [
# self.obs_traj[start:end, :], self.pred_traj[start:end, :],
# self.obs_traj_rel[start:end, :], self.pred_traj_rel[start:end, :],
# self.loss_mask_rel[start:end, :], self.loss_mask[start:end, :],
# self.v_obs[index], self.A_obs[index],
# self.v_pred[index], self.A_pred[index],
# self.attn_mask_obs[index], self.attn_mask_pred[index],
# self.frame_id_seq[index], self.ped_id_list[index]
# ]
out = [
self.obs_traj[start:end, :], [],
self.obs_traj_rel[start:end, :], [],
self.loss_mask_rel[start:end, :], self.loss_mask[start:end, :],
self.v_obs[index], self.A_obs[index],
[], [],
self.attn_mask_obs[index], [],
self.frame_id_seq[index], self.ped_id_list[index]
]
return out
def read_file(_path, delim='\t'):
data = []
if delim == 'tab':
delim = '\t'
elif delim == 'space':
delim = ' '
with open(_path, 'r') as f:
for line in f:
line = line.strip().split(delim)
line = [float(i) for i in line]
data.append(line)
return np.asarray(data)
def read_trajnet_file(_path):
_, filename = os.path.split(_path)
print(filename)
filename_str, filename_ext = filename.split('.')
if filename_ext != "ndjson":
print("Not ndjson file for trajnet++.")
data = []
frame_diff = 0.
start_frames_considered = []
return data, frame_diff, start_frames_considered, filename_str
# raise RuntimeError("Not ndjson file for trajnet++.")
lines = []
frame_diff = 0.
start_frames_considered = []
with open(_path) as f:
reader = ndjson.reader(f)
for post in reader:
# print(type(post))
# print(post.keys())
# print(post['scene'])
if "scene" in post.keys():# and len(start_frames_considered) < 10:
start_frame = post["scene"]["s"]
if frame_diff==0.:
frame_diff = (post["scene"]["e"]-post["scene"]["s"])/20
start_frames_considered.append(start_frame)
# np.unique(data[:, 0]).tolist()
# {"scene": {"id": 0, "p": 24, "s": 500, "e": 700, "fps": 2.5, "tag": [3, [2]]}}
# {"scene": {"id": 1, "p": 24, "s": 520, "e": 720, "fps": 2.5, "tag": [4, []]}}
# {"scene": {"id": 2, "p": 24, "s": 540, "e": 740, "fps": 2.5, "tag": [4, []]}}
# {"scene": {"id": 3, "p": 24, "s": 560, "e": 760, "fps": 2.5, "tag": [4, []]}}
if "track" in post.keys():
f = post["track"]["f"]
p = post["track"]["p"]
x = post["track"]["x"]
y = post["track"]["y"]
line = [f,p,x,y] # str(f)+'\t'+str(p)+'\t'+str(x)+'\t'+str(y)+'\n'
lines.append(line)
# if len(lines)==3000:
# break
data = np.asarray(lines)
start_frames_considered = np.unique(start_frames_considered)
return data, frame_diff, start_frames_considered, filename_str
# import pathhack
# import ndjson
# import os
# raw_folder = 'real_data'
# output_folder = raw_folder+'_txt'
# data_dir = os.path.join(pathhack.pkg_path, "datasets/trajnet++/train", raw_folder)
# output_dir = os.path.join(pathhack.pkg_path, "datasets/trajnet++/train", output_folder)
# all_files = os.listdir(data_dir)
# all_files = [os.path.join(data_dir, _path) for _path in all_files]
# for path in all_files:
# print(path)
# _, filename = os.path.split(path)
# filename_str, filename_ext = filename.split('.')
# frame = None
# if filename_ext == "ndjson":
# lines = []
# with open(path) as f:
# reader = ndjson.reader(f)
# for post in reader:
# # print(type(post))
# # print(post.keys())
# # print(post['scene'])
# if "track" in post.keys():
# f = post["track"]["f"]
# if frame is None:
# frame = f
# else:
# print(f-frame)
# break
# p = post["track"]["p"]
# x = post["track"]["x"]
# y = post["track"]["y"]
# line = str(f)+'\t'+str(p)+'\t'+str(x)+'\t'+str(y)+'\n'
# lines.append(line)
================================================
FILE: gst_updated/src/mgnn/utils.py
================================================
from os.path import join
import argparse
import torch
import numpy as np
from torch.utils.data import DataLoader
def average_offset_error(x_pred, x_target, loss_mask=None):
assert x_pred.shape[0] == 1
pos_pred = torch.cumsum(x_pred, dim=1)
pos_target = torch.cumsum(x_target, dim=1)
offset_error = torch.sqrt(((pos_pred-pos_target)**2.).sum(3))[0]
aoe = offset_error.mean(0)
if loss_mask is not None:
aoe = aoe * loss_mask[0]
return aoe
def final_offset_error(x_pred, x_target, loss_mask=None):
assert x_pred.shape[0] == 1
pos_pred = torch.cumsum(x_pred, dim=1)
pos_target = torch.cumsum(x_target, dim=1)
offset_error = torch.sqrt(((pos_pred-pos_target)**2.).sum(3))[0]
foe = offset_error[-1]
if loss_mask is not None:
foe = foe * loss_mask[0]
return foe
def negative_log_likelihood(gaussian_params, x_target, loss_mask=None):
mu, sx, sy, corr = gaussian_params
assert mu.shape[0] == 1
sigma = torch.cat((sx, sy), dim=3)
x_target_norm = (x_target-mu)/sigma
nx, ny = x_target_norm[:,:,:,0:1], x_target_norm[:,:,:,1:2]
loss_term_1 = torch.log(1.-corr**2.)/2.+torch.log(sx)+torch.log(sy)
loss_term_2 = (nx**2.-2.*corr*nx*ny+ny**2.)/(2.*(1.-corr**2.))
loss = loss_term_1+loss_term_2
loss = loss.squeeze(3).mean(1)
loss = (loss[0] * loss_mask[0]).sum()/loss_mask[0].sum()
return loss
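# --- Note (not part of the original file) ---
# negative_log_likelihood above is the bivariate-Gaussian NLL, dropping the
# constant log(2*pi): with normalized offsets nx = (x - mu_x)/sx and
# ny = (y - mu_y)/sy it computes
#   0.5*log(1 - corr^2) + log(sx) + log(sy)
#   + (nx^2 - 2*corr*nx*ny + ny^2) / (2*(1 - corr^2)),
# then averages over time and takes a loss_mask-weighted mean over pedestrians.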
def seq_to_graph(seq_, seq_rel, attn_mech='glob_kip'):
if len(seq_.shape) == 4:
seq_ = seq_[0]
seq_rel = seq_rel[0]
seq_len = seq_.shape[2]
max_nodes = seq_.shape[0]
V = np.zeros((seq_len, max_nodes, 2))
for s in range(seq_len):
step_rel = seq_rel[:, :, s]
for h in range(len(step_rel)):
V[s, h, :] = step_rel[h]
if attn_mech == 'plain':
A = np.ones((seq_len, max_nodes, max_nodes))*(1./max_nodes)
elif attn_mech == 'rel_conv':
A = []
for tt in range(seq_len):
x = seq_[:,:,tt]
x_row = torch.ones(max_nodes,max_nodes,2)*x.view(max_nodes,1,2)
x_col = torch.ones(max_nodes,max_nodes,2)*x.view(1,max_nodes,2)
edge_attr = x_row-x_col
A.append(edge_attr.numpy())
A = np.stack(A, axis=0)
else:
raise RuntimeError('Wrong attention mechanism.')
return torch.from_numpy(V).type(torch.float),\
torch.from_numpy(A).type(torch.float)
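# --- Illustrative shape check (not part of the original file; arbitrary sizes) ---
def _demo_seq_to_graph():
    seq = torch.randn(3, 2, 8)        # (node, xy, time): 3 pedestrians, 8 steps
    seq_rel = torch.randn(3, 2, 8)    # per-step displacements
    V, A = seq_to_graph(seq, seq_rel, attn_mech='rel_conv')
    assert V.shape == (8, 3, 2)       # node features: displacement per step
    assert A.shape == (8, 3, 3, 2)    # edge features: pairwise position offsets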
def rotate_graph(vtx, adj, theta):
vtx_rotated_x = vtx[:,:,:,0:1]*np.cos(theta)-vtx[:,:,:,1:2]*np.sin(theta)
vtx_rotated_y = vtx[:,:,:,0:1]*np.sin(theta)+vtx[:,:,:,1:2]*np.cos(theta)
vtx_rotated = torch.cat((vtx_rotated_x, vtx_rotated_y), dim=3)
adj_rotated_x = adj[:,:,:,:,0:1]*np.cos(theta)-adj[:,:,:,:,1:2]*np.sin(theta)
adj_rotated_y = adj[:,:,:,:,0:1]*np.sin(theta)+adj[:,:,:,:,1:2]*np.cos(theta)
adj_rotated = torch.cat((adj_rotated_x, adj_rotated_y), dim=4)
return vtx_rotated, adj_rotated
def random_rotate_graph(args, vtx_obs, adj_obs, vtx_pred_gt, adj_pred_gt):
if args.rotation_pattern is None:
raise RuntimeError('random_rotate_seq should not be called when rotation_pattern is None.')
if args.rotation_pattern == 'right_angle':
theta = (torch.randint(0, 4, ()).float()/2.*np.pi).item()
elif args.rotation_pattern == 'random':
theta = (torch.rand(())*2.*np.pi).item()
else:
raise RuntimeError('Rotation pattern is not found in args.')
vtx_obs_rotated, adj_obs_rotated = rotate_graph(vtx_obs, adj_obs, theta)
vtx_pred_gt_rotated, adj_pred_gt_rotated = rotate_graph(vtx_pred_gt, adj_pred_gt, theta)
return (vtx_obs_rotated, adj_obs_rotated, vtx_pred_gt_rotated, adj_pred_gt_rotated), theta
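# --- Note (not part of the original file) ---
# rotate_graph applies a standard counter-clockwise rotation by theta to both
# node and edge features; e.g. theta = pi/2 maps every (x, y) to (-y, x), so
# pairwise distances are preserved and only the global heading changes.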
def load_batch_dataset(args, pkg_path, subfolder='train', num_workers=4, shuffle=None):
result_filename = args.dataset+'_dset_'+subfolder+'_batch_trajectories.pt'
if args.dataset == 'sdd':
dataset_folderpath = join(pkg_path, 'datasets/sdd/social_pool_data')
elif args.dataset == 'real' or args.dataset == 'synth' or args.dataset == 'all':
dataset_folderpath = join(pkg_path, 'datasets/trajnet++/train/')
elif args.dataset == 'deathCircle' or args.dataset == 'hyang':
dataset_folderpath = join(pkg_path, 'datasets/sdd', args.dataset)
elif args.dataset == 'sj':
dataset_folderpath = join(pkg_path, 'datasets/shuijing/orca_20humans_fov')
else:
# dataset_folderpath = join(pkg_path, 'datasets/eth_ucy', args.dataset)
dataset_folderpath = join(pkg_path, 'datasets/self_eth_ucy', args.dataset)
print("self_eth_ucy")
dset = torch.load(join(dataset_folderpath, result_filename))
if shuffle is None:
if subfolder == 'train':
shuffle = True
else:
shuffle = False
dloader = DataLoader(
dset,
batch_size=1,
shuffle=shuffle,
num_workers=num_workers)
return dloader
def args2writername(args):
writername = str(args.temp_epochs)+'-'+str(args.spatial)\
+'-'+str(args.temporal)+'-lr_'+str(args.lr)
if args.deterministic:
writername = writername + '-deterministic'
# if args.init_temp != 1.:
writername = writername + '-init_temp_'+str(args.init_temp)
if args.clip_grad is not None:
writername = writername + '-clip_grad_'+str(args.clip_grad)
# if args.spatial_num_heads_edges != 4:
writername = writername + '-edge_head_'+str(args.spatial_num_heads_edges)
if args.only_observe_full_period:
writername = writername + '-only_full'
if args.detach_sample:
writername = writername + '-detach'
# if args.embedding_size != 64:
writername = writername + '-ebd_'+str(args.embedding_size)
# if args.spatial_num_layers != 3:
writername = writername + '-snl_'+str(args.spatial_num_layers)
# if args.spatial_num_heads != 8:
writername = writername + '-snh_'+str(args.spatial_num_heads)
if args.ghost:
writername = writername + '-ghost'
writername = writername + '-seed_'+str(args.random_seed)
return writername
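# --- Illustrative example (not part of the original file) ---
# With the defaults from arg_parse() below, args2writername returns
#   '200-rel_conv-lstm-lr_0.001-init_temp_1.0-edge_head_4-ebd_64-snl_3-snh_8-seed_1000'
# and optional flags (deterministic, clip_grad, ghost, ...) append further suffixes.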
def arg_parse():
parser = argparse.ArgumentParser()
parser.add_argument('--spatial', default='rel_conv',
help='spatial encoding methods: rel_conv, plain')
parser.add_argument('--temporal', default='lstm',
help='temporal encoding methods: lstm, temporal_convolution_net, faster_lstm.')
parser.add_argument('--output_dim', type=int, default=5,
help='5 for mu_x, mu_y, sigma_x, sigma_y, corr')
parser.add_argument('--embedding_size', type=int, default=64)
parser.add_argument('--spatial_num_heads', type=int, default=8,
help='number of heads for multi-head \
attention mechanism in spatial encoding')
parser.add_argument('--lstm_hidden_size', type=int, default=64)
parser.add_argument('--lstm_num_layers', type=int, default=1)
parser.add_argument('--decode_style', default='recursive',
help='recursive, readout')
parser.add_argument('--detach_sample', action='store_true',
help='Default False means using reparameterization trick. \
True means disable the trick and cut gradient flow between \
gaussian parameters and samples in prediction period.')
parser.add_argument('--motion_dim', type=int, default=2,
help='pedestrian motion is 2D')
parser.add_argument('--dataset', default='eth',
help='eth,hotel,univ,zara1,zara2')
parser.add_argument('--obs_seq_len', type=int, default=8)
parser.add_argument('--pred_seq_len', type=int, default=12)
parser.add_argument('--rotation_pattern', default=None,
help='rotation pattern used for data augmentation in training \
phase: right_angle, or random. None means no rotation.')
parser.add_argument('--batch_size', type=int, default=16,
help='minibatch size')
parser.add_argument('--num_epochs', type=int, default=200,
help='number of epochs')
parser.add_argument('--lr', type=float, default=1e-3,
help='learning rate')
parser.add_argument('--clip_grad', type=float, default=None,
help='gradient clipping')
parser.add_argument('--organize_csv', action='store_true',
help='Organize test_performance.csv towards performance_organized.csv.')
parser.add_argument('--spatial_num_layers', type=int, default=3)
parser.add_argument('--only_observe_full_period', action='store_true',
help='Only observe pedestrians that appear in the full period.')
parser.add_argument('--spatial_num_heads_edges', type=int, default=4)
parser.add_argument('--ghost', action='store_true')
parser.add_argument('--random_seed', type=int, default=1000)
parser.add_argument('--deterministic', action='store_true')
parser.add_argument('--init_temp', type=float, default=1.)
parser.add_argument('--resume_training', action='store_true')
parser.add_argument('--temp_epochs', type=int, default=200)
parser.add_argument('--save_epochs', type=int, default=10)
parser.add_argument('--resume_epoch', type=int, default=None,
help='the index of epoch where we resume training.')
return parser.parse_args()
================================================
FILE: gst_updated/src/pec_net/config/optimal.yaml
================================================
adl_reg: 1
data_scale: 1.86
dataset_type: image
dec_size:
- 1024
- 512
- 1024
dist_thresh: 100
enc_dest_size:
- 8
- 16
enc_latent_size:
- 8
- 50
enc_past_size:
- 512
- 256
non_local_theta_size:
- 256
- 128
- 64
non_local_phi_size:
- 256
- 128
- 64
non_local_g_size:
- 256
- 128
- 64
non_local_dim: 128
fdim: 16
future_length: 12
gpu_index: 0
kld_reg: 1
learning_rate: 0.0003
mu: 0
n_values: 20
nonlocal_pools: 3
normalize_type: shift_origin
num_epochs: 650
num_workers: 0
past_length: 8
predictor_hidden_size:
- 1024
- 512
- 256
sigma: 1.3
test_b_size: 4096
time_thresh: 0
train_b_size: 512
zdim: 16
================================================
FILE: gst_updated/src/pec_net/sdd_trajectories.py
================================================
from torch.utils.data import Dataset
import numpy as np
class SDDTrajectoriesDataset(Dataset):
def __init__(
self,
):
super(SDDTrajectoriesDataset, self).__init__()
self.num_seq = 0
self.obs_traj = []
self.pred_traj = []
self.obs_traj_rel = []
self.pred_traj_rel = []
self.loss_mask_rel = []
self.loss_mask = []
self.v_obs = []
self.A_obs = []
self.v_pred = []
self.A_pred = []
self.attn_mask_obs = []
self.attn_mask_pred = []
def add_batch(
self,
obs_traj,
pred_traj_gt,
obs_traj_rel,
pred_traj_rel_gt,
loss_mask_rel,
loss_mask,
v_obs,
A_obs,
v_pred,
A_pred,
attn_mask_obs,
attn_mask_pred,
):
self.num_seq += 1
self.obs_traj.append(obs_traj[0].float())
self.pred_traj.append(pred_traj_gt[0].float())
self.obs_traj_rel.append(obs_traj_rel[0].float())
self.pred_traj_rel.append(pred_traj_rel_gt[0].float())
self.loss_mask_rel.append(loss_mask_rel[0].float())
self.loss_mask.append(loss_mask[0].float())
self.v_obs.append(v_obs[0].float())
self.A_obs.append(A_obs[0].float())
self.v_pred.append(v_pred[0].float())
self.A_pred.append(A_pred[0].float())
self.attn_mask_obs.append(attn_mask_obs[0].float())
self.attn_mask_pred.append(attn_mask_pred[0].float())
def __len__(self):
return self.num_seq
def __getitem__(self, index):
out = [
self.obs_traj[index], self.pred_traj[index],
self.obs_traj_rel[index], self.pred_traj_rel[index],
self.loss_mask_rel[index], self.loss_mask[index],
self.v_obs[index], self.A_obs[index],
self.v_pred[index], self.A_pred[index],
self.attn_mask_obs[index], self.attn_mask_pred[index],
]
return out
================================================
FILE: gst_updated/src/pec_net/social_utils.py
================================================
from IPython import embed
import glob
import pandas as pd
import pickle
import os
import torch
from torch import nn
from torch.utils import data
import random
import numpy as np
'''for sanity check'''
def naive_social(p1_key, p2_key, all_data_dict):
if abs(p1_key-p2_key)<4:
return True
else:
return False
def find_min_time(t1, t2):
    '''given two time frame arrays, find the min dist (time)'''
min_d = 9e4
t1, t2 = t1[:8], t2[:8]
for t in t2:
        if abs(t1[0]-t) < min_d:
            min_d = abs(t1[0]-t)
    for t in t1:
        if abs(t2[0]-t) < min_d:
            min_d = abs(t2[0]-t)
    return min_d

def find_min_dist(p1x, p1y, p2x, p2y):
    '''given two position arrays, find the min dist (space)'''
    min_d = 9e4
    p1x, p1y = p1x[:8], p1y[:8]
    p2x, p2y = p2x[:8], p2y[:8]
    for i in range(len(p1x)):
        for j in range(len(p1x)):
            d = ((p2x[i]-p1x[j])**2 + (p2y[i]-p1y[j])**2)**0.5
            if d < min_d:
                min_d = d
    return min_d

def social_and_temporal_filter(p1_key, p2_key, all_data_dict, time_thresh=48, dist_tresh=100):
    p1_traj, p2_traj = np.array(all_data_dict[p1_key]), np.array(all_data_dict[p2_key])
    p1_time, p2_time = p1_traj[:, 1], p2_traj[:, 1]
    p1_x, p2_x = p1_traj[:, 2], p2_traj[:, 2]
    p1_y, p2_y = p1_traj[:, 3], p2_traj[:, 3]
    if find_min_time(p1_time, p2_time) > time_thresh:
        return False
    if find_min_dist(p1_x, p1_y, p2_x, p2_y) > dist_tresh:
        return False
    return True
def mark_similar(mask, sim_list):
for i in range(len(sim_list)):
for j in range(len(sim_list)):
mask[sim_list[i]][sim_list[j]] = 1
def collect_data(set_name, dataset_type = 'image', batch_size=512, time_thresh=48, dist_tresh=100, scene=None, verbose=True, root_path="./"):
assert set_name in ['train','val','test']
'''Please specify the parent directory of the dataset. In our case data was stored in:
root_path/trajnet_image/train/scene_name.txt
root_path/trajnet_image/test/scene_name.txt
'''
rel_path = '/trajnet_{0}/{1}/stanford'.format(dataset_type, set_name)
full_dataset = []
full_masks = []
current_batch = []
mask_batch = [[0 for i in range(int(batch_size*1.5))] for j in range(int(batch_size*1.5))]
current_size = 0
social_id = 0
part_file = '/{}.txt'.format('*' if scene == None else scene)
for file in glob.glob(root_path + rel_path + part_file):
scene_name = file[len(root_path+rel_path)+1:-6] + file[-5]
data = np.loadtxt(fname = file, delimiter = ' ')
data_by_id = {}
for frame_id, person_id, x, y in data:
if person_id not in data_by_id.keys():
data_by_id[person_id] = []
data_by_id[person_id].append([person_id, frame_id, x, y])
all_data_dict = data_by_id.copy()
if verbose:
print("Total People: ", len(list(data_by_id.keys())))
while len(list(data_by_id.keys()))>0:
related_list = []
curr_keys = list(data_by_id.keys())
            if current_size < batch_size:
                pass

[... the remainder of collect_data / social_utils.py and several files that followed it were lost in extraction ...]

================================================
FILE: rl/networks/envs.py
================================================
[... the imports and the opening of make_env/_thunk in this file were likewise lost; the surviving body resumes here ...]
        if envNum > 1:
            env.phase = 'train'
else:
env.phase = 'test'
if ax:
env.render_axis = ax
if test_case >= 0:
env.test_case = test_case
env.seed(seed + rank)
if str(env.__class__.__name__).find('TimeLimit') >= 0:
env = TimeLimitMask(env)
# if log_dir is not None:
env = bench.Monitor(
env,
None,
allow_early_resets=allow_early_resets)
print(env)
if isinstance(env.observation_space, Box):
if is_atari:
if len(env.observation_space.shape) == 3:
env = wrap_deepmind(env)
elif len(env.observation_space.shape) == 3:
raise NotImplementedError(
"CNN models work only for atari,\n"
"please use a custom wrapper for a custom pixel input env.\n"
"See wrap_deepmind for an example.")
# If the input has shape (W,H,3), wrap for PyTorch convolutions
obs_shape = env.observation_space.shape
if len(obs_shape) == 3 and obs_shape[2] in [1, 3]:
env = TransposeImage(env, op=[2, 0, 1])
return env
return _thunk
def make_vec_envs(env_name,
seed,
num_processes,
gamma,
log_dir,
device,
allow_early_resets,
num_frame_stack=None,
config=None,
ax=None, test_case=-1, wrap_pytorch=True, pretext_wrapper=False):
envs = [
make_env(env_name, seed, i, log_dir, allow_early_resets, config=config,
envNum=num_processes, ax=ax, test_case=test_case)
for i in range(num_processes)
]
test = False if len(envs) > 1 else True
if len(envs) > 1:
envs = ShmemVecEnv(envs, context='fork')
else:
envs = DummyVecEnv(envs)
    # for collecting data for supervised learning, we don't need the PyTorch wrappers
if wrap_pytorch:
if isinstance(envs.observation_space, Box):
if len(envs.observation_space.shape) == 1:
if gamma is None:
envs = VecNormalize(envs, ret=False, ob=False)
else:
envs = VecNormalize(envs, gamma=gamma, ob=False, ret=False)
envs = VecPyTorch(envs, device)
if pretext_wrapper:
if gamma is None:
envs = VecPretextNormalize(envs, ret=False, ob=False, config=config, test=test)
else:
envs = VecPretextNormalize(envs, gamma=gamma, ob=False, ret=False, config=config, test=test)
if num_frame_stack is not None:
envs = VecPyTorchFrameStack(envs, num_frame_stack, device)
elif isinstance(envs.observation_space, Box):
if len(envs.observation_space.shape) == 3:
envs = VecPyTorchFrameStack(envs, 4, device)
return envs
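# --- Note (not part of the original file) ---
# Wrapper stack assembled above, innermost first: bench.Monitor (per worker) ->
# ShmemVecEnv (multi-process) or DummyVecEnv (single env) -> VecNormalize (only
# for flat Box observations, and with ob=False, ret=False here) -> VecPyTorch
# (numpy <-> torch on `device`) -> optional VecPretextNormalize (trajectory-
# prediction inputs) -> optional VecPyTorchFrameStack.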
# Checks whether done was caused by the time limit or not
class TimeLimitMask(gym.Wrapper):
def step(self, action):
obs, rew, done, info = self.env.step(action)
if done and self.env._max_episode_steps == self.env._elapsed_steps:
info['bad_transition'] = True
return obs, rew, done, info
def reset(self, **kwargs):
return self.env.reset(**kwargs)
# Can be used to test recurrent policies for Reacher-v2
class MaskGoal(gym.ObservationWrapper):
def observation(self, observation):
if self.env._elapsed_steps > 0:
observation[-2:] = 0
return observation
class TransposeObs(gym.ObservationWrapper):
def __init__(self, env=None):
"""
Transpose observation space (base class)
"""
super(TransposeObs, self).__init__(env)
class TransposeImage(TransposeObs):
def __init__(self, env=None, op=[2, 0, 1]):
"""
Transpose observation space for images
"""
super(TransposeImage, self).__init__(env)
assert len(op) == 3, "Error: Operation, " + str(op) + ", must be dim3"
self.op = op
obs_shape = self.observation_space.shape
self.observation_space = Box(
self.observation_space.low[0, 0, 0],
self.observation_space.high[0, 0, 0], [
obs_shape[self.op[0]], obs_shape[self.op[1]],
obs_shape[self.op[2]]
],
dtype=self.observation_space.dtype)
def observation(self, ob):
return ob.transpose(self.op[0], self.op[1], self.op[2])
class VecPyTorch(VecEnvWrapper):
def __init__(self, venv, device):
"""Return only every `skip`-th frame"""
super(VecPyTorch, self).__init__(venv)
self.device = device
# TODO: Fix data types
def reset(self):
obs = self.venv.reset()
if isinstance(obs, dict):
for key in obs:
obs[key]=torch.from_numpy(obs[key]).to(self.device)
else:
obs = torch.from_numpy(obs).float().to(self.device)
return obs
def step_async(self, actions):
if isinstance(actions, torch.LongTensor):
# Squeeze the dimension for discrete actions
actions = actions.squeeze(1)
actions = actions.cpu().numpy()
self.venv.step_async(actions)
def step_wait(self):
obs, reward, done, info = self.venv.step_wait()
if isinstance(obs, dict):
for key in obs:
obs[key] = torch.from_numpy(obs[key]).to(self.device)
else:
obs = torch.from_numpy(obs).float().to(self.device)
reward = torch.from_numpy(reward).unsqueeze(dim=1).float()
return obs, reward, done, info
def render_traj(self, path, episode_num):
if self.venv.num_envs == 1:
return self.venv.envs[0].env.render_traj(path, episode_num)
else:
for i, curr_env in enumerate(self.venv.envs):
curr_env.env.render_traj(path, str(episode_num) + '.' + str(i))
class VecNormalize(VecNormalize_):
def __init__(self, *args, **kwargs):
super(VecNormalize, self).__init__(*args, **kwargs)
self.training = True
def _obfilt(self, obs, update=True):
if self.ob_rms:
if self.training and update:
self.ob_rms.update(obs)
obs = np.clip((obs - self.ob_rms.mean) /
np.sqrt(self.ob_rms.var + self.epsilon),
-self.clipob, self.clipob)
return obs
else:
return obs
def train(self):
self.training = True
def eval(self):
self.training = False
# Derived from
# https://github.com/openai/baselines/blob/master/baselines/common/vec_env/vec_frame_stack.py
class VecPyTorchFrameStack(VecEnvWrapper):
def __init__(self, venv, nstack, device=None):
self.venv = venv
self.nstack = nstack
wos = venv.observation_space # wrapped ob space
self.shape_dim0 = wos.shape[0]
low = np.repeat(wos.low, self.nstack, axis=0)
high = np.repeat(wos.high, self.nstack, axis=0)
if device is None:
device = torch.device('cpu')
self.stacked_obs = torch.zeros((venv.num_envs, ) +
low.shape).to(device)
observation_space = gym.spaces.Box(
low=low, high=high, dtype=venv.observation_space.dtype)
VecEnvWrapper.__init__(self, venv, observation_space=observation_space)
def step_wait(self):
obs, rews, news, infos = self.venv.step_wait()
self.stacked_obs[:, :-self.shape_dim0] = \
self.stacked_obs[:, self.shape_dim0:].clone()
for (i, new) in enumerate(news):
if new:
self.stacked_obs[i] = 0
self.stacked_obs[:, -self.shape_dim0:] = obs
return self.stacked_obs, rews, news, infos
def reset(self):
obs = self.venv.reset()
if torch.backends.cudnn.deterministic:
self.stacked_obs = torch.zeros(self.stacked_obs.shape)
else:
self.stacked_obs.zero_()
self.stacked_obs[:, -self.shape_dim0:] = obs
return self.stacked_obs
def close(self):
self.venv.close()
================================================
FILE: rl/networks/model.py
================================================
import torch
import torch.nn as nn
from rl.networks.distributions import Bernoulli, Categorical, DiagGaussian
from .srnn_model import SRNN
from .selfAttn_srnn_temp_node import selfAttn_merge_SRNN
class Flatten(nn.Module):
def forward(self, x):
return x.view(x.size(0), -1)
class Policy(nn.Module):
""" Class for a robot policy network """
def __init__(self, obs_shape, action_space, base=None, base_kwargs=None):
super(Policy, self).__init__()
if base_kwargs is None:
base_kwargs = {}
if base == 'srnn':
base=SRNN
elif base == 'selfAttn_merge_srnn':
base = selfAttn_merge_SRNN
else:
raise NotImplementedError
self.base = base(obs_shape, base_kwargs)
self.srnn = True
if action_space.__class__.__name__ == "Discrete":
num_outputs = action_space.n
self.dist = Categorical(self.base.output_size, num_outputs)
elif action_space.__class__.__name__ == "Box":
num_outputs = action_space.shape[0]
self.dist = DiagGaussian(self.base.output_size, num_outputs)
elif action_space.__class__.__name__ == "MultiBinary":
num_outputs = action_space.shape[0]
self.dist = Bernoulli(self.base.output_size, num_outputs)
else:
raise NotImplementedError
@property
def is_recurrent(self):
return self.base.is_recurrent
@property
def recurrent_hidden_state_size(self):
"""Size of rnn_hx."""
return self.base.recurrent_hidden_state_size
def forward(self, inputs, rnn_hxs, masks):
raise NotImplementedError
def act(self, inputs, rnn_hxs, masks, deterministic=False):
if not hasattr(self, 'srnn'):
self.srnn = False
if self.srnn:
value, actor_features, rnn_hxs = self.base(inputs, rnn_hxs, masks, infer=True)
else:
value, actor_features, rnn_hxs = self.base(inputs, rnn_hxs, masks)
dist = self.dist(actor_features)
if deterministic:
action = dist.mode()
else:
action = dist.sample()
action_log_probs = dist.log_probs(action)
dist_entropy = dist.entropy().mean()
return value, action, action_log_probs, rnn_hxs
def get_value(self, inputs, rnn_hxs, masks):
value, _, _ = self.base(inputs, rnn_hxs, masks, infer=True)
return value
def evaluate_actions(self, inputs, rnn_hxs, masks, action):
value, actor_features, rnn_hxs = self.base(inputs, rnn_hxs, masks)
dist = self.dist(actor_features)
action_log_probs = dist.log_probs(action)
dist_entropy = dist.entropy().mean()
return value, action_log_probs, dist_entropy, rnn_hxs
================================================
FILE: rl/networks/network_utils.py
================================================
import glob
import os
import torch.nn as nn
from rl.networks.envs import VecNormalize
# Get a render function
def get_render_func(venv):
if hasattr(venv, 'envs'):
return venv.envs[0].render
elif hasattr(venv, 'venv'):
return get_render_func(venv.venv)
elif hasattr(venv, 'env'):
return get_render_func(venv.env)
return None
def get_vec_normalize(venv):
if isinstance(venv, VecNormalize):
return venv
elif hasattr(venv, 'venv'):
return get_vec_normalize(venv.venv)
return None
# Necessary for my KFAC implementation.
class AddBias(nn.Module):
def __init__(self, bias):
super(AddBias, self).__init__()
self._bias = nn.Parameter(bias.unsqueeze(1))
def forward(self, x):
if x.dim() == 2:
bias = self._bias.t().view(1, -1)
else:
bias = self._bias.t().view(1, -1, 1, 1)
return x + bias
def update_linear_schedule(optimizer, epoch, total_num_epochs, initial_lr):
"""Decreases the learning rate linearly"""
lr = initial_lr - (initial_lr * (epoch / float(total_num_epochs)))
for param_group in optimizer.param_groups:
param_group['lr'] = lr
def init(module, weight_init, bias_init, gain=1):
weight_init(module.weight.data, gain=gain)
bias_init(module.bias.data)
return module
def cleanup_log_dir(log_dir):
try:
os.makedirs(log_dir)
except OSError:
files = glob.glob(os.path.join(log_dir, '*.monitor.csv'))
for f in files:
os.remove(f)
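# Minimal sketch (not part of the original file) of the linear decay performed
# by update_linear_schedule() above, on a throwaway optimizer.
def _lr_schedule_demo():
    import torch
    param = torch.nn.Parameter(torch.zeros(1))
    optimizer = torch.optim.Adam([param], lr=1e-3)
    total_num_epochs = 4
    for epoch in range(total_num_epochs):
        update_linear_schedule(optimizer, epoch, total_num_epochs, initial_lr=1e-3)
        print(optimizer.param_groups[0]['lr'])  # 1e-3, 7.5e-4, 5e-4, 2.5e-4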
================================================
FILE: rl/networks/selfAttn_srnn_temp_node.py
================================================
import torch.nn.functional as F
from .srnn_model import *
class SpatialEdgeSelfAttn(nn.Module):
"""
Class for the human-human attention,
uses a multi-head self attention proposed by https://arxiv.org/abs/1706.03762
"""
def __init__(self, args):
super(SpatialEdgeSelfAttn, self).__init__()
self.args = args
# Store required sizes
# todo: hard-coded for now
# with human displacement: + 2
# pred 4 steps + disp: 12
# pred 4 steps + no disp: 10
# pred 5 steps + no disp: 12
# pred 5 steps + no disp + probR: 17
# Gaussian pred 5 steps + no disp: 27
# pred 8 steps + no disp: 18
if args.env_name in ['CrowdSimPred-v0', 'CrowdSimPredRealGST-v0']:
self.input_size = 12
elif args.env_name == 'CrowdSimVarNum-v0':
self.input_size = 2 # 4
else:
raise NotImplementedError
self.num_attn_heads=8
self.attn_size=512
# Linear layer to embed input
self.embedding_layer = nn.Sequential(nn.Linear(self.input_size, 128), nn.ReLU(),
nn.Linear(128, self.attn_size), nn.ReLU()
)
self.q_linear = nn.Linear(self.attn_size, self.attn_size)
self.v_linear = nn.Linear(self.attn_size, self.attn_size)
self.k_linear = nn.Linear(self.attn_size, self.attn_size)
# multi-head self attention
self.multihead_attn=torch.nn.MultiheadAttention(self.attn_size, self.num_attn_heads)
# Given a list of sequence lengths, create a mask to indicate which indices are padded
    # e.g. Input: [3, 1, 4], max_human_num = 5
# Output: [[1, 1, 1, 0, 0], [1, 0, 0, 0, 0], [1, 1, 1, 1, 0]]
def create_attn_mask(self, each_seq_len, seq_len, nenv, max_human_num):
# mask with value of False means padding and should be ignored by attention
        # why +1: append a sentinel column at the end to handle the case when each_seq_len == max_human_num (no padding)
if self.args.no_cuda:
mask = torch.zeros(seq_len * nenv, max_human_num + 1).cpu()
else:
mask = torch.zeros(seq_len*nenv, max_human_num+1).cuda()
mask[torch.arange(seq_len*nenv), each_seq_len.long()] = 1.
mask = torch.logical_not(mask.cumsum(dim=1))
# remove the sentinel
mask = mask[:, :-1].unsqueeze(-2) # seq_len*nenv, 1, max_human_num
return mask
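    # Worked example of the sentinel trick above (comments only):
    #   each_seq_len = [3], max_human_num = 5
    #   after the scatter: [0, 0, 0, 1, 0, 0]   (index 5 is the sentinel column)
    #   cumsum:            [0, 0, 0, 1, 1, 1]
    #   logical_not:       [1, 1, 1, 0, 0, 0]
    #   drop sentinel:     [1, 1, 1, 0, 0] -> the first 3 humans are attended, the rest are padding
    # With each_seq_len = 5 (all humans valid), the 1 lands on the sentinel itself,
    # so no real column is masked out.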
def forward(self, inp, each_seq_len):
'''
Forward pass for the model
params:
inp : input edge features
each_seq_len:
if self.args.sort_humans is True, the true length of the sequence. Should be the number of detected humans
else, it is the mask itself
'''
# inp is padded sequence [seq_len, nenv, max_human_num, 2]
seq_len, nenv, max_human_num, _ = inp.size()
if self.args.sort_humans:
attn_mask = self.create_attn_mask(each_seq_len, seq_len, nenv, max_human_num) # [seq_len*nenv, 1, max_human_num]
attn_mask = attn_mask.squeeze(1) # if we use pytorch builtin function
else:
# combine the first two dimensions
attn_mask = each_seq_len.reshape(seq_len*nenv, max_human_num)
input_emb=self.embedding_layer(inp).view(seq_len*nenv, max_human_num, -1)
input_emb=torch.transpose(input_emb, dim0=0, dim1=1) # if we use pytorch builtin function, v1.7.0 has no batch first option
q=self.q_linear(input_emb)
k=self.k_linear(input_emb)
v=self.v_linear(input_emb)
#z=self.multihead_attn(q, k, v, mask=attn_mask)
z,_=self.multihead_attn(q, k, v, key_padding_mask=torch.logical_not(attn_mask)) # if we use pytorch builtin function
z=torch.transpose(z, dim0=0, dim1=1) # if we use pytorch builtin function
return z
class EdgeAttention_M(nn.Module):
'''
Class for the robot-human attention module
'''
def __init__(self, args):
'''
Initializer function
params:
args : Training arguments
'''
super(EdgeAttention_M, self).__init__()
self.args = args
# Store required sizes
self.human_human_edge_rnn_size = args.human_human_edge_rnn_size
self.human_node_rnn_size = args.human_node_rnn_size
self.attention_size = args.attention_size
# Linear layer to embed temporal edgeRNN hidden state
self.temporal_edge_layer=nn.ModuleList()
self.spatial_edge_layer=nn.ModuleList()
self.temporal_edge_layer.append(nn.Linear(self.human_human_edge_rnn_size, self.attention_size))
# Linear layer to embed spatial edgeRNN hidden states
self.spatial_edge_layer.append(nn.Linear(self.human_human_edge_rnn_size, self.attention_size))
# number of agents who have spatial edges (complete graph: all 6 agents; incomplete graph: only the robot)
self.agent_num = 1
self.num_attention_head = 1
def create_attn_mask(self, each_seq_len, seq_len, nenv, max_human_num):
# mask with value of False means padding and should be ignored by attention
        # why +1: append a sentinel column at the end to handle the case when each_seq_len == max_human_num (no padding)
if self.args.no_cuda:
mask = torch.zeros(seq_len * nenv, max_human_num + 1).cpu()
else:
mask = torch.zeros(seq_len * nenv, max_human_num + 1).cuda()
mask[torch.arange(seq_len * nenv), each_seq_len.long()] = 1.
mask = torch.logical_not(mask.cumsum(dim=1))
# remove the sentinel
mask = mask[:, :-1].unsqueeze(-2) # seq_len*nenv, 1, max_human_num
return mask
def att_func(self, temporal_embed, spatial_embed, h_spatials, attn_mask=None):
        seq_len, nenv, num_edges, h_size = h_spatials.size()  # [seq_len, nenv, num_edges, h_size], e.g. [1, 12, 30, 256] at test time
attn = temporal_embed * spatial_embed
attn = torch.sum(attn, dim=3)
        # variable-length scaling: scale the raw scores by num_edges / sqrt(attention_size)
temperature = num_edges / np.sqrt(self.attention_size)
attn = torch.mul(attn, temperature)
# if we don't want to mask invalid humans, attn_mask is None and no mask will be applied
# else apply attn masks
if attn_mask is not None:
attn = attn.masked_fill(attn_mask == 0, -1e9)
# Softmax
attn = attn.view(seq_len, nenv, self.agent_num, self.human_num)
attn = torch.nn.functional.softmax(attn, dim=-1)
# print(attn[0, 0, 0].cpu().numpy())
# Compute weighted value
# weighted_value = torch.mv(torch.t(h_spatials), attn)
# reshape h_spatials and attn
# shape[0] = seq_len, shape[1] = num of spatial edges (6*5 = 30), shape[2] = 256
h_spatials = h_spatials.view(seq_len, nenv, self.agent_num, self.human_num, h_size)
h_spatials = h_spatials.view(seq_len * nenv * self.agent_num, self.human_num, h_size).permute(0, 2,
1) # [seq_len*nenv*6, 5, 256] -> [seq_len*nenv*6, 256, 5]
attn = attn.view(seq_len * nenv * self.agent_num, self.human_num).unsqueeze(-1) # [seq_len*nenv*6, 5, 1]
weighted_value = torch.bmm(h_spatials, attn) # [seq_len*nenv*6, 256, 1]
# reshape back
weighted_value = weighted_value.squeeze(-1).view(seq_len, nenv, self.agent_num, h_size) # [seq_len, 12, 6 or 1, 256]
return weighted_value, attn
# h_temporal: [seq_len, nenv, 1, 256]
# h_spatials: [seq_len, nenv, 5, 256]
def forward(self, h_temporal, h_spatials, each_seq_len):
'''
Forward pass for the model
params:
h_temporal : Hidden state of the temporal edgeRNN
h_spatials : Hidden states of all spatial edgeRNNs connected to the node.
each_seq_len:
if self.args.sort_humans is True, the true length of the sequence. Should be the number of detected humans
else, it is the mask itself
'''
seq_len, nenv, max_human_num, _ = h_spatials.size()
# find the number of humans by the size of spatial edgeRNN hidden state
self.human_num = max_human_num // self.agent_num
weighted_value_list, attn_list=[],[]
for i in range(self.num_attention_head):
# Embed the temporal edgeRNN hidden state
temporal_embed = self.temporal_edge_layer[i](h_temporal)
# temporal_embed = temporal_embed.squeeze(0)
# Embed the spatial edgeRNN hidden states
spatial_embed = self.spatial_edge_layer[i](h_spatials)
# Dot based attention
temporal_embed = temporal_embed.repeat_interleave(self.human_num, dim=2)
if self.args.sort_humans:
attn_mask = self.create_attn_mask(each_seq_len, seq_len, nenv, max_human_num) # [seq_len*nenv, 1, max_human_num]
attn_mask = attn_mask.squeeze(-2).view(seq_len, nenv, max_human_num)
else:
attn_mask = each_seq_len
weighted_value,attn=self.att_func(temporal_embed, spatial_embed, h_spatials, attn_mask=attn_mask)
weighted_value_list.append(weighted_value)
attn_list.append(attn)
if self.num_attention_head > 1:
return self.final_attn_linear(torch.cat(weighted_value_list, dim=-1)), attn_list
else:
return weighted_value_list[0], attn_list[0]
class EndRNN(RNNBase):
'''
Class for the GRU
'''
def __init__(self, args):
'''
Initializer function
params:
args : Training arguments
'''
super(EndRNN, self).__init__(args, edge=False)
self.args = args
# Store required sizes
self.rnn_size = args.human_node_rnn_size
self.output_size = args.human_node_output_size
self.embedding_size = args.human_node_embedding_size
self.input_size = args.human_node_input_size
self.edge_rnn_size = args.human_human_edge_rnn_size
# Linear layer to embed input
self.encoder_linear = nn.Linear(256, self.embedding_size)
# ReLU and Dropout layers
self.relu = nn.ReLU()
# Linear layer to embed attention module output
self.edge_attention_embed = nn.Linear(self.edge_rnn_size, self.embedding_size)
# Output linear layer
self.output_linear = nn.Linear(self.rnn_size, self.output_size)
def forward(self, robot_s, h_spatial_other, h, masks):
'''
Forward pass for the model
params:
        robot_s : robot state features
        h_spatial_other : output of the attention module
        h : hidden state of the current nodeRNN
        masks : done masks used to reset the hidden state
'''
# Encode the input position
encoded_input = self.encoder_linear(robot_s)
encoded_input = self.relu(encoded_input)
h_edges_embedded = self.relu(self.edge_attention_embed(h_spatial_other))
concat_encoded = torch.cat((encoded_input, h_edges_embedded), -1)
x, h_new = self._forward_gru(concat_encoded, h, masks)
outputs = self.output_linear(x)
return outputs, h_new
class selfAttn_merge_SRNN(nn.Module):
"""
Class for the proposed network
"""
def __init__(self, obs_space_dict, args, infer=False):
"""
Initializer function
params:
args : Training arguments
infer : Training or test time (True at test time)
"""
super(selfAttn_merge_SRNN, self).__init__()
self.infer = infer
self.is_recurrent = True
self.args=args
self.human_num = obs_space_dict['spatial_edges'].shape[0]
self.seq_length = args.seq_length
self.nenv = args.num_processes
self.nminibatch = args.num_mini_batch
# Store required sizes
self.human_node_rnn_size = args.human_node_rnn_size
self.human_human_edge_rnn_size = args.human_human_edge_rnn_size
self.output_size = args.human_node_output_size
# Initialize the Node and Edge RNNs
self.humanNodeRNN = EndRNN(args)
# Initialize attention module
self.attn = EdgeAttention_M(args)
init_ = lambda m: init(m, nn.init.orthogonal_, lambda x: nn.init.
constant_(x, 0), np.sqrt(2))
num_inputs = hidden_size = self.output_size
self.actor = nn.Sequential(
init_(nn.Linear(num_inputs, hidden_size)), nn.Tanh(),
init_(nn.Linear(hidden_size, hidden_size)), nn.Tanh())
self.critic = nn.Sequential(
init_(nn.Linear(num_inputs, hidden_size)), nn.Tanh(),
init_(nn.Linear(hidden_size, hidden_size)), nn.Tanh())
self.critic_linear = init_(nn.Linear(hidden_size, 1))
robot_size = 9
self.robot_linear = nn.Sequential(init_(nn.Linear(robot_size, 256)), nn.ReLU()) # todo: check dim
self.human_node_final_linear=init_(nn.Linear(self.output_size,2))
if self.args.use_self_attn:
self.spatial_attn = SpatialEdgeSelfAttn(args)
self.spatial_linear = nn.Sequential(init_(nn.Linear(512, 256)), nn.ReLU())
else:
self.spatial_linear = nn.Sequential(init_(nn.Linear(obs_space_dict['spatial_edges'].shape[1], 128)), nn.ReLU(),
init_(nn.Linear(128, 256)), nn.ReLU())
self.temporal_edges = [0]
self.spatial_edges = np.arange(1, self.human_num+1)
dummy_human_mask = [0] * self.human_num
dummy_human_mask[0] = 1
if self.args.no_cuda:
self.dummy_human_mask = Variable(torch.Tensor([dummy_human_mask]).cpu())
else:
self.dummy_human_mask = Variable(torch.Tensor([dummy_human_mask]).cuda())
def forward(self, inputs, rnn_hxs, masks, infer=False):
if infer:
# Test/rollout time
seq_length = 1
nenv = self.nenv
else:
# Training time
seq_length = self.seq_length
nenv = self.nenv // self.nminibatch
robot_node = reshapeT(inputs['robot_node'], seq_length, nenv)
temporal_edges = reshapeT(inputs['temporal_edges'], seq_length, nenv)
spatial_edges = reshapeT(inputs['spatial_edges'], seq_length, nenv)
        # to prevent errors when loading old models that do not have the sort_humans argument
if not hasattr(self.args, 'sort_humans'):
self.args.sort_humans = True
if self.args.sort_humans:
detected_human_num = inputs['detected_human_num'].squeeze(-1).cpu().int()
else:
human_masks = reshapeT(inputs['visible_masks'], seq_length, nenv).float() # [seq_len, nenv, max_human_num]
            # if no human is detected (human_masks are all False), set the first human to True
human_masks[human_masks.sum(dim=-1)==0] = self.dummy_human_mask
hidden_states_node_RNNs = reshapeT(rnn_hxs['human_node_rnn'], 1, nenv)
masks = reshapeT(masks, seq_length, nenv)
if self.args.no_cuda:
all_hidden_states_edge_RNNs = Variable(
torch.zeros(1, nenv, 1+self.human_num, rnn_hxs['human_human_edge_rnn'].size()[-1]).cpu())
else:
all_hidden_states_edge_RNNs = Variable(
torch.zeros(1, nenv, 1+self.human_num, rnn_hxs['human_human_edge_rnn'].size()[-1]).cuda())
robot_states = torch.cat((temporal_edges, robot_node), dim=-1)
robot_states = self.robot_linear(robot_states)
# attention modules
if self.args.sort_humans:
# human-human attention
if self.args.use_self_attn:
spatial_attn_out=self.spatial_attn(spatial_edges, detected_human_num).view(seq_length, nenv, self.human_num, -1)
else:
spatial_attn_out = spatial_edges
output_spatial = self.spatial_linear(spatial_attn_out)
# robot-human attention
hidden_attn_weighted, _ = self.attn(robot_states, output_spatial, detected_human_num)
else:
# human-human attention
if self.args.use_self_attn:
spatial_attn_out = self.spatial_attn(spatial_edges, human_masks).view(seq_length, nenv, self.human_num, -1)
else:
spatial_attn_out = spatial_edges
output_spatial = self.spatial_linear(spatial_attn_out)
# robot-human attention
hidden_attn_weighted, _ = self.attn(robot_states, output_spatial, human_masks)
# Do a forward pass through GRU
outputs, h_nodes \
= self.humanNodeRNN(robot_states, hidden_attn_weighted, hidden_states_node_RNNs, masks)
# Update the hidden and cell states
all_hidden_states_node_RNNs = h_nodes
outputs_return = outputs
rnn_hxs['human_node_rnn'] = all_hidden_states_node_RNNs
rnn_hxs['human_human_edge_rnn'] = all_hidden_states_edge_RNNs
# x is the output and will be sent to actor and critic
x = outputs_return[:, :, 0, :]
hidden_critic = self.critic(x)
hidden_actor = self.actor(x)
for key in rnn_hxs:
rnn_hxs[key] = rnn_hxs[key].squeeze(0)
if infer:
return self.critic_linear(hidden_critic).squeeze(0), hidden_actor.squeeze(0), rnn_hxs
else:
return self.critic_linear(hidden_critic).view(-1, 1), hidden_actor.view(-1, self.output_size), rnn_hxs
def reshapeT(T, seq_length, nenv):
shape = T.size()[1:]
return T.unsqueeze(0).reshape((seq_length, nenv, *shape))
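# Minimal sketch (not part of the original file): reshapeT() regroups rollout
# tensors stacked as [seq_length * nenv, ...] into [seq_length, nenv, ...].
def _reshapeT_demo():
    T = torch.arange(12).view(6, 2)      # 6 rows = seq_length * nenv
    out = reshapeT(T, seq_length=3, nenv=2)
    print(out.shape)                     # torch.Size([3, 2, 2])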
================================================
FILE: rl/networks/shmem_vec_env.py
================================================
"""
An interface for asynchronous vectorized environments.
"""
import multiprocessing as mp
import numpy as np
from baselines.common.vec_env.vec_env import VecEnv, CloudpickleWrapper, clear_mpi_env_vars
import ctypes
from baselines import logger
from baselines.common.vec_env.util import dict_to_obs, obs_space_info, obs_to_dict
# note: boolean arrays always report dtype np.bool_, and the old np.bool alias
# was removed in NumPy 1.24, so only np.bool_ is mapped here
_NP_TO_CT = {np.float32: ctypes.c_float,
             np.int32: ctypes.c_int32,
             np.int8: ctypes.c_int8,
             np.uint8: ctypes.c_char,
             np.bool_: ctypes.c_bool}
class ShmemVecEnv(VecEnv):
"""
Optimized version of SubprocVecEnv that uses shared variables to communicate observations.
"""
def __init__(self, env_fns, spaces=None, context='spawn'):
"""
If you don't specify observation_space, we'll have to create a dummy
environment to get it.
"""
ctx = mp.get_context(context)
if spaces:
observation_space, action_space = spaces
else:
logger.log('Creating dummy env object to get spaces')
with logger.scoped_configure(format_strs=[]):
dummy = env_fns[0]()
observation_space, action_space = dummy.observation_space, dummy.action_space
dummy.close()
del dummy
VecEnv.__init__(self, len(env_fns), observation_space, action_space)
self.obs_keys, self.obs_shapes, self.obs_dtypes = obs_space_info(observation_space)
self.obs_bufs = [
{k: ctx.Array(_NP_TO_CT[self.obs_dtypes[k].type], int(np.prod(self.obs_shapes[k]))) for k in self.obs_keys}
for _ in env_fns]
self.parent_pipes = []
self.procs = []
with clear_mpi_env_vars():
for env_fn, obs_buf in zip(env_fns, self.obs_bufs):
wrapped_fn = CloudpickleWrapper(env_fn)
parent_pipe, child_pipe = ctx.Pipe()
proc = ctx.Process(target=_subproc_worker,
args=(child_pipe, parent_pipe, wrapped_fn, obs_buf, self.obs_shapes, self.obs_dtypes, self.obs_keys))
proc.daemon = True
self.procs.append(proc)
self.parent_pipes.append(parent_pipe)
proc.start()
child_pipe.close()
self.waiting_step = False
self.viewer = None
def reset(self):
if self.waiting_step:
logger.warn('Called reset() while waiting for the step to complete')
self.step_wait()
for pipe in self.parent_pipes:
pipe.send(('reset', None))
return self._decode_obses([pipe.recv() for pipe in self.parent_pipes])
def step_async(self, actions):
assert len(actions) == len(self.parent_pipes)
for pipe, act in zip(self.parent_pipes, actions):
pipe.send(('step', act))
self.waiting_step = True
def step_wait(self):
outs = [pipe.recv() for pipe in self.parent_pipes]
self.waiting_step = False
obs, rews, dones, infos = zip(*outs)
return self._decode_obses(obs), np.array(rews), np.array(dones), infos
def talk2Env_async(self, data):
assert len(data) == len(self.parent_pipes)
for pipe, d in zip(self.parent_pipes, data):
pipe.send(('talk2Env', d))
self.waiting_step = True
def talk2Env_wait(self):
outs = [pipe.recv() for pipe in self.parent_pipes] # pipe.recv() is a blocking call
self.waiting_step = False
return np.array(outs)
def close_extras(self):
if self.waiting_step:
self.step_wait()
for pipe in self.parent_pipes:
pipe.send(('close', None))
for pipe in self.parent_pipes:
pipe.recv()
pipe.close()
for proc in self.procs:
proc.join()
def get_images(self, mode='human'):
for pipe in self.parent_pipes:
pipe.send(('render', None))
return [pipe.recv() for pipe in self.parent_pipes]
def _decode_obses(self, obs):
result = {}
for k in self.obs_keys:
bufs = [b[k] for b in self.obs_bufs]
o = [np.frombuffer(b.get_obj(), dtype=self.obs_dtypes[k]).reshape(self.obs_shapes[k]) for b in bufs]
result[k] = np.array(o)
return dict_to_obs(result)
def _subproc_worker(pipe, parent_pipe, env_fn_wrapper, obs_bufs, obs_shapes, obs_dtypes, keys):
"""
Control a single environment instance using IPC and
shared memory.
"""
def _write_obs(maybe_dict_obs):
flatdict = obs_to_dict(maybe_dict_obs)
for k in keys:
dst = obs_bufs[k].get_obj()
dst_np = np.frombuffer(dst, dtype=obs_dtypes[k]).reshape(obs_shapes[k]) # pylint: disable=W0212
np.copyto(dst_np, flatdict[k])
env = env_fn_wrapper.x()
parent_pipe.close()
try:
while True:
cmd, data = pipe.recv()
if cmd == 'reset':
pipe.send(_write_obs(env.reset()))
elif cmd == 'step':
obs, reward, done, info = env.step(data)
if done:
obs = env.reset()
pipe.send((_write_obs(obs), reward, done, info))
elif cmd == 'render':
pipe.send(env.render(mode='rgb_array'))
elif cmd == 'close':
pipe.send(None)
break
elif cmd == 'talk2Env':
pipe.send(env.talk2Env(data))
else:
raise RuntimeError('Got unrecognized cmd %s' % cmd)
except KeyboardInterrupt:
print('ShmemVecEnv worker: got KeyboardInterrupt')
finally:
env.close()
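# Minimal sketch (not part of the original file) of the shared-memory pattern
# used above: both sides wrap the same ctypes Array as a numpy view, so
# observations cross the process boundary without pickling.
def _shared_buffer_demo():
    ctx = mp.get_context('spawn')
    buf = ctx.Array(ctypes.c_float, 4)  # like one entry of obs_bufs
    dst = np.frombuffer(buf.get_obj(), dtype=np.float32).reshape(2, 2)
    np.copyto(dst, np.ones((2, 2)))     # this is what _write_obs() does
    # any process holding the same Array sees the update
    print(np.frombuffer(buf.get_obj(), dtype=np.float32).reshape(2, 2))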
================================================
FILE: rl/networks/srnn_model.py
================================================
import torch.nn as nn
from torch.autograd import Variable
import torch
import numpy as np
import copy
from rl.networks.network_utils import init
class RNNBase(nn.Module):
"""
The class for RNN with done masks
"""
# edge: True -> edge RNN, False -> node RNN
def __init__(self, args, edge):
super(RNNBase, self).__init__()
self.args = args
# if this is an edge RNN
if edge:
self.gru = nn.GRU(args.human_human_edge_embedding_size, args.human_human_edge_rnn_size)
# if this is a node RNN
else:
self.gru = nn.GRU(args.human_node_embedding_size*2, args.human_node_rnn_size)
for name, param in self.gru.named_parameters():
if 'bias' in name:
nn.init.constant_(param, 0)
elif 'weight' in name:
nn.init.orthogonal_(param)
# x: [seq_len, nenv, 6 or 30 or 36, ?]
# hxs: [1, nenv, human_num, ?]
# masks: [1, nenv, 1]
def _forward_gru(self, x, hxs, masks):
# for acting model, input shape[0] == hidden state shape[0]
if x.size(0) == hxs.size(0):
# use env dimension as batch
# [1, 12, 6, ?] -> [1, 12*6, ?] or [30, 6, 6, ?] -> [30, 6*6, ?]
seq_len, nenv, agent_num, _ = x.size()
x = x.view(seq_len, nenv*agent_num, -1)
mask_agent_num = masks.size()[-1]
hxs_times_masks = hxs * (masks.view(seq_len, nenv, mask_agent_num, 1))
hxs_times_masks = hxs_times_masks.view(seq_len, nenv*agent_num, -1)
x, hxs = self.gru(x, hxs_times_masks) # we already unsqueezed the inputs in SRNN forward function
x = x.view(seq_len, nenv, agent_num, -1)
hxs = hxs.view(seq_len, nenv, agent_num, -1)
        # during update, input shape[0] == hidden state shape[0] * nsteps (e.g. 30)
else:
# N: nenv, T: seq_len, agent_num: node num or edge num
T, N, agent_num, _ = x.size()
# x = x.view(T, N, agent_num, x.size(2))
# Same deal with masks
masks = masks.view(T, N)
# Let's figure out which steps in the sequence have a zero for any agent
# We will always assume t=0 has a zero in it as that makes the logic cleaner
            # reduce the [T-1, num_env] boolean array over the env axis (True if any env has a zero mask),
            # then collect the indices of the True entries
has_zeros = ((masks[1:] == 0.0) \
.any(dim=-1)
.nonzero()
.squeeze()
.cpu())
# +1 to correct the masks[1:]
if has_zeros.dim() == 0:
# Deal with scalar
has_zeros = [has_zeros.item() + 1]
else:
has_zeros = (has_zeros + 1).numpy().tolist()
# add t=0 and t=T to the list
has_zeros = [0] + has_zeros + [T]
# hxs = hxs.unsqueeze(0)
# hxs = hxs.view(hxs.size(0), hxs.size(1)*hxs.size(2), hxs.size(3))
outputs = []
for i in range(len(has_zeros) - 1):
# We can now process steps that don't have any zeros in masks together!
# This is much faster
start_idx = has_zeros[i]
end_idx = has_zeros[i + 1]
# x and hxs have 4 dimensions, merge the 2nd and 3rd dimension
x_in = x[start_idx:end_idx]
x_in = x_in.view(x_in.size(0), x_in.size(1)*x_in.size(2), x_in.size(3))
hxs = hxs.view(hxs.size(0), N, agent_num, -1)
hxs = hxs * (masks[start_idx].view(1, -1, 1, 1))
hxs = hxs.view(hxs.size(0), hxs.size(1) * hxs.size(2), hxs.size(3))
rnn_scores, hxs = self.gru(x_in, hxs)
outputs.append(rnn_scores)
# assert len(outputs) == T
# x is a (T, N, -1) tensor
x = torch.cat(outputs, dim=0)
# flatten
x = x.view(T, N, agent_num, -1)
hxs = hxs.view(1, N, agent_num, -1)
return x, hxs
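# Worked example of the segmentation in _forward_gru above (comments only):
# with T = 5 and a single env whose masks[1:] = [1, 0, 1, 1], the zero at t = 2
# gives has_zeros = [0, 2, 5], so the GRU runs over steps [0, 2) and [2, 5),
# and the hidden state is multiplied by masks[2] = 0 (i.e. reset) at the
# episode boundary between the two segments.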
class HumanNodeRNN(RNNBase):
'''
Class representing human Node RNNs in the st-graph
'''
def __init__(self, args):
'''
Initializer function
params:
args : Training arguments
'''
super(HumanNodeRNN, self).__init__(args, edge=False)
self.args = args
# Store required sizes
self.rnn_size = args.human_node_rnn_size
self.output_size = args.human_node_output_size
self.embedding_size = args.human_node_embedding_size
self.input_size = args.human_node_input_size
self.edge_rnn_size = args.human_human_edge_rnn_size
# Linear layer to embed input
self.encoder_linear = nn.Linear(self.input_size, self.embedding_size)
# ReLU and Dropout layers
self.relu = nn.ReLU()
# Linear layer to embed edgeRNN hidden states
self.edge_embed = nn.Linear(self.edge_rnn_size, self.embedding_size)
# Linear layer to embed attention module output
self.edge_attention_embed = nn.Linear(self.edge_rnn_size*2, self.embedding_size)
# Output linear layer
self.output_linear = nn.Linear(self.rnn_size, self.output_size)
def forward(self, pos, h_temporal, h_spatial_other, h, masks):
'''
Forward pass for the model
params:
pos : input position
h_temporal : hidden state of the temporal edgeRNN corresponding to this node
h_spatial_other : output of the attention module
h : hidden state of the current nodeRNN
        masks : done masks used to reset the hidden state
'''
# Encode the input position
encoded_input = self.encoder_linear(pos)
encoded_input = self.relu(encoded_input)
# Concat both the embeddings
h_edges = torch.cat((h_temporal, h_spatial_other), -1)
h_edges_embedded = self.relu(self.edge_attention_embed(h_edges))
concat_encoded = torch.cat((encoded_input, h_edges_embedded), -1)
x, h_new = self._forward_gru(concat_encoded, h, masks)
outputs = self.output_linear(x)
return outputs, h_new
class HumanHumanEdgeRNN(RNNBase):
'''
Class representing the Human-Human Edge RNN in the s-t graph
'''
def __init__(self, args):
'''
Initializer function
params:
args : Training arguments
'''
super(HumanHumanEdgeRNN, self).__init__(args, edge=True)
self.args = args
# Store required sizes
self.rnn_size = args.human_human_edge_rnn_size
self.embedding_size = args.human_human_edge_embedding_size
self.input_size = args.human_human_edge_input_size
# Linear layer to embed input
self.encoder_linear = nn.Linear(self.input_size, self.embedding_size)
self.relu = nn.ReLU()
def forward(self, inp, h, masks):
'''
Forward pass for the model
params:
inp : input edge features
h : hidden state of the current edgeRNN
        masks : done masks used to reset the hidden state
'''
# Encode the input position
encoded_input = self.encoder_linear(inp)
encoded_input = self.relu(encoded_input)
x, h_new = self._forward_gru(encoded_input, h, masks)
return x, encoded_input, h_new
class EdgeAttention(nn.Module):
'''
Class representing the attention module
'''
def __init__(self, args):
'''
Initializer function
params:
args : Training arguments
'''
super(EdgeAttention, self).__init__()
self.args = args
# Store required sizes
self.human_human_edge_rnn_size = args.human_human_edge_rnn_size
self.human_node_rnn_size = args.human_node_rnn_size
self.attention_size = args.attention_size
# Linear layer to embed temporal edgeRNN hidden state
self.temporal_edge_layer=nn.ModuleList()
self.spatial_edge_layer=nn.ModuleList()
self.temporal_edge_layer.append(nn.Linear(self.human_human_edge_rnn_size, self.attention_size))
# Linear layer to embed spatial edgeRNN hidden states
self.spatial_edge_layer.append(nn.Linear(self.human_human_edge_rnn_size, self.attention_size))
# number of agents who have spatial edges (complete graph: all 6 agents; incomplete graph: only the robot)
self.agent_num = 1
self.num_attention_head = 1
def att_func(self, temporal_embed, spatial_embed, h_spatials):
        seq_len, nenv, num_edges, h_size = h_spatials.size()  # [seq_len, nenv, num_edges, h_size], e.g. [1, 12, 30, 256] at test time
attn = temporal_embed * spatial_embed
attn = torch.sum(attn, dim=3)
        # variable-length scaling: scale the raw scores by num_edges / sqrt(attention_size)
temperature = num_edges / np.sqrt(self.attention_size)
attn = torch.mul(attn, temperature)
# Softmax
attn = attn.view(seq_len, nenv, self.agent_num, self.human_num)
attn = torch.nn.functional.softmax(attn, dim=-1)
# print(attn[0, 0, 0].cpu().numpy())
# Compute weighted value
# weighted_value = torch.mv(torch.t(h_spatials), attn)
# reshape h_spatials and attn
# shape[0] = seq_len, shape[1] = num of spatial edges (6*5 = 30), shape[2] = 256
h_spatials = h_spatials.view(seq_len, nenv, self.agent_num, self.human_num, h_size)
h_spatials = h_spatials.view(seq_len * nenv * self.agent_num, self.human_num, h_size).permute(0, 2,
1) # [seq_len*nenv*6, 5, 256] -> [seq_len*nenv*6, 256, 5]
attn = attn.view(seq_len * nenv * self.agent_num, self.human_num).unsqueeze(-1) # [seq_len*nenv*6, 5, 1]
weighted_value = torch.bmm(h_spatials, attn) # [seq_len*nenv*6, 256, 1]
# reshape back
weighted_value = weighted_value.squeeze(-1).view(seq_len, nenv, self.agent_num, h_size) # [seq_len, 12, 6 or 1, 256]
return weighted_value, attn
# h_temporal: [seq_len, nenv, 1, 256]
# h_spatials: [seq_len, nenv, 5, 256]
def forward(self, h_temporal, h_spatials):
'''
Forward pass for the model
params:
h_temporal : Hidden state of the temporal edgeRNN
h_spatials : Hidden states of all spatial edgeRNNs connected to the node.
'''
# find the number of humans by the size of spatial edgeRNN hidden state
self.human_num = h_spatials.size()[2] // self.agent_num
weighted_value_list, attn_list=[],[]
for i in range(self.num_attention_head):
# Embed the temporal edgeRNN hidden state
temporal_embed = self.temporal_edge_layer[i](h_temporal)
# temporal_embed = temporal_embed.squeeze(0)
# Embed the spatial edgeRNN hidden states
spatial_embed = self.spatial_edge_layer[i](h_spatials)
# Dot based attention
try:
temporal_embed = temporal_embed.repeat_interleave(self.human_num, dim=2)
            except RuntimeError:
                # surface the shape mismatch instead of printing a placeholder
                print('EdgeAttention: repeat_interleave failed, temporal_embed shape {}, human_num {}'.format(
                    temporal_embed.shape, self.human_num))
                raise
weighted_value,attn=self.att_func(temporal_embed, spatial_embed, h_spatials)
weighted_value_list.append(weighted_value)
attn_list.append(attn)
if self.num_attention_head > 1:
return self.final_attn_linear(torch.cat(weighted_value_list, dim=-1)), attn_list
else:
return weighted_value_list[0], attn_list[0]
class SRNN(nn.Module):
"""
Class for the DS-RNN model, see https://arxiv.org/abs/2011.04820 for details
"""
def __init__(self, obs_space_dict, args, infer=False):
"""
Initializer function
params:
args : Training arguments
infer : Training or test time (True at test time)
"""
super(SRNN, self).__init__()
self.infer = infer
self.is_recurrent = True
self.args=args
self.human_num = obs_space_dict['spatial_edges'].shape[0]
self.seq_length = args.seq_length
self.nenv = args.num_processes
self.nminibatch = args.num_mini_batch
# Store required sizes
self.human_node_rnn_size = args.human_node_rnn_size
self.human_human_edge_rnn_size = args.human_human_edge_rnn_size
self.output_size = args.human_node_output_size
# Initialize the Node and Edge RNNs
self.humanNodeRNN = HumanNodeRNN(args)
spatial_args = copy.deepcopy(args)
spatial_args.human_human_edge_input_size = obs_space_dict['spatial_edges'].shape[1]
self.humanhumanEdgeRNN_spatial = HumanHumanEdgeRNN(spatial_args)
self.humanhumanEdgeRNN_temporal = HumanHumanEdgeRNN(args)
# Initialize attention module
self.attn = EdgeAttention(args)
init_ = lambda m: init(m, nn.init.orthogonal_, lambda x: nn.init.
constant_(x, 0), np.sqrt(2))
num_inputs = hidden_size = self.output_size
self.actor = nn.Sequential(
init_(nn.Linear(num_inputs, hidden_size)), nn.Tanh(),
init_(nn.Linear(hidden_size, hidden_size)), nn.Tanh())
self.critic = nn.Sequential(
init_(nn.Linear(num_inputs, hidden_size)), nn.Tanh(),
init_(nn.Linear(hidden_size, hidden_size)), nn.Tanh())
self.critic_linear = init_(nn.Linear(hidden_size, 1))
robot_size = 7 if args.env_type == 'crowd_sim' else 2
self.robot_linear = init_(nn.Linear(robot_size, 3))
self.human_node_final_linear=init_(nn.Linear(self.output_size,2))
self.temporal_edges = [0]
self.spatial_edges = np.arange(1, self.human_num+1)
# make sure the spatial edge embedding size is the same as temporal edge size
self.spatial_linear = init_(nn.Linear(obs_space_dict['spatial_edges'].shape[1], 2))
def forward(self, inputs, rnn_hxs, masks, infer=False):
if infer:
# Test time/rollout trajectory time
seq_length = 1
nenv = self.nenv
else:
# Training time
seq_length = self.seq_length
nenv = self.nenv // self.nminibatch
robot_node = reshapeT(inputs['robot_node'], seq_length, nenv)
temporal_edges = reshapeT(inputs['temporal_edges'], seq_length, nenv)
spatial_edges = reshapeT(inputs['spatial_edges'], seq_length, nenv)
hidden_states_node_RNNs = reshapeT(rnn_hxs['human_node_rnn'], 1, nenv)
hidden_states_edge_RNNs = reshapeT(rnn_hxs['human_human_edge_rnn'], 1, nenv)
masks = reshapeT(masks, seq_length, nenv)
if self.args.no_cuda:
all_hidden_states_edge_RNNs = Variable(
torch.zeros(1, nenv, self.human_num + 1, rnn_hxs['human_human_edge_rnn'].size()[-1]).cpu())
else:
all_hidden_states_edge_RNNs = Variable(
torch.zeros(1, nenv, self.human_num + 1, rnn_hxs['human_human_edge_rnn'].size()[-1]).cuda())
# Do forward pass through temporaledgeRNN
# todo: now edgeRNNs has 3 return values!
hidden_temporal_start_end=hidden_states_edge_RNNs[:,:,self.temporal_edges,:]
output_temporal, _, hidden_temporal = self.humanhumanEdgeRNN_temporal(temporal_edges, hidden_temporal_start_end, masks)
# Update the hidden state and cell state
all_hidden_states_edge_RNNs[:, :, self.temporal_edges,:] = hidden_temporal
# Spatial Edges
# spatial_edges = self.spatial_linear(spatial_edges)
hidden_spatial_start_end=hidden_states_edge_RNNs[:,:,self.spatial_edges,:]
# Do forward pass through spatialedgeRNN
output_spatial, _, hidden_spatial = self.humanhumanEdgeRNN_spatial(spatial_edges, hidden_spatial_start_end, masks)
# Update the hidden state and cell state
all_hidden_states_edge_RNNs[:, :,self.spatial_edges,: ] = hidden_spatial
# Do forward pass through attention module
hidden_attn_weighted, _ = self.attn(output_temporal, output_spatial)
# concatenate human node features with robot features
nodes_current_selected = self.robot_linear(robot_node)
# Do a forward pass through nodeRNN
outputs, h_nodes \
= self.humanNodeRNN(nodes_current_selected, output_temporal, hidden_attn_weighted, hidden_states_node_RNNs, masks)
# Update the hidden and cell states
all_hidden_states_node_RNNs = h_nodes
outputs_return = outputs
rnn_hxs['human_node_rnn'] = all_hidden_states_node_RNNs
rnn_hxs['human_human_edge_rnn'] = all_hidden_states_edge_RNNs
# x is the output of the robot node and will be sent to actor and critic
x = outputs_return[:, :, 0, :]
hidden_critic = self.critic(x)
hidden_actor = self.actor(x)
for key in rnn_hxs:
rnn_hxs[key] = rnn_hxs[key].squeeze(0)
if infer:
return self.critic_linear(hidden_critic).squeeze(0), hidden_actor.squeeze(0), rnn_hxs
else:
return self.critic_linear(hidden_critic).view(-1, 1), hidden_actor.view(-1, self.output_size), rnn_hxs
def reshapeT(T, seq_length, nenv):
shape = T.size()[1:]
return T.unsqueeze(0).reshape((seq_length, nenv, *shape))
================================================
FILE: rl/networks/storage.py
================================================
import torch
from torch.utils.data.sampler import BatchSampler, SubsetRandomSampler
def _flatten_helper(T, N, _tensor):
if isinstance(_tensor, dict):
for key in _tensor:
_tensor[key] = _tensor[key].view(T * N, *(_tensor[key].size()[2:]))
return _tensor
else:
return _tensor.view(T * N, *_tensor.size()[2:])
class RolloutStorage(object):
""" The rollout buffer to store the agent's experience for PPO """
def __init__(self, num_steps, num_processes, obs_shape, action_space,
human_node_rnn_size, human_human_edge_rnn_size):
if isinstance(obs_shape, dict):
self.obs = {}
for key in obs_shape:
self.obs[key] = torch.zeros(num_steps + 1, num_processes, *(obs_shape[key].shape))
self.human_num = obs_shape['spatial_edges'].shape[0]
else:
self.obs = torch.zeros(num_steps + 1, num_processes, *obs_shape)
self.recurrent_hidden_states = {} # a dict of tuple(hidden state, cell state)
node_num = 1
        # todo: for previous models, use the commented-out edge_num = 2 below instead of the next line!
edge_num = self.human_num + 1
# edge_num = 2
self.recurrent_hidden_states['human_node_rnn'] = torch.zeros(num_steps + 1, num_processes, node_num, human_node_rnn_size)
self.recurrent_hidden_states['human_human_edge_rnn'] = torch.zeros(num_steps + 1, num_processes, edge_num, human_human_edge_rnn_size)
self.rewards = torch.zeros(num_steps, num_processes, 1)
self.value_preds = torch.zeros(num_steps + 1, num_processes, 1)
self.returns = torch.zeros(num_steps + 1, num_processes, 1)
self.action_log_probs = torch.zeros(num_steps, num_processes, 1)
if action_space.__class__.__name__ == 'Discrete':
action_shape = 1
else:
action_shape = action_space.shape[0]
self.actions = torch.zeros(num_steps, num_processes, action_shape)
if action_space.__class__.__name__ == 'Discrete':
self.actions = self.actions.long()
self.masks = torch.ones(num_steps + 1, num_processes, 1)
# Masks that indicate whether it's a true terminal state
# or time limit end state
self.bad_masks = torch.ones(num_steps + 1, num_processes, 1)
self.num_steps = num_steps
self.step = 0
def to(self, device):
for key in self.obs:
self.obs[key] = self.obs[key].to(device)
for key in self.recurrent_hidden_states:
self.recurrent_hidden_states[key] = self.recurrent_hidden_states[key].to(device)
self.rewards = self.rewards.to(device)
self.value_preds = self.value_preds.to(device)
self.returns = self.returns.to(device)
self.action_log_probs = self.action_log_probs.to(device)
self.actions = self.actions.to(device)
self.masks = self.masks.to(device)
self.bad_masks = self.bad_masks.to(device)
def insert(self, obs, recurrent_hidden_states, actions, action_log_probs,
value_preds, rewards, masks, bad_masks):
for key in self.obs:
self.obs[key][self.step + 1].copy_(obs[key])
for key in recurrent_hidden_states:
self.recurrent_hidden_states[key][self.step + 1].copy_(recurrent_hidden_states[key])
self.actions[self.step].copy_(actions)
self.action_log_probs[self.step].copy_(action_log_probs)
self.value_preds[self.step].copy_(value_preds)
self.rewards[self.step].copy_(rewards)
self.masks[self.step + 1].copy_(masks)
self.bad_masks[self.step + 1].copy_(bad_masks)
self.step = (self.step + 1) % self.num_steps
def after_update(self):
for key in self.obs:
self.obs[key][0].copy_(self.obs[key][-1])
for key in self.recurrent_hidden_states:
self.recurrent_hidden_states[key][0].copy_(self.recurrent_hidden_states[key][-1])
self.masks[0].copy_(self.masks[-1])
self.bad_masks[0].copy_(self.bad_masks[-1])
def compute_returns(self,
next_value,
use_gae,
gamma,
gae_lambda,
use_proper_time_limits=True):
if use_proper_time_limits:
if use_gae:
self.value_preds[-1] = next_value
gae = 0
for step in reversed(range(self.rewards.size(0))):
delta = self.rewards[step] + gamma * self.value_preds[
step + 1] * self.masks[step +
1] - self.value_preds[step]
gae = delta + gamma * gae_lambda * self.masks[step +
1] * gae
gae = gae * self.bad_masks[step + 1]
self.returns[step] = gae + self.value_preds[step]
else:
self.returns[-1] = next_value
for step in reversed(range(self.rewards.size(0))):
self.returns[step] = (self.returns[step + 1] * \
gamma * self.masks[step + 1] + self.rewards[step]) * self.bad_masks[step + 1] \
+ (1 - self.bad_masks[step + 1]) * self.value_preds[step]
else:
if use_gae:
self.value_preds[-1] = next_value
gae = 0
for step in reversed(range(self.rewards.size(0))):
delta = self.rewards[step] + gamma * self.value_preds[
step + 1] * self.masks[step +
1] - self.value_preds[step]
gae = delta + gamma * gae_lambda * self.masks[step +
1] * gae
self.returns[step] = gae + self.value_preds[step]
else:
self.returns[-1] = next_value
for step in reversed(range(self.rewards.size(0))):
self.returns[step] = self.returns[step + 1] * \
gamma * self.masks[step + 1] + self.rewards[step]
def feed_forward_generator(self,
advantages,
num_mini_batch=None,
mini_batch_size=None):
num_steps, num_processes = self.rewards.size()[0:2]
batch_size = num_processes * num_steps
if mini_batch_size is None:
assert batch_size >= num_mini_batch, (
"PPO requires the number of processes ({}) "
"* number of steps ({}) = {} "
"to be greater than or equal to the number of PPO mini batches ({})."
"".format(num_processes, num_steps, num_processes * num_steps,
num_mini_batch))
mini_batch_size = batch_size // num_mini_batch
sampler = BatchSampler(
SubsetRandomSampler(range(batch_size)),
mini_batch_size,
drop_last=True)
for indices in sampler:
obs_batch = {}
for key in self.obs:
obs_batch[key] = self.obs[key][:-1].view(-1, *self.obs[key].size()[2:])[indices]
recurrent_hidden_states_batch = {}
for key in self.recurrent_hidden_states:
recurrent_hidden_states_batch[key] = self.recurrent_hidden_states[key][:-1].view(
-1, self.recurrent_hidden_states[key].size(-1))[indices]
actions_batch = self.actions.view(-1,
self.actions.size(-1))[indices]
value_preds_batch = self.value_preds[:-1].view(-1, 1)[indices]
return_batch = self.returns[:-1].view(-1, 1)[indices]
masks_batch = self.masks[:-1].view(-1, 1)[indices]
old_action_log_probs_batch = self.action_log_probs.view(-1,
1)[indices]
if advantages is None:
adv_targ = None
else:
adv_targ = advantages.view(-1, 1)[indices]
yield obs_batch, recurrent_hidden_states_batch, actions_batch, \
value_preds_batch, return_batch, masks_batch, old_action_log_probs_batch, adv_targ
def recurrent_generator(self, advantages, num_mini_batch):
num_processes = self.rewards.size(1)
assert num_processes >= num_mini_batch, (
"PPO requires the number of processes ({}) "
"to be greater than or equal to the number of "
"PPO mini batches ({}).".format(num_processes, num_mini_batch))
num_envs_per_batch = num_processes // num_mini_batch
perm = torch.randperm(num_processes)
for start_ind in range(0, num_processes, num_envs_per_batch):
obs_batch = {}
for key in self.obs:
obs_batch[key] = []
recurrent_hidden_states_batch = {}
for key in self.recurrent_hidden_states:
recurrent_hidden_states_batch[key] = []
actions_batch = []
value_preds_batch = []
return_batch = []
masks_batch = []
old_action_log_probs_batch = []
adv_targ = []
for offset in range(num_envs_per_batch):
ind = perm[start_ind + offset]
for key in self.obs:
obs_batch[key].append(self.obs[key][:-1, ind])
for key in self.recurrent_hidden_states:
recurrent_hidden_states_batch[key].append(self.recurrent_hidden_states[key][0:1, ind])
actions_batch.append(self.actions[:, ind])
value_preds_batch.append(self.value_preds[:-1, ind])
return_batch.append(self.returns[:-1, ind])
masks_batch.append(self.masks[:-1, ind])
old_action_log_probs_batch.append(
self.action_log_probs[:, ind])
adv_targ.append(advantages[:, ind])
T, N = self.num_steps, num_envs_per_batch
# These are all tensors of size (T, N, -1)
actions_batch = torch.stack(actions_batch, 1)
value_preds_batch = torch.stack(value_preds_batch, 1)
return_batch = torch.stack(return_batch, 1)
masks_batch = torch.stack(masks_batch, 1)
old_action_log_probs_batch = torch.stack(
old_action_log_probs_batch, 1)
adv_targ = torch.stack(adv_targ, 1)
for key in obs_batch:
obs_batch[key] = torch.stack(obs_batch[key], 1)
for key in recurrent_hidden_states_batch:
temp = torch.stack(recurrent_hidden_states_batch[key], 1)
recurrent_hidden_states_batch[key] = temp.view(N, *(temp.size()[2:]))
# Flatten the (T, N, ...) tensors to (T * N, ...)
obs_batch = _flatten_helper(T, N, obs_batch)
actions_batch = _flatten_helper(T, N, actions_batch)
value_preds_batch = _flatten_helper(T, N, value_preds_batch)
return_batch = _flatten_helper(T, N, return_batch)
masks_batch = _flatten_helper(T, N, masks_batch)
old_action_log_probs_batch = _flatten_helper(T, N, \
old_action_log_probs_batch)
adv_targ = _flatten_helper(T, N, adv_targ)
yield obs_batch, recurrent_hidden_states_batch, actions_batch, \
value_preds_batch, return_batch, masks_batch, old_action_log_probs_batch, adv_targ
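# Minimal sketch (not part of the original file) of the GAE recursion in
# compute_returns() above, on toy numbers with a single environment and
# without the time-limit bad_masks correction.
def _gae_demo():
    rewards = torch.tensor([[1.0], [1.0]])             # [num_steps, 1]
    value_preds = torch.tensor([[0.5], [0.4], [0.3]])  # [num_steps + 1, 1], last entry is the bootstrap value
    masks = torch.ones(3, 1)                           # no episode ends
    gamma, gae_lambda, gae = 0.99, 0.95, 0.0
    returns = torch.zeros(2, 1)
    for step in reversed(range(rewards.size(0))):
        delta = rewards[step] + gamma * value_preds[step + 1] * masks[step + 1] - value_preds[step]
        gae = delta + gamma * gae_lambda * masks[step + 1] * gae
        returns[step] = gae + value_preds[step]
    print(returns)  # approximately [[2.24], [1.30]]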
================================================
FILE: rl/ppo/__init__.py
================================================
from .ppo import PPO
================================================
FILE: rl/ppo/ppo.py
================================================
import torch
import torch.nn as nn
import torch.optim as optim
class PPO():
""" Class for the PPO optimizer """
def __init__(self,
actor_critic,
clip_param,
ppo_epoch,
num_mini_batch,
value_loss_coef,
entropy_coef,
lr=None,
eps=None,
max_grad_norm=None,
use_clipped_value_loss=True):
self.actor_critic = actor_critic
self.clip_param = clip_param
self.ppo_epoch = ppo_epoch
self.num_mini_batch = num_mini_batch
self.value_loss_coef = value_loss_coef
self.entropy_coef = entropy_coef
self.max_grad_norm = max_grad_norm
self.use_clipped_value_loss = use_clipped_value_loss
self.optimizer = optim.Adam(actor_critic.parameters(), lr=lr, eps=eps)
def update(self, rollouts):
advantages = rollouts.returns[:-1] - rollouts.value_preds[:-1]
advantages = (advantages - advantages.mean()) / (
advantages.std() + 1e-5)
value_loss_epoch = 0
action_loss_epoch = 0
dist_entropy_epoch = 0
for e in range(self.ppo_epoch):
if self.actor_critic.is_recurrent:
data_generator = rollouts.recurrent_generator(
advantages, self.num_mini_batch)
else:
data_generator = rollouts.feed_forward_generator(
advantages, self.num_mini_batch)
for sample in data_generator:
obs_batch, recurrent_hidden_states_batch, actions_batch, \
value_preds_batch, return_batch, masks_batch, old_action_log_probs_batch, \
adv_targ = sample
# Reshape to do in a single forward pass for all steps
values, action_log_probs, dist_entropy, _ = self.actor_critic.evaluate_actions(
obs_batch, recurrent_hidden_states_batch, masks_batch,
actions_batch)
ratio = torch.exp(action_log_probs -
old_action_log_probs_batch)
surr1 = ratio * adv_targ
surr2 = torch.clamp(ratio, 1.0 - self.clip_param,
1.0 + self.clip_param) * adv_targ
action_loss = -torch.min(surr1, surr2).mean()
if self.use_clipped_value_loss:
value_pred_clipped = value_preds_batch + \
(values - value_preds_batch).clamp(-self.clip_param, self.clip_param)
value_losses = (values - return_batch).pow(2)
value_losses_clipped = (
value_pred_clipped - return_batch).pow(2)
value_loss = 0.5 * torch.max(value_losses,
value_losses_clipped).mean()
else:
value_loss = 0.5 * (return_batch - values).pow(2).mean()
self.optimizer.zero_grad()
total_loss=value_loss * self.value_loss_coef + action_loss - dist_entropy * self.entropy_coef
total_loss.backward()
nn.utils.clip_grad_norm_(self.actor_critic.parameters(),
self.max_grad_norm)
self.optimizer.step()
value_loss_epoch += value_loss.item()
action_loss_epoch += action_loss.item()
dist_entropy_epoch += dist_entropy.item()
num_updates = self.ppo_epoch * self.num_mini_batch
value_loss_epoch /= num_updates
action_loss_epoch /= num_updates
dist_entropy_epoch /= num_updates
return value_loss_epoch, action_loss_epoch, dist_entropy_epoch
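# Minimal sketch (not part of the original file) of the clipped surrogate
# objective computed in update() above, on toy tensors.
def _clipped_surrogate_demo():
    clip_param = 0.2
    adv_targ = torch.tensor([1.0, -1.0])
    ratio = torch.exp(torch.tensor([0.5, 0.5]))  # new-to-old action probability ratio, about 1.65
    surr1 = ratio * adv_targ
    surr2 = torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param) * adv_targ
    # min() clips the positive-advantage term at 1.2 but keeps the full
    # penalty of -1.65 on the negative-advantage term
    action_loss = -torch.min(surr1, surr2).mean()
    print(action_loss)  # approximately 0.22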
================================================
FILE: rl/vec_env/__init__.py
================================================
from .vec_env import AlreadySteppingError, NotSteppingError, VecEnv, VecEnvWrapper, VecEnvObservationWrapper, CloudpickleWrapper
from .dummy_vec_env import DummyVecEnv
from .shmem_vec_env import ShmemVecEnv
from .vec_frame_stack import VecFrameStack
from .vec_normalize import VecNormalize
from .vec_remove_dict_obs import VecExtractDictObs
__all__ = ['AlreadySteppingError', 'NotSteppingError', 'VecEnv', 'VecEnvWrapper', 'VecEnvObservationWrapper', 'CloudpickleWrapper', 'DummyVecEnv', 'ShmemVecEnv', 'VecFrameStack', 'VecNormalize', 'VecExtractDictObs']
================================================
FILE: rl/vec_env/dummy_vec_env.py
================================================
import numpy as np
from .vec_env import VecEnv
from .util import copy_obs_dict, dict_to_obs, obs_space_info
import copy
class DummyVecEnv(VecEnv):
"""
    VecEnv that runs multiple environments sequentially, that is,
    the step and reset commands are sent to one environment at a time.
    Useful for debugging and when num_env == 1 (in the latter case,
    it avoids communication overhead)
"""
def __init__(self, env_fns):
"""
Arguments:
        env_fns: iterable of callables that build environments
"""
self.envs = [fn() for fn in env_fns]
env = self.envs[0]
VecEnv.__init__(self, len(env_fns), env.observation_space, env.action_space)
obs_space = env.observation_space
self.keys, shapes, dtypes = obs_space_info(obs_space)
self.buf_obs = { k: np.zeros((self.num_envs,) + tuple(shapes[k]), dtype=dtypes[k]) for k in self.keys }
self.obs_list = []
        self.buf_dones = np.zeros((self.num_envs,), dtype=np.bool_)  # np.bool_ instead of the removed np.bool alias
self.buf_rews = np.zeros((self.num_envs,), dtype=np.float32)
self.buf_infos = [{} for _ in range(self.num_envs)]
self.actions = None
self.spec = self.envs[0].spec
def step_async(self, actions):
listify = True
try:
if len(actions) == self.num_envs:
listify = False
except TypeError:
pass
if not listify:
self.actions = actions
else:
assert self.num_envs == 1, "actions {} is either not a list or has a wrong size - cannot match to {} environments".format(actions, self.num_envs)
self.actions = [actions]
def step_wait(self):
for e in range(self.num_envs):
action = self.actions[e]
# if isinstance(self.envs[e].action_space, spaces.Discrete):
# action = int(action)
obs, self.buf_rews[e], self.buf_dones[e], self.buf_infos[e] = self.envs[e].step(action)
if self.buf_dones[e]:
obs = self.envs[e].reset()
self._save_obs(e, obs)
return (self._obs_from_buf(), np.copy(self.buf_rews), np.copy(self.buf_dones),
copy.deepcopy(self.buf_infos))
def reset(self):
for e in range(self.num_envs):
obs = self.envs[e].reset()
self._save_obs(e, obs)
return self._obs_from_buf()
def talk2Env_async(self, data):
self.envs[0].env.talk2Env(data[0])
def talk2Env_wait(self):
return [True]
def _save_obs(self, e, obs):
d={}
for k in self.keys:
if k is None:
self.buf_obs[k][e] = obs
else:
d[k] = self.buf_obs[k][0]
self.buf_obs[k][e] = obs[k]
self.obs_list = [dict_to_obs(copy_obs_dict(d))]
def _obs_from_buf(self):
return dict_to_obs(copy_obs_dict(self.buf_obs))
def get_images(self):
return [env.render(mode='rgb_array') for env in self.envs]
def render(self, mode='human'):
if self.num_envs == 1:
return self.envs[0].render(mode=mode)
else:
return super().render(mode=mode)
================================================
FILE: rl/vec_env/envs.py
================================================
import os
import gym
import numpy as np
from gym.spaces.box import Box
from .vec_env import VecEnvWrapper
import torch
from .dummy_vec_env import DummyVecEnv
from .shmem_vec_env import ShmemVecEnv
from .vec_pretext_normalize import VecPretextNormalize
def make_env(env_id, seed, rank):
def _thunk():
env = gym.make(env_id)
env.seed(seed + rank)
if str(env.__class__.__name__).find('TimeLimit') >= 0:
env = TimeLimitMask(env)
return env
return _thunk
def make_vec_envs(env_name,
seed,
num_processes,
gamma,
device,
randomCollect,
config=None):
envs = [
make_env(env_name, seed, i)
for i in range(num_processes)
]
if len(envs) > 1:
envs = ShmemVecEnv(envs) if os.name == 'nt' else ShmemVecEnv(envs, context='fork')
else:
envs = DummyVecEnv(envs)
if not randomCollect: # if it is not for random data collection phase
if gamma is None:
envs = VecPretextNormalize(envs, ob=False, ret=False, config=config)
else:
envs = VecPretextNormalize(envs, ob=False, gamma=gamma, config=config)
envs = VecPyTorch(envs, device)
return envs
# Checks whether done was caused by time limits or not
class TimeLimitMask(gym.Wrapper):
def step(self, action):
obs, rew, done, info = self.env.step(action)
if done and self.env._max_episode_steps == self.env._elapsed_steps:
info['bad_transition'] = True
return obs, rew, done, info
def reset(self, **kwargs):
return self.env.reset(**kwargs)
class VecPyTorch(VecEnvWrapper):
def __init__(self, venv, device):
"""Return only every `skip`-th frame"""
super(VecPyTorch, self).__init__(venv)
self.device = device
# TODO: Fix data types
def reset(self):
obs = self.venv.reset()
if isinstance(obs, dict):
for key in obs:
obs[key]=torch.from_numpy(obs[key]).to(self.device)
else:
obs = torch.from_numpy(obs).float().to(self.device)
return obs
def step_async(self, actions):
if isinstance(actions, torch.LongTensor):
# Squeeze the dimension for discrete actions
actions = actions.squeeze(1)
actions = actions.cpu().numpy()
self.venv.step_async(actions)
def step_wait(self):
obs, reward, done, info = self.venv.step_wait()
if isinstance(obs, dict):
for key in obs:
obs[key] = torch.from_numpy(obs[key]).to(self.device)
else:
obs = torch.from_numpy(obs).float().to(self.device)
reward = torch.from_numpy(reward).unsqueeze(dim=1).float()
return obs, reward, done, info
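# Usage sketch (hypothetical values; the env id is one of those registered in
# crowd_sim/envs, and `config` is the crowd_nav config object):
#   envs = make_vec_envs('CrowdSimPred-v0', seed=42, num_processes=4,
#                        gamma=0.99, device=torch.device('cpu'),
#                        randomCollect=False, config=config)
#   obs = envs.reset()  # dict of torch tensors, already moved to `device`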
================================================
FILE: rl/vec_env/logger.py
================================================
import os
import sys
import shutil
import os.path as osp
import json
import time
import datetime
import tempfile
from collections import defaultdict
from contextlib import contextmanager
DEBUG = 10
INFO = 20
WARN = 30
ERROR = 40
DISABLED = 50
class KVWriter(object):
def writekvs(self, kvs):
raise NotImplementedError
class SeqWriter(object):
def writeseq(self, seq):
raise NotImplementedError
class HumanOutputFormat(KVWriter, SeqWriter):
def __init__(self, filename_or_file):
if isinstance(filename_or_file, str):
self.file = open(filename_or_file, 'wt')
self.own_file = True
else:
assert hasattr(filename_or_file, 'read'), 'expected file or str, got %s'%filename_or_file
self.file = filename_or_file
self.own_file = False
def writekvs(self, kvs):
# Create strings for printing
key2str = {}
for (key, val) in sorted(kvs.items()):
if hasattr(val, '__float__'):
valstr = '%-8.3g' % val
else:
valstr = str(val)
key2str[self._truncate(key)] = self._truncate(valstr)
# Find max widths
if len(key2str) == 0:
print('WARNING: tried to write empty key-value dict')
return
else:
keywidth = max(map(len, key2str.keys()))
valwidth = max(map(len, key2str.values()))
# Write out the data
dashes = '-' * (keywidth + valwidth + 7)
lines = [dashes]
for (key, val) in sorted(key2str.items(), key=lambda kv: kv[0].lower()):
lines.append('| %s%s | %s%s |' % (
key,
' ' * (keywidth - len(key)),
val,
' ' * (valwidth - len(val)),
))
lines.append(dashes)
self.file.write('\n'.join(lines) + '\n')
# Flush the output to the file
self.file.flush()
def _truncate(self, s):
maxlen = 30
return s[:maxlen-3] + '...' if len(s) > maxlen else s
def writeseq(self, seq):
seq = list(seq)
for (i, elem) in enumerate(seq):
self.file.write(elem)
if i < len(seq) - 1: # add space unless this is the last one
self.file.write(' ')
self.file.write('\n')
self.file.flush()
def close(self):
if self.own_file:
self.file.close()
class JSONOutputFormat(KVWriter):
def __init__(self, filename):
self.file = open(filename, 'wt')
def writekvs(self, kvs):
for k, v in sorted(kvs.items()):
if hasattr(v, 'dtype'):
kvs[k] = float(v)
self.file.write(json.dumps(kvs) + '\n')
self.file.flush()
def close(self):
self.file.close()
class CSVOutputFormat(KVWriter):
def __init__(self, filename):
self.file = open(filename, 'w+t')
self.keys = []
self.sep = ','
def writekvs(self, kvs):
# Add our current row to the history
extra_keys = list(kvs.keys() - self.keys)
extra_keys.sort()
if extra_keys:
self.keys.extend(extra_keys)
self.file.seek(0)
lines = self.file.readlines()
self.file.seek(0)
for (i, k) in enumerate(self.keys):
if i > 0:
self.file.write(',')
self.file.write(k)
self.file.write('\n')
for line in lines[1:]:
self.file.write(line[:-1])
self.file.write(self.sep * len(extra_keys))
self.file.write('\n')
for (i, k) in enumerate(self.keys):
if i > 0:
self.file.write(',')
v = kvs.get(k)
if v is not None:
self.file.write(str(v))
self.file.write('\n')
self.file.flush()
def close(self):
self.file.close()
class TensorBoardOutputFormat(KVWriter):
"""
Dumps key/value pairs into TensorBoard's numeric format.
"""
def __init__(self, dir):
os.makedirs(dir, exist_ok=True)
self.dir = dir
self.step = 1
prefix = 'events'
path = osp.join(osp.abspath(dir), prefix)
import tensorflow as tf
from tensorflow.python import pywrap_tensorflow
from tensorflow.core.util import event_pb2
from tensorflow.python.util import compat
self.tf = tf
self.event_pb2 = event_pb2
self.pywrap_tensorflow = pywrap_tensorflow
self.writer = pywrap_tensorflow.EventsWriter(compat.as_bytes(path))
def writekvs(self, kvs):
def summary_val(k, v):
kwargs = {'tag': k, 'simple_value': float(v)}
return self.tf.Summary.Value(**kwargs)
summary = self.tf.Summary(value=[summary_val(k, v) for k, v in kvs.items()])
event = self.event_pb2.Event(wall_time=time.time(), summary=summary)
event.step = self.step # is there any reason why you'd want to specify the step?
self.writer.WriteEvent(event)
self.writer.Flush()
self.step += 1
def close(self):
if self.writer:
self.writer.Close()
self.writer = None
def make_output_format(format, ev_dir, log_suffix=''):
os.makedirs(ev_dir, exist_ok=True)
if format == 'stdout':
return HumanOutputFormat(sys.stdout)
elif format == 'log':
return HumanOutputFormat(osp.join(ev_dir, 'log%s.txt' % log_suffix))
elif format == 'json':
return JSONOutputFormat(osp.join(ev_dir, 'progress%s.json' % log_suffix))
elif format == 'csv':
return CSVOutputFormat(osp.join(ev_dir, 'progress%s.csv' % log_suffix))
elif format == 'tensorboard':
return TensorBoardOutputFormat(osp.join(ev_dir, 'tb%s' % log_suffix))
else:
raise ValueError('Unknown format specified: %s' % (format,))
# ================================================================
# API
# ================================================================
def logkv(key, val):
"""
Log a value of some diagnostic
Call this once for each diagnostic quantity, each iteration
If called many times, last value will be used.
"""
get_current().logkv(key, val)
def logkv_mean(key, val):
"""
The same as logkv(), but if called many times, values averaged.
"""
get_current().logkv_mean(key, val)
def logkvs(d):
"""
Log a dictionary of key-value pairs
"""
for (k, v) in d.items():
logkv(k, v)
def dumpkvs():
"""
Write all of the diagnostics from the current iteration
"""
return get_current().dumpkvs()
def getkvs():
return get_current().name2val
def log(*args, level=INFO):
"""
Write the sequence of args, with no separators, to the console and output files (if you've configured an output file).
"""
get_current().log(*args, level=level)
def debug(*args):
log(*args, level=DEBUG)
def info(*args):
log(*args, level=INFO)
def warn(*args):
log(*args, level=WARN)
def error(*args):
log(*args, level=ERROR)
def set_level(level):
"""
Set logging threshold on current logger.
"""
get_current().set_level(level)
def set_comm(comm):
get_current().set_comm(comm)
def get_dir():
"""
Get directory that log files are being written to.
Will be None if there is no output directory (i.e., if configure() was never called).
"""
return get_current().get_dir()
record_tabular = logkv
dump_tabular = dumpkvs
@contextmanager
def profile_kv(scopename):
logkey = 'wait_' + scopename
tstart = time.time()
try:
yield
finally:
get_current().name2val[logkey] += time.time() - tstart
def profile(n):
"""
Usage:
@profile("my_func")
def my_func(): code
"""
def decorator_with_name(func):
def func_wrapper(*args, **kwargs):
with profile_kv(n):
return func(*args, **kwargs)
return func_wrapper
return decorator_with_name
# ================================================================
# Backend
# ================================================================
def get_current():
if Logger.CURRENT is None:
_configure_default_logger()
return Logger.CURRENT
class Logger(object):
DEFAULT = None # A logger with no output files. (See right below class definition)
# So that you can still log to the terminal without setting up any output files
CURRENT = None # Current logger being used by the free functions above
def __init__(self, dir, output_formats, comm=None):
self.name2val = defaultdict(float) # values this iteration
self.name2cnt = defaultdict(int)
self.level = INFO
self.dir = dir
self.output_formats = output_formats
self.comm = comm
# Logging API, forwarded
# ----------------------------------------
def logkv(self, key, val):
self.name2val[key] = val
def logkv_mean(self, key, val):
oldval, cnt = self.name2val[key], self.name2cnt[key]
self.name2val[key] = oldval*cnt/(cnt+1) + val/(cnt+1)
self.name2cnt[key] = cnt + 1
def dumpkvs(self):
if self.comm is None:
d = self.name2val
else:
from baselines.common import mpi_util
d = mpi_util.mpi_weighted_mean(self.comm,
{name : (val, self.name2cnt.get(name, 1))
for (name, val) in self.name2val.items()})
if self.comm.rank != 0:
d['dummy'] = 1 # so we don't get a warning about empty dict
out = d.copy() # Return the dict for unit testing purposes
for fmt in self.output_formats:
if isinstance(fmt, KVWriter):
fmt.writekvs(d)
self.name2val.clear()
self.name2cnt.clear()
return out
def log(self, *args, level=INFO):
if self.level <= level:
self._do_log(args)
# Configuration
# ----------------------------------------
def set_level(self, level):
self.level = level
def set_comm(self, comm):
self.comm = comm
def get_dir(self):
return self.dir
def close(self):
for fmt in self.output_formats:
fmt.close()
# Misc
# ----------------------------------------
def _do_log(self, args):
for fmt in self.output_formats:
if isinstance(fmt, SeqWriter):
fmt.writeseq(map(str, args))
def get_rank_without_mpi_import():
# check environment variables here instead of importing mpi4py
# to avoid calling MPI_Init() when this module is imported
for varname in ['PMI_RANK', 'OMPI_COMM_WORLD_RANK']:
if varname in os.environ:
return int(os.environ[varname])
return 0
def configure(dir=None, format_strs=None, comm=None, log_suffix=''):
"""
If comm is provided, average all numerical stats across that comm
"""
if dir is None:
dir = os.getenv('OPENAI_LOGDIR')
if dir is None:
dir = osp.join(tempfile.gettempdir(),
datetime.datetime.now().strftime("openai-%Y-%m-%d-%H-%M-%S-%f"))
assert isinstance(dir, str)
dir = os.path.expanduser(dir)
os.makedirs(os.path.expanduser(dir), exist_ok=True)
rank = get_rank_without_mpi_import()
if rank > 0:
log_suffix = log_suffix + "-rank%03i" % rank
if format_strs is None:
if rank == 0:
format_strs = os.getenv('OPENAI_LOG_FORMAT', 'stdout,log,csv').split(',')
else:
format_strs = os.getenv('OPENAI_LOG_FORMAT_MPI', 'log').split(',')
format_strs = filter(None, format_strs)
output_formats = [make_output_format(f, dir, log_suffix) for f in format_strs]
Logger.CURRENT = Logger(dir=dir, output_formats=output_formats, comm=comm)
if output_formats:
log('Logging to %s'%dir)
def _configure_default_logger():
configure()
Logger.DEFAULT = Logger.CURRENT
def reset():
if Logger.CURRENT is not Logger.DEFAULT:
Logger.CURRENT.close()
Logger.CURRENT = Logger.DEFAULT
log('Reset logger')
@contextmanager
def scoped_configure(dir=None, format_strs=None, comm=None):
prevlogger = Logger.CURRENT
configure(dir=dir, format_strs=format_strs, comm=comm)
try:
yield
finally:
Logger.CURRENT.close()
Logger.CURRENT = prevlogger
# ================================================================
def _demo():
info("hi")
debug("shouldn't appear")
set_level(DEBUG)
debug("should appear")
dir = "/tmp/testlogging"
if os.path.exists(dir):
shutil.rmtree(dir)
configure(dir=dir)
logkv("a", 3)
logkv("b", 2.5)
dumpkvs()
logkv("b", -2.5)
logkv("a", 5.5)
dumpkvs()
info("^^^ should see a = 5.5")
logkv_mean("b", -22.5)
logkv_mean("b", -44.4)
logkv("a", 5.5)
dumpkvs()
info("^^^ should see b = -33.3")
logkv("b", -2.5)
dumpkvs()
logkv("a", "longasslongasslongasslongasslongasslongassvalue")
dumpkvs()
# ================================================================
# Readers
# ================================================================
def read_json(fname):
import pandas
ds = []
with open(fname, 'rt') as fh:
for line in fh:
ds.append(json.loads(line))
return pandas.DataFrame(ds)
def read_csv(fname):
import pandas
return pandas.read_csv(fname, index_col=None, comment='#')
def read_tb(path):
"""
path : a tensorboard file OR a directory, where we will find all TB files
of the form events.*
"""
import pandas
import numpy as np
from glob import glob
import tensorflow as tf
if osp.isdir(path):
fnames = glob(osp.join(path, "events.*"))
elif osp.basename(path).startswith("events."):
fnames = [path]
else:
raise NotImplementedError("Expected tensorboard file or directory containing them. Got %s"%path)
tag2pairs = defaultdict(list)
maxstep = 0
for fname in fnames:
for summary in tf.train.summary_iterator(fname):
if summary.step > 0:
for v in summary.summary.value:
pair = (summary.step, v.simple_value)
tag2pairs[v.tag].append(pair)
maxstep = max(summary.step, maxstep)
data = np.empty((maxstep, len(tag2pairs)))
data[:] = np.nan
tags = sorted(tag2pairs.keys())
for (colidx,tag) in enumerate(tags):
pairs = tag2pairs[tag]
for (step, value) in pairs:
data[step-1, colidx] = value
return pandas.DataFrame(data, columns=tags)
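# Hedged usage note (editor's): read_tb returns a DataFrame with one column per
# TB tag; row i holds the value logged at step i+1 (NaN where absent). The path
# below is illustrative only.
#   df = read_tb('/tmp/testlogging/tb')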
if __name__ == "__main__":
_demo()
================================================
FILE: rl/vec_env/running_mean_std.py
================================================
import numpy as np
class RunningMeanStd(object):
def __init__(self, epsilon = 1e-4, shape = ()):
"""
Calculates the running mean and std of a data stream
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
:param epsilon: helps with arithmetic issues
:param shape: the shape of the data stream's output
"""
self.mean = np.zeros(shape, np.float64)
self.var = np.ones(shape, np.float64)
self.count = epsilon
def update(self, arr):
batch_mean = np.mean(arr, axis=0)
batch_var = np.var(arr, axis=0)
batch_count = arr.shape[0]
self.update_from_moments(batch_mean, batch_var, batch_count)
def update_from_moments(self, batch_mean, batch_var, batch_count):
delta = batch_mean - self.mean
tot_count = self.count + batch_count
new_mean = self.mean + delta * batch_count / tot_count
m_a = self.var * self.count
m_b = batch_var * batch_count
m_2 = m_a + m_b + np.square(delta) * self.count * batch_count / (self.count + batch_count)
new_var = m_2 / (self.count + batch_count)
new_count = batch_count + self.count
self.mean = new_mean
self.var = new_var
self.count = new_count
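# Hedged usage sketch (editor's note): streaming updates should converge to the
# batch statistics of everything seen so far (up to the tiny epsilon prior).
def _running_mean_std_demo():
    rms = RunningMeanStd(shape=(3,))
    data = np.random.randn(1000, 3)
    for chunk in np.array_split(data, 10):
        rms.update(chunk)
    # both should print True
    print(np.allclose(rms.mean, data.mean(axis=0), atol=1e-3))
    print(np.allclose(rms.var, data.var(axis=0), atol=1e-3))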
================================================
FILE: rl/vec_env/shmem_vec_env.py
================================================
"""
An interface for asynchronous vectorized environments.
"""
import multiprocessing as mp
import numpy as np
from .vec_env import VecEnv, CloudpickleWrapper, clear_mpi_env_vars
import ctypes
from .util import dict_to_obs, obs_space_info, obs_to_dict
_NP_TO_CT = {np.float32: ctypes.c_float,
np.int32: ctypes.c_int32,
np.int8: ctypes.c_int8,
np.uint8: ctypes.c_char,
np.bool_: ctypes.c_bool}  # np.bool_ (not the removed np.bool alias) is what dtype.type yields for bool arrays
class ShmemVecEnv(VecEnv):
"""
Optimized version of SubprocVecEnv that uses shared memory to communicate observations.
"""
def __init__(self, env_fns, spaces=None, context='spawn'):
"""
If you don't specify observation_space, we'll have to create a dummy
environment to get it.
"""
ctx = mp.get_context(context)
if spaces:
observation_space, action_space = spaces
else:
print('Creating dummy env object to get spaces')
dummy = env_fns[0]()
observation_space, action_space = dummy.observation_space, dummy.action_space
dummy.close()
del dummy
VecEnv.__init__(self, len(env_fns), observation_space, action_space)
self.obs_keys, self.obs_shapes, self.obs_dtypes = obs_space_info(observation_space)
self.obs_bufs = [
{k: ctx.Array(_NP_TO_CT[self.obs_dtypes[k].type], int(np.prod(self.obs_shapes[k]))) for k in self.obs_keys}
for _ in env_fns]
self.obs_list=[]
self.parent_pipes = []
self.procs = []
with clear_mpi_env_vars():
for env_fn, obs_buf in zip(env_fns, self.obs_bufs):
wrapped_fn = CloudpickleWrapper(env_fn)
parent_pipe, child_pipe = ctx.Pipe()
proc = ctx.Process(target=_subproc_worker,
args=(child_pipe, parent_pipe, wrapped_fn, obs_buf, self.obs_shapes, self.obs_dtypes, self.obs_keys))
proc.daemon = True
self.procs.append(proc)
self.parent_pipes.append(parent_pipe)
proc.start()
child_pipe.close()
self.waiting_step = False
self.viewer = None
def reset(self):
if self.waiting_step:
print('Called reset() while waiting for the step to complete')
self.step_wait()
for pipe in self.parent_pipes:
pipe.send(('reset', None))
return self._decode_obses([pipe.recv() for pipe in self.parent_pipes])
def step_async(self, actions):
assert len(actions) == len(self.parent_pipes)
for pipe, act in zip(self.parent_pipes, actions):
pipe.send(('step', act))
self.waiting_step = True
def step_wait(self):
outs = [pipe.recv() for pipe in self.parent_pipes]
self.waiting_step = False
obs, rews, dones, infos = zip(*outs)
return self._decode_obses(obs), np.array(rews), np.array(dones), infos
def talk2Env_async(self, data):
assert len(data) == len(self.parent_pipes)
for pipe, d in zip(self.parent_pipes, data):
pipe.send(('talk2Env', d))
self.waiting_step = True
def talk2Env_wait(self):
outs = [pipe.recv() for pipe in self.parent_pipes] # pipe.recv() is a blocking call
self.waiting_step = False
return np.array(outs)
def close_extras(self):
if self.waiting_step:
self.step_wait()
for pipe in self.parent_pipes:
pipe.send(('close', None))
for pipe in self.parent_pipes:
pipe.recv()
pipe.close()
for proc in self.procs:
proc.join()
def get_images(self, mode='human'):
for pipe in self.parent_pipes:
pipe.send(('render', None))
return [pipe.recv() for pipe in self.parent_pipes]
def _decode_obses(self, obs):
result = {}
self.obs_list=[{} for _ in range(len(self.obs_bufs))]
for k in self.obs_keys:
bufs = [b[k] for b in self.obs_bufs]
o=[]
for idx, b in enumerate(bufs):
item=np.frombuffer(b.get_obj(), dtype=self.obs_dtypes[k]).reshape(self.obs_shapes[k])
o.append(item)
self.obs_list[idx][k]=item
result[k] = np.array(o)
return dict_to_obs(result)
def _subproc_worker(pipe, parent_pipe, env_fn_wrapper, obs_bufs, obs_shapes, obs_dtypes, keys):
"""
Control a single environment instance using IPC and
shared memory.
"""
def _write_obs(maybe_dict_obs):
flatdict = obs_to_dict(maybe_dict_obs)
for k in keys:
dst = obs_bufs[k].get_obj()
dst_np = np.frombuffer(dst, dtype=obs_dtypes[k]).reshape(obs_shapes[k]) # pylint: disable=W0212
np.copyto(dst_np, flatdict[k])
env = env_fn_wrapper.x()
parent_pipe.close()
try:
while True:
cmd, data = pipe.recv()
if cmd == 'reset':
pipe.send(_write_obs(env.reset()))
elif cmd == 'step':
obs, reward, done, info = env.step(data)
if done:
obs = env.reset()
pipe.send((_write_obs(obs), reward, done, info))
elif cmd == 'render':
pipe.send(env.render(mode='rgb_array'))
elif cmd == 'close':
pipe.send(None)
break
elif cmd == 'talk2Env':
pipe.send(env.talk2Env(data))
else:
raise RuntimeError('Got unrecognized cmd %s' % cmd)
except KeyboardInterrupt:
print('ShmemVecEnv worker: got KeyboardInterrupt')
finally:
env.close()
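# Hedged usage sketch (editor's note): 'CartPole-v1' is a placeholder env id;
# any picklable env factory works. Each factory runs in its own process and
# publishes observations through the shared-memory buffers set up above.
def _shmem_vec_env_demo():
    import gym
    env_fns = [lambda: gym.make('CartPole-v1') for _ in range(4)]
    venv = ShmemVecEnv(env_fns)
    obs = venv.reset()  # batched observation, leading dimension 4
    actions = [venv.action_space.sample() for _ in range(venv.num_envs)]
    obs, rews, dones, infos = venv.step(actions)
    venv.close()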
================================================
FILE: rl/vec_env/tile_images.py
================================================
import numpy as np
def tile_images(img_nhwc):
"""
Tile N images into one big PxQ image
(P,Q) are chosen to be as close as possible, and if N
is square, then P=Q.
input: img_nhwc, list or array of images, ndim=4 once turned into array
n = batch index, h = height, w = width, c = channel
returns:
bigim_HWc, ndarray with ndim=3
"""
img_nhwc = np.asarray(img_nhwc)
N, h, w, c = img_nhwc.shape
H = int(np.ceil(np.sqrt(N)))
W = int(np.ceil(float(N)/H))
img_nhwc = np.array(list(img_nhwc) + [img_nhwc[0]*0 for _ in range(N, H*W)])
img_HWhwc = img_nhwc.reshape(H, W, h, w, c)
img_HhWwc = img_HWhwc.transpose(0, 2, 1, 3, 4)
img_Hh_Ww_c = img_HhWwc.reshape(H*h, W*w, c)
return img_Hh_Ww_c
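# Hedged worked example (editor's note): 16 images of shape (32, 32, 3) tile
# into a 4x4 grid, so the output is (4*32, 4*32, 3).
def _tile_images_demo():
    imgs = np.zeros((16, 32, 32, 3), dtype=np.uint8)
    assert tile_images(imgs).shape == (128, 128, 3)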
================================================
FILE: rl/vec_env/util.py
================================================
"""
Helpers for dealing with vectorized environments.
"""
from collections import OrderedDict
import gym
import numpy as np
def copy_obs_dict(obs):
"""
Deep-copy an observation dict.
"""
return {k: np.copy(v) for k, v in obs.items()}
def dict_to_obs(obs_dict):
"""
Convert an observation dict into a raw array if the
original observation space was not a Dict space.
"""
if set(obs_dict.keys()) == {None}:
return obs_dict[None]
return obs_dict
def obs_space_info(obs_space):
"""
Get dict-structured information about a gym.Space.
Returns:
A tuple (keys, shapes, dtypes):
keys: a list of dict keys.
shapes: a dict mapping keys to shapes.
dtypes: a dict mapping keys to dtypes.
"""
if isinstance(obs_space, gym.spaces.Dict):
assert isinstance(obs_space.spaces, OrderedDict)
subspaces = obs_space.spaces
elif isinstance(obs_space, gym.spaces.Tuple):
assert isinstance(obs_space.spaces, tuple)
subspaces = {i: obs_space.spaces[i] for i in range(len(obs_space.spaces))}
else:
subspaces = {None: obs_space}
keys = []
shapes = {}
dtypes = {}
for key, box in subspaces.items():
keys.append(key)
shapes[key] = box.shape
dtypes[key] = box.dtype
return keys, shapes, dtypes
def obs_to_dict(obs):
"""
Convert an observation into a dict.
"""
if isinstance(obs, dict):
return obs
return {None: obs}
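# Hedged usage sketch (editor's note): a Dict space reports its own keys, while
# a plain (non-Dict, non-Tuple) space reports the single key None, which is what
# lets dict_to_obs above round-trip raw arrays.
def _obs_space_info_demo():
    space = gym.spaces.Dict({'pos': gym.spaces.Box(low=-1., high=1., shape=(2,))})
    keys, shapes, dtypes = obs_space_info(space)
    # keys == ['pos'], shapes == {'pos': (2,)}, dtypes == {'pos': dtype('float32')}
    return keys, shapes, dtypes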
================================================
FILE: rl/vec_env/vec_env.py
================================================
import contextlib
import os
import abc
from abc import abstractmethod
ABC = abc.ABCMeta('ABC', (), {})  # Python 2/3 compatible abstract base class
from .tile_images import tile_images  # needed by VecEnv.render() below
class AlreadySteppingError(Exception):
"""
Raised when an asynchronous step is running while
step_async() is called again.
"""
def __init__(self):
msg = 'already running an async step'
Exception.__init__(self, msg)
class NotSteppingError(Exception):
"""
Raised when an asynchronous step is not running but
step_wait() is called.
"""
def __init__(self):
msg = 'not running an async step'
Exception.__init__(self, msg)
class VecEnv(ABC):
"""
An abstract asynchronous, vectorized environment.
Used to batch data from multiple copies of an environment, so that
each observation becomes a batch of observations, and each expected action is a batch
of actions applied per-environment.
"""
closed = False
viewer = None
metadata = {
'render.modes': ['human', 'rgb_array']
}
def __init__(self, num_envs, observation_space, action_space):
self.num_envs = num_envs
self.observation_space = observation_space
self.action_space = action_space
@abstractmethod
def reset(self):
"""
Reset all the environments and return an array of
observations, or a dict of observation arrays.
If step_async is still doing work, that work will
be cancelled and step_wait() should not be called
until step_async() is invoked again.
"""
pass
@abstractmethod
def step_async(self, actions):
"""
Tell all the environments to start taking a step
with the given actions.
Call step_wait() to get the results of the step.
You should not call this if a step_async run is
already pending.
"""
pass
@abstractmethod
def step_wait(self):
"""
Wait for the step taken with step_async().
Returns (obs, rews, dones, infos):
- obs: an array of observations, or a dict of
arrays of observations.
- rews: an array of rewards
- dones: an array of "episode done" booleans
- infos: a sequence of info objects
"""
pass
@abstractmethod
def talk2Env_async(self, data):
pass
@abstractmethod
def talk2Env_wait(self):
pass
def close_extras(self):
"""
Clean up the extra resources, beyond what's in this base class.
Only runs when not self.closed.
"""
pass
def close(self):
if self.closed:
return
if self.viewer is not None:
self.viewer.close()
self.close_extras()
self.closed = True
def step(self, actions):
"""
Step the environments synchronously.
This is available for backwards compatibility.
"""
self.step_async(actions)
return self.step_wait()
def render(self, mode='human'):
imgs = self.get_images()
bigimg = tile_images(imgs)
if mode == 'human':
self.get_viewer().imshow(bigimg)
return self.get_viewer().isopen
elif mode == 'rgb_array':
return bigimg
else:
raise NotImplementedError
def talk2Env(self, data):
self.talk2Env_async(data)
return self.talk2Env_wait()
def get_images(self):
"""
Return RGB images from each environment
"""
raise NotImplementedError
@property
def unwrapped(self):
if isinstance(self, VecEnvWrapper):
return self.venv.unwrapped
else:
return self
def get_viewer(self):
if self.viewer is None:
from gym.envs.classic_control import rendering
self.viewer = rendering.SimpleImageViewer()
return self.viewer
class VecEnvWrapper(VecEnv):
"""
An environment wrapper that applies to an entire batch
of environments at once.
"""
def __init__(self, venv, observation_space=None, action_space=None):
self.venv = venv
super(VecEnvWrapper,self).__init__(num_envs=venv.num_envs,
observation_space=observation_space or venv.observation_space,
action_space=action_space or venv.action_space)
def step_async(self, actions):
self.venv.step_async(actions)
def talk2Env_async(self, data):
self.venv.talk2Env_async(data)
@abstractmethod
def talk2Env_wait(self):
pass
@abstractmethod
def reset(self):
pass
@abstractmethod
def step_wait(self):
pass
def close(self):
return self.venv.close()
def render(self, mode='human'):
return self.venv.render(mode=mode)
def get_images(self):
return self.venv.get_images()
def __getattr__(self, name):
if name.startswith('_'):
raise AttributeError("attempted to get missing private attribute '{}'".format(name))
return getattr(self.venv, name)
class VecEnvObservationWrapper(VecEnvWrapper):
@abstractmethod
def process(self, obs):
pass
def reset(self):
obs = self.venv.reset()
return self.process(obs)
def step_wait(self):
obs, rews, dones, infos = self.venv.step_wait()
return self.process(obs), rews, dones, infos
class CloudpickleWrapper(object):
"""
Uses cloudpickle to serialize contents (otherwise multiprocessing tries to use pickle)
"""
def __init__(self, x):
self.x = x
def __getstate__(self):
import cloudpickle
return cloudpickle.dumps(self.x)
def __setstate__(self, ob):
import pickle
self.x = pickle.loads(ob)
@contextlib.contextmanager
def clear_mpi_env_vars():
"""
`from mpi4py import MPI` calls MPI_Init by default. If a child process inherits MPI
environment variables, MPI assumes the child is an MPI process just like the parent
and can misbehave (e.g., hang).
This context manager is a hack that temporarily clears those variables, e.g., while
starting multiprocessing Processes.
"""
removed_environment = {}
for k, v in list(os.environ.items()):
for prefix in ['OMPI_', 'PMI_']:
if k.startswith(prefix):
removed_environment[k] = v
del os.environ[k]
try:
yield
finally:
os.environ.update(removed_environment)
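# Hedged usage sketch (editor's note): step() above is just this async pair run
# back to back; callers that want to overlap computation with environment
# stepping can issue step_async(), do other work, then collect with step_wait().
def _async_step_sketch(venv, actions):
    venv.step_async(actions)   # dispatch one action per sub-environment
    # ... other work here (logging, preparing the next network input, ...) ...
    return venv.step_wait()    # block until every sub-environment reports back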
================================================
FILE: rl/vec_env/vec_frame_stack.py
================================================
from .vec_env import VecEnvWrapper
import numpy as np
from gym import spaces
class VecFrameStack(VecEnvWrapper):
def __init__(self, venv, nstack):
self.venv = venv
self.nstack = nstack
wos = venv.observation_space # wrapped ob space
low = np.repeat(wos.low, self.nstack, axis=-1)
high = np.repeat(wos.high, self.nstack, axis=-1)
self.stackedobs = np.zeros((venv.num_envs,) + low.shape, low.dtype)
observation_space = spaces.Box(low=low, high=high, dtype=venv.observation_space.dtype)
VecEnvWrapper.__init__(self, venv, observation_space=observation_space)
def step_wait(self):
obs, rews, news, infos = self.venv.step_wait()
self.stackedobs = np.roll(self.stackedobs, shift=-1, axis=-1)
for (i, new) in enumerate(news):
if new:
self.stackedobs[i] = 0
self.stackedobs[..., -obs.shape[-1]:] = obs
return self.stackedobs, rews, news, infos
def reset(self):
obs = self.venv.reset()
self.stackedobs[...] = 0
self.stackedobs[..., -obs.shape[-1]:] = obs
return self.stackedobs
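# Hedged usage sketch (editor's note): with nstack=4 and a wrapped observation
# shape of (84, 84, 1) (illustrative), the stacked space is (84, 84, 4); np.roll
# above shifts history so the newest frame always occupies the trailing slice.
def _frame_stack_sketch(venv):
    stacked = VecFrameStack(venv, nstack=4)
    obs = stacked.reset()  # (num_envs, 84, 84, 4); only the last slice is non-zero
    return stacked, obs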
================================================
FILE: rl/vec_env/vec_normalize.py
================================================
from . import VecEnvWrapper
import numpy as np
class VecNormalize(VecEnvWrapper):
"""
A vectorized wrapper that normalizes the observations
and returns from an environment.
"""
def __init__(self, venv, ob=True, ret=True, clipob=10., cliprew=10., gamma=0.99, epsilon=1e-8, use_tf=False):
VecEnvWrapper.__init__(self, venv)
if use_tf:
from baselines.common.running_mean_std import TfRunningMeanStd
self.ob_rms = TfRunningMeanStd(shape=self.observation_space.shape, scope='ob_rms') if ob else None
self.ret_rms = TfRunningMeanStd(shape=(), scope='ret_rms') if ret else None
else:
from .running_mean_std import RunningMeanStd  # use the local copy in rl/vec_env instead of requiring baselines
self.ob_rms = RunningMeanStd(shape=self.observation_space.shape) if ob else None
self.ret_rms = RunningMeanStd(shape=()) if ret else None
self.clipob = clipob
self.cliprew = cliprew
self.ret = np.zeros(self.num_envs)
self.gamma = gamma
self.epsilon = epsilon
def step_wait(self):
obs, rews, news, infos = self.venv.step_wait()
self.ret = self.ret * self.gamma + rews
obs = self._obfilt(obs)
if self.ret_rms:
self.ret_rms.update(self.ret)
rews = np.clip(rews / np.sqrt(self.ret_rms.var + self.epsilon), -self.cliprew, self.cliprew)
self.ret[news] = 0.
return obs, rews, news, infos
def _obfilt(self, obs):
if self.ob_rms:
self.ob_rms.update(obs)
obs = np.clip((obs - self.ob_rms.mean) / np.sqrt(self.ob_rms.var + self.epsilon), -self.clipob, self.clipob)
return obs
else:
return obs
def reset(self):
self.ret = np.zeros(self.num_envs)
obs = self.venv.reset()
return self._obfilt(obs)
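# Hedged usage sketch (editor's note): observations are whitened with running
# statistics, while rewards are only scaled by the running std of the discounted
# return (no mean subtraction); both are then clipped to clipob / cliprew.
def _vec_normalize_sketch(venv):
    normed = VecNormalize(venv, ob=True, ret=True, clipob=10., cliprew=10., gamma=0.99)
    obs = normed.reset()
    actions = [normed.action_space.sample() for _ in range(normed.num_envs)]
    return normed.step(actions)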
================================================
FILE: rl/vec_env/vec_pretext_normalize.py
================================================
from . import VecEnvWrapper
import numpy as np
from .running_mean_std import RunningMeanStd
import torch
import os
from collections import deque
import copy
import pickle
from gst_updated.src.gumbel_social_transformer.temperature_scheduler import Temp_Scheduler
from gst_updated.scripts.wrapper.crowd_nav_interface_parallel import CrowdNavPredInterfaceMultiEnv
class VecPretextNormalize(VecEnvWrapper):
"""
A vectorized wrapper that processes the observations and rewards, used for GST predictors
and returns from an environment.
config: a Config object
test: whether we are training or testing
"""
def __init__(self, venv, ob=False, ret=False, clipob=10., cliprew=10., gamma=0.99, epsilon=1e-8, config=None, test=False):
VecEnvWrapper.__init__(self, venv)
self.config = config
self.device=torch.device(self.config.training.device)
if test:
self.num_envs = 1
else:
self.num_envs = self.config.env.num_processes
self.max_human_num = config.sim.human_num + config.sim.human_num_range
self.ob_rms = RunningMeanStd(shape=self.observation_space.shape) if ob else None
self.ret_rms = RunningMeanStd(shape=()) if ret else None
self.clipob = clipob
self.cliprew = cliprew
self.ret = torch.zeros(self.num_envs).to(self.device)
self.gamma = gamma
self.epsilon = epsilon
# load and configure the prediction model
load_path = os.path.join(os.getcwd(), self.config.pred.model_dir)
if not os.path.isdir(load_path):
raise RuntimeError('The result directory was not found.')
checkpoint_dir = os.path.join(load_path, 'checkpoint')
with open(os.path.join(checkpoint_dir, 'args.pickle'), 'rb') as f:
self.args = pickle.load(f)
self.predictor = CrowdNavPredInterfaceMultiEnv(load_path=load_path, device=self.device, config = self.args, num_env = self.num_envs)
temperature_scheduler = Temp_Scheduler(self.args.num_epochs, self.args.init_temp, self.args.init_temp, temp_min=0.03)
self.tau = temperature_scheduler.decay_whole_process(epoch=100)
# handle different prediction and control frequency
self.pred_interval = int(self.config.data.pred_timestep//self.config.env.time_step)
self.buffer_len = (self.args.obs_seq_len - 1) * self.pred_interval + 1
def talk2Env_async(self, data):
self.venv.talk2Env_async(data)
def talk2Env_wait(self):
outs=self.venv.talk2Env_wait()
return outs
def step_wait(self):
obs, rews, done, infos = self.venv.step_wait()
# process the observations and reward
obs, rews = self.process_obs_rew(obs, done, rews=rews)
return obs, rews, done, infos
def _obfilt(self, obs):
if self.ob_rms and self.config.RLTrain:
self.ob_rms.update(obs)
obs = np.clip((obs - self.ob_rms.mean) / np.sqrt(self.ob_rms.var + self.epsilon), -self.clipob, self.clipob)
return obs
else:
return obs
def reset(self):
# queue for inputs to the pred model
# fill the queue with dummy values
self.traj_buffer = deque(list(-torch.ones((self.buffer_len, self.num_envs, self.max_human_num, 2), device=self.device)*999),
maxlen=self.buffer_len) # (n_env, num_peds, obs_seq_len, 2)
self.mask_buffer = deque(list(torch.zeros((self.buffer_len, self.num_envs, self.max_human_num, 1), dtype=torch.bool, device=self.device)),
maxlen=self.buffer_len) # (n_env, num_peds, obs_seq_len, 1)
self.step_counter = 0
# for calculating the displacement of human positions
self.last_pos = torch.zeros(self.num_envs, self.max_human_num, 2).to(self.device)
obs = self.venv.reset()
obs, _ = self.process_obs_rew(obs, np.zeros(self.num_envs))
return obs
'''
1. Process observations:
Run inference on pred model with past obs as inputs, fill in the predicted trajectory in O['spatial_edges']
2. Process rewards:
Calculate reward for colliding with predicted future traj and add to the original reward,
same as calc_reward() function in crowd_sim_pred.py except the data are torch tensors
'''
def process_obs_rew(self, O, done, rews=0.):
# O: robot_node: [nenv, 1, 7], spatial_edges: [nenv, observed_human_num, 2*(1+predict_steps)],temporal_edges: [nenv, 1, 2],
# pos_mask: [nenv, max_human_num], pos_disp_mask: [nenv, max_human_num]
# prepare inputs for pred_model
# find humans' absolute positions
human_pos = O['robot_node'][:, :, :2] + O['spatial_edges'][:, :, :2]
# insert the new ob to deque
self.traj_buffer.append(human_pos)
self.mask_buffer.append(O['visible_masks'].unsqueeze(-1))
# [obs_seq_len, nenv, max_human_num, 2] -> [nenv, max_human_num, obs_seq_len, 2]
in_traj = torch.stack(list(self.traj_buffer)).permute(1, 2, 0, 3)
in_mask = torch.stack(list(self.mask_buffer)).permute(1, 2, 0, 3).float()
# index select the input traj and input mask for GST
in_traj = in_traj[:, :, ::self.pred_interval]
in_mask = in_mask[:, :, ::self.pred_interval]
# forward predictor model
out_traj, out_mask = self.predictor.forward(input_traj=in_traj, input_binary_mask=in_mask)
out_mask = out_mask.bool()
# add penalties if the robot collides with predicted future pos of humans
# deterministic reward, only uses mu_x, mu_y and a predefined radius
# constant radius of each personal zone circle
# [nenv, human_num, predict_steps]
hr_dist_future = out_traj[:, :, :, :2] - O['robot_node'][:, :, :2].unsqueeze(1)
# [nenv, human_num, predict_steps]
collision_idx = torch.norm(hr_dist_future, dim=-1) < self.config.robot.radius + self.config.humans.radius
# [1,1, predict_steps]
# mask out invalid predictions
# [nenv, human_num, predict_steps] AND [nenv, human_num, 1]
collision_idx = torch.logical_and(collision_idx, out_mask)
coefficients = 2. ** torch.arange(2, self.config.sim.predict_steps + 2, device=self.device).reshape(
(1, 1, self.config.sim.predict_steps)) # 4, 8, 16, 32, 64
# [1, 1, predict_steps]
collision_penalties = self.config.reward.collision_penalty / coefficients
# [nenv, human_num, predict_steps]
reward_future = collision_idx.to(torch.float)*collision_penalties
# [nenv, human_num, predict_steps] -> [nenv, human_num*predict_steps] -> [nenv,]
# keep the values & discard indices
reward_future, _ = torch.min(reward_future.reshape(self.num_envs, -1), dim=1)
# print(reward_future)
# seems that rews is on cpu
rews = rews + reward_future.reshape(self.num_envs, 1).cpu().numpy()
# get observation back to env
robot_pos = O['robot_node'][:, :, :2].unsqueeze(1)
# convert from positions in world frame to robot frame
out_traj[:, :, :, :2] = out_traj[:, :, :, :2] - robot_pos
# only take mu_x and mu_y
out_mask = out_mask.repeat(1, 1, self.config.sim.predict_steps * 2)
new_spatial_edges = out_traj[:, :, :, :2].reshape(self.num_envs, self.max_human_num, -1)
O['spatial_edges'][:, :, 2:][out_mask] = new_spatial_edges[out_mask]
# sort all humans by distance to robot
# [nenv, human_num]
hr_dist_cur = torch.linalg.norm(O['spatial_edges'][:, :, :2], dim=-1)
sorted_idx = torch.argsort(hr_dist_cur, dim=1)
# sorted_idx = sorted_idx.unsqueeze(-1).repeat(1, 1, 2*(self.config.sim.predict_steps+1))
for i in range(self.num_envs):
O['spatial_edges'][i] = O['spatial_edges'][i][sorted_idx[i]]
obs={'robot_node':O['robot_node'],
'spatial_edges':O['spatial_edges'],
'temporal_edges':O['temporal_edges'],
'visible_masks':O['visible_masks'],
'detected_human_num': O['detected_human_num'],
}
self.last_pos = copy.deepcopy(human_pos)
self.step_counter = self.step_counter + 1
return obs, rews
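# Hedged worked example (editor's note): reproduces the coefficient math from
# process_obs_rew above; predict_steps=5 and collision_penalty=-20. are
# illustrative values, not the repo's configured ones.
def _future_penalty_sketch(predict_steps=5, collision_penalty=-20.):
    coefficients = 2. ** torch.arange(2, predict_steps + 2)  # [4, 8, 16, 32, 64]
    # -> [-5.0, -2.5, -1.25, -0.625, -0.3125]: collisions predicted further in
    # the future are penalized geometrically less
    return collision_penalty / coefficients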
================================================
FILE: rl/vec_env/vec_remove_dict_obs.py
================================================
from .vec_env import VecEnvObservationWrapper
class VecExtractDictObs(VecEnvObservationWrapper):
def __init__(self, venv, key):
self.key = key
super().__init__(venv=venv,
observation_space=venv.observation_space.spaces[self.key])
def process(self, obs):
return obs[self.key]
================================================
FILE: test.py
================================================
import logging
import argparse
import os
import sys
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
from rl.networks.envs import make_vec_envs
from rl.evaluation import evaluate
from rl.networks.model import Policy
from crowd_sim import *
def main():
"""
The main function for testing a trained model
"""
# the following parameters will be determined for each test run
parser = argparse.ArgumentParser('Parse configuration file')
# the model directory that we are testing
parser.add_argument('--model_dir', type=str, default='trained_models/GST_predictor_rand')
# render the environment or not (note: with default=True and store_true, this flag is effectively always on)
parser.add_argument('--visualize', default=True, action='store_true')
# if -1, it will run 500 different cases; if >=0, it will run the specified test case repeatedly
parser.add_argument('--test_case', type=int, default=-1)
# model weight file you want to test
parser.add_argument('--test_model', type=str, default='41665.pt')
# whether to save trajectories of episodes
parser.add_argument('--render_traj', default=False, action='store_true')
# whether to save slide show of episodes
parser.add_argument('--save_slides', default=False, action='store_true')
test_args = parser.parse_args()
if test_args.save_slides:
test_args.visualize = True
from importlib import import_module
model_dir_temp = test_args.model_dir
if model_dir_temp.endswith('/'):
model_dir_temp = model_dir_temp[:-1]
# import arguments.py from saved directory
# if not found, import from the default directory
try:
model_dir_string = model_dir_temp.replace('/', '.') + '.arguments'
model_arguments = import_module(model_dir_string)
get_args = getattr(model_arguments, 'get_args')
except Exception:
print('Failed to get get_args function from ', test_args.model_dir, '/arguments.py')
from arguments import get_args
algo_args = get_args()
# import config class from saved directory
# if not found, import from the default directory
try:
model_dir_string = model_dir_temp.replace('/', '.') + '.configs.config'
model_arguments = import_module(model_dir_string)
Config = getattr(model_arguments, 'Config')
except Exception:
print('Failed to get Config class from ', test_args.model_dir)
from crowd_nav.configs.config import Config
env_config = config = Config()
# configure logging and device
# print test result in log file
log_file = os.path.join(test_args.model_dir,'test')
if not os.path.exists(log_file):
os.mkdir(log_file)
if test_args.visualize:
log_file = os.path.join(test_args.model_dir, 'test', 'test_visual.log')
else:
log_file = os.path.join(test_args.model_dir, 'test', 'test_' + test_args.test_model + '.log')
file_handler = logging.FileHandler(log_file, mode='w')
stdout_handler = logging.StreamHandler(sys.stdout)
level = logging.INFO
logging.basicConfig(level=level, handlers=[stdout_handler, file_handler],
format='%(asctime)s, %(levelname)s: %(message)s', datefmt="%Y-%m-%d %H:%M:%S")
logging.info('robot FOV %f', config.robot.FOV)
logging.info('humans FOV %f', config.humans.FOV)
torch.manual_seed(algo_args.seed)
torch.cuda.manual_seed_all(algo_args.seed)
if algo_args.cuda:
if algo_args.cuda_deterministic:
# reproducible but slower
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
else:
# not reproducible but faster
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.set_num_threads(1)
device = torch.device("cuda" if algo_args.cuda else "cpu")
logging.info('Create other envs with new settings')
# set up visualization
if test_args.visualize:
fig, ax = plt.subplots(figsize=(7, 7))
ax.set_xlim(-6.5, 6.5) # 6
ax.set_ylim(-6.5, 6.5)
ax.axes.xaxis.set_visible(False)
ax.axes.yaxis.set_visible(False)
# ax.set_xlabel('x(m)', fontsize=16)
# ax.set_ylabel('y(m)', fontsize=16)
plt.ion()
plt.show()
else:
ax = None
load_path=os.path.join(test_args.model_dir,'checkpoints', test_args.test_model)
print(load_path)
# create an environment
env_name = algo_args.env_name
eval_dir = os.path.join(test_args.model_dir,'eval')
if not os.path.exists(eval_dir):
os.mkdir(eval_dir)
env_config.render_traj = test_args.render_traj
env_config.save_slides = test_args.save_slides
env_config.save_path = os.path.join(test_args.model_dir, 'social_eval', test_args.test_model[:-3])
envs = make_vec_envs(env_name, algo_args.seed, 1,
algo_args.gamma, eval_dir, device, allow_early_resets=True,
config=env_config, ax=ax, test_case=test_args.test_case, pretext_wrapper=config.env.use_wrapper)
if config.robot.policy not in ['orca', 'social_force']:
# load the policy weights
actor_critic = Policy(
envs.observation_space.spaces,
envs.action_space,
base_kwargs=algo_args,
base=config.robot.policy)
actor_critic.load_state_dict(torch.load(load_path, map_location=device))
actor_critic.base.nenv = 1
# allow the usage of multiple GPUs to increase the number of examples processed simultaneously
nn.DataParallel(actor_critic).to(device)
else:
actor_critic = None
test_size = config.env.test_size
# call the evaluation function
evaluate(actor_critic, envs, 1, device, test_size, logging, config, algo_args, test_args.visualize)
if __name__ == '__main__':
main()
================================================
FILE: train.py
================================================
import os
import shutil
import time
from collections import deque
import numpy as np
import torch
import torch.nn as nn
import pandas as pd
import matplotlib.pyplot as plt
from rl import ppo
from rl.networks import network_utils
from arguments import get_args
from rl.networks.envs import make_vec_envs
from rl.networks.model import Policy
from rl.networks.storage import RolloutStorage
from crowd_nav.configs.config import Config
from crowd_sim import *
def main():
"""
main function for training a robot policy network
"""
# read arguments
algo_args = get_args()
# create a directory for saving the logs and weights
if not os.path.exists(algo_args.output_dir):
os.makedirs(algo_args.output_dir)
# if output_dir exists and overwrite = False
elif not algo_args.overwrite:
raise ValueError('output_dir already exists!')
save_config_dir = os.path.join(algo_args.output_dir, 'configs')
if not os.path.exists(save_config_dir):
os.makedirs(save_config_dir)
shutil.copy('crowd_nav/configs/config.py', save_config_dir)
shutil.copy('crowd_nav/configs/__init__.py', save_config_dir)
shutil.copy('arguments.py', algo_args.output_dir)
env_config = config = Config()
torch.manual_seed(algo_args.seed)
torch.cuda.manual_seed_all(algo_args.seed)
if algo_args.cuda:
if algo_args.cuda_deterministic:
# reproducible but slower
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
else:
# not reproducible but faster
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.set_num_threads(algo_args.num_threads)
device = torch.device("cuda" if algo_args.cuda else "cpu")
env_name = algo_args.env_name
# for visualization: rendering requires a single, synchronous environment
if config.sim.render:
    algo_args.num_processes = 1
    algo_args.num_mini_batch = 1
    fig, ax = plt.subplots(figsize=(7, 7))
    ax.set_xlim(-10, 10)
    ax.set_ylim(-10, 10)
    ax.set_xlabel('x(m)', fontsize=16)
    ax.set_ylabel('y(m)', fontsize=16)
    plt.ion()
    plt.show()
else:
    ax = None
# Create a wrapped, monitored VecEnv
envs = make_vec_envs(env_name, algo_args.seed, algo_args.num_processes,
algo_args.gamma, None, device, False, config=env_config, ax=ax, pretext_wrapper=config.env.use_wrapper)
# create a policy network
actor_critic = Policy(
envs.observation_space.spaces, # pass the Dict into policy to parse
envs.action_space,
base_kwargs=algo_args,
base=config.robot.policy)
# storage buffer to store the agent's experience
rollouts = RolloutStorage(algo_args.num_steps,
algo_args.num_processes,
envs.observation_space.spaces,
envs.action_space,
algo_args.human_node_rnn_size,
algo_args.human_human_edge_rnn_size)
# continue training from an existing model if resume = True
if algo_args.resume:
load_path = config.training.load_path
actor_critic.load_state_dict(torch.load(load_path))
print("Loaded the following checkpoint:", load_path)
# allow the usage of multiple GPUs to increase the number of examples processed simultaneously
nn.DataParallel(actor_critic).to(device)
# create the ppo optimizer
agent = ppo.PPO(
actor_critic,
algo_args.clip_param,
algo_args.ppo_epoch,
algo_args.num_mini_batch,
algo_args.value_loss_coef,
algo_args.entropy_coef,
lr=algo_args.lr,
eps=algo_args.eps,
max_grad_norm=algo_args.max_grad_norm)
obs = envs.reset()
if isinstance(obs, dict):
for key in obs:
rollouts.obs[key][0].copy_(obs[key])
else:
rollouts.obs[0].copy_(obs)
rollouts.to(device)
episode_rewards = deque(maxlen=100)
start = time.time()
num_updates = int(
algo_args.num_env_steps) // algo_args.num_steps // algo_args.num_processes
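# e.g. with (hypothetical) defaults num_env_steps=2e7, num_steps=30,
# num_processes=16: num_updates = 20_000_000 // 30 // 16 = 41666, so the
# last checkpoint saved below would be 41665.pt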
# start the training loop
for j in range(num_updates):
# schedule learning rate if needed
if algo_args.use_linear_lr_decay:
network_utils.update_linear_schedule(
agent.optimizer, j, num_updates,
agent.optimizer.lr if algo_args.algo == "acktr" else algo_args.lr)
# step the environment for a few times
for step in range(algo_args.num_steps):
# Sample actions
with torch.no_grad():
rollouts_obs = {}
for key in rollouts.obs:
rollouts_obs[key] = rollouts.obs[key][step]
rollouts_hidden_s = {}
for key in rollouts.recurrent_hidden_states:
rollouts_hidden_s[key] = rollouts.recurrent_hidden_states[key][step]
value, action, action_log_prob, recurrent_hidden_states = actor_critic.act(
rollouts_obs, rollouts_hidden_s,
rollouts.masks[step])
# if we use real prediction, send predictions to env for rendering
if env_name == 'CrowdSimPredRealGST-v0' and env_config.env.use_wrapper:
# [nenv, max_human_num, 2*(pred_steps+1)] -> [nenv, max_human_num, 2*pred_steps]
out_pred = rollouts_obs['spatial_edges'][:, :, 2:].to('cpu').numpy()
# send manager action to all processes
ack = envs.talk2Env(out_pred)
assert all(ack)
if config.sim.render:
envs.render()
# Observe reward and next obs
obs, reward, done, infos = envs.step(action)
for info in infos:
if 'episode' in info.keys():
episode_rewards.append(info['episode']['r'])
# If done then clean the history of observations.
masks = torch.FloatTensor(
[[0.0] if done_ else [1.0] for done_ in done])
bad_masks = torch.FloatTensor(
[[0.0] if 'bad_transition' in info.keys() else [1.0]
for info in infos])
rollouts.insert(obs, recurrent_hidden_states, action,
action_log_prob, value, reward, masks, bad_masks)
# store the stepped experience to buffer
with torch.no_grad():
rollouts_obs = {}
for key in rollouts.obs:
rollouts_obs[key] = rollouts.obs[key][-1]
rollouts_hidden_s = {}
for key in rollouts.recurrent_hidden_states:
rollouts_hidden_s[key] = rollouts.recurrent_hidden_states[key][-1]
next_value = actor_critic.get_value(
rollouts_obs, rollouts_hidden_s,
rollouts.masks[-1]).detach()
# compute advantage and gradient, and update the network parameters
rollouts.compute_returns(next_value, algo_args.use_gae, algo_args.gamma,
algo_args.gae_lambda, algo_args.use_proper_time_limits)
value_loss, action_loss, dist_entropy = agent.update(rollouts)
rollouts.after_update()
# save the model for every interval-th episode or for the last epoch
if (j % algo_args.save_interval == 0
or j == num_updates - 1) :
save_path = os.path.join(algo_args.output_dir, 'checkpoints')
if not os.path.exists(save_path):
os.mkdir(save_path)
torch.save(actor_critic.state_dict(), os.path.join(save_path, '%.5i'%j + ".pt"))
if j % algo_args.log_interval == 0 and len(episode_rewards) > 1:
total_num_steps = (j + 1) * algo_args.num_processes * algo_args.num_steps
end = time.time()
print(
"Updates {}, num timesteps {}, FPS {} \n Last {} training episodes: mean/median reward "
"{:.1f}/{:.1f}, min/max reward {:.1f}/{:.1f}\n"
.format(j, total_num_steps,
int(total_num_steps / (end - start)),
len(episode_rewards), np.mean(episode_rewards),
np.median(episode_rewards), np.min(episode_rewards),
np.max(episode_rewards), dist_entropy, value_loss,
action_loss))
df = pd.DataFrame({'misc/nupdates': [j], 'misc/total_timesteps': [total_num_steps],
'fps': int(total_num_steps / (end - start)), 'eprewmean': [np.mean(episode_rewards)],
'loss/policy_entropy': dist_entropy, 'loss/policy_loss': action_loss,
'loss/value_loss': value_loss})
if os.path.exists(os.path.join(algo_args.output_dir, 'progress.csv')) and j > 20:
df.to_csv(os.path.join(algo_args.output_dir, 'progress.csv'), mode='a', header=False, index=False)
else:
df.to_csv(os.path.join(algo_args.output_dir, 'progress.csv'), mode='w', header=True, index=False)
envs.close()
if __name__ == '__main__':
main()