Repository: princeton-computational-imaging/scenario-dreamer
Branch: main
Commit: 675423469766
Files: 274
Total size: 51.8 MB

Directory structure:
gitextract_kak740hh/

├── .gitignore
├── .gitmodules
├── README.md
├── cfgs/
│   ├── config.py
│   ├── config.yaml
│   ├── datamodule/
│   │   ├── base.yaml
│   │   ├── nuplan_autoencoder.yaml
│   │   ├── nuplan_ctrl_sim.yaml
│   │   ├── nuplan_ldm.yaml
│   │   ├── waymo_autoencoder.yaml
│   │   ├── waymo_ctrl_sim.yaml
│   │   └── waymo_ldm.yaml
│   ├── dataset/
│   │   ├── nuplan_autoencoder.yaml
│   │   ├── nuplan_base.yaml
│   │   ├── nuplan_ctrl_sim.yaml
│   │   ├── nuplan_ldm.yaml
│   │   ├── waymo_autoencoder.yaml
│   │   ├── waymo_base.yaml
│   │   ├── waymo_ctrl_sim.yaml
│   │   └── waymo_ldm.yaml
│   ├── dataset_name/
│   │   ├── nuplan.yaml
│   │   └── waymo.yaml
│   ├── eval/
│   │   ├── base.yaml
│   │   ├── nuplan_autoencoder.yaml
│   │   ├── nuplan_ldm.yaml
│   │   ├── waymo_autoencoder.yaml
│   │   └── waymo_ldm.yaml
│   ├── model/
│   │   ├── autoencoder.yaml
│   │   ├── ldm.yaml
│   │   ├── nuplan_autoencoder.yaml
│   │   ├── nuplan_ctrl_sim.yaml
│   │   ├── nuplan_ldm.yaml
│   │   ├── waymo_autoencoder.yaml
│   │   ├── waymo_ctrl_sim.yaml
│   │   └── waymo_ldm.yaml
│   ├── sim/
│   │   ├── base.yaml
│   │   ├── scenario_dreamer_100m.yaml
│   │   ├── scenario_dreamer_100m_adv.yaml
│   │   ├── scenario_dreamer_55m.yaml
│   │   ├── waymo_ctrl_sim.yaml
│   │   └── waymo_log_replay.yaml
│   └── train/
│       ├── base.yaml
│       ├── nuplan_autoencoder.yaml
│       ├── nuplan_ctrl_sim.yaml
│       ├── nuplan_ldm.yaml
│       ├── waymo_autoencoder.yaml
│       ├── waymo_ctrl_sim.yaml
│       └── waymo_ldm.yaml
├── data_processing/
│   ├── nuplan/
│   │   ├── generate_nuplan_dataset.py
│   │   └── preprocess_dataset_nuplan.py
│   ├── postprocess_simulation_environments.py
│   └── waymo/
│       ├── add_nocturne_compatible_val_scenarios_to_test.py
│       ├── convert_pickles_to_jsons.py
│       ├── create_gpudrive_pickles.py
│       ├── create_waymo_eval_set.py
│       ├── generate_k_disks_vocabulary.py
│       ├── generate_waymo_dataset.py
│       └── preprocess_dataset_waymo.py
├── datamodules/
│   ├── nuplan/
│   │   ├── nuplan_datamodule_autoencoder.py
│   │   └── nuplan_datamodule_ldm.py
│   └── waymo/
│       ├── waymo_datamodule_autoencoder.py
│       ├── waymo_datamodule_ctrl_sim.py
│       └── waymo_datamodule_ldm.py
├── datasets/
│   ├── nuplan/
│   │   ├── dataset_autoencoder_nuplan.py
│   │   └── dataset_ldm_nuplan.py
│   └── waymo/
│       ├── dataset_autoencoder_waymo.py
│       ├── dataset_ctrl_sim.py
│       └── dataset_ldm_waymo.py
├── environment.yml
├── eval.py
├── metadata/
│   ├── gpudrive_checkpoint/
│   │   └── pretrained.pt
│   ├── initial_prob_matrix_nuplan.pt
│   ├── initial_prob_matrix_waymo.pt
│   ├── inpainting_prob_matrix_nuplan.pt
│   ├── inpainting_prob_matrix_waymo.pt
│   ├── k_disks_vocab_384_10Hz_seed42.pkl
│   ├── latent_stats/
│   │   ├── scenario_dreamer_autoencoder_nuplan.pkl
│   │   └── scenario_dreamer_autoencoder_waymo.pkl
│   ├── nocturne_test_filenames.pkl
│   ├── nocturne_train_filenames.pkl
│   ├── nocturne_val_filenames.pkl
│   ├── nuplan_eval_set.pkl
│   ├── simulation_environment_datasets/
│   │   ├── scenario_dreamer_waymo_200m_jsons/
│   │   │   ├── 0_2.json
│   │   │   ├── 0_7.json
│   │   │   ├── 10_11.json
│   │   │   ├── 10_3.json
│   │   │   ├── 10_4.json
│   │   │   ├── 11_13.json
│   │   │   ├── 11_5.json
│   │   │   ├── 11_6.json
│   │   │   ├── 12_7.json
│   │   │   ├── 12_8.json
│   │   │   ├── 13_10.json
│   │   │   ├── 13_6.json
│   │   │   ├── 13_8.json
│   │   │   ├── 14_5.json
│   │   │   ├── 14_9.json
│   │   │   ├── 15_1.json
│   │   │   ├── 15_6.json
│   │   │   ├── 15_7.json
│   │   │   ├── 16_11.json
│   │   │   ├── 16_7.json
│   │   │   ├── 17_6.json
│   │   │   ├── 17_7.json
│   │   │   ├── 18_12.json
│   │   │   ├── 19_13.json
│   │   │   ├── 1_0.json
│   │   │   ├── 1_1.json
│   │   │   ├── 1_11.json
│   │   │   ├── 1_13.json
│   │   │   ├── 1_3.json
│   │   │   ├── 20_8.json
│   │   │   ├── 21_13.json
│   │   │   ├── 22_11.json
│   │   │   ├── 22_12.json
│   │   │   ├── 23_5.json
│   │   │   ├── 23_6.json
│   │   │   ├── 23_8.json
│   │   │   ├── 24_14.json
│   │   │   ├── 24_6.json
│   │   │   ├── 25_0.json
│   │   │   ├── 25_5.json
│   │   │   ├── 25_6.json
│   │   │   ├── 26_10.json
│   │   │   ├── 26_2.json
│   │   │   ├── 26_6.json
│   │   │   ├── 26_8.json
│   │   │   ├── 27_13.json
│   │   │   ├── 27_2.json
│   │   │   ├── 28_77.json
│   │   │   ├── 28_9.json
│   │   │   ├── 29_0.json
│   │   │   ├── 2_2.json
│   │   │   ├── 2_3.json
│   │   │   ├── 30_5.json
│   │   │   ├── 30_6.json
│   │   │   ├── 31_11.json
│   │   │   ├── 3_7.json
│   │   │   ├── 4_12.json
│   │   │   ├── 4_4.json
│   │   │   ├── 4_7.json
│   │   │   ├── 4_9.json
│   │   │   ├── 5_1.json
│   │   │   ├── 5_10.json
│   │   │   ├── 5_8.json
│   │   │   ├── 6_0.json
│   │   │   ├── 6_5.json
│   │   │   ├── 7_10.json
│   │   │   ├── 7_11.json
│   │   │   ├── 7_14.json
│   │   │   ├── 7_7.json
│   │   │   ├── 7_9.json
│   │   │   ├── 8_2.json
│   │   │   ├── 8_4.json
│   │   │   ├── 8_8.json
│   │   │   ├── 9_2.json
│   │   │   └── 9_4.json
│   │   └── scenario_dreamer_waymo_200m_pickles/
│   │       ├── 0_2.pkl
│   │       ├── 0_7.pkl
│   │       ├── 10_11.pkl
│   │       ├── 10_3.pkl
│   │       ├── 10_4.pkl
│   │       ├── 11_13.pkl
│   │       ├── 11_5.pkl
│   │       ├── 11_6.pkl
│   │       ├── 12_7.pkl
│   │       ├── 12_8.pkl
│   │       ├── 13_10.pkl
│   │       ├── 13_6.pkl
│   │       ├── 13_8.pkl
│   │       ├── 14_5.pkl
│   │       ├── 14_9.pkl
│   │       ├── 15_1.pkl
│   │       ├── 15_6.pkl
│   │       ├── 15_7.pkl
│   │       ├── 16_11.pkl
│   │       ├── 16_7.pkl
│   │       ├── 17_6.pkl
│   │       ├── 17_7.pkl
│   │       ├── 18_12.pkl
│   │       ├── 19_13.pkl
│   │       ├── 1_0.pkl
│   │       ├── 1_1.pkl
│   │       ├── 1_11.pkl
│   │       ├── 1_13.pkl
│   │       ├── 1_3.pkl
│   │       ├── 20_8.pkl
│   │       ├── 21_13.pkl
│   │       ├── 22_11.pkl
│   │       ├── 22_12.pkl
│   │       ├── 23_5.pkl
│   │       ├── 23_6.pkl
│   │       ├── 23_8.pkl
│   │       ├── 24_14.pkl
│   │       ├── 24_6.pkl
│   │       ├── 25_0.pkl
│   │       ├── 25_5.pkl
│   │       ├── 25_6.pkl
│   │       ├── 26_10.pkl
│   │       ├── 26_2.pkl
│   │       ├── 26_6.pkl
│   │       ├── 26_8.pkl
│   │       ├── 27_13.pkl
│   │       ├── 27_2.pkl
│   │       ├── 28_77.pkl
│   │       ├── 28_9.pkl
│   │       ├── 29_0.pkl
│   │       ├── 2_2.pkl
│   │       ├── 2_3.pkl
│   │       ├── 30_5.pkl
│   │       ├── 30_6.pkl
│   │       ├── 31_11.pkl
│   │       ├── 3_7.pkl
│   │       ├── 4_12.pkl
│   │       ├── 4_4.pkl
│   │       ├── 4_7.pkl
│   │       ├── 4_9.pkl
│   │       ├── 5_1.pkl
│   │       ├── 5_10.pkl
│   │       ├── 5_8.pkl
│   │       ├── 6_0.pkl
│   │       ├── 6_5.pkl
│   │       ├── 7_10.pkl
│   │       ├── 7_11.pkl
│   │       ├── 7_14.pkl
│   │       ├── 7_7.pkl
│   │       ├── 7_9.pkl
│   │       ├── 8_2.pkl
│   │       ├── 8_4.pkl
│   │       ├── 8_8.pkl
│   │       ├── 9_2.pkl
│   │       └── 9_4.pkl
│   ├── sledge_files/
│   │   └── nuplan.yaml
│   └── waymo_eval_set.pkl
├── metrics.py
├── models/
│   ├── ctrl_sim.py
│   ├── scenario_dreamer_autoencoder.py
│   └── scenario_dreamer_ldm.py
├── nn_modules/
│   ├── autoencoder.py
│   ├── ctrl_sim.py
│   ├── dit.py
│   └── ldm.py
├── policies/
│   ├── idm_policy.py
│   └── rl_policy.py
├── run_simulation.py
├── scripts/
│   ├── define_env_variables.sh
│   ├── extract_nuplan_data.sh
│   ├── extract_waymo_data.sh
│   ├── preprocess_ctrl_sim_waymo_dataset.sh
│   ├── preprocess_nuplan_dataset.sh
│   └── preprocess_waymo_dataset.sh
├── simulator.py
├── train.py
└── utils/
    ├── __init__.py
    ├── collision_helpers.py
    ├── data_container.py
    ├── data_helpers.py
    ├── diffusion_helpers.py
    ├── dit_layers.py
    ├── geometry.py
    ├── gpudrive_helpers.py
    ├── inpainting_helpers.py
    ├── k_disks_helpers.py
    ├── lane_graph_helpers.py
    ├── layers.py
    ├── losses.py
    ├── metrics_helpers.py
    ├── pyg_helpers.py
    ├── sim_env_helpers.py
    ├── sim_helpers.py
    ├── sledge_helpers.py
    ├── torch_helpers.py
    ├── train_helpers.py
    └── viz.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
slurm_logs/
*.out
__pycache__/
*/__pycache__/
lightning_logs/
viz_*/
movies*/
metadata/simulation_environment_datasets/scenario_dreamer_waymo_55m_pickles/
metadata/simulation_environment_datasets/scenario_dreamer_waymo_100m_pickles/
metadata/simulation_environment_datasets/scenario_dreamer_waymo_55m_jsons/
metadata/simulation_environment_datasets/scenario_dreamer_waymo_100m_jsons/
metadata/simulation_environment_datasets/waymo_sim_test_pickles/
metadata/simulation_environment_datasets/waymo_sim_test_jsons/
run.sh

================================================
FILE: .gitmodules
================================================
[submodule "gpudrive"]
	path = gpudrive
	url = https://github.com/RLuke22/gpudrive-scenario-dreamer.git


================================================
FILE: README.md
================================================
# Official Repository for Scenario Dreamer

<p align="left">
<a href="https://arxiv.org/abs/2503.22496" alt="arXiv">
    <img src="https://img.shields.io/badge/arXiv-2503.22496-b31b1b.svg?style=flat" /></a>
<a href="https://princeton-computational-imaging.github.io/scenario-dreamer/" alt="webpage">
    <img src="https://img.shields.io/badge/Project Page-Scenario Dreamer-blue" /></a>
<a href="https://youtu.be/kShULuL8VO4" alt="youtube">
    <img src="https://img.shields.io/badge/YouTube-Video-FF0000?logo=youtube&logoColor=FF0000" /></a>
<a href="https://drive.google.com/drive/folders/13DSHf2UhrvguD7i7iYL5SfSDhgLcW_ja?usp=sharing" alt="google drive">
    <img src="https://img.shields.io/badge/Data%20and%20Models-grey?logo=googledrive&logoColor=fff" /></a>

> [**Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments**](https://arxiv.org/abs/2503.22496)  <br>
> [Luke Rowe](https://rluke22.github.io)<sup>1,2,6</sup>, [Roger Girgis](https://mila.quebec/en/person/roger-girgis/)<sup>1,3,6</sup>, [Anthony Gosselin](https://www.linkedin.com/in/anthony-gosselin-098b7a1a1/)<sup>1,3</sup>, [Liam Paull](https://liampaull.ca/)<sup>1,2,5</sup>, [Christopher Pal](https://sites.google.com/view/christopher-pal)<sup>1,2,3,5</sup>, [Felix Heide](https://www.cs.princeton.edu/~fheide/)<sup>4,6</sup>  <br>
> <sup>1</sup> Mila, <sup>2</sup> Université de Montréal, <sup>3</sup> Polytechnique Montréal, <sup>4</sup> Princeton University, <sup>5</sup> CIFAR AI Chair, <sup>6</sup> Torc Robotics <br>
> <br>
> Computer Vision and Pattern Recognition (CVPR), 2025 <br>
>

We propose Scenario Dreamer, a fully data-driven closed-loop generative simulator for autonomous vehicle planning.

<video src="https://github.com/user-attachments/assets/83bcea5f-a459-45b7-8d36-eb9dd76e100a" width="250" height="250"></video>

## Repository Timeline

- [x] [06/11/2025] Environment setup
- [x] [06/11/2025] Dataset Preprocessing
- [x] [07/21/2025] Train Scenario Dreamer autoencoder model on Waymo and NuPlan
- [x] [07/21/2025] Train Scenario Dreamer latent diffusion model on Waymo and NuPlan
- [x] [07/21/2025] Support generation of Scenario Dreamer initial scenes
- [x] [07/21/2025] Support visualization of Scenario Dreamer initial scenes
- [x] [07/21/2025] Support computing evaluation metrics
- [x] [07/21/2025] Release of pre-trained Scenario Dreamer checkpoints
- [x] [07/21/2025] Support lane-conditioned object generation
- [x] [10/25/2025] Support inpainting generation mode
- [x] [10/31/2025] Support generation of large simulation environments
- [x] [11/05/2025] CtRL-Sim Dataset Preprocessing
- [x] [11/05/2025] Train CtRL-Sim behaviour model on Waymo
- [x] [11/25/2025] Release of 1M-step pre-trained CtRL-Sim checkpoint
- [x] [11/25/2025] Evaluate IDM policy in Scenario Dreamer environments
- [x] [01/12/2026] Train Scenario-Dreamer compatible agents in GPUDrive
- [x] [01/12/2026] Evaluate GPUDrive-trained RL policy in Waymo and Scenario Dreamer environments

## Table of Contents
1. [Setup](#setup)
2. [Waymo Dataset Preparation](#waymo-dataset-preparation)
3. [NuPlan Dataset Preparation](#nuplan-dataset-preparation)
4. [Pre-Trained Checkpoints](#pretrained-checkpoints)
5. [Training](#training)
6. [Evaluation](#evaluation)
7. [Simulation](#simulation)
8. [GPUDrive Integration](#gpudrive-integration)
9. [Citation](#citation)
10. [Acknowledgements](#acknowledgements)

## Setup <a name="setup"></a>

Start by cloning the repository:
```
git clone https://github.com/princeton-computational-imaging/scenario-dreamer.git
cd scenario-dreamer
```

This repository assumes you have a "scratch" directory for larger files (datasets, checkpoints, etc.). If disk space is not an issue, you can keep everything in the repository directory:
```
export SCRATCH_ROOT=$(pwd) # prefer a separate drive? Point SCRATCH_ROOT there instead.
```

Define environment variables to let the code know where things live:
```
source $(pwd)/scripts/define_env_variables.sh
```

### Conda Setup 

```
# create conda environment
conda env create -f environment.yml
conda activate scenario-dreamer

# login to wandb for experiment logging
export WANDB_API_KEY=<your_api_key>
wandb login
```

## Waymo Dataset Preparation <a name="waymo-dataset-preparation"></a>

> **Quick Option:**  
> If you'd prefer to skip data extraction and preprocessing, you can directly download the prepared files.
> Place the following tar files in your scratch directory and extract:
> - `scenario_dreamer_ae_preprocess_waymo.tar` (preprocessed dataset for Scenario Dreamer autoencoder training on Waymo)  
> - `scenario_dreamer_ctrl_sim_preprocess.tar.gz` (preprocessed dataset for CtRL-Sim training on Waymo)
> [Download from Google Drive](https://drive.google.com/drive/folders/13DSHf2UhrvguD7i7iYL5SfSDhgLcW_ja?usp=sharing)  

<details> <summary><strong>Instructions</strong></summary>

Download the Waymo Open Motion Dataset (v1.1.0) into your scratch directory with the following directory structure:

```
$SCRATCH_ROOT/waymo_open_dataset_motion_v_1_1_0/
├── training/
│   ├── training.tfrecord-00000-of-01000
│   ├── …
│   └── training.tfrecord-00999-of-01000
├── validation/
│   ├── validation.tfrecord-00000-of-00150
│   ├── …
│   └── validation.tfrecord-00149-of-00150
└── testing/
    ├── testing.tfrecord-00000-of-00150
    ├── …
    └── testing.tfrecord-00149-of-00150
```
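
Before launching the multi-hour extraction, it can help to confirm that every shard is present. The following is a minimal sketch, not part of the repository: the helper name `missing_shards` is hypothetical, and the shard counts and file names are taken only from the directory layout above.

```python
import os
from pathlib import Path

# Shard counts per split, as shown in the directory layout above.
EXPECTED_SHARDS = {"training": 1000, "validation": 150, "testing": 150}


def missing_shards(dataset_root):
    """Return the expected tfrecord paths that are absent under dataset_root."""
    root = Path(dataset_root)
    missing = []
    for split, count in EXPECTED_SHARDS.items():
        for idx in range(count):
            # e.g. training.tfrecord-00000-of-01000
            name = f"{split}.tfrecord-{idx:05d}-of-{count:05d}"
            path = root / split / name
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Falls back to the current directory if SCRATCH_ROOT is not set.
    scratch = Path(os.environ.get("SCRATCH_ROOT", "."))
    root = scratch / "waymo_open_dataset_motion_v_1_1_0"
    print(f"{len(missing_shards(root))} shards missing under {root}")
```

A result of zero missing shards only confirms that the files exist; it does not validate their contents.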

Then preprocess the Waymo dataset to prepare for Scenario Dreamer model training. The first and second scripts each take ~12 hrs (8 CPU cores, 64 GB RAM):
```
bash scripts/extract_waymo_data.sh # extract relevant data from tfrecords and create train/val/test splits
bash scripts/preprocess_waymo_dataset.sh # preprocess data to facilitate efficient model training
bash scripts/preprocess_ctrl_sim_waymo_dataset.sh # preprocess data to facilitate efficient ctrl_sim model training
```

</details>

## NuPlan Dataset Preparation <a name="nuplan-dataset-preparation"></a>

> **Quick Option:**  
> If you'd prefer to skip data extraction and preprocessing, you can directly download the prepared files.
> Place the following tar files in your scratch directory and extract:
> - `scenario_dreamer_nuplan.tar` (processed NuPlan data; required for computing metrics, but not for training)
> - `scenario_dreamer_ae_preprocess_nuplan.tar` (preprocessed dataset for Scenario Dreamer autoencoder training on NuPlan)  
> [Download from Google Drive](https://drive.google.com/drive/folders/13DSHf2UhrvguD7i7iYL5SfSDhgLcW_ja?usp=sharing)  

<details> <summary><strong>Instructions</strong></summary>

We use the same extracted NuPlan data as [SLEDGE](https://github.com/autonomousvision/sledge), with minor modifications tailored for **Scenario Dreamer**. Our modified fork for extracting the NuPlan data is available [here](https://github.com/RLuke22/sledge-scenario-dreamer).

#### Step-by-Step Instructions

1. **Install dependencies & download raw NuPlan data**  
   Follow the guide in the [`installation.md`](https://github.com/RLuke22/sledge-scenario-dreamer/blob/main/docs/installation.md) file of our forked repo.  
   This will walk you through:
   - Downloading the NuPlan dataset
   - Setting up the correct environment variables
   - Installing the `sledge-devkit`

2. **Extract NuPlan data**  
   Use the instructions under [“1. Feature Caching”](https://github.com/RLuke22/sledge-scenario-dreamer/blob/main/docs/autoencoder.md#1-feature-caching) in `autoencoder.md` to extract the NuPlan data.

3. **Extract train/val/test splits and preprocess data for training**  
   Run the following to extract train/val/test splits and create the preprocessed data for training.
   ```
   bash scripts/extract_nuplan_data.sh # create train/val/test splits and create eval set for computing metrics
   bash scripts/preprocess_nuplan_dataset.sh # preprocess data to facilitate efficient model training
   ```

</details>

## Pre-Trained Checkpoints <a name="pretrained-checkpoints"></a>

Pre-trained checkpoints can be downloaded from [Google Drive](https://drive.google.com/drive/folders/1G9jUA_wgF2Vo40I5HckO1yxUjA_0kUEJ?usp=sharing). Place the `checkpoints` directory into your scratch (`$SCRATCH_ROOT`) directory. 

To download all checkpoints into your scratch directory, run:
```bash
cd $SCRATCH_ROOT
gdown --folder https://drive.google.com/drive/folders/1G9jUA_wgF2Vo40I5HckO1yxUjA_0kUEJ
```

#### Checkpoints

| Model                                             | Dataset | Size     | SHA‑256    |
|---------------------------------------------------|---------|----------|------------|
| [Autoencoder](https://drive.google.com/drive/folders/1kGvNQleNL_FAn956ngdx2Ga0iQDRkX_8?usp=sharing)                                   |  Waymo  | 362 MB   | `3c3033a107de727ca1c2399a8e0df107e5eb1a84bce3d7e18cc2e01698ccf6ac` |
| [LDM Large](https://drive.google.com/drive/folders/1xV35C5aEjbjbGsujzz134VvwgrZG4gWW?usp=sharing)                                     |  Waymo  | 12.4 GB  | `06a1a65e9949f55c3398aeadacde388b03a6705f2661bc273cf43e7319de4cd5` |
| [Autoencoder](https://drive.google.com/drive/folders/1d7mX2GcD_1SP2YT5hWWXtAmm8ralOM_d?usp=sharing)                                   |  Nuplan | 371 MB   | `386b1f89eda71c5cdf6d29f7c343293e1a74bbd09395bfdeab6c2fb57f43e258` |
| [LDM Large](https://drive.google.com/drive/folders/1Uhtzy8ovrvMU6lhksppIwmnG6G3o0GRM?usp=sharing)                                     |  Nuplan | 12.5 GB  | `2151e59307282e29b456ffc7338b9ece92fc2e2cf22ef93a67929da3176b5c59` |
| [CtRL-Sim](https://drive.google.com/drive/folders/1vw84BCfqqolY4DTl3YXOZwfY6nYnCmNo?usp=sharing)                                     |  Waymo | 83.3 MB  | `8ed2d3a0546a06907f797224492c44b38013ae804af1de0fe9991814d12d0062` |

**Note**: The LDM Large checkpoints were trained for 250k steps. While the Scenario Dreamer paper reports results at 165k steps, training to 250k steps leads to improvements across most metrics. For this reason, we are releasing the 250k step checkpoints and the expected results are marginally better than those reported in the paper.
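
To check a download against the SHA-256 column in the table above, a small standard-library script is enough. This is a sketch under assumptions: the function names are hypothetical, and the table does not state which file inside each Google Drive folder the hash refers to, so the path you pass in is your own choice.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB checkpoints never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checkpoint_matches(path, expected_hex):
    """Compare a file's SHA-256 against the hex string listed in the table."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `checkpoint_matches(path_to_ckpt, "3c3033a1...")` with the full hash copied from the Waymo autoencoder row returns `True` only when the file is byte-identical to the released one.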

<details> <summary><strong>Expected Performance</strong></summary>

<details> <summary><strong>Scenario Dreamer L Waymo</strong></summary>


| **Lane metrics**            | Value | **Agent metrics**   | Value |
|-----------------------------|-------|---------------------|-------|
| route_length_mean (m)       | 38.80 | nearest_dist_jsd    | 0.05  |
| route_length_std (m)        | 13.56 | lat_dev_jsd         | 0.03  |
| endpoint_dist_mean (m)      | 0.21  | ang_dev_jsd         | 0.08  |
| endpoint_dist_std (m)       | 0.81  | length_jsd          | 0.43  |
| frechet_connectivity        | 0.10  | width_jsd           | 0.29  |
| frechet_density             | 0.26  | speed_jsd           | 0.38  |
| frechet_reach               | 0.26  | collision_rate (%)  | 4.01  |
| frechet_convenience         | 1.29  |                     |       |

</details>

<details> <summary><strong>Scenario Dreamer L Nuplan</strong></summary>


| **Lane metrics**            | Value | **Agent metrics**   | Value |
|-----------------------------|-------|---------------------|-------|
| route_length_mean (m)       | 36.68 | nearest_dist_jsd    | 0.08  |
| route_length_std (m)        | 10.39 | lat_dev_jsd         | 0.10  |
| endpoint_dist_mean (m)      | 0.25  | ang_dev_jsd         | 0.11  |
| endpoint_dist_std (m)       | 0.71  | length_jsd          | 0.25  |
| frechet_connectivity        | 0.08  | width_jsd           | 0.20  |
| frechet_density             | 0.25  | speed_jsd           | 0.06  |
| frechet_reach               | 0.05  | collision_rate (%)  | 9.22  |
| frechet_convenience         | 0.40  |                     |       |

</details>

</details>

## Training <a name="training"></a>

### 📈 Autoencoder Training

<details> <summary><strong>1. Prerequisites</strong></summary>

- Verify that you have the preprocessed dataset (`scenario_dreamer_ae_preprocess_[waymo|nuplan]`) and that it resides in your scratch directory.

</details>

<details> <summary><strong>2. Launch Autoencoder Training</strong></summary>

````bash
python train.py \
  dataset_name=[waymo|nuplan] \
  model_name=autoencoder \
  ae.train.run_name=[your_autoencoder_run_name] \
  ae.train.track=True
````

By default `ae.train.run_name` is set to `scenario_dreamer_autoencoder_[waymo|nuplan]`.
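
The command-line arguments above are dotted `key=value` overrides. As a rough illustration of how such overrides can map onto a nested configuration, here is a minimal sketch. It is not the repository's configuration loader (that lives under `cfgs/` and is not reproduced here); `parse_value` and `apply_overrides` are hypothetical names.

```python
def parse_value(text):
    """Coerce an override value to bool, int, or float where possible; else keep the string."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(config, overrides):
    """Write each 'a.b.c=value' override into the nested dict `config` (modified in place)."""
    for item in overrides:
        key, _, raw = item.partition("=")
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw)
    return config


cfg = apply_overrides({}, ["dataset_name=waymo", "model_name=autoencoder", "ae.train.track=True"])
print(cfg)
```

Under this reading, `ae.train.track=True` sets the `track` field of the `train` group inside the `ae` group, which is why the same `run_name` and `track` keys reappear under `ldm` and `ctrl_sim` in later commands.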

</details>

<details> <summary><strong>3. What to Expect</strong></summary>

- Trains on 1 GPU (≈ 36-40 h with A100 GPU).
- Training metrics and visualizations are logged to Weights & Biases (W&B).
- After each epoch a single checkpoint (overwritten to `last.ckpt`) is saved to `$SCRATCH_ROOT/checkpoints/[your_autoencoder_run_name]`.

</details>

### 💾 Autoencoder Latent Caching

<details> <summary><strong>1. Prerequisites</strong></summary>

- Verify that you have the preprocessed dataset (`scenario_dreamer_ae_preprocess_[waymo|nuplan]`) and a trained autoencoder from the previous step.

</details>

<details> <summary><strong>2. Launch Caching</strong></summary>

````bash
python eval.py \
  dataset_name=[waymo|nuplan] \
  model_name=autoencoder \
  ae.eval.run_name=[your_autoencoder_run_name] \
  ae.eval.cache_latents.enable_caching=True \
  ae.eval.cache_latents.split_name=[train|val|test]
````

</details>

<details> <summary><strong>3. What to Expect</strong></summary>

- Caches latents (mean/log_var) to disk at `$SCRATCH_ROOT/scenario_dreamer_autoencoder_latents_[waymo|nuplan]/[train|val|test]` for LDM training.
- Utilizes 1 GPU (≈ 1 h with A100 GPU)

</details>

### 🚀 Latent Diffusion Model (LDM) Training

<details> <summary><strong>1. Prerequisites</strong></summary>

- Verify that you have the cached latents (`scenario_dreamer_autoencoder_latents_[waymo|nuplan]`) for the train and val splits in your scratch directory, and the corresponding trained autoencoder.

</details>

<details> <summary><strong>2. Launch Training</strong></summary>

<details>
<summary><strong>Scenario Dreamer Base</strong></summary>

By default, `train.py` trains a Scenario Dreamer Base model:
```bash
python train.py \
  dataset_name=[waymo|nuplan] \
  model_name=ldm \
  ldm.model.autoencoder_run_name=[your_autoencoder_run_name] \
  ldm.train.run_name=[your_ldm_run_name] \
  ldm.train.track=True
```

- Ensure your LDM run name is different from your autoencoder run name. By default, `ldm.train.run_name` is set to `scenario_dreamer_ldm_base_[waymo|nuplan]`.

</details> 

<details> <summary><strong>Scenario Dreamer Large</strong></summary>

```bash
python train.py \
  dataset_name=[waymo|nuplan] \
  model_name=ldm \
  ldm.model.autoencoder_run_name=[your_autoencoder_run_name] \
  ldm.train.run_name=[your_ldm_run_name] \
  ldm.train.devices=8 \
  ldm.datamodule.train_batch_size=128 \
  ldm.datamodule.val_batch_size=128 \
  ldm.model.num_l2l_blocks=3 \
  ldm.train.track=True
```

- Ensure your LDM run name is different from your autoencoder run name. By default, `ldm.train.run_name` is set to `scenario_dreamer_ldm_large_[waymo|nuplan]`.

</details>

</details>  

<details> <summary><strong>3. What to Expect</strong></summary>

- Scenario Dreamer B trains on 4 GPUs (≈ 24h with 4 A100-L GPUs) and Scenario Dreamer L trains on 8 GPUs (≈ 32-36h with 8 A100-L GPUs).
- By default, both models train for 165k steps.
- Training metrics and visualizations are logged to Weights & Biases (W&B).
- After each epoch a single checkpoint (overwritten to `last.ckpt`) is saved to `$SCRATCH_ROOT/checkpoints/[your_ldm_run_name]`.
- To resume training from an existing checkpoint, run the same training command and the code will automatically resume training from the `last.ckpt` stored in the run's `$SCRATCH_ROOT/checkpoints/[your_ldm_run_name]` directory.
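
The resume rule in the last bullet can be sketched as a simple path lookup. This is an illustration only, not the code in `train.py`: `find_resume_checkpoint` is a hypothetical helper, and the directory layout is taken from the bullets above.

```python
import os
from pathlib import Path


def find_resume_checkpoint(run_name, scratch_root=None):
    """Return the run's last.ckpt path if it exists, otherwise None (start from scratch)."""
    root = Path(scratch_root or os.environ["SCRATCH_ROOT"])
    ckpt = root / "checkpoints" / run_name / "last.ckpt"
    return ckpt if ckpt.is_file() else None
```

Because the lookup is keyed on the run name, reusing a run name resumes the earlier run, while a fresh run name starts a new one.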

</details>

### 📈 CtRL-Sim Training <a name="ctrlsim-training"></a>

<details> <summary><strong>1. Prerequisites</strong></summary>

- Verify that you have the preprocessed dataset (`scenario_dreamer_ctrl_sim_preprocess`) and that it resides in your scratch directory.

</details>

<details> <summary><strong>2. Launch CtRL-Sim Training</strong></summary>

````bash
python train.py \
  dataset_name=waymo \
  model_name=ctrl_sim \
  ctrl_sim.train.run_name=[your_ctrl_sim_run_name] \
  ctrl_sim.train.track=True
````

By default `ctrl_sim.train.run_name` is set to `ctrl_sim_waymo`.

</details>

<details> <summary><strong>3. What to Expect</strong></summary>

- By default, trains for 1M steps. However, we used the 500k-step checkpoint in the paper due to resource limitations.
- Trains on 4 GPUs (≈ 100 h with 4 A100 GPUs to 1M steps). 
- Training metrics and visualizations are logged to Weights & Biases (W&B).
- After each epoch, a single checkpoint (overwritten to `last.ckpt`) is saved to `$SCRATCH_ROOT/checkpoints/[your_ctrl_sim_run_name]`. The 15 checkpoints with the lowest val loss are additionally saved to `$SCRATCH_ROOT/checkpoints/[your_ctrl_sim_run_name]`.

</details>

## Evaluation <a name="evaluation"></a>

### 📈 Autoencoder Evaluation

<details> <summary><strong>1. Prerequisites</strong></summary>

- Verify that you have the preprocessed dataset (`scenario_dreamer_ae_preprocess_[waymo|nuplan]`) and a trained autoencoder.

</details>

<details> <summary><strong>2. Launch Eval</strong></summary>

```bash
python eval.py \
  dataset_name=[waymo|nuplan] \
  model_name=autoencoder \
  ae.eval.run_name=[your_autoencoder_run_name]
```

</details>

<details> <summary><strong>3. What to Expect</strong></summary>

- By default, 50 reconstructed scenes will be visualized and logged to `$PROJECT_ROOT/viz_eval_[your_autoencoder_run_name]`.
- The reconstruction metrics computed on the full test set will be printed.

</details>

### 🚀 Generate Scenes with LDM

<a id="initial-scene-generation"></a>
<details>
<summary><strong>Initial Scene Generation</strong></summary>

<details> <summary><strong>1. Prerequisites</strong></summary>

- Verify that you have a trained autoencoder and ldm.

</details>

<details> <summary><strong>2. Generate and Visualize Samples</strong></summary>

To generate and visualize 100 initial scenes from your trained model:
```bash
# Set num_l2l_blocks to 1 for the base model or 3 for the large model.
python eval.py \
  dataset_name=[waymo|nuplan] \
  model_name=ldm \
  ldm.eval.mode=initial_scene \
  ldm.model.num_l2l_blocks=[1|3] \
  ldm.eval.run_name=[your_ldm_run_name] \
  ldm.model.autoencoder_run_name=[your_autoencoder_run_name] \
  ldm.eval.num_samples=100 \
  ldm.eval.visualize=True
```

To additionally cache the samples to disk for metrics computation, set `ldm.eval.cache_samples=True`. You can adjust `ldm.eval.num_samples` to configure the number of samples generated.

</details>

<details> <summary><strong>3. What to Expect</strong></summary>

- 100 samples will be generated on 1 GPU with a default batch size of 32. 
- The samples will be visualized to `$PROJECT_ROOT/viz_gen_samples_[your_ldm_run_name]`.
- If you toggle `ldm.eval.cache_samples=True`, samples will be cached to `$SCRATCH_ROOT/checkpoints/[your_ldm_run_name]/initial_scene_samples`.
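
With 100 samples and a batch size of 32, generation does not divide evenly. The arithmetic can be sketched as follows; `batch_sizes` is a hypothetical helper, and whether the script actually runs a smaller final batch or handles the remainder differently is an assumption not confirmed by this README.

```python
def batch_sizes(num_samples, batch_size=32):
    """Split num_samples into full batches plus one smaller remainder batch, if any."""
    full, remainder = divmod(num_samples, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


print(batch_sizes(100))  # [32, 32, 32, 4]
```

The same calculation gives 1,563 batches for the 50k samples used when computing evaluation metrics.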

</details>

</details> 

<details> <summary><strong>Lane-conditioned Object Generation</strong></summary>

<details> <summary><strong>1. Prerequisites</strong></summary>

- Verify that you have a trained autoencoder and ldm.
- Verify that you have the cached latents (`scenario_dreamer_autoencoder_latents_[waymo|nuplan]`) for the train and val splits in your scratch directory. We will condition the reverse diffusion process on the lane latents loaded from the cache for lane-conditioned generation.

</details>

<details> <summary><strong>2. Generate and Visualize Samples</strong></summary>

To generate and visualize 100 lane-conditioned scenes from your trained model:
```bash
# Set num_l2l_blocks to 1 for the base model or 3 for the large model.
python eval.py \
  dataset_name=[waymo|nuplan] \
  model_name=ldm \
  ldm.eval.mode=lane_conditioned \
  ldm.model.num_l2l_blocks=[1|3] \
  ldm.eval.run_name=[your_ldm_run_name] \
  ldm.model.autoencoder_run_name=[your_autoencoder_run_name] \
  ldm.eval.conditioning_path=${SCRATCH_ROOT}/scenario_dreamer_autoencoder_latents_[waymo|nuplan]/val \
  ldm.eval.num_samples=100 \
  ldm.eval.visualize=True
```

This will load lane latents from the validation set for conditioning. You can adjust `ldm.eval.num_samples` to configure the number of samples generated.

</details>

<details> <summary><strong>3. What to Expect</strong></summary>

- 100 lane-conditioned samples will be generated on 1 GPU with a default batch size of 32. 
- The lane-conditioned samples will be visualized to `$PROJECT_ROOT/viz_gen_samples_[your_ldm_run_name]`.

</details>

</details>

<details> <summary><strong>Inpainting Generation</strong></summary>

<details> <summary><strong>1. Prerequisites</strong></summary>

- Verify that you have a trained autoencoder and ldm.
- Verify that you have generated and cached a set of scenarios by following the steps in [Initial Scene Generation](#initial-scene-generation). With `ldm.eval.cache_samples=True`, the scenarios are saved to `$SCRATCH_ROOT/checkpoints/[your_ldm_run_name]/initial_scene_samples`.

</details>

<details> <summary><strong>2. Generate and Visualize Samples</strong></summary>

To generate and visualize 100 inpainted scenes from your trained model:
```bash
# Set num_l2l_blocks to 1 for the base model or 3 for the large model.
python eval.py \
  dataset_name=[waymo|nuplan] \
  model_name=ldm \
  ldm.eval.mode=inpainting \
  ldm.model.num_l2l_blocks=[1|3] \
  ldm.eval.run_name=[your_ldm_run_name] \
  ldm.model.autoencoder_run_name=[your_autoencoder_run_name] \
  ldm.eval.conditioning_path=${SCRATCH_ROOT}/checkpoints/[your_ldm_run_name]/initial_scene_samples \
  ldm.eval.num_samples=100 \
  ldm.eval.visualize=True
```

This script will load each of the initial scenes, randomly sample a valid route for the ego (as a sequence of lane segments), renormalize the scene to the end of the route, and then run an inpainting forward pass. You can adjust `ldm.eval.num_samples` to configure the number of samples generated, but ensure that you have cached a sufficient number of initial scenes.
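
The route-sampling and renormalization steps described above can be illustrated with a toy example. This is a sketch under stated assumptions, not the repository's implementation: the lane graph is represented as a hypothetical dict from lane id to successor ids, a "valid route" is taken to be any walk along successors, and renormalization is taken to mean a rigid transform that places the route's end pose at the origin facing +x.

```python
import math
import random


def sample_route(successors, start, max_len, rng):
    """Random walk over lane successors, stopping at a dead end or after max_len lanes."""
    route = [start]
    while len(route) < max_len and successors.get(route[-1]):
        route.append(rng.choice(successors[route[-1]]))
    return route


def renormalize(points, origin, heading):
    """Express (x, y) points in a frame centred at `origin` with +x along `heading` (radians)."""
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    out = []
    for x, y in points:
        dx, dy = x - origin[0], y - origin[1]
        # Rotate by -heading so the heading direction becomes the +x axis.
        out.append((dx * cos_h + dy * sin_h, -dx * sin_h + dy * cos_h))
    return out


# Toy lane graph: lane 0 branches into lanes 1 and 2; lane 1 continues to lane 3.
successors = {0: [1, 2], 1: [3], 2: [], 3: []}
route = sample_route(successors, start=0, max_len=4, rng=random.Random(0))

# A point one metre ahead of a pose at (1, 1) facing +y lands about one metre along +x.
print(route, renormalize([(1.0, 2.0)], origin=(1.0, 1.0), heading=math.pi / 2))
```

After such a transform, the region ahead of the route's end is where new content would be inpainted, while the existing scene behind it serves as conditioning.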

</details>

<details> <summary><strong>3. What to Expect</strong></summary>

- 100 inpainted samples will be generated on 1 GPU with a default batch size of 32. 
- The inpainted samples will be visualized to `$PROJECT_ROOT/viz_gen_samples_[your_ldm_run_name]`.

</details>

</details>

### 📊 Compute Evaluation Metrics 

<details> <summary><strong>1. Prerequisites</strong></summary>

- Verify that you have a trained autoencoder and LDM.
- You first need to generate 50k samples with your trained LDM:
```bash
# base model has 1 l2l block, large model has 3
python eval.py \
  dataset_name=[waymo|nuplan] \
  model_name=ldm \
  ldm.eval.mode=initial_scene \
  ldm.model.num_l2l_blocks=[1|3] \
  ldm.eval.run_name=[your_ldm_run_name] \
  ldm.model.autoencoder_run_name=[your_autoencoder_run_name] \
  ldm.eval.num_samples=50000 \
  ldm.eval.cache_samples=True
```

</details>

<details> <summary><strong>2. Compute Metrics</strong></summary>

```bash
python eval.py \
  dataset_name=[waymo|nuplan] \
  model_name=ldm \
  ldm.eval.mode=metrics \
  ldm.eval.run_name=[your_ldm_run_name]
```

</details>

<details> <summary><strong>3. What to Expect</strong></summary>

- Computes metrics using 50k generated scenes from your trained LDM and 50k real scenes whose paths are loaded from `$PROJECT_ROOT/metadata/eval_set_[waymo|nuplan].pkl`.
- Lane generation and agent generation metrics will be printed and written to `$SCRATCH_ROOT/checkpoints/[your_ldm_run_name]/metrics.pkl`.

</details>

### 🚀 Generate Scenario Dreamer Simulation Environments <a name="generate-scenario-dreamer-simulation-environments"></a>

<details> <summary><strong>1. Prerequisites</strong></summary>

- Verify that you have a trained autoencoder and LDM.

</details>

<details> <summary><strong>2. Generate and Visualize Simulation Environments</strong></summary>

**Note: Scenario Dreamer supports the generation of nuPlan simulation environments; however, simulation environment generation has been primarily verified on the Waymo dataset.**

**Note: To generate the most diverse and interesting simulation environments, we recommend setting `ldm.eval.sim_envs.nocturne_compatible_only=False` and `dataset_name=waymo`.**

To generate and visualize 10 simulation environments from your trained model, run:
```bash
python eval.py \
  dataset_name=[waymo|nuplan] \
  model_name=ldm \
  ldm.eval.mode=simulation_environments \
  ldm.model.num_l2l_blocks=[1|3] \
  ldm.eval.run_name=[your_ldm_run_name] \
  ldm.model.autoencoder_run_name=[your_autoencoder_run_name] \
  ldm.eval.num_samples=10 \
  ldm.eval.sim_envs.route_length=500 \
  ldm.eval.visualize=True
```

Simulation environments are generated by performing 1 iteration of initial scene generation, followed by N iterations of inpainting until the route length is exceeded. The route for the ego is generated on the fly by randomly sampling from the lane graph. After each partial generation, a series of heuristic checks is applied to mitigate the occurrence of degenerate scenes. Moreover, at each of the N inpainting steps, we sample 8 candidate inpainting extensions by default and sample one of the valid candidates to extend the simulation environment. If all candidate inpainting extensions are invalid, generation of that partial scene is terminated. To account for degenerate partial scenes, `ldm.eval.num_samples` × `ldm.eval.sim_envs.overhead_factor` initial scenes are generated, and execution terminates once `ldm.eval.num_samples` complete simulation environments are created.

By default, this script will produce (at most) 10 simulation environments with route length at least 500m. To customize the route length, modify `ldm.eval.sim_envs.route_length`. To modulate the number of candidate extensions, modify `ldm.eval.sim_envs.num_inpainting_candidates`.
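
The control flow described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: `make_initial_scene`, `inpaint`, and `is_valid` are hypothetical callables standing in for the LDM forward passes and the heuristic checks, and a scene is modelled as a plain dict with a hypothetical `route_length` key. It also assumes generation stops once the route is at least `route_length` metres long.

```python
import random
from typing import Callable, List, Optional


def generate_sim_env(
    initial_scene: dict,
    inpaint: Callable[[dict], dict],
    is_valid: Callable[[dict], bool],
    route_length: float,
    num_candidates: int = 8,
    rng: Optional[random.Random] = None,
) -> Optional[dict]:
    """Extend one scene by inpainting; return None if it is abandoned."""
    rng = rng or random.Random(0)
    if not is_valid(initial_scene):
        return None
    scene = initial_scene
    while scene["route_length"] < route_length:
        # Sample several candidate extensions and keep only the valid ones.
        candidates = [inpaint(scene) for _ in range(num_candidates)]
        valid = [c for c in candidates if is_valid(c)]
        if not valid:
            return None  # every candidate was degenerate: terminate
        scene = rng.choice(valid)
    return scene


def generate_sim_envs(
    num_samples: int,
    overhead_factor: int,
    make_initial_scene: Callable[[], dict],
    inpaint: Callable[[dict], dict],
    is_valid: Callable[[dict], bool],
    route_length: float,
) -> List[dict]:
    """Try num_samples * overhead_factor initial scenes, stop when enough finish."""
    completed: List[dict] = []
    for _ in range(num_samples * overhead_factor):
        env = generate_sim_env(make_initial_scene(), inpaint, is_valid, route_length)
        if env is not None:
            completed.append(env)
        if len(completed) == num_samples:
            break
    return completed
```

Because some partial scenes are discarded, the function can return fewer than `num_samples` environments, which matches the "(at most)" wording above.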

By default, the Waymo model generates nocturne-compatible simulation environments via classifier-free guidance. To remove this constraint, set `ldm.eval.sim_envs.nocturne_compatible_only=False`.

By setting `ldm.eval.visualize=True`, the script will visualize the partially generated simulation environment after each inpainting step to `$SCRATCH_ROOT/checkpoints/[your_ldm_run_name]/viz_sim_envs_[waymo|nuplan]`. The completed simulation environments are also visualized here.

</details>

<details> <summary><strong>3. What to Expect</strong></summary>

- 10 simulation environments will be generated on 1 GPU with a default batch size of 32. 
- The partial and complete simulation environments will be visualized to `$SCRATCH_ROOT/checkpoints/[your_ldm_run_name]/viz_sim_envs_[waymo|nuplan]`.
- The complete simulation environments are written to disk at `$SCRATCH_ROOT/checkpoints/[your_ldm_run_name]/complete_sim_envs`.

</details>

</details>

## Simulation <a name="simulation"></a>

### 🚗 Run Simulations in Scenario Dreamer Environments

<details> <summary><strong>1. Prerequisites</strong></summary>

- **Simulation Environments**: You need a set of postprocessed simulation environments. You have two options:
  - **Option A (Generate your own)**: Generate simulation environments by following the instructions in [Generate Scenario Dreamer Simulation Environments](#generate-scenario-dreamer-simulation-environments). Then, postprocess the generated simulation environments:
    ```bash
    python data_processing/postprocess_simulation_environments.py \
      dataset_name=waymo \
      postprocess_sim_envs.run_name=[your_ldm_run_name] \
      postprocess_sim_envs.route_length=200
    ```
  - **Option B (Use pre-generated)**: By default, we provide a small set of 75 postprocessed Waymo simulation environments, each with a 200 m route length in [`metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m`](metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m). 

- **Trained CtRL-Sim Model**: You need a trained CtRL-Sim behaviour model checkpoint. You can either:
  - Train your own by following the instructions in [CtRL-Sim Training](#ctrlsim-training).
  - Download a pre-trained 1M step checkpoint from [Google Drive](https://drive.google.com/drive/folders/13DSHf2UhrvguD7i7iYL5SfSDhgLcW_ja?usp=sharing) and place the `ctrl_sim_waymo_1M_steps` directory in `$SCRATCH_ROOT/checkpoints`.

</details>

<details> <summary><strong>2. Run Simulations</strong></summary>

To run simulations in Scenario Dreamer environments, run:

```bash
python run_simulation.py \
  sim.dataset_path=[path_to_postprocessed_sim_envs] \
  sim.behaviour_model.run_name=[ctrl_sim_run_name]
```

By default, `sim.dataset_path` points to [`metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m`](metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m), so you can omit this parameter if using the pre-generated environments. By default, `sim.behaviour_model.run_name` is set to `ctrl_sim_waymo_1M_steps`.

You can optionally enable visualization of simulation rollouts as videos by setting `sim.visualize=True`. To make video generation lightweight (runs faster with lower DPI and frame rate), set `sim.lightweight=True`. To compute and display planning metrics in a verbose way after each simulation, set `sim.verbose=True`.

By default, we simulate vehicles, pedestrians, and cyclists. To simulate only vehicles (which yields a 2-3x speedup, due to not having to simulate a large number of pedestrians), set `sim.simulate_vehicles_only=True`.

</details>

<details> <summary><strong>3. What to Expect</strong></summary>

- The simulator will run through all simulation environments in the specified dataset path.
- By default, each simulation runs at 10 Hz for 400 steps (configurable via `sim.steps`), which is tailored to 200 m route lengths. 
- The IDM policy is used by default to control the ego vehicle, while other agents are controlled by the CtRL-Sim behaviour model.
- If visualization is enabled, videos will be saved to the specified `sim.movie_path` directory.
- If verbose mode is enabled, metrics (collision rate, off-route rate, completion rate, and progress) will be printed after each simulation.
- Final aggregated metrics across all simulations will be printed at the end of execution.
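
As a quick sanity check on the default rollout length, the arithmetic behind "400 steps at 10 Hz for 200 m routes" can be written out. The helper names below are hypothetical, and the 5 m/s figure is derived from the stated defaults rather than quoted from the authors; treat it as a rough guide when picking `sim.steps` for other route lengths.

```python
import math


def rollout_duration_s(steps: int = 400, hz: float = 10.0) -> float:
    """Wall-clock length of a simulated rollout in seconds."""
    return steps / hz


def required_avg_speed_mps(
    route_length_m: float = 200.0, steps: int = 400, hz: float = 10.0
) -> float:
    """Average speed the ego needs to finish the route within the rollout."""
    return route_length_m / rollout_duration_s(steps, hz)


def steps_for_route(
    route_length_m: float, avg_speed_mps: float = 5.0, hz: float = 10.0
) -> int:
    """Steps needed for a route, assuming the same average speed as the default."""
    return math.ceil(route_length_m / avg_speed_mps * hz)


print(rollout_duration_s())        # 40.0 seconds
print(required_avg_speed_mps())    # 5.0 m/s
print(steps_for_route(100.0))      # 200 steps
```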

</details>

## 🏎️ GPUDrive Integration <a name="gpudrive-integration"></a>

<details> <summary><strong>Introduction</strong></summary>

This repository supports evaluating RL agents trained in (adapted) GPUDrive on both Waymo and Scenario Dreamer environments. We forked the GPUDrive repository and adapted it so that the RL agents are trained on the Scenario Dreamer scene representation. This allows the RL agents to be evaluated in Scenario Dreamer environments. 

We provide the following:
- The fork of GPUDrive that is adapted for Scenario Dreamer compatibility. We fork the latest commit of GPUDrive as of Jan 9, 2026 (commit [aa48a43](https://github.com/Emerge-Lab/gpudrive/tree/aa48a431ed127a37610cc2176db30ec73d0c55df)) and make the necessary changes to train Scenario Dreamer-compatible RL agents.
- Script to generate Waymo training scenarios (json files) for the Scenario Dreamer-compatible RL policy in GPUDrive.
- Training script, configurations, and pretrained checkpoint for the Scenario Dreamer-compatible RL policy.
- Support for evaluating the RL policy in the Scenario Dreamer simulator. This largely involves a Python re-implementation of GPUDrive's observation and dynamics functions within the Scenario Dreamer simulator (see `utils/gpudrive_helpers.py`). We verify correctness by evaluating the same policy on the same held-out set of 250 Waymo scenes in both simulators. Performance is roughly identical, validating the re-implementation (slight differences stem from minor differences in the implementation of the collision, offroad, and goal success indicators).
- An updated table of results with the expected performance of the RL policy when evaluated across a variety of Waymo and Scenario Dreamer environment configurations.

We've improved upon the original GPUDrive integration (outlined in Section B.4 of the Scenario Dreamer Appendix) by making the following upgrades:
- We train using improved configurations, detailed in `gpudrive/baselines/ppo/config/ppo_base_puffer.yaml`. Crucially, we set `collision_weight=-0.75`, `off_road_weight=-0.5`, `goal_achieved_weight=1.0`, and `collision_behavior="ignore"`, which we found to yield superior performance compared to the original configurations outlined in Section B.4.
- We train to 200M steps compared to 100M steps, and train over 10k unique scenarios compared to 100, thus boosting generalization.
- We apply the length/width scaling factor of 0.7 in the Scenario Dreamer simulator to be consistent with GPUDrive.

These upgrades enable:
- Consistent performance when evaluating the same policy over the same scenarios in both simulators.
- Close to 90% goal success rate over a held out set of 250 Waymo scenarios, compared to 64% goal success rate prior to the upgrades (as reported in Table 4 of the Scenario Dreamer paper).

We hope that these upgrades provide a better starting point for researchers hoping to evaluate RL policies in Scenario Dreamer environments.
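
To make the reward weights listed above concrete, here is a minimal sketch of how a weighted-combination reward could be computed from per-step event indicators. It assumes the reward is a simple weighted sum of the three indicators; the function and class names are hypothetical, and the effect of `collision_behavior="ignore"` on episode termination is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Values taken from the configuration described above.
    collision_weight: float = -0.75
    off_road_weight: float = -0.5
    goal_achieved_weight: float = 1.0


def weighted_reward(
    collided: bool,
    off_road: bool,
    goal_achieved: bool,
    w: RewardWeights = RewardWeights(),
) -> float:
    """Weighted sum of event indicators (assumed form of the reward)."""
    return (
        w.collision_weight * float(collided)
        + w.off_road_weight * float(off_road)
        + w.goal_achieved_weight * float(goal_achieved)
    )


print(weighted_reward(False, False, True))   # 1.0
print(weighted_reward(True, True, False))    # -1.25
```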

</details>

<details> <summary><strong>Pre-trained Checkpoint and Expected Performance</strong></summary>

The pre-trained RL policy weights can be found at `metadata/gpudrive_checkpoint/pretrained.pt`.

<details> <summary><strong>Expected Performance</strong></summary>

We evaluate the provided checkpoint across the same evaluation configurations as Table 4 of the Scenario Dreamer paper. The results are reported in the table below. ARL=Average Route Length (m), CR=Collision Rate, OR=Offroad Rate, SR=Success Rate (CR, OR, and SR are percentages):

| **Simulator** | **Other Agent Beh** | **Test Env** | **ARL** | **CR**  | **OR**  | **SR**  |
|---------------|---------------------|--------------|---------|---------|---------|---------|
|   GPUDrive    | Log Replay          |  Waymo Test  |   55m   |   7.6   |  5.6    |  87.2   |
|     SD        | Log Replay          |  Waymo Test  |   55m   |   7.6   |  3.6    |  88.0   |
|     SD        | CtRL-Sim (Pos Tilt) |  Waymo Test  |   55m   |   6.8   |  3.2    |  87.6   |
|     SD        | CtRL-Sim (Pos Tilt) |  SD (55m)    |   55m   |   7.6   |  8.4    |  82.8   |
|     SD        | CtRL-Sim (Pos Tilt) |  SD (100m)   |   100m  |   24.0  |  12.0   |  64.0   |
|     SD        | CtRL-Sim (Neg Tilt) |  SD (100m)   |   100m  |   27.2  |  12.4   |  60.8   |

</details>

</details>

### Generating the GPUDrive Training JSON Files

> **Quick Option:**  
> If you'd prefer to skip generation of the GPUDrive training dataset, you can directly download the 10k prepared JSON files.
> Place the following file in your scratch directory and extract it:
> - `gpudrive_training_set_jsons.tar` (10k gpudrive training scenarios in Scenario Dreamer-compatible format) 
> [Download from Google Drive](https://drive.google.com/drive/folders/13DSHf2UhrvguD7i7iYL5SfSDhgLcW_ja?usp=sharing)  

<details> <summary><strong>Instructions</strong></summary>

To generate the Scenario Dreamer-compatible gpudrive training set jsons (size 10k), run the following:
```bash
# generate pickle files (compatible with Scenario Dreamer simulator)
python data_processing/waymo/create_gpudrive_pickles.py \
  dataset_name=waymo \
  preprocess_waymo.mode=val 
# generate json files from pickle files (compatible with adapted GPUDrive simulator)
python data_processing/waymo/convert_pickles_to_jsons.py \
  dataset_name=waymo \
  convert_pickles_to_jsons.directory=gpudrive_training_set \
  convert_pickles_to_jsons.dataset_size=10000
```

</details> 

### Generating the Evaluation Datasets

> **Quick Option:**  
> If you'd prefer to skip generation of the evaluation datasets, you can directly download the prepared files.
> Place the following file in your **metadata** directory and extract it:
> - `simulation_environment_datasets.tar` (250 pickles/jsons for: waymo test, scenario dreamer 55m routes, scenario dreamer 100m routes) 
> [Download from Google Drive](https://drive.google.com/drive/folders/13DSHf2UhrvguD7i7iYL5SfSDhgLcW_ja?usp=sharing) 

<details> <summary><strong>Instructions</strong></summary>

To generate the evaluation datasets, run the following:
```bash
# Generate Waymo test simulation environments
python data_processing/waymo/create_gpudrive_pickles.py \
  dataset_name=waymo preprocess_waymo.mode=test

# Generate Scenario Dreamer simulation environments
python eval.py \
  dataset_name=waymo \
  model_name=ldm \
  ldm.eval.mode=simulation_environments \
  ldm.model.num_l2l_blocks=3 \
  ldm.eval.run_name=scenario_dreamer_ldm_large_waymo \
  ldm.eval.num_samples=500 \
  ldm.eval.sim_envs.route_length=200 \
  ldm.eval.sim_envs.overhead_factor=3

# Generate 55m route postprocessed simulation environments
python data_processing/postprocess_simulation_environments.py \
  dataset_name=waymo \
  postprocess_sim_envs.route_length=55 \
  postprocess_sim_envs.max_num_envs=250

# Generate 100m route postprocessed simulation environments
python data_processing/postprocess_simulation_environments.py \
  dataset_name=waymo \
  postprocess_sim_envs.route_length=100 \
  postprocess_sim_envs.max_num_envs=250

# Convert all pickle files to jsons
python data_processing/waymo/convert_pickles_to_jsons.py \
  convert_pickles_to_jsons.directory=waymo_sim_test

python data_processing/waymo/convert_pickles_to_jsons.py \
  convert_pickles_to_jsons.directory=scenario_dreamer_waymo_55m

python data_processing/waymo/convert_pickles_to_jsons.py \
  convert_pickles_to_jsons.directory=scenario_dreamer_waymo_100m

# move simulation environments into metadata directory
mv $SCRATCH_ROOT/simulation_environment_datasets/* $PROJECT_ROOT/metadata/simulation_environment_datasets
```
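
The directory names passed to the commands above map onto paths on disk through the interpolations in `cfgs/config.yaml` (`postprocess_sim_envs.post_path`, `convert_pickles_to_jsons.path_to_pickles`, and `convert_pickles_to_jsons.path_to_jsons`). The sketch below mirrors those interpolations in plain Python so the expected folder layout is easy to check; the function names are hypothetical and the snippet does not read the actual Hydra config.

```python
from pathlib import Path
from typing import Tuple


def postprocess_output_dir(scratch_root: str, dataset_name: str, route_length: int) -> Path:
    """Mirror of postprocess_sim_envs.post_path."""
    base = Path(scratch_root) / "simulation_environment_datasets"
    return base / f"scenario_dreamer_{dataset_name}_{route_length}m_pickles"


def pickle_and_json_dirs(scratch_root: str, directory: str) -> Tuple[Path, Path]:
    """Mirror of convert_pickles_to_jsons.path_to_pickles / path_to_jsons."""
    base = Path(scratch_root) / "simulation_environment_datasets"
    return base / f"{directory}_pickles", base / f"{directory}_jsons"


pickles, jsons = pickle_and_json_dirs("/scratch", "scenario_dreamer_waymo_55m")
# The postprocessing output for 55 m routes is the pickle input of the converter.
assert pickles == postprocess_output_dir("/scratch", "waymo", 55)
print(jsons.as_posix())  # /scratch/simulation_environment_datasets/scenario_dreamer_waymo_55m_jsons
```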

</details>

### Training RL Policies in GPUDrive

<details> <summary><strong>1. Setup</strong></summary>

**Initializing the GPUDrive Submodule**

Since this repository uses GPUDrive as a git submodule, you need to initialize and update submodules after cloning:

```bash
git clone --recursive https://github.com/princeton-computational-imaging/scenario-dreamer.git
cd scenario-dreamer
```

If you've already cloned the repository without the `--recursive` flag, you can initialize the submodule afterwards:

```bash
git submodule update --init --recursive
```

**Setting Up GPUDrive**

Navigate to the `gpudrive` directory and follow the GPUDrive installation instructions in its README (`gpudrive/README.md`). This includes installing dependencies, building the simulator, and setting up the Python environment.

> **Note**: Please do not create issues in the Scenario Dreamer repository for GPUDrive installation issues unless they are specific to the modifications in the adapted GPUDrive repository. If you encounter problems with GPUDrive setup, please refer to the [GPUDrive repository](https://github.com/Emerge-Lab/gpudrive) for support.

**Using the Singularity Container (Optional)**

For convenience, we provide a Singularity container (`gpudrive_2025.sif`) that we used to set up GPUDrive. This container can be downloaded from [Google Drive](https://drive.google.com/drive/folders/13DSHf2UhrvguD7i7iYL5SfSDhgLcW_ja?usp=sharing). The container includes a base environment with the necessary dependencies from which one could install GPUDrive. For reference, the training script we used (`gpudrive/run.sh`) is included in the repository and demonstrates how to run training using the Singularity container.

**Training Configuration**

Ensure that you have generated or downloaded the Scenario Dreamer-compatible gpudrive training json files and placed them in your `$SCRATCH_ROOT` directory. Modify the `data_dir` field in `gpudrive/baselines/ppo/config/ppo_base_puffer.yaml` accordingly.

</details>

<details> <summary><strong>2. Training an RL Policy</strong></summary>

The custom configurations we used can be found at `gpudrive/baselines/ppo/config/ppo_base_puffer.yaml`. 

To train an RL policy, run:
```bash
python baselines/ppo/ppo_pufferlib.py
```

We manually terminated the run after 500 epochs (~250M steps), but it will train by default to 1B steps.
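
For orientation, the relationship between epochs and environment steps implied by "500 epochs (~250M steps)" can be worked out as below. This assumes steps grow linearly with epochs, which is an approximation; the helper names are hypothetical.

```python
def steps_at_epoch(epoch: int, ref_epochs: int = 500, ref_steps: float = 250e6) -> float:
    """Approximate environment steps reached at a given epoch."""
    return epoch * ref_steps / ref_epochs


def epochs_for_steps(steps: float, ref_epochs: int = 500, ref_steps: float = 250e6) -> float:
    """Approximate number of epochs needed to reach a step budget."""
    return steps * ref_epochs / ref_steps


print(steps_at_epoch(400) / 1e6)   # 200.0 (million steps)
print(epochs_for_steps(1e9))       # 2000.0 epochs for the default 1B-step budget
```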

</details>

<details> <summary><strong>3. What to Expect</strong></summary>

- The RL policy will train on 1 GPU (we used 1 L40S GPU) to 1B steps. We terminated the run after 500 epochs (~250M steps), and used the 400-epoch checkpoint.
- Metrics will be logged to wandb and the Pufferlib interface will be displayed in the console. Note that it often takes 10-15 minutes for the pufferlib display to update from all zeros.
- Checkpoints will be saved by default to the `gpudrive/wandb/...` directory every 400 epochs.
- We attained a controlled_agent_sps of around 1400.
- A screenshot of the expected performance trend during training can be found below:

<details> <summary><strong>Performance</strong></summary>

![GPUDrive Training Performance Trend](metadata/gpudrive_trend.png)

</details> 

</details>

### Evaluating RL Policies in Scenario Dreamer

<details> <summary><strong>1. Prerequisites</strong></summary>

- Verify that you have a trained RL policy (the provided pretrained RL policy weights are located at `$PROJECT_ROOT/metadata/gpudrive_checkpoint/pretrained.pt`). Set `rl_model_path` and `rl_model_name` in `cfgs/sim/base.yaml` accordingly.
- Verify that you have a pre-trained CtRL-Sim checkpoint. Model weights can be found on the [Google Drive](https://drive.google.com/drive/folders/13DSHf2UhrvguD7i7iYL5SfSDhgLcW_ja?usp=sharing).
- Verify that you have generated or downloaded the evaluation datasets (pickles and jsons) and stored them in `$PROJECT_ROOT/metadata/simulation_environment_datasets`. The evaluation datasets can be found on the [Google Drive](https://drive.google.com/drive/folders/13DSHf2UhrvguD7i7iYL5SfSDhgLcW_ja?usp=sharing).

**Note: RL Policy evaluation is run in the Scenario Dreamer Python environment, not in the GPUDrive Python environment. The GPUDrive setup and corresponding Python environment is only required to train the RL policy.**

</details>

<details> <summary><strong>2. Run Evaluation</strong></summary>

To evaluate the RL policy on 250 Waymo test environments with log replay agents, run:
```bash
python run_simulation.py sim=waymo_log_replay
```

To evaluate the RL policy on 250 Waymo test environments with CtRL-Sim agents, run:
```bash
python run_simulation.py sim=waymo_ctrl_sim
```

To evaluate the RL policy on 250 Scenario Dreamer (55m routes) test environments with CtRL-Sim agents, run:
```bash
python run_simulation.py sim=scenario_dreamer_55m
```

To evaluate the RL policy on 250 Scenario Dreamer (100m routes) test environments with CtRL-Sim agents, run:
```bash
python run_simulation.py sim=scenario_dreamer_100m
```

To evaluate the RL policy on 250 Scenario Dreamer (100m routes) test environments with adversarial CtRL-Sim agents, run:
```bash
python run_simulation.py sim=scenario_dreamer_100m_adv
```

You can visualize the simulations by setting `sim.visualize=True`.
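
If you want to run all five evaluations back to back, a small driver script along these lines may be convenient. It is a sketch rather than part of the repository: `build_command` and `run_all` are hypothetical helpers, and only the `sim=...` config names and the `sim.visualize=True` override come from the commands above. By default it only prints the commands; pass `dry_run=False` to execute them from the project root.

```python
import subprocess
from typing import List

# Config names taken from the commands listed above.
SIM_CONFIGS = [
    "waymo_log_replay",
    "waymo_ctrl_sim",
    "scenario_dreamer_55m",
    "scenario_dreamer_100m",
    "scenario_dreamer_100m_adv",
]


def build_command(sim_config: str, visualize: bool = False) -> List[str]:
    """Build the argument list for one evaluation run."""
    cmd = ["python", "run_simulation.py", f"sim={sim_config}"]
    if visualize:
        cmd.append("sim.visualize=True")
    return cmd


def run_all(dry_run: bool = True) -> List[List[str]]:
    """Print (or execute) every evaluation command in sequence."""
    commands = [build_command(name) for name in SIM_CONFIGS]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failure
    return commands
```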

</details>

<details> <summary><strong>3. What to Expect</strong></summary>

- The RL policy will be evaluated on 250 simulation environments on 1 GPU. 
- The planner metrics (collision rate, offroad rate, goal success rate, progress) will be aggregated and reported after each simulation.
- If you set `sim.visualize=True`, simulations will be visualized as mp4s to the `movies` directory.

</details>

## Citation <a name="citation"></a>

```bibtex
@InProceedings{rowe2025scenariodreamer,
  title={Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments},
  author={Rowe, Luke and Girgis, Roger and Gosselin, Anthony and Paull, Liam and Pal, Christopher and Heide, Felix},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={17207--17218},
  year={2025}
}
```

## Acknowledgements <a name="acknowledgements"></a>

Special thanks to the authors of the following open-source repositories:
- [SLEDGE](https://github.com/autonomousvision/sledge)
- [QCNet](https://github.com/ZikangZhou/QCNet)
- [decision-diffuser](https://github.com/anuragajay/decision-diffuser)
- [latent-diffusion](https://github.com/CompVis/latent-diffusion)
- [ctrl-sim](https://github.com/montrealrobotics/ctrl-sim)
- [gpudrive](https://github.com/Emerge-Lab/gpudrive)


================================================
FILE: cfgs/config.py
================================================
from pathlib import Path
import os

# 1.  Pull from user’s shell if it exists  ──────────────────────────
#    $ export PROJECT_ROOT=/my/cloned/scenario-dreamer
#    $ export CONFIG_PATH=/my/custom/location/cfgs   # optional override
PROJECT_ROOT = Path(os.getenv("PROJECT_ROOT", Path(__file__).resolve().parents[1]))
CONFIG_PATH  = Path(os.getenv("CONFIG_PATH", PROJECT_ROOT / "cfgs")).as_posix()

### CONSTANTS

NUM_WAYMO_TRAIN_SCENARIOS = 487002

# lg_type
NON_PARTITIONED = 0
PARTITIONED = 1 

# partition mask
AFTER_PARTITION = 0
BEFORE_PARTITION = 1

# Waymo connection type
LANE_CONNECTION_TYPES_WAYMO = {
    "none": 0,
    "pred": 1,
    "succ": 2,
    "left": 3,
    "right": 4,
    "self": 5
}

# NuPlan connection type
LANE_CONNECTION_TYPES_NUPLAN = {
    "none": 0,
    "pred": 1,
    "succ": 2,
    "self": 3,
}

# proportion of nocturne compatible scenes in the Waymo dataset
PROPORTION_NOCTURNE_COMPATIBLE = 0.38
NOCTURNE_COMPATIBLE = 1

# object type
NUPLAN_VEHICLE = 0
NUPLAN_PEDESTRIAN = 1
NUPLAN_STATIC_OBJECT = 2

# unified format for computing metrics
# [pos_x, pos_y, speed, cos(heading), sin(heading), length, width]
UNIFIED_FORMAT_INDICES = {
    'pos_x': 0,
    'pos_y': 1,
    'speed': 2,
    'cos_heading': 3,
    'sin_heading': 4,
    'length': 5,
    'width': 6
}



================================================
FILE: cfgs/config.yaml
================================================
scratch_root: ${oc.env:SCRATCH_ROOT} # scratch root directory, used for datasets and checkpoints
dataset_root: ${oc.env:DATASET_ROOT} # root directory for datasets
project_root: ${oc.env:PROJECT_ROOT} # project root directory
waymo_data_folder: ${dataset_root}/waymo_open_dataset_motion_v_1_1_0 # path to the Waymo Open Dataset
waymo_train_folder: ${waymo_data_folder}/training # path to the training set of the Waymo Open Dataset
waymo_val_folder: ${waymo_data_folder}/validation # path to the validation set of the Waymo Open Dataset
waymo_test_folder: ${waymo_data_folder}/testing # path to the testing set of the Waymo Open Dataset

hydra:
  run:
    dir: ${project_root}/slurm_logs/${now:%Y.%m.%d}/${now:%H.%M.%S}/${hydra.job.override_dirname}

generate_waymo_dataset:
  output_data_folder_train: ${dataset_root}/scenario_dreamer_waymo/train # output folder for training data
  output_data_folder_val: ${dataset_root}/scenario_dreamer_waymo/val # output folder for validation data
  output_data_folder_test: ${dataset_root}/scenario_dreamer_waymo/test # output folder for test data
  num_workers: 20 # number of workers to use for generating the dataset
  mode: train # or val or test
  chunk_idx: -1 # If chunk_idx >= 0, indexes into the chunk_idx'th chunk of size chunk_size. If chunk_idx=-1, process all chunks in parallel processes
  chunk_size: 50 # size of each chunk to process in parallel processes, if chunk_idx >= 0

preprocess_waymo:
  stage: scenario_dreamer # or ctrl_sim
  num_workers: 10 # number of workers to use for preprocessing the dataset
  mode: train # or val or test
  chunk_idx: -1 # If chunk_idx >= 0, indexes into the chunk_idx'th chunk of size chunk_size. If chunk_idx=-1, process all chunks in parallel processes
  chunk_size: 50000 # size of each chunk to process in parallel processes, if chunk_idx >= 0

preprocess_nuplan:
  num_workers: 10 # number of workers to use for preprocessing the dataset
  mode: train # or val or test
  chunk_idx: -1 # If chunk_idx >= 0, indexes into the chunk_idx'th chunk of size chunk_size. If chunk_idx=-1, process all chunks in parallel processes
  chunk_size: 50000 # size of each chunk to process in parallel processes, if chunk_idx >= 0

postprocess_sim_envs:
  run_name: scenario_dreamer_ldm_large_${dataset_name.name} # run name of ldm that generated the simulation environments
  pre_path: ${scratch_root}/checkpoints/${postprocess_sim_envs.run_name}/complete_sim_envs # input path for generated simulation environments
  post_path: ${scratch_root}/simulation_environment_datasets/scenario_dreamer_${dataset_name.name}_${postprocess_sim_envs.route_length}m_pickles # output path for post-processed simulation environments
  route_length: 200 # route length for post-processing
  max_num_envs: -1 # if -1, process all environments, otherwise process only the first max_num_envs environments

convert_pickles_to_jsons:
  directory: scenario_dreamer_waymo_200m # directory name of the simulation environments
  path_to_pickles: ${scratch_root}/simulation_environment_datasets/${convert_pickles_to_jsons.directory}_pickles # path to set of simulation environments
  path_to_jsons: ${scratch_root}/simulation_environment_datasets/${convert_pickles_to_jsons.directory}_jsons # path to the json files
  dataset_size: 250 # number of simulation environments to convert

model_name: autoencoder # or ldm or ctrl_sim

# hacky way to not have to manually change the dataset name in all config groups from command line
# only dataset_name has to be configured from command line 
# we have to define this as a group because Hydra resolves the defaults-list before it finishes building the root config
# e.g. `python train.py dataset_name=waymo model_name=autoencoder`
defaults:
  - dataset_name: waymo 
  - dataset@ae.dataset:    ${dataset_name}_autoencoder
  - train@ae.train:      ${dataset_name}_autoencoder
  - eval@ae.eval:       ${dataset_name}_autoencoder
  - model@ae.model:      ${dataset_name}_autoencoder
  - datamodule@ae.datamodule: ${dataset_name}_autoencoder
  - dataset@ldm.dataset:    ${dataset_name}_ldm
  - train@ldm.train:      ${dataset_name}_ldm
  - eval@ldm.eval:       ${dataset_name}_ldm
  - model@ldm.model:      ${dataset_name}_ldm
  - datamodule@ldm.datamodule: ${dataset_name}_ldm
  - dataset@ctrl_sim.dataset:   ${dataset_name}_ctrl_sim
  - train@ctrl_sim.train:      ${dataset_name}_ctrl_sim
  - model@ctrl_sim.model:      ${dataset_name}_ctrl_sim
  - datamodule@ctrl_sim.datamodule: ${dataset_name}_ctrl_sim
  - sim: base

================================================
FILE: cfgs/datamodule/base.yaml
================================================
train_batch_size: null # train batch size (per-GPU)
val_batch_size: null # validation batch size (per-GPU)
num_workers: 6 # Datamodule workers
pin_memory: True # Pin memory for dataloader
persistent_workers: True # Keep dataloader workers persistent across epochs

================================================
FILE: cfgs/datamodule/nuplan_autoencoder.yaml
================================================
defaults:
- base

_target_: datamodules.nuplan.nuplan_datamodule_autoencoder.NuplanDataModuleAutoEncoder

train_batch_size: 128
val_batch_size: 128
dataset_cfg: null



================================================
FILE: cfgs/datamodule/nuplan_ctrl_sim.yaml
================================================


================================================
FILE: cfgs/datamodule/nuplan_ldm.yaml
================================================
defaults:
- base

_target_: datamodules.nuplan.nuplan_datamodule_ldm.NuplanDataModuleLDM

train_batch_size: 256
val_batch_size: 256
dataset_cfg: null

================================================
FILE: cfgs/datamodule/waymo_autoencoder.yaml
================================================
defaults:
- base

_target_: datamodules.waymo.waymo_datamodule_autoencoder.WaymoDataModuleAutoEncoder

train_batch_size: 128
val_batch_size: 128
dataset_cfg: null



================================================
FILE: cfgs/datamodule/waymo_ctrl_sim.yaml
================================================
defaults: 
- base

_target_: datamodules.waymo.waymo_datamodule_ctrl_sim.WaymoDataModuleCtRLSim

train_batch_size: 16
val_batch_size: 16
dataset_cfg: null

================================================
FILE: cfgs/datamodule/waymo_ldm.yaml
================================================
defaults:
- base

_target_: datamodules.waymo.waymo_datamodule_ldm.WaymoDataModuleLDM

train_batch_size: 256
val_batch_size: 256
dataset_cfg: null

================================================
FILE: cfgs/dataset/nuplan_autoencoder.yaml
================================================
defaults:
- nuplan_base

sledge_raw_dataset_path: ${dataset_root}/scenario_dreamer_nuplan/sledge_raw # we use the same processed nuplan dataset as SLEDGE
map_id_dataset_path: ${dataset_root}/scenario_dreamer_nuplan/map_id # we use the same processed nuplan dataset as SLEDGE
preprocess: True # get data from preprocessed files if True, otherwise write preprocessed data to disk (you can only train with preprocess=True).
preprocess_dir: ${dataset_root}/scenario_dreamer_ae_preprocess_nuplan # directory to save preprocessed data
remove_left_right_connections: True # we only process the pre/succ connections. This cannot be toggled.

================================================
FILE: cfgs/dataset/nuplan_base.yaml
================================================
# From https://github.com/motional/nuplan-devkit/blob/cd3fd8d3d0c4d390fcb74d05fd56f92d9e0c366b/nuplan/common/actor_state/vehicle_parameters.py#L125
# 4.049 + 1.127
ego_length: 5.176 # length of the ego vehicle in metres
# 1.1485 * 2.0
ego_width: 2.297 # width of the ego vehicle in metres
fov: 64 # length = width of field of view in metres
num_points_per_lane: 20 # number of points per lane in the FOV
# we set these explicitly as numbers in 64x64 FOV are much larger than this
# and we want to be consistent with SLEDGE for fair comparison
max_num_vehicles: 31 # this includes the ego vehicle
max_num_pedestrians: 10 # maximum number of pedestrians in the FOV
max_num_static_objects: 20 # maximum number of static objects in the FOV
max_num_agents: 61 # max_num_vehicles + max_num_pedestrians + max_num_static_objects 
max_num_lanes: 100 # maximum number of lane segments in the FOV
upsample_lane_num_points: 1000 # number of points per lane segment after upsampling
num_agent_types: 3 # 0: vehicle, 1: pedestrian, 2: static object
num_lane_types: 3 # 0: lane, 1: green light, 2: red light
num_lane_connection_types: 4 # {"none": 0, "pred": 1, "succ": 2, "self": 3}
num_map_ids: 4 # 4 cities in nuPlan dataset

# statistics taken from training dataset
min_speed: 0
max_speed: 29.19
min_length: 0.09
max_length: 19.77
min_width: 0.14
max_width: 7.20
min_lane_x: -32
max_lane_x: 32
min_lane_y: -32
max_lane_y: 32

================================================
FILE: cfgs/dataset/nuplan_ctrl_sim.yaml
================================================


================================================
FILE: cfgs/dataset/nuplan_ldm.yaml
================================================
defaults:
- nuplan_base

dataset_path: ${dataset_root}/scenario_dreamer_autoencoder_latents_nuplan # path to dataset of latent cache for training ldm
latent_stats_dir: ${project_root}/metadata/latent_stats # directory to save latent statistics (mean/std across training data for normalization)
latent_stats_path: ${ldm.dataset.latent_stats_dir}/${ldm.model.autoencoder_run_name}.pkl # path to save latent statistics
agent_latents_mean: null # mean of agent latents, to be computed from training data
agent_latents_std: null # std of agent latents, to be computed from training data
lane_latents_mean: null # mean of lane latents, to be computed from training data
lane_latents_std: null # std of lane latents, to be computed from training data

================================================
FILE: cfgs/dataset/waymo_autoencoder.yaml
================================================
defaults:
- waymo_base

dataset_path: ${dataset_root}/scenario_dreamer_waymo # path to extracted waymo dataset that will be loaded and preprocessed
preprocess: True # get data from preprocessed files if True, otherwise write preprocessed data to disk (you can only train with preprocess=True).
preprocess_dir: ${dataset_root}/scenario_dreamer_ae_preprocess_waymo # directory to save preprocessed data for autoencoder training
generate_only_vehicles: False # if True, train to only generate vehicle agents, otherwise generate all agents (vehicles, pedestrians, cyclists)
remove_left_right_connections: False # if True, remove left and right lane connections from the dataset (only keep pre/succ), otherwise keep them

================================================
FILE: cfgs/dataset/waymo_base.yaml
================================================
dataset_path: null
max_num_agents: 30 # maximum number of agents in the FOV, including the ego vehicle
max_num_lanes: 100 # maximum number of lane segments in the FOV
upsample_lane_num_points: 1000 # number of points per lane segment after upsampling
remove_offroad_agents: True # if True, remove agents that are offroad (i.e. more than 1.5m from lane segment), otherwise keep them
offroad_threshold: 1.5 # threshold for offroad agents in metres
fov: 64 # in metres
num_points_per_lane: 20 # number of points per lane segment
num_agent_types: 3 # 0: vehicle, 1: pedestrian, 2: cyclist
num_lane_types: 0 # all lanes
num_lane_connection_types: 6 # {"none": 0, "pred": 1, "succ": 2, "left": 3, "right": 4, "self": 5}
num_map_ids: 2 # 1 if the scene is nocturne_compatible, 0 otherwise; used for sampling GPUDrive-compatible scenes

# statistics taken from training dataset
min_speed: 0
max_speed: 114.088
min_length: -0.098
max_length: 22.929
min_width: 0.096
max_width: 12.527
min_lane_x: -32
max_lane_x: 32
min_lane_y: -32
max_lane_y: 32


================================================
FILE: cfgs/dataset/waymo_ctrl_sim.yaml
================================================
preprocess: True # get data from preprocessed files if True, otherwise write preprocessed data to disk (you can only train with preprocess=True).
preprocess_dir: ${dataset_root}/scenario_dreamer_ctrl_sim_preprocess # directory to save preprocessed data
dataset_path: ${dataset_root}/scenario_dreamer_waymo # path to extracted waymo dataset that will be loaded and preprocessed
train_context_length: 32 # CtRL-Sim context length (in timesteps)
max_num_lanes: 100 # maximum number of lanes in context
max_num_agents: 24 # maximum number of modeled agents
lane_fov: 300 # lane graph field of view (in metres)
fov: 80 # agent field of view (in metres)
num_points_per_lane: 50 # number of points per lane segment
upsample_lane_num_points: 1000 # number of points per lane segment after upsampling
simulation_hz: 10 # simulation frequency in Hz
value_fn_horizon: 20 # we clip reward to go (RTG) to this horizon (in seconds)
offroad_threshold: 2.5 # threshold for offroad agents in metres
moving_threshold: 0 # minimum speed (in m/s) for an agent to be considered "moving"
normalize_to_random_timestep: True # if True, normalize context to random timestep within the context during training
rew_multiplier: 10 # reward multiplier to scale up rewards during training (not necessary)
max_veh_ego_distance: 10 # maximum distance of vehicles from ego to be modeled (in metres)
min_rtg_veh: -10 # minimum reward to go (RTG) for vehicles
max_rtg_veh: 0 # maximum reward to go (RTG) for vehicles
rtg_discretization: 350 # number of discrete bins for reward to go (RTG) values
create_gpudrive_dataset: False # if True, create Scenario Dreamer compatible GPUDrive dataset
gpudrive_training_set_dir: ${dataset_root}/simulation_environment_datasets/gpudrive_training_set_pickles # path to save Scenario Dreamer compatible GPUDrive training set
gpudrive_evaluation_set_dir: ${dataset_root}/simulation_environment_datasets/waymo_sim_test_pickles # path to save Scenario Dreamer compatible GPUDrive evaluation set

# k-disks tokenizer settings
collect_state_transitions: False # if True, collect state transitions for k-disks tokenizer training
k_disks_vocab_path: ${project_root}/metadata/k_disks_vocab_384_10Hz_seed42.pkl # path to k-disks vocabulary file
vocab_size: 384 # size of k-disks vocabulary
k_disks_eps: 0.035 # epsilon value for k-disks clustering (see p. 5 of https://arxiv.org/abs/2312.04535)
tokenize_with_nucleus_sampling: True # if True, use nucleus sampling for k-disks tokenization
tokenization_temperature: 0.008 # temperature for k-disks tokenization
tokenization_nucleus: 0.95 # nucleus value for k-disks tokenization

================================================
FILE: cfgs/dataset/waymo_ldm.yaml
================================================
defaults:
- waymo_base

dataset_path: ${dataset_root}/scenario_dreamer_autoencoder_latents_waymo # path to dataset of latent cache for training ldm
latent_stats_dir: ${project_root}/metadata/latent_stats # directory to save latent statistics (mean/std across training data for normalization)
latent_stats_path: ${ldm.dataset.latent_stats_dir}/${ldm.model.autoencoder_run_name}.pkl # path to save latent statistics
agent_latents_mean: null # mean of agent latents, to be computed from training data
agent_latents_std: null # std of agent latents, to be computed from training data
lane_latents_mean: null # mean of lane latents, to be computed from training data
lane_latents_std: null # std of lane latents, to be computed from training data

================================================
FILE: cfgs/dataset_name/nuplan.yaml
================================================
# placeholder file
name: nuplan

================================================
FILE: cfgs/dataset_name/waymo.yaml
================================================
# placeholder file
name: waymo

================================================
FILE: cfgs/eval/base.yaml
================================================
seed: 0 # random seed for reproducibility
save_dir: ${scratch_root}/checkpoints/ # save directory for evaluation results
run_name: null # name of the run to be evaluated

================================================
FILE: cfgs/eval/nuplan_autoencoder.yaml
================================================
defaults:
- base

run_name: scenario_dreamer_autoencoder_nuplan # default run name for evaluation

num_samples_to_visualize: 50 # number of samples to visualize during evaluation
visualize_lane_graph: False # visualize lane graph in the evaluation results?
viz_dir: ${project_root}/viz_eval_${ae.eval.run_name} # directory to save visualizations of generated samples

# latent caching for ldm training (run as a separate job)
cache_latents:
  enable_caching: False # cache latents to disk?
  split_name: train # which dataset split to cache latents for
  latent_dir: ${scratch_root}/scenario_dreamer_autoencoder_latents_nuplan # directory to cache latents to

================================================
FILE: cfgs/eval/nuplan_ldm.yaml
================================================
defaults:
- base

run_name: scenario_dreamer_ldm_base_nuplan # default run name for evaluation

mode: initial_scene # or lane_conditioned or inpainting or metrics or simulation_environments
num_samples: 100 # number of samples to generate during evaluation
init_prob_matrix_path: ${project_root}/metadata/initial_prob_matrix_nuplan.pt # path to initial scene num lanes and agents probability matrix
inpainting_prob_matrix_path: ${project_root}/metadata/inpainting_prob_matrix_nuplan.pt # path to inpainting num lanes and agents probability matrix
batch_size: 32 # batch size for evaluation
visualize: False # visualize samples during evaluation?
viz_dir: ${project_root}/viz_gen_samples_${ldm.eval.run_name} # directory to save visualizations of generated samples
cache_samples: False # cache samples to disk?
conditioning_path: null # optional path to conditioning data for lane-conditioned or inpainting evaluation

sim_envs:
  num_inpainting_candidates: 8 # number of inpainting candidates to generate per partial scene when generating simulation environments
  overhead_factor: 10 # generate num_samples * overhead_factor initial scenes to account for filtering when generating simulation environments
  route_length: 500 # length of routes (in meters) to generate simulation environments for

metrics:
  samples_path: ${dataset_root}/checkpoints/${ldm.eval.run_name}/initial_scene_samples # path to load generated samples
  metrics_save_path: ${dataset_root}/checkpoints/${ldm.eval.run_name} # path to save metrics
  eval_set: ${project_root}/metadata/nuplan_eval_set.pkl # pickle containing paths to ground-truth samples for metrics computation
  gt_agent_test_dir: ${dataset_root}/scenario_dreamer_ae_preprocess_nuplan/test # directory containing ground-truth agent data for metrics computation
  gt_lane_test_dir: ${dataset_root}/scenario_dreamer_nuplan/sledge_raw/test # directory containing ground-truth lane data for metrics computation

================================================
FILE: cfgs/eval/waymo_autoencoder.yaml
================================================
defaults:
- base

run_name: scenario_dreamer_autoencoder_waymo # default run name for evaluation

num_samples_to_visualize: 50 # number of samples to visualize during evaluation
visualize_lane_graph: False # visualize lane graph in the evaluation results?
viz_dir: ${project_root}/viz_eval_${ae.eval.run_name} # directory to save visualizations of generated samples

# latent caching for ldm training (run as a separate job)
cache_latents:
  enable_caching: False # cache latents to disk?
  split_name: train # which dataset split to cache latents for
  latent_dir: ${scratch_root}/scenario_dreamer_autoencoder_latents_waymo # directory to cache latents to
  nocturne_train_filenames_path: ${project_root}/metadata/nocturne_train_filenames.pkl # path to nocturne train filenames for caching latents
  nocturne_val_filenames_path: ${project_root}/metadata/nocturne_val_filenames.pkl # path to nocturne val filenames for caching latents

================================================
FILE: cfgs/eval/waymo_ldm.yaml
================================================
defaults:
- base

run_name: scenario_dreamer_ldm_base_waymo # default run name for evaluation

mode: initial_scene # or lane_conditioned or inpainting or metrics or simulation_environments
num_samples: 100 # number of samples to generate during evaluation
init_prob_matrix_path: ${project_root}/metadata/initial_prob_matrix_waymo.pt # path to initial scene num lanes and agents probability matrix
inpainting_prob_matrix_path: ${project_root}/metadata/inpainting_prob_matrix_waymo.pt # path to inpainting num lanes and agents probability matrix
batch_size: 32 # batch size for evaluation
visualize: False # visualize samples during evaluation?
viz_dir: ${project_root}/viz_gen_samples_${ldm.eval.run_name} # directory to save visualizations of generated samples
cache_samples: False # cache samples to disk?
conditioning_path: null # optional path to conditioning data for lane-conditioned or inpainting evaluation

sim_envs:
  num_inpainting_candidates: 8 # number of inpainting candidates to generate per partial scene when generating simulation environments
  overhead_factor: 5 # generate num_samples * overhead_factor initial scenes to account for filtering when generating simulation environments
  route_length: 500 # length of routes (in meters) to generate simulation environments for
  nocturne_compatible_only: True # only generate simulation environments from the distribution of Nocturne/GPUDrive-compatible scenes?

metrics:
  samples_path: ${dataset_root}/checkpoints/${ldm.eval.run_name}/initial_scene_samples # path to load generated samples
  metrics_save_path: ${dataset_root}/checkpoints/${ldm.eval.run_name} # path to save metrics
  eval_set: ${project_root}/metadata/waymo_eval_set.pkl # pickle containing paths to ground-truth samples for metrics computation
  gt_test_dir: ${dataset_root}/scenario_dreamer_ae_preprocess_waymo/test # directory containing ground-truth agent/lane data for metrics computation

================================================
FILE: cfgs/model/autoencoder.yaml
================================================
# architecture configurations
hidden_dim: 512 # lane hidden dimension
num_encoder_blocks: 2 # number of factorized encoder blocks
num_decoder_blocks: 2 # number of factorized decoder blocks
lane_attr: 2 # number of lane attributes (x and y coordinates)
num_heads: 4 # number of attention heads in *_to_lane attention layers
dropout: 0 # dropout rate
dim_f: 2048 # feedforward dimension in *_to_lane attention layers
state_dim: 7 # agent feature dimension
num_agent_types: 3 # number of agent types (car, pedestrian, cyclist)
lane_conn_attr: null # number of lane connection attributes (none, pred, succ, left, right, self)
num_lane_types: null # number of lane types (eg, lane, green light, red light)
agent_hidden_dim: 256 # agent hidden dimension
agent_num_heads: 4 # number of attention heads in *_to_agent attention layers
agent_dim_f: 1024 # feedforward dimension in *_to_agent attention layers
lane_conn_hidden_dim: 64 # lane connection hidden dimension
lane_latent_dim: 24 # lane latent dimension
agent_latent_dim: 8 # agent latent dimension

# loss weights
kl_weight: 1e-2 # weight of KL loss in VAE training (Beta in Beta-VAE)
lane_weight: 10 # weight of lane prediction loss
lane_conn_weight: 10 # weight of lane connection prediction loss
cond_dis_weight: 0.1 # weight of conditional lane distribution predictor that predicts number of lanes in top half of scene conditioned on bottom half (used for inpainting)

# ── dataset-derived constants pulled in by interpolation ── #
num_points_per_lane: ${ae.dataset.num_points_per_lane}
max_num_lanes:       ${ae.dataset.max_num_lanes}

================================================
FILE: cfgs/model/ldm.yaml
================================================
# architecture parameters
autoencoder_run_name: null # Name of the autoencoder run
autoencoder_path: null # Path to the pre-trained autoencoder checkpoint
hidden_dim: 2048 # Hidden layer dimension for lane embeddings
num_heads: 16 # Number of attention heads for *_to_lane attention layers
agent_hidden_dim: 512 # Hidden layer dimension for agent embeddings
agent_num_heads: 8 # Number of attention heads for *_to_agent attention layers
num_factorized_dit_blocks: 2 # Number of factorized DiT blocks
lane_latent_dim: 24 # Dimension of autoencoder lane latents
agent_latent_dim: 8 # Dimension of autoencoder agent latents
dropout: 0 # Dropout rate
label_dropout: 0.1 # Dropout rate for labels for training (to enable classifier-free guidance at inference)
num_l2l_blocks: 1 # Number of lane-to-lane blocks in each factorized DiT block

# sampling parameters
n_diffusion_timesteps: 100 # Number of diffusion timesteps
lane_sampling_temperature: 0.75 # Sampling temperature for lane latents in diffusion
diffusion_clip: 5 # clip value for diffusion

================================================
FILE: cfgs/model/nuplan_autoencoder.yaml
================================================
defaults:
- autoencoder

lane_conn_attr: ${ae.dataset.num_lane_connection_types} # number of lane connection types
num_lane_types: ${ae.dataset.num_lane_types} # number of lane types

================================================
FILE: cfgs/model/nuplan_ctrl_sim.yaml
================================================


================================================
FILE: cfgs/model/nuplan_ldm.yaml
================================================
defaults:
- ldm

# architecture parameters
autoencoder_run_name: scenario_dreamer_autoencoder_nuplan # Name of the autoencoder run
autoencoder_path: ${scratch_root}/checkpoints/${ldm.model.autoencoder_run_name}/last.ckpt # Path to the pre-trained autoencoder checkpoint

================================================
FILE: cfgs/model/waymo_autoencoder.yaml
================================================
defaults:
- autoencoder

lane_conn_attr: ${ae.dataset.num_lane_connection_types} # number of lane connection types
num_lane_types: ${ae.dataset.num_lane_types} # number of lane types

================================================
FILE: cfgs/model/waymo_ctrl_sim.yaml
================================================
hidden_dim: 256 # ctrl-sim model hidden dim
map_attr: 3 # number of map attributes (x,y,existence)
num_road_types: 1 # number of road types in map encoder
num_heads: 8 # number of attention heads
num_reward_components: 1 # number of modeled reward components
dim_feedforward: 1024 # feedforward dim in transformer
dropout: 0 # model is not overfitting
state_dim: 13 # number of features in the state; note that this includes an is_ego feature
predict_rtg: True # predict return-to-go
num_transformer_encoder_layers: 2 # number of transformer encoder layers
num_decoder_layers: 4 # number of transformer decoder layers
loss_action_coef: 1. # coefficient for action loss
encode_initial_state: True # whether to encode initial state in encoder
trajeglish: False # set to true to train trajeglish model: https://arxiv.org/abs/2312.04535
ctrl_sim: True # set to true to train ctrl-sim model: https://arxiv.org/abs/2403.19918
il: False # set to true to train imitation learning model (no return tokens)

================================================
FILE: cfgs/model/waymo_ldm.yaml
================================================
defaults:
- ldm

# architecture parameters
autoencoder_run_name: scenario_dreamer_autoencoder_waymo # Name of the autoencoder run
autoencoder_path: ${scratch_root}/checkpoints/${ldm.model.autoencoder_run_name}/last.ckpt # Path to the pre-trained autoencoder checkpoint

================================================
FILE: cfgs/sim/base.yaml
================================================
seed: 0 # random seed for reproducibility
mode: scenario_dreamer # [waymo_ctrl_sim, waymo_log_replay, scenario_dreamer]
dataset_path: ${project_root}/metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_pickles # path to set of simulation environments
json_path: ${project_root}/metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons # path to the json files
policy: idm # or rl (policy being evaluated)
rl_model_path: ${project_root}/metadata/gpudrive_checkpoint # path to the RL model
rl_model_name: pretrained # name of the RL model
polyline_reduction_threshold: 0.1 # from gpudrive 
visualize: False # create videos of the simulation?
lightweight: False # lightweight visualization (runs faster, with lower dpi and frame rate)
movie_path: ${project_root}/movies # path to save visualization movies
verbose: True # report metrics after each simulation?
steps: 400 # number of steps in each simulation (400 steps * 0.1 s = 40 s with the default dt)
dt: 0.1 # time delta between simulation steps (in seconds)
simulate_vehicles_only: False # whether to simulate only vehicles (default is to simulate all agents (vehicles + pedestrians + cyclists))
agent_scale: 1.0 # agent scale for collision checking (GPUDrive uses 0.7)
behaviour_model:
  run_name: ctrl_sim_waymo_1M_steps # run name of the behaviour model to load
  model_path: ${scratch_root}/checkpoints/${sim.behaviour_model.run_name}/last.ckpt # path to the behaviour model checkpoint
  tilt: 10 # exponential tilting value for return sampling
  action_temperature: 1.0 # action sampling temperature
  use_rtg: True # whether behaviour model conditions on return-to-go
  predict_rtgs: True # whether behaviour model predicts return-to-go
  compute_metrics: False # compute metrics for behaviour model?


================================================
FILE: cfgs/sim/scenario_dreamer_100m.yaml
================================================
defaults:
- base

mode: scenario_dreamer # [waymo_ctrl_sim, waymo_log_replay, scenario_dreamer]
dataset_path: ${project_root}/metadata/simulation_environment_datasets/scenario_dreamer_waymo_100m_pickles # path to set of simulation environments
json_path: ${project_root}/metadata/simulation_environment_datasets/scenario_dreamer_waymo_100m_jsons # path to the json files
steps: 200 # number of steps in each simulation
simulate_vehicles_only: True # whether to simulate only vehicles (default is to simulate all agents (vehicles + pedestrians + cyclists))
agent_scale: 0.7 # agent scale for collision checking (GPUDrive uses 0.7)
policy: rl # or idm (policy being evaluated)

================================================
FILE: cfgs/sim/scenario_dreamer_100m_adv.yaml
================================================
defaults:
- base

mode: scenario_dreamer # [waymo_ctrl_sim, waymo_log_replay, scenario_dreamer]
dataset_path: ${project_root}/metadata/simulation_environment_datasets/scenario_dreamer_waymo_100m_pickles # path to set of simulation environments
json_path: ${project_root}/metadata/simulation_environment_datasets/scenario_dreamer_waymo_100m_jsons # path to the json files
steps: 200 # number of steps in each simulation
simulate_vehicles_only: True # whether to simulate only vehicles (default is to simulate all agents (vehicles + pedestrians + cyclists))
agent_scale: 0.7 # agent scale for collision checking (GPUDrive uses 0.7)
policy: rl # or idm (policy being evaluated)
behaviour_model:
  tilt: -10
  action_temperature: 1.0

================================================
FILE: cfgs/sim/scenario_dreamer_55m.yaml
================================================
defaults:
- base

mode: scenario_dreamer # [waymo_ctrl_sim, waymo_log_replay, scenario_dreamer]
dataset_path: ${project_root}/metadata/simulation_environment_datasets/scenario_dreamer_waymo_55m_pickles # path to set of simulation environments
json_path: ${project_root}/metadata/simulation_environment_datasets/scenario_dreamer_waymo_55m_jsons # path to the json files
steps: 90 # number of steps in each simulation
simulate_vehicles_only: True # whether to simulate only vehicles (default is to simulate all agents (vehicles + pedestrians + cyclists))
agent_scale: 0.7 # agent scale for collision checking (GPUDrive uses 0.7)
policy: rl # or idm (policy being evaluated)

================================================
FILE: cfgs/sim/waymo_ctrl_sim.yaml
================================================
defaults:
- base

mode: waymo_ctrl_sim # [waymo_ctrl_sim, waymo_log_replay, scenario_dreamer]
dataset_path: ${project_root}/metadata/simulation_environment_datasets/waymo_sim_test_pickles # path to set of simulation environments
json_path: ${project_root}/metadata/simulation_environment_datasets/waymo_sim_test_jsons # path to the json files
steps: 90 # number of steps in each simulation
simulate_vehicles_only: True # whether to simulate only vehicles (default is to simulate all agents (vehicles + pedestrians + cyclists))
agent_scale: 0.7 # agent scale for collision checking (GPUDrive uses 0.7)
policy: rl # or idm (policy being evaluated)

================================================
FILE: cfgs/sim/waymo_log_replay.yaml
================================================
defaults:
- base

mode: waymo_log_replay # [waymo_ctrl_sim, waymo_log_replay, scenario_dreamer]
dataset_path: ${project_root}/metadata/simulation_environment_datasets/waymo_sim_test_pickles # path to set of simulation environments
json_path: ${project_root}/metadata/simulation_environment_datasets/waymo_sim_test_jsons # path to the json files
steps: 90 # number of steps in each simulation
simulate_vehicles_only: True # whether to simulate only vehicles (default is to simulate all agents (vehicles + pedestrians + cyclists))
agent_scale: 0.7 # agent scale for collision checking (GPUDrive uses 0.7)
policy: rl # or idm (policy being evaluated)

================================================
FILE: cfgs/train/base.yaml
================================================
### training configurations
seed: 0 # training seed for reproducibility
save_dir: ${scratch_root}/checkpoints/ # directory to save checkpoints/log files
run_name: null # name of saved directory for run
wandb_project: scenario-dreamer # wandb project
wandb_entity: null # wandb entity
track: False # track to wandb
accelerator: auto # pytorch lightning accelerator
devices: null # number of GPUs; set to 4 for multi-GPU training
precision: 32-true # bf16-mixed for bfloat16 mixed precision or 32-true for regular
check_val_every_n_epoch: 1 # interval at which we eval on val set
limit_train_batches: 1.0 # proportion of training batches used (reduce for easier debugging)
limit_val_batches: 1.0 # proportion of validation batches used (reduce for easier debugging)
num_samples_to_visualize: null # number of samples to visualize in each epoch of training
viz_dir: null # directory to save visualizations

### optimizer configurations
max_steps: null # number of training steps
warmup_steps: null  # linear lr warmup over warmup_steps steps
lr_schedule: null # linear or constant
lr: 1e-4 # initial lr
beta_1: 0.9 # following SceneControl
beta_2: 0.999 # following SceneControl
epsilon: 10e-8 # following SceneControl (note: 10e-8 equals 1e-7)
weight_decay: 1e-5 # following SceneControl
gradient_clip_val: 10.0

### loss function configurations
loss_type: l2 # l2 loss on vectorized elements
lane_weight: 10 # weight of lane regression loss
lane_conn_weight: 0.1 # weight of lane connectivity loss

================================================
FILE: cfgs/train/nuplan_autoencoder.yaml
================================================
defaults:
- base

run_name: scenario_dreamer_autoencoder_nuplan # from base.yaml
max_steps: 357450 # from base.yaml
warmup_steps: 1000 # from base.yaml
lr_schedule: linear # from base.yaml
weight_decay: 1e-4 # from base.yaml
check_val_every_n_epoch: 1 # from base.yaml
num_samples_to_visualize: 10 # from base.yaml
viz_dir: ${project_root}/viz_val_${ae.train.run_name} # from base.yaml
devices: 1 # from base.yaml


================================================
FILE: cfgs/train/nuplan_ctrl_sim.yaml
================================================


================================================
FILE: cfgs/train/nuplan_ldm.yaml
================================================
defaults:
- base

run_name: scenario_dreamer_ldm_base_nuplan # from base.yaml
lr_schedule: constant # from base.yaml
save_top_k: 0 # only save latest checkpoint
max_steps: 165000 # from base.yaml
warmup_steps: 500 # from base.yaml
weight_decay: 1e-5 # from base.yaml
check_val_every_n_epoch: 2 # from base.yaml
ema_decay: 0.9999 # EMA decay for model weights
guidance_scale: 4.0 # guidance scale for classifier-free guidance
devices: 4 # from base.yaml
num_samples_to_visualize: 10 # from base.yaml
viz_dir: ${project_root}/viz_val_${ldm.train.run_name} # from base.yaml

# this is only computed once at the start of training
num_batches_compute_stats: 250 # number of batches to compute mean/std of latents for normalization
batch_size_compute_stats: 1024 # batch size for computing mean/std of latents

================================================
FILE: cfgs/train/waymo_autoencoder.yaml
================================================
defaults:
- base

run_name: scenario_dreamer_autoencoder_waymo # from base.yaml
max_steps: 350000 # from base.yaml
warmup_steps: 1000 # from base.yaml
lr_schedule: linear # from base.yaml
weight_decay: 1e-4 # from base.yaml
check_val_every_n_epoch: 1 # from base.yaml
num_samples_to_visualize: 10 # from base.yaml
viz_dir: ${project_root}/viz_val_${ae.train.run_name} # from base.yaml
devices: 1 # from base.yaml


================================================
FILE: cfgs/train/waymo_ctrl_sim.yaml
================================================
defaults:
- base

run_name: ctrl_sim_waymo # from base.yaml
devices: 4 # from base.yaml
warmup_steps: 500 # from base.yaml
max_steps: 1000000 # from base.yaml
weight_decay: 1e-4 # from base.yaml
lr: 5e-4 # from base.yaml

================================================
FILE: cfgs/train/waymo_ldm.yaml
================================================
defaults:
- base

run_name: scenario_dreamer_ldm_base_waymo # from base.yaml
lr_schedule: constant # from base.yaml
save_top_k: 0 # only save latest checkpoint
max_steps: 165000 # from base.yaml
warmup_steps: 500 # from base.yaml
weight_decay: 1e-5 # from base.yaml
check_val_every_n_epoch: 2 # from base.yaml
ema_decay: 0.9999 # EMA decay for model weights
guidance_scale: 4.0 # guidance scale for classifier-free guidance
devices: 4 # from base.yaml
num_samples_to_visualize: 10 # from base.yaml
viz_dir: ${project_root}/viz_val_${ldm.train.run_name} # from base.yaml

# this is only computed once at the start of training
num_batches_compute_stats: 250 # number of batches to compute mean/std of latents for normalization
batch_size_compute_stats: 1024 # batch size for computing mean/std of latents

================================================
FILE: data_processing/nuplan/generate_nuplan_dataset.py
================================================
from pathlib import Path
import pickle
import hydra
import os
from cfgs.config import CONFIG_PATH
import yaml
import shutil
from tqdm import tqdm
from typing import List
import random
import gzip

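def find_feature_paths(root_path, feature_name):
    """Find all cached feature files named `<feature_name>.gz` under `root_path`.

    Expects the layout `<root_path>/<log>/<scenario_type>/<token>/<feature_name>.gz`
    and returns each match without the `.gz` suffix (callers append it again).
    """
    file_paths: List[Path] = []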
    for log_path in root_path.iterdir():
        if log_path.name == "metadata":
            continue
        for scenario_type_path in log_path.iterdir():
            for token_path in scenario_type_path.iterdir():
                feature_path = token_path / f"{feature_name}.gz"
                if feature_path.is_file():
                    file_paths.append(token_path / feature_name)

    return file_paths
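
# Illustrative sketch, not used by this script: the same search written with
# Path.glob. It assumes the cache layout implied by the loop above,
# <root_path>/<log>/<scenario_type>/<token>/<feature_name>.gz, and the helper
# name `find_feature_paths_glob` is hypothetical. Unlike find_feature_paths,
# it returns the matches sorted, which may make runs easier to compare.
def find_feature_paths_glob(root_path, feature_name):
    """Glob-based variant of find_feature_paths (paths returned without .gz)."""
    root_path = Path(root_path)
    return sorted(
        gz_path.with_suffix("")  # drop ".gz" to mirror find_feature_paths
        for gz_path in root_path.glob(f"*/*/*/{feature_name}.gz")
        if gz_path.is_file()
        and gz_path.relative_to(root_path).parts[0] != "metadata"
    )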

@hydra.main(version_base=None, config_path=CONFIG_PATH, config_name="config")
def main(cfg):
    print("Creating train/val/test splits for SLEDGE autoencoder cache files...")
    autoencoder_cache_path = os.path.join(cfg.scratch_root, 'exp/caches/autoencoder_cache')
    path = Path(autoencoder_cache_path)
    file_paths = find_feature_paths(path, 'sledge_raw')
    print("Number of files: ", len(file_paths))
    
    # we use the same train/val/test split as SLEDGE.
    yaml_file_path = os.path.join(cfg.project_root, 'metadata/sledge_files/nuplan.yaml')
    with open(yaml_file_path, 'r') as yaml_file:
        yaml_data = yaml.safe_load(yaml_file)
    
    # Extract the directories
    log_splits = yaml_data.get('log_splits', {})
    train_dirs = log_splits.get('train', [])
    val_dirs = log_splits.get('val', [])
    test_dirs = log_splits.get('test', [])

    # Define destination directories
    train_dest_sledge_raw = os.path.join(cfg.dataset_root, 'scenario_dreamer_nuplan/sledge_raw/train')
    val_dest_sledge_raw = os.path.join(cfg.dataset_root, 'scenario_dreamer_nuplan/sledge_raw/val')
    test_dest_sledge_raw = os.path.join(cfg.dataset_root, 'scenario_dreamer_nuplan/sledge_raw/test')
    train_dest_map_id = os.path.join(cfg.dataset_root, 'scenario_dreamer_nuplan/map_id/train')
    val_dest_map_id = os.path.join(cfg.dataset_root, 'scenario_dreamer_nuplan/map_id/val')
    test_dest_map_id = os.path.join(cfg.dataset_root, 'scenario_dreamer_nuplan/map_id/test')

    # Create the destination directories if they don't exist
    os.makedirs(train_dest_sledge_raw, exist_ok=True)
    os.makedirs(val_dest_sledge_raw, exist_ok=True)
    os.makedirs(test_dest_sledge_raw, exist_ok=True)
    os.makedirs(train_dest_map_id, exist_ok=True)
    os.makedirs(val_dest_map_id, exist_ok=True)
    os.makedirs(test_dest_map_id, exist_ok=True)

    train_files = []
    val_files = []
    test_files = []
    for file_path in tqdm(file_paths):
        path_partitioned = str(file_path).split("/")
        split_id_index = path_partitioned.index('autoencoder_cache') + 1
        split_id = path_partitioned[split_id_index]

        if split_id in train_dirs:
            train_files.append(file_path)
        elif split_id in val_dirs:
            val_files.append(file_path)
        elif split_id in test_dirs:
            test_files.append(file_path)
    
    def copy_files(file_list, sledge_raw_destination, map_id_destination):
        for i, file_name in enumerate(tqdm(file_list)):
            sledge_raw_gz_path = str(file_name) + '.gz'
            map_id_gz_path = str(file_name)[:-len('sledge_raw')] + 'map_id.gz'
            
            identifier = sledge_raw_gz_path.split("/")[-2]
            
            sledge_raw_destination_path = os.path.join(sledge_raw_destination, identifier + '.gz')
            map_id_destination_path = os.path.join(map_id_destination, identifier + '.gz')
            shutil.copy(sledge_raw_gz_path, sledge_raw_destination_path)
            shutil.copy(map_id_gz_path, map_id_destination_path)

    # Copy train, val, and test files
    copy_files(train_files, train_dest_sledge_raw, train_dest_map_id)
    copy_files(val_files, val_dest_sledge_raw, val_dest_map_id)
    copy_files(test_files, test_dest_sledge_raw, test_dest_map_id)

    # Create nuplan eval set
    print("Creating nuplan eval set (for computing metrics)...")
    random.seed(42)

    map_id_path = os.path.join(cfg.dataset_root, 'scenario_dreamer_nuplan', 'map_id', 'test')
    test_files = os.listdir(map_id_path)

    test_files_dict = {
        0: [],
        1: [],
        2: [],
        3: []
    }
    for test_file in tqdm(test_files):
        test_file_path = os.path.join(map_id_path, test_file)
        with gzip.open(test_file_path, 'rb') as f:
            map_id_dict = pickle.load(f)

        test_files_dict[map_id_dict['id'].item()].append(test_file)

    for i in range(4):
        random.shuffle(test_files_dict[i])

    print("Number of files in each city:", 
          [len(test_files_dict[i]) for i in range(4)])

    # list of 12500 files from each city that forms the nuplan test set
    nuplan_test_files = test_files_dict[0][:12500] + test_files_dict[1][:12500] + test_files_dict[2][:12500] + test_files_dict[3][:12500]
    # This ensures files are not ordered by city
    random.shuffle(nuplan_test_files)

    assert len(nuplan_test_files) == 50000

    nuplan_test_dict = {
        'files': nuplan_test_files
    }

    with open(os.path.join(cfg.project_root, 'metadata', 'nuplan_eval_set.pkl'), 'wb') as f:
        pickle.dump(nuplan_test_dict, f)

    print("Done.")

if __name__ == "__main__":
    main()
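

# Illustrative sketch (not part of the original pipeline): the eval set above
# takes the same number of shuffled files from each city and then shuffles the
# combined list. The helper name `_example_balanced_sample` and its arguments
# are hypothetical. Unlike main(), it uses a local random.Random instance so
# the global RNG state is left untouched.
import random


def _example_balanced_sample(files_by_group, per_group, seed=42):
    rng = random.Random(seed)
    selected = []
    for group in sorted(files_by_group):
        files = list(files_by_group[group])
        rng.shuffle(files)
        selected.extend(files[:per_group])
    rng.shuffle(selected)  # avoid ordering the result by group
    return selected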

================================================
FILE: data_processing/nuplan/preprocess_dataset_nuplan.py
================================================
import hydra
import random
from tqdm import tqdm
from datasets.nuplan.dataset_autoencoder_nuplan import NuplanDatasetAutoEncoder
from cfgs.config import CONFIG_PATH
import multiprocessing as mp
from pathlib import Path
from omegaconf import OmegaConf

# ───────────────────────────────────────────────────────────
#  helper so the Pool can pickle it
# ───────────────────────────────────────────────────────────
def _work_one_chunk(idx, cfg_dict):
    cfg = OmegaConf.create(cfg_dict)
    cfg.preprocess_nuplan.chunk_idx = idx          # set my own chunk
    _run_one_cfg(cfg)


# ───────────────────────────────────────────────────────────
#  the old body → turned into a function we can reuse
# ───────────────────────────────────────────────────────────
def _run_one_cfg(cfg):
    random.seed(42)

    cfg.dataset_root = cfg.scratch_root
    cfg.ae.dataset.preprocess = False
    dset = NuplanDatasetAutoEncoder(cfg.ae.dataset, split_name=cfg.preprocess_nuplan.mode)

    start = cfg.preprocess_nuplan.chunk_idx * cfg.preprocess_nuplan.chunk_size
    end   = start + cfg.preprocess_nuplan.chunk_size
    chunk = list(range(start, min(end, len(dset.files))))
    if not chunk:
        return

    for idx in tqdm(chunk, position=0, leave=False):
        dset.get(idx)
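

# Illustrative sketch (not part of the original pipeline): _run_one_cfg above
# maps a chunk index to a half-open range of file indices, clipped to the
# dataset size. `_example_chunk_indices` is a hypothetical helper that isolates
# that arithmetic so the boundary cases are easy to check.
def _example_chunk_indices(chunk_idx, chunk_size, num_files):
    start = chunk_idx * chunk_size
    end = min(start + chunk_size, num_files)
    return list(range(start, end))  # empty when start >= num_files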


@hydra.main(version_base=None, config_path=CONFIG_PATH, config_name="config")
def main(cfg):

    # case 1 — behave exactly like before (single-chunk run)
    if cfg.preprocess_nuplan.chunk_idx >= 0:
        _run_one_cfg(cfg)
        print("Done!")
        return

    # case 2 — chunk_idx == -1  ⇒  run *all* chunks in parallel
    if cfg.preprocess_nuplan.mode == 'train':
        total_chunks = 10
    else:
        total_chunks = 1
    
    if cfg.preprocess_nuplan.mode == 'test':
        cfg.preprocess_nuplan.chunk_size = 67000
        

    n_workers = min(
        cfg.preprocess_nuplan.get("num_workers", mp.cpu_count()),
        total_chunks
    )
    cfg_dict = OmegaConf.to_container(cfg, resolve=True)

    print(f"[preprocess-nuplan]  Launching {total_chunks} chunks on {n_workers} workers …")
    with mp.Pool(processes=n_workers) as pool:
        pool.starmap(
            _work_one_chunk,
            [(i, cfg_dict) for i in range(total_chunks)]
        )
    print("Done!")

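# Guard the entry point so worker processes that re-import this module
# (e.g. under the "spawn" start method) do not launch main() again.
if __name__ == "__main__":
    main()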

================================================
FILE: data_processing/postprocess_simulation_environments.py
================================================
import os
import pickle
from tqdm import tqdm
import numpy as np
import hydra

from utils.sim_env_helpers import postprocess_sim_env
from cfgs.config import CONFIG_PATH


@hydra.main(version_base=None, config_path=CONFIG_PATH, config_name="config")
def main(cfg):
    dataset_name = cfg.dataset_name.name
    pre_path = cfg.postprocess_sim_envs.pre_path
    post_path = cfg.postprocess_sim_envs.post_path
    route_length = cfg.postprocess_sim_envs.route_length
    max_num_envs = cfg.postprocess_sim_envs.max_num_envs

    if max_num_envs == -1:
        max_num_envs = len(os.listdir(pre_path))
    
    os.makedirs(post_path, exist_ok=True)

    num_sim_envs = 0
    for i, filename in enumerate(tqdm(os.listdir(pre_path))):
        file_path = os.path.join(pre_path, filename)

        with open(file_path, "rb") as f:
            sim_env = pickle.load(f)
        
        # Filter to only lane-based scenarios (for nuPlan)
        if dataset_name == 'nuplan' and sim_env['lane_types'][:, 0].sum() < len(sim_env['lane_types']):
            continue # skip if not all lane types are lane

        sim_env_filtered = postprocess_sim_env(
            sim_env,
            route_length,
            dataset_name)

        if sim_env_filtered['route_lane_indices'] is None:
            continue

        with open(os.path.join(post_path, filename), "wb") as f:
            pickle.dump(sim_env_filtered, f)
        num_sim_envs += 1

        if num_sim_envs >= max_num_envs:
            break

    print(f"Post-processed {num_sim_envs} simulation environments saved to {post_path}")


if __name__ == '__main__':
    main()

================================================
FILE: data_processing/waymo/add_nocturne_compatible_val_scenarios_to_test.py
================================================
import os 
import pickle 
import shutil
import hydra
from tqdm import tqdm
from cfgs.config import CONFIG_PATH

# Move half of the nocturne-compatible validation scenarios to the test set to ensure we have 
# a held-out set of nocturne-compatible scenarios for evaluation.
def add_val_to_test(cfg):
    print("Before: ")
    print("Num val scenarios: ", len(os.listdir(cfg.generate_waymo_dataset.output_data_folder_val)))
    print("Num test scenarios: ", len(os.listdir(cfg.generate_waymo_dataset.output_data_folder_test)))
    
    # list containing half of nocturne-compatible filenames in the validation set
    with open(os.path.join(cfg.project_root, 'metadata', 'nocturne_test_filenames.pkl'), 'rb') as f:
        test_filenames = pickle.load(f)

    for filename in tqdm(test_filenames):
        full_filename = 'validation.' + filename + '.pkl'        
        old_path = os.path.join(cfg.generate_waymo_dataset.output_data_folder_val, full_filename)
        new_path = os.path.join(cfg.generate_waymo_dataset.output_data_folder_test, full_filename)

        shutil.move(old_path, new_path)

    print("After: ")
    print("Num val scenarios: ", len(os.listdir(cfg.generate_waymo_dataset.output_data_folder_val)))
    print("Num test scenarios: ", len(os.listdir(cfg.generate_waymo_dataset.output_data_folder_test)))

@hydra.main(version_base=None, config_path=CONFIG_PATH, config_name="config")
def main(cfg):
    add_val_to_test(cfg)


if __name__ == "__main__":
    main()

================================================
FILE: data_processing/waymo/convert_pickles_to_jsons.py
================================================
import os
import glob
import json
import pickle
import numpy as np
from tqdm import tqdm
import hydra
from cfgs.config import CONFIG_PATH

ROAD_EDGE_OFFSET = 4.83  # meters laterally offset from the route

def reverse_ag_type_mapping(agent_type_onehot):
    """ Reverse the agent type mapping from one-hot to string. """
    agent_types = {0: "unset", 1: "vehicle", 2: "pedestrian", 3: "cyclist", 4: "other"}
    ag_type_idx = agent_type_onehot.argmax()
    return agent_types[ag_type_idx]


def compute_route_road_edges(route):
    """Compute the two road-edge polylines that bound a corridor around the route.

    Returns an array of shape (2, N, 2): one polyline on each side of the route,
    each offset laterally by ROAD_EDGE_OFFSET meters.
    """
    # Pad with the end points so central differences are defined at both ends
    repeated_route = np.vstack((route[0], route, route[-1]))
    road_edges = np.zeros((2, *route.shape))
    diffs = repeated_route[2:] - repeated_route[:-2]
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    unit_directions = diffs / norms

    # Rotate by +90 degrees (counter-clockwise); in a right-handed x/y frame this
    # points to the left of the direction of travel
    perp_directions = np.stack([-unit_directions[:, 1], unit_directions[:, 0]], axis=1)

    # Offset the route points to both sides to create the corridor
    left_corridor = repeated_route[1:-1] + perp_directions * ROAD_EDGE_OFFSET
    right_corridor = repeated_route[1:-1] - perp_directions * ROAD_EDGE_OFFSET
    road_edges[0] = right_corridor
    road_edges[1] = left_corridor

    return road_edges
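

# Illustrative sketch (not part of the original pipeline): a plain-Python
# version of the corridor construction above, useful for checking the geometry
# by hand. `_example_corridor_edges` is a hypothetical name. It uses the same
# central-difference direction and the same edge ordering (index 0 is
# route - perp * offset, index 1 is route + perp * offset). Like the original,
# it assumes the two neighbours of every route point are distinct.
import math


def _example_corridor_edges(route, offset):
    padded = [route[0]] + list(route) + [route[-1]]
    minus_side, plus_side = [], []
    for i in range(1, len(padded) - 1):
        dx = padded[i + 1][0] - padded[i - 1][0]
        dy = padded[i + 1][1] - padded[i - 1][1]
        norm = math.hypot(dx, dy)
        # rotate the unit direction by +90 degrees: (ux, uy) -> (-uy, ux)
        px, py = -dy / norm, dx / norm
        x, y = padded[i]
        minus_side.append((x - px * offset, y - py * offset))
        plus_side.append((x + px * offset, y + py * offset))
    return [minus_side, plus_side]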


@hydra.main(version_base=None, config_path=CONFIG_PATH, config_name="config")
def main(cfg):
    path_to_pickles = cfg.convert_pickles_to_jsons.path_to_pickles
    path_to_jsons = cfg.convert_pickles_to_jsons.path_to_jsons
    dataset_size = cfg.convert_pickles_to_jsons.dataset_size

    pkl_fnames = glob.glob(f"{path_to_pickles}/*.pkl")
    print("Number of scenes", len(pkl_fnames))
    num_converted_files = 0

    os.makedirs(path_to_jsons, exist_ok=True)
    for raw_path in tqdm(pkl_fnames):
        with open(raw_path, 'rb') as f:
            data = pickle.load(f)
        
        new_data = {}
        # store metadata
        new_data['tl_states'] = {}
        # e.g. 'validation.<scenario_id>.pkl' -> '<scenario_id>.json'
        base_name = os.path.basename(raw_path).split("validation.")[-1]
        scenario_id = os.path.splitext(base_name)[0]
        new_fname = scenario_id + '.json'
        new_data['name'] = new_fname
        # Add scenario_id (filename without extension)
        new_data['scenario_id'] = scenario_id
        if 'ego_index' not in data:
            data['ego_index'] = len(data['agents']) - 1
        new_data['ego_idx'] = data['ego_index']
        
        # store route
        curr_route =  [{'x': data['route'][i, 0], 
                        'y': data['route'][i, 1]
                        } for i in range(len(data['route']))]
        assert len(curr_route) > 0
        new_data['route'] = curr_route
        
        # store lane centerlines
        new_data['roads'] = []
        for s in range(len(data['lanes'])):
            curr_road_pts = [{
                'x': data['lanes'][s, i, 0], 
                'y': data['lanes'][s, i, 1]
            } for i in range(data['lanes'][s].shape[0])]
            curr_road_dict = {'geometry': curr_road_pts, 
                            'type': 'lane'}
            new_data['roads'].append(curr_road_dict)

        # compute and store road edges defining the corridor around the route 
        road_edges = compute_route_road_edges(route=data['route'])
        for s in range(len(road_edges)):
            curr_road_pts = [{'x': road_edges[s, i, 0], 
                              'y': road_edges[s, i, 1]
                              } for i in range(road_edges[s].shape[0])]
            curr_road_dict = {'geometry': curr_road_pts, 'type': 'road_edge'}
            new_data['roads'].append(curr_road_dict)

        # store objects
        new_data['objects'] = []
        for n in range(len(data['agents'])):
            positions  = data['agents'][n, :, :2]
            positions  = [{'x': positions[i, 0], 
                           'y': positions[i, 1]
                           } for i in range(len(positions))]
            velocities = data['agents'][n, :, 2:4]
            velocities = [{'x': velocities[i, 0], 
                           'y': velocities[i, 1]
                           } for i in range(len(velocities))]
            headings   = data['agents'][n, :, 4]
            length     = data['agents'][n, :, 5]
            width      = data['agents'][n, :, 6]
            valid      = data['agents'][n, :, 7].astype(bool)
            goals      = data['agents'][n, valid, :2][-1]
            goals      = {'x': goals[0], 'y': goals[1]}
            ag_type    = reverse_ag_type_mapping(
                            data['agent_types'][n]
                        )
            if n == data['ego_index']:
                mark_as_expert = False
                goals = curr_route[-1]
            else:
                mark_as_expert = True

            height = 1.5
            curr_obj_dict = {
                'position': positions,
                'width': width[valid][0],
                'length': length[valid][0],
                'height': height,
                'id': n,  # Use original index as ID (will be reassigned after reordering)
                'heading': headings.tolist(),
                'velocity': velocities,
                'valid': valid.tolist(),
                'goalPosition': goals,
                'type': ag_type,
                'mark_as_expert': mark_as_expert
            }
            new_data['objects'].append(curr_obj_dict)

        # store ego vehicle as the first object
        new_data['objects'] = [new_data['objects'][data['ego_index']]] + new_data['objects']
        new_data['objects'].pop(data['ego_index']+1)
        
        # Reassign IDs after reordering (ego is now at index 0, others follow sequentially)
        for idx, obj in enumerate(new_data['objects']):
            obj['id'] = idx
        
        # Create metadata with sdc_track_index, tracks_to_predict, and objects_of_interest
        # Since ego is moved to index 0, sdc_track_index should be 0
        sdc_track_index = 0
        
        # Only the ego (index 0 after reordering) is included in tracks_to_predict
        vehicle_indices = [0]
        
        tracks_to_predict = [
            {'track_index': idx, 'difficulty': 0} 
            for idx in vehicle_indices
        ]
        
        # Objects of interest: include ego (ID 0)
        objects_of_interest = [0]
        
        new_data['metadata'] = {
            'sdc_track_index': sdc_track_index,
            'tracks_to_predict': tracks_to_predict,
            'objects_of_interest': objects_of_interest
        }
        
        # store the json
        with open(os.path.join(path_to_jsons, new_fname), 'w') as f:
            json.dump(new_data, f, indent=4)
        
        num_converted_files += 1
        if num_converted_files >= dataset_size:
            break
    
    print(f"GPUDrive dataset size: {num_converted_files}")


if __name__ == "__main__":
    main()
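

# Illustrative sketch (not part of the original pipeline): main() above moves
# the ego object to the front of the object list and then renumbers the ids.
# `_example_move_to_front` is a hypothetical helper showing the same reordering
# on plain dicts, returning new dicts instead of mutating the input.
def _example_move_to_front(objects, ego_index):
    reordered = [objects[ego_index]] + objects[:ego_index] + objects[ego_index + 1:]
    # ids follow the new order, so the ego always ends up with id 0
    return [dict(obj, id=new_id) for new_id, obj in enumerate(reordered)]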


================================================
FILE: data_processing/waymo/create_gpudrive_pickles.py
================================================
import os
os.environ.setdefault("OMP_NUM_THREADS", "1")
os.environ.setdefault("MKL_NUM_THREADS", "1")
os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")
os.environ.setdefault("NUMEXPR_NUM_THREADS", "1")

import math
import hydra
import random
from tqdm import tqdm
from datasets.waymo.dataset_ctrl_sim import CtRLSimDataset
from cfgs.config import CONFIG_PATH
import multiprocessing as mp
from omegaconf import OmegaConf

def _mp_init():
    # Ensure per-process thread caps
    os.environ["OMP_NUM_THREADS"] = "1"
    os.environ["MKL_NUM_THREADS"] = "1"
    os.environ["OPENBLAS_NUM_THREADS"] = "1"
    os.environ["NUMEXPR_NUM_THREADS"] = "1"
    try:
        import torch
        torch.set_num_threads(1)
        torch.set_num_interop_threads(1)
    except Exception:
        pass

# ───────────────────────────────────────────────────────────
#  helper so the Pool can pickle it
# ───────────────────────────────────────────────────────────
def _work_one_chunk(idx, cfg_dict):
    cfg = OmegaConf.create(cfg_dict)
    cfg.preprocess_waymo.chunk_idx = idx
    _run_one_cfg(cfg)


# ───────────────────────────────────────────────────────────
#  the old body → turned into a function we can reuse
# ───────────────────────────────────────────────────────────
def _run_one_cfg(cfg):
    random.seed(42)

    cfg.ctrl_sim.dataset.preprocess = False
    cfg.ctrl_sim.dataset.create_gpudrive_dataset = True
    dset = CtRLSimDataset(cfg.ctrl_sim.dataset, split_name=cfg.preprocess_waymo.mode)

    if cfg.preprocess_waymo.mode != 'test':
        dataset_size = 10000
    else:
        dataset_size = 250

    start = cfg.preprocess_waymo.chunk_idx * cfg.preprocess_waymo.chunk_size
    end   = start + cfg.preprocess_waymo.chunk_size
    chunk = list(range(start, min(end, len(dset.files))))
    if not chunk:
        return

    num_valid_files = 0
    for idx in tqdm(chunk, position=0, leave=False):
        d = dset.get(idx)

        if d:
            num_valid_files += 1

        if num_valid_files >= dataset_size:
            break

    print(f"Created {num_valid_files} valid files out of {len(chunk)} files")


@hydra.main(version_base=None, config_path=CONFIG_PATH, config_name="config")
def main(cfg):

    if cfg.preprocess_waymo.mode != 'test':
        cfg.preprocess_waymo.chunk_idx = 0
        cfg.preprocess_waymo.chunk_size = 20000
    else:
        cfg.preprocess_waymo.chunk_idx = 0
        cfg.preprocess_waymo.chunk_size = 500
    
    # case 1: single-chunk run. chunk_idx is forced to 0 above, so this branch
    # always runs and the multiprocessing path below is currently unreachable.
    if cfg.preprocess_waymo.chunk_idx >= 0:
        _run_one_cfg(cfg)
        print("Done!")
        return

    total_chunks = 1
    n_workers = min(
        cfg.preprocess_waymo.get("num_workers", mp.cpu_count()),
        total_chunks
    )
    cfg_dict = OmegaConf.to_container(cfg, resolve=True)

    print(f"[preprocess-waymo]  Launching {total_chunks} chunks on {n_workers} workers …")
    with mp.Pool(processes=n_workers, initializer=_mp_init) as pool:
        pool.starmap(
            _work_one_chunk,
            [(i, cfg_dict) for i in range(total_chunks)]
        )
    print("Done!")

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    main()

================================================
FILE: data_processing/waymo/create_waymo_eval_set.py
================================================
import random 
import pickle
import hydra 
from cfgs.config import CONFIG_PATH 
import glob
import os

@hydra.main(version_base=None, config_path=CONFIG_PATH, config_name="config")
def main(cfg):
    """Cache 50000 random test files for computing metrics on the Waymo dataset."""
    random.seed(42)

    test_dir = os.path.join(cfg.scratch_root, 'scenario_dreamer_ae_preprocess_waymo', 'test')
    test_files = sorted(glob.glob(test_dir + "/*-of-*_*_0_*.pkl")) # grab all lg_type = NON_PARTITIONED files
    random.shuffle(test_files)

    test_files = test_files[:50000]
    test_filenames = [os.path.basename(file) for file in test_files]

    assert len(test_filenames) == 50000

    waymo_test_dict = {
        'files': test_filenames
    }

    with open(os.path.join(cfg.project_root, 'metadata', 'waymo_eval_set.pkl'), 'wb') as f:
        pickle.dump(waymo_test_dict, f)

    print("Done!")

if __name__ == "__main__":
    main()

================================================
FILE: data_processing/waymo/generate_k_disks_vocabulary.py
================================================
import os
import hydra
import numpy as np
import random
import torch
import pickle
from tqdm import tqdm

from cfgs.config import CONFIG_PATH
from datasets.waymo.dataset_ctrl_sim import CtRLSimDataset
from utils.k_disks_helpers import compute_k_disks_vocabulary
from utils.viz import plot_k_disks_vocabulary

SEED = 42
NUM_SCENARIOS = 10000

@hydra.main(version_base=None, config_path=CONFIG_PATH, config_name="config")
def main(cfg):
    """ Generate k-disks vocabulary from Waymo dataset."""
    project_root = cfg.project_root
    cfg = cfg.ctrl_sim.dataset
    cfg.preprocess = False
    cfg.collect_state_transitions = True
    
    np.random.seed(SEED)
    random.seed(SEED)
    torch.manual_seed(SEED)

    dset = CtRLSimDataset(cfg, split_name='train')

    state_transitions_all = []
    for idx in tqdm(range(len(dset))):
        with open(dset.files[idx], 'rb') as file:
            data = pickle.load(file)
        
        state_transitions = dset.collect_state_transitions(data)
        state_transitions_all.append(state_transitions)
    
        # stop once exactly NUM_SCENARIOS scenarios have been collected
        if idx + 1 >= NUM_SCENARIOS:
            break
    state_transitions_all = np.concatenate(state_transitions_all, axis=0)
    
    V = compute_k_disks_vocabulary(
        state_transitions_all, 
        vocab_size=cfg.vocab_size, 
        l=1, 
        w=1, 
        eps=cfg.k_disks_eps
    )
    plot_k_disks_vocabulary(
        V, 
        png_path=os.path.join(
            project_root,
            'metadata',
            f'k_disks_vocab_{cfg.vocab_size}_{cfg.simulation_hz}Hz_seed{SEED}.png'
        ), dpi=1000)
    V_dict = {'V': V}

    with open(os.path.join(
        project_root,
        'metadata',
        f'k_disks_vocab_{cfg.vocab_size}_{cfg.simulation_hz}Hz_seed{SEED}.pkl'
        ), 'wb') as f:
        pickle.dump(V_dict, f)
    print("Finished generating k disks vocabulary.")


if __name__ == "__main__":
    main()

================================================
FILE: data_processing/waymo/generate_waymo_dataset.py
================================================
import math
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""    # hide GPUs
import copy
import pickle
import hydra
from tqdm import tqdm
from omegaconf import OmegaConf

import numpy as np
import tensorflow as tf
import multiprocessing as mp

from waymo_open_dataset.protos import scenario_pb2
from google.protobuf.json_format import MessageToDict
from cfgs.config import CONFIG_PATH

ERR_VAL = -1e4
_WAYMO_OBJECT_STR = {
    'TYPE_UNSET': "unset",
    'TYPE_VEHICLE': "vehicle",
    'TYPE_PEDESTRIAN': "pedestrian",
    'TYPE_CYCLIST': "cyclist",
    'TYPE_OTHER': "other",
}

def poly_gon_and_line(poly_dict):

    plg_xyz = []

    if isinstance(poly_dict, list):
        for plg in poly_dict:
            plg_xyz += [[plg['x'],plg['y'],plg['z']]]
    else:
        plg_xyz = [poly_dict['x'],poly_dict['y'],poly_dict['z']]

    plg_xyz = np.array(plg_xyz)

    return plg_xyz


def road_info_except_lane(x_list, road_keys):

    output = {}
    output['id'] = []
    
    key_x = list(x_list[0].keys())[1]
    keys = road_keys[key_x]
    for key in keys:
        output[key] = []

    for x in x_list:
        output['id'] += [x['id']]
        for key in keys:
            if key in list(x[key_x].keys()):
                if key[0] == 'p':
                    output[key] += [poly_gon_and_line(x[key_x][key])]
                else:
                    output[key] += [x[key_x][key]]
            else:
                output[key] += [None]
    
    return output


def road_info_lane(x_dict):
    
    lanes = dict()

    for ln in x_dict:
        
        ln_info = dict()
        ln_id = ln['id']

        for key in ln['lane'].keys():
            if key[0] == 'p':
                ln_info[key] = poly_gon_and_line(ln['lane'][key])
            else:
                ln_info[key] = ln['lane'][key]

        lanes[ln_id] = ln_info
    
    return lanes


def get_lane_pairs(engage_lanes):

    lane_ids = list(engage_lanes.keys())
    pre_pairs, suc_pairs = {}, {}
    left_pairs, right_pairs = {}, {}

    for i, lane_id in enumerate(lane_ids):

        lane = engage_lanes[lane_id]

        if 'entryLanes' in lane.keys():
            for eL in lane['entryLanes']:
                if eL in lane_ids:
                    if int(lane_id) in pre_pairs:
                        pre_pairs[int(lane_id)].append(int(eL))
                    else:
                        pre_pairs[int(lane_id)] = [int(eL)]
        
        if 'exitLanes' in lane.keys():
            for eL in lane['exitLanes']:
                if eL in lane_ids:
                    if int(lane_id) in suc_pairs:
                        suc_pairs[int(lane_id)].append(int(eL))
                    else:
                        suc_pairs[int(lane_id)] = [int(eL)]

        if 'leftNeighbors' in lane.keys():
            for left in lane['leftNeighbors']:
                left = left['featureId']
                if left in lane_ids:
                    if int(lane_id) in left_pairs:
                        left_pairs[int(lane_id)].append(int(left))
                    else:
                        left_pairs[int(lane_id)] = [int(left)]

        # Add right neighbors
        if 'rightNeighbors' in lane.keys():
            for right in lane['rightNeighbors']:
                right = right['featureId']
                if right in lane_ids:
                    if int(lane_id) in right_pairs:
                        right_pairs[int(lane_id)].append(int(right))
                    else:
                        right_pairs[int(lane_id)] = [int(right)]

    return pre_pairs, suc_pairs, left_pairs, right_pairs
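

# Illustrative sketch (not part of the original pipeline): each block in
# get_lane_pairs above builds a {lane_id: [other_id, ...]} dict, keeping only
# ids that are themselves in the lane set. `_example_collect_pairs` is a
# hypothetical helper that expresses one such block with dict.setdefault. It
# covers the plain id lists ('entryLanes', 'exitLanes'), not the neighbor
# entries that wrap the id in 'featureId'.
def _example_collect_pairs(lanes, key):
    pairs = {}
    for lane_id, lane in lanes.items():
        for other_id in lane.get(key, []):
            if other_id in lanes:
                pairs.setdefault(int(lane_id), []).append(int(other_id))
    return pairs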

def get_engage_lanes(data):

    lanes = data['road_info']['lane']
    engage_lanes = dict()
    lane_ids = list(lanes.keys())

    for id in lane_ids:
        lane = lanes[id]
        if len(lane['polyline']) < 2: #rule out those 1 point lane
            continue
        else:
            lane = copy.deepcopy(lane)
            engage_lanes[id] = lane
            
    return engage_lanes

def get_lane_graph(data):
    engage_lanes = get_engage_lanes(data)
    pre_pairs, suc_pairs, left_pairs, right_pairs = get_lane_pairs(engage_lanes)
    graph = dict()

    graph['pre_pairs'] = pre_pairs   
    graph['suc_pairs'] = suc_pairs
    graph['left_pairs'] = left_pairs
    graph['right_pairs'] = right_pairs  

    return graph


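def process_lanegraph(data):
    """Extract the lane graph and the map elements used downstream.

    Returns a dict with:
        lanes: {lane_id: (n, 2) array of xy centerline points}
        pre_pairs, suc_pairs, left_pairs, right_pairs: {lane_id: [lane_id, ...]}
        road_edges: {id: (n, 2) array of xy points}
        crosswalks: {id: (n, 2) array of xy polygon vertices}
        stop_signs: {id: (2,) array with the xy position}

    Lanes of type TYPE_UNDEFINED or TYPE_BIKE_LANE are skipped.
    """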
    lanes = {}
    pre_pairs = {}
    suc_pairs = {}
    left_pairs = {}
    right_pairs = {}
    for lane_id in data['road_info']['lane']:
        lane_type = data['road_info']['lane'][lane_id]['type']
        if lane_type == 'TYPE_UNDEFINED' or lane_type == 'TYPE_BIKE_LANE':
            continue
        
        my_lane = data['road_info']['lane'][lane_id]['polyline']
        lanes[int(lane_id)] = my_lane[:, :2]
        
        if int(lane_id) in data['graph']['pre_pairs'].keys():
            pre_pairs[int(lane_id)] = data['graph']['pre_pairs'][int(lane_id)]
        if int(lane_id) in data['graph']['suc_pairs'].keys():
            suc_pairs[int(lane_id)] = data['graph']['suc_pairs'][int(lane_id)]
        if int(lane_id) in data['graph']['left_pairs'].keys():
            left_pairs[int(lane_id)] = data['graph']['left_pairs'][int(lane_id)]
        if int(lane_id) in data['graph']['right_pairs'].keys():
            right_pairs[int(lane_id)] = data['graph']['right_pairs'][int(lane_id)]

    road_edges = {}
    crosswalks = {}
    stop_signs = {}
    if 'roadEdge' in data['road_info']:
        for i in range(len(data['road_info']['roadEdge']['id'])):
            road_edges[int(data['road_info']['roadEdge']['id'][i])] = data['road_info']['roadEdge']['polyline'][i][:, :2]
    if 'crosswalk' in data['road_info']:
        for i in range(len(data['road_info']['crosswalk']['id'])):
            crosswalks[int(data['road_info']['crosswalk']['id'][i])] = data['road_info']['crosswalk']['polygon'][i][:, :2]
    if 'stopSign' in data['road_info']:
        for i in range(len(data['road_info']['stopSign']['id'])):
            stop_signs[int(data['road_info']['stopSign']['id'][i])] = data['road_info']['stopSign']['position'][i][:2]

    return {'lanes': lanes, 
            'pre_pairs': pre_pairs, 
            'suc_pairs': suc_pairs, 
            'left_pairs': left_pairs, 
            'right_pairs': right_pairs,
            'road_edges': road_edges,
            'crosswalks': crosswalks,
            'stop_signs': stop_signs}


def _parse_object_state(states, final_state):
    return {
        "position": [{
            "x": state['centerX'],
            "y": state['centerY']
        } if state['valid'] else {
            "x": ERR_VAL,
            "y": ERR_VAL
        } for state in states],
        "width": final_state['width'],
        "length": final_state['length'],
        "heading": [
            math.degrees(state['heading']) if state['valid'] else ERR_VAL
            for state in states
        ],  # Use rad here?
        "velocity": [{
            "x": state['velocityX'],
            "y": state['velocityY']
        } if state['valid'] else {
            "x": ERR_VAL,
            "y": ERR_VAL
        } for state in states],
        "valid": [state['valid'] for state in states]
    }
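

# Illustrative sketch (not part of the original pipeline): _parse_object_state
# above replaces the fields of invalid timesteps with ERR_VAL, so every track
# keeps one entry per timestep. `_example_masked_headings` is a hypothetical
# helper showing that pattern for the heading channel. The sentinel is passed
# in explicitly so the snippet stands alone.
import math


def _example_masked_headings(states, err_val=-1e4):
    return [
        math.degrees(state["heading"]) if state["valid"] else err_val
        for state in states
    ]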


def _init_object(track):
    """Construct a dict representing the state of an object (vehicle, cyclist, pedestrian).

    Args:
        track (dict): a single track of the scenario, as produced by MessageToDict

    Returns:
        Optional[Dict[str, Any]]: dict with the trajectory, heading, velocity, size and
            type of the object, or None if its last valid state has no 'width'.
    """
    
    final_valid_index = 0
    for i, state in enumerate(track['states']):
        if state['valid']:
            final_valid_index = i

    # skip tracks whose last valid state carries no size (e.g. a track with no valid states)
    if 'width' not in track['states'][final_valid_index]:
        return None

    obj = _parse_object_state(track['states'], track['states'][final_valid_index])
    obj["type"] = _WAYMO_OBJECT_STR[track['objectType']]
    return obj


def get_objects(scenario_list, index):
    objects = []
    av_objects_idx = -1
    
    scen = scenario_list[index]
    av_idx = scen['sdcTrackIndex']
    for i, track in enumerate(scen['tracks']):
        if i == av_idx:
            av_objects_idx = len(objects)
        
        obj_to_append = _init_object(track)
        if obj_to_append is not None:
            objects.append(obj_to_append)

    assert av_objects_idx != -1
    
    return objects, av_objects_idx


# collect lanes plus road edges, crosswalks, and stop signs from the map features
def get_road_info(scenario_list, index):
    scen = scenario_list[index]
    map_feature = dict()

    road_keys = dict()

    road_keys['crosswalk'] = ['polygon']
    road_keys['stopSign'] = ['position']
    road_keys['roadEdge'] = ['polyline']

    if 'mapFeatures' not in scen:
        return None
    
    for mf in scen['mapFeatures']:
        key = list(mf.keys())[1]
        if key in map_feature.keys():
            map_feature[key] += [mf]
        else:
            map_feature[key] = [mf]
    
    road_info = dict()
    for key in map_feature.keys():
        if key == 'lane':
            road_info[key] = road_info_lane(map_feature[key]) 
        elif key in ['roadEdge', 'crosswalk', 'stopSign']:
            road_info[key] = road_info_except_lane(map_feature[key],road_keys)  
        
    return road_info


def collect_data(cfg, output_path, files_path, files, chunk):
    for c in tqdm(chunk):
        filename_path = os.path.join(files_path, files[c])
        dataset = tf.data.TFRecordDataset(filename_path, compression_type='')
        scenario_list = []    
        for data in dataset:
            proto_string = data.numpy()
            proto = scenario_pb2.Scenario()
            proto.ParseFromString(proto_string)
            scenario_dict = MessageToDict(proto)
            scenario_list += [scenario_dict]
        
        for i in range(len(scenario_list)):
            output_file = f'{files[c]}_{i}.pkl'
        
            data = {}
            road_info = get_road_info(scenario_list, i)
            if road_info is None:
                continue
            data['road_info'] = road_info
            
            if 'lane' not in data['road_info']:
                continue
            
            data['graph'] = get_lane_graph(data)
            
            scenario = {}
            lane_graph = process_lanegraph(data)
            objects, av_idx = get_objects(scenario_list, i)

            ### VISUALIZATION FOR TESTING PURPOSES
            # print("Visualizing", i)
            # for lane_id in lane_graph['lanes'].keys():
            #     to_plot = lane_graph['lanes'][lane_id]
            #     plt.plot(to_plot[:, 0], to_plot[:, 1], color='grey', linewidth=0.5)
            #     idx = len(to_plot) // 2
            #     plt.annotate(lane_id,
            #         (to_plot[idx, 0], to_plot[idx, 1]), zorder=5, fontsize=1, color='blue')

            # for road_edge_id in lane_graph['road_edges'].keys():
            #     to_plot = lane_graph['road_edges'][road_edge_id]
            #     plt.plot(to_plot[:, 0], to_plot[:, 1], linewidth=0.75)
            #     idx = len(to_plot) // 2
            #     plt.annotate(road_edge_id,
            #         (to_plot[idx, 0], to_plot[idx, 1]), zorder=5, fontsize=4, color='red')

            # for stop_sign_id in lane_graph['stop_signs'].keys():
            #     to_plot = lane_graph['stop_signs'][stop_sign_id]
            #     plt.scatter(to_plot[0], to_plot[1], color='red', s=10)
            
            # for crosswalk_id in lane_graph['crosswalks'].keys():
            #     to_plot = lane_graph['crosswalks'][crosswalk_id]
            #     plt.plot(to_plot[:, 0], to_plot[:, 1], color='green', linewidth=0.5)
            
            # plt.gca().set_aspect('equal')
            # plt.savefig(f'lane_graph_{i}.png', dpi=1000)
            # plt.clf()

            scenario['lane_graph'] = lane_graph
            scenario['objects'] = objects
            scenario['av_idx'] = av_idx 

            scenario_path = os.path.join(output_path, output_file)
            with open(scenario_path, "wb") as f:
                pickle.dump(scenario, f)


def _work_one_chunk(idx, cfg_dict):
    """Helper so the Pool can pickle the cfg."""
    # Re-create Hydra config in the subprocess
    cfg = OmegaConf.create(cfg_dict)
    cfg.generate_waymo_dataset.chunk_idx = idx
    _run_one_cfg(cfg)                 # <-- see wrapper below


def _run_one_cfg(cfg):
    """Process the chunk of files selected by ``chunk_idx`` for the configured split."""
    if cfg.generate_waymo_dataset.mode == 'train':
        files_path = cfg.waymo_train_folder
        output_path = cfg.generate_waymo_dataset.output_data_folder_train
    elif cfg.generate_waymo_dataset.mode == 'val':
        files_path = cfg.waymo_val_folder
        output_path = cfg.generate_waymo_dataset.output_data_folder_val
    else:
        files_path = cfg.waymo_test_folder
        output_path = cfg.generate_waymo_dataset.output_data_folder_test

    files = sorted(os.listdir(files_path))
    start = cfg.generate_waymo_dataset.chunk_idx * cfg.generate_waymo_dataset.chunk_size
    end   = start + cfg.generate_waymo_dataset.chunk_size
    chunk = [i for i in range(start, min(end, len(files)))]
    if not chunk:
        return

    os.makedirs(output_path, exist_ok=True)
    collect_data(cfg, output_path, files_path, files, chunk)
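

# Illustrative sketch (not used by the script): `_run_one_cfg` above maps a
# chunk index to a half-open range of file indices, clipped to the number of
# files. `chunk_file_indices` is a hypothetical name that isolates that
# arithmetic so it can be checked without Hydra or TensorFlow.
def chunk_file_indices(chunk_idx, chunk_size, num_files):
    """Return the file indices covered by chunk `chunk_idx`."""
    start = chunk_idx * chunk_size
    end = min(start + chunk_size, num_files)
    return list(range(start, end))  # empty when the chunk starts past the last file


# With 5 files and chunk_size=2, chunk 2 covers only the last file and chunk 3
# is empty, which is the case where `_run_one_cfg` returns early.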


@hydra.main(version_base=None, config_path=CONFIG_PATH, config_name="config")
def main(cfg):
    """
    If chunk_idx >= 0 → behave exactly as before (single chunk).
    If chunk_idx  < 0 → farm out *all* chunks to a worker pool.
    """
    if cfg.generate_waymo_dataset.chunk_idx >= 0:
        _run_one_cfg(cfg)
        return

    # ---  fan out  ---
    if cfg.generate_waymo_dataset.mode == 'train':
        total_chunks = 20
    else:               # val or test
        total_chunks = 5

    n_workers = min(cfg.generate_waymo_dataset.get("num_workers", mp.cpu_count()),
                    total_chunks)
    print("Num workers: ", n_workers)
    print("Total chunks: ", total_chunks)
    print("Chunk size: ", cfg.generate_waymo_dataset.chunk_size)
    cfg_dict = OmegaConf.to_container(cfg, resolve=True)

    with mp.Pool(processes=n_workers) as pool:
        pool.starmap(_work_one_chunk,
                     [(i, cfg_dict) for i in range(total_chunks)])


if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    main()

================================================
FILE: data_processing/waymo/preprocess_dataset_waymo.py
================================================
import os
os.environ.setdefault("OMP_NUM_THREADS", "1")
os.environ.setdefault("MKL_NUM_THREADS", "1")
os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")
os.environ.setdefault("NUMEXPR_NUM_THREADS", "1")

import math
import hydra
import random
from tqdm import tqdm
from datasets.waymo.dataset_autoencoder_waymo import WaymoDatasetAutoEncoder
from datasets.waymo.dataset_ctrl_sim import CtRLSimDataset
from cfgs.config import CONFIG_PATH, NUM_WAYMO_TRAIN_SCENARIOS
import multiprocessing as mp
from omegaconf import OmegaConf

def _mp_init():
    # Ensure per-process thread caps
    os.environ["OMP_NUM_THREADS"] = "1"
    os.environ["MKL_NUM_THREADS"] = "1"
    os.environ["OPENBLAS_NUM_THREADS"] = "1"
    os.environ["NUMEXPR_NUM_THREADS"] = "1"
    try:
        import torch
        torch.set_num_threads(1)
        torch.set_num_interop_threads(1)
    except Exception:
        pass

# ───────────────────────────────────────────────────────────
#  helper so the Pool can pickle it
# ───────────────────────────────────────────────────────────
def _work_one_chunk(idx, cfg_dict):
    cfg = OmegaConf.create(cfg_dict)
    cfg.preprocess_waymo.chunk_idx = idx          # set my own chunk
    _run_one_cfg(cfg)


# ───────────────────────────────────────────────────────────
#  process a single chunk (shared by the single-chunk and parallel paths)
# ───────────────────────────────────────────────────────────
def _run_one_cfg(cfg):
    random.seed(42)

    cfg.dataset_root = cfg.scratch_root
    if cfg.preprocess_waymo.stage == 'scenario_dreamer':
        cfg.ae.dataset.preprocess = False
        dset = WaymoDatasetAutoEncoder(cfg.ae.dataset, split_name=cfg.preprocess_waymo.mode)
    else:
        cfg.ctrl_sim.dataset.preprocess = False
        dset = CtRLSimDataset(cfg.ctrl_sim.dataset, split_name=cfg.preprocess_waymo.mode)

    start = cfg.preprocess_waymo.chunk_idx * cfg.preprocess_waymo.chunk_size
    end   = start + cfg.preprocess_waymo.chunk_size
    chunk = [i for i in range(start, min(end, len(dset.files)))]
    if not chunk:
        return

    for idx in tqdm(chunk, position=0, leave=False):
        d = dset.get(idx)


@hydra.main(version_base=None, config_path=CONFIG_PATH, config_name="config")
def main(cfg):

    # case 1 — behave exactly like before (single-chunk run)
    if cfg.preprocess_waymo.chunk_idx >= 0:
        _run_one_cfg(cfg)
        print("Done!")
        return

    # case 2 — chunk_idx == -1  ⇒  run *all* chunks in parallel
    if cfg.preprocess_waymo.mode == 'train':
        total_chunks = math.ceil(NUM_WAYMO_TRAIN_SCENARIOS / cfg.preprocess_waymo.chunk_size)
    else:
        if cfg.preprocess_waymo.stage == 'scenario_dreamer':
            total_chunks = 1 # chunk size is 50000
        else:
            total_chunks = 4 # chunk size is 12000
    
    if cfg.preprocess_waymo.mode == 'test':
        cfg.preprocess_waymo.chunk_size = 61000 # the test split has 10000 scenes resampled from the same scenarios
    n_workers = min(
        cfg.preprocess_waymo.get("num_workers", mp.cpu_count()),
        total_chunks
    )
    cfg_dict = OmegaConf.to_container(cfg, resolve=True)

    print(f"[preprocess-waymo]  Launching {total_chunks} chunks on {n_workers} workers …")
    with mp.Pool(processes=n_workers, initializer=_mp_init) as pool:
        pool.starmap(
            _work_one_chunk,
            [(i, cfg_dict) for i in range(total_chunks)]
        )
    print("Done!")

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    main()

================================================
FILE: datamodules/nuplan/nuplan_datamodule_autoencoder.py
================================================
import pytorch_lightning as pl 

from datasets.nuplan.dataset_autoencoder_nuplan import NuplanDatasetAutoEncoder
from torch_geometric.loader import DataLoader
import os

# reset CPU affinity so that each DataLoader worker can be scheduled on any core
def worker_init_fn(worker_id):
    os.sched_setaffinity(0, range(os.cpu_count())) 
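

# Illustrative sketch, not part of the original module: `os.sched_setaffinity`
# is only available on some platforms (notably Linux), so `worker_init_fn`
# above would raise AttributeError elsewhere. `portable_worker_init_fn` is a
# hypothetical variant that skips the call when it is unavailable.
import os  # already imported above; repeated so the sketch stands alone


def portable_worker_init_fn(worker_id):
    """Allow this worker to run on every CPU where the platform supports it."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # e.g. macOS or Windows: nothing to reset
    num_cpus = os.cpu_count() or 1  # cpu_count() may return None
    try:
        os.sched_setaffinity(0, range(num_cpus))
    except OSError:
        return False  # the OS rejected the requested CPU set
    return True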

class NuplanDataModuleAutoEncoder(pl.LightningDataModule):

    def __init__(self,
                 train_batch_size,
                 val_batch_size,
                 num_workers,
                 pin_memory,
                 persistent_workers,
                 dataset_cfg):
        super(NuplanDataModuleAutoEncoder, self).__init__()
        self.train_batch_size = train_batch_size
        self.val_batch_size = val_batch_size 
        self.num_workers = num_workers
        self.pin_memory = pin_memory 
        self.persistent_workers = persistent_workers
        self.cfg_dataset = dataset_cfg
        

    def setup(self, stage):
        self.train_dataset = NuplanDatasetAutoEncoder(self.cfg_dataset, split_name='train')
        self.val_dataset = NuplanDatasetAutoEncoder(self.cfg_dataset, split_name='val') 


    def train_dataloader(self):
        return DataLoader(self.train_dataset, 
                          batch_size=self.train_batch_size, 
                          shuffle=True,
                          num_workers=self.num_workers,
                          pin_memory=self.pin_memory,
                          drop_last=True,
                          worker_init_fn=worker_init_fn)


    def val_dataloader(self):
        return DataLoader(self.val_dataset,
                          batch_size=self.val_batch_size,
                          shuffle=False,
                          num_workers=self.num_workers,
                          pin_memory=self.pin_memory,
                          drop_last=False)

================================================
FILE: datamodules/nuplan/nuplan_datamodule_ldm.py
================================================
import pytorch_lightning as pl 
from datasets.nuplan.dataset_ldm_nuplan import NuplanDatasetLDM
from torch_geometric.loader import DataLoader
import os

# reset CPU affinity so that each DataLoader worker can be scheduled on any core
def worker_init_fn(worker_id):
    os.sched_setaffinity(0, range(os.cpu_count())) 

class NuplanDataModuleLDM(pl.LightningDataModule):

    def __init__(self,
                 train_batch_size,
                 val_batch_size,
                 num_workers,
                 pin_memory,
                 persistent_workers,
                 dataset_cfg):
        super(NuplanDataModuleLDM, self).__init__()
        self.train_batch_size = train_batch_size
        self.val_batch_size = val_batch_size 
        self.num_workers = num_workers
        self.pin_memory = pin_memory 
        self.persistent_workers = persistent_workers
        self.cfg_dataset = dataset_cfg
        

    def setup(self, stage):
        self.train_dataset = NuplanDatasetLDM(self.cfg_dataset, split_name='train')
        self.val_dataset = NuplanDatasetLDM(self.cfg_dataset, split_name='val') 


    def train_dataloader(self):
        return DataLoader(self.train_dataset, 
                          batch_size=self.train_batch_size, 
                          shuffle=True,
                          num_workers=self.num_workers,
                          pin_memory=self.pin_memory,
                          drop_last=True,
                          worker_init_fn=worker_init_fn)


    def val_dataloader(self):
        return DataLoader(self.val_dataset,
                          batch_size=self.val_batch_size,
                          shuffle=True,
                          num_workers=self.num_workers,
                          pin_memory=self.pin_memory,
                          drop_last=True)

================================================
FILE: datamodules/waymo/waymo_datamodule_autoencoder.py
================================================
import pytorch_lightning as pl 

from datasets.waymo.dataset_autoencoder_waymo import WaymoDatasetAutoEncoder
from torch_geometric.loader import DataLoader
import os

# reset CPU affinity so that each DataLoader worker can be scheduled on any core
def worker_init_fn(worker_id):
    os.sched_setaffinity(0, range(os.cpu_count())) 

class WaymoDataModuleAutoEncoder(pl.LightningDataModule):

    def __init__(self,
                 train_batch_size,
                 val_batch_size,
                 num_workers,
                 pin_memory,
                 persistent_workers,
                 dataset_cfg):
        super(WaymoDataModuleAutoEncoder, self).__init__()
        self.train_batch_size = train_batch_size
        self.val_batch_size = val_batch_size 
        self.num_workers = num_workers
        self.pin_memory = pin_memory 
        self.persistent_workers = persistent_workers
        self.cfg_dataset = dataset_cfg
        

    def setup(self, stage):
        self.train_dataset = WaymoDatasetAutoEncoder(self.cfg_dataset, split_name='train')
        self.val_dataset = WaymoDatasetAutoEncoder(self.cfg_dataset, split_name='val') 


    def train_dataloader(self):
        return DataLoader(self.train_dataset, 
                          batch_size=self.train_batch_size, 
                          shuffle=True,
                          num_workers=self.num_workers,
                          pin_memory=self.pin_memory,
                          drop_last=True,
                          worker_init_fn=worker_init_fn)


    def val_dataloader(self):
        return DataLoader(self.val_dataset,
                          batch_size=self.val_batch_size,
                          shuffle=False,
                          num_workers=self.num_workers,
                          pin_memory=self.pin_memory,
                          drop_last=False)

================================================
FILE: datamodules/waymo/waymo_datamodule_ctrl_sim.py
================================================
import pytorch_lightning as pl 

from datasets.waymo.dataset_ctrl_sim import CtRLSimDataset
from torch_geometric.loader import DataLoader
import os

# reset CPU affinity so that each DataLoader worker can be scheduled on any core
def worker_init_fn(worker_id):
    os.sched_setaffinity(0, range(os.cpu_count())) 

class WaymoDataModuleCtRLSim(pl.LightningDataModule):

    def __init__(self,
                 train_batch_size,
                 val_batch_size,
                 num_workers,
                 pin_memory,
                 persistent_workers,
                 dataset_cfg):
        super(WaymoDataModuleCtRLSim, self).__init__()
        self.train_batch_size = train_batch_size
        self.val_batch_size = val_batch_size 
        self.num_workers = num_workers
        self.pin_memory = pin_memory 
        self.persistent_workers = persistent_workers
        self.cfg_dataset = dataset_cfg
        

    def setup(self, stage):
        self.train_dataset = CtRLSimDataset(self.cfg_dataset, split_name='train')
        self.val_dataset = CtRLSimDataset(self.cfg_dataset, split_name='val') 


    def train_dataloader(self):
        return DataLoader(self.train_dataset, 
                          batch_size=self.train_batch_size, 
                          shuffle=True,
                          num_workers=self.num_workers,
                          pin_memory=self.pin_memory,
                          drop_last=True,
                          worker_init_fn=worker_init_fn)


    def val_dataloader(self):
        return DataLoader(self.val_dataset,
                          batch_size=self.val_batch_size,
                          shuffle=False,
                          num_workers=self.num_workers,
                          pin_memory=self.pin_memory,
                          drop_last=False)

================================================
FILE: datamodules/waymo/waymo_datamodule_ldm.py
================================================
import pytorch_lightning as pl 
from datasets.waymo.dataset_ldm_waymo import WaymoDatasetLDM
from torch_geometric.loader import DataLoader
import os

# reset CPU affinity so that each DataLoader worker can be scheduled on any core
def worker_init_fn(worker_id):
    os.sched_setaffinity(0, range(os.cpu_count())) 

class WaymoDataModuleLDM(pl.LightningDataModule):

    def __init__(self,
                 train_batch_size,
                 val_batch_size,
                 num_workers,
                 pin_memory,
                 persistent_workers,
                 dataset_cfg):
        super(WaymoDataModuleLDM, self).__init__()
        self.train_batch_size = train_batch_size
        self.val_batch_size = val_batch_size 
        self.num_workers = num_workers
        self.pin_memory = pin_memory 
        self.persistent_workers = persistent_workers
        self.cfg_dataset = dataset_cfg
        

    def setup(self, stage):
        self.train_dataset = WaymoDatasetLDM(self.cfg_dataset, split_name='train')
        self.val_dataset = WaymoDatasetLDM(self.cfg_dataset, split_name='val') 


    def train_dataloader(self):
        return DataLoader(self.train_dataset, 
                          batch_size=self.train_batch_size, 
                          shuffle=True,
                          num_workers=self.num_workers,
                          pin_memory=self.pin_memory,
                          drop_last=True,
                          worker_init_fn=worker_init_fn)


    def val_dataloader(self):
        return DataLoader(self.val_dataset,
                          batch_size=self.val_batch_size,
                          shuffle=True,
                          num_workers=self.num_workers,
                          pin_memory=self.pin_memory,
                          drop_last=True)

================================================
FILE: datasets/nuplan/dataset_autoencoder_nuplan.py
================================================
import os
import sys
import glob
import hydra
import torch
import pickle
import random
import copy
import gzip
from tqdm import tqdm
from typing import Any, Dict

from torch_geometric.data import Dataset
torch.set_printoptions(threshold=100000)
import numpy as np
np.set_printoptions(suppress=True, threshold=sys.maxsize)
from cfgs.config import CONFIG_PATH, NUPLAN_VEHICLE, NUPLAN_PEDESTRIAN, NUPLAN_STATIC_OBJECT, PARTITIONED

from utils.data_container import ScenarioDreamerData
from utils.lane_graph_helpers import resample_polyline, adjacency_matrix_to_adjacency_list
from utils.pyg_helpers import get_edge_index_bipartite, get_edge_index_complete_graph
from utils.torch_helpers import from_numpy
from utils.data_helpers import get_lane_connection_type_onehot_nuplan, get_object_type_onehot_nuplan, get_lane_type_onehot_nuplan, modify_agent_states, normalize_scene, randomize_indices
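import math


# Illustrative sketch (pure Python, not used by the dataset class): the
# partitioning step in `partition_compact_lane_graph` below cuts a polyline
# wherever the sign of x flips between consecutive points, and the point at
# the flip is shared by both pieces. `split_at_x_sign_changes` is a
# hypothetical name; the real method works on NumPy arrays and also rewires
# predecessor and successor links, which this sketch omits.
def split_at_x_sign_changes(points):
    """Split a list of (x, y) points wherever the sign bit of x changes."""
    def negative(x):
        return math.copysign(1.0, x) < 0  # mirrors np.signbit, so -0.0 counts as negative

    crossings = [i for i in range(1, len(points))
                 if negative(points[i][0]) != negative(points[i - 1][0])]
    pieces, start = [], 0
    for crossing in crossings:
        pieces.append(points[start:crossing + 1])  # include the first point past the flip
        start = crossing
    if not crossings or crossings[-1] < len(points) - 1:
        pieces.append(points[start:])  # remainder after the last flip
    return pieces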

class NuplanDatasetAutoEncoder(Dataset):
    """A Torch-Geometric ``Dataset`` wrapping NuPlan scenes for auto-encoding.

    The dataset processes the extracted NuPlan scenes (obtained from a separate
    SLEDGE data extraction script), including agent / lane-graph extraction
    and partitioning for in-painting. If preprocess=True, samples are loaded directly from the
    preprocessed files for efficient autoencoder training. If preprocess=False, the raw files
    are processed and the preprocessed data is saved to disk.
    """
    def __init__(self, cfg: Any, split_name: str = "train", mode: str = "train") -> None:
        """Instantiate a :class:`NuplanDatasetAutoEncoder`.

        Parameters
        ----------
        cfg
            Hydra configuration object containing dataset configs (cfg.dataset in global config)
        split_name
            One of ``{"train", "val", "test"}`` selecting which split
            to load from ``cfg.dataset.dataset_path``.
        mode
            "train" or "eval" - affects shuffling/randomisation inside
            :meth:`get_data`.
        """
        
        super(NuplanDatasetAutoEncoder, self).__init__()
        self.cfg = cfg
        self.data_root_raw = self.cfg.sledge_raw_dataset_path
        self.data_root_map_id = self.cfg.map_id_dataset_path
        self.split_name = split_name 
        self.mode = mode
        self.preprocess = self.cfg.preprocess
        self.preprocessed_dir = os.path.join(self.cfg.preprocess_dir, f"{self.split_name}")
        if not os.path.exists(self.preprocessed_dir):
            os.makedirs(self.preprocessed_dir, exist_ok=True)

        if not self.preprocess:
            self.files = sorted(glob.glob(os.path.join(self.data_root_raw, f"{self.split_name}") + "/*.gz"))
        else:
            self.files = sorted(glob.glob(self.preprocessed_dir + "/*.pkl"))

        self.dset_len = len(self.files)

    
    def get_lane_graph_within_fov(self, lane_graph: Dict[str, Any]) -> Dict[str, Any]:
        """Return only those lanes that intersect the square *field-of-view*.

        The coordinate frame is converted to an ego-centred frame
        earlier in the pipeline, so the autonomous vehicle (AV) is at
        the origin.  A lane point is considered *in view* when both its
        absolute X *and* Y coordinates are strictly smaller than
        ``cfg_dataset.fov / 2``.  Each retained lane is then resampled
        to a fixed number of points.

        Parameters
        ----------
        lane_graph : Dict[str, Any]
            A *compact* or *partitioned* lane-graph with the standard
            keys ``{"lanes", "lane_types", "pre_pairs", "suc_pairs"}``.  
            All coordinates must already be expressed
            in the AV-centric frame.

        Returns
        -------
        lane_graph_within_fov: Dict[str, Any]
            A new lane-graph containing only lanes that intersect the
            configured field-of-view.  Connection dictionaries are
            pruned so they reference *in-FOV* lanes exclusively, and each
            lane polyline has exactly
            ``cfg_dataset.upsample_lane_num_points`` points.
        """
        lane_ids = lane_graph['lanes'].keys()
        pre_pairs = lane_graph['pre_pairs']
        suc_pairs = lane_graph['suc_pairs']
        
        # ── Identify lanes that intersect the square FOV ──────────────
        lane_ids_within_fov = []
        valid_pts = {}
        for lane_id in lane_ids:
            lane = lane_graph['lanes'][lane_id]
            points_in_fov_x = np.abs(lane[:, 0]) < (self.cfg.fov / 2)
            points_in_fov_y = np.abs(lane[:, 1]) < (self.cfg.fov / 2)
            points_in_fov = points_in_fov_x & points_in_fov_y
            
            if np.any(points_in_fov):
                lane_ids_within_fov.append(lane_id)
                valid_pts[lane_id] = points_in_fov

        lanes_within_fov = {}
        lane_types_within_fov = {}
        pre_pairs_within_fov = {}
        suc_pairs_within_fov = {}

        # ── Prune connection dictionaries and resample polylines ─────────────────────────────
        for lane_id in lane_ids_within_fov:
            if lane_id in lane_ids:
                lane = lane_graph['lanes'][lane_id][valid_pts[lane_id]]
                resampled_lane = resample_polyline(lane, num_points=self.cfg.upsample_lane_num_points)
                lanes_within_fov[lane_id] = resampled_lane
                lane_types_within_fov[lane_id] = lane_graph['lane_types'][lane_id]
            
            if lane_id in pre_pairs:
                pre_pairs_within_fov[lane_id] = [l for l in pre_pairs[lane_id] if l in lane_ids_within_fov]
            else:
                pre_pairs_within_fov[lane_id] = []
            
            if lane_id in suc_pairs:
                suc_pairs_within_fov[lane_id] = [l for l in suc_pairs[lane_id] if l in lane_ids_within_fov]
            else:
                suc_pairs_within_fov[lane_id] = [] 
        
        lane_graph_within_fov = {
            'lanes': lanes_within_fov,
            'lane_types': lane_types_within_fov,
            'pre_pairs': pre_pairs_within_fov,
            'suc_pairs': suc_pairs_within_fov
        }
        
        return lane_graph_within_fov


    def partition_compact_lane_graph(self, compact_lane_graph: Dict[str, Any]) -> Dict[str, Any]:
        """Split lanes that cross the scene's y-axis (``x = 0``).
        NOTE: NuPlan splits on ``x = 0``, whereas Waymo splits on ``y = 0``. This is to stay consistent 
        with how SLEDGE scenes are oriented.

        The coordinate frame places the ego at ``(0, 0)``.
        To simplify conditional generation (in-painting), we partition
        any merged *compact* lane that crosses ``x = 0`` into multiple
        *sub-lanes* so that the line ``x = 0`` acts as a semantic divider.

        Parameters
        ----------
        compact_lane_graph
            The *compact* lane graph returned by
            :meth:`extract_lane_graph`.

        Returns
        -------
        partitioned_lane_graph
            The same *compact_lane_graph* object, modified in place: lanes
            have been further split and edge dictionaries updated so that
            no lane segment itself crosses ``x = 0``.
        """
        max_lane_id = max(list(compact_lane_graph['lanes'].keys()))
        next_lane_id = max_lane_id + 1

        lane_ids = list(compact_lane_graph['lanes'].keys())
        for lane_id in lane_ids:
            lane = compact_lane_graph['lanes'][lane_id]
            
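            # Get x-values of the lane and find where consecutive points change sign
            x_values = lane[:, 0]  # Assuming lane is [x, y] points
            # sign_diff[i] is True when point i lies on the other side of x = 0 from point i - 1,
            # e.g. x_values = [-2, -1, 1, 2] gives zero_crossings = [2]
            sign_diff = np.insert(np.diff(np.signbit(x_values)), 0, 0)
            zero_crossings = np.where(sign_diff)[0]  # Indices where lane crosses x = 0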
            
            if len(zero_crossings) == 0:  # If no crossings, skip this lane
                continue
            
            # Add artificial partitions at x = 0 crossings
            new_lanes = {}
            start_index = 0
            for crossing in zero_crossings:
                end_index = crossing + 1  # Create a partition from start to crossing
                new_lanes[next_lane_id] = lane[start_index:end_index]
                start_index = crossing  # Update start index for the next partition
                next_lane_id += 1
            
            # Handle the remaining part of the lane after the last crossing
            if zero_crossings[-1] < len(x_values) - 1:
                new_lanes[next_lane_id] = lane[start_index:]
                next_lane_id += 1
            
            # Update the compact_lane_graph with new lanes
            num_new_lanes = len(new_lanes)
            if num_new_lanes == 1:
                continue
            
            for j, new_lane_id in enumerate(new_lanes.keys()):
                compact_lane_graph['lanes'][new_lane_id] = new_lanes[new_lane_id]
                compact_lane_graph['lane_types'][new_lane_id] = compact_lane_graph['lane_types'][lane_id]
                if j == 0:
                    compact_lane_graph['pre_pairs'][new_lane_id] = compact_lane_graph['pre_pairs'][lane_id]
                    # leveraging bijection between suc/pre
                    # replace successors of other lanes with new lane
                    for other_lane_id in compact_lane_graph['pre_pairs'][lane_id]:
                        if other_lane_id is not None:
                            compact_lane_graph['suc_pairs'][other_lane_id].remove(lane_id)
                            compact_lane_graph['suc_pairs'][other_lane_id].append(new_lane_id)
                    compact_lane_graph['suc_pairs'][new_lane_id] = [new_lane_id + 1] # by way we defined new lane ids
                
                elif j == num_new_lanes - 1:
                    compact_lane_graph['suc_pairs'][new_lane_id] = compact_lane_graph['suc_pairs'][lane_id]
                    # leveraging bijection between suc/pre
                    # replace predecessors of other lanes with new lane
                    for other_lane_id in compact_lane_graph['suc_pairs'][lane_id]:
                        if other_lane_id is not None:
                            compact_lane_graph['pre_pairs'][other_lane_id].remove(lane_id)
                            compact_lane_graph['pre_pairs'][other_lane_id].append(new_lane_id)
                    compact_lane_graph['pre_pairs'][new_lane_id] = [new_lane_id - 1] # by way we define new lane ids
                
                else:
                    compact_lane_graph['pre_pairs'][new_lane_id] = [new_lane_id - 1]
                    compact_lane_graph['suc_pairs'][new_lane_id] = [new_lane_id + 1]

            # remove old (now partitioned) lane from lane graph
            del compact_lane_graph['lanes'][lane_id]
            del compact_lane_graph['pre_pairs'][lane_id]
            del compact_lane_graph['suc_pairs'][lane_id]

        return compact_lane_graph
    
    
    def extract_lane_graph(
            self, 
            G, 
            lines, 
            green_lights, 
            red_lights, 
            map_id):
        """ Extracts lane graph from SLEDGE cache data format. Outputs similar format to Waymo 
        but additional lane_type attribute distinguishes lane/green light/red light."""

        lanes = {}
        lane_types = {}
        ct = 0

        lane_graph_adj = G['states']
        
        # remove lanes with fewer than two valid points
        indices_to_remove = []
        for i, (line_states, line_mask) in enumerate(zip(lines['states'], lines['mask'])):
            line_in_mask = line_states[line_mask, :]  # (n, 3)
            if len(line_in_mask) < 2:
                indices_to_remove.append(i)
                continue

            lanes[ct] = line_in_mask
            lane_types[ct] = get_lane_type_onehot_nuplan("lane")

            ct += 1

        if len(indices_to_remove) > 0:
            lane_graph_adj = np.delete(lane_graph_adj, indices_to_remove, axis=0)
            lane_graph_adj = np.delete(lane_graph_adj, indices_to_remove, axis=1)

        # add green lights to lanes
        for green_light_states, green_light_mask in zip(green_lights['states'], green_lights['mask']):
            green_light = green_light_states[green_light_mask, :]
            if len(green_light) < 2:
                continue

            lanes[ct] = green_light
            lane_types[ct] = get_lane_type_onehot_nuplan("green_light")

            ct += 1
        
        # add red lights to lanes
        for red_light_states, red_light_mask in zip(red_lights['states'], red_lights['mask']):
            red_light = red_light_states[red_light_mask, :]
            if len(red_light) < 2:
                continue

            lanes[ct] = red_light
            lane_types[ct] = get_lane_type_onehot_nuplan("red_light")

            ct += 1

        # adjacency list only defined over lanes, not red/green lights
        pre_pairs, suc_pairs = adjacency_matrix_to_adjacency_list(lane_graph_adj)

        lane_graph = {
            'lanes': lanes,
            'lane_types': lane_types,
            'pre_pairs': pre_pairs,
            'suc_pairs': suc_pairs,
            'map_id': map_id
        }

        return lane_graph

    
    def extract_agents(self, ego, vehicles, pedestrians, static_objects):
        """ Extracts agent features from SLEDGE cache data format.
            Output format is the same as the Waymo dataset, but instead of modeling
            vehicle/pedestrian/bicycle we model vehicle/pedestrian/static_object."""
        processed_agent_states = []
        agent_types = []

        # `ego` indices:
        #   0: vel_x
        #   1: vel_y
        #   2: accel_x
        #   3: accel_y
        ego_states = ego['states']
        ego_x = 0.
        ego_y = 0.
        ego_vel_x = ego_states[0]
        ego_vel_y = ego_states[1]
        ego_heading = 0.
        ego_length = self.cfg.ego_length 
        ego_width = self.cfg.ego_width
        ego_state = np.array([ego_x, ego_y, ego_vel_x, ego_vel_y, ego_heading, ego_length, ego_width, 1])
        processed_agent_states.append(ego_state)
        agent_types.append(get_object_type_onehot_nuplan("vehicle"))

        vehicle_states = vehicles['states']
        vehicle_mask = ~vehicles['mask']
        vehicle_states = vehicle_states[vehicle_mask]

        # `vehicles`, `pedestrians`, and `static_objects` indices:
        #   0: x
        #   1: y
        #   2: heading
        #   3: width
        #   4: length
        #   5: velocity (speed)
        for v in range(len(vehicle_states)):
            x = vehicle_states[v, 0]
            y = vehicle_states[v, 1]
            heading = vehicle_states[v, 2]
            speed = vehicle_states[v, 5]
            vel_x = speed * np.cos(heading)
            vel_y = speed * np.sin(heading)
            length = vehicle_states[v, 4]
            width = vehicle_states[v, 3]
            veh_state = np.array([x, y, vel_x, vel_y, heading, length, width, 1])
            processed_agent_states.append(veh_state)
            agent_types.append(get_object_type_onehot_nuplan("vehicle"))

        pedestrian_states = pedestrians['states']
        pedestrian_mask = ~pedestrians['mask']
        pedestrian_states = pedestrian_states[pedestrian_mask]

        for p in range(len(pedestrian_states)):
            x = pedestrian_states[p, 0]
            y = pedestrian_states[p, 1]
            heading = pedestrian_states[p, 2]
            speed = pedestrian_states[p, 5]
            vel_x = speed * np.cos(heading)
            vel_y = speed * np.sin(heading)
            length = pedestrian_states[p, 4]
            width = pedestrian_states[p, 3]
            ped_state = np.array([x, y, vel_x, vel_y, heading, length, width, 1])
            processed_agent_states.append(ped_state)
            agent_types.append(get_object_type_onehot_nuplan("pedestrian"))

        static_object_states = static_objects['states']
        static_object_mask = ~static_objects['mask']
        static_object_states = static_object_states[static_object_mask]

        for s in range(len(static_object_states)):
            x = static_object_states[s, 0]
            y = static_object_states[s, 1]
            heading = static_object_states[s, 2]
            # static objects do not move
            vel_x = 0.
            vel_y = 0.
            length = static_object_states[s, 4]
            width = static_object_states[s, 3]
            obj_state = np.array([x, y, vel_x, vel_y, heading, length, width, 1])
            processed_agent_states.append(obj_state)
            agent_types.append(get_object_type_onehot_nuplan("static_object"))
        
        processed_agent_states = np.array(processed_agent_states)
        agent_types = np.array(agent_types)

        return processed_agent_states, agent_types


    def get_agents_within_fov(self, agent_states, agent_types):
        """Keep only agents inside the field of view (fov) and return those closest to the
        origin, up to the configured maximum number of vehicles, pedestrians, and static objects."""
        
        # filter agents that are within the field of view (fov)
        agents_in_fov_x = np.abs(agent_states[:, 0]) < (self.cfg.fov / 2)
        agents_in_fov_y = np.abs(agent_states[:, 1]) < (self.cfg.fov / 2)
        agents_in_fov_mask = agents_in_fov_x * agents_in_fov_y
        valid_agents = np.where(agents_in_fov_mask > 0)[0]
        valid_vehicles = np.array(list(set(valid_agents).intersection(set(np.where(agent_types[:, NUPLAN_VEHICLE] == 1)[0]))))
        valid_pedestrians = np.array(list(set(valid_agents).intersection(set(np.where(agent_types[:, NUPLAN_PEDESTRIAN] == 1)[0]))))
        valid_static_objects = np.array(list(set(valid_agents).intersection(set(np.where(agent_types[:, NUPLAN_STATIC_OBJECT] == 1)[0]))))
        
        # find the closest agents to the origin that are within the field of view, up to the configured max number per type
        # (np.isin replaces np.in1d, which is deprecated as of NumPy 2.0)
        dist_to_origin = np.linalg.norm(agent_states[:, :2], axis=-1)
        closest_ag_ids = np.argsort(dist_to_origin)
        closest_veh_ids = closest_ag_ids[np.isin(closest_ag_ids, valid_vehicles)]
        closest_veh_ids = closest_veh_ids[:self.cfg.max_num_vehicles]
        closest_ped_ids = closest_ag_ids[np.isin(closest_ag_ids, valid_pedestrians)]
        closest_ped_ids = closest_ped_ids[:self.cfg.max_num_pedestrians]
        closest_static_obj_ids = closest_ag_ids[np.isin(closest_ag_ids, valid_static_objects)]
        closest_static_obj_ids = closest_static_obj_ids[:self.cfg.max_num_static_objects]
        closest_ag_ids = np.concatenate([closest_veh_ids, closest_ped_ids, closest_static_obj_ids], axis=0)

        return agent_states[closest_ag_ids], agent_types[closest_ag_ids]


    def get_road_points_adj(self, compact_lane_graph):
        """This helper converts the *sparse*, dictionary-based lane graph
        representation that comes out of
        :meth:`get_compact_lane_graph` / :meth:`partition_compact_lane_graph`
        into adjacency matrices and resamples lanes to num_points_per_lane points."""
        
        # ── Step 1: resample every lane to fixed P points ──────────────
        resampled_lanes = []
        lane_types = []
        idx_to_id = {}
        id_to_idx = {}
        i = 0
        for lane_id in compact_lane_graph['lanes']:
            lane = compact_lane_graph['lanes'][lane_id]
            lane_type = compact_lane_graph['lane_types'][lane_id]
            resampled_lane = resample_polyline(lane, num_points=self.cfg.num_points_per_lane)
            resampled_lanes.append(resampled_lane)
            lane_types.append(lane_type)
            idx_to_id[i] = lane_id
            id_to_idx[lane_id] = i
            
            i += 1
        
        # ── Step 2: keep the max_num_lanes closest to the origin ───────
        resampled_lanes = np.array(resampled_lanes)
        lane_types = np.array(lane_types)
        num_lanes = min(len(resampled_lanes), self.cfg.max_num_lanes)
        dist_to_origin = np.linalg.norm(resampled_lanes, axis=-1).min(1)
        closest_lane_ids = np.argsort(dist_to_origin)[:num_lanes]
        resampled_lanes = resampled_lanes[closest_lane_ids]
        lane_types = lane_types[closest_lane_ids]

        # mapping from old idx to new index after ordering by distance
        idx_to_new_idx = {}
        new_idx_to_idx = {}
        for i, j in enumerate(closest_lane_ids):
            idx_to_new_idx[j] = i 
            new_idx_to_idx[i] = j

        # ── Step 3: build predecessor/successor adjacency matrices (no left/right connections in nuplan) ──
        pre_road_adj = np.zeros((num_lanes, num_lanes))
        suc_road_adj = np.zeros((num_lanes, num_lanes))
        for new_idx_i in range(num_lanes):
            for id_j in compact_lane_graph['pre_pairs'][idx_to_id[new_idx_to_idx[new_idx_i]]]:
                if id_to_idx[id_j] in closest_lane_ids:
                    pre_road_adj[new_idx_i, idx_to_new_idx[id_to_idx[id_j]]] = 1 

            for id_j in compact_lane_graph['suc_pairs'][idx_to_id[new_idx_to_idx[new_idx_i]]]:
                if id_to_idx[id_j] in closest_lane_ids:
                    suc_road_adj[new_idx_i, idx_to_new_idx[id_to_idx[id_j]]] = 1

        return resampled_lanes, lane_types, pre_road_adj, suc_road_adj, num_lanes

    
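    def get_partitioned_masks(self, agents, lanes, a2a_edge_index, l2l_edge_index, l2a_edge_index):
        """Create boolean edge masks for a partitioned scene.

        A mask entry is True when both endpoints of the edge lie on the same side of the
        partition (the line ``x = 0``, i.e. the Y-axis) and False when the edge crosses it.
        Also returns a per-lane mask that is True for lanes with ``x <= 0``."""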
        a2a_edge_index = a2a_edge_index.numpy()
        l2l_edge_index = l2l_edge_index.numpy()
        l2a_edge_index = l2a_edge_index.numpy()
        
        # assign each agent/lane to a side of the partition by its x-coordinate;
        # a lane is represented by its point at index 9
        agents_x = agents[:, 0]
        lanes_x = lanes[:, 9, 0]
        agents_after_origin = np.where(agents_x > 0)[0]
        lanes_after_origin = np.where(lanes_x > 0)[0]

        # sum only equals 1 if two agents on opposite sides of partition
        a2a_mask = np.isin(a2a_edge_index, agents_after_origin).sum(0) != 1
        l2l_mask = np.isin(l2l_edge_index, lanes_after_origin).sum(0) != 1

        lane_l2a_mask = np.isin(l2a_edge_index[0], lanes_after_origin)[None, :]
        agent_l2a_mask = np.isin(l2a_edge_index[1], agents_after_origin)[None, :]
        l2a_mask = np.concatenate([lane_l2a_mask, agent_l2a_mask], axis=0).sum(0) != 1   

        return torch.from_numpy(a2a_mask), torch.from_numpy(l2l_mask), torch.from_numpy(l2a_mask), lanes_x <= 0
    
    
    def get_data(self, data, idx):
        """Process **one** Nuplan scenario.

        If ``preprocess=True``: ``data`` is a cached, preprocessed pickle; return a
        ScenarioDreamerData object for autoencoder training.
        If ``preprocess=False``: ``data`` is a raw SLEDGE pickle; cache the processed scene to
        disk (one file per lane-graph type) to reduce data processing overhead during
        autoencoder training, and return a dict with ``normalize_statistics`` and ``valid_scene``."""
        
        # ───────────────────────────────────────────────────────────────
        # FAST PATH: already pre-processed tensors on disk
        # ───────────────────────────────────────────────────────────────
        if self.preprocess:
            road_points = data['road_points']
            agent_states = data['agent_states']
            edge_index_lane_to_lane = data['edge_index_lane_to_lane']
            edge_index_lane_to_agent = data['edge_index_lane_to_agent']
            edge_index_agent_to_agent = data['edge_index_agent_to_agent']
            road_connection_types = data['road_connection_types']
            num_lanes = data['num_lanes']
            num_agents = data['num_agents']
            agent_types = data['agent_types']
            lane_types = data['lane_types']
            lg_type = data['lg_type']
            map_id = data['map_id']
            
        # ───────────────────────────────────────────────────────────────
        # SLOW PATH: raw Nuplan pickle → preprocess and cache to disk
        # ───────────────────────────────────────────────────────────────
        else:
            # elements of scene already normalized to ego by SLEDGE preprocessing and agents off driveable area have already been removed
            compact_lane_graph_scene = self.extract_lane_graph(
                data['G'], 
                data['lines'], 
                data['green_lights'], 
                data['red_lights'], 
                data['id'])
            agent_states, agent_types = self.extract_agents(
                data['ego'], 
                data['vehicles'], 
                data['pedestrians'], 
                data['static_objects']) 
            
            # per-scene statistics, filled in below from the regular (unpartitioned) lane graph
            normalize_statistics = {}
            
            compact_lane_graph_scene = self.get_lane_graph_within_fov(compact_lane_graph_scene)
            if len(compact_lane_graph_scene['lanes']) == 0:
                d = {
                    'normalize_statistics': None,
                    'valid_scene': False
                }
                return d
            
            # partitioned lane graph enables explicit training to inpaint
            compact_lane_graph_inpainting = self.partition_compact_lane_graph(copy.deepcopy(compact_lane_graph_scene))
            agent_states, agent_types = self.get_agents_within_fov(agent_states, agent_types)
            agent_states = modify_agent_states(agent_states)
            num_agents = len(agent_states)
            
            if num_agents == 0:
                d = {
                    'normalize_statistics': None,
                    'valid_scene': False
                }
                return d
            
            # Process *both* regular & partitioned lane graphs
            lg_dict = {
                'regular': compact_lane_graph_scene,   
                'partitioned': compact_lane_graph_inpainting
            }
            for lg_type in lg_dict.keys():
                lg = lg_dict[lg_type]
                road_points, lane_types, pre_road_adj, suc_road_adj, num_lanes = self.get_road_points_adj(lg)
                
                # get edge information
                edge_index_lane_to_lane = get_edge_index_complete_graph(num_lanes)
                edge_index_agent_to_agent = get_edge_index_complete_graph(num_agents)
                edge_index_lane_to_agent = get_edge_index_bipartite(num_lanes, num_agents)
                
                road_connection_types = []
                for i in range(edge_index_lane_to_lane.shape[1]):
                    pre_conn_indicator = pre_road_adj[edge_index_lane_to_lane[1, i], edge_index_lane_to_lane[0, i]]
                    suc_conn_indicator = suc_road_adj[edge_index_lane_to_lane[1, i], edge_index_lane_to_lane[0, i]]
                    if edge_index_lane_to_lane[1, i] == edge_index_lane_to_lane[0, i]:
                        road_connection_types.append(get_lane_connection_type_onehot_nuplan('self'))
                    elif pre_conn_indicator:
                        road_connection_types.append(get_lane_connection_type_onehot_nuplan('pred'))
                    elif suc_conn_indicator:
                        road_connection_types.append(get_lane_connection_type_onehot_nuplan('succ'))
                    else:
                        road_connection_types.append(get_lane_connection_type_onehot_nuplan('none'))
                road_connection_types = np.array(road_connection_types)
                
                # cache the processed dict to disk so subsequent runs take the fast path
                raw_file_name = os.path.splitext(os.path.basename(self.files[idx]))[0]
                to_pickle = dict()
                to_pickle['idx'] = idx
                to_pickle['lg_type'] = 0 if lg_type == 'regular' else 1
                to_pickle['num_agents'] = num_agents 
                to_pickle['num_lanes'] = num_lanes
                to_pickle['road_points'] = road_points
                to_pickle['lane_types'] = lane_types
                to_pickle['agent_states'] = agent_states[:, :-1] # no need for existence dimension
                to_pickle['agent_types'] = agent_types # only vehicle, pedestrian, static_object
                to_pickle['edge_index_lane_to_lane'] = edge_index_lane_to_lane
                to_pickle['edge_index_agent_to_agent'] = edge_index_agent_to_agent
                to_pickle['edge_index_lane_to_agent'] = edge_index_lane_to_agent
                to_pickle['road_connection_types'] = road_connection_types
                to_pickle['map_id'] = data['id']
                # save preprocessed file
                with open(os.path.join(self.preprocessed_dir, f'{raw_file_name}_{to_pickle["lg_type"]}.pkl'), 'wb') as f:
                    pickle.dump(to_pickle, f, protocol=pickle.HIGHEST_PROTOCOL)

                if lg_type == 'regular':
                    normalize_statistics['num_agents'] = num_agents
                    normalize_statistics['num_lanes'] = num_lanes
                    normalize_statistics['max_speed'] = agent_states[:, 2].max()
                    normalize_statistics['min_length'] = agent_states[:, 5].min()
                    normalize_statistics['max_length'] = agent_states[:, 5].max()
                    normalize_statistics['min_width'] = agent_states[:, 6].min()
                    normalize_statistics['max_width'] = agent_states[:, 6].max()
                    # road_points is indexed as (lane, point, coordinate), so select the coordinate on the last axis
                    normalize_statistics['min_lane_x'] = road_points[:, :, 0].min()
                    normalize_statistics['min_lane_y'] = road_points[:, :, 1].min()
                    normalize_statistics['max_lane_x'] = road_points[:, :, 0].max()
                    normalize_statistics['max_lane_y'] = road_points[:, :, 1].max()
            
            d = {
                'normalize_statistics': normalize_statistics,
                'valid_scene': True
            }

            return d
        
        agent_states, road_points = normalize_scene(
            agent_states, 
            road_points,
            fov=self.cfg.fov,
            min_speed=self.cfg.min_speed,
            max_speed=self.cfg.max_speed,
            min_length=self.cfg.min_length,
            max_length=self.cfg.max_length,
            min_width=self.cfg.min_width,
            max_width=self.cfg.max_width,
            min_lane_x=self.cfg.min_lane_x,
            min_lane_y=self.cfg.min_lane_y,
            max_lane_x=self.cfg.max_lane_x,
            max_lane_y=self.cfg.max_lane_y)

        # randomize order of indices except for ego (which is always index 0)
        if self.mode == 'train':
            agent_states, agent_types, road_points, lane_types, edge_index_lane_to_lane = randomize_indices(
                agent_states, 
                agent_types, 
                road_points, 
                edge_index_lane_to_lane, 
                lane_types)
            edge_index_lane_to_lane = torch.from_numpy(edge_index_lane_to_lane)
        
        if lg_type == PARTITIONED:
            a2a_mask, l2l_mask, l2a_mask, lane_partition_mask = self.get_partitioned_masks(
                agent_states, 
                road_points, 
                edge_index_agent_to_agent, 
                edge_index_lane_to_lane, 
                edge_index_lane_to_agent)
        
            agents_x = agent_states[:, 0]
            lanes_x = road_points[:, 9, 0]
            num_agents_after_origin = len(np.where(agents_x > 0)[0])
            num_lanes_after_origin = len(np.where(lanes_x > 0)[0])
        else:
            a2a_mask = torch.ones(edge_index_agent_to_agent.shape[1]).bool()
            l2l_mask = torch.ones(edge_index_lane_to_lane.shape[1]).bool()
            l2a_mask = torch.ones(edge_index_lane_to_agent.shape[1]).bool()
            lane_partition_mask = np.zeros(num_lanes).astype(bool)
            num_agents_after_origin = 0 
            num_lanes_after_origin = 0
            

        assert a2a_mask.shape[0] == edge_index_agent_to_agent.shape[1]
        assert l2l_mask.shape[0] == edge_index_lane_to_lane.shape[1]
        assert l2a_mask.shape[0] == edge_index_lane_to_agent.shape[1]
        assert lane_partition_mask.shape[0] == num_lanes
        
        
        # --------------------------------------------------------------
        # Assemble final PyG heterogeneous graph
        # --------------------------------------------------------------
        d = ScenarioDreamerData()
        d['idx'] = idx
        d['num_lanes'] = num_lanes 
        d['num_agents'] = num_agents
        d['lg_type'] = lg_type
        d['map_id'] = int(map_id)
        d['agent'].x = from_numpy(agent_states)
        d['agent'].type = from_numpy(agent_types)
        d['lane'].x = from_numpy(road_points)
        d['lane'].type = from_numpy(lane_types)
        d['lane'].partition_mask = from_numpy(lane_partition_mask)
        d['num_agents_after_origin'] = num_agents_after_origin 
        d['num_lanes_after_origin'] = num_lanes_after_origin

        # edge indices required for pyg
        d['lane', 'to', 'lane'].edge_index = edge_index_lane_to_lane
        d['lane', 'to', 'lane'].type = torch.from_numpy(road_connection_types)
        d['agent', 'to', 'agent'].edge_index = edge_index_agent_to_agent
        d['lane', 'to', 'agent'].edge_index = edge_index_lane_to_agent
        d['lane', 'to', 'lane'].encoder_mask = l2l_mask
        d['lane', 'to', 'agent'].encoder_mask = l2a_mask
        d['agent', 'to', 'agent'].encoder_mask = a2a_mask

        return d

    
    def get(self, idx: int):
        # cache processed data to disk as pickle file
        if not self.cfg.preprocess:
            raw_file_path = self.files[idx]
            map_id_file_path = raw_file_path.replace('sledge_raw', 'map_id')
            with gzip.open(raw_file_path, 'rb') as f:
                data = pickle.load(f)
            with gzip.open(map_id_file_path, 'rb') as f:
                map_id_data = pickle.load(f)
            data['id'] = map_id_data['id']
            
            d = self.get_data(data, idx)

        # return ScenarioDreamerData object for autoencoder training
        else:
            raw_file_name = os.path.splitext(os.path.basename(self.files[idx]))[0]
            raw_path = os.path.join(self.preprocessed_dir, f'{raw_file_name}.pkl')
            with open(raw_path, 'rb') as f:
                data = pickle.load(f)
            d = self.get_data(data, idx)
        
        return d

    
    def len(self):
        return self.dset_len

@hydra.main(version_base=None, config_path=CONFIG_PATH, config_name="config")
def main(cfg):
    cfg.ae.dataset.preprocess = False
    dset = NuplanDatasetAutoEncoder(cfg.ae.dataset, split_name='train')
    print(len(dset))
    np.random.seed(10)
    random.seed(10)
    torch.manual_seed(10)

    for idx in tqdm(range(len(dset))):
        raw_file_path = dset.files[idx]
        map_id_file_path = raw_file_path.replace('sledge_raw', 'map_id')
        with gzip.open(raw_file_path, 'rb') as f:
            data = pickle.load(f)
        with gzip.open(map_id_file_path, 'rb') as f:
            map_id_data = pickle.load(f)
        data['id'] = map_id_data['id']
        
        d = dset.get_data(data, idx)

        if idx == 25:
            break


if __name__ == '__main__':
    main()

================================================
FILE: datasets/nuplan/dataset_ldm_nuplan.py
================================================
import os
import sys
import json
import glob
import hydra
import torch
import pickle
import random
from tqdm import tqdm
from cfgs.config import CONFIG_PATH
from typing import Any

from torch_geometric.data import Dataset
from torch_geometric.loader import DataLoader
torch.set_printoptions(threshold=100000)
import numpy as np
np.set_printoptions(suppress=True, threshold=sys.maxsize)

from utils.data_container import ScenarioDreamerData
from utils.torch_helpers import from_numpy
from utils.data_helpers import sample_latents, reorder_indices

class NuplanDatasetLDM(Dataset):
    def __init__(self, cfg: Any, split_name: str = "train") -> None:
        """Instantiate a :class:`NuplanDatasetLDM`.

        Parameters
        ----------
        cfg
            Hydra configuration object containing dataset configs (cfg.dataset in global config)
        split_name
            One of ``{"train", "val", "test"}`` selecting which split
            to load from ``cfg.dataset.dataset_path``.
        """
        super(NuplanDatasetLDM, self).__init__()
        self.cfg = cfg
        self.split_name = split_name 
        self.dataset_dir = os.path.join(self.cfg.dataset_path, f"{self.split_name}")
        if not os.path.exists(self.dataset_dir):
            os.makedirs(self.dataset_dir, exist_ok=True)

        self.files = sorted(glob.glob(self.dataset_dir + "/*.pkl"))
        self.dset_len = len(self.files)

    
    def get_data(self, data, idx):
        """Return a sample for ldm training"""
        idx = data['idx']
        agent_states = data['agent_states']
        road_points = data['road_points']
        lane_mu = data['lane_mu']
        agent_mu = data['agent_mu']
        lane_log_var = data['lane_log_var']
        agent_log_var = data['agent_log_var']
        edge_index_lane_to_lane = data['edge_index_lane_to_lane']
        edge_index_lane_to_agent = data['edge_index_lane_to_agent']
        edge_index_agent_to_agent = data['edge_index_agent_to_agent']
        scene_type = data['scene_type']
        map_id = np.array([data['map_id']], dtype=int)
        num_lanes = lane_mu.shape[0]
        num_agents = agent_mu.shape[0]

        # apply recursive ordering
        agent_mu, agent_log_var, lane_mu, lane_log_var, edge_index_lane_to_lane, agent_partition_mask, lane_partition_mask = reorder_indices(
            agent_mu, 
            agent_log_var, 
            lane_mu, 
            lane_log_var, 
            edge_index_lane_to_lane, 
            agent_states, 
            road_points, 
            scene_type,
            dataset='nuplan')
        edge_index_lane_to_lane = torch.from_numpy(edge_index_lane_to_lane)

        # sample for ldm training
        d = ScenarioDreamerData()
        d['idx'] = idx
        d['num_lanes'] = num_lanes 
        d['num_agents'] = num_agents
        d['lg_type'] = scene_type
        d['map_id'] = from_numpy(map_id)
        d['agent'].x = from_numpy(agent_mu)
        d['lane'].x = from_numpy(lane_mu)
        d['agent'].partition_mask = from_numpy(agent_partition_mask)
        d['lane'].partition_mask = from_numpy(lane_partition_mask)
        d['agent'].log_var = from_numpy(agent_log_var)
        d['lane'].log_var = from_numpy(lane_log_var)
        d['agent'].latents, d['lane'].latents = sample_latents(
            d, 
            self.cfg.agent_latents_mean,
            self.cfg.agent_latents_std,
            self.cfg.lane_latents_mean,
            self.cfg.lane_latents_std,
            normalize=True) # sample normalized latents for training

        d['lane', 'to', 'lane'].edge_index = from_numpy(edge_index_lane_to_lane)
        d['agent', 'to', 'agent'].edge_index = from_numpy(edge_index_agent_to_agent)
        d['lane', 'to', 'agent'].edge_index = from_numpy(edge_index_lane_to_agent)

        return d

    
    def get(self, idx: int):
        raw_file_name = os.path.splitext(os.path.basename(self.files[idx]))[0]
        raw_path = os.path.join(self.dataset_dir, f'{raw_file_name}.pkl')
        with open(raw_path, 'rb') as f:
            data = pickle.load(f)
        
        d = self.get_data(data, idx)
        
        return d

    
    def len(self):
        return self.dset_len

@hydra.main(version_base=None, config_path=CONFIG_PATH, config_name="config")
def main(cfg):
    cfg = cfg.ldm
    dset = NuplanDatasetLDM(cfg.dataset, split_name='train')

    print(cfg.dataset.dataset_path)
    
    np.random.seed(1)
    random.seed(1)
    torch.manual_seed(1)

    print(len(dset))

    if not os.path.exists(cfg.dataset.latent_stats_path):
        cfg.dataset.agent_latents_mean = 0.0
        cfg.dataset.agent_latents_std = 1.0
        cfg.dataset.lane_latents_mean = 0.0
        cfg.dataset.lane_latents_std = 1.0
    
    dloader = DataLoader(dset, 
               batch_size=1024, 
               shuffle=True, 
               num_workers=0,
               pin_memory=True,
               drop_last=True)

    agent_latents_all = []
    lane_latents_all = []
    for i, d in enumerate(tqdm(dloader)):
        agent_latents, lane_latents = sample_latents(
            d, 
            cfg.dataset.agent_latents_mean,
            cfg.dataset.agent_latents_std,
            cfg.dataset.lane_latents_mean,
            cfg.dataset.lane_latents_std,
            normalize=False)
        
        agent_latents_all.append(agent_latents)
        lane_latents_all.append(lane_latents)

        if i == 5:
            break
    
    agent_latents_all = torch.cat(agent_latents_all, dim=0)
    lane_latents_all = torch.cat(lane_latents_all, dim=0)

    print(agent_latents_all.mean(), agent_latents_all.std())
    print(lane_latents_all.mean(), lane_latents_all.std())



if __name__ == '__main__':
    main()

================================================
FILE: datasets/waymo/dataset_autoencoder_waymo.py
================================================
import os
import sys
import glob
import hydra
import torch
import pickle
import random
import copy
from tqdm import tqdm
from typing import Any, Dict, List, Tuple, Union

from torch_geometric.data import Dataset
torch.set_printoptions(threshold=100000)
import numpy as np
np.set_printoptions(suppress=True, threshold=sys.maxsize)
from cfgs.config import CONFIG_PATH, PARTITIONED

from utils.data_container import ScenarioDreamerData
from utils.lane_graph_helpers import resample_polyline, get_compact_lane_graph
from utils.pyg_helpers import get_edge_index_bipartite, get_edge_index_complete_graph
from utils.data_helpers import (
    get_object_type_onehot_waymo, 
    get_lane_connection_type_onehot_waymo, 
    modify_agent_states, 
    normalize_scene, 
    randomize_indices,
    extract_raw_waymo_data
)
from utils.torch_helpers import from_numpy
from utils.geometry import apply_se2_transform, rotate_and_normalize_angles

class WaymoDatasetAutoEncoder(Dataset):
    """A Torch-Geometric ``Dataset`` wrapping Waymo scenes for auto-encoding.

    The dataset processes the extracted Waymo Open Dataset pickles (obtained from a
    separate data extraction script), including lane-graph extraction, agent-state
    normalisation, and partitioning for in-painting. If ``preprocess=True``, it loads
    directly from preprocessed files for efficient autoencoder training. If
    ``preprocess=False``, it saves preprocessed data to disk.
    """

    def __init__(self, cfg: Any, split_name: str = "train", mode: str = "train") -> None:
        """Instantiate a :class:`WaymoDatasetAutoEncoder`.

        Parameters
        ----------
        cfg
            Hydra configuration object containing dataset configs (cfg.dataset in global config)
        split_name
            One of ``{"train", "val", "test"}`` selecting which split
            to load from ``cfg.dataset.dataset_path``.
        mode
            "train" or "eval" - affects shuffling/randomisation inside
            :meth:`get_data`.
        """
        super(WaymoDatasetAutoEncoder, self).__init__()
        self.cfg = cfg
        self.data_root = self.cfg.dataset_path
        self.split_name = split_name 
        self.mode = mode
        self.preprocess = self.cfg.preprocess
        self.preprocessed_dir = os.path.join(self.cfg.preprocess_dir, f"{self.split_name}")
        if not os.path.exists(self.preprocessed_dir):
            os.makedirs(self.preprocessed_dir, exist_ok=True)

        if not self.preprocess:
            self.files = sorted(glob.glob(os.path.join(self.data_root, f"{self.split_name}") + "/*.pkl"))
            
            # ──────────────────────────────────────────────────────────────────────────────
            # Test-set augmentation
            # ──────────────────────────────────────────────────────────────────────────────
            # To obtain a more statistically reliable evaluation, we *augment* the raw test
            # split by sampling *additional* timesteps from the same underlying scenarios.
            #
            # •  Each extra file corresponds to a **new, randomly chosen timestep** within a
            #    scenario, so we never duplicate an identical *(scenario, timestep)* pair.
            # •  If the random draw happens to select a timestep that has already been
            #    exported, the new file will **overwrite** the earlier one on disk.  As a
            #    result, the final number of *added* files is *≤ 10 000* rather than
            #    guaranteed to be exactly 10 000.
            # •  The original list is first shuffled so that the extra samples come from a
            #    diverse set of scenarios.
            #
            if self.split_name == 'test':
                self.files_augmented = copy.deepcopy(self.files)
                random.shuffle(self.files)
                # add at most 10000 more random files to get a large enough test set for evaluation
                self.files_augmented.extend(self.files[:10000])
                self.files = self.files_augmented
        else:
            self.files = sorted(glob.glob(self.preprocessed_dir + "/*.pkl"))
            
        self.dset_len = len(self.files)


    def partition_compact_lane_graph(self, compact_lane_graph: Dict[str, Any]) -> Dict[str, Any]:
        """Split lanes that cross the scene's x-axis (``y = 0``).

        The coordinate frame places the ego at ``(0, 0)``.
        To simplify conditional generation (in-painting), we partition
        any merged *compact* lane that crosses ``y = 0`` into multiple
        *sub-lanes* so that the line ``y = 0`` acts as a semantic divider.

        Parameters
        ----------
        compact_lane_graph
            The *compact* lane graph returned by
            :meth:`get_compact_lane_graph`.

        Returns
        -------
        partitioned_lane_graph
            The *same* input dict, modified *in-place*: every lane that
            crosses ``y = 0`` is replaced by its sub-lanes, and the edge
            dictionaries are updated to reference the new lane ids.
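
        Examples
        --------
        A minimal, standalone sketch of the sign-change test and slicing
        used in the method body, applied to a toy polyline. The helper
        name ``split_at_y_zero`` and the sample points are illustrative
        assumptions and are not part of this module.

```python
import numpy as np

def split_at_y_zero(lane):
    # Flag every index whose y-sign differs from the previous point's.
    sign_change = np.insert(np.diff(np.signbit(lane[:, 1])), 0, False)
    crossings = np.where(sign_change)[0]
    pieces, start = [], 0
    for crossing in crossings:
        # Each piece ends on the first point past the crossing, which
        # is also the first point of the next piece.
        pieces.append(lane[start:crossing + 1])
        start = crossing
    # Keep the tail unless the last crossing is the final point.
    if len(crossings) == 0 or crossings[-1] < len(lane) - 1:
        pieces.append(lane[start:])
    return pieces

toy_lane = np.array([[0.0, 2.0], [1.0, 1.0], [2.0, -1.0], [3.0, -2.0]])
pieces = split_at_y_zero(toy_lane)
```

        Here ``pieces`` holds two sub-polylines that share the point
        ``[2.0, -1.0]``.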
        """
        max_lane_id = max(list(compact_lane_graph['lanes'].keys()))
        next_lane_id = max_lane_id + 1

        lane_ids = list(compact_lane_graph['lanes'].keys())
        for lane_id in lane_ids:
            lane = compact_lane_graph['lanes'][lane_id]
            
            # Get y-values of the lane and find where the sign of y changes
            y_values = lane[:, 1]  # Assuming lane is [x, y] points
            sign_diff = np.insert(np.diff(np.signbit(y_values)), 0, 0)
            zero_crossings = np.where(sign_diff)[0]  # Index of the first point after each y = 0 crossing
            
            if len(zero_crossings) == 0:  # If no crossings, skip this lane
                continue
            
            # Add artificial partitions at y = 0 crossings
            new_lanes = {}
            start_index = 0
            for crossing in zero_crossings:
                end_index = crossing + 1  # Create a partition from start to crossing
                new_lanes[next_lane_id] = lane[start_index:end_index]
                start_index = crossing  # Update start index for the next partition
                next_lane_id += 1
            
            # Handle the remaining part of the lane after the last crossing
            if zero_crossings[-1] < len(y_values) - 1:
                new_lanes[next_lane_id] = lane[start_index:]
                next_lane_id += 1
            
            # Update the compact_lane_graph with new lanes
            num_new_lanes = len(new_lanes)
            if num_new_lanes == 1:
                continue
            
            for j, new_lane_id in enumerate(new_lanes.keys()):
                compact_lane_graph['lanes'][new_lane_id] = new_lanes[new_lane_id]
                if j == 0:
                    compact_lane_graph['pre_pairs'][new_lane_id] = compact_lane_graph['pre_pairs'][lane_id]
                    # leveraging bijection between suc/pre
                    # replace successors of other lanes with new lane
                    for other_lane_id in compact_lane_graph['pre_pairs'][lane_id]:
                        if other_lane_id is not None:
                            compact_lane_graph['suc_pairs'][other_lane_id].remove(lane_id)
                            compact_lane_graph['suc_pairs'][other_lane_id].append(new_lane_id)
                    compact_lane_graph['suc_pairs'][new_lane_id] = [new_lane_id + 1]  # new lane ids are consecutive by construction
                
                elif j == num_new_lanes - 1:
                    compact_lane_graph['suc_pairs'][new_lane_id] = compact_lane_graph['suc_pairs'][lane_id]
                    # leveraging bijection between suc/pre
                    # replace predecessors of other lanes with new lane
                    for other_lane_id in compact_lane_graph['suc_pairs'][lane_id]:
                        if other_lane_id is not None:
                            compact_lane_graph['pre_pairs'][other_lane_id].remove(lane_id)
                            compact_lane_graph['pre_pairs'][other_lane_id].append(new_lane_id)
                    compact_lane_graph['pre_pairs'][new_lane_id] = [new_lane_id - 1]  # new lane ids are consecutive by construction
                
                else:
                    compact_lane_graph['pre_pairs'][new_lane_id] = [new_lane_id - 1]
                    compact_lane_graph['suc_pairs'][new_lane_id] = [new_lane_id + 1]

                compact_lane_graph['left_pairs'][new_lane_id] = compact_lane_graph['left_pairs'][lane_id]
                compact_lane_graph['right_pairs'][new_lane_id] = compact_lane_graph['right_pairs'][lane_id]

            for other_lane_id in compact_lane_graph['right_pairs']:
                if lane_id in compact_lane_graph['right_pairs'][other_lane_id]:
                    compact_lane_graph['right_pairs'][other_lane_id].remove(lane_id)
                    for new_lane_id in new_lanes.keys():
                        compact_lane_graph['right_pairs'][other_lane_id].append(new_lane_id)

            for other_lane_id in compact_lane_graph['left_pairs']:
                if lane_id in compact_lane_graph['left_pairs'][other_lane_id]:
                    compact_lane_graph['left_pairs'][other_lane_id].remove(lane_id)
                    for new_lane_id in new_lanes.keys():
                        compact_lane_graph['left_pairs'][other_lane_id].append(new_lane_id)

            # remove old (now partitioned) lane from lane graph
            del compact_lane_graph['lanes'][lane_id]
            del compact_lane_graph['pre_pairs'][lane_id]
            del compact_lane_graph['suc_pairs'][lane_id]
            del compact_lane_graph['left_pairs'][lane_id]
            del compact_lane_graph['right_pairs'][lane_id]

        return compact_lane_graph


    def normalize_compact_lane_graph(self, lane_graph: Dict[str, Any], normalize_dict: Dict[str, np.ndarray]) -> Dict[str, Any]:
        """Translate & rotate lanes so that the AV sits at the origin.

        Parameters
        ----------
        lane_graph
            *Compact* or *partitioned* lane graph in global Waymo
            coordinates.
        normalize_dict
            Dictionary with keys ``{"center", "yaw"}`` describing the
            ego vehicle's position and heading at the sampling
            time-step.

        Returns
        -------
        lane_graph
            The *same* input dict, modified *in-place* so that every lane
            point is expressed in the AV-centric coordinate frame.
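
        Examples
        --------
        A hedged sketch of the ego-centric normalisation, written without
        ``apply_se2_transform``. It assumes the transform subtracts
        ``center`` and then rotates counter-clockwise by ``pi / 2 - yaw``;
        the helper's actual convention may differ. ``to_ego_frame`` and
        the sample values are hypothetical.

```python
import numpy as np

def to_ego_frame(points, center, yaw):
    # Assumed convention: translate so the ego is at the origin, then
    # rotate counter-clockwise by (pi / 2 - yaw).
    angle = np.pi / 2 - yaw
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    rotation = np.array([[cos_a, -sin_a], [sin_a, cos_a]])
    return (points - center) @ rotation.T

center = np.array([10.0, 5.0])
yaw = np.pi / 4
# A point one metre ahead of the ego along its heading.
ahead = center + np.array([np.cos(yaw), np.sin(yaw)])
ego_xy = to_ego_frame(ahead[np.newaxis, :], center, yaw)[0]
```

        Under this assumed convention the ego heading maps onto the
        ``+y`` axis, so ``ego_xy`` is approximately ``(0, 1)``.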
        """
        lane_ids = lane_graph['lanes'].keys()
        center = normalize_dict['center']
        angle_of_rotation = (np.pi / 2) - normalize_dict['yaw']
        center = center[np.newaxis, np.newaxis, :]

        # normalize lanes to ego
        for lane_id in lane_ids:
            lane = lane_graph['lanes'][lane_id]
            lane = apply_se2_transform(coordinates=lane[:, np.newaxis, :],
                                       translation=center,
                                       yaw=angle_of_rotation)[:, 0]
            # overwrite with normalized lane (centered + rotated on AV)
            lane_graph['lanes'][lane_id] = lane
        
        return lane_graph


    def get_lane_graph_within_fov(self, lane_graph: Dict[str, Any]) -> Dict[str, Any]:
        """Return only those lanes that intersect the square *field-of-view*.

        The coordinate frame is converted to an ego-centred frame
        earlier in the pipeline, so the autonomous vehicle (AV) is at
        the origin.  A lane point is considered *in view* when both its
        absolute X *and* Y coordinates are strictly smaller than
        ``cfg_dataset.fov / 2``.  Each retained lane is then resampled
        to a fixed number of points.

        Parameters
        ----------
        lane_graph : Dict[str, Any]
            A *compact* or *partitioned* lane-graph with the standard
            keys ``{"lanes", "pre_pairs", "suc_pairs", "left_pairs",
            "right_pairs"}``.  All coordinates must already be expressed
            in the AV-centric frame.

        Returns
        -------
        lane_graph_within_fov: Dict[str, Any]
            A new lane-graph containing only lanes that intersect the
            configured field-of-view.  Connection dictionaries are
            pruned so they reference *in-FOV* lanes exclusively, and each
            lane polyline has exactly
            ``cfg_dataset.upsample_lane_num_points`` points.
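
        Examples
        --------
        A small, standalone sketch of the point-wise field-of-view test
        described above. The helper name ``points_within_fov``, the toy
        polyline and the ``fov`` value of 64 are illustrative assumptions,
        not values taken from the configuration.

```python
import numpy as np

def points_within_fov(lane, fov):
    # A point is in view when |x| and |y| are both strictly below fov / 2.
    half = fov / 2
    return (np.abs(lane[:, 0]) < half) & (np.abs(lane[:, 1]) < half)

toy_lane = np.array([[0.0, 0.0], [20.0, 10.0], [32.0, 0.0], [40.0, 5.0]])
mask = points_within_fov(toy_lane, fov=64.0)
kept = toy_lane[mask]
```

        The point at ``x = 32`` lies exactly on the boundary and is
        dropped because the comparison is strict.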
        """
        lane_ids = lane_graph['lanes'].keys()
        pre_pairs = lane_graph['pre_pairs']
        suc_pairs = lane_graph['suc_pairs']
        left_pairs = lane_graph['left_pairs']
        right_pairs = lane_graph['right_pairs']
        
        # ── Identify lanes that intersect the square FOV ──────────────
        lane_ids_within_fov = []
        valid_pts = {}
        for lane_id in lane_ids:
            lane = lane_graph['lanes'][lane_id]
            points_in_fov_x = np.abs(lane[:, 0]) < (self.cfg.fov / 2)
            points_in_fov_y = np.abs(lane[:, 1]) < (self.cfg.fov / 2)
            points_in_fov = points_in_fov_x & points_in_fov_y
            
            if np.any(points_in_fov):
                lane_ids_within_fov.append(lane_id)
                valid_pts[lane_id] = points_in_fov

        lanes_within_fov = {}
        pre_pairs_within_fov = {}
        suc_pairs_within_fov = {}
        left_pairs_within_fov = {}
        right_pairs_within_fov = {}
        
        # ── Prune connection dictionaries and resample polylines ─────────────────────────────
        for lane_id in lane_ids_within_fov:
            if lane_id in lane_ids:
                lane = lane_graph['lanes'][lane_id][valid_pts[lane_id]]
                # Why upsample here instead of resampling to self.cfg.num_points_per_lane?
                # These lanes may be partitioned later, so we keep a high lane resolution
                # for accurate partitioning. Resampling to self.cfg.num_points_per_lane
                # happens in get_road_points_adj.
                resampled_lane = resample_polyline(lane, num_points=self.cfg.upsample_lane_num_points)
                lanes_within_fov[lane_id] = resampled_lane
            
            if lane_id in pre_pairs:
                pre_pairs_within_fov[lane_id] = [l for l in pre_pairs[lane_id] if l in lane_ids_within_fov]
            else:
                pre_pairs_within_fov[lane_id] = []
            
            if lane_id in suc_pairs:
                suc_pairs_within_fov[lane_id] = [l for l in suc_pairs[lane_id] if l in lane_ids_within_fov]
            else:
                suc_pairs_within_fov[lane_id] = [] 

            if lane_id in left_pairs:
                left_pairs_within_fov[lane_id] = [l for l in left_pairs[lane_id] if l in lane_ids_within_fov]
            else:
                left_pairs_within_fov[lane_id] = []
            
            if lane_id in right_pairs:
                right_pairs_within_fov[lane_id] = [l for l in right_pairs[lane_id] if l in lane_ids_within_fov]
            else:
                right_pairs_within_fov[lane_id] = []
        
        lane_graph_within_fov = {
            'lanes': lanes_within_fov,
            'pre_pairs': pre_pairs_within_fov,
            'suc_pairs': suc_pairs_within_fov,
            'left_pairs': left_pairs_within_fov,
            'right_pairs': right_pairs_within_fov
        }
        
        return lane_graph_within_fov

    
    def get_road_points_adj(
        self,
        compact_lane_graph: Dict[str, Any],
    ) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray, int]:
        """Convert the dictionary-based lane graph into dense adjacency matrices.

        Takes the *sparse* lane graph produced by
        :meth:`get_compact_lane_graph` / :meth:`partition_compact_lane_graph`,
        resamples every lane to ``num_points_per_lane`` points, and keeps
        the ``max_num_lanes`` lanes closest to the origin.

        Parameters
        ----------
        compact_lane_graph : Dict[str, Any]
            Lane graph already translated & rotated into the
            ego-centric frame.  Must contain keys
            ``{"lanes", "pre_pairs", "suc_pairs", "left_pairs", "right_pairs"}``.

        Returns
        -------
        road_points : np.ndarray
            Float32 tensor of shape ``(L, P, 2)`` where ``P`` is
            ``cfg_dataset.num_points_per_lane`` and ``L`` ≤
            ``cfg_dataset.max_num_lanes``.
        pre_adj, suc_adj, left_adj, right_adj : np.ndarray
            Four dense binary adjacency matrices of shape ``(L, L)``
            corresponding to predecessor, successor, left and
            right relationships respectively.
        num_lanes : int
            The number of lanes actually retained (at most
            ``cfg_dataset.max_num_lanes``).
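
        Examples
        --------
        A minimal sketch of how one connection dictionary can be turned
        into a dense adjacency matrix over the retained lanes. The helper
        ``pairs_to_adjacency`` and the toy lane ids are assumptions made
        for illustration and are not part of this module.

```python
import numpy as np

def pairs_to_adjacency(pairs, kept_ids):
    # Row/column order follows kept_ids; links to dropped lanes are skipped.
    index_of = {lane_id: i for i, lane_id in enumerate(kept_ids)}
    adj = np.zeros((len(kept_ids), len(kept_ids)))
    for lane_id in kept_ids:
        for other_id in pairs.get(lane_id, []):
            if other_id in index_of:
                adj[index_of[lane_id], index_of[other_id]] = 1
    return adj

# Toy successor lists: 7 -> 9 -> 12, with lane 12 not retained.
suc_pairs = {7: [9], 9: [12], 12: []}
suc_adj = pairs_to_adjacency(suc_pairs, kept_ids=[9, 7])
```

        Only the ``7 -> 9`` link survives, stored at row 1 (lane 7),
        column 0 (lane 9).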
        """
        
        # ── Step 1: resample every lane to fixed P points ──────────────
        resampled_lanes = []
        idx_to_id = {}
        id_to_idx = {}
        i = 0
        for lane_id in compact_lane_graph['lanes']:
            lane = compact_lane_graph['lanes'][lane_id]
            resampled_lane = resample_polyline(lane, num_points=self.cfg.num_points_per_lane)
            resampled_lanes.append(resampled_lane)
            idx_to_id[i] = lane_id
            id_to_idx[lane_id] = i
            
            i += 1
        
        # ── Step 2: keep the max_num_lanes closest to the origin ───────
        resampled_lanes = np.array(resampled_lanes)
        num_lanes = min(len(resampled_lanes), self.cfg.max_num_lanes)
        dist_to_origin = np.linalg.norm(resampled_lanes, axis=-1).min(1)
        closest_lane_ids = np.argsort(dist_to_origin)[:num_lanes]
        resampled_lanes = resampled_lanes[closest_lane_ids]

        # mapping from old idx to new index after ordering by distance
        idx_to_new_idx = {}
        new_idx_to_idx = {}
        for i, j in enumerate(closest_lane_ids):
            idx_to_new_idx[j] = i 
            new_idx_to_idx[i] = j

        # ── Pre-allocate adjacency matrices ────────────────────────────
        pre_road_adj = np.zeros((num_lanes, num_lanes))
        suc_road_adj = np.zeros((num_lanes, num_lanes))
        left_road_adj = np.zeros((num_lanes, num_lanes))
        right_road_adj = np.zeros((num_lanes, num_lanes))
        
        
        # ── Step 3: populate the matrices ──────────────────────────────
        for new_idx_i in range(num_lanes):
            for id_j in compact_lane_graph['pre_pairs'][idx_to_id[new_idx_to_idx[new_idx_i]]]:
                if id_to_idx[id_j] in closest_lane_ids:
                    pre_road_adj[new_idx_i, idx_to_new_idx[id_to_idx[id_j]]] = 1 

            for id_j in compact_lane_graph['suc_pairs'][idx_to_id[new_idx_to_idx[new_idx_i]]]:
                if id_to_idx[id_j] in closest_lane_ids:
                    suc_road_adj[new_idx_i, idx_to_new_idx[id_to_idx[id_j]]] = 1

            for id_j in compact_lane_graph['left_pairs'][idx_to_id[new_idx_to_idx[new_idx_i]]]:
                if id_to_idx[id_j] in closest_lane_ids:
                    left_road_adj[new_idx_i, idx_to_new_idx[id_to_idx[id_j]]] = 1

            for id_j in compact_lane_graph['right_pairs'][idx_to_id[new_idx_to_idx[new_idx_i]]]:
                if id_to_idx[id_j] in closest_lane_ids:
                    right_road_adj[new_idx_i, idx_to_new_idx[id_to_idx[id_j]]] = 1
        
        return resampled_lanes, pre_road_adj, suc_road_adj, left_road_adj, right_road_adj, num_lanes


    def get_agents_within_fov(
        self,
        agent_states: np.ndarray,
        agent_types: np.ndarray,
        normalize_dict: Dict[str, np.ndarray]
    ) -> Tuple[np.ndarray, np.ndarray]:
        """Translate agent states into the AV frame and retain only those in view.

        Parameters
        ----------
        agent_states : np.ndarray
            Float32 array of shape ``(N, D)`` where the first 5 columns
            follow Waymo's convention ``[x, y, vx, vy, yaw]`` and the
            remaining columns hold size/existence meta-data.  Coordinates
            are in the *scenario* frame **before** any ego alignment.
        agent_types : np.ndarray
            One-hot encoded array of shape ``(N, 5

SYMBOL INDEX (500 symbols across 54 files)

FILE: data_processing/nuplan/generate_nuplan_dataset.py
  function find_feature_paths (line 13) | def find_feature_paths(root_path, feature_name):
  function main (line 28) | def main(cfg):

FILE: data_processing/nuplan/preprocess_dataset_nuplan.py
  function _work_one_chunk (line 13) | def _work_one_chunk(idx, cfg_dict):
  function _run_one_cfg (line 22) | def _run_one_cfg(cfg):
  function main (line 40) | def main(cfg):

FILE: data_processing/postprocess_simulation_environments.py
  function main (line 12) | def main(cfg):

FILE: data_processing/waymo/add_nocturne_compatible_val_scenarios_to_test.py
  function add_val_to_test (line 10) | def add_val_to_test(cfg):
  function main (line 31) | def main(cfg):

FILE: data_processing/waymo/convert_pickles_to_jsons.py
  function reverse_ag_type_mapping (line 12) | def reverse_ag_type_mapping(agent_type_onehot):
  function compute_route_road_edges (line 19) | def compute_route_road_edges(route):
  function main (line 40) | def main(cfg):

FILE: data_processing/waymo/create_gpudrive_pickles.py
  function _mp_init (line 16) | def _mp_init():
  function _work_one_chunk (line 32) | def _work_one_chunk(idx, cfg_dict):
  function _run_one_cfg (line 41) | def _run_one_cfg(cfg):
  function main (line 73) | def main(cfg):

FILE: data_processing/waymo/create_waymo_eval_set.py
  function main (line 9) | def main(cfg):

FILE: data_processing/waymo/generate_k_disks_vocabulary.py
  function main (line 18) | def main(cfg):

FILE: data_processing/waymo/generate_waymo_dataset.py
  function poly_gon_and_line (line 27) | def poly_gon_and_line(poly_dict):
  function road_info_except_lane (line 42) | def road_info_except_lane(x_list, road_keys):
  function road_info_lane (line 66) | def road_info_lane(x_dict):
  function get_lane_pairs (line 86) | def get_lane_pairs(engage_lanes):
  function get_engage_lanes (line 133) | def get_engage_lanes(data):
  function get_lane_graph (line 149) | def get_lane_graph(data):
  function process_lanegraph (line 162) | def process_lanegraph(data):
  function _parse_object_state (line 218) | def _parse_object_state(states, final_state):
  function _init_object (line 244) | def _init_object(track):
  function get_objects (line 269) | def get_objects(scenario_list, index):
  function get_road_info (line 289) | def get_road_info(scenario_list, index):
  function collect_data (line 319) | def collect_data(cfg, output_path, files_path, files, chunk):
  function _work_one_chunk (line 386) | def _work_one_chunk(idx, cfg_dict):
  function _run_one_cfg (line 394) | def _run_one_cfg(cfg):
  function main (line 418) | def main(cfg):

FILE: data_processing/waymo/preprocess_dataset_waymo.py
  function _mp_init (line 17) | def _mp_init():
  function _work_one_chunk (line 33) | def _work_one_chunk(idx, cfg_dict):
  function _run_one_cfg (line 42) | def _run_one_cfg(cfg):
  function main (line 64) | def main(cfg):

FILE: datamodules/nuplan/nuplan_datamodule_autoencoder.py
  function worker_init_fn (line 8) | def worker_init_fn(worker_id):
  class NuplanDataModuleAutoEncoder (line 11) | class NuplanDataModuleAutoEncoder(pl.LightningDataModule):
    method __init__ (line 13) | def __init__(self,
    method setup (line 29) | def setup(self, stage):
    method train_dataloader (line 34) | def train_dataloader(self):
    method val_dataloader (line 44) | def val_dataloader(self):

FILE: datamodules/nuplan/nuplan_datamodule_ldm.py
  function worker_init_fn (line 7) | def worker_init_fn(worker_id):
  class NuplanDataModuleLDM (line 10) | class NuplanDataModuleLDM(pl.LightningDataModule):
    method __init__ (line 12) | def __init__(self,
    method setup (line 28) | def setup(self, stage):
    method train_dataloader (line 33) | def train_dataloader(self):
    method val_dataloader (line 43) | def val_dataloader(self):

FILE: datamodules/waymo/waymo_datamodule_autoencoder.py
  function worker_init_fn (line 8) | def worker_init_fn(worker_id):
  class WaymoDataModuleAutoEncoder (line 11) | class WaymoDataModuleAutoEncoder(pl.LightningDataModule):
    method __init__ (line 13) | def __init__(self,
    method setup (line 29) | def setup(self, stage):
    method train_dataloader (line 34) | def train_dataloader(self):
    method val_dataloader (line 44) | def val_dataloader(self):

FILE: datamodules/waymo/waymo_datamodule_ctrl_sim.py
  function worker_init_fn (line 8) | def worker_init_fn(worker_id):
  class WaymoDataModuleCtRLSim (line 11) | class WaymoDataModuleCtRLSim(pl.LightningDataModule):
    method __init__ (line 13) | def __init__(self,
    method setup (line 29) | def setup(self, stage):
    method train_dataloader (line 34) | def train_dataloader(self):
    method val_dataloader (line 44) | def val_dataloader(self):

FILE: datamodules/waymo/waymo_datamodule_ldm.py
  function worker_init_fn (line 7) | def worker_init_fn(worker_id):
  class WaymoDataModuleLDM (line 10) | class WaymoDataModuleLDM(pl.LightningDataModule):
    method __init__ (line 12) | def __init__(self,
    method setup (line 28) | def setup(self, stage):
    method train_dataloader (line 33) | def train_dataloader(self):
    method val_dataloader (line 43) | def val_dataloader(self):

FILE: datasets/nuplan/dataset_autoencoder_nuplan.py
  class NuplanDatasetAutoEncoder (line 26) | class NuplanDatasetAutoEncoder(Dataset):
    method __init__ (line 34) | def __init__(self, cfg: Any, split_name: str = "train", mode: str = "t...
    method get_lane_graph_within_fov (line 68) | def get_lane_graph_within_fov(self, lane_graph: Dict[str, Any]) -> Dic...
    method partition_compact_lane_graph (line 145) | def partition_compact_lane_graph(self, compact_lane_graph: Dict[str, A...
    method extract_lane_graph (line 237) | def extract_lane_graph(
    method extract_agents (line 306) | def extract_agents(self, ego, vehicles, pedestrians, static_objects):
    method get_agents_within_fov (line 397) | def get_agents_within_fov(self, agent_states, agent_types):
    method get_road_points_adj (line 424) | def get_road_points_adj(self, compact_lane_graph):
    method get_partitioned_masks (line 478) | def get_partitioned_masks(self, agents, lanes, a2a_edge_index, l2l_edg...
    method get_data (line 503) | def get_data(self, data, idx):
    method get (line 716) | def get(self, idx: int):
    method len (line 740) | def len(self):
  function main (line 744) | def main(cfg):

FILE: datasets/nuplan/dataset_ldm_nuplan.py
  class NuplanDatasetLDM (line 24) | class NuplanDatasetLDM(Dataset):
    method __init__ (line 25) | def __init__(self, cfg: Any, split_name: str = "train") -> None:
    method get_data (line 47) | def get_data(self, data, idx):
    method get (line 106) | def get(self, idx: int):
    method len (line 117) | def len(self):
  function main (line 121) | def main(cfg):

FILE: datasets/waymo/dataset_autoencoder_waymo.py
  class WaymoDatasetAutoEncoder (line 32) | class WaymoDatasetAutoEncoder(Dataset):
    method __init__ (line 41) | def __init__(self, cfg: Any, split_name: str = "train", mode: str = "t...
    method partition_compact_lane_graph (line 95) | def partition_compact_lane_graph(self, compact_lane_graph: Dict[str, A...
    method normalize_compact_lane_graph (line 201) | def normalize_compact_lane_graph(self, lane_graph: Dict[str, Any], nor...
    method get_lane_graph_within_fov (line 237) | def get_lane_graph_within_fov(self, lane_graph: Dict[str, Any]) -> Dic...
    method get_road_points_adj (line 330) | def get_road_points_adj(
    method get_agents_within_fov (line 416) | def get_agents_within_fov(
    method remove_offroad_agents (line 477) | def remove_offroad_agents(
    method get_partitioned_masks (line 527) | def get_partitioned_masks(
    method get_data (line 587) | def get_data(self,
    method get (line 997) | def get(self, idx: int):
    method len (line 1013) | def len(self):
  function main (line 1017) | def main(cfg):

FILE: datasets/waymo/dataset_ctrl_sim.py
  class CtRLSimDataset (line 30) | class CtRLSimDataset(Dataset):
    method __init__ (line 43) | def __init__(self, cfg, split_name='train'):
    method get_upsampled_and_sd_lanes (line 95) | def get_upsampled_and_sd_lanes(self, compact_lane_graph):
    method remove_offroad_agents (line 112) | def remove_offroad_agents(self, agent_states, agent_types, lanes):
    method rollout_k_disks (line 135) | def rollout_k_disks(self, agent_states):
    method get_ego_collision_rewards (line 246) | def get_ego_collision_rewards(self, agent_states_all):
    method get_last_valid_positions (line 279) | def get_last_valid_positions(self, states):
    method get_agent_mask (line 290) | def get_agent_mask(self, agent_states, normalize_dict, fov=None):
    method compute_rtgs (line 306) | def compute_rtgs(self, rewards):
    method select_closest_max_num_agents (line 327) | def select_closest_max_num_agents(
    method get_normalized_lanes_in_fov (line 394) | def get_normalized_lanes_in_fov(self, lanes, normalize_dict):
    method discretize_rtgs (line 433) | def discretize_rtgs(self, rtgs):
    method collect_state_transitions (line 441) | def collect_state_transitions(self, data):
    method get_data (line 474) | def get_data(self, data, idx):
    method get (line 709) | def get(self, idx):
    method len (line 745) | def len(self):
  function main (line 750) | def main(cfg):

FILE: datasets/waymo/dataset_ldm_waymo.py
  class WaymoDatasetLDM (line 23) | class WaymoDatasetLDM(Dataset):
    method __init__ (line 24) | def __init__(self, cfg: Any, split_name: str = "train") -> None:
    method get_data (line 46) | def get_data(self, data, idx):
    method get (line 105) | def get(self, idx: int):
    method len (line 116) | def len(self):
  function main (line 120) | def main(cfg):

FILE: eval.py
  function generate_simulation_environments (line 18) | def generate_simulation_environments(cfg, cfg_ae, save_dir=None):
  function eval_ldm (line 42) | def eval_ldm(cfg, cfg_ae, save_dir=None):
  function eval_autoencoder (line 79) | def eval_autoencoder(cfg, save_dir=None):
  function main (line 105) | def main(cfg):

FILE: metrics.py
  class Metrics (line 7) | class Metrics():
    method __init__ (line 9) | def __init__(self, cfg):
    method compute_metrics (line 16) | def compute_metrics(self):

FILE: models/ctrl_sim.py
  class CtRLSim (line 11) | class CtRLSim(pl.LightningModule):
    method __init__ (line 13) | def __init__(self, cfg):
    method forward (line 28) | def forward(self, data, eval=False):
    method compute_loss (line 36) | def compute_loss(self, data, preds):
    method training_step (line 99) | def training_step(self, data, batch_idx):
    method validation_step (line 138) | def validation_step(self, data, batch_idx):
    method on_before_optimizer_step (line 165) | def on_before_optimizer_step(self, optimizer):
    method configure_optimizers (line 175) | def configure_optimizers(self):

FILE: models/scenario_dreamer_autoencoder.py
  function worker_init_fn (line 22) | def worker_init_fn(worker_id):
  class ScenarioDreamerAutoEncoder (line 26) | class ScenarioDreamerAutoEncoder(pl.LightningModule):
    method __init__ (line 28) | def __init__(self, cfg):
    method test_dataloader (line 45) | def test_dataloader(self):
    method forward (line 71) | def forward(self, data):
    method _log_losses (line 92) | def _log_losses(self, loss_dict, split='train', batch_size=None):
    method _cache_latents (line 118) | def _cache_latents(self, data):
    method training_step (line 187) | def training_step(self, data, batch_idx):
    method validation_step (line 195) | def validation_step(self, data, batch_idx):
    method test_step (line 229) | def test_step(self, data, batch_idx):
    method on_before_optimizer_step (line 265) | def on_before_optimizer_step(self, optimizer):
    method configure_optimizers (line 273) | def configure_optimizers(self):

FILE: models/scenario_dreamer_ldm.py
  class ScenarioDreamerLDM (line 27) | class ScenarioDreamerLDM(pl.LightningModule):
    method __init__ (line 28) | def __init__(self, cfg, cfg_ae):
    method on_train_start (line 42) | def on_train_start(self):
    method optimizer_step (line 47) | def optimizer_step(self, *args, **kwargs):
    method _log_losses (line 53) | def _log_losses(self, loss_dict, split='train', batch_size=None):
    method training_step (line 79) | def training_step(self, data, batch_idx):
    method validation_step (line 87) | def validation_step(self, data, batch_idx):
    method forward (line 118) | def forward(self,
    method _build_ldm_dset_from_ae_dset_for_inpainting (line 191) | def _build_ldm_dset_from_ae_dset_for_inpainting(self, ae_dset, batch_s...
    method _initialize_pyg_dset (line 294) | def _initialize_pyg_dset(self, mode, num_samples, batch_size, conditio...
    method generate (line 448) | def generate(
    method on_before_optimizer_step (line 515) | def on_before_optimizer_step(self, optimizer):
    method on_save_checkpoint (line 522) | def on_save_checkpoint(self, checkpoint):
    method on_load_checkpoint (line 527) | def on_load_checkpoint(self, checkpoint):
    method configure_optimizers (line 533) | def configure_optimizers(self):

FILE: nn_modules/autoencoder.py
  class ScenarioDreamerEncoder (line 15) | class ScenarioDreamerEncoder(nn.Module):
    method __init__ (line 18) | def __init__(self, cfg):
    method forward (line 75) | def forward(
  class ScenarioDreamerDecoder (line 167) | class ScenarioDreamerDecoder(nn.Module):
    method __init__ (line 170) | def __init__(self, cfg):
    method forward (line 221) | def forward(
  class AutoEncoder (line 301) | class AutoEncoder(nn.Module):
    method __init__ (line 304) | def __init__(self, cfg):
    method loss (line 321) | def loss(self, data):
    method forward_encoder (line 404) | def forward_encoder(self, data, return_stats=False, return_lane_embedd...
    method forward_decoder (line 435) | def forward_decoder(self, agent_latents, lane_latents, data):
    method forward (line 448) | def forward(self, data, return_latents=False, return_lane_embeddings=F...

FILE: nn_modules/ctrl_sim.py
  class CtRLSimMapEncoder (line 8) | class CtRLSimMapEncoder(nn.Module):
    method __init__ (line 10) | def __init__(self, cfg):
    method get_road_pts_mask (line 38) | def get_road_pts_mask(self, roads):
    method forward (line 45) | def forward(self, data):
  class CtRLSimEncoder (line 83) | class CtRLSimEncoder(nn.Module):
    method __init__ (line 85) | def __init__(self, cfg):
    method forward (line 129) | def forward(self, data, eval):
  class CtRLSimDecoder (line 249) | class CtRLSimDecoder(nn.Module):
    method __init__ (line 251) | def __init__(self, cfg):
    method forward (line 291) | def forward(self, data, scene_enc, eval=False):

FILE: nn_modules/dit.py
  class DiT (line 9) | class DiT(nn.Module):
    method __init__ (line 11) | def __init__(self, cfg):
    method initialize_weights (line 58) | def initialize_weights(self):
    method forward (line 109) | def forward(self,

FILE: nn_modules/ldm.py
  class LDM (line 12) | class LDM(nn.Module):
    method __init__ (line 13) | def __init__(self, cfg):
    method predict_start_from_noise (line 59) | def predict_start_from_noise(self, x_t, t, noise):
    method q_posterior (line 67) | def q_posterior(self, x_start, x_t, t):
    method p_mean_variance (line 77) | def p_mean_variance(self, x_agent, x_lane, data, t_agent, t_lane):
    method p_sample (line 101) | def p_sample(self, x_agent, x_lane, data, t_agent, t_lane):
    method p_sample_loop (line 127) | def p_sample_loop(
    method forward (line 202) | def forward(self, data, mode='initial_scene'):
    method q_sample (line 217) | def q_sample(self, x_start, t, noise=None):
    method p_losses (line 227) | def p_losses(
    method loss (line 267) | def loss(self, data):

FILE: policies/idm_policy.py
  class IDMPolicy (line 17) | class IDMPolicy:
    method __init__ (line 29) | def __init__(self, cfg, env):
    method reset (line 57) | def reset(self, obs):
    method update_running_statistics (line 66) | def update_running_statistics(self, data_dict, scenario_dict, scene_co...
    method compute_metrics (line 152) | def compute_metrics(self):
    method act (line 200) | def act(self, obs, is_planner=True):
    method select_action (line 204) | def select_action(self, obs, is_planner=True):
    method _plot_lanes (line 247) | def _plot_lanes(self, lane_dict, agent_states, filename="lane_graph", ...
    method _plot_ego_path (line 290) | def _plot_ego_path(self, ego_path_polygon, agent_occupancies, ego_id):
    method _get_lane_data (line 309) | def _get_lane_data(self):
    method _get_closest_lane_point_from_position (line 330) | def _get_closest_lane_point_from_position(self, lane_points, position):
    method _get_closest_lane_from_position (line 336) | def _get_closest_lane_from_position(self, position, lane_geometries, m...
    method _compute_all_agent_lanes (line 350) | def _compute_all_agent_lanes(self, agent_states):
    method _get_ego_path (line 374) | def _get_ego_path(self, agent_id):
    method _initialize_ego_path (line 382) | def _initialize_ego_path(self, actor_id):
    method _compute_agent_occupancies (line 447) | def _compute_agent_occupancies(self, agent_states):
    method _compute_leading_agents_occ (line 464) | def _compute_leading_agents_occ(self, agent_states):
    method _get_accelerations (line 547) | def _get_accelerations(self, agent_states):
    method _get_steerings (line 598) | def _get_steerings(self, agent_states):
    method _get_next_states (line 648) | def _get_next_states(self, agent_states):
    method _get_next_path_position (line 716) | def _get_next_path_position(self, ego_path, current_lane, current_lane...

FILE: policies/rl_policy.py
  class RLPolicy (line 3) | class RLPolicy:
    method __init__ (line 4) | def __init__(self, cfg):
    method act (line 14) | def act(self, obs):

FILE: run_simulation.py
  class PolicyEvaluator (line 13) | class PolicyEvaluator:
    method __init__ (line 15) | def __init__(self, cfg, policy, env):
    method reset (line 23) | def reset(self):
    method update_running_statistics (line 35) | def update_running_statistics(self, info):
    method compute_metrics (line 43) | def compute_metrics(self):
    method evaluate_policy (line 55) | def evaluate_policy(self):
  function main (line 98) | def main(cfg):

FILE: simulator.py
  class Simulator (line 40) | class Simulator:
    method __init__ (line 49) | def __init__(self, cfg):
    method load_initial_scene (line 78) | def load_initial_scene(self, i):
    method _find_invalid_new_agents (line 106) | def _find_invalid_new_agents(
    method step (line 174) | def step(self, action):
    method _get_observation (line 350) | def _get_observation(self):
    method _update_viz_state (line 399) | def _update_viz_state(self, num_route_points=30):
    method initialize_data_dict (line 433) | def initialize_data_dict(self):
    method reset (line 476) | def reset(self, i):
    method render_state (line 536) | def render_state(self, name, movie_path):
  class CtRLSimBehaviourModel (line 577) | class CtRLSimBehaviourModel:
    method __init__ (line 582) | def __init__(self,
    method update_running_statistics (line 627) | def update_running_statistics(
    method compute_metrics (line 819) | def compute_metrics(self):
    method reset (line 844) | def reset(self, num_agents):
    method update_state (line 853) | def update_state(self, data_dict):
    method get_motion_data (line 878) | def get_motion_data(self, data_dict):
    method get_tilt_logits (line 1030) | def get_tilt_logits(self, tilt):
    method process_predicted_rtg (line 1037) | def process_predicted_rtg(
    method predict (line 1074) | def predict(self, motion_datas, data_dict, correspondences):
    method step (line 1129) | def step(self, data_dict):

FILE: train.py
  function train_ctrl_sim (line 21) | def train_ctrl_sim(cfg, save_dir=None):
  function train_ldm (line 70) | def train_ldm(cfg, cfg_ae, save_dir=None):
  function train_autoencoder (line 134) | def train_autoencoder(cfg, save_dir=None):
  function main (line 186) | def main(cfg):

FILE: utils/collision_helpers.py
  function compute_corners (line 4) | def compute_corners(positions, headings, lengths, widths):
  function get_axes (line 36) | def get_axes(vertices):
  function is_colliding (line 50) | def is_colliding(poly1, poly2):
  function batched_collision_checker (line 77) | def batched_collision_checker(ego_state, agent_states):
  function compute_collision_states_one_scene (line 112) | def compute_collision_states_one_scene(vehicles):

FILE: utils/data_container.py
  class CtRLSimData (line 7) | class CtRLSimData(HeteroData):
    method __inc__ (line 9) | def __inc__(self, key, value, store):
  class ScenarioDreamerData (line 13) | class ScenarioDreamerData(HeteroData):
    method __inc__ (line 15) | def __inc__(self, key, value, store):
  function get_batches (line 29) | def get_batches(data):
  function get_features (line 39) | def get_features(data):
  function get_edge_indices (line 65) | def get_edge_indices(data):
  function get_encoder_edge_indices (line 80) | def get_encoder_edge_indices(data):

FILE: utils/data_helpers.py
  function extract_raw_waymo_data (line 10) | def extract_raw_waymo_data(agents_data: List[Dict[str, Any]]) -> Tuple[n...
  function add_batch_dim (line 68) | def add_batch_dim(arr):
  function get_object_type_onehot_waymo (line 71) | def get_object_type_onehot_waymo(agent_type):
  function get_lane_connection_type_onehot_waymo (line 76) | def get_lane_connection_type_onehot_waymo(lane_connection_type):
  function get_lane_connection_type_onehot_nuplan (line 81) | def get_lane_connection_type_onehot_nuplan(lane_connection_type):
  function get_lane_type_onehot_nuplan (line 86) | def get_lane_type_onehot_nuplan(lane_type):
  function get_object_type_onehot_nuplan (line 91) | def get_object_type_onehot_nuplan(agent_type):
  function reorder_indices (line 97) | def reorder_indices(
  function modify_agent_states (line 312) | def modify_agent_states(agent_states):
  function normalize_scene (line 337) | def normalize_scene(
  function unnormalize_scene (line 379) | def unnormalize_scene(
  function randomize_indices (line 421) | def randomize_indices(
  function normalize_latents (line 455) | def normalize_latents(
  function unnormalize_latents (line 469) | def unnormalize_latents(
  function reparameterize (line 483) | def reparameterize(mu, log_var):
  function sample_latents (line 497) | def sample_latents(
  function convert_batch_to_scenarios (line 525) | def convert_batch_to_scenarios(data, batch_size, batch_idx, cache_dir, c...

FILE: utils/diffusion_helpers.py
  function extract (line 4) | def extract(a, t, x_shape):
  function cosine_beta_schedule (line 10) | def cosine_beta_schedule(timesteps, s=0.008, dtype=torch.float32):

FILE: utils/dit_layers.py
  function modulate (line 11) | def modulate(x, shift, scale):
  function _ntuple (line 15) | def _ntuple(n):
  function get_1d_sincos_pos_embed_from_grid (line 29) | def get_1d_sincos_pos_embed_from_grid(embed_dim, pos):
  class AttentionLayerDiT (line 51) | class AttentionLayerDiT(MessagePassing):
    method __init__ (line 53) | def __init__(self,
    method message (line 82) | def message(self, q_i, k_j, v_j, index, ptr):
    method update (line 88) | def update(self, inputs):
    method _attn_block (line 92) | def _attn_block(self, x_src, x_dst, edge_index):
    method forward (line 104) | def forward(self, x, edge_index):
  class Mlp (line 109) | class Mlp(nn.Module):
    method __init__ (line 113) | def __init__(
    method forward (line 138) | def forward(self, x):
  class LabelEmbedder (line 148) | class LabelEmbedder(nn.Module):
    method __init__ (line 153) | def __init__(self, num_classes, hidden_size, dropout_prob):
    method token_drop (line 160) | def token_drop(self, labels, force_drop_ids=None):
    method forward (line 171) | def forward(self, labels, train, force_drop_ids=None):
  class TimestepEmbedder (line 180) | class TimestepEmbedder(nn.Module):
    method __init__ (line 185) | def __init__(self, hidden_size, frequency_embedding_size=256):
    method timestep_embedding (line 195) | def timestep_embedding(t, dim, max_period=10000):
    method forward (line 215) | def forward(self, t):
  class DiTBlock (line 221) | class DiTBlock(nn.Module):
    method __init__ (line 226) | def __init__(self, hidden_size, num_heads, dropout, mlp_ratio=4.0, **b...
    method forward (line 239) | def forward(self, x, c, edge_index):
  class FactorizedDiTBlock (line 248) | class FactorizedDiTBlock(nn.Module):
    method __init__ (line 252) | def __init__(
    method forward (line 284) | def forward(
  class FinalLayer (line 315) | class FinalLayer(nn.Module):
    method __init__ (line 320) | def __init__(self, hidden_size, latent_size):
    method forward (line 329) | def forward(self, x, c):
  class TwoLayerResMLP (line 336) | class TwoLayerResMLP(nn.Module):
    method __init__ (line 337) | def __init__(self, input_dim, hidden_dim):
    method forward (line 349) | def forward(self, x):

FILE: utils/geometry.py
  function rotate_and_normalize_angles (line 3) | def rotate_and_normalize_angles(current_angles, rotation_angle):
  function normalize_angle (line 11) | def normalize_angle(angle):
  function dot_product_2d (line 15) | def dot_product_2d(a, b):
  function cross_product_2d (line 20) | def cross_product_2d(a, b):
  function make_2d_rotation_matrix (line 24) | def make_2d_rotation_matrix(angle_in_radians):
  function apply_se2_transform (line 30) | def apply_se2_transform(coordinates, translation, yaw):
  function radians_to_degrees (line 43) | def radians_to_degrees(radians):
  function normalize_lanes_and_agents (line 48) | def normalize_lanes_and_agents(agents, lanes, normalize_dict, dataset):
  function normalize_agents (line 79) | def normalize_agents(agent_states, normalize_dict, offset=np.pi/2):
  function normalize_lanes (line 102) | def normalize_lanes(lanes, normalize_dict, offset=np.pi/2):

FILE: utils/gpudrive_helpers.py
  class EntityType (line 58) | class EntityType(Enum):
  class MapType (line 75) | class MapType(Enum):
  class MapVector2 (line 103) | class MapVector2:
    method __init__ (line 105) | def __init__(self, x: float = 0.0, y: float = 0.0):
  class MapVector3 (line 110) | class MapVector3:
    method __init__ (line 112) | def __init__(self, x: float = 0.0, y: float = 0.0, z: float = 0.0):
  class VehicleSize (line 119) | class VehicleSize:
    method __init__ (line 121) | def __init__(self, length: float = 0.0, width: float = 0.0, height: fl...
  class MetaData (line 128) | class MetaData:
    method __init__ (line 130) | def __init__(self):
  class MapObject (line 138) | class MapObject:
    method __init__ (line 140) | def __init__(self):
  class MapRoad (line 159) | class MapRoad:
    method __init__ (line 161) | def __init__(self):
  class Map (line 171) | class Map:
    method __init__ (line 173) | def __init__(self):
  function distance_2d (line 186) | def distance_2d(p1: MapVector2, p2: MapVector2) -> float:
  function get_ego_state (line 192) | def get_ego_state(ego_state):
  function get_partner_obs (line 242) | def get_partner_obs(
  function get_map_obs (line 348) | def get_map_obs(
  function get_route_obs (line 523) | def get_route_obs(
  class ForwardKinematics (line 624) | class ForwardKinematics:
    method __init__ (line 628) | def __init__(
    method forward_kinematics (line 646) | def forward_kinematics(self, action):
  function get_action_value_tensor (line 696) | def get_action_value_tensor() -> torch.Tensor:
  function _normalize_min_max (line 741) | def _normalize_min_max(tensor, min_val, max_val):
  function _angle_add (line 755) | def _angle_add(angle1: float, angle2: float) -> float:
  function from_json_MapVector2 (line 762) | def from_json_MapVector2(j: dict) -> MapVector2:
  function from_json_MapObject (line 773) | def from_json_MapObject(j: dict) -> MapObject:
  function from_json_MapRoad (line 856) | def from_json_MapRoad(j: dict, polylineReductionThreshold: float = 0.0) ...
  function calc_mean (line 992) | def calc_mean(j: dict) -> Tuple[float, float]:
  function make_road_edge (line 1035) | def make_road_edge(road_init, j, world_mean) -> dict:
  function create_road_edges (line 1132) | def create_road_edges(data, world_mean, max_num_edges=10000) -> List[dict]:
  function from_json_Map (line 1209) | def from_json_Map(j: dict, polylineReductionThreshold: float = 0.0) -> d...
  function load_policy (line 1436) | def load_policy(path_to_cpt, model_name, device):
  function log_prob (line 1462) | def log_prob(logits, value):
  function entropy (line 1470) | def entropy(logits):
  function sample_logits (line 1478) | def sample_logits(
  function layer_init (line 1519) | def layer_init(layer, std=np.sqrt(2), bias_const=0.0):
  class NeuralNet (line 1528) | class NeuralNet(
    method __init__ (line 1534) | def __init__(
    method encode_observations (line 1631) | def encode_observations(self, observation):
    method forward (line 1664) | def forward(self, obs, action=None, deterministic=False):
    method unpack_obs (line 1677) | def unpack_obs(self, obs_flat):

FILE: utils/inpainting_helpers.py
  function normalize_and_crop_scene (line 15) | def normalize_and_crop_scene(cond_d, new_d, normalize_dict, cfg, dataset...
  function sample_num_lanes_agents_inpainting (line 196) | def sample_num_lanes_agents_inpainting(

FILE: utils/k_disks_helpers.py
  function compute_k_disks_vocabulary (line 5) | def compute_k_disks_vocabulary(state_transitions, vocab_size, l, w, eps):
  function transform_box_corners_from_vocab (line 54) | def transform_box_corners_from_vocab(box_coords, V):
  function get_local_state_transition (line 84) | def get_local_state_transition(current_state, next_state):
  function transform_box_corners_from_local_state (line 106) | def transform_box_corners_from_local_state(box_coords, local_state_trans...
  function get_global_next_state (line 133) | def get_global_next_state(global_states, local_transitions):
  function forward_k_disks (line 162) | def forward_k_disks(states, actions, vocab, delta_t, exists):
  function inverse_k_disks (line 184) | def inverse_k_disks(states, next_states, vocab):

FILE: utils/lane_graph_helpers.py
  function get_compact_lane_graph (line 5) | def get_compact_lane_graph(data):
  function find_lane_groups (line 130) | def find_lane_groups(pre_pairs, suc_pairs):
  function find_lane_group_id (line 187) | def find_lane_group_id(lane_id, lane_groups):
  function resample_polyline (line 194) | def resample_polyline(points, num_points=20):
  function resample_polyline_every (line 213) | def resample_polyline_every(polyline, every=1.5):
  function resample_lanes (line 234) | def resample_lanes(lanes, num_points):
  function resample_lanes_with_mask (line 243) | def resample_lanes_with_mask(lanes, lanes_mask, num_points):
  function adjacency_matrix_to_adjacency_list (line 254) | def adjacency_matrix_to_adjacency_list(lane_graph_adj):
  function estimate_heading (line 272) | def estimate_heading(positions):
  function find_closest_lane (line 287) | def find_closest_lane(lanes, pos):

FILE: utils/layers.py
  class ResidualMLP (line 7) | class ResidualMLP(nn.Module):
    method __init__ (line 21) | def __init__(self, input_dim: int, hidden_dim: int,
    method forward (line 44) | def forward(self, x: torch.Tensor) -> torch.Tensor:
  class MLP (line 62) | class MLP(nn.Module):
    method __init__ (line 65) | def __init__(self, input_dim, hidden_dim, output_dim):
    method forward (line 75) | def forward(self, x):
  class AttentionLayer (line 79) | class AttentionLayer(MessagePassing):
    method __init__ (line 81) | def __init__(self,
    method forward (line 128) | def forward(self, x, r, edge_index):
    method message (line 142) | def message(self, q_i, k_j, v_j, r, index, ptr):
    method update (line 151) | def update(self, inputs, x_dst):
    method _attn_block (line 156) | def _attn_block(self, x_src, x_dst, r, edge_index):
    method _ff_block (line 163) | def _ff_block(self, x):
  class EdgeFeatureUpdate (line 167) | class EdgeFeatureUpdate(MessagePassing):
    method __init__ (line 169) | def __init__(self, node_hidden_dim, edge_hidden_dim):
    method forward (line 178) | def forward(self, x, edge_index, edge_attr):
    method edge_update (line 181) | def edge_update(self, x_i, x_j, edge_attr):
  class AutoEncoderFactorizedAttentionBlock (line 190) | class AutoEncoderFactorizedAttentionBlock(nn.Module):
    method __init__ (line 192) | def __init__(self,
    method forward (line 246) | def forward(self, agent_embeddings,

FILE: utils/losses.py
  class GeometricLoss (line 5) | class GeometricLoss(nn.Module):
    method __init__ (line 6) | def __init__(self, mean_dim=1, apply_mean=True):
    method forward (line 11) | def forward(self, pred, targ, batch):
  class GeometricCrossEntropy (line 31) | class GeometricCrossEntropy(GeometricLoss):
    method _loss (line 33) | def _loss(self, pred, targ):
  class GeometricHuber (line 36) | class GeometricHuber(GeometricLoss):
    method _loss (line 38) | def _loss(self, pred, targ):
  class GeometricL2 (line 41) | class GeometricL2(GeometricLoss):
    method _loss (line 43) | def _loss(self, pred, targ):
  class GeometricL1 (line 46) | class GeometricL1(GeometricLoss):
    method _loss (line 48) | def _loss(self, pred, targ):
  class GeometricKL (line 51) | class GeometricKL(GeometricLoss):
    method _loss (line 53) | def _loss(self, mu, log_var):

FILE: utils/metrics_helpers.py
  function compute_frechet_distance (line 12) | def compute_frechet_distance(X1, X2, apply_sqrt=True):
  function jsd (line 39) | def jsd(sim, gt, clip_min, clip_max, bin_size):
  function compute_vehicle_circles (line 57) | def compute_vehicle_circles(xy_position, heading, length, width):
  function compute_collision_rate (line 74) | def compute_collision_rate(samples):
  function get_compact_lane_graph (line 121) | def get_compact_lane_graph(G, lanes, num_points_per_lane=20):
  function _get_sledge_lane_graph_nuplan (line 184) | def _get_sledge_lane_graph_nuplan(data):
  function get_networkx_lane_graph_without_traffic_lights (line 303) | def get_networkx_lane_graph_without_traffic_lights(data):
  function get_networkx_lane_graph (line 350) | def get_networkx_lane_graph(data):
  function convert_data_to_unified_format (line 378) | def convert_data_to_unified_format(data, dataset_name):
  function get_lane_length (line 412) | def get_lane_length(positions):
  function compute_route_length (line 420) | def compute_route_length(samples):
  function compute_endpoint_dist (line 466) | def compute_endpoint_dist(samples):
  function get_keypoint_G (line 491) | def get_keypoint_G(G, lanes):
  function get_num_keypoints (line 537) | def get_num_keypoints(G):
  function get_degree_keypoints (line 542) | def get_degree_keypoints(G):
  function urban_planning_reach_and_convenience (line 549) | def urban_planning_reach_and_convenience(G_edges):
  function get_onroad_vehicles (line 566) | def get_onroad_vehicles(vehicles, lanes, tol=1.5):
  function get_nearest_dists (line 577) | def get_nearest_dists(vehicles):
  function get_lateral_devs (line 587) | def get_lateral_devs(vehicles, lanes):
  function get_angular_devs (line 597) | def get_angular_devs(vehicles, lanes):
  function get_lengths (line 627) | def get_lengths(vehicles):
  function get_widths (line 632) | def get_widths(vehicles):
  function get_speeds (line 637) | def get_speeds(vehicles):
  function compute_urban_planning_metrics (line 642) | def compute_urban_planning_metrics(samples, gt_samples):
  function compute_jsd_metrics (line 703) | def compute_jsd_metrics(samples, gt_samples):
  function compute_lane_metrics (line 774) | def compute_lane_metrics(samples, gt_samples):
  function compute_agent_metrics (line 793) | def compute_agent_metrics(samples, gt_samples):
  function compute_sim_agent_jsd_metrics (line 809) | def compute_sim_agent_jsd_metrics(

FILE: utils/pyg_helpers.py
  function get_edge_index_bipartite (line 3) | def get_edge_index_bipartite(num_src, num_dst):
  function get_edge_index_complete_graph (line 15) | def get_edge_index_complete_graph(graph_size):
  function get_indices_within_scene (line 23) | def get_indices_within_scene(batch):

FILE: utils/sim_env_helpers.py
  function postprocess_sim_env (line 23) | def postprocess_sim_env(
  function get_route_lane_indices (line 207) | def get_route_lane_indices(
  function clean_up_scene (line 301) | def clean_up_scene(data, dataset, mode='initial_scene', endpoint_thresho...
  function check_scene_validity (line 476) | def check_scene_validity(data, dataset):
  function check_scene_validity_inpainting (line 518) | def check_scene_validity_inpainting(data, dataset, heading_tolerance=np....
  function sample_route (line 590) | def sample_route(d, dataset, heading_tolerance=np.pi/3, num_points_in_ro...
  function get_default_route_center_yaw (line 654) | def get_default_route_center_yaw(dataset):
  function generate_simulation_environments (line 662) | def generate_simulation_environments(model, cfg, save_dir):
  function _transform_scene (line 851) | def _transform_scene(agents, lanes, route, transform_dict, dataset):
  function _extend_simulation_environment (line 885) | def _extend_simulation_environment(current_env, new_tile, target_route_l...
  function _sample_candidate (line 995) | def _sample_candidate(candidates, dataset):
  function _near_border (line 1017) | def _near_border(pos, fov=64, threshold=1):
  function _near_partition (line 1024) | def _near_partition(pos, dataset, threshold=2.5):
  function _valid_route_end (line 1030) | def _valid_route_end(lane_id, lane, fov=64, border_threshold=1, heading_...
  function _transform_corners (line 1055) | def _transform_corners(corners, transform_dict, dataset):
  function _check_overlapping_tiles (line 1071) | def _check_overlapping_tiles(new_tile_corners, existing_tiles, ignore_la...

FILE: utils/sim_helpers.py
  function ego_completed_route (line 9) | def ego_completed_route(ego_state, route, dist_threshold=2.0):
  function ego_collided (line 18) | def ego_collided(ego_state, agent_states, agent_scale=1.0):
  function ego_off_route (line 38) | def ego_off_route(ego_state, route, off_route_threshold=5.0):
  function ego_progress (line 49) | def ego_progress(ego_state, route):
  function normalize_route (line 60) | def normalize_route(route, normalize_dict, offset=np.pi/2):
  function get_ego_route (line 74) | def get_ego_route(compact_lane_graph, lanes, ego_trajectory, dist_thresh...

FILE: utils/sledge_helpers.py
  function calculate_progress (line 5) | def calculate_progress(path):
  function get_path_length (line 19) | def get_path_length(path):
  function interpolate_path (line 23) | def interpolate_path(distances, length, progress, states_se2_array, as_a...
  function coords_in_frame (line 40) | def coords_in_frame(coords, frame):
  function find_consecutive_true_indices (line 55) | def find_consecutive_true_indices(mask):
  function pixel_in_frame (line 74) | def pixel_in_frame(pixel, pixel_frame):
  function coords_to_pixel (line 90) | def coords_to_pixel(coords, frame, pixel_size):

FILE: utils/torch_helpers.py
  function from_numpy (line 4) | def from_numpy(data):

FILE: utils/train_helpers.py
  function create_lambda_lr_cosine (line 13) | def create_lambda_lr_cosine(cfg):
  function create_lambda_lr_linear (line 24) | def create_lambda_lr_linear(cfg):
  function create_lambda_lr_constant (line 35) | def create_lambda_lr_constant(cfg):
  function weight_init (line 43) | def weight_init(m):
  function cache_latent_stats (line 112) | def cache_latent_stats(cfg):
  function set_latent_stats (line 173) | def set_latent_stats(cfg):
  function get_causal_mask (line 191) | def get_causal_mask(cfg, num_timesteps, num_types):

FILE: utils/viz.py
  function plot_scene (line 12) | def plot_scene(
  function plot_lane_graph (line 183) | def plot_lane_graph(
  function visualize_batch (line 260) | def visualize_batch(num_samples,
  function plot_k_disks_vocabulary (line 349) | def plot_k_disks_vocabulary(V, png_path, dpi=1000):
  function render_state (line 362) | def render_state(
  function generate_video (line 518) | def generate_video(name, output_dir, delete_images=False):

Condensed preview: 274 files, each showing path, character count, and a content snippet. The full structured content is 57,840K chars.

[
  {
    "path": ".gitignore",
    "chars": 518,
    "preview": "slurm_logs/\n*.out\n__pycache__/\n*/__pycache__/\nlightning_logs/\nviz_*/\nmovies*/\nmetadata/simulation_environment_datasets/s"
  },
  {
    "path": ".gitmodules",
    "chars": 104,
    "preview": "[submodule \"gpudrive\"]\n\tpath = gpudrive\n\turl = https://github.com/RLuke22/gpudrive-scenario-dreamer.git\n"
  },
  {
    "path": "README.md",
    "chars": 44288,
    "preview": "# Official Repository for Scenario Dreamer\n\n<p align=\"left\">\n<a href=\"https://arxiv.org/abs/2503.22496\" alt=\"arXiv\">\n   "
  },
  {
    "path": "cfgs/config.py",
    "chars": 1295,
    "preview": "from pathlib import Path\nimport os\n\n# 1.  Pull from user’s shell if it exists  ──────────────────────────\n#    $ export "
  },
  {
    "path": "cfgs/config.yaml",
    "chars": 4516,
    "preview": "scratch_root: ${oc.env:SCRATCH_ROOT} # scratch root directory, used for datasets and checkpoints\ndataset_root: ${oc.env:"
  },
  {
    "path": "cfgs/datamodule/base.yaml",
    "chars": 263,
    "preview": "train_batch_size: null # train batch size (per-GPU)\nval_batch_size: null # validation batch size (per-GPU)\nnum_workers: "
  },
  {
    "path": "cfgs/datamodule/nuplan_autoencoder.yaml",
    "chars": 167,
    "preview": "defaults:\n- base\n\n_target_: datamodules.nuplan.nuplan_datamodule_autoencoder.NuplanDataModuleAutoEncoder\n\ntrain_batch_si"
  },
  {
    "path": "cfgs/datamodule/nuplan_ctrl_sim.yaml",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "cfgs/datamodule/nuplan_ldm.yaml",
    "chars": 149,
    "preview": "defaults:\n- base\n\n_target_: datamodules.nuplan.nuplan_datamodule_ldm.NuplanDataModuleLDM\n\ntrain_batch_size: 256\nval_batc"
  },
  {
    "path": "cfgs/datamodule/waymo_autoencoder.yaml",
    "chars": 164,
    "preview": "defaults:\n- base\n\n_target_: datamodules.waymo.waymo_datamodule_autoencoder.WaymoDataModuleAutoEncoder\n\ntrain_batch_size:"
  },
  {
    "path": "cfgs/datamodule/waymo_ctrl_sim.yaml",
    "chars": 154,
    "preview": "defaults: \n- base\n\n_target_: datamodules.waymo.waymo_datamodule_ctrl_sim.WaymoDataModuleCtRLSim\n\ntrain_batch_size: 16\nva"
  },
  {
    "path": "cfgs/datamodule/waymo_ldm.yaml",
    "chars": 146,
    "preview": "defaults:\n- base\n\n_target_: datamodules.waymo.waymo_datamodule_ldm.WaymoDataModuleLDM\n\ntrain_batch_size: 256\nval_batch_s"
  },
  {
    "path": "cfgs/dataset/nuplan_autoencoder.yaml",
    "chars": 632,
    "preview": "defaults:\n- nuplan_base\n\nsledge_raw_dataset_path: ${dataset_root}/scenario_dreamer_nuplan/sledge_raw # we use the same p"
  },
  {
    "path": "cfgs/dataset/nuplan_base.yaml",
    "chars": 1413,
    "preview": "# From https://github.com/motional/nuplan-devkit/blob/cd3fd8d3d0c4d390fcb74d05fd56f92d9e0c366b/nuplan/common/actor_state"
  },
  {
    "path": "cfgs/dataset/nuplan_ctrl_sim.yaml",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "cfgs/dataset/nuplan_ldm.yaml",
    "chars": 743,
    "preview": "defaults:\n- nuplan_base\n\ndataset_path: ${dataset_root}/scenario_dreamer_autoencoder_latents_nuplan # path to dataset of "
  },
  {
    "path": "cfgs/dataset/waymo_autoencoder.yaml",
    "chars": 720,
    "preview": "defaults:\n- waymo_base\n\ndataset_path: ${dataset_root}/scenario_dreamer_waymo # path to extracted waymo dataset that will"
  },
  {
    "path": "cfgs/dataset/waymo_base.yaml",
    "chars": 1028,
    "preview": "dataset_path: null\nmax_num_agents: 30 # maximum number of agents in the FOV, including the ego vehicle\nmax_num_lanes: 10"
  },
  {
    "path": "cfgs/dataset/waymo_ctrl_sim.yaml",
    "chars": 2620,
    "preview": "preprocess: True # get data from preprocessed files if True, otherwise write preprocessed data to disk (you can only tra"
  },
  {
    "path": "cfgs/dataset/waymo_ldm.yaml",
    "chars": 741,
    "preview": "defaults:\n- waymo_base\n\ndataset_path: ${dataset_root}/scenario_dreamer_autoencoder_latents_waymo # path to dataset of la"
  },
  {
    "path": "cfgs/dataset_name/nuplan.yaml",
    "chars": 31,
    "preview": "# placeholder file\nname: nuplan"
  },
  {
    "path": "cfgs/dataset_name/waymo.yaml",
    "chars": 30,
    "preview": "# placeholder file\nname: waymo"
  },
  {
    "path": "cfgs/eval/base.yaml",
    "chars": 169,
    "preview": "seed: 0 # random seed for reproducibility\nsave_dir: ${scratch_root}/checkpoints/ # save directory for evaluation results"
  },
  {
    "path": "cfgs/eval/nuplan_autoencoder.yaml",
    "chars": 658,
    "preview": "defaults:\n- base\n\nrun_name: scenario_dreamer_autoencoder_nuplan # default run name for evaluation\n\nnum_samples_to_visual"
  },
  {
    "path": "cfgs/eval/nuplan_ldm.yaml",
    "chars": 1947,
    "preview": "defaults:\n- base\n\nrun_name: scenario_dreamer_ldm_base_nuplan # default run name for evaluation\n\nmode: initial_scene # or"
  },
  {
    "path": "cfgs/eval/waymo_autoencoder.yaml",
    "chars": 934,
    "preview": "defaults:\n- base\n\nrun_name: scenario_dreamer_autoencoder_waymo # default run name for evaluation\n\nnum_samples_to_visuali"
  },
  {
    "path": "cfgs/eval/waymo_ldm.yaml",
    "chars": 1929,
    "preview": "defaults:\n- base\n\nrun_name: scenario_dreamer_ldm_base_waymo # default run name for evaluation\n\nmode: initial_scene # or "
  },
  {
    "path": "cfgs/model/autoencoder.yaml",
    "chars": 1590,
    "preview": "# architecture configurations\nhidden_dim: 512 # lane hidden dimension\nnum_encoder_blocks: 2 # number of factorized encod"
  },
  {
    "path": "cfgs/model/ldm.yaml",
    "chars": 1045,
    "preview": "# architecture parameters\nautoencoder_run_name: null # Name of the autoencoder run\nautoencoder_path: null # Path to the "
  },
  {
    "path": "cfgs/model/nuplan_autoencoder.yaml",
    "chars": 182,
    "preview": "defaults:\n- autoencoder\n\nlane_conn_attr: ${ae.dataset.num_lane_connection_types} # number of lane connection types\nnum_l"
  },
  {
    "path": "cfgs/model/nuplan_ctrl_sim.yaml",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "cfgs/model/nuplan_ldm.yaml",
    "chars": 269,
    "preview": "defaults:\n- ldm\n\n# architecture parameters\nautoencoder_run_name: scenario_dreamer_autoencoder_nuplan # Name of the autoe"
  },
  {
    "path": "cfgs/model/waymo_autoencoder.yaml",
    "chars": 182,
    "preview": "defaults:\n- autoencoder\n\nlane_conn_attr: ${ae.dataset.num_lane_connection_types} # number of lane connection types\nnum_l"
  },
  {
    "path": "cfgs/model/waymo_ctrl_sim.yaml",
    "chars": 978,
    "preview": "hidden_dim: 256 # ctrl-sim model hidden dim\nmap_attr: 3 # number of map attributes (x,y,existence)\nnum_road_types: 1 # n"
  },
  {
    "path": "cfgs/model/waymo_ldm.yaml",
    "chars": 268,
    "preview": "defaults:\n- ldm\n\n# architecture parameters\nautoencoder_run_name: scenario_dreamer_autoencoder_waymo # Name of the autoen"
  },
  {
    "path": "cfgs/sim/base.yaml",
    "chars": 1735,
    "preview": "seed: 0 # random seed for reproducibility\nmode: scenario_dreamer # [waymo_ctrl_sim, waymo_log_replay, scenario_dreamer]\n"
  },
  {
    "path": "cfgs/sim/scenario_dreamer_100m.yaml",
    "chars": 674,
    "preview": "defaults:\n- base\n\nmode: scenario_dreamer # [waymo_ctrl_sim, waymo_log_replay, scenario_dreamer]\ndataset_path: ${project_"
  },
  {
    "path": "cfgs/sim/scenario_dreamer_100m_adv.yaml",
    "chars": 729,
    "preview": "defaults:\n- base\n\nmode: scenario_dreamer # [waymo_ctrl_sim, waymo_log_replay, scenario_dreamer]\ndataset_path: ${project_"
  },
  {
    "path": "cfgs/sim/scenario_dreamer_55m.yaml",
    "chars": 671,
    "preview": "defaults:\n- base\n\nmode: scenario_dreamer # [waymo_ctrl_sim, waymo_log_replay, scenario_dreamer]\ndataset_path: ${project_"
  },
  {
    "path": "cfgs/sim/waymo_ctrl_sim.yaml",
    "chars": 645,
    "preview": "defaults:\n- base\n\nmode: waymo_ctrl_sim # [waymo_ctrl_sim, waymo_log_replay, scenario_dreamer]\ndataset_path: ${project_ro"
  },
  {
    "path": "cfgs/sim/waymo_log_replay.yaml",
    "chars": 647,
    "preview": "defaults:\n- base\n\nmode: waymo_log_replay # [waymo_ctrl_sim, waymo_log_replay, scenario_dreamer]\ndataset_path: ${project_"
  },
  {
    "path": "cfgs/train/base.yaml",
    "chars": 1472,
    "preview": "### training configurations\nseed: 0 # training seed for reproducibility\nsave_dir: ${scratch_root}/checkpoints/ # directo"
  },
  {
    "path": "cfgs/train/nuplan_autoencoder.yaml",
    "chars": 414,
    "preview": "defaults:\n- base\n\nrun_name: scenario_dreamer_autoencoder_nuplan # from base.yaml\nmax_steps: 357450 # from base.yaml\nwarm"
  },
  {
    "path": "cfgs/train/nuplan_ctrl_sim.yaml",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "cfgs/train/nuplan_ldm.yaml",
    "chars": 803,
    "preview": "defaults:\n- base\n\nrun_name: scenario_dreamer_ldm_base_nuplan # from base.yaml\nlr_schedule: constant # from base.yaml\nsav"
  },
  {
    "path": "cfgs/train/waymo_autoencoder.yaml",
    "chars": 413,
    "preview": "defaults:\n- base\n\nrun_name: scenario_dreamer_autoencoder_waymo # from base.yaml\nmax_steps: 350000 # from base.yaml\nwarmu"
  },
  {
    "path": "cfgs/train/waymo_ctrl_sim.yaml",
    "chars": 220,
    "preview": "defaults:\n- base\n\nrun_name: ctrl_sim_waymo # from base.yaml\ndevices: 4 # from base.yaml\nwarmup_steps: 500 # from base.ya"
  },
  {
    "path": "cfgs/train/waymo_ldm.yaml",
    "chars": 802,
    "preview": "defaults:\n- base\n\nrun_name: scenario_dreamer_ldm_base_waymo # from base.yaml\nlr_schedule: constant # from base.yaml\nsave"
  },
  {
    "path": "data_processing/nuplan/generate_nuplan_dataset.py",
    "chars": 5399,
    "preview": "from pathlib import Path\nimport pickle\nimport hydra\nimport os\nfrom cfgs.config import CONFIG_PATH\nimport yaml\nimport shu"
  },
  {
    "path": "data_processing/nuplan/preprocess_dataset_nuplan.py",
    "chars": 2279,
    "preview": "import hydra\nimport random\nfrom tqdm import tqdm\nfrom datasets.nuplan.dataset_autoencoder_nuplan import NuplanDatasetAut"
  },
  {
    "path": "data_processing/postprocess_simulation_environments.py",
    "chars": 1638,
    "preview": "import os\nimport pickle\nfrom tqdm import tqdm\nimport numpy as np\nimport hydra\n\nfrom utils.sim_env_helpers import postpro"
  },
  {
    "path": "data_processing/waymo/add_nocturne_compatible_val_scenarios_to_test.py",
    "chars": 1484,
    "preview": "import os \nimport pickle \nimport shutil\nimport hydra\nfrom tqdm import tqdm\nfrom cfgs.config import CONFIG_PATH\n\n# Move h"
  },
  {
    "path": "data_processing/waymo/convert_pickles_to_jsons.py",
    "chars": 6896,
    "preview": "import os\nimport glob\nimport json\nimport pickle\nimport numpy as np\nfrom tqdm import tqdm\nimport hydra\nfrom cfgs.config i"
  },
  {
    "path": "data_processing/waymo/create_gpudrive_pickles.py",
    "chars": 3200,
    "preview": "import os\nos.environ.setdefault(\"OMP_NUM_THREADS\", \"1\")\nos.environ.setdefault(\"MKL_NUM_THREADS\", \"1\")\nos.environ.setdefa"
  },
  {
    "path": "data_processing/waymo/create_waymo_eval_set.py",
    "chars": 937,
    "preview": "import random \nimport pickle\nimport hydra \nfrom cfgs.config import CONFIG_PATH \nimport glob\nimport os\n\n@hydra.main(versi"
  },
  {
    "path": "data_processing/waymo/generate_k_disks_vocabulary.py",
    "chars": 1895,
    "preview": "import os\nimport hydra\nimport numpy as np\nimport random\nimport torch\nimport pickle\nfrom tqdm import tqdm\n\nfrom cfgs.conf"
  },
  {
    "path": "data_processing/waymo/generate_waymo_dataset.py",
    "chars": 14596,
    "preview": "import math\nimport os\nos.environ[\"CUDA_VISIBLE_DEVICES\"] = \"\"    # hide GPUs\nimport copy\nimport pickle\nimport hydra\nfrom"
  },
  {
    "path": "data_processing/waymo/preprocess_dataset_waymo.py",
    "chars": 3483,
    "preview": "import os\nos.environ.setdefault(\"OMP_NUM_THREADS\", \"1\")\nos.environ.setdefault(\"MKL_NUM_THREADS\", \"1\")\nos.environ.setdefa"
  },
  {
    "path": "datamodules/nuplan/nuplan_datamodule_autoencoder.py",
    "chars": 1846,
    "preview": "import pytorch_lightning as pl \n\nfrom datasets.nuplan.dataset_autoencoder_nuplan import NuplanDatasetAutoEncoder\nfrom to"
  },
  {
    "path": "datamodules/nuplan/nuplan_datamodule_ldm.py",
    "chars": 1792,
    "preview": "import pytorch_lightning as pl \nfrom datasets.nuplan.dataset_ldm_nuplan import NuplanDatasetLDM\nfrom torch_geometric.loa"
  },
  {
    "path": "datamodules/waymo/waymo_datamodule_autoencoder.py",
    "chars": 1839,
    "preview": "import pytorch_lightning as pl \n\nfrom datasets.waymo.dataset_autoencoder_waymo import WaymoDatasetAutoEncoder\nfrom torch"
  },
  {
    "path": "datamodules/waymo/waymo_datamodule_ctrl_sim.py",
    "chars": 1795,
    "preview": "import pytorch_lightning as pl \n\nfrom datasets.waymo.dataset_ctrl_sim import CtRLSimDataset\nfrom torch_geometric.loader "
  },
  {
    "path": "datamodules/waymo/waymo_datamodule_ldm.py",
    "chars": 1785,
    "preview": "import pytorch_lightning as pl \nfrom datasets.waymo.dataset_ldm_waymo import WaymoDatasetLDM\nfrom torch_geometric.loader"
  },
  {
    "path": "datasets/nuplan/dataset_autoencoder_nuplan.py",
    "chars": 34939,
    "preview": "import os\nimport sys\nimport glob\nimport hydra\nimport torch\nimport pickle\nimport random\nimport sys\nimport copy\nimport gzi"
  },
  {
    "path": "datasets/nuplan/dataset_ldm_nuplan.py",
    "chars": 5737,
    "preview": "import os\nimport sys\nimport json\nimport glob\nimport hydra\nimport torch\nimport pickle\nimport random\nimport sys\nfrom tqdm "
  },
  {
    "path": "datasets/waymo/dataset_autoencoder_waymo.py",
    "chars": 49929,
    "preview": "import os\nimport sys\nimport glob\nimport hydra\nimport torch\nimport pickle\nimport random\nimport copy\nfrom tqdm import tqdm"
  },
  {
    "path": "datasets/waymo/dataset_ctrl_sim.py",
    "chars": 32382,
    "preview": "import os\nimport hydra\nimport glob\nimport torch\nimport pickle\nimport random\nimport copy\nfrom tqdm import tqdm\n\nfrom torc"
  },
  {
    "path": "datasets/waymo/dataset_ldm_waymo.py",
    "chars": 5698,
    "preview": "import os\nimport sys\nimport glob\nimport hydra\nimport torch\nimport pickle\nimport random\nimport sys\nfrom tqdm import tqdm\n"
  },
  {
    "path": "environment.yml",
    "chars": 570,
    "preview": "name: scenario-dreamer\nchannels:\n  - defaults\ndependencies:\n  - python=3.10\n  - pip=24.2\n\n  - pip:\n    - hydra-core==1.3"
  },
  {
    "path": "eval.py",
    "chars": 5561,
    "preview": "import os \nimport hydra\nfrom omegaconf import OmegaConf\nfrom models.scenario_dreamer_autoencoder import ScenarioDreamerA"
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/0_2.json",
    "chars": 948524,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"0_2.json\",\n    \"scenario_id\": \"0_2\",\n    \"ego_idx\": 67,\n    \"route\": [\n        {\n   "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/0_7.json",
    "chars": 991219,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"0_7.json\",\n    \"scenario_id\": \"0_7\",\n    \"ego_idx\": 83,\n    \"route\": [\n        {\n   "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/10_11.json",
    "chars": 184373,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"10_11.json\",\n    \"scenario_id\": \"10_11\",\n    \"ego_idx\": 62,\n    \"route\": [\n        {"
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/10_3.json",
    "chars": 645277,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"10_3.json\",\n    \"scenario_id\": \"10_3\",\n    \"ego_idx\": 46,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/10_4.json",
    "chars": 884680,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"10_4.json\",\n    \"scenario_id\": \"10_4\",\n    \"ego_idx\": 79,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/11_13.json",
    "chars": 713269,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"11_13.json\",\n    \"scenario_id\": \"11_13\",\n    \"ego_idx\": 88,\n    \"route\": [\n        {"
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/11_5.json",
    "chars": 669979,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"11_5.json\",\n    \"scenario_id\": \"11_5\",\n    \"ego_idx\": 48,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/11_6.json",
    "chars": 687064,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"11_6.json\",\n    \"scenario_id\": \"11_6\",\n    \"ego_idx\": 102,\n    \"route\": [\n        {\n"
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/12_7.json",
    "chars": 1071994,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"12_7.json\",\n    \"scenario_id\": \"12_7\",\n    \"ego_idx\": 106,\n    \"route\": [\n        {\n"
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/12_8.json",
    "chars": 491508,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"12_8.json\",\n    \"scenario_id\": \"12_8\",\n    \"ego_idx\": 81,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/13_10.json",
    "chars": 959740,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"13_10.json\",\n    \"scenario_id\": \"13_10\",\n    \"ego_idx\": 115,\n    \"route\": [\n        "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/13_6.json",
    "chars": 503192,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"13_6.json\",\n    \"scenario_id\": \"13_6\",\n    \"ego_idx\": 54,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/13_8.json",
    "chars": 416277,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"13_8.json\",\n    \"scenario_id\": \"13_8\",\n    \"ego_idx\": 59,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/14_5.json",
    "chars": 828956,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"14_5.json\",\n    \"scenario_id\": \"14_5\",\n    \"ego_idx\": 37,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/14_9.json",
    "chars": 493476,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"14_9.json\",\n    \"scenario_id\": \"14_9\",\n    \"ego_idx\": 69,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/15_1.json",
    "chars": 655145,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"15_1.json\",\n    \"scenario_id\": \"15_1\",\n    \"ego_idx\": 55,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/15_6.json",
    "chars": 867408,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"15_6.json\",\n    \"scenario_id\": \"15_6\",\n    \"ego_idx\": 77,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/15_7.json",
    "chars": 514931,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"15_7.json\",\n    \"scenario_id\": \"15_7\",\n    \"ego_idx\": 43,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/16_11.json",
    "chars": 435177,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"16_11.json\",\n    \"scenario_id\": \"16_11\",\n    \"ego_idx\": 73,\n    \"route\": [\n        {"
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/16_7.json",
    "chars": 1184901,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"16_7.json\",\n    \"scenario_id\": \"16_7\",\n    \"ego_idx\": 75,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/17_6.json",
    "chars": 560245,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"17_6.json\",\n    \"scenario_id\": \"17_6\",\n    \"ego_idx\": 69,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/17_7.json",
    "chars": 810422,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"17_7.json\",\n    \"scenario_id\": \"17_7\",\n    \"ego_idx\": 46,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/18_12.json",
    "chars": 589528,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"18_12.json\",\n    \"scenario_id\": \"18_12\",\n    \"ego_idx\": 47,\n    \"route\": [\n        {"
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/19_13.json",
    "chars": 619390,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"19_13.json\",\n    \"scenario_id\": \"19_13\",\n    \"ego_idx\": 79,\n    \"route\": [\n        {"
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/1_0.json",
    "chars": 686796,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"1_0.json\",\n    \"scenario_id\": \"1_0\",\n    \"ego_idx\": 66,\n    \"route\": [\n        {\n   "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/1_1.json",
    "chars": 820016,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"1_1.json\",\n    \"scenario_id\": \"1_1\",\n    \"ego_idx\": 102,\n    \"route\": [\n        {\n  "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/1_11.json",
    "chars": 1139398,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"1_11.json\",\n    \"scenario_id\": \"1_11\",\n    \"ego_idx\": 93,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/1_13.json",
    "chars": 915565,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"1_13.json\",\n    \"scenario_id\": \"1_13\",\n    \"ego_idx\": 44,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/1_3.json",
    "chars": 376483,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"1_3.json\",\n    \"scenario_id\": \"1_3\",\n    \"ego_idx\": 71,\n    \"route\": [\n        {\n   "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/20_8.json",
    "chars": 599435,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"20_8.json\",\n    \"scenario_id\": \"20_8\",\n    \"ego_idx\": 88,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/21_13.json",
    "chars": 485476,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"21_13.json\",\n    \"scenario_id\": \"21_13\",\n    \"ego_idx\": 96,\n    \"route\": [\n        {"
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/22_11.json",
    "chars": 785755,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"22_11.json\",\n    \"scenario_id\": \"22_11\",\n    \"ego_idx\": 60,\n    \"route\": [\n        {"
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/22_12.json",
    "chars": 606800,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"22_12.json\",\n    \"scenario_id\": \"22_12\",\n    \"ego_idx\": 67,\n    \"route\": [\n        {"
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/23_5.json",
    "chars": 1018807,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"23_5.json\",\n    \"scenario_id\": \"23_5\",\n    \"ego_idx\": 91,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/23_6.json",
    "chars": 559037,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"23_6.json\",\n    \"scenario_id\": \"23_6\",\n    \"ego_idx\": 68,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/23_8.json",
    "chars": 478487,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"23_8.json\",\n    \"scenario_id\": \"23_8\",\n    \"ego_idx\": 55,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/24_14.json",
    "chars": 571894,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"24_14.json\",\n    \"scenario_id\": \"24_14\",\n    \"ego_idx\": 82,\n    \"route\": [\n        {"
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/24_6.json",
    "chars": 921024,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"24_6.json\",\n    \"scenario_id\": \"24_6\",\n    \"ego_idx\": 90,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/25_0.json",
    "chars": 534789,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"25_0.json\",\n    \"scenario_id\": \"25_0\",\n    \"ego_idx\": 55,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/25_5.json",
    "chars": 601655,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"25_5.json\",\n    \"scenario_id\": \"25_5\",\n    \"ego_idx\": 37,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/25_6.json",
    "chars": 649558,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"25_6.json\",\n    \"scenario_id\": \"25_6\",\n    \"ego_idx\": 75,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/26_10.json",
    "chars": 681848,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"26_10.json\",\n    \"scenario_id\": \"26_10\",\n    \"ego_idx\": 77,\n    \"route\": [\n        {"
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/26_2.json",
    "chars": 755960,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"26_2.json\",\n    \"scenario_id\": \"26_2\",\n    \"ego_idx\": 80,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/26_6.json",
    "chars": 701916,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"26_6.json\",\n    \"scenario_id\": \"26_6\",\n    \"ego_idx\": 58,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/26_8.json",
    "chars": 962178,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"26_8.json\",\n    \"scenario_id\": \"26_8\",\n    \"ego_idx\": 94,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/27_13.json",
    "chars": 704152,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"27_13.json\",\n    \"scenario_id\": \"27_13\",\n    \"ego_idx\": 53,\n    \"route\": [\n        {"
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/27_2.json",
    "chars": 620314,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"27_2.json\",\n    \"scenario_id\": \"27_2\",\n    \"ego_idx\": 78,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/28_77.json",
    "chars": 850629,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"28_77.json\",\n    \"scenario_id\": \"28_77\",\n    \"ego_idx\": 55,\n    \"route\": [\n        {"
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/28_9.json",
    "chars": 180448,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"28_9.json\",\n    \"scenario_id\": \"28_9\",\n    \"ego_idx\": 65,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/29_0.json",
    "chars": 855270,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"29_0.json\",\n    \"scenario_id\": \"29_0\",\n    \"ego_idx\": 88,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/2_2.json",
    "chars": 720919,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"2_2.json\",\n    \"scenario_id\": \"2_2\",\n    \"ego_idx\": 63,\n    \"route\": [\n        {\n   "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/2_3.json",
    "chars": 663414,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"2_3.json\",\n    \"scenario_id\": \"2_3\",\n    \"ego_idx\": 34,\n    \"route\": [\n        {\n   "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/30_5.json",
    "chars": 677403,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"30_5.json\",\n    \"scenario_id\": \"30_5\",\n    \"ego_idx\": 70,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/30_6.json",
    "chars": 950535,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"30_6.json\",\n    \"scenario_id\": \"30_6\",\n    \"ego_idx\": 88,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/31_11.json",
    "chars": 518970,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"31_11.json\",\n    \"scenario_id\": \"31_11\",\n    \"ego_idx\": 58,\n    \"route\": [\n        {"
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/3_7.json",
    "chars": 299667,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"3_7.json\",\n    \"scenario_id\": \"3_7\",\n    \"ego_idx\": 71,\n    \"route\": [\n        {\n   "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/4_12.json",
    "chars": 541920,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"4_12.json\",\n    \"scenario_id\": \"4_12\",\n    \"ego_idx\": 63,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/4_4.json",
    "chars": 623517,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"4_4.json\",\n    \"scenario_id\": \"4_4\",\n    \"ego_idx\": 82,\n    \"route\": [\n        {\n   "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/4_7.json",
    "chars": 389542,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"4_7.json\",\n    \"scenario_id\": \"4_7\",\n    \"ego_idx\": 61,\n    \"route\": [\n        {\n   "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/4_9.json",
    "chars": 624568,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"4_9.json\",\n    \"scenario_id\": \"4_9\",\n    \"ego_idx\": 40,\n    \"route\": [\n        {\n   "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/5_1.json",
    "chars": 407571,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"5_1.json\",\n    \"scenario_id\": \"5_1\",\n    \"ego_idx\": 93,\n    \"route\": [\n        {\n   "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/5_10.json",
    "chars": 623500,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"5_10.json\",\n    \"scenario_id\": \"5_10\",\n    \"ego_idx\": 72,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/5_8.json",
    "chars": 1498740,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"5_8.json\",\n    \"scenario_id\": \"5_8\",\n    \"ego_idx\": 130,\n    \"route\": [\n        {\n  "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/6_0.json",
    "chars": 678633,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"6_0.json\",\n    \"scenario_id\": \"6_0\",\n    \"ego_idx\": 75,\n    \"route\": [\n        {\n   "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/6_5.json",
    "chars": 892289,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"6_5.json\",\n    \"scenario_id\": \"6_5\",\n    \"ego_idx\": 68,\n    \"route\": [\n        {\n   "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/7_10.json",
    "chars": 406291,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"7_10.json\",\n    \"scenario_id\": \"7_10\",\n    \"ego_idx\": 100,\n    \"route\": [\n        {\n"
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/7_11.json",
    "chars": 739248,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"7_11.json\",\n    \"scenario_id\": \"7_11\",\n    \"ego_idx\": 63,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/7_14.json",
    "chars": 789279,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"7_14.json\",\n    \"scenario_id\": \"7_14\",\n    \"ego_idx\": 43,\n    \"route\": [\n        {\n "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/7_7.json",
    "chars": 648721,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"7_7.json\",\n    \"scenario_id\": \"7_7\",\n    \"ego_idx\": 110,\n    \"route\": [\n        {\n  "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/7_9.json",
    "chars": 1374401,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"7_9.json\",\n    \"scenario_id\": \"7_9\",\n    \"ego_idx\": 110,\n    \"route\": [\n        {\n  "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/8_2.json",
    "chars": 810789,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"8_2.json\",\n    \"scenario_id\": \"8_2\",\n    \"ego_idx\": 58,\n    \"route\": [\n        {\n   "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/8_4.json",
    "chars": 721766,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"8_4.json\",\n    \"scenario_id\": \"8_4\",\n    \"ego_idx\": 80,\n    \"route\": [\n        {\n   "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/8_8.json",
    "chars": 572995,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"8_8.json\",\n    \"scenario_id\": \"8_8\",\n    \"ego_idx\": 53,\n    \"route\": [\n        {\n   "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/9_2.json",
    "chars": 1087060,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"9_2.json\",\n    \"scenario_id\": \"9_2\",\n    \"ego_idx\": 102,\n    \"route\": [\n        {\n  "
  },
  {
    "path": "metadata/simulation_environment_datasets/scenario_dreamer_waymo_200m_jsons/9_4.json",
    "chars": 749698,
    "preview": "{\n    \"tl_states\": {},\n    \"name\": \"9_4.json\",\n    \"scenario_id\": \"9_4\",\n    \"ego_idx\": 62,\n    \"route\": [\n        {\n   "
  },
  {
    "path": "metadata/sledge_files/nuplan.yaml",
    "chars": 716078,
    "preview": "_target_: nuplan.planning.training.data_loader.log_splitter.LogSplitter\n_convert_: 'all'\n\nlog_splits:\n  train:\n    - 202"
  },
  {
    "path": "metrics.py",
    "chars": 4503,
    "preview": "import os \nfrom tqdm import tqdm \nimport pickle \nimport gzip\nfrom utils.metrics_helpers import convert_data_to_unified_f"
  },
  {
    "path": "models/ctrl_sim.py",
    "chars": 8104,
    "preview": "import torch\nimport torch.nn.functional as F \nfrom torch import nn\nimport pytorch_lightning as pl\nfrom pytorch_lightning"
  },
  {
    "path": "models/scenario_dreamer_autoencoder.py",
    "chars": 15295,
    "preview": "import os \nimport pickle \nfrom utils.train_helpers import create_lambda_lr_cosine, create_lambda_lr_linear\nfrom datasets"
  },
  {
    "path": "models/scenario_dreamer_ldm.py",
    "chars": 27246,
    "preview": "import os\nimport pickle\nimport glob\nfrom tqdm import tqdm\nfrom utils.train_helpers import create_lambda_lr_cosine, creat"
  },
  {
    "path": "nn_modules/autoencoder.py",
    "chars": 24399,
    "preview": "import torch\nimport torch.nn as nn\nimport numpy as np\n\nfrom typing import Tuple, Union\n\nimport torch.nn.functional as F\n"
  },
  {
    "path": "nn_modules/ctrl_sim.py",
    "chars": 14690,
    "preview": "import torch\nimport torch.nn as nn\n\nfrom utils.train_helpers import weight_init, get_causal_mask\nfrom utils.layers impor"
  },
  {
    "path": "nn_modules/dit.py",
    "chars": 8440,
    "preview": "import torch\nimport torch.nn as nn\n\nimport numpy as np\nfrom utils.dit_layers import FactorizedDiTBlock, FinalLayer, Labe"
  },
  {
    "path": "nn_modules/ldm.py",
    "chars": 13160,
    "preview": "import numpy as np\nimport torch\nfrom torch import nn\nfrom utils.diffusion_helpers import (\n    cosine_beta_schedule,\n   "
  },
  {
    "path": "policies/idm_policy.py",
    "chars": 32744,
    "preview": "import torch\nimport numpy as np\nfrom collections import deque\nimport json\nimport matplotlib.pyplot as plt\nimport random\n"
  },
  {
    "path": "policies/rl_policy.py",
    "chars": 426,
    "preview": "from utils.gpudrive_helpers import load_policy\n\nclass RLPolicy:\n    def __init__(self, cfg):\n        self.cfg = cfg\n\n   "
  },
  {
    "path": "run_simulation.py",
    "chars": 3991,
    "preview": "import hydra\nfrom simulator import Simulator\nfrom policies.idm_policy import IDMPolicy\nfrom policies.rl_policy import RL"
  },
  {
    "path": "scripts/define_env_variables.sh",
    "chars": 136,
    "preview": "export PYTHONPATH=$(pwd):$PYTHONPATH\nexport PROJECT_ROOT=$(pwd) \nexport DATASET_ROOT=$SCRATCH_ROOT\nexport CONFIG_PATH=$P"
  },
  {
    "path": "scripts/extract_nuplan_data.sh",
    "chars": 147,
    "preview": "#!/usr/bin/env bash\n\nCODE_DIR=\"$PROJECT_ROOT/data_processing/nuplan\"\n\ncd \"$CODE_DIR\"\n# ignore all the tf warnings\npython"
  },
  {
    "path": "scripts/extract_waymo_data.sh",
    "chars": 370,
    "preview": "#!/usr/bin/env bash\n\nCODE_DIR=\"$PROJECT_ROOT/data_processing/waymo\"\n\ncd \"$CODE_DIR\"\n# ignore all the tf warnings\npython "
  },
  {
    "path": "scripts/preprocess_ctrl_sim_waymo_dataset.sh",
    "chars": 548,
    "preview": "CODE_DIR=\"$PROJECT_ROOT/data_processing/waymo\"\n\ncd \"$CODE_DIR\"\n# uncomment to regenerate the k-disks vocabulary\n# python"
  },
  {
    "path": "scripts/preprocess_nuplan_dataset.sh",
    "chars": 315,
    "preview": "CODE_DIR=\"$PROJECT_ROOT/data_processing/nuplan\"\n\ncd \"$CODE_DIR\"\npython preprocess_dataset_nuplan.py dataset_name=nuplan "
  },
  {
    "path": "scripts/preprocess_waymo_dataset.sh",
    "chars": 422,
    "preview": "CODE_DIR=\"$PROJECT_ROOT/data_processing/waymo\"\n\ncd \"$CODE_DIR\"\npython preprocess_dataset_waymo.py dataset_name=waymo pre"
  },
  {
    "path": "simulator.py",
    "chars": 43827,
    "preview": "import os\nimport pickle\nimport json\nimport copy\nimport numpy as np\nimport torch\nimport torch.nn.functional as F\n\nfrom da"
  },
  {
    "path": "train.py",
    "chars": 9658,
    "preview": "import os \nimport hydra\nfrom models.scenario_dreamer_autoencoder import ScenarioDreamerAutoEncoder\nfrom models.scenario_"
  },
  {
    "path": "utils/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "utils/collision_helpers.py",
    "chars": 6175,
    "preview": "import numpy as np \nimport torch\n\ndef compute_corners(positions, headings, lengths, widths):\n    \"\"\" Compute the four co"
  },
  {
    "path": "utils/data_container.py",
    "chars": 5188,
    "preview": "import numpy as np\nimport torch\nnp.set_printoptions(suppress=True)\nfrom torch_geometric.data import HeteroData\n\n\nclass C"
  },
  {
    "path": "utils/data_helpers.py",
    "chars": 23774,
    "preview": "import numpy as np\nimport torch\nnp.set_printoptions(suppress=True)\nfrom utils.data_container import get_batches, get_fea"
  },
  {
    "path": "utils/diffusion_helpers.py",
    "chars": 766,
    "preview": "import numpy as np\nimport torch\n\ndef extract(a, t, x_shape):\n    \"\"\" Extracts values from a tensor `a` at indices specif"
  },
  {
    "path": "utils/dit_layers.py",
    "chars": 12820,
    "preview": "import torch\nimport torch.nn as nn\nfrom itertools import repeat\nimport collections.abc\nfrom torch_geometric.nn.conv impo"
  },
  {
    "path": "utils/geometry.py",
    "chars": 5007,
    "preview": "import numpy as np\n\ndef rotate_and_normalize_angles(current_angles, rotation_angle):\n    \"\"\" Rotates angles by a given r"
  },
  {
    "path": "utils/gpudrive_helpers.py",
    "chars": 65098,
    "preview": "\"\"\" Helper functions and classes for GPUDrive integration. \nMany of these functions are Python versions of C++ code in G"
  },
  {
    "path": "utils/inpainting_helpers.py",
    "chars": 10269,
    "preview": "import torch\nimport numpy as np\nimport copy\nimport networkx as nx\nimport random\nfrom utils.lane_graph_helpers import res"
  },
  {
    "path": "utils/k_disks_helpers.py",
    "chars": 9198,
    "preview": "import numpy as np\nfrom utils.geometry import normalize_angle\nfrom tqdm import tqdm\n\ndef compute_k_disks_vocabulary(stat"
  },
  {
    "path": "utils/lane_graph_helpers.py",
    "chars": 10796,
    "preview": "import numpy as np\nnp.set_printoptions(suppress=True)\nimport networkx as nx\n\ndef get_compact_lane_graph(data):\n    \"\"\"Ap"
  },
  {
    "path": "utils/layers.py",
    "chars": 11893,
    "preview": "import torch\nimport torch.nn as nn\nfrom torch_geometric.nn.conv import MessagePassing\nfrom torch_geometric.utils import "
  },
  {
    "path": "utils/losses.py",
    "chars": 1982,
    "preview": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nclass GeometricLoss(nn.Module):\n    def __init__(sel"
  },
  {
    "path": "utils/metrics_helpers.py",
    "chars": 35933,
    "preview": "import numpy as np\nimport torch\nimport networkx as nx\nfrom utils.pyg_helpers import get_edge_index_complete_graph\nfrom c"
  },
  {
    "path": "utils/pyg_helpers.py",
    "chars": 1252,
    "preview": "import torch\n\ndef get_edge_index_bipartite(num_src, num_dst):\n    \"\"\"Create a fully connected bipartite `edge_index` ten"
  },
  {
    "path": "utils/sim_env_helpers.py",
    "chars": 46996,
    "preview": "import os\nimport copy\nimport pickle\nimport numpy as np\nimport networkx as nx\nimport random\nimport torch\n\nfrom cfgs.confi"
  },
  {
    "path": "utils/sim_helpers.py",
    "chars": 6310,
    "preview": "import numpy as np\nimport networkx as nx\n\nfrom utils.collision_helpers import batched_collision_checker\nfrom utils.lane_"
  },
  {
    "path": "utils/sledge_helpers.py",
    "chars": 4091,
    "preview": "import numpy as np\nfrom scipy.interpolate import interp1d\nfrom utils.geometry import normalize_angle\n\ndef calculate_prog"
  },
  {
    "path": "utils/torch_helpers.py",
    "chars": 466,
    "preview": "import numpy as np\nimport torch\n\ndef from_numpy(data):\n    \"\"\"Recursively transform numpy.ndarray to torch.Tensor.\n    \""
  },
  {
    "path": "utils/train_helpers.py",
    "chars": 8580,
    "preview": "import torch.nn as nn\nimport torch\nimport math\nimport os\nfrom torch_geometric.loader import DataLoader\nfrom tqdm import "
  },
  {
    "path": "utils/viz.py",
    "chars": 18046,
    "preview": "import numpy as np\nimport matplotlib.pyplot as plt\nimport matplotlib.patches as mpatches\nimport matplotlib.transforms as"
  }
]

// ... and 88 more files (download for full content)
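
Each entry in the index above carries a `path`, a `chars` count, and a truncated `preview`. As a minimal sketch of how such an index could be summarized, the snippet below totals `chars` per top-level directory. The three sample entries are copied from the listing with their previews left blank for brevity. The names `MANIFEST` and `chars_by_top_level` are hypothetical and are not part of the repository or of the extraction tool.

```python
import json
from collections import defaultdict

# Sample entries copied from the index above (previews blanked for brevity).
# MANIFEST is a hypothetical name; the real index is much longer.
MANIFEST = json.loads("""
[
  {"path": "policies/rl_policy.py", "chars": 426, "preview": ""},
  {"path": "policies/idm_policy.py", "chars": 32744, "preview": ""},
  {"path": "utils/losses.py", "chars": 1982, "preview": ""}
]
""")


def chars_by_top_level(entries):
    """Sum the `chars` field per first path component.

    A file at the repository root (for example `train.py`) has no
    directory component, so it is keyed by its own file name.
    """
    totals = defaultdict(int)
    for entry in entries:
        top = entry["path"].split("/", 1)[0]
        totals[top] += entry["chars"]
    return dict(totals)


totals = chars_by_top_level(MANIFEST)
for name, count in sorted(totals.items()):
    print(f"{name}: {count} chars")
```

Applied to the full index, the same grouping would show how much of the extraction comes from the scenario JSON files under `metadata/` compared with the Python sources.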

About this extraction

This page contains source code from the princeton-computational-imaging/scenario-dreamer GitHub repository, extracted and formatted as plain text. The extraction includes 274 files (51.8 MB), approximately 13.6M tokens, and a symbol index with 500 extracted functions, classes, methods, constants, and types.

