[
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2024 szh\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "<div id=\"top\" align=\"center\">\n<p align=\"center\">\n  <strong>\n    <h2 align=\"center\">VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving</h2>\n    <h3 align=\"center\"><a href=\"https://www.huang-zilin.com/VLM-RL-website/\">Website</a> | <a href=\"https://arxiv.org/abs/2412.15544\">Paper</a> | <a href=\"https://www.youtube.com/embed/oXBih9r2DdI?si=XRUEthPoni_zNTR6\">Video</a> | <a href=\"https://huggingface.co/zihaosheng/VLM-RL\">Hugging Face</a>  </h3>\n  </strong>\n</p>\n</div>\n\n<br/>\n\n> **[VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving](https://arxiv.org/abs/2412.15544)**\n>\n> [Zilin Huang](https://scholar.google.com/citations?user=RgO7ppoAAAAJ&hl=en)<sup>1,\\*</sup>,\n> [Zihao Sheng](https://scholar.google.com/citations?user=3T-SILsAAAAJ&hl=en)<sup>1,\\*</sup>,\n> [Yansong Qu](https://scholar.google.com/citations?view_op=list_works&hl=zh-CN&user=hIt7KnUAAAAJ)<sup>2,†</sup>,\n> [Junwei You](https://scholar.google.com/citations?user=wIGL3SQAAAAJ&hl=en)<sup>1</sup>,\n> [Sikai Chen](https://scholar.google.com/citations?user=DPN2wc4AAAAJ&hl=en)<sup>1,✉</sup><br>\n>\n> <sup>1</sup>University of Wisconsin-Madison, <sup>2</sup>Purdue University\n>\n> <sup>\\*</sup>Equally Contributing First Authors,\n> <sup>✉</sup>Corresponding Author\n> <br/>\n\n## 📢 News\n- **2025.09**: 🔥🔥 The model weights are now available on [🤗 Hugging Face](https://huggingface.co/zihaosheng/VLM-RL). Feel free to try them out!\n- **2025.08**: 🔥🔥 **VLM-RL** has been accepted to *Transportation Research Part C: Emerging Technologies*!\n\n## 💡 Highlights <a name=\"highlight\"></a>\n\n🔥 To the best of our knowledge, **VLM-RL** is the first work in the autonomous driving field to unify VLMs with RL for\nend-to-end driving policy learning in the CARLA simulator.\n\n🏁 **VLM-RL** outperforms state-of-the-art baselines, achieving a 10.5% reduction in collision rate, a 104.6% increase in\nroute completion rate, and robust generalization to unseen driving scenarios.\n\n|                                                       Route 1                                                        |                                                       Route 2                                                        |                                                       Route 3                                                        |                                                       Route 4                                                        |                                                       Route 5                                                        |\n|:--------------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------------:|\n| ![Route 1](https://raw.githubusercontent.com/zilin-huang/VLM-RL-website/master/static/videos/CLIP/CLIP_town2_normal/CLIP_town2_normal_s1.gif) | ![Route 2](https://raw.githubusercontent.com/zilin-huang/VLM-RL-website/master/static/videos/CLIP/CLIP_town2_normal/CLIP_town2_normal_s2.gif) | ![Route 3](https://raw.githubusercontent.com/zilin-huang/VLM-RL-website/master/static/videos/CLIP/CLIP_town2_normal/CLIP_town2_normal_s3.gif) | ![Route 4](https://raw.githubusercontent.com/zilin-huang/VLM-RL-website/master/static/videos/CLIP/CLIP_town2_normal/CLIP_town2_normal_s4.gif) | ![Route 5](https://raw.githubusercontent.com/zilin-huang/VLM-RL-website/master/static/videos/CLIP/CLIP_town2_normal/CLIP_town2_normal_s5.gif) |\n\n|                                                       Route 6                                                        |                                                       Route 7                                                        |                                                       Route 8                                                        |                                                       Route 9                                                        |                                                        Route 10                                                        |\n|:--------------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------------------------------:|\n| ![Route 6](https://raw.githubusercontent.com/zilin-huang/VLM-RL-website/master/static/videos/CLIP/CLIP_town2_normal/CLIP_town2_normal_s6.gif) | ![Route 7](https://raw.githubusercontent.com/zilin-huang/VLM-RL-website/master/static/videos/CLIP/CLIP_town2_normal/CLIP_town2_normal_s7.gif) | ![Route 8](https://raw.githubusercontent.com/zilin-huang/VLM-RL-website/master/static/videos/CLIP/CLIP_town2_normal/CLIP_town2_normal_s8.gif) | ![Route 9](https://raw.githubusercontent.com/zilin-huang/VLM-RL-website/master/static/videos/CLIP/CLIP_town2_normal/CLIP_town2_normal_s9.gif) | ![Overtake](https://raw.githubusercontent.com/zilin-huang/VLM-RL-website/master/static/videos/CLIP/CLIP_town2_normal/CLIP_town2_normal_s10.gif) |\n\n## 📋 Table of Contents\n\n1. [Highlights](#highlight)\n2. [Getting Started](#setup)\n3. [Training](#training)\n4. [Evaluation](#evaluation)\n5. [Contributors](#contributors)\n6. [Citation](#citation)\n7. [Other Resources](#resources)\n\n## 🛠️ Getting Started <a name=\"setup\"></a>\n\n\n1. Download and install `CARLA 0.9.13` from the [official release page](https://github.com/carla-simulator/carla/releases/tag/0.9.13).\n2. Create a conda env and install the requirements:\n```shell\n# Clone the repo\ngit clone https://github.com/zihaosheng/VLM-RL.git\ncd VLM-RL\n\n# Create a conda env\nconda create -y -n vlm-rl python=3.8\nconda activate vlm-rl\n\n# Install PyTorch\npip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116\n\n# Install the requirements\npip install -r requirements.txt\n```\n\n3. Start a Carla server with the following command. You can ignore this if `start_carla=True`\n\n```shell\n./CARLA_0.9.13/CarlaUE4.sh -quality_level=Low -benchmark -fps=15 -RenderOffScreen -prefernvidia -carla-world-port=2000\n```\n\nIf `start_carla=True`, revise the `CARLA_ROOT` in `carla_env/envs/carla_route_env.py` to the path of your CARLA installation.\n\n<p align=\"right\">(<a href=\"#top\">back to top</a>)</p>\n\n## 🚋 Training <a name=\"training\"></a>\n\n### Training VLM-RL\n\nTo reproduce the results in the paper, we provide the following training scripts:\n\n```shell\npython train.py --config=vlm_rl --start_carla --no_render --total_timesteps=1_000_000 --port=2000 --device=cuda:0\n```\n\n**Note:** On the first run, the script will automatically download the required OpenCLIP pre-trained model, which may take a few minutes. Please wait for the download to complete before the training begins.\n\n#### To accelerate the training process, you can set up multiple CARLA servers running in parallel. \n<details>\n  <summary>For example, to train the VLM-RL model with 3 CARLA servers on different GPUs, run the following commands in three separate terminals:\n</summary>\n\n#### Terminal 1:\n```shell\npython train.py --config=vlm_rl --start_carla --no_render --total_timesteps=1_000_000 --port=2000 --device=cuda:0\n```\n\n#### Terminal 2:\n```shell\npython train.py --config=vlm_rl --start_carla --no_render --total_timesteps=1_000_000 --port=2005 --device=cuda:1\n```\n\n#### Terminal 3:\n```shell\npython train.py --config=vlm_rl --start_carla --no_render --total_timesteps=1_000_000 --port=2010 --device=cuda:2\n```\n</details>\n\nTo train the VLM-RL model with PPO, run:\n```shell\npython train.py --config=vlm_rl_ppo --start_carla --no_render --total_timesteps=1_000_000 --port=2000 --device=cuda:0\n```\n\n### Training Baselines\n\nTo train baseline models, simply change the `--config` argument to the desired model. For example, to train the TIRL-SAC model, run:\n```shell\npython train.py --config=tirl_sac --start_carla --no_render --total_timesteps=1_000_000 --port=2000 --device=cuda:0\n```\n\nMore baseline models can be found in the `CONFIGS` dictionary of `config.py`.\n\n<p align=\"right\">(<a href=\"#top\">back to top</a>)</p>\n\n## 📊 Evaluation <a name=\"evaluation\"></a>\n\nTo evaluate trained model checkpoints, run:\n\n```shell\npython run_eval.py\n```\n\n**Note:** that this command will first **KILL** all the existing CARLA servers and then start a new one. \nTry to avoid running this command while training is in progress.\n\n<p align=\"right\">(<a href=\"#top\">back to top</a>)</p>\n\n## 👥 Contributors <a name=\"contributors\"></a>\n\nSpecial thanks to the following contributors who have helped with this project:\n\n<!-- readme: contributors -start -->\n<table>\n\t<tbody>\n\t\t<tr>\n            <td align=\"center\">\n                <a href=\"https://github.com/zihaosheng\">\n                    <img src=\"https://scholar.googleusercontent.com/citations?view_op=view_photo&user=3T-SILsAAAAJ&citpid=7\" width=\"100;\" alt=\"zihaosheng\"/>\n                    <br />\n                    <sub><b>Zihao Sheng</b></sub>\n                </a>\n            </td>\n            <td align=\"center\">\n                <a href=\"https://github.com/zilin-huang\">\n                    <img src=\"https://avatars.githubusercontent.com/u/59532565?v=4\" width=\"100;\" alt=\"zilinhuang\"/>\n                    <br />\n                    <sub><b>Zilin Huang</b></sub>\n                </a>\n            </td>\n            <td align=\"center\">\n                <a href=\"https://scholar.google.com/citations?user=hIt7KnUAAAAJ&hl=en&oi=sra\">\n                    <img src=\"https://scholar.googleusercontent.com/citations?view_op=view_photo&user=hIt7KnUAAAAJ&citpid=2\" width=\"100;\" alt=\"yansongqu\"/>\n                    <br />\n                    <sub><b>Yansong Qu</b></sub>\n                </a>\n            </td>\n            <td align=\"center\">\n                <a href=\"https://scholar.google.com/citations?user=wIGL3SQAAAAJ&hl=en\">\n                    <img src=\"https://scholar.google.com/citations/images/avatar_scholar_128.png\" width=\"100;\" alt=\"junweiyou\"/>\n                    <br />\n                    <sub><b>Junwei You</b></sub>\n                </a>\n            </td>\n\t\t</tr>\n\t<tbody>\n</table>\n<!-- readme: contributors -end -->\n\n<p align=\"right\">(<a href=\"#top\">back to top</a>)</p>\n\n## 🎯 Citation <a name=\"citation\"></a>\n\nIf you find VLM-RL useful for your research, please consider giving us a star 🌟 and citing our paper:\n\n```BibTeX\n@article{huang2024vlmrl,\n  title={VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving},\n  author={Huang, Zilin and Sheng, Zihao and Qu, Yansong and You, Junwei and Chen, Sikai},\n  journal={arXiv preprint arXiv:2412.15544},\n  year={2024}\n}\n```\n\n<p align=\"right\">(<a href=\"#top\">back to top</a>)</p>\n\n## 📚 Other Resources <a name=\"resources\"></a>\n\nOur team is actively working on research projects in the field of AI and autonomous driving. Here are a few of them you might find interesting:\n\n- **[Human as AI mentor](https://zilin-huang.github.io/HAIM-DRL-website/)**\n- **[Physics-enhanced RLHF](https://zilin-huang.github.io/PE-RLHF-website/)**\n- **[Traffic expertise meets residual RL](https://github.com/zihaosheng/traffic-expertise-RL)**\n  \n<p align=\"right\">(<a href=\"#top\">back to top</a>)</p>\n"
  },
  {
    "path": "carla_env/__init__.py",
    "content": "import os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nfrom carla import *"
  },
  {
    "path": "carla_env/envs/carla_route_env.py",
    "content": "import os\nimport subprocess\nimport time\nimport gym\nimport h5py\nimport pygame\nimport cv2\nfrom pygame.locals import *\nimport random\n\nfrom config import CONFIG\n\nfrom carla_env.tools.hud import HUD\nfrom carla_env.navigation.planner import RoadOption, compute_route_waypoints\nfrom carla_env.wrappers import *\n\nimport carla\nfrom collections import deque\nimport itertools\n\nintersection_routes = itertools.cycle(\n    [(57, 81), (70, 11), (70, 12), (78, 68), (74, 41), (42, 73), (71, 62), (74, 40), (71, 77), (6, 12), (65, 52),\n     (63, 80)])\neval_routes = itertools.cycle(\n    [(48, 21), (0, 72), (28, 83), (61, 39), (27, 14), (6, 67), (61, 49), (37, 64), (33, 80), (12, 30), ])\n\nCOLOR_BLACK = (0, 0, 0)\nCOLOR_RED = (255, 0, 0)\nCOLOR_GREEN = (0, 255, 0)\nCOLOR_BLUE = (0, 0, 255)\nCOLOR_CYAN = (0, 255, 255)\nCOLOR_MAGENTA = (255, 0, 255)\nCOLOR_MAGENTA_2 = (255, 140, 255)\nCOLOR_YELLOW = (255, 255, 0)\nCOLOR_YELLOW_2 = (160, 160, 0)\nCOLOR_WHITE = (255, 255, 255)\nCOLOR_ALUMINIUM_0 = (238, 238, 236)\nCOLOR_ALUMINIUM_3 = (136, 138, 133)\nCOLOR_ALUMINIUM_5 = (46, 52, 54)\n\n\ndef tint(color, factor):\n    r, g, b = color\n    r = int(r + (255 - r) * factor)\n    g = int(g + (255 - g) * factor)\n    b = int(b + (255 - b) * factor)\n    r = min(r, 255)\n    g = min(g, 255)\n    b = min(b, 255)\n    return (r, g, b)\n\n\ndiscrete_actions = {\n    0: [-1.0, 0.0],\n    1: [0.7, -0.5],\n    2: [0.7, -0.3],\n    3: [0.7, -0.2],\n    4: [0.7, -0.1],\n    5: [0.7, 0.0],\n    6: [0.7, 0.1],\n    7: [0.7, 0.2],\n    8: [0.7, 0.3],\n    9: [0.7, 0.5],\n    10: [0.3, -0.7],\n    11: [0.3, -0.5],\n    12: [0.3, -0.3],\n    13: [0.3, -0.2],\n    14: [0.3, -0.1],\n    15: [0.3, 0.0],\n    16: [0.3, 0.1],\n    17: [0.3, 0.2],\n    18: [0.3, 0.3],\n    19: [0.3, 0.5],\n    20: [0.3, 0.7],\n    21: [0.0, -1.0],\n    22: [0.0, -0.6],\n    23: [0.0, -0.3],\n    24: [0.0, -0.1],\n    25: [0.0, 0.0],\n    26: [0.0, 0.1],\n    27: [0.0, 0.3],\n    28: [0.0, 0.6],\n    29: [0.0, 1.0],\n}\n\nclass_blueprint = {\n    'car': ['vehicle.tesla.model3',\n            'vehicle.audi.tt',\n            'vehicle.chevrolet.impala', ]\n}\n\n\ndef random_choice_from_blueprint(blueprint):\n    all_elements = [item for sublist in blueprint.values() for item in sublist]\n    return random.choice(all_elements)\n\n\nclass CarlaRouteEnv(gym.Env):\n    metadata = {\n        \"render.modes\": [\"human\", \"rgb_array\", \"rgb_array_no_hud\", \"state_pixels\"]\n    }\n\n    def __init__(self, host=\"127.0.0.1\", port=2000,\n                 viewer_res=(1120, 560), obs_res=(80, 120),\n                 reward_fn=None,\n                 observation_space=None,\n                 encode_state_fn=None,\n                 fps=15, action_smoothing=0.0,\n                 action_space_type=\"continuous\",\n                 activate_spectator=True,\n                 activate_bev=False,\n                 start_carla=False,\n                 eval=False,\n                 activate_render=True,\n                 activate_traffic_flow=False,\n                 activate_seg_bev=False,\n                 tf_num=20,\n                 town='Town02'):\n\n        self.carla_process = None\n        if start_carla:\n            CARLA_ROOT = \"/home/sky-lab/CARLA_0.9.13\"\n            carla_path = os.path.join(CARLA_ROOT, \"CarlaUE4.sh\")\n            launch_command = [carla_path]\n            launch_command += ['-quality_level=Low']\n            launch_command += ['-benchmark']\n            launch_command += [\"-fps=%i\" % fps]\n            launch_command += ['-RenderOffScreen']\n            launch_command += ['-prefernvidia']\n            launch_command += [f'-carla-world-port={port}']\n            print(\"Running command:\")\n            print(\" \".join(launch_command))\n            self.carla_process = subprocess.Popen(launch_command, stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)\n            print(\"Waiting for CARLA to initialize\\n\")\n\n            time.sleep(5)\n\n        width, height = viewer_res\n        if obs_res is None:\n            out_width, out_height = width, height\n        else:\n            out_width, out_height = obs_res\n        self.activate_render = activate_render\n\n        self.num_envs = 1\n\n        # Setup gym environment\n        self.action_space_type = action_space_type\n        if self.action_space_type == \"continuous\":\n            self.action_space = gym.spaces.Box(np.array([-1, -1]), np.array([1, 1]),\n                                               dtype=np.float32)  # steer, throttle\n        elif self.action_space_type == \"discrete\":\n            self.action_space = gym.spaces.Discrete(len(discrete_actions))\n\n        self.observation_space = observation_space\n\n        self.fps = fps\n        self.action_smoothing = action_smoothing\n        self.episode_idx = -2\n\n        self.encode_state_fn = (lambda x: x) if not callable(encode_state_fn) else encode_state_fn\n        self.reward_fn = (lambda x: 0) if not callable(reward_fn) else reward_fn\n        self.max_distance = 3000  # m\n        self.activate_spectator = activate_spectator\n        self.activate_bev = activate_bev\n        self.eval = eval\n        self.activate_traffic_flow = activate_traffic_flow\n        self.traffic_flow_vehicles = []\n        self.low_speed_timer = 0.0\n        self.collision_num = 0\n        self.cps = 0  # average collision per timestep\n        self.cpm = 0  # average collision per kilometer\n        self.collision_speed = 0.0\n        self.collision_interval = 0\n        self.last_collision_step = 0\n        self.collision_deque = deque(maxlen=100)\n        self.total_steps = 0\n\n        # bev parameters\n        self.use_seg_bev = True if activate_seg_bev else False\n        self._width = 192\n        self._pixels_per_meter = 5.0\n        self._distance_threshold = np.ceil(self._width / self._pixels_per_meter)\n        self._scale_bbox = True\n        self._history_queue = deque(maxlen=20)\n        self._pixels_ev_to_bottom = 40\n        self._history_idx = [-16, -11, -6, -1]\n        self._scale_mask_col = 1.0\n        maps_h5_path = './carla_env/envs/{}.h5'.format(town)\n        with h5py.File(maps_h5_path, 'r', libver='latest', swmr=True) as hf:\n            self._road = np.array(hf['road'], dtype=np.uint8)\n            self._lane_marking_all = np.array(hf['lane_marking_all'], dtype=np.uint8)\n            self._lane_marking_white_broken = np.array(hf['lane_marking_white_broken'], dtype=np.uint8)\n            self._world_offset = np.array(hf.attrs['world_offset_in_meters'], dtype=np.float32)\n            assert np.isclose(self._pixels_per_meter, float(hf.attrs['pixels_per_meter']))\n\n        self.world = None\n        try:\n            self.client = carla.Client(host, port)\n            self.client.set_timeout(5.0)\n            self.world = World(self.client, town=town)\n\n            settings = self.world.get_settings()\n            settings.fixed_delta_seconds = 1 / self.fps\n            settings.synchronous_mode = True\n            self.world.apply_settings(settings)\n            self.client.reload_world(False)\n            if not self.eval:\n                self.world.set_weather(\n                    carla.WeatherParameters(cloudiness=100.0, precipitation=0.0, sun_altitude_angle=45.0, )\n                )\n\n            self.vehicle = Vehicle(self.world, self.world.map.get_spawn_points()[0],\n                                   on_collision_fn=lambda e: self._on_collision(e),\n                                   on_invasion_fn=lambda e: self._on_invasion(e),\n                                   is_ego=True)\n\n            if self.activate_render:\n                pygame.init()\n                pygame.font.init()\n                self.display = pygame.display.set_mode((width, height), pygame.HWSURFACE | pygame.DOUBLEBUF)\n                self.clock = pygame.time.Clock()\n                self.hud = HUD(width, height)\n                self.hud.set_vehicle(self.vehicle)\n                self.world.on_tick(self.hud.on_world_tick)\n\n            seg_settings = {}\n            if \"seg_camera\" in self.observation_space.spaces:\n                seg_settings.update({\n                    'camera_type': \"sensor.camera.semantic_segmentation\",\n                    'custom_palette': True\n                })\n            if not self.use_seg_bev:\n                self.dashcam = Camera(self.world, out_width, out_height,\n                                      transform=sensor_transforms[\"dashboard\"],\n                                      attach_to=self.vehicle, on_recv_image=lambda e: self._set_observation_image(e),\n                                      **seg_settings)\n\n            if self.activate_spectator:\n                self.camera = Camera(self.world, width, height,\n                                     transform=sensor_transforms[\"spectator\"],\n                                     attach_to=self.vehicle, on_recv_image=lambda e: self._set_viewer_image(e))\n                self.bev_spectator = Camera(self.world, height // 2, height // 2,\n                                            transform=sensor_transforms[\"birdview0\"],\n                                            attach_to=self.vehicle,\n                                            on_recv_image=lambda e: self._set_bev_spectator_data(e))\n            if self.activate_bev:\n                self.bev = Camera(self.world, 517, 517,\n                                  transform=sensor_transforms[\"birdview1\"],\n                                  attach_to=self.vehicle, on_recv_image=lambda e: self._set_bev_data(e))\n\n            if self.activate_traffic_flow:\n                self.tf_num = tf_num\n                self.traffic_manager = self.client.get_trafficmanager(port + 6000)\n                self.traffic_manager.set_global_distance_to_leading_vehicle(2.5)\n                self.traffic_manager.set_synchronous_mode(True)\n                self.traffic_manager.set_hybrid_physics_mode(True)\n                self.traffic_manager.set_hybrid_physics_radius(50.0)\n        except Exception as e:\n            self.close()\n            raise e\n        # Reset env to set initial state\n        self.reset()\n\n    def reset(self, is_training=False):\n        # Create new route\n        self.num_routes_completed = -1\n        self.episode_idx += 1\n        self.new_route()\n\n        self.terminal_state = False  # Set to True when we want to end episode\n        self.success_state = False  # Set to True when we want to end episode.\n        self.collision_state = False\n        self._history_queue.clear()\n        self.world.world = None\n        self.world.world = self.client.get_world()\n\n        self.closed = False  # Set to True when ESC is pressed\n        self.extra_info = []  # List of extra info shown on the HUD\n        self.observation = self.observation_buffer = None  # Last received observation\n        self.viewer_image = self.viewer_image_buffer = None  # Last received image to show in the viewer\n        self.bev_spectator_data = self.bev_spectator_data_buffer = None\n        self.bev_data = self.bev_data_buffer = None\n        self.step_count = 0\n\n        self.total_reward = 0.0\n        self.previous_location = self.vehicle.get_transform().location\n        self.distance_traveled = 0.0\n        self.center_lane_deviation = 0.0\n        self.speed_accum = 0.0\n        self.routes_completed = 0.0\n        self.low_speed_timer = 0.0\n        self.collision = False\n        self.action_list = []\n        self.world.tick()\n\n        time.sleep(0.2)\n        obs = self.step(None)[0]\n        time.sleep(0.2)\n\n        return obs\n\n    def new_route(self):\n        if self.activate_traffic_flow:\n            for bg_veh in list(self.traffic_flow_vehicles):\n                if bg_veh.is_alive:\n                    bg_veh.destroy()\n                    self.traffic_flow_vehicles.remove(bg_veh)\n\n        self.vehicle.control.steer = float(0.0)\n        self.vehicle.control.throttle = float(0.0)\n        self.vehicle.set_simulate_physics(False)\n\n        if not self.eval:\n            if self.episode_idx % 2 == 0 and self.num_routes_completed == -1:\n                spawn_points_list = [self.world.map.get_spawn_points()[index] for index in next(intersection_routes)]\n            else:\n                spawn_points_list = np.random.choice(self.world.map.get_spawn_points(), 2, replace=False)\n        else:\n            spawn_points_list = [self.world.map.get_spawn_points()[index] for index in next(eval_routes)]\n        route_length = 1\n        while route_length <= 1:\n            self.start_wp, self.end_wp = [self.world.map.get_waypoint(spawn.location) for spawn in\n                                          spawn_points_list]\n            self.route_waypoints = compute_route_waypoints(self.world.map, self.start_wp, self.end_wp, resolution=1.0)\n            route_length = len(self.route_waypoints)\n            if route_length <= 1:\n                spawn_points_list = np.random.choice(self.world.map.get_spawn_points(), 2, replace=False)\n\n        self.distance_from_center_history = deque(maxlen=30)\n\n        self.current_waypoint_index = 0\n        self.num_routes_completed += 1\n        self.vehicle.set_transform(self.start_wp.transform)\n        time.sleep(0.2)\n        self.vehicle.set_simulate_physics(True)\n\n        if self.activate_traffic_flow:\n            spawn_points = self.world.get_map().get_spawn_points()\n            number_of_vehicles = min(len(spawn_points), self.tf_num)  # Adjust the number of vehicles as needed\n\n            for _ in range(number_of_vehicles):\n                blueprint = random_choice_from_blueprint(class_blueprint)\n                blueprint = self.world.get_blueprint_library().find(blueprint)\n                spawn_point = random.choice(spawn_points)\n                bg_veh = self.world.try_spawn_actor(blueprint, spawn_point)\n\n                if bg_veh:\n                    bg_veh.set_autopilot(True, self.traffic_manager.get_port())\n                    self.traffic_flow_vehicles.append(bg_veh)\n\n    def close(self):\n        if self.carla_process:\n            self.carla_process.terminate()\n        pygame.quit()\n        if self.world is not None:\n            self.world.destroy()\n        self.closed = True\n        if self.world is not None:\n            actors = self.world.get_actors().filter('vehicle.*')\n            for actor in actors:\n                actor.destroy()\n\n    def render(self, mode=\"human\"):\n        if mode == \"rgb_array_no_hud\":\n            return self.viewer_image\n        elif mode == \"rgb_array\":\n            return np.array(pygame.surfarray.array3d(self.display), dtype=np.uint8).transpose([1, 0, 2])\n        elif mode == \"state_pixels\":\n            return self.observation_render\n\n        self.clock.tick()\n        self.hud.tick(self.world, self.clock)\n\n        if self.current_road_maneuver == RoadOption.LANEFOLLOW:\n            maneuver = \"Follow Lane\"\n        elif self.current_road_maneuver == RoadOption.LEFT:\n            maneuver = \"Left\"\n        elif self.current_road_maneuver == RoadOption.RIGHT:\n            maneuver = \"Right\"\n        elif self.current_road_maneuver == RoadOption.STRAIGHT:\n            maneuver = \"Straight\"\n        elif self.current_road_maneuver == RoadOption.CHANGELANELEFT:\n            maneuver = \"Change Lane Left\"\n        elif self.current_road_maneuver == RoadOption.CHANGELANERIGHT:\n            maneuver = \"Change Lane Right\"\n        else:\n            maneuver = \"INVALID\"\n\n        self.extra_info.extend([\n            \"Episode {}\".format(self.episode_idx),\n            \"\",\n            \"Maneuver:  % 17s\" % maneuver,\n            \"Throttle:            %7.2f\" % self.vehicle.control.throttle,\n            \"Brake:               %7.2f\" % self.vehicle.control.brake,\n            \"Steer:               %7.2f\" % self.vehicle.control.steer,\n            \"Routes completed:    % 7.2f\" % self.routes_completed,\n            \"Distance traveled: % 7d m\" % self.distance_traveled,\n            \"Center deviance:   % 7.2f m\" % self.distance_from_center,\n            \"Avg center dev:    % 7.2f m\" % (self.center_lane_deviation / self.step_count),\n            \"Avg speed:      % 7.2f km/h\" % (self.speed_accum / self.step_count),\n        ])\n\n        if self.activate_spectator:\n            self.viewer_image = self._draw_path(self.camera, self.viewer_image)\n            self.display.blit(pygame.surfarray.make_surface(self.viewer_image.swapaxes(0, 1)), (0, 0))\n\n            new_size = (self.display.get_size()[1] // 2, self.display.get_size()[1] // 2)\n            pos_bev = (self.display.get_size()[0] - self.display.get_size()[1] // 2, self.display.get_size()[1] // 2)\n            bev_surface = pygame.surfarray.make_surface(self.bev_spectator_data.swapaxes(0, 1))\n            scaled_surface = pygame.transform.scale(bev_surface, new_size)\n            self.display.blit(scaled_surface, pos_bev)\n\n            pos_obs = (self.display.get_size()[0] - self.display.get_size()[1] // 2, 0)\n            obs_surface = pygame.surfarray.make_surface(self.observation_render.swapaxes(0, 1))\n            scaled_surface = pygame.transform.scale(obs_surface, new_size)\n            self.display.blit(scaled_surface, pos_obs)\n\n        self.hud.render(self.display, extra_info=self.extra_info)\n        self.extra_info = []\n\n        pygame.display.flip()\n\n    def step(self, action):\n        if self.closed:\n            raise Exception(\"CarlaEnv.step() called after the environment was closed.\" +\n                            \"Check for info[\\\"closed\\\"] == True in the learning loop.\")\n        if action is not None:\n            if self.current_waypoint_index >= len(self.route_waypoints) - 1:\n                if not self.eval:\n                    self.new_route()\n                else:\n                    self.success_state = True\n\n            if self.action_space_type == \"continuous\":\n                steer, throttle = [float(a) for a in action]\n            elif self.action_space_type == \"discrete\":\n                throttle, steer = discrete_actions[int(action)]\n\n            self.vehicle.control.steer = smooth_action(self.vehicle.control.steer, steer, self.action_smoothing)\n            if throttle >= 0:\n                self.vehicle.control.throttle = throttle\n                self.vehicle.control.brake = 0\n            else:\n                self.vehicle.control.throttle = 0\n                self.vehicle.control.brake = -throttle\n            self.action_list.append(self.vehicle.control.steer)\n        self.world.tick()\n\n        if self.use_seg_bev:\n            self.observation_render = self._get_observation_seg_bev()['rendered']\n            self.observation = self._get_observation_seg_bev()['masks']\n        else:\n            self.observation = self._get_observation()\n            self.observation_render = self._get_observation_seg_bev()['rendered']\n        if self.activate_spectator:\n            self.viewer_image = self._get_viewer_image()\n            self.bev_spectator_data = self._get_bev_spectator_data()\n\n        if self.activate_bev:\n            self.bev_data = self._get_bev_data()\n\n        transform = self.vehicle.get_transform()\n\n        self.prev_waypoint_index = self.current_waypoint_index\n        waypoint_index = self.current_waypoint_index\n        for _ in range(len(self.route_waypoints)):\n            next_waypoint_index = waypoint_index + 1\n            wp, _ = self.route_waypoints[next_waypoint_index % len(self.route_waypoints)]\n            dot = np.dot(vector(wp.transform.get_forward_vector())[:2],\n                         vector(transform.location - wp.transform.location)[:2])\n            if dot > 0.0:\n                waypoint_index += 1\n            else:\n                break\n        self.current_waypoint_index = waypoint_index\n\n        if self.current_waypoint_index < len(self.route_waypoints) - 1:\n            self.next_waypoint, self.next_road_maneuver = self.route_waypoints[\n                (self.current_waypoint_index + 1) % len(self.route_waypoints)]\n\n        self.current_waypoint, self.current_road_maneuver = self.route_waypoints[\n            self.current_waypoint_index % len(self.route_waypoints)]\n        self.routes_completed = self.num_routes_completed + (self.current_waypoint_index + 1) / len(\n            self.route_waypoints)\n\n        self.distance_from_center = distance_to_line(vector(self.current_waypoint.transform.location),\n                                                     vector(self.next_waypoint.transform.location),\n                                                     vector(transform.location))\n        self.center_lane_deviation += self.distance_from_center\n\n        if action is not None:\n            self.distance_traveled += self.previous_location.distance(transform.location)\n        self.previous_location = transform.location\n\n        self.speed_accum += self.vehicle.get_speed()\n\n        if self.distance_traveled >= self.max_distance and not self.eval:\n            self.success_state = True\n\n        self.distance_from_center_history.append(self.distance_from_center)\n\n        self.last_reward = self.reward_fn(self)\n        self.total_reward += self.last_reward\n\n        encoded_state = self.encode_state_fn(self)\n        self.step_count += 1\n        self.total_steps += 1\n\n        if self.activate_render:\n            pygame.event.pump()\n            if pygame.key.get_pressed()[K_ESCAPE]:\n                self.close()\n                self.terminal_state = True\n            self.render()\n\n        max_distance = CONFIG.reward_params.max_distance\n        max_std_center_lane = CONFIG.reward_params.max_std_center_lane\n        max_angle_center_lane = CONFIG.reward_params.max_angle_center_lane\n        centering_factor = max(1.0 - self.distance_from_center / max_distance, 0.0)\n\n        angle = self.vehicle.get_angle(self.current_waypoint)\n        angle_factor = max(1.0 - abs(angle / np.deg2rad(max_angle_center_lane)), 0.0)\n\n        std = np.std(self.distance_from_center_history)\n        distance_std_factor = max(1.0 - abs(std / max_std_center_lane), 0.0)\n\n        if self.terminal_state or self.success_state:\n            if self.collision_state:\n                self.collision_num += 1\n                self.collision_deque.append(1)\n                self.cps = 1 / self.step_count\n                self.cpm = 1 / self.distance_traveled * 1000\n                self.collision_interval = self.total_steps - self.last_collision_step\n                self.last_collision_step = self.total_steps\n                self.collision_speed = self.vehicle.get_speed()\n            else:\n                self.collision_deque.append(0)\n\n        info = {\n            \"closed\": self.closed,\n            'total_reward': self.total_reward,\n            'routes_completed': self.routes_completed,\n            'total_distance': self.distance_traveled,\n            'avg_center_dev': (self.center_lane_deviation / self.step_count),\n            'avg_speed': (self.speed_accum / self.step_count),\n            'mean_reward': (self.total_reward / self.step_count),\n            'render_array': self.bev_data,\n            \"centering_factor\": centering_factor,\n            \"angle_factor\": angle_factor,\n            \"distance_std_factor\": distance_std_factor,\n            \"speed\": self.vehicle.get_speed(),\n            \"collision_num\": self.collision_num,\n            \"collision_rate\": sum(self.collision_deque) / len(self.collision_deque) if self.collision_deque else 0.0,\n            \"episode_length\": self.step_count,\n            \"collision_state\": self.collision_state,\n        }\n\n        if self.terminal_state or self.success_state:\n            if self.collision_state:\n                info.update({\"CPS\": self.cps,\n                             \"CPM\": self.cpm,\n                             \"collision_interval\": self.collision_interval,\n                             \"collision_speed\": self.collision_speed,\n                             })\n        return encoded_state, self.last_reward, self.terminal_state or self.success_state, info\n\n    def _draw_path_server(self, life_time=60.0, skip=0):\n        \"\"\"\n            Draw a connected path from start of route to end.\n            Green node = start\n            Red node   = point along path\n            Blue node  = destination\n        \"\"\"\n        for i in range(0, len(self.route_waypoints) - 1, skip + 1):\n            z = 30.25\n            w0 = self.route_waypoints[i][0]\n            w1 = self.route_waypoints[i + 1][0]\n            self.world.debug.draw_line(\n                w0.transform.location + carla.Location(z=z),\n                w1.transform.location + carla.Location(z=z),\n                thickness=0.1, color=carla.Color(255, 0, 0),\n                life_time=life_time, persistent_lines=False)\n            self.world.debug.draw_point(\n                w0.transform.location + carla.Location(z=z), 0.1,\n                carla.Color(0, 255, 0) if i == 0 else carla.Color(255, 0, 0),\n                life_time, False)\n        self.world.debug.draw_point(\n            self.route_waypoints[-1][0].transform.location + carla.Location(z=z), 0.1,\n            carla.Color(0, 0, 255),\n            life_time, False)\n\n    def _draw_path(self, camera, image):\n        \"\"\"\n            Draw a connected path from start of route to end using homography.\n        \"\"\"\n        vehicle_vector = vector(self.vehicle.get_transform().location)\n        # Get the world to camera matrix\n        world_2_camera = np.array(camera.get_transform().get_inverse_matrix())\n\n        # Get the attributes from the camera\n        image_w = int(camera.actor.attributes['image_size_x'])\n        image_h = int(camera.actor.attributes['image_size_y'])\n        fov = float(camera.actor.attributes['fov'])\n        for i in range(self.current_waypoint_index, len(self.route_waypoints)):\n            waypoint_location = self.route_waypoints[i][0].transform.location + carla.Location(z=1.25)\n            waypoint_vector = vector(waypoint_location)\n            if not (2 < abs(np.linalg.norm(vehicle_vector - waypoint_vector)) < 50):\n                continue\n            # Calculate the camera projection matrix to project from 3D -> 2D\n            K = build_projection_matrix(image_w, image_h, fov)\n            x, y = get_image_point(waypoint_location, K, world_2_camera)\n            if i == len(self.route_waypoints) - 1:\n                color = (255, 0, 0)\n            else:\n                color = (0, 0, 255)\n            image = cv2.circle(image, (x, y), radius=3, color=color, thickness=-1)\n        return image\n\n    def _get_observation(self):\n        while self.observation_buffer is None:\n            pass\n        obs = self.observation_buffer.copy()\n        self.observation_buffer = None\n        return obs\n\n    def _get_viewer_image(self):\n        while self.viewer_image_buffer is None:\n            pass\n        image = self.viewer_image_buffer.copy()\n        self.viewer_image_buffer = None\n        return image\n\n    def _get_bev_spectator_data(self):\n        while self.bev_spectator_data_buffer is None:\n            pass\n        image = self.bev_spectator_data_buffer.copy()\n        self.bev_spectator_data_buffer = None\n        return image\n\n    def _get_bev_data(self):\n        while self.bev_data_buffer is None:\n            pass\n        image = self.bev_data_buffer.copy()\n        self.bev_data_buffer = None\n        return image\n\n    def _on_collision(self, event):\n        if get_actor_display_name(event.other_actor) != \"Road\":\n            self.terminal_state = True\n            self.collision_state = True\n            print(\"0| Terminal:  Collision with {}\".format(event.other_actor.type_id))\n        if self.activate_render:\n            self.hud.notification(\"Collision with {}\".format(get_actor_display_name(event.other_actor)))\n\n    def _on_invasion(self, event):\n        lane_types = set(x.type for x in event.crossed_lane_markings)\n        text = [\"%r\" % str(x).split()[-1] for x in lane_types]\n        if self.activate_render:\n            self.hud.notification(\"Crossed line %s\" % \" and \".join(text))\n\n    def _set_observation_image(self, image):\n        self.observation_buffer = image\n\n    def _set_viewer_image(self, image):\n        self.viewer_image_buffer = image\n\n    def _set_bev_spectator_data(self, image):\n        self.bev_spectator_data_buffer = image\n\n    def _set_bev_data(self, image):\n        self.bev_data_buffer = image\n\n    def _get_observation_seg_bev(self):\n        ev_transform = self.vehicle.get_transform()\n        ev_loc = ev_transform.location\n        ev_rot = ev_transform.rotation\n        ev_bbox = self.vehicle.bounding_box\n\n        def is_within_distance(w):\n            c_distance = abs(ev_loc.x - w.location.x) < self._distance_threshold \\\n                         and abs(ev_loc.y - w.location.y) < self._distance_threshold \\\n                         and abs(ev_loc.z - w.location.z) < 8.0\n            c_ev = abs(ev_loc.x - w.location.x) < 1.0 and abs(ev_loc.y - w.location.y) < 1.0\n            return c_distance and (not c_ev)\n\n        vehicle_bbox_list = self.world.get_level_bbs(carla.CityObjectLabel.Vehicles)\n        walker_bbox_list = self.world.get_level_bbs(carla.CityObjectLabel.Pedestrians)\n        if self._scale_bbox:\n            vehicles = self._get_surrounding_actors(vehicle_bbox_list, is_within_distance, 1.0)\n            walkers = self._get_surrounding_actors(walker_bbox_list, is_within_distance, 2.0)\n        else:\n            vehicles = self._get_surrounding_actors(vehicle_bbox_list, is_within_distance)\n            walkers = self._get_surrounding_actors(walker_bbox_list, is_within_distance)\n\n        self._history_queue.append((vehicles, walkers))\n\n        M_warp = self._get_warp_transform(ev_loc, ev_rot)\n\n        # objects with history\n        vehicle_masks, walker_masks = self._get_history_masks(M_warp)\n\n        # road_mask, lane_mask\n        road_mask = cv2.warpAffine(self._road, M_warp, (self._width, self._width)).astype(bool)\n        lane_mask_all = cv2.warpAffine(self._lane_marking_all, M_warp, (self._width, self._width)).astype(bool)\n        lane_mask_broken = cv2.warpAffine(self._lane_marking_white_broken, M_warp,\n                                          (self._width, self._width)).astype(bool)\n\n        # ev_mask\n        ev_mask = self._get_mask_from_actor_list([(ev_transform, ev_bbox.location, ev_bbox.extent)], M_warp)\n\n        # render\n        image = np.zeros([self._width, self._width, 3], dtype=np.uint8)\n        image[road_mask] = COLOR_ALUMINIUM_5\n        # image[route_mask] = COLOR_ALUMINIUM_3\n        image[lane_mask_all] = COLOR_MAGENTA\n        image[lane_mask_broken] = COLOR_MAGENTA_2\n\n        h_len = len(self._history_idx) - 1\n\n        for i, mask in enumerate(vehicle_masks):\n            image[mask] = tint(COLOR_BLUE, (h_len - i) * 0.2)\n        for i, mask in enumerate(walker_masks):\n            image[mask] = tint(COLOR_CYAN, (h_len - i) * 0.2)\n\n        image[ev_mask] = COLOR_WHITE\n\n        # masks\n        c_road = road_mask * 255\n        c_lane = lane_mask_all * 255\n        c_lane[lane_mask_broken] = 120\n\n        # masks with history\n        c_vehicle_history = [m * 255 for m in vehicle_masks]\n\n        masks = np.stack((c_road, c_lane, *c_vehicle_history), axis=2)\n\n        obs_dict = {'rendered': image, 'masks': masks}\n\n        return obs_dict\n\n    @staticmethod\n    def _get_surrounding_actors(bbox_list, criterium, scale=None):\n        actors = []\n        for bbox in bbox_list:\n            is_within_distance = criterium(bbox)\n            if is_within_distance:\n                bb_loc = carla.Location()\n                bb_ext = carla.Vector3D(bbox.extent)\n                if scale is not None:\n                    bb_ext = bb_ext * scale\n                    bb_ext.x = max(bb_ext.x, 0.8)\n                    bb_ext.y = max(bb_ext.y, 0.8)\n\n                actors.append((carla.Transform(bbox.location, bbox.rotation), bb_loc, bb_ext))\n        return actors\n\n    def _get_warp_transform(self, ev_loc, ev_rot):\n        ev_loc_in_px = self._world_to_pixel(ev_loc)\n        yaw = np.deg2rad(ev_rot.yaw)\n\n        forward_vec = np.array([np.cos(yaw), np.sin(yaw)])\n        right_vec = np.array([np.cos(yaw + 0.5 * np.pi), np.sin(yaw + 0.5 * np.pi)])\n\n        bottom_left = ev_loc_in_px - self._pixels_ev_to_bottom * forward_vec - (0.5 * self._width) * right_vec\n        top_left = ev_loc_in_px + (self._width - self._pixels_ev_to_bottom) * forward_vec - (\n            0.5 * self._width) * right_vec\n        top_right = ev_loc_in_px + (self._width - self._pixels_ev_to_bottom) * forward_vec + (\n            0.5 * self._width) * right_vec\n\n        src_pts = np.stack((bottom_left, top_left, top_right), axis=0).astype(np.float32)\n        dst_pts = np.array([[0, self._width - 1],\n                            [0, 0],\n                            [self._width - 1, 0]], dtype=np.float32)\n        return cv2.getAffineTransform(src_pts, dst_pts)\n\n    def _world_to_pixel(self, location, projective=False):\n        \"\"\"Converts the world coordinates to pixel coordinates\"\"\"\n        x = self._pixels_per_meter * (location.x - self._world_offset[0])\n        y = self._pixels_per_meter * (location.y - self._world_offset[1])\n\n        if projective:\n            p = np.array([x, y, 1], dtype=np.float32)\n        else:\n            p = np.array([x, y], dtype=np.float32)\n        return p\n\n    def _get_history_masks(self, M_warp):\n        qsize = len(self._history_queue)\n        vehicle_masks, walker_masks, tl_green_masks, tl_yellow_masks, tl_red_masks, stop_masks = [], [], [], [], [], []\n        for idx in self._history_idx:\n            idx = max(idx, -1 * qsize)\n\n            vehicles, walkers = self._history_queue[idx]\n\n            vehicle_masks.append(self._get_mask_from_actor_list(vehicles, M_warp))\n            walker_masks.append(self._get_mask_from_actor_list(walkers, M_warp))\n\n        return vehicle_masks, walker_masks\n\n    def _get_mask_from_actor_list(self, actor_list, M_warp):\n        mask = np.zeros([self._width, self._width], dtype=np.uint8)\n        # for actor_transform, bb_loc, bb_ext in actor_list:\n        for data in actor_list:\n            if len(data) == 12:\n                loc_x, loc_y, loc_z, pitch, yaw, roll, bb_loc_x, bb_loc_y, bb_loc_z, bb_ext_x, bb_ext_y, bb_ext_z = data\n                actor_transform = carla.Transform(\n                    carla.Location(x=loc_x, y=loc_y, z=loc_z),\n                    carla.Rotation(pitch=pitch, yaw=yaw, roll=roll)\n                )\n                bb_loc = carla.Location(x=bb_loc_x, y=bb_loc_y, z=bb_loc_z)\n                bb_ext = carla.Vector3D(x=bb_ext_x, y=bb_ext_y, z=bb_ext_z)\n            else:\n                actor_transform, bb_loc, bb_ext = data\n            corners = [carla.Location(x=-bb_ext.x, y=-bb_ext.y),\n                       carla.Location(x=bb_ext.x, y=-bb_ext.y),\n                       carla.Location(x=bb_ext.x, y=0),\n                       carla.Location(x=bb_ext.x, y=bb_ext.y),\n                       carla.Location(x=-bb_ext.x, y=bb_ext.y)]\n            corners = [bb_loc + corner for corner in corners]\n\n            corners = [actor_transform.transform(corner) for corner in corners]\n            corners_in_pixel = np.array([[self._world_to_pixel(corner)] for corner in corners])\n            corners_warped = cv2.transform(corners_in_pixel, M_warp)\n\n            cv2.fillConvexPoly(mask, np.round(corners_warped).astype(np.int32), 1)\n        return mask.astype(bool)\n\n\nif __name__ == \"__main__\":\n    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'\n\n"
  },
  {
    "path": "carla_env/navigation/__init__.py",
    "content": ""
  },
  {
    "path": "carla_env/navigation/agent.py",
    "content": "#!/usr/bin/env python\n\n# Copyright (c) 2018 Intel Labs.\n# authors: German Ros (german.ros@intel.com)\n#\n# This work is licensed under the terms of the MIT license.\n# For a copy, see <https://opensource.org/licenses/MIT>.\n\n\"\"\" This module implements an agent that roams around a track following random\nwaypoints and avoiding other vehicles.\nThe agent also responds to traffic lights. \"\"\"\n\nfrom enum import Enum\n\nimport carla\nfrom agents.tools.misc import is_within_distance_ahead, compute_magnitude_angle\n\n\nclass AgentState(Enum):\n    \"\"\"\n    AGENT_STATE represents the possible states of a roaming agent\n    \"\"\"\n    NAVIGATING = 1\n    BLOCKED_BY_VEHICLE = 2\n    BLOCKED_RED_LIGHT = 3\n\n\nclass Agent(object):\n    \"\"\"\n    Base class to define agents in CARLA\n    \"\"\"\n\n    def __init__(self, vehicle):\n        \"\"\"\n\n        :param vehicle: actor to apply to local planner logic onto\n        \"\"\"\n        self._vehicle = vehicle\n        self._proximity_threshold = 10.0  # meters\n        self._local_planner = None\n        self._world = self._vehicle.get_world()\n        self._map = self._vehicle.get_world().get_map()\n        self._last_traffic_light = None\n\n    def run_step(self, debug=False):\n        \"\"\"\n        Execute one step of navigation.\n        :return: control\n        \"\"\"\n        control = carla.VehicleControl()\n\n        if debug:\n            control.steer = 0.0\n            control.throttle = 0.0\n            control.brake = 0.0\n            control.hand_brake = False\n            control.manual_gear_shift = False\n\n        return control\n\n    def _is_light_red(self, lights_list):\n        \"\"\"\n        Method to check if there is a red light affecting us. This version of\n        the method is compatible with both European and US style traffic lights.\n\n        :param lights_list: list containing TrafficLight objects\n        :return: a tuple given by (bool_flag, traffic_light), where\n                 - bool_flag is True if there is a traffic light in RED\n                   affecting us and False otherwise\n                 - traffic_light is the object itself or None if there is no\n                   red traffic light affecting us\n        \"\"\"\n        if self._map.name == 'Town01' or self._map.name == 'Town02':\n            return self._is_light_red_europe_style(lights_list)\n        else:\n            return self._is_light_red_us_style(lights_list)\n\n    def _is_light_red_europe_style(self, lights_list):\n        \"\"\"\n        This method is specialized to check European style traffic lights.\n\n        :param lights_list: list containing TrafficLight objects\n        :return: a tuple given by (bool_flag, traffic_light), where\n                 - bool_flag is True if there is a traffic light in RED\n                  affecting us and False otherwise\n                 - traffic_light is the object itself or None if there is no\n                   red traffic light affecting us\n        \"\"\"\n        ego_vehicle_location = self._vehicle.get_location()\n        ego_vehicle_waypoint = self._map.get_waypoint(ego_vehicle_location)\n\n        for traffic_light in lights_list:\n            object_waypoint = self._map.get_waypoint(traffic_light.get_location())\n            if object_waypoint.road_id != ego_vehicle_waypoint.road_id or \\\n                    object_waypoint.lane_id != ego_vehicle_waypoint.lane_id:\n                continue\n\n            loc = traffic_light.get_location()\n            if is_within_distance_ahead(loc, ego_vehicle_location,\n                                        self._vehicle.get_transform().rotation.yaw,\n                                        self._proximity_threshold):\n                if traffic_light.state == carla.TrafficLightState.Red:\n                    return (True, traffic_light)\n\n        return (False, None)\n\n    def _is_light_red_us_style(self, lights_list, debug=False):\n        \"\"\"\n        This method is specialized to check US style traffic lights.\n\n        :param lights_list: list containing TrafficLight objects\n        :return: a tuple given by (bool_flag, traffic_light), where\n                 - bool_flag is True if there is a traffic light in RED\n                   affecting us and False otherwise\n                 - traffic_light is the object itself or None if there is no\n                   red traffic light affecting us\n        \"\"\"\n        ego_vehicle_location = self._vehicle.get_location()\n        ego_vehicle_waypoint = self._map.get_waypoint(ego_vehicle_location)\n\n        if ego_vehicle_waypoint.is_intersection:\n            # It is too late. Do not block the intersection! Keep going!\n            return (False, None)\n\n        if self._local_planner.target_waypoint is not None:\n            if self._local_planner.target_waypoint.is_intersection:\n                min_angle = 180.0\n                sel_magnitude = 0.0\n                sel_traffic_light = None\n                for traffic_light in lights_list:\n                    loc = traffic_light.get_location()\n                    magnitude, angle = compute_magnitude_angle(loc,\n                                                               ego_vehicle_location,\n                                                               self._vehicle.get_transform().rotation.yaw)\n                    if magnitude < 60.0 and angle < min(25.0, min_angle):\n                        sel_magnitude = magnitude\n                        sel_traffic_light = traffic_light\n                        min_angle = angle\n\n                if sel_traffic_light is not None:\n                    if debug:\n                        print('=== Magnitude = {} | Angle = {} | ID = {}'.format(\n                            sel_magnitude, min_angle, sel_traffic_light.id))\n\n                    if self._last_traffic_light is None:\n                        self._last_traffic_light = sel_traffic_light\n\n                    if self._last_traffic_light.state == carla.TrafficLightState.Red:\n                        return (True, self._last_traffic_light)\n                else:\n                    self._last_traffic_light = None\n\n        return (False, None)\n\n    def _is_vehicle_hazard(self, vehicle_list):\n        \"\"\"\n        Check if a given vehicle is an obstacle in our way. To this end we take\n        into account the road and lane the target vehicle is on and run a\n        geometry test to check if the target vehicle is under a certain distance\n        in front of our ego vehicle.\n\n        WARNING: This method is an approximation that could fail for very large\n         vehicles, which center is actually on a different lane but their\n         extension falls within the ego vehicle lane.\n\n        :param vehicle_list: list of potential obstacle to check\n        :return: a tuple given by (bool_flag, vehicle), where\n                 - bool_flag is True if there is a vehicle ahead blocking us\n                   and False otherwise\n                 - vehicle is the blocker object itself\n        \"\"\"\n\n        ego_vehicle_location = self._vehicle.get_location()\n        ego_vehicle_waypoint = self._map.get_waypoint(ego_vehicle_location)\n\n        for target_vehicle in vehicle_list:\n            # do not account for the ego vehicle\n            if target_vehicle.id == self._vehicle.id:\n                continue\n\n            # if the object is not in our lane it's not an obstacle\n            target_vehicle_waypoint = self._map.get_waypoint(target_vehicle.get_location())\n            if target_vehicle_waypoint.road_id != ego_vehicle_waypoint.road_id or \\\n                    target_vehicle_waypoint.lane_id != ego_vehicle_waypoint.lane_id:\n                continue\n\n            loc = target_vehicle.get_location()\n            if is_within_distance_ahead(loc, ego_vehicle_location,\n                                        self._vehicle.get_transform().rotation.yaw,\n                                        self._proximity_threshold):\n                return (True, target_vehicle)\n\n        return (False, None)\n\n    def emergency_stop(self):\n        \"\"\"\n        Send an emergency stop command to the vehicle\n        :return:\n        \"\"\"\n        control = carla.VehicleControl()\n        control.steer = 0.0\n        control.throttle = 0.0\n        control.brake = 1.0\n        control.hand_brake = False\n\n        return control\n"
  },
  {
    "path": "carla_env/navigation/basic_agent.py",
    "content": "#!/usr/bin/env python\n\n# Copyright (c) 2018 Intel Labs.\n# authors: German Ros (german.ros@intel.com)\n#\n# This work is licensed under the terms of the MIT license.\n# For a copy, see <https://opensource.org/licenses/MIT>.\n\n\"\"\" This module implements an agent that roams around a track following random\nwaypoints and avoiding other vehicles.\nThe agent also responds to traffic lights. \"\"\"\n\n\nimport carla\nfrom agents.navigation.agent import Agent, AgentState\nfrom agents.navigation.local_planner import LocalPlanner\nfrom agents.navigation.global_route_planner import GlobalRoutePlanner\nfrom agents.navigation.global_route_planner_dao import GlobalRoutePlannerDAO\n\nclass BasicAgent(Agent):\n    \"\"\"\n    BasicAgent implements a basic agent that navigates scenes to reach a given\n    target destination. This agent respects traffic lights and other vehicles.\n    \"\"\"\n\n    def __init__(self, vehicle, target_speed=20):\n        \"\"\"\n\n        :param vehicle: actor to apply to local planner logic onto\n        \"\"\"\n        super(BasicAgent, self).__init__(vehicle)\n\n        self._proximity_threshold = 10.0  # meters\n        self._state = AgentState.NAVIGATING\n        args_lateral_dict = {\n            'K_P': 1,\n            'K_D': 0.02,\n            'K_I': 0,\n            'dt': 1.0/20.0}\n        self._local_planner = LocalPlanner(\n            self._vehicle, opt_dict={'target_speed' : target_speed,\n            'lateral_control_dict':args_lateral_dict})\n        self._hop_resolution = 2.0\n        self._path_seperation_hop = 2\n        self._path_seperation_threshold = 0.5\n        self._target_speed = target_speed\n        self._grp = None\n\n    def set_destination(self, location):\n        \"\"\"\n        This method creates a list of waypoints from agent's position to destination location\n        based on the route returned by the global router\n        \"\"\"\n\n        start_waypoint = self._map.get_waypoint(self._vehicle.get_location())\n        end_waypoint = self._map.get_waypoint(\n            carla.Location(location[0], location[1], location[2]))\n\n        route_trace = self._trace_route(start_waypoint, end_waypoint)\n        assert route_trace\n\n        self._local_planner.set_global_plan(route_trace)\n\n    def _trace_route(self, start_waypoint, end_waypoint):\n        \"\"\"\n        This method sets up a global router and returns the optimal route\n        from start_waypoint to end_waypoint\n        \"\"\"\n\n        # Setting up global router\n        if self._grp is None:\n            dao = GlobalRoutePlannerDAO(self._vehicle.get_world().get_map(), self._hop_resolution)\n            grp = GlobalRoutePlanner(dao)\n            grp.setup()\n            self._grp = grp\n\n        # Obtain route plan\n        route = self._grp.trace_route(\n            start_waypoint.transform.location,\n            end_waypoint.transform.location)\n\n        return route\n\n    def run_step(self, debug=False):\n        \"\"\"\n        Execute one step of navigation.\n        :return: carla.VehicleControl\n        \"\"\"\n\n        # is there an obstacle in front of us?\n        hazard_detected = False\n\n        # retrieve relevant elements for safe navigation, i.e.: traffic lights\n        # and other vehicles\n        actor_list = self._world.get_actors()\n        vehicle_list = actor_list.filter(\"*vehicle*\")\n        lights_list = actor_list.filter(\"*traffic_light*\")\n\n        # check possible obstacles\n        vehicle_state, vehicle = self._is_vehicle_hazard(vehicle_list)\n        if vehicle_state:\n            if debug:\n                print('!!! VEHICLE BLOCKING AHEAD [{}])'.format(vehicle.id))\n\n            self._state = AgentState.BLOCKED_BY_VEHICLE\n            hazard_detected = True\n\n        # check for the state of the traffic lights\n        light_state, traffic_light = self._is_light_red(lights_list)\n        if light_state:\n            if debug:\n                print('=== RED LIGHT AHEAD [{}])'.format(traffic_light.id))\n\n            self._state = AgentState.BLOCKED_RED_LIGHT\n            hazard_detected = True\n\n        if hazard_detected:\n            control = self.emergency_stop()\n        else:\n            self._state = AgentState.NAVIGATING\n            # standard local planner behavior\n            control = self._local_planner.run_step()\n\n        return control\n"
  },
  {
    "path": "carla_env/navigation/controller.py",
    "content": "#!/usr/bin/env python\n\n# Copyright (c) 2018 Intel Labs.\n# authors: German Ros (german.ros@intel.com)\n#\n# This work is licensed under the terms of the MIT license.\n# For a copy, see <https://opensource.org/licenses/MIT>.\n\n\"\"\" This module contains PID controllers to perform lateral and longitudinal control. \"\"\"\n\nfrom collections import deque\nimport math\n\nimport numpy as np\n\nimport carla\nfrom carla_env.tools.misc import get_speed\n\n\nclass VehiclePIDController():\n    \"\"\"\n    VehiclePIDController is the combination of two PID controllers (lateral and longitudinal) to perform the\n    low level control a vehicle from client side\n    \"\"\"\n\n    def __init__(self, vehicle, args_lateral=None, args_longitudinal=None):\n        \"\"\"\n        :param vehicle: actor to apply to local planner logic onto\n        :param args_lateral: dictionary of arguments to set the lateral PID controller using the following semantics:\n                             K_P -- Proportional term\n                             K_D -- Differential term\n                             K_I -- Integral term\n        :param args_longitudinal: dictionary of arguments to set the longitudinal PID controller using the following\n        semantics:\n                             K_P -- Proportional term\n                             K_D -- Differential term\n                             K_I -- Integral term\n        \"\"\"\n        if not args_lateral:\n            args_lateral = {'K_P': 1.0, 'K_D': 0.0, 'K_I': 0.0}\n        if not args_longitudinal:\n            args_longitudinal = {'K_P': 1.0, 'K_D': 0.0, 'K_I': 0.0}\n\n        self._vehicle = vehicle\n        self._world = self._vehicle.get_world()\n        self._lon_controller = PIDLongitudinalController(self._vehicle, **args_longitudinal)\n        self._lat_controller = PIDLateralController(self._vehicle, **args_lateral)\n\n    def run_step(self, target_speed, waypoint):\n        \"\"\"\n        Execute one step of control invoking both lateral and longitudinal PID controllers to reach a target waypoint\n        at a given target_speed.\n\n        :param target_speed: desired vehicle speed\n        :param waypoint: target location encoded as a waypoint\n        :return: distance (in meters) to the waypoint\n        \"\"\"\n        throttle = self._lon_controller.run_step(target_speed)\n        steering = self._lat_controller.run_step(waypoint)\n\n        control = carla.VehicleControl()\n        control.steer = steering\n        control.throttle = throttle\n        control.brake = 0.0\n        control.hand_brake = False\n        control.manual_gear_shift = False\n\n        return control\n\n\nclass PIDLongitudinalController():\n    \"\"\"\n    PIDLongitudinalController implements longitudinal control using a PID.\n    \"\"\"\n\n    def __init__(self, vehicle, K_P=1.0, K_D=0.0, K_I=0.0, dt=0.03):\n        \"\"\"\n        :param vehicle: actor to apply to local planner logic onto\n        :param K_P: Proportional term\n        :param K_D: Differential term\n        :param K_I: Integral term\n        :param dt: time differential in seconds\n        \"\"\"\n        self._vehicle = vehicle\n        self._K_P = K_P\n        self._K_D = K_D\n        self._K_I = K_I\n        self._dt = dt\n        self._e_buffer = deque(maxlen=30)\n\n    def run_step(self, target_speed, debug=False):\n        \"\"\"\n        Execute one step of longitudinal control to reach a given target speed.\n\n        :param target_speed: target speed in Km/h\n        :return: throttle control in the range [0, 1]\n        \"\"\"\n        current_speed = get_speed(self._vehicle)\n\n        if debug:\n            print('Current speed = {}'.format(current_speed))\n\n        return self._pid_control(target_speed, current_speed)\n\n    def _pid_control(self, target_speed, current_speed):\n        \"\"\"\n        Estimate the throttle of the vehicle based on the PID equations\n\n        :param target_speed:  target speed in Km/h\n        :param current_speed: current speed of the vehicle in Km/h\n        :return: throttle control in the range [0, 1]\n        \"\"\"\n        _e = (target_speed - current_speed)\n        self._e_buffer.append(_e)\n\n        if len(self._e_buffer) >= 2:\n            _de = (self._e_buffer[-1] - self._e_buffer[-2]) / self._dt\n            _ie = sum(self._e_buffer) * self._dt\n        else:\n            _de = 0.0\n            _ie = 0.0\n\n        return np.clip((self._K_P * _e) + (self._K_D * _de / self._dt) + (self._K_I * _ie * self._dt), 0.0, 1.0)\n\n\nclass PIDLateralController():\n    \"\"\"\n    PIDLateralController implements lateral control using a PID.\n    \"\"\"\n\n    def __init__(self, vehicle, K_P=1.0, K_D=0.0, K_I=0.0, dt=0.03):\n        \"\"\"\n        :param vehicle: actor to apply to local planner logic onto\n        :param K_P: Proportional term\n        :param K_D: Differential term\n        :param K_I: Integral term\n        :param dt: time differential in seconds\n        \"\"\"\n        self._vehicle = vehicle\n        self._K_P = K_P\n        self._K_D = K_D\n        self._K_I = K_I\n        self._dt = dt\n        self._e_buffer = deque(maxlen=10)\n\n    def run_step(self, waypoint):\n        \"\"\"\n        Execute one step of lateral control to steer the vehicle towards a certain waypoin.\n\n        :param waypoint: target waypoint\n        :return: steering control in the range [-1, 1] where:\n            -1 represent maximum steering to left\n            +1 maximum steering to right\n        \"\"\"\n        return self._pid_control(waypoint, self._vehicle.get_transform())\n\n    def _pid_control(self, waypoint, vehicle_transform):\n        \"\"\"\n        Estimate the steering angle of the vehicle based on the PID equations\n\n        :param waypoint: target waypoint\n        :param vehicle_transform: current transform of the vehicle\n        :return: steering control in the range [-1, 1]\n        \"\"\"\n        v_begin = vehicle_transform.location\n        v_end = v_begin + carla.Location(x=math.cos(math.radians(vehicle_transform.rotation.yaw)),\n                                         y=math.sin(math.radians(vehicle_transform.rotation.yaw)))\n\n        v_vec = np.array([v_end.x - v_begin.x, v_end.y - v_begin.y, 0.0])\n        w_vec = np.array([waypoint.transform.location.x -\n                          v_begin.x, waypoint.transform.location.y -\n                          v_begin.y, 0.0])\n        _dot = math.acos(np.clip(np.dot(w_vec, v_vec) /\n                                 (np.linalg.norm(w_vec) * np.linalg.norm(v_vec)), -1.0, 1.0))\n\n        _cross = np.cross(v_vec, w_vec)\n        if _cross[2] < 0:\n            _dot *= -1.0\n\n        self._e_buffer.append(_dot)\n        if len(self._e_buffer) >= 2:\n            _de = (self._e_buffer[-1] - self._e_buffer[-2]) / self._dt\n            _ie = sum(self._e_buffer) * self._dt\n        else:\n            _de = 0.0\n            _ie = 0.0\n\n        return np.clip((self._K_P * _dot) + (self._K_D * _de /\n                                             self._dt) + (self._K_I * _ie * self._dt), -1.0, 1.0)\n"
  },
  {
    "path": "carla_env/navigation/global_route_planner.py",
    "content": "# This work is licensed under the terms of the MIT license.\n# For a copy, see <https://opensource.org/licenses/MIT>.\n\n\"\"\"\nThis module provides GlobalRoutePlanner implementation.\n\"\"\"\n\nimport math\n\nimport numpy as np\nimport networkx as nx\n\nimport carla\nfrom carla_env.navigation.local_planner import RoadOption\nfrom carla_env.tools.misc import vector\n\n\nclass GlobalRoutePlanner(object):\n    \"\"\"\n    This class provides a very high level route plan.\n    Instantiate the class by passing a reference to\n    A GlobalRoutePlannerDAO object.\n    \"\"\"\n\n    def __init__(self, dao):\n        \"\"\"\n        Constructor\n        \"\"\"\n        self._dao = dao\n        self._topology = None\n        self._graph = None\n        self._id_map = None\n        self._road_id_to_edge = None\n        self._intersection_end_node = -1\n        self._previous_decision = RoadOption.VOID\n\n    def setup(self):\n        \"\"\"\n        Performs initial server data lookup for detailed topology\n        and builds graph representation of the world map.\n        \"\"\"\n        self._topology = self._dao.get_topology()\n        self._graph, self._id_map, self._road_id_to_edge = self._build_graph()\n        self._find_loose_ends()\n        self._lane_change_link()\n\n    def _build_graph(self):\n        \"\"\"\n        This function builds a networkx  graph representation of topology.\n        The topology is read from self._topology.\n        graph node properties:\n            vertex   -   (x,y,z) position in world map\n        graph edge properties:\n            entry_vector    -   unit vector along tangent at entry point\n            exit_vector     -   unit vector along tangent at exit point\n            net_vector      -   unit vector of the chord from entry to exit\n            intersection    -   boolean indicating if the edge belongs to an\n                                intersection\n        return      :   graph -> networkx graph representing the world map,\n                        id_map-> mapping from (x,y,z) to node id\n                        road_id_to_edge-> map from road id to edge in the graph\n        \"\"\"\n        graph = nx.DiGraph()\n        id_map = dict() # Map with structure {(x,y,z): id, ... }\n        road_id_to_edge = dict() # Map with structure {road_id: {lane_id: edge, ... }, ... }\n\n        for segment in self._topology:\n\n            entry_xyz, exit_xyz = segment['entryxyz'], segment['exitxyz']\n            path = segment['path']\n            entry_wp, exit_wp = segment['entry'], segment['exit']\n            intersection = entry_wp.is_intersection\n            road_id, section_id, lane_id = entry_wp.road_id, entry_wp.section_id, entry_wp.lane_id\n\n            for vertex in entry_xyz, exit_xyz:\n                # Adding unique nodes and populating id_map\n                if vertex not in id_map:\n                    new_id = len(id_map)\n                    id_map[vertex] = new_id\n                    graph.add_node(new_id, vertex=vertex)\n            n1 = id_map[entry_xyz]\n            n2 = id_map[exit_xyz]\n            if road_id not in road_id_to_edge:\n                road_id_to_edge[road_id] = dict()\n            if section_id not in road_id_to_edge[road_id]:\n                road_id_to_edge[road_id][section_id] = dict()\n            road_id_to_edge[road_id][section_id][lane_id] = (n1, n2)\n\n            entry_carla_vector = entry_wp.transform.rotation.get_forward_vector()\n            exit_carla_vector = exit_wp.transform.rotation.get_forward_vector()\n            # Adding edge with attributes\n            graph.add_edge(\n                n1, n2,\n                length=len(path) + 1, path=path,\n                entry_waypoint=entry_wp, exit_waypoint=exit_wp,\n                entry_vector=np.array(\n                    [entry_carla_vector.x, entry_carla_vector.y, entry_carla_vector.z]),\n                exit_vector=np.array(\n                    [exit_carla_vector.x, exit_carla_vector.y, exit_carla_vector.z]),\n                net_vector=vector(entry_wp.transform.location, exit_wp.transform.location),\n                intersection=intersection, type=RoadOption.LANEFOLLOW)\n\n        return graph, id_map, road_id_to_edge\n\n    def _find_loose_ends(self):\n        \"\"\"\n        This method finds road segments that have an unconnected end and\n        adds them to the internal graph representation\n        \"\"\"\n        count_loose_ends = 0\n        hop_resolution = self._dao.get_resolution()\n        for segment in self._topology:\n            end_wp = segment['exit']\n            exit_xyz = segment['exitxyz']\n            road_id, section_id, lane_id = end_wp.road_id, end_wp.section_id, end_wp.lane_id\n            if road_id in self._road_id_to_edge and \\\n                section_id in self._road_id_to_edge[road_id] and \\\n                    lane_id in self._road_id_to_edge[road_id][section_id]:\n                pass\n            else:\n                count_loose_ends += 1\n                if road_id not in self._road_id_to_edge:\n                    self._road_id_to_edge[road_id] = dict()\n                if section_id not in self._road_id_to_edge[road_id]:\n                    self._road_id_to_edge[road_id][section_id] = dict()\n                n1 = self._id_map[exit_xyz]\n                n2 = -1*count_loose_ends\n                self._road_id_to_edge[road_id][section_id][lane_id] = (n1, n2)\n                next_wp = end_wp.next(hop_resolution)\n                path = []\n                while next_wp is not None and next_wp and \\\n                    next_wp[0].road_id == road_id and \\\n                        next_wp[0].section_id == section_id and \\\n                            next_wp[0].lane_id == lane_id:\n                    path.append(next_wp[0])\n                    next_wp = next_wp[0].next(hop_resolution)\n                if path:\n                    n2_xyz = (path[-1].transform.location.x,\n                              path[-1].transform.location.y,\n                              path[-1].transform.location.z)\n                    self._graph.add_node(n2, vertex=n2_xyz)\n                    self._graph.add_edge(\n                        n1, n2,\n                        length=len(path) + 1, path=path,\n                        entry_waypoint=end_wp, exit_waypoint=path[-1],\n                        entry_vector=None, exit_vector=None, net_vector=None,\n                        intersection=end_wp.is_intersection, type=RoadOption.LANEFOLLOW)\n\n    def _localize(self, location):\n        \"\"\"\n        This function finds the road segment closest to given location\n        location        :   carla.Location to be localized in the graph\n        return          :   pair node ids representing an edge in the graph\n        \"\"\"\n        waypoint = self._dao.get_waypoint(location)\n        edge = None\n        try:\n            edge = self._road_id_to_edge[waypoint.road_id][waypoint.section_id][waypoint.lane_id]\n        except KeyError:\n            print(\n                \"Failed to localize! : \",\n                \"Road id : \", waypoint.road_id,\n                \"Section id : \", waypoint.section_id,\n                \"Lane id : \", waypoint.lane_id,\n                \"Location : \", waypoint.transform.location.x,\n                waypoint.transform.location.y)\n        return edge\n\n    def _lane_change_link(self):\n        \"\"\"\n        This method places zero cost links in the topology graph\n        representing availability of lane changes.\n        \"\"\"\n\n        for segment in self._topology:\n            left_found, right_found = False, False\n\n            for waypoint in segment['path']:\n                if not segment['entry'].is_intersection:\n                    next_waypoint, next_road_option, next_segment = None, None, None\n\n                    if bool(waypoint.lane_change & carla.LaneChange.Right) and not right_found:\n                        next_waypoint = waypoint.get_right_lane()\n                        if next_waypoint is not None and \\\n                            next_waypoint.lane_type == carla.LaneType.Driving and \\\n                                waypoint.road_id == next_waypoint.road_id:\n                            next_road_option = RoadOption.CHANGELANERIGHT\n                            next_segment = self._localize(next_waypoint.transform.location)\n                            if next_segment is not None:\n                                self._graph.add_edge(\n                                    self._id_map[segment['entryxyz']], next_segment[0], entry_waypoint=segment['entry'],\n                                    exit_waypoint=self._graph.edges[next_segment[0], next_segment[1]]['entry_waypoint'],\n                                    path=[], length=0, type=next_road_option, change_waypoint = waypoint)\n                                right_found = True\n\n                    if bool(waypoint.lane_change & carla.LaneChange.Left) and not left_found:\n                        next_waypoint = waypoint.get_left_lane()\n                        if next_waypoint is not None and next_waypoint.lane_type == carla.LaneType.Driving and \\\n                            waypoint.road_id == next_waypoint.road_id:\n                            next_road_option = RoadOption.CHANGELANELEFT\n                            next_segment = self._localize(next_waypoint.transform.location)\n                            if next_segment is not None:\n                                self._graph.add_edge(\n                                    self._id_map[segment['entryxyz']], next_segment[0], entry_waypoint=segment['entry'],\n                                    exit_waypoint=self._graph.edges[next_segment[0], next_segment[1]]['entry_waypoint'],\n                                    path=[], length=0, type=next_road_option, change_waypoint = waypoint)\n                                left_found = True\n\n                if left_found and right_found:\n                    break\n\n    def _distance_heuristic(self, n1, n2):\n        \"\"\"\n        Distance heuristic calculator for path searching\n        in self._graph\n        \"\"\"\n        l1 = np.array(self._graph.nodes[n1]['vertex'])\n        l2 = np.array(self._graph.nodes[n2]['vertex'])\n        return np.linalg.norm(l1-l2)\n\n    def _path_search(self, origin, destination):\n        \"\"\"\n        This function finds the shortest path connecting origin and destination\n        using A* search with distance heuristic.\n        origin      :   carla.Location object of start position\n        destination :   carla.Location object of of end position\n        return      :   path as list of node ids (as int) of the graph self._graph\n        connecting origin and destination\n        \"\"\"\n\n        start, end = self._localize(origin), self._localize(destination)\n\n        route = nx.astar_path(\n            self._graph, source=start[0], target=end[0],\n            heuristic=self._distance_heuristic, weight='length')\n        route.append(end[1])\n        return route\n\n    def _successive_last_intersection_edge(self, index, route):\n        \"\"\"\n        This method returns the last successive intersection edge\n        from a starting index on the route.\n\n        This helps moving past tiny intersection edges to calculate\n        proper turn decisions.\n        \"\"\"\n\n        last_intersection_edge = None\n        last_node = None\n        for node1, node2 in [(route[i], route[i+1]) for i in range(index, len(route)-1)]:\n            candidate_edge = self._graph.edges[node1, node2]\n            if node1 == route[index]:\n                last_intersection_edge = candidate_edge\n            if candidate_edge['type'] == RoadOption.LANEFOLLOW and \\\n                candidate_edge['intersection']:\n                last_intersection_edge = candidate_edge\n                last_node = node2\n            else:\n                break\n\n        return last_node, last_intersection_edge\n\n    def _turn_decision(self, index, route, threshold=math.radians(5)):\n        \"\"\"\n        This method returns the turn decision (RoadOption) for pair of edges\n        around current index of route list\n        \"\"\"\n\n        decision = None\n        previous_node = route[index-1]\n        current_node = route[index]\n        next_node = route[index+1]\n        next_edge = self._graph.edges[current_node, next_node]\n        if index > 0:\n            if self._previous_decision != RoadOption.VOID and \\\n                self._intersection_end_node > 0 and \\\n                    self._intersection_end_node != previous_node and \\\n                        next_edge['type'] == RoadOption.LANEFOLLOW and \\\n                            next_edge['intersection']:\n                decision = self._previous_decision\n            else:\n                self._intersection_end_node = -1\n                current_edge = self._graph.edges[previous_node, current_node]\n                calculate_turn = current_edge['type'].value == RoadOption.LANEFOLLOW.value and \\\n                    not current_edge['intersection'] and \\\n                        next_edge['type'].value == RoadOption.LANEFOLLOW.value and \\\n                            next_edge['intersection']\n                if calculate_turn:\n                    last_node, tail_edge = self._successive_last_intersection_edge(index, route)\n                    self._intersection_end_node = last_node\n                    if tail_edge is not None:\n                        next_edge = tail_edge\n                    cv, nv = current_edge['exit_vector'], next_edge['net_vector']\n                    cross_list = []\n                    for neighbor in self._graph.successors(current_node):\n                        select_edge = self._graph.edges[current_node, neighbor]\n                        if select_edge['type'].value == RoadOption.LANEFOLLOW.value:\n                            if neighbor != route[index+1]:\n                                sv = select_edge['net_vector']\n                                cross_list.append(np.cross(cv, sv)[2])\n                    next_cross = np.cross(cv, nv)[2]\n                    deviation = math.acos(np.clip(\n                        np.dot(cv, nv)/(np.linalg.norm(cv)*np.linalg.norm(nv)), -1.0, 1.0))\n                    if not cross_list:\n                        cross_list.append(0)\n                    if deviation < threshold:\n                        decision = RoadOption.STRAIGHT\n                    elif cross_list and next_cross < min(cross_list):\n                        decision = RoadOption.LEFT\n                    elif cross_list and next_cross > max(cross_list):\n                        decision = RoadOption.RIGHT\n                    elif next_cross < 0:\n                        decision = RoadOption.LEFT\n                    elif next_cross > 0:\n                        decision = RoadOption.RIGHT\n                else:\n                    decision = next_edge['type']\n        else:\n            decision = next_edge['type']\n        self._previous_decision = decision\n\n        return decision\n\n    def abstract_route_plan(self, origin, destination):\n        \"\"\"\n        The following function generates the route plan based on\n        origin      : carla.Location object of the route's start position\n        destination : carla.Location object of the route's end position\n        return      : list of turn by turn navigation decisions as\n        agents.navigation.local_planner.RoadOption elements\n        Possible values are STRAIGHT, LEFT, RIGHT, LANEFOLLOW, VOID\n        CHANGELANELEFT, CHANGELANERIGHT\n        \"\"\"\n\n        route = self._path_search(origin, destination)\n        plan = []\n\n        for i in range(len(route) - 1):\n            road_option = self._turn_decision(i, route)\n            plan.append(road_option)\n\n        return plan\n\n    def _find_closest_in_list(self, current_waypoint, waypoint_list):\n        min_distance = float('inf')\n        closest_index = -1\n        for i, waypoint in enumerate(waypoint_list):\n            distance = waypoint.transform.location.distance(\n                current_waypoint.transform.location)\n            if distance < min_distance:\n                min_distance = distance\n                closest_index = i\n\n        return closest_index\n\n    def trace_route(self, origin, destination):\n        \"\"\"\n        This method returns list of (carla.Waypoint, RoadOption)\n        from origin (carla.Location) to destination (carla.Location)\n        \"\"\"\n\n        route_trace = []\n        route = self._path_search(origin, destination)\n        current_waypoint = self._dao.get_waypoint(origin)\n        destination_waypoint = self._dao.get_waypoint(destination)\n        resolution = self._dao.get_resolution()\n\n        for i in range(len(route) - 1):\n            road_option = self._turn_decision(i, route)\n            edge = self._graph.edges[route[i], route[i+1]]\n            path = []\n\n            if edge['type'].value != RoadOption.LANEFOLLOW.value and \\\n                edge['type'].value != RoadOption.VOID.value:\n                route_trace.append((current_waypoint, road_option))\n                exit_wp = edge['exit_waypoint']\n                n1, n2 = self._road_id_to_edge[exit_wp.road_id][exit_wp.section_id][exit_wp.lane_id]\n                next_edge = self._graph.edges[n1, n2]\n                if next_edge['path']:\n                    closest_index = self._find_closest_in_list(current_waypoint, next_edge['path'])\n                    closest_index = min(len(next_edge['path'])-1, closest_index+5)\n                    current_waypoint = next_edge['path'][closest_index]\n                else:\n                    current_waypoint = next_edge['exit_waypoint']\n                route_trace.append((current_waypoint, road_option))\n\n            else:\n                path = path + [edge['entry_waypoint']] + edge['path'] + [edge['exit_waypoint']]\n                closest_index = self._find_closest_in_list(current_waypoint, path)\n                for waypoint in path[closest_index:]:\n                    current_waypoint = waypoint\n                    route_trace.append((current_waypoint, road_option))\n                    if len(route)-i <= 2 and \\\n                        waypoint.transform.location.distance(destination) < 2*resolution:\n                        break\n                    elif len(route)-i <= 2 and \\\n                        current_waypoint.road_id == destination_waypoint.road_id and \\\n                            current_waypoint.section_id == destination_waypoint.section_id and \\\n                                current_waypoint.lane_id == destination_waypoint.lane_id:\n                        destination_index = self._find_closest_in_list(destination_waypoint, path)\n                        if closest_index > destination_index:\n                            break\n\n        return route_trace\n"
  },
  {
    "path": "carla_env/navigation/global_route_planner_dao.py",
    "content": "# This work is licensed under the terms of the MIT license.\n# For a copy, see <https://opensource.org/licenses/MIT>.\n\n\"\"\"\nThis module provides implementation for GlobalRoutePlannerDAO\n\"\"\"\n\nimport numpy as np\n\n\nclass GlobalRoutePlannerDAO(object):\n    \"\"\"\n    This class is the data access layer for fetching data\n    from the carla server instance for GlobalRoutePlanner\n    \"\"\"\n\n    def __init__(self, wmap, sampling_resolution=1):\n        \"\"\"get_topology\n        Constructor\n\n        wmap    :   carl world map object\n        \"\"\"\n        self._sampling_resolution = sampling_resolution\n        self._wmap = wmap\n\n    def get_topology(self):\n        \"\"\"\n        Accessor for topology.\n        This function retrieves topology from the server as a list of\n        road segments as pairs of waypoint objects, and processes the\n        topology into a list of dictionary objects.\n\n        return: list of dictionary objects with the following attributes\n                entry   -   waypoint of entry point of road segment\n                entryxyz-   (x,y,z) of entry point of road segment\n                exit    -   waypoint of exit point of road segment\n                exitxyz -   (x,y,z) of exit point of road segment\n                path    -   list of waypoints separated by 1m from entry\n                            to exit\n        \"\"\"\n        topology = []\n        # Retrieving waypoints to construct a detailed topology\n        for segment in self._wmap.get_topology():\n            wp1, wp2 = segment[0], segment[1]\n            l1, l2 = wp1.transform.location, wp2.transform.location\n            # Rounding off to avoid floating point imprecision\n            x1, y1, z1, x2, y2, z2 = np.round([l1.x, l1.y, l1.z, l2.x, l2.y, l2.z], 0)\n            wp1.transform.location, wp2.transform.location = l1, l2\n            seg_dict = dict()\n            seg_dict['entry'], seg_dict['exit'] = wp1, wp2\n            seg_dict['entryxyz'], seg_dict['exitxyz'] = (x1, y1, z1), (x2, y2, z2)\n            seg_dict['path'] = []\n            endloc = wp2.transform.location\n            if wp1.transform.location.distance(endloc) > self._sampling_resolution:\n                w = wp1.next(self._sampling_resolution)[0]\n                while w.transform.location.distance(endloc) > self._sampling_resolution:\n                    seg_dict['path'].append(w)\n                    w = w.next(self._sampling_resolution)[0]\n            else:\n                seg_dict['path'].append(wp1.next(self._sampling_resolution/2.0)[0])\n            topology.append(seg_dict)\n        return topology\n\n    def get_waypoint(self, location):\n        \"\"\"\n        The method returns waypoint at given location\n        \"\"\"\n        waypoint = self._wmap.get_waypoint(location)\n        return waypoint\n\n    def get_resolution(self):\n        \"\"\" Accessor for self._sampling_resolution \"\"\"\n        return self._sampling_resolution"
  },
  {
    "path": "carla_env/navigation/local_planner.py",
    "content": "#!/usr/bin/env python\n\n# Copyright (c) 2018 Intel Labs.\n# authors: German Ros (german.ros@intel.com)\n#\n# This work is licensed under the terms of the MIT license.\n# For a copy, see <https://opensource.org/licenses/MIT>.\n\n\"\"\" This module contains a local planner to perform low-level waypoint following based on PID controllers. \"\"\"\n\nfrom enum import Enum\nfrom collections import deque\nimport random\n\nimport carla\nfrom carla_env.navigation.controller import VehiclePIDController\nfrom carla_env.tools.misc import distance_vehicle, draw_waypoints\n\n\nclass RoadOption(Enum):\n    \"\"\"\n    RoadOption represents the possible topological configurations when moving from a segment of lane to other.\n    \"\"\"\n    LANEFOLLOW = 0\n    LEFT = 1\n    RIGHT = 2\n    STRAIGHT = 3\n    VOID = -1\n    CHANGELANELEFT = 5\n    CHANGELANERIGHT = 6\n\n    def __eq__(self, other):\n        return self.value == other.value\n\n\nclass LocalPlanner(object):\n    \"\"\"\n    LocalPlanner implements the basic behavior of following a trajectory of waypoints that is generated on-the-fly.\n    The low-level motion of the vehicle is computed by using two PID controllers, one is used for the lateral control\n    and the other for the longitudinal control (cruise speed).\n\n    When multiple paths are available (intersections) this local planner makes a random choice.\n    \"\"\"\n\n    # minimum distance to target waypoint as a percentage (e.g. within 90% of\n    # total distance)\n    MIN_DISTANCE_PERCENTAGE = 0.9\n\n    def __init__(self, vehicle, opt_dict=None):\n        \"\"\"\n        :param vehicle: actor to apply to local planner logic onto\n        :param opt_dict: dictionary of arguments with the following semantics:\n            dt -- time difference between physics control in seconds. This is typically fixed from server side\n                  using the arguments -benchmark -fps=F . In this case dt = 1/F\n\n            target_speed -- desired cruise speed in Km/h\n\n            sampling_radius -- search radius for next waypoints in seconds: e.g. 0.5 seconds ahead\n\n            lateral_control_dict -- dictionary of arguments to setup the lateral PID controller\n                                    {'K_P':, 'K_D':, 'K_I':, 'dt'}\n\n            longitudinal_control_dict -- dictionary of arguments to setup the longitudinal PID controller\n                                        {'K_P':, 'K_D':, 'K_I':, 'dt'}\n        \"\"\"\n        self._vehicle = vehicle\n        self._map = self._vehicle.get_world().get_map()\n\n        self._dt = None\n        self._target_speed = None\n        self._sampling_radius = None\n        self._min_distance = None\n        self._current_waypoint = None\n        self._target_road_option = None\n        self._next_waypoints = None\n        self.target_waypoint = None\n        self._vehicle_controller = None\n        self._global_plan = None\n        # queue with tuples of (waypoint, RoadOption)\n        self._waypoints_queue = deque(maxlen=20000)\n        self._buffer_size = 5\n        self._waypoint_buffer = deque(maxlen=self._buffer_size)\n\n        # initializing controller\n        self._init_controller(opt_dict)\n\n    def __del__(self):\n        if self._vehicle:\n            self._vehicle.destroy()\n        print(\"Destroying ego-vehicle!\")\n\n    def reset_vehicle(self):\n        self._vehicle = None\n        print(\"Resetting ego-vehicle!\")\n\n    def _init_controller(self, opt_dict):\n        \"\"\"\n        Controller initialization.\n\n        :param opt_dict: dictionary of arguments.\n        :return:\n        \"\"\"\n        # default params\n        self._dt = 1.0 / 20.0\n        self._target_speed = 20.0  # Km/h\n        self._sampling_radius = self._target_speed * 1 / 3.6  # 1 seconds horizon\n        self._min_distance = self._sampling_radius * self.MIN_DISTANCE_PERCENTAGE\n        args_lateral_dict = {\n            'K_P': 1.95,\n            'K_D': 0.01,\n            'K_I': 1.4,\n            'dt': self._dt}\n        args_longitudinal_dict = {\n            'K_P': 1.0,\n            'K_D': 0,\n            'K_I': 1,\n            'dt': self._dt}\n\n        # parameters overload\n        if opt_dict:\n            if 'dt' in opt_dict:\n                self._dt = opt_dict['dt']\n            if 'target_speed' in opt_dict:\n                self._target_speed = opt_dict['target_speed']\n            if 'sampling_radius' in opt_dict:\n                self._sampling_radius = self._target_speed * \\\n                    opt_dict['sampling_radius'] / 3.6\n            if 'lateral_control_dict' in opt_dict:\n                args_lateral_dict = opt_dict['lateral_control_dict']\n            if 'longitudinal_control_dict' in opt_dict:\n                args_longitudinal_dict = opt_dict['longitudinal_control_dict']\n\n        self._current_waypoint = self._map.get_waypoint(self._vehicle.get_location())\n        self._vehicle_controller = VehiclePIDController(self._vehicle,\n                                                       args_lateral=args_lateral_dict,\n                                                       args_longitudinal=args_longitudinal_dict)\n\n        self._global_plan = False\n\n        # compute initial waypoints\n        self._waypoints_queue.append((self._current_waypoint.next(self._sampling_radius)[0], RoadOption.LANEFOLLOW))\n\n        self._target_road_option = RoadOption.LANEFOLLOW\n        # fill waypoint trajectory queue\n        self._compute_next_waypoints(k=200)\n\n    def set_speed(self, speed):\n        \"\"\"\n        Request new target speed.\n\n        :param speed: new target speed in Km/h\n        :return:\n        \"\"\"\n        self._target_speed = speed\n\n    def _compute_next_waypoints(self, k=1):\n        \"\"\"\n        Add new waypoints to the trajectory queue.\n\n        :param k: how many waypoints to compute\n        :return:\n        \"\"\"\n        # check we do not overflow the queue\n        available_entries = self._waypoints_queue.maxlen - len(self._waypoints_queue)\n        k = min(available_entries, k)\n\n        for _ in range(k):\n            last_waypoint = self._waypoints_queue[-1][0]\n            next_waypoints = list(last_waypoint.next(self._sampling_radius))\n\n            if len(next_waypoints) == 1:\n                # only one option available ==> lanefollowing\n                next_waypoint = next_waypoints[0]\n                road_option = RoadOption.LANEFOLLOW\n            else:\n                # random choice between the possible options\n                road_options_list = _retrieve_options(\n                    next_waypoints, last_waypoint)\n                road_option = random.choice(road_options_list)\n                next_waypoint = next_waypoints[road_options_list.index(\n                    road_option)]\n\n            self._waypoints_queue.append((next_waypoint, road_option))\n\n    def set_global_plan(self, current_plan):\n        self._waypoints_queue.clear()\n        for elem in current_plan:\n            self._waypoints_queue.append(elem)\n        self._target_road_option = RoadOption.LANEFOLLOW\n        self._global_plan = True\n\n    def run_step(self, debug=True):\n        \"\"\"\n        Execute one step of local planning which involves running the longitudinal and lateral PID controllers to\n        follow the waypoints trajectory.\n\n        :param debug: boolean flag to activate waypoints debugging\n        :return:\n        \"\"\"\n\n        # not enough waypoints in the horizon? => add more!\n        if not self._global_plan and len(self._waypoints_queue) < int(self._waypoints_queue.maxlen * 0.5):\n            self._compute_next_waypoints(k=100)\n\n        if len(self._waypoints_queue) == 0:\n            control = carla.VehicleControl()\n            control.steer = 0.0\n            control.throttle = 0.0\n            control.brake = 1.0\n            control.hand_brake = False\n            control.manual_gear_shift = False\n\n            return control\n\n        #   Buffering the waypoints\n        if not self._waypoint_buffer:\n            for i in range(self._buffer_size):\n                if self._waypoints_queue:\n                    self._waypoint_buffer.append(\n                        self._waypoints_queue.popleft())\n                else:\n                    break\n\n        # current vehicle waypoint\n        self._current_waypoint = self._map.get_waypoint(self._vehicle.get_location())\n        # target waypoint\n        self.target_waypoint, self._target_road_option = self._waypoint_buffer[0]\n        # move using PID controllers\n        control = self._vehicle_controller.run_step(self._target_speed, self.target_waypoint)\n\n        # purge the queue of obsolete waypoints\n        vehicle_transform = self._vehicle.get_transform()\n        max_index = -1\n\n        for i, (waypoint, _) in enumerate(self._waypoint_buffer):\n            if distance_vehicle(\n                    waypoint, vehicle_transform) < self._min_distance:\n                max_index = i\n        if max_index >= 0:\n            for i in range(max_index + 1):\n                self._waypoint_buffer.popleft()\n\n        if debug:\n            draw_waypoints(self._vehicle.get_world(), [self.target_waypoint], self._vehicle.get_location().z + 1.0)\n\n        return control\n\n\ndef _retrieve_options(list_waypoints, current_waypoint):\n    \"\"\"\n    Compute the type of connection between the current active waypoint and the multiple waypoints present in\n    list_waypoints. The result is encoded as a list of RoadOption enums.\n\n    :param list_waypoints: list with the possible target waypoints in case of multiple options\n    :param current_waypoint: current active waypoint\n    :return: list of RoadOption enums representing the type of connection from the active waypoint to each\n             candidate in list_waypoints\n    \"\"\"\n    options = []\n    for next_waypoint in list_waypoints:\n        # this is needed because something we are linking to\n        # the beggining of an intersection, therefore the\n        # variation in angle is small\n        next_next_waypoint = next_waypoint.next(3.0)[0]\n        link = _compute_connection(current_waypoint, next_next_waypoint)\n        options.append(link)\n\n    return options\n\n\ndef _compute_connection(current_waypoint, next_waypoint):\n    \"\"\"\n    Compute the type of topological connection between an active waypoint (current_waypoint) and a target waypoint\n    (next_waypoint).\n\n    :param current_waypoint: active waypoint\n    :param next_waypoint: target waypoint\n    :return: the type of topological connection encoded as a RoadOption enum:\n             RoadOption.STRAIGHT\n             RoadOption.LEFT\n             RoadOption.RIGHT\n    \"\"\"\n    n = next_waypoint.transform.rotation.yaw\n    n = n % 360.0\n\n    c = current_waypoint.transform.rotation.yaw\n    c = c % 360.0\n\n    diff_angle = (n - c) % 180.0\n    if diff_angle < 1.0:\n        return RoadOption.STRAIGHT\n    elif diff_angle > 90.0:\n        return RoadOption.LEFT\n    else:\n        return RoadOption.RIGHT\n"
  },
  {
    "path": "carla_env/navigation/planner.py",
    "content": "import numpy as np\n\n# ==============================================================================\n# -- import route planning (copied and modified from CARLA 0.9.4's PythonAPI) --\n# ==============================================================================\nimport carla\nfrom carla_env.navigation.local_planner import RoadOption\nfrom carla_env.navigation.global_route_planner import GlobalRoutePlanner\nfrom carla_env.navigation.global_route_planner_dao import GlobalRoutePlannerDAO\nfrom carla_env.tools.misc import vector\n\ndef compute_route_waypoints(world_map, start_waypoint, end_waypoint, resolution=1.0, plan=None):\n    \"\"\"\n        Returns a list of (waypoint, RoadOption)-tuples that describes a route\n        starting at start_waypoint, ending at end_waypoint.\n\n        start_waypoint (carla.Waypoint):\n            Starting waypoint of the route\n        end_waypoint (carla.Waypoint):\n            Destination waypoint of the route\n        resolution (float):\n            Resolution, or lenght, of the steps between waypoints\n            (in meters)\n        plan (list(RoadOption) or None):\n            If plan is not None, generate a route that takes every option as provided\n            in the list for every intersections, in the given order.\n            (E.g. set plan=[RoadOption.STRAIGHT, RoadOption.LEFT, RoadOption.RIGHT]\n            to make the route go straight, then left, then right.)\n            If plan is None, we use the GlobalRoutePlanner to find a path between\n            start_waypoint and end_waypoint.\n    \"\"\"\n\n    if plan is None:\n        # Setting up global router\n        dao = GlobalRoutePlannerDAO(world_map, resolution)\n        grp = GlobalRoutePlanner(dao)\n        grp.setup()\n        \n        # Obtain route plan\n        route = grp.trace_route(\n            start_waypoint.transform.location,\n            end_waypoint.transform.location)\n    else:\n        # Compute route waypoints\n        route = []\n        current_waypoint = start_waypoint\n        for i, action in enumerate(plan):\n            # Generate waypoints to next junction\n            wp_choice = [current_waypoint]\n            while len(wp_choice) == 1:\n                current_waypoint = wp_choice[0]\n                route.append((current_waypoint, RoadOption.LANEFOLLOW))\n                wp_choice = current_waypoint.next(resolution)\n\n                # Stop at destination\n                if i > 0 and current_waypoint.transform.location.distance(end_waypoint.transform.location) < resolution:\n                    break\n\n            if action == RoadOption.VOID:\n                break\n\n            # Make sure that next intersection waypoints are far enough\n            # from each other so we choose the correct path\n            step = resolution\n            while len(wp_choice) > 1:\n                wp_choice = current_waypoint.next(step)\n                wp0, wp1 = wp_choice[:2]\n                if wp0.transform.location.distance(wp1.transform.location) < resolution:\n                    step += resolution\n                else:\n                    break\n\n            # Select appropriate path at the junction\n            if len(wp_choice) > 1:\n                # Current heading vector\n                current_transform = current_waypoint.transform\n                current_location = current_transform.location\n                projected_location = current_location + \\\n                    carla.Location(\n                        x=np.cos(np.radians(current_transform.rotation.yaw)),\n                        y=np.sin(np.radians(current_transform.rotation.yaw)))\n                v_current = vector(current_location, projected_location)\n\n                direction = 0\n                if action == RoadOption.LEFT:\n                    direction = 1\n                elif action == RoadOption.RIGHT:\n                    direction = -1\n                elif action == RoadOption.STRAIGHT:\n                    direction = 0\n                select_criteria = float(\"inf\")\n\n                # Choose correct path\n                for wp_select in wp_choice:\n                    v_select = vector(\n                        current_location, wp_select.transform.location)\n                    cross = float(\"inf\")\n                    if direction == 0:\n                        cross = abs(np.cross(v_current, v_select)[-1])\n                    else:\n                        cross = direction * np.cross(v_current, v_select)[-1]\n                    if cross < select_criteria:\n                        select_criteria = cross\n                        current_waypoint = wp_select\n\n                # Generate all waypoints within the junction\n                # along selected path\n                route.append((current_waypoint, action))\n                current_waypoint = current_waypoint.next(resolution)[0]\n                while current_waypoint.is_intersection:\n                    route.append((current_waypoint, action))\n                    current_waypoint = current_waypoint.next(resolution)[0]\n        assert route\n\n    # Change action 5 wp before intersection\n    num_wp_to_extend_actions_with = 5\n    action = route[0][1]\n    for i in range(1, len(route)):\n        next_action = route[i][1]\n        if next_action != action:\n            if next_action != RoadOption.LANEFOLLOW:\n                for j in range(num_wp_to_extend_actions_with):\n                    route[i-j-1] = (route[i-j-1][0], route[i][1])\n        action = next_action\n\n    return route\n"
  },
  {
    "path": "carla_env/navigation/roaming_agent.py",
    "content": "#!/usr/bin/env python\n\n# Copyright (c) 2018 Intel Labs.\n# authors: German Ros (german.ros@intel.com)\n#\n# This work is licensed under the terms of the MIT license.\n# For a copy, see <https://opensource.org/licenses/MIT>.\n\n\"\"\" This module implements an agent that roams around a track following random waypoints and avoiding other vehicles.\nThe agent also responds to traffic lights. \"\"\"\n\nfrom agents.navigation.agent import Agent, AgentState\nfrom agents.navigation.local_planner import LocalPlanner\n\n\nclass RoamingAgent(Agent):\n    \"\"\"\n    RoamingAgent implements a basic agent that navigates scenes making random\n    choices when facing an intersection.\n\n    This agent respects traffic lights and other vehicles.\n    \"\"\"\n\n    def __init__(self, vehicle):\n        \"\"\"\n\n        :param vehicle: actor to apply to local planner logic onto\n        \"\"\"\n        super(RoamingAgent, self).__init__(vehicle)\n        self._proximity_threshold = 10.0  # meters\n        self._state = AgentState.NAVIGATING\n        self._local_planner = LocalPlanner(self._vehicle)\n\n    def run_step(self, debug=False):\n        \"\"\"\n        Execute one step of navigation.\n        :return: carla.VehicleControl\n        \"\"\"\n\n        # is there an obstacle in front of us?\n        hazard_detected = False\n\n        # retrieve relevant elements for safe navigation, i.e.: traffic lights\n        # and other vehicles\n        actor_list = self._world.get_actors()\n        vehicle_list = actor_list.filter(\"*vehicle*\")\n        lights_list = actor_list.filter(\"*traffic_light*\")\n\n        # check possible obstacles\n        vehicle_state, vehicle = self._is_vehicle_hazard(vehicle_list)\n        if vehicle_state:\n            if debug:\n                print('!!! VEHICLE BLOCKING AHEAD [{}])'.format(vehicle.id))\n\n            self._state = AgentState.BLOCKED_BY_VEHICLE\n            hazard_detected = True\n\n        # check for the state of the traffic lights\n        light_state, traffic_light = self._is_light_red(lights_list)\n        if light_state:\n            if debug:\n                print('=== RED LIGHT AHEAD [{}])'.format(traffic_light.id))\n\n            self._state = AgentState.BLOCKED_RED_LIGHT\n            hazard_detected = True\n\n        if hazard_detected:\n            control = self.emergency_stop()\n        else:\n            self._state = AgentState.NAVIGATING\n            # standard local planner behavior\n            control = self._local_planner.run_step()\n\n        return control\n"
  },
  {
    "path": "carla_env/rewards.py",
    "content": "import math\n\nimport numpy as np\n\nfrom config import CONFIG\n\n\nmin_speed = CONFIG.reward_params.min_speed\nmax_speed = CONFIG.reward_params.max_speed\ntarget_speed = CONFIG.reward_params.target_speed\nmax_distance = CONFIG.reward_params.max_distance\nmax_std_center_lane = CONFIG.reward_params.max_std_center_lane\nmax_angle_center_lane = CONFIG.reward_params.max_angle_center_lane\npenalty_reward = CONFIG.reward_params.penalty_reward\nearly_stop = CONFIG.reward_params.early_stop\nreward_functions = {}\n\n\ndef create_reward_fn(reward_fn):\n    def func(env):\n        terminal_reason = \"Running...\"\n        if early_stop:\n            speed = env.vehicle.get_speed()\n            if speed < 1.0:\n                env.low_speed_timer += 1\n            else:\n                env.low_speed_timer = 0.0  # Reset timer if speed goes above threshold\n\n            # Check if speed is low for 90 consecutive second\n            if env.low_speed_timer >= 90 * env.fps:\n                env.terminal_state = True\n                terminal_reason = \"Vehicle stopped\"\n\n            # Stop if distance from center > max distance\n            if env.distance_from_center > max_distance and not env.eval:\n                env.terminal_state = True\n                terminal_reason = \"Off-track\"\n\n            # Stop if speed is too high\n            if max_speed > 0 and speed > max_speed and not env.eval:\n                env.terminal_state = True\n                terminal_reason = \"Too fast\"\n\n        # Calculate reward\n        reward = 0\n        if not env.terminal_state:\n            reward += reward_fn(env)\n        else:\n            env.low_speed_timer = 0.0\n            if reward_fn in {reward_fn5}:\n                reward += penalty_reward\n            print(f\"{env.episode_idx}| Terminal: \", terminal_reason)\n\n        if env.success_state:\n            print(f\"{env.episode_idx}| Success\")\n\n        env.extra_info.extend([\n            terminal_reason,\n            \"\"\n        ])\n        return reward\n\n    return func\n\ndef reward_fn_revolve(env):\n    \"\"\"\n    Revolve reward function\n    https://arxiv.org/pdf/2406.01309\n    :return: reward value and reward_components\n    \"\"\"\n    # Parameters to tweak the importance of different reward components\n    collision_penalty = -100\n    inactivity_penalty = -10\n    speed_reward_weight = 2.0\n    position_reward_weight = 1.0\n    smoothness_reward_weight = 0.5\n    # Adjusted temperature parameters for score transformation\n    speed_temp = 0.5  # Increased temperature for speed_reward\n    position_temp = 0.1\n    smoothness_temp = 0.1\n    reward_components = {\n        \"collision_penalty\": 0,\n        \"inactivity_penalty\": 0,\n        \"speed_reward\": 0,\n        \"position_reward\": 0,\n        \"smoothness_reward\": 0\n    }\n    collision = env.collision_state\n    speed = env.vehicle.get_speed() / 3.6  # unit: m/s\n    action_list = env.action_list  ### record the historical actions by setting self.steering_list = []\n    min_pos = env.distance_from_center\n\n    # Penalize for collision\n    if collision:\n        reward_components[\"collision_penalty\"] = collision_penalty\n    # Penalize for inactivity (speed too close to zero)\n    if speed < 1.5:  # Adjusted threshold for inactivity\n        reward_components[\"inactivity_penalty\"] = inactivity_penalty\n    # Reward for maintaining an optimal speed range\n    if 4.0 <= speed <= 6.0:\n        # Centering and normalizing around the average of optima speed range\n        speed_score = 1 - np.abs(speed - 5) / 1.75\n    else:\n        speed_score = -1\n    # Use a sigmoid function to smoothly transform the speed score and constrain it\n    reward_components[\"speed_reward\"] = speed_reward_weight * (1 / (1 + np.exp(\n        -speed_score / speed_temp)))\n    # Reward for being close to the center of the road\n    position_score = np.exp(-min_pos / position_temp)\n    reward_components[\"position_reward\"] = position_reward_weight * position_score\n    # Reward for smooth driving (small variations in consecutive steering actions)\n    steering_smoothness = -np.std(action_list)\n    reward_components[\"smoothness_reward\"] = smoothness_reward_weight * np.exp(\n        steering_smoothness / smoothness_temp)\n    # Calculate total reward, giving precedence to penalties\n    total_reward = 0\n    if reward_components[\"collision_penalty\"] < 0:\n        total_reward = reward_components[\"collision_penalty\"]\n    elif reward_components[\"inactivity_penalty\"] < 0:\n        total_reward = reward_components[\"inactivity_penalty\"]\n    else:\n        total_reward = sum(reward_components.values())\n    # Ensure the total reward is within a reasonable range\n    total_reward = np.clip(total_reward, -1, 1)\n    return total_reward\n\nreward_functions[\"reward_fn_revolve\"] = create_reward_fn(reward_fn_revolve)\n\ndef reward_fn_revolve_auto(env):\n    \"\"\"\n    An variant of revolve reward function, with an automatic feedback mechanism originally proposed in Eureka\n    :return: reward value\n    \"\"\"\n\n    def calculate_distance(point1, point2):\n        return math.sqrt((point1.x - point2.x) ** 2 + (point1.y - point2.y) ** 2 + (point1.z - point2.z) ** 2)\n\n    def get_distance_to_nearest_obstacle(vehicle, max_distance_view=20):\n        vehicle_transform = vehicle.get_transform()\n        start_location = vehicle_transform.location\n        forward_vector = vehicle_transform.get_forward_vector()\n\n        end_location = start_location + forward_vector * max_distance_view\n\n        world = vehicle.get_world()\n        hit_result = world.cast_ray(start_location, end_location)\n\n        if hit_result:\n            min_distance = 9999\n            for hit_res in hit_result:\n                distance = calculate_distance(start_location, hit_res.location)\n                if distance < min_distance:\n                    min_distance = distance\n            # distance = start_location.distance(hit_result.location)\n            return min_distance\n        else:\n            return max_distance_view\n\n    # Define temperature parameters for reward transformations\n    temp_collision = 5.0\n    temp_inactivity = 10.0\n    temp_centering = 2.0\n    temp_speed = 2.0\n    temp_smoothness = 1.0\n    temp_proximity = 2.0\n    # Initialize the total reward and reward components dictionary\n    reward_components = {\n        'collision_penalty': 0,\n        'inactivity_penalty': 0,\n        'lane_centering_bonus': 0,\n        'speed_regulation_bonus': 0,\n        'smooth_driving_bonus': 0,\n        'front_distance_bonus': 0\n    }\n    collision = env.collision_state\n    speed = env.vehicle.get_speed() / 3.6  # unit: m/s\n    action_list = env.action_list  ### record the historical actions by setting self.steering_list = []\n    min_pos = env.distance_from_center\n\n    # Penalty for collisions\n    reward_components['collision_penalty'] = np.exp(-temp_collision) if collision else 0\n    # Penalty for inactivity\n    inactivity_threshold = 1.5\n    if speed < inactivity_threshold and not collision:\n        reward_components['inactivity_penalty'] = np.exp(-temp_inactivity * (\n                inactivity_threshold - speed))\n    # Lane centering bonus\n    max_distance_from_center = 2.0  # adaptable based on road width\n    reward_components['lane_centering_bonus'] = np.exp(-temp_centering * min_pos)\n    # Speed regulation bonus\n    ideal_speed = (4.0 + 6.0) / 2\n    speed_range = 6.0 - 4.0\n    reward_components['speed_regulation_bonus'] = np.exp(-temp_speed * (abs(speed - ideal_speed) / speed_range))\n    # Smooth driving bonus\n    max_steering_variance = 0.1  # configured to the acceptable steering variance for full bonus\n    smoothness_factor = np.std(action_list)\n    reward_components['smooth_driving_bonus'] = np.exp(-temp_smoothness * (\n            smoothness_factor / max_steering_variance))\n    # Front distance bonus\n    max_distance_view = 20\n    distance = get_distance_to_nearest_obstacle(env.vehicle, max_distance_view)\n    reward_components['front_distance_bonus'] = np.clip(distance / max_distance_view, 0, 1)\n    # Sum the individual rewards for the total reward\n    total_reward = sum(reward_components.values())\n    # Ensure penalties take precedence\n    if collision or speed < inactivity_threshold:\n        total_reward = -1  # Apply max negative reward for collision or inactivity\n    # Normalize the total reward to lie between-1 and 1\n    total_reward = np.clip(total_reward, -1, 1)\n    return total_reward\n\nreward_functions[\"reward_fn_revolve_auto\"] = create_reward_fn(reward_fn_revolve_auto)\n\ndef reward_fn_chatscene(env):\n    out_lane_thres = 4\n    desired_speed = 5.5\n\n    def get_lane_dis(waypoints, x, y):\n        \"\"\"\n            Calculate distance from (x, y) to waypoints.\n            :param waypoints: a list of list storing waypoints like [[x0, y0], [x1, y1], ...]\n            :param x: x position of vehicle\n            :param y: y position of vehicle\n            :return: a tuple of the distance and the closest waypoint orientation\n        \"\"\"\n        eps = 1e-5\n        dis_min = 99999\n        waypt = waypoints[0]\n        for pt in waypoints:\n            pt_loc = pt.transform.location\n            d = np.sqrt((x - pt_loc.x) ** 2 + (y - pt_loc.y) ** 2)\n            if d < dis_min:\n                dis_min = d\n                waypt = pt\n        vec = np.array([x - waypt.transform.location.x, y - waypt.transform.location.y])\n        lv = np.linalg.norm(np.array(vec)) + eps\n        w = np.array([np.cos(waypt.transform.rotation.yaw / 180 * np.pi),\n                      np.sin(waypt.transform.rotation.yaw / 180 * np.pi)])\n        cross = np.cross(w, vec / lv)\n        dis = - lv * cross\n        return dis, w\n\n    collision = env.collision_state  ### self.collision in env\n    waypoints = [i[0] for i in env.route_waypoints[env.current_waypoint_index % len(env.route_waypoints): ]]\n    r_collision = -1 if collision else 0\n\n    # reward for steering:\n    r_steer = -env.vehicle.get_control().steer ** 2\n\n    # reward for out of lane\n    trans = env.vehicle.get_transform()\n    ego_x = trans.location.x\n    ego_y = trans.location.y\n    dis, w = get_lane_dis(waypoints, ego_x, ego_y)\n    r_out = -1 if abs(dis) > out_lane_thres else 0\n\n    # reward for speed tracking\n    v = env.vehicle.get_velocity()\n\n    # cost for too fast\n    lspeed = np.array([v.x, v.y])\n    lspeed_lon = np.dot(lspeed, w)\n    r_fast = -1 if lspeed_lon > desired_speed else 0\n\n    # cost for lateral acceleration\n    r_lat = -abs(env.vehicle.get_control().steer) * lspeed_lon ** 2\n\n    # combine all rewards\n    total_reward = 1 * r_collision + 1 * lspeed_lon + 10 * r_fast + 1 * r_out + r_steer * 5 + 0.2 * r_lat\n    return total_reward\n\nreward_functions[\"reward_fn_chatscene\"] = create_reward_fn(reward_fn_chatscene)\n\ndef reward_fn_simple(env):\n    \"\"\"\n    Trustworthy safety improvement for autonomous driving using reinforcement learning, 2022\n    When getting collisions, the reward is -1, else it is 0.\n    \"\"\"\n    collision = env.collision_state  ### self.collision in env\n    if collision:\n        return -1\n    else:\n        return 0\n\nreward_functions[\"reward_fn_simple\"] = create_reward_fn(reward_fn_simple)\n\ndef reward_fn5(env):\n    \"\"\"\n        reward = Positive speed reward for being close to target speed,\n                 however, quick decline in reward beyond target speed\n               * centering factor (1 when centered, 0 when not)\n               * angle factor (1 when aligned with the road, 0 when more than max_angle_center_lane degress off)\n               * distance_std_factor (1 when std from center lane is low, 0 when not)\n    \"\"\"\n\n    angle = env.vehicle.get_angle(env.current_waypoint)\n    speed_kmh = env.vehicle.get_speed()\n    if speed_kmh < min_speed:  # When speed is in [0, min_speed] range\n        speed_reward = speed_kmh / min_speed  # Linearly interpolate [0, 1] over [0, min_speed]\n    elif speed_kmh > target_speed:  # When speed is in [target_speed, inf]\n        # Interpolate from [1, 0, -inf] over [target_speed, max_speed, inf]\n        speed_reward = 1.0 - (speed_kmh - target_speed) / (max_speed - target_speed)\n    else:  # Otherwise\n        speed_reward = 1.0  # Return 1 for speeds in range [min_speed, target_speed]\n\n    # Interpolated from 1 when centered to 0 when 3 m from center\n    centering_factor = max(1.0 - env.distance_from_center / max_distance, 0.0)\n\n    # Interpolated from 1 when aligned with the road to 0 when +/- 20 degress of road\n    angle_factor = max(1.0 - abs(angle / np.deg2rad(max_angle_center_lane)), 0.0)\n\n    std = np.std(env.distance_from_center_history)\n    distance_std_factor = max(1.0 - abs(std / max_std_center_lane), 0.0)\n\n    # Final reward\n    reward = speed_reward * centering_factor * angle_factor * distance_std_factor\n\n    return reward\n\nreward_functions[\"reward_fn5\"] = create_reward_fn(reward_fn5)\n\n\ndef reward_fn_Chen(env):\n    \"\"\"\n    A reward function that combines multiple factors:\n    r = 200*r_collision + v_lon + 10*r_fast + r_out - 5*α^2 + 0.2*r_lat - 0.1\n\n    where:\n    - r_collision: -1 if collision occurs, 0 otherwise\n    - v_lon: longitudinal speed of the ego vehicle\n    - r_fast: -1 if speed exceeds threshold (8 m/s), 0 otherwise\n    - r_out: -1 if vehicle leaves the lane, 0 otherwise\n    - α: steering angle in radians, penalized quadratically\n    - r_lat: lateral acceleration term, calculated as r_lat = -|α|*v_lon^2\n    - -0.1: constant term to discourage the vehicle from remaining stationary\n    \"\"\"\n    # Get vehicle state\n    collision = env.collision_state\n    speed = env.vehicle.get_speed() / 3.6  # Convert from km/h to m/s\n    steering = env.vehicle.get_control().steer  # Steering angle in radians\n\n    # Calculate individual reward components\n    r_collision = -1 if collision else 0\n    v_lon = speed  # Longitudinal speed\n    r_fast = -1 if speed > 8 else 0  # Penalty for exceeding 8 m/s\n    r_out = -1 if env.distance_from_center > 4 else 0  # Penalty for leaving lane (assuming 4m threshold)\n    alpha_squared = -5 * (steering ** 2)  # Quadratic steering penalty\n    r_lat = -abs(steering) * (speed ** 2)  # Lateral acceleration penalty\n    constant_term = -0.1  # Constant penalty\n\n    # Combine all components according to the formula\n    total_reward = (200 * r_collision +\n                    v_lon +\n                    10 * r_fast +\n                    r_out +\n                    alpha_squared +\n                    0.2 * r_lat +\n                    constant_term)\n\n    return total_reward\n\n\nreward_functions[\"reward_fn_Chen\"] = create_reward_fn(reward_fn_Chen)\n\n\ndef reward_fn_ASAP(env):\n    \"\"\"\n    ASAP (Approximate Safe Action Policy) reward function that combines multiple factors:\n    r_t = R_progress + R_destination + R_crash + R_overtaking\n\n    where:\n    - R_progress: 1 reward for every 10 meters of distance completed\n    - R_destination: 1 reward if the agent successfully reaches its destination\n    - R_crash: -5 penalty if the agent collides with another vehicle or road curbs\n    - R_overtaking: 0.1 reward each time the agent overtakes another vehicle\n    \"\"\"\n    # Get vehicle state\n    collision = env.collision_state\n\n    # Calculate progress reward (1 reward per 10 meters)\n    distance_traveled = env.distance_traveled  # Total distance traveled in meters\n    r_progress = distance_traveled / 10.0  # 1 reward per 10 meters\n\n    # Calculate destination reward\n    r_destination = 1.0 if env.success_state else 0.0\n\n    # Calculate crash penalty\n    r_crash = -5.0 if collision else 0.0\n\n    # Calculate overtaking reward\n    # Note: This requires tracking overtaken vehicles, which might need to be implemented in the environment\n    # For now, we'll use a placeholder based on nearby vehicles\n    r_overtaking = 0.0  # This should be implemented based on environment's vehicle tracking\n\n    # Combine all components according to the formula\n    total_reward = (r_progress +\n                    r_destination +\n                    r_crash +\n                    r_overtaking)\n\n    return total_reward\n\n\nreward_functions[\"reward_fn_ASAP\"] = create_reward_fn(reward_fn_ASAP)"
  },
  {
    "path": "carla_env/state_commons.py",
    "content": "import torch\nfrom torchvision import transforms\n\nimport numpy as np\nimport gym\n\nfrom carla_env.wrappers import vector, get_displacement_vector\n\ntorch.cuda.empty_cache()\n\n\ndef preprocess_frame(frame):\n    preprocess = transforms.Compose([\n        transforms.ToTensor(),\n    ])\n    frame = preprocess(frame).unsqueeze(0)\n    return frame\n\n\ndef create_encode_state_fn(measurements_to_include, CONFIG, vae=None):\n    \"\"\"\n        Returns a function that encodes the current state of\n        the environment into some feature vector.\n    \"\"\"\n\n    measure_flags = [\"steer\" in measurements_to_include,\n                     \"throttle\" in measurements_to_include,\n                     \"speed\" in measurements_to_include,\n                     \"angle_next_waypoint\" in measurements_to_include,\n                     \"maneuver\" in measurements_to_include,\n                     \"waypoints\" in measurements_to_include,\n                     \"rgb_camera\" in measurements_to_include,\n                     \"seg_camera\" in measurements_to_include,\n                     \"end_wp_vector\" in measurements_to_include,\n                     \"end_wp_fixed\" in measurements_to_include,\n                     \"distance_goal\" in measurements_to_include]\n\n    def create_observation_space():\n        observation_space = {}\n        low, high = [], []\n        if measure_flags[0]: low.append(-1), high.append(1)\n        if measure_flags[1]: low.append(0), high.append(1)\n        if measure_flags[2]: low.append(0), high.append(120)\n        if measure_flags[3]: low.append(-3.14), high.append(3.14)\n        observation_space['vehicle_measures'] = gym.spaces.Box(low=np.array(low), high=np.array(high), dtype=np.float32)\n\n        if measure_flags[4]: observation_space['maneuver'] = gym.spaces.Discrete(4)\n\n        if measure_flags[5]: observation_space['waypoints'] = gym.spaces.Box(low=-50, high=50, shape=(15, 2),\n                                                                             dtype=np.float32)\n\n        if measure_flags[6]: observation_space['rgb_camera'] = gym.spaces.Box(low=0, high=255, shape=(CONFIG['obs_res'][1], CONFIG['obs_res'][0], 3), dtype=np.uint8)\n        if measure_flags[7]: observation_space['seg_camera'] = gym.spaces.Box(low=0, high=255, shape=(CONFIG['obs_res'][1], CONFIG['obs_res'][0], 3), dtype=np.uint8)\n        if measure_flags[8]: observation_space['end_wp_vector'] = gym.spaces.Box(low=-50, high=50, shape=(1, 2), dtype=np.float32)\n        if measure_flags[9]: observation_space['end_wp_fixed'] = gym.spaces.Box(low=-50, high=50, shape=(1, 2), dtype=np.float32)\n        if measure_flags[10]: observation_space['distance_goal'] = gym.spaces.Box(low=0, high=1500, shape=(1, 1), dtype=np.float32)\n\n        if CONFIG.use_seg_bev: observation_space['seg_camera'] = gym.spaces.Box(low=0, high=255, shape=(192, 192, 6), dtype=np.uint8)\n\n        return gym.spaces.Dict(observation_space)\n\n    def encode_state(env):\n        encoded_state = {}\n        vehicle_measures = []\n        if measure_flags[0]: vehicle_measures.append(env.vehicle.control.steer)\n        if measure_flags[1]: vehicle_measures.append(env.vehicle.control.throttle)\n        if measure_flags[2]: vehicle_measures.append(env.vehicle.get_speed())\n        if measure_flags[3]: vehicle_measures.append(env.vehicle.get_angle(env.current_waypoint))\n        encoded_state['vehicle_measures'] = vehicle_measures\n        if measure_flags[4]: encoded_state['maneuver'] = env.current_road_maneuver.value\n\n        if measure_flags[5]:\n            next_waypoints_state = env.route_waypoints[env.current_waypoint_index: env.current_waypoint_index + 15]\n            waypoints = [vector(way[0].transform.location) for way in next_waypoints_state]\n\n            vehicle_location = vector(env.vehicle.get_location())\n            theta = np.deg2rad(env.vehicle.get_transform().rotation.yaw)\n\n            relative_waypoints = np.zeros((15, 2))\n            for i, w_location in enumerate(waypoints):\n                relative_waypoints[i] = get_displacement_vector(vehicle_location, w_location, theta)[:2]\n            if len(waypoints) < 15:\n                start_index = len(waypoints)\n                reference_vector = relative_waypoints[start_index-1] - relative_waypoints[start_index-2]\n                for i in range(start_index, 15):\n                    relative_waypoints[i] = relative_waypoints[i-1] + reference_vector\n\n            encoded_state['waypoints'] = relative_waypoints\n\n        if measure_flags[6]: encoded_state['rgb_camera'] = env.observation\n        if measure_flags[7] : encoded_state['seg_camera'] = env.observation\n        if measure_flags[8]:\n            vehicle_location = vector(env.vehicle.get_location())\n            theta = np.deg2rad(env.vehicle.get_transform().rotation.yaw)\n            end_wp_location = vector(env.end_wp.transform.location)\n            encoded_state['end_wp_vector'] = get_displacement_vector(vehicle_location, end_wp_location, theta)[:2]\n        if measure_flags[9]:\n            vehicle_location = vector(env.start_wp.transform.location)\n            theta = np.deg2rad(env.start_wp.transform.rotation.yaw)\n            end_wp_location = vector(env.end_wp.transform.location)\n            encoded_state['end_wp_fixed'] = get_displacement_vector(vehicle_location, end_wp_location, theta)[:2]\n        if measure_flags[10]:\n            encoded_state['distance_goal'] = [[len(env.route_waypoints) - env.current_waypoint_index]]\n\n        return encoded_state\n\n    return create_observation_space(), encode_state\n"
  },
  {
    "path": "carla_env/tools/__init__.py",
    "content": ""
  },
  {
    "path": "carla_env/tools/hud.py",
    "content": "\"\"\"\nWelcome to CARLA manual control.\n\nUse ARROWS or WASD keys for control.\n\n    W            : throttle\n    S            : brake\n    AD           : steer\n    Q            : toggle reverse\n    Space        : hand-brake\n    M            : toggle manual transmission\n    ,/.          : gear up/down\n\n    TAB          : change sensor position\n    `            : next sensor\n    [1-9]        : change to sensor [1-9]\n    C            : change weather (Shift+C reverse)\n    Backspace    : change vehicle\n\n    R            : toggle recording images to disk\n\n    F1           : toggle HUD\n    H/?          : toggle help\n    ESC          : quit\n\"\"\"\n\nimport pygame\nimport math\nfrom carla_env.wrappers import get_actor_display_name\n#===============================================================================\n# HUD\n#===============================================================================\n\nclass HUD(object):\n    \"\"\"\n        HUD class for displaying on-screen information\n    \"\"\"\n\n    def __init__(self, width, height):\n        self.dim = (width, height)\n\n        # Select a monospace font for the info panel\n        fonts = [x for x in pygame.font.get_fonts() if \"mono\" in x]\n        default_font = \"ubuntumono\"\n        mono = default_font if default_font in fonts else fonts[0]\n        mono = pygame.font.match_font(mono)\n        self.font_mono = pygame.font.Font(mono, 14)\n\n        # Use default font for everything else\n        font = pygame.font.Font(pygame.font.get_default_font(), 20)\n        \n        self.notifications = FadingText(font, (width, 40), (0, height - 40))\n        self.help = HelpText(pygame.font.Font(mono, 24), width, height)\n        self.server_fps = 0\n        self.frame_number = 0\n        self.simulation_time = 0\n        self.show_info = True\n        self.info_text = []\n        self.server_clock = pygame.time.Clock()\n        self.vehicle = None\n\n    def set_vehicle(self, vehicle):\n        self.vehicle = vehicle\n\n    def tick(self, world, clock):\n        if self.show_info:\n            # Get all world vehicles\n            vehicles = world.get_actors().filter(\"vehicle.*\")\n\n            # General simulation info\n            self.info_text = [\n                \"Server:  % 16d FPS\" % self.server_fps,\n                \"Client:  % 16d FPS\" % clock.get_fps(),\n                \"\",\n                \"Map:     % 20s\" % world.map.name,\n                #\"Simulation time: % 12s\" % datetime.timedelta(seconds=int(self.simulation_time)),\n                \"Number of vehicles: % 8d\" % len(vehicles)\n            ]\n\n            # Show info of attached vehicle\n            if self.vehicle is not None:\n                # Get transform, velocity and heading\n                t = self.vehicle.get_transform()\n                v = self.vehicle.get_velocity()\n                c = self.vehicle.get_control()\n                heading = \"N\" if abs(t.rotation.yaw) < 89.5 else \"\"\n                heading += \"S\" if abs(t.rotation.yaw) > 90.5 else \"\"\n                heading += \"E\" if 179.5 > t.rotation.yaw > 0.5 else \"\"\n                heading += \"W\" if -0.5 > t.rotation.yaw > -179.5 else \"\"\n                \n                self.info_text.extend([\n                    \"Vehicle: % 20s\" % get_actor_display_name(self.vehicle, truncate=20),\n                    \"\",\n                    \"Speed:   % 15.0f km/h\" % (3.6 * math.sqrt(v.x**2 + v.y**2 + v.z**2)),\n                    u\"Heading:% 16.0f\\N{DEGREE SIGN} % 2s\" % (t.rotation.yaw, heading),\n                    \"Location:% 20s\" % (\"(% 5.1f, % 5.1f)\" % (t.location.x, t.location.y)),\n                    \"Height:  % 18.0f m\" % t.location.z,\n                    \"\",\n                    (\"Throttle:\", c.throttle, 0.0, 1.0),\n                    (\"Steer:\", c.steer, -1.0, 1.0),\n                    (\"Brake:\", c.brake, 0.0, 1.0)\n                ])\n            else:\n                self.info_text.append(\"Vehicle: % 20s\" % \"None\")\n            \n        # Tick notifications\n        self.notifications.tick(world, clock)\n\n    def render(self, display, extra_info=[]):\n        if self.show_info:\n            info_surface = pygame.Surface((220, self.dim[1]))\n            info_surface.set_alpha(100)\n            display.blit(info_surface, (0, 0))\n            v_offset = 4\n            bar_h_offset = 100\n            bar_width = 106\n            self.info_text.append(\"\")\n            self.info_text.extend(extra_info)\n            for item in self.info_text:\n                if v_offset + 18 > self.dim[1]:\n                    break\n                if isinstance(item, list):\n                    if len(item) > 1:\n                        points = [(x + 8, v_offset + 8 + (1.0 - y) * 30) for x, y in enumerate(item)]\n                        pygame.draw.lines(display, (255, 136, 0), False, points, 2)\n                    item = None\n                    v_offset += 18\n                elif isinstance(item, tuple):\n                    if isinstance(item[1], bool):\n                        rect = pygame.Rect((bar_h_offset, v_offset + 8), (6, 6))\n                        pygame.draw.rect(display, (255, 255, 255), rect, 0 if item[1] else 1)\n                    else:\n                        rect_border = pygame.Rect((bar_h_offset, v_offset + 8), (bar_width, 6))\n                        pygame.draw.rect(display, (255, 255, 255), rect_border, 1)\n                        f = (item[1] - item[2]) / (item[3] - item[2])\n                        if item[2] < 0.0:\n                            rect = pygame.Rect((bar_h_offset + f * (bar_width - 6), v_offset + 8), (6, 6))\n                        else:\n                            rect = pygame.Rect((bar_h_offset, v_offset + 8), (f * bar_width, 6))\n                        pygame.draw.rect(display, (255, 255, 255), rect)\n                    item = item[0]\n                if item: # At this point has to be a str.\n                    surface = self.font_mono.render(item, True, (255, 255, 255))\n                    display.blit(surface, (8, v_offset))\n                v_offset += 18\n        self.notifications.render(display)\n        self.help.render(display)\n\n    def on_world_tick(self, timestamp):\n        # Store info when server ticks\n        self.server_clock.tick()\n        self.server_fps = self.server_clock.get_fps()\n        self.frame_number = timestamp.frame_count\n        self.simulation_time = timestamp.elapsed_seconds\n\n    def toggle_info(self):\n        self.show_info = not self.show_info\n\n    def notification(self, text, seconds=2.0):\n        self.notifications.set_text(text, seconds=seconds)\n\n    def error(self, text):\n        self.notifications.set_text(\"Error: %s\" % text, (255, 0, 0))\n\n\n#===============================================================================\n# FadingText\n#===============================================================================\n\nclass FadingText(object):\n    def __init__(self, font, dim, pos):\n        self.font = font\n        self.dim = dim\n        self.pos = pos\n        self.seconds_left = 0\n        self.surface = pygame.Surface(self.dim)\n\n    def set_text(self, text, color=(255, 255, 255), seconds=2.0):\n        text_texture = self.font.render(text, True, color)\n        self.surface = pygame.Surface(self.dim)\n        self.seconds_left = seconds\n        self.surface.fill((0, 0, 0, 0))\n        self.surface.blit(text_texture, (10, 11))\n\n    def tick(self, _, clock):\n        delta_seconds = 1e-3 * clock.get_time()\n        self.seconds_left = max(0.0, self.seconds_left - delta_seconds)\n        self.surface.set_alpha(500.0 * self.seconds_left)\n\n    def render(self, display):\n        display.blit(self.surface, self.pos)\n\n\n#===============================================================================\n# HelpText\n#===============================================================================\n\nclass HelpText(object):\n    def __init__(self, font, width, height):\n        lines = __doc__.split(\"\\n\")\n        self.font = font\n        self.dim = (680, len(lines) * 22 + 12)\n        self.pos = (0.5 * width - 0.5 * self.dim[0], 0.5 * height - 0.5 * self.dim[1])\n        self.seconds_left = 0\n        self.surface = pygame.Surface(self.dim)\n        self.surface.fill((0, 0, 0, 0))\n        for n, line in enumerate(lines):\n            text_texture = self.font.render(line, True, (255, 255, 255))\n            self.surface.blit(text_texture, (22, n * 22))\n            self._render = False\n        self.surface.set_alpha(220)\n\n    def toggle(self):\n        self._render = not self._render\n\n    def render(self, display):\n        if self._render:\n            display.blit(self.surface, self.pos)\n"
  },
  {
    "path": "carla_env/tools/misc.py",
    "content": "#!/usr/bin/env python\n\n# Copyright (c) 2018 Intel Labs.\n# authors: German Ros (german.ros@intel.com)\n#\n# This work is licensed under the terms of the MIT license.\n# For a copy, see <https://opensource.org/licenses/MIT>.\n\n\"\"\" Module with auxiliary functions. \"\"\"\n\nimport math\n\nimport numpy as np\n\nimport carla\n\n\ndef draw_waypoints(world, waypoints, z=0.5):\n    \"\"\"\n    Draw a list of waypoints at a certain height given in z.\n\n    :param world: carla.world object\n    :param waypoints: list or iterable container with the waypoints to draw\n    :param z: height in meters\n    :return:\n    \"\"\"\n    for w in waypoints:\n        t = w.transform\n        begin = t.location + carla.Location(z=z)\n        angle = math.radians(t.rotation.yaw)\n        end = begin + carla.Location(x=math.cos(angle), y=math.sin(angle))\n        world.debug.draw_arrow(begin, end, arrow_size=0.3, life_time=1.0)\n\n\ndef get_speed(vehicle):\n    \"\"\"\n    Compute speed of a vehicle in Kmh\n    :param vehicle: the vehicle for which speed is calculated\n    :return: speed as a float in Kmh\n    \"\"\"\n    vel = vehicle.get_velocity()\n    return 3.6 * math.sqrt(vel.x ** 2 + vel.y ** 2 + vel.z ** 2)\n\n\ndef is_within_distance_ahead(target_location, current_location, orientation, max_distance):\n    \"\"\"\n    Check if a target object is within a certain distance in front of a reference object.\n\n    :param target_location: location of the target object\n    :param current_location: location of the reference object\n    :param orientation: orientation of the reference object\n    :param max_distance: maximum allowed distance\n    :return: True if target object is within max_distance ahead of the reference object\n    \"\"\"\n    target_vector = np.array([target_location.x - current_location.x, target_location.y - current_location.y])\n    norm_target = np.linalg.norm(target_vector)\n\n    # If the vector is too short, we can simply stop here\n    if norm_target < 0.001:\n        return True\n\n    if norm_target > max_distance:\n        return False\n\n    forward_vector = np.array(\n        [math.cos(math.radians(orientation)), math.sin(math.radians(orientation))])\n    d_angle = math.degrees(math.acos(np.dot(forward_vector, target_vector) / norm_target))\n\n    return d_angle < 90.0\n\n\ndef compute_magnitude_angle(target_location, current_location, orientation):\n    \"\"\"\n    Compute relative angle and distance between a target_location and a current_location\n\n    :param target_location: location of the target object\n    :param current_location: location of the reference object\n    :param orientation: orientation of the reference object\n    :return: a tuple composed by the distance to the object and the angle between both objects\n    \"\"\"\n    target_vector = np.array([target_location.x - current_location.x, target_location.y - current_location.y])\n    norm_target = np.linalg.norm(target_vector)\n\n    forward_vector = np.array([math.cos(math.radians(orientation)), math.sin(math.radians(orientation))])\n    d_angle = math.degrees(math.acos(np.dot(forward_vector, target_vector) / norm_target))\n\n    return (norm_target, d_angle)\n\n\ndef distance_vehicle(waypoint, vehicle_transform):\n    loc = vehicle_transform.location\n    dx = waypoint.transform.location.x - loc.x\n    dy = waypoint.transform.location.y - loc.y\n\n    return math.sqrt(dx * dx + dy * dy)\n\n\ndef vector(location_1, location_2):\n    \"\"\"\n    Returns the unit vector from location_1 to location_2\n    location_1, location_2:   carla.Location objects\n    \"\"\"\n    x = location_2.x - location_1.x\n    y = location_2.y - location_1.y\n    z = location_2.z - location_1.z\n    norm = np.linalg.norm([x, y, z]) + np.finfo(float).eps\n\n    return [x / norm, y / norm, z / norm]\n"
  },
  {
    "path": "carla_env/wrappers.py",
    "content": "import carla\nimport numpy as np\nimport weakref\n\n\ndef get_actor_display_name(actor, truncate=250):\n    name = \" \".join(actor.type_id.replace(\"_\", \".\").title().split(\".\")[1:])\n    return (name[:truncate - 1] + u\"\\u2026\") if len(name) > truncate else name\n\n\ndef get_displacement_vector(car_pos, waypoint_pos, theta):\n    \"\"\"\n    Calculates the displacement vector from the car to a waypoint, taking into account the orientation of the car.\n\n    Parameters:\n        car_pos (numpy.ndarray): 1D numpy array of shape (3,) representing the x, y, z coordinates of the car.\n        waypoint_pos (numpy.ndarray): 1D numpy array of shape (3,) representing the x, y, z coordinates of the waypoint.\n        theta (float): Angle in radians representing the orientation of the car.\n\n    Returns:\n        numpy.ndarray: 1D numpy array of shape (3,) representing the displacement vector from the car to the waypoint,\n        with the car as the origin and the y-axis pointing in the direction of the car's orientation.\n    \"\"\"\n    # Calculate the relative position of the waypoint with respect to the car\n    relative_pos = waypoint_pos - car_pos\n\n    theta = theta\n    # Construct the rotation transformation matrix\n    R = np.array([[np.cos(theta), np.sin(theta), 0],\n                  [-np.sin(theta), np.cos(theta), 0],\n                  [0, 0, 1]])\n    T = np.array([[0, 1, 0],\n                  [1, 0, 0],\n                  [0, 0, 1]])\n    # Apply the rotation matrix to the relative position vector\n    waypoint_car = R @ relative_pos\n    waypoint_car = T @ waypoint_car\n    # Set values very close to zero to exactly zero\n    waypoint_car[np.abs(waypoint_car) < 10e-10] = 0\n\n    return waypoint_car\n\n\ndef angle_diff(v0, v1):\n    \"\"\"\n    Calculates the signed angle difference between 2D vectors v0 and v1.\n    It returns the angle difference in radians between v0 and v1.\n    The v0 is the reference for the sign of the angle\n    \"\"\"\n    v0_xy = v0[:2]\n    v1_xy = v1[:2]\n    v0_xy_norm = np.linalg.norm(v0_xy)\n    v1_xy_norm = np.linalg.norm(v1_xy)\n    if v0_xy_norm == 0 or v1_xy_norm == 0:\n        return 0\n\n    v0_xy_u = v0_xy / v0_xy_norm\n    v1_xy_u = v1_xy / v1_xy_norm\n    dot_product = np.dot(v0_xy_u, v1_xy_u)\n    angle = np.arccos(dot_product)\n\n    # Calculate the sign of the angle using the cross product\n    cross_product = np.cross(v0_xy_u, v1_xy_u)\n    if cross_product < 0:\n        angle = -angle\n    if abs(angle) >= 2.3:\n        return 0\n    return round(angle, 2)\n\n\ndef distance_to_line(A, B, p):\n    p[2] = 0\n    num = np.linalg.norm(np.cross(B - A, A - p))\n    denom = np.linalg.norm(B - A)\n    if np.isclose(denom, 0):\n        return np.linalg.norm(p - A)\n    return num / denom\n\n\ndef vector(v):\n    \"\"\" Turn carla Location/Vector3D/Rotation to np.array \"\"\"\n    if isinstance(v, carla.Location) or isinstance(v, carla.Vector3D):\n        return np.array([v.x, v.y, v.z])\n    elif isinstance(v, carla.Rotation):\n        return np.array([v.pitch, v.yaw, v.roll])\n\n\ndef smooth_action(old_value, new_value, smooth_factor):\n    return old_value * smooth_factor + new_value * (1.0 - smooth_factor)\n\n\ndef build_projection_matrix(w, h, fov):\n    focal = w / (2.0 * np.tan(fov * np.pi / 360.0))\n    K = np.identity(3)\n    K[0, 0] = K[1, 1] = focal\n    K[0, 2] = w / 2.0\n    K[1, 2] = h / 2.0\n    return K\n\n\ndef get_image_point(loc, K, w2c):\n    # Calculate 2D projection of 3D coordinate\n\n    # Format the input coordinate (loc is a carla.Position object)\n    point = np.array([loc.x, loc.y, loc.z, 1])\n    # transform to camera coordinates\n    point_camera = np.dot(w2c, point)\n\n    # New we must change from UE4's coordinate system to an \"standard\"\n    # (x, y ,z) -> (y, -z, x)\n    # and we remove the fourth componebonent also\n    point_camera = [point_camera[1], -point_camera[2], point_camera[0]]\n\n    # now project 3D->2D using the camera matrix\n    point_img = np.dot(K, point_camera)\n    # normalize\n    point_img[0] /= point_img[2]\n    point_img[1] /= point_img[2]\n\n    return point_img[0:2].astype(int)\n\n\nsensor_transforms = {\n    \"spectator\": carla.Transform(carla.Location(x=-5.5, z=2.8), carla.Rotation(pitch=-15)),\n    \"lidar\": carla.Transform(carla.Location(x=0.0, z=2.4)),\n    \"dashboard\": carla.Transform(carla.Location(z=10, x=10), carla.Rotation(pitch=-90)),\n    \"birdview0\": carla.Transform(carla.Location(x=0, y=0, z=16), carla.Rotation(pitch=-90)),\n    \"birdview1\": carla.Transform(carla.Location(z=6.4, x=5.2, y=-1.6), carla.Rotation(pitch=-90)),\n    \"fpv\": carla.Transform(carla.Location(x=0.6, y=-0.5, z=1.2)),\n}\n\n\n# ===============================================================================\n# CarlaActorBase\n# ===============================================================================\n\nclass CarlaActorBase(object):\n    def __init__(self, world, actor):\n        self.world = world\n        self.actor = actor\n        self.world.actor_list.append(self)\n        self.destroyed = False\n\n    def destroy(self):\n        if self.destroyed:\n            raise Exception(\"Actor already destroyed.\")\n        else:\n            print(\"Destroying \", self, \"...\")\n            self.actor.destroy()\n            self.world.actor_list.remove(self)\n            self.destroyed = True\n\n    def get_carla_actor(self):\n        return self.actor\n\n    def tick(self):\n        pass\n\n    def __getattr__(self, name):\n        \"\"\"Relay missing methods to underlying carla actor\"\"\"\n        return getattr(self.actor, name)\n\n\n# ===============================================================================\n# Lidar\n# ===============================================================================\n\nclass Lidar(CarlaActorBase):\n    def __init__(self, world, width=120, height=120, transform=carla.Transform(), on_recv_image=None,\n                 attach_to=None):\n        self._width = width\n        self._height = height\n        self.on_recv_image = on_recv_image\n        self.range = 20\n        # Setup lidar blueprint\n        lidar_bp = world.get_blueprint_library().find('sensor.lidar.ray_cast')\n        lidar_bp.set_attribute('points_per_second', '50000')\n        lidar_bp.set_attribute('channels', '64')\n        lidar_bp.set_attribute('range', str(self.range))\n        lidar_bp.set_attribute('upper_fov', '10')\n        lidar_bp.set_attribute('horizontal_fov', '110')\n        lidar_bp.set_attribute('lower_fov', '-20')\n        lidar_bp.set_attribute('rotation_frequency', '30')\n\n        # Create and setup camera actor\n        weak_self = weakref.ref(self)\n        actor = world.spawn_actor(lidar_bp, transform, attach_to=attach_to.get_carla_actor())\n        actor.listen(lambda lidar_data: Lidar.process_lidar_input(weak_self, lidar_data))\n        print(\"Spawned actor \\\"{}\\\"\".format(actor.type_id))\n\n        super().__init__(world, actor)\n\n    @staticmethod\n    def process_lidar_input(weak_self, raw):\n        self = weak_self()\n        if not self:\n            return\n        if callable(self.on_recv_image):\n            lidar_range = 2.0 * float(self.range)\n\n            points = np.frombuffer(raw.raw_data, dtype=np.dtype('f4'))\n            points = np.reshape(points, (int(points.shape[0] / 4), 4))\n            lidar_data = np.array(points[:, :2])\n            lidar_data *= min(self._width, self._height) / lidar_range\n            lidar_data += (0.5 * self._width, 0.5 * self._height)\n            lidar_data = np.fabs(lidar_data)  # pylint: disable=E1111\n            lidar_data = lidar_data.astype(np.int32)\n            lidar_data = np.reshape(lidar_data, (-1, 2))\n            lidar_img_size = (self._width, self._height, 3)\n            lidar_img = np.zeros((lidar_img_size), dtype=np.uint8)\n\n            lidar_img[tuple(lidar_data.T)] = (255, 255, 255)\n            self.on_recv_image(lidar_img)\n        \"\"\"\n        if callable(self.on_recv_image):\n            lidar_data.convert(self.color_converter)\n            array = np.frombuffer(lidar_data.raw_data, dtype=np.dtype(\"uint8\"))\n            array = np.reshape(array, (lidar_data.height, lidar_data.width, 4))\n            array = array[:, :, :3]\n            array = array[:, :, ::-1]\n            self.on_recv_image(array)\n            \"\"\"\n\n\n# ===============================================================================\n# CollisionSensor\n# ===============================================================================\n\nclass CollisionSensor(CarlaActorBase):\n    def __init__(self, world, vehicle, on_collision_fn):\n        self.on_collision_fn = on_collision_fn\n\n        # Collision history\n        self.history = []\n\n        # Setup sensor blueprint\n        bp = world.get_blueprint_library().find(\"sensor.other.collision\")\n\n        # Create and setup sensor\n        weak_self = weakref.ref(self)\n        actor = world.spawn_actor(bp, carla.Transform(), attach_to=vehicle.get_carla_actor())\n        actor.listen(lambda event: CollisionSensor.on_collision(weak_self, event))\n\n        super().__init__(world, actor)\n\n    @staticmethod\n    def on_collision(weak_self, event):\n        self = weak_self()\n        if not self:\n            return\n\n        # Call on_collision_fn\n        if callable(self.on_collision_fn):\n            self.on_collision_fn(event)\n\n\n# ===============================================================================\n# LaneInvasionSensor\n# ===============================================================================\n\nclass LaneInvasionSensor(CarlaActorBase):\n    def __init__(self, world, vehicle, on_invasion_fn):\n        self.on_invasion_fn = on_invasion_fn\n\n        # Setup sensor blueprint\n        bp = world.get_blueprint_library().find(\"sensor.other.lane_invasion\")\n\n        # Create sensor\n        weak_self = weakref.ref(self)\n        actor = world.spawn_actor(bp, carla.Transform(), attach_to=vehicle.get_carla_actor())\n        actor.listen(lambda event: LaneInvasionSensor.on_invasion(weak_self, event))\n\n        super().__init__(world, actor)\n\n    @staticmethod\n    def on_invasion(weak_self, event):\n        self = weak_self()\n        if not self:\n            return\n\n        # Call on_invasion_fn\n        if callable(self.on_invasion_fn):\n            self.on_invasion_fn(event)\n\n\n# ===============================================================================\n# Camera\n# ===============================================================================\n\nclass Camera(CarlaActorBase):\n    def __init__(self, world, width, height, transform=carla.Transform(),\n                 attach_to=None, on_recv_image=None,\n                 camera_type=\"sensor.camera.rgb\", color_converter=carla.ColorConverter.Raw, custom_palette=False):\n        self.on_recv_image = on_recv_image\n        self.color_converter = color_converter\n\n        self.custom_palette = custom_palette\n        # Setup camera blueprint\n        camera_bp = world.get_blueprint_library().find(camera_type)\n        camera_bp.set_attribute(\"image_size_x\", str(width))\n        camera_bp.set_attribute(\"image_size_y\", str(height))\n        camera_bp.set_attribute(\"fov\", f\"110\")\n        # camera_bp.set_attribute(\"sensor_tick\", str(sensor_tick))\n\n        # Create and setup camera actor\n        weak_self = weakref.ref(self)\n        actor = world.spawn_actor(camera_bp, transform, attach_to=attach_to.get_carla_actor())\n        actor.listen(lambda image: Camera.process_camera_input(weak_self, image))\n        print(\"Spawned actor \\\"{}\\\"\".format(actor.type_id))\n\n        super().__init__(world, actor)\n\n    @staticmethod\n    def process_camera_input(weak_self, image):\n        self = weak_self()\n        if not self:\n            return\n        if callable(self.on_recv_image):\n\n            image.convert(self.color_converter)\n            array = np.frombuffer(image.raw_data, dtype=np.dtype(\"uint8\"))\n            array = np.reshape(array, (image.height, image.width, 4))\n            array = array[:, :, :3]\n            array = array[:, :, ::-1]\n\n            if self.custom_palette:\n                classes = {\n                    0: [0, 0, 0],  # None\n                    1: [0, 0, 0],  # Buildings\n                    2: [0, 0, 0],  # Fences\n                    3: [0, 0, 0],  # Other\n                    4: [0, 0, 0],  # Pedestrians\n                    5: [0, 0, 0],  # Poles\n                    6: [157, 234, 50],  # RoadLines\n                    7: [50, 64, 128],  # Roads\n                    8: [255, 255, 255],  # Sidewalks\n                    9: [0, 0, 0],  # Vegetation\n                    10: [255, 0, 0],  # Vehicles\n                    11: [0, 0, 0],  # Walls\n                    12: [0, 0, 0]  # TrafficSigns\n                }\n                segimg = np.round((array[:, :, 0])).astype(np.uint8)\n                array = array.copy()\n                for j in range(array.shape[0]):\n                    for i in range(array.shape[1]):\n                        r_id = segimg[j, i]\n                        if r_id <= 12:\n                            array[j, i] = classes[segimg[j, i]]\n                        else:\n                            array[j, i] = classes[0]\n\n            self.on_recv_image(array)\n\n    def destroy(self):\n        super().destroy()\n\n\n# ===============================================================================\n# Vehicle\n# ===============================================================================\n\nclass Vehicle(CarlaActorBase):\n    def __init__(self, world, transform=carla.Transform(),\n                 on_collision_fn=None, on_invasion_fn=None,\n                 vehicle_type=\"vehicle.tesla.model3\", is_ego=False):\n        # Setup vehicle blueprint\n        vehicle_bp = world.get_blueprint_library().find(vehicle_type)\n        if is_ego:\n            vehicle_bp.set_attribute('role_name', 'hero')\n            # vehicle_bp.set_attribute('color', '0, 255, 0')\n            color = vehicle_bp.get_attribute(\"color\").recommended_values[0]\n        else:\n            color = vehicle_bp.get_attribute(\"color\").recommended_values[0]\n            vehicle_bp.set_attribute(\"color\", color)\n\n        # Create vehicle actor\n        actor = world.spawn_actor(vehicle_bp, transform)\n        print(\"Spawned actor \\\"{}\\\"\".format(actor.type_id))\n\n        super().__init__(world, actor)\n\n        # Maintain vehicle control\n        self.control = carla.VehicleControl()\n\n        if callable(on_collision_fn):\n            self.collision_sensor = CollisionSensor(world, self, on_collision_fn=on_collision_fn)\n        if callable(on_invasion_fn):\n            self.lane_sensor = LaneInvasionSensor(world, self, on_invasion_fn=on_invasion_fn)\n\n    def tick(self):\n        self.actor.apply_control(self.control)\n\n    def get_speed(self):\n        \"\"\"\n        Return current vehicle speed in km/h\n        \"\"\"\n        velocity = self.get_velocity()\n        return 3.6 * np.sqrt(velocity.x ** 2 + velocity.y ** 2 + velocity.z ** 2)\n\n    def get_angle(self, waypoint):\n        fwd = vector(self.get_velocity())\n        wp_fwd = vector(waypoint.transform.rotation.get_forward_vector())\n        return angle_diff(wp_fwd, fwd)\n\n    def get_closest_waypoint(self):\n        return self.world.map.get_waypoint(self.get_transform().location, project_to_road=True)\n\n    def set_autopilot(self, is_true, *args, **kwargs):\n        self.actor.set_autopilot(is_true, *args, **kwargs)\n\n\n\n# ===============================================================================\n# World\n# ===============================================================================\n\nclass World():\n    def __init__(self, client, town='Town02'):\n        self.world = client.load_world(town)\n        self.map = self.get_map()\n        self.actor_list = []\n\n    def tick(self):\n        for actor in list(self.actor_list):\n            actor.tick()\n\n        self.world.tick()\n\n    def destroy(self):\n        print(\"Destroying all spawned actors\")\n        for actor in list(self.actor_list):\n            actor.destroy()\n\n    def get_carla_world(self):\n        return self.world\n\n    def __getattr__(self, name):\n        \"\"\"Relay missing methods to underlying carla object\"\"\"\n        return getattr(self.world, name)\n"
  },
  {
    "path": "clip/__init__.py",
    "content": "import os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))"
  },
  {
    "path": "clip/clip_buffer.py",
    "content": "import torch as th\nfrom gym import spaces\nfrom typing import Any, Dict, List, Union\nimport numpy as np\nfrom stable_baselines3.common.buffers import DictReplayBuffer, DictRolloutBuffer\n\n\nclass CLIPReplayBuffer(DictReplayBuffer):\n    def __init__(\n            self,\n            buffer_size: int,\n            observation_space: spaces.Space,\n            action_space: spaces.Space,\n            device: Union[th.device, str] = \"auto\",\n            n_envs: int = 1,\n            optimize_memory_usage: bool = False,\n            handle_timeout_termination: bool = True,\n    ):\n        super().__init__(\n            buffer_size,\n            observation_space,\n            action_space,\n            device,\n            n_envs,\n            optimize_memory_usage,\n            handle_timeout_termination,\n        )\n        self.render_arrays: List[np.ndarray] = []\n        self.base_rewards: List = []\n        self.centering_factors: List = []\n        self.angle_factors: List = []\n        self.speeds: List = []\n        self.distance_std_factors: List = []\n\n    def add(\n            self,\n            obs: Dict[str, Any],\n            next_obs: Dict[str, Any],\n            action: np.ndarray,\n            reward: np.ndarray,\n            done: np.ndarray,\n            infos: List[Dict[str, Any]],\n    ) -> None:\n        super().add(\n            obs,\n            next_obs,\n            action,\n            reward,\n            done,\n            infos,\n        )\n        assert len(self.render_arrays) < self.buffer_size\n        self.render_arrays.append(infos[0][\"render_array\"])\n        self.base_rewards.append(reward[0])\n        self.centering_factors.append(infos[0][\"centering_factor\"])\n        self.angle_factors.append(infos[0][\"angle_factor\"])\n        self.speeds.append(infos[0][\"speed\"])\n        self.distance_std_factors.append(infos[0][\"distance_std_factor\"])\n\n    def clear_render_arrays(self) -> None:\n        self.render_arrays = []\n        self.base_rewards = []\n        self.centering_factors = []\n        self.angle_factors = []\n        self.speeds = []\n        self.distance_std_factors = []\n\n\nclass CLIPRolloutBuffer(DictRolloutBuffer):\n    def __init__(\n            self,\n            buffer_size: int,\n            observation_space: spaces.Space,\n            action_space: spaces.Space,\n            device: Union[th.device, str] = \"auto\",\n            gae_lambda: float = 1,\n            gamma: float = 0.99,\n            n_envs: int = 1,\n    ):\n        super().__init__(\n            buffer_size,\n            observation_space,\n            action_space,\n            device,\n            gae_lambda,\n            gamma,\n            n_envs,\n        )\n        self.render_arrays: List[np.ndarray] = []\n        self.base_rewards: List = []\n        self.centering_factors: List = []\n        self.angle_factors: List = []\n        self.speeds: List = []\n        self.distance_std_factors: List = []\n\n    def add(\n            self,\n            obs: Dict[str, np.ndarray],\n            action: np.ndarray,\n            reward: np.ndarray,\n            episode_start: np.ndarray,\n            value: th.Tensor,\n            log_prob: th.Tensor,\n            infos: List[Dict[str, Any]],\n    ) -> None:\n        super().add(\n            obs,\n            action,\n            reward,\n            episode_start,\n            value,\n            log_prob\n        )\n        assert len(self.render_arrays) < self.buffer_size\n        self.render_arrays.append(infos[0][\"render_array\"])\n        self.base_rewards.append(reward[0])\n        self.centering_factors.append(infos[0][\"centering_factor\"])\n        self.angle_factors.append(infos[0][\"angle_factor\"])\n        self.speeds.append(infos[0][\"speed\"])\n        self.distance_std_factors.append(infos[0][\"distance_std_factor\"])\n\n    def clear_render_arrays(self) -> None:\n        self.render_arrays = []\n        self.base_rewards = []\n        self.centering_factors = []\n        self.angle_factors = []\n        self.speeds = []\n        self.distance_std_factors = []\n"
  },
  {
    "path": "clip/clip_reward_model.py",
    "content": "from typing import List, Tuple\n\nimport open_clip\nimport torch\nimport torch.nn as nn\nfrom torch import Tensor\n\nfrom clip.transform import image_transform\n\n\nclass CLIPEmbed(nn.Module):\n    def __init__(self, clip_model):\n        super().__init__()\n        self.clip_model = clip_model\n        if isinstance(clip_model.visual.image_size, int):\n            image_size = clip_model.visual.image_size\n        else:\n            image_size = clip_model.visual.image_size[0]\n        self.transform = image_transform(image_size)\n\n    @torch.inference_mode()\n    def forward(self, x):\n        if x.shape[1] != 3:\n            x = x.permute(0, 3, 1, 2)\n\n        with torch.no_grad(), torch.autocast(\"cuda\", enabled=torch.cuda.is_available()):\n            x = self.transform(x)  # [batch, 3, 244, 244]\n            x = self.clip_model.encode_image(x, normalize=True)  # [batch, 1024]\n        return x\n\n\nclass CLIPReward(nn.Module):\n    def __init__(\n            self,\n            *,\n            model: CLIPEmbed,\n            alpha: float,\n            target_prompts: torch.Tensor,\n            baseline_prompts: torch.Tensor,\n    ) -> None:\n        super().__init__()\n        self.clip_embed_module = model\n        targets = self.embed_prompts(target_prompts)\n        self.register_buffer(\"targets\", targets)\n        if len(baseline_prompts) > 0:\n            baselines = self.embed_prompts(baseline_prompts)\n            direction = targets - baselines  # [1, 1024]\n            # Register them as buffers so they are automatically moved around.\n            self.register_buffer(\"baselines\", baselines)\n            self.register_buffer(\"direction\", direction)\n            self.alpha = alpha\n            projection = self.compute_projection(alpha, self.direction)  # [1024, 1024]\n            self.register_buffer(\"projection\", projection)\n\n    def compute_projection(self, alpha: float, direction) -> torch.Tensor:\n        projection = direction.T @ direction / torch.norm(direction) ** 2\n        identity = torch.diag(torch.ones(projection.shape[0])).to(projection.device)\n        projection = alpha * projection + (1 - alpha) * identity\n        return projection\n\n    @torch.inference_mode()\n    def forward(self, x: torch.Tensor, vlm_reward_type: str) -> torch.Tensor:\n        if vlm_reward_type == \"VLM-RL\":\n            return self.forward_vlm_rl(x)\n        elif \"LORD\" in vlm_reward_type:\n            return self.forward_lord(x)\n        elif vlm_reward_type == \"VLM-RM\":\n            return self.forward_vlm_rm(x)\n        elif vlm_reward_type in [\"VLM-SR\", \"RoboCLIP\"]:\n            return self.forward_vlm_sr(x)\n        else:\n            raise NotImplementedError\n\n    @torch.inference_mode()\n    def forward_vlm_rl(self, x: torch.Tensor) -> torch.Tensor:\n        x = x / torch.norm(x, dim=-1, keepdim=True)\n        y = (x @ self.targets.T)\n        z = y[:, 0] - y[:, 1]\n        return z\n\n    @torch.inference_mode()\n    def forward_lord(self, x: torch.Tensor) -> torch.Tensor:\n        x = x / torch.norm(x, dim=-1, keepdim=True)\n        y = (x @ self.targets.T)\n        z = 1 - y\n        return z.squeeze()\n\n    @torch.inference_mode()\n    def forward_vlm_rm(self, x: torch.Tensor) -> torch.Tensor:\n        x = x / torch.norm(x, dim=-1, keepdim=True)\n        y = 1 - (torch.norm((x - self.targets) @ self.projection, dim=-1) ** 2) / 2\n        return y\n\n    @torch.inference_mode()\n    def forward_vlm_sr(self, x: torch.Tensor) -> torch.Tensor:\n        x = x / torch.norm(x, dim=-1, keepdim=True)\n        y = (x @ self.targets.T)\n        return y.squeeze()\n\n    @torch.inference_mode()\n    def get_pos_neg(self, x: torch.Tensor):\n        x = x / torch.norm(x, dim=-1, keepdim=True)\n        y = (x @ self.targets.T)\n        return y[:, 0], y[:, 1]\n\n    @staticmethod\n    def tokenize_prompts(x: List[str]) -> torch.Tensor:\n        \"\"\"Tokenize a list of prompts.\"\"\"\n        return open_clip.tokenize(x)\n\n    def embed_prompts(self, x) -> torch.Tensor:\n        \"\"\"Embed a list of prompts.\"\"\"\n        with torch.no_grad():\n            x = self.clip_embed_module.clip_model.encode_text(x).float()\n        x = x / x.norm(dim=-1, keepdim=True)\n        return x\n\n    def embed_images(self, x):\n        return self.clip_embed_module.forward(x)\n\n\ndef compute_rewards(\n        model: CLIPReward,\n        frames: torch.Tensor,\n        batch_size: int,\n        vlm_reward_type: str = \"VLM-RL\",\n) -> Tensor:\n    assert frames.device == torch.device(\"cpu\")\n    n_samples = len(frames)\n    basic_rewards = torch.zeros(n_samples, device=torch.device(\"cpu\"))\n    model = model.eval()\n    with torch.no_grad():\n        for i in range(0, n_samples, batch_size):\n            frames_batch = frames[i: i + batch_size].to(next(model.parameters()).device)\n            with torch.no_grad():\n                embeddings = model.clip_embed_module(frames_batch)\n                rewards_batch = model(embeddings, vlm_reward_type=vlm_reward_type)\n            basic_rewards[i: i + batch_size] = rewards_batch\n\n    return basic_rewards\n\n\n"
  },
  {
    "path": "clip/clip_rewarded_ppo.py",
    "content": "import pathlib\nimport sys\nimport time\nimport warnings\nfrom collections import deque\nfrom typing import Optional, Tuple, TypeVar, Type, Union, Dict, Any\n\nimport numpy as np\nimport open_clip\n\nfrom gymnasium import spaces\nimport torch\nimport torch as th\nfrom box import Box\nfrom stable_baselines3 import PPO\nfrom stable_baselines3.common.callbacks import BaseCallback\nfrom stable_baselines3.common.save_util import recursive_setattr, load_from_zip_file\nfrom stable_baselines3.common.type_aliases import MaybeCallback, RolloutReturn\nfrom stable_baselines3.common.utils import safe_mean, check_for_correct_spaces, obs_as_tensor\nfrom stable_baselines3.common.vec_env import VecEnv\nfrom stable_baselines3.common.vec_env.patch_gym import _convert_space\n\nfrom clip.clip_buffer import CLIPReplayBuffer, CLIPRolloutBuffer\nfrom clip.clip_reward_model import compute_rewards, CLIPEmbed, CLIPReward\n\nfrom config import CONFIG\n\nSelfCLIPRewardedPPO = TypeVar(\"SelfCLIPRewardedPPO\", bound=\"CLIPRewardedPPO\")\n\n\nclass CLIPRewardedPPO(PPO):\n    rollout_buffer: CLIPReplayBuffer\n\n    def __init__(\n            self,\n            *,\n            env: VecEnv,\n            config: Box,\n            inference_only: bool = False,\n    ):\n        self.config = config\n\n        super().__init__(\n            env=env,\n            policy='MultiInputPolicy',\n            tensorboard_log='tensorboard',\n            seed=config.seed,\n            **self.config.algorithm_params,\n        )\n        self.ep_clip_info_buffer = None  # type: Optional[deque]\n\n        self.inference_only = inference_only\n        if not self.inference_only:\n            self._setup_model()\n            self._load_modules()\n\n    def _dump_logs(self) -> None:\n        pass\n\n    def _setup_model(self):\n        super()._setup_model()\n        self.rollout_buffer = CLIPRolloutBuffer(\n            self.n_steps,\n            self.observation_space,\n            self.action_space,\n            device=self.device,\n            gamma=self.gamma,\n            gae_lambda=self.gae_lambda,\n            n_envs=self.n_envs,\n        )\n\n    def _load_modules(self):\n        model_name_prefix, pretrained = self.config.clip_reward_params.pretrained_model.split(\"/\")\n        clip_model = open_clip.create_model(\n            model_name=model_name_prefix, pretrained=pretrained  # , cache_dir=cache_dir\n        )\n        clip_model = CLIPEmbed(clip_model)\n        target_prompts = CLIPReward.tokenize_prompts(self.config.clip_reward_params.target_prompts)\n        baseline_prompts = CLIPReward.tokenize_prompts(self.config.clip_reward_params.baseline_prompts)\n\n        clip_model = CLIPReward(\n            model=clip_model,\n            alpha=self.config.clip_reward_params.alpha,\n            target_prompts=target_prompts,\n            baseline_prompts=baseline_prompts,\n        )\n        self.reward_model = clip_model.eval().to(self.device)\n\n    def _compute_clip_rewards(self) -> None:\n        assert self.env is not None\n        assert self.ep_info_buffer is not None\n        ep_info_buffer_maxlen = self.ep_info_buffer.maxlen\n        assert ep_info_buffer_maxlen is not None\n\n        frames = torch.from_numpy(np.array(self.rollout_buffer.render_arrays))  # [1024, 80, 160, 3]\n        # only use the right half\n        width = frames.shape[2]\n        left = width // 2\n        frames = frames[:, :, left:, :]\n        rewards0 = compute_rewards(\n            model=self.reward_model,\n            frames=frames,\n            batch_size=self.config.clip_reward_params.batch_size,\n        )\n        rewards0 = rewards0.numpy().reshape(-1, 1)\n        print(\"Clip reward ...\")\n        print(list(np.round(rewards0.flatten(), 4)))\n\n        base_reward = np.array(self.rollout_buffer.base_rewards).reshape(-1, 1)\n\n        speeds = np.array(self.rollout_buffer.speeds)\n        print(\"Speed ...\")\n        print(list(np.round(speeds.flatten(), 4)))\n\n        thre_min, thre_max = 0.0, 0.03\n        rewards0 = np.clip(rewards0, a_min=thre_min, a_max=thre_max)\n        rewards0 = 1 - (rewards0 - thre_min) / (thre_max - thre_min)\n\n        centering_factors = np.array(self.rollout_buffer.centering_factors)\n        angle_factors = np.array(self.rollout_buffer.angle_factors)\n        distance_std_factors = np.array(self.rollout_buffer.distance_std_factors)\n\n        desired_speed = np.clip(rewards0.flatten(), 0.0, 1.0) * CONFIG.reward_params.target_speed\n        r_speeds = 1.0 - np.abs(speeds - desired_speed) / CONFIG.reward_params.target_speed\n\n        rewards = (r_speeds * centering_factors * angle_factors * distance_std_factors).reshape(-1, 1)\n        rewards = np.where(base_reward < 0, base_reward, rewards)\n\n        print(\"Final reward ...\")\n        print(list(np.round(rewards.flatten(), 4)))\n\n        self.rollout_buffer.clear_render_arrays()\n        self.rollout_buffer.rewards = rewards\n\n    def collect_rollouts(\n            self,\n            env: VecEnv,\n            callback: BaseCallback,\n            rollout_buffer: CLIPRolloutBuffer,\n            n_rollout_steps: int,\n    ) -> bool:\n        \"\"\"\n        Collect experiences using the current policy and fill a ``RolloutBuffer``.\n        The term rollout here refers to the model-free notion and should not\n        be used with the concept of rollout used in model-based RL or planning.\n\n        :param env: The training environment\n        :param callback: Callback that will be called at each step\n            (and at the beginning and end of the rollout)\n        :param rollout_buffer: Buffer to fill with rollouts\n        :param n_rollout_steps: Number of experiences to collect per environment\n        :return: True if function returned with at least `n_rollout_steps`\n            collected, False if callback terminated rollout prematurely.\n        \"\"\"\n        assert self._last_obs is not None, \"No previous observation was provided\"\n        # Switch to eval mode (this affects batch norm / dropout)\n        self.policy.set_training_mode(False)\n\n        n_steps = 0\n        rollout_buffer.reset()\n        # Sample new weights for the state dependent exploration\n        if self.use_sde:\n            self.policy.reset_noise(env.num_envs)\n\n        callback.on_rollout_start()\n\n        while n_steps < n_rollout_steps:\n            if self.use_sde and self.sde_sample_freq > 0 and n_steps % self.sde_sample_freq == 0:\n                # Sample a new noise matrix\n                self.policy.reset_noise(env.num_envs)\n\n            with th.no_grad():\n                # Convert to pytorch tensor or to TensorDict\n                obs_tensor = obs_as_tensor(self._last_obs, self.device)\n                actions, values, log_probs = self.policy(obs_tensor)\n            actions = actions.cpu().numpy()\n\n            # Rescale and perform action\n            clipped_actions = actions\n            # Clip the actions to avoid out of bound error\n            if isinstance(self.action_space, spaces.Box):\n                clipped_actions = np.clip(actions, self.action_space.low, self.action_space.high)\n\n            new_obs, rewards, dones, infos = env.step(clipped_actions)\n\n            self.num_timesteps += env.num_envs\n\n            # Give access to local variables\n            callback.update_locals(locals())\n            if callback.on_step() is False:\n                return False\n\n            self._update_info_buffer(infos)\n            n_steps += 1\n\n            if isinstance(self.action_space, spaces.Discrete):\n                # Reshape in case of discrete action\n                actions = actions.reshape(-1, 1)\n\n            # Handle timeout by bootstraping with value function\n            # see GitHub issue #633\n            for idx, done in enumerate(dones):\n                if (\n                        done\n                        and infos[idx].get(\"terminal_observation\") is not None\n                        and infos[idx].get(\"TimeLimit.truncated\", False)\n                ):\n                    terminal_obs = self.policy.obs_to_tensor(infos[idx][\"terminal_observation\"])[0]\n                    with th.no_grad():\n                        terminal_value = self.policy.predict_values(terminal_obs)[0]\n                    rewards[idx] += self.gamma * terminal_value\n\n            rollout_buffer.add(self._last_obs, actions, rewards, self._last_episode_starts, values, log_probs, infos)\n            self._last_obs = new_obs\n            self._last_episode_starts = dones\n\n        with th.no_grad():\n            # Compute value for the last timestep\n            values = self.policy.predict_values(obs_as_tensor(new_obs, self.device))\n\n        if not self.inference_only:\n            self._compute_clip_rewards()\n\n        rollout_buffer.compute_returns_and_advantage(last_values=values, dones=dones)\n\n        callback.on_rollout_end()\n\n        return True\n\n    def _log(self) -> None:\n        time_elapsed = max(\n            (time.time_ns() - self.start_time) / 1e9, sys.float_info.epsilon\n        )\n        fps = int((self.num_timesteps - self._num_timesteps_at_start) / time_elapsed)\n        self.logger.record(\"time/episodes\", self._episode_num, exclude=\"tensorboard\")\n        if len(self.ep_info_buffer) > 0 and len(self.ep_info_buffer[0]) > 0:\n            self.logger.record(\n                \"rollout/ep_gt_rew_mean\",\n                safe_mean([ep_info[\"r\"] for ep_info in self.ep_info_buffer]),\n            )\n            self.logger.record(\n                \"rollout/ep_len_mean\",\n                safe_mean([ep_info[\"l\"] for ep_info in self.ep_info_buffer]),\n            )\n        self.logger.record(\"time/fps\", fps)\n        self.logger.record(\n            \"time/time_elapsed\", int(time_elapsed), exclude=\"tensorboard\"\n        )\n        self.logger.record(\n            \"time/total_timesteps\", self.num_timesteps, exclude=\"tensorboard\"\n        )\n        if self.use_sde:\n            self.logger.record(\"train/std\", (self.actor.get_std()).mean().item())\n\n        if len(self.ep_success_buffer) > 0:\n            self.logger.record(\n                \"rollout/success_rate\", safe_mean(self.ep_success_buffer)\n            )\n        # Pass the number of timesteps for tensorboard\n        self.logger.dump(step=self.num_timesteps)\n\n    def train(self) -> None:\n        self._log()\n        super().train()\n\n    def _setup_learn(\n            self,\n            total_timesteps: int,\n            callback: MaybeCallback = None,\n            reset_num_timesteps: bool = True,\n            *args,\n    ) -> Tuple[int, BaseCallback]:\n        total_timesteps, callback = super()._setup_learn(\n            total_timesteps,\n            callback,\n            reset_num_timesteps,\n            *args,\n        )\n        if self.ep_clip_info_buffer is None or reset_num_timesteps:\n            self.ep_clip_info_buffer = deque(maxlen=100)\n        return total_timesteps, callback\n\n    def learn(self: SelfCLIPRewardedPPO, *args, **kwargs) -> SelfCLIPRewardedPPO:\n        assert not self.inference_only\n        return super().learn(*args, **kwargs)\n\n    def save(self, *args, **kwargs) -> None:  # type: ignore\n        super().save(*args, exclude=[\"reward_model\", \"worker_frames_tensor\"], **kwargs)\n\n    @classmethod\n    def load(\n            cls: Type[SelfCLIPRewardedPPO],\n            path: Union[str, pathlib.Path],\n            *,\n            env: Optional[VecEnv] = None,\n            load_clip: bool = True,\n            device: Union[torch.device, str] = \"cuda:0\",\n            custom_objects: Optional[Dict[str, Any]] = None,\n            force_reset: bool = True,\n            **kwargs,\n    ) -> SelfCLIPRewardedPPO:\n        data, params, pytorch_variables = load_from_zip_file(\n            path,\n            device=device,\n            custom_objects=custom_objects,\n        )\n\n        assert data is not None, \"No data found in the saved file\"\n        assert params is not None, \"No params found in the saved file\"\n\n        # Remove stored device information and replace with ours\n        if \"policy_kwargs\" in data:\n            if \"device\" in data[\"policy_kwargs\"]:\n                del data[\"policy_kwargs\"][\"device\"]\n            # backward compatibility, convert to new format\n            if (\n                    \"net_arch\" in data[\"policy_kwargs\"]\n                    and len(data[\"policy_kwargs\"][\"net_arch\"]) > 0\n            ):\n                saved_net_arch = data[\"policy_kwargs\"][\"net_arch\"]\n                if isinstance(saved_net_arch, list) and isinstance(\n                        saved_net_arch[0], dict\n                ):\n                    data[\"policy_kwargs\"][\"net_arch\"] = saved_net_arch[0]\n\n        if (\n                \"policy_kwargs\" in kwargs\n                and kwargs[\"policy_kwargs\"] != data[\"policy_kwargs\"]\n        ):\n            raise ValueError(\n                f\"The specified policy kwargs do not equal the stored policy kwargs.\"\n                f\"Stored kwargs: {data['policy_kwargs']}, \"\n                f\"specified kwargs: {kwargs['policy_kwargs']}\"\n            )\n\n        if \"observation_space\" not in data or \"action_space\" not in data:\n            raise KeyError(\n                \"The observation_space and action_space were not given, can't verify \"\n                \"new environments.\"\n            )\n\n        # Gym -> Gymnasium space conversion\n        for key in {\"observation_space\", \"action_space\"}:\n            data[key] = _convert_space(\n                data[key]\n            )  # pytype: disable=unsupported-operands\n\n        if env is not None:\n            # Wrap first if needed\n            env = cls._wrap_env(env, data[\"verbose\"])\n            # Check if given env is valid\n            check_for_correct_spaces(\n                env, data[\"observation_space\"], data[\"action_space\"]\n            )\n            # Discard `_last_obs`, this will force the env to reset before training\n            # See issue https://github.com/DLR-RM/stable-baselines3/issues/597\n            if force_reset and data is not None:\n                data[\"_last_obs\"] = None\n            # `n_envs` must be updated.\n            # See issue https://github.com/DLR-RM/stable-baselines3/issues/1018\n            if data is not None:\n                data[\"n_envs\"] = env.num_envs\n        else:\n            # Use stored env, if one exists. If not, continue as is (can be used for\n            # predict)\n            if \"env\" in data:\n                env = data[\"env\"]\n\n        # Ensure the config has the necessary structure\n        if \"config\" not in data:\n            data[\"config\"] = Box(default_box=True)\n        if not hasattr(data[\"config\"], \"action_noise\"):\n            data[\"config\"].action_noise = None\n\n        # pytype: disable=not-instantiable,wrong-keyword-args\n        # use current device\n        data[\"config\"].algorithm_params.device = device\n        model = cls(\n            env=env,\n            config=data[\"config\"],\n            inference_only=not load_clip,\n        )\n        # pytype: enable=not-instantiable,wrong-keyword-args\n\n        # load parameters\n        model.__dict__.update(data)\n        model.__dict__.update(kwargs)\n        model._setup_model()\n\n        try:\n            # put state_dicts back in place\n            model.set_parameters(params, exact_match=True, device=device)\n        except RuntimeError as e:\n            # Patch to load Policy saved using SB3 < 1.7.0\n            # the error is probably due to old policy being loaded\n            # See https://github.com/DLR-RM/stable-baselines3/issues/1233\n            if \"pi_features_extractor\" in str(\n                    e\n            ) and \"Missing key(s) in state_dict\" in str(e):\n                model.set_parameters(params, exact_match=False, device=device)\n                warnings.warn(\n                    \"You are probably loading a model saved with SB3 < 1.7.0, \"\n                    \"we deactivated exact_match so you can save the model \"\n                    \"again to avoid issues in the future \"\n                    \"(see https://github.com/DLR-RM/stable-baselines3/issues/1233 for \"\n                    f\"more info). Original error: {e} \\n\"\n                    \"Note: the model should still work fine, this only a warning.\"\n                )\n            else:\n                raise e\n        # put other pytorch variables back in place\n        if pytorch_variables is not None:\n            for name in pytorch_variables:\n                # Skip if PyTorch variable was not defined (to ensure backward\n                # compatibility). This happens when using SAC/TQC.\n                # SAC has an entropy coefficient which can be fixed or optimized.\n                # If it is optimized, an additional PyTorch variable `log_ent_coef` is\n                # defined, otherwise it is initialized to `None`.\n                if pytorch_variables[name] is None:\n                    continue\n                # Set the data attribute directly to avoid issue when using optimizers\n                # See https://github.com/DLR-RM/stable-baselines3/issues/391\n                recursive_setattr(model, f\"{name}.data\", pytorch_variables[name].data)\n\n        # Sample gSDE exploration matrix, so it uses the right device\n        # see issue #44\n        if model.use_sde:\n            model.policy.reset_noise()\n\n        if load_clip:\n            model._load_modules()\n        return model"
  },
  {
    "path": "clip/clip_rewarded_sac.py",
    "content": "import pathlib\nimport sys\nimport time\nimport warnings\nfrom collections import deque\nfrom typing import Optional, Tuple, TypeVar, Type, Union, Dict, Any\n\nimport numpy as np\nimport open_clip\nimport stable_baselines3.common.noise as sb3_noise\nimport torch\nfrom box import Box\nfrom stable_baselines3 import SAC\nfrom stable_baselines3.common.callbacks import BaseCallback\nfrom stable_baselines3.common.save_util import recursive_setattr, load_from_zip_file\nfrom stable_baselines3.common.type_aliases import MaybeCallback, RolloutReturn\nfrom stable_baselines3.common.utils import safe_mean, check_for_correct_spaces\nfrom stable_baselines3.common.vec_env import VecEnv\nfrom stable_baselines3.common.vec_env.patch_gym import _convert_space\n\nfrom clip.clip_buffer import CLIPReplayBuffer\nfrom clip.clip_reward_model import compute_rewards, CLIPEmbed, CLIPReward\n\nfrom config import CONFIG\n\nSelfCLIPRewardedSAC = TypeVar(\"SelfCLIPRewardedSAC\", bound=\"CLIPRewardedSAC\")\n\n\nclass CLIPRewardedSAC(SAC):\n    replay_buffer: CLIPReplayBuffer\n\n    def __init__(\n        self,\n        *,\n        env: VecEnv,\n        config: Box,\n        inference_only: bool = False,\n    ):\n        self.config = config\n\n        if config.action_noise:\n            mean = config.action_noise.mean * np.ones(env.action_space.shape)\n            sigma = config.action_noise.sigma * np.ones(env.action_space.shape)\n            if config.action_noise.name == \"NormalActionNoise\":\n                action_noise = sb3_noise.NormalActionNoise(mean=mean, sigma=sigma)\n            elif config.action_noise.name == \"OrnsteinUhlenbeckActionNoise\":\n                action_noise = sb3_noise.OrnsteinUhlenbeckActionNoise(\n                    mean=mean,\n                    sigma=sigma,\n                    theta=config.action_noise.theta,\n                    dt=config.action_noise.dt,\n                )\n            else:\n                raise ValueError(\n                    f\"Unknown action noise name: {config.action_noise.name}\"\n                )\n        else:\n            action_noise = None\n\n        super().__init__(\n            env=env,\n            policy='MultiInputPolicy',\n            replay_buffer_class=CLIPReplayBuffer,\n            tensorboard_log='tensorboard',\n            seed=config.seed,\n            action_noise=action_noise,\n            **self.config.algorithm_params,\n        )\n        self.ep_clip_info_buffer = None  # type: Optional[deque]\n\n        self.inference_only = inference_only\n        if not self.inference_only:\n            self._load_modules()\n            self.previous_num_timesteps = 0\n            self.previous_num_episodes = 0\n\n    def _dump_logs(self) -> None:\n        pass\n\n    def _load_modules(self):\n        model_name_prefix, pretrained = self.config.clip_reward_params.pretrained_model.split(\"/\")\n        clip_model = open_clip.create_model(\n            model_name=model_name_prefix, pretrained=pretrained  # , cache_dir=cache_dir\n        )\n        clip_model = CLIPEmbed(clip_model)\n        target_prompts = CLIPReward.tokenize_prompts(self.config.clip_reward_params.target_prompts)\n        baseline_prompts = CLIPReward.tokenize_prompts(self.config.clip_reward_params.baseline_prompts)\n\n        clip_model = CLIPReward(\n            model=clip_model,\n            alpha=self.config.clip_reward_params.alpha,\n            target_prompts=target_prompts,\n            baseline_prompts=baseline_prompts,\n        )\n        self.reward_model = clip_model.eval().to(self.device)\n\n    def _compute_clip_rewards(self):\n        assert self.env is not None\n\n        replay_buffer_pos = self.replay_buffer.pos\n        total_timesteps = self.num_timesteps - self.previous_num_timesteps\n        env_episode_timesteps = total_timesteps // self.env.num_envs\n\n        frames = torch.from_numpy(np.array(self.replay_buffer.render_arrays))  # [64, height, width, 3]\n        # only use the right half\n        width = frames.shape[2]\n        left = width // 2\n        frames = frames[:, :, left:, :]\n        rewards0 = compute_rewards(\n            model=self.reward_model,\n            frames=frames,\n            batch_size=self.config.clip_reward_params.batch_size,\n            vlm_reward_type=self.config.vlm_reward_type,\n        )\n        rewards0 = rewards0.numpy().reshape(-1, 1)\n        print(\"Clip reward ...\")\n        print(list(np.round(rewards0.flatten(), 4)))\n        base_reward = np.array(self.replay_buffer.base_rewards).reshape(-1, 1)\n        speeds = np.array(self.replay_buffer.speeds)\n        print(\"Speed ...\")\n        print(list(np.round(speeds.flatten(), 4)))\n\n        if self.config.vlm_reward_type == \"VLM-RL\":\n            thre_min, thre_max = 0.0, 0.03\n            rewards0 = np.clip(rewards0, a_min=thre_min, a_max=thre_max)\n            rewards0 = 1 - (rewards0 - thre_min) / (thre_max - thre_min)\n            centering_factors = np.array(self.replay_buffer.centering_factors)\n            angle_factors = np.array(self.replay_buffer.angle_factors)\n            distance_std_factors = np.array(self.replay_buffer.distance_std_factors)\n            desired_speed = np.clip(rewards0.flatten(), 0.0, 1.0) * CONFIG.reward_params.target_speed\n            r_speeds = 1.0 - np.abs(speeds - desired_speed) / CONFIG.reward_params.target_speed\n            rewards = (r_speeds * centering_factors * angle_factors * distance_std_factors).reshape(-1, 1)\n            rewards = np.where(base_reward < 0, base_reward, rewards)\n        elif self.config.vlm_reward_type == \"LORD-Speed\":\n            lord_speed_r = 1.0 - np.abs(speeds - 20.0) / 20.0\n            rewards = rewards0 + lord_speed_r.reshape(-1, 1)\n        elif self.config.vlm_reward_type == \"VLM-SR\":\n            rewards = np.where(rewards0 > 0.32, 1, -1)\n        else:\n            rewards = rewards0\n\n        print(\"Final reward ...\")\n        print(list(np.round(rewards.flatten(), 4)))\n        self.replay_buffer.clear_render_arrays()\n\n        if not self.inference_only:\n            if replay_buffer_pos - env_episode_timesteps >= 0:\n                self.replay_buffer.rewards[\n                replay_buffer_pos - env_episode_timesteps: replay_buffer_pos\n                ] = rewards\n            else:\n                self.replay_buffer.rewards[\n                -(env_episode_timesteps - replay_buffer_pos):\n                ] = rewards[: env_episode_timesteps - replay_buffer_pos]\n                self.replay_buffer.rewards[:replay_buffer_pos] = rewards[\n                                                                 env_episode_timesteps - replay_buffer_pos:\n                                                                 ]\n\n    def collect_rollouts(self, *args, **kwargs) -> RolloutReturn:\n        rollout = super().collect_rollouts(*args, **kwargs)\n        if not self.inference_only:\n            self._compute_clip_rewards()\n            self.previous_num_timesteps = self.num_timesteps\n            self.previous_num_episodes = self._episode_num\n        return rollout\n\n    def _log(self) -> None:\n        time_elapsed = max(\n            (time.time_ns() - self.start_time) / 1e9, sys.float_info.epsilon\n        )\n        fps = int((self.num_timesteps - self._num_timesteps_at_start) / time_elapsed)\n        self.logger.record(\"time/episodes\", self._episode_num, exclude=\"tensorboard\")\n        if len(self.ep_info_buffer) > 0 and len(self.ep_info_buffer[0]) > 0:\n            self.logger.record(\n                \"rollout/ep_gt_rew_mean\",\n                safe_mean([ep_info[\"r\"] for ep_info in self.ep_info_buffer]),\n            )\n            self.logger.record(\n                \"rollout/ep_len_mean\",\n                safe_mean([ep_info[\"l\"] for ep_info in self.ep_info_buffer]),\n            )\n        self.logger.record(\"time/fps\", fps)\n        self.logger.record(\n            \"time/time_elapsed\", int(time_elapsed), exclude=\"tensorboard\"\n        )\n        self.logger.record(\n            \"time/total_timesteps\", self.num_timesteps, exclude=\"tensorboard\"\n        )\n        if self.use_sde:\n            self.logger.record(\"train/std\", (self.actor.get_std()).mean().item())\n\n        if len(self.ep_success_buffer) > 0:\n            self.logger.record(\n                \"rollout/success_rate\", safe_mean(self.ep_success_buffer)\n            )\n        # Pass the number of timesteps for tensorboard\n        self.logger.dump(step=self.num_timesteps)\n\n    def train(self, gradient_steps: int, batch_size: int = 100) -> None:\n        self._log()\n        super().train(gradient_steps, batch_size)\n\n    def _setup_learn(\n        self,\n        total_timesteps: int,\n        callback: MaybeCallback = None,\n        reset_num_timesteps: bool = True,\n        *args,\n    ) -> Tuple[int, BaseCallback]:\n        total_timesteps, callback = super()._setup_learn(\n            total_timesteps,\n            callback,\n            reset_num_timesteps,\n            *args,\n        )\n        if self.ep_clip_info_buffer is None or reset_num_timesteps:\n            self.ep_clip_info_buffer = deque(maxlen=100)\n        return total_timesteps, callback\n\n    def learn(self: SelfCLIPRewardedSAC, *args, **kwargs) -> SelfCLIPRewardedSAC:\n        assert not self.inference_only\n        self.previous_num_timesteps = 0\n        self.previous_num_episodes = 0\n        return super().learn(*args, **kwargs)\n\n    def save(self, *args, **kwargs) -> None:  # type: ignore\n        super().save(*args, exclude=[\"reward_model\", \"worker_frames_tensor\"], **kwargs)\n\n\n    @classmethod\n    def load(\n            cls: Type[SelfCLIPRewardedSAC],\n            path: Union[str, pathlib.Path],\n            *,\n            env: Optional[VecEnv] = None,\n            load_clip: bool = True,\n            device: Union[torch.device, str] = \"cuda:0\",\n            custom_objects: Optional[Dict[str, Any]] = None,\n            force_reset: bool = True,\n            **kwargs,\n    ) -> SelfCLIPRewardedSAC:\n        data, params, pytorch_variables = load_from_zip_file(\n            path,\n            device=device,\n            custom_objects=custom_objects,\n        )\n\n        assert data is not None, \"No data found in the saved file\"\n        assert params is not None, \"No params found in the saved file\"\n\n        # Remove stored device information and replace with ours\n        if \"policy_kwargs\" in data:\n            if \"device\" in data[\"policy_kwargs\"]:\n                del data[\"policy_kwargs\"][\"device\"]\n            # backward compatibility, convert to new format\n            if (\n                    \"net_arch\" in data[\"policy_kwargs\"]\n                    and len(data[\"policy_kwargs\"][\"net_arch\"]) > 0\n            ):\n                saved_net_arch = data[\"policy_kwargs\"][\"net_arch\"]\n                if isinstance(saved_net_arch, list) and isinstance(\n                        saved_net_arch[0], dict\n                ):\n                    data[\"policy_kwargs\"][\"net_arch\"] = saved_net_arch[0]\n\n        if (\n                \"policy_kwargs\" in kwargs\n                and kwargs[\"policy_kwargs\"] != data[\"policy_kwargs\"]\n        ):\n            raise ValueError(\n                f\"The specified policy kwargs do not equal the stored policy kwargs.\"\n                f\"Stored kwargs: {data['policy_kwargs']}, \"\n                f\"specified kwargs: {kwargs['policy_kwargs']}\"\n            )\n\n        if \"observation_space\" not in data or \"action_space\" not in data:\n            raise KeyError(\n                \"The observation_space and action_space were not given, can't verify \"\n                \"new environments.\"\n            )\n\n        # Gym -> Gymnasium space conversion\n        for key in {\"observation_space\", \"action_space\"}:\n            data[key] = _convert_space(\n                data[key]\n            )  # pytype: disable=unsupported-operands\n\n        if env is not None:\n            # Wrap first if needed\n            env = cls._wrap_env(env, data[\"verbose\"])\n            # Check if given env is valid\n            check_for_correct_spaces(\n                env, data[\"observation_space\"], data[\"action_space\"]\n            )\n            # Discard `_last_obs`, this will force the env to reset before training\n            # See issue https://github.com/DLR-RM/stable-baselines3/issues/597\n            if force_reset and data is not None:\n                data[\"_last_obs\"] = None\n            # `n_envs` must be updated.\n            # See issue https://github.com/DLR-RM/stable-baselines3/issues/1018\n            if data is not None:\n                data[\"n_envs\"] = env.num_envs\n        else:\n            # Use stored env, if one exists. If not, continue as is (can be used for\n            # predict)\n            if \"env\" in data:\n                env = data[\"env\"]\n\n        # Ensure the config has the necessary structure\n        if \"config\" not in data:\n            data[\"config\"] = Box(default_box=True)\n        if not hasattr(data[\"config\"], \"action_noise\"):\n            data[\"config\"].action_noise = None\n\n        # pytype: disable=not-instantiable,wrong-keyword-args\n        # use current device\n        data[\"config\"].algorithm_params.device = device\n        model = cls(\n            env=env,\n            config=data[\"config\"],\n            inference_only=not load_clip,\n        )\n        # pytype: enable=not-instantiable,wrong-keyword-args\n\n        # load parameters\n        model.__dict__.update(data)\n        model.__dict__.update(kwargs)\n        model._setup_model()\n\n        try:\n            # put state_dicts back in place\n            model.set_parameters(params, exact_match=True, device=device)\n        except RuntimeError as e:\n            # Patch to load Policy saved using SB3 < 1.7.0\n            # the error is probably due to old policy being loaded\n            # See https://github.com/DLR-RM/stable-baselines3/issues/1233\n            if \"pi_features_extractor\" in str(\n                    e\n            ) and \"Missing key(s) in state_dict\" in str(e):\n                model.set_parameters(params, exact_match=False, device=device)\n                warnings.warn(\n                    \"You are probably loading a model saved with SB3 < 1.7.0, \"\n                    \"we deactivated exact_match so you can save the model \"\n                    \"again to avoid issues in the future \"\n                    \"(see https://github.com/DLR-RM/stable-baselines3/issues/1233 for \"\n                    f\"more info). Original error: {e} \\n\"\n                    \"Note: the model should still work fine, this only a warning.\"\n                )\n            else:\n                raise e\n        # put other pytorch variables back in place\n        if pytorch_variables is not None:\n            for name in pytorch_variables:\n                # Skip if PyTorch variable was not defined (to ensure backward\n                # compatibility). This happens when using SAC/TQC.\n                # SAC has an entropy coefficient which can be fixed or optimized.\n                # If it is optimized, an additional PyTorch variable `log_ent_coef` is\n                # defined, otherwise it is initialized to `None`.\n                if pytorch_variables[name] is None:\n                    continue\n                # Set the data attribute directly to avoid issue when using optimizers\n                # See https://github.com/DLR-RM/stable-baselines3/issues/391\n                recursive_setattr(model, f\"{name}.data\", pytorch_variables[name].data)\n\n        # Sample gSDE exploration matrix, so it uses the right device\n        # see issue #44\n        if model.use_sde:\n            model.policy.reset_noise()\n\n        if load_clip:\n            model._load_modules()\n        return model"
  },
  {
    "path": "clip/transform.py",
    "content": "from typing import Optional, Tuple\n\nimport torch\n\nfrom torchvision.transforms import (\n    Normalize,\n    Compose,\n    InterpolationMode,\n    Resize,\n    CenterCrop,\n)\n\nfrom open_clip.constants import OPENAI_DATASET_MEAN, OPENAI_DATASET_STD\n\n\ndef image_transform(\n    image_size: int,\n    mean: Optional[Tuple[float, ...]] = None,\n    std: Optional[Tuple[float, ...]] = None,\n):\n    mean = mean or OPENAI_DATASET_MEAN\n    if not isinstance(mean, (list, tuple)):\n        mean = (mean,) * 3\n\n    std = std or OPENAI_DATASET_STD\n    if not isinstance(std, (list, tuple)):\n        std = (std,) * 3\n\n    if isinstance(image_size, (list, tuple)) and image_size[0] == image_size[1]:\n        # for square size, pass size as int so that\n        # Resize() uses aspect preserving shortest edge\n        image_size = image_size[0]\n\n    normalize = Normalize(mean=mean, std=std)\n\n    def convert_from_uint8_to_float(image: torch.Tensor) -> torch.Tensor:\n        if image.dtype == torch.uint8:\n            return image.to(torch.float32) / 255.0\n        else:\n            return image\n\n    return Compose(\n        [\n            convert_from_uint8_to_float,\n            Resize((image_size, image_size), interpolation=InterpolationMode.BICUBIC),\n            CenterCrop(image_size),\n            normalize,\n        ]\n    )\n"
  },
  {
    "path": "config.py",
    "content": "import torch as th\nfrom box import Box\nfrom stable_baselines3.common.noise import NormalActionNoise\nimport numpy as np\nfrom utils import lr_schedule\n\nfrom stable_baselines3.common.torch_layers import BaseFeaturesExtractor\nfrom stable_baselines3.common.preprocessing import get_flattened_obs_dim\nimport torch.nn as nn\nimport gymnasium as gym\nimport torch\n\n\nclass CustomCNN(nn.Module):\n    def __init__(self, input_shape, features_dim=1):\n        super(CustomCNN, self).__init__()\n        n_input_channels = input_shape[0]\n\n        if n_input_channels == 3:\n            self.cnn = nn.Sequential(\n                nn.Conv2d(n_input_channels, 16, kernel_size=5, stride=2),  # (16, 58, 38)\n                nn.ReLU(),\n                nn.Conv2d(16, 32, kernel_size=3, stride=2),  # (32, 28, 18)\n                nn.ReLU(),\n                nn.Conv2d(32, 64, kernel_size=3, stride=2),  # (64, 13, 8)\n                nn.ReLU(),\n                nn.Conv2d(64, 128, kernel_size=3, stride=2),  # (128, 6, 4)\n                nn.ReLU(),\n                nn.Conv2d(128, 256, kernel_size=3, stride=1),  # (256, 4, 2)\n                nn.ReLU(),\n                nn.Flatten(),\n            )\n        else:\n            self.cnn = nn.Sequential(\n                nn.Conv2d(n_input_channels, 8, kernel_size=5, stride=2),\n                nn.ReLU(),\n                nn.Conv2d(8, 16, kernel_size=5, stride=2),\n                nn.ReLU(),\n                nn.Conv2d(16, 32, kernel_size=5, stride=2),\n                nn.ReLU(),\n                nn.Conv2d(32, 64, kernel_size=3, stride=2),\n                nn.ReLU(),\n                nn.Conv2d(64, 128, kernel_size=3, stride=2),\n                nn.ReLU(),\n                nn.Conv2d(128, 256, kernel_size=3, stride=1),\n                nn.ReLU(),\n                nn.Flatten(),\n            )\n        with torch.no_grad():\n            n_flatten = self.cnn(torch.zeros(1, *input_shape)).view(-1).shape[0]\n\n        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())\n\n    def forward(self, x):\n        x = self.cnn(x)\n        x = self.linear(x)\n        return x\n\n\nclass CustomMultiInputExtractor(BaseFeaturesExtractor):\n    def __init__(self, observation_space: gym.Space, features_dim: int = 256):\n        super(CustomMultiInputExtractor, self).__init__(observation_space, features_dim)\n        extractors = {}\n        total_concat_size = 0\n\n        if isinstance(observation_space, gym.spaces.Dict):\n            for key, subspace in observation_space.spaces.items():\n                if key == \"seg_camera\":\n                    extractors[key] = CustomCNN(subspace.shape, features_dim=features_dim)\n                    total_concat_size += features_dim\n                else:\n                    extractors[key] = nn.Flatten()\n                    total_concat_size += get_flattened_obs_dim(subspace)\n        else:\n            extractors[\"default\"] = CustomCNN(observation_space.shape, features_dim=features_dim)\n            total_concat_size = features_dim\n\n        self.extractors = nn.ModuleDict(extractors)\n        self._features_dim = total_concat_size\n\n    def forward(self, observations) -> torch.Tensor:\n        encoded_tensor_list = []\n\n        if isinstance(observations, dict):\n            for key, extractor in self.extractors.items():\n                encoded_tensor_list.append(extractor(observations[key]))\n        else:\n            encoded_tensor_list.append(self.extractors[\"default\"](observations))\n        return torch.cat(encoded_tensor_list, dim=1)\n\n\nalgorithm_params = {\n    \"PPO\": dict(\n        device=\"cuda:0\",\n        learning_rate=lr_schedule(1e-4, 1e-6, 2),\n        gamma=0.98,\n        gae_lambda=0.95,\n        clip_range=0.2,\n        ent_coef=0.05,\n        n_epochs=10,\n        n_steps=1024,\n        policy_kwargs=dict(activation_fn=th.nn.ReLU,\n                           net_arch=[dict(pi=[500, 300], vf=[500, 300])],\n                           features_extractor_class=CustomMultiInputExtractor,\n                           features_extractor_kwargs=dict(features_dim=256),\n                           )\n    ),\n    \"SAC\": dict(\n        device=\"cuda:0\",\n        learning_rate=lr_schedule(5e-4, 1e-6, 2),\n        buffer_size=100000,\n        batch_size=256,\n        ent_coef='auto',\n        gamma=0.98,\n        tau=0.02,\n        train_freq=64,\n        gradient_steps=64,\n        learning_starts=10000,\n        use_sde=True,\n        policy_kwargs=dict(log_std_init=-3, net_arch=[400, 300]),\n    ),\n    \"DDPG\": dict(\n        device=\"cuda:0\",\n        gamma=0.98,\n        buffer_size=200000,\n        learning_starts=10000,\n        action_noise=NormalActionNoise(mean=np.zeros(2), sigma=0.5 * np.ones(2)),\n        gradient_steps=-1,\n        learning_rate=lr_schedule(5e-4, 1e-6, 2),\n        policy_kwargs=dict(net_arch=[400, 300]),\n    ),\n    \"SAC_CLIP\": dict(\n        device=\"cuda:0\",\n        learning_rate=lr_schedule(1e-4, 5e-7, 2),\n        buffer_size=100000,\n        batch_size=256,\n        ent_coef='auto',\n        gamma=0.98,\n        tau=0.02,\n        train_freq=64,\n        gradient_steps=64,\n        learning_starts=10000,\n        use_sde=True,\n        policy_kwargs=dict(\n            log_std_init=-3, net_arch=[500, 300],\n            features_extractor_class=CustomMultiInputExtractor,\n            features_extractor_kwargs=dict(features_dim=256),\n        )\n    ),\n}\n\nstates = {\n    \"1\": [\"steer\", \"throttle\", \"speed\", \"angle_next_waypoint\", \"maneuver\"],\n    \"2\": [\"steer\", \"throttle\", \"speed\", \"maneuver\"],\n    \"3\": [\"steer\", \"throttle\", \"speed\", \"waypoints\"],\n    \"4\": [\"steer\", \"throttle\", \"speed\", \"angle_next_waypoint\", \"maneuver\", \"distance_goal\"],\n    \"5\": [\"steer\", \"throttle\", \"speed\", \"waypoints\", \"seg_camera\"],\n}\n\nreward_params = {\n    \"reward_fn_5_default\": dict(\n        early_stop=True,\n        min_speed=20.0,  # km/h\n        max_speed=35.0,  # km/h\n        target_speed=25.0,  # kmh\n        max_distance=3.0,  # Max distance from center before terminating\n        max_std_center_lane=0.4,\n        max_angle_center_lane=90,\n        penalty_reward=-10,\n    ),\n    \"reward_fn_5_no_early_stop\": dict(\n        early_stop=False,\n        min_speed=20.0,  # km/h\n        max_speed=35.0,  # km/h\n        target_speed=25.0,  # kmh\n        max_distance=3.0,  # Max distance from center before terminating\n        max_std_center_lane=0.4,\n        max_angle_center_lane=90,\n        penalty_reward=-10,\n    ),\n    \"reward_fn_5_best\": dict(\n        early_stop=True,\n        min_speed=20.0,  # km/h\n        max_speed=35.0,  # km/h\n        target_speed=25.0,  # kmh\n        max_distance=2.0,  # Max distance from center before terminating\n        max_std_center_lane=0.35,\n        max_angle_center_lane=90,\n        penalty_reward=-10,\n    ),\n    \"reward_clg\": dict(\n        pretrained_model=\"ViT-bigG-14/laion2b_s39b_b160k\",\n        batch_size=64,\n        target_prompts=[\n            \"Two cars have collided with each other on the road\",\n            \"The road is clear with no car accidents\",\n        ],\n    ),\n    \"reward_lord\": dict(\n        pretrained_model=\"ViT-bigG-14/laion2b_s39b_b160k\",\n        batch_size=64,\n        target_prompts=[\n            \"Two cars have collided with each other on the road\",\n        ],\n    ),\n    \"reward_vlm_rm\": dict(\n        pretrained_model=\"ViT-bigG-14/laion2b_s39b_b160k\",\n        batch_size=64,\n        alpha=0.5,\n        target_prompts=[\n            \"A car is driving safely\",\n        ],\n        baseline_prompts=[\n            \"A car\",\n        ],\n    ),\n    \"reward_fn_Chen\": dict(\n        early_stop=True,\n        min_speed=0.0,\n        max_speed=28.8,\n        target_speed=25.0,\n        max_distance=4.0,\n        max_std_center_lane=0.4,\n        max_angle_center_lane=90,\n        penalty_reward=-10,\n    ),\n    \"reward_fn_ASAP\": dict(\n        early_stop=True,\n        min_speed=0.0,\n        max_speed=50.0,\n        target_speed=30.0,\n        max_distance=3.0,\n        max_std_center_lane=0.4,\n        max_angle_center_lane=90,\n        penalty_reward=-5,\n    ),\n}\n\n_CONFIG_1 = {\n    \"algorithm\": \"PPO\",\n    \"algorithm_params\": algorithm_params[\"PPO\"],\n    \"state\": states[\"5\"],\n    \"action_smoothing\": 0.75,\n    \"reward_fn\": \"reward_fn5\",\n    \"reward_params\": reward_params[\"reward_fn_5_default\"],\n    \"obs_res\": (80, 120),\n    \"seed\": 100,\n    \"wrappers\": [],\n    \"use_rgb_bev\": False,\n}\n\n_CONFIG_2 = {\n    \"algorithm\": \"SAC\",\n    \"algorithm_params\": algorithm_params[\"SAC\"],\n    \"state\": states[\"5\"],\n    \"action_smoothing\": 0.75,\n    \"reward_fn\": \"reward_fn5\",\n    \"reward_params\": reward_params[\"reward_fn_5_default\"],\n    \"obs_res\": (80, 120),\n    \"seed\": 100,\n    \"wrappers\": [],\n    \"use_rgb_bev\": False,\n}\n\n_CONFIG_vlm_rl = {\n    \"algorithm\": \"CLIP-SAC\",\n    \"algorithm_params\": algorithm_params[\"SAC_CLIP\"],\n    \"state\": states[\"5\"],\n    \"action_smoothing\": 0.75,\n    \"reward_fn\": \"reward_fn5\",\n    \"reward_params\": reward_params[\"reward_fn_5_default\"],\n    \"clip_reward_params\": reward_params[\"reward_clg\"],\n    \"vlm_reward_type\": \"VLM-RL\",\n    \"obs_res\": (80, 120),\n    \"seed\": 100,\n    \"wrappers\": [],\n    \"action_noise\": {},\n    \"use_seg_bev\": True,\n    \"use_rgb_bev\": True,\n}\n\n_CONFIG_vlm_rl_ppo = {\n    \"algorithm\": \"CLIP-PPO\",\n    \"algorithm_params\": algorithm_params[\"PPO\"],\n    \"state\": states[\"5\"],\n    \"action_smoothing\": 0.75,\n    \"reward_fn\": \"reward_fn5\",\n    \"reward_params\": reward_params[\"reward_fn_5_default\"],\n    \"clip_reward_params\": reward_params[\"reward_clg\"],\n    \"vlm_reward_type\": \"VLM-RL\",\n    \"obs_res\": (80, 120),\n    \"seed\": 100,\n    \"wrappers\": [],\n    \"action_noise\": {},\n    \"action_space_type\": \"discrete\",\n    \"use_seg_bev\": True,\n    \"use_rgb_bev\": True,\n}\n\n_CONFIG_lord = {\n    \"algorithm\": \"CLIP-SAC\",\n    \"algorithm_params\": algorithm_params[\"SAC_CLIP\"],\n    \"state\": states[\"5\"],\n    \"action_smoothing\": 0.75,\n    \"reward_fn\": \"reward_fn5\",\n    \"reward_params\": reward_params[\"reward_fn_5_default\"],\n    \"clip_reward_params\": reward_params[\"reward_lord\"],\n    \"vlm_reward_type\": \"LORD\",\n    \"obs_res\": (80, 120),\n    \"seed\": 100,\n    \"wrappers\": [],\n    \"action_noise\": {},\n    \"use_seg_bev\": False,\n    \"use_rgb_bev\": True,\n}\n\n_CONFIG_lord_speed = {\n    \"algorithm\": \"CLIP-SAC\",\n    \"algorithm_params\": algorithm_params[\"SAC_CLIP\"],\n    \"state\": states[\"5\"],\n    \"action_smoothing\": 0.75,\n    \"reward_fn\": \"reward_fn5\",\n    \"reward_params\": reward_params[\"reward_fn_5_default\"],\n    \"clip_reward_params\": reward_params[\"reward_lord\"],\n    \"vlm_reward_type\": \"LORD-Speed\",\n    \"obs_res\": (80, 120),\n    \"seed\": 100,\n    \"wrappers\": [],\n    \"action_noise\": {},\n    \"use_seg_bev\": False,\n    \"use_rgb_bev\": True,\n}\n\n_CONFIG_vlm_rm = {\n    \"algorithm\": \"CLIP-SAC\",\n    \"algorithm_params\": algorithm_params[\"SAC_CLIP\"],\n    \"state\": states[\"5\"],\n    \"action_smoothing\": 0.75,\n    \"reward_fn\": \"reward_fn5\",\n    \"reward_params\": reward_params[\"reward_fn_5_default\"],\n    \"clip_reward_params\": reward_params[\"reward_vlm_rm\"],\n    \"vlm_reward_type\": \"VLM-RM\",\n    \"obs_res\": (80, 120),\n    \"seed\": 100,\n    \"wrappers\": [],\n    \"action_noise\": {},\n    \"use_seg_bev\": False,\n    \"use_rgb_bev\": True,\n}\n\n_CONFIG_vlm_sr = {\n    \"algorithm\": \"CLIP-SAC\",\n    \"algorithm_params\": algorithm_params[\"SAC_CLIP\"],\n    \"state\": states[\"5\"],\n    \"action_smoothing\": 0.75,\n    \"reward_fn\": \"reward_fn5\",\n    \"reward_params\": reward_params[\"reward_fn_5_default\"],\n    \"clip_reward_params\": reward_params[\"reward_vlm_rm\"],\n    \"vlm_reward_type\": \"VLM-SR\",\n    \"obs_res\": (80, 120),\n    \"seed\": 100,\n    \"wrappers\": [],\n    \"action_noise\": {},\n    \"use_seg_bev\": False,\n    \"use_rgb_bev\": True,\n}\n\n_CONFIG_roboclip = {\n    \"algorithm\": \"CLIP-SAC\",\n    \"algorithm_params\": algorithm_params[\"SAC_CLIP\"],\n    \"state\": states[\"5\"],\n    \"action_smoothing\": 0.75,\n    \"reward_fn\": \"reward_fn5\",\n    \"reward_params\": reward_params[\"reward_fn_5_default\"],\n    \"clip_reward_params\": reward_params[\"reward_vlm_rm\"],\n    \"vlm_reward_type\": \"RoboCLIP\",\n    \"obs_res\": (80, 120),\n    \"seed\": 100,\n    \"wrappers\": [],\n    \"action_noise\": {},\n    \"use_seg_bev\": False,\n    \"use_rgb_bev\": True,\n}\n\n_CONFIG_tirl_sac = {\n    \"algorithm\": \"SAC\",\n    \"algorithm_params\": algorithm_params[\"SAC\"],\n    \"state\": states[\"5\"],\n    \"action_smoothing\": 0.75,\n    \"reward_fn\": \"reward_fn_simple\",\n    \"reward_params\": reward_params[\"reward_fn_5_default\"],\n    \"obs_res\": (80, 120),\n    \"seed\": 100,\n    \"wrappers\": [],\n    \"use_rgb_bev\": False,\n}\n\n_CONFIG_tirl_ppo = {\n    \"algorithm\": \"PPO\",\n    \"algorithm_params\": algorithm_params[\"PPO\"],\n    \"state\": states[\"5\"],\n    \"action_smoothing\": 0.75,\n    \"reward_fn\": \"reward_fn_simple\",\n    \"reward_params\": reward_params[\"reward_fn_5_default\"],\n    \"obs_res\": (80, 120),\n    \"seed\": 100,\n    \"wrappers\": [],\n    \"use_rgb_bev\": False,\n}\n\n_CONFIG_chatscene_sac = {\n    \"algorithm\": \"SAC\",\n    \"algorithm_params\": algorithm_params[\"SAC\"],\n    \"state\": states[\"5\"],\n    \"action_smoothing\": 0.75,\n    \"reward_fn\": \"reward_fn_chatscene\",\n    \"reward_params\": reward_params[\"reward_fn_5_default\"],\n    \"obs_res\": (80, 120),\n    \"seed\": 100,\n    \"wrappers\": [],\n    \"use_rgb_bev\": False,\n}\n\n_CONFIG_chatscene_ppo = {\n    \"algorithm\": \"PPO\",\n    \"algorithm_params\": algorithm_params[\"PPO\"],\n    \"state\": states[\"5\"],\n    \"action_smoothing\": 0.75,\n    \"reward_fn\": \"reward_fn_chatscene\",\n    \"reward_params\": reward_params[\"reward_fn_5_default\"],\n    \"obs_res\": (80, 120),\n    \"seed\": 100,\n    \"wrappers\": [],\n    \"use_rgb_bev\": False,\n}\n\n_CONFIG_revolve = {\n    \"algorithm\": \"SAC\",\n    \"algorithm_params\": algorithm_params[\"SAC\"],\n    \"state\": states[\"5\"],\n    \"action_smoothing\": 0.75,\n    \"reward_fn\": \"reward_fn_revolve\",\n    \"reward_params\": reward_params[\"reward_fn_5_default\"],\n    \"obs_res\": (80, 120),\n    \"seed\": 100,\n    \"wrappers\": [],\n    \"use_rgb_bev\": False,\n}\n\n_CONFIG_revolve_auto = {\n    \"algorithm\": \"SAC\",\n    \"algorithm_params\": algorithm_params[\"SAC\"],\n    \"state\": states[\"5\"],\n    \"action_smoothing\": 0.75,\n    \"reward_fn\": \"reward_fn_revolve_auto\",\n    \"reward_params\": reward_params[\"reward_fn_5_default\"],\n    \"obs_res\": (80, 120),\n    \"seed\": 100,\n    \"wrappers\": [],\n    \"use_rgb_bev\": False,\n}\n\n_CONFIG_Chen = {\n    \"algorithm\": \"SAC\",\n    \"algorithm_params\": algorithm_params[\"SAC\"],\n    \"state\": states[\"5\"],\n    \"vae_model\": None,\n    \"action_smoothing\": 0.75,\n    \"reward_fn\": \"reward_fn_Chen\",\n    \"reward_params\": reward_params[\"reward_fn_Chen\"],\n    \"obs_res\": (80, 120),\n    \"seed\": 120,\n    \"wrappers\": [],\n    \"use_rgb_bev\": False,\n}\n\n_CONFIG_ASAP = {\n    \"algorithm\": \"PPO\",\n    \"algorithm_params\": algorithm_params[\"PPO\"],\n    \"state\": states[\"5\"],\n    \"vae_model\": None,\n    \"action_smoothing\": 0.75,\n    \"reward_fn\": \"reward_fn_ASAP\",\n    \"reward_params\": reward_params[\"reward_fn_ASAP\"],\n    \"obs_res\": (80, 120),\n    \"seed\": 120,\n    \"wrappers\": [],\n    \"use_rgb_bev\": False,\n}\n\n\nCONFIGS = {\n    \"1\": _CONFIG_1,\n    \"2\": _CONFIG_2,\n    \"vlm_rl\": _CONFIG_vlm_rl,\n    \"vlm_rl_ppo\": _CONFIG_vlm_rl_ppo,\n    \"lord\": _CONFIG_lord,\n    \"lord_speed\": _CONFIG_lord_speed,\n    \"vlm_rm\": _CONFIG_vlm_rm,\n    \"vlm_sr\": _CONFIG_vlm_sr,\n    \"roboclip\": _CONFIG_roboclip,\n    \"tirl_sac\": _CONFIG_tirl_sac,\n    \"tirl_ppo\": _CONFIG_tirl_ppo,\n    \"chatscene_sac\": _CONFIG_chatscene_sac,\n    \"chatscene_ppo\": _CONFIG_chatscene_ppo,\n    \"revolve\": _CONFIG_revolve,\n    \"revolve_auto\": _CONFIG_revolve_auto,\n    \"Chen\": _CONFIG_Chen,\n    \"ASAP\": _CONFIG_ASAP,\n}\n\nCONFIG = None\n\n\ndef set_config(config_name):\n    global CONFIG\n    CONFIG = Box(CONFIGS[config_name], default_box=True)\n    return CONFIG\n"
  },
  {
    "path": "eval.py",
    "content": "import os\nimport argparse\nimport pandas as pd\nimport numpy as np\nimport config\nfrom clip.clip_rewarded_ppo import CLIPRewardedPPO\n\nparser = argparse.ArgumentParser(description=\"Eval a CARLA agent\")\nparser.add_argument(\"--host\", default=\"localhost\", type=str, help=\"IP of the host server (default: 127.0.0.1)\")\nparser.add_argument(\"--port\", default=2020, type=int, help=\"TCP port to listen to (default: 2000)\")\nparser.add_argument(\"--model\", type=str, default=\"./model_400000_steps.zip\", help=\"Path to a model evaluate\")\nparser.add_argument(\"--no_render\", action=\"store_false\", help=\"If True, render the environment\")\nparser.add_argument(\"--fps\", type=int, default=15, help=\"FPS to render the environment\")\nparser.add_argument(\"--no_record_video\", action=\"store_false\", help=\"If True, record video of the evaluation\")\nparser.add_argument(\"--config\", type=str, default=\"vlm_rl\", help=\"Config to use (default: vlm_rl)\")\nparser.add_argument(\"--seed\", type=int, default=101, help=\"random seed\")\nparser.add_argument(\"--device\", type=str, default=\"cuda:0\", help=\"cpu, cuda:0, cuda:1, cuda:2\")\nparser.add_argument(\"--density\", choices=['empty', 'regular', 'dense'], default=\"regular\",\n                    help=\"different traffic densities\")\nparser.add_argument(\"--town\", choices=['Town01', 'Town02', 'Town03', 'Town04', 'Town05'], default=\"Town02\",\n                    help=\"different traffic densities\")\n\nargs = vars(parser.parse_args())\nCONFIG = config.set_config(args[\"config\"])\nCONFIG.seed = args[\"seed\"]\nCONFIG.algorithm_params.device = args[\"device\"]\n\nfrom stable_baselines3 import PPO, DDPG, SAC\nfrom clip.clip_rewarded_sac import CLIPRewardedSAC\n\nfrom utils import VideoRecorder, parse_wrapper_class\nfrom carla_env.state_commons import create_encode_state_fn\nfrom carla_env.rewards import reward_functions\n\nfrom carla_env.wrappers import vector, get_displacement_vector\nfrom carla_env.envs.carla_route_env import CarlaRouteEnv\nfrom eval_plots import plot_eval, summary_eval\n\n\ndef convert_state(state):\n    c_state = dict()\n    c_state['seg_camera'] = np.transpose(state['seg_camera'], (2, 0, 1))\n    c_state['seg_camera'] = np.array([c_state['seg_camera']])\n    c_state['waypoints'] = np.array([state['waypoints']])\n    c_state['vehicle_measures'] = np.array([state['vehicle_measures']])\n    return c_state\n\n\ndef run_eval(env, model, model_path=None, record_video=False, eval_suffix=''):\n    model_name = os.path.basename(model_path)\n    log_path = os.path.join(os.path.dirname(model_path), 'eval{}'.format(eval_suffix))\n    os.makedirs(log_path, exist_ok=True)\n    video_path = os.path.join(log_path, model_name.replace(\".zip\", \"_eval.avi\"))\n    csv_path = os.path.join(log_path, model_name.replace(\".zip\", \"_eval.csv\"))\n    model_id = f\"{model_path.split('/')[-2]}-{model_name.split('_')[-2]}\"\n    state = env.reset()\n\n    columns = [\"model_id\", \"episode\", \"step\", \"throttle\", \"steer\", \"vehicle_location_x\", \"vehicle_location_y\",\n               \"reward\", \"distance\", \"speed\", \"center_dev\", \"angle_next_waypoint\", \"waypoint_x\", \"waypoint_y\",\n               \"route_x\", \"route_y\", \"routes_completed\", \"collision_speed\", \"collision_interval\", \"CPS\", \"CPM\"\n               ]\n    df = pd.DataFrame(columns=columns)\n\n    # Init video recording\n    if record_video:\n        rendered_frame = env.render(mode=\"rgb_array\")\n        print(\"Recording video to {} ({}x{}x{}@{}fps)\".format(video_path, *rendered_frame.shape,\n                                                              int(env.fps)))\n        video_recorder = VideoRecorder(video_path,\n                                       frame_size=rendered_frame.shape,\n                                       fps=env.fps)\n        video_recorder.add_frame(rendered_frame)\n    else:\n        video_recorder = None\n\n    episode_idx = 0\n    # While non-terminal state\n    print(\"Episode \", episode_idx)\n    saved_route = False\n    while episode_idx < 10:\n        env.extra_info.append(\"Evaluation\")\n        action, _states = model.predict(state, deterministic=True)\n        next_state, reward, dones, info = env.step(action)\n\n        state = next_state\n        if env.step_count >= 150 and env.current_waypoint_index == 0:\n            dones = True\n\n        # Save route at the beginning of the episode\n        if not saved_route:\n            initial_heading = np.deg2rad(env.vehicle.get_transform().rotation.yaw)\n            initial_vehicle_location = vector(env.vehicle.get_location())\n            # Save the route to plot them later\n            for way in env.route_waypoints:\n                route_relative = get_displacement_vector(initial_vehicle_location,\n                                                         vector(way[0].transform.location),\n                                                         initial_heading)\n                new_row = pd.DataFrame([['route', env.episode_idx, route_relative[0], route_relative[1]]],\n                                       columns=[\"model_id\", \"episode\", \"route_x\", \"route_y\"])\n                df = pd.concat([df, new_row], ignore_index=True)\n            saved_route = True\n\n        vehicle_relative = get_displacement_vector(initial_vehicle_location, vector(env.vehicle.get_location()),\n                                                   initial_heading)\n        waypoint_relative = get_displacement_vector(initial_vehicle_location,\n                                                    vector(env.current_waypoint.transform.location), initial_heading)\n\n        if env.collision_state:\n            collision_speed, collision_interval, cps, cpm = env.collision_speed, env.collision_interval, env.cps, env.cpm\n        else:\n            collision_speed, collision_interval, cps, cpm = 0, None, 0, 0\n        new_row = pd.DataFrame(\n            [[model_id, env.episode_idx, env.step_count, env.vehicle.control.throttle, env.vehicle.control.steer,\n              vehicle_relative[0], vehicle_relative[1], reward,\n              env.distance_traveled,\n              env.vehicle.get_speed(), env.distance_from_center,\n              np.rad2deg(env.vehicle.get_angle(env.current_waypoint)),\n              waypoint_relative[0], waypoint_relative[1], None, None,\n              env.routes_completed, collision_speed, collision_interval, cps, cpm\n              ]], columns=columns)\n        df = pd.concat([df, new_row], ignore_index=True)\n\n        if record_video:\n            # Add frame\n            rendered_frame = env.render(mode=\"rgb_array\")\n            video_recorder.add_frame(rendered_frame)\n        if dones:\n            state = env.reset()\n            episode_idx += 1\n            saved_route = False\n            print(\"Episode \", episode_idx)\n\n    # Release video\n    if record_video:\n        video_recorder.release()\n\n    df.to_csv(csv_path, index=False)\n    plot_eval([csv_path])\n    summary_eval(csv_path)\n\n\nif __name__ == \"__main__\":\n\n    model_ckpt = args[\"model\"]\n    algorithm_dict = {\n        \"PPO\": PPO,\n        \"DDPG\": DDPG,\n        \"SAC\": SAC,\n        \"CLIP-SAC\": CLIPRewardedSAC,\n        \"CLIP-PPO\": CLIPRewardedPPO,\n    }\n    if CONFIG.algorithm not in algorithm_dict:\n        raise ValueError(\"Invalid algorithm name\")\n\n    AlgorithmRL = algorithm_dict[CONFIG.algorithm]\n\n    observation_space, encode_state_fn = create_encode_state_fn(CONFIG.state, CONFIG)\n    action_space_type = 'continuous' if CONFIG.action_space_type != 'discrete' else 'discrete'\n\n    eval_suffix = ''\n    if args['density'] == 'empty':\n        activate_traffic_flow = False\n        tf_num = 0\n        eval_suffix += 'empty'\n    else:\n        activate_traffic_flow = True\n        if args['density'] == 'regular':\n            tf_num = 20\n        else:\n            tf_num = 40\n            eval_suffix += 'dense'\n    if args['town'] != 'Town02':\n        eval_suffix += args['town']\n\n    env = CarlaRouteEnv(obs_res=CONFIG.obs_res, host=args[\"host\"], port=args[\"port\"],\n                        reward_fn=reward_functions[CONFIG.reward_fn], observation_space=observation_space,\n                        encode_state_fn=encode_state_fn, fps=args[\"fps\"], action_smoothing=CONFIG.action_smoothing,\n                        eval=True, action_space_type=action_space_type, activate_spectator=True, activate_render=True,\n                        activate_bev=True, activate_seg_bev=CONFIG.use_seg_bev, start_carla=True,\n                        activate_traffic_flow=activate_traffic_flow, tf_num=tf_num, town=args[\"town\"])\n\n    for wrapper_class_str in CONFIG.wrappers:\n        wrap_class, wrap_params = parse_wrapper_class(wrapper_class_str)\n        env = wrap_class(env, *wrap_params)\n\n    # Load the model based on the algorithm type\n    if CONFIG.algorithm == \"CLIP-SAC\":\n        model = CLIPRewardedSAC.load(model_ckpt, env=env, config=CONFIG, device=args[\"device\"], load_clip=False)\n        model.inference_only = True\n    elif CONFIG.algorithm == \"CLIP-PPO\":\n        model = CLIPRewardedPPO.load(model_ckpt, env=env, config=CONFIG, device=args[\"device\"], load_clip=False)\n        model.inference_only = True\n    else:\n        model = AlgorithmRL.load(model_ckpt, env=env, device=args[\"device\"])\n\n    print(\"Model loaded successfully...\")\n\n    run_eval(env, model, model_ckpt, record_video=args[\"no_record_video\"], eval_suffix=eval_suffix)\n\n    env.close()\n"
  },
  {
    "path": "eval_plots.py",
    "content": "import os\n\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport argparse\n\n\ndef plot_eval(eval_csv_paths, output_name=None):\n    episode_numbers = pd.read_csv(eval_csv_paths[0])['episode'].unique()\n\n    # Get a list of unique episode numbers\n    cols = ['Steer', 'Throttle', 'Speed (km/h)', 'Reward', 'Center Deviation (m)', 'Distance (m)',\n            'Angle next waypoint (grad)', 'Trayectory']\n\n    # Create a figure with subplots for each episode\n    fig, axs = plt.subplots(len(episode_numbers), len(cols), figsize=(4 * len(cols), 3 * len(episode_numbers)))\n\n    if len(eval_csv_paths) == 1:\n        eval_plot_path = eval_csv_paths[0].replace(\".csv\", \".png\")\n    else:\n        os.makedirs('tensorboard/eval_plots', exist_ok=True)\n        eval_plot_path = f'./tensorboard/eval_plots/{output_name}'\n\n    models = ['Waypoints']\n\n    # Load the dataframe\n    for e, path in enumerate(eval_csv_paths):\n        df = pd.read_csv(path)\n        model_id = df.loc[df['model_id'] != 'route', 'model_id'].unique()[0]\n        models.append(model_id)\n        # Loop over each episode number\n        for i, episode_number in enumerate(episode_numbers):\n            # Select the rows for the current episode\n            episode_df = df[(df['episode'] == episode_number) & (df['model_id'] != 'route')]\n            route_df = df[(df['episode'] == episode_number) & (df['model_id'] == 'route')]\n\n            # Plot the steer progress\n            axs[i, 0].plot(episode_df['step'], episode_df['steer'], label=model_id)\n            axs[i, 0].set_xlabel('Step')\n            axs[i, 0].set_ylim(-1, 1)  # clip y-axis limits to -1 and 1\n\n            # Plot the throttle progress\n            axs[i][1].plot(episode_df['step'], episode_df['throttle'], label=model_id)\n            axs[i][1].set_xlabel('Step')\n            axs[i, 1].set_ylim(0, 1)  # clip y-axis limits to -1 and 1\n\n            axs[i][2].plot(episode_df['step'], episode_df['speed'], label=model_id)\n            axs[i][2].set_xlabel('Step')\n            axs[i, 2].set_ylim(0, 40)  # clip y-axis limits to -1 and 1\n\n            # Plot the reward progress\n            axs[i][3].plot(episode_df['step'], episode_df['reward'], label=model_id)\n            axs[i][3].set_xlabel('Step')\n            axs[i, 3].set_ylim(-0.2, 1)  # clip y-axis limits to -1 and 1\n\n            axs[i][4].plot(episode_df['step'], episode_df['center_dev'], label=model_id)\n            axs[i][4].set_xlabel('Step')\n            axs[i, 4].set_ylim(0, 3)  # clip y-axis limits to -1 and 1\n\n            axs[i][5].plot(episode_df['step'], episode_df['distance'], label=model_id)\n            axs[i][5].set_xlabel('Step')\n\n            axs[i][6].plot(episode_df['step'], episode_df['angle_next_waypoint'], label=model_id)\n            axs[i][6].set_xlabel('Step')\n\n            if e == 0:\n                axs[i][7].plot(route_df['route_x'].head(1), route_df['route_y'].head(1), 'go',\n                               label='Start')\n                axs[i][7].plot(route_df['route_x'].tail(1), route_df['route_y'].tail(1), 'ro',\n                               label='End')\n                axs[i][7].plot(route_df['route_x'], route_df['route_y'], label='Waypoints', color=\"green\")\n\n                axs[i, 7].set_xlim(left=min(-5, min(route_df['route_x'] - 3)))\n                axs[i, 7].set_xlim(right=max(5, max(route_df['route_x'] + 3)))\n            axs[i][7].plot(episode_df['vehicle_location_x'], episode_df['vehicle_location_y'], label=model_id)\n\n    # Add legend\n\n    pad = 5  # in points\n    for ax, col in zip(axs[0], cols):\n        ax.annotate(col, xy=(0.5, 1), xytext=(0, pad),\n                    xycoords='axes fraction', textcoords='offset points',\n                    size='large', ha='center', va='baseline')\n    for ax, row in zip(axs[:, 0], episode_numbers):\n        ax.annotate(f\"Episode {row}\", xy=(0, 0.5), xytext=(-ax.yaxis.labelpad - pad, 0),\n                    xycoords=ax.yaxis.label, textcoords='offset points',\n                    size='large', ha='right', va='center')\n\n    # Adjust the spacing between subplots\n    # fig.subplots_adjust(bottom=0.062*len(labels))\n\n    handles, labels = axs[0][7].get_legend_handles_labels()\n    fig.legend(handles, labels, loc='lower center', bbox_to_anchor=(0.5, 0.02))\n    fig.tight_layout(rect=(0, 0.1 + 0.02 * len(labels), 1, 1))\n\n    # Adjust the bottom margin to make room for the legend\n    # Show the plot\n    plt.savefig(eval_plot_path)\n\n\ndef summary_eval(eval_csv_path):\n    df = pd.read_csv(eval_csv_path)\n    df_route = df[df['model_id'] == 'route']\n    df = df[df['model_id'] != 'route']\n    df = df.drop(['model_id', 'route_x', 'route_y'], axis=1)\n\n    # Get the total distance traveled from each episode based on last row\n    df_distance = df.groupby(['episode'], as_index=False).last()[['episode', 'distance']].rename(\n        columns={'distance': 'total_distance'})\n\n    # Get the total reward from each episode summing all the rewards\n    df_reward = df.groupby(['episode'], as_index=False).sum()[['episode', 'reward']].rename(columns={'reward': 'total_reward'})\n\n    # Get the mean and std from: speed, center_dev and reward\n    df_mean_std = df.groupby(['episode'], as_index=False).agg(\n        {'speed': ['mean', 'std'], 'center_dev': ['mean', 'std'], 'reward': ['mean', 'std']})\n    df_mean_std.columns = ['episode', 'speed_mean', 'speed_std', 'center_dev_mean', 'center_dev_std', 'reward_mean', 'reward_std']\n\n    # Get the RC, CS, \"collision_interval\", \"CPS\", \"CPM\" from each episode based on last row\n    df_routes_completed = df.groupby(['episode'], as_index=False).last()[['episode', 'routes_completed']]\n    df_routes_completed['routes_completed'] = df_routes_completed['routes_completed'].clip(upper=1)\n\n    df_collision_speed = df.groupby(['episode'], as_index=False).last()[['episode', 'collision_speed']]\n    df_collision_interval = df.groupby(['episode'], as_index=False).last()[['episode', 'collision_interval']]\n    df_CPS = df.groupby(['episode'], as_index=False).last()[['episode', 'CPS']]\n    df_CPM = df.groupby(['episode'], as_index=False).last()[['episode', 'CPM']]\n\n    # Calculate if the episode was successful based on the distance between waypoints and the vehicle of the last row\n\n    # First from the route dataframe get the last row of each episode\n    df_waypoint = df_route.groupby(['episode'], as_index=False).last()[['episode', 'route_x', 'route_y']]\n    df_success = df.groupby(['episode'], as_index=False).last()[['episode', 'vehicle_location_x', 'vehicle_location_y']]\n    df_success = pd.merge(df_success, df_waypoint, on='episode')\n\n    # If the distance between the last waypoint and the vehicle is less than 5 meters, the episode was successful\n    df_success['success'] = df_success.apply(\n        lambda x: eucldist(x['vehicle_location_x'], x['vehicle_location_y'], x['route_x'], x['route_y']) < 5, axis=1)\n    df_success = df_success[['episode', 'success']]\n    # Merge all the dataframes\n    df_summary = pd.merge(df_distance, df_reward, on='episode')\n    df_summary = pd.merge(df_summary, df_routes_completed, on='episode')\n    df_summary = pd.merge(df_summary, df_collision_speed, on='episode')\n    df_summary = pd.merge(df_summary, df_collision_interval, on='episode')\n    df_summary = pd.merge(df_summary, df_CPS, on='episode')\n    df_summary = pd.merge(df_summary, df_CPM, on='episode')\n\n    df_summary = pd.merge(df_summary, df_mean_std, on='episode')\n    df_summary = pd.merge(df_summary, df_success, on='episode')\n\n    # Turn the episode column into a string\n    df_summary['episode'] = df_summary['episode'].astype(str)\n\n    # Create a new row called where the episode is total with the mean of all the columns except total reward and total distance without modifying the index\n    df_summary.loc['total'] = df_summary.mean(numeric_only=True)\n    df_summary.loc['total', 'episode'] = 'total'\n    df_summary.loc['total', 'total_reward'] = df_summary['total_reward'].iloc[:-1].sum()\n    df_summary.loc['total', 'total_distance'] = df_summary['total_distance'].iloc[:-1].sum()\n\n    output_path = eval_csv_path.replace(\"eval.csv\", \"eval_summary.csv\")\n    df_summary.to_csv(output_path, index=False)\n    print(f\"Saving summary to {output_path}\")\n\n\ndef eucldist(x1, y1, x2, y2):\n    return np.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)\n\n\ndef main():\n    parser = argparse.ArgumentParser(description=\"Compare evaluation results from different models\")\n    parser.add_argument(\"--models\", nargs='+', type=str, default=\"\", help=\"Path to a model evaluate\")\n    args = vars(parser.parse_args())\n\n    compare_models = args['models']\n    eval_csv_paths = []\n    for model in compare_models:\n        model_id, steps = model.split(\"-\")\n        eval_csv_paths.append(os.path.join(\"tensorboard\", model_id, \"eval\", f\"model_{steps}_steps_eval.csv\"))\n    plot_eval(eval_csv_paths, output_name=\"+\".join(compare_models))\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "requirements.txt",
    "content": "carla==0.9.13\nnetworkx\npygame\ngym==0.23.0\ngymnasium==0.28.1\nstable-baselines3==2.0.0\nopen_clip_torch==2.20.0\nopencv-python==4.7.0.68\n# torch==1.13.1+cu116\nshimmy==1.3.0\ntensorboard\npython-box\nscikit-learn\nh5py"
  },
  {
    "path": "run_eval.py",
    "content": "import glob\nimport subprocess\nimport time\nimport os\n\nfrom tqdm import tqdm\n\n\ndef kill_carla():\n    print(\"Killing Carla server\\n\")\n    time.sleep(1)\n    subprocess.run([\"killall\", \"-9\", \"CarlaUE4-Linux-Shipping\"])\n    time.sleep(4)\n\n\nif __name__ == '__main__':\n    selected_models = [f\"model_{i}_steps.zip\" for i in range(100000, 1100000, 100000)]\n    tensorboard_path = './tensorboard'\n    for model_path in tqdm(os.listdir(tensorboard_path), desc=\"Processing models\"):\n        config = model_path.split('id')[-1]\n        print(config)\n        print(\"==\" * 30)\n        print(model_path, config)\n        model_ckpts = glob.glob(os.path.join(tensorboard_path, model_path, \"*.zip\"))\n        model_ckpt_filenames = [os.path.basename(path) for path in model_ckpts]\n        contains_all = all(model in model_ckpt_filenames for model in selected_models)\n        if not contains_all:\n            print(\"Skipping model {}, because not fully trained...\".format(model_path))\n            continue\n\n        for model_ckpt in model_ckpts:\n            if model_ckpt.split('/')[-1] not in selected_models: continue\n            # summary_path = os.path.join(tensorboard_path, model_path, \"eval\",\n            #                             os.path.basename(model_ckpt).replace(\".zip\", \"_eval_summary.csv\"))\n            # if os.path.exists(summary_path):\n            #     print(\"Already exists: \", summary_path)\n            #     continue\n            kill_carla()\n            print(model_ckpt)\n\n            args_eval = [\n                \"--model\", model_ckpt,\n                \"--config\", config,\n                \"--town\", \"Town02\",  # default \"Town02\", change this to evaluate on other towns\n                \"--density\", \"regular\",  # default \"regular\", change this to `dense` or `empty` to evaluate on other densities\n            ]\n\n            subprocess.run([\"python\", \"eval.py\"] + args_eval)\n"
  },
  {
    "path": "train.py",
    "content": "import warnings\nimport os\nfrom datetime import datetime\n\nwarnings.filterwarnings(\"ignore\")\nos.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'\n\nimport argparse\nimport config\n\nparser = argparse.ArgumentParser(description=\"Trains a CARLA agent\")\nparser.add_argument(\"--host\", default=\"localhost\", type=str, help=\"IP of the host server (default: 127.0.0.1)\")\nparser.add_argument(\"--port\", default=2000, type=int, help=\"TCP port to listen to (default: 2000)\")\nparser.add_argument(\"--total_timesteps\", type=int, default=1_000_000, help=\"Total timestep to train for\")\nparser.add_argument(\"--start_carla\", action=\"store_true\", help=\"If True, start a CARLA server\")\nparser.add_argument(\"--no_render\", action=\"store_false\", help=\"If True, render the environment\")\nparser.add_argument(\"--fps\", type=int, default=15, help=\"FPS to render the environment\")\nparser.add_argument(\"--num_checkpoints\", type=int, default=100, help=\"Checkpoint number\")\nparser.add_argument(\"--log_dir\", type=str, default=\"tensorboard\", help=\"Directory to save logs\")\nparser.add_argument(\"--device\", type=str, default=\"cuda:0\", help=\"cpu, cuda:0, cuda:1, cuda:2\")\nparser.add_argument(\"--config\", type=str, default=\"vlm_rl_ppo\", help=\"Config to use (default: vlm_rl)\")\n\nargs = vars(parser.parse_args())\nCONFIG = config.set_config(args[\"config\"])\nCONFIG.algorithm_params.device = args[\"device\"]\n\nfrom stable_baselines3 import PPO, DDPG, SAC\nfrom clip.clip_rewarded_sac import CLIPRewardedSAC\nfrom clip.clip_rewarded_ppo import CLIPRewardedPPO\nfrom stable_baselines3.common.callbacks import CheckpointCallback\nfrom stable_baselines3.common.logger import configure\nfrom carla_env.envs.carla_route_env import CarlaRouteEnv\nfrom carla_env.state_commons import create_encode_state_fn\nfrom carla_env.rewards import reward_functions\nfrom utils import HParamCallback, TensorboardCallback, write_json, parse_wrapper_class\n\nos.makedirs(args[\"log_dir\"], exist_ok=True)\n\nalgorithm_dict = {\"PPO\": PPO, \"DDPG\": DDPG, \"SAC\": SAC, \"CLIP-SAC\": CLIPRewardedSAC, \"CLIP-PPO\": CLIPRewardedPPO}\nif CONFIG.algorithm not in algorithm_dict:\n    raise ValueError(\"Invalid algorithm name\")\n\nAlgorithmRL = algorithm_dict[CONFIG.algorithm]\n\nobservation_space, encode_state_fn = create_encode_state_fn(CONFIG.state, CONFIG)\naction_space_type = 'continuous' if CONFIG.action_space_type != 'discrete' else 'discrete'\n\nenv = CarlaRouteEnv(obs_res=CONFIG.obs_res, host=args[\"host\"], port=args[\"port\"],\n                    reward_fn=reward_functions[CONFIG.reward_fn], observation_space=observation_space,\n                    encode_state_fn=encode_state_fn, fps=args[\"fps\"],\n                    action_smoothing=CONFIG.action_smoothing, action_space_type=action_space_type,\n                    activate_spectator=args[\"no_render\"], activate_render=args[\"no_render\"],\n                    activate_bev=CONFIG.use_rgb_bev, activate_seg_bev=CONFIG.use_seg_bev,\n                    activate_traffic_flow=True, start_carla=args[\"start_carla\"],\n                    )\n\nfor wrapper_class_str in CONFIG.wrappers:\n    wrap_class, wrap_params = parse_wrapper_class(wrapper_class_str)\n    env = wrap_class(env, *wrap_params)\n\nif AlgorithmRL.__name__ == \"CLIPRewardedSAC\":\n    model = CLIPRewardedSAC(env=env, config=CONFIG)\nelif AlgorithmRL.__name__ == \"CLIPRewardedPPO\":\n    model = CLIPRewardedPPO(env=env, config=CONFIG)\nelse:\n    model = AlgorithmRL('MultiInputPolicy', env, verbose=1, seed=CONFIG.seed, tensorboard_log=args[\"log_dir\"],\n                        **CONFIG.algorithm_params)\n\nmodel_suffix = \"{}_id{}\".format(datetime.now().strftime(\"%Y%m%d_%H%M%S\"), args['config'])\nmodel_name = f'{model.__class__.__name__}_{model_suffix}'\nmodel_dir = os.path.join(args[\"log_dir\"], model_name)\n\nnew_logger = configure(model_dir, [\"stdout\", \"csv\", \"tensorboard\"])\nmodel.set_logger(new_logger)\nwrite_json(CONFIG, os.path.join(model_dir, 'config.json'))\n\nmodel.learn(total_timesteps=args[\"total_timesteps\"],\n            callback=[HParamCallback(CONFIG), TensorboardCallback(1), CheckpointCallback(\n                save_freq=args[\"total_timesteps\"] // args[\"num_checkpoints\"],\n                save_path=model_dir, name_prefix=\"model\")], reset_num_timesteps=False)\n"
  },
  {
    "path": "utils.py",
    "content": "import cv2\nimport math\nimport json\n\nimport gym\nimport numpy as np\nimport pygame\nfrom stable_baselines3.common.callbacks import BaseCallback\nfrom stable_baselines3.common.logger import HParam\n\n\ndef write_json(data, path):\n    config_dict = {}\n    with open(path, 'w', encoding='utf-8') as f:\n        for k, v in data.items():\n            if isinstance(v, str) and v.isnumeric():\n                config_dict[k] = int(v)\n            elif isinstance(v, dict):\n                config_dict[k] = dict()\n                for k_inner, v_inner in v.items():\n                    config_dict[k][k_inner] = v_inner.__str__()\n                config_dict[k] = str(config_dict[k])\n            else:\n                config_dict[k] = v.__str__()\n        json.dump(config_dict, f, indent=4)\n\n\nclass VideoRecorder():\n    def __init__(self, filename, frame_size, fps=30):\n        fourcc = cv2.VideoWriter_fourcc(*'XVID')\n        self.video_writer = cv2.VideoWriter(filename, fourcc, int(fps), (frame_size[1], frame_size[0]))\n\n    def add_frame(self, frame):\n        self.video_writer.write(cv2.cvtColor(frame, cv2.COLOR_RGB2BGR))\n\n    def add_frame_with_reward(self, frame, reward):\n        frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)\n\n        reward_text = f\"Reward: {reward:.2f}\"\n\n        (text_width, text_height), _ = cv2.getTextSize(\n            reward_text, cv2.FONT_HERSHEY_SIMPLEX, 1, 2\n        )\n        position = (frame.shape[1] - text_width - 10,  # x 坐标\n                    frame.shape[0] - 10)\n        cv2.putText(\n            frame, reward_text, position,\n            cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2, cv2.LINE_AA\n        )\n        self.video_writer.write(frame)\n\n    def release(self):\n        self.video_writer.release()\n\n    def __del__(self):\n        self.release()\n\n\nclass HParamCallback(BaseCallback):\n    def __init__(self, config):\n        \"\"\"\n        Saves the hyperparameters and metrics at the start of the training, and logs them to TensorBoard.\n        \"\"\"\n        super().__init__()\n        self.config = config\n\n    def _on_training_start(self) -> None:\n        hparam_dict = {}\n        for k, v in self.config.items():\n            if isinstance(v, str) and v.isnumeric():\n                hparam_dict[k] = int(v)\n            elif isinstance(v, dict):\n                hparam_dict[k] = dict()\n                for k_inner, v_inner in v.items():\n                    hparam_dict[k][k_inner] = v_inner.__str__()\n                hparam_dict[k] = str(hparam_dict[k])\n            else:\n                hparam_dict[k] = v.__str__()\n        # define the metrics that will appear in the `HPARAMS` Tensorboard tab by referencing their tag\n        # Tensorbaord will find & display metrics from the `SCALARS` tab\n        metric_dict = {\n            \"rollout/ep_len_mean\": 0,\n            \"train/value_loss\": 0,\n        }\n        self.logger.record(\n            \"hparams\",\n            HParam(hparam_dict, metric_dict),\n            exclude=(\"stdout\", \"log\", \"json\", \"csv\"),\n        )\n\n    def _on_step(self) -> bool:\n        return True\n\n\nclass TensorboardCallback(BaseCallback):\n    \"\"\"\n    Custom callback for plotting additional values in tensorboard.\n    \"\"\"\n\n    def __init__(self, verbose=0):\n        super().__init__(verbose)\n\n    def _on_step(self) -> bool:\n        # Log scalar value (here a random variable)\n        if self.locals['dones'][0]:\n            self.logger.record(\"time/num_timesteps\", self.num_timesteps)\n            self.logger.record(\"custom/total_reward\", self.locals['infos'][0]['total_reward'])\n            self.logger.record(\"custom/routes_completed\", self.locals['infos'][0]['routes_completed'])\n            self.logger.record(\"custom/total_distance\", self.locals['infos'][0]['total_distance'])\n            self.logger.record(\"custom/avg_center_dev\", self.locals['infos'][0]['avg_center_dev'])\n            self.logger.record(\"custom/avg_speed\", self.locals['infos'][0]['avg_speed'])\n            self.logger.record(\"custom/mean_reward\", self.locals['infos'][0]['mean_reward'])\n            self.logger.record(\"custom/mean_reward\", self.locals['infos'][0]['mean_reward'])\n            self.logger.record(\"time/num_timesteps\", self.num_timesteps)\n            self.logger.record(\"custom/collision_rate\", self.locals['infos'][0]['collision_rate'])\n            self.logger.record(\"custom/collision_num\", self.locals['infos'][0]['collision_num'])\n            self.logger.record(\"custom/episode_length\", self.locals['infos'][0]['episode_length'])\n            if self.locals['infos'][0]['collision_state']:\n                self.logger.record(\"custom/CPS\", self.locals['infos'][0]['CPS'])\n                self.logger.record(\"custom/CPM\", self.locals['infos'][0]['CPM'])\n                self.logger.record(\"custom/collision_interval\", self.locals['infos'][0]['collision_interval'])\n                self.logger.record(\"custom/collision_speed\", self.locals['infos'][0]['collision_speed'])\n\n            self.logger.dump(self.num_timesteps)\n\n            if hasattr(self.model, 'replay_buffer'):\n                recent_rewards = self.model.replay_buffer.rewards[max(0, self.model.replay_buffer.pos-500):self.model.replay_buffer.pos]\n                mean_recent_rewards = np.mean(recent_rewards)\n                sum_recent_rewards = np.sum(recent_rewards)\n\n                # Log the results\n                self.logger.record(\"replay_buffer/mean_recent_rewards\", mean_recent_rewards)\n                self.logger.record(\"replay_buffer/sum_recent_rewards\", sum_recent_rewards)\n\n        return True\n\nclass VideoRecorderCallback(BaseCallback):\n    def __init__(self, video_path, frame_size, video_length=-1, fps=30, skip_frame=1, verbose=0):\n        super().__init__(verbose)\n        self.video_recorder = VideoRecorder(video_path, frame_size, fps)\n        self.max_length = video_length\n        self.skip_frame = skip_frame\n\n    def _on_step(self) -> bool:\n        # Add frame to video\n        if self.max_length != -1 and self.num_timesteps > self.max_length:\n            self.video_recorder.release()\n            return False\n        # Skip every 4 frames to reduce video size\n        if self.num_timesteps % self.skip_frame != 0:\n            return True\n        display = self.training_env.unwrapped.envs[0].env.display\n        frame = np.array(pygame.surfarray.array3d(display), dtype=np.uint8).transpose([1, 0, 2])\n\n        self.video_recorder.add_frame(frame)\n        return True\n\n    def _on_training_end(self) -> None:\n        self.video_recorder.release()\n\n\ndef lr_schedule(initial_value: float, end_value: float, rate: float):\n    \"\"\"\n    Learning rate schedule:\n        Exponential decay by factors of 10 from initial_value to end_value.\n\n    :param initial_value: Initial learning rate.\n    :param rate: Exponential rate of decay. High values mean fast early drop in LR\n    :param end_value: The final value of the learning rate.\n    :return: schedule that computes current learning rate depending on remaining progress\n    \"\"\"\n\n    def func(progress_remaining: float) -> float:\n        \"\"\"\n        Progress will decrease from 1 (beginning) to 0.\n\n        :param progress_remaining: A float value between 0 and 1 that represents the remaining progress.\n        :return: The current learning rate.\n        \"\"\"\n        if progress_remaining <= 0:\n            return end_value\n\n        return end_value + (initial_value - end_value) * (10 ** (rate * math.log10(progress_remaining)))\n\n    func.__str__ = lambda: f\"lr_schedule({initial_value}, {end_value}, {rate})\"\n    lr_schedule.__str__ = lambda: f\"lr_schedule({initial_value}, {end_value}, {rate})\"\n\n    return func\n\n\nclass HistoryWrapperObsDict(gym.Wrapper):\n    # History Wrapper from rl-baselines3-zoo\n    # https://github.com/DLR-RM/rl-baselines3-zoo/blob/10de3a8804b14b4ea605b487ae7d8117c52901c4/rl_zoo3/wrappers.py\n    \"\"\"\n    History Wrapper for dict observation.\n    :param env:\n    :param horizon: Number of steps to keep in the history.\n    \"\"\"\n\n    def __init__(self, env: gym.Env, horizon: int = 2, obs_key: str = 'vae_latent') -> object:\n        self.obs_key = obs_key\n        assert isinstance(env.observation_space.spaces[obs_key], gym.spaces.Box)\n        print(\"Wrapping the env with HistoryWrapperObsDict.\")\n        wrapped_obs_space = env.observation_space.spaces[self.obs_key]\n        wrapped_action_space = env.action_space\n\n        low_obs = np.repeat(wrapped_obs_space.low, horizon, axis=-1)\n        high_obs = np.repeat(wrapped_obs_space.high, horizon, axis=-1)\n\n        low_action = np.repeat(wrapped_action_space.low, horizon, axis=-1)\n        high_action = np.repeat(wrapped_action_space.high, horizon, axis=-1)\n\n        low = np.concatenate((low_obs, low_action))\n        high = np.concatenate((high_obs, high_action))\n\n        # Overwrite the observation space\n        env.observation_space.spaces[obs_key] = gym.spaces.Box(low=low, high=high, dtype=wrapped_obs_space.dtype)\n\n        super().__init__(env)\n\n        self.horizon = horizon\n        self.low_action, self.high_action = low_action, high_action\n        self.low_obs, self.high_obs = low_obs, high_obs\n        self.low, self.high = low, high\n        self.obs_history = np.zeros(low_obs.shape, low_obs.dtype)\n        self.action_history = np.zeros(low_action.shape, low_action.dtype)\n\n    def _create_obs_from_history(self):\n        return np.concatenate((self.obs_history, self.action_history))\n\n    def reset(self):\n        # Flush the history\n        self.obs_history[...] = 0\n        self.action_history[...] = 0\n        obs_dict = self.env.reset()\n        obs = obs_dict[self.obs_key]\n        self.obs_history[..., -obs.shape[-1]:] = obs\n\n        obs_dict[self.obs_key] = self._create_obs_from_history()\n\n        return obs_dict\n\n    def step(self, action):\n        obs_dict, reward, done, info = self.env.step(action)\n        obs = obs_dict[self.obs_key]\n        last_ax_size = obs.shape[-1]\n\n        self.obs_history = np.roll(self.obs_history, shift=-last_ax_size, axis=-1)\n        self.obs_history[..., -obs.shape[-1]:] = obs\n\n        self.action_history = np.roll(self.action_history, shift=-action.shape[-1], axis=-1)\n        self.action_history[..., -action.shape[-1]:] = action\n\n        obs_dict[self.obs_key] = self._create_obs_from_history()\n\n        return obs_dict, reward, done, info\n\n\nclass FrameSkip(gym.Wrapper):\n    \"\"\"\n    Return only every ``skip``-th frame (frameskipping)\n    :param env: the environment\n    :param skip: number of ``skip``-th frame\n    \"\"\"\n\n    def __init__(self, env: gym.Env, skip: int = 4):\n        super().__init__(env)\n        print(\"Wrapping the env with FrameSkip.\")\n        self._skip = skip\n\n    def step(self, action: np.ndarray):\n        \"\"\"\n        Step the environment with the given action\n        Repeat action, sum reward.\n        :param action: the action\n        :return: observation, reward, done, information\n        \"\"\"\n        total_reward = 0.0\n        done = None\n        for _ in range(self._skip):\n            obs, reward, done, info = self.env.step(action)\n            total_reward += reward\n            if done:\n                break\n\n        return obs, total_reward, done, info\n\n    def reset(self):\n        return self.env.reset()\n\n\ndef parse_wrapper_class(wrapper_class_str: str):\n    \"\"\"\n    Parse a string to a wrapper class.\n\n    :param wrapper_class_str: (str) The string to parse.\n    :return: (type) The wrapper class and its parameters.\n    \"\"\"\n    wrap_class, wrap_params = wrapper_class_str.split(\"_\", 1)\n    wrap_params = wrap_params.split(\"_\")\n    wrap_params = [int(param) if param.isnumeric() else param for param in wrap_params]\n\n    if wrap_class == \"HistoryWrapperObsDict\":\n        return HistoryWrapperObsDict, wrap_params\n    elif wrap_class == \"FrameSkip\":\n        return FrameSkip, wrap_params\n"
  }
]